This app was built to detect text within an image using the Google Cloud Vision API and to give the user the ability to highlight particular words that were found. The front-end is built with React.js and the back-end with Node.js.
A Google Cloud Platform (GCP) project is required to use the Google Cloud Vision API. Read about setting up a Google Cloud Vision project. In my code project I have set up React.js to be built with parcel.js (a zero-config web application bundler), and have added configurations for babel, eslint, and prettier. For the back-end server, which runs on Node.js, I am using express.js and @google-cloud/vision.
The express.js server is set up to process the image file from the multipart form data it receives.
app.use('/', async (req, res) => {
  const { file } = req.body || {};
  const data = file ? await processImage(file) : [];
  res.send(data);
});
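The handler above assumes a middleware has already parsed the multipart body. As a minimal sketch of that wiring, assuming multer for the parsing and an 'image' field name (neither is confirmed by the original project), the server could look like this; note that multer exposes the uploaded file as req.file rather than on req.body:

const express = require('express');
const multer = require('multer');
const processImage = require('./processImage'); // assumed module path

const app = express();
// store uploads on disk so the file object has a .path for the Vision client
const upload = multer({ dest: 'uploads/' });

// 'image' is an assumed form field name for the uploaded file
app.post('/', upload.single('image'), async (req, res) => {
  const data = req.file ? await processImage(req.file) : [];
  res.send(data);
});

app.listen(3000, () => console.log('Listening on port 3000'));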
The image is then sent to be processed by the Google Cloud Vision API:
const vision = require('@google-cloud/vision');
const fs = require('fs');

const processImage = async imageFile => {
  const client = new vision.ImageAnnotatorClient();
  const fileName = imageFile.path;
  // run text detection (OCR) on the uploaded file
  const [result] = await client.textDetection(fileName);
  // cleanup uploaded file
  fs.unlinkSync(fileName);
  return result.textAnnotations;
};

module.exports = processImage;
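For illustration, the module could be exercised directly with any object that has a path property, the shape multer's file object takes (the module path and file path here are hypothetical):

const processImage = require('./processImage'); // assumed module path

processImage({ path: '/tmp/upload.jpg' }) // hypothetical uploaded file
  .then(annotations => console.log(`found ${annotations.length} annotations`))
  .catch(console.error);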
After the OCR engine has processed the text within the image and validated it against language models/dictionaries, it returns the data in the following JSON structure:
{ "textAnnotations": [ { "locations": [], "properties": [], "mid": "", "locale": "en", "description": "sponge cakes topped with a ....", "score": 0, "confidence": 0, "topicality": 0, "boundingPoly": { "vertices": [ { "x": 10, "y": 35 }, { "x": 1532, "y": 35 }, { "x": 1532, "y": 1193 }, { "x": 10, "y": 1193 } ], "normalizedVertices": [] } }, { "locations": [], "properties": [], "mid": "", "locale": "", "description": "sponge", "score": 0, "confidence": 0, "topicality": 0, "boundingPoly": { "vertices": [ { "x": 24, "y": 61 }, { "x": 193, "y": 59 }, { "x": 193, "y": 93 }, { "x": 25, "y": 96 } ], "normalizedVertices": [] } } } }
The first item in textAnnotations contains all the text found in the image; after that come the individual words, each with its boundingPoly vertices. The vertices are used to draw a box around a word when it is set to be highlighted.
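As a sketch of that filtering step, assuming a case-insensitive match and a wordsToFind array (both assumptions on my part):

// textAnnotations[0] is the full text, so skip it and match individual words
const findWords = (textAnnotations, wordsToFind) => {
  const words = wordsToFind.map(w => w.toLowerCase());
  return textAnnotations
    .slice(1)
    .filter(a => words.includes(a.description.toLowerCase()));
};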
The React.js app processes the uploaded image with the FileReader and Image APIs to render it on the page in a canvas element. Multiple words can be specified to be highlighted within the image. The results returned from the express.js app are filtered down to those that match the words to find, and for each word found the boundingPoly vertices are used to render a highlight box on the image, as in the sketch below.
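A rough sketch of that rendering step, with assumed element ids, helper names, and styling (not the app's actual code):

const highlightWords = (imageSrc, matches) => {
  const canvas = document.getElementById('preview'); // assumed canvas element
  const ctx = canvas.getContext('2d');
  const img = new Image();
  img.onload = () => {
    canvas.width = img.width;
    canvas.height = img.height;
    ctx.drawImage(img, 0, 0);
    ctx.strokeStyle = 'yellow';
    ctx.lineWidth = 3;
    // trace each word's boundingPoly to draw its highlight box
    matches.forEach(({ boundingPoly }) => {
      const [first, ...rest] = boundingPoly.vertices;
      ctx.beginPath();
      ctx.moveTo(first.x, first.y);
      rest.forEach(v => ctx.lineTo(v.x, v.y));
      ctx.closePath();
      ctx.stroke();
    });
  };
  img.src = imageSrc; // e.g. a data URL from FileReader.readAsDataURL
};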