Text Detection and OCR with Google Cloud Vision API

In this lesson, you will:

Learn how to obtain your Google Cloud Vision API keys/JSON configuration file from the Google Cloud admin panel
Configure your development environment for use with the Google Cloud Vision API
Implement a Python script used to make requests to the Google Cloud Vision API

This lesson is the last part of a 3-part series on Text Detection and OCR:

Text Detection and OCR with Amazon Rekognition API
Text Detection and OCR with Microsoft Cognitive Services
Text Detection and OCR with Google Cloud Vision API (this tutorial)

To learn about text detection and OCR with Google Cloud Vision API, just keep reading.


Text Detection and OCR with Google Cloud Vision API

In today’s lesson, we will look at the Google Cloud Vision API. In terms of code, the Google Cloud Vision API is easy to use. However, it requires that we use Google’s admin panel to generate a client JavaScript Object Notation (JSON) file that contains all the necessary information to access the Vision API.

We have mixed feelings about the JSON file. On the one hand, it’s nice not to have to hardcode our private and public keys. But on the other hand, it’s cumbersome to have to use the admin panel to generate the JSON file itself.

Realistically, it’s a situation of “six of one, half a dozen of the other.” It doesn’t make that much of a difference (just something to be aware of).

And as we’ll find out, the Google Cloud Vision API, just like the others, tends to be quite accurate and does a good job OCR’ing complex images.

Let’s dive in!

Google Cloud Vision API for OCR

In the first part of this lesson, you’ll learn about the Google Cloud Vision API and how to obtain your API keys and generate your JSON configuration file for authentication with the API.

From there, we’ll make sure your development environment is correctly configured with the Python packages required to interface with the Google Cloud Vision API.

We’ll then implement a Python script that takes an input image, packages it within an API request, and sends it to the Google Cloud Vision API for OCR.

We’ll wrap up this lesson with a discussion of our results.

Obtaining Your Google Cloud Vision API Keys

Prerequisite

A Google Cloud account with billing enabled is all you’ll need to use the Google Cloud Vision API. Google Cloud’s documentation explains how to modify your billing settings.

Steps to Enable Google Cloud Vision API and Download Credentials

You can find our guide to getting your keys in our book, OCR with OpenCV, Tesseract, and Python.

Configuring Your Development Environment for the Google Cloud Vision API

To follow this guide, you need to have the OpenCV library and the google-cloud-vision Python package installed on your system.

Luckily, both are pip-installable:

$ pip install opencv-contrib-python
$ pip install --upgrade google-cloud-vision

If you are using a Python virtual environment or the Anaconda package manager, be sure to activate your Python environment before running the above pip install commands. Otherwise, the packages will be installed in your system Python rather than in your Python environment.

If you need help configuring your development environment for OpenCV, we highly recommend that you read our pip install OpenCV guide — it will have you up and running in a matter of minutes.

Having Problems Configuring Your Development Environment?

**Figure 1:** Having trouble configuring your dev environment? Want access to pre-configured Jupyter Notebooks running on Google Colab? Be sure to join PyImageSearch University — you’ll be up and running with this tutorial in a matter of minutes.

All that said, are you:

Short on time?
Learning on your employer’s administratively locked system?
Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
Ready to run the code right now on your Windows, macOS, or Linux system?

Then join PyImageSearch University today!

Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides that are pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.

And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!

Project Structure

Start by accessing the “Downloads” section of this tutorial to retrieve the source code and example images.

Let’s inspect the project directory structure for our Google Cloud Vision API OCR project:

|-- images
|   |-- aircraft.png
|   |-- challenging.png
|   |-- street_signs.png
|-- client_id.json
|-- google_ocr.py

We will apply our google_ocr.py script to several examples in the images directory.

The client_id.json file provides all necessary credentials and authentication information. The google_ocr.py script will load this file and supply it to the Google Cloud Vision API to perform OCR.
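Because pointing the script at a malformed file, or at the wrong kind of JSON credentials, is an easy mistake to make, a quick sanity check before calling the API can save some debugging. The sketch below is a minimal illustration, assuming client_id.json is a service account key downloaded from the Google Cloud console. The helper names check_service_account_info and check_service_account_file are hypothetical (they are not part of google-auth), and the list of expected fields is our own assumption rather than an official validation rule.

```python
import json

# Fields that a service account key downloaded from the Google Cloud console
# normally contains. Treating them as "required" is our own assumption for a
# quick sanity check, not an official validation rule.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def check_service_account_info(info):
    """Return a list of problems found in a parsed key file (empty if none)."""
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    for field in EXPECTED_FIELDS:
        if not info.get(field):
            problems.append("missing or empty field: " + field)

    # a service account key should declare its type as "service_account"
    key_type = info.get("type")
    if key_type and key_type != "service_account":
        problems.append("unexpected key type: " + repr(key_type))
    return problems


def check_service_account_file(path):
    """Load the JSON file at `path` and sanity-check its contents."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            info = json.load(f)
    except (OSError, ValueError) as exc:
        # OSError: missing/unreadable file; ValueError: bad JSON or encoding
        return ["could not read JSON: " + str(exc)]
    return check_service_account_info(info)
```

For example, check_service_account_file("client_id.json") returns an empty list when the file looks like a service account key, or a list of human-readable problems otherwise. The actual authentication is still performed by service_account.Credentials.from_service_account_file in google_ocr.py.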

Implementing the Google Cloud Vision API Script

With our project directory structure reviewed, we can move on to implementing google_ocr.py, the Python script responsible for:

Loading the contents of our client_id.json file
Connecting to the Google Cloud Vision API
Loading and submitting our input image to the API
Retrieving the text detection and OCR results
Drawing and displaying the OCR’d text to our screen

Let’s dive in:

# import the necessary packages
from google.oauth2 import service_account
from google.cloud import vision
import argparse
import cv2
import io

Lines 2-6 import our required Python packages. Note that we need the service_account module to load our credentials and authenticate with the Google Cloud Vision API, while the vision package provides the ImageAnnotatorClient, whose text_detection method is responsible for OCR.

Next, we have draw_ocr_results, a convenience function used to annotate our output image:

def draw_ocr_results(image, text, rect, color=(0, 255, 0)):
	# unpack the bounding box rectangle and draw a bounding box
	# surrounding the text along with the OCR'd text itself
	(startX, startY, endX, endY) = rect
	cv2.rectangle(image, (startX, startY), (endX, endY), color, 2)
	cv2.putText(image, text, (startX, startY - 10),
		cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2)

	# return the output image
	return image

The draw_ocr_results function accepts four parameters:

image: The input image we are drawing on
text: The OCR’d text
rect: The bounding box coordinates of the text ROI
color: The color of the drawn bounding box and text

Line 11 unpacks the (x, y)-coordinates of our text ROI. We use these coordinates to draw a bounding box surrounding the text along with the OCR’d text itself (Lines 12-14).

We then return the image to the calling function.

Let’s examine our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image that we'll submit to Google Vision API")
ap.add_argument("-c", "--client", required=True,
	help="path to input client ID JSON configuration file")
args = vars(ap.parse_args())

We have two command line arguments here:

--image: The path to the input image that we’ll be submitting to the Google Cloud Vision API for OCR.
--client: The client ID JSON file containing our authentication information (be sure to follow the Obtaining Your Google Cloud Vision API Keys section to generate this JSON file).

It’s time to connect to the Google Cloud Vision API:

# create the client interface to access the Google Cloud Vision API
credentials = service_account.Credentials.from_service_account_file(
	filename=args["client"],
	scopes=["https://www.googleapis.com/auth/cloud-platform"])
client = vision.ImageAnnotatorClient(credentials=credentials)

# load the input image as a raw binary file (this file will be
# submitted to the Google Cloud Vision API)
with io.open(args["image"], "rb") as f:
	byteImage = f.read()

Lines 28-30 load our service account credentials from the --client JSON authentication file on disk. Line 31 then uses these credentials to create our ImageAnnotatorClient, which we use for all image processing/computer vision operations.

We then load our input --image from disk as a byte array (byteImage) to submit it to Google Cloud Vision API.

Let’s submit our byteImage to the API now:

# create an image object from the binary file and then make a request
# to the Google Cloud Vision API to OCR the input image
print("[INFO] making request to Google Cloud Vision API...")
image = vision.Image(content=byteImage)
response = client.text_detection(image=image)

# check to see if there was an error when making a request to the API
if response.error.message:
	raise Exception(
		"{}\nFor more info on errors, check:\n"
		"https://cloud.google.com/apis/design/errors".format(
			response.error.message))

Line 41 wraps the raw image bytes in a vision.Image object, which is then submitted to the client’s text_detection method (Line 42).
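For comparison, the sketch below shows roughly what the same OCR request looks like as a JSON body for the Vision REST endpoint (POST https://vision.googleapis.com/v1/images:annotate): the image bytes are base64-encoded and the TEXT_DETECTION feature is requested explicitly. The helper build_text_detection_request is hypothetical and only builds the payload; it neither authenticates nor sends anything. The google-cloud-vision client constructs an equivalent request for us (possibly over gRPC rather than in this JSON form), so this is purely illustrative.

```python
import base64
import json


def build_text_detection_request(image_bytes):
    """Build a JSON body for the Vision REST endpoint images:annotate.

    Hypothetical helper for illustration only: it neither authenticates
    nor sends the request.
    """
    # the REST API expects raw image bytes as a base64-encoded string
    content = base64.b64encode(image_bytes).decode("ascii")
    body = {
        "requests": [
            {
                "image": {"content": content},
                # TEXT_DETECTION is the feature behind client.text_detection
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }
    return json.dumps(body)
```

Passing it the same byteImage that google_ocr.py reads from disk yields a string that could be POSTed to the endpoint with appropriate authentication; doing so is outside the scope of this lesson.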

Lines 45-49 check whether the API reported an error while OCR’ing our input image; if so, we raise an exception, which stops the script.

Otherwise, we can process the results of the OCR step:

# read the image again, this time in OpenCV format and make a copy of
# the input image for final output
image = cv2.imread(args["image"])
final = image.copy()

# loop over the Google Cloud Vision API OCR results
for text in response.text_annotations[1::]:
	# grab the OCR'd text and extract the bounding box coordinates of
	# the text region
	ocr = text.description
	startX = text.bounding_poly.vertices[0].x
	startY = text.bounding_poly.vertices[0].y
	endX = text.bounding_poly.vertices[1].x
	endY = text.bounding_poly.vertices[2].y

	# construct a bounding box rectangle from the box coordinates
	rect = (startX, startY, endX, endY)

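Line 53 loads our input image from disk in OpenCV/NumPy array format (so that we can draw on it), and Line 54 makes a copy of it (final) that will accumulate every annotation.

Line 57 loops over the OCR results in the Google Cloud Vision API response. The first entry in text_annotations holds the full block of detected text, so we skip it with the [1::] slice and loop over the individual words instead. Line 60 extracts the ocr text itself, while Lines 61-64 extract the text region’s bounding box coordinates: for upright text, vertex 0 is the top-left corner, vertex 1 the top-right, and vertex 2 the bottom-right. Line 67 then constructs a rectangle (rect) from these coordinates.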
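The script takes the corners directly from vertices[0], vertices[1], and vertices[2], which works well for upright text. If the text is rotated, however, the vertices may not start at the top-left corner, and the resulting rect can come out inverted. One possible alternative, sketched below under that assumption, is to take the minimum and maximum over all four vertices. polygon_to_rect is a hypothetical helper, not part of the original script.

```python
def polygon_to_rect(vertices):
    """Convert (x, y) polygon vertices into (startX, startY, endX, endY).

    Taking the min/max over every vertex yields the smallest upright box that
    encloses the polygon, regardless of the order the vertices come in.
    """
    xs = [x for (x, _) in vertices]
    ys = [y for (_, y) in vertices]
    return (min(xs), min(ys), max(xs), max(ys))
```

Inside the loop, it could be called as rect = polygon_to_rect([(v.x, v.y) for v in text.bounding_poly.vertices]). Keep in mind that for rotated text this returns the upright box enclosing the text, which is larger than the text region itself.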

The final step is to draw the OCR results on the output and final images:

	# draw the output OCR line-by-line
	output = image.copy()
	output = draw_ocr_results(output, ocr, rect)
	final = draw_ocr_results(final, ocr, rect)

	# show the output OCR'd line
	print(ocr)
	cv2.imshow("Output", output)
	cv2.waitKey(0)

# show the final output image
cv2.imshow("Final Output", final)
cv2.waitKey(0)

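Lines 70-72 draw the current piece of OCR’d text on a fresh copy of the input image (output) and accumulate it on final. Lines 75-77 then print the text to our terminal and display it on our screen. The final image, with all OCR’d text, is displayed on Lines 80 and 81.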

Google Cloud Vision API OCR Results

Let’s now put the Google Cloud Vision API to work! Open a terminal and execute the following command:

$ python google_ocr.py --image images/aircraft.png --client client_id.json
[INFO] making request to Google Cloud Vision API...
WARNING!
LOW
FLYING
AND
DEPARTING
AIRCRAFT
BLAST
CAN
CAUSE
PHYSICAL
INJURY

Figure 2 shows the results of applying the Google Cloud Vision API to our aircraft image, the same image we have used to benchmark OCR performance across all three cloud services. Like the Amazon Rekognition API and Microsoft Cognitive Services, the Google Cloud Vision API correctly OCRs the image.

**Figure 2:** OCR’ing a warning sign word-by-word using the Google Cloud Vision API.

Let’s try a more challenging image, which you can see in Figure 3:

$ python google_ocr.py --image images/challenging.png --client client_id.json
[INFO] making request to Google Cloud Vision API...
LITTER
First
Eastern
National
Bus
Fimes
EMERGENCY
STOP

**Figure 3:** OCR’ing a more challenging example with the Google Cloud Vision API. The OCR results are nearly 100% correct; the only exception is that the API misreads the *“T”* in *“Times”* as an *“F.”*

Just like the Microsoft Cognitive Services API, the Google Cloud Vision API performs well on our challenging, low-quality image with pixelation and low readability (even by human standards, let alone a machine). The results are in Figure 3.

Interestingly, the Google Cloud Vision API does make one mistake: it reads the “T” in “Times” as an “F.”
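To put a number on “nearly 100% correct,” one common option (not used in the original lesson) is the character error rate: the Levenshtein edit distance between the OCR’d text and the ground truth, divided by the length of the ground truth. The sketch below is a minimal pure-Python version; levenshtein and char_error_rate are hypothetical helper names.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string `a` into string `b`."""
    # at the start of iteration i, previous[j] is the distance
    # between a[:i-1] and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,           # deletion
                               current[j - 1] + 1,        # insertion
                               previous[j - 1] + cost))   # substitution
        previous = current
    return previous[-1]


def char_error_rate(predicted, truth):
    # edit distance normalized by the length of the ground-truth string
    return levenshtein(predicted, truth) / max(len(truth), 1)
```

For the misread word above, levenshtein("Fimes", "Times") is 1 (a single substitution), so char_error_rate("Fimes", "Times") is 1 / 5 = 0.2 for that word.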

Let’s look at one final image, this one of a street sign:

$ python google_ocr.py --image images/street_signs.png --client client_id.json
[INFO] making request to Google Cloud Vision API...
Old
Town
Rd
STOP
ALL
WAY

Figure 4 displays the output of applying the Google Cloud Vision API to our street sign image. The Microsoft Cognitive Services API OCRs this image line-by-line, so “Old Town Rd” and “All Way” are each OCR’d as a single line. In contrast, the Google Cloud Vision API OCRs the text word-by-word (the default setting in the Google Cloud Vision API).

**Figure 4:** The Google Cloud Vision API OCRs our street signs but, by default, returns the results word-by-word.
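If line-level output is preferred, one option is to group the word-level results back into lines. The sketch below is a simple heuristic, assuming upright text and word boxes given as (text, (startX, startY, endX, endY)) pairs like the (ocr, rect) values built in google_ocr.py: words whose vertical centers are close are treated as one line and then ordered left to right. group_words_into_lines and its tolerance parameter are hypothetical and would need tuning; the heuristic will not handle rotated or curved text. Alternatively, the first entry in response.text_annotations, which our script skips, holds the full detected text and may be a better starting point.

```python
def _center_y(word):
    # vertical center of a word's (startX, startY, endX, endY) box
    (_, startY, _, endY) = word[1]
    return (startY + endY) / 2.0


def group_words_into_lines(words, tolerance=0.5):
    """Group (text, (startX, startY, endX, endY)) words into text lines.

    A word joins the current line when its vertical center lies within
    `tolerance` times the height of that line's first word; otherwise it
    starts a new line. Each line is then ordered left to right.
    """
    lines = []
    # visiting words top to bottom keeps words on the same line adjacent
    for word in sorted(words, key=_center_y):
        if lines:
            first = lines[-1][0]
            height = max(first[1][3] - first[1][1], 1)
            if abs(_center_y(word) - _center_y(first)) < tolerance * height:
                lines[-1].append(word)
                continue
        lines.append([word])

    # order each line's words by startX and join their text
    return [" ".join(text for (text, _) in sorted(line, key=lambda w: w[1][0]))
            for line in lines]
```

The tolerance is relative to the height of each line’s first word, so larger text (such as a big “STOP”) gets a proportionally larger grouping window.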

What's next? I recommend PyImageSearch University.

Course information:
35+ total classes • 39h 44m video • Last updated: February 2022
★★★★★ 4.84 (128 Ratings) • 3,000+ Students Enrolled

I strongly believe that if you had the right teacher you could master computer vision and deep learning.

Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?

That’s not the case.

All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.

If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.

Inside PyImageSearch University you'll find:

✓ 35+ courses on essential computer vision, deep learning, and OpenCV topics
✓ 35+ Certificates of Completion
✓ 39h 44m on-demand video
✓ Brand new courses released every month, ensuring you can keep up with state-of-the-art techniques
✓ Pre-configured Jupyter Notebooks in Google Colab
✓ Run all code examples in your web browser; works on Windows, macOS, and Linux (no dev environment configuration required!)
✓ Access to centralized code repos for all 500+ tutorials on PyImageSearch
✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
✓ Access on mobile, laptop, desktop, etc.


Summary

In this lesson, you learned how to use the cloud-based Google Cloud Vision API for OCR. Like the other cloud-based OCR APIs we’ve covered in this series, the Google Cloud Vision API can obtain high OCR accuracy with little effort. The downside, of course, is that you need an internet connection to leverage the API.

When choosing a cloud-based API, I wouldn’t focus on the amount of Python code required to interface with the API. Instead, consider the overall ecosystem of the cloud platform you are using.

Suppose you’re building an application that uses Amazon Simple Storage Service (Amazon S3) for data storage. In that case, it makes a lot more sense to use the Amazon Rekognition API, since it lets you keep everything under the Amazon umbrella.

On the other hand, if you are using Google Cloud Platform (GCP) instances to train deep learning models in the cloud, it makes more sense to use the Google Cloud Vision API.

These are all design and architectural decisions you’ll make when building your application. If you’re just “testing the waters” with each of these APIs, you are not bound by these considerations. However, if you’re developing a production-level application, then it’s well worth your time to consider the trade-offs of each cloud service. Look beyond OCR accuracy and consider the compute, storage, and other services that each cloud platform offers.

Citation Information

Rosebrock, A. “Text Detection and OCR with Google Cloud Vision API,” PyImageSearch, D. Chakraborty, P. Chugh, A. R. Gosthipaty, J. Haase, S. Huot, K. Kidriavsteva, R. Raha, and A. Thanki, eds., 2022, https://pyimg.co/evzxr

@incollection{Rosebrock_2022_OCR_GCV,
  author = {Adrian Rosebrock},
  title = {Text Detection and {OCR} with {G}oogle Cloud Vision {API}},
  booktitle = {PyImageSearch},
  editor = {Devjyoti Chakraborty and Puneet Chugh and Aritra Roy Gosthipaty and Jon Haase and Susan Huot and Kseniia Kidriavsteva and Ritwik Raha and Abhishek Thanki},
  year = {2022},
  note = {https://pyimg.co/evzxr},
}

Want free GPU credits to train models?

We used Jarvislabs.ai, a GPU cloud, for all the experiments.
We are proud to offer PyImageSearch University students $20 worth of Jarvislabs.ai GPU cloud credits. Join PyImageSearch University and claim your $20 credit here.

In deep learning, we need to train neural networks. These networks can be trained on a CPU, but training takes a lot of time, and sometimes a network does not even fit (run) on a CPU.

To overcome this problem, we use GPUs, which train neural networks quickly. The problem is that GPUs are expensive and become outdated quickly, so you don’t want to buy one and use it only occasionally. Cloud GPUs let you use a GPU and pay only for the time you are running it. It’s a brilliant idea that saves you money.

JarvisLabs provides best-in-class GPUs, and PyImageSearch University students get between 10 and 50 hours on a world-class GPU (time depends on the specific GPU you select).

This gives you a chance to test-drive a monstrously powerful GPU on any of our tutorials in a jiffy. So join PyImageSearch University today and try for yourself.



