Text Detection and OCR with Microsoft Cognitive Services

This lesson is part of a 3-part series on Text Detection and OCR:

Text Detection and OCR with Amazon Rekognition API
Text Detection and OCR with Microsoft Cognitive Services (today’s tutorial)
Text Detection and OCR with Google Cloud Vision API

In this tutorial, you will:

Learn how to obtain your MCS API keys
Create a configuration file to store your subscription key and API endpoint URL
Implement a Python script to make calls to the MCS OCR API
Review the results of applying the MCS OCR API to sample images

To learn about text detection and OCR, just keep reading.

Looking for the source code to this post?

Text Detection and OCR with Microsoft Cognitive Services

In our previous tutorial, you learned how to use the Amazon Rekognition API to OCR images. The hardest part of using the Amazon Rekognition API was obtaining your API keys. However, once you had your API keys, it was smooth sailing.

This tutorial focuses on a different cloud-based API called Microsoft Cognitive Services (MCS), part of Microsoft Azure. Like Amazon Rekognition API, MCS is also capable of high OCR accuracy — but unfortunately, the implementation is slightly more complex (as is both Microsoft’s login and admin dashboard).

We prefer the Amazon Web Services (AWS) Rekognition API over MCS, both for the admin dashboard and the API itself. However, if you are already ingrained into the MCS/Azure ecosystem, you should consider staying there. The MCS API isn’t that hard to use (it’s just not as straightforward as Amazon Rekognition API).

Microsoft Cognitive Services for OCR

We’ll start this tutorial with a review of how you can obtain your MCS API keys. You will need these API keys to request the MCS API to OCR images.

Once we have our API keys, we’ll review our project directory structure and then implement a Python configuration file to store our subscription key and OCR API endpoint URL.

With our configuration file implemented, we’ll move on to creating a second Python script, this one acting as a driver script that:

Imports our configuration file
Loads an input image to disk
Packages the image into an API call
Makes a request to the MCS OCR API
Retrieves the results
Annotates our output image
Displays the OCR results to our screen and terminal

Let’s dive in!

Obtaining Your Microsoft Cognitive Services Keys

Before proceeding to the rest of the sections, be sure to obtain the API keys by following the instructions shown here.

Configuring Your Development Environment

To follow this guide, you need to have the OpenCV and Azure Computer Vision libraries installed on your system.

Luckily, both are pip-installable:

$ pip install opencv-contrib-python
$ pip install azure-cognitiveservices-vision-computervision

If you need help configuring your development environment for OpenCV, we highly recommend that you read our pip install OpenCV guide — it will have you up and running in a matter of minutes.

Having Problems Configuring Your Development Environment?

**Figure 1:** Having trouble configuring your dev environment? Want access to pre-configured Jupyter Notebooks running on Google Colab? Be sure to join PyImageSearch University — you’ll be up and running with this tutorial in a matter of minutes.

All that said, are you:

Short on time?
Learning on your employer’s administratively locked system?
Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
Ready to run the code right now on your Windows, macOS, or Linux system?

Then join PyImageSearch University today!

Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides that are pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.

And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!

Project Structure

We first need to review our project directory structure.

Start by accessing the “Downloads” section of this tutorial to retrieve the source code and example images.

The directory structure for our MCS OCR API is similar to the structure of the Amazon Rekognition API project in the previous tutorial:

|-- config
|   |-- __init__.py
│   |-- microsoft_cognitive_services.py
|-- images
|   |-- aircraft.png
|   |-- challenging.png
|   |-- park.png
|   |-- street_signs.png
|-- microsoft_ocr.py

Inside the config, we have our microsoft_cognitive_services.py file, which stores our subscription key and endpoint URL (i.e., the URL of the API we’re submitting our images to).

The microsoft_ocr.py script will take our subscription key and endpoint URL, connect to the API, submit the images in our images directory for OCR, and display our screen results.

Creating Our Configuration File

Ensure you have followed Obtaining Your Microsoft Cognitive Services Keys to obtain your subscription keys to the MCS API. From there, open the microsoft_cognitive_services.py file and update your SUBSCRPTION_KEY:

# define our Microsoft Cognitive Services subscription key
SUBSCRIPTION_KEY = "YOUR_SUBSCRIPTION_KEY"

# define the ACS endpoint
ENDPOINT_URL = "YOUR_ENDPOINT_URL"

You should replace the string "YOUR_SUBSCRPTION_KEY" with your subscription key obtained from Obtaining Your Microsoft Cognitive Services Keys.

Additionally, ensure you double-check your ENDPOINT_URL. At the time of this writing, the endpoint URL points to the most recent version of the MCS API; however, as Microsoft releases new API versions, this endpoint URL may change, so it’s worth double-checking.

Implementing the Microsoft Cognitive Services OCR Script

Now, let’s learn how to submit images for text detection and OCR to the MCS API.

Open the microsoft_ocr.py script in the project directory structure and insert the following code:

# import the necessary packages
from config import microsoft_cognitive_services as config
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
import argparse
import time
import sys
import cv2

Note on Line 2 that we import our microsoft_cognitive_services configuration to supply our subscription key and endpoint URL. Then, we’ll use the azure and msrest Python packages to send requests to the API.

Next, let’s define draw_ocr_results, a helper function used to annotate our output images with the OCR’d text:

def draw_ocr_results(image, text, pts, color=(0, 255, 0)):
	# unpack the points list
	topLeft = pts[0]
	topRight = pts[1]
	bottomRight = pts[2]
	bottomLeft = pts[3]

	# draw the bounding box of the detected text
	cv2.line(image, topLeft, topRight, color, 2)
	cv2.line(image, topRight, bottomRight, color, 2)
	cv2.line(image, bottomRight, bottomLeft, color, 2)
	cv2.line(image, bottomLeft, topLeft, color, 2)

	# draw the text itself
	cv2.putText(image, text, (topLeft[0], topLeft[1] - 10),
		cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2)

	# return the output image
	return image

Our draw_ocr_results function has four parameters:

image: The input image that we will draw on.
text: The OCR’d text.
pts: The top-left, top-right, bottom-right, and bottom-left (x, y)-coordinates of the text ROI
color: The BGR color we’re using to draw on the image

Lines 13-16 unpack our bounding box coordinates. From there, Lines 19-22 draw the bounding box surrounding the text in the image. We then draw the OCR’d text itself on Lines 25 and 26.

We wrap up this function by returning the output image to the calling function.

We can now parse our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image that we'll submit to Microsoft OCR")
args = vars(ap.parse_args())

# load the input image from disk, both in a byte array and OpenCV
# format
imageData = open(args["image"], "rb").read()
image = cv2.imread(args["image"])

We only need a single argument here, --image, which is the path to the input image on disk. We read this image from disk, both as a binary byte array (so we can submit it to the MCS API, and then again in OpenCV/NumPy format (so we can draw on/annotate it).

Let’s now construct a request to the MCS API:

# initialize the client with endpoint URL and subscription key
client = ComputerVisionClient(config.ENDPOINT_URL,
	CognitiveServicesCredentials(config.SUBSCRIPTION_KEY))

# call the API with the image and get the raw data, grab the operation
# location from the response, and grab the operation ID from the
# operation location
response = client.read_in_stream(imageData, raw=True)
operationLocation = response.headers["Operation-Location"]
operationID = operationLocation.split("/")[-1]

Lines 43 and 44 initialize the Azure Computer Vision client. Note that we are supplying our ENDPOINT_URL and SUBSCRIPTION_KEY here — now is a good time to go back to microsoft_cognitive_services.py and ensure you have correctly inserted your subscription key (otherwise, the request to the MCS API will fail).

We then submit the image for OCR to the MCS API on Lines 49-51.

We now have to wait and poll for results from the MCS API:

# continue to poll the Cognitive Services API for a response until
# we get a response
while True:
	# get the result
	results = client.get_read_result(operationID)

	# check if the status is not "not started" or "running", if so,
	# stop the polling operation
	if results.status.lower() not in ["notstarted", "running"]:
		break
	
	# sleep for a bit before we make another request to the API
	time.sleep(10)

# check to see if the request succeeded
if results.status == OperationStatusCodes.succeeded:
	print("[INFO] Microsoft Cognitive Services API request succeeded...")

# if the request failed, show an error message and exit
else:
	print("[INFO] Microsoft Cognitive Services API request failed")
	print("[INFO] Attempting to gracefully exit")
	sys.exit(0)

I’ll be honest — polling for results is not my favorite way to work with an API. It requires more code, it’s a bit more tedious, and it can be potentially error-prone if the programmer isn’t careful to break out of the loop properly.

Of course, there are pros to this approach, including maintaining a connection, submitting larger chunks of data, and having results returned in batches rather than all at once.

Regardless, this is how Microsoft has implemented its API, so we must play by their rules.

Line 55 starts a while loop that continuously checks for responses from the MCS API (Line 57).

If we do not find the status in the ["notstarted", "running"] list, we can safely break from the loop and process our results (Lines 61 and 62).

If the above condition is not met, we sleep for a small amount of time and then poll again (Line 65).

Line 68 checks if the request has succeeded, if it has then we can safely continue to process our results. Otherwise, if the request hasn’t succeeded, then we have no OCR results to show (since the image could not be processed), and then we exit gracefully from our script (Lines 72-75).

Provided our OCR request succeeded, let’s now process the results:

# make a copy of the input image for final output
final = image.copy()

# loop over the results
for result in results.analyze_result.read_results:
	# loop over the lines
	for line in result.lines:
		# extract the OCR'd line from Microsoft's API and unpack the
		# bounding box coordinates
		text = line.text
		box = list(map(int, line.bounding_box))
		(tlX, tlY, trX, trY, brX, brY, blX, blY) = box
		pts = ((tlX, tlY), (trX, trY), (brX, brY), (blX, blY))

		# draw the output OCR line-by-line
		output = image.copy()
		output = draw_ocr_results(output, text, pts)
		final = draw_ocr_results(final, text, pts)

		# show the output OCR'd line
		print(text)
		cv2.imshow("Output", output)

# show the final output image
cv2.imshow("Final Output", final)
cv2.waitKey(0)

Line 78 initializes our final output image with all text drawn.

We start looping through all lines of OCR’d text on Line 81. On Line 83, we start looping through all lines of the OCR’d text. We extract the OCR’d text and bounding box coordinates for each line, then construct a list of the top-left, top-right, bottom-right, and bottom-left corners, respectively (Lines 86-89).

We then draw the OCR’d text line-by-line on the output and final image (Lines 92-94). We display the current line of text on our screen and terminal (Lines 97 and 98) — the final output image, with all OCR’d text drawn on it, is displayed on Lines 101 and 102.

Microsoft Cognitive Services OCR Results

Let’s now put the MCS OCR API to work for us. Open a terminal and execute the following command:

$ python microsoft_ocr.py --image images/aircraft.png
[INFO] making request to Microsoft Cognitive Services API...
WARNING!
LOW FLYING AND DEPARTING AIRCRAFT
BLAST CAN CAUSE PHYSICAL INJURY

Figure 2 shows the output of applying the MCS OCR API to our aircraft warning sign. If you recall, this is the same image we used in a previous tutorial when applying the Amazon Rekognition API. Therefore, I included the same image in this tutorial to demonstrate that the MCS OCR API can correctly OCR this image.

**Figure 2:** Image of a plane flying over a warning sign. The OCR results displayed on the sign in green.

Let’s try a different image, this one containing several challenging pieces of text:

$ python microsoft_ocr.py --image images/challenging.png
[INFO] making request to Microsoft Cognitive Services API...

LITTER
EMERGENCY
First
Eastern National
Bus Times
STOP

Figure 3 shows the results of applying the MCS OCR API to our input image — and as we can see, MCS does a great job OCR’ing the image.

**Figure 3:** *Left:* Sample text from the First Eastern National bus timetable. The sample text is a tough image to OCR due to the low image quality and glossy print. Still, Microsoft’s OCR API can correctly OCR it! *Middle:* The Microsoft OCR API can correctly OCR the *“Emergency Stop”* text. *Right:* A trash can with the text “*Litter*.” We’re able to OCR the text, but the text at the bottom of the trash can is unreadable, even to the human eye.

On the left, we have a sample image from the First Eastern National bus timetable (i.e., the schedule of when a bus will arrive). The document is printed with a glossy finish (likely to prevent water damage). Still, the image has a significant reflection due to the gloss, particularly in the “Bus Times” text. Still, the MCS OCR API can correctly OCR the image.

In the middle, the “Emergency Stop” text is highly pixelated and low-quality, but that doesn’t phase the MCS OCR API! It’s able to correctly OCR the image.

Finally, the right shows a trash can with the text “Litter.” The text is tiny, and due to the low-quality image, it is challenging to read without squinting a bit. That said, the MCS OCR API can still OCR the text (although the text at the bottom of the trash can is illegible — neither human nor API could read that text).

The next sample image contains a national park sign shown in Figure 4:

**Figure 4:** OCR’ing a park sign using the Microsoft Cognitive Services OCR API. Notice that API can give us *rotated* text bounding boxes along with the OCR’d text itself.

The MCS OCR API can OCR each sign line-by-line (Figure 4). We can also compute rotated text bounding box/polygons for each line.

$ python microsoft_ocr.py --image images/park.png
[INFO] making request to Microsoft Cognitive Services API...

PLEASE TAKE
NOTHING BUT
PICTURES
LEAVE NOTHING
BUT FOOT PRINTS

The final example we have contains traffic signs:

$ python microsoft_ocr.py --image images/street_signs.png
[INFO] making request to Microsoft Cognitive Services API...

Old Town Rd
STOP
ALL WAY

Figure 5 shows that we can correctly OCR each piece of text on both the stop sign and street name sign.

**Figure 5:** The Microsoft Cognitive Services OCR API can detect the text on traffic signs.

What's next? I recommend PyImageSearch University.

Course information:
35+ total classes • 39h 44m video • Last updated: February 2022
★★★★★ 4.84 (128 Ratings) • 3,000+ Students Enrolled

I strongly believe that if you had the right teacher you could master computer vision and deep learning.

Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?

That’s not the case.

All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.

If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.

Inside PyImageSearch University you'll find:

&check; 35+ courses on essential computer vision, deep learning, and OpenCV topics
&check; 35+ Certificates of Completion
&check; 39h 44m on-demand video
&check; Brand new courses released every month, ensuring you can keep up with state-of-the-art techniques
&check; Pre-configured Jupyter Notebooks in Google Colab
&check; Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
&check; Access to centralized code repos for all 500+ tutorials on PyImageSearch
&check; Easy one-click downloads for code, datasets, pre-trained models, etc.
&check; Access on mobile, laptop, desktop, etc.

Click here to join PyImageSearch University

Summary

In this tutorial, you learned about Microsoft Cognitive Services (MCS) OCR API. Despite being slightly harder to implement and use than the Amazon Rekognition API, the Microsoft Cognitive Services OCR API demonstrated that it’s quite robust and able to OCR text in many situations, including low-quality images.

When working with low-quality images, the MCS API shined. Typically, I recommend you programmatically detect and discard low-quality images (as we did in a previous tutorial). However, if you find yourself in a situation where you have to work with low-quality images, it may be worth your while to use the Microsoft Azure Cognitive Services OCR API.

Citation Information

Rosebrock, A. “Text Detection and OCR with Microsoft Cognitive Services,” PyImageSearch, D. Chakraborty, P. Chugh, A. R. Gosthipaty, J. Haase, S. Huot, K. Kidriavsteva, R. Raha, and A. Thanki, eds., 2022, https://pyimg.co/0r4mt

@incollection{Rosebrock_2022_OCR_MCS,
  author = {Adrian Rosebrock},
  title = {Text Detection and {OCR} with {M}icrosoft Cognitive Services},
  booktitle = {PyImageSearch},
  editor = {Devjyoti Chakraborty and Puneet Chugh and Aritra Roy Gosthipaty and Jon Haase and Susan Huot and Kseniia Kidriavsteva and Ritwik Raha and Abhishek Thanki},
  year = {2022},
  note = {https://pyimg.co/0r4mt},
}

Want free GPU credits to train models?

We used Jarvislabs.ai, a GPU cloud, for all the experiments.
We are proud to offer PyImageSearch University students $20 worth of Jarvislabs.ai GPU cloud credits. Join PyImageSearch University and claim your $20 credit here.

In Deep Learning, we need to train Neural Networks. These Neural Networks can be trained on a CPU but take a lot of time. Moreover, sometimes these networks do not even fit (run) on a CPU.

To overcome this problem, we use GPUs. The problem is these GPUs are expensive and become outdated quickly.

GPUs are great because they take your Neural Network and train it quickly. The problem is that GPUs are expensive, so you don’t want to buy one and use it only occasionally. Cloud GPUs let you use a GPU and only pay for the time you are running the GPU. It’s a brilliant idea that saves you money.

JarvisLabs provides the best-in-class GPUs, and PyImageSearch University students get between 10 - 50 hours on a world-class GPU (time depends on the specific GPU you select).

This gives you a chance to test-drive a monstrously powerful GPU on any of our tutorials in a jiffy. So join PyImageSearch University today and try for yourself.

Click here to get Jarvislabs credits now

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), simply enter your email address in the form below!

Download the Source Code and FREE 17-page Resource Guide

Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!

The post Text Detection and OCR with Microsoft Cognitive Services appeared first on PyImageSearch.

Text Detection and OCR with Microsoft Cognitive Services

Table of Contents

Text Detection and OCR with Microsoft Cognitive Services

Looking for the source code to this post?

Text Detection and OCR with Microsoft Cognitive Services

Microsoft Cognitive Services for OCR

Obtaining Your Microsoft Cognitive Services Keys

Configuring Your Development Environment

Having Problems Configuring Your Development Environment?

Project Structure

Creating Our Configuration File

Implementing the Microsoft Cognitive Services OCR Script

Microsoft Cognitive Services OCR Results

What's next? I recommend PyImageSearch University.

Summary

Citation Information

Want free GPU credits to train models?

Download the Source Code and FREE 17-page Resource Guide

Trending Articles

Bath man appears in court charged with attempted murder of a man...

MACLEAN, Allan

Black Angus Grilled Artichokes

Practice Sheet of Right form of verbs for HSC Students

Police blotter for Jan. 12

99 God Status for Whatsapp, Facebook

Rajasthan Board 12th Science Result 2018 name wise- RBSE 12th commerce result...

Notorious Naushad of Ippa gang nabbed

Child Kidnapping: Amy McNeil was kidnapped on her way to school by 5 adults;...

Sonible Smartlimit v1.1.5-R2R

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

मतलबी दोस्त स्टेट्स | Matlabi Dost Status in Hindi – Selfish Friends Status

Arrow Flash 2 – Sinhala Dubbed – Episode 23 – 20th March 2016

[GET] AI Traffic Goldmine

[E² Plugin] HDF-Radio

Universal Multi-Patch v1.3 By RADIXX11

IWAN – Thanks and Praise ( Throw Back Thursday )

RONALD P SONDERGAARD Arrested by Miami-Dade County Corrections on Mar 03, 2017

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

HSSC Excise & Taxation Inspector Result 2017 Scorecard/ Category Wise Merit List