Channel: PyImageSearch

Tesseract OCR: Text localization and detection

In this tutorial, you will learn how to utilize Tesseract to detect, localize, and OCR text, all within a single, efficient function call.

Back in September, I showed you how to use OpenCV to detect and OCR text. This method was a three-stage process:

  1. Use OpenCV’s EAST text detection model to detect the presence of text in an image
  2. Extract the text Region of Interest (ROI) from the image using basic image cropping/NumPy array slicing
  3. Take the text ROI, and then pass it into Tesseract to actually OCR the text

Our method worked quite well but was a bit complicated and less efficient due to the multistage process.

PyImageSearch reader Bryan wonders if there is a better, more streamlined way:

Hi Adrian,

I noticed that OpenCV uses the EAST text detection model. I assume text detection also exists inside Tesseract?

If so, is there any way we can utilize Tesseract to both detect the text and OCR it without having to call additional OpenCV functions?

You’re in luck, Bryan. Tesseract does have the ability to perform text detection and OCR in a single function call — and as you’ll find out, it’s quite easy to do!

To learn how to detect, localize, and OCR text with Tesseract, just keep reading.

Tesseract OCR: Text localization and detection

In the first part of this tutorial, we’ll discuss the concept of text detection and localization.

From there, I will show you how to install Tesseract on your system.

We’ll then implement text localization, detection, and OCR using Tesseract and Python.

Finally, we’ll review our results.

What is text localization and detection?

Text detection is the process of localizing where text appears in an image.

You can think of text detection as a specialized form of object detection.

In object detection, our goal is to (1) detect and compute the bounding box of all objects in an image and (2) determine the class label for each bounding box, similar to the image below:

Figure 1: Tesseract can be used for both text localization and text detection. Text localization can be thought of as a specialized form of object detection

In text detection, our goal is to automatically compute the bounding boxes for every region of text in an image:

Figure 2: Once text has been localized/detected in an image, we can decode it using OCR software. Tesseract can be used for text localization/detection as well as OCR.

Once we have those regions, we can then OCR them.

How to install pytesseract for Tesseract OCR

Figure 3: Installing Tesseract and pytesseract allows you to use Python code to perform text detection and OCR.

I have provided instructions for installing the Tesseract OCR engine as well as pytesseract (the Python bindings used to interface with Tesseract) in my blog post OpenCV OCR and text recognition with Tesseract.

Follow the instructions in the “How to install Tesseract 4” section of that tutorial, confirm your Tesseract install, and then come back here to learn how to detect and localize text with Tesseract.
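
If you would like to confirm the install from Python, one option is to look for the tesseract binary on your PATH and ask it for its version. This is only a minimal sketch using the standard library: the helper name find_tesseract is my own, and it assumes the binary is named tesseract and accepts the --version flag.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the first line of `tesseract --version`, or None if not found."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None

    proc = subprocess.run([exe, "--version"], capture_output=True, text=True)

    # check both streams, since the banner may not always land on stdout
    lines = (proc.stdout + proc.stderr).strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = find_tesseract()
    print(version if version else "tesseract was not found on your PATH")
```

If pytesseract is already installed, pytesseract.get_tesseract_version() should report the same information.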

Project structure

Go ahead and grab today’s .zip from the “Downloads” section of this blog post. Once you extract the files, you’ll be presented with an especially simple project layout:

% tree
.
├── apple_support.png
└── localize_text_tesseract.py

0 directories, 2 files

As you can see, we have only one Python script to review today — the localize_text_tesseract.py file.

We also have a single image to test our OCR script with. Feel free to test today’s code with other photos and graphics as well!

Implementing text localization, text detection, and OCR with Tesseract

We are now ready to implement text detection and localization with Tesseract.

Open up a new file, name it localize_text_tesseract.py, and let’s get to work:

Review localize_text_tesseract.py

# import the necessary packages
from pytesseract import Output
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image to be OCR'd")
ap.add_argument("-c", "--min-conf", type=int, default=0,
	help="minimum confidence value to filter weak text detection")
args = vars(ap.parse_args())

We begin by importing packages, namely pytesseract and OpenCV. Be sure to refer to the “How to install pytesseract for Tesseract OCR” section above for installation links.

Next, we parse two command line arguments:

  • --image: The path to the input image upon which we will perform OCR
  • --min-conf: A minimum confidence threshold used to filter out weak text detections. The default is 0, so every detection with a confidence greater than 0 is kept
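
One detail worth seeing in action: argparse stores --min-conf under the key min_conf (the dash becomes an underscore), which is why the script reads args["min_conf"] later on. The sketch below reuses the same two arguments but parses a hand-written argument list instead of the real command line, so the file name is only a placeholder.

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image to be OCR'd")
ap.add_argument("-c", "--min-conf", type=int, default=0,
    help="minimum confidence value to filter weak text detection")

# pass an explicit list rather than reading sys.argv
args = vars(ap.parse_args(["--image", "apple_support.png", "--min-conf", "50"]))
print(args)  # {'image': 'apple_support.png', 'min_conf': 50}

# omitting --min-conf falls back to the default of 0
defaults = vars(ap.parse_args(["-i", "apple_support.png"]))
print(defaults["min_conf"])  # 0
```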

Let’s go ahead and run our input --image through pytesseract next:

# load the input image, convert it from BGR to RGB channel ordering,
# and use Tesseract to localize each area of text in the input image
image = cv2.imread(args["image"])
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = pytesseract.image_to_data(rgb, output_type=Output.DICT)

Lines 17 and 18 load the input --image and swap color channel ordering from BGR (OpenCV’s default) to RGB (compatible with Tesseract and pytesseract).

Then we detect and localize text using Tesseract and the image_to_data function (Line 19). This function returns results, which we’ll now post-process:
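
It helps to picture what results looks like before we loop over it. With Output.DICT, image_to_data returns a dictionary of parallel lists, one list per column, where index i describes a single row. The dictionary below is a hand-written stand-in with made-up numbers (not real Tesseract output) and only a subset of the keys. As far as I know, the real dictionary also carries level, page_num, block_num, par_num, line_num, and word_num, and rows that describe a page, block, paragraph, or line rather than a word report a confidence of -1 with empty text.

```python
# hand-written stand-in for the dictionary returned by image_to_data;
# the numbers are illustrative, not real Tesseract output
results = {
    "level":  [1, 5, 5],
    "left":   [0, 40, 180],
    "top":    [0, 30, 30],
    "width":  [400, 120, 160],
    "height": [200, 40, 40],
    "conf":   [-1, 96, 96],
    "text":   ["", "Apple", "Support"],
}

# every key maps to a list of the same length, one entry per row
assert len({len(values) for values in results.values()}) == 1

# index i selects one row across all of the lists
rows = []
for i in range(len(results["text"])):
    rows.append({key: values[i] for key, values in results.items()})

# only the word-level rows carry text
words = [row["text"] for row in rows if row["text"].strip()]
print(words)  # ['Apple', 'Support']
```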

# loop over each of the individual text localizations
for i in range(0, len(results["text"])):
	# extract the bounding box coordinates of the text region from
	# the current result
	x = results["left"][i]
	y = results["top"][i]
	w = results["width"][i]
	h = results["height"][i]

	# extract the OCR text itself along with the confidence of the
	# text localization
	text = results["text"][i]
	conf = int(float(results["conf"][i]))

Looping over the text localizations (Line 22), we begin by extracting the bounding box coordinates (Lines 25-28).

To grab the OCR’d text itself, we extract the information contained within the results dictionary using the "text" key and index (Line 32). This is the recognized text string.

Similarly, Line 33 extracts the confidence of the text localization (the confidence of the detected text).

From here, we’ll filter out weak detections and annotate our image:

	# filter out weak confidence text localizations
	if conf > args["min_conf"]:
		# display the confidence and text to our terminal
		print("Confidence: {}".format(conf))
		print("Text: {}".format(text))
		print("")

		# strip out non-ASCII text so we can draw the text on the image
		# using OpenCV, then draw a bounding box around the text along
		# with the text itself
		text = "".join([c if ord(c) < 128 else "" for c in text]).strip()
		cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
		cv2.putText(image, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX,
			1.2, (0, 0, 255), 3)

# show the output image
cv2.imshow("Image", image)
cv2.waitKey(0)

Comparing confidence versus our --min-conf command line argument ensures that the confidence is sufficiently high (Line 36).

In our terminal, we print information for debugging/informational purposes, including both the confidence and text itself (Lines 38-40).

OpenCV’s cv2.putText function doesn’t support non-ASCII characters, so we need to strip any non-ASCII characters out. This is handled by Line 45, where we work with character ordinals (ord(c)) and keep only the characters whose ordinal is below 128. Refer to the ASCII chart on Wikipedia as needed.
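
To see what that filter does, here it is pulled out into a small function and applied to a sample string. The function name strip_non_ascii and the sample text are mine, purely for illustration.

```python
def strip_non_ascii(text):
    # keep only characters whose code point is below 128 (plain ASCII)
    return "".join([c if ord(c) < 128 else "" for c in text]).strip()


# "Caf\u00e9 \u2014 50%" contains an accented e and a long dash
sample = "Caf\u00e9 \u2014 50%"
print(repr(strip_non_ascii(sample)))  # 'Caf  50%'

# an equivalent approach: encode to ASCII and drop whatever does not fit
alt = sample.encode("ascii", errors="ignore").decode("ascii").strip()
print(alt == strip_non_ascii(sample))  # True
```

Notice that the characters are dropped rather than replaced, so the two spaces that surrounded the dash end up next to each other.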

With the special characters eliminated from our text, now we’ll annotate the output image. Line 46 draws a bounding box around the detected text, and Lines 47 and 48 draw the text itself just above the bounding box region.

Finally, using OpenCV’s imshow function, we display the result on our screen (Line 51). In order to keep the GUI window on the screen longer than a few milliseconds, the cv2.waitKey(0) call locks the GUI window as visible until any key is pressed.

Great job performing OCR with Tesseract and pytesseract.

Tesseract text localization, text detection, and OCR results

We are now ready to perform text detection and localization with Tesseract!

Make sure you use the “Downloads” section of this tutorial to download the source code and example image.

From there, open up a terminal, and execute the following command:

$ python localize_text_tesseract.py --image apple_support.png
Confidence: 26
Text: a

Confidence: 96
Text: Apple

Confidence: 96
Text: Support

Confidence: 96
Text: 1-800-275-2273

Figure 4: Using Tesseract to perform text detection and OCR with Python. Without a confidence threshold set, there is room for misidentified text regions, as is evident in the top-left of this graphic.

Here, you can see that Tesseract has detected all regions of text and OCR’d each text region. The results look good, but what is up with Tesseract thinking the leaf in the Apple logo is the letter “a”?

If you look at our terminal output, you’ll see that particular text region has low confidence.

We can improve our Tesseract text detection results simply by supplying a --min-conf value:

$ python localize_text_tesseract.py --image apple_support.png --min-conf 50
Confidence: 96
Text: Apple

Confidence: 96
Text: Support

Confidence: 96
Text: 1-800-275-2273

Figure 5: Using Tesseract to perform text detection and OCR with Python. By setting a confidence threshold, we are able to eliminate the false detection seen in Figure 4.

Here, we are filtering out any text detections and OCR results that have a confidence <= 50, and as our results show, the low quality text region has been filtered out.
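
If you want to reuse this filtering outside of the script, the same comparison can be wrapped in a small helper. This is a sketch rather than part of the downloadable code: filter_detections is a name I made up, the confidence and text values mirror the terminal output above while the box coordinates are placeholders, and skipping whitespace-only text is an extra step the script itself does not perform. The int(float(...)) conversion is there on the assumption that the confidence may arrive as an int, a float, or a numeric string depending on your Tesseract and pytesseract versions.

```python
def filter_detections(results, min_conf=0):
    """Return (text, conf, (x, y, w, h)) for rows with conf above min_conf."""
    kept = []

    for i in range(len(results["text"])):
        # tolerate ints, floats, and numeric strings such as "96" or "96.0"
        conf = int(float(results["conf"][i]))
        text = results["text"][i]

        # same test as the script, plus a check that skips empty text
        if conf > min_conf and text.strip():
            box = (results["left"][i], results["top"][i],
                   results["width"][i], results["height"][i])
            kept.append((text, conf, box))

    return kept


# conf/text follow the terminal output above; the boxes are placeholders
results = {
    "left": [10, 60, 200, 60],
    "top": [5, 30, 30, 90],
    "width": [20, 120, 160, 300],
    "height": [20, 40, 40, 40],
    "conf": ["26", "96", "96", "96"],
    "text": ["a", "Apple", "Support", "1-800-275-2273"],
}

print([text for text, _, _ in filter_detections(results, min_conf=50)])
# ['Apple', 'Support', '1-800-275-2273']
```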

When developing your own text detection and OCR applications with Tesseract, consider using the image_to_data function — it’s super easy to use and makes text localization a breeze.
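
Because image_to_data reports one row per word, a common follow-up is stitching words back into lines of text. Here is one way that might look, grouping on the block_num, par_num, and line_num columns. Treat it as a sketch under assumptions: words_to_lines is a hypothetical helper, the sample dictionary is hand-written (I am assuming “Apple” and “Support” share a line), and for multi-page input you would likely want page_num in the key as well.

```python
from collections import defaultdict


def words_to_lines(results, min_conf=0):
    """Join word-level rows into one string per (block, paragraph, line)."""
    lines = defaultdict(list)

    for i, word in enumerate(results["text"]):
        conf = int(float(results["conf"][i]))

        # skip empty rows and weak detections
        if not word.strip() or conf <= min_conf:
            continue

        key = (results["block_num"][i], results["par_num"][i],
               results["line_num"][i])
        lines[key].append(word)

    # sort by key so the lines come back in reading order
    return [" ".join(words) for _, words in sorted(lines.items())]


# hand-written sample; the layout numbers are assumptions for illustration
results = {
    "block_num": [1, 1, 1, 1],
    "par_num": [1, 1, 1, 1],
    "line_num": [1, 1, 1, 2],
    "conf": [26, 96, 96, 96],
    "text": ["a", "Apple", "Support", "1-800-275-2273"],
}

print(words_to_lines(results, min_conf=50))
# ['Apple Support', '1-800-275-2273']
```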

Summary

In this tutorial, you learned how to use Tesseract to detect text, localize it, and then OCR it.

The benefit of using Tesseract to perform text detection and OCR is that we can do so in just a single function call, making it easier than the multistage OpenCV OCR process.

That said, OCR is still an area of computer vision that is far from solved.

Whenever confronted with an OCR project, be sure to apply both methods and see which method gives you the best results — let your empirical results guide you.

The post Tesseract OCR: Text localization and detection appeared first on PyImageSearch.

