Handwritten text detection from an image file using Google Vision API in Python

Mayur_Gade
5 min read · Mar 28, 2021

An implementation of the Google Vision API in Python to detect handwritten text in an image file. #Google Cloud #Google Vision API #Python #Text Extraction #OCR

Introduction to Google Vision API

Google Cloud offers two computer vision products that use machine learning to help you understand your images with industry-leading prediction accuracy.

1. AutoML Vision

Automate the training of your own custom machine learning models. Simply upload images and train custom image models with AutoML Vision’s easy-to-use graphical interface to optimize your models for accuracy, latency, and size and export them to your application in the cloud, or to an array of devices at the edge.

2. Vision API

Google Cloud’s Vision API offers powerful pre-trained machine learning models through REST and RPC APIs. Assign labels to images and quickly classify them into millions of predefined categories. Detect objects and faces, read printed and handwritten text, and build valuable metadata into your image catalog.

Implementation in Python:

In this article we will try to detect handwritten text in an image using the Google Vision API in Python. We can use any image file containing handwritten text, or create our own sample image with our own handwriting. For this exercise I will use the images below to check how the Vision API behaves with different handwriting.

Text image 1: simple handwriting
Text image 2: slightly jumbled-up handwritten text
Text image 3: Henry Ford signature

Kindly refer to the YouTube videos above to set up the Vision API on your system.

# import the required libraries
import io
import os
import pandas as pd
from google.cloud import vision
from google.cloud import vision_v1

# point the client library at the Google Vision service-account JSON key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'C:vision api/Google vision API test-8a0999498f6f.json'

# initiate a client
client = vision.ImageAnnotatorClient()

# set up the required path
folder_path = r'F:Google cloud/'
image_path = 'text image 1.jpg'
file_path = os.path.join(folder_path, image_path)

# load the image into memory
with io.open(file_path, 'rb') as image_file:
    file_content = image_file.read()

# perform text detection on the image
image_detail = vision.Image(content=file_content)
response = client.document_text_detection(image=image_detail)

# print the text from the document
doctext = response.full_text_annotation.text
print(doctext)

We start With Good
Because all businesses should
be doin
sod.

From the output we can see that the Vision API detected about half of the text correctly, but it failed to detect the phrase “doing something good” and returned “doin” and “sod.” instead. One possible reason is that this part of the sentence is written in a different handwriting style.
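Before reading too much into a partial result like this, it can help to confirm that the request itself succeeded. The response carries an `error` status whose `message` is empty on success. The sketch below shows one way to check it; `ensure_ok` is a hypothetical helper name, and the stand-in objects and the error text are made up so the example runs without credentials.

```python
from types import SimpleNamespace


def ensure_ok(response):
    """Raise if the Vision API reported an error for this image."""
    # An empty `error.message` means the request succeeded.
    if response.error.message:
        raise RuntimeError(f"Vision API error: {response.error.message}")
    return response


# Stand-in objects (assumption: they only mimic the `error.message` field).
# With the real client, pass the `response` from document_text_detection.
ok = SimpleNamespace(error=SimpleNamespace(message=""))
bad = SimpleNamespace(error=SimpleNamespace(message="made-up error text"))

ensure_ok(ok)  # passes silently
try:
    ensure_ok(bad)
except RuntimeError as exc:
    print(exc)
```

In the code above, this would mean calling `ensure_ok(response)` right after `client.document_text_detection(...)` and before reading `response.full_text_annotation`.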

Let’s review the confidence of the text extraction for the first example:

# review the confidence of text extraction
pages = response.full_text_annotation.pages
for page in pages:
    for block in page.blocks:
        print('block confidence:', block.confidence)

        for paragraph in block.paragraphs:
            print('paragraph confidence:', paragraph.confidence)

            for word in paragraph.words:
                word_text = ''.join([symbol.text for symbol in word.symbols])
                print('word_text: {0} , confidence: {1}'.format(word_text, word.confidence))

block confidence: 0.949999988079071
paragraph confidence: 0.949999988079071
word_text: We , confidence: 0.9800000190734863
word_text: start , confidence: 0.9800000190734863
word_text: With , confidence: 0.8899999856948853
word_text: Good , confidence: 0.9700000286102295
block confidence: 0.9399999976158142
paragraph confidence: 0.9700000286102295
word_text: Because , confidence: 0.9399999976158142
word_text: all , confidence: 0.9700000286102295
word_text: businesses , confidence: 0.9900000095367432
word_text: should , confidence: 0.9399999976158142
word_text: be , confidence: 0.9900000095367432
word_text: doin , confidence: 0.9900000095367432
paragraph confidence: 0.6800000071525574
word_text: sod , confidence: 0.6600000262260437
word_text: . , confidence: 0.7599999904632568
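The nested printout above is easy to read for one image, but a flat list of (word, confidence) pairs is handier for sorting or filtering. The sketch below walks the same pages, blocks, paragraphs and words hierarchy; `word_confidences` is a hypothetical helper name, and the stand-in objects only mimic the attributes used above, filled with the rounded values of the last paragraph of the first image.

```python
from types import SimpleNamespace


def word_confidences(full_text_annotation):
    """Return a flat list of (word, confidence) pairs in reading order."""
    rows = []
    for page in full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # A word is stored as a list of symbols (characters).
                    text = "".join(symbol.text for symbol in word.symbols)
                    rows.append((text, word.confidence))
    return rows


def _word(text, confidence):
    """Build a stand-in word object (assumption: for illustration only)."""
    symbols = [SimpleNamespace(text=ch) for ch in text]
    return SimpleNamespace(symbols=symbols, confidence=confidence)


# Stand-in annotation holding the low-confidence paragraph from image 1.
paragraph = SimpleNamespace(words=[_word("sod", 0.66), _word(".", 0.76)])
annotation = SimpleNamespace(
    pages=[SimpleNamespace(blocks=[SimpleNamespace(paragraphs=[paragraph])])]
)

rows = word_confidences(annotation)
# List the least confident words first.
for text, confidence in sorted(rows, key=lambda row: row[1]):
    print(f"{text!r}: {confidence:.2f}")
```

With the real client, the call would be `word_confidences(response.full_text_annotation)`, and the resulting list could be loaded into a pandas DataFrame if a table is preferred.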

I will now generate the output for the other two images mentioned above, using the same code as for the first image.
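Since the same steps repeat for every image, they could also be wrapped in a small helper. The following is only a sketch under stated assumptions: `detect_images` and its parameters are hypothetical names, not part of the Vision client library, and the client and image class are passed in as arguments so the function can be tried out without credentials.

```python
import os


def detect_images(client, image_cls, folder_path, image_names):
    """Run document text detection on each image; return {name: response}."""
    responses = {}
    for name in image_names:
        file_path = os.path.join(folder_path, name)
        # Load the image into memory as raw bytes.
        with open(file_path, "rb") as image_file:
            content = image_file.read()
        # Same call as in the walkthrough, with the image built from bytes.
        image = image_cls(content=content)
        responses[name] = client.document_text_detection(image=image)
    return responses
```

With the objects from this article, the call would look like `detect_images(client, vision.Image, folder_path, ['text image 2.jpg', 'text image 3.jpg'])`, and each response's `full_text_annotation` can then be inspected as before.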

Text extraction from image 2:

image_path = 'text image 2.jpg'
file_path = os.path.join(folder_path, image_path)

# load the image into memory
with io.open(file_path, 'rb') as image_file:
    file_content = image_file.read()

# perform text detection on the image
image_detail = vision.Image(content=file_content)
response = client.document_text_detection(image=image_detail)

# print the text from the document
doctext = response.full_text_annotation.text
print(doctext)

Hope you have done it,

# review the confidence of text extraction
pages = response.full_text_annotation.pages
for page in pages:
    for block in page.blocks:
        print('block confidence:', block.confidence)

        for paragraph in block.paragraphs:
            print('paragraph confidence:', paragraph.confidence)

            for word in paragraph.words:
                word_text = ''.join([symbol.text for symbol in word.symbols])
                print('word_text: {0} , confidence: {1}'.format(word_text, word.confidence))

block confidence: 0.9300000071525574
paragraph confidence: 0.9300000071525574
word_text: Hope , confidence: 0.8999999761581421
word_text: you , confidence: 0.9900000095367432
word_text: have , confidence: 0.9900000095367432
word_text: done , confidence: 0.9700000286102295
word_text: it , confidence: 0.9900000095367432
word_text: , , confidence: 0.41999998688697815

Surprisingly, the API detects all of the text correctly even though the handwriting in the second image is slightly jumbled up.

Text extraction from image 3:

image_path = 'text image 3.jpg'
file_path = os.path.join(folder_path, image_path)

# load the image into memory
with io.open(file_path, 'rb') as image_file:
    file_content = image_file.read()

# perform text detection on the image
image_detail = vision.Image(content=file_content)
response = client.document_text_detection(image=image_detail)

# print the text from the document
doctext = response.full_text_annotation.text
print(doctext)

Nemy, Ford

# review the confidence of text extraction
pages = response.full_text_annotation.pages
for page in pages:
    for block in page.blocks:
        print('block confidence:', block.confidence)

        for paragraph in block.paragraphs:
            print('paragraph confidence:', paragraph.confidence)

            for word in paragraph.words:
                word_text = ''.join([symbol.text for symbol in word.symbols])
                print('word_text: {0} , confidence: {1}'.format(word_text, word.confidence))

block confidence: 0.7699999809265137
paragraph confidence: 0.7699999809265137
word_text: Nemy , confidence: 0.7900000214576721
word_text: , , confidence: 0.4399999976158142
word_text: Ford , confidence: 0.8399999737739563

From the output for the third image, we can see that the word “Henry” was detected incorrectly (as “Nemy”), whereas the word “Ford” was extracted correctly with a confidence of 84%.

Summary:

  1. From the three examples above, we can see that the Google Vision API does a pretty good job of detecting handwritten text, but the result depends heavily on the handwriting. The first example suggests that it struggles when words in the same paragraph are written in different handwriting styles.
  2. The third example suggests that the Vision API can struggle to detect text in a signature, or when the handwritten text is too jumbled up.
  3. From the confidence scores for the three images, it seems that a word extracted with more than 80% confidence is likely to be correct.
  4. The Vision API helps us perform OCR with minimal lines of code. It can also perform well on handwritten text compared with traditional OCR libraries for Python or R. However, there could be constraints on customization with the Vision API.
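The 80% rule of thumb from point 3 can be tried directly on the word confidences reported above. The sketch below uses those values rounded to two decimals; `split_by_confidence` and `THRESHOLD` are hypothetical names chosen for illustration.

```python
THRESHOLD = 0.80  # rule of thumb from point 3, not an official cut-off

# (word, confidence) pairs copied from the three outputs above, rounded.
reported = [
    ("We", 0.98), ("start", 0.98), ("With", 0.89), ("Good", 0.97),
    ("Because", 0.94), ("all", 0.97), ("businesses", 0.99),
    ("should", 0.94), ("be", 0.99), ("doin", 0.99), ("sod", 0.66),
    (".", 0.76),                                   # image 1
    ("Hope", 0.90), ("you", 0.99), ("have", 0.99), ("done", 0.97),
    ("it", 0.99), (",", 0.42),                     # image 2
    ("Nemy", 0.79), (",", 0.44), ("Ford", 0.84),   # image 3
]


def split_by_confidence(rows, threshold):
    """Split words into those above the threshold and those to review."""
    accepted = [word for word, conf in rows if conf > threshold]
    flagged = [word for word, conf in rows if conf <= threshold]
    return accepted, flagged


accepted, flagged = split_by_confidence(reported, THRESHOLD)
print("flagged for review:", flagged)
```

On these numbers the rule flags “sod”, “Nemy” and the stray punctuation. Note that “doin” scored 0.99 even though the word is truncated, so the threshold works as a rough filter rather than a guarantee of correctness.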

Mayur_Gade

Data science professional with experience in executing knowledge of data science and machine learning for customer churn prediction, brand positioning analysis.