Here we look at a technique that lets a user hear the contents of text images instead of reading them. Extracting text from color images is a challenging task in computer vision. Text-to-speech conversion, as used here, scans the English letters and numbers in an image using OCR and then converts the recognized text to speech.


Optical character recognition (OCR) is a technology that extracts text from printed documents, captured images, etc. Tesseract is a well-known open source OCR library. It was originally developed at Hewlett-Packard Labs and was released as free software under the Apache License 2.0 in 2005. Its development has been sponsored by Google since 2006.


1) Power up the Raspberry Pi and connect it to the internet.

2) Open the terminal and type the following command:

sudo apt-get install tesseract-ocr

This will install the Tesseract library.
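As an optional check, the short Python sketch below tests whether the tesseract command can be found after the install. The helper names “tool_available” and “tesseract_version” are made up for this illustration, and it assumes Python 3.5 or later.

```python
import shutil
import subprocess


def tool_available(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


def tesseract_version():
    """Return the first line printed by `tesseract --version`, or None if it is missing."""
    if not tool_available("tesseract"):
        return None
    result = subprocess.run(
        ["tesseract", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some older builds print the version on stderr
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

Calling tesseract_version() returns None when the install did not work, and otherwise a version string. The same tool_available() helper can be used for espeak.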


1) Open the terminal and type the command:

sudo apt-get install espeak

Now espeak will be installed. To check that the installation worked, type the following in your terminal:

espeak "Hello World"

If everything is OK, you should hear “Hello World” through your headphones. If you do not hear anything, it might be because the default audio output is HDMI. In that case, type the following command:

amixer cset numid=3 1

The above command switches the audio output from HDMI to the headphone jack.
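If you prefer to switch the output from inside the Python program, the sketch below wraps the same amixer command. The value 1 (headphone jack) is the one used above. The values 0 (automatic) and 2 (HDMI) are an assumption based on the usual convention for this control on older Raspberry Pi OS releases, and the function names are invented for this example.

```python
import subprocess

# Routing values for the "numid=3" control.
# 1 = headphone jack (as used above); 0 and 2 are assumed, see the note above.
AUDIO_ROUTES = {"auto": 0, "headphone": 1, "hdmi": 2}


def amixer_route_command(route):
    """Build the amixer argument list for the requested output."""
    if route not in AUDIO_ROUTES:
        raise ValueError("unknown audio route: %r" % (route,))
    return ["amixer", "cset", "numid=3", str(AUDIO_ROUTES[route])]


def set_audio_route(route):
    """Run amixer; raises CalledProcessError if the command fails."""
    subprocess.run(amixer_route_command(route), check=True)
```

For example, set_audio_route("headphone") runs the same command as typed in the terminal above.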


import os

Load the image file.

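First, convert the image to text using Tesseract. Here the input image is “image.jpg” and the output is saved as “textread.txt”. Tesseract adds the “.txt” extension itself, so only the base name “textread” is given in the command.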

os.system('tesseract /root/Desktop/image.jpg /root/Desktop/textread')
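The same step can also be written with subprocess.run instead of os.system, which is a swap made here only so that a failed conversion raises an error instead of passing silently. This is a minimal sketch: the function names are hypothetical and the paths are simply the ones from the command above.

```python
import subprocess


def tesseract_command(image_path, output_base):
    """Build the argument list: tesseract <image> <output base name>."""
    return ["tesseract", image_path, output_base]


def image_to_text_file(image_path, output_base):
    """Run Tesseract and return the path of the text file it writes."""
    # check=True raises CalledProcessError if tesseract exits with an error
    subprocess.run(tesseract_command(image_path, output_base), check=True)
    # Tesseract appends ".txt" to the output base name by itself
    return output_base + ".txt"
```

With the paths used here, image_to_text_file('/root/Desktop/image.jpg', '/root/Desktop/textread') would return '/root/Desktop/textread.txt' once the conversion succeeds.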

Now we need to read “textread.txt” aloud using espeak.

os.system('espeak -ven-us+f3 -f /root/Desktop/textread.txt -a 200 -s 110 -p 99')

Now you will be able to hear the text in the image.

Note that the image and the text file must either be in the folder the program is run from, or be referred to by their full paths.
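The speaking step can be sketched in the same way. The example below turns the file name into a full path, which avoids the folder problem noted above, and keeps the amplitude and pitch inside the ranges given in the espeak manual (0 to 200 for -a, 0 to 99 for -p). The function names and the default values other than the voice and speed are assumptions made for this illustration.

```python
import os
import subprocess


def clamp(value, low, high):
    """Limit value to the range low..high."""
    return max(low, min(high, value))


def espeak_command(text_path, voice="en-us+f3", amplitude=100, speed=110, pitch=50):
    """Build the espeak argument list for reading a text file aloud."""
    return [
        "espeak",
        "-v", voice,                           # voice name, as in -ven-us+f3
        "-f", os.path.abspath(text_path),      # full path to the text file
        "-a", str(clamp(amplitude, 0, 200)),   # amplitude (volume)
        "-s", str(speed),                      # speed in words per minute
        "-p", str(clamp(pitch, 0, 99)),        # pitch
    ]


def speak_file(text_path):
    """Read the file aloud; fails early if the file does not exist."""
    if not os.path.isfile(text_path):
        raise FileNotFoundError(text_path)
    subprocess.run(espeak_command(text_path), check=True)
```

Calling speak_file('/root/Desktop/textread.txt') after the Tesseract step reads the recognized text aloud, and a missing file is reported as an error before espeak is started.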

