Read Text from an Image with One Line of Python Code
To read text from an image with one line of Python code, you can use the pytesseract library. Pytesseract is a wrapper for Tesseract OCR (Optical Character Recognition), a widely used open-source OCR engine originally developed at Hewlett-Packard and later sponsored by Google. It can recognize and extract text from images in more than 100 languages.
Because pytesseract only wraps the Tesseract engine, Tesseract itself must also be installed on your system. To install pytesseract, use pip, the Python package installer, by running the following command in your terminal:
pip install pytesseract
Once pytesseract is installed, you can use the following line of code to read text from an image:
import pytesseract
text = pytesseract.image_to_string(image)
Here, image is a PIL (Python Imaging Library) Image object or a file path to the image. The image_to_string function returns the extracted text as a string.
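As a slightly fuller sketch of the call above, the helper below accepts either a file path or a PIL Image and forwards a language code through the lang parameter of image_to_string. The name read_text and the default "eng" are assumptions made for illustration, not part of pytesseract, and the sketch assumes the Tesseract engine is installed.

```python
from pathlib import Path

from PIL import Image


def read_text(source, lang="eng"):
    """Return the text found in an image path or PIL Image (hypothetical helper)."""
    # Imported here so a missing pytesseract install surfaces when the
    # helper is called, not when this module is loaded.
    import pytesseract

    # Open file paths ourselves; pass PIL Image objects through unchanged.
    image = Image.open(source) if isinstance(source, (str, Path)) else source
    try:
        # lang takes a Tesseract language code such as "eng" or "deu".
        return pytesseract.image_to_string(image, lang=lang).strip()
    except pytesseract.TesseractNotFoundError as exc:
        # Raised by pytesseract when the Tesseract binary cannot be found.
        raise RuntimeError("Tesseract is not installed or not on PATH") from exc
```

A call such as read_text("scan.png") would then return the recognized text, where scan.png is a placeholder file name. Languages other than English also need the matching Tesseract language data installed.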
It's worth noting that OCR is not perfect, and the accuracy of the results varies with the quality and clarity of the image. Preprocessing the image to improve its contrast and remove noise often improves the OCR results. You can use functions from the PIL library or other image-processing libraries to preprocess the image before passing it to image_to_string.
For example, to convert the image to grayscale and apply thresholding to improve the contrast, you can use the following code:
from PIL import Image
import pytesseract

image = Image.open(image_file)  # image_file is the path to your image
image = image.convert("L")  # convert to grayscale
image = image.point(lambda x: 0 if x < 128 else 255, "1")  # apply thresholding
text = pytesseract.image_to_string(image)
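The cutoff of 128 will not suit every image, so it can help to make it adjustable. The sketch below wraps the same grayscale and thresholding steps in a function; the name binarize and its threshold parameter are assumptions introduced here for illustration, and it uses only Pillow, so it can be tried without Tesseract.

```python
from PIL import Image


def binarize(image, threshold=128):
    """Return a black-and-white copy of a PIL image (hypothetical helper)."""
    gray = image.convert("L")  # grayscale, one 0-255 value per pixel
    # Pixels darker than the threshold become black; the rest become white.
    return gray.point(lambda value: 0 if value < threshold else 255, mode="1")


# Tiny synthetic check: one dark pixel and one light pixel.
sample = Image.new("L", (2, 1))
sample.putpixel((0, 0), 40)
sample.putpixel((1, 0), 200)
result = binarize(sample)
print(result.mode, result.getpixel((0, 0)), result.getpixel((1, 0)))
```

Raising the threshold turns more pixels black, which may help with faint text, while lowering it may help with dark or noisy backgrounds. The result can be passed to pytesseract.image_to_string like any other PIL Image.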
I hope this helps! Let me know if you have any questions.