
Optical Character Recognition

The aim of this repository is to recognize text in an image file using the ArabicOcr library in the Python programming language.

ArabicOcr is an open-source library for Optical Character Recognition (OCR). We will use ArabicOcr to print the Arabic text recognized in an input image.

https://github.com/fagrahmed12/OCR_AI_Team

What Determines OCR Accuracy

OCR accuracy depends on two things:
1. The quality of the original source images
2. The quality of the OCR engine

The following practices improve reading quality and help the recognized words come out as correctly as possible.

Good Quality of Source Images
Before using OCR, make sure you can read the images with your own eyes. If you can't see an image clearly, check that the original source is not damaged and is free of wrinkles. Use the cleanest, most original files for the best results.

Right Size of Images
An OCR engine needs source images that are not only of good quality but also at the right resolution. Make sure the image is resized to the correct size, which is usually about 1/10 of the original size (1.5 mm x 1 mm) or less. This makes the result more accurate.

Remove Noise / Denoise
Just as people struggle to read documents with a lot of noise, so does an OCR engine. Noise makes the source harder to read and can lower OCR accuracy, so remove any background or foreground noise to get higher-quality extraction.

Increase Image Contrast
Think of white paper printed with light grey ink: both you and the OCR engine would struggle to read it. Increasing the contrast between the text and the background makes the output clearer and helps the OCR engine read the image accurately.

De-skew Original Source
No one wants to read a page upside down. Make sure the image has the right orientation, with lines of text horizontal rather than tilted. If the text is skewed, rotate the image clockwise or counter-clockwise until the lines are level.
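Three of the preprocessing tips above (contrast, denoising, and separating text from background) can be sketched in plain Python. This is a minimal illustration, assuming a grayscale image stored as a list of rows of integers from 0 to 255 (a hypothetical representation chosen to keep the example dependency-free); in practice a library such as OpenCV or Pillow would do the same work. The function names are illustrative and are not part of ArabicOcr or EasyOCR. Otsu thresholding is a standard technique added here for the binarization step; resizing and de-skewing are left out to keep the sketch short.

```python
from statistics import median

# Hypothetical representation: rows of grayscale pixels, 0 (black) to 255 (white).
Gray = list[list[int]]


def stretch_contrast(img: Gray) -> Gray:
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in img]


def median_denoise(img: Gray) -> Gray:
    """Replace each pixel with the median of its 3x3 neighbourhood (edges clamped)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        new_row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            new_row.append(int(median(window)))
        out.append(new_row)
    return out


def otsu_threshold(img: Gray) -> int:
    """Pick the threshold that maximises the between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]          # pixels <= t form the "dark" class
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Gray) -> Gray:
    """Turn pixels above the Otsu threshold white (255) and the rest black (0)."""
    t = otsu_threshold(img)
    return [[255 if p > t else 0 for p in row] for row in img]


def preprocess(img: Gray) -> Gray:
    """Contrast first, then denoise, then binarize."""
    return binarize(median_denoise(stretch_contrast(img)))
```

The order is a design choice: stretching contrast first gives the threshold a wider range to work with, and denoising before binarizing keeps isolated specks from surviving as black dots. Note that a 3x3 median filter can also erode very thin strokes, so on small text a gentler filter may be preferable.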

I also used EasyOCR to extract text in other languages from images using OCR techniques. The outputs are available here:
English: https://github.com/fagrahmed12/OCR_AI_Team/tree/master/english
French: https://github.com/fagrahmed12/OCR_AI_Team/tree/master/french
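A minimal sketch of such an EasyOCR run, assuming the `easyocr` package is installed. `easyocr.Reader` takes a list of language codes, and `reader.readtext` returns one `(bounding_box, text, confidence)` entry per detected text region. The `read_image` and `keep_confident_text` helpers and the 0.5 confidence cut-off are hypothetical additions for illustration, not part of EasyOCR or of the repository's code.

```python
from collections.abc import Iterable, Sequence


def keep_confident_text(results: Iterable[Sequence], min_confidence: float = 0.5) -> list[str]:
    """Return the recognized strings whose confidence is at least min_confidence.

    Each item is expected in EasyOCR's (bounding_box, text, confidence) layout.
    The 0.5 default is an illustrative assumption, not an EasyOCR setting.
    """
    return [text for _box, text, conf in results if conf >= min_confidence]


def read_image(path: str, languages: list[str]) -> list[str]:
    """Run EasyOCR on one image file and keep the confident lines.

    The import is local so the filtering helper above works without EasyOCR.
    """
    import easyocr  # pip install easyocr; model weights download on first use

    reader = easyocr.Reader(languages)  # e.g. ["en"], ["fr"], or ["ar", "en"]
    return keep_confident_text(reader.readtext(path))
```

Usage would look like `read_image("page.jpg", ["fr"])`, where `"page.jpg"` is a placeholder path. Filtering by confidence is optional; it simply drops regions the engine itself was unsure about.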

For Arabic, EasyOCR produced the same result that ArabicOcr had produced earlier. The takeaway is that the quality of the image determines the quality of the OCR output; changing the package did not change the quality of the reading.
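A comparison like this can be put into numbers. One common measure is the character error rate (CER): the edit (Levenshtein) distance between the OCR output and a hand-typed reference transcription, divided by the length of the reference, with accuracy taken as 1 minus CER. The sketch below is a plain-Python illustration of that technique; the function names are hypothetical and not taken from the repository.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length; can exceed 1.0 if the output is much longer."""
    if not reference:
        # Assumed convention for an empty reference: perfect only if the output is empty too.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def ocr_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - CER, clipped at 0 so very long garbage output scores 0 rather than negative."""
    return max(0.0, 1.0 - character_error_rate(reference, hypothesis))
```

Running both engines on the same images and scoring each against one reference transcription would show whether the choice of package makes a measurable difference, rather than relying on a visual check. Because the comparison is per character, it works the same way for Arabic, English, or French text.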
