Learn how to build a simple Optical Character Recognition (OCR) system using OpenCV-Python to recognize handwritten digits with ease.
This code snippet demonstrates a simple optical character recognition (OCR) process for recognizing handwritten digits in an image. It uses OpenCV for image processing and a previously trained machine learning model, referred to as model in the code, for digit classification.
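import cv2
import numpy as np

# 'model' must already exist: a classifier trained on flattened 28x28 digit
# images scaled to [0, 1] (for example, the KNN model trained in the next example)
image = cv2.imread('image.jpg')
if image is None:
    raise FileNotFoundError("Could not read 'image.jpg'")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Inverse binary + Otsu: ink becomes white (255) on a black background, like MNIST
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Find the outer contour of each digit (OpenCV 3.x returns 3 values, 4.x returns 2)
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x, y, w, h = cv2.boundingRect(c)
    ROI = thresh[y:y + h, x:x + w]
    # Resize the digit image to a standard size
    ROI = cv2.resize(ROI, (28, 28), interpolation=cv2.INTER_AREA)
    # Flatten the image into a 1D array and scale it like the training data
    ROI = ROI.reshape((1, 784)) / 255.0
    # Use a pre-trained model (e.g., KNN, SVM) to predict the digit
    prediction = model.predict(ROI)
    print("Predicted digit:", prediction[0])
cv2.imshow('Image', image)
cv2.waitKey(0)
cv2.destroyAllWindows()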
Explanation:
This Python code performs handwritten digit recognition using a KNN classifier. It trains the classifier on the MNIST training set, preprocesses an input image to find digit contours, and then uses the trained KNN model to predict the digit inside each contour. Each prediction is printed and drawn on the original image next to a green bounding box.
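The KNN classifier used in this example does no real training beyond storing the data: to classify a new digit, it finds the k training vectors closest to it (Euclidean distance is scikit-learn's default) and takes a majority vote of their labels. The NumPy sketch below shows that idea on toy 2D points; knn_predict is a hypothetical helper for illustration, not scikit-learn's implementation, which uses optimized neighbor search and its own tie-breaking.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=5):
    """Label each row of query_x by majority vote of its k nearest training rows."""
    # Squared Euclidean distance between every query row and every training row
    dists = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k smallest distances for each query
    nearest = np.argsort(dists, axis=1)[:, :k]
    predictions = []
    for idx in nearest:
        labels, counts = np.unique(train_y[idx], return_counts=True)
        # np.unique sorts labels, so ties go to the smallest label
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Two well-separated clusters labelled 0 and 1
train_x = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.5, 0.5], [10.5, 10.5]])
print(knn_predict(train_x, train_y, queries, k=3))  # [0 1]
```

This brute-force version builds the full distance matrix, which is fine for a handful of points but would use a lot of memory against all 60,000 MNIST images at once.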
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Load the MNIST training set (the file must contain 'x_train' and 'y_train')
with np.load('mnist.npz') as data:
    train_images = data['x_train']
    train_labels = data['y_train']

# Flatten each 28x28 image into 784 values and scale pixels to [0, 1]
train_images = train_images.reshape((len(train_images), 784)) / 255.0

# Create a KNN classifier and train it on the MNIST dataset
model = KNeighborsClassifier(n_neighbors=5)
model.fit(train_images, train_labels)

# Load and preprocess the image
image = cv2.imread('digits_image.jpg')
if image is None:
    raise FileNotFoundError("Could not read 'digits_image.jpg'")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find contours of the digits (OpenCV 3.x returns 3 values, 4.x returns 2)
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

# Loop through each contour and recognize the digit
for c in cnts:
    x, y, w, h = cv2.boundingRect(c)
    ROI = thresh[y:y + h, x:x + w]
    # Resize the digit image to a standard size
    ROI = cv2.resize(ROI, (28, 28), interpolation=cv2.INTER_AREA)
    # Flatten to a 1D row and scale to [0, 1] to match the training data
    ROI = ROI.reshape((1, 784)) / 255.0
    # Use the trained KNN model to predict the digit
    prediction = model.predict(ROI)
    print("Predicted digit:", prediction[0])
    # Draw a rectangle around the digit and display the prediction
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, str(prediction[0]), (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

# Display the results
cv2.imshow('Image', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Before running the code:
Install necessary libraries:
pip install opencv-python numpy scikit-learn
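Download the MNIST dataset: save it as 'mnist.npz' in the working directory; the code reads the arrays 'x_train' and 'y_train' from this file.
Replace 'digits_image.jpg' with the actual path to your image containing digits.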
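If the downloaded file uses different array names or image shapes, the training code fails with a hard-to-read error. A hypothetical load_mnist_npz helper, sketched below under the assumption that the file follows the x_train/y_train layout the code expects, checks the file before flattening and scaling it; the demo writes a tiny fake file so the sketch runs without the real dataset.

```python
import os
import tempfile

import numpy as np

def load_mnist_npz(path):
    """Load x_train/y_train from an MNIST-style .npz file, flatten, and scale to [0, 1]."""
    with np.load(path) as data:
        missing = {"x_train", "y_train"} - set(data.files)
        if missing:
            raise KeyError(f"{path} is missing arrays: {sorted(missing)}")
        images = data["x_train"]
        labels = data["y_train"]
    if images.shape[1:] != (28, 28):
        raise ValueError(f"expected 28x28 images, got {images.shape[1:]}")
    # One row per image; len(images) avoids hard-coding the sample count
    features = images.reshape(len(images), -1).astype(np.float32) / 255.0
    return features, labels

# Self-check with a tiny fake file (stand-in data, not real MNIST)
demo_path = os.path.join(tempfile.mkdtemp(), "fake_mnist.npz")
np.savez(demo_path, x_train=np.full((3, 28, 28), 255, dtype=np.uint8), y_train=np.array([1, 2, 3]))
X, y = load_mnist_npz(demo_path)
print(X.shape, X.max())  # (3, 784) 1.0
```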
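Explanation:
General:
The script first trains a KNN classifier on the MNIST training images, then locates each digit in the input image and classifies it.
Preprocessing:
The image is converted to grayscale and binarized with an inverted Otsu threshold, so the digits become white on a black background, matching MNIST. The cv2.RETR_EXTERNAL flag ensures that only the outermost contour of each digit is detected, so the holes inside digits such as 0, 6, 8, and 9 do not produce extra, nested contours.
Digit Recognition:
Each contour's bounding box is cropped from the thresholded image, resized to 28x28, flattened into 784 values, and passed to the KNN model for prediction.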
Improvements:
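Specks of noise or dust in a scan also produce contours, and each one would be classified as a digit. One simple improvement is to discard bounding boxes that are too small to be digits. The helper name drop_small_boxes and the min_height and min_area values below are hypothetical and need tuning to the resolution of your images.

```python
def drop_small_boxes(boxes, min_height=10, min_area=50):
    """Keep only (x, y, w, h) boxes that are tall and large enough to plausibly be digits."""
    return [(x, y, w, h) for (x, y, w, h) in boxes if h >= min_height and w * h >= min_area]

# A 3x3 speck and a thin 2x20 scratch are dropped; the 12x25 digit-sized box survives
candidates = [(0, 0, 3, 3), (10, 5, 12, 25), (30, 5, 2, 20)]
print(drop_small_boxes(candidates))  # [(10, 5, 12, 25)]
```

In the recognition loop, the boxes would come from [cv2.boundingRect(c) for c in cnts], and only the filtered boxes would be cropped and classified.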
Additional Considerations:
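This code snippet demonstrates a simple pipeline for recognizing handwritten digits in an image using OpenCV and a trained machine learning model.
Here's a breakdown of the process:
Image Preprocessing:
The input image is converted to grayscale and thresholded with cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU, which picks the threshold value automatically and makes the digits white on a black background.
Digit Extraction and Recognition:
cv2.findContours locates each digit; its bounding box is cropped, resized to 28x28, flattened into 784 values, and classified by the KNN model.
Result Display:
A green rectangle and the predicted label are drawn around each digit with cv2.rectangle and cv2.putText, and the annotated image is shown with cv2.imshow.
Key Libraries Used:
OpenCV (cv2) for image processing, NumPy for array handling, and scikit-learn for the KNN classifier.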
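The Otsu flag in the preprocessing step chooses the threshold automatically: Otsu's method tries every gray level t and keeps the one that maximizes the between-class variance of the two pixel groups it creates. The NumPy sketch below shows that computation; otsu_threshold is a hypothetical helper for illustration, and it may return a different value than OpenCV when several thresholds score equally well.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level t maximizing between-class variance,
    splitting pixels into values <= t and values > t."""
    counts = np.bincount(gray.ravel(), minlength=256).astype(np.int64)
    levels = np.arange(256, dtype=np.int64)
    total = counts.sum()
    w0 = np.cumsum(counts)             # number of pixels with value <= t
    w1 = total - w0                    # number of pixels with value > t
    s0 = np.cumsum(counts * levels)    # intensity sum of the lower class
    s1 = s0[-1] - s0                   # intensity sum of the upper class
    sigma_b = np.zeros(256)
    valid = (w0 > 0) & (w1 > 0)        # both classes must be non-empty
    mean0 = s0[valid] / w0[valid]
    mean1 = s1[valid] / w1[valid]
    # Between-class variance, up to a constant factor of 1 / total**2
    sigma_b[valid] = w0[valid] * w1[valid] * (mean0 - mean1) ** 2
    return int(np.argmax(sigma_b))

# Dark ink (50) on light paper (200): any t from 50 to 199 separates them
pixels = np.array([50] * 100 + [200] * 100, dtype=np.uint8)
print(otsu_threshold(pixels))  # 50
```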
This code provides a basic framework for handwritten digit recognition. You can adapt and expand it by incorporating different classifiers, improving the preprocessing (for example, filtering out small noise contours or centering each digit the way MNIST does), and sorting the detected digits into reading order.
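Because the classifier only needs fit and predict (or score), other scikit-learn models can be swapped in without touching the OpenCV code. The sketch below compares KNN with an SVM on scikit-learn's bundled load_digits dataset, which contains 8x8 images rather than MNIST's 28x28, so it runs without any download; the models trained here would not plug directly into the 28x28 pipeline, and accuracy on your own images will differ.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# load_digits: 8x8 digit images already flattened to 64 features with values 0-16
digits = load_digits()
X = digits.data / 16.0  # scale to [0, 1], mirroring the /255.0 used for MNIST
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.25, random_state=0, stratify=digits.target)

scores = {}
for name, clf in {"knn": KNeighborsClassifier(n_neighbors=5), "svm": SVC()}.items():
    clf.fit(X_train, y_train)
    # score() returns mean accuracy on the held-out split
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```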
This code provides a practical example of using OpenCV and machine learning for handwritten digit recognition. By combining image processing with a classifier trained on MNIST, it can locate and classify digits within images, although its accuracy depends heavily on how closely the cropped digits resemble MNIST samples. This foundation can be extended to applications such as automating data entry from handwritten forms, recognizing digits on license plates, and other OCR tasks. Because the preprocessing and the classifier are separate steps, either can be replaced independently, which makes the code adaptable to other digit recognition scenarios.