Created on: February 8, 2025
this is my code for finding persian text in an image
```python
import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import pytesseract
from PIL import Image
from ultralytics import YOLO

model_path = "farsichar.pt"
model = YOLO(model_path)

frame = (to_check_cluster[0] * 255).astype(np.uint8)  # Ensure frame is correctly scaled
min_confidence = 0.5

results = model(frame)
fig, ax = plt.subplots(figsize=(5, 5))

for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])  # Get bounding box coordinates
        confidence = box.conf[0]
        label = result.names[int(box.cls[0])]  # Get label
        print(confidence)
        if confidence > 0:  # min_confidence:
            roi = frame[y1:y2, x1:x2]
            gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
            thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
            detected_text = pytesseract.image_to_string(Image.fromarray(thresh), lang="fas")
            print(detected_text)
            # Draw bounding box using matplotlib
            rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                     linewidth=2, edgecolor='green', facecolor='none')
            ax.add_patch(rect)
            # Add detected text above the bounding box using matplotlib
            if detected_text.strip():  # Only add text if it's not empty
                ax.text(x1, y1 - 10, detected_text.strip(),
                        fontsize=10, color='green', weight='bold')

frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
ax.imshow(frame_rgb)
ax.axis('off')
plt.show()
```
but at `print(detected_text)` nothing is shown, and even when it does show some text, it is not exactly the text that is in the picture. What can I do? Do you have any other, more precise idea?
Here are a few suggestions and best practices you can try to improve Persian OCR results and make debugging easier in your pipeline:
Make sure you actually have the Persian language data (`fas.traineddata`) installed. On Windows it belongs in the `Tesseract-OCR\tessdata` directory; on Linux it is typically under `/usr/share/tesseract-ocr/4.00/tessdata` (the path can vary by distro). In your code, confirm that you use the exact language code (`fas`) that matches the `.traineddata` file name:

```python
detected_text = pytesseract.image_to_string(
    Image.fromarray(thresh),
    lang="fas"
)
```

If Tesseract is not actually loading the Persian language pack, you will get empty or incorrect results.
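To rule out a missing language pack quickly, a small check like the following can help. This is only a sketch: `check_lang_available` is a hypothetical helper, and the `available` argument is injectable so the lookup logic can be exercised without a local Tesseract install.

```python
def check_lang_available(code, available=None):
    """Return True if the given traineddata language code is installed.

    `available` can be passed explicitly (e.g. for testing); by default
    Tesseract itself is queried through pytesseract.
    """
    if available is None:
        import pytesseract  # requires the Tesseract binary on PATH
        available = pytesseract.get_languages(config="")
    return code in available
```

If `check_lang_available("fas")` returns False, install `fas.traineddata` into the tessdata directory before debugging anything else.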
When calling `image_to_string`, you can pass extra Tesseract config. For example, try:

- Page segmentation mode (`psm`): `--psm 6` or `--psm 7` can help if your bounding box is already tight around a single line or single word.
- OCR engine mode (`oem`): `--oem 3` (the default LSTM-based engine).

In code:

```python
config = "--oem 3 --psm 6"  # or 7, depending on your use case
detected_text = pytesseract.image_to_string(
    Image.fromarray(thresh),
    lang="fas",
    config=config
)
```

Experiment with different `psm` values to see which works best for your bounding boxes.
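Since the best mode depends on how each box is cropped, it can be worth sweeping several `psm` values per ROI and comparing the outputs. A sketch (`sweep_psm` is a hypothetical helper; `ocr_fn` is injectable so the loop itself can be tested without Tesseract):

```python
def sweep_psm(image, psm_values=(6, 7, 8, 13), lang="fas", ocr_fn=None):
    """Run OCR once per page segmentation mode; return {psm: text}."""
    if ocr_fn is None:
        import pytesseract
        from PIL import Image

        def ocr_fn(img, config):
            return pytesseract.image_to_string(
                Image.fromarray(img), lang=lang, config=config
            )
    return {psm: ocr_fn(image, f"--oem 3 --psm {psm}").strip()
            for psm in psm_values}
```

Printing the resulting dict per ROI makes it obvious which mode (if any) produces sensible Persian text for your crops.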
Persian text (like Arabic) can be sensitive to small image artifacts, noise, or poor binarization. You can attempt:

- Denoising with `cv2.medianBlur()` or small dilations/erosions.
- Upscaling the ROI before OCR:

```python
scale_factor = 2  # or higher
resized_roi = cv2.resize(thresh, None, fx=scale_factor, fy=scale_factor,
                         interpolation=cv2.INTER_LINEAR)
```

and pass that to Tesseract.

- Inverting the binarized image (Tesseract prefers dark text on a light background):

```python
inverted = cv2.bitwise_not(thresh)
```
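The upscale-and-invert steps can also be sketched in pure NumPy, which is handy for quick experiments. Here `preprocess_roi` is a hypothetical helper that uses nearest-neighbour upscaling via `np.kron` instead of `cv2.resize`:

```python
import numpy as np

def preprocess_roi(roi_gray, scale=2, invert=False):
    """Upscale a binarized ROI by an integer factor and optionally invert it."""
    # np.kron repeats each pixel scale x scale times (nearest-neighbour upscaling)
    out = np.kron(roi_gray, np.ones((scale, scale), dtype=roi_gray.dtype))
    if invert:
        out = 255 - out  # swap black/white for light-on-dark text
    return out
```

For production use, `cv2.resize` with `INTER_LINEAR` or `INTER_CUBIC` usually gives smoother glyph edges than nearest-neighbour.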
Sometimes YOLO might give bounding boxes that are slightly off: too tight (clipping characters), too loose (including background), or shifted outside the image. Sanity check your bounding boxes visually. For instance, save each ROI as an image in a loop to confirm you’re getting the correct region:

```python
for i, box in enumerate(result.boxes):
    x1, y1, x2, y2 = map(int, box.xyxy[0])
    roi = frame[y1:y2, x1:x2]
    cv2.imwrite(f"roi_{i}.png", roi)
```

Review these saved images to see what Tesseract is actually receiving.
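One subtle failure mode worth guarding against: if a coordinate is negative or past the image edge, NumPy slicing silently returns an empty or wrong ROI, and Tesseract then returns an empty string. A small clamping helper can prevent this (a sketch; `clip_box` is a hypothetical name):

```python
def clip_box(x1, y1, x2, y2, width, height, pad=0):
    """Clamp a bounding box to the image bounds, with optional padding."""
    x1 = max(0, x1 - pad)
    y1 = max(0, y1 - pad)
    x2 = min(width, x2 + pad)
    y2 = min(height, y2 + pad)
    return x1, y1, x2, y2
```

Usage: `x1, y1, x2, y2 = clip_box(x1, y1, x2, y2, frame.shape[1], frame.shape[0], pad=4)`. A few pixels of padding also tends to help Tesseract, which struggles with text touching the crop border.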
Tesseract’s built-in text detection can be unreliable for certain fonts or layouts. You are already using YOLO for detection, but for recognition you might explore a more specialized OCR approach:

- EasyOCR supports Persian (language code `fa`):

```python
import easyocr
reader = easyocr.Reader(['fa'])  # make sure 'fa' is installed
result = reader.readtext(roi, detail=0)
print(result)
```

Since YOLO already finds the text regions, you can also call `readtext` with a custom `detector=False` to skip EasyOCR’s own detection step.

- PaddleOCR:

```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(lang='arabic')  # or check if 'persian' is supported
result = ocr.ocr(roi, cls=True)
```

- Cloud services such as Google Vision OCR / AWS Rekognition / Azure Cognitive Services.
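If you end up with more than one engine installed, a simple fallback chain can pick the first non-empty result. This is a hypothetical sketch: each entry in `engines` is a `(name, fn)` pair where `fn` takes an ROI and returns text.

```python
def best_effort_ocr(roi, engines):
    """Try OCR engines in order; return (engine_name, text) for the first
    non-empty result, or (None, "") if everything fails or is empty."""
    for name, fn in engines:
        try:
            text = fn(roi).strip()
        except Exception:
            continue  # engine not installed or crashed; try the next one
        if text:
            return name, text
    return None, ""
```

For example, you could register Tesseract first and EasyOCR second, so EasyOCR is only invoked on ROIs where Tesseract came back empty.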
Here’s a typical workflow to get the best possible result:

1. Verify the bounding boxes (save each ROI to disk and inspect it).
2. Check the pre-processing (binarization, scaling, inversion) on those ROIs.
3. Run recognition with the correct language pack (`fas`), the best `psm`/`oem` settings, or a neural approach like EasyOCR/PaddleOCR.
4. As a quick sanity check, also try English (`lang="eng"`) just to see if Tesseract is picking up anything at all. If you get an empty string with English as well, then it’s likely an issue with bounding boxes or pre-processing.

To summarize: confirm the Persian language data is installed, feed Tesseract clean and well-cropped ROIs, and tune its config (`--oem`, `--psm`). By methodically addressing each of these points, you should see a significant improvement in extracting Persian text accurately from your images.
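The language sanity check above can be sketched like this (`compare_langs` is a hypothetical helper; `ocr_fn` stands in for a `pytesseract.image_to_string` call and is injectable for testing):

```python
def compare_langs(roi, ocr_fn, langs=("eng", "fas")):
    """Run the same ROI through OCR in each language; if every result is
    empty, the problem is likely cropping or pre-processing, not the language."""
    return {lang: ocr_fn(roi, lang).strip() for lang in langs}
```

Wired up to Tesseract it would look like `compare_langs(thresh, lambda img, lang: pytesseract.image_to_string(Image.fromarray(img), lang=lang))`.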