This is my code for finding Persian text in an image

Created on: February 8, 2025

Answered using O1 by Chat01

Question

this is my code for finding persian text in an image

```python
import cv2
import numpy as np
import pytesseract
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
from ultralytics import YOLO

model_path = "farsichar.pt"
model = YOLO(model_path)

frame = (to_check_cluster[0] * 255).astype(np.uint8)  # Ensure frame is correctly scaled

min_confidence = 0.5
results = model(frame)

# Create a figure and axis for matplotlib
fig, ax = plt.subplots(figsize=(5, 5))

# Loop through results and process boxes
for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])  # Get bounding box coordinates
        confidence = box.conf[0]
        label = result.names[int(box.cls[0])]  # Get label
        print(confidence)
        if confidence > 0:  # min_confidence
            # Extract region of interest (ROI) and perform OCR
            roi = frame[y1:y2, x1:x2]
            gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
            thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
            detected_text = pytesseract.image_to_string(Image.fromarray(thresh), lang="fas")
            print(detected_text)

            # Draw bounding box using matplotlib
            rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                     linewidth=2, edgecolor='green', facecolor='none')
            ax.add_patch(rect)

            # Add detected text above the bounding box
            if detected_text.strip():  # Only add text if it's not empty
                ax.text(x1, y1 - 10, detected_text.strip(),
                        fontsize=10, color='green', weight='bold')

frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
ax.imshow(frame_rgb)
ax.axis('off')
plt.show()
```

But at the `print(detected_text)` step, nothing is shown at all. Even when it does show some text, it is not precisely the text that is in the picture. What can I do? Do you have any other, more precise idea?

Thinking

Assessing potential challenges

I'm grappling with issues like OCR accuracy, handling Arabic/Persian script, and ensuring tesseract language settings are optimized. Binarization, resizing, and morphological preprocessing are critical too.

Weighing OCR options

I'm digging into various OCR solutions like Persian OCR, python-weasyprint, and huggingface's OCR. Combining Tesseract with Persian-specific configurations or using Google Vision are also on the table.

Refining OCR techniques

Adjustments like refining bounding boxes and opting for morphological smoothing can improve OCR accuracy. Inverting the ROI and maintaining text size consistency are also crucial.

Reasoned for 6 seconds

Answer

Here are a few suggestions and best practices you can try to improve Persian OCR results and debugging in your pipeline:


1. Verify Tesseract Persian language data is installed

  1. Make sure you actually have the Persian language data (fas.traineddata) installed.

    • If you are on Windows, check inside your Tesseract-OCR\tessdata directory.
    • If you are on Linux, check inside /usr/share/tesseract-ocr/4.00/tessdata (the path can vary by distro).
    • You can download additional languages from the official repo: tesseract-ocr/tessdata.
  2. In your code, confirm that you use the exact language code (fas) that matches the .traineddata file name.

    ```python
    detected_text = pytesseract.image_to_string(
        Image.fromarray(thresh),
        lang="fas",
    )
    ```
  3. If Tesseract is not actually loading the Persian language pack, you will get empty or incorrect results.


2. Use Tesseract configuration parameters

When calling image_to_string, you can pass extra Tesseract config. For example, try:

  • Page Segmentation Mode (psm):
    • Commonly, --psm 6 or --psm 7 can help if your bounding box is already tight around a single line or single word.
  • OCR Engine Mode (oem):
    • Try --oem 3 (the default LSTM-based engine).

In code:

```python
config = "--oem 3 --psm 6"  # or 7, depending on your use case
detected_text = pytesseract.image_to_string(
    Image.fromarray(thresh), lang="fas", config=config
)
```

Experiment with different psm values to see which works best for your bounding boxes.


3. Improve your pre-processing

Persian text (like Arabic) can be sensitive to small image artifacts, noise, or poor binarization. You can attempt:

  1. Adaptive Thresholding: Instead of a single global Otsu threshold, you might try adaptive thresholding or different morphological operations to enhance text.
  2. Noise removal / morphological operations:
    • E.g., cv2.medianBlur() or small dilations/erosions.
  3. Resize: If your bounding boxes are small, Tesseract can fail. Sometimes enlarging the ROI before performing OCR helps:
    ```python
    scale_factor = 2  # or higher
    resized_roi = cv2.resize(thresh, None, fx=scale_factor, fy=scale_factor,
                             interpolation=cv2.INTER_LINEAR)
    ```
  4. Invert: Tesseract sometimes does better with black text on white background. If your text is light on dark, try:
    ```python
    inverted = cv2.bitwise_not(thresh)
    ```
    and pass that to Tesseract.

4. Check bounding boxes from YOLO

Sometimes YOLO might give bounding boxes that are:

  • Too large and include a lot of background.
  • Too small or cut off parts of the text.
  • Slightly off in coordinates due to resizing or pre-processing mismatch.

Sanity check your bounding boxes visually. For instance, save each ROI as an image in a loop to confirm you’re getting the correct region:

```python
for i, box in enumerate(result.boxes):
    x1, y1, x2, y2 = map(int, box.xyxy[0])
    roi = frame[y1:y2, x1:x2]
    cv2.imwrite(f"roi_{i}.png", roi)
```

Review these saved images to see what Tesseract is actually receiving.
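
A related, easy win: detector boxes are often a few pixels too tight for OCR and clip the edges of letters. A small hypothetical helper that pads each box and clamps it to the image bounds:

```python
def padded_box(x1, y1, x2, y2, img_w, img_h, pad=4):
    """Expand a bounding box by `pad` pixels on each side,
    clamped to the image bounds, so letter edges are not cut off."""
    return (max(0, x1 - pad), max(0, y1 - pad),
            min(img_w, x2 + pad), min(img_h, y2 + pad))

# Usage:
# h, w = frame.shape[:2]
# x1, y1, x2, y2 = padded_box(x1, y1, x2, y2, w, h)
# roi = frame[y1:y2, x1:x2]
```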


5. Consider a dedicated text detection + recognition pipeline

Tesseract’s built-in text detection can be unreliable for certain fonts or layouts. You are already using YOLO for detection, but for recognition you might explore a more specialized OCR approach:

  1. EasyOCR

    • Built-in detection and recognition for many languages including Persian (fa).
    • Example:
      ```python
      import easyocr

      reader = easyocr.Reader(['fa'])  # make sure 'fa' is installed
      result = reader.readtext(roi, detail=0)
      print(result)
      ```
    • Or, if you want to keep YOLO for detection, you can pass each YOLO-cropped ROI to EasyOCR's `recognize` method, which skips EasyOCR's own detection stage and runs recognition only.
  2. PaddleOCR

    • Has multilingual support and often yields better results for complex scripts.
    • They provide ready-to-use Python APIs:
      ```python
      from paddleocr import PaddleOCR

      ocr = PaddleOCR(lang='arabic')  # or check if 'fa' (Persian) is supported
      result = ocr.ocr(roi, cls=True)
      ```
  3. Google Vision OCR / AWS Rekognition / Azure Cognitive Services

    • Cloud-based, more accurate for complex scripts but requires an API key and usage costs.

6. Pipeline summary to improve accuracy

Here’s a typical workflow to get the best possible result:

  1. Detect bounding boxes (YOLO or another text detection model specialized for Persian).
  2. Crop each bounding box ROI carefully.
  3. Pre-process the ROI:
    • Convert to grayscale
    • Possibly invert if needed
    • Noise removal / morphological operations
    • Adaptive threshold
    • Resize to a comfortable size for OCR
  4. Use a robust OCR engine with the correct language config (fas), best psm/oem, or a neural approach like EasyOCR/PaddleOCR.
  5. Evaluate results visually, keep adjusting thresholding, bounding box accuracy, and Tesseract config until you find what works best for your images.

Common debug steps

  • Print out or display the thresholded ROI to confirm you actually see your text in the ROI.
  • Try reading the ROI with Tesseract’s English model (lang="eng") just to see if Tesseract is picking up anything at all. If you get an empty string with English as well, then it’s likely an issue with bounding boxes or pre-processing.
  • Pass the same ROI to e.g. EasyOCR to see if it can recognize anything. If EasyOCR works but Tesseract fails, you likely need better Tesseract config or language data.

Conclusion

To summarize:

  1. Confirm the Persian Tesseract model is installed and loaded correctly.
  2. Tune your Tesseract config (--oem, --psm).
  3. Improve pre-processing (thresholding, resizing, morphological steps).
  4. Check bounding boxes are correct.
  5. If Tesseract still fails, try a specialized OCR solution like EasyOCR or PaddleOCR for Persian script.

By methodically addressing each of these points, you should see a significant improvement in extracting Persian text accurately from your images.
