Data Scientist Assignment file: https://drive.goog...

Created on: June 10, 2025

Answered with O3 by Chat01

Question

Data Scientist Assignment file: https://drive.google.com/drive/folders/1ThHnUQjkCNTOKXnvySVfpHZxZbYsFfMQ

Problem Statement: License Plate Recognition

Detecting and recognizing vehicle license plates represents a widely recognized challenge that has garnered significant attention. This challenge revolves around the utilization of two distinct datasets:

  1. A collection of vehicle images (totaling 900 images) sourced from the internet and meticulously annotated. These annotations encompass the precise coordinates of bounding boxes encapsulating the license plates within each image.

  2. An assemblage of license plate images (also amounting to 900 images) where the annotations take the form of the alphanumeric text inscribed on each license plate.

The primary aim of this task is twofold: first, to identify and locate the license plates affixed to the vehicles in the images, and second, to perform character recognition on these license plates, deciphering the alphanumeric text they contain.

The dataset is organized into three distinct sets:

Training Set 1: This set comprises 900 images, with each image featuring a single car along with its corresponding license plate. The provided annotations include the coordinates of the bounding box (ymin, xmin, ymax, xmax) that outlines the license plate in each image.

Training Set 2: Consisting of another 900 images, this set focuses solely on license plates. Each image in this set contains a license plate, and the provided annotations contain the characters present on each license plate.

Test Set: This set consists of 201 images and is structured similarly to the first training set. In this set, your task is twofold: you need to detect the license plates within the images and recognize the characters on those plates.

Evaluation Criteria:

1. Pre-processing, data analysis, and understanding
2. Data exploration
3. Model building
4. Accuracy of the character recognition from the license plate

Above is the problem statement for a task that I must complete to get shortlisted for an internship.

I want you to first go through the problem statement above and understand it thoroughly and in detail. Make sure you understand what the expectation is. Then, based on that, I want you to generate a step-by-step plan to implement this project. Be thorough and methodical in your approach.

Make this plan very detailed and very explicit and specific in terms of what must be done.
The plan should consist of instructions that are very detailed and precise, and it must be an end-to-end solution to the given problem statement.

I will give the plan you generate to a coding agent so it can produce the code for me, so write it accordingly.

Thought

Creating the detection pipeline

To tackle the license plate recognition task, I need to develop a detailed plan. The first step is to explore the datasets; training set 1 contains images with bounding boxes and set 2 provides text labels. After that, I need to preprocess, augment, and split the data. For plate detection, I could go with YOLOv8, and for character recognition, I’ll consider using either a CRNN or a Transformer model. A two-stage pipeline may work best, using segmentation as an alternative. I want to make sure everything's covered.

Outlining the recognition pipeline

I’ll start by setting up the environment with relevant libraries like Torch and OpenCV. Next, I’ll dive into data handling — parsing, validation, and converting to suitable formats like COCO for detection and TFRecords for OCR. After that, it’s time to explore the data and preprocess it: for detection, augmentations like flip and contrast adjustments; for OCR, grayscale and fixed height for plate images. Then, I’ll split the data for training, choose models for detection (YOLOv8n) and OCR (CRNN or Transformer), train, and evaluate. Finally, I’ll refine, deploy, and write documentation for reproducibility.

Thought for 21 seconds

Answer

Below is a turn-by-turn implementation play-book you can hand directly to an automated coding agent.
Every task is written as an imperative instruction with all needed file names, folder layouts, hyper-parameters, and command-line calls.
Feel free to shorten or reorder steps to fit available compute, but do not skip any numbered task—each one feeds the next.


0 Project scaffold (≈15 min)

  1. Repository layout

    text
    lp-recog/
    ├── data/
    │   ├── raw/              # Copy the three zips here
    │   ├── images/           # Symlink or extract under this
    │   └── annotations/      # All json / txt label files
    ├── notebooks/
    ├── src/
    │   ├── detector/         # YOLO training code
    │   ├── ocr/              # CRNN or TrOCR code
    │   ├── utils/            # shared helpers
    │   └── pipeline.py       # end-to-end inference script
    ├── outputs/              # logs, checkpoints, predictions
    ├── requirements.txt
    └── README.md
  2. Environment

    bash
    conda create -n lp python=3.10 -y
    conda activate lp
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121  # or match GPU
    pip install ultralytics==8.* albumentations opencv-python pytesseract wandb pandas matplotlib scikit-learn rapidfuzz
  3. Global config file (src/config.yaml) – store all paths & hyper-params so every module can load one source of truth.
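A minimal loader sketch for this config file so every module reads the same settings; the key names shown (data_root, img_size) are placeholders rather than fixed by the assignment, and PyYAML is assumed to be available (ultralytics already depends on it).

python
# src/utils/config.py -- load the single source of truth for paths & hyper-parameters
# (sketch; the key names below are assumptions)
import yaml

def load_config(path: str = "src/config.yaml") -> dict:
    """Read the global YAML config shared by the detector, OCR, and pipeline modules."""
    with open(path, "r") as f:
        return yaml.safe_load(f)

if __name__ == "__main__":
    cfg = load_config()
    print(cfg["data_root"], cfg["img_size"])   # e.g. data/ and 640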


1 Raw-data ingestion & sanity checks (≈1 h)

  1. Download & extract

    bash
    gdown --folder 1ThHnUQjkCNTOKXnvySVfpHZxZbYsFfMQ -O data/raw
    unzip data/raw/train1.zip -d data/images/train1
    unzip data/raw/train2.zip -d data/images/train2
    unzip data/raw/test.zip -d data/images/test
  2. Unify annotation formats

    Training Set 1

    • Source: a CSV/JSON with filename,ymin,xmin,ymax,xmax.
    • Convert to YOLO txt (<class_id> x_center y_center w h in relative coords) using the script src/utils/convert_det_labels.py (a sketch follows this list).

    Training Set 2

    • Source: CSV with filename,label_text.
    • Save to ocr_labels.csv and create a char-vocabulary file (chars.txt).
  3. Train/val split

    • Detector: 900 car images → 720 train / 180 val (stratify on plate height–width ratio buckets).
    • OCR: 900 plate images + cropped plates from Set 1 (see §3) → 90/10 split per-plate (not per-char).
    • Persist split indices to splits.yaml.
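A sketch of the conversion script from step 2; the CSV path and the column names (filename, ymin, xmin, ymax, xmax) are assumptions taken from the annotation description above.

python
# src/utils/convert_det_labels.py -- sketch: (ymin, xmin, ymax, xmax) -> YOLO relative format
import cv2
import pandas as pd
from pathlib import Path

def to_yolo_row(xmin, ymin, xmax, ymax, img_w, img_h, class_id=0):
    """Return '<class_id> x_center y_center w h' with all coordinates relative to image size."""
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

def convert(csv_path="data/annotations/train1.csv",      # assumed file name
            img_dir="data/images/train1", out_dir="data/images/train1/labels"):
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for row in pd.read_csv(csv_path).itertuples():
        h, w = cv2.imread(str(Path(img_dir) / row.filename)).shape[:2]
        line = to_yolo_row(row.xmin, row.ymin, row.xmax, row.ymax, w, h)
        (Path(out_dir) / Path(row.filename).with_suffix(".txt").name).write_text(line + "\n")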

2 Exploratory data analysis (EDA) (≈1 h, notebook)

  1. Visualise distribution of bounding-box areas & aspect ratios.
  2. Heat-map of plate locations (normalised 0–1 grid).
  3. Character frequency bar-plot – confirm coverage of 0–9 and A–Z.
  4. Sample montage: 5×5 grid overlaying bboxes to eyeball label quality.

Flag any corrupt / extreme-outlier entries and write their filenames to data/annotation_errors.txt for later exclusion.
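A notebook-cell sketch covering items 1 and 3 above; the annotation file names and the label_text column follow the unified formats from section 1 and are otherwise assumptions.

python
# notebooks/eda cell -- bounding-box geometry and character-frequency checks (sketch)
from collections import Counter
import matplotlib.pyplot as plt
import pandas as pd

det = pd.read_csv("data/annotations/train1.csv")          # assumed detection label file
det["w"] = det.xmax - det.xmin
det["h"] = det.ymax - det.ymin
det["area"] = det.w * det.h
det["aspect"] = det.w / det.h
det[["area", "aspect"]].hist(bins=40)                      # area & aspect-ratio distributions
plt.show()

ocr = pd.read_csv("data/annotations/ocr_labels.csv")
char_counts = Counter("".join(ocr.label_text.astype(str)))
pd.Series(char_counts).sort_index().plot(kind="bar", title="Character frequency")
plt.show()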


3 Data preprocessing & augmentation (coding agent creates modules)

3.1 Plate detector

  1. Albumentations pipeline

    python
    A.Compose([
        A.LongestMaxSize(max_size=640, interpolation=1),
        A.PadIfNeeded(640, 640, border_mode=cv2.BORDER_CONSTANT, value=(114, 114, 114)),
        A.RandomBrightnessContrast(0.2, 0.2, p=0.5),
        A.MotionBlur(blur_limit=5, p=0.2),
        A.ShiftScaleRotate(scale_limit=0.15, rotate_limit=5, shift_limit=0.1, border_mode=0, p=0.5),
        A.RandomFog(p=0.1),
        A.HorizontalFlip(p=0.02),  # low prob; most plates are symmetric-safe
    ], bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']))
  2. Save to src/detector/augment.py.

3.2 OCR corpus builder

  1. Crop additional plates: for each Training Set 1 image, apply the bounding box, expand it by 5 px per side, and save a PNG to data/images/plates_extra/ (see the sketch after this list).

  2. Perspective rectification (optional): detect four corners via minimal-area rectangle and warp to 1:4 aspect.

  3. Albumentations OCR transform:

    • Resize height → 32 px, keep ratio, pad width to next multiple of 8 with zeros.
    • Apply RandomBrightnessContrast, ImageCompression (quality 30-60), Sharpen, GaussianNoise.
    • To simulate dirty plates, occasionally overlay random speckles.
  4. Implement in src/ocr/augment.py.
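A sketch of the plate-cropping step (item 1 above); the CSV path and column names are the same assumptions as in section 1.

python
# src/ocr/build_corpus.py -- crop extra plate images from Training Set 1 (sketch)
import cv2
import pandas as pd
from pathlib import Path

def crop_plates(csv_path="data/annotations/train1.csv",   # assumed file name
                img_dir="data/images/train1",
                out_dir="data/images/plates_extra", pad=5):
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for row in pd.read_csv(csv_path).itertuples():
        img = cv2.imread(str(Path(img_dir) / row.filename))
        h, w = img.shape[:2]
        # expand the annotated box by `pad` pixels per side, clamped to the image borders
        x1, y1 = max(int(row.xmin) - pad, 0), max(int(row.ymin) - pad, 0)
        x2, y2 = min(int(row.xmax) + pad, w), min(int(row.ymax) + pad, h)
        out_name = Path(row.filename).with_suffix(".png").name
        cv2.imwrite(str(Path(out_dir) / out_name), img[y1:y2, x1:x2])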


4 Model 1 – License-plate detector (YOLOv8) (≈2-3 h GPU)

  1. Dataset YAML (detector/data.yaml):

    yaml
    path: ../../data/images
    train: train1
    val: val1
    nc: 1
    names: ['plate']
  2. Training command

    bash
    yolo detect train \
      model=yolov8n.pt \
      data=src/detector/data.yaml \
      epochs=120 \
      imgsz=640 \
      batch=16 \
      lr0=0.01 \
      optimizer=SGD \
      hsv_h=0.015 hsv_s=0.7 hsv_v=0.4 mosaic=1.0 mixup=0.0 \
      project=outputs/detector \
      name=v8n_640
  3. Validation metric: monitor val/box/mAP50-95. Target ≥ 0.90.

  4. Export best checkpoint to weights/detector.pt.
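A sketch of the validation and export step using the ultralytics Python API; the run directory outputs/detector/v8n_640 follows the training command above, but the exact path is an assumption.

python
# validate the trained detector and copy the best checkpoint (sketch)
import shutil
from pathlib import Path
from ultralytics import YOLO

best = "outputs/detector/v8n_640/weights/best.pt"           # assumed run directory
model = YOLO(best)
metrics = model.val(data="src/detector/data.yaml")           # evaluates box mAP on the val split
print("mAP50-95:", metrics.box.map)                          # target >= 0.90

Path("weights").mkdir(exist_ok=True)
shutil.copy(best, "weights/detector.pt")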


5 Model 2 – OCR (two options, pick one or implement both)

5.1 Option A: CRNN + CTC (fast)

  1. Tokenizer: map each char in chars.txt to an id; blank = 0 (a decoding sketch follows at the end of §5.1).

  2. Model:

    python
    import torch.nn as nn, torchvision

    class CRNN(nn.Module):
        def __init__(self, num_classes):
            super().__init__()
            backbone = torchvision.models.resnet18(weights='IMAGENET1K_V1')
            self.cnn = nn.Sequential(*list(backbone.children())[:-2])   # drop avgpool + fc
            self.rnn = nn.LSTM(512, 256, num_layers=2, bidirectional=True, batch_first=True)
            self.classifier = nn.Linear(2 * 256, num_classes)           # bi-LSTM features -> vocab

        def forward(self, x):                                # x: (B, 3, 32, W)
            f = self.cnn(x).mean(dim=2).permute(0, 2, 1)     # collapse height -> (B, T, 512)
            seq, _ = self.rnn(f)                             # (B, T, 512)
            return self.classifier(seq)                      # per-timestep logits for CTC
  3. Loss: torch.nn.CTCLoss(blank=0, zero_infinity=True).

  4. Training loop (trainer script src/ocr/train_crnn.py):

    • epochs = 80, batch = 64, Adagrad optimizer, lr = 1e-3.
    • Mixed precision (torch.cuda.amp).
    • Early-stop on val_plate_char_acc.
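A minimal sketch of the character tokenizer and greedy CTC decoding referenced above; the chars.txt location under data/annotations/ is an assumption.

python
# char vocabulary + greedy CTC decode (sketch); id 0 is reserved for the CTC blank
import torch

chars = open("data/annotations/chars.txt").read().strip()   # assumed path, e.g. "0...9A...Z"
idx_to_char = {i + 1: c for i, c in enumerate(chars)}
char_to_idx = {c: i for i, c in idx_to_char.items()}

def ctc_greedy_decode(logits: torch.Tensor) -> list[str]:
    """logits: (B, T, num_classes) from the CRNN; collapse repeats, then drop blanks."""
    texts = []
    for seq in logits.argmax(dim=-1).tolist():               # (B, T) best class per timestep
        out, prev = [], None
        for idx in seq:
            if idx != prev and idx != 0:
                out.append(idx_to_char[idx])
            prev = idx
        texts.append("".join(out))
    return texts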

5.2 Option B: TrOCR (VisionEncoder–Decoder) (higher accuracy)

bash
pip install "transformers>=4.40" sentencepiece
  1. from transformers import TrOCRProcessor, VisionEncoderDecoderModel – load microsoft/trocr-base-handwritten.
  2. Fine-tune in FP16, lr = 5e-5, epochs = 20.
  3. Restrict the vocabulary to the dataset's character set by re-initialising the decoder embeddings (characters only).
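A minimal loading-and-inference sketch for Option B, useful as a smoke test before fine-tuning; the sample image path is an assumption.

python
# load TrOCR and run it on one cropped plate (sketch)
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.open("data/images/train2/0001.jpg").convert("RGB")   # assumed file name
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])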

6 End-to-end inference pipeline (src/pipeline.py)

  1. Load detector.pt, perform NMS at conf=0.25, iou=0.45.

  2. For each prediction:

    python
    x1, y1, x2, y2 = scale_coords(...)
    plate_img = letterbox_crop(frame, x1, y1, x2, y2, pad=5)
    plate_img = rectify_if_needed(plate_img)
    text_pred = ocr_model(plate_img)   # greedy or beam-search decode
  3. Post-processing

    • Upper-case everything; strip spaces.
    • Validate against the regex ^[A-Z0-9]{5,10}$; if it fails, attempt autocorrection with rapidfuzz.process.extractOne using the training-set strings as a dictionary (see the sketch after this list).
  4. Output format – produce submission.csv

    text
    filename,xmin,ymin,xmax,ymax,plate_text
    0001.jpg,123,45,312,108,KA01AB1234
    ...
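A sketch of the post-processing in step 3; known_plates would be the list of label strings from Training Set 2.

python
# normalise, validate, and optionally autocorrect a raw OCR prediction (sketch)
import re
from rapidfuzz import process

PLATE_RE = re.compile(r"^[A-Z0-9]{5,10}$")

def postprocess(text_pred: str, known_plates: list[str]) -> str:
    text = text_pred.upper().replace(" ", "")
    if PLATE_RE.match(text):
        return text
    # fall back to the closest training-set string when the raw prediction is malformed
    match = process.extractOne(text, known_plates)
    return match[0] if match else text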

7 Evaluation script (evaluate.py)

  1. Detection: mAP @ IoU 0.5 – 0.95 using ultralytics.utils.metrics.
  2. Recognition:
    • char_accuracy = (# correct characters) / (# total characters)
    • plate_accuracy = (# perfectly decoded plates) / (# plates)
  3. Global score = harmonic mean of mAP and plate_accuracy.
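A sketch of the recognition metrics and the combined score; character accuracy is approximated here via edit distance, which is one common way to count per-character errors, and the small epsilon guards against division by zero. Detection mAP itself comes from the ultralytics metrics mentioned in item 1.

python
# recognition metrics + harmonic-mean global score (sketch)
from rapidfuzz.distance import Levenshtein

def char_accuracy(preds: list[str], gts: list[str]) -> float:
    total = sum(len(g) for g in gts)
    errors = sum(Levenshtein.distance(p, g) for p, g in zip(preds, gts))
    return max(0.0, 1 - errors / total)          # edit-distance approximation of character accuracy

def plate_accuracy(preds: list[str], gts: list[str]) -> float:
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)

def global_score(map_50_95: float, plate_acc: float) -> float:
    return 2 * map_50_95 * plate_acc / (map_50_95 + plate_acc + 1e-9)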

8 Hyper-parameter search (optional, 4–8 h overnight)

  • Sweep detector image-size {640, 896}, lr {1e-2,5e-3}.
  • Sweep CRNN hidden size {256,384} and augment probability.
  • Use Weights & Biases sweeps; log best run ids.

9 Packaging & reproducibility

  1. requirements.txt – freeze via pip-compile.

  2. Dockerfile

    dockerfile
    FROM nvidia/cuda:12.0.1-cudnn8-runtime-ubuntu22.04
    WORKDIR /workspace/lp-recog
    COPY . .
    # the runtime base image ships without Python, so install it alongside tesseract
    RUN apt-get update && apt-get install -y python3 python3-pip tesseract-ocr
    RUN pip3 install -r requirements.txt
    CMD ["python3", "src/pipeline.py", "--images", "data/images/test", "--output", "outputs/predictions"]
  3. README.md – include:

    • Dataset download instructions.
    • One-line training & inference commands.
    • Hardware used and expected runtimes.
    • Sample predictions screenshot.

10 Timeline (assuming single RTX 4090)

| Phase | Duration |
|---|---|
| 0 Scaffold & env | 0.5 h |
| 1 Ingestion & split | 1 h |
| 2 EDA | 1 h |
| 3 Augment modules | 1 h |
| 4 Detector training | 2 h (YOLOv8n) |
| 5 OCR fine-tune | 1.5 h (CRNN) |
| 6 Pipeline + eval | 1 h |
| 7 Write-up | 0.5 h |
| Total | ~8.5 h |

Deliverables checklist for submission

  • submission.csv with predictions on 201 test images
  • weights/detector.pt and weights/ocr.pt
  • evaluate.py + metrics report (printout or markdown)
  • Full repo zipped / GitHub link with README and reproducible Dockerfile

Follow the above roadmap exactly and you’ll cover every evaluation bullet: data understanding → exploration → modelling → recognition accuracy. Good luck!
