Created: June 10, 2025
Data Scientist Assignment file: https://drive.google.com/drive/folders/1ThHnUQjkCNTOKXnvySVfpHZxZbYsFfMQ
Problem Statement: License Plate Recognition
Detecting and recognizing vehicle license plates represents a widely recognized challenge that has garnered significant attention. This challenge revolves around the utilization of two distinct datasets:
A collection of vehicle images (totaling 900 images) sourced from the internet and meticulously annotated. These annotations encompass the precise coordinates of bounding boxes encapsulating the license plates within each image.
An assemblage of license plate images (also amounting to 900 images) where the annotations take the form of the alphanumeric text inscribed on each license plate.
The primary aim of this task is twofold: first, to identify and locate the license plates affixed to the vehicles in the images, and second, to perform character recognition on these license plates, deciphering the alphanumeric text they contain.
The dataset is organized into three distinct sets:
Training Set 1: This set comprises 900 images, with each image featuring a single car along with its corresponding license plate. The provided annotations include the coordinates of the bounding box (ymin, xmin, ymax, xmax) that outlines the license plate in each image.
Training Set 2: Consisting of another 900 images, this set focuses solely on license plates. Each image in this set contains a license plate, and the provided annotations contain the characters present on each license plate.
Test Set: This set consists of 201 images and is structured similarly to the first training set. In this set, your task is twofold: you need to detect the license plates within the images and recognize the characters on those plates.
Evaluation Criteria:
1. Pre-processing, data analysis, and understanding
2. Data exploration
3. Model building
4. Accuracy of the character recognition from the license plate
Above is the problem statement for a task that I must complete to get shortlisted for an internship.
I want you to first go through the problem statement above and understand it thoroughly and in detail. Make sure you understand what the expectation is. Then, based on that, generate a step-by-step plan to implement this project. Be thorough and methodical in your approach.
Make this plan very detailed and very explicit and specific in terms of what must be done.
The plan should consist of instructions that are detailed and precise, and it must form the end-to-end solution to the given problem statement.
I will give the plan you generate to a coding agent to generate the code for me, so write it accordingly.
Below is a turn-by-turn implementation play-book you can hand directly to an automated coding agent.
Every task is written as an imperative instruction with all needed file names, folder layouts, hyper-parameters, and command-line calls.
Feel free to shorten or reorder steps to fit available compute, but do not skip any numbered task—each one feeds the next.
Repository layout
```text
lp-recog/
├── data/
│   ├── raw/              # Copy the three zips here
│   ├── images/           # Symlink or extract under this
│   └── annotations/      # All json / txt label files
├── notebooks/
├── src/
│   ├── detector/         # YOLO training code
│   ├── ocr/              # CRNN or TrOCR code
│   ├── utils/            # shared helpers
│   └── pipeline.py       # end-to-end inference script
├── outputs/              # logs, checkpoints, predictions
├── requirements.txt
└── README.md
```
Environment
```bash
conda create -n lp python=3.10 -y
conda activate lp
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121  # or match your GPU
pip install ultralytics==8.* albumentations opencv-python pytesseract wandb pandas matplotlib scikit-learn rapidfuzz
```
Global config file (`src/config.yaml`) – store all paths & hyper-params so every module can load one source of truth.
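A minimal sketch of how modules could read that shared config; the key names shown are assumptions, not prescribed by the assignment:

```python
# Hypothetical loader for src/config.yaml; the keys in the example are illustrative only.
from pathlib import Path
import yaml

def load_config(path: str = "src/config.yaml") -> dict:
    """Read the single source of truth used by every module."""
    with open(Path(path)) as f:
        return yaml.safe_load(f)

# Example usage: cfg = load_config(); cfg["paths"]["images"], cfg["detector"]["imgsz"]
```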
Download & extract
```bash
gdown --folder 1ThHnUQjkCNTOKXnvySVfpHZxZbYsFfMQ -O data/raw
unzip data/raw/train1.zip -d data/images/train1
unzip data/raw/train2.zip -d data/images/train2
unzip data/raw/test.zip -d data/images/test
```
Unify annotation formats
- Training Set 1: labels are given as `filename,ymin,xmin,ymax,xmax`. Convert each row to a YOLO-format label file (`<class_id> x_center y_center w h` in relative coords) using script `src/utils/convert_det_labels.py`.
- Training Set 2: labels are given as `filename,label_text`. Consolidate them into `ocr_labels.csv` and create a char-vocabulary file (`chars.txt`).
- Train/val split: create the split and record it in `splits.yaml`.
- Flag any corrupt / extreme-outlier entries and write their filenames to `data/annotation_errors.txt` for later exclusion.
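A possible sketch of `src/utils/convert_det_labels.py`, assuming the Training Set 1 boxes sit in a single CSV; the CSV path, image folder, and output folder are assumptions:

```python
# Sketch only: file paths and CSV layout are assumptions, not given by the assignment.
from pathlib import Path
import pandas as pd
from PIL import Image

IMG_DIR = Path("data/images/train1")
OUT_DIR = Path("data/annotations/train1_yolo")
OUT_DIR.mkdir(parents=True, exist_ok=True)

df = pd.read_csv("data/annotations/train1_boxes.csv")  # columns: filename,ymin,xmin,ymax,xmax
for row in df.itertuples(index=False):
    w, h = Image.open(IMG_DIR / row.filename).size
    xc = (row.xmin + row.xmax) / 2 / w          # normalized box centre
    yc = (row.ymin + row.ymax) / 2 / h
    bw = (row.xmax - row.xmin) / w              # normalized width / height
    bh = (row.ymax - row.ymin) / h
    out = OUT_DIR / f"{Path(row.filename).stem}.txt"
    out.write_text(f"0 {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}\n")  # class 0 = plate
```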
Albumentations pipeline
```python
A.Compose([
    A.LongestMaxSize(max_size=640, interpolation=1),
    A.PadIfNeeded(640, 640, border_mode=cv2.BORDER_CONSTANT, value=(114, 114, 114)),
    A.RandomBrightnessContrast(0.2, 0.2, p=0.5),
    A.MotionBlur(blur_limit=5, p=0.2),
    A.ShiftScaleRotate(scale_limit=0.15, rotate_limit=5, shift_limit=0.1, border_mode=0, p=0.5),
    A.RandomFog(p=0.1),
    A.HorizontalFlip(p=0.02)  # low prob; most plates are symmetric-safe
], bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']))
```
Save to `src/detector/augment.py`.
Crop additional plates: for each Training Set 1 image, crop the annotated bounding box, expand it by 5 px per side, and save the result as a PNG to `data/images/plates_extra/`.
Perspective rectification (optional): detect four corners via minimal-area rectangle and warp to 1:4 aspect.
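One possible OpenCV implementation of this optional step; the corner-ordering logic and the 256×64 output size are assumptions:

```python
# Sketch: find the plate quadrilateral via a minimal-area rectangle and warp it to a flat canvas.
import cv2
import numpy as np

def rectify_plate(plate_bgr, out_w=256, out_h=64):
    gray = cv2.cvtColor(plate_bgr, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return plate_bgr                                  # fall back to the raw crop
    box = cv2.boxPoints(cv2.minAreaRect(max(contours, key=cv2.contourArea)))
    box = box[np.argsort(box[:, 1])]                      # sort by y: top pair first
    tl, tr = sorted(box[:2], key=lambda p: p[0])
    bl, br = sorted(box[2:], key=lambda p: p[0])
    src = np.float32([tl, tr, br, bl])
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(plate_bgr, M, (out_w, out_h))
```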
Albumentations OCR transform: `RandomBrightnessContrast`, `ImageCompression` (quality 30–60), `Sharpen`, `GaussianNoise`. Implement in `src/ocr/augment.py`.
Dataset YAML (`detector/data.yaml`):

```yaml
path: ../../data/images
train: train1
val: val1
nc: 1
names: ['plate']
```
Training command
```bash
yolo detect train \
  model=yolov8n.pt \
  data=src/detector/data.yaml \
  epochs=120 \
  imgsz=640 \
  batch=16 \
  lr0=0.01 \
  optimizer=SGD \
  hsv_h=0.015 hsv_s=0.7 hsv_v=0.4 mosaic=1.0 mixup=0.0 \
  project=outputs/detector \
  name=v8n_640
```
Validation metric: monitor `val/box/mAP50-95`. Target ≥ 0.90.
Export the best checkpoint to `weights/detector.pt`.
Tokenizer: map each char in `chars.txt` to an id; blank = 0.
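A minimal sketch of that tokenizer, assuming `chars.txt` stores the vocabulary as a single line of characters:

```python
# Sketch: CTC tokenizer built from chars.txt; index 0 is reserved for the CTC blank.
from pathlib import Path

chars = Path("data/annotations/chars.txt").read_text().strip()   # path/format are assumptions
char2id = {c: i + 1 for i, c in enumerate(chars)}                # ids 1..N
id2char = {i: c for c, i in char2id.items()}
BLANK_ID = 0

def encode(text: str) -> list[int]:
    return [char2id[c] for c in text]

def decode_greedy(ids: list[int]) -> str:
    # standard CTC greedy decoding: collapse repeats, then drop blanks
    out, prev = [], None
    for i in ids:
        if i != prev and i != BLANK_ID:
            out.append(id2char[i])
        prev = i
    return "".join(out)
```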
Model:

```python
import torch.nn as nn
import torchvision

class CRNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        backbone = torchvision.models.resnet18(weights='IMAGENET1K_V1')
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])   # drop avgpool + fc
        self.rnn = nn.LSTM(512, 256, num_layers=2, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(512, num_classes)                # 2 x 256 -> vocab incl. blank

    def forward(self, x):
        feats = self.cnn(x)                          # (B, 512, H', W')
        feats = feats.mean(dim=2).permute(0, 2, 1)   # collapse height -> (B, W', 512)
        seq, _ = self.rnn(feats)                     # (B, W', 512)
        return self.classifier(seq)                  # (B, W', num_classes)
```
Loss: `torch.nn.CTCLoss(blank=0, zero_infinity=True)`.
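A short sketch of how this loss could be wired to the CRNN outputs above; the helper name and batching details are assumptions:

```python
# Sketch: compute CTC loss from (B, T, C) logits produced by the CRNN.
import torch
import torch.nn.functional as F

criterion = torch.nn.CTCLoss(blank=0, zero_infinity=True)

def ctc_step(logits, targets, target_lengths):
    """logits: (B, T, C); targets: (B, S_max) padded label ids; target_lengths: (B,)."""
    log_probs = F.log_softmax(logits, dim=2).permute(1, 0, 2)            # CTCLoss expects (T, B, C)
    input_lengths = torch.full((logits.size(0),), logits.size(1), dtype=torch.long)
    return criterion(log_probs, targets, input_lengths, target_lengths)
```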
Training loop (trainer script `src/ocr/train_crnn.py`): use automatic mixed precision (`torch.cuda.amp`) and checkpoint on the best `val_plate_char_acc`.

Alternative OCR model (TrOCR):

```bash
pip install transformers>=4.40 sentencepiece
```

`from transformers import TrOCRProcessor, VisionEncoderDecoderModel` – load `microsoft/trocr-base-handwritten`.

End-to-end inference pipeline (`src/pipeline.py`)

Load `detector.pt`, perform NMS at `conf=0.25, iou=0.45`.
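One way the detection step could look with the ultralytics API; the paths and the loop over crops are assumptions:

```python
# Sketch: run the trained detector over the test folder and collect plate boxes.
from ultralytics import YOLO

detector = YOLO("weights/detector.pt")
results = detector.predict(source="data/images/test", conf=0.25, iou=0.45, verbose=False)
for r in results:
    boxes = r.boxes.xyxy.cpu().numpy()   # one (x1, y1, x2, y2) row per detected plate
    # each crop is then handed to the OCR model as shown below
```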
For each prediction:
```python
x1, y1, x2, y2 = scale_coords(...)
plate_img = letterbox_crop(frame, x1, y1, x2, y2, pad=5)
plate_img = rectify_if_needed(plate_img)
text_pred = ocr_model(plate_img)   # greedy or beam-search decode
```
Post-processing: validate each predicted string against the regex `^[A-Z0-9]{5,10}$`; if it fails, attempt autocorrect with `rapidfuzz.process.extractOne` using training-set strings as the dictionary.

Output format – produce `submission.csv`:
```text
filename,xmin,ymin,xmax,ymax,plate_text
0001.jpg,123,45,312,108,KA01AB1234
...
```
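A sketch of the post-processing step described above; the helper name, dictionary source, and score threshold are assumptions:

```python
# Hypothetical helper: regex check, then fuzzy fallback against known training-set plates.
import re
from rapidfuzz import process

PLATE_RE = re.compile(r"^[A-Z0-9]{5,10}$")

def postprocess(pred: str, known_plates: list[str]) -> str:
    pred = pred.upper().replace(" ", "")
    if PLATE_RE.match(pred):
        return pred
    match = process.extractOne(pred, known_plates)        # (best_string, score, index) or None
    return match[0] if match and match[1] >= 80 else pred  # score cutoff 80 is an assumption
```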
Evaluation (`evaluate.py`): compute detection mAP with `ultralytics.utils.metrics`; compute `char_accuracy = (# correct characters) / (# total characters)` and `plate_accuracy = (# perfectly decoded plates) / (# plates)`; report both `mAP` and `plate_accuracy`.
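A possible realization of the two recognition metrics, using edit distance to count correct characters (rapidfuzz is already in the requirements; the exact counting convention is an assumption):

```python
# Sketch of the recognition metrics used by evaluate.py.
from rapidfuzz.distance import Levenshtein

def recognition_metrics(preds: list[str], targets: list[str]) -> tuple[float, float]:
    """preds/targets: parallel lists of plate strings."""
    total_chars = sum(len(t) for t in targets)
    char_errors = sum(Levenshtein.distance(p, t) for p, t in zip(preds, targets))
    char_accuracy = 1 - char_errors / max(total_chars, 1)
    plate_accuracy = sum(p == t for p, t in zip(preds, targets)) / max(len(targets), 1)
    return char_accuracy, plate_accuracy
```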
`requirements.txt` – freeze via `pip-compile`.
Dockerfile
```dockerfile
FROM nvidia/cuda:12.0.1-cudnn8-runtime-ubuntu22.04
WORKDIR /workspace/lp-recog
COPY . .
RUN apt-get update && apt-get install -y tesseract-ocr python3-pip python-is-python3
RUN pip install -r requirements.txt
CMD ["python", "src/pipeline.py", "--images", "data/images/test", "--output", "outputs/predictions"]
```
`README.md` – include:
| Phase | Duration |
|---|---|
| 0 Scaffold & env | 0.5 h |
| 1 Ingestion & split | 1 h |
| 2 EDA | 1 h |
| 3 Augment modules | 1 h |
| 4 Detector training | 2 h (YOLOv8n) |
| 5 OCR fine-tune | 1.5 h (CRNN) |
| 6 Pipeline + eval | 1 h |
| 7 Write-up | 0.5 h |
| Total | ~8.5 h |
Deliverables:
- `submission.csv` with predictions on the 201 test images
- `weights/detector.pt` and `weights/ocr.pt`
- `evaluate.py` + metrics report (printout or markdown)

Follow the above roadmap exactly and you’ll cover every evaluation bullet: data understanding → exploration → modelling → recognition accuracy. Good luck!