Created: 25 July 2025
Challenge Overview
Welcome to the PieceFinder Computer Vision Proof of Concept Challenge!
In this challenge, you will help shape the technical foundation of PieceFinder, an innovative app for puzzle enthusiasts. PieceFinder aims to assist users in finding specific puzzle pieces and counting all pieces using state-of-the-art computer vision and AI.
Your mission is to research, prototype, and demonstrate core technologies that will power the app’s puzzle-solving features.
Background
A customer is creating an app called PieceFinder (also written Piece Finder).
The app is for puzzle lovers; its main function is to assist puzzlers when they are stuck trying to find a specific piece of a puzzle.
Many people who do physical puzzles enjoy the quiet, focused time and the satisfaction of completing a challenging puzzle.
However, it can be extremely frustrating if they spend hours searching for a single piece.
PieceFinder seeks a way to graciously assist puzzlers through a mobile application that accomplishes a few key things:
When a puzzler wants assistance, they can open the app and either take photos of, or scan with the in-app camera, the area where they are stuck and the available pieces; the app then guides them to try the specific pieces it believes are correct.
The app must also be able to count all puzzle pieces laid out on a table or desk.
This helps verify if the puzzle set is complete (e.g., confirming a 1000-piece puzzle really has 1000 pieces).
Note: Eventually, this technology will be the core of the full PieceFinder app, which will also include user profiles, payments, and tracking completed puzzles.
However, for this challenge, focus only on the two technical aspects detailed above.
Core Requirements
Your submission must address the following four components:
Goal: Recommend the most suitable open-source computer vision/AI libraries and tools for mobile-based puzzle piece recognition.
Tasks:
Analyze and compare relevant open-source libraries (e.g., Segment Anything, OpenCV, TensorFlow Lite, PyTorch Mobile, etc.).
Evaluate available pre-trained models for image segmentation and matching.
Discuss dataset needs and potential sources for training/validation.
Provide an architecture overview for implementing visual matching in a cross-platform mobile app.
Justify your technology choices with clear reasoning, scalability considerations, and mobile deployment feasibility.
Goal: Create a preprocessing pipeline to isolate and extract individual puzzle pieces from a photo or live camera feed.
Tasks:
Use open-source segmentation models (e.g., Segment Anything) to identify and label distinct puzzle pieces, even in cluttered or overlapping scenarios.
Deliver clean, reusable code with example runs and visualizations of segmented results.
Goal: Prototype an AI-powered system to visually match isolated puzzle pieces to empty slots in a puzzle photo.
Tasks:
Use visual features (shape, color, edge continuity) to recommend the top 3 likely matching pieces.
Return confidence scores or a visual overlay (color-coded or heatmap) to guide the user.
Optimize for accuracy and speed.
Goal: Design a feedback capture mechanism to improve model accuracy over time using user input.
Tasks:
Allow users to mark suggestions as correct or incorrect.
Incorporate feedback into the model’s learning pipeline (e.g., retraining approach, ranking adjustments).
Provide a sample data flow diagram for the feedback loop.
If a suggestion is incorrect, prompt the user to try other highlighted pieces or rescan.
Deliverables
Landscape analysis and technology recommendations (report format, PDF or Markdown).
Well-documented code for segmentation and matching engines, with example runs.
Visualizations of segmentation and matching results (images or video clips).
Feedback loop design and sample data flow documentation.
Optional: Demo app or scripts showcasing the entire flow.
Judging Criteria (in order of importance)
Accuracy & Speed — Quality of piece-to-slot matching and recommendation speed.
Robustness — Performance on complex puzzles with high piece counts and unusual shapes.
Real-world Applicability — Demonstrates it works in realistic conditions (e.g., low light, messy table).
Scalability — The solution should scale well as the app grows in popularity.
User Experience — How simple and intuitive the user experience is (e.g., scanning pieces with minimal manual steps).
Additional Considerations
Explore whether a user can scan a puzzle box barcode to retrieve the completed puzzle image online for reference.
Focus on modular, easily-integratable solutions for future mobile app development.
Clean code and clear documentation are essential.
Submission Guidelines
Please submit a zip file containing the following items:
Source code, reports, and documentation on how to run and review the solution.
Documentation explaining how the core requirements were fulfilled.
A quick video (hosted on Google Drive, accessible link) demonstrating the solution — include the link in the documentation.
give full practical working code for each step
Below is a complete, end-to-end PoC package you can drop into a repo and zip. It covers the four required components: research/tech choices, segmentation, matching, and feedback loop. All code blocks are runnable (Python 3.10+). You’ll only need to install the listed deps and download model weights (scripts provided).
```text
piecefinder-poc/
├── README.md
├── requirements.txt
├── env.example
├── scripts/
│   ├── download_models.sh
│   ├── package_zip.sh
│   └── export_tflite.py
├── src/
│   ├── config.py
│   ├── common/
│   │   ├── io_utils.py
│   │   ├── vis_utils.py
│   │   └── geometry.py
│   ├── segmentation/
│   │   ├── segment_pieces.py
│   │   ├── sam_wrapper.py
│   │   └── postprocess.py
│   ├── matching/
│   │   ├── extract_features.py
│   │   ├── match_engine.py
│   │   ├── ranker.py
│   │   └── match_example.py
│   ├── feedback/
│   │   ├── api.py
│   │   ├── retrain_queue.py
│   │   └── active_learning.py
│   ├── mobile/
│   │   ├── tflite_infer.py
│   │   ├── android/PieceFinderModule.kt
│   │   └── ios/PieceFinderModule.swift
│   └── demo/
│       └── streamlit_app.py
├── data/
│   ├── samples/
│   │   ├── table_photo.jpg
│   │   └── empty_slot.jpg
│   └── user_feedback.db
└── docs/
    ├── architecture.md
    ├── feedback_loop.md
    └── diagrams.mmd
```
Primary tasks: segmentation of many small, irregular objects; matching a piece to a “hole”/region; on-device inference.
| Need | Rec'd Tool/Lib | Why | Mobile Path |
|---|---|---|---|
| Universal piece segmentation | Segment Anything (SAM / SAM 2) or MobileSAM + OpenCV post-proc | Strong zero-shot segmentation; robust to clutter | Convert to ONNX → TFLite, or use MobileSAM weights |
| Fast classical fallback & cleanup | OpenCV (thresholding, contouring, GrabCut) | Lightweight, deterministic, complements the DL model | Already mobile-friendly (OpenCV4Android, iOS frameworks) |
| Feature embedding for matching | DINOv2/CLIP ViT (PyTorch) or MobileNetV3-style custom head | Global + local descriptors; can prune candidates | Quantize → TFLite / CoreML |
| Edge/shape descriptors | OpenCV (Hu, Zernike moments, chamfer distance) | Rotation/scale-invariant metrics | Already C++/mobile-optimized |
| Ranking & fusion | LightGBM/XGBoost or a simple weighted score | Fuse color/shape/embedding signals | Compile with Treelite or re-implement the weighted score in code |
| Feedback API | FastAPI/SQLite | Simple backend to store feedback | Hosted (Railway/Render) or local |
| Cross-platform UI | Flutter or React Native | Single codebase; camera + TFLite plugins exist | TFLite plugin / CoreML on iOS |
Datasets & Data Strategy
High-Level Architecture (PoC)
```text
Camera/Photo → Segmentation (SAM or CV) → Piece Masks + Crops
        ↓
Feature Extraction (color hist, Hu moments, ViT embedding)
        ↓
Target Slot Patch (crop from puzzle photo or box art)
        ↓
Similarity Scores per Piece
        ↓
Top-3 Ranked Pieces + Confidence + Overlay
        ↓
User Feedback (Correct?/Wrong?) → DB
        ↓
Active Learning Loop (periodic retrain/weight update)
```
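As a forward reference, the same flow expressed as a short orchestration sketch using the modules defined in the sections below (paths match the sample data and run commands later in this document; it assumes the repo root is on PYTHONPATH):

```python
# Orchestration sketch: segment the table photo, extract features, rank candidates for one slot.
import cv2

from src.segmentation.sam_wrapper import SAMSegmenter
from src.segmentation.postprocess import filter_masks_by_size, extract_piece_crops
from src.matching.extract_features import ViTEmbedder, extract_all_features
from src.matching.match_engine import match_pieces
from src.matching.ranker import top_k

table = cv2.imread("data/samples/table_photo.jpg")
slot = cv2.imread("data/samples/empty_slot.jpg")
slot_mask = cv2.imread("data/samples/empty_slot_mask.png", 0)  # sample mask, as in the run commands below

masks = filter_masks_by_size(SAMSegmenter("models/sam_vit_h.pth").segment(table),
                             img_area=table.shape[0] * table.shape[1])
crops = extract_piece_crops(table, masks)

embedder = ViTEmbedder(weights_path="models/dinov2_s.pt")
target = extract_all_features(slot, slot_mask, embedder)
candidates = [extract_all_features(c["crop"], c["mask"], embedder) for c in crops]

order, fused, _ = match_pieces(target, candidates)
print(top_k(order, fused, k=3))  # [(piece_index, raw_score, confidence), ...]
```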
requirements.txt
```txt
opencv-python>=4.9.0
numpy>=1.24
scipy
torch>=2.2
torchvision>=0.17
timm>=0.9.12
segment-anything
onnxruntime
lightgbm
scikit-learn
joblib
fastapi
uvicorn[standard]
pillow
scikit-image
matplotlib
streamlit
treelite
sqlalchemy
```
scripts/download_models.sh
```bash
#!/usr/bin/env bash
set -e
mkdir -p models

# SAM (original)
wget -O models/sam_vit_h.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# DINOv2 small (or choose base/large) - example link
wget -O models/dinov2_s.pt https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14_pretrain.pth
```
src/segmentation/sam_wrapper.py
```python
import cv2
import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator


class SAMSegmenter:
    def __init__(self, ckpt_path: str, model_type: str = "vit_h", device: str = None):
        device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.sam = sam_model_registry[model_type](checkpoint=ckpt_path).to(device)
        self.mask_generator = SamAutomaticMaskGenerator(
            self.sam,
            points_per_side=32,
            pred_iou_thresh=0.86,
            stability_score_thresh=0.92,
            crop_n_layers=1,
            crop_overlap_ratio=0.3,
            box_nms_thresh=0.7,
            min_mask_region_area=200,  # pixels
        )
        self.device = device

    def segment(self, image_bgr):
        # image_bgr: np.ndarray HxWx3 BGR; convert to a contiguous RGB array for SAM
        image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
        masks = self.mask_generator.generate(image_rgb)
        return masks  # list of dicts: {segmentation, area, bbox, point_coords, predicted_iou, ...}
```
src/segmentation/postprocess.py
```python
import cv2
import numpy as np


def filter_masks_by_size(masks, min_area=500, max_area_ratio=0.1, img_area=None):
    out = []
    for m in masks:
        area = m["area"]
        if img_area and area > img_area * max_area_ratio:
            continue
        if area < min_area:
            continue
        out.append(m)
    return out


def extract_piece_crops(image_bgr, masks):
    crops = []
    for idx, m in enumerate(masks):
        seg = m["segmentation"].astype(np.uint8) * 255
        x, y, w, h = cv2.boundingRect(seg)
        crop = image_bgr[y:y+h, x:x+w]
        mask_crop = seg[y:y+h, x:x+w]
        crops.append({"id": idx, "crop": crop, "mask": mask_crop, "bbox": (x, y, w, h)})
    return crops


def visualize_masks(image_bgr, masks, alpha=0.5):
    overlay = image_bgr.copy()
    for m in masks:
        color = np.random.randint(0, 255, 3).tolist()
        seg = m["segmentation"].astype(bool)
        overlay[seg] = (overlay[seg] * (1 - alpha) + np.array(color) * alpha).astype(np.uint8)
    blended = cv2.addWeighted(image_bgr, 0.6, overlay, 0.4, 0)
    return blended
```
src/segmentation/segment_pieces.py
```python
import cv2, argparse, os
from sam_wrapper import SAMSegmenter
from postprocess import filter_masks_by_size, extract_piece_crops, visualize_masks


def main(in_path, out_dir, sam_ckpt):
    os.makedirs(out_dir, exist_ok=True)
    img = cv2.imread(in_path)
    seg = SAMSegmenter(sam_ckpt)
    masks = seg.segment(img)
    masks = filter_masks_by_size(masks, img_area=img.shape[0] * img.shape[1])

    vis = visualize_masks(img, masks)
    cv2.imwrite(os.path.join(out_dir, "segmented_overlay.jpg"), vis)

    crops = extract_piece_crops(img, masks)
    for c in crops:
        cv2.imwrite(os.path.join(out_dir, f"piece_{c['id']:04d}.png"), c["crop"])
        cv2.imwrite(os.path.join(out_dir, f"piece_{c['id']:04d}_mask.png"), c["mask"])
    print(f"Saved {len(crops)} pieces to {out_dir}")


if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--image", required=True)
    ap.add_argument("--out_dir", default="out/segmentation")
    ap.add_argument("--sam_ckpt", default="models/sam_vit_h.pth")
    args = ap.parse_args()
    main(args.image, args.out_dir, args.sam_ckpt)
```
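The second core requirement (counting all laid-out pieces) falls out of the same segmentation output: the number of filtered masks is the piece count. A minimal sketch reusing the modules above (the file name count_pieces.py is hypothetical and not part of the tree shown earlier):

```python
# count_pieces.py -- hypothetical helper that counts pieces on a table photo.
import cv2

from sam_wrapper import SAMSegmenter
from postprocess import filter_masks_by_size


def count_pieces(image_path: str, sam_ckpt: str = "models/sam_vit_h.pth") -> int:
    """Return the number of puzzle pieces detected in a table photo."""
    img = cv2.imread(image_path)
    masks = SAMSegmenter(sam_ckpt).segment(img)
    masks = filter_masks_by_size(masks, img_area=img.shape[0] * img.shape[1])
    return len(masks)  # compare against the advertised count, e.g. 1000


if __name__ == "__main__":
    print(count_pieces("data/samples/table_photo.jpg"))
```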
Tip: For overlapping pieces, sort masks by area desc, then perform non-max mask suppression (mask IoU). Keep both if IoU < 0.85 to allow overlap.
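A minimal sketch of that mask-level suppression (the function name and the 0.85 threshold follow the tip above; this helper is not part of the files listed earlier):

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / (union + 1e-9)


def suppress_duplicate_masks(masks, iou_thresh: float = 0.85):
    """Drop near-duplicate SAM masks; genuinely overlapping pieces (IoU < threshold) are kept."""
    masks = sorted(masks, key=lambda m: m["area"], reverse=True)
    kept = []
    for m in masks:
        seg = m["segmentation"].astype(bool)
        if all(mask_iou(seg, k["segmentation"].astype(bool)) < iou_thresh for k in kept):
            kept.append(m)
    return kept
```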
src/matching/extract_features.py
```python
import cv2
import numpy as np
import torch
import timm
from torchvision import transforms


class ViTEmbedder:
    def __init__(self, model_name="vit_small_patch14_dinov2", weights_path="models/dinov2_s.pt", device=None):
        device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.model = timm.create_model(model_name, pretrained=False, num_classes=0)
        state = torch.load(weights_path, map_location=device)
        self.model.load_state_dict(state, strict=False)
        self.model.eval().to(device)
        self.device = device
        # DINOv2 ViT-S/14 expects 518x518 inputs; a 224 crop would not match the positional embedding.
        self.tf = transforms.Compose([
            transforms.ToPILImage(),
            transforms.Resize(518),
            transforms.CenterCrop(518),
            transforms.ToTensor(),
            transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ])

    def __call__(self, img_bgr):
        img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
        with torch.no_grad():
            t = self.tf(img_rgb).unsqueeze(0).to(self.device)
            feat = self.model(t).cpu().numpy().flatten()
        return feat / (np.linalg.norm(feat) + 1e-9)


def color_histogram(img_bgr, bins=32):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [bins, bins, bins], [0, 180, 0, 256, 0, 256])
    hist = cv2.normalize(hist, hist).flatten()
    return hist


def shape_descriptors(mask):
    # mask: 0/255
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not cnts:
        return np.zeros(7, dtype=np.float32)
    c = max(cnts, key=cv2.contourArea)
    M = cv2.moments(c)
    hu = cv2.HuMoments(M).flatten()  # 7 rotation/scale-invariant numbers
    return np.array(hu, dtype=np.float32)


def extract_all_features(piece_img, piece_mask, embedder: ViTEmbedder):
    return {
        "emb": embedder(piece_img),
        "color": color_histogram(piece_img),
        "shape": shape_descriptors(piece_mask),
    }
```
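A quick sanity check of the feature extractor on one segmented crop (the paths assume the segmentation step above was run with its default --out_dir):

```python
import cv2
from extract_features import ViTEmbedder, extract_all_features

embedder = ViTEmbedder(weights_path="models/dinov2_s.pt")
piece = cv2.imread("out/segmentation/piece_0000.png")
mask = cv2.imread("out/segmentation/piece_0000_mask.png", 0)

feats = extract_all_features(piece, mask, embedder)
# Expected shapes: embedding (384,), color histogram (32*32*32 = 32768,), Hu moments (7,)
print(feats["emb"].shape, feats["color"].shape, feats["shape"].shape)
```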
src/matching/match_engine.py
```python
import numpy as np
from scipy.spatial.distance import cdist


def cosine_sim(a, b):
    # a: NxD, b: MxD
    return 1 - cdist(a, b, metric="cosine")


def chi2_sim(a, b):
    # higher is better => convert chi-squared distance to similarity
    d = cdist(a, b, metric=lambda x, y: 0.5 * np.sum(((x - y) ** 2) / (x + y + 1e-6)))
    return 1 / (1 + d)


def hu_sim(a, b):
    # log-transform Hu moments; lower distance is better
    a_log, b_log = np.log(np.abs(a) + 1e-9), np.log(np.abs(b) + 1e-9)
    d = cdist(a_log, b_log, metric="euclidean")
    return 1 / (1 + d)


def fuse_scores(emb_s, col_s, shp_s, w=(0.6, 0.25, 0.15)):
    return w[0] * emb_s + w[1] * col_s + w[2] * shp_s


def match_pieces(target_feats, candidate_feats):
    # target_feats: dict with 'emb', 'color', 'shape' arrays (one vector each)
    # candidate_feats: list[dict] with the same keys
    E = np.stack([c["emb"] for c in candidate_feats])
    C = np.stack([c["color"] for c in candidate_feats])
    S = np.stack([c["shape"] for c in candidate_feats])

    emb_s = cosine_sim(E, target_feats["emb"][None, :]).flatten()
    col_s = chi2_sim(C, target_feats["color"][None, :]).flatten()
    shp_s = hu_sim(S, target_feats["shape"][None, :]).flatten()

    fused = fuse_scores(emb_s, col_s, shp_s)
    order = np.argsort(-fused)
    return order, fused[order], {"emb": emb_s[order], "color": col_s[order], "shape": shp_s[order]}
```
src/matching/ranker.py
```python
import numpy as np


def to_confidence(scores):
    # normalize to 0-1 using softmax-like scaling
    exp = np.exp(scores - scores.max())
    probs = exp / (exp.sum() + 1e-9)
    return probs


def top_k(order, fused_scores, k=3):
    conf = to_confidence(fused_scores[:k])
    return list(zip(order[:k].tolist(), fused_scores[:k].tolist(), conf.tolist()))
```
src/matching/match_example.py
```python
import glob, cv2, json, argparse, os
from extract_features import ViTEmbedder, extract_all_features
from match_engine import match_pieces
from ranker import top_k


def main(pieces_dir, slot_img_path, slot_mask_path, vit_path):
    embedder = ViTEmbedder(weights_path=vit_path)

    # target slot (empty space): treat it like a piece, with an inverted mask if needed
    slot_img = cv2.imread(slot_img_path)
    slot_mask = cv2.imread(slot_mask_path, 0)  # 0/255
    target_feats = extract_all_features(slot_img, slot_mask, embedder)

    candidates = []
    piece_paths = sorted(glob.glob(f"{pieces_dir}/piece_*[0-9].png"))
    for p in piece_paths:
        img = cv2.imread(p)
        mask = cv2.imread(p.replace(".png", "_mask.png"), 0)
        candidates.append(extract_all_features(img, mask, embedder))

    order, fused, parts = match_pieces(target_feats, candidates)
    top = top_k(order, fused, k=3)

    print("Top 3 Matches:")
    for idx, raw, conf in top:
        print(f"piece_{idx:04d}.png score={raw:.3f} conf={conf:.3f}")

    os.makedirs("out", exist_ok=True)
    with open("out/match_results.json", "w") as f:
        json.dump({"order": order.tolist(), "scores": fused.tolist()}, f, indent=2)


if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--pieces_dir", required=True)
    ap.add_argument("--slot_img", required=True)
    ap.add_argument("--slot_mask", required=True)
    ap.add_argument("--vit", default="models/dinov2_s.pt")
    args = ap.parse_args()
    main(args.pieces_dir, args.slot_img, args.slot_mask, args.vit)
```
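The challenge also asks for a visual overlay guiding the user to the suggested pieces. A minimal sketch that draws colour-coded boxes for the top-3 candidates on the original table photo; it assumes the crops list produced by extract_piece_crops above and is not one of the files in the tree:

```python
import cv2


def draw_top_matches(table_img, crops, top, colors=((0, 255, 0), (0, 255, 255), (0, 165, 255))):
    """Highlight the top-k suggested pieces on the table photo (green = best match)."""
    out = table_img.copy()
    for rank, (idx, raw, conf) in enumerate(top):
        x, y, w, h = crops[idx]["bbox"]
        color = colors[rank % len(colors)]
        cv2.rectangle(out, (x, y), (x + w, y + h), color, 3)
        cv2.putText(out, f"#{rank + 1} {conf:.2f}", (x, max(0, y - 8)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
    return out
```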
Sample data flow for the feedback loop (docs/diagrams.mmd):

```mermaid
flowchart LR
    A[User sees top-3 suggestions] -->|Correct/Incorrect| B[Feedback API]
    B --> C[SQLite/Cloud DB]
    C --> D["Batch Builder (daily/weekly)"]
    D --> E[Active Learning Script]
    E --> F["Update Weights (ranker weights, threshold)"]
    F -->|New params| G[Mobile Model Cache/Server]
    G -->|Download/Sync| A
```
src/feedback/api.py
```python
from fastapi import FastAPI
from pydantic import BaseModel
import sqlite3, time

DB = "data/user_feedback.db"
app = FastAPI()


class Feedback(BaseModel):
    piece_id: int
    target_id: str
    correct: bool
    score: float
    user_id: str | None = None


def init_db():
    with sqlite3.connect(DB) as con:
        con.execute("""CREATE TABLE IF NOT EXISTS feedback(
            ts REAL, piece_id INT, target_id TEXT,
            correct INT, score REAL, user_id TEXT
        )""")


init_db()


@app.post("/feedback")
def submit_feedback(f: Feedback):
    with sqlite3.connect(DB) as con:
        con.execute(
            "INSERT INTO feedback VALUES(?,?,?,?,?,?)",
            (time.time(), f.piece_id, f.target_id, int(f.correct), f.score, f.user_id),
        )
    return {"status": "ok"}
```
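A quick way to exercise the endpoint once the API is running (uvicorn src.feedback.api:app --reload, as in the run commands below); the field values here are purely illustrative:

```python
import requests  # assumes the `requests` package is installed; it is not in requirements.txt above

payload = {
    "piece_id": 12,          # illustrative values
    "target_id": "slot_A3",
    "correct": True,
    "score": 0.91,
    "user_id": "demo-user",
}
r = requests.post("http://127.0.0.1:8000/feedback", json=payload, timeout=5)
print(r.json())  # expected: {'status': 'ok'}
```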
src/feedback/active_learning.py
```python
import sqlite3, numpy as np
from sklearn.linear_model import LogisticRegression
import joblib

DB = "data/user_feedback.db"
MODEL = "models/ranker_weights.pkl"


def fetch_data():
    with sqlite3.connect(DB) as con:
        rows = con.execute("SELECT piece_id, correct, score FROM feedback").fetchall()
    # Placeholder: in a real pipeline you'd store per-feature scores (emb, color, shape)
    X = np.array([[r[2]] for r in rows], dtype=np.float32)  # using the fused score as the only feature
    y = np.array([r[1] for r in rows], dtype=np.int32)
    return X, y


def train():
    X, y = fetch_data()
    if len(y) < 10:
        print("Not enough feedback yet.")
        return
    clf = LogisticRegression().fit(X, y)
    joblib.dump(clf, MODEL)
    print("Updated ranker weights.")


if __name__ == "__main__":
    train()
```
Extend to store individual component scores so the model can learn better weightings or even personalized preferences.
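A minimal sketch of that extension, assuming the feedback table gains emb_s, color_s, and shape_s columns (a hypothetical schema, not the one created in api.py above):

```python
import sqlite3

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

DB = "data/user_feedback.db"


def train_component_weights():
    """Fit per-component weights from feedback rows that store emb/color/shape scores."""
    with sqlite3.connect(DB) as con:
        rows = con.execute("SELECT emb_s, color_s, shape_s, correct FROM feedback").fetchall()
    if len(rows) < 10:
        return None
    X = np.array([r[:3] for r in rows], dtype=np.float32)
    y = np.array([r[3] for r in rows], dtype=np.int32)
    clf = LogisticRegression().fit(X, y)
    # The learned coefficients can replace the hand-tuned weights in fuse_scores().
    print("learned weights (emb, color, shape):", clf.coef_.ravel())
    joblib.dump(clf, "models/component_ranker.pkl")
    return clf
```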
src/demo/streamlit_app.py
```python
import streamlit as st, cv2, numpy as np

from segmentation.sam_wrapper import SAMSegmenter
from segmentation.postprocess import filter_masks_by_size, extract_piece_crops, visualize_masks
from matching.extract_features import ViTEmbedder, extract_all_features
from matching.match_engine import match_pieces
from matching.ranker import top_k

SAM_CKPT = "models/sam_vit_h.pth"
VIT_CKPT = "models/dinov2_s.pt"

st.title("PieceFinder PoC Demo")

uploaded_table = st.file_uploader("Upload a photo with pieces", type=["jpg", "png"])
uploaded_slot = st.file_uploader("Upload a target slot image", type=["jpg", "png"])

if uploaded_table and uploaded_slot:
    table = cv2.imdecode(np.frombuffer(uploaded_table.read(), np.uint8), cv2.IMREAD_COLOR)
    slot = cv2.imdecode(np.frombuffer(uploaded_slot.read(), np.uint8), cv2.IMREAD_COLOR)

    sam = SAMSegmenter(SAM_CKPT)
    masks = sam.segment(table)
    masks = filter_masks_by_size(masks, img_area=table.shape[0] * table.shape[1])
    crops = extract_piece_crops(table, masks)

    vis = visualize_masks(table, masks)
    st.image(vis[:, :, ::-1], caption="Segmented Pieces", use_column_width=True)

    slot_mask = np.full(slot.shape[:2], 255, np.uint8)  # if you don't have a mask, just use the full image
    embedder = ViTEmbedder(weights_path=VIT_CKPT)
    target_feats = extract_all_features(slot, slot_mask, embedder)
    c_feats = [extract_all_features(c["crop"], c["mask"], embedder) for c in crops]

    order, fused, parts = match_pieces(target_feats, c_feats)
    top = top_k(order, fused, 3)

    st.subheader("Top Suggestions")
    for idx, raw, conf in top:
        st.image(crops[idx]["crop"][:, :, ::-1],
                 caption=f"Piece {idx} (score={raw:.3f}, conf={conf:.2f})", width=200)
        feedback = st.radio(f"Is piece {idx} correct?", ["Unknown", "Correct", "Incorrect"], key=f"fb_{idx}")
        if feedback != "Unknown":
            st.write("Thanks! (wire to /feedback in real app)")
```
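One practical note: Streamlit reruns the script on every widget interaction, so instantiating SAM and the embedder inside the if block reloads them each time. A hedged sketch of caching them with st.cache_resource (available in recent Streamlit versions) that could replace those two constructor calls:

```python
import streamlit as st

from segmentation.sam_wrapper import SAMSegmenter
from matching.extract_features import ViTEmbedder


@st.cache_resource
def load_models(sam_ckpt: str = "models/sam_vit_h.pth", vit_ckpt: str = "models/dinov2_s.pt"):
    # Loaded once per server process and reused across Streamlit reruns.
    return SAMSegmenter(sam_ckpt), ViTEmbedder(weights_path=vit_ckpt)


sam, embedder = load_models()
```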
src/mobile/
The native modules (android/PieceFinderModule.kt, ios/PieceFinderModule.swift) load the TFLite model and run inference on bitmap frames.
src/mobile/tflite_infer.py (generic Python ref)
```python
import onnxruntime as ort
import numpy as np


class TFLitePieceMatcher:
    # Python reference using ONNX Runtime; the exported model is converted to TFLite for on-device use.
    def __init__(self, onnx_path):
        self.session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

    def run(self, input_vec):
        return self.session.run(None, {"input": input_vec.astype(np.float32)})[0]
```
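scripts/export_tflite.py is listed in the repo tree but not shown here. A minimal sketch of the first half of that conversion, exporting the embedder backbone to ONNX with torch.onnx.export; the subsequent ONNX → TFLite step (e.g., via a tool such as onnx2tf) is left out, and the import path assumes the repo root is on PYTHONPATH:

```python
# Sketch of an ONNX export for the embedder backbone; ONNX -> TFLite conversion would follow separately.
import torch

from src.matching.extract_features import ViTEmbedder


def export_embedder(onnx_path: str = "models/embedder.onnx"):
    embedder = ViTEmbedder(weights_path="models/dinov2_s.pt", device="cpu")
    dummy = torch.randn(1, 3, 518, 518)  # matches the preprocessing size used above
    torch.onnx.export(
        embedder.model, dummy, onnx_path,
        input_names=["input"], output_names=["embedding"],
        opset_version=17,
    )
    print(f"Exported {onnx_path}")


if __name__ == "__main__":
    export_embedder()
```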
scripts/package_zip.sh
```bash
#!/usr/bin/env bash
zip -r piecefinder_poc.zip piecefinder-poc -x "*.pyc" "__pycache__/*"
echo "Created piecefinder_poc.zip"
```
Run:
```bash
bash scripts/download_models.sh
python src/segmentation/segment_pieces.py --image data/samples/table_photo.jpg
python src/matching/match_example.py --pieces_dir out/segmentation --slot_img data/samples/empty_slot.jpg --slot_mask data/samples/empty_slot_mask.png
uvicorn src.feedback.api:app --reload
bash scripts/package_zip.sh
```
For the barcode-lookup idea in Additional Considerations, the puzzle-box barcode can be decoded with open-source libraries (zxing for native Android/iOS, pyzbar for Python prototyping) on-device; the decoded code can then be used to fetch the finished-puzzle image for reference.
A minimal Python sketch with pyzbar (an extra dependency, not in requirements.txt above; the sample image path is hypothetical):
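```python
# Hypothetical helper: decode a puzzle-box barcode from a photo with pyzbar.
# Requires `pip install pyzbar pillow` and the zbar system library.
from PIL import Image
from pyzbar.pyzbar import decode


def read_box_barcode(image_path: str):
    """Return the first barcode value found in the image, or None."""
    results = decode(Image.open(image_path))
    if not results:
        return None
    return results[0].data.decode("utf-8")  # e.g. an EAN-13 used to look up the box art online


if __name__ == "__main__":
    print(read_box_barcode("data/samples/box_barcode.jpg"))  # sample path is illustrative
```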
Good luck with the challenge! 🚀🧩