https://medium.com/@MostafaA.Shaban/yolov8-9-objec...
Created: March 3, 2026
Answered with GPT-5.2 Thinking (Chat01)
https://medium.com/@MostafaA.Shaban/yolov8-9-object-detection-in-flutter-with-tensorflow-lite-tflite-2024-70f1b93076cf — please follow this article and give me the steps to create the Flutter app and integrate it with my YOLO model. Here is the info for the model I converted to TFLite:

Input Shape : [1, 640, 640, 3]
Input Type : <class 'numpy.float32'>
Output Shape: [1, 7, 8400]
Output Type : <class 'numpy.float32'>
Input Quantization: (0.0, 0)
Output Quantization: (0.0, 0)
Below is a step-by-step Flutter setup that follows the Medium guide you linked (it uses flutter_vision + camera) and shows exactly what to change to use your YOLOv8 TFLite model (input [1, 640, 640, 3] float32, output [1, 7, 8400] float32).
Your output shape [1, 7, 8400] is the common YOLOv8 “raw head” format (4 box values + N class scores) across 8400 candidate locations (80×80 + 40×40 + 20×20). That means N = 7 − 4 = 3 classes, so your labels.txt should have 3 lines. (nvidia-isaac-ros.github.io)
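To make that concrete, here is a minimal NumPy sketch of decoding that raw head (random data stands in for real model output; flutter_vision does the equivalent decode plus NMS internally, so this is illustrative only):

```python
import numpy as np

# The three YOLOv8 detection grids for a 640x640 input:
assert 80 * 80 + 40 * 40 + 20 * 20 == 8400

num_classes = 3
# Simulated raw output with your model's shape [1, 4 + num_classes, 8400]
output = np.random.rand(1, 4 + num_classes, 8400).astype(np.float32)

# Transpose to [8400, 7]: each row is (cx, cy, w, h, score_0, score_1, score_2)
preds = output[0].T

boxes_cxcywh = preds[:, :4]   # box centers and sizes, in input-pixel units
class_scores = preds[:, 4:]   # one confidence per class

class_ids = class_scores.argmax(axis=1)
confidences = class_scores.max(axis=1)

# Keep candidates above a threshold; NMS would normally follow
keep = confidences > 0.5
print(preds.shape, int(keep.sum()), "candidates kept")
```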
```bash
flutter create yolo_flutter_app
cd yolo_flutter_app
```
Add these packages (the Medium guide uses flutter_vision + camera). (Read Medium articles with AI)
```yaml
dependencies:
  flutter:
    sdk: flutter
  flutter_vision: ^2.0.0
  camera: ^0.12.0
```
Then:
```bash
flutter pub get
```
(Those versions are from pub.dev right now.) (Dart packages)
Create:
```text
assets/
  model.tflite
  labels.txt
```
labels.txt must contain exactly 3 lines (since your output channels imply 3 classes):
```text
class0
class1
class2
```
Now register them in pubspec.yaml:
```yaml
flutter:
  assets:
    - assets/model.tflite
    - assets/labels.txt
```
(That “assets folder + register in pubspec” is exactly what the guide does.) (Read Medium articles with AI)
In android/app/build.gradle, inside the android { ... } block add: (Read Medium articles with AI)
```gradle
android {
    aaptOptions {
        noCompress 'tflite'
        noCompress 'lite'
    }
}
```
Recent versions of the camera plugin require Android SDK 24+. Set this in the same file:
```gradle
defaultConfig {
    minSdkVersion 24
}
```
(camera docs show Android SDK 24+ support) (Dart packages)
The Medium post suggests setting:
```gradle
buildscript {
    ext.kotlin_version = '1.7.10'
}
```
in android/build.gradle. (Read Medium articles with AI)
In newer Flutter projects you may not need this (and sometimes newer Kotlin is better). Only change it if your build complains.
Add to ios/Runner/Info.plist: (Dart packages)
```xml
<key>NSCameraUsageDescription</key>
<string>This app needs camera access for object detection.</string>
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access.</string>
```
Usually handled by the camera plugin, but if you need it explicitly, ensure your AndroidManifest.xml includes camera permission.
Note:
flutter_vision says iOS support is “coming soon / work in progress” depending on the version, so Android is the safest target first. (Dart packages)
Replace lib/main.dart with this (this is the Medium guide structure, but wired to your asset names and includes a frame-throttle so inference doesn’t stack up): (Read Medium articles with AI)
```dart
import 'package:camera/camera.dart';
import 'package:flutter/material.dart';
import 'package:flutter_vision/flutter_vision.dart';

late final List<CameraDescription> cameras;

Future<void> main() async {
  WidgetsFlutterBinding.ensureInitialized();
  cameras = await availableCameras();
  runApp(const MaterialApp(home: YoloVideo()));
}

class YoloVideo extends StatefulWidget {
  const YoloVideo({super.key});

  @override
  State<YoloVideo> createState() => _YoloVideoState();
}

class _YoloVideoState extends State<YoloVideo> with WidgetsBindingObserver {
  late final CameraController _controller;
  late final FlutterVision _vision;

  bool _isLoaded = false;
  bool _isDetecting = false;
  bool _isProcessingFrame = false;

  CameraImage? _cameraImage;
  List<Map<String, dynamic>> _yoloResults = [];

  @override
  void initState() {
    super.initState();
    WidgetsBinding.instance.addObserver(this);
    _vision = FlutterVision();
    _init();
  }

  Future<void> _init() async {
    _controller = CameraController(
      cameras.first,
      ResolutionPreset.high,
      enableAudio: false,
    );
    await _controller.initialize();

    // Load YOUR model:
    await _vision.loadYoloModel(
      labels: 'assets/labels.txt',
      modelPath: 'assets/model.tflite',
      modelVersion: "yolov8",
      quantization: false, // your model is float32
      numThreads: 2,
      useGpu: false, // start with false; switch to true after it works
    );

    setState(() => _isLoaded = true);
  }

  @override
  void dispose() {
    WidgetsBinding.instance.removeObserver(this);
    _controller.dispose();
    _vision.closeYoloModel();
    super.dispose();
  }

  @override
  void didChangeAppLifecycleState(AppLifecycleState state) {
    // Camera plugin does not auto-handle lifecycle anymore; safe handling:
    // (If you want, you can expand this further.)
    if (!_controller.value.isInitialized) return;
    if (state == AppLifecycleState.inactive) {
      _controller.stopImageStream();
    }
  }

  Future<void> _startDetection() async {
    if (_controller.value.isStreamingImages) return;
    setState(() {
      _isDetecting = true;
      _yoloResults = [];
    });
    await _controller.startImageStream((image) async {
      if (!_isDetecting || _isProcessingFrame) return;
      _isProcessingFrame = true;
      _cameraImage = image;
      final result = await _vision.yoloOnFrame(
        bytesList: image.planes.map((p) => p.bytes).toList(),
        imageHeight: image.height,
        imageWidth: image.width,
        iouThreshold: 0.4,
        confThreshold: 0.4, // mainly for yolov5; ok to keep
        classThreshold: 0.5,
      );
      if (mounted) {
        setState(() => _yoloResults = result);
      }
      _isProcessingFrame = false;
    });
  }

  Future<void> _stopDetection() async {
    setState(() {
      _isDetecting = false;
      _yoloResults = [];
    });
    if (_controller.value.isStreamingImages) {
      await _controller.stopImageStream();
    }
  }

  List<Widget> _displayBoxes(Size screen) {
    if (_yoloResults.isEmpty || _cameraImage == null) return [];

    // Same scaling approach as the Medium guide:
    final factorX = screen.width / _cameraImage!.height;
    final factorY = screen.height / _cameraImage!.width;

    return _yoloResults.map((result) {
      final box = result["box"] as List<dynamic>;
      final left = (box[0] as num) * factorX;
      final top = (box[1] as num) * factorY;
      final right = (box[2] as num) * factorX;
      final bottom = (box[3] as num) * factorY;
      final confidence = (box[4] as num);

      return Positioned(
        left: left.toDouble(),
        top: top.toDouble(),
        width: (right - left).toDouble(),
        height: (bottom - top).toDouble(),
        child: Container(
          decoration: BoxDecoration(
            border: Border.all(width: 2),
            borderRadius: BorderRadius.circular(8),
          ),
          child: Align(
            alignment: Alignment.topLeft,
            child: Container(
              padding: const EdgeInsets.symmetric(horizontal: 6, vertical: 3),
              color: Colors.black54,
              child: Text(
                "${result['tag']} ${(confidence * 100).toStringAsFixed(1)}%",
                style: const TextStyle(color: Colors.white, fontSize: 14),
              ),
            ),
          ),
        ),
      );
    }).toList();
  }

  @override
  Widget build(BuildContext context) {
    final size = MediaQuery.of(context).size;
    if (!_isLoaded) {
      return const Scaffold(
        body: Center(child: Text("Loading model...")),
      );
    }
    return Scaffold(
      body: Stack(
        fit: StackFit.expand,
        children: [
          AspectRatio(
            aspectRatio: _controller.value.aspectRatio,
            child: CameraPreview(_controller),
          ),
          ..._displayBoxes(size),
          Positioned(
            bottom: 60,
            left: 0,
            right: 0,
            child: Center(
              child: FloatingActionButton(
                onPressed: _isDetecting ? _stopDetection : _startDetection,
                child: Icon(_isDetecting ? Icons.stop : Icons.play_arrow),
              ),
            ),
          ),
        ],
      ),
    );
  }
}
```
```bash
flutter run
```
Model/labels mismatch
Your model output suggests 3 classes. If your labels.txt has a different count, you can get wrong tags or errors. (Because YOLOv8 output channels = 4 + numClasses.) (Stack Overflow)
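A quick sanity check of that arithmetic (the helper name is mine, not from any library):

```python
def yolov8_num_classes(output_shape):
    # YOLOv8 raw head layout: [batch, 4 box values + num_classes, num_candidates]
    _, channels, _ = output_shape
    return channels - 4

assert yolov8_num_classes([1, 7, 8400]) == 3    # your model: 3 classes
assert yolov8_num_classes([1, 84, 8400]) == 80  # a stock COCO export: 80 classes
print("labels.txt must have", yolov8_num_classes([1, 7, 8400]), "lines")
```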
GPU crashes / black screen
Set useGpu: false first (CPU). After it works, try useGpu: true like the Medium guide shows. (Read Medium articles with AI)
Boxes drawn in the wrong place / rotated
The Medium guide scales using screen.width / cameraImage.height and screen.height / cameraImage.width. If your preview orientation differs, swap those or adjust mapping. (Read Medium articles with AI)
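If you want to reason about that mapping outside the app, here is the same scaling math as a small Python sketch (`scale_box` is a hypothetical helper; the width/height swap mirrors the guide's factors, since the Android camera frame is typically landscape while the preview is portrait):

```python
def scale_box(box, screen_w, screen_h, frame_w, frame_h):
    """Map a detection box from camera-frame pixels to screen pixels,
    assuming the frame is rotated 90 degrees relative to the portrait
    screen (so the frame's height spans the screen's width)."""
    x1, y1, x2, y2 = box
    factor_x = screen_w / frame_h  # note the swap: frame height -> screen width
    factor_y = screen_h / frame_w
    return (x1 * factor_x, y1 * factor_y, x2 * factor_x, y2 * factor_y)

# A box covering a whole 1280x720 frame fills a whole 360x800 screen:
print(scale_box((0, 0, 720, 1280), 360, 800, 1280, 720))  # (0.0, 0.0, 360.0, 800.0)
```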
If you tell me Android only or Android + iOS, I can also give you the best next step:
- Stay with flutter_vision. (Dart packages)
- Switch to tflite_flutter and do the post-processing in Dart (works reliably cross-platform).

Okay, leave it.
No problem — we’ll drop it.
If you want, tell me what you want to do instead (e.g., “just run the model on a single image”, “do it in Python”, “try a different Flutter package”, “deploy on Android only”), and I’ll give you the simplest path.
🚀 STEP 1 — Install Packages
Run FIRST:
```python
!pip uninstall -y jax jaxlib keras-cv
!pip install tensorflow==2.16.1 ml-dtypes==0.3.2 -q
```
👉 Restart runtime after install.
🚀 STEP 2 — Mount Google Drive
```python
from google.colab import drive
drive.mount('/content/drive')
```
🚀 STEP 3 — Dataset Paths
Change path if needed.
```python
train_path = "/content/drive/MyDrive/tfrecord_data/train/objects-sN0S.tfrecord"
val_path = "/content/drive/MyDrive/tfrecord_data/valid/objects-sN0S.tfrecord"
```
🚀 STEP 4 — Number of Classes
Your label map:
cassava field
corn field
house
neem tree
palm tree
rice field
tree cluster
Total = 7
```python
NUM_CLASSES = 7
```
🚀 STEP 5 — TFRecord Parser (VERY IMPORTANT ⭐)
```python
import tensorflow as tf

def parse_tfrecord(example_proto):
    feature_description = {
        'image/encoded': tf.io.FixedLenFeature([], tf.string),
        'image/object/class/label': tf.io.VarLenFeature(tf.int64),
    }
    example = tf.io.parse_single_example(example_proto, feature_description)

    image = tf.image.decode_jpeg(example['image/encoded'], channels=3)
    image = tf.image.resize(image, (300, 300))
    image = image / 255.0

    # Take the first object's label and convert 1-based -> 0-based
    label = tf.sparse.to_dense(example['image/object/class/label'])[0]
    label = tf.cast(label - 1, tf.int64)

    return image, label
```
🚀 STEP 6 — Build Dataset
```python
raw_train = tf.data.TFRecordDataset(train_path)
raw_val = tf.data.TFRecordDataset(val_path)

# Parentheses are needed so the chained calls can span multiple lines
train_dataset = (raw_train.map(parse_tfrecord)
                 .shuffle(500)
                 .batch(8)
                 .prefetch(tf.data.AUTOTUNE))

val_dataset = (raw_val.map(parse_tfrecord)
               .batch(8)
               .prefetch(tf.data.AUTOTUNE))
```
🚀 STEP 7 — Build Model (Stable CNN Detector)
```python
inputs = tf.keras.Input(shape=(300, 300, 3))
x = tf.keras.layers.Conv2D(32, 3, activation='relu', padding='same')(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')(x)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Conv2D(128, 3, activation='relu', padding='same')(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)

model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)
print("✅ Model ready")
```
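You can trace this network's output shape with plain arithmetic, which also shows why the exported TFLite reports Output Shape [1, 7]:

```python
# Shape walk-through of the model above (no TF needed, just arithmetic).
# Conv2D with padding='same' keeps H and W; each MaxPooling2D halves them.
h = w = 300
h, w = h // 2, w // 2   # after the first MaxPooling2D  -> 150 x 150
h, w = h // 2, w // 2   # after the second MaxPooling2D -> 75 x 75
# GlobalAveragePooling2D collapses 75 x 75 x 128 down to a 128-vector,
# and the final Dense layer produces one probability per class:
NUM_CLASSES = 7
output_shape = (1, NUM_CLASSES)
print(h, w, output_shape)  # 75 75 (1, 7)
```

So the model emits 7 class probabilities and nothing else: no bounding boxes anywhere in the graph.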
🚀 STEP 8 — Train Model (15 Epoch Test)
```python
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=15,
)
```
🚀 STEP 9 — Save Model
```python
model.save("saved_model")
```
🚀 STEP 10 — Convert To Flutter TFLite Model ⭐
This is VERY IMPORTANT.
```python
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.float32
tflite_model = converter.convert()

with open("detect.tflite", "wb") as f:
    f.write(tflite_model)
print("✅ TFLite model exported")
```
Here is the code I used to train model 2, and here is the resulting model 2 info:

Model 2 (new training model):
Input Shape : [1, 300, 300, 3]
Input Type : <class 'numpy.float32'>
Output Shape: [1, 7]
Output Type : <class 'numpy.float32'>
Input Quantization: (0.0, 0)
Output Quantization: (0.0, 0)

When I link it to the Flutter app there is no error, but it scans forever and never shows any detection result. Here is the default model that works with that app:

Input Shape : [1, 300, 300, 3]
Input Type : <class 'numpy.uint8'>
Output Shape: [1, 10, 4]
Output Type : <class 'numpy.float32'>
Input Quantization: (0.0078125, 128)
Output Quantization: (0.0, 0)

So can you please make model 2 look like the default model, so it works with Flutter and can detect objects? Note that I will run this on Google Colab with Python 3.12, so please make sure nothing is confusing, and give me working Colab code to train another model that detects like the default one.
You can’t “convert” Model2 (output [1,7]) into a detector like your working Model1, because Model2 is only doing image classification (one class per image). A detector must learn bounding boxes, and the exported TFLite must output the usual detection tensors (boxes/classes/scores/num). Your working Model1 output [1,10,4] is the detection_boxes tensor from the standard TF Lite Object Detection format. (GitHub)
So the fix is: train an object detection model (SSD/EfficientDet/YOLO) using TFRecords that include bbox fields, then export to TFLite with detection postprocess.
Below is Colab notebook-style code that:

- exports `detection_boxes` with shape [1, 10, 4] (and also classes/scores/num),
- uses uint8 input with (x - 128)/128 normalization (same idea as the mobile conversion docs). (GitHub)

Note: TF2 model zoo SSD is 320x320 (not 300x300). That's normal. If your Flutter code is hardcoded to 300, change it to resize to 320. The TF2 detection zoo model list shows SSD MobileNet V2 / FPNLite as 320x320. (GitHub)
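For reference on the post-processing side, the four standard TFLite detection tensors can be consumed like this (NumPy stand-ins for real interpreter outputs; in practice match tensors by shape or name, since their order can vary by model):

```python
import numpy as np

# Simulated outputs in the standard TFLite detection-postprocess format,
# matching the working model's [1, 10, 4] boxes tensor.
boxes   = np.random.rand(1, 10, 4).astype(np.float32)  # [ymin, xmin, ymax, xmax], normalized 0..1
classes = np.zeros((1, 10), dtype=np.float32)          # class indices, stored as floats
scores  = np.array([[0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.0, 0.0, 0.0]],
                   dtype=np.float32)
count   = np.array([10.0], dtype=np.float32)           # number of valid detections

detections = []
for i in range(int(count[0])):
    if scores[0, i] < 0.5:
        continue  # same role as a confidence threshold in the app
    ymin, xmin, ymax, xmax = boxes[0, i]
    detections.append((int(classes[0, i]), float(scores[0, i])))

print(len(detections), "detections above the 0.5 threshold")
```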
```python
import sys, tensorflow as tf
print("Python:", sys.version)
print("TF:", tf.__version__)
```
TensorFlow supports Python 3.9–3.12 for the latest stable line. (TensorFlow)
```bash
pip -q install --upgrade pip
pip -q install "protobuf==4.25.3" "tf_slim" "pillow" "lxml" "Cython" "contextlib2" "matplotlib" "opencv-python-headless"
apt-get -qq update
apt-get -qq install -y protobuf-compiler

rm -rf models
git clone --depth 1 https://github.com/tensorflow/models.git
cd models/research
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
pip -q install .
cd /content
```
```python
from google.colab import drive
drive.mount('/content/drive')

TRAIN_TFRECORD = "/content/drive/MyDrive/tfrecord_data/train/objects-sN0S.tfrecord"
VAL_TFRECORD = "/content/drive/MyDrive/tfrecord_data/valid/objects-sN0S.tfrecord"

WORK_DIR = "/content/od_work"
import os
os.makedirs(WORK_DIR, exist_ok=True)
```
```python
LABEL_MAP_PATH = os.path.join(WORK_DIR, "label_map.pbtxt")

classes = [
    "cassava field",
    "corn field",
    "house",
    "neem tree",
    "palm tree",
    "rice field",
    "tree cluster",
]

with open(LABEL_MAP_PATH, "w", encoding="utf-8") as f:
    for i, name in enumerate(classes, start=1):
        f.write("item {\n")
        f.write(f"  id: {i}\n")
        f.write(f"  name: '{name}'\n")
        f.write("}\n\n")

print("Wrote:", LABEL_MAP_PATH)
```
If this prints bbox keys, you’re good. If NOT, you cannot train detection from this TFRecord.
```python
import tensorflow as tf

raw = next(iter(tf.data.TFRecordDataset([TRAIN_TFRECORD]).take(1)))
ex = tf.train.Example.FromString(raw.numpy())

print("TFRecord keys:")
for k in sorted(ex.features.feature.keys()):
    if "bbox" in k or "box" in k or "image/object" in k:
        print(" -", k)

# Most OD TFRecords include:
# image/object/bbox/xmin, xmax, ymin, ymax + image/object/class/label
```
We’ll use SSD MobileNet V2 FPNLite 320x320 from TF2 detection zoo. (GitHub)
```bash
cd /content
MODEL_NAME="ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8"
wget -q http://download.tensorflow.org/models/object_detection/tf2/20200711/${MODEL_NAME}.tar.gz
tar -xf ${MODEL_NAME}.tar.gz
rm ${MODEL_NAME}.tar.gz
```
```python
import shutil
from object_detection.protos import pipeline_pb2
from google.protobuf import text_format

# Re-declare this here: bash-cell variables do not carry over into Python cells
MODEL_NAME = "ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8"
PRETRAIN_DIR = f"/content/{MODEL_NAME}"
PIPELINE_SRC = os.path.join(PRETRAIN_DIR, "pipeline.config")

TRAIN_DIR = os.path.join(WORK_DIR, "train")
os.makedirs(TRAIN_DIR, exist_ok=True)
PIPELINE_DST = os.path.join(TRAIN_DIR, "pipeline.config")
shutil.copy(PIPELINE_SRC, PIPELINE_DST)

pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.io.gfile.GFile(PIPELINE_DST, "r") as f:
    text_format.Merge(f.read(), pipeline_config)

# 1) num classes
pipeline_config.model.ssd.num_classes = 7

# 2) fine-tune checkpoint
pipeline_config.train_config.fine_tune_checkpoint = os.path.join(PRETRAIN_DIR, "checkpoint/ckpt-0")
pipeline_config.train_config.fine_tune_checkpoint_type = "detection"

# 3) batch size (adjust for your GPU/TPU)
pipeline_config.train_config.batch_size = 8

# 4) dataset paths + label map
pipeline_config.train_input_reader.label_map_path = LABEL_MAP_PATH
pipeline_config.train_input_reader.tf_record_input_reader.input_path[:] = [TRAIN_TFRECORD]
pipeline_config.eval_input_reader[0].label_map_path = LABEL_MAP_PATH
pipeline_config.eval_input_reader[0].tf_record_input_reader.input_path[:] = [VAL_TFRECORD]

# 5) make the output like your working model: top-10 detections
nms = pipeline_config.model.ssd.post_processing.batch_non_max_suppression
nms.max_detections_per_class = 10
nms.max_total_detections = 10

with tf.io.gfile.GFile(PIPELINE_DST, "w") as f:
    f.write(text_format.MessageToString(pipeline_config))

print("✅ Wrote custom config:", PIPELINE_DST)
```
Start small (e.g., 2k–10k steps). Increase if needed.
```bash
python /content/models/research/object_detection/model_main_tf2.py \
  --model_dir=/content/od_work/train \
  --pipeline_config_path=/content/od_work/train/pipeline.config \
  --num_train_steps=5000 \
  --alsologtostderr
```
```bash
python /content/models/research/object_detection/exporter_main_v2.py \
  --input_type image_tensor \
  --pipeline_config_path /content/od_work/train/pipeline.config \
  --trained_checkpoint_dir /content/od_work/train \
  --output_directory /content/od_work/exported
```
```bash
python /content/models/research/object_detection/export_tflite_graph_tf2.py \
  --trained_checkpoint_dir /content/od_work/train \
  --pipeline_config_path /content/od_work/train/pipeline.config \
  --output_directory /content/od_work/tflite_export
```
This matches the official OD API → TFLite flow (export_tflite_graph_tf2.py then convert). (GitHub)
This produces the “Model1-style” interface: uint8 input and standard detection outputs (boxes/classes/scores/num). The mobile guide describes these tensors and the idea of uint8 input with mean/std normalization. (GitHub)
```python
import numpy as np
import tensorflow as tf

TFLITE_SAVEDMODEL_DIR = "/content/od_work/tflite_export/saved_model"
OUT_TFLITE = "/content/od_work/detect_uint8.tflite"

# Representative dataset: normalize to (x-128)/128 => range [-1 .. 0.992]
def rep_data():
    ds = tf.data.TFRecordDataset([TRAIN_TFRECORD]).take(200)
    feature_description = {
        "image/encoded": tf.io.FixedLenFeature([], tf.string),
    }
    for raw in ds:
        ex = tf.io.parse_single_example(raw, feature_description)
        img = tf.image.decode_jpeg(ex["image/encoded"], channels=3)
        img = tf.image.resize(img, (320, 320))  # model is 320x320
        img = tf.cast(img, tf.float32)
        img = (img - 128.0) / 128.0
        img = tf.expand_dims(img, 0)
        yield [img]

converter = tf.lite.TFLiteConverter.from_saved_model(TFLITE_SAVEDMODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_data

# Request a uint8 input interface, float outputs (like your working model)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.float32

# OD models use a custom postprocess op in TFLite
converter.allow_custom_ops = True

tflite_model = converter.convert()
with open(OUT_TFLITE, "wb") as f:
    f.write(tflite_model)
print("✅ Saved:", OUT_TFLITE)
```
```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=OUT_TFLITE)
interpreter.allocate_tensors()

print("INPUTS:")
for d in interpreter.get_input_details():
    print(d["shape"], d["dtype"], d.get("quantization"))

print("\nOUTPUTS:")
for d in interpreter.get_output_details():
    print(d["name"], d["shape"], d["dtype"], d.get("quantization"))
```
Expected:

- Input: [1, 320, 320, 3], dtype uint8, quantization near (0.0078125, 128)
- Outputs: [1, 10, 4] boxes plus classes/scores/num (names often start with TFLite_Detection_PostProcess...). (GitHub)

Why Model 2 "scans forever": your Flutter detector code expects detection outputs, but your model returns only 7 class probabilities ([1, 7]). There are no boxes, so nothing can be drawn.
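The quantization pair (0.0078125, 128) is just the (x - 128)/128 normalization written as a TFLite (scale, zero_point); a quick check:

```python
import numpy as np

scale, zero_point = 0.0078125, 128  # the working model's input quantization
assert scale == 1 / 128

# TFLite dequantization rule: real = scale * (quantized - zero_point),
# which is exactly (x - 128) / 128 for a uint8 pixel x.
q = np.array([0, 128, 255], dtype=np.uint8)
real = scale * (q.astype(np.float32) - zero_point)
print(real)  # [-1.         0.         0.9921875]
```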
If you paste here the Flutter package you’re using for the working model (e.g., tflite_flutter, google_mlkit_object_detection, tflite_flutter_helper, etc.) and the snippet where it reads outputs, I can tell you the exact output tensor order (boxes/classes/scores/num) that your app expects.