https://medium.com/@MostafaA.Shaban/yolov8-9-objec...

Created: March 3, 2026

Answered using GPT-5.2 Thinking via Chat01

Question

https://medium.com/@MostafaA.Shaban/yolov8-9-object-detection-in-flutter-with-tensorflow-lite-tflite-2024-70f1b93076cf please follow this document and give me the step-by-step instructions to create the Flutter app and integrate it with my YOLO model. Here is the model info (I have converted the model to TFLite):

Input Shape : [ 1 640 640 3]
Input Type : <class 'numpy.float32'>
Output Shape: [ 1 7 8400]
Output Type : <class 'numpy.float32'>

Input Quantization: (0.0, 0)
Output Quantization: (0.0, 0)

Thinking

Clarifying shape and model integration steps

I’ll explain that the output shape [1, 7, 8400] corresponds to three feature maps, which can be broken down into 80x80 + 40x40 + 20x20 for a clearer understanding. I’ll reference sources like Nvidia Isaac ROS and StackOverflow for context, while reassuring the user they don't need to manually parse the output since the plugin handles it. Following the documentation, I’ll suggest renaming the model file and using the correct input shape.

Ensuring correct input size and providing code examples

I’ll ensure the plugin resizes input to 640 by confirming it uses imageHeight and imageWidth. If the model is already set to 640, that’s perfect.

Then, I’ll provide an example labels.txt for the user's model, along with Flutter project setup steps, necessary Android configuration, and main.dart code. The code will handle object detection starting/stopping, overlay bounding boxes, and scaling. I'll make sure to include all relevant imports and initialize everything correctly.

Loading model and handling streaming

I'll load the YOLO model with the specified labels and paths, setting useGpu to false by default, but I could make it optional for the user. Then, I'll start the stream for image processing with the camera.

Also, I'll ensure that the bounding boxes are correctly displayed by adjusting for rotation during the calculation of the scaling factor. I will follow the example from the article for portrait orientation.

Handling bounding boxes and scaling with coordinates

For each detection result, I'll apply the scale factors (factorX and factorY) to calculate the bounding box coordinates. The coordinates output by flutter_vision are in image pixel coordinates, not normalized, so I'll ensure to mention that. I’ll also clarify how the orientation might affect those coordinates if the image is rotated.

Thought for 3m 39s

Answer

Below is a step-by-step Flutter setup that follows the Medium guide you linked (it uses flutter_vision + camera) and shows exactly what to change to use your YOLOv8 TFLite model (input [1, 640, 640, 3] float32, output [1, 7, 8400] float32).

Your output shape [1, 7, 8400] is the common YOLOv8 “raw head” format (4 box values + N class scores) across 8400 candidate locations (80×80 + 40×40 + 20×20). That means N = 7 − 4 = 3 classes, so your labels.txt should have 3 lines. (nvidia-isaac-ros.github.io)
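Both numbers can be sanity-checked with plain arithmetic (framework-free Python; the strides 8/16/32 are the standard YOLOv8 detection strides):

```python
# Sanity-check the YOLOv8 head dimensions for a 640x640 input.
# The 8400 candidates come from three strided feature maps (strides 8, 16, 32):
candidates = sum((640 // s) ** 2 for s in (8, 16, 32))  # 80*80 + 40*40 + 20*20
assert candidates == 8400

# Each candidate column carries 4 box values (cx, cy, w, h) plus one score
# per class, so an output of [1, 7, 8400] implies:
num_classes = 7 - 4
assert num_classes == 3
print(candidates, num_classes)  # 8400 3
```

If this math does not work out for your own export (e.g. the second dimension is not 4 + number of classes), the model was probably exported as something other than a plain YOLOv8 detection head.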


1) Create a new Flutter project

bash
flutter create yolo_flutter_app
cd yolo_flutter_app

2) Add dependencies (pubspec.yaml)

Add these packages (the Medium guide uses flutter_vision + camera). (Read Medium articles with AI)

yaml
dependencies:
  flutter:
    sdk: flutter
  flutter_vision: ^2.0.0
  camera: ^0.12.0

Then:

bash
flutter pub get

(Those versions are from pub.dev right now.) (Dart packages)


3) Add your model + labels as assets

Create:

text
assets/
  model.tflite
  labels.txt

labels.txt must contain exactly 3 lines (since your output channels imply 3 classes):

text
class0
class1
class2

Now register them in pubspec.yaml:

yaml
flutter:
  assets:
    - assets/model.tflite
    - assets/labels.txt

(That “assets folder + register in pubspec” is exactly what the guide does.) (Read Medium articles with AI)


4) Android Gradle settings (important)

4.1 Prevent model compression

In android/app/build.gradle, inside the android { ... } block add: (Read Medium articles with AI)

gradle
android {
    aaptOptions {
        noCompress 'tflite'
        noCompress 'lite'
    }
}

4.2 Set minSdk high enough

Recent versions of the camera plugin require Android SDK 24+. Set this in the same file:

gradle
defaultConfig {
    minSdkVersion 24
}

(camera docs show Android SDK 24+ support) (Dart packages)

4.3 Kotlin version (only if you hit build errors)

The Medium post suggests setting:

gradle
buildscript {
    ext.kotlin_version = '1.7.10'
}

in android/build.gradle. (Read Medium articles with AI)
In newer Flutter projects you may not need this (and sometimes newer Kotlin is better). Only change it if your build complains.


5) Permissions

iOS

Add to ios/Runner/Info.plist: (Dart packages)

xml
<key>NSCameraUsageDescription</key>
<string>This app needs camera access for object detection.</string>
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access.</string>

Android

Usually handled by the camera plugin, but if you need it explicitly, ensure your AndroidManifest.xml includes camera permission.

Note: flutter_vision describes iOS support as "coming soon" / work in progress depending on the version, so Android is the safest first target. (Dart packages)


6) Main Flutter code (copy/paste starter)

Replace lib/main.dart with this (this is the Medium guide structure, but wired to your asset names and includes a frame-throttle so inference doesn’t stack up): (Read Medium articles with AI)

dart
import 'package:camera/camera.dart';
import 'package:flutter/material.dart';
import 'package:flutter_vision/flutter_vision.dart';

late final List<CameraDescription> cameras;

Future<void> main() async {
  WidgetsFlutterBinding.ensureInitialized();
  cameras = await availableCameras();
  runApp(const MaterialApp(home: YoloVideo()));
}

class YoloVideo extends StatefulWidget {
  const YoloVideo({super.key});

  @override
  State<YoloVideo> createState() => _YoloVideoState();
}

class _YoloVideoState extends State<YoloVideo> with WidgetsBindingObserver {
  late final CameraController _controller;
  late final FlutterVision _vision;
  bool _isLoaded = false;
  bool _isDetecting = false;
  bool _isProcessingFrame = false;
  CameraImage? _cameraImage;
  List<Map<String, dynamic>> _yoloResults = [];

  @override
  void initState() {
    super.initState();
    WidgetsBinding.instance.addObserver(this);
    _vision = FlutterVision();
    _init();
  }

  Future<void> _init() async {
    _controller = CameraController(
      cameras.first,
      ResolutionPreset.high,
      enableAudio: false,
    );
    await _controller.initialize();

    // Load YOUR model:
    await _vision.loadYoloModel(
      labels: 'assets/labels.txt',
      modelPath: 'assets/model.tflite',
      modelVersion: "yolov8",
      quantization: false, // your model is float32
      numThreads: 2,
      useGpu: false, // start with false; switch to true after it works
    );
    setState(() => _isLoaded = true);
  }

  @override
  void dispose() {
    WidgetsBinding.instance.removeObserver(this);
    _controller.dispose();
    _vision.closeYoloModel();
    super.dispose();
  }

  @override
  void didChangeAppLifecycleState(AppLifecycleState state) {
    // Camera plugin does not auto-handle lifecycle anymore; safe handling:
    // (If you want, you can expand this further.)
    if (!_controller.value.isInitialized) return;
    if (state == AppLifecycleState.inactive) {
      _controller.stopImageStream();
    }
  }

  Future<void> _startDetection() async {
    if (_controller.value.isStreamingImages) return;
    setState(() {
      _isDetecting = true;
      _yoloResults = [];
    });
    await _controller.startImageStream((image) async {
      if (!_isDetecting || _isProcessingFrame) return;
      _isProcessingFrame = true;
      _cameraImage = image;
      final result = await _vision.yoloOnFrame(
        bytesList: image.planes.map((p) => p.bytes).toList(),
        imageHeight: image.height,
        imageWidth: image.width,
        iouThreshold: 0.4,
        confThreshold: 0.4, // mainly for yolov5; ok to keep
        classThreshold: 0.5,
      );
      if (mounted) {
        setState(() => _yoloResults = result);
      }
      _isProcessingFrame = false;
    });
  }

  Future<void> _stopDetection() async {
    setState(() {
      _isDetecting = false;
      _yoloResults = [];
    });
    if (_controller.value.isStreamingImages) {
      await _controller.stopImageStream();
    }
  }

  List<Widget> _displayBoxes(Size screen) {
    if (_yoloResults.isEmpty || _cameraImage == null) return [];
    // Same scaling approach as the Medium guide:
    final factorX = screen.width / _cameraImage!.height;
    final factorY = screen.height / _cameraImage!.width;
    return _yoloResults.map((result) {
      final box = result["box"] as List<dynamic>;
      final left = (box[0] as num) * factorX;
      final top = (box[1] as num) * factorY;
      final right = (box[2] as num) * factorX;
      final bottom = (box[3] as num) * factorY;
      final confidence = (box[4] as num);
      return Positioned(
        left: left.toDouble(),
        top: top.toDouble(),
        width: (right - left).toDouble(),
        height: (bottom - top).toDouble(),
        child: Container(
          decoration: BoxDecoration(
            border: Border.all(width: 2),
            borderRadius: BorderRadius.circular(8),
          ),
          child: Align(
            alignment: Alignment.topLeft,
            child: Container(
              padding: const EdgeInsets.symmetric(horizontal: 6, vertical: 3),
              color: Colors.black54,
              child: Text(
                "${result['tag']} ${(confidence * 100).toStringAsFixed(1)}%",
                style: const TextStyle(color: Colors.white, fontSize: 14),
              ),
            ),
          ),
        ),
      );
    }).toList();
  }

  @override
  Widget build(BuildContext context) {
    final size = MediaQuery.of(context).size;
    if (!_isLoaded) {
      return const Scaffold(
        body: Center(child: Text("Loading model...")),
      );
    }
    return Scaffold(
      body: Stack(
        fit: StackFit.expand,
        children: [
          AspectRatio(
            aspectRatio: _controller.value.aspectRatio,
            child: CameraPreview(_controller),
          ),
          ..._displayBoxes(size),
          Positioned(
            bottom: 60,
            left: 0,
            right: 0,
            child: Center(
              child: FloatingActionButton(
                onPressed: _isDetecting ? _stopDetection : _startDetection,
                child: Icon(_isDetecting ? Icons.stop : Icons.play_arrow),
              ),
            ),
          ),
        ],
      ),
    );
  }
}

7) Run it

bash
flutter run

If something doesn’t work, here are the 3 most common fixes

  1. Model/labels mismatch
    Your model output suggests 3 classes. If your labels.txt has a different count, you can get wrong tags or errors. (Because YOLOv8 output channels = 4 + numClasses.) (Stack Overflow)

  2. GPU crashes / black screen
    Set useGpu: false first (CPU). After it works, try useGpu: true like the Medium guide shows. (Read Medium articles with AI)

  3. Boxes drawn in the wrong place / rotated
    The Medium guide scales using screen.width / cameraImage.height and screen.height / cameraImage.width. If your preview orientation differs, swap those or adjust mapping. (Read Medium articles with AI)
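The scaling in fix 3 is easy to get wrong, so here is a minimal sketch of the arithmetic (plain Python; the frame and screen sizes are made-up example values): a landscape camera frame shown on a portrait screen is rotated 90°, which is why the guide divides screen.width by the image height rather than its width.

```python
# Map a detection box from camera-image pixels (landscape frame)
# onto a portrait screen, mirroring the Medium guide's scaling.
image_w, image_h = 1280, 720   # CameraImage.width / .height (landscape)
screen_w, screen_h = 360, 640  # portrait screen (made-up values)

factor_x = screen_w / image_h  # note: divide by HEIGHT (frame is rotated)
factor_y = screen_h / image_w  # note: divide by WIDTH

# flutter_vision returns boxes in image-pixel coordinates: [x1, y1, x2, y2]
box = [100, 200, 300, 400]
left, top = box[0] * factor_x, box[1] * factor_y
right, bottom = box[2] * factor_x, box[3] * factor_y
print(left, top, right, bottom)  # 50.0 100.0 150.0 200.0
```

If your boxes land rotated or mirrored, the preview orientation differs from this assumption; swap the two factors (or the box axes) accordingly.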


If you tell me Android only or Android + iOS, I can also give you the best next step:

  • Android-only (fastest): keep flutter_vision. (Dart packages)
  • Android + iOS (more control): switch to tflite_flutter and do post-processing in Dart (works reliably cross-platform).

Question

Okay, leave it.

Answer

No problem — we’ll drop it.

If you want, tell me what you want to do instead (e.g., “just run the model on a single image”, “do it in Python”, “try a different Flutter package”, “deploy on Android only”), and I’ll give you the simplest path.


Question

🚀 STEP 1 — Install Packages
Run FIRST:
!pip uninstall -y jax jaxlib keras-cv
!pip install tensorflow==2.16.1 ml-dtypes==0.3.2 -q
👉 Restart runtime after install.


🚀 STEP 2 — Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')


🚀 STEP 3 — Dataset Paths
Change path if needed.
train_path = "/content/drive/MyDrive/tfrecord_data/train/objects-sN0S.tfrecord"
val_path = "/content/drive/MyDrive/tfrecord_data/valid/objects-sN0S.tfrecord"


🚀 STEP 4 — Number of Classes
Your label map:
cassava field
corn field
house
neem tree
palm tree
rice field
tree cluster
Total = 7
NUM_CLASSES = 7


🚀 STEP 5 — TFRecord Parser (VERY IMPORTANT ⭐)
import tensorflow as tf

def parse_tfrecord(example_proto):
    feature_description = {
        'image/encoded': tf.io.FixedLenFeature([], tf.string),
        'image/object/class/label': tf.io.VarLenFeature(tf.int64),
    }
    example = tf.io.parse_single_example(example_proto, feature_description)
    image = tf.image.decode_jpeg(example['image/encoded'], channels=3)
    image = tf.image.resize(image, (300, 300))
    image = image / 255.0
    # Take the first object's label; convert 1-based → 0-based
    label = tf.sparse.to_dense(example['image/object/class/label'])[0]
    label = tf.cast(label - 1, tf.int64)
    return image, label

🚀 STEP 6 — Build Dataset
raw_train = tf.data.TFRecordDataset(train_path)
raw_val = tf.data.TFRecordDataset(val_path)

train_dataset = (raw_train.map(parse_tfrecord)
                 .shuffle(500)
                 .batch(8)
                 .prefetch(tf.data.AUTOTUNE))

val_dataset = (raw_val.map(parse_tfrecord)
               .batch(8)
               .prefetch(tf.data.AUTOTUNE))


🚀 STEP 7 — Build Model (Stable CNN Detector)
inputs = tf.keras.Input(shape=(300,300,3))

x = tf.keras.layers.Conv2D(32,3,activation='relu',padding='same')(inputs)
x = tf.keras.layers.MaxPooling2D()(x)

x = tf.keras.layers.Conv2D(64,3,activation='relu',padding='same')(x)
x = tf.keras.layers.MaxPooling2D()(x)

x = tf.keras.layers.Conv2D(128,3,activation='relu',padding='same')(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)

x = tf.keras.layers.Dense(128,activation='relu')(x)

outputs = tf.keras.layers.Dense(
NUM_CLASSES,
activation='softmax'
)(x)

model = tf.keras.Model(inputs, outputs)

model.compile(
optimizer=tf.keras.optimizers.Adam(0.001),
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)

print("✅ Model ready")


🚀 STEP 8 — Train Model (15 Epoch Test)
model.fit(
train_dataset,
validation_data=val_dataset,
epochs=15
)


🚀 STEP 9 — Save Model
model.save("saved_model")


🚀 STEP 10 — Convert To Flutter TFLite Model ⭐
This is VERY IMPORTANT.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")

converter.optimizations = [tf.lite.Optimize.DEFAULT]

converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.float32

tflite_model = converter.convert()

with open("detect.tflite","wb") as f:
f.write(tflite_model)

print("✅ TFLite model exported")
Here is the code I used to train model 2, and here is the resulting model info:

Model 2 (new training model):
Input Shape : [ 1 300 300 3]
Input Type : <class 'numpy.float32'>
Output Shape: [1 7]
Output Type : <class 'numpy.float32'>

Input Quantization: (0.0, 0)
Output Quantization: (0.0, 0)
When I link it to the Flutter app there is no error, but it scans forever and never shows a detection result. Here is the default model that works with that app:

Input Shape : [ 1 300 300 3]
Input Type : <class 'numpy.uint8'>
Output Shape: [ 1 10 4]
Output Type : <class 'numpy.float32'>

Input Quantization: (0.0078125, 128)
Output Quantization: (0.0, 0)
So can you please make model 2 look like model 1, which works with Flutter and can detect objects? Note that I will run this on Google Colab with Python version 3.12, so please make sure everything is clear, and give me working code for Google Colab to train another model that can detect like the default model.

Thinking

Creating a Flutter app with YOLO model integration

The user has requested a step-by-step guide for integrating a YOLO model in a Flutter app using a Medium article. I’ll read the article to make sure the process is up-to-date. Then, I'll outline the steps such as creating a Flutter project, setting dependencies, and configuring the YOLO model output format.

Planning Flutter app integration with YOLOv8

The output likely involves 4 box values, 1 score (objectness), and 2 class scores. We'll need to verify the exact decode logic from the article. The steps involve converting the model to TFLite, using the tflite_flutter plugin, and handling image preprocessing with the image_picker plugin. I’ll include relevant code snippets, ensuring compatibility with both Android and iOS.


Deciding on plugin for YOLO model integration

The article uses the flutter_vision plugin, which could simplify integrating YOLOv8/9 with TFLite. Since the user wants to stick with their custom YOLO model, I can suggest using either flutter_vision or tflite_flutter. To follow the instructions, I'd adapt the code to include their specific model file and label paths. I'll also check for any references to loadYoloModel in the article.


Integrating custom YOLOv8 model with flutter_vision

The flutter_vision plugin supports models like YOLOv8 and YOLOv5. Since the user has a float32 YOLOv8 model, I must ensure quantization is turned off. Labels should have one class per line, and their model output suggests 7 channels—likely representing 2 classes. I'll need to adapt for YOLOv8's expected output structure without objectness. The model should be exported as YOLOv8 detection, not segmentation.

Clarifying model output and initial setup steps

It looks like the user’s model likely has 3 classes, meaning the output has four box values plus three class scores, summing to 7 values. There’s no need for objectness in this case. The plugin should be able to parse the model, and I’ll ensure the label file counts match. For the setup: I’ll guide the user through creating the Flutter project and adding dependencies. I will check the latest camera version online!


Setting up the Flutter app and prerequisites

I’ll note that the Camera plugin requires a minimum SDK version of 24 for Android, which needs to be updated in the project configuration. We should also pin the camera plugin to version ^0.12.0 to avoid compatibility issues with older Flutter versions. I'll confirm necessary configurations like compileSdk and advise based on the latest Flutter stable. For the step-by-step: First, the prerequisites include the Flutter SDK, Android Studio, and a device or emulator. Then we create the project and add the necessary assets.

Configuring Android and Flutter dependencies

The pubspec.yaml file should include flutter_vision: ^2.0.0 and camera: ^0.12.0. We may also add permission_handler, but it's probably not necessary. In build.gradle, configure aaptOptions to prevent compression of TFLite models. Also, ensure minSdkVersion is 24 and compileSdkVersion is around 34-35. If needed, adjust Kotlin version to match your Flutter version. Mention troubleshooting tips for mismatched Kotlin versions. Finally, configure camera permissions in AndroidManifest.xml.

Configuring permissions and setting up code

For permissions, add <uses-permission android:name="android.permission.CAMERA"/> to the Android manifest and include NSCameraUsageDescription and NSMicrophone in the iOS Info.plist. In the main.dart, initialize the camera, load the model, and run inference. Use flags like isDetecting to manage concurrency. Adjust image resolution and configure the model path, labels, and useGpu settings as needed, while handling YUV bytes from the camera for processing.

Configuring the app and testing setup

We need to make sure the label length matches the expected class count (3 classes in this case) by checking the label file. If there’s a mismatch, it may cause conversion issues. We should also consider testing with yoloOnImage for static detection. To test: run flutter pub get and flutter run. I'll confirm export details through the Ultralytics docs and cite them.


Ensuring export settings match model expectations

The user is using YOLOv8, but the Ultralytics docs suggest arguments for newer versions like YOLO26, which are still quite similar. For model export, we’ll use imgsz=640 and avoid setting nms to True, as the plugin handles NMS with the default eiou threshold. I'll confirm that the input matches the required shape and float32 format.

Providing the step-by-step process

Here's the process for setting up YOLO in Flutter:

  1. Confirm labels match the class count based on the output shape (7 channels for 3 classes).
  2. Create the Flutter project.
  3. Add necessary packages.
  4. Configure Android settings (aaptOptions, minSdk, etc.).
  5. Register assets (model and labels).
  6. Add required permissions.
  7. Write code to create the YoloVideo widget, load the model, handle detection, and overlay bounding boxes.

We'll also address lifecycle handling and performance (inference throttling and flag modifications) to prevent backlogs.

Addressing GPU and compatibility considerations

We need to note that useGpu depends on whether the GPU delegate is supported. It's safer to start with false and set it to true if applicable. Also, make sure the code compiles with latest camera plugin updates (0.12.0). I'll include a full main.dart, required imports, the camera setup, and Android/iOS manifest changes. Note that iOS support for the plugin is still in progress. If iOS is necessary, the user could explore the tflite_flutter approach.

Providing both approaches for model integration

The user may want to perform the integration manually instead of using flutter_vision, but the documentation employs it as the primary method. I'll include both options: the easier method with flutter_vision and a lower-level alternative with tflite_flutter. To avoid verbosity, I'll suggest the primary path first, and briefly mention the secondary option. Additionally, I'll clarify that the output shape is [1,7,8400], supporting YOLOv8 with 3 classes, but if there's an issue with the shape, using tflite_flutter might be necessary.


Thought for 10m 18s

Answer

You can’t “convert” Model2 (output [1,7]) into a detector like your working Model1, because Model2 only does image classification (one class per image). A detector must learn bounding boxes, and the exported TFLite model must output the usual detection tensors (boxes/classes/scores/num). Your working Model1’s output [1,10,4] is the detection_boxes tensor from the standard TFLite Object Detection format. (GitHub)
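The difference is easiest to see in the output tensors themselves. A sketch (plain Python with mock values; the tensor layout follows the standard TFLite detection postprocess, and all the numbers are made up):

```python
# Classification model (your Model2): one probability vector per image.
classification_output = [[0.05, 0.10, 0.02, 0.60, 0.08, 0.05, 0.10]]  # [1, 7]
best = max(range(7), key=lambda i: classification_output[0][i])
# -> one label for the WHOLE image, no boxes at all

# Detection model (your Model1): four aligned output tensors per image.
boxes   = [[[0.1, 0.2, 0.5, 0.6]] + [[0.0] * 4] * 9]  # [1, 10, 4] (ymin, xmin, ymax, xmax)
classes = [[2.0] + [0.0] * 9]                          # [1, 10]
scores  = [[0.9] + [0.0] * 9]                          # [1, 10]
num     = [1.0]                                        # [1]

# A detection app iterates over num detections and keeps confident ones:
detections = [
    (boxes[0][i], int(classes[0][i]), scores[0][i])
    for i in range(int(num[0]))
    if scores[0][i] > 0.5
]
print(detections)  # one box + class id + score per detected object
```

Your Flutter app is iterating over tensors like boxes/scores, so a [1,7] softmax output gives it nothing to draw, which is exactly the "scanning forever" symptom.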

So the fix is: train an object detection model (SSD/EfficientDet/YOLO) using TFRecords that include bbox fields, then export to TFLite with detection postprocess.

Below is Colab notebook-style code that:

  • Uses TensorFlow Object Detection API (TF2)
  • Fine-tunes SSD MobileNet V2 FPNLite 320x320
  • Exports a TFLite model that produces detection_boxes shape [1,10,4] (and also classes/scores/num)
  • Builds a uint8 input interface with quantization ≈ (0.0078125, 128) by using (x-128)/128 normalization (same idea as the mobile conversion docs). (GitHub)
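The quantization pair (0.0078125, 128) is just a scale and zero point; the TFLite dequantization rule real = scale * (quantized - zero_point) shows why (x - 128) / 128 normalization reproduces it (plain-Python check):

```python
# TFLite affine dequantization: real = scale * (quantized - zero_point).
scale, zero_point = 0.0078125, 128  # note: 0.0078125 == 1 / 128
assert scale == 1 / 128

def dequantize(q):
    return scale * (q - zero_point)

# uint8 inputs [0, 255] map to roughly [-1, 1], matching (x - 128) / 128:
print(dequantize(0), dequantize(128), dequantize(255))  # -1.0 0.0 0.9921875
```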

Note: TF2 model zoo SSD is 320x320 (not 300x300). That’s normal. If your Flutter code is hardcoded to 300, change it to resize to 320. The TF2 detection zoo model list shows SSD MobileNet V2 / FPNLite as 320x320. (GitHub)


✅ COLAB CODE (copy cell-by-cell)

0) Check runtime (Python / TF)

python
import sys, tensorflow as tf

print("Python:", sys.version)
print("TF:", tf.__version__)

TensorFlow supports Python 3.9–3.12 for the latest stable line. (TensorFlow)


1) Install TF Object Detection API (TF2)

bash
pip -q install --upgrade pip
pip -q install "protobuf==4.25.3" "tf_slim" "pillow" "lxml" "Cython" "contextlib2" "matplotlib" "opencv-python-headless"
apt-get -qq update
apt-get -qq install -y protobuf-compiler
rm -rf models
git clone --depth 1 https://github.com/tensorflow/models.git
cd models/research
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
pip -q install .
cd /content

2) Mount Google Drive + set paths

python
from google.colab import drive
drive.mount('/content/drive')

TRAIN_TFRECORD = "/content/drive/MyDrive/tfrecord_data/train/objects-sN0S.tfrecord"
VAL_TFRECORD = "/content/drive/MyDrive/tfrecord_data/valid/objects-sN0S.tfrecord"
WORK_DIR = "/content/od_work"

import os
os.makedirs(WORK_DIR, exist_ok=True)

3) Create label map (pbtxt) for 7 classes (1-based IDs)

python
LABEL_MAP_PATH = os.path.join(WORK_DIR, "label_map.pbtxt")

classes = [
    "cassava field",
    "corn field",
    "house",
    "neem tree",
    "palm tree",
    "rice field",
    "tree cluster",
]

with open(LABEL_MAP_PATH, "w", encoding="utf-8") as f:
    for i, name in enumerate(classes, start=1):
        f.write("item {\n")
        f.write(f"  id: {i}\n")
        f.write(f"  name: '{name}'\n")
        f.write("}\n\n")

print("Wrote:", LABEL_MAP_PATH)

4) QUICK CHECK: does your TFRecord contain bounding boxes?

If this prints bbox keys, you’re good. If NOT, you cannot train detection from this TFRecord.

python
import tensorflow as tf

raw = next(iter(tf.data.TFRecordDataset([TRAIN_TFRECORD]).take(1)))
ex = tf.train.Example.FromString(raw.numpy())

print("TFRecord keys:")
for k in sorted(ex.features.feature.keys()):
    if "bbox" in k or "box" in k or "image/object" in k:
        print(" -", k)

# Most OD TFRecords include:
#   image/object/bbox/xmin, xmax, ymin, ymax + image/object/class/label

5) Download a pretrained SSD model (fast + mobile friendly)

We’ll use SSD MobileNet V2 FPNLite 320x320 from TF2 detection zoo. (GitHub)

bash
cd /content
MODEL_NAME="ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8"
wget -q http://download.tensorflow.org/models/object_detection/tf2/20200711/${MODEL_NAME}.tar.gz
tar -xf ${MODEL_NAME}.tar.gz
rm ${MODEL_NAME}.tar.gz

6) Build a custom pipeline.config for your dataset (7 classes, TFRecords, top-10 detections)

python
import shutil
from object_detection.protos import pipeline_pb2
from google.protobuf import text_format

# MODEL_NAME was only set in the bash cell above; redefine it here so this
# Python cell is self-contained.
MODEL_NAME = "ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8"
PRETRAIN_DIR = f"/content/{MODEL_NAME}"
PIPELINE_SRC = os.path.join(PRETRAIN_DIR, "pipeline.config")

TRAIN_DIR = os.path.join(WORK_DIR, "train")
os.makedirs(TRAIN_DIR, exist_ok=True)
PIPELINE_DST = os.path.join(TRAIN_DIR, "pipeline.config")
shutil.copy(PIPELINE_SRC, PIPELINE_DST)

pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.io.gfile.GFile(PIPELINE_DST, "r") as f:
    text_format.Merge(f.read(), pipeline_config)

# 1) num classes
pipeline_config.model.ssd.num_classes = 7

# 2) fine-tune checkpoint
pipeline_config.train_config.fine_tune_checkpoint = os.path.join(PRETRAIN_DIR, "checkpoint/ckpt-0")
pipeline_config.train_config.fine_tune_checkpoint_type = "detection"

# 3) batch size (adjust for your GPU/TPU)
pipeline_config.train_config.batch_size = 8

# 4) dataset paths + label map
pipeline_config.train_input_reader.label_map_path = LABEL_MAP_PATH
pipeline_config.train_input_reader.tf_record_input_reader.input_path[:] = [TRAIN_TFRECORD]
pipeline_config.eval_input_reader[0].label_map_path = LABEL_MAP_PATH
pipeline_config.eval_input_reader[0].tf_record_input_reader.input_path[:] = [VAL_TFRECORD]

# 5) make output like your working model: top-10 detections
nms = pipeline_config.model.ssd.post_processing.batch_non_max_suppression
nms.max_detections_per_class = 10
nms.max_total_detections = 10

with tf.io.gfile.GFile(PIPELINE_DST, "w") as f:
    f.write(text_format.MessageToString(pipeline_config))

print("✅ Wrote custom config:", PIPELINE_DST)

7) Train

Start small (e.g., 2k–10k steps). Increase if needed.

bash
python /content/models/research/object_detection/model_main_tf2.py \
  --model_dir=/content/od_work/train \
  --pipeline_config_path=/content/od_work/train/pipeline.config \
  --num_train_steps=5000 \
  --alsologtostderr

8) Export SavedModel

bash
python /content/models/research/object_detection/exporter_main_v2.py \
  --input_type image_tensor \
  --pipeline_config_path /content/od_work/train/pipeline.config \
  --trained_checkpoint_dir /content/od_work/train \
  --output_directory /content/od_work/exported

9) Export a TFLite-friendly SavedModel (OD API step)

bash
python /content/models/research/object_detection/export_tflite_graph_tf2.py \
  --trained_checkpoint_dir /content/od_work/train \
  --pipeline_config_path /content/od_work/train/pipeline.config \
  --output_directory /content/od_work/tflite_export

This matches the official OD API → TFLite flow (export_tflite_graph_tf2.py then convert). (GitHub)


10) Convert to TFLite with uint8 input + float32 outputs

This produces the “Model1-style” interface: uint8 input and standard detection outputs (boxes/classes/scores/num). The mobile guide describes these tensors and the idea of uint8 input with mean/std normalization. (GitHub)

python
import numpy as np
import tensorflow as tf

TFLITE_SAVEDMODEL_DIR = "/content/od_work/tflite_export/saved_model"
OUT_TFLITE = "/content/od_work/detect_uint8.tflite"

# Representative dataset: normalize to (x-128)/128 => range [-1 .. 0.992]
def rep_data():
    ds = tf.data.TFRecordDataset([TRAIN_TFRECORD]).take(200)
    feature_description = {
        "image/encoded": tf.io.FixedLenFeature([], tf.string),
    }
    for raw in ds:
        ex = tf.io.parse_single_example(raw, feature_description)
        img = tf.image.decode_jpeg(ex["image/encoded"], channels=3)
        img = tf.image.resize(img, (320, 320))  # model is 320x320
        img = tf.cast(img, tf.float32)
        img = (img - 128.0) / 128.0
        img = tf.expand_dims(img, 0)
        yield [img]

converter = tf.lite.TFLiteConverter.from_saved_model(TFLITE_SAVEDMODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_data

# Request uint8 input interface, float outputs (like your working model)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.float32

# OD models use a custom postprocess op in TFLite
converter.allow_custom_ops = True

tflite_model = converter.convert()
with open(OUT_TFLITE, "wb") as f:
    f.write(tflite_model)

print("✅ Saved:", OUT_TFLITE)

11) Verify the final model looks like your working model

python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=OUT_TFLITE)
interpreter.allocate_tensors()

print("INPUTS:")
for d in interpreter.get_input_details():
    print(d["shape"], d["dtype"], d.get("quantization"))

print("\nOUTPUTS:")
for d in interpreter.get_output_details():
    print(d["name"], d["shape"], d["dtype"], d.get("quantization"))

Expected:

  • Input shape [1, 320, 320, 3], dtype uint8, quantization near (0.0078125, 128)
  • Output tensors should include boxes shaped [1, 10, 4] plus classes/scores/num (names often start with TFLite_Detection_PostProcess...). (GitHub)
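Once those four tensors come back, consuming them is just a loop with a score threshold. A NumPy-only sketch using mock arrays with the shapes listed above (the order boxes/classes/scores/num is the typical TFLite_Detection_PostProcess layout; verify it against your model's output details):

```python
import numpy as np

# Mock outputs standing in for interpreter.get_tensor(...) results.
boxes = np.random.rand(1, 10, 4).astype(np.float32)   # [ymin, xmin, ymax, xmax], normalized 0..1
classes = np.zeros((1, 10), dtype=np.float32)         # class index into label-map order
scores = np.array([[0.9, 0.8, 0.7, 0.6, 0.55,
                    0.4, 0.3, 0.2, 0.1, 0.0]], dtype=np.float32)
count = np.array([10.0], dtype=np.float32)            # number of valid rows

THRESHOLD = 0.5
detections = [
    (boxes[0, i], int(classes[0, i]), float(scores[0, i]))
    for i in range(int(count[0]))
    if scores[0, i] >= THRESHOLD
]
print(len(detections))  # boxes worth drawing; 5 with these mock scores
```

In the Flutter app the same filtering happens in Dart, but the tensor shapes and ordering are identical.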

Why your current Model2 “scans forever” in Flutter

Your Flutter detector code expects detection outputs (boxes/classes/scores/num), but Model2 returns only 7 class probabilities ([1, 7]). There are no boxes, so nothing can ever be drawn and the scan never completes.
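You can tell the two kinds of model apart from their output shapes alone, no TensorFlow needed. A small illustrative check (the helper name and the shape tuples are assumptions based on the shapes discussed above):

```python
# Distinguish an OD-API detector from a plain classifier by output shapes.
def looks_like_detector(output_shapes):
    """A TFLite OD-API detector exposes 4 outputs, including boxes [1, N, 4]."""
    return len(output_shapes) == 4 and any(
        len(s) == 3 and s[2] == 4 for s in output_shapes
    )

model1_outputs = [(1, 10, 4), (1, 10), (1, 10), (1,)]  # boxes/classes/scores/num
model2_outputs = [(1, 7)]                              # 7 class probabilities only

print(looks_like_detector(model1_outputs))  # True  -> app can draw boxes
print(looks_like_detector(model2_outputs))  # False -> nothing to draw, "scans forever"
```

Running this kind of check on `interpreter.get_output_details()` shapes is a quick way to confirm which kind of model a .tflite file actually is before wiring it into the app.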


If you paste the Flutter package you're using for the working model (e.g., tflite_flutter, google_mlkit_object_detection, tflite_flutter_helper, etc.) and the snippet where it reads outputs, I can tell you the exact output tensor order (boxes/classes/scores/num) that your app expects.
