AI-Powered Coffee Capsule Recognition
Imagine pointing your phone at a coffee capsule and instantly knowing which one it is.
That’s the power of our first use case with Capsule Scanner, a simple yet impactful example of how computer vision can enhance everyday routines.
What did we build?
We developed a mobile-first AI solution capable of detecting and identifying different types of coffee capsules in real time. At its core is a custom-trained deep learning model: YOLO-Nano, a lightweight convolutional neural network optimized for on-device performance.
Why not use external APIs?
While external APIs like OpenAI Vision and Google Vision offer quick integration, they lack the precision and flexibility required for our use case.
By training our own model, we gained:
Higher accuracy tailored to our capsule categories
Full control over model evolution and retraining
True offline capability with predictable latency
Zero per-request cost
Complete user privacy
Training our own model was the clear choice.
Why YOLO-Nano?
We evaluated several object detection models (YOLOv5n, MobileNet-SSD, EfficientDet-Lite). Of the options we considered, YOLO-Nano offered the best balance of:
Real-time performance on mid-range phones
Small model size
Reliable accuracy across varied lighting conditions
Low memory and battery consumption
Because the model runs on the device, it supports 100% offline operation, making it ideal for mobile apps and edge use cases.
Inference Framework
We benchmarked multiple inference runtimes, including TensorFlow Lite, ONNX Runtime Mobile, and NCNN.
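To give an idea of what such a comparison can look like, here is a minimal latency harness. It is a sketch, not the benchmark we actually ran: the `benchmark` function, its parameters, and the reported metrics are illustrative assumptions, and `run_inference` stands in for whatever call runs one forward pass in the runtime under test.

```python
import statistics
import time


def benchmark(run_inference, warmup=5, runs=50):
    """Time a zero-argument callable and report latency in milliseconds.

    `run_inference` is a hypothetical stand-in for a single forward pass
    in the runtime being measured.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")

    # Warm-up calls are excluded so one-off costs (caches, lazy
    # initialization) do not skew the measurements.
    for _ in range(warmup):
        run_inference()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    p95_index = min(len(latencies_ms) - 1, int(0.95 * len(latencies_ms)))
    return {
        "runs": runs,
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
    }


if __name__ == "__main__":
    # Dummy workload used only to show the call shape.
    print(benchmark(lambda: sum(range(10_000)), warmup=2, runs=20))
```

Reporting the median together with a high percentile, rather than the mean alone, makes occasional slow frames visible, which matters for a real-time camera feed.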
We selected TensorFlow Lite due to:
Excellent support for model quantization (INT8, FP16)
Fast startup time and low memory overhead
Native compatibility with Android and iOS
Broad hardware acceleration (NNAPI, GPU delegate, Core ML delegate)
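To make the INT8 point concrete, the sketch below shows the arithmetic behind affine quantization: a float range is mapped onto 8-bit integers through a scale and a zero point. This is an illustration in plain Python, not TensorFlow Lite's converter code, and the function names and the example range are assumptions chosen for the demo.

```python
def quantization_params(min_val, max_val, qmin=-128, qmax=127):
    """Compute (scale, zero_point) for affine INT8 quantization."""
    # The float range must include 0.0 so that zero is represented exactly.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    if max_val == min_val:
        return 1.0, 0
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = int(round(qmin - min_val / scale))
    # Keep the zero point inside the integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to its nearest INT8 code, clamping out-of-range values."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an INT8 code back to an approximate float."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    # Hypothetical activation range, for illustration only.
    scale, zero_point = quantization_params(0.0, 2.55)
    q = quantize(1.0, scale, zero_point)
    print(scale, zero_point, q, dequantize(q, scale, zero_point))
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half a scale step for in-range values. That trade is why quantization support weighs heavily when the model has to fit on a phone.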
How We Trained the YOLO-Nano Model
Running ML directly on mobile devices offers speed and privacy, but it requires careful tuning of model size and efficient use of memory and battery.
To enable real-time capsule recognition directly on mobile devices, we trained a custom YOLO-Nano model using a structured and efficient pipeline:
Environment Setup
Prepared a clean development environment with support for GPU acceleration and integrated tools like Roboflow for dataset management.
Dataset Preparation
Recorded a 60-second video of the capsules
Used Roboflow to extract images from that video, covering varied lighting conditions and angles to improve generalization
Labeled the resulting 1,600 images across 4 capsule types
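Roboflow handled frame extraction for us, so the snippet below is only a sketch of the underlying idea: picking evenly spaced frames from a recording. The `sample_frame_indices` helper is hypothetical, and the 30 fps frame rate in the usage example is an assumption, since the frame rate of our video is not stated here.

```python
def sample_frame_indices(total_frames, num_samples):
    """Return up to `num_samples` evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the video contains.
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    # Assumption: a 60-second clip recorded at 30 fps gives 1,800 frames.
    total_frames = 60 * 30
    indices = sample_frame_indices(total_frames, 1600)
    print(len(indices), indices[:5], indices[-1])
```

Spreading the samples across the whole clip, instead of taking the first frames, keeps the lighting and angle changes captured during recording represented in the dataset.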
Model Configuration
Selected YOLO-Nano with TensorFlow Lite for its balance of speed, accuracy, and size
Tuned it specifically for mobile deployment (Android/iOS)
Training
We trained the model on several setups: Roboflow-exported training workflows, our own GPU-equipped workstations, and dedicated servers. All of them produced equivalent performance, which suggests that the dataset is stable and that the YOLO-Nano architecture is robust to the training environment.
Evaluation
Achieved a mean Average Precision (mAP) between 50% and 75%, depending on capsule type and test conditions.
Performance was validated using precision-recall curves and live detection tests on mobile devices.
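For readers unfamiliar with the metric, the sketch below shows how average precision is derived from a precision-recall curve and then averaged over classes. It is a simplified illustration, not our evaluation code: it uses all-point interpolation, assumes detections have already been matched to ground truth (typically by an IoU threshold), and all function names are our own for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(scored_matches, num_ground_truth):
    """Area under the precision-recall curve for one class.

    `scored_matches` holds one (confidence, is_true_positive) pair per
    detection; matching against ground truth is assumed to be done already.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(scored_matches, key=lambda m: m[0], reverse=True)

    # Walk down the ranking and record (recall, precision) at each step.
    points = []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        points.append((true_positives / num_ground_truth, true_positives / rank))

    # Replace each precision by the best precision at any higher recall.
    best = 0.0
    envelope = []
    for recall, precision in reversed(points):
        best = max(best, precision)
        envelope.append((recall, best))
    envelope.reverse()

    # Sum the rectangles under the interpolated curve.
    ap = 0.0
    previous_recall = 0.0
    for recall, precision in envelope:
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """Average the per-class AP values."""
    if not ap_per_class:
        return 0.0
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    # Made-up detections for a single class, for illustration only.
    detections = [(0.9, True), (0.8, False), (0.7, True)]
    print(average_precision(detections, num_ground_truth=2))
```

Because mAP averages per-class scores, one capsule type that is hard to tell apart can pull the overall figure down even when the others are detected reliably, which is consistent with the per-type spread we observed.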
Roboflow: Our Dataset Ally
Roboflow is an end-to-end computer vision platform that helps developers and businesses create, train, and deploy custom AI models. It streamlines the entire workflow, from data management and annotation to training, evaluation, and deployment.
Roboflow helped us:
Organize and annotate the dataset
Apply preprocessing and augmentation (e.g., rotation, brightness, flipping)
Train the model
Export in YOLO-compatible format
Visualize predictions and iterate quickly
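As background on the YOLO-compatible export, YOLO text labels store one object per line as a class index followed by the box center, width, and height, all normalized to the image size. The sketch below converts between pixel boxes and that format. The helper names, the class index, and the image dimensions are illustrative assumptions, not values from our dataset.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2.0 / image_width
    y_center = (y_min + y_max) / 2.0 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line, image_width, image_height):
    """Convert a YOLO label line back to (class_id, pixel box)."""
    parts = line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:5])
    x_min = (x_center - width / 2.0) * image_width
    y_min = (y_center - height / 2.0) * image_height
    x_max = (x_center + width / 2.0) * image_width
    y_max = (y_center + height / 2.0) * image_height
    return class_id, (x_min, y_min, x_max, y_max)


if __name__ == "__main__":
    # Hypothetical annotation: class 2 in a 640x480 image.
    line = to_yolo_line(2, (100, 200, 300, 400), 640, 480)
    print(line)
    print(from_yolo_line(line, 640, 480))
```

Normalized coordinates keep the labels valid when images are resized during preprocessing, which is convenient when the same dataset is exported at different resolutions.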
Where can this be applied?
This technology can be extended to:
Enriching digital inventory systems
Automating quality control
Powering accessibility features
Smart home automation
Enhancing retail experiences
Explore our AI expertise
If you're curious about AI in real-world applications or want to collaborate, connect with SQLI Spain!