AI-Powered Coffee Capsule Recognition

Imagine pointing your phone at a coffee capsule and instantly knowing which one it is.

That’s the idea behind Capsule Scanner, our first use case: a simple yet impactful example of how computer vision can enhance everyday routines.

What did we build?

We developed a mobile-first AI solution capable of detecting and identifying different types of coffee capsules in real time. At its core is a custom-trained deep learning model: YOLO-Nano, a lightweight convolutional neural network optimized for on-device performance.
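Detectors in the YOLO family typically emit several overlapping candidate boxes for each object. These are usually filtered with non-maximum suppression (NMS) before a result is shown to the user. The sketch below is a generic, class-aware NMS in plain Python. It assumes boxes in `(x1, y1, x2, y2)` pixel coordinates. The function names and the 0.5 IoU threshold are illustrative choices, not details of our app's post-processing.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], classes: List[int],
        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept detections, highest score first.

    A box is suppressed only by a higher-scoring box of the SAME class
    that overlaps it by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(classes[i] != classes[k] or iou(boxes[i], boxes[k]) <= iou_threshold
               for k in keep):
            keep.append(i)
    return keep
```

For example, two near-identical boxes of capsule class 0 collapse to the higher-scoring one. A separate box of class 1 survives.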

Why not use external APIs?

While external APIs such as OpenAI Vision and Google Vision offer quick integration, they are general-purpose services and lack the precision and flexibility needed to tell our specific capsule types apart.

By training our own model, we gained:

Higher accuracy tailored to our capsule categories
Full control over model evolution and retraining
True offline capability with predictable latency
Zero per-request cost
Complete user privacy

Training our own model was the clear choice.

Why YOLO-Nano?

After evaluating several lightweight object detection models (YOLOv5n, MobileNet-SSD, and EfficientDet-Lite), we found that YOLO-Nano offered the best balance of:

Real-time performance on mid-range phones
Small model size
Reliable accuracy across varied lighting conditions
Low memory and battery consumption

Because the model runs on the device, the app works fully offline, which makes this approach well suited to mobile apps and edge use cases.

Inference Framework

We benchmarked multiple inference runtimes, including TensorFlow Lite, ONNX Runtime Mobile, and NCNN.
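A fair comparison between runtimes depends on applying the same measurement protocol to each candidate. The following is a minimal, runtime-agnostic harness that might serve this purpose. `run_inference` is a hypothetical zero-argument callable wrapping whichever interpreter call is being measured. It is not a TensorFlow Lite, ONNX Runtime, or NCNN API. The warm-up and iteration counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_inference: Callable[[], object],
              warmup: int = 5, iterations: int = 50) -> Dict[str, float]:
    """Time a zero-argument inference callable; report latency in milliseconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")

    # Warm-up runs absorb one-off costs (lazy allocation, delegate setup, caches).
    for _ in range(warmup):
        run_inference()

    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return {
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "mean_ms": statistics.fmean(samples_ms),
    }
```

Reporting the median and a tail percentile is more informative on phones than reporting a single run. Thermal throttling and background work can make individual timings noisy.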

We selected TensorFlow Lite due to:

Excellent support for model quantization (INT8, FP16)
Fast startup time and low memory overhead
Native compatibility with Android and iOS
Broad hardware acceleration (NNAPI, GPU delegate, Core ML delegate)
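TensorFlow Lite's INT8 quantization maps real values to 8-bit integers with an affine scheme, `real = scale * (q - zero_point)`. Each weight then occupies 1 byte instead of 4. The sketch below reproduces that arithmetic in plain Python so the rounding error is visible. It uses a simplified per-tensor, asymmetric min/max calibration. That is an assumption for illustration, not the converter's exact procedure: for example, TF Lite quantizes INT8 weights symmetrically with a zero point of 0.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # signed INT8 range


def choose_params(min_val: float, max_val: float) -> Tuple[float, int]:
    """Derive a per-tensor scale and zero point covering [min_val, max_val]."""
    # Extend the range to include 0.0 so that zero is represented exactly.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / (QMAX - QMIN)
    if scale == 0.0:
        return 1.0, 0
    zero_point = int(round(QMIN - min_val / scale))
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Map floats to clamped INT8 codes: q = round(v / scale) + zero_point."""
    return [max(QMIN, min(QMAX, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(qs: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floats: v = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in qs]
```

For values inside the calibrated range, the round-trip error is at most half a quantization step (`scale / 2`). This is why a well-calibrated INT8 model can stay close to its FP32 accuracy.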

How We Trained the YOLO-Nano Model

Running ML directly on mobile devices offers speed and privacy, but it requires careful tuning of model size as well as efficient use of memory and battery.

To enable real-time capsule recognition directly on mobile devices, we trained a custom YOLO-Nano model using a structured and efficient pipeline:

Environment Setup

Prepared a clean development environment with GPU acceleration and integrated tools such as Roboflow for dataset management.

Dataset Preparation

Recorded a 60-second video of the capsules.
Used Roboflow to extract frames from that video, capturing varied lighting conditions and angles to improve generalization.
In total, extracted and labeled 1,600 images across 4 capsule types.
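Two small pieces of this step can be sketched concretely. The first function picks evenly spaced frame indices from a video of known length and frame rate. The frame rate in the check below (30 fps) is an assumption, not a property of our recording. The second function converts a pixel-space bounding box into the normalized `class x_center y_center width height` line used by YOLO-format label files, one of the formats Roboflow can export. Both function names are hypothetical.

```python
from typing import List


def sample_frame_indices(duration_s: float, fps: float, n_frames: int) -> List[int]:
    """Pick up to n_frames evenly spaced frame indices from a video."""
    total = int(duration_s * fps)
    if n_frames <= 0 or total <= 0:
        return []
    n = min(n_frames, total)  # cannot sample more frames than the video has
    step = total / n
    return [int(i * step) for i in range(n)]


def to_yolo_line(class_id: int, x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (x1, y1, x2, y2) to a normalized YOLO label line."""
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

Because the coordinates are normalized to the image size, the same label file stays valid if the images are later resized uniformly for training.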

Model Configuration

Selected YOLO-Nano with TF Lite for their balance of speed, accuracy, and size
Tuned the model specifically for mobile deployment (Android/iOS)
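Numeric precision matters on mobile partly because of storage and memory. Each weight takes 4 bytes in FP32, 2 in FP16, and 1 in INT8. The sketch below estimates the raw weight footprint for a given parameter count. The 3-million-parameter figure is a hypothetical round number, not the size of our model. Real `.tflite` files also add some overhead for the graph structure and metadata.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_mb(num_params: int, precision: str) -> float:
    """Approximate raw weight storage in MB (1 MB = 1024 * 1024 bytes)."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision}")
    return num_params * BYTES_PER_PARAM[precision] / (1024 * 1024)


if __name__ == "__main__":
    hypothetical_params = 3_000_000  # illustrative only, not our model's size
    for p in ("fp32", "fp16", "int8"):
        print(f"{p}: {weight_size_mb(hypothetical_params, p):.1f} MB")
```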

Training

We trained the model on several setups (Roboflow-exported training workflows, our own workstations, and dedicated servers), and all of them produced equivalent performance. This suggests that both our dataset and the YOLO-Nano architecture behave consistently across training environments.

Evaluation

Achieved a mean Average Precision (mAP) between 50% and 75%, depending on capsule type and test conditions.
Performance was validated using precision-recall curves and live detection tests on mobile devices.
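For reference, the per-class Average Precision (AP) behind an mAP figure is the area under the precision-recall curve. The sketch below computes AP from scored detections that have already been matched to ground truth. A detection typically counts as a true positive when its IoU with a not-yet-matched ground-truth box exceeds a threshold, commonly 0.5. The sketch uses all-point interpolation in the PASCAL VOC style. COCO-style mAP instead averages over several IoU thresholds and uses 101 recall points, so numbers from the two conventions are not directly comparable. We have not stated which convention produced the figures above. mAP is the mean of the per-class values.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_ground_truth: int) -> float:
    """All-point interpolated AP for one class.

    detections: (confidence, is_true_positive) for every prediction of the class.
    num_ground_truth: number of ground-truth boxes of that class.
    """
    if num_ground_truth <= 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Sentinel points so the curve spans recall 0..1.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_average_precision(per_class_ap: List[float]) -> float:
    """Mean of per-class AP values (0.0 if there are no classes)."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

A per-class view like this shows why mAP can vary with capsule type: classes with more false positives or missed detections pull their own AP down.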

Roboflow: Our Dataset Ally

Roboflow is an end-to-end computer vision platform that helps developers and businesses create, train, and deploy custom AI models. It streamlines the entire workflow, from data management and annotation to training, evaluation, and deployment. Roboflow helped us: