Live Object Detection Demo with Machine Learning

This live object detection demo shows how computer vision and machine learning algorithms can detect objects in a live video stream. You can interact with the app by moving objects in front of your camera.

The application combines YOLOv7, a real-time object detection algorithm, with a model pre-trained on the COCO dataset. The model is stored in ONNX, an open format for machine learning models, and runs in the browser via WebAssembly, a portable binary instruction format that browsers can execute at near-native speed. The app itself is built in React.

The app runs on most desktop and browser platforms. Detection speed is limited because the app runs entirely in the web browser, on the CPU.

What is Object Detection?

Object detection is a machine learning technique that lets computers recognize and locate objects in images or videos. Training starts with a large dataset of images that have been manually annotated with labels for the objects they contain. From these examples, the model learns to identify visual patterns and to associate them with different types of objects.

When given a new image or video frame, the model uses these learned patterns to identify the presence and location of objects. One common approach is to divide the image into smaller regions and analyze each region for the presence of an object.
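The idea of dividing an image into regions can be sketched in a few lines of Python. This is only a simplified illustration, not the actual YOLOv7 procedure: it splits an image into an evenly spaced grid and assigns each object to the cell that contains the center of its bounding box. The Box type, the function names, and the sample coordinates are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self) -> Tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def grid_cell_for(box: Box, image_w: int, image_h: int, grid: int) -> Tuple[int, int]:
    """Return the (row, col) of the grid cell that contains the box center."""
    cx, cy = box.center()
    col = min(int(cx / image_w * grid), grid - 1)  # clamp centers on the right edge
    row = min(int(cy / image_h * grid), grid - 1)  # clamp centers on the bottom edge
    return row, col


def assign_to_grid(boxes: List[Box], image_w: int, image_h: int,
                   grid: int) -> Dict[Tuple[int, int], List[str]]:
    """Group object labels by the grid cell that holds each object's center."""
    cells: Dict[Tuple[int, int], List[str]] = {}
    for box in boxes:
        cell = grid_cell_for(box, image_w, image_h, grid)
        cells.setdefault(cell, []).append(box.label)
    return cells


if __name__ == "__main__":
    # Made-up boxes in a 640x480 image, split into a 4x4 grid.
    boxes = [Box("cup", 40, 300, 120, 400), Box("person", 200, 50, 500, 470)]
    print(assign_to_grid(boxes, image_w=640, image_h=480, grid=4))
    # prints {(2, 0): ['cup'], (2, 2): ['person']}
```

A real detector predicts boxes and class scores for every region at once; the sketch only shows how a region can be made responsible for the objects centered in it.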

Leverage Pre-Trained Models!

Training a model from scratch requires a large labeled dataset, and collecting and annotating such data is often time-consuming and costly. For many use cases, however, a pre-trained model can serve as a starting point and be adapted with transfer learning. Many such models are open source and freely available.

Transfer learning is a technique in machine learning where an initial model is trained on a large dataset for a specific task, and then the same model is fine-tuned on a smaller dataset for a related task.
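As a toy illustration of this idea, the following Python sketch keeps a "pre-trained" feature extractor frozen and trains only a small new classification head on a handful of examples. Everything here is an assumption made for illustration: the fixed weights stand in for a real pre-trained network, the head is a plain logistic regression, and the six training samples are made up. It is not how the demo's YOLOv7 model was trained.

```python
import math

# Stand-in for a pre-trained backbone: these weights are fixed and never updated.
# (Hypothetical values; a real backbone has millions of learned weights.)
FROZEN_WEIGHTS = [[0.9, -0.4], [0.2, 0.8], [-0.5, 0.3]]


def extract_features(x):
    """Frozen feature extractor: a fixed linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(samples, labels, epochs=500, lr=0.5):
    """Train only a new logistic-regression head on top of the frozen features."""
    features = [extract_features(x) for x in samples]
    weights, bias = [0.0] * len(FROZEN_WEIGHTS), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            pred = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias)
            error = pred - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * error * fi for w, fi in zip(weights, f)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    """Probability that x belongs to class 1, using frozen features + trained head."""
    f = extract_features(x)
    return sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias)


if __name__ == "__main__":
    # Small, made-up dataset for the new task: two classes, three samples each.
    samples = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0], [0.1, 1.0], [0.0, 0.9], [0.2, 0.8]]
    labels = [1, 1, 1, 0, 0, 0]
    weights, bias = fine_tune_head(samples, labels)
    print(round(predict([0.95, 0.05], weights, bias), 3))
    print(round(predict([0.05, 0.95], weights, bias), 3))
```

Because the feature extractor is reused as-is, only a few parameters have to be learned, which is why a small dataset can be enough for the new task.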

The demo above uses a standard YOLOv7 model trained on the COCO dataset. YOLO stands for You Only Look Once, a family of efficient object detection models widely used in computer vision. COCO stands for Common Objects in Context, a large-scale object detection, segmentation, and captioning dataset. COCO contains about 330,000 images and about 2.5 million object instances labeled with object category, object bounding box, and segmentation mask. It is one of the most popular datasets for object detection, and many state-of-the-art object detection models are trained on it. With transfer learning, the model could be fine-tuned to detect new objects or finer-grained categories, such as “teacup” or “coffee cup” instead of just “cup”.
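To make the annotation structure more concrete, here is a minimal Python sketch that reads a COCO-style annotation excerpt and converts each bounding box to corner coordinates. COCO stores a box as [x, y, width, height], with (x, y) at the top-left corner. The sample JSON below is made up for illustration (file name, ids, and box values are assumptions), and the load_boxes helper is hypothetical, not part of any COCO tooling.

```python
import json

# A tiny, made-up excerpt in the COCO annotation layout (values are illustrative).
SAMPLE = """
{
  "images": [{"id": 1, "file_name": "kitchen.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person"}, {"id": 47, "name": "cup"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 47, "bbox": [40, 300, 80, 100]},
    {"id": 11, "image_id": 1, "category_id": 1, "bbox": [200, 50, 300, 420]}
  ]
}
"""


def load_boxes(coco_json):
    """Return (label, x_min, y_min, x_max, y_max) tuples from COCO-style JSON."""
    data = json.loads(coco_json)
    # Map numeric category ids to human-readable names.
    names = {c["id"]: c["name"] for c in data["categories"]}
    boxes = []
    for ann in data["annotations"]:
        x, y, w, h = ann["bbox"]  # top-left corner plus width and height
        boxes.append((names[ann["category_id"]], x, y, x + w, y + h))
    return boxes


if __name__ == "__main__":
    for box in load_boxes(SAMPLE):
        print(box)
    # prints ('cup', 40, 300, 120, 400)
    # prints ('person', 200, 50, 500, 470)
```

A fine-tuning dataset for finer-grained labels could follow the same layout, with additional entries such as "teacup" in the categories list.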