Computer vision

Computer vision on Viam turns camera frames into structured results your code can act on: bounding boxes, class labels, or 3D point cloud objects. Three built-in vision service models cover the common tasks, and the registry has more for specialized cases.

What the vision service does

The vision service exposes a single API. Your code calls the same methods whether the underlying detector is an ML model, a color heuristic, or a 3D segmenter. The implementation you pick decides what the service recognizes; the code you write does not change.
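To make the "one API, many implementations" idea concrete, here is a minimal plain-Python sketch (not the Viam SDK; the `Detection` shape and detector functions are illustrative stand-ins) showing client code that is identical no matter which backend produces the detections:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection:
    """Illustrative stand-in for a vision service detection result."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    class_name: str
    confidence: float

# Two stand-in backends with the same signature: the caller cannot
# tell an ML detector from a color heuristic.
def ml_model_detector(image) -> List[Detection]:
    return [Detection(10, 20, 110, 220, "person", 0.91)]

def color_detector(image) -> List[Detection]:
    return [Detection(40, 40, 80, 90, "red", 0.99)]

def act_on_detections(detect: Callable[[object], List[Detection]], image) -> List[str]:
    # Identical client code regardless of the backend implementation.
    return [d.class_name for d in detect(image) if d.confidence > 0.5]
```

Swapping `ml_model_detector` for `color_detector` changes what is recognized, not how the caller works, which is the same property the vision service API gives you.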

Available methods:

  - GetDetectionsFromCamera: get bounding boxes from a camera feed
  - GetDetections: get bounding boxes from an image you supply
  - GetClassificationsFromCamera: get class labels from a camera feed
  - GetClassifications: get class labels from an image you supply
  - GetObjectPointClouds: get 3D point cloud objects from a depth camera
  - CaptureAllFromCamera: get an image plus all available results in one call
  - GetProperties: check which result types the service supports

Pick a built-in model

Most vision tasks fall into one of three goals. Pick the matching model:

| Goal | Model | When to use it |
| --- | --- | --- |
| Detect or classify objects with a trained ML model | `mlmodel` | General-purpose path: wraps a TFLite, ONNX, TensorFlow, or PyTorch model deployed through the ML model service. |
| Find regions of a specific hue | `color_detector` | The target stands out by color, no training data is available, or the task is too simple for ML. |
| Project 2D detections into 3D point cloud objects | `viam:vision:detections-to-segments` | You need 3D positions of detected objects (for example, to pick them up). Requires a depth camera. |

For more specialized tasks (small-object detection, face recognition, hand pose estimation, specific model architectures), browse the registry.

Detection vs classification vs segmentation

These three task types answer different questions.

Detection asks where objects are in an image. A detector returns one bounding box per object, each with a label and confidence score. Use detection when you need object locations. Stopping a robot when a person enters a zone or guiding an arm to pick up a cup both need detection.
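The "stop when a person enters a zone" case comes down to bounding box geometry. A minimal sketch (plain Python, not the Viam SDK; the `Detection` shape, the zone coordinates, and the 0.6 threshold are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    class_name: str
    confidence: float

# Hypothetical stop zone: the left half of a 640x480 frame.
STOP_ZONE = (0, 0, 320, 480)

def overlaps(det: Detection, zone) -> bool:
    # Standard axis-aligned rectangle intersection test.
    zx0, zy0, zx1, zy1 = zone
    return det.x_min < zx1 and det.x_max > zx0 and det.y_min < zy1 and det.y_max > zy0

def should_stop(detections: List[Detection]) -> bool:
    # Stop only for confident person detections inside the zone.
    return any(
        d.class_name == "person" and d.confidence >= 0.6 and overlaps(d, STOP_ZONE)
        for d in detections
    )
```

The key point is that detection gives you per-object boxes, so spatial logic like this is possible; a classifier could not tell you where in the frame the person is.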

Classification asks what the image contains. A classifier returns a small number of labels with confidence scores for the whole image or a region of it. Use classification when you just need to categorize the scene. “Is this a picture of a cat or a dog?” and “is the conveyor belt clear or blocked?” are classification questions.
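In practice, acting on a classification usually means taking the top label and ignoring low-confidence answers. A minimal sketch (plain Python; the `Classification` shape and the 0.5 cutoff are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Classification:
    """Illustrative stand-in for a classification result: one label per image."""
    class_name: str
    confidence: float

def top_label(classifications: List[Classification],
              min_confidence: float = 0.5) -> Optional[str]:
    # Pick the highest-confidence label, or None if nothing is confident enough.
    best = max(classifications, key=lambda c: c.confidence, default=None)
    if best is not None and best.confidence >= min_confidence:
        return best.class_name
    return None
```

Returning `None` below the cutoff matters for cases like the conveyor belt: an uncertain "clear" should not be treated the same as a confident one.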

3D segmentation asks where in 3D space objects are. A 3D segmenter returns a point cloud per object with coordinates in the camera frame. Use it when a robot needs physical positions. Planning an arm motion to an object or feeding obstacle positions into a navigation stack both need 3D segmentation.
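A common first step with a per-object point cloud is reducing it to a single target position, such as its centroid, before handing it to a motion planner. A minimal sketch (plain Python; the point-list representation is an illustrative assumption, with coordinates in the camera frame as described above):

```python
from typing import List, Tuple

# One 3D point in the camera's coordinate frame, in meters.
Point = Tuple[float, float, float]

def centroid(points: List[Point]) -> Point:
    """Average of an object's points: a simple grasp or goal target."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )
```

Note the result is still in the camera frame; moving an arm to it requires transforming into the arm's frame first.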

Verify that it works

After configuring a vision service, the fastest way to confirm it is producing results is the vision control card in the Viam app:

  1. Open your machine in the Viam app.
  2. Navigate to the CONTROL tab and click your vision service.
  3. In the Camera dropdown, select the camera whose feed you want the vision service to run on. Detections or classifications appear as an overlay on the live camera feed at up to 20 frames per second. The overlay refreshes automatically.

The control card calls CaptureAllFromCamera under the hood, so what you see on screen matches what your code receives. If the overlay is empty, call GetProperties to confirm the service supports the result type you expect (detections, classifications, or object point clouds), then lower the confidence threshold and try again.
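That troubleshooting check can be sketched in a few lines. This is plain Python, not the Viam SDK: the `VisionProperties` fields mirror the capability flags GetProperties reports, but the dataclass and helper function here are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class VisionProperties:
    """Illustrative mirror of the capability flags GetProperties reports."""
    classifications_supported: bool
    detections_supported: bool
    object_point_clouds_supported: bool

def explain_empty_overlay(props: VisionProperties, expecting: str) -> str:
    # First rule out a capability mismatch, then suspect the threshold.
    supported = {
        "detections": props.detections_supported,
        "classifications": props.classifications_supported,
        "object_point_clouds": props.object_point_clouds_supported,
    }
    if not supported[expecting]:
        return f"service does not support {expecting}; check its model type"
    return "supported; try lowering the confidence threshold"
```

This mirrors the debugging order above: capability mismatch first (a classifier will never draw boxes), confidence threshold second.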

Close the loop with retraining

Accuracy almost always drops when a model moves from the lab to production. The training-to-production gap is one of the most common reasons a vision system fails after launch. Viam provides the pieces to close that loop on production machines:

  1. Capture failing images from deployed machines with data capture.
  2. Label the new images in the DATA tab and update a dataset.
  3. Retrain the model with managed training or a custom script.
  4. Deploy the new model version through the ML model service and push it to your fleet.
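Step 1 of the loop above needs a rule for which frames are worth capturing. A common heuristic is to flag frames where the model was uncertain. A minimal sketch (plain Python; the `Detection` shape and the 0.4–0.7 "ambiguous band" are illustrative assumptions, not a Viam default):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    class_name: str
    confidence: float

def should_capture_for_retraining(detections: List[Detection],
                                  low: float = 0.4, high: float = 0.7) -> bool:
    """Flag frames where the model was uncertain: either nothing was
    detected, or the best detection fell in an ambiguous confidence band."""
    if not detections:
        return True
    best = max(d.confidence for d in detections)
    return low <= best < high
```

A filter like this keeps the captured dataset focused on the failure cases that retraining actually needs, rather than storing every frame.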

This capture, label, train, deploy, monitor cycle is often called CVOps (computer vision operations). Viam’s data, training, and fleet features provide the pipeline; the vision service is where your code meets the model.

Where to go next

Get started

Classification

Object detection

3D vision

Deploy and maintain models

Reference