Computer vision

Computer vision on Viam turns camera frames into structured results your code can act on: bounding boxes, class labels, or 3D point cloud objects. Three built-in vision service models cover the common tasks, and the registry has more for specialized cases.

What the vision service does

The vision service exposes a single API. Your code calls the same methods whether the underlying detector is an ML model, a color heuristic, or a 3D segmenter. The implementation you pick decides what the service recognizes; the code you write does not change.
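To make the "one API, many implementations" idea concrete, here is a minimal plain-Python sketch (not the Viam SDK; the `Detection` shape and detector functions are illustrative stand-ins) showing client code that is identical no matter which backend produces the detections:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection:
    """Illustrative stand-in for a vision service detection result."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    class_name: str
    confidence: float

# Two stand-in backends with the same signature: the caller cannot
# tell an ML detector from a color heuristic.
def ml_model_detector(image) -> List[Detection]:
    return [Detection(10, 20, 110, 220, "person", 0.91)]

def color_detector(image) -> List[Detection]:
    return [Detection(40, 40, 80, 90, "red", 0.99)]

def act_on_detections(detect: Callable[[object], List[Detection]], image) -> List[str]:
    # Identical client code regardless of the backend implementation.
    return [d.class_name for d in detect(image) if d.confidence > 0.5]
```

Swapping `ml_model_detector` for `color_detector` changes what is recognized, not how the caller works, which is the same property the vision service API gives you.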

Available methods:

  - GetDetectionsFromCamera: get bounding boxes from a camera feed
  - GetDetections: get bounding boxes from an image you supply
  - GetClassificationsFromCamera: get class labels from a camera feed
  - GetClassifications: get class labels from an image you supply
  - GetObjectPointClouds: get 3D point cloud objects from a depth camera
  - CaptureAllFromCamera: get an image plus all available results in one call
  - GetProperties: check which result types the service supports

Pick a built-in model

Most vision tasks fall into one of three goals. Pick the matching model:

| Goal | Model | When to use it |
| --- | --- | --- |
| Detect or classify objects with a trained ML model | `mlmodel` | General-purpose path: wraps a TFLite, ONNX, TensorFlow, or PyTorch model deployed through the ML model service. |
| Find regions of a specific hue | `color_detector` | The target stands out by color, no training data is available, or the task is too simple for ML. |
| Project 2D detections into 3D point cloud objects | `viam:vision:detections-to-segments` | You need 3D positions of detected objects (for example, to pick them up). Requires a depth camera. |

For more specialized tasks (small-object detection, face recognition, hand pose estimation, specific model architectures), browse the registry.

Detection vs classification vs segmentation

These three task types answer different questions.

Detection asks where objects are in an image. A detector returns one bounding box per object, each with a label and confidence score. Use detection when you need object locations. Stopping a robot when a person enters a zone or guiding an arm to pick up a cup both need detection.
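The "stop when a person enters a zone" case comes down to bounding box geometry. A minimal sketch (plain Python, not the Viam SDK; the `Detection` shape, the zone coordinates, and the 0.6 threshold are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    class_name: str
    confidence: float

# Hypothetical stop zone: the left half of a 640x480 frame.
STOP_ZONE = (0, 0, 320, 480)

def overlaps(det: Detection, zone) -> bool:
    # Standard axis-aligned rectangle intersection test.
    zx0, zy0, zx1, zy1 = zone
    return det.x_min < zx1 and det.x_max > zx0 and det.y_min < zy1 and det.y_max > zy0

def should_stop(detections: List[Detection]) -> bool:
    # Stop only for confident person detections inside the zone.
    return any(
        d.class_name == "person" and d.confidence >= 0.6 and overlaps(d, STOP_ZONE)
        for d in detections
    )
```

The key point is that detection gives you per-object boxes, so spatial logic like this is possible; a classifier could not tell you where in the frame the person is.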

Classification asks what the image contains. A classifier returns a small number of labels with confidence scores for the whole image or a region of it. Use classification when you just need to categorize the scene. “Is this a picture of a cat or a dog?” and “is the conveyor belt clear or blocked?” are classification questions.
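In practice, acting on a classification usually means taking the top label and ignoring low-confidence answers. A minimal sketch (plain Python; the `Classification` shape and the 0.5 cutoff are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Classification:
    """Illustrative stand-in for a classification result: one label per image."""
    class_name: str
    confidence: float

def top_label(classifications: List[Classification],
              min_confidence: float = 0.5) -> Optional[str]:
    # Pick the highest-confidence label, or None if nothing is confident enough.
    best = max(classifications, key=lambda c: c.confidence, default=None)
    if best is not None and best.confidence >= min_confidence:
        return best.class_name
    return None
```

Returning `None` below the cutoff matters for cases like the conveyor belt: an uncertain "clear" should not be treated the same as a confident one.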

3D segmentation asks where in 3D space objects are. A 3D segmenter returns a point cloud per object with coordinates in the camera frame. Use it when a robot needs physical positions. Planning an arm motion to an object or feeding obstacle positions into a navigation stack both need 3D segmentation.
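A common first step with a per-object point cloud is reducing it to a single target position, such as its centroid, before handing it to a motion planner. A minimal sketch (plain Python; the point-list representation is an illustrative assumption, with coordinates in the camera frame as described above):

```python
from typing import List, Tuple

# One 3D point in the camera's coordinate frame, in meters.
Point = Tuple[float, float, float]

def centroid(points: List[Point]) -> Point:
    """Average of an object's points: a simple grasp or goal target."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )
```

Note the result is still in the camera frame; moving an arm to it requires transforming into the arm's frame first.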

Verify that it works

After configuring a vision service, the fastest way to confirm it is producing results is the vision control card in the Viam app:

  1. Open your machine in the Viam app.
  2. Navigate to the CONTROL tab and click your vision service.
  3. In the Camera dropdown, select the camera whose feed you want the vision service to run on. Detections or classifications appear as an overlay on the live camera feed at up to 20 frames per second. The overlay refreshes automatically.

The control card calls CaptureAllFromCamera under the hood, so what you see on screen matches what your code receives. If the overlay is empty, call GetProperties to confirm the service supports the result type you expect (detections, classifications, or object point clouds), then lower the confidence threshold and try again.
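That troubleshooting check can be sketched in a few lines. This is plain Python, not the Viam SDK: the `VisionProperties` fields mirror the capability flags GetProperties reports, but the dataclass and helper function here are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class VisionProperties:
    """Illustrative mirror of the capability flags GetProperties reports."""
    classifications_supported: bool
    detections_supported: bool
    object_point_clouds_supported: bool

def explain_empty_overlay(props: VisionProperties, expecting: str) -> str:
    # First rule out a capability mismatch, then suspect the threshold.
    supported = {
        "detections": props.detections_supported,
        "classifications": props.classifications_supported,
        "object_point_clouds": props.object_point_clouds_supported,
    }
    if not supported[expecting]:
        return f"service does not support {expecting}; check its model type"
    return "supported; try lowering the confidence threshold"
```

This mirrors the debugging order above: capability mismatch first (a classifier will never draw boxes), confidence threshold second.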

Close the loop with retraining

Accuracy almost always drops when a model moves from the lab to production. The training-to-production gap is one of the most common reasons a vision system fails after launch. Viam provides the pieces to close that loop on production machines:

  1. Capture failing images from deployed machines with data capture.
  2. Label the new images in the DATA tab and update a dataset.
  3. Retrain the model with managed training or a custom script.
  4. Deploy the new model version through the ML model service and push it to your fleet.
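Step 1 of the loop above needs a rule for which frames are worth capturing. A common heuristic is to flag frames where the model was uncertain. A minimal sketch (plain Python; the `Detection` shape and the 0.4–0.7 "ambiguous band" are illustrative assumptions, not a Viam default):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    class_name: str
    confidence: float

def should_capture_for_retraining(detections: List[Detection],
                                  low: float = 0.4, high: float = 0.7) -> bool:
    """Flag frames where the model was uncertain: either nothing was
    detected, or the best detection fell in an ambiguous confidence band."""
    if not detections:
        return True
    best = max(d.confidence for d in detections)
    return low <= best < high
```

A filter like this keeps the captured dataset focused on the failure cases that retraining actually needs, rather than storing every frame.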

This capture, label, train, deploy, monitor cycle is often called CVOps (computer vision operations). Viam’s data, training, and fleet features provide the pipeline; the vision service is where your code meets the model.

Where to go next

Get started

Classification

Object detection

3D vision

Deploy and maintain models

Reference