Computer vision
Computer vision on Viam turns camera frames into structured results your code can act on: bounding boxes, class labels, or 3D point cloud objects. Three built-in vision service models cover the common tasks, and the registry has more for specialized cases.
What the vision service does
The vision service exposes a single API. Your code calls the same methods whether the underlying detector is an ML model, a color heuristic, or a 3D segmenter. The implementation you pick decides what the service recognizes; the code you write does not change.
Available methods:
- GetDetections and GetDetectionsFromCamera return 2D bounding boxes with labels and confidence scores.
- GetClassifications and GetClassificationsFromCamera return top-N label-confidence pairs for the whole image.
- GetObjectPointClouds returns 3D point cloud objects, each with a label.
- CaptureAllFromCamera returns an image, its detections, classifications, and point clouds in a single round trip. Use this when you need more than one kind of result.
- GetProperties reports which result types the service supports at runtime.
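Downstream code usually filters these results before acting on them. Here is a minimal sketch in plain Python, assuming detections arrive as objects with class name, confidence, and box-corner fields; the field and function names are illustrative, not the SDK's exact types:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    # Illustrative stand-in for one vision service detection result.
    class_name: str
    confidence: float
    x_min: int
    y_min: int
    x_max: int
    y_max: int

def confident_detections(detections, threshold=0.7):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.confidence >= threshold]

dets = [
    Detection("person", 0.92, 10, 20, 110, 220),
    Detection("cup", 0.41, 300, 40, 340, 90),
]
print([d.class_name for d in confident_detections(dets)])  # -> ['person']
```

The same filtering works regardless of which detector model produced the boxes, which is the point of the shared API.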
Pick a built-in model
Most vision tasks fall into one of three goals. Pick the matching model:
| Goal | Model | When to use it |
|---|---|---|
| Detect or classify objects with a trained ML model | mlmodel | General-purpose path: wraps a TFLite, ONNX, TensorFlow, or PyTorch model deployed through the ML model service. |
| Find regions of a specific hue | color_detector | The target stands out by color, no training data is available, or the task is too simple for ML. |
| Project 2D detections into 3D point cloud objects | viam:vision:detections-to-segments | You need 3D positions of detected objects (for example, to pick them up). Requires a depth camera. |
For more specialized tasks (small-object detection, face recognition, hand pose estimation, specific model architectures), browse the registry.
Detection vs classification vs segmentation
These three task types answer different questions.
Detection asks where objects are in an image. A detector returns one bounding box per object, each with a label and confidence score. Use detection when you need object locations. Stopping a robot when a person enters a zone or guiding an arm to pick up a cup both need detection.
Classification asks what the image contains. A classifier returns a small number of labels with confidence scores for the whole image or a region of it. Use classification when you just need to categorize the scene. “Is this a picture of a cat or a dog?” and “is the conveyor belt clear or blocked?” are classification questions.
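Picking the top-N labels from raw classifier scores can be sketched in plain Python; the score dictionary and function name here are illustrative:

```python
def top_n_classifications(scores, n=3):
    """Return the n highest-confidence (label, score) pairs,
    sorted from most to least confident."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Illustrative whole-image scores from a classifier.
scores = {"cat": 0.81, "dog": 0.12, "bird": 0.05, "fish": 0.02}
print(top_n_classifications(scores, n=2))  # -> [('cat', 0.81), ('dog', 0.12)]
```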
3D segmentation asks where in 3D space objects are. A 3D segmenter returns a point cloud per object with coordinates in the camera frame. Use it when a robot needs physical positions. Planning an arm motion to an object or feeding obstacle positions into a navigation stack both need 3D segmentation.
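The back-projection behind mapping a 2D detection into camera-frame 3D coordinates can be sketched with the pinhole camera model. This is an illustrative calculation, not the service's actual implementation; fx, fy, cx, cy stand in for the camera's intrinsics:

```python
def pixel_to_camera_frame(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) plus its depth reading into
    camera-frame coordinates using the pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Center of a bounding box at 1.5 m depth, with example intrinsics.
print(pixel_to_camera_frame(320, 240, 1.5, fx=600, fy=600, cx=320, cy=240))
# -> (0.0, 0.0, 1.5)
```

A pixel at the image center maps straight down the optical axis, which is why this example returns (0, 0, depth).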
Verify that it works
After configuring a vision service, the fastest way to confirm it is producing results is the vision control card in the Viam app:
- Open your machine in the Viam app.
- Navigate to the CONTROL tab and click your vision service.
- In the Camera dropdown, select the camera whose feed you want the vision service to run on. Detections or classifications appear as an overlay on the live camera feed at up to 20 frames per second. The overlay refreshes automatically.
The control card calls CaptureAllFromCamera under the hood, so what you see on screen matches what your code receives. If the overlay is empty, call GetProperties to confirm the service supports the result type you expect (a detector will not return classifications), then lower the confidence threshold and try again.
Close the loop with retraining
Accuracy almost always drops when a model moves from the lab to production. The gap between training data and production conditions is one of the most common reasons a vision system fails after launch. Viam provides the pieces to close that loop on production machines:
- Capture failing images from deployed machines with data capture.
- Label the new images in the DATA tab and update a dataset.
- Retrain the model with managed training or a custom script.
- Deploy the new model version through the ML model service and push it to your fleet.
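The monitoring step of that loop can start as simply as comparing recent accuracy against a baseline. A purely illustrative sketch, with made-up thresholds:

```python
def needs_retraining(recent_accuracy, baseline=0.90, tolerance=0.05):
    """Flag a deployed model for retraining when measured accuracy
    drifts more than `tolerance` below the baseline."""
    return recent_accuracy < baseline - tolerance

print(needs_retraining(0.82))  # -> True: capture and label new images
print(needs_retraining(0.93))  # -> False: model is holding up
```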
This capture, label, train, deploy, monitor cycle is often called CVOps (computer-vision operations). Viam’s data, training, fleet, and vision sections are the tools. The vision section is where your code meets the model.
Where to go next
Get started
- How a vision service works: the two-service architecture behind every vision pipeline
- Configure a vision pipeline: end-to-end setup of an ML model service plus vision service
Classification
- Classify images: whole-image labels and scene categorization
Object detection
- Detect objects: 2D bounding boxes in code
- Detect by color: no ML needed
- Tune detection quality: match symptoms to the right mlmodel attribute
- Track objects across frames: persistent IDs across video frames
- Act on detections: trigger machine behavior from vision results
- Alert on detections: send notifications on detections
3D vision
- Segment 3D objects: point cloud objects with 3D coordinates
- Measure depth: distance readings from a depth camera
Deploy and maintain models
- What’s in the registry for vision: the three kinds of registry entries and how to pick among them
- Deploy a model from the registry: pick a pre-trained model and run it on a machine
- Deploy a custom ML model: bring your own trained model
- Retrain when accuracy drops: the CVOps loop for model maintenance
- Roll out a new model to a fleet: staged model rollouts
- Run batch inference: run a model against stored images with viam infer
Reference
- Vision service API
- ML model service API
- mlmodel configuration
- color_detector configuration
- detections-to-segments configuration