Configure a vision pipeline

You have a camera on your machine, and you have an ML model: either a pre-trained one from the registry, one you trained yourself (managed training or a custom training script), or one you brought from elsewhere. This how-to wires them up: an ML model service loads the model and a vision service turns the model’s output into detections, classifications, or 3D point cloud objects.

Downstream how-tos (detect, classify, track, measure depth) assume the pipeline described here is running.

What you are configuring, in one paragraph

Viam splits ML inference into two services. The ML model service loads the model file and runs tensors through it. The vision service turns those tensors into structured detections and classifications and ties them to a specific camera. Your application code talks to the vision service; the ML model service is a building block underneath. For the longer explanation and when the split matters, see How a vision service works.

1. Add an ML model service

Choose the ML model service that matches your model file’s framework. For most tasks, tflite_cpu is the right starting point.

  1. Navigate to the CONFIGURE tab of your machine in the Viam app.
  2. Click the + icon next to your machine part and select Service.
  3. In the search field, type tflite_cpu (or the service matching your model format) and select the matching result. For other frameworks, see the framework table.
  4. Name the service my-ml-model and click Create.

2. Configure the ML model service

Point the service at your model file. The Builder flow populates the config for you; the JSON tab shows what you end up with (or what to write manually, for example for a local model file).

  1. Click your new ML model service card on the CONFIGURE tab.
  2. Click Select model. A dialog titled “Select a model” opens.
  3. Use the My models and Registry tabs to switch between models in your organization and public models in the Viam Registry. Use the search field and the Task type, Framework, and Visibility filters to narrow the list.
  4. Click a model card to open its details view. Pick a Version from the dropdown: Latest (auto-updates when a newer version is published) or a specific timestamp version (recommended for production).
  5. Click Choose. The dialog closes and the service panel now shows the selected model as a pill with its version and author.

Behind the scenes, the builder adds a packages entry for the model and sets model_path and label_path on the service attributes to point into the package. Registry models ship with their own labels.txt, so you do not create one yourself.

{
  "name": "my-ml-model",
  "api": "rdk:service:mlmodel",
  "model": "tflite_cpu",
  "attributes": {
    "model_path": "${packages.my-model}/model.tflite",
    "label_path": "${packages.my-model}/labels.txt"
  }
}

${packages.my-model} resolves to the directory where the registry package was downloaded. Replace my-model with the name of your deployed model package; the label_path above then resolves to the labels.txt that ships inside the package. The packages array at the top level of the machine config names the package to download.
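For reference, a packages entry looks roughly like this. Treat the values as placeholders: the package field combines your organization ID with the model name, and the exact fields are documented in the deployment how-tos.

```json
{
  "packages": [
    {
      "name": "my-model",
      "package": "your-org-id/my-model",
      "type": "ml_model",
      "version": "latest"
    }
  ]
}
```

Pinning version to a specific timestamp instead of latest matches the production recommendation from step 2.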

{
  "name": "my-ml-model",
  "api": "rdk:service:mlmodel",
  "model": "tflite_cpu",
  "attributes": {
    "model_path": "/path/to/your/model.tflite",
    "label_path": "/path/to/your/labels.txt"
  }
}

For local models, label_path is optional but recommended: it maps the numeric class IDs the model outputs to human-readable names like person or car. Provide a .txt file with one label per line, in class-ID order (the first line is class 0).
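For example, a labels.txt for a model whose output classes 0, 1, and 2 are person, cup, and keyboard would read:

```
person
cup
keyboard
```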

For more on the deployment flow, see Deploy a model from the registry or Deploy a custom ML model.

3. Add a vision service

  1. Click the + icon and select Service.
  2. In the search field, type mlmodel and select the vision / mlmodel result.
  3. Name the service my-detector and click Create.

4. Configure the vision service

{
  "name": "my-detector",
  "api": "rdk:service:vision",
  "model": "mlmodel",
  "attributes": {
    "mlmodel_name": "my-ml-model",
    "camera_name": "my-camera"
  }
}

mlmodel_name must match the name of the ML model service from step 1. camera_name is the default camera used by GetDetectionsFromCamera, GetClassificationsFromCamera, and GetObjectPointClouds when no camera name is passed in the call.

For the full list of mlmodel vision service attributes (confidence thresholds, per-label thresholds, tensor remapping, input normalization), see the mlmodel reference. If your detections come out shifted, mirrored, or with unexpected labels, see Tune detection quality to find the attribute that fixes your symptom.
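As one illustration, assuming the default_minimum_confidence and label_confidences attribute names from the mlmodel reference, a config that drops weak detections globally but holds person detections to a higher bar might look like:

```json
{
  "name": "my-detector",
  "api": "rdk:service:vision",
  "model": "mlmodel",
  "attributes": {
    "mlmodel_name": "my-ml-model",
    "camera_name": "my-camera",
    "default_minimum_confidence": 0.6,
    "label_confidences": { "person": 0.8 }
  }
}
```

Check the mlmodel reference for the authoritative attribute names and defaults before copying this.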

5. Save

Click Save in the upper right. viam-server reconfigures in place and initializes both services.

6. Verify with the control card

The fastest check is the Viam app’s live overlay:

  1. Go to the CONTROL tab.
  2. Find your vision service in the component list and open it.
  3. In the Camera dropdown, select the camera whose feed you want the vision service to run on. Detections appear as an overlay on the live camera feed and refresh automatically.

Bounding boxes or classification labels should appear on the live camera feed within a second or two. If you are using a COCO-class general-purpose model, point the camera at a person, a cup, or a keyboard.

If the camera feed appears but no detections are shown, see Tune detection quality.

From code, you can confirm which roles the vision service registered by calling GetProperties. The response is three booleans reporting whether detections, classifications, and 3D point clouds are supported at runtime.
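A sketch of how you might interpret that response in Python. The Properties class below is a stand-in for the SDK's response object so the helper can be exercised without a live machine; on a real connection you would pass the result of `await detector.get_properties()` instead.

```python
from dataclasses import dataclass


# Stand-in for the vision service's GetProperties response: three
# booleans reporting which roles the service supports at runtime.
@dataclass
class Properties:
    classifications_supported: bool
    detections_supported: bool
    object_point_clouds_supported: bool


def supported_roles(props) -> list[str]:
    """Map the three capability booleans to human-readable role names."""
    roles = []
    if props.detections_supported:
        roles.append("detections")
    if props.classifications_supported:
        roles.append("classifications")
    if props.object_point_clouds_supported:
        roles.append("3D point clouds")
    return roles


# A tflite_cpu-backed detector typically reports detections only:
print(supported_roles(Properties(False, True, False)))  # ['detections']
```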

7. Complete configuration

With both services saved, a minimal end-to-end configuration (camera + ML model service + vision service) looks like this:

{
  "components": [
    {
      "name": "my-camera",
      "api": "rdk:component:camera",
      "model": "webcam",
      "attributes": {}
    }
  ],
  "services": [
    {
      "name": "my-ml-model",
      "api": "rdk:service:mlmodel",
      "model": "tflite_cpu",
      "attributes": {
        "model_path": "${packages.my-model}/model.tflite",
        "label_path": "${packages.my-model}/labels.txt"
      }
    },
    {
      "name": "my-detector",
      "api": "rdk:service:vision",
      "model": "mlmodel",
      "attributes": {
        "mlmodel_name": "my-ml-model",
        "camera_name": "my-camera"
      }
    }
  ]
}

viam-server resolves the dependencies between the camera, ML model service, and vision service automatically, so order within the file does not matter.

Try it from code

Verify end-to-end by pulling a detection from your own code.

Install the SDK if you have not already:

pip install viam-sdk

Save as vision_test.py:

import asyncio

from viam.robot.client import RobotClient
from viam.services.vision import VisionClient


async def main():
    opts = RobotClient.Options.with_api_key(
        api_key="YOUR-API-KEY",
        api_key_id="YOUR-API-KEY-ID",
    )
    robot = await RobotClient.at_address("YOUR-MACHINE-ADDRESS", opts)

    detector = VisionClient.from_robot(robot, "my-detector")
    detections = await detector.get_detections_from_camera("my-camera")

    print(f"Found {len(detections)} detections:")
    for d in detections:
        print(f"  {d.class_name}: {d.confidence:.2f}")

    await robot.close()


if __name__ == "__main__":
    asyncio.run(main())

Run it:

python vision_test.py
To run the same check with the Go SDK instead, set up a module:

mkdir vision-test && cd vision-test
go mod init vision-test
go get go.viam.com/rdk

Save as main.go:

package main

import (
    "context"
    "fmt"

    "go.viam.com/rdk/logging"
    "go.viam.com/rdk/robot/client"
    "go.viam.com/rdk/services/vision"
    "go.viam.com/utils/rpc"
)

func main() {
    ctx := context.Background()
    logger := logging.NewLogger("vision-test")

    machine, err := client.New(ctx, "YOUR-MACHINE-ADDRESS", logger,
        client.WithDialOptions(rpc.WithEntityCredentials(
            "YOUR-API-KEY-ID",
            rpc.Credentials{
                Type:    rpc.CredentialsTypeAPIKey,
                Payload: "YOUR-API-KEY",
            })),
    )
    if err != nil {
        logger.Fatal(err)
    }
    defer machine.Close(ctx)

    detector, err := vision.FromRobot(machine, "my-detector")
    if err != nil {
        logger.Fatal(err)
    }

    detections, err := detector.DetectionsFromCamera(ctx, "my-camera", nil)
    if err != nil {
        logger.Fatal(err)
    }

    fmt.Printf("Found %d detections:\n", len(detections))
    for _, d := range detections {
        fmt.Printf("  %s: %.2f\n", d.Label(), d.Score())
    }
}

Run it:

go run main.go

You should see a list of detected objects with their confidence scores. If the list is empty, point the camera at something your model was trained to recognize.
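If the list is long and noisy, a small post-processing helper can keep only confident detections. The Detection class here is a minimal stub for illustration; real detections returned by the Python SDK carry the same class_name and confidence fields.

```python
from dataclasses import dataclass


# Minimal stand-in for the SDK's Detection so the helper can be tried
# without a live machine.
@dataclass
class Detection:
    class_name: str
    confidence: float


def confident_detections(detections, min_confidence=0.5):
    """Keep only detections at or above the confidence floor."""
    return [d for d in detections if d.confidence >= min_confidence]


sample = [Detection("person", 0.91), Detection("cup", 0.32)]
print([d.class_name for d in confident_detections(sample)])  # ['person']
```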

Get the placeholder values from the Viam app:

  1. Open your machine’s CONNECT tab.
  2. Select SDK code sample.
  3. Click the Include API key switch. The snippet on the page regenerates with your machine’s real API key, API key ID, and machine address in place of the <API-KEY>, <API-KEY-ID>, and <MACHINE-ADDRESS> placeholders. Click the copy icon on the snippet to copy the whole thing.
  4. To view or copy API keys separately, open the API keys sidebar item on the same tab.

For the three kinds of registry entries a vision pipeline uses (ML model service implementations, vision service models, public ML models) and how to pick among them, see What’s in the registry for vision. Browse the registry directly at app.viam.com/registry.

What’s next