Run batch inference on stored data

Use viam infer to run a deployed ML model against an image that is already stored in the Viam Cloud. The model runs in Viam’s cloud infrastructure, so this is the path to use when:

  • The target machine cannot run the model locally (not enough compute, wrong architecture, or the model requires a GPU).
  • You want to try a model against a large backlog of captured images without reconfiguring the machine.
  • You are building a labeling-assistance pipeline: an existing model proposes initial labels on stored images, which a human then reviews.
  • You need to validate a new model against historical production images before deploying it to live machines.

viam infer is a CLI command that processes one stored image per invocation; batch runs loop over it. For live inference on a camera feed, use a vision service on the machine instead.

Prerequisites

  • The Viam CLI installed and authenticated (viam login).
  • An image captured to the Viam Cloud with a known binary data ID.
  • A deployed ML model in the Viam registry (yours or shared with you).
  • Your organization ID and the model’s organization ID.

1. Find the binary data ID

  1. Navigate to the DATA tab in the Viam app.
  2. Filter to the image you want to run inference on.
  3. Click the image to open its side panel.
  4. Copy the binary data ID from the panel header.

2. Find the model information

  1. Navigate to the MODELS tab.
  2. Find the model you want to run.
  3. Note:
    • The model name.
    • The organization ID that owns the model.
    • The specific version you want to use, shown in the version dropdown (typically a timestamp).

3. Find your organization ID

Run:

viam organizations list

Copy the organization ID for the organization that should run the inference (this can be a different organization from the model’s owner).

4. Run the command

viam infer \
  --binary-data-id <binary-data-id> \
  --model-name <model-name> \
  --model-org-id <org-that-owns-model> \
  --model-version <version> \
  --org-id <org-that-runs-inference>

Flag reference

Flag               Required  Description
--binary-data-id   Yes       ID of the image to run the model against. From the DATA tab.
--model-name       Yes       Model name as it appears in the MODELS tab.
--model-org-id     Yes       ID of the organization that owns the model.
--model-version    Yes       Specific model version to use, typically a timestamp like 2025-04-14T16-38-25. Does not accept latest.
--org-id           No        ID of the organization that runs the inference. Defaults to the model's organization.

Example output

Inference Response:
Output Tensors:
  Tensor Name: num_detections
    Shape: [1]
    Values: [1.0000]
  Tensor Name: classes
    Shape: [32 1]
    Values: [...]
  Tensor Name: boxes
    Shape: [32 1 4]
    Values: [...]
  Tensor Name: confidence
    Shape: [32 1]
    Values: [...]
Annotations:
Bounding Box Format: [x_min, y_min, x_max, y_max]
  No annotations.

Bounding box coordinates are returned as proportions between 0 and 1, with (0, 0) in the top-left and (1, 1) in the bottom-right. Multiply by the image width and height to get pixel coordinates.
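As a minimal sketch of that conversion, the following shell snippet scales a proportional box to pixels; the 640×480 image size and the box values are hypothetical examples, not output from a real inference:

```shell
# Convert a proportional bounding box [x_min, y_min, x_max, y_max]
# to pixel coordinates. Image size and box values are hypothetical.
WIDTH=640
HEIGHT=480
BOX="0.25 0.10 0.75 0.90"   # x_min y_min x_max y_max as proportions
PIXELS=$(echo "$BOX" | awk -v w="$WIDTH" -v h="$HEIGHT" \
  '{ printf "[%d, %d, %d, %d]", $1*w, $2*h, $3*w, $4*h }')
echo "$PIXELS"
# → [160, 48, 480, 432]
```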

Script the command for many images

viam infer runs against one image per invocation. To run against many images, script the CLI call in a loop.

Bash example

#!/usr/bin/env bash
# Run a model against every image matching a filter and print each response.
set -euo pipefail

MODEL_NAME="person-detector"
MODEL_ORG="abcdef12-0000-0000-0000-000000000000"
MODEL_VERSION="2025-04-14T16-38-25"
ORG_ID="ghijkl34-0000-0000-0000-000000000000"

# Get binary data IDs for all images captured in a time range
# (export flags may vary by CLI version; adjust as needed).
viam data export --mime-types image/jpeg \
  --start 2025-04-01T00:00:00Z --end 2025-04-07T23:59:59Z \
  --output-format ids > ids.txt

while read -r BIN_ID; do
  echo "=== $BIN_ID ==="
  viam infer \
    --binary-data-id "$BIN_ID" \
    --model-name "$MODEL_NAME" \
    --model-org-id "$MODEL_ORG" \
    --model-version "$MODEL_VERSION" \
    --org-id "$ORG_ID"
done < ids.txt

Run time is a few seconds per image plus cold-start time on the first call. Larger models take longer.
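As a rough planning aid before launching a batch, you can estimate total time with shell arithmetic; the ~3 seconds per image and the 5000-image backlog below are assumptions for illustration, not documented figures:

```shell
# Rough batch-time estimate. The ~3 s/image figure and the backlog
# size are assumptions; substitute numbers from a small test run.
N_IMAGES=5000
SECS_PER_IMAGE=3
TOTAL_SECS=$((N_IMAGES * SECS_PER_IMAGE))
echo "~$((TOTAL_SECS / 60)) minutes"
# → ~250 minutes
```

Timing a handful of real invocations first gives a better per-image figure than any guess.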

Rate limits and cost

Cloud inference consumes Viam cloud compute. Check your organization’s billing page before starting a large batch. For backlogs above a few thousand images, consider:

  • Filtering the images to the subset that actually needs inference.
  • Sampling rather than running every image.
  • Running inference on-machine instead, if the machines are powerful enough.
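Sampling can be done with a one-line filter over the ids.txt file produced by the bash example above. This sketch fabricates a stand-in ids.txt so it runs on its own; with a real file, only the awk line is needed:

```shell
# Demo: create a stand-in ids.txt with 20 fake IDs, then keep
# every 10th ID (lines 1, 11, ...) for a 10% sample.
seq -f "id-%g" 1 20 > ids.txt
awk 'NR % 10 == 1' ids.txt > sampled.txt
cat sampled.txt
# → id-1
# → id-11
```

Feed sampled.txt to the loop in the bash example instead of ids.txt.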

When not to use viam infer

  • Live camera feeds: use a vision service on the machine. viam infer is not for real-time inference.
  • Raw tensor outputs for non-vision uses: use the ML model service API directly; viam infer is a CLI wrapper, not a general tensor API.
  • Modifying or retraining the model: this command only runs inference; to retrain, use the model training workflow instead.

Next steps