Train a model
Submit a training job from your labeled dataset. Viam runs the job on cloud infrastructure – no GPU provisioning or framework installation needed. Training logs are available for 7 days after the job completes.
For background on model frameworks (TFLite and TensorFlow), task types, and how deployment works, see the overview.
1. Start a training job from the web UI
- Go to app.viam.com.
- Click the DATA tab in the top navigation.
- Click the DATASETS subtab.
- Click the dataset you want to train on.
- Click Train model.
- Select the model framework:
- TFLite for edge devices (recommended for most use cases)
- TF for general-purpose models requiring more compute
- Enter a name for your model. Use a descriptive name like part-inspector-v1 or package-detector-v1. This name identifies the model in your organization's registry.
- Select the task type:
- Single Label Classification if each image has one tag
- Multi Label Classification if images have multiple tags
- Object Detection if you used bounding box annotations
- Select which labels to include in training. You can exclude labels that have too few examples or that you do not want the model to learn.
- Click Train model.
The training job starts. You will see a confirmation message with the job ID.
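Excluding under-represented labels is a judgment call; labels with very few examples tend to hurt training. A minimal sketch of how you might audit label counts before picking which labels to include — the input format and threshold here are illustrative, not part of the Viam CLI or SDK:

```python
from collections import Counter

def labels_to_train(image_tags: list[list[str]], min_examples: int = 10) -> list[str]:
    """Return labels with at least min_examples tagged images.

    image_tags: one list of tags per image in the dataset (illustrative input;
    in practice you would export this from your Viam dataset).
    """
    counts = Counter(tag for tags in image_tags for tag in tags)
    return sorted(label for label, n in counts.items() if n >= min_examples)

# Example: "scratch" appears on too few images to train on reliably.
tags = [["good-part"]] * 12 + [["defective-part"]] * 11 + [["scratch"]] * 3
print(labels_to_train(tags, min_examples=10))  # ['defective-part', 'good-part']
```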
2. Start a training job from the CLI
If you prefer the command line, use the Viam CLI:
viam train submit managed \
--dataset-id=YOUR-DATASET-ID \
--model-org-id=YOUR-ORG-ID \
--model-name=part-inspector-v1 \
--model-type=single_label_classification \
--model-framework=tflite \
--model-labels=good-part,defective-part
Required flags:
| Flag | Description | Accepted values |
|---|---|---|
| --dataset-id | Dataset to train on | Your dataset ID |
| --model-org-id | Organization to save the model in | Your organization ID |
| --model-name | Name for the trained model | Any string |
| --model-type | Task type | single_label_classification, multi_label_classification, object_detection |
| --model-framework | Model framework | tflite, tensorflow |
| --model-labels | Labels to train on | Comma-separated list of labels from your dataset |
--model-version is optional and defaults to the current timestamp.
The command returns a training job ID that you can use to check status.
3. Monitor training progress
Web UI:
- In the Viam app, click the DATA tab.
- Click the MODELS subtab, then expand Active Training.
- You will see a list of training jobs with their status:
- Pending – the job is queued
- In Progress – training is running
- Completed – the model is ready
- Failed – something went wrong
- Canceled – the job was canceled before completing
- Click a job ID to view detailed logs.
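Of the statuses above, Completed, Failed, and Canceled are terminal; Pending and In Progress are not. A script that polls a job can use that distinction to decide when to stop. This helper is illustrative, not a Viam SDK function:

```python
# Terminal statuses: the job will not change state again.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}

def should_keep_polling(status: str) -> bool:
    """True while a training job may still change state."""
    return status not in TERMINAL_STATUSES

print(should_keep_polling("In Progress"))  # True
print(should_keep_polling("Completed"))    # False
```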
CLI:
Check the status of a training job:
viam train get --job-id=YOUR-JOB-ID
View training logs:
viam train logs --job-id=YOUR-JOB-ID
Training logs expire after 7 days. If you need to retain logs for longer, copy them before they expire.
You can also check a job programmatically. With the Python SDK:

async def main():
    viam_client = await connect()  # connect() returns an authenticated ViamClient
    ml_training_client = viam_client.ml_training_client
    job = await ml_training_client.get_training_job(
        id="YOUR-TRAINING-JOB-ID",
    )
    print(f"Status: {job.status}")
    print(f"Model name: {job.model_name}")
    print(f"Created: {job.created_on}")
    viam_client.close()
With the Go SDK:

job, err := mlTrainingClient.GetTrainingJob(ctx, "YOUR-TRAINING-JOB-ID")
if err != nil {
    logger.Fatal(err)
}
fmt.Printf("Status: %v\n", job.Status)
fmt.Printf("Model name: %s\n", job.ModelName)
fmt.Printf("Created: %v\n", job.CreatedOn)
4. Test your model
After training completes, test the model by deploying it to a machine with a vision service and checking its predictions against live or captured data.
- Deploy the model to a machine with a camera.
- Configure a vision service that uses the model.
- On the machine’s CONTROL tab, open the vision service panel to see live classifications or detections.
- Evaluate the results against a variety of conditions:
- Images that clearly belong to each class (should get high confidence)
- Ambiguous images (helps you understand the model’s decision boundary)
- Images from conditions not in the training set (reveals generalization gaps)
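When spot-checking predictions, it helps to tally results per class rather than eyeballing them. A small sketch for scoring predictions against known labels — the data here is made up for illustration:

```python
from collections import defaultdict

def per_class_accuracy(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """pairs: (true_label, predicted_label) for each test image."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for truth, pred in pairs:
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}

results = [
    ("good-part", "good-part"),
    ("good-part", "good-part"),
    ("good-part", "defective-part"),
    ("defective-part", "defective-part"),
]
print(per_class_accuracy(results))
# {'good-part': 0.6666666666666666, 'defective-part': 1.0}
```

A per-class breakdown surfaces exactly which label needs more training data, which feeds directly into the iteration loop in the next step.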
5. Deploy and iterate
When training completes, the model is stored in your organization’s registry. See Deploy a model to a machine to configure the module, ML model service, and vision service on your machine.
After deploying, improve your model by collecting targeted data where it struggles (edge cases, counterexamples, varied conditions), using auto-annotation to label efficiently, and retraining. If your machine is configured to use the model, the new version deploys automatically.
To review past training jobs:
With the Python SDK:

async def main():
    viam_client = await connect()  # connect() returns an authenticated ViamClient
    ml_training_client = viam_client.ml_training_client
    jobs = await ml_training_client.list_training_jobs(
        org_id=ORG_ID,
    )
    for job in jobs:
        print(f"Job: {job.id}, Status: {job.status}, "
              f"Model: {job.model_name}, Created: {job.created_on}")
    viam_client.close()
With the Go SDK:

jobs, err := mlTrainingClient.ListTrainingJobs(
    ctx, orgID, app.TrainingStatusUnspecified)
if err != nil {
    logger.Fatal(err)
}
for _, job := range jobs {
    fmt.Printf("Job: %s, Status: %v, Model: %s, Created: %v\n",
        job.ID, job.Status, job.ModelName, job.CreatedOn)
}
What’s next
- Deploy a model to a machine – configure the module, ML model service, and vision service to run your model.
- Add computer vision – the full guide to configuring vision services and cloud inference.
- Detect objects (2D) – use your object detection model to find and locate objects in camera images.
- Classify images – use your classification model to categorize images from your machine’s camera.