Since this is in the physical AI / real-world robotics area, I think the LeRobot Discord is probably the best place to get the most useful follow-up. But here is how I would separate the pieces first, based on what I can check from the public docs, examples, and related issues:
Short version
I think there are three different visualization/debugging questions mixed together here:
| Question | Tool family | What it helps with |
| --- | --- | --- |
| “Is my recorded dataset sane?” | lerobot-dataset-viz, LeRobot dataset visualizers, Rerun-style episode inspection | Camera streams, robot states, actions, episode structure |
| “How is training progressing?” | W&B, Trackio, TensorBoard, CSV/JSONL logs | Loss, learning rate, grad norm, eval metrics |
| “Will the policy actually work on the robot?” | Evaluation rollouts, open-loop evaluation, real robot testing, dataset/action sanity checks | Success rate, action correctness, camera/state/action mismatch |
So I would not treat lerobot-dataset-viz as a replacement for a learning-curve dashboard. It is a dataset/episode inspection tool. For learning curves, W&B is the documented path in current LeRobot examples. Trackio looks like the most relevant Hugging Face-native, local-first, W&B-like alternative, but I have not seen it confirmed as a one-flag replacement inside lerobot-train.
For SmolVLA/SO101 specifically, I would also be careful not to over-trust the loss curve. There are related LeRobot issues where the loss converged and/or W&B plots looked fine, but evaluation success was still 0%. That suggests that for VLA/robotics, the dataset schema, camera setup, state/action definition, normalization/statistics, and rollout evaluation can matter as much as the scalar training curve.
1. First split: dataset visualization vs training metrics
Dataset/episode visualization
LeRobot has dataset visualization tools for looking at recorded episodes. This is useful for checking things like:
- camera frames
- camera names/views
- robot state streams
- action streams
- episode timing
- whether the recorded behavior looks physically plausible
This kind of tool answers questions like:
Did I record the right cameras, states, actions, and episodes?
It does not directly answer:
Is my loss decreasing over training steps?
Those are different layers.
Training metric visualization
For training curves, the current LeRobot docs show W&B as the normal documented example. In the real-world imitation learning tutorial, wandb.enable=true is described as optional and used for visualizing training plots.
So for training metrics, I would think in terms of:
| Option | Local? | Good for | Caveat |
| --- | --- | --- | --- |
| W&B | Not local-first by default | Mature experiment tracking, training plots, media, artifacts | Requires W&B setup/login unless using offline mode |
| W&B offline | Local logging first | Keeping W&B-style logs without immediate cloud sync | Still W&B-oriented; dashboard workflow may not be what you want |
| Trackio | Yes, local-first | Local scalar curves and lightweight dashboards | Promising, but not necessarily a full W&B replacement for LeRobot |
| TensorBoard | Yes | Classic local scalar curves | May require adding a writer if not already supported |
| CSV/JSONL logs | Yes | Simple, robust, reproducible | No rich dashboard unless you build/plot one |
2. Trackio may be the HF-native option you were thinking of
If you were remembering a Gradio-based Hugging Face alternative to W&B, I think you may be thinking of Trackio.
Trackio is very relevant here because it is:
- Hugging Face-native
- local-first
- W&B-like
- built around a Gradio dashboard
- designed to log experiment metrics
- able to sync/share through Hugging Face Spaces
The Trackio migration docs say that migrating from W&B is usually simple because Trackio uses W&B-like API syntax. In simple scripts, the idea can be as small as:
import trackio as wandb
wandb.init(project="my-project", name="my-run")
wandb.log({"train/loss": 0.123, "train/lr": 1e-4}, step=100)
wandb.finish()
That said, I would be careful with wording here.
I would say:
Trackio looks like the closest Hugging Face-native local/W&B-like option for scalar training curves.
I would not say:
Trackio is a guaranteed drop-in replacement for LeRobot’s current --wandb.enable=true path.
Why not? Because LeRobot appears to have its own W&B-specific logger wrapper rather than only calling plain wandb.log() everywhere. So Trackio may work well with a small custom logger/wrapper, but I would not assume that lerobot-train already exposes something like:
lerobot-train \
--trackio.enable=true
unless that has been added in the specific LeRobot version you are using.
A safer expectation is:
| LeRobot logging feature | Trackio likelihood | Notes |
| --- | --- | --- |
| Scalar metrics: loss, lr, grad norm | High | This is the easiest case |
| Eval metrics | High | If logged as scalars |
| Tables/images | Likely | Trackio has W&B-like media APIs, but exact behavior should be checked |
| Videos | Maybe | Needs checking for the exact current API and dashboard behavior |
| Checkpoint/artifact tracking | Be careful | W&B Artifacts and Trackio storage are not necessarily equivalent |
| Resume/run-id behavior | Be careful | W&B-specific run resume logic may not map 1:1 |
| Full W&B feature parity | No | Trackio is lightweight, not a full W&B clone |
So my practical recommendation would be:
- Use the standard documented W&B path first if you are okay with W&B.
- If you want local-first scalar curves, investigate Trackio.
- If using lerobot-train, assume Trackio may need a small logger wrapper or code patch.
- If you only need a quick local curve, parse stdout/logs or write CSV/JSONL first.
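To make the "small logger wrapper" idea concrete, here is a minimal sketch. It assumes only the W&B-like init/log/finish calls shown earlier. ScalarLogger and PrintBackend are hypothetical names I made up for illustration; this is not LeRobot's logger interface, and wiring it into lerobot-train would still need a code patch for your specific version.

```python
from typing import Any, Mapping, Optional


class ScalarLogger:
    """Forward scalar metrics to any backend exposing init/log/finish.

    Hypothetical helper for illustration; NOT LeRobot's logger interface.
    """

    def __init__(self, backend: Any, project: str, name: str,
                 config: Optional[Mapping[str, Any]] = None) -> None:
        self._backend = backend
        self._backend.init(project=project, name=name, config=dict(config or {}))

    def log(self, metrics: Mapping[str, Any], step: int, prefix: str = "train") -> None:
        # Keep only plain numbers; tables, images, and videos are out of scope.
        scalars = {
            f"{prefix}/{key}": float(value)
            for key, value in metrics.items()
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        }
        if scalars:
            self._backend.log(scalars, step=step)

    def close(self) -> None:
        self._backend.finish()


class PrintBackend:
    """Stand-in backend so the sketch runs without trackio or wandb installed."""

    def __init__(self) -> None:
        self.records = []

    def init(self, project, name, config=None):
        print(f"init project={project} name={name}")

    def log(self, metrics, step=None):
        self.records.append((step, dict(metrics)))

    def finish(self):
        print(f"finished with {len(self.records)} logged steps")
```

Because both trackio and wandb expose init, log, and finish as module-level functions, passing the imported module as the backend (for example ScalarLogger(trackio, project="smolvla-so101", name="local-test")) should work for scalars. Media, artifacts, and resume behavior are deliberately left out.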
3. Why the loss curve is not enough in SmolVLA/SO101
This is the most important robotics-specific point.
In ordinary ML, a learning curve can often tell you a lot. In real-world robotics and VLA training, it is only one signal.
There are related LeRobot issues where training loss or W&B plots looked good, but evaluation did not work.
The main lesson I would take from those is:
A clean loss curve does not guarantee a working rollout.
For SmolVLA/SO101, I would inspect at least these layers:
| Layer | What to check | Why it matters |
| --- | --- | --- |
| Camera setup | Number of cameras, camera names, view order, resolution | VLA policies are sensitive to visual input schema |
| State schema | Shape, order, meaning of observation.state | A converged loss can still learn the wrong mapping if state semantics differ |
| Action schema | Shape, order, joint vs end-effector meaning, gripper representation | Action mismatch can make rollout fail even if training looks fine |
| Dataset metadata | meta/info.json, feature names, fps, codebase version | Confirms what the dataset actually contains |
| Dataset statistics | meta/stats.json, normalization values | Wrong normalization can break policy behavior |
| Episode visualization | Camera/state/action streams | Helps detect recording/config mistakes |
| Evaluation | Open-loop eval, sim eval if available, real rollout | The final check is behavior, not just loss |
| Versioning | LeRobot version, model checkpoint, dataset format version | LeRobot/SmolVLA are moving quickly |
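To illustrate the dataset statistics row, here is a small worked example of generic mean/std normalization. The numbers are made up, and this is a sketch of the general idea rather than LeRobot's exact normalization code. It shows how a policy output that is correct in normalized space turns into a wrong physical command when it is un-normalized with different statistics from the ones used in training.

```python
def normalize(values, mean, std, eps=1e-8):
    """Generic mean/std normalization, applied per dimension."""
    return [(v - m) / (s + eps) for v, m, s in zip(values, mean, std)]


def unnormalize(values, mean, std, eps=1e-8):
    """Inverse of normalize() when given the same mean and std."""
    return [v * (s + eps) + m for v, m, s in zip(values, mean, std)]


# Made-up recorded action (e.g. two joint targets) and training-time stats.
recorded_action = [10.0, -20.0]
train_mean, train_std = [0.0, 0.0], [30.0, 30.0]

# What a perfectly trained policy would output in normalized space.
normalized = normalize(recorded_action, train_mean, train_std)

# Correct stats recover the recorded action.
restored = unnormalize(normalized, train_mean, train_std)

# Mismatched stats (here: std twice as large) double the commanded motion,
# even though the training loss in normalized space would look identical.
wrong_std = [60.0, 60.0]
distorted = unnormalize(normalized, train_mean, wrong_std)

print("restored :", restored)
print("distorted:", distorted)
```

The loss is computed in normalized space, so this kind of mismatch does not show up in the training curve. It only shows up in what the robot does.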
The SmolVLA docs describe SmolVLA as taking multiple camera views, the current sensorimotor state, and a natural language instruction, then generating an action chunk. That means the model is not just learning from a text prompt or a single tensor. The camera/state/action contract matters.
4. What I would try locally
If I wanted the simplest local path before going deeper, I would try this order.
Step 1: Confirm the dataset visually
Use the LeRobot dataset visualization path first.
Things to look for:
- Are all expected camera views present?
- Do the camera names match what the policy/config expects?
- Are the wrist/front/top/side views in the expected places?
- Does the robot state change smoothly?
- Do actions look non-zero and physically meaningful?
- Are gripper actions represented correctly?
- Is fps consistent with what the training config expects?
- Are there broken/missing videos or episodes?
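Some of the checks above (smooth state, non-zero actions, a gripper that actually moves) can also be done numerically once you have one episode's state or action stream as a list of per-frame vectors. This is a minimal sketch using only the standard library. summarize_stream is a hypothetical helper, and how you extract the rows from your dataset is left open because it depends on your LeRobot version.

```python
from statistics import pstdev


def summarize_stream(rows, names=None, flat_tol=1e-6):
    """Per-dimension min/max/std plus the largest frame-to-frame jump.

    rows: list of per-frame vectors (e.g. actions for one episode).
    names: optional dimension names; defaults to dim_0, dim_1, ...
    """
    if not rows:
        raise ValueError("empty stream")
    dims = len(rows[0])
    names = list(names) if names else [f"dim_{i}" for i in range(dims)]
    summary = {}
    for i, name in enumerate(names):
        column = [row[i] for row in rows]
        jumps = [abs(b - a) for a, b in zip(column, column[1:])]
        std = pstdev(column)
        summary[name] = {
            "min": min(column),
            "max": max(column),
            "std": std,
            "max_jump": max(jumps, default=0.0),  # large values hint at glitches
            "flat": std < flat_tol,               # True if the dimension never moves
        }
    return summary
```

A dimension flagged as flat (for example a gripper that never changes) or a very large max_jump is worth a second look in the visualizer before training.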
Step 2: Inspect metadata and stats
Open the dataset metadata files if available.
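For a LeRobotDataset, I would look at files like these (names as in the v2.x layout; in v3.0, per-episode metadata lives in chunked Parquet files under meta/episodes/ instead of a single episodes.jsonl, so check what your dataset actually contains):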
meta/info.json
meta/stats.json
meta/tasks.jsonl
meta/episodes.jsonl
In particular:
observation.state
action
observation.images.<camera_name>
fps
features
shape
dtype
codebase_version
This is boring but important. If the dataset schema and policy expectation disagree, the loss curve may not tell you the real problem.
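The metadata check can be scripted with the standard library. This sketch assumes meta/info.json has a top-level "features" mapping whose entries carry "dtype" and "shape", as in the fields listed above. The function names and the camera name in the usage note are hypothetical, so adjust them to your dataset.

```python
import json
from pathlib import Path


def load_info(dataset_root):
    """Read meta/info.json from a local dataset directory."""
    with open(Path(dataset_root) / "meta" / "info.json", encoding="utf-8") as f:
        return json.load(f)


def summarize_features(info):
    """Map each feature name to (dtype, shape) for a quick overview."""
    return {
        name: (spec.get("dtype"), tuple(spec.get("shape", ())))
        for name, spec in info.get("features", {}).items()
    }


def missing_features(info, expected):
    """Return the expected feature names that the dataset does not declare."""
    present = set(info.get("features", {}))
    return sorted(set(expected) - present)


def print_report(info, expected):
    """Print fps, format version, feature shapes, and any missing features."""
    print("fps:", info.get("fps"))
    print("codebase_version:", info.get("codebase_version"))
    for name, (dtype, shape) in summarize_features(info).items():
        print(f"  {name}: dtype={dtype} shape={shape}")
    print("missing:", missing_features(info, expected))
```

For example, print_report(load_info("path/to/dataset"), ["action", "observation.state", "observation.images.front"]) would list the declared shapes and flag a camera key that the policy config expects but the dataset lacks.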
Step 3: Start with the official W&B path if possible
If you can use W&B, the official path is probably the least surprising first test:
lerobot-train \
--policy.path=lerobot/smolvla_base \
--dataset.repo_id=<your-dataset-repo-id> \
--batch_size=<batch-size> \
--steps=<num-steps> \
--wandb.enable=true
The exact command should follow the current LeRobot imitation learning docs and SmolVLA docs, because the CLI/config names can change across LeRobot versions.
Step 4: If you want local-first curves, try Trackio separately
For a custom training script, Trackio may be very simple:
import trackio as wandb
wandb.init(
    project="smolvla-so101",
    name="local-test",
    config={
        "policy": "smolvla_base",
        "robot": "so101",
    },
)
wandb.log(
    {
        "train/loss": 0.123,
        "train/lr": 1e-4,
        "train/grad_norm": 0.5,
    },
    step=100,
)
wandb.finish()
For lerobot-train, I would expect this to require a small logger integration unless LeRobot has added official Trackio support in your version.
Step 5: If you want the most robust local fallback, log CSV/JSONL
A very boring but reliable fallback is:
{"step": 100, "train/loss": 0.123, "train/lr": 0.0001, "train/grad_norm": 0.5}
{"step": 200, "train/loss": 0.098, "train/lr": 0.0001, "train/grad_norm": 0.47}
Then plot it locally with Python.
This is not fancy, but it avoids account setup, dashboard assumptions, and integration drift.
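A minimal sketch of the "plot it locally" step, assuming one JSON object per line with a "step" key as in the example above. The function names are my own. matplotlib is imported only inside plot_metric, so the parsing part works without it.

```python
import json


def parse_lines(lines):
    """Parse JSONL text lines into a list of dicts sorted by step."""
    rows = [json.loads(line) for line in lines if line.strip()]
    return sorted(rows, key=lambda row: row["step"])


def load_metrics(path):
    """Load a JSONL metrics file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_lines(f)


def series(rows, key):
    """Return (steps, values) for one metric, skipping rows that lack it."""
    points = [(row["step"], row[key]) for row in rows if key in row]
    return [step for step, _ in points], [value for _, value in points]


def plot_metric(rows, key, out_path):
    """Save a simple line plot of one metric; requires matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    steps, values = series(rows, key)
    fig, ax = plt.subplots()
    ax.plot(steps, values)
    ax.set_xlabel("step")
    ax.set_ylabel(key)
    fig.savefig(out_path)
    plt.close(fig)
```

Usage would look like plot_metric(load_metrics("metrics.jsonl"), "train/loss", "loss.png"), where both file names are placeholders.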
5. What to ask in LeRobot Discord
For SO101/SmolVLA, I would bring a compact but complete report to the LeRobot Discord. That will probably get better answers than only asking “how do I visualize the curve?”
Useful information to include:
| Category | Include |
| --- | --- |
| LeRobot version | pip show lerobot, git commit, or install method |
| Command | Exact lerobot-train command |
| Policy | lerobot/smolvla_base or other checkpoint |
| Robot | SO101 / SO100 / other, follower/leader setup |
| Dataset | Hub repo id or local path |
| Dataset format | LeRobotDataset version if known |
| Cameras | Number, names, views, order |
| State/action | Shapes from metadata |
| Metadata | Relevant parts of meta/info.json |
| Stats | Relevant parts of meta/stats.json |
| Training curves | loss, lr, grad_norm, eval metrics if any |
| Visualization | screenshots or notes from lerobot-dataset-viz |
| Evaluation | open-loop eval, real rollout behavior, success/failure examples |
| Requirement | whether you need fully local/offline visualization |
A good short Discord/forum report might look like:
I am fine-tuning SmolVLA on SO101 with LeRobot.
Goal:
- I want to visualize training curves locally if possible.
- I also want to confirm whether my dataset/camera/action setup is correct.
Setup:
- LeRobot version: <version-or-commit>
- Install method: <pip/source/docker/etc>
- Policy: <policy-path>
- Dataset: <dataset-repo-or-local-path>
- Robot: SO101
- Cameras: <camera-names-and-count>
- Training command: <exact-command>
What I checked:
- lerobot-dataset-viz: <works/does-not-work>
- meta/info.json: <relevant-shapes>
- meta/stats.json: <normalization-stats>
- W&B/Trackio/TensorBoard/logs: <what-you-tried>
Observed behavior:
- Training loss: <summary>
- Eval/rollout: <summary>
- Failure mode: <what-the-robot-does>
That gives the LeRobot community enough context to answer the robotics-specific part.
6. My current recommendation
If your immediate goal is just “I want to see the learning curve locally,” I would rank the options like this:
| Rank | Option | Why |
| --- | --- | --- |
| 1 | Parse local logs / CSV / JSONL | Most robust, fully local, no integration risk |
| 2 | Trackio | Best HF-native local/W&B-like dashboard candidate |
| 3 | W&B offline | Good if you already want W&B-style tracking |
| 4 | TensorBoard | Solid generic local ML tool |
| 5 | Full W&B online | Easiest if you accept W&B account/cloud workflow |
But for SmolVLA/SO101 specifically, I would not stop at the learning curve. I would also inspect:
- dataset episodes
- camera names/order/count
- meta/info.json
- meta/stats.json
- state/action shapes
- normalization
- open-loop evaluation
- real rollout behavior
In other words:
Trackio may help you see the curve, but lerobot-dataset-viz and dataset metadata may help you understand whether the curve is meaningful.
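For the open-loop evaluation item, the core idea can be sketched without any robotics library: replay recorded observations through the policy and compare its predicted action with the recorded one, per dimension. Here predict is a hypothetical callable standing in for your policy. Real VLA policies such as SmolVLA emit action chunks, so you would need to decide which step of the chunk to compare. Treat this as a sketch of the metric, not of LeRobot's evaluation code.

```python
def open_loop_error(predict, observations, recorded_actions):
    """Mean absolute error per action dimension over recorded frames.

    predict: callable mapping one observation to one action vector
             (hypothetical stand-in for a policy).
    """
    if not observations or len(observations) != len(recorded_actions):
        raise ValueError("need equally sized, non-empty observation/action lists")
    dims = len(recorded_actions[0])
    totals = [0.0] * dims
    for obs, target in zip(observations, recorded_actions):
        predicted = predict(obs)
        if len(predicted) != dims:
            raise ValueError("predicted action has the wrong number of dimensions")
        for i in range(dims):
            totals[i] += abs(predicted[i] - target[i])
    return [total / len(observations) for total in totals]
```

A per-dimension breakdown is more useful than a single number here: one dimension with a much larger error than the rest (often the gripper) points at a schema or normalization problem rather than at undertraining.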