Since this is Physical AI, I would still recommend asking in the LeRobot Discord for a definitive answer. But at first glance, it looks like there may be a few common traps here:
Short version
I would not debug this as a camera-resolution bug first.
The first thing I would check is whether you are mixing two different LeRobot paths:
- Simulation/environment evaluation
  Example shape: lerobot-eval --env.type=pusht
- Real SO101 + SmolVLA + physical cameras
  Example shape: lerobot-rollout or older lerobot-record, with explicit --robot.type, --robot.port, --robot.cameras, and --policy.path
Your description sounds like a real SO101 + SmolVLA + camera setup, but the command you showed uses:
--env.type=pusht
That is the first thing I would separate.
Decision tree
| Question | If yes | If no |
|---|---|---|
| Are you intentionally evaluating in the PushT simulation environment? | lerobot-eval --env.type=pusht may be the right direction. Debug the PushT env / policy / eval config. | Go to the next question. |
| Are you trying to run a policy on a real SO101 with physical cameras? | Do not start from --env.type=pusht. Use the real-robot rollout/record path and pass camera settings through --robot.cameras. | The answer depends on the actual environment you want to evaluate. |
| Are you already using the real-robot path and explicitly passing camera width/height/fps? | Then debug camera backend / device identity / FPS / USB bandwidth / V4L2 vs FFMPEG. | First fix the command path and camera config. |
| Does the policy expect 3 cameras, but the robot/dataset provides different camera names/count/order? | Check SmolVLA image features, dataset metadata, camera keys, and rename maps. | Then focus on the camera backend/device layer. |
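
The first two rows of the table can be turned into a quick sanity check on the command itself. This is only a rough sketch: it assumes the flag prefixes quoted in this thread (--env.type= and --robot.type=) are what distinguish the two paths in your installed version, and the function name is made up for illustration.

```shell
# Sketch: classify a LeRobot command line by the flags it carries.
# Assumption: --env.type= marks the simulation path and --robot.type=
# marks the real-robot path, as in the commands quoted in this thread.
classify_lerobot_cmd() {
    case "$1" in
        *--env.type=*--robot.type=* | *--robot.type=*--env.type=*)
            echo "mixed" ;;
        *--env.type=*)
            echo "simulation" ;;
        *--robot.type=*)
            echo "real-robot" ;;
        *)
            echo "unknown" ;;
    esac
}

classify_lerobot_cmd "lerobot-eval --policy.path=<your_policy> --env.type=pusht"
classify_lerobot_cmd "lerobot-rollout --robot.type=so101_follower --robot.port=/dev/ttyACM0"
```

If the result is "simulation" while you are physically looking at an SO101, that mismatch is the thing to fix before touching camera settings.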
Why --env.type=pusht looks suspicious here
LeRobot examples often use lerobot-eval with an environment such as pusht or libero when evaluating a policy in a benchmark/simulation-style environment.
For example, the LeRobot README shows evaluation with an environment argument like:
lerobot-eval \
--policy.path=lerobot/pi0_libero_finetuned \
--env.type=libero \
--env.task=libero_object \
--eval.n_episodes=10
See: LeRobot README — Inference & Evaluation
There are also PushT examples/tests using:
--env.type=pusht
See: lerobot_eval.py
That does not look like the same layer as configuring real SO101 cameras.
For a real robot, the important object is not only an env; it is the robot and its camera config.
Real SO101 / physical camera path
For real robot inference, current LeRobot docs point to lerobot-rollout.
See: LeRobot Policy Deployment / lerobot-rollout
The cheat sheet also says:
Inference means running the trained policy/model on a robot. For that we use lerobot-rollout.
It also notes that camera configuration needs to match what was used when collecting the dataset.
See: LeRobot Cheat Sheet
So for a real SO101 setup, I would expect the command to be closer to this shape:
lerobot-rollout \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=<your_robot_id> \
--robot.cameras='{
front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30},
wrist: {type: opencv, index_or_path: 2, width: 640, height: 480, fps: 30}
}' \
--policy.path=<your_policy_path_or_hub_repo> \
--task="<your task>"
Exact arguments may differ by LeRobot version, so check your installed version first.
Version check
Before changing too much, I would collect:
lerobot-info
pip show lerobot
lerobot-eval --help
lerobot-rollout --help
lerobot-record --help
This matters because LeRobot CLI names and argument names have changed across versions. Some older examples use record.py, control_robot.py, or lerobot-record where newer docs use lerobot-rollout.
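A quick way to see which of these entry points your environment actually has is to test each name with command -v. This is a minimal sketch; the command names are the ones mentioned in this thread, and which of them exist depends on the installed LeRobot version.

```shell
# Sketch: report which LeRobot entry points exist in the active environment.
# The names below come from this thread; none is guaranteed to be present.
check_cli() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

for cmd in lerobot-info lerobot-eval lerobot-rollout lerobot-record lerobot-find-cameras; do
    check_cli "$cmd"
done
```

Run it inside the same virtual environment or conda environment you use for LeRobot, otherwise every line will say "not found".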
For example, there are real SO101 + SmolVLA issue examples using lerobot-record with explicit robot and camera config:
lerobot-record \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--robot.cameras="{camera1: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}, ...}" \
--policy.path=...
See: Inference lerobot/smolvla_base with SO-101 failed #2210
So I would not assume that a command from one LeRobot version maps cleanly to another version.
If the real-robot path is already correct, then check camera backend
Once you have confirmed that you are on the real SO101 robot path, a camera-backend problem becomes a very plausible explanation.
There is a relevant LeRobot issue where Linux/OpenCV camera settings did not apply correctly with the default backend:
- backend=ANY auto-selected FFMPEG
- requested fourcc="MJPG" did not apply
- width/height/fps setting failed or behaved incorrectly
- forcing backend=V4L2 made the same settings work
See: OpenCVCameraConfig backend=ANY vs V4L2 issue #3198
There is also a Hugging Face forum thread from an SO101 user where forcing Video4Linux2 was used as a temporary fix for camera backend problems:
See: Lerobot Camera Backend Issues
So if you are on Linux with USB/OpenCV cameras, the camera backend is a strong second-level suspect.
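To separate "the driver rejects this mode" from "the OpenCV backend did not pass it through", it can help to ask the V4L2 driver directly, outside LeRobot. The sketch below uses v4l2-ctl from the v4l-utils package. The device path, resolution, fourcc, and FPS in the example call are placeholders, and the helper names are made up for illustration.

```shell
# Sketch: request a mode from the V4L2 driver directly and read back
# what it granted. Assumption: v4l-utils is installed.
fmt_arg() {
    # Build the value for v4l2-ctl --set-fmt-video from width, height, fourcc.
    echo "width=$1,height=$2,pixelformat=$3"
}

probe_mode() {
    dev=$1; width=$2; height=$3; fourcc=$4; fps=$5
    if ! command -v v4l2-ctl >/dev/null 2>&1; then
        echo "v4l2-ctl not found (install v4l-utils)" >&2
        return 1
    fi
    if [ ! -e "$dev" ]; then
        echo "$dev does not exist" >&2
        return 1
    fi
    v4l2-ctl --device="$dev" --set-fmt-video="$(fmt_arg "$width" "$height" "$fourcc")"
    v4l2-ctl --device="$dev" --set-parm="$fps"
    # Read back what the driver actually applied.
    v4l2-ctl --device="$dev" --get-fmt-video
    v4l2-ctl --device="$dev" --get-parm
}

# Example (placeholder device and mode):
# probe_mode /dev/video0 640 480 MJPG 30
```

If the read-back values match the request here but not inside LeRobot, the backend selection is the more likely culprit. If they already differ here, the camera probably does not support that mode.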
Camera discovery
I would also run:
lerobot-find-cameras opencv
LeRobot’s camera docs say OpenCV and RealSense cameras support auto-discovery, and that identifiers may change after rebooting or re-plugging cameras.
See: LeRobot Cameras docs
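On udev-based Linux systems, /dev/v4l/by-id usually holds symlinks named after the camera hardware, and those names tend to survive re-plugging better than /dev/videoN numbers. Here is a small sketch that prints where each symlink currently points. Whether your LeRobot version accepts such a path in index_or_path is an assumption you should verify, and the function name is made up for illustration.

```shell
# Sketch: show persistent udev names for video devices and their targets.
# Assumption: a Linux system where udev populates /dev/v4l/by-id.
list_stable_cameras() {
    dir=${1:-/dev/v4l/by-id}
    if [ ! -d "$dir" ]; then
        echo "no $dir directory (no cameras, or not a udev-based Linux system)"
        return 0
    fi
    for link in "$dir"/*; do
        [ -e "$link" ] || continue
        # readlink -f resolves the symlink to the current /dev/videoN node.
        echo "$link -> $(readlink -f "$link")"
    done
}

list_stable_cameras
```

Running it before and after re-plugging a camera shows whether the /dev/videoN target moved.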
Please check:
| Item | Why it matters |
|---|---|
| /dev/video* path or camera index | It may change after unplug/reboot. |
| backend API | FFMPEG vs V4L2 can matter on Linux. |
| default stream profile | The camera may default to unexpected width/FPS. |
| supported modes | Some resolutions only work at lower FPS. |
| fourcc / MJPG support | MJPEG is often needed for higher resolution/FPS USB camera modes. |
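
The reason MJPG matters is mostly arithmetic. As a rough worked example, assuming uncompressed YUYV 4:2:2 at 2 bytes per pixel (a common raw UVC format), the raw bit rate of a few typical modes can be estimated like this. The function name is made up for illustration.

```shell
# Sketch: raw (uncompressed) video bandwidth in whole Mbit/s.
# Assumption: YUYV 4:2:2 at 2 bytes per pixel.
raw_mbps() {
    width=$1; height=$2; fps=$3
    # bytes per frame * frames per second * 8 bits, truncated to Mbit/s
    echo $(( width * height * 2 * fps * 8 / 1000000 ))
}

for mode in "640 480 30" "1280 720 30" "1920 1080 30"; do
    set -- $mode
    echo "${1}x${2} @ ${3} fps: $(raw_mbps "$1" "$2" "$3") Mbit/s raw"
done
```

USB 2.0 high-speed signals at a nominal 480 Mbit/s and delivers less in practice, so 640x480 at 30 fps (about 147 Mbit/s raw) fits, while 1280x720 at 30 fps (about 442 Mbit/s raw) generally does not without compression. Two cameras on the same hub also share that budget, which is why a mode can work with one camera plugged in and fail with two.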
SmolVLA-specific trap: camera keys/count/order
SmolVLA uses multiple camera views plus robot state and language instruction.
See: SmolVLA docs
So after fixing the command path and camera backend, another possible issue is that the policy expects different image features than your robot/dataset provides.
Things to check:
policy expected image keys
dataset image keys
robot camera names
camera order
number of cameras
image resolution
FPS
normalization/statistics
There are related issues where users hit camera feature/name/count problems with SmolVLA or SO101:
- Policy image features vs robot cameras mismatch #1620
- Best camera setup/order for SmolVLA + SO101 #1763
- Debugging poor eval with SmolVLA and two cameras #2753
- Problems with running pretrained lerobot/smolvla_base #2374
This may not be the first cause of your resolution/FPS symptom, but it is worth checking if the policy runs but behaves badly.
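Once you have the two key lists in hand, comparing them is mechanical. The sketch below diffs two space-separated lists with sort and comm. The key names in the example call are hypothetical placeholders, not values read from any real policy, so substitute the ones from your policy config and dataset metadata.

```shell
# Sketch: compare the camera keys a policy expects with the keys a robot
# provides. Key names in the example call are hypothetical placeholders.
diff_keys() {
    # $1: space-separated expected keys, $2: space-separated provided keys
    exp_file=$(mktemp); got_file=$(mktemp)
    printf '%s\n' $1 | sort -u > "$exp_file"
    printf '%s\n' $2 | sort -u > "$got_file"
    # comm -23: lines only in the first file; comm -13: only in the second.
    comm -23 "$exp_file" "$got_file" | sed 's/^/missing on robot: /'
    comm -13 "$exp_file" "$got_file" | sed 's/^/unexpected on robot: /'
    rm -f "$exp_file" "$got_file"
}

diff_keys "observation.images.front observation.images.wrist observation.images.top" \
          "observation.images.front observation.images.wrist"
```

Empty output means the names and count agree. It says nothing about camera order or which physical camera sits behind each name, so those still need a manual check.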
What I would ask you to post
For a useful LeRobot Discord/GitHub/forum question, I would include:
lerobot-info
pip show lerobot
lerobot-eval --help
lerobot-rollout --help
lerobot-record --help
lerobot-find-cameras opencv
Also include:
OS:
LeRobot version:
Install method:
Robot type:
Camera model(s):
Camera backend:
Full command:
Policy path:
Dataset repo_id:
Number of cameras used during training:
Number of cameras connected during evaluation:
Expected width/height/fps:
Actual width/height/fps:
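If it helps, the command outputs can be gathered into one file to attach to the post. This is a minimal sketch: it only runs commands already listed in this thread plus uname, and it records a missing command instead of failing. The function names and the default file name are made up for illustration.

```shell
# Sketch: gather diagnostic output into one text file.
# Commands that are not installed are noted as missing rather than failing.
run_or_note() {
    echo "===== $* ====="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exited with status $?)"
    else
        echo "(command not found: $1)"
    fi
    echo
}

collect_report() {
    {
        run_or_note uname -a
        run_or_note lerobot-info
        run_or_note pip show lerobot
        run_or_note lerobot-rollout --help
        run_or_note lerobot-record --help
        run_or_note lerobot-find-cameras opencv
    } > "${1:-lerobot_report.txt}"
}

# Example: collect_report lerobot_report.txt
```

Skim the file before posting, since it may contain local paths or usernames you would rather redact.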
Practical triage order
I would debug in this order:
- Decide whether this is PushT simulation evaluation or real SO101 robot rollout.
- If it is real SO101, switch to the real-robot path rather than --env.type=pusht.
- Pass cameras explicitly through --robot.cameras.
- Run lerobot-find-cameras opencv.
- If width/FPS still changes, check OpenCV backend: FFMPEG vs V4L2, MJPG, supported modes, and USB bandwidth.
- If the policy runs but behaves badly, check SmolVLA camera keys/count/order and dataset metadata.
My best guess
Based only on your command and description, the most likely first issue is not SmolVLA itself and not necessarily OpenCV yet.
The most likely first issue is:
The command is using a PushT environment-evaluation path,
while the problem description sounds like real SO101 robot-camera evaluation.
After that is fixed, then the OpenCV/V4L2/FPS issue becomes the next thing to check.