Nvidia Built Robots That Train Themselves Using AI Coding Agents

In brief

Nvidia, Carnegie Mellon, and UC Berkeley have released ENPIRE, a framework that lets AI coding agents run the full loop of teaching robots new skills with no human supervision.
Agents running Codex, Claude Code, and Kimi Code pushed an eight-robot fleet to a 99% success rate on tasks including pin insertion, GPU insertion, and zip-tie cutting.
Scaling from one robot to eight cut the time needed to master a task by more than half, though token spending grew faster than the time savings did.

A fleet of eight robot arms at Nvidia's GEAR lab spent the past few weeks teaching themselves to insert pins, seat graphics cards, and cut zip ties. Once the initial setup was done, the only humans involved were the ones who wrote the paper afterward.

The skill came from ENPIRE, a framework detailed in a paper published Tuesday by researchers at Nvidia, Carnegie Mellon University, and UC Berkeley. ENPIRE hands the entire job of training a robot to AI coding agents, the same software that already writes and tests its own code, and lets them run that process directly on physical hardware.

Coding agents like OpenAI's Codex, Anthropic's Claude Code, and Moonshot's Kimi Code have spent the past year running what researchers call autoresearch: writing code, testing it, and rewriting it without a person in the loop. That loop has mostly stayed on a screen, where resetting a failed experiment costs nothing. ENPIRE drags it into the physical world, where resetting an experiment means moving an actual robot arm.

Building the ‘Enpire’

The system splits the work into two stages. In the first, a human walks the agent through building two permanent tools: a reset routine that returns the workspace to a fresh starting position, and a reward function that watches camera footage to score success—basically a referee that never blinks and never takes a lunch break. That setup happens once, then gets reused for every attempt that follows.

Once those tools exist, the agent takes over completely. It searches published research for ideas, picks among training methods such as imitation learning, reinforcement learning, and hand-written rules, then rewrites its own code and tests the result on the robot. Nothing in that loop requires a person to watch, which is either liberating or slightly unsettling depending on how you feel about a robot holding scissors unsupervised.
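To make the two-stage structure concrete, here is a minimal toy sketch in Python. It is not ENPIRE's code: the one-dimensional `Workspace`, the `reset` and `reward` functions, the noise-based candidate policies, and every constant are assumptions invented for illustration. The real system resets physical hardware and scores success from camera footage. The sketch only shows the shape of the loop: reset, attempt, score, repeat, then keep the candidate that scores best.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

TARGET = 1.0      # arbitrary goal position for the toy task (assumption)
TOLERANCE = 0.05  # arbitrary success threshold (assumption)


@dataclass
class Workspace:
    """Toy stand-in for a robot station."""
    position: float = 0.0


def reset(ws: Workspace) -> None:
    """Stage-one tool: put the workspace back in its starting state."""
    ws.position = 0.0


def reward(ws: Workspace) -> bool:
    """Stage-one tool: score one attempt as success or failure."""
    return abs(ws.position - TARGET) <= TOLERANCE


Policy = Callable[[Workspace, random.Random], None]


def make_policy(noise: float) -> Policy:
    """Hypothetical candidate: lands near the target with bounded error."""
    def policy(ws: Workspace, rng: random.Random) -> None:
        ws.position = TARGET + rng.uniform(-noise, noise)
    return policy


def success_rate(policy: Policy, trials: int, seed: int) -> float:
    """Stage two: reset, attempt, score, with no person in the loop."""
    rng = random.Random(seed)
    ws = Workspace()
    wins = 0
    for _ in range(trials):
        reset(ws)
        policy(ws, rng)
        wins += int(reward(ws))
    return wins / trials


def pick_best(candidates: Dict[str, Policy], trials: int = 200,
              seed: int = 0) -> Tuple[str, Dict[str, float]]:
    """Evaluate every candidate and return the top scorer with all scores."""
    scores = {name: success_rate(p, trials, seed)
              for name, p in candidates.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Three made-up candidates that differ only in how precisely they land.
CANDIDATES = {
    "coarse": make_policy(0.20),
    "medium": make_policy(0.10),
    "fine": make_policy(0.04),
}

if __name__ == "__main__":
    best, scores = pick_best(CANDIDATES)
    print(best, scores)
```

Because the "fine" candidate's error never exceeds the tolerance, it always scores a success rate of 1.0 here. The point of the design is that `reset` and `reward` are written once and reused unchanged for every attempt, which is what lets the evaluation loop run unattended.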

Nvidia ran the experiment on eight bimanual robot stations, each with its own hardware, computer, and coding agent. The stations trade progress via Git, the same tool coders use to merge code, so a winning idea spreads fleet-wide within minutes.
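The fleet-wide sharing can be pictured with a small toy model. This Python sketch does not call Git and does not reflect how ENPIRE structures its repository: a plain in-memory list stands in for the shared history, and `Result`, `publish`, `sync`, the idea labels, and the success rates are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    station: str         # which station produced the change
    idea: str            # placeholder label for a code change
    success_rate: float  # made-up score measured on that station


def publish(shared: List[Result], result: Result) -> None:
    """Stand-in for a station pushing its latest result to shared history."""
    shared.append(result)


def sync(shared: List[Result], stations: List[str]) -> Dict[str, str]:
    """Stand-in for every station pulling: all adopt the best idea so far."""
    best = max(shared, key=lambda r: r.success_rate)
    return {name: best.idea for name in stations}


STATIONS = [f"station-{i}" for i in range(1, 9)]  # eight stations, as in the article

shared_log: List[Result] = []
publish(shared_log, Result("station-3", "idea-A", 0.72))  # illustrative numbers
publish(shared_log, Result("station-6", "idea-B", 0.91))  # illustrative numbers
adopted = sync(shared_log, STATIONS)
```

After `sync`, every entry in `adopted` is `"idea-B"`, the higher-scoring of the two placeholder results. A shared version-controlled history gives each station the same view of what has worked so far without needing a central coordinator to hand out instructions.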

Researchers measured the payoff on “Push-T,” a task where a robot slides a T-shaped block into a target zone using only pushes, and pin insertion, where it threads pins into 4-millimeter holes. Scaling from one robot to eight cut the time to master Push-T from roughly five hours to two, and pin insertion from more than 90 minutes to about 40.
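The arithmetic behind those figures is worth spelling out. The Python sketch below uses the article's rounded numbers (roughly 5 hours to 2 for Push-T, a little over 90 minutes to about 40 for pin insertion), so the outputs are approximations. `scaling_summary` is a hypothetical helper, and "robot-time" here simply means the number of robots multiplied by wall-clock time. It is not a measure of token cost.

```python
from typing import Dict


def scaling_summary(t_one: float, t_many: float, robots: int) -> Dict[str, float]:
    """Compare time-to-master on one robot with time on a fleet of `robots`."""
    speedup = t_one / t_many
    return {
        "speedup": speedup,                     # how many times faster
        "time_saved": 1 - t_many / t_one,       # fraction of wall-clock time cut
        "efficiency": speedup / robots,         # speedup per robot added
        "robot_time_ratio": robots * t_many / t_one,  # total robot-time vs. one robot
    }


push_t = scaling_summary(t_one=5.0, t_many=2.0, robots=8)      # hours
pin = scaling_summary(t_one=90.0, t_many=40.0, robots=8)       # minutes

if __name__ == "__main__":
    for name, summary in (("Push-T", push_t), ("pin insertion", pin)):
        print(name, {k: round(v, 3) for k, v in summary.items()})
```

On these rounded inputs, Push-T comes out 2.5 times faster (a 60% cut in wall-clock time) and pin insertion 2.25 times faster (about 56%). Eight robots yielding a speedup of about 2.5 means each added robot contributes far less than a full unit of speed, and total robot-time rises to roughly 3.2 times the single-robot figure for Push-T.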

Across the four real-world tasks tested, the agents drove their policies to a 99% success rate, according to the paper. For pin insertion, the agents reached near-perfect reliability faster than a comparable human-in-the-loop method, the kind that still needs someone to show up every morning.

Nvidia's Jim Fan, the GEAR Lab co-lead who directs the company's AI research, called the project an effort to enable AutoResearch in the physical world for the first time. Fan said the team handed the agents a fleet of robots, a GPU allocation, and a token budget, then stepped back and let the agents take over.

Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy… pic.twitter.com/zC0OQNzDBs

— Jim Fan (@DrJimFan) June 16, 2026

The gap between simulation and reality showed up almost immediately. All three coding agents solved Push-T inside a simulator, but two of the three failed once the same task moved onto a physical robot, the paper notes.

Simulators don't have friction problems. Real tables do.

Nvidia also tested ENPIRE inside RoboCasa, a simulated kitchen benchmark that scores robots on chores like opening cabinets or turning off stoves by success rate, mercifully without any risk of burning the place down. There, ENPIRE outperformed both Nvidia's own end-to-end model GR00T and CaP-X, a tool-using agent that skips the autoresearch loop entirely.

ENPIRE extends an idea Nvidia first floated with Eureka, a 2023 system that used a language model to write reward functions for robots inside a simulator instead of having human engineers do it by hand. ENPIRE moves that self-improvement loop off the simulator and onto real hardware, with the agent designing its own tests rather than just its own rewards.

The release lands the same week Alibaba unveiled its own embodied-AI push, the Qwen-Robot Suite, a trio of foundation models for robot navigation, manipulation, and physics simulation. Alibaba is building software brains for robot bodies it doesn't manufacture; Nvidia is testing whether agents can run the whole research loop on hardware it owns end to end. Both point to the same trend: physical robots are becoming the next arena for coding agents to compete in.


Read full story at Decrypt
