The race to build a general-purpose “robot brain” has produced three distinct architectural bets. Physical Intelligence’s π-series adapts pre-trained vision-language models for action. Skild AI trains one gigantic cross-embodiment generalist on imitation data. FieldAI chose neither road. The difference matters, and it starts with how each approach thinks about being wrong.
The VLA shortcut — and its cost
Most robotics foundation models today start from a vision-language model (VLM) and bolt on an action head. The recipe is simple: take a transformer trained on images and text, fine-tune it on robot trajectories, and have it emit motor commands. Pi-Zero, OpenVLA, Google DeepMind’s RT-2, and NVIDIA’s GR00T all follow variants of this approach, and Figure’s Helix model takes the same general shape. The advantage is speed of iteration. The disadvantage is that the foundation was never built for physics.
VLMs were trained on internet text and images. They have no native concept of risk, uncertainty, or physical consequence. They excel at describing an image but struggle to quantify how confident they are in the description. A VLM confidently hallucinating is annoying. A robot running on a VLA confidently stepping into a puddle of hydraulic fluid is expensive. Confidently stepping into a worker is a lawsuit.
Ali Agha, FieldAI’s CEO and co-founder, is blunt about the gap. “Rather than attempting to shoehorn large language and vision models into robotics — only to address their hallucinations and limitations as an afterthought — we have designed intrinsically risk-aware architectures from the ground up.”
FFMs start where physics starts
FieldAI’s Field Foundation Models (FFMs) were built physics-first. “We look at AI quite differently from what’s mainstream,” Agha told IEEE Spectrum. “We do very heavy probabilistic modeling.” In the FFM world, every perception is a distribution, every action is weighted against risk, and every decision carries an explicit measure of confidence.
The stack is decomposed into three tightly integrated models:
- Dynamics Foundation Model (DFM) — handles the physical behavior of the robot itself. How a step propagates through a quadruped’s joints, how a humanoid recovers from a near-fall, how a wheeled base behaves on loose gravel. DFM is what catches a slip before it becomes a crash.
- Multi-Agent Foundation Model (MFM) — coordinates multiple robots operating in the same space. Fleet-level task allocation, collision avoidance, and shared world understanding. Critical for construction sites and warehouses where several machines move simultaneously.
- Safety and Risk Awareness layer — a probabilistic overlay that converts every perception and action into a risk-weighted distribution. This is where FieldAI’s Belief World Model lives: a predictive engine that reasons about uncertainty, distributes belief across possible futures, and selects behavior that remains safe under the worst plausible outcome.
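The "safe under the worst plausible outcome" selection rule can be sketched in a few lines. This is an illustrative assumption, not FieldAI's actual interface: action names, cost samples, and the choice of the 95th percentile as "worst plausible" are all invented for the example.

```python
import math

def percentile(xs, q):
    """q-th percentile (0-100) of a list, with linear interpolation."""
    xs = sorted(xs)
    k = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return xs[int(k)]
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def choose_action(futures_by_action, q=95):
    """futures_by_action: {action: [cost of each sampled future]}.
    Pick the action whose q-th-percentile cost -- its worst plausible
    outcome -- is smallest, rather than the one with the best average."""
    return min(futures_by_action,
               key=lambda a: percentile(futures_by_action[a], q))

futures = {
    "cross_gravel": [1.0, 1.2, 1.1, 9.0],   # usually cheap, rare bad slip
    "detour_path":  [3.0, 3.1, 2.9, 3.2],   # always moderate, never bad
}
print(choose_action(futures))  # -> detour_path
```

Note the contrast with expectation-maximizing policies: crossing the gravel is cheaper on average, but the detour wins because its worst plausible future is far milder.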
Inference runs on the edge at sub-100ms latency. No cloud dependency. Cloud-based analytics and federated learning exist only for long-term refinement; nothing the robot needs to make its next step has to round-trip through the internet.
Probability distributions, not fixed values
FieldAI’s first patent filing, U.S. Application 2025/0252306 (published August 7, 2025), gives a rare public glimpse of how the system actually reasons. The patent describes a terrain-analysis framework where the robot predicts features like slope, roughness, and step height and expresses traversability as a probability distribution rather than a single value.
The practical consequence is that the robot doesn’t just decide “I can step here” or “I can’t.” It decides “there is a 94% chance I can step here safely, a 5% chance I will slip, and a 1% chance the surface is not what it appears.” Combined with the risk-awareness layer, that distribution becomes a decision. If the downside is recoverable, the robot proceeds. If the downside is catastrophic, it backs off.
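The step decision above can be pictured as a gate on the catastrophic tail of the traversability distribution. The threshold value and the labeling of which outcomes count as recoverable are my illustrative assumptions, not numbers from the patent:

```python
def step_decision(outcomes, catastrophic_threshold=0.02):
    """outcomes: list of (probability, recoverable) pairs.
    Proceed only if the total probability mass on unrecoverable
    outcomes stays below the threshold; otherwise back off."""
    p_catastrophic = sum(p for p, recoverable in outcomes if not recoverable)
    return "proceed" if p_catastrophic < catastrophic_threshold else "back_off"

# The example from the text: 94% safe, 5% recoverable slip,
# 1% "the surface is not what it appears" (treated as unrecoverable).
print(step_decision([(0.94, True), (0.05, True), (0.01, False)]))  # proceed
print(step_decision([(0.80, True), (0.10, True), (0.10, False)]))  # back_off
```

The point of the sketch is that the 5% slip barely matters because it is recoverable; only the unrecoverable 1% tail is weighed against the gate.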
This approach maps directly to the way safety-critical systems in aviation and medicine have worked for decades. What FieldAI has done is compress that probabilistic reasoning into a foundation-model architecture that runs fast enough to drive a robot in real time.
The three practical capabilities
In deployment, FFMs unlock three behaviors that the VLA approach struggles with:
- GPS-denied, map-less navigation. The robot builds its world model as a byproduct of moving through it, not as a prerequisite. This is why Spot running the FieldAI Brain can walk onto a construction site that changes daily without anyone re-uploading a floor plan.
- Risk-aware decision-making. When confidence drops, the system slows, re-perceives, or backs off — rather than confabulating. A VLA faced with an ambiguous scene will often produce a confidently wrong action. An FFM faced with the same scene produces a low-confidence output, and the safety layer translates low confidence into conservative behavior.
- Cross-embodiment transfer. Because the model reasons about physics — forces, torques, traversability — rather than about a specific motor configuration, the same core runs on a quadruped, a humanoid, a forklift, or a passenger vehicle. FieldAI has publicly demonstrated all four.
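The "low confidence becomes conservative behavior" translation in the second capability can be sketched as a simple mapping. The thresholds and mode names are illustrative assumptions, not published FieldAI parameters:

```python
def behavior_for_confidence(confidence):
    """Translate a scalar perception confidence in [0, 1] into a
    behavior mode: full speed, slow down, re-perceive, or retreat."""
    if confidence >= 0.9:
        return "proceed_at_speed"
    if confidence >= 0.7:
        return "proceed_slowly"
    if confidence >= 0.4:
        return "stop_and_reperceive"
    return "back_off"

for c in (0.95, 0.75, 0.5, 0.2):
    print(c, behavior_for_confidence(c))
```

The design point is that confidence is never discarded after perception; it survives as a first-class input to the controller, which is exactly what a VLA's single predicted action throws away.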
What operators actually see
Agha frames the deployment experience bluntly: “Our customers don’t need to train anything. They don’t need to have precise maps. They press a single button, and the robot just discovers every corner.”
That is the operational promise. An operator sets a goal — “scan this floor,” “inspect this pipeline” — and the robot figures out the rest. No RTK-GPS installation, no LiDAR pre-mapping, no path scripting. In a world where deploying a single inspection robot traditionally required weeks of site preparation, FFMs collapse that cost to near zero.
The open question
The field test for FieldAI is whether a physics-first architecture scales as smoothly as the bigger imitation-learning bets. Skild AI has seven times the capital. Physical Intelligence has Alphabet’s compute access. If the winning architecture turns out to be the one with the most data, not the most principled modeling, the smaller, well-reasoned approach loses.
But if the VLA approach hits a reliability ceiling — and there is early evidence that it does, especially in safety-critical deployments — then FieldAI is positioned to become the rarest of things in AI: the quieter bet that turned out to be the correct one.
If the VLA approach is the GPT of robotics — powerful, creative, and hallucination-prone — then FieldAI is arguing that robotics needs something closer to a Kalman filter that learned to see.