Who Owns The Brains And Minds Of Humanoid Robotics

How next generation chips and foundation models will quietly decide which humanoids win.

May 3, 2026

In robotics the “brain” is not a metaphor. It is the on board compute stack that ingests raw sensor data and produces motor commands in real time. Cameras, LiDAR, force sensors and joint encoders stream continuous data into CPUs, GPUs or accelerators that must fuse perception, planning and control within tight latency and power envelopes. A cloud round trip is acceptable for a chatbot. It is unacceptable for a biped that is about to misstep on a stair.
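The latency argument can be made concrete with a minimal Python sketch of a per-cycle budget check. Every number in it (the 10 ms budget, the per-stage timings, the 50 ms cloud round trip) is an illustrative assumption, not a measurement from any robot or network, and all names are hypothetical.

```python
# Illustrative latency budget check. All timings below are assumed values
# chosen only to show the shape of the argument.
CONTROL_BUDGET_MS = 10.0  # assumed deadline for one sense-plan-act cycle

# Assumed per-stage timings when everything runs on board.
ONBOARD_STAGES_MS = {"perception": 4.0, "planning": 3.0, "control": 1.0}

CLOUD_ROUND_TRIP_MS = 50.0  # assumed network round trip for remote planning


def total_latency_ms(stages):
    """Sum the per-stage latencies of one cycle."""
    return sum(stages.values())


def fits_budget(stages, budget_ms):
    """Return True if the whole cycle completes within the budget."""
    return total_latency_ms(stages) <= budget_ms


# Same pipeline, but with planning sent through a cloud round trip.
with_cloud = dict(
    ONBOARD_STAGES_MS,
    planning=ONBOARD_STAGES_MS["planning"] + CLOUD_ROUND_TRIP_MS,
)

print(fits_budget(ONBOARD_STAGES_MS, CONTROL_BUDGET_MS))  # on board only
print(fits_budget(with_cloud, CONTROL_BUDGET_MS))         # with cloud hop
```

Under these assumed figures the on board pipeline fits the cycle and the cloud variant overshoots it several times over, which is the point of the stair example: the network hop alone can exceed the entire budget.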

That is why edge compute is becoming a strategic bottleneck for physical AI. The most visible benchmark today is NVIDIA's Jetson AGX Thor, a Blackwell based module that delivers up to roughly 2,070 FP4 teraflops of AI compute with 128 gigabytes of memory in a power envelope around 130 watts, offering about seven and a half times the AI performance and more than triple the energy efficiency of its Orin predecessor. This is not a loose collection of chips. It is an integrated system on module with tightly engineered thermal design, high speed camera interfaces, industrial buses, and a mature CUDA, TensorRT and ROS 2 software ecosystem that already plugs into commercial robotics workflows.
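The headline figures above can be turned into a back of the envelope efficiency number. The short Python sketch below uses only the values quoted in this article, treats them as peak figures, and assumes the two Orin ratios were measured on comparable workloads. The variable names are hypothetical.

```python
# Figures as quoted in the article (peak values, approximate).
THOR_FP4_TFLOPS = 2070.0
THOR_POWER_W = 130.0
PERF_GAIN_VS_ORIN = 7.5        # "about seven and a half times"
EFFICIENCY_GAIN_VS_ORIN = 3.0  # "more than triple", used as a lower bound


def tflops_per_watt(tflops, watts):
    """Peak compute per watt of module power."""
    return tflops / watts


thor_efficiency = tflops_per_watt(THOR_FP4_TFLOPS, THOR_POWER_W)

# If performance rises 7.5x while efficiency rises at least 3x, power can
# rise by at most 7.5 / 3 = 2.5x (assumes both ratios are comparable).
max_power_ratio = PERF_GAIN_VS_ORIN / EFFICIENCY_GAIN_VS_ORIN

print(f"{thor_efficiency:.1f} FP4 TFLOPS per watt")
print(f"power ratio vs Orin: at most {max_power_ratio:.1f}x")
```

On these figures the module works out to roughly 16 FP4 teraflops per watt at peak, and the quoted gains imply a power draw of at most about two and a half times that of its predecessor. Sustained throughput on real workloads will differ from these peak ratios.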

For many humanoid teams Thor or its derivatives will become the default high end option because it compresses years of platform engineering into something that can be ordered, mounted and deployed. Yet dominance at the top of the stack does not preclude specialization underneath. The more diverse the use cases become the more room there is for focused silicon that optimizes for different slices of the embodied workload.

NVIDIA

NVIDIA has become the default silicon backbone for high end robotics. Its Jetson AGX Thor module effectively condenses a data center class AI stack into a compact system on module with integrated CPU, GPU, accelerators, memory, industrial I/O, and a mature CUDA, TensorRT and ROS 2 ecosystem. For humanoid builders this solves a painful problem. Instead of spending years on platform engineering, teams can anchor their design on a known quantity with a roadmap, developer tooling and a large community. The cost is dependence. As more robots converge on the same compute substrate, NVIDIA’s roadmap, pricing and software choices become central to the economics of embodied AI.

Etched

Etched represents the most radical bet on transformer dominance. The company is building single purpose accelerators that hard wire transformer computation graphs into silicon, sacrificing almost all generality in exchange for extraordinary throughput and efficiency on those workloads.

Early systems have shown order of magnitude speedups and striking power savings on large language model inference versus general GPUs. That makes Etched conceptually interesting for robotics as transformers and multimodal models creep from the cloud into the control stack. If high level reasoning and planning in humanoids are dominated by transformer like architectures for the next decade, a transformer only co processor starts to look like a TPU for embodied cognition. If the field shifts to very different model families, Etched’s hyper focused silicon risks becoming a cul de sac. The upside and the risk are both extreme.

Hailo

Hailo sits at the opposite end of the spectrum, optimising for tight power and space budgets rather than peak throughput. Its chips use a dataflow architecture tuned for edge inference, pushing tens of trillions of operations per second at just a few watts in small M.2 and mini PCIe form factors. That makes Hailo modules natural “sensory lobes” for humanoids.

A robot can dedicate them to vision, processing several camera streams for detection, depth and segmentation, while freeing the main compute for coordination and reasoning. Crucially, Hailo leans into pragmatic integration. Support for mainstream frameworks and standard hardware slots means robotics teams can extend existing designs rather than rebuild their stack. In a capital intensive field where development cycles are long, that kind of friction reduction is often more valuable than another headline benchmark.

Physical Intelligence

Physical Intelligence is one of the most ambitious attempts to build a general purpose “mind” for robots. The company is training large vision language action models that can take natural language prompts and translate them into sequences of behaviour in unfamiliar environments.

Early systems demonstrated the ability to tidy new kitchens or adapt to unseen homes, and the team has raised substantial capital to feed their models with vast amounts of robot experience.

Physical Intelligence believes the brain of future robots should be a shared foundation model, not a bespoke stack per platform. If they succeed, onboarding a new robot becomes less about writing control code and more about connecting it to a common intelligence. The risk is that robotics data is expensive and messy, and turning a beautiful research result into a robust product that survives warehouses, homes and factories is brutally hard.

Skild AI

Skild AI is pursuing an “omni bodied” foundation model that can drive different robot morphologies with the same core brain. In demos, the company’s Skild Brain has piloted humanoids, arms and quadrupeds without retraining from scratch, supported by a training corpus that blends large scale simulation with real world data at unusual scale.

Investors have responded with multi billion dollar valuations and billion plus rounds, treating Skild less like a point solution and more like an operating system bet on the future of robotics. If one model can generalise across bodies and environments, the economics change dramatically. New robots can be brought online by mapping their kinematics to an existing brain instead of building yet another control stack. The open question is whether that generality can be preserved as customers push for hard guarantees on safety, latency and edge deployment.

Covariant

Covariant has spent years proving that general purpose robot intelligence can survive contact with messy commercial reality. Its Covariant Brain and RFM class models power fleets of industrial robots in warehouses that collectively learn from billions of grasp attempts, successes and failures.

Each new deployment starts with the entire network’s experience baked in, and every shift in SKU mix or packaging adds more data to the model. Covariant’s significance for the humanoid story is twofold.

First, it demonstrates that “fleet learning” can deliver compounding performance improvements in production, not just in simulation. Second, it shows that a horizontal intelligence layer can sustain a business in a very specific vertical while remaining portable to others. That makes Covariant a critical proof point for the whole “foundation model for robots” thesis.

OpenAI

OpenAI has never been a robotics company in the narrow sense, yet it is becoming a central actor in the minds of embodied systems. Its large language and multimodal models are being woven into the cognitive stacks of full stack humanoid builders as high level planners, reasoning engines and interfaces.

The OpenAI backed humanoid maker 1X Technologies, for example, uses large language models in its EVE and NEO robots to allow natural language instruction and guidance, and is now scaling production capacity for tens of thousands of units.

Rather than building all the hardware itself, OpenAI seeks to be the default API for cognition, planning and interaction, creating a form of soft control over multiple robot platforms. The more teams build their user experience and task planning around OpenAI models, the harder it becomes to unplug them later, which turns the company into a quiet but powerful gatekeeper for the humanoid era.

Sanctuary AI

Sanctuary AI takes the opposite approach. It is a full stack builder that designs the body, the cognitive architecture and the data engine in one loop. Its Phoenix humanoid and Carbon cognitive system draw heavily on human inspired sensory and memory structures and were trained initially through teleoperation, with humans guiding robots via VR rigs.

Over successive generations, more autonomy has been handed to Carbon as it absorbs thousands of hours of human control. Sanctuary’s strategy is to prove real value in specific verticals such as retail and light manufacturing with partners like Magna and cloud providers, and then decide whether to scale hardware production or license the brain and key mechanical IP, including its dexterous hands.

Even if Phoenix never dominates the mass market, a licensed Carbon stack could quietly permeate other platforms, turning Sanctuary from a robot maker into a supplier of minds and critical hardware designs.

FieldAI

FieldAI targets a part of the robotics landscape that most demo reels ignore. The company is building software minds for robots operating in messy outdoor environments where human access is costly, slow or dangerous. Its models focus on robust real time mapping, long horizon path planning and adaptive control under conditions that are hard to predict and harder to simulate.

The emphasis is on uptime and hazard avoidance for infrastructure inspection, construction, energy and disaster response. For the broader humanoid story, FieldAI and peers signal an important structural point. Not all high value robots will live in tidy homes or climate controlled warehouses. Minds optimised for the wild will drive a significant share of the economic impact, and their requirements will ripple back into hardware, compute and connectivity choices.

How to read this map

Seen together, these companies outline a layered embodied stack. NVIDIA, Etched and Hailo contest the “brains” layer, deciding who can deliver safe, responsive capability within practical power and cost envelopes. Physical Intelligence, Skild AI, Covariant, OpenAI linked stacks, Sanctuary AI and FieldAI contest the “minds” layer, where data, algorithms and deployment networks compound into something that increasingly resembles an operating system for physical work.

For builders, the message is that constructing everything from scratch is becoming less rational. The durable edge is drifting toward differentiated data, domain expertise and deployment networks sitting on top of shared brains and minds.

For capital, the most interesting exposure often lies with the platforms whose technology can inhabit many forms. Chips that can live inside humanoids, mobile bases and drones. Minds that can move from warehouses to homes to refineries and the open field. Those are the entities that will quietly define who really wins humanoid robotics, regardless of which specific robot is on stage.