
Final Automata – Overview

Final Automata started as a spectator experience: fully simulated robots fighting in a physically realistic arena. It quickly evolved into one of the most challenging autonomy projects we have worked on.

At its core, Final Automata is a high-dimensional control problem. Each fighter is a bipedal robot driven by reinforcement learning policies that must:

  • Stay balanced and mobile
  • Execute realistic martial arts movements
  • Adapt strategy against a changing, adversarial opponent
  • Do all of that in real time in a physically based environment

This combination of high-level strategy and low-level control is exactly the kind of problem that appears in many real-world autonomy and robotics applications.

Today, Final Automata serves as a training ground for our autonomy and simulation expertise. It may eventually grow into a full esports-style experience, but its main value for now is as a research and prototyping platform for difficult reinforcement learning problems.

Currently the system supports one martial-arts style with roughly 40 reusable skills. The roadmap includes dozens of styles and more than one hundred unique movements per fighter.

Tech stack

  • Unity Engine
  • High Definition Render Pipeline
  • Unity ML-Agents framework
  • PyTorch-based training stack, with protobuf and gRPC under the hood
  • Training across heterogeneous hardware: CPUs, Nvidia GPUs, and Apple Silicon
  • Data-oriented design and custom optimizations in the hot paths of the RL loop
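
To make the stack concrete, below is a minimal sketch of driving a Unity build from Python with ML-Agents' low-level API; protobuf and gRPC handle the transport under the hood. The build path is hypothetical, and random actions stand in for a trained PyTorch policy:

```python
import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

# Hypothetical build path; any ML-Agents-enabled Unity build works the same way.
env = UnityEnvironment(file_name="builds/FighterArena")
env.reset()

behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(1000):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # Random continuous actions as a placeholder for a trained policy.
    actions = np.random.uniform(
        -1.0, 1.0,
        size=(len(decision_steps), spec.action_spec.continuous_size),
    ).astype(np.float32)
    env.set_actions(behavior_name, ActionTuple(continuous=actions))
    env.step()

env.close()
```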

Skill hierarchy

The agent’s behavior is organized as a hierarchy of skills. At a high level, the controller reasons about intent and timing. At a lower level, it selects and blends motion primitives while staying physically stable.

Skill groups include:

  • Locomotion – approximately 20 independent movements for stepping, circling, entering and exiting range.
  • Blocking and defense – around 12 skills for covering different attack lines, reacting to incoming strikes, and controlling distance.
  • Striking – currently about 10 core strikes, with a roadmap toward around 100 unique offensive movements with variations in angle, speed, and setup.

This hierarchy mirrors many real autonomy problems where you need both a library of low-level maneuvers and a high-level policy that knows when and how to use them.
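
A minimal sketch of that two-level split in PyTorch follows. All names and dimensions (NUM_SKILLS, OBS_DIM, the layer sizes) are illustrative, not the actual Final Automata components:

```python
import torch
import torch.nn as nn

NUM_SKILLS = 42           # illustrative: ~20 locomotion + ~12 defense + ~10 strikes
OBS_DIM, ACT_DIM = 256, 24

class HighLevelPolicy(nn.Module):
    """Reasons about intent and timing: which skill to run next."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, NUM_SKILLS))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class LowLevelPolicy(nn.Module):
    """Turns (observation, active skill) into joint-space motor targets."""
    def __init__(self):
        super().__init__()
        self.skill_embedding = nn.Embedding(NUM_SKILLS, 32)
        self.net = nn.Sequential(nn.Linear(OBS_DIM + 32, 128), nn.ReLU(),
                                 nn.Linear(128, ACT_DIM), nn.Tanh())

    def forward(self, obs, skill_id):
        x = torch.cat([obs, self.skill_embedding(skill_id)], dim=-1)
        return self.net(x)

obs = torch.randn(1, OBS_DIM)
skill = HighLevelPolicy()(obs).sample()        # intent: which skill, and when
motor_targets = LowLevelPolicy()(obs, skill)   # execution: blend the primitive
```

The useful property is that the high-level policy acts on a small discrete space of intents while the low-level policy handles continuous stability, so each can be trained and debugged on its own terms.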

Long horizon decision making

The adversarial nature of fighting is what makes this environment especially interesting for decision making.

The agent is not just trying to execute a single optimal move. It needs to:

  • Reason over long horizons under uncertainty
  • Evaluate tradeoffs between aggression and safety
  • Choose sequences of actions that remain physically plausible and visually convincing to a human observer

Pre-simulating all possible futures is intractable in real time. Instead, the policy must learn an internal sense of value for each state and use it to guide decisions under uncertainty. This is directly relevant to any autonomous system operating in dynamic, partially unpredictable environments.
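
As a sketch of what that internal sense of value means in practice, here is a one-step bootstrapped value target in PyTorch. This is the textbook TD(0) formulation, not the exact loss used in Final Automata, and all numbers are illustrative:

```python
import torch

def td_targets(rewards, next_values, dones, gamma=0.99):
    """One-step bootstrapped targets: r_t + gamma * V(s_{t+1})."""
    return rewards + gamma * next_values * (1.0 - dones)

rewards     = torch.tensor([0.0, 0.0, 1.0])   # e.g. a strike lands at t = 2
values      = torch.tensor([0.3, 0.5, 0.8])   # critic's V(s_t) estimates
next_values = torch.tensor([0.5, 0.8, 0.0])   # critic's V(s_{t+1}) estimates
dones       = torch.tensor([0.0, 0.0, 1.0])   # episode ends after the hit

targets = td_targets(rewards, next_values, dones)
critic_loss = ((values - targets) ** 2).mean()  # in training, targets are detached
```

The critic never enumerates futures; it compresses them into a single scalar per state, which the policy can query in real time.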

Key technical challenges and how we address them

RL is hard to debug

Agents can happily learn in a broken environment with no explicit error messages. The only symptom is a policy that behaves strangely. We deal with this through:

  • Extensive observation and action logging
  • Visual debugging tools and Unity Gizmos
  • Systematic simplification of the training pipeline so that bugs have fewer places to hide
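
As a small illustration of the first point, this is the kind of fail-fast sanity check the logging relies on; the function name and ranges are illustrative, not our actual tooling:

```python
import numpy as np

def check_observation(obs: np.ndarray, low: float = -10.0, high: float = 10.0) -> None:
    """Fail fast on the silent corruption RL will otherwise learn around."""
    assert not np.any(np.isnan(obs)), "NaN in observation"
    assert not np.any(np.isinf(obs)), "Inf in observation"
    out_of_range = (obs < low) | (obs > high)
    assert not np.any(out_of_range), (
        f"Observation out of range at indices {np.where(out_of_range)[0]}"
    )

check_observation(np.array([0.2, -1.5, 3.0]))  # passes silently
```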

Sample efficiency and throughput

Reinforcement learning is sample-hungry. We invest time into:

  • Optimizing the simulation loop and physics
  • Efficient reward computation
  • Data-oriented animation and contact handling

This leads to higher throughput in environment steps per second, which directly reduces training time.
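
As a sketch of the metric itself, assuming a hypothetical env object whose step() advances a batch of parallel arenas by one physics tick:

```python
import time

def steps_per_second(env, num_envs: int, ticks: int = 1_000) -> float:
    """Total agent-steps per wall-clock second across parallel arenas."""
    start = time.perf_counter()
    for _ in range(ticks):
        env.step()                      # one physics tick for every arena
    elapsed = time.perf_counter() - start
    return ticks * num_envs / elapsed
```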

Reward shaping and unintended behavior

Naive reward functions tend to reinforce every action in a successful trajectory, which can produce odd or undesirable behaviors. We iterate heavily on reward design and, as sketched below, introduce structure that encourages:

  • Natural looking movement
  • Clean, interpretable strategies
  • Robustness instead of fragile exploits
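
A sketch of the shape this iteration converges toward: a sparse outcome term plus small shaping penalties. The weights and term names are illustrative, not the tuned values used in Final Automata:

```python
def shaped_reward(landed_strike: bool, got_hit: bool,
                  posture_error: float, energy_used: float) -> float:
    """Sparse outcome plus small shaping terms, weighted to resist exploits."""
    reward = 0.0
    reward += 1.0 if landed_strike else 0.0   # the outcome we actually want
    reward -= 1.0 if got_hit else 0.0
    reward -= 0.05 * posture_error            # keeps movement natural-looking
    reward -= 0.01 * energy_used              # discourages flailing exploits
    return reward
```

Keeping the shaping weights small relative to the outcome term makes it harder for the agent to farm the shaping signal instead of winning the exchange.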

How this expertise transfers to your project

Final Automata is a showcase environment, but the underlying skills are directly applicable to real projects.

We can help with:

Autonomy and control

  • Navigation in complex environments
  • Reactive decision making under uncertainty
  • Long-term planning and execution in adversarial or safety-critical contexts

Simulation and digital twins

  • High performance simulation design
  • Data-oriented optimization and profiling
  • Building simulation environments that are useful for both training and validation

World modeling and RL pipelines

  • Designing observation and action spaces that match your hardware and sensors
  • Structuring hierarchies of skills and controllers
  • Setting up training pipelines, from environment design to logging and evaluation

We see Final Automata as one concrete example of a more general principle: as DeepMind and others have demonstrated, progress in autonomy and decision making comes from mastering complex artificial environments. Our work focuses on building these environments and the agents that can thrive in them.

If you have a project involving reinforcement learning, autonomous agents, or simulation and would like help designing or implementing it, Final Automata is a concrete demonstration of what we can deliver.

Services for Simulation, Robotics & Gaming

Accelerate Development with Intelligent Simulation & Autonomy

From high-fidelity simulations to adversarial game agents, we build robust decision-making systems for virtual and physical worlds.

  • Scalable Simulation – Train agents in parallel for rapid iteration.
  • Adversarial Testing – Stress-test systems with adaptive opponents.
  • Sim-to-Real – Deploy robust policies to physical hardware.

Let’s Discuss Your Project

Tell us about your project. We’ll reply within 1 business day.
