From Games to General Capabilities
A principled route towards general intelligence
Our thesis: if you want systems that generalize, you should train them on a wide diversity of situations. Our idea: automatically generate a variety of games, selected for their value in enhancing the cognitive abilities of models.
Xent games are a family of games designed specifically for language models. They are constructed to elicit and measure transferable skills—competencies that remain useful when an agent encounters novel objectives, unfamiliar constraints, or new information structures.
What is AGI?
A common operational definition is that intelligence is an agent's ability to achieve goals across a wide range of environments. Under this view, 'more general' means robust performance across broader and less familiar settings.
On that definition, AGI would correspond to an agent that performs well across a large class of tasks and environments because it has learned strategies that generalize, not because it has memorized task-specific patterns.
This framing reduces the question to something measurable: How does performance change under distribution shift—i.e., on tasks and environments the system has not encountered before? A formal treatment of this perspective appears in Legg & Hutter. (arXiv:0712.3329)
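In Legg & Hutter's notation, this measure can be written down explicitly (reproduced here for reference):

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^{\pi}
```

Here \(E\) is the space of computable environments, \(K(\mu)\) is the Kolmogorov complexity of environment \(\mu\), and \(V_\mu^{\pi}\) is the expected cumulative reward of agent \(\pi\) in \(\mu\): an agent is more intelligent the better it performs across all environments, with simpler environments weighted more heavily.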
Why don't we have AGI yet?
Current models can be highly capable, but their competence is often conditional: performance depends on how closely a new problem matches the training distribution.
When the problem specification changes—new rules, different objectives, altered constraints, missing information—performance can drop sharply. The bottleneck is frequently not 'raw capability,' but robust generalization.
If we want systems that behave reliably in novel or high-stakes settings, we need training regimes that make exposure to the unfamiliar a first-class design constraint rather than an afterthought.
What prepares an AI for the unknown?
Games are useful here because they let us systematically generate variation. We can create environments spanning:
competitive and cooperative objectives
perfect and imperfect information
planning, negotiation, deception, error recovery, and coordination
settings where the agent must infer latent state and adapt under uncertainty
Training across sufficiently diverse games pushes the agent toward skills that transfer: fast adaptation, hypothesis formation about new rules, and strategy selection under partial information.
The key practical question is not 'games, in general,' but: which game families yield broad transfer, and how do we scale them without degenerating into shallow template variants?
How do Xent games work?
Xent games are our answer: a structured space of games tailored to language models, designed to convert a model's internal statistical knowledge into a training signal in a way that supports transfer. For the full theoretical treatment, see our paper. (arXiv:2506.06832)
Implicit knowledge
Pretrained language models contain substantial latent structure: patterns about language, the world, reasoning, and strategy that are not always accessible via naive prompting.
Xent games leverage this structure using a scoring signal derived from cross-entropy (Xent) / negative log-likelihood—informally, how unexpected an outcome or continuation is under the model. By shaping gameplay around this signal, we can extract informative feedback from what the model already represents, and then train the agent to deploy that knowledge more consistently and effectively.
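Concretely, the cross-entropy of a token sequence is the sum of negative log-probabilities the model assigns to each token given its prefix. The sketch below illustrates this with a hypothetical toy next-token model standing in for a real LLM; the names and distributions are ours, not part of the Xent engine.

```python
import math

# Toy stand-in for a language model: maps a context (tuple of tokens) to a
# next-token probability distribution. A real setup would use an LLM's logits.
TOY_LM = {
    (): {"the": 0.5, "a": 0.5},
    ("the",): {"cat": 0.8, "dog": 0.2},
    ("the", "cat"): {"sat": 0.9, "ran": 0.1},
    ("the", "dog"): {"ran": 0.6, "barked": 0.4},
}

def xent(tokens):
    """Cross-entropy (negative log-likelihood, in nats) of a token sequence:
    the lower the score, the less 'surprising' the sequence is to the model."""
    total = 0.0
    for i, tok in enumerate(tokens):
        dist = TOY_LM[tuple(tokens[:i])]
        total += -math.log(dist[tok])
    return total

# The model's expected continuation scores low; a less likely one scores high.
print(xent(["the", "cat", "sat"]))  # ≈ 1.02
print(xent(["the", "dog", "ran"]))  # ≈ 2.81
```

This single quantity is what gameplay is shaped around: moves and continuations are scored by how expected or unexpected they are under the model.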
Xent game structure
Xent games are scored quantitatively in ways aligned with properties we care about in decision-making:
Novelty: avoiding trivial repetition and template matching
Constraint satisfaction: producing valid moves under nontrivial rules
Counterfactual relevance: reasoning about alternatives that could have occurred
Information content: actions that meaningfully reduce uncertainty
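One way such properties can be made quantitative (an illustrative sketch, not the definitions from the paper): the counterfactual relevance of a clue toward a target can be measured as the drop in the target's xent when the clue is present versus absent.

```python
import math

# Illustrative only: probability the model assigns to a target token, with and
# without a clue in the context. A real setup would condition an LLM on the
# full clue text rather than use hand-picked numbers.
P_TARGET = {"with_clue": 0.6, "without_clue": 0.1}

def xent(p):
    # Negative log-probability (in nats) of a single outcome.
    return -math.log(p)

# The clue is counterfactually relevant to the extent that it lowers the
# target's cross-entropy relative to the no-clue baseline.
relevance = xent(P_TARGET["without_clue"]) - xent(P_TARGET["with_clue"])
print(round(relevance, 2))  # -> 1.79
```

A clue that leaves the target exactly as predictable as before would score zero by this measure; one that makes it less predictable would score negative.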
Because these games share a common mathematical structure, they form a connected family rather than a collection of unrelated tasks. That connectivity matters: it increases the likelihood that competencies learned in one game transfer to others, enabling training that targets general skills rather than narrow tricks.
Xent solver
We've built computationally efficient solvers that can identify strong (often optimal) moves in Xent games. This enables higher-quality training data: instead of relying solely on unguided self-play, models can learn from expert trajectories and principled search—closer to training with a strong reference policy than learning entirely from scratch.
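The solvers themselves are beyond the scope of this page, but the shape of the problem can be sketched: given a quantitative xent-based score over legal moves, a solver searches the move space for high scorers. The tiny example below does this exhaustively over single-token moves; all names are illustrative and the real solvers use far more efficient search.

```python
import math

# Illustrative only: a tiny 'game' whose moves are single tokens, scored by
# how surprising they are under a toy next-token distribution (their xent).
NEXT_TOKEN_PROBS = {"sat": 0.7, "ran": 0.2, "flew": 0.1}

def move_score(move):
    # Cross-entropy of the move under the model: rarer moves score higher.
    return -math.log(NEXT_TOKEN_PROBS[move])

def solve(legal_moves):
    # Exhaustive search: return the legal move with the highest score.
    return max(legal_moves, key=move_score)

print(solve(["sat", "ran", "flew"]))  # -> "flew" (the most surprising option)
```

Trajectories produced this way can then serve as the expert demonstrations mentioned above, rather than relying on unguided self-play alone.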
The structure of the Xent game space also helps keep training transparent and analyzable: game instances, scoring, and trajectories can be inspected directly.
Where are we now?
We've released an open-source benchmark based on Xent games to measure progress toward general capabilities in a way that is:
transparent and extensible
resistant to contamination and shortcutting
auditable at the level of individual game traces
Xent game engine
We also built an open-source engine that lets agents play Xent games—for benchmarking, skill development, concrete problem-solving, and experimentation.