tl;dr: By emulating a language model's internal latent space as an external graph, we cut an agent's token usage by roughly 99% while preserving meaningful semantic content.
What Latent Space Actually Is
Latent space is the hidden geometry where modern language models actually think. Inside every transformer layer, the model continuously compresses meaning into dense vectors — high‑dimensional coordinates that encode relationships, structure, tone, and semantics far more compactly than natural language ever could. Similar ideas cluster near each other. Small vector moves correspond to subtle semantic shifts. Entire tasks — summaries, classifications, transformations — are performed as traversals in this space.

Humans never see this internal landscape. We only see the surface‑level text the model decodes after all the reasoning is done. And crucially, today’s agents force the model to perform most reasoning in natural language, discarding the efficiency of its native latent operations. That mismatch — internal vector reasoning vs. external text-based loops — is one of the biggest bottlenecks in agent design.
In this article, I discuss latent state and latent space. Latent space is the landscape, and latent state is where you are in that landscape right now.
Limitations of Text-Based Agents and the TOON Breakthrough
Traditional agents rely heavily on chain-of-thought (CoT) prompting, regenerating long passages of text at each step. This approach is wasteful, unstable, and extremely token‑heavy. Every iteration reintroduces noise and expands context until cost and latency spike.
My recent work on TOON (Token-oriented Object Notation) compression (“Extreme Token Compression for LLM Agents”) offered a counterexample to that pattern. By rewriting and compressing the agent’s internal representation into a structured, low‑entropy encoding, we reduced token usage while maintaining meaningful output quality, showing that agents don’t require verbose text loops: compressed, structured representations can preserve intent while drastically reducing context.
TOON proved the feasibility of a compressed surface form. The deeper question remained: could we externalize something closer to latent space itself, letting the agent reason in a compact substrate instead of free-form language? The underlying challenge to the verbose reasoning schemes of most chat services is simple to state: how far can we push semantic compression before it becomes lossy?
Introducing CELS (Compressed External Latent State)
CELS is the next step: a structured architecture where the agent maintains its entire reasoning loop inside a compressed external latent state.
CELS stores the agent’s cognition in three components:
- A typed graph capturing tasks, hypotheses, constraints, and steps.
- A factor vector representing latent-like directions — abstraction level, uncertainty, risk, and time horizon.
- Canonical micro-summaries that serve as dense semantic anchors.
You can follow my research on GitHub: https://github.com/chrisk60331/CELS_Agent
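To make the factor vector and micro-summaries concrete, here is a minimal sketch of how they could be modeled with Pydantic. The field names are illustrative assumptions rather than the exact schema in the repository:

```python
from typing import List
from pydantic import BaseModel, Field

class Factor(BaseModel):
    """One latent-like direction: abstraction level, uncertainty, risk, or time horizon."""
    name: str                                  # e.g. "uncertainty"
    value: float = Field(0.5, ge=0.0, le=1.0)  # normalized position along this direction
    rationale: str = ""                        # short note on why the value was set

class MicroSummary(BaseModel):
    """A dense semantic anchor: one short canonical sentence tied to part of the graph."""
    id: str
    text: str                  # kept deliberately short to stay token-cheap
    node_ids: List[str] = []   # graph nodes this summary anchors
```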
Instead of regenerating natural language each turn, the agent performs small, token‑efficient state edits to this compressed representation. The core loop becomes:
```python
# normalize → propose → apply
self.state = self.pipeline.normalizer.normalize(self.state)              # canonicalize the compressed state
edits = self.pipeline.proposer.propose_edits(self.state, goal, context)  # LLM proposes small structural edits
self.state = self.pipeline.applier.apply_edits(self.state, edits)        # apply them; no prose is regenerated
```
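The token savings come from what the proposer sends to the model: a compact serialization of the state plus the goal, rather than a growing transcript. The repository has its own implementation; the sketch below only illustrates the general shape, assuming Pydantic v2 and a hypothetical `llm_complete` helper standing in for whatever client you use:

```python
import json
from typing import List

def propose_edits(state: LatentState, goal: str, context: str) -> List[StateEdit]:
    """Ask the model for a handful of structural edits instead of a rewritten transcript."""
    prompt = (
        "You maintain a compressed reasoning state for an agent.\n"
        f"GOAL: {goal}\n"
        f"CONTEXT: {context}\n"
        f"STATE (JSON): {state.model_dump_json()}\n"
        'Return a JSON list of edits: [{"operation": ..., "data": ..., "reason": ..., "priority": ...}]'
    )
    raw = llm_complete(prompt)  # hypothetical LLM call; swap in your own client here
    return [StateEdit(**e) for e in json.loads(raw)]
```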
Minimal Latent State Structure
The latent substrate remains compact and structured:
```python
from typing import Dict
from pydantic import BaseModel

# GraphNode, GraphEdge, Factor, and MicroSummary are the other CELS models (GraphNode is shown below).
class LatentState(BaseModel):
    nodes: Dict[str, GraphNode] = {}          # typed graph: tasks, hypotheses, constraints, steps
    edges: Dict[str, GraphEdge] = {}          # typed relations between nodes
    factors: Dict[str, Factor] = {}           # latent-like directions (abstraction, uncertainty, risk, horizon)
    summaries: Dict[str, MicroSummary] = {}   # canonical micro-summaries acting as semantic anchors
    version: int = 0                          # monotonically increasing state version
```
Nodes and edges form the typed graph:
```python
class GraphNode(BaseModel):
    id: str
    type: NodeType   # task, hypothesis, constraint, step, ...
    label: str       # short human-readable description
```
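GraphNode references a NodeType enum and pairs with a GraphEdge model, neither of which is shown above. A plausible minimal version, inferred from the node categories mentioned earlier (tasks, hypotheses, constraints, steps) and not necessarily matching the repository exactly:

```python
from enum import Enum
from pydantic import BaseModel

class NodeType(str, Enum):
    TASK = "task"
    HYPOTHESIS = "hypothesis"
    CONSTRAINT = "constraint"
    STEP = "step"

class GraphEdge(BaseModel):
    """A typed relation between two nodes, e.g. a step that addresses a task."""
    id: str
    source: str    # id of the source GraphNode
    target: str    # id of the target GraphNode
    relation: str  # e.g. "depends_on", "supports", "violates"
```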
State Edits as Latent State Moves
CELS uses small symbolic edits instead of long natural‑language chains:
```python
class StateEdit(BaseModel):
    operation: str        # add_node, add_edge, add_summary, etc.
    data: Dict[str, Any]  # payload for the operation
    reason: str           # why the edit was proposed, kept for auditability
    priority: int         # ordering hint when several edits arrive together
```
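For example, adding a new hypothesis to the graph is a single small object rather than a paragraph of prose (the field values here are purely illustrative):

```python
edit = StateEdit(
    operation="add_node",
    data={"id": "hyp_cache_misses", "type": "hypothesis", "label": "Cache misses dominate latency"},
    reason="Latency spikes correlate with cache eviction in the logs",
    priority=2,
)
```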
Applying an edit is a direct structural update:
```python
if operation == "add_node":
    node_id = StateNormalizer._canonicalize_node_id(...)
    state.nodes[node_id] = GraphNode(...)
```
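The other operations follow the same pattern. A minimal sketch of such a dispatcher, simplified relative to the repository's normalization and error handling, could look like this:

```python
def apply_edits(state: LatentState, edits: list[StateEdit]) -> LatentState:
    """Apply structural edits in priority order and bump the state version."""
    for edit in sorted(edits, key=lambda e: e.priority, reverse=True):
        if edit.operation == "add_node":
            node = GraphNode(**edit.data)
            state.nodes[node.id] = node
        elif edit.operation == "add_edge":
            edge = GraphEdge(**edit.data)
            state.edges[edge.id] = edge
        elif edit.operation == "add_summary":
            summary = MicroSummary(**edit.data)
            state.summaries[summary.id] = summary
        # unknown operations are skipped rather than raising, to keep the loop robust
    state.version += 1
    return state
```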
Agent Loop With Compressed Latent Reasoning
A CELS agent orchestrates multi‑turn planning without expanding text:
```python
result = tool.execute(self.state, parameters)                             # tools read the compressed state
state_edits = [StateEdit(**e) for e in result.state_edits]                # tool output comes back as edits
self.state = self.pipeline.applier.apply_edits(self.state, state_edits)   # merge them into the latent state
```
This is the core idea: CELS externalizes a latent‑like manifold, enabling stable multi‑turn reasoning with minimal tokens.
Initial Results and Why They Matter
Early benchmarks already show the promise of CELS:
| Agent | Runs | Avg F1 | Avg Duration (s) | Total Duration (s) | Avg Tokens | Total Tokens |
|------------------|--------|----------|--------------------|----------------------|--------------|----------------|
| main | 5 | 0.441 | 9.26 | 46.3 | 62421.2 | 312106 |
| main_toon | 5 | 0.398 | 9.551 | 47.756 | 64499.2 | 322496 |
| compressed_agent | 5 | 0.23 | 3.689 | 18.443 | 455 | 2275 |
- The baseline agent averages roughly 62k tokens per task.
- The TOON variant lands in the same range in this benchmark, at roughly 64k tokens.
- CELS drops usage to roughly 450 tokens per task.
Latency follows the same pattern: average runs of roughly 9–10 seconds drop to under 4 seconds. F1 scores are lower in these early tests, which is expected at this stage; the point is that drastically fewer tokens can still carry the task.
The key result: CELS still performs meaningful reasoning while operating with two orders of magnitude fewer tokens. That compression multiplier is the breakthrough.
Why CELS Could Reshape Multi-Turn Agents
If agent cognition can be moved out of verbose text and into an externalized latent substrate, several benefits emerge:
- Cost collapses: minimal token loops radically reduce inference spend.
- Stability increases: structured edits avoid chain-of-thought (CoT) drift. CoT drift happens when the LLM starts to stray from logical, grounded steps, leading to hallucinations or violations of safety rules.
- Interpretability improves: we can inspect the latent graph directly (see the sketch after this list).
- Planning becomes reliable: factors and typed nodes support multi-turn reasoning without context explosion.
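Because the state is plain Pydantic data, inspection needs no special tooling. A small helper along these lines (building on the hypothetical Factor fields sketched earlier) is enough to audit what the agent currently believes:

```python
def describe_state(state: LatentState) -> str:
    """Render the latent graph as a short, human-readable audit trail."""
    lines = [f"state v{state.version}: {len(state.nodes)} nodes, {len(state.edges)} edges"]
    for node in state.nodes.values():
        lines.append(f"  [{node.type.value}] {node.label}")
    for factor in state.factors.values():
        lines.append(f"  factor {factor.name} = {factor.value:.2f}")  # assumes the Factor sketch above
    return "\n".join(lines)

# e.g. print(describe_state(agent.state)) after any turn of the loop
```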
CELS suggests a new paradigm where agents behave more like actual latent models — performing compact, vector-like state updates instead of rewriting their entire reasoning in English.
Closing Thoughts
CELS represents an early prototype of a future direction for agents: external latent cognition. Rather than forcing models to reason through sprawling text, we can let them operate in a compressed substrate that mirrors the structure of latent space while remaining auditable and tool-friendly.
The initial results are already strong enough to justify deeper experimentation. The next steps — latent calibration, structured reconstruction, adaptive factors — could turn CELS into a practical foundation for scalable, low‑cost, high‑stability agent systems.
If this direction resonates, collaboration and exploration are welcome. The frontier of agent design is wide open, and external latent state may be the architecture that defines its next chapter.