5 May 2026

Why We Moved Beyond Runtime RAG

Runtime RAG breaks down under agent workloads: repeated retrieval wastes compute and undermines determinism. This is why Copyl Knowledge Briefs move reasoning off the query path.

The architecture most teams use to ground AI agents was built for a different problem. Here’s what changed when we stopped doing reasoning at query time.

The pattern that worked — until it didn’t

For two years, retrieval-augmented generation was the default way to give a language model access to private data. Chunk the documents, embed them, store them in a vector database, retrieve the top-k matches at query time, paste them into the prompt. It worked. It still works for one-off questions over static documents.
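For reference, the pattern just described fits in a few dozen lines. This is a minimal sketch, not a production pipeline: a toy bag-of-words `embed` function stands in for a real embedding model, a Python list stands in for the vector database, and every function name here is hypothetical. The point is the shape. `build_index` runs once, but `retrieve` and `build_prompt` run on every query, and the model has to re-read and re-interpret the pasted chunks each time.

```python
import math
from collections import Counter

# Hypothetical toy pipeline: bag-of-words counts stand in for a real
# embedding model, and a plain list stands in for a vector database.


def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Offline step: chunk and embed every document, keeping (doc_id, chunk, vector)."""
    return [(doc_id, c, embed(c)) for doc_id, text in docs.items() for c in chunk(text)]


def retrieve(index: list[tuple[str, str, Counter]], query: str, k: int = 3) -> list[tuple[str, str, Counter]]:
    """Query-time step: score every chunk against the query and keep the top k."""
    q = embed(query)
    return sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)[:k]


def build_prompt(index: list[tuple[str, str, Counter]], query: str, k: int = 3) -> str:
    """Query-time step: paste the retrieved chunks into the prompt for the model to interpret."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text, _ in retrieve(index, query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Nothing the model concludes from those chunks is kept: the next call starts from `retrieve` again.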

Agentic workloads break the pattern.

An agent doesn’t ask one question. It runs through dozens of tool calls, each one re-discovering the same context from scratch. Every session starts blank. Every retrieval re-interprets the same chunks. Every answer is a fresh reasoning pass over raw text the model has already processed a thousand times before.

Some industry estimates put the share of agent compute spent on re-discovery, rather than task completion, at 80–85%. The exact number is debatable, but the pattern is real: the same agent queries the same Knowledge Base with structurally similar questions and does the same interpretive work over and over.

That isn’t a retrieval problem. It’s an architecture problem.

What runtime RAG can’t do

Three things break at scale.

Determinism. Run the same task twice against the same documents and an agent can return different answers, with no record of which source drove either result. For any workflow that touches compliance, audit, finance, or HR, that’s a structural disqualifier. You can’t ship an agent that gives a different number on Tuesday than it gave on Monday.

Cost. Reasoning at query time means paying inference tokens for work that didn’t need to happen at query time. What a policy section means doesn’t change between sessions. Re-deriving it on every call is a tax on every user.

Citation honesty. Document-level citations — “this answer came from this PDF” — aren’t enough when the agent’s claim is built from three sentences across two chapters. Buyers in regulated industries want claim-level provenance with confidence scores. Vector similarity alone doesn’t produce that.

The shift: move reasoning to compile time

A clear architectural shift is underway. Stated simply:

Stop interpreting source data at query time. Interpret it once, at compile time, and store the result.

The vector index doesn’t disappear. It becomes the fallback for long-tail questions, not the front door. In front of it sits a layer of pre-compiled artifacts: distilled summaries, entity indexes, structure maps, claim-with-citation graphs, conflict registries. The agent reads compiled knowledge first and falls back to raw retrieval only when the question doesn’t fit any compiled artifact.
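The read path described here, compiled artifacts first and raw retrieval as the fallback, can be sketched as a small router. This is an illustrative assumption, not Copyl's routing logic. The compiled layer is reduced to a hypothetical entity index that maps an entity name to a pre-written answer and its source sections, matching is a naive case-insensitive substring check, and `fallback` is any callable that performs raw retrieval.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: class names and the matching rule are illustrative only.


@dataclass(frozen=True)
class Answer:
    text: str
    sources: tuple[str, ...]
    path: str  # "compiled" or "fallback", recorded so every answer is traceable


class CompiledFirstRouter:
    def __init__(
        self,
        entity_index: dict[str, tuple[str, tuple[str, ...]]],
        fallback: Callable[[str], str],
    ) -> None:
        # entity -> (compiled answer text, source sections)
        self.entity_index = {k.lower(): v for k, v in entity_index.items()}
        self.fallback = fallback

    def answer(self, question: str) -> Answer:
        q = question.lower()
        # Front door: serve from compiled knowledge when the question names a known entity.
        # Longer entity names are checked first so the most specific match wins.
        for entity in sorted(self.entity_index, key=len, reverse=True):
            if entity in q:
                text, sources = self.entity_index[entity]
                return Answer(text, sources, "compiled")
        # Long tail: no compiled artifact fits, so fall back to raw retrieval.
        return Answer(self.fallback(question), (), "fallback")
```

Recording which path served each answer also makes it easy to measure how much traffic the compiled layer actually absorbs.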

This isn’t a new idea in software. Materialized views, build pipelines, ahead-of-time compilation — every mature platform eventually moves expensive interpretation work out of the hot path. AI infrastructure is finally catching up.

How Copyl handles it — natively

A compiled knowledge layer can be assembled from disparate parts: a vector store here, an orchestration tool there, a custom pipeline gluing them together. That works — until it has to scale across tenants, agents, policies, languages, and audit requirements at once. At that point, the seams become the work.

In Copyl, compilation is a first-class part of the platform, not a bolt-on.

The Knowledge Base (Books, Chapters, and Docs authored in Markdown) compiles into Knowledge Briefs: task-optimized representations bound to a specific Agent Profile. A Brief contains:

A distilled summary tuned to the agent’s scope and persona
An entity index extracted once, not re-derived per query
A structure map of the underlying Books and Chapters
A citation graph linking each claim to its source section, with a confidence score
A conflict registry where contradictions between Docs are detected and resolved using the agent’s own Policies and SOPs
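One way to picture the five parts of a Brief is as a plain data model. The field names, types, and the `Book/Chapter/Doc#section` path format below are hypothetical and chosen for illustration; they are not Copyl's schema. The `provenance` method shows what claim-level citations offer over document-level ones: a claim resolves to specific sections, each with its own confidence score.

```python
from dataclasses import dataclass, field

# Hypothetical data model for a Knowledge Brief; not Copyl's actual schema.


@dataclass(frozen=True)
class Citation:
    section: str       # assumed format: "Book/Chapter/Doc#section"
    confidence: float  # 0.0 to 1.0


@dataclass(frozen=True)
class Claim:
    text: str
    citations: tuple[Citation, ...]


@dataclass(frozen=True)
class Conflict:
    doc_a: str
    doc_b: str
    resolution: str  # which Doc prevailed and which Policy or SOP decided it


@dataclass
class KnowledgeBrief:
    agent_profile: str
    summary: str                                                       # distilled summary
    entity_index: dict[str, list[str]] = field(default_factory=dict)   # entity -> sections
    structure_map: dict[str, list[str]] = field(default_factory=dict)  # Book -> Chapters
    claims: list[Claim] = field(default_factory=list)                  # citation graph
    conflicts: list[Conflict] = field(default_factory=list)            # conflict registry

    def provenance(self, claim_text: str, min_confidence: float = 0.0) -> list[Citation]:
        """Return the source sections backing one claim, filtered by confidence."""
        for claim in self.claims:
            if claim.text == claim_text:
                return [c for c in claim.citations if c.confidence >= min_confidence]
        return []
```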

What makes this work isn’t the compilation step in isolation — it’s the integration around it.

Agent Profiles are the task spec. Every Copyl agent already declares its scope, persona, and goals. Briefs compile against that profile directly. There is no separate task-definition layer to author and maintain.
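Treating the Agent Profile as the compile-time spec might look like the sketch below. It is an assumption for illustration: the profile is reduced to a hypothetical set of scope tags plus persona and goals, Docs carry hypothetical topic tags, and compilation starts by selecting only the Docs that fall inside the declared scope. Sorting by path keeps the input order stable, so the same profile and KB always yield the same compile inputs.

```python
from dataclasses import dataclass

# Hypothetical shapes for an Agent Profile and a Doc; not Copyl's API.


@dataclass(frozen=True)
class AgentProfile:
    name: str
    scope: frozenset[str]  # topics the agent is declared to cover
    persona: str
    goals: tuple[str, ...]


@dataclass(frozen=True)
class Doc:
    path: str  # e.g. "Book/Chapter/Doc"
    tags: frozenset[str]
    body: str


def compile_inputs(profile: AgentProfile, docs: list[Doc]) -> dict:
    """Derive the compile job's inputs from the profile alone: no separate task spec."""
    in_scope = sorted((d for d in docs if d.tags & profile.scope), key=lambda d: d.path)
    return {
        "profile": profile.name,
        "persona": profile.persona,
        "goals": list(profile.goals),
        "sources": [d.path for d in in_scope],
    }
```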

Policies and SOPs drive conflict resolution. When two Docs contradict each other, the resolution rule isn’t ad-hoc. It comes from the Policies and SOPs the customer has already written. Compliance isn’t bolted on at the end; it’s the source of truth the compiler runs against.
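A sketch of policy-driven conflict resolution, under stated assumptions: each conflicting statement carries a hypothetical `authority` rank and an `effective` date, and the customer's Policy is reduced to an ordered list of rule names. The first rule that distinguishes the two statements decides the outcome, and a note naming that rule is returned so the resolution can be recorded. A tie under every rule raises instead of guessing.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical model: rule names and Statement fields are illustrative assumptions.


@dataclass(frozen=True)
class Statement:
    doc: str
    value: str
    authority: int   # higher wins under the "authority" rule
    effective: date  # later wins under the "recency" rule


RULES = {
    "authority": lambda s: s.authority,
    "recency": lambda s: s.effective,
}


def resolve(a: Statement, b: Statement, policy_order: list[str]) -> tuple[Statement, str]:
    """Apply the Policy's rules in order; return the winner and the deciding rule."""
    for rule in policy_order:
        key = RULES[rule]
        if key(a) != key(b):
            winner = a if key(a) > key(b) else b
            return winner, f"{winner.doc} prevails by {rule}"
    raise ValueError(f"Unresolved conflict between {a.doc} and {b.doc}; escalate for review")
```

Changing the rule order changes the outcome, which is the point: the precedence lives in the customer's Policy, not in the compiler.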

CIP makes invalidation automatic. When a Doc, Chapter, or Book changes, the platform’s event bus invalidates the affected Briefs and queues recompilation. No external orchestration tool, no manual cache flush.
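The invalidation step can be pictured as a dependency map from KB paths to the Briefs compiled from them. In this hypothetical sketch, a change event names a Doc, Chapter, or Book path (assumed format `Book/Chapter/Doc`); every Brief that depends on a Doc at or under that path is marked stale and queued once for recompilation. The class and method names are illustrative, since this post does not describe the real event bus or its event shapes.

```python
from collections import defaultdict, deque
from typing import Optional

# Hypothetical sketch of event-driven Brief invalidation; not Copyl's implementation.


class BriefInvalidator:
    def __init__(self) -> None:
        self.deps: defaultdict[str, set[str]] = defaultdict(set)  # Doc path -> Brief ids
        self.stale: set[str] = set()
        self.queue: deque[str] = deque()

    def register(self, brief_id: str, doc_paths: list[str]) -> None:
        """Record which Docs a Brief was compiled from."""
        for path in doc_paths:
            self.deps[path].add(brief_id)

    def on_change(self, path: str) -> None:
        """Handle a change event for a Doc, Chapter, or Book path."""
        prefix = path.rstrip("/") + "/"
        for doc_path, brief_ids in self.deps.items():
            if doc_path == path or doc_path.startswith(prefix):
                for brief_id in sorted(brief_ids):
                    if brief_id not in self.stale:  # queue each stale Brief only once
                        self.stale.add(brief_id)
                        self.queue.append(brief_id)

    def next_job(self) -> Optional[str]:
        """Pop the next Brief to recompile; it stops counting as stale once taken."""
        if not self.queue:
            return None
        brief_id = self.queue.popleft()
        self.stale.discard(brief_id)
        return brief_id
```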

“Fork, don’t break” applies to compiled knowledge too. Template Agents come with template Briefs. When a customer customizes their agent or its KB, the Brief forks per tenant — the original is never corrupted, and one customer’s compiled knowledge never leaks into another customer’s runtime.
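Copy-on-write is one straightforward way to get “fork, don’t break” behavior. In this hypothetical sketch, template Briefs are shared, a tenant's first customization deep-copies the template into a fork keyed by `(tenant, brief_id)`, and reads always resolve the tenant's fork before the template. The store, its methods, and the dict-shaped Brief content are assumptions for illustration.

```python
import copy

# Hypothetical copy-on-write store for template and per-tenant Briefs.


class BriefStore:
    def __init__(self) -> None:
        self.templates: dict[str, dict] = {}
        self.forks: dict[tuple[str, str], dict] = {}  # (tenant, brief_id) -> forked content

    def add_template(self, brief_id: str, content: dict) -> None:
        self.templates[brief_id] = copy.deepcopy(content)

    def customize(self, tenant: str, brief_id: str, changes: dict) -> None:
        """Fork on first write, then apply changes to the tenant's copy only."""
        key = (tenant, brief_id)
        if key not in self.forks:
            self.forks[key] = copy.deepcopy(self.templates[brief_id])
        self.forks[key].update(changes)

    def read(self, tenant: str, brief_id: str) -> dict:
        """Resolve the tenant's fork first; return a copy so callers cannot mutate shared state."""
        content = self.forks.get((tenant, brief_id), self.templates[brief_id])
        return copy.deepcopy(content)
```

Because forks are keyed by tenant, one tenant's edits never appear in another tenant's reads, and the template itself is never written after it is added.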

Versioned and audit-ready by default. Every Brief is versioned, every claim is sourced, every conflict resolution is recorded. The audit trail is the artifact, not something reconstructed after the fact.
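A common way to make artifacts versioned and reproducible is content addressing: hash a canonical serialization, so identical inputs always yield the identical version id, and append every publish to a log. This sketch assumes JSON-serializable Brief content and uses SHA-256 from Python's standard library; `brief_version`, `BriefHistory`, and their fields are hypothetical names, not Copyl's API.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical versioning sketch; assumes Brief content is JSON-serializable.


def brief_version(content: dict) -> str:
    """Content address: key order does not matter, any change in content does."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass
class BriefHistory:
    snapshots: dict[str, str] = field(default_factory=dict)  # version -> serialized snapshot
    audit_log: list[dict] = field(default_factory=list)      # append-only publish record

    def publish(self, brief_id: str, content: dict, reason: str) -> str:
        version = brief_version(content)
        # Store an immutable serialized snapshot the first time a version appears.
        self.snapshots.setdefault(version, json.dumps(content, sort_keys=True))
        self.audit_log.append({"brief": brief_id, "version": version, "reason": reason})
        return version

    def load(self, version: str) -> dict:
        """Rebuild the exact Brief that was published under a version id."""
        return json.loads(self.snapshots[version])
```

Recompiling unchanged inputs produces the same version id, which is what lets two runs be compared for "same answer twice".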

What buyers actually care about

The technical story is interesting. The buyer story is shorter.

Same answer twice. Reproducibility stops being a wishlist item.

Lower per-query cost. Compiled artifacts are smaller and more focused than raw retrieval payloads.

Auditable citations. Every claim ties to a source section with a confidence score. Compliance and legal teams can actually sign off.

Multi-tenant safety without extra plumbing. Tenant isolation is enforced at the compilation layer, not improvised at runtime.

Multi-language without runtime translation. Compilation produces Briefs in the languages the KB and the user actually speak — no on-the-fly translation tax.

These aren’t features. They’re the difference between an agent you can demo and an agent you can deploy.

What we’re not saying

Compiled knowledge layers are not a silver bullet. There’s overhead: compilation jobs cost tokens upfront, Briefs go stale if invalidation isn’t wired correctly, and the architecture only pays back at meaningful query volume. Teams running a handful of queries a day per agent should fix their hybrid retrieval and reranking before they touch any of this.
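The payback condition is simple arithmetic. If compiling a Brief costs C tokens and each query drops from R tokens on the raw-retrieval path to B tokens on the compiled path, compilation pays for itself after ceil(C / (R − B)) queries per compile cycle, and every recompilation restarts that count. The function below encodes this hypothetical token-only cost model; it ignores latency, storage, and infrastructure cost.

```python
import math

# Hypothetical cost model: counts tokens only.


def break_even_queries(compile_tokens: int, raw_tokens_per_query: int, compiled_tokens_per_query: int) -> float:
    """Queries needed within one compile cycle before compiling is cheaper than raw retrieval."""
    saving = raw_tokens_per_query - compiled_tokens_per_query
    if saving <= 0:
        return math.inf  # compiled path is no cheaper per query, so it never pays back
    return math.ceil(compile_tokens / saving)
```

With made-up figures of C = 200,000, R = 12,000, and B = 2,000, `break_even_queries(200_000, 12_000, 2_000)` returns 20. An agent that sees fewer queries than that between edits to its sources is better served by tuning retrieval first.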

We also don’t think runtime RAG is going away. It’s becoming a fallback layer rather than the primary one — which is, in retrospect, where it always belonged.

Where this goes

The teams shipping production agents through 2026 are going to look less like teams running clever prompt pipelines and more like teams running disciplined data infrastructure. The novelty premium on “we have an AI agent” is gone. The premium going forward is on agents that are cheap, reproducible, audit-ready, and grounded in knowledge their builders can actually defend.

That’s the bar.

In Copyl, the Knowledge Compilation Layer is how we meet it — and because it’s built into the same platform that already owns the agents, the policies, the data, and the events, you don’t have to assemble it yourself.
