Two Sessions, One Lesson: What Happened When Our Agent Had Experience vs. When It Didn't

An engineering case study from the BigNumberTheory team

Some quick context: BigNumberTheory is an experience network for AI coding agents. When an agent finishes a session, we automatically capture what it learned — what worked, what failed, what edge cases it hit — and make that knowledge available to future sessions. Think of it as collective memory: your agent gets smarter every time it runs, first from its own history, then from your team's, then from the broader community. It works through lightweight hooks that run outside the agent's awareness — no code changes, no extra prompts, no token overhead.
We use coding agents as our primary engineering force, which makes us our own first customer. Recently, two internal debugging sessions happened days apart that made the value of shared experience concrete. One session had experiences loaded. The other didn't.

Session A: Debugging Without Experience

One of our engineers noticed the production dashboard showed stale data compared to the local development dashboard. Same app, same database, different numbers everywhere.
The agent started from scratch.
It searched through frontend configuration files, environment variables, deployment configs, hosting settings, and git history — over ten distinct investigation steps. After all that, it produced its first diagnosis: the production backend hadn't been redeployed after recent code changes, so it was running older extraction logic. A reasonable hypothesis. But wrong.
The engineer shared more files. The agent kept digging and found the real culprit: production and development shared the same database but used separate agent directories. Knowledge created by a dev agent got tagged with a dev ID; the production app queried for a production ID. The IDs didn't match, so production couldn't see any of the dev-created knowledge. Everything looked correct in isolation — the bug only surfaced when you traced the data flow end to end.
It took 10+ investigation steps, one false start, and multiple rounds of file sharing to get there. The agent did good work. But it had to reconstruct the entire chain from first principles.
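The bug the agent eventually found can be sketched in a few lines. This is an illustrative reconstruction, not the real schema: an in-memory list stands in for the shared database, and the agent IDs are made up.

```python
# Minimal sketch of the Session A bug. Both environments share one store,
# but every record is tagged with the ID of the agent that created it.
knowledge_store = []

def save_knowledge(agent_id: str, item: str) -> None:
    # Dev agents write records tagged with their own (dev) ID.
    knowledge_store.append({"agent_id": agent_id, "item": item})

def query_knowledge(agent_id: str) -> list[str]:
    # Each environment queries only for its own agent ID.
    return [r["item"] for r in knowledge_store if r["agent_id"] == agent_id]

# A dev agent creates knowledge; production queries the same database.
save_knowledge("agent-dev-123", "backfill crash fix")

print(query_knowledge("agent-dev-123"))   # ['backfill crash fix']
print(query_knowledge("agent-prod-456"))  # [] -- production sees nothing
```

Each call looks correct on its own; the stale dashboard only appears when you trace the write and the read end to end, which is why the deployment hypothesis was so tempting.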

Session B: Answering With Experience

A few days later, a different session started. This time, BigNumberTheory's hook loaded 15 community experiences at session start: condensed problem-solving patterns extracted from prior sessions (things like backfill crash fixes and empty-session filtering), matched to the current branch of work.
The engineer asked about the system's data pipeline: how session data flows through the backend, why certain data wasn't visible yet, and what the access model looks like. These are architectural questions that normally require reading multiple source files and tracing the request path yourself.
The agent didn't search a single file. It explained the full data lifecycle — how sessions get captured, processed, and fed back into future sessions — and correctly identified why the engineer was seeing the behavior they asked about. The answer was accurate — not because the agent guessed well, but because the loaded experiences were extracted from prior sessions where other engineers had already traced through the code and verified the behavior firsthand.
This wasn't a simpler question. Understanding how data flows through a multi-service pipeline is the kind of thing that takes a new engineer days to piece together. The agent had it because other engineers' sessions had already mapped that territory, and the knowledge had been extracted, compressed, and delivered automatically.

What Struck Us

The effort difference is visible in the transcripts — ten-plus investigation steps versus zero. But what stood out more was the confidence difference. In Session A, the agent hedged, revised its diagnosis, and iterated. In Session B, it spoke with the precision of someone who'd seen this before — because, in a meaningful sense, it had. Same model, same capabilities. The only variable was whether prior experience was available.
Session A also reveals something subtle: the false root cause. The agent's first diagnosis (outdated deployment) was plausible enough that a human might have acted on it — redeployed, found the problem unchanged, and burned an hour before looking deeper. The real bug (environment-specific ID tagging causing invisible data) required tracing a chain across multiple files and services. That's exactly the kind of hard-won knowledge that should be captured once and never re-derived.

How the Experience Pipeline Works

For those curious about the mechanics — how knowledge actually moves from one session to another:
Capture. When a session ends, a hook fires and syncs the full transcript to BigNumberTheory's backend. "Hook" here means a lightweight shell script that runs at specific moments in the agent's lifecycle — session start, each prompt, and session end. The agent doesn't know the hooks exist. There's no extra tool call, no token cost, no change to how the agent works.
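As a rough sketch of what the capture step ships, the hook only needs to serialize the finished session and upload it. The field names below are assumptions for illustration, not BigNumberTheory's real API:

```python
import json

def build_capture_payload(session_id: str, transcript: str) -> bytes:
    # The full transcript is shipped as-is; distillation into reusable
    # experiences happens server-side, in the extraction step.
    return json.dumps({
        "session_id": session_id,
        "transcript": transcript,
    }).encode("utf-8")

payload = build_capture_payload("sess-001", "engineer: why is prod stale?")
```

Because the hook runs after the session ends, none of this work costs the agent a single token.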
Extraction. The backend distills the transcript into reusable patterns — what strategies worked, what failed, and when those insights apply. A full debugging session gets compressed into a compact, retrievable experience.
Delivery. When a new session starts, the hook queries the backend for experiences relevant to the current context and injects them into the agent's context window before the first prompt. The agent in Session B didn't call a retrieval tool. The 15 experiences were already there. From the agent's perspective, it simply "knew" things — the same way a senior engineer "just knows" patterns from years of accumulated work.
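The delivery step can be sketched as a filter plus a formatter. The experience schema (`tags`, `summary`) and the branch-matching rule here are assumptions; the real relevance matching is more involved:

```python
def select_relevant(experiences: list[dict], branch: str, limit: int = 15) -> list[dict]:
    # Keep only experiences tagged with the current branch of work,
    # capped at the number injected per session.
    matches = [e for e in experiences if branch in e.get("tags", [])]
    return matches[:limit]

def format_for_context(experiences: list[dict]) -> str:
    # This text is prepended to the context window before the first
    # prompt; the agent never issues a retrieval call.
    lines = ["Relevant prior experiences:"]
    lines += [f"- {e['summary']}" for e in experiences]
    return "\n".join(lines)

experiences = [
    {"summary": "backfill crash fix", "tags": ["pipeline"]},
    {"summary": "empty session filtering", "tags": ["pipeline"]},
    {"summary": "unrelated UI tweak", "tags": ["frontend"]},
]
print(format_for_context(select_relevant(experiences, "pipeline")))
```

From the agent's side there is no tool call to make or fail: by the time the first prompt arrives, the knowledge is already part of the context.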

What We Took Away

Session A isn't a failure — it's the baseline. It's what every agent does today: capable, methodical, but amnesiac. Every session is day one.
The gap between the two sessions isn't theoretical. It's the difference between an agent that spends ten steps and produces a wrong answer first, and one that answers correctly on the first try. The knowledge compounds in the very next session, not in some future version of the product.
If a thousand agents are debugging environment-specific data mismatches this week, each will trace through config files independently and reach the same conclusion separately. With an experience network, the first agent to solve the problem effectively solves it for all of them.

BigNumberTheory is the experience network for AI agents. One command to connect, and your agent starts learning from every session — its own and the community's. Try it at bignumbertheory.com.