Opening Pandora's Box: A Developer's Guide to AI Agents

This is the long-form version of a talk I gave at DevTalks Romania 2026 in Bucharest. The talk was not recorded, so consider this its replacement. The slides are available on GitHub.

Paul Ardeleanu on stage at DevTalks Romania with the Opening Pandora's Box title slide on the big screen behind him.

Title slide showing a sealed terracotta jar beside the talk title Opening Pandora's Box.

Pandora was given a box. Well, a jar, really: “box” is a mistranslation that stuck.

She was given a jar with a single instruction, never open it, and you already know how this goes: curiosity won.

She lifted the lid, and bad things the world did not yet know flew out: sickness, toil, despair. She forced the lid back down, but far too late for almost everything was already gone. One thing stayed inside: Hope.

That is the myth, and we are living it today. You see, we have opened a box of our own.

Twelve months ago, AI coding agents were cool demos. Today they are in your editor, your terminal, your pipeline. And most of us opened the box exactly the way Pandora did: quickly, and with curiosity. But without understanding what came with it.

So here is the plan for this talk: not hype, not fear, just a walk through what flew out, what is still inside, and what turns it from a threat into an advantage.

Before we get started, I just want to make sure it is understood that this is not a talk for senior engineers. It is for everyone willing to listen and try! Anyone can peek inside these tools and see what they are actually doing. That is not about seniority; it is about curiosity.

1. The arc every developer is on

Section divider with a terracotta jar, lid ajar and smoke rising, captioned We have already opened it.

Every developer is somewhere on this seven-step AI Adoption Arc, left to right and rising:

It starts at the bottom, with the Anxious Prompter: you type something and hope.

One step up is the Fluent Prompter, where you have become good at this and you are genuinely productive. But it is still magic.

From step four onwards, the same skill reaches further each time:

you shape the tool
you design workflows
you bring your team along
and at the top you grow the practice across a whole organisation.

We are going to walk this arc together, and by the end you will know two things: which step you are standing on, and where it leads next.

Rising curve with seven numbered stages from Anxious Prompter to Platform Thinker.

The seven-step developer arc, from anxious prompter to platform thinker.

Anxious prompter: you type something and hope.
Fluent prompter: you are genuinely productive, but it is still magic.
Harness reader: you stop guessing and look inside.
Harness shaper: you treat the tool as yours to change.
Workflow designer: you design the workflows the agent runs, and build the thing that runs them.
Team enabler: you teach the discipline to a team.
Platform thinker: you scale the practice across a whole organisation.

Those bottom two steps share something: on both, the tool is a black box. On step one you are afraid of it; on step two you are productive with it, but either way the box is closed, and you cannot see inside.

And honestly, most of our industry stays on those bottom two steps. We are all fluent prompters now, and that is a huge step forward, but it is still a box we do not understand.

Step three is where that changes: we stop guessing and look in.

2. The loop is boring

And here is the first thing we notice: it is boring. This is the agent, the whole thing, roughly seven lines.

while not done:
    reply = model(conversation)

    if reply.wants_tool:
        result = run_tool(reply.tool_call)
        conversation.append(result)
    else:
        done = True

It sends the conversation so far to the model, and the model answers in one of two ways:

Either it wants to use a tool, be that reading a file or running a test, in which case the harness runs it and loops.
Or it says it is done, and the loop stops.

Let me repeat that! The engine inside the technology that thrills and terrifies our industry is a while-loop! The kind you wrote when you first learned to program.

So why is Claude Code hundreds of thousands of lines of code?

Remember when, a couple of months ago, the Claude Code source code got released publicly by mistake?!

A community study found that only about 1.6% of it is what we could fairly call AI decision logic: the clever part, the bit that talks to the model. The other 98.4% is operational infrastructure: be that permissions, tool definitions, state management, error handling, sandboxing.

Unit chart of one hundred squares with two highlighted, beside the figures 98.4 per cent operational infrastructure and 1.6 per cent AI decision logic.

So what we call artificial intelligence is, by volume, 98% ordinary software engineering. The magic is a rounding error, and that is genuinely good news, because ordinary software engineering is something we already know how to read.

The “Dive into Claude Code” paper is available at https://arxiv.org/pdf/2604.14228 if you’d like to read it.

Liu and colleagues also decomposed the production harness into seven parts:

Architecture diagram with a central Agent Loop connected to Developer, Interfaces, Tools, Execution environment, State and persistence, and a Permission system.

It starts with you on the left, and your interfaces, be that terminal, editor, CI.
They all feed one agent loop.
Around that loop sit the permission system, the tools, state and persistence, and the execution environment.
Every entry point feeds the same loop, and that loop reaches its tools only through the same permission gate. Seven components, one loop in the middle, and the shape is legible.

That is not an accident: its architecture is driven by five values.

Numbered list of five values: human decision authority; safety, security and privacy; reliable execution; capability amplification; contextual adaptability.

Human decision authority: humans (not the model) make the calls that matter.
Safety, security and privacy: protect the user, the system, and data from harm.
Reliable execution: what the agent says it will do, the harness validates and dispatches faithfully.
Capability amplification: extend what the developer can do, not just automate what they already do.
Contextual adaptability: the harness fits different developers, projects, and contexts.

These are the 5 values. The first one is what the architecture revolves around. The other four are how the harness honours it.

You can watch this in a real session. Ask the agent to fix a failing test, and it reads the test, reads the code, and runs the test to verify it fails: each is just a tool call, and reads are safe, so the harness lets them straight through. Then the model wants to change a file, and that pause is the permission system, because reading your code was free but changing it is not. The agent asks, a human says yes, and nothing is written that a human had not cleared. That is the loop, start to finish: not magic, just a while-loop with good engineering around it and human checkpoints at the right moments. In Claude Code you can open any of those tool calls and read the whole trace with Cmd+O.

Prompt, model, tool, result, turning again and again until the work is done, with human checkpoints at the right moments. Boring and legible.

So we opened the box, found the loop, and watched it turn. And just like that, we have climbed step three on the arc: the harness reader. This is the step everything else is built on, because we cannot shape what we cannot read.

Before moving to the next step in the arc, we have to ask the hard question: now that the box is open, what flew out?

3. What flew out: the six ills

Six things that flew out. However, none of them is a reason to put the lid back on; in fact, we cannot close the box.

It is open, and we have to deal with what flew out.

But the good news is that every one of these six is something we can build against. They are not vibes, they are work, and work is something we know how to do.

3.1. Opacity

You might have reviewed an agent’s work this week. You open the PR, read the diff. What you cannot read is why.

The agent had reasons. But all that reasoning lived inside a session, and when the session ends, it’s gone. The diff is the only thing that survived.

And the diff is the what, never the why.

For a human teammate, you’d have the commit message, the PR description, or a Confluence page. For an agent, by default, you get a diff and a shrug.

Slide showing a git diffstat and a persistSession true-to-false change, with the line The why left with the session.

Opacity: a diff tells you what changed, but the reasoning leaves with the session.

However, it is fixable

Take the git log of an agent-assisted project: it is agent-assisted work, with standard commit messages but the why is missing.

To the right in the screenshot below, is the same work again: same three commits, same messages. But this time, a hook fired during each session and attached the reasoning to history, so every commit now carries a Checkpoint. The Checkpoint itself is just a pointer: short, generic, and machine-readable.

Two terminal windows side by side: the left git log shows plain commits, the right log shows the same commits each tagged with a Session-Checkpoint hash.

The log file contains every session, every meaningful event, the whole thing. We can filter the log by the session id:

Dark terminal filtering the harness checkpoint log with jq, revealing each change's rationale, including lifting the password rule out of auth.ts.

On that session, the agent made two changes and the hook captured each one with a short rationale. That sentence is the why, and 6 months from now anybody reading that commit will be able to retrieve the reasoning behind it.

This is how we fix opacity. The why is not lost because agents are mysterious; it is lost because, by default, nothing wrote it down.

3.2. Misalignment

Three months ago, OpenAI published how they run their own internal coding agents, and they do not trust them to run unwatched.

A separate model, GPT-5.4 Thinking at maximum reasoning effort, watches them: it reviews the chain-of-thought and tool usage for every session. Categorises each one and assigns a severity level, all within thirty minutes.

Slide showing a screenshot of an OpenAI article on monitoring internal coding agents, with the source URL beneath.

And that monitor caught every misalignment OpenAI’s own engineers had independently escalated. In fact, it flagged more that the humans had missed.

The company at the frontier of this technology is afraid of its own tools, in production, today. That is not a science-fiction problem; it is an operational one, and has operational answers.

OpenAI’s answer is a monitor.

Now, you are not OpenAI, but the shape is the same: when an agent acts on your behalf, something should be watching it. That is engineering we can build.

3.3. Displacement

The headline version is “AI takes developer jobs”, and the honest version is yes and no.

Slide contrasting plus 27 per cent novel tasks against minus 17 per cent code comprehension.

Displacement is two numbers pulling opposite ways: more novel tasks, less code comprehension.

Last year, Anthropic surveyed its own engineers and found about 27% of Claude-Code-assisted tasks were work that would not have been attempted otherwise. The ceiling rises, and the things that were too expensive, too fiddly or too far down the backlog, are now getting done.
In another study completed this year, a team of researchers gave developers code-comprehension tests, and the ones working in AI-assisted conditions scored 17% lower at understanding the code.

The ceiling rises, but the floor drops too.

Do more and understand more, and we are lifted with the ceiling.
Do more and understand less, and we sink with the floor.

Displacement does not come for developers who are willing to understand their tools and their limitations.

3.4. Three more ills

On top of those three, three more flew out:

Accountability vacuum: our current business models of who’s responsible for what, always assumed a human pressed the merge button. For an agentic world, most orgs have no clean answer for who owns what.
Amplified dysfunction: AI is an amplifier, not a fixer. A mature team gets a real lift. An immature one gets more defects, bigger PRs nobody can review, and faster wrong decisions. AI multiplies your team, including the problems you already have.
Prompt injection at agent scale: this has two routes.
- A tool can be poisoned when you install it, via hidden instructions in its description.
- Or poisoned whilst it runs, through hidden instructions in what it sends back.

The MCP ecosystem has already shipped vulnerable servers by the thousand. The fixes differ: a vetted registry for what you install and a sanitising hook for what comes back.

Summary slide listing the six ills beside a smoking earthenware pot.

So that is what flew out: six ills. Not one was magic. Not one is a reason to panic, or to pretend we can close the box again.

Every one is shaped like engineering: a hook for opacity, a monitor for misalignment, a reading habit for displacement, or a fix for each injection route. This isn’t vibes, it’s work.

OK. Now, if that is what flew out, what is still inside?

4. What is still inside: the safety primitives

Going back to the myth for a moment, when Pandora closed the lid, the box was not empty: one thing was still inside, and that thing was hope.

For us, the hope that stayed in the box is engineering: the safety primitives that ship in a production harness today, whether or not anyone reads them.

Let us look into the box.

4.1. Permissions

Slide titled Permissions showing a 93 per cent approval figure and seven permission modes.

When our box opened, the first thing we were all met with was the approval prompt: the agent wants to act, so it stops and asks. Reads are free but anything that changes the world waits for us to confirm it.

When Anthropic measured how people actually used it, it found that we approve about 93% of those per-action prompts. 93! A control we say yes to nine times in ten is not a control; it is a speed bump we will stop noticing and mindlessly approve.

What Anthropic did next was not add more warnings but to restructure around permission boundaries. Instead of a hundred prompts a session, we define once what the agent may do: which tools are allowed, which denied, which gated. Deny-first and only escalating to a human when one is genuinely needed. So, the per-action triage that used to interrupt us, is now code.

In that Claude Code leaked repo, there was a file called yoloClassifier.ts, about 1500 lines of TypeScript, that scores how risky each action is and decides whether to let it through, block it, or ask a human. That is the permission system, and it is a boundary we set once and can audit, not a prompt we rubber-stamp a hundred times.

4.2. Hooks

Slide titled Hooks stating five safety events plus twenty-two lifecycle events.

Permissions decide what the agent is allowed to do. And unlike everything we put in AGENTS.md (which the model can choose to ignore), hooks are code that runs. Instructions can be ignored. Hooks cannot.

That Claude Code study counted 27 hook event types: 5 safety events, and 22 for lifecycle and orchestration, like before a tool runs, after it runs, or when a session starts or ends. At each of those events you can run your own code, and it can do one of three things:

block the action
rewrite it before it happens
or annotate it, attaching something to the record.

A hook is how we address the opacity we encountered before. Attach one on the after-an-edit event, and every change carries your metadata: the task, the reasoning, the trail. The why stays welded to the what, automatically, because you wired it that way.

Let us watch one fire. Same harness as before. This time, we are watching a hook.

Claude Code starting in the auth-service project, with a tail on the checkpoint log running in a pane below, still empty.

We are asking the agent to perform a simple docs task.

The same session with the prompt typed in: add a JSDoc block to the password validator in src/auth.ts.

The Agent attempts to make an edit - we gave it permission to do so.

Claude Code proposing the JSDoc edit to auth.ts and asking permission before making it.

And the hook fires immediately after that edit, writing the reasoning into the session’s lineage record.

After the edit, the checkpoint-log pane shows a PostToolUse event recording the Edit to src/auth.ts.

The hook captures every change the instant it happens; and when the work is committed, that record becomes the Session-Checkpoint we saw earlier: the line that carries the reasoning.

Nobody typed that, and nobody approved that.

The agent's reasoning for the change captured in the session, with the PostToolUse lineage event still visible in the log pane.

It is policy, running as code, on the after-an-edit event.

So the what is logged automatically, and the why is welded on at commit time. The opacity we worried about in the last section, resolved by wiring a hook.

4.3. The quieter primitives

Let’s cover 3 more primitives, already in the box.

MCP boundaries: MCP has become the official protocol a harness uses to reach external tools and data. It extends the agent NOT at the reasoning layer BUT at the protocol layer. When your team ships an MCP server for an internal API, the agent gets to see exactly what every engineer on that team sees through it, no more, no less. We are NOT widening what the model can think, we are defining what it can reach.
Subagent isolation: When the harness spins up a subagent, it runs the same loop again, but with a fresh context window of its own. It does its work, and hands back just a summary. Nothing leaks across that line: not the context, not the permissions. It is a sealed box, by design.
Append-only sessions: As the session runs, the harness writes it to disk: an append-only log you can resume, fork, and rewind. So the session is not a black box that vanishes when you close the terminal. It is a record. And you can audit it.

So the hope stayed in the box BUT hope was never a feeling. It was these primitives all along, and across all five they share one trait: governance you can open and shape.

Recap slide Governance is running code listing all five safety primitives.

In most organisations, governance is a whitepaper: a document on a shared drive that everyone ignores. In a production harness, governance is running code.

But before we go on, one piece of honest pushback.

5. A contrarian beat: single-agent by default

The loudest advice in the industry right now is not shape your harness. It is build a swarm.

So let me swim against the current for a minute. The thinking goes like this: if one agent is good, a team of agents must be better. A planner, an architect, a coder, a reviewer, a tester, all talking to each other, orchestrating the swarm. It looks like an org chart. And we trust org charts.

Earlier this year, a Stanford team finally ran the comparison fairly by holding one thing equal: the thinking budget, i.e. the tokens an agent gets to reason before it answers.

Slide comparing equal-budget single agents against multi-agent swarms, with a large 4.7x figure, citing Tran and Kiela, Stanford 2026.

Turns out: the single agent matches or beats the swarm at any serious budget. So where did all those swarm wins come from? Compute nobody counted. The swarm didn’t win; it just spent more, 4.7 times the compute, and took the credit.

Why? Because one agent sees the whole problem at once. A swarm can’t; each agent only gets a short summary of what the others did, and every summary loses something. So more agents don’t add understanding; they add telephone-game distortion.

There’s really only one time a swarm pulls ahead: when the information is so messy and noisy that a single agent gets lost in it, and that situation is becoming more common.

So, decompose deliberately. A swarm carries a hidden cost: the coordination tax.

Every extra agent burns tokens, time, and money.
Every hand-off is another chance to drop or distort information.

We pay that tax whether or not the swarm earns it back.

So: single agent by default. A swarm is the exception that should only be triggered if:

context is too long and genuinely corrupted.
sub-tasks that truly don’t need to talk to each other
sub-tasks reaching for completely different tools

One focused agent with good tools will always beat six confused agents arguing.

6. Agency is your job-safety strategy

OK, something else we’ve been circling deserves to be named. AI is closing the skill gap: the distance between a junior developer and a senior engineer, the distance that used to take a decade to cross, is getting shorter.

If your value was “I can write the code that most people cannot”, that moat is draining.

Slide with a photo of hands shaping clay on a potter's wheel and Max Schoening's quote that the first 10 per cent of every project is now free.

Max Schoening, who runs product at Notion, put it cleanly: “the first ten per cent of every project is now free”. We can spin up ten agents and explore ten rough versions before lunch. But the last ten per cent is still ninety per cent of the work. Exploration got cheap but shipping something real did not.

So if the skill is flattening, the honest question is: what is left? What is the thing that still separates the developers who compound from the developers who stall?

What is left is agency! Not skill. Agency!

Agency is a disposition to treat the world around us as malleable. Something we are allowed to reach into and change.

AI-native is not about having an LLM open on your screen all the time; it is about how you think about the value of your work.

If you do not know what matters, what good looks like, or where to focus, AI just helps you do the wrong thing faster.

The seven-step arc over a photo of potter's hands, titled What's left is agency, with step four Harness Shaper highlighted in red.

Most developers are still on steps one and two of our arc: the anxious and the fluent prompter. They accept the harness exactly as it shipped: they type into the box, and they take whatever the box was configured to give them.

The harness shaper, step four, does the opposite. You look at the harness the way a potter looks at clay: all of it is malleable, all of it is yours to throw.

So shape your harness. Don’t just prompt it.

A terminal tree of the auth-service project, with the hooks, settings.json, AGENTS.md and a prepare-commit-msg git hook circled in red.

“Shape your harness” might sound like a slogan, but it is not.

Here is the harness we saw earlier. We have:

a project-memory file
a permissions file
and the hooks that wire your policy into every edit and every commit.

That is not a manifesto. It is a directory listing:

Two terminals: the project tree beside cat AGENTS.md, showing rules, workflow and prohibitions for the auth-service module.

And every one of those is a plain text file, and every one of them is yours to edit.

Agency is the disposition; these files are the artefact. Which means shaping your harness is not aspirational: it is just opening files. And that is exactly what you can do on Monday.

The Monday-morning checklist

Here are 5 moves for your Monday morning.

Slide titled Five moves, Monday morning, listing a five-item developer checklist.

Write your AGENTS.md like a system contract.

Most teams treat the project-memory file as a README: a few notes, mostly stale. It is not a README. It is the contract the agent reads before every task. Treat it like load-bearing code: precise, current, reviewed. Whatever your harness calls it, be that AGENTS.md, CLAUDE.md, .cursorrules, the discipline is the same.
Ship one MCP server.

Find that one thing your team reaches for every day but that the agent currently cannot see, be that your deployment status, your internal docs, your ticketing. Make it a tool, instead of a paragraph of instructions. Start with one single server.
Install one hook for lineage.

Pick the smallest hook that captures the why alongside the what in your history, so that six months from now the diff still explains itself. Smallest one that works.
Run a permissions audit.

Open your harness’s permission rules and actually read them, then tighten one and maybe loosen another, both deliberately. Stop treating permissions as someone else’s default.
Run the single-agent default.

Resist the swarm. One focused agent with good tools, until you hit a real trigger for decomposition, not before.

These are the 5. And one thing rings true of all of them. They don’t need a budget, a meeting, or a platform team.

They are all things you can do on Monday morning, with the tools you already have, and the permissions you already have.

To help you remember all five, I’ve put together a PDF with the checklist, and the engineering primitives. It contains every move on that checklist, as ready-to-paste starting points.

Now, the checklist is the easy part. The hardest part is the mindset shift.

The developers who thrive in the next decade will not be the ones who memorise the cleverest prompts. They’ll be the ones who shape their harness, who treat the permissions, the hooks, the project memory, the tools, all of it, as clay on a wheel.

And the best part? You don’t need to wait for someone to hand you a mandate.

7. Build the thing itself

The checklist is the harness-shaper’s move. The next step up is to build the thing all of that wraps around: the loop itself. Yes, you are perfectly capable of building an agent yourself today.

Remember the harness loop we saw earlier?

while not done:
    reply = model(conversation)

    if reply.wants_tool:
        result = run_tool(reply.tool_call)
        conversation.append(result)
    else:
        done = True

Let’s remind ourselves what it does:

it calls the model.
if the reply asks for a tool, run the tool and feed the result back.
if it does not, we are done.

That’s it!

Here are the same lines but in JavaScript, and they are about to grow.

let done = false;
while (!done) {
  const reply = await model(conversation);

  if (reply.wantsTool) {
    const result = await runTool(reply.toolCall);
    conversation.push(result);
  } else {
    done = true;
  }
}

Everything we add from here is pure engineering. And each thing we bolt on is a safety primitive we’ve already spoken about. Watch the code grow one move at a time.

JavaScript agent harness with each added guard labelled in the margin: trusted-source check (injection schema route), permission gate, response sanitiser (injection response route), and the lineage log line (opacity fix).

First move: before we let the loop run a tool, we check against a permission ruleset, so the agent cannot run a destructive command without our say-so. That is the safety primitive we named earlier, and it is 4 extra lines.
Second move: After the tool returns, a hook fires and writes the result and the reason to a log, so the why is captured at the moment it exists. And that is the opacity fix; one single extra line.
Third move: the prompt-injection. In two halves. Because the threat has two routes.
- Before the dispatcher runs a tool, it checks the tool’s source against a list of trusted MCP servers: that closes the schema route, the case where the tool’s own description was poisoned at install.
- And after the tool returns, a small sanitiser strips instruction-shaped strings from the result before the loop reads it back: that closes the response route, the case where the tool’s output is carrying instructions for the agent.

And that’s it!

We designed a workflow!

Slide reading The artefact is small, the discipline scales, beside the complete hardened agent harness.

The whole hardened harness on one screen: the artefact is small, but the discipline scales.

That was Step five on our AI adoption arc.

Step six is the enabler: the same discipline taught to a team. The senior engineers in this room have done this already, almost certainly, without naming it.

And Step seven, the platform thinker, is where the practice scales beyond your team.

8. Where the practice scales

Section divider headline Your checklist does not scale by itself, above a row of tagged terracotta jars.

As we mentioned earlier, AI amplifies whatever system it runs on. Amplify a good system and you compound. But amplify a broken one and you compound the breakage.

We can do everything on that checklist and become genuinely good at shaping our harness, and our organisations still gain nothing. The literature has a name for it: the AI Productivity Paradox.

Individual lift, no organisational gain: more tasks completed, more PRs merged, but delivery measured at the level of the whole team is dead flat.

So, if individual agency alone does not scale, what does?

What scales is an operating model, a new paradigm-shift operating model. At Vonage, we are building a paradigm-shift operating model; we call it the AI-Native Software Factory and it’s something I’ve spent my last half a year on.

It embeds AI assistants, automation, and intelligent tooling into every phase of software delivery, from requirements and architecture through to testing, deployment, and operations.

It rests on four pillars.

Architecture diagram of the AI-Native Software Factory showing Intent Engine, Context Graph, Agentic Assembly Line and Governance Gate.

The AI-Native Software Factory: four pillars wired from inputs through to deployment.

The first is a Context Graph: a governed, queryable model of the code, architecture and previous decisions, tickets, runtime, so an agent reasons over a real picture of the system instead of reading files and hoping.
The second is an Intent Engine: a system that captures the intent behind every change, so the why is always available to the agent, and to the humans who review it. Specifications are the source of truth, testable and executable.
The third pillar is the Agentic Assembly Line: receives Units of Work from the Intent Engine and decomposes them into tasks for specialised agents executing in parallel (not a swarm).
The fourth is a Governance Gate: a system that enforces the rules of the organisation, so the agent cannot act outside of them, and every action is logged and auditable.

Those four pillars are the answers to the ills we named: opacity is what the Context Graph closes, by making the why a durable, queryable record rather than a session detail that vanishes, and the accountability vacuum is what the Governance Gate combats, the same problems solved structurally rather than heroically.

Full seven-step developer arc, all nodes highlighted, titled The whole climb.

So here is the whole climb. We have walked the entirety of it, from the anxious prompter, all the way to the platform thinker at the top. Reading became shaping, shaping became designing and building. And building, taught to a team, becomes the platform thinking the factory runs on. That is the climb, and all of it is ours to make.

Paul Ardeleanu on the AI Stage at DevTalks Romania, mid-gesture, with the Opening Pandora's Box speaker banner behind him.

On the AI Stage at DevTalks Romania, Bucharest, June 2026.