Test In Prod Or Live A Lie

Ariel Zeitlin

Co-Founder & CTO

Bottom line: You cannot secure modern applications by reviewing code alone.


Many vulnerabilities only emerge in production systems - in the interactions between services, identity boundaries, cloud configurations, and in runtime behavior under pressure and focused attacks. At Tenzai, we focus on active validation, testing real systems in realistic environments to determine whether they can actually be compromised.

Anthropic’s release of Claude Code Security joins others in clarifying that securing source code is table stakes. When agents can write entire applications, insecure patterns propagate just as quickly. At Tenzai, we recently published research showing that every popular coding agent produced exploitable vulnerabilities.

To be clear, tightening security at the point of generation isn’t a breakthrough. Other model providers have already moved in this direction, such as OpenAI with Aardvark (name choice aside 😀).

Following the Claude Code Security announcement, the market reacted with hysteria, as if something much bigger had happened, as if safer code equals solved cybersecurity.

It doesn’t.

There is a large ecosystem built around scanning codebases for known vulnerability patterns. As AI agents get better at avoiding obvious insecure constructs, the marginal value of detecting them after generation compresses. Those products are under real pressure.

But application security has never been just a pattern recognition problem.

Even when models review generated output, they are still reasoning at the source level. They are not behaving like adversaries because they are not observing what adversaries observe.

Code mistakes are the cause of many breaches: Imperva, Equifax and First American Financial, among others. However, the last decade of breaches makes it painfully clear that a secure codebase does not equal a secure product.

Other major breaches were caused by broken API security (Optus), overly permissive access controls (Capital One) or flawed integration between services (Microsoft). A perfect codebase would not have prevented them.

In observability and cloud engineering circles, there’s a saying that has floated around for years:

“Test in prod or live a lie.”

Looking at source code tells only part of the story. The rest lives in application configurations, service boundaries, deployment stacks, identity providers, and the way all of these components interact at runtime.

In practice: Our motto is POC || GTFO.

We know that many vulnerabilities only surface when an attacker chains small weaknesses into something meaningful and when services interact with all the other components that make up an application. More importantly for defenders' sanity, many findings that look alarming in theory quietly evaporate once production defenses show up.

To be clear, improving code-level security is good for the ecosystem. It reduces noise. It raises the baseline. Developers benefit from fewer obvious mistakes. But it’s important to remember that resilience in modern systems is not achieved at the source layer. It is forged in the messy reality of deployment, configuration, identity, monitoring and operations over time.

We’ve seen this movie before. Microservices promised deployment speed and the flexibility to scale software organizations, and delivered fragile architectures with blurred trust boundaries. AI amplifies this dynamic by increasing code volume, accelerating deployment and adding more pressure to deliver faster.

This chapter of Application Security will not be solved by bolting AI onto the same workflows of the past, but by focusing and hammering on the real world.