Today, we're announcing that Tenzai is expanding to AI applications, moving beyond the web and API surfaces our AI hacker is known for.
AI is everywhere, built into existing enterprise apps using the same authentication mechanisms, internal and external APIs, and client-side interfaces that carry web application security risks. Testing these applications well means treating the AI surface and the classic web surface as one connected target. Attackers already do.
For example, in one production enterprise application Tenzai tested, our hacker uncovered a critical vulnerability: a chain of findings that spans both the web and AI surfaces. This attack chain demonstrates how the Tenzai agent operates across that connected surface: how it maps the AI side, how it links AI primitives to web primitives, and how it validates that what it found is actually exploitable. The demo below walks through what that vulnerability looks like within Tenzai, where a Member-role user lands remote code execution (RCE) on a production Oracle server. It was achieved by stitching a web access-control failure and an IDOR into an AI tool-execution path.
Let's see it in action:
Where web and AI surfaces meet
The attack chain described above is a canonical example. A Member-role user, with no admin privilege and no can_override_ssh_commands flag, ends up with a shell on a production Oracle server. The path:
- Web layer: admin sessions BAC. A Member-role user can access /api/admin/sessions/* endpoints that should be admin-only. The response includes every active session (names, emails, IPs, session IDs) and, inside one session's activity history, a UUID belonging to an agent owned by the DBA team. Classic broken access control; the AI surface is nowhere in sight yet.
- AI surface: agent-card IDOR. Armed with the UUID, the attacker hits /api/agents/{uuid}/card. The base /api/agents/{uuid} returns 404, so auth is working there. The sub-endpoint /card returns 200 with the full agent profile: an SSH tool pointing at production hostnames, an Oracle DB tool, the agent's instructions. The attacker now knows this specific agent has tool authority that maps to production infrastructure.
- AI surface: /execute IDOR via LLM-mediated tool call. The attacker invokes /api/agents/{uuid}/execute with a natural-language instruction. The model decides to invoke the SSH tool with the attacker-controlled command. The tool runs under the agent's inherited oracle credentials. The response contains whoami: oracle, hostname: dbsvr01.
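The last two steps share one pattern: the base agent route enforces object-level authorization, but its sub-routes do not. A minimal sketch of the corresponding defensive idea follows, assuming a single ownership check that every agent sub-resource must pass before its handler runs. The names `User`, `Agent`, `authorize_agent_access`, and `handle_agent_request` are hypothetical and do not come from the tested application or from Tenzai.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class User:
    user_id: str
    role: str   # hypothetical roles: "admin" or "member"
    team: str


@dataclass(frozen=True)
class Agent:
    agent_id: str
    owner_team: str


class Forbidden(Exception):
    """Raised when the caller may not touch the requested agent."""


def authorize_agent_access(user: User, agent: Agent) -> None:
    # Assumed policy for illustration: admins and the owning team only.
    if user.role == "admin" or user.team == agent.owner_team:
        return
    raise Forbidden(f"user {user.user_id} may not access agent {agent.agent_id}")


def handle_agent_request(
    user: User,
    agent: Agent,
    sub_resource: str,
    handlers: Dict[str, Callable[[Agent], Any]],
) -> Any:
    # The check runs before dispatch, so /card and /execute cannot
    # skip the authorization that the base route already applies.
    authorize_agent_access(user, agent)
    return handlers[sub_resource](agent)
```

With the check placed in front of the dispatcher rather than inside individual handlers, adding a new sub-route cannot silently omit it.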
When the Tenzai agent finds multiple primitives that share an attacker identity, a target asset, and a timeline, the platform stitches them into a single attack-chain finding. That chain finding is not a paste of the underlying issues; it is the narrative of how the attacker uses the different vulnerabilities one after the other, with the captured conversation, the HTTP transcripts, and a validator's independent reproduction all attached as evidence.
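The grouping idea can be illustrated with a short sketch. This is not Tenzai's implementation, which the post does not describe. It assumes each finding carries an attacker identity, a target asset, and a timestamp, and it treats findings with the same attacker and asset as one chain when consecutive timestamps fall within a chosen gap. The `Finding` type, `stitch_chains`, and the `max_gap_seconds` threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    title: str
    attacker: str     # identity used when the primitive was exercised
    asset: str        # target asset the primitive touches
    timestamp: float  # seconds since the start of the run


def stitch_chains(
    findings: List[Finding], max_gap_seconds: float = 3600.0
) -> List[List[Finding]]:
    """Group findings into ordered chains of two or more steps."""
    groups: Dict[Tuple[str, str], List[Finding]] = defaultdict(list)
    for finding in findings:
        groups[(finding.attacker, finding.asset)].append(finding)

    chains: List[List[Finding]] = []
    for group in groups.values():
        group.sort(key=lambda f: f.timestamp)
        current = [group[0]]
        for finding in group[1:]:
            if finding.timestamp - current[-1].timestamp <= max_gap_seconds:
                current.append(finding)
            else:
                # Gap too large: close the current chain and start a new one.
                if len(current) > 1:
                    chains.append(current)
                current = [finding]
        if len(current) > 1:
            chains.append(current)
    return chains
```

A single isolated finding is deliberately not reported as a chain; only sequences of two or more related primitives are returned.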
The AI surface, in practice
Tenzai's AI hacker maps the target application as a set of actors, instructions, tools, credentials, guardrails, state transitions, and HTTP endpoints. While a web scanner will see an endpoint that returns a tool schema or agent identifier and move on, Tenzai's agent matches those details against how the target application uses the model to make decisions, call tools, inherit permissions, move data, or trigger business workflows. These matches turn a JSON response containing an agent schema into a lead.
The categories of AI surface the agent investigates include:
- agent metadata endpoints: identifiers, capabilities, configuration, workflow state, and peer-agent relationships
- tool inventory paths: disclosed tools, schemas, parameters, permissions, and execution constraints
- instruction and policy leakage: debug, template, trace, status, or orchestration endpoints that return prompts or policy verbatim
- tool credential storage and inheritance: where scoped credentials become available to model-driven actions
- guardrail configuration and scan endpoints: endpoints that expose or invoke input/output filtering
- agent-to-agent or orchestration paths: where one agent's output becomes another agent's instruction, context, or action trigger
- business-logic paths: where model output drives approvals, routing, data access, task execution, or external requests
Despite the AI framing, findings on these surfaces are classic security issues. A prompt injection may be the entry point, but the actual bug is typically excessive tool authority, missing object authorization, a credential inherited by the wrong workflow, or a guardrail that fails open. The AI angle changes the entry point and possible blast radius but not the underlying CWE.
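To make "fails open" concrete: a guardrail fails open when an error or timeout in the scan step lets the tool call proceed anyway. The sketch below shows the opposite, fail-closed behavior, under the assumption that the guardrail is a callable returning a boolean verdict. `GuardrailBlocked`, `run_tool_fail_closed`, and the `scan` and `tool` callables are hypothetical names used only for illustration.

```python
from typing import Any, Callable


class GuardrailBlocked(Exception):
    """Raised when the guardrail rejects the input or gives no verdict."""


def run_tool_fail_closed(
    scan: Callable[[str], bool],
    tool: Callable[[str], Any],
    tool_input: str,
) -> Any:
    try:
        verdict = scan(tool_input)
    except Exception as exc:
        # A crashed or unreachable guardrail counts as a rejection,
        # not as an implicit approval.
        raise GuardrailBlocked("guardrail unavailable") from exc
    if verdict is not True:
        raise GuardrailBlocked("guardrail rejected input")
    return tool(tool_input)
```

The tool runs only on an explicit `True` verdict; any other outcome, including an exception from the scanner, blocks execution.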
When a finding lands, Tenzai's report explains both the underlying security primitive (IDOR, BAC, excessive privilege) and the AI-mediated impact. Tenzai highlights what the attacker controls, what the victim model trusts, which tool paths are involved, and, most importantly, what business actions become reachable. Alongside the standard OWASP Top 10 and CWE references, the Tenzai platform tags the issue with AI-specific categories, for example OWASP LLM01 prompt injection or LLM06 excessive agency, and the new AI-specific CWEs that exist for these patterns.
What this changes
Tenzai's ability to test AI capabilities and web applications in the same engagement means that cross-domain findings surface.
This is built into how the agent operates:
- Attack-surface agnostic. From the agent's point of view, an AI platform is just another type of application. The same primitives — HTTP issuance, source-code reading, surface mapping, exploitation — apply across web apps, infrastructure-facing services, and AI agents in the same run.
- Cross-surface chaining. Primitives discovered on one surface feed exploits on another, inside the same agent, with no human-in-the-loop wiring. A permission flag in a JavaScript bundle leads to an internal endpoint that leaks an identifier that names a tool credential that is inherited by an LLM-invoked tool. The chain is the finding.
- Knowledge that compounds across runs. If the first agent iteration learns where agent configurations live, the next iteration does not start by rediscovering them. The agent's methodology adapts to the target application, which keeps lead generation focused.
As enterprise applications become increasingly AI-enabled, defenders must test the full attack surface, where vulnerabilities in AI systems, web applications, and APIs converge to create entirely new attack paths.