Tenzai Launches AI Application Testing, Chaining Vulnerabilities Across Web, API, and AI Surfaces

Arnon Trabelsi

The Tenzai AI hacker expands to AI apps. Testing these applications well means treating the AI surface and the classic web surface as one connected target, since the findings that matter are almost always chains.



Today, we're announcing that Tenzai is expanding to AI applications, moving beyond the web and API surfaces our AI hacker is known for.

AI is everywhere, built into existing enterprise apps using the same authentication mechanisms, internal and external APIs, and client-side interfaces that carry web application security risks. Testing these applications well means treating the AI surface and the classic web surface as one connected target. Attackers already do.

For example, in one production enterprise application Tenzai tested, our hacker uncovered a critical vulnerability: a chain of findings spanning both the web and AI surfaces. This attack chain demonstrates how the Tenzai agent operates across that connected surface: how it maps the AI side, how it links AI primitives to web primitives, and how it validates that what it found is actually exploitable. The demo below walks through what that vulnerability looks like within Tenzai, where a Member-role user lands RCE on a production Oracle server by stitching a web access control failure and an IDOR into an AI tool-execution path.

Let's see it in action:

Where web and AI surfaces meet

The attack chain described is a canonical example. A Member-role user, with no admin privilege and no can_override_ssh_commands flag, ends up with shell on a production Oracle server. The path:

Web layer: Admin sessions BAC. A Member-role user can access /api/admin/sessions/* endpoints that should be admin-only. The response includes every active session (names, emails, IPs, session IDs) and, inside one session's activity history, a UUID belonging to an agent owned by the DBA team. Classic broken access control; the AI surface is nowhere in sight yet.
AI surface: Agent-card IDOR. Armed with the UUID, the attacker hits /api/agents/{uuid}/card. The base /api/agents/{uuid} returns 404, so authorization is working there. The sub-endpoint /card returns 200 with the full agent profile: an SSH tool pointing at production hostnames, an Oracle DB tool, and the agent's instructions. The attacker now knows this specific agent has tool authority that maps to production infrastructure.
AI surface: /execute IDOR via LLM-mediated tool call. The attacker invokes /api/agents/{uuid}/execute with a natural-language instruction. The model decides to invoke the SSH tool with the attacker-controlled command. The tool runs under the agent's inherited oracle credentials. The response contains whoami: oracle, hostname: dbsvr01.
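The agent-card step above (base resource denied, sub-route allowed) is a pattern that can be checked mechanically. The following is a minimal sketch, assuming status codes have already been recorded for each path while authenticated as a low-privilege user. The function name, the input shape, and the example paths are hypothetical and are not taken from Tenzai's product.

```python
# Statuses that indicate the low-privilege user was refused.
DENIED = {401, 403, 404}


def find_inconsistent_subroutes(observed: dict[str, int]) -> list[tuple[str, str]]:
    """Return (base, sub_route) pairs where the base is denied but a sub-route succeeds.

    `observed` maps request paths to the HTTP status seen by a low-privilege user.
    """
    suspicious = []
    for path, status in observed.items():
        if not 200 <= status < 300:
            continue
        # Walk up the path: /a/b/c -> /a/b -> /a
        parent = path.rsplit("/", 1)[0]
        while parent:
            if observed.get(parent) in DENIED:
                suspicious.append((parent, path))
                break
            parent = parent.rsplit("/", 1)[0]
    return sorted(suspicious)
```

A pair returned by this check is only a lead: it says the authorization decision differs between a resource and its children, which is worth a manual look.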

When the Tenzai agent finds multiple primitives that share an attacker identity, a target asset, and a timeline, the platform stitches them into a single attack-chain finding. That chain finding is not a paste of the underlying issues. It is the narrative of how the attacker uses the different vulnerabilities one after the other, with the captured conversation, the HTTP transcripts, and a validator's independent reproduction all attached as evidence.
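One way to picture the stitching criteria is as a grouping problem. The sketch below is illustrative only and assumes a simple rule that the source does not spell out: primitives are grouped by identity and asset, ordered by time, and split whenever the gap between consecutive primitives exceeds a window. The `Primitive` type, the `stitch_chains` function, and the 30-minute window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby


@dataclass(frozen=True)
class Primitive:
    name: str
    identity: str        # attacker identity, e.g. the test account used
    asset: str           # target asset the primitive touches
    seen_at: datetime    # when the primitive was observed


def stitch_chains(primitives, max_gap=timedelta(minutes=30)):
    """Group by (identity, asset), order by time, split on gaps larger than max_gap.

    Only groups with two or more primitives are returned as chains.
    """
    def key(p):
        return (p.identity, p.asset)

    ordered = sorted(primitives, key=lambda p: (key(p), p.seen_at))
    chains = []
    for _, group in groupby(ordered, key=key):
        current = []
        for p in group:
            if current and p.seen_at - current[-1].seen_at > max_gap:
                if len(current) > 1:
                    chains.append(current)
                current = []
            current.append(p)
        if len(current) > 1:
            chains.append(current)
    return chains
```

A real implementation would also need to attach evidence (transcripts, reproduction results) to each chain; this sketch covers only the grouping step.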

The AI surface, in practice

Tenzai's AI hacker maps the target application as a set of actors, instructions, tools, credentials, guardrails, state transitions, and HTTP endpoints. While a web scanner will see an endpoint that returns a tool schema or agent identifier and move on, Tenzai's agent matches those details against how the target application uses the model to make decisions, call tools, inherit permissions, move data, or trigger business workflows. These matches turn a JSON response containing an agent schema into a lead.

The categories of AI surface the agent investigates include:

agent metadata endpoints — identifiers, capabilities, configuration, workflow state, and peer-agent relationships
tool inventory paths — disclosed tools, schemas, parameters, permissions, and execution constraints
instruction and policy leakage — debug, template, trace, status, or orchestration endpoints that return prompts or policy verbatim
tool credential storage and inheritance — where scoped credentials become available to model-driven actions
guardrail configuration and scan endpoints — endpoints that expose or invoke input/output filtering
agent-to-agent or orchestration paths — where one agent's output becomes another agent's instruction, context, or action trigger
business-logic paths — where model output drives approvals, routing, data access, task execution, or external requests
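As an illustration of how a JSON response might be turned into a lead, the sketch below scans a response body for key names that hint at some of the categories above. The key names in `SIGNALS` are assumptions chosen for the example; real applications use their own schemas, and this is not a description of how Tenzai's agent works internally.

```python
import json

# Hypothetical key names per category; real targets will differ.
SIGNALS = {
    "tool inventory": {"tools", "tool_schema", "parameters"},
    "instruction leakage": {"instructions", "system_prompt", "policy"},
    "credential inheritance": {"credentials", "api_key", "ssh_key"},
    "guardrail configuration": {"guardrails", "filters"},
}


def collect_keys(node, found=None):
    """Recursively collect every dict key (lowercased) in a parsed JSON document."""
    found = set() if found is None else found
    if isinstance(node, dict):
        for k, v in node.items():
            found.add(k.lower())
            collect_keys(v, found)
    elif isinstance(node, list):
        for item in node:
            collect_keys(item, found)
    return found


def tag_leads(body: str) -> list[str]:
    """Return the sorted category names whose signal keys appear in the JSON body."""
    keys = collect_keys(json.loads(body))
    return sorted(cat for cat, names in SIGNALS.items() if keys & names)
```

Keyword matching like this only produces candidates. Deciding whether a candidate matters still depends on how the application uses the model, which is the part the text above attributes to the agent.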

Despite the AI framing, findings on these surfaces are classic security issues. A prompt injection may be the entry point, but the actual bug is typically excessive tool authority, missing object authorization, a credential inherited by the wrong workflow, or a guardrail that fails open. The AI angle changes the entry point and the possible blast radius, but not the underlying CWE.
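To make "fails open" concrete: a guardrail fails open when an error or timeout in the scanner lets the action proceed anyway. The sketch below shows the opposite, fail-closed behavior. The `guarded` helper and its `scan` and `action` callables are hypothetical names used only for illustration.

```python
def guarded(scan, action, payload):
    """Run action(payload) only if scan(payload) explicitly returns True.

    Any scanner exception, or any result other than True, blocks the action,
    so a broken or unreachable scanner cannot silently allow it (fail closed).
    """
    try:
        allowed = scan(payload) is True
    except Exception:
        allowed = False
    if not allowed:
        raise PermissionError("blocked by guardrail")
    return action(payload)
```

The fail-open variant would typically catch the scanner error and continue to `action`, which is the behavior a tester looks for by forcing the scanner to error or time out.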

When a finding lands, Tenzai's report explains both the underlying security primitive (IDOR, BAC, excessive privilege) and the AI-mediated impact. Tenzai highlights what the attacker controls, what the victim model trusts, which tool paths are involved, and, most importantly, what business actions become reachable. Alongside the standard OWASP Top 10 and CWE references, the Tenzai platform tags the issue with AI-specific categories, for example OWASP LLM01 prompt injection or LLM06 excessive agency, and the new AI-specific CWEs that exist for these patterns.
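A finding tagged this way can be pictured as a small record carrying both kinds of label. The sketch below is an assumption about shape only. The CWE mapping uses common choices for these primitives (CWE-639 for IDOR, CWE-284 for broken access control, CWE-250 for excessive privilege), and the `Finding` type is hypothetical and does not reflect Tenzai's report format.

```python
from dataclasses import dataclass, field

# Common CWE choices for the primitives named above; other mappings are defensible.
CWE_BY_PRIMITIVE = {
    "IDOR": "CWE-639",                 # Authorization Bypass Through User-Controlled Key
    "BAC": "CWE-284",                  # Improper Access Control
    "excessive privilege": "CWE-250",  # Execution with Unnecessary Privileges
}


@dataclass
class Finding:
    title: str
    primitive: str
    ai_categories: list[str] = field(default_factory=list)  # e.g. ["LLM06"]

    def tags(self) -> list[str]:
        """Return the CWE tag for the primitive followed by any AI-specific tags."""
        cwe = CWE_BY_PRIMITIVE.get(self.primitive, "CWE-unmapped")
        return [cwe] + list(self.ai_categories)
```

For instance, the /execute step from the chain described earlier might be recorded as `Finding("execute IDOR", "IDOR", ["LLM06"])`, pairing the classic primitive with the AI-mediated impact.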

What this changes

Because Tenzai tests AI capabilities and web applications in the same engagement, cross-domain findings surface that would be missed if each were tested separately.

This is built into how the agent operates:

Attack-surface agnostic. From the agent's point of view, an AI platform is just another type of application. The same primitives — HTTP issuance, source-code reading, surface mapping, exploitation — apply across web apps, infrastructure-facing services, and AI agents in the same run.
Cross-surface chaining. Primitives discovered on one surface feed exploits on another, inside the same agent, with no human-in-the-loop wiring. A permission flag in a JavaScript bundle leads to an internal endpoint that leaks an identifier that names a tool credential that is inherited by an LLM-invoked tool. The chain is the finding.
Knowledge that compounds across runs. If the first agent iteration learns where agent configurations live, the next iteration does not start by rediscovering them. The agent's methodology adapts to the target application, which keeps lead generation focused.

As enterprise applications become increasingly AI-enabled, defenders must test the full attack surface, where vulnerabilities in AI systems, web applications, and APIs converge to create entirely new attack paths.