# TestPilot

Test generation using large language models.
Note: This version of TestPilot has been archived. Please refer to the new version at https://github.com/neu-se/testpilot2.
TestPilot is a tool for automatically generating unit tests for npm packages written in JavaScript/TypeScript using a large language model (LLM).

Note that TestPilot represents an early exploration in the use of LLMs for test generation, and has been made available in open source as a basis for research and exploration. For day-to-day use, the test generation features in Copilot Chat are likely to yield better results.
TestPilot generates tests for a given function `f` by prompting the LLM with a skeleton of a test for `f`, including information about `f` embedded in code comments, such as its signature, the body of `f`, and example usages of `f` automatically mined from project documentation. The model's response is then parsed and translated into a runnable unit test. Optionally, the test is run, and if it fails the model is prompted again with additional information about the failed test, giving it a chance to refine the test.
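As a rough illustration, the sketch below shows what such a prompt and its completion might look like for a hypothetical function `isEven` exported by a made-up package `mathutils`; the exact prompt format TestPilot uses is described in the paper.

```js
// Information about the function under test is embedded in comments, e.g.:
//   signature: isEven(n)
//   body: function isEven(n) { return n % 2 === 0; }
//   usage mined from the docs: mathutils.isEven(4) // => true
const assert = require('assert');
const mathutils = require('mathutils'); // hypothetical package under test

describe('test mathutils', function () {
  it('test isEven', function () {
    // the assertions below stand in for the kind of completion the LLM
    // produces, which TestPilot then parses into a runnable mocha test
    assert.strictEqual(mathutils.isEven(4), true);
    assert.strictEqual(mathutils.isEven(3), false);
  });
});
```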
Unlike other systems for LLM-based test generation, TestPilot does not require any additional training or reinforcement learning, and no examples of functions and their associated tests are needed.
A research paper describing TestPilot in detail is available on arXiv and IEEE Xplore.
In general, to be able to run TestPilot you need access to a Codex-style LLM with a completion API. Set the `TESTPILOT_LLM_API_ENDPOINT` environment variable to the URL of the LLM API endpoint you want to use, and `TESTPILOT_LLM_AUTH_HEADERS` to a JSON object containing the headers you need to authenticate with the API.

Typical values for these variables might be:

```sh
TESTPILOT_LLM_API_ENDPOINT='https://api.openai.com/v1/engines/code-cushman-001/completions'
TESTPILOT_LLM_AUTH_HEADERS='{"Authorization": "Bearer <your API key>", "OpenAI-Organization": "<your organization ID>"}'
```
Note, however, that you can run TestPilot in reproduction mode without access to the LLM API, where model responses are taken from the output of a previous run; see below for details.
You can install TestPilot from a pre-built package or from source.
TestPilot is available as a pre-built npm package, though it is not currently published to the npm registry. You can download a tarball from the repository and install it in the usual way. Note that this distribution only contains the core part of TestPilot, not the benchmarking harness.
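For example, assuming you have downloaded a release tarball to the current directory (the exact file name varies by release):

```sh
npm install ./testpilot-<version>.tgz
```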
The `src/` directory contains the source code for TestPilot, which is written in TypeScript and gets compiled into the `dist/` directory. Tests are in `test/`; the `benchmark/` directory contains a benchmarking harness for running TestPilot on multiple npm packages; and `ql/` contains the CodeQL queries used to analyze the results.
In the root directory of a checkout of this repository, run `npm build` to install dependencies and build the package.
You can also use `npm run build:watch` to automatically build any time you make changes to the code. Note, however, that this will not automatically install dependencies, and it also will not build the benchmarking harness.
Use `npm run test` to run the tests. For convenience, this will also install dependencies and run a build.
If you install TestPilot from source, you can use the benchmarking harness to run TestPilot on multiple packages and analyze the results. This is not currently available if you install TestPilot from a pre-built package.

Basic usage is as follows:

```sh
node benchmark/run.js --outputDir <report_dir> --package <package_dir>
```

This generates tests for all functions exported by the package in `<package_dir>`, validates them, and writes the results to `<report_dir>`.
Note that this assumes that package dependencies are installed and any build steps have been run (e.g., using `npm i` and `npm run build`). TestPilot also relies on `mocha`, so if the package under test does not already depend on it, you must install it separately, for example using the command `npm i --no-save mocha`.
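Putting these steps together, an end-to-end run on a hypothetical package checked out to `packages/example-pkg` might look like this (all paths are illustrative):

```sh
# prepare the package under test
cd packages/example-pkg
npm i && npm run build       # install dependencies and run any build steps
npm i --no-save mocha        # only needed if the package does not depend on mocha
cd ../..

# generate and validate tests, writing the report to ./report
node benchmark/run.js --outputDir ./report --package packages/example-pkg
```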
The `run-experiment.yml` workflow runs an experiment on GitHub Actions, producing the final report as an artifact you can download. The `results-all` artifact contains the results for all packages, while the other artifacts contain the individual results for each package.
The results of TestPilot are non-deterministic, so even if you run it on the same package on the same machine multiple times, you will get different results. However, the benchmarking harness records enough data to be able to replay a benchmark run in many cases.
To do this, use the `--api` and `--responses` options to reuse the API listings and responses from a previous run:

```sh
node benchmark/run.js --outputDir <report_dir> --package <package_dir> --api <api.json> --responses <prompts.json>
```
Note that by default, replay will fail if any of the prompts are not found in the responses file. This typically happens if TestPilot is refining failing tests, since in this case the prompt to the model depends on the exact failure message, which can be system-specific (e.g., containing local file-system paths) or depend on the Node.js version or other factors.
To work around these limitations, you can pass the `--strictResponses false` flag to handle missing prompts by treating them as getting no response from the model. This will not, in general, produce the same results as the initial run, but it suffices in many cases.
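For example, assuming a previous run wrote `api.json` and `prompts.json` to `./report` (the paths here are illustrative), a tolerant replay might look like this:

```sh
node benchmark/run.js --outputDir ./replay-report --package <package_dir> \
  --api ./report/api.json --responses ./report/prompts.json \
  --strictResponses false
```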
The CodeQL queries in `ql/queries` can be used to analyze the results of running an experiment. See `ql/CodeQL.md` for instructions on how to set up CodeQL and run the queries.
This project is licensed under the terms of the MIT open source license. Please refer to the MIT license for the full terms.
- Max Schaefer (@max-schaefer)
- Frank Tip (@franktip)
- Sarah Nadi (@snadi)
TestPilot is a research prototype and is not officially supported. However, if you have questions or feedback, please file an issue and we will do our best to respond.
We thank Aryaz Eghbali (@aryaze) for his work on the initial version of TestPilot.