Minimal x86-64 emulator for WebAssembly - run ELF binaries in your browser
xarantolus/ax
This is a minimal x86-64 emulator for WebAssembly. It executes real machine code and can be used to emulate x86-64 user-space programs in the browser.
Currently implemented are 315 opcodes for 65 mnemonics (37 complete, 28 partial), which is only a very small subset of the more than 981 available mnemonics with at least 3684 variants (Source). More detailed stats can be found via the `stats.py` script.
Note that not all implemented instructions work exactly the same way as on real hardware, but the goal is to be as close as possible while staying reasonable. Notable exceptions are instructions that interact with the operating system (interrupts, syscalls) and the omission of all flags that are not used by jump instructions.
In addition to the emulator itself, this repository contains scripts that should be interesting for anyone who wants to write an x86-64 emulator. The most important one, `t.py`, automatically generates test cases for an instruction by trying out different inputs and thus finding interesting inputs, outputs and flag combinations. See the section on automatically generating test cases for more information.
You can try out the emulator right now by visiting the website, selecting a suitable ELF binary and clicking "Run". The emulator will then execute the binary and show the output. Note that support for ELF binaries is currently limited/buggy (there are some problems getting libc to work). You can, however, use binaries from the `examples/programs` directory. The source code for this site is in the `examples/web` directory.
- The demo site uses `ax` to run ELF binaries in the browser
- MemeAssembly is a meme programming language that compiles to x86-64 instructions. My MemeAssembly Playground uses `ax` to run this language right in the browser, emulating some syscalls like `read` and `write` to make the programs work.
- Maybe you? If you use `ax` in a project, please let me know and I'll add it here! :)
The emulator is compiled to WebAssembly and can be used as a JavaScript Module. This works in modern browsers.
The recommended approach is to just install the NPM module:
```shell
npm i ax-x86
```
Before using any functions, you have to make sure the WASM binary has been downloaded using the default `init` function:
```js
import init, { version } from 'ax-x86';

// This will download the WASM binary and initialize the module
await init();

// Now you can use the module
console.log("ax version:", version());
```
This readme contains two examples that show typical use cases. For a more detailed API reference, see the documentation.
Two warnings/pitfalls when using this emulator:
- Make sure that all numbers are passed as `bigint`, which can be done using an `n` suffix. `0x1000n` is a `bigint` literal (which is what we want), `0x1000` is a `number` (which will not work)
- When using frontend frameworks, it is recommended to await the `init` function before your components are mounted, e.g. in a `setup` function. This will make sure the WASM binary is downloaded before the component is rendered. You can look at this Vue component for an example.
The following is a simple example that executes a few instructions and logs the calculated result.
```js
import init, { Axecutor, Mnemonic, Register, version } from 'ax-x86';

await init();

// Define bytes for x86 instructions:
let code = new Uint8Array([
  // mov rax, 0xff
  0x48, 0xc7, 0xc0, 0xff, 0, 0, 0,
  // mov rbx, 0xf
  0x48, 0xc7, 0xc3, 0xf, 0, 0, 0,
  // and rax, rbx
  0x48, 0x21, 0xd8
]);

// Create a new emulator instance
// You could also create an instance from an ELF/Linux binary using `Axecutor.from_binary` instead
let ax = new Axecutor(
  code,
  // Code start address, this is where the first instruction byte is located
  0x1000n,
  // Entrypoint address, this is where execution starts. It is usually, but not always, the same as the code start address
  0x1000n
);

console.log("Initial state:", ax.toString());

// One could set up a stack of size 0x1000 here, but it's not necessary for this example
// This automatically writes the stack pointer to RSP
// let stack_addr = ax.init_stack(0x1000n);

// This function will be called before any "Mov" instruction is executed. There's also a hook_after_mnemonic function.
// It can be used to e.g. implement custom syscall handlers
ax.hook_before_mnemonic(Mnemonic.Mov, (instance) => {
  console.log("Executing a mov instruction");

  // Here you can e.g. modify registers, memory etc.
  instance.reg_write_64(Register.RCX, 0xabn);

  // this function *MUST* return one of
  // - instance.commit(): keep the changes we made in this handler and continue execution
  // - instance.stop(): keep changes, stop execution and return from the ax.execute() function
  // - instance.unchanged(): don't keep changes, continue execution

  // this will reset RCX to its previous value
  return instance.unchanged();
});

// Execute all instructions
await ax.execute();

// Log the final state of the emulator
console.log("Final state:", ax.toString());

// Outputs "15"
console.log("RAX:", ax.reg_read_64(Register.RAX));
```
The emulator will just stop when reaching the end of the code.
The emulator also has some convenience functions for handling Linux/ELF binaries. The demo site uses these convenience functions to emulate programs.
One thing to note is that binaries usually exit via the `exit` syscall, which is not implemented by default (same as any other syscall). You can either implement your own syscall handler that handles the `exit` syscall, or you can use the `handle_syscalls` method to register predefined handlers for a small set of syscalls.
```ts
let ax = Axecutor.from_binary(/* elf file content as Uint8Array */);

// Set up the stack according to the System V ABI.
// This sets up memory locations for command-line arguments and environment variables
// and writes the stack pointer to RSP
ax.init_stack_program_start(
  8n * 1024n, // Stack size
  ["/bin/my_binary", "arg1", "arg2"], // argv
  ["COLORTERM=truecolor", "TERM=xterm-256color"] // environment variables
);

// Use a predefined handler for the `exit` syscall, which just stops execution
ax.handle_syscalls(Syscall.Exit);

// Register a syscall handler for the `write` syscall (1)
let syscallHandler = async function (ax: Axecutor) {
  let syscall_num = ax.reg_read_64(Register.RAX);
  let rdi = ax.reg_read_64(Register.RDI);
  let rsi = ax.reg_read_64(Register.RSI);
  let rdx = ax.reg_read_64(Register.RDX);

  console.log(`Syscall ${syscall_num} with args ${rdi}, ${rsi}, ${rdx}`);

  switch (syscall_num) {
    case 1n: {
      // WRITE syscall MUST write to stdout or stderr (stdin supported for compatibility)
      if (rdi != 0n && rdi != 1n && rdi != 2n) {
        throw new Error(`WRITE syscall: cannot write non-std{out,err} (!= 1,2) fds, but tried ${rdi}`);
      }

      // Read data we should write from memory
      let result_buf = ax.mem_read_bytes(rsi, rdx);

      // Decode to string
      let result_str = new TextDecoder().decode(result_buf);

      // Do something with the string
      console.log("WRITE syscall:", result_str);

      // Return the number of bytes that were written in RAX,
      // that way the program knows it worked
      ax.reg_write_64(Register.RAX, rdx);

      return ax.commit();
    }
  }

  throw `Unhandled syscall ${syscall_num}`;
}

// Register the write syscall handler
ax.hook_before_mnemonic(Mnemonic.Syscall, syscallHandler);

// Log function calls
ax.hook_after_mnemonic(Mnemonic.Call, function (ax: Axecutor) {
  // After a call instruction, RIP points to the first instruction of the called function
  let func_addr = ax.reg_read_64(Register.RIP);

  // Resolve a name for that function - it might be undefined if no symbol is available
  // Make sure to compile your binary with -g to include symbols
  let name = ax.resolve_symbol(func_addr);

  console.log("Calling function " + (name || "<unknown>") + " at " + func_addr.toString(16) + "\n");

  return ax.unchanged();
});

// Execute the program
await ax.execute();
```
If your binaries need more system calls, you can look at the example site implementation, see the docs for pre-existing handler functions or get more details for your own implementation here.
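When a program needs several syscalls, one way to structure your own handler is a lookup table keyed by syscall number. The sketch below is self-contained and deliberately does not use the `ax-x86` API: `FakeMachine`, `handlers` and `dispatchSyscall` are hypothetical names, and the register and memory model is a stand-in for illustration only. The syscall numbers (`write` = 1, `exit` = 60) are the Linux x86-64 ones.

```javascript
// Linux x86-64 syscall numbers.
const SYS_WRITE = 1n;
const SYS_EXIT = 60n;

// Hypothetical stand-in for emulator state; NOT the ax-x86 API.
class FakeMachine {
  constructor(memory) {
    this.regs = { rax: 0n, rdi: 0n, rsi: 0n, rdx: 0n };
    this.memory = memory; // Uint8Array, address 0 maps to index 0
    this.stdout = "";
    this.exitCode = null;
  }
}

// One handler per syscall number; each returns the value to place in RAX.
const handlers = new Map([
  [SYS_WRITE, (m) => {
    const start = Number(m.regs.rsi);
    const len = Number(m.regs.rdx);
    m.stdout += new TextDecoder().decode(m.memory.subarray(start, start + len));
    return m.regs.rdx; // number of bytes written
  }],
  [SYS_EXIT, (m) => {
    m.exitCode = Number(m.regs.rdi);
    return 0n;
  }],
]);

// Look up the handler for the number in RAX and store its result back in RAX.
function dispatchSyscall(m) {
  const handler = handlers.get(m.regs.rax);
  if (!handler) throw new Error(`Unhandled syscall ${m.regs.rax}`);
  m.regs.rax = handler(m);
}

// Usage: emulate write(1, "hi\n", 3)
const machine = new FakeMachine(new TextEncoder().encode("hi\n"));
Object.assign(machine.regs, { rax: SYS_WRITE, rdi: 1n, rsi: 0n, rdx: 3n });
dispatchSyscall(machine);
console.log(JSON.stringify(machine.stdout), machine.regs.rax); // "hi\n" 3n
```

In a real handler, the body of each table entry would read registers and memory through the emulator instance, as the `write` handler in the example above does.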
If you want to contribute to this project, that's great! A good way to get involved is using the emulator and finding things that could be improved :) You could e.g. get started by adding support for a new instruction mnemonic. There's a tutorial on how to do that below. If you run into problems setting up the development tools or have any other questions, feel free to open an issue.
The `Makefile` has a lot of targets that can be useful for development. The `test` target runs tests both on your native machine and in WASM via NodeJS, making sure implemented instructions behave as expected in the target environment. The `test-js` target makes sure the public-facing JS API works as expected.
For any changes you want to make, you can branch off from the `develop` branch. Please format the code using `make fmt` before submitting a pull request and make sure that `make precommit` passes.
If you want to work on something, I recommend having two terminals open: one for automatically running tests on changes (`make watch-tests`) and one for automatically rebuilding the module, serving the web server and rebuilding the example programs (`make watch`). This configuration is already included in the `tasks.json` file; VSCode should offer to run them automatically.
If you want to develop in a Docker container, there's a Dev Container configuration provided in the repository. I would however recommend setting up the development tools on your native machine:
- Make sure you have installed Rust/Cargo, `wasm-pack`, Node.js, NPM, Python, PIP, Make, GCC and the GNU Assembler
  - You can optionally install mold to speed up link times (mostly for tests); the Makefile will automatically use it if it's installed
- You should now be able to build the WebAssembly module with `make` (this will also build a native binary `ax`)
- You can run `make dependencies` to install `cargo-watch`, `cargo-tarpaulin` (for generating test coverage info files) and python script dependencies
- Try out running `make test` or `make watch-tests` to run tests
- Run `make watch-debug` in one terminal to rebuild the WebAssembly module on changes, then run `make web` in another terminal to start the development server. You can also just run `make watch`, which runs both in the same terminal
- Open the local example site and make changes! (link should be in the `make web`/`make watch` output)
The `generate.py` script is used for generating instruction implementation stubs. You can e.g. run `python3 generate.py push` to generate a file for all instruction mnemonics that start with `push`; if you only want more exact matches, use `push_` as argument. Note that you must have run a build for the WebAssembly package, as otherwise the script won't be able to find the files from the `iced-x86` crate that are used for generating the stubs.
Afterwards, run `make switch` to regenerate the instruction mnemonic switch statement (in `src/instructions/generated.rs`). Now your new stub functions are reachable.
Next, it is recommended to automatically generate test cases, then implement the instruction. You can also add new integration tests that use the instruction; this is especially important for instructions that operate on flags.
The repository comes with scripts for generating test cases for x86-64 instructions. They basically try out instructions on your real x86-64 CPU and extract the processor state change.
The test cases are generated by running e.g. `python3 t.py add al, [rbx]`, which tries around 6000 different inputs for the instruction (the extreme mode with `-e` tries around 100k inputs).
The generated test cases are deduplicated, resulting in only one test case per unique combination of flags that are set and cleared. Note that not necessarily all combinations will be discovered.
When generating tests, make sure that only the flags that are defined in the manual are tested for, e.g. for `imul`, where only `CF` and `OF` are defined, you would pass `-f CF,OF` to the script. If an instruction needs to be tested with different flag values (e.g. `adc`), you can pass e.g. `-s cf` to permute one or multiple flags.
For some instructions it makes more sense to go by result instead of flags; you can do this by passing `-r` to the script.
For instructions with implicit arguments (e.g. `imul rax` also modifies `rdx`), you can pass `-i rdx` to the script to also test the implicit arguments (any comma-separated list of operands should work).
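The deduplication step can be pictured as keeping the first input found for each distinct set of flags. The following is a rough sketch of that idea, not the actual logic in `t.py`; the `rawResults` shape and the `dedupeByFlags` function are assumptions made for illustration.

```javascript
// Hypothetical raw results for `add al, [rbx]`: inputs tried and the flags set afterwards.
const rawResults = [
  { al: 0x08, mem: 0xff, flags: ["CF"] },
  { al: 0x10, mem: 0xff, flags: ["CF", "PF"] },
  { al: 0x00, mem: 0x00, flags: ["PF", "ZF"] },
  { al: 0x20, mem: 0xff, flags: ["CF"] }, // same flag combination as the first entry
];

// Keep one result per unique flag combination (the first one encountered).
function dedupeByFlags(results) {
  const seen = new Map();
  for (const r of results) {
    // Sort so that ["PF", "CF"] and ["CF", "PF"] produce the same key.
    const key = [...r.flags].sort().join("|");
    if (!seen.has(key)) seen.set(key, r);
  }
  return [...seen.values()];
}

const unique = dedupeByFlags(rawResults);
console.log(unique.length); // 3
```

Because only the flag set is used as the key, two inputs with different results but identical flags collapse into one test case, which is why the text above mentions going by result instead for some instructions.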
Here is one of 19 test cases that was automatically discovered for `add al, [rbx]` (without `-e`):
```rust
// ax_test is a macro that sets up the emulator, then runs setup and a post-execution assertion function
// add al, byte ptr [rbx]
ax_test![add_al_byte_ptr_rbx_cf;
    // The encoded instruction bytes:
    0x2, 0x3;
    // A setup function that is called before the instruction is executed:
    |a: &mut Axecutor| {
        write_reg_value!(b; a; AL; 0x8);
        // This sets up a memory area with a size of one byte, containing 0xff
        init_mem_value!(b; a; 0x1000; 0xff); // -1
        // RBX points to that memory area
        write_reg_value!(q; a; RBX; 0x1000);
    };
    // This function is called after the instruction ran and checks the result:
    |a: Axecutor| {
        assert_reg_value!(b; a; AL; 0x7);
        // Also make sure the source operand is unchanged
        assert_reg_value!(q; a; RBX; 0x1000);
        assert_mem_value!(b; a; 0x1000; 0xff);
    };
    // On the left side of `;` are the flags that must be set after the instruction ran,
    // on the right are flags that must not be set
    (FLAG_CF; FLAG_PF | FLAG_ZF | FLAG_SF | FLAG_OF)
];
```
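To see why this test case expects `AL = 0x7` with only `CF` set, the 8-bit addition can be recomputed by hand. The sketch below models an 8-bit `add` and the five status flags this README lists as supported; `add8` is a hypothetical function written for this explanation and is not how the emulator implements the instruction.

```javascript
// 8-bit ADD with the CF, PF, ZF, SF and OF status flags.
function add8(a, b) {
  const sum = a + b;
  const result = sum & 0xff;

  // PF is set when the low byte of the result has an even number of 1 bits.
  let ones = 0;
  for (let i = 0; i < 8; i++) ones += (result >> i) & 1;

  return {
    result,
    CF: sum > 0xff, // unsigned carry out of bit 7
    PF: ones % 2 === 0,
    ZF: result === 0,
    SF: (result & 0x80) !== 0,
    // Signed overflow: both operands share a sign that differs from the result's sign.
    OF: ((a ^ result) & (b ^ result) & 0x80) !== 0,
  };
}

// Same inputs as the test case: AL = 0x08, memory byte = 0xff (-1 as a signed byte).
const r = add8(0x08, 0xff);
console.log(r.result.toString(16), r.CF, r.PF, r.ZF, r.SF, r.OF); // 7 true false false false false
```

`0x08 + 0xff` is `0x107`, so the low byte is `0x07` and the carry out of bit 7 sets `CF`. Read as signed values this is `8 + (-1) = 7`, which does not overflow, so `OF` stays clear.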
The test case generation script `t.py` supports register operands (both general purpose and SSE 128-bit), as well as a subset of memory and immediate operands. It requires that the GNU Assembler `as` and `gcc` are installed and must be run on x86-64 Linux. It places thousands of generated assembly files and binaries in `/dev/shm/ax_*`, so in case you run out of RAM, that is the place to check.
Another script for testing jumps (`j.py`) is also available, but it's not as automated. Some other convenience copy-paste texts can be generated with `a.py`: running e.g. `python3 a.py add al, [rbx]` and then selecting `u` gives you a code snippet for a JavaScript `Uint8Array` containing the bytes of the instruction.
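For illustration, turning instruction bytes into such a snippet could look roughly like this. The formatter below is a hypothetical sketch; the exact text that `a.py` prints may differ.

```javascript
// Hypothetical formatter: render bytes as JavaScript source for a Uint8Array.
function toUint8ArrayLiteral(bytes) {
  const items = Array.from(bytes, (b) => "0x" + b.toString(16).padStart(2, "0"));
  return `new Uint8Array([${items.join(", ")}])`;
}

// add al, [rbx] encodes as the two bytes 0x02 0x03 (the bytes used in the test case above).
console.log(toUint8ArrayLiteral([0x02, 0x03])); // new Uint8Array([0x02, 0x03])
```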
If you want to adjust `t.py` for testing your own emulator, you should adjust the `__str__` method of the `TestCase` class to generate different syntax with the same information.
Here are some useful links for more information about x86-64/AMD64, the System V ABI and ELF files:
- Intel x64 Manuals
- AMD64 Developer Guides
- System V ABI (direct link)
- Linux ELF Specification, OSDev ELF with info on loading and relocations
- Man pages: `man 8 ld.so`
Here are some limitations that could be inspiration for future features:
- Only the Sign, Carry, Overflow, Zero and Parity status flags are supported
- Most instructions aren't implemented, especially
  - Anything I found too legacy
  - Many instructions
- Syscalls and interrupts are not implemented to spec
  - If you have registered hooks (using `hook_before_mnemonic` or `hook_after_mnemonic`), they are essentially a no-op with your handler executing
  - If no hooks are registered and a syscall/interrupt is executed, an exception is thrown
- The memory implementation is quite weird and needs an overhaul
  - Access restrictions (partially implemented), page management (maybe better to leave to the user) etc. are missing
- ELF file parsing is currently really basic
  - Binaries with libc don't work due to relocations and more not being implemented
  - Basically only very basic binaries work
- Segments are not really implemented to spec, but kind of work OK
See issue 1 for some ideas for future features. Also feel free to open an issue if you have an idea :)
AGPL v3