Minimal x86-64 emulator for WebAssembly - run ELF binaries in your browser


xarantolus/ax


This is a minimal x86-64 emulator for WebAssembly. It executes real machine code and can be used to emulate x86-64 user-space programs in the browser.

Currently implemented are 315 opcodes for 65 mnemonics (37 complete, 28 partial), which is only a very small subset of the more than 981 available mnemonics with at least 3684 variants (Source). More detailed stats can be found via the stats.py script.

Note that not all implemented instructions work exactly the same way as on real hardware, but the goal is to be as close as possible while staying reasonable. Notable exceptions are instructions that interact with the operating system (interrupts, syscalls) and the omission of all flags that are not used by jump instructions.
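To illustrate why a handful of status flags is enough for jumps, here is a small self-contained sketch in plain JavaScript. It is not taken from the emulator's source; the table name and the flag object shape are assumptions made for this example. The conditions themselves follow the standard x86 definitions of the listed conditional jumps.

```javascript
// Conditions of some x86 conditional jumps, expressed over the status flags.
// JUMP_CONDITIONS and the { CF, ZF, SF, OF, PF } shape are made up for this sketch.
const JUMP_CONDITIONS = {
    je:  (f) => f.ZF,                    // equal
    jne: (f) => !f.ZF,                   // not equal
    jb:  (f) => f.CF,                    // below (unsigned)
    jbe: (f) => f.CF || f.ZF,            // below or equal (unsigned)
    jl:  (f) => f.SF !== f.OF,           // less (signed)
    jle: (f) => f.ZF || f.SF !== f.OF,   // less or equal (signed)
    js:  (f) => f.SF,                    // sign
    jo:  (f) => f.OF,                    // overflow
    jp:  (f) => f.PF,                    // parity
};

// Flags after an 8-bit `cmp` of 1 and 2 (computes 1 - 2 = 0xff):
const flags = { CF: true, ZF: false, SF: true, OF: false, PF: true };

console.log(JUMP_CONDITIONS.jl(flags)); // true
console.log(JUMP_CONDITIONS.jb(flags)); // true
console.log(JUMP_CONDITIONS.je(flags)); // false
```

None of these conditions reads the auxiliary carry flag, which is why leaving it out does not affect control flow.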

In addition to the emulator itself, this repository contains scripts that should be interesting for anyone who wants to write an x86-64 emulator. The most important one, t.py, automatically generates test cases for an instruction by trying out different inputs and thus finding interesting inputs, outputs and flag combinations. See "Automatically generate test cases" for more information.

Try it out!

You can try out the emulator right now by visiting the website, selecting a suitable ELF binary and clicking "Run". The emulator will then execute the binary and show the output. Note that support for ELF binaries is currently limited/buggy (there are some problems getting libc to work). You can, however, use binaries from the examples/programs directory. The source code for this site is in the examples/web directory.

Usage on the web

  • The demo site uses ax to run ELF binaries in the browser
  • MemeAssembly is a meme programming language that compiles to x86-64 instructions. My MemeAssembly Playground uses ax to run this language right in the browser, emulating some syscalls like read and write to make the programs work.
  • Maybe you? If you use ax in a project, please let me know and I'll add it here! :)

How to use

The emulator is compiled to WebAssembly and can be used as a JavaScript Module. This works in modern browsers.

The recommended approach is to just install the NPM module:

npm i ax-x86

Before using any functions, you have to make sure the WASM binary has been downloaded using the default init function:

    import init, { version } from 'ax-x86';

    // This will download the WASM binary and initialize the module
    await init();

    // Now you can use the module
    console.log("ax version:", version());

This readme contains two examples that show typical use cases. For a more detailed API reference, see the documentation.

Two warnings/pitfalls when using this emulator:

  • Make sure that all numbers are passed as bigint, which can be done using an n suffix. 0x1000n is a bigint literal (which is what we want), while 0x1000 is a number (which will not work)
  • When using frontend frameworks, it is recommended to await the init function before your components are mounted, e.g. in a setup function. This will make sure the WASM binary is downloaded before the component is rendered. You can look at this Vue component for an example.
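The first pitfall can be illustrated with a short sketch in plain JavaScript that does not call into ax at all. The helper toAddress is a hypothetical name invented for this example and is not part of the ax API. It shows one possible way to normalise values to bigint before passing them on.

```javascript
// Hypothetical helper (not part of ax): normalise an address to bigint.
function toAddress(value) {
    if (typeof value === "bigint") {
        return value;
    }
    if (typeof value === "number" && Number.isSafeInteger(value) && value >= 0) {
        return BigInt(value);
    }
    throw new TypeError(`cannot convert ${String(value)} to a 64-bit address`);
}

console.log(typeof 0x1000n); // "bigint"
console.log(typeof 0x1000);  // "number"
console.log(toAddress(0x1000) === 0x1000n); // true

// 64-bit values do not survive a round trip through number:
const max = 0xffffffffffffffffn;
console.log(BigInt(Number(max)) === max); // false
```

The last line hints at why a 64-bit emulator works with bigint: a number cannot represent every 64-bit integer exactly.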

Simple emulation of instructions

The following is a simple example that executes a few instructions and logs the calculated result.

    import init, { Axecutor, Mnemonic, Register, version } from 'ax-x86';

    await init();

    // Define bytes for x86 instructions:
    let code = new Uint8Array([
        // mov rax, 0xff
        0x48, 0xc7, 0xc0, 0xff, 0, 0, 0,
        // mov rbx, 0xf
        0x48, 0xc7, 0xc3, 0xf, 0, 0, 0,
        // and rax, rbx
        0x48, 0x21, 0xd8
    ]);

    // Create a new emulator instance
    // You could also create an instance from an ELF/Linux binary using `Axecutor.from_binary` instead
    let ax = new Axecutor(
        code,
        // Code start address, this is where the first instruction byte is located
        0x1000n,
        // Entrypoint address, this is where execution starts. It is usually, but not always, the same as the code start address
        0x1000n
    );

    console.log("Initial state:", ax.toString());

    // One could set up a stack of size 0x1000 here, but it's not necessary for this example
    // This automatically writes the stack pointer to RSP
    // let stack_addr = ax.init_stack(0x1000n);

    // This function will be called before any "Mov" instruction is executed. There's also a hook_after_mnemonic function.
    // It can be used to e.g. implement custom syscall handlers
    ax.hook_before_mnemonic(Mnemonic.Mov, (instance) => {
        console.log("Executing a mov instruction");

        // Here you can e.g. modify registers, memory etc.
        instance.reg_write_64(Register.RCX, 0xabn);

        // this function *MUST* return one of
        // - instance.commit(): keep the changes we made in this handler and continue execution
        // - instance.stop(): keep changes, stop execution and return from the ax.execute() function
        // - instance.unchanged(): don't keep changes, continue execution

        // this will reset RCX to its previous value
        return instance.unchanged();
    });

    // Execute all instructions
    await ax.execute();

    // Log the final state of the emulator
    console.log("Final state:", ax.toString());

    // Outputs "15"
    console.log("RAX:", ax.reg_read_64(Register.RAX));

The emulator will just stop when reaching the end of the code.
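For readers wondering how the byte sequences in this example map to instructions, the following self-contained sketch decodes the REX prefix and the ModRM byte according to the standard x86-64 encoding. The function and constant names are invented for this illustration and have nothing to do with how ax decodes instructions internally.

```javascript
// Register names indexed by their 3-bit encoding (without REX extension bits).
const REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"];

// Split a ModRM byte into its three fields.
function decodeModRM(byte) {
    return {
        mod: (byte >> 6) & 0b11,  // 0b11 means register-direct operand
        reg: (byte >> 3) & 0b111, // register operand or opcode extension
        rm: byte & 0b111,         // register/memory operand
    };
}

// A REX prefix has the bit layout 0100WRXB; W=1 selects 64-bit operand size.
function isRexW(byte) {
    return (byte & 0xf0) === 0x40 && (byte & 0x08) !== 0;
}

// and rax, rbx is encoded as 0x48 0x21 0xd8
const modrm = decodeModRM(0xd8);
console.log(isRexW(0x48));                      // true
console.log(modrm.mod);                         // 3
console.log(REG64[modrm.rm], REG64[modrm.reg]); // rax rbx
```

For opcode 0x21 the rm field names the destination and the reg field names the source, which matches and rax, rbx.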

Emulate ELF binaries

The emulator also has some convenience functions for handling Linux/ELF binaries. The demo site uses these convenience functions to emulate programs.


One thing to note is that binaries usually exit via the exit syscall, which is not implemented by default (same as any other syscall). You can either implement your own syscall handler that handles the exit syscall, or you can use the handle_syscalls method to register predefined handlers for a small set of syscalls.

    let ax = Axecutor.from_binary(/* elf file content as Uint8Array */);

    // Set up the stack according to the System V ABI.
    // This sets up memory locations for command-line arguments and environment variables
    // and writes the stack pointer to RSP
    ax.init_stack_program_start(
        8n * 1024n, // Stack size
        ["/bin/my_binary", "arg1", "arg2"], // argv
        ["COLORTERM=truecolor", "TERM=xterm-256color"] // environment variables
    );

    // Use a predefined handler for the `exit` syscall, which just stops execution
    ax.handle_syscalls(Syscall.Exit);

    // Register a syscall handler for the `write` syscall (1)
    let syscallHandler = async function (ax: Axecutor) {
        let syscall_num = ax.reg_read_64(Register.RAX);
        let rdi = ax.reg_read_64(Register.RDI);
        let rsi = ax.reg_read_64(Register.RSI);
        let rdx = ax.reg_read_64(Register.RDX);

        console.log(`Syscall ${syscall_num} with args ${rdi}, ${rsi}, ${rdx}`);

        switch (syscall_num) {
            case 1n: {
                // WRITE syscall MUST write to stdout or stderr (stdin supported for compatibility)
                if (rdi != 0n && rdi != 1n && rdi != 2n) {
                    throw new Error(`WRITE syscall: cannot write non-std{out,err} (!= 1,2) fds, but tried ${rdi}`);
                }

                // Read data we should write from memory
                let result_buf = ax.mem_read_bytes(rsi, rdx);

                // Decode to string
                let result_str = new TextDecoder().decode(result_buf);

                // Do something with the string
                console.log("WRITE syscall:", result_str);

                // Return the number of bytes that were written in RAX,
                // that way the program knows it worked
                ax.reg_write_64(Register.RAX, rdx);

                return ax.commit();
            }
        }

        throw `Unhandled syscall ${syscall_num}`;
    }

    // Register the write syscall handler
    ax.hook_before_mnemonic(Mnemonic.Syscall, syscallHandler);

    // Log function calls
    ax.hook_after_mnemonic(Mnemonic.Call, function (ax: Axecutor) {
        // After a call instruction, RIP points to the first instruction of the called function
        let func_addr = ax.reg_read_64(Register.RIP);

        // Resolve a name for that function - it might be undefined if no symbol is available
        // Make sure to compile your binary with -g to include symbols
        let name = ax.resolve_symbol(func_addr);

        console.log("Calling function " + (name || "<unknown>") + " at " + func_addr.toString(16) + "\n");

        return ax.unchanged();
    });

    // Execute the program
    await ax.execute();

If your binaries need more system calls, you can look at the example site implementation, see the docs for pre-existing handler functions or get more details for your own implementation here.
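Since only x86-64 ELF binaries can be emulated, it may be useful to check a file's header before handing it to Axecutor.from_binary. The sketch below is a self-contained illustration in plain JavaScript. isX86_64Elf is a hypothetical helper, not a function provided by ax, and it only inspects a few well-known ELF header fields (magic number, class, byte order and machine type).

```javascript
// Hypothetical pre-flight check (not part of ax): is this a 64-bit x86-64 ELF file?
function isX86_64Elf(bytes) {
    if (bytes.length < 20) {
        return false;
    }
    // Magic number: 0x7f followed by "ELF"
    const magicOk = bytes[0] === 0x7f && bytes[1] === 0x45 &&
                    bytes[2] === 0x4c && bytes[3] === 0x46;
    const is64Bit = bytes[4] === 2;        // EI_CLASS: ELFCLASS64
    const isLittleEndian = bytes[5] === 1; // EI_DATA: ELFDATA2LSB
    // e_machine is a 16-bit little-endian field at offset 18; 0x3e is EM_X86_64
    const machine = bytes[18] | (bytes[19] << 8);
    return magicOk && is64Bit && isLittleEndian && machine === 0x3e;
}

// Build a minimal fake header for demonstration purposes
const header = new Uint8Array(64);
header.set([0x7f, 0x45, 0x4c, 0x46, 2, 1, 1]); // magic, 64-bit, little endian, version
header[18] = 0x3e;                             // e_machine = EM_X86_64

console.log(isX86_64Elf(header));                    // true
console.log(isX86_64Elf(new Uint8Array([1, 2, 3]))); // false
```

A check like this only rules out obviously unsuitable files; a binary that passes it can still fail to run, for example because it depends on libc.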

Contributing

If you want to contribute to this project, that's great! A good way to get involved is using the emulator and finding things that could be improved :) You could e.g. get started by adding support for a new instruction mnemonic. There's a tutorial on how to do that below. If you run into problems setting up the development tools or have any other questions, feel free to open an issue.

The Makefile has a lot of targets that can be useful for development. The test target runs tests both on your native machine and in WASM via NodeJS, making sure implemented instructions behave as expected in the target environment. The test-js target makes sure the public-facing JS API works as expected.

For any changes you want to make, you can branch off from the develop branch. Please format the code using make fmt before submitting a pull request and make sure that make precommit passes.

If you want to work on something, I recommend having two terminals open: one for automatically running tests on changes (make watch-tests) and one for automatically rebuilding the module, serving the web server and rebuilding the example programs (make watch). This configuration is already included in the tasks.json file, so VSCode should offer to run them automatically.

Development setup

If you want to develop in a Docker container, there's a Dev Container configuration provided in the repository. I would however recommend setting up the development tools on your native machine:

  1. Make sure you have installed Rust/Cargo, wasm-pack, Node.js, NPM, Python, PIP, Make, GCC and the GNU Assembler
    • You can optionally install mold to speed up link times (mostly for tests); the Makefile will automatically use it if it's installed
  2. You should now be able to build the WebAssembly module with make (this will also build a native binary ax)
  3. You can run make dependencies to install cargo-watch, cargo-tarpaulin (for generating test coverage info files) and python script dependencies
  4. Try out running make test or make watch-tests to run tests
  5. Run make watch-debug in one terminal to rebuild the WebAssembly module on changes, then run make web in another terminal to start the development server. You can also just run make watch, which runs both in the same terminal
  6. Open the local example site and make changes! (link should be in the make web/make watch output)

How to implement a new mnemonic

The generate.py script is used for generating instruction implementation stubs. You can e.g. run python3 generate.py push to generate a file for all instruction mnemonics that start with push; if you only want more exact matches use push_ as argument. Note that you must have run a build for the WebAssembly package, as otherwise the script won't be able to find the files from the iced-x86 crate that are used for generating the stubs.

Afterwards, run make switch to regenerate the instruction mnemonic switch statement (in src/instructions/generated.rs). Now your new stub functions are reachable.

Afterwards, it is recommended to automatically generate test cases, then implement the instruction. You can also add new integration tests that use the instruction, this is especially important for instructions that operate on flags.

Automatically generate test cases

The repository comes with scripts for generating test cases for x86-64 instructions. They basically try out instructions on your real x86-64 CPU and extract the processor state change.

The test cases are generated by running e.g. python3 t.py add al, [rbx], which tries around 6000 different inputs for the instruction (the extreme mode with -e tries around 100k inputs). The generated test cases are deduplicated, resulting in only one test case per unique combination of flags that are set and cleared. Note that not necessarily all combinations will be discovered. When generating tests, make sure that only the flags that are defined in the manual are tested for, e.g. for imul where only CF and OF are defined, you would pass -f CF,OF to the script. If an instruction needs to be tested with different flag values (e.g. adc), you can pass e.g. -s cf to permute one or multiple flags. For some instructions it makes more sense to go by result instead of flags, you can do this by passing -r to the script. For instructions with implicit arguments (e.g. imul rax also modifies rdx), you can pass -i rdx to the script to also test the implicit arguments (any comma separated list of operands should work).
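The deduplication idea described above can be sketched in a few lines. The following is an illustration only, written in JavaScript for consistency with the other examples, and it is not how t.py is implemented. The record shape ({ input, flagsSet, flagsCleared }) and the function name are assumptions made for this example.

```javascript
// Hypothetical record shape: { input, flagsSet: string[], flagsCleared: string[] }
// Keeps the first test case seen for each unique combination of set/cleared flags.
function dedupeByFlags(cases) {
    const seen = new Map();
    for (const c of cases) {
        // Sort copies so that the order of flag names does not matter
        const key = [...c.flagsSet].sort().join(",") + "|" +
                    [...c.flagsCleared].sort().join(",");
        if (!seen.has(key)) {
            seen.set(key, c);
        }
    }
    return [...seen.values()];
}

const cases = [
    { input: [0x08, 0xff], flagsSet: ["CF"], flagsCleared: ["PF", "ZF", "SF", "OF"] },
    { input: [0x10, 0xf1], flagsSet: ["CF"], flagsCleared: ["ZF", "PF", "OF", "SF"] },
    { input: [0x00, 0x00], flagsSet: ["PF", "ZF"], flagsCleared: ["CF", "SF", "OF"] },
];

console.log(dedupeByFlags(cases).length); // 2
```

The first two records set and clear the same flags, so only the first of them is kept.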

Here is one of 19 test cases that was automatically discovered for add al, [rbx] (without -e):

    // ax_test is a macro that sets up the emulator, then runs setup and a post-execution assertion function
    // add al, byte ptr [rbx]
    ax_test![add_al_byte_ptr_rbx_cf;
        // The encoded instruction bytes:
        0x2, 0x3;
        // A setup function that is called before the instruction is executed:
        |a: &mut Axecutor| {
            write_reg_value!(b; a; AL; 0x8);
            // This sets up a memory area with a size of one byte, containing 0xff
            init_mem_value!(b; a; 0x1000; 0xff); // -1
            // RBX points to that memory area
            write_reg_value!(q; a; RBX; 0x1000);
        };
        // This function is called after the instruction ran and checks the result:
        |a: Axecutor| {
            assert_reg_value!(b; a; AL; 0x7);
            // Also make sure the source operand is unchanged
            assert_reg_value!(q; a; RBX; 0x1000);
            assert_mem_value!(b; a; 0x1000; 0xff);
        };
        // On the left side of `;` are the flags that must be set after the instruction ran,
        // on the right are flags that must not be set
        (FLAG_CF; FLAG_PF | FLAG_ZF | FLAG_SF | FLAG_OF)
    ];
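To see why this test case expects exactly these values, the arithmetic can be reproduced by hand. The sketch below is a self-contained JavaScript illustration of the x86 flag definitions for an 8-bit add. add8 is a name invented for this example and is not part of the emulator or its test scripts.

```javascript
// Compute the result and status flags of an 8-bit ADD,
// following the x86 definitions of CF, ZF, SF, OF and PF.
function add8(a, b) {
    const sum = (a & 0xff) + (b & 0xff);
    const result = sum & 0xff;

    // Count the set bits of the result for the parity flag
    let bits = 0;
    for (let v = result; v !== 0; v >>= 1) {
        bits += v & 1;
    }

    return {
        result,
        CF: sum > 0xff,                                 // unsigned carry out of bit 7
        ZF: result === 0,                               // result is zero
        SF: (result & 0x80) !== 0,                      // sign bit of the result
        OF: ((a ^ result) & (b ^ result) & 0x80) !== 0, // signed overflow
        PF: bits % 2 === 0,                             // even number of set bits
    };
}

// Same inputs as the test case: AL = 0x8, memory operand = 0xff
const r = add8(0x08, 0xff);
console.log(r.result.toString(16));        // "7"
console.log(r.CF, r.PF, r.ZF, r.SF, r.OF); // true false false false false
```

0x08 + 0xff is 0x107, so the low byte 0x07 ends up in AL and the carry flag is set. 0x07 has three set bits, so the parity flag stays cleared, which matches the flag assertion of the test case.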

The test case generation script t.py supports register operands (both general purpose and SSE 128-bit), as well as a subset of memory and immediate operands. It requires that the GNU Assembler as and gcc are installed and must be run on x86-64 Linux. It places thousands of generated assembly files and binaries in /dev/shm/ax_*, so in case you run out of RAM that is the place to check.

Another script for testing jumps (j.py) is also available, but it's not as automated. Some other convenience copy-paste texts can be generated with a.py, e.g. with python3 a.py add al, [rbx] and then selecting u you'll get a code snippet for a JavaScript Uint8Array containing the bytes of the instruction.

If you want to adjust t.py for testing your own emulator, you should adjust the __str__ method of the TestCase class to generate different syntax with the same information.

Interesting Documentation

Here are some useful links for more information about x86-64/AMD64, the System V ABI and ELF files:

Limitations

Here are some limitations that could be inspiration for future features:

  • Only the Sign, Carry, Overflow, Zero and Parity status flags are supported
  • Most instructions aren't implemented, especially
    • Anything I found too legacy
    • Many instructions
  • Syscall and Interrupts are not implemented to spec.
    • If you have registered hooks (using hook_before_mnemonic or hook_after_mnemonic), they are essentially a no-op with your handler executing
    • If no hooks are registered and a syscall/interrupt is executed, an exception is thrown
  • The memory implementation is quite weird and needs an overhaul
    • Access restrictions (partially implemented), page management (maybe better to leave to the user) etc. are missing
  • ELF file parsing is currently really basic
    • Binaries with libc don't work due to relocations and more not being implemented
    • Basically only very basic binaries work
  • Segments are not really implemented to spec, but kind of work OK

Ideas

See issue 1 for some ideas for future features. Also feel free to open an issue if you have an idea :)

AGPL v3

