Movatterモバイル変換


[0]ホーム

URL:


Bytecode

Game Programming PatternsBehavioral Patterns

Intent

Give behavior the flexibility of data by encoding it as instructions for avirtual machine.

Motivation

Making games may be fun, but it certainly ain’t easy. Modern games requireenormous, complex codebases. Console manufacturers andapp marketplace gatekeepers have stringent quality requirements, and a singlecrash bug can prevent your game from shipping.

I worked on a game that had six million lines of C++ code. For comparison, thesoftware controlling the Mars Curiosity rover is less than half that.

At the same time, we’re expected to squeeze every drop of performance out of theplatform. Games push hardware like nothing else, and we have to optimizerelentlessly just to keep pace with the competition.

To handle these high stability and performance requirements, we reach forheavyweight languages like C++ that have both low-level expressiveness to makethe most of the hardware and rich type systems to prevent or at least corralbugs.

We pride ourselves on our skill at this, but it has its cost. Being a proficientprogrammer takes years of dedicated training, after which you must contend withthe sheer scale of your codebase. Build times for large games can vary somewherebetween “go get a coffee” and “go roast your own beans, hand-grind them, pull anespresso, foam some milk, and practice your latte art in the froth”.

On top of these challenges, games have one more nasty constraint:fun. Playersdemand a play experience that’s both novel and yet carefully balanced. Thatrequires constant iteration, but if every tweak requires bugging an engineer tomuck around in piles of low-level code and then waiting for a glacial recompile,you’ve killed your creative flow.

Spell fight!

Let’s say we’re working on a magic-based fighting game. A pair of wizards squareoff and fling enchantments at each other until a victor is pronounced. We coulddefine these spells in code, but that means an engineer has to be involved everytime one is modified. When a designer wants to tweak a few numbers and get afeel for them, they have to recompile the entire game, reboot it, and get backinto a fight.

Like most games these days, we also need to be able to update the game after it ships,both to fix bugs and to add new content. If all of these spells are hard-coded,then updating them means patching the actual game executable.

Let’s take things a bit further and say that we also want to supportmodding.We wantusers to be able to create their own spells. If those are in code,that means every modder needs a full compiler toolchain to build the game, andwe have to release the sources. Worse, if they have a bug in their spell, it cancrash the game on some other player’s machine.

Data > code

It’s pretty clear that our engine’s implementation language isn’t the right fit.We need spells to be safely sandboxed from the core game. We want them to beeasy to modify, easy to reload, and physically separate from the rest of theexecutable.

I don’t know about you, but to me that sounds a lot likedata. If we candefine our behavior in separate data files that the game engine loads and“executes” in some way, we can achieve all of our goals.

We just need to figure out what “execute” means for data. How do you make somebytes in a file express behavior? There are a few ways to do this. I think itwill help you get a picture ofthis pattern’s strengths and weaknesses if wecompare it to another one: theInterpreter pattern.

The Interpreter pattern

I could write a whole chapter on this pattern, but four other guys alreadycovered that for me. Instead, I’ll cram the briefest of introductions in here.It starts with a language — thinkprogramming language — that you want toexecute. Say, for example, it supports arithmetic expressions like this:

(1 + 2) * (3 - 4)

Then, you take each piece of that expression, each rule in the language’sgrammar, and turn it into anobject. The number literals will be objects:

A series of number literal objects.

Basically, they’re little wrappers around the raw value. The operators will beobjects too, and they’ll have references to their operands. If you take intoaccount the parentheses and precedence, that expressionmagically turns into a little tree of objects like so:

A syntax tree. The number literals are connected by operator objects.

What “magic” is this? It’s simple — parsing. A parser takes a string ofcharacters and turns it into anabstract syntax tree, a collection of objectsrepresenting the grammatical structure of the text.

Whip up one of these and you’ve got yourself half of a compiler.

The Interpreter pattern isn’t aboutcreating that tree; it’s aboutexecutingit. The way it works is pretty clever. Each object in the tree is an expressionor a subexpression. In true object-oriented fashion, we’ll let expressionsevaluate themselves.

First, we define a base interface that all expressions implement:

classExpression{public:virtual~Expression(){}virtualdoubleevaluate()=0;};

Then, we define a class that implements this interface for each kind of expression in ourlanguage’s grammar. The simplest one is numbers:

classNumberExpression:publicExpression{public:NumberExpression(doublevalue):value_(value){}virtualdoubleevaluate(){returnvalue_;}private:doublevalue_;};

A literal number expression simply evaluates to its value. Addition andmultiplication are a bit more complex because they contain subexpressions.Before they can evaluate themselves, they need to recursively evaluate theirsubexpressions. Like so:

classAdditionExpression:publicExpression{public:AdditionExpression(Expression*left,Expression*right):left_(left),right_(right){}virtualdoubleevaluate(){// Evaluate the operands.doubleleft=left_->evaluate();doubleright=right_->evaluate();// Add them.returnleft+right;}private:Expression*left_;Expression*right_;};

I’m sure you can figure out what the implementation of multiply looks like.

Pretty neat right? Just a couple of simple classes and now we can represent andevaluate arbitrarily complex arithmetic expressions. We just need to create theright objects and wire them up correctly.

Ruby was implemented like this for something like 15 years. At version 1.9, theyswitched to bytecode like this chapter describes. Look how much time I’m savingyou!

It’s abeautiful, simple pattern, but it has someproblems. Look up at the illustration. What do you see? Lots of little boxes,and lots of arrows between them. Code is represented as a sprawling fractal treeof tiny objects. That has some unpleasant consequences:

Put those together, and what do they spell? S-L-O-W. There’s a reason mostprogramming languages in wide use aren’t based on the Interpreter pattern. It’sjust too slow, and it uses up too much memory.

Machine code, virtually

Consider our game. When we run it, the player’s computer doesn’t traverse abunch of C++ grammar tree structures at runtime. Instead, we compile it ahead oftime to machine code, and the CPU runs that. What’s machine code got going forit?

This sounds swell, but we don’t want actual machine code for our spells. Lettingusers provide machine code which our game executes is just begging forsecurity problems. What we need is a compromise between theperformance of machine code and the safety of the Interpreter pattern.

What if instead of loading actual machine code and executing it directly, wedefined our ownvirtual machine code? We’d then write a little emulator for itin our game. It would be similar to machine code — dense, linear, relativelylow-level — but would also be handled entirely by our game so we could safelysandbox it.

This is why many game consoles and iOS don’t allow programs to execute machinecode loaded or generated at runtime. That’s a drag because the fastestprogramming language implementations do exactly that. They contain a“just-in-time” compiler, orJIT, that translates the language to optimizedmachine code on the fly.

We’d call our little emulator avirtual machine(or “VM” for short), and the synthetic binary machine code it runsbytecode.It’s got the flexibility and ease of use of defining things in data, but it has betterperformance than higher-level representations like the Interpreter pattern.

In programming language circles, “virtual machine” and “interpreter” aresynonymous, and I use them interchangeably here. When I refer to the Gang ofFour’s Interpreter pattern, I’ll use “pattern” to make it clear.

This sounds daunting, though. My goal for the rest of this chapter is to showyou that if you keep your feature list pared down, it’s actually prettyapproachable. Even if you end up not using this pattern yourself, you’ll atleast have a better understanding of Lua and many other languages which areimplemented using it.

The Pattern

Aninstruction set defines the low-level operations that can be performed.A series of instructions is encoded as asequence of bytes. Avirtual machine executesthese instructions one at a time, using astack for intermediate values. Bycombining instructions, complex high-level behavior can be defined.

When to Use It

This is the most complex pattern in this book, and it’s not something to throw intoyour game lightly. Use it when you have a lot of behavior you need to define andyour game’s implementation language isn’t a good fit because:

Of course, that list describes a bunch of your game. Who doesn’t want a fasteriteration loop or more safety? However, that doesn’t come for free. Bytecode isslower than native code, so it isn’t a good fit for performance-critical partsof your engine.

Keep in Mind

There’s somethingseductive about creating yourown language or system-within-a-system. I’ll be doing a minimal example here,but in the real world, these things tend to grow like vines.

For me, game development is seductive in the same way. In both cases, I’mstriving to create a virtual space for others to play and be creative in.

Every time I see someone define a little language or a scripting system, theysay, “Don’t worry, it will be tiny.” Then, inevitably, they add more and morelittle features until it’s a full-fledgedlanguage.Except, unlike some other languages, it grew in an ad-hoc, organic fashion andhas all of the architectural elegance of a shanty town.

For example, see every templating language ever.

Of course, there’s nothingwrong with making a full-fledged language. Justmake sure you do so deliberately. Otherwise, be very careful to control thescope of what your bytecode can express. Put a short leash on it before it runsaway from you.

You’ll need a front-end

Low-level bytecode instructions are great for performance, but a binary bytecodeformat isnot what your users are going to author. One reason we’re movingbehavior out of code is so that we can express it at ahigher level. If C++ istoo low-level, making your users effectively write inassembly language — even one of your own design — isn’tan improvement!

Challenging that assertion is the venerable gameRoboWar. In that game,players writelittle programs to control a robot in a language very similar to assembly andthe kind of instruction sets we’ll be discussing here.

It was my first introduction to assembly-like languages.

Much like the Gang of Four’s Interpreter pattern, it’s assumed that you alsohave some way togenerate the bytecode. Usually, users author their behaviorin some higher-level format, and a tool translates that to the bytecode that ourvirtual machine understands. In other words, a compiler.

I know, that sounds scary. That’s why I’m mentioning it here. If you don’t havethe resources to build an authoring tool, then bytecode isn’t for you. But aswe’ll see later, it may not be as bad as you think.

You’ll miss your debugger

Programming is hard. We know what we want the machine to do, but we don’t alwayscommunicate that correctly — we write bugs. To help find and fix those, we’veamassed a pile of tools to understand what our code is doing wrong, and how toright it. We have debuggers, static analyzers, decompilers, etc. All of those tools aredesigned to work with some existing language: either machine code or somethinghigher level.

When you define your own bytecode VM, you leave those tools behind. Sure, youcan step through the VM in your debugger, but that tells you what the VMitself is doing, and not what the bytecode it’s interpreting is up to. Itcertainly doesn’t help you map that bytecode back to the high-level form it wascompiled from.

If the behavior you’re defining is simple, you can scrape by without too muchtooling to help you debug it. But as the scale of your content grows, plan toinvest real time into features that help users see what their bytecode is doing.Those features might notship in your game, butthey’ll be critical to ensure that you actuallycan ship your game.

Of course, if you want your game to be moddable, then youwill ship thosefeatures, and they’ll be even more important.

Sample Code

After the previous couple of sections, you might be surprised howstraightforward the implementation is. First, we need to craft an instructionset for our VM. Before we start thinking about bytecode and stuff, let’s justthink about it like an API.

A magical API

If we were defining spells in straight C++ code, what kind of API would we needfor that code to call into? What are the basic operations in the game enginethat spells are defined in terms of?

Most spells ultimately change one of the stats of a wizard, so we’ll start witha couple for that:

voidsetHealth(intwizard,intamount);voidsetWisdom(intwizard,intamount);voidsetAgility(intwizard,intamount);

The first parameter identifies which wizard is affected, say0 for theplayer’s and1 for their opponent. This way, healing spells can affect theplayer’s own wizard, while damaging attacks harm their nemesis. These threelittle methods cover a surprisingly wide variety of magical effects.

If the spells just silently tweaked stats, the game logic would be fine, butplaying it would bore players to tears. Let’s fix that:

voidplaySound(intsoundId);voidspawnParticles(intparticleType);

These don’t affect gameplay, but they crank up the intensity of the gameplayexperience. We could add more for camera shake, animation, etc., but this isenough to get us started.

A magical instruction set

Now let’s see how we’d turn thisprogrammatic API into something that can becontrolled from data. Let’s start small and then we’ll work our way up to thewhole shebang. For now, we’ll ditch all of the parameters to these methods.We’ll say theset___() methods always affect the player’s own wizard andalways max out the stat. Likewise, the FX operations always play a singlehard-coded sound and particle effect.

Given that, a spell is just a series of instructions. Each one identifies whichoperation you want to perform. We can enumerate them:

enumInstruction{INST_SET_HEALTH=0x00,INST_SET_WISDOM=0x01,INST_SET_AGILITY=0x02,INST_PLAY_SOUND=0x03,INST_SPAWN_PARTICLES=0x04};

To encode a spell in data, we store an array ofenum values. We’ve only gota few different primitives, so the range ofenum values easily fits into a byte.This means the code for a spell is just a list ofbytes — ergo“bytecode”.

A sequence of bytecode instructions: 0x00 HEALTH, 0x03 SOUND, 0x004 PARTICLES, ...

Some bytecode VMs use more than a single byte for each instruction and have morecomplicated rules for how they are decoded. Actual machine code on common chipslike x86 is a good bit more complex.

But a single byte is good enough for theJava VirtualMachine and Microsoft’sCommon Language Runtime,which forms the backbone of the .NET platform, and it’s good enough for us.

To execute a single instruction, we see which primitive it is and dispatch tothe right API method:

switch(instruction){caseINST_SET_HEALTH:setHealth(0,100);break;caseINST_SET_WISDOM:setWisdom(0,100);break;caseINST_SET_AGILITY:setAgility(0,100);break;caseINST_PLAY_SOUND:playSound(SOUND_BANG);break;caseINST_SPAWN_PARTICLES:spawnParticles(PARTICLE_FLAME);break;}

In this way, our interpreter forms the bridge between code world and data world.We can wrap this in a little VM that executes an entire spell like so:

classVM{public:voidinterpret(charbytecode[],intsize){for(inti=0;i<size;i++){charinstruction=bytecode[i];switch(instruction){// Cases for each instruction...}}}};

Type that in and you’ll have written your first virtual machine. Unfortunately,it’s not very flexible. We can’t define a spell that touches the player’sopponent or lowers a stat. We can only play one sound!

To get something that starts to have the expressive feel of an actual language,we need to get parameters in here.

A stack machine

To execute a complex nested expression, you start with the innermostsubexpressions. You calculate those, and the results flow outward as arguments tothe expressions that contain them until eventually, the whole expression has beenevaluated.

The Interpreter pattern models this explicitly as a tree of nested objects, butwe want the speed of a flat list of instructions. We still need to ensureresults from subexpressions flow to the right surrounding expressions. But,since our data is flattened, we’ll have to use theorder of the instructionsto control that. We’ll do it the same way your CPU does — with a stack.

This architecture is unimaginatively called astackmachine. Programming languageslikeForth,PostScript, andFactor expose thismodel directly to the user.

classVM{public:VM():stackSize_(0){}// Other stuff...private:staticconstintMAX_STACK=128;intstackSize_;intstack_[MAX_STACK];};

The VM maintains an internal stack of values. In our example, the only kinds ofvalues our instructions work with are numbers, so we can use a simple arrayofints. Whenever a bit of data needs to work its way from one instruction toanother, it gets there through the stack.

Like the name implies, values can be pushed onto or popped off of the stack, solet’s add a couple of methods for that:

classVM{private:voidpush(intvalue){// Check for stack overflow.assert(stackSize_<MAX_STACK);stack_[stackSize_++]=value;}intpop(){// Make sure the stack isn't empty.assert(stackSize_>0);returnstack_[--stackSize_];}// Other stuff...};

When an instruction needs to receive parameters, it pops them off the stack likeso:

switch(instruction){caseINST_SET_HEALTH:{intamount=pop();intwizard=pop();setHealth(wizard,amount);break;}caseINST_SET_WISDOM:caseINST_SET_AGILITY:// Same as above...caseINST_PLAY_SOUND:playSound(pop());break;caseINST_SPAWN_PARTICLES:spawnParticles(pop());break;}

To get some valuesonto that stack, we need one more instruction: a literal.It represents a raw integer value. But where doesit get its value from? Howdo we avoid some turtles-all-the-way-down infinite regress here?

The trick is to take advantage of the fact that our instruction stream is asequence of bytes — we can stuff the number directly in the byte array.We define another instruction type for a number literal like so:

caseINST_LITERAL:{// Read the next byte from the bytecode.intvalue=bytecode[++i];push(value);break;}

Here, I’m reading a single byte for the value to avoid the fiddly coderequired to decode a multiple-byte integer, but in a real implementation, you’llwant to support literals that cover your full numeric range.

Binary encoding of a literal instruction: 0x05 (LITERAL) followed by 123 (the value).

It reads the nextbyte in the bytecode streamas anumber and pushes it onto the stack.

Let’s string a few of these instructions together and watch the interpreterexecute them to get a feel for how the stack works. We start with an empty stackand the interpreter pointing to the first instruction:

Executing a bytecode sequence. The execution pointer points to the first literal instruction and the stack is empty.

First, it executes the firstINST_LITERAL. That reads the next byte from thebytecode (0) and pushes it onto the stack:

The next step. The literal 0 has been pushed onto the stack and the execution pointer is on the next literal.

Then, it executes the secondINST_LITERAL. That reads the10 and pushes it:

The next step. Now 10 has been pushed onto the stack and the execution pointer is at the Health instruction.

Finally, it executesINST_SET_HEALTH. That pops10 and stores it inamount, then pops0 and stores it inwizard. Then, it callssetHealth()with those parameters.

Ta-da! We’ve got a spell that sets the player’s wizard’s health to ten points.Now, we’ve got enough flexibility to define spells that set either wizard’s statsto whatever amounts we want. We can also play different sounds and spawnparticles.

But… this still feels like adata format. We can’t, for example, raisea wizard’s health by half of their wisdom. Our designers want to be able toexpressrules for spells, not justvalues.

Behavior = composition

If we think of our little VM like a programming language, all it supports now isa couple of built-in functions and constant parameters for them. To get bytecodeto feel likebehavior, what we’re missing iscomposition.

Our designers need to be able to create expressions that combine differentvalues in interesting ways. For a simple example, they want spells that modify astatby a certain amount instead ofto a certain amount.

That requires taking into account a stat’s current value. We have instructionsforwriting a stat, but we need to add a couple toread stats:

caseINST_GET_HEALTH:{intwizard=pop();push(getHealth(wizard));break;}caseINST_GET_WISDOM:caseINST_GET_AGILITY:// You get the idea...

As you can see, these work with the stack in both directions. They pop aparameter to determine which wizard to get the stat for, and then they look up the stat’svalue and push that back onto the stack.

This lets us write spells that copy stats around. We could create a spell thatset a wizard’s agility to their wisdom or a strange incantation that set onewizard’s health to mirror his opponent’s.

Better, but still quite limited. Next, we need arithmetic. It’s time our baby VMlearned how to add 1 + 1. We’ll add a few more instructions. By now, you’veprobably got the hang of it and can guess how they look. I’ll just showaddition:

caseINST_ADD:{intb=pop();inta=pop();push(a+b);break;}

Like our other instructions, it pops a couple of values, does a bit of work, andthen pushes the result back. Up until now, every new instruction gave us anincremental improvement in expressiveness, but we just made a big leap. It isn’tobvious, but we can now handle all sorts of complicated, deeply nestedarithmetic expressions.

Let’s walk through a slightly more complex example. Say we want a spell thatincreases the player’s wizard’s health by the average of their agility andwisdom. In code, that’s:

setHealth(0,getHealth(0)+(getAgility(0)+getWisdom(0))/2);

You might think we’d need instructions to handle the explicit grouping thatparentheses give you in the expression here, but the stack supports thatimplicitly. Here’s how you could evaluate this by hand:

  1. Get the wizard’s current health and remember it.
  2. Get the wizard’s agility and remember it.
  3. Do the same for their wisdom.
  4. Get those last two, add them, and remember the result.
  5. Divide that by two and remember the result.
  6. Recall the wizard’s health and add it to that result.
  7. Take that result and set the wizard’s health to that value.

Do you see all of those “remembers” and “recalls”? Each “remember” correspondsto a push, and the “recalls” are pops. That means we can translate this tobytecode pretty easily. For example, the first line to get the wizard’s currenthealth is:

LITERAL 0GET_HEALTH

This bit of bytecode pushes the wizard’s health onto the stack. If wemechanically translate each line like that, we end up with a chunk of bytecodethat evaluates our original expression. To give you a feel for how theinstructions compose, I’ve done that below.

To show how the stack changes over time, we’ll walk through a sample executionwhere the wizard’s current stats are 45 health, 7 agility, and 11 wisdom. Nextto each instruction is what the stack looks like after executing it and then alittle comment explaining the instruction’s purpose:

LITERAL 0    [0]            # Wizard indexLITERAL 0    [0, 0]         # Wizard indexGET_HEALTH   [0, 45]        # getHealth()LITERAL 0    [0, 45, 0]     # Wizard indexGET_AGILITY  [0, 45, 7]     # getAgility()LITERAL 0    [0, 45, 7, 0]  # Wizard indexGET_WISDOM   [0, 45, 7, 11] # getWisdom()ADD          [0, 45, 18]    # Add agility and wisdomLITERAL 2    [0, 45, 18, 2] # DivisorDIVIDE       [0, 45, 9]     # Average agility and wisdomADD          [0, 54]        # Add average to current healthSET_HEALTH   []             # Set health to result

If you watch the stack at each step, you can see how data flows through italmost likemagic. We push0 for the wizardindex at the beginning, and it just hangs around at the bottom of the stack untilwe finally need it for the lastSET_HEALTH at the end.

Maybe my threshold for “magic” is a little too low here.

A virtual machine

I could keep going, adding more and more instructions, but this is a good placeto stop. As it is, we’ve got a nice little VM that lets us define fairlyopen-ended behavior using a simple, compact data format. While “bytecode” and“virtual machines” sound intimidating, you can see they’re often as simple as astack, a loop, and a switch statement.

Remember our original goal to have behavior be nicely sandboxed? Now that you’veseen exactly how the VM is implemented, it’s obvious that we’ve accomplishedthat. The bytecode can’t do anything malicious or reach out into weird parts ofthe game engine because we’ve only defined a few instructions that touch therest of the game.

We control how much memory it uses by how big of a stack we create, and we’recareful to make sure it can’t overflow that. We can evencontrol how muchtime it uses. In our instruction loop,we can track how many we’ve executed and bail out if it goes over some limit.

Controlling execution time isn’t necessary in our sample because we don’t haveany instructions for looping. We could limit execution time by limiting thetotal size of the bytecode. This also means our bytecode isn’t Turing-complete.

There’s just one problem left: actually creating the bytecode. So far, we’vetaken bits of pseudocode and compiled them to bytecode by hand. Unless you’vegot alot of free time, that’s not going to work in practice.

Spellcasting tools

One of our initial goals was to have ahigher-level way to author behavior,but we’ve gone and created somethinglower-level than C++. It has the runtimeperformance and safety we want, but absolutely none of the designer-friendlyusability.

To fill that gap, we need some tooling. We need a program that lets users definethe high-level behavior of a spell and then takes that and generates theappropriate low-level stack machine bytecode.

That probably sounds way harder than making the VM. Many programmers weredragged through a compilers class in college and took away from it nothing butPTSD triggered by the sight of abook with a dragonon the cover or the words “lex”and “yacc”.

I’m referring, of course, to the classic textCompilers: Principles,Techniques, and Tools.

In truth, compiling a text-based language isn’t that bad, though it’s abittoo broad of a topic to cram in here. However, you don’t have to do that. What Isaid we need is atool — it doesn’t have to be acompiler whose inputformat is atext file.

On the contrary, I encourage you to consider building a graphical interface tolet users define their behavior, especially if the people using it won’t behighly technical. Writing text that’s free of syntax errors is difficult forpeople who haven’t spent years getting used to a compiler yelling at them.

Instead, you can build an app that lets users “script” by clicking and dragginglittle boxes, pulling down menu items, or whatever else makes sense for thekind of behavior you want them to create.

A mock-up of a little tree-based UI for authoring behavior.

The scripting system I wrote forHenry Hatsworth in the PuzzlingAdventure worked like this.

The nice thing about this is that your UI can make it impossible for users tocreate“invalid” programs. Instead of vomiting errormessages on them, you can proactively disable buttons or provide defaultvalues to ensure that the thing they’ve created is valid at all points in time.

I want to stress how important error-handling is. As programmers, we tend toview human error as a shameful personality flaw that we strive to eliminate inourselves.

To make a system that users enjoy, you have to embrace their humanity,including their fallibility. Making mistakes is what people do, and is afundamental part of the creative process. Handling them gracefully with featureslike undo helps your users be more creative and create better work.

This spares you from designing a grammar and writing a parser for a littlelanguage. But, I know, some of you find UI programming equally unpleasant. Well,in that case, I don’t have any good news for you.

Ultimately, this pattern is about expressing behavior in a user-friendly,high-level way. You have to craft the user experience. To execute the behaviorefficiently, you then need to translate that into a lower-level form. It is realwork, but if you’re up to the challenge, it can pay off.

Design Decisions

Itried to keep this chapter as simple as I could,but what we’re really doing is creating a language. That’s a pretty open-endeddesign space. Exploring it can be tons of fun, so make sure you don’t forget tofinish your game.

Since this is the longest chapter in the book, it seems I failed that task.

How do instructions access the stack?

Bytecode VMs come in two main flavors: stack-based and register-based. In astack-based VM, instructions always work from the top of the stack, like in oursample code. For example,INST_ADD pops two values, adds them, and pushes theresult.

Register-based VMs still have a stack. The only difference is that instructionscan read their inputs from deeper in the stack. Instead ofINST_ADD alwayspoppingits operands, it has two indexes stored in the bytecode that identify where inthe stack to read the operands from.

So which should you do? My recommendation is to stick with a stack-based VM.They’re simpler to implement and much simpler to generate code for.Register-based VMs got a reputation for being a bit faster after Lua convertedto that style, but it dependsdeeply on your actual instructions and on lots ofother details of your VM.

What instructions do you have?

Your instruction set defines the boundaries of what can and cannot be expressedin bytecode, and it also has a big impact on the performance of your VM. Here’s alaundry list of the different kinds of instructions you may want:

How are values represented?

Our sample VM only works with one kind of value, integers. That makes answering this easy —the stack is just a stack ofints. A more full-featured VM will supportdifferent data types: strings, objects, lists, etc. You’ll have to decide howthose are stored internally.

My recommendation is that if you can stick with a single data type, do that. Otherwise,do a tagged union. That’s what almost every language interpreter in the worlddoes.

How is the bytecode generated?

I saved the most important question for last. I’ve walked you through the codetoconsume andinterpret bytecode, but it’s up to you to build something toproduce it. The typical solution here is to write a compiler, but it’s not theonly option.

See Also


[8]ページ先頭

©2009-2025 Movatter.jp