Give behavior the flexibility of data by encoding it as instructions for avirtual machine.
Making games may be fun, but it certainly ain’t easy. Modern games require
I worked on a game that had six million lines of C++ code. For comparison, thesoftware controlling the Mars Curiosity rover is less than half that.
At the same time, we’re expected to squeeze every drop of performance out of theplatform. Games push hardware like nothing else, and we have to optimizerelentlessly just to keep pace with the competition.
To handle these high stability and performance requirements, we reach forheavyweight languages like C++ that have both low-level expressiveness to makethe most of the hardware and rich type systems to prevent or at least corralbugs.
We pride ourselves on our skill at this, but it has its cost. Being a proficientprogrammer takes years of dedicated training, after which you must contend withthe sheer scale of your codebase. Build times for large games can vary somewherebetween “go get a coffee” and “go roast your own beans, hand-grind them, pull anespresso, foam some milk, and practice your latte art in the froth”.
On top of these challenges, games have one more nasty constraint:fun. Playersdemand a play experience that’s both novel and yet carefully balanced. Thatrequires constant iteration, but if every tweak requires bugging an engineer tomuck around in piles of low-level code and then waiting for a glacial recompile,you’ve killed your creative flow.
Let’s say we’re working on a magic-based fighting game. A pair of wizards squareoff and fling enchantments at each other until a victor is pronounced. We coulddefine these spells in code, but that means an engineer has to be involved everytime one is modified. When a designer wants to tweak a few numbers and get afeel for them, they have to recompile the entire game, reboot it, and get backinto a fight.
Like most games these days, we also need to be able to update the game after it ships,both to fix bugs and to add new content. If all of these spells are hard-coded,then updating them means patching the actual game executable.
Let’s take things a bit further and say that we also want to supportmodding.We wantusers to be able to create their own spells. If those are in code,that means every modder needs a full compiler toolchain to build the game, andwe have to release the sources. Worse, if they have a bug in their spell, it cancrash the game on some other player’s machine.
It’s pretty clear that our engine’s implementation language isn’t the right fit.We need spells to be safely sandboxed from the core game. We want them to beeasy to modify, easy to reload, and physically separate from the rest of theexecutable.
I don’t know about you, but to me that sounds a lot likedata. If we candefine our behavior in separate data files that the game engine loads and“executes” in some way, we can achieve all of our goals.
We just need to figure out what “execute” means for data. How do you make somebytes in a file express behavior? There are a few ways to do this. I think itwill help you get a picture ofthis pattern’s strengths and weaknesses if wecompare it to another one: theInterpreter pattern.
I could write a whole chapter on this pattern, but four other guys alreadycovered that for me. Instead, I’ll cram the briefest of introductions in here.It starts with a language — thinkprogramming language — that you want toexecute. Say, for example, it supports arithmetic expressions like this:
(1 + 2) * (3 - 4)
Then, you take each piece of that expression, each rule in the language’sgrammar, and turn it into anobject. The number literals will be objects:
Basically, they’re little wrappers around the raw value. The operators will beobjects too, and they’ll have references to their operands. If you take intoaccount the parentheses and precedence, that expression
What “magic” is this? It’s simple — parsing. A parser takes a string ofcharacters and turns it into anabstract syntax tree, a collection of objectsrepresenting the grammatical structure of the text.
Whip up one of these and you’ve got yourself half of a compiler.
The Interpreter pattern isn’t aboutcreating that tree; it’s aboutexecutingit. The way it works is pretty clever. Each object in the tree is an expressionor a subexpression. In true object-oriented fashion, we’ll let expressionsevaluate themselves.
First, we define a base interface that all expressions implement:
classExpression{public:virtual~Expression(){}virtualdoubleevaluate()=0;};
Then, we define a class that implements this interface for each kind of expression in ourlanguage’s grammar. The simplest one is numbers:
classNumberExpression:publicExpression{public:NumberExpression(doublevalue):value_(value){}virtualdoubleevaluate(){returnvalue_;}private:doublevalue_;};
A literal number expression simply evaluates to its value. Addition andmultiplication are a bit more complex because they contain subexpressions.Before they can evaluate themselves, they need to recursively evaluate theirsubexpressions. Like so:
classAdditionExpression:publicExpression{public:AdditionExpression(Expression*left,Expression*right):left_(left),right_(right){}virtualdoubleevaluate(){// Evaluate the operands.doubleleft=left_->evaluate();doubleright=right_->evaluate();// Add them.returnleft+right;}private:Expression*left_;Expression*right_;};
I’m sure you can figure out what the implementation of multiply looks like.
Pretty neat right? Just a couple of simple classes and now we can represent andevaluate arbitrarily complex arithmetic expressions. We just need to create theright objects and wire them up correctly.
Ruby was implemented like this for something like 15 years. At version 1.9, theyswitched to bytecode like this chapter describes. Look how much time I’m savingyou!
It’s abeautiful, simple pattern, but it has someproblems. Look up at the illustration. What do you see? Lots of little boxes,and lots of arrows between them. Code is represented as a sprawling fractal treeof tiny objects. That has some unpleasant consequences:
Loading it from disk requires instantiating and wiring up tons of these small objects.
Those objects and the pointers between them use a lot ofmemory. On a 32-bit machine, that little arithmetic expression up there takes up at least 68 bytes, not including padding.
If you’re playing along at home, don’t forget to take into account thevtable pointers.
Traversing the pointers into subexpressions is murder on yourdata cache. Meanwhile, all of those virtual method calls wreak carnage on your instruction cache.
See the chapter onDataLocality for more on what the cache is and how it affects yourperformance.
Put those together, and what do they spell? S-L-O-W. There’s a reason mostprogramming languages in wide use aren’t based on the Interpreter pattern. It’sjust too slow, and it uses up too much memory.
Consider our game. When we run it, the player’s computer doesn’t traverse abunch of C++ grammar tree structures at runtime. Instead, we compile it ahead oftime to machine code, and the CPU runs that. What’s machine code got going forit?
It’s dense. It’s a solid, contiguous blob of binary data, and no bit goes to waste.
It’s linear. Instructions are packed together and executed one right after another. No jumping around in memory (unless you’re doing actual control flow, of course).
It’s low-level. Each instruction does one relatively minimal thing, and interesting behavior comes fromcomposing them.
It’s fast. As a consequence of all of these (well, and the fact that it’s implemented directly in hardware), machine code runs like the wind.
This sounds swell, but we don’t want actual machine code for our spells. Lettingusers provide machine code which our game executes is just begging for
What if instead of loading actual machine code and executing it directly, wedefined our ownvirtual machine code? We’d then write a little emulator for itin our game. It would be similar to machine code — dense, linear, relativelylow-level — but would also be handled entirely by our game so we could safelysandbox it.
This is why many game consoles and iOS don’t allow programs to execute machinecode loaded or generated at runtime. That’s a drag because the fastestprogramming language implementations do exactly that. They contain a“just-in-time” compiler, orJIT, that translates the language to optimizedmachine code on the fly.
We’d call our little emulator avirtual machine(or “VM” for short), and the synthetic binary machine code it runsbytecode.It’s got the flexibility and ease of use of defining things in data, but it has betterperformance than higher-level representations like the Interpreter pattern.
In programming language circles, “virtual machine” and “interpreter” aresynonymous, and I use them interchangeably here. When I refer to the Gang ofFour’s Interpreter pattern, I’ll use “pattern” to make it clear.
This sounds daunting, though. My goal for the rest of this chapter is to showyou that if you keep your feature list pared down, it’s actually prettyapproachable. Even if you end up not using this pattern yourself, you’ll atleast have a better understanding of Lua and many other languages which areimplemented using it.
Aninstruction set defines the low-level operations that can be performed.A series of instructions is encoded as asequence of bytes. Avirtual machine executesthese instructions one at a time, using astack for intermediate values. Bycombining instructions, complex high-level behavior can be defined.
This is the most complex pattern in this book, and it’s not something to throw intoyour game lightly. Use it when you have a lot of behavior you need to define andyour game’s implementation language isn’t a good fit because:
It’s too low-level, making it tedious or error-prone to program in.
Iterating on it takes too long due to slow compile times or other tooling issues.
It has too much trust. If you want to ensure the behavior being defined can’t break the game, you need to sandbox it from the rest of the codebase.
Of course, that list describes a bunch of your game. Who doesn’t want a fasteriteration loop or more safety? However, that doesn’t come for free. Bytecode isslower than native code, so it isn’t a good fit for performance-critical partsof your engine.
There’s somethingseductive about creating yourown language or system-within-a-system. I’ll be doing a minimal example here,but in the real world, these things tend to grow like vines.
For me, game development is seductive in the same way. In both cases, I’mstriving to create a virtual space for others to play and be creative in.
Every time I see someone define a little language or a scripting system, theysay, “Don’t worry, it will be tiny.” Then, inevitably, they add more and morelittle features until it’s a full-fledgedlanguage.Except, unlike some other languages, it grew in an ad-hoc, organic fashion andhas all of the architectural elegance of a shanty town.
For example, see every templating language ever.
Of course, there’s nothingwrong with making a full-fledged language. Justmake sure you do so deliberately. Otherwise, be very careful to control thescope of what your bytecode can express. Put a short leash on it before it runsaway from you.
Low-level bytecode instructions are great for performance, but a binary bytecodeformat isnot what your users are going to author. One reason we’re movingbehavior out of code is so that we can express it at ahigher level. If C++ istoo low-level, making your users effectively write in
Challenging that assertion is the venerable gameRoboWar. In that game,players writelittle programs to control a robot in a language very similar to assembly andthe kind of instruction sets we’ll be discussing here.
It was my first introduction to assembly-like languages.
Much like the Gang of Four’s Interpreter pattern, it’s assumed that you alsohave some way togenerate the bytecode. Usually, users author their behaviorin some higher-level format, and a tool translates that to the bytecode that ourvirtual machine understands. In other words, a compiler.
I know, that sounds scary. That’s why I’m mentioning it here. If you don’t havethe resources to build an authoring tool, then bytecode isn’t for you. But aswe’ll see later, it may not be as bad as you think.
Programming is hard. We know what we want the machine to do, but we don’t alwayscommunicate that correctly — we write bugs. To help find and fix those, we’veamassed a pile of tools to understand what our code is doing wrong, and how toright it. We have debuggers, static analyzers, decompilers, etc. All of those tools aredesigned to work with some existing language: either machine code or somethinghigher level.
When you define your own bytecode VM, you leave those tools behind. Sure, youcan step through the VM in your debugger, but that tells you what the VMitself is doing, and not what the bytecode it’s interpreting is up to. Itcertainly doesn’t help you map that bytecode back to the high-level form it wascompiled from.
If the behavior you’re defining is simple, you can scrape by without too muchtooling to help you debug it. But as the scale of your content grows, plan toinvest real time into features that help users see what their bytecode is doing.Those features might notship in your game, butthey’ll be critical to ensure that you actuallycan ship your game.
Of course, if you want your game to be moddable, then youwill ship thosefeatures, and they’ll be even more important.
After the previous couple of sections, you might be surprised howstraightforward the implementation is. First, we need to craft an instructionset for our VM. Before we start thinking about bytecode and stuff, let’s justthink about it like an API.
If we were defining spells in straight C++ code, what kind of API would we needfor that code to call into? What are the basic operations in the game enginethat spells are defined in terms of?
Most spells ultimately change one of the stats of a wizard, so we’ll start witha couple for that:
voidsetHealth(intwizard,intamount);voidsetWisdom(intwizard,intamount);voidsetAgility(intwizard,intamount);
The first parameter identifies which wizard is affected, say0
for theplayer’s and1
for their opponent. This way, healing spells can affect theplayer’s own wizard, while damaging attacks harm their nemesis. These threelittle methods cover a surprisingly wide variety of magical effects.
If the spells just silently tweaked stats, the game logic would be fine, butplaying it would bore players to tears. Let’s fix that:
voidplaySound(intsoundId);voidspawnParticles(intparticleType);
These don’t affect gameplay, but they crank up the intensity of the gameplayexperience. We could add more for camera shake, animation, etc., but this isenough to get us started.
Now let’s see how we’d turn thisprogrammatic API into something that can becontrolled from data. Let’s start small and then we’ll work our way up to thewhole shebang. For now, we’ll ditch all of the parameters to these methods.We’ll say theset___()
methods always affect the player’s own wizard andalways max out the stat. Likewise, the FX operations always play a singlehard-coded sound and particle effect.
Given that, a spell is just a series of instructions. Each one identifies whichoperation you want to perform. We can enumerate them:
enumInstruction{INST_SET_HEALTH=0x00,INST_SET_WISDOM=0x01,INST_SET_AGILITY=0x02,INST_PLAY_SOUND=0x03,INST_SPAWN_PARTICLES=0x04};
To encode a spell in data, we store an array ofenum
values. We’ve only gota few different primitives, so the range ofenum
values easily fits into a byte.This means the code for a spell is just a list ofbytes — ergo“bytecode”.
Some bytecode VMs use more than a single byte for each instruction and have morecomplicated rules for how they are decoded. Actual machine code on common chipslike x86 is a good bit more complex.
But a single byte is good enough for theJava VirtualMachine and Microsoft’sCommon Language Runtime,which forms the backbone of the .NET platform, and it’s good enough for us.
To execute a single instruction, we see which primitive it is and dispatch tothe right API method:
switch(instruction){caseINST_SET_HEALTH:setHealth(0,100);break;caseINST_SET_WISDOM:setWisdom(0,100);break;caseINST_SET_AGILITY:setAgility(0,100);break;caseINST_PLAY_SOUND:playSound(SOUND_BANG);break;caseINST_SPAWN_PARTICLES:spawnParticles(PARTICLE_FLAME);break;}
In this way, our interpreter forms the bridge between code world and data world.We can wrap this in a little VM that executes an entire spell like so:
classVM{public:voidinterpret(charbytecode[],intsize){for(inti=0;i<size;i++){charinstruction=bytecode[i];switch(instruction){// Cases for each instruction...}}}};
Type that in and you’ll have written your first virtual machine. Unfortunately,it’s not very flexible. We can’t define a spell that touches the player’sopponent or lowers a stat. We can only play one sound!
To get something that starts to have the expressive feel of an actual language,we need to get parameters in here.
To execute a complex nested expression, you start with the innermostsubexpressions. You calculate those, and the results flow outward as arguments tothe expressions that contain them until eventually, the whole expression has beenevaluated.
The Interpreter pattern models this explicitly as a tree of nested objects, butwe want the speed of a flat list of instructions. We still need to ensureresults from subexpressions flow to the right surrounding expressions. But,since our data is flattened, we’ll have to use theorder of the instructionsto control that. We’ll do it the same way your CPU does —
This architecture is unimaginatively called astackmachine. Programming languageslikeForth,PostScript, andFactor expose thismodel directly to the user.
classVM{public:VM():stackSize_(0){}// Other stuff...private:staticconstintMAX_STACK=128;intstackSize_;intstack_[MAX_STACK];};
The VM maintains an internal stack of values. In our example, the only kinds ofvalues our instructions work with are numbers, so we can use a simple arrayofint
s. Whenever a bit of data needs to work its way from one instruction toanother, it gets there through the stack.
Like the name implies, values can be pushed onto or popped off of the stack, solet’s add a couple of methods for that:
classVM{private:voidpush(intvalue){// Check for stack overflow.assert(stackSize_<MAX_STACK);stack_[stackSize_++]=value;}intpop(){// Make sure the stack isn't empty.assert(stackSize_>0);returnstack_[--stackSize_];}// Other stuff...};
When an instruction needs to receive parameters, it pops them off the stack likeso:
switch(instruction){caseINST_SET_HEALTH:{intamount=pop();intwizard=pop();setHealth(wizard,amount);break;}caseINST_SET_WISDOM:caseINST_SET_AGILITY:// Same as above...caseINST_PLAY_SOUND:playSound(pop());break;caseINST_SPAWN_PARTICLES:spawnParticles(pop());break;}
To get some valuesonto that stack, we need one more instruction: a literal.It represents a raw integer value. But where doesit get its value from? Howdo we avoid some turtles-all-the-way-down infinite regress here?
The trick is to take advantage of the fact that our instruction stream is asequence of bytes — we can stuff the number directly in the byte array.We define another instruction type for a number literal like so:
caseINST_LITERAL:{// Read the next byte from the bytecode.intvalue=bytecode[++i];push(value);break;}
Here, I’m reading a single byte for the value to avoid the fiddly coderequired to decode a multiple-byte integer, but in a real implementation, you’llwant to support literals that cover your full numeric range.
It reads the nextbyte in the bytecode streamas anumber and pushes it onto the stack.
Let’s string a few of these instructions together and watch the interpreterexecute them to get a feel for how the stack works. We start with an empty stackand the interpreter pointing to the first instruction:
First, it executes the firstINST_LITERAL
. That reads the next byte from thebytecode (0
) and pushes it onto the stack:
Then, it executes the secondINST_LITERAL
. That reads the10
and pushes it:
Finally, it executesINST_SET_HEALTH
. That pops10
and stores it inamount
, then pops0
and stores it inwizard
. Then, it callssetHealth()
with those parameters.
Ta-da! We’ve got a spell that sets the player’s wizard’s health to ten points.Now, we’ve got enough flexibility to define spells that set either wizard’s statsto whatever amounts we want. We can also play different sounds and spawnparticles.
But… this still feels like adata format. We can’t, for example, raisea wizard’s health by half of their wisdom. Our designers want to be able toexpressrules for spells, not justvalues.
If we think of our little VM like a programming language, all it supports now isa couple of built-in functions and constant parameters for them. To get bytecodeto feel likebehavior, what we’re missing iscomposition.
Our designers need to be able to create expressions that combine differentvalues in interesting ways. For a simple example, they want spells that modify astatby a certain amount instead ofto a certain amount.
That requires taking into account a stat’s current value. We have instructionsforwriting a stat, but we need to add a couple toread stats:
caseINST_GET_HEALTH:{intwizard=pop();push(getHealth(wizard));break;}caseINST_GET_WISDOM:caseINST_GET_AGILITY:// You get the idea...
As you can see, these work with the stack in both directions. They pop aparameter to determine which wizard to get the stat for, and then they look up the stat’svalue and push that back onto the stack.
This lets us write spells that copy stats around. We could create a spell thatset a wizard’s agility to their wisdom or a strange incantation that set onewizard’s health to mirror his opponent’s.
Better, but still quite limited. Next, we need arithmetic. It’s time our baby VMlearned how to add 1 + 1. We’ll add a few more instructions. By now, you’veprobably got the hang of it and can guess how they look. I’ll just showaddition:
caseINST_ADD:{intb=pop();inta=pop();push(a+b);break;}
Like our other instructions, it pops a couple of values, does a bit of work, andthen pushes the result back. Up until now, every new instruction gave us anincremental improvement in expressiveness, but we just made a big leap. It isn’tobvious, but we can now handle all sorts of complicated, deeply nestedarithmetic expressions.
Let’s walk through a slightly more complex example. Say we want a spell thatincreases the player’s wizard’s health by the average of their agility andwisdom. In code, that’s:
setHealth(0,getHealth(0)+(getAgility(0)+getWisdom(0))/2);
You might think we’d need instructions to handle the explicit grouping thatparentheses give you in the expression here, but the stack supports thatimplicitly. Here’s how you could evaluate this by hand:
Do you see all of those “remembers” and “recalls”? Each “remember” correspondsto a push, and the “recalls” are pops. That means we can translate this tobytecode pretty easily. For example, the first line to get the wizard’s currenthealth is:
LITERAL 0GET_HEALTH
This bit of bytecode pushes the wizard’s health onto the stack. If wemechanically translate each line like that, we end up with a chunk of bytecodethat evaluates our original expression. To give you a feel for how theinstructions compose, I’ve done that below.
To show how the stack changes over time, we’ll walk through a sample executionwhere the wizard’s current stats are 45 health, 7 agility, and 11 wisdom. Nextto each instruction is what the stack looks like after executing it and then alittle comment explaining the instruction’s purpose:
LITERAL 0 [0] # Wizard indexLITERAL 0 [0, 0] # Wizard indexGET_HEALTH [0, 45] # getHealth()LITERAL 0 [0, 45, 0] # Wizard indexGET_AGILITY [0, 45, 7] # getAgility()LITERAL 0 [0, 45, 7, 0] # Wizard indexGET_WISDOM [0, 45, 7, 11] # getWisdom()ADD [0, 45, 18] # Add agility and wisdomLITERAL 2 [0, 45, 18, 2] # DivisorDIVIDE [0, 45, 9] # Average agility and wisdomADD [0, 54] # Add average to current healthSET_HEALTH [] # Set health to result
If you watch the stack at each step, you can see how data flows through italmost likemagic. We push0
for the wizardindex at the beginning, and it just hangs around at the bottom of the stack untilwe finally need it for the lastSET_HEALTH
at the end.
Maybe my threshold for “magic” is a little too low here.
I could keep going, adding more and more instructions, but this is a good placeto stop. As it is, we’ve got a nice little VM that lets us define fairlyopen-ended behavior using a simple, compact data format. While “bytecode” and“virtual machines” sound intimidating, you can see they’re often as simple as astack, a loop, and a switch statement.
Remember our original goal to have behavior be nicely sandboxed? Now that you’veseen exactly how the VM is implemented, it’s obvious that we’ve accomplishedthat. The bytecode can’t do anything malicious or reach out into weird parts ofthe game engine because we’ve only defined a few instructions that touch therest of the game.
We control how much memory it uses by how big of a stack we create, and we’recareful to make sure it can’t overflow that. We can even
Controlling execution time isn’t necessary in our sample because we don’t haveany instructions for looping. We could limit execution time by limiting thetotal size of the bytecode. This also means our bytecode isn’t Turing-complete.
There’s just one problem left: actually creating the bytecode. So far, we’vetaken bits of pseudocode and compiled them to bytecode by hand. Unless you’vegot alot of free time, that’s not going to work in practice.
One of our initial goals was to have ahigher-level way to author behavior,but we’ve gone and created somethinglower-level than C++. It has the runtimeperformance and safety we want, but absolutely none of the designer-friendlyusability.
To fill that gap, we need some tooling. We need a program that lets users definethe high-level behavior of a spell and then takes that and generates theappropriate low-level stack machine bytecode.
That probably sounds way harder than making the VM. Many programmers weredragged through a compilers class in college and took away from it nothing butPTSD triggered by the sight of abook with a dragonon the cover or the words “lex”and “yacc”.
I’m referring, of course, to the classic textCompilers: Principles,Techniques, and Tools.
In truth, compiling a text-based language isn’t that bad, though it’s abittoo broad of a topic to cram in here. However, you don’t have to do that. What Isaid we need is atool — it doesn’t have to be acompiler whose inputformat is atext file.
On the contrary, I encourage you to consider building a graphical interface tolet users define their behavior, especially if the people using it won’t behighly technical. Writing text that’s free of syntax errors is difficult forpeople who haven’t spent years getting used to a compiler yelling at them.
Instead, you can build an app that lets users “script” by clicking and dragginglittle boxes, pulling down menu items, or whatever else makes sense for thekind of behavior you want them to create.
The scripting system I wrote forHenry Hatsworth in the PuzzlingAdventure worked like this.
The nice thing about this is that your UI can make it impossible for users tocreate“invalid” programs. Instead of vomiting errormessages on them, you can proactively disable buttons or provide defaultvalues to ensure that the thing they’ve created is valid at all points in time.
I want to stress how important error-handling is. As programmers, we tend toview human error as a shameful personality flaw that we strive to eliminate inourselves.
To make a system that users enjoy, you have to embrace their humanity,including their fallibility. Making mistakes is what people do, and is afundamental part of the creative process. Handling them gracefully with featureslike undo helps your users be more creative and create better work.
This spares you from designing a grammar and writing a parser for a littlelanguage. But, I know, some of you find UI programming equally unpleasant. Well,in that case, I don’t have any good news for you.
Ultimately, this pattern is about expressing behavior in a user-friendly,high-level way. You have to craft the user experience. To execute the behaviorefficiently, you then need to translate that into a lower-level form. It is realwork, but if you’re up to the challenge, it can pay off.
Itried to keep this chapter as simple as I could,but what we’re really doing is creating a language. That’s a pretty open-endeddesign space. Exploring it can be tons of fun, so make sure you don’t forget tofinish your game.
Since this is the longest chapter in the book, it seems I failed that task.
Bytecode VMs come in two main flavors: stack-based and register-based. In astack-based VM, instructions always work from the top of the stack, like in oursample code. For example,INST_ADD
pops two values, adds them, and pushes theresult.
Register-based VMs still have a stack. The only difference is that instructionscan read their inputs from deeper in the stack. Instead ofINST_ADD
alwayspoppingits operands, it has two indexes stored in the bytecode that identify where inthe stack to read the operands from.
With a stack-based VM:
Instructions are small. Since each instruction implicitly finds its arguments on top of the stack, you don’t need to encode any data for that. This means each instruction can be pretty small, usually a single byte.
Code generation is simpler. When you get around to writing the compiler or tool that outputs bytecode, you’ll find it simpler to generate stack-based bytecode. Since each instruction implicitly works from the top of the stack, you just need to output instructions in the right order to pass parameters between them.
You have more instructions. Each instruction only sees the very top of the stack. This means that to generate code for something likea = b + c
, you need separate instructions to moveb
andc
to the top of the stack, perform the operation, then move the result intoa
.
With a register-based VM:
Instructions are larger. Since instructions need arguments for stack offsets, a single instruction needs more bits. For example, an instruction inLua — probably the most well-known register-based VM — is a full 32-bits. It uses 6 bits for the instruction type, and the rest are arguments.
The Lua folks don’t specify Lua’s bytecode format, and it changes from version toversion. What I’m describing here is true as of Lua 5.1. For anabsolutely amazing deep dive into Lua’s internals, readthis.
You have fewer instructions. Since each instruction can do more work, you don’t need as many of them. Some say you get a performance improvement since you don’t have to shuffle values around in the stack as much.
So which should you do? My recommendation is to stick with a stack-based VM.They’re simpler to implement and much simpler to generate code for.Register-based VMs got a reputation for being a bit faster after Lua convertedto that style, but it dependsdeeply on your actual instructions and on lots ofother details of your VM.
Your instruction set defines the boundaries of what can and cannot be expressedin bytecode, and it also has a big impact on the performance of your VM. Here’s alaundry list of the different kinds of instructions you may want:
External primitives. These are the ones that reach out of the VM into the rest of the game engine and do stuff that the user can see. They control what kinds of real behavior can be expressed in bytecode. Without these, your VM can’t do anything more than burn CPU cycles.
Internal primitives. These manipulate values inside the VM — things like literals, arithmetic, comparison operators, and instructions that juggle the stack around.
Control flow. Our example didn’t cover these, but when you want behavior that’s imperative and conditionally executes instructions or loops and executes instructions more than once, you need control flow. In the low-level language of bytecode, they’re surprisingly simple: jumps.
In our instruction loop, we had an index to track where we were in thebytecode. All a jump instruction does is modify that variable and changewhere we’re currently executing. In other words, it’s agoto
. You canbuild all kinds of higher-level control flow using that.
Abstraction. If your users start defining alot of stuff in data, eventually they’ll want to start reusing bits of bytecode instead of having to copy and paste it. You may want something like callable procedures.
In their simplest form, procedures aren’t much more complex than a jump. The onlydifference is that the VM maintains a secondreturn stack. When itexecutes a “call” instruction, it pushes the current instruction index ontothe return stack and then jumps to the called bytecode. When it hits a “return”,the VM pops the index from the return stack and jumps back to it.
Our sample VM only works with one kind of value, integers. That makes answering this easy —the stack is just a stack ofint
s. A more full-featured VM will supportdifferent data types: strings, objects, lists, etc. You’ll have to decide howthose are stored internally.
A single datatype:
It’s simple. You don’t have to worry about tagging, conversions, or type-checking.
You can’t work with different data types. This is the obvious downside. Cramming different types into a single representation — think storing numbers as strings — is asking for pain.
A tagged variant:
This is the common representation for dynamically typed languages. Everyvalue has two pieces. The first is a type tag — anenum
— that identifieswhat data type is being stored. The rest of the bits are then interpretedappropriately according to that type, like:
enumValueType{TYPE_INT,TYPE_DOUBLE,TYPE_STRING};structValue{ValueTypetype;union{intintValue;doubledoubleValue;char*stringValue;};};
Values know their type. The nice thing about this representation is that you can check the type of a value at runtime. That’s important for dynamic dispatch and for ensuring that you don’t try to perform operations on types that don’t support it.
It takes more memory. Every value has to carry around a few extra bits with it to identify its type. In something as low-level as a VM, a few bits here and there add up quickly.
An untagged union:
This uses a union like the previous form, but it doesnot have a type tagthat goes along with it. You have a little blob of bits that could representmore than one type, and it’s up to you to ensure you don’t misinterpretthem.
This is howstatically typed languages representthings in memory. Since the type system ensures at compile time that youaren’t misinterpreting values, you don’t need to validate it at runtime.
This is also howuntyped languages like assembly and Forth store values.Those languages leave it to theuser to make sure they don’t write codethat misinterprets a value’s type. Not for the faint of heart!
It’s compact. You can’t get any more efficient than storing just the bits you need for the value itself.
It’s fast. Not having type tags implies you’re not spending cycles checking them at runtime either. This is one of the reasons statically typed languages tend to be faster than dynamic ones.
It’s unsafe.This is the real cost, of course. A bad chunk of bytecode that causes you to misinterpret a value and treat a number like a pointer or vice versa can violate the security of your game or make it crash.
If your bytecode was compiled from a statically typed language, youmight think you’re safe here because the compiler won’t generate unsafebytecode. That may be true, but remember that malicious users may hand-craftevil bytecode without going through your compiler.
That’s why, for example, the Java Virtual Machine has to dobytecodeverification when it loads a program.
An interface:
The object-oriented solution for a value that maybe be one of severaldifferent types is through polymorphism. An interface provides virtualmethods for the various type tests and conversions, along the lines of:
classValue{public:virtual~Value(){}virtualValueTypetype()=0;virtualintasInt(){// Can only call this on ints.assert(false);return0;}// Other conversion methods...};
Then you have concrete classes for each specific data type, like:
classIntValue:publicValue{public:IntValue(intvalue):value_(value){}virtualValueTypetype(){returnTYPE_INT;}virtualintasInt(){returnvalue_;}private:intvalue_;};
It’s open-ended. You can define new value types outside of the core VM as long as they implement the base interface.
It’s object-oriented. If you adhere to OOP principles, this does things the “right” way and uses polymorphic dispatch for type-specific behavior instead of something like switching on a type tag.
It’s verbose. You have to define a separate class with all of the associated ceremonial verbiage for each data type. Note that in the previous examples, we showed the entire definition ofall of the value types. Here, we only cover one!
It’s inefficient. To get polymorphism, you have to go through a pointer, which means even tiny values like Booleans and numbers get wrapped in objects that are allocated on the heap. Every time you touch a value, you have to do a virtual method call.
In something like the core of a virtual machine, small performance hitslike this quickly add up. In fact, this suffers from many of theproblems that caused us to avoid the Interpreter pattern, except now theproblem is in ourvalues instead of ourcode.
My recommendation is that if you can stick with a single data type, do that. Otherwise,do a tagged union. That’s what almost every language interpreter in the worlddoes.
I saved the most important question for last. I’ve walked you through the codetoconsume andinterpret bytecode, but it’s up to you to build something toproduce it. The typical solution here is to write a compiler, but it’s not theonly option.
If you define a text-based language:
You have to define a syntax. Both amateur and professional language designers categorically underestimate how difficult this is to do. Defining a grammar that makes parsers happy is easy. Defining one that makesusers happy ishard.
Syntax design is user interface design, and that process doesn’t geteasier when you constrain the user interface to a string of characters.
You have to implement a parser. Despite their reputation, this part is pretty easy. Either use a parser generator like ANTLR or Bison, or — like I do — hand-roll a little recursive descent one, and you’re good to go.
You have to handle syntax errors. This is one of the most important and most difficult parts of the process. When users make syntax and semantic errors — which they will, constantly — it’s your job to guide them back onto the right path. Giving helpful feedback isn’t easy when all you know is that your parser is sitting on some unexpected punctuation.
It will likely turn off non-technical users. We programmers like text files. Combined with powerful command-line tools, we think of them as the LEGO blocks of computing — simple, but easily composable in a million ways.
Most non-programmers don’t think of plaintext like that. To them, textfiles feel like filling in tax forms for an angry robotic auditor that yells atthem if they forget a single semicolon.
If you define a graphical authoring tool:
You have to implement a user interface. Buttons, clicks, drags, stuff like that. Some cringe at the idea of this, but I personally love it. If you go down this route, it’s important to treat designing the user interface as a core part of doing your job well — not just an unpleasant task to be muddled through.
Every little bit of extra work you do here will make your tool easierand more pleasant to use, and that directly leads to better content inyour game. If you look behind many of the games you love, you’ll oftenfind the secret was fun authoring tools.
You have fewer error cases. Because the user is building behavior interactively one step at a time, your application can guide them away from mistakes as soon as they happen.
With a text-based language, the tool doesn’t seeany of the user’scontent until they throw an entire file at it. That makes it harder toprevent and handle errors.
Portability is harder. The nice thing about text compilers is that text files areuniversal. A simple compiler just reads in one file and writes one out. Porting that across operating systems is trivial.
Except for line endings. And encodings.
When you’re building a UI, you have to choose which framework to use,and many of those are specific to one OS. There are cross-platform UItoolkits too, but those often get ubiquity at the expense offamiliarity — they feel equally foreign on all of platforms.
This pattern’s close sister is the Gang of Four’sInterpreter pattern. Both give you a way to express composable behavior in terms of data.
In fact, you’ll often end up usingboth patterns. The tool you use togenerate bytecode will have an internal tree of objects that represents thecode. This is exactly what the Interpreter pattern expects.
In order to compile that to bytecode, you’ll recursively walk the tree, justlike you do to interpret it with the Interpreter pattern. Theonlydifference is that instead of executing a primitive piece of behaviorimmediately, you output the bytecode instruction to perform that later.
TheLua programming language is the most widely used scripting language in games. It’s implemented internally as a very compact register-based bytecode VM.
Kismet is a graphical scripting tool built into UnrealEd, the editor for the Unreal engine.
My own little scripting language,Wren, is a simple stack-based bytecode interpreter.