Programming game AI is hard. Like, really hard. When dealing with hard problems, humans usually try to come up with patterns, frameworks, and guidelines, that help minimize errors and keep the cognitive load at a manageable level. In the case of game AI there are (apart from the obvious “no pattern” pattern) many popular choices: state machines, hierarchical state machines, utility systems, behavior trees, and others.
They all have their pros and cons; I won’t talk about these here, mainly for the reason that I have zero experience with most of these. See these two streams by Bobby Anguelov about these patterns and what types of AI each of them is good for. See also this huge collection of articles on game AI.
Among all the AI patterns, behavior trees seemed most appealing to me, for a number of reasons:
- They are quite popular, meaning there are established best practices to learn and many implementations to be inspired by
- They are naturally hierarchical, which is good for my little monkey brain
- They are quite performant, with a lot of optimization oppotunities
- They sound like a good programming excercise (yes, I’m the reinvent-the-wheel type of person)
So, I decided to try this approach and, naturally, to design my own implementation from scratch.
Why make my own library, if there are already good implementations available? The answer is the same as to why I’m making my own game engine in the first place: it’s interesting, it’s fun, it’s a tremendeously valuable exprience, and I simply enjoy doing this.
Thie next few sections are more or less an introduction to behavior trees; feel free to skip to “Design goals” if you think you already know all that.
To begin with, what even are behavior trees? The idea goes roughly like this:
- A single AI entity is represented as a tree of nodes
- Nodes do stuff, with only one node being active (doing it’s stuff) at a time
- Leaf nodes typically do concrete stuff, like cutting down a tree or navigating to the next path node
- Inner (non-leaf) nodes typically do aggregation, like retry the child node 3 times until it succeeds, or run all the child nodes sequentially
So, you implement some concrete nodes for your specific AI needs, combine them somehow using generic aggregation nodes (hopefully provided by your behavior trees library) and you have your tree ready to go.
The crucial thing about AI is that it is asynchronous: you can’t just call it once and call it a day, you need to continuously run it. Coroutines or fibers would be perfect for this, but they are close to impossible to serialize (save & load) in a cross-platform way and hard to control precisely. Thus, behavior tree implementations usually use a tick-based approach, in some sense creating explicit coroutines: each behavior tree node has an
update() function, and every frame (or every N frames) of game simulation the current active node’s
update() is called. The
update() may have some
float dt parameter so that the node knows how much time elapsed from the previous update. Inner tree nodes’
update() can call their children’s
update() and return something based on what their children’s status was.
To communicate with the rest of the tree, the node must be able to tell it’s status: is it still running its stuff, or has it succeeded in doing its stuff, or has it failed? We can achieve this by simply returning this status from the update function.
So, putting all this together, a typical behavior tree node is a class that looks something like this:
A concrete node for, say, cutting down a tree may look like this:
and an aggregate node that runs all child nodes sequentially might look like this:
You can see that such a simple idea of running a set of actions sequentially already takes quite a handful of code to implement. This is not a downside of behavior trees specifically, though: running a sequence of asynchronous, potentially failing actions is not as simple as you might think! The strength of behavior trees comes in part from this separation & reusability of generic aggregate nodes. I’ll list a few of them:
sequencenode, which runs all children in sequence and succeeds if all children succeeded
selectornode, which runs all children in sequence until at least one of them succeds (otherwise, the node fails)
retrynode, which runs it’s child node until it succeds (and ignores all it’s failures; that would be a great trait for a parent/teacher, huh?)
repeatnode, which runs it’s child node while it succeds (and stops once it fails)
successnode, which runs it’s child until it finishes (successfully or otherwise), and reports that it succeeded
failurenode, which runs it’s child until it finishes (successfully or otherwise), and reports that it failed
negatenode, which runs it’s child until it finishes, then negates the child’s result (reports success if the child failed, and vice versa)
forevernode, which simply runs it’s child forever, ignoring it’s status (useful for e.g. the root of the tree if your AI should be active non-stop)
nopnode, which does nothing (may be useful as a fake child to other aggregate nodes)
You might notice that the
forever(child) node can be implemented as
repeat(success(child)). This compositional strength is probably what caught my eye with behavior trees in the first place. We can, for example, create a
repeat_until(condition, child) generic node that returns success if the condition node returns success, otherwise it calls the child node and reports failure if it fails:
repeat_until(condition, child) = sequence( repeat( sequence( child, negate(condition) ) ), condition )
- If the
childnode succeeds, we check for condition:
- If it is satisfied, the
condtionnode returns success, the
negatenode returns failure, the inner
sequencenode returns failure, and the
repeatnode stops repeating (i.e. calling the inner
- If it is not satisfied, the
condtionnode returns failure, the
negatenode returns success, the inner
sequencenode returns success, and the
repeatnode continues calling it’s child (the inner
- If it is satisfied, the
- If the
childnode fails, the inner
sequencenode returns failure, and the
repeatnode stops repeating
- After the
repeatnode stops, we check for the condition again, and the root
sequencenode returns whatever the second
- If the condition is satisfied, we are done and return success
- If it isn’t satisfied, the
childnode must have returned failure (otherwise the
repeatwould’ve still been repeating), and we return failure
All this might seem convoluted and definitely takes some time to get used to. However, composing simple to test atomic & aggregate nodes is way less error-prone than coding everything from scratch, and is arguably way more readable.
Note, however, that this implementation of
repeat_until doesn’t handle the situation when
child failed but the
condition is still somehow satisfied: it returns success in this case. This may be the desired behavior or it may not be. This example highlights a certain problem with behavior trees which tends to come up every now and then: there’s no simple way to communicate nontrivial information across the tree (e.g. from a child’s child to the root, etc) – a problem that would probably not arise in e.g. state machines. I never said behavior trees are perfect, did I? In this particular case, we could just implement the
repeat_until node from scratch, though, and make it work exactly as we want it to.
Starting a node
For many nodes, it is quite useful to know that a certain
update() call is the first one since this node has last finished executing, i.e. that this node is just starting to run, to prepare/fetch some useful data, check some conditions, etc. There are two approaches to managing that:
- Each node tracks explicitly whether it has already finished or not; if an
update()is called and the node has finished, it means the node is started running again
- Have an explicit
start()method for each node which gets called when the node is, well, started
The downside of the first approach is that you have to duplicate this logic in every node that needs it. The downside of the second approach is that all generic aggregate nodes must carefully implement calling their children’s
start() method in appropriate times.
For no particular reason, I’ve chosen the second approach, meaning the interface of a node becomes
Let’s upgrade our
status enum with a new value:
waiting. This value should also come with a time period, like
waiting(1.5f) meaning wait for 1.5 seconds. Semantically, this status is the same as
running, but with an explicit hint that the node can be assumed to still be running for this amount of time (at least). The cool thing is that if the tree returns
waiting status, we can skip all updates to it entirely for that duration, because we already know that it will still be running! This is a nice optimization, since AI is usually quite computationally heavy, and even traversing the tree all the way down to the node that is currently active and checking that it’s still just running comes at a cost.
We can actually do much more: store all the
waiting nodes in a priority queue, sorted by the time at which they need to be woken up. At each new AI update tick, we only call
update() for non-waiting nodes, and also “wake up” the nodes whose time has come to be woken up (which is fast thanks to the priority queue). This way, you can not only skip calling the
update() for waiting nodes, but you can skip iterating over them! This is a huge optimization, since, if done right, most of the time most of the AI agents will be just waiting for something. Moving to the next path node? Takes at least
distance / speed time, so the AI just waits (actual movement & animation should be governed by some lower-level systems, not AI). Cooking a meal? Takes
cooking_time time; wait for that long. Wandering around? Stop and wait for a random time period to make it look like you’re admiring the view. I’ve found that even in a system with a few hundred agents, only a few (3-5) of them are actually active at a certain update; others are waiting!
waiting status can’t be just another
enum value, since it comes with a time duration value. We could do something like
But it’s 2022 already and C++ has standard sum types, so I’m doing it like this (also merging the
failure states together):
Some behavior tree implementations can have more status values. For example, some have an
idle status, meaning this node isn’t running at all. I simply consider calling an
update() for a node that wasn’t started a programming error; this might sound harder to debug, but honestly I haven’t found any issues with this whatsoever (or maybe I’m misunderstanding the
idle status? Who knows).
The final touch for our behavior tree node interface is responding to events. Say, your poor AI is cutting down a tree, while suddenly it spots an angry woolf nearby. It better run away, or prepare to fight!
How do we implement this in a behavior tree? Surely we could simply check for all such conditions in the beginning of every node’s
update() methods (or maybe just the leaf nodes). Needless to say, this is ridiculously wasteful, and doesn’t work at all in the presence of the
waiting optimization we talked about above.
A much nicer way is to make the nodes explicitly respond to events – have an
on_event() method that gets the event that happened and decides what to do:
On a certain event, a node may e.g. record that it needs to fail at the next update, or something like that. We also have to immediately wake up a
waiting node when an event happens, of course.
So, what do I want from my implementation of a generic, reusable behavior trees library? Quite a lot, to be honest:
- Arbitrary time type.
floatis nice, but what if I’m making a roguelike with discrete equal time steps, and I want to use an integer for time? Or maybe I’m doing some nanorobots simulation and I have to use
doublenot to lose precision? Or maybe I’ve gone completely bonkers and my time is a
complexnumber or a
string? There definitely are use-cases for that, I promise!
- Arbitrary event type. This one is way less speculative than the time type – how does a generic library know anything about the specific AI event types? Ideally I’d like the
eventtype to be a variant of all possible event types specific to this particular AI, or something like that.
- Arbitrary extra parameters for
update(). A specific behavior tree runs for a specific agent; it needs access to this agent’s data, as well as to some world data like buildings and pathfinder. These could go to the node’s constructor, so that it would save them as members and access at any time. A much more flexible and memory-cheap approach is to put them directly to the
update()method parameters at each update: this way, we can change the pathfinder or even the entity this tree controls, and it still should work (in theory)! More importantly, we wouldn’t have to store this parameters in every node.
- Readable DSL-like tree description. Look at the
repeat_untilimplementation above: wouldn’t it be neat if this is what it actually looked like, in code? Instead of explicitly creating compound nodes as separate classes with a ton of boilerplate, we’d just write a nice behavior tree (or part of it) directly in code, with a readable & understandable syntax.
With these goals in mind, I proceeded to design this library.
Being such a C++ templates admirer, my first idea was to use the full power of compile-time ad-hoc polymorphism this language provides, and make each behavior tree node a separate template class with the interface outlined above, with no virtual methods:
When implementing, say, the
repeat node, we also have to add the child’s type to the template parameters:
sequence node would be parametrized by the types of all it’s children:
Now, the children types can be easily deduced thanks to C++17 class template argument deduction, but we still have all the other template parameters, and argument deduction can either deduce all or deduce nothing. Even if we could omit the children types, it would be a nightmare to have to write code like
We clearly need a way to deduce all this parameters as well. I’ve actually found a nice solution later, which I discuss below in the “Second attempt” section, but at the time I only managed to come up with this:
- Make a class called
behavior_treewhich is parametrized by all the common types (
Time, Event, Args...)
- Implement all the generic aggregate nodes as inner classes within this main class; they implicitly have all the template parameters of the parent
It looked roughly like this:
Now, to actually use it, you have to inherit the
behavior_tree with the required types, and implement all of your AI inside that derived class:
So, you’d implement leaf nodes as classes within the
human_behavior_tree class, and compound nodes are constructed via static methods. Here’s an actual snippet from a project where I tested this approach on:
Amusingly, it actually works! The benefits of this approach are:
- The whole tree is a single enormous object without any pointers to children (they are just class members!), so the allocator & cache are quite happy
- Everything is known at compile-time, meaning the compiler has all the opportunities to optimize, inline, and do other magic
Unfortunately, it also has a number of downsides:
- Everything is known at compile time. I can’t replace part of the tree somewhere mid-game, I can’t make a mod that e.g. replaces just a single part of the tree with a different implementation.
- You have to declare everything in a single class in a single file. C++ doesn’t support splitting a class definition across multiple files; and we need this single class to overcome that type deduction problem I mentioned before (as I’ve said, I’ve actually found a better solution for this particular problem)
- The whole behavior tree takes approximately forever to compile
- I’m relying on deducing template arguments for a class which is nested within another template class. C++ being C++, this is actually a defect in the standard, meaning it isn’t clearly specified how this should work. Clang fails to compile this and it will do so until the defect is resolved. This can be fixed by manually adding deduction guides, which in turn breaks under GCC and this isn’t going to be fixed either, due to the very same defect (without deduction guides, GCC compiles this happily, – this is why I managed to actually use this approach on a test project for quite a while).
Here’s the source for this “library”. It consists of a single 800-line file (due to having to implement everything inside the
behavior_tree template class) full of nothing but templates.
The disadvantages of this approach are quite serious, so I decided to abandon it, together with the test project.
All this was somewhere in November 2021. Fast-forward to June 2022, I’m once again working on something with agents doing their typical AI stuff like navigating and building (and I’m much more determined this time, I promise!) So, I decided to give behavior trees another try.
First, remove the child-nodes-as-template-parameters nonsense. Make it an actual interface:
Now, actual nodes implement this interface, are be stored by pointers (or, better,
unique_ptr), and created dynamically. We still have the problem deducing the template parameters, though!
That’s when it occurred to me that an aggregate node can easily deduce all the tree’s template parameters from its child nodes! We’ll only have to specify them for leaf nodes, which is not perfect but still bearable. To make things easier, I made something like a traits class that only stores these types:
Now I can specify the types of a specific AI tree like this:
and use this
droid_ai type whenever I need to refer to this particular setup of a behavior tree.
Next, all the behavior tree library templates depend not on the types
Time, Event, Args... themselves, but on this traits type, and use partial specialization:
Now, to refer to the base node type for droids, I don’t have to type
node<float, event, ai_context, entity_id>, I just say
node<droid_ai>, and I think it’s beautiful.
Finally, to make aggregate nodes work, I make special constuctor-like functions that help deduce the traits type:
And that’s it! Implementing generic nodes is still quite a handful of boilerplate (although this time each of them can live in it’s own file), but using them is a breeze. Here’s the code for the node that tries to find resources required for a droid’s task (
bt is the behavior tree library namespace):
And here’s a leaf node that waits for a specific time duration, using the generic
bt::wait library node (note that
<droid> is enough to specify the tree type):
Pretty neat, I’d say! We’ve lost the cache and allocator-friendliness, though we could restore them by supplying our own allocator for the whole tree. We’ve also lost the possibility for heavy inlining, which we can’t really do anything about. Having used this for quite a while by now I’d say that it is a pretty good approach, and seems to handle most complicated situations relatively well. Here’s the source code for this second approach. It is pretty much self-contained.