Programming game AI is hard. Like, really hard. When dealing with hard problems, humans usually try to come up with patterns, frameworks, and guidelines, that help minimize errors and keep the cognitive load at a manageable level. In the case of game AI there are (apart from the obvious "no pattern" pattern) many popular choices: state machines, hierarchical state machines, utility systems, behavior trees, and others.
They all have their pros and cons; I won't talk about these here, mainly for the reason that I have zero experience with most of these. See these two streams by Bobby Anguelov about these patterns and what types of AI each of them is good for. See also this huge collection of articles on game AI.
Among all the AI patterns, behavior trees seemed most appealing to me, for a number of reasons:
So, I decided to try this approach and, naturally, to design my own implementation from scratch.
Why make my own library, if there are already good implementations available? The answer is the same as to why I'm making my own game engine in the first place: it's interesting, it's fun, it's a tremendeously valuable exprience, and I simply enjoy doing this.To begin with, what even are behavior trees? The idea goes roughly like this:
So, you implement some concrete nodes for your specific AI needs, combine them somehow using generic aggregation nodes (hopefully provided by your behavior trees library) and you have your tree ready to go.
The crucial thing about AI is that it is asynchronous: you can't just call it once and call it a day, you need to continuously run it. Coroutines or fibers would be perfect for this, but they are close to impossible to serialize (save & load) in a cross-platform way and hard to control precisely. Thus, behavior tree implementations usually use a tick-based approach, in some sense creating explicit coroutines: each behavior tree node has an update()
function, and every frame (or every N frames) of game simulation the current active node's update()
is called. The update()
may have some float dt
parameter so that the node knows how much time elapsed from the previous update. Inner tree nodes' update()
can call their children's update()
and return something based on what their children's status was.
To communicate with the rest of the tree, the node must be able to tell it's status: is it still running its stuff, or has it succeeded in doing its stuff, or has it failed? We can achieve this by simply returning this status from the update function.
So, putting all this together, a typical behavior tree node is a class that looks something like this:
enum class status
{
running,
success,
failure,
};
struct node
{
status update(float dt)
{
// do stuff and return status
}
// ...
};
A concrete node for, say, cutting down a tree may look like this:
struct cut_tree_node
{
status update(float dt)
{
if (!tree_.valid())
{
// the tree is gone! maybe it was burned, or
// someone has already cut it down
return status::failure;
}
tree_.health -= woodcutter_.cutting_speed * dt;
if (tree_.health < 0.f)
{
woodcutter_.inventory.add_resources(tree_.resources);
tree_.remove();
return status::success;
}
return status::running;
}
private:
entity tree_;
entity woodcutter_;
};
and an aggregate node that runs all child nodes sequentially might look like this:
struct sequence_node
{
status update(float dt)
{
if (current_index_ == children_.size())
{
current_index_ = 0;
return status::success;
}
status result = children_[current_index_].update(dt);
if (result == status::running)
{
// the child node is still running; thus, we are running as well
return status::running;
}
if (result == status::failure)
{
// the child node has failed; thus, the whole sequence has failed
return status::failure;
}
// the child succeeded; thus, we proceed to the next child and say
// that we still have work to do
++current_index_;
return status::running;
}
private:
std::size_t current_index_ = 0;
std::vector<node> children_;
};
You can see that such a simple idea of running a set of actions sequentially already takes quite a handful of code to implement. This is not a downside of behavior trees specifically, though: running a sequence of asynchronous, potentially failing actions is not as simple as you might think! The strength of behavior trees comes in part from this separation & reusability of generic aggregate nodes. I'll list a few of them:
sequence
node, which runs all children in sequence and succeeds if all children succeededselector
node, which runs all children in sequence until at least one of them succeds (otherwise, the node fails)retry
node, which runs it's child node until it succeds (and ignores all it's failures; that would be a great trait for a parent or a teacher, huh?)repeat
node, which runs it's child node while it succeds (and stops once it fails)success
node, which runs it's child until it finishes (successfully or otherwise), and reports that it succeededfailure
node, which runs it's child until it finishes (successfully or otherwise), and reports that it failednegate
node, which runs it's child until it finishes, then negates the child's result (reports success if the child failed, and vice versa)forever
node, which simply runs it's child forever, ignoring it's status (useful for e.g. the root of the tree if your AI should be active non-stop)nop
node, which does nothing (may be useful as a fake child to other aggregate nodes)You might notice that the forever(child)
node can be implemented as retry(failure(child))
or repeat(success(child))
. This compositional strength is probably what caught my eye with behavior trees in the first place. We can, for example, create a repeat_until(condition, child)
generic node that returns success if the condition node returns success, otherwise it calls the child node and reports failure if it fails:
repeat_until(condition, child) =
sequence(
repeat(
sequence(
child,
negate(condition)
)
),
condition
)
child
node succeeds, we check for condition:condtion
node returns success, the negate
node returns failure, the inner sequence
node returns failure, and the repeat
node stops repeating (i.e. calling the inner sequence
node)condtion
node returns failure, the negate
node returns success, the inner sequence
node returns success, and the repeat
node continues calling it's child (the inner sequence
node)child
node fails, the inner sequence
node returns failure, and the repeat
node stops repeatingrepeat
node stops, we check for the condition again, and the root sequence
node returns whatever the second condition
node returned:child
node must have returned failure (otherwise the repeat
would've still been repeating), and we return failureAll this might seem convoluted and definitely takes some time to get used to. However, composing simple to test atomic & aggregate nodes is way less error-prone than coding everything from scratch, and is arguably way more readable.
Note, however, that this implementation of repeat_until
doesn't handle the situation when child
failed but the condition
is still somehow satisfied: it returns success in this case. This may be the desired behavior or it may not be. This example highlights a certain problem with behavior trees which tends to come up every now and then: there's no simple way to communicate nontrivial information across the tree (e.g. from a child's child to the root, etc) – a problem that would probably not arise in e.g. state machines. I never said behavior trees are perfect, did I? In this particular case, we could just implement the repeat_until
node from scratch, though, and make it work exactly as we want it to.
For many nodes, it is quite useful to know that a certain update()
call is the first one since this node has last finished executing, i.e. that this node is just starting to run, to prepare/fetch some useful data, check some conditions, etc. There are two approaches to managing that:
update()
is called and the node has finished, it means the node is started running againstart()
method for each node which gets called when the node is, well, startedThe downside of the first approach is that you have to duplicate this logic in every node that needs it. The downside of the second approach is that all generic aggregate nodes must carefully implement calling their children's start()
method in appropriate times.
For no particular reason, I've chosen the second approach, meaning the interface of a node becomes
struct node
{
void start();
status update(float dt);
};
Let's upgrade our status
enum with a new value: waiting
. This value should also come with a time period, like waiting(1.5f)
meaning wait for 1.5 seconds. Semantically, this status is the same as running
, but with an explicit hint that the node can be assumed to still be running for this amount of time (at least). The cool thing is that if the tree returns waiting
status, we can skip all updates to it entirely for that duration, because we already know that it will still be running! This is a nice optimization, since AI is usually quite computationally heavy, and even traversing the tree all the way down to the node that is currently active and checking that it's still just running comes at a cost.
We can actually do much more: store all the waiting
nodes in a priority queue, sorted by the time at which they need to be woken up. At each new AI update tick, we only call update()
for non-waiting nodes, and also "wake up" the nodes whose time has come to be woken up (which is fast thanks to the priority queue). This way, you can not only skip calling the update()
for waiting nodes, but you can skip iterating over them! This is a huge optimization, since, if done right, most of the time most of the AI agents will be just waiting for something. Moving to the next path node? Takes at least distance / speed
time, so the AI just waits (actual movement & animation should be governed by some lower-level systems, not AI). Cooking a meal? Takes cooking_time
time; wait for that long. Wandering around? Stop and wait for a random time period to make it look like you're admiring the view. I've found that even in a system with a few hundred agents, only a few (3-5) of them are actually active at a certain update; others are waiting!
Now, this waiting
status can't be just another enum
value, since it comes with a time duration value. We could do something like
struct status
{
enum {
running,
success,
failure,
} status;
float duration;
};
But it's 2022 already and C++ has standard sum types, so I'm doing it like this (also merging the success
and failure
states together):
struct running{};
struct finished
{
// true for success, false for failure
bool result;
};
struct waiting
{
float duration;
};
using status = std::variant<running, finished, waiting>;
Some behavior tree implementations can have more status values. For example, some have an idle
status, meaning this node isn't running at all. I simply consider calling an update()
for a node that wasn't started a programming error; this might sound harder to debug, but honestly I haven't found any issues with this whatsoever (or maybe I'm misunderstanding the idle
status? Who knows).
The final touch for our behavior tree node interface is responding to events. Say, your poor AI is cutting down a tree, while suddenly it spots an angry woolf nearby. It better run away, or prepare to fight!
How do we implement this in a behavior tree? Surely we could simply check for all such conditions in the beginning of every node's update()
methods (or maybe just the leaf nodes). Needless to say, this is ridiculously wasteful, and doesn't work at all in the presence of the waiting
optimization we talked about above.
A much nicer way is to make the nodes explicitly respond to events – have an on_event()
method that gets the event that happened and decides what to do:
struct node
{
void start();
status update(float dt);
void on_event(event);
};
On a certain event, a node may e.g. record that it needs to fail at the next update, or something like that. We also have to immediately wake up a waiting
node when an event happens, of course.
So, what do I want from my implementation of a generic, reusable behavior trees library? Quite a lot, to be honest:
float
is nice, but what if I'm making a roguelike with discrete equal time steps, and I want to use an integer for time? Or maybe I'm doing some nanorobots simulation and I have to use double
not to lose precision? Or maybe I've gone completely bonkers and my time is a complex
number or a string
? There definitely are use-cases for that, I promise!event
type to be a variant of all possible event types specific to this particular AI, or something like that.update()
. A specific behavior tree runs for a specific agent; it needs access to this agent's data, as well as to some world data like buildings and pathfinder. These could go to the node's constructor, so that it would save them as members and access at any time. A much more flexible and memory-cheap approach is to put them directly to the update()
method parameters at each update: this way, we can change the pathfinder or even the entity this tree controls, and it still should work (in theory)! More importantly, we wouldn't have to store this parameters in every node.repeat_until
implementation above: wouldn't it be neat if this is what it actually looked like, in code? Instead of explicitly creating compound nodes as separate classes with a ton of boilerplate, we'd just write a nice behavior tree (or part of it) directly in code, with a readable & understandable syntax.With these goals in mind, I proceeded to design this library.
Being such a C++ templates admirer, my first idea was to use the full power of compile-time ad-hoc polymorphism this language provides, and make each behavior tree node a separate template class with the interface outlined above, with no virtual methods:
template <typename Time, typename Event, typename ... Args>
struct node
{
void start(Args ... args);
status update(Time dt, Args ... args);
void on_event(Event event, Args ... args);
};
When implementing, say, the repeat
node, we also have to add the child's type to the template parameters:
template <typename Child, typename Time, typename Event, typename ... Args>
struct repeat
{
Child child;
void start(Args ... args);
status update(Time dt, Args ... args);
void on_event(Event event, Args ... args);
};
and the sequence
node would be parametrized by the types of all it's children:
template <typename ... Children, typename Time, typename Event, typename ... Args>
struct sequence
{
std::tuple<Children...> children;
void start(Args ... args);
status update(Time dt, Args ... args);
void on_event(Event event, Args ... args);
};
Now, the children types can be easily deduced thanks to C++17 class template argument deduction, but we still have all the other template parameters, and argument deduction can either deduce all or deduce nothing. Even if we could omit the children types, it would be a nightmare to have to write code like
sequence<float, event, entity_id, ai_context>(
repeat<float, event, entity_id, ai_context>(
...
),
...
)
We clearly need a way to deduce all this parameters as well. I've actually found a nice solution later, which I discuss below in the "Second attempt" section, but at the time I only managed to come up with this:
behavior_tree
which is parametrized by all the common types (Time, Event, Args...
)It looked roughly like this:
template <typename Time, typename Event, typename ... Args>
struct behavior_tree
{
template <typename Child>
struct repeat
{
Child child;
void start(Args ... args);
status update(Time dt, Args ... args);
void on_event(Event event, Args ... args);
};
template <typename ... Children>
struct sequence
{
std::tuple<Children...> children;
void start(Args ... args);
status update(Time dt, Args ... args);
void on_event(Event event, Args ... args);
};
};
Now, to actually use it, you have to inherit the behavior_tree
with the required types, and implement all of your AI inside that derived class:
struct human_behavior_tree
: behavior_tree<float, event, entity_id, ai_context>
{
static auto follow_path()
{
return sequence{
repeat{
...
},
...,
...
};
}
};
So, you'd implement leaf nodes as classes within the human_behavior_tree
class, and compound nodes are constructed via static methods. Here's an actual snippet from a project where I tested this approach on:
static auto do_catch_fish()
{
return sequence{
log("Catching fish"),
ensure_instrument(item::fishing_rod),
condition{[](world * w, int id){
auto & h = w->humans[id];
auto p = w->closest_water(h.position);
if (!p)
return false;
h.target_tile = *p;
return true;
}},
find_path{false},
follow_path(),
wait{[](world * w, int){
return random::uniform(w->rng, FISH_CATCH_INTERVAL);
}},
condition{[](world * w, int id){
auto & h = w->humans[id];
h.left_hand = item{item::fish};
return true;
}},
};
}
Amusingly, it actually works! The benefits of this approach are:
Unfortunately, it also has a number of downsides:
Here's the source for this "library". It consists of a single 800-line file (due to having to implement everything inside the behavior_tree
template class) full of nothing but templates.
The disadvantages of this approach are quite serious, so I decided to abandon it, together with the test project.
All this was somewhere in November 2021. Fast-forward to June 2022, I'm once again working on something with agents doing their typical AI stuff like navigating and building (and I'm much more determined this time, I promise!) So, I decided to give behavior trees another try.
First, remove the child-nodes-as-template-parameters nonsense. Make it an actual interface:
template <typename Time, typename Event, typename ... Args>
struct node
{
virtual void start(Args ... args) = 0;
virtual status update(Time dt, Args ... args) = 0;
virtual void on_event(Event event, Args ... args) = 0;
};
Now, actual nodes implement this interface, are stored by pointers (or, better, unique_ptr
), and created dynamically. We still have the problem deducing the template parameters, though!
That's when it occurred to me that an aggregate node can easily deduce all the tree's template parameters from its child nodes! We'll only have to specify them for leaf nodes, which is not perfect but still bearable. To make things easier, I made something like a traits class that only stores these types:
template <typename Time, typename Event, typename ... Args>
struct tree
{
using time_type = Time;
using event_type = Event;
using args_type = std::tuple<Args...>;
};
Now I can specify the types of a specific AI tree like this:
using droid_ai = tree<float, event, ai_context, entity_id>;
and use this droid_ai
type whenever I need to refer to this particular setup of a behavior tree.
Next, all the behavior tree library templates depend not on the types Time, Event, Args...
themselves, but on this traits type, and use partial specialization:
template <typename Tree>
struct node;
template <typename Time, typename Event, typename ... Args>
struct node<tree<Time, Event, Args...>>
{
struct running
{};
struct finished
{
bool result;
};
struct suspended
{
Time duration;
};
using status = std::variant<running, finished, suspended>;
virtual void start(Args ...) = 0;
virtual status update(Time dt, Args ... args) = 0;
virtual bool event(Event const &, Args ...) = 0;
virtual ~node() {}
};
template <typename Tree>
using node_ptr = std::unique_ptr<node<Tree>>;
Now, to refer to the base node type for droids, I don't have to type node<float, event, ai_context, entity_id>
, I just say node<droid_ai>
, and I think it's beautiful.
Finally, to make aggregate nodes work, I make special constuctor-like functions that help deduce the traits type:
template <typename Tree>
struct repeat_node;
template <typename Time, typename Event, typename ... Args>
struct repeat_node<tree<Time, Event, Args...>>
: node<tree<Time, Event, Args...>>
{
// ...
};
template <typename Tree>
node_ptr<Tree> repeat(node_ptr<Tree> child)
{
return std::make_unique<repeat_node<Tree>>(std::move(child));
}
And that's it! Implementing generic nodes is still quite a handful of boilerplate (although this time each of them can live in it's own file), but using them is a breeze. Here's the code for the node that tries to find resources required for a droid's task (bt
is the behavior tree library namespace):
bt::node_ptr<droid> get_task_resources()
{
return
bt::sequence(
fill_task_required_resources(),
can_find_task_resources(),
bt::selector(
task_required_resources_empty(),
bt::success(
bt::sequence(
fill_unload_resources_task(),
unload_inventory()
)
)
),
bt::repeat(
bt::sequence(
bt::negate(task_required_resources_empty()),
bt::selector(
select_task_resources_container(),
bt::failure(mark_task_unsupported())
),
wait_for_path(),
follow_path(),
do_get_task_resources(),
wait(interaction_time)
)
),
task_min_required_resources_empty()
);
}
And here's a leaf node that waits for a specific time duration, using the generic bt::wait
library node (note that <droid>
is enough to specify the tree type):
bt::node_ptr<droid> wait(float duration)
{
return bt::wait<droid>([=](droid_context const &, util::ecs::entity_handle){
return duration;
});
}
Pretty neat, I'd say! We've lost the cache and allocator-friendliness, though we could restore them by supplying our own allocator for the whole tree. We've also lost the possibility for heavy inlining, which we can't really do anything about. Having used this for quite a while by now I'd say that it is a pretty good approach, and seems to handle most complicated situations relatively well. Here's the source code for this second approach. It is pretty much self-contained.