Blood & Honor: Decisive AI’s Latest Challenge

It is no mystery why every new project during the infancy of our Intelligent Artificial Player (IAP) product has been significantly more challenging than the last: the answer is ‘because we like it.’ Our product keeps advancing in complexity and scope as we partner with pioneering video game companies that share our vision of IAPs that are fun, fair and flexible - and that cost a fraction of what traditional hand-coded / hard-coded NPCs cost to produce.

Sandstorm Interactive is certainly one of these partners. And our IAP for their Blood & Honor game is certainly our latest and greatest challenge yet. So what did we do for them? Well, first let me briefly explain the reality of the game from an IAP perspective:

B&H started as a RISK adaptation for mobile. Since then, it has evolved and now combines essentially three stages: (i) selecting a card from your deck, (ii) placing troops, and (iii) three simultaneous attacks. In addition to these, there are currently 13 different objectives that are assigned to players at random and kept secret from the other players, as well as 7 different maps, and as if this wasn’t enough, definitely more to come in future releases.

Given the absolutely massive state space we are talking about here, with its immense variety of combinations and permutations, we inevitably found that no amount of training (the AI playing the game in order to learn the best moves organically through experience) was enough.

The other key thing to understand is that, because the state space is so large, we cannot hope to show the training AI every state, or even a significant part of them (not even 1% of the total!), in the context of a commercially viable product of this type. We do not have infinite computing power to brute-force our way through this, and even if we did, it simply wouldn’t make sense to do it that way due to the insurmountable costs - and let’s not forget that part of the value proposition for our IAP product is affordability in comparison to traditional NPCs.

Enter the well-known conundrum of space selection: We cannot possibly “see” it all, so what do we show? The industry-default and easy-to-implement ‘random’? The carefully selected ‘expert’?

Not random, because the space is simply too large and random sampling would be wasteful. Not expert, because the agent will not be playing experts most of the time, and we know how boring an opponent (artificial or otherwise) can be when they win most of the time.

Now the interesting part: We solved this by splitting the machine learning training cycles into two distinct phases that are then combined at every stage throughout the overall process. First we show the agent several scenarios, then we tell the agent which scenarios are good and which are bad.

So for the first part - showing: Please consider how narrow this aspect is, understanding that randomness cannot be used at the beginning, nor can crappy inefficient bots be used to explore the vast complexity. Picking random scenarios is easy and fast, but it teaches the agent about something that is too far from the reality it will face while actually playing the game.

The key was to start by beating the baseline agent. This gives the AI solid ground to stand on. Then, beating the next level champion becomes possible, and necessary. The idea is to gently guide the agent to places that are more likely to matter; otherwise, imagine the huge waste of time and resources that an unguided agent might produce while stumbling through such a large world in its infancy. And all the while there is true machine learning taking place.
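To make the 'beat the baseline, then beat the next champion' idea concrete, here is a minimal Python sketch of such a ladder. It is only an illustration, not our production code: the names climb_ladder and train_round, the win-rate threshold and the round limit are all assumptions made up for this example.

```python
from typing import Callable, List, Sequence, Tuple


def climb_ladder(
    opponents: Sequence[str],
    train_round: Callable[[str], float],
    win_threshold: float = 0.55,
    max_rounds_per_rung: int = 100,
) -> List[Tuple[str, int]]:
    """Train against each opponent in order, weakest first.

    `train_round` is a hypothetical callback: it runs one round of training
    games against the named opponent and returns the agent's win rate.
    Returns (opponent, rounds_needed) for every rung that was beaten.
    """
    beaten: List[Tuple[str, int]] = []
    for opponent in opponents:
        for round_number in range(1, max_rounds_per_rung + 1):
            if train_round(opponent) >= win_threshold:
                beaten.append((opponent, round_number))
                break
        else:
            # The agent never cleared this rung, so stop climbing rather
            # than face a stronger champion unprepared.
            break
    return beaten
```

Each rung only starts once the previous one has been cleared, which is what keeps the agent in the parts of the game that are likely to matter.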

At this point we noticed that we were showing the agent too many early game states, and very few late game states. We of course realized that this is natural, because all games have a starting point, but few last 30+ turns. To address this slight imbalance we wrote some additional selection algorithms to get a better representation of the whole game. So even though early game states are naturally over-represented, the agent is still exposed to a healthy mix of late game states too.
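One simple way to picture this kind of rebalancing is to group recorded states by how far into the game they occur and then draw evenly from each group. The Python sketch below does exactly that. It is a hedged illustration only: the (turn, state id) representation, the bucket width of 10 turns and the function names are assumptions for the example, not a description of our actual selection algorithms.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: (turn number, opaque state identifier).
State = Tuple[int, str]


def turn_bucket(turn: int, bucket_width: int = 10, last_bucket: int = 3) -> int:
    """Map a turn number to a game-stage bucket; turns 30+ share the last one."""
    return min(turn // bucket_width, last_bucket)


def sample_balanced(
    states: Sequence[State], batch_size: int, rng: random.Random
) -> List[State]:
    """Draw a batch that visits every non-empty stage bucket in round-robin order."""
    buckets: Dict[int, List[State]] = defaultdict(list)
    for state in states:
        buckets[turn_bucket(state[0])].append(state)
    keys = sorted(buckets)
    if not keys:
        return []
    batch: List[State] = []
    for i in range(batch_size):
        # Sample with replacement so a small late-game bucket can still
        # fill its full share of the batch.
        batch.append(rng.choice(buckets[keys[i % len(keys)]]))
    return batch
```

With 90 early states and only 10 late ones, a batch drawn this way is still split evenly between the two stages, at the price of showing the rarer late game states more than once.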

In addition, as the agent progresses, small doses of randomness are added to the mix, allowing the machine to learn what we cannot teach. And as it advances through the ranks, the guiding hand loosens until it’s gone completely, and the true power of AI can evolve into an expert agent that goes on to eventually explore over 60 million states, and learn from them all.
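The loosening of the guiding hand can be thought of as a schedule that decides, for each scenario, where it comes from: the guide, a random pick, or the agent's own play. Here is a minimal Python sketch under assumed numbers. The linear schedule, the 10% cap on randomness and the names mixing_weights and pick_source are all hypothetical and chosen purely for illustration.

```python
import random
from typing import Dict


def mixing_weights(progress: float, max_random: float = 0.1) -> Dict[str, float]:
    """Share of scenarios per source, for training progress in [0, 1].

    Assumed schedule: guidance falls linearly to zero, randomness grows
    from zero to `max_random`, and the agent's own play fills the rest.
    """
    p = min(max(progress, 0.0), 1.0)
    guided = 1.0 - p
    random_share = max_random * p
    own = 1.0 - guided - random_share
    return {"guided": guided, "random": random_share, "own": own}


def pick_source(progress: float, rng: random.Random) -> str:
    """Pick the source of the next scenario according to the schedule."""
    weights = mixing_weights(progress)
    roll = rng.random()
    if roll < weights["guided"]:
        return "guided"
    if roll < weights["guided"] + weights["random"]:
        return "random"
    return "own"
```

At the very start every scenario is guided and none are random, which matches the point above that randomness cannot be used at the beginning; by the end the guide is gone and only the agent's own play, plus a small dose of randomness, remains.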

For the second part - discerning: Well, let’s just leave that for a future blog post, soon to come.