This week, the IPPD teams met on Tuesday for Prototype Inspection Day (PID)! On PID, each team presents its current prototype live in front of a panel of judges, who then give feedback on how to improve the product and the development process.
For our prototype presentation, we showed off our simulation and how we plan to train our AI model to employ different strategies for victory.

To train an AI model to focus on a particular strategy (such as building roads or buying development cards), we use a reward function. In reinforcement learning, a reward function gives our AI a “positive reward” for good actions and a “negative reward” for bad actions. For example, winning the game might give +500 points while discarding resources might give -50 points. During training, the model attempts to maximize positive rewards while minimizing negative rewards. By adjusting the reward function, we can encourage certain kinds of behavior, such as finishing the game quickly or building the longest road. For a game like Catan, where it might not be clear which strategy is best, we intend to incorporate multiple strategies into our model to achieve victory.
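To make the idea concrete, here is a minimal sketch of what a reward function like this could look like. The event names and point values below are illustrative placeholders, not our final tuning:

```python
# Illustrative reward table: winning dominates, small bonuses steer strategy,
# and discarding resources is penalized. Values here are hypothetical.
REWARDS = {
    "win_game": 500.0,
    "build_road": 5.0,
    "buy_dev_card": 5.0,
    "discard_resources": -50.0,
}

def reward_for(events):
    """Sum the rewards for all events that occurred on one turn."""
    return sum(REWARDS.get(event, 0.0) for event in events)

# Example: a turn where the agent builds a road and then wins the game
print(reward_for(["build_road", "win_game"]))  # 505.0
```

Changing the weights in the table (say, raising `build_road`) is how we would nudge the model toward one strategy over another.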
In addition to showcasing our different reward functions, we also presented a testing plan for our project. When testing the performance of our AI, the most basic measure is the win rate, i.e., the number of games our model wins divided by the total number of games it plays. But there are different ways of measuring win rate. One way is to have our model play against three other models that provide random inputs. While these opponents might be easy to beat, the results can be a good indicator of whether our model is learning a specific strategy. Another way is to play against the model ourselves, which we can do with our simulation. While this won’t provide very objective data, it can be a good source of qualitative insight into how our model “thinks”.
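The two pieces of this plan, a random baseline opponent and the win-rate calculation, are simple to sketch. The function names below are hypothetical, and the game results are made-up sample data:

```python
import random

def random_policy(legal_actions, rng):
    """Baseline opponent: picks uniformly among the legal moves."""
    return rng.choice(legal_actions)

def win_rate(results):
    """Wins divided by total games played (True = our model won)."""
    return sum(results) / len(results)

# Usage: a random opponent choosing from a (made-up) action list
rng = random.Random(42)
action = random_policy(["roll", "trade", "build_road"], rng)

# Example: results from 10 sample games against three random opponents
results = [True, False, True, True, False, True, True, False, True, True]
print(win_rate(results))  # 0.7
```

Against purely random opponents a learning model should push this number well above the 25% expected of four equally matched players, which is what makes it a useful early signal.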

Next week, we will be giving presentations to our peers for our System Level Design Review (SLDR). See you next week!