Blog Posts

Week 14: Final Design Review

It has been a wonderful year with IPPD! We’ve gone through a long journey to get here. We researched different board games to pick one suitable for our project. We visited CAE in Orlando to give our PDR presentation. We explored different simulations for our game before eventually settling on one at the end of the fall semester. We went through multiple peer reviews and QRB presentations to learn how we could improve our project. Finally, we capped things off by presenting an agent that managed to beat us in Catan.

During our final IPPD class this past Tuesday, Team Tactica presented its project at IPPD’s Final Design Review event!

Team Tactica showing their project to keynote speaker Dr. Keith Stanfill from the University of Tennessee. Photo by Dave Schlenker.

As part of the event, we got to hear from keynote speakers on the history of industry-sponsored engineering design programs and on lifelong learning. Afterward, we gave our presentation to our sponsors and coaches, showing our GUI, our process for building an AI agent, and the results of our different agent designs. As discussed in previous posts, our most promising agent is one that reduces our action space by selecting from a set of strategies rather than choosing actions directly.

Our project poster.

Unfortunately, our original target win rate of 50% in all tests was a bit too ambitious, so we prioritized demonstrating the usefulness of ensemble models and Graph Neural Networks (GNNs). We’ll be sure to document our findings in our upcoming research paper.

We had two computers at our table with Catanatron running. Visitors familiar with the game of Catan could sit down and try it for themselves. Our liaison even got to try it, and our agent managed to beat him! Some of us were able to play against the model as well.

Brian Magnuson standing at Team Tactica’s table at the FDR event.
Max, Han, and Jason at FDR playing against our model.

We noticed a few interesting behaviors in our model. For one, it never seems to offer trades and almost always rejects trades offered to it; despite our efforts to implement trading, it seems our model never fully learned to use it. Brian and Han each played against the model several times and even lost a few games. Eventually, however, Han was able to pick up on the agent’s strategy and exploit it.

During the event, viewers also got to see our video, which played in the background during the public showcase. We tried to aim for something humorous to appeal to a wide audience:

Team Tactica’s final video
Team Tactica presenting their project at the FDR. Photo by Dave Schlenker.

If we had more time to continue the project, we would try more strategies for reinforcement learning, train against more opponent types, and improve our models’ explainability. Explainability is a common issue with deep learning since the model acts like a “black box”, which makes it difficult to judge how it arrives at a decision. We could improve our game log to show which strategies our model is pursuing and even use a Large Language Model (LLM) to analyze those strategies.

As the project comes to a close, some of us will continue to work on our research paper in hopes of presenting it at the Interservice/Industry Training, Simulation and Education Conference (I/ITSEC). Beyond that, however, we’ll all be going our separate ways. It’ll be sad to see things end, but for us, it’s only the start of our future careers in engineering!

We are Team Tactica, and we thank you for reading!

Week 13: Preparing for FDR

This week, Team Tactica began its final preparations for our upcoming Final Design Review (FDR)! The Final Design Review is the biggest event for IPPD. We will be spending all day in the UF Reitz Union presenting our projects to our liaisons and to the public!

Results from our current most successful model, nicknamed Catanica.

To prepare for FDR, we’ve been working on our presentation, giving special attention to our project results. Above is one of our results slides, showing the current win rates we have for our models. Note that we are still doing some testing, so these figures are only preliminary.

As a reminder, our testing plan was to have our agent play against 3 random choice players (abbreviated as R), 3 Tactica value function players (TVP), and 3 AlphaBeta players (AB), with 1,000 games per matchup. So far, we only have win rates from 100-game test runs. Even so, our model is performing quite well, reaching win rates above 25% against TVP and nearly 25% against AB.

Unfortunately, it seems unlikely that our win rates will reach our target win rate of 50% (at least right now), but we’ve been able to demonstrate how our ensemble approach to reinforcement learning can drastically improve our agents. We will be presenting these findings in our research paper. We’ll also be sure to make note of our limitations and next steps to hopefully help anyone who would like to expand upon our work!

Our FDR presentation was well received by our peers, but there are still some adjustments to be made, such as how we explain the game of Catan. We should make it clear that we could have picked any other strategy game; we went for Catan because it features randomness and mutually beneficial trades that make it much more challenging for machine learning.

Here’s another photo from Prototype Inspection Day:

Team Tactica from Prototype Inspection Day.

Recently, the abstract for our research paper was accepted to the Interservice/Industry Training, Simulation and Education Conference (I/ITSEC), one of the biggest events in modeling, simulation, and training. We are excited to share our work, and it would be amazing to present the full paper there!

Next week is our Final Design Review! There will be one more blog post where we’ll get to share all the details, including our poster and video! That’s all for now. See you next week!

Week 12: Videos and Final Project Modifications

This week, Team Tactica finished recording the first draft of our project video!

Max, Cody, and Brian sitting at a table playing a board game; Max has good cards in his hand and looks at the camera smugly.
Max, Cody, and Brian acting out a game of Catan in the project video.

Since our project centers on Catan, we wanted to include clips of us playing the game, showing moves like trading and building settlements. We also wanted to capture the feelings of tension and frustration that one might experience during the game. The video was presented to our fellow IPPD classmates, and we received some feedback on how it could be improved. We hope to show you the final project video soon!

Andres and Jason standing in front of a whiteboard and speaking to the camera.
Andres and Jason from the project video.

We plan on wrapping up the development phase of our project at the end of this week. After this week, we won’t be adding any more features to the core simulation or scripts; we’ll be focusing on bug fixes as well as minor tweaks to the UI (as long as they don’t affect training or testing). We then plan to do a full round of testing to get the final results of each of our models. Even if we do not hit our original 50% target win rate for all of our tests, we’ve still accomplished a lot by demonstrating how our action space reduction technique can be a powerful tool when building AI models in a complex environment.

As the Final Design Review (FDR) approaches (on April 22), we plan on preparing our presentation and presenting it to our peers during a peer review next Tuesday. By then, most of our project work will hopefully be done!

That’s all for now! See you next week!

Week 11: Prototype Inspection and Video Recording

This week, our team gave our presentation for Prototype Inspection and Evaluation Day. We got the chance to show off our improved user interface, our different approaches to machine learning models, and some of our results.

Team Tactica at Prototype Inspection and Evaluation Day.

Our team recently developed a new preliminary “Ensemble DQGNN” model. It takes a different approach to reinforcement learning: instead of choosing an action from a large action space, it chooses among different “strategies,” each implemented as a separate value function. Though this differs from traditional RL, it is a more practical approach because it reduces the action space to a much more manageable size. It shows promise, defeating random-choice players in 100% of games (1,000 games total), VP-greedy players (one of the value functions it can select from) in 34% of games, and AlphaBeta players (currently the best-performing opponent on the Catanatron leaderboard) in 22% of games.
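
To give a rough idea of how this works, below is a heavily simplified sketch of the strategy-selection loop. All names here (StrategyAgent, vp_value, road_value) are illustrative stand-ins, not our actual code:

import random

# Hypothetical hand-written value functions, each encoding one strategy.
def vp_value(state, action):
    """Favor actions that gain victory points (illustrative stub)."""
    return random.random()

def road_value(state, action):
    """Favor actions that extend our roads (illustrative stub)."""
    return random.random()

STRATEGIES = [vp_value, road_value]  # the reduced "action space"

class StrategyAgent:
    """Picks a strategy with the network, then lets that strategy pick the move."""

    def pick_strategy(self, state):
        # Placeholder for the DQGNN forward pass, which would return a strategy index.
        return random.randrange(len(STRATEGIES))

    def decide(self, state, playable_actions):
        value_fn = STRATEGIES[self.pick_strategy(state)]
        # The chosen value function ranks the many concrete game actions,
        # so the network itself only ever chooses among a handful of strategies.
        return max(playable_actions, key=lambda a: value_fn(state, a))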

Team Tactica giving their PID presentation.

We received a lot of good feedback during our presentation. Certain parts, such as our hypothesis and testing plan, could be made clearer, and we could have included more graphics to better show off our process and results.

While working on our poster, we made a cool animated graphic featuring our team logo. Since it doesn’t accurately reflect how our graph is represented, we will not be showing it in future presentations, but we thought it would be cool to show here:

Tactica Logo Animation

With prototype inspection finished, we’ll now be working toward our Final Design Review (FDR) in just a few weeks! One of the things we plan on preparing for FDR is a short video showcasing our project. Since our project is based on a board game, we figured we could include short clips of us playing Catan with voiceovers where we explain the challenges of our project. We have more plans for our video, but those will be revealed in a future blog post.

That’s all for now – see you next week!

Week 10: Preparing our Prototype Demo

We’re back from spring break! This week, Team Tactica began preparing for the upcoming Prototype Inspection and Evaluation Day. IPPD previously held a Prototype Inspection Day (PID) in mid-November, during the fall semester. Back then, the focus was on sharing our current builds, discussing our overall project plans, and getting feedback for improvement. For this upcoming prototype inspection day, our prototype will be an even greater focus as our project nears completion.

Catanatron user interface.
Screenshot of Catanatron.

In our demo, we will showcase our user interface, which clearly visualizes the game of Catan and the moves each player makes. We’ve modified the interface to run in multiple ways. One way we can run the game is by having only bots and ML models play against each other. Without any human players, the game will run automatically, allowing users to visualize games being played “faster than real time.” Another way we can run the game is by having a human user play against the bots. This will allow us to show more of the interface and how humans can interact with our trained models.

Brian sitting at a desk working on his computer with Catan on the screen.
Brian working on the Catanatron simulation.

One challenge we face is presenting our demo to audience members who’ve never played Catan. While it is a fairly well-known game, Catan isn’t as ubiquitous as Chess or Go. As such, we’ll have to focus on the aspects of Catan that matter most for the reinforcement learning side of our project: playing against multiple players and making mutually beneficial trades.

Prototype Inspection and Evaluation Day is next Tuesday. We’ll be sure to share all the details in the next blog post. That’s all for now – see you next week!

Week 9: Getting to Work

This week, Team Tactica met to continue work on the project. With spring break coming up, our team has to be ready to take advantage of our extra time and train our models.

Team Tactica standing in a room behind a meeting table.
Team Tactica after a long work session.

Each of us has been working on different aspects of the project. Cody has been working on improving the testing script, allowing other features of the simulation and AI agent to be tested. He has also been working on a state conversion function, which will allow us to check that the state information we send to our model is being translated properly.

Andres has been working on improving the GNN model with a focus on handling different action state types from Catan. He has also been working on creating a configuration file with Han to make sure our models are easily configurable per our requirements.

Jason has been working on speeding up training by using Python’s multiprocessing library, which allows us to train our models across multiple processes. Cathy has been working on timing the graph functions to measure the speedup.
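
As a rough illustration of the idea (this is not our exact training code, and play_one_game is just a placeholder), game rollouts can be farmed out to worker processes with multiprocessing.Pool:

from multiprocessing import Pool

def play_one_game(seed):
    # Placeholder worker: in practice this would run one self-play game
    # and return the experience collected for training.
    return f"episode-{seed}"

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Workers play games independently, so four processes can roughly
        # quadruple the rate at which training data is gathered.
        episodes = pool.map(play_one_game, range(100))
    print(len(episodes), "episodes collected")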

Brian has been working on improving the trading interface and API, allowing users to trade with other players in the game and receive trade offers from other players.

Cody drawing on a whiteboard.
Cody drawing plans for the GNN.

We have the rest of March to experiment with our models. During spring break, we plan to train our models and wrap up our simulation modifications. We also have to prepare for our upcoming presentations: we will soon have another Prototype Inspection Day (PID), where we plan to give a proper demo of our simulation and models, and not long after that comes our Final Design Review (FDR), where we present our final product.

There will be no blog post next week, but blog posts will continue the week after. See you then!

Week 8: The Trading Interface

This week, our team continued to work on our GNN reinforcement learning model for Catan. Along the way, we also sought to improve Catanatron’s trading interface.

Originally, Catanatron only allowed maritime trading, that is, trading at ports or with the resource bank, and not with other players. However, we wanted to implement player trading for two reasons: (1) player trading is a central game mechanic of Catan, and (2) we wanted to challenge ourselves to build a machine learning agent that could utilize more advanced tactics.

Current iteration of the trading interface in Catanatron.

Catanatron did have functions for player trading, but bots never attempted to perform player trades. This was because all bots base their actions on their provided list of allowed actions. This list never included player trade actions even if player trading was allowed. Thus, our first attempt at enabling player trading was to create a list of possible player trade actions for our players to make.

Here are examples of player trade actions:

  • Offer 1 lumber for 1 brick
  • Offer 1 lumber for 2 brick
  • Offer 2 lumber for 1 brick
  • Offer 2 lumber for 2 brick
  • Offer 1 lumber for 1 wool
  • Offer 1 lumber for 2 wool
  • And so on

Our goal was to enumerate every possible trade with a few limits: You can only offer/request a max of 2 resources at a time, and you can only make up to 3 trade offers in one turn. As you can imagine, this list of possible trades can get very long. As such, random choice players (which choose randomly among their list of allowed actions) would almost always attempt to trade since the trade options took up most of their action space.
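
To give a sense of how quickly the list grows, here is a small sketch of the enumeration with the 2-resources-per-side limit. The helper name and the filtering rule are simplifications for illustration, not our actual implementation:

from itertools import combinations_with_replacement

RESOURCES = ["LUMBER", "BRICK", "WOOL", "GRAIN", "ORE"]

def enumerate_trade_offers(max_per_side=2):
    """List every 'offer X for Y' combination with up to max_per_side resources per side."""
    sides = []
    for count in range(1, max_per_side + 1):
        sides.extend(combinations_with_replacement(RESOURCES, count))
    offers = []
    for give in sides:
        for receive in sides:
            if set(give) & set(receive):
                continue  # skip offers that ask for a resource type being given away
            offers.append((give, receive))
    return offers

print(len(enumerate_trade_offers()))  # already a few hundred actions from trading alone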

Thus, we tried a different approach to trading: first, a player must decide whether it wants to trade at all. Only after it decides to trade is it given the list of possible trade options. By separating “deciding to trade” from “making a trade offer”, the trade options no longer dominate a bot’s action space, which ensures our models will continue to face strong opponents during testing.
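
A minimal sketch of this two-step idea (all names here are hypothetical placeholders, not our actual code):

MAX_OFFERS_PER_TURN = 3

def playable_actions(base_actions, offers_made_this_turn):
    # The normal action list stays small; trading adds only one placeholder action.
    actions = list(base_actions)  # build, buy a development card, end turn, ...
    if offers_made_this_turn < MAX_OFFERS_PER_TURN:
        actions.append("DECIDE_TO_TRADE")
    return actions

def resolve(action, trade_offers):
    # Only after a bot commits to trading is the long list of offers considered.
    if action == "DECIDE_TO_TRADE":
        return trade_offers[0] if trade_offers else "END_TURN"
    return action

print(playable_actions(["BUILD_ROAD", "END_TURN"], offers_made_this_turn=0))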

Trading is a difficult mechanic to implement in our simulation and teach to our AI models. However, we believe this added challenge can reveal more about the strengths and weaknesses of machine learning.

That’s all for now. See you next week!

Week 7: QRB 2 and Reward Functions

This week, our team gave our second Qualification Review Board (QRB) presentation! Unlike the first QRB presentation, this one focused on our testing plan, in which we run our model against player bots of increasing difficulty to see how well it performs.

Testing plan diagram by Brian. Our model (TRL) will be tested in 3 sets, each 1000 games long. These sets use player bots of increasing difficulty with R being a very weak player, TVP-VP being a challenging player, and AB being a very challenging player. Each set yields a win rate.

In addition to our testing plan, we also explained our GNN approach to creating an RL model and our long-term goals for the project.

Overall, we were happy with how our presentation was delivered. We worked hard to improve upon our mistakes from the first presentation. We made sure to present our team roles, explain our diagrams better, and use our time more wisely. We also received excellent feedback from the review board. Among their suggestions was the creation of a more interactive demo to better show our project. This will prove helpful in our next Prototype Inspection Day (PID).

As for the current progress on our project, we found some interesting behaviors in our models. As mentioned in a previous post, Reinforcement Learning (RL) models attempt to maximize rewards given by a reward function. However, if the reward function is too simple, the model may behave strangely.

Consider the following example:

def my_reward_function(game, p0_color):
    # Note: this is evaluated at every timestep, not just when the game ends.
    winning_color = game.winning_color()
    if p0_color == winning_color:
        return 100   # our player is winning
    elif winning_color is None:
        return 0     # no player is winning yet
    else:
        return -100  # another player is winning

The above reward function gives a positive reward if the player is winning and a negative reward if the player is losing. Although this makes sense at first, the reward is calculated at every timestep, so the agent collects a reward for every step it spends in a winning position. Thus, if the model is already winning, it will want to prolong the game to maximize its total reward.

Our models end up collecting 8 or 9 victory points (10 are needed to win), then stop playing until another player wins the game. Ironically, our models lose by trying to stay in a winning position. This isn’t the behavior we want. What we need is a reward function that incorporates both victory points and time.
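
As one rough sketch (not our final design), a reward like the one below blends victory-point progress, a small per-step penalty, and large terminal rewards. Here, get_victory_points is a hypothetical helper rather than part of Catanatron:

def get_victory_points(game, p0_color):
    # Hypothetical helper: in practice this would read the player's current
    # victory point count from the game state.
    return 0

def shaped_reward_function(game, p0_color):
    vps = get_victory_points(game, p0_color)
    reward = vps / 10.0      # reward progress toward the 10-point win condition
    reward -= 0.01           # small penalty every timestep to discourage stalling
    winning_color = game.winning_color()
    if winning_color == p0_color:
        reward += 100        # large terminal bonus for actually winning
    elif winning_color is not None:
        reward -= 100        # large terminal penalty for losing
    return reward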

Next week, our team will continue working toward improving our models. See you then!

Week 6: Graph Neural Networks

This week, our team continued to work on building our AI agent. We considered the potential for newer methods of representing the game board.

Training an AI to play a game requires some representation of the game’s state. For board games like Chess or Go, it’s easy to programmatically recreate the game board using a 2D vector or matrix. Each matrix element would contain information about the piece in the corresponding space. Adjacent spaces are easy to calculate: you would adjust the row or column index by 1 to get the space up, down, left, or right.

Diagram of the Catan board, drawn by Brian Magnuson.

With Catan, however, the game is played on a hexagonal board rather than a square grid. Thus, it becomes difficult to represent the board using vectors. Even if one numbers each vertex (like in the diagram above), determining which vertices are adjacent is not intuitive. Because of this, it is easier to represent the board using a graph data structure instead.

This gave us the idea of using a type of artificial neural network specialized for graphs: the Graph Neural Network (GNN). There would be 54 nodes total, one for each vertex in the game, with 72 edges connecting them. Each node has an associated feature vector describing the pieces on the corresponding vertex and the adjacent tiles, and each edge has an associated feature vector for any road placed on that edge. We hope that GNNs will provide a new means of representing the game board and improve our model’s overall understanding of the game of Catan.
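
As a simplified sketch of what this might look like using PyTorch Geometric (the feature sizes below are illustrative placeholders, not our actual feature design):

import torch
from torch_geometric.data import Data  # assumes PyTorch Geometric is installed

NUM_VERTICES = 54   # settlement/city spots on the standard board
NUM_EDGES = 72      # road spots on the standard board
NODE_FEATURES = 8   # e.g., owner, building type, adjacent tile resources (illustrative)
EDGE_FEATURES = 2   # e.g., road present, road owner (illustrative)

# Placeholder tensors; in our project these would be filled from the Catanatron game state.
x = torch.zeros((NUM_VERTICES, NODE_FEATURES))

# PyTorch Geometric stores each undirected edge as two directed edges.
edge_index = torch.zeros((2, 2 * NUM_EDGES), dtype=torch.long)
edge_attr = torch.zeros((2 * NUM_EDGES, EDGE_FEATURES))

board_graph = Data(x=x, edge_index=edge_index, edge_attr=edge_attr)
print(board_graph)  # Data(x=[54, 8], edge_index=[2, 144], edge_attr=[144, 2])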

Next week, we have our second Qualification Review Board (QRB) presentation, where we will once again present our project and hopefully get feedback on our new testing plan.

That’s all for now! See you next week!

Week 5: The Testing Plan

This week, Team Tactica met to discuss our upcoming testing plan for our project!

Team Tactica in a meeting room.
Team Tactica.

Testing our reinforcement learning agent is a little tricky. We want our model to have a high “win rate”, but Catan is a multiplayer game, meaning we need at least one more player to play against. Additionally, Catan is designed for 3-4 players; scores from 1v1 games are less meaningful since Catan is not normally played this way.

We plan to use many different types of players in our tests:

  • Random choice players
    • Choose randomly among the possible actions.
    • Useful for determining if our model has learned anything.
  • AlphaBeta players
    • The current best-performing player in Catanatron.
    • Useful for challenging our model to the highest standard.
  • Tactica Value Function players
    • Players that use a value function tuned to a specific strategy to decide on moves.
    • Useful for determining if our model outperforms any single strategy.

Brian working at a table with a laptop.
Brian working on Catanatron.

Furthermore, whatever testing plan we devise, we need to be able to test the model quickly. Right now, we can run a command on our personal computers to play 1 set of 1,000 games. But we may need to run many sets of games against many different player types. As our models become more complex, this process will become more tedious and time-consuming. What we need is a way to automate our tests and have them run on more powerful hardware. This way, we can iterate upon our designs quickly.
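
For a rough idea of the kind of loop we want to automate, here is a sketch based on the 4-player example in Catanatron’s README; in our real tests, our own agent and stronger bots would take the place of the random players:

from catanatron import Game, RandomPlayer, Color

def win_rate(num_games=1000):
    """Play num_games four-player games and report how often RED wins."""
    wins = 0
    for _ in range(num_games):
        players = [
            RandomPlayer(Color.RED),     # in a real test, our agent would go here
            RandomPlayer(Color.BLUE),
            RandomPlayer(Color.WHITE),
            RandomPlayer(Color.ORANGE),
        ]
        game = Game(players)
        if game.play() == Color.RED:     # play() returns the winning color
            wins += 1
    return wins / num_games

print(win_rate(100))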

That’s all for now. See you next week!