Blog Posts

Week 4: Reviewing the Code

This week, our team met in UF’s Nuclear Sciences Building to make sure everyone on the team has a solid understanding of our project’s code base.

Team Tactica posing in a meeting room.
Team Tactica in the meeting room.

During this meeting, we went over our current project’s design. Since we started using Catanatron as our simulation, we’ve developed a series of “value function players”. These players do not use reinforcement learning algorithms but rather a value function to evaluate each possible action on their turn. They are vastly superior to the “random choice” and “weighted random choice” players, both of which take actions based on the result of a random number generator. The value function players should prove useful when testing and training our AI models.
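
To give a rough idea of what we mean by a value function player, here is a minimal sketch in Python. The feature names, weights, and the simulate() helper are made up for illustration and aren’t our actual code.

```python
# A minimal sketch of a value function player. The feature names, weights,
# and the simulate() helper are made-up stand-ins, not our actual code.

def simulate(state, action):
    # Stand-in for the simulator: the state that would result from `action`.
    next_state = dict(state)
    next_state.update(action.get("effects", {}))
    return next_state

def evaluate(state, weights):
    # Score a state as a weighted sum of simple features.
    return (weights["vp"] * state["victory_points"]
            + weights["production"] * state["expected_production"])

def choose_action(state, playable_actions, weights):
    # Pick the action whose resulting state has the highest value.
    return max(playable_actions, key=lambda a: evaluate(simulate(state, a), weights))

# Toy example with made-up numbers:
state = {"victory_points": 3, "expected_production": 2.5}
actions = [
    {"name": "build_settlement",
     "effects": {"victory_points": 4, "expected_production": 3.1}},
    {"name": "end_turn", "effects": {}},
]
print(choose_action(state, actions, {"vp": 10.0, "production": 1.0})["name"])
```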

Additionally, we went over our current framework for training a model. As mentioned back in the fall semester, we intend to train our models using HiPerGator, a supercomputer located at UF. Since we have a lot of potential algorithms to test and a lot of games to play, we have to find a way to train and test these models as efficiently as possible.

Finally, we went over our plans for the next week. Jason will be working on adding multiprocessing to our project. Han, Cathy, and Andres will be researching ways to build our reinforcement learning model. Brian will be working on the UI, making sure that humans can play against the models we create.
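
As a rough idea of what the multiprocessing work could look like, here is a small sketch that plays games in parallel worker processes. The play_one_game() function is a hypothetical stand-in for whatever actually runs a single simulated game and returns the winner.

```python
# Sketch of running many games in parallel with Python's multiprocessing
# module. play_one_game() is a hypothetical stand-in for a real game run.
from multiprocessing import Pool
from collections import Counter
import random

def play_one_game(seed):
    random.seed(seed)
    # Placeholder: a real version would run one Catan game and return the winner.
    return random.choice(["RL_PLAYER", "RANDOM_1", "RANDOM_2", "RANDOM_3"])

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        winners = pool.map(play_one_game, range(1000))
    print(Counter(winners))
```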

That’s all for now! See you next week!

Week 3: QRB 1 and Hyperparameter Testing

This week our team gave our first Qualification Review Board (QRB) presentation! The QRB presentation is meant to provide an overview of our plans and our current progress, so we can receive feedback for improving our project.

Diagram showing two players, red and blue, playing Catan; blue is winning early in the game, and red is winning later on.
Diagram showcasing the importance of victory points in Catan and their role in determining the winning player.

Among the things we explained was our plan for designing our reinforcement learning agents. To develop effective, specialized sub-strategies, we set up and ran an experiment on value function hyperparameters. Hyperparameters are parameters of the model that affect the learning process. The goal of this experiment was to identify correlations between the input hyperparameters and different winning strategies. We found that certain hyperparameters correlated more strongly with particular winning strategies than others. For example, a model that was incentivized to decrease other players’ resource production tended to use a more aggressive strategy than one rewarded purely for its own production.
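
To illustrate the shape of this experiment (not our actual code), here is a sketch of a sweep over hypothetical value-function weights; run_tournament() is a placeholder for running a batch of games and collecting statistics.

```python
# Illustrative sketch of a hyperparameter sweep over value-function weights.
# run_tournament() and the weight names are hypothetical placeholders.
from itertools import product

def run_tournament(weights, num_games=100):
    # Stand-in: would play num_games with a value-function player using
    # these weights and return its win rate. Here it just returns 0.0.
    return 0.0

results = []
for own_w, enemy_w in product([0.5, 1.0, 2.0], repeat=2):
    weights = {
        "own_production": own_w,
        # Negative weight: penalize states where opponents' production grows.
        "enemy_production": -enemy_w,
    }
    results.append((weights, run_tournament(weights)))

print(len(results), "hyperparameter settings evaluated")
```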

We received a lot of great feedback! The day after the presentation, we got together to reflect on what went well, what didn’t, and how we can improve ourselves in the future. There will be another QRB presentation later in February, and we plan to do even better then!

That’s all for now! See you next week!

Week 2: Catan Simulation

This week, our team began work on training our first models! The simulation we’ve chosen to use is called Catanatron. It is an open-source Catan simulation for testing bot strategies.

Screenshot of Catanatron simulation.
Catanatron, available on GitHub, code licensed under GNU GPL v3.0.

In Catanatron, we can build player classes using different strategies. Each player will have access to the game state and a list of possible actions to take when it is their turn. We can feed this information into a reinforcement learning model, and then have it predict which action to take.
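
As a rough sketch, a custom player might look something like the snippet below, following the pattern shown in Catanatron’s README; the exact import path and method signature may differ in the current release, so check the project’s documentation.

```python
# Rough sketch of a custom Catanatron player, following the pattern shown in
# Catanatron's README. The exact import path and method signature may differ,
# so treat this as an approximation.
import random
from catanatron import Player

class MyRLPlayer(Player):
    def decide(self, game, playable_actions):
        # `game` holds the full (read-only) game state, and `playable_actions`
        # lists the legal actions this turn. A real implementation would encode
        # the state, query the RL model, and return its chosen action; here we
        # just pick at random as a placeholder.
        return random.choice(list(playable_actions))
```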

Compared to previous simulations, this simulation is very stable and easy to work with. It provides a Gymnasium environment to help us train our models and lets us easily compare our players with other bots. It also provides visualizations of specific games of Catan in a web server, allowing us to see what kinds of actions our models are taking.

Catanatron report after running 1,000 games using 3 random players and 1 player using a trained RL model.

Here, we’ve created a player that uses a reinforcement learning model with a simple reward function and had it play 1,000 games against three random players. Random players simply pick a random action from their list of available actions. Our player won the most games, but only by a small margin.

In addition to training models, we also need to work on modifying the simulation. The simulation did not come with a complete trading interface. That is, players can perform maritime trades, but not trades with other players. We plan to enable more features of the game, so our AI players can have the proper Catan experience.

Next week, we’ll be working more on setting up our environment to test our models more rapidly. See you then!

Week 1: Welcome Back!

It’s a new year and a new semester, and Team Tactica is ready to get to work! Now that our plan is in place, we will spend this semester working directly on our project. By the end of the next sixteen weeks, we hope to have an AI model that demonstrates strategic decision-making in Catan.

We’ve decided to use a simulation different from the one we used for our Prototype Inspection Day (PID). Unlike the previous simulation, this one integrates well with a Python library called Gymnasium, which provides a consistent API for reinforcement learning. Gymnasium is a maintained fork of OpenAI’s Gym library. We can customize the environment, including the number of players, the reward function, the action space, etc. This will allow us to train our models more rapidly. Once our models are trained, we can export the models and reuse them in the testing simulation to see how well they perform.
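
For a sense of what the Gymnasium API looks like, here is a minimal interaction loop. The environment id below is only a placeholder; the Catan environment’s actual id, observation space, and action space come from Catanatron’s documentation.

```python
# A minimal Gymnasium interaction loop. The environment id is a placeholder;
# Catanatron ships its own Gymnasium environment whose exact id and spaces
# should be taken from its documentation.
import gymnasium as gym

env = gym.make("CartPole-v1")  # placeholder environment, not the Catan one
observation, info = env.reset(seed=42)

for _ in range(100):
    action = env.action_space.sample()  # a trained policy would choose here
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```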

Team Tactica in a meeting room.
Team Tactica. From left to right: Brian, Jason, Max, Andres, and Cody.

We plan to switch to a more agile workflow in the next few weeks. We’ll start by creating a backlog of action items to accomplish. This includes modifying our simulation to fit our needs, building our reinforcement-learning policy, and coming up with several reward functions to test. We also need a way to test rapidly since there are a lot of variables we can adjust, including the size of the neural network, the algorithm used in the model, the reward functions, etc.

That’s all for now! See you next week!

Week 14: SLDR and Semester Wrap-Up

This week, our team gave our System Level Design Review (SLDR) presentation! The event was an exciting experience for all of us as we gathered in the Reitz Union Rion Ballroom to network with the liaison engineers from other teams and learn more about life in the workplace.

Team Tactica at the Reitz Union giving the SLDR presentation.

The SLDR presentation is the final presentation of the semester, so we wanted to make sure we finished strong. We talked about how the game of Catan is played, previous research done on Catan, our requirements and specifications, our project architecture, and our testing plan. There were a few questions, mostly asking for clarification. Overall, the presentation was a success!

Han and Max in the Reitz Union Rion Ballroom talking with other students.
Han and Max speaking with other IPPD students at SLDR.

The event also gave us a chance to relax and celebrate. We learned a lot over the semester: how to create a list of specifications, generate concepts, speak in public, and manage a team. After a long journey, we now have a plan for the spring semester starting in January.

IPPD can be a struggle at times, but it’s a great experience as it emulates the engineering design process that professional engineers go through. Additionally, we get to work with real companies and expand our network.

Among the things we’ll be working on next are creating our ensemble model, fixing any remaining issues with the simulation, and creating a script to test our model according to our testing plan. We also have plans to explore other simulations of Catan and see if we can learn from them.

Some of our team members also plan on helping out with the project during the winter break.

Andres plans on reviewing reinforcement learning concepts and playing against the current simulation’s AI players to see how they can be improved.

Brian plans on studying other simulations and building a small reinforcement learning model to learn more about building AI models.

Dr. Jorg Peters, Jason, and Brian talking at the Reitz Union Rion Ballroom.
Dr. Jorg Peters, Jason, and Brian at SLDR.

This will be the last blog post for a while. We hope you’ve found our project interesting! Things are only just getting started! See you next time!

Week 13: Preparing for SLDR

This week, our team has been working hard on our System Level Design Review (SLDR) report. As we mentioned in a previous post, the SLDR is like our PDR but aims to be as specific as possible in describing our project design. For our project, this involves coming up with a design for our AI model that will improve upon existing models, as well as a plan to thoroughly test it to make sure we hit our specifications.

We mentioned that we have access to more powerful hardware through HiPerGator, a supercomputer at the University of Florida. To take advantage of this hardware, we plan on using a form of ensemble learning to help our AI make decisions. Ensemble learning combines multiple AI models into one. Each base model can focus on a particular strategy (such as building roads or buying development cards), and a deciding algorithm chooses which strategy works best for the next move. This has the benefit of ensuring that our overall AI model doesn’t get “locked in” to a sub-optimal strategy.
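
Here is a small conceptual sketch of that idea in Python. The strategy classes and the rule for choosing between them are hypothetical placeholders, not our actual design.

```python
# Conceptual sketch of the ensemble idea: several strategy-specific models
# propose actions and a deciding function picks among them. All names,
# models, and the decision rule are hypothetical placeholders.

class RoadStrategy:
    def decide(self, state, playable_actions):
        return "build_road"

class DevCardStrategy:
    def decide(self, state, playable_actions):
        return "buy_dev_card"

def choose_strategy(state):
    # Placeholder decider: favor roads early, development cards later.
    return "roads" if state["turn"] < 20 else "dev_cards"

def ensemble_decide(state, playable_actions):
    models = {"roads": RoadStrategy(), "dev_cards": DevCardStrategy()}
    proposals = {name: m.decide(state, playable_actions) for name, m in models.items()}
    return proposals[choose_strategy(state)]

print(ensemble_decide({"turn": 5}, ["build_road", "buy_dev_card", "end_turn"]))
```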

Diagram showing multiple strategies being fed into a deciding algorithm
Diagram of our AI model design.

Earlier this week, we gave our SLDR presentations to our peers. Although things went well overall, our peers have helped us learn better ways to present our tables. We’ve also taken some time to reflect on the organization of the presentation, and we hope to improve upon it by December 3rd when we give our real SLDR presentation to our sponsor. Our presentation will be given here at the University of Florida, so we won’t be traveling again for the rest of the semester, unfortunately. However, there will be other opportunities in the next semester!

Team Tactica working together in a meeting room
Team Tactica, preparing for the SLDR peer presentation.

There will not be a blog post next week as we are taking a break for Thanksgiving. After next week, there will be one final blog post for the Fall semester where we will share details about our SLDR presentation to our sponsor, discuss our plans for the break, and discuss our plans for the next semester. See you then!

Week 12: Prototype Inspection Day

This week, the IPPD teams met on Tuesday for Prototype Inspection Day (PID)! At PID, each team presents its current prototype live in front of a series of judges, who then give feedback on how to improve the product and development process.

For our prototype presentation, we showed off our simulation and how we plan on training our AI model to employ different strategies for victory.

Team Tactica presenting their simulation on two computer monitors.
Team Tactica on Prototype Inspection Day.

To train an AI model to focus on a particular strategy (such as building roads or buying development cards), we use a reward function. In reinforcement learning, a reward function gives our AI a “positive reward” for good actions and a “negative reward” for bad actions. For example, winning the game might give +500 points while discarding resources might give -50 points. During training, the model attempts to maximize positive rewards while minimizing negative rewards. By adjusting the reward function, we can encourage certain kinds of behavior, such as finishing the game quickly or building the longest road. For a game like Catan, where it might not be clear which strategy is best, we intend to incorporate multiple strategies in our model to achieve victory.
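
As a tiny illustration, a reward function along these lines might look like the sketch below, using the example numbers from this post; the event names and the extra shaping reward are hypothetical.

```python
# Illustrative reward function using the example numbers from this post.
# The event names and the small settlement reward are hypothetical
# placeholders, not our actual reward design.
def reward(event):
    rewards = {
        "won_game": 500.0,             # from the example above
        "discarded_resources": -50.0,  # from the example above
        "built_settlement": 10.0,      # assumed small positive shaping reward
    }
    return rewards.get(event, 0.0)

# Total reward for a toy sequence of events:
print(sum(reward(e) for e in ["built_settlement", "discarded_resources", "won_game"]))
```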

In addition to showcasing our different reward functions, we also presented a testing plan for our project. When testing the performance of our AI, the most basic measure is the win rate, i.e., the number of games our model wins divided by the total number of games it plays. But there are different ways of measuring win rate. One way is to have our model play against three models that choose actions at random. While these opponents might be easy to beat, beating them can be a good indicator that our model is learning a specific strategy. Another way is to play against the model ourselves, which we can do with our simulation. While this won’t provide very objective data, it can be a good source of qualitative insight into how our model “thinks”.
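
Computing the win rate itself is simple; here is a sketch, where play_game() is a hypothetical stand-in for running one simulated game and returning the winner.

```python
# Tiny sketch of a win-rate calculation over a batch of games.
# play_game() is a hypothetical stand-in for one simulated game.
import random

def play_game():
    # Placeholder: returns the winner's name at random.
    return random.choice(["our_model", "random_1", "random_2", "random_3"])

num_games = 1000
wins = sum(play_game() == "our_model" for _ in range(num_games))
print(f"win rate: {wins / num_games:.1%}")
```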

Team Tactica standing together.
Team Tactica at the end of Prototype Inspection Day.

Next week, we will be giving presentations to our peers for our System Level Design Review (SLDR). See you next week!

Week 11: Preparing for Prototype Inspection Day

This week, our team has been busy preparing for Prototype Inspection Day (PID). Next Tuesday, we and the other IPPD teams will present our prototypes to several judges to receive feedback on our project. In our case, we will present the simulation we have, hopefully with added visuals to show how the AI model is performing.

We’ve mentioned that we have a simulation and AI model, but we can now show more details about the inner workings of our project.

Diagram of simplified program architecture.
Highly simplified diagram of the program to train our AI models.

To keep things clean, we try to keep the AI model and simulation as separate as possible. The simulation’s job is to keep track of the game state and relay state information to the AI agents. State information is all the information about the game’s current “state”, i.e., the cities, settlements, where they’re built, who owns them, what cards each player has, who has the most victory points, etc. The main issues with the current simulation are that it is not well encapsulated from the AI model and its functions are difficult to understand. The programming team’s goal is to address these issues.

The AI agents process this state information into data and feed it into a reinforcement learning algorithm. This should hopefully result in an action that will be sent back to the simulation, allowing the player to “make their move.” The main issue with the current AI model is that the reward function needs to be improved. The reward function determines which moves are good and which moves are bad (something that isn’t very obvious in Catan). The AI team’s goal is to address this issue by coming up with different reward functions and evaluating each of them. Hopefully, this gives a better understanding of our project!
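
To make the separation concrete, here is a hypothetical sketch of the interface we’re aiming for; all class and method names are placeholders, not our actual code.

```python
# Hypothetical sketch of the separation we're aiming for: the simulation owns
# the game state, and the agent only maps state to an action. All names here
# are placeholders, not our actual classes.

class Simulation:
    def get_state(self):
        # Would return board layout, buildings, hands, victory points, etc.
        return {"victory_points": {"us": 4, "them": 6},
                "playable_actions": ["end_turn"]}

    def apply(self, action):
        # Would advance the game by one move.
        pass

class Agent:
    def act(self, state):
        # Would encode the state, run the RL policy, and return an action.
        return state["playable_actions"][0]

sim, agent = Simulation(), Agent()
sim.apply(agent.act(sim.get_state()))
```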

As we work toward our prototype, we also have to be ready for our System Level Design Review (SLDR). The SLDR is like our PDR, but it aims to be as specific as possible in describing our project design. It also must demonstrate that the project design will meet the specifications set in our PDR.

Picture of the team at their Wednesday meeting
Team Tactica at their Wednesday meeting.

That’s all for now! See you next week!

Week 10: More Concept Generation

This week, our team met several times to develop plans for our upcoming prototype. One thing we needed to decide on was how to develop the simulation. This matters because our team already has a simulation for training reinforcement learning models, but it has a few flaws and will need to be improved. One idea was more or less an overhaul of the entire system; we needed to decide whether the benefits would outweigh the downsides.

Team Tactica at their Wednesday meeting.
Team Tactica at our Wednesday meeting.

Without going into detail on our implementation, we can explain our concept generation using an example. This example won’t show what we actually discussed, but it illustrates how considering alternatives requires a lot of thought.

If we wanted to write a program, we’d first have to decide what programming language to use. Our team happens to know Python and C++. Python is often praised for its simplicity and readability. Additionally, it has many libraries with plenty of documentation online. However, as a program grows, it needs to be broken down into more modular pieces, and Python’s dynamic type system can make this harder and less readable since it becomes less clear what each part of the program expects. C++ is more type-safe: every variable must be declared with a type before it is used, and the type of a variable cannot change. Additionally, C++ is typically compiled and can be optimized to run faster. However, it may require writing significantly more code to achieve the same results, and managing libraries for C++ is more difficult.

As this example shows, just choosing the language for a program involves weighing various pros and cons. The process only becomes more difficult when one starts considering the entire architecture of the program.

The AI team has also been working on how to restructure the code to better fit their iterative improvements. If we want to try out different algorithms for machine learning, the code should be designed to easily switch algorithms in and out.
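
One common way to make algorithms easy to swap is to look them up by name from a registry rather than hard-coding a single class. Here is a small sketch of that pattern; the algorithm names and trainer classes are placeholders, not our actual code.

```python
# Sketch of a registry pattern for swapping RL algorithms in and out.
# The algorithm names and trainer classes are hypothetical placeholders.

ALGORITHMS = {}

def register(name):
    def wrapper(cls):
        ALGORITHMS[name] = cls
        return cls
    return wrapper

@register("ppo")
class PPOTrainer:
    def train(self, env):
        print("training with PPO-style updates on", env)

@register("dqn")
class DQNTrainer:
    def train(self, env):
        print("training with DQN-style updates on", env)

def make_trainer(name):
    # Look up the requested algorithm by name.
    return ALGORITHMS[name]()

make_trainer("ppo").train("catan-env")
```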

Next week, our team will be focusing on making our project presentable for our prototype inspection day. That’s all for now!

Week 9: Working Toward Our Prototype

This week, things have started to slow down a little since we finished our Preliminary Design Review (PDR). However, things will pick up again soon as we start working on our prototype. The purpose of our prototype will be to get as much feedback as possible. This way, we can determine if anything needs to be changed or if we need to take a new approach. It also shows our sponsor that our design works and can be iterated upon to create our final product.

Right now, we have split our group into two sub-teams: the AI team and the Programming team. The AI team’s job is to design the AI model for our project. Currently, they are reviewing the documentation for machine learning libraries to understand how to build the algorithm in Python. They are also reviewing strategies for rewarding the AI model when it makes good moves and penalizing it when it makes bad ones.

The Programming team’s job is to modify the simulation and understand how to run it on HiPerGator. Right now, they are reviewing code from an existing simulation and taking notes on each function. Once they get a full grasp of the simulation, they can determine if the code needs modification, or if it would be better to create a new simulation based on the existing one.

Throughout November, we hope to get a fully working simulation running on HiPerGator so we can fully utilize our AI model. Then, by December, we will wrap things up for the semester with our System Level Design Review (SLDR). And by the spring semester, we’ll be training the AI model at full speed!

Our team presenting our PDR. From left to right, Max Banach, Dr. Jorg Peters, Andres Espinosa, Cody Flynn, Jason Li, Han Mach, Brian Magnuson, and Cathy Quan.

That’s all for now. See you next week!