Week 14/15: SLDR Success, Professional Growth, and December Plans
December 5, 2025

System Level Design Review and Sponsor Engagement
This week marked one of the biggest milestones of the semester: our System Level Design Review (SLDR). Sponsor companies from across the program attended to hear about each team’s progress, ask questions, and provide direction as we move into the next phase of development. Presenting our project in front of industry partners was both exciting and insightful, and we received highly constructive feedback that will help us strengthen our technical approach and overall narrative moving forward.
Before the SLDR presentations began, we participated in an interactive keynote session where all students were placed into semi-randomized tables. This setup encouraged us to meet new people, collaborate with students outside our usual circles, and engage directly with company liaisons. The conversations were incredibly valuable, ranging from advice on soft skills and networking to perspectives on thriving within corporate environments. It was a refreshing reminder that effective communication and professional awareness are just as important as technical competence.
Looking Ahead: December Work and Cross-Team Collaboration
Although this will be our final blog post for the fall semester, our work is far from over. Throughout December, our team will continue pushing toward a significant milestone: getting Buttercup to run a sample task fully end-to-end using Ollama for local LLM interactions rather than relying on API keys for private models. With local inference now more feasible, this shift should streamline development and make the system easier to test and iterate on.
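For readers unfamiliar with Ollama, here is a rough sketch of what a local LLM call looks like once an API-key-based client is swapped out. It assumes an Ollama server running on its default port (11434) with a small model already pulled. The model name llama3.2 and the helper names are our own illustrative choices, not code from Buttercup.

```python
import json
import urllib.request

# Assumed address of a locally running Ollama server (11434 is Ollama's default port).
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt: str, model: str = "llama3.2",
                           base_url: str = DEFAULT_OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint (no API key needed)."""
    payload = {
        "model": model,    # illustrative model choice; any locally pulled model works
        "prompt": prompt,
        "stream": False,   # ask for a single JSON reply instead of a token stream
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2",
             base_url: str = DEFAULT_OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send the prompt to the Ollama server and return the generated text."""
    request = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Non-streaming replies carry the full completion in the "response" field.
    return body["response"]
```

With the server running (ollama serve) and the model pulled (ollama pull llama3.2), calling generate("Summarize CWE-787 in one sentence.") should return the model's reply as a string. Because base_url is a parameter, the same call could later point at a model hosted on another machine without changing the calling code.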
On the Atlantis side, we are maintaining ongoing meetings with Team Atlanta to deepen our understanding of the system’s structure and component relationships. Their guidance has been instrumental in helping us diagnose build errors and determine which modules we should extract and scale down for our own implementation. These conversations will continue to shape our plan for setting up a clean, minimal Atlantis environment.
Closing Out the Semester
With SLDR behind us, our team feels both energized and focused. The feedback we received, both technical and professional, has given us a strong sense of direction heading into winter break. We look forward to making meaningful progress in December and starting the spring semester with a more mature, better-understood pipeline.
More updates to come in January!
Week 13: Deployment Progress and Exciting Meetings
November 21, 2025

Peer SLDR Session and Presentation Feedback
The Peer System Level Design Review (SLDR) was this week! Fellow students and project coaches reviewed our presentation ahead of the formal SLDR. The feedback was encouraging and highlighted noticeable growth in both our project clarity and our communication skills since the Preliminary Design Review (PDR). Reviewers pointed out that our system understanding and overall narrative are much stronger now, and the session helped us identify a few areas to refine before the official review. Overall, the Peer SLDR served as a valuable checkpoint and gave the team more confidence moving into the next stage of the process.
Advancing Buttercup Toward Full Local Execution
This week marked major progress on the Buttercup side of the project. All remaining local deployment errors were resolved, allowing the system to start up cleanly and begin processing a test task. The run ultimately failed only because a placeholder credential was used, which confirms that the full task flow is nearly operational. To support this work, we also set up a local LLM environment and verified that lightweight models can be prompted successfully. This positions us well for integrating model inference directly into Buttercup’s pipeline.
While troubleshooting, we noticed recent upstream updates in the Buttercup repository that addressed some of the same issues we had already fixed independently. This reinforced the importance of monitoring active repositories more frequently, since doing so helps prevent redundant work and keeps our local setup aligned with ongoing development activity. Moving forward, checking for updates will become a regular part of our workflow.
Progress on Atlantis and Cross-Team Engagement
In parallel with Buttercup, the team made progress in understanding and preparing the Atlantis stack. We met with a team member involved with another Atlantis effort and gained clarity on where to begin and how the major components relate to one another. Based on that discussion, we contacted a contributor from another part of the Atlantis ecosystem and began arranging a meeting to better understand one of the build systems and its role in the workflow. These conversations are helping us map out the architecture and identify the best sequence for bringing the necessary pieces online.
On the technical side, early work has begun on addressing version mismatch issues that are affecting one of the Atlantis build processes. Although this work is still in progress, the team now has a clearer understanding of the problem thanks to the discussions held this week.
Next Steps: Task Execution and Meeting Team Atlanta
In the coming week, the focus for Buttercup will be completing an end-to-end run using a small LLM and verifying that the entire pipeline functions as expected. The team will also continue testing collaboratively to confirm consistent behavior across environments. For Atlantis, the next steps involve meeting with the external collaborator, gaining a clearer understanding of the Atlantis structure, and continuing to resolve the configuration issues that are preventing a successful local build. With Thanksgiving next week, our team will take a short pause and return to our regular work the following week.
Week 12: Progress on Running CRS Modules
November 14, 2025

Running Buttercup: Holistic Approach
Given that Buttercup is relatively lightweight compared to other CRSs, our approach focuses on bringing the full system up at once rather than isolating individual modules. Earlier in the week, about half of its Kubernetes pods were failing during local deployment. Through targeted troubleshooting, we reduced this to only a small set of remaining pods, all tied to what appears to be a common underlying issue now under investigation.
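Triage at this stage mostly comes down to asking which pods are unhealthy and why. Below is a minimal sketch of that check. It assumes kubectl is already configured for the cluster and reads the JSON produced by kubectl get pods -o json. The helper names and the namespace in the usage note are hypothetical. Grouping failures by reason is one quick way to see whether several pods share a single underlying cause.

```python
from __future__ import annotations

import json
import subprocess
from collections import Counter
from typing import Any


def unhealthy_pods(pod_list: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (pod name, reason) for pods that are failing or not yet healthy."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        # A container stuck in e.g. CrashLoopBackOff or ImagePullBackOff reports a "waiting" state,
        # even while the pod phase itself may still say "Running".
        waiting = [
            cs["state"]["waiting"].get("reason", "Waiting")
            for cs in status.get("containerStatuses", [])
            if "waiting" in cs.get("state", {})
        ]
        if waiting:
            problems.append((name, waiting[0]))
        elif phase not in ("Running", "Succeeded"):
            problems.append((name, phase))
    return problems


def reasons_summary(problems: list[tuple[str, str]]) -> Counter:
    """Count failing pods per reason to spot a shared root cause."""
    return Counter(reason for _, reason in problems)


def fetch_pods(namespace: str) -> dict[str, Any]:
    """Ask kubectl for the pod list in one namespace as JSON."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

For example, reasons_summary(unhealthy_pods(fetch_pods("crs"))) would count failing pods per reason, where "crs" stands in for whatever namespace the deployment actually uses.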
Running Atlantis: Modular Approach
Atlantis is significantly more complex, so our strategy focuses on bringing key modules online first rather than running the entire system end-to-end. We have identified two core components that handle the majority of the vulnerability-detection workload, and these remain our primary area of effort.
Challenges in Local Deployment
While we are making steady progress toward getting both CRSs running locally, several challenges remain. These systems were originally designed for competition environments rather than long-term use, which means some of their dependencies are outdated or inconsistent. As a result, we encounter issues that require additional troubleshooting. Build times also present a bottleneck: Buttercup can take roughly an hour to build, and major components within Atlantis often take even longer.
Next Steps: Resolving Errors
The immediate goal is to resolve the remaining local deployment issues so that both CRSs can start running. For Buttercup, we will aim for full system execution, while for Atlantis, we will focus on its critical modules. Once this foundation is in place, we will implement modifications to support open-source models and integrate with HiPerGator. Our current plan is to operate the CRSs locally while offloading LLM workloads to HiPerGator. This approach lets us keep using Docker and Kubernetes without re-architecting the CRSs to fit HiPerGator, which would not align with Raytheon’s operational environment.
Week 11: Prototype Inspection and PDR
November 7, 2025

Prototype Inspection Day (Gainesville, FL)
Our team setting up the environment to run our prototype live at the University of Florida, taking the next steps towards developing and showcasing our prototype.
Preliminary Design Review (Largo, FL)
Our team at Raytheon’s Largo office following our Preliminary Design Review presentation.

Prototype Inspection Day

On Tuesday, November 4, 2025, we attended Prototype Inspection Day, presenting our Streamlit-based prototype to three sets of judges. The prototype pulls from a GitHub repository and uses a single LLM call to identify vulnerabilities and recommend fixes. While we are still actively working to get the Cyber Reasoning Systems (CRSs) running, this initial version allowed us to focus on the user interface and gather valuable feedback. The judges provided helpful suggestions, which we quickly applied to improve our project ahead of our upcoming presentations.
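The prototype’s approach of sending a repository to a model in a single call can be sketched as a prompt-building step. This is a minimal illustration, not the prototype’s actual code. It assumes the repository has already been cloned locally, and the file suffixes, character budget, and prompt wording are hypothetical. The resulting string would then be sent to the LLM in one request.

```python
from pathlib import Path

# Hypothetical choices: which files to include and how much text to send in one call.
SOURCE_SUFFIXES = {".c", ".h", ".cpp", ".rs", ".java", ".py"}
MAX_PROMPT_CHARS = 20_000


def build_single_prompt(repo_dir: str, max_chars: int = MAX_PROMPT_CHARS) -> str:
    """Concatenate source files from a cloned repository into one vulnerability-review prompt."""
    root = Path(repo_dir)
    header = (
        "Review the following source files. List any likely vulnerabilities "
        "(with CWE IDs where possible) and recommend a fix for each.\n\n"
    )
    parts = [header]
    used = len(header)
    for path in sorted(root.rglob("*")):
        # Skip directories and anything that is not a recognized source file.
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        chunk = f"### File: {path.relative_to(root)}\n{text}\n\n"
        if used + len(chunk) > max_chars:
            break  # stop once the (assumed) budget for a single call is reached
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts)
```

A single call keeps the demo simple, but it also caps how much of a repository the model can see, which is one reason the full CRSs remain the long-term goal.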
Preliminary Design Review (PDR)
On Thursday, November 6, 2025, our team traveled from Gainesville, FL to Largo, FL to present our PDR at Raytheon’s office. This was a great opportunity to meet our liaisons, executive sponsors, and other Raytheon members in person. During the visit, we also had the chance to see projects from other teams, including those from other schools, and it was interesting to learn about their work and approaches. We had a great time meeting everybody, receiving feedback, and exchanging ideas.
Looking Ahead: SLDR
With the PDR behind us, our attention now turns to the upcoming System Level Design Review (SLDR). To prepare, we are working to get the CRSs running without errors and finalizing our design decisions based on all the feedback we received this week. We have much to look forward to.
See you next week!
Week 10: Let’s Get This Running
October 31, 2025

GatorDetect Weekly Update: Presenting at UF AI Days and Advancing System Deployment
This week was an exciting milestone for our team as we participated in UF AI Days, where we presented our project poster and shared our ongoing research with peers, faculty, and industry guests. It was a great opportunity to communicate the broader goals of our work, receive valuable feedback, and gain insights into how our project fits within the larger landscape of AI innovation. The event also gave us the chance to reflect on how far we have come in developing our system.
Attempting to Run Cyber Reasoning Systems Locally
Outside of AI Days, our team made significant progress in deploying cyber reasoning systems locally. At this point, we are shifting from research to software development. Much of our focus this week was on getting key modules to run in local and containerized environments. We successfully built the containers for Buttercup and began running the system locally. However, several of the Kubernetes pods failed during startup, which we traced back to repository migration issues.
We also made progress toward using open-source LLMs. It is now much clearer how we can leverage HiPerGator to run LLMs and how we can modify the source code of the cyber reasoning systems so that they use open-source models instead of private ones that require API keys. By the end of this week, it became clear that we are moving toward stable, repeatable system runs at minimal cost.
Next Steps: Refining Deployment
Next week, we plan to continue debugging and refining the deployment of the cyber reasoning systems so that key components run smoothly in our local environment. We are working hard to get more components running successfully.
Being hands-on with the cyber reasoning systems has introduced unexpected challenges, but our team remains flexible and adaptable. Each week helps us better understand the deployment process and strengthen the overall reliability of our system. We are excited to keep building on this momentum as we refine our system and move closer to a fully operational setup.
See you next week!
Week 09: Concept to Prototype
October 24, 2025

GatorDetect Weekly Update: Preparing for Prototype and Presentations
This week, we focused on two major topics: diving deep into the implementation of our CRSs and preparing for a significant upcoming presentation. Our primary goal is to translate our research into a presentable, functional prototype that showcases our work and product.
Deepening Our CRS Research
As a team, we continued our individual CRS studies this week, with a focus on a practical and reasonable implementation plan. For Team 42, we are learning how to run its individual and interdependent modules in the required sequence on HiPerGator without launching each one manually. This involves designing a workflow of SLURM job submissions that manages the execution order, which is the critical step for getting this CRS running.
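One way to express that execution order is with SLURM job dependencies. The sketch below is a simplified illustration rather than our finished workflow. It submits each batch script with sbatch --parsable (which prints only the job ID) and makes each job wait on the previous one with --dependency=afterok:<jobid>, so a module starts only if the one before it exited successfully. The script names are hypothetical placeholders for Team 42’s modules.

```python
from __future__ import annotations

import subprocess
from typing import Optional

# Hypothetical batch scripts, listed in the order the modules must run.
PIPELINE = ["build_targets.sbatch", "run_fuzzers.sbatch", "triage_crashes.sbatch"]


def sbatch_command(script: str, after_job: Optional[str] = None) -> list[str]:
    """Build an sbatch command; --parsable makes sbatch print just the job ID."""
    cmd = ["sbatch", "--parsable"]
    if after_job is not None:
        # Start only after the previous job has completed with exit code 0.
        cmd.append(f"--dependency=afterok:{after_job}")
    cmd.append(script)
    return cmd


def submit_chain(scripts: list[str]) -> list[str]:
    """Submit scripts so each one waits for the one before it; return the job IDs."""
    job_ids: list[str] = []
    previous: Optional[str] = None
    for script in scripts:
        out = subprocess.run(
            sbatch_command(script, previous),
            check=True, capture_output=True, text=True,
        ).stdout
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
        previous = out.strip().split(";")[0]
        job_ids.append(previous)
    return job_ids
```

Running submit_chain(PIPELINE) from a HiPerGator login node would queue all three jobs at once and let SLURM enforce the order, instead of someone waiting for each job to finish before submitting the next.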
At the same time, we are making progress with Team Buttercup by exploring its lightweight laptop version, which is proving to be an invaluable tool, allowing us to rapidly test our integration logic before deploying onto the HiPerGator environment.
Preparing for Key Milestones
With November rapidly approaching, we are actively preparing for two major events. First, we are building our presentation for our visit to Raytheon on Nov 6th. We are excited to share our progress and get direct feedback from our sponsors as well as to interact and network with professionals.
We are also gearing up for our upcoming System Level Design Review (SLDR) day. A significant part of our preparation for both milestones involves developing a functional prototype that we can use for demonstrations.
Next Steps: A Working Prototype
Our most immediate goal is to get a working prototype up and running. This effort combines our research on Team 42 and Buttercup and is essential for both of our presentations. We are focused on creating a system that demonstrates our core vulnerability discovery concept on open-source repositories and sets the stage for the next phase of development.
See you next week!
Week 08: AI Days and Implementation
October 17, 2025

GatorDetect Weekly Update: Integration and AI Days Prep
This week our team worked on development and integration as we began the critical process of modifying selected CRSs to work with HiPerGator. Alongside this technical work, we also started preparations to showcase our project at the upcoming UF AI Days.
CRS Integration and Adaptation for HiPerGator
Our primary focus has been on the technical challenge of integrating our chosen CRSs. A significant part of this effort involves adapting these systems, which were originally designed to run in a Kubernetes (k8s) environment, to operate on HiPerGator, which doesn’t support k8s. This is a crucial step toward using the LLM resources available at UF and is a core objective of our project.
Deepening Research and Local Testing
We are continuing our deep dive into the architectures of promising CRSs, with a focus on Team 42 and Buttercup. To accelerate our development and testing cycles, we are exploring the use of Buttercup’s lightweight, standalone version that can run on a laptop. This will allow us to test core functionalities and debug our integration logic efficiently before implementing the full-scale version on HiPerGator.
Looking Ahead: UF AI Days
We are excited to announce that we will be presenting our work at the upcoming UF AI Days. This is an amazing opportunity for us to share our research and the potential of our CRS platform with the broader university community. This week, we officially started designing our poster for the event and look forward to sharing more details soon.
See you next week!
Week 07: Presentations and Coding
October 10, 2025

GatorDetect Weekly Update: PDR Peer Review & Preparing for Testing
This week marked a major milestone for our team as we presented our Preliminary Design Review (PDR) and finalized key project logistics and travel plans. The feedback from our peers and coaches was invaluable and will help us improve our future presentations. Our focus is now shifting toward the testing phase of the project.
PDR Presentation and Feedback
A major highlight this week was presenting our PDR at the peer review session. This was an amazing opportunity to share our project’s vision, architecture, and progress with our peers and coaches. We enjoyed receiving constructive feedback and answering questions that sparked new ideas, and the insights we gathered will be essential in refining our design as we move forward.
Continued Research and Finalized Plans
On the technical side, we continued our deep dive into potential CRSs, with a focus this week on understanding the architecture of Team 42. Also, we finalized our travel plans for our upcoming site visit, locking in the logistics for what is sure to be a productive and exciting trip.
Next Steps: Sourcing Code for Testing
With our initial research and planning phases solidifying, our next crucial step is to prepare for CRS implementation and application. We have started discussions about getting sample source code from Raytheon for our testing purposes. Access to this code will be critical for validating our system’s effectiveness and is a key step in bringing GatorDetect to life.
See you next week!
Week 06: Big Changes
October 3, 2025

Weekly Update: Deepening Our Research and Planning Our Build
This week, we focused on expanding our technical knowledge by solidifying our current toolset and planning for key future events. We explored a new CRS, confirmed some critical implementation details for the Team Buttercup CRS, and made important decisions about our project’s future infrastructure plans.
Expanding Our Potential CRSs
Our primary focus was on researching a new, simpler CRS option and understanding how it might be integrated. We took a deep dive into 42-b3yond-6ug, analyzing its architecture and potential for use in our system. Alongside this new research, we had a significant breakthrough with one of our initial choices, Buttercup, confirming that it has a standalone version that can run on a laptop, which is great for testing purposes. We also began outlining our implementation strategy and generating initial concepts for how these systems will work together.
Infrastructure Decisions and Industry Visit
On the infrastructure side, we explored the possibility of running Kubernetes on Raspberry Pis. This is a crucial decision that will guide our system’s deployment strategy and concept generation. We also reviewed resource usage data from Team 42 to make sure our resources are allocated appropriately, since some of these CRSs issue an extremely large number of LLM queries. We are also excited to announce our upcoming trip to Raytheon in Largo, Florida, planned for Nov 6th, which will be a great opportunity to see the company we are working for up close.
Next Steps: From a Concept to a Creation
This week’s research has been incredibly productive, clarifying our path forward. With key decisions made about our CRS choices and infrastructure, our focus is now shifting from planning to hands-on creation. We are moving into the concept generation and implementation phases, ready to start building the core of GatorDetect.
See you next week!
Week 05: Big Changes
September 26, 2025

Team Name Change: From CipherPilot to GatorDetect
This week was an important moment for our team as we officially changed our name from CipherPilot to GatorDetect. This new name better reflects our goal and focus of detecting issues in source code using AI systems. We are excited to carry it forward as we dive into the core of our project integration.
Evaluating Our Options: CRS Integration
With Team Atlanta and Buttercup selected last week as our primary CRSs, our research efforts this week pivoted to a deeper technical exploration of their architectures and the feasibility of integrating them. We are now working out how to integrate the two systems to perform both vulnerability detection and patch proposal. The goal is a mechanism that combines the vulnerabilities found by each CRS and uses them together to produce a patch that strengthens the source code. This is a crucial step for our project’s success, and our team is studying the specifics of each CRS to prepare for this integration. We are also exploring how to modify these systems to be compatible with the UF LLMs.
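As a rough illustration of how findings from the two CRSs might be combined before patching, the sketch below merges per-CRS reports and records which system flagged each issue. The Finding format (file, line, CWE) is a hypothetical normalization. Neither CRS necessarily reports results in this shape, and converting their real outputs would be a separate step. Ranking issues flagged by both systems first is one possible way to decide what to patch first.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    """Hypothetical normalized finding; neither CRS emits exactly this format."""
    file: str
    line: int
    cwe: str


@dataclass
class MergedFinding:
    finding: Finding
    sources: set[str] = field(default_factory=set)  # which CRSs reported this finding


def merge_findings(reports: dict[str, list[Finding]]) -> list[MergedFinding]:
    """Union findings from several CRSs, recording which CRS reported each one."""
    merged: dict[Finding, MergedFinding] = {}
    for crs_name, findings in reports.items():
        for f in findings:
            entry = merged.setdefault(f, MergedFinding(f))
            entry.sources.add(crs_name)
    # Findings confirmed by more CRSs come first, then sort by location for stable output.
    return sorted(
        merged.values(),
        key=lambda m: (-len(m.sources), m.finding.file, m.finding.line),
    )
```

The merged list could then feed a single patch-proposal step, so that an issue both systems agree on is handled once rather than patched twice.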
Product Design Specification Draft
Our key milestone this week was the creation of our Product Design Specification (PDS) draft. This document outlines the project’s technical requirements, design expectations, concepts, and deliverables. We have defined the scope of work, which includes learning about the AI Cyber Challenge (AIxCC) teams, selecting our CRSs, and creating a plan for their integration.
Next Steps: GatorDetect Shifts to Building
This was a very productive week that shifted our focus from planning to the hands-on technical details of the project. We have laid the groundwork for our project and are now ready to start the building and testing phases. We are looking forward to applying our current knowledge of AI and cybersecurity to bring our vision to life.
See you next week!
Week 04: Final Planning
September 19, 2025

Evaluating our Options: CRS Selection
With our primary CRS (Team Atlanta) already selected, this week our research efforts focused on evaluating two potential secondary CRSs, Theori and Buttercup. After a thorough analysis of their capabilities, their scores in the AI Cyber Challenge (AIxCC), and their compatibility with our project goals, our team is currently leaning toward selecting Buttercup as our second CRS. This preliminary decision was based on Buttercup’s architecture and performance, which seem to complement the Team Atlanta CRS.
Team Development: FPL White Belt Training
Professional development was a key focus this week. Three of our team members attended the White Belt Quality Training hosted by Florida Power and Light. The workshop provided us with valuable insights into quality principles and collaborative team dynamics, which we are already applying to structure our workflow and enhance our group’s efficiency and communication.
Next Steps: CRS Integration Preparation
With our CRS selection nearly finalized, our next step is to clone the CRSs’ Git repositories. Our primary goal is to begin a deeper dive into their core architecture, documentation, and APIs. This technical exploration is the critical first step in understanding how we can practically integrate these CRSs with UF LLMs.
This week was a productive shift from basic team planning to some more technical planning. We are excited to get our development environments configured and start building.
See you next week!
Week 03: Preliminary Work
September 12, 2025

Raytheon Technologies. Logo of Raytheon Technologies Corporation (2020). 3 Apr. 2020. Raytheon Technologies website (https://www.rtx.com/): https://www.rtx.com/-/media/project/united-technologies/shared/images/rtx_logo.svg, Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Raytheon_Technologies_logo.svg.
Meeting our liaisons: Introductions
During this week, we met with our two Raytheon liaison sponsors, Randall Brooks and Sylvia Traxler. This meeting marked our first key milestone: we discussed our project in detail, asked questions that clarified the project expectations, and brainstormed ideas for how to approach the required objectives. The guidance from our liaisons will be essential in helping us navigate the complexities of integrating CRSs with our UF LLMs.
Diving into CRSs: Our Initial Research
Throughout this week, we began researching the CRSs from DARPA’s AI Cyber Challenge (AIxCC). In particular, we focused on systems developed by Team Atlanta, Theori, and Trail of Bits.
Each of these CRSs brings its own strengths:
• Team Atlanta: was the winning approach at the AIxCC
• Theori: is known for innovative vulnerability detection strategies
• Trail of Bits: has a strong focus on secure development and practical implementation
Exploring these tools has given us a better understanding of the possibilities and challenges ahead as we prepare to select, configure, and test them with UF LLMs.
Next Steps: Starting our Development
With the first meeting with our liaisons completed and our initial research in progress, we are now shifting toward identifying the two CRSs that we will be integrating. At the same time, we are refining our team management system to keep the workflow structured and on schedule.
Reflecting on this week, we have learned a great deal, and our liaisons cleared up much of our earlier confusion. This project is beginning to take shape, and we are excited to continue our work in the weeks ahead.
See you next week!
Week 02: Getting Started
September 5, 2025

Getting Started: Diving into AI-Powered Code Development
Hi everyone! Our team of four is beginning our partnership with Raytheon on the Artificial Intelligence (AI) Assisted Source Code Development and Analysis project. With guidance from our coach Dr. Andrea Goncher and support from Raytheon’s liaison engineers, we’ll be working with Cyber Reasoning Systems (CRSs) and exploring how AI can improve secure, memory-safe coding practices.
Waiting for Our First Meeting: Setting Expectations
We haven’t yet connected with our liaison partners or our coach, Dr. Goncher, but we’re using this time to prepare. These upcoming meetings will be crucial for aligning our understanding of the project scope, clarifying responsibilities, and mapping out our approach to tasks like integrating CRSs, experimenting with AI-based code generation, and testing secure code translations across different programming languages.
Understanding the Project: Scope and Objectives
We reviewed the project scope from Raytheon to get our bearings. The main components include researching CRSs from DARPA’s AI Cyber Challenge (AIxCC), selecting and configuring at least two CRSs to run in a UF-supported computing environment, and developing AI-generated code by translating Ada to C++ and Rust while intentionally introducing specific security weaknesses (CWEs) for our testing purposes.
We will also be evaluating how effectively these CRSs can identify certain vulnerabilities and recommend fixes. It is a technically challenging project that combines AI, cybersecurity, and software engineering in ways we haven’t done before.
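Because the weaknesses are seeded deliberately, we know the ground truth, which makes a simple scoring scheme possible. The sketch below assumes, as a simplification, that a detection counts as correct when a CRS reports the right CWE in the right file. Real scoring might also need to match line numbers or functions, and the function name is our own.

```python
from __future__ import annotations


def detection_scores(seeded: set[tuple[str, str]],
                     reported: set[tuple[str, str]]) -> dict[str, float]:
    """Compare reported (file, CWE) pairs against the weaknesses deliberately seeded into test code."""
    true_positives = len(seeded & reported)
    # Recall: share of seeded weaknesses the CRS found.
    recall = true_positives / len(seeded) if seeded else 0.0
    # Precision: share of the CRS's reports that match a seeded weakness.
    precision = true_positives / len(reported) if reported else 0.0
    return {"recall": recall, "precision": precision}
```

For example, if two weaknesses were seeded and a CRS reports one of them plus one unrelated issue, both recall and precision come out to 0.5.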
Early Planning: Getting Organized
Without any formal guidance yet, we have started some preliminary planning. We are discussing task management approaches, working out our weekly meeting schedule, and considering how to divide responsibilities among team members. We know that we will need a strict method for tracking our progress on code analysis, integration work, and testing milestones to keep everything and everyone coordinated.
We may not have had our key meetings yet this week, but we are looking forward to the technical challenges ahead. This project involves new technology and methods for us, and will push us to learn new skills. We will keep you updated on our progress, the obstacles we encounter, and what we learn along the way.
See you next week!