Blog Posts

Team Blog

IPPD 2 Week 06: Scaling with NaviGator and Prepping for QRB2

February 20, 2026

Buttercup: Evaluating Test Cases and Execution Speeds

This week, we focused on running the Buttercup CRS through some basic test cases to gauge performance. Our testing revealed a notable dichotomy in the system’s execution times: while Buttercup is fast at the initial bug-finding stage, the subsequent fuzzing required to generate functional Proofs of Vulnerability (PoVs) takes significantly longer. We will continue to monitor and account for these execution times as we scale our testing efforts.

Atlantis: Deep Debugging and Architectural Insights

For Atlantis, we successfully completed the integration of local LLMs into crs-multilang. However, we quickly hit a ceiling regarding model intelligence; local hardware constraints restricted us to smaller models, which struggled to accurately process our complex queries. To bypass this bottleneck, we pivoted to using NaviGator. Because of the architectural groundwork laid previously, we were able to seamlessly reuse our existing LiteLLM proxy code, simply repointing it from our local Ollama instance to the NaviGator API.

Initial testing with NaviGator yielded only a 50% success rate for API calls, which led to an extended debugging phase. The primary culprit was a formatting mismatch: crs-multilang structured its API calls in a format supported only by OpenAI models, including Pydantic models. To resolve this, we implemented a callback file that flattens all prompts before they are passed to the API, alongside a few other minor system fixes. These adjustments greatly improved stability, raising our success rate to 296 out of 300 API calls (about 98.7%). The four remaining failures were traced back to the LLMs emitting line-ending tokens directly in their output, which broke the parsing logic. A fix for this has been implemented and is awaiting testing. Going into next week, finalizing this NaviGator logic is our top priority so we can begin running our test suite.
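Our actual callback file is not reproduced here, but the core idea of "flattening" can be sketched. The snippet below is a minimal illustration, assuming requests use OpenAI-style chat messages whose `content` may be a list of typed parts rather than a plain string. The function names are hypothetical, and the step of registering this as a LiteLLM proxy hook is deliberately left out.

```python
from typing import Any, Dict, List


def flatten_content(content: Any) -> str:
    """Collapse OpenAI-style structured message content into a plain string."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Content parts look like {"type": "text", "text": "..."}; keep only the text.
        parts = []
        for part in content:
            if isinstance(part, dict) and part.get("type") == "text":
                parts.append(part.get("text", ""))
            elif isinstance(part, str):
                parts.append(part)
        return "\n".join(parts)
    # Anything else (e.g. a serialized schema object) is stringified as a last resort.
    return str(content)


def flatten_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of the chat history where every message's content is plain text."""
    # Other keys (role, name, tool_call_id, ...) are carried over unchanged.
    return [{**msg, "content": flatten_content(msg.get("content"))} for msg in messages]
```

Flattening on the proxy side, rather than editing every call site in crs-multilang, keeps the upstream code untouched, which is the main appeal of doing it in a callback.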

Project Standards: Refining the Test Suite and Vulnerability Ranker

With QRB2 quickly approaching, establishing reliable metrics is critical. To that end, we heavily refined our testing infrastructure this week. We pared down our previous list of 20+ OSS-Fuzz compatible repositories to a curated test suite of about 15 codebases. For each of these, we manually isolated versions with known bugs and injected the specific metadata files required by both Atlantis and Buttercup. This refined suite will serve as our primary ground truth for evaluating both CRSs.

Simultaneously, the Vulnerability Ranker pipeline took a concrete step forward. Now that we have generated real example outputs from both Atlantis and Buttercup, we were able to finalize and format the expected input JSON structure. This ensures the Ranker is prepared to ingest and process vulnerability data from both of our systems through a single format.

More progress to come next week.


IPPD 2 Week 05: Debugging Local LLMs and Standardizing Output

February 13, 2026

Buttercup: Transitioning to HiPerGator

While our detection pipeline performed exceptionally well last week using NaviGator, migrating that success to the university’s high-performance computing cluster, HiPerGator, has introduced new infrastructure challenges. We successfully resolved the initial issue where the system failed to execute tool calls when running models directly on HiPerGator, but we have now hit a hardware ceiling. It appears that our current allocation of a single GPU is insufficient for the compute-heavy workload required by the local models. Our immediate next step is to scale up our allocation to multiple GPUs to alleviate this bottleneck and match the performance stability we previously saw with the API.

Atlantis: Deep Debugging and Architectural Insights

Following our success in setting up the local LiteLLM proxy last week, our primary focus for Atlantis shifted to the actual execution of crs-multilang with these local LLMs enabled. While the system runs smoothly in isolation, re-introducing the intelligence layer has added significant complexity. We encountered persistent errors throughout the week, prompting us to make five distinct alterations to the project codebase to facilitate better communication between the components. While we are still in the debugging phase, these hurdles have forced us to learn significantly more about how crs-multilang functions under the hood.

Notably, we discovered that the system manages repositories differently depending on the component: the fuzzer operates on a local “partial repository” containing only relevant harnesses, whereas the LLM component utilizes a TAR’ed version of the full repository. Furthermore, we found that fuzzing occurs continuously, while LLM calls are executed on a 30-second repeating timer, feeding insights back to the fuzzer live.
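To make this concurrency model concrete, here is a minimal asyncio sketch of the behavior described above: a fuzzing loop that never blocks, and an LLM loop that fires on a fixed timer and pushes new seeds into a shared queue. This is not crs-multilang's code; the LLM call and the mutate-and-execute step are stubs, and every name here is hypothetical.

```python
import asyncio
import random


async def fuzz_loop(seeds: asyncio.Queue, corpus: list, stop: asyncio.Event) -> int:
    """Fuzz continuously, absorbing LLM-suggested seeds as soon as they arrive."""
    iterations = 0
    while not stop.is_set():
        # Drain new seeds without blocking, so fuzzing never waits on the LLM.
        while not seeds.empty():
            corpus.append(seeds.get_nowait())
        _ = random.choice(corpus)  # placeholder for mutating and executing one input
        iterations += 1
        await asyncio.sleep(0.001)  # yield so the LLM task gets a turn
    return iterations


async def llm_loop(seeds: asyncio.Queue, interval: float, stop: asyncio.Event) -> None:
    """Every `interval` seconds, ask the (stubbed) LLM for a seed and hand it to the fuzzer."""
    round_no = 0
    while not stop.is_set():
        try:
            # Sleep for one interval, but wake immediately if we are told to stop.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            round_no += 1
            await seeds.put(f"llm-seed-{round_no}".encode())  # stand-in for a real LLM call


async def run(duration: float, interval: float) -> list:
    """Run both loops side by side for `duration` seconds and return the final corpus."""
    seeds: asyncio.Queue = asyncio.Queue()
    corpus = [b"initial-seed"]
    stop = asyncio.Event()
    fuzz = asyncio.create_task(fuzz_loop(seeds, corpus, stop))
    llm = asyncio.create_task(llm_loop(seeds, interval, stop))
    await asyncio.sleep(duration)
    stop.set()
    await asyncio.gather(fuzz, llm)
    return corpus
```

The interval we observed was 30 seconds; for a quick local demo, something like `asyncio.run(run(duration=1.0, interval=0.3))` shortens it so LLM-generated seeds appear in the corpus within a second.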

Moving forward, we will continue debugging the LLM integration. However, to ensure we meet our deadlines, we have established a contingency plan: if the LLM errors persist, we will proceed with running crs-multilang without the LLM component on our testing suite. This will allow us to gather necessary baseline metrics in time for the next QRB.

Project Standards: Unifying Outputs and the Test Suite

In addition to our specific system work, the group met this week to standardize our testing and reporting infrastructure. Since Buttercup and Atlantis currently output vulnerability data in different structures, we officially decided on a specific JSON format to serve as the standard input for our Vulnerability Ranker. This is a critical step for ensuring the Ranker works universally across our tools.
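The post does not spell out the agreed JSON format, so the sketch below only illustrates the general pattern: one shared record type plus a small adapter per CRS. Every field name on both sides (the common record and the two raw report shapes) is a hypothetical placeholder, not Buttercup's or Atlantis's real schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RankerInput:
    """Hypothetical common record consumed by the Vulnerability Ranker."""
    source_crs: str   # "buttercup" or "atlantis"
    project: str
    harness: str
    crash_type: str   # e.g. "heap-buffer-overflow"
    crash_log: str    # raw sanitizer output, kept for the RAG step


def from_buttercup(report: dict) -> RankerInput:
    # Keys read from `report` are illustrative, not Buttercup's actual output schema.
    return RankerInput("buttercup", report["project"], report["harness_name"],
                       report["sanitizer_type"], report["stacktrace"])


def from_atlantis(report: dict) -> RankerInput:
    # Likewise illustrative; these would be replaced once real crs-multilang output is mapped.
    return RankerInput("atlantis", report["target"], report["harness"],
                       report["bug_type"], report["log"])


def to_json(records: list) -> str:
    """Serialize normalized records into the single JSON document the Ranker reads."""
    return json.dumps([asdict(r) for r in records], indent=2)
```

Keeping the per-CRS mapping in thin adapters means the Ranker itself only ever sees one shape, and a third CRS would need just one more adapter.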

Finally, we compiled a list of over 20 OSS-Fuzz style codebases to serve as our primary testing suite. To ensure we have a reliable ground truth for generating metrics, we plan to re-introduce specific vulnerabilities into these projects. This will give us a concrete baseline to measure exactly how well our CRSs are detecting and patching known issues.
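With known injected bugs as ground truth, detection performance reduces to comparing two sets. The sketch below assumes each bug can be identified by a hashable key such as a `(project, bug_id)` tuple; that keying scheme and the example values are hypothetical. In practice, deciding whether a reported crash matches a known bug (deduplication) is the harder part and is not shown.

```python
def detection_metrics(known: set, reported: set) -> dict:
    """Compare CRS findings against the injected ground-truth bugs."""
    true_pos = known & reported
    precision = len(true_pos) / len(reported) if reported else 0.0
    recall = len(true_pos) / len(known) if known else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "true_positives": len(true_pos),
        "false_positives": len(reported - known),
        "missed": len(known - reported),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, with four injected bugs and three findings of which two are correct, precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.571).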

More progress to come next week.


IPPD 2 Week 04: Local LLM Integration and Stabilizing Detection

February 6, 2026

Buttercup: Permission Fixes and Validating Detection

We made excellent progress on Buttercup this week by resolving low-level execution issues. We identified that several fuzzer binaries were missing execute permissions; correcting this allowed the system to successfully identify and patch a vulnerability in our test cases. While the patching capability works, it remains unstable. However, the vulnerability detection is performing exceptionally well. We tested the detection pipeline using gpt-oss-20b and gpt-oss-120b via NaviGator (the UF LLM API) to streamline our testing process, and the results were highly accurate. We were already planning on forgoing patching due to project time and scope constraints, so it is reassuring that it worked, even if it was unstable. Next week, we will verify that switching back to HiPerGator maintains vulnerability detection performance and hold an internal meeting to standardize the format of our vulnerability reports.
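A missing execute bit is easy to check for programmatically. The sketch below is a hypothetical helper, not the fix we applied; it assumes a POSIX filesystem and uses a rough heuristic that fuzzer binaries have no file extension, so auxiliary files such as `.dict` or `.options` files are skipped.

```python
import stat
from pathlib import Path
from typing import Callable, List


def fix_exec_permissions(directory: str,
                         is_fuzzer: Callable[[Path], bool] = lambda p: p.suffix == "") -> List[str]:
    """Add execute bits to fuzzer binaries in `directory` that lack them; return their names."""
    fixed = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or not is_fuzzer(path):
            continue
        mode = stat.S_IMODE(path.stat().st_mode)  # permission bits only
        if not mode & stat.S_IXUSR:
            # Add user, group, and other execute bits, similar to `chmod +x`.
            path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
            fixed.append(path.name)
    return fixed
```

Returning the list of fixed files makes it easy to log which binaries were affected, which helps when tracking down why a build step dropped the permissions in the first place.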

Atlantis: Establishing the Local LiteLLM Proxy

Building on last week’s milestone of running crs-multilang locally, our focus this week shifted to re-integrating the intelligence layer: the LLMs. Since the system is already configured to communicate via LiteLLM, our task was to establish a local LiteLLM proxy server that routes traffic to local models running via Ollama. This integration proved to be a significant hurdle. Because we are operating within the Windows Subsystem for Linux (WSL), we encountered complex environment conflicts between WSL and the host Windows environment. After extensive debugging to resolve these cross-environment dependencies, we successfully stood up the proxy. crs-multilang is now fully configured to use local LLMs, and we are ready to fully test it. Looking ahead, once local testing confirms stability, we will migrate the endpoint to HiPerGator. We are also beginning development on a dedicated test suite to generate concrete performance metrics for both Atlantis and Buttercup.

Vulnerability Ranker: Containerization and Format Standardization

Parallel to the CRS work, the RAG pipeline received significant infrastructure updates. We modified the system to specifically parse, store, and utilize crash log formats similar to those output by crs-multilang, ensuring tighter integration between our generation and ranking systems. Furthermore, we moved the entire RAG setup into a Docker container. This containerization ensures that the ranker is easy to deploy and consistent across different development environments.
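As an illustration of the kind of parsing involved, the sketch below pulls the sanitizer name, bug type, and top stack frames out of a crash log. It assumes AddressSanitizer-style reports, the format typically produced by libFuzzer harnesses in OSS-Fuzz projects; the regexes and output keys are our own illustration, not the pipeline's actual parser.

```python
import re

# "==42==ERROR: AddressSanitizer: heap-buffer-overflow on address ..."
ERROR_RE = re.compile(r"ERROR: (\w+Sanitizer): ([\w-]+)")
# "    #0 0x4f1a2b in png_read_row /src/libpng/pngread.c:512:7"
FRAME_RE = re.compile(r"^\s*#(\d+) 0x[0-9a-f]+ in (\S+) (\S+)", re.MULTILINE)


def parse_sanitizer_log(log: str, max_frames: int = 5) -> dict:
    """Extract the sanitizer, bug type, and leading stack frames from an ASan-style report."""
    err = ERROR_RE.search(log)
    frames = [{"index": int(i), "function": fn, "location": loc}
              for i, fn, loc in FRAME_RE.findall(log)]
    return {
        "sanitizer": err.group(1) if err else None,
        "crash_type": err.group(2) if err else None,
        # The crash stack is printed first, so the first frames belong to it.
        "frames": frames[:max_frames],
    }
```

Storing these fields alongside the raw log gives the RAG step structured keys (crash type, crashing function) to retrieve on, rather than relying only on free-text similarity.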

More progress to come next week.


IPPD 2 Week 03: Atlantis Success and QRB1

January 30, 2026

Buttercup: Resolving Timeouts and Facing New Tooling Issues

Our focus on Buttercup this week centered on resolving the execution bottlenecks identified previously. We attempted to mitigate the issue by significantly increasing the system timeout thresholds. This resolved the timeouts; however, the process is still failing due to a new issue. Diagnostics indicate that the error has shifted downstream: we are now encountering LLM failures that appear to stem from unsupported tool calling within our current model configuration.

Despite this new technical hurdle, we remain optimistic because we have a clear path forward for testing. We hypothesize that the root cause is the model itself rather than the framework, so our immediate next step is to swap to a different LLM better optimized for tool usage to test this hypothesis.

Atlantis: A Major Milestone in Local Execution

We achieved a major breakthrough this week with our second CRS, Atlantis. We had previously decided to isolate and focus exclusively on the primary subsystem, CRS-multilang, since it was the main contributor to Atlantis’s performance and focusing on it allows us to scale down our infrastructure overhead. This week, we managed to get CRS-multilang fully running locally without reliance on external LLMs, and critically, the subsystem successfully identified the vulnerability we injected. This is a significant proof-of-concept that validates the core architecture of our work.

We also had another productive meeting with Dongkwan, who provided more specifications regarding the compute resources he used to achieve stable local execution of CRS-multilang. Additionally, he shared details about a new, open-source CRS orchestration framework he is currently developing that is designed to be significantly easier to use.

With the CRS-multilang baseline logic now proven, our roadmap for next week is clear: we will integrate local Ollama LLMs to enhance the system’s reasoning capabilities before migrating to HiPerGator to leverage university computing resources for scale.

RAG Re-Ranker Development: More Corpus

Parallel to our systems work, we continued to mature the Retrieval-Augmented Generation (RAG) pipeline. This week, we significantly expanded our dataset, ingesting even more specialized cybersecurity corpus data. With the vector database now more comprehensive, our focus has shifted toward the generative component of the architecture. We have started researching and evaluating specific LLMs to determine which model will best handle the nuances of re-ranking vulnerability reports, ensuring the final output is both accurate and actionable for the user.

QRB1: Feedback and Project Validation

Finally, we completed our first Quarterly Review Board (QRB1) this week, where we presented our current project progress to a panel of faculty. The feedback was encouraging; the faculty were in unanimous agreement regarding our direction and the validity of our approach. Their primary concern centered on the difficulty of getting the Atlantis environment operational given its historical instability. However, given our recent success in isolating and running the CRS-multilang subsystem locally, we feel confident that we have already begun to mitigate this risk and are well-positioned to satisfy the board’s technical requirements moving forward.

More progress to come next week.


IPPD 2 Week 02: Pipeline Refinements and Strategic Deadlines

January 23, 2026

Buttercup: Improving the LLMs and Reevaluating Bottlenecks

With Buttercup successfully running in a local environment, our primary objective this week was to transition its internal LLM dependencies to HiPerGator. By migrating away from local hardware, we aim to leverage the university’s high-performance computing clusters to run more sophisticated models with greater efficiency.

We achieved a significant milestone by successfully standing up LLMs using Ollama on HiPerGator. As we continue to refine the interface between Buttercup and these remote instances, we utilized NaviGator (the UF LLM API) as an intermediate bridge to maintain development momentum.

Despite this upgrade in model access, we encountered the same hurdle: the system consistently times out after 30 minutes of execution. Our diagnostics suggest that the bottleneck is likely not the latency of the LLM calls themselves, but rather local processing speeds within the Buttercup environment that struggle to keep pace with the task requirements. Over the coming weeks, we will be brainstorming fixes to resolve these processing inefficiencies, ensuring that the system can sustain the long-running tasks necessary for thorough vulnerability analysis.

Atlantis: Debugging Build Scripts and Defining a Pivot Point

Progress on Atlantis continued this week as we implemented several architectural fixes recommended by Dongkwan, a member of the Atlantis team. These adjustments moved us a step closer to a functional environment: our build script now successfully generates two Docker images instead of one.

While the system is still failing to run completely, these incremental successes in the build process are encouraging. However, to ensure we maximize our output for the semester, we have established a hard deadline of February 10th. If we cannot achieve a stable, running version of Atlantis by that date, the entire team will pivot to focus exclusively on Buttercup. This strategic “fail-fast” approach ensures that we dedicate our resources to the most viable path forward well before the end of the term.

RAG Re-Ranker Development: Embedding the Corpus

Parallel to our work on system development, we made significant strides in our Retrieval-Augmented Generation (RAG) pipeline. This week, we pulled a dedicated cybersecurity corpus, processed the data, and successfully embedded it into our vector database. This RAG component will be vital for re-ranking the outputs of our two CRS systems, allowing the user to clearly see which vulnerabilities need to be prioritized.
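The post does not name the embedding model or vector database, so the sketch below swaps in a bag-of-words vector and cosine similarity purely to illustrate the chunk, embed, store, and retrieve cycle. `TinyVectorStore` and its methods are hypothetical; a real pipeline would replace `embed` with an embedding model and the list with a vector database.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """In-memory list of (chunk, vector) pairs standing in for a vector database."""

    def __init__(self) -> None:
        self.items: list = []

    def add(self, document: str, chunk_words: int = 50) -> None:
        # Split the document into fixed-size word chunks and embed each one.
        words = document.split()
        for start in range(0, len(words), chunk_words):
            chunk = " ".join(words[start:start + chunk_words])
            self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> list:
        # Rank every stored chunk by similarity to the query and return the top k.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

At re-ranking time, the retrieved chunks would be passed to the LLM as context alongside each CRS finding, so its judgment of severity is grounded in the corpus rather than in the model's memory alone.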

More progress to come next week.


IPPD 2 Week 01: Operational Buttercup and Next Technical Hurdles

January 16, 2026

Buttercup Progress and Local LLM Execution

This week marked a major technical milestone for our project. After continued debugging and refinement, we successfully got Buttercup running without deployment errors. Beyond simply standing up the system, we also integrated Buttercup with Ollama, enabling it to make local large language model (LLM) calls rather than relying on external APIs. With this integration complete, Buttercup is now fully operational and capable of beginning task execution.

With the system running, we validated Buttercup’s core functionality by initiating a sample task focused on scanning GitHub repositories for vulnerabilities and available patches. While the task executes as expected, we identified a new constraint: Buttercup currently enforces a 30-minute task timeout, and the sample task exceeds this limit. As a result, execution is cut off before completion. Although this is not a functional failure, it is a key limitation we will begin addressing next.

Team Atlanta Meetings and Deployment Planning

In parallel with Buttercup development, we continued progress on the Atlantis side of the project. We met again with a member of Team Atlanta, who provided valuable insight into Atlantis’s architecture and operational requirements. In particular, they shared estimates for the computational resources required to run specific Atlantis components.

Using the guidance from this meeting, we will refine our deployment strategy and begin addressing the remaining errors preventing Atlantis from running in a local environment. Their advice has helped us narrow our focus to the most critical components, allowing us to prioritize feasibility and efficiency rather than attempting a full-scale deployment prematurely.

Looking Ahead: RAG Database and Refinement

Looking forward, our next major initiative will be the development of a Retrieval-Augmented Generation (RAG) database. This will allow us to more effectively store, retrieve, and contextualize vulnerability information for downstream tasks. Combined with Buttercup’s now-functional execution pipeline, this represents an important step toward a more complete and scalable system.

Overall, this week represented a shift from foundational setup to identifying and resolving higher-level system constraints. With Buttercup operational, clearer expectations for Atlantis, and a concrete plan for RAG integration, we are well-positioned to tackle the next phase of development.

More progress to come next week.


IPPD 1 Week 14/15: SLDR Success, Professional Growth, and December Plans

December 5, 2025

System Level Design Review and Sponsor Engagement

This week marked one of the biggest milestones of the semester: our System Level Design Review (SLDR). Sponsor companies from across the program attended to hear about each team’s progress, ask questions, and provide direction as we move into the next phase of development. Presenting our project in front of industry partners was both exciting and insightful, and we received highly constructive feedback that will help us strengthen our technical approach and overall narrative moving forward.

Before the SLDR presentations began, we participated in an interactive keynote session where all students were placed into semi-randomized tables. This setup encouraged us to meet new people, collaborate with students outside our usual circles, and engage directly with company liaisons. The conversations were incredibly valuable, ranging from advice on soft skills and networking to perspectives on thriving within corporate environments. It was a refreshing reminder that effective communication and professional awareness are just as important as technical competence.

Looking Ahead: December Work and Cross-Team Collaboration

Although this will be our final blog post for the fall semester, our work is far from over. Throughout December, our team will continue pushing toward a significant milestone: getting Buttercup to run a sample task fully end to end using Ollama for local LLM interactions rather than relying on API keys for private models. With local inference now more feasible, this shift should streamline development and make the system easier to test and iterate on.

On the Atlantis side, we are maintaining ongoing meetings with Team Atlanta to deepen our understanding of the system’s structure and component relationships. Their guidance has been instrumental in helping us diagnose build errors and determine which modules we should extract and scale down for our own implementation. These conversations will continue to shape our plan for setting up a clean, minimal Atlantis environment.

Closing Out the Semester

With SLDR behind us, our team feels both energized and focused. The feedback we received, both technical and professional, has given us a strong sense of direction heading into winter break. We look forward to making meaningful progress in December and starting the spring semester with a more mature, better-understood pipeline.

More updates to come in January!


IPPD 1 Week 13: Deployment Progress and Exciting Meetings

November 21, 2025

Peer SLDR Session and Presentation Feedback

The Peer System Level Design Review (SLDR) was this week! Fellow students and project coaches reviewed our presentation ahead of the formal SLDR. The feedback was encouraging and highlighted noticeable growth in both our project clarity and our communication skills since the Preliminary Design Review (PDR). Reviewers pointed out that our system understanding and overall narrative are much stronger now, and the session helped us identify a few areas to refine before the official review. Overall, the Peer SLDR served as a valuable checkpoint and gave the team more confidence moving into the next stage of the process.

Advancing Buttercup Toward Full Local Execution

This week marked major progress on the Buttercup side of the project. All remaining local deployment errors were resolved, allowing the system to start up cleanly and begin processing a test task. The run ultimately failed only because a placeholder credential was used, which confirms that the full task flow is nearly operational. To support this work, we also set up a local LLM environment and verified that lightweight models can be prompted successfully. This positions us well for integrating model inference directly into Buttercup’s pipeline.

While troubleshooting, we noticed recent upstream updates in the Buttercup repository that addressed some of the same issues we had already fixed independently. This reinforced the importance of monitoring active repositories more frequently, since doing so helps prevent redundant work and keeps our local setup aligned with ongoing development activity. Moving forward, checking for updates will become a regular part of our workflow.

Progress on Atlantis and Cross-Team Engagement

In parallel with Buttercup, the team made progress in understanding and preparing the Atlantis stack. We met with a team member involved with another Atlantis effort and gained clarity on where to begin and how the major components relate to one another. Based on that discussion, we contacted a contributor from another part of the Atlantis ecosystem and began arranging a meeting to better understand one of the build systems and its role in the workflow. These conversations are helping us map out the architecture and identify the best sequence for bringing the necessary pieces online.

On the technical side, early work has begun on addressing version mismatch issues that are affecting one of the Atlantis build processes. Although this work is still in progress, the team now has a clearer understanding of the problem thanks to the discussions held this week.

Next Steps: Task Execution and Meeting Team Atlanta

In the coming week, the focus for Buttercup will be completing an end-to-end run using a small LLM and verifying that the entire pipeline functions as expected. The team will also continue testing collaboratively to confirm consistent behavior across environments. For Atlantis, the next steps involve meeting with the external collaborator, gaining a clearer picture of its structure, and continuing to resolve the configuration issues that are preventing a successful local build. With Thanksgiving coming up next week, our team will take a short pause and return to our regular work the following week.


IPPD 1 Week 12: Progress on Running CRS Modules

November 14, 2025

Running Buttercup: Holistic Approach

Given that Buttercup is relatively lightweight compared to other CRSs, our approach focuses on bringing the full system up at once rather than isolating individual modules. Earlier in the week, about half of its Kubernetes pods were failing during local deployment. Through targeted troubleshooting, we reduced this to only a small set of remaining pods, all tied to what appears to be a common underlying issue now under investigation.
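When many pods fail at once, listing each unhealthy pod with its waiting reason quickly shows whether they share a cause. The sketch below is a hypothetical helper that reads the JSON printed by `kubectl get pods -o json`; it relies only on standard Pod status fields (`status.phase`, `containerStatuses[].ready`, `state.waiting.reason`). The pod names in any example would be placeholders, not Buttercup's actual services.

```python
import json


def failing_pods(pod_list_json: str) -> dict:
    """Map each unhealthy pod to its container waiting reasons (or its phase)."""
    problems = {}
    for pod in json.loads(pod_list_json).get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed Jobs are expected to have stopped containers
        reasons = []
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                reasons.append(waiting.get("reason", "Waiting"))
            elif not cs.get("ready", False):
                reasons.append("NotReady")
        if phase != "Running" or reasons:
            problems[pod["metadata"]["name"]] = reasons or [phase]
    return problems
```

Feeding it the output of `subprocess.run(["kubectl", "get", "pods", "-o", "json"], capture_output=True, text=True).stdout` and tallying the reasons (for example with `collections.Counter`) makes a single shared failure mode stand out.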

Running Atlantis: Modular Approach

Atlantis is significantly more complex, so our strategy focuses on bringing key modules online first rather than running the entire system end-to-end. We have identified two core components that handle the majority of the vulnerability-detection workload, and these remain our primary area of effort.

Challenges in Local Deployment

While we are making steady progress toward getting both CRSs running locally, several challenges remain. These systems were originally designed for competition environments rather than long-term use, which means some of their dependencies are outdated or inconsistent. As a result, we encounter issues that require additional troubleshooting. Build times also present a bottleneck: Buttercup can take roughly an hour to build, and major components within Atlantis often take even longer.

Next Steps: Resolving Errors

The immediate goal is to resolve the remaining local deployment issues so that both CRSs can start running. For Buttercup, we will aim for full system execution, while for Atlantis, we will focus on its critical modules. Once this foundation is in place, we will implement modifications to support open-source models and integrate with HiPerGator. Our current plan is to operate the CRSs locally while offloading LLM workloads to HiPerGator. This approach allows us to continue using Docker and Kubernetes without re-architecting the CRSs to fit HiPerGator, which would not align with Raytheon’s operational environment.


IPPD 1 Week 11: Prototype Inspection and PDR

November 7, 2025

Prototype Inspection Day (Gainesville, FL)

Our team setting up the environment to run our prototype live at the University of Florida, taking the next steps towards developing and showcasing our prototype.

Preliminary Design Review (Largo, FL)

Our team at Raytheon’s Largo office following our Preliminary Design Review presentation.

Prototype Inspection Day

On Tuesday, November 4, 2025, we attended Prototype Inspection Day, presenting our Streamlit-based prototype to three sets of judges. The prototype pulls from a GitHub repository and uses a single LLM call to identify vulnerabilities and recommend fixes. While we are still actively working to get the Cyber Reasoning Systems (CRSs) running, this initial version allowed us to focus on the user interface and gather valuable feedback. The judges provided helpful suggestions, which we quickly applied to improve our project ahead of our upcoming presentations.
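The prototype's code is not shown in this post, but the "single LLM call" design boils down to packing repository source into one prompt. The sketch below assumes the repository has already been cloned locally; the file-extension filter, character budget, and prompt wording are all hypothetical, and the actual model call is omitted.

```python
from pathlib import Path

# Hypothetical filter: which files count as source code worth reviewing.
SOURCE_SUFFIXES = {".c", ".h", ".cc", ".cpp", ".py", ".java"}


def build_review_prompt(repo_dir: str, max_chars: int = 20_000) -> str:
    """Concatenate source files from a cloned repo into one vulnerability-review prompt."""
    header = ("You are a security reviewer. List likely vulnerabilities in the code below, "
              "citing file names, and suggest a fix for each.\n\n")
    body, used = [], len(header)
    for path in sorted(Path(repo_dir).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        snippet = f"### {path.relative_to(repo_dir)}\n{path.read_text(errors='ignore')}\n"
        if used + len(snippet) > max_chars:
            break  # stop before exceeding the prompt budget
        body.append(snippet)
        used += len(snippet)
    return header + "".join(body)
```

The character budget is the obvious weakness of this approach: large repositories get truncated, which is one reason full CRSs split the work into many targeted LLM calls instead.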

Preliminary Design Review (PDR)

On Thursday, November 6, 2025, our team traveled from Gainesville, FL to Largo, FL to present our PDR at Raytheon’s office. This was a great opportunity to meet our liaisons, executive sponsors, and other Raytheon members in person. During the visit, we also had the chance to see projects from other teams, including those from other schools, and it was interesting to learn about their work and approaches. We had a great time meeting everybody, receiving feedback, and exchanging ideas.

Looking Ahead: SLDR

With the PDR behind us, we will now focus on our System Level Design Review (SLDR), which is coming soon. To prepare, we are focusing on getting the CRSs running without errors. We are also finalizing design decisions, taking into consideration all the feedback received this week. We have much to look forward to.

See you next week!


IPPD 1 Week 10: Let’s Get This Running

October 31, 2025

GatorDetect Weekly Update: Presenting at UF AI Days and Advancing System Deployment

This week was an exciting milestone for our team as we participated in UF AI Days, where we presented our project poster and shared our ongoing research with peers, faculty, and industry guests. It was a great opportunity to communicate the broader goals of our work, receive valuable feedback, and gain insights into how our project fits within the larger landscape of AI innovation. The event also gave us the chance to reflect on how far we have come in developing our system.

Attempting to Run Cyber Reasoning Systems Locally

Outside of AI Days, our team made significant progress in deploying cyber reasoning systems locally. At this point, we are shifting from research to software development. Much of our focus this week was on getting key modules to run in local and containerized environments. We successfully built the containers for Buttercup and began running the system locally. However, several of the Kubernetes pods failed during startup, which we traced back to repository migration issues.

We also made progress in terms of utilizing open-source LLMs. We now have a much clearer picture of how we could leverage HiPerGator to run LLMs and how we can alter the source code of the cyber reasoning systems to make them use open-source models instead of private ones that require API keys. At the end of this week, it has become clear that we are moving towards stable, repeatable system runs with minimal cost.

Next Steps: Refining Deployment

Next week, we plan to continue debugging and refining the deployment of the cyber reasoning systems so that key components run smoothly in our local environment. We will keep working to get more components running successfully.

Being hands-on with the cyber reasoning systems has introduced unexpected challenges, but our team remains flexible and adaptable. Each week helps us better understand the deployment process and strengthen the overall reliability of our system. We are excited to keep building on this momentum as we refine our system and move closer to a fully operational setup.

See you next week!


IPPD 1 Week 09: Concept to Prototype

October 24, 2025

GatorDetect Weekly Update: Preparing for Prototype and Presentations

This week, we focused on two major topics: diving deep into the implementation of our CRSs and preparing for a significant upcoming presentation. Our focus is primarily on translating our research into a presentable and functional prototype to display our work and product.

Deepening Our CRS Research

As a team, our individual CRS studies continued this week with a focus on a practical and reasonable implementation plan. For Team 42, we are learning how to run its individual and dependent modules in a required sequence on HiPerGator without manual queries. This involves designing a workflow using SLURM requests to manage the execution order, which is the critical step for getting this CRS running.
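One standard way to enforce execution order on a SLURM cluster is job dependencies: each job is submitted with `--dependency=afterok:<previous_job_id>`, and `--parsable` makes `sbatch` print just the job ID. The sketch below chains a list of batch scripts this way. The script names in any usage are hypothetical (Team 42's actual modules are not listed here), and the runner is injectable so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def sbatch_runner(cmd: List[str]) -> str:
    """Run a real sbatch command and return its stdout (requires a SLURM cluster)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def submit_chain(scripts: List[str], run: Runner = sbatch_runner) -> List[str]:
    """Submit batch scripts so each starts only after the previous one succeeds."""
    job_ids: List[str] = []
    for script in scripts:
        cmd = ["sbatch", "--parsable"]
        if job_ids:
            # afterok: start only if the previous job completed with exit code 0.
            cmd.append(f"--dependency=afterok:{job_ids[-1]}")
        cmd.append(script)
        # --parsable prints "jobid" or "jobid;cluster"; keep only the ID.
        job_ids.append(run(cmd).strip().split(";")[0])
    return job_ids
```

Because all jobs are queued up front, the whole pipeline is submitted in one step and SLURM handles the sequencing; if an early stage fails, the later ones never start.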

At the same time, we are making progress with Team Buttercup by exploring its lightweight laptop version, which is proving to be an invaluable tool, allowing us to rapidly test our integration logic before deploying onto the HiPerGator environment.

Preparing for Key Milestones

With November rapidly approaching, we are actively preparing for two major events. First, we are building our presentation for our visit to Raytheon on Nov 6th. We are excited to share our progress and get direct feedback from our sponsors as well as to interact and network with professionals.

We are also gearing up for our upcoming System Level Design Review (SLDR) day. A significant part of our preparation for both milestones involves developing a functional prototype that we can use for demonstrations.

Next Steps: A Working Prototype

Our most important immediate goal is to get a working prototype up and running. This effort combines our research on Team 42 and Buttercup and is essential for both of our presentations. We are focused on creating a system that demonstrates our core vulnerability discovery concept via open source repositories and sets the stage for the next phase of development.

See you next week!


IPPD 1 Week 08: AI Days and Implementation

October 17, 2025

GatorDetect Weekly Update: Integration and AI Days Prep

This week our team worked on development and integration as we began the critical process of modifying selected CRSs to work with HiPerGator. Alongside this technical work, we also started preparations to showcase our project at the upcoming UF AI Days.

CRS Integration and Adaptation for HiPerGator

Our primary focus has been on the technical challenge of integrating our chosen CRSs. A significant part of this effort involves adapting these systems, which were originally designed to run in a Kubernetes (k8s) environment, to operate on HiPerGator, which doesn’t support k8s. This step is crucial because it allows us to use the LLM resources available at UF, a core objective of our project.

Deepening Research and Local Testing

We are continuing our deep dive into the architectures of promising CRSs, with a focus on Team 42 and Buttercup. To accelerate our development and testing cycles, we are exploring Buttercup’s lightweight, standalone version that can run on a laptop. This will allow us to test core functionality and debug our integration logic efficiently before deploying the full-scale system on HiPerGator.

Looking Ahead: UF AI Days

We are excited to announce that we will be presenting our work at the upcoming UF AI Days. This is an amazing opportunity for us to share our research and the potential of our CRS platform with the broader university community. This week, we officially started designing our poster for the event and look forward to sharing more details soon.

See you next week!


IPPD 1 Week 07: Presentations and Coding

October 10, 2025


GatorDetect Weekly Update: PDR Peer Review & Preparing for Testing

This week marked a major milestone for our team as we presented our Preliminary Design Review (PDR) and finalized key project logistics and travel plans. The feedback from our peers and coaches was invaluable and will help us improve our future presentations. Our focus is now shifting toward the testing phase of the project.

PDR Presentation and Feedback

A major highlight this week was presenting our PDR at the peer review session. This was an amazing opportunity to share our project’s vision, architecture, and progress with our peers and coaches. We received constructive feedback on our work and answered questions that sparked new ideas, and the insights we gathered will be essential in refining our design as we move forward.

Continued Research and Finalized Plans

On the technical side, we continued our deep dive into potential CRSs, focusing this week on understanding the architecture of Team 42. We also finalized our travel plans for our upcoming site visit, locking in the logistics for what is sure to be a productive and exciting trip.

Next Steps: Sourcing Code for Testing

With our initial research and planning phases solidifying, our next crucial step is to prepare for CRS implementation and application. We have started discussions about getting sample source code from Raytheon for our testing purposes. Access to this code will be critical for validating our system’s effectiveness and is a key step in bringing GatorDetect to life.

See you next week!


IPPD 1 Week 06: Big Changes

October 3, 2025


Weekly Update: Deepening Our Research and Planning Our Build

This week, we focused on expanding our technical knowledge by solidifying our current toolset and planning for key future events. We explored a new CRS, confirmed some critical implementation details for the Team Buttercup CRS, and made important decisions about our project’s future infrastructure plans.

Expanding Our Potential CRSs

Our primary focus was on researching a new, simpler CRS option and how it might be integrated. We took a deep dive into 42-B3ond-6ug, analyzing its architecture and its potential for use in our system. Alongside this new research, we had a significant breakthrough with one of our initial choices, Buttercup, confirming that it has a standalone version that can run on a laptop, which is great for testing purposes. We also began outlining our implementation strategy and generating initial concepts for how these systems will work together.

Infrastructure Decisions and Industry Visit

On the infrastructure side, we explored the possibility of running Kubernetes on Raspberry Pis. This is a crucial decision that will guide our system’s deployment strategy and concept generation. We also reviewed Team 42’s resource usage to make sure our resources are allocated properly, since some of these CRSs make an extremely large number of LLM queries. We are also excited to announce our upcoming trip to Raytheon in Largo, Florida, planned for Nov 6th, which will be a great opportunity to see our sponsor company up close.

Next Steps: From a Concept to a Creation

This week’s research has been incredibly productive, clarifying our path forward. With key decisions made about our CRS choices and infrastructure, our focus is now shifting from planning to hands-on creation. We are moving into the concept generation and implementation phases, ready to start building the core of GatorDetect.

See you next week!


IPPD 1 Week 05: Big Changes

September 26, 2025


Team Name Change: From CipherPilot to GatorDetect

This week was an important moment for our team as we officially changed our name from CipherPilot to GatorDetect. This new name better reflects our goal and focus of detecting issues in source code using AI systems. We are excited to carry it forward as we dive into the core of our project integration.

Evaluating Our Options: CRS Integration

With our primary CRS selections from last week, Team Atlanta and Buttercup, in place, our research efforts this week pivoted to a deeper technical exploration of their architectures and the feasibility of integrating them. We are now working out how to integrate the two systems to perform both vulnerability detection and patch proposal. The goal is to develop a mechanism in which each CRS finds vulnerabilities and the two systems work together to provide a patch that strengthens the source code. This is a crucial step for our project’s success, and our team is studying the specifics of each CRS to prepare for this integration. We are also exploring how to modify these systems to be compatible with the UF LLMs.

Product Design Specification Draft

Our key milestone this week was the creation of our Product Design Specification (PDS) draft. This document outlines the project’s technical requirements, design expectations, concepts, and deliverables. We have defined the scope of work, which includes learning about the teams in DARPA’s AI Cyber Challenge (AIxCC), selecting our CRSs, and creating a plan for their integration.

Next Steps: GatorDetect Shifts to Building

This was a very productive week that shifted our focus from planning to the hands-on technical details of the project. We have laid the groundwork for our project and are now ready to start the building and testing phases. We are looking forward to applying our current knowledge of AI and cybersecurity to bring our vision to life.

See you next week!


IPPD 1 Week 04: Final Planning

September 19, 2025


Evaluating our Options: CRS Selection

With our primary CRS (Team Atlanta) already selected, this week our research efforts focused on evaluating two potential secondary CRSs, Theori and Buttercup. After a thorough analysis of their capabilities, their scores in the AI Cyber Challenge (AIxCC), and their compatibility with our project goals, our team is currently leaning toward selecting Buttercup as our second CRS. This preliminary decision was based on Buttercup’s architecture and performance, which seem to complement the Team Atlanta CRS.

Team Development: FPL White Belt Training

Professional development was a key focus this week. Three of our team members attended the White Belt Quality Training hosted by Florida Power and Light. The workshop provided us with valuable insights into quality principles and collaborative team dynamics, which we are already applying to structure our workflow and enhance our group’s efficiency and communication.

Next Steps: CRS Integration Preparation

With our CRS selection nearly finalized, our next step is to clone their Git repositories. Our primary goal is to begin a deeper dive into their core architecture, documentation, and APIs. This technical exploration is the critical first step in understanding how we can practically integrate these CRSs with UF LLMs.

This week was a productive shift from basic team planning to more technical planning. We are excited to get our development environments configured and start building.

See you next week!


IPPD 1 Week 03: Preliminary Work

September 12, 2025


Raytheon Technologies. Raytheon Technologies Corporation logo (2020). 3 Apr. 2020. https://www.rtx.com/ (Raytheon Technologies website): https://www.rtx.com/-/media/project/united-technologies/shared/images/rtx_logo.svg, Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Raytheon_Technologies_logo.svg.

Meeting our liaisons: Introductions

This week, we had the opportunity to meet with our two Raytheon liaison sponsors, Randall Brooks and Sylvia Traxler. This meeting marked our first key milestone: we discussed our project in detail, asked questions that clarified the project expectations, and brainstormed ideas for how to approach the required objectives. The guidance from our liaisons will be essential in helping us navigate the complexities of integrating CRSs with our UF LLMs.

Diving into CRSs: Our Initial Research

This week, we began researching the CRSs from DARPA’s AI Cyber Challenge (AIxCC). In particular, we focused on systems developed by Team Atlanta, Theori, and Trail of Bits.

Each of these CRSs brings its own strengths:

• Team Atlanta: the winning approach at the AIxCC

• Theori: known for innovative vulnerability detection strategies

• Trail of Bits: a strong focus on secure development and practical implementation

Exploring these tools has given us a better understanding of the possibilities and challenges ahead as we prepare to select, configure, and test them with UF LLMs.

Next Steps: Starting our Development

With the first meeting with our liaisons completed and our initial research in progress, we are now shifting toward identifying the two CRSs that we will be integrating. At the same time, we are refining our team management system to keep the workflow structured and on schedule.

Reflecting on this week, we have learned a lot, and the information from our liaisons cleared up much of our early confusion. This project is beginning to take shape, and we are excited to continue our work in the weeks ahead.

See you next week!


IPPD 1 Week 02: Getting Started

September 5, 2025


Getting Started: Diving into AI-Powered Code Development

Hi everyone! Our team of four is beginning our partnership with Raytheon on the Artificial Intelligence (AI) Assisted Source Code Development and Analysis project. With guidance from our coach Dr. Andrea Goncher and support from Raytheon’s liaison engineers, we’ll be working with Cyber Reasoning Systems (CRSs) and exploring how AI can improve secure, memory-safe coding practices.

Waiting for Our First Meeting: Setting Expectations

We haven’t yet connected with our liaison partners or our coach, Dr. Andrea Goncher, but we’re using this time to prepare. These upcoming meetings will be crucial for aligning our understanding of the project scope, clarifying responsibilities, and mapping out our approach to tasks like integrating CRSs, experimenting with AI-based code generation, and testing secure code translations across different programming languages.

Understanding the Project: Scope and Objectives

We reviewed the project scope from Raytheon to get our bearings. The main components include researching CRSs from DARPA’s AI Cyber Challenge (AIxCC), selecting and configuring at least two CRSs to run in a UF-supported computing environment, and developing AI-generated code by translating Ada to C++ and Rust while intentionally introducing specific security weaknesses (CWEs) for testing purposes.

We will also be evaluating how effectively these CRSs can identify certain vulnerabilities and recommend fixes. It is a technically challenging project that combines AI, cybersecurity, and software engineering in ways we haven’t done before.

Early Planning: Getting Organized

Without any formal guidance yet, we have started some preliminary planning. We are discussing task management approaches, working out our weekly meeting schedule, and considering how to divide responsibilities among team members. We know that we will need a strict method to track our progress on code analysis, integration work, and testing milestones to keep everything and everyone coordinated.

We may not have had our key meetings yet this week, but we are looking forward to the technical challenges ahead. This project involves new technology and methods for us, and will push us to learn new skills. We will keep you updated on our progress, the obstacles we encounter, and what we learn along the way.

See you next week!