Blog Posts

Week 11: Testing for Testing’s Sake

Following up on our plans from week 7 about using a 3D printed model of a tower in order to validate our results, we are glad to report that the model has been printed, and the process for setting up those tests has begun. We have not yet trained a model on data for the tower, but just for fun (and to see if it wanted to make our lives easier by just working immediately), we decided to run our object detection and tracking algorithm on a video of the tower model:

Video showing object detection and tracking on 3D printed tower analogue

As you can see, it unfortunately is not working perfectly immediately. I’ll emphasize that this video is a very different environment from the videos our object detection algorithm was trained on, so the fact that it’s performing this well at all is actually rather surprising. If it can do this well immediately, imagine how well it would do if we actually gave it some applicable training data.

If for whatever reason we really needed to use this model as-is, its biggest problem, picking up objects in the background, could also be fixed by simply setting up a piece of paper or a similar backdrop behind the model and keeping it facing the camera to block out anything behind it.

So while these results aren’t particularly impressive without context, they nonetheless mark an important milestone, and help us to know where we need to focus our efforts in order to keep up with our plans for the future.

Week 10: Tracking our Progress

This week, we’re showing off the improvements we made to our object tracking algorithm. If you’ll recall, we previously encountered issues with it reclassifying antennas, resulting in several IDs for the same antenna. Though this could be addressed after the fact, with an algorithm that recognizes when multiple IDs correspond to the same antenna, that approach would be rather complex and time-consuming. So instead, we focused on the object detection and tracking algorithm itself, hoping to at the very least reduce the amount of post-processing we would need to do. Here are the results:

Output of tracking algorithm (at x2 speed)

As you can see, the algorithm now does a far better job of keeping IDs consistent. There was one period where it lost track of an antenna for some time and, as a result, assigned it a new ID. That is more likely to be the fault of the object detection algorithm than the object tracking, and is something that can be fixed by adding more training data to that model (something either we or Verizon would need to do at some point regardless, if this project were ever to be implemented on a large scale).

You may also notice that when the drone completes a full loop and starts passing the same antennas a second time, those antennas are also assigned new IDs. This is something we have planned to address for a while now. If you scroll back to week 8, you’ll see the prototype interface for uploading a video, and under the ‘Description’ field, there is a drop-down with numbers. This is a field for the user to submit the number of antennas on the tower they’re examining. With this information, it should be a simple matter to have our algorithm restart the IDs back at 1 once they reach that maximum value.
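As a rough illustration, the wrapping could look something like this minimal sketch, assuming the tracker hands out sequential integer IDs and the user-supplied antenna count is available (the names here are illustrative, not our actual code):

```python
# Minimal sketch: fold an ever-increasing tracker ID back into 1..num_antennas.
# `num_antennas` would come from the drop-down on the upload form.
def wrap_track_id(raw_id: int, num_antennas: int) -> int:
    return ((raw_id - 1) % num_antennas) + 1

# Example: with 3 antennas, raw IDs 1..6 map to 1, 2, 3, 1, 2, 3.
print([wrap_track_id(i, 3) for i in range(1, 7)])
```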

Depending on the performance of our angle calculation, and how much data we need from it to get accurate results, we can also simply restrict the input video to include only a single loop of the tower.

Week 9: Setting the Record Straight

Having completed camera calibration last week, we can now begin applying those results to our angle calculation, to hopefully see more accurate results.

Pair of images before undistorting
Pair of images after undistorting

Shown above are a pair of images at two different steps in the angle calculation pipeline. Though the difference is not that significant at first glance, you’ll notice that the second set of images are not perfect rectangles; the frame is slightly curved, particularly near the corners. This is the result of undistorting the images, straightening them out to account for the distortion caused by the camera lens.

While the difference is not all that visually significant, in the context of angle calculation a pixel being slightly off could translate to a difference of several feet, which would then throw off our results. To ensure our results are as accurate as possible, this is an important and necessary step.
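For a sense of what this step involves, here is a minimal sketch using OpenCV, assuming the calibration from week 8 gave us a camera matrix and distortion coefficients (the values and file names below are placeholders, not our actual calibration results):

```python
import cv2
import numpy as np

# Placeholder intrinsics from calibration: camera matrix K and distortion coefficients.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.1, 0.01, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

frame = cv2.imread("frame.png")
h, w = frame.shape[:2]

# Refine the camera matrix so the undistorted image keeps all pixels (alpha=1);
# this is what produces the slightly curved borders visible above.
new_K, _ = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), 1)
undistorted = cv2.undistort(frame, K, dist, None, new_K)
cv2.imwrite("frame_undistorted.png", undistorted)
```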

Week 8: Calibrating & Interfacing

Last week, we received video from Verizon for calibration. The video shows a checkerboard pattern; some frames from it are shown below.

Frames from calibration video

From this video, we can use software to detect the checkerboard corners; then, knowing that those corners should form straight lines with each other, we can calculate how much the lens is distorting the image. By performing this calculation across many frames of the video, we get the parameters describing the lens distortion, which we can then apply to our other videos.
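In outline, that process looks something like the sketch below, using OpenCV’s checkerboard calibration; the board dimensions and file name are assumptions for illustration, not the actual values from Verizon’s video:

```python
import cv2
import numpy as np

# Inner-corner count of the checkerboard (an assumed 9x6 board for illustration).
pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
cap = cv2.VideoCapture("calibration_video.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)
cap.release()

# K is the camera matrix; dist holds the lens-distortion coefficients
# that get applied to the other videos (as in week 9's undistortion step).
_, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```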

We also created a draft of our basic interface for the project, which will allow users to upload their video and run it through the pipeline.

Upload interface draft, and output table with sample values

This allows us to better plan what information we can request from the user (particularly the number of antennas) and how we want to present our results. It also means once our pipeline is complete, it should simply be a matter of plugging it into the back end, which will aid in testing.

Week 7: Reviewing our Work

This week, we did our second qualitative review board presentation. The biggest takeaway was ensuring that we have a good way to test and validate our process, in order to both ensure it works, and determine where any issues may be.

We have a couple of approaches to how we can demonstrate the performance of our pipeline. First is, of course, just running the pipeline on the data, and comparing the final values to the true orientation of the antennas. We recently received data from Verizon documenting the orientations of all the antennas in the videos they sent, according to the third-party software they have been using in the past.

But what if our results aren’t immediately perfect? (Hard to imagine, I know.) Well then we’ll need to examine the individual steps of the pipeline to discover where the error is coming from. In most cases, this can be done by examining the intermediary results between these steps. Check how often the object detection is missing the antenna, make sure the 3D tagging isn’t losing track of anything, etc. Where this becomes challenging is in the angle calculations, and particularly in the Structure from Motion portion of the algorithm.

Structure from Motion relies on several parameters that can act as points of failure, with few if any meaningful intermediary results between them. The most notable risks among these are in the camera calibration step. The goal of this step is to determine how the intrinsic features of the camera itself (zoom, lens distortion, etc.) are affecting the image, so that we can correct for them and know exactly where a given pixel refers to. The problem is that we don’t have a drone with which to perform this calibration ourselves, and we don’t know how much these parameters might change between cameras, even of the same model, or with different camera settings.

We could use a camera we do have, and perform the calibration on that, but since we can’t use that camera to record the cell towers, we wouldn’t be able to demonstrate much besides just that Structure from Motion, as a baseline concept, works (which has already been pretty well proven by countless others).

So if we can’t go to the towers, we’ll simply bring the tower to us.

3D model of tower analogue

In order to test the accuracy of the camera calibration data we’ve been supplied by Verizon, we can run our pipeline using a known camera, taking video of a rough analogue of a cell tower that we designed. The 3D model shown above will be 3D printed, so that if need be, we can see how the pipeline performs using a camera for which we have done the calibration ourselves. Each antenna on the model is at an angle we can measure, allowing us to compare the final results of the pipeline to the true value. This even provides some value over the data supplied by Verizon’s third party company, as that is already an approximation. This allows us to be even more confident that our results are accurate, and within the necessary parameters.

Week 6: A Good Fit

This week, we’ve made quite a bit of progress in integrating our object detection and 3D tagging algorithm with our angle calculation feature. Our project relies on tracking individual antennas and assigning points as belonging to one antenna or another, in order to draw a plane of best fit through each antenna’s points and gather a series of estimates to average into one conclusion about that antenna’s orientation.
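To give a sense of the plane-fitting step, here is a minimal sketch using an SVD-based fit, assuming each antenna’s points arrive as an (N, 3) array; reading the tilt off the plane normal relative to vertical is an illustrative choice, not necessarily our final definition of orientation:

```python
import numpy as np

def fit_plane(points: np.ndarray):
    """Fit a plane of best fit to an (N, 3) array of points for one antenna."""
    centroid = points.mean(axis=0)
    # The plane normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]

def tilt_degrees(normal: np.ndarray) -> float:
    """Angle of the plane normal relative to vertical, as one possible orientation measure."""
    cos_angle = abs(normal @ np.array([0.0, 0.0, 1.0])) / np.linalg.norm(normal)
    return float(np.degrees(np.arccos(cos_angle)))
```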

Pointcloud generated for set of 3 antennas, with individual antennas differentiated by color.
Planes of best fit for each antenna

You may notice that the pointclouds do not currently resemble antennas. This is because we still need to fine-tune the margin around the bounding box that determines which points are considered part of the antenna.

Image showing color-coded key points assigned by the algorithm

We anticipate that many of the keypoints we will use to detect the antenna position will be around the edges of the antenna, as the central portion of the antenna has a consistent texture and appearance, which is difficult for keypoint detection to use. So to ensure we are getting all the keypoints we can, we add a slight margin around the bounding box within which points are still considered part of the antenna. That way, even if the bounding box is slightly too small, or barely excludes an edge, we will not miss out on data.

The drawback is shown in the results above: with a margin that’s too large, the algorithm includes keypoints from the tower behind the antenna, which throws off our results.
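The margin check itself is simple; the sketch below assumes axis-aligned boxes stored as (x1, y1, x2, y2), keypoints as (x, y) pixel coordinates, and a placeholder margin value that we would have to tune:

```python
def in_expanded_box(point, box, margin=10):
    """Return True if a keypoint falls inside the bounding box grown by `margin` pixels."""
    x, y = point
    x1, y1, x2, y2 = box
    return (x1 - margin <= x <= x2 + margin) and (y1 - margin <= y <= y2 + margin)

# Keypoints passing this check get assigned to that antenna; too large a margin
# starts scooping up points from the tower structure behind it.
```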

Although it still needs improvement, this integration is nonetheless a marker of significant progress, and brings us one step closer to a complete pipeline.

Week 5: Getting Physical

An additional part of our project is a hardware component, running our object detection model on a Jetson Nano in real time so that the drone operators can tell if the video they’re collecting is working well with the model, or if they need to adjust the camera settings. This week, we’ve gotten the hardware set up and have begun testing how to run our software on the relevant hardware.

Image of hardware components including the Jetson Nano, portable monitor, webcam, and bluetooth keyboard.

One of the first issues we encountered was power. With the lower-level hardware, the device often wasn’t able to both power the various peripherals and run complex computations. This was addressed by simply using a Bluetooth keyboard (which has its own battery), freeing up enough power for it to operate normally. The final product will not need these peripherals anyway, as the relevant programs will already be loaded onto the device and will only need to run.

With the power concerns addressed, we were able to run a basic face detection algorithm:

Output of Jetson Nano running YOLOv8 Face Detection

As you can see, the video runs quite slowly at the moment due to the processing required to run the machine learning model. To address this, we have a few options, including finding ways to simplify the model, or exporting it to another machine learning framework designed to run on lower-powered hardware, such as TensorFlow Lite.
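As a rough sketch of that export route, the Ultralytics tooling we’ve been using for YOLOv8 can export a model to TensorFlow Lite; the weights file below is a placeholder for whichever model we end up deploying, and whether quantization is acceptable is something we would still need to evaluate:

```python
from ultralytics import YOLO

# Load whichever trained weights we end up deploying (placeholder file name).
model = YOLO("yolov8n.pt")

# Export to TensorFlow Lite for lower-powered hardware; an int8-quantized export
# could shrink it further, at some cost in accuracy.
model.export(format="tflite")
```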

Week 4: Tracking the Path Forward

This week, we made significant progress in our object detection and object tracking algorithms.

On object detection, with a batch of annotations on a new tower completed, we were able to run our algorithm on a tower with different conditions from the initial video. Not only are there more antennas on this tower, but the camera settings were different, resulting in the overall video being brighter than the previous one. Adding this to our training data should allow our model to function in a wider range of conditions.

We also tested with two separate types of model: One to detect any antennas, and another to differentiate antennas viewed from the front from antennas viewed from the side.


Results of the two object detection algorithms.

The object detection algorithms had good results, with intersection over union values of over 90% for both models, and over 99% for the model to detect any antennas.
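For reference, intersection over union is the standard overlap metric between a predicted box and its ground-truth annotation; the sketch below shows that definition for axis-aligned boxes stored as (x1, y1, x2, y2), not our exact evaluation script:

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two axis-aligned boxes, each (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```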

We’ve also made headway in the object tracking algorithm. In order to keep track of which data is associated with which antenna, we need to track the bounding boxes across the image.

Clip from results of object tracking algorithm

As you can see, the algorithm does a pretty good job of tracking the antennas, but runs into some issues towards the edges when assigning the IDs. It seems that as an antenna rotates and approaches the point where the model should no longer register it, the detection flickers back and forth, which then causes the object tracking algorithm to assign a new ID.

Although this behavior can be reduced by improving the object detection model, to guarantee consistent behavior we will want to add some post-processing that interprets the raw IDs output by the algorithm and maps them to the actual antennas. For now, though, the fact that the algorithm is doing as well as it is is promising.
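One hypothetical form that post-processing could take is merging a newly issued ID into a recently lost track whose last box it heavily overlaps. The sketch below reuses the `iou` helper above; the data layout and thresholds are placeholders we would have to tune:

```python
def merge_flickering_ids(tracks, iou_threshold=0.6, max_gap_frames=15):
    """tracks: dict of track_id -> list of (frame_idx, box), sorted by frame."""
    id_map = {}
    for new_id, detections in tracks.items():
        first_frame, first_box = detections[0]
        for old_id, old_detections in tracks.items():
            if old_id >= new_id:  # assume IDs are issued in increasing order
                continue
            last_frame, last_box = old_detections[-1]
            gap = first_frame - last_frame
            if 0 < gap <= max_gap_frames and iou(first_box, last_box) >= iou_threshold:
                # Follow any earlier merges so chains of IDs collapse to one antenna.
                id_map[new_id] = id_map.get(old_id, old_id)
                break
    return id_map
```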

Week 3: Finding Where We Are

This week, we continued our work on the angle calculations, along with improving our object detection algorithm and making progress on the 3D tagging.

In the angle calculation algorithm, the next step after finding and matching keypoints is to use those keypoints in calibration and triangulation. The goal of calibration is to determine where in 3D space each picture was taken from; triangulation then determines where in 3D space each point would need to be in order to appear at its observed location in each image.

When a given point appears in two different locations in two images taken from different perspectives, there is only one spot in 3D space that point can be. If you know the locations the images were taken from, you can calculate it.

Traditionally, the camera’s location in both perspectives is calculated using the points themselves: based on how much the matched points move between the perspectives, you can infer how the camera moved. But these calculations may not be necessary. Because the drone collects metadata on its position and orientation from GPS and gyroscopes, we may be able to use that data directly to determine the positions.
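If that works out, the triangulation step could look roughly like the sketch below, assuming we can build each frame’s rotation and translation from the metadata and that `K` is the camera matrix from calibration (the names here are illustrative):

```python
import cv2
import numpy as np

def triangulate(K, R1, t1, R2, t2, pts1, pts2):
    """pts1, pts2: matched keypoints as (2, N) arrays of pixel coordinates."""
    # Projection matrices for the two camera poses derived from the metadata.
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])
    points_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous, shape (4, N)
    return (points_h[:3] / points_h[3]).T                 # (N, 3) points in 3D space
```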

Plot in 3D space of drone position according to metadata
Plot in 2D space of drone GPS coordinates

The primary concern with using this data is the potential for inaccuracy. As you can see in the images, the points tend to align along certain lines unnaturally. This indicates that the values stored in the metadata are rounded, introducing some amount of error. We won’t know how much this error will affect things until we get some initial results; however, we will have potential areas to address if those results need improvement.

Week 2: A Matter of Perspective

This week, we’ve made pretty significant progress on our angle calculation algorithm. This algorithm relies on a process called Structure from Motion (SfM). SfM relies on two separate perspectives of the same object, and by extracting key points of the object that are recognizably the same in both images, it can calculate where those points are in 3D space.

The first step in this process is finding and matching those key points, something we’ve been able to do as shown here:

Images that we are running our keypoint analysis on
Images overlaid with matched keypoints after running algorithm

Our algorithm has been able to find and match many points across the image; however, you may notice there aren’t many on the antennas themselves. While the other points will still be useful in the calculations to calibrate the camera properties and positions, one of the things we will need to do is ensure that we have enough information about the antennas themselves to reliably find their orientation. We have a few ideas for how to accomplish that, so stay tuned to see how those ideas perform.
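For anyone curious what finding and matching looks like in practice, here is a minimal sketch using ORB features and a brute-force matcher as stand-ins; the file names are placeholders, and the actual detector and matcher we use may differ:

```python
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute descriptors in both views.
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors and keep the strongest matches by distance.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Draw the top matches side by side, similar to the overlay image above.
overlay = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None)
cv2.imwrite("matches.png", overlay)
```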