This presentation elevates FRC vision from a hardware challenge to a rigorous architectural discipline by clearly separating local precision from global localization. It is an essential masterclass in engineering pragmatism that moves beyond technical specs to focus on functional system design.
Deep Dive
Beyond the Coprocessor: Lessons in Vision & Localization - FRC 6328 FIRST Championship Conference
Well, welcome everyone. Thank you very much for coming. This is, as the title says, Beyond the Coprocessor: Lessons in FRC Vision and Localization.
If you don't know who I am, my name is Jonah. I am the lead software mentor for FRC team 6328.
For a little bit of background on me, I've been involved in FRC for eight years as both a student and a mentor.
And in that time, I've worked on a whole variety of FRC vision systems, dating back to the retroreflective tape that used to be used on the fields, through to the AprilTags and object detection that are used in more modern games. And I've helped out both 6328 and lots of other teams of all kinds of different skill levels. So, what are we going to talk about today? Well, as some of you know, 6328 shares all of our work via Open Alliance. So all of the code that we write is public and we talk a lot about our process. And what that also means is that we get a lot of questions asking for advice about vision. And over the years we've learned a lot of lessons about what works well and what doesn't.
Many of those lessons have just become institutional knowledge for us. But they were never really collected anywhere that we could point others to as a resource. So this session is really my attempt to distill down some of those most important lessons that we've learned on 6328 both for us personally and in our experience helping all kinds of other teams.
So before we get started I do have a few words of warning just to set expectations. First, I expect you'll get the most out of this presentation if you already have a little bit of familiarity with the nuts and bolts of vision in FRC. There are some basics that are just better to learn by trying it yourself and experimenting with it.
So, I'm not going to start from absolute zero here. However, if you've used vision before and played around with it, but you're looking to go to the next level, this presentation is really for you. Now, one thing you might notice is that if you walk around the pits, you can talk to a hundred different teams and get a hundred different answers about how to design a vision system. But we've noticed that there are plenty of trends that hold true almost universally about the systems that work better than others. And my goal is to focus on those lessons that we've seen work consistently. So I'm also not going to talk a lot about specific hardware solutions, hence the name of the talk, Beyond the Coprocessor. I want to do my best to provide advice that is helpful regardless of whether you personally prefer oranges, apples, limes, or raspberries.
So let's get started. Now, I said we get a lot of questions about vision, so we can start there. Here's a few examples of some very common questions that we get on 6328. And what strikes me about all of these questions is that they are all missing a really important piece.
They're all asking about which of these options are better than others from a technological standpoint, but they skip the fundamental questions that should fuel that discussion. What is the purpose of vision? And what are we trying to achieve with this system? It's very easy to start designing a vision system from the bottom up, starting with hardware and cameras and filtering algorithms. But we've observed that the very best vision systems are always designed from the top down. And unfortunately, too many teams fall victim to the trap of black-box localization solutions or chasing FPS with alternative vision software. But none of that really matters if you haven't properly defined your goals to begin with. So let's start by trying to answer these questions. Now, of course, the purpose of vision can change dramatically depending on the team and the season and the robot, but let's look back at some recent games to get a little bit of a reference point. And here we're specifically thinking about what 6328 has been doing with vision because it varies a lot between teams.
Now, first, Charged Up in 2023. This was the first game with AprilTags, so a lot of teams were still really figuring out what to do with them and what the purpose of them could be. For us, we were focused on auto alignment to the nodes and substations, but also thinking about how to reliably navigate field obstacles in auto, like the much smaller cable bump that was an issue in auto that year. That came back for 2026, where they upgraded the size of the bump. Now, Crescendo is a little different than Charged Up because this was the first launching game with AprilTags. And in some ways that's a simpler problem to solve, but it also presented new kinds of opportunities. And this game in particular really demonstrated the importance of passing accurately from midfield, which is a very different kind of launching challenge than many other games. And vision for 6328 enabled some more subtle features where we could change the robot's behavior based on our position on the field in order to align with our strategic goals.
Reefscape was revisiting pick and place, and many teams started adopting auto-scoring that year, while it was fairly rare for Charged Up. On 6328 we took it further by trying to automate branch selection as much as possible. So we wanted to always be aware of which branches were closest and which would be fastest to drive to. Now, Rebuilt, 2026. For this game we've taken a lot of lessons from 2024, and it's very exciting. This is the first game where we've seen very widespread adoption of launching on the move, which requires very precise localization, and more and more teams are experimenting with that and finding success with it.
Okay. So the most important vision tasks here tend to fall in a consistent category which we can call local positioning. That means that in order to achieve these tasks, we need to determine the position of the robot relative to a particular field object like a scoring target. And this is really useful for automatic alignment.
And it also doesn't need to be particularly complex. I'll show a couple of examples of that later. But also keep in mind when we say precise local positioning, that can mean many different things. There are some years where we need under an inch of tolerance to accurately score on, for example, the reef in 2025. But in many launching games, six or more inches of precision is all that you need to accurately hit the target. Now, this category doesn't cover everything. So the next category is what we could call imprecise global positioning. For these tasks, we care about the robot's location relative to multiple field objects. For example, if we want to determine which field objects are closest to the robot, we don't need nearly as much precision for those kinds of tasks. Even up to a few feet often makes little difference, but those tasks are also much more niche, and you can often find highly competitive teams that don't pursue these goals at all. They're very much secondary. The last category is a little bit different, and broadly we could call it game piece localization.
These tend to be the most varied types of vision tasks, and they change a lot by year.
And it's not necessarily obvious in any given year whether game piece tracking has significant utility. For example, we didn't use this at all in 2024 even when teams were fighting over notes on the center line. We were still able to make dynamic autos and whatnot without using vision. Regardless, this whole category isn't really going to be the focus of this talk. If you're curious, we can chat more in the Q&A section.
Okay, so in summary, this is really the framework that I think defines the best localization systems you see in FRC.
They consistently achieve precise local positioning and imprecise global positioning. Now, I would like to preemptively address a question that you may be asking yourself when I put this on the screen, which is: what about precise global positioning? Would that not be even better? Now, there are a couple of issues with this. The most fundamental issue is one of purpose. The reality is that global positioning is already more limited in utility than local positioning, which accounts for all of the most critical game objectives. And there are certain items on these lists that benefit from global positioning. We've leveraged imprecise global positioning to great effect. The issue is that there just aren't a lot of game objectives that benefit from precise global positioning. You care about the robot's position to the degree that it affects the interaction with field objects, but really no more than that. However, suppose hypothetically we envisioned some task where precise global positioning was a benefit. We would still have a big problem to deal with. And to understand it, we have to talk about coordinate systems. For local positioning, what we care about are two transforms: the transform between the robot and a vision target, and the transform between that vision target and our scoring location. Now the transform between the robot and vision target can be derived directly from vision data.
We will certainly talk about that.
And the transform to the scoring location is fixed as part of a physical object on the field. It's better fixed the closer the AprilTag is to that scoring target. And that allows us to describe with these two transforms the total transform from the robot to a particular scoring location. And we can do it with a high degree of precision because both of these transforms are either well-defined or directly measured.
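As a rough illustration of chaining those two transforms, here is a minimal Python sketch. The `Transform2d` class and all of the numbers are hypothetical and exist only for this example; in real robot code you would more likely use the geometry classes from your robotics library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform2d:
    """Pose of a child frame in its parent frame: position (x, y) in meters, heading theta in radians."""
    x: float
    y: float
    theta: float

    def then(self, other: "Transform2d") -> "Transform2d":
        """Chain transforms: `other` is expressed in the frame that `self` ends in."""
        cos_t, sin_t = math.cos(self.theta), math.sin(self.theta)
        return Transform2d(
            self.x + cos_t * other.x - sin_t * other.y,
            self.y + sin_t * other.x + cos_t * other.y,
            self.theta + other.theta,
        )


# Hypothetical measurement from vision: the tag is 2 m ahead and 0.5 m to the
# left of the robot, and it faces back toward the robot.
robot_to_tag = Transform2d(2.0, 0.5, math.pi)

# Hypothetical fixed offset of the scoring location, expressed in the tag's frame.
tag_to_score = Transform2d(0.0, -0.16, 0.0)

# Total transform from the robot to the scoring location.
robot_to_score = robot_to_tag.then(tag_to_score)
print(robot_to_score)
```

Note that no field-wide coordinate system appears anywhere in this calculation, which is the point being made about local positioning.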
Now, at first glance, global positioning seems like it would be very similar. We can plot the location of a robot and a target like this, one of the branches in last year's game. But how do we represent these positions? Well, we use a global coordinate system. So, I've represented that in the lower left-hand corner with those axes. And we can represent any position on the field using that coordinate system. And each pose represents a transform from the origin to a position on the field.
Something like that. But what difference does it actually make what coordinate system we're using? Well, here's why it matters. We have a nice coordinate system with poses for each field element. But this whole thing is really a lie. It only works if we assume that every element is in precise alignment with every other field element. And in practice, that will never be the case on the real field, despite what the manual may tell you about supposed field tolerances. That means our nice neat coordinate system really looks a lot more like this. And our scoring target is more like this. Now, we can use these transforms perfectly fine to represent the approximate location of field elements relative to each other, but nothing more than that. Even if you try to measure the location of each field element, you'll find that they move quite a lot when robots start hitting them. And to be clear, it's not a technological shortcoming that keeps us from achieving precise global positioning. Even the fanciest VR tracking system or calibration utility doesn't change the fact that the coordinate system we're using is a lie.
And it's just a tool that we made up.
It's a useful tool. But a fixed coordinate system just fundamentally can't represent the precise layout of a real FRC field. I also don't want to give the wrong impression that global positioning is useless. Even imprecise global positioning is an absolute game changer in recent games where it's become more ubiquitous. We don't need a precise coordinate system to select the closest target, change our behavior based on field zone, or reliably navigate terrain. Depending on the use case, game piece localization can also fill a lot of those gaps. Global positioning is absolutely a worthy goal of many vision systems as long as we're clear about what it can and can't achieve and we're prepared to use alternative approaches as necessary.
Okay, so now we have a few more defined goals in mind. Let's talk about how we can achieve those goals. And here I'm focused on information that applies regardless of the specific hardware you have. And first let's talk about local positioning systems because this is the most important use case of vision and for 6328 it's the primary driver of our decisions regarding vision.
Okay, what better place to start with a vision system than cameras. This is one of the trickier aspects of vision because it's so dependent on the game and robot. But here's some advice that applies in most cases, and I'll show some examples on 6328 robots in a minute. So first, the dos. Do point cameras towards relevant vision targets. This may seem like an obvious one, but I've seen too many teams that focus from the start on just maximizing field of view. And really, you want to be thinking about where the specific targets are that you care about for the tasks you're trying to achieve. Another thing you should do is choose a well-defined mounting location. Generally speaking, cameras that are mounted lower on the robot have a better-defined position, which you can measure from CAD. And generally speaking, cameras that are moving on the robot are harder to deal with and should be avoided if you have the option. You should also include multiple cameras for different situations.
Combining functionality for different tasks sometimes works, but it often requires sacrifices to one or the other.
So, be careful about that. And you should consider using a variety of camera types: different fields of view, monochrome versus color cameras. These are good at different types of tasks and you should think about that carefully rather than always using the same camera just by default. Now there are some other things to avoid. One is maximizing field of view, which we already talked about a little bit, and another is mounting to a moving location. The reason for that is that it makes controls more difficult because the position of the camera is poorly defined. It can be made to work, but it's often more trouble than it's worth and you have to think about the sacrifice carefully.
You also should avoid designing an overly rigid mount. This is another unintuitive one because I said the position should be well defined.
However, rigid camera mounts tend to be much more vulnerable to breaking on the field, particularly for rough games.
And cracked camera mounts do you no good in terms of a well-defined position.
Another thing is don't make access for maintenance an afterthought. It's really important to be able to debug wiring, to calibrate the camera off of the robot, to adjust the focus of the camera, and so on. So these are some of the rules we follow with mounting cameras. There's obviously some flexibility, but we try and stick to these guidelines where we can. Here's an example on 6328's robot from 2025, Manta.
Here, diagrams like this are really helpful. Definitely take advantage of CAD. So you can place the robot before you've built it in different locations on the field to check the visibility for important field elements. Now in this game, we were mostly focused on visibility when aligning the robot to the reef. And the tags there were in very close proximity along the ground.
So our visibility is provided by two cameras on the right side of the robot with these overlapping fields of view.
And the nice thing, if you look at how those fields of view are projected out, is that we can confirm that we still get a nice wide field of view when we're further away from the reef. Now, the first version of this robot didn't have a ground intake on the back. And so, we had a back camera that was used specifically for local alignment to the human player station. We then pivoted once our goals changed and we had a ground intake on the robot, where that same camera was used for game piece localization. But it was always specific to a single task that we wanted the robot to complete, and we would add a camera in order to achieve that.
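The CAD visibility check described above can also be approximated numerically. The following is a minimal sketch, assuming a flat 2D field, a simple horizontal field-of-view cone, and a hypothetical maximum useful range; it ignores the direction the tag faces and anything blocking the view, so treat it as a first-pass check rather than a replacement for looking at the CAD.

```python
import math


def tag_in_view(camera_x, camera_y, camera_heading_rad, hfov_deg,
                tag_x, tag_y, max_range_m=6.0):
    """Return True if the tag lies inside the camera's horizontal field of view.

    All positions are in the same (arbitrary) 2D frame, in meters.
    max_range_m is a hypothetical cutoff beyond which tags are assumed unusable.
    """
    dx, dy = tag_x - camera_x, tag_y - camera_y
    if math.hypot(dx, dy) > max_range_m:
        return False
    # Angle from the camera's optical axis to the tag, wrapped to [-pi, pi].
    bearing = math.atan2(dy, dx) - camera_heading_rad
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return abs(bearing) <= math.radians(hfov_deg) / 2


# Hypothetical camera at the origin looking along +x with a 70 degree lens.
print(tag_in_view(0.0, 0.0, 0.0, 70.0, tag_x=2.0, tag_y=0.5))
print(tag_in_view(0.0, 0.0, 0.0, 70.0, tag_x=2.0, tag_y=2.0))
```

Running a check like this over a grid of robot positions is one way to compare candidate mounting locations before anything is built.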
Okay, so once we actually have the camera, how do we use that data? And there are a few options here. They vary in complexity. The simplest option that you may be familiar with is servoing.
This just means you turn until the target is in the center of your frame.
This is very simple to implement and there are few points of failure with it. So in a lot of cases, this is really all you need. The downside is that it typically requires a fairly high FPS.
You can make it work if you have a camera with a lower FPS, but the implementation becomes more complex if you have to deal with latency compensation and other algorithms.
Another restriction is that the camera needs to be in line with the scoring mechanism. If you're off to the side or asymmetric, that can often be a challenge. And another key point here is that whatever algorithm you're using for alignment can't effectively operate in multiple axes with this approach. So you can pivot in place, but it's much harder to control translation and rotation separately, which is often useful when you want to approach a vision target, especially in pick and place games.
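To make the idea of servoing concrete, here is a minimal sketch of a proportional turn controller. The gain, deadband, output limit, and sign convention (positive yaw and positive output both mean "toward the same side") are assumptions chosen for illustration and would need tuning on a real robot.

```python
def servo_turn_command(target_yaw_deg, kp=0.02, deadband_deg=1.0, max_output=0.5):
    """Return a turn command that drives the target's horizontal angle toward zero.

    target_yaw_deg: horizontal angle from the image center to the target.
    The output is a unitless command clamped to [-max_output, max_output].
    """
    if abs(target_yaw_deg) < deadband_deg:
        return 0.0  # close enough to centered; stop turning
    output = kp * target_yaw_deg
    return max(-max_output, min(max_output, output))


for yaw in (0.5, 10.0, -90.0):
    print(yaw, servo_turn_command(yaw))
```

Because the only input is a single angle, this controller can only command rotation, which is the multi-axis limitation mentioned above.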
Now, servoing was very common in FRC back in the retroreflective era. But even then many teams, including us, were frustrated by some of those limitations.
So trig-based solvers were an alternative that allowed us to start calculating the full pose of the robot on the field. And remember, we're still talking about the pose relative to the target, but starting to separate translation and rotation. Now the way these solvers work is by measuring the distance to the target using the vertical angle in the frame. There's a diagram there that you can parse through. The downside here is that this does require a significant height difference between the camera and the target, which in some years can be a challenge, like in 2025 where the tags are really close to the ground, which also happens to be a good mounting location for cameras.
These solvers also rely on having a really accurate gyro position throughout the match, which is a pretty major disadvantage. Gyros can and do drift over the course of an FRC match. If you want to prove that to yourself, try hitting a wall at max speed and checking how accurate your gyro is after that.
There is significant error even in the best gyros.
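The trig-based approach can be sketched as follows. This is an illustrative version, not any particular library's implementation: the function names, the counterclockwise-positive angle convention, and the example numbers are all assumptions.

```python
import math


def distance_to_target(camera_height_m, target_height_m,
                       camera_pitch_deg, target_pitch_deg):
    """Horizontal distance to the target from the vertical angle in the frame.

    target_pitch_deg is the vertical angle of the target relative to the
    camera's optical axis; camera_pitch_deg is the camera's mounting pitch.
    """
    total_pitch = math.radians(camera_pitch_deg + target_pitch_deg)
    return (target_height_m - camera_height_m) / math.tan(total_pitch)


def camera_position(target_x, target_y, distance_m,
                    gyro_heading_deg, target_yaw_deg):
    """Camera position in the target's coordinate frame, using the gyro for rotation.

    Assumes angles are counterclockwise-positive and that the camera points
    along the robot's heading.
    """
    field_angle = math.radians(gyro_heading_deg + target_yaw_deg)
    return (target_x - distance_m * math.cos(field_angle),
            target_y - distance_m * math.sin(field_angle))


# Hypothetical setup: camera 0.5 m high pitched up 20 degrees, target 1.5 m high.
d = distance_to_target(0.5, 1.5, camera_pitch_deg=20.0, target_pitch_deg=25.0)
print(d)
print(camera_position(5.0, 3.0, d, gyro_heading_deg=0.0, target_yaw_deg=0.0))
```

Both weaknesses from the talk are visible in the math: as the height difference shrinks, the numerator approaches zero and small angle errors dominate, and any gyro error rotates the computed position directly.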
Now, trig solvers still have those few problems. So, one of the solutions that was developed for this was 3D solvers.
Now, it was possible to use these with retroreflective tape, but they are much more common with AprilTags. The idea is that we use an algorithm to determine the 3D transform from the camera to the target. The most common one, which you might have heard of, is solvePnP, which is part of the OpenCV package. Now the advantage of this is that it doesn't require gyro data at all. We solve our rotation directly from the tag and we don't need any height difference between the camera and target. Another big advantage is that it doesn't require that the robot be level on the ground.
That's extremely important for years where you might be going over large field obstacles. Now the disadvantages are that generally there will be more noise in the output than with trig-based solvers. It also suffers from a problem called ambiguity, which means that for single-tag estimates there are actually two different solutions for any given vision frame. That doesn't happen with multiple tags in frame, and there are ways to eliminate it, which we'll circle back on. So those are both solvable problems, but they do require additional thought in order to produce high-quality pose estimates. And keep in mind, some vision solutions may not even provide enough data to do that type of filtering well, Limelight being a prime culprit there. So of course, people have started to think about how to address some of the continuing limitations of 3D solvers. And there's a whole class of algorithms for this which combine gyro data with a 3D solver.
There's a whole bunch of names for this which I've listed up there, but the idea is to use a relatively stable gyro pose to increase the stability of your 3D estimates. Here in the video, you can see the standard 3D solver is in red and our variation that includes gyro data is in blue, and it's much more stable.
Now, this class of algorithms is increasingly popular, but it's not the be-all and end-all. The requirement for an accurate gyro can still introduce significant error if it's not handled properly. And accuracy can often be poor when the robot is in motion, especially if you don't properly handle latency.
Again, Limelight sometimes has some issues there. It's also more sensitive to having a poorly defined mounting location of your camera, particularly when tracking targets at a distance.
It's really important that the angles of your mounting position are correct to within a fraction of a degree for it to be accurate. And as discussed before, when the robot is not level, this is going to suffer more than a pure 3D solver.
Okay, so here's a helpful summary chart of those solvers. You'll still find teams using all four of these solvers.
So all of them definitely still have a place. And there's no one right answer for every situation. Each of these solvers has its own quirks that require thought to handle. You may also want to use different solvers for local and global positioning. So to understand that, let's switch gears a little bit and talk about global positioning. Now there are some unique considerations here. Specifically, remember that we're talking about imprecise global positioning. So you need to think about how much precision you need. This will vary depending on the game and the purpose, anywhere from a few feet to a few inches or less depending on context. And we'll circle back to how we can combine some of this data with local positioning data.
Okay. So, let's look back at camera mounting. The considerations here are mostly the same. You want stable camera positions that are tuned to the particulars of each game and robot.
The main difference is that for global positioning, we actually do want to maximize field of view because that will give us more visibility to AprilTags no matter where we are on the field. So, as an example, here's our robot this year. The cameras that are facing to the right here are positioned mostly to increase the total field of view of the vision system. We want to be tracking AprilTags, even imprecisely, regardless of the robot's orientation, and especially in the neutral zone where we might be getting pushed around by other robots.
Those cameras on the right are using high field of view lenses about 90°. And again, a good example of how you shouldn't be afraid to mix up your camera types depending on the use case.
We also want to think about what we mean when we're maximizing field of view. We're talking about maximizing the field of view of the full system rather than individual cameras. The reason for that is it's usually a bad idea to use cameras with a field of view wider than about 100°. The standard camera model that's used by OpenCV and most vision solutions breaks down around that point as you get wider and wider angle lenses. So, the pose accuracy tends to be very poor with very high field of view lenses. However, running multiple cameras with relatively high field of view lenses will work much better in general. And we spent a bunch of time before talking about solvers.
So, how does that apply here? Well, some of these solvers are better for local versus global positioning. Now, my first note is we do definitely need full poses for global positioning. So, we can eliminate some options off the bat. Trig solvers here are becoming less and less common for global positioning. And you can kind of see why if you look at this chart. Incorporating data from a 3D solver removes the need for a height difference and it can produce more stable poses. It also makes the result less sensitive to mounting accuracy. Pure 3D solvers though still have an important role here. That's what we primarily use on 6328. And it's because we don't want to rely on a perfectly accurate gyro. In fact, we use data from our 3D solver in order to correct the gyro over time to maintain accuracy. So in the cases where we do want to use a combined 3D and gyro solver, especially for local positioning, we increase our accuracy by using the data from the 3D solver. Now those 3D solvers also require more work to set up appropriate filtering and tuning. And across all of these solvers, an important thing to think about with global positioning is whether you can incorporate data from multiple tags in the frame. Some of these algorithms support that better than others and it depends on the specifics of the algorithm you're using. Another good rule of thumb here when you're looking at different solvers is regarding error. 3D solvers tend to suffer mostly from noise but with relatively low bias, which means there's high variation in the individual estimates you get frame to frame, but they tend to be centered on the correct position. The other options suffer more from bias but with lower noise. So there's less variation in individual observations, but you can be offset from the right position continuously, especially at long distances or with poor mounting. Both of these forms of error are possible to deal with using filtering.
Though I will say generally speaking, noise is easier to deal with and more easily detectable. But you should think about different kinds of errors when you're comparing these solvers.
Now, regardless of the exact solver, good global positioning requires good filtering. And you can sometimes get away without this for local positioning, but global positioning really benefits a lot from good filtering when you're combining data from multiple cameras.
Now, there are a lot of techniques you'll see in use here, but here's a quick summary of some of the steps that some of the more successful pipelines use. First, it's important to resolve ambiguous estimates. For 3D solvers, you need to figure out which of the two solutions to use for single-tag frames. There are different techniques for this. You can check the relative reprojection error, which is a measure of quality of the two different solves.
You can disambiguate using gyro data.
That's what we do in a lot of cases. And some estimates may be so ambiguous you just need to throw them out because you can't determine which one is accurate. Another detail here is that when we have to disambiguate the poses using the gyro data, we specifically don't update the gyro using that data because we can get into a feedback loop there. So those are some things to think about with 3D solvers. The next step is we want to reject any poor estimates. So for example, check that the robot is actually within the field and at a height that makes sense off of the ground. And you can eliminate a lot of errors just by doing that. The next step is calculating a trust metric.
Typically you'll see this notated as a standard deviation, but generally we find it's not that useful to think about it in terms of real units. You could measure standard deviations in theory, but there's usually little reason to.
Each estimate that you get from an individual camera and an individual frame will be better or worse based on several factors. One of the biggest factors is the distance to the tags. You'll be less accurate when you're further away. Also, the number of tags and the camera properties can all have an effect on this. And the last step is fusing the estimates. WPILib has a very nice class for this. So mostly that's a solved problem at this point.
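The four steps above can be sketched in a few functions. Everything here is an illustrative assumption rather than the pipeline used by any specific team or library: the field dimensions, thresholds, the `distance squared over tag count` trust heuristic, and the inverse-variance average used for fusion (which stands in for a full pose estimator such as the WPILib class mentioned above).

```python
import math
from dataclasses import dataclass

FIELD_LENGTH_M = 16.5  # assumed field size, for illustration only
FIELD_WIDTH_M = 8.0


@dataclass
class Candidate:
    x: float
    y: float
    z: float
    heading_rad: float
    reproj_error: float


def angle_diff(a, b):
    """Absolute difference between two angles, wrapped to [0, pi]."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))


def resolve_ambiguity(cand_a, cand_b, gyro_heading_rad,
                      max_error_ratio=0.4, min_separation_rad=0.1):
    """Step 1: pick one of the two single-tag solutions, or None if too ambiguous."""
    best, other = sorted((cand_a, cand_b), key=lambda c: c.reproj_error)
    if best.reproj_error < max_error_ratio * other.reproj_error:
        return best  # reprojection error alone clearly favors one solution
    err_a = angle_diff(cand_a.heading_rad, gyro_heading_rad)
    err_b = angle_diff(cand_b.heading_rad, gyro_heading_rad)
    if abs(err_a - err_b) < min_separation_rad:
        return None  # the gyro cannot tell them apart; discard the frame
    # A result chosen this way should not be fed back to correct the gyro.
    return cand_a if err_a < err_b else cand_b


def is_plausible(c, margin_m=0.5, max_height_m=0.25):
    """Step 2: reject poses outside the field or floating above/below the floor."""
    return (-margin_m <= c.x <= FIELD_LENGTH_M + margin_m
            and -margin_m <= c.y <= FIELD_WIDTH_M + margin_m
            and abs(c.z) <= max_height_m)


def xy_std_dev(avg_tag_distance_m, tag_count, coefficient=0.01):
    """Step 3: unitless trust metric; larger means less trusted."""
    return coefficient * avg_tag_distance_m ** 2 / tag_count


def fuse(estimates):
    """Step 4: inverse-variance weighted mean of (value, std_dev) pairs."""
    weights = [1.0 / (std ** 2) for _, std in estimates]
    total = sum(w * value for w, (value, _) in zip(weights, estimates))
    return total / sum(weights)


a = Candidate(2.0, 3.0, 0.0, 0.10, reproj_error=0.5)
b = Candidate(2.4, 3.1, 0.0, 1.20, reproj_error=0.6)
chosen = resolve_ambiguity(a, b, gyro_heading_rad=0.15)
print(chosen, is_plausible(chosen))
print(fuse([(chosen.x, xy_std_dev(2.0, 1)), (2.3, xy_std_dev(5.0, 1))]))
```

In the final line, the estimate taken from 2 m away dominates the one from 5 m away, which is the behavior the trust metric is meant to produce.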
Okay. So we just talked about how to make a great local positioning system and how to make a great global positioning system. But ultimately we are creating one unified vision system.
So how do we balance those objectives?
Well, you may be wondering what this actually looks like in code when you have local positioning and global positioning. Well, first, you might not actually need to separate the two in code. We've used AprilTags since 2023 and it was only one year, 2025, that we explicitly handled local positioning as a separate data stream. In a lot of cases, you can implicitly switch between local and global positioning with a single pose estimator. And that was our approach this year as well. Here's an example of what I mean from 2024. So this is an example vision frame, and I've annotated it with the standard deviations, the trust factor, for each of these four cameras on the robot here. A lower value means that we trust the estimate more. Now most of the time there's no reason to trust any camera more than any other arbitrarily. AprilTags tend to be evenly distributed around the robot. But here we're looking at the speaker, trying to score.
So we want to really be relying on local positioning relative to that target. And you can see that the filtering algorithm is trusting those speaker tags more than the others. That's not because we have separate logic that handles those specific tags. It's because we've positioned our cameras such that those tags tend to be the largest and most visible. So they are trusted more by the filtering algorithm implicitly. Here's another example. Here the robot is scoring in the amp, and so we want to rely on tags that are as close as physically possible. We strategically placed one of the cameras on the robot to have visibility to the tags along the alliance wall by the speaker and human player station. And the algorithm trusts that camera more because it's seeing more tags. That comes implicitly from the camera placement. And essentially this means that as we drive around the field, the single pose estimate we're maintaining is shape-shifting between different use cases. As we approach key field elements, it becomes our precise local estimate since those nearby tags are trusted. However, we're still fusing data to create an imprecise global estimate when we're elsewhere on the field. And note that across all of this local and global positioning, we're using the same field coordinate system.
But the pose estimate is adapting to be correct relative to different field elements depending on the robot's position. So all of this comes just from the filtering algorithms and the camera placement.
Now in 2025, we used a different approach where we explicitly separated global and local estimates. That allowed us to tune those pipelines differently, use different solvers, and control for latency versus smoothness based on the demands of the specific application. In fact, in 2025, we were maintaining separate local estimates for every single AprilTag on the field. And when we went to auto align to the reef, it would always use the local estimate from the correct AprilTag that was attached to that side of the reef. And it's not just using the closest AprilTag. We know where our target is and we will use that target right from the start in order to align. So there's no interference from any other tag on the field. Now, that was specifically to target a use case for a game that demanded very high precision, but where camera placement and the implicit filtering didn't give us enough control over our pose estimate.
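One way to picture per-tag local estimates is as a lookup keyed by tag ID. The sketch below is a guess at the general shape of such a structure, not 6328's actual code; the class names, the staleness timeout, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocalEstimate:
    x: float             # robot position in the tag's own frame, meters
    y: float
    heading_rad: float
    timestamp_s: float


class PerTagEstimates:
    """Keeps the most recent robot-relative-to-tag estimate for each tag ID."""

    def __init__(self, max_age_s=0.5):
        self._latest = {}
        self._max_age_s = max_age_s  # assumed cutoff for stale data

    def update(self, tag_id, estimate):
        self._latest[tag_id] = estimate

    def get(self, tag_id, now_s):
        """Return the estimate for one specific tag, or None if missing or stale."""
        estimate = self._latest.get(tag_id)
        if estimate is None or now_s - estimate.timestamp_s > self._max_age_s:
            return None
        return estimate


estimates = PerTagEstimates()
estimates.update(7, LocalEstimate(1.2, 0.1, 3.10, timestamp_s=10.0))
estimates.update(8, LocalEstimate(2.5, -0.8, 2.60, timestamp_s=10.0))

# The tag is chosen by the alignment target, not by whichever tag is closest.
target_tag_id = 7
print(estimates.get(target_tag_id, now_s=10.2))
```

When no fresh estimate exists for the target tag, a caller would presumably fall back to the global estimate, though the talk does not describe that detail.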
Okay. Now, to finish up, I have a collection of quick tips that we've acquired along the way. These apply to both local and global positioning, and we'll do these a little bit more rapid fire. So, first, there are a bunch of parameters you'll see when tuning an AprilTag detector. Some solutions may call these different things. Here's a quick breakdown of how those parameters affect the quality of your results. I will also have links to these slides at the end if you want to refer back to this. Generally speaking, it's important to focus on resolution before FPS, but maintain some balance there. Downscaling is generally a bad idea, except on very low-powered hardware where you don't have another option to maintain a usable FPS. And brightness, controlled by exposure and gain, should usually be as low as you can get away with to prevent motion blur.
Another quick tip, most cameras you find out in the world are rolling shutter, which does not work well for FRC games.
For AprilTag tracking, you really want a global shutter camera, which means the entire camera frame is captured at once rather than scanning down line by line. Here's an example where we can capture from the middle of this video a clean, undistorted image, even when the robot is spinning at full speed.
If this was a rolling shutter camera, you would see that the whole image would be skewed. This is also a good demonstration of why you want a low exposure, because you minimize motion blur that way.
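To put rough numbers on blur and skew, here is a back-of-the-envelope sketch using a pinhole camera model and a small-angle approximation near the image center. The resolution, field of view, spin rate, exposure times, and readout time are hypothetical values chosen only for illustration.

```python
import math


def focal_length_px(image_width_px: int, horizontal_fov_deg: float) -> float:
    """Pinhole-model focal length in pixels from image width and horizontal FOV."""
    return (image_width_px / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)


def smear_px(spin_rate_deg_s: float, duration_s: float, focal_px: float) -> float:
    """Approximate pixel shift near the image center while the camera rotates.

    Pass the exposure time to estimate motion blur, or the sensor readout
    time to estimate rolling-shutter skew between the top and bottom rows.
    """
    return math.radians(spin_rate_deg_s) * duration_s * focal_px


# Hypothetical camera and robot numbers.
f_px = focal_length_px(1280, 70.0)
blur_short = smear_px(360.0, 0.001, f_px)  # 1 ms exposure
blur_long = smear_px(360.0, 0.010, f_px)   # 10 ms exposure
skew = smear_px(360.0, 0.020, f_px)        # 20 ms rolling-shutter readout
print(f"blur: {blur_short:.1f} px at 1 ms, {blur_long:.1f} px at 10 ms; skew: {skew:.1f} px")
```

Blur scales linearly with exposure time, so cutting exposure by a factor of ten cuts the smear by the same factor. A global shutter removes the skew term entirely because every row is captured at the same instant.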
Okay, now this one: avoiding image compression. This is specifically a concern for anyone that is using USB 2 cameras. Now, to run at a reasonable frame rate and resolution over USB 2, those types of cameras generally rely on MJPEG compression, which means the camera is essentially sending JPEG images over the USB connection. Now, that's okay for some simpler pipelines. However, the additional noise that comes from that image compression can affect the pose estimates and their stability.
We've observed that MJPEG compression can increase noise by 40% or more on the final estimates. So there are a couple of solutions to this. One is to use a USB 3 camera, of which there are increasing options on the market. The other is to use a MIPI-based camera connected directly to a Raspberry Pi. That is the solution that Limelights use, incidentally.
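A quick bandwidth calculation shows why USB 2 cameras fall back to MJPEG. This sketch assumes uncompressed frames at 2 bytes per pixel (as in the common YUYV format) and compares against the 480 Mbit/s USB 2.0 high-speed signaling rate; real-world throughput is lower than that figure, so the check is optimistic. The example resolutions and frame rates are illustrative.

```python
USB2_SIGNALING_MBIT_S = 480.0  # USB 2.0 high-speed signaling rate; usable throughput is lower


def raw_video_mbit_s(width: int, height: int, fps: float, bytes_per_pixel: int = 2) -> float:
    """Bandwidth needed to stream uncompressed video, in megabits per second."""
    return width * height * bytes_per_pixel * 8 * fps / 1e6


def fits_usb2_uncompressed(width: int, height: int, fps: float) -> bool:
    """Optimistic check: does the raw stream fit under the USB 2.0 signaling rate?"""
    return raw_video_mbit_s(width, height, fps) < USB2_SIGNALING_MBIT_S


for width, height, fps in [(640, 480, 30), (1280, 720, 60)]:
    needed = raw_video_mbit_s(width, height, fps)
    print(f"{width}x{height} @ {fps} FPS needs {needed:.0f} Mbit/s raw, "
          f"fits USB 2: {fits_usb2_uncompressed(width, height, fps)}")
```

At the resolutions and frame rates that are useful for AprilTag tracking, the raw stream exceeds what USB 2 can carry, which is why compression gets involved and why USB 3 or MIPI avoids the problem.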
Okay. The next tip is always calibrate your cameras. Regardless of the complexity of the pipeline you're using, calibration will significantly improve the accuracy of your estimated positions. Now, you can find more details about how to do this in the docs for your chosen vision solution, but keep in mind most of the cameras that are used commonly in FRC are built to relatively poor tolerances. So, they need to be individually calibrated.
Every one is different. And that's because those cameras are not really designed for machine vision. So we have to sort of pick up the slack on that one.
Okay. This next tip sort of speaks for itself. You can see it in the image.
You want each camera to be focused to an appropriate distance so you don't get the results seen here, with an AprilTag that should definitely be visible and detected if our camera lens had not slipped out of place. When focusing, it's generally good to bias towards slightly farther distances. Those are the tags that are harder to detect. And we find even a slightly blurry tag, not this blurry, but a little bit blurry, can still be detected if it's large enough in frame. So focusing a little bit farther is often a good solution.
And of course, make sure to secure the lens in place so it doesn't slip out of focus. Often a little hot glue goes a long way there.
Okay, here's a really important one that matters more on the robot code side of things. The camera has to capture an image, do the AprilTag processing, and then send that result to your robot code, and then your robot code has to process it. That whole thing can take a few hundred milliseconds. So, here's a clip of an auto from 2024 where you can see these green ghost poses, which are the vision estimates that are coming in live from our camera. And you can see they're lagging behind the robot's position. That's the latency that's coming from the vision pipeline. If you don't account for that properly, all of your poses will be offset when the robot is moving. And it can be quite severe.
At our first event in 2024, we had major issues with accuracy in auto because our timestamps were off by 20 milliseconds, and that was enough that we were missing notes in auto. It creates a really significant offset at high speed. So definitely check the documentation for your chosen vision solution and read about how to account for this. There are different ways depending on whether you're using PhotonVision or Limelight, but they all support some way to account for it.
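To see why a timestamp error matters and how latency compensation works in principle, here is a minimal sketch. It keeps a history of odometry positions, looks up where odometry thought the robot was at the image capture time, and applies the difference to the current pose. It handles translation only and trusts the vision measurement completely, which a real filter would not do; the 4 m/s speed and all names are hypothetical.

```python
import bisect


def latency_offset_m(speed_m_s: float, timestamp_error_s: float) -> float:
    """Position error caused by applying a measurement at the wrong time."""
    return speed_m_s * timestamp_error_s


class PoseHistory:
    """Time-ordered buffer of odometry positions with linear interpolation."""

    def __init__(self):
        self._times = []
        self._poses = []

    def record(self, t_s: float, x_m: float, y_m: float) -> None:
        # Assumes samples are recorded in increasing time order.
        self._times.append(t_s)
        self._poses.append((x_m, y_m))

    def sample(self, t_s: float):
        if not self._times:
            raise ValueError("no odometry samples recorded")
        i = bisect.bisect_left(self._times, t_s)
        if i == 0:
            return self._poses[0]
        if i == len(self._times):
            return self._poses[-1]
        t0, t1 = self._times[i - 1], self._times[i]
        alpha = (t_s - t0) / (t1 - t0)
        (x0, y0), (x1, y1) = self._poses[i - 1], self._poses[i]
        return (x0 + alpha * (x1 - x0), y0 + alpha * (y1 - y0))


def apply_vision(history: PoseHistory, current_xy, vision_xy, capture_time_s: float):
    """Shift the current pose by the error measured at the capture time."""
    past_x, past_y = history.sample(capture_time_s)
    dx, dy = vision_xy[0] - past_x, vision_xy[1] - past_y
    return (current_xy[0] + dx, current_xy[1] + dy)


# Robot drives along x at 4 m/s; odometry has drifted 0.5 m behind the true position.
history = PoseHistory()
for step in range(6):
    t = step * 0.02
    history.record(t, 4.0 * t, 0.0)

vision_at_capture = (4.0 * 0.05 + 0.5, 0.0)  # true position when the image was taken
corrected = apply_vision(history, (4.0 * 0.1, 0.0), vision_at_capture, capture_time_s=0.05)
print(corrected, latency_offset_m(4.0, 0.020))
```

Treating the vision measurement as if it were current would leave the robot 0.2 m behind in this example, and even a 20 millisecond timestamp error at this assumed speed works out to 8 cm.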
Okay, the next thing a lot of teams ask us about is how to utilize field calibration time, because events provide it, sometimes every day. And the main thing to remember about field calibration is that a great vision system is already designed to be tolerant to a wide variety of field conditions. Now, that inherently makes field calibration less important, but there are two really important things that we always check during field calibration. One is to check for major field errors. Examples here would include AprilTags that are in the wrong location or upside down. Those are issues that we have caught before once or twice, and they do confuse the robots. Here we're really concerned about outright errors, things that are obviously wrong, and not field elements that are slightly out of tolerance.
Trying to measure every field element and AprilTag is rarely useful. And it's really a sign that the design of the vision system was flawed, because you should be designing a vision system that is tolerant to lots of field conditions. The field will change over the course of an event, and you don't want to be missing because of that. The second thing we do during field calibration, though, is calibrate the camera brightness. And this is just because, practically, fields have different lighting. It can't be consistent across every event. So you want to be adjusting your exposure and gain, and maybe white balance, to make sure that you're getting a consistent exposure and consistent detection regardless of where you are on the field. So we will usually place the robot at several different locations on the field and check that we see the specific tags that we need to achieve our objectives at that location.
Okay, another point. Recording video from the cameras is incredibly useful for debugging. Limelight added a feature for this midway through the 2026 season on Limelight 4s. PhotonVision also allows you to capture snapshots throughout the match. I do hope that this sort of thing becomes more common over time because we found it incredibly helpful. It was much like when we started doing robot code logging for the first time, where afterwards we weren't really sure how we had been getting by without it. It allows you to debug a lot of problems much more easily. You can figure out whether a failed detection was because of a shadow, or bad exposure, or a damaged AprilTag. For example, here's a real screenshot from Champs last year where our robot was failing to detect this tag because it was physically damaged. And this saved us quite a headache. We were able to get that fixed before the next match. And we could confirm very quickly that there wasn't an issue with our robot. There was an issue with the field.
Okay. One other topic that I'll discuss. We talked about filtering algorithms. There's a lot of complexity there. And one of the really critical tools in our toolkit has been log replay with AdvantageKit. This allows us to develop our filtering pipelines based on real log data with real field conditions rather than needing to rely on more approximate simulations. Now, I'm not going to go into tremendous detail about how that works in this session. We have presented a whole conference about it before at Champs, and if you check the documentation at that link, you can find a video of that conference and lots more details about how this works.
However, I do have one fun example to show which demonstrates the importance of filtering, which we were able to produce using AdvantageKit. Now, this is based on log data from one of our 2024 events. The solid blue robot here shows what our pose estimates look like when we have all of our cameras running our standard filters. The yellow robot here shows what it looks like with just a single camera and, again, all of our normal filters. And the red shows what happens if we use all of our cameras but turn off all of our filtering and just trust all of the estimates. You can see there's quite a dramatic difference in the stability between those estimates. So with AdvantageKit we can experiment with different filtering algorithms and different adjustments, and see whether we get improvement in real matches without needing to experiment during actual matches on the field. So that's incredibly valuable.
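For a sense of what "filters" can mean at the simplest level, here is a sketch of rejection checks applied to each vision observation before it is fused. The specific checks, the thresholds, and the field dimensions are hypothetical examples and are not the filters used in the comparison above.

```python
def accept_observation(x_m: float, y_m: float, tag_count: int, ambiguity: float,
                       field_length_m: float, field_width_m: float,
                       max_ambiguity: float = 0.3, border_m: float = 0.5) -> bool:
    """Return True if a vision pose passes basic sanity checks (all thresholds hypothetical)."""
    if tag_count < 1:
        return False
    # Single-tag solutions can flip between two candidate poses; reject ambiguous ones.
    if tag_count == 1 and ambiguity > max_ambiguity:
        return False
    # The robot cannot be far outside the field boundary.
    if not (-border_m <= x_m <= field_length_m + border_m):
        return False
    if not (-border_m <= y_m <= field_width_m + border_m):
        return False
    return True


# Hypothetical field size and logged observations: (x, y, tag_count, ambiguity).
FIELD_LENGTH_M, FIELD_WIDTH_M = 16.5, 8.0
logged = [
    (3.0, 4.0, 2, 0.0),    # multi-tag, on the field
    (3.1, 4.0, 1, 0.8),    # single tag, highly ambiguous
    (-4.0, 4.0, 1, 0.05),  # far outside the field
]
kept = [obs for obs in logged
        if accept_observation(*obs, FIELD_LENGTH_M, FIELD_WIDTH_M)]
print(f"kept {len(kept)} of {len(logged)} observations")
```

Running a function like this over logged observations in replay is one way to compare filter variants offline before trusting them in a match.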
Okay. There we go. So that was a lot of details. I want to end by circling back to the big picture here. These two goals, precise local positioning and imprecise global positioning, are the common threads that we see guiding the very best vision systems in FRC. And you can execute on these goals in all kinds of different ways. Vision really is one of the most exciting areas of FRC software right now. And we're still seeing new innovations every year, with teams coming up with new algorithms and new software for different types of vision pipelines. So hopefully what I've talked about can be a bit of a guiding framework as you work towards solving some of these problems on your own, including in future games where we don't yet know what types of vision problems we'll have to solve. So thank you all for listening.