OpenCV 5 represents the most significant release in the library's history, featuring a completely rewritten DNN engine with over 80% ONNX operator coverage (up from 22.5% in version 4.x), new hardware acceleration layers (HAL) for ARM, Intel, and RISC-V processors, support for FP16 and BF16 data types for reduced memory footprint, and a unified buffer system for memory optimization. The new engine supports dynamic shapes, subgraphs, and quantized models, enabling faster inference for popular models like YOLOv8, RT-DETR, and Llama 2.5 while outperforming ONNX Runtime on many computer vision tasks.
Deep Dive
OpenCV 5 Preview
It is Thursday. It's 9:00 a.m. Pacific time. And you know what that means? It's time for OpenCV Live. We've got a full house here in the virtual studio today.
It's going to be a fun one. We're talking about OpenCV 5, the long-awaited, biggest-ever release of the OpenCV computer vision library. We've got a couple of folks from the team here as well as, of course, our stalwart host, CEO Dr. Satya Mallick. I am coming to you from beautiful, fragrant SoMa in downtown San Francisco, California. It's raining a little bit this morning, which I'm pretty happy about. Abhishek, where are you joining us from?
>> Yeah, I'm joining from Bangalore, India.
>> Right on. And Gursimar?
>> Actually, I'm joining from Goa, India.
>> Oh, wow.
>> All right.
>> That's why you look extra tan.
We can see
>> These [unclear] have already started.
>> Love it.
>> Love it. And folks, wherever you're watching out there: we're live on Twitch, LinkedIn Live, YouTube, Facebook for some reason, and Zoom, of course. Let us know where you're joining us from. We'd love to see the global OpenCV community chiming in on these Thursday mornings.
We've got some activity over on Zoom, of course, already. We see
>> Samantha says good morning. What's up?
>> For people who do not know, the reason we laughed at Goa, India is that Goa is, you can say, Hawaii and Las Vegas combined for India.
>> Makes sense. Makes sense.
>> The thing to do there is basically you get some alcohol and sit at the beach, and that's what you do.
That sounds
>> Sounds a bit like, maybe, Cabo, right?
>> Yeah.
>> Let's see. Over on Zoom, we've got Samantha coming from San Diego, California. We've got Bremen, Germany.
We've got Dr. Zintaloo from Hersing University, Atlanta. We've got Muhammad from Mali. We've got hugs from Carlos Nava in Baja, California.
We've got Andre from Moscow, and our buddy Hussein on YouTube says, "Hey, heyo." Heyo, Hussein. How's it going, buddy?
>> Love to see it. Love to see it, folks.
We'll get started in just a minute here.
And Hussein's joining us from Pakistan.
Howdy.
We've got Bruce Wright from Calgary, Alberta, Canada, home of the legendary Hart wrestling family.
Yeah, it is a star-studded affair here on OpenCV Live. Looks like everything is good. We're live on all of our channels, I think. Now is a great time to go ahead and get started. Doc, you want to welcome everybody to the show?
>> Yeah.
Hello everybody. Welcome to OpenCV Live.
I'm Dr. Satya Mallick, the CEO of OpenCV.org. And today we are going to announce what is coming next in OpenCV 5.
It will be released in just a few weeks, and I'm so excited to invite Abhishek Gola and Gursimar Singh, who are part of the core OpenCV team, to explain the new things that are coming in this release. This is the biggest release of OpenCV in many years and it changes everything. We literally built this thing from the ground up. I know this is a very cliched term, but that's exactly what happened. We have pretty much rewritten the DNN library, there are a ton of new features, and we have also broken a few things: we are going away from backward compatibility in a few cases. So we are going to discuss all of those things today on this webinar. But before we do that, as always, with me is Phil Nelson, who is the director of content and creative at OpenCV. He produces the show, and if anything goes wrong it's his fault. Hi Phil.
>> Yes indeed, it's me. It's me. It's PHIL.
I am your co-host with the most, the second banana who's second to none. I'm also your plus one and only. You know, it's Mr. Nelson if you're nasty, but you, my dear friends, can call me Phil. And I'm here to remind you of a few things we do on every single episode of this here program. The first of which is a special giveaway to you out there in the audience. Stay tuned. Later on in the episode, I will be asking a trivia question based on today's episode and presentation. And the first person to answer that trivia question correctly will win the OpenCV University course of their choosing. Stay tuned for that.
It'll happen in about 40 to 45 minutes, give or take. We're also taking questions from you in the audience. Ask your question wherever you're watching in the chat, or use the little Zoom Q&A button if you're watching on Zoom, because it helps me organize these things. I'll be watching those messages, pull out some good questions, throw them up on screen, and we will answer them later on in the episode.
But before we get started with the episode proper, I actually wanted to talk a little bit about some OpenCV history. And that history is brought to you by our sponsors and OpenCV members and partners such as ARM, Futurewei Technologies, Google Summer of Code, Roboflow, Orbbec, the British Machine Vision Association, JetBrains, the Edge AI and Vision Alliance, OpenMV, Tangram Vision, Amped Software, Intuitivo, Rerun, Intrinsic, BairesDev, and Big Vision, because every company needs a big vision. OpenCV depends on the support of these folks and the support of individuals such as yourself. If you've got a few bucks to spare, you can join the 16 sponsors we have on GitHub by going to github.com/sponsors/opencv and helping us out there for just n bucks a month. And you will be as cool as these people here. I'm going to read their names. We've got Heyo Yang. We've got Mario Berseron, Alberta Beef, Roboflow, BairesDev LLC, Axel T81, DJ Greenwood, Chonx, Fres, Hugh. Wow, that's a tough one.
Sarn electronic AI.
Alexander Voronov, the homies of Big Vision once again, double dipping, IPOP AI, Comet ML, Alexander Ismolov, Tala Hussein, and Nick Libertini. Thanks so much, everybody, for your support. Once again, that's github.com/sponsors/opencv.
Get your name on here. I will say it and we will say thanks.
>> Thank you. I did a little bit of digging before the show here. OpenCV 5 is obviously a momentous occasion, but I wanted to show a little bit of OpenCV history. So I went back as far as I could on the GitHub repository, and the oldest release on there is the first official PyPI release from September 3rd, 2016. It seems both super long ago and super recent. What do you think about that, Satya?
>> Yeah. So,
>> First release on PyPI.
>> Yeah, we used to be on SourceForge.
You remember that website where people used to publish their code for the longest time?
>> It's still around. It's still around.
>> Yeah.
>> Yeah.
>> So yeah, first release on PyPI, 3.1.0.0.
That was September 3rd, 2016. I dug up this release announcement from Phoronix here.
OpenCV 3.0 was June 2015.
The release announcement reads, "With a great pleasure and great relief, OpenCV team finally announces OpenCV 3.0." I thought you guys would like that one.
And I think at that time it was actually a huge amount of time, something like two and a half years between releases. That seems so quaint now, but I thought you guys would like that.
Thanks to Phoronix for the old post.
And then I found OpenCV 4.0 over on Synced Review: "OpenCV 4.0 Release Ends 3.5-Year Wait." We have eclipsed that one by a wide margin here for OpenCV 5. I think we're approaching what, like eight years, something like that.
>> So yeah, this is just kind of interesting. I'll drop these links in the show notes as well, just to see what people are excited about. But,
>> See, all other companies, when they write code, they write a lot of bugs, so they need to release often to fix those bugs.
>> That's right.
>> We just write code. We don't write any bugs.
>> It's one weird trick that really works, folks. I encourage you to try it when you're in software. Just don't write any bugs in the software. It seems so simple, but
>> You know, a lot of the best advice is like that, I think.
>> So I thought you guys would like that little
>> In the same spirit, I always tell people, when investing in the stock market, buy low and sell high.
>> Ah see you're genius. Absolute genius.
It's just nuggets of pure wisdom coming out on OpenCV Live this morning.
My trick is always: don't forget to make a lot of money.
>> It's easy to get caught up in the day-to-day and forget, but don't forget to make a lot of money. So I thought you guys would like that trip down memory lane. Of course, we're on YouTube. Please, if you're watching on YouTube or anywhere else, go to youtube.com/opencvofficial and smash that subscribe button. And don't forget to like these videos. You can also become a member over there and pay us a couple of bucks a month just because you like our show. We would love it if you did that. You can find every episode of OpenCV Live, past and present, on the channel. And we also now have a Patreon account. You can go to patreon.com/opencv and sign up for seven bucks a month. You will get DRM-free downloadable editions of every episode of OpenCV Live, including the one that you're watching right now. Skip the YouTube ads and join us over on Patreon.
>> Yeah. And if you guys want, we can also create an OnlyFans account where, yeah,
>> you can see our developers code in real time.
>> Clothes on. Clothes on.
>> Well, I mean, depending on how much you pay. But hey, listen, at this point, if what people really want to see is a shirtless Phil writing newsletters, then so be it.
>> So be it.
>> I'm not a proud man. I'm here to get paid, folks. Speaking of getting paid, let's pull on up to the pay window. We've got a couple of awesome guests here today, Abhishek and Gursimar from the team. Why don't you guys introduce yourselves and we'll get started talking about OpenCV 5.
Who wants to go first?
>> Yeah, I can go first. I'm Gursimar Singh.
I'm a principal AI engineer. I am also part of the OpenCV core team. I've been working on OpenCV 5 for quite a while now, and we're all excited for the release. And I think, like Phil mentioned,
>> the release note will still be the same: with a sigh of relief, OpenCV 5 will be out.
>> Right.
>> Yes, yes. And Abhishek?
>> Yeah, hi folks. I'm Abhishek, and I am a senior computer vision engineer, working with OpenCV for quite some time now. I will be briefing you on OpenCV 5 today along with Gursimar.
>> Fantastic. I think we can go ahead and uh get started. Whoever wants to share their screen, let's take it away.
>> Yeah, just give me a minute.
>> Yeah, no problem.
Don't forget, folks, we are taking questions. Drop those questions in the chat wherever you're watching and I will dig them up like the truffle pig that I am. Here is Abhishek's screen share.
>> Yeah. So, for people who may be wondering how the OpenCV development team is organized: OpenCV is a nonprofit, and we work with other companies. There are some core companies that we work with for the development of the OpenCV library. Big Vision is basically my consulting company, and Gursimar and Abhishek are part of that. We contribute to the OpenCV library. Then we have OpenCV.ai, which is based out of Europe and is also a core company that helps. And finally we have OpenCV China, which is based out of China, obviously, in a place very close to Shenzhen. All these teams collaborate completely remotely to bring the OpenCV library to you. So it's a global effort, and there are a lot of people who contribute: they came up with a better algorithm, they came up with a bug fix, and so on. The whole community from all around the world contributes as well, but the core team is basically Europe, China, and India.
>> Indeed it is. Okay, take it away, I guess, Abhishek.
>> Sure. So, on the content part, these are the things we are going to discuss: the OpenCV ecosystem; why OpenCV 5; the new DNN engine in OpenCV 5; what performance gains we are getting in OpenCV 5; the hot topic of 3D vision and what is there in OpenCV 5 for 3D vision; and last, the release timeline and resources. So yeah, we can get going.
Satya, would you like to talk about these things?
>> Yeah. Well, the OpenCV library is one of the biggest libraries in the world. It is downloaded about a million times a day. Just think about it, a million times a day. Of course, many of these are automated systems installing on AWS and other cloud platforms, but still, 1 million times a day is quite a large number. We have 86k GitHub stars. In fact, we are one of the top 100 organizations on GitHub if you take all the repositories; they publish a report, and we are in the top 100 organizations. And it is very popular among embedded vision engineers: about 89% of embedded vision engineers use OpenCV as their primary computer vision platform. It's a collection of 2,500 different algorithms, and a lot of people don't know how much optimization has been done. One of the things you have to keep in mind is that when we release OpenCV 5, you will notice that the names of the functions and so on are not changing, but there is a massive speedup under the hood, and most people don't realize that. If you're using the right hardware for which it is optimized, things will run much faster than normal. So it's not just 2,500 different algorithms, it is also what kind of optimization has gone in under the hood. We'll talk about it today.
>> Yeah. And in addition to not writing any bugs, one of the tricks you can use when developing software is to just make sure that you make it fast,
>> right?
>> It's a really simple trick.
>> I've already talked about the slide so we can skip.
>> Sure. Yeah. So, moving to the new OpenCV 5. I'll start, and guys, feel free to jump in whenever.
So what are the core problems that OpenCV 5 is solving? Firstly, the obsolete things that needed to be removed have been removed. We now have faster and smaller core modules. We are introducing hardware acceleration layers, which mean faster code and faster performance on specific hardware; we will talk about that more in later slides. We now have a cleaner API and a next-gen DNN engine that is better optimized and has much higher coverage than the original OpenCV 4.x engine. We have better 3D vision support and a cleaner structure. And most importantly, the old OpenCV documentation is now gone. There will be new, better documentation that's more user friendly.
>> So let's get going.
>> So, before I was the CEO of OpenCV, I had a course that used the OpenCV library extensively. One of the quiz questions in the course was "OpenCV documentation is easy to use," and there was only one option to answer. There was no yes or no.
It was just no.
>> You took the negging path to becoming CEO.
>> So this is one of the things that I really wanted to fix. I'm glad that Abhishek and Gursimar took this up. That's great. The newer one is much prettier, and the content is still strong. The content was, I think, never the issue, only the
>> Yeah, it was how it looked. And also, just to provoke them, I vibe coded a version of the documentation, which barely worked. I spent 20 to 30 minutes vibe coding the documentation page. It looked pretty; it was not functional. So I never gave the code to them. I said, look, you could do this in half an hour, why don't we have the documentation?
>> This guy, just posting his L's publicly. I love it.
>> And then they said, "Oh yeah, yeah, yeah. Please submit your code and we'll take it from there." And I never submitted the code because it was just not usable. So they did it from scratch.
>> Thanks to Satya for the inspiration.
>> Yeah. So, on the C++ API part, we have removed the C API. It was not very useful and was very hard to maintain, since C++ is now used everywhere, in the Python bindings, in the Java bindings, everywhere. The C API was too old, so we decided to remove it. The minimum recommended C++ standard is now C++17. We support higher C++ standards as well, and lower as well, but we recommend C++17 as the minimum for the best performance. The next step, in some later release, would be moving to C++20 as the standard we benchmark against. Currently we have used C++17.
Regarding Python support, Python 2 is already deprecated, and it has been removed since 4.3.
In OpenCV 5 there will be support for NumPy 2.x and some deeper integration with it. Also, to keep the architecture complete and clean and to make sure everything is working well, we have integrated the Python bindings into the OpenCV CI, and those CI servers run on every pull request to make sure that nothing is breaking. Another thing that has been introduced is named parameters, which were much needed to create cleaner bindings. Instead of writing Python code again and again and making the library bulkier, the idea is: let's write C++ code and generate the bindings. We define those macros, and they export the Python bindings as well as the Java bindings for us.
So, just adding on to that: one thing is that we don't want to make it bulkier. The other thing, for someone using OpenCV in Python, is that you would expect that because you are writing in Python it will be much slower than C++. But what happens under the hood is that it is still calling the same C++ binary. It's a wrapper around the C++ code. So it is slightly slower, there is some overhead, but it's not extremely slower than C++. That's how we build it, and one advantage of doing that is that whatever we code in C++ automatically comes to Python without rewriting it in Python.
>> Yeah, it saves a lot of effort.
Yeah,
>> On the OpenCV modules part, the most awaited thing: in the core module we have introduced several new data types, namely BF16, 64-bit integers such as U64, and boolean. With that, for the more functional core operations that are widely used, we are seeing up to 2x improvement on those mathematical workloads. It now has a cleaner API: you no longer need to write ifs and elses to run on different CPUs and get the best performance, because the core module already covers it. Some part of that we are going to discuss in detail as HAL in the next slides. Another thing that has been introduced is ease of use. NumPy has broadcasting, and that was not there in the core module in 4.x. Previously, if you wanted to add two matrices of different shapes or dimensions, you needed to call cv::repeat to repeat one of them along a particular direction before adding. That job is now done by broadcasting, so cv::broadcast is there. Also, 4.x was more about 2D operations: transpose, flip, all those operations were for 2D only. In 5.x, transposeND, flipND, and other N-D operations have been introduced, and that will save a lot of code and a lot of time compared with flattening the N-D array and then transposing it. So it's much easier now. The major functionality of core has been moved to a main module so that it can be widely used, and that has improved maintainability, usability, and performance for us as well.
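To make the repeat-versus-broadcast point concrete, here is a minimal pure-Python sketch of the NumPy-style broadcasting rule. It does not call OpenCV or NumPy; `broadcast_shape` and `add_row` are hypothetical helper names used only for illustration, and the sketch assumes the usual rule that shapes are aligned from the trailing dimension and a size of 1 stretches to match.

```python
from itertools import zip_longest

def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError."""
    out = []
    # Align from the trailing dimension; a missing leading dimension counts as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or x == 1 or y == 1:
            out.append(y if x == 1 else x)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(reversed(out))

def add_row(matrix, row):
    """Add a 1-D row to every row of a 2-D matrix without first repeating the row."""
    rows, cols = broadcast_shape((len(matrix), len(matrix[0])), (len(row),))
    width = len(matrix[0])
    # Index 0 is reused when an operand has size 1 along the axis (the "virtual repeat").
    return [[matrix[i][j if width > 1 else 0] + row[j if len(row) > 1 else 0]
             for j in range(cols)]
            for i in range(rows)]

print(broadcast_shape((4, 1), (1, 5)))
print(add_row([[1, 2, 3], [4, 5, 6]], [10, 20, 30]))
```

The explicit-repeat approach would first build a full copy of the row for every matrix row; the broadcast version only changes how the smaller operand is indexed.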
On the universal intrinsics part, FP16 and BF16 have been introduced for NEON and other processors as well. So the question arises: why do we need FP16 if FP32 is working well? With the current trend, where LLMs and VLMs are doing so much, what we need is more memory, and the other approach is to reduce memory usage. If we move from FP32 to FP16, we use about half of the bandwidth that we are currently using. The memory footprint is cut in half, and that's the target for any LLM or VLM. So if we did not support FP16 while loading those models, the gain from the smaller memory footprint, that 2x throughput, would be gone. That's why it's necessary to support those types, and OpenCV 5 has introduced them. On the vectorizing part, implementations have been simplified. Previously, if you wanted to run vectorized code that is fast on a particular platform, you had to implement it for each data type, for example FP32, from scratch. Now we have introduced template-based code, so it internally decides, based on the input, which data type is to be used.
So you do not need to write bulkier code, and minimizing those if and else calls also makes the code faster.
Coming to the
>> Just to add a little bit of explanation of FP16, because a lot of people on this webinar may not be familiar with it or may be starting out in their careers. FP16 is basically floating point 16 and BF16 is brain floating point 16. These are floating-point formats. If you load an image, by default in OpenCV and most libraries it is read as 8 bits per color channel, but then for any mathematical operation it is converted to floating point 32. What has happened in the last 5 to 10 years is that floating point 16 has become very common. Earlier, processors might not even support floating point 16; it was not a very common thing to support, and processors definitely did not support BF16. The benefit of these 16-bit operations is that they are very fast and they don't consume as much memory, because you are literally using half the size. Neural networks, even during training, use BF16, and then during inference they usually use FP16. So all these processors treat FP16, and sometimes BF16, as first-class citizens. Whatever the core processor supports, there could also be a neural accelerator available on the system that works in FP16, and that way you could run things faster on that neural accelerator. That is why these two formats have become so popular, and that's why we started supporting them.
Again, this is one of those things that you would not notice immediately, because the function name doesn't change. Under the hood, earlier we were processing at 32 bits, and now we will be processing at 16 bits, so things will be faster, but code-wise you will not see the difference. Earlier, even if a model was stored as FP16, we would load it into a 32-bit space and run it using 32 bits. That has changed substantially, so you will see a speedup in many operations. But code-wise there will not be any change. Your old code will continue to work.
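The "half the memory" claim can be checked with nothing but Python's `struct` module, which supports IEEE half precision through the `e` format code. This is a sketch under stated assumptions, not how OpenCV stores data: the helper names are hypothetical, and the BF16 conversion here simply truncates the low 16 bits of the float32 pattern rather than rounding.

```python
import struct

def fp32_bytes(values):
    # 'f' is IEEE 754 binary32: 4 bytes per value.
    return struct.pack(f"<{len(values)}f", *values)

def fp16_bytes(values):
    # 'e' is IEEE 754 binary16 (half precision): 2 bytes per value.
    return struct.pack(f"<{len(values)}e", *values)

def bf16_bytes(values):
    # bfloat16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits
    # of float32. Truncation is used here for simplicity (an assumption).
    out = bytearray()
    for v in values:
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        out += struct.pack("<H", bits >> 16)
    return bytes(out)

def bf16_to_float(raw):
    # Put the 16 stored bits back in the upper half of a float32 pattern.
    halves = struct.unpack(f"<{len(raw) // 2}H", raw)
    return [struct.unpack("<f", struct.pack("<I", h << 16))[0] for h in halves]

def fp16_roundtrip(value):
    # Store as half precision and read back, to expose the rounding error.
    return struct.unpack("<e", struct.pack("<e", value))[0]

weights = [0.5, -1.25, 3.0, 1000.0]
print(len(fp32_bytes(weights)), len(fp16_bytes(weights)), len(bf16_bytes(weights)))
print(abs(fp16_roundtrip(0.1) - 0.1))
```

Both 16-bit formats halve the byte count; they differ in how the 16 bits are split between exponent and mantissa, which is why BF16 keeps the range of FP32 while FP16 keeps more precision.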
Okay. So, moving ahead to the mathematical intrinsics part, we have expanded that. What are mathematical intrinsics? If you look at most models, softmax and sigmoid are used, and they are basically based on exponentials. Eventually that becomes a bottleneck for any model if it runs sequentially, because it is compute expensive. But all it is doing is taking one instruction and applying it to multiple data. So what we can do is use the CPU cores: give that single instruction to each core along with different data, and do the operation in parallel. Let's say we have eight cores; we can divide the data into eight chunks and process them in parallel. That's what is done in mathematical intrinsics, and that support has been extended in OpenCV 5.
>> Yeah, so the whole idea of SIMD: SIMD is single instruction, multiple data. It says to apply the same instruction to multiple data. You have to use it carefully, on the processors which support it. So under the hood, even simple operations like exp and log need to be handled based on the processor, and if the processor supports it, then OpenCV will do all these operations in parallel using SIMD.
>> So OpenCV has introduced SIMD intrinsics for image resizing, warping, and remapping, and for ARM-based platforms there is something called NEON that handles these optimizations.
We have implemented those, and we are able to see a 3 to 4x speedup on ARM devices compared to the original scalar code. Regarding extending the support in future, the OpenCV code is maintained in such a way that if any new kind of SIMD is introduced, it can adopt it very easily.
>> And NEON is coming as default, right from
>> OpenCV 5. It's there in, I think, 4.13 as well, but in 5.x NEON optimizations also come as default on ARM, which means that OpenCV will be much faster on ARM devices now. This contribution also comes from ARM itself, and that's where we have been taking it from.
>> Right, so we have been collaborating with ARM for about 3 years now, and with our team they have helped us improve the performance. So, Gursimar and Abhishek, all people need to do when they download OpenCV 5 is simply compile? When they're compiling on ARM, will OpenCV automatically figure out that it is being compiled on ARM, so that all these NEON instructions and so on are included? Is that how it works?
>> Right.
>> Okay. So at compile time it can check what device it is on, and based on that it can record which code path it needs to choose at runtime.
>> If that NEON-based path is there, it will choose that.
>> And when people are using it in Python, they don't need to do anything because pip would automatically download the right one. Okay, yeah.
>> Yeah, it's handled in the pip package, and you do not need to specify anything. It's taken by default; it automatically figures out the platform and gives you the best performance for that version.
>> Yeah.
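The "eight cores, eight chunks" description of the exponential workload can be sketched as below. This is only an illustration of the same-operation-on-different-data idea, using a thread pool from the standard library; it is not real SIMD (which applies one instruction to several values inside a single core), it is not OpenCV's implementation, and in CPython the threads will not actually speed up this pure-Python loop. All function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def split_chunks(data, n_chunks):
    """Split data into at most n_chunks contiguous pieces."""
    size = max(1, math.ceil(len(data) / n_chunks))
    return [data[i:i + size] for i in range(0, len(data), size)]

def exp_chunk(chunk):
    """The single operation that every worker applies to its own chunk."""
    return [math.exp(x) for x in chunk]

def parallel_exp(data, n_workers=8):
    chunks = split_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(exp_chunk, chunks))  # map preserves chunk order
    return [y for part in parts for y in part]

def softmax(logits):
    """Softmax built on the chunked exponential; subtracting the max avoids overflow."""
    shift = max(logits)
    exps = parallel_exp([x - shift for x in logits])
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.0, 2.0, 3.0]))
```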
>> As we discussed for FP16 and BF16,
and as Satya explained, the core operations needed for that are the arithmetic operations and the convertTo operation. Before processing anything in BF16 or FP16 we need to convert it from somewhere: an image is read as int8 and then needs to move to FP16. So those convertTo operations are there, and then normalization, mean, min/max, log, and countNonZero. These are the most commonly used operations in core, and they now support FP16 and BF16. On the imgproc part: if you have ever used putText in OpenCV 4.x, you must have noticed that the text is not good enough, because at that time it was using the Hershey fonts from 1967, way too old, and it was just, like, a [unclear] code.
There was no real bold; it was just thickening those pixels, and there was no proper spacing.
So in OpenCV 5.0 we have decided to replace it with a TrueType font, Rubik, and the particular typeface we are using is Sans. So it has Latin and all these scripts built in.
>> So it will support foreign languages, pretty much any language, and the rendering will be much nicer.
>> Yeah. And internally there is an option: if you have your own custom font, you can also load it and compile with it. That won't be the pip version, but internally you can do that.
Okay.
>> And talking about one of the most used OpenCV functions, findContours: it is much faster now in OpenCV 5.x. Here is the reason. Take an example: there is an image with multiple contours, and if we look for them one by one, it is going to take an eternity. What we do is use SIMD, that is, giving the same work to multiple cores. But in 4.x, even though we were doing that, the constraint was the lock system. When you are writing results back, you can only go one by one. The same memory cannot be accessed by multiple cores at the same time, so they need to queue up, and there is a lock: one core is writing, and the others have to wait while the lock is held. That was killing all the performance we were getting from the parallel optimizations.
So now OpenCV has adopted the troco algorithm, which is actually lock free. What it does is preassign the memory: this particular part of memory is yours, and that other part belongs to some other core. So the cores can process in parallel and also write in parallel, and you save the time that was getting eaten up by the locks. So it will be much faster now.
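The preassigned-memory idea can be shown with a small standard-library sketch. It is not the contour algorithm itself and says nothing about OpenCV's internals; it only illustrates that when every worker owns a disjoint slice of a preallocated output, no lock is needed for the write-back. The function name and the per-row pixel count used as the "work" are assumptions made for the example.

```python
import threading

def count_foreground_per_row(image, n_workers=4):
    """Each worker processes its own block of rows and writes only to its own slots."""
    out = [0] * len(image)  # preallocated: one slot per row, assigned before work starts

    def work(start, stop):
        for r in range(start, stop):
            # Only this worker ever writes out[r], so no lock is required.
            out[r] = sum(1 for p in image[r] if p)

    step = max(1, -(-len(image) // n_workers))  # ceiling division
    threads = [
        threading.Thread(target=work, args=(s, min(s + step, len(image))))
        for s in range(0, len(image), step)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

mask = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
print(count_foreground_per_row(mask))
```

The locked alternative would have every worker append to one shared list under a mutex, which serializes the write-back in exactly the way described above.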
So Abhishek, do we have a number for how fast it is compared to 4.x?
>> I don't have the exact number right now.
>> Okay. Like a 2x kind of thing, or one point something?
>> It should be nearly 2x at least.
>> Okay.
>> Yeah.
>> Also, it supports chain codes and link runs. I'll explain chain codes to you. In a contour, what are the possible directions we can go? Left, right, top, bottom, and the other directions would be top left, top right, bottom left, and bottom right.
So instead of moving pixel by pixel and figuring out where to go, there is an approach called chain code. It predefines a code from 0 to 7: this neighboring block is zero, the next is one, and so on, and then it moves that way. It is faster to access those blocks, so in all eight directions we get faster navigation.
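A chain code of this kind (often called a Freeman chain code) can be sketched in a few lines. The numbering used here, 0 for "right" and then counter-clockwise with image y growing downward, is one common convention and is an assumption; the talk does not say which numbering OpenCV uses. `encode` and `decode` are hypothetical helper names.

```python
# Direction table: code -> (dx, dy), with y growing downward as in image coordinates.
# Assumed convention: 0 = right, then counter-clockwise in steps of 45 degrees.
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]
CODE_OF = {step: code for code, step in enumerate(DIRECTIONS)}

def encode(points):
    """Turn a path of 8-connected (x, y) points into a list of codes 0..7."""
    return [CODE_OF[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def decode(start, codes):
    """Rebuild the path from its starting point and its chain code."""
    path = [start]
    for code in codes:
        dx, dy = DIRECTIONS[code]
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = encode(square)
print(codes)                  # one small integer per step instead of a full (x, y) pair
print(decode((0, 0), codes) == square)
```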
Similarly, for link runs, we can process each row in parallel and then combine the runs to make a whole contour, instead of processing pixel by pixel. Also, in the imgproc part, some of the most used functions, warpPerspective and remap, also support 16-bit data now. And we have introduced params. In a function, if you pass new input values, you need to redefine the function; you need to overload it. So instead of overloading, you can just take a params object, and the params will have all the variables. You can add to them or reduce them, so you need not define those functions again and again and overload them.
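The row-wise idea mentioned above for link runs can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it extracts horizontal runs of foreground pixels for each row independently (the part that can run in parallel) and then links runs that touch between neighboring rows, grouping them into connected regions rather than tracing full contours as OpenCV does. All function names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def row_runs(row):
    """Return (start, end) column spans of consecutive foreground pixels."""
    runs, start = [], None
    for x, value in enumerate(row):
        if value and start is None:
            start = x
        elif not value and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs

def link_runs(image):
    """Group runs from all rows into 8-connected regions."""
    with ThreadPoolExecutor() as pool:
        per_row = list(pool.map(row_runs, image))  # rows are independent

    ids, parent = {}, []
    for y, runs in enumerate(per_row):
        for i in range(len(runs)):
            ids[(y, i)] = len(parent)
            parent.append(len(parent))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for y in range(1, len(per_row)):
        for i, (s, e) in enumerate(per_row[y]):
            for j, (ps, pe) in enumerate(per_row[y - 1]):
                if s <= pe + 1 and ps <= e + 1:  # runs touch, diagonals included
                    parent[find(ids[(y, i)])] = find(ids[(y - 1, j)])

    groups = {}
    for (y, i), k in ids.items():
        groups.setdefault(find(k), []).append((y,) + per_row[y][i])
    return sorted(groups.values())

if __name__ == "__main__":
    image = [
        [1, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 1, 0],
    ]
    for region in link_runs(image):
        print(region)  # each entry is (row, first_column, last_column)
```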
>> So I just want to stress a little bit why we spent time on optimizing something like contour detection. In today's time you might say, who uses contour detection anyway? But we need to realize that this is the kind of function that is used almost anywhere you do segmentation using any DNN model. At some point you will need to detect contours. Working in computer vision, many times while auditing a pipeline we have realized that the contour detection part was taking more time than the inference part, and just by optimizing that function we could reduce the overall pipeline time by 30 to 40%. These things go unnoticed. But if you know that this is the bottleneck, and now OpenCV 5 makes it faster, it might just speed up your pipeline, and you don't need to change any code. You just upgrade the version and your existing pipeline might get much, much faster.
>> Yeah. So this is the kind of insight you get when the people who build real-world applications are also creating the library. A lot of times people don't realize this. Like the example Gursimar gave: when you run a segmentation algorithm, you usually don't work with the masks directly. You could, but oftentimes you need to convert them into contours. And if creating the mask, which is the segmentation, takes less time than actually finding the contours, then you have missed out on a big optimization opportunity.
So that is what Gursimar was referring to. All these insights, that we need to optimize this function versus that function, come because we also have a consulting business, Big Vision, where we solve real-world problems for our clients. That informs us: if this function were better, we could have a better-performing overall product. And that is why we go ahead and improve in OpenCV those things which we would like to use ourselves in our consulting business.
Right.
So, coming to the API cleanup part: most of the features that were obsolete and not that important have been moved to opencv_contrib. Talking about the features module, we are introducing DNN-based features. The important features like SIFT and others are still there, but the rest, which are obsolete, are moved to opencv_contrib. Same on the object detection part: Haar and HOG have been moved to opencv_contrib, and OpenCV main will have the newer object detection approaches like YOLO. So it is moving toward DNN-based approaches, because they are much more accurate. Also, the classic ML module has been moved to opencv_contrib. You can use it from there, or scikit-learn can be used for those traditional models. For approximate nearest neighbor search, we have added a new algorithm that is Annoy-based, because it is much faster than the original FLANN KD-tree method, which takes forever to compute the nearest neighbors. The Annoy-based approach is faster because it has something like a random forest, with multiple trees voting. So that is what is used now. The G-API is also moved to opencv_contrib, and OpenVX support is dropped.
On the photo module part, the important things were the Macbeth chart detector and color correction. Those have been ported to OpenCV main from opencv_contrib because they are widely used. Now you need not have opencv_contrib for that; you can use them directly from OpenCV main.
>> About this Annoy-based matching, just to give you a practical example: this comes up in image stitching. When do you do feature matching? Anyone who uses a phone with a camera today knows there is a panorama mode to capture images of a landscape. How does that panorama work? You glide the camera from one location to another, and it stitches different images together into a single large image. How does it do that? It basically tries to match feature points from two images and realign the images. Feature detection is a separate part, and then there is a matcher. This feature matching approach speeds things up. In the legacy version we were using something called FLANN, which was slow, but this is now much, much faster.
>> Yeah, and you would use the same thing in, let's say, face recognition. You detect the face first, and then you create an embedding for the face, which is a vector that represents the face. There could be millions of faces in your database, and you need to find the nearest neighbor for your new face embedding. Because there are millions of those, you need to do it very fast, so that is also done using ANN. Another one, which is a modern application: let's say you want to match photos with text. You have a database of photos and you want to search it with certain text; you basically say, I want photos of dogs, something like that. You create an embedding for the text and then match it against the embeddings of all the photos, and there also you would use approximate nearest neighbor. A small improvement in performance here can make a big difference in your final application, but here I think the improvement is pretty substantial.
>> And with any modern LLM that you use, any conversation you are having with any of the GPTs or Claude, in the back end it is pulling some embedding and searching your queries against some database.
>> That is basically where the inspiration came from to improve this, because the use of these things has increased a lot in recent years.
>> Yeah. Earlier there were just a few applications: you would use it for feature matching and face recognition. But now it is everywhere. Whenever you want to do vision language models, etc., under the hood they use these algorithms quite a bit.
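To make the "multiple trees" idea behind approximate nearest neighbor search concrete, here is a minimal pure-Python sketch of a forest of random-split trees in the spirit of Annoy. It is not OpenCV's or Annoy's code: the function names, the leaf size, and the way candidates are pooled across trees are all assumptions for illustration. Each tree splits the points with a hyperplane halfway between two randomly chosen points; a query descends every tree to one leaf, and only the pooled leaf members are compared exactly.

```python
import random

def side(p, normal, offset):
    """True if p lies on the 'left' side of the splitting hyperplane."""
    return sum(n * x for n, x in zip(normal, p)) >= offset

def build_tree(points, ids, rng, leaf_size=8):
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    a, b = rng.sample(ids, 2)  # two random points define the split
    normal = [x - y for x, y in zip(points[a], points[b])]
    offset = sum(n * (x + y) / 2.0
                 for n, x, y in zip(normal, points[a], points[b]))
    left = [i for i in ids if side(points[i], normal, offset)]
    right = [i for i in ids if not side(points[i], normal, offset)]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return ("leaf", ids)
    return ("node", normal, offset,
            build_tree(points, left, rng, leaf_size),
            build_tree(points, right, rng, leaf_size))

def build_forest(points, n_trees=8, seed=0):
    rng = random.Random(seed)
    ids = list(range(len(points)))
    return [build_tree(points, ids, rng) for _ in range(n_trees)]

def leaf_members(tree, q):
    """Descend one tree and return the ids stored in the leaf that q falls into."""
    while tree[0] == "node":
        _, normal, offset, left, right = tree
        tree = left if side(q, normal, offset) else right
    return tree[1]

def dist2(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def nearest(points, forest, q):
    """Return (best id among pooled candidates, number of candidates checked)."""
    pool = set()
    for tree in forest:
        pool.update(leaf_members(tree, q))
    return min(pool, key=lambda i: dist2(points[i], q)), len(pool)

if __name__ == "__main__":
    rng = random.Random(1)
    pts = [[rng.random() for _ in range(8)] for _ in range(300)]
    forest = build_forest(pts)
    print(nearest(pts, forest, pts[17]))
```

The trade-off is the usual one for approximate search: only a small candidate pool is compared exactly, so a query is cheap, but the true nearest neighbor can be missed if it never shares a leaf with the query. Adding trees raises recall at the cost of more work per query.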
>> Yeah.
>> Coming to the main part, the hardware acceleration layer that has been introduced in OpenCV 5. What is the hardware abstraction layer, or hardware acceleration layer, and what does it do? Every device has different computing power and different SIMD optimizations. In this layer, the implementations for a particular device are present. Let's say for ARM we have KleidiCV, and that does this job. It is developed by ARM, and it implements those SIMD functions and operations specifically for ARM to make them faster. How does OpenCV pick it up? When we compile, it checks which platform it is being compiled on, and based on that it decides which hardware abstraction layer it can use. Why do we need it? Because the common code cannot be optimized without ifs and buts, like: if the platform is this, then you need to do this. Those if conditions are too time-consuming. It is better to have an abstraction layer that does the job for each platform, and KleidiCV is the one that does it for ARM.
>> Yeah, we have others like FastCV, which is from Qualcomm. The overall design idea is this. If we don't do HAL, what do we do? For every hardware, I create an if-else block, I check whether this is the hardware, and inside it I add my own optimized function for that hardware. Having that inline everywhere in the code just makes things a mess. So instead you create a separate library, which has a kind of registry saying: this function is already optimized in this way on my hardware, and this is the code.
>> Right. So what we do is just check the registry, and if that implementation is available, we take it from that library. That also allows other vendors, other hardware companies, to do their work in a separate library, and then it is just like a plug that comes in. For example, KleidiCV is now enabled by default on ARM, so if you install on ARM, that plug is already there. If some other hardware is not on by default but is available in OpenCV, you can build OpenCV with it switched on, and then without changing your code it just gets optimized. That is the overall idea of having HAL. And it is not limited to CPU. With the 5.0 release it will just be CPU, but the plan is that in future minor versions of OpenCV 5 it will extend to non-CPUs as well, which basically includes GPUs. Currently in 4.x there is some support for CUDA, which is in high demand; we want GPU acceleration in OpenCV as well. So we want to do it via HAL itself.
So even if it is CUDA, even if it is some other GPU back end for, let's say, AMD, everything is just like a plug that can be plugged in, without making anything messy.
>> Right. And just to summarize everything, I'll give you a very specific example. Let's say you want to use the resize function. You call cv::resize and you want to downsample the image to half the size. You will not know it, but under the hood there are multiple implementations. There is a reference C++ implementation, which is written by us; it is basically how you would do resize. But when this function is called, we also look, based on your hardware, say you are running on ARM, at whether there is a faster implementation than the reference one. If ARM has a library with an optimized resize function, then we will use that library. If you are running on Qualcomm processors and the resize function is present in Qualcomm's library, then we will use that library. So the same code will run faster on ARM, Qualcomm, and Intel processors, and you would not need to rewrite it. It happens automatically based on the hardware you are running on, and under the hood we use the optimized libraries provided by these companies. That is the whole idea of HAL. It almost feels like a plug-in. A new hardware manufacturer comes in, and we just say: support these functions, and this is the interface for how you do it. If you support these functions, give us the kernels or whatever it is, and instantly your hardware can be accelerated.
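The registry-and-fallback design described above can be sketched as follows. This is a toy model in Python, not OpenCV's HAL interface: the decorator names, the `NOT_IMPLEMENTED` sentinel, the platform strings, and the `half_size` operation are all hypothetical. It shows the control flow only: look up a vendor implementation for the current platform, let it decline inputs it does not handle, and otherwise fall back to the reference implementation.

```python
NOT_IMPLEMENTED = object()  # sentinel a vendor kernel returns to decline a call
REFERENCE = {}              # op name -> portable reference implementation
HAL = {}                    # platform -> {op name -> vendor implementation}

def reference(name):
    def deco(fn):
        REFERENCE[name] = fn
        return fn
    return deco

def hal(platform, name):
    def deco(fn):
        HAL.setdefault(platform, {})[name] = fn
        return fn
    return deco

@reference("half_size")
def half_size_reference(img):
    """Portable version: keep every second pixel of every second row."""
    return [row[::2] for row in img[::2]]

@hal("arm", "half_size")
def half_size_arm(img):
    """Stand-in for a vendor kernel that only handles even-sized images."""
    if len(img) % 2 or len(img[0]) % 2:
        return NOT_IMPLEMENTED
    return [[row[x] for x in range(0, len(row), 2)] for row in img[::2]]

def dispatch(name, platform, *args):
    """Return (result, which implementation produced it)."""
    impl = HAL.get(platform, {}).get(name)
    if impl is not None:
        result = impl(*args)
        if result is not NOT_IMPLEMENTED:
            return result, "hal:" + platform
    return REFERENCE[name](*args), "reference"

if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(dispatch("half_size", "arm", img))
    print(dispatch("half_size", "riscv", img))
```

The caller's code is identical on every platform; only the contents of the registry change, which is the plug-in property the speakers describe.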
So all the people who are working for other companies and want OpenCV to run fast on their hardware, you can reach out to us, and we'll help you do that.
Ideally, you'd reach out to us with money.
>> No, you know, not ideally, definitely reach out to us with money. This kind of work is not done without active support.
>> But it's very affordable from what I understand.
>> Yeah, that's true.
>> So this is the list of additional HALs that have been added in OpenCV 5.x. Two we have discussed, that is ARM KleidiCV and Qualcomm FastCV, and there is another one for RISC-V, that is RVV, for RISC-V devices. Those RISC-V devices are mostly available in the China region. Just a brief note about RISC-V: it is basically a kind of open source on the hardware side. Some of the current architectures are owned by specific companies. RISC-V is open, and anyone can take it and, let's say, form a new company and launch their own hardware based on that architecture without paying a significant licensing fee.
So right now we only talk about open software; this is something coming from open hardware. There is a strong push from OpenCV China and that region to have software optimized for such hardware, so we are actively supporting that as well, with contributions coming from the China team.
>> Right. So, discussing more about KleidiCV: it was already introduced in 4.13, so that compatibility carries over to OpenCV 5.
It optimizes the most common functions, such as Gaussian blur, resize, and warpAffine. These are optimized on ARM, and we have validated it on Graviton 4 as well, with the help of OpenCV COOL, the Cloud Optimized OpenCV Library, which is publicly available right now. As discussed, you need not specify that you want to use KleidiCV; under the hood, the OpenCV code decides that and uses it on the specific platform. For universal intrinsics, we have extended support for AVX devices. The optimizations are there for AVX and AVX-512, and also for ARM NEON and RISC-V RVV. FP16 and BF16 intrinsics have been implemented, template forms have been implemented, and scalable vectors have been implemented for most of the commonly used functions. Mathematical operations have also been implemented with the help of SIMD across x86, ARM, and other devices.
Yeah.
>> So for people who may be slightly confused, what do these acronyms mean?
Every processor publishes an instruction set. These instructions basically say what can happen in a single clock cycle. They would say, for example, that a multiplication and an add happens, things like that. I don't even know the details, but there is an instruction set per processor, and every processor has a different instruction set. For example, NEON is for ARM, SSE is for Intel, AMD, etc. They published an instruction set, and then there is an upgrade to it, like AVX and then AVX-512. Similarly, RVV is for RISC-V, right?
>> Correct.
>> Yeah. Yes. And so on and so forth. So there are different instruction sets, which again you don't have to worry about. Under the hood we make sure that we are using the optimizations that are available for the processor. So all of these things that we are showing you, unfortunately or fortunately, don't actually affect your coding style. It is happening under the hood; things are getting faster without you even needing to change your code.
Yeah, and this is also for developers from the hardware vendors looking at how they can optimize. These are basically the ways they can optimize or accelerate OpenCV on their hardware, and we are trying to create interfaces that are easy for them as well. And just a brief note on Graviton 4, which Abishek talked about. Graviton 4 is a range of hardware from AWS, and for AWS Graviton processors we have an optimized version of OpenCV that runs very, very fast compared to the open version of OpenCV. How it is available: you go to the AWS Marketplace, and when you launch a new AWS instance, you can search for Cloud Optimized OpenCV. The acronym is COOL, because we are cool. You will get the optimized version of OpenCV pre-installed. It doesn't just have the optimizations that come with KleidiCV; we have done more optimizations under the hood on the most commonly used pre-processing and post-processing functions, which is most valuable today for your DNN pipelines. Graviton 4 is cheaper than x86 and is also faster, so you'll get that too.
>> Indeed, try out COOL, the Cloud Optimized OpenCV Library, on AWS. Coming to the new DNN engine, one of the most important aspects of OpenCV 5 for optimizing and supporting new deep learning models. Let's talk first about the ONNX coverage.
So the new graph engine has much more ONNX coverage than the previous 4.x. Initially it was 22.5%, and now it is more than 80%. So almost all ONNX models are supported now in OpenCV 5. The key coverage that has been added: first, subgraph support, so it now supports If and Loop subgraphs; and dynamic shapes. The issue was that most models are now exported with dynamic shapes, but OpenCV 4.x only allowed processing models with static shapes. That blocker has been removed in OpenCV 5.x, and now we support dynamic shapes too. The other thing is QDQ graphs. In OpenCV 4.x there was some QLinear support, but QDQ support was not there, and that QLinear support was very slow. We have implemented some fusion optimizations, and it is now much faster compared to 4.x. The other thing is attention and MatMul fusions. Where does that come from? For all LLM models, the first thing needed is the attention layer; it is the core building block, and to run those models faster we had to optimize it, and that has been done. And the MatMul part is matrix multiplication. Eventually every other model is multiplying matrices, so that needs to be faster. Most of these can be fused into one or two operations. Those fusions have been applied, and it is now much faster compared to OpenCV 4.x, or even compared to ONNX Runtime.
>> That jump in the number of operators supported is big. What the number of operators practically means, just to simplify it, is that you can think of them as layers. Each deep learning model has, let's say, a set of layers. In 4.x you could still run an ONNX model and do the inference using OpenCV.
But the range of layers supported was very small, and what used to happen is that in the end you would get a not-supported error when you tried to run a model. As things progressed and new kinds of layers and operations kept coming in, we wanted to expand and support as much as possible. So right now around 80% of the ONNX operations are supported, and that covers, I think, most of the deep learning models out there. The remaining 20% of operations are mostly not related to inference; the list also has operations that are mostly used in training. In OpenCV we don't support training yet. Maybe that comes at some point in the future, in OpenCV 6. But for inference we now support most of the models. What that means is that if you already have code written in Python using OpenCV, you can use OpenCV itself, without needing to install any additional library, and do the inference if you have a model already exported to ONNX format. So that's the picture: you don't need to install any extra library.
>> Right. And one of the philosophical things in our choice of what to support is that it is based on what is more popular. Getting 100% ONNX support is tough, because there are so many different kinds of layers. But by supporting 80% we pretty much cover all the vision applications, all the vision and vision language applications. The remaining 20% are rarely used, and they are definitely not used by popular models. So taking this approach ensures that we are able to support the popular models: YOLO26 is there, RF-DETR is there, all kinds of VLMs are there. So basically we cover pretty much everything.
For people who do not know what ONNX is: it is Open Neural Network Exchange. This is a format to which most neural networks are exported. You train your neural network in PyTorch and then export it to ONNX, which is a standard format now, so that you can run it on pretty much any device using something called ONNX Runtime. And on many computer vision models we are now beating ONNX Runtime, because that is what our goal was: we wanted to beat ONNX Runtime one way or the other. We don't care about many other things that ONNX Runtime cares about. We care about speed, and about very specific models that we want to make really fast because those are the most popular models. That is why we are beating ONNX Runtime on several important computer vision models.
Great.
So OpenCV 5 provides three engines for running ONNX and other formats like TFLite. The 4.x engine is now called engine classic. The newer engine that OpenCV is introducing is engine new, and there is an optional engine backed by ONNX Runtime. Let's start with engine classic. It only has the 4.x engine, so it has the compatibility limits that I explained earlier.
It is for static shapes only, it doesn't have proper ONNX coverage, and it doesn't support subgraphs. With engine new we support subgraphs, and we have applied fusions that make the performance much faster. Just to give some context about fusion: let's say we are running a block that has convolution, batch normalization, and ReLU. Generally, the engine takes one layer, processes it, writes the output back to DRAM, then picks it up again for batch normalization, and does the same thing again for ReLU. But these three operations can be combined, because they do not change the shape in any way. So we fuse them in one loop: take the input, process it with conv, then batch normalization, then ReLU, and then write it. The read/write time is much lower, which makes the overall engine much faster, because deep neural networks have hundreds of convolution and batch normalization layers.
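The convolution, batch normalization, and ReLU fusion just described can be shown with a small pure-Python sketch. It uses a 1-D, single-channel convolution to stay short, and the function names are hypothetical; a real engine works on tensors and would typically also fold the batch normalization constants into the convolution weights. The unfused path materializes two intermediate arrays, while the fused path computes each output element in one pass and writes it once.

```python
import math

def conv1d(x, w):
    """'Valid' 1-D convolution (cross-correlation, as in DNN frameworks)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    scale = gamma / math.sqrt(var + eps)
    return [scale * (v - mean) + beta for v in x]

def relu(x):
    return [max(0.0, v) for v in x]

def unfused(x, w, gamma, beta, mean, var):
    """Three passes, two intermediate buffers written and read back."""
    return relu(batchnorm(conv1d(x, w), gamma, beta, mean, var))

def fused(x, w, gamma, beta, mean, var, eps=1e-5):
    """One pass: each output is computed fully before it is stored."""
    k = len(w)
    scale = gamma / math.sqrt(var + eps)
    out = []
    for i in range(len(x) - k + 1):
        acc = sum(x[i + j] * w[j] for j in range(k))   # convolution
        out.append(max(0.0, scale * (acc - mean) + beta))  # BN, then ReLU
    return out

if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 3.0, -0.5, 1.5]
    w = [0.2, -0.4, 0.1]
    args = (1.5, 0.1, 0.3, 0.8)  # gamma, beta, mean, var
    print(unfused(x, w, *args))
    print(fused(x, w, *args))
```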
And the big thing that is supported is dynamic shapes and attention. Attention layers, as I said before, are for LLM and VLM models. We have good fusion for that, which improves the performance dramatically. It even outperforms ONNX Runtime in some cases. I'll show you in the next slides.
For the less than 20% of operations that we are not supporting in ONNX, we also provide an optional engine backed by ONNX Runtime. You need not compile it again; it comes with OpenCV, so you can use it directly by just passing the flag for it instead of the one for engine new. It covers all the operations that are not supported in engine new, and it brings the other execution providers like CUDA, OpenVINO, and so on. For now, OpenCV's engine new is focused on CPU.
The GPU functionality will be added later, maybe in 5.1, but until then you can use ONNX Runtime GPU to run OpenCV with CUDA and with OpenVINO. So that engine already gives you the runtime choice of whether to use CPU or GPU.
So basically the new engine that we have built from the ground up is what we are releasing as the default. As we discussed, it will support most of the models, and in many cases it is much faster than ONNX Runtime. But additionally, as an option, we have refactored things so that you can use ONNX Runtime as a back end while using OpenCV. At the code level you are still using OpenCV syntax; you just need to add a flag for using ONNX Runtime in the back end. What happens is that we take the model and internally use the ONNX Runtime codebase to do the inference. Why are we providing that option? Because there are some cases where ONNX Runtime is still faster, so you have that option. And we don't have GPU support coming in 5.0 at this moment; it comes in, let's say, 5.1 or 5.2. Until then you can leverage ONNX Runtime's GPU back end, so you can still run the same model on CUDA using OpenCV with the same syntax. The ONNX Runtime option will not be enabled by default as yet, but if you are compiling OpenCV yourself, there will be instructions on how you can use it as a back end. It is similar to how ONNX Runtime itself does it. ONNX Runtime has what they call execution providers, where they pass the inference on to a third-party library. They have execution providers for TensorRT, for Core ML, DirectML, and OpenVINO, which is used on Intel; TensorRT, as we all know, is the most used on NVIDIA Jetsons. So it is a similar approach: OpenCV comes on top of that as a wrapper, as an option.
So, actually, I have a meeting coming up in a couple of minutes. You guys continue. I'll drop off after a couple of minutes.
>> Sure.
>> Okay.
Yeah. So, talking about what engine new brings with respect to engine classic in technical terms. The major issue we faced with engine classic was that it held the data in the layer itself. So it was very difficult to fuse operations, and unfolding subgraphs was almost impossible. We had to redesign it. Now it keeps all the layer data in an array of tensors, so it is much easier to constant-fold, unfold subgraphs, and fuse whichever operations we want to fuse. Another thing is that engine classic required specific memory for each specific layer. So in a 150-layer graph, even if I call a layer only at the first node and the last node, that memory will be occupied the whole time. That is just occupying memory that should not be held.
To mitigate that issue, we have introduced the unified buffer. Instead of assigning memory to each layer, there is a shared memory pool from which you can use whatever memory is needed, and if you are not using it for a certain time, that memory is assigned to some other layer.
It allows us to reduce the memory footprint and optimize memory usage.
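The shared-pool idea behind the unified buffer can be illustrated with a small liveness-based planner in Python. This is a sketch under assumptions, not OpenCV's memory planner: `plan_buffers` is a hypothetical name, each tensor is described only by its size and by the first and last execution step at which it is needed, and the policy (reuse the smallest free buffer that fits, otherwise grow a free one, otherwise allocate) is one simple choice among many.

```python
def plan_buffers(tensors):
    """Assign tensors to reusable buffers.

    tensors: list of (name, size, first_step, last_step).
    Returns (assignment, capacities): tensor name -> buffer index, and the
    final capacity of every buffer in the pool.
    """
    capacities = []   # capacity of each buffer in the pool
    busy_until = {}   # buffer index -> last step of its current tenant
    assignment = {}
    for name, size, first, last in sorted(tensors, key=lambda t: t[2]):
        # A buffer is free once its tenant's last use is strictly in the past.
        free = [b for b in range(len(capacities))
                if busy_until.get(b, -1) < first]
        fitting = [b for b in free if capacities[b] >= size]
        if fitting:
            b = min(fitting, key=lambda i: capacities[i])
        elif free:
            b = max(free, key=lambda i: capacities[i])
            capacities[b] = size  # grow an idle buffer instead of adding one
        else:
            capacities.append(size)
            b = len(capacities) - 1
        busy_until[b] = last
        assignment[name] = b
    return assignment, capacities

if __name__ == "__main__":
    # A simple chain: each tensor is produced at one step and read at the next.
    chain = [("t0", 100, 0, 1), ("t1", 100, 1, 2),
             ("t2", 50, 2, 3), ("t3", 100, 3, 4)]
    assignment, capacities = plan_buffers(chain)
    print(assignment, capacities)
    print("pooled:", sum(capacities), "per-layer:", sum(t[1] for t in chain))
```

For a straight chain of layers, two buffers that alternate are enough, which is why the pooled total stays flat while a per-layer allocation grows with the depth of the network.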
Another thing: instead of connecting graphs manually, you can just define a dispatch for which layer you want to run right now. So that is a free graph, and you can connect it later instead of connecting it up front. That again gives us freedom for fusion and for skipping whatever operations can be skipped. The other thing is the introduction of dynamic shape support instead of static-only models. Then subgraphs, like If and Loop. And the major thing is QDQ. FP32 computation is expensive; it takes much more time compared to INT8 models. So what we do is quantize the model and then dequantize again. It converts those FP32 operations into INT8, so it is much faster, and it has certain fusions as well, just like FP32. So eventually it is converting slow FP32 models into fast INT8 models.
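The quantize and dequantize steps mentioned above follow a simple linear mapping, sketched here in Python for symmetric INT8. The helper names and the choice of a symmetric scale with zero point 0 are assumptions for the example; the formulas themselves are the standard linear ones, q = clamp(round(x / scale) + zero_point) and x' = (q - zero_point) * scale.

```python
def choose_scale(values, qmax=127):
    """Symmetric scale so that the largest magnitude maps to qmax."""
    peak = max(abs(v) for v in values)
    return peak / qmax if peak else 1.0

def quantize(values, scale, zero_point=0, qmin=-128, qmax=127):
    """FP32 -> INT8: scale, round to nearest, shift, then clamp to the range."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point))
            for v in values]

def dequantize(q, scale, zero_point=0):
    """INT8 -> FP32 approximation of the original values."""
    return [(v - zero_point) * scale for v in q]

if __name__ == "__main__":
    x = [-1.0, -0.37, 0.0, 0.25, 0.9]
    scale = choose_scale(x)
    q = quantize(x, scale)
    print(q)
    print(dequantize(q, scale))
```

For values inside the representable range, the round trip changes each value by at most half a quantization step (scale / 2), which is the small accuracy cost that buys the cheaper integer arithmetic.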
Then the fusions like attention, MatMul, and softmax fusions are already there, and there is a clean memory plan now. There is no memory leaking; if memory is not being used, it gets freed immediately instead of waiting for some other layer.
So on the quantization part, just adding on to it. What it basically means is this: we talked about how you can now run your ONNX model with OpenCV itself, in the same syntax. Let's say it runs faster, but you say, I need to make it even faster, and you have a quantized model, which is still ONNX. If it doesn't run in OpenCV, then we are back at the same place. That is why we worked on adding quantized model support as well, so that you can run your quantized models and get much faster inference speeds. So you are not blocked at some point. You all know quantization sometimes brings small accuracy regressions, but that is part of the process. You may want quantized inference due to memory constraints or latency constraints, and if OpenCV didn't support those quantizations, you would have to go back and use another way to do the inference. So adding quantization support keeps everything in the same ecosystem.
So, listing out some models that OpenCV 5 supports which OpenCV 4 did not. First, detection models that are really fast, real time or almost real time. These are RT-DETR, RF-DETR, YOLO26n, which is a newly launched YOLO model, YOLO segmentation, and SSD MobileNet, which was not supported because of missing subgraph support. That is now added, so this model is also supported.
Then newer face detector models like BlazeFace and RetinaFace. On the segmentation part, we support SAM 3, SegFormer, and Depth Anything V2, which is a depth estimation model that gives pretty good depth results, plus the RAFT optical flow model and multi-camera models. On the VLM and backbone part, we support Vision Transformer models, mobile Vision Transformer models, MobileNet, CLIP, and Grounding DINO. On the generative side, that is LLMs and VLMs, we support Qwen 2.5, Gemma 3, PaliGemma, GPT-2, ZP4, and other models as well.
So these are the models that we have tested and targeted for further optimization, comparing their performance with ONNX Runtime. For XFeat, one of the most popular feature detector models, we are faster than ONNX Runtime, and for YOLOv8 and YOLOv10 as well, the classical YOLO models, and for YOLO26n and DINO small. For RF-DETR we are slightly faster as well, and for trareia, which is a text recognition model, we are faster than ONNX Runtime. On the VLM side, that is OWLv2, Grounding DINO, and direct net, which are prompt-to-detection models, we are also considerably faster than ONNX Runtime. Here is the sheet for that. I tested it on an Intel i9 system. For XFeat it is almost 30%, for YOLO26 it is 40%, for DINO it is nearly 25%, and for [unclear] it is nearly 36%.
>> Yeah, >> I'll show some results on that.
>> Yeah. So, just one more point.
Basically, this changes the overall outlook for OpenCV that is there currently. AI has progressed so fast within just a few years, and people have mostly been using OpenCV to build conventional pipelines: pre- and post-processing operations, with some DNN models in between running on some other library, to create their AI pipelines. With OpenCV 5 and the new DNN engine, OpenCV is again at the DNN forefront. You can do the deep learning inference with OpenCV 5, and it is much faster now. We are supporting almost every latest model. As you saw in the list, we have Qwen 2.5, which is a VLM. The perception today, I think, is that OpenCV can only be used with images and videos. Most people don't know that it can now also be used with image and text, which is what VLMs are. So this puts OpenCV again at the forefront of DNN inference, and faster than ONNX Runtime on some of the more popular models that are out there. I'll show some examples for that as well.
This is inference for the Grounding DINO model. The prompt is "person", so it's detecting very well, and the speed is almost 100 milliseconds faster than ONNX Runtime. For OWLv2 as well, it's almost 400 milliseconds faster than ONNX Runtime. So it's beating the state of the art.
Coming to the inpainting and diffusion model part, that is also one of the hot topics right now. OpenCV provides a unified sample that is very easy and simple to use. The sample can be found in samples/dnn/inpainting.py.
What you need to do is just take your image, pass it in, draw your mask, and it will give you the inpainted output. I'll just show you a result.
So here I just wanted to inpaint this tree. What I have done is draw a mask on it, and we are getting a perfect output. The tree is removed.
So the end-to-end functionality and ease of use are there in the sample.
There is also one other sample for inpainting, which is latent diffusion based, and it's giving pretty good results as well.
On the LLM and VLM part, we have tested a few models. This is the list of models that we have tested: Qwen 2.5, Gemma 3, PaliGemma, GPT-2, and gpd4.
OpenCV now ships its own tokenizer instead of relying on a third-party module, and it has its own KV cache, which makes it much faster. You can understand the KV cache as caching: instead of recalculating everything for each new token, it stores the keys and values it has already computed and reuses them.
This is the output for the Qwen 2.5 model. The prompt is "What is OpenCV?", and it's giving a detailed description of OpenCV, that it's a set of computer vision libraries designed in C++. The accuracy is on par with ONNX Runtime; it matches perfectly.
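The KV cache idea mentioned above can be illustrated with a toy single-head attention loop. This is a sketch of the general technique, not OpenCV's implementation: the class `KVCache`, the elementwise `project` stand-in for a learned projection, and all the numbers are assumptions for illustration.

```python
import math


def project(vec, weight):
    """Tiny stand-in for a learned linear projection (elementwise scale)."""
    return [v * w for v, w in zip(vec, weight)]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all positions."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Keeps keys and values of earlier tokens so each step projects only the new token."""

    def __init__(self):
        self.keys, self.values = [], []
        self.projections = 0  # number of key/value projections computed so far

    def step(self, token, wq, wk, wv):
        self.keys.append(project(token, wk))
        self.values.append(project(token, wv))
        self.projections += 2
        return attend(project(token, wq), self.keys, self.values)


def step_without_cache(tokens, wq, wk, wv):
    """Recompute keys and values for every token seen so far."""
    keys = [project(t, wk) for t in tokens]
    values = [project(t, wv) for t in tokens]
    return attend(project(tokens[-1], wq), keys, values), 2 * len(tokens)


WQ, WK, WV = [1.0, 0.5], [0.5, 1.0], [2.0, -1.0]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

cache = KVCache()
uncached_projections = 0
outputs_match = True
for i in range(len(tokens)):
    cached_out = cache.step(tokens[i], WQ, WK, WV)
    plain_out, cost = step_without_cache(tokens[: i + 1], WQ, WK, WV)
    uncached_projections += cost
    outputs_match = outputs_match and all(
        abs(a - b) < 1e-12 for a, b in zip(cached_out, plain_out)
    )

print("projections with cache:", cache.projections)
print("projections without cache:", uncached_projections)
print("same outputs:", outputs_match)
```

The attention output is the same either way. The cached version does a constant amount of key/value work per new token, while the uncached version redoes that work for the whole sequence at every step.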
Regarding the VLM models, we tested the PaliGemma VLM model. The text prompt was "describe the image", together with an input image. For this image, it returns "the best football players in the world". For this other image, it returns "kitten on the grass".
The first one could be >> to some people >> that can be controversial.
>> We don't want to >> controversial opinions.
No, tell us in the chat who is the best football player in the world.
>> Coming to 2D features and 3D vision.
OpenCV is integrating DNN-based features, that is ALIKE and DISK, which are among the most popular feature extractor models available. They are pretty robust. Why are we moving to DNN-based feature modules? The classical approaches like SIFT and ORB are good for textured indoor scenes, but whenever you have repetitive features or big lighting changes, they fail. The DNN-based models learn distinctive, matchable features and don't rely only on gradient-based mathematical calculations. So we are integrating that part, and it will be part of OpenCV 5, along with the LightGlue matcher, which is much faster and much more robust than the original brute-force matcher that OpenCV had. The brute-force matcher is still there as well, but on top of that we are introducing the DNN-based LightGlue matcher.
So this is an example of it. It is a DISK plus LightGlue model that we have used. It's matching those features perfectly, and it's fully supported in OpenCV 5.
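For contrast with the learned matcher, the brute-force baseline mentioned above can be sketched in a few lines. This is a pure-Python illustration of nearest-neighbour matching with a mutual (cross-check) test, not the OpenCV implementation. OpenCV's own `cv2.BFMatcher` offers a comparable `crossCheck` option. The two-dimensional descriptors here are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(desc, candidates):
    """Index of the candidate descriptor closest to `desc`."""
    return min(range(len(candidates)), key=lambda j: squared_distance(desc, candidates[j]))


def brute_force_match(desc_a, desc_b):
    """Return (i, j) pairs that are each other's nearest neighbour."""
    matches = []
    for i, d in enumerate(desc_a):
        j = nearest(d, desc_b)
        if nearest(desc_b[j], desc_a) == i:  # cross-check in the other direction
            matches.append((i, j))
    return matches


# Hypothetical descriptors from two images.
image_a = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
image_b = [[0.9, 1.1], [0.1, -0.1], [9.0, 0.0]]

matches = brute_force_match(image_a, image_b)
print(matches)
```

Every descriptor is compared against every other one, and each decision looks at a single pair in isolation. With repetitive texture, many candidates sit at nearly the same distance, which is the failure case the speakers describe.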
Coming to the panorama stitching part: as Gersamer explained, it can be done with traditional algorithms, but it can also be done with DNN-based algorithms: get the features, match them, and stitch them. That's what is being done here, using ALIKE plus LightGlue. So this is the panorama stitching image that we got.
On the 3D side, the calib3d module that OpenCV had in 4.x has now been divided into three separate modules, to keep a cleaner structure and make it more organized and easier to use. The three modules are the 3D module, the calib module, and the stereo module.
So instead of keeping everything in one, we decided to move it into three separate modules so it's easier to use.
What the calib module additionally brings is multi-camera calibration. Instead of calibrating each camera individually, you have multiple cameras in the same room with one checkerboard. Rather than calibrating those cameras one by one and getting some marginal error in each case, we calibrate them in a single pipeline so that they can learn from each other, and the error can be minimized because they are calibrated together.
So that support has been added now, and stereo depth has been moved from calib3d to a cleaner module.
The 3D module has all the things required to wire up what is there in the calib and stereo modules. It has RANSAC fitting, some building blocks for SLAM, some 3D reconstruction, point cloud and mesh loading and saving, and a part of visual odometry.
The next part is DNN based, and it's being added right now. So it would be part of 5.1, but not part of 5.0.
The big thing that is improved is the documentation. Let's not talk about it. Let's just show you.
Is the video playing? I can see it.
Yeah. So this is the old documentation that we had. It doesn't look very good.
The newer documentation has a more modern design, and it's more user friendly and visually appealing.
>> Wow.
>> Big difference.
>> Abishek, we can't hear you if you're saying something.
>> Yeah.
No, I was just waiting for this video to finish. Okay, I'll move on to summarize what OpenCV 5, like 5.0, will have. The ONNX coverage has been increased by a lot. Most ONNX models, whether LLM, VLM, or diffusion based, are supported now in OpenCV 5.
The performance is very much optimized.
The support for 0D and 1D tensors, and some parts of ND tensors that were not there before, is now complete. OpenCV 5 completely supports them. The newer data types like FP16 and BF16 are now supported, whereas OpenCV 4.x only had limited data types. With these new data types, the memory footprint will be much smaller and model inference will be much faster. Dynamic shape support has been improved. And that would be it; the new engine has already been discussed.
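The memory claim for FP16 and BF16 follows from simple arithmetic: both use two bytes per element where FP32 uses four. The sketch below works that out for a hypothetical 25-million-parameter model (the size is an assumption, not a figure from the talk) and shows the precision each format keeps. It uses Python's `struct` module, which supports IEEE half precision through the `e` format. BF16 is emulated here by truncating a float32 to its top 16 bits, without rounding.

```python
import struct

BYTES_PER_ELEMENT = {"FP32": 4, "FP16": 2, "BF16": 2}


def footprint_mib(num_params, dtype):
    """Weight storage in MiB for a given element type (activations not counted)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / (1024 ** 2)


def to_bf16_bits(x):
    """Truncate a float32 to bfloat16 by keeping its top 16 bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def from_bf16_bits(b):
    """Expand 16 bfloat16 bits back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x


params = 25_000_000  # hypothetical model size
for dtype in ("FP32", "FP16", "BF16"):
    print(dtype, round(footprint_mib(params, dtype), 1), "MiB")

value = 3.14159
half = struct.unpack("<e", struct.pack("<e", value))[0]
bf16 = from_bf16_bits(to_bf16_bits(value))
print("FP16 round trip:", half)
print("BF16 round trip:", bf16)
```

Halving the bytes per weight halves the storage. The cost is fewer mantissa bits: FP16 keeps 10 and BF16 keeps 7, while BF16 retains the float32 exponent range.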
>> Yeah and the license update as well. So >> yeah license update as well.
>> Yeah.
Yeah.
So on the resources part, you have opencv.org and OpenCV on GitHub; the new docs will be hosted soon; and there are contributions, OpenCV University, and the OpenCV membership that we discussed earlier at the start of the session.
>> Well played.
So >> that's it. Thank you guys.
>> All right. Thank you. Thanks so much.
This was fantastic. Um, I'll bring everybody back on screen here for a little bit and we can >> I'll stop sharing.
>> Yeah.
Thanks so much for that presentation.
That was a lot, man. OpenCV 5 is, I think it's hard to overstate just how much it is. It's a lot, but it's coming soon, and it's a huge release, and we're all super excited about it. It's been, what, eight years since the major 4.x release, and it's coming soon. If you're at CVPR in Denver next week, we will be there, and we're happy to answer any questions or talk turkey about OpenCV 5 or anything else. We're especially happy to talk to you if you want to give us money. That's a thing you can do, and we encourage you to do so. Speaking of giving us money, it is now time for our trivia giveaway. Before we do the giveaway, however, I'm going to remind everybody that today's episode is brought to you by Spikerbot: Build a Brain, Watch It Come to Life.
We had them as a guest last week. You can back them on Kickstarter. It was a really awesome demo and set a new record for the longest episode of OpenCV Live ever. This one is a little bit longer than usual, but not nearly as long as last week. I encourage you to go check it out. The demo was fantastic, and the educational potential of this thing is just off the charts. We were so happy to have them join us last week, and we hope that you'll back them. If you back Spikerbot on Kickstarter, OpenCV gets a little kickback in exchange for driving some traffic their way. It's an awesome project. It uses OpenCV. So go ahead and check that out over on Kickstarter. There's about 16 days left in the campaign, and you can be one of the first people on your block to get the first robot that runs on neurons. So check that out.
Now to the important part, our trivia giveaway.
The way this works is I am going to ask a trivia question based on today's episode and the first person to answer that question correctly in the chat wherever you're watching will win the OpenCV University course of your choosing. You can go to opencv.org/university to see what courses are on offer and pick one up for yourself even if you don't win today's trivia. But uh I hope that you win. Good luck out there everybody. So today we talked a little bit about the history of OpenCV and the various releases.
Answer in chat: what version of OpenCV was the first official PyPI release? What version of OpenCV was the first official PyPI release?
Oh, we got some close ones in there. You got to be specific.
And it looks like we have a winner here.
Our winner is Christian FOB.
Christian answered correctly that the first version of OpenCV that was released for PyPI was 3.1.0.0.
Congratulations, Christian. Please send an email to me. That's phil@opencv.org, and I will make sure that you get the course that you want. So check out that course.
Um, we've got a bunch of questions here from the chat. I think now's the the time for it. We'll take a few minutes since we went a little bit long.
Um, here we go.
Uh, let's see. I've got uh Imperium asks, "How can we benchmark OpenCV 5 performance in an AI production pipeline? You got any suggestions there fellas?"
>> Yeah.
>> Gersamer, Abishek.
>> Yeah.
>> So, Abishek, do you want to share an example of how we do that?
>> Yeah. So how we do it on our end: for most of the cases there are samples from OpenCV, so we just run the forward pass and time it. For the models that don't have a sample, we write the sample using OpenCV, put a timer around the forward call, and calculate the average over, let's say, 10 runs. Then we can mark that as the performance for this particular model on this particular device for OpenCV 5.
Yeah.
>> And we also have an OpenCV benchmark where we benchmark hardware on OpenCV, to see which hardware performs the best when we run a pipeline built using OpenCV on it.
That will also be upgraded soon with OpenCV 5, and it will have examples on how to do so.
>> Right. And one more thing is the OpenCV DNN profiling tests. They are already there in the OpenCV repo, so you just need to run them. They will benchmark all the listed models in the OpenCV profiling tests and give you the numbers for your particular platform.
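The timing approach described above (wrap the forward call in a timer and average over about 10 runs) can be sketched as a small harness. This is an illustration, not the team's actual script: `benchmark` and `fake_forward` are hypothetical names, and the warm-up runs are an addition that is common practice but was not mentioned in the talk. With OpenCV, `run_once` would typically wrap a call such as `net.forward()` after `net.setInput(blob)`.

```python
import statistics
import time


def benchmark(run_once, warmup=3, runs=10):
    """Return (mean, median) wall-clock latency of `run_once` in milliseconds."""
    for _ in range(warmup):  # discard first calls, which may include one-time setup
        run_once()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.median(samples)


def fake_forward():
    """Placeholder workload standing in for a model forward pass."""
    return sum(i * i for i in range(50_000))


mean_ms, median_ms = benchmark(fake_forward)
print(f"mean {mean_ms:.2f} ms, median {median_ms:.2f} ms over 10 runs")
```

Reporting the median next to the mean helps when one run is disturbed by other load on the machine. The same harness can time a second runtime on the same input for a like-for-like comparison.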
>> Excellent. Thank you for that.
Um, we've got a comment, not a question here from Fron. Go Graviton with cool.
Thank you, France. We We appreciate you, buddy, if you're still watching. Um, we've got another question from Zoom.
Any update regarding OpenCV support for the Intel Ultra processor NPU?
Thoughts on this one?
>> So, NPU support is not in the plan, at least for 5.0 and 5.1. But we don't know; if Intel supports us and says they want it, we'll get it in 5.1.
Yeah, if you're out there, Intel, if anybody's watching from Intel, holler at your boy. I'm easy to get at.
phil@opencv.org. Let's make something happen, shall we?
Um, we've got a couple more here.
One of these is: does video processing run in real time for inference? What frame rates and resolutions can people expect from OpenCV 5 video processing on commodity hardware?
>> Yeah. So it all depends. The first thing is the hardware you're talking about, then the resolution you're running the algorithm at, and what algorithm you are running. What we benchmark is basically the speed-up: whatever FPS you were getting, let's say, with ONNX Runtime for that particular pipeline, what is the percentage speed-up that we see? So this question is very broad. I can't tell you an FPS without all these details, but I can tell you that it is much, much faster than OpenCV 4, and faster than ONNX Runtime as well.
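A percentage speed-up can be computed in more than one way, and the talk does not say which definition its figures use. The sketch below shows two common ones on made-up latencies: the reduction in per-frame latency, and the gain in frames per second. The function names and numbers are assumptions for illustration.

```python
def latency_reduction_percent(baseline_ms, candidate_ms):
    """How much less time the candidate takes per frame, relative to the baseline."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


def fps(latency_ms):
    """Frames per second for a given per-frame latency."""
    return 1000.0 / latency_ms


def throughput_gain_percent(baseline_ms, candidate_ms):
    """How many more frames per second the candidate delivers, relative to the baseline."""
    return (fps(candidate_ms) - fps(baseline_ms)) / fps(baseline_ms) * 100.0


baseline_ms = 50.0   # hypothetical latency of the reference runtime
candidate_ms = 35.0  # hypothetical latency of the runtime being compared

print("baseline FPS:", fps(baseline_ms))
print("candidate FPS:", round(fps(candidate_ms), 2))
print("latency reduction %:", round(latency_reduction_percent(baseline_ms, candidate_ms), 1))
print("throughput gain %:", round(throughput_gain_percent(baseline_ms, candidate_ms), 1))
```

The same pair of measurements gives two different percentages, so a reported speed-up is only comparable when the definition, hardware, resolution, and model are stated with it.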
>> All right, thank you. We've got another one here about compiling OpenCV for C++. This is a long one: they use pip install when using Python, but when programming in C++, "I need to compile the code from the GitHub repository.
And I see that for certain modules, I need third-party dependencies. In the case of OpenCV 5, what third-party modules are required for C++ compilation?"
>> Yeah. So the third party Yeah. Go ahead.
Question.
>> Yeah. So the third-party dependencies, if you're compiling for C++, depend more on your requirements.
Let's say I'm taking the example of OpenCV DNN: then the third party for now would be MLAS and some other third-party modules that are already supported, so you need not do it by yourself. For other things, let's say you are doing video processing and visualization, the third-party dependencies would be GStreamer and FFmpeg, and you need to compile with those.
Yeah, >> cool. Thank you for that.
>> And adding on to it: how we handle dependencies is largely the same between 4.x and 5.x when you are dealing with C++ compilation. If you don't want that hassle and you are on AWS, there is a tool on AWS that you can use, and you will have OpenCV without compiling it. We also release pre-built binaries for specific OS versions that you can use, unless you have a specific requirement for some modules.
>> Awesome. Thanks for that. I think that answered the question pretty well. I've got one more about CMake here, from Zoom: will there be CMake examples too in the repository?
>> Um I don't think I fully understand that question.
>> Martha Gorum in Zoom, if you've got any follow-up for that, please hit us up and we'll try to answer your question there.
If you mean that a sample is created and you want to register it in CMake, yeah, that's already there.
You can register it directly in the CMakeLists and use it. But if there is some other use case, then please elaborate on your question.
>> Okay. I think what it means is: if they have their own code and they want to link OpenCV with it, do we have CMake samples or templates for that?
>> Yeah. Right.
>> So we do have that in the samples folder. You can find many of them there.
>> Fantastic. Thanks so much.
I think that's all the questions we've got time for today, guys. Abishek, Gersamer, any final thoughts for the audience out there?
>> OpenCV 5 would be there in about a week.
Please try it out.
Fingers crossed.
>> Yeah, fingers crossed. Knock on wood.
Um, yes, W's in the chat for OpenCV 5, everybody. Yeah, thanks so much.
>> Hopefully we maintain what we started with: we don't write bugs.
>> Exactly. Exactly. I mean, the reason OpenCV 5 is taking so long is because we just made the decision to not write any bugs in it. You know, what are you going to do?
Thanks so much for watching out there, folks. We really appreciate you joining us every week here on the wild and woolly world of Al Gore's own internet.
We will be back in two weeks. Next week we're in Denver, Colorado for the CVPR conference. It's going to be a whole lot of fun. Come by our booth. We'll have stickers. We'll have friendship, and we're happy to answer any questions you might have about OpenCV 5 or OpenCV sponsorship at the booth. So come say hey, and take care of yourselves out there. Take care of somebody else if you can, and have a great day wherever you may be. We'll catch you next time.
Adios.