Atomic operations in C++ provide hardware-level locking that is significantly faster (10-15 nanoseconds) than OS-level mutex locks (500-2000 nanoseconds), making them ideal for simple counter increments and shared pointer operations, though they may be less explicit for complex synchronization scenarios.
Why Atomic "Lock-Free" C++ Code Still Locks Your CPU
In multi-threading, one of the most common things that can happen is a data race. Suppose you've got a function called increment, which uses a global variable counter and increments its value. If you run that function in a multi-threaded application, you're going to have a data race. Thread one and thread two are both going to try to increment the value of counter at the same time. And because counter++ takes three CPU operations on the hardware (a load, an add, and a store), the threads can race and you can get a different answer every single time. But what if there was a way to write code like this and have it work completely out of the box? That's where atomics come in.
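The race described above can be traced by hand. The sketch below is single-threaded on purpose: it replays one unlucky interleaving of the load, add, and store steps for two imaginary threads, so the lost update is deterministic. The function name simulate_lost_update and the reg1/reg2 variables are made up for illustration and stand in for each thread's CPU register.

```cpp
// Replays one bad interleaving of two "threads" that each run counter++.
// Everything here runs on a single thread; reg1 and reg2 are hypothetical
// stand-ins for the register each thread would use.
int simulate_lost_update() {
    int counter = 0;

    int reg1 = counter;  // thread 1: load  (sees 0)
    int reg2 = counter;  // thread 2: load  (also sees 0, thread 1 has not stored yet)

    reg1 += 1;           // thread 1: add
    reg2 += 1;           // thread 2: add

    counter = reg1;      // thread 1: store (counter becomes 1)
    counter = reg2;      // thread 2: store (counter is still 1, one increment is lost)

    return counter;
}
```

Two increments ran, but the function returns 1. With real threads the interleaving changes from run to run, which is why the final count differs each time.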
Atomics are a built-in C++ feature that lets the user create a variable whose operations are atomic.
Oftentimes, the variables that you create are going to be of a built-in type like an integer, a long, or a character, and you can perform some atomic operations on them. If you were to, for example, remove the mutex inside the code and do counter++ like this, the code would actually work as expected. That's what I want to talk about in this video: what an atomic is, how it works, and some code examples.
I've commented out the atomic code and I've also removed the mutex lock and unlock. All of these results are incorrect. Of course, the simplest way to prevent a race condition is to add a mutex like so. So, I've saved the file, and you can see here that every single time we're going to get the value of 200,000.
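A minimal sketch of the mutex version described above, assuming two threads that each add 100,000 to the counter. The video uses a global counter; here the state is kept local to a made-up helper, run_mutex_counter, so the snippet is easy to call from your own main.

```cpp
#include <mutex>
#include <thread>

// Hypothetical helper: two threads each add `per_thread` to a shared counter,
// locking and unlocking a std::mutex around every single increment.
long run_mutex_counter(int per_thread) {
    long counter = 0;
    std::mutex m;

    auto increment = [&] {
        for (int i = 0; i < per_thread; ++i) {
            m.lock();    // only one thread at a time gets past this point
            ++counter;   // the load, add, and store can no longer interleave
            m.unlock();
        }
    };

    std::thread t1(increment);
    std::thread t2(increment);
    t1.join();
    t2.join();
    return counter;
}
```

Calling run_mutex_counter(100000) returns 200000 on every run. In production code a std::lock_guard is usually preferred over manual lock and unlock, since it releases the mutex even if the protected code throws.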
But there is one problem with this approach, and that is that the mutex lock and unlock are kind of slow.
Locking and unlocking a mutex is done on the OS side.
The OS abstraction is way slower than doing this on the hardware itself. Maybe you don't just have a counter, you have a couple of different things to protect. In those cases, using a mutex lock is obviously the easier approach, but if you're just incrementing a counter and you want to lock that operation, it's unnecessarily slow. If you use a mutex and a thread has to wait, the OS has to schedule it, and that can take between 500 and 2,000 nanoseconds.
Whereas if you do the same operation with an atomic, that's a hardware lock. It's done at the CPU level, not at the OS level, so it's way faster. It could take between 10 and 15 nanoseconds.
So, right now, I'm just running the code with a mutex lock and unlock inside each iteration of the for loop, and you can see that it takes about 9 to 10 milliseconds. If I remove this and use an atomic int instead, you can see that it's about 4 milliseconds. So, it's about two times faster. Now, of course, this is probably not the perfect example. So, instead of counting to 100,000, let's do 1 million.
And this time, you can see that it took about 100 milliseconds for the program to run with a mutex. If I run with an atomic instead, it's 30 to 40 milliseconds. And just to really showcase how different using a mutex and using atomics can be, I've created five threads, and I'm joining all five of them, so let's check out the difference. The counter is getting incremented to 50 million correctly, and it takes about 800 or 900 milliseconds. Doing the same thing with a mutex lock and unlock can take 4,000 milliseconds. So, it's roughly four and a half to five times slower to use a mutex lock and unlock.
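The comparison above can be reproduced with a small timing harness. This is a sketch, not the code from the video: run_threads_ms, time_mutex, time_atomic, and RunResult are names invented here, and the timing uses std::chrono::steady_clock. The absolute numbers depend heavily on the machine, compiler flags, and thread count, so no expected timings are given.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

struct RunResult {
    long total;     // final counter value
    double millis;  // wall-clock time for all threads to finish
};

// Starts `threads` threads running `body`, joins them, and returns elapsed ms.
template <typename Body>
double run_threads_ms(int threads, Body body) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(body);
    for (auto& th : pool) th.join();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Every increment is wrapped in a mutex lock and unlock.
RunResult time_mutex(int threads, int per_thread) {
    long counter = 0;
    std::mutex m;
    double ms = run_threads_ms(threads, [&] {
        for (int i = 0; i < per_thread; ++i) {
            m.lock();
            ++counter;
            m.unlock();
        }
    });
    return {counter, ms};
}

// Same loop, but the counter is a std::atomic and there is no mutex.
RunResult time_atomic(int threads, int per_thread) {
    std::atomic<long> counter{0};
    double ms = run_threads_ms(threads, [&] {
        for (int i = 0; i < per_thread; ++i) {
            counter++;
        }
    });
    return {counter.load(), ms};
}
```

For example, time_mutex(5, 10000000) and time_atomic(5, 10000000) both count to 50 million; printing the millis field of each result shows the gap on your own hardware.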
I've also opened up the assembly. If the mutex is not available, the OS needs to do a bunch of other work so that the thread that doesn't hold the mutex keeps rechecking whether or not it has been unlocked. In the simplest and fastest case, the thread can acquire the mutex immediately. As I step into it, you're going to see that there are a lot of other instructions that need to run, and these can potentially call other functions like pthread_mutex. If you keep going, you're going to realize that a lot of instructions get executed just for a mutex lock.
Now let's compare this to what an atomic looks like. The first thing you're going to notice is that the compiler was smart enough to unroll the for loop, but the unrolling is unrelated. The important part is right here. When you have an atomic variable, the increment is a special kind of locked operation in the generated assembly, and it's super optimized. If I press SI, there is nothing to step into. It goes straight to the next line. There is no function call, and there is no operating system scheduling anything. It's a very fast operation.
And of course, that's this operation right here. It's the fetch-and-add operation. As I keep pressing SI, the lock and the increment happen together as a single step. Right now we're at plus 23, and if I press SI just once, it should go to plus 30.
So I've pressed SI just once, and yep, there you go, it's gone to plus 30. So this is essentially one atomic instruction. The mutex approach requires a lot of work on the OS side, whereas the atomic approach requires almost no work on the OS side.
First of all, you're going to want to include atomic. You can create an atomic like so. You can remove the mutex because you don't need it anymore. The lock and unlock is going to be done directly at the hardware level. What this means is that we don't need to lock and unlock like so, and we can actually just do counter++ like this. You might be thinking it's a little weird that you can just increment a counter like this, and that's true, but when you have an atomic, there are a couple of operations you can do. If I type counter followed by a dot, you're going to see what those operations are.
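Put together, the steps above look roughly like this. It is a sketch that assumes two threads and a global std::atomic<int>; the helper run_atomic_counter is invented here so the example can be called and checked.

```cpp
#include <atomic>
#include <thread>

// Global counter, as in the video, but declared as an atomic.
std::atomic<int> counter{0};

// No mutex: the increment itself is an atomic read-modify-write.
void increment(int times) {
    for (int i = 0; i < times; ++i) {
        counter++;
    }
}

// Hypothetical helper: resets the counter, runs two threads, returns the total.
int run_atomic_counter(int per_thread) {
    counter = 0;
    std::thread t1(increment, per_thread);
    std::thread t2(increment, per_thread);
    t1.join();
    t2.join();
    return counter.load();
}
```

run_atomic_counter(100000) returns 200000 every time, with no lock or unlock anywhere in the code.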
There you go: exchange, fetch_add, fetch_or, fetch_sub. The thing is, the counter++ operation is actually calling fetch_add internally, and we can see this if we just go to the definition. operator++ is actually doing a fetch_add and passing in the value one. So, this would be the same as writing fetch_add like so. But for the purpose of this example, we can just do counter++. And what you'll notice is that even though we don't have a mutex lock or unlock, and we've got multiple threads, this works as expected.
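To make the equivalence concrete, here is a small single-threaded sketch. The struct Observed and the function compare_increment_forms are names made up for this example. Both the postfix counter++ and fetch_add(1) return the value the counter held before the increment.

```cpp
#include <atomic>

struct Observed {
    int from_postfix;    // value returned by counter++
    int from_fetch_add;  // value returned by counter.fetch_add(1)
    int final_value;     // counter after both increments
};

Observed compare_increment_forms() {
    std::atomic<int> counter{0};

    int a = counter++;             // returns the old value (0), counter is now 1
    int b = counter.fetch_add(1);  // returns the old value (1), counter is now 2

    return {a, b, counter.load()};
}
```

Writing fetch_add explicitly also lets you add values other than one, and makes it more obvious to a reader that the variable is an atomic.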
And you can see that every single time, the result of counter is 200,000. The counter++ operation on the CPU still conceptually takes three operations. I think it's the load, add, and store. They were able to figure out a way to lock it, so even though the CPU is doing three separate operations, it can essentially lock and unlock itself, and it's just much faster.
There is one more thing to note. Atomics can also work for complex data types. Say you've got a struct Point like so. You can always check whether or not the structs that you've created can be atomic without a lock by doing a static_assert and checking is_always_lock_free. And, yeah, that's kind of a special case. I would assume that in many cases, you probably don't really want to use atomics on complex data types. You would probably just want them on the built-in data types, but this option exists for those that need it.
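A sketch of that check, assuming a Point made of two ints. The field layout and the helper store_and_load are assumptions for illustration, since the struct from the video is not shown in the transcript. is_always_lock_free requires C++17, and whether the static_assert passes depends on the platform: an 8-byte struct is lock-free on common 64-bit targets, but larger structs often are not.

```cpp
#include <atomic>
#include <type_traits>

// Assumed layout: two ints, 8 bytes on typical platforms.
struct Point {
    int x;
    int y;
};

// std::atomic<T> requires T to be trivially copyable.
static_assert(std::is_trivially_copyable<Point>::value,
              "Point must be trivially copyable to be used in std::atomic");

// Compile-time check that no hidden lock is needed on this target (C++17).
static_assert(std::atomic<Point>::is_always_lock_free,
              "std::atomic<Point> is not lock-free on this platform");

// Hypothetical helper: stores a whole Point atomically and reads it back.
Point store_and_load(Point p) {
    std::atomic<Point> shared{Point{0, 0}};
    shared.store(p);       // both fields are written as one atomic unit
    return shared.load();  // both fields are read as one atomic unit
}
```

If the second static_assert fails, the type can still be used with std::atomic, but the library may fall back to an internal lock, which gives up the speed advantage discussed earlier.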
Now, personally, the mutex lock and unlock seem a lot more explicit to me. When you write counter++ like this, it's not obvious whether or not it's thread safe. Of course, if I wrote the code as counter.fetch_add, it would be a little clearer that this is an atomic, and that it's more likely to be thread safe. But at the same time, you still have to be careful with your logic when using atomics, because you can have two variables that depend on one another and still end up with thread-unsafe code. Anyway, I'd probably just use a mutex lock and unlock. I just wanted to talk about atomics because, for the specific scenarios where they are useful, they can be much faster. And I do believe that if you've ever used a shared pointer, you've essentially used an atomic indirectly. And yeah, that's all I want to talk about in this video. If you've made it this far, hit the like button and consider subscribing for more random C++ deep dives just like this.
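The caveat about two variables that depend on one another can be sketched as follows. The inventory scenario, the Totals struct, and sell_with_mutex are all invented for illustration. The comment shows a tempting atomic-only version that is still unsafe, and the function shows one way to fix it with a mutex.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Unsafe with two separate atomics, even though each operation is atomic:
//
//     if (stock.load() > 0) {   // two threads can both pass this check
//         stock.fetch_sub(1);   // when stock == 1, driving stock to -1
//         sold.fetch_add(1);
//     }
//
// The check and the two updates need to happen as one unit.

struct Totals {
    int stock;
    int sold;
};

// Hypothetical helper: `threads` workers sell items until stock runs out.
Totals sell_with_mutex(int threads, int initial_stock) {
    int stock = initial_stock;
    int sold = 0;
    std::mutex m;

    auto worker = [&] {
        for (;;) {
            std::lock_guard<std::mutex> guard(m);  // released at end of each iteration
            if (stock == 0) return;  // the check ...
            --stock;                 // ... and both updates share one lock
            ++sold;
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return {stock, sold};
}
```

With the mutex, stock ends at exactly zero and sold equals the initial stock, no matter how many threads run.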
And if you want to sponsor the channel, just send me an email. Thanks for watching and I'll catch you guys next time.