White uses constexpr functions and non-type template parameters so that a single definition of an operation can run on both the CPU and the GPU. Because the host or device version of the operation is resolved at compile time, the developer does not have to write the operation twice or put CUDA keywords in CPU code.
Lightning Talk: Crafting CUDA Compatible C++ Code - Jon White - CppCon 2025
attending any conference, [music] it's incredibly important to be there. That's kind of the only way to really dedicate your time to be there and be immersed in the whole thing, and not distracted by other stuff going on. Even if you do get distracted with [music] interesting conversations in the hallway.
>> Hi, my name is Jon and I'm going to be talking about crafting CUDA compatible C++ code.
So basically the problem is I'm trying to write a parallel math library that needs to run on CPU and GPU, but I don't want to write anything twice and I don't want CUDA features in my C++ code or CPU code. Just as a simple example of an operation you want to parallelize: single-precision a x plus y (SAXPY). It's an embarrassingly parallel problem because every index is independent of all the other indices. So if you want to parallelize this on a CPU, you get your vector of threads and then you distribute the work to each of the threads. Or if you want to parallelize it on a GPU, I've implemented a grid-stride loop and a CUDA kernel.
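The slides are not part of this transcript, so the following is only a minimal sketch of the two single-purpose versions described above. The names `saxpy_cpu_threads` and `saxpy_kernel`, the default thread count, and the chunking scheme are assumptions for illustration. The CUDA kernel is guarded so the file still builds with an ordinary C++ compiler.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Single-purpose CPU version (hypothetical name): the operation a*x + y and
// the parallelization (splitting the index range across threads) are fused.
void saxpy_cpu_threads(float a, const std::vector<float>& x,
                       std::vector<float>& y, unsigned num_threads = 4) {
    const std::size_t n = std::min(x.size(), y.size());
    if (num_threads == 0) num_threads = 1;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no work left for the remaining threads
        workers.emplace_back([=, &x, &y] {
            // Every index is independent, so no synchronization is needed.
            for (std::size_t i = begin; i < end; ++i) y[i] = a * x[i] + y[i];
        });
    }
    for (auto& w : workers) w.join();
}

#ifdef __CUDACC__
// Single-purpose GPU version (hypothetical name), compiled only by nvcc.
// Grid-stride loop: each thread starts at its global index and advances by
// the total number of threads in the grid.
__global__ void saxpy_kernel(float a, const float* x, float* y, std::size_t n) {
    const std::size_t start =
        static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    const std::size_t stride = static_cast<std::size_t>(blockDim.x) * gridDim.x;
    for (std::size_t i = start; i < n; i += stride) y[i] = a * x[i] + y[i];
}
#endif
```

Note that the expression `a * x[i] + y[i]` appears once per parallelization method, which is the duplication the talk sets out to remove.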
But the issue is that those were both single-purpose functions: you had to write both the operation and the parallelization method every time. Ideally we want to separate concerns into the operation and the parallelization method.
So step one: obligatory constexpr all the things.
The reason this is important is that with NVCC, the NVIDIA compiler, you can pass the experimental relaxed constexpr flag (--expt-relaxed-constexpr), and that allows all of your constexpr functions to be used on both the host and the device. You write it once, and you don't actually have to use any CUDA keywords.
So, if you can read that: up at the top we have the single-precision a x plus y as a constexpr function, and that is being called from both of the parallelization methods, the one that's doing the CPU threading and the one that's doing the CUDA kernel.
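A rough sketch of step one, assuming C++17 and hypothetical names (`saxpy_op`, `saxpy_cpu`, `saxpy_cuda`); the actual slide code is not in the transcript. The operation is a plain constexpr function, and each parallelization method calls it directly. The kernel is guarded so the file also compiles without nvcc, and it assumes a build with --expt-relaxed-constexpr.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// The operation, written once as a plain constexpr function: no CUDA keywords.
constexpr float saxpy_op(float a, float x, float y) { return a * x + y; }

// Being constexpr, it can also be checked at compile time.
static_assert(saxpy_op(2.0f, 3.0f, 1.0f) == 7.0f);

// CPU parallelization: two threads, each taking half of the index range
// and calling saxpy_op directly.
void saxpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const std::size_t mid = n / 2;
    auto work = [&, a](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) y[i] = saxpy_op(a, x[i], y[i]);
    };
    std::thread first(work, std::size_t{0}, mid);
    std::thread second(work, mid, n);
    first.join();
    second.join();
}

#ifdef __CUDACC__
// GPU parallelization, compiled only by nvcc. Assumes the build passes
// --expt-relaxed-constexpr so that saxpy_op is callable from device code.
__global__ void saxpy_cuda(float a, const float* x, float* y, std::size_t n) {
    const std::size_t start =
        static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    const std::size_t stride = static_cast<std::size_t>(blockDim.x) * gridDim.x;
    for (std::size_t i = start; i < n; i += stride) y[i] = saxpy_op(a, x[i], y[i]);
}
#endif
```

At this stage the operation is written once, but each parallelization method still names `saxpy_op` explicitly, so it cannot yet be reused for a different operation.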
So step two: basically you want to follow the example of the STL and pass in an operation, instead of having to call it from each of your parallelization methods. So we're going to pass the operation as a runtime parameter, and the parallelization method is going to be parameterized on the type of the operation.
So yeah, now we're able to pass in the SAXPY op to both of our parallelization methods.
And so now we have separation of concerns and everything is only written once, except that this doesn't work. So, anyone know what's wrong with this?
So the problem is that the operation is being passed at runtime, and so it doesn't correctly resolve to the host or device version of it. The CUDA version is actually getting the host version of the function, and so the CUDA kernel is going to silently fail when you try to call it.
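To make the step-two design and its failure mode concrete, here is a hedged sketch with hypothetical names (`parallel_cpu`, `parallel_cuda`, `deduces_to_function_pointer`), assuming C++17. On the host this pattern works. The compile-time check shows that passing the function by name deduces the operation's type as an ordinary function pointer, so what reaches the callee is an address chosen by the caller. The guarded kernel mirrors the design that, per the talk, ends up with the host version.

```cpp
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

constexpr float saxpy_op(float a, float x, float y) { return a * x + y; }

// Step two, host side: the operation is a runtime argument and the
// parallelization method is a template over the operation's type.
template <typename Op>
void parallel_cpu(Op op, float a, const std::vector<float>& x,
                  std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const std::size_t mid = n / 2;
    auto work = [&, op, a](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) y[i] = op(a, x[i], y[i]);
    };
    std::thread first(work, std::size_t{0}, mid);
    std::thread second(work, mid, n);
    first.join();
    second.join();
}

// Passing the function by name deduces Op as a plain function pointer.
template <typename Op>
constexpr bool deduces_to_function_pointer(Op) {
    return std::is_same_v<Op, float (*)(float, float, float)>;
}
static_assert(deduces_to_function_pointer(saxpy_op));

#ifdef __CUDACC__
// CUDA counterpart of the same design, compiled only by nvcc. When host code
// launches this with saxpy_op as the argument, op carries the address taken
// in host code, which is the situation the talk describes as the kernel
// getting the host version of the function.
template <typename Op>
__global__ void parallel_cuda(Op op, float a, const float* x, float* y,
                              std::size_t n) {
    const std::size_t start =
        static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    const std::size_t stride = static_cast<std::size_t>(blockDim.x) * gridDim.x;
    for (std::size_t i = start; i < n; i += stride) y[i] = op(a, x[i], y[i]);
}
#endif
```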
So, actual final step: make the operation a non-type template parameter. That's going to force resolution at compile time, and so the host code gets the host version and the CUDA code gets the CUDA version. So this is what that looks like. Still just a constexpr function defining the operation, but now we're passing in the operation as the template parameter, not as one of the runtime parameters. And so this works now.
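One way the final step might look, as a sketch only: the talk's actual code is not in the transcript, and the names here (`parallel_cpu`, `parallel_cuda`) and the use of a C++17 `template <auto Op>` parameter are assumptions. The operation is named in the template argument list, so each instantiation refers to the function itself at compile time instead of receiving an address at runtime. The kernel is guarded and assumes an nvcc build with --expt-relaxed-constexpr.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Still just a constexpr function defining the operation.
constexpr float saxpy_op(float a, float x, float y) { return a * x + y; }

// The operation is now a non-type template parameter, not a runtime argument.
template <auto Op>
void parallel_cpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const std::size_t mid = n / 2;
    auto work = [&, a](std::size_t begin, std::size_t end) {
        // Op is a compile-time constant, so it needs no capture.
        for (std::size_t i = begin; i < end; ++i) y[i] = Op(a, x[i], y[i]);
    };
    std::thread first(work, std::size_t{0}, mid);
    std::thread second(work, mid, n);
    first.join();
    second.join();
}

#ifdef __CUDACC__
// CUDA counterpart, compiled only by nvcc. Op is resolved while compiling
// the kernel, so device code refers to the device version of the function.
template <auto Op>
__global__ void parallel_cuda(float a, const float* x, float* y, std::size_t n) {
    const std::size_t start =
        static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    const std::size_t stride = static_cast<std::size_t>(blockDim.x) * gridDim.x;
    for (std::size_t i = start; i < n; i += stride) y[i] = Op(a, x[i], y[i]);
}
#endif
```

On the host, a call would read `parallel_cpu<saxpy_op>(2.0f, x, y);`. Swapping in another constexpr function with the same signature reuses both parallelization methods unchanged.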
We did it. We now have a way of executing parallel operations that work on either the CPU or the GPU without having to write anything twice and without using CUDA in code that's meant for CPU.
Thank you. [applause]