The DeepX M1 is an M.2 accelerator board offering approximately 25 TOPS, with plug-and-play compatibility for Raspberry Pi and other edge boards. Its multi-NPU architecture makes streaming inference roughly 3x faster than single-image inference, but the board has limitations: int8 quantization challenges, no transformer support, and complex inference scenarios that require custom wrappers. It supports classical convolutional models such as the YOLO family well, but it is not suitable for transformer-based or stereo depth estimation models.
Deep Dive
DeepX - my thoughts and overview
Hello everyone. My name is Anton.
On my channel I have a lot of videos about edge boards. This time we have the DeepX M1 board. Many thanks to Andy from Android group, who provided access to this board and gave valuable feedback from his previous experiments with it. I experimented with it and compared it with existing boards, so let's go. I think it's another pretty interesting board.
DeepX is an M.2 accelerator. It comes in different form factors from different vendors; Radxa, for example, provides it, and so do many others. It's a pretty popular board today, and in my opinion its direct competitors are obviously Hailo, Axelera, MemryX, and a lot of other boards in this form factor. The general use case for such chips is that you plug one into your Raspberry Pi or another edge board that has no acceleration of its own and get pretty good throughput, around 25 TOPS.
I tried to go the whole way with this board, from installing it by following the official documentation on GitHub to checking how well it works. This video is not a technical guide on how to do that, because you can find that in the official documentation. It's mostly my feedback on what works, where I think this board is good to use, and where it's not so great. For basic models, the board performs very well.
That covers detection, classification, and regression. I will talk about numbers a little later. When you have a 25 TOPS board, you have certain expectations, because a lot of competitors offer about the same performance, and here the results are consistent with those 25 TOPS. Of course, it's slower than Axelera, for example.
And of course, it's much faster than chips from Rockchip or NXP. It's pretty classical performance. The main limitation, in my opinion, is int8 quantization.
I didn't find as good an approach to quantization as Hailo or Qualcomm have.
Maybe I didn't find all the documentation on this topic, but with Hailo and Qualcomm, for example, you can find maybe a dozen pages of documentation on how to prepare your model for int8 quantization and how tricky some situations can be.
And of course, as you know, it's a pretty complicated process.
Not every model can be exported to int8, of course.
So, in general, if your model was trained with quantization awareness and the training succeeded, you can quantize it easily. A lot of classical models also go through quantization easily. I want to mention that I'm talking about computer vision models. For language models, it's a little easier.
But, for example, one of the previous boards on my channel was MemryX, and they use FP16 operations: they store int8 weights, but they use FP16 for activations to overcome this quantization issue. Here, there is nothing special. The models I tested worked, but they came from the list of models that usually work with int8 quantization. So I expect that if you have some arbitrary model and you're not sure it is stable under quantization, or you know that it isn't, it's better to look at different chips. But if you're using one of the hundred or so models that are suited to quantization, you will not have any problems. Today, if you're using YOLO family models or classical classification models, they will work, and they usually behave fine under quantization.
The next important part is inference behavior.
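As a rough illustration of the quantization point above, here is a minimal NumPy sketch of why some tensors survive int8 and others do not. It assumes plain symmetric per-tensor quantization, which is a textbook scheme and not necessarily what the DeepX toolchain does; the data is synthetic and the function names are made up for this example. A single large outlier stretches the int8 scale so far that the ordinary values collapse to zero, while FP16 storage of the same activations keeps them almost intact.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: one scale maps floats to int8."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to float32 using the stored scale."""
    return q.astype(np.float32) * scale

def relative_error(x, x_hat):
    """L2 norm of the error divided by the L2 norm of the reference."""
    return float(np.linalg.norm(x - x_hat) / np.linalg.norm(x))

rng = np.random.default_rng(0)

# Synthetic "well-behaved" activations: values in a narrow range.
smooth = rng.normal(0.0, 1.0, size=10_000).astype(np.float32)

# The same activations with one large outlier that stretches the scale.
outlier = smooth.copy()
outlier[0] = 1000.0

# int8 round trip on the well-behaved tensor.
err_smooth = relative_error(smooth, dequantize(*quantize_int8(smooth)))

# int8 round trip on the tensor with the outlier; the error is measured
# only on the ordinary values (index 1 onward).
int8_roundtrip = dequantize(*quantize_int8(outlier))
err_outlier = relative_error(outlier[1:], int8_roundtrip[1:])

# FP16 round trip on the same tensor, for comparison.
fp16_roundtrip = outlier.astype(np.float16).astype(np.float32)
err_fp16 = relative_error(outlier[1:], fp16_roundtrip[1:])

print(f"int8, no outlier:   {err_smooth:.4f}")
print(f"int8, with outlier: {err_outlier:.4f}")
print(f"fp16, with outlier: {err_fp16:.6f}")
```

Real toolchains use per-channel scales, calibration data, and clipping to soften this effect, so treat the numbers only as an intuition for why a model can be "not stable for quantization".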
When you run inference separately, just one image at a time, your inference speed will be pretty slow. If you run it in streaming mode, with a constant stream of data, your inference speed will be three times higher, or maybe more. At least, that is what I benchmarked. This reminds me of a lot of other boards, and there are two reasons for it. One, which was the reason for Hailo and some other boards, is the limitation of the M.2 bus: you need additional operations for transfers, data copies, and so on.
It depends heavily on your model and on your image size, but it can create additional latency. In my opinion, that was one of the main issues with Axelera, because Axelera has 200 TOPS, and to get all of that performance you need a clear, stable bus. But there is a second issue.
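Before getting to that second issue, the gap between one-image-at-a-time and streaming inference can be sketched with a toy timing model. This is only an idealized pipeline calculation, not a measurement: the three stages and their durations are hypothetical values chosen for illustration, and real behavior depends on the model, the image size, and the bus.

```python
def sequential_time_ms(stage_ms, n_frames):
    """One image at a time: each frame passes through every stage
    before the next frame is submitted."""
    return n_frames * sum(stage_ms)

def pipelined_time_ms(stage_ms, n_frames):
    """Streaming: stages overlap across frames, so once the pipeline is
    full, one frame completes every max(stage_ms) milliseconds."""
    return sum(stage_ms) + (n_frames - 1) * max(stage_ms)

# Hypothetical stage durations in milliseconds (not measured on any board):
# host-to-device copy, NPU compute, device-to-host copy.
stages = [4.0, 5.0, 3.0]
frames = 1000

t_seq = sequential_time_ms(stages, frames)
t_pipe = pipelined_time_ms(stages, frames)

print(f"sequential: {frames / t_seq * 1000:.1f} FPS")
print(f"streaming:  {frames / t_pipe * 1000:.1f} FPS")
print(f"speedup:    {t_seq / t_pipe:.2f}x")
```

In this model the speedup approaches sum(stages) / max(stages), so the more evenly the time is split between transfers and compute, the more streaming helps.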
It's the approach of putting three NPUs inside the one main NPU. We already saw this on Rockchip NPUs, which also have a few separate cores. On the one hand, it's a big plus, because you can run a few models separately and get good performance. On the other hand, it adds overhead and complexity to your inference pipeline. Yes, they have a ready-made inference pipeline, and if you are okay with using this board for stream inference, it will work well for you. But if you need some complex inference that it doesn't cover, for example sending separate images from different devices, you will probably need to write your own wrapper. With Rockchip boards we did this many times, and it's actually not a big problem, especially today when you have ChatGPT and similar tools. But I didn't explore this part much, because it's a complex part and it is sometimes limited by particular protocols, interfaces, and APIs.
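The kind of wrapper mentioned above could look roughly like the following sketch. It is not the DeepX API: `infer_fn` is a stand-in for whatever call actually runs a model on one core, and `MultiCoreDispatcher`, `submit`, and `close` are names invented for this example. The idea is simply a shared request queue with one worker thread per NPU core, so independent images from different sources can be submitted in any order.

```python
import queue
import threading
from concurrent.futures import Future

class MultiCoreDispatcher:
    """Hypothetical wrapper: one worker thread per NPU core, all pulling
    requests from a shared queue."""

    def __init__(self, infer_fn, num_cores=3):
        self._infer_fn = infer_fn          # stand-in for the real NPU call
        self._requests = queue.Queue()
        self._workers = [
            threading.Thread(target=self._worker, args=(core_id,), daemon=True)
            for core_id in range(num_cores)
        ]
        for worker in self._workers:
            worker.start()

    def submit(self, source_id, image):
        """Queue one image from any source; returns a Future for the result."""
        future = Future()
        self._requests.put((source_id, image, future))
        return future

    def _worker(self, core_id):
        while True:
            item = self._requests.get()
            if item is None:               # shutdown signal
                break
            source_id, image, future = item
            try:
                future.set_result((source_id, self._infer_fn(core_id, image)))
            except Exception as exc:       # pass failures on to the caller
                future.set_exception(exc)

    def close(self):
        """Stop all workers after the queued requests are processed."""
        for _ in self._workers:
            self._requests.put(None)
        for worker in self._workers:
            worker.join()

def fake_infer(core_id, image):
    # Placeholder "model": sums the pixel values instead of calling an NPU.
    return sum(image)

dispatcher = MultiCoreDispatcher(fake_infer, num_cores=3)
futures = [dispatcher.submit(f"cam{i % 2}", [i, i + 1]) for i in range(6)]
results = [f.result(timeout=5) for f in futures]
dispatcher.close()
print(results)
```

Results come back through the futures in submission order even though the cores finish in arbitrary order, which is usually what a multi-camera application wants.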
So, it's not a big issue. In my opinion, the limit here comes mostly from the architecture, not from the M.2 bus. You can benefit from this approach, and you can also run into its downsides. I mainly evaluated the Python API during my tests. There is also C++ support, but I didn't test C++ at all, so my main feedback is about the Python API.
It's better than Axelera's. I think it's on par with Hailo's; the inference API follows the same logic. Subjectively, in my opinion, MemryX maybe has the best Python API.
Let's talk about model export limitations. DeepX has a pretty nice model zoo, which you can find on their GitHub. But for me the most interesting part is exporting models that are not in this model zoo and checking how they work. I tested a few models that I have used over the last year and a half or so. I tried to export some transformer-based models, like DINOv2, and none of them worked for me. In my opinion, that's a little unexpected, because I know some competitors can work with them, but it's not totally unexpected, because this board has transformer limitations.
Stereo depth estimation models also failed for me. Again, that's not unexpected; I expected it. I didn't even try LLMs or other language models. Obviously they will not work, and that is stated in the documentation. One interesting result: I successfully exported YOLO-World. In general, when I explored what can and cannot be exported, I found that if you work without transformers, without complex networks that take a few separate images as input, and without super complex 3D convolutions, everything should work out of the box. Most classical convolutional models are supported. In my opinion, the board is pretty strong here.
Also pretty interesting: this board has 4 GB of on-board memory. That's much more than Hailo, better than MemryX, and a bit better than Axelera. But as far as I understand, their new board will have only 1 GB of memory, and in my opinion that's a reasonable decision, because I don't know which model would need 4 GB today. 1 GB is a reasonable amount of memory, and it's still better than Hailo-8, for example. Of course, it's worse than Hailo-10, but Hailo-10 is a different story. I hope I will have an overview of this board on my channel.
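The 4 GB versus 1 GB remark can be checked with back-of-the-envelope arithmetic. The sketch below is an assumption-heavy estimate, not a description of how the board allocates memory: it counts one byte per int8 weight and multiplies by a hypothetical `overhead_factor` to leave room for activations and runtime buffers. The parameter counts are illustrative round numbers, not figures for specific models.

```python
def weight_footprint_mb(num_params, bits_per_weight=8):
    """Size of the weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6

def fits(num_params, memory_gb, bits_per_weight=8, overhead_factor=2.0):
    """Rough check: weights times an assumed overhead factor for
    activations and buffers must fit in the on-board memory."""
    needed_mb = weight_footprint_mb(num_params, bits_per_weight) * overhead_factor
    return needed_mb <= memory_gb * 1000

# Illustrative parameter counts (hypothetical, rounded):
for name, params in [("small detector", 3_000_000),
                     ("large detector", 70_000_000),
                     ("1B-parameter model", 1_000_000_000)]:
    mb = weight_footprint_mb(params)
    print(f"{name}: {mb:.0f} MB of int8 weights, "
          f"fits in 1 GB: {fits(params, 1)}, fits in 4 GB: {fits(params, 4)}")
```

Under these assumptions, convolutional models with tens of millions of parameters sit far below 1 GB, which matches the view that 1 GB is a reasonable amount for this class of workload.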
Inference performance. I tested their examples, and for me they worked exactly as stated in the documentation. I ran a few samples, and everything is in line with those 25 TOPS. I didn't check everything; I just checked a few separate models, mostly from the YOLO family. Here are also some comparison numbers from my experiment, where I just ran direct Python inference.
I think that's all, so here is a short summary. In my opinion, it's a great board, and you can definitely use it, but you need to understand its limitations. The first limitation is int8. The second limitation is no transformer support. The third limitation is that if your inference is complex, something more than streaming, you will probably need to write a custom wrapper to handle it. All three of them are expected, and in my opinion this is a pretty common situation for this type of board. So it's a good board for its class. I already know that a lot of people are using this board, and in my opinion it's a great one. So, thank you for watching.
Bye, and see you in the next video.