Step 3.7 Flash is a 198-billion-parameter sparse mixture-of-experts model from StepFun that activates only 11 billion parameters per token, featuring a 256K context window, native image and video processing, and three reasoning levels (low, medium, high). In the tests shown, the model demonstrates strong agentic capabilities by analyzing a Python codebase and generating a structured report, and it translates a sentence into 80+ languages, including regional languages and RTL scripts. It is available under the Apache 2.0 license and is supported by vLLM and SGLang. The model excels in agentic reliability and multimodal tasks but currently trails behind GPT-5 and Claude Opus 4.7 in raw coding and terminal benchmarks.
Deep Dive
Step 3.7 Flash - 198B Open Source Model That Does Everything; Does it Really?
You're about to see some dancing steps from Step 3.7 Flash, a brand new mixture-of-experts model from StepFun. I have given it this prompt to go through my Python project, understand what each file does, and then create a report through Hermes agent. This is Fahad Mirza and I welcome you to the channel.
The model we are discussing is this fresh-out-of-the-oven Step 3.7 Flash from StepFun: a 198-billion-parameter, open-weight, multimodal model, available right now on various platforms. You can also install it locally, but you would need to sell a kidney and get multiple GPUs in order to get it working.
If you still want to know how to do that, you can go to my channel and search for StepFun, as we have installed many of their models locally. For the purpose of this video, though, I'm going to save my kidney and do it through the API because I don't have a multi-GPU cluster. If you're looking for one, you can go to the video's description and get the Massed Compute link and a discount coupon code. Follow me on X for any AI updates.
Okay, so marketing is done. Let me now quickly tell you all about this model.
And by the way, this has already finished, but we will get to it shortly, because I think you now know what we are going to unpack in this video. This is, again, a sparse mixture-of-experts model. As I said, only 11 billion parameters activate per token. The context window is huge, 256K, which is quite good. It is able to handle images and videos natively, and we are going to test that shortly.
Also, the good thing is that it gives you three reasoning levels, and it is also available in quantized format, but even for the quantized version you would need at least two or three GPUs to run it decently.
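To put those numbers in perspective, here is a rough back-of-the-envelope sketch. The 198 billion and 11 billion figures come from the video; the 16-bit and 4-bit weight sizes and the 80 GB GPU are my assumptions for illustration, and the estimate covers weights only, ignoring the KV cache and activations.

```python
import math

TOTAL_PARAMS = 198e9   # total parameters, as stated in the video
ACTIVE_PARAMS = 11e9   # parameters active per token, as stated in the video


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def min_gpus(memory_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Minimum GPU count to hold the weights; 80 GB per GPU is an assumption."""
    return math.ceil(memory_gb / gpu_memory_gb)


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active per token: {share:.1%} of all parameters")
    for label, bits in (("16-bit", 16), ("4-bit (assumed quantization)", 4)):
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{label}: ~{gb:.0f} GB of weights, at least {min_gpus(gb)} GPU(s)")
```

Only a small share of the parameters does the work for each token, but all 198 billion still have to sit in memory, which is why even a quantized copy does not fit on a single card under these assumptions.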
The reasoning levels are low, medium, and high, which is quite common these days, but the best thing is that the model is under the Apache 2.0 license, which is always a good thing. vLLM and SGLang are both supported. We will also talk about the benchmarks without boring you to death with numbers. But for now, let's go back to our Hermes agent and see what exactly the model has done here.
And now, let's review the output. I'm just going to scroll down in Hermes agent, and you can see that the model has scanned five Python files, understood the full architecture of a cert monitoring app, which is my app, and wrote a detailed structured report, which I will show you shortly, all in one shot.
Another cool thing which I can identify right off the bat is that it also used parallel file reads, terminal commands, and a diff review without any hand-holding. This is a diff.
Pretty good.
This is impressive.
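If you are wondering what a parallel file read looks like in plain Python, here is a minimal sketch. This is not how Hermes agent or the model implements it internally; the function names `read_file` and `read_files_parallel` are hypothetical, and it simply uses a thread pool from the standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_file(path: Path) -> str:
    """Read one file as UTF-8 text."""
    return path.read_text(encoding="utf-8")


def read_files_parallel(paths, max_workers: int = 5) -> dict:
    """Read several files concurrently and return {path: contents}."""
    paths = [Path(p) for p in paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input paths
        contents = list(pool.map(read_file, paths))
    return {str(p): text for p, text in zip(paths, contents)}
```

Calling `read_files_parallel` with the five Python files of a project would return all of their contents in one go instead of reading them one after the other.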
But I think there were a couple of hiccups. One is that it was hitting the max output token limit mid-response, which is not good.
You know, I had already given it a lot of tokens, by the way.
Anyway, you see, it has omitted it, but then it has given us the format of the report.
But it recovered, which is good.
Let's see what sort of report it has generated.
And this is a markdown file. Let me open it in VS Code.
So, if you look at this report, this is perfect, I guess. You see?
This is quite good. It has really gone deep and wide in order to deliver what I asked of it.
All the key symbols, all the files are there.
It has given me the synopsis of the code.
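For comparison, the "key symbols" part of such a report can be pulled out without any model at all. Here is a minimal sketch using Python's built-in `ast` module; the function name `key_symbols` is hypothetical, it only looks at top-level definitions, and it is in no way what the model did under the hood.

```python
import ast


def key_symbols(source: str) -> list[str]:
    """List the top-level functions and classes defined in Python source code."""
    tree = ast.parse(source)
    symbols = []
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(f"def {node.name}")
        elif isinstance(node, ast.ClassDef):
            symbols.append(f"class {node.name}")
    return symbols
```

What the model adds on top of a listing like this is the synopsis, meaning an explanation of what each file and symbol is for.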
I'm taking my sweet time just to make sure that we do it properly and be fair with the model. I think it has done well.
It has done pretty well and this is all open source by the way.
Well done. So, Hermes agent plus this, I think, is a good coding companion.
And now let's do a multilinguality test. For that, I'm going to give it this very, very fine sentence: "Whatever you can do or dream you can, begin it. Boldness has genius, power and magic in it."
So, the model needs to translate it into all of these world languages, and if you can tell me in the comments who said this without looking it up in any search engine or AI, full marks to you.
Okay, so let me now run this and we will check out the translation. Please also help me out: what do you think about the translation in your own native language? And if your language is not included in this list, please let me know and I will include it in the next video.
And this is the result, and to be honest, the model has handled all 80-plus languages, including obscure ones like Gutnish, Faroese and Tigrinya, without breaking a sweat. I'm just going to scroll through.
I have checked a few and I think it has done quite well. Some of them are slightly literal, but overall it's a solid result.
And it has even handled some of the non-Latin scripts, including RTL ones, like Arabic, Urdu, Burmese and a few others, all rendered correctly.
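If you want to check the direction of a translation yourself rather than eyeballing it, here is a small sketch using Python's standard `unicodedata` module. The helper name `is_rtl` is hypothetical, and the first-strong-character rule is a simplification of the full Unicode bidirectional algorithm.

```python
import unicodedata


def is_rtl(text: str) -> bool:
    """Guess text direction from the first strongly directional character."""
    for ch in text:
        direction = unicodedata.bidirectional(ch)
        if direction in ("R", "AL"):  # right-to-left, e.g. Hebrew or Arabic letters
            return True
        if direction == "L":          # left-to-right, e.g. Latin or Burmese letters
            return False
    return False  # no strong character found (digits, punctuation, empty string)
```

With this check, Arabic and Urdu text comes out as right-to-left, while Burmese, despite its complex script, is written left to right.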
And I'm just scrolling down. You can pause and check your language.
And these are 80 plus languages as I said from all across the globe.
Some of them are regional, but most of them are quite good. I have also included this beautiful Balochi language from Pakistan.
And I think I also included the Basque language. I guess this is spoken just on the border of Spain and France.
So, let me know what you think about it. It even went ahead with this gibberish translation. So, pretty good.
Okay, so now let's try out couple more things.
Now, in the next test, what I'm going to do is check multimodality, but with a twist. I'm going to show the model this image of a lightning storm.
And in the Hermes agent, I'm asking the model to look at this image, and I will also give it the path shortly.
I have described that the image shows a lightning strike, and I'm asking the model to create an index.html file on the basis of it, which we will then actually run in the browser. So, let me give it the path of the image.
Now, I have given it the path, so I will just run it. While it runs and does its magic, let's talk about the benchmarks very quickly.
So, you can see that on agentic benchmarks, it has already topped Claude Eval, which tests how well a model holds its instructions across long multi-turn sessions, something we also tested with cross-file checking in our first prompt. It has beaten both Gemini 2.5 Pro and GPT-4, which are quite old; that is the only drawback here. Anyway, on multimodal, it leads on SimpleVQA search, but on Terminal-Bench and GDPval, it trails behind GPT-5 and Claude Opus 4.7.
And on Humanity's Last Exam, or HLE, with tools, it also sits behind both of these. SWE-bench Pro puts it in second place for agentic coding. So, I think it is strong on agentic reliability and multimodality plus multilinguality, but not yet at the top for raw coding and terminal tasks. But we will see.
And it has created the code. It is giving me all the information around it.
I'm also showing you the raw output just as a matter of transparency. So, this is the whole synopsis of the code, and it has saved it here at this location. I haven't really looked at it yet. This is the index.html file.
Let's check it out together. I'm just going to drag and drop.
And already it looks good. Instruction following is awesome. It has changed the colors, but I'm still waiting for the lightning.
Do you see any lightning? I don't see any lightning.
I don't see any lightning. I do see some terrain there.
All that stuff is there. You know, some sort of pressure vibes are coming from these concentric circles, but there is nothing there. I think this test has failed, and I think this really matches what we saw around raw coding: it doesn't really do that well.
But at least it was able to see that image, understand it, and try its best. Anyway, I cannot test video with it because this API doesn't support it, and I think I have already exhausted all my credits there.
It's a bit expensive, too, by the way. I think I have already spent around $10 or so just for these tests.
Anyway, let me know what you think about this model.
And if you want me to make a local video, please become a member of the channel so that I can fund a multi-GPU cluster, and then we will have another fun round. That's it. Thank you for all the support.