The video highlights the tech elite's pivot from brute-force scaling to architectural elegance, proving that massive parameter counts are no longer a substitute for genuine efficiency. It’s a sophisticated acknowledgment that in the AI arms race, intellectual refinement is finally outpacing mindless expansion.
Deep Dive
DeepSeek V4, GPT-5.5, Kimi K2.6, MiMo Pro, video game agents, 4K editing: AI NEWS
AI never sleeps, and this week has been absolutely exhausting. DeepSeek releases their long-awaited version 4, and it's a beast. OpenAI drops the best model you can use, GPT 5.5. In the same week, we have not one but two new open-source models tied at number one. The best image generator by far also dropped this week. We also have an open-source agent that can create video games end to end. This AI agent can autonomously do machine learning research on your computer. We have an open-source AI that turns videos into HDR with much better dynamic range. This AI can edit images at up to 4K resolution. We have some ridiculous humanoid robot demos and a lot more. So, let's jump right in.
First up, this AI is really cool. It's called MultiWorld, and it can generate video worlds with multiple agents and multiple camera angles at once, hence the name. Here are some examples. The framework can handle multiplayer game environments like this, so you can see two characters within each scene. Plus, it's able to pretty consistently generate different views of the scene, so for each example there are two different views. It can generate a variety of game environments, as you can see here, each with multiple agents interacting in the scene while cameras capture it from different positions. Now, in addition to generating these video game worlds, it can also generate videos of robots working together from different angles.
For example, here you can see its generations involving two robotic hands working together, all viewed from different perspectives. And this is actually great for creating training data for robots. So how does it keep everything coherent? The core is this multi-agent condition module. It uses an agent identity embedding to inject distinct agent identities into the action tokens, which are then fed through the video generator. And then for multi-view consistency, we also have this global state encoder. It's able to reconstruct the scene from just a few partial observations, and this is also injected into the video generator to generate a coherent scene. In simple terms, it's like having a director who knows exactly where every actor and every camera is in 3D space at all times. The awesome thing is they've released the code for this already. So, if you click on this code button and scroll down a bit, it contains all the instructions on how to download and run this locally on your computer. Plus, they also released the datasets as well. So, this is completely open-source, which is fantastic. If you're interested in reading further, I'll link to this main page in the description below.
Next up, this AI is pretty awesome. It's called Open Game, and this is an AI coding agent that can automatically build, test, and fix video games. They call this the first open-source agentic framework for end-to-end video game creation. Here are a ton of different games that it has coded up, which you can actually play.
I'll link to this in the description below if you want to check it out further. Here's the methodology behind it. The system is broken down into three parts. The first component is where the model is trained. After that, it goes through this autonomous agent workflow, which runs through classification, scaffolding, design, asset synthesis, implementation, and verification. This works like a self-correction loop, and it's designed to work the way a game developer would. Next, we also have this game skill component, which is continuously evolving. It keeps an ever-growing library of successful tasks and projects so it can reuse these templates in the future. It also contains a debug skill that keeps track of errors it has encountered and their verified fixes. Now, as you can see from these game demos, it's incredibly good. It's able to generate all these assets very well. Plus, the game design and mechanics are also designed very well, definitely better than if you just prompted a state-of-the-art LLM zero-shot. Now, if you scroll up to the top of the page, they have released the code for this. If you click on this code button and scroll down a bit, it contains all the instructions on how to download and run this locally on your computer. If you're interested in reading further, I'll link to this main page in the description below.
Next up, this AI is pretty interesting. It's called UniGenDet, and this is an AI system that gets better at making realistic images and spotting fake images at the same time.
So here are some examples of it detecting fake or AI-generated images. If we give it this image, UniGenDet is able to determine that this is a fake image. There are some details that don't look realistic. This is another example. UniGenDet was able to determine that this was a fake image due to disproportionate textures and odd lighting, hinting that it was generated by AI. And conversely, if you give it a real image, UniGenDet is able to correctly determine that it's real, given its visual details. Now, because this model is trained to spot fake images so well, the reverse is also true: it can also generate images that look a lot more realistic. For example, here is "elegant woman in feather-trimmed hat, timeless profile." You can see UniGenDet looks a lot more realistic than a competitor called BAGEL. Or if we generate a pagoda and Mount Fuji scene, UniGenDet is able to generate this a lot more accurately. Same with these other examples. And as you can see here, across these fake image detection benchmarks, on average, UniGenDet outperforms the other competitors. So how this works is it uses a unified model with symbiotic self-attention, which basically means the generation and detection help each other in a loop, and this helps it generate a lot more realistic-looking images. A pretty interesting concept.
Now if you scroll up to the top of the page, they have released the code for this. If you click on this code button and scroll down a bit, it contains all the instructions on how to download and run this locally on your computer. If you're interested in reading further, I'll link to this main page in the description below.
Also this week, we have a new number one open-source model. Kimi just released Kimi K2.6. This is an incredibly powerful and capable model. If you look at these agentic benchmark scores, it's pretty much on par with the leading models, including GPT 5.4 High, Opus 4.6, and Gemini 3.1 Pro, all with the highest or max thinking. Same with coding and also visual benchmarks. It's really good. Here are some of its impressive achievements. Here it says Kimi was able to autonomously download and deploy Qwen 3.5 locally on a Mac. It then implemented and optimized the model in Zig, which is a highly niche programming language. So, it basically wrote some custom code to make this model run even faster. After 4,000 tool calls, over 12 hours of continuous execution, and 14 iterations, Kimi was able to improve the throughput of this Qwen model from just 15 tokens per second to 193 tokens per second, roughly a 13x speedup. So, that's pretty insane. It's also insanely good at front-end design.
So, here are just some examples of websites it has created. And like the previous K2.5, this new K2.6 is also able to use agent swarms. Now, instead of controlling just 100 agents like the previous model, K2.6 can simultaneously orchestrate up to 300 sub-agents across 4,000 coordinated steps. That's pretty insane. Imagine an entire army of AI agents working for you all at once. This can help you automate a ton of stuff really quickly.
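The swarm idea can be sketched as a fan-out/fan-in pattern: an orchestrator splits a job into sub-tasks, runs a bounded number of sub-agents concurrently, and collects the results. This is only an illustration of the pattern, not how Kimi implements it. The names `run_sub_agent`, `orchestrate`, and `run_swarm` are hypothetical, and the sub-agent here just returns a string instead of calling a model.

```python
import asyncio


async def run_sub_agent(task_id: int, task: str) -> str:
    """Hypothetical sub-agent: a real one would call a model and tools."""
    await asyncio.sleep(0)  # stand-in for model/tool latency
    return f"agent-{task_id} finished: {task}"


async def orchestrate(tasks: list[str], max_parallel: int = 300) -> list[str]:
    """Fan tasks out to sub-agents, running at most max_parallel at a time."""
    limit = asyncio.Semaphore(max_parallel)

    async def guarded(i: int, task: str) -> str:
        async with limit:
            return await run_sub_agent(i, task)

    # gather preserves input order, so results[i] belongs to tasks[i]
    return await asyncio.gather(*(guarded(i, t) for i, t in enumerate(tasks)))


def run_swarm(tasks: list[str], max_parallel: int = 300) -> list[str]:
    """Synchronous entry point for the sketch."""
    return asyncio.run(orchestrate(tasks, max_parallel))


if __name__ == "__main__":
    jobs = [f"sub-task {n}" for n in range(30)]
    for line in run_swarm(jobs)[:3]:
        print(line)
```

The semaphore is what caps the number of agents running at once; the default of 300 simply mirrors the figure quoted above.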
For example, you can design and execute quantitative strategies across 100 global semiconductor assets. Or you can get it to write an astrophysics paper with over 20,000 data points and 14 charts. Or you can scrape 100 jobs and generate 100 tailored résumés, one for each job. Or if you're a webdev agency, you can get it to scrape the web and find 30 stores without websites and then generate 30 landing pages for them in minutes. And then you can cold email each of them, again automatically through this agent, and say, "I noticed you don't have a website. I just helped you build one. Would you like to work with us?" The awesome thing is this is open-source, and you can plug and play it through any agentic system, including OpenClaw or Claude Code.
Next, here's an independent leaderboard from Artificial Analysis. If you look at their Intelligence Index, you can see that Kimi is currently the best open-source model out there, beating GLM 5.1, which was the previous king. It's also edging really close to the top closed models. And in terms of cost, if you choose to use it via their API, it's insanely cheap, even cheaper than GLM 5.1 and way cheaper than the closed AI labs. Now, of course, because this is open-source, you can also download the entire model to run locally on your computer. Now, as the best open-source model out there, this is quite huge at 1.1 trillion parameters.
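For a sense of scale, a parameter count turns into a download size through the number of bits stored per parameter. The sketch below is a back-of-the-envelope estimate only: the helper names are made up, the 80 GB device size is an assumed example rather than a figure from the video, and the video does not say which precision the released weights use.

```python
def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def min_devices(size_gb: float, memory_per_device_gb: float) -> int:
    """Smallest number of devices whose combined memory holds the weights."""
    return int(-(-size_gb // memory_per_device_gb))  # ceiling division


if __name__ == "__main__":
    params = 1.1e12  # parameter count quoted in the video
    for bits in (16, 8, 4):
        size = weight_size_gb(params, bits)
        # 80 GB per device is an assumed example, not a figure from the video
        print(f"{bits:>2}-bit: {size:,.0f} GB, at least {min_devices(size, 80)} x 80 GB devices")
```

Under these assumptions, 1.1 trillion parameters come to about 2,200 GB at 16 bits and about 550 GB at 4 bits, so a download of several hundred gigabytes points to low-precision weights. Activations and the KV cache need memory on top of the weights.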
So, the total size of the model is just under 600 GB, and you'll need to link multiple GPUs together in order to fit it. Anyways, if you're interested in reading further, I'll link to this main page in the description below.
Next up, this AI is super useful. It's called Open Code Design, and this is an open-source AI design system. You can plug and play your own model, and this is like a self-hosted alternative to Claude Design, Lovable, or Figma AI. It can take in a text description of what you want to design, or you can also add reference images and assets, and it can autonomously build it out for you. It can build beautiful user interfaces, slide decks, PDFs, posters, or other designs. Here's an example of the interface. We're getting it to design a product changelog page presented as a vertical timeline, and you can see this agent is able to autonomously create the page in this preview on the right. The awesome thing is this is open-source already. Plus, it's relatively easy to install: it's just a simple .exe file. If you're interested in reading further, I'll link to this main page in the description below.
Also this week, Xiaomi continues to cook some really strong stuff. They released two new models. One is called MiMo version 2.5 Pro. This is their strongest model yet. As you can see, for agentic coding benchmarks, it's on par with the top closed models out there, including Opus 4.6, Gemini 3.1 Pro, and GPT 5.4. Same with these general agentic benchmarks. Here you can see that MiMo was even able to code up a full-featured video editor. This is a desktop app with multiple tracks, clip trimming, crossfades, audio mixing, and export. It built all of this with over 8,000 lines of code, over 1,800 tool calls, and 11.5 hours of autonomous work. And if you look at this chart from Artificial Analysis, you can see that MiMo 2.5 Pro is actually tied with Kimi K2.6. So both of them are currently the number one open-source models. This is also incredibly efficient. The X-axis is the number of tokens per trajectory and the Y-axis is performance. So ideally, you want to be in the upper left corner: high performance with few tokens.
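To make the "upper left corner" idea concrete, here is a minimal sketch that finds the models no other model beats on both axes at once (a Pareto front). The `Run` type, the function names, and the three data points are placeholders invented for illustration; none of the numbers are read off the chart.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    tokens: float  # tokens per trajectory (lower is better)
    score: float   # benchmark performance (higher is better)


def dominates(a: Run, b: Run) -> bool:
    """a dominates b if it is no worse on both axes and better on at least one."""
    return (a.tokens <= b.tokens and a.score >= b.score
            and (a.tokens < b.tokens or a.score > b.score))


def pareto_front(runs: list[Run]) -> list[Run]:
    """Keep the runs that no other run dominates (the upper-left edge)."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not taken from the chart.
    runs = [
        Run("model-a", tokens=40_000, score=60.0),
        Run("model-b", tokens=90_000, score=61.0),
        Run("model-c", tokens=120_000, score=58.0),
    ]
    print([r.name for r in pareto_front(runs)])
```

In this toy data, model-c is dropped because model-a scores higher while using fewer tokens, which is exactly what being further toward the upper left means.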
And as you can see, both MiMo models are way more efficient compared with the other top models. And the awesome thing is here they say that the MiMo 2.5 series will be open-sourced soon. So stay tuned for that. For now, you can try it via their online chat interface, which looks like this, or you can access it via their API. Now, this is text only, but it's designed for agentic workflows, so it can access and use a variety of different tools. But in addition to MiMo Pro, Xiaomi also released this version 2.5, which is a multimodal model. So it can also understand images and audio in addition to a text prompt. If you look at these multimodal agentic and understanding benchmarks, then as you can see, again, it's on par with the top closed models out there, including Opus 4.6 and GPT 5.4. And again, they are planning to open-source this, but it's not out yet. So right now you can try it in their online chat interface called AI Studio, which looks like this. At the top here is where you can select either version 2.5 Pro or version 2.5, which is their multimodal model. If you're interested in reading further, I'll link to both these pages in the description below.
Next up, this AI from Hugging Face is pretty damn impressive. It's called ML Intern. And like the name implies, think of this as a machine learning intern that can autonomously read research papers, train models, and write AI code. This is essentially an agent that can take in a plain-text prompt like "fine-tune this model on my dataset," and from that it will autonomously proceed to, well, carry out the task. And it did some crazy stuff. For example, when asked to train the best model for scientific reasoning, it went through some research papers, found a benchmark for scientific reasoning, and then fine-tuned Qwen 3 to do well on this benchmark. After 10 hours, it was able to achieve 39% on GPQA, which is graduate-level science questions, versus the original 10%. On the GitHub repo, you can see how this agentic framework works. We have an agentic loop which can use multiple tools, and this is designed to function just like a machine learning researcher.
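An agentic loop with tools can be sketched in a few lines: a policy picks the next tool call, the tool runs, and the observation is fed back in until the policy says it is done. This is a generic illustration, not the ML Intern codebase. The tool names, `scripted_policy`, and `agent_loop` are hypothetical, the tools are stubs that do no real work, and a scripted plan stands in for the language model.

```python
from typing import Callable, Optional

Tool = Callable[[str], str]
Step = tuple[str, str, str]  # (tool name, argument, observation)


def search_papers(query: str) -> str:
    return f"papers about {query}"  # stub: no network access


def launch_training(config: str) -> str:
    return f"training started with {config}"  # stub: nothing is trained


TOOLS: dict[str, Tool] = {
    "search_papers": search_papers,
    "launch_training": launch_training,
}


def scripted_policy(goal: str, history: list[Step]) -> Optional[tuple[str, str]]:
    """Stand-in for the LLM: returns the next (tool, argument), or None when done."""
    plan = [("search_papers", goal), ("launch_training", "lr=1e-4")]
    return plan[len(history)] if len(history) < len(plan) else None


def agent_loop(goal: str, policy=scripted_policy, max_steps: int = 10) -> list[Step]:
    """Ask the policy for an action, run the tool, record the observation, repeat."""
    history: list[Step] = []
    for _ in range(max_steps):
        action = policy(goal, history)
        if action is None:
            break
        tool_name, arg = action
        observation = TOOLS[tool_name](arg)
        history.append((tool_name, arg, observation))
    return history


if __name__ == "__main__":
    for step in agent_loop("scientific reasoning"):
        print(step)
```

The `max_steps` cap is a design choice for the sketch: it guarantees the loop terminates even if the policy never signals completion.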
And the reason it's so good is it makes full use of the Hugging Face ecosystem. It's able to find papers on Hugging Face Papers and then also pull in relevant datasets from Hugging Face Datasets. It's also able to browse all the models that are posted on Hugging Face. And this agent is also able to emit events so you can stream or monitor everything in real time. This is like having a machine learning researcher that lives on your computer, which can read papers, search GitHub, spin up training jobs, and write the code, all within the Hugging Face ecosystem. And the awesome thing is this is completely open-source. So you can download this and try it right now on your computer. If you're interested, I'll link to this main GitHub page in the description below.
In humanoid robot news, we had the second-ever humanoid robot half-marathon successfully held in Beijing, and we actually had some shocking results. Last year, the Unitree H1 pretty much dominated most of the races. It was by far the fastest runner. But this year, Honor, a spin-off from Huawei that also makes really awesome phones, by the way, dominated the podium. Their autonomous humanoid robot, which is called Lightning if you translate it from Chinese, won the autonomous category. In fact, it completed the course, which was over 21 kilometers, in just 50 minutes and 26 seconds. This is a huge achievement because it beats the men's world record of 57 minutes and 20 seconds, set by Uganda's Jacob Kiplimo, by nearly 7 minutes. So, this is an insane feat. We already have humanoid robots able to outrun the best human runners in the world. Not only that, but we also saw massive growth in the number of labs and robots competing compared to last year. There were over 100 humanoid robots, which is about five times as many as last year. And about 40% of them ran fully autonomously using AI, sensors, and navigation, without any remote control or teleoperation. This is a huge jump compared to last year's race, where most were remote controlled, most were really clumsy, and only a few ever finished. So, in just the span of a year, you can see how much this technology has accelerated.
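As a quick arithmetic check on the race figures, here is a small sketch. It assumes a finishing time of 50 minutes 26 seconds, which is the time consistent with beating a 57:20 record by nearly 7 minutes, and the standard half-marathon distance of 21.0975 km (the video only says "over 21 kilometers"). The helper names are made up for illustration.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes:seconds time to total seconds."""
    return minutes * 60 + seconds


def fmt(total_seconds: int) -> str:
    """Format total seconds as m:ss."""
    m, s = divmod(total_seconds, 60)
    return f"{m}:{s:02d}"


HALF_MARATHON_KM = 21.0975   # assumed standard distance
robot = to_seconds(50, 26)   # assumed finishing time (see the note above)
record = to_seconds(57, 20)  # human record quoted in the video

margin = record - robot                          # seconds ahead of the record
speed_kmh = HALF_MARATHON_KM / (robot / 3600)    # average speed over the course

if __name__ == "__main__":
    print(f"margin over the record: {fmt(margin)}")
    print(f"average speed: {speed_kmh:.1f} km/h")
```

Under those assumptions the margin is 6 minutes 54 seconds, which matches "nearly 7 minutes", and the average speed works out to about 25 km/h.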
Pretty exciting stuff.
In other humanoid robot news, we have a new demo from Unitree. Now, the Unitree robot is known to do really well in acrobatic stuff like flips, kung fu, and dancing. But here we see a new demo of it showing incredible balance and agility using wheels on its feet. We first see a scene of it with a single wheel attached to each foot, and it's able to glide smoothly around this outdoor paved area and even perform dance-like maneuvers. It's able to spin in tight circles or balance on a single moving wheel while extending its other leg into the air. So pretty acrobatic already. But then we can also swap that single wheel for a rollerblade like this. Here you can see it wearing these roller skates, and it's able to navigate this urban environment with realistic side-to-side skating motions. And here it's even able to skate backwards while doing crossovers. Very impressive. And then finally, instead of wheels, they even put ice skate blades on the robot, and it's able to skate beautifully on this ice rink. Notice that the way it glides is very similar to how humans skate.
Now, this demo is actually super impressive because bipedal, or basically two-legged, robots are inherently top-heavy and very unstable while walking. So, it's already tough to get them to balance on just two legs. But if you put them on wheels or blades, this drastically increases the complexity of keeping them upright and balanced. To perform these actions, the robot's control system has to make thousands of micro-adjustments per second in real time. It needs to coordinate its legs, torso, arms, and everything else continuously to manage its center of gravity and balance. And not only that, but here they even got it to perform dynamic movements like spinning or skating backwards, which makes it even more impressive.
If you want to level up your content creation, definitely check out Higgsfield, the sponsor of this video. They just launched a powerful new workflow combining the legendary GPT Image 2 and Seedance 2.0. It's basically a full creative pipeline from prompt to final video, all in one place. Here's how it works. First, you generate your visuals using the best image model out there, GPT Image 2. Just type in a prompt and it creates high-quality photorealistic images with insanely accurate details, lighting, and textures; even text rendering is super clean. You can generate multiple consistent images at once, so your characters and scenes actually stay coherent across shots. Then, step two, you take those images and bring them to life using the best video model out there, Seedance 2.0. This is where things get crazy. It turns your static images into full 1080p videos with cinematic motion, realistic physics, and even native audio generated in sync.
You're not just getting visuals. You're getting a complete scene with sound, movement, and storytelling baked in.
What's really impressive is how controllable everything is. You can guide camera angles, motion, pacing, even reference specific styles using images or clips, and the character consistency holds across multiple shots, so you can actually build real narratives instead of random clips. The use cases here are massive. If you're a creator, you can go from idea to finished video without filming anything.
If you're running ads or building a brand, you can generate entire campaigns visually before ever touching a camera.
And for agencies or content teams, this cuts production time down from days to minutes. Whether you're making ads, short films, or social content, this combo of GPT Image 2 and Seedance 2.0 is easily one of the most powerful AI creative workflows available right now. Try it using the link in the description below.
Also this week, OpenAI releases the best AI model you can use right now, GPT 5.5. And at least to me, this is noticeably more performant and less error-prone compared to Claude Opus and other top competitors. I was able to get it to autonomously code a ton of things without a lot of back and forth. Now, I already did a full review video on this, so I'm not going to repeat too much here. Definitely see that video to get a sense of all the insane things it can do. I'll link to it in the description below.
Also this week, we have a new AI called Uni Geo. This is an image editing tool that lets you control the camera position precisely. Here are some examples. Let's say this is your original image on the left. You can prompt the camera to move by a certain number of degrees, and it's able to edit the image by this precise amount. Here are some other examples where you can move the camera to any other position in 3D by specifying the degrees, and it handles this very well. It doesn't have to stay fixed on the center. You can also get it to pan and tilt. And of course, you can string multiple movements together. For example, for the first prompt, you can get it to pan left by 16° first and then tilt up by 7°. And here are some additional examples for your reference.
Now, if you use one of the standard image editors out there, like Nano Banana or GPT Image 2, you're not going to get this level of precision. You can't actually control exactly where the camera moves or how many degrees to tilt it. But this method gives you precise control. How this works is it takes your input image and reconstructs the scene into a 3D point cloud like this. That way, when it takes in your prompt, it's able to control exactly where the camera should move within this 3D scene and then re-render the scene accordingly.
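The geometric core of this, moving a camera through a point cloud and re-projecting the points, can be sketched with plain trigonometry. This is a simplified illustration under stated assumptions, not the tool's actual pipeline: the camera sits at the origin looking along +z with x to the right and y up, the projection is an ideal pinhole, and the function names are hypothetical. A real system would also have to fill in regions that become visible after the move, which this sketch ignores.

```python
import math

Point = tuple[float, float, float]


def pan_left(p: Point, degrees: float) -> Point:
    """Coordinates of p in the camera frame after the camera turns left."""
    t = math.radians(degrees)
    x, y, z = p
    return (x * math.cos(t) + z * math.sin(t), y, -x * math.sin(t) + z * math.cos(t))


def tilt_up(p: Point, degrees: float) -> Point:
    """Coordinates of p in the camera frame after the camera tilts up."""
    t = math.radians(degrees)
    x, y, z = p
    return (x, y * math.cos(t) - z * math.sin(t), y * math.sin(t) + z * math.cos(t))


def project(p: Point, focal: float = 1.0):
    """Pinhole projection; returns None for points behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return (focal * x / z, focal * y / z)


def render(cloud: list[Point], pan_deg: float, tilt_deg: float):
    """Apply the pan, then the tilt, and project every visible point."""
    moved = (tilt_up(pan_left(p, pan_deg), tilt_deg) for p in cloud)
    return [uv for uv in map(project, moved) if uv is not None]


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 2.0), (0.5, 0.2, 3.0), (-0.4, -0.1, 1.5)]
    # Pan left by 16 degrees, then tilt up by 7 degrees, as in the example prompt.
    for uv in render(cloud, pan_deg=16, tilt_deg=7):
        print(f"({uv[0]:+.3f}, {uv[1]:+.3f})")
```

Because the scene is held as 3D points, the requested angle maps directly to a rotation, which is why the camera move can be exact rather than approximate. When the camera pans left, points drift to the right in the image; when it tilts up, they drift down.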
Now, if you scroll up to the top of the page here, it says the model is coming soon. So, it does look like they're going to open-source this. For now, if you're interested in reading further, I'll link to this main page in the description below.
Next up, this AI is super useful. It's called Edit Crafter, and it can edit images at up to 4K resolution. Here are some examples for your reference. As you can see, it's able to edit high-resolution images while preserving the details. Let's say on the left is the original image. We can turn this into a desert setting at sunset, and here's what we get. Or we can change the moon to Earth, and here's our result. Now, here's an example where the image is a lot larger, at over 4,000 by 2,000 pixels. And as you can see, we can convert this cherry blossom into maple like this. However, note that the colors aren't exactly the same as the original image. It does tend to add some contrast and saturation. And then here we have a true 4K image, 4096x4096. Here we can turn this forest into a burning forest, and here's our result. Or we can turn this into Stonehenge, and here's what we get. Again, not the best image editor. It tends to oversaturate the colors. Now, at the top of the page, there's a code button. If you click into this and scroll down a bit, it contains all the instructions on how to download and run this locally on your computer. Note that if you are planning to edit 4K images, it does require 24 GB of VRAM. If you're interested in reading further, I'll link to this main page in the description below.
Also this week, as you probably know, OpenAI releases by far the best image generator you can use right now. It's called GPT Image 2, or ChatGPT Images 2. This is an insanely performant model that can generate full diagrams, infographics, and other visuals. It can also generate super realistic images, and it makes a lot fewer errors than the previous leader, Nano Banana. For example, I got it to generate a grid of 100 posters of anime shows or movies, and here's what I got from GPT Image 2. Notice the insane detail and accuracy of its generation.
And then here is a screenshot of a Windows 11 desktop with a ton of different windows, and everything actually looks good. All the text in this Slack chat, plus even this financial spreadsheet in this Excel window, is actually accurate. All these columns actually make sense. Anyways, I already did a full review of GPT Image 2, so I'm not going to repeat too much here. See that video if you want to learn more.
Also this week, LTX releases a super useful video tool. If you haven't heard of LTX, they are behind the best open-source video generator with sound. Now, previously, all LTX generations were just 8-bit SDR, so the colors and dynamic range were very limited. But now with this new LoRA, which you can just add on top of any LTX workflow, you can turn the video into full HDR. This increases the dynamic range by a lot and makes the colors look a lot better. You can see that the details and colors are a lot more balanced, which makes it way better for post-processing.
You can bring this into a video editor, and it gives you more room to adjust the colors however you want. It's also pretty lightweight. This LoRA is only about 340 megabytes in size, and you can just add it on top of your existing workflow, which is fantastic. If you're interested in reading further, I will link to this page in the description below.
Now, in addition to OpenAI's GPT Image 2, Google also released a beast of an image generator. This is slightly different from Nano Banana. It's called Vision Banana, and here they call it a state-of-the-art unified model for both image understanding and generation. Vision Banana is basically able to take an image and break it down into all these different maps. We have a segmentation map, instance segmentation, expression segmentation, depth estimation, and surface normal estimation, which is the orientation of surfaces in the image. So here are some examples of semantic segmentation.
We can enter a prompt to get it to segment certain elements of an image.
And here's our result. Or here's another example where we only get it to segment cat ears and the exit sign and the background. And here's what we get. So it's able to understand exactly what to segment within the image. Or instead of cat ears, we can get it to segment the entire cat as well as the lock and the exit sign and the background. And here's our result. We can even specify like what color we want each segment to be.
Or here's another example where we can get it to segment certain things in the image. And here's our result. Notice that it's super accurate. Or instead, here's another segmentation where we get it to segment the menu and the dessert and the patterns on the wall. And here's our result. Here's an even trickier one.
So, if we get it to segment each price tag, but make it colored differently, it's able to handle this very well. Or we can also get it to segment each croissant, but each one with a unique color. And here's what we get. Here's an even trickier one. Segment and color each piece of garlic differently. And it's able to detect the garlic and segment each one very well. Or here, it's even able to segment all the prices on the price tags and color them differently. And here's our result. Now, this is also incredible at detecting depth in an image. So, we can turn this image into a depth map like this. Notice the amount of details. This is really good. Here's an even trickier example where we have a ton of foliage everywhere, but it's able to generate the depth of this extremely well. Here's another example. And this can also predict the normal or basically the orientation of all the surfaces of an image. So, here's an example of that.
Even though this is a super messy scene, it's able to predict everything very accurately. And here's another example.
Really cool. So, in terms of all these image understanding metrics, you can see that Vision Banana is even better than some of the top competitors like Meta's SAM 3, which I've gone over on my channel before. In terms of 3D understanding, this is also state-of-the-art, much better than Depth Pro or Mo, which I've featured before.
Now, for now, they've only released a technical report. There's no indication whether they will open-source this or not, but if you're interested in reading further, I'll link to this main page in the description below.
And we're not done yet with model releases. Tencent Hunyuan also releases Hy3 preview. This is their latest and best language model. Here are some specs. I think there are misspellings here, but it has 295 billion parameters. So it's quite a large model, but not like the largest ones, which are over a trillion parameters. And this is a mixture-of-experts model, so when you use it, only 21 billion parameters are active, making it pretty efficient. And it has a context length of 256K.
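The "only some parameters are active" idea comes from routing: for each token, a small router scores the experts and only the top few are run. The sketch below shows the quoted active fraction plus a generic top-k router. It is a textbook-style illustration, not this model's actual routing code; the function names, the four example logits, and `k=2` are all assumptions, and in real models the active count also includes shared layers such as attention.

```python
import math


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the parameters used for a single token."""
    return active_params / total_params


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


if __name__ == "__main__":
    # 21 billion active out of 295 billion total, as quoted in the video
    print(f"{active_fraction(21e9, 295e9):.1%} of parameters active per token")
    # Example logits for four hypothetical experts
    print(route_top_k([0.1, 2.0, -1.0, 1.5], k=2))
```

With the quoted figures, roughly 7% of the parameters do the work for any one token, which is why a model this size can be much cheaper to run than its total parameter count suggests, even though all the weights still have to fit in memory.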
Now again, this is not among the state-of-the-art models, which are over a trillion parameters. This is like five times smaller, but it's able to punch above its weight. If you compare this with other competitor models, one thing to note is that these are older generations. For GLM 5, we already have 5.1. For Kimi K2.5, we already have K2.6 this week. And for GPT 5.4, we have 5.5 this week. But nevertheless, for a model that's like five times smaller, it's still able to achieve roughly the same performance as these competitor models, which is really impressive. So this is among the best models in terms of reasoning and agentic use given its size and efficiency. Here are some other benchmarks in terms of context learning and instruction following. And again, it's really close to the top models, which are like five times larger. Now, in the AI space, Tencent is known for releasing some pretty cool 3D model generators like Hunyuan 3D. And of course, they're also famous for their video generator called Hunyuan. But it's really cool that they're also focusing on developing language models. The awesome thing is this model is already open-sourced. If you scroll down here, here's a link to the GitHub, Hugging Face, etc. Note that the total size of this is like 600 GB, so you'll still need to combine multiple GPUs in order to fit everything. But if you do have that, then if you click on this GitHub repo and scroll down a bit, it contains all the instructions on how to download and run this locally on your computer. They also give you the script for fine-tuning it further. If you're interested in reading further, I will link to this page in the description below.
Also this week, the long-awaited DeepSeek version 4 is finally out. Note that this is just the preview version, so the main version is still coming soon. First of all, if you look at some of these benchmarks, it's pretty much on par with the leading closed models out there, including Opus 4.6 Max and GPT 5.4 Extra High. And the thing is, the cost of this is way lower than the closed models, which I'll talk about in a second. Now, they actually released two different models.
There's the pro version which is a lot larger at 1.6 trillion parameters. When you use it 49 billion of those parameters are active. And of course this one is a lot more performant. And then we have a smaller flash model which is 284 billion parameters. Both of them have a context window of a million tokens. So, you can fit roughly 700,000 words into your prompt at once or a medium-sized codebase. More than enough for regular stuff. Now, if you look at this leaderboard by artificial analysis, then Deepseek V4 unfortunately is not the number one open- source model out there. It scores two points below Kim K 2.6 and Mimo 2.5, which were also released this week. So, it's a super competitive space right now. And in terms of the pricing, it's also not as competitive as Kim 2.6. six or MIMO version 2.5, but still way cheaper than the top closed models out there. If you look at another top leaderboard called Arena, where people can blind test different models side by side, then we get the same thing. So, in terms of open-source models, DeepSeek version 4 is ranked number three, still behind Kim 2.6 and my personal favorite, GLM 5.1, which was released a few weeks ago. So, not the most performant open- source model you can use right now. Now, interestingly in terms of this vibe code bench, then DeepSeek version 4 does score number one, beating Kimmy, GLM, and Miniax. Now, the awesome thing is as before, they've open sourced this Deepseek model already. So, here are the weights to both models. Notice that the base version is used for fine-tuning.
But if you just want to use it and you don't want to train it further, then you can just use the non-base version. So, if I click on this Flash model, you can see that this is comparatively small. It's 160 GB in size, while the Pro model is 865 GB. So, if you do want to run this locally, you'll need to link multiple compute devices together in order to fit the model. Anyway, that is DeepSeek version 4. I know a lot of people have been waiting for its release for a long time, so it built up a lot of hype. Just from the performance benchmarks so far, it doesn't seem to be the best of the best open-source models. If you're interested in reading further, I'll link to this main page in the description below.
Also this week, we have a new AI called CoInteract. It generates really realistic UGC or influencer-style videos introducing products. It's really simple to use.
It just takes in two images, a photo of the product and a photo of the person, plus the prompt, and it's able to generate a video of that person introducing the product. Now, the really cool thing is that in the prompt you can even specify instructions step by step, as you can see in these examples, and it's able to generate a video following each step sequentially. The key innovation here is this dual-stream co-generation, which you can think of as one stream handling pixels while the other stream handles the physical relationship between the human and the object, ensuring that the interaction makes physical sense. This is a pretty huge deal for e-commerce and content creation, and probably bad news for influencers, because now, if you're a company, instead of hiring influencers to make these videos, you can just plug your product and a photo of any person into this AI to mass-produce a ton of content in just minutes. Now, at the top, they have released a GitHub repo.
And here it says they are planning to release the inference code and the model within a week. Plus, they're also going to release the training code for this. So, this will be 100% open-source, which is fantastic. For now, if you're interested in reading further, I'll link to this main page in the description below.
And we're not done yet with state-of-the-art model releases. This week, Alibaba also dropped Qwen 3.6 27B.
This is their latest dense open-source model, jam-packed with intelligence and performance. Note that this is different from their mixture-of-experts model, which I featured last week, called Qwen 3.6 35B A3B. That one is a lot more efficient because only 3 billion parameters are active when you use it. But this new one released this week is a 27-billion-parameter dense model, so all 27 billion parameters are used. This does require more compute, but it's way more performant. As you can see from these benchmarks comparing the new Qwen 3.6 against the open-source Gemma 4, which was released a few weeks ago and is a lot larger, Qwen 3.6 performs a lot better. This is currently the best medium-sized open-source model you can use. It's designed to have outstanding agentic coding abilities. As you can see from these benchmark scores, it also has strong reasoning across text and multimodal tasks. In fact, it's natively multimodal, so it can also analyze images and video. The awesome thing is, as with the other Qwen models, this is already out for you to download to your computer. If you click on this Hugging Face link, note that this is just 55.6 GB in size. So, this could potentially fit on a single high-end GPU.
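As a rough back-of-the-envelope check on those size claims, here is a minimal Python sketch. It uses only the download sizes and parameter counts quoted in this video. The 80 GB GPU capacity and the 20% memory overhead are assumptions picked for illustration, not figures from any of the releases, and the helper names are hypothetical.

```python
import math


def bytes_per_param(size_gb: float, params_billion: float) -> float:
    """Average checkpoint bytes per parameter (1 GB taken as 10**9 bytes)."""
    return (size_gb * 1e9) / (params_billion * 1e9)


def min_gpus(size_gb: float, gpu_memory_gb: float, overhead: float = 0.2) -> int:
    """Smallest GPU count whose combined memory holds the weights plus overhead.

    `overhead` is an assumed fudge factor for activations and KV cache,
    not a measured number.
    """
    return math.ceil(size_gb * (1 + overhead) / gpu_memory_gb)


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of parameters used per token (1.0 for a dense model)."""
    return active_billion / total_billion


# (name, download size in GB, total parameters in billions), as quoted in the video
models = [
    ("Qwen 3.6 27B (dense)", 55.6, 27),
    ("DeepSeek V4 Flash", 160, 284),
    ("DeepSeek V4 Pro", 865, 1600),
]

GPU_MEMORY_GB = 80  # assumed capacity of one high-end GPU

for name, size_gb, params_b in models:
    print(
        f"{name}: {bytes_per_param(size_gb, params_b):.2f} bytes/param, "
        f"at least {min_gpus(size_gb, GPU_MEMORY_GB)} x {GPU_MEMORY_GB} GB GPU(s)"
    )

# Dense vs mixture-of-experts: how much of the model runs per token
print(f"Qwen 3.6 35B A3B active share: {active_fraction(3, 35):.1%}")
print(f"Qwen 3.6 27B active share: {active_fraction(27, 27):.1%}")
```

The dense 27B checkpoint works out to about 2 bytes per parameter, which is consistent with 16-bit weights and explains why it lands near 55 GB. The DeepSeek checkpoints come out to roughly half a byte per parameter, which suggests a more compact storage format, though the video doesn't say which one.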
So, if you're looking for the best medium-sized open-source model you can use right now, this is definitely it.
Don't sleep on Qwen 3.6 27B. I'll link to this in the description below if you're interested in reading further.
All right, next up we have a new AI called UniMesh, which can both generate and edit 3D models. Here are some examples.
Given just a text prompt or an image, it's able to generate a full 3D model like what you see on the left, but you can also edit it further with a text prompt. For example, you can get the astronaut to hold the moon, make this air balloon out of grapes, add wheels to this bulldozer, change the color of this motorcycle, and so on.
Now, because it has such a good understanding of 3D objects, you can also do the reverse: plug in a 3D object and get it to describe it. This is called object captioning. And here are some examples of its descriptions based on the 3D objects above. Now, at the top, they have released a GitHub repo. And here it says they are planning to release the model by late May of 2026. So, stay tuned for that. If you're interested in reading further, I'll link to this main page in the description below.
And that sums up all the highlights in AI this week. Let me know in the comments what you think of all of this. Which piece of news was your favorite? And which tool are you most looking forward to trying out? As always, I will be on the lookout for the top AI news and tools to share with you.
So, if you enjoyed this video, remember to like, share, subscribe, and stay tuned for more content. Also, there's just so much happening in the world of AI every week that I can't possibly cover everything on my YouTube channel. So, to really stay up to date with all that's going on in AI, be sure to subscribe to my free weekly newsletter. The link to that will be in the description below.
Thanks for watching and I'll see you in the next one.