AI video generation models in ByteDance's Seedance line use a dual-branch diffusion transformer architecture (MMDiT) that processes audio and video in parallel rather than sequentially, which the video credits for tighter audio synchronization and a lower cost per second. A leaked, unverified claim of 18-minute single-prompt generation for Seedance 3.0 would be a significant advance in narrative coherence, and the claimed 8x cost reduction (from approximately 14 cents to 1.75 cents per second) would make AI video production more accessible. Claims about Hollywood-grade quality and a specific release date remain unverified, which underlines the importance of separating technical capabilities from marketing hype when evaluating AI technology.
Seedance 3.0 — NEW AI Video Generator Makes 1H+ Movies
Disney sent ByteDance a cease and desist earlier this year. 400 working actors signed an open letter asking Congress for protection, and right in the middle of all that, a leak dropped, and it explains exactly why everyone is panicking. Seedance 3.0: an 18-minute movie agent, something called narrative memory chain, 1/8 the cost of today's Seedance 2.0, and a claim that Gemini Omni looks like it's still loading by comparison. Every one of those is a reported claim, and we're going to rate them today. But first, let me show you why this model is worth your attention even before the 3.0 hype.
Here's what actually happened, in order, without the drama layer. In February 2026, ByteDance published Seedance 2.0.
Shortly after that, Disney and Paramount Skydance both sent cease and desist letters over IP concerns in AI video outputs. 400 actors signed a letter asking Congress for labor protections, and two senators, Blackburn and Welch, one Republican and one Democrat, wrote directly to ByteDance's CEO, Liang Rubo, demanding the model be shut down. I'm not here to weigh in on any of those legal or political questions. What I will tell you is what they signal. When you see that many incumbents move that fast in response to a single model, something real is happening on the technology side, and the release cadence is the clearest signal of all. ByteDance went from Seedance 1.0 to a rumored 2.0 mini in under a year, and the pace is accelerating. That's not a company iterating slowly. Something is clearly coming fast enough to scare people who have a lot to lose.
Let's look at what this model actually is and what that leak says. Seedance is ByteDance's video generation model line, built by the Seed research team. That team has roughly 1,500 researchers. It's led by Dr. Wu Yonghui, previously a Google Fellow and VP of research at Google DeepMind. He spent 17 years at Google before ByteDance hired him. So, when people ask why this lab is competitive with Google and OpenAI, the answer starts there. The technical differentiator is the architecture. Seedance uses something called MMDiT, a dual-branch diffusion transformer that processes audio and video in parallel, not sequentially.
That sounds like a detail, but it matters a lot in practice. Sora and early Veo designs generate video first and then try to fit audio on top.
Seedance co-generates both from the start. That's why the audio sync is tighter, and it's a big reason the cost per second is lower. You're running one pass instead of two. Now, let's look at where Seedance 2.0 actually sits right now, because this is verified data, not a leak. On the Artificial Analysis video arena, Seedance 2.0 is ranked number one for text-to-video with audio, with an Elo of 1,214.
It's also number one for image-to-video with audio at 1,194.
If you haven't heard of it, that's partly because most US tools don't expose it directly. The leaderboard doesn't lie about that. There is one exception. Happy Horse 1.0 from Alibaba's ATH team leads the no-audio benchmarks: 1,293 for T2V without audio, 1,294 for I2V. That team was led by Jiang D, who previously led Kling development at Kuaishou. So, the top two video models right now are both from Chinese labs, and neither of them is Sora. Sora's app shut down on April 26th. The API sunsets on September 24th. That's not a prediction.
That already happened. So, that's the baseline. Seedance 2.0 is genuinely the best audio-paired video model on the public leaderboard today. Now let's look at what the 3.0 leak is actually claiming and what I think is real.
Everything in this section comes from a single source, a Chinese-language post by an account called @MokouCN dated February 14th, 2026. The account is sometimes attributed to someone called Dr. Liu Zhang. I want to be clear about what that means before we go any further. I checked the Seedance 2.0 and 1.5 Pro paper author lists. That name does not appear. There is no verified ByteDance affiliation. The claim went viral on Western Twitter through @VaserX, @Mark_K, and @MarkGadala, but the chain traces back to one person with no confirmed inside access. That does not mean the claims are false. It means we rate them accordingly. I'm using the same framework we used in the GPT-6 video: credible, possible, unverified.
Possible: 18-minute single-prompt generation. The leak claims Seedance 3.0 can generate up to 18 minutes of coherent video from a single prompt. I'd call this possible, not because @MokouCN said it, but because it lines up with ByteDance's published direction. Their work on long-form coherence and the dual-branch architecture both point that way.
The technical path from 15-second clips to 18-minute generation is plausible from a team that already nails audio sync. I'm just not calling it confirmed from a single unverified source.
Possible: narrative memory chain. This is the claim that got the most attention. Narrative memory chain is described as an architectural feature that keeps characters and environments consistent across long outputs. The same character looks the same in minute 15 as in minute one. Same lighting, same setting throughout. That problem is real and well documented in AI video research. ByteDance's published work on temporal coherence in MMDiT suggests they've been working on exactly this. The name comes from the leak. The underlying problem and ByteDance's direction toward solving it both check out. But the specific branded name and spec are not confirmed anywhere.
Credible: MMDiT V2 architecture upgrade. The leak references a second-generation version of the dual-branch MMDiT. This is the most technically boring claim in the leak, which is probably why it's also the most likely to be true.
ByteDance has published incrementally on MMDiT. A V2 iteration between 2.0 and 3.0 is exactly what you'd expect from a team on this cadence. I call this one credible.
Possible: 1/8 the cost of today's Seedance 2.0. This is the headline number that made Rhett Reese panic. He co-wrote Deadpool and Wolverine, and he posted on X, "I hate to say it. It's likely over for us." And honestly, that reaction makes sense. But I want to be precise about what I can verify. Seedance 2.0 today on Volcano Engine costs roughly 1 yuan per second, which is about 14 cents. Third-party access through Fal or Segmind runs from 5 to 20 cents per second. The cost trajectory is already moving toward cheaper. A further 8x reduction with a new architecture?
It's possible, but I haven't seen a ByteDance source confirm that number, and I'm not treating it as fact.
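The arithmetic behind the 1/8 cost claim can be sketched in a few lines of Python. The 14-cent baseline is the Volcano Engine figure quoted above, the 8x factor and the 18-minute length are the leak's unconfirmed numbers, and the function and constant names are illustrative assumptions rather than anything published by ByteDance.

```python
# Back-of-envelope cost comparison using figures quoted in the video.
# The 8x reduction and the 18-minute length are unverified leak claims.

BASELINE_CENTS_PER_SECOND = 14.0   # Seedance 2.0 on Volcano Engine (approximate)
CLAIMED_REDUCTION_FACTOR = 8       # leak claim, not confirmed by ByteDance
CLIP_MINUTES = 18                  # leak claim for single-prompt length


def cost_usd(cents_per_second: float, minutes: float) -> float:
    """Return the generation cost in US dollars for a clip of the given length."""
    seconds = minutes * 60
    return cents_per_second * seconds / 100


claimed_cents_per_second = BASELINE_CENTS_PER_SECOND / CLAIMED_REDUCTION_FACTOR
baseline_total = cost_usd(BASELINE_CENTS_PER_SECOND, CLIP_MINUTES)
claimed_total = cost_usd(claimed_cents_per_second, CLIP_MINUTES)

print(f"Claimed rate: {claimed_cents_per_second:.2f} cents/second")
print(f"18 minutes at today's rate: ${baseline_total:.2f}")
print(f"18 minutes at the claimed rate: ${claimed_total:.2f}")
```

Under those assumptions, an 18-minute clip would drop from about $151 to about $19. That is only what the claim implies if it holds, not a confirmed price.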
Possible: native multi-shot directing and real-time preview. The leak describes a built-in multi-shot director mode, a movie agent layer. You describe a scene, and the model works out the shot structure itself. ByteDance has filed patents in this area. There is no public demo anywhere. That puts it in the possible range. It lines up with what the team is building, but I can't point to a receipt for it yet. Real-time preview at lower resolution is a standard inference optimization. That part is plausible for a production release, but also unconfirmed.
Unverified: specific June release date.
The leak implied a June 2026 release window for 3.0. I'm not buying it as stated. ByteDance is in the middle of a real compliance situation. The global rollout was paused in March and reopened in April with C2PA watermarks and facial authentication gates. The US market is still excluded. A major new model release into that environment, on a specific date, from a single unaffiliated source? That's a pretty weak signal.
June matters for 2.1 and 2.0 mini. More on that in a moment, but I wouldn't plan around a 3.0 June drop.
Unverified: 4K native output and Hollywood-grade quality claims. Kling 3.0 from Kuaishou launched native 4K at 60 FPS on April 23rd. I can point to receipts on that one. The claim that Seedance 3.0 matches or exceeds that with Hollywood-grade quality comes entirely from @MokouCN.
There is no demo and no published paper.
The Verge, when they covered Seedance 2.0 outputs, called them still "slop" relative to pro production. That was their word, not mine. Here's the point.
Hollywood-grade is a marketing phrase, and right now it's attached to a single unverified source. Wait for footage.
So, to summarize the leak: three claims in the possible range, three I'd hold at arm's length. That's not a bad ratio for a pre-announcement leak from a single unverified source, but it means half the claims you've seen retweeted this week are not something I'd stake anything on yet.
Here's the leaderboard in real time, because the numbers shift. Seedance 2.0 sits at the top of the audio-paired categories. Happy Horse 1.0 leads the no-audio side. Kling 3.0 is strong in the 4K bracket. Veo 3.1 is still in the top five. Gemini Omni Flash launched May 19th at I/O 2026. It's not yet on the Artificial Analysis leaderboard, so I can't rank it here by Elo. Google is betting on conversational editing and YouTube Shorts distribution, a different product strategy than raw generation quality. Sora's app shut down on April 26th. The API sunsets on September 24th.
OpenAI built the model that launched this whole category and then watched ByteDance and Kuaishou ship past it in under two years. The market moved, and the leaderboard reflects it. On pricing, Seedance 2.0 runs 14 cents per second on Volcano Engine direct, and 5 to 20 cents per second through third-party aggregators.
Happy Horse and Kling are in a similar range. Veo through Google AI Studio is higher, particularly for commercial tiers.
Quick aside: if you want to test Seedance right now, we built AI Master Studio for our own production workflow first. It's the tool our team uses every day to generate video for this channel and our clients' projects. Seedance 2.0 is integrated alongside Veo and Kling 3.0, all under one interface with watermark-free outputs. The cost per generation is lower than going direct on most models. The workflow is one prompt.
You get side-by-side outputs across models and pick what works. We didn't build this to sell software. We built it because we were tired of juggling tabs, API keys, and watermarked exports. We just wanted to ship videos. When Seedance 3.0 lands, it'll be in the studio within days. That's how we operate. So, if you want early access without the setup friction, the link is in the description. One link at aimaster.me.
Access is open right now, no waitlist.
All right, back to what you should actually do this week. Here's my honest take on the near-term picture. The smarter move right now is not waiting for 3.0. The more important story for the next 30 days is Seedance 2.1 and 2.0 mini. Two sources I trust more than one-off Twitter leaks are WaveSpeed and Pandaily. Both track ByteDance product updates closely. They're reporting Seedance 2.1 will deliver roughly a 20% quality lift over 2.0. That is a meaningful improvement. And 2.0 mini is priced around 7.3 cents per second.
That's cheaper than 2.0 Fast while reportedly beating it on quality. Both are rumored for June 2026. Those are not confirmed, but they're better sourced than the 3.0 leak.
Where can you actually get access today? Volcano Engine direct is ByteDance's cloud platform. It's available in most markets, but you'll need to set up an account and handle the API setup yourself. Third-party aggregators like Fal, Segmind, and EvoLink all support Seedance 2.0 with simpler onboarding.
CapCut has started rolling out Seedance-powered generation in Brazil, Indonesia, Malaysia, Mexico, the Philippines, Thailand, and Vietnam as of March 26th. If you're in one of those markets, you may already have it in your CapCut app. The US market is still excluded from the direct rollout. If you're in the US, third-party access or AI Master Studio is your cleanest path today.
Here's one concrete thing you can do right now: test a prompt structure on Seedance 2.0 that works around the coherence limitation while you wait for 3.0. The structure is scene description, then a comma, then character description, then "maintaining consistent appearance," then the action. Something like: "Sunlit rooftop cafe, Tokyo, early morning, a woman in a yellow linen jacket, dark shoulder-length hair, maintaining consistent appearance, pours coffee and looks out at the skyline."
That phrase, "maintaining consistent appearance," anchors the model's character reference across a short clip.
It's not narrative memory chain, but it works on 2.0 today while you wait for 3.0 to make it automatic. The overall verdict on timing: don't pre-order the hype. Test what's already shipping.
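The prompt structure described above (scene, then character, then the anchor phrase, then the action) can be expressed as a small template. This is a minimal sketch: `build_prompt` and its parameter names are hypothetical helpers for illustration, not part of any Seedance or Volcano Engine API, and the code only assembles a string without calling any model.

```python
# Hypothetical helper that assembles a prompt in the order described above:
# scene, character, the anchor phrase, then the action, separated by commas.

ANCHOR_PHRASE = "maintaining consistent appearance"


def build_prompt(scene: str, character: str, action: str) -> str:
    """Join the parts into a single comma-separated prompt, scene first."""
    # Strip stray whitespace and trailing punctuation so the separators stay clean.
    scene = scene.strip().rstrip(".,")
    character = character.strip().rstrip(".,")
    action = action.strip().rstrip(".,")
    return f"{scene}, {character}, {ANCHOR_PHRASE}, {action}."


prompt = build_prompt(
    scene="Sunlit rooftop cafe, Tokyo, early morning",
    character="a woman in a yellow linen jacket, dark shoulder-length hair",
    action="pours coffee and looks out at the skyline",
)
print(prompt)
```

Keeping the anchor phrase in one constant makes it easy to reuse the same character description across several short clips, which is the workaround the video suggests for 2.0.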
Seedance 2.0 is the number one audio-paired video model on the public leaderboard right now. That's not a rumor, that's the leaderboard.
Let me address the bigger question underneath all of this. What actually changes when one prompt generates 18 minutes of coherent narrative? I'm going to give you the workflow math and leave the interpretation to you. Take an average indie short film: 15 minutes, festival quality. It costs somewhere between $50,000 and $200,000 depending on crew size and production scope. CnTechPost reported that Volcano Engine clients, short drama producers, are already seeing 70 to 90% cost reductions with Seedance 2.0. Let's be conservative and say 50%. That's still a number that changes who can greenlight a project.
The parts of production that shrink first are the ones running on pure visual labor: no dialogue, no actors performing. Storyboarding, pre-vis, b-roll coverage, establishing shots: these are already happening with AI in production right now. Ruairi Robinson's viral Pitt vs. Cruise rooftop fight clip wasn't a feature film. It was a pre-vis level demo. That's the wedge. Original writing still needs humans. Performance work where a specific actor's presence is the product, same. And directing with a real creative vision doesn't compress either.
Add anything that needs rights clearance on real people or IP. A Marvel actor fighting a digital environment doesn't compress the same way a b-roll shot of a city skyline does. Those distinctions are real with 2.0 and probably with 3.0, too. I'm not telling you this is good or bad for Hollywood. I'm telling you these are the parts that move first and the ones that don't. You can do your own math from there.
The distribution angle is worth noting as a pure fact. Jianying and CapCut combined have over 800 million monthly active users. If Seedance 3.0 ships and ByteDance routes it through that distribution layer, it spreads differently than anything else in this space. OpenAI has the ChatGPT user base behind it. Google has Workspace and YouTube combined. ByteDance has TikTok's editing layer at 800 million monthly active users. That's a different kind of reach. It's part of why both the cease and desist and the senators' letter happened. The incumbents understand the distribution math, even if the model quality is still debated.
Let me give you the 20-second version of where I land on everything. From the @MokouCN leak, two claims hold up as possible: 18-minute generation and narrative memory chain. The direction is real, the source isn't confirmed. The MMDiT V2 upgrade is the most boring claim in the leak, which is exactly why I'd call it likely. The cost claim and multi-shot director mode are possible. I want to see confirmation before I commit to either. The June 3.0 release date and Hollywood-grade quality, I wouldn't plan around either of those.
Here's the verified side. Seedance 2.0 is the number one audio-paired video model on Artificial Analysis right now. The near-term story is 2.1 and 2.0 mini in June. The US market is still excluded from direct access. Sora is out of this race. Happy Horse leads on raw no-audio fidelity. Gemini Omni Flash is interesting, but not yet benchmarked.
If you want to actually test Seedance 2.0 today, AI Master Studio has it integrated with other video tools like Veo and Kling under one interface, all watermark-free. Link in the description at aimaster.me.
I'll see you when the next model drops.