SubQ: New AI with 12M Token Context Window!
Added: 2026-05-10

677 views · 40 · 13:07 · JulianGoldieSEO · Originally released: 2026-05-09

SubQ’s sub-quadratic scaling is a bold architectural pivot, but expanding context windows often leads to diminishing returns in reasoning precision compared to targeted retrieval. The pedigree of the team is undeniable, yet extraordinary claims of 1000x efficiency require empirical proof beyond a compelling marketing narrative.

[00:00:00]SubQ: new AI with a 12M token context window. SubQ AI's 12 million token context window just broke the game. Let me show you what this thing can actually do. A new AI startup out of Miami just dropped a model that reads 9 million words in one prompt. That's about 120 books in a single call, and they're charging less than 5% of the cost of Claude Opus to do it. CTO Alex Weedon posted this on X yesterday. He said SubQ is 52 times faster than FlashAttention at 1 million tokens, and that it uses 1,000 times less compute. The AI world has spent the last 24 hours arguing about whether this is real or fake. So I went and read the papers, the benchmarks, and what the skeptics said, and I'm going to break it all down for you in plain English, no jargon, so you know exactly what this means and how to use it. Here's the headline number first. SubQ has a 12 million token context window. To put that in normal words, a token is roughly 3/4 of a word, so 12 million tokens is around 9 million words. That's 120 full-length books, your entire email inbox for the last 5 years, or every contract your business has ever signed, all loaded into one prompt at one time. Now you might be thinking, hold on, doesn't Claude Sonnet 4.7 already do 1 million tokens? Doesn't Gemini 3.1 Pro do 1 million tokens? They do. Here's the problem nobody talks about. Those models say they can handle a million tokens.
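The token arithmetic above can be checked with a few lines of Python. This is a sketch using the video's rule of thumb (one token is roughly 3/4 of a word); real tokenizers vary by model and language. The 75,000-words-per-book figure is an assumption implied by the video's own numbers (9 million words divided by 120 books), and the function names are hypothetical.

```python
# Rule of thumb quoted in the video: one token is roughly 3/4 of an English word.
# Real tokenizers differ by model and language, so treat results as estimates.
WORDS_PER_TOKEN = 0.75

# Assumption: 9 million words / 120 books implies about 75,000 words per book.
WORDS_PER_BOOK = 75_000


def tokens_to_words(tokens: int) -> float:
    """Estimate how many English words fit in a given number of tokens."""
    return tokens * WORDS_PER_TOKEN


def words_to_books(words: float) -> float:
    """Express a word count as a number of full-length books."""
    return words / WORDS_PER_BOOK


for window in (1_000_000, 12_000_000, 50_000_000):
    words = tokens_to_words(window)
    print(f"{window:>11,} tokens ~ {words:>11,.0f} words ~ {words_to_books(words):,.0f} books")
```

Under the same rule, the 50 million token target mentioned later in the video works out to about 37.5 million words.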

[00:01:14]In reality, they fall apart way before they hit that limit. There's a benchmark called MRCR v2. It tests whether a model can actually find and use information spread across a long prompt. GPT-5.5 scored 74% on it. Claude Opus 4.7 scored 32.2%.

[00:01:29]Gemini 3.1 Pro scored 26.3%. SubQ scored 83% in their research results. Their production model scored 65.9%.

[00:01:37]So it beats Opus by double, beats Gemini by triple, and it does it at 12 times the context size. And they didn't stop there. On a coding test called SWE-bench Verified, SubQ scored 81.8%. Opus 4.6 scored 80.8%. So this small startup just edged out Anthropic on a coding benchmark, with a model that's, in their own words, way smaller than what the big labs build. The cost number is the one that made me sit up. There's a benchmark called RULER 128K. It tests long-context accuracy.
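To see what "double" and "triple" mean here, the sketch below divides the MRCR v2 scores quoted in the video. The numbers are the video's claims, not independently verified results, and the dictionary layout and `ratio` helper are just for illustration.

```python
# MRCR v2 scores (percent) as quoted in the video; not independently verified.
mrcr_v2 = {
    "SubQ (research)": 83.0,
    "SubQ (production)": 65.9,
    "GPT-5.5": 74.0,
    "Claude Opus 4.7": 32.2,
    "Gemini 3.1 Pro": 26.3,
}


def ratio(a: str, b: str) -> float:
    """How many times higher model a's score is than model b's."""
    return mrcr_v2[a] / mrcr_v2[b]


for subq in ("SubQ (research)", "SubQ (production)"):
    for rival in ("Claude Opus 4.7", "Gemini 3.1 Pro"):
        print(f"{subq} vs {rival}: {ratio(subq, rival):.2f}x")
```

Against Claude Opus 4.7, the research model's 83% works out to about 2.6 times, and against Gemini 3.1 Pro about 3.2 times. The production model's 65.9% gives roughly 2.0 and 2.5 times, so "double" still holds for the model you can actually use, but "triple" does not.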

[00:02:05]Claude Opus scored 94% on it for about $2,600 worth of compute. SubQ scored 95% on it for $8. That's roughly a 300 times reduction in cost: comparable accuracy for a tiny fraction of the spend. So, how did they pull this off?
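The cost claim is simple division. Here is a minimal sketch using the RULER 128K figures quoted in the video (about $2,600 of compute for Opus versus $8 for SubQ). The `cost_reduction` function name is hypothetical.

```python
def cost_reduction(baseline_usd: float, candidate_usd: float) -> float:
    """How many times cheaper the candidate run is than the baseline run."""
    return baseline_usd / candidate_usd


# RULER 128K figures as quoted in the video (compute spend in USD, accuracy in %).
opus = {"cost_usd": 2_600, "accuracy": 94}
subq = {"cost_usd": 8, "accuracy": 95}

factor = cost_reduction(opus["cost_usd"], subq["cost_usd"])
print(f"SubQ is {factor:.0f}x cheaper")  # 325x, which the video rounds to "300 times"
print(f"Accuracy difference: {subq['accuracy'] - opus['accuracy']} percentage point")
print(f"A quarter of that saving would still be {factor / 4:.0f}x")
```

The exact ratio is 325, so "300 times" is a slight understatement of the claimed figure, not an exaggeration.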

[00:02:18]Let me explain it like you're 10 years old. Imagine you're at a party with 1,000 people. Every single time a normal AI model wants to say something, it looks at every single person in the room and asks, "Are you connected to me?"

[00:02:28]It does that for all 1,000 people. So that's 1,000 times 1,000 checks: a million checks. That's why long prompts get expensive fast. Every time you double the words, the work goes up four times.
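The party analogy maps directly onto counting comparisons. The sketch below counts pairwise checks for dense attention and, for contrast, for a hypothetical scheme in which each token compares against a fixed budget of k tokens. The budget k = 64 is an arbitrary illustrative value, not anything SubQ has published, and the sketch ignores the cost of deciding which k tokens to check.

```python
def dense_checks(n_tokens: int) -> int:
    """Dense (full) attention: every token is compared with every token."""
    return n_tokens * n_tokens


def fixed_budget_checks(n_tokens: int, k: int = 64) -> int:
    """Hypothetical linear scheme: each token is compared with only k tokens."""
    return n_tokens * k


for n in (1_000, 2_000, 4_000):
    print(f"{n:>5} tokens: dense={dense_checks(n):>10,}  fixed-budget={fixed_budget_checks(n):>7,}")

# Doubling the input quadruples dense work but only doubles fixed-budget work.
print(dense_checks(2_000) / dense_checks(1_000))                # 4.0
print(fixed_budget_checks(2_000) / fixed_budget_checks(1_000))  # 2.0
```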

[00:02:39]That's called quadratic scaling. It's why your favorite AI model gets slow and dumb when you paste in a big document. SubQ does something different. The CTO, Alex Weedon, explained it like this: he said only a small portion of those connections actually matter, and the rest are noise. So instead of checking every single connection, SubQ figures out which connections matter and only checks those. They call it subquadratic selective attention, or SSA. Double the words and the work only doubles instead of quadrupling. That's linear scaling. At 12 million tokens, that adds up to about 1,000 times less compute.

Now, I want to give you a short break from the tech for one second. If you've been watching this and thinking, "Okay, this is moving way too fast for me to keep up," that's exactly why I built the AI Profit Boardroom. Inside, we have four live coaching calls every week where I walk you through how to actually use models like SubQ to save hours in your business. There are daily tutorials with step-by-step guides, and a 30-day roadmap for setting up long-context AI agents that read your entire customer database, your full email history, and every Zoom call you've ever had, and answer questions for you in seconds. We've got 2,800 business owners in there right now, many of them already testing SubQ for client work and lead gen. And there's a member map so you can find people near you doing the same stuff. Link in the description, or go to aiprofitboardroom.com.

[00:03:51]Okay, back to it. Let me show you why this matters for normal people, not just developers. Here's the first big one.

[00:03:56]Right now, when you upload a long document to ChatGPT or Claude, it usually hits a wall. You paste in your 200-page PDF and it forgets the first half by the time it gets to the end.

[00:04:05]People came up with workarounds. The most common one is called RAG, which stands for retrieval-augmented generation. The AI chops your document into tiny pieces, looks at the pieces it thinks are relevant, and answers based on those.
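The RAG workflow described above (chop, pick the relevant pieces, answer) can be sketched in a few lines. This is a deliberately simplified version: it ranks chunks by keyword overlap with the question, whereas production RAG systems typically use embedding models and a vector store. The function names and the sample transcript are hypothetical.

```python
import re


def chunk(text: str, words_per_chunk: int) -> list[str]:
    """Split text into consecutive chunks of at most `words_per_chunk` words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]


def terms(text: str) -> set[str]:
    """Lowercase alphanumeric terms, used as a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks: list[str], question: str, top_k: int) -> list[str]:
    """Return the top_k chunks sharing the most terms with the question."""
    q = terms(question)
    ranked = sorted(chunks, key=lambda c: len(q & terms(c)), reverse=True)
    return ranked[:top_k]


transcript = ("The client said our pricing is too high. We discussed onboarding "
              "next week. Later she asked about annual pricing discounts.")
pieces = chunk(transcript, words_per_chunk=8)
for hit in retrieve(pieces, "What did clients say about pricing?", top_k=2):
    print(hit)
```

With 8-word chunks, the sentence "Later she asked about annual pricing discounts" is split across two chunks, so the top-ranked chunk mentions annual pricing discounts without saying who asked. That is the kind of lost connection between sections the video describes.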

[00:04:17]It works, but it misses stuff. It loses the connections between sections and gets confused. SubQ kills RAG. You don't need to chop. You don't need to chunk. You just paste the whole thing in. Here's a real example. Imagine you're running a coaching business. You have 500 client call transcripts saved up. You want to find every time someone mentioned a problem with their pricing. With Claude or GPT, you have to build some weird search system to do this. With SubQ, you paste all 500 transcripts in and ask the question. It reads everything and gives you the answer with full context.

[00:04:47]That's one use case. Here's another.

[00:04:50]Think about contracts. If you're a small business owner, you've signed a lot of stuff: lease agreements, vendor contracts, customer terms of service, employment agreements, insurance policies. Most people never read them past page two. With the 12 million token window, you load every single contract you've ever signed in one go and ask, "Where am I exposed? Do I need to renegotiate this year? Which of these auto-renew without me knowing?" Then you get one clean answer that pulls from all of them. Copy, paste. No chunking. No missing details. Let me give you another one that hits home: customer support. If you run any kind of business, you probably have years of customer emails sitting in Gmail. Hundreds of thousands of messages. Every complaint, every refund request, every happy review. With SubQ, you load all of it. You ask, "What are the top five reasons people churn? What's the one feature people keep asking for that we haven't built? Which customers are about to leave based on the tone of their last message?" You don't need a data team.

[00:05:41]You don't need a fancy dashboard. You just ask, and it reads everything you've ever received and tells you. I want to slow down here, because this is the part most people miss. The big shift isn't just bigger context. It's the death of the workaround. For the last 3 years, every smart AI builder has been duct-taping things together: RAG pipelines, vector databases, fancy chunking strategies, embedding models, retrieval re-rankers. All of that complexity exists because attention was too expensive. If SubQ holds up, all of that goes in the trash. The whole layer of plumbing that startups built billion-dollar companies around just becomes pointless. You stop building elaborate filing systems. You just hand the AI the whole filing cabinet. Now, I have to be honest with you. Some smart people are skeptical, and they have good reasons. Let me steelman them so you know the full picture. A guy named Will Depuy, who's a respected AI engineer, pointed out that SubQ is probably built on top of an open-source model like Kimi or DeepSeek. The CTO, Alex Weedon, confirmed this on X. He said yes, they're starting from open-source weights because they're a young company.

[00:06:41]This isn't a model they trained from scratch with billions of dollars. It's a clever architecture sitting on top of work other labs already did. That's not a bad thing, but it's a different story than building a frontier model from scratch.

[00:06:52]The big benchmark scores are from a research model. The one you can actually use through their API right now is called SubQ1M preview, and it scored lower on MRCR v2: 65.9% instead of 83%. It still beats Opus by a lot, but not by the headline margin. There's a writer named Dan McCarty who said this on X yesterday, and I'll quote him directly because it's the funniest line in this whole story.

[00:07:11]He said, "SubQ is either the biggest breakthrough since the transformer, or it's AI Theranos." That's the mood right now. We don't know yet. The benchmark is a single run, and the model's not generally available. Past attempts at this kind of architecture, like Mamba, RWKV, and DeepSeek sparse attention, all promised the same thing and ended up being smaller wins than the hype suggested.

[00:07:31]There's also a startup called magic.dev that claimed a 100 million token window a while back, and that never really materialized publicly. So caution is fair. Still, there's a real technical reason this might hold up. When DeepSeek tried sparse attention, the part that picks which connections matter still had to check every connection in order to pick. So it was still quadratic, just hidden behind a sparse layer. SubQ claims their selection step itself is linear.
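The point about the selection step can be made concrete with a toy. In the sketch below, a naive selector scores every (query, key) pair and then keeps the k best keys per query, so the attention step touches only n × k pairs while the selection step still does n × n work. This is an illustrative toy with random scalar scores, not DeepSeek's or SubQ's actual method; the function name and parameters are hypothetical.

```python
import heapq
import random


def toy_sparse_attention_cost(n_tokens: int, k: int, seed: int = 0) -> tuple[int, int]:
    """Count the work done by a toy top-k sparse attention layer.

    Returns (selection_scores, attended_pairs):
      selection_scores: pairs the naive selector had to score to pick the top k
      attended_pairs:   pairs the attention step actually used
    """
    rng = random.Random(seed)
    keys = [rng.random() for _ in range(n_tokens)]
    selection_scores = 0
    attended_pairs = 0
    for _ in range(n_tokens):
        query = rng.random()
        # Naive selection: score every key for this query (the hidden n * n cost).
        scores = [query * key for key in keys]
        selection_scores += len(scores)
        # Keep only the k highest-scoring keys; attention then looks at just these.
        top = heapq.nlargest(k, range(n_tokens), key=scores.__getitem__)
        attended_pairs += len(top)
    return selection_scores, attended_pairs


for n in (200, 400):
    print(n, toy_sparse_attention_cost(n, k=8))
```

Doubling n from 200 to 400 doubles the attention work (1,600 to 3,200 pairs) but quadruples the selection work (40,000 to 160,000 scores). A selector that is genuinely linear, as SubQ claims for its own, would have to avoid scoring every pair; this sketch does not show how that could be done.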

[00:07:54]If that's true, this is the real deal.

[00:07:56]If they're hiding a quadratic step somewhere, it's just another nice paper.

[00:08:00]Here's the thing: even if SubQ ends up being half as good as they claim, half is still a generational leap. Half the cost of running a million-token prompt is still huge. Six million tokens of usable context is still six times what anyone else offers. Let me tell you what's already real. First, the team. CEO Justin Dangel is a five-time founder. He's built and exited companies in health tech, insurance, and consumer goods. CTO Alex Weedon was a software engineer at Meta and ran generative AI at a company called Tribe AI, where he led over 40 enterprise AI builds. They have 11 PhD researchers from Meta, Google, Oxford, Cambridge, ByteDance, and Adobe. This isn't two guys in a garage. This is a serious team with serious backers. Here's what they're shipping. There are two products in beta right now. The first is the SubQ API, which gives you the full 12 million token window. Feed it your code base, your documents, your data, whatever. It reads it and answers. The second is SubQ Code, a command-line agent that plugs into Claude Code, Codex, and Cursor. You don't even have to leave the tools you already use. SubQ Code just makes them way better at understanding your whole project. And they said they're targeting a 50 million token window for Q4 of this year. 50 million tokens is over 35 million words. That's the entire library of a small university in one prompt. Let me show you what this unlocks if it's even half real. Personal AI. Right now, ChatGPT remembers something like 30 messages back. With 12 million tokens, your AI assistant can hold every conversation you've ever had with it. Every note you've taken. Every voice memo. Every text. It actually knows you. Not in a creepy way, but the way a really good assistant knows their boss after working together for five years. Sales.

[00:09:32]Load every call transcript from the last year. What's the one objection that keeps coming up that we always lose to? What did our top closer say differently than the bottom three? Train every new hire on patterns extracted from real calls, instantly. Hiring. Drop in 500 resumes plus 50 client briefs plus your team's Slack history. Who fits this role and why? Which past hires looked similar at this stage? The model has the full context to actually compare, instead of guessing from a one-page CV. Bookkeeping.

[00:09:58]Two years of bank statements, invoices, and expense reports. One prompt. Find every recurring charge I forgot about.

[00:10:04]Find every customer who hasn't paid in 60 days. Find every category where I'm overspending versus last year. These aren't theoretical. These are things you can do this week if Sub Q holds up the way they say. Here's the bigger picture.

[00:10:16]For three years, the AI industry has been hitting a wall called the context window. Everyone knew bigger context was the next jump. Everyone tried. Mamba tried.

[00:10:24]Others tried. DeepSeek tried. Google's Gemini tried with 1 million tokens, but the quality crashes after 200,000. Magic.dev tried with 100 million tokens and disappeared. SubQ is the first attempt that comes with hard benchmark numbers, an API you can hit today, and a team with the credentials to take it seriously. If they're right, every product built on RAG and chunking gets rebuilt over the next 12 months. If they're wrong, this becomes another footnote in the long list of papers that didn't survive contact with reality. We'll know in 6 to 8 weeks. That's how long it takes for independent benchmarks to come out and for real developers to start posting their results. I'll be honest with you about what I think. The cost numbers are too good for me to ignore. Going from $2,600 down to $8 on the same benchmark is not a small claim. If even a quarter of that is real, the AI economy shifts, because the bottleneck has never been intelligence. It's been the price of using intelligence on a lot of data at once. SubQ is attacking that bottleneck head-on. Here's what I'd do if I were you. First, sign up for early access at SubQ AI. It's free to get on the list. Then start thinking about what you'd ask an AI if it could read everything in your business at once.

[00:11:28]Don't wait until the tech is mainstream.

[00:11:30]Make the list now so when access opens, you're ready. Learn how to use long context properly. There's a real skill to writing prompts that take advantage of millions of tokens, and most people are still writing prompts like they're using a chatbot from 2023. Speaking of that last point, here's where I want to wrap. If you want help putting all of this into action, come join us in the AI Profit Boardroom. We're already running tests with SubQ inside the community, walking members through exactly how to set up long-context workflows that read every customer email, every contract, every call transcript, and turn it into one clear answer. We've got 30-day roadmaps for building AI agents that handle your whole inbox, your whole sales pipeline, your whole client database. Daily tutorials ship every single day. There are four live coaching calls a week where you can bring your specific use case, and we'll show you how to build it. There's a members' prompt library packed with long-context prompts that actually work, and a member map so you can connect with other AI builders near you. Link in the description, or go to AIProfitBoardroom.com.

[00:12:27]And if you want the full process, the SOPs, and over 100 AI use cases like the ones I just walked through, come join the AI Success Lab. It's a free community. Links in the comments and description. You'll get all the video notes from this episode, plus access to 67,000 members who are crushing it with AI right now. The shift here isn't really about Sub Q. It's about what happens when one of the biggest constraints in AI suddenly stops being a constraint. The companies that figure out long context first are going to look unrecognizable in 12 months. The ones that don't are going to get left behind, slowly at first, then all at once. Be in the first group. The tools are showing up faster than people can learn them.

[00:13:03]The only thing you can do is start learning today. I'll see you in the next one.

#seo #chatgpt #seotips
