A $200 subscription might buy you faster retrieval, but it cannot replace the critical judgment and original insight of a human mathematician. The hour-long wait for AI "thinking" only highlights the inefficiency of replacing genuine expertise with brute-force computation.
Deep Dive
The Truth About $200 ChatGPT for Research Math
So, it's been about 2 weeks since we gave Sam Altman $200 for OpenAI's best model. It's known as ChatGPT Pro, and I've been using it loads in doing maths research and trying to see whether it's actually that much better than GPT Plus, which goes for $20. The main thing that you get with this $200 subscription called GPT Pro is the Pro Thinking Mode, and it goes all the way down to Pro Extended Thinking Mode. So for example, if you use free ChatGPT, your queries tend to be routed to just standard, where sometimes, depending on the difficulty of the question, it might allocate some extra inference compute to your prompt.
But when it comes to the first paid tier, which is GPT Plus, you get access to thinking mode. And then if you pay 10 times more than that, so going from $20 a month to $200 a month, you'll get access to GPT Pro. And in the last two weeks, I found that when I give extended thinking mode my math problems, it generally thinks for about an hour, which is roughly three times as long as I was previously finding on the GPT Plus subscription. And previously in my tests of GPT Pro, I'd found that there wasn't a huge increase in the model's capacity to do new things mathematically. But in the last 2 weeks of testing GPT Pro day-to-day for doing maths research, I guess my opinion's changed slightly. And in this video, I want to outline the kind of truth, so far as I see it, of the capacity of this incredibly expensive AI model to do research level maths. Because look, we're not short of the AI shills telling us that it's better than PhD level, research level, you know, whatever level you want to say that it is.
>> GPT-5 is a major upgrade. Now it's like talking to an expert, a legitimate PhD level expert in anything on demand.
>> Grok 4 is better than PhD level in every subject. No exceptions.
>> But I feel like at least in my experience of testing these models, it's definitely a mixed bag when it comes to these sorts of claims. On the one hand, in the last few weeks, GPT Pro has come up with some pretty crazy equations. I say crazy, but they could be; I haven't actually checked yet whether they're right or not. I need to, you know, actually read it, which is a whole ordeal. They're equations which I don't think that I would have come up with anytime soon. And it did it in just 1 hour. But on the other hand, to claim that GPT Pro is better than research level at everything, and I suppose specifically for maths in this context, I just don't believe that that's true.
These models do have an incredible and pretty much unparalleled, at least from what I'd seen in the past, ability to retrieve information and even apply known information in new ways. But that said, in my experience, there is a huge difference between talking to an LLM and the feeling that I get of its intelligence when I'm discussing random problems in maths, like random matrix theory and some aspects of number theory, and comparing this to talking to an academic. Now, I've got some notes from a meeting that I had with one of my collaborators. Grab them. They're like this. Okay, you have a picture, okay, of a bunch of scribbles.
Okay, so here are some notes from a meeting that I had with an academic. And these notes, although they do look like kind of deranged scribbles, are worth way more than any conversation that I've had with ChatGPT. They've given me way more insight than ChatGPT has given.
And the point that I'd like to highlight with this is that although the $200 ChatGPT model is incredible at doing certain mathematical calculations, and even at trying to do things which are technically new in the sense that they're applying known things to new problems in ways that haven't been done before, I do think that it lacks a certain wisdom, and in that respect talking to a PhD student or postdoc, and definitely an academic, is just unparalleled. It's just not something which I feel like these models have yet, even the very expensive models. And the same story is true for the most expensive model of Google, which was Google's Gemini Ultra with Deep Think, and also Anthropic's Opus 4.6, which although impressive, had credit limitations and wasn't particularly groundbreaking relative to these other models. I mean, they're all fairly similar. My experience of using GPT Pro, and any of these very expensive AI models, is that they're very good at retrieving information specific to the problem at hand. And when it comes to maths, it can even construct an analytic argument, you know, with equations and stuff, to show that what it's thinking about is technically true. It's especially good at drawing on loads of different areas of maths which you might not have studied. I mean, of course, a human being can't study every single area of maths. It's incredibly broad.
And although that's a good thing when it comes to trying to solve new problems with strategies that you'd never thought about, I find that sometimes ChatGPT invokes some random area of maths, like, you know, stochastic partial differential equations or something, which I don't have that much knowledge of. Asking it to break down its reasoning into smaller steps seems more limited, in a way, than kind of being able to get a final answer. But sometimes the most difficult part of deciphering ChatGPT's response is actually trying to determine whether it's justified in the claims that it's making. And as a PhD student in maths hearing all of these claims about how ChatGPT is better than PhD level or better than research level at everything, I sometimes find it a little bit hard to believe, because actually if you were to read a PhD thesis you'll see that line by line it makes a lot of sense, and the person writing the thesis tends to genuinely understand what they're saying. But even when it comes to these state-of-the-art AI models, I find that half the problem is actually making sure that what it said is right.
It's far more difficult to check whether its reasoning is right, because often it's not, than it is to check its final answer. And that's especially true in maths because you can normally simulate a final answer, which might be an equation for something. If you've got specific known examples of the thing that you're testing, you can then ask any of these AI models to create a Python script to test it.
This is especially useful, I found, when moving my projects to Visual Studio Code with Codex, because you can basically have your project in LaTeX with folders next to it with your Python scripts, and Codex can literally go into these folders, make the Python scripts, run the Python scripts, check if they're right, make plots for them, and even then bring them back into your LaTeX file, all in one prompt. So although I'm saying that these models might not be wise, you know, wise to the same extent as when I meet an academic and talk about random matrix theory, they are still really, really good. So I'm not dissing them. Codex is insanely useful. That said, you still have to check whether the Python script is right, because you don't want some AI hallucination that's just claimed that an equation is true to then waste your time. You might subsequently look in that Python script and see that it was basically just testing two sides of the same equation, like saying A is equal to A and telling you that those two answers are the same, rather than testing the thing that it's claiming to be true, say A equals B, where B is the thing that you don't know is equal to A.
Now, one surprising limitation that I wouldn't have predicted when getting GPT Pro is just how annoying it is to have to wait an hour to get the answers. On the one hand, you do get way better quality of answers. The results of the time that it spends thinking do speak for themselves.
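To make the "A equals A" pitfall described above concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in rather than anything from the actual research: the sum of the first n cubes plays the role of the known quantity A, `claimed_closed_form` plays the role of a formula B proposed by the model, and `wrong_closed_form` is a deliberately broken formula used to show that a vacuous test passes anyway.

```python
from typing import Callable, List


def brute_force(n: int) -> int:
    """A: computed straight from the definition, independent of any formula."""
    return sum(k ** 3 for k in range(1, n + 1))


def claimed_closed_form(n: int) -> int:
    """B: a proposed closed form (hypothetical stand-in for a model's claim)."""
    return (n * (n + 1) // 2) ** 2


def wrong_closed_form(n: int) -> int:
    """A deliberately incorrect formula, used only to expose the vacuous test."""
    return (n * (n + 1)) ** 2 // 2


def vacuous_check(formula: Callable[[int], int], n_max: int) -> bool:
    # Anti-pattern: both sides come from the same function, so this is A == A.
    # It returns True for any formula, right or wrong.
    return all(formula(n) == formula(n) for n in range(1, n_max + 1))


def independent_check(
    reference: Callable[[int], int],
    formula: Callable[[int], int],
    n_max: int,
) -> List[int]:
    # Genuine test of A == B: the reference never calls the formula under test.
    # Returns every n at which the two disagree (empty list means no mismatch).
    return [n for n in range(1, n_max + 1) if reference(n) != formula(n)]


if __name__ == "__main__":
    print("vacuous check on wrong formula passes:", vacuous_check(wrong_closed_form, 50))
    print("mismatches for wrong formula:", independent_check(brute_force, wrong_closed_form, 50)[:5])
    print("mismatches for claimed formula:", independent_check(brute_force, claimed_closed_form, 50))
```

When reading a generated test script, the thing to look for is whether the reference side is computed without touching the formula being tested. An empty mismatch list over a finite range is evidence, not a proof.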
But when I'm actually trying to get some work done, it's not that useful if in a way the answer comes 1 hour after I ask it. And I'm sure some people would argue that that's a tenuous criticism. But actually, when it came to GPT Plus, which is like a tenth of the price, I never really found that the period of time that I was having to wait was kind of slowing me down in the workflow.
Whereas with GPT Pro, it's thinking like three times as long. So, if I have a question for something or I need it to do some sort of algebraic computation, having to wait well over an hour is sometimes quite frustrating, especially if the quality of the answer isn't that much better than GPT Plus. You may as well just get GPT Plus, which is 10% of the price, and get a quicker answer which is roughly the same quality. Maybe it's not quite as good, but at least you haven't spent three times as long waiting for it. This relates to one of the biggest downsides that I can see of AI, and that is that sometimes when I'm solely working using ChatGPT, I'm actually less clocked in to the work that I'm doing, especially when the waiting time is long. When I send off a long prompt that I might have spent a while writing, I often find that it's difficult to have work to do while waiting. And naturally, this tends to mean that I just don't work in the time that it's thinking. And the big fear that I have with this is that I'm actually not thinking in this process.
Which to me raises the point that at times these tools are sold to us as kind of productivity boosters. But actually, upon reflecting on the times where I've had long sessions with ChatGPT, sometimes I think that it's only a surface level productivity that I'm getting out of these models. And in all honesty, the equations that it's giving aren't very valuable and I'm not necessarily learning that much in the process. And it really doesn't matter which discipline we're talking about here. If ChatGPT is good as a default search engine for the field of research that you work in, I think that it's going to be really important to make sure that you don't overuse it as a tool for doing work. If you're spending hours a day, like I do sometimes, bouncing ideas back and forth with ChatGPT, you might find that sometimes you finish the day feeling like you've done a lot of work when really you haven't actually learned that much or made that much productive progress. Especially when it comes to a difficult problem, I think that ChatGPT has a bit of an illusory nature to it, where it makes you think that the things that it's saying are really deep and meaningful, but they lack originality and are often more like pure intellect rather than a wise or insightful interpretation that you might have expected from a scientist in the past. And then comes the argument of Codex. Now, okay, as a PhD student in maths, the amount of Codex that I can physically use seems to be inherently bounded. In the last few weeks, I just haven't even been close to using up the Codex usage. And even with GPT Plus, I don't think I've ever run out of usage for the week. Of course, if your use for these AI models is in editing some large code repository, that's great. The 10 times Codex usage is actually genuinely quite useful. But as it stands right now, I don't have enough tasks to give Codex where I trust it enough to do them.
You know, I never really wanted it to write whole sequences of algebraic proofs, because it isn't that good at it. So far, the main thing that I've found Codex to be useful for is running benchmarks in Python within the folder of the LaTeX file that I'm editing and then formatting these within the same LaTeX file. That's really useful. It's also really good at checking for spelling mistakes or notational inconsistencies and random things that I'd need to read the draft multiple times in order to pick up. So, next time you hear Sam Altman say that ChatGPT is PhD level intelligence in your pocket, or you hear Elon Musk saying that it's better than PhD or, you know, better than research level at everything.
um which is pretty much what he said.
>> Grok 4 is better than PhD level in every subject. No exceptions.
>> I hope that this makes it clear to what extent, at least I think, these claims are true and to what extent they're slightly less justified. I really appreciate all of the support. Honestly, everyone who comments and hypes the videos, channel members, I really appreciate it. Having access to these models and repeatedly making these videos forces me to do work which maybe I wouldn't have done in this extra time, like kind of doing maths as a hobby as well as a job. So, thank you very much for that. But as always, I hope that you enjoyed the video and have a good one. But even when it comes to these state-of-the-art... But even when it comes to these state-of-the-art... Oh my goodness. Even... But even when it comes to these state-of-the-art AI models, that's like quite difficult. You should try it. But even when it comes to these state-of-the-art AI models, okay,