
How a reasoning model cracked an 80-year-old math problem — the OpenAI Podcast Ep. 20
Added: 2026-06-04

353 views · 30 · 41:17 · OpenAI · Original Release: 2026-06-04

AI reasoning models with extended test-time compute can solve mathematical problems that have resisted human mathematicians for decades. An OpenAI model demonstrated this by disproving the 80-year-old Erdős unit distance conjecture in combinatorial geometry, showing that general-purpose models can achieve breakthroughs in mathematics when given enough compute to reason through a problem.

[00:00:00]Hello, I'm Andrew Mayne, and welcome to the OpenAI Podcast.

[00:00:03]On today's episode, we're speaking with Alexander Wei, Hongxun Wu, and Lijie Chen from the reasoning research team behind a recent math breakthrough from an OpenAI model.

[00:00:13]They'll tell us the story behind the discovery and what stood out to them about the reaction.

[00:00:18]Everyone had a hard time sleeping because it's so, so exciting.

[00:00:22]Okay, this model is something that's really amazing.

[00:00:25]I mean, this is something that can be published in the best journal of math.

[00:00:30]Maybe this is one in a hundred times where it's too good to be true, but it's actually true.

[00:00:37]Lijie, tell me what you work on.

[00:00:39]Oh, I work on reasoning with Alex.

[00:00:41]Okay.

[00:00:42]How did you find your way into reasoning?

[00:00:44]Last summer, Alex had this breakthrough at IOI and IMO.

[00:00:48]You know, I used to be a participant in IOI.

[00:00:51]Okay.

[00:00:52]And then I was like, oh, that's crazy.

[00:00:54]You know, models can already win medals, gold medals.

[00:00:58]At that time, I was an assistant professor at UC Berkeley.

[00:01:02]But then I'm thinking maybe I should try to rethink my career.

[00:01:07]It seems like making the model smarter will maybe have some bigger impact on the world.

[00:01:13]And then I had a conversation with Alex last October.

[00:01:20]And then I got super excited about this thing.

[00:01:22]And eventually I just joined OpenAI.

[00:01:23]We hear IOI and IMO come up a lot.

[00:01:26]Alex, you want to unpack those for everybody?

[00:01:28]So IMO and IOI are these two competitions for high schoolers.

[00:01:34]They stand for the International Math Olympiad and International Olympiad of Informatics, respectively.

[00:01:41]And these are just devilishly hard math problems.

[00:01:44]You get two sessions for each of these exams that are like four and a half to five hours, and you just have to do three problems.

[00:01:51]And so for a long time, these were sort of an implicit grand challenge in AI.

[00:01:59]When would we be able to get models that could perform as well as the best humans on these exams?

[00:02:05]That was a pretty interesting starting point, I think, for measuring the success of the model.

[00:02:09]And we're here to talk about how far things have gone since then, which is pretty incredible.

[00:02:13]But how did you find your way into reasoning?

[00:02:15]So I did my PhD in ML.

[00:02:19]Towards the end of my PhD, I got excited about this idea of spending more compute at inference time to solve harder and harder reasoning problems.

[00:02:31]At the time, I was playing with GPT-3.5 Turbo in the API, and I didn't really get any interesting results.

[00:02:39]But there was this team at OpenAI that seemed to be doing something pretty similar.

[00:02:44]And so I got super excited about it and was lucky enough to be able to join.

[00:02:48]So probably the simplest way to describe inference-time compute is basically letting the model think longer about the problem.

[00:02:54]Yes, that's right.

[00:02:55]So basically, before this era of test-time compute, models sort of answered immediately, right off the cuff, without thinking.

[00:03:07]What inference-time compute, or test-time compute, does is give the model a chance to think, improve its answer, and try different things before it finally has to output something.

[00:03:19]That obviously just helps make the model smarter, lets them do things that they wouldn't otherwise be able to do instantly.

[00:03:25]When you started to work on reasoning, did you have an idea of where you wanted to see this go?

[00:03:32]Like what your expectations were?

[00:03:33]Were you looking at it purely from, hey, this is very cool from an academic point of view?

[00:03:37]Or did you have some sort of other vision?

[00:03:39]I think for me, the draw of reasoning when I first got excited about it was that this was something that, you know, models just obviously can't do right now.

[00:03:49]So this was like end of 2023, start of 2024.

[00:03:53]Models were, you know, struggling with grade school math problems.

[00:03:57]And so at that time, it was just like, can we just get these models to do something reasonable on math at all, let alone have them be much, much better than I am at it.

[00:04:10]I remember my first day at OpenAI, Noam Brown asked me when I thought models would get IMO gold.

[00:04:21]That was just a benchmark we talked about.

[00:04:24]I think at the time, a lot of people even within research thought that IMO gold was out of reach this year, but maybe like 2026.

[00:04:34]I felt like if we just pushed for it, we could maybe do it by April.

[00:04:43]It took until June to get a really good model.

[00:04:46]And then IMO rolled around and we were able to get gold.

[00:04:51]Zooming out, I think this happened a lot faster than I expected.

[00:04:57]And it's crazy to me that progress since then has kept up at this same sort of blistering pace.

[00:05:08]It was just 10 months ago, but IMO-level problems already feel far in the rearview mirror of AI today.

[00:05:17]Noam asked me the same question.

[00:05:19]I mean, not about IMO gold, but about whether a model can solve P versus NP.

[00:05:25]I think P versus NP might be something quite hard.

[00:05:28]I think the reason is that solving P versus NP would require building a new theory.

[00:05:36]Maybe you have to write many books of new ideas to get there.

[00:05:40]So currently, it seems we are still far from that.

[00:05:44]But, you know, maybe who knows what will happen in the future. So, yeah.

[00:05:47]Hongxun, what do you work on?

[00:05:48]Oh, I was working on theoretical computer science.

[00:05:52]I was collaborating a lot with Lijie in my PhD.

[00:05:56]I was at Berkeley.

[00:05:58]And I remember when o1 came out, I was talking to my advisor saying, "Oh, there's no barrier in models solving math problems anymore."

[00:06:09]I think he just smiles.

[00:06:10]And he knew that he was going to lose a student.

Oh, wow.

[00:06:16]So let's talk a bit about that, because it's an interesting point.

[00:06:19]Because as you said, it went from the model having just a moment to try to figure out the answer.

[00:06:24]Then all of a sudden, you've given it the ability to spend longer and think about it: reasoning.

[00:06:29]And the results have come pretty quickly, and I think surprising to a lot of people.

[00:06:35]You had a model that was able to basically disprove one of the Erdős conjectures.

[00:06:40]Could you explain that just a little bit?

[00:06:42]Last week, our models were able to produce a proof, or rather a disproof, of the unit distance conjecture due to Erdős.

[00:06:53]This was an 80-year-old open problem in combinatorial geometry. Basically, the question is: if you have n points, let's say on a piece of paper, how many pairs of them can be exactly one inch apart?

[00:07:17]And how does this number grow asymptotically with the number of points on the piece of paper?
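For small configurations, the quantity being asked about can be counted directly. The sketch below is an illustration only (not anything the model or the team ran): `count_unit_pairs` is a hypothetical helper that brute-forces all pairs, and the float tolerance is an assumption needed because irrational coordinates cannot be compared exactly. It says nothing about the asymptotic question, which concerns the maximum over all configurations of n points.

```python
from itertools import combinations
from math import hypot, isclose


def count_unit_pairs(points, tol=1e-9):
    """Count unordered pairs of points whose distance is 1 (up to tol).

    Brute force over all n*(n-1)/2 pairs; fine for small point sets only.
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if isclose(hypot(x1 - x2, y1 - y2), 1.0, abs_tol=tol)
    )


# Corners of a unit square: the four sides are unit pairs, the diagonals are not.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(count_unit_pairs(square))
```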

[00:07:22]This wasn't a trivial problem.

[00:07:25]When Erdős posed this, I think the setting was specifically points on a plane, or something like this.

[00:07:31]But there was, you know, the idea that maybe there was no better way.

[00:07:34]And this has been out there because it's a very interesting problem.

[00:07:37]And the fact that a model solved this is pretty profound.

[00:07:41]And also, this model was a general purpose model, correct?

[00:07:45]Yes, that's right.

[00:07:45]So Erdős's original conjecture was essentially that the optimal way to get as many distance-one pairs as possible on the plane was to arrange the points in a square grid.

[00:08:01]And what the model proved was that the square grid was not actually close to optimal at all, and that you can do much better with a different construction using a lot of high-powered number theory.
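The grid idea can be explored numerically with a small sketch (an illustration only, not Erdős's analysis and not the model's construction). In a k-by-k integer grid, some squared distance d is shared by more pairs than any other; rescaling the grid by 1/sqrt(d) turns exactly those pairs into unit-distance pairs. The helper names below are hypothetical. Even for k = 10, the nearest-neighbour spacing is not the best choice: squared distance 5 is hit by 288 pairs, versus 180 for squared distance 1.

```python
from collections import Counter
from itertools import combinations


def squared_distance_counts(k):
    """For the k-by-k integer grid, count unordered pairs at each squared distance.

    Squared distances are integers, so the comparison is exact.
    """
    pts = [(x, y) for x in range(k) for y in range(k)]
    return Counter(
        (x1 - x2) ** 2 + (y1 - y2) ** 2
        for (x1, y1), (x2, y2) in combinations(pts, 2)
    )


def best_rescaled_grid(k):
    """Return (d, count) for the squared distance d shared by the most pairs.

    Scaling the grid by 1/sqrt(d) makes all `count` of those pairs unit pairs.
    """
    counts = squared_distance_counts(k)
    return max(counts.items(), key=lambda item: item[1])


print(best_rescaled_grid(10))
```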

[00:08:16]Hongxun, how did you choose these problems?

[00:08:19]I guess we didn't really choose the problem.

[00:08:22]What happened was we wanted to test the upper bound of our model's capability.

[00:08:26]So we just used a selected subset of Erdős problems to test the capability of the model.

[00:08:32]I would love to know who hit Enter and asked the model the question.

[00:08:39]I guess both of us, like Hongxun and I.

[00:08:41]You guys, at the same time, like, pressed--

Yeah, maybe.

[00:08:45]I think what happened was that we were actually testing two different internal models side by side.

[00:08:52]And we both saw some correct solutions.

[00:08:55]It was really, really exciting for us.

[00:08:57]How did you know that it worked?

[00:08:59]Of course, first you ask the model to check it.

[00:09:01]Okay.

[00:09:02]But of course, you know, models sometimes are not reliable.

[00:09:04]I got it. It's good. Don't worry about it.

[00:09:07]Yeah, so after we checked with the model, it seemed plausible.

[00:09:11]Then we asked a bunch of our mathematician friends in the company, Mehtaab and Mark Sellke.

[00:09:18]And at first they were like, oh, there's no way this can be true.

[00:09:21]It's a major open problem.

[00:09:23]But after they thought about it for a day, they couldn't find any mistake.

[00:09:29]Then they became more convinced.

[00:09:30]Then eventually they're like, actually, this may be correct.

[00:09:34]Yeah, then everyone had a hard time sleeping because it's so exciting.

[00:09:39]What was the conversation like when you started getting people saying that this was accurate?

[00:09:44]For me, I was not that surprised. What happened was that Mehtaab first said this is definitely wrong.

[00:09:53]But I knew he had probably spent only five or 10 minutes looking at it.

[00:09:57]So in my heart, I didn't really believe that.

[00:10:01]But later he told me it was 50%.

[00:10:03]I was thinking, okay, if we extrapolate the trend, then maybe the next night it will be 100%.

[00:10:10]So yeah, it's a little bit dreamlike, but also it feels a little bit natural that this model would do something amazing.

[00:10:23]Later, it became more and more real that this might actually be correct.

[00:10:30]It might actually be a big deal: the first time they could publish something that would get into top math journals.

[00:10:37]We knew this day was going to come, but never knew it would become reality so fast. It's like living a dream.

[00:10:46]I mean, this is something that can be published in the best journal of math. It is way beyond IMO level.

[00:10:54]So I expected something like this to happen at some point,

[00:11:01]but maybe not this May.

[00:11:03]Yeah.

[00:11:04]One of the things I think we've seen emphasized at OpenAI is that OpenAI doesn't try to train to specific benchmarks.

[00:11:12]That OpenAI tries to build really good general overall models.

[00:11:15]And I think sometimes people say, well, we just try to build a generally smart model, and we find these things along the way.

[00:11:20]And when it comes to reasoning, it's the same thing.

[00:11:22]If something is really good at reasoning overall, you find these capabilities.

[00:11:26]Does that ring true for you?

[00:11:27]Yeah.

[00:11:28]For this model in particular, I think all of us have also just used it in place of the current model in Codex.

[00:11:43]And it works quite well as just a general purpose model.

[00:11:47]I think people will have models capable of this Erdős unit distance result at home in the near future.

[00:11:57]It's been exciting to see people react to this and pay attention to this.

[00:12:01]We went from just a very short period of time ago where people said models weren't good at math, and now models are doing this.

[00:12:09]What have been some of the more fun things you've seen online or reactions from people?

[00:12:13]Ever since we announced the results, my friends in TCS have been asking me to try their open problems; even my advisor gave me two or three open problems to try.

[00:12:27]I think the reaction was very positive.

[00:12:30]I think people really get a sense that the frontier of AI today can really come up with research output that I think many human mathematicians would be proud to achieve.

[00:12:44]And I think it's really great that we're able to communicate this, that this is the frontier of progress to the rest of the world.

[00:12:51]I've seen people make designs sketching out the model's construction.

[00:13:00]If you plot it on a grid, it's actually this very pretty, symmetric, geometric design.

[00:13:06]Yeah, I guess we're thinking of making one of the designs, framing it, and putting it on a desk or something to celebrate this moment.

[00:13:18]Yeah, I think it's going to be fun when we start seeing things like tiling problems and other stuff where we can actually just look at the artifacts that we need.

[00:13:26]So we've been hearing more about Erdős problems lately.

[00:13:29]And some seem like they weren't as challenging to solve as people perhaps thought.

[00:13:34]They just needed some attention.

[00:13:35]Yet this one seems to be a little bit more complicated.

[00:13:38]Where would you rank this?

[00:13:39]I think he proposed like a thousand questions or more, right?

[00:13:43]The Erdős problems are a collection of all the problems he asked.

[00:13:47]There are some problems where he offered money for a solution.

[00:13:52]Some problems he just noted.

[00:13:54]For this problem, he offered, I think, $500, and that was last century.

[00:14:01]So it was a little bit.

[00:14:03]And also, this is one of the central questions in this field of discrete geometry.

[00:14:10]And this has been heavily discussed by mathematicians in many discrete geometry papers.

[00:14:18]And so it's kind of one of the questions people have thought about a lot and really want to see the answer.

[00:14:23]So I would say this is more like a major open problem in a concrete field of mathematics, rather than, like many other Erdős questions, something Erdős just asked after lunch.

[00:14:39]So how do you collect that $500?

[00:14:42]Did it disappear when he passed away?

[00:14:45]I think there's a special agency for that.

[00:14:48]But you usually just frame the check.

[00:14:51]Yeah.

[00:14:51]Yeah.

[00:14:51]So maybe we'll just frame the check in Sam's office. I don't know.

[00:14:55]How do you feel this proves that reasoning is effective?

[00:15:00]Well, I think the biggest proof is the plot in the official blog: if you give the model more time to think, its accuracy on this problem goes up.

[00:15:12]Like if you give it a lot of time, it can get almost 50% correct.

[00:15:15]So more thinking, more correctness.

[00:15:17]I think that's really a proof of reasoning being effective.
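The plot described here is about the per-run thinking budget. A separate, complementary lever is running several attempts. As back-of-the-envelope arithmetic, assuming (our assumption, not a claim from the speakers) that long-thinking runs succeed independently, each with probability p of about 0.5, the chance that at least one of k runs finds a correct proof is 1 - (1 - p)^k. Every candidate still has to be checked, as the team did with human mathematicians.

```python
def prob_at_least_one_success(p, k):
    """Probability that at least one of k independent attempts succeeds,
    when each attempt succeeds with probability p (independence is assumed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


# With p = 0.5 per run: 1, 2, 4 and 8 runs.
for k in (1, 2, 4, 8):
    print(k, prob_at_least_one_success(0.5, k))
```

With p = 0.5, four independent runs would succeed at least once about 94% of the time, under the independence assumption.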

[00:15:20]But Alex, to go back to this, this isn't a math model.

[00:15:22]This is a model that can do many different things.

[00:15:25]Do you see a correlation between as these get better at solving things like mathematics that it works with other general problems?

[00:15:33]That's the hypothesis, at least: this model was not trained specifically for math.

[00:15:40]That's how we came about this: we had this new model,

[00:15:45]We wanted to take it on a test drive, essentially.

[00:15:48]And so we evaluated it on some, you know, very challenging math problems and to just see like, what can it do?

[00:15:55]When you go through the proof and look at what it came up with, were there things that surprised you, things that you would describe as creative?

[00:16:01]So for some context, like the proof is like well above my own mathematical pay grade.

[00:16:07]But just at a high level, my understanding was that this idea of taking class field theory and applying it to problems in combinatorial geometry hadn't really been done before.

[00:16:23]Though some people knew that there could be this bridge between these two fields.

[00:16:30]First of all, making the connection requires quite a bit of insight and creativity.

[00:16:40]And then to execute the proof is also a very delicate, careful affair that very few people would be able to do.

[00:16:48]I think the most surprising thing for me is that you tell the model to do something and go have lunch.

[00:16:55]And when you come back, you'll see that it actually does much better than you thought.

[00:17:00]And at that moment, you feel like, Okay, this model is something that's really amazing.

[00:17:05]So you've gone from working with GPT-3.5 Turbo, a model that was doing instant inference, to a model that's able to do incredible mathematical proofs.

[00:17:21]Is it using tools? Is it using Lean?

[00:17:23]Is it using some other things like that?

[00:17:25]Or is this all happening purely inside the model?

[00:17:28]In this case, the model is basically like Codex.

[00:17:31]It can code, and it can browse websites and find information.

[00:17:37]So it's basically a general ChatGPT setup.

[00:17:40]It can also write Python and execute it.

[00:17:42]But I don't think the model wrote any code for this.

[00:17:46]I think Lijie has a story about the Cambridge dictionary.

[00:17:50]Oh, Okay.

[00:17:51]So the first thing the model did when it got web access was to check what "unit" means in the Cambridge Dictionary.

[00:17:59]It's a little bit ridiculous.

[00:18:01]So it looked up the word unit?

[00:18:03]Yeah, just to make sure it had an absolutely correct understanding of what "unit" means.

[00:18:08]Have you seen it do other things like that, where you're saying, oh, it's trying to ground itself to make sure it understands the question?

[00:18:14]I think definitely.

[00:18:14]A lot of the time, in the model's answer, it will actually explain the definition again.

[00:18:19]Yeah.

[00:18:20]To show that it actually grounded the definitions.

[00:18:24]For people who are very knowledgeable about computer science or mathematics, like all of you, is it intimidating to see this happen?

[00:18:34]I think it should not be intimidating.

[00:18:36]I think it should be empowering.

[00:18:38]Okay.

[00:18:38]After the proof came out, mathematicians first improved the bound it proved.

[00:18:45]Second, they used the ideas behind the construction to knock down other open problems as well.

[00:18:53]So I think the trend is going to continue.

[00:18:57]Models can make breakthroughs on some very hard questions we don't know how to solve.

[00:19:03]But for digesting that idea and using that method for other good things, I think humans still have a role.

[00:19:13]So what do you think the role of somebody working in mathematics is going to be like five years from now?

[00:19:18]I think there will be a lot of AI and human collaboration.

[00:19:21]Because AI now knows a lot, right?

[00:19:24]They can connect distant ideas.

[00:19:26]But humans can also think for longer.

[00:19:29]Currently, it seems AI cannot build a new theory for math, for example.

[00:19:34]But I guess humans, once they have the help of AI, they can just grab all the ideas from distinct fields of math.

[00:19:41]I think AI can empower humans way more.

[00:19:43]Do you see this working into other fields?

[00:19:45]Are we going to see discoveries in physics?

[00:19:47]So, I mean, I can't speak for physics, but I mean, I guess like we're all researchers in AI.

[00:19:55]And I think definitely for me, like my day to day looks completely different than when I first started doing research in this field.

[00:20:06]I think so much of my work is now done by coding agents.

[00:20:13]I can just do so much more.

[00:20:17]And I think that's been a sort of magical feeling that with AI, you're really starting now to feel like you can use AI to build AI faster.

[00:20:26]How much has AI changed the way you do these sorts of things?

[00:20:29]I think it's changed completely.

[00:20:30]Even when I joined half a year ago, I was hand-writing code and looking through Slack channels for directions.

[00:20:42]But now the default is just ask Codex.

[00:20:45]And I ask Codex to do a lot of things.

[00:20:48]And then I just go to lunch.

[00:20:50]I just go to talk to people.

[00:20:54]The work has completely changed.

[00:20:56]And now you use Codex on your phone and you can check on it.

[00:20:58]Yeah.

[00:21:00]It's interesting how much more I want to do things now that you have this sort of tool that can work all the time and do stuff.

[00:21:08]Lijie, how do you explain this to your friends who are sort of trying to understand what this means and how it's going to impact other fields?

[00:21:13]So, I mean, I have some mathematician friends and I have some, you know, friends in other fields.

[00:21:19]Yeah.

[00:21:20]So I think the way I want to tell them is that I feel like, you know, some may be afraid that AI will replace them.

[00:21:25]You know, AI will just replace mathematicians.

[00:21:27]Yeah, but I think it's really about empowering every theoretical researcher.

[00:21:33]Because AI really has this advantage of knowing so much and seeing so many connections.

[00:21:38]Currently, it seems like a problem that is hard for a human may not be hard for AI.

[00:21:43]And that's a really great thing.

[00:21:45]We can use AI to solve those problems, get new ideas, and then we can digest them and make new discoveries, just like Hongxun said.

[00:21:54]So I think some of them get very excited about this.

[00:21:56]And of course, currently it's only on math, but because it's a general reasoning model, I believe that at least all theoretical researchers can benefit a lot from it.

[00:22:08]I think the dream world is one where everyone has access to top-level reasoning ability.

[00:22:14]So other researchers can use them to discover whatever they want to discover.

[00:22:19]And then basically, OpenAI will accelerate science a lot, because you are empowering every scientist to accelerate the science worldwide.

[00:22:29]And then that's our mission.

[00:22:30]So if I was a researcher, how would I get started?

[00:22:33]What advice would you have to say, Okay, try this first?

[00:22:36]We'll start with you, Hongxun.

[00:22:37]Get a ChatGPT Pro subscription.

[00:22:39]Of course.

[00:22:40]It's much, much better than thinking without Pro, because Pro thinks longer.

[00:22:47]And try to ask the boldest question you can.

[00:22:52]Sometimes I've tried to decompose a problem into smaller problems and ask the model about those.

[00:22:58]And it turned out that this worked worse than just asking the question directly, because my decomposition was not the best way.

[00:23:05]Why do you think that was?

[00:23:07]I think because as humans, we have all kinds of priors on how problems should be solved.

[00:23:12]And they are very helpful in reducing the thinking time.

[00:23:17]But very often, the priors are wrong, and there are blind spots.

[00:23:21]And AI models, they sometimes just can surprise us with discovering these hidden things.

[00:23:28]When I spoke to Alex Lupsasca, he talked about treating it kind of like a graduate student.

[00:23:33]Not talking down to it, not pitching too high, but at the right level, with the understanding that it knows the terms and is working for you.

[00:23:40]Alex, how about for you?

[00:23:41]What advice would you give somebody who's a researcher who wants to try to figure out how to be more effective with this?

[00:23:46]Yeah, I think these days a lot of it is learning to trust the model: figuring out how far you can go in trusting it, and also learning what's beyond what the model can do.

[00:24:00]Because if you don't have a sense of that, you don't use the full capabilities of the model.

[00:24:07]I think Lijie has taught me a lot about how to use these tools better.

[00:24:11]I feel like I'm sort of a dinosaur in some respects in terms of adoption.

[00:24:17]Because I think I started working at OpenAI well before these tools existed.

[00:24:23]And so I think I have a lot of old bad habits where I don't trust the models enough.

[00:24:27]I still think it's like the models of six months ago or something.

[00:24:31]That's an interesting paradigm.

[00:24:32]Okay.

[00:24:33]So, Lijie, what advice would you give?

[00:24:35]Oh, I have this method: each time, you double your trust in the model

[00:24:41]and see when it fails.

[00:24:42]And if it fails, you just go back.

[00:24:45]And you do this every month.

[00:24:48]Then you can quickly get to the point where you can maximally trust the model without breaking your stuff.
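Lijie's heuristic resembles a multiplicative search: double the level of delegation each month, and fall back to the last level that worked when the model fails. A minimal sketch, assuming trust is a single number (for example, hypothetically, hours of unsupervised agent work) and that each month's outcome is recorded as a boolean; `trust_schedule` is a hypothetical name, not a tool the speakers described.

```python
def trust_schedule(initial, outcomes):
    """Trace trust month by month under 'double it, go back on failure'.

    outcomes[i] is True if the model handled the doubled level in month i.
    Returns the trust level at the start plus after each month.
    """
    level = initial
    history = [level]
    for succeeded in outcomes:
        attempted = level * 2  # try twice as much delegation as last time
        if succeeded:
            level = attempted  # the doubled level worked, so keep it
        # on failure, go back: level stays at the last value that worked
        history.append(level)
    return history


print(trust_schedule(1, [True, True, False, True]))
```

Because trust grows geometrically while each failure only costs one month, a few months of successes are enough to move from small tasks to large ones, which matches the "really exponentially" experience described next.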

[00:24:56]And apparently, for the last five months, it's going really exponentially.

[00:25:01]Back in the GPT-3 days, I had a list of tests and things like this I would do.

[00:25:06]And I'd watch the models incrementally get better on them with GPT-4. And then by the time o1 came out, I had to throw the list out because those were just toy problems at that point.

[00:25:15]And I feel like I have to continuously sort of adjust and kind of keep trying bigger and more complicated things to do that with.

[00:25:22]Do you think that for somebody who's in mathematics or in a related field right now who's feeling a little bit concerned by this, do you think that they should be taking a more optimistic approach?

[00:25:31]I think it's legit to feel concerned, especially when a lot of the field is problem-solving oriented, because models are going to be really good at problem-solving.

[00:25:48]But mathematics is really, really much more than problem-solving.

[00:25:53]It's more about understanding the structure and building new theories, like Lijie said.

[00:26:00]And I think we should try to figure out how to better use the model to help us solve the problems we encounter, and then try to accelerate the speed at which we build new theory and come up with new understanding.

[00:26:17]I think that's the more optimistic view.

[00:26:21]When Codex becomes much better, it can do so much more things for you.

[00:26:25]You would expect to work less because Codex is good,

[00:26:28]but somehow you actually work more, because there are way more things you can do.

[00:26:32]So I actually hope this can happen for math as well.

[00:26:35]If the model becomes that good, imagine you have 10 ideas.

[00:26:39]You can ask 10 models to try them and see one of them succeed.

[00:26:43]And mathematicians don't have to do tedious calculations by themselves.
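The "10 ideas, 10 models" workflow can be sketched as a parallel fan-out that returns the first idea that succeeds. This is only an illustration under stated assumptions: `attempt` stands in for a hypothetical call that hands one idea to a model and returns a result, or `None` on failure; it is not a real OpenAI or Codex API. The standard-library `ThreadPoolExecutor` and `as_completed` handle the concurrency.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_success(ideas, attempt, max_workers=10):
    """Run attempt(idea) for every idea in parallel.

    Returns the first (idea, result) pair whose result is not None,
    or None if every attempt fails. `attempt` is a hypothetical callable.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(attempt, idea): idea for idea in ideas}
        for future in as_completed(futures):
            result = future.result()
            if result is not None:
                for other in futures:
                    other.cancel()  # best effort: only attempts not yet started stop
                return futures[future], result
    return None


# Stand-in for a model run: only idea 7 "works".
print(first_success(range(10), lambda idea: idea * idea if idea == 7 else None))
```

Note that leaving the `with` block still waits for attempts that were already running; a real setup would need its own way to stop long-running jobs.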

[00:26:47]So I would imagine maybe what happened with coding can happen to mathematicians.

[00:26:53]It's interesting, too, because when we talk about the Erdős problems, Paul Erdős was a very interesting person who found a lot of things curious and said, oh, this is neat.

[00:27:02]And we have this category of problems he put together, but there's not a lot of rhyme or reason to them.

[00:27:08]They were just things that he found curious or worked with other people on.

[00:27:12]And I think that's kind of a big thing that's neat about science in general is often we think that there are these real specific hierarchies, but literally it can just be things we're curious about.

[00:27:22]That being said, how long before there are no more unsolved Erdős problems?

[00:27:26]Some of them are very, very hard.

[00:27:28]Yeah.

[00:27:28]Yeah. So, yeah, I don't know.

[00:27:30]Do you foresee us, maybe Alex, needing to come up with a new category of problems?

[00:27:34]I think probably the hardest problems on that list, I think that list includes the Collatz conjecture.

[00:27:40]These are problems that feel very, very far out of reach of the mathematical technology of today, even though many of them are quite simple to state.

[00:27:49]So we'll still have some more things to work on and continuously move things through.

That's good to know.

[00:27:54]It's exciting, though, too, to think about what happens when you do start applying this to other areas in physics and astronomy and start looking at data sets and stuff and what kind of discoveries are going to be in store.

[00:28:07]Do you have any particular area that you're hoping to see?

[00:28:09]Oh, I hope it just solves P versus NP.

[00:28:11]Okay.

[00:28:13]How about you, Alex?

[00:28:14]I think the next milestone in my head is really like AI that can do AI research.

[00:28:21]I think there are so many unsolved problems here.

[00:28:25]In many ways, we are limited by our own intelligence.

[00:28:37]I'm optimistic about just having AI broadly available as a technology, because there's so much more demand for intelligence in the world than humans can supply.

[00:28:49]Oh, I wanted to say P versus NP too, but Hongxun said it.

[00:28:52]Yeah.

[00:28:53]So I guess beyond that, one concrete thing I'm very interested in is this: currently, it seems AI is combining ideas from different fields.

[00:29:02]And of course, in a very novel and, you know, sophisticated way, but like, can AI actually generate completely new ideas from scratch?

[00:29:11]I mean, that's something like we haven't really seen concretely in AI.

[00:29:15]And that's something I maybe want to see next happening.

[00:29:19]And that can be very cool.

[00:29:21]Have you seen traces of that yet?

[00:29:22]I think so.

[00:29:23]Even in this problem, if you look at the chain of thought, which is something like 125 pages, some of the thoughts are pretty creative, although they didn't work out.

[00:29:36]Yeah.

[00:29:37]I mean, the final idea is more like combining all the stuff.

[00:29:39]But somehow it has some creative thoughts.

[00:29:41]But it is interesting.

[00:29:43]Early on, the arguments were like, "these models..." But you could give a model two ideas that have never been connected before and ask, what's the relationship?

[00:29:51]And that would be something very, very new.

[00:29:52]It felt like something different.

[00:29:54]And I feel like we'll probably be seeing more of that.

[00:29:57]Do you see us coming up with new forms of mathematics?

[00:30:00]I think that will actually be further down the line.

[00:30:06]Next year.

[00:30:07]Maybe.

[00:30:09]I think because models now are very, very good at coming up with some idea to solve a problem.

[00:30:14]but they're not good at proposing a completely new kind of math or a new theory.

[00:30:21]How to get a model to do that is still very, very open.

[00:30:25]The way I think about it, we see something like a Moore's law for the time horizon over which these models are effective.

[00:30:37]And you feel that in math: every few months, the amount of time these models can work independently doubles, at least in terms of human-equivalent time.

[00:30:52]So for solving problems, if you're really, really good, some problems actually have pretty short paths to a solution.

[00:31:01]You don't need to take that long.

[00:31:03]But inventing new ways of doing mathematics is much more of a years- or decades-long process.

[00:31:11]And so I think it'll still take a bit of time for that exponential to get there.

[00:31:17]This was done by an internal model that you guys worked on.

[00:31:20]And since then, GPT-5.5 has been able to do the same thing.

[00:31:23]And other labs have said that they've been able to do this as well.

[00:31:27]But this was several weeks ago, which is now ancient history.

[00:31:29]What have we seen since then?

[00:31:31]I think one difference between the original result and the follow-up findings is that the original model needed no scaffolding.

[00:31:44]You just asked it to solve the problem, and it gave you the answer.

[00:31:49]You can read the original prompt and response in the note we posted on the blog.

[00:31:58]Whereas I think the follow-up efforts have had a little bit more structure or steering of the models.

[00:32:06]I think one interesting data point here is that it's really all about test-time compute scaling.

[00:32:14]After we initially solved the problem, we found (this is the plot Lijie brought up earlier) that with a large enough test-time compute budget, the model solves the problem around 50% of the time.

[00:32:29]So it's not surprising that you can get there with other methods.

[00:32:34]You can find this with other methods as well.

[00:32:37]But I think what's really important here is that as you pour in more test-time compute, you get better results.
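To make the "more test-time compute, better results" point concrete, here is a minimal Python sketch under a loudly stated assumption: treat each full run as an independent attempt that succeeds with probability p (for example, the roughly 50% figure mentioned above), and assume a correct proof can be recognized once it appears. Under that idealization, the chance that at least one of k runs succeeds is 1 - (1 - p)^k. This is an illustrative model only, not the method or measurement described in the episode, and the function name `success_probability` is hypothetical.

```python
def success_probability(p: float, k: int) -> float:
    """Chance that at least one of k attempts succeeds.

    Assumption (idealized, hypothetical): each attempt succeeds
    independently with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Probability that all k attempts fail is (1 - p) ** k.
    return 1.0 - (1.0 - p) ** k


# Doubling the number of attempts at a fixed per-attempt success rate.
for k in (1, 2, 4, 8):
    print(k, round(success_probability(0.5, k), 4))
```

With p = 0.5 this gives 0.5, 0.75, 0.9375, and about 0.9961 for k = 1, 2, 4, and 8. Real runs are unlikely to be fully independent, so read these numbers as intuition for the shape of the curve, not as a prediction.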

[00:32:43]It seems like a virtuous cycle: you take today's model, give it more compute, let it solve the problem, and you learn how these problems can be solved.

[00:32:51]Next generation models can learn from that and just get more and more efficient.

[00:32:54]Basically, it seems like it just scales forever, right?

[00:33:00]What do you think we're going to see by the end of the year?

[00:33:03]What I want to see is people using our model to discover lots of new things.

[00:33:09]And not only in math, but in all of science.

[00:33:12]Of course, OpenAI wants to do some cool math stuff.

[00:33:14]But I think it would be better if everyone could use the model to make their own scientific discoveries.

[00:33:19]And I'd expect many mathematicians to use the model.

[00:33:22]Maybe not relying on it completely, but collaborating with it to discover many more math results.

[00:33:30]I think that would be really cool.

[00:33:31]Hongxun, I've talked to some mathematicians, and know of others, who are very reticent to even try using AI in mathematics.

[00:33:40]What is the best argument you can give?

[00:33:43]I think I'll just show them the proof of the conjecture.

[00:33:48]I think it's just about productivity, right?

[00:33:50]We do math not just for the pleasure of problem solving, but also to advance the field and to understand the truth we're looking for.

[00:34:07]And AI is going to speed that up a lot.

[00:34:13]It's going to tell us what we are really struggling to find, and it will be hard to resist using it at some point.

[00:34:23]You could be an astronomer and not use a telescope, but you kind of have to ask why.

[00:34:28]Yeah, exactly.

[00:34:29]I know one of the researchers here likes to watch computers play chess against each other, and it feels like he sometimes learns things from that.

[00:34:38]Do you think that we'll learn to be better mathematicians or researchers or scientists or just thinkers in general by watching the solutions the models come up with?

[00:34:46]Looking at 125 pages of thinking is probably not very helpful for a mathematician.

[00:34:55]But just by looking at the answer, you do learn ideas you didn't know before.

[00:35:07]And that inspires later mathematical works that knock down other problems.

[00:35:12]So I definitely think mathematicians learn something from AI solutions.

[00:35:18]Yeah, some of the mathematicians we asked to review the proof, together with collaborators, actually used the idea to disprove some product conjecture, but for real numbers.

[00:35:32]I think that's one very good example.

[00:35:34]AI can crack important questions and give us ideas that we can apply elsewhere.

[00:35:40]Yeah, I think it's remarkable that this group of mathematicians, in the span of just a week, has already used it to disprove a result that is maybe of similar importance to the unit distance conjecture.

[00:35:57]So I think this is a wonderful example of mathematicians seeing this and using it as inspiration and bringing the ideas to bear on a different problem.

[00:36:09]What does this mean for the mathematical community?

[00:36:11]For us, when we do these experiments, we want to make sure we empower the academic communities we interact with. We don't want to just go into some community from the outside, try to solve a bunch of their problems, and hand them a bunch of AI slop.

[00:36:34]What we really want is to make these tools available to researchers and let them direct all this AI test-time compute at the problems they think are important.

[00:36:52]It really shouldn't be viewed as a race to solve as many Erdős problems as we can.

[00:37:01]It's more that we want to make people aware that the technology is out there and what it can do.

[00:37:08]You're not trying to solve every Erdős problem.

[00:37:10]Yeah, I wouldn't say that's our goal.

[00:37:13]This just happened to be a particularly significant result that we thought was important to share with the world,

[00:37:20]to show that this is the capability level of models today.

[00:37:24]But it's really not the goal to just go through the list as if it were a race.

[00:37:30]Do you foresee this applying to cryptography?

[00:37:34]There's also some debate about whether these models could get so good that they surpass even what quantum computing can do, which sounds kind of crazy.

[00:37:45]Yeah, I think cryptography is a really important topic these days, because the foundation of cryptography rests on problems like factoring.

[00:37:54]They're hard for computers to solve, right?

[00:37:57]But basically, we only have conjectures.

[00:38:00]There's no mathematical proof of this fact.

[00:38:02]And suppose the model gets really good at algorithms.

[00:38:07]Maybe they will prove some of the cryptographic conjectures and say, "Okay, those protocols are actually secure.

[00:38:14]We don't have to conjecture them to be secure."

[00:38:16]Or maybe they'll find some loophole.

[00:38:19]And that's also very important.

[00:38:22]I think we need to make sure the foundation of our security is good.

[00:38:28]So the model can stress-test the foundations of cryptography to make sure we have better security.

[00:38:33]What about quantum computing?

[00:38:36]I think that's a very different territory.

[00:38:39]Actually, I used to study quantum computing.

[00:38:44]My first paper was on quantum advantage, showing that for some tasks quantum computers can do better than classical computers. But so far, I think the models...

[00:38:55]I mean, they run on classical computers.

[00:38:57]They do what humans can do, maybe a bit better, whereas with quantum computers we can sometimes do fancier things, like simulating some quantum effects in chemistry, which the models probably can't.

[00:39:10]I'm not an expert on that, though.

[00:39:14]It's unclear. They're just two different paradigms.

[00:39:17]So I'm not super sure how they compare to each other.

[00:39:20]But I think AI is going to greatly accelerate the pace at which we develop quantum computers. In recent years, there has been real improvement in error correction.

[00:39:32]There are quantum error-correcting codes that use only simpler types of operations, and that really speeds up the physical implementation.

[00:39:43]So I expect more of these to come from collaboration with AI: AI can propose new quantum error-correction algorithms, and then we can develop quantum computers much faster.

[00:39:58]Once you ask the model to solve a problem, you can of course follow up with, "How did you solve it?

[00:40:04]Can you explain this part of the proof to me?"

[00:40:08]And then the model will patiently try to teach you how everything works, line by line. So it's actually not just one-shot problem solving.

[00:40:19]You can ask follow-up questions to learn how the proof works.

[00:40:24]And I really like that.

[00:40:26]One thing you learn very quickly as a researcher is that if your results are too good to be true, you probably have a bug somewhere.

[00:40:35]I think every researcher has had an experience where they see amazing numbers from their experiments and it turns out the experiment was actually wrong, the numbers were wrong.

[00:40:46]When I first heard about this from Lijie and Hongxun, that was my prior.

[00:40:50]I was like, "Oh, I'll wait for them to find the bug."

[00:40:54]But as the days went on, I had this growing optimism that maybe this is the one in a hundred times where it's too good to be true, but it's actually true.

[00:41:10]Well, gentlemen, thank you very much.

[00:41:13]Yeah, thanks so much. Thank you so much.
