Formal verification, rather than just fixing errors, is the critical pathway to scaling AI brilliance and achieving superintelligence. Unlike informal AI systems that rely on human experts for grading and cannot scale effectively, verified AI systems using formal proof assistants like Lean can compound intelligence, achieve better sample efficiency, and provide the rigorous foundation needed for true mathematical AGI. The key insight is that verification enables both scaling up (deeper capabilities) and scaling out (broader applications), making it essential for any AI system aspiring to reach superhuman mathematical reasoning.
Deep Dive
Scaling Past Informal AI - Carina Hong, Axiom Math
But for the first time now, I think verified AI is there to open up collaboration. Before, with blueprinting, that was human-human collaboration, and Lean was the grounding, the formal verification language. Then there is human-AI collaboration, like we're seeing now, and in the future, AI agent-to-agent collaboration. I think verified AI is for openness. It's not for meeting the requirements of closed industries. And I think verification should not be about, oh, I remember there's this article, something like chatbots mix things up, is math the solution to hallucination.
Verification to me is not about lousiness. Verification to me is about scaling brilliance, compounding brilliance. Going back to the collaboration point, it's about Ramanujan being a much stronger mathematician. He was already a really strong one, but verification helps him extend the brilliance, both scale up and scale out.
>> Welcome to the Latent Space AI for Science podcast. I'm Brandon Anderson. I build RNA therapeutics at Atomic AI, and I'm joined by RJ Honicky, the CTO of MiraOmics, working on spatial transcriptomics. It's a pleasure to have Carina Hong, CEO and founder of Axiom Math. Axiom has made a splash in several different areas.
First, they got a perfect score on the Putnam last December. I think they also had the claim of the first AI to prove research conjectures using formal verification, and, very exciting, they just yesterday announced quite a large Series A. Welcome to the show.
>> Thank you for having me.
>> You just raised $200 million, which, as one of your colleagues said, is basically the entire US budget for math research each year.
>> Is that true, actually?
>> According to his LinkedIn post, yeah.
>> Wow.
>> Apparently 250 million is the annual math budget.
>> We should spend more on math research.
[laughter]
>> Yeah, it's kind of sad, but
>> Yeah, I know.
>> But anyway, as a nerd who loves math, that's really cool. That kind of blew my mind when [laughter] I heard it. So how is it 200 million at, I guess, a 1.6 billion valuation? I don't know.
>> Yeah. Well, super excited to be here. This is a Series A, so it's a very interesting, timely podcast. We are a seven-, eight-month-old company, so it definitely means a lot to us. It's a really cool milestone. We're currently about 30 people. I think this amount of funding will give us the fuel we need to accelerate the strong execution momentum that we have so far. There are many ways people think about us. People think of us as a math startup, or a Lean startup. The other things that we do, obviously, are formal verification. We think verification is a really good first market for math.
>> And so I think this fundraise is going to let us explore some of the applied domains. As my colleague, CTO Shubho, said in the little launch video for the Series A, it lets us broaden our dreams. So yeah.
>> But still, $200 million and, I guess, a 1.6 billion valuation. How is there a market for that?
I mean, obviously you're not doing this just for the fun of proving things, although I'm sure there's a lot of that, but
>> So let's bring us back to 2024, when o1 and models like that had just come out.
>> What was Anthropic kind of secretly working on back then? It was coding.
>> And everyone knew they were working on coding. OpenAI, Meta, xAI, everyone had full knowledge that Anthropic was working on coding, and they just overlooked it. They thought, oh, that's a B2B play, they just want one vertical. People thought of coding as one vertical. And now look at where we are today: strong transfer learning from coding to reasoning, to basically a monopoly on the future of reasoning. I think that's really shocking. The people who were working on coding back then believed in something that we believe similarly about math and Lean now, which is that if you have more structured and formal data, it's going to be a lot more horizontal than the specific vertical you are tackling. If today we were doing math the informal way, with standard chain-of-thought data, training a math model based on human preference, then I would say, well, perhaps we are just a math startup. But while we are pursuing math, we are also doing things that have transfer learning to other domains. So the broader picture is that the DNA of the company remains math, all of us are math nerds, and this is a very strong culture statement. Everyone shares the mission of having AI be a superhuman mathematician, like we are seeing on the Putnam and on a batch of research conjectures. In fact, we have another batch coming. We're also thinking that this is going to be fundamental to verified reasoning. I want to talk a little bit about verified AI next, because I think you have another
>> Yeah.
Yeah. I have several things I want to get to. I want to hear about verified AI, but I do want to dig in a little bit. Do we know that Anthropic and OpenAI and everyone are not doing formal verification and using that for their rollouts and whatever?
>> I think I have a lot of rumor mill that I probably shouldn't put on the record. You know, researchers talk, they play card games.
Yeah.
>> But there are really interesting reasons why they are or are not doing it. The takeaway I have is that if you're at a frontier lab, the direction actually does change a lot, for a lot of reasons beyond your control.
>> So I want to bring us back to the AlphaProof moment. AlphaProof was so amazing. Really, the 2024 performance, 28 out of 42, was the IMO moment for me.
It was not the gold in 2025, because across 2024 and 2025 AI models could solve all the problems that are not combinatorics.
The only difference is that if you get all the problems that are not combinatorics, you get 28 in 2024 and 35 in 2025, because there was only one combinatorics question in 2025. After AlphaProof, we didn't see a lot of formal math results or progress from Google DeepMind, and that's for reasons that are not necessarily technical. But if you're at a startup with a very singular focus, which is formal math and verified AI, then you get to work on a really cool problem for a long time, and you have a much higher likelihood of getting to where you want to be in terms of progress and breakthroughs.
>> So yeah, just define that for us.
>> Yeah. A lot of people think about formal verification as an ancient subject. It existed way before deep learning, in the time of rule-based computer science. There has been a really strong push for formal verification ever since the 1980s, with really interesting historical anecdotes. I think the Paris trade union demanded that the automatic switching of the subway system be formally verified for safety purposes. So, quite interesting, a trade union for technology. And around the time of Challenger, both before and after, the European Space Agency was using formal verification for the Ariane spacecraft. Boeing and Airbus also use formal verification. Then in more recent years, there has been a big push for automated reasoning at AWS, because they have a lot of enterprise customers that really require things to be 100% verified, with no edge cases missed, and general testing doesn't satisfy the need. So a lot of people think about verification as something annoying, because it's like tax and compliance: it's making sure that we are good to go. And that's really not it. I think our competitor, when they launched, talked about formal verification and reasoning in terms of hallucination. Maybe for them formal verification is about the lousiness, the hallucination. For us, no. For us, verified AI is about the brilliance.
It's about scaling and compounding superintelligence. This is quite a deep point, and sometimes it takes a little bit of explanation. Think about a place of brilliance, for example Ramanujan. He was a brilliant mathematician, able to find a lot of interesting formulas just by intuition before he knew how to do proofs. He went to Cambridge and worked with Hardy and Littlewood, and in the famous movie The Man Who Knew Infinity there's this storyline of how hard it was for Hardy to force him to no longer rely on intuition and to do proofs. After he learned proof writing, he came out a much more powerful mathematician whose intuitions turned into theorems, and future generations of mathematicians built on those theorems. So it is a way to scale and compound the intelligence that we already have. Another example: mathematicians have been writing code in English, or in their respective countries' natural languages, for thousands of years. Why do I call it writing code? Because there's this community standard of rigorous logical deduction. Everything has to be correct step by step. Otherwise you will get cast out by your math community.
>> The law, well [laughter]
>> Rules in the community.
So it's interesting, because that is enforced by human mathematicians. It's a peer-review process, and peer review of a paper currently takes two years.
>> Okay.
>> But proof assistants and formal proof checkers like Lean still found their place. Why? If I'm a mathematician and my work can be peer reviewed by other humans, why do mathematicians even play with Lean? And why do we even talk about Lean-based assisted theorem proving? It's because it handles the low level. We're not even talking about AI here. For example, the grind tactic in Lean can currently handle a lot of math proofs at a very low level. And this is pretty shocking, because I have seen some demos from another company working in the same space, and what's in the demo can actually be completely handled by grind, which is a tactic in Lean.
>> Can you explain what Lean is to non-experts?
>> Okay. Yeah. I think our order is a little wrong. So Lean is a computer program for math proofs. It is a formal language, just like its cousins Isabelle and Coq (now Rocq), and some more distant cousins like Dafny and Agda. There's a whole sector of these formal languages. Yeah.
>> And what does it do?
>> Basically, if you have a proof written as a program in Lean, and assuming there are no weird things happening, like the use of sorry, which is a tactic that lets you take things for granted, so assuming everything is safe (which is why people have tools like Comparator and SafeVerify, and Axiom recently rolled out Verify Proof, which is about 100 times faster than Comparator), then once you execute that program, once it compiles and tells you it is correct, the proof is actually correct.
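As a minimal sketch of the point about `sorry`, the following Lean 4 snippet contrasts a statement the kernel actually checks with one that is only taken for granted. The theorem names are hypothetical and chosen for illustration; they are not from the conversation.

```lean
-- Accepted: the kernel checks that `rfl` really proves this statement.
theorem add_zero_checked (n : Nat) : n + 0 = n := rfl

-- Compiles, but only with a warning that the declaration uses `sorry`:
-- nothing has actually been proven here.
theorem add_comm_unproven (a b : Nat) : a + b = b + a := by
  sorry

-- Lists the axioms a theorem depends on; a dependency on `sorryAx`
-- reveals that a proof was skipped somewhere.
#print axioms add_comm_unproven
```

This is presumably the kind of check that dedicated proof-verification tools automate at scale: "it compiled" only means "it is proven" once escape hatches such as `sorry` are ruled out.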
>> So just like a type checker.
>> Yeah, that is based on a result called the Curry-Howard correspondence, which turns proofs into programs. So I want to talk about the magic of Lean. Why I think it's a really good programming language is that, on one hand, if you don't care about the formal part at all, if you don't care about the logic part and just want to use Lean to write code, you can. We had a candidate, who is currently working at the Lean FRO, who wrote autograd in Lean in our interview process.
>> So it's a Turing-complete language.
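A small illustration of the Curry-Howard idea mentioned above, written as a sketch in Lean 4 with hypothetical names: a proof of an implication has exactly the same shape as an ordinary function, and checking the proof is type checking.

```lean
-- A proof of `A → B → A` is a function: given evidence for `A` and for `B`,
-- return the evidence for `A`.
theorem keep_first (A B : Prop) : A → B → A :=
  fun a _ => a

-- The same term, read as an ordinary program over data.
def keepFirst (α β : Type) : α → β → α :=
  fun a _ => a

-- A proof of `A ∧ B → B ∧ A` is a function that swaps the two components.
theorem my_and_swap (A B : Prop) : A ∧ B → B ∧ A :=
  fun ⟨a, b⟩ => ⟨b, a⟩
```

If the body did not have the stated type, for instance returning the pair unswapped in `my_and_swap`, Lean would reject it the same way a compiler rejects an ill-typed program.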
>> That's right. You can do a lot of things with Lean. It's a functional programming language. You can use it to do coding, and you can use it to do math. Two in one.
>> Okay.
>> And going back to what I was getting at: if mathematicians, maybe not all mathematicians but the ivory tower and people in academia, are already enforcing that all proofs are correct, why do we even need Lean, the proof checker? It's because Lean has tactics that help them handle the low-level proof or deduction (not calculation), so that they are able to navigate in a high-level intuition space. So this is my point: formal verification, or verified AI, to us is not just about handling or kicking out the lousiness, the hallucinations, the mistakes. It's about scaling brilliance. It's about superintelligence.
>> Terence Tao actually has a great video about using Lean as a way you can collaborate, because you can
>> Exactly. That's another point I want to talk about. A lot of people think about what our market is and assume it has to be some really niche industrial area that is mission critical, safety critical. No, that's not the TAM. The TAM is all code.
>> The TAM is a right of first refusal on all AI-generated code. Right of first refusal meaning you get to choose whether you want to verify it.
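The "two in one" point, that the same language is used for coding and for math, with a tactic absorbing the low-level deduction, can be sketched as follows. The function name is hypothetical, and the sketch uses the `omega` tactic rather than the `grind` tactic named in the conversation, assuming a recent Lean 4 toolchain where `omega` is built in.

```lean
-- Coding: an ordinary function that can be run.
def twice (n : Nat) : Nat := n + n

#eval twice 21  -- 42

-- Math: a theorem about that same function. The arithmetic step is
-- delegated to a tactic instead of being spelled out by hand.
theorem twice_eq (n : Nat) : twice n = 2 * n := by
  unfold twice
  omega
```

The human states what should be true, and the tactic discharges the routine linear-arithmetic step.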
So this is the important part I want to get across: people talk about formal verification as almost painful, because it has all these stringent requirements.
>> Up until now it has been. [laughter]
>> Yes. And to us, verified generation actually means performance gain. It means higher sample efficiency. It means a startup like us, and yes, we raised some money, but with a smaller compute budget and a smaller data budget than a frontier lab, will be able to match or even exceed performance on superhuman tasks. In fact, for the Putnam exam that we just competed in, in December 2025, which we did in real time, MathArena, an organization that evaluates a lot of LLMs, found that the best LLM, DeepSeek, got 103 points out of a 120-point exam. The best human, who we now know is a student from either MIT or Chicago (we don't know which one, because they don't announce the top five winners' scores), got 110, and we got 120. I remember when we were starting this, people were asking: is it even possible that a formal math system, with orders of magnitude less data, can match or beat an informal LLM? The Putnam is the first time it did. So we're not thinking about it just in terms of the painfulness, the challenges it poses. We are thinking about verified generation, the performance gain, the improvement, the fact that, just like you would expect, RL for Lean seems to show improvement, and we're seeing evidence of that in our own coding. So this is the second point I want to make about how to think about verification and verified AI.
>> So maybe we can talk a little bit about why. Can you describe what is different about what you do versus what the frontier labs do, at least when they're building their standard RL-enhanced LLMs?
>> Yeah. So we heavily rely on a kind of data called Lean data. As we talked about, with Lean proofs you know whether the data is correct or not, and that's quite important. We have a system of models, and these models are post-trained using RL or SFT.
>> So an LLM, some sort of foundation model that you get off the shelf, and you post-train it or continue
>> Yeah. And there's obviously an inclination toward open-source base models.
>> So it does speak English,
>> probably knows how to code, but you also fine-tune it or continue
>> And the base model might be similar to what everyone else is using as well, right? If they're not pre-training their own model.
>> Yeah.
>> And then we basically do this RL for formal math. I think there's a standard pipeline, tricks of the trade that people use. We try to innovate on top of it as much as we can. We found scaling inference to have almost no wall: recursively decomposing a proof goal into many subgoals, and then learning to backtrack as well.
>> Is there a risk that you start out with what you know, a certain domain of datasets and so on, and then you start rolling out recursively in a space, but all of your training data is localized in some domain, so you're only growing, maybe logarithmically, into some large space from your initial training data? So you could get trapped, essentially. You could be really good at this, but you've just created a big jagged frontier where some other domains are just far from
>> Distribution shift, we're talking about.
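To make "decomposing a proof goal into subgoals and backtracking" concrete at the level of a single Lean proof, here is a minimal sketch. It is not Axiom's system, only an illustration of the shape such a search operates on; the theorem name is hypothetical and a recent Lean 4 toolchain with the built-in `omega` tactic is assumed.

```lean
theorem split_goal (a b : Nat) (h : a ≤ b) : a ≤ b + 1 ∧ a + 0 = a := by
  -- Decompose: the conjunction becomes two independent subgoals.
  constructor
  · -- Subgoal 1: try a cheap step first. `exact h` fails, because `h`
    -- proves `a ≤ b` rather than `a ≤ b + 1`, so `first` backtracks
    -- and falls through to `omega`.
    first | exact h | omega
  · -- Subgoal 2: closed by unfolding the definition of addition.
    rfl
```

A prover that searches automatically does this at much larger scale: propose a decomposition, attempt each piece, and abandon branches that fail.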
So yes, it is an open question whether a system that can do really well in number theory can do well in, give me another field of math. Yeah, exactly.
Well, actually, the way we think about it is: it depends. It depends on whether topology has a lot of the existing definitions, almost like the math infrastructure,
>> already in place. Because what people have found in the past, when people were building out Mathlib, algebra, you know, the bookwork, is that they can just
>> So Mathlib being the Lean undergraduate library kind of stuff. So it's all the proofs that you learn in undergraduate math, and they're all sort of in
>> Lean. Yeah. So, for example, some of my friends are currently at Axiom. It's a crazy full-circle moment. Kenny, we've been friends for five or six years, and he was the first one to tell me about Lean.
He was working with Kevin Buzzard to build out Mathlib. It's a lot easier to codify algebra in Mathlib than analysis. That's interesting, because for analysis a lot of the definitions around convergence, limits, etc. become tricky. And I don't think there's a lot of topology in Mathlib today, in terms of differential topology and differential geometry kind of stuff. So our system likely will not do very well on those domains, because it doesn't even have definitions to build on top of. For the places where the definitions are in, we actually are doing quite okay in terms of distribution diversity. We have good performance, having solved open research questions in number theory, commutative algebra, algebraic geometry, some discrete math, and probability.
>> So earlier you said that with the Putnam exam, the 2024 version, all of the questions that AlphaProof did not get right
>> The IMO, international
>> Yes, for the IMO, all of the ones it got wrong were in combinatorics. Is there a weakness there in that specific domain?
>> I would say so. For Olympiad math, people are seeing combinatorics being a little bit more tricky.
>> It seems like the steps are quite creative. I'm a human, and I have friends who are really good at combinatorics, which I never considered myself at the top of. I'm better at number theory. But I know some people who are IMO gold with a perfect score, Putnam Fellow with a perfect score, all the way, and when they do tricks in combinatorics I'm like,
>> I don't know how you thought of that. But after you give me that construction, it actually becomes a lot more tractable. I think a Lean-based system will struggle in those very creative places, which is why we at Axiom also invest in something called mathematical discovery. It's not Lean-based, and we have some major news in the coming weeks: we're open sourcing entire code bases for mathematical discovery.
>> You want to tell us a little bit?
>> Yeah, sure. We currently have two code bases being open sourced. The goal is for when you're a mathematician or a theoretical physicist and you have a problem that you would like to solve.
For example, if you want to find a very complicated graph construction, then we would suggest you follow the very detailed manual, intended for mathematicians, to run the code that we wrote. It's a tool for mathematicians to make mathematical discoveries. Mathematical discovery is this idea that proof is not enough for math. In fact, before you start proving something, you don't know where you want to start. So you will try to construct some interesting examples. These can be sequences: if you want to understand the properties of a sequence, you will write out the first few terms. They can also be graphs: if you want to figure out what the graph you're looking for should look like to have a certain property, you will start with some simpler version of the graph. Now, constructions cannot be done by Lean. So we believe in having AI for math discovery, and we have one of the OGs in that field, Charton, a member of technical staff at Axiom. He previously did PatternBoost, disproved a 30-year-old conjecture end to end by finding a counterexample, and found the solution to a 130-year-old problem, the global Lyapunov function, a kind of mathematical object that shows up in the three-body problem. So we think mathematical discovery tools should be open to the math community, and we are open sourcing entire code bases for that.
>> So discovery meaning it makes new conjectures, or it
>> Yeah, it's a pre-conjecturing step, actually.
>> Oh, I see.
>> Yeah. So you start to form intuitions. If you're a mathematician and your goal is to solve a really hard conjecture, Axiom Prover can't just solve it for you. You might want to formulate some lemmas or conjectures that you then give to Axiom Prover.
If you're a human mathematician, you will start by wanting to formulate that conjecture. You don't know where to go. You want to find constructions. The code that we're going to open source is going to help you, hopefully significantly.
>> So, maybe there are a lot of computer scientists listening, and one of the things that will immediately come up, especially when you're talking about formal verification, is Rice's theorem, decidability, the incompleteness theorem, and maybe some arguments about computational complexity and LLMs. Rice's theorem says you cannot prove non-trivial things about programs for all programs, right? So how are you navigating this space? Obviously formal verification is able to do some things.
>> Yeah.
>> So yeah, it's very clear that there's a theoretical result telling you that you cannot formally verify all programs. But I think it's good to formally verify the majority of the useful programs. I remember there's this MIT little, not a documentary, more like an advertisement for admitted students, and there's this famous line by Tim the Beaver, the mascot of MIT, saying what does theory give you, which is kind of like: it doesn't stop us from trying to push it as much as possible. So the goal that we have for the future is this. Suppose you are doing the coding, and you want to vibe code a really complex task. Currently it's front-end websites, but in the future we might want to vibe code much more complicated things, even whole distributed systems. Then we want to be able to decompose it. There's maybe a high-level sketch plan, which we can make or other people can make. Say you have Claude break it down into 10 things, and at one point it will decide to call Axiom, and Axiom will give you a computer program that is formally verified, or it will say this is still too hard for us.
>> So you write the program, you give it to Axiom, it makes changes to it maybe.
>> So we're talking about two phases.
>> It is possible that we are the verification partner. You already have a computer program and you want us to verify it. In fact, GPT found a proof of an unsolved Erdős problem, and our competitor Harmonic's Aristotle verified it. But we want to do verified generation. We might want to say: hey, this little component, everything that we generate and provide for you, is formally verified.
>> I see. So the idea would be that you co-generate both. And I can imagine this fitting into the idea of a promise, or a sorry. [laughter]
>> A Lean sorry.
>> A Lean sorry meaning it's a lemma that is unproven, but you're just taking it as given until you have the time to prove it, right? Is that a good way to think about a sorry?
>> That is a good way to think about a sorry, but not necessarily in the coding context.
>> So I can imagine you can say: assuming that this module is verified, then this module is correct.
>> And so you can decompose a problem small enough that you can verify. Is this kind of
>> So let's say we want to vibe code control flows.
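The "assuming this module is verified, then this module is correct" pattern can be sketched in Lean 4 as below. All names are hypothetical, and the built-in `omega` tactic of a recent toolchain is assumed. The lower-level fact is left as a `sorry` for now, and the downstream result is checked relative to it.

```lean
-- A small "module": clamp `x` into the range [lo, hi].
def clampNat (lo hi x : Nat) : Nat := max lo (min hi x)

-- Property of the module, taken as given for now.
theorem clampNat_ge_lo (lo hi x : Nat) : lo ≤ clampNat lo hi x := by
  sorry

-- Downstream fact: Lean checks this proof completely, but it is only as
-- trustworthy as the lemma it relies on.
theorem clampNat_succ_ge_lo (lo hi x : Nat) : lo ≤ clampNat lo hi x + 1 := by
  have h := clampNat_ge_lo lo hi x
  omega
```

Once the `sorry` in `clampNat_ge_lo` is replaced by a real proof, the downstream theorem becomes unconditional without being touched.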
>> Yeah.
>> Right. That's quite hard. You will likely break that down into multiple steps,
>> and then it will continue to break down these steps into more fine-grained steps,
>> and at one point you want something that is absolutely correct.
>> Yeah.
>> And this is also something that is likely within reach. Then we want to generate both: a piece of computer program, and underlying it a guarantee, a proof that has also been generated, which tells you that the program solves the thing that you specified. So the vision we have is: anything, and it's a little bit marketing, because as you said there's a theoretical bound, but mostly, well, almost surely, hopefully, anything that can be defined can be executed, and anything that can be specified can be proven. The way I think about it is: a program times a statement, or problem, maps to verifiability conditions times a proof.
So the program verification community has given you, say, the verifiability conditions, and we're trying to recruit a really strong team to help us do that. Axiom Prover is going to give you the proof.
>> So just help me map from the program to the proof. I could say this two-line Lean program verifies whatever I claim it solves. How do I know that it actually verifies the thing that I think it
>> verifies? Yeah. So, for example, there's currently this benchmark called Verina. It's a code verification benchmark that's supposed to be Lean friendly. Every problem is a coding problem, and there's a code part and there's a proof part, two different computer programs. Yeah.
And the goal is to generate code with proof: the code that supposedly solves the problem, and then the proof that this program indeed does solve the problem.
>> I see.
>> Now, how do people do on this benchmark? I want to talk about this a little bit because it's interesting. It was rolled out, I think, by Berkeley and Meta researchers in 2025, and they found that whatever version of GPT they evaluated gets something like 3.6% at pass@1 and something like 22% with iteration. Now, how do the formal math systems and models do? COPRA is a system, and in a system you iterate and refine, so pass@1 doesn't quite apply, but they still evaluated pass@1 of the system, and it was about, I think, 11 or 12%. DeepSeek-Prover and Goedel-Prover were also at 11 or 12%. I think our competitor released, last year, 96% on the proof-only part. And we recently, with no modification to the Putnam system, saw 99%: out of the 189 problems we solved 187 and missed only two, on code with proof. If you want to train something to do code with proof, and you want to do reinforcement learning, it's actually quite annoying if you want the proof to be informal math, because that's just a mixed objective function. Your code is something like Python, your proof is, say, a natural-language math proof, and you will not have very strong RL performance. But if you have the proof in Lean, and for the code you can choose Rust, which is a strongly typed language, it's more convergent. So you're going to have much better performance.
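A toy instance of "code with proof", written as a sketch in Lean 4, is shown here. It is not taken from the benchmark; the names and the specification are hypothetical, and a recent toolchain with the built-in `split` and `omega` tactics is assumed.

```lean
-- Code part: a program that is supposed to return the larger of two numbers.
def myMax (a b : Nat) : Nat := if a ≤ b then b else a

-- Specification: what "the larger of two numbers" means for a result `r`.
def myMaxSpec (a b r : Nat) : Prop := a ≤ r ∧ b ≤ r ∧ (r = a ∨ r = b)

-- Proof part: the program meets the specification for every input.
theorem myMax_correct (a b : Nat) : myMaxSpec a b (myMax a b) := by
  unfold myMaxSpec myMax
  -- Case split on the `if`, then let arithmetic close each branch.
  split <;> omega
```

Because the checker either accepts or rejects `myMax_correct`, the outcome is a single unambiguous signal, in contrast to the mixed objective described above for Python code paired with a natural-language proof.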
>> I can't wrap my head around how I tie these together. I can say that this proof solves Fermat's theorem, right? But I don't know that, like
>> But it's two lines in Lean. Obviously it doesn't. So how do I know that the program that I wrote matches the proof that I generated?
>> You will basically look at the coding problem, and you look at the program, and then you try to see if it satisfies the verifiability conditions.
>> But how do I know, right? If I read it,
>> Right.
>> I can just eyeball it. Traditionally, how mathematicians have done this is they take the paper and read it, and they say, I agree that this proof solves the problem. Then this other person says, no wait, it doesn't, look at this. People disagree, and eventually there's consensus that this proof solves this problem. So how are you
>> But you check it step by step, right?
>> Yeah. Right. Right.
>> Yeah. Yeah. So you basically will look at the verifiability conditions and see if it does actually satisfy that.
>> So suppose we're looking at a piece of a computer program. Yeah.
>> Right. And then whether it does actually solve the coding problem. You will have a judgment about that. Right. Yeah.
>> So you will not solely rely on testing, even though that is one way.
>> So somebody looks at the proof and says, "Yeah, that actually solves the problem that we think it's supposed to solve."
>> But now you're basically producing a formal verification that the program satisfies the verifiability conditions, about this program and this statement. So again, the function takes you from the program and the statement to verifiability conditions and a proof.
>> Okay. So I can see how this works in a benchmark. Then let's say I have a flight control system that is very...
>> Then the problem becomes, very annoyingly, the specification, I think that's the word. Even if we are successful, we will have a specification problem.
>> Yeah.
>> So here comes a bank saying, "Please, do I have a really safe financial audit?" Sorry: "Prove the financial audit for me," right?
>> Yeah.
>> What does that mean? We can't specify it. Humans are bad at specifying everything that we want. [snorts] There's always some sort of...
>> We're saying that it's not specified, and if it's not specified, it's not proven.
>> Okay. So what do you do about that?
>> Yeah. So we're not there yet. Okay.
>> Currently, again, the vision as of now is that anything that can be specified can be proven.
>> Okay.
>> Now obviously, that's maybe where the informal kind of reasoner comes in.
>> Right.
>> And here I want to call on the literature of testing. Tests are great because testing is like, "Hey, have you thought about that?" I want to highlight the work on mutation-based LLM unit test generation by Axiom's CTO Shubho, who was a director at Facebook AI Research. The way to think about it is that the AI will say, "Hey, have you thought about this case?" And so this is a little bit like conjecturing.
>> So the conjecturer is going to help with the specification.
>> I see.
>> And then the prover does the proof.
>> And so this is an interactive process, maybe, with the person, so that we're actually giving good specifications.
>> I think this is the future of coding, yes. And I think that even given the assumption that everything can be formally verified, studying automatic test generation is still interesting, because it is basically giving you a specification proposal.
Yeah.
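The "have you thought about this case?" idea can be sketched with a toy mutation test. This is only an illustration under stated assumptions: the `clamp` function, the two hand-written mutants, and the test suites are all hypothetical, and a real mutation-based generator would produce mutants and candidate tests automatically rather than by hand.

```python
from typing import Callable, Dict, List, Tuple

Args = Tuple[int, int, int]
Test = Tuple[Args, int]  # (arguments, expected output)


def clamp(x: int, lo: int, hi: int) -> int:
    """Reference program: restrict x to the interval [lo, hi]."""
    return max(lo, min(x, hi))


# Hand-written mutants: small, deliberate bugs injected into clamp.
MUTANTS: Dict[str, Callable[[int, int, int], int]] = {
    "drop_lower_bound": lambda x, lo, hi: min(x, hi),
    "drop_upper_bound": lambda x, lo, hi: max(lo, x),
}


def surviving_mutants(tests: List[Test]) -> List[str]:
    """Return the mutants that pass every test, i.e. bugs the suite cannot see."""
    return [
        name
        for name, mutant in MUTANTS.items()
        if all(mutant(*args) == expected for args, expected in tests)
    ]


weak_suite: List[Test] = [((5, 0, 10), 5), ((15, 0, 10), 10)]
# The surviving mutant prompts the question:
# "Have you thought about inputs below the lower bound?"
stronger_suite: List[Test] = weak_suite + [((-3, 0, 10), 0)]

print(surviving_mutants(weak_suite))      # mutants the weak suite misses
print(surviving_mutants(stronger_suite))  # mutants left after adding the new case
```

Each surviving mutant points at behaviour nobody has pinned down yet, which is why a test generator can double as a source of specification proposals.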
>> Right. And then another thing: let's talk about autoformalization, which is the ability to define it. It is converting something that is more informal into something that is more formal. Suppose I have a coding problem that is written for ICPC, and this problem is written in English, like "Alice and Bob, blah blah blah." Now I want to convert that into a formal statement, a formal spec. How do I do the autoformalization step? This is going to be hard because I have not solved the problem yet, so I don't have any signal, I don't have any grounding. The test cases, the input-output pairs, are going to ground my formalization.
>> So I know I'm going to give this input and get this output, and it has to have these characteristics, and so I write test cases. Is there an equivalent of this in Lean, where for the specification you just know the outcomes you are expecting, the statement of the result, but the proof is completely un...
>> It's quite annoying, because a lot of the time it's a proof, so you don't actually have numerical answers to ground it.
>> Okay. So autoformalization is quite a hard thing to do.
>> Because what generally happens is that it's hard to ground the autoformalization of a statement. You can obviously ground the autoformalization of a proof...
>> Because you can then just run it.
>> ...but for a statement you need a human to eyeball it.
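The grounding idea for coding problems can be sketched as follows. Everything here is a hypothetical stand-in: the task ("return the largest element of a list") replaces a real ICPC statement, the three candidate specs play the role of autoformalizer outputs, and the sample pairs play the role of the input-output examples printed with the problem. Passing this check does not show a spec is right; it only filters out specs that contradict known examples.

```python
from typing import Callable, List, Tuple

# A candidate spec is a predicate: is `out` an acceptable answer for input `xs`?
Spec = Callable[[List[int], int], bool]
Pair = Tuple[List[int], int]


def spec_max(xs: List[int], out: int) -> bool:
    """Intended reading: out is an element of xs and no element exceeds it."""
    return out in xs and all(x <= out for x in xs)


def spec_min(xs: List[int], out: int) -> bool:
    """Misreading of the English statement: smallest instead of largest."""
    return out in xs and all(x >= out for x in xs)


def spec_upper_bound_only(xs: List[int], out: int) -> bool:
    """Too weak: accepts any upper bound, even one that is not in the list."""
    return all(x <= out for x in xs)


SAMPLES: List[Pair] = [([3, 1, 2], 3), ([5], 5), ([-4, -9], -4)]  # known-correct pairs
WRONG_OUTPUTS: List[Pair] = [([3, 1, 2], 7), ([5], 6)]            # known-incorrect pairs


def grounded(spec: Spec, samples: List[Pair], wrong_outputs: List[Pair]) -> bool:
    """A spec is grounded if it accepts every sample and rejects every wrong output."""
    accepts_samples = all(spec(xs, out) for xs, out in samples)
    rejects_wrong = not any(spec(xs, out) for xs, out in wrong_outputs)
    return accepts_samples and rejects_wrong


for candidate in (spec_max, spec_min, spec_upper_bound_only):
    print(candidate.__name__, grounded(candidate, SAMPLES, WRONG_OUTPUTS))
```

For a pure theorem statement there are usually no such numeric pairs to check against, which is the gap described above: the statement's formalization still needs a human to eyeball it.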
>> How big is a Lean proof of a formalized program of significant size? Do proofs grow with the size of the program, or do they grow superlinearly?
>> Currently, for each line of code written there could be something like 20 lines of proof.
>> Okay, >> it's not looking that great.
>> But is that a linear relationship, or as the complexity of the program gets greater does the proof also grow, so that...
>> I don't have a good answer on the scaling law of that.
>> Okay. Yeah. Because I know that that's a problem in formal verification, right?
Where you have to have these very, very long proofs for even simple...
>> Yeah. So then are you going to run into limitations in the capabilities of LLMs when you start to get to larger...
>> What we believe fundamentally is that we are building a reasoning engine.
>> Mhm. [clears throat]
>> And we have seen the prover deal with really huge trees, the tree of a proof.
>> Okay.
>> We have seen it scale from 40 nodes to 4,000 nodes.
>> So wait, sorry. Axiom Prover is the LLM?
>> Axiom Prover is an ensemble system of multiple models that we post-train.
>> I see. Okay.
>> And it also includes, obviously, the tools that we at Axiom have openly released.
>> Yeah.
>> In other words, we have seen it being able to deal with more and more complex tasks.
>> I see.
>> We don't think it's particularly bounded. You could ask, is it bounded at some point by the pre-trained base model?
>> Yeah.
>> I think that's a good question. I think mid-training could be very interesting, because a lot of the capability gain does come from that part. You could argue that even if you reinforcement-learn some person who is not very talented, that person might perform a lot less well than an un-post-trained Ramanujan. It's a very sad reality of things. So at some point we might consider doing that.
>> Then...
>> But we think there's so much to push.
>> So you just feel like there's so much headroom right now, so much space to grow, that you're not running into theoretical constraints at this point. I just wonder, because there have been recent results on the computational complexity of the problems that LLMs can fundamentally solve. I don't think they're really a concern when I'm writing code with Claude Code, but I can imagine problems becoming big enough in a system like this, where you have a gazillion lines of Lean and can't get them into the context window. So you have to be smart about that, and then you have to summarize, and you're summarizing and summarizing, and pretty soon you're losing track of what's going on. It just seems like with a very large system like that you might run into...
>> Yeah, I think this is interesting. It's always a problem of abundance. Suppose the mathematical discovery renaissance has really come: Axiom Prover tries to prove everything and you end up with tens of thousands of lines of Lean proof. First of all, autoinformalization is a lot easier than autoformalization, minus the problem of no grounding. Every model has seen a lot of text and a lot of Lean, so you can always convert that Lean back into informal language. Then there's the problem of how you know whether you're correct or not. You can rely on cycle consistency: you formalize again and prove something like program equivalence.
>> Oh, so you informalize and then formalize, and you can use that to make sure that you still...
>> Yeah. And autoinformalization is obviously a less hard problem, so you can always do that. So for a lot of the Lean code that we output, we can have an informal summarizer of big chunks of Lean, and it's actually doing okay. So that's one thing. There's another question, which I think is very interesting. There was a panel at ICML in Vancouver last year, at the AI for Math workshop, with Leo de Moura and Jeremy Avigad, and Shubho, our CTO, was there, and they were talking about whether humans, or mathematicians, will at some point stop trying to understand what's going on. Suppose you're a really ambitious mathematician and you say, "I want to prove the Riemann hypothesis," and bang, here's a Lean proof, and it's actually correct, and it's probably 1 million lines.
>> Isn't that a big negative for the community? Because usually when someone comes up with a big proof of something, oftentimes the process...
>> I was about to get there. Whether that negative outcome will happen was the question the panel was discussing. It's completely hypothetical: no one's model or system can prove the Riemann hypothesis. So, a disclaimer: please don't cut that part and let it stand alone. [laughter] But will people still try to understand what's going on? I think the answer is always yes. I think curiosity, the desire to understand what is going on, mathematically or in other domains as well, is a basic human need. And I think that is a dose of optimism in an era of verified superintelligence, supposing we get there: even if all the outputs are produced at a much faster pace and in much greater volume than humans could possibly consume, people are still going to try to consume them, and they're going to try to consume the ones that they deem important.
So then basically attention is the bottleneck, and if attention is the bottleneck, then it comes down to intuition and taste about which statements are worth human attention and, with finite compute, worth the spending of compute resources. That's where human mathematicians' taste will always guide us, and I think that's incredibly beautiful.
>> Is it worth internally taking results that you can prove one way and then sending your system down many different routes to get conceptually orthogonal proofs, so that you get a diverse set of ways of reasoning about the same thing? I think it could be very valuable if, given a problem, it could say, "Here's the brute-force, natural way that maybe some humans would do it, and then there's a much shorter, elegant way of doing it." So have you thought about training your models to be elegant in some way?
>> Yeah, at some point we're going to get there, because I think the conjecturer will probably depend on what we mean by taste. Elegance feels like an alignment problem to me. Who gets to say what is elegant? Humans get to say what is elegant, right? That's what makes us human. There's something about hard work: what you work hard on is what you're going to be good at.
>> Yeah.
>> And we're going to have a problem with that in a lot of domains, not just math. How do you become that senior programmer with a really good high-level understanding?
Well, I guess full-stack understanding, high level and low level...
>> If you haven't spent the years of training.
>> I would argue, and this is very philosophical, that I don't need to be good at assembly language programming. Not many people are good at that. A few people are, because it's important for their job.
>> It's not experience but curiosity.
>> Yeah. But it feels a little different to me, because not being good at proving things, for example, seems like a fundamental gap: maybe my mind doesn't develop in the same way if I am not doing that. Whereas if I'm just not good at assembly language programming but I'm good at higher-level programming, maybe that doesn't matter.
>> I think that's probably because of how the education system, the pipeline, works, which is that if you do not show early signs of brilliance, you sometimes don't go through the process of pre-training...
>> In math.
>> Yeah. Yeah. [laughter] >> Right. Like so so that maybe you can argue that you don't need to say you know learn everything to develop a sense of taste but there's like a threshold you kind of need to meet.
>> Yeah. So for example, you probably need to be able to code even if you don't need to understand assembly language, and that might transfer. My [clears throat] intuition might transfer from the Olympiad math I pursued, and for combinatorics the transfer is more direct.
It's very similar. Number theory could be further away but still okay. And when it gets to something a lot more different from Olympiad math, the transfer isn't that strong. But, as you said, you need to diligently go through some amount of training. [snorts]
>> And if people over-rely on strong AI, that doesn't happen.
>> I want to switch gears. You mentioned software verification. What are the domains? How are you going to make enough money to justify the valuation? And congratulations, by the way.
>> Thank you.
>> So give us the high-level summary: what is the vision that you put in front of investors about why this actually makes a lot of money?
>> Yeah. First of all, this round was kind of preemptive. I think a lot of investors have pretty high interest in Axiom. In terms of what we believe in, we believe the future of coding is going to be somewhat constrained by verification capability. And we believe solving formal math is a very natural starting point, and then by extension you can increase the verification capability across hardware and software. For hardware, for example, that's quite revolutionary. As we know, there's no partial credit for a mostly verified GPU.
>> No. [laughter]
>> It's all or nothing.
>> It is all or nothing, and you do need a perfect prover. I want to stress this point. Suppose I am someone who loves solving math. I think there are a lot of Twitter users who enjoy Pokémon-style hunting of Erdős problems. And I just try to use a non-deterministic LLM, like GPT, say, to try to get the full proof for that.
>> Yeah.
>> Now I can do that many many times and I might succeed and I might not and I might not have a problem with whether I actually succeed or not. This absolutely does not work for hardware verification.
So for those kinds of domains, which I call "hardcore verification needed," it is a pain point. It is a current pain point: there are hundreds of humans and thousands of licenses being dedicated to solve one local grid problem verification.
>> Just as an aside, my understanding is that the industry standard for design to verification in an ASIC project is like one to three, one to...
>> Four. Three to four. Correct. Both in, say, team size and in duration. Yeah.
>> Right. So if you multiply that, yeah, it squares. So I would say it's a must-cover. Now for software verification it is interesting, because, as we probably all realize, if my nephew vibe-codes a Lovable website, there is absolutely no need to formally verify that piece of code. Why would you? Now, I heard a story from Kats, actually, the New York Times reporter, who told me the story, which is like...
>> However, if you think about the time of agents, my OpenClaw can probably do all sorts of things, and can probably do some bad things. My OpenClaw can decide to text something bad to my professor, right?
And you can ask, is that a problem for formal verification? Probably still not, right?
You can change something about the action space and make it more limited.
So you don't need to rely on formal verification. You can have a lot of cases like that. But you can think about, say, an enterprise that deals with a lot of regulatory requirements using agents; they might want to do something like it. It's their choice, but I will argue that the improvement of verification capability, in latency and in accuracy, the performance holistically, is going to determine whether people rely on formal verification or not.
>> Sure.
>> So in a way we want to make it so good that we can make that a choice.
>> So why did the investors think that you could do this, right?
Because people have been working on verification for so long, and I think everyone agrees it's an important problem. Certainly if I could just have a verification proof for every program that I write, like, "Hey Claude, give me the proof also," and it just produces it and, "Oh yep, looks good to me," I would absolutely do that. So what was it that the investors saw, in your opinion, that persuaded them that, okay, this is the moment I'm going to put in my 200 million or whatever?
>> I think um when it comes to faith you either have it or you don't. So you either dream the dream with us or you don't and that's okay because when we realize the dream the company is going to be worth 10 billion.
>> Yeah. So I think that's the feeling that I have, which is that we believe verification is the critical part of superintelligence. Our version of superintelligence is absolutely verified.
>> We don't think there's any other possible future. I'm going to say on the record: we do not believe that an informal math system is going to be the math AGI solution.
>> Why not?
>> We just don't believe that.
>> I mean, the counterargument is, "We just do a lot of good RL, and we've seen GPT solving some problem," and so on. So why do you think that runs out of gas?
>> Yeah. You can say that if you are a frontier lab and you have infinite resources, there is by definition no running out of gas, if infinite means there's no running out of gas. I don't think it's going to scale to superintelligence.
>> So you think that you run out of money, basically, or run out of power?
>> Sure. We as a startup, first of all, cannot do that. But we generally think that formal math, by converting math proofs to programs, to code, gives us much better performance.
>> So it's your sample-efficiency argument and so forth: maybe you just can't bend the curve enough if you don't use formal.
>> The thing is, the informal stuff is also available to us. If you really like, you can have both an informal and a formal system, and that is going to be very strong.
>> I see, I see.
>> My suspicion about whether we can scale to math AGI just by the informal approach is that you're going to keep having LLM-as-judge solutions, or you have human experts who grade. And human experts just don't scale that well.
And if you really argue infinity, then sure, you also have infinite money and you can pay them. But is there really an infinite number of people who can understand and prove, say, a nontrivial result in the Langlands program? Good luck finding those people. In fact, I think how FrontierMath came together is that they couldn't assemble a benchmark from their own expert pool, so they had to collaborate with Epoch to do it. That's what I worry about with having the human part. So they have LLM judges, and now stochastic judging. The problem is that whether something is impossible to achieve versus incredibly expensive to achieve gets kind of mixed up in the end.
>> And then of course investors always want to know, why you? I've read a little bit about your background, and I think we would do a disservice to the audience if we didn't hear a little bit about your personal story.
>> I see.
>> Do you want to talk a little bit about that? You've done some really interesting stuff, so I'd love to hear about you and then your team.
Yeah.
>> What makes Axiom special?
>> Yeah, I think Axiom is very special because we have really expert mathematicians. Basically, they are users of the system we are developing, and that iteration loop is extremely fast. You have some of the strongest mathematicians, both in research and in Olympiad contests, and you also have people who are mathlib contributors, maintainers, developers, Lean gurus really. Combine them with people who come from really strong applied ML organizations like Meta FAIR and golden age, as well as people with codegen expertise who have worked on compilers and kernel generation, and you have these backgrounds together. I think that interdisciplinary way of thinking about things is quite helpful. We think AI for math has traditionally been quite interdisciplinary. People are borrowing techniques even from AI for science, borrowing techniques from the codegen literature, and borrowing techniques from the broader frontier of applied ML, to apply to the niche problem of AI for math.
So we also think having this very special team is a differentiation. We also think that, as you say, there's no permanent moat. The proprietary data that we generate, and a little bit of a flywheel we are seeing, is a time moat.
Well, me personally, I love math. I have been doing math since I was very young, and math sometimes gets really hard when the problems you are solving are just a little bit out of reach. It gets a bit depressing, and from time to time I wonder if I could just have an AI help me. [laughter] And I figured, why not build such a thing?
>> You did a master's at Oxford in neuroscience.
>> Has that informed your thinking here?
>> That's a great question. I think my experience with neuroscience is that you learn very well what's hard. [laughter] What's impossible.
I mean, it's very interesting. I think that year of neuroscience gave me some feeling about what's hard and almost no feeling about what might work.
But I think I was, kind of under the pretense of neuroscience, hanging out at the UCL Gatsby institute, and was fortunate to do AI research with some really cool faculty, so I think that was a very productive year of AI study...
>> Non-neural study.
>> So it was mostly, for you, studying AI.
>> That's right. I think in the UK, back in the 20th century, if you called something AI you would not get the donation, but if you called something brain science you might have a chance. So the UCL Gatsby, which is a premier AI hub from which a lot of people actually go to DeepMind, including Demis himself, is a very wonderful research environment. I remember those tea-time talks were amazing, and people were basically just doing AI. It's called the Gatsby Computational Neuroscience Institute.
>> Yeah. How that happened was that I was in the master of neuroscience program and quickly realized that you need to kill rats, and I kind of didn't want to do that. Computational neuroscience sounded more appealing, and when you look at the projects and you see "transformer," you absolutely want to do that.
>> Yeah, [clears throat] >> we're we're all excited about that.
[laughter]
>> So after the Gatsby you started a math PhD program at Stanford.
>> I actually started with one year full-time at the law school.
>> Oh.
>> Because the JD-PhD program is structured in a way where you have to spend one full residency year. So that was also a very fun year of learning things that are just quite fascinating, like criminal law, looking at homicide cases.
Exciting, no?
>> Do you ever feel like the legal system is under- or over-specified in some way that maybe you could access and improve?
>> That's a great question. I think for a lot of things it's definitely underspecified. For some other things, I was actually quite excited about transfer learning from mathematical reasoning to those specific fields. Take appellate litigation, the legal gymnastics: some really good appellate scholars and lawyers come from math training. Not many, but Laurence Tribe for one, the Harvard law professor, one of the strongest in appellate litigation and SCOTUS briefs, one of the brains on the left, the Democratic Party side. And I think there are a lot of other domains, such as antitrust, that are incredibly flowcharty. Contract law is sometimes also flowcharty, as are bankruptcy and tax, more on the corporate side. I just love the litigation side. [laughter]
>> Yeah. Just because we're talking about litigation, and it's not the same thing, but there was an Erdős problem that Axiom solved, I don't know if it was Axiom Prover or whatever, is that right? There was a controversy about it because it had been represented that it had solved the problem, when in fact the proof already existed: it had discovered the proof and then just formalized it.
>> Yeah. What actually happened was that our competitor Harmonic decided to publicize that they had solved unsolved problems, Erdős problems #124 and #481, and we trusted their literature review, believing that these problems were truly unsolved. We were a really young company at the time, and we wanted to test whether our system could attempt the problems that our competitor could. We fully did not expect that it would actually solve them, but it turns out that we were both wrong: in fact the problems had been solved before.
>> I see. So then...
>> It's not the only time that we relied on others' literature search, and we should own it.
The other time was this paper called dead ends in square-free walks. Professor Miller had this problem that actually turned out to have been solved. But we really should have done our part.
>> The point I'm trying to elicit is not that you guys did something wrong, but rather...
>> You know, there's this Japanese advertisement of a whole company, hundreds and thousands of people, apologizing, and it's like, "Sorry, we raised our price by 5 cents," and that's the advertisement. I was thinking that maybe I should just do that. [laughter] It's so embarrassing.
>> No, but I think the question of provenance of information goes back to the question I was asking before: how am I connecting the answer to the question?
>> Yeah, this is a great question. After Erdős we're extremely careful, so we didn't really look at the other Erdős problems. I believe Harmonic still continues to claim they have solved Erdős problems that might or might not be open, I don't know. I think Terence Tao and a lot of other people have a database of all the Erdős problems and their status. By the way, it's a really easy mistake to make, because there are so many Erdős problems that have actually been solved. And indeed search and retrieval is a hard problem: you don't know if that argument, or an equivalent version of it, already exists. In fact, I think the most interesting part about that entire database is that there are a lot of problems that are not directly solved but are an almost trivial extension of another result that has been solved. Or sometimes not even that. In this dead-ends-in-square-free-walks case, which has nothing to do with Harmonic, what we actually didn't realize, and what professor conander pointed out to us and to Professor Miller, is that it was actually from a MathOverflow or Stack Overflow post...
>> A user pointed out that there's a 1936 result. It's fascinating. I think it's hard to find, which is why search is a hard problem.
>> I guess that means... does the conjecture engine, or whatever, use search as part of its process, or is that something that the human does and then feeds in?
>> I think a knowledge graph or knowledge base is a very important component of any company.
>> Yeah.
>> Uh and I think I don't think it's talked about enough.
>> And so it sounds like you don't want to give us too many details, but you guys have a knowledge graph. That also brings up something I read somewhere, that you guys have a really massive database of Lean proofs that you've generated. So synthetic data, in some sense, and maybe this is a competitive advantage for you.
>> I think everyone is trying to accumulate data, which is not a moat; it's just a time moat.
>> Yeah.
>> Yeah. It's all about whether you can execute fast enough to make sure that you have a certain buffer because of, say, your dataset accumulation, but that is only a buffer.
>> Have you ever thought about doing something like an AlphaZero for math, where you start from nothing and let it just make up axioms and see what happens?
>> Ah, this is a wonderful question. I think that's a very interesting approach. We believe in something, which is that if Axiom Prover can be a really strong mathematician, then the things it is proving every day should hopefully help it improve. I think this sort of self-improvement design is extremely valuable. And there are other people in the AI for math community: I think Professor Gabriel Poesia's work is very interesting, some of the more conjecturing type of exploration. There are specific things you can do in certain ways to try to see if your system can learn to conjecture and build theories.
>> I think the topic is really interesting and important, because really you're claiming that to get to superintelligence...
>> There's sort of this like it's just not going to be possible.
Maybe if you had infinite resources, you could just RL and it would work maybe.
But the reality is that you just can't be sample efficient enough, or whatever it is, to do that. So you need some sort of verifier in the loop with the inference process, because you do have verifiers during the training process and you just don't have them during the inference process.
>> Yeah, I think a lot of them are just secretly like trying to use this to ground their reasoning.
>> Yes, as well. I mean, I was surprised. When o1 was, you know, everyone knew o1 was coming, but it hadn't come out, I was sure they were going to announce that they were using Lean to do formal verification of proofs, to actually generate proofs and then verify them, so that they were grounding the reasoning. I mean, that was my >> When I was there, there was GPT-f, which was a great piece of work. There's also miniF2F. These are all formal math work at OpenAI. >> Okay. So presumably those guys are doing something.
>> No, no, they all left.
>> Oh, they all left.
>> So that's my point, which is that if you're, you know, an intern, I guess you can't be an intern forever. So let's say you're a junior member of technical staff and you want to work on something for as long as it takes to solve it.
Weirdly, people think about a startup as this sort of thing where your runway can just run out and it can all fall apart. But you might have a better chance of staying focused on the same problem for as long as it takes at, say, a startup like Axiom or one of the other new labs.
>> Yeah. If you're aligned to the mission of the big company rather than like somebody decided that what you're doing is no longer >> Yeah. Yeah. It can be your VP lost some political fight and so Yeah.
>> Yeah. Absolutely. So >> no obviously if we succeed then they're all going to you know start doing that again.
>> Yes. And then like I guess as a talent then there are more like you know potential places to choose from as well.
>> Yeah. So then your job is to go fast. So they're struggling. So actually, we haven't talked about it, but you also just released an API for doing Lean verification.
>> And I actually tried it with Claude Code, because it's easier than setting up your own Lean toolchain. Yeah.
>> And, you know, tried to get Lean to prove some stuff.
>> And the infrastructure is maybe non-trivial, especially at scale. So do you want to talk a little bit about that?
>> Yeah. So we just released AXLE. AXLE stands for Axiom Lean Engine, and it's really a set of proof validation and manipulation tools that are built for Lean, in the language of Lean. So it's a bunch of metaprogramming tools.
Now, metaprogramming talent is extremely hard to find, and we're so grateful to have a really crack team working on that. We want to release it to the community to use for free, because we think there are probably other people also doing large-scale Lean operations, and these tools are going to make their stuff a lot more robust and faster, and do so at scale. AXLE is currently, I think, 14 such tools, starting from verify proof, which is to make sure that there's nothing weird going on: no cheating by the Lean code, you don't assume something away, you don't assume weird things. If you assume n plus n equals n, you can prove 2 plus 2 equals 2, which is for sure not the right answer. There are also a lot of other generation tools. For example, you can try different repair attempts: broken Lean in, and then good Lean out. There are currently other repair methods by LLM, so hopefully what we provide can be a lot cheaper and more straightforward. I think strong and better engineering can get you to a place that's quite far. A lot of people from the Lean community have been using AXLE, even if it's just been a week, to do all sorts of different interesting things. We have seen people from the blockchain community use it to do interesting things as well, and we have heard from a lot of people that Claude plus AXLE is kind of their go-to setup for now. We think that these are really interesting tools.
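To make the "no cheating" point concrete, here is a minimal Lean 4 sketch of the kinds of shortcuts a proof validator has to catch. It is not AXLE's implementation or API, which the conversation does not describe; it uses only core Lean, and the names `cheat`, `bad_sorry`, `bad_axiom`, and `good` are made up for illustration. The `#print axioms` command is Lean's built-in way to list what a theorem ultimately depends on.

```lean
-- A false statement that still compiles (with a warning), because `sorry`
-- stands in for the missing proof.
theorem bad_sorry : 2 + 2 = 2 := by
  sorry

-- A made-up axiom (hypothetical, for illustration): assuming n + n = n
-- for every natural number.
axiom cheat : ∀ n : Nat, n + n = n

-- With that axiom assumed, the false statement follows immediately.
theorem bad_axiom : 2 + 2 = 2 := cheat 2

-- An honest proof of a true statement.
theorem good : 2 + 2 = 4 := rfl

#print axioms bad_sorry  -- lists a dependency on `sorryAx`
#print axioms bad_axiom  -- lists a dependency on `cheat`
#print axioms good       -- reports no axiom dependencies
```

Both bad theorems are accepted by the compiler, which is why a separate check on what a proof depends on is needed before treating it as valid.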
I think famously, today there's this mathematician who said he formalized the Donald Knuth, you know, using Claude to prove, I think, a Ramsey result, and then formalizing the Lean proof, and that is also using an AXLE tool. So we are really glad to see people already using it. >> I mean, I feel like this is a great opportunity for the collaborations that Terence Tao was talking about as well, where once people have access to the common tools and it becomes easy to do, I mean, if you have an intuition, even not a strong mathematician like myself, you might be able to participate in an effort to prove a larger theorem or something like that.
>> Yeah, I think that's a very interesting view, which is that mathematics has not been as collaborative as software engineering. You don't have hundreds and thousands of people working on something together. I think Polymath was an instance when that happened, and that was fantastic. So if you have a really good setup, commoditized access, people can all participate. In fact, that's how I think some of the large formalization projects have been done: things are divided into subtasks. But really the blueprint writing process, by, say, Terence Tao and Alex Kontorovich, of assigning the tasks to different people and how things fit together, that blueprint writing part is extremely important. There has been, I think, a result about sphere packing by one of the other companies out there, and the blueprint part for the eight-dimensional case is still pretty much built on what the sphere packing community, the Lean community, the humans blueprinted. Similarly with some of their other results as well, the blueprint part has still been human generated. I think autogenerated blueprints are going to be a technical bottleneck that many people are trying to solve around the same time. >> So is there value in me, as a Claude Code user, trying to attempt some small lemma or whatever, where I don't have a great understanding of the math, maybe I have a high-level understanding? >> It depends. Are you trying to formalize or are you trying to prove? >> To prove new things. That's a good point. Yeah, so you would obviously probably start with formalization, right? You know the proof, and nobody has been able to get the formalization correct. >> I actually have seen people use Lean and formalization, and they try to do it by hand, you know, not using an AI, as a way to learn mathematics. Now, if it's autoformalization, you don't have that process.
Well, it's interesting, because a lot of my friends who started working on Lean and Mathlib did so because they are in a PhD and the problems are really hard. We get stuck all the time, and we want to review some of the undergrad classes, a time when we still understood what the math was about, and we do so by doing Lean. I think that's very beautiful. >> The material. >> Yeah, but if you have, for example, access to Axiom Prover, which can also formalize all the things to be formalized, you lose that part of the learning process. >> Yeah, but I do think that, for you and I, we can set up AXLE and try to see what results we might be able to prove. And I think that's quite interesting. And thanks to AXLE making the speed a lot faster, you don't have to wait very long.
>> I remember the Putnam exam day. We were all in the war room. It was a Saturday. We were all really excited, and we had just got the exam paper from the official organization, the proctor of the Putnam exam. We were looking at how much of a workout AXLE was getting. Without it we couldn't have solved, I think, the eight problems within the time limit, definitely not within the time limit. And I think one thing about these tools is that, potentially, you can have interesting rewards for RL as well.
>> What do you mean by that?
>> So, for example, verify proof can be a reward: basically, the proof is completely correct and validated. >> I see.
>> I think formal verification tooling can be interesting direction to pursue with RL.
>> Yeah. So you mean, for example, you autoformalize the informal proof and then verify it, and then use that as a reward? Or do you mean...
>> No, as in, you pass Lean programs into these formal tools, right?
And you will have some sort of score.
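The idea of using a proof check as an RL reward can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `stub_verifier` is a hypothetical placeholder that merely scans for obvious escape hatches, and a real setup would call an actual Lean checker in its place. `proof_reward` is likewise an illustrative name, not part of any published API.

```python
import re
from typing import Callable


def stub_verifier(lean_source: str) -> bool:
    """Hypothetical stand-in for a real proof checker.

    It only rejects empty input and sources containing obvious escape
    hatches; it does NOT check that the proof is actually valid Lean.
    """
    banned = re.compile(r"\b(sorry|axiom|admit)\b")
    return bool(lean_source.strip()) and banned.search(lean_source) is None


def proof_reward(lean_source: str,
                 verify: Callable[[str], bool] = stub_verifier) -> float:
    """Binary reward: 1.0 if the verifier accepts the candidate, else 0.0."""
    return 1.0 if verify(lean_source) else 0.0


# Two candidate outputs from a hypothetical policy model.
candidates = [
    "theorem t : 2 + 2 = 4 := rfl",
    "theorem t : 2 + 2 = 2 := by sorry",
]
rewards = [proof_reward(c) for c in candidates]
print(rewards)
```

Passing the verifier in as a parameter keeps the reward function unchanged when the placeholder is swapped for a real checker.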
>> Okay. Yeah. I think if I were to build o1 or something, in my mind I would have used what I just described.
But you're saying just to learn how to do Lean. >> So the value proposition, which is interesting about frontier labs, is this. Suppose you are a to-C business; then sure, you can just not do what we are doing. We have seen, for example, DeepSeek originally having a formal team and then later dissolving that team because of a change in strategic direction. That's all completely reasonable. Now suppose you are focused on coding, right, and you have talent who want to work on what we're doing. >> It makes a lot more sense for you to do code generation, to further your strength and moat. Yeah. >> You can partner with Axiom, just like how, for example, frontier labs partner with startups that work on search, such as Exa and Parallel, right? Just call the Exa API for search, and potentially, if you're a frontier lab, I think you should call the Axiom API for verification.
>> Yes.
>> A better proposition, but >> spinning up your own doesn't make sense. I mean, potentially, the talent, the finickiness of Lean, the sort of data, the code, there's no reason to.
>> Yeah. I mean, it took me five minutes to set up. [laughter] >> Why did you decide to start a >> Right. Why did I decide? You were a grad student at Stanford and you know, in math.
>> Yeah.
>> So, what made you decide to >> I wasn't in math for a while at all. I think almost as soon as I started the PhD, I just started fundraising. So it wasn't like >> Oh, really?
>> Yeah.
>> Was that the plan, or did you start there and almost immediately realize that this is >> Right. So the year of law school, right, was very, very interesting to me on an intellectual level. But it was also the first year where I had no science, technology, or math whatsoever in my life.
>> It's a weird year, right? I'm reading a lot. I'm learning how to write. I'm learning how to read. But I want to be obsessed about something in technology. That was also what was going on that year. It's like, okay, I need to be obsessed with a technical thing. And I don't think I was bored, because I really loved everything about law. It was something that's incredibly interesting to study.
But I've been basically very excited about the progress of reasoning. I was looking at a lot of the post-training papers. I was learning all of these just by myself. And then it got to a point where I'm like, I think this is for sure happening.
And talking to Shou, right, at Ver every weekend also didn't help soothe this thought. So I got more and more obsessed, >> and at a point I'm like, okay, if I'm doing this literally every minute and I can't think about something else, I need to do something about it. I mean, it's like I fell madly in love with the idea that AI is going to do math, and, okay, now do I do math? It's really crazy. At the time, I remember the obsession was such that I just couldn't get out of it. And then I went to this Knight-Hennessy Scholars event. Denning House hosts all sorts of free lunch events, and those are great because you get free food and you get interesting intellectual exposure to things. I remember Julie Draw, who was, I think, the first Facebook PM, came to speak, and after that I basically walked up to her and said, what do you do if you want to do a startup and you really wanted to do academia, because you kind of love math? And she's like, well, what's your time spent on these two different things? And I'm like, 100% and 0%. And then she's like, well, you kind of have to follow your energy.
>> Yeah. I mean, if you're if you are completely obsessed with it.
>> Yeah. I was completely obsessed with it, but I also thought it's going to be big. And I thought it just has to be a for-profit startup, because it's so much broader than making mathematical breakthroughs.
If you think about recursive self-improvement, and really the more high-level concept that you want to have this AI scientist, math reasoning is going to be a pretty big part of it.
Yeah.
>> And now, I think the belief by Cursor and Claude and other folks is that, okay, math transfers to coding, and coding transfers to math as well. I think that's true. It's just that, why not push it directly? I don't get it. You need to push that directly.
>> And then there's this other thought, and maybe going back to the collaboration point, right? Verification has traditionally been thought of as, okay, there are some industries where there are a lot of guardrails. So if you're working in defense, military use, you need to basically satisfy a lot of barriers to entry to meet those stringent requirements. So verification is for the industries that are closed. But for the first time now, I think verified AI is to open up collaboration. Well before, with blueprinting, that's human-human collaboration, and Lean was the grounding, was a verification formal language; then human-AI collaboration, like we're seeing now; and in the future, AI agent-agent collaboration. So I think verified AI is for openness. It's not for meeting the requirements of closed industries. And I think verification should not be about, oh, I remember there's this article like "Chatbots mix stuff up. Is math the solution to hallucination?" Verification to me is not about lousiness. Verification to me is about scaling brilliance, compounding brilliance. Going back to the collaboration point, it's about Ramanujan being a much stronger mathematician. He was already a really strong one, but verification helps him extend the brilliance, both scale up and scale out.
So verification >> rigorous >> verification to me is not about erasing the mistakes, the lousiness; it's about scaling brilliance. And the third point is that verification to me is not just about rigor. It's actually about performance gain, right? It's not just about the stringent requirements, the hurdles that you need to overcome. Verified generation is actually going to make it so much better. And beyond these three points, I think the last point is that a lot of people think that you work on verification because of your distrust of technology. It sells really well to the general public, including my parents: oh, why are we doing verification? Because technology makes mistakes. No, we don't think verification is based on distrust of technology. It's because that's what the expected rapid, exponential scale-up, the deployment and creation of technology, and technological progress compel and demand. >> It's a very mathematical perspective, right? Because you're saying proofs drive math, right? A lot of math is about proofs. >> Yeah. >> And math drives a lot of science and innovation in the world, and the innovations in math drive innovation in the world. So that >> But it doesn't even need to go through that. The solve-everything narrative obviously stands, but my point is that transfer learning is about pushing math reasoning.
So there are, I guess, a couple of narratives here. For some people it's that you solve math, and math is the fundamentals of the sciences, so that's actually going from AI for math to this theoretical layer of AI for science. Is that the narrative? We actually believe in just general transfer learning. I think Axiom is on the infrastructure stack. >> And you think that this is just a first step to basically unlocking capabilities in many domains, in science and law, for example?
>> Yes. I think, so again, there are multiple beliefs. One belief is that there's math, and there is the power of formal verification.
Suppose we actually solve math and have a really strong informal math reasoning engine. We do not expect that transfer to be as large as solving math through the formal way.
>> Why?
>> I mean code as as it is language but it is indeed on the more structured end.
>> Yes.
>> It bridges informal and formal.
>> Yes.
>> What we are doing is not informal versus formal, right? We're not taking the completely formal proof approach. It's bridging between informal and formal. It is bridging between high level and low level. It is a direct improvement in reasoning through transfer learning, and it's also indirect, in that math is going to unlock science. And that is really what we're seeing.
>> So you think that it enables transfer learning?
>> Yeah, >> I see.
>> I think that is that is pretty much a consensus.
>> I think it is a consensus and this is a bet that has been pretty much kind of overlooked by others because math sounds pure and it doesn't sound like there's any commercial value. Mhm.
>> Well, I do obviously understand the opportunity cost of solving this problem if you're a frontier lab, but I definitely think this is a problem that you should be working on if you're a well-resourced startup.
>> That's an interesting Yeah.
>> perspective. Did you get everything out that you wanted?
>> Yeah. I think it's, you know, the question of: is it math or is it verification? The DNA of the company is math. We think verification is the best first market.
>> Yeah. And we think that solving math, and especially formal math, is going to help us tackle the really ambitious quest of verified AI. Now, when we are done with that, we might have other second markets, including the AI for science we just talked about, but on the theoretical layer, right? I think real-world testing is important, and potentially we can stay in the digital world and do software stuff, and for other things be getting reward from physical-world signals. >> But do you think that the capability of doing really powerful reasoning, once you have that powerful verified reasoning engine, that's the moment when, okay, now we've unlocked that for software verification and hardware or whatever, but okay, so now what about biology? What about chemistry? >> So that could be one. Then the other one is really how far are you from recursive self-improvement. >> Okay, so just AGI. >> Yeah, I think there is this sort of question, and different people, probably because of their different backgrounds, have different answers. It's really where your energy and your passion lead you. For some people, and I have actually heard this from my friends, they want to work on AGI because they believe solve AGI, solve death. There are other people who come from a more medicine background; they really believe they can solve death, and they don't solve AGI and then solve death. >> They just solve AI for science. >> Now, which way is correct? I don't know. >> And so the recursive self-improvement angle: it sounds to me like you're saying that it's the combination of verification plus language, which is informal, that enables really good recursive.
>> I think recursive self-improvement is going to happen anyway. We're trying to have formal verification earn its place. So again, whether formal verification can be welcomed and deployed and become a consensus depends on how well we execute, and I think when you boil that problem down into an execution problem, you should just go for it. >> Looking forward, what's the biggest bottleneck that you see in the field, for both Axiom and maybe just the field more broadly? >> Fragmentation. I think we're in a market where people like to start things. A thousand people, they don't join forces; they start a thousand things. I think that's actually the biggest bubble indicator. I think there are categories that are bubbles, and there are other categories where there are moonshots; it's not a bubble, it just looks a little bubbly. In a field, if people of really legit backgrounds decide to join forces and work in a team for the mission, rather than for ego, for the status of being a new lab founder, I think those are categories I'm really bullish on, and vice versa. So I think the bottleneck actually is, it's annoying, because if you believe we are in an age of research, if you believe deep tech bets are the interesting directions to go after, the market conditions currently are both good and bad: good in that they enable these long-term, long-horizon bets to be funded, bad because there's too much noise in the market and some irrational players. We try to work with really incredible venture firms. They are the partners.
They are our intellectual partners, and there's a lot of alignment. We really bounce very cool ideas, technical and non-technical, off each other for long hours, and we spend a lot of time off work and on weekends together to really intensely build the company. There are also other people who just want to park capital somewhere, and while we don't work with them, these are market conditions that encourage fragmentation. And when things get fragmented, no one gets there. I think every category, regardless of how right the idea is, is pretty much in an earning-the-right-to-exist stage. And if that is the case, then, for example, take a great deep tech company, SpaceX: people do actually join forces to work on that dream, and potentially in that case there is also a very charismatic founder. A really concerning thing for me personally is that, for some categories that I'm personally quite bullish about, and just looking at things generally, fragmentation is a problem. We see startups pulling professors from universities to work on something. It really is an interesting situation. >> Maybe this is a naive question, but right now, when you were talking about players in, let's say, AI for math, there's you, Harmonic, >> Yeah. >> and then the big labs, right? Am I missing someone? Is that actually fragmented, really?
>> I guess fragmentation I think is a bottleneck for the entire like AI landscape.
>> Okay. Yeah. I think AI for math is a category that is actually not a bubble, because it is not fragmented, because people who are really amazing talents do like to join forces. So, for example, the fact that we get Ken Ono and François Charton on one team, this is fantastic. You have Ken, who's a core contributor to FrontierMath Tier 4, a really great benchmark setter, and François, who's on the AI for math discovery side. With proving and discovery working together, you are suddenly a player with both proving capability and construction capability, and that's fantastic. And I believe, as you said, Harmonic probably also has some really great talents joining forces. I think AI for math is a good category because of the absence of fragmentation. But even from our perspective, for example, RL, I don't think that's a category per se, but RL talent is currently quite hard to attract and retain, for literally everyone. There are a lot of companies being started and then sold three months later, and each month where you could have worked on a technical problem and you're instead working on deals is a month that is wasted. I say that also with some amount of pain and suffering, having gone through two fundraises. >> Yes. [laughter] Yes. Yes.
>> Yeah. So what's the biggest bottleneck in AI for math? >> For AI for math?
>> Not Axiom, but just the community. >> The community of AI for math.
>> Yeah. Where is it going? What is the thing that everyone just really wants to break?
>> I expect fragmentation to start to happen as Axiom and Harmonic establish category leadership.
>> Mhm. >> So that's one thing, but I also think that another bottleneck could be the pressure of short-term versus long-term. We are doing things in a very fast-paced manner, but that does not mean it is always correct to do things in the most fast-paced manner. We did things in a fast-paced manner because, well, we were founded on the day of the International Math Olympiad, so we couldn't have competed in that anyway. The next math olympiad is the Putnam, and we were quite excited, because, I mean, it's an undergraduate exam. This year's IMO, the 2025 IMO, was easy on the MOHS scale, and the Putnam could be hard. In fact, it was harder than the IMO on the MOHS scale. If you look at how many points the AI has retained, on average and on the max difficulty of the problems, the Putnam is harder on both axes. So we wanted to try, and so there's only a gap of four months. But it doesn't mean I'm always going to set four-month goals. If I build a company only setting four-month goals, I might build a really short-sighted company. So there are, I think, longer-horizon problems. For example, market forces could force other players into chip verification. Well, it is possible that code verification is a holy grail.
It's possible that if you solve that, then you also naturally solve chip verification, with some epsilon caveat of distribution shift. But I strongly believe that a bottleneck could be that pressure. I think Axiom is fortunate that we are early enough, and we are a team of just incredibly high-agency people, so our execution generally surpasses expectation. But what I think could be a bottleneck for the entire AI for math field is that trying to prove commercial value is potentially going to distract significantly from core capability improvement.
>> Yeah, that makes sense. Cool.
>> Thank you for driving up and coming to see us. I know the traffic was was horrible.
>> Yeah, thank you >> and and um it's been really a pleasure speaking with you and um we look forward to to seeing how things develop.
>> Yeah. Thank you so much. Thank you.
Yeah.