Anthropic's Mythos model represents a stepping stone in AI development rather than a doomsday scenario. It demonstrates that AI can find security vulnerabilities in codebases that human reviewers might miss, but this capability is more about finding bugs at scale than about discovering entirely new types of vulnerabilities. The model's announcement and subsequent release to select partners through Project Glasswing illustrate how AI companies balance security concerns with practical deployment, while the broader implications for software development include the need for agent harnesses, guardrails, and human oversight to safely integrate AI into development workflows.
Anthropic Mythos: Hype, reality and the actual security implications
All right. Hello and welcome to the Thoughtworks Technology Podcast. I am one of your hosts, Nate Schutta. Best way to describe me is architect as a service. And I am here with my good friend, my colleague, my sometime co-conspirator, Chris Kramer.
>> Hey, thanks so much, Nate. I'm Chris Kramer. I'm an AI and machine learning leader at Thoughtworks.
>> Outstanding. Well, thank you for taking some time to chat with me here today. I've been wanting to pick your brain about this since the announcement came out. So, let's just dive right into it.
What do you think about Mythos? Is this finally going to be the AI that destroys humanity? Is it just a step-wise change? We've got a lot of things I want to dig into, but let's just start there by setting the table on Mythos. Your thoughts, and we'll see where that gets us.
>> Yeah, you know, the community has been very interested in what's going on with Mythos, obviously, and it is not the doomsday AI, in my opinion. It is a stepping stone and very much a representation of the scaling limits we are now reaching with the LLM topology.
>> Yeah, I mean, it feels a little bit to me like things have slowed down a little bit in that regard. There was this period where a new model would come out and it's like, wow, this just absolutely blows away everything we had before. As is always the case in technology, the pace has definitely slowed, and from what I can tell there's definitely an improvement. Now, for those that maybe haven't been paying as close attention, when Mythos was announced, Anthropic came out and said, "Whoa, this is so dangerous that we need to contain it." And then they announced this Project Glasswing, which was basically a collection of technology companies that were given access to it so that they could explore their codebases and make sure there weren't, you know, very glaring zero-day, CVE-type things. And we have started to see some announcements come out of that. I know Mozilla just announced, I think it was, 271 bugs that they fixed, and so it is clearly finding things that have been there in some cases for 10 years, 20 years. Although I'm very curious as to whether that's something unique to Mythos or something that older models with similar prompts also would have found.
>> Going back to one thing you said, there's this recent article I saw that a Discord chat, I believe, somehow had access to Mythos.
>> Yes.
>> They obviously didn't run Mythos on itself. Yeah. So, I would say that given enough time and the right agentic harness, these are probably faults that another model could find. But there is a lot of ability tied up in the hidden state of these large language models, and Anthropic has obviously unlocked some of that potential without needing a massive upgrade in chips. We can certainly dive into what we suspect is going on behind the scenes with Mythos.
>> Yeah, there's a couple things I want to pull on there, but I do want to start with the CVEs. It is clear that Mythos is finding bugs, and I think that's not disputed. The question that I would have is: if Mythos found, let's say, 10 bugs in your code, and half of them are real, legitimate bugs that need to be fixed, you know, not the kind of, well, gee, if you prop your front door open, turns out anybody can walk in. Well, yeah, we kind of knew that, right? You leave all your ports open, that's not good. But the kind of weird, interesting ones where, well, if you pass in this wildcard and it goes through this path, then all of a sudden you've got root access. That's the kind of stuff that we obviously want these things to find.
What isn't as clear to me, or at least from some of the anecdotal evidence that I've pulled from talking to people, is whether that's new. It seems like older models would have found most of them, maybe not all of them, right? So when I think about this as a step-wise improvement, that feels more right to me than a sudden hockey stick in a new and interesting way. Is that what you're hearing and seeing?
>> Yeah, it is. And I think maybe the finding with Mythos is more that there's a lot more duct tape holding together enterprise software than the populace perhaps thinks there is.
>> No one could have predicted this.
>> Yeah, I know, right? And now we have some of the tools that are doing that at speed. I think maybe that's the distinction we're seeing with Mythos: it's really not that the ability has been brought into reality, but the ability to do it in a reasonable time frame.
>> Do you think that this portends a future where part of our pipelines will be "and then AI scans for security vulnerabilities"? And is that a dramatic departure from some of the existing tools and techniques we've been using?
>> It is. To touch on this paper I mentioned just before we started speaking: I have seen some of what we call agent-first codebases out there. These are codebases where the commits are expected to be done by an agent, and then through the GitHub Actions pipeline, or whatever CI/CD you're using, there's also an agentic security scan and some patches and bug fixes, all done agentically. As a side note, I think there's a really interesting question there about IP rights and ownership, which you've brought up before, Nate. But regardless, that seems to work best on small to medium-sized codebases. And I think this is basically going to continue to be a problem with LLM-based agentic architecture because of something called document poisoning. This is what this new Microsoft paper touched on, and I'm going to butcher the thesis of this document, so we can share it as a link: the longer a task runs, the more likely a corrupt or bad statement in some document is to cause the whole thing to just go off the rails. That's exactly what I'm seeing in these agentic codebases: the longer it's sitting there by itself, the more fragmented its thinking and the more poisonous bad assumptions get over time.
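A rough way to picture why long-running tasks are more fragile is sketched below. It is not taken from the paper mentioned above; it simply assumes, for illustration, that each step of a task has the same small, independent chance of picking up a bad assumption. The function name and the 1% figure are hypothetical.

```python
def derailment_probability(per_step_error: float, steps: int) -> float:
    """Chance that at least one of `steps` steps goes wrong.

    Assumes every step fails independently with the same probability,
    which is a simplification made only for this illustration.
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # P(no step fails) = (1 - p) ** steps, so P(at least one fails) is the complement.
    return 1.0 - (1.0 - per_step_error) ** steps


if __name__ == "__main__":
    # Hypothetical 1% chance per step of absorbing a bad statement.
    for steps in (10, 100, 500):
        print(steps, round(derailment_probability(0.01, steps), 3))
```

Under these assumptions the risk grows quickly with task length, which is one argument for keeping agent tasks short and reviewing between them.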
>> Interesting. Yeah, I do think that's a fascinating part of this. It feels like many of the things that we've tried to do because of the way our own brains work, like breaking the problem down, decomposing the problem, short tasks, a lot of those same tips, techniques, and tricks apply to using AI effectively. It gets back to some of the conversations that we've had about how we apply this to our software development life cycle.
It seems to me, self-servingly maybe, that the fundamentals of software engineering are pretty darn important today, even with these tools that allow us to potentially move a lot faster.
>> Absolutely. I think we're still in a place where, at this moment, I would not trust an agent to write a full feature by itself. Granularity still dictates a human breaking that into chunks or doing a massive code review.
>> Yeah. And I think that speaks to the fact that as more code is produced, almost by definition there are more bugs, there are more defects, and that's true whether a human writes it or AI writes it. Do you think that with AI potentially writing more code, with codebases potentially getting larger, we are opening ourselves up to more of these zero-day, CVE-type problems? Is AI going to find many of them anyway, so it's really just kind of a wash and we're basically where we are today? What do you see as the impacts of that on the software we're producing?
>> I do very much think we are in a weird transitory phase where we are experiencing the growing pains of a technology that absolutely is going to radicalize the job of a software engineer. I make no claims that it will never get there. I think it will probably be there in as little as a year from now. But for now, we're in this place where development velocity is not seeing the uplift just yet relative to the number of lines of code that are being output.
>> Yeah. And I do think that's an interesting way to look at this. I think a lot of organizations don't even know what their current baseline is in terms of velocity: how long does it take them to create a feature and put it out there? If you don't know how fast you're going now, you can't really say whether AI is making things better or worse. One of the things I've been thinking a lot about over the last few weeks, across various conversations I've had with folks like you, is speed: the faster you go, the riskier things get. If you think about driving through your neighborhood at 20 miles an hour versus driving through your neighborhood at 70 miles an hour, there's a huge uplift in risk when you're driving faster. The analogy I've thought about is that you and I have driven at highway speeds for many, many years. Clearly we've done it safely for many years, but that doesn't mean you can put us behind the wheel of an F1 car and expect us to get around the track without crashing or hurting ourselves or someone else. I do feel like that's a potential issue as we introduce more of this into the software development world: without the proper training, without the proper guardrails and harnesses in place, how do you move fast safely? I'm curious if you have any thoughts on that, and how you see companies dealing with that.
>> I have a bet, and this is not an original bet. I think the industry is very much coalescing here: agent harnesses, meaning the wrappers and tooling we put around these agentic systems and LLMs. That's kind of the new breed of prompt engineering, context engineering, whatever it is. And what that points at is that that's where the IP of doing this safely is starting to live.
Meaning you have to either (a) build it yourself or (b) really trust the framework, the skills, the ways of thinking, the ways of working that these agents are relying on. And that only addresses half the equation. That addresses the agentic side of things.
The other half is the human side: how are people actually interacting with these agents? I think very few organizations have touched on that. It's still kind of a special snowflake at every organization we go to: what is the right fit for that organization for enabling developers safely with this software?
>> Yeah, that's a good point. A lot of that is where they are on their own software development journey. You've got some very mature companies that have been doing this, agile and whatnot, at scale. They've got those practices and principles in place. If you've got someplace that's more chaotic, that maybe doesn't have all of that, it's a rockier road for sure. I mean, the way the Firefox team put it is basically, we're in for a rocky transition here as we start adapting to and adopting these tools. I think it'll be very interesting to see how that all plays out.
Anecdotally, I've heard a couple of stories recently, one where an organization is now seeing 50,000-line diffs from AI. They're theoretically supposed to review all of those, but what developer is going to look at a 50,000-line diff? If you and I were on a project and a junior engineer checked in something with a 50,000-line diff, we'd have a conversation with them and talk about why you shouldn't do that, and then they'd learn and hopefully wouldn't do that again, or we'd have to continue coaching them until they learned that lesson.
AI can't really learn that, right? We have to put these harnesses, these guardrails, in place. It's like, no, do not do a 50,000-line diff unless you're, I guess, reformatting or something silly.
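One way such a guardrail might look in practice is a small check that a harness or CI job runs before a change is sent for review. The sketch below is only an illustration: it parses text in the format produced by `git diff --numstat` (added, deleted, path, separated by tabs, with `-` for binary files), and the function names and the 400-line limit are hypothetical choices, not something discussed in the episode.

```python
def changed_lines(numstat_output: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` style text."""
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        total += int(added) + int(deleted)
    return total


def diff_is_reviewable(numstat_output: str, max_changed_lines: int = 400) -> bool:
    """Return True when the change is small enough for a human to review.

    The default limit is an arbitrary example value; each team would pick its own.
    """
    return changed_lines(numstat_output) <= max_changed_lines


if __name__ == "__main__":
    sample = "10\t2\tsrc/app.py\n-\t-\tlogo.png\n49000\t1000\tgenerated.py\n"
    print(changed_lines(sample), diff_is_reviewable(sample))
```

A harness could feed this the output of `git diff --numstat` for the agent's branch and ask the agent to split the work when the check fails, with an explicit exception for mechanical changes such as reformatting.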
>> I might be going a little tangential to your point just now, but in researching Mythos, I found a quote. It was that intelligence and reaching a goal are not necessarily the same thing.
I bring that back to how these LLMs and agents are trained. They are very much optimization techniques that we're using to get these agents to do what we want. And that's what we see quite often in these agents and LLMs acting as yes-men. I think that very much also goes back to, A, perhaps "more code is better"; maybe there's a perception of that in the models.
And B, this disconnect between writing code and thinking about code.
>> You do bring up a really interesting point. With these models, almost anytime you ask anything, it's like, "Oh, that's very insightful. Wow, Chris, you are so smart for thinking of that. That's a great question. That was so well phrased." While I guess I do appreciate the ego boost, I kind of wonder: should you really be doing that? Is there a way for me to tune it so that you just give me the straight advice? Sometimes you've got to tell me what's really going on here. Don't soothe my ego, necessarily.
>> I know, it is nice to get a pat on the back every once in a while, though.
>> Oh, totally, totally. I just worry that some developers are going to really think, well, my LLM says I'm an amazing developer, I'm the best developer it's ever met, so obviously I must be really good at this.
>> Yeah, I've seen a whole slew of, I don't know if I want to call them pre-seed AI slop startups, and what would be radical, transformative changes to computer science, that are very much just an echo chamber of bad code that doesn't do what it says it does, but reinforced by pats on the back and pseudo-intellectual white papers. I don't know what the impact of that is on society, but I'll be curious to find out.
>> That's an interesting point, Chris, because I think a lot of the chatter or discourse around AI, especially when it comes to writing code, seems to come out of startup land. And I think there is a segment of the population that thinks that all software is only created by startups. You and I have spent our entire careers working with legacy organizations, older organizations, companies that have code that's been around for 30 years, 50 years. They have existing business practices where you can't move fast and break things. Breaking things results in outages that cost you millions and millions of dollars. How do we safely apply these tools in those environments? I think maybe another way I'd ask this is: maybe being on the bleeding edge isn't the best place to be in some cases.
>> Yeah. You know, I think we see a paradox, really, where there's been research that the companies that are really leveraging AI to realize value are organizations that enable teams to run kind of like startups, which is exactly the opposite of what you're saying: move fast, break things, push with bugs, and if someone sees it, just pull it back and fix it.
And a lot of organizations are just not set up that way, right? So there is almost an orthogonal shift that needs to occur in a lot of large organizations to truly benefit from agentic AI. It sounds like such jargony phrasing, but where I'm seeing that startup mentality really work is where these guardrails, policies, etc. are baked into the platform that developers enable themselves with for these agents. As an organization, you really need to create a seamless developer experience where they can go out, grab one of these agents, grab a key for Claude Code, whatever, and start developing, and all of the safety guidelines that need to be in place to prevent these million-dollar bugs are behind the scenes, not something that gets blocked through process or still having big code reviews, whatever it means. So there's a balance that needs to occur in a shift in large organizations.
>> Yeah. I would say one of the things we've been trying to do for a long time as part of shift left is: let's make the right thing to do the easy thing to do, so that you don't have to think about it, right? Let's make it so that when developers are doing their day-in, day-out work, the harnesses and validations and whatnot are baked in, so that we don't have to rely on "well, Chris will do the right thing, Chris is a good dev, he knows not to leave the front door unlocked." Yeah, maybe he forgets one night. If you bake it in, it's a lot harder to skip a step.
A few other threads I'm interested in pulling on here. You did mention this earlier: it does feel like we've hit some interesting limitations in terms of how we train these models. Again, we're not seeing that exponential jump where a new model comes out and just blows away what we had in the past. But I think it's when we hit these constraints that some creativity starts happening and we see some new ways of breaking through. We saw this with DeepSeek: hey, here's another way of training these models. It doesn't always have to be on the most expensive, fastest, baddest chips. What do you think is coming in that regard? What's the next step in how we train these models, and what do you think is going to, as Gary Marcus would say, break through that limitation on scaling?
>> We have seen that with what Mythos is, and to talk about that I'll go back in time just a little bit. Before LLMs, and I should say before transformers and this paper that I'm sure most people have heard of by now called "Attention Is All You Need," the prevalent model architecture really had to do with recurrence. We know that from software engineering, obviously, as recursive thinking: within a model, we have this unit that kind of unfolds itself as the task dictates.
So it can get more complex dynamically depending on what it's doing.
That idea was kind of lost when transformers overtook recurrent neural networks as the pervasive model architecture. What I mean by that is that it was just sort of gas pedal to the floor: let's see how far we can get with transformers by just kind of exploding the reasoning section of the model. Bringing us back to today.
I think what we're seeing with Mythos and other recent model advances, and, well, papers suggest that this is what's happening, and there's a codebase called OpenMythos you can go check out that kind of implements this, is that we're starting to inject loops back into the reasoning portions of these models, where, as a sentence (in the easiest case) is going through, the logic can actually pass back to itself to continue thinking about it.
That makes these models really hard to train, though, because before, we only had an n-layer model we had to worry about optimizing.
Now the permutations of looping back, and how many times you loop back, become exponentially more complex, and that's why until recently it really wasn't worth productionizing that architecture at scale. But with scaling limits, I think we're at the point where companies like Anthropic are starting to do that.
>> Yeah, it is always interesting in our industry to see, when we've run into the limit on something, what do we do to get around it? I feel like that's often where we get these really interesting jumps and steps forward that we couldn't have before. I think it was Ethan Mollick who wrote a piece about the jagged frontier of AI, where it seems like it's so good at some things. I mean, it really has gotten very good at coding. I think it's very clear that the code it's generating today versus even just a few months ago is pretty remarkable. But yet it still screws up other things. We all have our favorite example. I mentioned this to my wife the other day.
I said, "You know, you can't lick a badger twice." And she's like, "What?" And I said, "Oh, you don't know that reference." For a while there, you could put any idiom you wanted into Google and it would say, "Oh, yeah, that's a well-known idiom that means blah blah blah." And "can't lick a badger twice," you know. Although I guess, in fairness, all idioms are made up, so that's fine. But I'm curious, and I'd love your thoughts: what do you expect to see in the future, whether that's six months or a year? Where do you feel like that next frontier is going to be, where we're going to look and go, "Wow, it really did get good at this now"?
>> Yeah. No, that's a good question.
A big question as an industry or as a society. I don't know what the right topology is at this point, but certainly we've drawn a circle, as it were, a boundary in the sand, where some sort of semantic-based embeddings of knowledge plus an agent harness plus an LLM creates a really good proxy for a lot of knowledge-working tasks. So I really think when we see the next radical transformation, it's one of those three things that is going to have a transformative paper come out that is completely orthogonal to how we're currently thinking about these things. I don't want to diss the ingenuity of humans here, but I think at some point some of these transformations are going to be AI-driven.
>> Mm, for sure.
>> I have a hypothesis, and maybe this is from too many years of reading science fiction, that there's some sort of knowledge boundary where all of our AI advances to date are very much tracking against a biological way of designing intelligence, meaning this concept of neurons and brain-like structures.
At some point, I think AI is going to come up with something that we couldn't even have imagined. And that's what's going to lead to some of these transformations coming down the road.
>> Yeah, that's a really interesting point. I remember years ago, and I guess this would be more machine learning than what we truly consider AI today, reading about how we create the most efficient antenna for a satellite, or something having to do with a space probe. As humans, we're sort of drawn to symmetry.
So some of the designs that we come up with are not actually the best approach in terms of giving you the best antenna, but they look good, and we naturally gravitate towards that.
Whereas these learning models don't have that constraint. They'll actually generate the most efficient antenna even though a human looks at it and goes, "That's kind of ugly." But it's actually the best way to do it. To your point, I think we'll likely see something along those lines, where a human would not have made that jump or that combination because, well, obviously you can't combine these three things together. Everybody knows that. It's like, well, what if we just try it? Oh, look, it works. And then God only knows where we land. Of course, maybe that gets us into the book that came out, I think it was fall of '25, If Anyone Builds It, Everyone Dies.
>> I'll kick it to you to dive into that one. That's on my list. I haven't read it yet, though.
>> I think they make some interesting points about how you cannot possibly know the motivations of what is essentially an alien mind.
We think that we can bake it in: well, you like humans, keep us as pets at worst, right? The reality is we can't understand. We can't even really understand how it makes some of these connections. It's just these giant strings of numbers, and you're like, I don't know how it got there. I think as someone who has spent their life writing deterministic code, it's a little unsettling to have something where we don't know how it arrived at this answer, but it did.
>> Yeah. You know, an article on a blog on Hugging Face really caught my attention recently, which very much has to do with this whole Mythos situation, albeit coincidentally. It's about essentially a researcher in his basement with two GPUs. He wanted to see if he could make a model that could be at the top of the Hugging Face leaderboards with just those two consumer-grade GPUs he had in his basement.
What led him down this hypothesis, which eventually wound up being true, is that he was playing with different LLMs where he passed in Base64 as the input, nothing else, and the LLM understood it and could work with it perfectly.
That doesn't really make sense if you think about how these models are trained, which is in human-readable language.
The fact that you could put in Base64 and get a comprehensible output led him to believe that there's some combination of reasoning layers under the hood of these models that were doing different things, like translating Base64 into more representative meanings that the model could then work with. You can find this blog post, by the way, by looking up RYS, "Repeat Yourself," which was the name of his model. He started to basically do brain scans on these models, where he was taking different permutations of layers and then pointing at a layer and having it repeat itself or go back two steps. Using this, he could actually find reasoning-related layers within these models, where these layers were doing similar things, and by repeating them he could extend the thinking cycle of the model, and therefore the quality, without actually having to train a new model or significantly change the size of the model. So he basically just took Qwen 2 72 billion and added some of those loops we were talking about earlier with Mythos. And that model, I think, was on top of the leaderboards on Hugging Face for like a year, if not more.
It's a really interesting blog post.
If you go in there, he almost has MRI scans of the model showing how he went about figuring this out. It's a really interesting read.
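The layer-repetition idea described above can be pictured with a toy sketch. This is not the RYS implementation and not how Mythos works; it only assumes that a model can be treated as an ordered list of layer functions and that a chosen block of layers is run again before continuing. The function names and the toy layers are hypothetical.

```python
from typing import Callable, List, Sequence


def repeated_layer_order(num_layers: int, start: int, end: int, repeats: int = 1) -> List[int]:
    """Layer indices in execution order, re-running layers [start, end) `repeats` extra times."""
    if not 0 <= start < end <= num_layers:
        raise ValueError("need 0 <= start < end <= num_layers")
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    base = list(range(num_layers))
    block = base[start:end]
    # Run up to the end of the block, loop back over the block, then finish the stack.
    return base[:end] + block * repeats + base[end:]


def run_layers(layers: Sequence[Callable[[float], float]], order: Sequence[int], x: float) -> float:
    """Apply the layers to x in the given order, reusing the same weights on repeats."""
    for index in order:
        x = layers[index](x)
    return x


if __name__ == "__main__":
    # Toy "layers": layer k just adds k to its input.
    toy_layers = [lambda x, k=k: x + k for k in range(6)]
    order = repeated_layer_order(num_layers=6, start=2, end=4)
    print(order, run_layers(toy_layers, order, 0.0))
```

The model gets deeper at inference time without any new weights, because the repeated block reuses the same layers. Which block to repeat, and how many times, is the search problem the speaker describes.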
>> Well, that's fascinating. Because of where a lot of the attention has come from, where a lot of the money has come from, I think we've put a lot of attention on the big companies, the OpenAIs, the Anthropics, the Microsofts, the Googles, the Amazons, etc. And it is possible to do these kinds of things in your proverbial basement. I've got a friend who's got a three-GPU setup at home, and in addition to using it to keep his office warm in the winter, he does some pretty fascinating things on there. It does make you wonder: is that where the next breakthrough is really going to come from? Someone who's essentially tinkering in their garage and doesn't have some of the constraints, or the "well, we have to go down this path because the lead researcher says that's what we're going to do." It's, I'm going to go over here and play and see what happens.
>> Just like we saw with these scaling constraints, humans really get the most creative under constrained situations.
Similar to the transformation we saw with the Qwen models in their optimization technique, I think we will see similar transformations in the optimization space when these large model providers stop subsidizing tokens,
and people and organizations start to see the real cost of these models they're running.
>> I'm really glad you brought that up, because that's one of the things I'm starting to hear from folks. We've kind of been on these subsidized all-you-can-eat plans, and I've read some reports where developers on the $200 a month plan are costing the tool vendor 50,000 a month. I don't have an MBA, I'm just a techie, but I don't think it's a viable business model to charge 200 for something that costs you 50,000. That doesn't seem like a good way to make a profit. Right now, this whole industry has largely been subsidized by a lot of investment and a lot of money chasing this. But you can start to see the edges of that, where companies are going, "Hey, we've already poured an obscene amount of money into this. Show me the profits." What do you think happens when that switch flips and we're no longer using subsidized tools that are using subsidized models, and everyone says, "Actually, I'm going to have to charge you for what you're really using here"?
>> Yeah. One of two things is going to happen, and we're already seeing early signs at organizations where boards are starting to say, "Hey, where's my business value?" The first thing that might happen, I think, is that some organizations that haven't seen tremendous business value from AI are probably just going to throw their hands up and say, "All right, we're putting caps on everything. We're going to limit our spend here. We tried it. Wasn't for us."
Alternatively, I think a lot of organizations actually have an opportunity to maintain their value while cost-optimizing through self-managed model stacks, GPUs, their own foundational models, whatever that looks like depending on the use case.
And I'm stealing this anecdotal data point from Andy Nolan, another Thoughtworker: it's really for use cases around the 500k-and-up price point that self-hosting becomes the best option. So a lot of large organizations, we're going to see, are just going to go internal, but for the most complex of use cases, I think.
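The trade-off described above can be sketched as a simple break-even calculation. Everything here is an illustrative assumption: the function names are made up, the self-hosting cost is treated as one fixed amount per period (ignoring staffing, power, and utilization), and the per-million-token price is a placeholder rather than any vendor's real rate. The episode does not say what period the 500k figure covers.

```python
def break_even_tokens(self_host_fixed_cost: float, hosted_price_per_million: float) -> float:
    """Tokens per period at which hosted API spend equals the fixed self-hosting cost."""
    if hosted_price_per_million <= 0:
        raise ValueError("hosted_price_per_million must be positive")
    return self_host_fixed_cost / hosted_price_per_million * 1_000_000


def cheaper_option(tokens: float, self_host_fixed_cost: float, hosted_price_per_million: float) -> str:
    """Return which option costs less for the given token volume in one period."""
    hosted_cost = tokens / 1_000_000 * hosted_price_per_million
    return "self-host" if hosted_cost > self_host_fixed_cost else "hosted"


if __name__ == "__main__":
    # Hypothetical inputs: 500,000 per period to self-host, 10 per million tokens hosted.
    print(break_even_tokens(500_000, 10.0))
    print(cheaper_option(1e9, 500_000, 10.0))
    print(cheaper_option(1e11, 500_000, 10.0))
```

If hosted prices rise once subsidies end, the break-even volume drops, which is the dynamic the speakers expect to push larger organizations toward self-hosting.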
>> Oh, that's interesting. Yeah.
It'll be interesting to see how organizations respond to that. It's interesting to me that some companies are actually tracking people's token usage, but as a way of ranking you. I know some companies have leaderboards, and I was talking to a friend of mine where, whether this was explicit or implicit, it's like, if you're not using enough tokens, we're going to wonder why, as opposed to, hey, you're using a lot of tokens, we need you to cut back. I feel like this is yet another example of "just because you can measure something doesn't mean it's a useful metric." I would like to think the right way to look at this is: tell me how much business value you're generating out of those tokens, right? You and I have had that conversation before, where "I need lots of tokens."
"Why?" "Because I need lots of tokens," right? But how much business value are you generating with all those tokens?
It'll be interesting to see how that plays out.
>> Yeah, it very much feels like a hype metric. The token contest, I guess we'll call it, very much smells like kind of an NFT bro attitude or a Bitcoin bro attitude, which is, you know, YOLO, let's just throw coins at the problem until it fixes itself. I totally agree, though, that this is another tricky governance situation where ideally token use needs to be tied to an articulable business case with clear business value.
And if you don't have that, you either are not measuring the right thing or don't have the dashboards, KPIs, whatever it is, in place to measure the right thing, because, you know, token count is an arbitrary signal of nothing.
>> Yeah, it's the same as judging a developer by lines of code written or lines of code modified or lines of code deleted. Yes, it is measurable. That doesn't mean it's actually valuable or measuring what you think it's measuring.
And maybe that's another constraint that will get us to a better place, where instead of it being "burn as many tokens as you can"... You know, I shared with you something that someone shared with me on Instagram, the hypothetical CEO saying, I need you to spend a trillion tokens a month.
You know, how many tokens are you spending? Well, my agents are making other agents feel good. You know, like, oh, wow. That doesn't seem like a good use of money.
>> A poetry writing agent.
>> Yes. Yes. My agent's writing poetry, and then, yeah, judging it. So maybe that's part of it too, as we're experimenting, because there are a lot of experiments, because this is a brand new tool. And, you know, I think one of the challenges in software from the beginning is that we are often building things that no one has ever built before, using tools that we literally just invented. And so it takes some time. I suspect when we started building buildings of any significant size, we made some mistakes, and walls caved in, and people lost limbs or died, and we learned, oh, this is not the right way to do that.
This is the right way to do it. You know, this is the property of this. This is the math behind it. And I think we'll get there, but I think we need to have a little bit of patience and understand that we're going to have to try some things before we zero in on the best way to get there.
>> Yeah.
Historically, I think we also have seen with ML a lot longer runway before models actually reach the public consciousness, if they ever even do.
And what I'm thinking of is something like what Meta might build, like the Instagram algorithm. You know, they're refining those algorithms behind the scenes and training against those for years, and maybe that algorithm was a bad example, but for years before the public's really ever conscious of these things. And so with LLMs, it's been such a fast climb on the hype cycle that we have multiple compounding things that make this a very complex space: (a) user behavior, (b) actually getting the underlying LLMs to behave like we want, and then (c) getting the agent harnesses on top of that to behave like we want. And it's one of these situations where you turn one knob and the other thing goes crazy, and vice versa. And so, bringing it back to earlier, I think we're very much in this weird transitory growing-pains space.
>> Yeah, I definitely feel that as well. And I sort of swing like a pendulum between, you know, some existential dread, as I mentioned earlier, "if anyone builds it, we all die" (I don't really think that's a likely outcome, honestly), and some very optimistic things, right? I think about the kinds of applications, the kinds of software that these tools will enable us to build that we could not have built in the past. You know, I think back to learning to code with a literal text editor compared to the IDEs we have today, and I think, what an exponential increase in what we can do, in the complexity of problems we can solve. And I think the most positive outcome of this is the kinds of problems that we cannot solve without these tools, and what the benefit is to all of us from that.
You know, whether it's much better weather forecasts. I mean, think about how much better that's gotten in our lifetimes. It wasn't that long ago that you'd have to look at the back page of the newspaper and it would be like, yeah, it might rain today. And now, you know, my phone can tell me that the rain is stopping in 15 minutes. And it's usually right within plus or minus a few minutes.
Is there anything you see coming down the pipe that you're like, man, this is really going to be pretty cool, or, I can't wait to see how this impacts us?
>> I'm excited to kind of see the stabilization of these models and what I think will be their exit back behind the curtain again, where machine learning usually lives. Not that they're going to go away, but I think LLM-based agentic AI is very much going to become a sidecar to workflows and tasks, where once we've solved a lot of these issues (guardrails, harnesses, task attention, staying on task), we're not going to be as interactive with these things. And I think that'll do a couple of things. (a) We start to move away from these metrics like, you know, number of tokens spent, and get back to focusing on accomplishing actual tasks. And (b) I think it gets out of the way for whatever comes next, and it's very hard to predict what that is. You know, we've seen this even pre-LLM, with regular old transformer-based models. We've seen a lot of cool stuff come out of this space in terms of how it gets applied in medicine, with things like protein folding and drug discovery.
And so I think we might see something like that, which is going a little bit back to what we were talking about earlier: some transformative way of developing these models or representing knowledge that is not token-based, not spitting out tokens. I really think that's what's coming next, maybe in the next two years or so.
>> So, you mentioned this earlier and I do want to circle back to it. Anthropic announces Mythos and says, "Whoa, this is so scary. We can't release it." And then, you know, announces Glasswing, and says we're only going to release it to certain partners to play with it. And I know, at least anecdotally, that for the companies that had access to it, the NDAs were strict to the point where you couldn't even really admit to your coworker that you had access to it.
And then we discovered that if you were on the right Discord server and you went to basically the right location on the web, you could find it.
>> Any thoughts on like if it's that scary and that dangerous, shouldn't it have been harder to find?
>> Well, certainly a few things with that. I initially go back to, and I'm pretty sure we had the same perception, OpenAI doing the exact same thing for GPT-3 or maybe GPT-4, which was: this model is so radically different and intelligent that we cannot release it to the public. And then 3 weeks later there it was, and today we sit here and I would call GPT-3 trash compared to what else is out there. And so, this is a terrible analogy in terms of how it represents what I think of humanity, but, you know, handing a child a gun for the first time without teaching them about safety, whatever else, is always going to be a dangerous thing. And so maybe that's why they kind of slow-drip these models. At the same time, we know that Anthropic is pouring billions of dollars into this stuff, and this could be a really clever marketing play. Even, you know, this all came to be through a leaked press release that someone found buried in some code.
You know, we don't know if their product team put that there. I kind of doubt that. But I guess what I'm saying is that there are a million reasons that they could be keeping this model under wraps, and it could very much simply have to do with capacity: they don't want the public knowing that they're a little short on GPUs right now, and so they tell everyone this is too dangerous. So it's hard to speculate, is what I'm getting at.
>> Yeah, I think it's always wise to have some salt when you listen to some of these releases. And, you know, I think we always have to ask, is there some other motivation behind this? Whether that's to pump up a stock price, or whether that's, to your point, well, we don't actually have enough capacity to let everybody play with this, so we're just going to say it's too dangerous and constrain it to our partners or trusted friends to go ahead and look at. Yeah. So, I always think, you know, how does the phrase go?
Extraordinary claims require extraordinary evidence. So, we'll see.
We'll see.
Well, I'd like to wrap up, Chris, with the fact that there's so much that goes on in this space. And, you know, one of the things that I love about my job is I get to pick your brain on a regular basis, but how are you keeping up with all the changes in models and papers? Are there any resources you recommend? Anything where you would say, hey, if you really want to try to keep your fingers firmly on the pulse, here's how to do it?
>> We're in an age where that's a difficult question. You very much fall into AI slop quite quickly, and not necessarily in the content, but in how it's presented to you. I want to call it a legacy feature at this point, but on my Android phone, and I'm sure you can get to this through just the Google page too, they send me a curated stream of articles, and they do a great job of finding recent papers and things that I would find interesting.
LinkedIn, but you have to make sure you're following trusted sources, people you trust. And, you know, I'm certainly guilty of it myself, but I've seen claims on LinkedIn that don't always hold up when you read the underlying paper. So make sure you're following trusted people on LinkedIn. Reddit, Twitter. We've always had this problem, though, right? This is media literacy. Don't believe everything you see. And...
>> It's getting harder.
>> Absolutely. Yeah. Yeah. What about you?
Where are your go-to sources?
>> Same thing, right? I've kind of got this curated set of feeds that I follow, and that usually does a pretty good job of surfacing these things as they occur. And I think that's kind of it, right? You just sort of have to have those lines in the water and then react accordingly and adjust. But, you know, there are some very good people. I do read a lot of Gary Marcus. I've been following a lot of Ed Zitron as well, although Ed writes a lot. I mean, seemingly everything he puts out is 12,000 words, 16,000 words. I'm like, I really need the executive summary of this. And it's great. There's some amazing stuff in there, but it's like, all right, I've got to strap in. This is going to be a chunk of time I have to devote to it. But I think that's it. I think the other part of it, too, is just having these kinds of conversations. You know, I was at Arc of AI recently, and just talking to other presenters and attendees and having this back and forth is really valuable. Just, hey, what's working for you?
What's not working for you? What are you struggling with? Because I think we have to work together to figure this stuff out. And, you know, we'll get there faster as a community than we will each of us trying to do this on our own.
And so I would encourage you to engage in these conversations and see where that leads us.
>> Yeah, you know, that's a nice kind of conclusion to land on, which is, I think AI companions still cannot replace good old human interaction.
>> 100%. 100%. I think that is the perfect message to end on, Chris. And I want to thank you for hanging out with me, brother. A pleasure as always.
>> And I want to thank all of you for hanging out with us as well. I hope this was useful.
>> See you on the next one. Cheers.
>> Bye.