Crawshaw correctly identifies that AI agents need the full system sovereignty of a VM to safely experiment and fail. Containers are optimized for efficient deployment, but agents require the unconstrained, isolated environment that only a virtual machine can provide.
Deep Dive
agents need VMs, not containers
David, what made you build a cloud?
>> Oh, well, there was no cloud I wanted to use, so I had to go build our own.
That's the short story. The longer story is that my co-founder and I had spent quite a while exploring the AI product space, looking for things we could build around code generation. This was before Claude Code showed up and actually worked. So we spent a while prototyping products, including what you would call an agent-like product today, in the pre-Claude Code world, and we spent a lot of time on isolation machinery. We started with container isolation systems: whenever you asked it to do something, it would copy your Git repo into a fresh container, do the work there, and pull Git commits back out onto a branch in your Git repository. This was called Sketch, and it was a ton of fun. Then a bunch of other agents showed up and were easier to use, because the user interface to ours was Git: every time you typed in a prompt, you then ran a bunch of git commands to get into a specialized branch where the results were. It was a tricky thing to use. But what we learned from building all of that is that we really liked isolating agents. They work a lot better in some kind of universe of their own. And we built a lot of machinery for hosting that in the cloud.
And we realized that machinery was actually the most interesting thing we built. It's surprisingly hard to spin up a machine in the cloud today that does these things. So we thought we'd explore turning that into a product, and we kept following that path. Now we're at the point where we needed our own computers to get the shape of the machine right, and we needed our own DNS to get DNS names in fast enough given the fast startup time of the machines, and before you know it, you've built a cloud.
>> That's a cool story, but I wanted to ask you more about Sketch. That's what it was called before Shelley, right? >> The Sketch thing. So first, what's wrong with using Docker?
Yeah, we started with Docker, and it worked pretty well. It worked pretty well for our local development, since we designed it around our own needs. We thought this was great; containers are isolated enough, we said to ourselves. Where it went wrong came in two forms.
First, think about it as a development environment. We took it to companies to try, and we would get their engineering teams to use it as part of their development cycle. The first thing they'd do is try running the test suites from their CI in there. And you really want that with agents, right? You want to ask it to do a thing and then have it be able to run all of the tests, start the server, and try it, because that feedback makes agents really good at their job.
>> Otherwise, you have to sit there in the middle constantly telling it, "No, the test failed. Oh, the test failed."
And every company we took it to would run into issues with the containers. And "container" is not a well-defined term, right?
Even in the Docker universe, there's a whole set of capabilities you can assign to your container, which is essentially how the cgroups get sliced up under the hood to build your isolation environment on Linux. But people's test suites would include things like using Docker Compose to bring up some servers in the middle of the tests.
And you can't use Docker Compose inside Docker, for example. So at every company we went to, we would fix a series of these issues, then take it to the next company and try it with them, and each company would find new things that didn't fit inside a container. We realized these things are never quite the right shape and never quite powerful enough. What we actually need is a computer they can run these things on that looks like the computers people actually develop on, which means it needs to be a VM. If your tests need to create a network interface, let them create a network interface; give them root, let them do whatever they have to do. And as soon as you start saying that, containers don't really work at all; you need the proper isolation of a VM to make it happen. So that was the first big thing that got us there: we wanted to be able to run all of these test suites. We ran into test suites that install k3s as part of the setup, so k3s has to work, and it doesn't work even inside a gVisor setup.
The second thing we ran into is that we discovered we really liked long-running VMs, or long-running environments separate from a single session. We realized there was a whole lot of software where we had a very different life cycle for developing it: we were just treating dev as prod. We kept the thing around, started using it, and it was the actual product.
There was no deploy phase where we put it somewhere else. The development environment with the agent was good enough. >> So that's more for toy projects, right? This type of thing.
Yeah. Going back in time 25 years, we used to develop in prod a lot, actually. Part of this was a lack of modern best practices, but you would go into the cgi-bin directory, edit the Perl script right in there, do it live, hit save, and the next page reload on the Apache server would run the new script. And this was not just a Perl thing: in the Microsoft universe they had Active Server Pages, where you'd edit the .asp file and it would auto-recompile on the next load, and Java copied all that with JavaServer Pages. There were these whole universes where we developed live, and it worked really well for some things and really terribly for other things, which is why we stopped doing it. The set of software you want to develop in prod shrank to nearly zero between then and 2025. And now that we have agents that can develop live for you, the central thesis is that the set of software you want to develop directly in prod is no longer zero. There are some things you want to do that way.
>> That's a good point. You brought back a memory of mine. I did this in prod in 2018 with Ruby. I was working for a payments company back then, and we would run these nightly payment reconciliation scripts that would basically read the DB and format it into some sort of document that a payment provider expected in a certain way. And we definitely did edit in prod when we had minor issues: some minor bugs, some typo, something breaking. With Ruby and its type system, we would definitely sometimes edit in prod. I remember doing it. But back to the testing, because you really intrigued me there. I didn't know you couldn't run Docker Compose in a Docker container, although it probably makes sense. Are these system tests we're talking about, the ones running all these Docker containers?
I've never been very good with the ontology of tests, you know, whether it's an integration test or an end-to-end test or a system test or a unit test. I've never been very good at categorizing them. To me, there are tests, and I run them for my software.
And it's very good when your test just tests a small piece of your code, because then it's clear what fails when it does, but it's also good when your test actually tests the real database and the end-to-end nature of things, right? >> And these companies basically had a CI pipeline of some sort that would run everything, and you really want to give an agent the freedom to run whatever tests it thinks are appropriate for the problem. Sometimes that's all of the tests. So these were probably larger tests that were actually running real subsystems. I can't remember the details of some of their Compose files, but a classic thing to do in a Compose file is to start a Postgres and then connect to it. And that's actually super sensible: mocking out your Postgres in a test is just asking to miss some subtlety of how a SQL statement is parsed because you used a not-quite-Postgres. So I'm a big fan of doing things like that.
Generally, I think tests should be large enough that you don't fall into traps like that, so I'm quite fond of it all. From a technical perspective, I think Docker Compose couldn't start in a Docker container.
And this is actually worth someone checking me on. I think Docker Compose does its DNS names by doing something interesting: it intercepts the DNS packets on the network interface, and it uses some Linux API to do that. I think that API is not always available inside Docker, and Docker-in-Docker is also generally pretty tricky. It's a theoretical thing that can work, and in practice I think it gets caught on a lot of subtleties of nesting cgroups. >> Interesting. Yeah, my understanding of a system test is that it's some sort of end-to-end test that spins up the system as realistically as possible. In my experience, at least, you wouldn't usually run it in a PR type of environment. Maybe it runs as part of the CI for the PR, but usually I think it runs on the main branch, let's say nightly. And you bring up a good point: you probably do want agents to run these things now that they can really drive the whole process autonomously.
And it's probably something that's missing today, because I guess the usual workflow today with, let's say, Claude Code is that you run it, it iterates on the code, it runs the local tests, which are probably the system tests and maybe some integration tests, and then it puts out a PR; anything that's a little more demanding on resources it probably doesn't run. When I used to work at Confluent, we were developing Kafka, and all the tests on that project would take at least two hours on a good CPU. On my MacBook, especially once it got a little older, let's say three years old, the computer was barely usable while the tests ran; even Zoom was lagging. So I really didn't want to run tests locally most of the time unless I had to. Not to mention, imagine having three or four agents running their own sets of tests while developing separate branches; that would be completely unusable. So there's probably something to be said for being able to outsource the CPU, both to run the tests and to run the AI agent loop. >> Yeah, I've worked on a bunch of teams in my career with very different test setups. I've seen tests that take hours and run once a day. I've seen test suites that take minutes. I'm a big believer in testing as much as you can, as quickly as you can. So, for example, exe.dev doesn't have a nightly test suite.
We have the tests we run every time we push to main, and at this point those tests probably spin up thousands of VMs. If you try to run them on your laptop, you have to run them in a VM, so it's all nested VMs (assuming you're working on a Mac, that is, because you need a Linux base for all of our infrastructure), and it might take 10 minutes. That's why we now have a stack of very large computers in the cloud that we use for tests, so the tests get shipped off your laptop immediately if you're working locally. Honestly, I basically don't work locally anymore; I work inside exe.dev VMs. I believe our test suite currently takes about 100 seconds. That's sort of our magic number, and we just shard it out and use enormous computers to get it done as quickly as possible, because I really don't like waking up in the morning to find out that I broke some tests in the middle of yesterday. I want my development loops to be as tight as possible. And that is a lot of work: you really have to commit to building environments like that, it consumes a lot of engineering time, and it's ongoing engineering time, because as you work you add tests and everything gets slower. It's very painful. >> But it's a philosophy that works really well with agents, because it gives them all of the tools to have confidence in their work.
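Hitting a fixed wall-clock number like the 100 seconds mentioned here, by spreading a suite over many large machines, is at its core a scheduling problem. Below is a minimal sketch of one common heuristic, greedy longest-first assignment; it is an illustration under assumed per-test timings, not a description of how exe.dev actually splits its suite, and the test names and durations are hypothetical.

```python
import heapq


def shard_tests(durations: dict, workers: int) -> list:
    """Greedy longest-processing-time-first: each test, longest first,
    goes to whichever worker currently has the least total work."""
    heap = [(0.0, w) for w in range(workers)]  # (assigned seconds, worker id)
    shards = [[] for _ in range(workers)]
    for test, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)          # least-loaded worker so far
        shards[w].append(test)
        heapq.heappush(heap, (load + secs, w))
    return shards


def wall_clock(durations: dict, shards: list) -> float:
    """The suite finishes when its slowest worker does."""
    return max(sum(durations[t] for t in shard) for shard in shards)


# Hypothetical per-test timings in seconds.
timings = {"vm_boot": 90, "dns": 40, "proxy": 35, "billing": 30, "api": 25, "cli": 20}
shards = shard_tests(timings, workers=3)
print(shards, wall_clock(timings, shards))
```

However many workers you add, the suite can never finish faster than its single longest test (90 seconds in this made-up example), so breaking up slow tests matters as much as buying bigger machines.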
>> Interesting point. So I'm trying to wrap my head around exe.dev so far. You've said that the motivation was mainly to make it AI-agent friendly for development purposes. Is that a correct take? >> That's where we started, yeah. One of our theses is that whatever is developer friendly is AI friendly. And there's a kind of mechanism here, which I'll posit: the models we use in agents are trained on developers.
Transcripts of sessions of developers driving agents and using tools are used to train them, so they're very good at doing things the way we do them. So environments that are good for us are good for them.
And if they have to spend a quarter of their context window reading the docs to figure out how to use the cloud API to list the VMs that are currently running (and some cloud APIs are not trivial), then that's bad for agents. And we knew it was bad for us as developers, right? I don't remember the commands for listing my VMs on AWS. It's like four lines long; I have it in a shell script somewhere. jq is involved. I have to pass some JSON to make it work. It's involved.
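To make the "jq is involved" complaint concrete: listing running VMs on EC2 means walking a nested response (reservations, then instances, then a state object, then a list of key/value tags). The sketch below flattens a hand-written sample in the shape returned by EC2's DescribeInstances (as printed by `aws ec2 describe-instances` or returned by boto3's `describe_instances`). It assumes the response has already been fetched, and the sample IDs and tag values are made up.

```python
def running_instances(resp: dict) -> list:
    """Flatten a DescribeInstances-shaped response into (instance id, Name tag) pairs."""
    found = []
    for reservation in resp.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            if inst.get("State", {}).get("Name") != "running":
                continue  # skip stopped, pending, terminated, ...
            # The human-readable name lives in a list of {"Key": ..., "Value": ...} tags.
            name = next((t["Value"] for t in inst.get("Tags", []) if t["Key"] == "Name"), "")
            found.append((inst["InstanceId"], name))
    return found


# Hand-written, heavily trimmed sample in the documented response shape.
sample = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-0aaa", "State": {"Name": "running"},
             "Tags": [{"Key": "Name", "Value": "dev-box"}]},
            {"InstanceId": "i-0bbb", "State": {"Name": "stopped"}, "Tags": []},
        ]},
        {"Instances": [{"InstanceId": "i-0ccc", "State": {"Name": "running"}}]},
    ]
}
print(running_instances(sample))
```

Every layer of that nesting is something an agent has to rediscover (or read docs for) before it can answer a question as simple as "what's running?".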
>> And we've always just said, OK, that's bad for me, but I put it in a shell script and I'll stop thinking about it. But our agents think about that stuff every day when using those systems, and it burns valuable context window that they can't use for solving our problems. So that's our thesis: we should build whatever is best for developers. And I think traditionally clouds have not been built to be what's best for developers, because I don't think developers are the ones who buy large amounts of cloud resources. If you have a $100 million contract, you're negotiating for an enterprise, and you negotiate it based on who gives you the most CPUs for the fewest dollars. If your engineers have to add two lines to the command they use to list things, no one cares.
At least, that's how it was traditionally. Now, if we want our agents to work really well, we suddenly do care. >> Yeah, I remember Azure's CLI was probably the most ridiculous; I just remember having the longest commands ever to run there. But again, I would ask you: what's wrong with modern clouds? OK, let's say we burn a few more tokens and the API is harder to understand. I'm sure the agents, well, I'm not sure, but they must know it, because presumably they've been trained on quite a lot of it, and even if they fail intermittently right now, I guess that might get fixed as the models improve. But fundamentally, what do you think is lacking from current clouds? Because it's quite the endeavor to build your own cloud.
>> Yeah, I think an agent can figure it out, right? You absolutely can figure out any cloud with an agent, and there's an argument to be made that we'll just throw endless amounts of tokens at the problem, and then it doesn't matter if all of the underlying stuff is no good. The question is about token efficiency. >> And in a sense that could be dollars, and I'm not actually particularly interested in that, because I don't know where that's going to land. The first chips really designed for LLMs are busy taping out right now, and we'll see at the end of this year. Every chip we have today that's used for LLMs was actually designed for 10 other projects; it's some sort of amalgam. It takes a long time to build a chip. So how expensive tokens are going to be in two or three years is completely unknown to me. How big models are going to get, how sparse they'll get, how much compute will be involved: it's a mystery.
But what is clear right now is that models have pretty limited context windows. Even these million-token context windows never really work; you never really want to get beyond halfway through the token window, >> the context window. Every token that you're constantly wasting doing something is eating that context window, and that's intelligence that is no longer being used to solve your problem. And I see this driving agents all the time, right?
If they get distracted trying to solve some inane problem, say the way the test was laid out makes it very hard for them to figure out what went wrong, then they'll lose track of the bigger issue I was trying to solve. For the last couple of days I've been working on spam detection machinery, which gets pretty subtle and is very easy to lose track of as a human, and it turns out it's very easy to lose track of as an agent too. They constantly go off and get distracted. Every second spent dealing with a cloud is a second not spent solving the real problem, and, more importantly, it distracts you from that problem. Maybe there are technical solutions to that. I've seen a lot of experiments with subagents, where agents spin off copies of themselves so they don't burn the main context window. Maybe someone will solve this problem and suddenly that thesis won't hold, but I haven't seen a solution yet. >> Is that the main thesis, though? >> Well, like I said, it was in two parts. There's the thesis that we can build something that's better for developers and thus better for agents, and that's true. But then there's the other side of it, which is that the environment developers want to work in looks a little different than it used to. It's an environment where I want a VM to come up to run my experiment in, I want it to have TLS on it automatically, and I want it to have an auth proxy in front of it automatically. Whereas if I start an EC2 instance today, it's immediately dropped onto the internet with some port filtering in front of it, which is not at all what I want. I don't want SSH exposed directly to the internet, and I want the web server accessible only to me and my team. That sort of machinery is the development environment I want.
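As a rough illustration of the auth-proxy behaviour described here (the web server reachable only by you and your team, nothing exposed by default), this is a minimal sketch of the per-request decision such a proxy might make. The session table, the `VM` record, the status codes chosen, and the default port are all hypothetical; a real proxy would also terminate TLS, redirect to a login page, and actually forward the request.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class VM:
    owner: str
    team: Set[str] = field(default_factory=set)  # other users allowed in (hypothetical)
    port: int = 8000                              # hypothetical port the VM's web server listens on


def authorize(host: str, session: Optional[str],
              sessions: Dict[str, str], vms: Dict[str, VM]) -> Tuple[int, Optional[int]]:
    """Return (HTTP status, upstream port); the port is set only when forwarding."""
    vm = vms.get(host.split(":")[0].lower())  # route on the Host header, minus any :port
    if vm is None:
        return 404, None                       # no VM behind this name
    user = sessions.get(session) if session else None
    if user is None:
        return 401, None                       # not logged in (a real proxy might redirect to login)
    if user != vm.owner and user not in vm.team:
        return 403, None                       # logged in, but not the owner or on the team
    return 200, vm.port                        # allowed: forward to the VM's web server


vms = {"demo.example.test": VM(owner="alice", team={"bob"})}
sessions = {"tok-alice": "alice", "tok-bob": "bob", "tok-eve": "eve"}
print(authorize("demo.example.test", "tok-bob", sessions, vms))  # teammate
print(authorize("demo.example.test", "tok-eve", sessions, vms))  # stranger
print(authorize("demo.example.test", None, sessions, vms))       # another browser, no session
```

The design choice the conversation points at is the default: a request with no valid session never reaches the VM at all, instead of the VM being on the open internet behind port filtering.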
And it's hard to come by. There are these people on the internet who love telling people to use VPSes to solve problems, and I'm actually a big fan of that, because I think they understand something about programming that gets lost when you go too deep into managing complex clouds.
>> Yeah.
>> But they also have a list of things you should do the moment you start your VPS. It's like 30 things long. >> One way you could look at exe.dev is: what if we just did all that stuff for you out of the box, so that it just worked? >> That is good. I think there is space for that. We're seeing more talk about Hetzner on Twitter and everywhere.
I think there is something to be said for having your own machine, basically. You have AI agents; you've democratized a lot of that operational knowledge to AI, and I think it could probably be outsourced, at least for toy projects. I tried exe.dev. I love it. I think you've done really well on the UX, both the signup and the general usability. I uploaded my own project there and had it hosted in no time. I don't know if everybody watching knows how it works, but as a TL;DR: you have your own proxy in front of the machine, which somehow logs me in automatically from my browser. If I open the VPS in my browser, because I'm logged into your website, it opens whatever port I'm exporting, let's say some web server I coded, whereas if I open it in another browser, it doesn't. These sorts of things are very useful for sure. And where I thought you might see use, or where it might be cool: in this AI world we're living in, I see companies adopting AI coding overnight and spending millions on tokens, and I feel like product managers are changing their roles; their workflow now is to demo something instead of actually writing up requirements and ideas.
>> Mm-hm. >> And I feel like the number of, for lack of a better word, toy projects, projects that are some sort of demo or showcase or trial of something, has exploded. And I think you're right that there's not a very easy way to share them. If I vibe-code something today, how do I show it to my friend? How do I show it to my teammate?
How do I show it to my whole team? It's quite a lot of work. >> Yeah. One way I like to describe the pricing model of exe.dev: I used to have an Apple Note (actually a whole series, because they would get long and I'd start a new one) with a bunch of bullet points in it, one-liners of things I wish I could build. For everything I build there are about 10 things I wish I could build; I always have way too much stuff to build, which is why I like agents, right?
They let me build a bit more stuff. And the idea in my mind is: instead of writing a one-liner into an Apple Note while I'm walking down the street, I paste it into exe.dev/new, and some of the time the thing I want actually appears, or something very close to it. Maybe it's 25% of the time; the rest of the time the agent goes off the rails and doesn't understand at all what I was trying to build. But there's that dream of just seeing whether we can build the software right off the bat. Again, the set of software you can build that way is no longer zero. Most things still don't work, but it's improving. And to make that work on a Hetzner-style system, I would have two options. For each one of these one-liners, I could go and buy a new machine on Hetzner and pay some money.
Or I could take a Hetzner box, install Docker on it, carefully divide it into containers, and install a reverse proxy that maps various domain names into it. That machinery we're describing, splitting up a Hetzner box, is basically exe.dev; that's its whole purpose.
And this is a very weird, economist's sort of way to pitch a cloud, but the goal is that the marginal cost of starting a virtual machine on exe.dev is zero, because you've purchased compute resources from us. You bought some amount of CPU and RAM, and within that you run as many VMs as you like, and they share those resources. So you've got 10 ideas today, you create 10 VMs. You look at seven or eight of them, they're terrible, and you never look at them again. Two of them you keep poking at occasionally, and they consume almost no resources almost all the time, except when you're looking at them. So it's fine. That's kind of the pitch.
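The pricing idea can be sketched as a toy model: the bill is set by the CPU and RAM pool you buy, and VMs only draw on that pool while they're doing something. Everything here (class and field names, the price, the assumption that an idle VM uses nothing at all) is a hypothetical simplification, not exe.dev's actual accounting.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Account:
    """Hypothetical model: you buy a pool of CPU and RAM, not individual VMs."""
    cpus: float
    ram_gb: float
    monthly_price: float                       # hypothetical; fixed by the pool size
    usage: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def create_vm(self, name: str) -> None:
        self.usage[name] = (0.0, 0.0)          # simplification: an idle VM uses ~nothing

    def set_usage(self, name: str, cpus: float, ram_gb: float) -> None:
        self.usage[name] = (cpus, ram_gb)      # current draw of one VM on the shared pool

    def in_use(self) -> Tuple[float, float]:
        return (sum(c for c, _ in self.usage.values()),
                sum(r for _, r in self.usage.values()))

    def fits(self) -> bool:
        cpus, ram = self.in_use()
        return cpus <= self.cpus and ram <= self.ram_gb

    def bill(self) -> float:
        return self.monthly_price              # independent of how many VMs exist


acct = Account(cpus=2, ram_gb=8, monthly_price=20.0)
for i in range(10):                            # ten ideas, ten VMs
    acct.create_vm(f"idea-{i}")
acct.set_usage("idea-3", 1.0, 2.0)            # the one you're poking at right now
print(acct.in_use(), acct.fits(), acct.bill())
```

Adding an eleventh VM changes `in_use()` only once it starts doing work, and never changes `bill()`: the "zero marginal cost" property in miniature.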
>> That's a good pitch. I do like that model; I haven't seen it before. It seems like today, if you buy a VM, it has to have some minimum number of vCPUs, and as you say, a lot of projects maybe need one vCPU for 30 minutes today and then nothing for the next 23 hours. I really like that split; I'd never thought about it much. I think you're right: you could potentially do the same thing on a Hetzner box or an AWS EC2 box, but there are still a lot of things to set up and build. It's pretty bespoke, right? To split it up into containers, you have to run the proxy.
>> It's actually much harder on EC2, because I don't think they allow nested virtualization, and you're already a layer in. Whereas with Hetzner, you could buy a bare-metal box, and then you don't even need nested virtualization, which is great. You could set up a VMM, you could get Firecracker running, all this stuff. You can have a lot of fun on those things, and look, I'm all for having fun, right? Build stuff. But I also like things that are easy and just work. >> 95% of people do not want that. I saw some comments on Hacker News saying that technically you could do it, and if you want to, go ahead, but I'm sure 95% of people don't want to deal with that at all. So with exe.dev, are you running your own boxes, or are you running on top of, let's say, AWS?
Yeah, we actually started on AWS, with metal boxes, and we did that for expediency. A great thing about the cloud is that there are resources on tap. If I need five machines from AWS, they don't even blink; you just have five machines. It's great. I don't want to be down on the hyperscale clouds: they've achieved a scale that is astonishing, and with that they've achieved all sorts of wonderful things. The issue with AWS is mostly the price. It's very expensive to run on AWS when you're small. If you're very large, the deals become amazing, but we're not even big enough yet to think about deals like that. So we had an inherent price push towards doing something else. That's when we started approaching the white-glove services for servers running in data centers, so we started there, getting machines racked outside the hyperscalers, and now we're talking to people; we're busy building some racks of machines. The advantages beyond price are twofold. Well, one is also a price advantage, around networking: the clouds have very expensive networking, and you don't really realize how expensive it is until you get outside the cloud and zeros disappear from the prices, and you think, wow, that's something. We want to pass on as much of that as we can to our users. Our prices are actually still pretty high by comparison, because we're trying to build some interesting global anycast machinery, but if we can make prices cheaper, we will.
The second thing is the shape of machines. We think that, especially for dev environments, there are machines that are hard to get on clouds today, that are really interesting, and that are actually easier to build if you have your own machines. The big thing here is disk. By default, clouds don't put a lot of NVMe in machines; they want to push you to remote storage. Historically, this made a lot of sense when we were using hard drives. Putting hard drives behind a network was no big deal at all: a seek on a hard drive took 10 milliseconds, so 1 millisecond for your Ethernet round trip was fine. Switching to SSDs was an issue, though, because now the access time is 10 microseconds and you still have a 1-millisecond Ethernet round trip.
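The arithmetic behind this point is easy to check. Using the round numbers from the conversation (about 10 ms per hard-drive seek, about 10 µs per SSD access, about 1 ms per network round trip), the sketch below computes what share of each random I/O is spent on the network, and what one-request-at-a-time IOPS looks like locally versus remotely. The figures are the speaker's illustrative values, not measurements.

```python
def network_share(device_s: float, rtt_s: float) -> float:
    """Fraction of one random I/O's latency spent on the network round trip."""
    return rtt_s / (device_s + rtt_s)


def qd1_iops(latency_s: float) -> float:
    """IOPS with one request outstanding at a time (queue depth 1)."""
    return 1.0 / latency_s


RTT = 1e-3  # ~1 ms Ethernet round trip, the round number used in the conversation
for label, device in [("HDD seek", 10e-3), ("SSD access", 10e-6)]:
    print(f"{label}: local {qd1_iops(device):,.0f} IOPS, "
          f"remote {qd1_iops(device + RTT):,.0f} IOPS, "
          f"network share {network_share(device, RTT):.0%}")
```

The disk didn't get slower: the network round trip went from roughly a tenth of each hard-drive I/O to nearly all of each SSD I/O.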
So the remoteness starts to dominate the random reads and writes, which is no good. They do have some instances with NVMe in them, but those are shaped for very specific needs, and the big need is running traditional databases. You can really see this in how these things are laid out. Say you start a Postgres cluster on big metal specialized instances on AWS: a primary and two or three replicas. The primary fails, you fail over to a replica, you bring up a new machine, and you spend the next 8 to 10 hours filling hundreds of terabytes of data into its NVMe drives. That's fine; it works really well, actually, and the product's good. But one of the interesting things about it is that if a machine in AWS has a disk failure, they fail you over to another machine: one disk failure means all of your NVMe drives are wiped and you have to restore from remote, which takes hours because these are large drives. That's not a good model for us at all. A better model for us is a RAID 5-like setup on the NVMe drives, so that if a disk fails, an ops person can swap it out while the machine keeps running, rather than having to move everyone over to another machine on a single disk failure and run in a degraded mode from remote storage for 8 hours while we fill the NVMe.
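The reason a RAID 5-like layout lets the machine keep running through a single disk failure is parity: each stripe stores the XOR of its data blocks, so any one missing block can be recomputed from the rest. A minimal sketch, assuming equal-sized blocks and a fixed parity position (real RAID 5 rotates parity across disks, and this ignores writes, caching, and rebuild I/O entirely):

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data: List[bytes]) -> List[bytes]:
    """Data blocks (one per disk) plus one parity block: the XOR of all data blocks."""
    return data + [xor_blocks(data)]


def rebuild(stripe: List[bytes], failed: int) -> bytes:
    """Recover the block on the failed disk by XOR-ing every surviving block.
    This works because the XOR of a whole stripe (data plus parity) is all zeros."""
    return xor_blocks([blk for i, blk in enumerate(stripe) if i != failed])


stripe = make_stripe([b"disk0data", b"disk1data", b"disk2data"])
print(rebuild(stripe, failed=1))  # the contents of the "failed" second disk
```

Because any single block is recoverable on the fly, reads keep working while the failed drive is swapped and its contents rebuilt, instead of the whole machine being abandoned.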
>> I see what you mean. You're saying it fails all the drives, across users? Is that what you're saying?
>> Yeah. If we run a machine with 10 NVMe drives in it and a physical drive fails, we can keep the machine running, have an ops person swap the drive out, and bring the RAID back up >> while it's running. Because from a single user's perspective, let's say you're on AWS, if your disk fails, it's just one EC2 instance's disk that has failed for you, right?
>> Yeah. Say you had an EC2 instance with 10 disks in it, like one of these big ones. >> Okay.
>> Yeah. And if one of the disks fails, they fail you over to another EC2 instance, and all your disks are gone.
>> No way.
>> I mean, it's fine for your traditional Postgres replica, because you just bring it back up, and you get the machine back instantly, which is really nice. You don't wait for an ops person. So it's a fine model for the cases they run, but it doesn't work for the cases we run. And there are very few machines on AWS with these big NVMe drives in them. Most people use remote storage, and the problem there is IOPS, because of the long Ethernet turnaround >> and I don't know if you >> and the price too, the price too.
>> Yeah. If you want 200,000 IOPS on an EC2 instance, you pay something like 20 grand a month. It's expensive.
>> You pay a lot. Yeah.
>> Yeah. And it's actually not unreasonable that it costs that much, because it's really hard to achieve those IOPS over a network.
It's a technically hard thing you've asked for. But it's kind of weird, right? Because my laptop does 500,000 IOPS. So you have this odd experience: if you write software locally, it's fast and good; you ship it to the cloud, and it's slow. What's going on?
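One way to see why 200,000 IOPS over a network is "technically hard" is Little's law: the average number of requests in flight equals throughput times latency. The sketch plugs in the conversation's figures; pairing the laptop's 500,000 IOPS with the 10 µs SSD access time mentioned earlier is an assumption made for illustration.

```python
def in_flight(iops: float, latency_s: float) -> float:
    """Little's law: average outstanding requests = throughput x time per request."""
    return iops * latency_s


# Figures from the conversation: 200,000 IOPS over ~1 ms network round trips,
# versus a laptop doing 500,000 IOPS against a local SSD at ~10 microseconds (assumed).
print(in_flight(200_000, 1e-3))    # network-attached volume
print(in_flight(500_000, 10e-6))   # local NVMe
```

That works out to roughly 200 outstanding requests for the network volume versus about 5 for the local SSD: the remote volume has to keep about 40 times as much work in flight to reach a lower throughput, and the application has to be written to generate that much concurrency.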
>> Yeah. And not only that, SSD and hard drive prices have been falling off a cliff over the last 10, 15, 20 years.
>> Yeah, with an uptick in the last 12 months, unfortunately.
But yeah, I think there's a bit of path dependence: they built those clouds around hard drives, and redesigning them around SSDs is hard. There's also an inherent advantage to the cloud providers in having all of the disks be remote. Think about all the instance types and all the different dimensions of the instance types: for every instance type there's the general case, the network-optimized, the CPU-optimized, the RAM-optimized, and all these variations. If you started using local disk for everything, you'd have the same problem again: for each one of those, you'd need a version with a lot of local disk and a version with very little.
>> Yeah. So you add an entire dimension to the SKUs they sell, and they already have a lot of instance types, so the logistics of it would be terrible for a cloud. We think it's worth paying that logistics price at our small scale, having a lot of NVMe, because we think we can build a better product doing it. >> I think AI especially should help you choose between these CPUs, right? It should help you manage the cognitive load of deciding which CPU to choose out of, let's say, 1,000.
>> Oh, for the users, absolutely. AI is great at that; it's great at building spreadsheets for you. But from AWS's perspective, they have to manage all those different machines. They have to figure out how many of each one to buy.
That's a hard problem.
>> Yeah, that's true. But from that point of view, there are virtual machines, right? So presumably they buy physical machines that they virtualize on top of. They don't buy hardware per VM SKU.
>> Yeah. I think the problem, again from their perspective, is that there are some customers who want 256 vCPUs, a terabyte of RAM, and 100 GB of disk. That's what they want. And if you've gone and built a bunch of machines that instead have 10 terabytes of disk, what do you do with the rest? You can expose it remotely over the network, and that's kind of the great thing about remote disk. But if you don't, it's wasted, and then you have to charge them for it.
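The stranded-capacity problem described above (a host built with 10 TB of local disk, rented to a customer who wants 256 vCPUs, a terabyte of RAM and 100 GB of disk) can be illustrated with a toy first-fit placer. The host shape, the first-fit policy and the definition of "stranded" (disk left on a host whose CPU or RAM is used up) are simplifying assumptions for illustration, not how any cloud actually schedules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Resources:
    vcpus: int
    ram_gb: int
    disk_gb: int

    def fits(self, other: "Resources") -> bool:
        return (self.vcpus >= other.vcpus and self.ram_gb >= other.ram_gb
                and self.disk_gb >= other.disk_gb)

    def minus(self, other: "Resources") -> "Resources":
        return Resources(self.vcpus - other.vcpus, self.ram_gb - other.ram_gb,
                         self.disk_gb - other.disk_gb)


def first_fit(hosts: List[Resources],
              requests: List[Resources]) -> Tuple[int, List[Resources]]:
    """Place each VM request on the first host with room for it.
    Returns (number placed, free capacity left on each host)."""
    free = list(hosts)  # new list; the input Resources objects are never mutated
    placed = 0
    for req in requests:
        for i, host in enumerate(free):
            if host.fits(req):
                free[i] = host.minus(req)
                placed += 1
                break
    return placed, free


def stranded_disk_gb(free: List[Resources]) -> int:
    """Local disk that can no longer be sold as local disk because the host's
    CPU or RAM is used up (a simplification: real schedulers use finer rules)."""
    return sum(h.disk_gb for h in free if h.vcpus == 0 or h.ram_gb == 0)


if __name__ == "__main__":
    host = Resources(vcpus=256, ram_gb=1024, disk_gb=10_000)  # 10 TB local NVMe
    big_vm = Resources(vcpus=256, ram_gb=1024, disk_gb=100)   # the customer in the example
    placed, free = first_fit([host], [big_vm])
    print(placed, stranded_disk_gb(free))  # 1 9900
```

The 9,900 GB left over is exactly the "what do you do with the rest?" capacity: it can only be recovered by exporting it over the network, which is the trade-off the speakers are discussing.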
>> And hence the many-instance-types problem, and then the many kinds of machines in your data center. They have real hard problems. >> I get it. Yeah, it basically removes one dimension from their balancing problem.
>> Yeah. That's right.
>> Which can be a big effect, actually, >> at scale. >> For them. Yeah. It doesn't matter to us as users, right? We just want the thing that works.
>> But what you mentioned about price is super important, too: this notion that you get a bad deal if you're small or medium scale, while at large scale you can negotiate and get the prices down. There's a specific word for it, I forget the name, but there are negotiations where they give you a discount on whatever you're buying, and they can be very large discounts. Unfortunately, I think all of them are under NDA, so there's almost no public information. But I've heard of cases where you can get, let's say, a networking discount of up to 90%.
Yeah, I don't have any data on the sorts of discounts you can get, but I know they have the margins to give them, and the simple mechanics of large-scale vendor purchases mean those discounts have to exist. If you spend 100 million a year on cloud and you're on Azure today, and you go to GCP and say, "Hey, can you beat this price?", they will, because they'll want you as a customer. So if you're willing to move clouds, you have the negotiation leverage to get the discounts. This is the same as buying anything at large scale from any vendor, so they must give out great discounts to very large customers; we can just infer it. >> Yeah. And another topic on costing: over the last two years I've seen a trend of people moving away from the cloud for price reasons. DHH talked about this very publicly, and other people agreed, because I think we've reached a tipping point where the cloud is just too expensive for what you're buying. The hardware keeps growing in capability and the price of, let's say, a vCPU keeps decreasing, but I don't see cloud prices moving at all. >> Yeah, that's a good question. I don't actually know the general case. Take DHH: I haven't read his posts, but I assume he spends single-digit millions on cloud per year.
>> I think they spent 2 million. >> I don't think that's enough money to get a good discount out of a cloud. They're just not going to do it. So he's the perfect example of the sort of lost-in-the-middle company that could be served by someone who wants to build for that without the crazy margins. In a sense, that's the ideal place for a business like exe.dev to get to. I think we're a long way from being able to help him.
>> But that's the dream.
>> And the main pushback I see when I've talked about this is, "Oh, well, AWS gives you all of these services, all this autoscaling and auto-restart and whatever else." I think that's a little bit overplayed. I have my own opinions, but I want to hear what you think about this.
>> Yeah, I agree with you. I think it's overplayed. I would say the one magic thing AWS really does have is that they've always got another computer. >> Right.
>> Whatever's going on, there's another computer. It's on tap.
And as a person who's busy racking machines and buying them, I'm like, "Oh, I need another computer. Cool, six weeks lead time." It's a radically different experience when you press the yes button and six weeks pops up instead of a 30-second start time. So huge credit to AWS for building something like that; it's an immensely hard problem. But beyond that, if you go to their website, there are something like 800 services listed. Almost none of those are useful to me, and they've become less and less useful. It's mostly a list of things that people thought were useful five years ago. And the few that are useful are now very easy to do in a variety of ways with agents and the open source stacks we have. There are a couple that are great and that I really admire.
>> Yeah, the cloud UX is ridiculous. The UIs are ridiculous to navigate, and agents experience the same thing, as you say. The CLIs and the APIs are, I guess, sort of a byproduct of this overcomplication.
>> Yeah. But if you look at all the managed database machinery on AWS, I don't think I want to use any of their managed database products, mostly because they're the wrong shape for the problems I have. If I had to run Postgres on AWS today for a company, and I had to use only AWS, I would just do it myself. I wouldn't use RDS. I wouldn't use Aurora or any of those things.
>> I think you're in the minority here.
>> I probably am in the minority, but I've also seen RDS fail catastrophically. >> Yeah.
>> Which, until you've experienced it, you don't really appreciate the horror of. I would love someone to manage my database for me; I just don't think those products are the answer. There are other products out there that might be. I'm very happy with the idea of someone managing my database for me; I just don't think they have the products for me. And if I look through their catalog, there are a couple of things they do really well that I would not want to do myself. For example, they have an email-sending service.
>> Sending email well is actually really hard. It requires a lot of care and attention to the quality of your IP addresses and to how other vendors are talking to you. It's a big data problem. I think the Amazon service that does that does it really well, and I would happily be a customer of it. In fact, maybe I should be; I should go investigate it. And then, classically, there's S3, which Amazon does quite well. It has always been a funny shape to me and I've never been a huge fan of it, but I have to admire the fact that it works, and I totally see why people use it. >> Yeah, S3 is an engineering marvel.
Yeah. I think a lot of people don't want to host their own database and their own data infra, partly out of a psychological fear that it's so complex, and partly to be able to point the blame at somebody. >> Avoiding responsibility for data infra that costs your business millions if it goes down is, I think, another appreciated aspect. I worked for Confluent, which is a managed Kafka offering, so Kafka SaaS, and I saw a lot of this. A lot of people do want you to run their Kafka, and what all these vendors do day in and day out is market how hard open source is to run and how hard it is to manage, when in reality there's nothing that hard about it. There is a learning curve: of course you need to know what the configs mean and how the service works, but is there anything fundamentally difficult about the software? >> Yeah, I agree with you, and I'm actually really curious what agents are going to do to that in the future. Using agents to crash-course myself in how a system works is extraordinarily effective. Of course, I've got 30 years of background in a bunch of systems, so learning the next system isn't too hard. >> But it's amazing, right? I've never actually run Kafka myself, but how long would it take me to figure it out with an agent? Maybe if I sat down and worked hard for a week, I would understand a lot of its failure modes and be confident running it myself. That wouldn't have been true five years ago.
>> Yeah, definitely. It democratizes knowledge like nothing else, because the knowledge of how to run Kafka is somewhere out there on the vast internet, spread across multiple articles.
You just need somebody to go over all of them. Humans usually do this by spending five years in the industry reading whatever they come across, >> but an agent can pull it together in an hour.
>> Yeah. Back to RDS, just because, again, I've seen it fail. The scary thing about RDS is that they're running Postgres for you, but you don't get direct access to Postgres. So there are all sorts of failure modes where you know exactly which button you could push to fix it, but you can't. And again, if you're small and you email them, some hours later they might look at it.
>> It's a very nasty setup. Again, I agree that letting someone else manage it for me is a great idea, but now that I can go on the internet with the help of agents and figure out potential solutions to Postgres problems, the idea that I couldn't get to my Postgres is terrifying. I want to fix it. >> So, to play devil's advocate, there is a sort of network effect in SaaS providers: if you're a SaaS provider with thousands of customers and they hit the same issue, you eventually develop proprietary solutions that open source never sees, >> and then you might have issues that would have been a problem with the open source version but are not a problem for you, because your automated system has already fixed them. So that's one thing, credit to SaaS, if done right.
But to your point, and I was going to bring up the same point, AI changes this completely. I don't know where our AI SREs are, but I expect to see products like this in the not-too-distant future, because fundamentally I think it's a very simple problem to solve. There's a lot of pattern matching: okay, the log says my disk is full, so I know what to do; I know roughly what the runbook looks like; maybe a human can approve me running these commands. I really think we're going to see improvements. And to that point, it's a good time to segue to Shelly, which is your AI agent in exe.dev. I did a demo: I logged into a VM and just said to Shelly, "Hey, build me a Kafka cluster, give me Grafana, give me a nice UI, give me a schema registry, and send a message," or something like that. Traditionally this would have taken me, as a human, probably two hours, because the route is: open Kafka, read the docs, read the quick start. Then find the schema registry component I need.
It's a separate component, not part of the open source project. Open the GitHub repo, follow the README, find the UI.
There are 20 UIs to choose from for Kafka. Pick one, follow the README, follow the installation, set everything up, create a Docker Compose file or something, spend 30 minutes on trial and error as it fails to deploy, and eventually get it working. Shelly, or should I say Claude, or both, did it in under 15 minutes. I actually demoed it in a video; it blew my mind. That's a major time saver, and it's an example of what I'm saying: operating apps should be getting, and is getting, so much simpler with AI.
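The "pattern matching plus human approval" idea raised earlier in this exchange (the log says the disk is full, the runbook says what to do, a human approves the command) can be sketched as follows. The runbook entries, the app.service unit name and the function names are hypothetical; the code only proposes commands and never executes them, so the approval callback is the single gate between a match and an action.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical runbook: each entry maps a log pattern to the remediation a
# human would normally reach for. Commands are proposals only; nothing here
# executes them. "app.service" is an invented unit name.
RUNBOOK: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"No space left on device"), "journalctl --vacuum-size=500M"),
    (re.compile(r"Out of memory: Killed process"), "systemctl restart app.service"),
]


def propose(log_line: str) -> Optional[str]:
    """Return the first runbook command whose pattern matches, if any."""
    for pattern, command in RUNBOOK:
        if pattern.search(log_line):
            return command
    return None


def triage(log_lines: List[str], approve: Callable[[str, str], bool]) -> List[str]:
    """Collect commands for matched log lines, keeping only those a human
    approves via the approve(log_line, command) callback."""
    approved: List[str] = []
    for line in log_lines:
        command = propose(line)
        if command is not None and approve(line, command):
            approved.append(command)
    return approved


if __name__ == "__main__":
    logs = ["postgres: could not write file: No space left on device", "GET /health 200"]

    # Stand-in for a human reviewer: show the proposal, then approve it.
    def ask(line: str, cmd: str) -> bool:
        print(f"matched {line!r} -> proposing {cmd!r}")
        return True

    print(triage(logs, ask))
```

An LLM-based agent would replace the fixed regex table with its own judgment, but keeping the approval step as a separate, explicit callback is the part that carries over.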
>> Do you guys use anything like this internally? >> Oh, there's a constant tension between doing a better job in production because you put an agent there and the danger of putting an agent in production.
>> The only time I've ever actually deployed an agent in production was when we launched, just after Christmas. We had it open so anyone could use it directly, and we immediately got a lot of spam and abuse. There were some Bitcoin miners, but it was mostly DoS clients and other really nasty stuff. We had a spammer creating accounts every few seconds, and somehow they were wrecking our machines. And I thought to myself, well, they're creating accounts every few seconds. So I created another production instance (this was back when we were on AWS): I pushed the on button, got another machine, and loaded it up with spammer accounts. I watched the machine start to flail. Then I installed a command-line agent directly on it and gave it root. I was willing to do this because it was just spam VMs; I wasn't putting anyone's machinery at risk. And I said, "Look, what's going on? Why is this having such a detrimental effect? Why isn't these VMs' network behavior isolated from the other ones?" It went and examined a whole lot of sysctl machinery I didn't know about (there's always something new to learn about Linux, right?), and it came back really quickly with an answer: an iptables misconfiguration I'd made. I thought I was carefully restricting the VMs' ability to influence the conntrack table, but I'd got the interfaces wrong. So they were able to collapse this table in the host's networking configuration and ruin the day of all of the VMs next to them.
As soon as I knew this, it was a one-line fix. I deployed it to all the other machines, then cut the spammers off and deleted all their VMs.
But I didn't know what to look for. We were delving into a very obscure part of the Linux kernel that would have taken me many hours of research, and here I was fighting a production issue. I was in Australia at the time, it was 2 a.m., and no one else was online. So the fact that an agent just showed up and solved the problem for me was one of those experiences where, once it happens to you, you're like, "Wow, okay, I'm never going back." I need this tool in my pocket to solve problems.
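In the incident above, guest VMs could swamp the host's shared conntrack table. A host-side check for that symptom can read the standard Linux counters /proc/sys/net/netfilter/nf_conntrack_count and nf_conntrack_max (present when the nf_conntrack module is loaded). The sketch below assumes an 80% alert threshold and hypothetical function names; it watches the symptom and is not the speaker's one-line iptables fix, whose details the conversation does not give.

```python
import tempfile
from pathlib import Path

# Standard location of the conntrack counters on Linux when the nf_conntrack
# module is loaded; reading these files raises FileNotFoundError otherwise.
NETFILTER_DIR = Path("/proc/sys/net/netfilter")


def read_int(path: Path) -> int:
    return int(path.read_text().strip())


def conntrack_utilization(proc_dir: Path = NETFILTER_DIR) -> float:
    """Fraction of the host's connection-tracking table currently in use."""
    count = read_int(proc_dir / "nf_conntrack_count")
    maximum = read_int(proc_dir / "nf_conntrack_max")
    return count / maximum


def check(proc_dir: Path = NETFILTER_DIR, threshold: float = 0.8) -> str:
    """Return an alert string once utilization crosses an assumed threshold."""
    used = conntrack_utilization(proc_dir)
    if used >= threshold:
        return f"ALERT conntrack table {used:.0%} full"
    return f"ok conntrack table {used:.0%} full"


if __name__ == "__main__":
    # Demo against a fake /proc directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as d:
        fake = Path(d)
        (fake / "nf_conntrack_count").write_text("236000\n")
        (fake / "nf_conntrack_max").write_text("262144\n")
        print(check(fake))
```

On a real host, calling check() with no arguments reads the live counters; the threshold is a placeholder to tune per environment.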
>> So yeah, that's the magic for me of the AI SRE, which again doesn't really exist yet, because what AI are you going to drop into production and really risk user data with? You can't do it on a real machine, because there's a risk of user data egress, exfil. You can't do that. But at the same time, these agents have all this esoteric knowledge about systems that is extraordinarily hard to come by, and they can solve problems so quickly. We have to figure this out. >> We do, yeah, we definitely do. I guess a naive implementation could be a read-only version: you don't deploy to the VM, but you ship all of your VM telemetry and data out to some system which the AI can then analyze. Or I guess the way it will happen is that well before production we have the toy projects. Let's say my account on exe.dev: I run some toy websites there, and I'm happy to give the AI everything there. >> Exactly, yes, me too. >> Yeah, as I said, this isolation lets me run the AI as much as it wants and let it break everything. I'll create a new VM; I don't care. >> So that already proves the thing: this is production for me, for something that's not super important. And then I guess we'll get this top-down effect where first we use it internally, on our own devices, let's say my MacBook, which is probably backwards now that I think of it, but I think that's how adoption started: everyone installed Claude Code locally. >> Then they started using agents externally, let's say on dev machines like exe.dev or AWS or wherever, and then eventually maybe it works its way to dev, staging, maybe production, in some very contained form. I don't know. >> Yeah, I think you've hit a couple of really good points there. First, yeah, we all installed Claude on our laptops, and I can't imagine running an agent directly on my laptop now.
It has the keys to everything. It's the last place I'd put an agent. But in production, you're right: for all of that software I'm willing to isolate and do the whole dev-equals-prod thing with, I'm very happy to let an agent have root, edit some systemd files, and change some cgroup configs. Who cares? I have snapshots. Actually, this is a feature we're about to launch for exe.dev; we're just trying to get the UI right. We have snapshot backups of the VMs. To me, the big trick to dev equals prod is the big rollback button that you click to go back to a snapshot from a few minutes ago.
>> Because if it deletes your production database, and your production database is a spreadsheet that five people use, that's fine, right? It's a perfectly good control system for agents. So all of that seems straightforward to me. The actual AI SRE is the really hard one.
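The "big rollback button" contract described above can be modelled in a few lines: take a snapshot before letting an agent loose, and restore it if the agent breaks something. This is a toy in-memory stand-in with hypothetical names; it is not exe.dev's snapshot feature (which the transcript describes as not yet launched), and real VM snapshots are taken at the disk or block level rather than by copying a dictionary.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyVM:
    """Hypothetical in-memory stand-in for a VM disk; not exe.dev's API."""
    files: Dict[str, str] = field(default_factory=dict)
    snapshots: List[Tuple[str, Dict[str, str]]] = field(default_factory=list)

    def snapshot(self, label: str) -> None:
        # Deep copy so later edits cannot leak into the saved state.
        self.snapshots.append((label, copy.deepcopy(self.files)))

    def rollback(self, label: str) -> None:
        # Restore the most recent snapshot with this label.
        for saved_label, state in reversed(self.snapshots):
            if saved_label == label:
                self.files = copy.deepcopy(state)
                return
        raise KeyError(f"no snapshot named {label!r}")


if __name__ == "__main__":
    vm = ToyVM(files={"app.db": "five users' spreadsheet"})
    vm.snapshot("before-agent")
    vm.files.clear()             # the agent "deletes the production database"
    vm.rollback("before-agent")  # the big rollback button
    print(vm.files)
```

The design point is that recovery cost, not prevention, is what makes giving an agent root acceptable on low-stakes machines.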
And I agree read-only sounds good, but there are two sides to this that are really challenging. The first is that read-only access gives the agent only some of its ability.
It's actually really useful to let an agent play around a bit on a system so that it can explore and find the problem; there are some problems you can find by actually changing some sysctl settings. The second side is Simon Willison's lethal trifecta problem: the moment your agent is sitting around with user data, or any data you value, you have to cut off its internet access, or else it can exfiltrate the data, >> right? And as soon as you do that, it can't go and search Google for weird user forum posts from 20 years ago that have the solution to the problem. >> But also, you can't use, let's say, Claude or OpenAI, because that is on the internet, and you're potentially feeding the models that data too.
Yeah. We do everything through ZDR (zero data retention) contracts with Anthropic and OpenAI, and I trust contract law enough to believe those companies are actually not retaining and training on it. I certainly hope that turns out to be true, but they're promising it's true, so I choose to believe it. >> Okay. I just wondered: if a customer has a contract with you and you have a ZDR contract with them, does that complicate things? But maybe not. >> We don't store any of the sessions with Shelly. They're stored in a local SQLite database that you control, in your VM. We don't train on it, and we pass everything through on those standard business contracts.
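Simon Willison's "lethal trifecta", raised in this exchange, names three capabilities that become dangerous together: access to private data, exposure to untrusted content, and the ability to communicate externally. The sketch below encodes that check with hypothetical names. Its mitigation only removes the outbound leg (the "cut off its internet access" option discussed); it deliberately leaves the untrusted-content flag alone, on the assumption that logs and user data can still carry attacker-written text even without web access.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentSetup:
    reads_private_data: bool      # e.g. production user data
    sees_untrusted_content: bool  # e.g. web pages, forum posts, user input in logs
    can_send_externally: bool     # e.g. open internet access

    def lethal_trifecta(self) -> bool:
        """All three legs present: instructions injected via untrusted content
        could make the agent leak private data to the outside."""
        return (self.reads_private_data and self.sees_untrusted_content
                and self.can_send_externally)


def cut_internet(setup: AgentSetup) -> AgentSetup:
    """The mitigation from the conversation: no outbound channel. Untrusted
    content may still arrive through logs, so that flag is left as-is."""
    return replace(setup, can_send_externally=False)


if __name__ == "__main__":
    sre_agent = AgentSetup(True, True, True)
    print(sre_agent.lethal_trifecta())                # True
    print(cut_internet(sre_agent).lethal_trifecta())  # False
```

As the speaker notes, the cost of this mitigation is losing web search for obscure fixes; the model API itself remains an outbound path, which the conversation addresses through ZDR contracts rather than technical isolation.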
>> But to your point, even editing those sysctl settings is very risky even when a human does it, and usually you won't see it done in production at all, right? >> That's right. That's why I'm scared of the AI SRE doing a thing I would do, right, and breaking the computer. I break computers all the time. >> Yeah. I think we're far away from having an automated AI system. We have automated systems that we've coded up ourselves, sure. But even today, if you're in a big company and you want to make a change to production, as a human you probably still need approval from another SRE at a minimum.
It depends on the environment and the setup. But yes, I've seen everything from SREs who have to sit next to each other while they work and read each other's commands before they hit enter, to software systems that enforce that, to companies where SRE is a part-time role the programmers fill and whoever's awake deals with the problem.
The world spans the spectrum.
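The "software systems that enforce that" end of the spectrum above amounts to a two-person rule: a production change can be applied only after someone other than its author approves it. A minimal sketch with hypothetical names follows; the author may be a human or an agent, and the command string is only recorded, never executed.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ProductionChange:
    """Hypothetical change request; the author may be a human or an agent."""
    author: str
    command: str
    approvals: Set[str] = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        # The core of the two-person rule: no self-approval.
        if reviewer == self.author:
            raise PermissionError("authors cannot approve their own change")
        self.approvals.add(reviewer)

    def can_apply(self, required_approvals: int = 1) -> bool:
        return len(self.approvals) >= required_approvals


if __name__ == "__main__":
    change = ProductionChange("sre-agent", "sysctl -w net.netfilter.nf_conntrack_max=524288")
    print(change.can_apply())  # False
    change.approve("alice")
    print(change.can_apply())  # True
```

The same gate works whether the author is a tired human at 2 a.m. or an AI SRE, which is why it is a natural first control for agents in production.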
>> Exactly, there is a spectrum. And I was going to say, human psychology has shown us that people will risk things for convenience.
I think the best example is Tesla's self-driving. Letting a robot drive your car is the most dangerous thing you can do day-to-day, but in practice it works. People trust it enough to do it, and maybe they even take their hands off the wheel even though they shouldn't. So in that context, I can imagine a fair subset of companies even YOLOing with some agents in production.
Yeah. I don't know how far we can push the robotaxi analogy, but I've been in a few taxis that haven't had the best human drivers, which was a little scary. With robotaxis, in theory at least, all the data is centralized and studied, and you can draw good statistics on how reliable the drivers are. You don't get a robotaxi that didn't sleep well last night or spent too long at a party.
Long term, those sorts of well-studied, systematic models give you better results. We may not be there yet; I really have no idea what state things are in.
>> That's a great point about human error. You're right: a fair number of SREs are not at the level you're used to working with, because you've probably worked at really good companies in the Valley. The range of SREs across the whole world, everybody running some sort of production database in every country, is very wide. So now that you say it, I think AI can probably replace a lot more than we think. >> Yeah. Me being an SRE at 2 a.m.
It's not what you want. I'm not at my best.
So is there a day when a machine can do a better job than me at 2 a.m. as an SRE? Absolutely. I'd be very excited to see myself replaced in that role. >> All right. Well, we have a few minutes to go. I want to ask you: is there anything you would like to share about exe.dev, your career, or anything about the topics we've been talking about so far?
>> Oh, yeah. I'd love people to try it and give us feedback.
>> There's a 7-day trial, right?
>> There is, yeah. It depends (there's a complex spam and fraud model in there now), but most of the time, if you show up, you can just try it with a 7-day trial.
Feedback's great.
I think what we're most interested in right now is the next thing we need to build to be more generally useful, because we have so many things to build. We have these really long lists, and a big challenge for us is just prioritization. Do we need more sophisticated DNS control? Do we need static IP addresses that you can open arbitrary ports on? We have integrations; do we need a good Slack bot integration so that you can talk to Shelly in Slack? Is a Discord integration what's missing? User feedback is great for answering questions like that and figuring out which thing we should prioritize. We get several pieces of feedback a day, and I have agents help me put it into spreadsheets and manage it. Again, it's a good task to automate.
There's a lot of value in being rigorous about these things. It's very easy to give the anecdotes from the last four people I spoke to way too much weight in my mind. >> Yeah.
>> And I want to solve their problems immediately. But if you make a spreadsheet of the last 40 people you've spoken to, you suddenly see that the pattern is very different, and you're like, "Oh, wow.
It's good to be rigorous about these things." So, yeah, we're very much in a data-collection and feature-buildout phase right now.
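The point about recent anecdotes versus a spreadsheet of 40 conversations can be shown with a simple tally. The feature tags and their counts below are invented for illustration (loosely named after the options listed in the transcript); the sketch just compares the most common request in the last four entries with the most common request overall, using collections.Counter.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical tags for 40 pieces of feedback, oldest first.
SAMPLE_FEEDBACK: List[str] = (
    ["static-ip"] * 14 + ["dns"] * 10 + ["slack"] * 8 + ["discord"] * 4
    + ["discord", "discord", "discord", "slack"]
)


def top_requests(feedback: List[str], last_n: Optional[int] = None,
                 k: int = 3) -> List[Tuple[str, int]]:
    """Most common requests, over everything or only the last_n entries."""
    window = feedback if last_n is None else feedback[-last_n:]
    return Counter(window).most_common(k)


if __name__ == "__main__":
    print("last 4:", top_requests(SAMPLE_FEEDBACK, last_n=4, k=1))  # [('discord', 3)]
    print("all 40:", top_requests(SAMPLE_FEEDBACK, k=1))            # [('static-ip', 14)]
```

With this made-up data, the last four conversations point at Discord while the full set points at static IPs, which is the recency bias the speaker describes.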
>> How old is the company? How old is this cloud?
>> Oh, we launched in December and started work on it some months before that. Yeah, that's right.
>> That's nothing. >> Yeah, there's a long way to go.
But we have a very good team capable of building a lot of things. We have agents supporting us in building things, which helps a lot. And we've got good foundations to build on, which is nice. Building EC2 back when they built it was extraordinarily difficult, because the Linux kernel didn't help you back then: even though they were giving dedicated slices of a machine to each VM, they still had noisy-neighbor problems, because Linux was not good at isolation and scheduling. In fact, someone at a very early, now very big, customer of Amazon told me that they would start their EC2 instances and then run a little for loop in bash on each one that just printed the date every second. Then they had an outer script that looked at the printed dates, and if more than 10% over a second elapsed between prints, so that "sleep 1" actually slept 1.1 seconds, >> they would shut the EC2 instance down and start another one. It meant someone else on the machine was using a lot of CPU; they had a noisy neighbor. So they had to build this noisy-neighbor detector in front of EC2. But the Linux kernel is so good at isolation and control now that it's much easier to build high-quality cloud-like primitives, and precisely because of that we can get into building more interesting primitives.
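The noisy-neighbor detector in the anecdote above is easy to reconstruct: a probe inside the VM prints a timestamp once a second, and an outer script flags the instance if any gap exceeds 1.1 seconds (the 10% tolerance from the story). The function names, the epoch-seconds output format and the Python probe are assumptions; the step of shutting down and replacing the instance is left out.

```python
import time
from typing import List

# In-VM probe from the anecdote, written here in shell (GNU date):
#     while true; do date +%s.%N; sleep 1; done
# The functions below are a Python sketch of the "outer script" side.


def parse_timestamps(lines: List[str]) -> List[float]:
    """Parse epoch-second timestamps, one per line, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


def has_noisy_neighbor(timestamps: List[float], expected_s: float = 1.0,
                       tolerance: float = 0.10) -> bool:
    """True if any gap between consecutive prints exceeds expected_s by more
    than tolerance (10% by default, i.e. a 'sleep 1' that took over 1.1 s)."""
    limit = expected_s * (1 + tolerance)
    return any(b - a > limit for a, b in zip(timestamps, timestamps[1:]))


def probe(samples: int) -> List[float]:
    """Python equivalent of the shell loop; uses a monotonic clock so wall
    clock adjustments cannot create false gaps."""
    stamps = []
    for _ in range(samples):
        stamps.append(time.monotonic())
        time.sleep(1)
    return stamps


if __name__ == "__main__":
    healthy = parse_timestamps(["1700000000.00", "1700000001.01", "1700000002.02"])
    starved = parse_timestamps(["1700000000.00", "1700000001.00", "1700000002.25"])
    print(has_noisy_neighbor(healthy), has_noisy_neighbor(starved))  # False True
```

Sleep overshoot is a crude but effective CPU-steal signal: if the guest's process can't get scheduled promptly, the gap between prints grows.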
So yes, it's a good foundation to work from, and we have a lot to do. >> That's amazing. Well, I'm really looking forward to seeing what you guys build. I can't believe it's only been six months, because the product looks very polished. I'm looking forward to seeing the next year, next two years, next three years.
>> I wish you all the best of luck, and thank you for taking the time.
>> Thanks, Dan. This was fun. You have a good