Install our extension to search inside any video instantly.

Nebius Inflection 2026: Full Program

Added: 2026-06-20

1,396 views832:06:20nebiusofficialOriginal Release: 2026-06-19

Nebius correctly identifies that the AI industry's real challenge has shifted from building models to the complex engineering required for reliable, large-scale production. This program offers a pragmatic roadmap for turning experimental prototypes into sustainable enterprise infrastructure.

[00:00:12]What happens when AI meets the real world?

[00:00:28]Technology is the foundation.

[00:00:31]Every revolution reaches an inflection point. This is ours. It's your curiosity, your vision, your determination, and your relentless pursuit of pushing boundaries that's made this happen.

[00:00:48]We're in the golden age of AI because of your ambition, making AI thrive in the industries we all rely on.

[00:01:09]Together, we're defining the next phase of AI.

[00:01:29]Please welcome to the stage chief revenue officer at Nebus, Mark Boreditzki.

[00:01:44]Good afternoon everyone and thank you for joining us. As the voice of God just shared, I'm Mark Borditzki. I'm the chief revenue officer here at Nebus.

[00:01:54]It is my great privilege to welcome you all to our first Nebus inflection.

[00:02:02]Okay, I'm not going to bore you with yet another um yet another AI keynote claiming that the entire world is going to change because of AI. That AI is making things move faster than ever before. That the opportunity is massive in front of you.

[00:02:22]As a matter of fact, every week we hear announcements for new models, new benchmarks, new agentic frameworks.

[00:02:29]Sometimes we hear all of them on the same day or even in the same hour.

[00:02:34]We're actually at an interesting point and I think you all know this that AI can shift from being a tool to being something that does amazing things that will shift industries. But everyone in the room here knows that what we need is another what we don't need is another speech about the size of the opportunity.

[00:02:58]We need a more honest conversation about what is actually happening on the ground now because a lot is working.

[00:03:09]Also, a lot is not working and a lot is still messier than people tend to admit in a keynote on the stage.

[00:03:19]That has to change.

[00:03:22]that is if AI is going to go from um experiments into systems people can depend on that requires us to move to the next level.

[00:03:36]That is why we created inflection not as another conference, not as another product launch moment and not as another place to make a set of industry announcements.

[00:03:52]We created it for the people that are doing the work.

[00:03:57]the operators, founders, researchers, infrastructure teams, investors, enterprise leaders, and builders taking AI from possibility into production.

[00:04:13]Because inside companies today, AI is already ahead of the decisions being made about it. Teams are building agents, multi- aent automations, and incredible new workflows. Some approved, some shadow AI, but all of it points to the same thing.

[00:04:36]People are not waiting.

[00:04:39]And that is where the work gets real.

[00:04:42]The demo works. The agent looks impressive. The first model call feels magic.

[00:04:50]Then it touches the real world. The workflow, the internal data, latency, SLAs's, governance, and the finance team asking why the bill just doubled.

[00:05:06]That's when the real cost shows up. And I don't mean the invoice. I mean the bill for AI at scale. Because once AI moves from pilot to real usage, the question changes.

[00:05:22]It's no longer about can the model do it. It becomes can the system do it reliably, fast enough, safely, at a cost that makes sense?

[00:05:36]Can we evaluate it when something changes? Can we see what happens when it goes wrong? Can we govern a multi- aent swarm to prevent it from doing harm? And can we prove the value is greater than the cost?

[00:05:55]How organizations answer these questions is what separates the companies that will define the decade from those that won't.

[00:06:06]The first wave was about scale, token maxing, more prompts, more context, more reasoning steps, more loops.

[00:06:19]But reality changes the metrics.

[00:06:22]What comes next shouldn't be about more tokens. It should be about making them count at a cost where the math works, at a quality users trust, at a velocity that changes business.

[00:06:38]The team that wins the next chapter aren't going to be the teams burning the most compute.

[00:06:46]They're going to be the teams that turn compute into outcomes.

[00:06:51]Think value maxing instead of token maxing.

[00:06:56]Value creating production AI at scale is the next great inflection ahead of us.

[00:07:04]But technology on its own doesn't create inflections.

[00:07:09]People do.

[00:07:11]The future won't be predicted into existence. At Nebus, we see our purpose as simple. Help the people who will actually build it.

[00:07:26]And those people people are in this room.

[00:07:31]researchers and operators, founders and enterprise leaders, infrastructure and application innovators, each contributing a critical piece of a much larger ecosystem.

[00:07:48]We believe no single company will bring the full potential of AI to the world alone. And that's what makes gatherings like this one so important. Throughout today's program, you will hear perspectives from the most interesting and leading minds in AI. You will hear about where technology is headed, what challenges still need to be solved, and what it will take to move from experimentation to production at scale.

[00:08:25]We hope these conversations challenge assumptions, sharpen perspectives, and inspire new ideas, and that you take full advantage of the extraordinary group of people gathered here today.

[00:08:41]Many members of the Nebas team are here as well, not just to present, but to listen, learn, and collaborate. And as we move through the day, I encourage you to engage fully, ask questions, share your experiences, challenge conventional thinking, and take advantage of the opportunity to connect with this extraordinary group of leaders gathered here. So, thank you for being a part of this. Um, before we get started, I'd like to share something that's of a great deal that means a great deal to us here at Nebus. Nvidia and Nebus have been building together from the beginning of Nebus and I want to share a special message from Jensen Wang.

[00:09:28]My friends at Nibbius, congratulations.

[00:09:31]What you're building is extraordinary.

[00:09:33]Data centers are becoming AI factories.

[00:09:37]They turn energy into tokens and tokens into intelligence. AI factories are the new infrastructure of this era. They must be built where people live and work and create region by region, community by community. The infrastructure must live where the demand lives. This is what NBIS is building. You started with deep cloud engineering DNA. Then you rebuilt your platform for the AI era.

[00:10:06]Scaling from one data center to gigawatt scale AI factories in just 2 years.

[00:10:13]NVIDIA brings accelerated computing, networking, systems, and inference software. NBIS built a full stack AI platform for developers, startups, researchers, and enterprises. Together, we are proving that the world needs AI infrastructure everywhere, and building it locally is the only way to make it work.

[00:10:34]The buildout has just begun. NVIDIA is proud to partner with Nibbius as you build the infrastructure of the AI era.

[00:10:44]Congratulations and have a great inflection.

[00:10:54]Please welcome to the stage co-founder of Nebus, Roman Shernan.

[00:11:04]Okay. I I thought that Mark will introduce me uh but they changed at the last minute and gave all the control to robots um and Jensen.

[00:11:16]Uh but we know that Jensen in control of everything.

[00:11:20]Um okay thank you for coming and for actually for opportunity to talk about what we built in neios what we think we already delivered and what is the next for us and our industry is obviously in the inflection point over the last few years we showed Actually, you showed what AI can do, but now we together need to prove that AI can create the real economical value.

[00:12:04]To reach AI's true promise, we need to create the real value for organizations and humans. And actually we need to build a healthy businesses with a healthy margins not just show the beautiful numbers in revenue and big investments rounds.

[00:12:25]Uh this is once in a life opportunity and we need to deliver and here is the bad news. Uh to deliver you need to go and figure out the boring infrastructure details.

[00:12:41]This is dirty work. The unsexy part of the work.

[00:12:48]It's one thing to have a nice prototype um or idea.

[00:12:55]It's it's another thing to overcome the complexity of real production and the scale and keep product as beautiful as we showed at prototype but reliable.

[00:13:09]The company that started with entropic needs to shift towards open source models to meet unit economics and really grow.

[00:13:19]Agent that works nicely in prototype uh when goes to scale starts compounds the problems and actually fall apart.

[00:13:31]A talented researcher that quit the large hyperscaler with a brilliant idea to build his own or her own lab needs infrastructure that will just work.

[00:13:45]And that's why we developed Nebios. We want to help builders when they scale.

[00:13:52]When we looked at the market, we saw the false choice. On one side, established hyperscalers with a lot of services, global reach, but looks like they were designed and built in the previous cloud era. They were not optimized for AI workloads and AI developers.

[00:14:17]They bolt AI services onto legacy uh legacy infrastructure and their modus operandi was always to lock in developer in the closed services with a complex billing and not to say that they have a permanent conflict of interest. They can allocate more capacity for their internal use and less for cloud customers.

[00:14:45]on other side so-called neoclouds. I told many times I hate this a new category of bare metal providers that built for AI workloads but often they are not reliable. Uh and to be honest builder experience is poor.

[00:15:12]people who build it more system integrators. They not real developers.

[00:15:19]So we thought that both choices have real constraints and we believe there is a third path a new category of the product scaled cloud for AI built from our first principles.

[00:15:38]First AI specialized.

[00:15:42]We built and and optimize only for machine learning. We don't do anything else.

[00:15:48]Full stack the best total cost of ownership for our customers because we go from the ground up. We build and operate data centers. We assemble our racks and servers and we build a fullstack software platform.

[00:16:07]Third is builder first experience. We call it meet builder where he needs us.

[00:16:16]So we let developer focus on what they need to do and what they need to control and abstract the less of complexity.

[00:16:27]The fourth is openness.

[00:16:30]Actually we're too small to try lock people in the closed ecosystem. So no vendor lock, rely on open standards, give a choice.

[00:16:42]And the last but not least, people matter.

[00:16:48]Support, customer experience, engineers to engineers relations.

[00:16:54]And um we also dog fooding our platform from day one and until now.

[00:17:02]We built Nebios as a different kind of the company for different kind of the users.

[00:17:10]You know this phrase, nobody was fired for choosing AWS. So our customers also can choose AWS.

[00:17:18]But they choose to build on Nebios because it's fast, efficient, and actually engineer it with the teams with the most demanding AI workloads in the world.

[00:17:29]We combine the scale and reliability with the performance of supercomputer.

[00:17:34]This is our moto. And let me share some examples of how we build together with the uh four kinds of customers.

[00:17:45]Um first our super lab partners Microsoft and Meta. We help them to build their internal ecosystems. Of course, uh they are large and capable, but they come to us because they know we can build very fast in a real constraints of the physical world. They need the largest interconnected clusters in the world. And we deliver custom racks and servers, the latest GPUs with infinity band and multi-ter storage.

[00:18:19]Power efficient with unfailed reliability.

[00:18:24]People sometimes call it commodity but we think there is nothing commodity at that scale. A fully integrated production ready factory AI factory is not a commodity and for us it's a validation of how we build the foundation level of AI infrastructure from the customers with the most insane demand in the world.

[00:18:51]They taught us how to optimize bare metal compute and build the foundation of AI cloud.

[00:19:01]There are teams that need move fast to exist.

[00:19:08]They need to do more with less more with less. and they don't have large infrastructure teams that support them like in a big tech companies.

[00:19:22]So we built multi-tenant cloud for AI labs.

[00:19:27]Rec was building a 20 billion parameter uh image generation model but their training sessions uh stalled midrun. So our engineers fixed network directly patched nickel and increased training speed by six times.

[00:19:47]Corser needed access to Nvidia B300's for their large re reinforcement learning tasks.

[00:19:55]They were the first to adopt at scale the newest chips even before the official firmware was released. So we moved fast and hacked together.

[00:20:08]And we applied this co-engineering lessons to many other customers, accelerating drug discovery for two of the most successful UK biotech startups and dozens of other teams that developer that develop next generation image AI image generation like Black Forest or robotics.

[00:20:37]AI for robotics like ROA or video and world models like Deart and accelerating research like core automation.

[00:20:50]And the greatest reward for us is to hear that such a great experienced people look at us not just as a vendor but as a partner.

[00:21:03]We learned from those teams that real cost of training is not a GPU hour.

[00:21:09]Training at scale will break. So we built health checks with auto healing and allocated spare capacity for clusters to achieve the industry-leading reliability and up to two times better total cost of ownership than some of the big clouds.

[00:21:32]And also they need the earliest access to the newest hardware. So we hardly invest to be first and provide them as early as possible with a performance that only few providers can do.

[00:21:51]If AILabs taught us what it takes to train a good model, uh the next type of the customers is actually uh those who taught us how to serve them.

[00:22:04]Inference is exploding. Everybody hear it. And the driver for that is AI native products that serve already millions of users and exponentially grow.

[00:22:15]To succeed, they need reliable infrastructure that never sleeps.

[00:22:21]But what is even more important, it enables the unit economics of their products.

[00:22:28]Hicks serves more than 25 million users and grew from zero to hundreds of millions revenue just in a few months.

[00:22:38]They required developer experience that will enable them to very very fast and continuously experiment and also they need the very efficient autoscaling in inference to serve the spiky media demand.

[00:22:58]Brave delivers um over the 16,000 16 million of real time AI reviews every day. Every AI summaries every day. They started with do-it-yourself approach to inference just rented the cluster and ran their system themselves but shifted to managed platform because we could improve their unit economics.

[00:23:27]SW Health builds AI care for mental health patients.

[00:23:33]When sensitive topic is involved, high latency for user feels like we just don't care.

[00:23:42]Using ded dedicated endpoints at Nebios, they could reduce the end to end latency of their product from above 20 seconds to below 12.

[00:23:53]So for these customers we built Nebio token factory uh a managed inference platform that give access to um all the models optimized for every use case. It based on the same reliable infrastructure and uh orchestration capabilities that we have in the cloud.

[00:24:14]Inference optimization is a model plus system level problem and we combining Nebios engineering with two of the recent acquisitions that we announced.

[00:24:26]One of them is Egan AI uh team based here in San Francisco. Uh they are focusing on model level advanced quantization techniques spers attention kernel level and clarify system design orchestration kic caching and looks like now we have quite a strong team to deliver on inference validated by some very after uh respectful people.

[00:25:00]The next lesson was that we are only available if customers can if customers can consume us on their own terms.

[00:25:12]Not everyone starts from the scratch from and being nibble as AI natives.

[00:25:18]Enterprises on their way to become also AI company train and serve models but they need not only performance but they need trusted infrastructure. They need capabilities to add agents into existing systems and processes.

[00:25:37]Revolute one of the largest fintech companies in the world with more than 70 million users. They have legions of AI agents that run on top of very sensitive data.

[00:25:49]They added Nebul token factory for what their existing providers were not effective.

[00:25:55]And together we improved uh the velocity of uh their AI development and delivered 65% more fraud detection and 41% better product recommendations.

[00:26:10]Another example is Shopify. They train recommendation model and build quite complex agentic systems and they use sky pilot to orchestrate workloads across GCP and nebios.

[00:26:25]So no lock in multicloud.

[00:26:30]Mastercard they process billions and billions uh transaction every day and they integrated tavili our another recent acquisition the nebios agentic search uh into their existing flows. So now they can detect money not only based on the historical patterns but using online signal.

[00:26:55]The results higher detection rate and reduced reaction time.

[00:27:04]This happens with ecosystem of our pattern of our partners because it's not only how you build the product but who helps customers to extract the value. So we very helpful to uh all our partners who took a risk to bet on us uh early.

[00:27:30]The top challenge for large organization is not the technology even it's operational model and compliance.

[00:27:39]So we built our platform with built-in security observability cost control and compliance.

[00:27:47]And we also given their prices multiple ways to consume through console API and SDKs with well doumented recipes and no login 100% opensource standards and integrations to make multicloud work.

[00:28:05]Teams can integrate Nebios with whatever they already have and build with confidence.

[00:28:13]We shaped Nebios uh for different types of the teams with different types of engineering requirements.

[00:28:21]Let's look at this at the foundation.

[00:28:24]the latest and greatest GPUs running in our own service and Rex. Above that, a full cloud platform with powerful storage, auto healing, observability, everything tightly integrated for zero performance loss with set of ML tools including our own uh slur and Kubernetes operator, serverless and other uh services and on top the AI runtime token factory for inference. and fine-tuning of the models with system level optimization and model level optimization built in.

[00:29:07]All that with multiple ways to consume security, compliance and observability.

[00:29:14]One platform to serve any type of the builder. AI product developer to experiment and grow their product.

[00:29:23]a male engineer and scientist to spend their time on building and not configuring the cluster.

[00:29:30]Enterprise teams to run AI at scale with control.

[00:29:35]Nebios got there thanks to all the great customers we had privilege to work with and learn from.

[00:29:51]But that's not all.

[00:29:54]Everything we just showed and talked about is what we are growing now. But we want to be ready for what's coming next and maybe we don't know all the details yet how to deliver. This is agent agentic gentic new agentic world. Let me share how we think about how the future of AI infrastructure can look like.

[00:30:19]Agents are growing exponentially and changing our industry again. Customers want the agent to complete the task at a cost that makes product viable.

[00:30:31]Tokens will become the next infrastructure layer. Outcomes will be paid, not tokens.

[00:30:38]This creates the new requirements to the cloud which we need to address.

[00:30:44]An agent is not just one optimized model call. It's a loop. It plans, call tools and models, observes results, retries, and continues until the task is finished. Today, it's easy to prototype an agent. You can connect a model to a few tools and get things working. But production is different. Running one agent once is not the same as running thousands of agents for thousands of users in the organization.

[00:31:15]At scale small small mistake compounds looks nice 95% success rate per call converts to total failure.

[00:31:26]One bet plan can burn 10 times more tokens that we budgeted.

[00:31:32]So what should the cloud infrastructure be? First of course it needs high performance inference fast cost efficient serving many calls per tasks.

[00:31:45]Second grounded data access real-time web search extraction and research that gives uh context to the agent.

[00:31:56]Then orchestration uh we need to organize routing between the models and tools retries state management durable execution and uh task that they they can run for minutes and hours and of course observability in the walls. We need to collect the full traces of what agent planned, what it did, which tools and were called, what failed, what it cost, and what is the outcome. And of course, control and safety, permissions and boxing and cost uh caps.

[00:32:34]That is the shift from stateless model serving to agent runtime infrastructure.

[00:32:44]But when the infrastructure is in place, we can start the next loop. Every agent when it runs produces ton of data, plans, traces, costs and outcomes. And when we capture all that data, we can start improving agents systematically and continuously.

[00:33:05]The same way how today we optimize the inference endpoints, we can optimize routing, we can improve prompts and tool calling, we can reduce the cost. So the cloud platform becomes not just where agents runs. It becomes the system that makes agents measurable and continuously better.

[00:33:30]There is one more shift in the cloud. A new persona emerged the agent as a user.

[00:33:38]Cloud platform for were built for human users.

[00:33:44]Developers who read docs, click in consoles, deploy and debug services manually. Agents need a different cloud interface. API first programmable and observable.

[00:33:59]We in Nebios started moving there a year ago. We made nebios API available through MCP so agents could interact with platform.

[00:34:09]This year we are working on Nebios agent echo which knows how to execute complex complex task across our infrastructure.

[00:34:20]But the deeper point is workload optimization. Agents behave differently not like humans. They call API continuously run many steps in parallel retry and optimized for cost and efficiency that requires low latency APIs, efficient scheduling cost controls.

[00:34:40]Why I think Nebios can build it? We are vertically integrated from hardware to APIs. So we can optimize agentic workloads across the full stack and achieve the results. The goal is simple.

[00:34:56]We need to make Nebus a cloud that agents can use effectively.

[00:35:02]And what is really exciting this is a green field. Aentic AI will be a new workload for everyone for every player.

[00:35:10]There is no decade of accumulate accumulated experience.

[00:35:16]We see the new types of developers, the new type of the applications. Everyone starting from the scratch. And when everyone starts from the scratch, the advantage goes to those who can move fast and co-engineer.

[00:35:32]That's us in this room to provide a real value of AI products.

[00:35:40]It will it won't be easy, but you get what you do and the builders can solve this problem.

[00:35:50]take impossibly hard problems and solve them.

[00:35:55]And we Nebus will do our best to hold your back. We continue to build on pace scaled cloud for AI.

[00:36:05]And with that, let me call to the stage Aradi, our CEO, the one who push us to the limit. and he will tell uh in what de in which dimensions he pushes I just I just want to summarize actually what Roma just said just maybe a little bit add one thing it's just couple of slides it's not a big presentation so uh actually what Nabius is about what we are doing.

[00:36:47]Uh DBS is building a platform, a compute platform. We build our own data centers.

[00:36:55]We build our own racks. Everybody knows it. Recently, we started going a little bit down the stack and had to do something in the energy level of grid uh generation, the bloom contract and so on. So this is the basic hardware platform. On top of it, we built cloud, we built inference, token factory and now we we're building aentic layer.

[00:37:25]These are the tools which makes Navio itself. So, Navio is a compute platform and tools for application develop for developers of AI applications for those who actually create AI. So, Nabio is just a tool.

[00:37:46]What Neabus is not doing, Nabio is not developing its own models and no nebios is not developing its own apps.

[00:37:56]It's our dear developers or clients who using these tools using this compute build this magic things be it consumer apps or enterprise app applications which actually generate all the value in then real industry and this is where AI actually we expect where where all this value will be created where everything will be I don't know 10 times faster or 10 time 10 10 times cheaper or times 10 10 times more and if it happens then we have all the ecosystem works working and we're very close to this so neighbors is here just to state this uh Roma told about our two-dimensional space we're building the product software product mostly which the bare metal cloud inference agentic and we develop it it for different kinds of clients who needs different things who think differently who speak different language bare metal or buying GPUs or tokens and it's different kinds of it's it's uh developers or project managers Now it's much more people than we're using a traditional cloud. So we're building in these two dimensions, but there's one more dimension and this third dimension is scale.

[00:39:40]So we're building these things at scale.

[00:39:44]We started less than two years ago with a small 10 megawatt data center.

[00:39:52]uh we said that we are running more than 200 megawatt now. We recently said that we will be at point 800 to 1 gawatt of power running power by the end of this year. We said that we already have reserved contracted 3 mega gawatt of capacity and we said that by the end of the year it will be more than 4 gawatt.

[00:40:18]We're going there. We're very close.

[00:40:22]So this is the scale. But scale it's measured in megawatt, gigawatt.

[00:40:28]It could be also measured. Oh, sorry.

[00:40:32]Where we we're building almost forgot.

[00:40:35]We we're building all these megawatts and gigawatts in Europe, Middle East. It will be India and Asia Pac Pacific soon.

[00:40:44]And of course majority of all all of our building built and growth is here in in US.

[00:40:51]And what is important of those four gawatt we are contract we have contracted now 2/3 it's our capacity it's land power shell of our own. It's not leased.

[00:41:07]So we build a pretty big system.

[00:41:10]So this capacity again it scale could be measured in gigawatts. It also could be measured in number of GPUs here is it in one h100s but it doesn't matter the scale we are building is is hundreds of thousands and millions of GPUs.

[00:41:30]How many companies today provide hundreds of thousands of GPUs in a publicly available cloud?

[00:41:40]Well, there's the three hyperscalers I don't know and us maybe somebody else I don't know. Uh so we are building one of the largest publicly available AI cloud. This is the scale but this scale also could be measured in dollars. There is gigawatts and giga dollars. We started with we were happy to start with two two billion dollars 23 months ago.

[00:42:15]Uh then we raised it was actually enough for us to start reserving all this capacity because reser to reserve it's just 1% of total capex. Then we raised billions last year and started building these data centers.

[00:42:32]We reserved tens of billions this year and it will allow us to build our gigawatt this year of GPUs and now we're thinking how to get to hundreds of billions of dollars. Up to now we were pretty uh creative and efficient in raising this capital. We were one of the first or the first actually who started this prepayment model from many customers which helps us helped us to bring cash to build. Uh we were one of the first with this huge uh backstop contracts which allowed us to finance our builds cheaply.

[00:43:24]Uh and I can tell you there is also our converts and other traditional instruments.

[00:43:33]I can assure you that we will continue raising as creatively as as efficiently as we did and there will be some announcements coming even very soon.

[00:43:47]But we it's it's unlimited growth. We need more and more. It's hundreds of billions of dollars to build. It's gawatt to build. It's millions of GPUs to build.

[00:44:00]It's a huge scale production and I would say that yes we're building this three-dimensional space the product customers scale but there is force dimension like time you cannot put it on a on a graph but this dimension actually it's the story of the company itself we we next month we will be two years old we have created this company all the talent we we managed to to collect people who came to us who who who actually it's thousands of people now. So it was also again we call it the force uh dimension and uh using the name of this event today um we decided that we should inflect and we officially stopped calling our company startup from today.

[00:44:59]Again, we create the product, we create it at scale, and we create it for customers. And it's customers who actually make it all make sense. And I think it's the time to listen to some of our customers.

[00:45:18]>> Basecam research is building the internet of biology.

[00:45:21]>> This is worked.

[00:45:22]>> Yes.

[00:45:23]>> Robo Force is building robot labor that take on the things that humans shouldn't have to do. We're trying to improve professional communication.

[00:45:32]>> We are building foundation models to uncover new biology about the brain to change the course of dementia.

[00:45:38]>> We're trying to give regulated industries the ability to accelerate their software development.

[00:45:43]>> We design a new molecule and put it in the market in less than 24 months. That is absolutely insane. When you take 7 years to create a new molecule, no, it can take 24 months. One of our main technical challenge is to actually do the proper context engineering to not bring everything mindlessly into the context. Pharmaceuticals usually takes around like 10 years and $2 billion. We have to think about the data, the model and the infrastructure.

[00:46:08]>> Who can be the fastest to provide compute at a really large scale >> and that's where Nibius comes into play.

[00:46:16]when Nabius became important is to really scale our AI safely.

[00:46:21]>> Thanks to Nabius, we were able to scale our model in a matter of weeks.

[00:46:25]>> We could reduce the time from months down to just a few hours.

[00:46:30]>> You guys support us at the same speed that we wish to experiment at.

[00:46:33]>> The speed is incredible. Using Nebus, we reduced by about 70%.

[00:46:38]>> What would usually take us 2 to 3 weeks, we could get done within a week. When we started using Nebius, we were seeing our P99s at 4 to 500 milliseconds, which is phenomenal.

[00:46:49]>> We are talking to them actually constantly, like almost every day.

[00:46:57]>> I think it's a very exciting time to build right now. There are chances to build something that is used by a lot a lot of people. By 2030, I expect we will see truly great advancements in biology across all of the domains.

[00:47:12]>> We are going to need an exponentially higher amount of compute to take us to the next level.

[00:47:18]>> Find people that are on this journey with you because they believe in the mission.

[00:47:22]>> Very exciting journey. We're just getting started.

[00:47:36]Please welcome to the stage executive editor at the information Amir Ephrati.

[00:47:49]Hello everyone, my name is Amir. Great to be here. Um this is a great touching off point from the agentic layer to discuss how uh customers of Nebius real real companies uh providing products to AI customers are experiencing the moment that we're in. So without further ado let's let's welcome the rest of our speakers.

[00:48:24]Hey guys.

[00:48:26]All right. So, uh when um Nebius first came to me with the idea for this discussion, um I thought it would be uh quite quite a different um event than maybe what we have today. I I thought it would be kind of case study after case study of companies that are doing things that they thought they could never do before or they couldn't even do 6 months ago.

[00:48:50]Um that's still very much the case and there's plenty to talk about there. Um but I think we're in a bit of a uh let's call it narrative shift if you will. um a lot of chatter among CIOS and CTO's about uh runaway costs uh lack of cost controls um and it really seems to be dominating the the conversation. So I'm really particularly excited to to talk about that and what uh companies are are doing. So um let's kind of set the stage for that discussion. Um and I guess we'll we'll just sort of go in order and I I want to um hear from each of you. Uh we'll start with Nikita uh from uh from data bricks uh about just how you view this particular moment. Is this sort of a a temporary blip, temporary moment of review, reconsideration, uh retrenchment, or is it just sort of like a matter of a few weeks before the next models come out, we'll sort of forget about all that and we'll like race ahead toward like how much of the API at anthropic or open AI can we uh can we use? how much of our money should we give uh to Scott and to Devin uh and the agents there? Um why don't you kick us off with that >> and tell us who you are?

[00:50:07]>> Yeah, so my name is Nikita. U I I work at Data Bricks. Previously I founded a company called NEAN that was acquired last year. It's in the downstream consumption of all the AI coding agent providing infrastructure for modern apps. um at data bricks I'm I'm uh I'm running um an operational database what neon has become uh and also uh an application platform and so I'm very very closely connected to all the code generation systems so so here uh to answer your question um I think there's like a data bricks perspective like what people do like it's a it's gigantic engineering organization and then what are our customers asking us to do so let's start with the customers customers want cost controls um and and um you know obviously like we we just went through a a small phase of token maxing and I think we're we're still kind of in it. Um but then um the wheels starting to come off at uh at at various places where it is important to to leverage AI as much as possible if you are an engineer. Um but then there's also wasteful spend as well and so people want to see what's going on. the data bricks perspective is um that if you can put um AI AI gateway uh through the AI consumption so it's say it's a purely infrastructury uh point of view um saying hey send our AI your AI constru uh consumption through our AI unity gateway um and we'll tell you what's going on um uh and then from that point on it's like okay well from that you know if you want to you can uh shift into other models we will uh host them for you and yada yada. So that's purely a data bricks perspective uh for their customers. Internally I uh um internally the AI budget is unlimited for engineers. So token maxing still exists.

[00:51:55]Um uh people can bring their own AI tools or can consume like mostly people are running cloud code or something like an internal version of Devon I would say. Um uh something that's resembles Devon as uh uh probably but um uh I I think where it's heading is um I actually don't know. So I'm I I I think we're going to have more visibility in how productive engineers are and when we we start looking at the overall software delivery pipeline and seeing what the bottlenecks are in there.

[00:52:29]>> All right. Veny from Data Robot. Take us away.

[00:52:32]>> Sure. Uh I'm the I'm Veny. I'm the CPO for Data Robot. We've been really focused on how to get um enterprises to actually really retool their the way they work with agents. And so sort of like sort of like you know how uh agents have now changed how coding works. We're thinking about how to do core uh enterprise work workflows like business planning or safety or you know things like that and operations. So what we found on the cost side is that when you start seeing the stuff in production just start seeing it uh it is no longer sort of like a hey it's a $30 extra you know subscription right per user. you're starting to look at millions of dollars and suddenly it's a big line item and and so uh so when the I becomes large you you're going to start asking questions about the R right and so that's kind of what's happening right now it's everyone sort of relitigating you know hey is it enough to just put an agent and call it good because it costs a lot but what am I really getting for the on that side and so you know our perspective is you have to really uh on the eye part you really have to sort of think through uh cost is a very core design principle up front so whether it's when you're building the agent designing the agent uh evaluating the agent you got to think about that then you got to think about how to get the correct uh you know the models uh and then running them at at runtime you know uh Nikita talked about the gateway sort of you know thing you know moving to different models based on intent to different uh models so you're actually choosing the the lowest models and when you are running your own models self-hosting your own models are you getting the most utilization so it's really a big stack of working you do on the investment side and so that's important and on the ROI side and the on the return side you got to pick interesting problems right? You just can't take every >> yeah workflow you already have and stick an agent next to it and and sort of expect returns. You really have to sort of go native and sort of think through how to revisit uh your entire business process now that the agents there.

[00:54:19]>> Well, we'll we'll kind of go deeper on that in a in a second. Um for Nar and Nebius, um I was just on the way over here thinking about this this moment that we're in and it really feels similar to where we were 1015 years ago with the public cloud. I was writing about customers of AWS in particular that were like, "Wait a minute, I just spent 20 million more dollars than I expected to this year." Back then, $20 million was a lot of money. Um, and so it it actually feels like a very similar moment. I don't know if that if you find that to be true, but but where do you where do you see us in this moment when it comes to uh applied AI and you know the kind of costs that we're seeing among the customers? Yeah, I think cost problem comes with a scale. So uh when you start >> scale or scale >> scale scale yes uh like it's it's very easy to create prototype like Roman showed see some results but if you scale this prototype to hundreds of users your talking economics kill you and you need to apply some technologies to make it more reliable and cost effective like I can give you one of the examples like uh uh when we were a year ago when we were creating created an MCP for our cloud uh like you can you can just take this MCP attached to our uh cloud and ask some questions like what some user did in our platform and uh with MCP it consumes it took like 15 minutes and consumed million tokens just to answer this question when we updated it to echo which has like a lot of context about our information uh about our APIs about everything the same query like took seconds and uh and agent spends like thousands of tokens so it's really important not only to use uh models but also to optimize data layer and provide more and more efficient context to the models and it will reduce your cost drastically and that's what we see in our platform definitely >> and and for Scott um I'm I'm guessing some of this noise around um you know the backlash to to some of the spending that we're seeing out there is probably music to your ears as somebody who's uh not only um kind of uh built his business and product uh uh by sort of abstracting the model uh providers um but uh uh but also being very outcome focused outcome first um tell us about this moment how are you experiencing this moment you're seeing all these headlines you're seeing a lot of customers and CIOS and CTO's discussing about it uh about these uh cost runaway costs sorry what what is it like for you >> yeah for sure so so I'm Scott I'm Cognition so we build Devon the AI software engineer um you know pretty strongly agree with all the points that that folks here have already made like you're really seeing prices and just spend increase to a point where all these CIOS really care about it obviously and I think at a really high level the way I'd put it is like guys like AI works like it it's just clearly worth it you know when we talk about okay you're getting this kind of efficiency gain or you're be able to do so much more you know the GPUs are expensive but they're not that expensive right like like you you are getting you know relative to the tokens that you're paying if you're comparing that to how much more capacity you have or how much more output you're doing. It's it's very much, you know, the math is just very clearly there, you know, relative to to, for example, what you're paying your humans and how much more, you know, every single human on your team can do, right? I think the thing that um that people are often really focused on is okay, well, how do you really measure and optimize that and how do you think about that in terms of outcomes, right?

[00:57:53]And so, you know, writing 10,000 lines of code or something, you know, it is much much cheaper for any of these models to write 10,000 lines of code than for a human to write 10,000 lines of code. But it's almost, you know, it's it's almost like managing agents in the same way that managing humans is a thing, which is those 10,000 lines of code, if they were totally useless and on a task that you never actually shipped or built or whatever, then that was a waste of money entirely, right?

[00:58:17]And and and so it's much more about thinking about, okay, what are the actual outcomes I'm driving, right? And you know people talk about Jevans paradox which we I would say are very actively seeing in the field today.

[00:58:27]Every single company is building way more and they're shipping way more software. But what they want to know is okay well what are the concrete returns that I'm getting out of shipping way more software. How do I measure that?

[00:58:36]Right? And maybe it's that I'm getting my products to market faster and then I'm getting doing more revenue that way, right? Maybe that's it's it's that I'm able to give customers a way better experience and like let's figure out how that kind of maps to this. Or maybe it's that um you know all all all of these things that I'm paying for internally or like software that I'm paying for internally I'm able to build faster and better and and and smarter versions of that and that's why it's worth more. But it's really I I think less so about like whether the literal pound per for pound is there and more so about making sure they're directing it towards the use cases that actually affect their bottom line.

[00:59:10]You you make it sound pretty easy. Um obviously I don't know how many folks here are at extremely large um enterprises where there are lots of thieftdoms lots of different uh spending centers cost centers uh different comp different divisions doing different things. One of the solutions in the earlier paradigm of the public cloud was like centralizing having some sort of central decisionm around spending so that it wasn't just every team doing whatever they want. So, can we get it a little bit into kind of some of the organizational hurdles that you're seeing companies try to tackle here? Um, and then also, you know, we just uh wrote a column last week at the information around some of the basic uh steps that companies can take to try to lower their costs. Um, you can read about it there, but you know, a bunch of folks are now talking about model routers. Um there's a lot of things that like are easier for more sophisticated customers to do than less sophisticated ones, but for any of you jump jump into that.

[01:00:12]>> Yeah. No, it's to your point, it's it's definitely easier said than done. And you know, the with with larger organizations, there will be compartmentalized budgets and here's what you're allowed to spend on this and here's what you're allowed to spend on that. I I I think a few things I would call out that that we're seeing. Um, one is I think for a lot of spend, you know, spend that folks were thinking about, you know, outsourcing or figuring out how they pay for services or things like that. Um, it's it's been kind of a very natural rotation to to say, "Okay, well, let's figure out how we actually do way more with AI instead, right?" And so that's kind of like much more of like a a natural substitute. To your point, I mean, I actually think a lot of the the big gains are actually less from cutting cost, but more from increasing output and increasing capacity. Um there's some work I think that needs to be done in terms of like just like really passing all of that through and making sure you know throughout the stack that everyone on the team understands exactly where where the gains are coming from. Um but I think you're seeing that process happen across um a lot of these big enterprises today. Um on the point of um you know model routing I think it's a great point as well which is um you know I think what we're seeing especially is as people are thinking within the confines of of the tokens that they have or the budgets that they have obviously we see more and more that all of the models are getting better right and and we see this in code where you know the absolute hardest tasks you still want the very smartest models on right and so you know Fable which was just released today like there are tasks that only Fable can do but the reality is That's call it maybe 10 or 20% of the the tasks that a software engineer is typically doing. Right? For that other 80 90% there's naturally a question of okay how do I make sure I'm using the cheaper models and like there are way you know there's great open source models that will take care of 50 60% of my tasks and do that 10 times faster and 10 times cheaper. Obviously I want to go and make that improvement and do that. Um and so so I think model routing is becoming a much bigger piece and I think we'll continue to see that happen.

[01:02:06]>> Yeah. C can you guys Oh yeah. Talk more about open source if you can. uh at some point but yeah go ahead.

[01:02:11]>> Um one of the things that I think is very practical uh for a large organization is to build an internal tool uh that is useful for both coding and non-coding tasks. Um it it could be a thin wrapper over cloud code. It could be something more sophisticated but certainly it's connected to all the internal processes usually via MCP um uh that is email slack like basically all the system where work happens and a lot of work happens obviously in in generating code but also in in uh deploying that code and running uh CI/CD process. So what what this does is now you have a more like because everybody's using that tool and of course in order to do that the tool needs to be useful inside data bricks this tool is called Isaac uh inside um RAM it's called um incept I think and and of course you can just buy this tool off the shelf it will be called called Devon um but but but once you have that um you have all this telemetry coming out of this from the old AI usage which is not just you know your calls into AI but actually like work is done through that tool and once there then model routing becomes like a real option. Um and and and then you can start start piping certain c certain usage to cheaper models and basically um once you're on the consumption path and then you're on a consumption path end to end uh of every work that's happening in your organization you can digitize it and therefore you can optimize it you know by by piping to open source models by uh choosing model like lots of things can be can be happening. You will also find out that whatever that the bottlenecks might not become models at all. Uh like for example modern CI/CD uh uh processes is kind of broken. Um you know for a while uh you know at data bricks we had a like we now have piling PRs right they're waiting for either code reviews or they're waiting for uh for the CI/CD process to complete. We have all these charts going up and to the right of how many PRs in parallel are stacked up. So, so I think we'll we'll start optimizing those things as we see them end to end. And the first kind of prerequisite is to build um a tool that that where work happens for your organization. So maybe I'll take it um I'll take in a different direction because you started with organizational uh you know I think there's sort of two models we've seen. I think a lot of it is bottoms up, you know, people sort of picking up tools and sort of using them.

[01:04:40]And I think generally they end up automating and speeding up what they do themselves, like sort of the the productivity gains you get from the individuals, but those are really kind of hard to measure. You can certainly be qualitatively, but it's kind of hard to measure because you're like, I did a better presentation, higher quality presentation, but it's not clear sort of like how to measure that. Uh, so I think there is that. But I think the place that I've seen sort of more success in measuring has been a little bit more top down. So at Chevron who's a customer of ours, you know, they have a a whole team that's really, you know, reporting the CEO that's working top down to say how do we do AI based uh ret transformation and and they're working with us to actually take on these really hard problems. They have a facilities of the future. Uh there's a reference that just went out on our website, but you know, they really talked about like, hey, how do we pull things together that previously could not be done? And that's sort of the idea of like, you know, people have talked about ro collapse and time collapse with agents because things go much quicker. You can sort of do different things. And so we found you know getting a model uh that is sort of you know uh a traditional LM that does reasoning that is co-opt that is you know physics Nemo that does other other kinds of uh physics modeling pulling those together to actually solve a really hard plant safety problem. That's what they found. They were like hey look now we're able to literally not send humans into gas leaks because they're able to now you know manage their drones and safely right and so it's a very different way. And so they're that that they find that to be a very interesting use case that really sort of changes the the economics for them in terms of what they can do with their own work. So it's not you know how many engineers or how many you know individual individuals but they're able to really measure look we build a new facility of the future that has less people out in in the in the gas leak right. So it's a different kind of uh approach which is top down.

[01:06:23]>> Yeah. Yeah, I mean, I guess I I know that there's there's no silver bullet here, but I I think it's uh it seems like there's a a pretty big difference between um companies that maybe are extremely sophisticated, um run a lot of databases, know how to throw AI on top of it to create applications and those that maybe are in heavily regulated industries, are very old, uh gigantic, need a lot more handholding. Um so I I'm I think everyone is really trying to uh understand what are the best models for measuring you know outcomes for measuring ROI like what are the best examples that exist are there you know unique opport you know open AI and anthropic can can say until they're blue in the face this is how you should do it but are there um you know startups that are starting to to come up and and help and provide the right kind of dashboards or is it going to be the legacy I don't want to say legacy the OGs of AI included or the palunteers of the world or the sales forces of the world that are going to insert themselves and say we are the intermediary here. You need us to make sense of this all to know how to route to different models to know how to use open source in addition to everything to achieve your outcome. Just tell us what you want to achieve and and we'll achieve it for you. So I I'm just trying to I'm just trying to figure that out because like there's this massive debate about oh no, you just need a really good um set of databases, spin up a lot more databases, throw AI on top of it. I mean, you know, ask Snowflake. They've been saying a lot of this recently. Um, and you're off to the races. You don't need this intermediary layer. It's just this massive debate in the industry right now. Right now, I'd love for you guys to weigh in on it.

[01:07:59]>> Yeah, I can. Actually, this is our internal pain right now because uh we are enabling AI in our teams. Uh, and uh data layer I believe is like is super important uh to be optimized for agents like what I'm talking about. I'm talking about uh you can you can get great productivity with uh creating your own LLM Vikas uh uh for your personal use but when you scale it you realize that like there is like there is not not much technology today to help you with uh uh because you need some semantic layer to show to guide agents what is company context like what is company terminology what is the company legacy what is history what are the processes And all of these are located in scarsed data sources inside company plus half of the information is people heads into emails in chat. So uh you need like different semantic layer which will aggregate all these uh data sources like just to scale like you can get a lot for personal productivity like my personal clothes it can create great analytical queries like better than anybody in our company to be honest but like to scale it like I can't scale it because like it has my own context as well so I need something in between to connect all the data source and to provide shared context to the company. So I believe companies should move to have personal context plus create corporate cortex uh to uh allow it to be super scalable.

[01:09:29]Yeah.

[01:09:29]>> Yeah. Yeah. I'm I'm glad you mentioned semantic layer. We actually did a whole article on the semantic layer a couple of weeks ago. It was sort of in the context of uh Microsoft's PowerBI and how it's trying to put more kind of um walls around it, make it harder uh maybe for certain folks to to to get in. um uh by folks I mean having customers use um you know data bricks or other uh tools uh to kind of uh bring their data from PowerBI into the rest of what they're doing and in developing AI applications.

[01:10:00]So I I am c I am curious for your thoughts about this and how this is going to play out. I mean um putting putting walls and and toll gates uh around um you know traditional applications seems to be happening. uh companies are talking about it on earnings calls. It's sort of the beginnings of it. I don't know where it's going to end up, if there's going to be a revolt among customers or not, uh or if customers are able to vibe code their way out of it. But I I I'd love to hear your your thoughts on on that trend as well, any of you.

[01:10:31]So I I would say you know I think you can put the toll gates on but I think that you know some people will uh grudgingly accept it but I think in the end it's not going to work out because like you know it's the customer is going to say like this is the stuff this is my data and my IP and they will have to sort of figure out the the right thing where they wipe code there's a way out of it there'll be you know there are alternatives they'll go to someone will say hey look there's a way to make money not having those things right and so I think that may not be the the hardest problem to solve for. I think like I think it'll it will go go past it.

[01:11:02]>> I I actually live this problem a little bit. Um because well think about it. Um where is the work happening? Is the work happening inside an AI tool? Like is the work happening in cloud or is the work happening in my SAS tool?

[01:11:14]>> That's right.

[01:11:15]>> And and so uh okay well um where does the customer want the work to happen? Uh and at the end of the day where the customer needs the work to happen that's where this thing will arrive. and you can put put up walls but if the customer wants to to live in cloud code then a competitor will offer a capability that allows you to consume everything from cloud code. So now okay well but you have your SAS tool it's you know billions in revenue all of those things and what do you do uh what you do is of course you you have to do both and and then you will allow people to do work where they want to do work and then you need to bring AI into your product uh and make sure that hopefully um if you're lucky um and if you're not you're going to be dismediated right but if you're lucky you can provide better experience by using your tool with AI then um you know consuming that same product inside cloud code and I think every every company every owner of a SAS product and obviously data bricks is a SAS product and this is a data product but it's also a you know a SAS tool um they they they have to do both and there is no choice really today um uh and then the future will play out the way it will play out >> yeah just maybe just example like you know a lot of the people in my company now use cloud co-work and to create PowerPoint index, right? So, so you're not in PowerPoint to build the deck, you're actually in cloud building the deck and and it's just a export mechanism, right? The PTX comes out of the end. And so, so if you if you sort of put a huge, you know, barrier to uh PowerPoint to say you can't use this thing, then you'll use something else to >> use PowerPoint.

[01:12:51]>> No, we use it. That's the only thing it supports. By the way, it doesn't give Google Slides yet. So, but when it does, you know, there'll be, you know, we could be able to use any of those. So the idea is that you're able to you uh the a if you're working in AI the other things becomes less important and then that's that's where you have to compete and you have to say I don't think it's actually sustainable to simply put a toll gate.

[01:13:09]>> Out of curiosity what what percentage of AI spending in the enterprise today do you think is like experimental or not considered you know essential yet?

[01:13:22]Does anyone know the answer? Have a guess?

[01:13:25]I think as a function of spend I think it's honestly a pretty a pretty small minority at this point just because the workflows that are typically really scaled up are the ones that people are just seeing work on mass right and so so as a function of like okay the number of use cases we can talk about then I think there's tons and tons of thing you know we see this in orgs that we work with we see this um you know even internally there's tons of things that people will try we'll mess around with a couple different use cases but obviously the ones that you find that okay this is going to work over and over and over and this kind upgrade or this kind of migration, I just know for a fact I'm going to save, you know, six hours for every single time I I spin up an agent to go and do this. Those are the ones that get scaled up a thousandx, right?

[01:14:06]Um, and so when we're talking about this this like AI boom and and this like massive growth in consumption. Um, of course there's some of that, you know, there's a big narrative of all the, you know, people going crazy with the tokens. I'm sure there's like some of that that exists there, but I actually think in practice for for orgs that are kind of like monitoring their spend, most of them have gotten to a point where where they're spending the large majority of that on on on real use cases. Yeah. 100%. Like especially coding, right? Coding just blew up. like like it's so obvious for anybody who writes code how how more effective uh they can be by using uh AI tools and therefore like the spend on that particular category is is is just unreal. So I think this takes >> sure but definitely unsustainable in some cases. I mean we we I think we was a very popular article but we like published the internal like uh dashboard for how meta is measuring uh some of these costs as their as their engineers were talking. It was definitely not. It was definitely out of control. Um I don't know how big of a customer they were of Anthropic. They were probably top four. Um there's probably still top four, but I I don't I just don't know if it's that simple. Like I think there's still a lot >> I think in in engineering and sort of like especially in coding, I think like it has sort of pass criticality, right?

[01:15:19]And so it is useful. People know how to use it. And I think a lot of the use is actually uh is sort of directly sort of uh production type work. But I would say there are a lot of other use cases where you're not that there at all. I think we see a lot of that in traditional industries like you know their adoption of AI or agentic AI is much much earlier. And so I'd say they're a super majority of that spend probably is uh is uh experimental which is trying to find the use case where it actually makes a real don't get me wrong it's not like they're not spending on AI they're spending on co-pilots they're spending on you know Gemini they're spending on anthropic. Uh so I think personal productivity I think it's there but I think like in terms of like if you think about mission critical workloads where coding is one of them then what's the next five that are in a traditional industry I'd say like it's very very early days. I think there's a lot of experimenting being done there.

[01:16:06]>> Is there like a how how big is kind of the gap between capabilities of the underlying models and the harnesses that are very quickly being built around them and what people are using them for today. It does seem like I don't know if maybe I'm just speaking for myself. I'm definitely experiencing and hearing about new, you know, uh new uses almost every day, certainly every week. Um so I'm just curious to know where how big of this this gap uh how big is this gap.

[01:16:38]>> I think it really really depends on on who who we're talking about basically, right? And so I think like folks in corporate you know startups in San Francisco for example I think are pretty on top of it, right? And corporate America to your point I mean that can be um you know months and months out or in some cases even years out um which is a big deal. I mean it's it's a good call out for for for the physics of of how a lot of this enterprise adoption looks and and you know people ask this question obviously a lot of companies in AI have had these like insane growth curves and there's a question of like like well how how is that possible that they can grow so fast and serve so much you know is it sustainable is it not sustainable um and the way I'd kind of put it is well with a lot of these you know technology has always come in waves right and before this wave was the cloud wave and before that was mobile and before that was the internet uh personal computer whatever I I think there's a couple things going on. I think for one um obviously the the the the mechanism of delivering it is just pure software and so it's just much easier to adopt and and and and use at companies. But I think the other thing to your point is like it's not really okay to be two years late, you know, anymore, right?

[01:17:43]And if you think about cloud adoption, I mean lots and lots of companies just got on cloud in the last few years, you know, they just moved their last systems off prem, right? Uh, and it's like, all right, yeah, you're 5 years late, 10 years late, but it's okay. You know, it's like we're making it work with what we have and maybe it was would have been a little bit more efficient to come sooner, but it works out fine. 5 years late, 10 years late on AI is is not going to cut it here, right?

[01:18:05]>> Unless you're Apple, in which case, apparently so. Yeah. Um, and and I think a lot of the enterprises really see this as well. Um and so so I think even you know when we're talking in these relative terms it's 6 months late or 3 months late which is a big deal by the way in terms of what use cases it's worth calling out it's still like an order of magnitude faster I think that that than than many of the previous trends that we've seen. Yeah >> I would agree here and that's I really agree that startups and AI natives are in very lucky position. They don't have this legacy. They they're starting from the scratch.

[01:18:38]>> Yeah. And like more history, more legacy you have, more internal politics you have like it means that you need more more changes to adopt AI internally. So it really depends on like more mature work you are in worse situation you are able.

[01:18:52]>> Yeah. And even even with coding, right?

[01:18:53]Like which is probably the now the most mature AI sort of agent AI use case like if you're a a company that has a large codebase and a team and everything else and you say look now you've sort of you no longer have these different roles.

[01:19:06]There's only a builder role. It's not PM and design and uh engineering. You have to now reorg all the teams and you got to figure out what to do with them and and like and that is not free, right?

[01:19:16]But if you're a startup in 50 miles of this this event like and you're starting from scratch, yeah, sure you don't have any of that or you have five people and it's easy to sort of get around it. But if you have 500 people, thousand people, >> there are real work to like reorg the team and how to think about planning and how to think about the work. And so uh and that is the people who are trying, right? Right. There are other people who still don't understand it just because they don't have the the >> What's the oldest company that you know that's done the best at adapting?

[01:19:44]>> I I I I don't know enough, but of our customer base, I would say Chevron because they're sort of they've been leaned in. They're really really top down leaning in and it's super uncomfortable, right? They're like the the plant managers like WTF with what you're proposing here like I can't do any of this stuff because regul regulations, but they're pushing, right?

[01:20:02]and they have a team set up to push and so uh I think it's I think every company is trying to push different differently but I think it's uh I think we deeply underestimate how much it is to change you know a bunch of work certainly in traditional companies. Yeah.

[01:20:17]>> Well, I'd be remiss not to talk about the uh GPU crunch uh that um your friends at Nebus and Nerk um know all too well. I think um Mark from your team uh was calling out um raising prices in the last earnings call. Uh we're certainly hearing from a lot of startup founders um and and other kind of AI product founders that um it's it's it's tough out there. It's tough out there. A lot of capacity is being reserved for really large customers. So I'm just curious like certainly Nar for you but also for you know for your customers here like what are the tricks like what's working not working? How are you get how do you get better deals? Um yeah what are you doing?

[01:20:58]>> Yeah that's a good question. How to deal with it? uh first uh I believe uh like if you're a user of AI if you're not building AI like you are very likely to have in position to use open source models uh using optimized data layer will also allow basically to squeeze more from the like capacity investment that you can get uh from infrastructure from the GPUs uh combining multiple types of workloads you don't need to resource for all the like you will need to reserve for some some types of the applications but sometimes you can large run batch inference where you can go and do like what Shopify are doing they using GCP and Nibbus and preemptable instances both uh places uh for like non-critical workload so there is like multiple tricks mostly to like you but the main idea is that you can map your type of the workload to requirements for the compute like for production like large scale production predictable production you needs to uh have reserve uh and like you can get more more tokens from this reserve if you use like token factory like services which allow you to squeeze more tokens basically but you can also combine it with burst usage it's it's definitely possible in the world particularly if you use like multiloud technologies like sky pilot or something yeah >> uh Scott in the remaining time I was wondering if you could walk us uh into the future 5 years from now um and just tell us what your agents are going to be doing for us. Um, are we going to have IT departments anymore? Like what's what's going to happen?

[01:22:36]>> Yeah. So, five years from now, you know, we're all going to be in the metaverse.

[01:22:39]There's not going to be any physical.

[01:22:40]No. Yeah. Um, no, look, I I mean I I I think you really see the trend continuing and and I mean I think there's at this point it's you know it's it's it's almost a trit saying in San Francisco but but it's worth really like you know sitting on the implications of it which is you know people talk about the MER study of of how much work can an agent do autonomously right and two years ago wasn't that long ago two years ago that you know it was 20 seconds or something right and a year ago it was like 5 10 minutes of work and now we're talking about you know several hours of network and it continues to grow at this exponential curve. I haven't seen the I actually don't know what the latest number is for Fable, but I'm sure that is now, you know, another, you know, 2x higher or whatever. And if you just naturally follow at that curve and you just ask the question, well, what happens if it does continue? Then you're thinking about, okay, this model is an agent that can do months of work, right?

[01:23:30]And it's it's almost a different operating model where it's now like, look, let me give this agent a whole initiative. let me give this agent an entire like goal to like and let the agent scope out its own project and think about how it should accomplish the objective and how it should do all those things, right? What does it look like when you're actually going in and doing that and on a scale of days or weeks or months? Um, I think we're going to see a lot of that. Um, and and I think the cool thing about >> what's going to be a month-long project that you want to do?

[01:23:55]>> Oh, it's a great question. I mean, it's it's it's like, you know, today, let's say for a coding agent, it's like, hey, here's this bug that the customer reported. Go and do this. Or like, hey, we just wrote a product spec for this feature. Let's go and do that, right? I think not too long from now, it's going to be like, hey, like at a high level, we're just like really thinking about optimizing our architecture, trying to save money on costs, trying to optimize our databases.

[01:24:16]Devon, just get take that as a whole initiative. Take a look at what's going on right now. Think about what you think is inefficient, and then just build everything that you think is like the right way to go and redo this. Like what what are the right way to key into the database? sort of the right, you know, it's it's like rather than a task, it's almost like a a a an open-ended question. Well, what should we do? You tell me. Like, you go do the research on this. You figure this out, right? And and I think we're starting to get to that already. Um and and I think the interesting thing that that comes to obviously is just you're getting closer and closer, I think, to to to getting to think of ideas and have them become reality. Um which is what I'm personally really excited about. I mean, I I agree, by the way, we're going to be in a massive crutch for GPUs. Please, uh, Mark and and Roman and Ar, please save some for us. Um, but but but like I I I mean, I think that's what we're going to see. And and I think for every business out there, people are just going to be able to build so much more. They're going to be able to do so much more for their time.

[01:25:09]>> What's the longest time horizon, you know, job task that you've seen done that like surprised you in the last couple months?

[01:25:15]>> Yeah. So, we've seen people who have run Devon sessions for like multiple weeks.

[01:25:19]I would not recommend that. It's there's some more like meme type behavior. Um, but but no, I mean, we we've seen like end to end. I mean, for example, like for some of our training runs and our projects, it's like we we had projects that like, you know, would have been full like internship projects like um a year ago, you know, like multi-week internship projects and like Devon just does it in a couple days and it like runs all these things and it it it puts up a nice data set of its results and it gives that to you and it's it's it's really incredible to see. There's an interesting thought um and and like hearing all that and uh it occurred to me that it you know I grew up in the in the in the in the era of like various infrastructure projects database engine storage subsystems uh you know companies like pure storage snowflake data bricks nutanics uh Palo Alto networks right um they're they're building you know infrastructure products what what is the defining property of those they're really hard to build uh takes a lot of um you know uh really hardcore systems engineers that you know live in the Bay Area. They're very contested, right?

[01:26:20]That you know they can they can go work anywhere. Um and and and generally each one of those projects is like many years, right? So you you want to put together like an enterprisegrade storage system, enterprise grade database systems, you know that's like five years, right? In order to go and and build it end to end. But the other property of those systems and they're they can be incredibly well speced because you know u what what the API to the system is uh you know what a database engine is you know what a storage subsystem is so I think um it it may be an opportunity to build more infrastructure systems and if you build them from scratch and then you start with a well- definfined spec then um you can you can like design your system not obviously oneshot it but like think deeply how you can break down work and then then unleash an army of agents to go and build this things much much quicker. And by the way, some of that I'm I'm seeing uh uh right now data bricks uh but I think it's going to be like 10x 100x.

[01:27:22]>> This is a product preview you're giving us right now. Is that >> uh no maybe like uh you know there's entrepreneurs in the room uh that want to build infrastructure things that we we certainly live in the world of opportunity right now.

[01:27:34]>> Okay, cool. All right. Well, we're out of time so thanks gentlemen very much.

[01:27:38]Thank you.

[01:27:48]Thank you, Amir, Scott, Veni, Nikita, Narc. Um, hearing from our customers is critically important and it's a a real privilege to have extraordinary uh leaders and executives like we did just now on the stage.

[01:28:07]Um hopefully what you heard is that the gap is real.

[01:28:14]The gap between what we want to do with AI and what we have in the way of systems, processes, tools, and enterprise readiness.

[01:28:28]And in order for us to deliver on the vision that we have, in order to be able to bring the right kind of innovator and builder and enterprise leader insights to the table so that we at Nebus can take the inspiration and understanding and turn it into the necessary inputs that we need in order to deliver on the potential of um what enterprise and scale AI looks like.

[01:29:09]The gap is isn't the ambition. It's the it's between what AI can do and the systems that organizations can actually deploy, trust, and depend on. And what it will require, as I'm insinuating, is collaboration across the entire stack.

[01:29:29]And not the kind that happens at a panel once a year, but one that is actually happening as we're building and sharing what we're working on, what's working, what's not, developing a common set of standards and repeatable architectures.

[01:29:51]To that end, I'm excited to announce three new initiatives um designed to bring together some of the most experienced builders and practitioners in our industry.

[01:30:06]The first is the launch of the Nebius customer advisory board.

[01:30:13]This is not merely a collection of titles, but a working group of operators, builders, and enterprise leaders from across the AI stack.

[01:30:25]The mission of the advisory board is simple.

[01:30:29]Help shape the future of production AI.

[01:30:33]That's a discussion we're having here today.

[01:30:36]So, it is my pleasure to introduce the inaugural members of the Nebus customer advisory board from Amy Black Forest Labs, Cloudflare, Cognition, Cohhere, Core Automation, Higsfield, Reedcraft, Revolute, and ROA.

[01:31:00]Many of them are in the room today. As a matter of fact, if you're one of those, can you stand up?

[01:31:06]Oh, hey, there we go.

[01:31:12]Thank you for being here to stand up when I said that.

[01:31:17]Please join me in welcoming and thanking all these exceptional leaders for the partnership and commitment. The cab is how Nebius partners with organizations like the companies that I just mentioned. Alongside the board, today we are launching the Nebius Fellows Program, a network of developers, contributors, and community organizers who are shaping what act what AI actually looks like when it lands in the real world.

[01:31:51]Our founding cohort spans cities from across the world. They are contributors to VLLM and CNCF and building the AG agents and eval frameworks the rest of us use.

[01:32:07]They run meetups, hackathons, workshops across the globe from Tel Aviv to Toronto and Berlin to Buenosares and from San Francisco to Singapore.

[01:32:19]And with that, it is my privilege to welcome the inaugural class of the Nebus fellows.

[01:32:26]To each of the fellows, thank you.

[01:32:34]And here they are.

[01:32:41]They do incredible work. I've seen some of their videos that Wakasa shared with me. and it's a real privilege to have this kind of community support and I'm really excited about what it means as we're building towards the potential for AR AI.

[01:32:54]Okay, finally the third program. I'd like to share our new builder program that launched in preview today.

[01:33:04]This is for builders who are getting started and for the ones who want to move faster than the infrastructure around them. people testing their agents, deploying inference, learning the stack, and turning ideas into products. You will get Nebius and Tavali credits, courses, easy access to our features, and a path to wider Nebius ecosystem.

[01:33:33]You can sign up now at deb at dev.nebius.com.

[01:33:40]So in conclusion, we are adding three important ways to learn, collaborate, and advance AI in production. Our CAB, our fellows program, and our builder program. So with that, I'd like to shift gears and now introduce another builder, Devon Sakdev Nebus, VP of ecosystem strategy to share our thoughts on agentic inflection.

[01:34:20]Good afternoon. Thanks for joining me today. Quick show of hands. How many of you have built or prototyped an agent so far?

[01:34:29]Excellent. Keep them up if your agent is running in production.

[01:34:34]Keep them up if there's a user to these agents that is somebody other than yourself.

[01:34:40]Keep them up. If you are running not just one agent, but more than one agent.

[01:34:47]All right. So, this is the gap that I want to talk to you about today. and I have a great group of people who are going to join us in a little bit to help us dig into it. But before we start, I want to set up the stage for the conversation just a little bit.

[01:35:00]So a year ago, we were asking the question, can I build an agent? And since then, models have improved, frameworks have improved, and tool use has improved. But today, the question that we're asking is, can I operate an agent or maybe 10 or maybe hundreds in production?

[01:35:19]See, the challenge is that most teams have prototyped agents, but very few have been able to successfully and reliably run them in production. And the reasons why are more interesting than you would think.

[01:35:33]Let me show you a real example. So, we built a compliance audit agent for healthcare companies. And this agent helps compliance teams audit their standard operating policies, hundreds or maybe thousands of them against 30 or so regulatory frameworks, GDPR, HIPPA, um, SOCK 2, you know, the rest. For today, we're going to focus on a very particular task. We're going to focus on the fact that FDA FDA has released a new set of guidelines for AI enabled medical devices and the agent's job is to find out which operating procedures have been affected and file remediation tickets in Jira. So let's see what happened when we built this agent.

[01:36:21]The building the prototype was actually pretty easy. Barely took us a day. I think we had done most of the work before lunch and out of the gate um it actually worked. It was built using GPD 5.5 as the model lang lang chain deep agents for orchestration and then pine cone vector DB for retrieval.

[01:36:43]Like I said for most tasks it just worked out of the box or once we built it but for this specific task with the with the latest FDA guidelines it wasn't able to find the most latest one. So it used the knowledge that it had the existing knowledge and it completed the task but it didn't understand fully what was the change that triggered the task.

[01:37:04]So this really isn't really a reasoning problem. This is a challenge of freshness of data.

[01:37:12]So we added Taville to ground the agent with um live agentic search and this was just one change to the stack this one addition to the harness and off the bat we saw that the agent was able to now start by finding the latest FDA guidelines and it discovered 47 procedures that were affected by these new guidelines.

[01:37:36]It also filed probably double the amount of tickets that the prototype agent had.

[01:37:41]But two problems emerged. One, this agent also increased the coverage and the scope. And when it found the 47 findings of affected procedures, it not only found things related to FDA guidelines, but it also discovered some things related to HIPPA and and few other regulatory frameworks that we weren't intending for for the agent to look at. So prioritization was not very clear and it was left to humans to triage. And second, if you notice that this one run of just one task to took about $657.

[01:38:17]I got a Slack message from our engineer building this saying, is this for real?

[01:38:20]Are we going to spend this much money on building this agent? And that's really unsustainable especially in production uh at least for this agent. So while we solve for freshness of data, we expose two new challenges. one of over scoping and second of inference economics.

[01:38:37]So we tried a third configuration. We swapped out GPT 5.5 and replaced it with Deepseek V Pro uh V V4 Pro running on uh Nebas token factory and instantly the cost dropped from $657 per run to about $34. That's 95% cost savings and this is without any post-training or fine-tuning. The scope also improved. We went from you know 47 findings down to 29.

[01:39:06]But yet yet again we saw two challenges that emerged. Number one was the runtime actually doubled. It went from half an hour of runtime to about an hour which was pretty long for this agent. And then second um when it filed the severity the the tickets with a certain type of severity we weren't able to understand and explain what was the reasoning behind that severity. So the agent was largely opaque.

[01:39:35]So we continued experimenting with newer models. We continued experimenting with different models and what we settled on was Nvidia's Neotron which they just Neatron Ultra which they just released last week and it's available on Nebia's token factory. We also made a few other changes to the harness. We added linksmith for observability and we added snow globe from guardrails AI for user simulation and adversarial testing. Once we ran this particular configuration, we saw that the cost dropped a little bit further down to $24 for the run and we saw that u the runtime reduced pretty significantly from an hour down to 13 minutes.

[01:40:16]We also started using Lang Langmith's recommendations and data from Snow Globes simulations to improve agent behavior. We were able to understand what the agent was doing, why it was doing, and we were able to steer the agent in the right direction.

[01:40:32]So at this point, not only was this agent performing well on this particular task, we also ran this agent and the other configurations over 120 uh different tasks with known ground truth. This particular configuration performed the best. It had nearly perfect recall. It had 20% higher precision and it was about 70 to 80% cheaper than closed source models. So you'd believe that this is now production ready.

[01:41:01]It was but we discovered another challenge which is we could build and run this one agent but how are we going to run hundreds of agents um for for hundreds of users in production. So we've solved for runtime and trust and now we're looking at the challenge of operating at scale.

[01:41:23]Let's take a step back. Let's look at all four runs and find out uh what we discovered. So really three things stood out. Number one, this path from produ prototyping to production ready is not necessarily a linear one. It's actually a maturity curve. So every time we saw a problem and we solved it, we discovered new problems that needed new types of fixes and new types of tools. So in reality, the the right stack or the right harness matters as much as the right model because both go in conjunction in creating the right agent outcomes. Second, open- source models or open weight models are closing the gap pretty quickly for most agentic tasks.

[01:42:06]Openweight models work out of the box.

[01:42:09]We had no retraining um or fine-tuning through the runs that we we walked you through. And third, production ready is not the same as running and improving in production, especially when you're running hundreds and hundreds of copies of the agents.

[01:42:27]And that's the that's the next frontier.

[01:42:29]And that's what I would like to talk with our panel about and dig in the next few minutes. So let me invite the panel.

[01:42:37]We have uh Julia Shodenstein from Lang Chain, Ashoto from Pinecom, >> Rotim Wise from Tibili, and Sha Rajpal from guardrails AI.

[01:43:03]Thanks for joining us. SH I'd like to start with you. So in one of the runs we saw that the agent went off scope when it started looking for other regulatory frameworks rather than just FDA.

[01:43:15]What are the things that are necessary to keep agents on scope especially before you hit production?

[01:43:21]>> Um yeah that's a really good question to start with. I think the interesting problem about agents specifically is that it's very hard to even know what the scope is or what out of scope looks like until you actually start deploying that agent. Right? So as an example, when you started out building that agent, you wouldn't have expected that the failure mode you'd see first is, you know, it pulling data from these other sources until you ran it on some data points, on some queries, and then you discovered like, hey, this is where it's kind of going on off track. And that level of infinite surface area in the way that agents can fail is actually one of the key ways that building agents is different from building traditional software. And um a lot of how we have a very opinionated take on how to solve this problem. And a lot of how we think about solving this infinite surface area problem is inspired from self-driving cars which is where I used to work for many years. And in self-driving cars similar problem space right like the real world is infinite. And how you really think about solving it is simulation. So rather than building something building an agent then going out into production and then waiting for some user to use it in an incorrect way and then see that failure in production.

[01:44:28]Instead, why can't we simulate, you know, a lot of different kinds of user queries before you ever hit production um that mimic your real users and then also go off track in, you know, ways that you haven't seen before. And that'll uh you know, help you anticipate all of the different ways that your agent can fail. So, we've seen that pattern before in physical AI in hardware systems work really well. And we've seen it, you know, the foundation model labs and the frontier models use it a lot when they're building their uh models and their agents as well. And we've seen that pattern come come about much more with you know um with the agent that you know like you guys build for example. Yeah, >> Julia, would you believe that this is this agent steering is it a orchestration challenge or is it an evaluation challenge and particularly not just about how do we catch that before hitting production but how do we continue to monitor it when agents are in production.

[01:45:17]>> I think orchestration and evaluation are really coupled and and the reason for that is is what Trey is describing is you're not writing your agent logic in deterministic code anymore. Instead, you're using a harness and an LLM that calls tools in a loop. And so, you can't pinpoint a line of code that will tell you exactly how your agent is going to respond. And so, the best way to know how your agent is doing is to write some evals or some assertions on how you expect your agent to perform. We talk a lot about this agent development life cycle, which is very similar to the software development life cycle in stages. the build, test, deploy, monitor. But it looks really different for agents because it's very unbounded.

[01:46:03]You take user input which is very unknown in natural language and agents themselves have infinite in response.

[01:46:10]And so you need to have a system to iterate very quickly through this agent development life cycle. And so it's not only your orchestration problem or your eval problem. It's being able to test prepod and postp prod as well and having visibility to really know how your agent is going to do when it gets uh finally in the hands of end users.

[01:46:33]>> What metrics would you be looking for when you're running these simulations or even evaluations?

[01:46:38]>> Yeah. Um good question. And so I think there's some around um so broadly I would classify these metrics in you know a couple of different categories. One is you know uh product or performance metrics which is is the agent doing the thing that it's supposed to do. Um and the second big category is you know more I would say like defensive is it is it causing any kind of harm that you know you wouldn't kind of anticipate right so as an example if it uh draws on uh sources that are incorrect or non-factual you know that would be something where it's misleading the user uh and it allows you to you know like not do the task very well. Uh same for other complex coding agents as an example right where they can they don't solve the task well. Um the other big categories is it jailbreakable? Is it you know can it can it cause certain harm? Can I cause it to like violate its own guardrails etc. So that's kind of like the other big set of categories. I think there's u there's a good starting point for how to think about your metrics. But one of the things I've seen people like the the actual workflow with how you come up with this is that you build an agent have some opinion early on about how you how you care about success. But as you run it on more data you actually see how it performs. Then you'll start tracking maybe you know my customer support agent hands off to a human too often so it wasn't a metric that I knew of before but when I actually look at the data I will understand that this is a metric I now need to track similarly you know maybe it's making too many tool calls or the cost is too high so I can build metrics around that but it's retros retrospectively after you actually look at the data and how it behaves >> good you know so u adding both snow globe simulations and and lang recommendations was a big boost to improving the accuracy of the agent.

[01:48:15]Just switching gears a little bit, another big boost to the accuracy of the agent was when we added live grounding.

[01:48:21]So, um, to me that's where freshness of knowledge came up and freshness of knowledge is a moving target. The the knowledge outside in the world away from the agent is constantly changing. Um, Rim, what would you say is a way to monitor and maintain that the agent always has fresh knowledge? It seems like it's been a 2-year-old problem with some solutions. Yeah, >> you can tell us a little more.

[01:48:45]>> Yes, I think um you know real time data is probably crucial to any agents today.

[01:48:52]I think just two years ago you'd ask questions like what's the weather today or what's the score of the game last night? And the response you would get is I'm sorry my knowledge is cut off to 2021. I cannot answer that. And so I think the only people that's going to be happy today with this answer are going to be the Knicks fans. But that's uh yeah, I'm a New Yorker. Um but really like the the purpose of grounding initially was okay let's just um connect models to the web give them access to real-time information so they can at least answer the questions but what we're seeing now is much much bigger when you connect AI to the web it's not only you get fresher data or fresher answer uh you really get um a much better quality response and to understand why it's happened we need to really look at what's happening to the web today. Uh so up until now, people used to communicate to the web directly, right? You're going to go on Google, you're going to send some emails. Uh but what we're seeing today is that people communicate to agents and they communicate to the web. And this shift is really splitting the internet as we know into two layers. One that is more human optimized, which is the internet you know today, and one that is more optimized for machine intelligence. And that's really the layer that we're working on. Um, and thinking about this layer, there are four core pillars, which is tokens, efficiency, that you need to optimize for accuracy, freshness, as we said, and obviously latency. Uh, but by the way, different agents going to need different tradeoffs. You're going to some agents like deep research agents going to need to go for hours, sometimes days. It depends what the task. that I don't care how long you're going to go. I just want 100% accuracy. But if you think about a car assistant voice uh voice agent, then latency is the most important pillar. Uh so really what we're building here at Tavil is something that's going to give you this flexibility and that's completely different than how human search is built today and it creates a massive opportunity. You can turn hours of research into seconds. And just to give you a quick example, think that you're planning a trip to Italy. H you're probably going to go on Google.

[01:51:12]You're going to search for locations.

[01:51:13]You're going to search for events.

[01:51:15]You're going to sign up and you're going to have to compile everything yourself.

[01:51:19]It can take you some minutes or hours.

[01:51:21]Uh agents can do it in seconds. It can process enormous amount of web data. uh it can synthesize it using LLM and then it can produce a nice result. So eventually it's creating a new paradigm.

[01:51:36]So more compute can directly translate into better search in this new era of web search.

[01:51:42]>> You you mentioned something really interesting around a new paradigm with this idea of two internets, right? One for agents, one for humans. Um Ash, I want to want to um check in with you next which is B. So in one of the runs we saw that the agent consu consumed probably millions of tokens when it was retriving information and it reread the information that it had already retrived from before. Do you believe that we need a retrieval system that is built for agents as the first class consumers rather than humans or models?

[01:52:18]>> Yeah, I think uh humans are we need to deal with machines. They're pretty forgiving.

[01:52:26]And if you look at Pine Cone invented the whole idea of vector databases, allowed people to come back and use AI tools on those vector databases. And if you look at the evolution, 2022, Chad GPT came out. Pine cone went out and produced the most number of chat bots ever. But that was a consumer. They were very forgiving. you you gave wrong answers no big deal you said I don't know the result of nicks that was okay and then you went to the next level by the time we got to 2023 enterprises started to come in and enterprises are not so forgiving but they're still human beings in September last year for the first time we saw a new class of users who exceeded the number of API calls than humans called agents and they're not forgiving. They just take information as you give them believe that's the information they have and act on it.

[01:53:38]So this is not a problem of you know the LLM being bad. This is fundamentally a problem of a mismatch where you have asked agents to perform a task but you gave it systems built for human beings. That's what we did.

[01:53:52]>> Yeah. For four years we've been building for human beings and this new class of users came in and we had to fundamentally change fundamentally change that and that's what we announced Pineco Nexus uh May 4th which is rearchitecting the entire stuff we used to do for agents very similar to what you talked about two webs I know I know the web uh I heard what last month was the first time the web traffic exceeded agent search then human enterprise exceeded September last year as they started maturing stuff. So we came up with a very different model where fundamentally pine cone nexus allows people to ask for three things right agents need one an ability to express themselves on what their task is two the ability to receive information in a model and a manner that agents understand which is a structured way I don't want a poem I don't want your music I just want you to give me a precise answer because I have a job to do and third frankly to get to that millions of tokens and be able to do much more productive is where our partnership comes in. The ability to take Nexus, run it on a compute economic model that allows us to drive, I think we did the numbers with you, 91 to 95% reduction in tokens, 80% reduction in actual cost of running an agent with you guys.

[01:55:28]Now that's a business ROI I can justify but it requires a very different underlying knowledge infrastructure very different one than built for humans.

[01:55:39]>> Yep.

[01:55:40]>> I think that's what it is.

[01:55:42]>> And just talking about you know the impact on business um so you know production ready agent means that the production can be deployed um in in production agent means that now the business is dependent on it.

[01:55:55]>> Yeah.

[01:55:56]So Julia, what what would you say that what's the point where we cross this chasm or what metrics do we need so that we can actually have agents run with autonomy or enough autonomy maybe with some human intervention where we're able to trust the decisions that they're making at scale.

[01:56:14]>> Yeah. Well, we've been we've been working with agents or trying to build agents for a really long time almost coming up on four years now, which is is a lifetime in this space. And now all we talk about is agents. That's the word we use. But last year we talked about agentic. And I really liked that term because it was more of a spectrum. When you started with your initial chat bots in 2023, there was not really anything agentic about it. People would talk about agentic rag where your LLM started making some corrective choices or or had more decision power, but it wasn't an agent. It was it was really just chatbot. Um now as we've moved on and the models become way more powerful we have these new standards and interoperability. So you have agents and sub aents and you take action and you delegate a lot of the tasks now to to the model. You do start seeing more and more uh agents moving on that spectrum going increasingly agentic. And really it's not when is it ready to go into production. It really depends on the use case um and and the stakes. And we have used deep agents which is a harness that is tool calling in a loop. A lot of enterprises still really want determinism. And so we have a a different framework called lane graph that helps you if you need these three steps to happen in this order 100% of the time. The most efficient way to do that is code not an LLM.

[01:57:38]>> Um and so it it really depends on the use case and uh what you're trying to accomplish how much tolerance you have for delegating the task fully to the LLM. Um but if you are moving more to the agentic side, you really do need that full level of uh belt and suscenders approach. So it's the eval that we spoke about, it's the observability that we spoke about, you have guard rails, you fundamentally cannot trust these systems because they have non-determinism in them. And so depending on what your level of comfort is the task, you're going to have different precautions in place to make sure that they are production ready. uh especially in high stake situations like like enterprises where you're dealing with unforgiving uh end users.

[01:58:24]>> Great. Let me skip forward to one more question. Um I think we're going to run out of time in a in a couple of minutes.

[01:58:31]Uh but hot takes from each one of you.

[01:58:34]So we've got this one agent running.

[01:58:36]It's doing its job. We're going to put a 100 agents in production now. What do you believe is the thing that's going to break first? We'll start with uh Julia.

[01:58:48]visibility. You're you're going to have no idea. It's going to be chaos.

[01:58:52]>> Yeah, I would say knowledge when one agent you can manage errors manually. You can say, "Ah, let me go recompile some stuff, reindex some stuff." A 100 agents is a knowledge infrastructure problem.

[01:59:09]You cannot have two agents try to get some information and actually get it different. Especially in enterprise, it better be consistent. And if you don't have that, you lose trust.

[01:59:22]And if you lose trust, you're never in production.

[01:59:26]>> Good point.

[01:59:28]>> And support land chain and say visibility. Again, >> you have to disagree.

[01:59:34]>> It's not a hot take if we agree.

[01:59:36]>> Yeah.

[01:59:37]>> Yeah. But I'll also add on that that eventually the biggest problem in AI today is search or organizational context layer because companies that won't learn how to utilize their proprietary data, they're going to be left behind because they're today that's their only mode. It's going to be in the game.

[01:59:54]>> Um I will say uh I'll also echo uh Julia and say you know you you won't know what's happening, but I will say that it's it's going to be very hard to you know make updates to it. uh you'll be you'll have your agent when you're running 100 it becomes very very hard to know if you know it's getting worse it's getting better how you should iterate on it um so just the ability as a developer to make it better becomes really hard at that scale >> visibility looks like we have some agreement but I still want to debate on about this um at ranks later we'll we'll talk more then >> all right >> you asked us only for one if you asked us for two or three you probably would have agreed >> next session All right, before we close, um I want to leave everyone in the room with uh something you can use um right when you walk out the door today. So everything that we built um our agent with today, the production inference infrastructure, orchestration, observability, simulation, grounding, retrival, these are essential layers that are necessary to build, run, and improve agents in production. And most teams have to uh reinvent this architecture when they're starting their journey. So today we are introducing Nebius agents blueprint.

[02:01:09]This is so that you don't have to start um prototyping agents from from scratch.

[02:01:14]And it's an open uh reference architecture which means that you can go from prototyping agents to production ready agents using recipes and runbooks that we've created. It's available on um build.newbs.com. nebs.com/bloopprints.

[02:01:32]With that, I want to bring our panel to a close. I want to mention one one more thing that um all the products that are in this blueprint are built by folks who are representing here on the panel. So, I would like to thank them for that and I would also like to thank you for an incredible conversation. I'd also like to thank you for joining us again and hope you guys have a great rest of the day.

[02:02:11]That was awesome. Thank you Julia Rotham, Ash and Sharia and Devong for providing us a wonderful um panel on agentic inflection. Um, by the way, blueprints are the very beginning. There's much more to come. And I think we all know that there's still a lot to build. And as we shared today, we're at that inflection as uh the market moves towards Agentic. We're building towards what we are learning from customers and what we're gathering from our partners in order to be a big part of supporting what is going to be a major inflection in the market. So, as we begin to wrap up our first Nebus inflection, one thing is clear to me.

[02:03:06]As we know, the opportunity in front of us is massive.

[02:03:11]But I think you heard this over and over again, it's just the beginning. And there's a significant amount of responsibility that we have to make it happen. The next chapter of AI will not be defined by what models can do. It'll be defined by what organizations can deploy, what enterprises can trust, what developers can build, and ultimately what users can depend on every single day. And if there is one conclusion from me today from today's discussions, it is this.

[02:03:52]There's no way we're going to get this done alone.

[02:03:56]The path forward runs through rooms like this one where humans are sharing what works, builders are being honest, and ecosystems are willing to commit to problems larger than any individual company.

[02:04:16]The spirit is exactly what we created in flection 4. The challenges ahead are real, but every major technology transformation has faced this moment.

[02:04:31]Each transformation has been driven forward by humans who refuse to accept that difficult meant impossible.

[02:04:41]I hope you leave today with three convictions.

[02:04:46]that collaboration is not simply helpful, it's essential.

[02:04:51]That the challenges are significant but achievable.

[02:04:57]And that the next era of agentic AI is already here and each of us has a role to play in shaping it.

[02:05:07]Inflection is not really about an event.

[02:05:11]It's about a moment when an industry stops debating what comes next and starts being accountable for actually building it.

[02:05:24]When experimentation becomes execution.

[02:05:28]When individual innovation becomes collective progress.

[02:05:34]That is what we started here today.

[02:05:39]And that is why this is the beginning of a inflection we're happy to be a part of. On behalf of everyone at Nebus, by the way, these are their pictures.

[02:05:53]Thank you for your partnership collectively, your leadership, and for being here.

[02:06:00]We hope to see you at the next inflection. Thank you.

Related Videos

Artificial Intelligence

AI Agent Mastery Certification Course: Lab 4 – Tools & MCP

arizeai

350 views•2026-06-16

Artificial Intelligence

Real-time Voice cloning, Kimi K2.7 CODE, GLM 5.2 and 3D reconstruction | AI News

kaiexplainsYT

111 views•2026-06-16

Artificial Intelligence

He Believes AI Could Replace Humanity Faster Than Anyone Expects

LondonRealTV

815 views•2026-06-15

Artificial Intelligence

General Session by Rami Rahim-The next generation of networking: From vision to self-driving reality

HPE

108 views•2026-06-17

Artificial Intelligence

[PLDI 2026] Flatirons 3 - LCTES (Jun 16th)

acmsigplan

191 views•2026-06-16

Artificial Intelligence

Google DeepMind’s AI Halves UK Housing Planning Time

60secondsignals

467 views•2026-06-17

Artificial Intelligence

The Creators of Claude Code and OpenClaw don't Prompt Their Agents Anymore?!

ColeMedin

569 views•2026-06-18

Artificial Intelligence

Why prompt injection is AI's biggest fail

usemultiplier

1K views•2026-06-17

Trending

Nobel Scientist Creates Device to Harvest Water From Desert Air

DrBenMiles

2200K views•2026-06-16

GROW A GARDEN 2 UPDATE

KreekCraft

668K views•2026-06-20

উটের কুঁজের মধ্যে কি থাকে?

MrBonGrow

1861K views•2026-06-18

아픈데 손은 호강 중

Memody-q3b

5995K views•2026-06-14