Granite 4.1, IBM Bob & building a quantum ecosystem
Added: 2026-05-08

1,477 views · 79 · 47:58 · IBM Technology · Original Release: 2026-05-01

IBM correctly identifies that the future of enterprise AI lies in modular specialization rather than monolithic hype. This pragmatic shift toward distributed, power-efficient training is exactly what the industry needs to move beyond mere experimentation.

[00:00:01]Enterprise cares: can you understand tables? Not so much: can you do the extremely coolest pictures that are sci-fi or anything of that kind? No, it's: can you understand tables?

All that and more on today's Mixture of Experts. I'm Tim Hwang, and welcome to Mixture of Experts. Each week, MoE brings together a group of the smartest, most thoughtful people working at the cutting edge of artificial intelligence to lead you through the week's news. On this week's episode, we've got Marina Danilevsky, senior research scientist; Gabe Goodhart, chief architect, AI Open Innovation; and Kaoutar El Maghraoui, principal research scientist.

[00:00:39]Welcome to you all. And Marina, welcome back. It's been a bit. We've got three big stories today, and we're going to have a bonus segment on quantum with Jamie Garcia, who is the director of Strategic Growth and Quantum Partnerships. So we're going to talk a little bit about DiLoCo from DeepMind, which is a distributed training protocol. We're going to talk a little bit about DeepSeek V4. But first I want to start, of course, with a little bit of IBM news.

[00:01:05]There have been two big announcements that have come out this week. One of them is IBM Bob, which is a system-level AI development partner, basically an agentic coding partner, and the other is a new generation of models, Granite 4.1. So maybe I'll turn it over to you first. I mean, let's start with Granite.

[00:01:21]I mean, what should people be looking out for here? What should people be focusing on? And is there anything different from the last generation?

Yeah. So, exciting morning over here. Granite 4.1 went out the door a mere hour and nine minutes ago as of recording time, so I've got to snooze those Slack notifications. But no, it's an exciting launch, and definitely one that might seem a little strange to the market, where most models right now are going out targeting a general agentic use case. For this launch, the Granite team has really focused on specializing in specific tasks and providing models that complement general agent frameworks really well. So the launch includes not just the LLM text models, but also the vision and speech multimodality models, and a next round of our embedding models. All put together, these are really targeted at augmenting what you can get out of a general reasoning agent model to provide best-in-class support for specific tasks. The latest vision models are really targeting table and chart understanding, as well as providing general vision capabilities, so they provide leading quality in those specific tasks. The speech models similarly are driving down the size as much as possible while targeting transcription and translation as the primary tasks, and seeing just how small they can get at the highest quality benchmarking, so that you can put those on as many devices as possible. The language models come in three sizes, from 3 billion up to 30 billion, so packing a lot of intelligence into a relatively small package, focused largely on instruction following and tool calling. So these are the kinds of things that you could build a RAG pipeline around that you would then offload as background research in an agent ecosystem, or the sort of specific tool-invocation workflows that you'd want to have supporting a larger agent workflow.

Yeah, I think the positioning of this is really, really interesting. One way of reading this launch is just that there's been this enormous explosion in costs across the board. And for a lot of enterprises implementing this stuff who don't have unlimited money to draw on, cost becomes a really real concern. So, Kaoutar, maybe I'll turn it over to you. One of the questions that I had was, obviously there's been all this hype around agents. I think we've been talking about agents continuously for two years now. It kind of feels like this positioning is almost to say, look, regardless of whether or not we live in agent world, there's still going to be a lot of need to offload tasks to much more specialized models. How do you see this ecosystem evolving over time? It almost feels like there's a need to build the infrastructure around the agent versus just focusing on the agent itself.

Yeah, I totally agree. I think agentic is a very important emerging workload everybody's paying attention to.

[00:04:40]But there are also many other things as part of the workload that we cannot ignore. If you look at enterprise AI, it is pluralistic, not monolithic. For example, if you look at Bob's example, there's the multi-model orchestration, which routes each task to the right model: Claude, for example, for hard reasoning; Mistral or Granite for cheap completions; fine-tuned specialists, for example, for security review. And even Granite, as Gabe mentioned, ships as a family: we have language, vision, speech, embedding. So the idea here is, how do we design these things to compose? And if you see what the frontier labs are doing, they're kind of selling this monolithic intelligence. But I think what IBM is doing is selling a system architecture that is composable. I'd like to bring in the analogy with the OS vendors in the 80s. We're seeing the same maturity move the OS vendors made in the 80s, from giant programs to composable services. So I think the "one giant model does everything" era is not really sustainable in the enterprise. And cost, I think, is a big portion of this. Maybe you can use the frontier models for the things that you can afford, for certain tasks. But there are a lot of other things on the floor that you need to take care of, and you need to do it in a cost-effective and sustainable way that still hits your main business and benefits you.

[00:06:10]So the specialized models, I think, are super important: the SLMs that the Granite family is focused on, the multimodality. These are really super important capabilities for the enterprise.
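The routing idea described above, sending each kind of task to a different model tier, can be sketched roughly as follows. This is only an illustration under stated assumptions: the tier names, task kinds, and per-token prices are hypothetical placeholders, not real products, rates, or anything confirmed about how IBM Bob works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical tier label, not a real model id
    cost_per_1k_tokens: float  # hypothetical price, for illustration only


# Hypothetical routing table: task kind -> model tier.
TIERS = {
    "hard_reasoning": ModelTier("frontier-reasoner", 0.015),
    "cheap_completion": ModelTier("small-generalist", 0.0005),
    "security_review": ModelTier("security-specialist", 0.002),
}
DEFAULT_KIND = "cheap_completion"


def route(task_kind: str) -> ModelTier:
    """Return the tier registered for a task kind, or the cheap default."""
    return TIERS.get(task_kind, TIERS[DEFAULT_KIND])


def estimate_cost(task_kind: str, tokens: int) -> float:
    """Estimated spend for one task at the routed tier's price."""
    return route(task_kind).cost_per_1k_tokens * tokens / 1000


if __name__ == "__main__":
    for kind in ("hard_reasoning", "cheap_completion", "security_review", "unknown"):
        tier = route(kind)
        cost = estimate_cost(kind, 20_000)
        print(f"{kind:>16} -> {tier.name} (${cost:.4f} for 20k tokens)")
```

A real orchestrator would presumably classify the task first and track quality as well as cost; the sketch only shows the composition point, that the expensive tier is reserved for the tasks that need it.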

[00:06:21]Yeah. And, Marina, I think the story that I'm hearing here is a little bit of complementarity. I think the main thing to observe is that there is also IBM Bob that has come out, which is an agent-based project, and it feels like there's an effort to build these two pieces together, but make them composable and swappable. Is that how you read the IBM Bob launch?

I think so. I think it's alongside the thing that we've seen coming for a long time, which is how these large models become commodities. You can try to, I guess, compete, but what do you need five or six identical large models for? It's not the case, especially not in enterprise most of the time.

[00:07:03]Yeah. Cost really matters. And also how you do on a really specific task is what matters, like with the vision model, right? Enterprise cares: can you understand tables? Not so much: can you do the extremely coolest pictures that are sci-fi or anything of that kind? No, it's: can you understand tables? That's the model you want. We need to get the work done. And also, can you not spend too much money when you're trying to understand tables? So there's this modularity, where you really want to treat these as standalone functions that you can compose and say, okay, if this is my job, you do it; if this is my job, you do it. That's what Bob does, and that's what we are trying to do with the Granite family release as well, like I said.

I was just going to say, I think the point about commodification here is really important. If you think about other commodities, once the capability becomes common, then it's about supply chain optimization and cranking down that denominator, whichever direction you want to do the math, but basically getting the most out of the dollars that you're spending. And so I think both Bob and the Granite family are really aiming at different aspects of this. Granite is aiming at how you get this done for specific tasks where you don't necessarily want the entire noise of it living in your agent context. You just want: get this done.

[00:08:18]Give me the result. Now I'm going to put that into my agent context. That's exactly what these Granite models are designed for. And Bob, on the flip side, is saying: how do I figure out most intelligently how and when to invoke those side spurs to offload cost, and then keep the main logic in the expensive model? So the goal there with Bob is not necessarily individual optimization. Kaoutar, to your point, a lot of these systems have started consumer-facing, where fundamentally you pay your subscription fee, that gets baked into your monthly budget, and nobody thinks about it again. And from a consumer model, that's a pretty acceptable way to do this. But you've seen so many stories already floating around the internet about companies blowing their company token budget within the first quarter.

[00:09:07]And you start to realize that there's a real sustainability problem here at a corporate and enterprise level, where you're just blowing through expensive tokens for trivial tasks. I think the phrase that is being bandied about is "token maxing," right? This idea of maxing out how many tokens you can spend to prove that you're the most AI person at the company. Well, that's just going to crush your company's budget. So if you can token squeeze, if you can token right-size, whatever phrase you want, that actually uses the tokens but uses them effectively, with the right cost per token, that's really where the sweet spot is going to be for companies operationalizing this at scale. And that's what both of these launches are targeted at.

There's also another interesting aspect regarding modernization, which is, I think, a wedge no one else can match with Bob and Granite. If you look at, for example, COBOL, IBM mainframe, Z, there are like trillions of lines of code in production, written in languages most coding agents barely recognize.

[00:10:06]And I think Bob treats them as first class. That's not a feature; that's, I think, a moat that we have here. And that's so important for those enterprises, like banking and the financial industry, that rely heavily on this legacy code. Modernization here is really key.

Yeah. Maybe a final thought in this segment: do you all think that maybe the era of the agent is actually just going to be passing? To Gabe's comment, I think about translating this into a question of, basically, what are the tasks that need to get done in a given enterprise on a day-to-day basis? And there's one way of viewing it, which is that a lot of those tasks are actually pretty routine. A large proportion of them are pretty replicable. And so the need to have generalized agentic behaviors in some ways runs against the grain of what a lot of enterprises actually need. In some sense, the agent is this unpredictable cost line for the business. So a question for you all, if anyone wants to venture a view: maybe we're just excited about agents right now, but when you get down to it, maybe 90% of what happens in most businesses is pretty routine.

[00:11:18]And therefore we ultimately don't need the generality that agents will provide. Am I being too pessimistic about this?

I think you're being a little pessimistic on this. I think what we're going to see is a shaking out, where the user experience that agents offer is something that clearly has legs, versus the tasks that agents accomplish being something that becomes repeatable and can essentially get distilled out of that general agent logic into something much more deterministic. Right now, we're going through the pendulum swing from "everything is a bunch of handwritten code and/or manual steps that users have to do themselves" to "oh my God, we can actually induct this into a general system and let the system handle it." And I think what you're going to see, again trying to do that cost optimization, is that the patterns that users go through with those general systems are going to start to shake out into a bunch of common patterns. Then we're going to be able to extract those things out and make them tools, make them sub-agents that are running off of a much smaller model with a much smaller toolset, something that's going to allow those things to be cost controlled and also quality controlled. Right? If you put those things in a much tighter workflow, you run much less risk of things going awry. And then you're still going to have that really beneficial user experience of an entry point with a top-level agent that says, "I'm taking care of work for you," whether individual or team or enterprise, but it's going to be delegating to these nicely contained, cost-effective solutions for specific routine tasks.

I want to jump on the distillation comment from Gabe, because I completely agree. I think you're being correct, Tim, when it comes to the generalist, but the reality is that the future is once again going to be the specifics, the task-specific. Like Gabe said, you need the generalist to start with, because first of all, you don't completely know what's going to work and what's not. We've all been surprised in interesting ways by what the generalists can do and what they fail at. And also, it's fun. But you need to have these experiences, because how else are you supposed to get, sorry to say, the training data to figure out what those benefits should be? You have to have a huge amount of information about people interacting with the generalists, succeeding and failing in different ways, to finally get to a point of, all right, I've understood what works and what doesn't, and it's data driven, not some sort of a priori thought that one or two or three people had. We're all going to be partially right, partially wrong. So this is a correct cycle for us to go through, with various degrees of pain. But that's just where we are in the cycle right now.

Maybe if I can add here: I think we can say maybe the era of the agent demo is over. I think the era of agents as infrastructure is just starting, and I think it's going to be important to have this hybrid world where we're going to be dealing with agents. Like Marina said, there is the generalist, there is the specialist, and there have to be layered orchestrations across all of these things working together. The right orchestration is really key here to drive the cost down and the productivity up. So I think maybe what's shipping now isn't "replace your developer." It's really a bounded, governed, multi-model agent that quietly does 30% of your work while humans handle

[00:14:40]maybe the judgment calls. It's a smaller story to pitch, but a much bigger market to actually capture.

As an infrastructure nerd, I got very interested in this story. DeepMind released a paper about a method that they're calling Decoupled DiLoCo, and DiLoCo stands for Distributed Low-Communication. It's basically a paper that's pushing along their ongoing research about whether or not it's possible to do training runs for large models in a distributed way across multiple data centers. And they have a method that they say works pretty well and is actually an advancement over the state of the art. I guess, Kaoutar, looking at this, I see this and I'm really excited, because it's like, oh wow, we can suddenly build data centers in lots of different places, and training doesn't have to be centralized in these enormous facilities. I guess the question for you, Kaoutar, is whether a paper like this and the research around it is almost coming a little too late in some sense, which is that we are building these giant data centers that assume that you have to do training runs in one place, and that's really hard to move now. So maybe the question for you, as someone who's deep in the hardware aspects of all this, is: have we already made our bed as far as training goes? Are we, for all practical intents and purposes, going to really be doing training runs in really large data centers?

[00:16:08]Or do you think these methods over time are going to eventually change the architecture of how we do this kind of training work?

Yeah, that's a very interesting question. This paper got me thinking a lot about this gigawatt-scale, single-site cluster assumption that drove almost every frontier training plan from 2023 to 2025, which is now being challenged by its biggest practitioners. Google DeepMind, with this paper, is really challenging this. And it's not just about the cost. It's also about the power, because gigawatt sites need their own substations, and the grid, for example in Northern Virginia, is already maxed. So I think DiLoCo is not just an algorithm. I think it's really a hedge against power, permitting, and supply chain bottlenecks. And I think there is an asymmetry here. It seems that training will get to federation, while inference will keep concentrating, because inference wants co-location of the KV caches, which is really important, low latency, tight syncing, etc. So I feel that the data center is not going away, but it's bifurcating. We will see these two patterns, with different topologies and different hardware optimizations: more distributed training, but co-located inference. And that will drive rethinking of these gigawatt-scale single-site clusters.

Another thing that was interesting in this paper is goodput, which is the metric that really, finally, matures the field, because what's really important is which throughput or which metric really leads in the right direction, that gives us the good work. For many years, we really benchmarked peak FLOPS, and sometimes the hardware utilization, rather than goodput under realistic failures. The paper shows 88% versus 27% with the classical data centers. What this is showing is that training cost in production is really determined by the waste that we have. We have so much waste because of the large number of failures that we're having. At the same time, I think this also mirrors the shift that distributed systems went through 20 years ago, from throughput at idle to tail latency under chaos, when we have all of these failures. So it seems to me that this is the next chapter after FSDP, fully sharded data parallel, and even 4D parallelism, where we do tensor parallelism and all of these four forms of parallelism that are now used to scale these frontier models. So, really interesting directions.

[00:19:05]That's challenging the status quo of today's distributed training.

That's right. Well, so two angles I want to take on that. Marina, maybe I'll kick the next question over to you. Kaoutar brought up the energy implications of these really big data centers, and I would say it's not controversial to say that the energy draw has been a controversial aspect of these data centers. So I wonder whether technologies like this almost become more possible because of the public sentiment and maybe the policy around these technologies, where you are going to need to build data centers that have lower power draw, because in some ways the grid can't keep up. And if the grid can't keep up, there are big implications for everybody else that needs that power. I'm curious how you think about the interface between the science here and the politics and policy discussion around AI.

I think it's a great point, because it allows you to think of data centers that might say, oh, we can train more or less, depending on what else is going on with the rest of the grid. If the rest of the grid is under a lot of strain because, I don't know, it's real hot here in California and all the ACs are running, then maybe some of the training that is being done in one place can be moved somewhere else. Another thing that they mentioned, which I thought was interesting, is the point that this maybe allows you to train at different speeds and on different hardware in different places, and yet things still kind of work, which also means that you don't have to say that everything has to be upgraded all at once. Everything doesn't have to be bought all at once in order to still get what you want. That flexibility is potentially nice, because it allows us some leeway to maybe actually put some policies into place of, okay, you're going to have some constraints on when you can train, how much you can train, how many resources you can draw, and then everybody can still be happy enough, if not actually very happy. The companies can say, all right, fine, we can still manage this, because we'll go ahead and distribute depending on what's going on in the rest of the grid, and then people will feel like they have some kind of control. And it's not an all-or-nothing of, well, either you let our data center do everything we want or we leave. So I think that's nice. Although I have to say, papers like this are probably not the easiest to explain to people. I mean, Kaoutar did a great job. Kaoutar did a wonderful job, and this is her area, not mine. So I think you bring up a really valid point about seeing how it might be possible to actually translate these fairly technical aspects into exactly that: so what effect does this have on policy? I think it'd be great if there was more communication of that kind. Again, science communication is hard. This could be really valuable to people, though. So hopefully there will actually be some more writing on this topic. Maybe by Kaoutar. Yeah, more Kaoutar.

Yeah, definitely. Yes. That's a good idea, to explain this. One thing, if I might add something here, following what Marina said, is the economics of this. Right now, because of the way we do this distributed training, you're really relying on these data centers with tightly coupled GPUs and clusters. Right? So this changes the game, and it's not just about having the same versions of these GPUs. You can also mix and match, like Marina mentioned, older GPUs with newer GPUs. For example, in this paper they showed TPUs from two generations, an older, slower generation alongside a newer one. And I think that's very interesting, because this also lets you tap and use capacity wherever it sits. There's a lot of stranded compute in the world right now.

[00:22:41]Partial clusters, off-peak time, geographically isolated facilities, older accelerator generations that no one wants to dedicate frontier runs to. So DiLoCo also turns those into useful capacity. And combined with the power constraints, this points to a future where training a frontier model looks more like running a globally distributed, federated batch job than just booking a single site for 90 days. That also hugely changes the economics of doing this, which opens up new frontiers for many to do these distributed runs more sustainably. And for companies, your CapEx doesn't always have to go to getting the latest, greatest GPUs or accelerators. You can also reuse older generations. You can mix and match and still get to good training runs.

That's great.
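To make the goodput figures quoted earlier in this segment (88% versus 27%) concrete, here is a minimal arithmetic sketch. It assumes goodput means the fraction of wall-clock accelerator time that turns into useful training progress, which is a common reading but not something the discussion spells out. The hourly price and the number of useful hours are hypothetical placeholders; only the two goodput percentages come from the conversation.

```python
def wall_clock_hours(useful_hours: float, goodput: float) -> float:
    """Wall-clock hours needed to obtain `useful_hours` of real progress."""
    if not 0.0 < goodput <= 1.0:
        raise ValueError("goodput must be in (0, 1]")
    return useful_hours / goodput


def cost_per_useful_hour(price_per_hour: float, goodput: float) -> float:
    """Effective price of one hour of useful work at a given goodput."""
    if not 0.0 < goodput <= 1.0:
        raise ValueError("goodput must be in (0, 1]")
    return price_per_hour / goodput


GOODPUT_DISTRIBUTED = 0.88  # figure quoted in the discussion
GOODPUT_CLASSICAL = 0.27    # figure quoted in the discussion
PRICE_PER_HOUR = 2.00       # hypothetical accelerator-hour price
USEFUL_HOURS = 1_000.0      # hypothetical amount of useful work

if __name__ == "__main__":
    for label, g in (("distributed", GOODPUT_DISTRIBUTED),
                     ("classical", GOODPUT_CLASSICAL)):
        hours = wall_clock_hours(USEFUL_HOURS, g)
        price = cost_per_useful_hour(PRICE_PER_HOUR, g)
        print(f"{label:>11}: {hours:8.1f} wall-clock h, ${price:.2f} per useful h")
    ratio = GOODPUT_DISTRIBUTED / GOODPUT_CLASSICAL
    print(f"Useful work per paid hour differs by a factor of about {ratio:.2f}")
```

Under these assumptions, the gap in waste matters more than the gap in peak hardware speed: the same paid hour yields roughly three times as much useful work at the higher goodput.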

[00:23:35]Gabe, do you want to talk a little bit about the impacts from an open perspective? Because I'm really dating myself now, but, like, SETI@home...

I was gonna make this exact analogy.

I will let you explain SETI@home, but it's kind of like, how far does this go?

[00:23:49]Because if you really pull it out the whole way, it gets pretty interesting, I think.

So no, it's funny that you came to that. I was young enough that I had no idea what was actually happening. It was just a screensaver that you saw on the public library computer, and you could install it on yours, and then maybe your school installed it, and it was doing some kind of background task while the computer was sitting there idle, to look for extraterrestrial life in a massively sharded way. But no, that's exactly where my mind went. And I will fully admit that I am not a training expert, and this paper was way over my head in its technical implementations. But the closest I got to an

[00:24:30]interpretable read of it is that at various critical points in the distribution of the actual computations, they introduced these tolerance windows. I would need to understand the math a whole lot better than I do to figure out where the limits of those tolerance windows could extend. My guess at this point is that they could not extend to every single one of us running a tiny little bit of the compute on our laptops while we're not using them at night.

[00:24:56]Although I guess now that our agents are coding for us at night, maybe that's not a thing anymore. But I think this idea of tolerance, fault tolerance in terms of failure, fault tolerance in terms of latency, fault tolerance in terms of just the speed of computation that any given member of this ecosystem can produce, it'd be really interesting to see how far that goes. I could imagine something like this leading to the first truly public, open source, shared model, a community-owned model where everybody donates a certain amount of their local compute. But again, I'm not sure that the actual algorithm being proposed in this paper would support that, given the level of distribution and sharding that would require. The flip side of all that is that I could see this being the next order of magnitude on frontier models, right? Because now you don't need to fit your frontier model training into one data center. You can fit it into ten data centers, and now you've got a 10x bigger model that you could effectively train at the scale that you maybe have access to already, or that you could rent. What that does to the inference size of that model, I'd be really curious. Again, not a training expert here, but I could imagine this leading to something like an extra-large base model that never actually gets released, and then gets distilled down into actual usable inferencing models that encode more intelligence. But we'll see. That's my other read on why this is coming out of Google: I imagine that as an organization trying to be on that frontier, they are looking for what technique enables them to do that next order-of-magnitude leap. Whether it's order of magnitude in intelligence, order of magnitude in size, or order of magnitude in data, it's something that goes even bigger and requires even more distribution, where you just fundamentally hit those limits of power, as we talked about, physical space in a data center, availability of the latest GPUs, all of those constraints that have needed to line up in order to train one of these frontier models up to today. If you can break those down and say, now we can actually put this together in a federated way, then all of a sudden some of those theoretical limits go away and you can actually make progress towards a better model.

[00:27:26]In the world of open models. Deep seek V4 has come out. This is the latest model of From the Deep Seek group that has kind of like continued to kind of push the frontiers on how far open can go. Um, I guess, Gabe, maybe I'll start with you for a second. You know, my first thing, just kind of looking over the release stocks is, uh, open models are getting really, really, really big. Like, effectively the pro model. Um, let me just bring up the number is, uh, 1.6 trillion parameter model with 49 active params. Um, and so like that's crazy and obviously exceeds the ability for anyone who's got reasonable infrastructure to run. Um, and so I don't know, it's kind of I guess the question I have for you is like, who's using these giant open models? Um, and, uh, and, you know, where do we think the application is for them? My top line read of this whole release, aside from the story of Deep Squeeze back is that this is yet another strong indicator that cost matters. Um, and, you know, just like we talked about in the Granite and Bob's segments where you're targeting offloading low hanging tasks to smaller models. Um, you also want to drive down the cost of that top line agent model as well. That's managing the logic of what your agent tasks are doing. Um, and, you know, I think between the cash implementation, sorry, the attention implementation with their, uh, compressed, uh, sparse attention, uh, and uh, the ability to run this thing probably on less than the latest and greatest chips, um, is going to allow organizations that want that intelligence for themselves to run this thing at a whole lot cheaper than they could buy it from, uh, a frontier provider. So, uh, again, just yet another thing saying, look, we know that the level of intelligence keeps rising and it's important. 
But for large enterprises that need to govern the cost at, you know, the entire enterprise breadth rather than the individual subscription breadth, um, you need to be able to keep that cost down. So I think that's the the main story here for this model. Um, the other thing that I think is really interesting, and I just love seeing with these deep secret leases, is that this this lab just keeps coming up with really cool ideas. Um, and, you know, in the training space, especially for these extremely large models, these experiments are expensive to run. Right. Coming up with a novel attention mechanism is not cheap because you actually have to see if it works, right? I mean, so there's all sorts of heuristics for training the baby version and seeing if it works, but does it scale up? Um, does it actually sustain over the million context length that they're trying to, uh, trying to push? Um, so it's really cool to see that. And I expect to see those same mechanisms, you know, propagating out further into other smaller open weights models. But yeah, back to the size question. Um, like clearly this is being targeted at the organizations that want a cheap frontier cost. But man, I really wish the things labeled Flash and Nano were things that I could fit on my 128 gigabyte. I feel like the words don't mean anything. Come on, you know, I may I may be lucky enough to receive a second GB ten soon. And I'm starting to think about like, what could I fit in a cluster of two of these things? Because I really want my local AI work workhorse to be able to capture this intelligence. 
But, um, you know, until things start coming down to a reasonable size that you can fit on local hardware, I think the distribution of these things is still going to stay centered in a data center, in a cloud somewhere, whether it's an on-prem cloud or whether it's a, um, you know, AI-specific cloud or whether it's, you know, a general cloud provider. Uh, you're going to need networking and hardware that are hard to commoditize easily. So I'm really looking forward to the next rev that pushes these one size rung lower, that starts to fit on commodity hardware, and then everyone can just run them locally. Marina, as a researcher, you know, Gabe talked a little bit about, like, well, it's not just about the big size.

[00:31:26]There's also a bunch of really interesting ideas that DeepSeek is experimenting with here. Anything that stood out to you on the launch? A couple of things. So one, and Gabe mentioned it, was, uh, the attention mechanism. So this whole idea of how it decides to make the model, you know, figure out what to pay attention to. I think it compresses some information, focuses on some other information. This seems to me like an immediate, okay, you're trying to be a coding agent, enterprise assistant. Because how do we use our coding agents? We point them at an entire repository and go, read this, do something with it. That's a lot of context. So we're not talking about books anymore. We really are talking about a real reason why you should be reading a whole bunch of context and saying, yeah, most of it doesn't matter, and you better figure out what does. And I'm not going to tell you because I don't want that to be my problem anymore. So that actually seemed like, um, a really interesting thing that they were doing. I know that they've been doing some work with memory. Of course, a lot of people are trying to do work with memory, but I think the way that we represent context, frankly, is one of the next real frontiers that we should be handling properly. I mean, you can train models over and over and over again, but it still really matters how you give them context and then what they're supposed to do about it. Um, the way that we have generally up to now created context as humans has not been optimized for this kind of thing. And so all sorts of different tricks have been coming up and up and up. And this is really one of, I think, the more interesting lines of research that are going on right now.
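
Editor's sketch: the idea described above, reading a large context but attending only to the parts that matter, can be illustrated with a generic top-k sparse attention step. This is not DeepSeek's mechanism or code; the function name, the plain dot-product scoring, and the toy vectors are all assumptions chosen for illustration.

```python
import math

def topk_sparse_attention(query, keys, values, k):
    """Attend only to the k keys that score highest against the query.

    Hypothetical helper for illustration; not any library's API.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product score of the query against every key.
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    # Keep the indices of the k best-scoring tokens and ignore the rest.
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the kept scores only (subtract the max for stability).
    top = max(scores[i] for i in keep)
    weights = {i: math.exp(scores[i] - top) for i in keep}
    total = sum(weights.values())
    dim = len(values[0])
    output = [
        sum(weights[i] / total * values[i][d] for i in keep)
        for d in range(dim)
    ]
    return output, sorted(keep)

# Toy context of four tokens; only tokens 0 and 2 point the same way as the query.
query = [1.0, 0.0]
keys = [[10.0, 0.0], [0.0, 10.0], [9.0, 0.0], [-5.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
output, kept = topk_sparse_attention(query, keys, values, k=2)
print(kept, output)
```

With k much smaller than the context length, the weighted sum touches only k value vectors per query, which is where the cost saving in this kind of scheme would come from. A production system would also need a cheap way to pick the k tokens without scoring every key at full precision, and this sketch does not attempt that.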

[00:32:56]You'll see it under the term memory management. But it can be, you know, thought of as all sorts of things. What's interesting to me is how do you actually have that context be shown to the model with no work from the user, and have it still be able to function and not forget stuff, especially over time? We still see problems when you have a session running for a long time with a model. It will forget stuff. It will rewrite things. It will consolidate in ways that are self-contradictory. So any kind of movements that are made in that direction, I think, are very interesting. I believe it also, uh, was trying to optimize for being used with agent frameworks.

[00:33:35]So again, it's very clear that that's where they're going, where their head is. Um, yeah. It's very clear that that's where they're headed. So that's absolutely fine. Um, but yeah, I found that part interesting, what they're sort of focusing on, because, you know, they had this moment where they had an, oh my God, DeepSeek R1 is roiling everything, but a lot of the other labs caught up pretty quickly. So this release is interesting and it is valuable. It is maybe not enough to send the stock market tumbling, which is maybe good for overall economic stability.

[00:34:08]Um, but it's really interesting what they come up with when their big thing is how do we lower cost? How do we lower cost? How do we lower cost? It's a really interesting lab to watch. Well, maybe final thought, Kaoutar, if you want to give us a parting shot. DeepSeek, are they back?

[00:34:22]Yeah, I think they're back. And they're back, uh, strong, especially, you know, with all of these, uh, hardware-software co-design stories, uh, and tricks that they're bringing to the table. So there are really, uh, very interesting, uh, and innovative algorithmic things that they're bringing here. The, you know, the sparse attention, the, uh, the lightning indexer using FP8, uh, and how to make these custom CUDA kernels pick a small subset of relevant tokens. Uh, you know, the MoE also, which is now becoming kind of the default for these, uh, to make these parameters cheap. But I think another interesting thing is that the inferencing stack right now needs to be rethought, basically, with all of these changes from DeepSeek. The 3% activation is actually a, you know, a differentiation point that closed labs can't really easily match, because activating 49 billion parameters out of 1.6 trillion, that's huge. That's 3%. And closed labs right now can't match that inference economics without rebuilding their serving infrastructure around comparable sparsity. And I think this is structurally why the DeepSeek API pricing keeps kind of drifting downward. And the closed labs can't really follow that without big margin damage. So this is, I think, a very interesting angle in terms of all the economics of inferencing, with all of these hardware-software co-design tricks that they're bringing to the table. And also the 1 million context that Marina mentioned. Even, you know, if you look at the RAG calculus and the RAG pipelines right now, the way it was built initially, it was built with that limitation that we don't have access to big context. So we had to do all this chunking. And then there was the retrieval aspect and so on. So when the 1 million context becomes free at the default tier, every enterprise that built a RAG pipeline in the last two years has to rethink. 
Do we keep retrieving, or do we just stuff the entire document set into the context? And that choice was kind of settled when the context was expensive. But now this is reopening. And I think for those of us building enterprise systems on accelerator hardware, this changes, you know, what we optimize for in the inference stack. So we have to rethink a lot of these assumptions that we had made for some of these AI enterprise stacks.
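
Editor's sketch: two pieces of the reasoning above reduce to simple arithmetic. The first function checks the quoted activation ratio (49 billion active out of 1.6 trillion total parameters). The second frames the "keep retrieving or stuff everything into the context" choice as a token-budget check. The function names, the output reserve, and the document sizes are hypothetical, and a real decision would also weigh per-token cost, latency, and answer quality.

```python
def active_fraction(active_params, total_params):
    """Share of a mixture-of-experts model's parameters used per token."""
    return active_params / total_params

def choose_strategy(doc_token_counts, context_window, reserved_for_output=8_000):
    """Return "stuff" if the whole document set fits in the window, else "retrieve".

    reserved_for_output is an assumed budget kept free for the model's reply.
    """
    needed = sum(doc_token_counts) + reserved_for_output
    return "stuff" if needed <= context_window else "retrieve"

# Figures quoted in the discussion: 49 billion active of 1.6 trillion total.
fraction = active_fraction(49e9, 1.6e12)
print(f"active fraction: {fraction:.2%}")

# Hypothetical document sets measured in tokens, against a 1M-token window.
small_corpus = [200_000, 300_000]
large_corpus = [600_000, 500_000]
print(choose_strategy(small_corpus, context_window=1_000_000))
print(choose_strategy(large_corpus, context_window=1_000_000))
```

The ratio works out to roughly 3%, matching the figure given in the discussion. In the second check, the smaller corpus fits within the window and the larger one does not, so a pipeline built this way could keep both paths and route per request instead of committing to one.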

[00:36:50]That's a great note to end on. Well, we're going to go on over to Jamie. Um, and I love this panel. So Marina, Kaoutar, Gabe, thanks for joining us today. And we'll have you all back on the show very soon.

[00:37:03]Jamie, thanks for joining us. For our listeners that are just tuning in, we've got Jamie Garcia here, who is the director of Strategic Growth and Quantum Partnerships. Jamie, welcome to Mixture of Experts. Thank you for having me. Obviously, we're living in an era where it feels like there's a big quantum announcement every few months or every few weeks, and I think it's really easy to kind of get the eye drawn to, like, the latest technical breakthrough or, you know, the latest scientific question. Um, what I want to do today is just talk a little bit, particularly with where you sit working on quantum issues, about sort of what's going on behind the scenes to make all these discoveries happen. Because, you know, I was catching up with a friend recently who's in this space, and he was like, quantum still, you know, despite a lot of activity happening, a lot of companies getting in the space, still really feels like very much a team effort. And so I want to give our listeners a little bit of a sense of that and kind of how IBM's thinking about that. Yeah. So quantum computing, and making any advance in quantum computing, is always a team effort. It stems from the fact that, basically, quantum computing is something that is evolving and developing, and the hardware is moving at a really accelerated pace at this point. We're also having, you know, software development at the same time. Algorithms and applications development. All of those things require kind of different skill sets, if that makes sense. So you have your engineers, your quantum engineers, and the physicists that are driving the, you know, innovations on the hardware side. And then you have software developers that are having to create tools on the software side.

[00:38:36]Working together with subject matter experts in things like chemistry and biology that are completely different fields. Um, and so it really does take a village because of that. Um, and I think that, you know, even our own organization is a little bit of a microcosm of that diversity in terms of people coming with different skill sets. So, you know, I myself come from a chemistry background and started working, you know, with a quantum team over a decade ago. But, like, you know, that's one example of many, um, of how people have to come together to really make these innovations happen behind the scenes. And it seems to me, just reading some of the news stories recently, IBM's been closing a bunch of partnerships with universities, kind of trying to build a lot of this network. I know there's a partnership with, uh, UIUC and MIT, and I think there's also, like, a third one. So you all have been all over the place recently. Yeah, yeah. No.

[00:39:34]Partnerships are core, central. So, you know, within IBM we have our own, you know, program. But we really have to work with people external, uh, to IBM in order to make things happen. So that involves companies, um, and certainly universities, uh, the university partnerships you mentioned. Yeah.

[00:39:52]UIUC, Georgia, and then MIT, um, very, very exciting, um, I should say, for working together on quantum. And they all have sort of unique things that they're looking at. I should also mention that these are partnerships that have been established over, you know, in some cases, decades. So IBM's partnerships with those universities are not necessarily new. But I think the interest in quantum has grown so much that it's really showing up now as a part of the partnerships that we have. So, for example, with MIT, which I'm really excited about, we're going to be looking at algorithms and applications with them in the quantum space, and the sort of fundamental mathematics that goes into, uh, you know, developing out these applications. And that is, I think, really important. And what it does is it leverages the deep expertise of our partners at the universities to be able to come together and really bring those sides together and learn from each other to advance the science and get to, you know, meaningful and impactful work together.

[00:41:02]Well, where is the network going next? I mean, what are you hunting for now in terms of partners? Like, where is the kind of focus? What's still missing, I guess, in the network that IBM's building? Yeah, yeah. So I think, uh, you know, UIUC is going to be focusing on, uh, something that we call quantum-centric supercomputing. And then, I think we may have just seen an announcement about this half-Möbius molecule that they're able to actually create, and that was, uh, working with people that were working on the quantum side. So I think that all of those things kind of together, they're starting to tackle some of the bigger challenges that we have, where, in very many ways, you can't just do it all alone. You need to activate the field to really look at these things.

[00:41:55]Um, discovering the next, you know, big and impactful quantum algorithm isn't the easiest thing in the world to do. So, like, you know, it's a call to the field and to, you know, our partners to come together to start looking at this, and then to start thinking about how you can have quantum computing that's integrated with HPC and AI, to be able to actually address these problems towards quantum advantage. Right. Which we said, we predicted that would happen this year and we would start seeing examples of that. So I think that's where it really is. It's beyond just IBM. It's about all of us, and activating the ecosystem to work together towards these really big problems that are out there that require that kind of attention. I mean, as you kind of look down the road, right. Um, you know, in the conversation I was having with my friend, he was like, oh, well, quantum is going to break crypto really soon, and it's going to change everything. Um, what are the next milestones in the next, like, 12 to 24 months? Like, what's the next big kind of breakthrough you think is on the way? I know you talked about quantum advantage, and I don't know if you want to talk a little bit about what that is and why that's important, but just for our listeners who might not have their heads in quantum all the time, right, what's the next big story they should be expecting? Yeah. So quantum advantage. Our definition of it is where a workflow that, you know, has a quantum component to it is able to produce a solution to a problem in a way that's cheaper, more accurate, or faster than classical computing alone. And so for different fields and different industries, that means different things. Um, but you can imagine, you know, being able to simulate a large biological molecule with improved accuracy over state-of-the-art classical is a very exciting thing. 
So we think quantum, it's not a one size fits all, right? It's not a big data machine. It has certain tasks that it performs best. And so we really need to couple the QPUs, the quantum processing units, with GPUs and CPUs all together in order to be able to orchestrate these workflows and get the most out of each part of a heterogeneous computing platform. So that is what we're looking at. And then identifying the problems that are meaningful is what we really look to our partners to do, because that is their expertise. If they're coming from a certain industry or if they are, you know, a world expert in chemistry, they're the ones who know where the bottlenecks are. They're the ones who know where the problems that are really meaningful and that really will push the field forward lie. So in the next 12 to 24 months, I just expect that we're going to see more examples of these from different areas, areas like Hamiltonian simulation, optimization, machine learning, partial differential equations. Those are the types of problems that we think quantum will really play a role in and be able to, you know, aid in the computation in a way where you do get an advantage from using, you know, a workflow that runs part of it on a quantum computer. Are you?

[00:45:03]I know you said your background's chemistry. Are you kind of like a chemistry partisan in some ways? Because my background is mostly in AI, and I remember for a long time people were like, ah, AI can't do anything. And then, like, ChatGPT happened and it was like, okay, well, now I really get it. And it feels like, similarly with quantum, when you have these conversations, it's always like, it could apply to all sorts of things and be a really big deal. We're still waiting on the one application where people are like, oh man, this is the thing that's going to change everything. And I don't know, as a chemistry person, if you're like, it's going to happen in chemistry first. Well, like, I think so. Like, you know, quantum computers operate on the language of quantum mechanics. And so, you know, quantum mechanical systems are oftentimes our natural systems that we think of. So some of the best problems that we think are suited for quantum include things like chemistry, materials, physics, you know, so describing our natural world is just a perfect fit. So that is, you know, I think chemistry, just from my background and my experience with it and kind of knowing how to work with computational chemists and how that can help accelerate discovery and help really accelerate the work that we do in the lab and complement it.

[00:46:15]Knowing the promise that quantum brings to it is personally very exciting to me because I'm like, wow, you can totally flip this on its head. Instead of doing the experiment in the lab and then modeling it afterwards, maybe you can model it ahead of time, save a bunch of time and money in the lab, and just go in and then try whatever, you know, the output is of your simulation.

[00:46:37]So quantum, yeah, definitely for chemistry. I think some of the other areas that I mentioned are also very interesting, like finance, optimization, healthcare and life sciences, biology. Um, you know, anything that's like a dynamic system, uh, you know, is interesting for quantum. And so I'm not going to place my bets quite yet because I think that, you know, as I said, the field has been activated. And so I think we're just going to see more and more examples of this. Like in our, uh, Quantum Advantage tracker, for example, on GitHub, we're seeing things ranging across the different areas. So that is really exciting in and of itself. Yeah, I agree, it's super, super exciting stuff. And, um, when we hit advantage this year, we'll definitely have to have you back on the show. Um, but Jamie, thanks for spending a few minutes with us talking through the current state of play here. Yep. Thank you for having me again. And that's all the time that we have for today. And thanks for joining, all you listeners. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify and podcast platforms everywhere. And we'll see you all next week on Mixture of Experts.

[00:47:52]I like.

#IBM #IBM Cloud
