This video presents three distinct eras of AI integration in data visualization tools: (1) Early AI Assist (2023) - a sidebar feature that could generate code but lacked runtime awareness and tool capabilities; (2) Canvases AI - an agent designed as a collaborative assistant that shows its work and enables human-in-the-loop feedback, emphasizing augmentation over automation; and (3) Observable Chat - an experimental prototype that functions as an append-only notebook with runtime inspection capabilities, allowing AI to execute code, access browser APIs, and progressively build complex visualizations. The key lesson is that effective AI integration requires tight feedback loops, transparency about AI operations, and tools that enable users to understand and verify AI-generated work.
Deep Dive
Three eras of Observable AI | Observable Podcast
Hello and welcome to episode three of the Observable podcast, which is ordinally the fourth podcast episode. I'm Toph Tucker, here with Visnu.
>> Hello.
>> We've roped you into the podcast now.
>> Visnu, who are you?
>> I am Visnu. [laughter] I don't know. I've been here at Observable for a long time. Engineer, worked on a bunch of stuff, doing all those things. Yeah.
>> Love it.
>> Code. I talk to AIs and write code.
>> Well, can you just go fetch one of those vases on the shelf behind you for me?
>> I can't right now, nor are those vases really there. This is a realtor photo of the home I do live in, though.
>> But I'm also not at this home either.
This is some kind of justified true belief puzzle. All right, we're here to talk about AI at Observable and a couple of the different eras we've gone through. We had an AI Assist feature in notebooks that we added several years ago, then we worked on AI in our canvases product, and now we're working on new stuff. So we wanted to run you through some of the lessons of those different iterations.
Uh, and hopefully we're learning from our past and just talking about AI like everybody else.
>> Like everybody else. Visnu, what's the first time you remember Observable talking about AI?
>> Huh. I mean, it definitely was the AI Assist stuff, and whether or not we should have something too. The first era of AI was everybody using ChatGPT and then realizing, hey, we can add some of these things to our products, and everybody just tacking on chat to their product somehow.
>> Right. You can just sort of use ChatGPT, but it's in the sidebar of Observable instead.
>> Yeah. Yeah.
So that was, I think, early to mid 2023 that we were building AI Assist. That was Wiltse and Sylvestre.
>> Okay. I don't know, that sounds plausible. It's a sidebar that still exists in the notebook today. It's the little sparkles, because sparkles mean AI. I didn't work on it at all, but we had this whole RAG system, a vector database full of our documentation and a lot of Plot examples, and we had to have a fair amount of infrastructure to run it.
>> Kind of. There was the separate database for it.
>> And I think we used Pinecone for the vector database.
>> No idea.
>> And I don't think it works in the product anymore.
>> Who knows?
>> Sadly, maybe. It could generate fenced code blocks, which we exposed as a way for it to edit your cell. We could inject some notebook context into it, but it was just context about the static...
>> Editor contents, the code of your cells. We couldn't tell it anything about the runtime values of variables in your notebook.
It didn't have tools yet.
>> It couldn't really run anything on the side. It felt natural, it made sense: you have an editor, you're editing code, it can help you write code. And that's what it kind of did. But if you ran into problems, you would have to copy and paste those errors to it, maybe.
>> I don't even remember. Yeah, because that would be a runtime value. I don't think we really piped errors back into it. I was looking back at some of the product documents about it, and I think we really framed it as a SQL assistance tool. We had some goal that 75% of SQL cells would be written with the help of AI Assist, and I don't think we came close to that at the time.
But it's interesting: because we have a JavaScript background, SQL was the hurdle for a lot of our users. For a different product, you might have SQL users who wanted to make an interactive visualization, and they would maybe lean more heavily on JavaScript assistance first. Everyone wants help with the languages they're less familiar with.
Did you ever use it, Visnu?
>> Barely.
Did I? Maybe. I don't remember anymore, it was so long ago. I think the models just weren't as good as they are today, and it really struggled with the idiosyncrasies of Observable JavaScript. We have these little quirks: you don't use const or let for a top-level declaration, and you have these reactive runtime behaviors.
>> Yeah. And I think Plot had mostly, quote unquote, just come out, in that the model hadn't crawled or added the Plot documentation to its knowledge.
>> It knew D3 decently well, in a sense, but it didn't really know Plot. That's why we had to give it more documentation and RAG-type context in order for it to perform at all. But now, of course, you can ask current LLMs anything and they can use Plot without any help, and they do so quite well.
>> Yeah. When did Plot come out? 2021 or something?
>> I have no idea.
>> [laughter] It's going to be a recurring thing: you're going to ask me when something happened, and I keep saying I have no idea.
>> When did you have breakfast?
>> I still have that in front of me, so that's doable.
>> So the next big AI thing we worked on was with Observable Canvases.
This was last summer, right?
>> Sounds right. I can put things in order, but I can't tell you how far away they were.
>> That's reasonable. With canvases we made another big push with AI, which Visnu and I worked on together, and that was at least my first experience with the modern tool-use era.
>> Yeah.
>> And Visnu, what were our main goals with that?
>> I guess this was also the time of everybody feeling awkward about AI slop, AI creating junk, so we were hesitant. Our view of AI is that of course it can help you accelerate your work, especially in doing tedious things or learning something. Canvases is a product for exploring huge data warehouses, and not everybody knows the data model. Not everybody remembers every single table and column, how they're built, and what the semantics are around rows and values and all of that. AI can easily figure that out with you and be a data engineer lite, in a sense, so that you don't have to go and bug a data engineer friend or somebody who helped build the data warehouse. It can explore and figure out those things with you and unblock you.
>> A lot of people described the point of the AI as being like having Visnu on the call with you.
>> That's what they wanted. Everyone just wants Visnu on the call.
>> Because you know our data warehouse quite well, and you can find the needle in the haystack, the column that they might...
>> Not even me. I want my past self on the call too, because I don't remember the assumptions I made in building a table, or exactly how something came through. Those are codified in the model definition, the SQL, but I have to go look that up, and to look it up I have to remember where it is. AI is great at search, and it's great at looking through files and saying, "Ah, I found it. It's over here." And then I'll look at it and be like, "Oh yeah, that is over there."
That seems totally right. I guess one top-level philosophy was that we wanted it to be more of a helper and not a replacement, not something where you give a very high-level "give me DAU" and it goes and produces a perfect DAU analysis for you. It should be more like: "Do we have DAU?" "Yeah, it seems like over in this table you have DAU, daily active users." "Okay, what does it look like?" "I think it's probably something like this." And you look at it: "That seems a little off. Seems like the numbers are low." "Oh, yeah. Okay, hold on. Let's do this and do that. Oh, that's maybe not the right place. It's this table over here." So it's a more authentic workflow, I guess, or a more learned workflow, with a tight feedback loop with an actual human so that the chances of being wrong go down. We brought this up a lot in the whole Hail Mary versus passing plays thing.
>> Run plays. Hail Mary versus...
>> Oh yeah, versus short passing plays.
>> Short passes.
>> Or running plays. You don't want to give AI the "hey, build me a business" thing and have it go break that down and do it all, although, ambitiously, it's getting closer to doing all those things.
>> The definition of a Hail Mary keeps changing when it comes to AI.
>> A couple of years ago, it probably would have felt like a Hail Mary for it to implement a whole sorting algorithm in one go, and now that's totally trivial. And now...
>> Building the whole app in one go feels like a Hail Mary.
>> Yeah. And entertainingly, it considers some things trivial code-wise that maybe humans do not, and you have to talk it back, maybe. It'll do linear regression: "Oh, I can implement that and do the least-squares matrix multiplication." I guess that's trivial. But then you look at the code and you're like, that code does not look quite trivial, and I don't know if it did that totally correctly. But I guess the same is true for sorting.
Well, in your example, you were just saying that someone looks at the initial AI results and says, "Oh, that DAU looks kind of low."
That's assuming that the user has enough institutional context to be able to gut check the number that the AI returns, right? I think we talked a lot about the different scenarios or personas, and we definitely got the best results when someone was using it who had enough domain expertise to quickly steer it back on track as soon as it went awry. For other people, it would spit out a DAU and they'd be like, "Okay, I guess, if you say so," but they wouldn't have great confidence in it themselves.
>> Yeah. And without that human, especially if DAU is not exactly the end result anyway, like it's DAU tied to educational users, or DAU sliced by some other metric... Again, first you have to get total overall daily active users, and then you want to do that slicing, right? If you give AI that full initial task, it's going to try to do daily active users, it's going to find some number, and if it has no feedback on that, it'll just assume that it was right. It could be off by an order of magnitude, and it'll just assume that you're a small company versus an enterprise or something like that, and keep going. Then if it does the slicing and comes back with smaller numbers, maybe you don't realize they're too small, because you didn't realize the initial one was way off. So, again, the smaller, quicker feedback loop. Even as humans, in notebooks and everything, I think we operate better with a quick and tight feedback loop.
>> Mhm.
>> It's very difficult. I mean, I'm talking for myself, but I'm pretty sure it's true for almost all humans: you do something, you see the result. You do another related thing, you see more results. It's just more satisfying. It's more, you know, dopamine.
It's more engaging. It's more all of that.
>> I mean, Beethoven was composing deaf, right?
>> Maybe.
>> Maybe occasionally in human history people have benefited from long feedback loops.
>> Yeah.
>> Multi-century cathedrals.
>> Yeah. But then, I mean, Beethoven never got to close his loop.
>> I know. Well, also, he didn't start off that way. He only got that way after decades of living with the very tight feedback loop of...
>> Press a key, hear a note at the speed of sound.
>> Yeah.
So yeah, Canvas AI. We wanted it to work with you, to be in the canvas with you, so it's more like an actual other human there working with you. One of our initial tenets was that it doesn't see anything that you can't see, so it doesn't know anything that you don't know.
Or, basically, it shows its work. If it's going to select from a table, it doesn't do that and hide it. It selects from the table, and you see it select from the table. And then there's...
>> Pull up an example.
>> I think you have it ready. I guess I have it ready also. Do you want me to do it? You can do it.
>> All right. Share a screen.
>> I think Observable as a company has a strong bias toward augmentation over automation.
>> Yeah.
>> For better or for worse. And I can imagine another company where, for some people, the goal is that it should headlessly interrogate and produce a trustworthy daily active user number.
>> Yeah.
>> But I think our bias is that it's partly about the journey of how you get there. And maybe your organization doesn't have consensus on how that metric should even be defined, or what counts as active, or which products are included in it.
>> Yeah.
>> And in the process of arriving at that, the value is in the collective process of figuring out what you mean by the metric as much as it is in getting the final number.
>> I think it's also that, well, we are coders, you and I, and it's from the perspective that code isn't as scary as everybody makes it seem, or as... what is it when you look at something and you can't understand it?
>> Unintelligible, obtuse, opaque, esoteric.
>> Code, if you get over the mental hurdle, if you come in with the perspective of "I can understand this," I think it's pretty understandable. And SQL kind of the same way. Data definitely, especially if it's your data.
>> Like business owners.
>> Yeah, when they see their own data, it's of course very immediate and visceral.
>> We're smart where we have skin in the game.
>> Yeah. You have plenty of examples of you and Claire going over hotel data, where all of a sudden Claire is amazingly way more engaged when she sees it's her...
>> Yeah. It's like, "Oh, we had the electrical outage that day, that's why that's an outlier," or, "This propane tank served this food menu that was only running for this month, so that's why that spikes during that period." Right.
>> Ben Shneiderman talks about how no one was interested in treemaps until they saw their own hard drive's disk usage visualized in a treemap, and then they actually learned something about themselves and therefore...
>> Nice.
>> ...appreciated the treemap.
>> You have to put my screen on stage I think.
>> Am I the dictator of >> Yeah.
>> what goes on stage?
>> Yeah.
>> Here's a canvas.
>> This is a canvas.
>> [laughter] Good job.
>> Sounds good. So, you know, 2D canvas, pretty cool.
This entire thing was just generated by AI. The prompt was "popular holiday purchases." It's connected to an e-commerce play data warehouse. As it builds, its text messages are just these text nodes that you can add, and it annotates by adding those text messages. And then this is a table node. This is the purchases table, which gives it the full-fledged thing.
So then, entertainingly, we have to summarize this back to it as the tool call result. This is its chat message, which we translate into this little chat bubble, in a sense, or this text node. Then it does a tool call to get purchases. We drop this on the screen and tell it, hey, this is basically the same thing that the user sees. We give it a textual version of all of these column summaries and a few of the sample rows, similar to what the user sees, and then it keeps going. I want to talk about the hallucination that it can easily have here. In the past we've messed up that context: the return value from that tool call was broken and wasn't really returning anything, but it would say "success." And then the AI would just keep going, of course. It would ask for purchases, see "success" on creating purchases, think it knew what the purchases were, and then keep going.
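One way to guard against the silent "success" failure described here is to make the tool result say explicitly when it has nothing to show. The following is only a sketch: `describeToolResult`, the `columns` and `sampleRows` fields, and the tool name `get_purchases` are hypothetical and are not taken from Observable's code.

```javascript
// Hypothetical helper: turn a raw tool result into the message the model sees.
// Field names (columns, sampleRows) are assumptions made for illustration.
function describeToolResult(toolName, result) {
  const columns = result?.columns ?? [];
  const sampleRows = result?.sampleRows ?? [];

  if (columns.length === 0) {
    // Say so loudly instead of reporting a bare "success",
    // so the model has no reason to guess at the schema.
    return {
      ok: false,
      message: `${toolName} returned no column information; do not guess the schema.`,
    };
  }

  const columnLines = columns.map((c) => `- ${c.name} (${c.type})`);
  return {
    ok: true,
    message: [
      `${toolName} succeeded with ${columns.length} column(s):`,
      ...columnLines,
      `Sample rows: ${JSON.stringify(sampleRows.slice(0, 3))}`,
    ].join("\n"),
  };
}

// Example: a broken (empty) result versus a populated one.
console.log(describeToolResult("get_purchases", {}).message);
console.log(
  describeToolResult("get_purchases", {
    columns: [{ name: "order_date", type: "date" }],
    sampleRows: [{ order_date: "2024-12-01" }],
  }).message
);
```

The design choice is simply that an empty payload becomes a visible failure rather than a success the model can build on.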
And sometimes, of course, it was totally fine, right? Which is a great assumption it can make. It can be like, "Okay, I'm going to do the select star from purchases," and it's going to guess there's an order date or something on there, and that happens to work and you don't realize it's broken. The ability for it to hallucinate and/or understand totally incomplete information is both mind-blowing and scary at the same time. But so then it keeps going. Here, hopefully, it did work and it can actually see it, so it knows that order date is in there just from getting purchases. It can grab order date. It knows the type, so it can also just do the month. I guess it's looking at the holiday months. Let's see, categories. Cool. And then it summarizes at the end. So yeah, that's neat. What did Toph...? Oh, Toph made a chart.
>> I made a chart.
>> Should we show the debug view for the summaries?
>> Oh, yeah. Yeah, sure. Sure. Like what it's seeing.
>> Yeah.
>> Yeah.
Um, do I see it? If you see it.
>> No, you got to turn it on yourself.
>> Oh, you got to remind me.
>> In the upper left. Upper left. Hamburger menu. Developer tools. debug mode.
>> Oh, yeah. All right.
>> So, we have the AI summary there.
>> Yeah.
>> So rather than sending rasterized versions of the canvas to a multimodal model, we generated these text summaries. We have textual versions of the contents of the cell. We have a relative description of where it is in the viewport. We include the column summaries and things like whether or not it was nullable, and the bin sizes. That's the context that gets fed in: that's the result of the tool call to make that node on the canvas, which worked pretty well.
I started off thinking we would absolutely need multimodal, and I still sort of wish we had done multimodal, but we got much farther with this than I would have guessed.
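A text summary of a table node along these lines might be built roughly as follows. This is a sketch under assumptions: the node shape (`title`, `position`, `rows`) and the exact wording of the summary are invented for illustration, and bin sizes are left out to keep it short.

```javascript
// Summarize one column: inferred type, nullability, distinct count, numeric range.
function summarizeColumn(rows, name) {
  const values = rows.map((row) => row[name]);
  const present = values.filter((v) => v !== null && v !== undefined);
  const summary = {
    name,
    type: present.length > 0 ? typeof present[0] : "unknown",
    nullable: present.length < values.length,
    distinct: new Set(present).size,
  };
  if (summary.type === "number") {
    summary.min = present.reduce((a, b) => Math.min(a, b));
    summary.max = present.reduce((a, b) => Math.max(a, b));
  }
  return summary;
}

// Hypothetical node shape: { title, position, rows }, where position is a
// relative description such as "top left". Columns come from the first row.
function summarizeTableNode(node) {
  const names = Object.keys(node.rows[0] ?? {});
  const header = `Table "${node.title}" at ${node.position} of the viewport, ${node.rows.length} rows`;
  const columns = names.map((name) => {
    const s = summarizeColumn(node.rows, name);
    const range = s.type === "number" ? `, range ${s.min}..${s.max}` : "";
    return `- ${s.name}: ${s.type}${s.nullable ? ", nullable" : ""}, ${s.distinct} distinct${range}`;
  });
  const samples = node.rows.slice(0, 2).map((row) => JSON.stringify(row));
  return [header, ...columns, "Sample rows:", ...samples].join("\n");
}

console.log(
  summarizeTableNode({
    title: "purchases",
    position: "top left",
    rows: [
      { category: "toys", amount: 10 },
      { category: null, amount: 25 },
    ],
  })
);
```

The point of the sketch is that the model receives the same facts a person would read off the node, only as text instead of pixels.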
>> It is interesting. Yeah, we definitely saw a lot of evidence of... there was that paper about mirages, the illusion of visual understanding: the phenomenon where an AI model acts as if it has access to images even when you disable the multimodal capabilities. We saw a lot of that when we just said "success" and then it... [laughter]
>> And sometimes it was right.
>> Yeah. And you have to get tricky about testing it. We would do things like add a text node that says, "The password is secret one two three." Oh, so this is the other thing: the context is exactly what's on the screen.
>> So this is here, and I can ask it, "What's the password?" Sometimes it would just make something up, and then you'd be like, oh, definitely... and now it's refusing. So now I have no idea, probably because I said "password."
>> I think it doesn't want to leak anything called a password. But if you say something like "My favorite color is fuchsia"...
>> In a text node.
>> Is it really fuchsia? Cuz that's that's garish.
>> And then I should probably delete this refusal here.
>> This is a good taste of the ways in which...
>> Oh, something's broken.
>> [laughter] Something's probably broken.
>> We'll just move on now. [laughter]
>> Victor, come back.
>> No, I mean, that's a good example. We definitely know something's broken, because normally it would just tell me, "Oh, it says that your favorite color is fuchsia." But something's weird. Don't know what.
So, some of the challenges with this, if you zoom out: we really didn't want to do a sidebar chat. We initially tried it and then moved away from it, because we wanted the human and the AI to collaborate in the creation of the same kind of artifact. The outputs of the AI's work should use the same tools in the same space, so you can then apply some filters, or modify one of these nodes, or make a new chart off of it, and that will be context the same as the tool result earlier was. The tools aren't specially privileged. They have no backdoor ability to make queries. Any query they run, anything they say about the data, should be visually inspectable in some node for you.
But that comes with downsides. It produces all this junk, and over the course of a long session working with the agent, it really did start to seem like junk.
I think there's this issue where, if you are working with a coding agent, you might have a long conversation with a lot of back and forth and modifications that all builds up to a five-line diff or something like that. The diff is much smaller than the conversation that got to it. The conversation might be many thousands of tokens and take too long to read, but the final output is reviewable as something very compact. We just hadn't built up those same kinds of primitives in the canvas.
In particular, the AI only ever adds to the canvas. It never removes.
I think that's largely because we hadn't built the primitives that would allow us to make an AI that changed your canvas while giving you confidence that you understood what was happening. We do have version history, but we didn't have a concept of visual diffs. If it moved nodes around, we have no way of representing that as a diff. We have a lot of these no-code or direct-manipulation chart brush interfaces that don't have a way of representing a diff. We really don't have diffs for the canvas outside of little code bits. That means we just don't have a foundation upon which to build these uncertain, non-deterministic mutations, because you very quickly lose trust in what's happening. You get lost. You're scared that it's doing something unpredictable. You're scared that it's messing with your work. And so the result is that you have this append-only mess.
>> Yeah.
>> And another challenge with that is that lines of code can be grouped. There are chunks of them. You're not just going line by line by line. You can say, here's a block of related changes. You can imagine encapsulating logic and changes on an infinite canvas the same way, as changes within this area, or changes to nodes that are upstream or downstream of this one. But we didn't have any of that.
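To make the missing primitive concrete, here is a purely hypothetical sketch of the simplest possible canvas diff. The conversation says no such thing existed, so the node shape (`id`, `x`, `y`, `content`) and the `diffCanvas` function are inventions for illustration only.

```javascript
// Compare two snapshots of a canvas, each an array of nodes keyed by id.
// Reports which nodes were added, removed, moved, or had their content edited.
function diffCanvas(before, after) {
  const prev = new Map(before.map((node) => [node.id, node]));
  const next = new Map(after.map((node) => [node.id, node]));
  const diff = { added: [], removed: [], moved: [], edited: [] };

  for (const [id, node] of next) {
    const old = prev.get(id);
    if (!old) {
      diff.added.push(id);
      continue;
    }
    if (old.x !== node.x || old.y !== node.y) diff.moved.push(id);
    if (old.content !== node.content) diff.edited.push(id);
  }
  for (const id of prev.keys()) {
    if (!next.has(id)) diff.removed.push(id);
  }
  return diff;
}

const before = [
  { id: "a", x: 0, y: 0, content: "select * from purchases" },
  { id: "b", x: 10, y: 0, content: "bar chart" },
];
const after = [
  { id: "a", x: 5, y: 0, content: "select * from purchases" },
  { id: "c", x: 0, y: 10, content: "note" },
];
console.log(diffCanvas(before, after));
```

Grouping such changes by region or by upstream and downstream dependency, as described above, would need more structure than this flat list.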
Yeah, should we talk about the other stuff we've been working on?
>> Yeah. So one of the big challenges: we initially started building canvases really focused on no-code interfaces, visual interfaces for building up queries, sculpting your data through brushing and pointing at charts and that sort of thing. And as soon as we got the AI involved, as Visnu found, it really wanted to write code.
>> Oh, yeah. It would be on the canvas, you would tell it about canvas stuff, and you would give it tools to do things, but it wouldn't use any of them. It would just tell you, "Hey, you should run this SQL," or, "If you could write some SQL, you should write it like this."
>> And we're like, "Oh, we've given you all these nice charts and interactive brushes and whatever," and it's like, "Just run this SQL."
>> Yeah. It really wanted to write SQL. It really wanted to write code also. Instead of using a chart, it would...
>> We're like, "Oh, we have this curated selection of high-quality first-party charts," and it's just spamming us with D3 and Plot and whatever.
>> Yeah. Or the same idea with transforms on the canvas. Instead of writing SQL to do a transform, you can just group by in the table. It wouldn't want to do that. It would just want to write a GROUP BY or COUNT or whatever else, and pre-aggregate everything for itself.
>> And you'd have to constantly keep fighting it. So the first version of AI on the canvas was just, okay, go ahead and write the SQL, because you know you want to. Then we learned from that and worked with it for a bit until we could coax it into using the tools that we gave it.
>> Yeah. I do think Observable for a long time has been positioned in such a way as to encourage more end-user programming. I think notebooks encouraged people to write code for problems they would not once have written code for. Maybe that means sifting through your calendar or planning an agenda. I've planned a lot of road trips in notebooks before. It was a very end-user-programming-forward kind of environment. But we therefore ran into a lot of hurdles in large organizations where people were trying to talk about their data through code. There was a difficulty cliff as soon as you got to certain people in the organization who had a real vested stake in the data. They understood the data very well, but they didn't know JavaScript or they didn't know SQL. So for years we were like, oh, we have to make GUIs that make it easier to work with data, and that led us to canvases. And then, in a way, adding AI to the canvases maybe contributed to bringing us back home to building tools that encourage end-user programming.
>> Yeah. Which, you know, everybody now vibe codes their own apps. I have several.
>> The world has changed.
>> Yeah.
>> So, how would you characterize it? Around the turn of the year we started rethinking some of this stuff.
>> Yeah. We talked about AI Assist, but we were asking: what can we do that's more novel, more AI-first in a way, more than just tacking AI onto notebooks? What can it actually get access to? How integrated can you make it? At the same time, we're also using ChatGPT and Claude for stuff.
And I was also vibe coding random personal apps with Claude Code and other things.
>> What are a few of them?
>> Your nutrition one...
>> My nutrition one, my pool one. I really do like my pool one. These are iOS apps. I wrote some Swift, but I'm not great at Swift, and Claude is great at Swift, it seems, especially if you tell it to use idiomatic Swift. It pulls out all those arcane Swift things that look really cool. I don't even remember any of the syntax, but you know that you're using some nice Swift language features when it does that, versus the stuff that I recognize, which is just for loops or whatever. But yeah, I have a really nice pool app now to turn on the filter.
>> To contextualize this last shift in the conversation to where we are today, maybe it would be helpful to place yourself on the Gas Town hierarchy of evolution to AI.
>> I'm at six.
>> So maybe... what was seven again?
>> Let's see. Stage one is zero or near-zero AI. Then coding agent in IDE, agent in IDE YOLO mode, wide agent, single agent. Stage six is multi-agent YOLO: you regularly use three to five parallel instances.
>> You are very fast.
>> Yeah. Like I I'm I was also doing like even now when I do um my nutrition tracking app, I need like new nutrition data sometimes. Um, so like I don't know, I'll go eat at like Chili's the other day and I know Chili's publishes their nutrition data and like I have my own nutrition database now that I also has large large data sources, but I need one for Chili's real quick. So I'll like go into cloud code on the cloud and I'll be like, "Hey, go get this Chili's data and like add it." Um, and like it goes it like creates a it creates a scraper.
It um and then I have to go and merge the PR and then run the data sync. Uh, and then it shows up in my app so that I can like add my fajitas [laughter] with proper nutrition data.
Um, which actually I have that's a to-do by the way. It finished all that and I haven't I don't think I No, I did add it. I take it back. I did add it the other day. Um, >> so I guess the the lesson here is that, you know, in the time since we last tackled AI in notebooks, the landscape has changed dramatically.
>> Our own practices have changed. And so at the turn of the year, we found ourselves one drawn back to the enduring appeal of notebooks and two trying to figure out how how the manifestation or application or accessibility of that has changed since we since we last really evaluated it.
>> Yeah. Um and then we have our own chat also which is entertaining. Uh do you want you want to share my my screen so I can Yes. So this is this is running locally and this is again like an experiment um and like so okay before before I type say hi um and so you've you've seen hi and you know it does stuff um and you can do those cloud or again chat GPT and they all have code execution. So like if you ask cloud code to like add some numbers or like write kind of some code or even when it answers some things it'll like write code to answer you. Um I think it might be behind pro at least and you have to turn it on. I don't remember. I don't know about chat GPT but like they've been able within the the mass market versions. Um they've been able to write code for a while and of course you can use cloud code and it will definitely write code for you. Um, but this is kind of like what about like those like So my go-to example always is literally just like um flip a coin. Um, and so normally when you ask uh any of them this, they'll probably just answer you and then you're like, "Wait, was that really random or was that like LLM random or what was that?" And then um you ask it and it'll be like, "Oh, it was random."
And you're like, "How random?" And it's like, "Oh, random as in, like, I am an LLM and there are random processes in how I work." And, like, that's not random, dude.
And it'll be like, "Oh, okay." And then maybe it'll write some code.
>> It's not deterministic, but it's also not random.
>> Yeah. Yeah.
>> Well, it may be deterministic depending on temperature, I guess.
>> Maybe.
Yeah.
>> Or roll a die.
>> Yeah. So like roll a d20. And so here it is writing code to do it. So you kind of know more that it's actually right.
>> Yeah. So talk through now, especially in case we ever have audio-only listeners, what's on the screen and what's happened so far.
>> All right. So I asked it to flip a coin, and it wrote code for flipping a coin instead of... so yeah. So, you know, const flip, Math.random() less than a half, heads or tails. And then it got tails that one time, and so it knows that it got tails.
>> How does it know it's got tails?
>> Oh right. So it calls display. Very similarly to canvases in a sense, it has tool calls. Its tool calls are to create a cell. When it creates a cell, we can give it the outputs of the cell, and the outputs of the cell are basically the runtime inspection values of all of the variables in that code cell. If it calls display, it has a special version of that where it has the displayed value. And again, very similarly to the inspector: if you have an array there and you see the nice dev tools type inspector on it, it gets a very similar version of that, where if you have a thousand rows, we smartly don't show you all thousand.
We show you a few of them and have an expand-to-see-more type of thing. It sees basically the same thing. So, one, we don't want to overwhelm humans with a thousand rows and, two, we don't want to overwhelm the context with a thousand rows either.
So those happily coincide.
>> Which you find true pretty eerily often.
>> And so it sees the same things as you. But maybe the more interesting thing here is that it's the runtime value again. This is running, and it's still running. This is a notebook running in my browser, as Observable always does. And so flip here has a value. If it got an error, it would have the error as a runtime value. Or if it's writing more complicated code, it can go back and look at whatever the last values were and deduce things from those.
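To make the idea of a truncated inspection concrete, here is a minimal TypeScript sketch. It is not Observable's inspector or the prototype's actual tool output; the names `previewArray`, `formatPreview`, and `runCell` are hypothetical, and the limit of five items is an assumption. It only illustrates the two points made above: a large value is summarized rather than sent whole, and an error comes back as an ordinary value.

```typescript
// Hypothetical shapes for illustration; not the real tool-call format.
interface Preview<T> {
  length: number; // total number of items in the original array
  shown: T[]; // only the first few items
  omitted: number; // how many items were left out
}

/** Keep the first `limit` items so neither a person nor a model sees all rows. */
function previewArray<T>(values: T[], limit = 5): Preview<T> {
  const shown = values.slice(0, limit);
  return { length: values.length, shown, omitted: values.length - shown.length };
}

/** Render the preview roughly the way a dev-tools style inspector would. */
function formatPreview<T>(p: Preview<T>): string {
  const body = p.shown.map((v) => JSON.stringify(v)).join(", ");
  const more = p.omitted > 0 ? `, ... ${p.omitted} more` : "";
  return `Array(${p.length}) [${body}${more}]`;
}

type CellOutput = { ok: true; value: unknown } | { ok: false; error: string };

/** Run a cell body and report either its value or its error as plain data. */
function runCell(body: () => unknown): CellOutput {
  try {
    return { ok: true, value: body() };
  } catch (e) {
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}
```

Because the same summary serves both readers, one code path can feed the on-screen inspector and the model's context.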
>> Mhm.
>> So again I ask it to roll.
>> And expand the code. You can see it had roll equals ceiling of random times 20, and then it has some conditional text, like if roll equals 20, party emoji, natural 20; if roll equals one, skull emoji, critical fail. Yeah.
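The generated cells are only read aloud here, so the following TypeScript is a reconstruction under assumptions, not the code from the demo. The function names are hypothetical, and the random source is passed in as a parameter so the behavior can be checked deterministically. One deliberate difference: the transcript describes `Math.ceil(Math.random() * 20)`, which can return 0 in the rare case that `Math.random()` returns exactly 0, so this sketch uses floor plus one instead.

```typescript
// A random source returning a number in [0, 1), like Math.random.
type Rng = () => number;

function flipCoin(rng: Rng = Math.random): "heads" | "tails" {
  return rng() < 0.5 ? "heads" : "tails";
}

function rollD20(rng: Rng = Math.random): number {
  // floor(...) + 1 always lands in 1..20, even when rng() is exactly 0.
  return Math.floor(rng() * 20) + 1;
}

/** Conditional text like the one described: special cases for 20 and 1. */
function describeRoll(roll: number): string {
  if (roll === 20) return "🎉 Natural 20!";
  if (roll === 1) return "💀 Critical fail";
  return `Rolled ${roll}`;
}

/** Count how often each face 1..20 appears, for a distribution chart. */
function tallyRolls(rolls: number[]): number[] {
  const counts: number[] = [];
  for (let face = 0; face < 20; face++) counts.push(0);
  for (const roll of rolls) {
    if (roll >= 1 && roll <= 20) counts[roll - 1] = (counts[roll - 1] ?? 0) + 1;
  }
  return counts;
}
```

Running the code, rather than asking the model to name an outcome, is what makes the result a real sample from a pseudorandom generator.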
>> And then, yeah, you can have it roll 100 times, at which point it does the 100 rolls. And now it's gonna, yeah, of course, want to show me a distribution of them, which is kind of cool. And you can tell it's kind of...
>> An overachiever sometimes.
>> Yeah. Yeah, it definitely always tries to overachieve. Um, and you like the distribution looks it's both random and, you know, not nice and noisy at the same time.
And so 2,00 times um and notably this is this is using sort of iterative computation.
>> Yeah.
>> The previous values are still alive in the runtime. And so if instead of flipping a coin or rolling dice, you were dealing with, you know, data or something that's harder to reload...
>> Right.
>> ...it can progressively build up to increasingly complex things in a long-running, living, local client runtime process.
>> Yeah. So now I just asked it what the temperature is where I am. So now it's gone and used the browser API to get my location, which I'll approve. Which is kind of cool, right? So it uses, what is it?
Let's see here. navigator.geolocation, or whatever it is.
>> So it knows I'm in San Francisco, and it uses the Open-Meteo API, which I hadn't known about before, but, you know, it did. And it grabbed the temperature here. It's a little bit chilly: 59.3 degrees Fahrenheit. I don't know what that is in Celsius. Sorry.
>> Ask.
>> You have a computer in front of you.
>> I know. I just...
>> What is it? Nine-fifths, five-ninths, plus...
>> Yeah, that's so stupidly expensive to find out. 15.2 degrees.
>> How much did it raise the temperature to ask it what the temperature is?
>> Somewhere else. The entropy of the universe did go up, though. So now I'm asking for temperature over time. It uses the same API, the Open-Meteo API, to get a 7-day temperature forecast. And now it's plotted highs and lows, probably. Let's see, what is this? It looks cool. Hourly temperature.
Oh, "feels like."
>> And it's gone.
>> "Feels like" is a big scam.
>> Yeah, it's kind of a scam, right? I mean, it's helpful. It just takes, like, wind chill into account, right?
>> I guess so.
>> Makes more sense in, yeah, places like San Francisco.
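For readers following along without the screen, here is a rough TypeScript sketch of the two steps in this exchange: the Fahrenheit to Celsius arithmetic, and a request URL for an hourly forecast. It assumes the API named in the recording is Open-Meteo and uses that service's public forecast endpoint and query parameters. The function names are hypothetical and the agent's actual code is not shown in the episode. In a browser, the coordinates would come from `navigator.geolocation.getCurrentPosition` after the user approves the prompt; here they are plain arguments so the sketch runs anywhere.

```typescript
/** Convert Fahrenheit to Celsius: subtract 32, then scale by 5/9. */
function fahrenheitToCelsius(f: number): number {
  return ((f - 32) * 5) / 9;
}

/**
 * Build an Open-Meteo forecast URL for hourly actual and "feels like"
 * (apparent) temperature. No request is made; this only builds the string.
 */
function forecastUrl(latitude: number, longitude: number, days: number): string {
  const params: Array<[string, string]> = [
    ["latitude", latitude.toFixed(4)],
    ["longitude", longitude.toFixed(4)],
    ["hourly", "temperature_2m,apparent_temperature"],
    ["temperature_unit", "fahrenheit"],
    ["forecast_days", String(days)],
  ];
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://api.open-meteo.com/v1/forecast?${query}`;
}
```

With the reading from the demo, `fahrenheitToCelsius(59.3)` is about 15.2, which matches the figure quoted above. A call such as `forecastUrl(37.7749, -122.4194, 7)` uses approximate San Francisco coordinates chosen for illustration.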
>> So, how would you describe the relationship between this artifact and a traditional Observable notebook? What's different here?
>> Um, well, okay, one thing that we haven't really talked about or revealed is that this is literally a notebook at the same time. So, kind of how with Canvases AI we had it just use the canvas to talk to you, this is literally just a Markdown cell that it made. Or, I mean, it chatted and we made a Markdown cell. And so this is inside of a Markdown cell, and these are code cells that it creates. And so it has the same exact context as this. I could open this as a notebook, and all of those cells would be there and I could edit it. I could delete stuff or tweak stuff if I wanted to, or I could, you know, keep asking for changes also.
>> And so, you know, it's done a great job.
It's going to be warm in a couple days on the 12th. Nice. Mhm.
>> Uh I like how it kept Celsius now for this one.
>> Oh yeah, [laughter] I think that's your preference.
>> It does.
>> Yeah. So I think an important principle here is that it is sort of just an append-only notebook.
>> Typically with notebooks you can edit any cell at any time. You can rearrange the cells, move them around, and so on.
Um, and the concept of an append-only notebook has been interesting in a niche, peripheral way to us for a long time.
Like, we've talked about, could our forums just be notebooks, or could you have multiplayer chat? I think I did a demo one time at one of our off-site events a couple years ago where we did a, you know, multiplayer chat among people where every message was a new cell. And the append-only paradigm is a good natural fit for the agent. It can be sort of limiting in some ways. If you say, like, "I want a different color scheme" or "I want that chart in Fahrenheit" or whatever, it just makes a new one. But there is something kind of liberating and freeing about always moving forward.
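One way to picture the append-only idea is as a log of cells with no edit or delete operation, where each new cell can read the values produced before it. The TypeScript below is a toy model under that assumption. `AppendOnlyNotebook`, `Cell`, and `Values` are hypothetical names, and this is not how Observable's runtime is implemented.

```typescript
type Values = { [name: string]: unknown };

interface Cell {
  readonly index: number; // position in the log; never changes
  readonly name: string;
  readonly value: unknown; // the runtime value when the cell ran
}

class AppendOnlyNotebook {
  private readonly cells: Cell[] = [];
  private readonly values: Values = {};

  /** Run `compute` against everything defined so far and append the result. */
  append(name: string, compute: (values: Values) => unknown): Cell {
    const cell: Cell = {
      index: this.cells.length,
      name,
      value: compute({ ...this.values }), // pass a copy so cells cannot rewrite history
    };
    this.cells.push(cell);
    // A repeated name shadows the earlier value; the earlier cell stays in the log.
    this.values[name] = cell.value;
    return cell;
  }

  /** Read-only view of the log; there is deliberately no edit or delete method. */
  history(): ReadonlyArray<Cell> {
    return this.cells;
  }
}

// Usage: a request for "the same thing in Celsius" becomes a new cell.
const nb = new AppendOnlyNotebook();
const fahrenheitCell = nb.append("tempF", () => 59.3);
const celsiusCell = nb.append("tempC", (v) => (((v["tempF"] as number) - 32) * 5) / 9);
```

The trade-off matches the one described above: a change request costs a new cell instead of an in-place edit, but earlier results stay available for later cells to build on.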
And then, if you do wanna... you know, I think there's a different phase of work where you may be doing a ton of iterations on a single sophisticated, complex graphic. Like maybe you're polishing it up for presentation or inclusion in some, I don't know, often-seen context. And then the idea is that you'd be better off taking this to a notebook where you can use... actually, we sort of skipped over the Observable Desktop AI story.
>> Hm.
>> I thought we were going to come back to that one.
>> Yeah, we can come back to that. The idea is that you could have a sort of free-form, flowy, worse-is-better, just-keep-moving-forward chat like this...
>> ...and then take it to a notebook for iterative mutation and condensation of what you've done.
>> Yeah. Um, and again I want to highlight the iterativeness. So here, okay, I've asked it, "Do you have volcano data?" And it's like, "I don't have a built-in volcano data set." So it tries to go get one, a volcano GeoJSON from ArcGIS.
Um, I don't even know where that is.
Okay, cool. It's just a data set that it kind of happens to know about at some point.
>> E3 E3 E3 E3. That's sort of a funny...
>> That's weird. Yeah.
>> That's a weird ID that it seems to just know.
That's kind of creepy. Um, and then it runs into some hurdles, right? So it tried to parse it. It gets "does not exist or inaccessible." It tries to do another request. It got a 404. And so then it goes to... oh, it tried to grab a random GitHub CSV. It tries to use a, what is this? National Oceanic, NOAA...
>> Atmospheric.
>> Agency, Administration...
>> Agency, Association, I don't know. Anyway, it was able to get 200 of those. So it's like, "Oh, found one. Cool." Um, and so like...
>> Again, it kind of did that in quick succession, because we're in a browser and we have the browser controls on whether or not we can allow these. And it still runs in a worker, like notebook code does. And so it can come and figure out that, okay, it has something now, it has it in a nice table, and now it's like, "Okay, what do you want to do with it?" And so again, this fast feedback loop means that if I ask for a map of volcano things early, it's going to fall over more often and not be helpful.
Um, but then if it kind of does it with me, then it's a lot easier to debug, or to not have it totally go off the rails. And even with this map, I might have some stylistic choices pretty immediately about this map that I wanted to deal with first, before going into other things. So yeah.
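The try, fail, try-the-next-source pattern described here can be sketched as a small helper. This TypeScript is only an illustration under assumptions: `firstSuccessful` is a hypothetical name, the loader is synchronous so the sketch is easy to check, and a real version would be asynchronous and call `fetch`. The point it tries to show is that every failed attempt is kept as data the agent (or a person) can read before the next step.

```typescript
interface Attempt {
  source: string;
  ok: boolean;
  detail: string; // "loaded" on success, otherwise the error message
}

interface LoadResult<T> {
  data: T | undefined; // undefined when every source failed
  attempts: Attempt[]; // one entry per source tried, in order
}

/** Try each source in turn and stop at the first loader call that does not throw. */
function firstSuccessful<T>(sources: string[], load: (source: string) => T): LoadResult<T> {
  const attempts: Attempt[] = [];
  for (const source of sources) {
    try {
      const data = load(source);
      attempts.push({ source, ok: true, detail: "loaded" });
      return { data, attempts };
    } catch (e) {
      const detail = e instanceof Error ? e.message : String(e);
      attempts.push({ source, ok: false, detail });
    }
  }
  return { data: undefined, attempts };
}
```

Keeping the attempt log is a design choice in the spirit of the fast feedback loop: a 404 or a parse error becomes visible context rather than a silent retry.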
>> So this bears similarity to Claude visuals, right?
>> Yep. Yeah, that's true.
>> We were working on this over the winter before Claude visuals came out, and then we saw that and, you know, you have a little bit of that feeling of, oh man, they did it.
>> But I think there are some interesting differences here, in that it's a long-lived runtime, the calculation thing, some of the runtime inspection stuff.
>> So yeah, I mean, it's reusing that data that it grabbed at the very beginning.
Um, and so all of these are...
>> Make it spin.
>> Oh, yeah. I can I'm just going to say make it spin.
>> Make it spin. Just make it spin. Make it spin. Make it spin.
Uh, >> I feel like the colors are sort of clashing with the land and water colors, too. And all the circles are maybe just Yeah, >> that's a good spin. Now, now animate the dots exploding.
[laughter] >> With this timeline wise, >> yeah. Yeah, could do it like that. I mean, it sort of as they pop in on the horizon, it sort of looks like they're >> they're exploding over there.
>> Um, so I guess in conclusion, we've been playing around with this. This is not something you can use today on the internet. This is sort of an internal prototype. But we have been applying these lessons to the AI in Observable Desktop, which now has an improved agent in the sidebar that has many of the same tools. It's not the same form factor, but it has the same kind of runtime inspection...
>> Yeah.
>> ...same kind of context about your notebook. And where will it go next, Vis?
>> I don't know. I mean, we could show the desktop thing, and I don't know if you have one running already, though.
>> Uh, I do not. We were prepared up until this point. [laughter]
>> We've already gone an hour, so maybe we'll leave it as an exercise for the viewer. I mean, maybe while I'm talking, you should just download your volcano thing into a notebook.
>> I already closed it. I guess I can go back in the history.
>> You can go back to it.
>> I know. I can go back.
>> So, a common flow for one of these is that... or I guess I have one that I always like to show, which is Kepler. Oh.
Is this in desktop?
>> In desktop. I mean it can be wherever.
Yeah.
>> Kepler.
So here we just have the chat in the sidebar. "Render an...
>> ...animated orbital ellipse."
This has been one of my go-to evals or tests for the AI for a long time. Here, I'll just check "allow all." I'm in the YOLO spirit. The easiest parameterization of an ellipse, using a scaled sine and cosine, does not obey Kepler's laws of motion.
It does not sweep out equal area in equal time. And this is something that even today's models still seem to kind of struggle with. So it's good for demonstrating...
>> Pretty random.
>> It's like...
>> It's really good at generating a starfield, though.
>> So let's see what's happening here. It is correctly speeding up near one focus, but a couple of issues.
The planet doesn't follow the dashed line. [laughter] The sun is not drawn at a focus.
The planet's orbit is not centered in the SVG, and I don't need random stars.
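For context on why this prompt is a useful test: stepping the angle in (a cos θ, b sin θ) linearly with time traces the right curve at the wrong speed. The standard fix is to let the mean anomaly M grow linearly with time and solve Kepler's equation, M = E − e sin E, for the eccentric anomaly E. The TypeScript below is a minimal sketch of that textbook approach, not the code the model generated in the demo. The function names and the Newton starting guess are assumptions made for illustration.

```typescript
/** Solve Kepler's equation M = E - e*sin(E) for E using Newton's method. */
function eccentricAnomaly(meanAnomaly: number, e: number, tolerance = 1e-12): number {
  // Starting guess: M works well for modest eccentricity, pi is safer near e = 1.
  let E = e < 0.8 ? meanAnomaly : Math.PI;
  for (let i = 0; i < 50; i++) {
    const delta = (E - e * Math.sin(E) - meanAnomaly) / (1 - e * Math.cos(E));
    E -= delta;
    if (Math.abs(delta) < tolerance) break;
  }
  return E;
}

/**
 * Planet position at time t for semi-major axis a and eccentricity e (0 <= e < 1).
 * The sun sits at the origin, which is a focus of the ellipse, and perihelion
 * lies on the +x axis, so the planet moves fastest near x = a * (1 - e).
 */
function orbitPosition(t: number, period: number, a: number, e: number): { x: number; y: number } {
  const M = (2 * Math.PI * (t % period)) / period; // mean anomaly: linear in time
  const E = eccentricAnomaly(M, e);
  const b = a * Math.sqrt(1 - e * e); // semi-minor axis
  return { x: a * (Math.cos(E) - e), y: b * Math.sin(E) };
}
```

Placing the sun at the origin addresses the "not drawn at a focus" issue noted above; centering the drawing in an SVG would then mean shifting every point by a·e along x, since the ellipse's center sits at (−a·e, 0) in these coordinates.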
>> Oh, I love the stars.
>> Oh, they should be real stars though.
Are they... they are sort of... How did it do it? Is it just...
>> Sort of making up some plausible...
>> I think it's made up plausible ones.
>> It's just randomized, I guess.
>> Uh, glow filter. I love how it also makes everything glow.
>> Oh yeah, it loves making Okay, this looks better already.
Yeah. So, it sort of does the slingshot effect. In a chat, this would just be the append-only thing. You're just always scrolling past it, and here you can do more of a, you know...
>> Do more edits.
And you can try this today. You, viewer or listener at home, can go to observablehq.com/notebooks and download Observable Desktop, and right now you just have to enter your Anthropic API key. [snorts] And that's how that works. And please let us know what you think of it.
Vis, have you got any closing thoughts?
>> No.
>> Is the future bright for AI augmenting human flourishing machines of loving grace?
>> Who knows?
>> You seem to be having a lot of fun with all your apps.
>> I'm having a lot of fun with all my apps, but you know, I I also >> We've been having a lot of fun with our notebooks. If everybody was only creating fun personal apps, that'd be great. Then I would have such an optimistic outlook on on everything.
>> Uh, but it's been helpful for speeding me up, helping on, like, Claire's hotel website. But I keep thinking about it. Really it's like...
>> My wife runs this hotel, and really I'm like, the people running the hotel, the people at the front desk, should be coding the website themselves. That's really...
>> Devolving the computational power to the edge of the...
>> Probably can, right?
>> ...corporation. Yeah.
>> Yeah.
>> Probably there.
>> Uh, and I think, you know, Vis and I are the sort of people who have always enjoyed writing notebooks for fun.
>> Yeah.
>> Even when there's a bit of a technical hurdle to get started. And I think there is a hope, I hope, that now the computational medium, that format...
>> ...can be much more accessible to many more people for many more sorts of questions and sorts of problems.
>> Yeah. And I think it's probably more satisfying maybe hopefully like if you learn that like oh I can simulate this and before it was hard to simulate something with a computer.
Now, like >> an AI can help me write that code.
>> Yeah. And a lot of the interesting use cases aren't even traditionally data visualization per se. It's more >> Yeah.
>> modeling.
>> Yeah. And I I think it just makes everybody more ambitious with like how they use a computer.
>> Yeah.
>> Um like it makes me way more ambitious.
>> Yeah. Um, it's like all those things. Like, sure, I might have written my own pool app before, but, you know, five years ago there would have been a whole bunch of startup cost to that. I was like, okay, I've got to go learn this. I've got to go figure out if this is possible. I've got to go reverse engineer the API for the pool controller. Or go find...
>> What's the brand of the pool controller?
>> Uh, it's Jandy, and iAquaLink is their app, which people have libraries for, which, you know, would fast-forward me, but then I have to go read those Python libraries to see what the hell they're doing.
>> But now I can be like, "Oh, go and try it, here are my credentials," and then it can just go and try the thing with my credentials real quick to see if it works.
>> Get some data back and start coding against it. And then even in the app, it's kind of like, oh, maybe I want some charts. And again, I have to go learn that API maybe, and so I won't really try it. But then, you know, in the morning I might go on my phone and go on Claude and just be like, "Try adding some charts." And then I check later, after I've brushed my teeth, and it seems like it's done. Then I'll go look at it, and it compiles and runs. It's like, oh, hey, look, it created a chart for me.
>> Yeah, that's something we've thought about: how to get these notebooks to run headlessly.
>> Yes. Um the the background, you know, the Gas Town tier 67 or so, uh workflow is quite scary and powerful.
>> Yeah.
um when you fire off again it's like when you fire off five or six ideas um and you get back results on like two or three of them that seem promising and then you like close another two out like before you know the morning's done.
>> Um, but yeah, anyway, looking at my pool app, I have a bug in it. I need to fix my y-axis. This is my weird, maybe backwards pool app with a cool chart in it showing air temperature and pool temperature.
>> Well, thank you all for listening. [laughter] With that, please contact Vis if you have a pool that you want some...
>> If you're dissatisfied with your Jandy pool...
>> If you want to titrate the dispensing of chemicals, modulate chlorine levels, just...
>> Should I promote another random podcast? Sure. It's a pool podcast. It's a pool chemistry podcast. And, you know, I only found it because I was in the Reddit for our pools. [laughter] Um, wait, what was it? It's so good so far. It's called Pool Science. Pool Scientific.
>> Okay.
>> Um, Pool Scientific.
>> Uh, it's from a guy who used to... he was in the Navy. He was on submarines.
Uh, and like yeah, you know, there's they have a closed water system. Um, >> Oh.
>> And so there it's like life or death.
Uh, >> because they're living in the pool.
>> Yeah. If they get their water chemistry wrong in that in that sub there, it's not it's not a good time.
>> Um, >> wow.
>> And so like that >> there it's definitely a podcast for me and Loki. Loki was also pretty getting into it. Uh, >> nice.
Nice.
Well, thank you to our listeners. NT Hitz, if you're out there, thanks for watching. [laughter] Our most loyal streamer fan from back in the day. And uh the Observable podcast.
We'll see you next week. Bye.
>> Yeah.