Modern AI tools enable rapid prototyping of complex applications by combining multiple APIs (such as OpenAI for research, ElevenLabs for voice generation, and agent platforms) into streamlined workflows, allowing developers to build functional applications in hours rather than months. This approach, called 'vibe coding,' democratizes app development by reducing technical barriers and enabling creators to experiment with new interaction patterns, such as enabling users to take photos of statues and have AI-generated conversations with them within 30 seconds.
深掘り
前提条件
- データがありません。
次のステップ
- データがありません。
深掘り
How to talk to statues — Joe Reeve, ElevenLabs追加:
[music] >> Can I get a vibe check of the room?
How's everyone feeling? Are we like we really want to hear what Joe has to say about statues or are we like I just want to like chill and just sit in a quiet room. How people feeling? Statues.
Statues.
Okay, that's good to hear.
You're much more lively than statues which I spent a surprising amount of my my life with recently so um This I'm going to well actually I'll quickly introduce myself. I'm Joe. I work in the growth organization at 11 Labs.
Hands up if you've actually I've got a slide for that. Hands up if you've ever used slash heard of 11 Labs.
Okay.
So that this next slide you'll probably be familiar with a lot of it. Um 11 Labs does basically we're an audio AI foundation model company. So everything from text to speech, you put some text in you get some speech out. Um transcription in the other direction.
Music. We've got the first commercially legal uh AI music generation. We've licensed all the all the all the training data uh behind the API. Sound effects. Create voices. This one by the way just like pro tip in case you're ever using 11 Labs at a hackathon or something. This section creating and editing voices is like the thing that people really just don't use 11 Labs enough for and this is the thing that the statue app that I built that we'll talk about um is built built on and then agents uh this is our fully managed agents deployment platform which sounds like a mouthful of of SAS and it sort of is but it's also very cool in in various ways.
So who saw the statue app? It went quite viral at least here in London.
Raise your hands if you saw it.
Okay, that's fine because I'm going to play you a video cuz I people like me love playing videos of our own voice out loud.
Um it's I'll just play the first 30 seconds or so.
I made an app that lets you talk to any statue you want using AI. So we've come to the British Museum to see how it works.
>> [music] >> I am Pharaoh Amenhotep. I am Demeter.
Kua Hakanaia.
I am Hans Sloane. The guardian lion.
I am the young writer. And I am the war horse.
First, take a picture. Okay, so now I'm about to explain that bit of me explaining. So we basically what this statue does is it lets you take a picture of a of a statue. So this app lets you take a picture of a statue. Um it then does a an open AI deep research uh on on the identity of the statue generates a bunch of the sort of historical knowledge and prompts for what it thinks the the voices of those individual statues would have been if they were alive. Um it then uses our voice design API that really underutilized API where you can put in a description of a voice and it will go and generate something that matches. And then it creates an 11 Labs agent and then starts the phone call and that whole thing works in like 30 seconds. So you take a picture of something you get all the like search research back from OpenAI generate a voice and start talking to an agent to a statue within 30 seconds. Which is pretty fun. Um What sort of if you if you're interested in like reading more about the details of how it was all built you can scan that QR code. It's just a blog post. Um this was attached to the the initial tweet that I made um which basically had the prompt for one shotting this whole thing I built in Cursor in two hours. Um It it sort of sort of well so I I built this in in two hours on a Sunday cuz I was I was sort of tired and bored. And then published the the prompt through the 11 Labs blog made this video that you just saw posted it on a Tuesday or something and got 50,000 impressions. It's like pretty pretty good not bad. Um people on on Twitter kind of liked it. And then three museums or people who represent groups of museums and a bunch of other like businesses including like Tripadvisor competitors and stuff who were coming and saying we've been well actually one of them the CEO called me he found my WhatsApp phone number somewhere and called me up and said I've had a team of 10 people working on this for a year how did you build this? Um and so then the next day I I reposted saying I've had a bunch of like interesting people I vibe coded this in in like two hours not as a brag just as a this is interesting like vibe coding is so powerful and if you sort of experimenting with these these interaction patterns you can actually do something quite big surprisingly big. Um It then went completely viral and went from 50,000 on the first day to 1.5 million on the second day. Uh and it was in part because it got kicked off by the vibe coding and then suddenly I got all these artists and creatives and everything from from portrait uh museums to like Bonhams and Christie's reaching out saying we want to have people be able to talk to the items we want to sell. Um I think what I would sort of Oh and then and then this led to something called 11 Hacks which we can talk about later maybe. Um There are loads of different things in this story we can talk about. We can talk more about the statue app and about 11 Labs I sort of want to get get more from you. Um we can talk about 11 Labs generally we can talk about means to do sort of growth particularly API growth uh at one of these companies um from the growth engineering point of view and and we can talk about the sort of implications on culture which I think is something that we're not as a as an industry we're not really looking at as seriously as we could be. Um vibe coding generally and again what that impact is having on society. Um voice interaction patterns or making viral videos or anything else. So it's sort of yeah please.
I mean it's really easy to prototype these days right? Yeah. Like machine learning in production that's the harder part. So maybe I don't know I'm curious about how you would take your really successful prototype how can you scale it >> [snorts] >> Yeah well that's one of the things this is going to sound very 11 Labs salesy now. The nice thing is that pretty much all of like what I've done is stitched together existing APIs which are designed to scale. So yeah if I wanted to start doing user management that sort of thing that's relatively I think well understood and you can buy in other third parties for that. But the the hard bit the maintaining the agents and the voice design that's all APIs that there's no way I can make a dent in the API volume even if this this goes absolutely gangbusters. So from that point of view I think and I think that's something that vibe coding is really showing is that the glue pieces and and telling a good story about the glue is the in part the most important thing of the project rather than the solving hard technical problems. So yeah obviously there's a lot more work to do to make it actually production ready uh which is something that I've been talking to a bunch of the the museums about. Uh we might be doing that as a like a nice 11 Labs gives something nice to the to the museums. But it's the hard bit is actually not the user management and like Cursor again can pretty much you need to to check it all obviously pretty much one shot that with Superbase or whatever else for for logins and magic links and that sort of thing. It's it's mostly the relying on our our APIs and our our agents platform to do the heavy lifting I suppose. What about e-vals?
E-vals I get so that's that's one of the big things I think the the take a photo and just get your research back is not really the long-term solution for the museums.
Really the important bit the hard bit is going to the curators and saying curators can you figure out what is the actual narrative let's not just take random things you found from Google let's actually like put some thought in in and design to the content. So that's I think the piece that's the longer tail. The nice thing is a lot of the museums have they see their core IP as these databases. Um so they have APIs often um and we can pull that information out. The V&A has a public API for for that their stuff so.
Yeah I guess on this topic and maybe related to the art culture voice interaction it how what is the interface thing that you enable the curators to design the the experience?
Cuz I know Yeah so right now there's no no I haven't designed anything for that. at best they could log in to a to a a dashboard at the 11 Labs dashboard and make make edits to the system prompt and the knowledge base files that are in there.
Um The I I think probably in this sort of the information management the best case the best interaction pattern is probably editing text uh rather than speaking although obviously if you're choosing a voice you need to sort of manage that. There's an interesting question there. I've been talking to uh a chap Jago who used to be the head of the Americas at the British Museum and now runs the Sainsbury Centre which amusingly is the location for the uh the Avengers headquarters in the Avengers films. Um He is going through this big long academic process of figuring out what should a voice sound like for an inanimate object. So it things like where did the materials originally come from? It came from some mountain in China or it came and then so it it the the rock was shipped to Vietnam and it was carved in Vietnam and then it spent the last 200 years living in a British Museum. So like what should it sound like? It would have a little bit of a maybe a Chinese origin with some Vietnamese twist in there but then it's just lived around people with British accents but maybe also not because it's it's lots of tourists. So actually thinking through from a a much more philosophical point of view what should objects sound like? And that's something that from my point of view in in in 11 Labs is really interesting because I I think we have the opportunity to give all sorts of things voices. Like elevators. Like a lift should probably I mean they do have have voices. They're often quite discordant with what a lift is I I find. But it's quite likely I think that we start walking into lifts and saying I want to go to this floor please using voice to to to to to interact with them. So what should they sound like and it's beco- becoming a more important question. I I don't know exactly what the answer is but smarter people are doing that. So so sorry. Yeah. So it would be really interesting to hear uh about your thoughts about voice as an an interface to application.
Um What in general what what problems arise what do you think of how do you use interact what do they expect from this voice agent and when does it not work what problems do we do we expect if we are going to build an interface interface? Yeah, I think currently voice interfaces voice interactions have quite a large range of problems currently but a lot of them are solvable.
One of them is you basically have a binary you're either interacting with voice or you're interacting in some other way and I still feel like the sort of interactive or generative UI plus voice is something that we still haven't seen particularly we've got sometimes it's and this is something I've experimented with is like if you think about a coding agent you've got lovable or something. I want to be able to talk to my app talk to lovable but not talk to the coding agent part I want to talk to like a product manager agent that then goes off and triggers my coding agent to go and do things. So what is that?
So you the voice interaction there is not direct that the thing that I'm talking to it or the thing that's doing the work is the thing I'm talking to I sort of want to be talking to a to a a halfway house person. So I don't what are the problems aside from often the thing you you you end up talking to is not the thing you actually want to be talking to there's also the what are the parallel sort of interaction patterns in UI and I think actually you were just showing me your app earlier which does show the stuff that the the voice agent's thinking and and sort of extracting from the conversation and allows you to to interact with that at the same time.
That's something that I I think we're going to see a lot more of the sort of multimodal conversations where it's voice and visual.
Um the other thing is people don't like interrupting uh voice agents cuz they're too polite people are too polite and I'm starting to learn to just like interrupt agents much more aggressively and that actually makes the experience much better but we I don't know how to give people permission to interrupt.
Sorry. How do you how do you solve the problem of like guidance of skills so I mean with a typical coding agent you can give like skills this this this you can throw it in together right but I don't think it's >> [clears throat] >> the same interface or is it the same interface or So the the 11 agents platform doesn't really support the concept of skills that it could it does support the concept of knowledge files which do get loaded in so you could do it in that way we also support MCP calling so you can have knowledge embedded in those um or skills embedded in in the MCPs the uh I don't think that's core to voice or not I think that's mostly down to like the the interaction patterns of of coding agents are quite lend themselves to skills um but you could have a voice agent that then has the ability to use skills and you can add voice capabilities to an to an existing coding agent that in in that way so I don't know if that's related directly to skills but um I basically want to talk to talk as a as a human model. I see I see well actually some people some people have built the particularly with open claw actually there's a 11 labs is is quite a common interaction pattern for open claw people have built phone phone numbers they can call and then they'll call them back and that's what they think um and then obviously you can if you say well I want you to be able to load in skills it just like learns how to do it and and adds its adds that capability to itself so that's for sure possible um and people are doing it with their sort of open claw setups and and some closed code setups In this experience one of the AI culture in the vibe coding thinking about new interaction patterns to create with history with our built environment like where or how do you see that starting to because I mean I mean we're so empowered now with this vibe coding it's still there's a barrier but how do we build that engagement and and do it well I don't know what the museums are telling you I mean I'm sure this is so new but Yeah I think to be honest the the museums I so I I met with the CEO of the Science Museum or co-CEO of the Science Museum and they're asking the same questions they don't really know the answers they're saying well I mean and and they the Science Museum for example is really good at going and trying stuff so they've got all these big tablets that kids can go and interact with but in my personal opinion a lot of the time that can sort of feel like sticking technology onto the the rather than it being a core part so some of the stuff we're experimenting with here is obviously you've got to take a picture of a statue and you talk to it we're commissioning a a statue to be made that has the technology inside of it and speaker and phone um a microphone so that you can then talk directly to the statue without having a piece of technology in the way without it feeling tacked on and that's something like with the the red phone booth you may have seen here um on floor three uh there's a phone you can pick up inside of a K6 um red London phone phone booth or British phone booth and talk to an agent talk to um Sir Michael Caine so that's trying to put it into the real world a little bit rather than having it go through a screen.
um So I can imagine I mean with vibe coding I can really see kids making their own games like you can imagine a kid going to the Science Museum they have their tools and they can start creating whatever experience they want as they engage like I it just sort of explodes the possibility. Yeah yeah I mean I think even more generally there's a question of what's the vibe coding I feel like still hasn't really gone consumer mainstream it's like even even lovable sort of feels like it's targeted at consumers for building effectively building B2B SaaS apps you know it's like have super base and you know uh standard standard design components but I this is why I really love vibe coding evening vibe coding events they feel like the OG hackathons because people show up and they have they've never even thought about writing code before sometimes I go to them and I talk to people and they're like I say so what's your favorite app and like who made the app and they're like wait people make apps I thought they were just there on my phone on the app store right they hadn't even thought through the fact that people have to make them so that's something that I I find vibe coding events really fun because people come in and they type in they have no idea what have no idea what a hamburger menu is or or like a an accordion is so they just say I want this and I want this I want this they get something completely wacky cuz the LM just says yeah okay I'll try it when if they were talking to a software engineer I would have said you want one of these and one of these.
Um so I I I think at some point we're probably going to have like what's the Instagram filters moment for vibe coding or the Tik Tok moment for vibe coding I think we're going to have something like that that makes social vibe coding much more but I I don't know what it's going to look like but worth experimenting. Do you see anyone doing it well like in that sort of experimentation?
There's there's Spielwork they're sort of a an app mobile app for vibe coding games and it's Tik Tok swiping um and there's I think I can't remember what it there's a London based game vibe coding tool that is again focused on games I don't know that games are really the thing cuz they're quite complex um but it it there are a few people experimenting but I don't think there are that many people really deeply pushing the boundaries of of what's possible they're mostly like lovable but on your mobile phone is my opinion um does anybody else have any opin- any sort of see anyone doing good consumer vibe coding?
Vibe coding to what extent in terms of like creating things or copy or whatever or just in any any sort of sense? Well I guess content creation yeah.
Yeah.
Well well well you you in your game you can like spawn things in with voice right and that's like a great it's not quite vibe coding but it's still interacting and Cuz I think that's that's what it like vibe coding still means you have an understanding of certain primitives like even what you just said you're thinking about databases I think when this goes mainstream you're not people aren't thinking about this way they're they're and that's what's so exciting for me when you think about culture and what you build is when it gets to a point where just things that us as engineers wouldn't even it's not even how we would approach the problem and that's where some incredible creativity comes So the the pattern that I think is closest to being a winner in this space is the Facebook Instant Games API which doesn't exist anymore um they they deprecated it but it was in Facebook Messenger you could play these games and they had these primitives for social gaming that they tended to be quizzes or like Fruit Ninja and you compete with your group chats and things I built I bought for 15 pounds um a Fruit Ninja clone off a website instrumented it with the the Facebook Instant Games API which was this beautiful like JavaScript async await had a get user information get friends uh create a leaderboard or get your position on the leaderboard um very basic data storage key value storage and um async await at show an a rewarded ad and show an interstitial ad and so those things allowed you to make very quickly and easily a social graph enabled ads enabled experience for consumers so I bought this thing 15 pounds instrumented it with Facebook Instant Games API and went to bed the next day I woke up with 15 million users on this random yeah I mean I didn't make very much money but they're like 15 million users in Vietnam because obviously Facebook they test everything out in in the sort of lower value advertising regions and then rolls up but that was an amazing because you've got the social elements of people instantly sharing it around and that I think is probably the template that's going to something along those lines is probably going to be the thing that wins on on the social vibe coding but I don't know if that actually answers the question but On the kind of like vibe working vibe building kind of uh front I kind of feel frustration whenever I get a a response back in voice maybe it's just me but like it's almost like the the the input like the information density per second isn't quite high enough like I have I use a lot of voice out just getting things out of my head so I can pass them to whatever Or voice input I guess it's the yeah. But then I I I still feel like I need I don't know diagrams or text or some like really high density um thing back from the system so I don't know whether how do you guys think about that? Whether I should be curious to hear other people's views on that.
Whether they agree with it. It sounds like there are some nods. But, that's what I find myself leaning towards where it's like information rich in and then I can just speak my thoughts out as they come out and it's almost semantically understood. Put into that information rich format.
And then my intent is like, you know, spawned out and and across the system itself. So, that's the pattern that I at least find seeing and feeling and what I almost want to vibe interact with now, my email or like everything is cuz I feel I feel like that longing. It's not quite there yet, but I I I I feel like I I think the industry needs to build in that direction and then to try to get in information rich, you know, Response, yeah.
And just my sort of raw intent. Yeah, that's something that I feel. I absolutely feel I, you know, I want to speak speak easily and quickly and then receive maybe a little bit of voice, but mostly this like generated, maybe it's a UI, maybe it's just diagrams, maybe it's whatever app I'm in context of. But, yeah, I want to have parallel input parallel output, but like single input, my voice. I think you're talking about so I I find that I don't actually get that much information necessarily from voice, but what I do get is like companionship.
Mhm. it kind of like triggers that. It lessens the loneliness feel somehow. If I'm like talking to something and I'm getting some information, I don't feel I might be learning more if I'm looking at diagram or text, but I I don't feel as I feel more motivated to kind of like continue tinkering. So, there's like some interesting modalities where you feel different things if you get the information coming in at a different modalities, at least I find so.
Curious to hear how other people think about whether they they think differently or whether that's something that they also feel as well.
Well, the visual cortex is much older than voice and and text whatever. So, seeing something is Yeah. Also, it gets When you ask it to be concise, I don't feel offended if it gives you a concise answer. But, in speech, if it gives a concise answer, it's all really like, man, I just asked you a question here.
to be concise then it just sounds rude.
That sounds an interesting and may maybe this is possible, maybe it's not. But, what does what does skim listening look like?
May maybe maybe like listening actually should also have two buttons, like forwards and backwards and I can just tap tap tap tap tap go forwards half a sentence until I I don't know, maybe that's maybe we should build that. Who who wants to vibe code something with me straight after this?
How would it work? So, you'd be you'd be you'd just easily Yeah, back back and forward with the with the audio.
It's like a speed dialer like on a podcast where I want 2x. Yeah, we or like the old iPods where you can sort of spin forwards and backwards. That's I don't know, that's that's probably quite a nice listening interaction. Cuz and you sort of I guess you want to scroll forwards in concepts, not necessarily in sentences, right?
Like what is the thing that your eyes look at when you're skim reading? It's probably it's not the sentence structure, it's like the words again.
Yeah.
Yeah.
And that's actually yeah, this is this sorry, this is very interesting and exciting for me. Like because if if I'm talking to an agent and it just starts rambling about, you know, sometimes you get back three paragraphs of stuff and I'm like, no, not this one.
But, I want to interrupt it and say go to the next one. But, then it's effectively saying it's a new prompt, right? So, then it's effectively saying, yes, okay, I'll focus on that next one and then it'll write me three paragraphs about the second paragraph. You know, that's not really at all what I wanted.
Unless I say be concise, then it says something rude to me. What would you like make a summary for each paragraph and then like if you hold the Yeah.
Yeah. It will expand and then or you can like Yeah, I think the Claude app has done some interesting stuff on the voice interactions because they they show you something different to you hear.
And they show the higher level sections and then it goes into each one and you can tap on them. So, that's I guess getting a little bit closer to this.
But, I think that also means you're not interacting as though you would with a human conversa I don't know. I I I'm I'm thinking about that. Like why do we not have this issue when we're talking to humans?
Right? Like how do we What what are some of the other cues?
Yeah, but right there there there there's like a cue we have a sense.
There's so many other things.
Yeah.
If I'm as this many points, like if there's an agent response coming in audio, I don't know how long it's going to be.
As I think of what was mentioned here.
Like is it going to be like, you know, a minute or is it going to be 10 seconds?
I almost want to know. And if I know it's like really long, But, also you can tell when I'm about to interrupt you. So, you go faster. And you like move maybe you and you can tell if people are listening or not.
Yeah.
I guess oh, there's another thing which is interesting here, which is the interrupting. Sometimes I don't want to interrupt. I just want to say, yeah, yeah, yeah or oh, but no, go back. And like it's like you you're always listening for the uh-huh, yeah, yeah, yeah. But, you can't do that with an agent.
I think I saw one video, so I mentioned this to Joe before, but like but briefly but briefly you can be able to sort of interact with a sort of a PS5 game and also create what you want in that world as you're playing it. So, you're you're on like a you're in a shooter battle line playing you we're trying to like, you know, kill each other.
It's a bonus language, but like if I'm if I'm like trying to be creative in, you know, generate a new getaway vehicle or or a helicopter, I can just sort of say that experience and have that appear in the game straight away and then I can just sort of apply it. So, that's sort of the work that I'm doing at How I Met Joe actually.
But, I think the the challenge that I faced is that relying on just the like voice interruptibility is quite unreliable. So, I've just got around this with a just a simple whisper flow type Push-to-talk type.
And then hold hold to talk and then let go to finish, which it it augments like purely this audio stream with some other cue. And in the same way that I think you almost need some very light nudge interface on top of the audio that you're receiving.
And then maybe you just say like, you know, you're saying something and then you see a little um sort of like circle appearing being like the agent wants to ask you a question. That would be an interesting experience to feel, right? And then you're not being interrupted.
>> Mhm.
You're being kind of like, you know, the agent wants to talk. And then maybe you either stop or say, okay, what what what what idea do you have? Because I even can feel you doing that to me right now.
>> Yeah. You you want you want to say something. I'm just getting loads of ideas. This is great. I just sort of like I'm we we we're almost like having an information, you know, communication on that level, but with audio, if I'm just listening to audio, I'm not I don't think I I know that. Whereas if I was an agent or if I was just like just listening to the audio and not looking at the looking at the new stream.
Okay. I'll let you talk. Sorry.
Well, I'm just imagining or I'm just imagining on that point that the agent wants to respond to you, it could be showing I want to interrupt and tell you about this thing or this one. And and then suddenly that becomes what we've just done, but with even more context than doing it with the human because it's signaling the topic it wants to talk about.
One question regarding the product because regarding this topic, you need some sort of tool talk tool calling in the background to be able to actually understand at that level of detail.
And and how would that tool call be handled? Would you would you sort of be streaming the audio directly to back or back RTC to the to the client or through a back channel and then getting some information? Do you update the the system prompt? But, how how do you work on that part? Obviously, this hasn't been as far as I'm aware hasn't been built yet. The way I would probably approach it is looking at the transcript and just keep analyzing the transcript over and over again and say, do you have anything to to add? Do you have anything to add? Do you have anything to add? Or what would you add? Um rather than being a tool call or being part of it, I would do this as an asynchronous looking at the transcript. Asynchronous or prompt?
Because the original prompt has sort of a plan you wanted to do this and then halfway you sort of change the plan.
How does the original agent talk to Well, this is actually an interesting thing with agents.
They often we see agents as things that you can't really interact with the internals of. But, of what's the next And that transcript of the conversation is completely uh uh uh yeah, yeah, yeah.
And currently if if I have the the the way most agent platforms will work is it'll generate its full text thing and then start generating the audio. And then if I talk, even while it's partway through, it's still reading out the audio uh the the text, it will just append my message to the end of that full message. But, we do have the timestamps. We know how far through the the audio it's played. So, we could actually just go and edit the transcript that's coming back and say, well, no, they interrupted at this point. So, we're going to forget that the LM even generated more text.
Yeah.
Yeah.
Yeah.
I mean, why don't you let's go to the 11 Labs booth at the Expo floor and just vibe code something and see.
This this is great. I can't wait.
Now, it's getting easier and easier to make tools in this. So many different ways to execute your ideas. And it seems like the main kind of differentiates now getting your word out there. Like how did you approach making those videos?
And like how long did it take you for the statue video to make it?
That's a good question. Totally totally separate. Um So, this is sort of the inspiration of this 11 Labs thing.
Um I learned to make videos. I'm not particularly good at it, you know, it's like still relatively janky. Um I learned to make videos through doing other like politics related campaigning stuff before I worked at it before I even knew 11 Labs existed.
The It turns out that with videos, random things go viral or with content, random things go viral and random things don't. Like things I think are going to go viral don't and then things that do do.
I think it's basically about practicing.
Like it's Editing a video I find to It's like the 80/20 rule. Editing a video to the standard of the statue app is actually quite relatively easy. But then to go beyond that, you suddenly cuz I edited that on my phone, the editing itself took about 20 minutes, 25 minutes.
Going beyond the quality of that suddenly means thinking about using a desktop editing tool which suddenly makes everything even using CapCut on desktop to me is like three times harder than using it on my on my phone. And it's also three times more expensive the subscription on the laptop than on the phone. So, I think a lot of that is just doing stuff and trying and iterating.
Big things that I found adding captions really help to the video. Having a hook in the first You can You look at your various platform analytics once you've posted a few videos. My videos tend to get between 6 and 12 seconds is the average the median view time and then people drop off.
So, you need to get your hook in there because most people are going to drop off if they don't buy the hook.
So, that's that's an important piece front-loading the the like interesting piece.
Adding music makes a massive difference.
And this is something that was was much harder, but now with 11 Labs music generation, I will make the video and then add Sometimes I'll make the video with the narrative and and everything and then just experiment with completely different genres until I find music and I'll just put the music on.
Does it work? Yes, no. And then you can edit the the different sections so that it times up with the sections of the speech. So, you don't need to like figure out music first, find a piece of music and then match your speech to it.
The other way sometimes I will have a vibe I want to get across and I will generate the music first and then figure out what's my speech that matches that vibe. So, it might be like an an excited theme or a I I actually chose the music for for the statue app before I before I made the the video itself because I thought this is a fun piece of music for a statue type thing.
It's like a bit of imperial outside the British Museum it sort of made made sense as a attention grabber.
So, but I think music is a massive thing that people underrate.
Cuz you just put it It's relatively quiet, but it makes a massive difference to the feeling. So, you just did it on CapCut and did you think It's literally on my mobile phone. I've got I borrowed my wife's lapel mic Bluetooth lapel mic which cost 200 quid from DJI which makes the audio much better.
And then yeah, just edited on CapCut. Super super simple.
>> think about like this is the shot I wanted and then like I I wanted to pick this photo I mean the the the the one out the front of the British Museum, yes. Thank you.
Yes, I did want that one. But the It was actually the second time I recorded the video. I I went once and got some stuff that was like a bit boring and then I went back and just took a bunch of city photos and that's what ended up being the video.
Thanks.
Yeah.
Cool.
Thank you very much. Thank you.
>> [music] [music] >> Um
関連おすすめ
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 views•2026-05-29
Long-Running Agents — Build an Agent That Never Forgets with Google ADK
suryakunju
142 views•2026-05-30
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K views•2026-05-28
BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 views•2026-06-03
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 views•2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K views•2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 views•2026-05-29
3D Platformer Update - NO CAPES
SolarLune
294 views•2026-05-30











