Malmberg offers a pragmatic masterclass in balancing computational constraints with privacy, correctly identifying Whisper small as the sweet spot for local intelligence. It is a refreshing shift from cloud-dependency toward true user agency in modern software design.
Deep Dive
Talking Tuna's Talk Mode
I never really used to talk to my devices. I always found it a little bit weird. How is this faster than how I already use it? I have shortcuts for everything, and I type fast enough. Isn't this just, I don't know, weird?
Of course, this all changed with the introduction of LLMs and agentic coding tools, and what do you know? These things talk all day. They stop existing the moment they stop talking, so of course they want to talk, talk, talk. They just want more words. So what used to be your own clumsy blabbering that you would have to clean up afterwards is actually prime coal for the talking machines.
They love it.
So now I find myself talking all day, whispering into my microphone to the coding agents: "Increase padding on the left."
[laughter] It's not that this isn't weird, but now it's kind of helpful, too. And of course, I do all of my computer talking through Tuna's talk mode.
So, how do we get to Tuna's talk mode?
Usually you would set a hotkey to start Tuna in dictation mode.
Like I have here: I spawn it, say stuff, and then hit the hotkey again or press Return. It transcribes, and when it's done, it pastes the text directly. What you can do instead is press the hotkey and, while you're talking, press the Tab key. As you see, it switches from pasting to staging, which means that when it's done, it brings up Tuna in text mode with what you just spoke.
There's a setting.
Let's go there.
In the talk mode settings, you choose the default behavior: stage or paste. Whatever you choose, Tab switches between them live as you speak.
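As a rough illustration of the stage/paste behavior described above, here is a minimal sketch, assuming the default comes from settings and each Tab press during recording simply flips it. The names `OutputBehavior` and `DictationSession` are hypothetical and not Tuna's actual code.

```python
from enum import Enum


class OutputBehavior(Enum):
    """What happens with the transcript once dictation finishes (hypothetical model)."""
    PASTE = "paste"  # insert the text into the frontmost app
    STAGE = "stage"  # open Tuna in text mode with the transcript staged


class DictationSession:
    """Hypothetical sketch: the default comes from settings, Tab flips it live."""

    def __init__(self, default: OutputBehavior) -> None:
        self.behavior = default

    def on_tab(self) -> None:
        # Each Tab press while recording toggles between the two behaviors.
        if self.behavior is OutputBehavior.PASTE:
            self.behavior = OutputBehavior.STAGE
        else:
            self.behavior = OutputBehavior.PASTE

    def finish(self, transcript: str) -> str:
        # Describe the action instead of touching the real system.
        if self.behavior is OutputBehavior.PASTE:
            return f"paste: {transcript}"
        return f"stage: {transcript}"
```

Starting from a `PASTE` default, one Tab press means `finish("Increase padding on the left.")` describes staging instead of pasting.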
As you see, my shortcut here is in the hotkey style. I have it bound to Hyper-S,
and in toggle mode. That means I press the hotkey, let go of everything, and it keeps recording.
I hit the hotkey again when I'm done, and that just works for me. But there are a bunch of ways to configure it. In modifier style, you can just hold, in this case, Right Option, and it records while you hold. You can set it up with the Hyper key or some other combination, whatever you like.
But this specific configuration works for me.
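The two activation styles mentioned above (toggle versus hold a modifier) can be sketched as a tiny state machine. This is a minimal sketch under the assumption that the app only sees key-down and key-up events; `HotkeyStyle` and `Recorder` are hypothetical names, not Tuna's implementation.

```python
from enum import Enum


class HotkeyStyle(Enum):
    TOGGLE = "toggle"  # press once to start, press again to stop
    HOLD = "hold"      # record only while the key is held down


class Recorder:
    """Hypothetical sketch of the two activation styles."""

    def __init__(self, style: HotkeyStyle) -> None:
        self.style = style
        self.recording = False

    def key_down(self) -> None:
        if self.style is HotkeyStyle.TOGGLE:
            # Each press flips recording on or off.
            self.recording = not self.recording
        else:
            # Hold style: pressing starts recording.
            self.recording = True

    def key_up(self) -> None:
        # Releasing the key only matters in hold style.
        if self.style is HotkeyStyle.HOLD:
            self.recording = False
```

In toggle style, releasing the key changes nothing, which is why you can "let go of everything" and keep talking.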
We have other settings, like pausing media while it's recording, or just reducing the volume. Actually, I want that flipped.
[laughter] So this is how I have it configured. And of course, all of the dictation done in Tuna's talk mode happens on device, so nothing gets sent to any server anywhere.
I've found that the on-device models work just fine for my needs. You don't have to send anything to a server; it all happens on device, completely private, completely just for you, using the resources you already have available. In Tuna Free, which is free and always will be free, [laughter] you get access to the built-in Apple speech dictation and also the tiny Whisper model, which you can download and use inside of Tuna here. It's totally great and doesn't need anything else. I think it's English only, but maybe you are, too.
[laughter] But if you upgrade to Pro, you get access to more models: this whole suite of Whisper models plus some others, like Parakeet. We have even more models if you're the tinkering type who wants to use all of the things.
From my testing, I found that this one, Whisper small, is actually the best trade-off between how big the model is, how much space it takes on your disk and in your RAM, and the results. I'm also Danish, so sometimes I speak Danish,
and that's why I need it to be multilingual. I actually did a test, because I wanted to figure out which of these billion models is actually the best one: the best compromise between size and effectiveness.
What I did was build this tool, which is another thing with all these agentic tools: it used to be you would spend days just building a testing tool like this, but now you can just say it out loud into the microphone. "I want a tool that compares all the on-device speech-to-text models and then gives me a resume. Not resume, a summary. Then give me a table I can sort by wall time, and also write a review of the results, because I can't even be bothered to..." And then it just happens.
Totally crazy. But that's what I did.
I built a tool that takes the same audio samples, sends them to all the available models, and then ranks them by time, efficiency, disk use, and RAM use.
What it found was the same as what I've experienced: the Whisper small model is probably the best compromise.
You can find the report linked from this tweet,
along with the actual text that it parsed.
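The comparison described above (same audio samples through every model, sortable by wall time) can be sketched as a small harness. This is a minimal sketch, assuming each model is wrapped as a plain function from an audio path to text and that a reference transcript exists for each sample; it scores accuracy with word error rate as an assumed metric, and it does not measure disk or RAM use. The model wrappers are hypothetical stand-ins, not the author's actual tool.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A "model" is any function mapping an audio sample (path) to text.
Transcriber = Callable[[str], str]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[j] holds the edit distance between ref[:i] and hyp[:j] for the current row i.
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            above = dp[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[j] = min(above + 1, dp[j - 1] + 1, prev_diag + cost)
            prev_diag = above
    return dp[len(hyp)] / max(len(ref), 1)


@dataclass
class Result:
    model: str
    wall_time_s: float
    mean_wer: float


def benchmark(models: Dict[str, Transcriber],
              samples: List[Tuple[str, str]]) -> List[Result]:
    """Run every model on the same (audio, reference) samples; sort by wall time."""
    results = []
    for name, transcribe in models.items():
        start = time.perf_counter()
        outputs = [transcribe(audio) for audio, _ in samples]
        elapsed = time.perf_counter() - start
        wers = [word_error_rate(ref, out) for (_, ref), out in zip(samples, outputs)]
        results.append(Result(name, elapsed, sum(wers) / len(wers)))
    return sorted(results, key=lambda r: r.wall_time_s)
```

With real models, each `Transcriber` would call a local speech-to-text engine; sorting by `mean_wer` instead of `wall_time_s` gives the accuracy view of the same table.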
Here's the microphone configuration. I really like how this turned out. I don't remember where I got the idea, but it must have been from somewhere. The idea is that Tuna remembers all the devices you've had connected, and you can sort them by priority in this list. So when I'm in my office here and have it plugged in, my fancy Shure mic is connected, and when it's connected, I want Tuna to use it as the primary. Otherwise, it should use the MacBook microphone.
These are my AirPods, and of course the Cam Link, the jack, and all these meta devices that can record system audio. I never want to use those, and I never want to use my iPhone microphone.
So the list is sorted by priority, and Tuna falls back to the next available device. Works kind of nice.
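The priority-with-fallback behavior described above boils down to walking the sorted list and taking the first device that is currently connected. This is a minimal sketch, assuming the list of connected device names is already known; the `never_use` set models the devices the speaker never wants, and all names here (function and device strings) are hypothetical.

```python
from typing import AbstractSet, Iterable, List, Optional


def pick_microphone(priority: List[str],
                    connected: Iterable[str],
                    never_use: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Return the highest-priority connected device that isn't excluded."""
    available = set(connected)
    for device in priority:
        if device in never_use:
            continue  # e.g. system-audio capture devices or the iPhone mic
        if device in available:
            return device
    return None  # no usable microphone is connected
```

With a priority list like `["Shure mic", "MacBook Pro Microphone", ...]`, the Shure mic wins whenever it is plugged in, and unplugging it falls back to the MacBook microphone without any manual switching.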
You can also provide a vocabulary, which is sent to the model when it transcribes, though I'm not actually sure how well this works.
But it's in there. Of course, as with everything in Tuna, it all composes. So if you want to, say, launch talk mode from another thing, with another thing and another, I don't know your ways; you do you. You can switch between modes in Tuna.
So say you've got the talk mode switch command up: hitting Command-K brings up this context menu for the command you have staged.
You can add a global hotkey or add it to combo mode, but you can also copy the run URL.
Then, from whatever app you want (sorry, let me just get this into place),
you can run it, and it opens Tuna directly. And of course, when it's ready, it does what you've set it up to do, which in this case is pasting directly. [laughter] Oh, look, it used some of the vocabulary. So yeah, apparently it's in there.
We also have the Tuna CLI now,
which you can use to do the same thing. In this case, we pass what's in the URL we copied here:
first the subject, and then the action, which is called Tuna core common actions. I think we can just pass "switch" as a string here. Yeah, this works.
So it even works through the CLI as well.
So that's a brief tour through talk mode in Tuna, which, as you see, is just a way to get stuff into text mode, and we've talked about text mode in another video, somewhere around here. [laughter] Is he hearing voices? Is he starting to talk to his devices?
I hope you'll find, as I did, that these on-device models are perfectly fine for dictation. At some point I want to add the live dictation mode that some apps have, where text starts to appear as you're talking. But I haven't found the on-device versions of those to be fast enough to be great, and when they aren't, they just end up being distracting. So it's going to get there eventually, but currently we have this, [laughter] I don't know,
two-stage thing going, where you talk, then it transcribes, and then the text gets in there.
Which works fine for me.
That's Tuna's dictation mode. I hope you enjoy it. And of course, if you want more Tuna tips, the best place to go is this channel.
I hope you'll subscribe for more and I'll see you in another video.
Bye.