The shift toward local LLMs is a strategic move to reclaim data sovereignty and hedge against the inevitable "API tax" of cloud giants. It marks a necessary transition from merely renting intelligence to owning the infrastructure for long-term technological autonomy.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
時代はローカルLLMか / API価格のさらなる高騰が予想される / Claude CodeもCodexもローカルLLMで動かすAdded:
Yes, good morning. So, I'm going to start streaming. Uh, today is Tuesday, May 26th, 2026, a little after 9 AM.
Yes, good morning.
Well, today's broadcast will also be streamed simultaneously on Substack, so please feel free to listen to whichever you prefer.
By the way, just wait a moment.
Oh, are you okay? I thought the internet connection might be a bit shaky, but it's fine.
yes. Good morning everyone.
Yes, so yesterday I was broadcasting from a hotel, but today I'm broadcasting from my home in Osaka.
Well, there's this web media called Web Manager Forum, which has been around for a long time.
Yesterday, I was talking about it at various places like Konshinkai, and it seems it's been around since 2006. So why is the Web Manager Forum 20 years old now? It's exactly its 20th anniversary.
I started my blog and became a systems engineer in 2005, so it's amazing to think that it's been around ever since.
yes. Well, at least when I was working at an agency, I was always looking at the Web Manager Forum, so I guess that was about 10, or maybe even 15 years ago.
yes. Well, I feel very grateful to be able to appear at a real event related to Fetant Shagorham, something I've been watching for a long time. Yes, there were lots of them. I was surprised.
Well, it was a talk about AI agents, and I talked about use cases, but to explain the background, there's a Mr. Matsui from GMO, and he asked me if I'd like to attend this event, and I said, " Okay, I'll go," and that's how I ended up participating. The venue itself was huge, with about 120 people. Well, the venue is divided into several sections, and other sessions are also taking place at the same time. I think there are about four rooms? There are about four or five rooms, and different people are speaking on different themes in each one.
Thankfully, I was in the largest room. The venue had a capacity of about 120 people, but it was completely full. It was truly completely sold out.
Well, since this was a free event, I thought there might be a lot of last-minute cancellations, but surprisingly, that was n't the case. Well, I suppose there were some, but the seats were still full from front to back.
Well, we talked about various AI agents, and what we do at our company, like AI mentors, gateways, podcasts, and various other things, so we talked about those things, and then, in the middle, well, I mean towards the end, Mr. Matsui from JMO, who was on stage with me, said, " Wait, wait." There's no microphone. I did it again. The microphone wasn't attached. yes. I've attached the microphone now, so I think the volume might be louder than before. How is it? Yes. Towards the end, Mr. Matsui from JMO, who was on stage with me, asked how many of us were watching Web TV, and I replied that, well, saying 1/3 might be an exaggeration, but about 1/3 of us were watching. That's quite a number, isn't it?
Well, actually, I think there will probably be some people who don't raise their hands when I ask that question, so if there are any of those people, I guess they're probably 33 or older, but that would be great, wouldn't it? At that point, well, in the end, when it was time to exchange business cards, I asked people who wanted to exchange business cards to line up, and they lined up and a lot of people said things like, "I always watch your YouTube videos," so I thought, "Ah, I'm glad I started doing YouTube." Well, if I hadn't started a YouTube channel in the first place, I wouldn't even have had the chance to appear on it. The voices have become much closer. thank you. That's right. We're getting closer in age, you know. Well, the camera is right here.
yes. So, well, this time, the web manager's response was good. I hope they'll invite me again. Things like that. Surprisingly, or rather, surprisingly, I don't get approached at all. Well, I think a big reason is that I don't have many connections, and I don't really connect with a lot of different people because I find it troublesome, so the chances of getting contacted or anything like that are lower. Well, I do like these kinds of events, and if I'm invited, I'd definitely go.
yes. Good morning to those who are late. good morning.
yes. That's because I went to Tokyo yesterday, or to be precise, yesterday was Sunday, or rather, on the last train on Sunday.
Well, this is something that will definitely happen at that event, and it's a big one, too. It's the kind of event where more than 10 people come to listen offline. Well, I thought it would be really bad if I didn't go even if there was some kind of trouble, so I decided to join the group in advance, stayed in Shinagawa, and then headed to the venue on the day of the event, but I did n't go that early because I had some other things to do. It started at 10:30, and well, they were also live-streaming it. I did a live stream, then had a meeting, and then went, so I ended up arriving just before 3 PM. No, but you see, I went from Nagaoka to, um, Akasaka, but it was seriously so difficult, the difficulty level was so high, wasn't the train difficult? Isn't it difficult? I don't know, I don't know. Are they the same?
No, is this JR? Is it JR or the Tokaido Line? I couldn't figure out if the flags were different or something, so I seriously couldn't ride about four of them. I went to the platform I thought was the right one, but it was n't, and I was at the next platform, watching the train go by. I then realized that another platform would be faster, so I went there, but I missed that one by a hair's breadth, so I went to another platform, but that was also a different train, and so on. No, seriously, it's really difficult. That's amazing.
Well, there are a lot of people riding the trains in Tokyo, after all.
yes. Well, well, well, at the web manager forum, I also spoke about AI agents, and there was a lot of talk about AI. Well, I haven't listened to anyone else's, or rather, I haven't been able to listen to them, but I do know the titles and stuff like that, so even if I had seen that beforehand, I'd probably know about half, or maybe more than half, about 70%.
Yeah. Saying 70% might be an exaggeration, but it's probably closer to 60%. About 60% were like that. It was related to AI, wasn't it?
So, if there's a chance someone who was here yesterday came, thank you very much.
yes. That's about it for the forum for the inexperienced staff. Well, I think it's fun to hold offline events occasionally, so please invite me to something sometime. Invite me to the event.
yes. So, as I also wrote in the title, I wrote that the local LM era was called local LL LM. No, actually, I'm quite serious about this, and I think that local LL Remmaji would be fine. Well, recently, a lot of people are working with brands like Claude Code and Colex. I think there are also a lot of people using anti-gravity, and many people are using that kind of thing, but you have to consider the cost and things like that when you use it, and when you think about all that, I've come to the conclusion that local ALM is the best after all.
No, seriously, I'm quite serious about this, and there's talk that API fees are going to go up drastically from now on. Well, even now there are still some places where I think the prices are quite high, but I think they'll go up even more.
Well, you have to make all sorts of investments and things like that, and the more computing resources you have, the more advantageous it is for learning and theory, so everyone keeps investing, but then you have to get it back, right? If we want to recoup our investment, we'll have to raise prices somewhere, because it'll be tough otherwise.
So when that happens, well, it's an extreme case, isn't it? If API fees doubled, or even increased tenfold, would it still be usable? I doubt it would be very practical. Well, in that case, local LLM, or local LLM, is like the usual, uh, chat GPT, or, well, GPT 5.5, or Opus 4.7, well, those are cloud LLMs, right? It's a system where you download models from a server via the internet and use them, but with local ALM, you can download it to your own computer, or any computer for that matter, and use it freely there, so it doesn't cost any money. That's all I need. Using that original model is free, by the way.
No, that's why I think that, in the end, it might be the strongest.
Considering all the factors... Well, what I mean is, ultimately, as I just mentioned, the fact that Local Eleb is free to use is a big plus, but also, the security aspect is incredibly reassuring. Well, I also say on X and other platforms that we shouldn't send out information and things like that, and it's a topic that often comes up on social media, isn't it? When that happens, well, with regular cloud Edelem, if you send confidential information, it goes to their server, so we don't know if it will lead to information leakage or anything like that, but if something happens, we have to think about the risks and stuff, which is a problem. But with local Edelem, well, if you install it locally on your own computer, it will work even if you're not connected to the internet, so your login information won't go anywhere, and it's safe and secure, that kind of feeling.
So, in terms of access to highly confidential information and such, it's reassuring, and there are no fees, so what do you think? Don't you think local LLMs are the best?
Well, of course, there are also disadvantages. mosquito. LL Rem, well, one thing is that it's closed, and as I mentioned earlier, it doesn't have the performance of the cloud-based LM walkway.
Huh? Why is it GBT5.O Pass 4.7 or something? Compared to those, well, its performance is slightly inferior.
Well, compared to the previous version, the performance of the locale and mums has improved considerably.
So, you can use local LAMs that are not that different from the latest language models from about six months to a year ago. Well, the quality of functions and designs held by big companies like OpenAI, Athropic, and Google is good, and local LAMs dominate in terms of text length, but yes, that's true. So what's the secret here? It's that when you look ahead, uh, five or ten years, well, not five or ten years, even in a shorter span, uh, two or three years ahead, uh, there's a chance that AP fees might go up a lot, uh, well, when you consider security issues and so on, even if the performance is a little inferior, using local LMs, LL Remo, is more advantageous.
So, in the end, the local ALrem itself evolves, does n't it? Well, right now, you know, the JMA4 that came out last month, or the JMA4 9.6, are pretty good value for money. Well, it can run even on machines with not-so-high specs, so there are even bigger local LMs available. Then the problem arises that the specs aren't sufficient and it won't run.
This time it's going to be a hardware issue, so it's going to be quite difficult. Well, it's difficult to do with a regular PC, so you'd have to buy a proper GPU, like a Vilia GPU, and run it on that. Well, of course, that's not necessarily a bad thing. I think it's fine if they do it that way.
Well, these days, it's pretty hard to find, and you can't buy it even if you have the money. If you want to buy a good GPU, it can easily cost 1 million or 2 million yen. no matter what.
Well, I'm saying that investing in hardware is probably the most cost-effective option if you look at it over a relatively long period of time.
Well, I mean, if you have a GPU or something like that, and you're the one who owns it, then when a new model comes out, you can just switch to that model and get the latest one for free. Moreover, if you're talking about confidential information, you can use it without worrying as much as with the cloud, so isn't that better? I mean, you never know what's going to happen, right? Right now, there's Claude Code and Clock, oh wait, not Clock.
Um, Codex, huh? I'm familiar with things like Claude Codex Anti-Gravity, and even just AI agents in that area.
Well, maybe anti-garbage is only so far gone, but right now we do n't know what's coming, so while the performance is somewhat lacking at this point, considering that the performance will definitely improve, I think the local LM option isn't a bad one.
yes. It's difficult. But in the end, well, if I had to say, it's a hardware problem. As I said earlier, it depends on what kind of GPU you can use and what you want to do with it, and if you're going to make a real investment, then it's a matter of having tens of millions or hundreds of millions of yen to spare. If you set up that local environment without thinking about it... Well, there are various language models, after all. Anything truly good inevitably requires high-spec machines.
Well, I do n't know if anyone will be interested if I talk about this again somewhere, but I'm thinking of talking about local LA again sometime. There are also various things to consider on the hardware side, so I'd like to talk about that too, although I'm not sure if there are many people who would be interested. To run a language model, you need what's commonly known as V-RAM memory, and its associated memory area.
Well, first of all, the question is whether the language model will even work, and even if it does, how fast will it be able to return a response? In short, the model itself won't work if it doesn't have enough VAM. Therefore, output is not possible. Even if there is enough VRAM, if the waiting time is too short, it will take too long to generate. It takes too long to produce output, making it practically unusable.
Well, you might say, "Why not just use something with a large VRAM and large memory, and a wide range of capabilities?" But then it becomes expensive. It's going to get ridiculously expensive. Those GPUs and such are, well, they're not even available in the first place, but that makes a huge difference.
yes. So, well, I think local REM is really great, but it's something I can't really afford to do myself, or rather, it's difficult to do so. So, if there's anyone who would be willing to pay to help me establish such a relationship, please let me know.
Well, I'd like to build it myself for around the cost of my own money.
Well, if you have a machine, you don't really need anything else. Well, I don't think many people have the necessary machine, so you'll need a GPU. oh yeah. Knowing when to use them is important. It's simply a matter of using them appropriately depending on the situation. I just can't seem to decide. If you put it that way, there's no single right answer as to what to use, like with local revs and such. It really depends on how you want to use it, how much you want to use it, and so on. So the type of machine you need will be different, and the GPs and local curries will also be different and change. Well, that's precisely why, no matter what you do, you'll end up needing a GPU, so investing in one isn't a bad idea. Oh yes, that's right. You know it well. DGX Park costs over 1 million yen, right? I agree. It will exceed 1 million.
Well, there's this machine called NVD's GDX-PAR, and it's really small, but it's like a supercomputer, an AI supercomputer.
Well, it has GP and stuff, but, well, it depends on the purpose, I guess. Rigid Spark is good, but I don't think it's that cost-effective.
The memory is quite large, I think it's around 128GB, but the standby space is quite limited in the DGX Park.
Well, basically, it's about the time it takes to output that data, but how long was that again? I think it was probably around 200GB per second.
Well, there are definitely some things that are a little lacking.
Well, if you're using a fairly large and intelligent model, it can get quite sluggish. But if you're using it for training or something like that, I think it's fine. I'd say it's more on the LLM side. Well, since it's made by the NBA, it's naturally going to be a good match for Kula.
If you're going to buy something like a DGX knockoff, you might as well just buy a Mac. Macs offer much better value for money. No, Macs are amazing. It's not for nothing that everyone would choose to buy McDonald's if they had to choose raw curry. Seriously, with Macs, well, to begin with, the memory is called unified memory, so the CPU and GPU use the same memory, or rather, they share it. That's why the bigger ones have 128GB of memory or more. And that's how it originally was. This story is also quite interesting, and, well, what was it, Mac Studio or something like that, a little while ago, about two months ago, I think they were releasing one with 512GB of memory, which is quite large. Even 512GB models don't have that feature anymore. 96, perhaps? There's only 96GB left, and what this means is that memory is being wiped out and depleted worldwide, so 512GB is no longer available.
That's because the MacBook Pro, for example, probably has 128GB of memory available right now. And the standby time was pretty fast, I think it was around 800, 800GB or something like that. So, it's about four times the price of the GX Percentage I mentioned earlier, and if we're talking about the price, it's probably around 800,000 yen, like the MacBook Pro, so the GX Percentage would be about twice that price, but the memory is about the same.
Well, if we're talking about the waiting time, in terms of waiting time, um, if using a Mac, like a MacB, and setting it to full Spanish has about four times the waiting time and is faster, then that's definitely not good.
yes. Well, when you're dealing with things like water theory, the GPU's memory becomes important. Well, it's what's commonly called V-RAM, but normally V-RAM and regular memory are different. The memory used by the CPU and the memory used by the GP are different, but if you've ever built your own PC, you'll understand this better. I also built my own PC when I was a university student, and I think building your own PC is really fun. I recommend it. You really understand it better when you try it. It's really funny. We haven't even decided what to make or how to make it in the first place. Well, of course, everything became one. There are also DIY PC kits available, so if you buy one of those, it comes as a set, but I wanted to choose things myself, so I chose the motherboard, the CPU, the power supply, the storage, and so on. Even then, I was like, "Oh, I see," when I saw that there were slots for things like that, and I was like, "Oh, so you put it in here like this." So, if I buy a GPU or something, I'll just plug it in here, right?
So, you plug the memory into the motherboard, and well, memory is like this board, you plug this into the motherboard, but separately from that, you also need something called Vm that the GPU uses.
The price is high, or rather, there are NVIDIA GPUs, for example, and I think the RTX 5090 is commonly used, but even the 100 series costs around 1 to 1.5 million yen, I do n't know. If you were to buy it normally, it might cost more, but right now, V-RAM isn't that big, is it? I think the capacity was 32GB. I think it was around 32GB, but the standby range was really wide, probably around 1900 or 1.9 terabytes. Well, I'm not entirely sure, but it's probably around 1.9. But it's fast. And since I don't have much VM space, and it's only 32GB, I can't run anything too big.
Oh, but as I mentioned earlier, um, if I were to say, GGX Park and the like have 128GB of memory, so they can run on larger models as well. However, the waiting window is n't very wide, which means it's slow. And, the larger the memory capacity, and the wider the range of body types, the more expensive it gets. It's really expensive. Well, you often hear about NVIDIA 's H, H200, AB200, and more recently, the GB200, in the news and such, but those are on a completely different level.
So, the memory capacity is in the hundreds, and the endurance range is, for example, 8, 8TB, so compared to the DGX Spark, uh, NVIDIA's, well, what's called the smallest supercomputer, it's completely different. So it's like 200GB, right? That's a lot. And for example, the B200 NVIDIA GPU has a score of 8, so what's the difference? Is it 40 times? Yes, the number of pulses it can output per second is about 40 times.
Well, even with H, well, I mean, well, it's tough if you ca n't afford something like that. Well, the price is really high. I do n't think you could even buy it even if you had the money. It was something like a P200. How much is it? Probably around 7 million.
Yes, it's incredible in just one hit.
Well, the next one up is something like the GB200, but that's not sold separately, is it?
Apparently, they're sold as a complete set, which I didn't know about either, but while I was researching, I found out that even if you ask about things like the B200 or the H, it's not something you can easily buy. I was researching it for some reason, and I found out about the NVD GB200, and well, to put it simply, it's much bigger than the VRAM memory I mentioned earlier, and it has a wider standby range, so I was like, wow, that's amazing. So, you know, those kinds of things, like GPT5.5 and Opus 4.7, are used in those kinds of places, and I wondered how much they cost, so I looked it up, and you know, they're not sold individually, they're all bundled together in a set with CP and GP and all sorts of other things, so I thought I'd buy that GPU, the GP200, which I think has 70 or so chips in it. Like 300 million yen. He said 300 million yen.
300 million yen, well, companies all over the world developing AI are fighting over that much. No, seriously, that's terrifying.
Well, that's why we have to consider the balance in that area.
So, of course, it's important to be able to run large models quickly and intelligently, but if it were that easy, the whole world would n't have any problems. So what do we do? Well, it also counts towards the number of people using it simultaneously. If you're just one person and want to try out a local LLM, it's not that difficult. Like the JMA4 I mentioned earlier. You don't need to worry about it backing you up at all. Well, it's more about the spec level, but it will work. With 32GB of memory, it's more than enough to run smoothly, so that's fine. By the way, my Mac has 32GB of memory.
yes. I tried installing JMA4 on this Mac and it works perfectly fine. Well, I'm not going to comment on it right now, but whatever.
Well, but something on a larger scale, like you and me. And, um, what else? Minimax, I guess. There's a type of LM (Learning Machine) like that, but it does n't work with that kind of thing. It's a bit too large a scale to handle.
a.
Ah, Deep Seek.
Deepseek is really cheap. Promax passed with a GPT score of 5.5. I agree.
Since it's a replica cheek product, I think it's pretty good value for money overall. I use it regularly too. Well, since APIs and such sometimes use DPC, it's perfectly fine, right?
Well, it's open source, but I do think it's quite a lot of work. That Deep4 Pro Max is quite large. It's probably something like 1.6, 1.6 terabytes, right? The parameters and all. Let's investigate. It should have been over 1, right? Your cheeks are really sticking out. Yeah, that's right.
1.6 terabytes, 1.6 terabytes, those are the parameters. And the number of active users is 49 billion.
So, what's known as " moe" architecture?
Well, even with active users, it's still only 49 billion. Parameters need to be reasonably large, otherwise it's a real pain to make it work. Well, actually, it's 1.6 to begin with. parameter. It won't run at all with poor memory.
So, even the 128GB version won't work at all, right?
Well, you know, there are probably even more of those. You'd probably need something like a 512GB VAM (Virtual Memory Module).
Well, it is 49 billion, after all. Even though it's active, to put it very simply, MOE is a mixture of espertoire, and it's like each expert in their respective field has one super-intelligent language model, or rather, experts go into it and run only this one expert depending on the task. It looks like only this part will move. Well, if it's coding, it's like only having coding specialists run it, and instead of running the whole thing every single time, you only use a part of it.
So the number of active parameters is something like 49 billion. In reality, the total number of parameters is 1.6, but in practice, it means that 49 billion is enough when it's actually running. So, in terms of meaning, well, if you want to run 1.6 terabytes, you need a lot of VRAM, and if it's not wide enough, it won't even run at all, or if it does run, it will be extremely slow. I wonder how long it will take if I go there before I get back. good morning. It'll take me 5 minutes to get back. It ends up being something like, "It will take 5 or 10 minutes."
Well, it's useless like that.
So, if you don't do something of a certain caliber, the performance will improve because the number of parameters is large. Well, if I say yes, it will probably cause a lot of trouble and people will say that it's not like that.
But, well, the performance is, well, what, just because there are many parameters does n't mean that the performance is high, but, well, the ones with high performance do have many parameters these days. So, even with a small number of parameters, like the part-language model, there are some good ones coming out, so you ca n't say that's always the case, but I do think that smarter ones tend to have a large number of parameters. For example, deep seek nowadays uses 1.6 terabytes of data. It's gotten quite big, hasn't it?
Because if it's Gemma 4, Gemma 4, then it's like that. Well, the Gemma 4 is pretty good in terms of performance, but in terms of the number of parameters, well, there are probably various models, but it's like 30 billion parameters. Well, in simple terms, um, 1/5 or something like that in terms of size, with parameters, well, so we want to use the smartest one possible, but to use a smart one you need machine specs, so that's the point. I was looking at Dospara (a computer retailer) and the prices were so high that I gave up, so I'm thinking of trying out a Mac mini with either 32MB or 64MB of memory. Does Macmin still have a 64-bit version? The largest one is 64-bit. No, really. I'm seriously lacking memory, so I wonder if I can buy a 64-bit system?
Right now, I might only have a 48-bit system.
Looking at it now, the M4Pro chip has 14 CPU cores and 20 GPU cores, with a standby memory width of 273GB. oh.
Well, that's why it's about the same as GGX Park. Speaking of waiting... But the price is cheap. Well, the V-ram is a little small though.
Around 300,000.
48GB, right?
Two units would be 96GB. If you buy three, it's 144GB. Buying about three of them would cost about the same as a DGX puff.
Well, even if you buy three, it's only about 1 million yen, about 1 million yen, and I think the Rigidx rip-off will cost a bit more too, so if you compare them based on that alone, it's probably cheap.
Wait, it seems like someone programmed an improved version of the DeepSE4, which has the same performance as the Deac model, and now it runs on 90MB of memory. That's amazing! Wow, that 's amazing! It should work with an M3MAX 512GB. Yeah, I guess so. Yeah, this part is complicated too. The type of memory you can choose from varies depending on the time and situation, and the chips are also completely different, so it's not necessarily true that the latest one is always better. Even with Macs and things like that, right now at the Mac Studio it's the M3 Ultra, right? The M3 Ultra, how much is this?
32 CPUs, 80 GPUs, and it costs around 800,000 yen, maybe 900,000 yen, and the memory is 96GB, right?
Originally there was probably more, but now it's 96GB. The number of people waiting for this is 5819. About four times as much as before. Four times is an exaggeration, isn't it?
Well, considering the waiting time is about 3 to 4 times longer, it's good value for money.
Wow, I'm glad I bought that mini PC with 96GB of RAM for a cheap price of 170,000 yen at the end of last year. Well, if you buy the same medal model, it'll probably cost around 270,000 yen. It definitely has 96GB of RAM, so what is that Mac?
96GB... yeah, Macs are strong. If you include options like this local LL Rem, well, it's not like McDonald's is always the best choice. It's complicated, isn't it? Macs do n't really have a dedicated GPU that you can use later, basically. If you're using Windows, well, if you want to do it, you just need to plug in a GPU. Ah, I see. So you mean the V-dam, or that thing? The system memory is 96GB.
yes. So, well, you know, we've somehow ended up talking about local elements and hardware, but today, when you normally use LM or AI in the cloud, you don't really think about it that much, right? The reason for that is that they possess the machines and such, like OpenAI and Anthropique. So they've invested an enormous amount of money, and they have tons of those incredibly high-performance GPs I just mentioned, like the H, H200, B200, and GB200, and they're running them, so they've invested trillions of yen in the future to do this, and on our side, we're just using them for internet-related purposes. Well, the fact that the cost of automatic transmissions is going up is just unavoidable, isn't it? The amount of money invested is already abnormally large, so if you want to recoup it, it can only be very expensive. Well, but right now, you know, so, well, GPUs and memory and things like that are running out, so yeah. At some point, the cost of using AI will increase, so if you think about that, it might be better to own your own hardware, like a GPU or something like that. They came quickly and steadily.
Okay. Hmm.
amazing. Everyone teaches me all sorts of things.
People who like this really, really like it. I think people who aren't interested are just really not interested.
Where did we get to Teki?
Oh, a reward-based distributed single-reward calculation framework that's easy to implement and can solve local sex issues and cloud pricing and security problems would probably go viral. Well, if it does come out, it'll definitely go viral. Well, that's the difficult part. I suppose if it were that easy to produce, there would n't be any trouble. I heard that NVID drivers are now compatible with macOS. Oh, really? I heard Apple and NVIDIA don't get along.
NVIDIA GPUs and similar products aren't compatible with Apple devices. That was my understanding.
Oh, I'm happy. thank you. I always enjoy watching it. I'm having fun too. So, thanks to everyone, I'm having fun making YouTube videos.
Mac's Unified Memory is great because it's an affordable way to get a lot of memory.
The iteration is slow, though. Yeah, I guess so. Well, when you try to get it running, it depends on what you're doing, but VRAM, uh, allows memory sharing between the CPU and GPU, so it's easier to run things like LLM. Is it too late to build an online AI system when cloud AI services are raising their prices? Well, that's a tough one. If you ask me if it's too late, I think it's perfectly fine, and that's probably how it's going to be, but, well, if it's going to turn out that way anyway, would n't it be better to do it that way from the beginning? So, in that case, you wouldn't have to pay any API fees, right? If you want to use that cloud AI, you have to keep paying for it, and while its performance is somewhat lacking, I think it's nowhere near as unusable as it used to be. I think it's perfectly usable in real-world applications. Even the current open-source LLM (Low-Level Memory) is like that. So, the point is, wouldn't it be better to get an online GPU or something like that now and run open-source software on it?
Then there won't be any API charges or anything like that, starting now. And from now on, GPUs and things like that will become more and more expensive, and so will memory, but the situation where GPUs and memory and things like that become scarce, the price goes up, and they become harder to get, that's the situation now, but I think that will probably accelerate more and more. Well, there are places that will buy up whatever they have, so there's absolutely no sign of that stopping now. And when I think about it that way, I'm starting to think that it might be more rational for me to spend my money on building something like that from now on. Well, the scale of it will be bigger, so if you ask me if it's actually possible, I'd say it's pretty questionable. I mean, what do you think? I'd like to hear everyone's thoughts and opinions on that. Conversely, I think there are really a lot of different opinions, but, uh, the one that runs on 99 bytes is the anti-sequence version of DeepSeekV.
Oh, it passed the benchmark with Flash GPT 5.5. Wow, that's amazing! Seriously? Really? The RTX 5090 is powerful. Well, yeah, I guess so. It seems like it often comes up as an option, doesn't it?
GPU.
Well, the VAM itself is probably around 32GB, but it's not that big. The atmosphere is vast, isn't it? I think we can get pretty close to that point, so it's kind of fast.
Well, it depends on what you want to do.
That too. That's why I'm letting them do those kinds of things, and I want them to do those kinds of things. I'd like to use something like the RTX 5090, and well, you don't have to use just one; you can use two.
Therefore, if you're going to use that kind of thing, there are advantages to using a Windows PC. It 's a local LLM, but it's possible to install a high-performance model on Google Collaboration and use it; you can do that.
can.
Well, it depends on what you want to do with it, but if you keep it running all the time, it's going to get expensive.
That's exactly it. Google collaboration pricing. You could use H or something like that. It's expensive, but ca n't I go? Oh, I might not be able to go. Up to A100, huh? yes. I don't understand right now.
What's a GPU that's faster than a fast GPU? That online article was like that. It seems like only old things are showing up.
Ah, the A100 is about $3.50 per hour, I think. That's pretty good. If you have an hour, it's $3.50. If this costs 150 yen, then it would cost about 500 yen per hour. You don't need to use it yet, right? I think there are quite a lot of cases where Google Collaboration is used when having students practice things like grips.
However, it's difficult to keep it running all the time. So it depends on what you want to do. Well, if you want to fine-tune it, then that might be fine. After discovering this channel, I developed software for my company using Claude Code on GitHub, and my company recognized my work and even gave me a prize. That's amazing! It says, " Vivid 2419 is amazing!" That's amazing! But congratulations! Since I couldn't predict the global situation, I was prepared to take the hit if the cloud market went up. If you have the ability and resources to set up an on-premises system yourself, that's a viable option. Yeah, that's right. Yeah, that's right. yes.
Well, it costs money, you know. Ultimately, the biggest challenge is having to spend millions of dollars upfront. And if you have to put things on it, and it gets bigger, then you start to have issues with heat and noisy fans.
It's exactly the same kind of problem that data centers and similar places face, but the question is, how do we deal with the heat?
So, even if we want to build a data center or something like that, the nearby residents are against it. If you're trying to create something like that, the way you use it in development is probably quite different from how you use code.
Well, if you put it that way, I think it's not really related at all.
Well, I don't use Claude Code either, but I think it will be used if you install it locally, so you can just use it as is. It's just about what moves. Well, either way, nowadays, there was the Claude Code source code breach incident, so there are Claude Code-like tools floating around as open source, and if you use those and open source A-Model, you can totally go with that. I think it will be something that can be used normally. I think that part will be fine.
Ah, but I think the bigger problem is what to do about the GPU. It's annoying when older notes in the substack appear at the top. Doesn't this bother you, Naka-chan? No, actually, maybe it doesn't bother you. The reason is that I haven't watched it that much. I don't really look at substacks that much, well, I only look at them when I'm about to post something, but on the other hand, it helps me out because people can find my old notes and stuff and give me reactions. If anything, I'm grateful for it, or rather, since it's still in the very early stages, I do n't remember when I started, but I posted about a month ago. I've started a substack project. I think it's amazing that people still get likes on posts that sound like "Gyei" (a type of fish). I guess I'll just have to build my own GPU. Well, that's really true. That's absolutely true. Well, that's why everyone's making GPUs. Countries like China are also developing their own GPUs. China has restrictions from the US, so they can't export the latest GPUs, which is why H and H200 series GPUs can't be exported to China. It's impossible to get it from Chinese companies, for example. That 's why we're trying to make our own GPUs, or rather, we're improving various things so that we don't need to use the most advanced GPU configurations, and that's how deep seeking came about. So, what was so amazing about the Deepseek shock was that American companies and others are using the very cutting edge, like H200 and B200, to the fullest extent to produce their most advanced models, but China produced something comparable to that, in a state where it was restricted and unavailable, and that's what was so amazing. And the fact that it came out as open source was shocking. So that's why it was a Deep Seek shock.
Nvilia's stock price has fallen, you see. DeepSeek came out and it was like that. Building your own GPU for a custom-built PC is a bit much. If I could do that, I wouldn't be having this trouble. Wow, it's amazing that it can sell for such a high price, even if it's a genuine product. Well, but that's not possible right now. Well, but right now, really, everywhere is lacking in GPUs and other resources. If you make it, it will probably sell. If you ask me if I can even build my own GPU, I'd say probably no.
yes. So, I guess I'm retrieving it, or is the era of local REM, or is it the era of even greater advancements in API science, and it is predicted that both the code and the codex will run on local REM, so it seems like it's being retrieved. I wonder if they'll do it properly with GPU Rapidus today.
Yeah, I do have a friend who's upgraded his GPU's memory. That's amazing! Wait, you can actually increase the GPU's memory by soldering it on? That's right. They're starting from one piece, right? The GPU and its VAM are a set, so they're basically soldered together. As for system memory, you can insert it into a Windows PC, or rather, you would n't normally do that with a hack, but Windows PCs have slots, so you can insert it there. You can just plug it in without any modifications, but is n't it amazing that you can increase memory with modifications?
What kind of technology is that?
But I might not have the courage. The CPU is something like an RTX 5090, right?
Well, that's the kind of memory that makes judgments about things like that, right? So, that means you're trying to judge something that costs over a million yen on its own, right? I absolutely hate it.
Worrying is the end of the line. That's way too scary.
Well, if it were possible, it would probably be cheaper.
But no, but if that's the case, I feel like I'd end up buying another one and using two, or using multiple. It's a good idea to get used to it now. No, I seriously think it's a good idea. yes. And I'm being quite serious about this, because, well, if you're a company, you're naturally concerned about security, and I think there's a lot of uncertainty about how this will develop in the future, so, well, tens of millions of yen is unrealistic, but I think it might be worthwhile to buy a GPU for a few million yen and install something like that. Because the future is so uncertain, I don't think buying a decent GPU now would be a waste of money.
Well, either you could use that method only for the parts that require some kind of highly sensitive information, and use Cloud L-Rem for the rest that doesn't matter, or something like that. I've seen a video on YouTube of that guy creating V with an RTX4090. Really. Really. So, who is this Chinese guy?
Who is that? That's amazing. Hey, I'm pretty shocked.
Wait, you can do that? You can set it up? If I were to increase something, well, you know, rather than buying something good, I only knew about buying a set or increasing the number of sheets, so I didn't have any other options. Wow, the option to add more by soldering is amazing. No, but that's probably it.
How do you put it? Even with limited funds or a small budget, they manage to create something really great. I think the people who build that kind of environment are probably the ones who actually do it. Yeah, if you have the ability to run GPU VMs normally, then there's plenty of work for you right now, because companies are really struggling with that. No, I'd like to increase the memory, something like that. If someone sends me a message saying they want to increase the number of V-dams on their existing GPUs, or want to add more, I think I could charge a lot of money by saying, "Okay, I'll do the expansion for you." They're coming in droves. I feel like I'll get requests to kill people from all over Japan, or even the whole world. He just attached it with a heat gun and stole it. Wow, that's hilarious. What's that?
Buy a junk GPU and repair it yourself. There are people who are crazy. Uncle Channel. Oh really?
Yeah, they're strong. People who can do that kind of thing are seriously amazing. People who can make this themselves are really incredible.
Oh, Uncle Japan.
Ojioji Japan, you're amazing! There are 420,000 people!
Registered users. I didn't know that anymore.
Ah, I did indeed fix my RTX 3080 with a 250 yen part. That's funny. What are you bringing? It ended up being a rather confusing story, though. This is getting a bit confusing. Is this story interesting or amusing? What is this story about? This stream is about adding VRAM to a GPU by yourself while half-naked. That's a wrong landing. I feel like the way the sadness is being portrayed is wrong. Is that the correct answer? Let's use local LM. The era of local LLMs is upon us now. I understand that they might build their own PCs, but that's about it.
Which local REM should I use? Okay, I can understand the talk about GPUs, standby, and memory, but isn't it a bit of a leap to then suggest adding more memory yourself? It's all happening so suddenly and noisily.
Today's title: Let's build our own PC in the era of local LLM. No, actually, I think this is a pretty profound question, or rather, I think it's a really great topic.
Well, I guess that's how it ends up, right? You know, when it comes down to that, the advantage of having done it for a long time comes out, like saying, "I built my own PC when I was a student." It seems everything is connected after all. So back then, I didn't have any money, so I was really just focused on how to make it as cheaply as possible. I chose the graphics card, the CPU, the memory, and I figured out how much power I'd need for the power supply, and the feeling of accomplishment when it finally worked was incredible. I even downloaded the OS myself after buying it. I was using Windows, specifically XP back then.
Vista was still out, but I remember installing XP because its reputation was so bad. But eventually, all of that disappeared, and I don't know why, but I remember installing Linux.
By the way, I think I still have that computer. Oh, wait, maybe not. Did you throw it away? I threw it away when I moved here the other day. But I had it until then. I had it until then. I had it for about 20 years. I don't use that custom-built PC, though. I just couldn't bring myself to throw it away. I was so attached to it that I couldn't bring myself to throw it away.
Ultimately, I used Windows XP as my main PC during my student days, and I think I built my own PC around my second year of university? I'm not sure if I was 19 or 20, but I prototyped it when I was around 19, and I was doing blogging and SEU at the time, so I did all of that on that computer. I also had a laptop, a Toshiba Dynabook or something, but its performance was really bad, so I eventually deleted XP and installed Relaxed Minutes and ran that. That's why that experience is really useful. I'm living in the present moment. I think knowing what's inside is quite important. That's where they said "memory." What exactly is memory? Oh, but that thing was there too, wasn't it?
Well, like, "So you stab it like this?" I thought it wasn't moving, so I made it move a little more. I need to push it in a little further, with a click. That's something I can learn from. Also, if you build your own PC, it's surprisingly risky.
Surprisingly, because it's an Ailey, the memory and other components are often exposed, so you might think, "I can't get this out," or "What's with this case? I can't remove the cover," and then you start banging at it, and suddenly it comes off, and you cut your finger.
Actually, in the research lab at that university, there were quite a few computers like that, computers that were no longer needed, and I saw a senior student in the lab trying to use one of them, and when he tried to take off the cover, he cut his finger really hard and was bleeding profusely, so I thought computers were scary.
Computers are dangerous.
Surprisingly, there was an article about making a lot of money by creating graphics cards with increased memory. No, no. It's not that kind of meeting. It's not that kind of meeting. Someone who accidentally short-circuited their PC wiring while assembling it is passing by.
Ah, you're right.
No, I was already done with it up to the point where I thought, "I can't believe I made a mistake." If that were the case, I'd have to stab myself, right? All sorts of things. This is where the cable from the hard drive (which was common back then) would extend from the graphics card, and you'd plug that in. I'll plug the power cord in here. I miss Bister, who failed not with the beauty of yesterday, but with the superficial beauty she sacrificed yesterday. The pistachio was gone in an instant, wasn't it?
Windows 7, which came after that, was actually quite well-received. No, but back then, Vista, the memory of computers at that time, wasn't it like 2GB? Oh, I mean, not just 2GB, but with Vista and stuff, anything less than 2GB was just too much. I think that's roughly what the conversation was about. I wonder what the number was for Windows XP? But it was probably around 52, right? Memory is still 20 years old. A little over 20 years ago.
No, I think it was 512 or something. It moved even at that level. It's unthinkable now.
Nothing is moving right now. 512MB, or even 1GB. I doubt anyone even has a computer like that. I think it's harder to buy.
Yes, it's definitely Vista. It was so good that I kept using XP. It's scary when you plug the parts into the motherboard, isn't it? surely.
Yeah, I think that's a common experience with custom-built PCs. Also, make sure you touch the metal before handling the parts.
Oh, you do that when you build your own PC, right? Well, with electronics projects, you can damage things with your own electrostatic discharge, so you kind of have to run an electric current through the parts first, and that's why I still do it out of habit. Whenever I touch something like that, I always touch a nearby piece of metal to discharge any static electricity, something I learned in college and it's become a habit now.
Oh, I see. Perhaps it's a low-cost, regional type of business in the future.
Yes, yes, yes. But that's really the kind of era we live in. The laptop I bought before Vista, the first laptop I was forced to buy in college, the low-spec, expensive one that I was made to buy out of obligation, it probably doesn't even have 512GB of memory, right? Is the memory really only 2GB or something? I think it was something like 256. So, sometime during university, when I was 22 and he was about 21, two years younger than me, he bought a laptop that was running Vista. Well, but Vista has such a bad reputation, it was seriously poisonous yesterday. It's kind of sad that someone would buy a computer costing around 300,000 yen only to find it running Vista.
Okay, so let's wrap things up now. This little chat went on for a bit too long.
Well, I might talk about locale again sometime, maybe on YouTube or something. Well, recently I've been evaluating Locarem and putting together various things about it, so I'll talk about that again sometime. Well, but, well, if it seems possible, it would be difficult, but, um, I think it might be worth seriously considering buying a GPU or something, and building a local environment like a local LLM. So, if you can, try adding that memory to that GPU. Please let me know if you're able to add more. I really want to know. That's not very brave. Oh, but that's fine. Is that it?
Basically, I buy junk or whatever I really do n't need, and then I put it there, well, it does n't work, so that's the thing. Well, at least we can practice that. For expansion. So, we're going to make a decision about something like the RTX5090 for the real thing. If you succeed, you'll be a hero! Does it cost around 1.5 million? No, it doesn't cost that much? Oh, it doesn't cost that much. Around 700,000.
ah. No, let's do it a little longer.
Wait, have the prices gone up?
$4,000, so that 's about 700,000 yen.
In older factories, you sometimes find machines running Windows 985, so you actually need quite old memory. Yes, you're absolutely right. Back in the day, there was Windows, like Windows 98, ME, and even Windows 2000.
That's exactly the computer my dad uses, it's Windows ME.
That was the era. yes. So, um, let's wrap things up now.
Thank you to everyone who is viewing this on a substack.
Thank you to the approximately 60 people who joined the substack as well. In total, it's not quite 200, but somewhere around 200.
yes. So, let me give you a few announcements and then we'll wrap things up.
Well, first of all, today's discussion about security was from the perspective of local level, but I think there are cases where companies are hesitant to implement AI because they are concerned about sensitive information and things like that. Well, for people like that, well, it's for corporations, but we have something called Claude Code Safety Hub.
Well, it's mainly a gateway.
By installing a gateway, the traffic between the terminal and the server goes through the gateway. This allows for blocking any confidential information or similar data that might pass through this gateway, and it also protects against external attacks like prompt indexing. Well, the good thing about this is that even if you don't have the literacy or anything like that, it forces you to get this GETBEE, so the good thing is that you can process various things at this point. And then, well, you can also monitor usage and things like that, and you can see who is interacting with whom, so, I think it's a pretty good option if you want to introduce AI safely into your company. It's pretty much an all-in situation, with a lot of things attached to it, so yes. Well, it can inspect temporary files, block attacks from external web pages, and even block attacks like crop injection that we found during our research. It's a set of various features like that. Well, if you're interested, please feel free to contact us. The description box contains a URL to a dedicated landing page, not a general link. And, for those who simply want to learn how to use it, we also offer corporate training. As part of our corporate training on how to use it safely, there are many training programs that teach how to use standard CrowdCode, but those mainly focus on operation methods and usage. However, our training teaches how to use CrowdCode safely, plus information security. Well, we're talking about information security, and today's discussion about Local Eleven also touches on that.
When we're introducing something in our own company, I think it's quite difficult to implement unless it can be used safely, so that's what I'm talking about. yes. So, if there are people who would like to learn how to use Claude Code or something like that for their classes, please let me know. Right now, with the subsidy, you can get up to 75% back, meaning your company's burden is only 25%, so it's very affordable. Please consider this option. Also, for those who do n't need full-fledged training in payroll and would be fine with one-on-one instruction, we offer one-on-one coaching. Well, for corporate clients, I offer a two-hour training session where I teach them about security and how to use blacksmithing. yes. Well, it really depends on the level of experience. For someone who has hardly ever touched a Kurodcode, I'll teach them how to use it. And for someone who has some experience, I can watch them use it and give them advice like, "Oh, you should do that part better like this."
So, if you're interested in one-on-one sessions, please check this out as well.
And there's one more thing.
Yes, we'll be offering security courses for non-engineers, specifically AI security courses. Oh, that's June 14th, right? I'll be holding a security seminar, so if you're interested in learning about security, please sign up.
Well, I'm talking about vibe coding, or rather, Claude Code, and AI, or Chat GPT, but especially with Claude Code Codex, you really need to be able to write code from scratch, so I'll talk about the minimum you need to know, or you'll be in trouble. It's about four hours long, so it's quite long, but if you take it, I think your literacy will increase considerably, so if you're interested, please sign up for the AI Security Fundamentals Code course, which is scheduled from around 1 to 5 pm. Well, it's cheaper to sign up now than to sign up later, and we will also be selling archives, but if you buy the archives, it will be more expensive than signing up now, so please keep that in mind.
yes. Well, it's good to know the basic principles of security, how to protect accounts and things like that, what kind of information is okay to give to AI and what kind isn't, how to handle APIs, and also the terminal and things like that.
So, the terminal and commands, so you can avoid a situation where you're completely lost when asked, "Is it okay to execute this?"
Well, it's about being able to make a certain level of judgment, and also, well, GitHub, I think this is quite directly related, so I'm thinking of teaching about security and safety as well. So, well, if you're not an engineer, well, if you have n't studied that kind of thing before, I think it would be a good idea to take the course.
Disks are expensive if something goes wrong.
yes.
Well, I've included links to all of these in the description box, so please check them out if anything interests you.
Okay, then, I think we'll wrap things up here for today. Okay, I think I'll be able to do another live stream tomorrow, so please come and join me!
Uh, it's scheduled for around 9 o'clock.
Okay, so I think we'll wrap things up here for today. It was quite chaotic. thank you for your hard work. I'd like to watch a video about adding more memory to a GPU. I'm getting ready to go out. Yes, thank you very much.
Related Videos
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 views•2026-05-29
Long-Running Agents — Build an Agent That Never Forgets with Google ADK
suryakunju
142 views•2026-05-30
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K views•2026-05-28
BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 views•2026-06-03
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 views•2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K views•2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 views•2026-05-29
3D Platformer Update - NO CAPES
SolarLune
294 views•2026-05-30











