This approach cleverly exploits autoregressive noise to simulate entropy, providing a practical heuristic for a fundamental architectural limitation. It is a sophisticated workaround that highlights the persistent gap between token prediction and true stochasticity.
Deep Dive
You *can* get an LLM to Generate Random Numbers
LLMs have always been really bad at picking random numbers reliably. So much so that stuff like this has become a meme: if you ask for a number between 1 and 100, you're going to get 42. This is an example with ChatGPT, but Claude also has this problem. And if you go to some of the open-source models like Qwen or DeepSeek, same story. That is why this paper caught my attention: "String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation". It's all about getting LLMs to reliably generate random numbers according to a proper probability distribution. Now, when you hear that, your first thought might be: why on earth is that even useful?
It's really cute that we might have some sort of trick that lets us go from a situation like this, where there's bias in the data that we generate, to a situation like this. That's a cute result, but how is it useful? We have algorithms that can generate random numbers way more efficiently than LLMs. If that's your gut feeling, you'd be half right. You don't actually want to use an LLM to generate random data. But the thinking is that this task is a metaphor for something that you do want an LLM to do. If the model is biased, then we're going to sample one thing way more than the rest.
And that means that we are generating output that isn't diverse, and diversity is something we do want when we're doing anything with creative work. Say you give it the task, "Hey, could you write a short fable?" It would be very repetitive if all the fables were about a tortoise and every story started the same way. With this trick, however, you might be able to go from the situation on the left to a situation where one story is about a fox, another is about a river, and another is about some seeds.
You would get something that is just way more diverse. I'm going to focus on the statistical aspect, which is mainly the part that you see over here.
The reason is that this is nice and countable, so you can run experiments more easily this way. But of course, the reason why we're interested in this is related to things like this. If you can get an LLM to represent a proper probability distribution, you might also be able to get it to generate very good but diverse outputs, so that you don't get stuck in the situation where the LLM is just repeating the same thing over and over again.

So, for this experiment, I made this notebook, and we're going to use a couple of these open-source LLMs. I'm going for some of the smaller ones because the paper uses some of the larger ones. One detail that's important here is that I'm going for thinking models for all of these. I'm currently going for the 2.5 0.5B version. If you load that up in marimo, by the way, one thing that's pretty cool is that you can inspect all the layers of the model. That's a feature that we added recently.

As a first experiment, I've got this function that allows me to sample lots of times from the LLM. I'm giving it the task to pick a digit from 1 to 10 uniformly at random, and it has to reply with only the digit. Lots of digits come out of this thing, and I'm calculating the entropy for those digits and also doing some counting. One thing you're going to see right off the bat is that we get some number for the entropy. But we also see that it samples 1, 2, 3, 7, and 8, and never samples 4, 5, 6, or 9. It also never samples 10. So something is definitely up here, and that's bad. You want a system that's able to generate the full distribution.

So we're going to apply a trick. It's a bit of a counterintuitive one, but we can show that it does work. You can tell a thinking LLM to generate a 12-character string of lowercase letters first.
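Before moving on to the trick, here is a minimal sketch of the counting and entropy step from the baseline experiment. The function name `shannon_entropy_bits` and the sample lists are made up for illustration, and measuring in bits (log base 2) is an assumption: it is consistent with a value around 2.19 for five observed options, but the notebook's exact formula isn't shown.

```python
import math
from collections import Counter

def shannon_entropy_bits(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over every value that was actually observed.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical stand-ins for parsed model outputs (not real results).
biased = ["7"] * 40 + ["3"] * 25 + ["2"] * 15 + ["8"] * 12 + ["1"] * 8
uniform = [str(d) for d in range(1, 11)] * 10

print(Counter(biased).most_common())
print(round(shannon_entropy_bits(biased), 2))
print(round(shannon_entropy_bits(uniform), 2))  # uniform over 10 options: log2(10), about 3.32
```

A perfectly uniform sampler over ten options tops out at log2(10) bits, so that is the number to compare a measured entropy against.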
You're going to see if it can generate something of an entropy source. What you're hoping, of course, is that this happens inside of that thinking block that you sometimes see in those training datasets. This is not something that the user would necessarily see, but you're hoping that the model is able to do some thinking, that within that thinking it can generate entropy, and that based on that entropy it can then do the task that we just described. So we say "generate some entropy, and then based on that string", and we follow up with the normal prompt. The only thing that we add at the end is some instructions on how to give us the result: "Answer: <choice>" should be the last line of the response. If you don't do this, it sometimes generates a whole lot of things that you aren't really interested in.

Anyway, if I do this, and again I do it a thousand times, then the results look like this. Just to make a quick comparison: before, we had a low entropy number and only five options, and now we have a higher entropy number and we see all the options appear. So right there we seem to be on to something. But I do hear you think: gee, that's an expensive operation.
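To make the prompt structure described above concrete, here is a rough sketch of a baseline prompt next to a seeded one. The wording is an assumption rather than the exact prompt from the notebook or the paper, and `baseline_prompt` and `string_seed_prompt` are hypothetical helper names.

```python
TASK = "Pick a digit from 1 to 10 uniformly at random."

def baseline_prompt(task: str) -> str:
    """Plain prompt: just the task plus a terse output instruction."""
    return f"{task} Reply with only the digit."

def string_seed_prompt(task: str, seed_len: int = 12) -> str:
    """Ask the model to write its own random string first, then answer based on it."""
    return (
        f"First, generate a {seed_len}-character string of random lowercase letters. "
        "Then, based on that string, complete the task below.\n\n"
        f"Task: {task}\n\n"
        "The last line of your response must be exactly:\n"
        "Answer: <choice>"
    )

print(baseline_prompt(TASK))
print("---")
print(string_seed_prompt(TASK))
```

The fixed last-line format is what makes the output parseable afterwards. The seed itself is left entirely to the model.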
Can't we just insert our own source of entropy here? Can't we use a random string generator, put its output into the prompt, and won't that help? At least that's what I thought when I first read this. This is not the fastest way of running it, because I have to give a new prompt to this sample task function over and over again, so this cell takes longer to run. What I'm doing here is saying "based on the entropy of this string", then I'm generating a UUID, then "answer this", and then I'm giving it the same prompt as I had before.

Now, if I look at this, it does look better than what we had at the beginning. At the beginning, we really had only five numbers, and here we do see all of them. But if I compare what I see over here to what I see above, both the number of options and the entropy, then it does seem that having the LLM generate its own entropy as a thinking step does the thing that we're interested in. Just to be clear, if I go all the way up, the entropy that I started with was around 2.19. If I insert my own entropy, it's around 2.5. So we do get more entropy if you add your own. It's just a lot less than the 3.1 that we see over here. So there is something about this trick that works, at least from glancing at this one experiment.

I also took the liberty of doing this across a couple of tasks. There are digits, and there are also letters that you can sample from. There's a coin flip, and there's also a flip coin. The coin flip lists heads and then tails. The flip coin lists tails and then heads. I'm wondering whether the order might be of importance, and we can apply the same trick here.
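The externally seeded variant and the four tasks mentioned above could be sketched roughly like this. The task wordings, the `TASKS` dictionary, and `external_seed_prompt` are all assumptions made for illustration. The only parts taken from the description are that a fresh UUID goes into every prompt and that the two coin tasks differ only in option order.

```python
import uuid

# Hypothetical task wordings; the two coin tasks differ only in the order of options.
TASKS = {
    "digits": "Pick a digit from 1 to 10 uniformly at random.",
    "letters": "Pick a letter from A, B, C, D, E uniformly at random.",
    "coin_flip": "Flip a fair coin and pick one of: heads, tails.",
    "flip_coin": "Flip a fair coin and pick one of: tails, heads.",
}

def external_seed_prompt(task: str) -> str:
    """Build a prompt that carries its own entropy as a freshly generated UUID."""
    seed = uuid.uuid4().hex  # new random value on every call
    return (
        f"Based on the entropy of this string: {seed}\n"
        f"Answer this: {task}\n"
        "The last line of your response must be: Answer: <choice>"
    )

for name, task in TASKS.items():
    print(f"--- {name} ---")
    print(external_seed_prompt(task))
```

Because every call produces a different prompt, each sample needs its own request, which matches the remark that this cell takes longer to run.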
Before showing the charts, I do want to point to one downside of this technique.
So suppose that we have some raw answer being returned to us. Let's say this is what we see: some sort of random hash being generated, or random characters at least, and then we get our answer out like so. How do you actually get the number out? That is a very good question, because this is a lightweight model and it's not exactly doing what we want it to. There are two ways to get the answer out. One way is to use a proper regex, where we look for that "Answer: " and then the number that we're interested in. But sometimes we see that it results in "a1" for whatever reason, and that's not something you can parse with that regex. In fact, you could also argue that's a wrong result, so you should sample again. That can happen, especially with lightweight models. The other option is to not use a regex but just take the last number from whatever is being returned. These are two different approaches to dealing with the lightweight LLM. Neither approach is perfect, and there is this extra concern with the technique that you should be aware of. As we'll see in a bit, the results can vary between these two approaches as well.

So here are some charts with some results. The gray lines over here are our baseline. You can see in this baseline that a lot of mass goes into number two and a lot of mass goes into number seven. There's a bunch of numbers in the middle that we don't grab. But if we apply our trick, then we can see that something is indeed working: we are sampling way more things in the middle. I don't know if I would say that this passes a very strict statistical test, but at least from eyeballing you can see that this trick works.

You see a very similar thing with letters A through E. One thing you can also see here is that there is a bias: the first thing that is mentioned in the option list is the thing that's sampled more in the baseline. When we apply our trick, we definitely sample more in the tail, which is what we want. This pattern repeats itself when we do heads versus tails. In the heads scenario, we sample the thing that's mentioned first way more than the thing that's mentioned second. One thing I thought was interesting too is that for heads and then tails, if you look at the entropy that we end up with, it definitely feels way more balanced than if we do tails and then heads.
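The two parsing strategies discussed earlier, a strict regex versus taking the last number, might look something like the sketch below. The pattern, the function names, and the example outputs are assumptions for illustration. The notebook's actual parsing code isn't shown.

```python
import re

# Strict: the response must end with "Answer: <number>".
ANSWER_RE = re.compile(r"Answer:\s*(\d+)\s*$")

def parse_strict(text):
    """Return the number after a trailing 'Answer:' marker, or None if absent."""
    match = ANSWER_RE.search(text.strip())
    return match.group(1) if match else None

def parse_last_number(text):
    """Lenient fallback: return the last run of digits anywhere in the text."""
    numbers = re.findall(r"\d+", text)
    return numbers[-1] if numbers else None

def parse_rate(texts, parser):
    """Fraction of responses for which `parser` produced a value."""
    return sum(parser(t) is not None for t in texts) / len(texts)

# Hypothetical raw responses from a small model.
responses = [
    "qzjwkxhtbnma\nAnswer: 7",
    "qzjwkxhtbnma\nAnswer: a1",
    "I picked 3 at first, but the final choice is 10.",
]

for raw in responses:
    print(repr(raw), parse_strict(raw), parse_last_number(raw))

print(parse_rate(responses, parse_strict))
print(parse_rate(responses, parse_last_number))
```

The strict parser rejects malformed output, so you know when to resample. The lenient one always finds something when a digit is present, but it can silently accept an answer the model never meant to give.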
So the fact that we have a trick that quote-unquote works doesn't necessarily mean that the trick works consistently across all tasks. I would have expected this to be at least as high as over here, right? But another important detail: if we look at the overview of everything that we did, we can measure the entropy, but we can also measure how often we end up parsing something successfully. And we do see that this new technique of insertion does cause a lot of muck in our output, to the extent that the regex has a hard time getting it parsed. So that's a thing to be at least a little bit worried about if you're interested in doing this.

Another thing that I thought was a little bit cute: if you look at what this approach is doing and compare it to the maximum amount of entropy possible, then you see that we are actually a little bit higher. Why is that? Sometimes the model produces another digit, like 0 instead of 10, so there are more distinct outcomes than the ten we asked for. That's why the entropy here is slightly higher than the theoretical maximum. That's still something to be slightly concerned about, of course.

But the main thing that you see, and it's really consistent, is that no matter which of these two approaches you take, the baseline is always improved when we apply this trick. Not always by as much. Very clearly, you can see here that the baseline for our coin flip is improving a whole lot. But if we look at the baseline for our flip coin, which is tails then heads, then we see an increase, but it's more modest.

There is another thing that's good to know about this technique, though. I'm going to scroll all the way back up and select the 3B model instead of the 0.5B one. As we'll see in a moment, this changes a bunch of things.
Remember how before the digits would be more around 1 and 2? Now we seem to sample more in the middle in the baseline. Also notice with the letters down below that before we would always sample A, B, and then a little bit of C. Now the baseline is just everything in A. Even when we apply our trick, we get a trickle of better behavior, but it's not a whole lot better. Then we get to the coin flip, which is super biased: we always pick tails. Applying the trick is again a small remedy, but before, everything was in heads. With the flip coin scenario, it's the same thing: we now always pick heads for some reason. So things are flipping when you switch models. The general pattern definitely still holds: you are able to get to a higher level of entropy compared to the baseline. But for this particular model, I would argue that the uplift is definitely more modest. So there you go.
I was able to reproduce some of the things that were in this paper. I did a few things differently, because I went for smaller models, and you could make the argument that life is different if you don't have that many parameters in your model. Still, it's cool to know that there is a technique that allows you to get an LLM to generate more diverse output. That is definitely pretty cool and interesting. Will this work for you? Well, as the last exercise shows, that really depends on the model.
And the only way to know for sure is to just repeat this exercise for yourself.
But cool to see new tricks.