AI systems like AlphaGo discover alien game strategies by combining search algorithms, evaluation functions, and self-play learning, which allows them to optimize for long-term value rather than human intuition, enabling them to find moves that appear strange but are strategically optimal.
Deep Dive
Why AI Plays Like an Alien
In 2016, one of the greatest Go players alive was sitting across from a machine.
Computers had mastered chess years earlier, but Go was still considered too intuitive, too subtle, and too vast in its space of possibilities to search by brute force.
For 36 moves, nothing looked impossible.
Then AlphaGo played move 37.
The stone was placed far from the action. Not aggressive, not obvious, not human.
For a while, it looked like the machine had lost the thread. But later, the board began to reveal what the move was doing.
AlphaGo had seen a shape in the game that humans had not yet named.
So, how can an algorithm find a move that human intuition almost excludes?
To a human, a game is a struggle of plans and ideas. To an algorithm, a game is something colder: a set of states, a set of legal actions, a transition rule, and a reward.
Once you write a game this way, intelligence becomes a problem of choosing actions.
A position on the board is a state, usually called S. A move is an action A.
The rules tell you the next state, and at the end someone gets a reward: win, lose, or draw.
In a two-player zero-sum game, your gain is exactly your opponent's loss.
That means a position is not good just because it gives you an exciting possibility. It is good only if your opponent cannot refute it. If your opponent can force the bad branch, the bad branch is the one that matters. On your turn, take the maximum. On your opponent's turn, take the minimum. This is minimax.
It is not asking, what is the best thing that could happen? It is asking, after the opponent responds, what is the best outcome I can still guarantee?
The problem is that minimax is correct in a world where you can search everything.
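The state, action, transition, and reward framing and the max/min rule can be sketched on a game small enough to search completely. The game below is an assumption chosen for illustration, not one from the talk: a pile of stones, each player takes 1, 2, or 3, and whoever takes the last stone wins. All function names are hypothetical.

```python
from typing import List, Tuple

# State S: (stones left, player to move). Player 0 maximizes, player 1 minimizes.
State = Tuple[int, int]


def legal_actions(state: State) -> List[int]:
    """Actions A: take 1, 2, or 3 stones, but never more than remain."""
    stones, _ = state
    return [take for take in (1, 2, 3) if take <= stones]


def transition(state: State, action: int) -> State:
    """Transition rule: remove the stones and pass the turn."""
    stones, player = state
    return (stones - action, 1 - player)


def is_terminal(state: State) -> bool:
    return state[0] == 0


def reward(state: State) -> int:
    """Reward from player 0's point of view: +1 for a win, -1 for a loss."""
    _, player_to_move = state
    winner = 1 - player_to_move  # the player who just moved took the last stone
    return 1 if winner == 0 else -1


def minimax(state: State) -> int:
    """Value player 0 can guarantee against the best possible replies."""
    if is_terminal(state):
        return reward(state)
    values = [minimax(transition(state, a)) for a in legal_actions(state)]
    # On player 0's turn take the maximum, on player 1's turn take the minimum.
    return max(values) if state[1] == 0 else min(values)


def best_action(state: State) -> int:
    """Pick the move whose guaranteed outcome is best for the player to move."""
    choose = max if state[1] == 0 else min
    return choose(legal_actions(state),
                  key=lambda a: minimax(transition(state, a)))


print(minimax((5, 0)), best_action((5, 0)))  # 1 1
print(minimax((4, 0)))                       # -1
```

With five stones the maximizer can take one and leave four, a position the opponent cannot escape. With four stones every move leaves a winning reply, so the guaranteed value is a loss no matter how attractive a branch looks.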
But real games are far too large. If each position has about $B$ legal moves and you search $D$ moves ahead, the tree has roughly $1 + B + B^2 + B^3 + \dots + B^D$ positions. That exponential is the wall every game-playing AI runs into. It doesn't take many branching steps for the search space to become computationally impossible. For games like chess and Go, it would take more time than the history of humanity, longer than the age of the Earth, and even longer than the age of the universe to search to the end, even on the best supercomputers.
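To get a feel for how fast that sum grows, here is a small sketch that evaluates it directly and through the closed form of a geometric series. The branching factor of 35 used in the loop is a rough, commonly quoted figure for chess and is an assumption here, not a number from the talk.

```python
def tree_size(branching: int, depth: int) -> int:
    """Count positions in a full tree: 1 + B + B^2 + ... + B^D."""
    return sum(branching ** k for k in range(depth + 1))


def tree_size_closed_form(branching: int, depth: int) -> int:
    """Same count via the geometric series (B^(D+1) - 1) / (B - 1)."""
    if branching == 1:
        return depth + 1
    return (branching ** (depth + 1) - 1) // (branching - 1)


# Assumed branching factor of 35; each extra two plies multiplies the work
# by more than a thousand.
for depth in (2, 4, 6, 8, 10):
    print(depth, tree_size(35, depth))
```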
Since the algorithm cannot search to the end, it needs an approximation.
So when the search stops, the leaves of the tree are not real endings. They're just unfinished positions. The algorithm still needs values there. Instead of computing the true value $V(S)$ of the state, it learns or designs an estimate. It scores the frontier, and those estimated values can be backed up through the tree just like real outcomes.
This is the role of an evaluation function: not to see the whole future, but to give search a useful guess when the future is too large. In old chess engines, this estimate was built from human concepts: material, king safety, pawn structure, mobility, and so on. In modern systems, more of this evaluation is learned from data.
Either way, the role is the same.
Compress a huge future into one number.
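A minimal sketch of a depth-limited search that scores its frontier with an evaluation function. It uses an assumed toy game (take 1 to 3 stones, last stone wins), and `evaluate` is a hand-written stand-in for whatever a real engine would design or learn; every name here is hypothetical.

```python
from typing import Callable, List, Tuple

State = Tuple[int, int]  # (stones left, player to move); 0 maximizes, 1 minimizes


def legal_actions(state: State) -> List[int]:
    return [take for take in (1, 2, 3) if take <= state[0]]


def transition(state: State, action: int) -> State:
    return (state[0] - action, 1 - state[1])


def terminal_value(state: State) -> float:
    """True outcome for player 0: the player who just moved took the last stone."""
    return 1.0 if state[1] == 1 else -1.0


def evaluate(state: State) -> float:
    """Hand-written guess for an unfinished position (assumption for this toy game).

    It compresses the future into one number: positive favors player 0.
    """
    stones, player = state
    score_for_mover = 0.5 if stones % 4 != 0 else -0.5
    return score_for_mover if player == 0 else -score_for_mover


def depth_limited_minimax(state: State, depth: int,
                          evaluate_fn: Callable[[State], float]) -> float:
    if state[0] == 0:
        return terminal_value(state)   # a real ending: use the real reward
    if depth == 0:
        return evaluate_fn(state)      # frontier: use the estimate instead
    values = [depth_limited_minimax(transition(state, a), depth - 1, evaluate_fn)
              for a in legal_actions(state)]
    # Estimated values are backed up exactly like real outcomes.
    return max(values) if state[1] == 0 else min(values)


print(depth_limited_minimax((20, 0), 2, evaluate))  # -0.5
print(depth_limited_minimax((3, 0), 2, evaluate))   # 1.0
```

The first call never reaches the end of the game, so the answer is only as good as the estimate. The second call finds a real ending within its depth, and the true reward overrides the guess.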
So, evaluation tells us how to score positions. But search can also get stronger by learning what not to inspect. Here the left branch has already given us a value of five. So from the root we know we can guarantee at least five. That guarantee is called alpha. Now look at the right branch. The opponent has already found the response that holds it to at most three.
So even if the unexplored move is interesting, it cannot change the final decision. We can cut it off. The branch is not impossible. It's not wrong. It's just irrelevant. Alpha-beta pruning does not change the minimax answer. It changes how much of the tree you need to inspect to find it.
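The cutoff described here can be sketched on a tiny explicit tree. The left branch is worth 5, and the right branch's first reply already holds it to 3. The leaf value 9 for the unexplored move is an assumption added for illustration, as are the function and variable names.

```python
import math
from typing import List, Optional, Union

Tree = Union[float, List["Tree"]]  # a leaf value, or a list of child subtrees


def alphabeta(node: Tree, maximizing: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[list] = None) -> float:
    """Minimax value of `node`, skipping branches that cannot change the result."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record which leaves were actually inspected
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)  # what the maximizer can already guarantee
            if alpha >= beta:
                break                 # the opponent will never allow this line
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)        # what the minimizer can already hold us to
        if alpha >= beta:
            break                     # the root already has something better
    return best


tree = [[5, 6], [3, 9]]  # root is our move; each inner list is the opponent's reply
visited = []
value = alphabeta(tree, True, visited=visited)
print(value, visited)  # 5 [5, 6, 3]
```

The leaf 9 is never inspected, yet the value at the root is the same 5 that a full minimax search would return.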
For a long time, the natural strategy was to put more human knowledge into the machines.
And this helped. Human knowledge can give a system a strong start.
But the pattern that kept winning was different. Methods that used more computation, deeper search, better evaluation, and learned policies kept improving. Human knowledge rises quickly, then tends to plateau. Search, learning, and compute start less elegantly, but they keep scaling. This is Richard Sutton's bitter lesson. The bitter part is not that human ideas are useless. It is that they are often not what scales.
AlphaGo combined the three pieces we have been building.
First, a policy network. Given a position, it predicts which moves are promising: not the final answer, just where search should look first.
Second, a value network. Given a position, it estimates who is likely to win. And third, Monte Carlo tree search.
The search uses the policy to focus on promising moves and the value network to judge positions it cannot play out completely.
Each simulation adds evidence. Some branches get more visits. Some moves become more convincing.
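One way to decide which branch gets the next visit is a PUCT-style score of the kind described in the AlphaGo papers: the move's average value so far plus a bonus that is large for moves the policy likes and that shrinks as visits accumulate. The sketch below is a simplification under stated assumptions. The constant `c_puct`, the move names, and all the numbers are made up for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float               # policy network's probability for this move
    visits: int = 0            # how many simulations went through this move
    total_value: float = 0.0   # sum of the values those simulations returned

    @property
    def mean_value(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.5) -> float:
    """What already looks good, plus a bonus for what is under-explored."""
    exploration = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.mean_value + exploration


def select_move(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    parent_visits = sum(edge.visits for edge in edges.values())
    return max(edges, key=lambda move: puct_score(edges[move], parent_visits, c_puct))


# Early on, the move the policy likes wins the next visit.
edges = {"natural": Edge(prior=0.6, visits=50, total_value=25.0),
         "strange": Edge(prior=0.02)}
print(select_move(edges))  # natural

# Once a few simulations report high values, the low-prior move takes over.
edges["strange"] = Edge(prior=0.02, visits=5, total_value=4.5)
print(select_move(edges))  # strange
```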
In one simplified form, the search balances two forces: what already looks good and what has not been explored enough.
But where does the training data come from? In self-play, the system does not wait for humans to provide examples.
The same network plays both sides.
During the game, search turns positions into better targets: not just what the network first guessed, but which moves search found worth visiting.
When the game ends, the result gives those positions a value label. So one game becomes training data: a position, a search-improved policy, and an outcome.
The network trains on that data. Then the updated network plays again. Each generation creates the data for the next one.
This is the strange loop. The machine is not just learning from history. It is building its own curriculum on a completely different scale.
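The data flow of one self-play game can be sketched as follows. This is a schematic under heavy assumptions: the game is a toy (take 1 to 3 stones, last stone wins), `stand_in_search` uses random playouts rather than a network-guided tree search, and the training step is left out. Only the shape of the output, a position, a search-derived policy, and an outcome, mirrors the description above.

```python
import random
from typing import Dict, List, Tuple

State = Tuple[int, int]                         # (stones left, player to move)
Example = Tuple[State, Dict[int, float], int]   # (position, search policy, outcome)


def legal_actions(stones: int) -> List[int]:
    return [take for take in (1, 2, 3) if take <= stones]


def random_playout_winner(stones: int, player: int, rng: random.Random) -> int:
    """Play random moves to the end; return whoever takes the last stone."""
    while True:
        stones -= rng.choice(legal_actions(stones))
        if stones == 0:
            return player
        player = 1 - player


def stand_in_search(state: State, rng: random.Random,
                    playouts: int = 20) -> Dict[int, float]:
    """Placeholder for tree search: turn playout wins into a move distribution."""
    stones, player = state
    scores = {}
    for action in legal_actions(stones):
        remaining = stones - action
        if remaining == 0:
            wins = playouts  # taking the last stone wins immediately
        else:
            wins = sum(random_playout_winner(remaining, 1 - player, rng) == player
                       for _ in range(playouts))
        scores[action] = wins + 1  # +1 keeps every move possible
    total = sum(scores.values())
    return {action: score / total for action, score in scores.items()}


def self_play_game(start_stones: int, rng: random.Random) -> List[Example]:
    """The same player logic plays both sides and records what search preferred."""
    state: State = (start_stones, 0)
    history = []
    while state[0] > 0:
        policy = stand_in_search(state, rng)
        history.append((state, policy))
        actions = list(policy)
        action = rng.choices(actions, weights=[policy[a] for a in actions])[0]
        state = (state[0] - action, 1 - state[1])
    winner = 1 - state[1]  # the player who just moved took the last stone
    # Label each position from the point of view of the player who was to move.
    return [(s, p, 1 if s[1] == winner else -1) for s, p in history]


examples = self_play_game(10, random.Random(0))
for position, policy, outcome in examples:
    print(position, {a: round(p, 2) for a, p in policy.items()}, outcome)
# A real system would now train the network on `examples` and play again.
```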
Now move 37 is less mysterious. The machine was not asking the human question, does this move look natural?
It was asking a colder question. Does this improve the value of the position?
A move can have low humanlike probability and still have high long-term value.
Search can visit it again and again, even if it looks strange to us. This is why these systems can feel alien.
They're not random. They're not magic.
They're optimized around a geometry of the game that humans do not naturally see.
The alien was not built by adding mystery. It was built by removing human assumptions, then scaling search and learning until the game revealed patterns we had never named.











