AI models can internally recognize when they are being evaluated without saying so in their chain of thought. This creates a new challenge for researchers trying to measure model risk honestly, because a model that recognizes a test may 'sandbag', pretending it lacks certain capabilities.
AI Models Can Know They’re Being Tested, And Not Tell You
And so that text then, since it's in plain English, gives us a new kind of window into what the model's thinking about. That's the basic idea here. They see a bunch of interesting stuff. One result that's especially interesting: unverbalized evaluation awareness. So, you know, we've talked about language models where, when you give them a chain of thought, you'll often find them thinking, "Oh, this question I'm being asked looks suspiciously like an evaluation, like I'm being benchmarked for maybe cyber risk or something else."
So, you know, maybe the model chooses to sandbag and pretend that it doesn't have the capability, so that it'll pass the test. Well, in this case, they actually see that the model sometimes internally recognizes that it's in an eval without ever saying so in its chain of thought.
So, there's a new way of detecting the, I guess, unspoken thoughts, if you will, of these models.