BAML addresses the "JSON tax" by moving structural overhead from the model to the framework, which cuts the tokens and latency an agent spends on tool calls. It is a shift from brute-force prompting toward engineering for both cost efficiency and system reliability.
Save 20-50% on agent tool call cost and latency
What's up Spacey? Let me show you what I meant by saving you a bunch of costs and tokens when you're using BAML. So, I'm just going to set up a project really fast in Python specifically, but this works with any language, not just Python.
So, I set up a quick BAML project. BAML's this thing that we built that is designed to help you do incredibly good tool calling with models. So, I'll just show you a couple of examples of how it works.
All this tooling you're seeing is provided by BAML. If you do BAML init, it just auto-installs everything for you.
So, let's just swap to using the GPT-4 mini model really fast. I'm using the mini model. You can see the prompt. This function gets transformed to this prompt. There's a test case written over here, and I'm going to go run the code.
Now, typically when you run the code, you get something like this. You get triple backticks. You get JSON. And then what we do is very simple: we just do error correction here, so you get the right value.
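To make the "error correction" idea concrete, here is a minimal sketch of just the backtick case. It is not BAML's actual parser, which handles far more than this; `extract_json` is a hypothetical helper and the sample reply is made up.

```python
import json
import re

# Build the fence marker programmatically so this snippet can sit inside a fence.
FENCE = "`" * 3
# Matches an optional "json" language tag and captures everything up to the closing fence.
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(raw: str):
    """Hypothetical helper: parse JSON from a reply, ignoring a surrounding code fence."""
    match = FENCE_RE.search(raw)
    payload = match.group(1) if match else raw
    return json.loads(payload.strip())


# A made-up model reply that wraps its JSON in a Markdown fence.
reply = FENCE + 'json\n{"name": "Ada", "skills": ["Python"]}\n' + FENCE
print(extract_json(reply))
```

The same function also accepts a bare JSON string, so the caller does not need to know whether the model added a fence.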
But, how do we save tokens here? Well, let's show you something.
What if we told the model we don't care?
Prefix equals "Answer like this." So, now the prompt has changed to say this instead. ctx.output_format takes the return type of the function and just schematizes it.
When we run this code, you'll notice that the model didn't use quotation marks here because it's answering like this. And that brought us down to 64 tokens from 72 tokens.
Let's do something else. Let's be more explicit. No quotes around strings.
Now, when I run this, I get no quotes at all. I'm still getting all the values out. Now, we're down to 56. So, 72 down to 56 is 56 / 72, which is about 78% of the original, so we're now using roughly 22% fewer tokens.
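The savings arithmetic can be checked with a few lines of Python. The token counts (72, 64, 56) are the ones quoted in the walkthrough; `token_savings` is just an illustrative helper name.

```python
def token_savings(before: int, after: int) -> float:
    """Fraction of tokens saved when going from `before` to `after` tokens."""
    return 1 - after / before


baseline = 72  # output tokens for the default JSON-style answer
steps = [("answer-like-this prefix", 64), ("no quotes around strings", 56)]

for label, count in steps:
    saved = token_savings(baseline, count)
    print(f"{label}: {count}/{baseline} tokens, {saved:.1%} saved")
```

This prints 11.1% saved for the first change and 22.2% saved for the second.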
Now, this is one part of it, but let's go further. Let's talk about how we really push token savings to the limit. So, what if, instead of using the compiler, we start hardcoding what the prompt says?
But, we're going to change things. This function will now return an array, but nothing in the prompt says that. The experience field will actually return an Experience class.
Well, what happens now? Well, a couple things happen. One, the model runs.
But, even though it produces data, notice what you got on the right side.
You got an array. You got experience, and it's an empty array because a string can't be cast to the Experience class.
If I made this an Experience or null, what happens now?
Well, now you get three nulls because each of these strings cannot be cast to Experience, but it can be cast to null as a best effort, so it gives you null.
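The empty-array versus three-nulls behavior can be sketched in plain Python. This only illustrates the idea and is not how BAML implements it; the `Experience` fields (`company`, `title`), the helper names, and the sample strings are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Experience:
    # Hypothetical fields; the walkthrough does not show the real class.
    company: str
    title: str


def to_experience(value: Any) -> Optional[Experience]:
    """Return an Experience if `value` has the right shape, otherwise None."""
    if (
        isinstance(value, dict)
        and isinstance(value.get("company"), str)
        and isinstance(value.get("title"), str)
    ):
        return Experience(value["company"], value["title"])
    return None


def coerce_list(items: list, allow_null: bool) -> list:
    """Best-effort cast of each item; failed casts become None or are dropped."""
    out = []
    for item in items:
        exp = to_experience(item)
        if exp is not None or allow_null:
            out.append(exp)
    return out


# The model returned plain strings where Experience objects were expected.
model_output = ["Acme Corp", "Globex", "Initech"]
print(coerce_list(model_output, allow_null=False))  # []
print(coerce_list(model_output, allow_null=True))   # [None, None, None]
```

With a non-nullable element type the uncastable strings are dropped, and with a nullable one each falls back to `None`, which matches the two results described above.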
So, the idea is you as a developer no longer have to force the model to think about schemas or orientations. We let algorithms do this work. This is all happening in like millisecond time. You saw it happen while it's streaming.
And you get really good control over exactly what happens. And then, you can see how this is dramatically fewer tokens from two angles, not just the perspective that we can just cut absolute tokens, but even when a model misunderstands you or makes slight errors, instead of having to retry or go again, we pull out as much data as possible with really rich information that says, for example, items may be dropped, and you actually know exactly what items are dropped as well.
So, you can decide if you want to retry or not based on certain constraints.
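One way to picture the "decide if you want to retry" step is a parse report that records what was dropped. Everything here is hypothetical (`ParseReport`, `parse_ints`, `should_retry`, and the 25% threshold); it shows the shape of the decision, not BAML's API.

```python
from dataclasses import dataclass, field


@dataclass
class ParseReport:
    """Hypothetical result type: what was kept and exactly what was dropped."""
    kept: list = field(default_factory=list)
    dropped: list = field(default_factory=list)


def parse_ints(raw_items: list) -> ParseReport:
    """Keep every item that casts to int and record the ones that do not."""
    report = ParseReport()
    for item in raw_items:
        try:
            report.kept.append(int(item))
        except (TypeError, ValueError):
            report.dropped.append(item)
    return report


def should_retry(report: ParseReport, max_dropped_ratio: float = 0.25) -> bool:
    """Retry only when more than `max_dropped_ratio` of the items were lost."""
    total = len(report.kept) + len(report.dropped)
    return total > 0 and len(report.dropped) / total > max_dropped_ratio


report = parse_ints(["2019", "2021", "last year", "2024"])
print(report.kept, report.dropped, should_retry(report))
```

Here one item out of four is dropped, which sits exactly at the threshold, so the caller keeps the partial result instead of paying for another model call.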
I'm going to switch this back to doing the right thing.
So, output format.
And as you're noticing, doesn't really matter. You pull out all the information. It pulls out backticks.
Doesn't matter. No quotes are in here.
Has quotes somewhere else. It doesn't really matter. We basically built a really good algorithm for parsing data from non-deterministic systems into the type system of your choice.
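As a toy illustration of parsing guided by the target type, the sketch below reads the same fields whether or not the model used quotes. It is a deliberately small stand-in, not BAML's algorithm: `SCHEMA`, `parse_loose`, and the sample outputs are assumptions, and it does not handle values that contain commas or nested objects.

```python
import re

# Hypothetical target type: field name -> Python type to cast into.
SCHEMA = {"name": str, "years": int, "remote": bool}


def parse_loose(raw: str, schema: dict) -> dict:
    """Find each schema field in loosely formatted text and cast it to its type."""
    result = {}
    for key, target in schema.items():
        # Quotes around the key and the value are optional.
        pattern = rf'"?{re.escape(key)}"?\s*:\s*"?([^",\n}}]+)"?'
        match = re.search(pattern, raw)
        if not match:
            result[key] = None
            continue
        text = match.group(1).strip()
        if target is bool:
            result[key] = text.lower() == "true"
        else:
            try:
                result[key] = target(text)
            except ValueError:
                result[key] = None  # best effort: keep going on a bad value
    return result


unquoted = "{\n  name: Ada Lovelace,\n  years: 5,\n  remote: true\n}"
quoted = '{"name": "Ada Lovelace", "years": 5, "remote": true}'
print(parse_loose(unquoted, SCHEMA))
print(parse_loose(quoted, SCHEMA))
```

Both inputs produce the same typed dictionary, which is the property the walkthrough is pointing at: the schema, not the model's formatting, decides what comes out.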
Now, when you want to go use this in another language like, let's say, Python, you're able to do something really cleanly like this.
from baml_client import b.
And r = b.extract_resume(...), and pass in whatever string you want into here, and r.experience is guaranteed to be a list of experiences or None. So, let's update that to be purely a list of experiences and no longer make it optional. Well, now it's a list of experiences. So, we do bridging across type systems in every language of your choice. So, again, when you're thinking about coding or agentically coding, everything is type system driven.
And the hard work of getting the type system data is left to the compiler or tooling that is built to do it robustly.
Hopefully that helps. If you have any more questions, let me know. This technique is called schema-aligned parsing.