Mojo's attribute-based expression system enables powerful compile-time metaprogramming by representing expressions as typed attributes that can be evaluated at compile time. It uses a two-layer MLIR representation: the meta layer contains typed expressions that fill holes in base-layer operations, and attributes serve as constant data on operations in places where runtime variables are not allowed. The system supports type-checked generics, dependent types, and custom dialects for extending the language.
Modular Tech Talk: Mojo’s Attribute-Based Expression System
Modular Tech Talks is an exclusive series featuring internal presentations from our product and engineering teams explaining the inner workings of the Modular technology stack. In this edition, Billy Zu presents Mojo's attribute-based expression system and how it enables Mojo's powerful, type-safe metaprogramming.
>> So today I'll be talking about Mojo's attribute-based expression system. I guess the target audience is people who either already know MLIR or are curious how Mojo's comptime system works. We'll go over how Mojo uses MLIR in a somewhat unusual way. I call it Mojo's big bet on MLIR attributes, because by the end of the presentation you'll see that everything is an attribute.
Okay. So the rough agenda, if I can click, yes: we'll first go over a quick Mojo primer, what Mojo looks like, and then we'll dive into the attribute language. Then we'll talk about how we actually evaluate it, because remember, everything is meant to be evaluated at comptime. Finally, we'll look at how well the whole thing works.
So a few words about Mojo.
Mojo was created out of the need for a portable systems programming language for the era of heterogeneous compute that we're in right now. It unlocks the full power of the hardware by giving programmers low-level control, which is necessary for performance; zero-cost abstractions, which let you build higher-level representations for ease of use; and the safety guarantees you would expect from a modern programming language. The main superpower behind all of this, how we're able to provide all three at the same time, is a powerful and type-safe metaprogramming system.
Mojo embraces type-checked generics, meaning you write your generic code once, we check it once (well, for the most part), and you can then use it knowing it has been checked for all the inputs you're allowed to give it. We also fully embrace dependent types, which means you can parameterize basically anything with anything. Normally you would parameterize functions, but here you can also parameterize struct types. You can even parameterize arbitrary expressions, which might sound weird, but we'll get into that. And the things you parameterize them with are not just types: you can parameterize them with arbitrary values, even values of user-defined struct types.
What this enables is very powerful user-defined libraries. That's very important for GPU programming, where you might want to parameterize your kernel function with a lot of things to tune exactly how the kernel behaves, and this lets you customize all of that. What you end up with is a simple, unified language where types are really just a special case of compound values, which removes a lot of special cases that other languages need.
At last year's LLVM Dev Meeting, we presented the overall high-level picture of Mojo's MLIR representation, which is roughly a two-layer system that represents the parametric IR in MLIR. You can think of the base layer as MLIR operations with typed holes dug out of them; they are part of what we call execution-time logic. Then you have the meta layer of typed expressions that fill in the holes we dug out of those operations and are used to configure the base-layer operations. Today's talk focuses entirely on the meta layer: how these typed expressions are represented in MLIR.
Okay, so let's dive into the attribute language.
So what are MLIR attributes? If you look at the language reference, it says that attributes are the mechanism for specifying constant data on operations in places where a variable is never allowed, for example the comparison predicate of a cmpi operation, which I've included on the right.
The key part is that attributes are constant data on operations where a variable is not allowed, and "variable" here means a runtime variable. If you look at the cmpi operation, it accepts this predicate attribute, which lets you control which predicate the comparison actually tests.
If you squint at this, you can see that MLIR operations are basically parametric templates.
The cmpi operation is basically a template that can be parameterized by an attribute specifying which predicate to use. What we want on top of this is to not have to specify these attributes when the operation is constructed.
We want to be able to delay specifying these constants until later, and this is what gives us comptime.
So the concept here is really eventually-constant attributes: we put on the operation a placeholder attribute that is not itself a constant but represents some computation that will eventually produce a constant. From that perspective it does become a constant; it just isn't one at the start of compilation, but by the end of compilation it is.
So what does the attribute language look like? Attributes can themselves be parameterized by attributes: when you define an attribute, you say what sub-attributes or types it contains. That lets us use this system to build a tree-like structure where attributes are the computation nodes, so you build up a computation tree out of attributes that represent operations. The same mathematical expression can be represented in op form, as a bunch of ops, or equivalently as an attribute tree, and the latter is what we do.
Mojo's attribute language has three core components: types, abstractions, and extensions. Let's go over them one by one.
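To make the idea of delayed constants concrete, here is a minimal Python sketch (Python rather than MLIR's C++ API, purely for illustration). A comparison op carries its predicate as an attribute that may start out as a placeholder and is resolved to a constant later; CmpOp, PredicateAttr, ParamRef, and resolve are hypothetical names, not Mojo or MLIR APIs.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PredicateAttr:
    """A constant predicate such as "slt" or "eq" (hypothetical stand-in)."""
    name: str


@dataclass(frozen=True)
class ParamRef:
    """Placeholder naming a compile-time parameter that is supplied later."""
    param: str


Attr = Union[PredicateAttr, ParamRef]

# Execution-time meaning of each constant predicate.
_PREDICATES = {"slt": operator.lt, "eq": operator.eq, "sgt": operator.gt}


@dataclass(frozen=True)
class CmpOp:
    """A comparison op whose predicate slot is filled by an attribute."""
    predicate: Attr

    def resolve(self, params: dict) -> "CmpOp":
        # Replace a placeholder with the constant it eventually becomes.
        if isinstance(self.predicate, ParamRef):
            return CmpOp(params[self.predicate.param])
        return self

    def run(self, a: int, b: int) -> bool:
        # Execution-time logic only works once the predicate is a constant.
        if not isinstance(self.predicate, PredicateAttr):
            raise ValueError("predicate is not a constant yet")
        return _PREDICATES[self.predicate.name](a, b)


# Built before the predicate is known, resolved later.
op = CmpOp(ParamRef("pred"))
concrete = op.resolve({"pred": PredicateAttr("slt")})
print(concrete.run(1, 2))  # True
```

The op itself never changes shape; only the attribute in its slot goes from placeholder to constant, which is the "eventually constant" property the talk describes.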
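The attribute-tree idea can be sketched as follows, assuming a toy Context that uniques attributes so structurally identical subtrees are shared, loosely mirroring how MLIR uniques attributes. The expression 1 + 2 * 3 becomes a tree of add, mul, and integer nodes; Attr, Context, and the kind strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False, repr=False)
class Attr:
    """Immutable expression node, compared by identity because it is uniqued."""
    kind: str        # "int", "add", or "mul" in this sketch
    operands: tuple  # the integer payload for "int", child attributes otherwise

    def __repr__(self) -> str:
        if self.kind == "int":
            return str(self.operands[0])
        return f"{self.kind}({', '.join(repr(o) for o in self.operands)})"


class Context:
    """Hypothetical uniquing context: structurally identical attributes are
    created once and shared, loosely like attributes in an MLIR context."""

    def __init__(self) -> None:
        self._pool: dict = {}

    def get(self, kind: str, *operands) -> Attr:
        key = (kind, operands)
        if key not in self._pool:
            self._pool[key] = Attr(kind, operands)
        return self._pool[key]


ctx = Context()
two, three = ctx.get("int", 2), ctx.get("int", 3)
tree = ctx.get("add", ctx.get("int", 1), ctx.get("mul", two, three))
print(tree)                                            # add(1, mul(2, 3))
print(ctx.get("mul", two, three) is tree.operands[1])  # True
```

Uniquing is also why the talk later worries about memory: every intermediate node created during evaluation lives in the context's pool.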
Types are important. All expressions in the attribute language are typed, because types help us enforce the well-formedness of these expressions.
A leaf attribute, which is an attribute that doesn't take any other attributes, can define its own type intrinsically. For example, the string constant attribute can define its getType as returning a constant string type. Any operator attribute, which operates on other attributes, specifies the types of its operands in the attribute verifier and also specifies its own type with getType. That getType can depend on the input types if they are not fixed, or it can be constant.
And the operand types, again, are checked by the verifier on construction.
What that gets you is that every attribute you can construct is well typed. This is already powerful enough that you could use this system alone to define the grammar rules of your programming language. But it's not quite enough for us: we also want abstractions. So we have a generalized version of lambda types, which we call generator types. They're special in two ways. First, they take multiple arguments. Technically this doesn't give you any additional expressiveness, but it lets us better encode what the user is working with, and it saves a bit of RSI as well. Second, they are dependent abstractions, meaning the type of the result can depend on the input values. So you can define things like this id generator, which takes in a T, which is a type, and whose output is a T: the output depends on the input value. And since there are multiple arguments, later parameters can depend on earlier ones, so you can specify the type of a later parameter using earlier parameters. That gets you dependent types with multiple arguments, and of course the result type can depend on any of the input parameters.
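A minimal sketch of the typing rules just described, assuming hypothetical IndexAttr, StringAttr, and AddAttr classes: leaves report their type intrinsically, and the operator attribute runs a verifier on construction, so an ill-typed tree can never be built. In Mojo's compiler this lives in C++ attribute definitions; the Python below only illustrates the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Type:
    name: str


INDEX = Type("index")
STRING = Type("string")


class VerifyError(Exception):
    """Raised when an attribute is constructed with ill-typed operands."""


@dataclass(frozen=True)
class IndexAttr:
    value: int

    def get_type(self) -> Type:
        return INDEX  # a leaf attribute defines its type intrinsically


@dataclass(frozen=True)
class StringAttr:
    value: str

    def get_type(self) -> Type:
        return STRING


@dataclass(frozen=True)
class AddAttr:
    """Operator attribute: its verifier checks operand types on construction."""
    lhs: object
    rhs: object

    def __post_init__(self) -> None:
        for operand in (self.lhs, self.rhs):
            if operand.get_type() != INDEX:
                raise VerifyError(f"expected an index operand, got {operand.get_type().name}")

    def get_type(self) -> Type:
        return INDEX  # fixed here; in general it may be computed from operand types


ok = AddAttr(IndexAttr(1), AddAttr(IndexAttr(2), IndexAttr(3)))
print(ok.get_type().name)  # index
try:
    AddAttr(IndexAttr(1), StringAttr("x"))
except VerifyError as err:
    print(err)  # expected an index operand, got string
```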
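Dependent parameter lists can be illustrated with a small checker, under the simplifying assumption that type names are plain strings whose own type is AnyType. Each parameter may refer to the value of an earlier argument, and the result type may refer to any argument; GeneratorType, Ref, and instantiate are hypothetical names, and only one level of nesting is modeled here.

```python
from dataclasses import dataclass

ANY_TYPE = "AnyType"  # the type whose values are types


@dataclass(frozen=True)
class Ref:
    """Refers to the value of an earlier argument of the same generator."""
    index: int


@dataclass(frozen=True)
class GeneratorType:
    params: tuple   # each entry: a type name, or a Ref to an earlier argument's value
    result: object  # a type name, or a Ref


def type_of(value) -> str:
    # Simplified value model (an assumption of this sketch): strings are type
    # names and therefore have type AnyType; other values use their Python type.
    return ANY_TYPE if isinstance(value, str) else type(value).__name__


def resolve(t, args, visible: int):
    """Turn a Ref into the referenced argument value; only the first
    `visible` arguments are in scope at this point."""
    if isinstance(t, Ref):
        if t.index >= visible:
            raise TypeError(f"Ref({t.index}) refers to an argument that is not bound yet")
        return args[t.index]
    return t


def instantiate(gen: GeneratorType, args: list):
    """Check args left to right against a dependent parameter list and
    return the instantiated result type."""
    if len(args) != len(gen.params):
        raise TypeError("arity mismatch")
    for pos, (param, arg) in enumerate(zip(gen.params, args)):
        expected = resolve(param, args, pos)
        if type_of(arg) != expected:
            raise TypeError(f"argument {pos}: expected {expected}, got {type_of(arg)}")
    return resolve(gen.result, args, len(args))


# id: (T: AnyType, x: T) -> T
ID_TYPE = GeneratorType(params=(ANY_TYPE, Ref(0)), result=Ref(0))
print(instantiate(ID_TYPE, ["int", 5]))  # int
```

Checking left to right is what makes "later parameters depend on earlier ones" well defined: when argument i is checked, only arguments 0 to i-1 are visible.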
Now that you have these dependencies, you have to somehow express them, right?
You could express them using names, but that's not a very stable encoding; you would have to deal with name clashes all the time. So we use de Bruijn indices to represent these argument references. Each de Bruijn reference is an attribute that has a depth, which is how many levels of generator types you're looking out of, exactly what de Bruijn indices represent. On top of that, it has an index, because each generator type has multiple arguments, and of course the type of the thing being referenced. This gives a unique representation, so variable names don't matter when they don't need to.
One thing to watch out for is that the depth matters because you might have nested generator types. In this case there are two levels of nesting, so the (1, 0) here jumps out one level of generator types and refers to the zeroth argument of the enclosing generator type.
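The (depth, index) scheme can be sketched as a conversion from named, multi-argument lambdas to nameless references. It shows two properties from the talk: alpha-equivalent terms end up identical, and a reference to an enclosing generator's argument gets depth 1. Var, Lam, Gen, Ref, and to_de_bruijn are hypothetical, and the type carried by each reference is omitted.

```python
from dataclasses import dataclass


# Named terms, as a user might write them.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    params: tuple  # several arguments per abstraction, like a generator
    body: object


# Nameless terms.
@dataclass(frozen=True)
class Ref:
    depth: int  # how many enclosing abstractions to jump out of
    index: int  # which argument of that abstraction


@dataclass(frozen=True)
class Gen:
    arity: int
    body: object


def to_de_bruijn(term, scopes: tuple = ()):
    """Convert a named term; scopes[0] is the innermost parameter list."""
    if isinstance(term, Var):
        for depth, params in enumerate(scopes):
            if term.name in params:
                return Ref(depth, params.index(term.name))
        raise NameError(f"unbound variable {term.name!r}")
    if isinstance(term, Lam):
        return Gen(len(term.params), to_de_bruijn(term.body, (term.params,) + scopes))
    raise TypeError(f"unsupported term {term!r}")


outer = Lam(("a", "b"), Lam(("c",), Var("a")))
renamed = Lam(("x", "y"), Lam(("z",), Var("x")))
print(to_de_bruijn(outer))  # Gen(arity=2, body=Gen(arity=1, body=Ref(depth=1, index=0)))
print(to_de_bruijn(outer) == to_de_bruijn(renamed))  # True
```

Searching scopes innermost-first also handles shadowing: an inner parameter with the same name as an outer one resolves to depth 0.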
So what are some ways to create these abstractions? A simple way is...
>> Is it acceptable to jump in with a question?
>> Sure.
>> Because I've always been confused reading those indices when I read the MLIR, the IR. Could you briefly describe what the actual indexing scheme is? I couldn't puzzle it out just staring at it, but I'm sure it's trivial.
>> The indexing scheme.
>> Yeah, the two coordinates there: what does each of those coordinates represent?
>> So the first one is the depth and the second one is the index, right? So...
>> Depth and index.
>> Yeah. So if you think of indices, right?
The depth tells you how many abstractions you're jumping out of. Zero means the closest generator type, and one means the generator type outside of that,
>> Right.
>> and you keep going. So the first coordinate tells you exactly which generator type you're looking at, and the second coordinate tells you which argument of that generator type you're locating.
>> Nice. Okay, cool. Thank you.
>> Sure.
Yeah. The same scheme is used when you create generator attributes, which are the simplest way to create a value of generator type. If you squint, you can see a lambda here: you specify the body expression as an attribute, the input parameter types, and the result type, and it all gets packaged into a generator attribute. Looking at the id example again, the type is a generator type that takes in a type and a value of that type and produces a value of that type, and the value tells you what it actually produces, which is the value of that type. That's basically it.
Yeah, Connor?
>> What distinguishes that? How would you spell referring to the type itself versus referring to an instance of that type? Maybe that's not in scope for the simplified example, but...
>> Oh, actually, I think this might be a typo, because this is supposed to refer to the instance of that type in the value, but here these represent the types.
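A generator attribute bundles parameter types, a result type, and a body. The sketch below encodes the id example, where (0, 0) is the value T and (0, 1) is the value x, and applies it by substituting depth-0 references; GeneratorAttr and apply are hypothetical names, and nested generators are deliberately out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    depth: int
    index: int


@dataclass(frozen=True)
class GeneratorAttr:
    param_types: tuple   # types of the inputs; may reference earlier inputs
    result_type: object  # may reference the inputs
    body: object         # expression that produces the value


def substitute(expr, args):
    """Replace depth-0 references with arguments (flat bodies only)."""
    if isinstance(expr, Ref):
        if expr.depth != 0:
            raise ValueError("nested generators are out of scope for this sketch")
        return args[expr.index]
    return expr


def apply(gen: GeneratorAttr, args: list):
    """Return (value, result type) for gen applied to args."""
    if len(args) != len(gen.param_types):
        raise TypeError("arity mismatch")
    return substitute(gen.body, args), substitute(gen.result_type, args)


# id: (T: AnyType, x: T) -> T, with body x. References always point to values:
# (0, 0) is the value T, (0, 1) is the value x.
IDENTITY = GeneratorAttr(
    param_types=("AnyType", Ref(0, 0)),
    result_type=Ref(0, 0),
    body=Ref(0, 1),
)

print(apply(IDENTITY, ["Int", 42]))  # (42, 'Int')
```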
Does that help?
>> Well, it sounded like maybe you were saying the example is wrong, so maybe I'm more confused. It's okay if this is just meant to be conceptual, because I know the actual IR is probably much more complicated than what you can show on a slide, but...
>> Yeah, this is a very dumb example. Think of the first argument: it has type AnyType, which means the first argument is a type. If you go to the next argument, its type is specified by the value of the first argument. The value of the first argument is a type, and that describes the second argument, which is what this is saying. So this is a list of types: the first is a type, the second is of that type, and the result is the same as what you took in.
>> Yeah, but I guess the point is that in this context these de Bruijn indices are referring to, like...
I'm trying to wrap my brain around this. They're pointing to the T, not the AnyType, or they're pointing to the x, not the T, in the second parameter.
>> That's right. They're always pointing to values.
>> Yeah. Okay.
>> Except Yeah.
>> Yeah. Except there's a typo.
>> Yeah. Okay. All right. Thanks.
>> Okay.
So now that we have typed expressions and abstractions, you technically have a programming language. But of course that's not fun: if all you can write is generator types, everything is going to be incredibly verbose. So we obviously want people to define their own types and extend the system, and being in MLIR, we use dialects. A dialect can define its own custom types, and its own type constructors or operators on those types as attributes, and they all work within the same system because MLIR allows them to. For example, we take the index type from upstream, at least for now, and we use the upstream IntegerAttr to represent constants of the index type.
Locally, we have our own internal dialect that has the SIMD type, a SIMD attribute to create constants of the SIMD type, and operator attributes that operate on attributes of the SIMD type,
such as a SIMD comparison. With this, you can build out basically whatever you want.
Now that we have the language, we have to talk about evaluation, because we do want to actually evaluate all of this at comptime.
Let's do a little exercise. Say you want to fold 1 + 2 * 3, which in a pseudo-MLIR syntax is add(1, mul(2, 3)). We want to reduce this to a constant IntegerAttr of 7. How do you do that?
A simple thing you can do is a quick check in the constructors of these attributes: if both operands are already IntegerAttrs, which are constants, then I don't have to create an operator attribute at all. I save a bit of MLIR context memory and a bit of uniquing, and I can produce a constant attribute immediately, without the intermediate one. That works: the mul attribute of 2 and 3 produces a 6, that gets bubbled into the add attribute's constructor, and it creates a 7 immediately. So you never have the intermediate add and mul attributes. That's neat. So is that all we need?
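A sketch of what such an extension might look like, with hypothetical SimdType, SimdAttr, and SimdCmpAttr classes standing in for the internal dialect's types and attributes (their real names and semantics are not given in the talk). The operator attribute verifies its operands and folds elementwise when both sides are constants.

```python
import operator
from dataclasses import dataclass

# Hypothetical stand-ins for types and attributes that a dialect could add.
_PREDICATES = {"lt": operator.lt, "eq": operator.eq, "gt": operator.gt}


@dataclass(frozen=True)
class SimdType:
    dtype: str
    width: int


@dataclass(frozen=True)
class SimdAttr:
    """Constant of SIMD type."""
    dtype: str
    values: tuple

    def get_type(self) -> SimdType:
        return SimdType(self.dtype, len(self.values))


@dataclass(frozen=True)
class SimdCmpAttr:
    """Operator attribute: elementwise comparison of two SIMD-typed attributes."""
    predicate: str
    lhs: object
    rhs: object

    def __post_init__(self) -> None:
        # Verifier: both operands must have the same SIMD type.
        if self.lhs.get_type() != self.rhs.get_type():
            raise TypeError("SIMD operands must have the same type")
        if self.predicate not in _PREDICATES:
            raise ValueError(f"unknown predicate {self.predicate!r}")

    def get_type(self) -> SimdType:
        return SimdType("bool", self.lhs.get_type().width)

    def fold(self):
        # Fold to a constant only when both operands are already constants.
        if not (isinstance(self.lhs, SimdAttr) and isinstance(self.rhs, SimdAttr)):
            return self
        cmp = _PREDICATES[self.predicate]
        return SimdAttr("bool", tuple(cmp(a, b) for a, b in zip(self.lhs.values, self.rhs.values)))


a = SimdAttr("int32", (1, 5, 3, 7))
b = SimdAttr("int32", (2, 5, 1, 9))
print(SimdCmpAttr("lt", a, b).fold().values)  # (True, False, False, True)
```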
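Construction-time folding can be sketched with factory functions that return a constant instead of an operator node whenever both operands are already constants. make_add, make_mul, IntAttr, and Param are hypothetical stand-ins for the attribute constructors described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntAttr:
    value: int


@dataclass(frozen=True)
class Param:
    """A value not known yet (hypothetical stand-in for a parameter reference)."""
    name: str


@dataclass(frozen=True)
class AddAttr:
    lhs: object
    rhs: object


@dataclass(frozen=True)
class MulAttr:
    lhs: object
    rhs: object


def make_add(lhs, rhs):
    # Constructor-time folding: skip the operator node when both sides are constants.
    if isinstance(lhs, IntAttr) and isinstance(rhs, IntAttr):
        return IntAttr(lhs.value + rhs.value)
    return AddAttr(lhs, rhs)


def make_mul(lhs, rhs):
    if isinstance(lhs, IntAttr) and isinstance(rhs, IntAttr):
        return IntAttr(lhs.value * rhs.value)
    return MulAttr(lhs, rhs)


# add(1, mul(2, 3)): the 6 is produced inside make_mul and bubbles into make_add.
print(make_add(IntAttr(1), make_mul(IntAttr(2), IntAttr(3))))  # IntAttr(value=7)
print(make_add(IntAttr(1), Param("n")))  # AddAttr(lhs=IntAttr(value=1), rhs=Param(name='n'))
```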
Let's see something a little bit more complicated.
Suppose you have this parametric comptime function, which again is just a generator attribute like we mentioned earlier. It takes in two index inputs and adds them together.
It's represented here, again, in pseudo-MLIR.
Say you want to fold this generator applied to 2 and 3. In pseudo-MLIR, that's how you would write it: you have the generator and the operands of the generator.
So how would you do this?
You're a bit late for the constructor rewrite, because the generator body has already been written: it is an add of (0, 0) and (0, 1).
So there's not much you can do at construction time; you need a bit of replacing. Normally you would reach for an AttrTypeReplacer, a fairly low-level utility in upstream MLIR. Every time you see a parameter index reference like (0, 0) or (0, 1), you replace it with the corresponding input, 2 or 3, based on the index. When the replacer is done sticking them in, it runs the add attribute constructor again, this time with 2 and 3, so the construction-time folding kicks in again and you get 5 out of it.
So it seems like we're good.
There's a little complication with MLIR's AttrTypeReplacer. It's very powerful: it's generic over all kinds of attributes, so you can walk attributes generically. It's also very efficient, because it caches the replacements.
If your replacer ever says "replace this attribute with that attribute", it remembers that, and the next time it sees the first attribute it immediately replaces it with whatever your replacement returned the first time. It's like memoizing your replacement lambda. That's not great for us, because our indices are relative: the green (0, 0) points to the inner index argument, while the brown (0, 0) points to the outer index argument. When you're replacing the outer index, you don't want to replace the green (0, 0) on the inside; you only want to replace the brown one.
That's not going to work if you just use the raw AttrTypeReplacer. So we have our own custom AttrTypeReplacer that is depth-aware: it tracks how many scopes you have entered, and the cache is keyed on the depth. As you walk the tree, the depth changes, and you register cache entries together with the depth, so the next time you look something up you have to be at the right depth to retrieve it. This is very important for us, because roughly 30% of replacements actually hit the cache. If we didn't use caches at all, compile time would blow up.
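The substitute-then-refold step might look like this minimal sketch. A hand-written recursive walk stands in for MLIR's generic AttrTypeReplacer: it replaces depth-0 references with the arguments and rebuilds each parent through its folding constructor, so add((0, 0), (0, 1)) applied to 2 and 3 collapses to 5. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntAttr:
    value: int


@dataclass(frozen=True)
class Ref:
    depth: int
    index: int


@dataclass(frozen=True)
class AddAttr:
    lhs: object
    rhs: object


def make_add(lhs, rhs):
    # Construction-time folding: only build an AddAttr if something is symbolic.
    if isinstance(lhs, IntAttr) and isinstance(rhs, IntAttr):
        return IntAttr(lhs.value + rhs.value)
    return AddAttr(lhs, rhs)


def replace(attr, args):
    """Substitute depth-0 references, rebuilding parents through their
    constructors so that folding runs again on the new operands."""
    if isinstance(attr, Ref) and attr.depth == 0:
        return args[attr.index]
    if isinstance(attr, AddAttr):
        return make_add(replace(attr.lhs, args), replace(attr.rhs, args))
    return attr  # constants and anything this sketch does not model


# Generator body add((0, 0), (0, 1)) stays symbolic at construction time.
body = make_add(Ref(0, 0), Ref(0, 1))
print(body)                                     # AddAttr(lhs=Ref(depth=0, index=0), rhs=Ref(depth=0, index=1))
print(replace(body, [IntAttr(2), IntAttr(3)]))  # IntAttr(value=5)
```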
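The cache problem and its fix can be reproduced in a few lines. This sketch memoizes replacements either by attribute alone or by (attribute, depth); with the former, a cached (0, 0) from the outer scope is wrongly reused inside a nested generator. It assumes the outer generator is closed and the arguments are constants, so no index shifting is needed; Ref, Add, Gen, and substitute are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    depth: int
    index: int


@dataclass(frozen=True)
class Add:
    lhs: object
    rhs: object


@dataclass(frozen=True)
class Gen:
    arity: int
    body: object


def substitute(attr, args, depth_aware=True):
    """Replace references to the outermost generator's arguments with args.

    The cache is keyed by (attr, depth) when depth_aware is True and by attr
    alone otherwise; the latter reproduces the bug described in the talk."""
    cache = {}

    def walk(a, depth):
        key = (a, depth) if depth_aware else a
        if key in cache:
            return cache[key]
        if isinstance(a, Ref):
            # A reference targets the outer generator when its depth equals
            # the number of scopes entered so far.
            result = args[a.index] if a.depth == depth else a
        elif isinstance(a, Add):
            result = Add(walk(a.lhs, depth), walk(a.rhs, depth))
        elif isinstance(a, Gen):
            result = Gen(a.arity, walk(a.body, depth + 1))  # entering a scope
        else:
            result = a
        cache[key] = result
        return result

    return walk(attr, 0)


# Outer body: add(x, gen(y) { add(y, x) }), where x is (0, 0) at the top level
# and (1, 0) inside the nested generator.
body = Add(Ref(0, 0), Gen(1, Add(Ref(0, 0), Ref(1, 0))))
print(substitute(body, [10]))                     # inner (0, 0) is left alone
print(substitute(body, [10], depth_aware=False))  # inner (0, 0) hit a stale cache entry
```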
Another thing our custom replacer needs is to be context-aware, because not everything is internal to the expression tree. For example, we allow the expression tree to invoke functions that the user writes, which are not part of the tree: a user-written function is just an op somewhere. When you're replacing an attribute, you normally don't have access to that information. So one additional thing we provide as part of the evaluation infrastructure is an evaluation context, which gives you an interface to look up information that's external to the attribute tree. In this case, you have a reference to ceildiv in the expression tree, and the evaluation context provides whatever hook you need to look inside ceildiv.
Another use case is witness tables. If a type conforms to a trait, the way it conforms to that trait is encoded in a witness table, which is an op.
Without a context lookup, you wouldn't be able to access any of that witness table. So, as I just mentioned, you can look up functions and type definitions via the evaluation context. All you need to do to access it is implement an additional attribute interface on your attribute and implement the evaluate-with-context hook, and that hook is called for you when folding is needed.
This is also very important for us because, as you can imagine, we don't want to encode all the ops in the attribute tree. That would be incredibly inefficient, because anything you put in an attribute gets uniqued and interned. It also lets you actually call functions: without it, you wouldn't be able to call functions or look up type definitions. A colleague gave a talk about the comptime interpreter at EuroLLVM, and that session is recorded, so you can probably find the video online.
Okay. So how well does this whole thing work? Because everything is an attribute, and, I don't know, maybe we're crazy.
The first thing I was worried about is compile time. For this I took a matmul test from our repo, one of the newer ones that uses layout tensors (I guess it's not new anymore), which has a lot of comptime computation going on, and this is what we got: when building the test, roughly 17% of the time is spent on folding, the majority of it substitution-style folding.
To me this was actually a pleasant surprise; I thought it would be a lot bigger. 17% isn't trivial, but it's also not significant enough that we need to fix it now. So in terms of compile time, it's actually not that bad. Now you might ask: what's the cost of creating all these attributes? With all this substitution, you have all these intermediate attributes.
Does that blow up your memory usage?
For the same test, the measured attribute storage is only 7% of all the memory we used during compilation, which again is not that bad.
There are multiple ways you can read into this. If you look at the absolute numbers, you might say our entire compiler's memory usage is way too big, and it would be good to shrink other parts until the attribute part becomes significant. But at least for now, the memory we use for these attributes and types is not a big worry for us.
So, ending on a happy note, that's all of the presentation, and we can open it up for questions.
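A sketch of the evaluation-context idea, assuming a hypothetical lookup_function interface and an evaluate_with_context method modeled on the hook described above: the attribute tree holds only a symbolic reference to ceildiv, and the context resolves it to code that lives outside the tree. The real interface names in Mojo's compiler are not shown in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class EvaluationContext(Protocol):
    """Interface for information that lives outside the attribute tree."""

    def lookup_function(self, name: str) -> Callable[..., int]: ...


class ModuleContext:
    """Hypothetical context backed by user functions defined elsewhere
    (in the compiler these would be ops in the module, not attributes)."""

    def __init__(self, functions: Dict[str, Callable[..., int]]) -> None:
        self._functions = functions

    def lookup_function(self, name: str) -> Callable[..., int]:
        return self._functions[name]


@dataclass(frozen=True)
class IntAttr:
    value: int


@dataclass(frozen=True)
class CallAttr:
    """Attribute holding only a symbolic reference to a user function."""
    callee: str
    args: tuple

    def evaluate_with_context(self, ctx: EvaluationContext):
        if not all(isinstance(a, IntAttr) for a in self.args):
            return self  # some argument is still symbolic; cannot fold yet
        fn = ctx.lookup_function(self.callee)
        return IntAttr(fn(*(a.value for a in self.args)))


def ceildiv(a: int, b: int) -> int:
    """A user-written function that is not part of the attribute tree."""
    return -(-a // b)


ctx = ModuleContext({"ceildiv": ceildiv})
print(CallAttr("ceildiv", (IntAttr(7), IntAttr(2))).evaluate_with_context(ctx))  # IntAttr(value=4)
```

Keeping the function body out of the tree is the design point the talk stresses: the tree stays small and uniquing stays cheap, while the context supplies the rest on demand.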