This research addresses the removal of NF (non-freezing undefined behavior) from LLVM, which currently causes 18% of miscompilations in the LLVM test suite. The solution introduces two key innovations: (1) a new byte type that represents raw memory bits in SSA registers with per-bit poison tracking, enabling proper handling of raw memory copies and bit fields; and (2) a freezing load instruction that prevents poison spread when loading bit fields. Together, these contributions allow changing initialized memory to poison while maintaining ABI compatibility for bit field encodings. The approach fixes 11 previously failing tests with negligible performance impact (average 0.8% runtime regression), demonstrating that removing NF is both technically feasible and practically beneficial for compiler optimization.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
[PLDI 2026] Flatirons 4 - PLDI Research Papers (Jun 18th)
Added:uh distinguished paper. Yes, >> you for the introduction.
So, thank you also everyone for coming.
Um today we're going to talk about NF, why we no longer want it in LLVM and how we can remove it.
So to give you some context, currently Alive 2 finds about 1336 miso compilations in the L LVM test suite.
25 of these tests would become correct the moment NF would vanish from LLVM which is about 18% of the total failing tests. Notice notice that these tests are only unound because NF exists even though the majority of the tests do not touch NF explicitly.
Throughout the years, the LVM community has managed to remove about 58% of total NF users. uh but currently progress has plateaued as the remaining almost all the remaining uses of NDF uh are um concentrated in loads of uninitialized memory.
So what actually is NDF? So as many optimizing compilers LLVM has two kinds of undefined behavior. The first kind is immediate undefined behavior which is reserved for operations that trap the CPU. uh deferred and undefined behavior covers all the remaining cases. So, LLVM has three uh main forms or values to represent deferred UB. So, the first form is poison as you can see there which is sticky uh and propagates like floating point NS. The second form is NF which is mostly used nowadays to model loads of an initialized memory. Uh notice that every time we read an NF value, we may get a new different value.
Freezing a poison a poison or NF value gives us a nondeterministic but fixed arbitrary value through the freeze instruction. Then at the bottom of the letters we have regular values. Notice that from these values only poison is sticky while andf is not. uh this non-stickiness of NF gives it some counterintuitive properties.
So for instance consider here this optimization where we want to transform 2 * y into y + y. So many of you here might think that this is clearly a valid optimization. However, it is actually not only because anf exists. to see why.
Remember that every time we reach an NF value, we may get a new different uh concrete value.
This means that uh in the program on the left we have a single if we consider y to bef we have a single and f read. So we get an arbitaryary value. We multiply it by two. So we get an even number.
However, in the right side we have two and reads which means that we read an f twice. The first read could be for instance two and the second read one which gives us three which is an odd number. So we we went from a program which gave us only even numbers to a program which gives us both even and odd numbers. So the refinement is clearly not valid.
So if can cause these subtle bugs and get in the way of optimizations why is it still here?
As I mentioned already, um the remaining uses of NF are almost all come from uninitialized memory.
We want to change an initialized memory to poison. However, we first must address two main blockers. The first blocker are bit fields. So, uh clang currently uses a read mask write pattern to write uh and mask bit fields. And if we change an initialized memory to poison because of poison stickiness, this pattern breaks. The second problem are raw memory copies which are currently unsound in LLVM.
Changing an initialized memory to poison would further aggravate this problem as we would get more poison values in memory. So that's not something that we want.
So let's now explore each of these problems in more detail. So the first um blocker are bit fields. So to start bit fields, clank reads the whole word, masks it and then writes it back to to memory. So in this example, we have these bit fields and to write to bitfield to bit field a we load the 32 bits then we clear uh a bits and then we store it the value back to memory.
today. This is this is correct because NF is not sticky. So because poison is sticky. If we if the original load contains a single poison bit, a single uninitialized bits, the whole load will be 32 bits of poison.
Then we end up storing this poison value to memory which overrides the existing bitfield values.
The second problem are raw memory copies. So, LLVM currently lowers small mem copies to integer load store pairs.
As you can see here, this has two main problems with using integers. So, the first problem is that pointers in LLVM uh are not mere addresses but also carry something that is called provenence. So, provenence can be taught as the capability to access a memory region. So the problem with using integers is that uh integers do not capture this provenence. So if we load a pointer through an integer type we end up discarding this provenence.
The second problem is that poison is value wise in integers with which means that uh if a single uh bit in memory is poison the whole loaded value will be poison.
This conflicts with me copy as we want to copy memory as it is without spreading poison bits.
Uh so switching an initialized memory to poison would increase the the quantity of poison bytes in memory which which would make um raw memory copies the raw memory copy problem even worse. So both these blockers boil down to the same problem which is that uh LLVM cannot represent raw memory in SSA registers.
So that's precisely our first contribution which is a new bite type that can represent raw memory bits in SSA registers. Additionally the bite type tracks poison per bit which means that a single poison bit does not to taint the loaded value.
So previously uh me copy uh intrinsics were lowered to integer load star pairs as you can see on the left and now with the bite type we preserve individual poison bits and pointer provenence.
We also added a new byte cast instruction which allows us to convert byte values to other types perhaps integers and um pointers. So the bite typing the bite cast instruction comes in two variants that type penning variant uh forces uh reinterpretation of the bits and the exact version on the right um uh returns poison if the value if the type of the value encapsulated by the bite does not match the cast destination type. So for instance consider that that b8 value uh represents a single po a single pointer bite. If we try to cast that pointer by to an integer, we get poison back.
So the by type gives us more than just a summ copy lowering lowering. Consider for instance uh this chart type in C which acts as a universal type holder.
So it can hold the raw bytes of any given object in C. Also, char pointers can alias with anything which is what allows us to write a mem copy implementation.
So on the left we see that we uh copy a single bite as we would inside the manual mem copy implementation.
Currently char lowers the um LVM lowers the the char variable the char type to integers which is unound because of provenence and uh spreading of poison bits and with the bite type we can uh preserve the raw byes without coercing them and casting them to integers.
Another application of the byte type is to fix existing optimizations. Consider for instance load widening where we merge two consecutive loads into a single one. So the original code loads uh two values from consecutive memory locations. In the middle we show the emitted IR by LLVM nowadays which widens the load and then uses bitwise operations to extract uh the two load results. Combining the load is unound because of poison. So if either of the original loads contain a single poison bit, for instance, if load one contains a poison bit and load two doesn't, the widened load will will be 16 bits of poison. So in the optimized program, the first load and the second load will be poison while in the first program we get the first load with a valid value. So this is not a valid refinement. With the bite type, we can just load the the raw memory because point poison is track per bit and then use the bitwise operations and the bite cast instruction to extract the relevant parts.
Our second contribution is a freezing load to address the bitfield issue which is equivalent to loading uh raw memory then freezing it per bit and then casting the result to an integer. So this is what bitfields need. Um, clang currently has two encodings for bit fields. It has a fine grain encoding which aims to reduce the bit width of loads and uh default encoding which loads at the word level. So these two encodings should stay API compatible and if we freeze um the loaded value per bit we avoid this poison spread. So when we write back to memory, we don't overwrite uh other possible ex existing fields.
So yeah, with both uh with the freezing load, these two encodings stay ABI compatible.
So we ran a live two over the LVM test suite with our new two contributions and found that 11 uh of the previously uh flagged tests were fixed which is about 8% of those I mentioned at the beginning and we had no new failures.
These tests include unound me copy loings and unound pointer coercions.
We also verified the compilation of real C programs with the live CC and in we found that in Draco 330 me copy lowerings were fixed uh one for especi in all the benchmarks we analyzed no new issues were reported.
We also evaluated our benchmark across 20 uh applications from the Fonx test suite as you can see here.
So here you can see two plots. On the left we have the runtime performance and on the right the compile time performance.
So uh notice that negative values here are better and the average value is indicated by the dashed line. So with our new optimizations implemented on top of the bite type uh the worst uh slowdown we have in runtime is around 4% in Cindy Johnson and the average is about 0.8%.
compile time stays uh relatively unchanged with a average regression of 0.2%.
So we analyzed the emitted bite casts um in our code and found that most of the byte casts follow a really simple pattern which is we load raw memory and then immediately cast it to another type for instance an integer.
Um there was folks uh expressed some interest in allowing loads to do the type panning which would allow us to combine this load bite cast pair into a single load to the target type.
So we benchmarked an implementation and found that 11 um 1.6% 6% of the old instructions in the old version were by casts and this folds uh change the number down to 0.1%.
So here are the results with the new folds we see that the average runtime regression reduces to 0.2% and compile time stays largely unchanged. So regarding the SIMD Johnson regression in runtime, we investigated and found that it is due to jump threading not able to combine some basic blocks that it could previously. We found out that this this is due to LLVM's range analysis not being able to operate on the new byte type only integers. So extending the analysis to operate on by types as well would resolve this regression.
Lastly, we benchmarked an implementation where we switched an initialized memory to poison and lower the bitfields through the freezing load. So the average regression and compile time impact here are about zero and the largest regression we observe was in runtime uh I believe John the reaper with around 1.5%.
We traced this uh regression back to a loop alignment heristic which reacted to some small code changes not necessarily to our new semantics.
So to conclude uh we introduced a new bite type and a freezing load. Um together they allow us to change and initialize memory to from to poison.
Also the bite type fixes some additional issues with LLVM and raw memory semantics.
uh fixing some of the LVM test suite um tests and some uh miso compilations.
Furthermore, it has a negligible performance costs. And to conclude, I'd like to mention that we started up streaming the byte type to LVM and the first patch has already landed. So, thank you Thank you.
>> Questions.
>> This is really cool stuff. When you got to the bitfield stuff, I was kind of expecting you to say you were going to load it as a bite and then add bite and an or operations rather than the approach you took. Did you consider that?
>> Yes. So the freezing load that we use for uh to load bitfields is actually a synthetic sugar for what you were mentioning. So a load of a raw memory type a load of the bite type and then a freeze an explicit freeze and then a cast to an integer. So um we believe that if we did introduce those three instructions instead of a freezing load uh the IR instruction count would increase for bitfield. So that's why we exposed the uh freezing load uh attributes.
>> Okay. Sorry, I meant if you added and and or to be things that could also take byes rather than >> Okay. Yeah. So you mean extending and and to also operate on by on the bite type.
>> Yeah, that's uh a real uh a little bit uh tough because so bytes can hold both integers and pointers. So the semantics often end and our instruction on pointers are not really defined because pointers have provenence. So we don't really have a good semantics to a concrete semantics to extend those those instructions to the right type. So that's why we didn't.
>> So strictly speaking this is semantics change of the current LLVM specification. Have you assessed any thoughts on the impact on the other project which compiled to LLM? Try to reason about semantic uh preservation and stuff like that.
>> Yeah, I'm not really familiar with uh so I can't really say much about it. So yeah, sorry.
I guess maybe more just a remark than a question, but having worked on a verification system that worked at the LM level, this is a persistent pain in my ass. So, thanks for working on this.
>> Thank you.
>> One last question.
>> Hi, uh, thanks for the talk. I if I remember correctly from the beginning you said that removing UNDEF would fix about what was it 18% of miscompilations from the test suite whereas at the end you say that your fix fixes about or with your changes about 11% of mis compilations.
>> Yes. So >> so what about the other ones?
>> Yes. So the bite type was like uh so the initial goal of the paper was to um to fix the raw memory uh issue in LLVM. So one consequence of the bite type is that we can also model the bitfield loads which is the one of the main obstacles to remove and def. So these uh tests that I mentioned uh on the bite type they are not included uh in those uh uh tests that would vanish if NF was removed. So they tackle things like unound me copy loings and unound pointer coercions. So we actually haven't removed NF yet because we need the freezing load as well and we started up streaming the bite type. So those tests are only concerned things like unound pointer coercions and raw memory accesses. So that those were are meant to be fixed with the next next patch which would tackle the mem copy luring.
So yeah in the long term those 18% would vanish. Yes.
>> Okay. Let's thank Pedro.
Can the next speaker come up? Our next speaker is Renee who will tell us about the downgrading semantics of memory safety.
Can you try speaking?
>> Yep.
>> All right. Thank you very much. So, uh this talk will be about memory safety or rather providing a a semantic framework for thinking about memory safety. It will not be about memory safety and C.
It will be about uh using ideas and techniques from secure information flow uh to define what memory safety means for small programs written in in a in a small uh memory onsafe language called noac with a flat memory hierarchy integers for pointers uh manual memory management using maloc and free. So let's look at a small example.
This small uh program allocates 128 bytes or tries to allocate 128 bytes. It then proceeds by dreferencing the the first of these allocated addresses and printing the content at this address. Um so is this program memory safe? No.
First of all the allocation might actually fail, right? So uh in that case maloc returns a null pointer and this null pointer may then be dreference leading to a null pointed dference which may then lead to a sec fault and so on and so forth. Second of all, uh even if the allocation is actually successful, the uh the output of the program still depends on on initialized memory. Of course, uh the uh the newly allocated memory may or may not be be be initialized. That actually depends on the implementation of the allocator.
So I hope you can agree that this is definitely memory unsafe. But that of course begs the question, what does it mean to be memory safe? And if we look in the literature, uh, memory safety is often, not always, but often defined as a long list of, uh, bad behaviors, errors, error classes that should be avoided, that should not occur in a memory safe program. So things like buffer overflows and use of uninitialized memory and so on. So if you want to prove that your program is memory safe, you would have to separately prove that it doesn't have any of these errors or error classes or bad behaviors. And of course, this is far from ideal. Uh, in fact, as noted by Michael Hicks back in 2014, actually this goes back way further, of course, but on his blog in in 2014, he noticed that it would be really nice to have a definition of memory safety that would rule out these errors, bad behaviors as a consequence of the definition rather than as building it into the definition.
So, let's explore that a bit further with the next example.
So this example allocates two two small buffers, eight bytes, something like that, and then compares the actual addresses returned by the allocators, right? Depending on on on those actual addresses, it'll print either a one or a zero. So this program doesn't have any buffer overflows. It doesn't use any uninitialized memory. It doesn't have any use after free and so on and so forth. So by this uh by the by the traditional definition, this would be a memory safe program.
However, the observable output from the program actually still depends on the actual allocator implementation. As I just mentioned, it actually depends on the the concrete values, the concrete addresses returned by the allocation. Uh so that means that the the the thing if we wanted to reason about the behavior of this program that would actually be really hard if we if uh if we wanted to do that without taking the allocator implementation into account. So if we wanted to reason about the behavior of the program, we also need to reason about the behavior of the allocator. And we think that uh that for a memory safe program, we shouldn't really have to to reason about the allocation. We we should be able to reason about the program at a sort of an API level instead. So that leads to the sort of the key idea of a of our paper is that a memory safe program is really a program that doesn't depend on the implementation of the allocator that it's using. Right? So how do we formalize this and how do we how do we reason about this kind of independence?
Well, that's the I think the the key insight of the paper is that we can look to the rich literature of information flow security and then borrow everyone's favorite security property non-inference. Right? Non-inference has been used for literally decades reasoning about how programs how program behavior observable program behavior is independent of any secrets uh held by the program. So we want to tap into this literature and then think of allocators or rather as rather allocator implementations as secrets and we don't want any of this secrets any of these secrets to leak into the behavior of the program. So we want to show some kind of non-inference result for programs.
All right. So how does this this look?
Again non-inference is about observable behavior that should not depend on something secrets or allocators or whatever. So on the right hand side we have a small program that does an allocation and then immediately prints the address that is returned uh by the allocator. It doesn't really matter if it's the null pointer or not. It's just it's printing the concrete value and this corresponds to a direct leak in the in the information flow security sense of the word that we have on the left hand side where the secret that is input from Alice is directly sent on to Bob.
So Alice's secret is leaked directly to Bob. Yeah.
slightly more complex example. So this is the example from before. We have uh on the on the right hand side the control flow of the program depends on the actual addresses returned by the allocator. So here we have some kind of indirect dependence. Right? This this change in control flow is observable by the program by the observable behavior of of the program. This corresponds to an indirect flow in the information security world where on the left hand side again we have the the um the control flow of the program actually depends on the on the values of the secrets and this change in control flow is observable by uh an attacker by just looking at the output right so finally a non-example because uh now we have on the right hand side it's a slight change from the program before I don't know if you if you if you notice but now this program on the right hand side it prints one. No matter what is returned, no matter what the result is of this comparison of the addresses, the observable behavior is the same. And of course, we want our our notion of memory safety to be uh precise enough to actually capture this as a memory safe program, right? Um so this is what what makes our definition semantic. uh also corresponding to to uh to exactly the same semantic notion on the information flow side where again yes technically the the control flow depends on the secret values but the observable observable behavior actually doesn't. So we consider this to be a secure program and a memory safe program respectively.
All right. So now that I have uh hopefully convinced you that the that allocator independence is is quite useful and probably a good idea for talking about memory safety. Let's see where it doesn't work anymore. So on the program on the on the top right hand side, we have a program that allocates probably a large buffer or something like that. And of course, this may fail uh just like any any memory allocation.
But this program does the right thing.
It checks whether or not whether or not the allocation fails. In that case, it prints an error message. Otherwise, it prints okay and and is is on its merry way. So this is actually this is obviously not an an allocator independent program because depending on the concrete values returned by an allocator it will either print an error message or or or it'll print okay.
However we consider this to be a memory safe program. We really like for our definition of memory safety to also incorporate this include this kind of programs because we don't think of um out of memory as necessarily a memory safety issue especially not if it's handled correctly in the program. So how do we how do we weaken our notion of non-inference just enough to actually capture these programs but not any of the unsafe programs. Another example is the the the lower one on the right hand side. A small program that again it allocates some memory. Then it casts the pointer to an integer and prints the integer. Clearly not an allocator independent program. However, um if you're programming low-level systems like device drivers or for embedded systems, you very often have to work explicitly with specific addresses, right? So maybe maybe you're doing memory mapped IO or you're you're setting up the interrupt tables or or whatever. So again, we would really like we would like for for programmers to be able to slowly gradually in a controlled manner break the this independence. So make the the the program um enable the programmer to say yes, I know I'm introducing a dependence on an allocator. However, I'm I know what I'm doing. So so go away. um and of course still have still have yeah that's naive I know but but at least we should allow them um but still retain memory safety allocated independence for the rest of the program. So how do we how do we manage this kind of weakening of our property? Well uh again we look to to the information flow security community and look at the the concept of declassification or downgrading.
So non-inference says that a program's observable observable behavior should not depend on on any secrets unless they have been declassified because then they are no longer secrets of course. So this idea of of managing and and controlling when and where to to actually declassify and of course there's a lot more to declassification. I'll not go into that.
But the idea here is that we're going to consider the cast to be a declassification operator. So this allows the programmer at least to manage when and where uh to to uh to weaken the the memory safety guarantees and sort of in the same way also uh for for the uh for the out of memory.
Right. So with all of these puzzles puzzle pieces in place I'll introduce the the the the note language in a slightly more formal way. So it's a small while language. It doesn't contain any real surprises. is C like of course it is not C because we we we don't have any undefined behavior as such and we don't want to get into that discussion at all that's not what what this is about but uh but but provide a framework for talking about memory safety in a slightly wider context so the only thing we've added an an observe this is just a way of encoding the observable behavior so this generalizes the the prints and the outputs and network communication whatever you need cast is basically a noop is a no operator and this is is actually also the same in in the information flow security community. A declassification operator it's useful for for indicating this is where it happens. This is where where where stuff happens. But it is a no-op in a semantic sense.
Our language is of course parameterized on an allocator. Allocators define the semantics or implement the the semantics for free maloc and then the the null pointer the exception pointer right. Um so what is an allocator in our setup? It it consists of five things. It has an abstract allocator state. We don't really require anything of this state or have any constraints for it. It's up to the allocator to decide what to use it for. We just uh make sure to actually thread it throughout the the semantics so that it's available. And then of course it defines a null address malo and free and also an initialization that is only run once at at the beginning of the program to set up the initial allocator state.
Okay. uh semantics nothing too exciting I'll not go into into any depth with that just note that the the semantic step relation of course depends on an active allocator that's the alpha on the left hand side of the turn style we thread the as I said the allocator state that would be the capital a and the a prime and finally the running the semantics generates an event trace events are the things that might happen during execution that are of relevance to our memory safety so that would be all of the all of the normal memory uh memory management instructions. Do note that there is an M fail which corresponds to okay I've tried to do a maloc but it failed. I I tried to allocate n size of n memory allocation of size n but it failed. So we want to to keep track of that of course and the the ops v is of course sort of the uh the event corresponding to an observe instruction right so this these are the externally observable uh uh behav this is the externally observable behavior of the program so running a program gives rise to an event trace and this is really what what is at the heart of our our formalization so how do we formalize this kind of non-inference I'll not go into a lot of dep depth with that, but we'll be more than happy to talk about this after after the the presentation. We use an epistemic approach. So that kind of turns the idea of of thinking about non-inference on its head if you if you're not used to it. So instead of thinking of what can be observed, we're actually thinking about what what does an attacker know, what does the what is the precision of the knowledge that an attacker may have about the system. In our in in our case what we are thinking of is what is the set of allocators that would give rise to the same event trace for this program. So basically these are the this is the set of allocators that we can't really distinguish based on the traces that we've seen so far. Yeah. So the idea for memory safety would then be this set of allocators should be if not constant it it certainly shouldn't uh become smaller over redu over over execution time. Right? So the more we observe it it it should stay the same or or certainly not grow smaller because if we can remove an allocator from this set it means we've actually introduced a dependency because now we know that it cannot no longer be this this allocator.
Formally we define the allocator impact.
This is the uh the blackboard a for a command C the for a program C that generates a trace t. So imagine we've run the program C and we've observed the the event trace T. uh now the allocator impact is the set of allocators that would have given rise to exactly the same trace basically okay so now as I said the idea is we want to make sure that this doesn't really change that's fine but we also need to take care of uh of our downgrading operation so in this case the cast operation in order to do that we rely on some some fairly uh uh uh yeah sort of a weakening of the idea of an allocator of the allocator impact.
So we're going to define this uh blackboard a cast arrow whatever uh to mean more or less the same as the allocator impact except that it's the set of allocators that would give rise to the same trace but also furthermore allows for a further cast operation for a further cast instruction. We make no requirements of what are the addresses that that that get gets cast. So what what are the specific values used but it should be possible to do a cast after after having produced the trace t.
Same goes for for the allocator. So how do we use these uh these allocator sets to actually define the non-inference?
Well um if we if we start from the from the bottom that's probably the easiest. So uh the blackboard a so the allocator impact for the trace t eta right. So this is this is the the the set of allocators that are able to actually produce the the trace t combined or or followed by by an a single eta event. This should be the same as the set of allocators that could produce the the the trace t. So what that means is that even though we've seen yet another event and we we've observed something about the program this eta we can't actually remove any of the allocators from the allocator impact sets meaning we haven't introduced accidentally or or on purpose any dependencies on any of the allocators in this set. Yeah. Uh for the cast operation this is slightly different because we can't we can't um we have to take a slightly weaker uh definition. So what we're requiring here is that the allocators that have given rise to the current set of traces must be part of again this this progress impact forecast which says that yes it's the same allocators but all of the allocators in in this set must also be able so all of the allocators that have been able to produce the trace t must also be able to produce some kind of cast operation. Again, we make no requirements on the values that are that are uh actually cast or the addresses that actually are are cast, but it they must be able to do a cast operation. Uh I'm running out of time, so I'll just uh skip this just briefly mention that uh we've used as a use case for for this uh this idea. We've shown that uh we can translate the the memsafe language of amarim pritu and pierce from their post 2018 paper uh to into our no tag language and preserve memory safety. So mems safe language is of course a language of memory safe programs. So this translation actually satisfies our gradual allocator independence i.e e memory safe and in fact the translation of memory programs that are not stock so not error errors out on in some way they will either be out of memory or they will succeed with the same memory I think that's all I have there's a lot of questions but I'm afraid we only have time for one or two quick questions could you get set up >> thank Um, I guess, uh, it'll be really quick.
Hi, I'm McKenna call at Colorado State.
Uh, I'm just kind of curious about the maybe a philosophical question. Why think about this from a confidentiality perspective instead of integrity?
>> That's a really good question. I think >> it feels more like an integrity condition to me.
>> That's funny. I think more of it as a confidential confidentiality because actually we we don't want any information about the the allocator implementation to leak into the program.
So to me that's that's that's confidentiality but I think we could equally have formulated as integrity actually.
>> Sure. Okay. Yeah.
>> Thank you.
>> Sorry.
>> Your allocator as it's currently formalized. Hi I'm over here.
>> Hi >> Mayano Princeton.
>> Yes.
>> Um your allocator as it's currently formalized has a very strong like Malik style API. Yeah.
>> Most allocation that you see in C or lowle code is just immediate values. I'm going to stick something on the stack and now it's memory. in your formalization, if you were to extend that to handle immediate values and then casting thereof into memory, would those all just be part of the allocator or would those require a different kind of formalization?
>> So, we haven't actually looked at at stack memory or or indeed at the sort of soft object memory safety. Um, I think stack is slightly different. We need a different setup for it. I think the the basic idea is the same but we need to identify more more places to apply this declassification because actually I I think there is something when we copy things into the the stack that would correspond to some kind of declassification operator but this is very much work in progress.
>> Okay, let's uh thank Renee and take a question also.
So our next speaker will be Anna uh who will be telling us about causality and semantic separation.
Hello. Okay. Um thank you. Uh I'm Anna.
I recently finished my master's in CS at MIT and today I'll be talking about a semantic characterization of desparation.
So this work lies at the intersection of programming languages and causal inference. Um our goal is to give the standard graph analysis in causal inference which is called desparation a formal semantic characterization.
um analogous to how program analyses are justified by semantics in PL.
So this is important because a formal semantic characterization can explain precisely uh what property desparation is actually capturing and why its graph rules are correct.
Uh let me first give a brief intro to causal inference. So it is the framework used to reason about cause and effect relationships and to design experiments.
Um so it studies whether variables are causally related and uh whether an experiment could identify those relationships perhaps after conditioning on other variables.
The key object here is a causal model which is a DAG describing how variables could potentially influence one another.
Um so in the example on the right we have the node prior knowledge being able to um affect study time as well as test score. And we might wonder whether the amount of extracurriculars that one participates in can have a direct effect on test score after maybe we condition on study time and prior knowledge.
Now desparation is a purely graph structural criterion and it determines exactly when variables can influence each other after conditioning on other variables. And so this question could be rephrased to are extracurriculars and test score desparated conditioned on study time and prior knowledge. And this is equivalent to the widely known notion of conditional independence.
So from a PL perspective, um desparation looks a lot like a static analysis. Uh it determines what influences what based on graph structure alone. Um however, it's lacking a formal semantic characterization.
And so we introduce semantic separation.
Um we prove that it is equivalent to desparation and we fully mechanize the whole development in rock. Um this lets us interpret deparation using concrete behavior rather than graph structure alone. It also provides an experimental interpretation of the predictions given by desparation which we'll see um towards the end of the talk.
So now I will uh describe how to determine deparation. Uh the basic idea is that influence propagates along paths in a graph and conditioning can either block or open those paths and then two nodes are desparated conditioned on a set of nodes if all paths between the two nodes are blocked by nodes in the conditioning set. And again this is commonly used in uh causal inference uh in experiment design to determine what variables need to be controlled for.
So our example here contains the three ways in which a path of three nodes could be oriented. Um and as I go through these three rules try to notice how they are asymmetric and as a result may be unintuitive which motivates our semantic definition.
So in the first case um we have influence flowing through S and so conditioning on S blocks that influence and blocks the path. In the second rule um we have a common cause of S and T and so conditioning on that common cause also blocks the path.
In the third case, which is the unusual one, um S here is called a collider and S actually blocks the path by default.
But then conditioning on S creates a dependency between its parents. Um this third rule is also more complicated than what I will put on the slide. It's also the hardest to explain semantically. Um so much of the subtlety of our semantic definition comes from trying to explain this behavior.
Now going back to our example, if we condition on study and prior knowledge, we can use these rules to determine whether extracurriculars and test score are desparated.
So by the first rule, the path E to S to T is blocked. And by the second rule, the path E to S to P to T is blocked.
And so E and T are desparated conditioned on S and P.
Now what's interesting is these three rules um completely determine desparation. And um you might be thinking it's not super obvious why they correspond to causal influence in the underlying graph structure. And so we found ourselves asking um why are these graph rules correct and what semantic property are they actually capturing?
So we set out to define a semantic version of deparation.
Uh at first glance it might seem natural to define the semantics using probability distributions since desparation is often presented via um its connection to conditional independence.
However, desparation itself is um purely graph theoretic. So whether two nodes are desparated depends only on graph structure. And we therefore thought that our semantic definition should also depend only on structure and not on any particular distribution.
So far we've been treating causal models as graphs. Um but now we need a semantic representation for those graphs and a way for nodes to obtain their values.
Fortunately, we can use the standard model in causal inference which is called a structural causal model. Um, think of each node as having an associated function which computes the value of that node from the values of its parents. And so, for example, um, the function for study or s could look something like this.
But this model as I presented it is a little too naive because um in practice we can never realistically model everything that affects a variable. So for example um study could maybe also be affected by course load or stress level or other factors that we did not record.
And so the model introduces an additional um input to every node's function. This is called an unobserved term and they're shown here in gray uh with an arrow to the specific node that it corresponds to. And so the function for study could additionally rely on its unobserved term which is u of s.
Now notice that um once we fix an assignment of unobserved terms uh the entire model becomes deterministic. So every node's value is fully determined by just repeatedly applying the functions throughout the graph.
So then we can think of a world as just a particular assignment of those unobserved terms. Um and by changing those assignments, we can compare and reason about different worlds.
So we're now going to define uh the semantics of two basic operations that will be useful in our semantic definition. First uh what does it mean to condition on a set of nodes? So it simply means we'll restrict our attention to um worlds where the values of those nodes match their target values. For example um in this graph if we think of prior knowledge as being measured as uh the amount of material that one already knows on a scale from 0 to one.
Then if we want to condition on prior knowledge equaling say 0.2 2 then any world that properly conditions on P is just one in which f of p equals2 and in the case of this graph that depends only on u of p.
Now we need the notion of changing a variable. So we say to perturb a node means to modify the unobserved terms in a controlled way to change the value of that node and then observe how those changes propagate through the graph. So for example, if we want to change the value of extracurriculars maybe from three to four, um we would start with modifying its unobserved term changing the value that we want and then this can propagate and affect downstream variables uh study and then test score.
So recall that earlier we were interested in if we condition on study and prior knowledge whether extracurriculars and test score are desparated.
Um, now that we've defined the semantics of conditioning and perturbation, we have a way to ask a more semantically precise question that kind of sounds like the desparation of two nodes, which is um, if we perturb the first node, in this case extracurriculars, while maintaining conditioning on the condition set, can we force a change in the second node, in this case, test score?
So we'll now attempt to define the semantic separation of two nodes X and Y. Um the simplest idea is change the unobserved terms in a way that affects X and see whether Y changes. Um but this is too permissive because it allows for changes that are unrelated to the causal structure. Uh for example, say X and Y are completely disconnected nodes. we would affect we would expect for X to never be able to affect Y. Um but if we happen to change the unobserved terms of both X and Y then um Y can be affected by a change having nothing to do with X.
So we need to restrict our attention to the unobserved terms of nodes that can actually affect X. Uh which leads us to think about ancestors since uh recall that a node's value is computed using the values of its parents. Um so we will restrict changes to what we call unblocked ancestors of X which are ancestors that have a path to X that does not go through a conditioned node.
This captures causal structure better but it fails in collider settings where conditioning can create dependencies that are not captured by ancestry.
So for example um if this middle node here w is the exor of its two parents and we want to condition on w equaling one then we would expect for x and y to not be semantically separated because say if we wanted to change the value of x to zero the value of y would have to change to one.
But this is not captured by the unblocked ancestors method because we can only change ancestors of x. And so if we change the value of x to zero using its ancestors um that will propagate and the value of w is now no longer properly conditioned. So the whole system doesn't make sense.
We might then attempt to enforce an additional repair step where we allow the changes to propagate which could affect the conditioned variables and then we allow for an additional step of repair where we make the required changes to recondition those variables.
Um here's an example of when we might actually need repair. So um suppose we're interested in whether a drug affects reaction time and we know that this drug affects blood pressure which affects reaction time. And so in order to isolate the effect that the drug has on reaction time directly, we condition on blood pressure.
Then once we administer the drug, we could affect a subject's reaction time.
So we might use some other common well-known drug for blood pressure to counteract that effect. So that's us propagating the change that the drug made and repairing the value of the conditioned variable.
But this still doesn't quite work in the case that there are multiple condition nodes where one repair step might not be sufficient. Um for example, suppose we have another node sleep. We know that the blood pressure drug causes extreme sleepiness and sleep also affects reaction time. Well, then we also need to repair sleep. And you could of course imagine we had several more nodes that are conditioned on and would need to be repaired. So this kind of suggests that semantic separation cannot be defined as a singlestep uh notion. It should instead be a multi-phase process.
So the fix for this is to allow for a sequence of worlds which represent step-by-step modifications. Um so our final definition has three main ingredients. First a catalyst step makes changes to the unobserved terms that are allowed to affect x.
Next we let those changes propagate through the deterministic functions of the graph. And third, if any conditioned nodes have been disturbed, we allow changes to repair them using their own unobserved terms. And then we go back and forth between the propagation and repair steps since each repair could itself need to propagate and could require more repairs. This results in a sequence of unobserved term assignments where each pair of consecutive assignments we constrain in a way that guarantees that we make changes minimally.
At the end, we compare the original and the final worlds. And if every valid process that changes X leaves Y unchanged, we say that X and Y are semantically separated.
Here's an example from the previous slide where we are conditioning on blood pressure and sleep. So we can show that drug and reaction time are not semantically separated by showing some process that perturbs the drug node and as a result the reaction time node.
So to perturb the drug node, we would have to use its unobserved term that can propagate to affect blood pressure and reaction time. But since blood pressure is a conditioned node, we can repair it using the blood pressure drug that can propagate to affect sleep, which is also a conditioned node. So we can repair that maybe with caffeine. And that can propagate to affect alertness and then reaction time. So since we've now changed the value of reaction time, drug and reaction time cannot be semantically separated.
And what we fully formalized and proved in rock is that the notions of semantic separation and desparation are exactly equivalent. Um very briefly the proof idea for this is to take the contraositive in both directions. So in the forward direction we assume that the two nodes are not desparated. So there exists some unblocked path from X to Y.
We then construct functions for each node and a sequence of assignments that propagates a change from X to Y along that unblocked path.
In the other direction, we assume that the two nodes are not semantically separated. So there exists some functions and unobserved terms that propagate a change from X to Y and then we show that there must exist an unblocked path along which that change is actually propagating.
So hopefully this flavor uh this slide gives you a flavor of the definition and the proof but um it's of course missing a lot of detail. So if you're interested please check out the paper and the codebase.
Now that we have the theorem um here's one way in which it could be applied in an experiment design context. So um in a scientific experiment uh a causal model is used as the hypothesis of relationships among variables. And so by our theorem, if a causal model predicts that two variables are desparated, then no valid perturbation of one could affect the other. And if such an effect is observed experimentally, then the model and the hypothesis have been falsified.
So here's an example. Suppose we hypothesize that drug has no effect on reaction time after we condition on blood pressure.
Now, we've turned a graphical property into a testable prediction, which we might test out by repeatedly recruiting new subjects, administering the drug, and then repairing any effect that this has on their blood pressure using the blood pressure drug.
Throughout this process, we should observe no change in the subject's reaction time. But if we do, then the causal model and our hypothesis encoded by it have been falsified. So yeah, that's one way in which um the theorem can be applied in experiment design. But more broadly, this work introduces a semantic characterization of desparation, proves their equivalence and fully uh formalizes the whole thing in rock. Um yeah, the URLs to the codebase and the paper are available here. And um I'm happy to take any questions. Thank you so much.
questions.
A very nice talk. I am Shinjang from P University. So from my understanding, this operation can be applied to long course models like just a regular B network. So can your framework take the extent to this case?
>> Um yeah I think desparation can be applied in lots of contexts not just experiment design. Um this work is kind of the foundation in maybe a longer term project to uh formally verify experiment design and so I was more interested in using it in experiment design contexts but yeah I think it could be implied in other uses of desparation as well.
>> Okay. Thank you.
>> All right. In the interest of time, I think we'll take any further questions offline. Let's thank Anna.
>> All right. So for our fourth talk of the session, uh Trion will tell us about hyper separation logic. Are you connected? This test so everyone I'm trian and today I'll be talking about hyper separation logic a joint work with Peter Mueller and Tibod from BTH and EPFL and I'm from inside so the goal of this work is to develop the first model the first framework for model reasoning for hyperproperties what are hyperproperties properties. Well, hyperproperties relate multiple program executions together and characterize program behavior by doing so. For example, non-inference relates two program executions universally that's why two universal quantifiers and it states that for any two executions if they have the same public inputs then they must have the same public outputs that is there is no leakage of syncret information.
Another interesting hyper property is generalized non-inference which is a weakened variant of non-inference to account for nondeterminism. Its type is for all for all exist which means that the hyper property states for all two executions there is a third execution for which the three executions satisfy some relation. So the type shows how complex a hyperpropy is and there are many other hyperpropies of interest. On the other side uh we have separation logic which is the gold standard for model reasoning. Separation logic allows us to prove local specifications and by applying the frame rule allows us to augment set specification with an additional heap. However, separation logic is limited only to functional correctness. And the goal of this work is to lift the uh yeah the reasoning to arbitrary hyper property while preserving model reasoning.
Now let's look at some existing uh program logics for hyper properties. The most basic hyperpropy is for all hyper property where the logic we all know uh is her logic for heap manipulating programs. The standard logic is separation logic. for exist hyper properties. The two standards are incorrectness logic in correctness separation logics and so on and so forth.
Recently, two years ago, hyperore logic was introduced and it filled a gap for complex hyperropes by uh the novelty of tracking set of states instead of simply states or kos of states. We'll talk about why set of states are so expressive in the next slide.
However, there was still a big gap which we uh filled with this work uh which is for complex hyperproperties for heap manipulating programs.
This is where hyper separation logic comes. Uh okay. So it's important here to note that hyper separation logic allows us to reason about different hyperpropies under the one singular framework. For example, if before we wanted to reason about for all and exist exist hyperproperties, we needed to use two different frameworks. Now we can do that with one singular framework.
Okay. Now why set of states are so powerful? Consider the following hypertro which uh has the pre condition and post condition in blue just for readability.
So in our case the pre and post conditions are satisfied by set of states and not simply uh states. So our precondition in this case is satisfied by those set of initial states where all initial states sigma satisfy that x points to five and disjointly y is allocated.
Then if we execute uh the program that at why we assign the point at x + one then we end up with a set of reachable states where all these reachable states satisfy that x points to five and y points to six and they're disjoint.
Now if we look at the uh falling separation logic triple which is the same as the both but without the universal quantifier uh it turns out these two have uh the same satisfiability that means that we can express separation logic by just putting an extra universal quantifier.
Now if we look the existential case for example here the precondition states that there exist at least one state sigma which has r allocated then if we nondeterministically assign at r integer between one and four then it is clear that we will have at least one reachable state sigma prime with r still being allocated but now pointing to one without the existential quantifiers we obtain exactly the valid validity of separation sufficient incorrectness logic and for those of you who don't know SS validity states that for all initial states there is at least an execution trace that ends up in the post condition. Yeah. So we can express uh SSI with one extential fair.
Now let's consider the following relational separation logic triple which expresses non-inference uh for L being low sensitivity variable and H being high sensitivity variable.
So RSL works just as normal separation logic but for pairs. So we track pairs of states and not simply states. Here the precondition states that we start with the same W variable L both first and second coordinate have L pointing to the same value and we end up with the same value for O. That means that O doesn't depend on the secret H which is exactly no interference.
We can express this RSL triple with the following HSL triple where the precondition states that for all two initial states sigma 1 and sigma 2 they both have L allocated and moreover it points to the same value n.
Now we execute the program and our post condition states that for all reachable states sigma 1 and sigma 2 primes they both have all allocated and pointing to the same value that means uh all doesn't depend on the secret h it depends only on l which is exactly non-inference so in short why set of states are so powerful because we can put uh restriction on this set of states in the form of quantifiers and by Doing so we can express different logics and even more we can express uh hyperpropies which are not expressable by any uh current separation logic for example exist for all.
Now onto the key result of this work the generalized frame rule the key new thing is the uh new hyperparating conjunction which we'll denote with the folding symbol. Before we see why this frame rule is so general, let's first see uh this restriction that we have here, we restrict that our frame should be consisting only of universal quantifiers. So no existentials. Why is that? Well, because an existential in the frame would enforce reachability in the post condition which in turn enforces termination. That's not that interesting. uh we will see that this is not a bad restriction. Now uh why do you call it generalized frame rule? Well, because using our frame rule, we can obtain the frame rules of order separation logics and even more. For example, consider the following hyper triple which states for all states sigma. for for all initial states if they have x allocated and pointing to five and then we increase it by one then obviously all states will have x pointing to six.
Now if we frame the green frame that states that all states have Y allocated by properties of our hyperar which we will see in a moment in a sense our hyperstar distributes over universal quantifiers just as normal conjunction distributes over universal quantifiers.
Yeah, we obtain this and now if we look closely at the top and the bottom part, this gives us exactly normal separation logic frame rule that is said concisely. We obtain separation logic frame rule by framing one universal quantifier.
We obtain a frame rule similar to RSL for two universal quantifiers.
For SSIL we don't frame with an existential because the two witness executions might be unrelated. This a bit technical. The important part is uh we obtain SSI frame rule by framing one universal quantifier which is exactly what our uh frame rule allows us which is what I meant that our frame rule is yeah that this is not a big restriction.
However, as I said, we go beyond existing frame rules. For example, the following hyper triple above expressing that uh exist for all hyper property. We can for example frame for all for frame.
And what is even better, we are not forced to frame the same amount of quantifiers on the left and on the right, which crucially allows us to frame for example non-inference to generalize non-inference, which is exactly what our goal was to obtain modular reasoning for hyperpropies and so on and so forth.
Now the key challenge of this work is defining the hyperspepparating conjunction. To see how we do it and why it's challenging, let's first consider the normal separating conjunction. P star Q we all know is uh we take all states from P all states from Q and if they're compatible we combine them.
So different separation logics have different type for this uh sigma and different type requires a different definition for this combination predicate.
So separation logic in correctness separation logic and ssio the combination predic predicate for them is just disjoint union for relational separation logic and other such relational logics. uh the combination predicate is just pointwise lifting of the disjoint union. That means first coordinate with first, second with second, third with third and so on so forth. However, for our case, it's not that simple. Why? Well, because we don't have ordering. We cannot do first with first, second with second.
States don't have order. Second, the cardinality of the two states could be different. And on top of that, one could be finite, the other could be infinite.
It's not trivial task to define this combination predicate and uh yeah so now to gain a bit of intuition about this combination predicate let's look syntactically the the following uh assertion the assertion says that all states sigma have x allocated now if we disjointly add to it that all states have y allocated intuitively should mean should be equivalent to the fact that all states have x and y allocated and they are separate. So as I said earlier in a sense the hyper star distributes over universal quantifiers just as normal conjunction does but it uh it is separating for the existential case. If we have that there is at least one state with X allocated and we add to it that all frames have Y allocated intuitively this should imply that there is at least one state with X and Y allocated and they are separate why implication and no B implication because on the left we have that for all states Y is allocated on the right we have guarantee only for the witness state.
So in short our hyperstar should respect both universal quantifiers and existential quantifiers just as normal conjunction does. The only extra thing is that it should be uh separating.
With that said, we proceed to defining the combination predicate. Consider the following uh set of four states in this case SP where circular sectors represent heap. So the bigger the sector, the bigger the heap and course represent stores. So different core, different store.
What does it mean to disjointly combine it with the following set of three states? Does it make sense to uh equate this set of one state? Well, consider that sq satisfies that all states have y allocated. And let's say that this is the part showing that y is allocated. As you can see on the right, Y is not allocated. But we said that our combination predicate should respect universal quantifiers and it doesn't here because we don't have Y on the right. So we should refute this S. We do that formally by the following. We require S the one on the right to be a subset of SP normal star SQ.
by doing so. Uh yeah, so that means that all states in S should be at least as big as the states in SQ. But as we can see this state in S, the green is not as big as the green state in SQ. So we correctly refute this uh S.
What about the following S? Now we can see that the universal quantifier is indeed respected. But what about the existential?
Let's say sp on the left satisfies uh that there is at least one state sigma with x equal to 5. And let's say that the blue color represents that x is equal to 5. As we can see on the right, we don't have blue color. That means that um yeah, that means that the existential quantifier is not respected. So we should refute that because as I said the combination predicate should respect also existential quantifiers.
We formally do that by requiring uh the following predicate PLW which stands for preserved left witness and it states that all witnesses on the left.
So SP is on the left side of the operator. So all witnesses on the left so the blue state should be preserved through a state on the right. So the blue in SQ into S because now we have preserved the witness. uh that means we respect both universal and existential quantifiers.
So this s should be accepted and we have a symmetric preserved drive witness if the existential quantifier was on the right hand side.
Okay with uh with the main contribution covered we now turn to the uh the small contributions.
So first thing we identify and solve two non-locarity problems that means two problems with the frame rule. The first one is about allocation and the second is about errors and for errors we identify two types of errors local which means compatible with the frame rule and non-local which means non-compatible with the frame rule. We solved the problem with uh the non-local errors with the so-called unknown states which unfortunately I don't have time to tell you more.
Then we define hyper triple validity and proof semantic uh rules sound which is what we call hyperparation logic.
And finally we define a syntax for our hyper assertions and prove syntactic uh rules sound which is we obtain a syntactic variant of hyper separation logic. There is uh one more thing that is novel is the so-called scaffold variables which greatly eases the work with our syntax and moreover it allows for a stronger read through rule which is quite technical but it's very interesting and we formalize all of this in the proof assistant is with that I'll finish and give time for some questions thank questions.
>> Okay. So, this is very cool. I'm curious about what sort of future work is is going on.
>> I didn't hear what >> future work. Uh so future works is concurrency and uh non-termination and relational properties. Yeah.
Uh hi thanks for the talk. So if I understood correctly like all the examples you showed the resources and the separation logic were plain regular heaps. So I wonder how much of this generalizes to arbitrary PCMs or like did you use extra properties of the heap that go beyond the basic PCMs for this?
So I believe that this general idea would hold for any type of PCM because I don't show here but there is a this PLW and PRW actually stand for uh they correspond exactly to subtraction which in a sense lift us away from the from the concreteness of the model and just steps on top of the normal separating conjunction. So for whatever the normal separating conjunction works, the hyper separating conjunction should also work just because it it's very abstract. It steps on normal separating conjunction and normal subtraction.
>> Okay, cool. Thanks.
>> Uh hey, thanks for the great talk. Uh so maybe a kind of general question that would maybe apply to hyper whole logic as well. So if sometimes hyper properties are used to reason about two programs at the same time, for example, when you do refinement. So could you extend that like state based reasoning to reason about two programs at the same time?
>> So yeah this is part of yeah we are working on that.
>> So when you mention concurrency maybe that also >> yeah all concurrency relational reasoning and non-termination. Yeah but each one has their thank you their difficulty.
I actually have a question um about uh the the frame rule for the incorrectness or the SSIL logics.
>> Yeah.
>> Um it seemed to me maybe I'm confused, but it seemed like when you you're framing on a for all Yeah. There uh when you're framing on a forall, it seems like the rule you end up with is not quite comparable to the standard rule in in in those logics because there you would have an exist that spans over the entire pre or the entire post. Is that am I >> Yeah. Is that right?
So for separation logic and relational separation logic we require only the consequences the consequence rule for SSI we need one extra step to drop away the universal part that isn't attached to the existential part. So yeah there we need one extra L which is joint.
Um all right let's thank the speaker can the hazard over.
All right. So for our final talk of the session, Robin will be telling us about Pantoime.
>> Try speaking.
>> Hello. Hello. Yeah, this Okay, wait.
Let's see.
Right, perfect. All right, so let's dive right into it. I'm Robin and this talk is about timing site channels for hardware or specifically how to prove your hardware does not have them. Uh so timing leaks uh secrets. So if I have a computation on which you're uh computing on some secret and you observe that this one is uh pretty fast and then you're doing the same computation on a different one and you observe it slow.
But then an attacker can see uh if it's fast, which you know the yellow secret was used. If it's slow, the red secret was used. And these timing site channels uh they're used to break a lot of stuff uh including cryptographic code. Um if you're not careful with this stuff, you'll leak your private key and this is uh very very bad. So uh one of the things to avoid for example is branching on the secret. If you do the timing depends on whether you took it uh because this body takes some time to evaluate and similarly if you use a secret to index into an array you'll uh leak information because actually uh an attacker can m uh can measure if they have a cache hit or miss at the specific index of the secret. Uh on the other hand, it's completely safe to do addition on secrets because we expect that these additions are constant time for all inputs. And uh this notion of using these uh instructions in certain ways or avoiding some is what we call constant time programming.
Um and now we're we want to capture this constant time policy in leakage description. So this leakage description essentially tells you uh what an attacker can observe um or in other words what part of the secret we censor.
So you can think of it as redacting like removing parts of the secret until you give something uh that an attacker uh is able or that you think an attacker is fine to observe. Uh so with respect to this constant time policy if you have an addition operation then uh we censor out the operants because we're not leaking them. We only leak that we are doing an addition. Uh but if we have uh anal then all bets are off. We're just leaking whatever uh is in the conditional because you know we can't protect against this. So this is uh uh the Constantine policy essentially and now the uh crypto writers they just assume that uh this policy is implemented but uh sadly usually uh actually this hardware does not implement it right so it should and I'll emphasize this should because uh there's a lot of for example spectre like attacks which is a type of side channel which uses spective execution uh like only in the last year already do we have like nine new variants of these spectra attacks. So uh yeah it's not a given that these hardware circuits actually implement our uh leakage specification.
Uh so the the idea is that we want to prove a proof that our hardware does indeed implement this leakage description. And how would we prove this? Well, if we have uh two secrets and they go to the same secret uh once they're censored, then ideally we we want uh no observable timing difference from our computation.
So u this is sort of the high level property. If if these secrets censored secrets match then we want the timings of the circuits to match. Uh but turns out that this proof is pretty difficult in general. So requires usually a bunch of inductive invariants and uh you know the people who want to write these proofs are circuit designers. There are no verification experts and usually it lies a bit out of their expertise. Um, and then if you're trying to find this invariant and it doesn't work out, you get a counter example that gives you two secrets and then you have to diagnose four executions of all of these circuits to actually find out what's wrong.
Um so we uh propose a different type of uh non-ference check which is uh inspired by cryptography where you have a simulator which essentially should replicate the timing that an attacker can observe and it will do so using only that sensored bet. And if we actually uh can can find this if if this one exists then we know that the timing of our original circuit is actually independent of the full secret because we could replicate it using just uh the censored version. U so using the simulator our proof goal changes uh quite significantly. So this is kind of the the key thing of this talk is uh now we have this timing of the circuit and we want to uh sensor the secret and show that we have the same timing using the simulator. Um and if we can show this equivalence uh then we're happy right um and some of the nice things nice properties about this is that the simulator it's a circuit itself and you know the hardware designers that want to prove this property they they know how to write circuits it's their job and then if a proof doesn't work out for whatever reason they can actually run all of the components right because the simulator is just a circuit. The leakage is just a circuit. They can run and debug them with the tools they know. Uh and the counter examples there'll be a single secret that's only uh of which you only have to run diagnostics for two runs of circuits instead of four. Uh so that's our our solution. Now let's do a a simple case study of an adder with a fast path to just give a glimpse of uh what this uh proof method would look like in practice.
So first of all, what are circuits?
They're these things that take some input and they produce some output. Uh and they do so using some state and the state is uh updated at every clock tick.
Uh and they use it for the computation and well we're at the PLDI so we could use verilog to write our circuits but we're going to use because why not?
Um and in a HASLL a circuit definition looks like this where you have some state your current state and an input and uh you will compute the next state for next cycle and output. Um so before we actually go into writing our adder there's one quick data type that we use very extensively. So I'll just cover it.
um in hardware you don't always like the hardware always runs but you don't always have something to compute on. So we want to show that uh sometimes it's fine to have no no payload essentially.
So so we have this data type that tells us okay if there's a just we have something to compute on and we'll we'll do our thing. Uh and sometimes there's just nothing and the payload is missing.
Uh this is also commonly known as an option in many languages for those familiar. Um so our adder circuit is a circuit that is this maybe and to uh operence in it right the inputs for addition and we'll uh perform like we'll compute the sum of these operants and then register the result in our state which we will then use in uh the next cycle. So in the next cycle we simply output the result to this sum uh and we'll see that this one has a one cycle delay because we use the state uh to store it. The adder also has a fast path. So if we give it this input where the first oper is zero well then we can simply immediately give you the the value B the other operint. Uh and this one is instant right so that's a a fast path on it. Uh so this adder has the type signature as below and we'll give you the the definition for it as well.
So uh the adder takes some state and input on which it case splits. If there is a state, we're already in a computation. So we ignore the input and we simply give you that result. If we get a result or an input for which the first uh value first first operant was zero, we'll take the fast path.
Otherwise, we'll compute the the sum, put it in the register, which will emit in the next cycle. If there's no input and no state, simply the circuit will be idle and we have nothing nothing as a return. So, how do we model timing?
Well, how else then with another circuit? So, this uh observation uh this is an observation circuit. uh and the idea is that you can model uh timing by using this this payload availability this maybe uh portion of the type. So this observation circuit is simply uh uh little uh function that takes uh out the integer of this uh maybe type uh and now we can run these of course because they're just circuits. So uh just to exemplify if we give the circuit three and four to add uh then we'll see after uh emitting the payload with obser observation we'll just get this uh just unit um and this will happen at uh cycle one. So at cycle zero we'll get nothing because the computation was still pending. Uh conversely if we give it the zero as the first operint uh then we'll take the fast path. So we still see this just unit but it will happen at uh the immediately uh after no cycles essentially. Um so yeah this fastpath leaks and if our attacker can see that some input took zero cycles to compute they'll know uh sort of this shape of the input. Um so that leakage is inherent but it's also not necessarily bad. uh we can actually still capture this leakage in a contract.
Um so to just reiterate the final proof goal is that we have this adder with an observation and we want to show with uh our leakage that we can actually simulate the same timing. Uh and if these are equivalent we're happy. So let's define our leak description um which you know takes our original secret and it censors it out and the intuition behind this type is that the maybe tells you whether there's a payload and the boolean tells you if the first operent was zero or not. Uh so it's a yeah essentially stateless circuit. So it just passes this unit because it doesn't actually use the state. uh for the output a k splits on the input. If there's a an input and the first operant was zero, we'll pass in just true. If the input was there but not zero on the first operant, we'll pass in just false.
And otherwise, uh there was no input.
So, we'll simply propagate this. Uh and yeah, this is the final output. So, we just propagate this unit together with output. Uh and now, well, the goal was to replicate some timing, right? So if we feed this uh just 34 as previously uh into our leakage description, we'll get this output of just false. And now uh just to remember what the timing was for the adder, it emitted after one cycle.
So our simulator uh simply delays using this uh information of just false. It delays the output by one cycle. Uh so emits nothing and then just unit And these are equivalent. So that's great.
And the same happens for uh this uh fast path where we see a different uh result.
And also this one took more uh took less cycles to compute and the simulator will just immediately give you the result. Uh that's that's all it does. And these are equivalents. Uh so that's basically it.
we've uh proven that this leakage description adheres because we could show that a simulator replicated all the timings. Uh so yeah, we're we're happy.
Uh so let's quickly look at the evaluation we did for this project. Uh so we wrote this uh core in hasll called aim core uh which is a risk five five-stage in order pipeline. uh it takes uh it's written in roughly 800 lines of Haskell code and we verified a leakage for this uh where the leakage is 300 lines of code and simulating the timing for this circuit actually is 200.
Um so the verification of this stuff is actually pretty quick. So it only takes 40 seconds to to show that there's a equivalence between uh these circuits.
uh we have artifacts available as well as uh full code uh for both the core and the proof tool that we use to show uh circuit equivalents. Uh another cool thing is that well these uh leakages are executable uh so we can just use them to check whether some crypto library is safe or not. uh and we ran it on lip sodium and wolf SSL and surprisingly we actually found two CVS uh or two bugs in WolfSL which we uh uh submitted to them and they've already resolved these issues. So that's pretty cool. Um yeah, of course it's a short talk and there's a lot of stuff in the paper. Um so here's just some teaser of cool stuff that's there that you can read if you're interested. So um of course writing this simulator by hand can be somewhat daunting. So we actually have a way to sort of create the simulator uh using the already existing circuit. An idea is to invert the leakage uh so pick a candidate uh of all the leakage inputs and then use the circuit to actually uh give you the observation.
Uh actually we can also sometimes omit the simulator entirely and we didn't really talk about what property we actually check to prove the equivalence. So there's some uh interesting stuff here as well uh where we make some sort of refinement relation between these two circuits because actually they both have like different states and we only want to uh actually reason about their outputs being equivalent. And then of course so we also created a a tool which is like a Haskell symbolic evaluator to show equivalents between these circuits that we wrote. Um so uh this is the takeaway.
If you want to do some non-inference stuff and you find this daunting then maybe try out uh this uh different method to prove the same property via simulation uh which uh has some uh nice properties to it. So, here's a a QR code to the proof tool and to Aimcore. Uh, yeah, that's it. Thank you.
Thank you. Quick work. So, with your leakage descriptions being circuits, how do you handle misprediction? I mean in other leakage descriptions for example you have like this always misspredict strategy where you go down both paths of the of the if right to like model that okay potentially we might go down both paths in the cache.
>> So how do you model that in a circuit?
Well, so um first off, so we've only proven stuff about in order circuits, which is pretty much what most work in this area does or no one has really proven like a out of order one. We're working on this. Uh but I uh I guess like uh >> Okay, sorry I didn't want to confuse you. I'm just asking because you mentioned Spectre in the beginning.
>> No, no, of course. Yeah, I guess u like it's definitely our goal to to actually prove these things and I think we can capture also this this notion of speculation inside of these uh contracts but uh yeah it's a work in progress.
>> Thank you.
>> Thanks for the talk. This is Ratchet from MIT. Um so actually the architecture community they care a lot about this problem and they can get to out of order pro uh processors and prove leakage properties about them. They do usually use self-composition which I think is the first thing you're describing up there.
>> Um I actually didn't quite follow what the verification technique is. Once you have your verification conditions do you discharge them automatically or am I writing like a >> proof script on the side?
>> Um so um there's two parts to this. So there's some refinement relation where you uh show like how the states of these two circuits relate. Um and then at the end we discharge this equivalent uh of these two circuits to an SMT solver.
>> And so I give you the refinement mapping.
>> Yeah, you give the refinement mapping.
It's it's um actually this refinement mapping is also just another function.
Uh >> sure.
>> That maps the the states in some way.
Yeah.
>> Thank you.
>> Any more questions?
just one more on maybe the usability or like how do I use uh the proof tool if it's you know especially that extracting the simulator from the circuit I suppose this means I need my circuit description in Haskell >> right so actually like the proof technique itself also translates to verlock. So you can just use it in this setting as well. Uh but we've yeah we've opted to use for all these uh has some nice properties but yeah uh the the general technique just translates.
>> All right, let's thank the speaker.
It's time for lunch.
Thank you.
Hi. Hello, bro. He's not coming.
This is This is Hello.
Yeah, this is my voice. Okay.
Yeah, that's better.
Hello. Hello. Okay. So, uh good afternoon everybody. Welcome to the equality saturation session and my name is Hini. I come from the University of Washington and I will be serving as your session chair for this session.
Okay. Okay. So we will get started in two minutes and uh in this session we have a total of four papers which includes both new extensions to equality saturation engine itself and the new applications of applying the equality saturation technique uh to new domains.
So I think it is very exciting to see both kinds of the uh both kinds of equality saturation research and see them go in hand in hand. Okay.
So uh let's get started.
So for our first paper, I would like to recall everybody that the root of egraphs lies in SMTP solvers and the the application of SMT solvers require you to actually use lots of different versions of egraphs because you want to explore the space uh of predicates and you want to try things and retreat back to things. So this paper really gets us back to this old root of egraphs and please so welcome Jadine and he will explain how you can do that versioning very efficiently for egraphs.
>> Thank you very much. This is joint work with my colleagues George Zakur, Pascal Vzenberger and my supervisor Vido Sravanki and uh in our group we are interested in program verification. So you would consider a program like the following and prove some properties about it. This is going to be the running example is basically a function that is computing the distance between two points given the difference between their coordinates and some function f which denotes which kind of distance we are interested in. And uh particularly this function is going to perform some optimization depending on uh the inputs.
So if for example x is equal to y, we're going to just apply f once. And now when you're considering this kind of function, you might be interested in properties like is the distance function actually commutative between x and y given all these optimizations. And to do that you since it this is an equational uh property you're going to look at the program and see which equality is between terms old. So first you might look at the scope of the programs and notice that some equality is only old under certain assumptions. For example x is equal to y but only in the first branch and in the s branch is actually different and dx is equal to dy in the first second than branch and it's different in the other s branch. Right?
So there are a lot of verification tools uh that are able to reason about these properties and most of them uh rely on an SMT solvers and the SMT solver itself will rely on a data structure which is called an egraph and inside of the egraph point. More recently also uh there are solvers that have been implemented directly on top of egress.
So automated prover like daisy and ciscma uh which are novel and we also include those kind of pros.
So today I'm going to talk about union find and egraphs and some of the extension to reason about conditions and uh more persistence. So first let's deal with union finds and a union find is really a way to encode an equivalence relation. is basically a graph of nodes that are terms and if two terms are connected then they're equal. So we are going to consider the union find at point A in the program and the way uh we're going to reason about equivalence relations here is we are going to put all the terms that we find in the programs inside the union find. So each term is a node and now we're going to scan the program to see which equality actually hold and in point A in the program we start by seeing that dx is equal to f of x. So the way we uh add this equality is by adding this edge between f ofx and dx and this is called a union and operates at the root and then you will discover dy is equal to f of y and dx is equal to dy at point in the program. Notably the fact that the union only performed is performed by um is only adding edges at roots. It means that actually the union find is not a graph but it is a forest and the forest is basically containing an equivalence class of terms. In this case there is an equivalence class of four terms uh displayed and moreover since this equivalence class is a tree it means that you can select a canonical element of the equivalence class by simply following the edges to the root. In this case the equivalence class or the representative of the equivalence class of f of y is ddx.
And finally with pass compression you can shorten the path that you follow from the term so that uh uh future queries are going to be faster.
So now there is a problem with this uh with encoding equality in this way uh at least like for each branch and is basically duplication. So if you were to consider a more complex program like this one, it doesn't matter really what it does, but it basically applies the distance function multiple times and maybe we want to prove a property of this uh program which is of this function which is some sort of commitivity of its arguments. Now for example we don't know uh what is AB. So we need to unfold the definition of this to understand what are the possible cases for AB to B and we are going to see that that the cases are the same as before uh structured in the scopes. So we have to consider three cases and I'm going to represent them as blue nose uh on top of the green tree. And now when you're going to unfold the second definition you're going to span the same tree but for each leaf which means that you're going to now have to explore nine cases. And if you keep going like this, you're going to reach exponential explosion. And more importantly, now you have to consider 81 cases. But uh there are a lot of ancestor in common between these cases, which means that there is a lot of redundant work that uh it's shared. So we need to deal with this uh shared work. And this is why uh researching persistent techniques for union finds and graph uh is very helpful.
And this is in fact our first contribution which is the version union find which is a simple way to reason about persistent in union finds. So the first thing that we see is that the version union find has a same term space as a union find. So the terms in the graph are the same but we now associate to each scope a version which is some identifier actually just to uh recall that specific scope. So for example x is equal to y in that scope and the scope is called 0.1 and the idea is that equalities at all the at a certain scope must must of course all also at inner scopes and this is basically translated into descendant version are uh an equivalence relation that is a corening of equivalence relation of ancestor versions.
So if you have this kind of structure now you can scan the entire program and the first thing that you will do is encounter x is equal to y and you you will just perform a union as before but now you can also label the edge by saying that that equality is actually only valid in the scope 0.1.
So now you can go along with the program discover new equalities and these are uh true but in a different scope 0.2 and those that are true in 0.1 are not true in 0.2. 2 uh and when you find the x is equal to dy is an additional scope which is 0.2.1 and now this union finder is encoding all equivalence relations uh in the same data structure uh for these programs.
Uh importantly the equivalence is now uh basically an extension of connectivity where you are allowed to follow edges that are either in the in the version you're quering or in an sensor. So if you want to find the canonical for example of f of y at version 0.2.1 you're allowed to follow 0.2 because it's an ancestor and 0.2.1 because it's the same version.
So this traversal is allowed. And finally, you can also perform pass compression by adding this additional edge. And this additional edge is not only shortening the path from the element to the other to the representative, but it's also making it easier to recognize whether you can follow that edge or not because you don't have to query whether it is an ancestor or not.
Right? So there are a few problems with this which first is to ensure correctness that is as I said descendant version must cores versions. So I can construct an example like the following where I'm unioning Y and Zed as the descendant and then Y and X at an ancestor. And now what happens is that I have that X is equal to Y at the ancestor but not as a descendant anymore. So what I need to ensure is that every union at ancestor actually implies unions as as a descendants. So there is some process of propagation and this propagation will add that additional edge which ensure actually that x is equal to y as a descendant as well. And another problem with termination which is actually related and the problem is that I can construct an example where fine doesn't terminate by adding this edge at a descendant and this edge at the ancestor. And now if I were to query the u equivalent class of y as a descendant version I'm stuck into this loop.
Uh and the solution to this is somehow making these loops trivial to detect by maintaining version canonical. So by propagation I will end up with a configuration like the following where I know that uh there is this self loop around x and now when the find traverses a self loop I know that I've reached uh the canonical uh for that specific query and another way one way to achieve this is by propagation as before but another way is by full pass compression which achieves this different configuration but it works all the same so you can choose which optimization you want to Okay, we perform a complexity analysis of this data structure and the idea is that we simulate uh a version union find using multip copies of a union find. uh so we apply the same operation but on multiple copies on union find and in this way we can extend the classical result by tenant and uh the basic idea is that the worst case for a version find is the best case of maintaining that many clones where that many means the number of versions that you're considering which also means that any amount of propagations that you're avoiding you're achieving better performances and then uh when we reason about programs like this list function we are also dealing with a function like that argument f and if the functions are pure then we can uh uh we have to deal with congress which basically means that we know that for equal inputs a function must produce equal outputs and the traditional way of doing this is using an egraph and uh the idea is that uh is basically a union find where the elements are enos and an enode is just a function application and a function application is represented as a function symbol with some pointer to this argument. So for example you can see the constant x is just the function symbol x applied to the null pointer and the function application f ofx is the function symbol f apply to x.
Okay. So if you want to consider the egraph at point b in the program now you will discover the equality x is equal to y as before and you can add it as in the union find. But now the problem is that you have just breaking the congruent invariance because you have equal input X and Y but different output f of X and F of Y. So what you need to do is apply rebuilding and rebuilding is really just unioning and finding. You want to union a node with uh it canonicalization which is obtained by replacing uh its argument with canonicals. And this is a key insight because union if the building is just union and find then we already have all the ingredients to uh have an aversion egraph.
And in fact in a version egraph you would this x is equal to y at a certain version which is labeled edge. And then if you want to restore congrent for that version you are just going to apply the union and define as version and you will have the same uh correctness for the congressing variant as as that version.
So if you're going to explore then the rest of the program you're going to uh just proceed as a union find in this case because that is the only case for congruence.
So in the paper we are several artifacts. The first one is a formalization where we define the union find as the egraph and we also prove them correct where correctness means that a user would provide a sequence of uh equalities in a certain in certain versions and the union find will represent the smallest reflexive symmetric and transitive closure of that sequence of equalities and the egraph is also the congruent closure. The version union find has also been mechanized in lin.
And uh the second uh artifact is an implementation which is an open source library in rust where we support most common egress extensions like the fer building analysis in matching saturation and extraction and the insight is that most of them actually boil down into maintain some kind of version cache.
Moreover, recently we explored version proof production which is to appear as a paper actually in two weeks in very native coupoop.
Uh so for the evaluation we considered three case studies. Uh and we compare four different approaches. The first one is versioned egraph. The second one is a naive cloning of egraphs. The third one is a naive implementation of persistent egraph which is uh basically using existing persistent data structure in rust. And uh finally uh Easter egg in the first case study we consider a parameter analysis where we value some some configuration of egress and we generate some random egress. So it's a random workload and uh we consider 25,000 configuration and in general for runtime and memory we see that we achieve better performances.
For the second case study, we consider a new F solver where the workload is uh push and pop, which means that we're going to only work on the leaf and when we are done working on leaf, we can also drop them. We consider 6,000 benchmarks from the SMT lib about UF theory and uh we see that the performances are similar maybe a bit better with respect to the um best approach which is the persistent the naive persistent egraph.
Uh and these are way better than the other two approaches. And finally we consider another domain al together which is what I mentioned before an automated prover called by uh singer 21.
And uh this workload is assistant concurrent where the idea is that uh we want to explore u the possibility of working at any version at any time. And uh we uh we consider the 300 benchmarks uh from the original paper. And here we perform similar to the best approach which is cloning uh which is way better than the other two.
So for the parameter analysis since uh I uh could not pick some precise measurement I'm going to show the charts and in the charts uh we can see uh basically on the y-axis the resources for generating a certain configuration in terms of time or memory and on the x-axis the number of nodes and unions.
We also varied the versions but I cannot show it in the presentation. So what we see is basically that lower means uh better and version dgraph is consistently better uh in terms of time and memory uh varying inos and unions right so uh just to wrap up in summary we are interested in program verification so we are going to consider some programs and we want to prove some properties about them and to do that we're going to deal with equalities and some of these equalities will all only under certain assumptions Then we uh provided an implementation a new implementation of persistence for uh unified and egress uh which basically aims to reduce the amount of redundant work uh between equalities at all under different assumptions and then we provide a formalization uh to prove the correctness uh of the version unified and egraph and pen to library with most common egraph extensions.
And finally an evaluation of three case studies where we show that we are either better or comparable to the uh best approach in the state-of-the-art.
So if you're interested in looking at the paper where there will be many more details, you can scan the QR code. You can also find on the top right uh a link to the artifact and to a Jupyter notebook where there is an easy implementation where you can try it out.
So thanks for the attention and I'm open for questions.
Okay. Uh, thanks Jinine. And now it's time for questions.
>> Hello, I'm Adam Chapala from MIT. My understanding of kind of the the API of your system is that it lets you track uh logical consequences in multiple different hypothesis contexts at once and takes advantage of implication relationships across those hypothesis contexts. But it seems like the only implications you use are those that are present syntactically in the say the program that you're verifying and there could be other provable implications maybe even some that only follow using information to do the equality saturation that you're not taking advantage of. Does that sound fair?
>> Uh I I see your point. I guess what you're saying is something like you might discover that some condition might actually the same and maybe you want to share uh equalities between those conditions, right? A really basic example would be you just copy and paste two versions of a program in a row and you want to reuse all the work from analysis of the first one for analysis of the second one.
>> Yeah. Uh I think like I had an an idea where you would be able to actually this is kind of future work uh where I wanted the ability to be able to merge two versions in such a way that you do not only share among scopes but also among independent version that somehow for some semantic reason uh they you know that they share something right so this is a bit forward looking though like it's work in progress >> sounds like a great extension to look at thank Okay.
Hey, thanks for the good talk. Uh, Max Wilsy from UC Berkeley. Uh, do the versions have to form a tree? Uh, and yeah, do the versions have to form a tree or can they be any just partially ordered? Uh yeah, it's part of this actually like uh right now they are only a tree but uh potentially what you could do is for example say that you want to union two versions and you want to consider a new version that is actually inheriting from two or maybe intersecting two versions saying for example if you want to do a case analysis you want to only consider facts that was in any of them right in all of them sorry but >> but currently they're just a tree.
>> Yeah. Okay, cool.
Okay, I think we can have >> Thanks. So, uh great talk. So, it seems uh so this work is solving the same problem as a color graph, but it seems your yours is better. Is that true? And could you give the intuition for why it's it's better? Wh why it is better?
Uh yes I I can give some observations about what are the differences between other approaches if I can it's lagging a bit sorry uh right so these are some of the features basically of version egraph the idea that we have full sharing between the term space across versions there is also partial sharing of equalities between dependent version and this is kind of the scope and the idea is that right now is between dependent version but we want to do it as I said between independent versions then there is this version pass compression which is actually trading off memory with runtime so there is a better balance also since you have have a view uh about all the versions that you're maintaining you can also do better cache management for example if you apply equality saturation at a version maybe you need the cach for that version but you don't need for for for another version that you're not considering at the moment, right? Uh yeah, a bit of the cons is that uh right now the implementation of this egraph extension is very naive uh and we should probably look into some more version aware uh extensions and uh yes of for of course uh it is a bit more complex to achieve parallelism with respect to cloning or simple persistence.
>> Okay, let's thank Jim again. Uh okay. So for our second talk uh this is about applying equality saturation to electronics design automation. So electron electronics design automation EDA is a very important problem that suffers from phase ordering between different stages of the compilation. So uh our next speaker Ginta will introduce this work that teaches us how to optimize simultaneously both at high level and low level.
Um hello everyone. I'm Ginhel from UC Santa Barbara and I'm going to talk about my uh this is the joint work with J on our uh improving equality saturation for EDA via semantic egraphs and here are my collaborators.
So we have two contribution and we introduce a egraph varants called semantic egraph that capture the semantic equality instead of only the syntactic equalities and we implement a EDA framework called next map on based on the semantic egraph that bridge the higher level and lower level abstraction. So more specifically at hardware um EDA compilation flow it's word level and bit level and we run more sim passes simultaneously than the previous eqet tool. So first about our motivation hardware synthesis is like the uh compiler for hardware and there's long passes along pipeline of passes like there's a open source tool called goes had around 50 or more passes for one FBJ back end and uh all all of them are like manually uh ordered. So there are two kind of passes we are caring about. One is about optimization and which is backand independent transformation and we also care about the technology mapping where we want to lower the uh net list into the target primitive. We are targeting the FPJs. So we care about the lookup tables, flip-flops and the SPS and uh the mapping quality influence the quality of results heavily including the area delay and power. And today actually the uh mapping tool is um traditional mapping tool is doing syntactic pattern matching. So for the um block like DSP lookup table and BAM they have a handcoded library of syntactic patterns and the mapper will pick this pattern like only there's a exact syntactic match we can do like a matching here.
However, this uh this is problematic and here is a very common failure pattern and suppose a previous pass output uh the expression like a multiplies the b plus c. However, the uh encoded pattern in the dsp library is the uh sum multiply a we don't get this dsp match and this will end up being uh worse performance. So um the the reason is syntactic matching only matched one forming a time. So this is where the equality saturation comes in. So equal saturation removes the single form limitation. We can maintain all the equivalent expression in the same e-class and uh the mapper can match against all equivalent form extract only once at the end of the uh flow. However, um this is not true between uh stages because since this pass has a lot of stages and at each boundary, we have to like extract one form uh from the previous stage and hand it off to the next one. For example, we may do the optimization in equal saturation but hand it to the uh hand the result to the separate mapper and the face ordering issue returns. So here's an example. Uh suppose we have it's the same DSP example. Suppose we have the original egraph and we can add a red pro for distributivity and as we're even saturating it more. So the uh the extracted term we can assume it's a multiply b plus c it's extracted and marking green and the mapper search for the some multiply a pattern. though it fails and uh just use more resource then then we can actually do better where um the uh we can do the mapping inside the egraphs so we can successfully have the DSP this or this design mapping a single DSP.
Okay. And so our goal is to run more synthesis stages simultaneously in one egraph. But the uh there are some problem that makes it not uh that straightforward. And so the first uh thing is prior there are a large spectrum of previous exact work including like rover seing and all all this tool is limited into a single stage and they're working for specific optimization go and work only at one level and none have been run on like uh both optimization and mapping simultaneously and none have been on both sword and bit level. So our take on this is that there are no prior exact representation that can really bridge the word and bit level in one egraph.
And so the second problem is the direct syntactic word bit and bridge is like we'll expose the egraph. This is a very classic egraph issue. uh if we use a varog slic concat by syntax for example with a three bit signal s we have like four different form of expressing it in the egraph and uh this will give us actually exponential many different form to be to be rewrite and discovered. So our fix is to use the underlying semantic equality that the underlying bits of two bit factor is equal means that they the two bit factors are equal and we the insight is very straightforward but we can use such semantic environment to replace the exponential management syntactic form.
Um so there's another question how do today's engine maintain this environment efficiently.
So actually we can um do a implementation in within the e-class analysis framework. Uh so we can make a like relation of has bit for a bit vector to has it concrete um bit in a vector. Uh however it's not efficient.
Um computing the bits directly is cheap but maintaining the equality is not. So at least we need two uh different direction of propagation of the equality. either a forward uh direction where we want to propagate the equality from the bits to the bits vector or we want to do the backward propagation from the bit vector to the bits and we need to do pair wise comparison between the uh bit vector or the like class analysis facts. So this is uh our motivation for introducing semantic egraphs. Um and so there are three problems and we have three ideas. Uh in order to deal with the face ordering issue, we uh bring the multi-level representation and in order to uh deal with the direct syntax bridge explode issue, we introduce the semantic eid which replace the opaque eid and uh reflect the semantic value of the e-class. And in order to maintain the environment efficiently, we we have some operator like semantic constructor and decomposer to uh make this happen. Um so let's have a review from the uh egraph eid. So traditional egraph has the opaque eid and the it's a opaque identifier could be a integer that represents the egraphs um but not carrying a semantic meaning and we are replacing it with a semantic identifier from a userdefined domain and it is the e-class semantic value and there is a concrete value uh for for the like successful interpretation case. So we can merge the equal SE ID classes and we also have the uh back fallback case where we we return a symbolic SC which is equivalent to the opaque key ID. So we have the semantics as the first class element in the egraph and in order to maintain the uh se we use and the egress x3 operation add merge and rebuild and uh add some small semantic extension to it. For add we add a semantic constructor for s merge we add a semantic decomposer and we all know that add implementation is lazy. So some union happen at as merge will be delayed to the reboot. So we are also doing similar delay for the SE update and we will need to propagate them in the reboot as well. Um so for insert this is the standard insert uh or add and what we add is we compute the SC via a semantic constructor for this case it's a concat and uh if we find a semantic eid that already exist with another e-class then we find a free equality or we can just do the merge so the constructor can establish the equality at insertion and this equality is not discovered uh with right row and for as merge what we add is uh if both SE we have for 2D class are like composite type and we we can have a semantic decomposer to split the merge into component merge and this can happen recursively so basically we are propagating equality downwards to its component and we also have rebuild so what we are doing in addition is the reboot process can propagate the SCI in both direction either upward to the recomputee of the uh semantic constructor and of the infected parents and also if some merge happens we may trigger the downward propagation uh by the decomposer and it we iterate through a fix point.
So we have some um property the sag is sound because our equality comes from either the syntactic rewrite rules or from the uh relation derived by our syntics.
So no uh like fake equality are invented and we are we are maintaining the list congruence of this two relation. So it's sound and uh also review terminate and uh we are back into our next map tool.
We have the uh tool for hardware implemented based on semantic egraph and the uh we are working on simple domain. So the bit vector SC is the vector of its underlying wire and we have some uh hardware E-class types is like the wire wire vector and register and we also have cell which are like primitive operation over this types and we have the both bit level and word level logic gates and adder mods/ concat on on word and some DSP or technology dependent resource they will be selected at the extraction.
And uh here is a slat round trip example. The goal is to recover the uh like we originally have a 4bit Y vector and uh we first do the slice and then do the concat and we are maintaining their SC ID in the process. So we can find the concat node has the same SC ID as the original R vector. So there is a automated merge happening here and without red rows and here is another example uh for cross level uh single size of an adder. So we have a type of bit blasting row for example to a two bit adder. We have the uh word level expression in the egraph but we use the rewrite rule to make the bit level like mat list implementation in the egraph as well. So suppose we have another three bit adder and it's also bit blasted. We can tell that the uh lower two bit of the three bit adder are the same as the uh two bit adder output. So uh the reason this can happen is the word level slice sharing can be propagated from the bit level net list.
So the takeaway is the uh word level and bit level together in one egraph can expose us more um possible sharing and more optimization opportunities.
Oh here's our workflow. We have the an RTL in and have have the uh semantic egraph do the um search and optimization and mapping and we export everything into IOP constraint do the extraction and uh output the uh map design as a final result and all our synthesis passes share the assembly graph. We have technology independent rewrite including the arithmetic busting and some logic and some retiming. We also have technology mapping uh rule for uh based on a user supplied tech lily and we target have target cells like DSP DAM and express right rules. We have to have some recursive pattern for target some repetitive uh cell structure and on our extraction after the saturation we we do some encoding in IOP to extract a type consistent implementation from our stack and the general model is uh there's a binary variable pick one cell from each e-class and and we have the constraint for driver consistency and to mean and we minimize the resource cost and and we we have the legal loops uh checking so basically register uh our model in in our design. So the feedback loop are expected and we don't uh allow the combinational cycle but that's actually automatically handled. Um the third point is we we have the timing aware and uh constraint encoding by the critical path based on like passed by the e-class and enode and the combinational loop will imply the unbounded delay. So because we have constraint on delay so we will automatically uh eliminate the combination cycle issue and for the complex technology cell uh we deal with some non-uniform input output timing and internal pipeline issue with the split and bind method for our evaluation we have two research question the first one is is next map uh quality like how is the quality compared to the exist the same tools and the second question is does the uh running optimization and mapping simultaneously actually helps. So um we have two more in paper and for the first research question we compare against fiosis and uh commercial tool compare against uses we are better on every resources and over all 25 DSP at scale commercial compared to commercial we are competitive and there are sometimes better but we are overall like at the same scale and we have some benefits on some designs so um the running time is like overall short and for largest 4,000 400,000 and wire size design we run for 32 minutes and both time is inpatient with commercial tool at scale and we have aation study uh for neckmap against the face order next map and we have identical rule but the uh uh they are executed in separate faces with like extraction between each and we can say that next map always ties our beats the fifth order next map and we can sometimes gain very substantial gain from it. So our conclusion is the problem is the prior except will stop at one stage and we propose semantic e graph to m and get more semantic equality efficiently from the sid as a first class of uh encoding of semantics and we implement the next map tool to bridge the word bit level um uh do more synthesis passes at once and get comparative result to commercial tools and this enables optimization beyond the reach of vanilla equa and actually beyond hardware we believe this should generalize to any domain with like composite type or some structure semantic value including string set polinomial so we can see there could be some potential there and for future exploration okay so that's all thank you well let's thank our speaker uh we have time for about two to three questions Uh, >> cool. Thanks. I'm from MIT. I'm just curious, uh, just one question about your evaluation. Did you hit the same critical p or the same max frequency on everything? And did all of the evaluations take the same like wall clock or or cycle count duration? Like is it doing any like wacky, you know, pipelining or anything that >> um I'm not exactly the person doing this evaluation. Uh so I al just to check with J.
>> Okay. Thanks.
Uh I have two question and the first is basically the same and uses is basically not considering the performance. It does not optimizing the fine find out something like that and maybe that's why the evaluation is not recording the max frequency. And my next question is uh the methods seems quite general and may be applicable to multiple back ends uh beyond FPGA right and but in in experience different back end may have vastly different performance uh based on even they are optimized uh through a identical passes. So I wondering if there are any solidities behind this.
>> Okay, thanks for the question. I think for the first one um even though I'm not sure about the detail, I think the framework can support different kind of like um kind of we can wait different kind of optimization go together. We can include the fan out and the the the power the area. If you have some estimation over it, we can add it into the IOP like formula. That's doable. And for the second question, um yeah, I I think it's uh it's very exciting that uh we we are actually we start everything from from like working on EDA with equal saturation. So um and then we we come with a tool and we want to like develop a a series over it to to like explain what's happening here.
Definitely um we can try more other semantic domain that may have different uh implications there and I think it's generally like very promising but there are also some issues that different semantic domain may influence the um maintenance of the SE. So uh it's uh we may not lose we may lose some like good property we can claim here but it it's still like I think it's very exciting to to be explored.
>> Okay, let's thank our speaker again.
So our next speaker is Gang uh who is going to introduce this properly named framework called Quaser that applies equality saturation to optimize quantum circuits.
I'm from university. Today I'm going to create our >> Okay. check. Okay. So today I'm going to present our work equality saturation for quantum circuit organization. This is a joint work with Peggy Ren and Reno from University of Maryland and Ronu from both Columbia University and Certic.
So our project is primarily focusing on optimizing quantum circuits which are composed by a sequence of quantum gates operating on set of quantum cubits. And the motivation here is that that why we need to optimize quantum circuit is that every quantum gates carry noises which means that when we are running a quantum circuit over a real world quantum hardware each gate are subject is subject to error and may produce incorrect results. So therefore the more quantum gates that the operate the circuits that we run for on the hardware then the higher probability that we can get an incorrect answer. So the optimization objective for us is to minimize the number of the noisy gate over the whole circuit while preserving the functional equivalence between the original quantum circuit and the optimized ones. Therefore we can maximize the correct rate about each quantum circuit of the quantum cir algorithm that's that namely improve the quantum circuit fidelity and the most advanced and the most uh widely used quantum circuit optimization method for now is a circuit rewrite that is a pattern matching and substitution to pattern match a subcircuit inside the to be optimized circuit and substitute with another one and this is quite similar and with the people optimization in classical program optimization. So here is a few examples here. For example, the syn not gate cancellation rewrite rule will cancel two adjacent c not gate into an empty circuit and the rotation merge rule will merge two adjacent rotation gates into only one and accumulating their faces. And there are also some uh commutation rules to rewrite and reorder two adjacent not gates or the same not gates and to reorder their position in the gate sequence.
So having seen the uh rewrite as the primary optimization results the next question is that how optimizers apply those rewrites and we found that this not a trivial question and this will later will limit the capability of optimization in in the quantum circuit optimization. So we do a survey and classify most of the quantum optimizers into two two categories. The first is a hurist based industry level quantum compilers in including the cuskit and ticket. They manually crafted a carefully designed small set of rides including tens or even hundreds of rides and they apply those right in designed uh se re sequence or grad function or grad algorithm and another type is a searchbased algorithm in most in the state-of-the-art and in the recent years uh including the quas quo and guac instead of using heristics they leverage the super optimization idea based on the classical optimizations to automatically synthesize a large size of rewrites composing thousands, tens of thousands or even millions of rewrites. And they and then iteratively apply those rewrites selectively. In each iteration, they will apply only one single single rewrite. And this right is selected from such a large resolute set with a coupled with a search specific search algorithm such as the beam search the simulated nailing or the reinforcement learning.
However the question is that no matter what optimizer they are no matter what search algorithm that they use no matter they are heristics or the search based all of them apply those rewrite sequentially. So what is mean by sequential? The equation means that the rewrite just take the output about the previous rewrite as its next rewrite as its own input and to do that iteratively.
So what's the problem with the sequential? So let's use an example to show how the sequential can lead to the suboptimality during the circuit optimization. So this is a real world quantum circuit example and this is a correct optimization path and we can see that it composed by four sequential rewrites from T1 to the T4 and uh in each in each transformation in each right it will do transformation. However in the real world optimization case we not we are not that always so lucky to get to pick such a correct optimization path. For example, what if the optimizer can apply the right T3 here and this is not that not not that rare because uh in each iteration and each rele application we have thousands tens of thousands and millions of rewrites that can be picked and if that transformation was applied instead of T2 then it will uh and so that in that case we are not that lucky and we'll get stuck because after transformation there is no possible pattern match that we can found with the current our coupled re set. So therefore the optimizer found that there is no pattern match found. So therefore we can the whole optimization got stuck and so no further improvement. Yes. So this phenomenon that the order of the order that we apply those optimization can influence the final optimization result is not a fresh result. is it's a wellstudied problem called the phase order problem studied by the leverage at all in 1979. Uh although this phase order problem has been well studied and meditated in classical program optimization we found that the since that all the quantum optimizers previously apply those optimization in a sequential manner. So the phase ordering problem limit their capability and remain to be unsolved.
Therefore we present quaser equality saturation for quantum circuit optimization. First it handles the sub optimality caused by the phase ordering problem and other root causes by using equal saturation and leverage egraph to represent to represent and to do the exploration. And second, it proposed a se a series of optimizations to both reduce the circuit write redundancy during exploration and also improve the graph excalibility on the circuits.
Finally, our evaluation shows that quer stands for the best optimizer across all the option times and all the possible metrics and over tens of over over 100 circuits that we uh our benchmark included. And this is figure shows uh the overview about the quer equalization and I suppose all of you have been quite familiar with this equations and so it's just a brief talk uh it's just a brief briefation that the each input circuit are representating to the egraph and we do the right to explore all the egraph and finally do the extraction.
So here uh first it will do a initialization to the e-graph and for simplicity here we just to encode we just represent the quantum circuit as a sequences and represent encode them into the egraph and for the paper it have multiple decod encoding meth uh approaches. So it will first encode the input circuit one into the egraph and to explore in the exploration it will do multiple rewrites. For example, here 32 re will do a C not gate and not gate computation and store such a intermediate optimization result into the exploration and put them into the same classes with the original one and then it will do a notgate cancellation to that is matched with the circuit two variant stored in the egraph and this this variant circuit three will also be sold into the uh egraph. Finally, it will do extraction under the user specified cost model. Uh and in this case, the best circuit is obvious is definitely the circuit three because it has the least number of the uh noisy gate that is it only include a C not gate and so it drops the circuit one and circuit two and given that extraction result of of the e-classes and ENOS it will reconstruct the final output circuit.
However, there are few challenges for equalization equality saturation to apply to the quantum circuit. It's uh it's driven us to develop a series of optimization both offline and online. So uh due to the time limit I will only discuss about the offline optimization and leave the online optimization works for the scalability issues to the paper.
So the offline the challen major challenge is that we found that there are so many rights that we can apply during the across saturation and this prevent us from achieving uh valid optimization results before running out of resources.
We can see that many previous works has exploited powerful rewrites including the qu coral quest on the gawk. They in includes tens of thousands of rightes.
And if we just apply them naively to directly apply all of them to the graph and after even after 10 iterations it will run off memory and you know about 10 100 gigabytes of memory and in about and cannot generate any valid results in about two hours in most of the circuits.
So the bottleneck is about the e- matching right or when we are applying so many distinct rewrites simultaneously to the egraph this matching pattern matching will indeed cost a lot. So our insight here is that although these rights are exploited by those sequential optimizers they are powerful they are useful but they are not that they are not that necessary to include all of them in the cross scen scenario they are composable and redundant redundant in the graph case and even even better we can find atomic rights that are small enough that are orthogonal and non-redundant and well and well at the same time preserve the optimization capability as strong as the original larger set. For example, this is a three cubit and five quantum gates rewrite and we can find it reduced quantum gate from five to two and actually it can be decomposed into three distinct and smaller atomic rights that are composable include smaller number of the gates and cubits.
So how can we find that one? So a rewrite is decomposible is composable and redundant under a rewrite set R as we previously discussed can be as long as it can be decomp it can be composed by a sequence of combination of sequ sequential rewrites from the left hand side term to the right hand side term and we found that finding such autoomic rights can also be done by leveraging equality saturation.
Given a left hand side term, we can apply a currently refound uh atomic right to the and to saturate such a left hand side term. And if the right hand side term is also be found in the E-class then that means we can find a equivalence equivalence combination or the right red trace from the left hand side to the right hand side. Therefore this uh this right can be safely uh reduced without any degrade about the optimization capability. Uh and if that's not found this atomic right can be added to the uh current automic reset. By leveraging this technique we have squeezed the right over tens of thousand into just exactly 40 and this is a prep pre-processing offline optimization to achieve that.
Uh so as previously discussed we only talk about only one optimization and so and we will jump to the evaluation. We learn about 40 automic rights and in we evaluate our quer optimization results over 176 algorithms with uh over the IBM eagle gated set with the near newest quantum parameters.
Our evaluation shows that quaser is the best optimizer across all the optimization times. Which means that as shown in the figure that uh with x-axis to be the optim optimization time we allow each optimizer to run from 1 minutes to or two hours and the and the y-axis represents the gate reduction rate as a matrix for reevaluating the opential quality. The higher the better.
We found that uh in as shown in the orange line, the quizzer can dominate all the can find the best results across all the benchmarks over after we left him around two up to one minute and it will dominate all the optimization results. Furthermore, we found that quasi is not only the best optimizer on all the times but also on all the metrics. We evaluate quaser on four metrics that all the previous operator evaluated including send not gate reduction rate, total gate reduction rate, fidelity and circuit deps. uh and most importantly it can beat cuskit in over 100 or 100%age in the c not getate reduction rate and in general it can mostly find better solution over 80% of the circuit circuit than all the others and we also show that compared to the other uh re sets the autom set is the best set that can be the fastest and have the best optimization quality.
Okay, so time to summarize we present quaser equal separation for quantum circuit optimization. It for solves the sequential optimizer subop suboptimality challenge caused by the phase order problem by introducing equal saturation on the graphs or over or circuit and it also reduced the revered root redundancy during the exploration by introducing an offline automic root inference also by equality separation and also in our paper we also discussed the scalability of equality saturation over the over the quantum circuit by pro by introducing two two solutions solution. So one is equality saturation based solution called schedule equal saturation and another is a quantum driven solution called based reordering to use integer linear programming to reorder the gate sequences thereby reduce the efforts in the exploration and in total it is the best optimizer for now across all the metrics and all the parks. Thank you.
Quer is open source and please scan the code to try it out and I'm happy to take any questions. Thank you.
Okay, we have time for three questions.
>> Hi, thanks for the good talk. Um, so most equality saturation domains don't at least they don't have so many rewrites and in your case you minimize it to a smaller set but why are there so many rewrites at least to start with in this domain? Oh you mean the initial right set about yeah the one that was >> yeah we will you this is because right are the comparing to rewrite are the term of the equation comparing of the left hand side term left hand circuit and right hand circuit right and this equation can be generated by uh by enumerating the possible combination about the atomic gates and each and this equal equivalent checking can be done by linear algebra checking because each quantum circuit can be represented as a uh linear algebra matrix. So and therefore and since that each red set is not that big it only include several QB cubs and the each each terms include several less than 10 quantum gates. So that comparison over the matrix is not that big is uh is it cost is low. Okay.
Thank you.
>> Hi uh thanks for the talk. Uh so my understanding is that in quantum uh there are often rewrites that can be applied with some probability and in this case as far as I understand like all the rewrites are >> unconditional >> unconditional. So I'm kind of curious if you think that egraphs can be used to apply these probabilistic rules. Yeah, this probabilistic based rewrite rules and are also including the reynthesis based quantum optimizers that I do not discussed in this paper and these are considered as an orthogonal work to apply those probabilistic based rewriting and the quantum resences and integrated into the quaser. Yeah, and we are expected to this to be future extension to our quer work. Thank you.
>> Um great talk. Um there's a well-known complete equal theory by Andre Clement and co-authors um for quantum circuits and I believe that that's under um 50 equations probably on the order of 25 though you can correct me if I'm wrong. Did you try plugging in that set of equations into the egraph tool and was this was this better?
>> Uh sorry excuse me which right? Um there's there's well there's a few papers but there's a paper by Andre Clam and Simon Simone Peris and others that is a complete equational theory for quantum circuits. So that should get you from any quantum circuit to an equivalent one and I believe it's under 40 rewrite rule. So I could be I haven't counted them. Um, so I was just wondering if you tried plugging in any of the results from those papers into the egraph tool and wanted to see whether or not you could get >> I think we can do such a evaluation to evaluate the quality about the uh clemens at all right set comparing to our atomic right set in future. Thank you.
>> Okay.
>> Okay. Uh let's thank Gansang again.
Uh so for our last talk of this section um Zachary Cisco will tell us about funible memories. So as a normal person I'd like to think my memories as nonfgeible because you only live once.
However uh it turns out that this is not the case in computer hardware. So this talk will introduce the abstraction of functional memory that pushes the state-of-the-art on hardware compilation and retargeting for memories. Okay. All right. Hear me? Great. Thank you for the introduction. Um okay. Yeah. So uh like the introduction said in this work we're focusing on a a particular hardware design problem. And so yeah, we're proposing this language abstraction called fungeible memories which gives us a write once map anywhere memory abstraction that hardware developers can use to target many different memory technologies from a single generic interface. Okay. Um in this work we incorporate funible memories into a memory compiler which automatically targets many memories uh according to different technology constraints and then we also go in the other direction.
We also build a memory decompiler which lifts memories from existing gate level or simulation ready designs up to fungeible memories to enable technology retargeting. Okay. And so to accomplish this at scale we developed a structure aware equality saturation technique which scales to net lists with millions of cells and identifies memories that state-of-the-art tools cannot. Um and so while we're in the equality saturation se uh session uh this work really touches on a very specific hardware design problem and so I want to kind of set the stage a little bit give an example right so how do hardware developers design memories right let's consider a register file right so here we have a nice little block diagram of our register file has three source inputs a destination register to write to if we're using a hardware description language like verarilog we would normally describe it like this in a behavioral way, right? We can model our register file as a two dimensional register, two dimensional array of registers. We can use familiar array notation to read to it, to write to it, and that's great, right? This works fine for simulation. Um, if you want to deploy this to something like an FPGA or any real piece of hardware, uh, this kind of implementation does not always work. Okay? And the reason is that languages like verarilog do not provide technology specific constructs in the language right a device like an FPGA has things like BRAMs to to describe memories and verilog doesn't really know what a BRAM is right now there are some uh mitigations to this which I'll talk about in the evaluation which we compare to um so in this case what does a developer do if they want to deploy this on an FPGA or or an ASC or anything like that well they need to write a custom mapping for this register file to be able to map it to BRAMs on a particular FPGA. Right? So in a language like Verarilog, what we see a lot are these if defaf blocks. So developers will split parts of their code uh for each memory into a separate if defaf for each technology target that they want to work for, right? And so our developer here will make a new if defaf block for their FPGA target. And so what do they put in here? Well, for that particular FPGA, they'll read up into their uh they'll pull up their vendor manual that describes how to instantiate that BRAMM and they'll they'll figure out how to do it. This is one example of a particular mapping. It has all these different parameters and ports that they have to hook up in the right way, right? And it's not always that easy because this mapping uh can be a little tricky, right? It's not always the case that we have a onetoone mapping between our behavioral memory design and the one that would conform to a particular technology, right? And so in this example, the interface for a register file has three read ports and one right port. Uh a typical FPGA may only support dual port BRAMs. And so the developer actually has to make a custom mapping which will kind of map their three read port one right port memory into a configuration of multiple uh dualport BRAMs which emulates the behavior of a three read one right memory. Okay. And so these these are transformations that the developer has to make. Okay. And this is just the register file. Uh this is a little data path diagram of a you know a CPU. If we're developing a CPU, we have many more memories in our designs. This mapping process has to be repeated for each of these different types of memories and they have their own configurations and semantics as well. Right? So this makes refactoring a pain on the codebase. Right? If we want to support new technologies, we need more if def blocks if we're using this kind of technique. If we change the behavior during the development process, the design process, we need to refactor those blocks in a consistent way. Right?
And so the takeaway here is that technology constraints um make this implementation brittle to design changes right and the language that we're using verarilog isn't really providing us any helpful abstractions and so that's where fungeible memories come in so funible memory enhances our hardware description language because we try to capture the shape of connections and portedness of memory blocks in hardware designs. Okay. So in this paper we formalize a an algebra of fungeible memories reports in memory blocks or objects that can be constructed and rewritten according to a set of semantics preserving rewrites. Okay. Uh this is a list of just kind of the the different uh rules that we include in the paper. This involves splitting memories by you know read ports, right ports. We can also merge them together.
We also do uh rewrites according to the shapes of our memories. You could split by width and height and so on. We can also do casting between port configurations as well. So this is an example of a readport splitting and merging rewrite. Right? We could start with a memory uh that has k read ports and we could split one of those read ports out into a whole separate memory block uh in this way. So the visualization is on the top and and how we actually formalize it as a memory is on the bottom. Okay. Uh we can also support other kinds of kind of function splitting and merging as well. Certain memories might support optimizations like writeto read forwarding where you would forward the value of a right to a readport, right? Some memory blocks would support this as a flag and some don't. And so depending on your technology constraints, you would either want to kind of fold that logic into the memory if it's supported or if it's not, you want to break it out into separate logic so that you could still map the design, right? And so in in the paper we describe a handful of other uh rewrites over funible memory algebra. Uh and then they have various we have a formalization of that. Um and the last thing I'll mention about this is because this fungeible memory algebra is kind of introducing a sort of new equational theory of of of memory blocks and hardware. We wanted to have some confidence in it. And so we spec we formalized uh funible memories in lean because we wanted to prove that the rewrite rules that we constructed actually preserve behavioral equivalence according to memory rewrite semantics.
Uh so that's described a little more in detail in the paper. The lean source code is also in our artifact as well.
Okay. So with funible memories in hand uh we incorporate that into an existing uh hardware description language so that we can do this automated memory technology mapping. Right? The problem kind of roughly looks like this for our memory compiler where we have a set of funible memories. We have some technology constraints and we want to try to figure out using our rewrites how can we reshape our memories to fit the technology constraints and that gives us a satisfying mapping. Right? So as one concrete implementation of this, we extended Purle, which is a a Python embedded hardware description language to support fungeible memories. As one particular case study, we ported a five-stage risk 5 CPU to use fungeible memories. U one benefit of this is that we were able to kind of shrink our memory implementation in purle from 130 lines of code to three, one for each memory block in the design. So very kind of declarative u memory specification now and this also comes with the back uh with the benefit of getting a bunch of different technology backends for free.
So that's something that you know purle really just focuses on simulation um but now with with our with this automated technology mapper we can then target many different memory backends.
Okay so that's the memory compiler direction. Uh the other direction for this is the memory decompiler, right?
And so this is important for this memory retargeting scenario, right? Um and so there's a lot of different uh kind of more and more open uh hardware designs out there. Uh it's common more and more for developers to compose their designs kind of with different components, put them together. Um and some of those may have been de developed with a different technology in mind or maybe developed with maybe just technology abstract, right? Um so what if we wanted to integrate those components into a new design and retarget it to a new technology? That's where memory decompilation comes in. We want to identify these uh these memories in our design and lift it up to funible memories. Okay.
So the way that we do that um I kind of give a really brief overview of how this works. Um so this would you know in in in kind of the the most basic case you would start with a gate level net list and you would analyze that gate level net list you have all the different flip-flops and logic and try to identify memory blocks in there right so the first step would be grouping uh flip-flops into wider registers based on common enable signals and then kind of building out different memory block candidates we have a set of rewrite rules for identifying readport logic uh which often works over like a mux tree reduction. We have kind of a symmetric set of rewrites for identifying rewrite port logic which kind of works over demox uh reduction. And then putting all those things together, we then identify all those common pieces and extract uh all the different funible memories in that design.
And so the key technique to doing this is a quality saturation. And it's really kind of a perfect setup for a quality saturation because we have this huge uh kind of uh blasted out gate level net list, thousands, even millions of low-level logic cells. And with a lot of the aggressive logic optimizations that hardware synthesis tools may perform, it's really not obvious what sequence of rewrites will reveal this kind of higher level structure that we're looking for.
Right now for anyone who's uses quality saturation like rewriting over an egraphph with millions of nodes is not going to be a fun time. And so in order to actually implement this and get it to work, we had uh three key optimizations to to help with this. Uh the first one we call structure aware scoped rewriting, right? Um and so a lot of these logical rewrites that we might perform over a gate level net list can apply almost anywhere in the net list, right? And that's going to increase rewriting time and grow the egraph unnecessarily. But since our goal is memory decompilation, really only care about signals whose source or sync is a register. And so we can kind of we do this sort of like structure aware rewriting to really cut down on where we're applying some of these uh rewrites, particularly MX tree reduction. We really only care if that's you know connecting to register sources.
Uh so that's a really key one. Another very useful one is subsumption.
During register grouping in the initial phase, we we only really need to choose a single representation of grouped registers rather than maintaining every possible grouping of of of flip-flops.
Uh so we will subsume all other groupings. That's also a a big saver.
And then another uh really helpful one are inverse transformations. And so this comes from uh the Sinsky work. Um and this was this is about inserting speculative equalities into the egraphs and later validating them. And so we adapted this to net lists because uh using inverse transformations is really useful when we're rewriting over net lists because hardware synthesis tools might apply these really aggressive optimizations which we need to undo to kind of reveal some latent structure that was hidden by those optimizations.
So that's a very helpful uh uh technique there as well. Got it. And then the last one and this is probably the most twisted one is that we implemented our equality saturation engine in SQL light.
Okay. And and so we know from egglog and other uh recent research that there's this beautiful connection to relational databases and equality saturation.
And we took this to the extreme and to good effect, I would argue, right? Um, and so kind of doing things directly in in SQL, we could really easily uh implement a lot of these optimizations without maybe some of the guard rails that uh quality saturation frameworks might impose and certain phases like grouping registers by common signals. We can do those kinds of queries very fast in relational databases, right? And so this kind of helped us break some rules uh for for doing decompilation which I would argue since we're decompiling anyways it's okay to be a little messy and so we could sometimes skip uh some expensive steps uh which which normally come up. Okay. Um and so putting all of this together uh we have this tool called memo which uh comprises the compiler and decompiler for funible memories. Uh looks like this. The user would either input a design expressed in purle extended with fungeible memories which would then go through uh the automated memory technology mapper alongside different technology constraints. We have a number of backends spanning simulation platforms and ASIC and FPGA techn memory technologies. And then alternatively, a user can input an existing design as a net list, go through the memory decompilation step, use those funible memories to then retarget to a new technology.
Uh I I'll briefly go over a bit of our evaluation here. We have more more details in the paper. But a big aspect of evaluating this is actually with the memory decompiler because this is enabling us to compare with exist compare over existing designs and compare with related techniques, right?
Um and so in this table we have a handful of uh open source hardware components that include things like caches, fifos, a couple of small risk 5 cores and other SOC components, right?
And so in each of these designs there was an expected memory and sure enough we we were able to decompile and identify those memories and their kind of original configurations.
Decompilation time is fast uh for these and what we wanted to do is we wanted to compare it with something that was the most similar kind of uh operation what we're doing and that would be memory inference in in in hardware synthesis tools. Um and so hardware synthesis tools like Yosis and and and and other uh FPJ synthesis tools have a thing called memory inference which is more of a a syntactic pattern matching operation where it looks at the the verilog code kind of matches over syntactic template and if you don't perfectly match that then it's not going to identify the memory and so memory decompilation works over the structure of the the the net list rather than just a syntactic template. And so we're able to identify memories uh over designs that Yosis missed and and even some uh commercial tools missed.
Uh we have a handful of large scale case studies uh from some open source hardware designs where we do memory decompilation and then also uh do retargeting to FP to different FPGA and ASIC technologies. One example is the Spark core has uh the the net list that that we decompiled had more than 700,000 gates. Memory decompilation took about a minute and we you know recovered all the expected memory blocks.
A bigger one we uh decompiled an L2 cache that was one and a half million gates took about 2 minutes and indeed we you know identified correctly identified all the expected parts of the cache tag state data blocks and all of that. And then our biggest uh case study was the open- source black parrot risk 5 multi-core SOC. Uh the uh kind of synthesized core for this is about 12 million gates. Memory decompilation took about six minutes and we went through and checked and we identified all of the expected memory blocks in that core in the front end and the back end. Okay.
Um and so to to wrap things up um kind of more broader level uh takeaway I think uh in hardware description language design uh technology specialization is perhaps a bit overlooked. Um I think there are indeed some useful abstractions that we can kind of pull out to improve the the development process in this way. I think fungeible memories are one instance of this observation. Right? Right. So we developed this kind of equational theory of how we can think about uh memory design and then using powerful rewriting techniques like equality saturation we can you know improve uh different aspects of of synthesis like tech mapping and we can do it at the scale of real world SOC designs. And so my question to is you know what other equational theories are hiding in the hardware development process that we can kind of think about and build into our tools. Uh so like I said our tool memo is open source that is the compiler and the memory decompiler and its form uh sorry it's lean formalization. I would encourage you to check it out and uh yeah thank you all very much.
>> Well thank uh we have time for two questions.
Sometimes the synthesizer may using some correlation analysis to identify that some parts of the memory is uh uh is is dead and it could be eliminated. In that case the signature of the memory is changed. Is that a problem in your tool or we are stay faithful to the users specifications?
>> Sure. Yeah. This this actually comes up really importantly in uh our rightport uh decompilation stage. So when we're looking at things like you know register files in CPU designs a lot of times we would find that the the synthesis tool would uh kind of erase the demox like the parts of the demox tree related to the zero register right things like this. And so that's where uh techniques like in inverse transformations really help us because they help us undo some of those aggressive optimizations that synthesis might do. Yeah. So that is one example of that.
Okay.
Uh thanks for the talk. Uh it's a bit sad to see that people are would rather use than egglo. Uh but I'm curious about your experience like building an egraph engine on top of SQLite like what's good and what's bad about that.
>> Let me first say I love Egglog. I think it is a wonderful tool and and we did a lot of prototyping with it. Um I think the the experience of writing uh an equality saturation engine is SQL I think is is arguably worse than writing one in egg log because it's like you're you're writing SQL queries and things like that. So, it's it's not as ergonomic as as a language that was designed for doing what it was like what what what Egglo does. Um, but I think that the trade-off is we had so much more direct control over the schema like over what we were doing. We're able to kind of be a little more messy. I would I I would be interested in seeing is there is there a front end to this to kind of describe what we're trying to do and then compile that to SQL instead of having to uh I mean you you can you you can check out our open source tools and see what the SQL looks like and it's kind of gnarly but it it it makes sense when when you when you look at it a little closer but it's not that fun not as fun as I glog.
>> Okay, let's thank again.
>> Okay. Uh so just as an extra plug uh Zach is also looking for prospective students and posttos. So if you are interested in quality saturation computer aure architecture or program language in general please talk to Zach. And this concludes our equality saturation session.
Related Videos
LBF101 Creating an XML Changelog
liquibase7511
3K views•2026-06-15
Alta Labs Cloud Dashboard Real time Network & Xnet Insights!
ShinyTechThings
158 views•2026-06-17
Wait... Group Policy Not Applying? Check This First!
keeplearning_iT
144 views•2026-06-15
Leetcode Weekly Contest 506 | Life's boring these days
Pudeesht
2K views•2026-06-14
microJAM: MAKING A MICRO GAME FOR A GAME JAM IN CLOJURESCRIPT AND TOTALLY NOT C
janetacarr
156 views•2026-06-18
Partitioning vs Bucketing vs Clustering: How to Make Queries 100x Faster
thedataandaiguy
194 views•2026-06-16
Design Claude Code Like a Senior Engineer
hayk.simonyan
344 views•2026-06-19
Linus Torvalds: AI Won’t Replace Understanding Code
SavvyNik
140 views•2026-06-19











