This lecture covers the mathematical foundations of probability theory and Bayesian networks for causal modeling. Probability spaces consist of sample spaces, events (subsets of outcomes), and probability functions that assign probabilities to events. Key concepts include joint probabilities, conditional probabilities, the chain rule, marginalization, and independence (where P(A,B) = P(A)P(B)). Random variables are functions mapping outcomes to numerical values, with their distributions being induced probability functions. Bayesian networks are directed acyclic graphs where nodes represent random variables and edges represent conditional dependencies, with each node having a conditional probability table. The Markov assumption states that each node is conditionally independent of its non-descendants given its parents. D-separation determines conditional independencies by checking for active trails (paths where information can flow) in the graph. Collider structures (X→Y←Z) behave differently from chains and forks: they are blocked when Y is unobserved but become active when Y or one of its descendants is observed. Markov equivalence classes contain graphs that encode the same conditional independencies, meaning observational data alone cannot distinguish between them. Faithfulness assumes that all conditional independencies in the distribution are captured by the graph structure.
Deep Dive
CAI - 02 Probs. & Bayesian Networks | Causality for AI & ML | TU Darmstadt | Winter Semester 2025/26
Okay, then good morning everyone.
Welcome to our second causality lecture. Today we will cover the probability and stochastics preliminaries, more as a refresher, since you have probably learned most of that already. After that comes a refresher, or short summary, of Bayesian networks and probabilistic graphical models, which we will also use for causal models.
Before we start, some short organizational things. There will be a stochastics tutorial, probably covering the first part of this lecture, as part of the PGM exercise session today. So if you want to refresh or practice a bit more after this lecture, you can go there. A colleague will hold the tutorial, and you can ask questions about probability and statistics fundamentals. The second organizational thing is an exam collision. We heard that there is a collision with computer vision, and it is probably hard to resolve. It would be interesting to know whether there are exams directly before or after our planned exam. If you know of any, you can speak up here, write in the chat, or write something on Moodle. We will probably have to resolve it on an individual basis: you write to the study office, and then we can look for a replacement date.
So, do you know about any exams that are directly before or after our planned causal AI exam?
No. Okay.
Thanks.
Then we can start with our topics. Today's lecture has two sections. The first is a stochastics recap with more general things. I will try to keep it short in parts, but feel free to ask. There is a lot of mathematical notation, of course, and if you want something explained again or in more detail, I'm happy to do that. The second section is about Bayesian networks and probabilistic graphical models, which are the technical foundation we need for causal structures: learning causal structures and modeling with them.
So that's for today, and with that we can start with the stochastics part. The foundation of stochastics is the probability space. A probability space is a structure that has a sample space as its foundation and then talks about events in that sample space. An event is nothing else than a subset of the sample space, which we may then relate to some semantics. For example, if we flip a coin, or better, if we roll a die, we can think of an even result as an event, and this corresponds to a subset of the samples. The last part of the probability space is the probability function, which assigns probabilities to all of the events in the event space. There are some technicalities about how a proper event space is defined. For a finite sample space it is quite easy: we can just take the power set as our event space.
But if we get into infinite, especially uncountably infinite, sample spaces like the real numbers, we have to think more carefully about proper event spaces. Think about a random real number between zero and one where you want to assign probabilities to outcomes: if every individual number had a probability above zero, they would not add up to one, because there are uncountably many numbers. But we can of course define probabilities for intervals, and then it works out again. You have probably also learned about Borel sets, which are usable for uncountable sample spaces.
But we mostly focus on discrete ones anyway. And if we have continuous ones, you can probably just stick with the Borel sets and don't have to worry about the mathematical details.
A probability function can also be defined by just defining the probabilities of the samples. In the discrete case at least, the probability of an event is just the sum of the probabilities of the samples in the event, so it is enough to define the probability of each sample. Usually this is done with distributions: in the discrete case every outcome is assigned a probability, and this is enough to define our probability function. Of course, distributions can be visualized in different ways. You can have bar charts, you can have tables, and you can have density functions if you have continuous distributions.
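As a small illustration of the point that per-sample probabilities are enough in the discrete case, here is a minimal sketch. It assumes a fair six-sided die; the names `sample_space`, `p_sample`, and `prob` are made up for this example and do not come from the lecture slides.

```python
from fractions import Fraction

# Sample space of a six-sided die; fairness is an assumption for illustration.
sample_space = [1, 2, 3, 4, 5, 6]
p_sample = {s: Fraction(1, 6) for s in sample_space}

def prob(event):
    """Probability of an event (a subset of the sample space):
    the sum of the probabilities of the samples it contains."""
    return sum((p_sample[s] for s in event), Fraction(0))

even = {s for s in sample_space if s % 2 == 0}   # the event "even result"
print(prob(even))               # 1/2
print(prob(set(sample_space)))  # 1
```

Any event in the power set of the sample space can be passed to `prob`, which is why no separate table of event probabilities is needed.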
It is interesting to note that so far we have only sample spaces and probability functions.
We will come to random variables in a moment. They are at first glance quite a different kind of object, but they behave so similarly that notation-wise we usually treat them the same and interchangeably. At some points, though, it is important to note that they are different things and to make clear what exactly we are talking about. I will come to that in some examples. As a different example of a probability space, look at two dice. Again, any subset of the outcomes can have some meaning. We can think about events like "the second number is higher than the first one" or "the sum has some specific value". Here you also see that if we want to repeat random experiments and calculate something out of the results, it may be useful to define a new probability space that has these calculations, these valuations of outcomes, directly in it. And those are basically random variables.
But before we look at random variables: with probabilities you can do a bunch of stuff. You can think about combinations of events, which are... Yes?
>> The summation thing I didn't understand. Sorry for interrupting.
>> Yeah, sure.
>> It says one over 18 in this one.
>> Maybe I made a mistake. Yeah, might be. It's three outcomes, so it should be 1/12, shouldn't it?
>> You have three, right?
>> Yeah. Yeah, out of 36.
>> Yeah, it's one over 12. Thanks, I will correct that. So, joint probabilities: the combination of two events. You can think about it graphically as the intersection of two events. If you think of an event as a subset of the sample space, and you want to combine two events so that both are true, then the event describing both is just the intersection. You can define conditional probabilities as well. And there is a chain rule for decomposing a joint probability into marginal and conditional probabilities.
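The two-dice discussion can be checked by direct counting. The sketch below assumes two fair dice and, purely as an illustration, takes "the sum is 4" as the event with three favorable outcomes; the transcript does not say which sum was on the slide. It also shows the joint probability as an intersection and verifies the chain rule.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two dice (fairness is an assumption).
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

# Hypothetical choice of events for illustration.
sum_is_4 = {(a, b) for (a, b) in omega if a + b == 4}   # (1,3), (2,2), (3,1)
second_higher = {(a, b) for (a, b) in omega if b > a}

joint = prob(sum_is_4 & second_higher)   # P(A, B): probability of the intersection
cond = joint / prob(second_higher)       # P(A | B) = P(A, B) / P(B)

print(prob(sum_is_4))  # 1/12
print(joint)           # 1/36
print(cond)            # 1/15
assert joint == cond * prob(second_higher)   # chain rule: P(A, B) = P(A | B) P(B)
```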
For the chain rule it is important that you have multiple possible decompositions, one for every ordering of your events. There is also marginalization: if you start with joint probabilities and want to get marginal probabilities out of them, you just sum the joint probabilities over the other event and its complement. And there is Bayes' rule, which is basically just a combination of conditional probabilities and the chain rule. Then independence. Independence is interesting because it allows us to compress the information we need to calculate joint distributions. You can phrase independence as: one event does not give any information about a second event happening. So if A is independent of B, then B gives us no information about the probability that A takes place, and the other way around, of course. Or you phrase it as: the joint distribution just factorizes into the marginal distributions, so that we need no more information than the marginal distributions.
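For reference, the rules mentioned so far can be written out for two events A and B. This is the standard textbook form, not copied from the slides; \neg B denotes the complement of B.

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A, B)}{P(B)}, \qquad P(B) > 0
  && \text{(conditional probability)} \\
P(A, B) &= P(A \mid B)\, P(B) = P(B \mid A)\, P(A)
  && \text{(chain rule, both orderings)} \\
P(A) &= P(A, B) + P(A, \neg B)
  && \text{(marginalization)} \\
P(A \mid B) &= \frac{P(B \mid A)\, P(A)}{P(B)}
  && \text{(Bayes' rule)} \\
A \perp B &\iff P(A, B) = P(A)\, P(B)
  && \text{(independence)}
\end{align*}
```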
You can also visualize this in certain ways, maybe starting with the table there. Usually, to define a joint probability for two events, you need to specify all four inner fields.
But if you know that they are independent and the joint factorizes into the marginal distributions, then it is enough to define just two of the outer fields, because with the complement it is enough to specify one entry of each marginal.
So we need less data, or fewer entries in our model, to specify an independent distribution.
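The compression argument can be made concrete with a sketch. A general 2x2 joint table has four inner entries (three of them free, since they must sum to one), while under independence two numbers suffice. The values of `p_a` and `p_b` below are hypothetical.

```python
from fractions import Fraction

# Hypothetical marginals; only these two numbers are specified.
p_a = Fraction(3, 10)   # P(A)
p_b = Fraction(1, 2)    # P(B)

def independent_joint(p_a, p_b):
    """Fill all four inner fields of the joint table from the two marginals,
    assuming A and B are independent: P(A, B) = P(A) P(B)."""
    table = {}
    for a in (True, False):
        for b in (True, False):
            pa = p_a if a else 1 - p_a   # complement gives P(not A)
            pb = p_b if b else 1 - p_b   # complement gives P(not B)
            table[(a, b)] = pa * pb
    return table

joint = independent_joint(p_a, p_b)
for key, value in joint.items():
    print(key, value)
print(sum(joint.values()))  # 1
```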
For the chain rule, you usually also learned in school that you can visualize a joint probability with probability trees. Of course there are different ways to draw a tree, and depending on the task and on which information is available, it is sometimes more useful to use one tree or the other, or the table. So there are many different ways to visualize probabilities.
But there is an even more interesting concept called random variables. Random variables can be understood as a way to design a random experiment and evaluate it.
Mathematically speaking, they are a function from one probability space into another one. They basically just assign numbers to outcomes of the original sample space.
The interesting thing is that with random variables you can create new random variables by addition and multiplication, and you can think about something like a mean. So they are usually the way to go when describing a random experiment. Again there is something like a distribution, but mathematically the distribution of a random variable is nothing else than the induced probability function that measures the outcomes of the random variable. And if you draw a distribution, that is again all you need to specify this probability function.
So sometimes the terms are used interchangeably, and sometimes with slight differences. I found that a bit confusing myself at times, because there is also different notation in use for the same thing.
Usually, when we talk about distributions from now on, we will just write them as P(X). But this is more like a function: the distribution it specifies is not the same kind of thing as P(A) before. P(A) is the measure of one event, so it is a number between zero and one, whereas P(X) is a whole distribution over all outcomes of X. Interestingly, they behave quite similarly under the rules we just looked at. So we can again define something like a joint probability distribution and a conditional probability distribution.
We can think about marginalization, Bayes' rule, and independence, and we can basically just adapt our rules from before, and they work just fine. You can prove it, but you don't have to. Here we can now think about discrete random variables. If we want to characterize their distributions, we can again do so with a probability tree or with tables. With the tree, we now have more branches than just one event and its complement, because random variables can have more than two outcomes. Again we have different possibilities for how to combine them. So this basically just visualizes the chain rule again, but now with the individual values. It is also important here that there is in fact a link back to events and sample spaces, that is, to the probability spaces.
After the experiment, the random variable has been assigned one number. If you think about rolling a die again, you can define a random variable that just gives you the number the die shows. So you again have one to six.
But if you now think about rolling two dice and you want a random variable describing the sum of the two rolls, you can do that as well. The new random variable then has the range 2 to 12. With that you can ask: what are our new samples and events in the probability space that relates to our random variable? In this new one we have these 11 outcomes, at least 11 with probability above zero. And our events are again subsets of them. If we think about an event like "the sum should be greater than 10", then this is again just a subset of our 11 outcomes. But to measure these 11 outcomes, we usually look at the inverse map induced by our random variable and go back to the two dice rolls, to evaluate the probabilities of the individual pairs that can be rolled. Maybe just to make this one point clear, I will do it on the blackboard. Sorry to those online. Take a random variable taking one value, X = x. This again is an event, because it describes the fact that we get x as an outcome. So basically it describes the event {x} as part of our new sample space.
That is little x for the random variable capital X. But we can also think about something like X ≥ x. This relates to a subset where x, x + 1, and so on are in our event set.
So this again is a subset of our sample space. And if we write something like P(X ≥ x), then we measure this set in this sample space with respect to the induced probability function.
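The link between a random variable and the original probability space can be sketched in a few lines. The example assumes two fair dice and uses the sum as the random variable; `X`, `p_omega`, and `p_X` are illustrative names. The probability of "sum greater than 10" is measured by going back through the inverse map to the pairs of dice rolls.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Original probability space: ordered pairs of two dice (assumed fair).
omega = list(product(range(1, 7), repeat=2))
p_omega = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: maps an outcome (a, b) to the sum of the two rolls."""
    return w[0] + w[1]

# Induced distribution of X: push the probability of each pair onto its sum.
p_X = defaultdict(Fraction)
for w, p in p_omega.items():
    p_X[X(w)] += p

# Event {X > 10}: take the preimage under X and measure it in the original space.
preimage = [w for w in omega if X(w) > 10]
p_gt_10 = sum((p_omega[w] for w in preimage), Fraction(0))

print(sorted(p_X))   # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(p_X[7])        # 1/6
print(p_gt_10)       # 1/12
```

Summing the induced distribution over the values 11 and 12 gives the same number, which is the sense in which the distribution is the induced probability function.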
If we want to define some joint distributions again, we can do that with tables. Here is again the motivation for the probabilistic graphical models we will come to later: if we just want to specify the whole distribution, we basically have to specify all the inner entries. Of course you can infer some of them from the others, but basically you have to define all of them. But if you know about an independence, then the marginal distributions on the sides are enough to specify everything on the inside. And if you think about not just two random variables but five, and you know about some independencies, that at least allows you some sort of compression. It is not only about compression when storing or initializing this data. It also matters when answering questions related to probabilities. For example, say you want to solve a conditional probability: what is the probability of X = 5 given that Y = 7? How you calculate the answer from your probability distribution depends on how the information is stored, and if there are independencies to use, it is easier and faster to implement than if not.
I think last week we also had the phrase that correlation does not imply causation. Nevertheless, correlation is important for thinking about causation. And along with correlation and causation, independence is important, because they are quite directly linked.
Formally, the correlation coefficient is defined via a kind of variance between two variables, called the covariance, normalized by their standard deviations. It describes the coupling of the variables as they move around their means. That the correlation is zero is a necessary condition for independence, but the converse does not hold. That's important. If we have correlation zero, that does not mean the variables are definitely independent. But you can use the contrapositive of the implication: if you know that the correlation is not zero, then you know there has to be some dependency, and they cannot be completely independent.
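That zero correlation does not imply independence can be seen in a standard counterexample, which is not from the lecture: take X uniform on {-1, 0, 1} and Y = X². The sketch below computes the covariance exactly and then checks the factorization condition.

```python
from fractions import Fraction

# Standard counterexample (an assumption of this sketch, not from the slides):
# X is uniform on {-1, 0, 1} and Y = X**2, so Y is fully determined by X.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

# Covariance: E[XY] - E[X] E[Y]. The correlation is this divided by the
# standard deviations, so it is zero exactly when the covariance is zero.
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
print(cov)  # 0

# Independence would require P(X=0, Y=0) = P(X=0) P(Y=0).
p_joint = joint[(0, 0)]
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
print(p_joint, p_x0 * p_y0)  # 1/3 1/9
```

The covariance vanishes although the two variables are clearly dependent, so only the direction "nonzero correlation implies dependence" can be used.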
We can then also exploit that for causal modeling. We will come to that later.
Starting our second section, we begin with Bayesian networks.
Who of you has heard of Bayesian networks already, and who has not?
Okay. Bayesian networks in general are a structure that combines a directed acyclic graph with some parameters: tables that specify a conditional probability distribution for every node. You can think of it as having this graph, and in every node a conditional probability table. Of course, if there is no parent, then you have no conditions. With that you can compute any joint distribution over the graph or over a subset of its nodes. This is the case because you basically make one assumption: each node is independent of everything else, more precisely of its non-descendants, once you condition on its parents.
With that you can again save some space, or some information. If a node has two parents, you need to specify a lot more cases than if there is just one parent, and the same is true for the conditions in general: specifying a conditional probability for, say, four conditions needs a lot more data than defining it for two. If we just apply the chain rule, we usually need to keep all these conditions, but with the assumption of some independencies we can reduce that a lot. Of course this assumption should be true in some way. We can say that if we have a probability distribution that has at least these independencies, then we know that it factorizes at least according to our assumption, so that we can model it with a Bayesian network. But it can be the case that our true distribution has even more independencies. So it could be compressed even more, but maybe we don't know that yet. There is the technical term independence map: a graph is called an independence map if the probability distribution you want to model has this factorization. But as I said, there are different qualities of independence maps. There are better ones that allow for even more compression than you get with some given graph that is an independence map.
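The factorization and the saving in parameters can be sketched as follows. The network A → C ← B, C → D, the binary variables, and all table values are hypothetical and chosen only for illustration; they are not the example from the slides.

```python
from itertools import product

# Hypothetical network over binary variables: A -> C <- B, C -> D.
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}

# Conditional probability tables: P(node = True | values of its parents).
# All numbers are made up for this sketch.
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(True, True): 0.9, (True, False): 0.5,
          (False, True): 0.4, (False, False): 0.1},
    "D": {(True,): 0.8, (False,): 0.2},
}

def joint(assignment):
    """Joint probability as the product of each node's CPT entry given its parents."""
    p = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

nodes = list(parents)
total = sum(joint(dict(zip(nodes, values)))
            for values in product([True, False], repeat=len(nodes)))

n_params = sum(len(table) for table in cpt.values())
print(n_params)            # 8 numbers for the network
print(2 ** len(nodes) - 1) # 15 free numbers for an unrestricted joint table
```

The node with two parents needs four table entries while the node with one parent needs two, which is the growth in "cases" mentioned above.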
If you now think about the independencies from the local Markov assumption, you get several local independencies, basically all the ones given by the first statement. But there are in fact more independencies modeled by our graph. And there is a nice way to check for independencies that are not local in the sense that one node is the parent of another and conditioning on all the parents makes something independent. You can check for any pair of variables whether they are independent, and you can also check for any conditional independence, with a method called d-separation.
D-separation actually allows us to find all the independencies that are entailed by our graphical model. We can use this to find an approach to proving mathematical statements about probabilities, like this one. Or we can use it to think about the algebra of conditional probabilities and which operations are valid or not. We can just look at the graph and figure out what is independent and what is not.
So how is d-separation defined? Within a DAG, we say that two sets of variables X and Y are d-separated given some set of observations W
if there is no active trail between any node of the first set and any node of the second set.
For the special case that X and Y each contain just one node: X and Y must not have any active trail between them while observing W to be called d-separated.
And what is a trail? A trail is an undirected path across G. So for a moment we kind of ignore the arrows. But we will see that we have to pay attention at least to the local structures, the triplets, where the orientation does matter for whether a path is open or blocked.
There are four possibilities for how a triplet can look, and basically two cases. If one of the first three structures appears, we can think of it as information being able to flow from X to Z or the other way. If we have a chain in either direction, the intuition is that if we have no information about Y, then there is some influence of X on Z via Y, or the other way around. And if Y is the parent of both X and Z, then Y has an influence on both, so both are somehow related, and again some information flows between them. But if we observe Y, then in the first three cases the observation of Y stops this flow, because if you have a value for Y, the influence of X on Z is blocked.
>> Yeah.
>> When you say observation, do you mean traversing the network and reaching that node?
>> Formally, it just means that Y is in our observation set. You don't really need more meaning for that at the moment than that it is in the set. But it will relate to the conditional independencies. If you observe something, you can think of it as conditioning on that variable: you have some data for that variable, and you condition on the information you have. And if you condition on this information, then the two other variables in a chain, for example, become independent. They are no longer linked, so there is no active path if you condition.
For the last case it is just the other way around. If you have this collider, or v-structure as we say, and you have no information about Y, then X and Z are independent. But if you condition on Y, that is, if you have some information about Y or any descendant of Y, then you can kind of trace back the influence that both have on Y or its descendant, so that you get some information leak, you could say, between X and Z.
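The collider behavior can be checked numerically. The following sketch assumes a hypothetical collider in which X and Z are independent fair coins and Y = X OR Z; this concrete mechanism is an illustration and not taken from the lecture.

```python
from fractions import Fraction
from itertools import product

# Hypothetical collider X -> Y <- Z: X and Z are independent fair coins, Y = X OR Z.
joint = {(x, x | z, z): Fraction(1, 4) for x, z in product([0, 1], repeat=2)}

def prob(pred):
    """Total probability of all outcomes (x, y, z) that satisfy pred."""
    return sum((p for (x, y, z), p in joint.items() if pred(x, y, z)), Fraction(0))

# Without observing Y, the joint of X and Z factorizes: the collider is blocked.
p_x = prob(lambda x, y, z: x == 1)
p_z = prob(lambda x, y, z: z == 1)
p_xz = prob(lambda x, y, z: x == 1 and z == 1)
print(p_xz == p_x * p_z)  # True

# Conditioning on Y = 1 couples X and Z: the collider becomes active.
p_y = prob(lambda x, y, z: y == 1)
p_x_given_y = prob(lambda x, y, z: x == 1 and y == 1) / p_y
p_z_given_y = prob(lambda x, y, z: z == 1 and y == 1) / p_y
p_xz_given_y = prob(lambda x, y, z: x == 1 and z == 1 and y == 1) / p_y
print(p_x_given_y, p_z_given_y, p_xz_given_y)  # 2/3 2/3 1/3
```

Given Y = 1, the product of the two conditionals is 4/9 while the conditional joint is 1/3, so X and Z are dependent once Y is observed.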
So this definition allows us to check for independencies, and in fact we can check all independencies with that criterion. You can also think about an algorithm to explore all of the possible independencies, but that's not necessary for now. You can look it up if you try to implement it at some point.
But yeah, basically we find that this method works, and we now have a criterion to get all the independencies that a given graph models.
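For anyone who wants to try an implementation, here is one possible sketch. It does not enumerate active trails as in the definition above. Instead it uses the equivalent moralization criterion: restrict the DAG to the ancestors of the involved nodes, connect co-parents, drop edge directions, remove the observed nodes, and test for connectivity. The graph A → C ← B, C → D and all function names are hypothetical, and the three node sets are assumed to be disjoint.

```python
from collections import deque

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, xs, ys, ws):
    """True if xs and ys are d-separated given ws (assumed pairwise disjoint).
    `parents` maps every node of the DAG to the list of its parents."""
    xs, ys, ws = set(xs), set(ys), set(ws)
    keep = ancestors(parents, xs | ys | ws)
    # Moralize the ancestral subgraph: undirected parent-child edges
    # plus an edge between every pair of parents of a common child.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents[v])   # parents of a kept node are ancestors, so also kept
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Remove the observed nodes and search for any remaining connection.
    queue = deque(xs)
    seen = set(xs)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for n in adj[v] - ws - seen:
            seen.add(n)
            queue.append(n)
    return True

# Hypothetical example graph: A -> C <- B, C -> D.
g = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
print(d_separated(g, {"A"}, {"B"}, set()))   # True: unobserved collider blocks
print(d_separated(g, {"A"}, {"B"}, {"C"}))   # False: observed collider opens
print(d_separated(g, {"A"}, {"B"}, {"D"}))   # False: observed descendant opens
print(d_separated(g, {"A"}, {"D"}, {"C"}))   # True: observed chain node blocks
```

The third query shows the descendant rule for colliders: observing D alone is enough to connect A and B.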
Any questions on that? I'm happy to explain some parts in more detail. Okay, then we can look at a simple example. If you take the graph structure from the beginning, we can ask where we find only blocked paths. In this example we have a collider structure, and a collider with no observation is blocked. So we find that this graph models the marginal independence between A and B. We can also think about the other triplets.
If there are no observations, then we know that information can flow along them, so those paths are active and there is no other marginal independence. But if we look at conditional independencies and condition on the middle node, then all these other triplets are no longer active. So we can find independencies between those.
>> Yeah. Did you say what an active trail is?
>> Yes. You call a trail active, and a trail is basically a connection between two nodes. In this example we only have triplets as trails, but in general you go along the trail successively and look at each triplet on it. A trail is active if the information can flow through every triplet, so you basically have these cases, and not active if that is not the case.
But for example, do we have the conditional independence of A and B given D?
So as a question to all of you: does it hold given D?
Any ideas?
>> Yeah.
>> Yeah.
>> I think so.
>> It is not d-separated under that condition, or? I don't know. The definition also covers any of the descendants, right?
>> Yes. For these structures, the colliders, you also have to look at any descendants. That is important to remember: for colliders you have to check even the descendants to find independencies.
So this is not the case for our example.
Okay, so now we have understood how these independence maps work and how we can find models that capture at least a part of our true independencies.
But it would be perfect to capture not only some of our independencies but all of them. You can approach this identity of independencies from the other side: you can require that our graph models at least all of the independencies that are there, maybe even more. But with more, it is not a Bayesian network anymore, because our Markov assumption no longer holds. This other condition is called faithfulness.
It basically says that if some conditional independence holds in our distribution, then we find the corresponding d-separation, so our graph entails that conditional independence. In practice we again think more often about the contrapositive: if there is no d-separation, then there is some dependence. If we start with a Bayesian network, with an independence map, then we know that its independencies are at least a subset of the true independencies. But if we think about the non-independencies, then we may have dependencies that are actually not there in the real distribution we try to model. Our faithfulness assumption says that every dependence we find in our model is in our real distribution.
Um yeah.
Now we will look again at our different cases of how independencies can appear in our graphs. You have seen that the first three structures model just the same independencies.
So if you try to infer the model structure, if you try to learn a Bayesian network from data, there is not really a way to distinguish between these three structures.
You can also think of it this way: if you have different models, they could both be equally true, or equally good representations of the data. Remember that they might have different probability tables behind the nodes, because a different structure makes a difference in how you have to define the probabilities: the product you use to calculate any joint probability is then different. But if you try to learn it from the data and come up with the structure and the tables to model it, then you cannot distinguish between these three types of triplets. For that there is the term Markov equivalence: any graphs modeling the same independencies belong to one and the same Markov equivalence class. And using the soundness and completeness of d-separation, we can actually characterize all Markov equivalence classes with d-separation.
We can define two criteria for checking whether two graphs are in the same equivalence class. We look at their skeletons, which are just the undirected graphs. If they share the same skeleton and all the v-structures, the colliders, are the same, then they have to be Markov equivalent, and we can check this again with d-separation.
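The two criteria translate almost directly into code. The sketch below makes one standard refinement explicit that the spoken version leaves implicit: only unshielded colliders count as v-structures, meaning the two parents are not themselves adjacent. The function names and the four small graphs are illustrative.

```python
from itertools import combinations

def skeleton(parents):
    """Undirected edges of a DAG given as a node -> list-of-parents mapping."""
    return {frozenset((p, v)) for v, ps in parents.items() for p in ps}

def v_structures(parents):
    """Unshielded colliders p -> v <- q, i.e. p and q are not adjacent."""
    skel = skeleton(parents)
    found = set()
    for v, ps in parents.items():
        for p, q in combinations(sorted(ps), 2):
            if frozenset((p, q)) not in skel:
                found.add((p, v, q))
    return found

def markov_equivalent(g1, g2):
    """Same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain = {"X": [], "Y": ["X"], "Z": ["Y"]}        # X -> Y -> Z
rev_chain = {"Z": [], "Y": ["Z"], "X": ["Y"]}    # X <- Y <- Z
fork = {"Y": [], "X": ["Y"], "Z": ["Y"]}         # X <- Y -> Z
collider = {"X": [], "Z": [], "Y": ["X", "Z"]}   # X -> Y <- Z

print(markov_equivalent(chain, rev_chain))  # True
print(markov_equivalent(chain, fork))       # True
print(markov_equivalent(chain, collider))   # False
```

All four graphs share the same skeleton, so the collider is singled out purely by its v-structure.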
Um and yeah so um we can now think about like the um causal interpretation of Beijian networks and here um some of it you have also seen last week but um here there is kind of the step from correlation to causation in some way um at least with the I forgot the name about the assumption uh uh we learned last week that if there's some correlation you assume that at least uh one of the causal structures holds so basically if you have a correlation between X and Z then you assume that at least one um of these causal effects might explain um this correlation Yeah. Yeah.
Yeah. Lightning buff assumption or um so we can use that to now think about if we can build some causer models out of our Beijing um networks. And um here again we like the interpretation if we have a chain in one direction we can say yeah there is an indirect causal effect indirect in the cells because um if we just look at two variables we might not have this causal effect. So um there's also like um if you have maybe in biology some uh different levels maybe you have some genes then you have some hormones and then you have some uh features that you can directly or some uh like appearance features you can observe. There might be in a lot of cases only indirect effect. So that the genes not directly have an effect on the appearance just via the hormones or via some other mechanisms they um have influence. Um so this indirectly and there is also then some discussions maybe some different terminology where they say yeah we don't say that there is like a causal effect and um indirect causal effects. Um there might be some uh some people not calling indirect cause causal effects proper causal effects but for modeling like the data generation or like the generation process um behind it uh I think it's useful to think about indirect causal effects still as some proper causal effects because that are like um only in the fewest cases we have um direct links between variables and um if you think about modeling um modeling some complex um process then um these uh indirect cause effects are of course some uh uh crucial parts of it. Also, if you heard in the discussion about um modern AI that there's the strategy to build some digital twins of factories of bure uh of bureaucratic processes then part of it could be to come up with the causal models um that are behind these processes and um of course you will observe a lot of indirect causal effects there. 
Then indirect evidential effects are just the same thing seen from the other perspective. Common causes are also a very interesting structure; last week we saw the example with genes, smoking, and cancer. The interesting thing is that a common cause is something that can occur instead of a direct causal effect. You observe some correlation between X and Z, and your first assumption would be that there is a direct causal effect between them, but in fact there is only a common cause that explains this correlation. Common causes happen quite regularly, and they are very important to get right if you really care about exploiting some causal mechanism. If you have a common cause, there is no chance of influencing Z by modifying X, because the correlation only arises because of Y. So that is a very important structure to get right when building causal models of real-world processes.
For the collider structures, you have also seen that they behave completely differently from the other types of structures. But because colliders are so different, this allows us to actually exploit the correlations we observe in the data: if we observe some correlation, we can infer that there is some dependence, and that restricts the possible triplets we can use for modeling our process. So if the data indicate that there is a collider structure, then at least this collider structure allows us to direct some edges.
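Returning to the common-cause point, here is a minimal sketch of why modifying X cannot influence Z in a fork X <- Y -> Z. The CPT values are hypothetical. The `do(X=x)` reading used here (setting X by hand removes the edge Y -> X) is a preview of the intervention formalism that the course only introduces later, so treat that part as an assumption rather than as the lecture's definition.

```python
# Hypothetical CPTs for a binary fork X <- Y -> Z; Y is the common cause.
p_y = {0: 0.5, 1: 0.5}
p_x_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [y][x]
p_z_given_y = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # indexed [y][z]


def observe_z_given_x(x):
    """P(Z=1 | X=x): seeing X updates our belief about Y, and hence about Z."""
    numerator = sum(p_y[y] * p_x_given_y[y][x] * p_z_given_y[y][1] for y in (0, 1))
    denominator = sum(p_y[y] * p_x_given_y[y][x] for y in (0, 1))
    return numerator / denominator


def intervene_z_given_x(x):
    """P(Z=1 | do(X=x)) under the assumed reading of an intervention.

    Setting X by hand cuts the edge Y -> X, so Y keeps its prior
    distribution and the chosen value x plays no role for Z.
    """
    return sum(p_y[y] * p_z_given_y[y][1] for y in (0, 1))


# Observationally, X and Z are correlated through Y.
print(observe_z_given_x(0), observe_z_given_x(1))

# Under intervention, Z does not react to the value we force on X.
print(intervene_z_given_x(0), intervene_z_given_x(1))
```

The observational conditionals differ, so a naive reading would suggest an effect of X on Z; the interventional quantities are identical, which matches the statement that the correlation arises only because of Y.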
And for that we can also think about partially directed graphs, where we try to represent the whole Markov equivalence class at once.
There we no longer look at Bayesian networks, because here it is not clear in which direction, or how, we factorize our distribution. So this graph is usually not the graph of a Bayesian network, because for a Bayesian network you need directions. But if we think about the graph in isolation, we can think of it as the representation of a Markov equivalence class that we can derive from the correlations we find in our data.
And these are basically the starting point for discovering causal structures from data, which we will learn about in the next lectures, I think. Yes?
>> The examples we have for equivalence classes: on a slide after that, directed graphs are equivalent if they share the same structures?
>> Yeah.
>> But for the examples, they have the same skeleton because they are, I don't know how to describe it, in the same order, like X, Y, and Z?
>> They have the same undirected edges.
So for the skeleton you just remove the directions.
So the collider also has the same skeleton as the other ones. But the collider is a v-structure, whereas the other examples have the same skeleton and are not v-structures. So the collider is in a different equivalence class. And if we look at this example, I mean, that's just one structure, but we can try to come up with a different one. So does anyone have an idea how we find an equivalent model to that, an equivalent graph?
>> You can change the arrows to any other equivalent direction. Maybe instead of C to D you can do D to C, and that's an equivalent structure.
>> That's an interesting idea, but this will give us some problem, because...
>> Yes, we would introduce another v-structure.
>> Yeah. Or else the skeleton changes. So I don't think there is a different one.
>> Do you think we can delete the edge between C and D and introduce another one between B and E?
>> That's a good point.
But again, this is a different skeleton.
>> So for finding Markov-equivalent models, I think we have no chance here. The skeleton has to be the same, so we can only change the direction of edges, and changing the direction of any edge would introduce or remove a v-structure. So we have no chance here. But we can think about other I-maps. If we want to add edges, we can do so, and the result is always still an I-map, but one with fewer independencies, so a worse independence map in some sense. But for equivalent ones, I think it will not work here. Yes?
>> Yes.
Should be good.
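The criterion used in the discussion above (same skeleton and same v-structures) can be checked mechanically. The following is a minimal sketch, assuming DAGs are given as lists of `(parent, child)` pairs over the same node set; the function names are illustrative and not taken from any library.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG: keep the adjacencies, drop the directions."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """All colliders a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Two DAGs are Markov equivalent iff skeleton and v-structures coincide."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# The four three-node structures over X, Y, Z from the lecture.
chain_right = [("X", "Y"), ("Y", "Z")]
chain_left = [("Z", "Y"), ("Y", "X")]
fork = [("Y", "X"), ("Y", "Z")]
collider = [("X", "Y"), ("Z", "Y")]

print(markov_equivalent(chain_right, chain_left))  # same class
print(markov_equivalent(chain_right, fork))        # same class
print(markov_equivalent(chain_right, collider))    # different class
```

All four graphs share the skeleton X - Y - Z; only the collider has a v-structure, which is why it sits alone in its equivalence class.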
>> So if we have this example, and for those online, I will try to write it on the blackboard and then show that the undirected edges can be flipped in some parts. We can think of this one as fixed.
And now, which possibilities do we have for these edges?
>> Yeah.
>> All three: you can have a chain, you can have a... yeah. It's the same as that.
>> Yeah. So we in fact have all three of these possibilities. Basically, the only combination of directions we cannot have is another collider at Y. But, for example, if we change this one, you can check whether this is even a proper partially directed graph, in the sense that it is specific enough to define an equivalence class. So, for example, take this one, and assume you know about these edges. This is another task that will come up when we think about directing edges: maybe we have some expert knowledge from which we know that there has to be some direct link. So we can direct this edge, and then we can ask whether this allows us to direct even more edges in our undirected graph. So could we direct more edges here? Yes?
>> The only thing that you can't do is this way, because then it would be cyclic, and everything else that you do would introduce one v-structure, which is allowed, which is fine.
>> Mhm. But even the v-structures we probably would have seen in the data. So this by itself might not be observable, because if we observe this, then we would also observe either this one, and I think this edge could still be undirected, or we introduce the v-structure. So the other case would be this. Okay, this would be the second case.
So we have one case where the v-structure is here, which we could find in our data, or we do not find this v-structure but that v-structure in our data, and then we are still left with one undirected edge.
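The blackboard enumeration can be sketched in code. The graph on the board is not recoverable from the recording, so the example graphs below are hypothetical. The sketch tries every orientation of the undirected edges of a partially directed graph and keeps those that stay acyclic and create no v-structure beyond the ones already fixed by the directed part; all function names are illustrative.

```python
from itertools import combinations, product


def v_structures(directed, skel):
    """Colliders a -> c <- b whose endpoints a, b are not adjacent in skel."""
    parents = {}
    for parent, child in directed:
        parents.setdefault(child, set()).add(parent)
    return {
        (a, child, b)
        for child, pars in parents.items()
        for a, b in combinations(sorted(pars), 2)
        if frozenset((a, b)) not in skel
    }


def is_acyclic(directed):
    """Repeatedly strip nodes without incoming edges; a leftover means a cycle."""
    nodes = {node for edge in directed for node in edge}
    edges = set(directed)
    while nodes:
        sources = {n for n in nodes if not any(child == n for _, child in edges)}
        if not sources:
            return False
        nodes -= sources
        edges = {(a, b) for a, b in edges if a not in sources}
    return True


def consistent_dags(directed, undirected):
    """All DAGs obtained by orienting the undirected edges without changing
    the set of v-structures and without creating a directed cycle."""
    skel = {frozenset(e) for e in directed} | {frozenset(e) for e in undirected}
    required = v_structures(directed, skel)
    result = []
    for flips in product((False, True), repeat=len(undirected)):
        oriented = [(b, a) if flip else (a, b)
                    for (a, b), flip in zip(undirected, flips)]
        dag = list(directed) + oriented
        if is_acyclic(dag) and v_structures(dag, skel) == required:
            result.append(dag)
    return result


# Fully undirected triple X - Y - Z: two chains and the fork remain.
triple = consistent_dags([], [("X", "Y"), ("Y", "Z")])

# Hypothetical expert knowledge A -> B with B - C still open (A, C not adjacent):
# C -> B would create a new collider at B, so B -> C is forced.
forced = consistent_dags([("A", "B")], [("B", "C")])

print(len(triple), forced)
```

Brute-force enumeration is exponential in the number of undirected edges, so this is only meant for small teaching examples like the ones on the board.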
>> That's isomorphic to what you have up there, isn't it? You are just observing different variables.
>> Yes, of course, but usually it matters which variable is where. Okay. So actually that's it for today's lecture, but I'm happy to answer more questions or go through more examples if you have some questions, also from online.
>> Nothing online.
>> Okay.
So, any questions on any topic from today?
>> No.
>> Okay. So maybe I just want to add one more thing. We've now written "causal effects" and "causes" here, but we're not talking about causality yet. In the literature, because of the assumptions and how we typically work in practice, that's what we find here. But in causality we need to talk about interventions, about actually changing variables, which is not happening here right now. But these are all the structures that we're using, and we are implicitly assuming this, at least for today, because that's typically how we work: we expect these variables to be something that we can change, or at least reason about changing. Just to differentiate: this is purely statistics, stochastics basics, and graph theory, but it's the basis for what follows.
>> Yeah, that's the motivation I also talked about: if you think about how to exploit some correlation, that's all about intervention. And there is this formalism from Pearl for defining mathematically and properly how to do an intervention, what it means, and what you can derive from it. We will also look at that in the next weeks. So thank you for your attention for today.
And again, a reminder: if you want to refresh more on the statistics part, you can go to the PGM exercise session. Otherwise, we'll see each other next week. Thank you.