Giorgini elegantly bridges generative AI and classical physics, providing a robust way to model high-dimensional systems without relying on predefined assumptions. This work represents a sophisticated leap from simple data fitting to capturing the true stochastic essence of complex dynamics.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
"Learning Dynamics from Statistics: a score-based approach" by Ludovico GiorginiAdded:
for inviting me at your group meeting. Um So, in today's presentation, uh I will talk about building mathematical models from high high-dimensional partially observed dynamical systems.
So, I think this is a central topic in many scientific fields ranging from geophysical fluid dynamics, which is the field that is the closest uh to my research, but also other physical systems like molecular systems or neuroscience. So, so every times we have access to data, so we have observations of the underlying system, and we want to build a mathematical model that is able to reproduce and capture the main causality relationships between the variable of the system, and also hopefully to perform predictions and that uncertainty quantification.
So, of course for this specific class of system that are extremely high-dimensional and also and also multiscale and partially observed, uh it is quite meaningless to try to perform trajectory shadowing. So, it's quite meaningless to try to build a model that is able to precisely predict the trajectory of the system.
Instead, what we are usually So, what our goal becomes usually is to build a mathematical model that is able to reproduce some key statistical and dynamical observable of the the data set that we are observing.
Those observables can be, for example, some moments of the steady state distribution, the whole steady state distribution, and also some uh state or temporal correlations between the observed variables of the system.
So, this is the goal of this talk. So, develop a mathematical model that is able uh uh to reproduce this target statistical and dynamical observables.
So, because of this timescale separation of the data, so the fact that the data can be multiscale, I will use a stochastic model where we have a first component, the drift term f of x, which uh will model the deterministic and slow variable component of my data set.
And I And I also have a stochastic component that takes into account the faster dynamics, the unresolved fast fluctuations of the data set.
So, this will be the model that I will that I want to construct from the the observations with the final goal of being able to reproduce in an efficient way this target statistical and dynamical observables.
Okay, so right now I have been extremely general. So, I have talked about a general model to solve this very general problem.
In the next two slides, I want to give a bit more details about the first what are the requirements that we want from our model, and then also what are the assumptions that we're doing on the observed data set.
So, first let's see what are the requirement that we want from our modeling strategy.
So, as I said, we would like to model real data that can be extremely high-dimensional and also maybe also unevenly sampled or have a very low sampling frequency. So, these are all features of our real data set that we have to take into account when we are building a mathematical model for it.
So, this essentially implies first that since we want our modeling strategy to scale very well with the dimension, and since also the model that we want to develop can be extremely high-dimensional and computationally expensive to integrate, we would like to be able to reconstruct the mathematical model from data using as few model integrations as possible.
So, a naive approach to build a mathematical model can be to start with a model ansatz, integrate it forward for a given time, then calculate all the statistical and dynamical observables, compare them with the targets, evaluate a loss function, and use that to update the model. But, this will require many model integrations that can become extremely computationally expensive. So, our modeling strategy So, one of the main goal of our modeling strategy would be to use as few model integrations as possible.
Another requirement that we would that we want our modeling strategy to have is to avoid the state space clustering.
So, we would like to avoid clustering the state space to estimate the average velocity field because essentially So, clustering the state space, even if we are using an extremely efficient clustering algorithm like, I don't know, bisecting k-means clustering algorithm, suffer from the curse of dimensionality.
So, if we are considering a data set or a system with dimension larger than a few dozens, clustering becomes extremely computationally expensive, so we would like to avoid clustering.
And also we would like to use a finite difference estimation to estimate the velocity from trajectories. So, we are assuming that our realistic data set can be can have a low frequen- a low sampling frequency, which essentially means that we cannot trust the data set to recover the velocity of our dynamics using finite difference.
So, these are the three main um constraints that we want our modeling strategy to have in order to be scalable to high-dimensional system and also being used to realistic settings.
Then, let's see what are instead the constraints that we want to impose on our system.
So, on which kind of systems this So, the model the modeling strategy that I will propose in this talk can be applied.
So, first of all, I will assume that the system I'm studying has statistical stationarity. So, essentially that we can define a steady state distribution.
So, of course this can be generalized also to cyclostationary data by augmenting the state space or, for example, to data set that show a low trends that can be detrended and then we can get a stationary um data set.
I will also assume that the system has um so, is ergodic and mixing, which essentially means that time averages along long trajectory will converge to ensemble averages, and also that the correlation function decays sufficiently fast.
And then I'm also assuming that we have an effective timescale separation. So, this essentially allow us to model the data set using the Langevin equation that I showed you before. So, essentially treat the fast timescale as a noise process, and then build a drift term for the slow timescales.
I will also assume to have enough observables of the slow timescales, and so in such a way I can get a full recovery of the slow variables.
Even if we don't have a complete observables for the slow variables, we can still augment the state space using some delay embedded version of the data set.
And then finally, even if uh we will not have probably some very fine resolution of the data sample, we still assume to have enough resolution to be able to define correlation functions. So, we don't have enough resolution to evaluate the velocity at every time point by using finite difference, but still we have enough data essentially to define a correlation function.
So, these are the goals that I would like my modeling strategy to achieve, and also those are the main assumptions that I'm doing on the system that I'm studying.
So, in this talk, I will present two different directions. So, two different modeling strategies that I'm currently pursuing.
The first one So, in the first one, I'm assuming to have a model ansatz. So, essentially I have a knowledge of the functional form of my model, so a knowledge that be derived directly from physics.
And this model answers depends on a set of parameters that I would like to determine.
I have an initial guess for those parameters, and I would like to calibrate those model parameters in order to reproduce a set of statistical observables that I derive from my observations.
The second direction instead is a bit more ambitious.
I assume not to know any model answers. So, I don't start from model answers assumption, and I want to derive from my data the whole functional form of the model.
In in this case, what I'm interested to do precisely is to start from the set of statistical and dynamical target observables, and I would like to infer from them the most general class of dynamical systems that by construction reproduces those dynamical observables.
And I would like to do it. So, this is the most ambitious part of the of the method because I would like to do it without ever integrating my model forward. So, just from the knowledge of the target statistical and dynamical observable, I would like to be able to infer the model without ever integrating my model forward in time.
So, these are the two directions that I will present today.
Let's um for both directions, the key element that allow essentially those two directions is the score function. So, essentially, I will show how from the knowledge of the score function, we are able to build those two modeling strategies. So, the score function is defined as the gradient of the logarithm of the steady state distribution.
Um Um so, the score function is one of the central So, it's a central quantity in machine learning specifically in score-based generative modeling because essentially it's a quantity that allow uh the generation of new data sample according to a specific probability density without the knowledge of the normalization constant that that otherwise someone needed to to evaluate in order to to define a normalized probability density function.
So, essentially, it's a way to sample new data sample according to a specific probability density without the knowledge of the steady state distribution, which can be extremely challenging to be constructed from high dimensional systems.
In the In this talk, I will So, there are of course many different methodology to estimate the score function from data. In this talk, I will use the denoising score matching method, which essentially consists in taking the data set that I would like to use to estimate the score function, I perturb it by adding a tiny Gaussian white noise with amplitude sigma, and then I will use the the denoising score matching identity, according to which I can write the score function of this perturbed probability density in fun So, in in terms of the expected value of zeta conditioned on the perturbed value x sigma. So, given So, I am at the end what I'm doing is to train a neural network to predict the value of the noise zeta given an input the perturbed value of the data set point x sigma.
Um so, if sigma So, by considering sigma So, the perturbation amplitude very small, I'm essentially estimating the score function of a perturbed density, which is extremely close to the observed one. And so, the score function that I derive using this method it will be very close to the correct score function.
The advantage of this algo of this algorithm is that it scales extremely well with the dimension because essentially we are recasting the score estimation problem, which will otherwise imply the differentiation of the logarithm of the steady state density, with a regression problem.
And this regression problem scales very well with with the dimension. This essentially means that we can estimate the score function quite efficiently also for very high dimensional systems. And then we'll use the knowledge of the score function to infer the full mathematical model that explain that is able to reproduce this set of target observables that I mentioned at the beginning.
So, let's start from the first modeling strategy.
So, as I said, I'm assuming to have a model answers, so to have an answers for the functional form of my Langevin equation with multiplicative noise.
And this model answers depends on a set of parameters alpha and beta.
We call alpha all the parameters inside the drift term, and beta all the parameters inside the diffusion term.
So, the problem consists now in finding the values for alpha and beta that reproduces the target statistical observables.
So, learning So, in in order to solve this calibration problem, what we need is the parameter sensitivity. So, if we call phi m the set of observables that we want our model to reproduce, what we need to know is how those observable phi of m will change by changing by a tiny amount each of the model parameters.
So, if we know these these quantities, we can essentially update the parameters accordingly in order to minimize the distance between the target observables and the observables predicted by the model.
We can estimate the statistical Jacobians using a naive approach by just integrating the model forward many times. Every time we perturb just one parameter by a tiny amount, and then using finite difference, we estimate this statistical Jacobians.
This method of course works, but it has the problem that it requires an extremely large number of model integrations. So, we need at least one model integration for every parameter. And if the model is very large, it becomes extremely computationally expensive to integrate, in particular, if we have many parameters that we want to calibrate.
So, the the key idea here is to recast this calibration problem to a perturbation problem. So, if we imagine to Taylor expand at first order the drift and and the diffusion coefficients after we perturb by a tiny amount each of those parameters, so essentially, what we obtain is a perturbed version of this Langevin equation, where we have an additional term in the drift and in the diffusion coefficients, which is given by the perturbation amplitude of the parameter multiplied by the partial derivatives of the drift and the diffusion term with respect to that specific parameter. So, essentially, it's like if we are studying a perturbed version of our model, and we would like to predict how all those statistical observables will change after we add this perturbation.
Okay. So, now we have this perturbation problem, which can be addressed using a very powerful tool from statistical physics, which is the generalized fluctuation-dissipation theorem or GFT.
So, GFT is a mathematical machinery that allows to predict how a dynamical system respond to a perturbation without actually perturbing it, just from the knowledge of its statistics.
So, without entering into the mathematical details of this problem, we can essentially write the response of an observable A to a perturbation given by uh U of so a drift perturbation given by U of X and a diffusion perturbation given by V of X.
So, those are nothing but these perturbations that we defined before.
And we can write the response of this observable A in terms of an integral of the correlation between the observable itself and this conjugate variable B, which depends on the perturbation that we are applying. So, U of X and V of X.
And importantly on the score function S.
So, essentially if we have the knowledge of the score function since we know the analytic expression for our model. And also we know the analytic expression for the perturbation that we are applying U and V.
We can predict how our system, so in this case how the observable A will respond to an external perturbation.
So, if we identify the observable A with the set of observables that we want our model to reproduce.
We can essentially construct using the generalized fluctuation-dissipation theorem the parameter sensitivities that we are interested in.
Just from the knowledge of the score function itself.
So, this is the main connection. So, using GFDT we are able to estimate the parameter sensitivities without the running the model forward for every parameter.
We just needed to run the model forward one time to estimate the score function.
And from this single model integration we are able to estimate all the parameter sensitivities that we can use for calibration.
Okay, so as we have seen this entirely so this so the applicability of the generalized fluctuation-dissipation theorem to this specific problem relies only on the estimation of the score function.
So, at this point you may ask okay, but why nobody like followed this direction?
And the reason is that estimating the score function for high-dimensional system has always been the main bottleneck to use this generalized fluctuation-dissipation theorem inside realistic and practical problems.
So, usually people in the literature have always struggled in estimating the score function. And so, it was possible for low-dimensional systems. Instead for high-dimensional systems, what has been usually done in the literature was to use the so-called quasi-Gaussian approximations, which consists in approximating the steady state distribution with a multivariate Gaussian, which allows to write the score function in terms of a linear function that depends on the covariance matrix of the data set, which of course is extremely easy to be estimated.
But the main problem is that when the dynamics is highly nonlinear, so when the steady state distribution also is highly non-Gaussian, then this approach introduces some strong biases, which doesn't allow to get quite precise prediction of the system responses using GFDT.
But we we have seen before how yeah, recent advances in score estimation methods using a neural network allow us to get a very precise and efficient estimation for the score function also for very high-dimensional systems.
And so, this knowledge allow us to construct and to estimate the system responses using the generalized fluctuation-dissipation theorem.
So, we applied those ideas already to evaluate and to predict system responses for quite high-dimensional systems. We started PDEs discretized on around 10 at the third grid points.
And we considered more specifically two-dimensional turbulent data and Alan-Khan reaction-diffusion data. So, these are the two papers where we published this connection between the score-based generative modeling and the generalized fluctuation-dissipation theorem. And so, now the idea is to use this mathematical machinery to evaluate the parameter sensitivities.
And so, to do that I so, I described how we can do it. So, from the knowledge of the response function we can estimate the parameter sensitivities. Now, let's see how to do that in practice.
More specifically, let's consider two examples now. So, the first one is a very low-dimensional model. We have a three-dimensional SDE with a multiplicative noise. So, this model is used in geophysical fluid dynamics to describe El NiƱo-Southern Oscillation, which is a interannual so, it's a annual variability phenomenon of the sea surface temperature in the tropical Pacific. We have two slow variables, one fast variable, which are coupled.
This model depends on six coefficients.
And so, what we are going to do is to start so, is to first run this model with the correct values of these coefficients to have an obser so, to build our observations.
Then we will try to recover the correct values of those coefficients and starting from an initial guess obtained by perturbing by around 20% each of those coefficients and running our calibration algorithm. So, using GFDT to estimate the parameter sensitivities and use this knowledge inside a Newton algorithm to estimate the correct values of the coefficient.
Those are the six observables that we would like to recover. So, we start with a parameter guess.
Um we use this parameter guess to integrate the model forward.
We use this data set to estimate the parameter sensitivities. And then we are updating the parameters using the knowledge of the parameter sensitivities.
And we are iterating this procedure until the statistics of our predicted system will converge to the target statistical observables.
We are doing this procedure using three different methods to evaluate the parameter sensitivities. We are first using finite difference. So, we are just integrating many time the trajectory forward one for each parameter. And then we are using finite difference to estimate the parameter Jacobian.
And we are using that information to update the parameter value.
Then we are using the generalized fluctuation-dissipation theorem using two different ways to estimate the score function.
We first used the quasi-Gaussian approximation. So, we just wrote the score function in terms of a linear function that depends on the covariance matrix of the data. And then we are using the denoising score matching approach to estimate the score function.
And we will be comparing those three approaches.
These are the results.
So, here on the left you can see the L2 norm between predicted versus target statistical observables as a function of the algorithm iteration. So, in this case we introduced the breaking point when the L2 norm was falling below 10 at the minus three.
And as you can see the blue curve and the gray and the gray curve, which represent the calibration algorithm using the denoising score matching score function plus GFDT and the naive and and the naive finite difference estimation for the parameter Jacobians in just five iteration we are falling below the threshold.
Which essentially means in five iteration we were able, as you can see here in this panel showing the parameter deviation, to precisely recover the correct parameters of the model.
And instead using the Gaussian approximation for the score, so like a more so, less precise estimation of the score function was very difficult to have the algorithm con converged to the correct value.
But so, here what we can see that using the generalized fluctuation-dissipation theorem plus the denoising score matching to estimate this score function, we were able to have very similar performances with respect to the naive finite difference method at a fraction of the computational cost because we So, for every iteration, we needed to integrate the system forward only one time instead of six time So, the number of the parameters that we want to calibrate or like in in this case 12 because we use the center difference for parameter for the uh um parameter Jacobian estimation. So, essentially here we have an algorithm that's doesn't scale So, doesn't So, the for which the computational time doesn't scale linearly with the number of parameters, but is constant since we only need to run the model forward one one single time for every iteration.
Okay. So, in this case we considered a quite a low dimensional system.
Next, I will consider this coupled Lorenz '96 system, which is around a 400 dimensional system. We have 36 slow mode and we have a for each slow mode we have 10 fast modes. Plus, we also have some white noise in each of um those variables.
And what we want to do now is to do something different. So, we would like to build a stochastic closure for the X So, the slow variables. So, essentially we would like to build a 36 dimensional model instead of this model here that is around a 400 dimensional which is able to precisely recover the target statistical observables evaluated from the high dimensional model. So, we have observations for X which have been generated integrating this very high dimensional system and we would like to build this reduced order model which is only 36 dimensional with the correct values of alphas of the alpha coefficients and the sigma coefficients such that they reproduce this set of target statistical observables which are the mean uh the variance, skewness, excess kurtosis, and uh covariance C1.
We have in total five parameters that we want to calibrate >> [gasps] >> uh on the on these five different statistical observables.
And so, we used also in this case these three different methods.
We have in orange GFDT plus Gaussian uh estimation of the score function. And then in gray and in blue, we have the finite difference method and then the GFDT plus the noise score matching for the score estimation. So, essentially in in this case we can see like yeah, clear advantage in using the the noise score matching to build the score function. And from the knowledge of the score function, also in this case which is quite high dimensional, we can observe how we get quite similar performance than of using finite difference at a fraction of the computational cost. And so, essentially at at a fraction of the number of time that we have to integrate our model forward.
Okay. So, this is the first direction. So, start from a model answers and use this combination between the generalized fluctuation dissipation theorem from non-equilibrium statistical physics with the noise score matching from generative modeling to to to estimate uh the parameter sensitivities with a very limited number of model integrations.
Now, let's see the second direction.
So, in this case we don't have any model answers for the functional form of our mathematical model.
We have a set of statistical and dynamical observables that we want our model to reproduce, which in this case are the full steady state distribution and a set of correlation functions where phi m and phi n are So, it's a set of observables of the the state variable of the system.
So, given these constraints, we would like to build a mathematical model that by construction reproduces those constraints without integrating our model forward.
And again, we will use the score function to do that.
Specifically, we will use two different score functions in this case. We have the plain score function that we've seen before and also we will use the conditional score function. This essentially is the gradient of with respect to X0 of the logarithm of the conditional probability density function. So, the probability density function of X at time T conditioned on X0.
The conditional score function can be constructed using the noise score matching precisely as we did it for the plain score function. In fact, we can write the conditional score function in terms of the joint score function and the plain score function.
For the joint score function, we just take our data set. We use a delay embedding in order to build a time series of X0 and XT.
We do it for different value of the time delay and in this way we estimate the score function the joint score function.
And then we can combine it with us and using the same the noise score matching machinery we have seen before, we can estimate both the conditional score and the score function from data.
And as we have seen before, both these algorithms scale quite well with the dimension of the system.
So, the idea here is then to use those two quantities where the first quantity essentially takes into account the geometry of the steady state distribution.
Instead, the second quantity essentially takes into account how the system relaxes towards the steady state distribution. So, it's carrying information also about the the dynamics of the system and not only about the statistics.
So, this is the intuition. So, try to use those two quantities that can be evaluated quite well also for very high dimensional systems to build our stochastic modeling approach.
So, let's start from our Langevin equation. So, this is the same Langevin equation I wrote at the beginning. Yeah, just have here a factor square root of two.
And then let's first impose stationarity. So, we want for a given sigma X to find our drift term F such that by construction reproduces the steady state distribution.
And to do that, we can write the Fokker-Planck equation relative to the Langevin [clears throat] equation, impose the stationarity and we can show that without losing any generality we can write the drift term in this way. So, in terms of the score function that we defined before and the diffusion matrix. So, this symmetric matrix D of X and another anti-symmetric matrix R of X.
So, this is a very general expression. We're not doing any approximation here. We are We are just finding the most general way to express the drift for a given the diffusion in such a way that it reproduces the steady state distribution by construction, which essentially means in such a way that F So, this specific shape of F solve the stationary Fokker-Planck equation.
Now, we can So, we can see that we have two different tensors D of X and R of X.
D of X is symmetric and represent the diffusion tensor.
Instead, R which is the anti-symmetric part can be interpreted as the term that breaks the tail balance and that introduces some rotational component to our system without changing the steady state distribution. So, this can be related to an Helmholtz decomposition of the drift term. So, we have a a term a symmetric term which essentially satisfies the detail balance and give us a system which is just a Brownian motion inside a potential. And then we have this other circulatory term which introduces some rotational component that breaks detailed balance.
Okay, so now by using this expression here for the drift term, we are guaranteed to recover the steady state distribution. So we achieved the first goal of our modeling strategy which is to build a stochastic model that by construction reproduces the observed the steady state distribution of the data set. Now let's try to impose also the second constraint which is we want to reproduce also the the time correlations.
So we would like to reproduce this time correlations for a set of observables phi n.
So without going into the mathematical details of this derivation, we can show that the time derivative of this correlation function for this specific model, so for this specific Langevin equation with drift term given by this expression over here, can be written in this way.
So essentially we can relate the time derivative of this correlation function with this expression here which contains the two phi, so the two observable phi m and phi n, the conditional score function, and the matrix, so the tensor m.
And the tensor m is the only term here that we don't know because we can estimate this quantity here from data.
We just evaluate the correlation function and then we estimate the derivative.
We can estimate the conditional score.
We know the analytical expression for both phi m and phi n because this is the libraries of observable that we are considering.
The only term that we don't know is this matrix m x of 0. So now let's see how we can derive this matrix m x of 0.
So first let's do this decomposition. So let's decompose m x in terms of a constant term plus a fluctuation.
This fluctuation is so the average value over the stationary density of this fluctuation must be equal to 0. So essentially here we have a yeah, a constant term plus zero mean fluctuation term delta m. So let's use this expression here for m inside the this equation over there and we can then rewrite c dot in terms of two terms. This first term which depends on phi does not depend on the conditional score, depends only on the stationary score.
Okay, so we have the first term here which is much easier to evaluate because yeah, we don't need to estimate the conditional score and it only depends on this constant matrix phi and this plain score itself minus this additional so this additional term which is nothing but this one over here written in terms of delta m instead of m.
So the key idea here is that if we have a library of observable which is rich enough such such that m of x is uniquely determined, then we can estimate m m of x, so this tensor m of x which is the missing element for so in our stochastic model by essentially using this relationship over here.
So using this relationship here and a library of observable which is rich enough, we can estimate both phi and delta m.
So let's see now how we can estimate phi first.
So to do that, let's consider just the coordinate observable. Okay, so we have here in theory like a very large libraries of observable. Now let's focus on a few of them and few of them of x equal to x. So we're just considering the observable coordinate.
By doing this replacement, the first term becomes this expected value multiplied by phi.
Now if we consider t equal to 0, then this expected value becomes minus the identity because of the Stein identity.
This term here becomes equal to 0 because we have so you can just integrate the conditional score term at t equal to 0 and you will get 0.
So this essentially means that if if you consider the coordinate observable and t equal to 0, we are able to derive a relationship for phi.
So we can essentially fix the average value of the matrix m and we can write it in terms of the coordinate the time derivative of the coordinate correlation at t equal to 0.
Now let's see what this implies. So we have then fixed the phi. We have this additional term, this correction term e of x.
When we consider the phi phi phi phi 1 of x equal to x, we then have this coordinate term at the beginning.
We can rewrite this expected value in terms of the gradient with respect to x 0 of the expected value of x of t conditioned on x 0 multiplied by delta m.
So by just considering the coordinate observable case, we can derive phi and then we can write this relationship for the correction term. But at this point we can notice that if m of x, so if the expected value of x of t conditioned on x 0 is approximately affine which essentially means if we can write the expected value of x t conditioned on x 0 in terms of a linear function of x 0, then when we take the gradient, we will get a constant term with respect to x and then by construction the average value of x of delta m is equal to 0 which essentially means that if the conditional mean is approximately affine which essentially is is is the case if the joint probability density function of x x t is a Gaussian, we can then use so we can then replace our matrix m of x which is state dependent with just the matrix phi and we have a model that by construction reproduces both the temporal correlations and the steady state distribution.
Okay, so if this term so if m of t is linear in x 0 which is often the case because if the conditional probability density so if the joint probability density of x 0 and and x t can be approximated with a Gaussian distribution, then m of t depends linearly on x 0.
So if this term is negligible, then we can reproduce the time correlations of the observed data just using phi, so this constant matrix phi that we can easily determine from the correlation function instead of the state dependent matrix m.
And then we have built a Langevin equation that by construction reproduces both the steady state distribution and the time correlations.
If we want instead to add more constraint on the correlations, so we want to add more constraints on the dynamics adding more correlations, then we have to obtain also the matrix delta m that we can parameterize with a neural network.
So in this specific case, we parameterize the whole m of x with a neural network and then we define delta m of x as m theta minus phi and then we can train a neural network delta m of theta to minimize this loss function. So we have this first term that that essentially forces the neural network to learn the set of correlation functions that we want our system to reproduce. Then we have this penalty term that essentially enforces that the average value of delta m is equal to 0 plus we have a regularization term.
But you can see here that we are so we are writing a loss function that doesn't depend on a forward model integration. So we never have to integrate our Langevin equation forward in time.
We just use the the knowledge of the conditional score, the score function, and the time derivative of the correlation functions to train the neural network for delta M.
And this can be can become extremely efficient when the model that we want to integrate becomes very computationally expensive.
Okay, so this is the methodology.
So we have seen that's yeah, we we are able to train this neural network without integrating the model forward and when so for some specific cases we can simplify the shape so the functional form of M by replacing them with a constant if we are just interested in the time correlation of the systems. So now I will conclude showing you some application of these ideas.
So I start from an analytic warm up. So we consider this one-dimensional system for which we can determine analytically all the relevant quantities.
So we can derive the station the conditional score, the stationary score, the time derivative of the correlation functions and so on.
These are the true values for fee and delta M.
And then by applying the method I >> [clears throat] >> discussed before we can obtain them using the relationship that I showed you at the beginning. So using that relationship we recover precisely the fee the correct fee and the correct delta M.
So this was just like a test where we have we know everything is analytically.
So let's see a different case. In this case we have a two-dimensional system with where we have our drift term which contains both a term that can be written as the gradient of a potential plus a circulatory component. We also have a multiplicative noise.
And in this case we cannot write explicitly the score function and the conditional score. So we need to train two neural networks for S and for the conditional score.
We apply the methodology that I described before by enforcing the reproduction of the correlation functions. We derive a quite accurate reconstruction of the mobility fields so the M tensor.
We have some errors in particular in this term but even if so we have like some errors for the M to one terms when we integrate our model we get a precise recovery of the univariate PDF, bivariate PDF all the correlation functions. Here I'm comparing two different model integrations. We have the model integration with the full mobility matrix M of X and a model integration where I'm replacing the full mobility matrix with a fee so with this constant closure that I introduced before.
We can see here that using the full mobility matrix obtained by training a neural network for M we get a more precise recovery of the correlation functions in particular for this cross correlation.
And also if I now consider the target dynamical observables so the target correlation functions that I used to train the neural network when I evaluate them from the trajectory that I obtained by integrating my model I get yeah a quite better recovery with respect to the constant M matrix closure.
So essentially this is to show that yeah by applying this algorithm we are able to estimate the mobility matrix M of X together with the score and the conditional score then combining those pieces together we obtain an expression for the drift term that is able to reproduce the steady state density the time correlations together with all the correlations that we enforced in the training.
Okay, so now let's consider more high-dimensional systems.
So for the next two systems I will only consider the constant closure for M. So essentially I approximate M of X with its average value so with fee.
In this case I'm integrating this Kuramoto-Sivashinsky PDE.
I'm integrating this partial differential equation with 512 Fourier modes. I obtain a 1024-dimensional um um time series. I'm considering just one mode every 32. So essentially I'm subsampling this 1024-dimensional state to a 32-dimensional state.
And then using those those 32-dimensional modes to build my Langevin equation. So essentially here I'm building so I'm starting from a fully fully deterministic partial differential equation partial which is partially observed and then using a completely different model to um be so to predict its dynamics using my stochastic closure.
And here are the results. So this is the time series obtained by integrating my Langevin equation. This is the real observed time series and here I'm plotting the comparison between the um the bivariate and the univariate PDFs obtained from the observations and the one obtained from a model integration of my Langevin equation. And here instead is the autocorrelation function for both the observations and my Langevin integration.
Then finally I considered the sea surface temperature data from Plasim so which is a um a global circulation model of intermediate intermediate complexity.
I have the data for the sea surface for the global sea surface temperature evolution and I want a model that is able essentially to predict and to model this sea surface temperature data.
So the data set is around 2000-dimensional.
I did a dimensionality reduction taking the first 20 principal components.
And here since I have a strong periodicity I augmented the state space by including some harmonic functions.
And these are the results.
I yeah was predicting the probability the conditional probability density of the 20 principal components together with their autocovariance. So as you can see we can have a quite decent reconstruction of the PDFs and the ACFs of all the 20 principal components. Here I'm plotting just the first 10.
And also we were able to capture the nonlinear so and the non-Gaussian probability density function evaluated at every grid point um from our simulation. So here essentially I'm plotting the probability density at different season of the temperature at a given grid point and I'm doing that using the full observation so essentially all the principal components just the first 20 principal components and the 20-dimensional stochastic model that I trained on these first 20 components and I integrated forward.
And as you can see so even if I did a dimensionality reduction of the data set I was still able to get this nonlinear probability density functions using this quite simple stochastic model that I built using this constant closure for my mobility matrix.
Okay, so these are some of the papers on that I so either published or put on archive on this topic. We tried the different directions that I haven't presented here.
But so these were so the main references and to conclude so we have seen how to model high-dimensional partially observed chaotic systems um how the knowledge of the score function plays a key role in allowing this modeling this modeling strategies. We have seen two different directions. In the first one we have a model answers and we are just calibrating the model parameters using a combination between the generalized fluctuation distribution theorem and the score modeling.
The other direction instead doesn't have any model answer.
We just try to starting from a set of statistical and dynamical observables to build a model that by construction reproduces all of them without integrating the model forward.
And then we've seen how this approach can scale on different systems from toy models to very high dimensional systems.
Okay, thanks for listening and let me know if you have any question.
Thank you, Ludovico.
Any questions?
I have a question. So, uh So, I'm wondering according to your formulation, is does your method allow you that have allow you to to work on data set that has absolutely no time information?
What do you mean with absolutely no time information? So, like time series uh where every snapshot is completely uncorrelated? Yeah, yeah. In that case, yes. So, so you can do it, but you will be able to build a mathematical model that reproduces the steady state distribution, but not the dynamics because you don't have any information about the dynamics.
So, what you can do in in that case and that will be yeah, much more simple, is to replace here uh M of X just with the identity, right?
So, if you're only caring about the steady state distribution and also you don't have any information to build the so so to estimate the correlation functions, then it essentially means that any So, you cannot infer M of X because M of X is carrying information about the dynamics. So, you can replace M of X with the identity.
Uh so so you So, if I train a model, I only need to train the M, right?
Uh if you train So, if you only want to reproduce the steady state distribution because you don't have information about the dynamics, you just need to train a neural network to learn the score function.
So, M can be just replaced with the identity.
Hm. Okay, because then so any value So, any shape of M of X will uh uh give you the correct steady state distribution.
So, you can just choose M of X equal to the identity.
Also, if you choose M of X equal to the identity, it probably is an optimal choice because uh um you have the fastest convergence towards the steady state density. So, like if you integrate your model, will converge quite fast towards the steady state density. So, if instead M of X is a constant matrix uh with a wide um variety [clears throat] wide amplitude in the eigenvalues, you have essentially that some modes will decay faster than others and so you have to wait like a longer time to see thermalization of the system towards the steady state distribution.
Yeah, this is very interesting because we we previously have a have a paper that targeting exactly on no time information and we we got some difficulty when we move from low dimension like two or three to to thousands of dimension. In thousands of dimension, our method basically uh almost failed and uh so yeah, so I'm wondering if your method can can be helpful. Yeah, so we estimated the score function also for thousand dimensional systems and yeah, like it's not a problem.
But which So, how have you done this? So, did you use the neural network to to estimate [clears throat] this the score function? Yeah, yeah. We we basically first train to get a score function, then we train a dynamic function. But that dynamic function is also a neural network. So, we we do we do not >> But you don't need that because if you just care about the the steady So, a system that reproduces the steady state density, you can just integrate this Langevin equation without any other network. So, is this a cover full solution or just a subset of solution?
No, this is a general solution. So, this expression for f of x is a general solution.
So, it's essentially is So, given this Langevin equation, if you ask what is the most general expression for the drift term in such a way that uh So, it reproduces the observed steady state distribution. So, essentially that solve the stationary Fokker-Planck equation, then this is the most general expression.
Interesting. Yeah. But since So, here the main point is to estimate So, it's to reproduce the dynamics. So, this is the non-trivial part.
If you're just interested in the statistics, then yeah, take M of X equal to the identity and that's it.
Okay. Thank you very much. I I will write an email to you. Uh Yeah.
By the way, when some system have a source and sink and does your method can can cover those situations?
Yeah, so if So, there is no time modulation of them. So, so essentially if you can define a steady state distribution, then yes.
But yeah, I haven't tested them that much. So, Thank you. I showed you the system that Yeah, I tested the algorithm.
Related Videos
Escaping the Fog
LogicLemurGaming
760 viewsā¢2026-06-03
Olympiad Mathematics | Indian | Can You Solve This One?
PhilCoolMath
650 viewsā¢2026-06-03
A Brutal Radical Expression Made Easy! The Shortcut Changes Everything.
tamoshop
112 viewsā¢2026-06-02
V : jee main /advance class 11 mathematics : Binomial Theorem class-1 ( 29 may 2026 )
dcamclassesiitjeemainsadva9953
125 viewsā¢2026-05-29
Is This Pentomino Tileable?
3cycle
241 viewsā¢2026-05-30
This Sudoku Has Many Lines!!
CrackingTheCryptic
2K viewsā¢2026-05-29
Olympiad Mathematics | Indian Can You Solve This One?
PhilCoolMath
268 viewsā¢2026-06-02
Olympiad Mathematics | Indian | Can You Solve This?
PhilCoolMath
669 viewsā¢2026-06-02











