This video brilliantly simplifies a major breakthrough in computer science, making a complex mathematical milestone accessible to a broader audience. It proves that even long-standing algorithmic challenges can be solved with elegant, near-linear efficiency.
深度探索
先修知识
- 暂无数据。
后续步骤
- 暂无数据。
深度探索
What If Roads Could Be Negative?本站添加:
The story of the single source shortest path problem is one of the most fascinating yet underappreciated gems in computer science. The problem itself, finding shortest paths, is very wellnown. But what many people don't know is that technically speaking, it's still unresolved and there have been some pretty remarkable theoretical breakthroughs in the 2020s towards resolving it. Only recently did we start chipping away at these problems.
Problems that many people have heard of.
They've been around since the 50s. We've had like dozens and dozens and dozens of papers, but there were just no fast algorithms. This video is about one of the most significant of these breakthroughs which happened in 2022. In our time here, I want to give you a glimpse into the cutting edge of algorithms research. What does progress look like on one of the oldest, most wellstudied problems in computer science?
Finding shortest paths is a completely ubiquitous task that people have been trying to solve ever since the early days of digital computers. It's routine for software like Google Maps that helps us navigate between cities. And even beyond the obvious applications, there are many other problems that can be represented as a shortest path problem with a network called a graph, which consists of nodes and edges between them, each with a certain number called a weight, where the answer to a question we care about is given by the minimum weight path between two nodes. What's interesting is that some of these problems involve negative edge weights.
Now this makes no sense in the original context of say mapping applications where every road would take a positive amount of time to drive across. But there is a bigger picture.
>> When you first think about shortest paths, people instinctively think about time like how long does it take me to get from here to San Francisco driving.
But the weight on shortest paths could also mean other things. For instance, one useful thing you could do would be to create a graph where the nodes represent different currencies and the edges between them represent the negative logarithm of the exchange rates between them. You can think about what this accomplishes for yourself or you can just take my word for it, but this negative edge from euros to Canadian dollars means Canadian dollars are less valuable than euros. And the shortest path from euros to Canadian dollars would represent the cheapest way to exchange euros to Canadian dollars, which may involve going through some other currencies first. You're one of these finance sharks, so rather than just doing it the normal way, you like take some huge roundabout way to get to euros that is like a little more profitable. This kind of problem is dealt with all the time in financial technology and it's crucial to be able to solve it as efficiently as possible in the real world where exchange rates are constantly fluctuating and the best trading opportunities might only be available for milliseconds at a time.
Now, when there aren't any negative edges, we've known how to find the shortest paths just about as fast as theoretically possible since the 1950s.
The algorithm that accomplishes this is called Dystra's algorithm and it's famous for its simplicity. According to the inventor of the algorithm, Edgar Dystra, it only took him about 20 minutes on a short coffee break to devise it. But the same could not be said about the case where edges can be negative.
We have had algorithms to find shortest paths with negative edges also since around the time that Dystra's algorithm was invented. The most notable and well-known of them being the Bellman Ford algorithm. The problem is the Bellman Ford algorithm is slow. We're still doing an overview right now, so we'll get to how these algorithms actually work in a few minutes. But just to provide context, I'll tell you that if a graph has nodes and m edges, the number of computational steps needed to execute the Bellman Ford algorithm is in the worst case proportional to M * N.
Computer scientists have a special name for this kind of fact. We'd say the running time of the algorithm is big O of M * N. This doesn't tell you exactly how long the algorithm will take to run.
But what it does tell you, and this is what theoretical computer scientists are often focused on, is how the worst case running time scales with the size of the input. Namely, it scales proportionally to m * n. Which means that if you were to increase the number of nodes and edges in the graph by a factor of say 10, the algorithm could take up to 10 * 10 or 100 times as long to run. If we plot the running time in terms of m and n, you can see that as m and n increase, meaning as you go farther out from the origin, there's immediately a very steep increase in the running time. Now, Dystra's algorithm on the other hand has a much faster running time. It's O of M + N log N. Now when you increase the number of nodes and edges by a factor of 10, if you do some algebra, you'll find that the algorithm only takes about 10 times as long to run. Meaning the running time roughly scales just with the size of the input this time. This fact is once again nicely visualized with a plot where this time as we increase the size of the input, the increase to the running time is nearly linear. In fact, when your running time looks like this, that is proportional to the input size, potentially with some extra logarithmic factors, it's called nearlinear time. And this is essentially as fast of a running time as theoretically possible for most algorithmic problems. In the algorithms community, when there's news of a breakthrough, it's often because someone found a nearlinear time algorithm for a problem that we previously only had slower solutions for. Now, as we saw a moment ago, the Bellman Ford algorithm is much much slower than nearlinear time. But for many years, it was the best tool we had to solve the shortest path problem when there are negative edge weights involved. It was around the 1980s that people started finding ways to solve the shortest path problem for negative edges more efficiently than the Bellman Ford algorithm. There were improvements that got the running time down to O of M * N 3/4 log N then O of M * N 1/2 log N. And in the late 2010s and early 2020s, there were even more algorithms that got tantalizingly close to nearlinear time. But it was in 2022 that researchers finally achieved this goal. There were actually two independent groups that got to this result concurrently. Or actually, one of them achieved something slightly different called almost linear time, which has a separate technical definition and is marginally slower in theory than nearlinear time. Although for all intents and purposes, they're basically the same. Anyway, that one, as you'll notice from the title of the paper, actually solves a much more general problem than finding shortest paths. The problem they solved is called maximum and minimum cost flow. Finding shortest paths with negative edges just happens to be formulable as a special case of this. But this video is about the other algorithm by Bernstein Nanukai and Wolf Nilson which we'll call the BNW algorithm. The really interesting thing about this algorithm is that the techniques used for it are much different from the ones being used by others in years prior. The state-of-the-art methods at the time used in algorithms such as the first group I mentioned are extremely sophisticated and involve encoding the graph as a highdimensional convex polyhedrin and exploring it to optimize the value of a certain function. The BNW algorithm on the other hand is essentially just a combination of a bunch of techniques that had existed for a long time but put together in quite a clever way. To learn more, I had the pleasure of sitting down with one of the authors, Professor Aaron Bernstein at New York University.
>> You learn Bellman Ford is MN and then you learn like, oh, there's this M squared n algorithm from the '90s and then there's nothing. Many people who are into graph algorithms at some point thought about this problem at least a little. Probably once or twice in grad school, I spent like a month on it and just got nowhere. And then somehow, I don't exactly remember why, but the three of us decided to try again. We just kind of started from scratch trying very basic things.
>> So how exactly does this algorithm work?
Well, first things first, a key ingredient is that one of the steps involves combining Dystra's algorithm and the Bellman Ford algorithm. So now is a perfect time to warm up by walking through those two. Let's start with Dystra's algorithm.
Suppose that in this graph we wanted to find the shortest path from node A to node E. Initially, we only know the distance from A to itself is just zero.
And for the other nodes, we'll mark a current distance of infinity since we haven't found any paths there yet. We start by exploring all edges going from A to other nodes. This edge to node B can get us there with a distance of two.
Similarly, these edges get us to C and D with distances of 3 and 1, respectively.
That's all the edges from node A. So we can mark it off as complete. The key now is to look at the remaining unvisited nodes and continue at the one with the current shortest distance. In our case, that's node D with a distance of one.
Crucially, by this point, we won't ever find a shorter path to node D. We picked D because the current path was shorter than going anywhere else. So it could only be more expensive to get to D by going somewhere else first and then back to D.
With that knowledge, from here we look at all the edges from D to other nodes.
This edge to B has a weight of four. And since D has a distance of 1 from A, it means there's a path from A to B of distance 5. But we already have a path to B of distance 2. So we'll leave this alone. The edge from D to C, however, has a weight of 1, meaning there's a path to C of distance 2, which is better than the current distance of three. So, we'll update the path. We also find a path to E of distance 4.
Next, we again continue at the unvisited node with the shortest distance from A, which is a tie between B and C this time. They both have a distance of two.
So, we could pick either of these, but we'll look at B. Like before, we now have the shortest path to B because it would be equally or more expensive to go anywhere else first. Looking at the edges coming from B, we find an edge back to D, but we already know the shortest path to D, so we can just ignore this. We also find this other path to E, but it doesn't improve the distance, so we'll leave that alone.
Moving on to node C, we find a better path to E of distance 3. And that's all the edges in the graph. So now we're done. We've found the shortest path from A to E with a total distance of three.
Furthermore, notice that a byproduct of the algorithm was that it ended up finding an entire tree of shortest paths from A to every other node, not just node E. It's for this reason that the name for the problem that Dystra's algorithm solves is the single source shortest path problem or SSSP.
But did you notice the hidden assumption throughout all of this? When we got to know D, we declared that the current path to D was the shortest path. And that was because it was already more expensive to go anywhere else first.
Meaning it could only be more expensive to go somewhere else and then back to D.
But that's only if there are no negative edges. To illustrate the point, if this edge with a weight of two that we ignored previously had a big negative weight like -3, then it would actually be better to go to B first, which would cost 2, and then to D, which would cost -3. changing the total cost from positive 1 to negative 1. So we can't rely on Dyster's algorithm if there are negative edge weights. And that's where the Bellman Ford algorithm comes in. The Bellman Ford algorithm starts out the same way as Dyster's algorithm. You look at all the outgoing edges from the source node A and update the distances to their destinations. This gives us all shortest paths from A of length one.
Since we've just branched out to all the nodes that are one edge away from A, that's it for this iteration because so far we only knew how to reach A. In the next iteration, we know how to get to A, B, C, and D. So, we'll look at all of the edges coming from each of those nodes, which happens to be all of the edges in the graph this time, and update the paths for all edges that give an improved cost. This gives us a better path to C of cost 2, a path to D of cost -1, and a path to E of cost 4. Since the previous iteration gave us all shortest paths with one edge, we now have all shortest paths with at most two edges, because we just looked at every edge in the graph, essentially trying to extend the previous shortest paths by one more edge. In the next iteration, we repeat the process looking at every single edge where we find a new path to C of cost zero and a new path to E of cost 2. For the same reason, this gives us all shortest paths with at most three edges.
And finally, we do one more iteration where we find a new path to E of cost one. And now we have all the shortest paths with at most four edges. And I say that was the final iteration because these are actually guaranteed to be the overall shortest paths. This is because there are only five total nodes in the graph. So any path of length more than four would have to revisit one of the nodes which could only make the path cost more unless that is there's a negative cost cycle meaning a loop of edges where you can keep getting more and more negative costs by just taking the loop over and over again. If this is the case though, it kind of makes the concept of a shortest path meaningless because you'll get costs of negative infinity. But the good news is that we can detect whether there's a negative cost cycle by just running one more iteration of Bellman Ford. If any shorter paths are found in this final iteration, we know there must be a negative cost cycle because otherwise all of the shortest paths would have already been found on the previous iteration.
So that's the Bellman Ford algorithm and this works but as I mentioned before it's pretty slow. Looking at the pseudo code here you have to do one iteration for each node in the graph and for each iteration you could be checking as many as every single edge. So with nodes and m edges the number of operations performed grows proportionally to m * n.
That's how you get the o of m * n running time I referenced before.
Dystra's algorithm, on the other hand, is much faster since without any negative edges, it can be smarter about when it visits a node. It only visits a new node when it knows it's already found the shortest path there. This makes it so that the algorithm only has to check each node and each edge once.
So, the running time very nearly scales with the size of the graph. It's very close to O of M + N, which would be linear time. Now the true running time is not quite this fast since besides exploring the nodes and edges, there is also that step in each iteration of picking the minimum distance node found so far to explore next. So most implementations end up with an O of M plus N login running time which is usually achieved using a data structure called a Fibonacci heap to keep track of the distances. But as we saw before, this is still much faster than O of M * N. So, how does the BNW algorithm bridge the gap between these two results? That is, how does it achieve a nearlinear running time while still working for negative edges?
Well, there's a specific concept it uses repeatedly to accomplish this. To illustrate it, let's take a look at this simple graph. Suppose we wanted to find the shortest path from A to B. We can see pretty easily that it goes through C, achieving a total cost of 1. However, there's a negative edge weight, which means Dystra's algorithm won't work. It would instead find this path from A to B with a cost of two. Since the edge from A to B is the best local choice, this ends up happening because the algorithm has no way of knowing that taking the more expensive edge to C is going to end up better in the long run. But consider this. What if some of that cost of going to node C is delayed until after you leave? Meaning the incoming edge to node C costs say three less for a total of one. But the outgoing edge costs three more for a total of zero. The cost through node C is clearly still the same since these two updates offset each other. Meaning the change does not alter the shortest path. But we've managed to turn the original graph which had a negative edge into a graph with no negative edges. Meaning we can now just use Dystra's algorithm on the new graph to find the shortest paths.
This trick of shifting the cost from ingoing to outgoing edges is called a price function. In this example, we'd say the price of node C is three since that was the amount by which we changed edges to and from C to create the modified graph. In fact, more generally, you can assign a different price to every node of a graph, and all shortest paths between pairs of nodes in that graph will remain the same. We already sort of just saw why this is true. But to hammer the point home, consider an arbitrary path between a pair of nodes U and V. Say the total cost of that path is C. Remember from the previous example that when you assign a price to a node, it does not change the cost of paths that go through that node since the discount of entering and the extra cost of leaving offset each other. So we'll just consider the prices of the end points U and V. The price at U means that leaving U will get more expensive, so it adds to the cost. The price at V means that entering V is less expensive, so it subtracts from the cost.
Therefore, the new cost of the path from U to V is just the old cost C plus the price of U minus the price of V. So the cost of paths from U to V do change, but they all change by the same amount. the price of u minus the price of v. And thus the shortest path has to be the same as it was before.
To summarize, applying the right price function to a graph can simultaneously eliminate all the negative edges and keep all the shortest paths the same.
And this suggests the following simple framework for an algorithm. First find a price function that eliminates all the negative edges. Then just run Dystra's algorithm which we know is fast. And since all the shortest paths are maintained by the price function, they'll correspond to the shortest paths in the original graph. It really is as simple as that. But it's not as easy as it might sound to find the right price function. In our small example, it was pretty easy to do just by inspection.
But that's just because it was a small example. In fact, we'll see later in the video that finding a price function to eliminate all negative edges and finding shortest paths are essentially the same problem. you can use one to solve the other and vice versa. Finding a good price function directly is just as hard of a problem as finding shortest paths.
So the solution in the BMW algorithm is to instead find the price function gradually. The central step in the main routine is to repeatedly find new price functions that result in the most negative edge weights becoming half as negative. This turns out to be easier than bringing them all the way up to zero in one go. And at a high level, the entire algorithm is just to do this over and over again until the negative edge weights are so small that they're insignificant and can simply be rounded up to zero. At which point you can use Dyster's algorithm and get all the shortest paths. So this sub routine scale down, which makes the most negative edges half as negative, is where all of the important stuff happens. And that's what we'll focus on from here.
Since our goal right now is to eventually cut the most negative edge weight in half, the first step is to sort of push things in the right direction by doing the simplest thing possible, just artificially subtracting off half of the most negative edge weight from all of the negative edges.
So if the most negative weight is -10, we would add five to all the negative edges. This gets the most negative edge weight down to half its value. Now, we're not done because unfortunately this doesn't keep all the shortest paths the same. For example, here's the shortest path from D to H before the modification. However, after adding five to all the negative edges, the shortest path changes since these two edges on the first path became more expensive.
But this is actually okay. The goal right now is not to find the shortest paths yet. Instead, it's to find a price function that completely eliminates the remaining negative edges. So, why is this useful? Well, notice what happens to the most negative weight. It starts at -5. But when we update the weights of all the edges from the price function, that weight gets bumped up to zero. So what we can do is take the same price function and apply it to the original graph, the one before we added five to all the negative edges. Over there, the same price function would cause the most negative weight to be bumped up from -10 to5. And this might seem like déja vu because we already increased this weight from -10 to5 in the first step. But the difference is in what happens to the rest of the graph. This time the increase of five was caused by a price function which means the shortest paths are maintained. So finding a price function that eliminates all the negative edges on this modified graph achieves our original goal of scaling down the negative weights in the original graph by a factor of two. And this might seem like we're just kicking the can down the road because of course for this example I've drawn in the solution for that price function. But this is actually the key step. We were just talking about how finding a price function to eliminate negative edges is hard in general. But in this modified graph, we've already eliminated some of the negativity. Remember, we added five to all the negative edges. And it turns out that this seemingly small change will make it easier enough to solve the problem efficiently. So, long story short, our goal is now to find a price function that eliminates all the negative edges in this modified graph.
And the key insight that the BMW algorithm employs to do this is to separate the negative edges into categories and then engineer things so that almost all of the negative edges are very easy to eliminate and you only end up with a handful of difficult negative edges. Few enough that they won't slow things down too much. So then what are these categories? Well, they're created by a process that breaks the graph down into many parts with this technique called low diameter decomposition or LDD for short.
>> So, let's say this is my graph.
Then the first thing I do is I break it up into these low diameter components.
Here's an overly simplified explanation of how that works. You take the graph and essentially draw a bunch of circles around smaller sections within it. Then you take these components and put them in some order. There are some specifics here that we're skipping over, but the main point is that this process distinguishes between three different categories of edges like we were talking about. Number one are the edges within each component. Number two are the edges that go forward in the ordering between components, meaning from left to right in this picture. And number three are the edges that go backward in the ordering between components, meaning from right to left in this picture. The rest of the scaleown procedure consists of three phases, each of which eliminates the negative edges of one of these types. But it turns out that only the type three edges are difficult to eliminate. The type 1 and type two edges are easy. Oh, and you may be wondering where the name low diameter decomposition comes from. Well, it comes from the fact that each of these components is left with a smaller quote unquote diameter than in the original graph where diameter means the greatest cost of any path between nodes in that component. I realize this specific definition sounds a bit unmotivated and I'll tell you this property is very necessary to make the overall algorithm run efficiently. In fact, to do so, the algorithm needs very granular control over the resulting diameter for all of the components, which roughly corresponds to drawing these green circles at different sizes. Sometimes the algorithm needs smaller diameters and sometimes it needs bigger ones. And all of this is aimed at making the number of difficult edges, the type three edges, as small as possible so that they don't take too long to deal with. The way low diameter decomposition works, you create these components and you also define an ordering on them such that most of the edges are forward edges and only a small number of edges are these backward edges. That's just like a power low diameter de composition gives you. In this video, we won't get into all the nitty-gritty details about the specific settings of the LDD that accomplish this. We'll briefly touch on that towards the end. But for now, what I want to shift your focus to is these three types of edges. And with this rough sketch of an idea for how LDD works, we're now ready to start going through those three different phases of the algorithm. And along the way, when we need more specificity, we'll address some more concrete guarantees of the LDD procedure. Now, even though both the type 1 and type 2 edges are easy to eliminate in the sense that you can do it efficiently, it'll be a bit hard to dive right into the explanation of phase 1 from here because understanding phase 1 sort of requires that you already understand the other two phases.
Conceptually, the easiest part to understand is phase 2. So, we're actually going to start with that. Here, the goal is to find a price function that makes all of the negative forward edges non- negative. To do this, let's start by considering just the edges between the first and second components here. In particular, note that the highest negative weight is -10.
Now, think about what would happen if we set a price of -10 to every node in the second component. Here the price is negative which is a bit confusing but it just means that all of those edges entering the second component get more expensive by a cost of 10. And because the highest negative weight was -10 it means none of these edges will be negative anymore.
It's important to realize that this change doesn't affect any of the edges within the second component or any component for that matter. Since the price of each node in the component is the same. every edge between two nodes that are both in that component simultaneously get more expensive and less expensive by the same amount. So, it offsets. Now, you'll notice that the price function did in fact change these edges leaving the second component which weren't really our target. But that's okay because we're going to fix these edges next. In particular, we're going to consider all of the edges flowing into the third component. Take the highest negative weight among those, which is -42, and then set a price of -42 to every node in the third component, which increases all of those incoming edges by a cost of 42, making them all non- negative. And this does not mess up what we did in the previous iteration. The price on nodes in the third component only affects edges entering and leaving that component. So we can simply continue on for each component down the line, finding the biggest negative incoming edge and setting the price for every node in that component to be that value. And once you do that, you'll be left with no more negative edges that flow from left to right between components. This concludes phase 2 of scale down. Now again, we haven't even done phase 1 yet, but we're still going to continue to phase 3 first since part of phase 1 involves just skipping to phase 3 when a certain condition is met.
So it'll be helpful to understand what we're skipping to first and then once we get to phase 1, it should be easier to explain what's going on. All of the edges inside the components are now positive and all of the forward edges are now positive. Let's say the path from S to T looked like this.
Most of the edges here are either inside the components or they are these forward edges. You only have a couple backwards edges. Phase 3 eliminates those negative backwards edges. But really what the procedure for phase 3 does is it actually eliminates all remaining negative edges in whatever graph you have. It just so happens that by this point in the algorithm, the only negative edges left will be those backwards edges. And as you'll see later, there are few enough of those that phase 3 can run efficiently. But to understand how phase 3 works, we're going to put away this schematic that was helping us visualize things in the context of the low diameter decomposition and think more generally about eliminating a small number of remaining negative edges in some graph.
The way this is done is a clever combination of two algorithms we've already seen and discussed. Dyra's algorithm and the Bellman Ford algorithm. But there's a key difference now compared to how we saw these algorithms at the beginning of the video. There the end goal was to find all the shortest paths from an input source which we were labeling node A in the previous examples. But in this new context, the goal is not actually to find shortest paths from the original input source node yet. Instead, phase 3 is run from what's called a dummy source S, which is an extra node that we add to the graph with edges of weight zero to every other node.
We do this because firstly it ensures that all nodes are actually reachable from the source. For example, if we used node A as the source, then node B over here wouldn't be reachable since there aren't any incoming edges to node B. The only edge connected to node B is this outgoing edge. From the dummy source though, there's a trivial path to every single node. Just use the dummy edge.
And the reason why we want all the nodes to be reachable is because it turns out that once we find all the shortest paths from S to other nodes. If we set the price of each of those nodes to be the computed distances from S, it actually has the effect of making all edges non- negative.
To see why, think about an arbitrary edge in the graph. Say this one with a weight of five between node F and node I. Once we apply the price function, the new weight of this edge is going to be the price at F, which is the distance from S to F, plus the old weight five, minus the price at I, which is the distance from S to I. Here, to make things more concrete, let's see what that distance from S to F actually is.
Remember what distance means is the cost of the shortest path which happens to be this one from S to C to E and then to F for a total cost of -7. And then this weight of five extends that path to I for a total cost of -2. So the distance from S to I cannot exceed -2. It can't be greater than the distance from S to F plus the weight from F to I because we just found a path from S to I with that cost. Doing a little rearranging, we can see that the new weight of this edge from F to I after the price function is applied has to be greater than or equal to zero. And indeed, when we apply the price function to the entire graph with the actual shortest distances to all of the nodes, it makes all of the edges non- negative.
So, we know that computing the shortest distances to all the nodes from a dummy source can give us a good price function. But how do we actually compute those distances? This may seem a bit like we're going in circles because the overall task we're trying to solve is to find the shortest paths. But remember, we're going off of the assumption that we only have a small number of negative edges left by this point since we're in phase 3 of the main algorithm. And as you'll see in a moment, this will be the key for phase 3 to run efficiently.
So like I said, we're going to sort of combine Dystra's algorithm and the Bellman Ford algorithm. Now, because each of these edges from the dummy source has a weight of zero, we can initialize the distances to all of the nodes in the graph at zero. The process continues by running a single iteration of Bellman Ford on the negative edges, checking each of them to see whether they provide a shorter path to their destination. In this case, all of them will. The current shortest distance to every node is just zero. So any incoming negative edge will improve the distance to that negative number. Now things get more interesting. The next step is to run Dystra's algorithm with the updated distances, but only on the non-gative edges since Dyster's algorithm only works for non-gative edges. We'll do a little refresher just so we're on the same page here. Remember, in Dyster's algorithm, you find the node with the current shortest distance, which in this case would be node J with a distance of -10, and you explore its outgoing edges to see if they provide shorter paths.
Here we find a shorter path to node G with a cost of -5.
And now we just repeat find the node with the next shortest distance and explore its outgoing edges. That's node L this time. Although it doesn't have any outgoing edges, so we can just skip it. Then we go to node G itself. Also no updates there. And we continue on like this until all the nodes have been visited where eventually we'll find that this edge from K gives a shorter path to M. It's worth noting here that for example on the next iteration when we visit node M itself we do not check its outgoing edge to L because that's a negative edge and this dystra phase is only for the non- negative part of the graph. So when everything is said and done, the result is that we've essentially extended all of the previous paths as much as possible on the non-gative part of the graph, leaving us with all shortest paths that use at most one negative edge since we've only run Bellman Ford one time so far. The next natural step is to well try to use those negative edges again. So we do another iteration of Bellman Ford trying to extend all of those paths by one more negative edge. You can maybe guess what's coming next. We're going to do another dystra phase. And after that, we'll have found all shortest paths with at most two negative edges.
This simply runs over and over until the distances stop improving, meaning all of the shortest paths have been found. In this case, it takes four total iterations of Bellman Ford and Dystra for that to happen. A nice way to interpret what that means is that the maximum number of negative edges on any shortest path is four. Since we had to do Bellman forward four times before the distances stopped improving. For example, one such path is from S to L using these four negative edges. The reason why I'm showing you this is because it has implications for how fast the algorithm can run. If there aren't that many negative edges in the graph, well, you need fewer iterations to complete this process. So, say you have at most k negative edges on any path, then you only have to run bellman forward k times, right? Exactly. Because each time gets one more of those edges.
>> One more negative edges. Exactly. And then >> dyster will get the positive ones in between.
>> In between. Exactly. Since each of those iterations does a single pass of Bellman Ford followed by Dystra's algorithm, the overall running time is just the sum of the running times for each of those steps time K, which simplifies to O of M + N log N * K. You also don't really need K to be a hard limit for the number of negative edges on any shortest path.
It turns out that you get the same result with the slightly weaker assumption that K is the average number of negative edges on any shortest path.
And taking a step back, remember the whole point of doing the low diameter decomposition, the reason why we're going to all the trouble to do these three different phases is that there won't be that many type three edges. Or in other words, K isn't really that big.
We really would like the number of edges on the shortest path to be small. And we'd really like to break the graph up into components where every shortest path uses actually few edges. Again, we won't get into the specifics of how you make all the numbers work out, but it turns out that using the right settings for the LDD, the expected number of negative red edges on any shortest path is only O of log M squared.
Substituting that value for K in the expression we got before, we get that the overall running time for phase 3 is O of M + N log N log M^ 2. or the way the authors put it in their paper, this simplifies to O of M log M cubed if M is assumed to be greater than or equal to N, meaning there are more edges in the graph than there are nodes, which is a very common and reasonable assumption.
Either way though, this is nearlinear time. These expressions just grow proportionally to the size of the graph, modulo, a few log factors. So, we're now finally ready to go back and understand phase 1, eliminating all the negative edges within each component created by the LDD.
Now, if you think about it, there's nothing inherently special about each one of these components compared to the overall graph. They each look like some smaller network of nodes and edges. So, eliminating the negative edges within one of these components is essentially the same kind of task as the overall problem, which is eliminating all the negative edges in the entire graph. When you have a situation like this where part of the solution to a problem is just to solve a smaller version of that same problem, it often lends itself well to recursion which is exactly how the BMW algorithm handles this. So the actual algorithm is you do this recursively. So like inside this blue part instead of now solving it using just like Bellman Ford or something you recurse and then inside those you would also have like forward edges and backwards edges and then in fact the way you solve each of these internal blue circles is by more blue circles and so on. Within each component you solve the problem by doing another low diameter decomposition with an even smaller diameter that creates its own type 1, type 2 and type three edges which are each handled by their respective phases.
That means those components are also broken down by another LDD and so on and so on. Now this has to stop at some point and to understand when it should stop there's one more important technical guarantee of the LDD that we haven't talked about yet which is that the specific settings used for it in the BMW algorithm make it so that every time you break the graph into a new set of components each of those components has at most half the number of negative edges on all its shortest paths. Now, we won't get into the proof of why the fraction ends up being exactly 1/2, but it should at least make sense why the shortest paths within each component contain fewer negative edges. After all, the components are smaller, containing fewer overall edges. So, a natural byproduct of this is that the shortest paths within them are going to contain fewer negative edges.
So we're going to like decompose the graph into smaller pieces such that within each piece your paths are not too negative which kind of makes sense right like in this smaller piece like even if this is s and this is t maybe like t is over here then my most negative path looks more like this it's not very negative. So how does this answer our question of when the recursion stops? Well, initially in the overall graph, we know that the number of negative edges on any shortest path is at most n, the number of nodes. That would be a path that visits every single node once. And we also know that every time we do an LDD, it makes the shortest paths contain at most half as many negative edges. It'll go from n to n /2 to n /4, etc. And you can keep doing as many iterations as it takes before this gets all the way down to one at most one negative edge on any shortest path. And once that's the case, well, we already have an algorithm that takes care of this scenario efficiently. It's the Bellman Ford plus dystra combination that we used in phase 3, which as you'll remember runs very fast as long as the number of negative edges on any shortest path is small, which in this case it is.
So at this point we can simply skip to phase three and we're done.
So yeah that's pretty much it. That's the algorithm. To recap the main framework is you start with some graph and a source from which you want to find all shortest paths and then you repeatedly use scale down to make all the negative edges less and less negative. This involves using a low diameter decomposition to break the graph down into parts and three phases to fix all three types of negative edges created by that process.
After enough iterations of scale down, the negative edges will be so small that they're practically negligible, at which point you can just use Dyster's algorithm to find all the shortest paths. So that's the main idea. And there are some extra details to be discussed, more than could ever hope to fit in a video of any reasonable length.
But one thing worth mentioning is we've been implicitly assuming throughout the video that the original graph has no negative cost cycles which remember is a loop of edges where the total cost of taking that loop is negative. For instance, in phase 3 of scale down the Bellman Ford plus dystra combination is supposed to run until the costs stop improving. But as we saw in the example for the regular Bellman Ford algorithm earlier, if there's a negative cost cycle, the costs will never stop improving. you'll get costs of negative infinity. And so this would actually just run forever.
>> The algorithm is not meant to find a negative white cycle. So if you have a negative white cycle, the algorithm will just fail 100% of the time. Like the price function just won't work. It can't work. There is no good price function if you have a negative white cycle. The way this is handled gets a bit technical, but the gist of it is to run the main algorithm, but if it's taking a lot longer than expected, you assume there must have been a negative cost cycle.
And then there's some fancy methods used to detect those cycles efficiently in that case. But anyway, now that we've seen how the BMW algorithm works, we should check that it's indeed a fast algorithm because that's kind of the whole point. Remember that what we're going for is nearlinear time, which means the number of operations performed should be at most proportional to the size of the graph potentially multiplied by some extra logarithmic factors.
Now there is in fact another simplifying assumption which we didn't really need for the main explanation of the algorithm but which is important to touch on in order to understand the final running time. The actual first step that the algorithm does is it transforms the original graph such that all negative weights are exactly -1.
This might sound a bit strange because throughout the video we've been thinking about weights with pi or negative numbers. But this is just because after that transformation happens, the main algorithm then scales up those negative weights again, namely by a factor of 2 n. A rough picture you could have in your mind for this transformation is that you replace each negative weight in your original graph with a path of that many negative 1's. Now, in reality, this turns out to be too slow to really be effective. So, the actual way this is done is a bit more clever and much more efficient. And you really don't need to worry too much about the specifics here.
The main point is that the running time of this algorithm doesn't actually depend solely on the size of the graph.
It also depends on the values of the negative weights. Namely, when this transformation is used, it introduces an extra log factor to the running time log of W, where W is the most negatively weighted edge in the graph. And again, we won't prove it in this video. If you want to learn more, you can check out this paper by Andrew Goldberg, who first published this technique in 1995.
But going back to our simplified example with the path of negative 1's, it should at least make sense why the running time is going to depend on the value of the weights. Edges with larger negative weights get expanded into longer paths.
So there will be more overall edges in the graph after the transformation.
This means that theoretically the BMW algorithm could actually be very slow if you have exponentially large edge weights. But if this is not the case, meaning the edge weights are at most some polomial function of the size of the graph, which is a very common assumption that applies to most situations, then some simple algebra reveals that this log of W factor inside the big O notation is equivalent to a log N factor where N is the number of nodes. Meaning the running time can now be expressed conventionally purely in terms of the size of the graph. Now you may be wondering why are we doing this transformation at all. And the short answer is the resulting graph is a lot nicer to work with and reason about.
Many of the technical details we didn't have time to go over in this video rely on all the negative weights being exactly -1. So it's a trade-off. You add an extra log of W factor of running time in exchange for having everything else work nicely. So with that out of the way, what's the running time of the main algorithm once this transformation is done? Well, since this loop simply takes the most negative edge weight, which remember is scaled up to be 2 n by this point, and cuts it in half until it gets down to -1, the number of times this loop is run is roughly log base 2 of 2n.
So it adds one log n factor to the running time. After that main loop, the only thing left is Dystra's algorithm.
And we already know that that runs very efficiently. It'll be insignificant compared to the work contributed by the calls to scale down. So let's focus on the running time of scale down itself.
Now the first part of scale down is to do the low diameter decomposition. And although I never said it explicitly, it's of course very important that this runs in nearlinear time. Otherwise, none of this would work. And it does. For those curious, here's the specific running time. We won't prove it in this video, but you can learn more here. Now, skipping to phase two, remember that this step just involves a simple loop over all the nodes and edges. So, you don't even need any log factors for that. It's actual linear time. For phase three, well, we already saw that this runs in nearlinear time as long as there are few enough negative edges, which if you'll remember from before, there always will be by that point. And for the recursive component introduced by phase 1, we know that each use of scale down successively has the maximum number of negative edges on any shortest path from n all the way down to one. This means the number of recursive levels is roughly log base 2 of n. The dominating term here is this one from phase 3. And so to get the final running time, we simply multiply it by the number of recursive levels. Before we do that calculation, it'll make our lives easier if we do a quick simplification. And that is to realize that m and n are very closely related, especially when you're just talking about log factors. We already briefly talked about one common assumption, which is that m is greater than or equal to n. Essentially meaning that your graph doesn't just look completely barren. There are at least as many edges as there are nodes. But there's also a limit to how many edges you can have. Each of the nodes can be connected to at most each of the other n minus one nodes. So in any graph you can have at most n * n minus1 edges which will generously round up to n^ 2. If m is lower bounded by n but upper bounded by n^ 2 then log of m is lower bounded by log of n but upper bounded by log of n^ 2 which is just 2 log n. So O of log M and O of log N are equivalent to each other. Since the only thing that big O notation cares about is proportionality.
That means when we multiply these two expressions together to get the running time for scale down, we can just convert all the logs to be in terms of N and we get O of M log N 4th.
Multiplying that by the one extra log factor from the main loop, you get a running time of O of M log N 5th for the main algorithm. and then multiplying that by the extra log of w factor we talked about and another few log of n factors required for the negative cost cycle detection we briefly touched on which again we won't prove in this video but you can learn more here you end up with a final running time of o of m log n 8th log w as desired near linear time now this is an amazing result finding shortest paths is one of the most important problems in computer science.
And since Dyra's algorithm was invented in the 1950s, it was unknown whether it was possible to solve the shortest path problem for negative edges with a similar running time. But not only is it possible, the techniques used in the BMW algorithm like price functions, low diameter decomposition, and the Bellman 4 plus dyra combination had been around for decades. It just took a bunch of clever insights to put them all together in the right way.
The more recent techniques that people have been trying to apply to these graph problems >> look nothing like this. Sure. Right.
Could it be the case that simpler techniques could be applied to different kinds of graph problems? A few kind of of the core problems in graph algorithms. Negative way shortest path is one of them. Maximum matching was another. Maximum flow was another.
Problems that many people have heard of.
They've been around since the 50s and only recently, very recently, did we start chipping away at these problems using like a entirely different set of techniques and these new techniques and said like took your problem, took it to a totally different realm, worked on this other realm and then brought it back to reality and they were like amazingly able to make progress where no one had for a really long time. But the BNW algorithm shows that at least for this one problem, it's possible to get a good solution while sticking to the older techniques which are relatively simpler compared to the state-of-the-art methods. Our algorithm, it's not super simple, but it more uses this familiar set of techniques. And so we were hoping like maybe it's time to revisit problems like matching. The field has evolved. We have new tools. We have new mindsets.
Maybe we can come back to those problems and also come up with kind of relatively simple algorithms.
>> Even for the negative weight shortest path problem though, this still isn't quite the end of the story. While the final running time for the algorithm is great in theory, there are some additional factors to consider that mean more work is necessary for these techniques to be implemented successfully in practice.
>> This algorithm, has it been implemented in practice? Our algorithm runs in nearlinear time which means m * log to the 9th which if n is a billion quadrillion then log to the 9th is indeed a slow growing like technically log to the 9th is a slow growing function but that slow growing only kicks in when n is some absolutely enormous number. Even if you have like a trillion nodes log to the 9th is completely unaffordable. our algorithm was like just not optimized at all, but that people have been working on. So like future improvements have really like both simplified it and like made it much faster. So it's still like not dystrous in terms of the number of logs, but it's getting there. I said at the beginning of the video that technically the single source shortest path problem is still unresolved, and that's because we still don't know the optimal running time.
There very well may be improvements that could get it down even further. And this isn't just for negative weights, by the way. In 2025, five researchers became the first to discover an algorithm that can in theory solve the shortest path problem for non- negative weights even faster than Dystra's algorithm.
In computer science, it's often very tricky to decide for which problems we should keep looking for better solutions, especially when the problems are among the oldest and most thoroughly studied. In fact, in many ways, this is the fundamental type of question that theoretical computer scientists wrestle with. Some of the biggest unsolved mysteries in the field, like P versus NP, which we've covered on this channel, are at their core about whether better solutions to certain problems even exist. But while many questions remain mysteries and will likely continue to for quite some time, there are those few that come along every now and then where the path to progress is hidden in plain sight in old techniques waiting to be revisited with fresh eyes or manipulated in a slightly different way. Sometimes all it takes is the courage to dive in, get your hands dirty, and try again.
Thanks so much to New York University for providing funding for this video and to Professor Aaron Bernstein and Dupon Nanon Kai for taking the time to walk me through their work and also to Professor Christian Wolf Nelson for being involved with the project. If you're interested in having your research featured on Purple Mind, reach out to me at my email in the description below. Of course, a huge thanks to my Patreon supporters.
And as always, thanks so much to you for watching. I hope you enjoyed and learned something and I'll see you next time.
相关推荐
Agentforce NOW AMA: Build with React and Salesforce Multi-Framework
SalesforceDevs
490 views•2026-05-28
How agent o11y differs from traditional o11y — Phil Hetzel, Braintrust
aiDotEngineer
450 views•2026-05-28
WEB TECHNOLOGIES UNIT-2 | Degree 4th sem BCOM Computers web technologies unit-2 full explanation💯✅
LearnwithSahera
1K views•2026-05-29
More tests are always better? How to use AI to identify tests that bring little value
Alliance4Qualification
335 views•2026-05-29
Search Algorithms Explained in 60 Seconds! 🤖💨
samarthtuliofficial
218 views•2026-06-01
People of Game of Thrones using JavaScript DOM
AltCampus
296 views•2026-05-30
Introduction to Problem Solving Part - 1 | Lecture 1 | Intermediate DSA
ascensionix
107 views•2026-05-29
So What's Odin Lang Even Good For
TechOverTea
131 views•2026-06-01











