HorizonDB is a cloud-native fork of PostgreSQL that separates compute and storage into independent fleets. The primary compute scales linearly up to 192 vCores with 8 GB of memory per vCore, while maintaining 100% PostgreSQL API compatibility. It uses a write-ahead log (WAL) architecture: the compute node sends transactions to a distributed WAL storage fleet spread across availability zones, and a separate data storage fleet holds the actual data pages with transparent sharding, so the compute layer can focus on transaction processing without performing storage tasks.
Deep Dive
Overview of Azure HorizonDB
Hi everyone. In this video, I want to dive into what exactly Azure HorizonDB is.
Now, if we take a step back and think of PostgreSQL, this is a very mature, open-source relational database. It has super strong capabilities, a large ecosystem of extensions and community, and AI capabilities.
And when I mention that phrase, relational database, what's really important about it, what we care about when we leverage it, is that it is ACID compliant, which means we get a guarantee that it's safe, it's reliable, and we get correct transactions.
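As a minimal sketch of that all-or-nothing guarantee, here is a money transfer using Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The `accounts` table, the account names, and the `transfer` and `balances` helpers are illustrative assumptions, not anything specific to HorizonDB.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move money between two accounts atomically; return True if committed."""
    try:
        # "with conn" commits on success and rolls back on any exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # The CHECK constraint rejects a negative balance. That raises
            # IntegrityError, and the rollback also undoes the credit above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

If the debit fails, the credit is undone too, so the balances never show money added to one account without being removed from the other.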
If any part of a transaction fails, the entire transaction is rolled back. There can't be any partial updates, which would cause business problems: hey, I added money to this account, but I didn't remove it from this one. It helps avoid things like corruption.
Now, you might be wondering, okay, we're talking about PostgreSQL, and I know in Azure we have Azure Database for PostgreSQL Flexible Server. That added things like Entra integrated authentication.
It adds things like the pgvector extension. So, again, there are a number of great extensions, but we have pgvector.
Think of those embeddings that represent the semantic meaning of data, super useful for AI and natural language.
There are things that build on top of that, like DiskANN, which is a better nearest-neighbor vector search capability.
So, that's all out there. We have this good solution in Azure for PostgreSQL.
And we see PostgreSQL used in large and small applications. However, if we think about it from an architecture perspective, Azure or non-Azure, it's really built around this idea that we have a primary, the one that we can read and write from, and it has both the compute and the storage responsibilities. It is not distributed.
Now, yes, I can technically add a number of standbys or read replicas that have read endpoints. I can go and use them maybe for analytics.
But, primarily, I'm going to bottleneck my performance on whatever this primary node is. And also, when I think about the replication, that's work this primary has to do.
Now, you may be saying, "Hey, look, I know we have the Citus extension.
And the whole point of the Citus extension is that it lets us horizontally scale our database."
Now, what it does is it does that by turning PostgreSQL into a distributed database. So, the data has to be distributed via shards.
We partition up the data based on this distribution, this sharding column.
And while that solves the scalability challenge, what it does mean is that, as the application developer, I have to change my app.
Because I have to think about, okay, I have a distribution key. I have to consider as part of my query design that distribution. So, there's changes I have to make.
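To make the query-design impact concrete, here is a rough sketch of hash-based shard routing. It is not Citus's actual hashing or planner; `NUM_SHARDS`, `shard_for`, `shards_to_query`, and the `tenant_id` column are all hypothetical names used for illustration.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(key: str) -> int:
    """Map a distribution-key value to a shard using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

def shards_to_query(filters: dict, distribution_column: str) -> list:
    """Return the shards a query has to visit.

    A query that filters on the distribution column can be routed to a
    single shard. Any other query has to fan out to every shard.
    """
    if distribution_column in filters:
        return [shard_for(str(filters[distribution_column]))]
    return list(range(NUM_SHARDS))

# Filtering on the distribution key touches one shard.
single = shards_to_query({"tenant_id": "acme"}, "tenant_id")
# Filtering on anything else touches all of them.
fan_out = shards_to_query({"order_date": "2024-01-01"}, "tenant_id")
```

This is why the application has to be aware of the distribution key: queries that leave it out still work in this model, but they pay for a scan across every shard.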
So, then let's think about, okay, we have the cloud.
And it's very common now to have cloud-native architectures, where we disaggregate, we separate the compute and the storage. That lets them scale independently, and I can also take advantage of native capabilities of services that exist in the cloud. For example, if I think about a cloud storage service like blob, there's a whole set of capabilities with replication, with snapshots, with backups that exist as part of that service.
So, the whole goal about this is, great, we have this PostgreSQL database that has a lot of really good things.
Well, HorizonDB is a cloud-native version of PostgreSQL. So, it is a fork of PostgreSQL. A fork means, hey, we take the code and we essentially take a copy of it, which we can now make changes to.
And as part of that, it enables that disaggregation of the compute and the storage. So, I can get higher scale, higher performance, higher availability, better security. But, here's the really important part.
From an API perspective, it is the same.
So, it's the same API. It is the same PostgreSQL tools that connect. It has 100% PostgreSQL compatibility.
So, when I'm thinking about, "Okay, I want my app to go and talk to HorizonDB," there are no changes that I have to make.
But, as part of that, I'm going to get all of these benefits around that idea of the better scale, performance, and availability.
So, I get a whole bunch of goodness without breaking anything. So, I get that cloud scale without me as the application developer having to do anything differently. And that's the whole goal around this.
So, how does this work? How are we making these changes?
So, from an architecture perspective, it is still a relational database.
So, I still have my primary compute.
So, we're going to separate out the compute and the storage. So, there's only one single primary instance because of that relational database.
Now, when I think about this primary compute and the interactions with it, the read-write endpoint talks to that.
From a scalability perspective, this can have up to 192 vCores.
For each vCore, I get 8 GB of memory.
So, it's going to scale linearly.
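The linear relationship can be written as a one-line calculation. The 192 vCore ceiling and the 8 GB per vCore ratio come from the talk; the `memory_gb` helper is a hypothetical name, and which vCore sizes are actually offered is not covered here, so the range check only enforces the stated maximum.

```python
GB_PER_VCORE = 8    # stated in the talk
MAX_VCORES = 192    # stated in the talk

def memory_gb(vcores: int) -> int:
    """Memory that comes with a given vCore count, assuming a fixed ratio."""
    if not 1 <= vcores <= MAX_VCORES:
        raise ValueError(f"vcores must be between 1 and {MAX_VCORES}")
    return vcores * GB_PER_VCORE

largest = memory_gb(MAX_VCORES)  # 192 * 8 = 1536 GB
```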
So, let's actually take a quick look at what we see when we create one of these.
So, if I'm creating an Azure HorizonDB instance today, I can pick things like, obviously, the region and the PostgreSQL version, but from a configuration perspective, what we're picking here is the number of vCores, and we get that associated amount of memory. So, that is the primary thing we are selecting, and then by default, it's going to add one replica that will be in a different availability zone. But notice I could scale these all the way up to get 15 replicas in total.
Now, if we focus back on something we just saw, what's really important about this particular primary?
Well, it lives within a specific availability zone.
Now, let's start talking about how some of this separation, this disaggregation, is actually working.
So, what it's going to do is use a database-as-a-log architecture. What that means is the compute node sends the transactions for the write-ahead log, the WAL, which is basically the transaction log, to a separate fleet.
And so, if this is my compute, what we're actually going to have is this separate idea of the write-ahead log storage fleet.
And this is made up of lots of different instances distributed across all of the availability zones. So, zones 1, 2, and 3.
And so, as transactions are made, this primary compute is actually sending them to multiple instances across all three of the AZs. So, it's sending them cross-zone in millisecond time.
Remember, these are all append-only writes. And the WAL itself is using NVMe pass-through, so it accelerates its performance by skipping layers of the kernel I/O stack, and we get really, really high performance here.
And the compute's quorum manager is responsible for sending those across to the different instances, across the different AZs in this WAL storage fleet.
And it has to have the majority to succeed in the commit for it to consider that transaction to be successful.
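The majority rule can be sketched in a few lines. This is only a toy model of a quorum write: the `WalInstance` class, the `quorum_commit` function, and the one-instance-per-zone layout are assumptions made for illustration, and the real quorum manager's protocol, instance counts, and parallelism are not described in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class WalInstance:
    """Hypothetical WAL storage instance living in one availability zone."""
    zone: int
    healthy: bool = True
    log: list = field(default_factory=list)

    def append(self, record: bytes) -> bool:
        """Append-only write; the return value is the acknowledgement."""
        if not self.healthy:
            return False
        self.log.append(record)
        return True

def quorum_commit(instances, record: bytes) -> bool:
    """Send the record to every instance; commit only on a majority of acks."""
    acks = sum(1 for inst in instances if inst.append(record))
    return acks >= len(instances) // 2 + 1

# One zone down: 2 of 3 acknowledge, so the commit still succeeds.
fleet = [WalInstance(zone=1), WalInstance(zone=2), WalInstance(zone=3, healthy=False)]
committed = quorum_commit(fleet, b"txn-1")
```

With three instances, losing one zone still leaves a majority, while losing two does not.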
So, this is how we start to see how the compute and the storage are separated.
But then, if this is the write-ahead log storage fleet, we have a second fleet: the data storage fleet.
Which again is made up of lots and lots of different instances providing the service, which again would be spread over all of the different zones within the region.
Now, once we have that, what's actually happening here is that this stores the actual data pages. It is transparently sharding the segments (think of a segment as a chunk of data up to 1 GB) across this storage fleet.
So, any IOPS, any throughput I have is spread across that entire fleet.
And so, this WAL storage fleet is responsible for populating the data storage fleet. But it's going to do it very intelligently.
It has a filter, so it only sends each change to the storage nodes that serve that particular portion of the data.
And then the node in the data storage fleet will reconstruct the data pages by applying the write-ahead log content it has.
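The filter-then-replay flow can be modeled roughly as follows. Everything here is a simplification: the round-robin placement of segments on nodes, the node count, and the idea of a change as a small dictionary are assumptions for illustration. The only figures taken from elsewhere are the 1 GB segment size from the talk and PostgreSQL's default 8 KB page size.

```python
from collections import defaultdict

SEGMENT_BYTES = 1 << 30  # segments are "up to 1 GB" per the talk
PAGE_BYTES = 8192        # PostgreSQL's default page size
NUM_NODES = 3            # hypothetical number of data storage nodes

def node_for_page(page_no: int) -> int:
    """Find the node serving the segment that contains this page."""
    segment = (page_no * PAGE_BYTES) // SEGMENT_BYTES
    return segment % NUM_NODES  # assumed round-robin placement

def route_wal(records):
    """Filter WAL records so each node only receives changes to its pages."""
    per_node = defaultdict(list)
    for page_no, change in records:
        per_node[node_for_page(page_no)].append((page_no, change))
    return per_node

def reconstruct_pages(node_records):
    """Replay a node's records in log order to rebuild its current pages."""
    pages = {}
    for page_no, change in node_records:
        pages.setdefault(page_no, {}).update(change)
    return pages

# Page 0 is in segment 0; page 131072 is the first page of segment 1.
wal = [(0, {"a": 1}), (131072, {"b": 2}), (0, {"a": 3})]
routed = route_wal(wal)
```

Each node sees only the records for its own segments, and replaying them in order yields the latest version of each page.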
Now, today the maximum database storage size is 128 TB.
Now, that size is really focused on what they've tested today, what they're supporting today.
There's not a reason why that's the maximum possible size, and that might change in the future.
But, it's also going to auto scale. You don't have to manually provision that storage.
And what's important when you look at this just initially, is realize this means the compute node is not performing any of the storage tasks. They're all offloaded to the storage fleets, which means the compute node can just focus on doing compute stuff, the handling of the transactions. Now, if we jump over for a second, we can actually see some examples of the things that are offloaded. So, these are all savings, if you will, that now that primary compute node does not have to do. So, some of it is disk IO savings, some of it is network savings, but it's offloading those components.
So, it can focus more of its capability on the actual transaction parts.
Now, it also means this compute layer is stateless.
All of the durable content is in this WAL storage fleet and the data storage fleet.
Now, this compute layer does also have NVMe SSDs.
Now, this SSD storage will be multiple times bigger than the amount of memory, which again is based on the number of vCores.
And then, from a caching perspective, only if a read misses both what's in memory and what's on the SSD does it have to go and pull the data from the data storage fleet. So, it will, where required, go and fetch the various pages.
So, I can think about, hey, nanosecond latency where it's in the memory, microsecond latency where it's in that NVMe storage, millisecond where it is actually a remote read.
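That three-tier read path can be sketched as a simple lookup chain. The `read_page` function and the use of plain dictionaries for the memory and SSD tiers are illustrative assumptions. Promoting a page into the faster tiers after a miss is also an assumption, and eviction is left out entirely.

```python
def read_page(page_no, memory, ssd, fetch_remote):
    """Return (page, tier) by checking memory, then local SSD, then remote."""
    if page_no in memory:
        return memory[page_no], "memory"   # nanosecond-scale hit
    if page_no in ssd:
        page = ssd[page_no]                # microsecond-scale hit
        memory[page_no] = page             # assumed promotion into memory
        return page, "ssd"
    page = fetch_remote(page_no)           # millisecond-scale remote read
    ssd[page_no] = page
    memory[page_no] = page
    return page, "remote"

memory, ssd = {}, {}
data_fleet = {7: b"page-7"}                # stand-in for the data storage fleet

first = read_page(7, memory, ssd, data_fleet.__getitem__)
second = read_page(7, memory, ssd, data_fleet.__getitem__)
```

The first read goes to the data storage fleet, and the repeat read is served from memory.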
Now, for ultimate durability, what sits under all of this is, as you might guess when I think about storage capabilities, it is using zone redundant blob.
So, I've got ZRS blob.
And this is being used both for the write-ahead log and it's being used for the data storage fleet.
It happens to use block blobs for that append-only WAL, and it's using page blobs as part of the durable storage for the data fleet.
The WAL storage fleet itself is only keeping the tail of the log, and any longer retention requirements are fulfilled by the blob.
And these storage accounts are all HorizonDB instance-specific. There is no sharing across any instances of HorizonDB.
Database backups are done via snapshots.
Now, as you saw when I showed you the page, you can have standby read replicas.
So, in addition to my primary compute with its configuration, what I can also add here are n standby or read replicas.
These also have those same NVMe SSDs.
And as mentioned, you can have up to 15 maximum of these.
And what happens here is there's actually a load balancer that distributes to however many you have. And so, the read-only endpoint is housed on the load balancer, which then will go ahead and send the request through. Now, these are actually using the data storage fleet. So, there is no PostgreSQL replication involved anywhere. They all have the same config from a compute and memory perspective as the primary.
And so, if I had those 15 read replicas alongside the primary, that's 16 nodes at up to 192 vCores each, which is where the 3,072 vCores figure you may have seen comes from.
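The 3,072 figure only works out if the primary is counted along with the replicas, since 15 × 192 is 2,880. Here is that arithmetic as a quick check, where `total_vcores` is a hypothetical helper and the per-node and replica limits are the ones stated in the talk.

```python
MAX_VCORES_PER_NODE = 192  # stated in the talk
MAX_REPLICAS = 15          # stated in the talk

def total_vcores(vcores_per_node: int, replicas: int) -> int:
    """Total vCores across the primary plus its identically sized replicas."""
    return vcores_per_node * (1 + replicas)

cluster_max = total_vcores(MAX_VCORES_PER_NODE, MAX_REPLICAS)  # 192 * 16 = 3072
```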
Now, they also have, as I mentioned, all of those same shared buffers and a local SSD cache, so they get those pages with sub-millisecond read latency as required. Today, they have to be in the same region as the primary because they are sharing the data storage.
Now, what's also going to happen because we have this caching is that the write-ahead log fleet will also send transactions to any of those standbys where it's a change to a page that is in memory for that specific replica.
And obviously, these are also used for high availability purposes. If the primary goes down, it picks the closest replica, makes it primary, and then moves the read-write endpoint to that instance.
Now, because this write-ahead log storage fleet is the single source of truth, and because these are stateless, when there is a failover, it's really simple.
You just switch the role holder of the primary, and you move the endpoint, and you're done.
There is no normal PostgreSQL reinitialization task required, so failover is much, much faster.
I can manually fail over when I want as well.
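Because the compute nodes hold no durable state, failover in this model reduces to a role switch plus an endpoint move. The sketch below is a toy illustration: `ComputeNode`, `fail_over`, the `endpoint` dictionary, and the use of zone distance as the meaning of "closest" are all assumptions, not the service's actual selection logic.

```python
from dataclasses import dataclass

@dataclass
class ComputeNode:
    """Hypothetical stateless compute node."""
    name: str
    zone: int
    role: str = "replica"
    healthy: bool = True

def fail_over(nodes, endpoint):
    """Promote the closest healthy replica and repoint the read-write endpoint."""
    old = next(n for n in nodes if n.role == "primary")
    candidates = [n for n in nodes if n.role == "replica" and n.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica to promote")
    # "Closest" is modeled as the smallest zone distance (an assumption).
    new = min(candidates, key=lambda n: abs(n.zone - old.zone))
    old.role, new.role = "failed", "primary"
    endpoint["read_write"] = new.name  # no data copy or reinitialization
    return new

nodes = [ComputeNode("a", 1, role="primary"), ComputeNode("b", 3), ComputeNode("c", 1)]
endpoint = {"read_write": "a"}
promoted = fail_over(nodes, endpoint)
```

Nothing is copied between nodes in this sketch, which mirrors the point that the storage fleets, not the compute nodes, hold the durable state.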
Geo HA, I believe, will be available at GA. It'll be a separate cluster in another region.
There'll be a shared virtual endpoint on top to make it transparent. It'll use async replication as you always do anytime you're multi-region.
So, this is the architecture. You can see where it's separating that compute and the storage. You can get these really high performance, this really high scale, and these really nice availability capabilities.
Now, for my other feature capabilities, we sort of talked about some of the things we have already.
So, when I think about HorizonDB, it's going to have that same Entra ID integrated authentication.
Private endpoints will be supported. It currently supports 75 of the most popular PostgreSQL extensions, and they're working on more. So, think about things like DiskANN, or BM25 for rich text searching and ranking.
It can mirror to Microsoft Fabric.
So, if you think of Fabric as that data virtualization layer, with all these different engines that can operate on the data, and I want to expose ontologies, semantic models of my enterprise entities referring to data, hey, HorizonDB can take part in that.
There are AI pipelines. I can describe the AI workflow using SQL, so it's declarative. So, chunking, embedding, the generation, the ranking, all of that, I can run as a fault-tolerant pipeline. It's going to live inside a PG durable function.
It has model management. So, when I think about doing things like creating those embeddings that represent the semantic meaning, well, I need AI models for that. It would do the model management for you. So, it would integrate with Microsoft Foundry.
And so, when I think about embeddings, chat completions, and re-ranking, they will just be available. Yes, you can bring your own if you want to, but it will also just take care of that if you want.
Now, from a pricing perspective, we're paying for the compute instances.
You're going to pay for the optional standbys.
You're going to pay for the size of the database.
And then, you're going to pay for these backups.
So, those are your cost dimensions.
There isn't a separate IOPS or throughput price. It's baked into the cost of the database storage.
Now, if we jump over and actually look at the pricing page, you can see what it covers. It kind of showed that as we created the instance, but you pay for the number of vCPUs, which obviously includes the amount of memory.
And then, I pay for the amount of storage the database uses, and I pay for the amount of backup. And then, I would pay for whatever the Foundry models cost. So, if it's managing the models, it doesn't charge you an additional cost for managing them, but you would still be paying for the Foundry cost of whatever models it goes and creates to do that management.
Now, something you do have to consider when you think about the pricing is sizing the environment correctly.
Because think about it now. There's one copy of the data.
I'm not having separate copies across, for example, different replicas.
Remember, it's offloading a lot of tasks now to these storage fleets. So, more of the vCPUs, more of the memory in the compute instances can just go to doing the work.
So, when you think about sizing it, don't just think, "Hey, my PostgreSQL instance is this size today, this number of cores, this amount of memory. I need the same for HorizonDB." It might be you can have smaller instances, but achieve the same amount of performance. You're going to want to pay attention to that as part of that exercise.
And so, that was it. I really just wanted to talk about what it was. So, HorizonDB is a cloud-native, 100% PostgreSQL-compatible database, and because of that cloud-native architecture, it's solving many of the scale, performance, and availability challenges we normally see.
Now, this is early days. It is in preview. I would keep an eye on documentation as more features will get added as it gets closer to GA, as it hits GA, then past GA.
For now, you're not going to want to use this in production. It's preview.
You could definitely start experimenting and then think about where it may fit in your architectures, especially where you are seeing challenges today, as you think about that scale, the performance, the availability, and this may well be your solution.
Hope that's useful. Till the next video, take care.