PGQ is a zero-bloat Postgres queue implementation that uses snapshot-based batching and truncate-based table rotation to avoid the "queue spiral of death" that affects traditional queue systems built on FOR UPDATE SKIP LOCKED. Unlike conventional queues, which accumulate dead tuples and degrade over time, PGQ maintains three rotating tables and uses snapshot diffs to identify new jobs, resulting in zero dead tuples and stable performance under load. The trade-off is 50-150ms latency due to its 10-tick-per-second architecture, making it suitable for most use cases but not for millisecond-level dispatch requirements.
Deep Dive
Zero Bloat Postgres Queue | Scaling Postgres 416
Thankfully, I've never had to deal with queue issues that much. Most of my larger clients have a queue system set up outside of Postgres. Maybe it's Redis or some other technology. And up to a certain scale, you can run a queue in Postgres. For example, my current SaaS app uses the Rails Active Storage library, which is essentially a Postgres queue using FOR UPDATE SKIP LOCKED, etc. But what if you wanted to keep your queue in Postgres, but you want to push the limits of what's possible when doing things like FOR UPDATE SKIP LOCKED?
Well, we're going to talk about that today.
But I hope you, your friends, family, and co-workers continue to do well.
Our first piece of content is actually a repository, and this is Nik's repository from Postgres FM.
And just recently, he released this tool called PGQ that he describes as a zero-bloat Postgres queue built on top of Skype's battle-proven PGQ: one SQL file to install, pg_cron to tick. So, the first notable thing about this is that it's not an extension.
If you look at the code distribution, it's basically a lot of PL/pgSQL files, but you basically clone the repo and run a SQL file to install it.
You need to have Postgres greater than 14 and have something that calls the PGQ ticker periodically. So, you could use pg_cron.
So, apparently, you can set up like a 1-second tick, and then it reticks every 100 milliseconds, so 10 ticks per second, basically.
So, that means this is available essentially everywhere. You don't have to install an extension to get started.
But what really makes this interesting is the implementation. So, it does not use FOR UPDATE SKIP LOCKED.
And you can see this cool graphical representation where it's showing dead tuples rising with all the other queuing systems, whereas PGQ essentially has no dead tuples because it never updates or deletes.
What it does is use snapshot-based batching.
So, it uses snapshots to determine which transactions are new within the queue table and ready to process. And then it uses a truncate-based rotation method for tables. So, it cycles through three tables at a time.
So, that results in zero bloat and no performance decay, and it's built for heavily loaded systems, just like the Skype PGQ architecture was.
It has all the same Postgres guarantees, and it even works on managed Postgres on cloud vendors.
Now, there is one trade-off for this, and that's latency because he says, quote, "PGQ ticks 10 times per second by default."
So, jobs are going to have a 50 to 150 millisecond latency.
And I think for a vast majority of use cases, that's perfectly fine, but it kind of depends on what you're using your queue for.
He said you could maybe reduce the latency by doing 20 ticks per second.
And he said there might be a way to push it even more, but if you need single millisecond dispatch, PGQ's not the right tool for that use case.
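To put rough numbers on the tick rates mentioned above, here is a minimal Python sketch. It only models how long a newly inserted job waits for the next tick, assuming jobs arrive at uniformly random times; it does not model consumer polling or processing time, so it is not a derivation of the 50 to 150 millisecond figure. The function names are made up for illustration and are not part of PGQ.

```python
from typing import Tuple


def tick_interval_ms(ticks_per_second: int) -> float:
    """Time between two ticks, in milliseconds."""
    if ticks_per_second <= 0:
        raise ValueError("ticks_per_second must be positive")
    return 1000.0 / ticks_per_second


def next_tick_wait_ms(ticks_per_second: int) -> Tuple[float, float, float]:
    """(best, average, worst) wait for the next tick.

    Assumes a job is inserted at a uniformly random moment between ticks,
    so the average wait is half the tick interval.
    """
    interval = tick_interval_ms(ticks_per_second)
    return (0.0, interval / 2, interval)


if __name__ == "__main__":
    for rate in (10, 20):
        print(rate, "ticks/s ->", next_tick_wait_ms(rate), "ms")
```

Under those assumptions, doubling the tick rate from 10 to 20 per second halves the worst-case wait for a tick from 100 to 50 milliseconds.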
But if you want to be stable under load without bloat, that's what PGQ was built to do.
Now, the next blog post related to this is "PGQ: two snapshots and a diff." This is from thebuild.com.
And of course, he's talking about PGQ.
He talks about how current queues are typically made. You basically insert into some sort of jobs table, then you use a CTE with FOR UPDATE SKIP LOCKED to identify the jobs you want to process.
Then you update that jobs table to indicate which jobs were pulled off, and eventually you probably delete those rows. But once you start scaling this up, you hit what he says Brandur called the queue spiral of death. A long-running transaction of some sort holds back the global xmin, so autovacuum can't reclaim the dead tuples. The jobs table grows, index lookups slow down, and then the inserts slow down because the indexes are full of dead entries.
And then the SKIP LOCKED scan slows down because of all the extra tuples it has to sort through. That causes the throughput to drop, then the backlog grows, and it's a never-ending cycle, or a death spiral. PGQ uses a different technique. Like I mentioned, it maintains three tables. It looks like you can specify independent queues, each queue has its own set of three tables, and it just cycles through them and truncates a table once it's done. So there are never any deletes or updates to it.
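The three-table rotation just described can be pictured with a small in-memory Python sketch. This is a toy model, not PGQ's code: the class and method names are hypothetical, plain lists stand in for tables, and `clear()` stands in for TRUNCATE. It also assumes consumers have already finished with a table before it is reused.

```python
from typing import List


class RotatingQueue:
    """Toy model of one queue backed by three rotating 'tables'."""

    def __init__(self) -> None:
        self.tables: List[List[str]] = [[], [], []]
        self.current = 0  # index of the table receiving inserts

    def insert(self, event: str) -> None:
        # The only write on the hot path is an append (an INSERT).
        self.tables[self.current].append(event)

    def rotate(self) -> None:
        # Move on to the next table, emptying it first. Clearing the whole
        # list plays the role of TRUNCATE: no per-row delete, no dead rows.
        nxt = (self.current + 1) % 3
        self.tables[nxt].clear()
        self.current = nxt


if __name__ == "__main__":
    q = RotatingQueue()
    for event in ("a", "b", "c"):
        q.insert(event)
        q.rotate()
    print(q.tables)
```

After three rotations the first table has been emptied and is ready for new inserts, while the other two still hold their events for any consumer that needs them.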
And the ticker (described here as running once per second, although from the documentation it looked like it's actually 10 times per second) populates a table that looks like this, and it's basically storing a snapshot for each tick.
So it's basically xmin, xmax, and the list of in-progress transaction IDs at that moment in time. When a consumer, meaning someone connecting to the queue to see what jobs are available, runs, it basically takes the last tick and the current tick to determine whether there are any transactions that need to be processed. So you can see here that it's looking for rows that are visible in the snapshot for the current tick but not visible in the last snapshot.
So, the only changes that happened to this table are the inserts of the new jobs coming in. The only consumer hot path is just simply a select to say, "Hey, is anything available for me within this snapshot range?"
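To make the two-snapshot diff concrete, here is a minimal Python sketch of the visibility rule. It is an illustration under assumptions, not PGQ's implementation: the `Snapshot` class and `batch_between` function are hypothetical names, rows are modeled as (transaction ID, payload) pairs, and every transaction is assumed to commit eventually (aborted transactions are ignored).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """What a tick stores: xmin, xmax, and the in-progress transaction IDs."""
    xmin: int
    xmax: int
    in_progress: FrozenSet[int]

    def sees(self, txid: int) -> bool:
        # Below xmin: finished before the snapshot, so visible.
        if txid < self.xmin:
            return True
        # At or above xmax: started after the snapshot, so not visible.
        if txid >= self.xmax:
            return False
        # In between: visible unless it was still running at snapshot time.
        return txid not in self.in_progress


def batch_between(prev: Snapshot, cur: Snapshot,
                  rows: Iterable[Tuple[int, str]]) -> List[str]:
    """Payloads visible in the current tick but not in the previous one."""
    return [payload for txid, payload in rows
            if cur.sees(txid) and not prev.sees(txid)]


if __name__ == "__main__":
    prev = Snapshot(xmin=100, xmax=103, in_progress=frozenset({101}))
    cur = Snapshot(xmin=104, xmax=106, in_progress=frozenset())
    rows = [(100, "a"), (101, "b"), (102, "c"), (104, "d"), (106, "e")]
    print(batch_between(prev, cur, rows))
```

In this made-up data, transaction 101 was still running at the previous tick, so its row is picked up in this batch together with the newer row from transaction 104. Transaction 106 started after the current tick, so its row waits for a later batch.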
And he, of course, goes into more detail about the implementation, also talking about the truncate. And essentially, the disadvantage of this is the latency involved with picking up each job.
But he says that everyone's been setting up a similar queue system using FOR UPDATE SKIP LOCKED, whereas this is an innovative approach from circa 20 years ago.
Basically, it's what the name of this blog post says: taking two snapshots and doing a diff between them to see what has changed in a queue that needs to be processed.
So, check this out if you're interested.
Another blog post related to this is "Potential consequences of using Postgres as a job queue." This is from richyen.com, and he talks about the typical queue using FOR UPDATE SKIP LOCKED and some problems related to that.
And he mentions some alternatives, like using advisory locks instead, or the PGQ that Skype developed, which is no longer really maintained. But we, of course, have the new PGQ. You could also use Redis or Kafka as well.
Next piece of content, maybe pgBackRest isn't dead. So, there's been a maintenance update added to the pgBackRest repository, where the maintainer, David, said his inbox blew up, especially with messages from people or organizations who have pgBackRest users of their own to support.
They would prefer the project to continue with him as the primary maintainer.
And he said he didn't think this was going to happen when he was trying to do fundraising earlier, but now, based upon this influx of support, it appears, quote, "all but certain that I will be able to secure enough funding to continue the project."
And I'm assuming that means as the primary maintainer.
And that this will be supported by a coalition of sponsors, as well as potentially bringing on another maintainer to distribute the workload and provide continuity in the future.
And this is what I kind of suspected might happen because if there are organizations relying on this tool and recommending it, they're probably going to step in to try and save the project or alternatively choose another solution.
So, this is definitely an interesting outcome. Check this out if you're interested. Next blog post related to that is "PGX backup: continuity support for pgBackRest." This is from thebuild.com.
And apparently they have forked pgBackRest to make PGX backup.
That's because David's request was that if it's going to be forked, it should be renamed. And it lists the support that PGX will be providing for this backup solution. But I wonder if this will be necessary based upon the previous article. We'll just have to see.
Next piece of content, "MultiXact members at 64 bits: one last wraparound to worry about." This is from thebuild.com.
And there's been an enhancement in Postgres 19 setting the multixact offset to 64 bits.
Now, this is not the multixact ID.
That still has the same limit, but these are the members. And I believe we covered a blog post a number of months ago where there was an organization that hit the multixact members limit.
And they were doing things like tracking disk space to monitor how large that was getting. But apparently it's been updated to 64 bits.
So, that should no longer be a problem.
But, he said one downside of this is that when you upgrade, if you run pg_upgrade, it's going to take a while to run this type of migration due to the size change.
But, if you want more insight into this, you can definitely check out this blog post.
Next piece of content, it depends using session variables in Postgres. This is from pgedge.com.
And he's talking about how you can use variables in Postgres, and it's basically using session variables.
First, he looks at what MySQL does. You can just use a set command to set a variable and then use it directly in queries.
With SQL Server, it's a little bit more formal, but you can declare variables the same way.
In Oracle, it has its own syntax as well.
Now, if you use psql, you can set psql client-side variables to do this type of thing.
But, if you're trying to do it within an application, that won't work because it's only available through psql.
So, you can set session variables, and as long as you include a dot in the name of the variable, Postgres will accept it as a custom variable. So, you can create a variable and set it to anything you want, as you can see here.
And then you can show it. But, the problem is you can't use these in queries because these are essentially commands: SET, SHOW, RESET, etc. But, what you can do is use functions that work with these session variables. So, you can use set_config to set one, or use current_setting to get it.
So, he shows various different examples of using that.
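As a rough illustration of how an application might use those two functions, here is a small Python sketch that only builds the query text and parameters. The helper names and the `myapp.user_id` setting are hypothetical, and the `%s` placeholders assume a psycopg-style driver; nothing here is taken from the blog post itself. `set_config(name, value, is_local)` and `current_setting(name, missing_ok)` are the Postgres functions being called.

```python
from typing import Any, Tuple

Query = Tuple[str, Tuple[Any, ...]]


def set_variable_query(name: str, value: str, is_local: bool = False) -> Query:
    """Build a query that sets a custom session variable.

    Custom setting names need a dot in them, for example 'myapp.user_id'.
    is_local=True limits the value to the current transaction.
    """
    if "." not in name:
        raise ValueError("custom setting names need a dot, e.g. 'myapp.user_id'")
    return ("SELECT set_config(%s, %s, %s)", (name, value, is_local))


def get_variable_query(name: str) -> Query:
    """Build a query that reads a session variable.

    Passing true as the second argument makes current_setting return NULL
    instead of raising an error when the variable has not been set.
    """
    return ("SELECT current_setting(%s, true)", (name,))


if __name__ == "__main__":
    sql, params = set_variable_query("myapp.user_id", "42")
    print(sql, params)
    sql, params = get_variable_query("myapp.user_id")
    print(sql, params)
```

Each pair would be handed to something like `cursor.execute(sql, params)` on the same connection, because the variable lives in that session. Since `current_setting` is an ordinary function, it can also appear inside a WHERE clause, which is what the SET and SHOW commands can't do.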
So, if you want to learn more about this, definitely check out this blog post.
Next piece of content, Cybertec's contributions to PostgreSQL 19. This is from cybertec-postgresql.com.
And a lot of these have been covered in previous episodes of Scaling Postgres, but I did want to say that the technique being used for REPACK CONCURRENTLY is a result of working with the people at Cybertec who developed pg_squeeze, because they were using logical replication to keep the old table and the new table in sync and then do the changeover. So, thanks to all their work to get this new feature into Postgres.
Hopefully, it'll make it into Postgres 19. They also mentioned various documentation usability enhancements, as well as getting Postgres to run on all the different features of Debian operating systems. So, if you want to learn more, definitely check this out.
Next piece of content, PG Keeper, building the bouncer we needed for Postgres. This is from figma.com.
And this is yet another connection pooler for Postgres, and Figma decided to build their own. So, they were using PgBouncer for a time, but they were hitting limits with regard to that.
Mainly, it is single-threaded, so I understand that is a nuisance.
They also couldn't prioritize any particular traffic. They wanted to have better connection management logic, and they wanted some better operational tooling, essentially.
They did also evaluate PgCat, which did give them a multi-threaded pooler, but between all the different observability features they needed, feature flagging, and admission control, they felt there would be too many changes to make, so they built PG Keeper instead.
So, if you want to learn more about the tool that they built, as well as its rollout, definitely check out this blog post.
Next piece of content, Christophe's seven rules of disaster response. This is from thebuild.com.
And the first one is wind your watch, which, to his point, is basically take a deep breath and take your time to assess the situation. Don't react too quickly. The second is no one on the call who doesn't have something to do. So, basically a small, focused team. Three is have a final decision maker.
Don't have a bunch of developers sitting around on the call wondering what to do.
You definitely need to have a decision maker there.
Next is analyze, don't argue.
So, basically rely on the data to lead you where it leads as opposed to trying to be right, for example.
Next is point and call. Explicitly say, "All right, our plan is to do A and then we are going to do A now." Explicitly call out what you're doing.
Next is appoint someone to deal with senior management. So, designate someone for communications so that management's kept up to date, but ideally keep them off the call.
And lastly, no postmortem until after the corpse is cold. Basically, wait for the issue to be resolved and then you can examine what went wrong and why.
So, I thought these were great guidelines to cover.
And the last piece of content, how are committers selected? This is from vander.me.
And he goes through how Postgres committers are selected and chosen for the project in terms of who does it and the process and general criteria. So, check this out if you're interested.
I hope you enjoyed this episode. Be sure to check out scalingpostgres.com where you can find links to all the content discussed as well as sign up to receive weekly notifications of each episode. There you can find an audio version of the show as well as a full transcript. Thanks.
I'll see you next week.