PiD (Pixel Diffusion Decoder) is an AI model that performs 4K upscaling while simultaneously restoring image details, unlike traditional upscalers that merely enlarge images. It operates in pixel space rather than latent space, allowing it to reconstruct unclear faces, textures, and details during the upscaling process. The key parameter 'degrade_sigma' controls restoration strength: lower values (0.1) preserve original image fidelity while higher values (0.5-0.75) enable more aggressive detail reconstruction. PiD works best with SD3 and Flux latents in ComfyUI, and while it excels at restoring blurry images, it may introduce artifacts like plastic textures due to its LCM sampling algorithm.
Deep Dive
PiD in ComfyUI: 4K Upscale + Real Repair, How PiD Rebuilds Blurry Images in 4K
Let’s first look at a really interesting model from NVIDIA called PixelDiT. Sounds powerful, right?
But when we look at the images it generates, the texture, the naturalness of the details, and the overall quality are not especially impressive. There are even some obvious problems. So a lot of people may ask: why are we talking about a model whose results are not that good?
Actually, the main focus today is not this model itself. What we really want to talk about is another model in the same technical direction, called PiD. The most attractive thing about PiD is that it can upscale images to 4K while also having very strong image restoration ability.
Let’s look directly at the results from PiD first. These images are close to 4K level. You’ll notice that it is not simply stretching the image larger. The character’s skin, hair, clothing texture, and background lighting all get a lot of details reconstructed. This kind of result no longer feels like traditional sharpening. It feels more like the model has re-understood the image and then regenerated it at a higher resolution.
There are already many tools that can do 4K upscaling today. The part that makes PiD truly worth studying is its very strong restoration ability. With a normal upscaling model, if the original image is blurry, it will usually still be blurry after upscaling. It does not really have restoration ability. But PiD is different. During the upscaling process, it can reconstruct unclear faces, textures, and other details from the original image.
Next, let’s go through some basic theory so you can understand more clearly what these two models are actually doing. First, you need to know that both PixelDiT and PiD are related to one term: pixel space. When we talk about generation models, we usually have two spaces. One is called pixel space and the other is called latent space. In most workflows, we operate in latent space. The model first generates in latent space and then uses a VAE to decode that latent into an image. The benefit is that it is fast and low-cost. The problem is that the VAE decoding step may lose details, especially textures, text edges, and small objects. These can easily become compressed and not clean enough.
PixelDiT takes a more aggressive approach. It is a DiT diffusion model that generates images directly in pixel space. Its goal is to bypass the loss caused by latent compression and let the model directly process pixel-level details. Since the image quality generated by this model is not very high at the moment, its real significance is that it explores a possible direction.
So the model we’ll focus on is the one below: PiD. PiD also works in pixel space. So what is the full name of PiD?
It stands for Pixel Diffusion Decoder. That means it is not a text-to-image model. It is a latent decoding and upscaling model. It directly uses pixel diffusion to turn latent data into a high-resolution image. You can understand it like this: PixelDiT is a text-to-image model in pixel space, while PiD is a decoding and upscaling model in pixel space. PixelDiT tells us that pixel-space diffusion is a direction that can work. PiD applies that direction to something more practical: high-resolution decoding, 4K upscaling, detail reconstruction, and image restoration.
ComfyUI officially supports this model now. You can go to the Comfy-Org repository on Hugging Face. On that page, the file locations are explained very clearly. If you want to use the PixelDiT text-to-image model, you need to download that file. You also need to download a text encoder. If you want to run a PiD workflow, there will be more model files, because PiD has different versions depending on the base model and the target size. For example, for Flux 1 there is a 512-to-2048 version and a 1024-to-4096 version. Flux 2 and SD3 also have corresponding versions. There is one important point to note: PiD is not currently a universal image upscaler. This set of models from Comfy-Org is mainly used with SD3 and Flux latents.
Now let’s briefly break down the workflow. For easier demonstration, I also built the workflow on RunningHub. In the ComfyUI space, RunningHub is a really great online workflow platform, because whenever new models or new extensions appear, it usually follows up very quickly. RunningHub also provides an online community platform for ComfyUI creators, where you can find a lot of creative inspiration and example workflows.
Let’s first look at the PixelDiT text-to-image workflow. Here, the model we load is a BF16 model. The text encoder is a Gemma 2 encoder, and the CLIP type must be set to PixelDiT. For the positive prompt and negative prompt, you can write whatever you want. For the VAE Loader, we choose pixel_space. Pay attention here: in order to keep the overall workflow structure visually consistent, Comfy-Org still keeps the concept of latent here. But you can remember one simple sentence: PixelDiT does not rely on a traditional VAE to generate images. For sampling, just follow the official setup. Finally, we decode it and get an image. Of course, this part is mainly for you to understand the idea.
Now let’s look at the basic PiD workflow. This is not a text-to-image workflow. It is an upscaling workflow. So first we start with a reference image, and then we encode it. During encoding, we do use a VAE. But pay attention: right now, the supported formats are only Flux and SD3. Here, I’m using Flux 1. When choosing the PiD model, you need to choose based on your upscaling goal. Here, I’m using the Flux 1 model that upscales from 1024 to 4096, and it is a four-step upscaling model. The encoder is unchanged. It is still Gemma 2. The CLIP type still needs to be set to PixelDiT. For the prompt, I wrote something very simple: best quality.
Here, there will be two VAEs. The first one is the Flux 1 VAE, which is used to encode the image into latent space. Then we run sampling. After sampling, when we decode the result, we no longer use that VAE. Instead, we use what we call pixel space.
Now let’s look at the sampling. The sampling process in this PiD workflow is quite short because it only uses four steps. The sampling algorithm is LCM and CFG is 1. So you’ll notice that the speed is very fast. But because it uses LCM sampling, some images may carry that common LCM-style texture. For example, the skin may look a little plastic, and some details may look overly clean or almost like they were redrawn. So PiD is not a mindless replacement for every upscaling model. It does have real restoration ability, but it also has its own style and limitations.
In this workflow, there are two very important nodes. The first one is called Context Window. The second one is called PiD Conditioning.
Let’s start with Context Window. This node is designed for high-resolution image generation. A 4K image is already very large. The model cannot easily process the entire image at once the same way it processes a 1024 image. So it splits the large image into windows and processes them in parts. This is what we call the context window. Since my target resolution is 4096, I set the context size to 2048 and the overlap to 512. The fuse method is pyramid. What kind of mode is this?
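To make the windowing idea concrete, here is a minimal Python sketch along a single axis, using the settings from the walkthrough (target 4096, context size 2048, overlap 512). The function names, the rule for placing windows, and the triangular weighting used for "pyramid" are all assumptions made for illustration. They are not taken from the ComfyUI Context Window node, whose actual implementation may differ.

```python
def window_starts(total: int, size: int, overlap: int) -> list[int]:
    """Start offsets of windows that cover [0, total) along one axis.

    Hypothetical placement rule: step by (size - overlap) and clamp the
    last window to the far edge so every pixel is covered.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be smaller than the window size")
    if size >= total:
        return [0]  # one window already covers everything
    stride = size - overlap
    starts = list(range(0, total - size, stride))
    starts.append(total - size)  # clamp the last window to the far edge
    return starts


def pyramid_weights(size: int) -> list[float]:
    """Triangular weights: small at the window edges, largest in the middle.

    Assumption: this is one plausible reading of a "pyramid" fuse method.
    """
    half = (size + 1) // 2
    ramp = [(i + 1) / half for i in range(half)]
    return ramp + ramp[::-1][size % 2:]


def fuse(total: int, size: int, overlap: int, tiles: list[list[float]]) -> list[float]:
    """Blend per-window outputs back into one row by weighted averaging.

    Each tile is expected to have length min(size, total).
    """
    acc = [0.0] * total
    weight_sum = [0.0] * total
    for start, tile in zip(window_starts(total, size, overlap), tiles):
        for i, (value, w) in enumerate(zip(tile, pyramid_weights(len(tile)))):
            acc[start + i] += w * value
            weight_sum[start + i] += w
    # Every position is covered by at least one window, so weight_sum > 0.
    return [a / s for a, s in zip(acc, weight_sum)]


print(window_starts(4096, 2048, 512))  # [0, 1536, 2048]
```

Under this sketch's placement rule, a 4096-wide axis needs three windows, so a square 4096 image would be processed as a 3 x 3 grid. Because the weights fall off toward each window's edges, pixels in an overlap are dominated by whichever window has them closer to its centre, which is why overlapping and blending tends to hide seams.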
You can simply understand it as splitting the large image into blocks first and then combining those blocks back together. We use this mechanism in many places. For example, when we talk about tiling, the underlying idea is basically the same thing.
The second node is called PiD Conditioning. This node understands how to process the input latent. There are two key points here. The first is latent format, meaning which VAE generated this latent. You can choose Flux or SD3. The second key point is degrade_sigma. This parameter controls how much the input latent is degraded. That sounds a bit abstract, so let me say it more directly. The smaller degrade_sigma is, the more conservative PiD becomes. It is more like upscaling while respecting the original image, with only light detail repair. The larger degrade_sigma is, the more aggressively it repairs and reconstructs the image. But at the same time, consistency with the original image may also decrease. So essentially, this parameter is controlling a trade-off.
Now let’s demonstrate it. First, look at this image. The original image itself is not very clear, especially the character’s face. After upscaling, the details are not stable enough. If we use a normal upscaling model, it will most likely enlarge the blurry areas along with everything else. So first, let’s set degrade_sigma to 0.1. This is the value used in the official workflow. Now let’s look at the result. The image has indeed been upscaled to 4096, but the face is still not clear enough. It feels more like the original image was enlarged rather than truly restored. So 0.1 does not work well here.
Now let’s go directly to 0.5 and sample it again. You can see that the change is much more obvious now. The character’s face becomes much clearer. The eyes, the edges of the facial features, and the skin details are all reconstructed. This is the very interesting part of PiD. It does not only do super-resolution. It regenerates usable details at a high resolution. But when the restoration strength goes up, the model starts guessing information that did not exist clearly in the original image. If it guesses correctly, that is restoration. If it guesses wrong, it changes the identity, changes the facial features, or changes the texture. So for many portrait restoration cases, 0.5 is a practical starting point.
Next, let’s look at a more extreme example. This image is basically almost unusable. The noise is very heavy, the details are very broken, and the facial information is very poor. First, let’s set degrade_sigma to 0.5. You’ll notice that PiD can indeed restore the overall image. The scene becomes clearer and the character outline becomes more stable. But pay attention: the details are still not ideal, especially the character’s face and eyes. So in this situation, you can consider increasing the restoration strength further. For example, I set it to 0.75. Now sample it again, and you’ll see that the facial details become much more natural. The image also looks more like a complete high-quality picture. So the correct way to use PiD is not to push every image to a high value like 0.75. Instead, you should first judge what the original image looks like, decide how much restoration you want, and then adjust this parameter.
Finally, let’s talk about usage suggestions. If you just want to quickly upscale a clear image, PiD may not always be the best solution. Traditional super-resolution, Ultimate SD Upscale, SeedVR, or other upscaling tools may be more stable. PiD’s advantage is not that it is the most stable. Its advantage is that it can both upscale and restore. If the original image quality is poor, PiD becomes more valuable than a normal upscaling model, especially when the face, clothing texture, or background details are missing. In those cases, you can increase the restoration parameter. But I still recommend not pushing it very high right from the beginning.
Another point is the LCM sampling algorithm we mentioned.
When generating some details, LCM itself can have certain weaknesses. For example, it may lean toward a plastic texture, and some details may be polished too smoothly. You also need to keep this in mind during sampling.
To summarize: PixelDiT itself is a pixel-space text-to-image model with strong research value. But if we directly use it to generate images, the results are not especially good. So its greater value is that it points to a possible direction. In practice, we are more likely to use the PiD model. For normal ComfyUI users, you only need to remember three things. First, PiD can do 4K upscaling. Second, PiD can do more than 4K upscaling; it can also do restoration. Third, the way we control the restoration strength is through degrade_sigma. This is the core parameter. The smaller the value, the more conservative it is. The larger the value, the stronger the restoration becomes. But consistency is also more likely to be damaged.
Alright, that’s all for today. So what are you waiting for? Go try it yourself. Follow me and become someone who truly understands AI.
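If you want to try the degrade_sigma advice in a structured way, the small Python sketch below collects the three values quoted in the walkthrough (0.1, 0.5, 0.75) and returns them in the order the demos tried them, lowest first. The condition labels and the helper name are hypothetical, chosen only for this example. They are not ComfyUI options, and the values are starting points from the demos rather than documented limits.

```python
# degrade_sigma values quoted in the walkthrough. The condition labels are
# hypothetical names for this sketch, not settings that exist in ComfyUI.
QUOTED_SIGMA = {
    "clear": 0.1,               # value used in the official workflow
    "blurry": 0.5,              # practical start for soft portraits
    "severely_degraded": 0.75,  # heavy noise and broken details
}


def sigma_ladder(condition: str) -> list[float]:
    """Values to try in order, from most conservative up to the quoted
    value for this condition, so strength is raised only when needed."""
    cap = QUOTED_SIGMA[condition]
    return sorted(v for v in QUOTED_SIGMA.values() if v <= cap)


for condition in QUOTED_SIGMA:
    print(condition, sigma_ladder(condition))
```

Stepping up the ladder and stopping at the first result that looks right follows the recommendation not to push the value very high from the beginning, since higher values are more likely to change identity or texture.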