Alpha-to-coverage is a GPU hardware feature that approximates alpha-blending using MSAA (multi-sample antialiasing). It's essentially an anti-aliased version of alpha-testing, and a cheap alternative to order-independent transparency (OIT) techniques.
This article is an interactive visualization based on Anti-aliased Alpha Test: The Esoteric Alpha To Coverage by Ben Golus. It is based on my own understanding, so it may not be entirely correct!
The source of this demo is on GitHub!
There are several components to this demo/visualization. They are demonstrated in various "presets", each with an accompanying section of the article. The resolution of the visualization defaults to one where you can clearly see the resulting pixels. At any time, you can adjust any of the options, including the resolution: the "size" slider goes from 1×1 to 8192×8192 (limited by your canvas resolution).
This is a visualization of a "bush" made of "leaves" using my understanding of the alpha-to-coverage technique described by Golus.
And this is the aliased version of the same thing — using alpha-testing instead of alpha-to-coverage.
Just for fun, this shows the bush rendered using alpha-blending without any kind of order-independent transparency. The leaves are painted in entirely the wrong order.
Now, let's dive into some scenes that show how alpha-to-coverage works. We'll do this by rendering a few transparency-gradients on top of each other. This scene renders one white quad (on a black background) with an alpha gradient from 0% at the top to 100% at the bottom...
... and this scene plops a black quad on top, this time with an alpha gradient from 0% on the left to 100% on the right.
Here we can see the "ideal" result for this scene. However, alpha-blending is order-dependent, so this only works because we have drawn our geometry in the correct order. Instead, we would like to avoid blending, so that we can rely on the depth test to "sort" our geometry for us.
A simple alpha-test (keeping a fragment if its alpha is at least 50%, and discarding it otherwise)... sort of works, but the quality (in a scene like this) is extremely low.
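In WGSL, an alpha test is just a discard in the fragment shader. Here's a minimal sketch (the binding names and the 50% cutoff are illustrative, not necessarily what this demo uses):

```wgsl
@group(0) @binding(0) var leafTexture: texture_2d<f32>;
@group(0) @binding(1) var leafSampler: sampler;

@fragment
fn alphaTestMain(@location(0) uv: vec2f) -> @location(0) vec4f {
  let color = textureSample(leafTexture, leafSampler, uv);
  // Binary decision: every fragment is either fully kept or fully discarded.
  if (color.a < 0.5) {
    discard;
  }
  return vec4f(color.rgb, 1.0);
}
```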
But we're using MSAA! We don't have to deal in binaries — if this were real geometry, we wouldn't have to answer "yes"/"no" for whether each pixel is covered by the geometry. Instead, each of the samples in each pixel would have its own coverage bit. Let's zoom in...
Here, we can see each individual sample in our 4xMSAA render target. We have quite a bit more resolution to work with. In the fragment shader (which runs just once per pixel), we can actually output a "sample mask" which tells the GPU which individual samples to keep/discard — without having to run the fragment shader once for every sample (which is also possible). We can see that the alpha-test is essentially producing either a sample mask of 0000 or 1111. But we could output 1, 2, or 3 samples as well...
... which is exactly what alpha-to-coverage will do for us! Here, we see our first real alpha-to-coverage algorithm, the one used by some NVIDIA GPUs.
(Note the samples are arranged somewhat arbitrarily; that's OK, because all the samples for a pixel get averaged together in the end anyway.)
When we average out the samples, we get results that look like this. That's sort of a gradient!
Now we can zoom out and see the higher-resolution result.
Well... it's alright, but it still looks pretty chunky. After all, with 4 bits, there are only 5 values we can work with (0, 1, 2, 3, or 4 samples covered). Some GPU architectures improve on this by applying dithering patterns that repeat over 2×2 or even 4×4 pixel areas, giving them 16 or 64 coverage bits, respectively, to spread the alpha value across.
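To make the mechanism concrete, here is a hedged WGSL sketch of this kind of per-pixel alpha-to-coverage, written by hand via the sample_mask builtin: quantize alpha to a sample count (only 5 possible values, hence the banding) and set that many bits. Real hardware chooses which samples to set in its own order; this is not any vendor's exact pattern.

```wgsl
struct FragmentOutput {
  @location(0) color: vec4f,
  // One bit per MSAA sample; the GPU ANDs this with the rasterizer's coverage.
  @builtin(sample_mask) mask: u32,
}

// Hypothetical per-pixel alpha-to-coverage: only 5 possible results
// (0, 1, 2, 3, or 4 covered samples), which is why the gradient bands.
fn coverageFromAlpha(alpha: f32) -> u32 {
  let count = u32(round(clamp(alpha, 0.0, 1.0) * 4.0));
  return (1u << count) - 1u; // set the `count` lowest sample bits
}

@fragment
fn fragmentMain(@location(0) color: vec4f) -> FragmentOutput {
  var out: FragmentOutput;
  out.color = vec4f(color.rgb, 1.0);
  out.mask = coverageFromAlpha(color.a);
  return out;
}
```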
Let's look at a different GPU architecture. This is the 2×2 dithered alpha-to-coverage pattern used by some Apple GPUs. Much better! But notice that the gradient is cut off at the diagonal — in the "ideal" version, this wasn't the case.
A different visualization can help us understand why this happens. Here, we are looking at a single solid blue quad at varying alpha values. This lets us see how the architecture translates alpha values into exact bitmask patterns.
Note in particular how, on this architecture, once a sample "pops in", it never "pops out". This means that for two alpha values a1 and a2, if a2 > a1, then a2 will always fully cover up a1. This is not what would be expected with blending!
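A generic 2×2 ordered-dither sketch (in WGSL, and not Apple's exact pattern) shows why: for any fixed pixel, raising alpha can only add covered samples, never remove them, so the mask for a higher alpha is always a superset of the mask for a lower one.

```wgsl
// Hypothetical 2x2-dithered alpha-to-coverage (illustrative offsets, not any
// vendor's real pattern). Neighboring pixels round the same alpha to different
// sample counts, so a 2x2 block of pixels (16 sample bits) can show more
// coverage levels than a single pixel can. For a fixed pixel the result is
// monotonic in alpha: samples "pop in" as alpha rises and never "pop out".
fn ditheredCoverage(alpha: f32, pixelPos: vec2u) -> u32 {
  var offsets = array<f32, 4>(0.0, 0.5, 0.75, 0.25); // 2x2 ordered-dither offsets
  let offset = offsets[(pixelPos.y % 2u) * 2u + (pixelPos.x % 2u)];
  let count = u32(clamp(floor(alpha * 4.0 + offset), 0.0, 4.0));
  return (1u << count) - 1u; // set the `count` lowest sample bits
}
```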
Some architectures get a bit fancier to produce somewhat more blending-like results. Here is the algorithm used by some AMD GPUs. Note how it divides alpha into more than 17 steps (29, to be exact), even though it only has 16 sample bits to work with over a 2×2 area. At the extra steps, it "moves" coverage bits around, seemingly arbitrarily, without changing the overall coverage percentage.
But this fixes the problem we observed previously! Now, some of those more-transparent a1 samples show through under less-transparent a2 samples.
Effectively, this technique somewhat randomizes the mixing of nearby colors, using slight differences in alpha (that would be barely perceptible) as a source of "randomness". It's not perfect, and it has some odd artifacts, but it does put us much closer to the blended result.
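I don't know AMD's actual rule, but the flavor of the idea can be sketched (very loosely, in WGSL) as: keep the quantized coverage count, and let the finer bits of alpha decide which samples receive it.

```wgsl
// Highly hypothetical, NOT AMD's real pattern: the coverage *count* still comes
// from quantizing alpha, but small alpha differences rotate *which* of the 4
// samples are covered, so overlapping draws with slightly different alphas
// don't always land on the exact same samples.
fn movedCoverage(alpha: f32) -> u32 {
  let count = u32(round(clamp(alpha, 0.0, 1.0) * 4.0));
  let base = (1u << count) - 1u;            // `count` low bits set
  let rot = u32(floor(alpha * 16.0)) % 4u;  // leftover alpha picks a rotation
  // Rotate the mask within the 4 sample bits.
  return ((base << rot) | (base >> ((4u - rot) & 3u))) & 0xFu;
}
```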
What if we had even more bits to play with? Here's the pattern used by some Qualcomm GPUs. It's 4×4 (so 64 sample bits), yet it has (just) 18 steps.
And here's how it renders! Pretty cool! Still some artifacts, but they easily disappear in a real scene. (The "blurry foliage" preset below will show how this actually works nicely in practice!)
This circular gradient scene shows the banding/dithering in a more natural scenario. Here on NVIDIA, with banding...
... here on Apple, with 2×2 dithering (AMD looks about the same because there is only one draw call here)...
... here on Qualcomm, with 4×4 dithering...
... and here on your own GPU's native alpha-to-coverage algorithm (which may or may not be the same as one of the above).
Finally, a slight variation on the initial demo.
By default, the foliage demo uses a 1px feathered edge to produce an antialiased but sharp edge. But at full resolution, the foliage demo also looks neat with blurry-edged leaves, shown here.
Try out various emulated devices — they make a subtle difference! Note in particular how both AMD and Qualcomm (especially Qualcomm) are brighter toward the center of the plant where there are a lot more overlapping leaves. This is because their algorithms do a better job preserving coverage of one leaf before drawing the next one. (I believe this technically comes at the cost of some depth-testing accuracy due to the randomization, but it's not really noticeable.)
In the zoomed-in views, each sample is drawn as a small circle. Inside each circle, we textureLoad from the corresponding sample index and display that. Outside of those circles, we show the color from the single-sampled resolve target (which averages the 4 samples), or the grid lines.
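Reading one specific sample of a multisampled texture in WGSL looks roughly like this (the names are illustrative, not the demo's actual code):

```wgsl
@group(0) @binding(0) var msaaTexture: texture_multisampled_2d<f32>;

// Fetch a single sample of a single pixel from the 4xMSAA render target.
fn loadSample(pixel: vec2u, sampleIndex: u32) -> vec4f {
  return textureLoad(msaaTexture, vec2i(pixel), i32(sampleIndex));
}
```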
The emulated devices don't rely on the hardware's native alpha-to-coverage at all: they output coverage via the sample_mask builtin instead, with alpha set to 1. The sample mask is generated by a function which carefully emulates the behavior of some tested hardware. Figuring out this function is the hard part...
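So an emulated device's fragment shader looks roughly like this sketch, where emulatedCoverage is a stand-in name for the generated, device-specific function:

```wgsl
struct EmulatedOutput {
  @location(0) color: vec4f,
  @builtin(sample_mask) mask: u32,
}

// Stand-in for the generated, device-specific coverage function; the real one
// applies the detected block pattern and thresholds.
fn emulatedCoverage(alpha: f32, pixelPos: vec2u) -> u32 {
  return (1u << u32(round(clamp(alpha, 0.0, 1.0) * 4.0))) - 1u;
}

@fragment
fn emulatedFragment(
  @location(0) color: vec4f,
  @builtin(position) position: vec4f,
) -> EmulatedOutput {
  var out: EmulatedOutput;
  // Alpha is forced to 1 so hardware alpha-to-coverage and blending can't
  // interfere; the emulated pattern goes entirely into the sample mask.
  out.color = vec4f(color.rgb, 1.0);
  out.mask = emulatedCoverage(color.a, vec2u(position.xy));
  return out;
}
```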
This tool captures the alpha-to-coverage behavior of the current device and then attempts to generate code to emulate it, by detecting the block size and the exact thresholds between steps. Click on the "Generate an emulator for this device" button to run the generator.
This preset will show the generated emulator in the visualization, and also show your native device as a small dot inside the big one, for comparison. If you can still see the small dot after clicking the button, the generated emulator did not successfully capture your device's behavior. If that happens, or if your device has a pattern different from any of the ones I have provided, please submit your result!