Translation: Multisampling Anti-Aliasing unter der Lupe [Archive]

KillerCookie

2004-01-03, 00:54:18

Page 1/8 = done... I'll continue with page 3, so page 2 is still up for grabs.

Multisampling Anti-Aliasing under the magnifying glass

May 22nd, 2003 / by aths / page 1 of 8

Introduction

Today we would like to take a closer look at a special topic: multisampling anti-aliasing.
The fundamentals are covered at length in an older article, which also devotes several pages to multisampling.

Some points that matter for this topic are briefly touched on again here, though without going deeper into the earlier explanations. We hope that this article can be understood without any accompanying material; otherwise, we refer you to our archive.

Today's topic isn't dry theory but the way actual, purchasable hardware works. First we look at the multisampling method of the GeForce3, GeForce4 Ti and GeForce FX. Later we'll turn to the multisampling of the Radeon (Radeon 9500 and higher).

The list shows that all modern cards use multisampling to implement anti-aliasing. The method must therefore have its advantages, and an important part of this article is dedicated to them.

The raster process: How geometry turns into graphics

When the graphics card receives a triangle located in an imaginary space, it first projects the triangle onto the screen. The triangle then has only 2D coordinates (x and y). (The Z coordinates for depth are still stored as well.)

The triangle setup determines which screen lines the triangle covers: it examines all three vertices, starts rendering at the smallest Y value and proceeds to the largest Y value. Because the X values of the vertices are still known, it can also work out, for every line, at which X position the triangle starts and ends in that line.

The chosen description is only meant to give a basic overview of the raster process, because multisampling builds on it. The decoupling of the triangle setup from the pipeline, introduced with Early Z at the latest, is not considered here.

[Picture: Example: a triangle covers 4 lines.]

The pipeline "knows" which textures are to be applied to the triangle because the texture stages have been initialized. For every vertex of the triangle, a texture coordinate is also passed along for every texture. This coordinate is adjusted line by line and then pixel by pixel, so that the pipeline samples the right place in the texture for each pixel. A perspective correction takes place along the way, since a linear interpolation of the texture coordinates is not sufficient. Modern graphics cards presumably already have dedicated logic that calculates the perspective-corrected texture coordinate directly from the pixel coordinate within the triangle.
The Z values, which are still stored, are processed in the same way, so that the depth is also known per pixel.

Now all the data required to texture the pixel is available. The TMU of the pipeline receives the (perspective-corrected) texture coordinates and returns the texture sample. Depending on the filter setting, 4 or 8 individual texels are needed for this pixel. This is described in detail in our filter article.

We have only mentioned the points that are important for anti-aliasing. Let it be said that a future article will pick up here, but now back to the topic of multisampling anti-aliasing.

[Picture: This is what the rasterized triangle looks like: information is lost at the edges.]

As the image shows, inaccuracies inevitably occur during rasterization. This applies not only to the handling of geometry but also to texturing, which we will discuss later.

First, the edges: how can we get these inaccuracies under control?

Right, and now I'm off to get a good long sleep :D :zzz:

Greets Maik

Nerothos

2004-01-03, 01:11:35

Without reading your whole translation:

I think the title should be changed. "Anti-Aliasing under the magnifying glass" sounds a bit strange to me. "Anti-Aliasing in detail" or "a closer look on AA" or sth. like that sounds a lot better (in my opinion) =)

Maybe that's just because German is my native language *ggg*

KillerCookie

2004-01-03, 02:38:00

"magnifying glass" may sound a bit odd, but it is the exact translation... and the topic really is being put under the magnifying glass :D. So there are still 6 pages up for grabs.

Greets Maik

CannedCaptain

2004-01-03, 15:00:27

Page 2 is mine!

KillerCookie

2004-01-03, 21:58:56

Page 3/8 = done... I'll continue with page 4... (this is a lot of fun *ocl*...) so, until the next page. (Is anyone going to proofread???)

Page 3:

Multisampling anti-aliasing under the magnifying glass

May 22nd, 2003 / by aths / page 3 of 8

GeForce multisampling: The fast way

Multisampling means sampling only one texture value per pixel while generating several subpixels. In other words: supersampling renders the whole picture at a higher resolution, whereas multisampling calculates only the edges at a higher resolution, not the textures. Because sampling takes up most of a texel pipeline, an extension of the pipeline makes it possible to calculate several multisampling AA samples with a single pipeline. That is how it's done. So how do the NVIDIA chips work? Our example refers to the GeForce4 Ti and GeForce FX. The differences from the GeForce3 are explained later.

The job of the triangle setup is to supply the pixel pipelines with work. In the GeForce example, the triangle setup works at roughly double the resolution for multisampling AA. Two triangle lines are generated per screen line. The higher resolution applies not only to the lines but to the columns as well. An example:

For efficiency reasons, the triangle setup on modern graphics cards works with blocks instead of lines. On the GeForce, such a block is 2 x 2 pixels in size. The triangle is now rendered block by block rather than line by line as shown on page 1. Incidentally, the Early Z test is also applied to such 2 x 2 blocks. Our block has 4 pixels, so it stands to reason that each of the 4 GeForce pipelines is responsible for one pixel in the block.

[Picture: In this block, one quarter of the bottom-right pixel is covered by the triangle.]

Each pipeline gets only one pixel. In this example, one pipeline is left without work because one of the pixels lies outside the polygon. Waste like this is unavoidable at edges. A line-based architecture that processes 4 x 1 blocks would have even more waste. In general, waste and inefficiency grow as the rendered triangles become smaller. Elaborate geometry with many curves produces many small triangles, so the triangle setup should be optimized for this.

It should be added at this point that the blocks are created in steps. First the triangle is cut into coarse segments, for example to obtain data from which the Z values are calculated; then it is cut into finer segments, which go to the Early Z test. Finally, the fine segments are rendered if they are at least partially visible. The LOD calculation for MIP mapping is done per 2 x 2 block rather than per pixel. That is accurate enough and saves computing time.

The reason for generating blocks is that rendering efficiency is considerably higher than with line-by-line processing.
The optimization targets full burst lines in order to use the full memory interface, and with block-wise rendering the waste is considerably smaller than with line-based rendering. In addition, the cache hit rate during rendering is considerably better, which saves memory bandwidth.

Now let's end this excursion into optimization strategies and follow the subpixels through a pipeline.

In the end, the pipeline produces one texture sample that applies to all subpixels. With multisampling there is a frame buffer for each subpixel. Since in our example only one subpixel lies within the polygon, our texture sample is written to only one frame buffer, the one that corresponds to the subpixel at the top right. There are two reasons why "empty" subpixels can appear in a pixel: either they are not part of the polygon, or the Early Z test shows that they are covered.

[Picture: Each subpixel has its own frame buffer. These are also called "multisample buffers".]

Multisample buffers can be used for more effects than just AA, but that would require more than only 4 buffers. The problem there is not RAM capacity but performance. Incidentally, it is irrelevant to the function whether the buffers lie "side by side" or are "interleaved". In other words: the function is not affected in any way by whether rendering goes into several separate buffers or into one enlarged buffer with special addressing.

Right, the next page will follow tonight or tomorrow...

Greets Maik

KillerCookie

2004-01-04, 03:58:13

Yaaawn... page 5/8 = done... 03:57? I must be crazy ;D. So that was the third stroke (and the fourth follows at once :D), so I'll go on and do page 4/8 as well. (edit: I had mixed up two pages, 4 and 5 :))

Page 5/8:

Multisampling anti-aliasing under the magnifying glass

May 22nd, 2003 / by aths / page 5 of 8

Framebuffer compression: Applied where it pays off

The compression that ATI markets with the R300 and Nvidia with the GeForce FX is not really compression, because it saves no RAM, only bandwidth. Framebuffer compression is much simpler than Z compression. It exploits the fact that, with multisampling, all subpixels within a pixel have the same color value (provided the pixel lies completely inside the polygon). Radeon 9500 and higher, and GeForce FX 5600 and higher, write the color into the framebuffer only once, together with a flag saying that the color applies to all subpixels. If this pixel is later covered by a polygon edge during rendering, the chip writes the color samples into the respective framebuffers as usual.

How exactly framebuffer compression is implemented is unclear. Two basic approaches are conceivable: 1. every pixel inside a polygon gets a flag such as "the color in multisample buffer number 0 applies to all subpixels", or 2. a tile-based (or burst-line-based) compression is used. With "render to texture" there is generally no compression. As far as we know, color compression is only ever available together with multisampling on both ATI and Nvidia hardware, and it compresses at most by the AA level (with 4x AA, at most 4:1).
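The first of the two conceivable approaches can be sketched in a few lines of Python. This is purely a thought model: the article states that the real implementation is unknown, so the class `CompressedPixel`, its methods and the idea of counting "values transferred" are all assumptions made for illustration. The Z test is left out entirely.

```python
class CompressedPixel:
    """Hypothetical per-pixel flag scheme: one color stands in for all subpixels."""

    def __init__(self, color, num_samples=4):
        self.num_samples = num_samples
        self.uniform = True          # flag: "buffer 0 applies to all subpixels"
        self.samples = [color]       # only one value while the flag is set

    def write(self, color, coverage):
        """coverage is a list of booleans, one per subpixel."""
        if all(coverage):
            # Pixel lies fully inside the polygon: store the color once.
            self.uniform = True
            self.samples = [color]
            return
        if not any(coverage):
            return
        if self.uniform:
            # A polygon edge crosses the pixel: expand to individual samples.
            self.samples = self.samples * self.num_samples
            self.uniform = False
        for i, covered in enumerate(coverage):
            if covered:
                self.samples[i] = color

    def values_transferred(self):
        """Color values that would have to cross the memory bus for this pixel."""
        return len(self.samples)


pixel = CompressedPixel((10, 10, 10))
interior_cost = pixel.values_transferred()          # flag set: a single value
pixel.write((90, 90, 90), [False, True, False, False])
edge_cost = pixel.values_transferred()              # edge pixel: all subpixels
```

Note that the model only reduces the number of values moved, not the memory reserved, which matches the statement that no RAM is saved.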

As shown before, the frame rate increases noticeably when Z compression is used with multisampling. Color compression is the next logical step and again increases performance markedly, especially at 4x and higher. ATI markets the feature as part of the HyperZ III package, in other words as part of the performance-enhancing architecture. Nvidia presents color compression as part of its image-quality package and calls it Intellisample. The GeForce FX 5200 supports multisampling anti-aliasing (MSAA) but has no color compression. How much additional performance Intellisample HCT (NV35 and higher) brings is unknown.

The "framebuffer compression" feature is definitely necessary to exploit the full potential of multisampling. Without any loss of quality, performance increases most where there are many subpixels, in other words exactly where every performance boost is needed. All graphics cards that support multisampling but not this color pseudo-compression have an AA technology that stops halfway.

Multisampling alone saves only fill rate, but thanks to the bandwidth-saving features, anti-aliasing becomes really fast. Which drawbacks do we have to accept in return?

Textures, pixel shaders and multisampling: Where there is light, there is also shadow.

The pipeline fetches one texture value and writes it into the framebuffer for all 4 subpixels. During downfiltering, all 4 colors are read and mixed. Logically, nothing changes when 4 identical colors are mixed with each other. Textures are therefore not altered by multisampling.

[Picture: This pixel belongs entirely to the triangle. The same color value is written 4 times into the multisample buffers, and these 4 colors are mixed during downfiltering. Result: the texel is exactly the same as without AA.]

The Z test, however, is carried out for each subpixel. This is absolutely necessary. Suppose two triangles intersect each other. If the intersection edge is to be smoothed as well, the test has to be run per subpixel; otherwise there is no smoothing there. This can be seen clearly on the Parhelia, which smooths only the outer edges of polygons but not intersection edges.

It remains to explain what happens to textures at edges with multisampling. Remember: for every triangle, one color is sampled per pixel, and this color is then used for every subpixel that belongs to the polygon. This means that the subpixels do not get their "real" color. This matters mainly at edges, where it can happen that a texture value is sampled from a position that does not lie on the polygon. In DirectX 9 it is possible to avoid this, but only with pixel shader 3.0: with a flag ("centroid"), the sample position is corrected so that it lies within the polygon. Today's graphics cards have to live with pixel colors at polygon edges that are sampled from texture areas not located on the polygon.
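The idea behind the "centroid" flag can be illustrated with a small Python sketch. It is a simplified model, not a description of any particular chip: the subpixel positions in `SAMPLE_POSITIONS` and the function `shading_position` are assumptions for this example, and real hardware is free to pick a different position inside the covered area.

```python
# Assumed subpixel positions inside one pixel (0..1), for illustration only.
SAMPLE_POSITIONS = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def shading_position(coverage, centre_covered):
    """Return where the single texture sample for this pixel is taken.

    coverage: one boolean per subpixel, True if it belongs to the polygon.
    centre_covered: True if the pixel centre itself lies on the polygon.
    """
    if centre_covered:
        return (0.5, 0.5)            # normal case: sample at the pixel centre
    covered = [p for p, c in zip(SAMPLE_POSITIONS, coverage) if c]
    if not covered:
        raise ValueError("pixel is not touched by the polygon at all")
    # Centroid case: move the position to the middle of the covered subpixels,
    # so that it lies within the sampled part of the polygon.
    n = len(covered)
    return (sum(x for x, _ in covered) / n, sum(y for _, y in covered) / n)


# Edge pixel: only one subpixel is covered and the centre is outside.
edge_position = shading_position([False, True, False, False], False)
```

Without the flag, the position would stay at (0.5, 0.5), which in this edge case lies outside the polygon.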

With pixel shaders (version 1.3 and higher, supported by Radeon 8500 and higher, GeForce4 Ti and higher, and Parhelia) it is possible to set not only a new pixel color in the shader but also a new Z value. This is required to render Z-correct bump mapping. Bump mapping uses per-pixel lighting effects to give a flat polygon the appearance of roughness. This "bumpiness" is only an imitation, of course; the polygon itself stays flat. That leads to unexpected results when the polygon is intersected by another one. For the occlusion boundary to match the simulated roughness, the pixel shader that performs the bump mapping has to modify the Z value, because that value is used for the occlusion test.

This Z value applies to all subpixels. Put simply: occlusion-correct bump mapping is now possible with pixel accuracy, but not with subpixel accuracy.

So multisampling methods have drawbacks of their own. This had to be said, of course.

PS: I found the last part rather difficult, so there may be more mistakes than usual in it...

Greets Maik

KillerCookie

2004-01-04, 18:26:57

Hello,
@all 3D gurus and people who are very good at English: will someone proofread???

Regards, Maik

zeckensack

2004-01-05, 02:39:13

Originally posted by Jason15
Hello,
@all 3D gurus and people who are very good at English: will someone proofread???

Regards, Maik
*raises hand*
Tomorrow. I can't make sense of anything today anyway *eg*

FZR

2004-01-05, 02:52:28

Originally posted by Jason15
...
3. "Kleine" Karten - Fluch oder Segen?
http://www.3dcenter.de/artikel/2003/09-23_a.php
...
Regards, Jason

I'd be willing to translate that one, if nobody has started on it yet.

aths

2004-01-05, 09:39:25

Originally posted by Jason15
Multisampling Anti-Aliasing under the magnifying glass
Please don't translate headings literally. I'll comment on the headings of my articles later (I have to go to university now). This much up front: "Unter der Lupe" means a close-up look, for example "closeup view".

Originally posted by Commander Larve
Without reading your whole translation:

I think the title should be changed. "Anti-Aliasing under the magnifying glass" sounds a bit strange to me. "Anti-Aliasing in detail" or "a closer look on AA" or sth. like that sounds a lot better (in my opinion) =)
"In detail" is "reserved" for certain articles :) For me, an "in detail" ("im Detail") article means, first, starting from scratch and, second, really going into detail. The MSAA article is more of a "supplementary article".

CannedCaptain

2004-01-05, 16:51:08

Originally posted by derbis
I'd be willing to translate that one, if nobody has started on it yet.

it's yours!

aths

2004-01-05, 17:46:46

Title: "Multisampling Antialiasing — A Closeup View"

Please use the real apostrophe, not ´ or the like. Since there can be problems with HTML, simply use this code: ' (the forum of course displays it as an apostrophe straight away; the code is: ') The text can be written with ' for now; before it is sent to Leo, Search&Replace would then have to turn it into '.

Lost Prophet

2004-01-14, 12:42:14

page 1
proofreading (by someone else) is still pending, otherwise finished

____________________

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 1 of 8

Introduction

Today we try to get to the bottom of a rather special topic: multisampling anti-aliasing. The basics were described in an older, untranslated article. Additional points concerning this topic are explained only briefly here, but we still hope that the article is comprehensible.

This topic is by no means just theory; it concerns the way actual, purchasable hardware works. First we examine the multisampling procedures used by GeForce3, 4 Ti and FX; later we'll look at the multisampling of Radeon (Radeon 9500 or higher).

On virtually all modern cards anti-aliasing is realised via multisampling. Thus this method must have certain advantages, and analysing these makes up a big part of this article. But nothing's for free, and the disadvantages, which might not be anticipated beforehand, are discussed as well.

The raster-process: How geometry turns into graphics

The first thing a graphics card does when it receives a triangle, which is located in an imaginary space, is to project it onto the screen-surface. Hence this triangle now possesses only 2D-coordinates, X and Y. (The Z-coordinate for depth is still saved alongside, though.)

Now the triangle-setup determines in which lines the triangle is located, by checking all three vertices. Then it starts rendering with the smallest Y-value, and works its way through to the largest Y-value. Since the X-values of the vertices are still known, it can be determined for each line, at which X-value the triangle begins and ends.
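As a rough illustration of this line-by-line process, the following Python sketch computes for each screen line where a triangle begins and ends. The function name `scanline_spans` and the use of integer Y values as line positions are assumptions made for this example; real hardware uses its own fill rules and fixed-point arithmetic.

```python
import math


def scanline_spans(v0, v1, v2):
    """Return {y: (x_start, x_end)} for every integer line the triangle touches."""
    verts = [v0, v1, v2]
    y_min = math.ceil(min(y for _, y in verts))
    y_max = math.floor(max(y for _, y in verts))
    edges = [(verts[i], verts[(i + 1) % 3]) for i in range(3)]
    spans = {}
    for y in range(y_min, y_max + 1):
        xs = []
        for (xa, ya), (xb, yb) in edges:
            if ya == yb:
                if ya == y:
                    xs.extend([xa, xb])      # horizontal edge lying on this line
                continue
            if min(ya, yb) <= y <= max(ya, yb):
                t = (y - ya) / (yb - ya)     # how far along the edge this line is
                xs.append(xa + t * (xb - xa))
        spans[y] = (min(xs), max(xs))
    return spans


spans = scanline_spans((0, 0), (4, 0), (0, 4))
```

For the example triangle, the span shrinks from the full width at the top line down to a single point at the last line.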

The chosen illustration should just give a general overview of the raster-process, because multisampling is based on it. The decoupling of triangle-setup and pipeline, which was probably introduced with Early Z (at the latest), is not taken into account.

Picture: [Example: A triangle covers 4 lines.]

By initialising the texture stages, the pipeline "knows" which textures have to be applied to the triangle. Now with every vertex of the triangle a texture-coordinate for every texture is passed on. So that the pipeline samples the right point from the texture for each pixel, the coordinate is adjusted accordingly; first line-by-line, then pixel-by-pixel. This includes a perspective correction, because a linear interpolation of the texture-coordinates isn't sufficient. Modern graphics cards supposedly have special logic which calculates the perspective-corrected texture-coordinate directly from the pixel-coordinates within the triangle. The Z-values are processed as well, so that depth is also known per pixel.

Now all the data that is required to texture the pixel is available. The TMU of the pipeline gets the (perspective-corrected) texture-coordinates and returns the texture-sample. Depending on the filter-settings, 4 or 8 single texels are usually needed.

We only mentioned the points that are important for anti-aliasing. Rest assured that a future article will continue from this point, but now back on topic: multisampling anti-aliasing.

Picture: [That's what the rasterised triangle looks like: information-loss at the edges.]
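Why a linear interpolation is not sufficient can be seen in a small Python sketch. It assumes the common textbook approach of interpolating u/w and 1/w linearly in screen space and dividing afterwards; whether a given chip works this way is not stated in the article, and the function names are made up for the example.

```python
def linear_interp(u0, u1, t):
    """Naive screen-space interpolation of a texture coordinate."""
    return (1 - t) * u0 + t * u1


def perspective_correct_interp(u0, w0, u1, w1, t):
    """Interpolate u/w and 1/w linearly, then divide to recover u.

    w0 and w1 are the depth-like divisors of the two end points,
    t is the position between them on the screen (0..1).
    """
    inv_w = (1 - t) / w0 + t / w1
    u_over_w = (1 - t) * u0 / w0 + t * u1 / w1
    return u_over_w / inv_w


# End point 0 is near (w = 1), end point 1 is three times as far away (w = 3).
naive = linear_interp(0.0, 1.0, 0.5)
corrected = perspective_correct_interp(0.0, 1.0, 1.0, 3.0, 0.5)
```

At the screen-space midpoint the naive value is 0.5, while the corrected value is smaller, because the near half of the surface takes up more than half of the span on screen.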

As the image above shows, inevitably, inaccuracies occur. This does not only concern the geometry, but also the texturing, as discussed later.

First about the edges: How can we gain control over these inaccuracies?

_______________

edit: revised

cya, axel

aths

2004-01-15, 00:23:19

Originally posted by Purple_Stain
Today, we would like to scrutinise a rather special topic: multisampling anti-aliasing.
Scrutinise? Nobody understands that. How about "try to get to the bottom of"? In any case, please use "normal" words wherever possible. The translation is also meant for people who know English only as a foreign language.

Lost Prophet

2004-01-15, 01:12:42

Originally posted by aths
Scrutinise? Nobody understands that. How about "try to get to the bottom of"? In any case, please use "normal" words wherever possible. The translation is also meant for people who know English only as a foreign language.

well, I haven't proofread all the articles, but the last one, for example, had "under closest scrutiny" in it. the word isn't highbrow, you hear it in real life too. I also don't think I write incomprehensible English; at the level we're operating on with these topics (MSAA in this case) I find it perfectly fitting. IMO the word fits, but well, what can I say, you're the author.

I'll get started on the second page.

cya, axel

edit: CC hasn't posted page 2 yet, so I'll do no. 3 from jason15

edit3: questions removed, see above

Lost Prophet

2004-01-15, 03:47:21

page 2
proofreading (by someone else) is still pending, otherwise finished

_________________________

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 2 of 8

Oversampling: The easiest way

The triangle-setup only knows 2 different states per pixel: covered by the triangle or not. That's where a digitisation error comes into play. Actually, the pixels on the triangle-edge are just partially covered, but this is ignored. If the centre of the pixel is covered, the entire pixel is counted as a part of the triangle.
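The all-or-nothing decision at the pixel centre can be sketched as follows. The sketch uses the widely known edge-function test; the function names are made up for this example, and counting a centre that lies exactly on an edge as covered is an arbitrary simplification, since real hardware applies a tie-breaking rule.

```python
def edge(a, b, p):
    """Signed area test: which side of the line a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def point_in_triangle(tri, p):
    a, b, c = tri
    e0, e1, e2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    # Accept both vertex orders: all values non-negative or all non-positive.
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)


def pixel_covered(tri, px, py):
    """A pixel counts as part of the triangle if its centre is covered."""
    return point_in_triangle(tri, (px + 0.5, py + 0.5))


triangle = ((0, 0), (4, 0), (0, 4))
inside = pixel_covered(triangle, 0, 0)
outside = pixel_covered(triangle, 3, 3)
```

Partial coverage does not appear anywhere in this test, which is exactly where the error described above comes from.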

One effect of this digitisation error is that "stair" steps appear where a smooth edge is supposed to be. In general, this effect is inevitable, because rendering targets a pixel raster (which, by the way, is very coarse compared to the accuracy of the geometry data). The visibility of these steps could be lessened, for example by taking into account how much of the pixel is covered. For instance, if a pixel was 50% covered, the triangle's colour-value would make up only 50% of the final on-screen pixel. But calculating the exact coverage and mixing the colours accordingly is far beyond today's hardware, so in practice the actual coverage is only approximated. Hence, every implementation of 3D anti-aliasing nowadays is an approximation.
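A minimal Python sketch of such an approximation: the coverage of a pixel is estimated by testing a small grid of points inside it, and the colours are mixed accordingly. The helper names, the regular n x n grid and the `inside` callback are assumptions made for this example only.

```python
def estimate_coverage(inside, px, py, n):
    """Estimate the covered fraction of pixel (px, py) from n x n test points.

    inside(x, y) must return True if the point belongs to the triangle.
    """
    hits = 0
    for j in range(n):
        for i in range(n):
            x = px + (i + 0.5) / n
            y = py + (j + 0.5) / n
            if inside(x, y):
                hits += 1
    return hits / (n * n)


def blend(foreground, background, coverage):
    """Mix two RGB colours according to the covered fraction."""
    return tuple(coverage * f + (1 - coverage) * b
                 for f, b in zip(foreground, background))


# A vertical edge through the middle of pixel (0, 0): the left half is covered.
coverage = estimate_coverage(lambda x, y: x < 0.5, 0, 0, 2)
colour = blend((255, 0, 0), (0, 0, 0), coverage)
```

With only 2 x 2 test points, the estimate can take just five different values (0, 1/4, 1/2, 3/4, 1), which shows how coarse such an approximation is.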

One way to achieve smoother edges is oversampling: the X- and Y-axes are scaled up by an integer factor, this enlarged frame is rendered, and afterwards it is filtered down to the original resolution again. This means that during downfiltering the information of several framebuffer-pixels goes into each final on-screen pixel. That way "softer" edges with colour-transitions are achieved.
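The downfiltering step can be sketched in Python as a simple box filter. The function name `downfilter` and the representation of the frame as a list of rows with one grey value per pixel are assumptions made for this example.

```python
def downfilter(image, factor):
    """Average each factor x factor block of the enlarged frame into one pixel."""
    height = len(image) // factor
    width = len(image[0]) // factor
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for dy in range(factor):
                for dx in range(factor):
                    total += image[y * factor + dy][x * factor + dx]
            row.append(total / (factor * factor))
        result.append(row)
    return result


# A 4x4 frame rendered internally for a 2x2 screen (2x2 oversampling).
enlarged = [
    [255, 255, 0, 0],
    [255, 0, 0, 0],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
screen = downfilter(enlarged, 2)
```

The upper-left screen pixel ends up with an intermediate value because three of its four internal pixels were covered, which is the colour-transition mentioned above.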

Picture: [2x2 oversampling: The internal resolution is higher.]
Picture: [After the downfiltering the edges are a bit smoother.]

Whenever possible the filter-kernel should cover whole pixels and not include fractions of pixels. A scaling by 1.5 x 1.5 for example also results in smoother edges, but the outcome isn't very good:

Picture: [Scaling by non-integer factors is disadvantageous.]

Supersampling: Better smoothing for the same expense?

Oversampling is a supersampling-method, but not vice versa. Also, supersampling can be made more efficient by repositioning the subpixels. Then terms like "Edge Equivalent Resolution" become important. Why the edge-smoothing can be improved through a different subpixel-mask is discussed later on in this article.

Such sophisticated supersampling needs special hardware-support though, while oversampling can also be realised via the drivers alone. In this case the driver does not take over the anti-aliasing calculations (this is not "software anti-aliasing"!), but guides the card through the steps required for a primitive supersampling procedure.

Distinguishing "oversampling" from "supersampling" is not the most common approach; one could simply distinguish between ordered grid supersampling (OGSS) and rotated grid supersampling (RGSS). But to us the extra term "oversampling" does not appear pointless, since it describes quite a different approach, namely just scaling the axes, while supersampling also allows methods which achieve a much higher-quality result at the same performance cost.
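How a different subpixel-mask can improve edge-smoothing at the same cost can be hinted at with a small Python sketch. The subpixel positions below are assumed values chosen for illustration, not the masks of any real card, and counting distinct X and Y offsets is only a simplified reading of the "Edge Equivalent Resolution" idea.

```python
# Assumed subpixel positions inside one pixel (0..1), four samples each.
OGSS_4X = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
RGSS_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def edge_equivalent_resolution(samples):
    """Count the distinct X and Y offsets of a subpixel-mask.

    A nearly vertical edge sweeping across the pixel can only change the
    coverage when it passes a new X offset, so more distinct offsets mean
    more intermediate steps for such edges (likewise for Y).
    """
    distinct_x = len({x for x, _ in samples})
    distinct_y = len({y for _, y in samples})
    return distinct_x, distinct_y


ogss_eer = edge_equivalent_resolution(OGSS_4X)
rgss_eer = edge_equivalent_resolution(RGSS_4X)
```

Both masks use four samples per pixel, but in this model the rotated mask offers twice as many distinct steps per axis for nearly horizontal and nearly vertical edges.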

The more advanced, so-called "rotated grid supersampling" (the name refers to the form of the subpixel-mask) could also be realised, with a bit of work, solely via the drivers, if the card supports multisample-buffering. But with this method the amount of geometry-calculations required is multiplied. Cards with multisample-buffering support usually also possess the special anti-aliasing hardware-support we talked about earlier.

Supersampling has the disadvantage of occupying a pipeline per pixel. With multisampling this isn't the rule.

_______

does that mean sparse SS can't be implemented on the driver side while RGSS can, or is there a contradiction between paragraphs 2 and 4 of the second half?

edit: reworked; put pages in order

cya, axel

Lost Prophet

2004-01-15, 12:21:29

page 3
proofreading (by someone else) is still pending, otherwise finished

_______

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 3 of 8

GeForce multisampling: Done the quick way

First of all, multisampling means sampling only one texture-value per pixel, although several subpixels are generated. In other words: while supersampling renders the whole frame in higher resolution, multisampling does that only for edges, not for textures. Because sampling makes up the biggest part of the texel-pipeline, extending it makes it possible to render several multisampling anti-aliasing samples in a single pipeline. So how exactly do the nVidia-chips proceed? Because they are more widespread, our example addresses GeForce4 Ti and FX; the differences from the GeForce3 will be covered later.

The purpose of the triangle-setup is to supply the pixel-pipelines with work. For multisampling anti-aliasing (MSAA) the triangle-setup of GeForce is done with double the resolution. Per screen-line two triangle-lines are generated. The higher internal resolution also applies to the columns, of course. An example:

For efficiency-reasons the triangle-setup of modern graphics cards doesn't work with lines anymore, but uses blocks instead. The GeForce uses blocks sized 2x2, which are also used for the Early Z-test. Such a block contains 4 pixels, so it's quite obvious that each of the 4 GeForce-pipelines is responsible for one of the pixels.

Picture: [In this block, one quarter of the bottom-right pixel is covered by the triangle.]

The pipelines now get a pixel each. In this example, one pipeline stays idle, because one of the 4 pixels is outside the polygon. Such waste is inevitable, but with a line-based architecture, which processes 4x1-"blocks", the overall waste would be even greater. Generally, the smaller the rendered triangles are, the more the waste and consequently the inefficiency increases. Complex geometry with many curves creates a lot of small triangles, thus it is worthwhile to optimise the triangle-setup especially for this kind of situation.
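The difference between 2x2 blocks and 4x1-"blocks" can be tried out with a small Python sketch. It simply counts pipeline slots that stay idle because their pixel lies outside the polygon. The function `idle_slots` and the test triangle are assumptions made for this example; a single example is of course no proof, and the result depends on the shape and alignment of the triangle.

```python
def idle_slots(covered_pixels, block_w, block_h):
    """Count pipeline slots without work when rendering block by block.

    Every block that contains at least one covered pixel occupies
    block_w * block_h pipeline slots; slots for uncovered pixels are idle.
    """
    blocks = {(x // block_w, y // block_h) for x, y in covered_pixels}
    return len(blocks) * block_w * block_h - len(covered_pixels)


# Pixels of a right-angled triangle with 8-pixel legs (centre-based coverage).
covered = {(x, y) for y in range(8) for x in range(8) if x + y <= 7}

idle_2x2 = idle_slots(covered, 2, 2)
idle_4x1 = idle_slots(covered, 4, 1)
```

For this triangle the 2x2 layout leaves far fewer slots idle than the 4x1 layout, because only the blocks along the diagonal edge are partially filled.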

At this point we have to mention, that the blocks are generated stepwise. To begin with, the triangle is divided into rough blocks, to obtain data for calculating the Z-values later on. Then, these are divided again into smaller blocks, which go through the Early Z-test, and if at least partially visible, are rendered afterwards. The LOD-calculation for MIP-mapping is done per 2x2-block as well, which is still accurate enough, and saves a bit of precious rendering power.
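A per-block LOD calculation might look like the following Python sketch. It uses the common approach of taking the differences of the texture-coordinates between neighbouring pixels of the block and selecting the MIP level from the larger step; this formula, the function name `block_lod` and the data layout are assumptions for illustration, not something the article specifies.

```python
import math


def block_lod(uv_block, texture_size):
    """Compute one MIP level for a whole 2x2 block.

    uv_block[row][col] holds the (u, v) texture-coordinates (0..1)
    of the four pixels; texture_size is the edge length in texels.
    """
    (u00, v00), (u10, v10) = uv_block[0]
    (u01, v01), _ = uv_block[1]
    # Distance in texels covered by one pixel step in X and in Y.
    step_x = math.hypot(u10 - u00, v10 - v00) * texture_size
    step_y = math.hypot(u01 - u00, v01 - v00) * texture_size
    rho = max(step_x, step_y)
    if rho <= 1.0:
        return 0.0                   # magnification: use the base level
    return math.log2(rho)


# Each pixel step moves two texels on a 256-texel texture.
step = 2 / 256
block = [[(0.0, 0.0), (step, 0.0)],
         [(0.0, step), (step, step)]]
lod = block_lod(block, 256)
```

Since neighbouring pixels within the block provide the differences directly, one calculation serves all four pixels.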

The main reason for using blocks is the increased rendering-efficiency in comparison to line-based processing. To make full use of the memory-interface it's optimised for full burstlines, and in comparison to line-based rendering, block-wise rendering reduces waste dramatically. In addition the cache hit rate is significantly better when using blocks, which also saves memory-bandwidth.

Enough of our excursion into possible ways of optimising this process; let's follow the subpixels through a pipeline.

The pipeline generates one texture-sample which applies to all subpixels. With multisampling, each subpixel has its own framebuffer. Since in our example only one subpixel lies within the polygon, the texture-sample is written only to the framebuffer which belongs to the subpixel in the upper right-hand corner. There are two possible reasons why "empty" subpixels can occur within the pixel: either they don't belong to the polygon, or the Z-test showed that they are covered.

Picture: [Each subpixel has its own framebuffer; these are also called multisample buffers.]

"Downfiltering" is reading the colour-values out of all four framebuffers and averaging them, to determine the final colour of the pixel. That's also how the smoothing effect is achieved.

Multisample buffers are capable of creating several effects other than anti-aliasing, although this would require a lot more than just 4 such buffers. What is lacking for this kind of application is performance; RAM isn't that much of an issue. It is irrelevant for the function whether the buffers are arranged side-by-side or interleaved. Thus, whether rendering goes into multiple buffers or into one big, specially addressed buffer doesn't affect the function in the slightest.
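The path of one pixel through the multisample buffers, from writing the texture-sample to downfiltering, can be sketched in Python. The function names, the use of plain lists as the four buffers and the "smaller depth wins" Z-test are assumptions made for this example.

```python
def write_fragment(colours, depths, coverage, colour, sample_depths):
    """Write one texture-sample into all covered subpixels that pass the Z-test.

    colours and depths are the multisample buffers of this pixel (one entry
    per subpixel); coverage marks which subpixels belong to the polygon.
    """
    for i, covered in enumerate(coverage):
        if covered and sample_depths[i] < depths[i]:
            colours[i] = colour
            depths[i] = sample_depths[i]


def downfilter(colours):
    """Average the colours of all subpixels to get the final pixel colour."""
    n = len(colours)
    return tuple(sum(c[k] for c in colours) / n for k in range(3))


# Four subpixels, initially black background at maximum depth.
colours = [(0, 0, 0)] * 4
depths = [1.0] * 4

# Only the upper right-hand subpixel (index 1 here) lies within the polygon.
write_fragment(colours, depths, [False, True, False, False],
               (200, 100, 40), [0.5, 0.5, 0.5, 0.5])
pixel = downfilter(colours)
```

The polygon contributes exactly one quarter of its colour to the final pixel, which is the smoothing effect at the edge.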

__________

isn't "twice as fine a resolution" actually 4x the resolution, or 2x the accuracy? (paragraph 2)

the note to jason15 is no longer relevant.

edit: revision; put pages in order. (copy & paste)

cya, axel

KillerCookie

2004-01-15, 20:45:40

Oh man... sorry about all the mistakes, the pages so far really weren't my finest work... I'll take a bit more time for page 6 and try to do better. Well... I've had a lot to do with English, but when you're immediately handed texts that are overflowing with information, it's quite a bit harder. So: next time it will be better :D

Regards, Maik

aths

2004-01-15, 22:49:21

Originally posted by Purple_Stain
edit2: aths, please answer the questions about links.
Always link when the article is available in English at the time of release :)

Lost Prophet

2004-01-15, 23:51:01

Page 4
proofreading (by someone else) is still pending, otherwise finished
__________

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 4 of 8

So what's the point? - Why multisampling is faster

The anti-aliasing quality of the GeForce3 wasn't groundbreaking; in fairness, this title belongs to Voodoo5. But the big advantage in favour of GeForce3 was the speed. We'll talk about quality later and turn to speed now, because this is the big advantage of multisampling anti-aliasing.

GeForce3 doesn't calculate a texture-value for every subpixel. This saves work, but relieves the fillrate more than the bandwidth. That's because the bandwidth saved comes mainly from texture-sampling, but the sampling is done from the cache anyway. And although it is sampled only once per pixel, the colour-value still has to be saved separately for every subpixel.
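The bookkeeping behind this can be written down as a rough Python model. It counts the operations for one pixel that lies fully inside a polygon, without any compression; the function `per_pixel_cost` and its categories are assumptions for illustration and say nothing about actual clock cycles or bytes.

```python
def per_pixel_cost(mode, samples):
    """Rough operation count for one fully covered pixel, without compression."""
    if mode == "supersampling":
        texture_samples = samples    # every subpixel gets its own texture-value
    elif mode == "multisampling":
        texture_samples = 1          # one texture-value shared by all subpixels
    else:
        raise ValueError("unknown mode: " + mode)
    return {
        "texture_samples": texture_samples,
        "colour_writes": samples,    # the colour is still stored per subpixel
        "z_accesses": samples,       # the Z-value is still tested per subpixel
    }


ssaa = per_pixel_cost("supersampling", 4)
msaa = per_pixel_cost("multisampling", 4)
```

In this model only the texture work shrinks with multisampling, while the framebuffer and Z traffic stay the same, which is why fillrate benefits more than bandwidth.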

The Z-test is done simultaneously for all subpixels in a pixel. Now here's the tricky part: the additional Z-fillrate required for anti-aliasing is saved again right away, but for now the Z-bandwidth remains as high as it was.

As a consequence, 4x anti-aliasing is quite slow, simply because there's not enough memory-bandwidth, despite certain optimisations by multisampling. The high anti-aliasing performance in comparison to the competition is mainly due to a lot of raw power. Though beaten by a GeForce2 Ultra in theoretical fillrate, the GeForce3 showed its true strength in 32-bit mode and outclassed all other 3D-gamer cards available at that point.

Unlike the GeForce2, the GeForce3 is able to make use of its 4x2 architecture in 32-bit mode as well; this additional power also affects the anti-aliasing speed, of course. Multisampling-technology is only one component of high anti-aliasing speed. Another important part is the handling of the aforementioned Z-bandwidth, which is used extensively for anti-aliasing. GeForce-cards newer than GeForce3 are capable of Z-compression, which already increases the speed in general, but pays off even more for multisampling anti-aliasing.

We won't withhold some numbers from you: we tested the Villagemark in 1400x1050x32 on a GeForce4, clocked at 300/350 (Core, RAM).

Table: [MSAA-performance]

This table shows two things. First, MSAA alone isn't sufficient for attractive anti-aliasing speeds. Second, in 4x MSAA mode Z-compression has more effect than Early Z, and that's especially interesting because the RAM of our GeForce4 is overclocked, which increases the available memory bandwidth. Even so, Z-compression has a big effect on saving bandwidth. Without anti-aliasing, Z-compression increases overall speed by only 3%, with 2x MSAA by 9%, and with 4x MSAA by a full 13%. The results scale differently depending on the benchmark, but the tendency is obvious: Z-compression is crucial for fast MSAA. The FX 5200 does not possess this feature and is therefore not a recommendable card for anti-aliasing.

Differences between HRAA and AccuView: What has been improved for GeForce4 Ti?

Anti-aliasing and anisotropic filtering have been marketed under a joint name since GeForce4: AccuView on GeForce4, Intellisample on GeForce FX, and Intellisample HCT on GeForce FX 5900. The AccuView technology was adopted unchanged, so quality hasn't changed. AccuView itself is based on the GeForce3's HRAA ("High Resolution Anti-Aliasing") but was improved. The changes concern speed as well as quality. First, quality: GeForce3 has somewhat more awkward subpixel positions:

Picture: [GeForce3 on the left, GeForce4 Ti (and FX) on the right.]

This diagram shows the positions of the subpixels (red), the position used to calculate the coordinates for the texture sample (blue), and the temporary lines in the triangle setup (green).

Why do GeForce4 and FX sample "better" than GeForce3? The average distance from the subpixels to the texel position (where the colour value is sampled, which then applies to all subpixels) is shorter than on GeForce3. Thus the average colour deviation per subpixel is smaller. In addition, the subpixels are nicely centred rather than shifted towards the upper left-hand corner, so the geometry sampling is easier to follow.
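The effect of subpixel placement on the average distance can be checked with a few lines of Python. This is only a sketch: the two masks and the texel position below are hypothetical values chosen for illustration, not the real GeForce3 or GeForce4 positions, which are only given in the picture.

```python
import math

# Hypothetical positions inside a unit pixel (0..1); NOT measured hardware values.
TEXEL_POSITION = (0.5, 0.5)
SHIFTED_MASK = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]   # pushed to the upper left
CENTRED_MASK = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def average_distance(mask, texel=TEXEL_POSITION):
    """Mean distance between each subpixel and the position the colour is sampled at."""
    return sum(math.dist(p, texel) for p in mask) / len(mask)


if __name__ == "__main__":
    print("shifted:", average_distance(SHIFTED_MASK))
    print("centred:", average_distance(CENTRED_MASK))
```

With these assumed positions the centred mask has the smaller average distance, which is the property the paragraph above attributes to the newer chips.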

There are also differences in performance. The Early Z test, which discards covered pixels, doesn't work on GeForce3 when MSAA is active. That's why performance drops significantly even with 2x MSAA. GeForce4 Ti is generally a bit reluctant when it comes to Early Z, but with newer drivers, tweak programs such as aTuner can activate this feature, and it then works with anti-aliasing as well.

There's another difference, although it only concerns the 2x mode in 3D fullscreen. While GeForce3 always uses conventional downfiltering, GeForce4 and later delay the filtering of the multisampling buffers until RAMDAC scanout. This saves read/write bandwidth for the downsampling buffer, but it only improves performance as long as the framerate doesn't drop below 1/3 of the display refresh rate. The NV35 (GeForce FX 5900) also uses this method with 4x MSAA; the break-even there is 3/5 of the refresh rate. But where does the NV35 get the power for 4x MSAA from?
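The two break-even figures can be reproduced with a simple traffic model. The model is an assumption, not something the article states: conventional downfiltering is taken to read all N multisampling buffers and write one downsampled buffer per rendered frame, plus one buffer read per display refresh, while filtering at scanout reads all N buffers on every refresh.

```python
from fractions import Fraction


def break_even_ratio(samples):
    """Framerate divided by refresh rate at which both methods move the same data.

    Assumed model (f = framerate, r = refresh rate, N = samples):
      conventional: f * (N + 1) + r     buffer transfers per second
      at scanout:   r * N               buffer transfers per second
    Setting both equal gives f / r = (N - 1) / (N + 1).
    """
    return Fraction(samples - 1, samples + 1)


if __name__ == "__main__":
    for n in (2, 4):
        print(f"{n}x MSAA: break-even at {break_even_ratio(n)} of the refresh rate")
```

Under this model, filtering at scanout is cheaper whenever the framerate is above the returned fraction of the refresh rate, which matches the 1/3 and 3/5 quoted for 2x and 4x.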
________

edit: revised; put the pages into the right order. (copy & paste)

cya, axel

Lost Prophet

2004-01-16, 01:58:34

page 5
check (by someone else) still pending, otherwise done

_____________________________

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 5 of 8

Framebuffer-compression: Pursue what's efficient

What ATi and nVidia market as compression on the R300 and GeForce FX isn't actually compression, because only bandwidth is saved and not a single bit of RAM. Framebuffer compression is very likely a lot simpler than Z-compression. With multisampling, all subpixels in a pixel have the same colour value (at least as long as the entire pixel lies within the polygon). Radeon 9500 and higher and GeForce FX 5600 and higher take advantage of this: in that case they write the colour into the framebuffer only once, along with a note that it applies to all subpixels. But if a visible edge crosses the pixel during rendering, the colour samples are written into the framebuffers the usual way.

How exactly framebuffer compression is implemented remains unknown. Basically there are two realistic approaches. The first is that a note is attached to each pixel that lies entirely within the polygon, saying "The colour value of multisampling buffer 0 applies to all subpixels". The second is a tile-based (or burst-line-based) architecture. With render-to-texture, no compression seems to take place at all. As far as we know, colour compression in both ATi and nVidia chips is only active together with multisampling and compresses by up to the level selected for anti-aliasing (i.e. up to 4:1 compression for 4x AA).
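The first approach (a per-pixel note) could look roughly like the following sketch. Everything here is hypothetical: the function names, the flag array and the way traffic is counted are illustrative assumptions, since the real implementations are not documented. The depth test is left out entirely.

```python
SAMPLES = 4  # 4x multisampling


def make_framebuffer(pixel_count):
    """Full-size storage for every subpixel: the scheme saves bandwidth, not RAM."""
    buffers = [[(0, 0, 0)] * pixel_count for _ in range(SAMPLES)]
    shared = [False] * pixel_count  # hypothetical note: "buffer 0 applies to all subpixels"
    return buffers, shared


def write_pixel(buffers, shared, index, colour, coverage):
    """Store one fragment and return how many colour values were written for it."""
    if all(coverage):
        # Pixel lies entirely inside the polygon: one colour plus the note.
        buffers[0][index] = colour
        shared[index] = True
        return 1
    if shared[index]:
        # An edge hits a compressed pixel: expand it first.
        # The traffic of this expansion is ignored in this sketch.
        for s in range(1, SAMPLES):
            buffers[s][index] = buffers[0][index]
        shared[index] = False
    writes = 0
    for s, hit in enumerate(coverage):
        if hit:
            buffers[s][index] = colour
            writes += 1
    return writes


def read_pixel(buffers, shared, index):
    """Return the colour of every subpixel, honouring the note."""
    if shared[index]:
        return [buffers[0][index]] * SAMPLES
    return [buffers[s][index] for s in range(SAMPLES)]
```

For a fully covered pixel this writes one colour instead of four, which corresponds to the "up to 4:1" figure for 4x AA; edge pixels fall back to per-subpixel writes.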

As shown before, the frame rate increases significantly when Z-compression is enabled. Colour compression is the next logical step and gives performance another boost, especially in 4x and higher modes. ATi markets this feature as part of the HyperZ III package, i.e. as part of the efficiency-increasing architecture, while nVidia adds it to its image-quality package "Intellisample". How much further Intellisample HCT (since NV35) boosts speed is still unknown. The FX 5200 is capable of multisampling anti-aliasing, but not of colour compression.

Framebuffer compression is absolutely essential to make full use of multisampling's potential. Speed increases without any loss of quality, especially when there are a lot of subpixels, a case where every extra bit of performance is very welcome anyway. Cards without colour compression offer only half of multisampling anti-aliasing.

Multisampling alone practically saves only fillrate, but thanks to these additional bandwidth-saving features, anti-aliasing gains a lot of speed. So what's the flip side of the coin?

Textures, pixel shaders and multisampling: Where there's light, there are also shadows.

The pipeline samples one texture value and writes it into the corresponding framebuffers for all four subpixels. Downfiltering reads out all the buffers and mixes the colours. Obviously nothing changes if you mix four identical colours. Hence, textures are left untouched by multisampling.

Picture: [The entire pixel belongs to the triangle. The same color-value is written into the multisampling buffers 4 times, and these 4 colors are mixed during downfiltering. Result: The texel is exactly as if there wasn't any anti-aliasing.]
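A minimal sketch of the downfiltering step, assuming a plain box filter (equal weight for every subpixel) and RGB tuples as colours; the function name is made up for this example:

```python
def downfilter(samples):
    """Mix the colours of all subpixels of one pixel with equal weights."""
    count = len(samples)
    return tuple(sum(colour[i] for colour in samples) / count for i in range(3))


if __name__ == "__main__":
    texel = (10, 20, 30)
    # Interior pixel: four identical colours, so the result is the texel itself.
    print(downfilter([texel] * 4))
    # Edge pixel: two subpixels belong to a red triangle, two to a black background.
    print(downfilter([(255, 0, 0), (255, 0, 0), (0, 0, 0), (0, 0, 0)]))
```

The first call shows why textures inside a polygon are not changed by multisampling; only pixels whose subpixels hold different colours are actually smoothed.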

The Z-test is still performed per subpixel, which is absolutely necessary. Suppose two triangles intersect. For the resulting intersection edge to be smoothed, the test has to be done per subpixel. If this isn't done, there is no smoothing, as can be seen on Parhelia. Parhelia's anti-aliasing only works on the outer polygon edges, not on intersection edges.

What remains to be clarified is what happens to textures at the outer edges with multisampling. Remember: for every triangle, one colour per pixel is sampled and then applied to all subpixels that belong to the polygon. Thus the subpixels don't get their actual colour value. The resulting effect appears especially at edges, where a texture value may be sampled from a position that isn't inside the polygon. DirectX 9 offers a way to avoid this, but only with Pixel Shader 3.0 or higher: with a flag ("centroid"), the sample position is shifted so that it lies within the polygon. Today's cards have to cope with the fact that pixel colours at polygon edges may be sampled in areas that aren't covered by the polygon.
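The idea behind the "centroid" flag can be illustrated as follows. This is a simplified sketch under an assumption: the corrected position is taken to be the mean of the covered subpixel positions. Real hardware may pick the position differently, and the mask used in the example is hypothetical.

```python
def sample_position(subpixels, coverage, centre=(0.5, 0.5), centroid=False):
    """Return the position inside the pixel at which the texture is sampled."""
    if not centroid or all(coverage):
        return centre  # default behaviour: always sample at the pixel centre
    covered = [pos for pos, hit in zip(subpixels, coverage) if hit]
    if not covered:
        return centre
    # Assumed rule: move the sample position to the mean of the covered subpixels.
    x = sum(pos[0] for pos in covered) / len(covered)
    y = sum(pos[1] for pos in covered) / len(covered)
    return (x, y)


if __name__ == "__main__":
    mask = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]  # hypothetical mask
    left_half = [True, False, True, False]  # polygon covers only the left subpixels
    print(sample_position(mask, left_half))                 # pixel centre
    print(sample_position(mask, left_half, centroid=True))  # moved into the covered area
```

Without the flag the sample is taken at the pixel centre even when the polygon only covers part of the pixel; with it, the position is pulled towards the covered subpixels.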

With pixel shader version 1.3 or higher (supported since GeForce4 Ti, Radeon 8500 and Parhelia), both the colour value and the Z value can be altered in the shader. This is required to render Z-correct bump mapping. Bump mapping creates roughness on a flat polygon through per-pixel lighting effects. This "bumpiness" is only an illusion; the polygon remains flat. This leads to unexpected results when such a polygon intersects another one. To calculate what covers what with the simulated roughness in mind, the pixel shader responsible for bump mapping also has to alter the Z value, since this value is used for the visibility check.

But the changed Z value then applies to all subpixels. In short: bump mapping with correct occlusion is now possible with pixel accuracy, but not with subpixel accuracy.

There are some other multisampling methods, which also have their peculiarities. Of course, they should be mentioned as well.

_________

one colour value for the whole polygon? (2nd part, paragraph 3)

the note to jason15 is no longer needed

edit: post-editing; put the pages into the right order. (copy & paste)

cya, axel

KillerCookie

2004-01-16, 13:34:19

@purple stain
thanks :)... the last one is also the one I spent the longest on.

I'll take care of page 6 then...

Regards, Maik

Lost Prophet

2004-01-18, 10:26:01

page 6
check (by someone else) still pending, otherwise done

____________

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 6 of 8

Alternative multisampling-methods and their problems

Matrox created its own anti-aliasing solution for Parhelia and called it "Fragment Anti-Aliasing". Colour compression is implemented, and Z-fillrate and Z-bandwidth are saved as well. The problem of the unsmoothed intersection edges was already mentioned; on the other hand, the smoothing quality on the outer edges is quite good, since a 4x4 subpixel mask is used. This is very inefficient though, because with better positioning almost equal quality can be achieved with far fewer subpixels. In addition, the memory for anti-aliasing samples is limited (obviously there's no RAM for 16x full-screen anti-aliasing), and as a consequence, in some cases a few of the outer edges aren't smoothed at all.

As mentioned before, framebuffer compression only saves bandwidth, so AA modes with many subpixels are very RAM-intensive. The Matrox solution works only up to a certain number of AA samples per frame; anything beyond that isn't processed.

To get this under control, a method called Z3 anti-aliasing was invented. Per pixel, the colours of up to 3 triangles can be stored. Thus every pixel has 3 "slots", each of which can store a colour along with a note on how many subpixels it applies to. For additional subtexels, the colours are mixed on the fly, but problems occur when there are more than 3 subtexels. That means more than 3 triangles contribute their colours, and handling the Z values also gets more difficult then.

Z3 anti-aliasing doesn't store Z values per subpixel, only per triangle covering the pixel. In addition, a mask is stored that says which subpixels are covered by the triangle, and hence to how many subpixels the triangle's colour applies. Directional information for the Z values is stored as well, so that the Z values of the remaining subpixels covered by the triangle can be reconstructed. But if a fourth triangle enters the pixel, the method reaches its limits. As a result, the colour values at the edges are a bit imprecise, but a great amount of RAM is saved. This method hasn't been implemented in consumer hardware so far.

All the problems mentioned could be solved at once if Tile Based Deferred Rendering were used. Multisampling saves a lot of texel fillrate, and bandwidth isn't a problem here either, because every tile is rendered only once and is final when it's written into the framebuffer. A multisampling anti-aliasing method that stays fast even with many subpixels while using only little RAM is practically only possible on a Tile Based Deferred Renderer.

Enough theory. What we saw in practice was mainly about GeForce. Let's look at Radeon now.

Quality vs. Radeon: Who smooths better?

With Radeon 9500 or higher, ATi offers multisampling as well. Here is the comparison:

Pictures: [GeForce since 4 Ti on the left, Radeon since 9500 on the right.]

Again, AA lines are shown in green, subpixels in red, and texel positions in blue. With 4x anti-aliasing, the R300 core thus uses 4 temporary lines in the triangle setup. The subpixels are also better distributed along the X axis. For edges this results in a higher effective resolution:

Pictures: [A more intelligent grid delivers smoother edges.]

Although the chosen example isn't very good, it can still be seen that the edges in the lower picture have better colour transitions thanks to the more efficient grid.

In terms of multisampling anti-aliasing quality, the R300 (as well as its successors R350 and R360) is miles ahead of the NV30: while the NV30 can only split the triangle into two lines per final on-screen line, the R300 can split it into up to six. What GeForce3, 4 and FX offer in terms of edge smoothing has been rendered hopelessly outdated by the R300. And since the R300 core there has been an additional mechanism that increases smoothing quality even further.

Gamma-correct Downfiltering: What you calculate should be what you see.

Actually this isn't specific to multisampling, but since in practice this feature is only encountered together with multisampling anti-aliasing, we still want to mention it.

Generally, gamma correction is used when the picture appears too dark. If only the brightness of a dark picture were raised, black would turn grey and the light areas would merge, because everything bright would turn white. The dynamic range of the signal decreases, and ultimately the image would look worse. That's why gamma correction is used. It has the convenient property of not reducing the contrast range; instead it turns the non-linear brightness response of regular cathode ray tubes nearly back to linear.

For historical reasons, however, a lot of graphics were designed for ordinary monitors, and linearising the brightness of such an image results in an unexpectedly bright picture. Therefore the option of correcting the monitor via the gamma curve is usually left unused. For anti-aliasing, however, downfiltering works linearly: for example, if the method creates up to 3 colour gradations, all three should be evenly spaced with regard to brightness.

But in reality, moderately dark colours are displayed so dark that it's very hard to distinguish them from black. Hence not all of the colour gradations are visible anymore, which results in a "bumpy" edge. But fully correcting the image with the gamma function would produce pictures that are too bright. That's why it's a good idea to gamma-correct the subpixels before downfiltering and to undo the correction on the final pixel.
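The difference between plain and gamma-correct downfiltering can be sketched like this. The display gamma of 2.2 is an assumption chosen for the example (the article does not name a value), and the function names are made up.

```python
GAMMA = 2.2  # assumed display gamma; the driver would have to estimate the real value


def to_linear(value):
    """Convert a stored 0..255 value into linear brightness (0..1)."""
    return (value / 255.0) ** GAMMA


def to_stored(brightness):
    """Convert linear brightness (0..1) back into a 0..255 framebuffer value."""
    return 255.0 * brightness ** (1.0 / GAMMA)


def downfilter_plain(values):
    """Average the stored values directly."""
    return sum(values) / len(values)


def downfilter_gamma_correct(values):
    """Correct every subpixel, average in linear brightness, then undo the correction."""
    linear = [to_linear(v) for v in values]
    return to_stored(sum(linear) / len(linear))


if __name__ == "__main__":
    edge_pixel = [0, 0, 255, 255]  # half black, half white
    print(downfilter_plain(edge_pixel))          # 127.5, displayed darker than half brightness
    print(downfilter_gamma_correct(edge_pixel))  # a higher value that is displayed at half brightness
```

Pixels whose subpixels all have the same value come out unchanged, so only the intermediate steps on edges are affected.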

Picture: [If inner and outer area appear to have the same colour, the brightness is linear.]

As can be seen above, the mixed colour of two pixels can differ from what they "really" produce in terms of brightness. Gamma-correct downfiltering solves this, but there are limits.

The driver has to know what the gamma correction setting of the device is and then estimate which value would be ideal. On some displays, which perform gamma correction outside the driver, this method can lead to overcorrection, which also makes the colour gradations less visible.

That concludes the part about multisampling itself. But some subtleties don't become apparent until the method is compared with another one. That is what the next page does.

______________

edit: since I put the pages in order, the following post by aths refers to the post 2 posts further down (page 7)

cya, axel

aths

2004-01-19, 05:56:52

Originally posted by Purple_Stain
Quality vs. Supersampling: Is quality the best formula?
Sorry for complaining again. But once more the request not to translate literally. The heading is an allusion to the Dr. Oetker advertisement, which nobody outside Germany knows.

"Ist Qualität das beste Rezept?" ("Is quality the best recipe?") asks the question: "Is quality the only/most important thing to look out for?" That should be translated according to its sense.

What matters is not the word but the content. Feel free to split sentences or merge them. Rephrase entire passages if it serves the purpose. The purpose is to render the content in English that is as simple as possible. Introductory words at the start of a sentence such as "Offenbar" ("obviously"), "Sofern" ("provided that") etc. really don't have to be carried over word for word. On the contrary, better to write "ordinary" English. If the introductory word gives the sentence a special meaning, it can also appear in the middle or at the end of the sentence in the translation, provided that improves readability.
Originally posted by Purple_Stain
Here more performance is used than quality is generated.
Alternative idea: "This burns more performance than delivers quality". (The sentence structure is probably wrong, my English skills are modest.) The alternative idea moves away from the original. If that helps comprehensibility, no problem.

Passages where I criticise or praise companies are critical, though. There, every single word is usually weighed carefully, and I would appreciate a rendering that is as exact as possible.

Lost Prophet

2004-01-19, 08:44:17

Originally posted by aths
Sorry for complaining again. But once more the request not to translate literally. The heading is an allusion to the Dr. Oetker advertisement, which nobody outside Germany knows.

"Ist Qualität das beste Rezept?" ("Is quality the best recipe?") asks the question: "Is quality the only/most important thing to look out for?" That should be translated according to its sense.

What matters is not the word but the content. Feel free to split sentences or merge them. Rephrase entire passages if it serves the purpose. The purpose is to render the content in English that is as simple as possible. Introductory words at the start of a sentence such as "Offenbar" ("obviously"), "Sofern" ("provided that") etc. really don't have to be carried over word for word. On the contrary, better to write "ordinary" English. If the introductory word gives the sentence a special meaning, it can also appear in the middle or at the end of the sentence in the translation, provided that improves readability.
Alternative idea: "This burns more performance than delivers quality". (The sentence structure is probably wrong, my English skills are modest.) The alternative idea moves away from the original. If that helps comprehensibility, no problem.

Passages where I criticise or praise companies are critical, though. There, every single word is usually weighed carefully, and I would appreciate a rendering that is as exact as possible.

that changes quite a lot. up to now I haven't translated word for word, but I only shuffled the words within a sentence a bit. glad to hear it, I'll revise the pages once more.

you do need official permission for that, after all ;)

cya, axel

ps. I didn't recognise Dr. Oetker... :D

Lost Prophet

2004-01-21, 09:20:18

page 7
check (by someone else) still pending, otherwise done

___________________________

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 7 of 8

Quality vs. Supersampling: Is quality the only thing to look for?

While multisampling calculates texture values only once per pixel, supersampling has a kind of built-in texture oversampling, which results in an anisotropic filtering effect. At least in theory. In practice this is only true for GeForce cards. On Radeons, most drivers "correct" the texture detail back via the MIP-map LOD to gain a bit of performance. On GeForce cards, by contrast, the full quality advantage for textures is retained. On a Voodoo card it was necessary to adjust the LOD setting manually in order to get higher texture detail.

But "true" anisotropic filtering ("AF" for short) is able to achieve equal quality with significantly less rendering power, so the combination MSAA + AF is preferred. Compared to supersampling, however, multisampling has a disadvantage with textures when so-called "alpha testing" comes into play.

The alpha test makes texture values that are marked as transparent actually see-through by discarding the entire pixel after the test instead of writing it into the framebuffer. Older games in particular make use of this technique. This creates edges within the texture; supersampling smooths them, multisampling of course doesn't. The texture doesn't get worse, though; it just stays the way it would look without anti-aliasing.

When talking about textures, the "Nyquist limit" has to be mentioned as well. This limit lets one calculate which texture resolution has to be chosen for the picture to be as sharp as possible without flickering. In short, a ratio of "1 filtered texel per pixel" is ideal. AF, however, needs more filtered texels per pixel, and these then have to be sampled from higher-resolution MIP levels.

Thanks to adaptive oversampling from higher-resolution textures, AF makes distorted textures less blurry. Such distorted textures occur when a polygon is viewed at a shallow angle. The alpha texels are also sampled from the higher-resolution textures. But in that case the whole pixel remains invisible, not just a few AF samples. This violates the Nyquist criterion and results in horrible flickering. The flickering would be less severe with AF disabled, but then all distorted textures would be blurry.

Supersampling does not solve this problem, but quality is clearly improved. On the one hand, normal textures get an AF effect and can still improve further when AF is activated. More importantly, flickering with alpha textures is reduced, because internally the entire alpha texture is sampled in smaller steps. Instead of just transparent pixels, supersampling now also permits transparent subpixels. This advantage is partly cancelled out again, because the finer internal texture sampling also means sampling from a higher-resolution MIP level (if one is still available). In practice, though, such an image is still more pleasant to look at with supersampling than with multisampling. Finely branched treetops in particular flicker strongly with the latter.

With alpha blending (instead of alpha testing), flickering is more of a minor issue. Alpha blending makes it possible to grade between visible and transparent. For example, a texel can be marked "20% visible". In this case the colour is sampled from the texture, and the data for this position is read from the framebuffer. The two colour values are then mixed accordingly and written back into the framebuffer. But there's a problem: alpha blending only works properly when rendering from back to front, because the objects in the back have to shine through the ones in front, not vice versa. Building up the frame from the back, however, is bad for efficiency (generally one tries to render from front to back). With a static rendering order, wrong results can also occur from certain angles, as in Max Payne: from below, the grating stairs look fine; from above, the snow shines through where bars are supposed to be.
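Why the rendering order matters can be shown with a tiny single-pixel renderer. This is a sketch under assumptions: colours are RGB tuples in the range 0..1, every fragment that passes the depth test also writes its depth, and the scene (a red wall behind a 20% visible white pane) is made up for the example.

```python
def blend(source, destination, alpha):
    """Mix the new colour with the framebuffer colour; alpha is the visible share."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(source, destination))


def render(fragments, background=(0.0, 0.0, 0.0)):
    """Draw (colour, alpha, z) fragments for one pixel in the given order."""
    colour, depth = background, float("inf")
    for fragment_colour, alpha, z in fragments:
        if z >= depth:
            continue  # fails the depth test: something nearer was already drawn
        colour = blend(fragment_colour, colour, alpha)
        depth = z
    return colour


wall = ((1.0, 0.0, 0.0), 1.0, 10.0)  # opaque red, far away
pane = ((1.0, 1.0, 1.0), 0.2, 5.0)   # white, 20% visible, in front of the wall

back_to_front = render([wall, pane])  # the wall shines through the pane
front_to_back = render([pane, wall])  # the wall is rejected, the background shines through
```

In the front-to-back case the pane is blended with the empty background and the wall behind it never reaches the framebuffer, the same kind of error as the snow showing through the stairs.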

As long as the rendering order takes into account which textures are treated with alpha blending, this method is superior in quality to alpha testing. But it requires more work from the graphics card and the CPU.

From a quality point of view, the best solution appears to be a combination of supersampling and alpha blending. Unfortunately, all modern cards only offer supersampling (if at all) in the inefficient oversampling mode. This burns more performance than it delivers in quality. Hence the multisampling/oversampling combinations offered by nVidia since GeForce3 are not recommendable, except when unused power is available that cannot be invested in higher resolutions. That the 4xS mode nevertheless became popular is mainly due to the generally poor 4x MSAA performance of GeForce3 and 4.

Pictures: [On the right 4xS: 4 subpixels, 2 subtexels. The mode is superior to 4x MS, but far from ideal.]

For the sake of completeness, here is the 4xS mode of GeForce3 as well:

Picture: [4xS was available via tweak programs long before it was announced officially for GeForce4.]

Just as a side note: GeForce4 MX is only capable of 2x MSAA; 4x MSAA isn't possible. Its 4x mode is simply 2x2 oversampling. 4xS, as a combination of 2x MSAA and 1x2 oversampling, is still available though.

Time to draw a conclusion.

________________________________

from invisible to transparent? what did I misunderstand? (paragraph 3)

edit: post-edited

cya, axel

Lost Prophet

2004-01-22, 01:26:34

page 8
check (by someone else) still pending, otherwise done

__________

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 8 of 8

Concluding reflections: What do we want?

Generally, multisampling anti-aliasing (with AF activated) is superior to supersampling, because better quality is achieved at the same performance, or equal quality at better performance. At least as long as higher resolutions are available. A higher resolution has smoother edges and more detailed textures. Thus our primary goal is graphics that are both high-resolution and smooth-edged. With anti-aliasing switched off, the staircase effect is too annoying for the quality-loving gamer even at 1600x1200.

Anti-aliasing can generate more quality than it burns in performance, and therefore it will never be "unnecessary". But how does this gain come about?

If the resolution is doubled on both axes, edges become "twice" as smooth, but four times the performance is required. It's basically the same with oversampling: more performance is lost than quality is gained. 4x rotated grid supersampling also requires 4 times the performance, but it also smooths the edges 4 times better. The performance invested is in balance with the quality gained.

4x rotated grid multisampling needs far less than an additional 300% performance, but still offers "4 times smoother" edges. Now we get more out than we put in.

Pictures: [4x OG has double the sampling accuracy on the axes, while 4x RG has 4 times the accuracy. Hence the RG mask is more sensitive to changes, which results in smoother edges.]
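The difference in axis accuracy can be made visible by sweeping a vertical edge across one pixel and counting how many different coverage values each mask can produce. The sample positions below are hypothetical textbook masks, not the exact positions used by any particular chip.

```python
# Hypothetical 4-sample masks inside a unit pixel; real hardware positions may differ.
ORDERED_GRID = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_GRID = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def coverage_levels(mask, steps=16):
    """Distinct coverage values seen while a vertical edge sweeps across the pixel."""
    levels = set()
    for i in range(steps + 1):
        edge_x = i / steps
        # Everything left of the edge belongs to the polygon.
        covered = sum(1 for x, _ in mask if x < edge_x)
        levels.add(covered / len(mask))
    return sorted(levels)


if __name__ == "__main__":
    print("4x OG:", coverage_levels(ORDERED_GRID))
    print("4x RG:", coverage_levels(ROTATED_GRID))
```

The ordered grid only has two distinct X positions and therefore yields three coverage values, while the rotated grid has four distinct X positions and yields five. The same holds for horizontal edges and the Y positions.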

With 4x OG, on the other hand, 2 subpixels can be omitted without any real penalty.

Pictures: [With 2x RG the edge-smoothing is a bit worse than with 4x OG. But not significantly, since the internal increase in axis-resolution is the same.]

Since multisampling was introduced with GeForce3, a point of criticism up to and including GeForce FX 5900 has been that the 4x MSAA mode doesn't improve quality much compared to 2x, because the 4x mode still uses the inefficient ordered grid.

There are cases that are generally problematic with multisampling, whether on a Radeon or a GeForce. Maliciously, multisampling could be called a "cheat" or at least a "hack", simply because several things are calculated only once per pixel but applied to all subpixels.

In any case, a subpixel mask with a rotated grid should be preferred over one with an ordered grid, because with the same number of subpixels (hence the same amount of work) the smoothing result is significantly better. The fact that GeForce FX doesn't support multisampling modes higher than 4x, rather than up to 6x like Radeon 9500 and higher, isn't really a disadvantage. What does affect image quality is that the 4x mode uses an ordered grid mask, so the smoothing quality is far behind what Radeon 9500 or higher achieve. So be careful when reading anti-aliasing benchmarks comparing "4x vs. 4x"; they are usually pointless, because very different subpixel masks are used.

To avoid the multisampling problems mentioned, it would be nice if rotated grid supersampling were offered as well. GeForce cards since GeForce3 at least have combined modes that mix multisampling with oversampling, which unfortunately is the worst supersampling method. In theory the gained quality costs too much speed to be worthwhile, but in practice there are games that aren't very demanding, so the unused performance can still be invested in additional quality.

To a large extent, multisampling costs no fillrate, and if both colour and Z-compression are offered, only little additional bandwidth is used. Such anti-aliasing is therefore nearly "for free". It can be expected that future games will take the multisampling problems more into account and that supersampling will become less and less important.

This article wouldn't have been possible without the patient help of Xmas, and especially Demirug. I am very grateful for this support.

This article represents our best knowledge. If you find a mistake, please mail us or post in our forum.

____________

@all: Since at least 3 more articles are already waiting at the door, I'd say we set aside the "Taktverdopplung bei DDR" article as well as the "Kleine Karten - Fluch oder Segen?" article for now. Quite apart from the fact that these 2 are IMO not important and, incidentally, old, we should devote ourselves to the AA masks article first.

edit: put all pages in order
chronological order of the translations: 1,3,4,5,2,7,8,6

cya, axel

KillerCookie

2004-01-22, 12:20:09

OK, then please put it online... next up is the masks article...

Regards, Maik

Lost Prophet

2004-01-22, 19:08:55

Originally posted by Jason15
OK, then please put it online... next up is the masks article...

Regards, Maik

not yet, someone should go over it once more

I already sent zeckensack a PM, and he said he'll do it tonight

put it online after that

cya, axel

zeckensack

2004-01-22, 23:42:30

Two remarks up front:
1) I suggest changing the title:
Multisampling antialiasing inspected
2) Anti-aliasing or antialiasing?
Both spellings are quite common in the English-speaking world, and IMO neither is wrong. However, we should stay consistent here. For now I've put in the version without the hyphen.
____________________

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 1 of 8

Introduction

Today, we'd like to pick apart a rather special topic: multisampling antialiasing. The basics were touched upon in an older article [that hasn't been translated yet]. Today we'll get to the bottom of it.

We'll quickly run through some prerequisite information, but won't go into too much detail there. We hope that the article is still comprehensible.

This topic is by no means just theory; it's implemented in actual, purchasable hardware. First we examine the multisampling procedures used by GeForce3, 4 Ti and FX; later we'll look at the multisampling of the Radeon line (Radeon 9500 or higher).

On virtually all modern cards, antialiasing is performed via multisampling. Thus this method must have certain advantages, and analysing these makes up a big part of this article. But nothing comes for free, and the disadvantages, which might not be obvious at first, are discussed as well.

The raster-process: How geometry turns into graphics

The first thing a graphics card does when it receives a triangle located in a virtual space is to project it onto the screen surface. Hence this triangle now only possesses 2D coordinates, X and Y (the Z coordinate for depth is still saved alongside, though).

Now the triangle setup determines in which lines the triangle is located, by checking all three vertices. Then it iterates through the lines covered by the triangle from smallest Y through largest Y. Since the X values of the vertices are still known, it can be determined for each line, at which X value the triangle begins and ends.
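The line-by-line walk described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: pixel rows are taken to sit at integer Y values, and the function name `scanline_spans` is made up for this example; real triangle setup hardware works incrementally and with fixed-point arithmetic.

```python
import math

def scanline_spans(v0, v1, v2):
    """Return {y: (x_start, x_end)} for every screen line the triangle touches.

    Vertices are (x, y) tuples in screen space. Screen lines are assumed
    to lie at integer Y values (an assumption of this sketch).
    """
    verts = [v0, v1, v2]
    edges = [(verts[i], verts[(i + 1) % 3]) for i in range(3)]
    y_min = math.ceil(min(v[1] for v in verts))
    y_max = math.floor(max(v[1] for v in verts))
    spans = {}
    for y in range(y_min, y_max + 1):
        xs = []
        for (ax, ay), (bx, by) in edges:
            if ay == by:
                continue  # horizontal edge: its end points come from the other edges
            if min(ay, by) <= y <= max(ay, by):
                t = (y - ay) / (by - ay)       # how far along the edge this line is
                xs.append(ax + t * (bx - ax))  # X where the edge crosses the line
        if xs:
            spans[y] = (min(xs), max(xs))
    return spans

# A triangle that covers 4 lines (Y = 0..3).
spans = scanline_spans((0, 0), (4, 0), (0, 3))
```

For each line, the smaller X is where the triangle begins and the larger X is where it ends.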

The chosen illustration should just give a general overview of the raster process, which we need because multisampling is based on it. We do not take into account any decoupling of triangle setup and pipelines, which was introduced in conjunction with early Z at the latest.

Picture: [Example: A triangle covers 4 lines.]

By initialising the texture stages, the pipeline "knows" which textures have to be applied to the triangle. With every vertex of the triangle, a texture coordinate for every texture is passed on. The pipeline samples the right point from the texture for each pixel by stepping these coordinates accordingly; first line-by-line, then pixel-by-pixel. This includes a perspective correction, because a linear interpolation of the texture coordinates along two axes isn't sufficient. Modern graphics cards supposedly have a special algorithm which calculates the perspective-corrected texture coordinate directly from the pixel coordinates within the triangle. The Z values are processed as well, so that depth is also known per pixel.
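Why plain linear interpolation isn't sufficient can be shown with the textbook formulation of perspective correction: u/w and 1/w are interpolated linearly in screen space and then divided. This is a generic sketch, not the algorithm of any particular chip; the function names and the use of the homogeneous coordinate w are assumptions of this example.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + t * (b - a)

def affine_u(u0, u1, t):
    """Plain linear interpolation in screen space (wrong under perspective)."""
    return lerp(u0, u1, t)

def perspective_u(u0, w0, u1, w1, t):
    """Perspective-correct interpolation: u/w and 1/w are linear in screen space."""
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w

# Halfway along an edge whose far end is three times as distant:
wrong = affine_u(0.0, 1.0, 0.5)                 # 0.5
right = perspective_u(0.0, 1.0, 1.0, 3.0, 0.5)  # 0.25
```

The corrected coordinate stays closer to the near vertex, because the near part of the texture occupies more screen space.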

Now all the data required for texturing the pixel is available. The TMU of the pipeline gets the (perspectively corrected) texture coordinates and returns the texture sample. Depending on filter settings 4 or 8 texels are usually needed.

We have only mentioned the points that are important for antialiasing; rest assured, a future article will continue from this point. But now back on topic: multisampling antialiasing.

Picture: [This is what the rasterised triangle looks like: loss of information at the edges.]

As the image above shows, inaccuracies are inevitable. This does not only concern the geometry, but also the texturing, as discussed later.

First about the edges: How can we overcome these inaccuracies?

zeckensack

2004-01-23, 00:08:46

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 2 of 8

Oversampling: The easiest way

The triangle setup only knows 2 different conditions per pixel: triangle covered or not. That's where a digitising error comes into play. Actually, the pixels on the triangle edge are just partially covered, but this is ignored. If the centre of the pixel is covered, the entire pixel is treated as a part of the triangle.

An effect of this digitising error is "stair" steps appearing where a smooth edge is supposed to be. In general, this effect is inevitable because the image is rendered onto a pixel raster (which, by the way, is very coarse compared to the accuracy of the geometry data). The visibility of these steps could be reduced, for example by taking into account how much of the pixel is covered. I.e. if 50 per cent of a pixel were covered, the corresponding triangle's colour value would make up only 50 per cent of the final on-screen pixel. But calculating the exact coverage and mixing the colours accordingly is far beyond today's hardware, so in practice an approximation of the actual coverage is used. Hence every implementation of 3D antialiasing procedures nowadays is an approximation.

One way to achieve smoother edges is oversampling: The X and Y axes are scaled up by an integral factor, and after rendering this enlarged frame, it is filtered back down to the original resolution. This means that in this process of downfiltering the information of several framebuffer pixels makes it into one final on-screen pixel. That way "softer" edges with colour transitions are achieved.

Picture: [2x2 oversampling: The internal resolution is higher.]
Picture: [After the downfiltering the edges are a bit smoother.]
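The two steps, rendering at a scaled-up resolution and filtering back down, can be sketched as follows. The slanted edge `below_edge` is a hypothetical stand-in for a polygon, and coverage is written as 1.0 or 0.0 instead of a real colour; the function names are made up for this example.

```python
def render(width, height, scale, covered):
    """Test covered(x, y) at the centre of every pixel of a frame enlarged by `scale`."""
    w, h = width * scale, height * scale
    return [[1.0 if covered((i + 0.5) / scale, (j + 0.5) / scale) else 0.0
             for i in range(w)] for j in range(h)]

def downfilter(frame, scale):
    """Box filter: average each scale x scale block into one final pixel."""
    h, w = len(frame) // scale, len(frame[0]) // scale
    out = []
    for j in range(h):
        row = []
        for i in range(w):
            block = [frame[j * scale + dj][i * scale + di]
                     for dj in range(scale) for di in range(scale)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def below_edge(x, y):
    # Hypothetical slanted polygon edge, used only for this demo.
    return y >= 0.5 * x

aliased = render(4, 2, 1, below_edge)                  # no antialiasing
smoothed = downfilter(render(4, 2, 2, below_edge), 2)  # 2x2 oversampling
```

Without oversampling every pixel is either fully on or fully off; with 2x2 oversampling the edge pixels take the intermediate values 0.25 and 0.75.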

Whenever possible the filter kernel should cover whole pixels and not include fractions of pixels. A scaling by 1.5x1.5 for example also produces smoother edges, but the result isn't convincing:

Picture: [Scaling by non-integral factors is disadvantageous.]

Texture quality will also suffer from non-integral oversampling.

Supersampling: Can we get better smoothing for the same expense?

Oversampling is a supersampling method, but not vice versa. Supersampling can be made more efficient through repositioning of subpixels. Then terms like "edge equivalent resolution" become important. Why the edge smoothing can be improved through a different subpixel mask, is discussed later on in this article.

Such "sophisticated" supersampling needs special hardware support though, while oversampling can also be realised via the drivers alone. In this case the driver does not take over the antialiasing calculations (this is not "software antialiasing"!!), but guides the card through certain steps, required for the generation of primitive supersampling procedures.

Separating "oversampling" from "supersampling" is not the most common approach, one could simply distinguish between ordered grid supersampling (OGSS) and rotated grid supersampling (RGSS). But we find using the distinct term "oversampling" has its merits, since it describes the quite different approach of just scaling the axes, while supersampling also encompasses methods of higher quality at a given performance impact.

The more advanced so-called "rotated grid supersampling" (the name refers to the layout of the subpixel mask) could also be realised with a bit of work solely via the drivers if the card supports multisample buffers. But with this method the required amount of geometry calculations is multiplied. Cards with support for multisample buffers usually also possess the special antialiasing hardware support we talked about earlier.

Supersampling has the disadvantage of occupying a whole pixel pipeline per subpixel. Multisampling can break this rule.

zeckensack

2004-01-23, 00:33:55

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 3 of 8

GeForce multisampling: done the quick way

The fundamental idea behind multisampling is to sample only one texture value per pixel, although more subpixels are generated. In other words: while supersampling renders the whole frame in higher resolution, multisampling does that only for edges, not for textures. Because sampling makes up the biggest part of the texturing pipeline, a relatively simple extension allows the generation of several multisampling antialiasing samples in a single pipeline. So how exactly do the nVidia chips handle that? Due to their higher popularity our example addresses GeForce4 Ti and FX. The implementation differences in comparison to the GeForce3 will be addressed later.

The purpose of the triangle setup is to supply the pixel pipelines with work. For multisampling antialiasing (MSAA) the triangle setup of GeForce is done at doubled resolution. Per screen line two triangle lines are generated. The higher internal resolution also applies to the columns of course. An example:

For efficiency reasons the triangle setup of modern graphics cards doesn't work with lines anymore, but uses blocks instead. The GeForce uses 2x2 sized blocks, which are also used for the early Z test. Such a block contains 4 pixels, so it's quite obvious that each of the 4 GeForce pipelines is responsible for one of the pixels.

Picture:

The pipelines now get a pixel each. In this example, one pipeline stays idle, because one of the 4 pixels is outside the polygon. This kind of efficiency loss is inevitable, but with a line-based architecture which processes 4x1-"blocks", the overall loss would be even greater. Generally, the smaller the rendered triangles are, the higher are the risks for idle pipelines. Complex geometry - e.g. with very smooth curves - creates a lot of small triangles, so it is worthwhile to optimise the triangle setup especially for this kind of situation.

At this point we have to mention that the blocks are generated stepwise. To begin with, the triangle is divided into rough blocks, to obtain data for calculating the Z values later on. Then these are divided again into smaller blocks which go through the early Z test and, if at least partially visible, are finally rendered. The LOD calculation for MIP mapping is done per 2x2 block as well. This is still accurate enough and saves a bit of precious silicon real estate.

The reasons for using blocks are mainly the increased rendering efficiency in comparison to line based processing. Long bursts are desirable to make full use of the memory interface and in this respect block wise rendering is dramatically better than line based rendering. Cache hits are also significantly better when using blocks, which again reduces pressure on the memory controller.

Enough of our trip to possible ways of optimising this process, let's follow the subpixels through a pipeline.

The pipeline generates one texture sample which applies to all subpixels. With multisampling each subpixel has its own framebuffer. Since in our example only one subpixel lies within the polygon, the texture sample is written only in the framebuffer which belongs to the subpixel "upper right hand corner". There are two possible ways for "empty" subpixels within the pixel to occur: either they don't belong to the polygon or the Z test showed that they are occluded.

Picture: [Each subpixel has its own framebuffer; these buffers are also called multisample buffers.]

"Downfiltering" reads the colour values from all four framebuffers and averages them to determine the final colour of the pixel. That's also how the smoothing effect is achieved.

Multisample buffers would allow several other effects besides antialiasing, although that would require a lot more than just 4 buffers. Memory consumption really isn't the issue here, it's performance. By the way, it is functionally irrelevant whether the buffers are separate or interleaved. Addressing logic would have to be different, but the visual result would be the same.
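The path of one pixel through multisampling, as described on this page, can be sketched like this. It is a simplified model with hypothetical function names: the per-subpixel Z test is left out, the coverage mask is passed in by hand, and colours are single grey values.

```python
def shade_pixel_msaa(buffers, x, y, colour, coverage):
    """Write ONE shaded colour into every multisample buffer whose subpixel is covered."""
    for buf, covered in zip(buffers, coverage):
        if covered:
            buf[y][x] = colour

def resolve_pixel(buffers, x, y):
    """Downfiltering: average the subpixel colours of one pixel."""
    return sum(buf[y][x] for buf in buffers) / len(buffers)

# Four 1x1 multisample buffers, cleared to a black background (0.0).
buffers = [[[0.0]] for _ in range(4)]

# A white polygon (1.0) that covers only one of the four subpixels.
shade_pixel_msaa(buffers, 0, 0, 1.0, [False, True, False, False])
edge_pixel = resolve_pixel(buffers, 0, 0)   # 0.25

# A pixel that lies completely inside the polygon.
shade_pixel_msaa(buffers, 0, 0, 1.0, [True, True, True, True])
inner_pixel = resolve_pixel(buffers, 0, 0)  # 1.0
```

The colour is computed once per pixel; only the number of buffers it is written to depends on the coverage.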

zeckensack

2004-01-23, 01:01:37

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 4 of 8

So what's the point? - Why multisampling is faster

The antialiasing quality of the GeForce3 wasn't groundbreaking; in all fairness this title belongs to the Voodoo5. But the big advantage in favour of GeForce3 was speed. We'll talk about quality later and turn to speed now, because this is the big advantage of multisampling antialiasing.

GeForce3 doesn't calculate a texture value for every subpixel. This saves both fillrate and bandwidth. It saves a lot more fillrate than it saves bandwidth, though, because the bandwidth savings are mainly from reduced texture sampling, but this sampling is done mostly from cache anyway. And although there is only a single colour value to be generated per pixel, it still has to be stored separately for every subpixel.

The Z test is done simultaneously for all subpixels in a pixel. Now here's the tricky part: The additional expense of Z fillrate required for antialiasing is saved again right away, but for now the Z bandwidth remains as high as it was.

As a consequence of that, the 4x antialiasing is quite slow, simply because there's not enough memory bandwidth, despite certain optimisations. The high antialiasing performance in comparison to the competition is mainly due to a lot of raw power. Though beaten by a GeForce2 Ultra in theoretical fillrate, the GeForce3 showed its true strength in 32 bit mode, and devastated all other 3D gamer cards available at that time.

Unlike the GeForce2, the GeForce3 is able to make use of its 4x2 architecture in 32 bit mode as well, and this additional power also affects the antialiasing speed, of course. Multisampling technology is only one component of high antialiasing speed. Another important part is the handling of the mentioned Z bandwidth which is extensively stressed by antialiasing. GeForce models higher than GeForce3 are capable of Z compression, which already increases the speed in general, but pays off even more for multisampling antialiasing.

We won't deprive you of some numbers: we tested Villagemark in 1400x1050x32 on a GeForce4, clocked at 300/350 (Core, RAM).

Table: [MSAA-performance]

This table shows two things: first, MSAA alone isn't sufficient for attractive antialiasing speeds. Second, in 4x MSAA mode Z compression has more effect than early Z, and that's especially interesting because our GeForce4's memory was overclocked, which increases the available memory bandwidth. And still Z compression yields such a significant performance boost. Without antialiasing, Z compression increases overall speed only by 3%, with 2x MSAA already by 9%, and with 4x MSAA by a full 13%. Depending on the benchmark the results scale differently, but the tendency is obvious: Z compression is crucial for fast MSAA. The FX 5200 does not possess this feature and therefore is not a recommendable card for antialiasing.

Differences with HRAA and AccuView: What has been improved for GeForce4 Ti?

Antialiasing and anisotropic filtering have been marketed under a compound name since GeForce4: AccuView on GeForce4, Intellisample on GeForce FX, and Intellisample HCT on GeForce FX 5900. The GeForce FX adopted the AccuView technology unaltered, so quality hasn't changed either way. AccuView itself is based on the GeForce3's HRAA ("High Resolution Antialiasing"), but was improved. The changes concern speed as well as quality. First about quality. GeForce3 has somewhat more awkward subpixel positions:

Picture: [GeForce3 on the left, GeForce4 Ti (and FX) on the right.]

This picture shows the positions of the subpixels (red), the spot used for texture sample coordinate calculation (blue) and the temporary lines in the triangle setup (green).

Why do GeForce4 and FX sample "better" than GeForce3? The average distance from the subpixels to the texel position (where the colour value is sampled from, which applies to all the subpixels) is shorter than with GeForce3. Thus the average colour error per subpixel is smaller. In addition, the subpixels are centred nicely, not offset to the upper left-hand corner, so the geometry sampling is more natural.

There are also differences in performance. The early Z test, which is the process of discarding occluded pixels, doesn't work on GeForce3 when MSAA is active. That's why already with 2x MSAA the performance drops significantly. GeForce4 Ti generally is a bit chary concerning Early Z, but with newer drivers, tweak programs such as aTuner are able to activate this feature and it also works with antialiasing.

There's another difference which only concerns the 2x mode in 3D fullscreen though. While GeForce3 generally uses conventional downfiltering, GeForce4 and higher defer the filtering from the multisampling buffers until the RAMDAC scanout. This saves read/write bandwidth for the downsampling buffer, but it only improves the performance as long as the framerate doesn't drop under 1/3 of the display refresh rate. The NV35 (GeForce FX 5900) also uses this method with 4x MSAA. The break-even here is 3/5 of the refresh rate. But where does NV35 draw the power for 4x MSAA from?
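The 1/3 and 3/5 break-even figures can be reproduced with a simple cost model. The model is an assumption of this sketch, not something the article states: every buffer read or write costs one unit per pixel. Conventional downfiltering then costs N reads plus 1 write per rendered frame and 1 read per display refresh, while filtering at scanout costs N reads per display refresh. With frame rate f and refresh rate r, both cost the same when f * (N + 1) + r = N * r, i.e. f / r = (N - 1) / (N + 1).

```python
from fractions import Fraction

def break_even_ratio(samples):
    """Frame rate / refresh rate at which both downfiltering schemes cost the same.

    Cost model (an assumption of this sketch, one unit per buffer access per pixel):
      conventional: per rendered frame  -> read `samples` buffers, write 1 result
                    per display refresh -> read 1 downfiltered pixel
      at scanout:   per display refresh -> read `samples` buffers
    """
    return Fraction(samples - 1, samples + 1)

ratio_2x = break_even_ratio(2)  # Fraction(1, 3)
ratio_4x = break_even_ratio(4)  # Fraction(3, 5)
```

Under this model, filtering at scanout saves bandwidth whenever the frame rate lies above the break-even fraction of the refresh rate.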

zeckensack

2004-01-23, 01:02:49

Zecki is off to bed now. More around noon tomorrow.

Lost Prophet

2004-01-23, 01:37:34

Originally posted by zeckensack
Zecki is off to bed now. More around noon tomorrow.

thanks for now =)

i found 3 small mistakes:

page 1: 3rd paragraph in the raster-process part, just before the picture
"...in conjunction with early Z at the latest."

page 2: third-to-last paragraph:
"...encompasses methods of higher quality at a given performance impact."

page 3:
2nd paragraph
"For multisampling antialiasing (MSAA) the triangle setup of GeForce is done at doubled resolution."

Originally posted by zeckensack
Two remarks up front:
1) I suggest changing the title:
Multisampling antialiasing inspected
2) Anti-aliasing or antialiasing?
Both spellings are fairly common in the English-speaking world, and IMO neither is wrong. However, we should stay consistent here. For now I've pushed through the version without the hyphen.

i stuck to aths' suggestion for the title, i don't mind which of the 2

hm i wrote anti-aliasing because i didn't know any better, next time without the hyphen then.
that's also my biggest problem, i don't read english hardware forums, so i have no idea how a word should be translated, and then i have to either guess or paraphrase somehow.
at least this time a bit more of my english survived :D

cya, axel

zeckensack

2004-01-23, 11:04:39

Originally posted by Purple_Stain
thanks for now =)

i found 3 small mistakes:

page 1: 3rd paragraph in the raster-process part, just before the picture
"...in conjunction with early Z at the latest."

page 2: third-to-last paragraph:
"...encompasses methods of higher quality at a given performance impact."

page 3:
2nd paragraph
"For multisampling antialiasing (MSAA) the triangle setup of GeForce is done at doubled resolution."
Fixed :)
i stuck to aths' suggestion for the title, i don't mind which of the 2

hm i wrote anti-aliasing because i didn't know any better, next time without the hyphen then.
As I said, IMO both are correct, it should just be consistent. I only 'pushed through' the variant without the hyphen because it came first (in the heading).
Wavey Dave writes "anti-aliasing" (http://www.beyond3d.com/reviews/sapphire/9600xt/)
Anandtech writes "antialiasing" (http://www.anandtech.com/video/showdoc.html?i=1931&p=4)
*shrug*

that's also my biggest problem, i don't read english hardware forums, so i have no idea how a word should be translated, and then i have to either guess or paraphrase somehow.
at least this time a bit more of my english survived :D

cya, axel
Oh come on, that was already pretty good :)

zeckensack

2004-01-23, 11:41:53

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 5 of 8

Framebuffer compression: low hanging fruit

What ATi and nVidia market as compression with R300 and GeForce FX respectively actually isn't compression, because only bandwidth is saved, and not a bit of RAM. Framebuffer compression is very likely to be a lot simpler than Z compression. With multisampling all subpixels in a pixel have the same colour (at least as long as the entire pixel lies within the polygon). This is taken advantage of and hence, in this case, Radeon models starting from the 9500 and GeForce FX models starting from 5600 write the colour into the framebuffer only once, along with a note that it applies to all subpixels. But if in the process of rendering the pixel is covered by a visible edge, the colour samples are written into the framebuffers the usual way.

How exactly the framebuffer compression is implemented remains unknown. Basically there are two approaches that are realistic. The first one is that a note is attached to each pixel which is entirely in the polygon, saying "the colour of multisampling buffer 0 applies to all subpixels". The second possible way is that a tile (or burstline) based architecture is used. With render-to-texture no compression seems to take place at all. As far as we know, colour compression in both ATi and nVidia chips is only active in conjunction with multisampling, and compresses by up to the level selected for antialiasing (i.e. up to 4:1 compression for 4x AA).
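Since the real implementation is unknown, the following is only a sketch of the first approach (a per-pixel note). The record layout and the function names are hypothetical; the point is merely to count colour writes.

```python
def write_pixel(subpixel_colours):
    """Return (stored_record, colour_writes) for one pixel.

    If all subpixels share one colour, it is written once together with a
    flag; otherwise every subpixel colour is written. The record layout is
    hypothetical.
    """
    first = subpixel_colours[0]
    if all(c == first for c in subpixel_colours):
        return ("same", first), 1
    return ("full", list(subpixel_colours)), len(subpixel_colours)

def read_pixel(record, samples):
    """Expand a record back into per-subpixel colours."""
    kind, payload = record
    return [payload] * samples if kind == "same" else list(payload)

inner_record, inner_writes = write_pixel([5, 5, 5, 5])  # pixel inside a polygon
edge_record, edge_writes = write_pixel([5, 5, 0, 0])    # pixel on a visible edge
```

An inner pixel needs one colour write instead of four (the 4:1 case for 4x AA), while an edge pixel is written the usual way. The memory for all four subpixels still has to be reserved, which is why only bandwidth is saved.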

As shown before, there is a significant increase in frame rate, when Z compression is enabled. A colour compression is the next logical step and delivers a further performance boost, especially in modes like 4x and higher. ATi market this feature as part of the HyperZ III package, where they lump together various methods to increase efficiency, while nVidia adds it to the image quality enhancing package "Intellisample". How much further Intellisample HCT (which was introduced with NV35) boosts the speed, is still unknown. The FX 5200 is capable of multisampling antialiasing, but not of colour compression.

Framebuffer compression is absolutely essential to fully make use of the multisampling potential. Without any loss of quality the speed is increased, especially when there's a lot of subpixels, a case where every bit of extra performance is very welcome anyway. Cards without this kind of colour pseudo-compression only realise half of multisampling's potential.

Multisampling alone only saves fillrate, but thanks to these additional bandwidth savings, antialiasing gains a lot of speed. So what's the flipside of the coin?

Textures, pixelshaders and multisampling: Where there's light, there must be shadows

The pipeline samples one texture value, and writes it into the corresponding frame buffers for all four subpixels. Downfiltering reads out all the buffers and mixes the colours. Obviously nothing changes if you mix four identical colours. Hence textures are left alone by multisampling.

Picture: [The entire pixel belongs to the triangle. The same colour is written into the multisampling buffers 4 times and these 4 equal colours are mixed during downfiltering. Result: the pixel is exactly as if there wasn't any antialiasing.]

Still the Z test is taken per subpixel, which is absolutely necessary. Let's pretend two triangles intersect. To be able to smooth the new edge at the intersection, the test has to be done for each subpixel. If this isn't done there is no smoothing, as can be seen on Parhelia. Parhelia's antialiasing only works on first class polygon edges and not on intersection edges.

But what happens to the textures on polygon edges when multisampling? We remember: for every triangle one colour per pixel is sampled, and then applied to all subpixels that belong to the polygon. Thus the subpixels don't get their actual colour value. The resulting effect appears especially around edges, where it's possible that a texture value is sampled which isn't in the polygon. In DirectX 9 there is a means to avoid this, but only for pixel shader version 3.0 or higher. With a flag ("centroid") the sample position is adjusted to always fall into the polygon. Current cards have to cope with the fact that pixel colours for polygon edges may possibly be sampled in areas that aren't covered by the polygon.
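One way to picture the "centroid" adjustment is the sketch below. It assumes the adjusted position is the average of the covered subpixel positions; actual hardware may choose the position differently, and the function name and the subpixel offsets are made up for this example.

```python
def sample_position(sample_offsets, coverage):
    """Pick the position at which the pixel's single colour is evaluated.

    Returns the pixel centre when every subpixel is covered, otherwise the
    average position of the covered subpixels (one possible reading of
    "centroid"; an assumption of this sketch). Returns None if nothing is
    covered.
    """
    covered = [pos for pos, hit in zip(sample_offsets, coverage) if hit]
    if not covered:
        return None
    if len(covered) == len(sample_offsets):
        return (0.5, 0.5)
    x = sum(p[0] for p in covered) / len(covered)
    y = sum(p[1] for p in covered) / len(covered)
    return (x, y)

# Hypothetical ordered 2x2 subpixel positions inside one pixel.
offsets = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]

full = sample_position(offsets, [True, True, True, True])     # (0.5, 0.5)
edge = sample_position(offsets, [False, True, False, True])   # (0.75, 0.5)
```

Because a triangle is convex, the average of subpixel positions that lie inside it also lies inside it, so the adjusted position cannot fall outside the polygon.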

With pixel shader version 1.3 or higher (supported since GeForce4 Ti, Radeon 8500 and Parhelia), the colour value and the Z value can be altered in the shader. This is required to render Z correct bump mapping. Bump mapping creates roughness on a flat polygon by per-pixel lighting effects. This "bumpiness" is only an illusion; the polygon remains flat. The illusion breaks when such a bumpy polygon intersects another one. To calculate what occludes what, according to the simulated bumps, the pixel shader, which is responsible for bump mapping, also has to alter the Z value, since this value is used for the "occlusion" check.

But afterwards the changed Z value applies to all the subpixels. In summary: it's now possible to get per-pixel correct occlusion out of the bumps, but it's not subpixel accurate.

There are some other multisampling approaches with their own peculiarities. Let's talk about these next.

A personal note:
"Centroid" is pure nonsense (http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/010037.html)
... and "Z corrected bump mapping" is also rather silly, because it only works for polygons that lie more or less exactly parallel to the projection plane.

zeckensack

2004-01-23, 12:16:41

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 6 of 8

Alternative multisampling methods and their problems

Matrox created its own antialiasing solution for Parhelia and called it "Fragment Anti-Aliasing". A colour compression is implemented, Z fillrate and bandwidth are saved as well. The problem of the unsmoothed intersection edges was already mentioned, but on the other hand the smoothing quality on outer polygon edges is quite good, since a 4x4 subpixel mask is used. This is a bit over the top though, because with better positioning, almost equal quality could have been achieved with far fewer subpixels. In addition, the memory for antialiasing samples is limited (obviously there's no RAM for 16x fullscreen antialiasing), and as a consequence of that, in some cases a few polygon edges aren't smoothed at all.

As we mentioned before, since the framebuffer compression saves just bandwidth, AA modes with many subpixels require a lot of memory. The Matrox solution works only up to a certain number of AA samples per frame; anything above that isn't calculated.

To get this under control, a procedure called Z3 antialiasing was invented. Per pixel the colours of up to 3 triangles can be saved. Thus every pixel has 3 "slots" which can save a colour each, along with the information about how many subpixels the colour applies to. For more samples, the colours are mixed on-the-fly, but inaccuracies may occur when more than three triangles cover some samples of a pixel. This will also lead to problems with handling Z values correctly.

Z3 antialiasing doesn't save the Z values per subpixel, only per triangle that covers the pixel. It also uses a Z gradient that can be used to reconstruct the covered subpixels' Z values. But if a fourth triangle touches the pixel, this process hits a limitation. As a result, the colour values on the edges are a bit imprecise, but a lot of memory is saved. In consumer hardware this procedure hasn't been implemented so far.

All the mentioned problems could be solved at once, if Tile Based Deferred Rendering was used. Multisampling saves a lot of texel fillrate, but bandwidth also isn't a problem here, because every tile is rendered only once and is final when it's written into the framebuffer. Fast multisampling antialiasing with both a lot of subpixels and low memory footprint is practically only possible on a Tile Based Deferred Renderer.

Enough of the theory. Our practical investigation has so far only concerned the GeForce line. What about the Radeon?

Quality vs. Radeon: Who's the smoothest?

On Radeon 9500 or higher ATi offers multisampling as well. Here's the comparison:

Pictures: [GeForce since 4 Ti on the left, Radeon since 9500 on the right.]

Again, AA lines are shown in green, subpixels in red, and texel positions in blue. With 4x antialiasing the R300 core uses 4 temporary lines in the triangle setup. The better subpixel distribution also applies to the X axis. For the edges this results in a higher effective resolution:

Pictures: [A more intelligent grid delivers smoother edges.]

Although the chosen example isn't the best, it still demonstrates that the more efficient grid delivers smoother colour gradients.
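Why a different grid gives a higher effective edge resolution can be illustrated by counting how many distinct columns and rows the subpixels of one pixel occupy. The subpixel positions below are hypothetical and are not the measured positions of any GeForce or Radeon chip.

```python
def edge_equivalent_resolution(sample_offsets):
    """Count the distinct subpixel columns and rows within one pixel."""
    xs = {x for x, _ in sample_offsets}
    ys = {y for _, y in sample_offsets}
    return len(xs), len(ys)

# Ordered 2x2 grid: subpixels share columns and rows.
ordered_grid = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]

# Hypothetical sparse layout: every subpixel has its own column and row.
sparse_grid = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

ordered_res = edge_equivalent_resolution(ordered_grid)  # (2, 2)
sparse_res = edge_equivalent_resolution(sparse_grid)    # (4, 4)
```

With the same four subpixels, a nearly horizontal or nearly vertical edge can produce four intermediate steps on the sparse layout, but only two on the ordered grid.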

In terms of multisampling antialiasing quality the R300 (as well as its successors R350 and R360) is miles ahead of the NV30: while the NV30 can only subdivide the triangle into two temporary lines per final on-screen line, the R300 can split it into up to six. What GeForce3, 4 and FX offer in terms of edge smoothing has been rendered hopelessly outdated by the R300. And since the R300 core there's an additional mechanism, which increases the smoothing quality even more.

Gamma correct Downfiltering: What you calculate should be what you see.

Actually this isn't a multisampling specific thing, but since in practice this feature is only encountered along with multisampling antialiasing, we still want to mention it.

Generally gamma correction is used, when an image appears too dark. If just the brightness was cranked up on a dark image, black would turn to grey, and the lighter areas would merge, because everything above a certain brightness would become equally white. Dynamic range decreases and ultimately the image would be worse. Thus gamma correction is used. It has the convenient property to not reduce the contrast range, and instead turns the non-linear brightness of regular cathode ray tubes nearly back to linear.

For historic reasons, though, a lot of graphics were designed for common monitors, and linearising the brightness of such a frame results in an unexpectedly bright picture. Therefore a truly linear brightness response via gamma correction is often undesirable. For antialiasing the downfiltering should assume a linear response though: for example, if the procedure creates up to 3 colour graduations, all three should be equally distributed with regard to brightness.

In reality, moderately dark colours are displayed so dark that it's very hard to distinguish them from black. Hence, not all of the colour graduations are visible anymore, which results in a "jaggy" edge. But then again a full correction of the frame with the gamma function would deliver too bright images. That's why it's a good idea to gamma correct the subpixels before downfiltering, and to readjust the final pixel.

Picture: [If the inner and outer areas appear to have the same intensities, your monitor's brightness response is linear.]

As can be seen above, averaging two colours can result in unexpected brightness levels. Gamma corrected downfiltering solves this, but there are limits.
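The difference between plain and gamma corrected downfiltering can be sketched as follows. The display gamma of 2.2 is an assumed, typical value and not a figure from the article; framebuffer values are taken to range from 0.0 (black) to 1.0 (white), and the function names are made up for this example.

```python
def downfilter_naive(values):
    """Average the stored framebuffer values directly."""
    return sum(values) / len(values)

def downfilter_gamma(values, gamma=2.2):
    """Average in linear light, then re-encode for a display with the given gamma.

    gamma=2.2 is an assumed display response, not a value from the article.
    """
    linear = [v ** gamma for v in values]   # what the monitor would actually emit
    mean = sum(linear) / len(linear)        # mix the light, not the stored numbers
    return mean ** (1.0 / gamma)            # value the monitor needs to show that mix

# An edge pixel that is half black, half white.
naive = downfilter_naive([0.0, 1.0])       # 0.5
corrected = downfilter_gamma([0.0, 1.0])   # about 0.73
```

On a display with a gamma of 2.2 the naive value 0.5 emits only about 22% of full brightness, so the intermediate step looks too dark; the corrected value emits 50%.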

The driver needs to know, what the gamma correction setting of the device is, and then estimate which value would be ideal. On some displays, which perform the gamma correction outside the driver, this method can lead to an overcorrection, also resulting in worse visibility of the colour graduations.

We're actually now done with discussing multisampling. But some subtleties aren't obvious until the method is compared to another one. This shall happen on the next page.

zeckensack

2004-01-23, 13:04:54

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 7 of 8

Quality vs. supersampling: Can there be only one?

While multisampling calculates texture colours only per pixel, supersampling delivers automatic texture oversampling, which results in an effect similar to anisotropic filtering. At least in theory. In practice this is only true for GeForce cards. On Radeons, most drivers back-"correct" the texture detail via MIP-Map-LOD, to gain a bit of performance. With GeForce cards on the other hand the full quality advantage for textures is realised. On a Voodoo card it was necessary to alter the LOD control manually in order to get higher texture detail.

The "true" anisotropic filtering ("AF" for short) is able to achieve equal quality with significantly less rendering power, thus the combination MSAA + AF is preferred. But compared to supersampling, multisampling has a disadvantage with textures when the so-called "alpha testing" comes into play.

The alpha test can turn texture samples marked as translucent truly invisible, by abandoning the entire pixel after the alpha test and not writing it into the framebuffer. Especially older games make use of this technique. Edges within the texture emerge; supersampling smoothes that, multisampling of course doesn't. The texture doesn't get worse though, it just stays the way it would be without antialiasing.
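The difference between one alpha test per pixel and one alpha test per subpixel can be sketched in one dimension. The alpha texture, the threshold of 0.5 and the function names are hypothetical and serve only this example.

```python
def alpha_at(u):
    """Hypothetical alpha texture: opaque left of u = 0.6, transparent right of it."""
    return 1.0 if u < 0.6 else 0.0

def pixel_multisampled(u_centre, threshold=0.5):
    """One alpha test per pixel; the result applies to all subpixels."""
    return 1.0 if alpha_at(u_centre) >= threshold else 0.0

def pixel_supersampled(u_left, u_right, samples=4, threshold=0.5):
    """One alpha test per subpixel; the visible fraction is averaged."""
    step = (u_right - u_left) / samples
    hits = sum(alpha_at(u_left + (k + 0.5) * step) >= threshold
               for k in range(samples))
    return hits / samples

# A pixel spanning u = 0.0 .. 1.0 that contains the edge inside the texture.
msaa_result = pixel_multisampled(0.5)       # 1.0: fully visible, hard edge
ssaa_result = pixel_supersampled(0.0, 1.0)  # 0.5: half of the subpixels pass
```

With multisampling the pixel is either kept or discarded as a whole, exactly as without antialiasing; with supersampling the edge inside the texture receives an intermediate value.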

When talking about textures, the term "Nyquist boundary" has to be mentioned as well. This boundary makes it possible to calculate which texture resolution has to be chosen for the image to be as sharp as possible without shimmering. In short, a ratio of "1 filtered texel per pixel" is ideal. But for AF more filtered texels per pixel are needed, and these then have to be sampled from higher resolution MIP textures.

Thanks to adaptive oversampling from higher resolution mipmap levels, AF reduces the blurriness of textures at sharp angles. The alpha texels are also sampled from the higher resolution textures. But in that case the alpha test is performed only once for the whole pixel, not per AF subsample. This is a violation of the Nyquist criterion and results in horrible flickering. The flickering would be less with AF disabled, but then its positive effects would obviously be lost as well.

Supersampling does not solve this problem, but the quality is clearly improved. For one thing, normal textures get an AF-like effect and can still improve further when AF is activated. But more importantly, the flickering with alpha textures is reduced because internally the entire alpha texture is sampled in smaller steps. Instead of just transparent pixels, supersampling now also permits transparent subpixels. This advantage is partly nullified again though, because the finer internal texture sampling also means that a higher resolution MIP level is sampled (if one is available). But in practice, such a frame is still more comfortable to look at with supersampling than with multisampling, where finely branched treetops in particular tend to flicker strongly.

With alpha blending (instead of alpha testing) the flickering is more of a minor issue. Alpha blending allows smooth transitions between visible and transparent. For example, a texel can be marked "20% visible". In this case the colour is sampled from the texture, and the colour already stored at this spot is read from the framebuffer. Then the two colours are mixed accordingly and the result is written back into the framebuffer. But there's a problem: alpha blending only works properly when rendering from back to front. That's because the objects in the back have to be visible through the ones in the front, not vice versa. This back-to-front rendering order is a disadvantage in terms of efficiency, though (normally one tries to render from front to back). With a static rendering order, wrong results can additionally occur from certain angles, as in Max Payne: from below the lattice stairs look fine, from above the snow shines through where bars are supposed to be.
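Why the order matters can be shown with a small sketch of the usual blend step (source colour weighted by alpha, framebuffer colour weighted by the remainder). The colours and the 50% alpha values are made-up illustration values, and the depth buffer is left out entirely.

```python
def blend(src, alpha, dst):
    """Mix a source colour over the framebuffer colour: alpha*src + (1-alpha)*dst."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))

BLACK = (0.0, 0.0, 0.0)  # framebuffer content before drawing
RED = (1.0, 0.0, 0.0)    # far object, 50% visible
BLUE = (0.0, 0.0, 1.0)   # near object, 50% visible

# Correct order: far object first, near object last.
back_to_front = blend(BLUE, 0.5, blend(RED, 0.5, BLACK))

# Wrong order: near object first, far object last.
front_to_back = blend(RED, 0.5, blend(BLUE, 0.5, BLACK))

print(back_to_front)  # (0.25, 0.0, 0.5)
print(front_to_back)  # (0.5, 0.0, 0.25)
```

The two results differ, so the same pair of translucent surfaces looks right from one side and wrong from the other when the drawing order is fixed.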

As long as the rendering order takes into account which textures are treated with alpha blending, this method is superior in quality compared to alpha testing. But it requires more work from the graphics card and CPU.

From a quality point of view, the best solution appears to be a combination of supersampling and alpha blending. Unfortunately, all modern cards offer supersampling (if at all) only in the inefficient oversampling mode. This burns more performance than it generates quality. Hence, the multisampling/oversampling combinations offered by nVidia since GeForce3 cannot be recommended, except when there is unused power available and no option to invest it in a higher resolution. That the 4xS mode still became popular is mainly due to the generally bad 4x MSAA performance of GeForce3 and 4.

Pictures: [On the right 4xS: 4 subpixels, 2 subtexels. This mode is superior to 4x MS, but far from ideal.]

For the sake of completeness, here is the 4xS mode of GeForce3 as well:

Picture: [4xS was available via tweak programs long before it was announced officially for GeForce4.]

Just as a side note, GeForce4 MX is only capable of 2x MSAA; 4x MSAA isn't possible. Its 4x mode is simply 2x2 oversampling. 4xS, as a combination of 2x MSAA and 1x2 oversampling, is still available though.
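How the modes mentioned here are composed can be written down as simple arithmetic. The helper `hybrid_mode` is a hypothetical name; it only multiplies the factors named in the text and says nothing about where the samples are placed.

```python
def hybrid_mode(msaa_samples, os_x, os_y):
    """Return (subpixels, subtexels) per final pixel for a mode that combines
    multisampling with os_x * os_y oversampling."""
    subtexels = os_x * os_y               # texture is sampled once per oversampled pixel
    subpixels = msaa_samples * subtexels  # geometry samples add up across both methods
    return subpixels, subtexels

MODES = {
    "4x MSAA": (4, 1, 1),
    "4xS (2x MSAA with 1x2 oversampling)": (2, 1, 2),
    "4x on GeForce4 MX (2x2 oversampling)": (1, 2, 2),
}

for name, factors in MODES.items():
    print(name, hybrid_mode(*factors))
# 4x MSAA (4, 1)
# 4xS (2x MSAA with 1x2 oversampling) (4, 2)
# 4x on GeForce4 MX (2x2 oversampling) (4, 4)
```

All three modes end up with 4 subpixels per pixel, but they differ in how many texture samples have to be paid for.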

Time to draw a conclusion.

zeckensack

2004-01-23, 13:20:10

Multisampling Antialiasing — A Closeup View

May 22nd, 2003 / by aths / page 8 of 8

Conclusion: What do we want?

Generally multisampling antialiasing (with AF activated) is superior to supersampling, because better quality is achieved at the same performance, or, respectively, comparable quality at higher performance. At least as long as the resolution can still be increased. A higher resolution has smoother edges and more detailed textures. Thus our primary goal is to have smooth, high resolution edges. When antialiasing is switched off, even at 1600x1200 the jaggies are too annoying for true quality nuts.

Antialiasing can deliver more quality than it costs in performance, and therefore it will never be "unnecessary". But how is this gain created?

If the resolution of both axes is doubled, edges become "twice" as smooth, but this requires four times the performance. Basically it's the same with oversampling: more performance is lost than quality is gained. 4x rotated grid supersampling also requires 4 times the performance, but on the other hand it also smooths the edges 4 times better. The performance investment is in balance with the quality gain.

4x rotated grid multisampling costs far less than an additional 300% of performance, but still offers "4 times smoother" edges. Now we get more back than we spend.

Pictures: [4x OG doubles sampling accuracy on both axes while 4x RG offers 4 times base accuracy. Hence the RG mask more effectively catches slopes which results in smoother edges.]
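The difference in axis accuracy can be checked with a short sketch. The subpixel positions below are illustrative assumptions, not the exact masks of any card: the ordered grid shares each x and y position between two subpixels, while the rotated grid gives every subpixel its own.

```python
OG_4X = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
RG_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]
RG_2X = [(0.25, 0.25), (0.75, 0.75)]

def axis_resolution(mask):
    """Number of distinct sample positions on the x axis and on the y axis."""
    xs = {x for x, _ in mask}
    ys = {y for _, y in mask}
    return len(xs), len(ys)

def coverage_levels(mask, steps=64):
    """Distinct coverage values seen while a vertical edge sweeps across the pixel."""
    levels = set()
    for i in range(steps + 1):
        edge_x = i / steps
        covered = sum(1 for x, _ in mask if x < edge_x)
        levels.add(covered / len(mask))
    return sorted(levels)

print(axis_resolution(OG_4X), coverage_levels(OG_4X))  # (2, 2) [0.0, 0.5, 1.0]
print(axis_resolution(RG_4X), coverage_levels(RG_4X))  # (4, 4) [0.0, 0.25, 0.5, 0.75, 1.0]
print(axis_resolution(RG_2X), coverage_levels(RG_2X))  # (2, 2) [0.0, 0.5, 1.0]
```

For a nearly vertical edge, the assumed 4x OG mask produces only one intermediate shade, the same as the 2x RG mask, while 4x RG produces three.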

With 4x OG on the other hand 2 subpixels can be omitted without really being penalised.

Pictures: [With 2x RG the edge smoothing is a bit worse than with 4x OG. But not significantly, since the internal increase in axis resolution is the same.]

Since multisampling was introduced with GeForce3, there have been complaints about all successors including GeForce FX 5900, that the 4x MSAA mode doesn't improve quality a lot compared to 2x, because the 4x mode still uses the inefficient rectangular grid.

There are cases which are generally problematic with multisampling, whether it's on a Radeon or a GeForce. Maliciously this could be called a "cheat" or at least a "hack", simply because several things are only calculated once per pixel, but are applied to all subpixels.

In any case a subpixel mask with a rotated grid should be preferred over a mask with an ordered grid though, because with the same number of subpixels (hence the same amount of work) the smoothing result is significantly better. The fact that GeForce FX doesn't support multisampling modes higher than 4x, and not up to 6x like Radeon 9500 and higher do, isn't really a disadvantage. What does have an effect on image quality, though, is the fact that the 4x mode uses an ordered grid, and thus the smoothing quality is far behind what Radeon 9500 or higher achieve. So be careful when reading antialiasing benchmarks comparing "4x vs. 4x"; they are usually pointless, because very different subpixel masks are used.

To avoid the multisampling problems mentioned above, it would be nice if rotated grid supersampling was offered as well. GeForce cards since GeForce3 at least have hybrid modes which mix multisampling with oversampling, which unfortunately is the worst supersampling method. In theory the gained quality costs too much speed to be worthwhile, but in practice there are games which aren't too demanding, and there the unused performance can still be invested into additional quality.

To a large extent multisampling costs no fillrate, and if both colour and Z compression are offered, only little additional bandwidth is used. Therefore such antialiasing is nearly "for free". It can be expected that future games will account for potential multisampling problems and that, as a result, supersampling will become less and less important.

This article wouldn't have been possible without the patient help of Xmas and especially Demirug. I am very grateful for this support.

This article represents the best of our knowledge. If you find a mistake please mail us, or post on our board.