
NVIDIA's rumor mill, or the source of the multichip, tile-based and eDRAM rumors


Demirug
2003-05-08, 21:10:45
Where do the rumors keep coming from that the next NVIDIA chip

- uses a tile-based approach
- is a multi-chip solution
- employs embedded DRAM

???

From NVIDIA itself.

If you dig around a bit in NVIDIA's developer area, you'll find this nice contribution among the presentation slides for GDC 2002:

http://developer.nvidia.com/docs/IO/2642/ATT/gdc2002_what_comes_after_4.pdf

Now take a look at page 17. :D

mapel110
2003-05-08, 22:07:50
aha

if the NV40 has 350 million transistors, a multi-chip solution does become an option after all. :D

Richthofen
2003-05-08, 22:44:13
I think multi-chip can be ruled out.
Given how limited the capacities of current manufacturing processes are, multi-chip can't really be up for debate.

eDRAM and TBR are certainly possible. I do think eDRAM is an option, but certainly not the 16 MB that were recently announced.
TBR is another matter. For that, the existing chip design would have to be shaken up considerably.
Somehow the chip still has to fit into the NV3x concept. The NV40 is a new design again, but I do think the NV3x line will be kept, improved and expanded.

GloomY
2003-05-08, 23:40:31
Lol, and Gordon Moore misquoted once again. He never said anything about performance...

Still, I find it remarkable that nVidia at least sees TBRs as a possible solution. Considering that two years ago they still flatly rejected the idea... ("Tiling is bad.")
Maybe they'll come to their senses eventually :D

Ailuros
2003-05-09, 01:40:37
2000 Transform and lighting (…DX8)

Excuse me?

Besides, "tile based" can mean a lot of things these days. In a relative sense even the original Voodoo had tiles as far as memory optimizations are concerned.

If, on an IMR, you break the rendering up into two passes for shading (an unshaded pass followed immediately by a shaded pass), then in this sense the rendering is also delayed, ergo deferred.

Excerpt:

About hierarchical z buffering. It provides efficient z checking and hidden surface removal under most circumstances. However, there are some cases where it does not work well.

Problems

There are 3 major problems with hierarchical z buffering based z checking:

1. The parallel plane problem.
2. The pin hole problem.
3. The z ordering problem.

The parallel plane problem refers to multiple parallel planes in close proximity to each other and at a slant angle to the viewer. Examples of this are a deck of cards, the pages of a book, some sheets of paper on a desk, paintings hung on a wall, etc. The close proximity of the planes and their slant means that the znear of the hidden planes is closer than the zfar of the visible plane down to a few pixels. Thus the z check fails at the higher levels of the hierarchy even though the triangles are processed front to back and the further triangles are all hidden.

The pin hole problem refers to small random holes in the scene preventing larger areas from filling up. This is a common problem in natural scenes such as trees and forests. This problem is worse than might be expected since even for holes that are eventually filled by distant geometry, all the geometry that occurs before the hole is filled fails the z check at the higher levels of the hierarchy. Note that like the problem above, this problem happens even though the geometry is processed front to back.

The z ordering problem is much more apparent. Most scenes are not ordered front to back since sorting by render state is usually more important.
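
To make the mechanism these three problems defeat concrete, here is a minimal C++ sketch of a conservative hierarchical-z reject test; the data layout is an assumption for illustration, not NVIDIA's or ATI's actual hardware logic. Each pyramid cell stores the farthest depth it covers, and a span is culled only when its nearest depth lies behind that value.

#include <vector>

// One level of an assumed z-max pyramid: each cell holds the FARTHEST depth
// of all pixels that cell covers (1.0 = far plane).
struct HiZLevel {
    int width = 0, height = 0;
    std::vector<float> zFar;
    float cellZFar(int x, int y) const { return zFar[y * width + x]; }
};

// True  -> the span is provably hidden at this level and can be culled.
// False -> inconclusive; descend to a finer level or z-test per pixel.
// The parallel-plane and pin-hole cases are exactly where the stored zFar
// stays too far away, so this coarse reject never fires and work falls through.
bool hiZReject(const HiZLevel& level, int cellX, int cellY, float spanZNear)
{
    return spanZNear > level.cellZFar(cellX, cellY);
}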


Solutions for Hierarchical Z Buffers

1. The parallel plane problem can be solved by z checking multiple cells/pixels (say 4 or more) in parallel per pipe for each level of the z hierarchy. That is, the solution to the first problem is to use a combination of hierarchical z and parallel z checks (a combination of ATI's and NVidia's approaches).

2. The pin hole problem is a much more general problem that all occlusion culling algorithms face. One solution is to use something like the HOM algorithm that ignores small holes and treats them as occluded. However, this causes artifacts in the scene and I don't recommend it. The best solution is to use faster z checking hardware with more parallel z checking to solve the problem. So the best solution here is the same as the solution to 1.

3. The z ordering problem can be solved by performing application driven deferred rendering. The geometry is processed once with all shading turned off, z writing turned on, and no lighting done in the vertex shader. The second time it is processed with shading turned on, z writing turned off (but z checking turned on in both cases), and lighting processed. The first pass only processes z's, which are compressed and so it consumes little bandwidth. Since the render states do not matter on the first pass the geometry can be processed strictly front to back. On the second pass, the z buffer is fully set and only visible surfaces are pixel shaded and rendered and the geometry is ordered by render state. This creates a mechanism for deferred shading using immediate mode rendering.
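
A minimal sketch of the application-driven two-pass scheme from point 3, written against plain OpenGL state; drawOpaqueFrontToBack() and drawOpaqueByRenderState() are hypothetical stand-ins for the application's own submission code.

#include <GL/gl.h>

void drawOpaqueFrontToBack();     // hypothetical: geometry sorted front to back
void drawOpaqueByRenderState();   // hypothetical: geometry sorted by render state

void renderFrameWithZPrepass()
{
    glEnable(GL_DEPTH_TEST);

    // Pass 1: depth only, no shading; geometry submitted front to back so the
    // z-buffer (and any z hierarchy behind it) fills as early as possible.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    drawOpaqueFrontToBack();

    // Pass 2: full shading, depth writes off, depth test still on; geometry now
    // sorted by render state. Only surfaces that survived pass 1 get shaded.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);
    glDepthFunc(GL_LEQUAL);
    drawOpaqueByRenderState();

    glDepthMask(GL_TRUE);   // restore depth writes for the next frame
}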

Tilers

Deferred rendering tilers are also two pass mechanisms similar to solution 3 above. The major difference from solution 3 is that they perform the two passes in the driver without any application intervention, the first pass is essentially a very low resolution scan keeping the geometry in the (tile) buffer instead of a high res scan that keeps z values in the (z) buffer, and the second pass does not need to transform the geometry. On the first pass, solution 3 must store the first z (compressed and also in the hierarchy) while a tiler must store the triangle in the tile buffer. As triangle rates increase (and thereby scene complexity) the memory bandwidth tilts more in favor of solution 3 above, however, solution 3 requires more vertex shader performance.
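
For illustration, a minimal C++ sketch of the binning step a deferred-rendering tiler performs in its first pass, with an assumed 32-pixel tile size and a conservative bounding-box overlap test; real hardware bins post-transform parameters, not just screen positions.

#include <vector>
#include <algorithm>

struct Vec2 { float x, y; };
struct Triangle { Vec2 v[3]; };      // screen-space positions only, for brevity

constexpr int kTileSize = 32;        // pixels per tile edge (assumption)

// Returns, per tile, the list of triangle indices that touch that tile.
// The second pass would then shade each tile from its own list.
std::vector<std::vector<int>> binTriangles(const std::vector<Triangle>& tris,
                                           int screenW, int screenH)
{
    const int tilesX = (screenW + kTileSize - 1) / kTileSize;
    const int tilesY = (screenH + kTileSize - 1) / kTileSize;
    std::vector<std::vector<int>> bins(tilesX * tilesY);

    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        const Triangle& t = tris[i];
        // Conservative bound: every tile overlapped by the triangle's bounding box.
        float minX = std::min({t.v[0].x, t.v[1].x, t.v[2].x});
        float maxX = std::max({t.v[0].x, t.v[1].x, t.v[2].x});
        float minY = std::min({t.v[0].y, t.v[1].y, t.v[2].y});
        float maxY = std::max({t.v[0].y, t.v[1].y, t.v[2].y});

        int tx0 = std::max(0, static_cast<int>(minX) / kTileSize);
        int ty0 = std::max(0, static_cast<int>(minY) / kTileSize);
        int tx1 = std::min(tilesX - 1, static_cast<int>(maxX) / kTileSize);
        int ty1 = std::min(tilesY - 1, static_cast<int>(maxY) / kTileSize);

        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tilesX + tx].push_back(i);
    }
    return bins;
}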

The other major occlusion problem that all the above HSR methods face is eliminating T&L processing for hidden triangles. This is best accomplished by using a hierarchical visibility query of the bounding boxes in the scene graph in the first pass (before the lighting is done) and culling the geometry in those bounding boxes that are hidden. This method can be used with either a tiler or a hierarchical z buffer/fast z check.

http://www.beyond3d.com/forum/viewtopic.php?t=868&highlight=
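
As a rough illustration of the bounding-box visibility query mentioned at the end of the excerpt, here is a sketch using standard OpenGL occlusion queries (core since OpenGL 1.5, so it assumes a matching header or loader); drawBoundingBox() is a hypothetical helper, and a real scene-graph walk would batch the queries hierarchically instead of blocking on each result.

#include <GL/gl.h>

void drawBoundingBox(const float* minCorner, const float* maxCorner);  // hypothetical helper

bool nodeIsVisible(GLuint query, const float* minCorner, const float* maxCorner)
{
    // Draw only the bounding box, without touching color or depth buffers,
    // and count how many samples pass the depth test.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);
    glBeginQuery(GL_SAMPLES_PASSED, query);
    drawBoundingBox(minCorner, maxCorner);
    glEndQuery(GL_SAMPLES_PASSED);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);

    GLuint samples = 0;
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);  // blocks until the result is ready
    return samples > 0;   // if no box fragment passed, everything inside is hidden
}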

As purely personal speculation, I consider it more likely that NVIDIA will reach for hybrid solutions in the future rather than a pure display list renderer.

As for the "tiling is bad..." comment, 3dfx claimed much the same before they bought Gigapixel.

Ailuros
2003-05-09, 01:47:05
TBR is another matter. For that, the existing chip design would have to be shaken up considerably.

Not necessarily; for the reason, see the quote above (under the assumption that a true display list renderer is what's meant).

Gigapixel needed very little time to merge Rampage with Hydra (see "Fusion"). SageII (the separate geometry processor) came with its own on-die memory and GP's patented hierarchical Z method.