8x1 || 4x2 == doesn't matter? [Archive]

Unregistered

2003-02-27, 19:37:07

There's been a lot of confusion and controversy within the last few days regarding the architecture and number of pipelines on the GeForce FX chip. Both Beyond3D and Tech-Report got in-depth and technical on the matter as well. I was able to talk with NVIDIA last night on the phone and got a chance to go over the issue with the Product Manager at NVIDIA.

Head on over to our article to read what NVIDIA had to say and what we feel. Sample from the article:

NVIDIA feels confident that when more advanced games take advantage of the GeForce FX's features, then we'll truly see a performance and visual difference with the competition. I did point out to them that most people don't buy video cards based on potential, and they agreed. They said that's why they're working closely with developers to help get some games out, and encourage them to put benchmarks in them, so that potential can be realized sooner.
Update: MikeC over at NVNews also offers up some of his own thoughts based on our article.

Update 2: Looks like Croteam (they developed Serious Sam 1 & 2) shares the same opinion as I do:

Just wanted to write a word or two regarding the issue raised a couple of days ago. Seems like the whole Internet community wants to crucify nVidia about the controversy of how many rendering pipelines GeForceFX really has. Is it 8 pipelines with 1 texture unit, or 4 with 2, or ... uh... I don't know anymore. And it really DOESN'T matter that much!
The only thing that matters is how fast and how good it can render pixels. And both GeForceFX and Radeon9700 are great products, the kind of hardware that developers long for. So, personally, I don't care much what's "under the hood".

Don't get me wrong, I am into 3D-graphic hardware, but this pipeline thing really went out of proportion. Number of pipelines is a good hardware information, and that's all there's to it. It really doesn't need to reflect the speed of the hardware directly. Come to think of it... currently, there are no games that utilize even 1/3rd of nifty features these two boards have.

Guess that settles it from a developer's point of view, eh?

3dGPU (http://www.3dgpu.com/)

I think that as long as the performance is where it should be, it really shouldn't matter whether it's 4x2 or 8x1.
But NV is supposedly so bad with the FX because they "only" use 4x2 ^^

What do you all say?

Unregistered

2003-02-27, 19:52:00

Originally posted by Unregistered

3dGPU (http://www.3dgpu.com/)

I think that as long as the performance is where it should be, it really shouldn't matter whether it's 4x2 or 8x1.
But NV is supposedly so bad with the FX because they "only" use 4x2 ^^

What do you all say?

Well, then why does NVIDIA's marketing department turn it into an 8-pipeline design, or why isn't it willing to give real information about it?

Pussycat

2003-02-27, 20:46:07

With an odd number of textures, the design is a disadvantage. There's no sugar-coating that.
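The odd-texture point can be sketched with a toy model. This is only an illustration under simplifying assumptions that are not from the thread: each TMU delivers one bilinear sample per clock, a pipe loops until all textures of a pixel are fetched, and nothing else (bandwidth, ALUs, colour writes) limits throughput. The function name `pixels_per_clock` is made up for this example.

```python
from fractions import Fraction
from math import ceil

def pixels_per_clock(pipes, tmus_per_pipe, textures):
    """Toy model: pixels finished per clock when every pixel needs
    `textures` bilinear samples and each TMU delivers one per clock."""
    # A pipe needs a whole number of clocks, so a leftover TMU slot is wasted.
    clocks_per_pixel = ceil(textures / tmus_per_pipe)
    return Fraction(pipes, clocks_per_pixel)

for textures in (1, 2, 3, 4):
    r8x1 = pixels_per_clock(8, 1, textures)
    r4x2 = pixels_per_clock(4, 2, textures)
    print(f"{textures} textures: 8x1 = {float(r8x1):.2f} px/clk, "
          f"4x2 = {float(r4x2):.2f} px/clk")
```

In this model the two layouts tie for even texture counts, and 4x2 falls behind for odd counts because one TMU per pipe idles on the last pass.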

Unregistered

2003-02-27, 21:14:43

8 x 1 = 8 Pixel Shader 2 units

4 x 2 = 4 Pixel Shader 2 units.

So for everything new (>= DX8.0) it does make a difference whether it's an 8x1 or a 4x2 setup.

Xmas

2003-02-28, 01:16:19

Originally posted by Unregistered
8 x 1 = 8 Pixel Shader 2 units

4 x 2 = 4 Pixel Shader 2 units.
That's not necessarily true. The usual notation is number of pipes x number of TMUs per pipe, and that by itself says nothing at all about the ALUs. Especially since it makes sense to have at least as many ALUs as TMUs, unless AF/trilinear is used everywhere.

Ailuros

2003-02-28, 02:29:53

Question: couldn't the limit of only 4 colour writes turn out to be a disadvantage?

zeckensack

2003-02-28, 07:14:19

Originally posted by Ailuros
Question: couldn't the limit of only 4 colour writes turn out to be a disadvantage?

It can.
The disadvantage tends to shrink the more complex the pixel shaders become.
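A rough way to see why the write limit matters less with longer shaders, as a sketch only: assume a hypothetical chip that can shade on 8 pipes but write at most 4 colour values per clock, and that a pixel occupies its pipe for a whole number of clocks. The helper `write_limit_loss` and these numbers are illustrative assumptions, not a description of any real chip.

```python
from fractions import Fraction

def write_limit_loss(pipes, clocks_per_pixel, colour_writes_per_clock=4):
    """Fraction of shaded pixels per clock that the colour-write limit
    holds back (0 means the write limit is not the bottleneck)."""
    shaded = Fraction(pipes, clocks_per_pixel)         # pixels shaded per clock
    delivered = min(shaded, colour_writes_per_clock)   # pixels actually written
    return 1 - delivered / shaded

for clocks in (1, 2, 4, 8):
    loss = write_limit_loss(8, clocks)
    print(f"{clocks} clocks per pixel: {float(loss):.0%} of throughput lost")
```

With a one-clock shader half of the shading rate is lost to the write limit; from two clocks per pixel onward the shader itself is the bottleneck in this model.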

Demirug

2003-02-28, 07:30:58

Originally posted by zeckensack
It can.
The disadvantage tends to shrink the more complex the pixel shaders become.

... and the better the texture filters in use are. If the TMUs can only do bilinear and you always use a trilinear filter, it makes no difference whether you have 4*2 or 8*1 pipelines.
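The trilinear case can be checked with the same kind of toy model. The assumptions here are mine, not from the thread: a trilinear sample costs two bilinear samples, each TMU delivers one bilinear sample per clock, and the two TMUs of a 4x2 pipe can work on the same texture in the same clock. `pixels_per_clock` is a hypothetical helper.

```python
from fractions import Fraction
from math import ceil

def pixels_per_clock(pipes, tmus_per_pipe, textures, bilinear_per_sample=2):
    """Toy model: pixels per clock when every texture lookup costs
    `bilinear_per_sample` bilinear fetches (2 = trilinear on bilinear TMUs)."""
    fetches = textures * bilinear_per_sample
    clocks_per_pixel = ceil(fetches / tmus_per_pipe)
    return Fraction(pipes, clocks_per_pixel)

for textures in (1, 2, 3):
    r8x1 = pixels_per_clock(8, 1, textures)
    r4x2 = pixels_per_clock(4, 2, textures)
    print(f"{textures} trilinear textures: 8x1 = {float(r8x1):.2f}, "
          f"4x2 = {float(r4x2):.2f} px/clk")
```

Because the number of bilinear fetches is always even with trilinear filtering, the second TMU of a 4x2 pipe never idles, so both layouts come out identical in this model.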

Power

2003-02-28, 08:08:25

Originally posted by Demirug

... and the better the texture filters in use are. If the TMUs can only do bilinear and you always use a trilinear filter, it makes no difference whether you have 4*2 or 8*1 pipelines.

Wouldn't Matrox's 4*4 design then be inherently more powerful, given the right distribution?

Demirug

2003-02-28, 08:49:27

Originally posted by Power

Wouldn't Matrox's 4*4 design then be inherently more powerful, given the right distribution?

Yes, the 4*4 pipeline (with 5 ALUs per pipe) really ought to beat a GF4 Ti in situations with long shaders and good filters. IMO the problem there isn't the pipeline itself but what comes before or after it.

robbitop

2003-02-28, 13:05:36

The Parhelia has 5 ALUs per pipe?

Then the lack of efficiency really can't come from there...

@Demi
What do you think, does the NV30 have 2 ALUs per pipe (with 4 pipes that would still be 8 ALUs)?

Demirug

2003-02-28, 13:21:02

Originally posted by robbitop
The Parhelia has 5 ALUs per pipe?

Then the lack of efficiency really can't come from there...

Yes, it does: http://www.3dcenter.de/artikel/parhelia/pic1.php

The problem, though, is that I don't know how powerful such an ALU is compared to an NVIDIA register combiner.

@Demi
What do you think, does the NV30 have 2 ALUs per pipe (with 4 pipes that would still be 8 ALUs)?

If my math is right, the NV30 has 32 FP ALUs spread across the pixel shaders. But how they are arranged, how independently of one another they work, and how many clocks an ALU needs per PS operation, I can't say.

robbitop

2003-02-28, 13:29:32

32??? :O
And how many does the R300 have?
The NV2x only had 4, didn't it??

Xmas

2003-02-28, 13:35:12

Originally posted by robbitop
32??? :O
And how many does the R300 have?
The NV2x only had 4, didn't it??
2 per pipeline * 4 pipelines * 4 components.

Demirug

2003-02-28, 13:39:11

Originally posted by Xmas

2 per pipeline * 4 pipelines * 4 components.

No, if you count that way the NV30 would even have 128. The 32 ALUs are most likely micro-ALUs, though. IMHO 4 of these ALUs form one complete ALU.

robbitop

2003-02-28, 13:40:25

So comparable in performance to the R300's 8 ALUs?

Demirug

2003-02-28, 13:40:27

Originally posted by robbitop
32??? :O
And how many does the R300 have?
The NV2x only had 4, didn't it??

The NV2x series has two register combiners per pipeline (each equivalent to a complete ALU). So with 4 pipelines that makes 8.

Demirug

2003-02-28, 13:41:17

Originally posted by robbitop
So comparable in performance to the R300's 8 ALUs?

With the current drivers, probably not (yet).

Xmas

2003-02-28, 13:43:25

Originally posted by Demirug
No, if you count that way the NV30 would even have 128. The 32 ALUs are most likely micro-ALUs, though. IMHO 4 of these ALUs form one complete ALU.
You mean mul, add, component-wise addition and ? (table lookup for rcp, rsq)?
But you could split up the NV25 in just the same way.

robbitop

2003-02-28, 13:43:53

Can you speculate about the causes?? ^^

Demirug

2003-02-28, 14:10:41

Originally posted by Xmas

You mean mul, add, component-wise addition and ? (table lookup for rcp, rsq)?
But you could split up the NV25 in just the same way.

Something like that. My information is that the NV30 can perform 128 scalar or 32 Vector4 FP operations per clock.

To what extent load and store instructions count toward that, and how swizzling and source modifiers figure in, I can't say.

If you count the loads and stores, but assume swizzling and source modifiers are already included in them, you get:

A multiply-add (mad, probably the most expensive 1-slot operation) would need 12 loads, 4 stores, 4 mul and 4 add = 24 scalar operations (spread over 4 clocks).

That would correspond to about 5.33 ops per clock.

A simple add comes to 8 loads, 4 stores, 4 add = 16 scalar ops, which corresponds to 8 pixels/clock.
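The arithmetic in this post can be reproduced in a few lines. This just restates the post's own counting scheme (loads per source component, one store per result component, one scalar op per arithmetic step and component) against the 128 scalar ops per clock figure; the function names are invented for the sketch and none of it is confirmed hardware behaviour.

```python
from fractions import Fraction

SCALAR_OPS_PER_CLOCK = 128  # figure quoted in the post above

def scalar_ops(sources, arithmetic_steps, components=4):
    """Scalar operations for one Vector4 instruction under the post's counting."""
    loads = sources * components            # one load per source component
    stores = components                     # one store per result component
    arithmetic = arithmetic_steps * components
    return loads + stores + arithmetic

def instructions_per_clock(cost):
    """How many such instructions fit into the per-clock scalar budget."""
    return Fraction(SCALAR_OPS_PER_CLOCK, cost)

mad_cost = scalar_ops(sources=3, arithmetic_steps=2)  # mul + add, 3 operands
add_cost = scalar_ops(sources=2, arithmetic_steps=1)
print(mad_cost, float(instructions_per_clock(mad_cost)))
print(add_cost, float(instructions_per_clock(add_cost)))
```

Under this counting a mad costs 24 scalar ops and an add costs 16, which gives 128/24 (about 5.33) and 128/16 = 8 instructions per clock, matching the figures in the post.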