
GPUPI


Lard
2014-12-24, 09:37:22
This program calculates pi fully in parallel on your graphics card.
https://www.overclockers.at/news/legends-never-die-gpupi

HD 7970 1200/1700MHz
http://www.forum-3dcenter.org/vbulletin/attachment.php?attachmentid=50717&stc=1&d=1419409374
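The log line "PI value output -> 5895585A0" is hexadecimal, and the log says it calculates one specific digit position rather than every digit before it. That pattern fits a BBP-style (Bailey–Borwein–Plouffe) digit-extraction formula, although nobody in this thread confirms that GPUPI uses it. Below is a minimal CPU sketch under that assumption. `bbp_series` and `pi_hex_digits` are hypothetical names, not GPUPI code, and plain double precision limits the sketch to small positions and a handful of digits:

```python
# Hypothetical sketch (not GPUPI code): hexadecimal digit extraction with the
# BBP formula
#   pi = sum_k 16^-k * (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6))
# evaluated in ordinary double precision on the CPU.

HEX = "0123456789ABCDEF"


def bbp_series(j: int, d: int) -> float:
    """Fractional part of sum_{k>=0} 16^(d-k) / (8k + j)."""
    s = 0.0
    # Terms k = 0..d: modular exponentiation keeps the numbers small.
    # Every term is independent of the others, so this loop could be split
    # across many parallel workers whose partial sums are added afterwards.
    for k in range(d + 1):
        denom = 8 * k + j
        s = (s + pow(16, d - k, denom) / denom) % 1.0
    # Terms k > d: each is 16x smaller than the previous one, so stop early.
    k = d + 1
    while True:
        term = 16.0 ** (d - k) / (8 * k + j)
        if term < 1e-17:
            break
        s = (s + term) % 1.0
        k += 1
    return s


def pi_hex_digits(d: int, n: int = 8) -> str:
    """Return n hex digits of pi, starting at position d + 1 after the point."""
    x = (4 * bbp_series(1, d) - 2 * bbp_series(4, d)
         - bbp_series(5, d) - bbp_series(6, d)) % 1.0
    digits = []
    for _ in range(n):
        x *= 16
        digit = int(x)
        digits.append(HEX[digit])
        x -= digit
    return "".join(digits)


print(pi_hex_digits(0))  # pi = 3.243F6A88... in hex, so this prints 243F6A88
```

Because the terms can be evaluated independently, this kind of sum maps well onto thousands of GPU threads. The catch is that their partial results must then be combined into a single value.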

Radeonfreak
2014-12-24, 10:28:06
http://abload.de/img/testnjpgf.gif

http://abload.de/img/unbenanntgypg1.png

http://abload.de/img/unbenannt168rtq.png

M4xw0lf
2014-12-24, 11:14:01
Is it down to OpenCL, or why is the GTX 980 barely faster than the HD 7970?

Tyrann
2014-12-24, 11:47:39
http://i.imgur.com/BYEjUCV.gif

http://i.imgur.com/1vw8ma2.png

kevsti
2014-12-24, 12:49:14
Quote: Is it down to OpenCL, or why is the GTX 980 barely faster than the HD 7970?
Haven't NVIDIA cards always been slower at GPU computing? Or at least since the HD 6xxx or HD 7xxx series.

Raff
2014-12-24, 20:42:59
Maxwell 2.0 really shouldn't fall behind in compute anymore.
But never mind that; here's Kepler on steroids first. I have to crank the clocks pretty hard to keep up here, and Hawaii is still completely out of reach. As expected, though, CUDA, NVIDIA's favorite child because it's proprietary, is slightly faster than OpenCL:

OpenCL GPU: NVIDIA GeForce GTX 780 Ti @ 1.300/3.800 MHz

OpenCL 1.1 CUDA 6.5.29 is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 00.242s Batch 1 finished.
00h 00m 01.085s Batch 2 finished.
00h 00m 01.844s Batch 3 finished.
00h 00m 03.371s Batch 4 finished.
00h 00m 06.752s Batch 5 finished.
00h 00m 09.800s Batch 6 finished.
00h 00m 10.567s Batch 7 finished.
00h 00m 11.324s Batch 8 finished.
00h 00m 12.844s Batch 9 finished.
00h 00m 16.201s Batch 10 finished.
00h 00m 19.229s Batch 11 finished.
00h 00m 19.995s Batch 12 finished.
00h 00m 20.752s Batch 13 finished.
00h 00m 22.277s Batch 14 finished.
00h 00m 25.652s Batch 15 finished.
00h 00m 28.694s Batch 16 finished.
00h 00m 29.460s Batch 17 finished.
00h 00m 30.216s Batch 18 finished.
00h 00m 31.734s Batch 19 finished.
00h 00m 35.086s Batch 20 finished.
00h 00m 38.030s PI value output -> 5895585A0

Device time for pi calculation: 37.220 s
Device time for memory reduction: 0.809 s

_____________________________________________________

CUDA GPU: GeForce GTX 780 Ti with compute capability 3.5 @ 1.300/3.800 MHz

=> Kernel 1, Batch Size: 20M, Blocks: 20480, Threads: 1024
=> Kernel 2, Batch Size: 20M, Blocks: 20480, Threads: 1024

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335545360 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 00.108s Batch 1 finished.
00h 00m 00.952s Batch 2 finished.
00h 00m 01.680s Batch 3 finished.
00h 00m 03.389s Batch 4 finished.
00h 00m 06.616s Batch 5 finished.
00h 00m 09.482s Batch 6 finished.
00h 00m 10.218s Batch 7 finished.
00h 00m 10.946s Batch 8 finished.
00h 00m 12.646s Batch 9 finished.
00h 00m 15.847s Batch 10 finished.
00h 00m 18.693s Batch 11 finished.
00h 00m 19.429s Batch 12 finished.
00h 00m 20.156s Batch 13 finished.
00h 00m 21.863s Batch 14 finished.
00h 00m 25.085s Batch 15 finished.
00h 00m 27.947s Batch 16 finished.
00h 00m 28.682s Batch 17 finished.
00h 00m 29.409s Batch 18 finished.
00h 00m 31.107s Batch 19 finished.
00h 00m 34.305s Batch 20 finished.
00h 00m 37.072s PI value output -> 5895585A0

Device time for pi calculation: 36.243 s
Device time for memory reduction: 0.829 s

Rest of the system: Phenom II X6 @ 4.0 GHz (Northbridge @ 2.94), Windows 7 x64

Best regards,
Raff
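Comparing runs like the two above is easier if you parse the two "Device time" summary lines instead of reading the numbers off by eye. Here is a minimal sketch; the function names are hypothetical, and it only handles the "Device time for ..." format of these early logs, not the later "Calculation + Reduction time" summary:

```python
import re

# Matches summary lines such as "Device time for pi calculation: 37.220 s".
DEVICE_TIME = re.compile(
    r"Device time for (pi calculation|memory reduction):\s*([\d.]+)\s*s")


def parse_device_times(log: str) -> dict[str, float]:
    """Map 'pi calculation' / 'memory reduction' to seconds."""
    return {name: float(value) for name, value in DEVICE_TIME.findall(log)}


def percent_faster(slow_s: float, fast_s: float) -> float:
    """How many percent faster the fast run is (time ratio minus one)."""
    return (slow_s / fast_s - 1.0) * 100.0


# Values copied from the GTX 780 Ti @ 1300/3800 MHz logs above.
opencl = parse_device_times("""Device time for pi calculation: 37.220 s
Device time for memory reduction: 0.809 s""")
cuda = parse_device_times("""Device time for pi calculation: 36.243 s
Device time for memory reduction: 0.829 s""")

gain = percent_faster(opencl["pi calculation"], cuda["pi calculation"])
print(f"CUDA pi calculation: {gain:.1f}% faster than OpenCL")
```

On these two runs, CUDA's pi calculation is under 3% faster, which fits "slightly faster". The memory reduction is actually marginally slower under CUDA (0.829 s vs. 0.809 s).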

Butterfly
2014-12-25, 10:51:40
@LARD
Thanks for the thread!
It really does look like multi-GPU isn't supported.
Up to the 7th loop my HD 7970 @1100-1500MHz is only at ~56% load; after that it climbs to 99%.

What exactly is the "Device time for memory reduction"?

http://abload.de/img/gpupi_79701100-1500tzuyb.jpg
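The batch lines in GPUPI's log are cumulative timestamps, so a low-load start like this one is easier to see as per-batch durations. Here is a minimal sketch (the function name is hypothetical) that uses the first six batches of Raff's GTX 780 Ti OpenCL run from earlier in the thread:

```python
import re

# Matches lines such as "00h 00m 06.752s Batch 5 finished."
BATCH_LINE = re.compile(r"(\d+)h (\d+)m ([\d.]+)s Batch (\d+) finished\.")


def batch_durations(log: str) -> list[tuple[int, float]]:
    """Convert cumulative batch timestamps into (batch, seconds) pairs."""
    durations = []
    previous = 0.0
    for hours, minutes, seconds, batch in BATCH_LINE.findall(log):
        elapsed = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        # Round to the log's millisecond resolution to hide float noise.
        durations.append((int(batch), round(elapsed - previous, 3)))
        previous = elapsed
    return durations


log = """00h 00m 00.242s Batch 1 finished.
00h 00m 01.085s Batch 2 finished.
00h 00m 01.844s Batch 3 finished.
00h 00m 03.371s Batch 4 finished.
00h 00m 06.752s Batch 5 finished.
00h 00m 09.800s Batch 6 finished."""

for batch, seconds in batch_durations(log):
    print(f"Batch {batch}: {seconds:.3f} s")
```

Even in this short excerpt, individual batches take anywhere from 0.242 s to 3.381 s, so per-batch times say more about each phase of the run than the running total does.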

atopisch
2014-12-25, 12:10:34
Core i7-4770K @ 4.4 GHz, GTX 980 @ 1498 MHz boost
CUDA:
http://abload.de/img/gpupixfa7r.png
http://abload.de/img/gpupi-gpu-z2nluo.png
http://abload.de/img/nvidia_20141225_1221570bwv.png

Lard
2014-12-25, 13:13:29
Quote: Up to the 7th loop my HD 7970 @1100-1500MHz is only at ~56% load; after that it climbs to 99%.
Same here.
At the start, the load sits at only 50/55% for a few seconds.
http://www.forum-3dcenter.org/vbulletin/attachment.php?attachmentid=50726&stc=1&d=1419510492
Quote: What exactly is the "Device time for memory reduction"?

After the actual calculation, all intermediate results are reduced to a single number, which accumulates over the course of the benchmark and forms the basis for the final result. This so-called "memory reduction" is a well-known problem in parallel programming, which is why it also runs on the OpenCL device.
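Lard's description matches the classic parallel reduction pattern. The generic sketch below is not GPUPI's actual kernel, and `tree_reduce` is a hypothetical name. It runs a pairwise tree reduction sequentially. Each round adds neighbouring pairs, which a GPU would do in parallel, so n partial results need about log2(n) rounds instead of n - 1 sequential additions. The 64 inputs are just an example:

```python
def tree_reduce(values: list[float]) -> tuple[float, int]:
    """Sum values pairwise; return (total, number of rounds)."""
    if not values:
        return 0.0, 0
    data = list(values)
    rounds = 0
    while len(data) > 1:
        # One round: the pairs (0,1), (2,3), ... are independent of each
        # other, so a GPU could assign one thread per pair.
        paired = [data[i] + data[i + 1] for i in range(0, len(data) - 1, 2)]
        if len(data) % 2:
            # Odd length: the last element waits for the next round.
            paired.append(data[-1])
        data = paired
        rounds += 1
    return data[0], rounds


total, rounds = tree_reduce([0.5] * 64)
print(total, rounds)  # 64 partial results -> 32.0 after 6 rounds
```

On real hardware, the hard part is the memory traffic between rounds rather than the additions themselves. That is one reason reduction is treated as a problem of its own in parallel programming.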

Butterfly
2014-12-25, 22:52:01
@Lard
Thanks for the info!
It seems to scale with the GPU clock, so lower values are better?
I'll have to check which feature set y-cruncher uses for the memory reduction.

Here are the complete values: http://abload.de/img/gpupi_79701125-1575dwr9f.jpg

Darkman.X
2014-12-26, 02:33:20
i7-5930K @ 4 GHz
GTX 780 Ti @ default

Image 1: CUDA
Device time for pi calculation: 48.605 s
Device time for memory reduction: 1.089 s

Image 2: GPU OpenCL
Device time for pi calculation: 49.653 s
Device time for memory reduction: 1.067 s

Image 3: CPU OpenCL 1.2 (Intel runtimes)
Device time for pi calculation: 603.612 s
Device time for memory reduction: 4.644 s

Image 4: CPU OpenCL 2.0 (Intel SDK)
Device time for pi calculation: 601.181 s
Device time for memory reduction: 5.433 s

Image 5: CPU OpenCL 2.0 (AMD SDK)
Device time for pi calculation: 547.627 s
Device time for memory reduction: 18.074 s

MORPHiNE
2014-12-26, 14:21:56
i7 4770K @ 4.4 GHz
Device time for pi calculation: 744.077 s
Device time for memory reduction: 25.755 s
http://i.imgur.com/OIzqrb0.png

R9 290X @ 1050/1350 MHz
Device time for pi calculation: 21.794 s
Device time for memory reduction: 0.979 s
http://i.imgur.com/SGI1cRx.png

Geldmann3
2014-12-26, 16:34:59
R9 290 at stock clocks

947/1250

OpenCL GPU: AMD Hawaii (40 CUs, 947 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 00.146s Batch 1 finished.
00h 00m 00.900s Batch 2 finished.
00h 00m 01.667s Batch 3 finished.
00h 00m 02.902s Batch 4 finished.
00h 00m 05.135s Batch 5 finished.
00h 00m 07.115s Batch 6 finished.
00h 00m 07.868s Batch 7 finished.
00h 00m 08.635s Batch 8 finished.
00h 00m 09.852s Batch 9 finished.
00h 00m 12.021s Batch 10 finished.
00h 00m 13.946s Batch 11 finished.
00h 00m 14.698s Batch 12 finished.
00h 00m 15.467s Batch 13 finished.
00h 00m 16.702s Batch 14 finished.
00h 00m 18.937s Batch 15 finished.
00h 00m 20.915s Batch 16 finished.
00h 00m 21.670s Batch 17 finished.
00h 00m 22.438s Batch 18 finished.
00h 00m 23.655s Batch 19 finished.
00h 00m 25.825s Batch 20 finished.
00h 00m 27.675s PI value output -> 5895585A0

Device time for pi calculation: 26.567 s
Device time for memory reduction: 1.108 s

Overclocked to 1070/1250

OpenCL GPU: AMD Hawaii (40 CUs, 947 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 00.137s Batch 1 finished.
00h 00m 00.823s Batch 2 finished.
00h 00m 01.524s Batch 3 finished.
00h 00m 02.641s Batch 4 finished.
00h 00m 04.656s Batch 5 finished.
00h 00m 06.440s Batch 6 finished.
00h 00m 07.123s Batch 7 finished.
00h 00m 07.820s Batch 8 finished.
00h 00m 08.921s Batch 9 finished.
00h 00m 10.878s Batch 10 finished.
00h 00m 12.614s Batch 11 finished.
00h 00m 13.296s Batch 12 finished.
00h 00m 13.991s Batch 13 finished.
00h 00m 15.108s Batch 14 finished.
00h 00m 17.120s Batch 15 finished.
00h 00m 18.907s Batch 16 finished.
00h 00m 19.590s Batch 17 finished.
00h 00m 20.285s Batch 18 finished.
00h 00m 21.387s Batch 19 finished.
00h 00m 23.346s Batch 20 finished.
00h 00m 25.013s PI value output -> 5895585A0

Device time for pi calculation: 23.898 s
Device time for memory reduction: 1.115 s

Overclocked to 947/1350

OpenCL GPU: AMD Hawaii (40 CUs, 947 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 00.148s Batch 1 finished.
00h 00m 00.919s Batch 2 finished.
00h 00m 01.706s Batch 3 finished.
00h 00m 02.969s Batch 4 finished.
00h 00m 05.251s Batch 5 finished.
00h 00m 07.270s Batch 6 finished.
00h 00m 08.041s Batch 7 finished.
00h 00m 08.828s Batch 8 finished.
00h 00m 10.073s Batch 9 finished.
00h 00m 12.287s Batch 10 finished.
00h 00m 14.252s Batch 11 finished.
00h 00m 15.024s Batch 12 finished.
00h 00m 15.810s Batch 13 finished.
00h 00m 17.072s Batch 14 finished.
00h 00m 19.349s Batch 15 finished.
00h 00m 21.369s Batch 16 finished.
00h 00m 22.143s Batch 17 finished.
00h 00m 22.929s Batch 18 finished.
00h 00m 24.174s Batch 19 finished.
00h 00m 26.390s Batch 20 finished.
00h 00m 28.277s PI value output -> 5895585A0

Device time for pi calculation: 27.032 s
Device time for memory reduction: 1.245 s

Overclocking the video memory doesn't improve performance.
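These three runs also allow a rough check of Butterfly's observation that the result scales with the GPU clock. The sketch below (hypothetical helper name) expresses the speedup as a fraction of the clock increase, where 1.0 would mean perfectly linear scaling. It uses the clocks stated in the post, because the log line itself reports 947 MHz in every run:

```python
def scaling_efficiency(clock_base: float, clock_new: float,
                       time_base: float, time_new: float) -> float:
    """Speedup gained divided by clock increase (1.0 = linear scaling)."""
    clock_gain = clock_new / clock_base - 1.0
    speed_gain = time_base / time_new - 1.0
    return speed_gain / clock_gain


# R9 290, "Device time for pi calculation" from the three runs above.
core = scaling_efficiency(947, 1070, 26.567, 23.898)     # core 947 -> 1070 MHz
memory = scaling_efficiency(1250, 1350, 26.567, 27.032)  # memory 1250 -> 1350 MHz
print(f"core clock: {core:.2f}, memory clock: {memory:.2f}")
```

Roughly 86% of the core clock increase shows up as a shorter calculation time. The memory overclock yields a negative value because that run was slightly slower. Three single runs are too few to rule out run-to-run variance, though.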

Lightning
2014-12-26, 16:59:05
Here's a smaller Fermi for comparison: GeForce GTX 460 @ 850/1000 MHz

OpenCL: 117.911 s | 2.376 s

http://abload.de/img/13tusy.png

CUDA: 114.917 s | 2.154 s

http://abload.de/img/2xiuml.png

=Floi=
2014-12-27, 06:11:18
and of course the great Austrian site is down again...

Spasstiger
2014-12-27, 09:08:30
1 billion digits of pi in 25 seconds, respect. The first time 1 billion decimal places of pi were ever calculated was in 1989.

Darkman.X
2014-12-27, 11:58:29
While stocks last (the file will probably be blocked or deleted if traffic gets excessive):

Download (www.google.de)


EDIT:
The original site is reachable again, so I've deleted the download.

krypton
2014-12-31, 20:57:23
The GTX 680 is getting a bit long in the tooth too:
http://abload.de/img/gpupiyqs1n.png

teezaken
2015-01-01, 18:43:39
GTX 770 @ 1.3 GHz

http://abload.de/img/unbenannt7yuh1.jpg

M4xw0lf
2015-01-02, 13:07:29
The website says the program calculates in FP64 (partly even extended to 128 bits), but then the results surprise me a bit. In that case the HD 7970 should actually be faster than the 290 cards, because its DP performance is cut down less (or not at all).
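This argument can be put into numbers with a back-of-the-envelope peak FP64 estimate. The sketch assumes 64 shader lanes per GCN compute unit and one FMA (2 FLOPs) per lane per clock. It also assumes the commonly quoted DP:SP ratios of 1/4 for Tahiti (HD 7970, assumed 32 CUs) and 1/8 for Hawaii consumer cards; the thread states neither the ratios nor the HD 7970 CU count. The Hawaii CU counts come from the logs in this thread, and the clocks are the ones posted:

```python
def peak_fp64_gflops(compute_units: int, clock_mhz: float, dp_ratio: float,
                     lanes_per_cu: int = 64) -> float:
    """Theoretical peak: lanes * 2 FLOPs (FMA) * clock * DP:SP ratio."""
    return compute_units * lanes_per_cu * 2 * clock_mhz * dp_ratio / 1000.0


# name: (compute units, clock in MHz as posted, assumed DP:SP ratio)
cards = {
    "HD 7970 @ 1200 MHz": (32, 1200, 1 / 4),
    "R9 290X @ 1040 MHz": (44, 1040, 1 / 8),
    "R9 290 @ 947 MHz": (40, 947, 1 / 8),
}

for name, (cus, clock, ratio) in cards.items():
    print(f"{name}: {peak_fp64_gflops(cus, clock, ratio):.0f} GFLOPS FP64")
```

On paper the HD 7970 leads clearly, yet the posted results don't reflect that. This would suggest GPUPI's runtime is not governed by peak FP64 throughput alone, but the thread doesn't settle what the actual bottleneck is.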

Voodoo
2015-01-03, 13:12:29
R7 260X

pi calculation: 116.363 s
memory reduction: 1.713 s

750Ti OpenCL

pi calculation: 155.942 s
memory reduction: 3.754 s

750Ti Cuda

pi calculation: 149.670 s
memory reduction: 3.860 s

Why does the 750 Ti (OpenCL & CUDA) cause such high CPU load when the 260X doesn't?

Rooter
2015-01-03, 14:54:25
NVIDIA GeForce GTX 550 Ti (4 CUs, 2000 MHz)

OpenCL:
Device time for pi calculation: 175.358 s
Device time for memory reduction: 2.866 s

CUDA:
Device time for pi calculation: 171.047 s
Device time for memory reduction: 2.926 s

:freak:

Regards,
Rooter

HarryHirsch
2015-05-16, 00:40:38
http://abload.de/img/2015-05-16_003821e8o0d.png

The world record stands at 2.531 s. :freak:

labecula
2015-05-16, 09:31:29
NVIDIA GeForce GTX 980 AMP!Extreme (16 CUs, 1392 MHz)
i7 2600K 4.3Ghz -> PCIe2.0, Win7 x64, 350.12WHQL
OpenCL 1.2
Cuda 7.0.0
20M/64


OpenCL:
Device time for pi calculation: 29.963s
Device time for memory reduction: 0.986s

CUDA:
Device time for pi calculation: 30.384s
Device time for memory reduction: 0.992s

Achill
2015-05-16, 10:31:12
r290x@1040/1300 (Stock) => 24.403s
OpenCL 2.0 AMD-APP is ready. Timer: HPET (14.32 MHz)

OpenCL GPU: AMD Radeon R9 290X (44 CUs, 1040 MHz)
Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 00.642s Batch 1 finished.
00h 00m 01.288s Batch 2 finished.
00h 00m 02.239s Batch 3 finished.
00h 00m 04.246s Batch 4 finished.
00h 00m 06.169s Batch 5 finished.
00h 00m 06.799s Batch 6 finished.
00h 00m 07.450s Batch 7 finished.
00h 00m 08.397s Batch 8 finished.
00h 00m 10.350s Batch 9 finished.
00h 00m 12.217s Batch 10 finished.
00h 00m 12.847s Batch 11 finished.
00h 00m 13.493s Batch 12 finished.
00h 00m 14.443s Batch 13 finished.
00h 00m 16.446s Batch 14 finished.
00h 00m 18.366s Batch 15 finished.
00h 00m 18.996s Batch 16 finished.
00h 00m 19.643s Batch 17 finished.
00h 00m 20.583s Batch 18 finished.
00h 00m 22.534s Batch 19 finished.
00h 00m 24.403s PI value output -> 5895585A0

Statistics

Calculation + Reduction time: 23.306s + 1.079s

r290x@1125/1300 => 22.884s
OpenCL 2.0 AMD-APP is ready. Timer: HPET (14.32 MHz)

OpenCL GPU: AMD Radeon R9 290X (44 CUs, 1040 MHz)
Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 00.600s Batch 1 finished.
00h 00m 01.209s Batch 2 finished.
00h 00m 02.102s Batch 3 finished.
00h 00m 03.978s Batch 4 finished.
00h 00m 05.777s Batch 5 finished.
00h 00m 06.371s Batch 6 finished.
00h 00m 06.984s Batch 7 finished.
00h 00m 07.868s Batch 8 finished.
00h 00m 09.696s Batch 9 finished.
00h 00m 11.448s Batch 10 finished.
00h 00m 12.043s Batch 11 finished.
00h 00m 12.651s Batch 12 finished.
00h 00m 13.545s Batch 13 finished.
00h 00m 15.421s Batch 14 finished.
00h 00m 17.217s Batch 15 finished.
00h 00m 17.812s Batch 16 finished.
00h 00m 18.420s Batch 17 finished.
00h 00m 19.307s Batch 18 finished.
00h 00m 21.135s Batch 19 finished.
00h 00m 22.884s PI value output -> 5895585A0

Statistics

Calculation + Reduction time: 21.735s + 1.130s

seaFs
2015-05-16, 13:57:40
HD6970 880/1350 --> 111.375s + 0.521s
OpenCL 1.2 AMD-APP is ready. Timer: RTC (1 ms)

OpenCL GPU: AMD Radeon HD 6970 (24 CUs, 880 MHz)
Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 03.003s Batch 1 finished.
00h 00m 06.043s Batch 2 finished.
00h 00m 10.398s Batch 3 finished.
00h 00m 19.619s Batch 4 finished.
00h 00m 28.444s Batch 5 finished.
00h 00m 31.443s Batch 6 finished.
00h 00m 34.484s Batch 7 finished.
00h 00m 38.759s Batch 8 finished.
00h 00m 47.544s Batch 9 finished.
00h 00m 55.956s Batch 10 finished.
00h 00m 58.955s Batch 11 finished.
00h 01m 01.995s Batch 12 finished.
00h 01m 06.351s Batch 13 finished.
00h 01m 15.571s Batch 14 finished.
00h 01m 24.396s Batch 15 finished.
00h 01m 27.395s Batch 16 finished.
00h 01m 30.435s Batch 17 finished.
00h 01m 34.709s Batch 18 finished.
00h 01m 43.496s Batch 19 finished.
00h 01m 51.907s PI value output -> 5895585A0

Statistics

Calculation + Reduction time: 111.375s + 0.521s

HD5850 760/1050 --> 125.276s + 6.255s
OpenCL 2.0 AMD-APP is ready. Timer: RTC (1 ms)

OpenCL GPU: AMD Radeon HD 5850/6850 (18 CUs, 760 MHz)
Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 03.298s Batch 1 finished.
00h 00m 06.654s Batch 2 finished.
00h 00m 11.618s Batch 3 finished.
00h 00m 22.091s Batch 4 finished.
00h 00m 32.115s Batch 5 finished.
00h 00m 35.749s Batch 6 finished.
00h 00m 39.555s Batch 7 finished.
00h 00m 44.751s Batch 8 finished.
00h 00m 55.215s Batch 9 finished.
00h 01m 05.064s Batch 10 finished.
00h 01m 08.723s Batch 11 finished.
00h 01m 12.485s Batch 12 finished.
00h 01m 17.744s Batch 13 finished.
00h 01m 28.532s Batch 14 finished.
00h 01m 38.969s Batch 15 finished.
00h 01m 42.997s Batch 16 finished.
00h 01m 46.515s Batch 17 finished.
00h 01m 51.478s Batch 18 finished.
00h 02m 01.900s Batch 19 finished.
00h 02m 11.560s PI value output -> 5895585A0

Statistics

Calculation + Reduction time: 125.276s + 6.255s

Nuon
2015-05-27, 19:12:13
CUDA 7.0.5 is ready. Timer: HPET (14.32 MHz)

CUDA GPU: GeForce GTX 760
Kernel 1, Batch Size: 20M, Blocks: 20480, Threads: 1024
Kernel 2, Batch Size: 20M, Blocks: 20480, Threads: 1024

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335545360 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 01.905s Batch 1 finished.
...
00h 01m 16.869s Batch 19 finished.
00h 01m 24.081s PI value output -> 5895585A0

Statistics

Calculation + Reduction time: 82.262s + 1.808s

kruemelmonster
2015-05-27, 20:41:10
GTX 670 @ 1280/3506:

OpenCL 1.2 CUDA 7.5.8 is ready. Timer: HPET (14.32 MHz)

OpenCL GPU: NVIDIA GeForce GTX 670 (7 CUs, 1110 MHz)
Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 01.588s Batch 1 finished.
...
00h 01m 03.394s Batch 19 finished.
00h 01m 09.335s PI value output -> 5895585A0

Statistics

Calculation + Reduction time: 67.856s + 1.470s

---

CUDA 7.0.5 is ready. Timer: HPET (14.32 MHz)

CUDA GPU: GeForce GTX 670
Kernel 1, Batch Size: 20M, Blocks: 20480, Threads: 1024
Kernel 2, Batch Size: 20M, Blocks: 20480, Threads: 1024

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335545360 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 01.589s Batch 1 finished.
...
00h 01m 03.876s Batch 19 finished.
00h 01m 09.869s PI value output -> 5895585A0

Statistics

Calculation + Reduction time: 68.369s + 1.491s

GTX 470 @ 772/1544/1804 @ Quadro 5000:

OpenCL 1.2 CUDA 7.5.8 is ready. Timer: HPET (14.32 MHz)

OpenCL GPU: NVIDIA Quadro 5000 (14 CUs, 1544 MHz)
Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 01.269s Batch 1 finished.
...
00h 00m 55.996s Batch 19 finished.
00h 01m 01.258s PI value output -> 5895585A0

Statistics

Calculation + Reduction time: 60.107s + 1.140s

---

CUDA 7.0.5 is ready. Timer: HPET (14.32 MHz)

CUDA GPU: Quadro 5000
Kernel 1, Batch Size: 20M, Blocks: 27307, Threads: 768
Kernel 2, Batch Size: 20M, Blocks: 20480, Threads: 1024

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335549456 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 01.271s Batch 1 finished.
...
00h 00m 57.004s Batch 19 finished.
00h 01m 02.363s PI value output -> 5895585A0

Statistics

Calculation + Reduction time: 61.225s + 1.128s

OpenCL faster than CUDA, and Big Fermi faster than Little Kepler. :eek: