S940
2011-09-25, 00:58:37
Sounds very practical, and the performance reportedly holds up too; almost too good to be true ^^
The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system.
For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. However, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. OpenCL, on the other hand, provides an open and vendor-neutral programming environment and runtime system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a "write once, run anywhere" ecosystem for heterogeneous computing.
Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is typically straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to- OpenCL source-to-source translator that possesses a novel design and clever reuse of the Clang compiler framework.
Currently, the CU2CL translator covers the primary constructs of the CUDA runtime API, and we have successfully translated many applications from the CUDA SDK and the Rodinia benchmark suite. The performance of the applications translated automatically by CU2CL is on par with that of their manually ported counterparts. PDF link:
http://eprints.cs.vt.edu/archive/00001161/01/CU2CL.pdf
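To get a feel for what such a source-to-source translation does, here is a toy sketch of the idea. Note this is NOT how CU2CL actually works (the paper describes it rewriting source via the Clang compiler framework, not via text substitution, and a real translation also rewrites arguments and emits OpenCL boilerplate like contexts and command queues); the mapping table below is just an illustrative subset of CUDA-runtime-to-OpenCL correspondences:

```python
import re

# Illustrative subset of CUDA runtime -> OpenCL host API correspondences.
# A real translator (like CU2CL) works on the AST, rewrites call arguments,
# and generates the surrounding OpenCL setup code as well.
API_MAP = {
    "cudaMalloc": "clCreateBuffer",
    "cudaFree": "clReleaseMemObject",
    "cudaThreadSynchronize": "clFinish",
}

def sketch_translate(line: str) -> str:
    """Replace CUDA runtime identifiers with their OpenCL counterparts.

    Arguments are left untouched in this sketch, so the output is only a
    rough approximation of what a full translator would emit.
    """
    for cuda_name, ocl_name in API_MAP.items():
        line = re.sub(rf"\b{cuda_name}\b", ocl_name, line)
    return line

print(sketch_translate("cudaFree(d_a);"))
# -> clReleaseMemObject(d_a);
```

The point of an automated tool is exactly that the tedious, error-prone part (doing this consistently across a whole codebase, including the parts that do not map one-to-one) is handled for you.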
Source link:
http://hgpu.org/?p=5670
Thx @ Dresdenboy's Blog ;)
Edit:
Performance on ATi GPUs does still fall short, though:
Finally, we will support the application of device-specific optimizations as a backend to our CU2CL translator. Why? Preliminary results from running CU2CL's automatically translated OpenCL applications in Section V on an AMD Radeon HD 5870 (rather than an NVIDIA GTX 280) deliver mediocre results. Although the AMD GPU has higher theoretical peak performance than the NVIDIA GTX 280, its execution times are 0.075 s, 15.24 s, and 2.11 s for vectorAdd, Needleman-Wunsch, and SRAD, respectively. These values are all at least 50% worse than the OpenCL run times on the NVIDIA GPU presented in Table V. So, while we have enabled the potential to run CUDA GPU codes on any OpenCL-capable device, it does not mean that these GPU codes will perform well without device-specific optimizations, as shown in [4], [5]. Thus, as long-term future work, CU2CL
But at least it runs ^^