Pfft Benchmarks

This page contains the results of the pfft benchmarks. On the x axis of the charts there is base two logarighm of the length of the input sequence and on the y axis there is the speed in GFLOPS. The speed in GFLOPS is defined as the number of operations that a basic Cooley-Tukey algorithm would take to perform a transform divided by the number of nanoseconds the function call takes. So the speed in GFLOPS is calculated like this:

5 * n * log2(n) / nanoseconds

for the complex transforms and like this:

2.5 * n * log2(n) / nanoseconds

for the real ones.

The pfft code was compiled (unless noted otherwise) using the GDC compiler compiled with GCC 4.8.0 20120715. All the benchmarks were done using the test/test.d file and were run on an Intel Core i5 2500K CPU. The operating system was Debian Wheezy.

The benchmarks on this page are for 0.1.1 release of pfft.

Comparison with FFTW

This sections contains a speed comparison of pfft and FFTW. The version of FFTW used was 3.3.2 and it was compiled using GCC 4.8.0 20120603. The FFFTW_PATIENT flag was passed to all the fftw planner functions. This flags results in fast transform times but very slow initialization.

Complex single precision transform

Complex double precision transform

Real single precision transform

Real double precision transform

Comparison of Phobos and two interfaces to pfft

This section compares the speed of the FFT in Phobos (std.numeric.Fft), the speed of the pfft API in pfft.pfft and the speed of the Phobos compatible API for pfft in pfft.stdapi. All results are for a complex single precision transform. The FFT in Phobos does not use SIMD, so the std.numeric.Fft in all three graphs uses scalar floating point math.

AVX floating point math

SSE floating point math

Scalar floating point math

Comparison of different D compilers

The chart below compares the speed of pfft when compiled with DMD, LDC and GDC. The version of DMD was 2.59. The LDC used was revision 56233a373ce.

Complex single precision transform