I use gcc 4.7.1. Enabling the OpenMP option keeps all 6 cores of my 6-core CPU busy. Surprisingly, on a Time x Velocity = 10000 x 10 grid the run takes longer than with OpenMP disabled and only a single core working: 230 seconds vs 160 seconds. Besides, optimizing the FFT wisdom takes forever with OpenMP.

On a Time x Velocity = 10000 x 100 grid: with OpenMP the execution time is 988 seconds, without it 1708 seconds.

With 4 threads on a Time x Velocity = 10000 x 100 grid: with OpenMP the execution time is 1036 seconds, without it 1603 seconds.
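For anyone who wants to repeat this kind of comparison, here is a minimal sketch of timing the same workload at several thread counts. It is not my actual solver: `process_grid`, `report_timings`, and `wall_seconds` are hypothetical names, and the inner loop is a made-up stand-in for the real per-velocity work. The thread counts 1, 4, and 6 are simply the ones mentioned above.

```c
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Timer in seconds. With -fopenmp this is wall-clock time; without it,
   the fallback uses clock(), which reports CPU time instead. */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical stand-in for the real work: one independent task per
   velocity, each looping over all time steps. Returns a checksum. */
double process_grid(int n_time, int n_velocity, int n_threads)
{
    double total = 0.0;
    int v;
    (void)n_threads; /* unused when built without OpenMP */
#ifdef _OPENMP
    #pragma omp parallel for num_threads(n_threads) reduction(+:total)
#endif
    for (v = 0; v < n_velocity; ++v) {
        double acc = 0.0;
        int t;
        for (t = 0; t < n_time; ++t)
            acc += (double)(t % 7) * 0.5;
        total += acc;
    }
    return total;
}

/* Time the same grid with 1, 4, and 6 threads and print one line each. */
void report_timings(int n_time, int n_velocity)
{
    static const int thread_counts[] = {1, 4, 6};
    int i;
    for (i = 0; i < 3; ++i) {
        double start = wall_seconds();
        double checksum = process_grid(n_time, n_velocity, thread_counts[i]);
        double elapsed = wall_seconds() - start;
        printf("threads=%d  elapsed=%.3f s  checksum=%.1f\n",
               thread_counts[i], elapsed, checksum);
    }
}
```

Calling `report_timings(10000, 10)` and `report_timings(10000, 100)` from `main`, in a program built with `gcc -fopenmp`, would give one timing per thread count for each grid. With only 10 velocities there are at most 10 independent tasks to share among the threads, so the threading overhead may be harder to recover on the small grid; that is a guess on my part, not something I have measured.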