Performance

Released in December 2012, Fluids v.3 is the fastest large-scale, open-source, GPU-based fluid simulator around, at least until next year.

Interactive SPH fluid simulations in academic papers and commercial products are typically measured by the number of particles and the frames per second achieved on given hardware. Offline simulations for motion pictures are typically measured by the total simulation time required for a number of frames.
Using these measures, Fluids v.3 achieves the following:
- 8,388,608 particles at 1/4 fps on a GeForce GTX460M (192 cores), or 2880 frames (2 minutes of footage at 24 fps) simulated in 3.5 hours
- 4,194,304 particles at 1/2 fps on the GTX460M, or 2880 frames in 1.5 hours
- 1,048,576 particles at 4.2 fps on the GTX460M, or 2880 frames in 11 minutes
- 262,144 particles at 23 fps on the GTX460M, or 2880 frames in 2 minutes
- 65,536 particles at 113 fps, and 16,384 particles at 434 fps
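The offline figures in the list follow directly from the frame time: total time = frames × time per frame, where 2880 frames is 2 minutes of footage at 24 fps. The minimal Python sketch below reproduces them using the msec/frame values from the Fluids v.3 table in Section I rather than the rounded fps values; the helper name `offline_time_seconds` is hypothetical.

```python
FOOTAGE_FPS = 24                    # playback rate of the finished footage
FRAMES = 2 * 60 * FOOTAGE_FPS       # 2 minutes of footage -> 2880 frames

def offline_time_seconds(ms_per_frame: float, frames: int = FRAMES) -> float:
    """Wall-clock seconds needed to simulate `frames` frames at `ms_per_frame`."""
    return frames * ms_per_frame / 1000.0

# (particles, msec per frame on the GTX460M), taken from the Fluids v.3 table
runs = [(8_388_608, 4433.0), (1_048_576, 234.0), (262_144, 42.3)]

for particles, ms in runs:
    t = offline_time_seconds(ms)
    print(f"{particles:>9,} particles: {1000.0 / ms:6.2f} fps, "
          f"{t / 60:6.1f} min ({t / 3600:.2f} h) for {FRAMES} frames")
```

Using the measured frame time instead of the rounded 1/4 fps is what brings the 8-million-particle case to about 3.5 hours rather than 3.2.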

UPDATE: I recently tested Fluids v.3 on a newer GeForce GTX670 (1344 cores), with the following results:
- 4,194,304 particles at 4 fps, or 2880 frames (2 minutes of footage) in 11 minutes
- 1,048,576 particles at 20 fps

I. Hardware Algorithm Efficiency

While the number of particles and frames per second give a picture of real-world performance in an application, it is more useful to compare pure algorithm efficiency in a way that normalizes for the number of particles. This is found by multiplying the number of particles by the frames per second, which gives the average number of particles simulated per second on a given hardware platform.

H.E. (in particles per second) = # Particles * fps
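Since the measurements are usually recorded as milliseconds per frame, fps = 1000 / (msec per frame). A minimal Python sketch of the H.E. calculation, checked against a few Fluids v.3 rows from the table in this section (the function name `hardware_efficiency` is hypothetical):

```python
def hardware_efficiency(particles: int, ms_per_frame: float) -> float:
    """H.E. in particles per second: # particles * fps, with fps = 1000 / ms."""
    fps = 1000.0 / ms_per_frame
    return particles * fps

# Fluids v.3 on a GTX460M: (particles, msec per frame) from the table
for n, ms in [(16_384, 2.30), (65_536, 8.80), (1_048_576, 234.00)]:
    print(f"{n:>9,} particles @ {ms:7.2f} ms/frame -> "
          f"{hardware_efficiency(n, ms):12,.0f} pps")
```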

Figure 1. Algorithm performance for various SPH fluid simulators, measured using Hardware Efficiency metric (pps), relative to the number of particles. Orange curve is the current Fluids v.3 simulator, on a GeForce GTX460M. Blue line is NVIDIA’s PhysX, measured from Spaete’s Fluid Sandbox. Green and Purple lines are recent academic results by Pajarola (2010), and Fang Chao (2010).

Figure 1 shows the hardware-based algorithm efficiency for various SPH fluid simulators. These results were calculated by running each SPH simulator on a GeForce GTX460M, disabling any advanced rendering, and measuring pure simulation efficiency as a frame rate for a given number of particles. The hardware efficiency, H.E., is computed from these measurements.

# Particles     msec / frame        Hardware Efficiency (particles per second)

Fluids v.3 (GPU)
4,096           0.68                6,113,432
8,192           1.30                6,301,538
16,384          2.30                7,123,478
32,767          4.20                7,801,666
65,536          8.80                7,447,272
131,072         18.21               7,197,803
262,144         42.30               6,197,257
524,288         98.00               5,349,877
1,048,576       234.00              4,481,094
2,900,800       1085.00             2,673,548
8,388,608       4433.00             1,892,309

Fluid Sandbox, NVIDIA PhysX (GPU)
26,000          41.67               624,000
74,140          60.75               1,220,344
102,440         68.92               1,486,404
200,000         104.17              1,920,000

Fang Chao (OpenCL)
16,384          7.69                2,129,920
65,536          28.57               2,293,760

Pajarola, 2010
16,128          8.13                1,983,744
75,200          38.46               1,955,200
129,024         58.82               2,193,408
255,600         100.00              2,556,000

RealFlow 2012 ** (time column gives total simulation time, not msec / frame)
3,987           20 secs             22,327
27,865          203 secs            13,726
158,778         755 secs            7,781
1,200,000       8 hours             8,571
2,700,000       14 hours, 53 min    13,303

** RealFlow 2012 uses an adaptive time step and hybrid fluid-grid methods for increased realism, so comparisons should be made with caution. Measurements are based on simulation experiments reported on YouTube.

II. Algorithm Efficiency

Hardware efficiency normalizes for the number of particles, but it still depends on the capabilities of the GPU. A better measure would report pure algorithm efficiency normalized across different hardware. This can be done by dividing H.E. by the peak Gflops rating of the GPU.

A.E. (particles per second per Gflop) = H.E. / peak Gflops of the GPU
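Dividing by peak Gflops needs a peak figure for each GPU. A common estimate is cores × clock × 2, since each CUDA core can issue one fused multiply-add (2 floating-point operations) per cycle. The Python sketch below assumes a 1350 MHz shader clock for the GTX 460M and a 915 MHz base clock for the GTX 670 (1344 CUDA cores); these clocks are not stated in this article, but they reproduce the 518.4 and roughly 2460 Gflops figures used in the table below. The function names are hypothetical.

```python
def peak_gflops(cores: int, clock_mhz: float) -> float:
    """Peak single-precision Gflops: one fused multiply-add (2 FLOPs) per core per clock."""
    return cores * clock_mhz * 2 / 1000.0

def algorithm_efficiency(he_pps: float, gflops: float) -> float:
    """A.E. = hardware efficiency (particles per second) / peak Gflops."""
    return he_pps / gflops

# Assumed clocks (not from the article): 1350 MHz shader (460M), 915 MHz base (670).
# Tuple layout: (cores, clock in MHz, H.E. in pps at 1,048,576 particles)
gpus = {
    "GTX 460M": (192, 1350.0, 4_481_094),
    "GTX 670": (1344, 915.0, 12_000_000),
}

for name, (cores, mhz, he) in gpus.items():
    g = peak_gflops(cores, mhz)
    print(f"{name}: {g:7.1f} Gflops, "
          f"A.E. = {algorithm_efficiency(he, g):,.0f} pps per Gflop")
```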

Measuring pure algorithm efficiency requires us to run the SPH simulation on a number of different hardware devices. At present, I have measured Fluids v.3 on a Fermi GeForce GTX460M (192 cores) and a Kepler GeForce GTX670 (1344 cores).

Initial results are as follows:

# Particles     H.E. (pps)      Hardware                                A.E. (pps per Gflop)
1,048,576       4,481,094       GTX 460M, 192 cores, 518.4 Gflops       8,644
1,048,576       12,000,000      GTX 670, 1344 cores, 2460 Gflops        4,878

More tests are needed (see Development page). These results show that simulation efficiency varies both with the number of particles and with the underlying hardware. While the jump in hardware from 518 Gflops to 2460 Gflops should result in a 4.75x increase, the actual increase is only 2.67x. The reasons for this are subtle, and more measurements with different numbers of particles and different hardware should provide a clearer picture. Overall, Fluids v.3 achieves about 4,400,000 pps on a GTX460M (at 1,048,576 particles), where 4 million particles run at 1/2 fps, and 12,000,000 pps on a GeForce GTX 670, allowing simulations of 4 million particles at 4 fps.
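The gap between expected and actual speedup can be expressed as a single scaling ratio, which is the same as the ratio of the two A.E. values. A minimal Python sketch using the numbers from the table above:

```python
# H.E. at 1,048,576 particles and peak Gflops, from the table above
he_460m, gflops_460m = 4_481_094, 518.4
he_670, gflops_670 = 12_000_000, 2460.0

expected = gflops_670 / gflops_460m   # speedup if H.E. scaled with peak Gflops
actual = he_670 / he_460m             # measured H.E. speedup
scaling = actual / expected           # fraction of the ideal speedup achieved;
                                      # equals A.E.(GTX 670) / A.E.(GTX 460M)

print(f"expected {expected:.2f}x, actual {actual:.2f}x, scaling {scaling:.0%}")
```

A scaling ratio near 100% would mean the simulator keeps pace with raw arithmetic throughput; the value here, a little over half, is the same shortfall described in the paragraph above.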
