Metal Particles (as demo /bench) new Nbody-Metal (demo/bench)

mitch_de · November 21, 2015

Yep, the faster the GPU (more and faster compute units), the higher you can set numbodies and the more GFLOPS you get, because fast GPUs are not fully loaded by "only" 32K bodies.

Low-end or midrange GPUs will not reach higher GFLOPS with more than 32K bodies.

The same holds for all GPUs: more numbodies = lower FPS. I think if you get more than 10 FPS with 32K, you can try benchmarking with 64K bodies and see whether the GFLOPS go up.

Even 128K bodies should work on very fast GPUs, though I don't know for sure.

EDIT: Yep, 128K also works on my low-end GT 740. Same GFLOPS as with 64K, around 332 GFLOPS, but only 1.0 FPS; with 256K, 0.2 FPS.

So if your GPU is not low-end (GT 2/4/5/610, ..20, ..30), better start with 64K bodies to get close to the maximum GFLOPS.

32768 = 32K

65536 = 64K

131072 = 128K

262144 = 256K (may be usable on high-end GPUs like the GTX 960 and up); my GT 740 slows or stalls the whole OS X GUI when running 256K bodies.
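
The FPS figures above fit a simple model: all-pairs N-body work grows with the square of the body count, so at constant GFLOPS each doubling of numbodies cuts the frame rate to about a quarter. A minimal Python sketch, assuming 20 floating-point operations per body-body interaction (believed to be the single-precision figure behind the sample's GFLOPS number, but treat it as an assumption) and one simulation step per frame; `estimated_fps` is a hypothetical helper, not part of the demo:

```python
FLOPS_PER_INTERACTION = 20  # assumption: single-precision count per body pair

def estimated_fps(gflops: float, num_bodies: int) -> float:
    """Frames per second if every frame runs one all-pairs step."""
    flop_per_step = FLOPS_PER_INTERACTION * num_bodies ** 2
    return gflops * 1e9 / flop_per_step

# GT 740 figure from the post: about 332 GFLOPS from 64K bodies upward
for num_bodies in (65536, 131072, 262144):
    print(f"{num_bodies:>7} bodies: {estimated_fps(332, num_bodies):.2f} FPS")
```

Under these assumptions the estimate is roughly 0.97 FPS at 128K and 0.24 FPS at 256K, close to the 1.0 and 0.2 FPS reported above.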

gils83 · November 21, 2015

Yes, for a midrange GPU (GTX 950/960), 64K = 10 FPS.

Fljagd · November 23, 2015

Very interesting

gils83 · November 23, 2015

Test with CUDA-Z.

Fljagd · November 23, 2015

The problem for me with CUDA-Z is that I can only test one card at a time.

gils83 · November 23, 2015

Why?

Post a screenshot.

Fljagd · November 23, 2015

It is one card or the other, not both at the same time.

mitch_de · November 23, 2015

Yep, nbody CUDA can use more than one GPU if you add the numdevices= parameter (2, 3, 4, ...).

Great to see the first 2+ GPU nbody compute result reaching 2400 GFLOPS.

Try -benchmark to compare GFLOPS without any CPU/GPU work for OpenGL rendering.

Very fast GPUs don't show much difference, at least when running 64K+ bodies. Low-end GPUs or low-end CPUs will show differences, because the combined OpenGL and GPU compute load lowers the GFLOPS for GPU computing.

Also, older high-end GPUs (Fermi, Kepler), which can even be faster in OpenGL than newer midrange Kepler (vs. Fermi) or Maxwell (vs. Fermi, Kepler) GPUs, are often much slower in GPU computing (OpenCL, CUDA).

My GT 740 (Kepler, DDR3), for example, is only 5-10% faster in OpenGL than my older GT 440 (Fermi, DDR5).

But it is much faster in CUDA and OpenCL: up to 2 times faster, 30% faster on average.
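
A comparison run like the one described above could be scripted. A minimal Python sketch: the flags `-benchmark`, `-numbodies=` and `-numdevices=` are the ones named in this thread, written in the `-name=value` form the Nvidia CUDA sample is believed to accept; the binary path `./nbody` and the helper `nbody_command` are assumptions for illustration only.

```python
import shlex

def nbody_command(num_bodies, num_devices=1, benchmark=True, binary="./nbody"):
    """Build the argument list for one nbody run (binary path is assumed)."""
    args = [binary, f"-numbodies={num_bodies}"]
    if num_devices > 1:
        args.append(f"-numdevices={num_devices}")
    if benchmark:
        args.append("-benchmark")  # compute only, no OpenGL rendering
    return args

# Same workload on two GPUs, with and without the OpenGL window
for benchmark in (True, False):
    print(shlex.join(nbody_command(65536, num_devices=2, benchmark=benchmark)))
```

The returned list can be passed directly to `subprocess.run` once the path points at a real build of the sample.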

Fljagd · November 23, 2015

mitch_de · November 23, 2015

Use at least 64K numbodies. Otherwise, e.g. with 16K or 8K on 2 CUDA devices, the GPUs will not get a full workload: 16K gives 1900 GFLOPS vs. 2400 for 64K, even with OpenGL rendering.

In -benchmark mode, 64K (or 128K) will probably not give much more: the same or slightly higher GFLOPS than the 2400 GFLOPS of the 64K non-benchmark (OpenGL window) run.

65536 = 64K

131072 = 128K

Fewer than 64K bodies (like 32K ... 8K) can fully load only low-end GPUs!

With fewer than 64K bodies on a midrange or better GPU, it is more an OpenGL benchmark than a GPU compute benchmark.

Lower FPS from more bodies doesn't matter (in non-benchmark runs with OpenGL): nbody CUDA is a GPU compute benchmark.

Running very few bodies, like 2K or 8K, is 90% a CPU + OpenGL benchmark (GFLOPS only 1/3 to 1/2 of the maximum); 64K+ is 90% a GPU compute benchmark, and 95% in -benchmark mode.

And the focus is only on the GFLOPS, not the OpenGL FPS.

Fljagd · November 23, 2015

gils83 · November 23, 2015

OK, click on "Performance" for the GTX 960.

Fljagd · November 23, 2015

mitch_de · November 23, 2015

Great: now with 128K bodies you get 2529 GFLOPS (using both GPUs) in -benchmark mode.

I think that's the maximum for those GPUs: 1400 + 1100 GFLOPS (each running alone).

Fljagd · November 23, 2015

Bildschirmfoto 2015-11-23 um 13.44.14.jpg

it's powerful :rofl:

mitch_de · November 23, 2015

Yep, and don't worry about the different GFLOPS shown by CUDA-Z (open source) vs. nbody CUDA (by Nvidia).

Different compute code (nbody is much more complex), different GFLOPS.

It seems that nbody CUDA (from Nvidia) benefits more from the modern Maxwell GPU vs. the Kepler GPU than CUDA-Z does:

Nbody: 1446 / 1150 GFLOPS = Maxwell GTX 960 is about 1.26 times faster than Kepler GTX 660 Ti

CUDA-Z: 2709 / 2312 GFLOPS = Maxwell GTX 960 is "only" 1.17 times faster than Kepler GTX 660 Ti
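
The ratios above, plus the dual-GPU scaling from the 2529 GFLOPS run, can be checked with a few lines of Python. This is only a sketch of the arithmetic: `speedup` is a hypothetical helper, and the efficiency figure assumes the single-GPU and dual-GPU numbers come from comparable runs, which the thread does not confirm.

```python
def speedup(new_gflops: float, old_gflops: float) -> float:
    """How many times faster the first result is than the second."""
    return new_gflops / old_gflops

# GFLOPS reported in this thread: (GTX 960, GTX 660 Ti)
results = {"Nbody": (1446, 1150), "CUDA-Z": (2709, 2312)}
for name, (gtx_960, gtx_660_ti) in results.items():
    print(f"{name}: {speedup(gtx_960, gtx_660_ti):.2f}x")

# Dual-GPU scaling: 2529 GFLOPS combined vs. the sum of the single-GPU runs
efficiency = 2529 / (1446 + 1150)
print(f"dual-GPU efficiency: {efficiency:.0%}")
```

With these inputs the two cards together reach about 97% of the sum of their individual results.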

gils83 · November 23, 2015

Good job for Adobe Premiere Pro "Mercury".

Fljagd · November 23, 2015

You just have to declare your cards in Adobe Premiere Pro so that they are supported.

gils83 · November 23, 2015

Yes for you

Fljagd · November 23, 2015

Yes, because they are not in the list.

gils83 · November 23, 2015

http://best-mac-tips.com/2014/08/21/enable-cuda-hardware-rendering-adobe-premier/

Fljagd · November 23, 2015

Thank you, but that is already done.

Fortunately

Fred :wink_anim:

Micky1979 · November 23, 2015

Intel HD4000, 2097152 particles at 47 FPS. Is that OK?

mitch_de · November 23, 2015

Looks good/normal. Only some AMD GPUs have problems with Metal Particles. Someone contacted the developer of Metal Particles about changing some code for discrete GPU usage; I don't know whether that succeeded.

MattsCreative · November 26, 2015

https://twitter.com/TechnezReview/status/669948158504386561

No issues with any test on an AMD Radeon 290X.
