Metal Particles (demo/bench), new: Nbody-Metal (demo/bench)


mitch_de
144 posts in this topic

60fps with this new version 

vs

8fps on the older one 

 

But by the way, visually it seems that a lot more particles are shown in the older one... :/

 

take a look:

 

 

 

Indeed, it seems the 33 Mill version has bugs...

 

Compare these 3 pictures:

 

 

JahStories @60 FPS  (Nvidia GTX 780)

post-735125-0-92367700-1447114984_thumb.jpg

 

 

mitch_de @10 FPS (Nvidia GT 740):

post-735125-0-53649000-1447115012_thumb.jpg

 

 

Ramalama @5 FPS (Intel Iris 5100):

post-735125-0-10141700-1447114999_thumb.jpg

 

 

It looks like only the Intel card has all 33M pixels...

 

Cheers :-)


Thanks for reporting results with the 33 Mill version. The displayed, computed particles were cut off / limited by changing the discrete VRAM cache option.

So I built a 2 Mill final version.

12 FPS on the GT 740, 99% GPU load.

If very fast GPUs run into the 59/60 FPS Metal limit OR GPU load goes below 80%, please report; in that case I must use a higher particle count.

The final version is on the first page.

 

post-110586-0-57616900-1447155355_thumb.jpg

 

Hi mitch, 

 

It is now running at 60 FPS on the Iris 5100...

the 2M version I mean...

But now it seems this is the first version where I'm not getting all pixels...

so it's running @60 FPS now...

 

 

EDIT:

However, it works as a demo, which means we can check whether Metal is supported or not... so thank you very much for this :-)

Everything else doesn't matter :-)

 

Cheers :-)


You got around 40 FPS with the first version, but it had more particles (4 Mill). All 2 Mill should be shown with the 2 Mill version.

I will only use more particles if other GPUs also run into the 60 FPS limit.


:)

 

HD 7950

 

final 60 fps 

 

"GPU load small , 60 fps is max ?"

Yep, Metal seems to have fixed vsync with 60 Hz (FPS) max. It can't be changed, not even with the dev tools (disable vsync) as in the past.

That is bad; we can set up more particles and run slower to avoid that 60 FPS limit.

But then the low-end GPUs nearly stall at 2-5 FPS.

 

 

Please try the 3 Mill test build if you are also hitting the 60 FPS limit.

I now get only 8 FPS on the GT 740. Perhaps we need 4 Mill if your fast GPU is too close to 60 FPS or GPU load is less than 80%.

EDIT: If you have an AMD CPU, I don't know its speed compared to an Intel i5/i7; the CPU may also be a bottleneck when feeding the fast GPU with work.

How is your CPU load when running that stuff? Normally it should be low (<= 25%).

OSXMetalParticles_3Mill_test.zip


You should try rendering to an offscreen surface only to avoid the 60 FPS vsync limit that all on-screen Metal apps are stuck with (due to CoreAnimation controlling the display pipeline).  That would allow a much fairer comparison between GPUs and drivers.
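
To illustrate why the cap makes on-screen numbers hard to compare, here is a minimal Python sketch. It assumes a deliberately simplified model in which the reported frame rate is just the uncapped rate clamped to the 60 Hz limit; the function name and the example uncapped rates are hypothetical, not measurements from this thread.

```python
# Simplified model (an assumption, not Metal's actual presentation logic):
# with vsync on, an on-screen app cannot report more frames than the cap allows.
VSYNC_LIMIT_FPS = 60.0

def reported_fps(uncapped_fps, limit=VSYNC_LIMIT_FPS):
    """Frame rate a vsync-limited on-screen app would report."""
    return min(uncapped_fps, limit)

# Hypothetical uncapped rates for three GPUs (illustrative values only).
for name, raw in [("slow GPU", 12.0), ("fast GPU", 90.0), ("faster GPU", 240.0)]:
    print(f"{name}: uncapped {raw:.0f} FPS -> reported {reported_fps(raw):.0f} FPS")
```

In this model both fast GPUs report 60 FPS although one does far more work per second, which is the difference an offscreen run would expose.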


A new Metal playground: the N-body Metal demo :) , well known as a CUDA / OpenCL bench, now computing with Metal. I see it more as a demo than as a benchmark.

threadwidth + maxthreadspergroup may vary between drivers/GPU models and are shown as info.

Key s switches between Metal devices, if you have more than one.

Info: Metal is limited to 60 FPS. If you run into that limit you can use key + to get more bodies (up to 96K) = more GPU work, fewer FPS.

 

16k bodies (= min)  GT 740 45 FPS

post-110586-0-85542700-1447413123_thumb.jpg

 

32K bodies  GT 740 11 FPS

post-110586-0-64753300-1447413316_thumb.jpg

 

64K bodies  GT 740 1.8 - 2.8 FPS (stalls, sometimes crashes)

post-110586-0-07536000-1447413879_thumb.jpg
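
The drop between these three runs can be checked with a quick back-of-the-envelope calculation. This is only a sketch in Python, assuming the demo uses a brute-force all-pairs N-body step (so GPU work per frame grows with the square of the body count) and that the 16k run is not held back by the 60 FPS cap; the function and constant names are hypothetical.

```python
# Assumption: brute-force all-pairs N-body, so per-frame work scales with N^2.
BASE_BODIES_K = 16   # 16k bodies (the minimum setting)
BASE_FPS = 45.0      # reported for the GT 740, below the 60 FPS cap

def predicted_fps(bodies_k, base_bodies_k=BASE_BODIES_K, base_fps=BASE_FPS):
    """Scale the baseline frame rate by the ratio of pair interactions."""
    work_ratio = (bodies_k / base_bodies_k) ** 2
    return base_fps / work_ratio

for bodies_k in (16, 32, 64):
    print(f"{bodies_k}k bodies: ~{predicted_fps(bodies_k):.2f} FPS predicted")
```

Under that assumption the estimate is roughly 11 FPS for 32k bodies and roughly 2.8 FPS for 64k bodies, close to the GT 740 numbers reported above.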

 

Download:

NBody-Metal_test1.zip


Definitely optimized for iGPU.
Intel 4200U 1.6 GHz, GPU 4400 (half the cores of the 5000/5100), max 1 GHz

Using Intel Power Gadget:
the benchmark starts at 14 watts with CPU 1.6 GHz / GPU 1 GHz, then changes to 9 watts with CPU 800 MHz (idle speed) / GPU 1 GHz, and the frame rate actually improves

Metal Particles
2M: 35 FPS


Yep, the main goal of Metal is iGPU/embedded usage.

Has anyone already tried the new Nbody-Metal? (see post #70)

post-110586-0-17187300-1447584215_thumb.jpg

It would be interesting to know whether the switch (key s) function works and you can test every available Metal device.


post-735125-0-75557200-1447598385_thumb.jpg

 

I don't have a second GPU... only the Iris 5100

 

Cheers :-)


Well, you should probably enable an offscreen rendering mode to let the discrete GPUs actually run at full speed.  Also, you should move all your resources to the framebuffer memory by using MTLStorageModeManaged and [MTLBuffer didModifyRange:], as leaving the particle data in system memory (i.e. MTLStorageModeShared) will be a huge penalty for discrete GPUs.  The goal of Metal is to dramatically reduce the CPU overhead typically associated with OpenGL, but you can't just take code that runs on iOS or iGPU and expect it to work well on a big discrete GPU.
