[Chimera-users] multi GPU rendering
goddard at sonic.net
Fri Dec 18 10:42:11 PST 2015
One more bit of useful info about rendering large maps in solid style. The “perpendicular to view” projection mode is not the same as the “x, y, or z planes” mode. The perpendicular mode uses an OpenGL 3d texture, slices the map perpendicular to the viewing direction, and does trilinear interpolation on the graphics card. The x, y, z planes mode uses OpenGL 2d textures perpendicular to one of the x, y, or z axes. While the perpendicular mode seems better, the trouble is that 3d textures are generally not used in video games, so they are an obscure graphics feature that may not be well optimized in the graphics driver: they may be slow (e.g. done without hardware acceleration), and there may be limits on the size of the volume. On the other hand, 2d textures are used in every video game and are optimized to the hilt; they are what is used to put color patterns on any surface. With some graphics cards and drivers, particularly high-end cards, 3d textures will work as well as 2d textures, but you will only find this out by trial and error. The default “auto” projection mode never uses perpendicular to view because it is a bad choice on many low-end graphics cards.
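Trilinear interpolation is what the card does when a slice cut perpendicular to the view passes between grid planes: each sample blends the eight surrounding grid points. The Python sketch below is for illustration only (it is not Chimera or driver code, and the nested-list `vol[z][y][x]` layout is an assumption). In the x, y, or z planes mode each slice lies on a grid plane, so only blending within that 2d slice is needed.

```python
import math
from typing import List

# Hypothetical layout for this sketch: vol[z][y][x], at least 1 point per axis.
Volume = List[List[List[float]]]


def trilinear(vol: Volume, x: float, y: float, z: float) -> float:
    """Sample vol at fractional grid coordinates (x, y, z)."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    # Clamp into the grid so all eight neighbouring points exist.
    x = min(max(x, 0.0), nx - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    z = min(max(z, 0.0), nz - 1.0)
    x0, y0, z0 = math.floor(x), math.floor(y), math.floor(z)
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    def lerp(a: float, b: float, t: float) -> float:
        return a + (b - a) * t

    # Blend along x on four cell edges, then along y, then along z.
    c00 = lerp(vol[z0][y0][x0], vol[z0][y0][x1], fx)
    c10 = lerp(vol[z0][y1][x0], vol[z0][y1][x1], fx)
    c01 = lerp(vol[z1][y0][x0], vol[z1][y0][x1], fx)
    c11 = lerp(vol[z1][y1][x0], vol[z1][y1][x1], fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)
```

Because trilinear interpolation reproduces any linear function exactly, a grid filled with `x + 2*y + 4*z` gives 3.5 at the cell center.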
> On Dec 18, 2015, at 1:38 AM, Benoît Zuber <benoit.zuber at ana.unibe.ch> wrote:
> Hi Tom
> Thanks for the additional info. I didn't know about the projection mode option. I had been wondering why under some conditions z planes looked super thin when viewed perpendicular to the z axis, and why sometimes the gaps were filled. This was a bit problematic for some data sets that we mainly look at in this orientation. Now, using projection mode "perpendicular to view" instead of "auto" forces the planes to be rendered thick. Great.
> Watching the nvidia-smi output confirms that only one GPU is used for rendering. It is also very useful for seeing how much the volume must be cropped before switching to the next step size without saturating the GPU memory.
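A small Python helper for the kind of monitoring described here. It polls `nvidia-smi` with its `--query-gpu` and `--format=csv,noheader,nounits` options and prints memory use and load per GPU, so an idle second or third card stands out while you rotate. The helper names (`parse_nvidia_smi_csv`, `poll`) are hypothetical; on cards that report a field as `[Not Supported]`, the integer parsing would fail.

```python
import subprocess
import time
from typing import List, NamedTuple


class GpuSample(NamedTuple):
    index: int
    memory_used_mib: int
    memory_total_mib: int
    utilization_pct: int


QUERY = "index,memory.used,memory.total,utilization.gpu"


def parse_nvidia_smi_csv(text: str) -> List[GpuSample]:
    """Parse `nvidia-smi --query-gpu=QUERY --format=csv,noheader,nounits` output."""
    samples = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        samples.append(GpuSample(*(int(f) for f in fields)))
    return samples


def poll(interval_s: float = 1.0) -> None:
    """Print per-GPU memory use and load until interrupted with Ctrl-C."""
    cmd = ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]
    while True:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for s in parse_nvidia_smi_csv(out):
            print(f"GPU {s.index}: {s.memory_used_mib}/{s.memory_total_mib} MiB, "
                  f"{s.utilization_pct}% busy")
        time.sleep(interval_s)
```

Calling `poll(0.5)` while rotating a map shows whether more than one card's utilization rises, and how close the rendering card's memory gets to its total.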
> Now I get it: if we want to view bigger volumes, we just need a top GeForce with 12 GB instead of 3 GB (Santa, if you can hear me).
> On 18.12.2015 01:02, Tom Goddard wrote:
>> Hi Ben,
>> From what I read, Nvidia SLI and AMD CrossFire, the technologies for using multiple graphics cards together, both require hardware jumpers between the cards.
>> The Chimera solid benchmark makes a cube-shaped volume and times how long it takes to render. A 1024 cubed volume takes 2 Gbytes on the graphics card because each grid point takes two bytes, one for the gray level and one for the transparency. An important thing to know about solid rendering of really big volumes is that by default, when you rotate, it will make up to 3 full copies of the data, so up to 6 Gbytes in the benchmark example. It makes a copy when you rotate to view a different face of the cube, and then treats the data as slices parallel to that face. So for a cube-shaped map you will see a freeze when you rotate beyond 45 degrees. For non-cube maps I think the rule is that it uses the face of the box that is showing the most area. For big volumes, if you don’t care so much about looking along the x and y directions, you might change the projection mode in Features / Solid Rendering Options from “auto” to “z planes”. You probably want to know what auto does: it uses z planes if the volume is a thin slab, but “x, y, or z planes” if it is not. Thin means x or y is at least 4 times larger than z (I pressed Help on the Volume Viewer dialog to learn that bit, even though I wrote the code). Once the projection mode is “z planes”, you won’t get any of the extra copies.
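The arithmetic and rules in this paragraph can be written out as a short Python sketch. It reproduces the 2 Gbyte and 6 Gbyte figures for a 1024 cubed map, the thin-slab test used by “auto” (reading “at least 4 times larger” as `>= 4 * z`), and the hedged rule of slicing parallel to the face that shows the most area. None of this is Chimera's code; the function names are hypothetical.

```python
from typing import Sequence, Tuple

GIB = 1024 ** 3


def solid_texture_bytes(shape: Tuple[int, int, int], bytes_per_point: int = 2,
                        copies: int = 1) -> int:
    """Card memory: gray + transparency byte per grid point, times copies held."""
    nx, ny, nz = shape
    return nx * ny * nz * bytes_per_point * copies


def auto_projection_mode(shape: Tuple[int, int, int]) -> str:
    """'auto' as described: z planes for a thin slab, else x, y, or z planes."""
    nx, ny, nz = shape
    thin = nx >= 4 * nz or ny >= 4 * nz
    return "z planes" if thin else "x, y, or z planes"


def slicing_axis(shape: Tuple[int, int, int], view_dir: Sequence[float]) -> int:
    """Axis (0=x, 1=y, 2=z) whose box face shows the most projected area.

    The face perpendicular to axis i has area equal to the product of the
    other two edge lengths; its projected area scales with |view_dir[i]|.
    """
    nx, ny, nz = shape
    face_areas = (ny * nz, nx * nz, nx * ny)
    projected = [abs(d) * a for d, a in zip(view_dir, face_areas)]
    return max(range(3), key=lambda i: projected[i])
```

For a cube all face areas are equal, so the chosen axis flips between z and x exactly where |sin θ| = |cos θ| for a rotation θ about y, i.e. at 45 degrees, which matches the freeze described above.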
>> For really big volumes I could probably drop the transparency and use one byte per grid point, just scaling transparency with intensity (which is what Volume Viewer does anyway). That would make the data on the graphics card half as big, so it would be possible to show larger maps. I’ll try that in our next-generation ChimeraX software; it would be a nice improvement.
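A minimal Python sketch of the one-byte idea, assuming transparency is derived from intensity by a simple linear scale (`opacity_scale` is a hypothetical parameter, not a Chimera setting). In a real renderer this mapping would happen on the graphics card rather than in Python; the point is that only the intensity byte needs to be stored, which halves a 1024 cubed map from 2 GiB to 1 GiB.

```python
from typing import Tuple

GIB = 1024 ** 3


def gray_alpha(intensity: int, opacity_scale: float = 1.0) -> Tuple[int, int]:
    """Expand one stored 8-bit intensity into (gray, alpha) at draw time."""
    if not 0 <= intensity <= 255:
        raise ValueError("intensity must fit in one byte")
    # Transparency follows intensity, so it need not be stored separately.
    alpha = round(min(255.0, intensity * opacity_scale))
    return intensity, alpha


def map_bytes(shape: Tuple[int, int, int], store_alpha: bool) -> int:
    """Bytes on the card: two per grid point with stored alpha, one without."""
    nx, ny, nz = shape
    return nx * ny * nz * (2 if store_alpha else 1)
```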
>> Back to using multiple graphics cards: I don’t think the driver can possibly split the solid rendering data (2d OpenGL textures) between the cards. It will try to put all of it on each graphics card, so you won’t get more memory by using more cards.
>>> On Dec 17, 2015, at 2:39 PM, benoit.zuber at ana.unibe.ch wrote:
>>> Hi Tom
>>> Thanks for your thorough reply. I am not in front of the workstation, but I’ll use the GPU activity monitor to check whether Chimera is using the three GPUs. I have some doubts that the OS / OpenGL driver distributes rendering work across the GPUs in our configuration. I think that for SLI to work you need to wire the cards together, which is not the case in our workstation. Anyway, I’ll check the activity.
>>> The rendering case I was referring to is solid rendering. I never try to render the full dataset at full resolution, as we are talking about datasets of around 50 GB. I use a very large step size to select a subregion, then I gradually decrease the step size and fine-tune the contrast transfer function. When I am happy, I render at step 2 or step 1, depending on the subregion size. Clearly time is needed to read the data from disk; when reading is done, the program still computes for a few seconds. Then I can rotate, zoom, etc. If the subregion is not too large, it is super smooth. But if I select too large a subvolume, I can rotate a few degrees, then it freezes for a while, then it is smooth again, and so on. So I guess it corresponds to your description of the problem.
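The step-size trade-off in this workflow can be estimated in a few lines of Python. This sketch assumes step N keeps every Nth grid point along each axis (rounding up), two bytes per point, and up to 3 axis-aligned copies on the card as described earlier in the thread; `smallest_step_that_fits` is a hypothetical helper, and real memory use will be somewhat higher because the driver and display also need memory.

```python
from typing import Tuple

GIB = 1024 ** 3


def subsampled_shape(shape: Tuple[int, int, int], step: int) -> Tuple[int, int, int]:
    """Grid size after keeping every `step`-th point along each axis."""
    nx, ny, nz = shape
    # -(-n // step) is ceiling division for positive integers.
    return (-(-nx // step), -(-ny // step), -(-nz // step))


def smallest_step_that_fits(shape: Tuple[int, int, int], gpu_bytes: int,
                            bytes_per_point: int = 2, copies: int = 3,
                            max_step: int = 64) -> int:
    """Smallest step whose subsampled region (times copies) fits in gpu_bytes."""
    for step in range(1, max_step + 1):
        nx, ny, nz = subsampled_shape(shape, step)
        if nx * ny * nz * bytes_per_point * copies <= gpu_bytes:
            return step
    raise ValueError("region does not fit even at max_step")
```

Under these assumptions a 2048 cubed subregion needs step 3 on a 3 GB card but only step 2 on a 12 GB card; with “z planes” projection (`copies=1`), step 2 already fits in 3 GB.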
>>> I took a look at the benchmark. That’s very useful! Clearly I am not going to get better performance with a Quadro (at least not with any of those in the benchmark): all Quadros except the K4000 get worse solid rendering scores than the GTX 780 Ti, and the K4000 gets the same score as the 780 Ti. Only with a GTX Titan would I get a nearly 2x improvement, which makes sense since that card has 2x the memory of the 780 Ti (6 GB vs 3 GB). The GTX Titan X is not in the benchmark yet, but it should be even better as it has 12 GB of RAM. Possibly the Quadro K6000, which is not in the benchmark either, could have similar or better performance than the Titan X (it also has 12 GB of RAM), but it costs about 4 times more. It's a pity that only Quadro cards work for stereo under Linux (but for that we bought a once state-of-the-art Quadro on eBay for $150 and installed it in another PC :-)
>>> From: Tom Goddard <goddard at sonic.net>
>>> Date: Thursday, December 17, 2015 19:22
>>> To: Benoit Zuber <benoit.zuber at ana.unibe.ch>
>>> Cc: chimera users list <chimera-users at cgl.ucsf.edu>
>>> Subject: Re: [Chimera-users] multi GPU rendering
>>>> Hi Ben,
>>>> I haven’t tried Chimera with 2 or more graphics cards working together, and I don’t know whether it is likely to improve rendering speed. I don’t think applications need any special support to use multiple cards; the system graphics driver distributes the work across 2 or more cards. With Nvidia cards this technology is called SLI and with AMD cards it is called CrossFire. From what I understand, they do things like have each of 2 cards render every other frame: since the graphics are pipelined, the current frame and the next frame can be rendered simultaneously, possibly doubling the speed. I have my doubts that this will improve Chimera rendering speed for large density maps. I think slow rendering of maps in solid (grayscale) style typically happens in Chimera when the map data doesn’t fit on the graphics card, so every frame all the data has to be transferred to the card; the performance plummets once a map gets that big. If the data does fit on the graphics card, it usually renders at full frame rate (60 frames per second), although that isn’t true of all graphics cards. All of this relates to the speed of rotating the model. When you first load a big map it can take a long time because disk drives are slow; a solid state drive helps speed this up. If you explain exactly the case where you see slow rendering, I can perhaps advise.
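The explanation of why performance plummets can be turned into a back-of-the-envelope estimate. In this Python sketch the bus bandwidth is an assumed input (12e9 bytes/s is only a rough effective figure for a PCIe 3.0 x16 slot, not a measurement), and the 60 frames per second cap comes from the text; the result is an upper bound, not a prediction of Chimera's actual frame rate.

```python
GIB = 1024 ** 3


def estimated_fps(texture_bytes: int, gpu_memory_bytes: int,
                  bus_bytes_per_s: float, refresh_hz: float = 60.0) -> float:
    """Rough upper bound on frame rate when rotating a solid-style map."""
    if texture_bytes <= gpu_memory_bytes:
        # Data stays resident on the card: assume drawing keeps up with the display.
        return refresh_hz
    # Data must be re-sent over the bus every frame, so bandwidth sets the ceiling.
    return min(refresh_hz, bus_bytes_per_s / texture_bytes)
```

With these assumptions a 2 Gbyte texture on a 3 Gbyte card stays at 60 frames per second, while a 4 Gbyte texture on the same card drops below 3 frames per second.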
>>>> In general I think GeForce cards perform better (fewer bugs, faster speed) than Quadro cards, and I would only recommend a Quadro if you use a stereoscopic display with shutter glasses. Here are Chimera benchmarks for a range of graphics cards:
>>>> http://plato.cgl.ucsf.edu/trac/chimera/wiki/benchmarks
>>>>> On Dec 17, 2015, at 5:28 AM, wrote:
>>>>> Can Chimera make use of more than one GPU for 3D rendering? We have three GeForce GTX 780 Ti cards (mainly used for number crunching) on a particular workstation and would be interested in making full use of them for rendering tomograms or serial block face imaging data.
>>>>> Another question: has anyone done a benchmark comparison of 3D rendering in Chimera between a top-of-the-line GeForce GTX card and a top Quadro card? Is there a strong improvement with the Quadro that might justify the price gap?
>>>>> Prof. Benoît Zuber
>>>>> Institute of Anatomy
>>>>> University of Bern
>>>>> Baltzerstrasse 2
>>>>> Postfach 922
>>>>> 3000 Bern 9
>>>>> benoit.zuber at ana.unibe.ch
>>>>> +41 31 631 84 40
>>>>> http://www.ana.unibe.ch/research/experimental_morphology/index_eng.html
>>>>> _______________________________________________
>>>>> Chimera-users mailing list
>>>>> Chimera-users at cgl.ucsf.edu
>>>>> http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users