This article is about programming multiple graphics cards to render OpenGL scenes on up to 6 monitors with a single computer. Recently, I’ve been doing research in this area for the Allosphere, an immersive, 30ft. display at UC Santa Barbara. Rather than have 16 computers, each with a projector, and gigabit ethernet (which has been the classic way to do cluster display walls for over 20 years), it may be more cost effective and lower latency to have only 2 to 4 high-performance workstations with 3x NVIDIA graphics cards in each. We recently built such a test system for a project called Presence (collaboration with Dennis Adderton and Jeff Elings), with multiple monitor rendering in OpenGL.
How do you use OpenGL to render to multiple displays?
Once upon a time, it was possible to use the “horizontal span” feature of some graphics cards. This instructed the OS to present to opengl a single continuous frame buffer you could write to. However, this has been discontinued due to changes in the Windows OS. I don’t know if such a feature ever existed for linux.
The only way I know of now is to detect and render to each monitor individually per frame. This is also the only way to achieve a 3×2 display wall using 3 graphics cards, because the “horizontal span” only let you place them side-by-side. By rendering to each monitor, you can create arbitrary monitor layouts, and also arbitrary methods of projection. This sounds inefficient, but there are many things that can be done to speed it up. Its also possible to run Cg shaders on each monitor for a single frame. In the Presence project, we found that we could render deferred shading on 6-screens, with shadows, and depth-of-field on each.
How does this work?
The key is an undocumented feature of the OpenGL API called wglShareLists (although there is a man page for it, I say undocumented because it says very very little about how to invoke it, conditions required for it to work, or how use it with multiple GPUs).
The common way to start opengl is to create a device context (in Windows this is an HDC, in linux an Xwindow), and then create an opengl render context, called an HGLRC. An opengl render context basically contains graphics data – textures, display lists, vertex buffer objects, frame buffers, etc. It does not record individual render commands invoked at render-time, but essentially all pre-frame data.
With multiple displays, you need to detect each monitor and create an HDC on each (This can be done with EnumDisplaySettingsEx). If you have two monitors, but _one_ card – a dual-head card which is common – then you only need one HGLRC (render context) because there is only one card to store data. During rendering, you switch which HDC is active, but keep the same HGLRC (see wglMakeCurrent).
If you want to play with multiple cards, then you need to create a window, an HDC, and an HGLRC for each screen. Since each card has its own memory space, they somehow need to share all textures, vertex buffers and data. This is what wglShareLists does. It instructs the OpenGL API to copy all server-side commands to every opengl render context that is shared. The undocumented bit is that this will happen even if the HGLRCs exist on different cards on the PCI bus. Take for example a glTexImage2D, which transfers texture data to the GPU for later rendering. In this case, the OpenGL driver will replicate the glTexImage2D command to every GPU on the bus. In addition, if you have 3 cards, you don’t need to explicitly create 3 textures.. share lists lets you access all of them through the primary context, although there is in fact a copy of your texture in each GPU memory.
This may sound slow. It is, but at present there’s no other way to share a texture across three GPUs. (Perhaps in the future SLI may provide this, but it currently has other limits that dont permit multi-monitor rendering). Remember, however, this is not a rendering cost. It is a buffer setup cost, which for static scenes will usually occur only once at the beginning of your app. Thus, once the data is on the GPUs using wglShareLists, you can ask each card to render it relatively quickly.
If you are trying to render dynamic geometry that changes every frame, then you’ve got much bigger problems. Note that I’m not talking about moving static objects, such as character limbs or terrain. These should still be fast on multiple monitors , because the vertex buffers dont change, or can be generated using vertex shaders. I’m talking about geometry such as a dynamic tentacle mesh where all verticies move each frame. This requires a PCI bus transfer on every frame, and should be avoided. When you render to multiple GPUs, the bus transfer overhead is multiplied by however many graphics cards you have. Thus, avoid dynamic geometry rendering on multiple cards.
Sticking with static geometry buffers (as in most games), how does the rendering work?
Now that the HDC and HGLRCs are setup for each monitor. And assuming you’ve called glShareLists properly, the only thing to do is render. Rendering to multiple displays is fairly simple.
You attach the OpenGL driver to the context you want to render to using wglMakeCurrent. This tells the driver to render to that particular device context (OS window) using a particular opengl render context (graphics state). You then invoke opengl graphics commands as usual.
First, you would setup the perspective, model and view matricies to create a window into your screen for that particular monitor. Depending on the layout of your monitors, there are several ways to do this. The simplest is to use glFrustum (not gluPerspective) to select the sub-portion of a camera frustum that you wish to render on a particular monitor. Then, you call opengl draw commands. If you bind to a texture, or use a vertex object, it will use the shared graphics state that now exists on every card – you basically don’t have to worry about which card the texture comes from.
Another note about performance. I said that wglShareLists is only slow at the beginning of your app, as textures are transfered to each graphics card. This is only partly true. Your main render loop also now consists of perspective matrix setup, and draw commands, for each monitor. Ideally, since the graphics data is shared, it should be possible to instruct each GPU on the bus to do their rendering now in parallel (at the same time the other GPUs are rendering their monitors). However, as far as I know, modern GPUs can’t do this yet (NVIDIA?). Basically, your render loop has to wait while you send draw commands separately to each GPU, then wait for that GPU to finish so you can swap its buffer, thus updating each monitor. Fortunately, since the vertex/texture data is already on the card, and since you’ve writter your render code to bundle opengl calls together as much as possible (i hope!), then this doesn’t take too much longer.
So, the overall pseudo-code is:
1. Detect all hardware displays
2. Setup for each one
2a. … Create OS window (HWND CreateWindow method)
2b. … Get the HDC device context from the window (GetDC method)
2c. … Create HGLRC opengl context. (wglCreateContext method)
3. Call wglShareLists
4. Set wglMakeCurrent to HDC and HGLRC for context 0 (wglMakeCurrent method)
5. Create textures, VBOs, disp lists, frame buffers, etc.
6. Start main rendering (for each monitor)
6a. … Call wglMakeCurrent for HDC/HGLRC for specific monitor
6b. … Create projection, view matricies for specific monitor
6c. … Clear frame and depth buffer
6d. … Draw scene
6e. … Call wglSwapBuffers to refresh that monitor
6f. End render loop
7. Delete all textures, VBOs, then close contexts.
Using the methods above, I was able to render the happy Buddha (a test object in the graphics community) at over 60 fps with deferred shading, soft shadows, and depth of field on 6x monitors using three Nvidia GeForce 8800GTX cards.
A final point: I’ve found there are two types of multi-monitor research out there: 1) what most commerical games, and graphics students do – which is to figure out, at most, how to do a dual-monitor setup using a single dual-head card (one GPU), and 2) large research institutions that build giant display walls using dozens or hundreds of computers the old fashioned way. There is very little work so far using multiple GPUs in a single computer, probably because graphics cards to do this are so new (NVIDIA spends lots of time meeting the huge needs of parallel GPGPU scientific computing).
However, I encourage those interested to explore single computer multi-GPU rendering for these reasons: a) The hardware is relatively cheap now (an LCD can be had for $150 ea). b) This area of research is relatively unexplored so far. c) Although a projector gives a larger physical area, unlike a projector you actually increase your renderable resolution for every monitor added. Thats an anti-aliased pixel resolution of 3840 x 2048 for six screens (6x1280x1024). If you render to 6 projectors, were talking huge space. d) It looks really cool having a desktop running a game at ultra-highres on 6 screens!
For some screen-shots of results, check here:
(with Dennis Adderton and Jeff Elings):