Multi-Monitor Rendering in OpenGL

This article is about programming multiple graphics cards to render OpenGL scenes on up to 6 monitors with a single computer. Recently, I’ve been doing research in this area for the Allosphere, an immersive, 30 ft. display at UC Santa Barbara. Rather than having 16 computers, each with a projector, connected over gigabit Ethernet (the classic way to build cluster display walls for over 20 years), it may be more cost-effective and lower-latency to use only 2 to 4 high-performance workstations with 3x NVIDIA graphics cards in each. We recently built such a test system for a project called Presence (a collaboration with Dennis Adderton and Jeff Elings), with multiple-monitor rendering in OpenGL.

How do you use OpenGL to render to multiple displays?

Once upon a time, it was possible to use the “horizontal span” feature of some graphics cards. This instructed the OS to present OpenGL with a single continuous framebuffer you could write to. However, this feature has been discontinued due to changes in the Windows OS. I don’t know if such a feature ever existed for Linux.

The only way I know of now is to detect and render to each monitor individually, every frame. This is also the only way to achieve a 3×2 display wall using 3 graphics cards, because “horizontal span” only let you place displays side by side. By rendering to each monitor yourself, you can create arbitrary monitor layouts, and also arbitrary methods of projection. This sounds inefficient, but there are many things that can be done to speed it up. It’s also possible to run Cg shaders on each monitor within a single frame. In the Presence project, we found that we could render deferred shading on 6 screens, with shadows and depth-of-field on each.

How does this work?

The key is an essentially undocumented feature of the OpenGL API on Windows called wglShareLists. (Although there is a man page for it, I say undocumented because it says very, very little about how to invoke it, the conditions required for it to work, or how to use it with multiple GPUs.)

The common way to start OpenGL is to create a device context (in Windows this is an HDC; in Linux, an X window), and then create an OpenGL render context, called an HGLRC. An OpenGL render context basically contains graphics data – textures, display lists, vertex buffer objects, frame buffers, etc. It does not record the individual render commands invoked at render time, but essentially all pre-frame data.

With multiple displays, you need to detect each monitor and create an HDC on each (this can be done with EnumDisplaySettingsEx). If you have two monitors but _one_ card – a dual-head card, which is common – then you only need one HGLRC (render context), because there is only one card to store data. During rendering, you switch which HDC is active but keep the same HGLRC (see wglMakeCurrent).

If you want to play with multiple cards, then you need to create a window, an HDC, and an HGLRC for each screen. Since each card has its own memory space, they need some way to share all textures, vertex buffers, and data. This is what wglShareLists does. It instructs the OpenGL API to copy all server-side commands to every OpenGL render context that is shared. The undocumented bit is that this happens even if the HGLRCs exist on different cards on the PCI bus. Take, for example, glTexImage2D, which transfers texture data to the GPU for later rendering. In this case, the OpenGL driver will replicate the glTexImage2D command to every GPU on the bus. In addition, if you have 3 cards, you don’t need to explicitly create 3 textures: sharing lists lets you access all of them through the primary context, although there is in fact a copy of your texture in each GPU’s memory.
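As a sketch, the sharing step might look like the following. This is Windows-specific and illustrative only; hDC[], hRC, and the card count are placeholder names, not code from a real project:

```cpp
// Sketch: one render context per card/screen, sharing the primary
// context's data with the rest. Call wglShareLists before creating
// any textures/VBOs, so every later upload is replicated to each card.
HGLRC hRC[3];
for (int i = 0; i < 3; i++)
    hRC[i] = wglCreateContext(hDC[i]);   // hDC[i] obtained via GetDC per window
for (int i = 1; i < 3; i++)
    wglShareLists(hRC[0], hRC[i]);       // replicate context 0's data to card i
```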

This may sound slow. It is, but at present there’s no other way to share a texture across three GPUs. (Perhaps in the future SLI may provide this, but it currently has other limits that don’t permit multi-monitor rendering.) Remember, however, that this is not a rendering cost. It is a buffer setup cost, which for static scenes will usually occur only once at the beginning of your app. Thus, once the data is on the GPUs via wglShareLists, you can ask each card to render it relatively quickly.

If you are trying to render dynamic geometry that changes every frame, then you’ve got much bigger problems. Note that I’m not talking about moving static objects, such as character limbs or terrain. These should still be fast on multiple monitors, because the vertex buffers don’t change, or can be generated using vertex shaders. I’m talking about geometry such as a dynamic tentacle mesh where all vertices move each frame. This requires a PCI bus transfer on every frame, and should be avoided. When you render to multiple GPUs, the bus transfer overhead is multiplied by however many graphics cards you have. Thus, avoid dynamic geometry rendering on multiple cards.

Sticking with static geometry buffers (as in most games), how does the rendering work?

Now the HDCs and HGLRCs are set up for each monitor. Assuming you’ve also called wglShareLists properly, the only thing left to do is render. Rendering to multiple displays is fairly simple.

You attach the OpenGL driver to the context you want to render to using wglMakeCurrent. This tells the driver to render to that particular device context (OS window) using a particular opengl render context (graphics state). You then invoke opengl graphics commands as usual.

First, you set up the projection, model, and view matrices to create a window into your scene for that particular monitor. Depending on the layout of your monitors, there are several ways to do this. The simplest is to use glFrustum (not gluPerspective) to select the sub-portion of a camera frustum that you wish to render on a particular monitor. Then, you call OpenGL draw commands. If you bind to a texture, or use a vertex buffer object, it will use the shared graphics state that now exists on every card – you basically don’t have to worry about which card the texture comes from.

Another note about performance. I said that wglShareLists is only slow at the beginning of your app, as textures are transferred to each graphics card. This is only partly true. Your main render loop now also consists of projection matrix setup and draw commands for each monitor. Ideally, since the graphics data is shared, it should be possible to instruct each GPU on the bus to do its rendering in parallel (at the same time the other GPUs are rendering their monitors). However, as far as I know, modern GPUs can’t do this yet (NVIDIA?). Basically, your render loop has to wait while you send draw commands separately to each GPU, then wait for that GPU to finish so you can swap its buffer, thus updating each monitor. Fortunately, since the vertex/texture data is already on the card, and since you’ve written your render code to bundle OpenGL calls together as much as possible (I hope!), this doesn’t take too much longer.

So, the overall pseudo-code is:

1. Detect all hardware displays
2. Setup for each one
2a. … Create OS window (HWND CreateWindow method)
2b. … Get the HDC device context from the window (GetDC method)
2c. … Create HGLRC opengl context. (wglCreateContext method)
3. Call wglShareLists
4. Set wglMakeCurrent to HDC and HGLRC for context 0 (wglMakeCurrent method)
5. Create textures, VBOs, display lists, frame buffers, etc.
6. Start main rendering (for each monitor)
6a. … Call wglMakeCurrent for HDC/HGLRC for specific monitor
6b. … Create projection, view matricies for specific monitor
6c. … Clear frame and depth buffer
6d. … Draw scene
6e. … Call wglSwapBuffers to refresh that monitor
6f. End render loop
7. Delete all textures, VBOs, then close contexts.
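Steps 6a–6e might be sketched as follows. This is Windows-specific and only a sketch; hDC[], hRC[], setProjectionFor, and drawScene are illustrative placeholders for your own setup and scene code:

```cpp
// Per-frame loop over monitors, assuming the hDC[i]/hRC[i] pairs were
// created during setup and shared via wglShareLists.
for (int i = 0; i < numMonitors; i++) {
    wglMakeCurrent(hDC[i], hRC[i]);      // 6a: bind this monitor's contexts
    setProjectionFor(i);                 // 6b: per-monitor glFrustum sub-window
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);   // 6c
    drawScene();                         // 6d: same shared data on every card
    SwapBuffers(hDC[i]);                 // 6e: present this monitor
}
```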

Using the methods above, I was able to render the Happy Buddha (a standard test object in the graphics community) at over 60 fps with deferred shading, soft shadows, and depth of field on 6 monitors using three NVIDIA GeForce 8800 GTX cards.

A final point: I’ve found there are two types of multi-monitor research out there: 1) what most commercial games and graphics students do – which is to figure out, at most, how to do a dual-monitor setup using a single dual-head card (one GPU), and 2) large research institutions that build giant display walls using dozens or hundreds of computers the old-fashioned way. There is very little work so far using multiple GPUs in a single computer, probably because graphics cards that can do this are so new (NVIDIA spends lots of time meeting the huge needs of parallel GPGPU scientific computing).

However, I encourage those interested to explore single-computer multi-GPU rendering for these reasons: a) The hardware is relatively cheap now (an LCD can be had for $150 ea). b) This area of research is relatively unexplored so far. c) Although a projector gives a larger physical area, unlike with a projector you actually increase your renderable resolution with every monitor added. That’s an anti-aliased pixel resolution of 3840×2048 for six screens (6 x 1280×1024). If you render to 6 projectors, we’re talking huge spaces. d) It looks really cool having a desktop running a game at ultra-high res on 6 screens!

For some screenshots of results (with Dennis Adderton and Jeff Elings), check here:
http://www.rchoetzlein.com/art/recent/presence.htm

18 Responses to “Multi-Monitor Rendering in OpenGL”

  1. Piotr Garbat says:

    The article about multi-GPU viewing on your webpage is very interesting.
    I don’t understand the first part of the pseudo-code.

    1. Detect all hardware displays
    2. Setup for each one
    2a. … Create OS window
    2b. … Create HDC device context

    Could you explain why you create an OS window? Is it necessary in full-screen mode?

    How do you enumerate the GPU display cards? Is the EnumDisplaySettingsEx function used to get the position coordinates and size of each monitor’s display screen?

  2. admin says:

    An OS window is created for each display to enable hardware acceleration on them. If you attempt to create only one OS window spanning multiple displays, it will not be accelerated. This is also necessary in full-screen mode, because the GPU will run on each display one at a time. I know this is true of Windows XP/NT, but I have not checked on Windows 7, although I would guess it’s the same.

    You enumerate cards and modes in Windows using EnumDisplayDevices and EnumDisplaySettingsEx; there are similar functions in the X Window libraries. I will be releasing some open source code for this shortly in an upcoming project called LUNA.

  3. Raj says:

    Is DirectX an easier approach for this?

  4. admin says:

    Good question. I haven’t tried multi-monitor in DirectX, although I suspect there should be a similar mechanism.

  5. Ian Roth says:

    Nice article!

    Are you using a single rendering thread or multiple threads? I’ve used multiple threads before with a dedicated rendering thread for each OpenGL context. This will at least avoid OpenGL context switching. With multiple GPUs it may allow commands to be sent to each GPU simultaneously. I would be curious to find out how this affects your rendering performance.

  6. admin says:

    Good question. I’ve only ever tried single threads. Many game engines are written using a single thread (except for audio), as the CPU has to switch between game logic, render loading, and AI anyway, so a single thread can reduce CPU overhead. Multiple threads may give you some gains, but I’m not sure, as I haven’t tried it. The multi-threading you’re talking about should give improvements in CPU-GPU transfer, but the bigger costs of polygon rendering and pixel fill rate are still parallelized on multiple GPUs even with a single CPU thread. There could be other benefits I’m not seeing, though. Ultimately, the only way to know the real difference would be to try it.

  7. admin says:

    Continuing the multi-threading discussion, here is some quick thinking on the issue.
    In essence, multi-GPU rendering consists of the following steps (sort-first method) per frame:

    1. CPU planning of the scene to render.
    2. CPU ordering/selection of how to filter or view the scene
    3. Transfer data from CPU to GPU #1
    4. Transfer data from CPU to GPU #2
    5. Transfer data from CPU to GPU #n
    6. Render polygons on GPU #1
    7. Render polygons on GPU #2
    8. Render polygons on GPU #n

    The heaviest costs are 6, 7, 8. Ideally these steps are always parallelized, even if the CPU is single-threaded. I know that one GPU will render in parallel with the CPU. I have not fully tested whether this is the case for multiple NVIDIA GPUs, but would be surprised if not. So 6, 7, 8 should be in parallel.

    So multi-threading really affects only steps 3, 4, 5. The idea is that a multi-core CPU should be able to multi-thread the loading of data from the CPU to multiple GPUs. Keep in mind that the PCI bus overhead for a game is minimal. All major data (meshes, textures) is transferred ahead of time, so the only PCI transfer is draw requests. So the added complexity of multi-threading the CPU would generally not be worth it, except in the case of truly dynamic geometry (transferred to the GPU every frame). The context switches above, and even the CPU-GPU draw requests, are quite minimal compared to the rendering costs.

    For example, a typical game might take: 10 ms CPU logic, 20 ms GPU rendering, 5 ms audio, 1 ms CPU-GPU transfer. Since the GPU is rendering in parallel with the CPU, the CPU actually has 4 ms of free time to spare in this example (20 ms > 10 + 5 + 1 ms) – plenty of time to load the GPU again.

    In general, however, I honestly have no idea how much multi-threading the CPU might help for truly dynamic geometry. Some interesting questions here about what modern hardware can do…

  8. Mahjai says:

    [ appreciation ]

    Good article…nice to see there is an option beyond working with the codebase of Chromium or Equalizer to manage multi-monitor, multi-headed GPU rendering under OpenGL.

    [ background ]
    am just returning to OpenGL work on Windows after nearly a decade of consoles, portables, & DX9…the OpenCL interop & handheld rise has inspired a return. DX9 does easily let you enumerate ‘adapters’ & get attached monitors (supporting dual-head or triple-head GPUs), but it has plenty of oddities as well-

    [ status ]
    Am eager to try this approach for the WGL world…have just gotten multi-monitors to run well on OSX and am attempting to build ‘easy-to-assemble-home-Virtual-Worlds’ in the wall-to-wall C.A.V.E. manner.

    [ question #1 ]

    Have read that Nvidia’s GL drivers send *all* commands, not just data upload ( VBOs, Textures) but actual render commands (DrawElements, etc) to all connected GPUs. Have either of you guys experienced this? Sounds like ATI (ahem, now AMD) lets you send 1:1 commands to each DC/RC pairing. Wonder what intel does? Especially now w/ SandyBridge having a rumored ‘solid/serious’ OpenGL 4.1 implementation.

    [ question #2 ]

    Has anyone here experimented with using an FBO per GPU (to render everything for a given scene/overall-camera-viewport) and then rendering (via a single quad render…or blit if the ATI_blit or NV_copy extensions are present) the FBO subsection to each monitor? This minimizes ‘draw-calls’ and makes a single pass although it could require *huge resources* for the color/depth buffer for the FBO? ( especially w/ three 2560×1600 monitors…)

    [ repeat thanks ]
    nice to find folks facing similar concerns. thanks for any feedback.
    Goodluck on LUNA-

  9. admin says:

    [Q1] – Yes, I’ve heard that too – that ATI/AMD lets you send commands to individual DC/RCs, but I’ve not yet tested it. It may also be possible with OpenGL if you don’t use the wglShareLists function. You could bind to each DC/RC directly, and repeat the commands yourself. This could give more flexibility, but may incur a cost since you are making the calls instead of the driver at a lower level.

    [Q2] – Interesting idea. It may be possible. But be aware that, to be equivalent, the FBO would need to be N times the original resolution (where N is the number of monitors). Then, you need to transfer that data back to the CPU, then back over to each GPU in order to split it up. There is some cost there, but it may free up GPU cycles on the other GPUs for other things.

    Both ideas clearly deserve more investigation to make any real claims.


  11. Leo says:

    Hello,

    great stuff, thx for sharing! I’m trying to use GL/CL interoperability, and in my computer I have one GeForce GTX 460 and one Radeon HD 5870, each hooked to a monitor. To do this, I need to pass the GL context to the CL context creation, but I just can’t get that to work if I try using the secondary card as the OpenCL device. The GPU Caps Viewer guys are able to do it very nicely for both cards, no matter which card is set as the primary device.

    So, I’ve some questions on this :)
    1- What do you mean by “Create OS window”? Is that any different from “Create window”, like using the HWND WINAPI CreateWindow(…) function?
    2- Do you have to register different classes for window creation, or do you use the same one for all windows?
    3- Why do we have to “Create HDC”? Is that to force a device context creation for a given card? What we usually do is “GetDC” from the window’s handle, but indeed, I don’t know how to force window creation with the device context for a given card. But, at the same time, how do you get your GL draws on a window if the hDC is not “Get” from the window handle? How do you associate the window with the device context?

    Thank you for your attention and time!

  12. admin says:

    Responses:
    1. By Create OS Window, I just mean the Win32 CreateWindow or CreateWindowEx function
    2. I use the same window class for all windows
    3. There was an error in the pseudo-code above regarding HDC (which I just fixed). Actually, the only step is to Create HGLRC to create the OpenGL render context. The operating system device context (hDC) comes from the window using GetDC, as you mention. The steps I am using are these:

    HWND hw = CreateWindow ( L"LUNA", L"", WS_OVERLAPPEDWINDOW, … )
    hDC = GetDC ( hw );
    nPixelFormat = ChoosePixelFormat ( hDC, &pfd );
    if ( nPixelFormat == 0 ) { error.PrintF ( "devgl", "Cannot choose pixel format.\n" ); }
    result = SetPixelFormat ( hDC, nPixelFormat, &pfd );
    HGLRC hRC = wglCreateContext ( hDC );
    result = wglMakeCurrent ( hDC, hRC );

    All of the above steps are repeated for each display. These are the setup steps. During rendering, only the wglMakeCurrent function is called (for each screen).
    The last function, wglMakeCurrent, associates the OpenGL render context hRC with the current device context hDC. During the render loop, each OpenGL context is assigned to its device context in turn. This function is what causes OpenGL to target output to a particular screen. Notice that the OpenGL context (hRC) is created on the hDC, which is retrieved from a particular window using GetDC above. Thus, there is a unique hDC and unique hRC for each screen, although Windows (the OS) creates the hDCs for you after each window is created.

  13. Leo says:

    Thanks for that fast reply!

    So:

    1- How do you create a window for each monitor? Is that by using the x, y arguments of the CreateWindow function? (I can’t see any other possible way of doing that.) Is that why you mention EnumDisplayDevices and EnumDisplaySettingsEx, so that you can get the devmod.dmPosition field to know each monitor’s coordinates in the desktop coordinate system?
    2- How do you know that the hDC associated with a particular window is related to a given card? Is creating a window on that card’s part of the desktop enough to guarantee this?

    I ask this because when I try creating it like this, I have the impression that the OpenGL context created (with the steps you mentioned) will always belong to the primary card, no matter where the window is created. In my system, the AMD card is the primary and the NVIDIA is the secondary, and when I try creating the OpenCL context from GL it always gives me this error: CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR (when an invalid OpenGL context or share group object handle is specified to the OpenCL context creation), as if the GL context does not belong to the NVIDIA card (?). Maybe creating one window for each monitor and using wglShareLists will make it work? Have you done any experiments with CL/GL, particularly with multi-GPU setups from different manufacturers?

    3- Have you tried your experiments with cards from different manufacturers?

    4- One last question, how about the LUNA project, is it officially launched? :)

    Best regards

  14. admin says:

    Easier to answer in reverse order:
    - A demo version of LUNA is now available for download to play with here:
    http://www.karasemantic.com/node/6
    You’re welcome to try it out. Let me know if you have any install issues.

    - I am still working on the full version, and also on streamlining the code, so the full source is not yet available. However, I will send you the code files for device management and window detection by e-mail.

    - You use the function EnumDisplayDevices to report the adapters present. Note that in Windows with OpenGL, each monitor will appear as a new adapter even though several may be on the same card. The EnumDisplaySettingsEx function is then used to determine the dimensions and position of the monitor in the Windows layout. The x/y coordinate of the monitor is used to tell OpenGL which card/device the display resides on.

    - In general, you don’t need to worry about which hDC it is. You create windows, Windows OS places them in a spatial layout, and creates any new hDC device contexts internally if necessary across cards. That’s why you should use GetDC to return the correct hDC for a specific window. It may be identical to others or it may not.

    - Using cards from different manufacturers is generally a bad idea. There can be incompatibilities in how they respond to the OpenGL server. Thus, I highly recommend getting cards from the same vendor – and all the same card, if possible.

    Hope this helps!
    Check the code I sent as the final word on how to actually do the above.

  15. Leo says:

    Thx for the link! I’ve installed it (successfully), but if I install to a non-default folder nothing happens when launching – nothing at all – no matter whether I launch it by double-clicking luna.exe or from the command line. However, if I install it in the default folder, it runs nicely (but I need some more time to explore it more deeply :) )

  16. admin says:

    Glad it installed ok. I’ll look into the install path issue.
    You can find a very quick tutorial for LUNA here:
    http://www.karasemantic.com/node/14

  18. Bill says:

    Thanks for the article! I’m exploring OpenGL. :)
