#1990s #cgi #computergraphics #opengl #opencl
Published: 2016-12-30
There is a paper from 1968, Myer and Sutherland’s “On the Design of Display Processors”, which described a recurring pattern in the design of computer graphics hardware: it would turn out that certain graphics algorithms were too slow to run on general-purpose CPUs, so simple specialized hardware was created to speed up those functions. Then the algorithms got more complicated, and so did the specialized hardware. Eventually the hardware was effectively turning into something resembling a general-purpose CPU. Whereupon it was found worthwhile to offload certain functions onto yet another layer of specialist hardware, and so the circle went round again. The authors called this cycle the “wheel of reincarnation”.
The precise details of those specialist functions are no longer relevant with today’s technology. But I think the general principle still applies.
Before delving more into this, first let us be clear about the two main kinds of 3D graphics rendering: there is real-time rendering, where successive frames have to be output fast enough to give the perception of motion at actual speed, and there is non-real-time rendering, where each frame can take as long as you like to render (typically several minutes or even several hours), because they are separately encoded into the final sequence for later real-time playback. Alternatively, they can be described more concisely as online versus offline rendering, respectively.
Online rendering is needed for interactive applications, the best-known one being video games. But before video games, there were flight simulators, which were a cheaper (and less risky) way of training pilots (particularly inexperienced ones) than putting them in real aircraft. As the hardware cost came down, other kinds of simulations became feasible, and nowadays you can even learn to drive a car in one. Moreover, there has been a convergence, so that serious professional-quality simulators can now be built on the same sort of consumer-grade hardware that is used for PC-based video games.
The programming interfaces (APIs) for online versus offline rendering tend to be very different. The most common online rendering API is OpenGL, which is found on just about every platform, from mobile smartphones to desktop PCs, workstations and beyond. Microsoft is also fond of its own Direct3D (part of the DirectX API suite), but that is confined to the Windows platform.
By contrast, there are a huge number of different offline renderers, each with its own API. Different renderers tend to have very different ways of specifying material and lighting settings, fidelity (or not) to physical realism, the level of quality desired--just about every way that they can be different. For example, Pixar created its own in-house renderer, called “RenderMan”. It also published the API spec for this renderer, called the “RenderMan Interface Specification”. In theory, it is possible for other groups to create their own renderers that conform to this same spec. And in fact this has been done, with the Aqsis open-source project. But most offline renderers do not bother to conform to any common interface specification.
At one time, the requirement for online rendering was to be able to output on the order of 24 frames per second. Nowadays this is considered a barely-adequate minimum, and serious video gamers go for rigs that can manage 60 frames per second at HD resolution, i.e. 1920×1080 pixels per frame. This kind of specification is currently achievable for a four-figure price tag, while higher-end rigs are available that can do the same frame rate at UHD (Ultra-HD), or 3840×2160 resolution.
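To get a feel for the throughput those specifications demand, a quick back-of-the-envelope calculation (using only the resolutions and frame rates mentioned above):

```python
# Back-of-the-envelope pixel throughput for the display modes mentioned above.

def pixel_throughput(width, height, fps):
    """Pixels that must be produced per second at the given mode."""
    return width * height * fps

hd_60 = pixel_throughput(1920, 1080, 60)    # HD at 60 frames per second
uhd_60 = pixel_throughput(3840, 2160, 60)   # UHD at 60 frames per second

print(f"HD @ 60 fps:  {hd_60:>13,} pixels/s")   # 124,416,000
print(f"UHD @ 60 fps: {uhd_60:>13,} pixels/s")  # 497,664,000, i.e. 4x HD
```

Roughly 124 million pixels per second for HD, and four times that for UHD--every one of which must be fully shaded within its 1/60-second frame budget.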
Some of these video games do look quite breathtakingly realistic--at least at first glance. If you can do high-quality rendering in real time, why would you bother with non-real-time rendering? The answer is that there are always compromises in real-time rendering. The most obvious are the sacrifices in the realism of light and shadow. It is simply impossible to do physically accurate optical effects on arbitrary scenes in real time with currently-available graphics hardware: this requires ray-tracing--computing the path of hundreds or even thousands of rays of light for every single pixel. Each ray of light might bounce around the scene, interacting with multiple objects; if light hits a partly-transparent, partly-reflective object, then part of the light goes through and part of it will bounce off, so both components have to be separately computed. Multiply this computation problem by the number of pixels in each frame, and you begin to see the magnitude of the work involved.
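That branching, where a ray hitting a partly-transparent, partly-reflective surface splits into two child rays that must both be followed, is what makes the cost explode. A minimal sketch of the ray-count growth (the material parameters here are invented for illustration; a real ray tracer would also be doing intersection and shading work for every one of these rays):

```python
# Illustrative sketch of recursive ray tracing cost: a ray that hits a
# surface which is both partly reflective and partly transparent spawns
# TWO child rays, so ray count grows exponentially with bounce depth.

def count_rays(depth, reflectivity=0.5, transparency=0.5, max_depth=5):
    """Count the rays one camera ray can spawn, assuming every surface it
    hits both reflects and transmits light."""
    if depth >= max_depth:
        return 1
    rays = 1
    if reflectivity > 0:     # follow the reflected component
        rays += count_rays(depth + 1, reflectivity, transparency, max_depth)
    if transparency > 0:     # follow the refracted component
        rays += count_rays(depth + 1, reflectivity, transparency, max_depth)
    return rays

per_pixel = count_rays(0)            # rays traced for ONE camera ray
print(per_pixel)                     # 63, i.e. 2^(max_depth + 1) - 1
per_frame = per_pixel * 1920 * 1080  # now multiply by pixels per frame
print(f"{per_frame:,} ray evaluations per HD frame")
```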
Video games use tricks like baking, or precomputing part of the behaviour of various materials and objects at the time the game is being developed. This can speed up the final rendering at gameplay time immensely, but at the cost of fixing the characteristics of those materials and objects. For example, if you precompute a shadow under an object, then you cannot allow the player to pick up that object, because then the precomputed shadow would be wrong. And so you end up with a scene where the objects that are fixed in place have shadows, but the ones that the player can pick up have no shadows.
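The trade-off can be sketched in a few lines. The scene representation below is invented purely for illustration, not how any particular engine stores things:

```python
# Sketch of "baking": shadows for static objects are computed once at
# build time; movable objects get no shadow at all, because a baked
# shadow would be wrong as soon as the player moved the object.

def expensive_shadow(obj):
    """Stand-in for a costly offline shadow computation."""
    return f"shadow under {obj}"

scene = {"pillar": {"movable": False}, "crate": {"movable": True}}

# Build time: bake shadows only for objects that can never move.
baked = {name: expensive_shadow(name)
         for name, props in scene.items() if not props["movable"]}

# Gameplay time: a cheap lookup of the baked result, or nothing at all.
def render_shadow(name):
    return baked.get(name)      # None for movable objects: no shadow

print(render_shadow("pillar"))  # shadow under pillar
print(render_shadow("crate"))   # None
```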
Another common trick is environment mapping. To make an object look shiny, it is easy enough to prerender the surrounding scene into an image which is then texture-mapped onto the surface of the object to look like a reflection--the computation required to display the texture in this way is modest enough to be performed at interactive speeds nowadays. But that image texture can only include the fixed part of the surrounding scene. If there are other objects dynamically appearing and moving in the same scene, you may notice that their reflections do not show in the shiny object.
A classic early example of environment mapping (from the days when it couldn’t be done in real time) is Chromosaurus from 1985. Notice how each dinosaur shows a reflection of the surrounding landscape, but not of the other dinosaurs.
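Mechanically, environment mapping boils down to computing a reflection direction at each surface point and using it to index the prerendered image. The standard reflection formula is R = I - 2(N·I)N. A minimal sketch, with a toy two-value “environment” standing in for the prerendered texture:

```python
# Environment mapping in miniature: reflect the incoming view direction I
# about the surface normal N, then use the reflected direction to look up
# a prerendered environment. The "environment" here is a toy stand-in.

def reflect(incident, normal):
    """R = I - 2(N.I)N, with N assumed to be unit length."""
    d = sum(i * n for i, n in zip(incident, normal))
    return tuple(i - 2 * d * n for i, n in zip(incident, normal))

def lookup_environment(direction):
    """Toy environment: sky above, ground below. A real renderer would
    sample a prerendered cube map or sphere map texture here."""
    return "sky" if direction[1] > 0 else "ground"

# A view ray travelling straight down hits a floor facing up: it bounces up.
r = reflect((0.0, -1.0, 0.0), (0.0, 1.0, 0.0))
print(r)                      # (0.0, 1.0, 0.0)
print(lookup_environment(r))  # sky
```

Because `lookup_environment` only ever consults the fixed prerendered data, nothing dynamic in the scene can ever appear in the reflection--which is exactly the limitation visible in Chromosaurus.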
But anyway, back to the wheel of reincarnation.
I was thinking about it while watching this two-part tour (part one and part two) of a monster SGI Onyx2 cluster. This particular setup has five general-purpose compute nodes and five special-purpose graphics nodes, and would have sold new in the mid-1990s for several million dollars.
What would it have been used for? You will note the video connectors at the back. It would have been used to generate 3D images, probably in real time, for display on monitors or recording to video tape. Unlike photographic movie film, analog video could not be recorded one frame at a time: images had to be captured in real time, at full playback speed.
Digital video frames could have been rendered first to disk, and then played back in real time with less CPU effort. But remember, this was a time when multi-gigabyte hard drives were considered huge. You might have been able to hold a few minutes’ worth of SD-video-quality frames at best on the available hard drive space.
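A rough estimate bears this out. Assuming uncompressed 24-bit SD frames and a 4 GB drive (both figures are assumptions chosen as representative of the era):

```python
# Rough estimate of how much uncompressed SD video fits on a mid-1990s
# "huge" multi-gigabyte drive. Frame size and drive size are assumptions
# for illustration.

WIDTH, HEIGHT = 720, 480        # typical SD resolution
BYTES_PER_PIXEL = 3             # 24-bit colour, uncompressed
FPS = 30
DRIVE_BYTES = 4 * 10**9         # a "huge" 4 GB drive of the era

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL   # about 1 MB per frame
seconds = DRIVE_BYTES / (frame_bytes * FPS)
print(f"{frame_bytes:,} bytes per frame")            # 1,036,800
print(f"about {seconds / 60:.1f} minutes of footage")  # roughly 2 minutes
```

Barely two minutes of footage on an entire top-of-the-range drive--hence the appeal of generating the frames in real time and sending them straight out of the video connectors.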
Or, the frames could have been fed to a digital film printer for movie work. Unlike analog video, these machines could expose one frame of photographic film at a time. Presumably all the special-purpose hardware was then expected to be useful just for speeding up the rendering, even if it wasn’t necessarily happening in real time.
In this promo tape for the now-defunct Softimage 3D-modelling/animation software package from 1995 (part one, part two), you will notice several mentions of how the application itself is functionally identical on SGI hardware and on Windows NT machines, but the latter render faster than the former. How could this be? The Windows NT boxes would have been fairly generic x86-based machines, which at the time did not come with any hardware 3D graphics capability--unlike the SGI systems, with their multi-bitplane frame buffers, “geometry engines” and all the rest of it. So why were the NT boxes faster?
The reason is, all that special-purpose SGI hardware was useless for offline rendering. As algorithms for computing the effect of light and shadow, reflection and refraction got more advanced, they became a poor fit for the specialized graphics hardware of the time. So it came down to raw CPU power. And here, Intel had the edge over the MIPS chips used in the SGI machines, simply because it was selling into a market orders of magnitude larger, and so it could invest correspondingly more into improving the performance of its chips. SGI was simply unable to compete with the mass computing market.
The difference in hardware suitability for online versus offline rendering persists today. One new development is that the more sophisticated graphics cards of the last several years have been incorporating more programmable capability. This is reflected in a shift in the OpenGL graphics APIs from the old fixed-function pipeline, with its simplistic model of lighting and materials, towards programmable “vertex shaders” and “fragment (pixel) shaders” (and other shader types in newer versions of OpenGL), which are written in the purpose-built OpenGL Shading Language (GLSL), quite separate from whatever language you might be using to write your actual graphics program.
The next logical step seems to be the advent of the “general-purpose GPU” or “GPGPU”, which could be used for sophisticated computational tasks other than online rendering--and such tasks would include offline rendering. This has been happening to some extent for some time. But there are limitations. The main one is the lack of a standard. The Khronos Group (the same outfit responsible for developing OpenGL) has created OpenCL, the “Open Computing Language”, which is supposed to abstract away from hardware differences in GPGPUs.
Unfortunately, industry support for OpenCL seems a bit spotty. The market leader, Nvidia, prefers its own proprietary CUDA system, though it does also offer OpenCL as an option. The Blender 3D modelling software package includes a wonderful, high-quality offline renderer called Cycles, which can be configured to perform its intensive computations on either the CPU or the GPU. But Cycles continues to hit issues with its use of OpenCL on hardware from the other major GPGPU vendor, AMD. It looks like Nvidia dominates the GPGPU arena almost completely for now, which does not make for a healthy competitive market. As long as this remains the case, it seems like the wheel of computer graphics reincarnation will progress slowly, if at all, beyond this early stage of the next revolution.