Accelerated Stencil Shadow Extrusion via Vertex Blending (11 November 2003)
Introduction

One of the two major shadowing algorithms is the shadow volume algorithm, first described by Crow [1] and
Williams [4]. An excellent review describing the current state of the art was written by Eric Lengyel [3],
which also shows how to implement the algorithm robustly. Implementing shadow volumes requires geometry computations which have successfully been accelerated using programmable graphics hardware. This article shows a combined CPU/GPU approach where the graphics card can accelerate shadow volume extrusion even if a programmable vertex unit is not available.

The geometry computations necessary for shadow volumes are silhouette determination and silhouette extrusion. Silhouette determination finds the contour between front- and back-facing regions with respect to the light source. Silhouette extrusion then projects any vertex which is part of that contour away from the light source, preferably to infinity. It is possible to perform all of these computations on the GPU, but this requires a specially augmented geometry which increases the vertex load by a factor of 3. An alternative method only moves the silhouette extrusion to the GPU while the CPU does the silhouette determination [2]. It is the combined CPU/GPU approach that this article will be focusing on. 
The Combined CPU/GPU Approach to Shadow Volumes

In the combined CPU/GPU approach, the CPU does silhouette determination while the GPU does silhouette extrusion.
The geometry is prepared so that each vertex used for shadow casting comes in two versions, once as (x, y, z, 0)
and once as (x, y, z, 1). One of these will be projected away from the light source while the
other remains unaltered. Exactly which version of the vertices gets projected, the ones with w = 0 or the
ones with w = 1, is an arbitrary implementation choice. The task of the CPU is to write an index buffer that references
the proper vertices to make up a shadow volume, a front cap, or a back cap. To project a vertex at (x, y, z) away from a light source at (lx, ly, lz) towards infinity, the GPU has to perform the following calculation:

    (x', y', z', w') = (x - lx, y - ly, z - lz, 0)

where (x', y', z', w') is the projected vertex. For directional light sources, all vertices are projected to (lx, ly, lz, 0), so no per-vertex computation is necessary. Usually a vertex program is employed to perform the projection, or not, depending on the w coordinate of the incoming vertex. 
Matrix Reformulation

The projection described above can be expressed in matrix form as

    | 1 0 0 -lx |
    | 0 1 0 -ly |
    | 0 0 1 -lz |
    | 0 0 0   0 |

where (lx, ly, lz) is the position of the light source in eye coordinates.
When multiplied with the modelview matrix (the matrix that transforms from model coordinates into eye coordinates), the resultant matrix will transform vertices from model coordinates directly to their projected position in eye space. Selective projection of vertices can thus be reformulated as a selective vertex transform, where the w coordinate of the incoming vertex dictates which matrix to apply. Fortunately, selective transform has been available since the first generation of hardware T&L cards via vertex blending. We establish two transformation matrices and set up the render state in a way that interprets the w coordinate of the incoming vertex as the blend weight for a blended transform. If we have transformation matrices M0 and M1, blend weight w and original vertex v, the operation becomes

    v' = w * M0 * v + (1 - w) * M1 * v

When M0 is the matrix that transforms vertices from model space into eye space, and M1 is a matrix that transforms vertices from model space to their projected position away from the light source, as discussed above, then all vertices that have w = 0 are projected away from the light source while the vertices that have w = 1 retain their original position. 
Example Code

I have provided example code showing how to set up vertex pointers and matrices in OpenGL, using GL_ARB_vertex_blend.
Note that in DirectX, the same functionality is available with the vertex format D3DFVF_XYZB1.
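The original listing is not reproduced in this copy; the following is a reconstruction sketch of such a setup function (the function name and the assumption that the GL_ARB_vertex_blend entry points have already been loaded are mine):

```c
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

/* Reconstruction sketch, not the original listing. Binds an
 * interleaved (x, y, z, w) array: the first three floats feed the
 * position, the fourth is routed to the blend unit as the weight. */
void setupShadowVertexPointers(const float *xyzw)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 4 * sizeof(float), xyzw);

    glEnableClientState(GL_WEIGHT_ARRAY_ARB);
    glWeightPointerARB(1, GL_FLOAT, 4 * sizeof(float),
                       (GLvoid *)(xyzw + 3));

    /* With GL_WEIGHT_SUM_UNITY_ARB, the single weight w implies a
     * complementary weight (1 - w) for the second matrix. */
    glEnable(GL_WEIGHT_SUM_UNITY_ARB);
    glEnable(GL_VERTEX_BLEND_ARB);
    glVertexBlendARB(2);   /* blend between two modelview matrices */
}
```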
In this function, the vector of floats which is given as parameter is supposed to point into an array of (x, y, z, w) values. GL_WEIGHT_SUM_UNITY is specified so as to implicitly make the weight for the second matrix complementary to the weight for the first one. The following code will then set up the appropriate modelview matrices.
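Again, what follows is my reconstruction of the matrix setup rather than the original listing (the function name is an assumption):

```c
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

/* Reconstruction sketch. modelview: column-major matrix taking model
 * coordinates to eye coordinates; lightpos: light position in eye
 * coordinates. Loads M0 = modelview and M1 = P * modelview, where P
 * projects eye-space vertices away from the light to infinity. */
void setupShadowMatrices(const float *modelview, const float *lightpos)
{
    /* P maps (x, y, z, 1) to (x - lx, y - ly, z - lz, 0); its last
     * row is (0, 0, 0, 0), which makes it a projective matrix. */
    const float P[16] = {
        1, 0, 0, 0,
        0, 1, 0, 0,
        0, 0, 1, 0,
        -lightpos[0], -lightpos[1], -lightpos[2], 0
    };

    glMatrixMode(GL_MODELVIEW0_ARB);   /* weight w applies to M0 */
    glLoadMatrixf(modelview);

    glMatrixMode(GL_MODELVIEW1_ARB);   /* weight 1 - w applies to M1 */
    glLoadMatrixf(P);
    glMultMatrixf(modelview);          /* M1 = P * modelview */
}
```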
In this function, the modelview parameter is the matrix that transforms from model coordinates into eye coordinates, while the lightpos parameter is a 3-vector specifying the position of the light source in eye coordinates. 
Non-Projective Vertex Blending

The catch is that there exist implementations where projective matrices are not supported for vertex blending,
since vertex blending was originally designed for tweening or soft-skinning animation. When using vertex
blending for shadow volume extrusion, we face the problem that one of our matrices is projective.
Since we are projecting to infinity, the projection consists merely of setting w = 0 in the output vertex.
If the hardware doesn't allow projective vertex blending, we can alternatively make x, y and z huge and
retain w = 1. The correct substitution would be to make x, y and z infinitely large, which is not possible.
But for practical purposes there may be a factor that is large enough to work acceptably in a given situation. The alternative projection operation then is

    (x', y', z', w') = (lx + k * (x - lx), ly + k * (y - ly), lz + k * (z - lz), 1)

for a large factor k, and the corresponding matrix is

    | k 0 0 (1 - k) * lx |
    | 0 k 0 (1 - k) * ly |
    | 0 0 k (1 - k) * lz |
    | 0 0 0       1      |

Now that both matrices have (0, 0, 0, 1) as their last row, the scheme works even with non-projective vertex blending. The altered setup code is shown below.

Conclusion

An alternative way of accelerating shadow volume extrusion has been shown that works on any hardware T&L card, even without a programmable vertex unit. 
References

[1] Crow F (1977) Shadow Algorithms for Computer Graphics. Computer Graphics 11: 242–247
[2] Kilgard M J, Everitt C (2003) Optimized Stencil Shadow Volumes. GDC Presentation: http://developer.nvidia.com/docs/IO/8230/GDC2003_ShadowVolumes.pdf
[3] Lengyel E (2002) The Mechanics of Robust Stencil Shadows. Gamasutra Feature: http://www.gamasutra.com/features/20021011/lengyel_01.htm
[4] Williams L (1978) Casting Curved Shadows on Curved Surfaces. Computer Graphics 12: 270–274
