Vertex Buffer Objects (VBOs)

Efficient rendering of geometric data
using OpenGL VBOs in SPECviewperf

Introduction

The goal of SPECviewperf is to be a good predictor of graphics performance for real-world applications. The testing files (viewsets) within SPECviewperf generate OpenGL command streams that are similar to those generated by the applications they represent. SPECviewperf provides a measure of graphics subsystem performance and its impact on the complete system, without the full overhead of an application.

Given its connection to real applications, it is important that SPECviewperf can provide performance measurement based on new technologies implemented in those applications. An example of such a development is vertex buffer objects (VBOs), which are included in OpenGL 1.5.

VBOs offer a way to obtain performance and flexibility benefits for OpenGL applications. This white paper details some of the motivations behind VBOs, as well as the specific OpenGL functions related to their use. It also touches on potential performance implications and shows where VBO functions are placed within the SPECviewperf test-harness code.

Background

OpenGL traditionally provides two main approaches for rendering geometric data – immediate mode and display lists.

When using immediate mode, applications send all the geometric data to the graphics processor (GPU) every frame, which is advantageous in situations such as modeling or animation where geometry is frequently created or modified. If geometric data does not change frequently, however, immediate mode can result in wasted data transfer when compared with storing the same geometric data within graphics memory.

Because immediate mode transfers data as individual elements, such as a single vertex or normal, it typically creates significant traffic to and from system memory and over the CPU’s front-side bus, which costs CPU cycles to perform the transfer itself. These effects are compounded by the API’s function call overhead, since each element requires its own call, which adds further memory traffic and CPU cycles.

As attributes such as colors and texture coordinates are added to vertices to improve visual quality, the problem grows. Triangle strips, triangle fans and line strips mitigate some of the data transfer by letting each additional vertex define a new triangle or line segment, reusing vertices that were already sent. In spite of this, immediate mode frequently causes data retrieval, transfer and CPU bottlenecks that limit overall graphics performance.
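The saving from strips can be made concrete with a little arithmetic. The sketch below uses hypothetical function names to count the vertices submitted for n triangles drawn as independent triangles (three vertices each) versus a single triangle strip (three vertices for the first triangle, then one per additional triangle).

```c
#include <assert.h>

/* Vertices submitted for n independent triangles (GL_TRIANGLES). */
unsigned long vertices_independent(unsigned long n)
{
    return 3UL * n;
}

/* Vertices submitted for n triangles in one strip (GL_TRIANGLE_STRIP):
   three for the first triangle, then one per additional triangle. */
unsigned long vertices_strip(unsigned long n)
{
    assert(n >= 1);
    return n + 2UL;
}
```

For 100 triangles this gives 300 versus 102 vertices, nearly a threefold reduction, although in immediate mode each of those 102 vertices is still sent through its own API calls.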

As an alternative to immediate mode, OpenGL provides display lists. These enable a series of graphics commands to be grouped together. This gives OpenGL implementations more opportunity to process and store data in ways that can improve overall graphics performance. Display lists can be stored within graphics memory, for example, to avoid transfer over the graphics bus.

Display lists also make it attractive for OpenGL implementations to let GPUs pull data directly from system memory with DMA transfers. While it is possible to send an individual vertex by DMA, the setup cost of each transfer more than outweighs the savings in CPU cycles and front-side bus traffic. Display lists allow more data and commands to be moved in a single transfer, so the setup cost is paid once for the whole block.
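A simple cost model, offered as an illustration rather than a measured result, shows why batching matters. Assume each DMA transfer has a fixed setup cost S and each vertex adds a transfer cost t; the cost per vertex when n vertices share one transfer is then:

```latex
c(n) = \frac{S + n\,t}{n} = \frac{S}{n} + t
```

As n grows, the setup term S/n shrinks toward zero. A single-vertex transfer (n = 1) pays the full setup cost, while a display list holding thousands of vertices pays almost none of it per vertex.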

Despite these benefits, display lists have some disadvantages. Because a compiled display list cannot be modified, any change to its geometric data requires creating a new display list. Depending on how often geometric data is updated, the potential performance advantages may be outweighed by the cost and complexity of creating and deleting display lists. Similarly, for best performance it is assumed that certain OpenGL state does not change within the display list. If state does change, the benefits of display lists may not apply, because the implementation may have to store values in system memory and/or update the GPU’s settings. This prevents commands and data from being processed as a block and requires CPU intervention.

Display lists are created by a program through the OpenGL client, but they are ultimately processed by the GPU from a copy stored by the OpenGL server. Because the application usually keeps its own copy of the data as well, this doubles the memory used compared with immediate mode. It also raises another issue: the size of the server’s copy of the display list is not visible to the OpenGL program, which can cause problems when memory is constrained.

As an alternative to display lists, OpenGL also provides vertex arrays. These allow vertex and attribute data to be grouped and treated as a block, which brings some of the data transfer efficiencies of display lists. Vertex arrays also allow attributes such as positions and colors to be interleaved in a single array, which can be convenient when creating and referencing the data. Unfortunately, because vertex array data stays in application memory, where the application may modify it at any time, the implementation cannot assume that any of it is unchanged since the last draw. As a result, the data in the array must be validated each time it is referenced, which adds overhead to data transfer. Unlike display lists, however, vertex arrays do not require storing two copies of the data.
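As a sketch of interleaving, assume a hypothetical vertex format holding a position and an RGBA color. Storing both in one struct places each vertex’s attributes next to each other in memory. The stride passed to functions such as glVertexPointer() and glColorPointer() is then the size of the whole struct, and each attribute begins at its offsetof() position within the struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: position (x, y, z) then color (r, g, b, a). */
struct interleaved_vertex {
    float position[3];
    float color[4];
};

/* Byte distance from one vertex to the next: the "stride" argument. */
size_t interleaved_stride(void)
{
    return sizeof(struct interleaved_vertex);
}

/* Byte offset of each attribute within a single vertex. */
size_t position_offset(void) { return offsetof(struct interleaved_vertex, position); }
size_t color_offset(void)    { return offsetof(struct interleaved_vertex, color); }
```

Because every member is a float, no padding is expected, so the stride is normally 7 * sizeof(float) and colors begin 3 * sizeof(float) bytes into each vertex.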

VBOs are intended to enhance OpenGL by combining many of the benefits of immediate mode, display lists and vertex arrays while avoiding some of their limitations. Like vertex arrays, they allow data to be grouped and stored as blocks for efficient transfer. They also let programs give hints about how data will be used, so that OpenGL implementations can decide in what form and in which memory to store it. VBOs let applications modify data without incurring the validation overhead of vertex arrays. Combined with programmability, VBOs extend OpenGL into new areas, such as using previously rendered pixel data as vertex data (render to vertex array).

Detailed description of VBOs

The idea behind VBOs is to provide regions of memory (buffers) accessible through identifiers. A buffer is made active through binding, following the same pattern as other OpenGL entities such as display lists or textures.

VBOs give applications control over when buffer objects are mapped and unmapped, and let them declare how each buffer will be used. This allows graphics drivers to optimize internal memory management and choose the best type of memory, such as cached/uncached system memory or graphics memory, in which to store the buffers.

While a buffer object is bound, the pointer argument of each client-state array function (glVertexPointer(), glNormalPointer() and so on) is interpreted as an offset relative to the currently bound buffer rather than as a client memory address. In effect, binding turns client state into server state. Data referenced by ordinary client-state functions lives in the client’s own memory and cannot be accessed by other OpenGL clients. Because VBO data is held by the server instead, it can be shared among clients, which can bind common buffers in the same way they share textures or display lists.
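Because the pointer parameter of the array functions keeps its pointer type, applications commonly wrap the conversion from a byte offset to a pointer in a small macro. The macro below is a widely used convention, not part of OpenGL itself. The name BUFFER_OFFSET is an assumption, and the round trip through uintptr_t assumes a platform where that conversion preserves the value, as it does on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte offset into the bound buffer into the pointer-typed argument
   expected by glVertexPointer(), glNormalPointer() and similar functions. */
#define BUFFER_OFFSET(bytes) ((const void *)(uintptr_t)(bytes))

/* Recover the byte offset from such a pointer (for example, when debugging). */
uintptr_t offset_from_pointer(const void *p)
{
    return (uintptr_t)p;
}
```

With a buffer bound to ARRAY_BUFFER, passing BUFFER_OFFSET(1200) as the last argument of glVertexPointer() would make vertex data start 1,200 bytes into that buffer.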

The following is an outline of the key OpenGL calls associated with VBO usage:

glBindBuffer: Makes a buffer object current for a target, so that client-state functions such as glVertexPointer() refer to offsets within that buffer instead of absolute addresses in client memory. Binding buffer 0 switches off VBOs and reverts to the usual client-state mode with absolute pointers.
glBufferData, glBufferSubData, and glGetBufferSubData: glBufferData sets the size of a buffer’s data store, supplies a usage hint, and optionally copies initial data into it; glBufferSubData copies data into a range of the buffer; glGetBufferSubData copies data from a range of the buffer back into client memory.
glMapBuffer and glUnmapBuffer: These functions lock and unlock a buffer, either giving the application direct access to load data into it or relinquishing control back to the server. glMapBuffer maps the buffer into client memory and returns a temporary pointer to its beginning. OpenGL decides how this mapping into the client’s address space occurs, so a mapping should be held only for a short operation: the pointer is valid only until glUnmapBuffer is called and must not be stored for later use.

VBOs are intended to work with the following OpenGL target objects:

Array buffers (ARRAY_BUFFER): These buffers contain vertex attributes, such as vertex coordinates, texture coordinates, per-vertex colors, and normals. The attributes can be interleaved (using the stride parameter) or stored sequentially, one array after another (write 1,000 vertices, then 1,000 normals, and so on). glVertexPointer, glNormalPointer and the other array-pointer functions are each given the offset of the appropriate array.
Element array buffers (ELEMENT_ARRAY_BUFFER): This type of buffer is used mainly for the element pointer in glDraw[Range]Elements(). It contains only indices of elements.
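For the sequential layout, each array starts where the previous one ends. The sketch below computes those starting offsets for the 1,000-vertex example, assuming three floats per position and per normal; the struct and function names are hypothetical. The resulting offsets are what would be passed, as pointer-typed arguments, to glVertexPointer() and glNormalPointer() with a stride of 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a sequential (non-interleaved) buffer layout. */
struct sequential_layout {
    size_t vertex_offset;  /* start of the position array, in bytes */
    size_t normal_offset;  /* start of the normal array, in bytes */
    size_t total_size;     /* size to request from glBufferData() */
};

struct sequential_layout make_sequential_layout(size_t vertex_count)
{
    const size_t position_bytes = 3 * sizeof(float); /* x, y, z */
    const size_t normal_bytes   = 3 * sizeof(float); /* nx, ny, nz */
    struct sequential_layout l;

    l.vertex_offset = 0;
    l.normal_offset = l.vertex_offset + vertex_count * position_bytes;
    l.total_size    = l.normal_offset + vertex_count * normal_bytes;
    return l;
}
```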

Because the two targets are bound independently, an element array buffer and an array buffer can both be bound when glDraw[Range]Elements() is called. This lets applications switch among various element buffers while keeping the same vertex array buffer, for example to implement level-of-detail (LOD) selection and other effects by changing the element table while working on the same set of vertices.
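To illustrate the LOD idea, the sketch below fills index arrays for one hypothetical grid of width by height vertices, numbered row by row: a full-detail array that uses every vertex and a coarser one that uses every second row and column. Each array would go into its own element array buffer while the vertex array buffer stays the same. The function name and grid layout are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Write triangle indices (two triangles per cell) for a row-major grid of
   width x height vertices, using every step-th row and column.
   Returns the number of indices; pass out == NULL to count without writing. */
size_t grid_lod_indices(unsigned width, unsigned height, unsigned step,
                        unsigned *out)
{
    size_t n = 0;
    assert(step >= 1);
    for (unsigned y = 0; y + step < height; y += step) {
        for (unsigned x = 0; x + step < width; x += step) {
            unsigned v0 = y * width + x;          /* top-left corner */
            unsigned v1 = v0 + step;              /* top-right corner */
            unsigned v2 = (y + step) * width + x; /* bottom-left corner */
            unsigned v3 = v2 + step;              /* bottom-right corner */
            if (out) {
                out[n + 0] = v0; out[n + 1] = v2; out[n + 2] = v1;
                out[n + 3] = v1; out[n + 4] = v2; out[n + 5] = v3;
            }
            n += 6;
        }
    }
    return n;
}
```

For a 5 x 5 grid the full-detail array holds 96 indices and the coarse one 24, and both refer to the same 25 vertices, so changing LOD only means binding a different ELEMENT_ARRAY_BUFFER.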

New procedures, functions and tokens

Usage flags

STREAM_DRAW
STREAM_READ
STREAM_COPY
STATIC_DRAW
STATIC_READ
STATIC_COPY
DYNAMIC_DRAW
DYNAMIC_READ
DYNAMIC_COPY

Access flags

READ_ONLY
WRITE_ONLY
READ_WRITE

Targets

ARRAY_BUFFER
ELEMENT_ARRAY_BUFFER

void BindBuffer (enum target, uint buffer);

The BindBuffer function makes the buffer object with the given ID the current buffer for target. Binding ID zero switches off the use of buffer objects for that target.

void *MapBuffer (enum target, enum access);
boolean UnmapBuffer (enum target);

The function MapBuffer provides a pointer corresponding to the mapped area of the current buffer object. UnmapBuffer releases the mapping.

void BufferData (enum target, sizeiptr size, const void *data, enum usage);

The BufferData function can be used in two ways:

To set up the memory amount and usage for the current buffer object with data set to NULL. The user can map the buffer later to set up its data.
To allocate memory, set the usage, and copy data; typically used when dealing with a static memory model.

void BufferSubData (enum target, intptr offset, sizeiptr size, const void *data);

The BufferSubData function copies data in a specific range inside the buffer object.

void GetBufferSubData (enum target, intptr offset, sizeiptr size, void *data);

The GetBufferSubData function retrieves sub-data from a specific range in the current buffer object.

void DeleteBuffers (sizei n, const uint *buffers);
void GenBuffers (sizei n, uint *buffers);
boolean IsBuffer (uint buffer);

These three functions manage buffer object identifiers in the same way as the corresponding display list and texture functions: they allocate, free, or query identifiers for buffer objects.

void GetBufferParameteriv (enum target, enum pname, int *params);

The GetBufferParameteriv function returns various parameters concerning the current buffer object. Pname can be:

BUFFER_SIZE: Returns the size of the buffer object.
BUFFER_USAGE: Returns the usage of the buffer object.
BUFFER_ACCESS: Returns the access flag of the buffer object.
BUFFER_MAPPED: Indicates if this buffer is mapped.

void GetBufferPointerv (enum target, enum pname, void **params);

The GetBufferPointerv function returns the pointer to the buffer’s data store if the buffer is currently mapped (MapBuffer). At present, pname can only be BUFFER_MAP_POINTER.

Tokens for Get{Boolean, Integer, Float, Double}v

Buffer object ID zero is reserved. When buffer object zero is bound to a target, the commands affected by that binding behave as they do without VBOs, with pointers treated as client memory addresses. When a nonzero buffer ID is bound, the pointer is interpreted as an offset into that buffer object, and the data is handled by VBO management.

These tokens show which buffers are bound as VBO offsets:

ARRAY_BUFFER_BINDING
ELEMENT_ARRAY_BUFFER_BINDING
VERTEX_ARRAY_BUFFER_BINDING
NORMAL_ARRAY_BUFFER_BINDING
COLOR_ARRAY_BUFFER_BINDING
INDEX_ARRAY_BUFFER_BINDING
TEXTURE_COORD_ARRAY_BUFFER_BINDING
EDGE_FLAG_ARRAY_BUFFER_BINDING
SECONDARY_COLOR_ARRAY_BUFFER_BINDING
FOG_COORDINATE_ARRAY_BUFFER_BINDING
WEIGHT_ARRAY_BUFFER_BINDING

Token for GetVertexAttribiv:

When VBOs are used with vertex programs, some attributes can have arbitrary meanings; an array of normals, for example, can be used to store other information. Such generic attributes are therefore identified by an attribute index instead of by one of the tokens in the previous section. Querying the following token with GetVertexAttribiv for a given attribute index returns the name of the buffer object bound to that attribute array, or zero if the array is not sourced from a buffer object.

VERTEX_ATTRIB_ARRAY_BUFFER_BINDING

Purposes of various VBO functions

glBufferData()

This function is an abstraction layer between the memory and the application. Behind each buffer object is a complex memory management system. The glBufferData() function looks at the requested size and usage of the data store, reserves storage, and optionally initializes the data from the user’s pointer. If storage space was previously allocated for this buffer, an individual implementation may choose either to reuse the previous storage or to discard it and allocate new storage. If the data pointer specified is not NULL, the storage for the buffer is initialized with size machine units (typically bytes) from the data pointer. For specifics on when memory associated with the buffer is freed instead of resized, consult documentation from individual GPU vendors.

Usage flags

The usage argument is a key value for helping the VBO memory manager fully optimize buffers. While these values are only hints, and they can be ignored by the implementation, applications are strongly encouraged to provide correct usage flags. Additional implementation-specific information on the interpretation of hints may be available from GPU vendors.

Name of flag	Definition
STATIC_...	Assumed to be a 1-to-n update-to-draw ratio. The data is specified once, or very rarely, and drawn many times.
DYNAMIC_...	Assumed to be an n-to-n update-to-draw ratio. The data is updated repeatedly but drawn several times per update, such as dynamic data that changes every few frames.
STREAM_...	Assumed to be a 1-to-1 update-to-draw ratio. The data is updated about once each time it is drawn. Like DYNAMIC data, STREAM data changes over time, but more frequently.
..._READ_...	The data is read back from OpenGL by the application, so it must be easy to read. This option is typically not meaningful for VBOs by themselves.
..._COPY_...	Combines _READ_ and _DRAW_: the data is read from OpenGL and also used as a source for drawing. This option is typically not meaningful for VBOs by themselves.
..._DRAW_...	The buffer will be used to send application data to the GPU for drawing.

Table 1: List of usage flags

These usage hints can help an implementation’s memory manager balance between different kinds of memory, such as system, uncached and video memory. Since these kinds of memory have different access characteristics for the CPU and GPU, the hints allow the proper selection to occur. They are not hard restrictions on the application but suggestions that help graphics drivers decide where to store the data and how to manage it. Nothing prevents creating a STATIC data store and then updating it every frame, or creating a STREAM data store that is never modified, although usage patterns that conflict with the supplied hints are strongly discouraged.
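One way to apply Table 1 is to derive the hint from how often the data is expected to change relative to how often it is drawn. The helper below is a hypothetical heuristic, not something defined by OpenGL, and its threshold of one draw per update is an arbitrary choice. It returns its own enum rather than GL constants so that it compiles without GL headers; an application would map the result to GL_STATIC_DRAW, GL_DYNAMIC_DRAW or GL_STREAM_DRAW.

```c
#include <assert.h>

/* Hypothetical mirror of the three *_DRAW usage hints from Table 1. */
enum draw_hint { HINT_STATIC_DRAW, HINT_DYNAMIC_DRAW, HINT_STREAM_DRAW };

/* Pick a hint from whether the data changes after creation and how many
   times it is expected to be drawn per update. */
enum draw_hint choose_draw_hint(int updated_after_creation, double draws_per_update)
{
    assert(draws_per_update >= 0.0);
    if (!updated_after_creation)
        return HINT_STATIC_DRAW;  /* 1-to-n: specified once, drawn many times */
    if (draws_per_update <= 1.0)
        return HINT_STREAM_DRAW;  /* 1-to-1: about one draw per update */
    return HINT_DYNAMIC_DRAW;     /* updated repeatedly, several draws per update */
}
```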

glBufferSubData()

This function gives the user a way to replace a range of data in an existing buffer. It works in much the same way as glTexSubImage2D() does for textures. If rendering operations that use this buffer are still pending, an implementation may either stall until they have completed or queue the update so that those operations still see the old data.
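glBufferSubData() can only replace bytes that already exist in the data store; it never grows the buffer. The OpenGL 1.5 specification generates INVALID_VALUE if offset or size is negative, or if offset + size exceeds the buffer’s size. A hypothetical application-side helper that applies the same check before issuing the call might look like this, with ptrdiff_t standing in for GLintptr and GLsizeiptr.

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero if [offset, offset + size) lies inside a buffer of
   buffer_size bytes, mirroring the conditions under which glBufferSubData()
   reports GL_INVALID_VALUE. */
int subdata_range_ok(ptrdiff_t offset, ptrdiff_t size, ptrdiff_t buffer_size)
{
    if (offset < 0 || size < 0)
        return 0;
    /* Compare by subtraction so that offset + size cannot overflow. */
    return offset <= buffer_size && size <= buffer_size - offset;
}
```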

glBindBuffer()

This sets the current buffer object for a target. All subsequent calls that set array pointers will refer to this object, and all updates will apply to this buffer. Binding the special buffer name zero tells the driver not to use buffer objects.

glMapBuffer()

This function maps the buffer object into the client’s memory, if it is possible. The pointer returned can be both read from and written to directly by the CPU, allowing arbitrary updates. To maintain the proper OpenGL semantics, where operations always appear to occur in order, the implementation may be required to either stall or make a copy of the buffer to allow the mapping to occur, if the buffer is still in use by the GPU. When the buffer cannot be mapped, the implementation will return a NULL pointer.

glUnmapBuffer()

This function unmaps the buffer object from the client’s memory. It returns a success code that the application should check to ensure the update occurred correctly. When a failure is reported, the contents of the buffer may have become undefined due to an extraordinary event occurring while the buffer was mapped. In this case, the data should be resubmitted by the application.
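The check-and-resubmit pattern can be written independently of the GL calls themselves. In the sketch below, the hypothetical callback stands for one attempt to map the buffer, fill it, and unmap it, returning the result of glUnmapBuffer() (and treating a NULL pointer from glMapBuffer() as a failure). The helper retries a bounded number of times; the limit is an arbitrary choice, and fail_first_attempt is a test double included only to demonstrate the behavior.

```c
#include <assert.h>

/* One upload attempt: map, write, unmap. Returns nonzero when the unmap
   reported success, zero when the contents must be written again. */
typedef int (*upload_fn)(void *context);

/* Repeat the upload until it succeeds or max_attempts is reached.
   Returns the number of attempts used, or 0 if every attempt failed. */
int upload_with_resubmit(upload_fn attempt, void *context, int max_attempts)
{
    assert(max_attempts > 0);
    for (int i = 1; i <= max_attempts; ++i) {
        if (attempt(context))
            return i;
    }
    return 0;
}

/* Test double: reports failure on the first call, success afterwards. */
int fail_first_attempt(void *context)
{
    int *calls = context;
    return ++*calls > 1;
}
```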

glVertexPointer()

When a nonzero buffer is bound to ARRAY_BUFFER, this function’s pointer argument is interpreted as a byte offset into that buffer object instead of a client memory address.

Suggestions for efficient VBO usage

Updating data efficiently

Keep in mind that the driver cannot know what the application will do with the pointer returned by glMapBuffer(). Will a few bytes be changed, or will the whole buffer be updated? The pointer refers to the actual location of the data, and the GPU could still be working with that data, so mapping it for an update may force the driver to wait for the GPU to finish.

To avoid this conflict, the application can call glBufferData() with a NULL pointer to discard the previous contents before mapping, or use glBufferSubData() to specify the exact subregion to update. Calling glBufferData() with a NULL pointer tells the driver that the previous data is no longer needed. As a consequence, if the GPU is still working on the old data, there is no conflict: the implementation may allocate new storage, and glMapBuffer() may return a pointer to it that can be used while the GPU finishes with the previous data. In the glBufferSubData() case, the update is supplied as one contiguous block and no reading of the buffer is involved, so the implementation may be able to queue the update.

Avoid CPU operations with vertex buffer objects

While vertex buffer objects offer great potential in the efficiency of providing data to the GPU, they are often highly inefficient when coupled with operations that require CPU processing. As a result, feedback and selection may not perform well when combined with vertex buffer objects. Additionally, building display lists from data in a vertex buffer object or using glArrayElement() with vertex buffer objects will typically be highly inefficient.

Utilize GPU-friendly data types and alignment

With vertex buffer objects, it is now the job of the GPU to directly interpret the data, whereas the CPU could previously reformat it as needed during submission. If the data format that is placed in a vertex buffer object cannot be directly handled by the GPU, the implementation may have to read the data back to the CPU for processing, which is often highly inefficient. It is best to check with GPU vendors for the full list of optimal formats, but most common data types are presently supported, as long as the attribute is aligned on a 32-bit boundary.
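A quick application-side check, sketched here as a hypothetical helper, is to verify that the stride and every attribute offset are multiples of four bytes before the data is placed in a buffer object. The four-byte rule follows the guideline above; individual GPUs may have other preferences.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if value is a multiple of 4 bytes (32-bit aligned). */
static int aligned4(size_t value)
{
    return value % 4 == 0;
}

/* Check a stride and an array of attribute offsets for 32-bit alignment. */
int attributes_aligned(size_t stride, const size_t *offsets, size_t count)
{
    if (!aligned4(stride))
        return 0;
    for (size_t i = 0; i < count; ++i)
        if (!aligned4(offsets[i]))
            return 0;
    return 1;
}
```

For example, a three-byte color at offset 12 would push the next attribute to offset 15, which this check rejects.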

Use the “first” argument of glDrawArrays() instead of changing glVertexPointer()

In the function:

glDrawArrays (GLenum mode, GLint first, GLsizei count);

Instead of calling glVertexPointer() with a new offset and passing 0 as “first”, it can be more efficient to leave the vertex pointer unchanged and change the “first” argument of glDrawArrays().
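The two approaches select the same vertices: moving the pointer first * stride bytes further into the buffer is equivalent to leaving the pointer alone and passing first to glDrawArrays(). The hypothetical helper below computes the byte offset the pointer-changing approach would need. Changing “first” instead avoids the pointer change and any revalidation the implementation may perform when array pointers change.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset at which vertex number first begins, for an array whose
   consecutive vertices are stride bytes apart. */
size_t vertex_byte_offset(size_t first, size_t stride)
{
    return first * stride;
}
```

For three-float positions, a pointer offset of 1,200 bytes with first = 0 selects the same vertices as a pointer offset of 0 with first = 100.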

Use glDrawRangeElements instead of glDrawElements

Using glDrawRangeElements() can be more efficient for two reasons:

If the difference between the end and start of the specified range fits into a 16-bit integer, the driver can rebase the indices and convert them from a 32-bit to a 16-bit format before passing them to the GPU. This halves the amount of index data transferred, which can improve performance.

The range tells the VBO manager which vertices will be referenced, information it can use to optimize its internal memory configuration.
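The index-narrowing optimization can be modeled on the application side. Given the start and end values passed to glDrawRangeElements(), indices can be stored as 16-bit values relative to start whenever end - start fits in 16 bits. This is only a model of what a driver might do; the function names are hypothetical, and rebased indices would have to be drawn with the vertex pointer moved forward by start vertices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if every index in [start, end] can be stored as a 16-bit value
   relative to start. */
int range_fits_16bit(uint32_t start, uint32_t end)
{
    assert(start <= end);
    return end - start <= UINT16_MAX;
}

/* Convert 32-bit indices in [start, end] to 16-bit indices relative to start.
   The caller must have checked range_fits_16bit(start, end) first. */
void rebase_indices_16(const uint32_t *in, uint16_t *out, size_t count,
                       uint32_t start)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = (uint16_t)(in[i] - start);
}
```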

Implementing VBOs within the SPECviewperf test harness

As mentioned in the introduction, the goal of SPECviewperf is to test graphics hardware by delivering OpenGL command streams taken from real applications. As a performance evaluation tool, SPECviewperf has to be able to use VBOs in a wide variety of ways to reflect application usage.

The usage patterns of many of the applications covered by the current SPECviewperf viewsets would typically suit a static data model for VBOs, where the data is defined once and drawn many times. Because of this, GL_STATIC_DRAW is the default usage hint. As applications adopt VBOs, SPECviewperf can easily accommodate different usage patterns.

The current implementation of VBOs within SPECviewperf loads buffers only through glBufferData and glBufferSubData and never maps them to update or read back their contents, so glMapBuffer and glUnmapBuffer are not called. As VBOs are adopted within applications, SPECviewperf is expected to be modified accordingly.

Here are the key places where VBOs are implemented within SPECviewperf:

Step 1 - Create pointers, allocate memory, generate buffer handles, and define buffer attributes (viewperf.c):

……
mode.useVertexBufferObjects = 0;
mode.vboUsageMode = GL_STATIC_DRAW;
mode.vboMaxSize = 0;
mode.vboMaxPrims = 0;
……

GLuint currVboID;
/* Sequential layout: all positions, then colors, normals and texture coordinates */
int vtxSize = numVertsInVBO * sizeof(struct vector);
int colSize = numVertsInVBO * sizeof(struct colorvector);
int nmlSize = numVertsInVBO * sizeof(struct vector);
int texSize = numVertsInVBO * sizeof(struct texvector);
int vtxOffs = 0;
int colOffs = vtxOffs + vtxSize;
int nmlOffs = colOffs + colSize;
int texOffs = nmlOffs + nmlSize;
int vboSize = texOffs + texSize;
/* Number of vertices preceding this VBO's data; used to rebase vertex indices */
int vtxDelta = prevVertexPointer - vertexData;

/* Grow the array of buffer IDs geometrically when it is full */
if (numVertexBufferObjects >= allocVertexBufferObjects) {
    allocVertexBufferObjects = 2 * allocVertexBufferObjects + 16;
    vertexBufferObjects = realloc(vertexBufferObjects,
                                  allocVertexBufferObjects * sizeof(GLuint));
    if (!vertexBufferObjects) {
        printf("Error: could not allocate memory for vertexBufferObjects\n");
        exit(EXIT_FAILURE); /* report failure, not success */
    }
}

/* Create one buffer object, size it, then copy each attribute array into place */
glGenBuffers(1, &currVboID);
vertexBufferObjects[numVertexBufferObjects++] = currVboID;
glBindBuffer(GL_ARRAY_BUFFER, currVboID);
glBufferData(GL_ARRAY_BUFFER, vboSize, NULL, mode.vboUsageMode);
glBufferSubData(GL_ARRAY_BUFFER, vtxOffs, vtxSize, prevVertexPointer);
glBufferSubData(GL_ARRAY_BUFFER, colOffs, colSize, prevColorPointer);
glBufferSubData(GL_ARRAY_BUFFER, nmlOffs, nmlSize, prevNormalPointer);
glBufferSubData(GL_ARRAY_BUFFER, texOffs, texSize, prevTexturePointer);

/* Record the buffer ID and per-attribute offsets for every data block it holds */
for (i = prevdb; i <= db; i++) {
    pDataBlock[i].vertexIndex -= vtxDelta;
    pDataBlock[i].vertexBufferID = currVboID;
    pDataBlock[i].vertexOffset = vtxOffs;
    pDataBlock[i].colorOffset = colOffs;   /* read by the draw loop in Step 2 */
    pDataBlock[i].normalOffset = nmlOffs;
    pDataBlock[i].texCoordOffset = texOffs;
}

pevent->rb->vertexBufferObjects = vertexBufferObjects;
pevent->rb->numVertexBufferObjects = numVertexBufferObjects;
……

Step 2 - Within the draw loop bind current buffer and set appropriate pointers (viewperf.c):

……
/* Rebind and reset array pointers only when this data block uses a different VBO */
if (pDb->vertexBufferID != currVertexBufferID) {
    glBindBuffer(GL_ARRAY_BUFFER, pDb->vertexBufferID);

    /* With a buffer bound, each "pointer" is a byte offset into that buffer;
       attributes whose offset is below zero are not set up */
    if (pDb->colorOffset >= 0) {
        glColorPointer(4, GL_FLOAT, 0, (const GLvoid *)(intptr_t) pDb->colorOffset);
    }
    if (pDb->normalOffset >= 0) {
        glNormalPointer(GL_FLOAT, 0, (const GLvoid *)(intptr_t) pDb->normalOffset);
    }
    if (pDb->texCoordOffset >= 0) {
        glTexCoordPointer(2, GL_FLOAT, 0, (const GLvoid *)(intptr_t) pDb->texCoordOffset);
    }
    if (pDb->vertexOffset >= 0) {
        glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)(intptr_t) pDb->vertexOffset);
    }
    currVertexBufferID = pDb->vertexBufferID;
}

(void) pPrimitiveLoop(tb, pDb, pDb + 1);

pDb++;
……

Step 3 - Upon exit, free VBO buffers (viewperf.c):

glBindBuffer(GL_ARRAY_BUFFER, 0);
glDeleteBuffers(renderblock.numVertexBufferObjects, renderblock.vertexBufferObjects);
free(renderblock.vertexBufferObjects);
renderblock.numVertexBufferObjects = 0;
renderblock.vertexBufferObjects = NULL;

Preparing for the future

SPEC’s OpenGL Performance Characterization (SPECopc) project group, the developers of SPECviewperf, expect VBOs to be an integral part of the rendering path for future graphics-intensive applications. VBOs have been added to SPECviewperf 8.1 to enable users and vendors to begin testing performance for graphics applications that will potentially use VBOs. No performance results using VBOs will be published on the SPEC/GPC web site until VBOs become a part of applications represented by viewsets within SPECviewperf.

This document was written by Ian Williams of NVIDIA (SPECopc chair) and Evan Hart of ATI.

Standard Performance Evaluation Corporation