Efficient rendering of geometric data using OpenGL VBOs in SPECviewperf
        Introduction 
        The goal of SPECviewperf is to be a good predictor of graphics performance 
          for real-world applications. The testing files (viewsets) within SPECviewperf 
          generate OpenGL command streams that are similar to those used by the 
          application. SPECviewperf provides a measure of graphics subsystem performance 
          and its impact on the complete system, without the full overhead of 
          an application. 
         Given its connection to real applications, it is important that SPECviewperf 
          can provide performance measurement based on new technologies implemented 
          in those applications. An example of such a development is vertex buffer 
          objects (VBOs), which are included in OpenGL 1.5. 
         VBOs offer a way to obtain performance and flexibility benefits for 
          OpenGL applications. This white paper details some of the motivations 
          behind VBOs, as well as the specific OpenGL functions related to their 
          use. It also touches on potential performance implications and shows 
          where VBO functions are placed within the SPECviewperf test-harness 
          code. 
        Background  
        OpenGL traditionally provides two main approaches for rendering geometric 
          data – immediate mode and display lists. 
        When using immediate mode, applications send all the geometric data 
          to the graphics processor (GPU) every frame, which is advantageous in 
          situations such as modeling or animation where geometry is frequently 
          created or modified. If geometric data does not change frequently, however, 
          immediate mode can result in wasted data transfer when compared with 
          storing the same geometric data within graphics memory. 
        Because immediate mode transfers data as individual elements, such 
          as a single vertex or normal, it typically creates significant traffic 
          to and from system memory and over the CPU’s front-side bus. This 
          translates into increased CPU cycles to perform the actual transfer. 
          These effects are further compounded by greater function call overhead 
          in the API, which at a hardware level results in increased traffic and 
          CPU cycles. 
        As attributes such as colors and texture coordinates are associated 
          with vertices to improve visual quality, the problem compounds. Triangle 
          strips, triangle fans and line strips mitigate some of the data transfer 
          needs by letting each vertex after the first few define an additional 
          triangle or line segment. In spite of this, however, immediate mode 
          frequently causes data retrieval, transfer and CPU bottlenecks that 
          inhibit overall graphics performance. 
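
To make the saving from strips concrete, the sketch below counts the vertices that must be sent for a run of connected triangles, first as independent triangles and then as a single triangle strip. The function names are illustrative only and are not part of OpenGL.

```c
#include <stddef.h>

/* Vertices sent when every triangle is specified independently
 * (GL_TRIANGLES): three per triangle. */
static size_t vertices_as_triangles(size_t triangle_count)
{
    return 3 * triangle_count;
}

/* Vertices sent when the same triangles form one strip
 * (GL_TRIANGLE_STRIP): the first triangle needs three vertices,
 * and every following triangle reuses two and adds one. */
static size_t vertices_as_strip(size_t triangle_count)
{
    return triangle_count == 0 ? 0 : triangle_count + 2;
}
```

For 100 connected triangles this gives 300 vertices versus 102, but each of those 102 vertices still costs at least one API call in immediate mode.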
        
        As an alternative to immediate mode, OpenGL provides display lists. 
          These enable a series of graphics commands to be grouped together. This 
          gives OpenGL implementations more opportunity to process and store data 
          in ways that can improve overall graphics performance. Display lists 
          can be stored within graphics memory, for example, to avoid transfer 
          over the graphics bus. 
        Display lists also make it attractive for OpenGL implementations to 
          allow GPUs to pull data directly from system memory with DMA transfers. 
          While it is possible to transfer an individual vertex by a DMA transfer, 
          the benefits of reduced CPU cycles and front-side bus traffic are more 
          than outweighed by the setup costs involved. Display lists allow more 
          data and/or commands to be transferred in one transfer and setup. 
        Despite these benefits, display lists do have some disadvantages. In 
          some situations, geometric data changes require creating a new display 
          list. Depending on the frequency with which geometric data is updated, 
          the potential performance advantages may be outweighed by the complexities 
          of managing creation/deletion of display lists. Similarly, for best 
          performance it is assumed some OpenGL states will not change within 
          the display list. If a state does in fact change, then the benefits 
          of display lists may not apply because it forces OpenGL implementations 
          to potentially store values in system memory and/or update the GPU’s 
          settings. This would prevent commands and data from being processed 
          as a block and require CPU intervention. 
        Display lists are created by a program and issued to the OpenGL client. 
          Ultimately, however, they are processed by the GPU from a copy stored 
          by the OpenGL server. This creates a doubling of data when compared 
          with immediate mode. It also raises another issue: The size of the OpenGL 
          server copy of the display list is not visible to the OpenGL program. 
          This can cause issues when memory space is constrained. 
        As an alternative to display lists, OpenGL also implements vertex arrays. 
          These allow vertex and attribute data to be grouped and treated as a 
          block, which promotes some of the data transfer efficiencies afforded 
          by display lists. Vertex arrays also allow data such as geometry and 
          color to be interleaved, which can be convenient when creating and 
          referencing the data. Unfortunately, because vertex arrays live in 
          application memory, an implementation cannot assume that any individual 
          piece of data is unchanged between draws. As a result, when drawing an 
          object using vertex arrays, the data in the array must be validated 
          each time it is referenced, which adds overhead to data transfer. 
          Vertex arrays do not suffer, however, from the limitation of storing 
          two copies of all data. 
        VBOs are intended to enhance the capabilities of OpenGL by providing 
          many of the benefits of immediate mode, display lists and vertex arrays, 
          while avoiding some of the limitations. They allow data to be grouped 
          and stored efficiently like vertex arrays to promote efficient data 
          transfer. They also provide a mechanism for programs to give hints about 
          data usage patterns so that OpenGL implementations can make decisions 
          about the form in which data should be stored and its location. VBOs 
          give applications the flexibility to be able to modify data without 
          causing overhead in transfer due to validation. When combined with programmability, 
          VBOs extend OpenGL’s capabilities into new areas, such as modifying 
          vertex data with previously rendered pixel data, and render to vertex 
          array. 
        Detailed description of VBOs 
        The idea behind VBOs is to provide regions of memory (buffers) accessible 
          through identifiers. A buffer is made active through binding, following 
          the same pattern as other OpenGL entities such as display lists or textures. 
        
        VBOs provide control over the mappings and unmappings of buffer objects 
          and define the usage type of the buffers. This allows graphics drivers 
          to optimize internal memory management and choose the best type of memory 
          – such as cached/uncached system memory or graphics memory – 
          in which to store the buffers. 
        The binding operation converts each pointer passed to a client-state 
          function into an offset relative to the currently bound buffer. As a 
          result, the bind operation turns a client-state function into a 
          server-state function. Data referenced by client-state functions is 
          accessible only to the OpenGL client that owns it; other OpenGL clients 
          cannot access it. Because the VBO mechanism changes client-state 
          functions into server-state functions, VBO data can be shared among 
          clients. OpenGL clients are therefore able to bind common buffers in 
          the same way as textures or display lists. 
        The following is an outline of the key OpenGL calls associated with 
          VBO usage: 
        
          -  glBindBuffer: This allows client-state functions to use 
            binding buffers instead of working in absolute memory on the client 
            side. Binding the buffer #0 switches off VBO and reverts to the usual 
            client-state mode with absolute pointers. 
-  glBufferData, glBufferSubData, and glGetBufferSubData: 
            These functions set the size of the buffer data store, provide usage 
            hints, and copy data into or out of a buffer. 
-  glMapBuffer and glUnmapBuffer: These functions 
            lock and unlock buffers, allowing data to be loaded into them or relinquishing 
            control to the server. glMapBuffer maps the buffer into client memory 
            and returns a temporary pointer to the beginning of the buffer. OpenGL 
            is responsible for how this mapping into the client’s absolute memory 
            occurs. Because of this, a buffer should be mapped only for a short 
            operation; the pointer is not persistent and should not be stored 
            for further use. 
VBOs are intended to work with the following OpenGL target objects: 
        
        
          -  Array buffers (ARRAY_BUFFER): These buffers contain 
            vertex attributes, such as vertex coordinates, texture coordinate 
            data, per vertex-color data, and normals. They can be interleaved 
            (using the stride parameter) or sequential, with one array after another 
            (write 1,000 vertices, then 1,000 normals, and so on). glVertexPointer 
            and glNormalPointer each point to the appropriate offsets. 
          
-  Element array buffers (ELEMENT_ARRAY_BUFFER): 
            This type of buffer is used mainly for the element pointer in glDraw[Range]Elements(). 
            It contains only indices of elements. 
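
The difference between the interleaved and sequential layouts comes down to the offset and stride of each attribute. The following is a minimal sketch of that arithmetic for a normal array; the struct and function names are hypothetical and chosen only for illustration, and positions are assumed to start at offset 0 in both layouts.

```c
#include <stddef.h>

/* Hypothetical vertex record for an interleaved layout:
 * a position followed by a normal, three floats each. */
struct InterleavedVertex {
    float position[3];
    float normal[3];
};

/* Byte layout of one attribute inside an array buffer. */
struct AttribLayout {
    size_t offset;  /* byte offset of the first element        */
    size_t stride;  /* bytes between two consecutive elements  */
};

/* Interleaved: every attribute uses the record size as its stride
 * and differs only in its starting offset within the record. */
static struct AttribLayout interleaved_normal(void)
{
    struct AttribLayout layout;
    layout.offset = offsetof(struct InterleavedVertex, normal);
    layout.stride = sizeof(struct InterleavedVertex);
    return layout;
}

/* Sequential: all positions first, then all normals. Each array is
 * tightly packed, so the stride equals the element size. */
static struct AttribLayout sequential_normal(size_t vertex_count)
{
    struct AttribLayout layout;
    layout.offset = vertex_count * 3 * sizeof(float);
    layout.stride = 3 * sizeof(float);
    return layout;
}
```

These values correspond to the stride and pointer arguments of glNormalPointer; for a tightly packed array a stride of 0 is also accepted.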
These two targets should be set up so that the element arrays are available 
          at the same time as array buffers in glDraw[Range]Elements(). 
          The targets enable users to switch among various element buffers while 
          keeping the same vertex array buffer. This can be used to implement 
          LOD and other effects by changing the elements table while working on 
          the same database of vertices. 
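
As a rough illustration of the LOD idea, the sketch below keeps one index table per level of detail and picks a table by level while the vertex data stays untouched. The IndexTable type and select_lod function are assumptions made for this example; in a VBO setup each table would live in its own ELEMENT_ARRAY_BUFFER.

```c
#include <stddef.h>

/* One index table per level of detail. All tables refer to the
 * same vertex array, so switching levels never touches vertex data. */
struct IndexTable {
    const unsigned short *indices;
    size_t count;
};

/* Returns the table for the requested level, clamping to the
 * coarsest level available. Level 0 is the most detailed.
 * Returns NULL when no tables are supplied. */
static const struct IndexTable *select_lod(const struct IndexTable *tables,
                                           size_t table_count, size_t level)
{
    if (table_count == 0)
        return NULL;
    if (level >= table_count)
        level = table_count - 1;
    return &tables[level];
}
```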
        New procedures, functions and tokens 
        Usage flags 
        
          - STREAM_DRAW 
- STREAM_READ 
- STREAM_COPY 
- STATIC_DRAW 
- STATIC_READ 
- STATIC_COPY 
- DYNAMIC_DRAW 
- DYNAMIC_READ 
- DYNAMIC_COPY 
Access flags 
        
          - READ_ONLY 
- WRITE_ONLY 
- READ_WRITE 
Targets 
        
          - ARRAY_BUFFER 
- ELEMENT_ARRAY_BUFFER 
 void  BindBuffer (enum target, uint 
          buffer); 
        The BindBuffer function is used to bind a buffer ID as the 
          actual buffer to use. It switches off the use of buffers if the ID is 
          zero. 
         void  *MapBuffer (enum target, enum 
          access); 
          boolean  UnmapBuffer (enum target); 
        The function MapBuffer provides a pointer corresponding to 
          the mapped area of the current buffer object. UnmapBuffer releases 
          the mapping. 
         void  BufferData (enum target, sizeiptr 
          size, const void *data, enum usage); 
        The BufferData function can be used two ways: 
        
          - To set up the memory amount and usage for the current buffer object 
            with data set to NULL. The user can map the buffer later to set up 
            its data. 
- To allocate memory, set the usage, and copy data; typically used 
            when dealing with a static memory model. 
 void  BufferSubData (enum target, 
          intptr offset, sizeiptr size, const void *data); 
        The BufferSubData function copies data in a specific range 
          inside the buffer object. 
         void  GetBufferSubData (enum target, 
          intptr offset, sizeiptr size, void *data); 
        The GetBufferSubData function retrieves sub-data from a specific 
          range in the current buffer object. 
         void  DeleteBuffers (sizei n, const 
          uint *buffers); 
          void  GenBuffers (sizei n, uint *buffers); 
          boolean  IsBuffer (uint buffer); 
        These three functions work like their display list and texture counterparts: 
          they allocate, free, or query identifiers for buffer objects. 
         void  GetBufferParameteriv (enum target, 
          enum pname, int *params); 
        The GetBufferParameteriv function returns various parameters 
          concerning the current buffer object. Pname can be: 
        
          - BUFFER_SIZE: Returns the size of the buffer object. 
- BUFFER_USAGE: Returns the usage of the buffer object. 
- BUFFER_ACCESS: Returns the access flag of the buffer object. 
- BUFFER_MAPPED: Indicates if this buffer is mapped. 
 void  GetBufferPointerv (enum target, 
          enum pname, void **params); 
        The GetBufferPointerv function returns the actual pointer 
          of the buffer if it has been mapped (MapBuffer). Pname can 
          only be BUFFER_MAP_POINTER at this time. 
        Tokens for Get{Boolean, Integer, Float, Double}v 
        The buffer object ID zero is reserved, and when buffer object zero 
          is bound to a given target, the commands affected by that buffer binding 
          behave normally. When a nonzero buffer ID is bound, then the pointer 
          represents an offset, and will go through VBO management. 
        These tokens show which buffers are bound as VBO offsets: 
        
          - ARRAY_BUFFER_BINDING 
- ELEMENT_ARRAY_BUFFER_BINDING 
- VERTEX_ARRAY_BUFFER_BINDING 
- NORMAL_ARRAY_BUFFER_BINDING 
- COLOR_ARRAY_BUFFER_BINDING 
- INDEX_ARRAY_BUFFER_BINDING 
- TEXTURE_COORD_ARRAY_BUFFER_BINDING 
- EDGE_FLAG_ARRAY_BUFFER_BINDING 
- SECONDARY_COLOR_ARRAY_BUFFER_BINDING 
- FOG_COORDINATE_ARRAY_BUFFER_BINDING 
- WEIGHT_ARRAY_BUFFER_BINDING 
 Token for GetVertexAttribiv: 
        When working with VBOs and vertex programs, some attributes can have 
          arbitrary meanings. An array of normals, for example, can be used to 
          store other information. Instead of using a token from the previous 
          section, the index of the attribute can be used. This token allows the 
          user to query, by attribute index, which buffer object is bound to 
          that attribute array. 
        
          - VERTEX_ATTRIB_ARRAY_BUFFER_BINDING 
        Purposes of various VBO functions 
         glBufferData() 
         This function is an abstraction layer between the memory and the application. 
          Behind each buffer object is a complex memory management system. The 
          glBufferData() function looks at the size and type of the data 
          store, reserves storage, and optionally initializes the data from the 
          user’s pointer. If storage space was previously allocated for 
          this buffer, an individual implementation may choose to either reuse 
          the previous storage or discard the current storage and allocate a new 
          storage. If the data pointer specified is not NULL, the storage for 
          the buffer is initialized with size machine units (typically 
          bytes) from the data pointer. For specifics on when memory associated 
          with the buffer is freed instead of resized, please consult documentation 
          from individual GPU vendors. 
        Usage flags 
         The usage argument is a key value for helping the VBO memory manager 
          fully optimize buffers. While these values are only hints, and they 
          can be ignored by the implementation, applications are strongly encouraged 
          to provide correct usage flags. Additional implementation-specific information 
          on the interpretation of hints may be available from GPU vendors. 
        
           
            | Name of flag | Definition |
            | STATIC_... | Assumed to be a 1-to-n update-to-draw. The data is specified once, or possibly very rarely, and drawn many times. |
            | DYNAMIC_... | Assumed to be an n-to-n update-to-draw. The data is updated frequently but is drawn multiple times per update, such as dynamic data that is updated every few frames or so. |
            | STREAM_... | Assumed to be a 1-to-1 update-to-draw. The data is updated about once each time it’s drawn. As with DYNAMIC, the data is expected to change frequently. |
            | ..._READ | There must be easy access to read the data back. This option is typically not meaningful for VBOs by themselves. |
            | ..._COPY | Both _READ and _DRAW operations will be used on this buffer. This option is typically not meaningful for VBOs by themselves. |
            | ..._DRAW | The buffer will be used for sending data to the GPU. |
        
        
        Table 1: List of usage flags 
         These combinations of usage flags can help an implementation’s 
          memory manager balance between different kinds of memory, such as system, 
          uncached and video. Since different categories of memory have different 
          access characteristics for the CPU and GPU, these usage hints allow 
          the proper selection to occur. On the client side, these are not hard 
          restrictions, but suggestions that help graphics drivers decide where 
          to store the data and how to manage it. Nothing prevents creating a 
          STATIC data store and then updating it every frame. Nor is 
          there any reason the user can’t create a STREAM data 
          store that is never modified, although usage patterns that conflict 
          with the supplied hints are strongly discouraged. 
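
One way to read Table 1 is as a classification by update-to-draw ratio. The sketch below is a rough heuristic under that reading, not a rule from the OpenGL specification: the enum, the function names and the exact thresholds are assumptions made for illustration.

```c
/* Frequency half of a usage flag, following Table 1. */
enum UsageFrequency { USAGE_STATIC, USAGE_DYNAMIC, USAGE_STREAM };

/* Classify a buffer by how often it is respecified (updates) versus
 * how often it is drawn from (draws) over some observation window.
 * The thresholds are illustrative assumptions. */
static enum UsageFrequency classify_usage(unsigned long updates,
                                          unsigned long draws)
{
    if (updates <= 1)
        return USAGE_STATIC;   /* 1-to-n: specified once, drawn many times */
    if (draws <= updates)
        return USAGE_STREAM;   /* about 1-to-1: respecified for each draw  */
    return USAGE_DYNAMIC;      /* n-to-n: several draws per update         */
}

/* Name of the matching ..._DRAW token, for logging or diagnostics. */
static const char *usage_token(enum UsageFrequency frequency)
{
    switch (frequency) {
    case USAGE_DYNAMIC: return "DYNAMIC_DRAW";
    case USAGE_STREAM:  return "STREAM_DRAW";
    case USAGE_STATIC:
    default:            return "STATIC_DRAW";
    }
}
```

Because the flags are only hints, a misclassification affects performance rather than correctness.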
         glBufferSubData() 
        This function gives the user a way to replace a range of data in an 
          existing buffer. It works in much the same way as glTexSubImage2D() 
          does for textures. An individual implementation may either interlock 
          or queue the update to ensure that all previous rendering operations 
          from this buffer have completed. 
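
The range passed to glBufferSubData() must lie entirely inside the existing data store; a negative offset or size, or a range that runs past the end of the buffer, is rejected as an error. A minimal sketch of that check, using a hypothetical helper that an application could call before issuing the update:

```c
#include <stddef.h>

/* Returns 1 when the byte range [offset, offset + size) lies inside
 * a data store of buffer_size bytes, and 0 otherwise. The subtraction
 * form avoids overflow in offset + size. */
static int sub_range_is_valid(ptrdiff_t buffer_size,
                              ptrdiff_t offset, ptrdiff_t size)
{
    if (buffer_size < 0 || offset < 0 || size < 0)
        return 0;
    if (offset > buffer_size)
        return 0;
    return size <= buffer_size - offset;
}
```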
         glBindBuffer() 
        This sets the current buffer object. All subsequent calls to set array 
          pointers will refer to this object, and all updates will occur to this 
          buffer. Binding the special buffer name zero tells the driver not 
          to use buffer objects. 
         glMapBuffer() 
        This function maps the buffer object into the client’s memory, 
          if it is possible. The pointer returned can be both read from and written 
          to directly by the CPU, allowing arbitrary updates. To maintain the 
          proper OpenGL semantics, where operations always appear to occur in 
          order, the implementation may be required to either stall or make a 
          copy of the buffer to allow the mapping to occur, if the buffer is still 
          in use by the GPU. When the buffer cannot be mapped, the implementation 
          will return a NULL pointer. 
         glUnmapBuffer() 
        This function unmaps the buffer object from the client’s memory. 
          It returns a success code that the application should check to ensure 
          the update occurred correctly. When a failure is reported, the contents 
          of the buffer may have become undefined due to an extraordinary event 
          occurring while the buffer was mapped. In this case, the data should 
          be resubmitted by the application. 
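
The resubmission logic can be sketched as a small retry loop. To keep the example runnable without an OpenGL context, it uses a stand-in FakeBuffer type with map_buffer() and unmap_buffer() functions; all three are hypothetical and only mimic glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY) and glUnmapBuffer(GL_ARRAY_BUFFER), which real code would call instead.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a buffer object (assumption for this sketch only). */
struct FakeBuffer {
    unsigned char storage[64];
    int failures_left;  /* how many unmaps should report failure */
    int unmap_calls;    /* number of unmap attempts so far       */
};

/* Mimics glMapBuffer: returns a pointer to the data store. */
static void *map_buffer(struct FakeBuffer *buffer)
{
    return buffer->storage;
}

/* Mimics glUnmapBuffer: returns 1 on success, 0 when the contents
 * became undefined while the buffer was mapped. */
static int unmap_buffer(struct FakeBuffer *buffer)
{
    buffer->unmap_calls++;
    if (buffer->failures_left > 0) {
        buffer->failures_left--;
        memset(buffer->storage, 0, sizeof buffer->storage);
        return 0;
    }
    return 1;
}

/* Copies size bytes into the buffer and resubmits the data whenever
 * unmap reports failure. Returns 1 on success, 0 if mapping fails,
 * the data does not fit, or max_attempts is exhausted. */
static int upload_with_retry(struct FakeBuffer *buffer, const void *data,
                             size_t size, int max_attempts)
{
    int attempt;

    if (size > sizeof buffer->storage)
        return 0;
    for (attempt = 0; attempt < max_attempts; attempt++) {
        void *mapped = map_buffer(buffer);
        if (mapped == NULL)
            return 0;
        memcpy(mapped, data, size);
        if (unmap_buffer(buffer))
            return 1;
    }
    return 0;
}
```

Bounding the number of attempts is a design choice of this sketch; it prevents an endless loop if the failure condition persists.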
        glVertexPointer() 
         This function sets up the offset (originally a pointer), depending 
          on the current buffer object. 
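
While a buffer object is bound, the last argument of glVertexPointer() carries a byte offset even though its type is still a pointer. A common way to express this is a small conversion helper; the name buffer_offset below is an assumption for illustration and is not part of OpenGL.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts a byte offset into the pointer-typed value expected by
 * the gl*Pointer functions while a buffer object is bound. The
 * result is never dereferenced by the application. */
static const void *buffer_offset(size_t byte_offset)
{
    return (const void *)(uintptr_t)byte_offset;
}
```

With a buffer bound, a call such as glVertexPointer(3, GL_FLOAT, 0, buffer_offset(0)) then refers to the start of that buffer rather than to client memory.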
        Suggestions for efficient VBO usage 
        
         
          Keep in mind that the driver cannot guess what to do with the memory 
            pointer returned by glMapBuffer(). Will a few bytes be changed, 
            or will the whole buffer be updated? The pointer returned by glMapBuffer() 
            refers to the actual location of the data. It is possible that the 
            GPU could be working with this data, so requesting it for an update 
            will force the driver to wait for the GPU to finish its task. 
          To avoid this stall, glBufferData() can be called with 
            a NULL pointer to discard the previous buffer before mapping, or the 
            glBufferSubData() function can be used instead to specify the exact 
            subregion. Calling glBufferData() with a NULL pointer tells the driver 
            that the previous data is no longer valid. As a consequence, if the 
            GPU is still working on the data, there will not be a conflict, and 
            the implementation may allocate a new buffer. A subsequent glMapBuffer() 
            call may then return a pointer to this new storage, which can be 
            filled while the GPU is working on the previous set of data. In the 
            glBufferSubData() case, the data must be updated in a contiguous 
            block. No reading of the data is allowed, so the implementation may 
            be able to queue the update. 
          
        
        
         
          While vertex buffer objects offer great potential in the efficiency 
            of providing data to the GPU, they are often highly inefficient when 
            coupled with operations that require CPU processing. As a result, 
            feedback and selection may not perform well when combined with vertex 
            buffer objects. Additionally, building display lists from data in 
            a vertex buffer object or using glArrayElement() with vertex 
            buffer objects will typically be highly inefficient. 
        
        
          -    Utilize GPU-friendly data types and alignment 
            
 
          With vertex buffer objects, it is now the job of the GPU to directly 
            interpret the data, whereas the CPU could previously reformat it as 
            needed during submission. If the data format that is placed in a vertex 
            buffer object cannot be directly handled by the GPU, the implementation 
            may have to read the data back to the CPU for processing, which is 
            often highly inefficient. It is best to check with GPU vendors for 
            the full list of optimal formats, but most common data types are presently 
            supported, as long as the attribute is aligned on a 32-bit boundary. 
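
Checking or enforcing the 32-bit alignment mentioned above is simple arithmetic on byte offsets and strides. A minimal sketch, with hypothetical helper names:

```c
#include <stddef.h>

/* Returns 1 when a byte offset or stride is a multiple of 4 bytes
 * (32 bits), 0 otherwise. */
static int is_aligned_4(size_t bytes)
{
    return (bytes & 3u) == 0;
}

/* Rounds a byte count up to the next multiple of 4, for example to
 * pad a 3-byte color attribute to 4 bytes. */
static size_t align_up_4(size_t bytes)
{
    return (bytes + 3u) & ~(size_t)3u;
}
```

Applying align_up_4 to each attribute size when computing offsets keeps every attribute on a 32-bit boundary at the cost of a few padding bytes.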
          
        
        
         
          In the function: 
          glDrawArrays (GLenum mode, GLint first, GLsizei count);
           Instead of changing glVertexPointer() to a specific offset 
            and leaving “first” at 0, it can be more efficient 
            to change the “first” argument of glDrawArrays. 
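
Both approaches address the same bytes, which is why the cheaper one can be substituted. The sketch below shows the arithmetic, assuming a non-zero stride in bytes; the function name is hypothetical.

```c
#include <stddef.h>

/* Byte offset inside the bound buffer of the i-th vertex fetched by
 * a glDrawArrays call, given the offset passed to glVertexPointer,
 * the stride in bytes, and the "first" argument. */
static size_t fetched_byte_offset(size_t pointer_offset, size_t stride,
                                  size_t first, size_t i)
{
    return pointer_offset + (first + i) * stride;
}
```

Drawing with a pointer offset of 0 and first = 100 reads the same data as drawing with a pointer offset of 100 * stride and first = 0, but only the second variant requires another glVertexPointer() call.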
          
        
        
          -  Use glDrawRangeElements instead of glDrawElements 
 
          Using range elements is more efficient for two reasons: 
          
            - If the specified range can fit into a 16-bit integer, the driver 
              can optimize the format of indices to pass to the GPU. It can turn 
              a 32-bit integer format into a 16-bit integer format. In this case, 
              the index data sent to the GPU is halved, which can substantially 
              improve performance. 
- The range is valuable information for the VBO manager, which can 
              use it to optimize its internal memory configuration.
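
The start and end values passed to glDrawRangeElements are the smallest and largest indices used by the draw call. The sketch below computes them for a 32-bit index array and reports whether every index would fit in 16 bits. The type and function names are assumptions for illustration, and "fits in 16 bits" is interpreted here as the largest index not exceeding 65535.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest and largest index referenced by a draw call. */
struct IndexRange {
    uint32_t start;
    uint32_t end;
};

/* Scans the index array once. An empty array yields {0, 0}. */
static struct IndexRange index_range(const uint32_t *indices, size_t count)
{
    struct IndexRange range = {0, 0};
    size_t i;

    if (count == 0)
        return range;
    range.start = range.end = indices[0];
    for (i = 1; i < count; i++) {
        if (indices[i] < range.start)
            range.start = indices[i];
        if (indices[i] > range.end)
            range.end = indices[i];
    }
    return range;
}

/* Returns 1 when every index in the range fits a 16-bit unsigned value. */
static int fits_in_16_bits(struct IndexRange range)
{
    return range.end <= UINT16_MAX;
}
```

For static geometry the range can be computed once at load time and stored alongside the index data.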
        Implementing VBOs within the SPECviewperf test harness 
        As mentioned in the introduction, the goal of SPECviewperf is to test 
          graphics hardware by delivering OpenGL command streams taken from real 
          applications. As a performance evaluation tool, SPECviewperf has to 
          be able to use VBOs in a wide variety of ways to reflect application 
          usage. 
        The usage patterns of many of the applications covered with the current 
          SPECviewperf viewsets would typically use a static data model for VBOs, 
          where the data is defined once and drawn many times. Because of this, 
          the GL_STATIC_DRAW usage hint is the default. As applications 
          adopt and use VBOs, SPECviewperf can easily accommodate different usage 
          patterns. 
        The current implementation of VBOs within SPECviewperf doesn’t 
          transfer data into or out of buffers through mapping, so the glMapBuffer and 
          glUnmapBuffer calls are not made. As VBOs become adopted and 
          implemented within applications, it is expected that SPECviewperf will 
          be modified accordingly. 
        Here are the key places where VBOs are implemented within SPECviewperf: 
        
        Step 1 - Create pointers, allocate memory, generate buffer 
          handles, and define buffer attributes (viewperf.c): 
        
         …… 
          mode.useVertexBufferObjects = 0; 
          mode.vboUsageMode = GL_STATIC_DRAW; 
          mode.vboMaxSize = 0; 
          mode.vboMaxPrims = 0; 
          …… 
          unsigned int currVboID; 

          int vtxSize = numVertsInVBO * sizeof(struct vector); 
          int colSize = numVertsInVBO * sizeof(struct colorvector); 
          int nmlSize = numVertsInVBO * sizeof(struct vector); 
          int texSize = numVertsInVBO * sizeof(struct texvector); 
          int vtxOffs = 0; 
          int colOffs = vtxOffs + vtxSize; 
          int nmlOffs = colOffs + colSize; 
          int texOffs = nmlOffs + nmlSize; 
          int vboSize = texOffs + texSize; 
          int vtxDelta = prevVertexPointer - vertexData; 
        
          if (numVertexBufferObjects >= allocVertexBufferObjects) { 
              allocVertexBufferObjects = 2 * allocVertexBufferObjects + 16; 
              vertexBufferObjects = realloc(vertexBufferObjects, 
                                            allocVertexBufferObjects * sizeof(GLuint)); 
              if (!vertexBufferObjects) { 
                  printf("Error: could not allocate memory for vertexBufferObjects\n"); 
                  exit(0); 
              } 
          } 
          glGenBuffers(1, (GLuint *) &currVboID); 
          vertexBufferObjects[numVertexBufferObjects++] = currVboID; 
          glBindBuffer(GL_ARRAY_BUFFER, currVboID); 
          glBufferData(GL_ARRAY_BUFFER, vboSize, NULL, mode.vboUsageMode); 
          glBufferSubData(GL_ARRAY_BUFFER, vtxOffs, vtxSize, prevVertexPointer); 
          glBufferSubData(GL_ARRAY_BUFFER, colOffs, colSize, prevColorPointer); 
          glBufferSubData(GL_ARRAY_BUFFER, nmlOffs, nmlSize, prevNormalPointer); 
          glBufferSubData(GL_ARRAY_BUFFER, texOffs, texSize, prevTexturePointer); 
        
          for (i = prevdb; i <= db; i++) { 
              pDataBlock[i].vertexIndex -= vtxDelta; 
              pDataBlock[i].vertexBufferID = currVboID; 
              pDataBlock[i].texCoordOffset = texOffs; 
              pDataBlock[i].normalOffset = nmlOffs; 
              pDataBlock[i].vertexOffset = vtxOffs; 
          } 
          pevent->rb->vertexBufferObjects = vertexBufferObjects; 
          pevent->rb->numVertexBufferObjects = numVertexBufferObjects; 
          …… 
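
The array of buffer IDs in Step 1 grows by the rule 2 * n + 16. The sketch below shows the same growth rule in isolation, written so that the original array stays valid if the allocation fails. It is an illustrative variant, not the SPECviewperf code; the function name grow_id_array is an assumption.

```c
#include <stdlib.h>

/* Grows an array of buffer IDs using the 2 * n + 16 rule.
 * On success, updates *ids and *capacity and returns 1.
 * On failure, leaves both untouched and returns 0, so the caller
 * still owns the original allocation. */
static int grow_id_array(unsigned int **ids, size_t *capacity)
{
    size_t new_capacity = 2 * *capacity + 16;
    unsigned int *grown = realloc(*ids, new_capacity * sizeof **ids);

    if (grown == NULL)
        return 0;
    *ids = grown;
    *capacity = new_capacity;
    return 1;
}
```

Starting from an empty array, successive calls give capacities of 16, 48, 112 and so on, so the number of reallocations stays logarithmic in the number of buffer objects.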
         Step 2 - Within the draw loop bind current buffer and set 
          appropriate pointers (viewperf.c): 
         …… 
          if (pDb->vertexBufferID != currVertexBufferID) { 
              glBindBuffer(GL_ARRAY_BUFFER, pDb->vertexBufferID); 
              if (pDb->colorOffset >= 0) { 
                  glColorPointer(4, GL_FLOAT, 0, (const GLvoid *) pDb->colorOffset); 
              } 
              if (pDb->normalOffset >= 0) { 
                  glNormalPointer(GL_FLOAT, 0, (const GLvoid *) pDb->normalOffset); 
              } 
              if (pDb->texCoordOffset >= 0) { 
                  glTexCoordPointer(2, GL_FLOAT, 0, (const GLvoid *) pDb->texCoordOffset); 
              } 
              if (pDb->vertexOffset >= 0) { 
                  glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *) pDb->vertexOffset); 
              } 
              currVertexBufferID = pDb->vertexBufferID; 
          } 
          (void) pPrimitiveLoop(tb, pDb, pDb + 1); 
          pDb++; 
          …… 
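
The draw loop in Step 2 rebinds and resets the array pointers only when the buffer ID changes from one data block to the next. The effect of that filter can be sketched by counting how many binds a given sequence of data blocks would trigger. The function below is a hypothetical illustration and assumes that no buffer object (ID 0) is bound before the loop starts.

```c
#include <stddef.h>

/* Counts the glBindBuffer calls that the "bind only on change"
 * pattern issues for a sequence of per-block buffer IDs. */
static size_t count_binds(const unsigned int *buffer_ids, size_t count)
{
    unsigned int current = 0;  /* 0 means no buffer object bound */
    size_t binds = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (buffer_ids[i] != current) {
            current = buffer_ids[i];
            binds++;
        }
    }
    return binds;
}
```

Data blocks that share a buffer and are drawn consecutively therefore cost a single bind, which is one reason to pack many primitives into each VBO.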
        Step 3 - Upon exit, free VBO buffers (viewperf.c):
         glBindBuffer(GL_ARRAY_BUFFER, 0); 
          glDeleteBuffers(renderblock.numVertexBufferObjects, renderblock.vertexBufferObjects); 
          
          free(renderblock.vertexBufferObjects); 
          renderblock.numVertexBufferObjects = 0; 
          renderblock.vertexBufferObjects = NULL; 
        Preparing for the future
        SPEC’s OpenGL Performance Characterization (SPECopc) project 
          group, the developers of SPECviewperf, expect VBOs to be an integral 
          part of the rendering path for future graphics-intensive applications. 
          VBOs have been added to SPECviewperf 8.1 to enable users and vendors 
          to begin testing performance for graphics applications that will potentially 
          use VBOs. No performance results using VBOs will be published on the 
          SPEC/GPC web site until VBOs become a part of applications represented 
          by viewsets within SPECviewperf. 
        This document was written by Ian Williams of NVIDIA (SPECopc chair) 
          and Evan Hart of ATI.