Modern-er OpenGL

Best Practices for Modern OpenGL

January 2022

Introduction
Direct State Access (DSA)
Textures

Texture Binding
Texture Creation
Texture Views
Sampler Objects

Smarter Input Assembly
Compute Shaders for Postprocessing
Debugging

OpenGL Debug Callback
Graphics Debugging Tools

Future Work

Addressing Type Safety
Addressing Global State

More Info
Addendum

Introduction

It is well known that pre-modern OpenGL (basically anything before 4.2) has problems with global state and poor separation of concerns, which makes it more difficult to write bug-free code. Moreover, the global state makes pre-modern OpenGL code difficult to audit for correctness, forcing the programmer to constantly consider the nebulous scope of API commands.

Sometimes, pre-modern OpenGL does not offer a clean, idiomatic way to approach a task, so we must write code that feels hacky or suboptimal simply because there isn't a way to precisely express what we want to do (this often goes hand-in-hand with OpenGL frequently conflating functionality).

Modern OpenGL (and I mean modern, not the 12-year-old OpenGL 3.3 that tutorials call modern) solves or mitigates many of these issues. This guide is meant to describe that functionality, along with techniques that can further help in writing robust graphics code with OpenGL.

Direct State Access (DSA)

Anyone learning OpenGL should immediately notice the extra indirection required to modify objects (also known as bind-to-edit). This model is confusing, error-prone, and outdated.

OpenGL 4.5 (actually, GL_ARB_direct_state_access) furthered society by allowing us to directly specify the name of the object we want to modify (using new functions). Here is an example of how we would update some code that gets a mapped pointer from a buffer:

// yuck!
glBindBuffer(GL_ARRAY_BUFFER, myBuffer);
void* ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE);

// yum!
void* ptr = glMapNamedBuffer(myBuffer, GL_READ_WRITE);

fendevel's modern OpenGL guide shows what all the DSA functions are and how to use them (hint: it's much more straightforward than before).

With ubiquitous DSA usage, calls to glBindBuffer and glBindTexture can be entirely removed from within one's code. Calls to glBindFramebuffer and glBindVertexArray will no longer be needed at load time, but are still necessary at runtime to set rendering state.

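Note that because of OpenGL's quirky create-on-bind model, you will have problems using DSA if you create objects with any of the glGen* functions: those only reserve a name, and the object itself does not exist until that name is first bound, so passing it to a DSA function before then is an error. Instead, you should use the glCreate* family of functions, which create objects initialized with default state instead of merely generating a name.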

Textures

Textures, like much of OpenGL, are the product of decisions that made sense long ago. Those decisions led to an API that, these days, is unergonomic at best, and error-prone at worst. Luckily, modern OpenGL offers a variety of ways to increase our sanity when using them.

Texture Binding

This may look familiar:

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, myTex);

In OpenGL 4.5+, we can replace it with the following:

glBindTextureUnit(0, myTex);

Exquisite.

Additionally, samplers and other opaque shader types can be given explicit binding points (as of OpenGL 4.2). The following declaration matches the binding that was made above:

layout(binding = 0) uniform sampler2D myTex;

I suggest using this feature if you want to avoid calling glUniform1i to set sampler binding points.

Texture Creation

Preferring immutable storage (glTexStorage*, available since OpenGL 4.2, and its DSA counterpart glTextureStorage*) offers a number of benefits. First, immutable storage requires us to specify all of the memory the texture will use up front. This reduces driver work and rules out strange configurations (different internal formats per mip, inconsistent mip sizes, unsized internal formats). Separating the allocation from the upload of the texture also makes code easier to read and write. We can transform our old code like so:

// old, crusty
GLuint myTex;
glGenTextures(1, &myTex);
glBindTexture(GL_TEXTURE_2D, myTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

// new, shiny
GLuint myTex;
glCreateTextures(GL_TEXTURE_2D, 1, &myTex);
glTextureStorage2D(myTex, 1, GL_RGBA8, width, height); // allocate all storage (1 mip level) once
glTextureSubImage2D(myTex, 0, 0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels); // upload level 0 (format, then type)
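Because immutable storage fixes the number of mip levels at allocation time, code that wants a full mip chain (for example, to fill it later with glGenerateTextureMipmap) has to compute the level count before calling glTextureStorage2D. A minimal sketch; MipLevelCount is a hypothetical helper, and it simply evaluates floor(log2(max(width, height))) + 1 with integer arithmetic:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of levels in a full mip chain of a 2D texture,
// i.e. floor(log2(max(width, height))) + 1. Intended as the `levels` argument
// of glTextureStorage2D.
int MipLevelCount(int width, int height)
{
  assert(width > 0 && height > 0);
  int largest = std::max(width, height);
  int levels = 1;
  while (largest > 1)
  {
    largest >>= 1; // each level halves the size, rounding down
    ++levels;
  }
  return levels;
}
```

With it, the allocation becomes glTextureStorage2D(myTex, MipLevelCount(width, height), GL_RGBA8, width, height); after uploading level 0, glGenerateTextureMipmap(myTex) fills in the rest.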

Texture Views

Available since OpenGL 4.3, texture views allow us to reinterpret the memory of another texture in some way. A few things texture views can achieve are:

Treating a single face of a cubemap as a 2D texture
Treating a single slice of an array texture as its own (non-array) texture
Treating a single mip of a texture as its own texture
Treating a texture as another texture with a different but compatible internal format (one in the same view class, e.g. GL_RGBA8 viewed as GL_SRGB8_ALPHA8)

This can be particularly useful for cases where we are finely manipulating textures, like in a bloom downsampling pass, where we may want to treat individual mip levels of a texture as unique 2D textures.

It should be noted that views of textures can only be made if the texture's memory is immutable (see the previous subsection). Here is an example of making a 2D view of the 50th slice of a 2D array texture:

GLuint myTex;
glCreateTextures(GL_TEXTURE_2D_ARRAY, 1, &myTex);
glTextureStorage3D(myTex, 1, GL_R32F, 128, 128, 128);

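GLuint myTexView;
// glGenTextures, not glCreateTextures: glTextureView requires a name that has not yet been bound or given a target
glGenTextures(1, &myTexView);
glTextureView(myTexView, GL_TEXTURE_2D, myTex, GL_R32F, 0, 1, 49, 1); // minlevel 0, 1 level, minlayer 49 (the 50th slice), 1 layer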

I prefer to use texture views over raw textures as a means of unifying my texture abstraction.

Note that texture views are quite cheap to create as they don't require a device memory allocation.
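For the bloom case mentioned above, each level of a mipmapped 2D texture gets its own single-level view. The sketch below only computes the arguments one would pass to glTextureView (minlevel, numlevels, minlayer, numlayers) plus each view's size, so the logic can be checked without a GL context. MipView and BuildMipViews are hypothetical names, and the GL_RGBA16F format in the comment is an assumed example:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one single-mip view of a 2D texture.
// The first four fields mirror glTextureView's minlevel/numlevels/minlayer/numlayers.
struct MipView
{
  unsigned minLevel;
  unsigned numLevels;
  unsigned minLayer;
  unsigned numLayers;
  int width;
  int height;
};

// One view per mip level. In GL code, each entry would become roughly:
//   GLuint view; glGenTextures(1, &view);
//   glTextureView(view, GL_TEXTURE_2D, tex, GL_RGBA16F, // assumed format
//                 v.minLevel, v.numLevels, v.minLayer, v.numLayers);
std::vector<MipView> BuildMipViews(int baseWidth, int baseHeight, unsigned levels)
{
  std::vector<MipView> views;
  views.reserve(levels);
  for (unsigned level = 0; level < levels; ++level)
  {
    MipView v{};
    v.minLevel = level;
    v.numLevels = 1; // exactly one mip, so the view acts like a plain 2D texture
    v.minLayer = 0;
    v.numLayers = 1; // a non-array texture has a single layer
    v.width = std::max(1, baseWidth >> level);
    v.height = std::max(1, baseHeight >> level);
    views.push_back(v);
  }
  return views;
}
```

A downsampling pass can then read from view i and write to view i + 1 without touching GL_TEXTURE_BASE_LEVEL or GL_TEXTURE_MAX_LEVEL on the parent texture.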

Sampler Objects

Introduced in OpenGL 3.3, sampler objects aren't a very new feature. However, tutorials rarely seem to use them despite the advantages they bring.

Sampler objects, as one might suspect from the name, describe sampler state. This includes the following:

Min, mag, and mipmap filtering modes
Anisotropic filtering level
LOD range
LOD bias
Comparison operator (for shadow samplers)
Wrap mode
Border color
Seamless cubemap filtering mode (with ARB_seamless_cubemap_per_texture)

Note that while seamless cubemap filtering is on this list, it can also be forced globally with glEnable(GL_TEXTURE_CUBE_MAP_SEAMLESS).

Sampler objects are bound to texture units. While a sampler object is bound to a unit, its state overrides the sampling parameters stored in whatever texture is bound to that unit.

glBindTextureUnit(index, textureID);
glBindSampler(index, samplerID);

Unless you have a very weird renderer, your code will benefit from textures being sampler-agnostic. Despite this, it's a common idiom in OpenGL to create a texture and immediately set some common sampler state like in the following:

GLuint myTex;
glGenTextures(1, &myTex);
glBindTexture(GL_TEXTURE_2D, myTex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);

This is an issue because it conflates a texture with how it will be used. If we want to sample the same texture with two different sets of sampler state in the same frame, we must remember to set all the affected state in each place. With sampler objects, you just create one for each pass and bind it before drawing; no runtime sampler state changes are required!

The other issue is that giving each texture its own sampler state leads to code duplication and inflexibility. If we want to change a sampler parameter (such as anisotropic filtering) for objects rendered in a specific pass, each affected texture must be modified. With sampler objects, all you have to do is modify the ones used in that pass. Without them, if you want multiple passes with unique sampler parameters (which you may very well want in your engine), you have to set the sampler state of every texture used in a pass before drawing.

Sampler objects allow us to write simpler, more readable, and more pure code (and thus fewer bugs):

GLuint samplerFoo;
GLuint samplerBar;
GLuint myTex;

void InitFoo()
{
  glCreateSamplers(1, &samplerFoo);
  glSamplerParameteri(samplerFoo, ..., ...);
}

void InitBar()
{
  glCreateSamplers(1, &samplerBar);
  glSamplerParameteri(samplerBar, ..., ...);
}

void RenderFoo()
{
  glBindSampler(0, samplerFoo);
  glBindTextureUnit(0, myTex);
  // Draw some stuff
}

void RenderBar()
{
  glBindSampler(0, samplerBar);
  glBindTextureUnit(0, myTex);
  // Draw some other stuff
}

The only potential for error is forgetting to unbind a sampler when mixing discrete sampler objects with textures' built-in sampler state: a sampler left bound to a unit keeps overriding whatever texture is bound there next. Therefore, when introducing sampler objects, I would suggest replacing all instances of glBindTextureUnit (and glActiveTexture + glBindTexture) with something similar to the following:

void BindTextureSampler(GLuint unit, GLuint texture, GLuint sampler)
{
  glBindTextureUnit(unit, texture);
  glBindSampler(unit, sampler);
}

This enforces the use of samplers and makes it more difficult to accidentally bind a texture without a sampler (or vice versa).

Note that samplers are also very cheap to create and destroy, at least on the Nvidia and AMD drivers I tested. If you're unsure, you can always cache samplers in a hash map, since you won't need more than a few.
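A sampler cache along those lines might look like the following sketch. SamplerState, SamplerCache, and the creation callback are hypothetical names; the callback stands in for glCreateSamplers plus the glSamplerParameter* calls, which keeps the caching logic testable without a GL context. Only a few fields are included here; a real key would cover every parameter in the list above:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical subset of sampler state, holding raw GLenum/GLfloat-style values.
struct SamplerState
{
  unsigned minFilter = 0;
  unsigned magFilter = 0;
  unsigned wrapS = 0;
  unsigned wrapT = 0;
  float maxAnisotropy = 1.0f;

  bool operator==(const SamplerState& o) const
  {
    return minFilter == o.minFilter && magFilter == o.magFilter &&
           wrapS == o.wrapS && wrapT == o.wrapT && maxAnisotropy == o.maxAnisotropy;
  }
};

struct SamplerStateHash
{
  std::size_t operator()(const SamplerState& s) const
  {
    // Simple hash combine; plenty for a handful of entries.
    std::size_t h = 0;
    auto mix = [&h](std::size_t v) { h ^= v + 0x9e3779b9 + (h << 6) + (h >> 2); };
    mix(std::hash<unsigned>{}(s.minFilter));
    mix(std::hash<unsigned>{}(s.magFilter));
    mix(std::hash<unsigned>{}(s.wrapS));
    mix(std::hash<unsigned>{}(s.wrapT));
    mix(std::hash<float>{}(s.maxAnisotropy));
    return h;
  }
};

class SamplerCache
{
public:
  // `create` builds a sampler for a state and returns its name
  // (in GL code: glCreateSamplers followed by glSamplerParameter* calls).
  explicit SamplerCache(std::function<unsigned(const SamplerState&)> create)
    : create_(std::move(create)) {}

  // Returns the cached sampler for `state`, creating it on first request.
  unsigned Get(const SamplerState& state)
  {
    auto it = cache_.find(state);
    if (it != cache_.end())
      return it->second;
    unsigned sampler = create_(state);
    cache_.emplace(state, sampler);
    return sampler;
  }

  std::size_t Size() const { return cache_.size(); }

private:
  std::function<unsigned(const SamplerState&)> create_;
  std::unordered_map<SamplerState, unsigned, SamplerStateHash> cache_;
};
```

Render code then asks for a sampler by description (e.g. cache.Get(trilinearRepeat)) and passes the result to glBindSampler, so identical states share one object no matter how many passes request them.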

Smarter Input Assembly

VAOs are a major source of confusion among beginners. Not only that, they also don't fit cleanly into a renderer. We often wish to draw several meshes with the same shader and vertex format. However, pre-4.3 OpenGL only offers the following ways to reasonably accomplish this:

Have one VAO per object or mesh and bind it when drawing that object
Have one VAO that is fully respecified each time we wish to draw another mesh
Have a single VAO and vertex+index buffer per vertex attribute layout, stuff all meshes into those, and draw with glDrawElementsBaseVertex

Fortunately, OpenGL 4.3 (GL_ARB_vertex_attrib_binding) adds a way to separate the vertex attribute format from its buffer binding. This eliminates a source of error and makes the code easier to read. It also allows us to have a single VAO per vertex layout (or shader), and the initialization code for that VAO doesn't need to reference any buffer at all! With DSA, the format is described once with glVertexArrayAttribFormat and glVertexArrayAttribBinding, and at runtime we choose which buffer(s) to use with glVertexArrayVertexBuffer and glVertexArrayElementBuffer.
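Because the format no longer mentions a buffer, a vertex layout can be described purely as data. The sketch below computes the relative offset of each attribute in an interleaved vertex and the stride the buffer binding needs. It assumes float-only attributes and a single binding index 0; Attribute, VertexLayout, and MakeInterleavedLayout are hypothetical names. In GL code, each attribute would feed glEnableVertexArrayAttrib, glVertexArrayAttribFormat(vao, location, components, GL_FLOAT, GL_FALSE, relativeOffset), and glVertexArrayAttribBinding(vao, location, 0), while the stride goes to glVertexArrayVertexBuffer at draw time:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical float-only attribute description (e.g. a position is 3 floats).
struct Attribute
{
  unsigned location;            // shader attribute location
  int components;               // 1 to 4 floats
  unsigned relativeOffset = 0;  // filled in by MakeInterleavedLayout
};

struct VertexLayout
{
  std::vector<Attribute> attributes;
  unsigned stride = 0;          // bytes between consecutive vertices
};

// Packs attributes back to back in one interleaved vertex (binding index 0).
VertexLayout MakeInterleavedLayout(std::vector<Attribute> attributes)
{
  VertexLayout layout;
  unsigned offset = 0;
  for (Attribute& a : attributes)
  {
    assert(a.components >= 1 && a.components <= 4);
    a.relativeOffset = offset;
    offset += static_cast<unsigned>(a.components * sizeof(float));
  }
  layout.attributes = std::move(attributes);
  layout.stride = offset;
  return layout;
}
```

For position (3 floats), normal (3 floats), and UV (2 floats), the offsets come out as 0, 12, and 24 bytes with a 32-byte stride. One VAO built from such a layout can then serve every mesh that shares it; only the per-mesh buffer bindings change.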

I will once again shill fendevel's sweet modern OpenGL guide, which shows these functions in action.

Compute Shaders for Postprocessing

Typically, full screen passes and other passes that operate on the pixels of a texture are implemented by rasterizing a full screen quad or triangle, then doing the algorithmic work inside a fragment shader. Before such passes, you need to make sure the OpenGL state can facilitate this: set the depth function to GL_ALWAYS (or disable depth testing), disable depth writes, bind a VAO and a program made of a simple vertex shader and the fragment shader you care about, bind a framebuffer with the target texture attached, and set the viewport. That isn't even all the state that can cause your simple quad to not appear correctly!

Beginning in OpenGL 4.3, compute shaders (due to their general-purpose nature) allow us to cleanly express "do some operation on each pixel of a texture", which is a frequent need in post-processing pipelines. On the host (CPU) side, we just need to do the following unique steps before dispatching (executing) the shader:

Calculate the number of work groups we want to dispatch
Issue a memory barrier after the dispatch to make writes visible to future passes

Everything else (binding the program, specifying input and output) proceeds as usual. We can now write a function that is more or less isolated from global state:

void ApplyEffect(GLuint sourceTexture, GLuint targetTexture, GLuint sourceSampler, 
                 GLuint computeProgram, int texWidth, int texHeight)
{
  glUseProgram(computeProgram);
  
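  // Shader input (u_texDim has no explicit location in the shader, so query it; this could be cached at load time)
  glProgramUniform2i(computeProgram, glGetUniformLocation(computeProgram, "u_texDim"), texWidth, texHeight);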
  glBindSampler(0, sourceSampler);
  glBindTextureUnit(0, sourceTexture);

  // Shader output
  glBindImageTextures(0, 1, &targetTexture); // this function is in OpenGL 4.4, use glBindImageTexture in older versions

  // Hard-coded local_size, could be reflected from the shader at load time
  const int local_size = 16;
  const int numGroupsX = (texWidth + local_size - 1) / local_size;
  const int numGroupsY = (texHeight + local_size - 1) / local_size;
  glDispatchCompute(numGroupsX, numGroupsY, 1);

  // Issue overkill barrier to ensure all writes from this dispatch are visible to every potential consumer
  // Ideally, you would use as few barrier bits as possible and put it closer to where the data is actually consumed
  glMemoryBarrier(GL_ALL_BARRIER_BITS);
}
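The group-count arithmetic in ApplyEffect is a ceiling division: enough groups to cover every pixel, with the last row or column of groups possibly hanging over the edge of the texture (which is why the shader checks bounds). Pulled out as a hypothetical helper, it is easy to verify in isolation:

```cpp
#include <cassert>

// Number of work groups needed to cover `size` items with groups of `localSize`.
// Equivalent to ceil(size / localSize), using only integer arithmetic.
int NumGroups(int size, int localSize)
{
  assert(size >= 0 && localSize > 0);
  return (size + localSize - 1) / localSize;
}
```

For a 1920x1080 target and 16x16 groups this gives a 120x68 dispatch. The last row of groups covers rows 1072 to 1087, so the invocations for rows 1080 to 1087 hit the early return in the shader.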

On the GPU side, there are a few more things we need to do given that we can't use automatic interpolation to assign each pixel a UV. Pay close attention to the comments.

#version 440 core

// Input
layout(binding = 0) uniform sampler2D s_source;
uniform ivec2 u_texDim;

// Output
layout(binding = 0) uniform writeonly image2D i_target;

layout(local_size_x = 16, local_size_y = 16) in; // hardcoded-but-sane work group size
void main()
{
  // Get global ID and return if out of bounds (we can only dispatch threads at work group granularity)
  ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
  if (any(greaterThanEqual(gid, u_texDim)))
    return;

  // Calculate the UV of the center of a pixel using this thread's global ID
  vec2 uv = (vec2(gid) + 0.5) / u_texDim;

  // The "meat" of the shader (where the effect is implemented)
  vec4 mySample = textureLod(s_source, uv, 0);
  vec4 finalColor = // do some operation on mySample

  imageStore(i_target, gid, finalColor);
}
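The UV line is the part most worth double-checking, since the + 0.5 moves the coordinate from a pixel's corner to its center. A CPU-side transcription of the same formula (PixelCenterUV and Vec2 are hypothetical names) makes the values concrete:

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// CPU transcription of `(vec2(gid) + 0.5) / vec2(u_texDim)` from the shader above.
Vec2 PixelCenterUV(int gx, int gy, int width, int height)
{
  assert(gx >= 0 && gx < width && gy >= 0 && gy < height);
  return { (gx + 0.5f) / width, (gy + 0.5f) / height };
}
```

For a texture 4 pixels wide, the centers land at 0.125, 0.375, 0.625, and 0.875, so a bilinear textureLod at those coordinates returns each texel without blending in its neighbors.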

Compute shaders enable lower-level optimizations, but describing them would take a series of its own. The Khronos OpenGL wiki has a nice reference for using compute shaders in OpenGL. For GPGPU techniques and info on GPU hardware, there are plenty of online resources; I would recommend the Learn section of GPUOpen for starters. A great crash course in compute shaders and low-level information is available on YouTube.

One disadvantage of using compute shaders is that we can no longer reliably use non-Lod, non-Grad texture sampling functions. Why? According to the GLSL spec, implicit derivatives are undefined in non-fragment stages. In other words, it's extremely easy to invoke undefined behavior when using a function as basic as texture in a compute shader. For this reason, all texture fetches should be ones with "Lod" or "Grad" in the name (textureLod should cover most of your bases). texelFetch and imageLoad are unaffected as they retrieve texels without filtering.
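If an effect does need the mip level that implicit derivatives would have chosen, for example a pass that reads a larger source into a smaller target, one option is to compute it on the host and pass it to textureLod as a uniform. When every target pixel covers the same axis-aligned footprint of the source, the scale factor in the GL's LOD formula reduces to the source-to-target size ratio. The sketch below (ExplicitLod is a hypothetical name) ignores LOD bias, LOD clamping, and anisotropy:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Approximates the base LOD that implicit derivatives would select when a
// dstW x dstH target samples an srcW x srcH source across its full extent:
// lambda = log2(max(srcW / dstW, srcH / dstH)). Negative values mean magnification.
float ExplicitLod(int srcW, int srcH, int dstW, int dstH)
{
  assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
  float scaleX = static_cast<float>(srcW) / dstW; // source texels per target pixel in x
  float scaleY = static_cast<float>(srcH) / dstH; // source texels per target pixel in y
  return std::log2(std::max(scaleX, scaleY));
}
```

Downsampling a 1024x1024 source into a 256x256 target gives an LOD of 2, i.e. sampling the 256x256 mip, which is what a fragment shader running the same pass would typically pick.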

Something you may have noticed in the compute shader example is a notion of "work groups" and "local size" or "work group size". These concepts relate to how work is batched in a compute dispatch. I won't get into them here as they are explained better in the links above, but it's important to understand these values and how to choose good ones for different scenarios.

Debugging

OpenGL Debug Callback

This doesn't need a lengthy explanation, as other guides have covered it extensively. I will once again link fendevel's guide to show you how to use glDebugMessageCallback (core since OpenGL 4.3). It far surpasses GL_CHECK macros and anything involving glGetError in usability and helpfulness.
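For reference, a debug callback usually filters by severity and forwards the message to a log. The sketch below keeps that logic in a plain function so it runs without a context. FormatDebugMessage and the kSeverity* constants are hypothetical names; the constant values are the GL_DEBUG_SEVERITY_* values from KHR_debug. A real callback must match GLDEBUGPROC (GLenum source, GLenum type, GLuint id, GLenum severity, GLsizei length, const GLchar* message, const void* userParam, with the APIENTRY calling convention) and is registered with glDebugMessageCallback(callback, nullptr) after glEnable(GL_DEBUG_OUTPUT):

```cpp
#include <cassert>
#include <string>

// Same values as GL_DEBUG_SEVERITY_* in the GL headers (KHR_debug / OpenGL 4.3).
constexpr unsigned kSeverityHigh         = 0x9146; // GL_DEBUG_SEVERITY_HIGH
constexpr unsigned kSeverityMedium       = 0x9147; // GL_DEBUG_SEVERITY_MEDIUM
constexpr unsigned kSeverityLow          = 0x9148; // GL_DEBUG_SEVERITY_LOW
constexpr unsigned kSeverityNotification = 0x826B; // GL_DEBUG_SEVERITY_NOTIFICATION

// Returns the line to log, or an empty string if the message should be ignored.
std::string FormatDebugMessage(unsigned id, unsigned severity, const std::string& message)
{
  const char* label = nullptr;
  switch (severity)
  {
  case kSeverityHigh:         label = "HIGH"; break;
  case kSeverityMedium:       label = "MEDIUM"; break;
  case kSeverityLow:          label = "LOW"; break;
  case kSeverityNotification: return {}; // informational driver chatter, skip it
  default:                    label = "UNKNOWN"; break;
  }
  return std::string("[GL ") + label + "] (" + std::to_string(id) + ") " + message;
}
```

Inside the real callback, one would call FormatDebugMessage(id, severity, std::string(message, length)) and print the result when it is non-empty. Enabling GL_DEBUG_OUTPUT_SYNCHRONOUS as well makes the callback run inside the offending GL call, so a breakpoint there shows the culprit on the call stack.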

Graphics Debugging Tools

These aren't a part of OpenGL, but I will nonetheless briefly cover them, as they are important enough.

Modern frame debugging tools give us a comprehensive view of our application's graphics API state and all of our API resources at any point during a frame. This can be useful any time we are wondering why something isn't rendering correctly (which is often, to be frank).

There are a few premier tools to choose from depending on your hardware and usage of OpenGL:

RenderDoc: a simple, yet powerful cross-platform debugger for core OpenGL
Nsight Graphics: a powerful debugger and profiler for Nvidia GPUs. Not as easy to use as RenderDoc, but supports some OpenGL extensions as well as profiling
Intel GPA: admittedly, I know very little about this tool. It's a cross-platform graphics debugger with profiling capabilities, but I assume profiling is restricted to Intel iGPUs

Note: a common issue beginners encounter with graphics debuggers is that the debugger launches the application with a different working directory than the programmer expects. This often happens because IDEs (particularly Visual Studio) set the working directory to the project directory, so relative paths that work from the IDE break under the debugger and the application fails when it is unable to load certain files (like shaders). All of these debuggers have an option next to the application path for specifying the working directory.
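One cheap safeguard is to report the absolute path whenever a file fails to open; a path that unexpectedly points into the wrong directory makes this mistake obvious. A sketch using C++17 std::filesystem, where LoadTextFile is a hypothetical helper:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <optional>
#include <sstream>
#include <string>

// Reads a whole text file (e.g. a shader). On failure, prints the path resolved
// against the current working directory, so a wrong working directory stands out.
std::optional<std::string> LoadTextFile(const std::filesystem::path& path)
{
  assert(!path.empty());
  std::ifstream file(path);
  if (!file)
  {
    std::cerr << "Failed to open " << std::filesystem::absolute(path)
              << " (working directory: " << std::filesystem::current_path() << ")\n";
    return std::nullopt;
  }
  std::stringstream contents;
  contents << file.rdbuf();
  return contents.str();
}
```

When a shader fails to load only under the debugger, the printed working directory immediately shows whether the launch settings are the problem.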

If you aren't already, start using at least one graphics debugger. It will save you from many hours of staring at code and wondering why it isn't working. It's no different than CPU debugging in that regard.

Future Work

Although modern OpenGL addresses many of the issues its previous iterations had, there are still some "features" that can cause problems for the average user.

Unsigned integers for objects (poor type safety)
Global pipeline state
Global binds

It's good to be aware of these so we can build better abstractions and write safer code.

Addressing Type Safety

A simple and effective way to address the lack of type safety is to make type wrappers. Even a simple struct containing an unsigned int will stop the majority of cases where object types are confused.

struct Texture { GLuint id; };
struct Buffer { GLuint id; };

void RenderMesh(Texture tex, Buffer buf) { ... }

But why stop at simple type wrappers when we have expressive languages? DSA functions can be trivially abstracted as class methods. These classes can also be used to automatically clean up API objects at the end of their lifetime, given copy and move semantics are carefully considered.
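One way to get lifetime management right is a move-only owner: copying is deleted so two wrappers can never delete the same name, and moving transfers ownership and leaves the source empty. In the sketch below the deleter is a plain function pointer, so for a texture it would be a function that calls glDeleteTextures(1, &id). UniqueObject, the tag types, and the aliases are hypothetical names chosen for illustration:

```cpp
#include <cassert>
#include <utility>

using ObjectId = unsigned;              // stands in for GLuint
using DeleteFn = void (*)(ObjectId id); // e.g. a function calling glDeleteTextures(1, &id)

// Move-only owner of an API object name. Tag keeps object kinds distinct types,
// like the plain structs above; the deleter releases the name on destruction.
template <typename Tag>
class UniqueObject
{
public:
  UniqueObject(ObjectId id, DeleteFn deleter) : id_(id), deleter_(deleter)
  {
    assert(deleter_ != nullptr);
  }
  ~UniqueObject() { Reset(); }

  UniqueObject(const UniqueObject&) = delete; // copies would cause double deletes
  UniqueObject& operator=(const UniqueObject&) = delete;

  UniqueObject(UniqueObject&& other) noexcept
    : id_(std::exchange(other.id_, 0)), deleter_(other.deleter_) {}

  UniqueObject& operator=(UniqueObject&& other) noexcept
  {
    if (this != &other)
    {
      Reset();                           // release what we currently own
      id_ = std::exchange(other.id_, 0); // take ownership; source becomes empty
      deleter_ = other.deleter_;
    }
    return *this;
  }

  ObjectId Id() const { return id_; }

  void Reset()
  {
    if (id_ != 0) // 0 is reserved and never returned by glCreate*
      deleter_(id_);
    id_ = 0;
  }

private:
  ObjectId id_;
  DeleteFn deleter_;
};

struct TextureTag {};
struct BufferTag {};
using UniqueTexture = UniqueObject<TextureTag>;
using UniqueBuffer = UniqueObject<BufferTag>;
```

Because UniqueTexture and UniqueBuffer are distinct types, passing one where the other is expected still fails to compile, so the type-safety benefit of the simple structs is kept.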

Addressing Global State

Addressing the global state is a much more difficult problem. Essentially, we want to constrain certain operations (glDraw* calls in particular) so that they can only be called in scopes that we have explicitly defined. More concretely, this means we need to create something similar to Vulkan's VkPipeline to hold pipeline state, and vkCmdBeginRenderPass and vkCmdEndRenderPass to specify a scope in which we wish to use that state. Then, we need to require certain commands (draw commands in particular) to take place inside of that scope.

Here is a draft of how using an API that implements this paradigm might look:

// Encapsulates the whole graphics pipeline: vertex input state, shaders, blend state, and more
GraphicsPipeline graphicsPipeline = ...;

// Describes render targets and the viewport(s)
RenderPassInfo renderPass = ...;
BeginRenderPass(renderPass);
  BindGraphicsPipeline(graphicsPipeline);
  BindVertexBuffer(...);
  DrawElements(...); // Okay, we are inside a render pass and a pipeline has been bound
EndRenderPass();

DrawElements(...);   // Assert! We are outside a render pass

If the wrapper is correctly constructed, it should be impossible to leak state. It will also become much easier and more efficient to deduplicate state as it can only be set in a few places (calls to BeginRenderPass and Bind*Pipeline).
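The checks in that draft boil down to a small state machine. The sketch below is hypothetical and written only for illustration; it tracks the two rules from the example: draws must happen inside a render pass, and only after a pipeline has been bound within that pass. It returns false instead of asserting so the rules can be exercised directly; a real wrapper would assert and would issue the actual GL calls where the comments indicate:

```cpp
#include <cassert>

// Hypothetical validation layer mirroring the draft API above.
class CommandValidator
{
public:
  bool BeginRenderPass()
  {
    if (inRenderPass_) return false; // render passes cannot nest
    inRenderPass_ = true;
    pipelineBound_ = false;          // no pipeline state leaks in from a previous pass
    // real code: bind the framebuffer, set the viewport(s), clear
    return true;
  }

  bool BindGraphicsPipeline()
  {
    if (!inRenderPass_) return false;
    pipelineBound_ = true;           // real code: apply program, blend, depth, VAO, ...
    return true;
  }

  bool Draw() const
  {
    // real code: issue glDrawElements(...) only when both checks pass
    return inRenderPass_ && pipelineBound_;
  }

  bool EndRenderPass()
  {
    if (!inRenderPass_) return false;
    inRenderPass_ = false;
    pipelineBound_ = false;
    return true;
  }

private:
  bool inRenderPass_ = false;
  bool pipelineBound_ = false;
};
```

Resetting pipelineBound_ at both ends of a pass is the design choice that enforces "no leaked state": every pass has to bind its own pipeline before it can draw.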

More Info

The history of OpenGL is great for browsing and discovering new features.

Addendum

In the time since this post was written, I have implemented this vision in my new library Fwog. There are several examples which you can peruse (at your leisure) to see code that follows these guidelines closely.
