Junior Graphics Engineer Interview Questions

Junior graphics engineer interviews require a lot more domain-specific knowledge than your typical junior software engineer interviews. I've failed many interviews because I didn't fully understand common graphics interview topics. So I made this page to include some common interview topics I encountered during my interview process. I'm not going to go super in depth on all topics on this page, but I will link to other resources that do go in depth. Not all these topics will come up, but it's good to have familiarity with all of the topics. If you're interested in my experience applying for graphics jobs, you can read about that here.

Math

Powers of 2

Yes, really. I've been directly asked about them in at least 2¹ interviews.

I just know that 2⁴ = 16, 2⁸ = 256, and that I can multiply or divide by 2 to get whatever specific power of 2 I get asked to compute.
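As a small sketch of the doubling trick (the helper name `powerOfTwo` is mine, not from any library), a power of 2 is also just a 1 shifted left:

```cpp
#include <cstdint>

// Hypothetical helper: returns 2^exponent for exponents 0..63.
constexpr std::uint64_t powerOfTwo(unsigned exponent) {
    return std::uint64_t{1} << exponent;
}

// 2^10 is 2^8 doubled twice: 256 -> 512 -> 1024.
static_assert(powerOfTwo(8) == 256, "2^8 should be 256");
static_assert(powerOfTwo(10) == powerOfTwo(8) * 2 * 2, "2^10 is 2^8 doubled twice");
```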

Powers of 2 (Wikipedia)

Hexadecimal

Like binary, but base 16 instead of base 2. The 16 digits are 0-9 and A-F.
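Here's a minimal sketch of converting a number to hexadecimal by repeatedly dividing by 16. The function name `toHex` is just something I made up for illustration.

```cpp
#include <string>

// Hypothetical helper: converts a non-negative integer to a hex string.
std::string toHex(unsigned value) {
    const char digits[] = "0123456789ABCDEF";
    if (value == 0) {
        return "0";
    }
    std::string result;
    while (value > 0) {
        // The remainder mod 16 is the next digit, least significant first,
        // so each new digit goes at the front of the string.
        result.insert(result.begin(), digits[value % 16]);
        value /= 16;
    }
    return result;
}
```

For example, 255 is 15 * 16 + 15, so it comes out as FF.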

Hexadecimal (Wikipedia)

Two's Complement

To create a negative number in two's complement, determine the binary representation of the positive number, flip the bits, and add 1.
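The flip-and-add-1 rule can be sketched on an 8-bit value like this (`twosComplementNegate` is a hypothetical helper name, and 8 bits is just my choice to keep the numbers small):

```cpp
#include <cstdint>

// Hypothetical helper: negates an 8-bit value by flipping the bits and adding 1.
std::uint8_t twosComplementNegate(std::uint8_t value) {
    // ~value flips every bit; the cast keeps only the low 8 bits of the result.
    return static_cast<std::uint8_t>(~value + 1);
}
```

5 is 00000101, flipping gives 11111010, and adding 1 gives 11111011 (0xFB), which is how -5 is stored in 8 bits.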

Two's complement (Wikipedia)

Matrices

Rotation matrix properties

Each column vector is normalized
Dot product of 2 unique column vectors is 0
All column vectors are orthonormal to each other, form an orthonormal basis
Inverse matrix is the transpose
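To see the "inverse is the transpose" property in action, here's a sketch that builds a rotation about the Z axis and checks that transpose(R) * R is the identity. The `Mat3` alias and the helper names are my own, and I'm assuming the column-vector convention.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;  // indexed [row][column]

// Rotation about the Z axis by `radians` (column-vector convention).
Mat3 rotationZ(double radians) {
    const double c = std::cos(radians);
    const double s = std::sin(radians);
    Mat3 m{};
    m[0] = {c, -s, 0.0};
    m[1] = {s, c, 0.0};
    m[2] = {0.0, 0.0, 1.0};
    return m;
}

Mat3 transpose(const Mat3& m) {
    Mat3 t{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            t[c][r] = m[r][c];
    return t;
}

Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 out{};  // zero-initialized
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            for (int k = 0; k < 3; ++k)
                out[r][c] += a[r][k] * b[k][c];
    return out;
}

// True if m matches the identity matrix within a small tolerance.
bool isIdentity(const Mat3& m, double eps = 1e-9) {
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            if (std::fabs(m[r][c] - (r == c ? 1.0 : 0.0)) > eps)
                return false;
    return true;
}
```

Calling `isIdentity(multiply(transpose(r), r))` on any `r = rotationZ(angle)` should return true, which is the same thing as saying the columns are orthonormal.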

Model matrix is model -> world space

View matrix is world -> camera space

Projection matrix is camera -> clip space, and applies perspective to the view frustum

Multiplication order matters. Typically we apply scale first, then rotation, then translation (with column vectors, that is M = T * R * S).

Rotation matrix (Wikipedia)
Tutorial 3 : Matrices (OpenGL-tutorial)

Ray Intersections

A ray is defined as x = dt + o, where x is position, d is direction, t is time/distance, o is origin. d is normalized.

Sphere

r² = dot(x - p, x - p)

r is the radius of the sphere
p is the center

r² = dot(dt + o - p, dt + o - p)
r² = dot(dt + R, dt + R)

R = o - p

r² = d² * t² + 2dRt + R²
0 = d² * t² + 2d(o - p)t + (o - p)² - r²
0 = At² + Bt + C

A = dot(d, d)
B = 2 * dot(d, o - p)
C = dot(o - p, o - p) - r²

Use the quadratic formula to solve for real solutions of t.

Two solutions: two intersection points on the sphere, usually take the closer point
One solution: ray is tangent to the sphere
Zero solutions: no intersection point
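The derivation above can be sketched in code roughly like this. The `Vec3` struct, the function name `intersectSphere`, and the choice to return the hit through an out parameter are my own assumptions for the example.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator-(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Ray: x = d*t + o. Sphere: center p, radius r.
// On a hit, writes the smallest non-negative t to tHit and returns true.
bool intersectSphere(const Vec3& o, const Vec3& d,
                     const Vec3& p, double r, double& tHit) {
    const Vec3 R = o - p;
    const double A = dot(d, d);
    const double B = 2.0 * dot(d, R);
    const double C = dot(R, R) - r * r;

    const double discriminant = B * B - 4.0 * A * C;
    if (discriminant < 0.0) {
        return false;  // zero real solutions: no intersection
    }

    const double root = std::sqrt(discriminant);
    const double tNear = (-B - root) / (2.0 * A);
    const double tFar = (-B + root) / (2.0 * A);

    if (tNear >= 0.0) { tHit = tNear; return true; }  // closer point, in front of the origin
    if (tFar >= 0.0) { tHit = tFar; return true; }    // origin is inside the sphere
    return false;                                     // sphere is entirely behind the ray
}
```

For a ray starting at the origin pointing down +Z and a unit sphere centered at (0, 0, 5), this gives t = 4, the near side of the sphere.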

Plane

0 = dot(x - p, n)

p is a point on the plane
n is the plane's normal

0 = dot(dt + o - p, n)
0 = dot(d, n) * t + dot(o - p, n)
-dot(o - p, n) / dot(d, n) = t

t is negative: the ray is facing away from the plane
dot(d, n) = 0: ray and plane are parallel
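Here's the plane case as a sketch, using the same kind of hypothetical `Vec3` helpers. The epsilon used for the parallel check is an arbitrary value I picked.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator-(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Ray: x = d*t + o. Plane: point p, normal n.
// On a hit in front of the ray origin, writes t to tHit and returns true.
bool intersectPlane(const Vec3& o, const Vec3& d,
                    const Vec3& p, const Vec3& n, double& tHit) {
    const double denom = dot(d, n);
    if (std::fabs(denom) < 1e-9) {
        return false;  // dot(d, n) = 0: ray and plane are parallel
    }
    const double t = -dot(o - p, n) / denom;
    if (t < 0.0) {
        return false;  // t is negative: the ray is facing away from the plane
    }
    tHit = t;
    return true;
}
```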

Ray-Sphere Intersection (Scratchapixel)
Ray-triangle intersection (Brian Curless)

Dot/Cross products

Dot product

Related to the angle between two vectors
dot(a, b) = length(a) * length(b) * cos(theta)
For normalized vectors, the range of dot product is -1 to 1

dot = 1: parallel facing same way (0 degrees)
dot = 0: normal/orthogonal (90 degrees)
dot = -1: parallel, facing opposite (180 degrees)
The dot product between two parallel vectors is -1 or 1. (Most people just say 1 apparently)

Dot product is used in the rendering equation, BRDFs, phong shading, determining angle between two vectors.
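A quick sketch of recovering the angle from the dot product, by rearranging dot(a, b) = length(a) * length(b) * cos(theta). The `Vec3` struct and `angleBetween` name are just for this example.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

double length(const Vec3& v) {
    return std::sqrt(dot(v, v));
}

// Angle in radians between two non-zero vectors.
double angleBetween(const Vec3& a, const Vec3& b) {
    double cosTheta = dot(a, b) / (length(a) * length(b));
    // Clamp to guard against rounding pushing the value just outside [-1, 1].
    cosTheta = std::max(-1.0, std::min(1.0, cosTheta));
    return std::acos(cosTheta);
}
```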

Cross product

AxB = vector that is orthogonal to A and B.
AxB = -(BxA) = (-B)xA
A.(AxB) = B.(AxB) = 0
length(AxB) = area of the parallelogram created by A and B
Cross product is used to generate a camera basis, or to create a normal vector for a plane from 3 points.
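As a sketch of the "normal from 3 points" use case (the names `cross` and `planeNormal` are mine, and I'm assuming the three points are not collinear):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator-(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Unit normal of the plane through p0, p1, p2 (assumed not collinear).
// The direction follows the winding order p0 -> p1 -> p2.
Vec3 planeNormal(const Vec3& p0, const Vec3& p1, const Vec3& p2) {
    const Vec3 n = cross(p1 - p0, p2 - p0);
    const double len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return {n.x / len, n.y / len, n.z / len};
}
```

Swapping two of the points flips the normal, which matches AxB = -(BxA).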

Quaternions

Way to represent rotation with 4D numbers.

Quaternions (W. Randolph Franklin)

Euler Angles

Simple way to represent rotation. Can run into Gimbal lock, better to use quaternions.

Euler Angles (Wolfram)
Gimbal lock (Wikipedia)

Barycentric Coordinates

Coordinate system where you represent a point on a triangle by the weighted sum of the triangle's vertices.

Barycentric Coordinates (Wikipedia)

Rendering Equation

For each input direction wi, calculate the light received (irradiance) by the surface along wi (Li * wi·n), then multiply that value by the fraction of that light that exits the surface (radiance) in the direction of wo (fr). Sum (integrate) these contributions over every wi in the hemisphere. Then, if the surface itself emits light, calculate the amount of radiance that leaves the surface along wo (Le) and add it to the previous sum. This is equal to the total amount of radiance that leaves the surface along wo (Lo).
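Written out, that description corresponds to the usual hemisphere form of the rendering equation. The symbols x (the surface point) and Ω (the hemisphere around the normal n) are notation I'm adding here:

```latex
L_o(x, \omega_o) = L_e(x, \omega_o)
  + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, d\omega_i
```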

Why is irradiance = Li * wi·n? Irradiance is flux per unit area. The radiance (Li) is the flux. To understand how the dot product gives us 'per unit area', let's think about a flashlight. If you hold a flashlight directly above a table, the flashlight will form a circle on the table. As you tilt the flashlight so that it is at a 45 degree angle to the table, the circle will turn into an ellipse. The radiance in these two cases is the same, but the area covered by the light is larger in the 45 degree case, so the irradiance is smaller. This effect is modeled with the dot product term. You can see this effect visually below.

The rendering equation generally cannot be solved analytically, so it has to be approximated. The basic rendering equation does not capture some rendering effects like subsurface scattering, transmission, and volumetric effects. However, the rendering equation can be modified to accommodate these effects, for example by integrating over the whole sphere instead of the hemisphere.

Rendering Equation (Wikipedia)

BXDF

Bidirectional X Distribution function, where X is

Reflectance
Transmission
Scattering
Subsurface Scattering

Takes in two inputs, light direction into a surface (wi), light direction out of the surface (wo). The output of a BXDF is the ratio of radiance to irradiance for the surface. You can also think of it as the amount of light absorbed along wi that is reflected out along wo, or the % chance that a ray coming in along wi will reflect out along wo.

Properties of BXDFs

All outputs of the BXDF are non-negative.
Helmholtz reciprocity, BXDF(wi, wo) = BXDF(wo, wi)
Conserves energy, the integral of the BRDF times wi·n over the hemisphere is at most 1.
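As a sanity check on energy conservation, here's a sketch that numerically integrates a Lambertian BRDF (albedo / pi) times the cosine term over the hemisphere. The result should come out to the albedo, which is at most 1. The function names and the midpoint-rule integration are my own choices for the example.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Lambertian BRDF: constant for every pair of directions.
double lambertianBrdf(double albedo) {
    return albedo / kPi;
}

// Integrates brdf * cos(theta) over the hemisphere using the midpoint rule.
// In spherical coordinates the solid angle element is sin(theta) dtheta dphi,
// and since nothing depends on phi that integral is just 2 * pi.
double hemisphericalReflectance(double albedo, int steps = 1000) {
    const double dTheta = (kPi / 2.0) / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; ++i) {
        const double theta = (i + 0.5) * dTheta;
        sum += lambertianBrdf(albedo) * std::cos(theta) * std::sin(theta) * dTheta;
    }
    return sum * 2.0 * kPi;
}
```

This is also why the Lambertian BRDF has the 1 / pi factor: without it the surface would reflect pi times more energy than it received.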

Types of BXDFs

Lambertian
Disney Diffuse
Blinn-Phong
Cook-Torrance

BRDF (Wikipedia)
Blinn-Phong (Wikipedia)
Cook-Torrance (Wikipedia)
Lambertian (Sakib Saikia)
Disney Diffuse (Brent Burley)

Sampling

In graphics, we have to approximate things, like the rendering equation. We use various sampling techniques to calculate the approximations. We are trying to avoid aliasing/jagged edges in the images we render, sampling helps to do that.

Sampling Techniques (Wadii Bellamine)
Sampling, Aliasing, & Mipmaps (Barb Cutler)

Importance Sampling

The reason that importance sampling is so important is due to the following observation: whether a given sample contributes 50% or 0.1% to the final sum, the time it takes to evaluate the function at the two sample points is the same. Because of this, we should spend more time evaluating areas of the interval that contribute significantly to the final sum, and less time on areas that do not contribute much.
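Here's a toy 1D sketch of the idea. It estimates the integral of f(x) = 3x² over [0, 1] (exact answer 1) two ways: with uniform samples, and with samples drawn from the pdf p(x) = 2x, which puts more samples where f is large. The integrand, the pdf, and the function names are all things I picked for illustration.

```cpp
#include <cmath>
#include <random>

// Integrand: f(x) = 3x^2 on [0, 1], whose exact integral is 1.
double f(double x) {
    return 3.0 * x * x;
}

// Uniform sampling: pdf p(x) = 1, so the estimator is just the average of f.
double estimateUniform(int sampleCount, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < sampleCount; ++i) {
        sum += f(uniform(rng));
    }
    return sum / sampleCount;
}

// Importance sampling: pdf p(x) = 2x. Its CDF is x^2, so inverting the CDF
// gives x = sqrt(u) for uniform u. Each sample is weighted by f(x) / p(x).
double estimateImportance(int sampleCount, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < sampleCount; ++i) {
        const double x = std::sqrt(uniform(rng));
        const double pdf = 2.0 * x;
        if (pdf > 0.0) {  // skip the (very unlikely) x == 0 sample
            sum += f(x) / pdf;
        }
    }
    return sum / sampleCount;
}
```

Both estimators converge to 1, but for the same number of samples the importance-sampled one has lower variance because f(x) / p(x) = 1.5x varies much less than f(x) itself.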

Importance Sampling (Pharr, Jakob, Humphreys)
Intro to Sampling (made by me!)

Graphics Pipeline

Pipeline

Vertex shader

Runs per vertex/point in the mesh. Typically transforms mesh vertices from model space to clip space. Vertex shaders predominantly use the GPU's matrix/vector multiplication units.

Optional Tessellation shader

Used to create subdivided/fine detailed geometry within an input triangle patch.

Optional Geometry shader

Used to create new geometry, like shadow volumes.

Clipping

Remove geometry that is outside of the canonical view volume (the [-1, 1] cube in OpenGL's normalized device coordinates), add new vertices if necessary.

Primitive Assembly

Turn vertices into shapes (triangles, points, lines)

Face culling

Front face/back face culling. If triangles face away from the screen, they are likely behind triangles that face the camera, so we should ignore them. This means triangles have a winding order (CW or CCW).

Rasterization

Takes the triangles and figures out which pixels are covered by the triangles. If doing MSAA, might take multiple samples per pixel to reduce jaggies.

Optional Early depth test

Depth test fragment against previously written depth for the pixel. If current fragment will be behind what is currently there, ignore this fragment to save computation time.

Fragment shader

Take interpolated vertex output data, use it as input to calculate the pixel's color. Typically lighting calculations are done here, and fragment shaders use a lot of texture lookups and ALUs.

depth/stencil test/alpha blending

Write to render target if this fragment will be in front of what is currently there. Same for stencil testing, also need to do alpha blending. Typically, depth and stencil occur before fragment shading, but it's possible to change the depth of a fragment in a fragment shader which would force depth test afterwards.

Rendering Pipeline Overview (Khronos)

TBDR

Tile Based Deferred Rendering. Vertices are binned into tiles, then each tile is rendered separately, which reduces memory bandwidth between pipeline stages.

Harness Apple GPUs with Metal (Apple)

CPU/GPU Architecture

Stack vs Heap

Stack

A sequential block of memory, basically an array.
There is a stack pointer that points to the top of the stack.
Stack allocation is very quick, because adding new memory to the stack just requires incrementing the stack pointer. Deallocation is the same.
Meant for 'temporary variables' that are locally scoped within a function that won't be needed after the function returns.
Compiler can make some optimizations to preallocate all stack memory for a function when it is called.

Heap

A larger group of memory than the stack.
In the heap, the allocator basically has to find a spot that fits the data, which can be a challenge for large data structures.
This can lead to gaps of unused memory, memory fragmentation.
It takes longer to allocate and deallocate memory in the heap because of this.
Data referenced by pointers is typically stored in the heap, pointers are usually on the stack.
Heap is for data that needs to persist beyond the lifetime of a function (pointers).
Heap is bigger than stack.

Stack vs Heap Memory Allocation (GeeksforGeeks)

Cache, Memory, Cache Line

When executing a function, the assembly instructions and data for that function are usually stored sequentially. In order to speed up execution, the CPU will prefetch a section of code instructions and data surrounding the current instruction location and store it in the cache, which is a very small, very fast bit of memory that is close to the CPU. While executing the function, the CPU will check the cache to see if the next instruction/data is in the cache. If it is, there is no need to go back to main memory to fetch the data. However, if it is not in the cache, this is a cache miss, and the data must be fetched from a lower cache level or main memory. The cache line is the amount of data that is read into the cache at once. Since the cache relies on things being sequential, data structures that aren't contiguous can have many cache misses, stuff like graphs, trees, linked lists. Arrays have relatively fewer cache misses, since data is stored sequentially. There are multiple levels of caches, L1 is the fastest.
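A classic way to see this is summing a 2D grid stored in one contiguous array. Both functions below compute the same sum, but the first walks memory sequentially while the second jumps `cols` elements between accesses, which tends to cause more cache misses on large grids. The function names and the row-major layout are assumptions for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Grid is stored row-major: element (r, c) lives at index r * cols + c.

// Visits elements in the order they sit in memory (cache friendly).
long long sumRowMajor(const std::vector<int>& grid, std::size_t rows, std::size_t cols) {
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            sum += grid[r * cols + c];
        }
    }
    return sum;
}

// Visits elements column by column, striding `cols` elements each step.
long long sumColumnMajor(const std::vector<int>& grid, std::size_t rows, std::size_t cols) {
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            sum += grid[r * cols + c];
        }
    }
    return sum;
}
```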

Why software developers should care about CPU caches (EventHelix)
Understanding GPU caches (Rastergrid)

GPU Architecture

Thread - single invocation of a shader

SIMD group/Warp/Wavefront - a group of threads that execute together, commonly 32 (some GPUs use 64). This size is fixed by the GPU. In a SIMD group, all threads are executed in parallel at once on a single GPU core.

Threadgroup/Workgroup/Thread block - A group of threads that are dispatched to a GPU together. The threadgroup size is typically set by the programmer.

Thread masking - GPUs execute SIMD groups in parallel and in lock step. As an example, if there is an add instruction in a shader, the GPU will fetch the inputs to the add instruction all at once, and execute the add instruction all at once, and write the results all at once. This is very efficient, but causes an issue for threads with if statements and loops (divergent threads). This is because some threads need to execute one set of instructions while other threads need to execute another. To solve this, when the threads in a SIMD group diverge, the GPU will execute both branches of the if statement for the whole group. The GPU will use thread masking to ignore the writes to the registers for the threads that are not supposed to be executing the instructions. This is why divergent if statements should generally be avoided in GPU code if possible.
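To make masking concrete, here's a CPU-side emulation in plain C++ (not real GPU code). It pretends a SIMD group has only 4 lanes and runs `if (x is even) x = x * 2; else x = x + 100;` the way a GPU would: both branches run for every lane, and the mask decides which lanes keep the result. The lane count and names are made up for the sketch.

```cpp
#include <array>

constexpr int kSimdWidth = 4;  // tiny on purpose; real SIMD groups are much wider
using Lanes = std::array<int, kSimdWidth>;

// Emulates: if (x % 2 == 0) { x = x * 2; } else { x = x + 100; }
Lanes runDivergentBranch(Lanes x) {
    // Evaluate the condition for every lane to build the mask.
    std::array<bool, kSimdWidth> mask{};
    for (int lane = 0; lane < kSimdWidth; ++lane) {
        mask[lane] = (x[lane] % 2 == 0);
    }

    // "then" branch: every lane computes, only active lanes write.
    for (int lane = 0; lane < kSimdWidth; ++lane) {
        const int result = x[lane] * 2;
        if (mask[lane]) {
            x[lane] = result;
        }
    }

    // "else" branch: same again with the mask inverted.
    for (int lane = 0; lane < kSimdWidth; ++lane) {
        const int result = x[lane] + 100;
        if (!mask[lane]) {
            x[lane] = result;
        }
    }
    return x;
}
```

With inputs 1, 2, 3, 4 the outputs are 101, 4, 103, 8. The point is that the group paid for both branches even though each lane only needed one.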

Command Buffer - Instructions for a GPU draw call are stored in a command buffer. Stuff like pipeline state, buffer bindings, rendering mode (triangles, points, lines). OpenGL/WebGL do not expose command buffers through their APIs, but they are first class citizens in Metal/Vulkan/other modern APIs.

Command Queue - Command buffers are submitted to the command queue, which holds the command buffers for the draw calls.

GPU Fundamentals (Jeff Larkin)
What's up with my branch on GPU? (Anton Schrein)
Command Organization and Execution Model (Apple)

Rendering Techniques

Physically Based Rendering

Physically Based Rendering is a way to realistically compute the way light interacts with materials. PBR uses BRDFs to approximate the rendering equation, and is generally photorealistic. PBR models typically have a diffuse/albedo component, a specular/gloss component, a roughness, a metalness, and emission components. Not all of the components correspond to exact photorealistic parameters, but the parameters are meant to be tweakable by artists.

Physically based rendering (Wikipedia)

Forward Rendering

Typical way that the graphics pipeline works. Have some triangles, put them through the vertex shader, compute lighting in fragment shader, then write to frame buffer. You might use a depth buffer to stop a fragment from being drawn behind a pixel that has already been drawn.

Forward Rendering vs. Deferred Rendering (Brent Owens)

Deferred Rendering

While forward rendering works ok, it does not prevent the case where you compute lighting on a fragment that will later be overwritten by another fragment that is closer to the camera. This means that your lighting calculations for the first fragment are wasted (overdraw), which is inefficient. Deferred rendering tries to solve this by deferring lighting to a second compute pass after the triangles have all been rasterized. In deferred rendering, during the fragment shader, the shader writes the various material components to a G-buffer, which is a group of textures containing diffuse/specular/normal/depth information about the triangles. After rasterization, a second compute pass takes the G-buffer and computes lighting. This way lighting is only calculated on pixels that appear on screen.

Deferred rendering has some drawbacks, it uses significantly more texture memory than forward rendering, it can't handle transparent materials well, and it can be difficult to do MSAA with it.

Deferred Shading (Wikipedia)

Visibility Buffer Rendering

Visibility buffer rendering solves some issues with deferred rendering. In visibility buffer rendering, the fragment shader writes the triangle/primitive id and draw call id to the visibility buffer. This means you only need a depth + visibility buffer, no G-buffer. Then, in the lighting compute pass, the primitive id and draw call id are used to fetch the MVP matrices and material information from a material buffer, which is then used for lighting.

Triangle Visibility Buffer (Wolfgang Engel)

Shadow Mapping

Shadow mapping is a real time shadow algorithm. A depth buffer is rendered from the camera's point of view, and from the light's point of view. Then, pairs of depth values are compared between the images to determine if the pixels should be in light or shadow. Cascaded shadow mapping creates multiple depth buffers from the light's point of view for different depth ranges away from the camera to improve the resolution of the shadows.
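The per-pixel comparison boils down to something like the sketch below. The function name is hypothetical, and the small depth `bias` is my addition: it's a common trick to avoid a surface shadowing itself due to limited depth precision, not something described above.

```cpp
// fragmentDepthFromLight: the fragment's depth as seen from the light.
// shadowMapDepth: the closest depth the light recorded in that direction.
// bias: assumed small offset to reduce self-shadowing artifacts.
bool isInShadow(float fragmentDepthFromLight, float shadowMapDepth, float bias = 0.005f) {
    // If something closer to the light was recorded, this fragment is occluded.
    return fragmentDepthFromLight - bias > shadowMapDepth;
}
```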

Shadow Mapping (Learn OpenGL)
Cascaded Shadow Mapping (Microsoft)

Mipmapping

Mipmapping is the process of creating lower resolution versions of textures. With each mip level, the resolution is cut in half (pixel count cut into a quarter). The full mipmap chain takes up about 33% more memory than the original texture alone. Having mipmaps can reduce Moiré patterns, and can save on texture memory by loading a lower resolution version of a texture if that texture appears far away from the camera.
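Here's a sketch of where the "about 33%" comes from: each level has a quarter of the texels of the previous one, so the total is 1 + 1/4 + 1/16 + ... which approaches 4/3. The helper names are mine.

```cpp
#include <algorithm>
#include <cstdint>

// Number of mip levels, halving each dimension until reaching 1x1.
int mipLevelCount(std::uint32_t width, std::uint32_t height) {
    int levels = 1;
    while (width > 1 || height > 1) {
        width = std::max<std::uint32_t>(width / 2, 1);
        height = std::max<std::uint32_t>(height / 2, 1);
        ++levels;
    }
    return levels;
}

// Total texel count across the whole mip chain, including level 0.
std::uint64_t totalTexels(std::uint32_t width, std::uint32_t height) {
    std::uint64_t total = 0;
    for (;;) {
        total += static_cast<std::uint64_t>(width) * height;
        if (width == 1 && height == 1) {
            break;
        }
        width = std::max<std::uint32_t>(width / 2, 1);
        height = std::max<std::uint32_t>(height / 2, 1);
    }
    return total;
}
```

A 1024x1024 texture has 11 levels and 1,398,101 texels in total, versus 1,048,576 for level 0 alone, which is a ratio of roughly 1.333.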

Mipmap (Wikipedia)

Tone Mapping

A way to approximate HDR content on a low dynamic range screen.
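One of the simplest tone mapping operators is Reinhard, x / (1 + x). I'm using it here only as an example of the idea; real renderers often use fancier curves.

```cpp
// Simple Reinhard operator: maps [0, infinity) into [0, 1).
// hdrValue is assumed to be a non-negative luminance or color channel.
float reinhard(float hdrValue) {
    return hdrValue / (1.0f + hdrValue);
}
```

Dark values pass through almost unchanged (0.01 maps to about 0.0099) while very bright values are compressed toward 1.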

Tone Mapping (Wikipedia)

Bloom

A way to approximate the washed out/glow effect that bright lights cause on camera sensors.

Bloom (Wikipedia)

Ray Tracing

A technique to create images by tracing light paths through a scene.

Ray tracing (Wikipedia)

C++/OS stuff

Virtual Functions

Polymorphic classes are allowed to have virtual functions, which allow the child classes to implement different versions of the function for each object. The program will determine which virtual function to execute for the object at runtime. Classes with virtual functions will get an additional hidden vptr member (increasing the object size) pointing to the vtable. The vtable is essentially a list of the class's virtual function addresses. Pure virtual functions are marked with = 0 and they must be implemented by a subclass before that subclass can be instantiated.
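A minimal sketch of runtime dispatch, with made-up `Shape` classes:

```cpp
// Abstract base class: area() is pure virtual, so Shape can't be instantiated.
struct Shape {
    virtual ~Shape() = default;       // virtual destructor for safe deletion via Shape*
    virtual double area() const = 0;  // pure virtual
};

struct Square : Shape {
    explicit Square(double side) : side(side) {}
    double area() const override { return side * side; }
    double side;
};

struct Rectangle : Shape {
    Rectangle(double width, double height) : width(width), height(height) {}
    double area() const override { return width * height; }
    double width;
    double height;
};

// Only knows about Shape. Which area() runs is decided at runtime
// by looking up the object's vtable.
double areaOf(const Shape& shape) {
    return shape.area();
}
```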

Virtual Functions and Runtime Polymorphism in C++ (GeeksforGeeks)
Virtual Function in C++ (GeeksforGeeks)

Pointer vs Reference

Pointer

Pointer is int*
Can point to a single object or an array of objects
Could be null, could be void*

Reference

Reference is int&
Only points to a single object
Will not be null, will not be void&

Pointer vs Reference (GeeksforGeeks)

Templated Classes

Templated classes allow you to write generic code/container classes. However, each new use of the class needs to be compiled individually, which can increase compile time and binary size. They can also be annoying to debug on old versions of C++.

Templates in C++ (GeeksforGeeks)

Static

A static variable in a class is shared between all objects of that class, they can all access and modify it.

Static Keyword in C++ (GeeksforGeeks)

Size, Padding, Alignment

The size of an object is important, and there is a difference between the padding, alignment, and size of an object. An empty object without a virtual function will have a size of 1 byte. With a virtual function, it is typically the size of a pointer (4 bytes on 32-bit platforms, 8 bytes on 64-bit).
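You can check these with `sizeof` and `alignof`. The structs below are made up for the sketch, and exact sizes depend on the compiler and platform, so the comments describe what's typical rather than guaranteed.

```cpp
#include <cstddef>

struct Empty {};

struct WithVirtual {
    virtual ~WithVirtual() = default;  // adds a hidden vptr
};

struct Padded {
    char tag;   // 1 byte, typically followed by 3 bytes of padding
    int value;  // needs to start at a multiple of alignof(int)
};

// An empty object still needs a unique address (1 byte on mainstream compilers).
static_assert(sizeof(Empty) == 1, "empty struct is 1 byte");
// The vptr makes the object at least pointer sized.
static_assert(sizeof(WithVirtual) >= sizeof(void*), "vptr takes space");
// A struct is aligned to its most strictly aligned member.
static_assert(alignof(Padded) == alignof(int), "aligned like its int member");
// Size is rounded up to a multiple of the alignment (typically 8 here, not 5).
static_assert(sizeof(Padded) % alignof(Padded) == 0, "size is a multiple of alignment");
```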

Structure Member Alignment, Padding and Data Packing (GeeksforGeeks)

Mutex and Semaphore

Used for multithreaded programming and locking to prevent race conditions. Mutexes generally allow one thread to access a resource, while a semaphore can allow multiple threads to access a resource.

Mutex vs Semaphore (GeeksforGeeks)

Paging

Paging allows you to have 'contiguous' memory that is not actually contiguous. This is nice for heap memory, which can become fragmented.

Paging in Operating System (GeeksforGeeks)

Data Structures/Algorithms

DFS/BFS

Depth First Search and Breadth First Search. The DFS/BFS algorithms are pretty simple, and they come up occasionally. The iterative cases are very similar to each other. The iterative DFS is usually better than recursive, since large trees could cause the program to run out of stack memory.

                     
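#include <queue>
#include <stack>
#include <vector>

// Assumed node type: a tree node with any number of children.
struct TreeNode {
    std::vector<TreeNode*> children;
};

// Whatever per-node work is needed; assumed to be defined elsewhere.
void process(TreeNode* node);

// The main difference between dfsIterative and bfsIterative is that one
// function uses a stack while the other uses a queue.
void dfsIterative(TreeNode* root) {
    if (root == nullptr) {
        return;
    }
    std::stack<TreeNode*> s;
    s.push(root);

    while (!s.empty()) {
        // std::stack::pop() returns nothing, so read the top first.
        TreeNode* n = s.top();
        s.pop();
        process(n);
        for (TreeNode* child : n->children) {
            if (child != nullptr) {
                s.push(child);
            }
        }
    }
}

void bfsIterative(TreeNode* root) {
    if (root == nullptr) {
        return;
    }
    std::queue<TreeNode*> q;
    q.push(root);

    while (!q.empty()) {
        // Same pattern as the stack: read the front, then pop it.
        TreeNode* n = q.front();
        q.pop();
        process(n);
        for (TreeNode* child : n->children) {
            if (child != nullptr) {
                q.push(child);
            }
        }
    }
}

// DFS can also be done recursively.
void dfsRecursive(TreeNode* root) {
    if (root == nullptr) {
        return;
    }

    process(root);
    for (TreeNode* child : root->children) {
        dfsRecursive(child);
    }
}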
                

BFS vs DFS for Binary Tree (GeeksforGeeks)

Reverse Linked List

Been asked this question twice, it's pretty simple to do in linear time.

                     
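// Assumed singly linked list node.
struct ListNode {
    int value;
    ListNode* next;
};

ListNode* reverse(ListNode* head) {
    ListNode* previousNode = nullptr;
    ListNode* currentNode = head;

    // Walk the list once, pointing each node back at the one before it.
    while (currentNode != nullptr) {
        ListNode* nextNode = currentNode->next;
        currentNode->next = previousNode;
        previousNode = currentNode;
        currentNode = nextNode;
    }

    // New head of the list (nullptr if the list was empty).
    return previousNode;
}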
                    
                

Reverse a linked list (GeeksforGeeks)

Spatial Data Structures

Used to query spatial data in sub-linear time. It still takes at least linear time to build spatial data structures.

k-d tree - partition data along alternating dimensions

Bounding Volume Hierarchy - Create bounding boxes around objects, then more bounding boxes around groups of objects

Binary Space Partition - Partition data using planes

Octree - tree where each node has 8 children

k-d tree (Wikipedia)
BVH (Wikipedia)
BSP (Wikipedia)
Octree (Wikipedia)

Linked List vs Arrays

Linked List

A chain of nodes that contain pointers to the next and previous nodes in the list.
Very quick to add, remove, and insert elements into the list.
Generally takes linear time to find the nth element in the list.
Won't have the best cache performance, since list nodes aren't necessarily next to each other in memory.

Array

Collection of elements in a contiguous block of memory.
Very quick to index into the array to the nth element.
Can't resize the array without creating a new array and copying data over.
Generally takes linear time to remove an element from an array.
Typically has good cache performance since data is contiguous.

Linked List vs Array (GeeksforGeeks)

Other stuff

Cloth Simulation

Common cloth simulation methods use mass/spring. A bunch of cloth vertices are connected by springs that keep the vertices in place. There are structural springs, which are for adjacent vertices in the face. Shear springs are for non-adjacent (diagonal) vertices in a face. Flexion/bend springs connect vertices that are 'two hops' away.
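Each of those springs applies a force following Hooke's law. Here's a sketch of the force on one end of a spring; the `Vec3` type, the function name, and the lack of damping are simplifications I'm assuming for the example.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator-(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

Vec3 operator*(const Vec3& v, double s) {
    return {v.x * s, v.y * s, v.z * s};
}

// Force applied to the particle at `a` by a spring connecting it to `b`.
// Hooke's law: magnitude = stiffness * (current length - rest length).
Vec3 springForce(const Vec3& a, const Vec3& b, double restLength, double stiffness) {
    const Vec3 delta = b - a;
    const double len = std::sqrt(delta.x * delta.x + delta.y * delta.y + delta.z * delta.z);
    if (len == 0.0) {
        return {0.0, 0.0, 0.0};  // direction is undefined when the points coincide
    }
    const double stretch = len - restLength;
    // Stretched spring pulls a toward b; compressed spring pushes it away.
    return delta * (stiffness * stretch / len);
}
```

The particle at `b` gets the same force with the opposite sign.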

Cloth Simulation (Zhen Wei)
Introduction to Cloth (Blender)

Subdivision Surfaces

Process of smoothing out a surface and adding more geometry. Most common technique is Catmull-Clark. Typically we use half-edge data structures to represent meshes that we want to subdivide.

Subdivision Surface (Wikipedia)
Catmull-Clark (Wikipedia)

Fluid Simulation

Fluid simulation involves solving the Navier-Stokes equations, which model forces like gravity, pressure, viscosity, etc. Two main types of fluid simulation: Eulerian and Lagrangian. Lagrangian is about tracking individual particles in the simulation, while Eulerian is about tracking the flow of particles through a specific location, rather than individual particles. Lagrangian is a particle based approach (mesh free), while Eulerian is a grid/mesh based approach.

Navier-Stokes & Flow Simulation (Barb Cutler)
Eulerian vs Lagrangian (Wikipedia)
Navier-Stokes equations (Wikipedia)

Integration Methods

Explicit Euler, Implicit Euler, and Runge-Kutta (RK4) are common integration methods. Explicit Euler is typically unstable on forces that vary with time/distance/position. RK4 is more stable, but it's more complicated.

Cloth Simulation (Zhen Wei)

Apple Silicon

If you want to work for Apple (or are just interested), you'll want to be familiar with Apple Silicon.

Tailor your Metal apps for Apple M1 (Apple)
Bring your Metal app to Apple silicon Macs (Apple)
Optimize Metal Performance for Apple silicon Macs (Apple)
Discover Metal enhancements for A14 Bionic (Apple)
Modern Rendering with Metal (Apple)

Node-Based Graphs

Lots of rendering/VFX stuff uses node graphs to represent shaders/render passes/materials/etc.

Shader Graph (Unity)
Visual Effect Graph (Unity)
Organizing GPU Work with Directed Acyclic Graphs (Pavlo Muratov)
Render graphs (Apoorva Joshi)
Render Graph (Simon's tech blog)