Junior Graphics Engineer Interview Questions
Junior graphics engineer interviews require a lot more domain-specific knowledge than your typical junior software engineer interviews. I've failed many interviews because I didn't fully understand common graphics interview topics. So I made this page to include some common interview topics I encountered during my interview process. I'm not going to go super in depth on all topics on this page, but I will link to other resources that do go in depth. Not all these topics will come up, but it's good to have familiarity with all of the topics. If you're interested about my experience applying for graphics jobs, you can read about that here.
Math
Powers of 2
Yes, really. I've been directly asked about them in at least 21 interviews.
I just know that 24 = 16, 28 = 256, and that I can multiply or divide by 2 to get whatever specific power of 2 I get asked to compute.
Hexadecimal
Like binary, but with 16 numbers.
Two's Compliment
To create a negative number in two's compliment, determine the binary representation of the positive number, flip the bits, and add 1.
Matrices
Rotation matrix properties
- Each column vector is normalized
- Dot product of 2 unique column vectors is 0
- All column vectors are orthonormal to each other, form an orthonormal basis
- Inverse matrix is the transpose
Model matrix is model -> world space
View matrix is world -> camera space
Projection matrix applies perspective correction to the view frustum
Multiplication order matters, typically we multiply by scale, then rotation, then translation.
Ray Intersections
A ray is defined at x = dt + o, where x is position, d is direction, t is time/distance, o is origin. d and o are normalized.
Sphere
- r2 = dot(x - p, x - p)
- r is the radius of the sphere
- p is the center
- r2 = dot(dt + o - p, dt + o - p)
- r2 = dot(dt + R, dt + R)
- R = o - p
- r2 = d2 * t2 + 2dRt + R2
- 0 = d2 * t2 + 2d(o - p)t + (o - p)2 - r2
- 0 = At2 + Bt + C
- A = dot(d, d)
- B = 2 * dot(d, o - p)
- C = dot(o - p, o - p) - r2
- Use quadratic equation to solve for real solutions of t.
- Two solutions: two intersection points on the sphere, usually take the closer point
- One solution: ray is tangent to the sphere
- Zero solutions: no intersection point
Plane
- 0 = dot(x - p, n)
- p is a point on the plane
- n is the plane's normal
- 0 = dot(dt + o - p, n)
- 0 = dot(d, n) * t + dot(o - p, n)
- -dot(o - p, n) / dot(d, n) = t
- t is negative: the ray is facing away from the plane
- dot(d, n) = 0: ray and plane are parallel
Dot/Cross products
Dot product
- The angle between two vectors
- dot(a, b) = length(a) * length(b) * cos(theta)
- For normalized vectors, the range of dot product is -1 to 1
- dot = 1: parallel facing same way (0 degrees)
- dot = 0: normal/orthogonal (90 degrees)
- dot = -1: parallel, facing opposite (180 degrees)
- The dot product between two parallel vectors is -1 or 1. (Most people just say 1 apparently)
- Dot product is used in the rendering equation, BRDFs, phong shading, determining angle between two vectors.
Cross product
- AxB = vector that is orthogonal to A and B.
- AxB = -(BxA) = (-A)xB
- A.(AxB) = B.(AxB) = 0
- length(AxB) = area of the parallelogram created by A and B
- Cross product is used to generate a camera basis, or to create a normal vector for a plane from 3 points.
Quaternions
Way to represent rotation with 4D numbers.
Euler Angles
Simple way to represent rotation. Can run into Gimbal lock, better to use quaternions.
Barycentric Coordinates
Coordinate system where you represent a point on a triangle by the weighted sum of the triangle's vertices.
Rendering Equation
For each input direction wi, calculate the light absorbed (irradiance) by the surface along wi (Li * wi·n), then multiply that value by the amount of the absorbed light that exits the surface (radiance) in the direction of wo (fr). Then, if the surface itself emits light, calculate the amount of radiance that leaves the surface along wo (Le) and add it to the previous sum. This is equal to the total amount of radiance that leaves the surface along wo (Lo).
Why is irradiance = Li * wi·n? Iradiance is flux per unit area. The radiance (Li) is the flux. To understand how the dot product give us 'per unit area', lets think about a flashlight. If you hold a flashlight directly above a table, a flashlight will form a circle on the table. As you tilt the flashlight so that it is at a 45 degree angle to the table, the circle will turn into an ellipse. The radiance in these two cases is the same, but the area covered by the light is more in the 45 degree case, so the irradiance is smaller. This effect is modeled with the dot product term. You can see this effect visually below.
Rendering equation is impossible to solve exactly, so it has to be approximated. The basic rendering equation does not capture some rendering effects like subsurface scattering, transmission, and volumetric effects. However, the rendering equation can be modified to accomodate these effects by integrating over the whole sphere instead of the hemisphere.
BXDF
Bidirectional X Distribution function, where X is
- Reflectance
- Transmission
- Surface
- Subsurface Scattering
Takes in two inputs, light direction into a surface (wi), light direction out of the surface (wo). The output of a BXDF is the ratio of radiance to irradiance for the surface. You can also think of it as the amount of light absorbed along wi that is reflected out along wo, or the % chance that a ray coming in along wi will reflect out along wo.
Properties of BXDFs
- All outputs are of the BXDF are non-negaive.
- Helmholtz reciprocity, BXDF(wi, wo) = BXDF(wo, wi)
- Conserves energy, integral of the BRDF is 1.
Types of BXDFs
- Lambertian
- Disney Diffuse
- Blinn-Phong
- Cook-Torrance
Sampling
In graphics, we have to approximate things, like the rendering equation. We use various sampling techniques to calculate the approximations. We are trying to avoid aliasing/jagged edges in the images we render, sampling helps to do that.
Importance Sampling
The reason that importance sampling is so important is due to the following observation: whether a given sample contributes 50% or 0.1% to the final sum, the time it takes to evaluate the function at the two sample points is the same. Because of this, we should spend more time evaluating areas of the interval that contribute significantly to the final sum, and less time on areas that do not contribute much.
Graphics Pipeline
Pipeline
- Vertex shader
- Runs per vertex/point in the mesh. Typically transform mesh vertices from model space to screen space. Vertex shaders predominently use the GPU's matrix/vector multiplication units.
- Optional Tessellation shader
- Used to create subdivided/fine detailed geometry within an input triangle patch.
- Optional Geometry shader
- Used to create new geometry, like shadow volumes.
- Clipping
- Remove vertices that are outside of the 1x1x1 cube, add new vertices if neccesary.
- Primative Assembly
- Turn vertices into shapes (triangles, points, lines)
- Face culling
- Front face/back face culling. If triangles face away from the screen, they are likely behind triangles that face the camera, so we should ignore them. This means triangles have a winding order (CW or CCW).
- Rasterization
- Takes the triangles and figures out which pixels are covered by the triangles. If doing MSAA, might take multiple samples per pixel to reduce jaggies.
- Optional Early depth test
- Depth test fragment against previously written depth for the pixel. If current fragment will be behind what is currently there, ignore this fragment to save computation time.
- Fragment shader
- Take interpolated vertex output data, use it as input to calculate the pixel's color. Typically lighting calculations are done here, and fragment shaders use a lot of texture lookups and ALUs.
- depth/stencil test/alpha blending
- Write to render target if this fragment will be in front of what is currently there. Same for stencil testing, also need to do alpha blending. Typically, depth and stencil occur before fragment shading, but it's possible to change the depth of a fragment in a fragment shader which would force depth test afterwards.
TBDR
Tile Based Deferred Rendering. Vertices are binned into tiles, then each tile is rendered separately, which reduces memory bandwidth between pipeline stages.
CPU/GPU Architecture
Stack vs Heap
Stack
- A sequential block of memory, basically an array.
- There is a stack pointer that points to the top of the stack.
- Stack allocation is very quick, because adding new memory to the stack just requires incrementing the stack pointer. Deallocation is the same.
- Meant for 'temporary variables' that are locally scoped within a function that won't be needed after the function returns.
- Compiler can make some optimizations to preallocate all stack memory for a function when it is called.
Heap
- A larger group of memory than the stack.
- In the heap, the OS basically finds spot in the heap to fit the data, which can be a challenge for large data structures.
- This can lead to gaps of unused memory, memory fragmentation.
- It takes longer to allocate and deallocate memory in the heap because of this.
- Data referenced by pointers is typically stored in the heap, pointers are usually on the stack.
- Heap is for data that needs to persist beyond the lifetime of a function (pointers).
- Heap is bigger than stack.
Cache, Memory, Cache Line
When executing a function, the assembly instructions and data for that function are usually stored sequentially. In order to speed up execution, the OS will prefetch a section of code instructions and data surrounding the current instruction location and store it in the cache, which is a very small, very fast bit of memory that is close to the CPU. While executing the function, the CPU will check the cache to see if the next instruction/data is in the cache. If it is, there is no need to go back to main memory to fetch the data. However, if it is not in the cache, this is a cache miss, and the data must be fetched from a lower cache level or main memory. The cache line is the amount of data that is read into the cache aat once. Since the cache relies on things being sequential, data structures that aren't contiguous can have many cache misses, stuff like graphs, trees, linked lists. Arrays have relatively fewer cache misses, since data is stored sequentially. There are multiple levels of caches, L1 is the fastest.
GPU Architecture
Thread - single invocation of a shader
SIMD group/Warp - a group of 32 threads. This size is fixed by the GPU. In a SIMD group, all threads are executed in parallel in at once on a single GPU core.
Threadgroup/Wavefront - A group of threads that are dispatched to a GPU. The threadgroup size is typically set by the programmer.
Thread masking - GPUs execute SIMD groups in parallel and in lock step. As an example, if there is an add instruction in a shader, the GPU will fetch the inputs to the add instruction all at once, and execute the add instruction all at once, and write the results all at once. This is very efficient, but causes an issue for threads with if statements and loops (divergent threads). This is because some threads will execute different instructions while other threads need to execute other instructions. To solve this, the GPU will execute both branches of the if statement for each thread. The GPU will use thread masking to ignore the writes to the registers for the threads that are not supposed to be executing the instructions. This is why if statements should generally be avoided in GPU code if possible.
Command Buffer - Instructions for a GPU draw call are stored in a command buffer. Stuff like pipeline state, buffer bindings, rendering mode (triangles, points, lines). OpenGL/WebGL do not expose command buffers through their APIs, but they are first class citizens in Metal/Vulkan/other modern APIs.
Command Queue - Command buffers are submitted to the command queue, which holds the command buffers for the draw calls.
Rendering Techniques
Physically Based Rendering
Physically Based Rendering, a way to realistically compute the way light interacts with materials. PBR uses BRDFs to approximate the rendering equation, and is generally photorealistic. PBR models typically have a diffuse/albedo component, a specular/gloss component, a roughness, a metallness, and emmission components. Not all of the components correspond to exact photorealistic parameters, but the parameters are meant to be tweakable by artists.
Forward Rendering
Typical way that the graphics pipeline works. Have some triangles, put them through the vertex shader, compute lighting in fragment shader, then write to frame buffer. You might use a depth buffer to stop a fragment from being drawn behind a pixel that has already been drawn.
Deferred Rendering
While forward rendering works ok, it does not prevent the case where you compute lighting on a fragment that will later be overwritten by another fragment that is closer to the camera. This means that your lighting calculations for the first fragment are wasted (overdraw), which is inefficient. Deferred rendering tries to solve this by deferring lighting to a second compute pass after the triangles have all been rasterized. In deferred rendering, during the fragment shader, the shader writes the various material components to a G-buffer, which is a group of textures containing diffuse/specular/normal/depth information about the triangles. After rasterization, a second compute pass takes the G-buffer and computes lighting. This way lighting is only calculated on pixels that appear on screen.
Deferred rendering has some drawbacks, it uses significantly more texture memory than forward rendering, it can't handle transparent materials well, and it can be difficult to do MSAA with it.
Visibility Buffer Rendering
Visibility buffer rendering solves some issues with deferred rendering. In visibility buffer rendering, the fragment shader writes the triangle/primitive id and draw call id to the visibility buffer. This means you only need a depth + visibility buffer, no G-buffer. Then, in the lighting compute pass, the primitive id and draw call id are used to fetch the MVP matrices and material information from a material buffer, which is then used for lighting.
Shadow Mapping
Shadow mapping is a real time shadow algorithm. A depth buffer is rendered from the camera's point of view, and from the light's point of view. Then, pairs of depth values are compared between the images to determine if the pixels should be in light or shadow. Cascaded shadow mapping creates multiple depth buffers from the light's point of view for different depth ranges away from the camera to improve the resolution of the shadows.
Mipmapping
Mipmapping is the process of creating lower resolution version of textures. With each mip level, the resolution is cut in half (pixel count cut into a quarter). The new mipmap levels will take up 33% more memory of than original texture. Having mipmaps can reduce Moiré patterns, and can save on texture memory by loading a lower resolution version of a texture if that texture appears far away from the camera.
Tone Mapping
A way to approximate HDR content on an low dynamic range screen.
Bloom
A way to approximate the washed out/glow effect that bright lights cause on camera sensors.
Ray Tracing
A technique to create images by tracing light paths through a scene.
C++/OS stuff
Virtual Functions
Polymorphic classes are allowed to have virtual functions, which allow the child classes to implement different versions of the function for each object. The program will determine which virtual function to execute for the object at runtime. classes with virtual functions will get an additional void* vptr member (increasing the object size) pointing to the vtable. The vtable is essentially a list of the object's virtual function address. Pure virtual functions are marked with = 0 and they must be implemented by the subclass.
Pointer vs Reference
Pointer
- Pointer is int*
- Can point to a single object or an array of objects
- Could be null, could be void*
Reference
- Reference is int&
- Only points to a single object
- Will not be null, will not be void&
Templated Classes
Templated classes allow you to write generic code/container classes. However, each new use of the class needs to be compiled individually, which can increase compile time and binary size. They can also be annoying to debug on old versions of C++.
Static
A static variable in a class is a shared between all objects of that class, they can all access and modify it.
Size, Padding, Alignment
The size of an object is important, and there is a difference between the padding, alignment, and size of an object. An empty object without a virtual function will have a size of 1 byte, with a virtual function is 4 bytes.
Mutex and Semaphore
Used for multithreaded programming and locking to prevent race conditions. Mutexes generally allow one thread to access a resource, while a semaphore can allow multiple threads to access a resource.
Paging
Paging allows you to have 'contiguous' memory that is not actually contiguous. This is nice for heap memory, which can become fragmented.
Data Structures/Algorithms
DFS/BFS
Depth First Search and Breadth First Search. The DFS/BFS algorithms are pretty simple, and they come up occasionally. The iterative cases are very similar to each other. The iterative DFS is usually better than recursive, since large trees could cause the program to run out of stack memory.
//The main difference between dfsIterative and bfsIterative is one
//function uses a stack while the other uses a queue.
void dfsIterative(TreeNode* root) {
if (root == NULL) {
return;
}
stack s;
s.push(root);
while (!s.empty()) {
TreeNode* n = s.pop();
process(n);
for (TreeNode* child: n->children) {
if (child != NULL) {
s.push(child);
}
}
}
}
void bfsIterative(TreeNode* root) {
if (root == NULL) {
return;
}
queue q;
q.enqueue(root);
while (!q.empty()) {
TreeNode* n = q.dequeue();
process(n);
for (TreeNode* child: n->children) {
if (child != NULL) {
q.enqueue(child);
}
}
}
}
//DFS can also be done recursive
void dfsRecursive(TreeNode* root) {
if (root == NULL) {
return;
}
process(root);
for (TreeNode* child: root->children) {
dfsRecursive(child);
}
}
Reverse Linked List
Been asked this question twice, its pretty simple to do in linear time.
ListNode* reverse(ListNode* head) {
ListNode* currentNode = head;
ListNode* nextNode = head->next;
currentNode->next = NULL;
while (nextNode != NULL) {
ListNode* newNextNode = nextNode->next;
nextNode->next = currentNode;
currentNode = nextNode;
nextNode = newNextNode;
}
//new head of the list
return currentNode;
}
Spacial Data Structures
Used to query spacial data in sub-linear time. Still takes linear time to build spacial data structures.
k-d tree - partition data along alternating dimensions
Bounding Volume Hierarchy - Create bounding boxes around objects, then more bounding boxes around groups of objects
Binary Space Parition - Partition data using planes
Octree - tree where each node has 8 children
Linked List vs Arrays
Linked List
- A chain of nodes that contain pointers to the next and previous nodes in the list.
- Very quick to add, remove, and insert elements into the list.
- Generally takes linear time to find the nth element in the list.
- Won't have the best cache performance, since list nodes aren't neccesarily next to each other in memory.
Array
- Collection of elements in a continous block of memory.
- Very quick to index into the array to the nth element.
- Can't resize the array without creating a new array and copying data over.
- Generally takes linear time to remove an element from an array.
- Typically has good cache performance since data is contiguous.
Other stuff
Cloth Simulation
Common cloth simulation methods use mass/spring. A bunch of cloth vertices are connected by springs that keep the vertices in place. There are structural springs, which are for adjacent vertices in the face. Shear vertices are for non-adjacent vertices in a face. Flexion/bend vertices are 'two hops' away.
Subdivision Surfaces
Process of smoothing out a surface and adding more geometry. Most common technique is Catmull-Clark. Typically we use half-edge data structures to represent meshes that we want to subdivide.
Fluid Simulation
Fluid simulation involves solving the Navier-Stokes equation, which model forces like gravity, pressure, viscosity, etc. Two main types of fluid: Eulerian and Lagrangian. Lagrangian is about tracking individual particles in the simulation, while Eulerian is about tracking the flow of particles through a specific location, rather than individual particles. Lagrangian is a particle based approach (mesh free), while Eulerian is a grid/mesh based approach.
Integration Methods
Explicit Euler, Implicit Euler, and Runge-Kutta (RK4) are common integration methods. Euler is typically unstable on forces that vary with time/distance/position. RK4 is more stable, but it's more complicated.
Apple Silicon
If you want to work for Apple (or are just interested), you'll want to be familiar with Apple Silicon.
Node-Based Graphs
Lots of rendering/VFX stuff uses node graphs to represent shaders/render passes/materials/etc.