Lecture 9: Programmable Shaders
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
Why Programmable Shaders
• Hardwired pipeline
– Produces limited effects
– Effects look the same
– Gamers want unique look-n-feel
– Multi-texturing somewhat alleviates this, but not enough
– Less interoperable, less portable
• Programmable Shaders
– Vertex Shader
– Pixel or Fragment Shader
– Starting from DX 8.0 (assembly)
– DX 9.0 added HLSL (High Level Shading Language)
– HLSL (MS) is compatible with Cg (Nvidia)
Evolution of Graphics Processing Units
• Pre-GPU
– Video controller
– Dumb frame buffer
• First generation GPU
– PCI bus
– Rasterization done on GPU
– ATI Rage, Nvidia TNT2, 3dfx Voodoo3 (‘96)
• Second generation GPU
– AGP
– Include T&L into GPU
– Nvidia GeForce 256, ATI Radeon 7500, S3 Savage3D (’98)
• Third generation GPU
– Programmable vertex shader
– Nvidia GeForce3, ATI Radeon 8500, Microsoft Xbox (’01)
• Fourth generation GPU
– Both programmability in vertex and fragment shaders
– Nvidia GeForce FX, ATI Radeon 9700 (’02)
Programmable Graphics Pipeline
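[Figure: programmable graphics pipeline (NVidia GeForce FX). 3D Apps send API commands to the 3D API (Direct3D), which sends the GPU cmd & data stream to the GPU Frontend. Stages: GPU Frontend → Programmable Vertex Shader (transformed vertices) → Primitive Assembly (assembled polygons) → Rasterization & Interpolation → Programmable Fragment Shader (transformed fragments) → Raster Operations (pixel updates) → Frame Buffer. Side streams: vtx index (into Primitive Assembly), pixel location (into Raster Operations). Other label: Fixed Function Pipeline. Source: Cg tutorial]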
Graphics Programmable Pipeline
[Figure: DirectX 8.0 Pipeline. Vertices → Vertex Shader or HW T&L → Culling, Clipping → Rasterization → Pixel Shader or FF Pixel → Blend, Mask]
Choice between programmable and fixed function pipeline (mutually exclusive, parallel pipelines)
[Figure: DirectX 10.0 Pipeline, Fully Programmable. Input Assembler → Vertex Shader → Geometry Shader → Rasterization → Pixel Shader → Output Merger]
Shader Languages
• HLSL most common post-DX 10.0.
– No assembly shaders allowed in DX 10.0.
• Other options:
– Cg (compatible with HLSL)
– GLSL
– Legacy DirectX shaders in assembly
– Sh
– OpenVidia (U of Toronto)
Basic Shader Mechanics
• Data types:
– Typically floats, and vectors/matrices of floats
– Fixed size arrays
– Three types:
• Per-instance data, e.g. per-vertex position
• Per-pixel interpolated data, e.g. texture coordinates
• Per-batch data, e.g. light position
– Data are tightly bound to the GPU
– Flow control is very simple:
• No recursion
• Fixed size loops for v_2_0 or earlier
• Simple if-then-else statements allowed in the latest APIs
• texkill (asm), clip (HLSL), or discard (GLSL) allows you to abort a write to a pixel (a form of flow control)
Vertex Shader
• Transform to clip space (later mapped to screen space by the rest of the pipeline)
• Inputs:
– Common inputs:
•Vertex position (x, y, z, w)
•Texture coordinate
•Constant inputs
•Can also take fog and color as inputs, but usually passes them through untouched to the pixel shader
– Output to Pixel (fragment) shader
• Vertex shader is executed once per vertex,
thus less expensive than pixel shader
Vertex Shader
[Figure: vertex shader register model]
Input (vertex stream): vertex data registers v0, v1, v2, ..., v15
Constant registers: C0, C1, C2, ..., Cn (n = 95 or 255)
12 temporary registers: r0, r1, r2, ..., r11
Address register: a0
Loop register: aL
Output: oPos (position), oTn (texture), oFog (fog), oD0 (diff. color), oD1 (spec. color), oPts (pt size)
Each register is a 4-component vector register except aL
Input to Vertex Shader
Vector Component         Shader decl name          Register
Position                 D3DVSDE_POSITION          v0
Blend Weight             D3DVSDE_BLENDWEIGHT       v1
Blend Indices 1 thru 5   D3DVSDE_BLENDINDICES      v2
Normal                   D3DVSDE_NORMAL            v3
Point Size               D3DVSDE_PSIZE             v4
Diffuse                  D3DVSDE_DIFFUSE           v5
Specular                 D3DVSDE_SPECULAR          v6
Texture Coordinates      D3DVSDE_TEXCOORD0 – 7     v7 to v14
Position 2               D3DVSDE_POSITION2         v15
Normal 2                 D3DVSDE_NORMAL2           v16
Pixel (or Fragment) Shader
• Determine each fragment’s color
– custom (sophisticated) pixel operations
– texture sampling
• Inputs
– Interpolated output from vertex shader
– Typically vertex position, vertex normals, texture
coordinates, etc.
• Output
– Color (including alpha)
– Depth value
• Executed once per pixel, so it typically runs many more times than the vertex shader
– It is advantageous to move computation to a per-vertex basis where possible to improve performance
Pixel Shader
[Figure: pixel shader register model]
Input (pixel stream):
– Color registers: v0 (Color0, diff), v1 (color1, spec)
– Texture registers: t0, t1, ..., tn (4 or 6)
– Sampler registers: s0, s1, ..., sn
Constant registers: C0, C1, ..., Cn (8 for v1, 32 for v2)
Temporary registers: r0, r1, ..., rn (2, 4, 24, ..)
Output: oC0 (color), oDepth (depth)
Use of the Vertex Shader
• Transform vertices to clip-space
• Pass normal, texture coordinates to PS
• Transform vectors to other spaces (e.g.
texture space)
• Calculate per-vertex lighting (e.g., Gouraud
shading)
• Distort geometry (waves, fish-eye camera)
Adapted from Mart Slot’s presentation
Use of Pixel Shader
• Texturing objects
• Per-pixel lighting (e.g., Phong shading)
• Normal mapping (each pixel has its own
normal)
• Shadows (determine whether a pixel is
shadowed or not)
• Environment mapping
Adapted from Mart Slot’s presentation
HLSL / Cg
• Compatible, jointly developed by Microsoft
and Nvidia
• A C-like language and syntax
• But does not have
– Pointers
– Dynamic memory allocation
– Unstructured/complex control structure
•e.g. goto
•Recursion (note that functions are inlined)
– Bitwise operations (may have in the future)
A Simple Vertex Shader
uniform extern float4x4 gWVP;    // passed from D3D apps

struct VtxOutput {               // structure passed to fragment shader
    float4 position : POSITION;  // float4: reserved data type; POSITION: semantics
    float4 color    : COLOR;
};

VtxOutput All_greenVS(float2 position : POSITION)  // input to vertex shader
{
    VtxOutput OUT;
    OUT.position = mul(float4(position, -30.0f, 1.0f), gWVP);
    OUT.color    = float4(0, 1, 0, 1);
    return OUT;
}
Adapted from Cg Tutorial
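The call mul(float4(position, -30.0f, 1.0f), gWVP) treats the vertex as a row vector multiplied by a 4x4 matrix. The following is a minimal CPU-side sketch of that product in C++, assuming row-major storage. Float4, Float4x4, mulRowVector, and makeTranslation are hypothetical names chosen for illustration; they are not part of HLSL or D3DX.

```cpp
#include <array>

// Hypothetical stand-ins for HLSL's float4 and float4x4 (row-major storage).
using Float4 = std::array<float, 4>;
using Float4x4 = std::array<Float4, 4>;

// Row vector times matrix: out[c] = sum over r of v[r] * m[r][c].
Float4 mulRowVector(const Float4& v, const Float4x4& m) {
    Float4 out{0.0f, 0.0f, 0.0f, 0.0f};
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            out[c] += v[r] * m[r][c];
        }
    }
    return out;
}

// Translation matrix in the row-vector convention (offsets live in the last row).
Float4x4 makeTranslation(float tx, float ty, float tz) {
    return {{{1.0f, 0.0f, 0.0f, 0.0f},
             {0.0f, 1.0f, 0.0f, 0.0f},
             {0.0f, 0.0f, 1.0f, 0.0f},
             {tx, ty, tz, 1.0f}}};
}
```

With this convention, a 2D input (2, 3) extended to (2, 3, -30, 1) and multiplied by makeTranslation(1, 0, 5) yields (3, 3, -25, 1). The w = 1 component is what lets the last matrix row act as a translation.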
A Simple Vertex Shader (Alternative)
uniform extern float4x4 gWVP;

// No structure declaration: outputs are declared as out parameters.
void All_greenVS(float2 position : POSITION,
                 out float4 oPosition : POSITION,
                 out float4 oColor    : COLOR)
{
    oPosition = mul(float4(position, -30.0f, 1.0f), gWVP);
    oColor    = float4(0, 1, 0, 1);
}
Semantic
• Something new to C/C++ programmers
• A colon and a keyword, e.g.,
– MonsterPos : POSITION
– VertexColor : COLOR
– Vertexnormal : NORMAL
– VertexUVcoord : TEXCOORD0
• A glue that
– Binds an HLSL program to the rest of the graphics pipeline
– Connects the semantic variables to the pipeline
The uniform Type Qualifier
• A uniform variable's value comes from an external source
– E.g., D3D application
• Its initial value is retrieved from a constant
register (e.g., c0, read-only) in the GPU
• Global to all processed vertices in the entire
shading process
A Simple Pixel (or Fragment) Shader
struct PixelOutput {
float4 color : COLOR;
};
PixelOutput All_greenPS(float4 color : COLOR)
{
PixelOutput PSout;
PSout.color = color;
return PSout;
}
Adapted from Cg Tutorial
Profiles
• Need a profile to compile the vertex shader
and the pixel shader
• Specify shader models, for example
– vs_2_0 for DX9 vertex shader
– ps_2_0 for DX9 pixel shader
• Specify particular models for compilation
• Can be embedded inside technique
vertexShader = compile vs_2_0 PhongVS();
pixelShader = compile ps_2_0 PhongPS();
Flow Control (Predicating Constants)
if (posL.y < 0)
outVS.posH.x = -1.0f;
else
outVS.posH.x = 2.0f;
vs_2_0 compiled code
def c0, 0, -3, 2, 1
dcl_position v0
slt r0.x, v0.y, c0.x          ; r0.x = (posL.y < 0) ? 1 : 0
mad oPos.x, r0.x, c0.y, c0.z  ; oPos.x = r0.x * (-3) + 2
[Figure: posL = v0 (x, y, z, w); v0.y is compared against c0.x to produce r0.x, which is multiplied by c0.y and added to c0.z to give oPos.x]
Constant registers
vs_1_1: >= 96
vs_2_0: >= 256
vs_3_0: >= 256
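The compiled code replaces the if/else with arithmetic on a predicate. The following C++ sketch emulates the two instructions on the CPU to show why the result matches the branch; slt, mad, and predicatedPosX are hypothetical helper names that mirror the assembly, not a real API.

```cpp
// Emulates "slt dst, a, b": 1.0 if a < b, otherwise 0.0.
float slt(float a, float b) { return (a < b) ? 1.0f : 0.0f; }

// Emulates "mad dst, a, b, c": a * b + c.
float mad(float a, float b, float c) { return a * b + c; }

// Branch-free equivalent of: posH.x = (posLy < 0) ? -1.0f : 2.0f;
float predicatedPosX(float posLy) {
    const float c0[4] = {0.0f, -3.0f, 2.0f, 1.0f};  // def c0, 0, -3, 2, 1
    const float r0x = slt(posLy, c0[0]);            // slt r0.x, v0.y, c0.x
    return mad(r0x, c0[1], c0[2]);                  // mad oPos.x, r0.x, c0.y, c0.z
}
```

When the predicate is 1 the result is 1 * (-3) + 2 = -1; when it is 0 the result is 0 * (-3) + 2 = 2, which are exactly the two values assigned in the HLSL source.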
DirectX Effect FX Framework
• A D3D “Effect”
– Encapsulates shader properties
• E.g., water modeling and steel modeling each have their own effect
– Reusable for the same type of modeled objects
• An effect consists of one or more techniques
– To enable fallback mechanism on different GPUs
– Several versions of an effect (GPU-dependent)
• A technique consists of one or more passes
• Described in an effect file (.fx) in D3D
– External file
– No application recompilation needed
An Example of an FX File
uniform extern float4x4 gWVP;
uniform extern float4   gAmbMtrl;

void VShader(float4 pos : POSITION, float4 normal : NORMAL,
             out float4 oColor : COLOR)
{
    . . . . .
}

float4 PShader(float4 color : COLOR) : COLOR
{
    . . . . .
}

technique SuperShading
{
    pass P0
    {
        vertexShader = compile vs_2_0 VShader();
        pixelShader  = compile ps_2_0 PShader();
        FillMode = Wireframe; // default Solid
    }
}
Creating an Effect from Direct3D Apps
ID3DXEffect* mFX = 0;
ID3DXBuffer* errors = 0;
D3DXCreateEffectFromFile(
    gd3dDevice,
    "MyShading.fx",     // Fx file location
    0,
    0,
    D3DXSHADER_DEBUG,
    0,
    &mFX,               // (output) returned effect pointer
    &errors);

HRESULT D3DXCreateEffectFromFile(
    LPDIRECT3DDEVICE9 pDev,
    LPCTSTR pSrcFile,
    const D3DXMACRO *pDefines,
    LPD3DXINCLUDE pInclude,
    DWORD flags,
    LPD3DXEFFECTPOOL pPool,
    LPD3DXEFFECT *ppEffect,
    LPD3DXBUFFER *ppCompilationErrors);
Setting Effect Parameters
// Obtain handles.
D3DXHANDLE mhTech    = mFX->GetTechniqueByName("SuperShading");
D3DXHANDLE mhWVP     = mFX->GetParameterByName(0, "gWVP");      // gWVP is the float4x4 matrix used in the .fx file
D3DXHANDLE mhAmbMtrl = mFX->GetParameterByName(0, "gAmbMtrl");  // gAmbMtrl is the float4 color used in the .fx file
// Set parameters through the handles.
mFX->SetMatrix(mhWVP, &(mWorld*mView*mProj));
mFX->SetValue(mhAmbMtrl, &GlobalMtrl.amb, sizeof(D3DXCOLOR));
• Several other parameter setting flavors
– SetFloat
– SetVector
– SetTexture
Applying an Effect
/* buildFX() */
mhTech = mFX->GetTechniqueByName("SuperShading");
/* drawScene() */
gd3dDevice->BeginScene();
. . . .
mFX->SetTechnique(mhTech);
mFX->SetMatrix(mhWVP, &(mWorld*mView*mProj));
UINT numPasses = 0;
mFX->Begin(&numPasses, 0);
for (UINT i = 0; i < numPasses; ++i)
{
    mFX->BeginPass(i);
    gd3dDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                     0, 0, 36, 36, 6);
    mFX->EndPass();
}
mFX->End();
gd3dDevice->EndScene();
gd3dDevice->Present(0, 0, 0, 0);
Change Parameters During a Pass
• If you need to change parameters (i.e., call Set*
methods) in the middle of a pass
– Call ID3DXEffect::CommitChanges before drawing
/* DrawScene() */
mFX->BeginPass(i);
. . .
mFX->SetMatrix(mhWVP, &(mWorld*mView*mProj));
mFX->CommitChanges();
. . .
gd3dDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
0, 0, 36, 36, 6);
mFX->EndPass();
First HLSL/D3D Example
FirstHLSL
(See Demo in Visual Studio)
Vertex Shader Code for Texturing
uniform extern float4x4 gWVP;

void TexVS(float3 position : POSITION,
           float3 color    : COLOR,
           float2 texcoord : TEXCOORD0,
           out float4 oPos      : POSITION,
           out float4 oColor    : COLOR,
           out float2 oTexcoord : TEXCOORD0)
{
    oPos      = mul(float4(position, 1.0f), gWVP);
    oColor    = float4(color, 1.0f);
    oTexcoord = texcoord;
}
Adapted from Cg Tutorial
Pixel Shader Code for Texturing
uniform extern texture texture_monster_skin;
sampler TexS = sampler_state
{
Texture = <texture_monster_skin>;
MinFilter = Anisotropic;
MagFilter = LINEAR;
MipFilter = LINEAR;
MaxAnisotropy = 8;
AddressU = WRAP;
AddressV = WRAP;
};
void TexPS(float4 pos      : POSITION,
           float3 color    : COLOR,
           float2 texcoord : TEXCOORD0,
           out float4 oColor : COLOR)
{
    float4 temp = tex2D(TexS, texcoord);
    // Blend the texel and the interpolated vertex color 50/50 and write the output.
    oColor = lerp(temp, float4(color, 1.0f), 0.5f);
}
Adapted from Cg Tutorial
Math Operators
• Most commonly used C/C++ operations are
supported
• Some are reserved for future
implementations, e.g.,
– Bitwise logic operations (&, ^, |, &=, |=, ^=…)
– Shift: << , >>, <<=, >>=
– Modulo: %
– Pointer operators: *, -> (no pointer support or indirection in
Cg/HLSL)
Standard Library Functions
• Many, … to name a few
dot(a, b)
cross(a, b)
distance(pt1, pt2) : Euclidean distance
lerp(a, b, f) : r = (1-f)*a + f*b
lit(NL, NH, pwr) : calculate amb, diff, spec coefficients
mul(M, N)
normalize(v)
reflect(I, N) : calculate reflection vector of ray I about normal N
sincos(x, s, c) : calculate sin(x) and cos(x)
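To make a few of these intrinsics concrete, here is a small CPU-side sketch in C++ of dot, lerp, and reflect. It follows the formulas listed above, with reflect taken as I - 2*dot(I, N)*N for a unit-length N. Float3, dot3, lerpScalar, and reflect3 are hypothetical names for illustration, not the HLSL library itself.

```cpp
#include <array>

// Hypothetical stand-in for HLSL's float3.
using Float3 = std::array<float, 3>;

// dot(a, b): sum of component-wise products.
float dot3(const Float3& a, const Float3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// lerp(a, b, f): r = (1-f)*a + f*b, shown here for scalars.
float lerpScalar(float a, float b, float f) {
    return (1.0f - f) * a + f * b;
}

// reflect(I, N): I - 2*dot(I, N)*N, assuming N is normalized.
Float3 reflect3(const Float3& i, const Float3& n) {
    const float k = 2.0f * dot3(i, n);
    return {i[0] - k * n[0], i[1] - k * n[1], i[2] - k * n[2]};
}
```

For example, an incident ray (1, -1, 0) hitting a surface with normal (0, 1, 0) reflects to (1, 1, 0), and lerpScalar(2, 4, 0.25) gives 2.5.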
Sampler Objects (in .fx file)
uniform extern texture texture_brick;
sampler MyTexS = sampler_state
{
Texture = <texture_brick>;
    MinFilter = POINT;        // Nearest Point Sampling
    MagFilter = LINEAR;       // Bilinear Filtering
    MipFilter = Anisotropic;  // Most expensive; alleviates distortion when the angle
                              // between normal and camera is wide
    MaxAnisotropy = 8;
    AddressU = WRAP;
    AddressV = WRAP;
};
void PixelShader(float2 texcoord : TEXCOORD0)
{
float4 color = tex2D(MyTexS, texcoord);
}
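The difference between nearest point sampling and bilinear filtering can be sketched on the CPU. The C++ below is a rough illustration only, assuming a single-channel texture, WRAP addressing, and texel centers at half-integer coordinates; real hardware differs in details such as weight precision and mip selection. Texture1Ch, samplePoint, and sampleBilinear are hypothetical names.

```cpp
#include <cmath>
#include <vector>

// Hypothetical single-channel texture used only for illustration.
struct Texture1Ch {
    int width;
    int height;
    std::vector<float> texels;  // row-major, width * height values

    // WRAP addressing: integer coordinates repeat modulo the texture size.
    float fetch(int x, int y) const {
        x = ((x % width) + width) % width;
        y = ((y % height) + height) % height;
        return texels[y * width + x];
    }
};

// POINT filter: return the texel whose cell contains (u, v).
float samplePoint(const Texture1Ch& t, float u, float v) {
    const int x = static_cast<int>(std::floor(u * t.width));
    const int y = static_cast<int>(std::floor(v * t.height));
    return t.fetch(x, y);
}

// LINEAR filter: weighted blend of the four texels nearest to (u, v).
float sampleBilinear(const Texture1Ch& t, float u, float v) {
    const float fx = u * t.width - 0.5f;   // shift so texel centers sit on integers
    const float fy = v * t.height - 0.5f;
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const float ax = fx - x0;              // horizontal blend weight in [0, 1)
    const float ay = fy - y0;              // vertical blend weight in [0, 1)
    const float top = (1.0f - ax) * t.fetch(x0, y0) + ax * t.fetch(x0 + 1, y0);
    const float bottom = (1.0f - ax) * t.fetch(x0, y0 + 1) + ax * t.fetch(x0 + 1, y0 + 1);
    return (1.0f - ay) * top + ay * bottom;
}
```

On a 2x2 texture with rows (0, 1) and (0, 1), point sampling at (0.25, 0.25) returns 0 and at (0.75, 0.25) returns 1, while bilinear sampling at the center (0.5, 0.5) returns the blend 0.5.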
Vertex Shader: Logical View
[Figure: logical view of the Vertex Processing Unit]
Legend: Input Data, Architectural State, Output Data, Control Logic, State Information, Memory
Per-vertex input data
Register file: r0, r1, r2, r3, ...
Shader (start addr)
Swizzle / mask unit: .rgba, .xyzw, .zzzz, .xxyz, ...
Math/logic unit: cosine, log, sine, sub, add, ...
Sampler unit, reading from texture memory
Shader resources (bound by application): textures, samplers, constants
Per-vertex output data: transformed and lit vertices

Sampler bound by the application:
sampler mysampler = sampler_state
{
    Texture = <mytexture>;
    AddressU = Clamp;
    AddressV = Clamp;
};

Shader constants:
ID3DXConstantTable::SetF
// [Deprecated]
SetVertexShaderConstantB
SetVertexShaderConstantF
...
Pixel Shader: Logical View
[Figure: logical view of the Pixel Processing Unit]
Legend: Input Data, Architectural State, Output Data, Control Logic, State Information, Memory
Interpolator
Per-pixel input data
Register file: r0, r1, r2, r3, ...
Shader (start addr)
Swizzle / mask unit: .rgba, .xyzw, .zzzz, .xxyz, ...
Math/logic unit: cosine, log, sine, sub, add, ...
Sampler unit, reading from texture memory
Shader resources (bound by application): textures, samplers, constants
Per-pixel output data: pixel color, depth info, stencil info
Memory: color buffer, depth buffer, stencil buffer

Sampler bound by the application:
sampler mysampler = sampler_state
{
    Texture = <mytexture>;
    AddressU = Clamp;
    AddressV = Clamp;
};

Shader constants:
ID3DXConstantTable::SetF
// [Deprecated]
SetVertexShaderConstantB
SetVertexShaderConstantF
...
First HLSL/D3D Texturing Example
FirstHLSLTexture
(See Demo in Visual Studio)
First HLSL/D3D Texturing Morphing Example
FirstHLSLTexture2
(See Demo in Visual Studio)
Per-Vertex vs. Per-Pixel Shading
Per-vertex (Gouraud) shading: lighting is computed in the vertex shader

void GouraudVertexShader(float3 posL    : POSITION0,
                         float3 normalL : NORMAL0,
                         out float4 oPos   : POSITION0,
                         out float4 oColor : COLOR0)
{
    . . .
    lightVecW = normalize(LightPosW - posW);
    s = max(dot(normalW, lightVecW), 0.0f);
    diffuse = s*(DiffMtrl*DiffLight).rgb;
    toEyeW = normalize(EyePosW - posW);
    reflectW = reflect(-lightVecW, normalW);
    t = pow(max(dot(reflectW, toEyeW), 0.0f), SpecPower);
    spec = t*(SpecMtrl*SpecLight).rgb;
    ambient = (AmbMtrl*AmbLight).rgb;
    oColor = float4(ambient + ((diffuse + spec) / A), 1.0f);
    // Transform to homogeneous clip space.
    oPos = mul(float4(posL, 1.0f), gWVP);
}

float4 GouraudPixelShader(float4 c : COLOR0) : COLOR
{
    return c;
}

Per-pixel (Phong) shading: lighting is computed in the pixel shader

void PhongVertexShader(float3 posL    : POSITION0,
                       float3 normalL : NORMAL0,
                       out float4 oPos    : POSITION,
                       out float3 posW    : TEXCOORD0,
                       out float3 normalW : TEXCOORD1)
{
    posW = mul(float4(posL, 1.0f), World).xyz;
    normalW = mul(float4(normalL, 0.0f), WorldInvTrans).xyz;
    // Transform to homogeneous clip space.
    oPos = mul(float4(posL, 1.0f), gWVP);
}

float4 PhongPixelShader(float3 posW    : TEXCOORD0,
                        float3 normalW : TEXCOORD1) : COLOR
{
    . . . .
    lightVecW = normalize(LightPosW - posW);
    ambient = (AmbientMtrl*AmbientLight).rgb;
    s = max(dot(normalW, lightVecW), 0.0f);
    diffuse = s*(DiffMtrl*DiffLight).rgb;
    toEyeW = normalize(EyePosW - posW);
    reflectW = reflect(-lightVecW, normalW);
    t = pow(max(dot(reflectW, toEyeW), 0.0f), SpecPower);
    spec = t*(SpecMtrl*SpecLight).rgb;
    color = ambient + ((diffuse + spec) / A);
    return float4(color, 1.0f);
}
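Both shaders evaluate the same diffuse factor s and specular factor t; they differ only in whether this runs per vertex or per pixel. The following C++ sketch computes the two factors on the CPU so the arithmetic can be checked by hand. It assumes a unit-length normalW and a point light, and all names (Float3, LightFactors, phongFactors, and the helpers) are hypothetical, introduced for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Float3 = std::array<float, 3>;  // hypothetical stand-in for float3

float dot3(const Float3& a, const Float3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

Float3 sub3(const Float3& a, const Float3& b) {
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}

// Scales v to unit length; assumes v is not the zero vector.
Float3 normalize3(const Float3& v) {
    const float len = std::sqrt(dot3(v, v));
    return {v[0] / len, v[1] / len, v[2] / len};
}

// reflect(I, N) = I - 2*dot(I, N)*N, with N assumed normalized.
Float3 reflect3(const Float3& i, const Float3& n) {
    const float k = 2.0f * dot3(i, n);
    return {i[0] - k * n[0], i[1] - k * n[1], i[2] - k * n[2]};
}

struct LightFactors {
    float s;  // diffuse factor
    float t;  // specular factor
};

// Mirrors the lighting lines of the shader code: s from N.L, t from (R.V)^SpecPower.
LightFactors phongFactors(const Float3& posW, const Float3& normalW,
                          const Float3& lightPosW, const Float3& eyePosW,
                          float specPower) {
    const Float3 lightVecW = normalize3(sub3(lightPosW, posW));
    const float s = std::max(dot3(normalW, lightVecW), 0.0f);
    const Float3 toEyeW = normalize3(sub3(eyePosW, posW));
    const Float3 negLight = {-lightVecW[0], -lightVecW[1], -lightVecW[2]};
    const Float3 reflectW = reflect3(negLight, normalW);
    const float t = std::pow(std::max(dot3(reflectW, toEyeW), 0.0f), specPower);
    return {s, t};
}
```

With the light and eye both directly above a surface point whose normal is (0, 1, 0), both factors are 1; with the light below the surface, both clamp to 0. In Gouraud shading these factors are evaluated only at vertices and the resulting color is interpolated, whereas Phong shading interpolates posW and normalW and evaluates the factors at every pixel.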
Utah Teapot Shading Examples
1. HLSLTeapot
2. HLSLTeapotTwoShading
(See Demo in Visual Studio)