Areas Related to Image Processing
․Image processing can be divided into seven areas: (1) image input/digitization and image display (output); (2) image enhancement and restoration of degraded images (these involve more mathematics); (3) image coding and compression; (4) shape processing, an important way of handling image features; for example, image skeletonization is a technique commonly used in pattern recognition; (5) image analysis/segmentation/classification, feature extraction, and feature representation and description; (6) color image processing; (7) computer vision, which is inseparable from image representation, interpretation, and processing.
․Besides image processing, other disciplines that deal with pictorial information include computer graphics and pattern recognition. Pattern recognition is the back-end stage of image processing and belongs to the process referred to by the term intelligent cognition. The feature vectors of the same object under rotation, enlargement, or reduction are roughly similar, and this is one basis for pattern classification. Through image analysis an object is separated from the background and useful features are extracted, from which the image is recognized, described, segmented, and classified. This involves probability: for example, the probability that a pattern x comes from class ω_i is written p(ω_i | x), p(x | ω_k) is the probability density function of pattern x within class ω_k, and p(ω_k) is the probability that class ω_k occurs; p(x) is independent of class and is called the relative frequency of x (still a probability). Let w_k be the weight vector at the k-th iteration; whenever a pattern is misclassified the weight is changed. The training rule is:

    w_{k+1} = w_k + c x_k    if  w_k^T x_k <= 0  and  x_k belongs to ω_i
    w_{k+1} = w_k - c x_k    if  w_k^T x_k >= 0  and  x_k belongs to ω_j
    w_{k+1} = w_k            otherwise

and one often needs to compute Σx and Σx².
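․As an illustration only (the function name train_perceptron and the +1/-1 label convention are mine, not from the notes), the fixed-increment rule above can be sketched in Python as:

    import numpy as np

    def train_perceptron(samples, labels, c=1.0, epochs=100):
        """Fixed-increment perceptron training.
        samples: (n, d) array of augmented feature vectors x_k
        labels:  +1 for class omega_i, -1 for class omega_j
        """
        w = np.zeros(samples.shape[1])
        for _ in range(epochs):
            changed = False
            for x, y in zip(samples, labels):
                s = w @ x
                if y > 0 and s <= 0:        # x in omega_i but w^T x <= 0
                    w = w + c * x
                    changed = True
                elif y < 0 and s >= 0:      # x in omega_j but w^T x >= 0
                    w = w - c * x
                    changed = True
                # otherwise the weight vector is left unchanged
            if not changed:                 # every sample classified correctly
                break
        return w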
․The output of image processing is still an image, whereas the output of pattern recognition is abstract, condensed data. Computer graphics, in turn, studies how to generate the corresponding picture from a textual description (for example, a circle's center coordinates, radius, and color); basically, computer graphics and pattern recognition can be seen as two opposite processes. Image analysis tries to find, from an input image, the shape of a pattern (circle, square, ...), its center position, its color, and so on. Its relationship to computer graphics is shown below:
[Diagram: image analysis maps an image to a description, while computer graphics maps a description back to an image.]
Topic on Rasterization
․Since all modern displays are raster-oriented, the difference between raster-only and vector graphics comes down to where they are rasterised: on the client side in the case of vector graphics, as opposed to already rasterised on the (web) server.
Raster scanning (光柵式掃瞄)
․Basic Approach: The most basic algorithm takes a 3D scene, described as
polygons, and renders it onto a 2D surface, usually a computer monitor.
Polygons are themselves represented as collections of triangles. Triangles
are represented by 3 vertices in 3D-space. At a very basic level, rasterizers
simply take a stream of vertices, transform them into corresponding 2D
points on the viewer’s monitor and fill in the transformed 2D triangles as
appropriate.
Quaternions (四元數)
․Transformations are usually performed by matrix multiplication. Quaternion
math may also be used. The main transformations are translation, scaling,
rotation, and projection. A 3D vertex may be transformed by augmenting an
extra variable (known as a "homogeneous variable") and left multiplying the
resulting 4-component vertex by a 4 x 4 transformation matrix.
․A translation is simply the movement of a point from its original location to another location in 3-space by a constant offset. Translations can be represented by the following matrix, where X, Y, and Z are the offsets in the 3 dimensions, respectively:

    [ 1  0  0  X ]
    [ 0  1  0  Y ]
    [ 0  0  1  Z ]
    [ 0  0  0  1 ]

․A scaling transformation is performed by multiplying the position of a vertex by a scalar value. This has the effect of scaling a vertex with respect to the origin. Scaling can be represented by the following matrix, where X, Y, and Z are the values by which each of the 3 dimensions is multiplied. Asymmetric scaling can be accomplished by varying the values of X, Y, and Z:

    [ X  0  0  0 ]
    [ 0  Y  0  0 ]
    [ 0  0  Z  0 ]
    [ 0  0  0  1 ]
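․A minimal sketch (not part of the original notes; the helper names translation and scaling are my own) of how these matrices act on a vertex augmented with the homogeneous variable:

    import numpy as np

    def translation(tx, ty, tz):
        """4x4 homogeneous translation matrix (offsets X, Y, Z)."""
        m = np.eye(4)
        m[:3, 3] = [tx, ty, tz]
        return m

    def scaling(sx, sy, sz):
        """4x4 homogeneous scaling matrix about the origin."""
        return np.diag([sx, sy, sz, 1.0])

    # Augment a 3D vertex with the homogeneous variable and left-multiply.
    v = np.array([1.0, 2.0, 3.0, 1.0])                    # (x, y, z, 1)
    print(translation(10, 0, 0) @ scaling(2, 2, 2) @ v)   # -> [12.  4.  6.  1.]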
․Rotation matrices depend on the axis around which a point is to be rotated.
1) Rotation about the X-axis:

    [ 1    0       0     0 ]
    [ 0  cos θ  -sin θ   0 ]
    [ 0  sin θ   cos θ   0 ]
    [ 0    0       0     1 ]        (1)

2) Rotation about the Y-axis:

    [  cos θ  0  sin θ  0 ]
    [   0     1    0    0 ]
    [ -sin θ  0  cos θ  0 ]
    [   0     0    0    1 ]        (2)

3) Rotation about the Z-axis:

    [ cos θ  -sin θ  0  0 ]
    [ sin θ   cos θ  0  0 ]
    [   0       0    1  0 ]
    [   0       0    0  1 ]        (3)

θ in each of these cases represents the angle of rotation.
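․For example, a rotation about the Y-axis can be built and applied like this (an illustrative sketch; rotation_y is a hypothetical helper name):

    import numpy as np

    def rotation_y(theta):
        """4x4 homogeneous rotation about the Y-axis by angle theta (radians)."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[  c, 0.0,   s, 0.0],
                         [0.0, 1.0, 0.0, 0.0],
                         [ -s, 0.0,   c, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])

    # Rotating the point (1, 0, 0) by 90 degrees about Y moves it onto the -Z axis.
    p = np.array([1.0, 0.0, 0.0, 1.0])
    print(np.round(rotation_y(np.pi / 2) @ p, 6))   # -> [ 0.  0. -1.  1.]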
․Rasterization systems generally use a transformation stack to move the
stream of input vertices into place. The transformation stack is a standard
stack which stores matrices. Incoming vertices are multiplied by the matrix
stack. As an illustrative example, imagine a simple scene with a single model
of a person. The person is standing upright, facing an arbitrary direction while
his head is turned in another direction. The person is also located at a certain
offset from the origin. A stream of vertices, the model, would be loaded to
represent the person. First, a translation matrix would be pushed onto the
stack to move the model to the correct location. A scaling matrix would be
pushed onto the stack to size the model correctly. A rotation about the y-axis
would be pushed onto the stack to orient the model properly. Then, the
stream of vertices representing the body would be sent through the rasterizer.
Since the head is facing a different direction, the rotation matrix would be
popped off the top of the stack and a different rotation matrix about the y-axis
with a different angle would be pushed. Finally the stream of vertices
representing the head would be sent to the rasterizer.
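․The transformation-stack idea can be sketched as follows (illustrative only; MatrixStack and the helper matrices are assumed names, and a real renderer would also keep a separate projection matrix):

    import numpy as np

    def translation(tx, ty, tz):
        m = np.eye(4)
        m[:3, 3] = [tx, ty, tz]
        return m

    def rotation_y(theta):
        c, s = np.cos(theta), np.sin(theta)
        m = np.eye(4)
        m[0, 0], m[0, 2], m[2, 0], m[2, 2] = c, s, -s, c
        return m

    class MatrixStack:
        """Transformation stack: incoming vertices are multiplied by the
        composite of every matrix pushed so far (kept at the top)."""
        def __init__(self):
            self._stack = [np.eye(4)]
        def push(self, m):
            self._stack.append(self._stack[-1] @ m)   # compose with current top
        def pop(self):
            self._stack.pop()
        def transform(self, v4):
            return self._stack[-1] @ v4

    stack = MatrixStack()
    stack.push(translation(5.0, 0.0, 0.0))     # move the model into place
    stack.push(rotation_y(np.pi / 4))          # orient the body
    # ... send the body vertices through stack.transform(...) ...
    stack.pop()                                # remove the body rotation
    stack.push(rotation_y(np.pi / 3))          # different rotation for the head
    # ... send the head vertices through stack.transform(...) ...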
After all points have been transformed to their desired locations in 3-space with respect to the viewer, they must be transformed to the 2-D image plane. The orthographic projection simply involves removing the z component from the transformed 3D vertices. Orthographic projections have the property that all parallel lines in 3-space remain parallel in the 2-D representation. However, real-world images are perspective images, with distant objects appearing smaller than objects close to the viewer, so a perspective projection transformation needs to be applied to these points.
․Conceptually, the idea is to transform the perspective viewing volume into the orthogonal viewing volume. The perspective viewing volume is a frustum, that is, a truncated pyramid. The orthographic viewing volume is a rectangular box, where both the near and far viewing planes are parallel to the image plane.
․A perspective projection transformation can be represented by the following matrix:

    [ 1  0     0       0 ]
    [ 0  1     0       0 ]
    [ 0  0  (F+N)/N   -F ]
    [ 0  0    1/N      0 ]

F and N here are the distances of the far and near viewing planes, respectively. The resulting four-component vector will be a vector where the homogeneous variable is not 1. Homogenizing the vector, or multiplying it by the inverse of the homogeneous variable such that the homogeneous variable becomes unitary, gives us our resulting 2-D location in the x and y coordinates.
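․A small sketch of applying the perspective matrix above and then homogenizing (the function names perspective and project are mine; the near/far values in the example are arbitrary):

    import numpy as np

    def perspective(near, far):
        """Perspective matrix in the form used above (N = near, F = far)."""
        n, f = float(near), float(far)
        return np.array([[1, 0, 0,            0],
                         [0, 1, 0,            0],
                         [0, 0, (f + n) / n, -f],
                         [0, 0, 1 / n,        0]])

    def project(vertex3, near=1.0, far=100.0):
        """Apply the perspective matrix and homogenize to get 2-D x, y."""
        v = np.append(np.asarray(vertex3, dtype=float), 1.0)   # homogeneous
        clip = perspective(near, far) @ v
        ndc = clip / clip[3]            # divide by the homogeneous variable
        return ndc[0], ndc[1]

    print(project([2.0, 1.0, 4.0]))     # x and y are divided by z/N -> (0.5, 0.25)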
․Clipping: Once triangle vertices are transformed to their proper 2D locations, some of these locations may be outside the viewing window, or the area on the screen to which pixels will actually be written. Clipping is the process of truncating triangles to fit them inside the viewing area.
․The common technique is the Sutherland–Hodgman clipping algorithm: each of the 4 edges of the image plane is tested in turn. For each edge, test all points to be rendered; if a point is outside the edge, the point is removed. For each triangle edge that is intersected by the image plane's edge, that is, where one vertex of the edge is inside the image and the other is outside, a point is inserted at the intersection and the outside point is removed.
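․One pass of Sutherland–Hodgman clipping against a single edge might look like the sketch below (illustrative; clip_against_left_edge is a made-up name, and the full algorithm repeats this for all four edges):

    def clip_against_left_edge(polygon, x_min):
        """Keep the part of the polygon with x >= x_min.
        polygon is a list of (x, y) vertices in order."""
        def inside(p):
            return p[0] >= x_min

        def intersect(p, q):
            # Point where segment p-q crosses the vertical line x = x_min.
            t = (x_min - p[0]) / (q[0] - p[0])
            return (x_min, p[1] + t * (q[1] - p[1]))

        out = []
        for i, cur in enumerate(polygon):
            prev = polygon[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))   # entering: add crossing point
                out.append(cur)                        # keep the inside vertex
            elif inside(prev):
                out.append(intersect(prev, cur))       # leaving: add crossing point
        return out

    print(clip_against_left_edge([(-1, 0), (2, 0), (2, 2)], 0))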
․Scan conversion: The final step in the traditional rasterization process is to fill in the 2D triangles that are now in the image plane, also known as scan conversion. The first problem to consider is whether or not to draw a pixel at all. For a pixel to be rendered, it must be within a triangle, and it must not be occluded, or blocked by another pixel. The most popular algorithm for filling in pixels inside a triangle is the scanline algorithm. Since it is difficult to guarantee that the rasterization engine will draw all pixels from front to back, there must be some way of ensuring that pixels close to the viewer are not overwritten by pixels far away.
․The z buffer, the most common solution, is a 2d array corresponding to the
image plane which stores a depth value for each pixel. Whenever a pixel is
drawn, it updates the z buffer with its depth value. Any new pixel must check
its depth value against the z buffer value before it is drawn. Closer pixels are
drawn and farther pixels are disregarded.
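․A minimal z-buffer test could look like this (the image size and the depth convention, smaller value = closer, are assumptions for the sketch):

    import numpy as np

    WIDTH, HEIGHT = 640, 480
    z_buffer = np.full((HEIGHT, WIDTH), np.inf)            # smaller depth = closer
    frame    = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)

    def plot(x, y, depth, color):
        """Write the pixel only if it is closer than what is already stored."""
        if depth < z_buffer[y, x]:
            z_buffer[y, x] = depth
            frame[y, x] = color

    plot(10, 10, 5.0, (255, 0, 0))   # drawn
    plot(10, 10, 9.0, (0, 255, 0))   # farther away: disregarded
    plot(10, 10, 2.0, (0, 0, 255))   # closer: overwrites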
․To find out a pixel's color, textures and shading calculations must be applied. A texture map is a bitmap that is applied to a triangle to define its look. Each triangle vertex is associated with a texture and a texture coordinate (u,v) (for normal 2-D textures) in addition to its position coordinate. Every time a pixel on a triangle is rendered, the corresponding texel (or texture element) in the texture must be found; this is done by interpolating between the texture coordinates associated with the triangle's vertices, weighted by the pixel's on-screen distance from those vertices. In perspective projections, interpolation is performed on the texture coordinates divided by the depth of the vertex to avoid a problem known as perspective foreshortening (a process known as perspective texturing).
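․A sketch of perspective-correct interpolation of texture coordinates between two vertices (perspective_correct_uv is a made-up name; it interpolates u/z, v/z and 1/z as described above):

    def perspective_correct_uv(p0, p1, t):
        """Interpolate texture coordinates a fraction t (0..1) of the way
        across the screen between two vertices, each given as (u, v, z) with
        z the vertex depth. Interpolating u/z, v/z and 1/z linearly and
        dividing back avoids perspective foreshortening artifacts."""
        (u0, v0, z0), (u1, v1, z1) = p0, p1
        inv_z = (1 - t) / z0 + t / z1
        u = ((1 - t) * u0 / z0 + t * u1 / z1) / inv_z
        v = ((1 - t) * v0 / z0 + t * v1 / z1) / inv_z
        return u, v

    # Halfway across the screen between a near vertex (z=1) and a far vertex (z=3),
    # the correct u is weighted toward the near vertex:
    print(perspective_correct_uv((0.0, 0.0, 1.0), (1.0, 1.0, 3.0), 0.5))  # ~ (0.25, 0.25)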
․Before the final color of the pixel can be decided, a lighting calculation must
be performed to shade the pixels based on any lights which may be present
in the scene. There are generally three light types commonly used in scenes.
․Directional lights are lights which come from a single direction and have the
same intensity throughout the entire scene. In real life, sunlight comes close
to being a directional light, as the sun is so far away that rays from the sun
appear parallel to Earth observers and the falloff is negligible.
․Point lights are lights that have a definite position in space and radiate light evenly in all directions. Point lights are usually subject to some form of attenuation, or fall-off in the intensity of light incident on objects farther away; real-life light sources experience quadratic fall-off. Finally, spotlights are like real-life spotlights, with a definite point in space, a direction, and an angle defining the cone of the spotlight. There is also often an ambient light value that is added to all final lighting calculations to arbitrarily compensate for global illumination effects which rasterization cannot calculate correctly.
․All shading algorithms need to account for distance from light and the normal
vector of the shaded object with respect to the incident direction of light. The
fastest algorithms simply shade all pixels on any given triangle with a single
lighting value, known as flat shading.
․With flat shading there is no way to create the illusion of smooth surfaces except by subdividing the model into many small triangles. Algorithms can also shade the vertices separately and interpolate the lighting value of the vertices when drawing pixels; this is known as Gouraud shading. The slowest and most realistic approach is to calculate lighting separately for each pixel, known as Phong shading. This performs bilinear interpolation of the normal vectors and uses the result to do a local lighting calculation.
․Acceleration techniques: To extract the maximum performance out of any rasterization engine, a minimum number of polygons should be sent to the renderer, culling out objects which cannot be seen.
․Backface culling: The simplest way to cull polygons is to cull all polygons which face away from the viewer, known as backface culling. Since most 3D objects are fully enclosed, polygons facing away from a viewer are always blocked by polygons facing towards the viewer unless the viewer is inside the object. A polygon's facing is defined by its winding, or the order in which its vertices are sent to the renderer. A renderer can define either clockwise or counterclockwise winding as front or back facing. Once a polygon has been transformed to screen space, its winding can be checked and, if it is in the opposite direction, it is not drawn at all. Note: backface culling cannot be used with degenerate and unclosed volumes.
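․A screen-space winding test for backface culling might be sketched as follows (the counter-clockwise-is-front convention and the y-axis orientation are assumptions of this sketch):

    def is_back_facing(v0, v1, v2):
        """Screen-space winding test. v0, v1, v2 are the projected 2-D
        vertices of a triangle in the order they were sent to the renderer.
        The sign of the signed area tells whether the winding appears
        clockwise or counter-clockwise; counter-clockwise is front facing here."""
        signed_area = (v1[0] - v0[0]) * (v2[1] - v0[1]) - \
                      (v2[0] - v0[0]) * (v1[1] - v0[1])
        return signed_area <= 0      # cull degenerate and clockwise triangles

    print(is_back_facing((0, 0), (1, 0), (0, 1)))   # False: counter-clockwise
    print(is_back_facing((0, 0), (0, 1), (1, 0)))   # True: clockwise, culled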
․Spatial data structures can be used to cull out objects which are either outside the viewing volume or are occluded by other objects. The most common are binary space partitions, octrees, and cell-and-portal culling.
․Texture filtering is a further refinement used to create clean images at any distance. Textures are created at specific resolutions, but since the surface they are applied to may be at any distance from the viewer, they can show up at arbitrary sizes on the final image. As a result, one pixel on screen usually does not correspond directly to one texel.
․Environment mapping is a form of texture mapping in which the texture
coordinates are view-dependent. One common application, for example, is to
simulate reflection on a shiny object. One can environment map the interior of
a room to a metal cup in a room. As the viewer moves about the cup, the
texture coordinates of the cup’s vertices move accordingly, providing the
illusion of reflective metal.
․Bump mapping is another form of texture mapping which does not provide
pixels with color, but rather with depth. Especially with modern pixel shaders,
bump mapping creates the feel of view and lighting-dependent roughness on
a surface to enhance realism greatly.
․Level of detail: Though the number of polygons in any scene can be phenomenal, a viewer in a scene will only be able to discern details of close-by objects. Objects right in front of the viewer can be rendered at full complexity, while objects farther away can be simplified dynamically, or even replaced completely with sprites.
․Shadow mapping and shadow volumes are two common modern techniques
for creating shadows, taking object occlusion into consideration.
․Hardware acceleration: Most modern programs are written to interface with
one of the existing graphics APIs, which drives a dedicated GPU. The latest
GPUs feature support for programmable pixel shaders which drastically
improve the capabilities of programmers. The trend is towards full programmability of the graphics pipeline.
Graphics File Formats
․Bitmap is a two-dimensional array of values, and each element of the array
corresponds to a single dot in a picture, i.e. a pixel. In most cases a pixel’s
value is an index into a table of colors, indicating that the pixel is to be
displayed using the color pointed to by the index. The colors in the table are
collectively referred to as a palette.
․Graphics file formats are termed bitmapped or vector formats. They are fundamentally different: the former stores a complete, digitally encoded image, while the latter represents a picture as a series of lines, arcs, circles, and text, something like "move to (100,100), select color blue, circle with radius 50", etc. The main disadvantage of vector formats is that they can only reasonably describe a line drawing, not a photographic image. Metafiles contain a list of image-creation commands along with vectors and circles, and are really programs; drawing the image such a file describes is impractical without access to the graphics package it depends on.
․If the palette sizes used are 16 or 256 colors, then the corresponding index sizes are 4 and 8 bits, referred to as the number of bits per pixel. In a bitmap of 4 bits per pixel, each byte holds 2 separate pixel values; in a bitmap of 8 bits per pixel, each byte represents a single pixel.
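․For example, unpacking a 4-bit-per-pixel scan line into palette indices could be sketched like this (the function and the 16-level gray palette are illustrative, not from any particular format):

    def unpack_4bpp_scanline(raw_bytes, width):
        """Expand a 4-bit-per-pixel scan line into one palette index per pixel.
        Each byte holds two pixels: high nibble first, then low nibble."""
        pixels = []
        for b in raw_bytes:
            pixels.append((b >> 4) & 0x0F)   # first pixel
            pixels.append(b & 0x0F)          # second pixel
        return pixels[:width]                # drop padding if width is odd

    palette = [(i * 17, i * 17, i * 17) for i in range(16)]    # 16 gray levels
    indices = unpack_4bpp_scanline(bytes([0x01, 0x23]), 4)     # -> [0, 1, 2, 3]
    colors  = [palette[i] for i in indices]                    # palette lookup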
․Bitmaps that represent very large numbers of colors simultaneously generally
do not employ the palette scheme, but a pixel’s value directly defines a color.
․Pixel Ordering: The simplest scheme is to store the pixels a row at a time. Each row is referred to as a scan line; pixels are most often stored from left to right, with rows from top to bottom.
․Image Geometry: Every computer image has an internal geometry used to position elements in the picture. The 2 most common are screen coordinates and graph coordinates. The former is commonly used for display: the origin (0,0) is at the top-left corner and y increases downward (the two scales may be different, as on the IBM VGA, for example). The latter is often used when the image is to be printed on paper: the origin is at the bottom-left and y increases upward.
[Figure: screen coordinates (left) and graph coordinates (right).]
․Bitmapped Graphics File Formats:
BMP: Microsoft Windows Bitmap, a general-purpose format for bitmapped images.
GIF: CompuServe Graphics Interchange Format, a general-purpose format originally meant for transmitting images by modem; it uses data compression to reduce transmission times and also supports interlaced images.
TIFF: Aldus/Microsoft Tagged Image File Format, a complex, multipurpose, open-ended format supporting all types of bitmaps and bitmap-related measures.
JPEG: Joint Photographic Experts Group (under the auspices of the ISO), fundamentally a bitmapped format. Instead of storing individual pixels, it stores blocks of data from which blocks of pixels can be approximately reconstructed; it is therefore called lossy.
․Interleaving: The simplest scheme is to store the even-numbered rows, then the odd rows, i.e. 0, 2, 4, ..., 1, 3, 5, ...; or perhaps 0, 2, 4, ..., 98, 99, 97, 95, ..., 3, 1, supposing there were a total of 100 rows. The original point is to match the order of scan lines used on TV, i.e. even-down, odd-up. Another advantage is that one can quickly construct an approximate version of the image without having to read the whole file.
․GIF uses a four-way interleave that first stores every eighth row, then three
more sets of rows, each of which fills in the rows halfway between the ones
already stored. GIF is copyrighted but freely used, and employs patented
LZW compression.
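․The row order produced by GIF's four-way interleave can be sketched as follows (gif_interlace_order is a made-up helper; pass offsets 0, 4, 2, 1 with steps 8, 8, 4, 2 are the standard GIF passes):

    def gif_interlace_order(height):
        """Row storage order used by GIF's four-way interlace: every eighth
        row first, then three more passes that each fill in the rows halfway
        between the rows already stored."""
        order = []
        for start, step in ((0, 8), (4, 8), (2, 4), (1, 2)):
            order.extend(range(start, height, step))
        return order

    print(gif_interlace_order(16))
    # [0, 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]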
․The most practical approach to dealing with a bitmapped image is to treat it
as a collection of scan lines--- writing functions that read and write scan lines,
display scan lines, and the like.
(The following is supplementary material)
****************************************************************************************************************
․Most adaptive-dictionary-based techniques have their roots in two landmark papers by Jacob Ziv and Abraham Lempel in 1977 and 1978. These gave rise to what we call the LZ77 family (also known as LZ1) and the LZ78 or LZ2 family. The most well-known modification of LZ2 is the one by Terry Welch, known as LZW.
․LZW is called a dictionary-based compression method: highly repetitive source data or strings are encoded as index values, and the indexes replace the original data to achieve compression. Because the method is simple, it is well suited to hardware implementation; for example, the character 'a' can be replaced on output by an index value such as 100. When a new string appears, a new index is added and the encoding is extended, reducing the data volume and thereby achieving compression. It is mostly used for images with large areas of solid color, and LZW is supported in formats such as TIFF, PDF, GIF, and PostScript. Its drawback is the extra time needed when storing (encoding) the data.
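․A toy LZW encoder, to make the dictionary idea concrete (this sketch emits integer codes and ignores the fixed-width code packing that real GIF/TIFF writers perform):

    def lzw_compress(data):
        """Minimal LZW encoder: repeated strings are replaced by dictionary
        index values; new strings are added to the dictionary as they appear."""
        dictionary = {bytes([i]): i for i in range(256)}
        next_code = 256
        current = b""
        output = []
        for byte in data:
            candidate = current + bytes([byte])
            if candidate in dictionary:
                current = candidate                 # keep growing the match
            else:
                output.append(dictionary[current])  # emit index for longest match
                dictionary[candidate] = next_code   # learn the new string
                next_code += 1
                current = bytes([byte])
        if current:
            output.append(dictionary[current])
        return output

    print(lzw_compress(b"aaaaabaaaaab"))   # runs of repeated bytes compress well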
․Printer Data Files: There are 2 general types, namely extended text formats and page description languages. The former embed picture information inside a conventional text stream; that is, plain text prints as itself, and escape sequences introduce non-text elements. Hewlett-Packard's PCL, a de facto standard for low- to medium-performance laser printers, is an example. The other approach is to define an entirely new language to describe what is to be printed on the page; PostScript, which became the standard page description language, is an example.
Converting File Types
1) Bitmap to bitmap: one reads a file format, extracts the array of pixels, and then writes the same pixels in any other format; the PBM utilities (PGM, PPM) support such transformations, for example. Image transformation of this kind has nothing to do with file processing per se!
․Promoting from a less expressive format to a more expressive format changes nothing at all: a white pixel remains white, a 50 percent gray remains a 50 percent gray, and so forth. Conversion in the reverse direction is not as easy; the goal is to produce the best-looking image possible given the limitations of the new format.
2) Color to Gray Conversion: For each pixel, one need only determine the
pixel’s luminance, a value conventionally computed from the component
values as Y(or L) in slide 2.
․Color Quantization: Sometimes one has a color image with more colors than the local hardware can handle, such as a full-color image to be displayed on a screen that can only show 64 or 256 colors. A process called quantization selects a representative set of colors and then assigns one of those colors to each pixel. For example, a digitized photograph using 246 gray-scale values displayed on a screen with 3 bits per pixel, i.e. showing only 8 levels, is much coarser but still recognizable.
․Dithering: The limited number of colors is quite noticeable in areas with gradual changes of color. One way to decrease this effect is dithering: spreading the quantization error around from pixel to pixel to avoid unwanted step effects. Using the same 8 colors together with dithering produces a much smoother image than the previous example.
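․One common way of spreading the quantization error is Floyd-Steinberg error diffusion; a rough grayscale sketch (not the notes' own example) is:

    import numpy as np

    def floyd_steinberg(gray, levels=8):
        """Error-diffusion dithering on a grayscale image in [0, 1]: each pixel
        is quantized to the nearest of `levels` values and the quantization
        error is spread to neighbours that have not been processed yet."""
        img = gray.astype(float).copy()
        h, w = img.shape
        for y in range(h):
            for x in range(w):
                old = img[y, x]
                new = round(old * (levels - 1)) / (levels - 1)
                img[y, x] = new
                err = old - new
                if x + 1 < w:
                    img[y, x + 1] += err * 7 / 16
                if y + 1 < h:
                    if x > 0:
                        img[y + 1, x - 1] += err * 3 / 16
                    img[y + 1, x] += err * 5 / 16
                    if x + 1 < w:
                        img[y + 1, x + 1] += err * 1 / 16
        return img

    ramp = np.tile(np.linspace(0, 1, 64), (16, 1))    # smooth gradient
    dithered = floyd_steinberg(ramp, levels=8)        # 8 levels, far fewer visible steps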
3) Vector to vector conversion reconciles the slightly different semantics of different formats and, to some degree, handles differing coordinate systems. For example, a 'circle' command in the original turns into a 'circle' command in the translated file. Problems arise when the two formats don't have corresponding commands; one might approximate the missing command or simulate it with a series of short line segments.
4) Vector to bitmap rasterization is the task of taking an image described in a
vector graphics format (shapes) and converting it into a raster image (pixels
or dots) for output on a video display or printer, or for storage in a bitmap file
format. Rasterization refers to the popular rendering algorithm for displaying
three-dimensional shapes on a computer. Real-time applications need to
respond immediately to user input, and generally need to produce frame
rates of at least 25 frames per second to achieve smooth animation.
Rasterization is simply the process of computing the mapping from scene
geometry to pixels and does not prescribe a particular way to compute the
color of those pixels. Shading, including programmable shading, may be
based on physical light transport, or artistic intent.
5) Bitmap to vector conversion is more difficult than any of the previous types.
Determining a File Format
․A frequent problem in graphics file processing is determining the format of a
particular file. The easiest but least reliable way to do so is to use the file’s
extension. On most systems, a PCX file name ends with .PCX, a TIFF file
with .TIF or .TIFF and what not. A more reliable technique is what’s known as
the magic number approach. Nearly all file formats have an identifiable byte
string, either by deliberate design or by fortunate coincidence, at or near the
beginning of the file.
․A table of common types and magic numbers follows. The length and offset are in decimal, with offset 0 being the beginning of the file. The magic numbers are written either as pairs of characters, meaning hex byte values, or as single characters, meaning literal ASCII characters, and they are written in the actual order the bytes appear in the file.
format       offset  length  value; comments
MacPaint     0       4       00 00 00 02; version number field, not always set correctly
PCX          0       1       0A
GEM IMG      0       4       00 01 00 08; the fourth byte may also be 09
IFF/ILBM     0       4       FORM
BMP          0       2       BM
Targa        0       1       00; no magic number at the front: the first byte of the file is the
                             length of the image ID, almost always 0 since most files don't have
                             an ID. More advanced editions have an identifying string at the end
                             of the file, "TRUEVISIONTARGA", followed by a period and a byte of
                             binary zeros.
GIF 87       0       6       GIF87a; no extensions
GIF 89       0       6       GIF89a
JFIF         0       11      FF D8 FF E0 xx xx 4A 46 49 46 00; the xx bytes vary from file to file
HP-GL        0       4       1B % 0 A; files usually start with an HP-GL command, which is two
                             uppercase letters followed by a digit or semicolon (a file to be
                             printed on a PCL5 printer starts with this string to switch to
                             HP-GL mode; the 0 is sometimes replaced by 1, 2, or 3)
WMF          0       6       01 00 09 00 00 03; the 9 is the header size, which may in theory be
                             larger if the header is extended, and the 03 is the Windows version
                             number. Placeable WMF files instead begin with the placeable WMF
                             header D7 CD C6 9A.
PCL          0       2       1B E; most files begin with this printer reset sequence (some begin
                             with a longer sequence starting with 1B and an asterisk)
PostScript   0       2       % !; the header string is not mandatory, but is found in nearly all files
PBM          0       2       P 1; P 4 for raw PBM
PGM          0       2       P 2; P 5 for raw PGM
PPM          0       2       P 3; P 6 for raw PPM
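․A small sketch of the magic-number approach, covering only a subset of the table above (sniff_format is a made-up name):

    def sniff_format(path):
        """Identify a handful of graphics file formats by their magic numbers."""
        with open(path, "rb") as f:
            head = f.read(16)
        if head.startswith(b"GIF87a"):
            return "GIF 87"
        if head.startswith(b"GIF89a"):
            return "GIF 89"
        if head.startswith(b"BM"):
            return "BMP"
        if head.startswith(b"FORM"):
            return "IFF/ILBM"
        if head.startswith(b"\xff\xd8\xff\xe0") and head[6:11] == b"JFIF\x00":
            return "JFIF"
        if head.startswith(b"%!"):
            return "PostScript"
        if head[:2] in (b"P1", b"P2", b"P3", b"P4", b"P5", b"P6"):
            return "PBM/PGM/PPM"
        if head[:1] == b"\x0a":
            return "PCX (weak magic, only one byte)"
        return "unknown"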
Refraction
․Refraction is the change in direction of a wave due to a change in its speed.
This is most commonly observed when a wave passes from one medium to
another. Refraction of light is the most commonly observed phenomenon, but
any type of wave can refract when it interacts with a medium, for example
when sound waves pass from one medium into another or when water waves
move into water of a different depth. Refraction is described by Snell's law, which states that the angle of incidence θ1 is related to the angle of refraction θ2 by

    sin θ1 / sin θ2 = v1 / v2 = n2 / n1,

where v1 and v2 are the wave velocities in the respective media, and n1 and n2 the refractive indices. In general, the incident wave is partially refracted and partially reflected; the details of this behavior are described by the Fresnel equations.
․The refractive index of a transparent substance or material is defined as the relative speed at which light moves through the material with respect to its speed in a vacuum. By convention, the refractive index of a vacuum is defined as having a value of 1.0, which serves as a universally accepted reference point. The index of refraction of other transparent materials, commonly identified by the variable n, is defined through the equation n (refractive index) = c/v, where c is the speed of light in a vacuum and v is the velocity of light in the material.
[Figure: An image of the Golden Gate Bridge is refracted and bent by many differing three-dimensional pools of water.]
Refraction in a Perspex (acrylic) block.
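․A quick numeric check of Snell's law (refract_angle is a made-up helper; 1.49 is a typical refractive index for acrylic):

    import math

    def refract_angle(theta1_deg, n1, n2):
        """Snell's law: n1*sin(theta1) = n2*sin(theta2). Returns the refraction
        angle in degrees, or None for total internal reflection."""
        s = n1 * math.sin(math.radians(theta1_deg)) / n2
        if abs(s) > 1.0:
            return None                      # total internal reflection
        return math.degrees(math.asin(s))

    # Light entering acrylic from air at 45 degrees bends toward the normal:
    print(round(refract_angle(45, 1.0, 1.49), 1))   # ~28.3 degrees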
Reflection mapping
(The following is supplementary material)
․In computer graphics, reflection mapping is an efficient method of simulating
a complex mirroring surface by means of a precomputed texture image. The
texture is used to store the image of the environment surrounding the
rendered object. There are several ways of storing the surrounding
environment; the most common ones are the Spherical Environment
Mapping in which a single texture contains the image of the surrounding as
reflected on a mirror ball, or the Cubic Environment Mapping in which the
environment is unfolded onto the six faces of a cube and stored therefore as
six square textures.
․This kind of approach is more efficient than the classical ray tracing approach of computing the exact reflection by shooting a ray and following its optically exact path, but these are (sometimes crude) approximations of the real reflection. Another important advantage is that it is the only way to create reflections of real-world backgrounds in synthetic objects. A typical drawback of this technique is the absence of self-reflections: you cannot see any part of the reflected object inside the reflection itself.
[Figure: An example of reflection mapping.]
․Spherical environment mapping (sometimes known as standard environment mapping) involves the use of a textured hollow sphere whose inside surface has no parallax (視差) in relation to the object that reflects it (i.e. every point on the surface of the object reflects the same spherical data). A spherical texture
is created, using a fisheye lens or via prerendering a preexisting virtual scene,
and is mapped onto the sphere. Pixel colors in the final rendering pass are
determined by calculating the reflection vectors from the points on the object
to the texels in the environment map. This technique often produces results
which are superficially similar to those generated by raytracing, but is less
computationally expensive due to the colors of the points to be referenced
being known beforehand, simplifying the GPU workload down to calculating
the angles of incidence and reflection.
․There are limitations to spherical mapping that detract from their realism.
Because spherical maps are stored as azimuthal projections of the
environments they represent, there is an abrupt point of singularity (a “black
hole” effect) visible in the reflection on the object where texel colors at or near
the edge of the map are distorted due to inadequate resolution to represent
the points accurately.
․Cube mapping was developed to address this issue.
If cube maps are made and filtered correctly, they
have no visible seams (see below for detailed
explanation). They have since superseded sphere
maps in many contemporary graphical applications,
namely realtime rendering.
․Cube mapped reflection is a technique that uses
cube mapping to make objects look like they reflect
the environment around them. Generally, this is
done with the same skybox that is used in outdoor
renderings. Although this is not a true reflection
since objects around the reflective one will not be
seen in the reflection, the desired effect is usually
achieved.
[Figure: A diagram depicting a reflection being provided by cube mapped reflection. The map is projected onto the surface from the point of view of the observer. Highlights, which in raytracing would be provided by tracing the ray and determining the angle made with the normal, can be 'fudged' if they are manually painted into the texture field (or if they already appear there, depending on how the texture map was obtained), from where they will be projected onto the mapped object along with the rest of the texture detail.]
․Cube mapped reflection is done by determining the vector along which the object is being viewed. This camera ray is reflected about the surface normal at the point where the camera vector intersects the object.
․This results in the reflected ray, which is then passed to the cube map to get the texel which the camera then sees as if it were on the surface of the object. This creates the effect that the object is reflective.
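․The reflection of the camera ray about the surface normal can be sketched as follows (reflect is a made-up helper; picking the cube face by the dominant axis of the reflected ray is a simplification of a real cube-map lookup):

    import numpy as np

    def reflect(incident, normal):
        """Reflect the camera-to-surface ray about the surface normal:
        R = I - 2 (N . I) N, with N normalized."""
        n = normal / np.linalg.norm(normal)
        return incident - 2.0 * np.dot(n, incident) * n

    # The reflected direction is what gets looked up in the cube map:
    view_ray = np.array([0.0, -1.0, 1.0])          # camera looking down and forward
    surface_normal = np.array([0.0, 1.0, 0.0])     # flat, upward-facing surface
    r = reflect(view_ray, surface_normal)          # -> [0., 1., 1.]
    face = np.argmax(np.abs(r))                    # dominant axis picks the cube face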
․HEALPix environment mapping is a technique basically like cube mapping, but it uses a HEALPix map because it preserves detail better than a cube map does.
․Application in real-time 3D graphics: Cube mapped reflection, when used correctly, may be the fastest method of rendering a reflective surface. To increase the speed of rendering, each vertex calculates the position of the reflected ray. Then, the position is interpolated across the polygons to which the vertex is attached. This eliminates the need for recalculating every pixel's reflection.
[Figure: Example of a three-dimensional model using cube mapped reflection.]
․If normal mapping is used, each polygon has many face normals (the
direction a given point on a polygon is facing), which can be used in tandem
with an environment map to produce a more realistic reflection. In this case,
the angle of reflection at a given point on a polygon will take the normal map
into consideration. This technique is used to make an otherwise flat surface
appear textured, for example corrugated metal, or brushed aluminium.
․HEALPix mapped reflection, like cube mapping, is the fastest method of
rendering a reflective surface when used correctly.
Lambertian reflectance
․If a surface exhibits Lambertian reflectance, light falling on it is scattered such
that the apparent brightness of the surface to an observer is the same
regardless of the observer's angle of view. More technically, the surface
luminance is isotropic. For example, unfinished wood exhibits roughly
Lambertian reflectance, but wood finished with a glossy coat of polyurethane
(聚氨酯) does not, since specular highlights may appear at different locations
on the surface. Not all rough surfaces are perfect Lambertian reflectors, but
this is often a good approximation when the characteristics of the surface are
unknown. Lambertian reflectance is named after Johann Heinrich Lambert.
․In computer graphics, Lambertian reflection is often used as a model for
diffuse reflection. This technique causes all closed polygons (such as a
triangle within a 3D mesh) to reflect light equally in all directions when
rendered. The effect this has from the viewer's perspective is that rotating or
scaling the object does not change the apparent brightness of its surface. The
reflection is calculated by taking the dot product of the surface's normalized
normal vector N, and a normalized light-direction vector L, pointing from the
surface to the light source. This number is then multiplied by the color of the
surface and the intensity of the light hitting the surface: I_D = (L·N) C I_L, where I_D is the intensity of the diffusely reflected light (surface brightness), C is the color and I_L is the intensity of the incoming light. Because L·N = |N||L| cos θ, where θ is the angle between the directions of the two vectors, the intensity will be highest if the normal vector points in the same direction as the light vector (cos 0 = 1, the surface will be perpendicular to the direction of the light), and lowest if the normal vector is perpendicular to the light vector (cos(π/2) = 0, the surface runs parallel with the direction of the light).
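․A direct transcription of the diffuse term above (lambert_diffuse is a made-up name; clamping the dot product at zero is the usual practical convention):

    import numpy as np

    def lambert_diffuse(normal, light_dir, surface_color, light_intensity):
        """I_D = (L . N) * C * I_L with N and L normalized; clamped at zero so
        surfaces facing away from the light receive no diffuse contribution."""
        n = normal / np.linalg.norm(normal)
        l = light_dir / np.linalg.norm(light_dir)
        return max(np.dot(n, l), 0.0) * np.asarray(surface_color) * light_intensity

    print(lambert_diffuse([0, 0, 1], [0, 0, 1], [1.0, 0.2, 0.2], 1.0))  # full brightness
    print(lambert_diffuse([0, 0, 1], [1, 0, 0], [1.0, 0.2, 0.2], 1.0))  # grazing: zero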
․Lambertian reflection is typically accompanied by specular reflection, where
the surface luminance is highest when the observer is situated at the perfect
reflection direction, and falls off sharply. This is simulated in computer
graphics with various specular reflection models such as Phong, Cook-Torrance, etc. Spectralon (a thermoplastic resin that can be machined into a wide variety of shapes for the fabrication of optical components; Spectralon gives the highest diffuse reflectance of any known material) is a material which is designed to exhibit an almost perfect Lambertian reflectance, while Scotchlite (a retroreflective material whose surface carries retro-reflectors, i.e. it reflects light back toward the direction of its source; designed specifically for safety, it uses roughly ten million fully reflecting corner cubes per square inch to greatly increase the wearer's visibility in dim light or at night) is a material designed with the opposite intent of only reflecting light on
one line of sight. While Lambertian reflectance usually refers to the reflection
of light by an object, it can be used to refer to the reflection of any wave. For
example, in ultrasound imaging, "rough" tissues are said to exhibit Lambertian
reflectance.
Lambert's cosine law
․In optics, Lambert's cosine law says that the radiant intensity observed from a
"Lambertian" surface is directly proportional to the cosine of the angle θ
between the observer's line of sight and the surface normal. The law is also
known as the cosine emission law or Lambert's emission law.
․An important consequence of Lambert's cosine law is that when such a
surface is viewed from any angle, it has the same apparent radiance. This
means, for example, that to the human eye it has the same apparent
brightness (or luminance). It has the same radiance because, although the
emitted power from a given area element is reduced by the cosine of the
emission angle, the size of the observed area is decreased by a corresponding amount. Therefore, its radiance (power per unit solid angle per unit
projected source area) is the same. For example, in the visible spectrum, the
Sun is not a Lambertian radiator; its brightness is a maximum at the center of
the solar disk, an example of limb darkening. A black body is a perfect
Lambertian radiator.
․Lambertian scatterers: When an area element is radiating as a result of being
illuminated by an external source, the irradiance (energy or photons/time/area)
landing on that area element will be proportional to the cosine of the angle
between the illuminating source and the normal. A Lambertian scatterer will
then scatter this light according to the same cosine law as a Lambertian
emitter. This means that although the radiance of the surface depends on the
angle from the normal to the illuminating source, it will not depend on the
angle from the normal to the observer. For example, if the moon were a
Lambertian scatterer, one would expect to see its scattered brightness
appreciably diminish towards the terminator due to the increased angle at
which sunlight hit the surface. The fact that it does not diminish illustrates that
the moon is not a Lambertian scatterer, and in fact tends to scatter more light
into the oblique angles than would a Lambertian scatterer.
Details of the equal-brightness effect:
The situation for a Lambertian surface (emitting or scattering) is illustrated in Figures 1 and 2. For conceptual clarity, we will think in terms of photons rather than energy or luminous energy. The wedges in the circle each represent an equal angle dΩ and, for a Lambertian surface, the number of photons per second emitted into each wedge is proportional to the area of the wedge. It can be seen that the length of each wedge is the product of the diameter of the circle and cos θ. It can also be seen that the maximum rate of photon emission per unit solid angle is along the normal and diminishes to zero for θ = 90°. In mathematical terms, the radiance along the normal is I photons/(s·cm²·sr) and the number of photons per second emitted into the vertical wedge is I dΩ dA. The number of photons per second emitted into the wedge at angle θ is I cos(θ) dΩ dA.
[Figure 1: Emission rate (photons/s) in a normal and off-normal direction. The number of photons/s directed into any wedge is proportional to the area of the wedge.]
Figure 2 represents what an observer sees. The observer directly above the area element will be seeing the scene through an aperture of area dA0, and the area element dA will subtend a (solid) angle of dΩ0. We can assume without loss of generality that the aperture happens to subtend solid angle dΩ when "viewed" from the emitting area element. This normal observer will then be recording I dΩ dA photons per second and so will be measuring a radiance of

    I dΩ dA / (dΩ0 dA0)   photons/(s·cm²·sr).

The observer at angle θ to the normal will be seeing the scene through the same aperture of area dA0, and the area element dA will subtend a (solid) angle of dΩ0 cos θ. This observer will be recording I cos θ dΩ dA photons per second, and so will be measuring a radiance of

    I cos θ dΩ dA / (dΩ0 cos θ dA0) = I dΩ dA / (dΩ0 dA0)   photons/(s·cm²·sr),

which is the same as the normal observer.
[Figure 2: Observed intensity (photons/(s·cm²·sr)) for a normal and off-normal observer; dA0 is the area of the observing aperture and dΩ is the solid angle subtended by the aperture from the viewpoint of the emitting area element.]
Specular highlight
․A specular highlight is the bright spot of light that appears on shiny objects when illuminated (for example, see the image at right). Specular highlights are important in 3D computer graphics, as they provide a strong visual cue for the shape of an object and its location with respect to light sources in the scene.
[Figure: Specular highlights on a pair of spheres.]
․Microfacets: The term specular means that light is perfectly reflected in a
mirror-like way from the light source to the viewer. Specular reflection is
visible only where the surface normal is oriented precisely halfway between
the direction of incoming light and the direction of the viewer; this is called the
half-angle direction because it bisects (divides into halves) the angle between
the incoming light and the viewer. Thus, a specularly reflecting surface would
show a specular highlight as the perfectly sharp reflected image of a light
source. However, many shiny objects show blurred specular highlights.
This can be explained by the existence of microfacets. We assume that
surfaces that are not perfectly smooth are composed of many very tiny facets,
each of which is a perfect specular reflector. These microfacets have normals
that are distributed about the normal of the approximating smooth surface.
The degree to which microfacet normals differ from the smooth surface
normal is determined by the roughness of the surface.
․The reason for blurred specular highlights is now clear. At points on the
object where the smooth normal is close to the half-angle direction, many of
the microfacets point in the half-angle direction and so the specular highlight
is bright. As one moves away from the center of the highlight, the smooth
normal and the half-angle direction get farther apart; the number of
microfacets oriented in the half-angle direction falls, and so the intensity of the
highlight falls off to zero.
․The specular highlight often reflects the color of the light source, not the color
of the reflecting object. This is because many materials have a thin layer of
clear material above the surface of the pigmented material. For example
plastic is made up of tiny beads of color suspended in a clear polymer and
human skin often has a thin layer of oil or sweat above the pigmented cells.
Such materials will show specular highlights in which all parts of the color
spectrum are reflected equally. On metallic materials such as gold the color of
the specular highlight will reflect the color of the material.
․Models of microfacets: A number of different models exist to predict the
distribution of microfacets. Most assume that the microfacet normals are
distributed evenly around the normal; these models are called isotropic. If
microfacets are distributed with a preference for a certain direction along the
surface, the distribution is anisotropic.
NOTE: In most equations, when it says (A∙B) it means Max(0, (A∙B)).
․Phong distribution: In the Phong reflection model, the intensity of the specular highlight is calculated as k_spec = (R·V)^n (with R and V normalized; equivalently cos^n β, where β is the angle between them), where R is the mirror reflection of the light vector off the surface, and V is the viewpoint vector.
․In the Blinn-Phong shading model, the intensity of a specular highlight is calculated as k_spec = (N·H)^n (equivalently cos^n β, where β is the angle between N and H), where N is the smooth surface normal and H is the half-angle direction (the direction vector midway between L, the vector to the light, and V, the viewpoint vector).
․The number n is called the Phong exponent, and is a user-chosen value that
controls the apparent smoothness of the surface. These equations imply that
the distribution of microfacet normals is an approximately Gaussian
distribution, or approximately Pearson type II distribution, of the corresponding angle. While this is a useful heuristic and produces believable results, it is
not a physically based model.
․A slightly better model of microfacet distribution can be created using a Gaussian distribution. The usual function calculates specular highlight intensity as k_spec = exp(-(β/m)^2), where β is the angle between N and H and m is a constant between 0 and 1 that controls the apparent smoothness of the surface.
․A physically based model of microfacet distribution is the Beckmann distribution. This function gives very accurate results, but is also rather expensive to compute:

    k_spec = exp(-tan^2(α) / m^2) / (π m^2 cos^4(α)),

where m is the average slope of the surface microfacets and α is the angle between N and H.
․The Heidrich-Seidel distribution is a simple anisotropic distribution, based on the Phong model. It can be used to model surfaces that have small parallel grooves or fibers, such as brushed metal, satin, and hair. The specular highlight intensity for this distribution is expressed in terms of n, L, V, and T, where n is the anisotropic exponent, V is the viewing direction, L is the direction of incoming light, and T is the direction parallel to the grooves or fibers at this point on the surface. If you have a unit vector D which specifies the global direction of the anisotropic distribution, you can compute the vector T at a given point by projecting D onto the tangent plane, T = D - (D·N) N (then normalized), where N is the unit normal vector at that point on the surface. You can also easily compute the cosine of the angle between the vectors by using a property of the dot product and the sine of the angle by using the trigonometric identities.
․Note that the anisotropic k_spec should be used in conjunction with a non-anisotropic distribution like a Phong distribution to produce the correct specular highlight.
․The Ward anisotropic distribution uses two user-controllable parameters, αx and αy, to control the anisotropy. If the two parameters are equal, an isotropic highlight results. The specular term in the distribution is zero if N·L < 0 or N·E < 0; all vectors are unit vectors.
The vector V is the vector from the surface point to the eye, L is the direction
from the surface point to the light, H is the half-angle direction, N is the
surface normal, and X and Y are two orthogonal vectors in the normal plane
which specify the anisotropic directions.
․The Cook-Torrance model uses a specular term of the form k_spec = D F G / (π (E·N)(N·L)). Here D is the Beckmann distribution factor and F is the Fresnel term; for performance reasons in real-time 3D graphics, Schlick's approximation F ≈ F0 + (1 - F0)(1 - E·H)^5 (with F0 the reflectance at normal incidence) is often used to approximate the Fresnel term. G is the geometric attenuation term, describing self-shadowing due to the microfacets, and is of the form

    G = min( 1, 2(H·N)(E·N)/(E·H), 2(H·N)(L·N)/(E·H) ).

In these formulas E is the vector to the camera or eye, H is the half-angle vector, L is the vector to the light source, N is the normal vector, and α is the angle between H and N.
․Using multiple distributions: If desired, different distributions (usually, using
the same distribution function with different values of m or n) can be
combined using a weighted average. This is useful for modeling, for example,
surfaces that have small smooth and rough patches rather than uniform
roughness.
Phong shading
․Phong shading refers to a set of techniques in 3D computer graphics. Phong
shading includes a model for the reflection of light from surfaces and a
compatible method of estimating pixel colors by interpolating surface normals
across rasterized polygons.
․The model of reflection may also be referred to as the Phong reflection model,
Phong illumination or Phong lighting. It may be called Phong shading in the
context of pixel shaders or other places where a lighting calculation can be
referred to as "shading". The interpolation method may also be called Phong
interpolation, which is usually referred to by "per-pixel lighting". Typically it is
called "shading" when contrasted with other interpolation methods such as
Gouraud shading or flat shading. The Phong reflection model may be used in
conjunction with any of these interpolation methods.
․Phong reflection is an empirical model of local illumination. It describes the
way a surface reflects light as a combination of the diffuse reflection of rough
surfaces with the specular reflection of shiny surfaces. It is based on Bui
Tuong Phong's informal observation that shiny surfaces have small intense
specular highlights, while dull surfaces have large highlights that fall off more
gradually. The reflection model also includes an ambient term to account for
the small amount of light that is scattered about the entire scene.
[Figure: Visual illustration of the Phong equation. Here the light is white, the ambient and diffuse colors are both blue, and the specular color is white, reflecting almost all of the light hitting the surface, but only in very narrow highlights. The intensity of the diffuse component varies with the direction of the surface, and the ambient component is uniform (independent of direction).]
For each light source in the scene, we define the components is and id as the
intensities (often as RGB values) of the specular and diffuse components of
the light sources respectively. A single term ia controls the ambient lighting;
it is sometimes computed as a sum of contributions from all light sources. For
each material in the scene, we define:
ks: specular reflection constant, the ratio of reflection of the specular term of
incoming light
kd: diffuse reflection constant, the ratio of reflection of the diffuse term of
incoming light (Lambertian reflectance)
ka: ambient reflection constant, the ratio of reflection of the ambient term
present in all points in the scene rendered
α: is a shininess constant for this material, which is larger for surfaces that
are smoother and more mirror-like. When this constant is large, the
specular highlight is small.
․We further define lights as the set of all light sources, L as the direction
vector from the point on the surface toward each light source, N as the normal
at this point on the surface, R as the direction that a perfectly reflected ray of
light would take from this point on the surface, and V as the direction pointing
towards the viewer (such as a virtual camera).
Then the Phong reflection model provides an equation for computing the
shading value of each surface point Ip:

    Ip = ka ia + Σ over all lights [ kd (L·N) id + ks (R·V)^α is ]
․The diffuse term is not affected by the viewer direction (V). The specular term
is large only when the viewer direction (V) is aligned with the reflection
direction R. Their alignment is measured by the α power of the cosine of the
angle between them. The cosine of the angle between the normalized vectors
R and V is equal to their dot product. When α is large, in the case of a nearly
mirror-like reflection, the specular highlight will be small, because any
viewpoint not aligned with the reflection will have a cosine less than one
which rapidly approaches zero when raised to a high power. When we have
color representations as RGB values, this equation will typically be calculated
separately for R, G and B intensities.
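․Putting the equation together for a single light (an illustrative sketch only; the parameter values in the example call are arbitrary):

    import numpy as np

    def normalize(v):
        return v / np.linalg.norm(v)

    def phong(point_normal, to_light, to_viewer,
              ka, kd, ks, alpha, ia, i_d, i_s):
        """Phong reflection model for one light:
        Ip = ka*ia + kd*(L.N)*id + ks*(R.V)^alpha*is (dot products clamped at 0)."""
        n = normalize(np.asarray(point_normal, float))
        l = normalize(np.asarray(to_light, float))
        v = normalize(np.asarray(to_viewer, float))
        r = 2.0 * np.dot(l, n) * n - l          # mirror reflection of L about N
        diffuse  = kd * max(np.dot(l, n), 0.0) * i_d
        specular = ks * max(np.dot(r, v), 0.0) ** alpha * i_s
        return ka * ia + diffuse + specular

    # Viewer looking straight down the reflection direction gives a strong highlight:
    print(phong([0, 0, 1], [0, 0, 1], [0, 0, 1],
                ka=0.1, kd=0.6, ks=0.3, alpha=32, ia=1.0, i_d=1.0, i_s=1.0))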
Phong shading interpolation example
․Phong shading improves upon Gouraud shading and provides a better
approximation of the shading of a smooth surface. Phong shading assumes a
smoothly varying surface normal vector. The Phong interpolation method
works better than Gouraud shading when applied to a reflection model that
has small specular highlights such as the Phong reflection model.
․The most serious problem with Gouraud shading occurs when specular
highlights are found in the middle of a large polygon. Since these specular
highlights are absent from the polygon's vertices and Gouraud shading
interpolates based on the vertex colors, the specular highlight will be missing
from the polygon's interior. This problem is fixed by Phong shading.
․Unlike Gouraud shading, which interpolates colors across polygons, in Phong
shading we linearly interpolate a normal vector across the surface of the
polygon from the polygon's vertex normals. The surface normal is interpolated
and normalized at each pixel and then used in the Phong reflection model to
obtain the final pixel color. Phong shading is more computationally expensive
than Gouraud shading since the reflection model must be computed at each
pixel instead of at each vertex. In some modern hardware, variants of this
algorithm are implemented using pixel or fragment shaders. This can be
accomplished by coding normal vectors as secondary colors for each polygon, having the rasterizer use Gouraud shading to interpolate them, and then interpreting them appropriately in the pixel or fragment shader to calculate the light for each pixel based on this normal information.
**************************************************************** (The following is supplementary material) *****************
․Dithering can enhance the range and effectiveness of a display device having
fewer color or intensity resources. Its fundamental purpose is, in a sense, to
fool the human eye into thinking a dithered image contains more colors or
gray levels than it actually has.
․The eye itself has a resolution limit of about 1 minute of arc. At a distance of
10~12 inches, a typical viewing distance for a printed page, this corresponds
to a width of about 0.003 inches (and as it turns out, 0.003 inches is about
the size of the dots printed by a 300-dpi laser printer). It means that two dots
or objects that are closer together than 0.003 inches cannot be distinguished
at that distance. On complex imagery, we don't always 'see' to this resolution limit; the human brain performs spatial integration, averaging intensities in small areas to reduce the image's apparent complexity.
․Another important property of the eye is its intensity response, which can be
thought of as a graph of actual versus perceived intensity. This is not linear,
but logarithmic. Thus, the perceived relative brightness of 2 light sources is
based on the ratio of their intensities, not the difference of their intensities.
The smallest intensity change the eye can detect is an intensity ratio of about
1.01, which has important implications for image rendering.
․A continuous-tone image is one in which intensity changes between adjacent image areas are fine enough that the eye cannot perceive discrete differences. In the case of a black and white image, this occurs when the intensity ratio between adjacent areas is 1.01 or less.
․Dynamic Range: The ratio of the brightest perceived intensity to the least
bright one is referred to as the medium’s dynamic range.
․There must be enough reproducible intensities, or grey levels, between the two intensity extremes such that successive intensities vary by a ratio of 1.01 or less in order to produce a continuous-tone image. That is, the minimum number of required gray levels, n, is n = log_1.01(1/i0), where the maximum intensity is taken as 1 and i0 is the least intensity. Newsprint, for example, has a dynamic range of about 10 and hence needs a minimum of about 230 intensity levels.
Ex: The paper used in a laser printer or copier has a dynamic range of perhaps 20. How would the 256-level grey image appear on the newsprint or on the copier paper? Is it continuous?
A: Yes!
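․Checking the arithmetic of the formula above (a quick sketch, not from the notes): for a dynamic range of 10 the formula gives about 231 levels, and for a dynamic range of 20 about 301.

    import math

    # Minimum number of gray levels n = log_1.01(dynamic range):
    for dynamic_range in (10, 20):            # newsprint ~10, copier paper ~20
        n = math.log(dynamic_range) / math.log(1.01)
        print(dynamic_range, round(n))        # -> 10: ~231, 20: ~301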
Neural Networks: Ch. 1 & Ch. 2