Tutorial on
Neural Network Models for Speech
and Image Processing
B. Yegnanarayana
Speech & Vision Laboratory
Dept. of Computer Science & Engineering
IIT Madras, Chennai-600036
[email protected]
WCCI 2002, Honolulu, Hawaii, USA
May 12, 2002
Need for New Models of
Computing for Speech & Image
Tasks
• Speech & Image processing tasks
• Issues in dealing with these tasks by human
beings
• Issues in dealing with the tasks by machine
• Need for new models of computing in dealing
with natural signals
• Need for effective (relevant) computing
• Role of Artificial Neural Networks (ANN)
Organization of the Tutorial
Part I
Feature extraction and classification
problems with speech and image data
Part II
Basics of ANN
Part III
ANN models for feature extraction and
classification
Part IV
Applications in speech and image
processing
PART I
Feature Extraction and
Classification Problems in
Speech and Image
Feature Extraction and Classification
Problems in Speech and Image
• Distinction between natural and synthetic
signals (unknown model vs known model
generating the signal)
• Nature of speech and image data (non-repetitive data, but repetitive features)
• Need for feature extraction and classification
• Methods for feature extraction and models for
classification
• Need for nonlinear approaches (methods and
models)
Speech vs Audio
• Audio (audible) signals (noise, music, speech
and other signals)
• Categories of audio signals
– Audio signal vs non-signal (noise)
– Signal from speech production mechanism
vs other audio signals
– Non-speech vs speech signals (like with
natural language)
Speech Production Mechanism (figure)
Different types of sounds (figure)
Categorization of sound units (figure)
Nature of Speech Signal
• Digital speech: Sequence of samples or
numbers
• Waveform for word “MASK” (Figure)
• Characteristics of speech signal
– Excitation source characteristics
– Vocal tract system characteristics
Waveform for the word “mask” (figure)
Source-System Model of
Speech Production
Block diagram (figure): an impulse train generator, controlled by the pitch period, and a random noise generator feed a voiced/unvoiced switch. The selected excitation u(n), scaled by the gain G, drives a time-varying digital filter whose vocal tract parameters shape the output speech signal s(n).
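As a concrete illustration of this source-system model, the following sketch synthesizes a short segment by exciting a fixed all-pole filter with either an impulse train (voiced) or random noise (unvoiced). It is a minimal sketch assuming NumPy and SciPy; the pitch period, gain and filter coefficients are illustrative placeholders, not measured vocal tract parameters.

import numpy as np
from scipy.signal import lfilter

fs = 8000                        # sampling rate in Hz (assumed)
pitch_period = 100               # samples between glottal impulses (80 Hz pitch)
n_samples = 2000
voiced = True

# Excitation u(n): impulse train for voiced speech, white noise for unvoiced
u = np.zeros(n_samples)
if voiced:
    u[::pitch_period] = 1.0
else:
    u = np.random.randn(n_samples)

G = 0.5                                   # gain
a = [1.0, -1.3, 0.9]                      # placeholder all-pole (vocal tract) coefficients
s = lfilter([G], a, u)                    # "time-varying" filter held fixed here for simplicity
print(s[:5])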
Features from Speech Signal
(demo)
• Different components of speech
(speech, source and system)
• Different speech sound units (Alphabet
in Indian Languages)
• Different emotions
• Different speakers
Speech Signal Processing
Methods
• To extract source-system features and
suprasegmental features
• Production-based features
• DSP-based features
• Perception-based features
Models for Matching and
Classification
• Dynamic Time Warping (DTW)
• Hidden Markov Models (HMM)
• Gaussian Mixture Models (GMM)
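To make the first of these models concrete, here is a minimal dynamic time warping (DTW) sketch in Python, assuming two utterances are represented as NumPy arrays of frame-level feature vectors; it illustrates the alignment idea only, not any specific DTW variant or constraint set used in practice.

import numpy as np

def dtw_distance(x, y):
    """Minimal DTW between two sequences of feature vectors (one vector per row)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])    # local frame distance
            D[i, j] = cost + min(D[i - 1, j],              # insertion
                                 D[i, j - 1],              # deletion
                                 D[i - 1, j - 1])          # match
    return D[n, m]

# Example: two short 2-D feature sequences (toy data)
a = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
b = np.array([[0.0, 1.0], [2.0, 0.0]])
print(dtw_distance(a, b))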
Applications of Speech
Processing
• Speech recognition
• Speaker recognition/verification
• Speech enhancement
• Speech compression
• Audio indexing and retrieval
Limitations of Feature Extraction
Methods and Classification Models
• Fixed frame analysis
• Variability in the implicit pattern
• Not pattern-based analysis
• Temporal nature of the patterns
Need for New Approaches
• To deal with ambiguity and variability
in the data for feature extraction
• To combine evidence from multiple
sources (classifiers and knowledge
sources)
Images
• Digital Image - Matrix of numbers
• Types of Images
– line sketches, binary, gray level and color
– Still images, video, multimedia
Image Analysis
• Feature extraction
• Image segmentation: Gray level, color,
texture
• Image classification
Processing of Texture-like Images
2-D Gabor Filter
f(x, y, \omega, \theta, \sigma_x, \sigma_y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left(-\frac{1}{2}\left[\left(\frac{x}{\sigma_x}\right)^2 + \left(\frac{y}{\sigma_y}\right)^2\right] + j\omega(x\cos\theta + y\sin\theta)\right)

A typical Gaussian filter with \sigma = 30 (figure); a typical Gabor filter with \sigma = 30, \omega = 3.14 and \theta = 45^{\circ} (figure)
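A small sketch of how a 2-D Gabor kernel of the above form can be constructed and applied with NumPy/SciPy; the kernel size and parameter values below are illustrative, not the ones used for the figures.

import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size, sigma_x, sigma_y, omega, theta):
    """Complex 2-D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2))
    carrier = np.exp(1j * omega * (x * np.cos(theta) + y * np.sin(theta)))
    return envelope * carrier / (2 * np.pi * sigma_x * sigma_y)

# Illustrative use on a random "texture" image
image = np.random.rand(64, 64)
g = gabor_kernel(size=31, sigma_x=4.0, sigma_y=4.0, omega=0.5, theta=np.pi / 4)
response = np.abs(convolve2d(image, g, mode='same'))    # Gabor magnitude
print(response.shape)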
Limitations
• Feature extraction
• Matching
• Classification methods/models
Need for New Approaches
• Feature extraction: PCA and nonlinear
PCA
• Matching: Stereo images
• Smoothing: Using the knowledge of
image and not noise
• Edge extraction and classification:
Integration of global and local
information or combining evidence
PART II
Basics of ANN
Artificial Neural Networks
• Problem solving: Pattern recognition
tasks by human and machine
• Pattern vs data
• Pattern processing vs data processing
• Architectural mismatch
• Need for new models of computing
Biological Neural Networks
• Structure and function: Neurons,
interconnections, dynamics for learning
and recall
• Features: Robustness, fault tolerance, flexibility, ability to deal with a variety of data situations, collective computation
• Comparison with computers: Speed,
processing, size and complexity, fault
tolerance, control mechanism
• Parallel and Distributed Processing (PDP)
models
Basics of ANN
• ANN terminology: Processing unit (fig),
interconnection, operation and update
(input, weights, activation value, output
function, output value)
• Models of neurons: MP neuron, perceptron
and adaline
• Topology (fig)
• Basic learning laws (fig)
Model of a Neuron (figure)
Topology (figure)
Basic Learning Laws (figure)
Activation and Synaptic Dynamic
Models
• General activation dynamics model:

\dot{x}_i(t) = -A_i x_i + (B_i - C_i x_i)\left(I_i + f_i(x_i)\right) - (E_i + D_i x_i)\left(J_i + \sum_{j \ne i} f_j(x_j)\, w_{ij}\right)

(passive decay term, excitatory term, inhibitory term)

• Synaptic dynamics model:

\dot{w}_{ij}(t) = -w_{ij}(t) + s_i(t)\, s_j(t)

(passive decay term, correlation term)

• Stability and convergence
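A minimal numerical sketch of the synaptic dynamics model above: the equation \dot{w}_{ij} = -w_{ij} + s_i s_j is integrated with simple Euler steps, using arbitrary constant activations, to show the weight settling at the correlation value.

import numpy as np

dt = 0.01                      # Euler step size (illustrative)
steps = 1000
w = 0.0                        # synaptic weight w_ij
s_i, s_j = 0.8, 0.6            # constant pre- and post-synaptic activations (assumed)

for _ in range(steps):
    dw = -w + s_i * s_j        # passive decay term + correlation (Hebbian) term
    w += dt * dw

print(w)                       # approaches the equilibrium s_i * s_j = 0.48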
Functional Units and Pattern
Recognition Tasks
• Feedforward ANN
– Pattern association
– Pattern classification
– Pattern mapping/classification
• Feedback ANN
– Autoassociation
– Pattern storage (LTM)
– Pattern environment storage (LTM)
• Feedforward and Feedback (Competitive
Learning) ANN
– Pattern storage (STM)
– Pattern clustering
– Feature map
Two Layer Feedforward Neural
Network (FFNN)
PR Tasks by FFNN
• Pattern association
– Architecture: Two layers, linear processing, single set of weights
– Learning: Hebb's (orthogonal) rule, Delta (linearly independent) rule
– Recall: Direct
– Limitation: Linear independence, number of patterns restricted to input dimensionality
– To overcome: Nonlinear processing units, leads to a pattern classification problem
• Pattern classification
– Architecture: Two layers, nonlinear processing units, geometrical
interpretation
– Learning: Perceptron learning
– Recall: Direct
– Limitation: Linearly separable functions, cannot handle hard problems
– To overcome: More layers, leads to a hard learning problem
• Pattern mapping/classification
– Architecture: Multilayer (hidden), nonlinear processing units, geometrical
interpretation
– Learning: Generalized delta rule (backpropagation)
– Recall: Direct
– Limitation: Slow learning, does not guarantee convergence
– To overcome: More complex architecture
Perceptron Network
• Perceptron classification problem
• Perceptron learning law
• Perceptron convergence theorem
• Perceptron representation problem
• Multilayer perceptron
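A minimal sketch of the perceptron learning law for a two-class problem with bipolar targets; the toy data, learning rate and iteration count are illustrative.

import numpy as np

# Toy linearly separable data: rows are input vectors, targets are +1 / -1
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1, 1, -1, -1])
X = np.hstack([X, np.ones((len(X), 1))])    # append 1 so the bias is a weight

w = np.zeros(3)
eta = 0.1                                    # learning-rate constant
for _ in range(100):                         # repeat until convergence (bounded here)
    for x, target in zip(X, d):
        y = 1 if w @ x > 0 else -1           # current classification
        if y != target:
            w += eta * target * x            # perceptron learning law: update only on error
print(w)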
Geometric Interpretation of
Perceptron Learning
Generalized Delta Rule
(Backpropagation Learning)
Output layer:

\Delta w_{kj}^{o} = \eta\, \delta_k^{o}\, s_j^{h}, \qquad \delta_k^{o} = (b_k - s_k^{o})\, \dot{f}_k^{o}

Hidden layer:

\Delta w_{ji}^{h} = \eta\, \delta_j^{h}\, a_i, \qquad \delta_j^{h} = \dot{f}_j^{h} \sum_{k=1}^{K} \delta_k^{o}\, w_{kj}^{o}
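A compact sketch of backpropagation for one hidden layer, following the generalized delta rule above; the XOR-style data, network size and learning rate are illustrative choices, and biases are handled by appending a constant +1 input.

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)   # inputs a_i
B = np.array([[0], [1], [1], [0]], float)               # desired outputs b_k
A1 = np.hstack([A, np.ones((4, 1))])                    # bias as an extra +1 input
W_h = rng.normal(scale=0.5, size=(3, 4))                # input(+bias) -> hidden weights
W_o = rng.normal(scale=0.5, size=(5, 1))                # hidden(+bias) -> output weights
eta = 0.5
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

for _ in range(10000):
    s_h = sigmoid(A1 @ W_h)                             # hidden outputs s_j^h
    s_h1 = np.hstack([s_h, np.ones((4, 1))])
    s_o = sigmoid(s_h1 @ W_o)                           # output s_k^o
    delta_o = (B - s_o) * s_o * (1 - s_o)               # delta_k^o = (b_k - s_k^o) f'
    delta_h = (delta_o @ W_o[:4].T) * s_h * (1 - s_h)   # delta_j^h backpropagated
    W_o += eta * s_h1.T @ delta_o                       # generalized delta rule updates
    W_h += eta * A1.T @ delta_h

print(np.round(s_o, 2))       # should approach [0, 1, 1, 0] for most initializations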
Issues in Backpropagation
Learning
• Description and features of error
backpropagation
• Performance of backpropagation learning
• Refinements of backpropagation learning
• Interpretation of results of learning
• Generalization
• Tasks with backpropagation network
• Limitations of backpropagation learning
• Extensions to backpropagation
PR Tasks by FBNN
• Autoassociation
– Architecture: Single layer with feedback, linear processing units
– Learning: Hebb (orthogonal inputs), Delta (linearly independent inputs)
– Recall: Activation dynamics until stable states are reached
– Limitation: No accretive behavior
– To overcome: Nonlinear processing units, leads to a pattern storage problem
• Pattern Storage
– Architecture: Feedback neural network, nonlinear processing units,
states, Hopfield energy analysis
– Learning: Not important
– Recall: Activation dynamics until stable states are reached
– Limitation: Hard problems, limited number of patterns, false minima
– To overcome: Stochastic update, hidden units
• Pattern Environment Storage
– Architecture: Boltzmann machine, nonlinear processing units, hidden
units, stochastic update
– Learning: Boltzmann learning law, simulated annealing
– Recall: Activation dynamics, simulated annealing
– Limitation: Slow learning
– To overcome: Different architecture
Hopfield Model
• Model
• Pattern storage condition:

\operatorname{sgn}\left(\sum_j w_{ij}\, a_{kj}\right) = a_{ki}, \quad i = 1, \ldots, N, \; k = 1, \ldots, L, \quad \text{where} \quad w_{ij} = \frac{1}{N}\sum_{l=1}^{L} a_{li}\, a_{lj}

• Capacity of Hopfield model: Number of patterns for a given probability of error
• Energy analysis:

V = -\frac{1}{2}\sum_{i,j} w_{ij}\, s_i\, s_j, \qquad \Delta V \le 0

• Continuous Hopfield model:

f(x_i) = \frac{1 - e^{-x_i}}{1 + e^{-x_i}}
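A small sketch of Hebbian storage and asynchronous recall in a bipolar Hopfield network, following the storage condition and weight formula above; the pattern dimension, number of patterns and update count are toy values.

import numpy as np

rng = np.random.default_rng(1)
N, L = 20, 2
patterns = rng.choice([-1, 1], size=(L, N))          # patterns a_k, k = 1..L (bipolar)

W = (patterns.T @ patterns) / N                      # w_ij = (1/N) sum_l a_li a_lj
np.fill_diagonal(W, 0.0)                             # no self-connections

def recall(probe, steps=200):
    s = probe.copy()
    for _ in range(steps):                           # asynchronous updates
        i = rng.integers(N)
        s[i] = 1 if W[i] @ s >= 0 else -1            # s_i = sgn(sum_j w_ij s_j)
    return s

noisy = patterns[0].copy()
noisy[:3] *= -1                                      # flip a few bits
print(np.array_equal(recall(noisy), patterns[0]))    # usually True for few stored patterns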
State Transition Diagram
Computation of Weights for Pattern Storage
Patterns to be stored: (111) and (010). This results in a set of inequalities to be satisfied. (figure)
Pattern Storage Tasks
• Hard problems : Conflicting requirements on a
set of inequalities
• Hidden units: Problem of false minima
• Stochastic update
• Stochastic equilibrium (Boltzmann-Gibbs law):

P(s_\alpha) = \frac{e^{-E_\alpha / T}}{Z}
Simulated
Annealing
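A minimal sketch of stochastic update with simulated annealing on a Hopfield-type energy, using the Boltzmann-Gibbs form from the previous slide as the acceptance rule; the random symmetric weights and the geometric annealing schedule are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
N = 16
W = rng.normal(size=(N, N)); W = (W + W.T) / 2       # symmetric weights (illustrative)
np.fill_diagonal(W, 0.0)
s = rng.choice([-1, 1], size=N)

def energy(state):
    return -0.5 * state @ W @ state

T = 5.0
while T > 0.05:                                      # annealing schedule (assumed geometric)
    for _ in range(50):
        i = rng.integers(N)
        flipped = s.copy(); flipped[i] *= -1
        dE = energy(flipped) - energy(s)
        # stochastic update: accept with probability 1 / (1 + exp(dE / T))
        if rng.random() < 1.0 / (1.0 + np.exp(dE / T)):
            s = flipped
    T *= 0.9

print(energy(s))                                     # a low-energy (near-minimum) state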
Boltzmann Machine
• Pattern environment storage
• Architecture: Visible units, hidden
units, stochastic update, simulated
annealing
• Boltzmann Learning Law:

\Delta w_{ij} = \frac{\eta}{T}\left(p_{ij}^{+} - p_{ij}^{-}\right)
Discussion on Boltzmann Learning
• Expression for Boltzmann learning
– Significance of p_ij^+ and p_ij^-
– Learning and unlearning
– Local property
– Choice of η and initial weights
• Implementation of Boltzmann learning
– Algorithm for learning a pattern environment
– Algorithm for recall of a pattern
– Implementation of simulated annealing
– Annealing schedule
• Pattern recognition tasks by Boltzmann machine
– Pattern completion
– Pattern association
– Recall from noisy or partial input
• Interpretation of Boltzmann learning
– Markov property of simulated annealing
– Clamped-free energy and full energy
• Variations of Boltzmann learning
– Deterministic Boltzmann machine
– Mean-field approximation
Competitive Learning Neural
Network (CLNN)
Figure: an input layer (units 1, 2, ..., j, ..., N) feeding an output layer (units 1, 2, ..., j, ..., M) with on-center and off-surround connections.
PR Tasks by CLNN
• Pattern storage (STM)
– Architecture: Two layers (input and competitive), linear processing units
– Learning: No learning in FF stage, fixed weights in FB layer
– Recall: Not relevant
– Limitation: STM, no application, theoretical interest
– To overcome: Nonlinear output function in FB stage, learning in FF stage
• Pattern clustering (grouping)
– Architecture: Two layers (input and competitive), nonlinear processing units in the competitive layer
– Learning: Only in FF stage, competitive learning
– Recall: Direct in FF stage, activation dynamics until stable state is reached in FB layer
– Limitation: Fixed (rigid) grouping of patterns
– To overcome: Train neighbourhood units in competition layer
• Feature map
– Architecture: Self-organization network, two layers, nonlinear processing units, excitatory neighbourhood units
– Learning: Weights leading to the neighbourhood units in the competitive layer
– Recall: Apply input, determine winner
– Limitation: Only visual features, not quantitative
– To overcome: More complex architecture
Learning Algorithms for PCA networks
Self Organization Network
Figure: (a) network structure (input layer connected to output layer); (b) neighborhood regions at different times t in the output layer.
Illustration of SOM
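A compact sketch of self-organizing map training of the kind illustrated above, assuming 2-D inputs mapped onto a 1-D chain of output units; the map size, learning-rate decay and neighbourhood schedule are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
data = rng.random((500, 2))                     # 2-D input vectors
M = 20                                          # 1-D chain of output units
w = rng.random((M, 2))                          # one weight vector per output unit

for t in range(2000):
    x = data[rng.integers(len(data))]
    winner = np.argmin(np.linalg.norm(w - x, axis=1))       # competitive step
    eta = 0.5 * (1 - t / 2000)                               # decaying learning rate
    width = max(1.0, M / 2 * (1 - t / 2000))                 # shrinking neighbourhood
    for j in range(M):
        h = np.exp(-((j - winner) ** 2) / (2 * width ** 2))  # neighbourhood function
        w[j] += eta * h * (x - w[j])                         # move winner and its neighbours

print(w[:5])   # neighbouring units end up with similar (topologically ordered) weights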
PART III
ANN Models for Feature
Extraction and Classification
Neural Network Architecture and
Models for Feature Extraction
• Multilayer Feedforward Neural Network
(MLFFNN)
• Autoassociative Neural Networks
(AANN)
• Constraint Satisfaction Models (CSM)
• Self Organization MAP (SOM)
• Time Delay Neural Networks (TDNN)
• Hidden Markov Models (HMM)
Multilayer FFNN
• Nonlinear feature extraction followed by
linearly separable classification problem
Multilayer FFNN
• Complex decision hypersurfaces for classification
• Asymptotic approximation of a posteriori class probabilities
Radial Basis Function
• Radial Basis Function NN: Clustering followed by
classification
Figure: the input vector a is passed through basis functions φ_j(a), whose outputs are combined to produce the class labels c_1, ..., c_N.
Autoassociation Neural Network
(AANN)
• Architecture
• Nonlinear PCA
• Feature extraction
• Distribution capturing ability
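A rough sketch of an autoassociative network with a dimension compression (bottleneck) layer, trained to reproduce its input; the three-layer structure and sizes here are simplified stand-ins for the five-layer AANN models (e.g., 19L 38N 4N 38N 19L) used later in the tutorial.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                   # input vectors (toy data)
X = X @ rng.normal(size=(10, 10)) * 0.3          # introduce correlations worth compressing

d_in, d_hid = 10, 4                              # compression (bottleneck) dimension
W1 = rng.normal(scale=0.1, size=(d_in, d_hid))
W2 = rng.normal(scale=0.1, size=(d_hid, d_in))
eta = 0.1
tanh = np.tanh

for _ in range(2000):
    H = tanh(X @ W1)                             # compressed representation (features)
    Y = H @ W2                                   # reconstruction of the input
    err = X - Y                                  # autoassociation error
    W2 += eta * H.T @ err / len(X)               # gradient steps on squared error
    W1 += eta * X.T @ ((err @ W2.T) * (1 - H ** 2)) / len(X)

print(np.mean((X - tanh(X @ W1) @ W2) ** 2))     # final mean squared reconstruction error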
Autoassociation Neural Network
(AANN)
• Architecture (figure): input layer, dimension compression hidden layer, output layer
Distribution Capturing Ability of
AANN
• Distribution of feature vector (fig)
• Illustration of distribution in 2D
case (fig)
• Comparison with Gaussian Mixture
Model (fig)
Distribution of feature vector (figure)
(a) Illustration of distribution in the 2D case; (b, c) comparison with Gaussian Mixture Model (figures)
Feature Extraction by AANN
• Input and output to AANN: Sequence of
signal samples
(captures dominant 2nd order statistical
features)
• Input and output to AANN: Sequence of
Residual samples
(captures higher order statistical
features in the sample sequence)
Constraint Satisfaction Model
• Purpose: To satisfy the given (weak)
constraints as much as possible
• Structure: Feedback network with units
(hypotheses), connections (constraints /
knowledge)
• Goodness of fit function: Depends on the
output of unit and connection weights
• Relaxation Strategies: Deterministic and
Stochastic
Application of CS Models
• Combining evidence
• Combining classifier outputs
• Solving optimization problems
Self Organization Map
(illustrations)
• Organization of 2D input to 1D feature
mapping
• Organization of 16 Dimensional LPC
vector to obtain phoneme map
• Organization of large document files
Time Delay Neural Networks for
Temporal Pattern Recognition
Stochastic Models for Temporal
Pattern Recognition
• Maximum likelihood formulation:
Determine the class w, given the
observation symbol sequence y, using
criterion
\max_{w} P(y \mid w)
• Markov Models
• Hidden Markov Models
PART IV
Applications in Speech & Image
Processing
Applications in Speech and Image
Processing
• Edge extraction in texture-like images
• Texture segmentation/classification by CS
model
• Road detection from satellite images
• Speech recognition by CS model
• Speaker recognition by AANN model
Problem of Edge Extraction in
Texture-like Images
• Nature of texture-like images
• Problem of edge extraction
• Preprocessing (1-D) to derive partial evidence
• Combining evidence using CS model
Problem of Edge Extraction
• Texture edges are the locations where there is an abrupt change in texture properties
• Figures: an image with 4 natural texture regions, an edge map showing micro edges, and an edge map showing macro edges
1-D processing using Gabor Filter and
Difference Operator
• 1-D Gabor smoothing filter: magnitude and phase
• 1-D Gabor filter: a Gaussian modulated by a complex sinusoid

f(x, \sigma, \omega) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2} + j\omega x\right)

Even component:  f_c(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)\cos(\omega x)

Odd component:  f_s(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)\sin(\omega x)
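A brief sketch that builds the even and odd 1-D Gabor components above, together with the derivative-of-Gaussian difference operator of the next slide, and applies them to one image row; the σ and ω values are illustrative, and for brevity both operators are applied along the same row here, whereas the method on the following slides applies the difference operator in the orthogonal direction.

import numpy as np

def gabor_1d(sigma, omega, half_len=None):
    """Even (cosine) and odd (sine) components of a 1-D Gabor filter."""
    half_len = half_len or int(4 * sigma)
    x = np.arange(-half_len, half_len + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return g * np.cos(omega * x), g * np.sin(omega * x)

def dgauss_1d(sigma, half_len=None):
    """First derivative of a 1-D Gaussian, used as the difference operator."""
    half_len = half_len or int(4 * sigma)
    y = np.arange(-half_len, half_len + 1, dtype=float)
    return -y * np.exp(-y ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma ** 3)

row = np.random.rand(256)                        # one horizontal line of an image
fc, fs = gabor_1d(sigma=4.0, omega=0.5)
magnitude = np.hypot(np.convolve(row, fc, 'same'),
                     np.convolve(row, fs, 'same'))      # Gabor magnitude along the row
edge_evidence = np.convolve(magnitude, dgauss_1d(2.0), 'same')
print(edge_evidence.shape)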
1-D processing using Gabor filter and
Difference operator (contd.)
• Differential operator for edge evidence: first derivative of the 1-D Gaussian function

c(y) = \frac{-y}{\sqrt{2\pi}\,\sigma^3}\exp\left(-\frac{y^2}{2\sigma^2}\right)

• Need for a set of Gabor filters
Texture Edge Extraction using 1-D Gabor
Magnitude and Phase
• Apply 1-D Gabor filter along each of the parallel lines of an
image in one direction ( say, horizontal )
• Apply all Gabor filters of the filter bank in a similar way
• For each of the Gabor filtered output, partial edge
information is extracted by applying the 1-D differential
operator in the orthogonal direction ( say, vertical )
• The entire process is repeated in the orthogonal (vertical
and horizontal) directions to obtain the partial edge evidence
in the other direction
• The partial edge evidence is combined using a Constraint
Satisfaction Neural Network Model
Texture Edge Extraction using a set of 1-D Gabor Filters
Input image → bank of 1-D Gabor filters → filtered images → post-processing using the 1-D differential operator and thresholding → edge evidence → combining the edge evidence using the constraint satisfaction neural network model → edge map
Combining Evidence using CSNN model
Structure of the 3-D CSNN model (figure): a 3-D lattice of size I x J x K, with positive (+ve) connections among the nodes across the layers for each pixel, and negative (-ve) connections from a set of neighboring nodes to each node in the same layer.
Combining the Edge Evidence using
Constraint Satisfaction Neural Network
(CSNN) Model
• Neural network model contains nodes arranged in a 3-D lattice structure
• Each node corresponds to a pixel in the post-processed Gabor filter output
• Post-processed output of a single 1-D Gabor filter is an input to one 2-D layer of nodes
• Different layers of nodes, each corresponding to a particular filter output, are stacked one upon the other to form the 3-D structure
• Each node represents a hypothesis
• Connection between two nodes represents a constraint
• Each node is connected to other nodes with inhibitory and excitatory connections
Combining Evidence using CSNN model
(contd.)
Let W_{i,j,k,i_1,j_1,k} represent the weight of the connection from node (i,j,k) to node (i_1,j_1,k) within each layer k, and let W_{i,j,k,i_1,j_1,k_1} represent the constraint between the nodes in two different layers (k and k_1) in the same column. These are given as:

W_{i,j,k,i_1,j_1,k} = \begin{cases} -\frac{1}{8}, & \text{if } |i - i_1| = 1 \text{ or } |j - j_1| = 1 \\ -\frac{1}{16}, & \text{if } |i - i_1| = 1 \text{ or } |j - j_1| = 2 \\ -\frac{1}{8}, & \text{if } |i - i_1| = 1 \text{ or } |j - j_1| = 3 \end{cases}

• Each node is connected to the other nodes in the same column with excitatory connections:

W_{i,j,k,i,j,k_1} = \frac{1}{2(K-1)}
Combining Evidence using CSNN model
(contd.)
• Using the notation \nu_{i,j,k} \in \{0,1\} for the output of node (i,j,k), the set \{\nu_{i,j,k}, \forall i,j,k\} is the state of the network
• The state of the neural network model is initialized using:

\nu_{i,j,k}(0) = 1 if the pixel has evidence of an edge, and 0 otherwise

• In the deterministic relaxation method, the state of the network is updated iteratively by changing the output of one node at a time
• The net input of each node is obtained using:

U_{i,j,k}(n) = \sum_{i_1,j_1} W_{i,j,k,i_1,j_1,k}\, \nu_{i_1,j_1,k} + \sum_{k_1} W_{i,j,k,i,j,k_1}\, \nu_{i,j,k_1} + I_{i,j,k}

where U_{i,j,k}(n) is the net input to node (i,j,k) at the nth iteration, and I_{i,j,k} is the external input given to node (i,j,k)
• The state of the network is updated using:

\nu_{i,j,k}(n+1) = 1 if U_{i,j,k}(n) \ge \theta, and 0 otherwise, where \theta is the threshold
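A simplified sketch of this deterministic relaxation for a single layer (K = 1), so that only the within-layer inhibitory weights and the external edge evidence matter; the image, weight value and threshold are toy choices that follow the spirit, not the exact configuration, of the slides above.

import numpy as np

rng = np.random.default_rng(0)
I, J = 32, 32
evidence = (rng.random((I, J)) > 0.7).astype(float)   # binary edge evidence (external input)
nu = evidence.copy()                                   # initial state nu(0)
theta = 0.3                                            # threshold (illustrative)

def net_input(state, i, j):
    """U_{i,j} = weighted neighbour outputs + external input I_{i,j}."""
    u = evidence[i, j]
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == dj == 0:
                continue
            ni, nj = i + di, j + dj
            if 0 <= ni < I and 0 <= nj < J:
                u += -1.0 / 8.0 * state[ni, nj]        # inhibitory within-layer weight
    return u

for _ in range(5):                                     # a few relaxation passes
    for i in range(I):
        for j in range(J):
            nu[i, j] = 1.0 if net_input(nu, i, j) >= theta else 0.0

print(int(nu.sum()), "edge pixels retained")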
Comparison of Edge Extraction using Gabor
Magnitude and Gabor Phase
Figures: for each texture image, edge maps obtained using the 2-D Gabor filter, the 1-D Gabor magnitude, and the 1-D Gabor phase.
Texture Segmentation and
Classification
• Image analysis (revisited)
• Problem of texture segmentation and
classification
• Preprocessing using 2D Gabor filter to derive
feature vector
• Combining the partial evidence using CS model
CS Model for Texture Classification
• Supervised and unsupervised problem
• Modeling of image constraint
• Formulation of a posterior probability CS
model
• Hopfield neural network model and its energy
function
• Deterministic and Stochastic relaxation
strategies
CS Model for Texture Classification: Modeling of Image Constraints
• Feature formation process: defined by the conditional probability of the feature vector g_s of each pixel s, given the model parameters of each class k:

P(G_s = g_s \mid L_s = k) = \frac{1}{(2\pi\sigma_k^2)^{M/2}}\exp\left(-\frac{\|g_s - \mu_k\|^2}{2\sigma_k^2}\right)

• Partition process: defines the probability of the label of a pixel given the labels of the pixels in its pth-order neighborhood:

P(L_s \mid L_r, \forall r \in N_s^{p}) = \frac{1}{Z_p}\exp\left(\beta \sum_{r \in N_s^{p}} \delta(L_s - L_r)\right)

• Label competition process: describes the conditional probability of assigning a new label to an already labeled pixel:

P(L_s = k \mid L_s = l) = \frac{1}{Z_c}\exp\left(-\alpha_l\, \delta(k - l)\right)
CS Model for Texture Classification: Modeling of Image Constraints (contd.)
• Formulation of the a posteriori probability:

P(L_s = k \mid G_s = g_s, L_r, \forall r \in N_s^{p}, L_s = l) = \frac{1}{Z}\exp\left(-E(L_s = k \mid G_s = g_s, L_r, \forall r \in N_s^{p}, L_s = l)\right)

where

E(L_s = k \mid G_s = g_s, L_r, \forall r \in N_s^{p}, L_s = l) = \frac{\|g_s - \mu_k\|^2}{2\sigma_k^2} + \frac{1}{2}\ln\left((2\pi\sigma_k^2)^{M}\right) - \beta \sum_{r \in N_s^{p}} \delta(L_s - L_r) + \alpha_l\, \delta(k - l)

and Z = Z_p\, Z_c\, P(G_s = g_s)\, P(L_s = k)

• Total energy of the system:

E_{\text{total}} = \sum_{s,k} E(L_s = k \mid G_s = g_s, L_r, \forall r \in N_s^{p}, L_s = l)
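An illustrative computation of the per-pixel energy above for one candidate label, with made-up class statistics and constants β and α; it only shows how the feature, partition and label-competition terms combine.

import numpy as np

def pixel_energy(g_s, mu_k, sigma_k, neighbour_labels, k, current_label,
                 beta=1.0, alpha=1.0):
    """Energy E(L_s = k | ...) combining the constraint terms above."""
    M = len(g_s)
    feature_term = np.sum((g_s - mu_k) ** 2) / (2 * sigma_k ** 2)
    normalizer_term = 0.5 * np.log((2 * np.pi * sigma_k ** 2) ** M)
    partition_term = -beta * sum(1 for L_r in neighbour_labels if L_r == k)
    competition_term = alpha * (1 if k == current_label else 0)
    return feature_term + normalizer_term + partition_term + competition_term

# Toy example: a 2-D feature vector, one candidate class label
g_s = np.array([0.8, 0.2])
print(pixel_energy(g_s, mu_k=np.array([0.7, 0.3]), sigma_k=0.2,
                   neighbour_labels=[1, 1, 0, 1], k=1, current_label=0))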
CS Model for Texture Classification
Figure: nodes (i, j, k) arranged in a 3-D lattice of size I x J x K, with one layer per class label (k = 1, ..., K); positive (+ve) connections among the nodes across the layers for each pixel, and negative (-ve) connections from a set of neighboring nodes to each node in the same layer. An inset shows the energy E as a function of the network state.
Hopfield Neural Network and its
Energy Function
E_{\text{Hopfield}} = -\frac{1}{2}\sum_{i}\sum_{i_1} W_{i,i_1}\, O_i\, O_{i_1} - \sum_{i} B_i\, O_i

For the 3-D (I x J x K) lattice:

E_{\text{Hopfield}} = -\frac{1}{2}\sum_{i,j,k}\sum_{i_1,j_1,k_1} W_{i,j,k,i_1,j_1,k_1}\, O_{i,j,k}\, O_{i_1,j_1,k_1} - \sum_{i,j,k} B_{i,j,k}\, O_{i,j,k}

(figure: units o_1, ..., o_N with biases B_1, ..., B_N, and the corresponding 3-D lattice of nodes)
Results of Texture Classification - Natural Textures
Figures: natural textures, initial classification, final classification

Results of Texture Classification - Remote Sensing Data
Figures: Band-2 IRS image containing 4 texture classes, initial classification, final classification

Results of Texture Classification - Multispectral Data
Figures: SIR-C/X-SAR image of the Lost City of Ubar; classification using multispectral information; classification using multispectral and textural information
Speech Recognition using CS
Model
• Problem of recognition of SCV unit (Table)
• Issues in classification of SCVs (Table)
• Representation of isolated utterance of
SCV unit
– 60ms before and 140 ms after vowel
onset point
– 240 dimensional feature vector
consisting of weighted cepstral
coefficients
• Block diagram of the recognition system
for SCV unit (Fig)
• CS network for classification of SCV
unit(Fig)
Problem of Recognition of SCV Units (table)
Issues in Classification of SCVs
• Importance of SCVs
– High frequency of occurrence: About 45%
• Main Issues in Classification of SCVs
– Large number of SCV classes
– Similarity among several SCVs classes
• Model for Classification of SCVs
– Should have good discriminatory capability (artificial neural networks)
– Should be able to handle a large number of classes (neural networks based on a modular approach)
Block Diagram of the Recognition System for SCV Units (figure)
CS Network for Classification of SCV Units
Figure: the CS network consists of POA, MOA and Vowel feedback subnetworks; the external evidence (bias) for each node is computed using the outputs of the corresponding MLFFNNs (e.g., MLFFNN1, MLFFNN5, MLFFNN9).
Classification Performance of CSM
and other SCV Recognition Systems
on Test Data of 80 SCV Classes
SCV Recognition System            Decision Criteria
                                  Case 1   Case 2   Case 3   Case 4
HMM based system                   45.5     59.2     65.9     71.4
80-class MLFFNN                    45.3     59.7     66.9     72.2
MOA modular network                29.2     50.2     59.0     65.3
POA modular network                35.1     56.9     69.5     76.6
Vowel modular network              30.1     47.5     58.8     63.6
Combined evidence based system     51.6     63.5     70.7     74.5
Constraint Satisfaction model      65.6     75.0     80.2     82.6
Speaker Verification using AANN
Models and Vocal Tract System
Features
• One AANN for each speaker
• Verification by identification
• AANN structure: 19L 38N 4N 38N 19L
• Feature: 19 weighted LPCC from 16th-order LPC for each frame of 27.5 ms, with a frame shift of 13.75 ms
• Training: Pattern mode, 100 epochs, 1 min of data
• Testing: Model giving the highest confidence for 10 sec of test data
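A schematic sketch of "verification by identification" with one autoassociative model per speaker: the claimed identity is accepted if that speaker's model reconstructs the test features with the highest confidence. The stand-in models below are simple linear projections, and the confidence measure exp(-error) is an assumed choice, not necessarily the one used in the referenced work.

import numpy as np

def confidence(model, features):
    """Average frame-level confidence of an autoassociative model on test features."""
    reconstruction = model(features)                  # model output for each frame
    err = np.mean(np.sum((features - reconstruction) ** 2, axis=1))
    return np.exp(-err)                               # assumed confidence measure

# Stand-in "AANN models": projections onto random speaker-specific 4-D subspaces
rng = np.random.default_rng(0)
def make_model(dim=19, bottleneck=4):
    P = rng.normal(size=(dim, bottleneck))
    proj = P @ np.linalg.pinv(P)                      # rank-4 reconstruction, AANN-like
    return lambda X: X @ proj

speakers = {name: make_model() for name in ["spk1", "spk2", "spk3"]}
test_features = rng.normal(size=(400, 19))            # ~10 s of 19-dim LPCC frames (toy)

scores = {name: confidence(m, test_features) for name, m in speakers.items()}
claimed = "spk2"
accepted = claimed == max(scores, key=scores.get)      # verification by identification
print(scores, accepted)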
Speaker Recognition using Source
Features
• One model for each speaker
• Structure of AANN: 40L 48N 12N 48N 40L
• Training: About 10 sec of data, 60 epochs
• Testing: Select the model giving the highest confidence for 2 sec of test data
Other Applications
• Speech enhancement
• Speech compression
• Image compression
• Character recognition
• Stereo image matching
Summary and Conclusions
• Speech and image processing: Natural tasks
• Significance of pattern processing
• Limitation of conventional computer
architecture
• Need for new models or architectures for
pattern processing tasks
• Basics of ANN
• Architecture of ANN for feature extraction and
classification
• Potential of ANN for speech and image
processing
References
1. B. Yegnanarayana, “Artificial Neural Networks”, Prentice-Hall of India, New Delhi, 1999
2. L. R. Rabiner and B. H. Juang, “Fundamentals of Speech Recognition”, Prentice-Hall, New Jersey,
1993
3. Alan C. Bovik, “Handbook of Image and Video Processing”, Academic Press, 2001
4. Xuedong Hwang, Alex Acero and Hsiao-Wuen Hon, “Spoken Language Processing”, Prentice-Hall,
New Jersey, 2001
5. P. P. Raghu, “Artificial Neural Network Models for Texture Analysis”, PhD Thesis, CSE Dept., IIT
Madras, 1995
6. C. Chandra Sekar, “Neural Network Models for Recognition of Stop Consonant Vowel (SCV)
Segments in Continuous Speech”, PhD Thesis, CSE Dept., IIT Madras, 1996
7. P. Kiran Kumar, “Texture Edge Extraction using One Dimensional Processing”, MS Thesis, CSE
Dept., 2001
8. S. P. Kishore, “Speaker Verification using Autoassociative Neural Network Models”, MS Thesis,
CSE Dept., IIT Madras, 2000
9. B. Yegnanarayana, K. Sharath Reddy and S. P. Kishore, “Source and System Features for Speaker
Recognition using AANN Models”, ICASSP, May 2001
10. S. P. Kishore, Suryakanth V. Ganagashetty and B. Yegnanarayana, “Online Text Independent Speaker
Verification System using Autoassociative Neural Network Models”, INNS-IEEE Int. Conf. Neural
Networks, July 2001.
11. K. Sharat Reddy, “Source and System Features for Speaker Recognition”, MS Thesis, CSE Dept., IIT
Madras, September 2001.
12. B. Yegnanarayana and S. P. Kishore, “Autoassociative Neural Networks: An alternative to GMM for
Pattern Recognition”, to appear in Neural Networks, 2002.