```2006
Speech/Audio Signal Processing
J.-S. Roger Jang (張智星)
CS Dept, Tsing-Hua Univ, Taiwan
(清華大學 資訊系)
http://www.cs.nthu.edu.tw/~jang
jang@cs.nthu.edu.tw
2006
Outline
Wave file manipulation
Time-domain processing
Delay, filtering, sptool …
Frequency-domain processing
Spectrogram
Pitch determination
Auto-correlation, SIFT, AMDF, HPS ...
Others
Formant estimation, speech coding
2015/10/7
Toolbox/Blockset Used
MATLAB
Signal Processing Toolbox
DSP Blockset
MATLAB Primer
Before you start, you need to get familiar with MATLAB.
page:
asp
Exercise:
1. Please plot two curves y=sin(2*t) and y=cos(3*t) in
the same figure.
2. Please plot x vs. y where x=sin(2*t) and y=cos(3*t).
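% A minimal sketch of the two exercises above. The time range
% (0 to 2*pi) and the number of points are assumptions; the slide
% does not specify them.
t = linspace(0, 2*pi, 500);       % assumed time axis
y1 = sin(2*t);
y2 = cos(3*t);
subplot(2,1,1);
plot(t, y1, t, y2);               % exercise 1: both curves in the same axes
xlabel('t'); legend('sin(2*t)', 'cos(3*t)');
subplot(2,1,2);
plot(y1, y2);                     % exercise 2: x = sin(2*t) versus y = cos(3*t)
xlabel('x = sin(2*t)'); ylabel('y = cos(3*t)');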
To read an MS .wav file (PCM format only):
[y, fs, nbits, opts] = wavread(file)
If the wav file is stereo, y will be a two-column
matrix.
plot((1:length(y))/fs, y);
xlabel('Time in seconds');
ylabel('Amplitude');
Exercise:
1. Plot the waveform of “rrrrr.wav”. Use MATLAB’s “zoom”
button to find where the consecutive rolled “R” sounds occur.
2. Plot the two-channel waveform in “flanger.wav”.
Solution to the Previous Exercise
subplot(2,1,1), plot((1:length(y))/fs, y(:,1));
subplot(2,1,2), plot((1:length(y))/fs, y(:,2));
To Play Wav Files
To play sound using the Windows audio output device:
wavplay, sound, soundsc
wavplay(y, fs)
wavplay(y, fs, 'async'): non-blocking call
wavplay(y, fs, 'sync'): blocking call
sound(y, fs)
soundsc(…): autoscale the sound
Example (wavPlay01.m):
wavplay(y, fs);
Exercise:
Follow the example to play “flanger.wav”.
To read/play sound using DSP Blockset:
DSP Blockset/DSP Sources/From Wave File
DSP Blockset/DSP Sinks/To Wave Device
Example:
Frame-based operation!
Exercise:
Create a model as shown above.
Solution
Solution to the previous exercise:
slWavFilePlay01.mdl
To Write a Wave File
To write MS wave files: wavwrite
wavwrite(y, fs, nbits, wavefile)
“nbits” must be 8 or 16.
“y” must have two columns for stereo data.
Amplitude values outside [-1,1] are clipped.
Example (wavWrite01.m):
wavwrite(y, fs*1.2, 8, 'testout.wav');
!start testout.wav
Exercise:
Try out the above example.
To Record a Wave File
To record wave files:
1. Use the recording utility under WinXP.
2. Use “wavrecord” under MATLAB.
3. Use “From Wave Device” under Simulink, under “DSP
Blocksets/Platform Specific IO/Windows (Win32)”
Example:
1. Go ahead and try the WinXP recording utility!
2. Try “wavRecord01.m”
3. Try “slWavFileRecord01.mdl”
Exercise:
Try out the above examples.
Time-Domain Speech Signals
A typical time-domain plot of speech signals:
Amplitude: volume or intensity
Frequency: pitch
Changing Wave Playback Param.
To control the playback of a sound:
• Normal: wavplay(y, fs)
• High volume: wavplay(2*y, fs)
• Low volume: wavplay(0.5*y, fs)
• High pitch (and faster): wavplay(y, 1.2*fs)
• Low pitch (and slower): wavplay(y, 0.8*fs)
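% A minimal sketch of why scaling fs changes both pitch and duration.
% The 440 Hz test tone and fs = 8000 are assumptions, and sound() is
% used in place of the Windows-only wavplay.
fs = 8000;                        % assumed sampling rate (Hz)
t = (0:fs-1)/fs;                  % one second of samples
y = 0.5*sin(2*pi*440*t);          % assumed 440 Hz test tone
durNormal = length(y)/fs;         % duration at the original rate
durFast   = length(y)/(1.2*fs);   % shorter: same samples played faster
durSlow   = length(y)/(0.8*fs);   % longer: same samples played slower
pitchFast = 440*1.2;              % the tone is heard at a higher frequency
pitchSlow = 440*0.8;              % the tone is heard at a lower frequency
% sound(y, fs); sound(2*y, fs); sound(y, 1.2*fs);   % uncomment to listen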
Exercise:
• Try “wavPlay01.m” and trace the code.
• Create “wavPlay02.m” such that you can record your
own voice on the fly.
Time-Domain Signal Processing
Take-home exercise:
How to get a high pitch with the same time span?
Synthetic Sounds
Use a sine wave generator (under DSP blocksets)
to produce sounds
Single frequency:
Multiple frequencies:
Amplitude modulation:
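% A plain-MATLAB sketch of the three cases above, as an alternative
% to the Simulink sine wave block. All frequencies, the duration and
% fs are assumed values chosen for illustration.
fs = 8000;                                    % assumed sampling rate (Hz)
t = (0:2*fs-1)'/fs;                           % two seconds
single = sin(2*pi*440*t);                     % single frequency
multi  = (sin(2*pi*440*t) + sin(2*pi*660*t))/2;          % two frequencies, scaled into [-1,1]
am     = (1 + 0.5*sin(2*pi*3*t))/1.5 .* sin(2*pi*440*t); % 3 Hz amplitude modulation
% sound(am, fs);   % uncomment to listen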
Exercise:
Create the above models.
Solution
Solution to the previous exercise:
sineSource01
sineSource02
sineSource03
Delay in Speech/Audio
What is a delay in a signal?
y(n) --> y(n-k)
What effects can delay generate?
Echo
Reverberation
Chorus
Flanging
Single Delay in Audio Signal
Block diagram: the input u(n) is delayed by k samples (z^(-k)), scaled by a, and added to the original input.
Output:
y(n) = u(n) + a*u(n-k)
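% A minimal sketch of y(n) = u(n) + a*u(n-k) using filter(). The
% impulse input and the values of a and k are assumptions.
a = 0.5;  k = 3;                  % assumed gain and delay (in samples)
u = [1 zeros(1, 9)];              % unit impulse as a test input
b = [1 zeros(1, k-1) a];          % FIR coefficients: 1 + a*z^(-k)
y = filter(b, 1, u);              % the impulse reappears k samples later, scaled by a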
Exercise:
Create the above model.
Multiple Delay in Audio Signal
How to create “karaoke” effects:
Block diagram: the input u(n) is added to the output fed back through a delay of k samples (z^(-k)) and a gain a.
Output:
y(n) = u(n) + a*u(n-k) + a^2*u(n-2k) + a^3*u(n-3k) + ...
Multiple Delay in Audio Signal
Parameter values:
• Feedback gain a < 1
• Actual delay time = k/fs
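% A minimal sketch of the feedback delay, y(n) = u(n) + a*y(n-k),
% whose impulse response is 1, a, a^2, ... at lags 0, k, 2k, ...
% The values of fs, a and k are assumptions.
fs = 8000;                        % assumed sampling rate (Hz)
a = 0.5;  k = 2000;               % feedback gain (< 1) and delay in samples
delaySec = k/fs;                  % actual delay time = k/fs
u = [1; zeros(3*k, 1)];           % unit impulse as a test input
den = [1 zeros(1, k-1) -a];       % denominator 1 - a*z^(-k)
y = filter(1, den, u);            % echoes at samples 1, k+1, 2k+1, 3k+1
% With a >= 1 the echoes would not decay, which is why a < 1 is required.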
Exercise:
• Create the above model and change some parameters
to see their effects.
• Modify the model to take microphone input (so you can
start singing karaoke now!)
• Use a “configurable subsystem” to include all possible
input files and the microphone. (See next page.)
Multiple Delay in Audio Signal
How to use the “configurable subsystem” block:
1. Create a library (say, wavinput.mdl)
2. Add a “configurable subsystem” block
3. Fill the dialog box with the library name
Audio Flanging
Flanging sound:
• A sound similar to the sound of a jet plane flying
• “Pitch modulation” due to a variable delay
• dspafxf.mdl (all platforms)
• dspafxf_nt.mdl (for 95/98/NT)
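% A rough sketch of flanging as a slowly varying delay mixed with the
% input. The LFO rate, maximum delay, mix gain and test tone are all
% assumptions; this is not the dspafxf.mdl implementation.
fs = 8000;                                   % assumed sampling rate (Hz)
n = (0:2*fs-1)';
u = sin(2*pi*440*n/fs);                      % assumed test tone
maxDelay = round(0.003*fs);                  % up to 3 ms of delay, in samples
d = (maxDelay/2)*(1 + sin(2*pi*0.5*n/fs));   % 0.5 Hz LFO, delay in samples
y = u;
for idx = 1:length(u)
    p = idx - d(idx);                        % fractional read position
    if p >= 1
        lo = floor(p);  frac = p - lo;
        hi = min(lo + 1, length(u));
        delayed = (1 - frac)*u(lo) + frac*u(hi);   % linear interpolation
        y(idx) = u(idx) + 0.7*delayed;       % mix the delayed copy with the input
    end
end
y = y/max(abs(y));                           % normalize to [-1,1]
% sound(y, fs);   % uncomment to listen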
Audio Flanging
Original spectrogram:
Modified spectrogram:
Signal Processing Using sptool
To invoke sptool, type “sptool”.
Speech Production
How is speech produced?
Speech is produced when air is forced from the
lungs through the vocal cords (glottis) and along
the vocal tract.
Analogy to System Theory:
Input: air forced into the vocal cords
Output: media vibration
System (or filter): vocal tract
Pitch frequency: frequency of the input
Formant frequency: resonant frequency
Source Filter Model of Speech
The source-filter model of speech production:
Speech is split into a rapidly varying excitation
signal and a slowly varying filter. The envelope of
the power spectrum contains the vocal tract
information.
Two important characteristics of the model are
fundamental (pitch) frequency (f0) and formants
(F1, F2, F3, …)
Frame Analysis of Speech Signal
Speech waveform: (figure) zooming in shows the signal split into frames, with overlap between neighboring frames
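% A minimal sketch of frame blocking with overlap. The frame size,
% overlap and test signal are assumptions for illustration.
fs = 8000;                        % assumed sampling rate (Hz)
y = sin(2*pi*200*(0:fs-1)'/fs);   % assumed one-second test signal
frameSize = 256;  overlap = 128;  % assumed values
step = frameSize - overlap;       % hop between frame starts
frameCount = floor((length(y) - overlap)/step);
frames = zeros(frameSize, frameCount);
for i = 1:frameCount
    startIdx = (i-1)*step + 1;
    frames(:, i) = y(startIdx : startIdx + frameSize - 1);   % one frame per column
end
% The last "overlap" samples of each frame equal the first ones of the next.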
Spectrogram
Spectrogram (specgram.m) displays short-time
frequency contents:
Waveform:
Spectrogram:
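% A hand-rolled sketch of what a spectrogram computes: the magnitude
% of the FFT of each windowed frame. The swept test signal, frame
% size and Hamming window are assumptions; specgram.m itself offers
% more options.
fs = 8000;                                   % assumed sampling rate (Hz)
t = (0:fs-1)'/fs;
y = sin(2*pi*(500*t + 1000*t.^2));           % frequency sweeps from 500 to 2500 Hz
frameSize = 256;  step = 128;                % assumed frame size and hop
w = 0.54 - 0.46*cos(2*pi*(0:frameSize-1)'/(frameSize-1));   % Hamming window
frameCount = floor((length(y) - frameSize)/step) + 1;
S = zeros(frameSize/2 + 1, frameCount);
for i = 1:frameCount
    seg = y((i-1)*step + (1:frameSize)) .* w;   % windowed frame
    X = fft(seg);
    S(:, i) = abs(X(1:frameSize/2 + 1));        % keep bins from 0 to fs/2
end
imagesc((0:frameCount-1)*step/fs, (0:frameSize/2)*fs/frameSize, 20*log10(S + eps));
axis xy; xlabel('Time in seconds'); ylabel('Frequency (Hz)');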
Real-time Spectrogram
Try “dspstfft_win32”:
Spectrum:
Spectrogram:
Pitch and Formants
Pitch and formants can be defined visually:
First formant: F1
Second formant: F2
Pitch period = 1/f0
• http://cslu.cse.ogi.edu/tutordemos/SpectrogramRe
Waveform:
Spectrogram:
“compute”
Pitch Determination Algorithms
Time-domain:
• Auto-correlation
• AMDF (Average Magnitude Difference Function)
• Gold-Rabiner algorithm (1969)
Frequency-domain:
• Cepstrum (Noll 1964)
• Harmonic product spectrum (Schroeder 1968)
Others:
• SIFT (Simplified inverse filter tracking)
• Maximum likelihood
• Neural network approach
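% A minimal sketch of AMDF from the list above: the pitch period is
% taken as the lag where the average magnitude difference is smallest.
% The synthetic frame (period 30 samples) and the lag search range
% are assumptions.
frameSize = 128;
s = sin(2*pi*(0:frameSize-1)'/30);      % assumed frame with a 30-sample period
lags = 20:50;                           % assumed search range
amdf = zeros(size(lags));
for i = 1:length(lags)
    h = lags(i);
    amdf(i) = mean(abs(s(h+1:frameSize) - s(1:frameSize-h)));
end
[~, idx] = min(amdf);                   % deepest valley of the AMDF curve
pitchPeriod = lags(idx);                % pitch period in samples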
Autocorrelation of Each Frame
Let s(k) be a frame of size 128.
s(k): samples 1 to 128
s(k-h): s(k) shifted by h=30
x(30) = dot product of the overlapped parts
= sum(s(31:128).*s(1:98))
Autocorrelation x(h): the peak at h=30 marks the pitch period
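% A minimal sketch of frame autocorrelation for pitch detection. The
% synthetic frame (period 30 samples) and the lag search range are
% assumptions. x(h+1) stores the value for lag h because MATLAB
% indexing starts at 1.
frameSize = 128;
s = sin(2*pi*(0:frameSize-1)'/30);      % assumed frame with a 30-sample period
maxLag = 64;                            % assumed largest lag to test
x = zeros(maxLag + 1, 1);
for h = 0:maxLag
    x(h+1) = sum(s(h+1:frameSize) .* s(1:frameSize-h));   % dot product of the overlap
end
[~, idx] = max(x(21:end));              % search lags 20..64, skipping the h = 0 peak
pitchPeriod = idx + 19;                 % convert the index back to a lag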
Autocorrelation via DSP Blockset
Real-time autocorrelation demo:
Exercise:
Construct the above model and try it.
Pitch Tracking via Autocorrelation
Real-time pitch tracking via autocorrelation:
pitch2.mdl
Formant Analysis
Characteristics of formants:
• Formants are perceptually defined.
• The corresponding physical property is the
frequencies of resonances of the vocal tract.
• Formant analysis is useful because the positions of the
first two formants largely identify a vowel.
Computation methods:
• Peak picking on the smoothed spectrum
• Peak picking on the LP spectrum
• Factoring for the LP roots
• Fitting of mixture of Gaussians
Formant Analysis
TrackDraw:
• A package for formant synthesis with options to
sketch formant tracks on a spectrogram.
• http://www.utdallas.edu/~assmann/TRACKDRAW/trackdraw.html
Formant Location Algorithm
• MATLAB code by Michelle Jamrozik
• http://ece.clemson.edu/speech/files.htm
Speech Waveform Coding
Time domain coding
• PCM: Pulse Code Modulation
• DPCM: Differential PCM
Frequency domain coding
• Sub-band coding
• Transform coding
Speech Coding in MATLAB
http://www.eas.asu.edu/~speech/education/educ1.html
Conclusions
Ideal tools for speech/audio signal processing:
• MATLAB
• Signal Processing Toolbox
• DSP Blockset
• Reliable functions: well-established and tested
• Visible graphical algorithm design tools
• High-level programming language yet C-compatible
• Powerful visualization capabilities
• Easy debugging
• Integrated environment
References
[1] “Discrete-Time Processing of Speech Signals”, by Deller, Proakis and Hansen, Prentice Hall, 1993
[2] “Fundamentals of Speech Recognition”, by Rabiner and Juang, Prentice Hall, 1993
[3] “Effects Explained”, http://www.harmonycentral.com/Effects/effects-explained.html
[4] “TrackDraw”, http://www.utdallas.edu/~assmann/TRACKDRAW/trackdraw.html
[5] “Speech Coding in MATLAB”, http://www.eas.asu.edu/~speech/education/educ1.html
```