2006
Speech/Audio Signal Processing in MATLAB/Simulink
Speech/Audio Signal Processing
in MATLAB/Simulink
J.-S. Roger Jang (張智星)
CS Dept, Tsing-Hua Univ, Taiwan
(清華大學 資訊系)
http://www.cs.nthu.edu.tw/~jang
Outline
Wave file manipulation
Reading, writing, recording ...
Time-domain processing
Delay, filtering, sptools …
Frequency-domain processing
Spectrogram
Pitch determination
Auto-correlation, SIFT, AMDF, HPS ...
Others
Formant estimation, speech coding
Toolbox/Blockset Used
MATLAB
Simulink
Signal Processing Toolbox
DSP Blockset
MATLAB Primer
Before you start, you need to get familiar with MATLAB.
Please read “MATLAB Primer” at the following page:
http://neural.cs.nthu.edu.tw/jang/demo/demoDownload.asp
Exercise:
1. Please plot two curves y=sin(2*t) and y=cos(3*t) in
the same figure.
2. Please plot x vs. y where x=sin(2*t) and y=cos(3*t).
To Read a Wave File
To read an MS .wav file (PCM format only), use wavread:
y = wavread(file)
[…] = wavread(file, [n1, n2])
[y, fs, nbits, opts] = wavread(file)
[…] = wavread(file, n)
[y, fs, nbits] = wavread(file)
If the wav file is stereo, y will be a two-column
matrix.
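Recent MATLAB releases have removed wavread in favour of audioread, so the calls above may not run as written. The following is a minimal sketch of the same reading steps, assuming audioread and audiowrite are available. The temporary file name and the test tones are made up for illustration and are not part of the original material.

```matlab
% Hypothetical test file: write a short stereo tone so the example is self-contained.
fs = 8000;                                       % sampling rate in Hz (assumed)
t = (0:fs-1)'/fs;                                % one second of time stamps
yOut = 0.5*[sin(2*pi*440*t), sin(2*pi*660*t)];   % two columns = two channels
wavFile = [tempname, '.wav'];                    % temporary file name
audiowrite(wavFile, yOut, fs);                   % 16-bit PCM by default

[y, fsRead] = audioread(wavFile);                % read the whole file
yPart = audioread(wavFile, [1, 100]);            % read samples 1 to 100 only
fprintf('fs = %d Hz, size(y) = %d x %d\n', fsRead, size(y,1), size(y,2));
delete(wavFile);                                 % clean up the temporary file
```

As with wavread, a stereo file comes back as a two-column matrix, one column per channel.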
To Read a Wav File
Example (wavRead01.m):
[y, fs] = wavread('singapore.wav');
plot((1:length(y))/fs, y);
xlabel('Time in seconds');
ylabel('Amplitude');
Exercise:
1. Plot the waveform of “rrrrr.wav”. Use MATLAB’s “zoom”
button to find where the consecutive curling “R” sounds occur.
2. Plot the two-channel waveform in “flanger.wav”.
Solution to the Previous Exercise
wavRead02.m:
[y, fs] = wavread('flanger.wav');    % y has two columns, one per channel
time = (1:size(y,1))/fs;             % time axis in seconds
subplot(2,1,1), plot(time, y(:,1));  % first channel
subplot(2,1,2), plot(time, y(:,2));  % second channel
To Play Wav Files
To play sound using Windows audio output device:
wavplay, sound, soundsc
wavplay(y, fs)
wavplay(y, fs, 'async'): non-blocking call
wavplay(y, fs, 'sync'): blocking call
sound(y, fs)
soundsc(…): autoscale the sound
Example (wavPlay01.m):
[y, fs] = wavread('rrrrr.wav');
wavplay(y, fs);
Exercise:
Follow the example to play “flanger.wav”.
To Read/Play Using DSP Blocks
To read/play sound using DSP Blockset:
DSP Blockset/DSP Sources/From Wave File
DSP Blockset/DSP Sinks/To Wave Device
Example:
Frame-based operation!
Exercise:
Create a model as shown above.
Solution
Solution to the previous exercise:
slWavFilePlay01.mdl
To Write a Wave File
To write MS wave files: wavwrite
wavwrite(y, fs, nbits, wavefile)
“nbits” must be 8 or 16.
“y” must have two columns for stereo data.
Amplitude values outside [-1,1] are clipped.
Example (wavWrite01.m):
[y, fs] = wavread('rrrrr.wav');
wavwrite(y, fs*1.2, 8, 'testout.wav');
!start testout.wav
Exercise:
Try out the above example.
To Record a Wave File
To record wave files:
1. Use the recording utility under WinXP.
2. Use “wavrecord” under MATLAB.
3. Use “From Wave Device” under Simulink, under “DSP
Blocksets/Platform Specific IO/Windows (Win32)”
Example:
1. Go ahead and try WinXP recording utility!
2. Try “wavRecord01.m”
3. Try “slWavFileRecord01.mdl”
Exercise:
Try out the above examples.
Time-Domain Speech Signals
A typical time-domain plot of speech signals:
Amplitude: volume or intensity
Frequency: pitch
Changing Wave Playback Param.
To control the play of a sound:
• Normal: wavplay(y, fs)
• High volume: wavplay(2*y, fs)
• Low volume: wavplay(0.5*y, fs)
• High pitch (and faster): wavplay(y, 1.2*fs)
• Low pitch (and slower): wavplay(y, 0.8*fs)
Exercise:
• Try “wavPlay01.m” and trace the code.
• Create “wavPlay02.m” such that you can record your
own voice on the fly.
Time-Domain Signal Processing
Take-home exercise:
How to get a high pitch with the same time span?
Synthetic Sounds
Use a sine wave generator (under DSP blocksets)
to produce sounds
Single frequency:
Multiple frequencies:
Amplitude modulation:
Exercise:
Create the above models.
Solution
Solution to the previous exercise:
sineSource01
sineSource02
sineSource03
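The three synthetic sounds named on the previous slide (single frequency, multiple frequencies, amplitude modulation) can also be sketched in plain MATLAB without Simulink. This is only an illustration: the sampling rate, tone frequencies, and modulation rate below are assumed values, not taken from the sineSource models.

```matlab
fs = 8000;                          % sampling rate in Hz (assumed)
t = (0:fs-1)'/fs;                   % one second of samples

% Single frequency: a 440 Hz tone
y1 = sin(2*pi*440*t);

% Multiple frequencies: average of three tones so the sum stays within [-1, 1]
y2 = (sin(2*pi*440*t) + sin(2*pi*554*t) + sin(2*pi*659*t))/3;

% Amplitude modulation: 440 Hz carrier with a 5 Hz envelope between 0 and 1
envelope = 0.5*(1 + sin(2*pi*5*t));
y3 = envelope .* sin(2*pi*440*t);

% To listen (needs an audio device): sound(y1, fs); sound(y2, fs); sound(y3, fs);
```

Keeping every signal within [-1, 1] avoids clipping when the result is played or written to a wave file.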
Delay in Speech/Audio
What is a delay in a signal?
y(n) --> y(n-k)
What effects can delay generate?
Echo
Reverberation
Chorus
Flanging
Single Delay in Audio Signal
Block diagram:
Input u(n) --> delay z^(-k) --> gain a --> added to u(n) --> Output y(n)
y(n) = u(n) + a*u(n-k)
Simulink model:
Exercise:
Create the above model.
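The single-delay relation y(n) = u(n) + a*u(n-k) can be checked in plain MATLAB before building the Simulink model. A minimal sketch, assuming a unit impulse as the test input; the values of fs, k, and a are chosen only for illustration.

```matlab
fs = 8000;                      % sampling rate in Hz (assumed)
k = round(0.25*fs);             % delay of 0.25 s, in samples
a = 0.5;                        % gain of the delayed copy
u = zeros(fs, 1);  u(1) = 1;    % unit impulse as test input (one second long)

uDelayed = [zeros(k, 1); u(1:end-k)];   % u(n-k), with zeros before the signal starts
y = u + a*uDelayed;                     % y(n) = u(n) + a*u(n-k)

% Equivalent FIR filter form: b = [1, zeros(1, k-1), a]; y = filter(b, 1, u);
```

With an impulse as input, the output holds the original impulse and one echo of height a, k samples later. Replacing u with a recorded signal gives a single echo of that signal.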
Multiple Delay in Audio Signal
How to create “karaoke” effects:
Input u(n) --> feedback loop with delay z^(-k) and gain a --> Output y(n)
y(n) = u(n) + a*u(n-k) + a^2*u(n-2k) + a^3*u(n-3k) + ...
Simulink model:
Multiple Delay in Audio Signal
Parameter values:
• Feedback gain a < 1
• Actual delay time = k/fs
Exercise:
• Create the above model and change some parameters
to see their effects.
• Modify the model to take microphone input (so you can
start singing karaoke now!)
• Use a “configurable subsystem” to include all possible
input files and the microphone. (See next page.)
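The multiple-delay output is the series u(n) + a*u(n-k) + a^2*u(n-2k) + ..., which is what a feedback loop y(n) = u(n) + a*y(n-k) produces. The sketch below implements that feedback form with MATLAB's filter function and reads the echo amplitudes off the impulse response. The parameter values are assumptions chosen for illustration.

```matlab
fs = 8000;                          % sampling rate in Hz (assumed)
k = 1000;                           % delay in samples; delay time = k/fs = 0.125 s
a = 0.6;                            % feedback gain, |a| < 1 keeps the loop stable
u = zeros(fs, 1);  u(1) = 1;        % unit impulse as test input

% Feedback form y(n) = u(n) + a*y(n-k), which expands to
% u(n) + a*u(n-k) + a^2*u(n-2k) + a^3*u(n-3k) + ...
den = [1, zeros(1, k-1), -a];       % denominator coefficients of the feedback filter
y = filter(1, den, u);

echoes = y(1:k:end)';               % impulse response sampled at n = 1, k+1, 2k+1, ...
disp(echoes(1:4));                  % successive powers of a
```

Each echo is a times the previous one, so a gain closer to 1 gives a longer tail, while a gain of 1 or more makes the echoes grow instead of dying out.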
Multiple Delay in Audio Signal
How to use a “configurable subsystem” block?
1. Create a library (say, wavinput.mdl)
2. Get a block of “configurable subsystem”
3. Fill the dialog box with the library name
Audio Flanging
Flanging sound:
• A sound similar to the sound of a jet plane flying
overhead, or a "whooshing" sound
• “Pitch modulation” due to a variable delay
Simulink demo:
• dspafxf.mdl (all platforms)
• dspafxf_nt.mdl (for 95/98/NT)
Audio Flanging
Simulink model:
Original spectrogram:
Modified spectrogram:
Signal Processing Using sptool
To invoke sptool, type “sptool”.
Speech Production
How is speech produced?
Speech is produced when air is forced from the
lungs through the vocal cords (glottis) and along
the vocal tract.
Analogy to System Theory:
Input: air forced into the vocal cords
Output: vibration of the medium (air)
System (or filter): vocal tract
Pitch frequency: frequency of the input
Formant frequency: resonant frequency
Source Filter Model of Speech
The source-filter model of speech production:
Speech is split into a rapidly varying excitation
signal and a slowly varying filter. The envelope of
the power spectra contains the vocal tract
information.
Two important characteristics of the model are the
fundamental (pitch) frequency (f0) and the formants
(F1, F2, F3, …).
Frame Analysis of Speech Signal
Speech waveform:
Zoom in
Overlap
Frame
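Frame analysis cuts the waveform into short, overlapping frames that are then processed one at a time. A minimal sketch of the frame blocking step follows. The frame size, overlap, and the sine wave standing in for speech are all assumptions made for illustration.

```matlab
fs = 8000;                              % sampling rate in Hz (assumed)
y = sin(2*pi*200*(0:fs-1)'/fs);         % one second of a 200 Hz tone as stand-in speech
frameSize = 256;                        % samples per frame (assumed)
overlap = 128;                          % samples shared by neighbouring frames (assumed)
hop = frameSize - overlap;              % step between frame starts

frameCount = floor((length(y) - overlap)/hop);
frames = zeros(frameSize, frameCount);  % one frame per column
for i = 1:frameCount
    startIndex = (i-1)*hop + 1;
    frames(:, i) = y(startIndex : startIndex + frameSize - 1);
end
```

The last `overlap` samples of each column reappear as the first samples of the next column, which is the overlap indicated in the figure.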
Spectrogram
Spectrogram (specgram.m) displays short-time
frequency contents:
Wave form :
Spectrogram :
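What a spectrogram computes can be sketched with base MATLAB alone: window each frame, take its FFT, and stack the magnitude spectra as columns. This is an illustrative reimplementation under assumed parameters, not the code of specgram.m. A 1 kHz tone stands in for speech.

```matlab
fs = 8000;                                   % sampling rate in Hz (assumed)
t = (0:fs-1)'/fs;
y = sin(2*pi*1000*t);                        % 1 kHz test tone in place of speech
frameSize = 256;  hop = 128;                 % frame length and step (assumed)
win = 0.54 - 0.46*cos(2*pi*(0:frameSize-1)'/(frameSize-1));   % Hamming window

frameCount = floor((length(y) - frameSize)/hop) + 1;
S = zeros(frameSize/2 + 1, frameCount);      % magnitude spectrum, one frame per column
for i = 1:frameCount
    seg = y((i-1)*hop + (1:frameSize)) .* win;
    X = fft(seg);
    S(:, i) = abs(X(1:frameSize/2 + 1));     % keep 0 .. fs/2
end
freq = (0:frameSize/2)'*fs/frameSize;        % frequency axis in Hz
[~, peakBin] = max(S(:, 1));
peakFreq = freq(peakBin);                    % close to 1000 Hz for this tone

% To display: imagesc((0:frameCount-1)*hop/fs, freq, 20*log10(S + eps)); axis xy;
```

A longer frame gives finer frequency resolution (fs/frameSize Hz per bin) but coarser time resolution.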
Real-time Spectrogram
Try “dspstfft_win32”:
Spectrum:
Spectrogram:
Pitch and Formants
Pitch and formants can be defined visually:
Pitch period = 1/f0
First formant: F1
Second formant: F2
Spectrogram Reading
Spectrogram Reading
• http://cslu.cse.ogi.edu/tutordemos/SpectrogramReading/spectrogram_reading.html
Waveform and spectrogram of “compute”:
Pitch Determination Algorithms
Time-domain:
• Auto-correlation
• AMDF (Average Magnitude Difference Function)
• Gold-Rabiner algorithm (1969)
Frequency-domain:
• Cepstrum (Noll 1964)
• Harmonic product spectrum (Schroeder 1968)
Others:
• SIFT (Simplified inverse filter tracking)
• Maximum likelihood
• Neural network approach
Autocorrelation of Each Frame
Let s(k) be a frame of size 128.
s(k): samples 1 to 128
s(k-h): s(k) shifted by h samples (here h=30)
x(30) = dot product of the overlapped part
= sum(s(31:128).*s(1:98))
Autocorrelation x(h): the peak (here at h=30) marks the pitch period
Autocorrelation via DSP Blockset
Real-time autocorrelation demo:
Exercise:
Construct the above model and try it.
Pitch Tracking via Autocorrelation
Real-time pitch tracking via autocorrelation:
pitch2.mdl
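The frame-wise autocorrelation used for pitch tracking can be sketched offline in MATLAB. For each lag h, x(h) is the dot product of the frame with itself shifted by h samples, and the lag of the strongest peak is taken as the pitch period. The synthetic frame, the lag range, and the 500 Hz upper pitch limit below are assumptions for illustration. They are not taken from pitch2.mdl.

```matlab
fs = 8000;                               % sampling rate in Hz (assumed)
frameSize = 512;                         % frame length in samples (assumed)
f0 = 200;                                % true pitch of the synthetic frame
n = (0:frameSize-1)';
s = sin(2*pi*f0*n/fs) + 0.5*sin(2*pi*2*f0*n/fs);   % tone with one harmonic

maxLag = 200;
x = zeros(maxLag, 1);                    % x(h) for lags h = 1 .. maxLag
for h = 1:maxLag
    x(h) = sum(s(h+1:end) .* s(1:end-h));   % dot product of the overlapped part
end

minLag = round(fs/500);                  % ignore lags shorter than a 500 Hz pitch (assumed range)
[~, idx] = max(x(minLag:end));
pitchPeriod = idx + minLag - 1;          % lag of the strongest peak, in samples
pitchHz = fs/pitchPeriod;
fprintf('Pitch period = %d samples, pitch = %.1f Hz\n', pitchPeriod, pitchHz);
```

Very small lags are skipped because the autocorrelation is always largest near lag zero, which would otherwise be mistaken for the pitch peak.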
Formant Analysis
Characteristics of formants:
• Formants are perceptually defined.
• The corresponding physical property is the
frequencies of resonances of the vocal tract.
• Formant analysis is useful as the position of the
first two formants pretty much identifies a vowel.
Computation methods:
• Peak picking on the smoothed spectrum
• Peak picking on the LP spectrum
• Factoring for the LP roots
• Fitting of mixture of Gaussians
Formant Analysis
Track Draw:
• A package for formant synthesis with options to
sketch formant tracks on a spectrogram.
• http://www.utdallas.edu/~assmann/TRACKDRAW/trackdraw.html
Formant Location Algorithm
• MATLAB code by Michelle Jamrozik
• http://ece.clemson.edu/speech/files.htm
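Of the computation methods listed for formant analysis, factoring for the LP roots can be sketched in a few lines of base MATLAB. The example below is an illustration under strong assumptions and is not the code from the packages listed above. The "vowel" is the impulse response of two synthetic resonances at assumed frequencies, and the linear prediction coefficients come from solving the autocorrelation normal equations directly instead of calling a toolbox function.

```matlab
fs = 8000;                                    % sampling rate in Hz (assumed)
F = [700, 1200];                              % resonance frequencies to recover (assumed values)
r = 0.97;                                     % pole radius, sets the bandwidth (assumed)
den = 1;
for i = 1:length(F)
    den = conv(den, [1, -2*r*cos(2*pi*F(i)/fs), r^2]);   % one pole pair per resonance
end
s = filter(1, den, [1; zeros(2047, 1)]);      % impulse response as a stand-in vowel frame

p = 4;                                        % LP order: two poles per expected formant
N = length(s);
ac = zeros(p+1, 1);                           % autocorrelation at lags 0 .. p
for h = 0:p
    ac(h+1) = sum(s(1+h:N) .* s(1:N-h));
end
a = toeplitz(ac(1:p)) \ ac(2:p+1);            % normal equations of the autocorrelation method
poles = roots([1, -a']);                      % roots of the LP polynomial
poles = poles(imag(poles) > 0);               % keep one pole of each conjugate pair
formants = sort(angle(poles))*fs/(2*pi);      % pole angle -> frequency in Hz
disp(formants');
```

On real speech the frame would normally be windowed and a higher LP order used, and roots with wide bandwidths (small radius) are usually discarded before the remaining angles are read as formants.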
Speech Waveform Coding
Time domain coding
• PCM: Pulse Code Modulation
• DPCM: Differential PCM
• ADPCM: Adaptive Differential PCM
(dspadpcm.mdl)
Frequency domain coding
• Sub-band coding
• Transform coding
Speech Coding in MATLAB
http://www.eas.asu.edu/~speech/education/educ1.html
Conclusions
Ideal tools for speech/audio signal processing:
• MATLAB
• Simulink
• Signal Processing Toolbox
• DSP Blockset
Advantages:
• Reliable functions: well-established and tested
• Visible graphical algorithm design tools
• High-level programming language yet C-compatible
• Powerful visualization capabilities
• Easy debugging
• Integrated environment
References
[1] “Discrete-Time Processing of Speech Signals”, by Deller, Proakis and Hansen, Prentice Hall, 1993
[2] “Fundamentals of Speech Recognition”, by Rabiner and Juang, Prentice Hall, 1993
[3] “Effects Explained”, http://www.harmonycentral.com/Effects/effects-explained.html
[4] “TrackDraw”, http://www.utdallas.edu/~assmann/TRACKDRAW/trackdraw.html
[5] “Speech Coding in MATLAB”, http://www.eas.asu.edu/~speech/education/educ1.html