The following is an archive of e-mail messages sent to students in Joe Campbell's Spring 1996 Speech Processing class at JHU. Each message begins with "Date:" and the students' questions are generally quoted with ">" characters in traditional Internet fashion. You'll find errata for the textbook (Deller, Proakis, and Hansen; aka "DPH", "D,P,H", and "John, John, and John"), homework tips, and exam pointers.

#####

Date: Tue, 13 Feb 1996 22:20:54 -0500
From: jpcampb (Joseph P Campbell)
Subject: Re: Homework #2

One of your classmates asked these questions (>) and here are my answers:

> Q1: The text says in section 2.3 under diphthongs that "Fig. 2.14 confirms
> that the three diphthongs /Y/, /W/, and /O/ move from one vowel target to a
> second, . . .". I can't see diphthong /O/ in the figure.

Those tricky diphthongs! Researchers agree on these three and there are possibly others (this is as much art as it is science). If you look at Fig 2.14 (p. 129), you'll see a diphthong symbol /cI/ at about (500, 1100 Hz) (watch those log scales). That is also known as /O/ ("boy"). Two other notes:

- Another erratum in Fig 2.14: at about (440, 1020 Hz) the /u/ should be /U/.
- The IPA symbols keep changing! So, on p. 118 you'll see this diphthong /O/ labeled with a /backward-c I/ instead of /cI/. (If you want to go wild, you can download IPA fonts from the 'net or get them off the Adobe Type-On-Call CD-ROM to make all these funky characters in your papers.)

> Q2: Problem 2.11 asks "Which formants require the greatest articulatory
> movement?" Does this mean which formant transition (e.g., for a diphthong)
> requires the greatest movement?

To clarify Problem 2.11, the last sentence could be restated as: Which formant requires the greatest articulatory movement during the production of the diphthong /W/?

> Q3: In problem 2.16(s) none of the groups has words that all contain the
> same phoneme. Can this be correct or am I missing something?

Let's look at 2.16(a), group 2: s"u"ch, p"u"ddle, to"u"ch, and st"u"d all contain the phoneme /AX/ ("ago"). Can you hear it?

#####

Date: Tue, 13 Feb 1996 23:48:58 -0500
From: jpcampb (Joseph P Campbell)
Subject: Re: Alg #1

> I have started working on Algorithm #1 and I have a few questions
> for you. I am not really sure what you are looking for, and
> after looking at the grading sheets, I am still not sure.

The Algorithm statements are intentionally flexible to allow students full credit for probing different depths in different areas. It would be perfectly acceptable for one student to program a bunch of simple windows (e.g., rectangular, Hamming, Hanning, Blackman, and Bartlett/triangle) while another programs a very complex window (e.g., an eigenvector based window). (I should mention that 99.9% of the students so far have taken the option to do many simple windows.)

> 1. There are numerous window types available on Matlab. I
> am assuming I can use those functions to produce the windows.

Since you haven't previously used these windows, you need to program them. However, you can use Matlab's built-in windows to verify your own.

> Also, I am assuming that using ONLY the window types which
> require 1 input, N, is acceptable. Are these safe assumptions?

The windows will all be 1-dimensional here (i.e., we won't be using multidimensional windows), but you're probably referring to windows with one or more adjustable parameters (e.g., Chebyshev and Kaiser). In terms of difficulty, these windows are in between the examples above, so you could program both the Chebyshev and Kaiser windows for full credit.

> 2. There is nothing real descriptive on the test data which
> you would like to see. On the grading sheet, it simply
> states "sufficiency", and on the assignment, it states
> "variety". Can you give me a hint on how many test signals
> you would like to see?

That depends on what the test signals are.
I recommend using test signals that allow you to duplicate the figures and tables in the textbook. I strongly recommend using simple input signals where you know what the mathematical expression for the output should be so you can compute and plot an error signal. You also need to be sure that you've fully exercised your algorithm(s), including testing the boundary conditions of your arrays. You might want to exercise your functions independently, glue them together, and then test the result. Some people may find just a few test signals do the trick while others may need many; as long as you've done the above, you get full credit either way.

> Are we supposed to just stick with
> speech-like signals? I am guessing that we should simulate
> some "vowel" and "consonant" sounds.

You might want to do that for fun after finishing the project. Using speech (especially real speech) signals makes it challenging to verify the correctness of your algorithm because speech is not a simple input signal. But don't worry, we'll work with synthetic speech soon...

#####

Date: Wed, 14 Feb 1996 14:35:06 -0500
From: jpcampb (Joseph P Campbell)
Subject: Re: Further clarification of Alg #1

A follow-up question on Alg #1 from the speech class:

> Q1: For algorithm #1, do you want us to actually code up an FFT
> algorithm or can we just use the brute force DFT definition?

You can use any DFT algorithm you wish, which includes the class of algorithms commonly called FFTs. The processing power you have available may also influence your choice (they don't call them fast for nothing).

> Q2: Why can't we just use the FFT function in Matlab? We've had
> the FFT algorithm crammed down our throats by now.

You could use Matlab's built-in FFT, but as stated on the Speech Processing Term Project, you'll lose a great deal of credit if you don't also program a DFT yourself.
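
For orientation only, here is a minimal Python sketch of the brute-force DFT definition, one simple window, and the kind of error-signal test recommended above. Python is used purely for illustration (the class projects are in Matlab), and the function names `dft`, `hamming`, and `max_dft_error` are hypothetical, not part of the assignment.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def hamming(n_pts):
    """Symmetric Hamming window, w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_pts == 1:
        return [1.0]  # avoid dividing by N-1 = 0
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_pts - 1)) for n in range(n_pts)
    ]


def max_dft_error(n_pts, k0):
    """Peak error against a known closed form.

    For x[n] = exp(j*2*pi*k0*n/N) the DFT is N at bin k0 and 0 elsewhere,
    so the error signal can be computed exactly.
    """
    x = [cmath.exp(2j * math.pi * k0 * n / n_pts) for n in range(n_pts)]
    expected = [n_pts if k == k0 else 0.0 for k in range(n_pts)]
    return max(abs(a - b) for a, b in zip(dft(x), expected))
```

A test of this kind (a simple input with a known transform) gives an error signal that should sit at rounding-noise level; anything larger points to an indexing or sign mistake.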

I see where you're coming from, but many professionals who are also students have been out of school for a while and this is a good project to get their feet wet and get (re)acquainted with Matlab. This is also one of the few projects where most of the class typically gets close to perfect scores, so don't blow this opportunity to bolster your grade!

Another question on Alg #1 from the speech class:

> A quick question regarding the first project. You have specified
> independent sample, transform and window lengths. In general, ignoring
> the window, if the transform size is bigger than the data, the data is
> zero padded, and if the transform size is smaller than the data, the data
> is truncated. I'm not sure how it should work with the added complication of
> the window length. It would not seem to make much sense if the window size
> was specified to be greater than the transform size as truncating a window
> changes the window characteristics.

These are precisely some of the issues that students (and DFT designers) need to deal with. One way to handle this is to check that these arguments make sense. (Although not part of Alg #1, it gets even more interesting when considering overlapped transforms as commonly used in spectrograms and speech processing.)

#####

Date: Wed, 14 Feb 1996 14:53:17 -0500
From: jpcampb (Joseph P Campbell)
Subject: Re: Further clarification of problem 2.16

A follow-up question on 2.16 from the speech class:

> > From jpcampb Tue Feb 13 22:21 EST 1996
> >
> > > Q3: In problem 2.16(s) none of the groups has words that all contain the
> > > same phoneme. Can this be correct or am I missing something?
> >
> > Let's look at 2.16(a), group 2: s"u"ch, p"u"ddle, to"u"ch, and st"u"d all
> > contain the phoneme /AX/ ("ago"). Can you hear it?
>
> Ya I hear it. But the question states:
>
> "Are the sounds from each group the same phoneme?"
>
> I would have to answer NO. Take group 2 for example. "put" does
> not have the same phoneme as the rest of the group therefore the
> answer to the question is NO.

That is correct. Please remove the conditional part of the following sentence of the question (i.e., strike "If so,"). It should read: "Give one additional word which contains the same speech sound (give the same word for the same sound)." Also, remember to provide answers for the final sentence of 2.16(a).

#####

Date: Wed, 14 Feb 1996 15:37:15 -0500
From: jpcampb (Joseph P Campbell)
Subject: Re: Homework #2

Another question on HW 2 from the speech class:

> Which phoneme notation would you like us to use? IPA or ARPABET
> single letter? IPA is more difficult to write (and 'type').

Whatever you like. My solutions will usually be given in single-symbol ARPAbet notation to match the textbook's usual notation.

#####

Date: Tue, 20 Feb 1996 18:28:40 -0500
From: jpcampb (Joseph P Campbell)
Subject: Re: algorithm 1

> I have a couple of questions regarding algorithm 1:
> 1. What exactly do you expect in the document, and how detailed do you want it?

Enough detail so that your project can be understood by someone with the prerequisites for this course. Most students take 2 to 10 pages (excluding code and plots) to write up a project. If you're comparing your results with published results or if you're using published data, it's a good idea to include a copy of the relevant portions of the publications in your report (if this is infeasible, include a citation). Another way to look at this question is to put yourself in the grader's position and refer to the ALGORITHM #1 grading sheet I handed out. Make sure you're strong in the 4 grading categories (documentation, code, test data, and algorithm).

> 2. The algorithm part, do you want us to write out what our algorithm is
> or is that part of our coding?

Your documentation should contain a description of your algorithm.
Most students find it helpful to include an algorithm block diagram of their signal processing steps with labeled inputs and outputs with ties to their code.

#####

Date: Mon, 26 Feb 1996 14:28:42 -0500
From: jpcampb (Joseph P Campbell)
Subject: Prob 2.6 Errata

Please note that the pitch period for the typical male speaker should be 8.0 msec (not 80 msec) in Problem 2.6.

#####

Date: Mon, 4 Mar 1996 17:03:23 1996
Subject: HW Prob 7.4

I have a question about problem 7.4 on the homework. First let me describe what I think the problem is asking for.

We have a random process S which we wish to quantize. Here no assumptions are made about its distribution or statistics. Define X = S + D, where D is dither noise uniformly distributed over [-delta/2, delta/2]. Let Xq = Q[X], i.e., the quantized value of X. Define the error as Ed = S - Xq, which is the error between the original signal and the quantized signal plus dither.

Part a) of the problem asks us to show that Ed is independent of S. For two things to be independent their joint distribution must factor into the product of the marginals, f(e,s) = f(e)f(s). Or equivalently the conditional distributions must equal the marginals: f(e|s) = f(e), f(s|e) = f(s).

Consider the conditional distribution f(e|s). Let y(i) for 0 <= i <= L be the quantization region thresholds and xq(i) for 1 <= i <= L be the quantized values. Say xq(i) < s < xq(i+1); then X is distributed uniformly on [s-delta/2, s+delta/2].

 xq(i-1)     y(i)       xq(i)  s  y(i+1)     xq(i+1)    y(i+2)
 ---+----------|----------+----x-----|----------+----------|-----

 f(x|s)
                    _______________________
                    |                     |
--------------------+----------x----------+-------------------------------
                s-delta/2      s      s+delta/2

Now if we look at the quantized values Xq it is clear that for a given value of s, E will take on only two values, i.e., E=xq(i) with probability 1-(s-xq(i))/delta, and E=xq(i+1) with probability (s-xq(i))/delta. That is, f(e|s) is a discrete distribution with only two possibilities.
If we look at the unconditional distribution f(e), this is a continuous distribution and f(e) is not equal to f(e|s) => E and S are not independent.

So am I missing something? I wondered if the Bayes formulation of the definition of independence holds for continuous densities, since it is often confusing when going between f(e|S) and f(e|s). But I found Papoulis used the realization form and states that f(y|X=x)=f(y) as the consequence of independence. Though I suppose Papoulis is not a very good reference when it comes to technical details.

#####

Date: Mon, 4 Mar 1996 22:54:28 -0500 (EST)
From: Joe Campbell
Subject: Homework #4 Tips

I'm getting a lot of questions that I hope are addressed by the following.

Make corrections to Fig 7.48 as given in Errata.

Clarifications to DPH Problem 7.4:

          d(n)
           |
           |
           V   x(n)         x_q(n)
s(n) ----> + ---------> Q ----------> + -----> s_q(n)

Fig 7.49 for Prob 7.4 parts (a) and (b)

          d(n)
           |---------------------------
           |                          |
           V   x(n)         x_q(n)   -V
s(n) ----> + ---------> Q ----------> + -----> s_q(n)

Fig 7.49 for Prob 7.4 part (c)

e(n) = s(n) - [ x_q(n) - d(n) ]    (7.145) [missing brackets]

#####

Date: Tue, 5 Mar 1996 15:25:45 -0500
From: jpcampb (Joseph P Campbell)
Subject: Re: HW Prob 7.4

One of your classmates asks some questions that are difficult to fully answer without giving away the solution to the intriguing Problem 7.4 (which is how high quality multibit quantization is done in the real world), so here are some clues:

> I got your HW #4 Tips email. But my question still stands. I am convinced
> that part a) is incorrect. I believe that I can show that f(e|s) is discrete and
> f(e) is the convolution of f(d) and whatever dist s has on an interval, which
> is normally assumed uniform but may not be here, but anyway it is continuous.
> Thus f(e|s) is not equal to f(e), therefore Ed and S are not independent.

Recall that the amplitude of the dither noise sequence is less than half the step size of the quantizer.
So, you need to consider the fractional part of s+d and then think about P(e|s) and P(e) (a picture helps a lot).

[Upon further consideration, it appears that the premise of problem 7.4a is incorrect and that it contradicts Rabiner and Schafer's problem 5.6. Hmmm...]

> An additional question is that since D~[-delta/2,delta/2] we cannot add it back
> in after we have quantized, since it is smaller than the LSB of Xq the subtraction
> will have no effect unless you assume that the subtraction is over the real line.
> But then the problem seems pointless.

Recall that quantization is a nonlinear process.

#####

Date: Mon, 11 Mar 1996 00:20:42 -0500 (EST)
From: Joe Campbell
Subject: Re: Alg #2

A question on Alg #2 from one of your classmates:

> I tested the vowel /u/ which has a total of 40 areas. I averaged them
> in groups of 4 for a total of 10 averaged areas. I wanted to test
> another vowel like /a/ which has 35 areas. If I overlap the 35 areas
> to get 10 averaged areas, how should I treat the overlapping areas?

Figuring out a reasonable means for sampling the areas is an important part of the project, so I leave it up to you, but here are some hints:

- Averaging is only one method of sampling
- You don't have to use the same method for each vowel
- It's helpful to plot the area functions to see what makes sense in a stepwise approx (e.g., give special treatment to constrictions)
- You may try multiple sampling methods to maximize agreement with a ref
- Sample consistently with modeling assumptions
- If you use G. Fant's X-ray cross-section data given in class notes:
  -- Single subject (38 year old Russian)
  -- 5 vowels including two allophones of /i/ denoted /i/ and /i-bar/
  -- The resonator system has a max length of 19.5 cm in /u/, on account of the longer pathway of the air stream past the high back tongue hump.
  -- Russian vowels may not match English vowel spectral envelopes in text, so...

Since G. Fant's book is very difficult to acquire, here is a portion of Fant's Table 2.31-1 of measured values of formant frequencies from spectrographic analysis of the sounds sustained by the subject during the X-ray sessions (Fn) combined with Fant's Table 2.34-1 of formant bandwidths in Hz based upon digitally calculated data (~Bn):

Vowel     F1/~B1    F2/~B2     F3/~B3
/a/       700/57    1080/72    2600/130
/o/       535/54    780/65     2500/100
/u/       300/69    625/50     2500/110
/i-bar/   300/43    1480/125   2230/77
/i/       240/60    2250/75    3200/240
/e/       440/39    1800/95    2550/170

#####

Date: Tue, 12 Mar 1996 23:31:18 -0500 (EST)
From: Joe Campbell
Subject: Re: Algorithm #2

Another question on Algorithm #2:

> If I work out the poles to my vocal tract simulation they all come
> out on the unit circle. Is this because we have assumed lossless tubes
> and ideal boundary conditions (r[lips] = r[glottis] = 1)?

This was true for an example in class, but should not be true for your project. Do not assume that r[lips]=1 in your project.

#####

Date: Wed, 20 Mar 1996 23:05:48 -0400
From: Dr. Joe Campbell
Subject: Re: Sample Midterm

Dear Speech Students,

>Q1: My sample midterm has no answer for question 3 and question 5 parts
> C and D. Was that intentional?

Yes.

>Q2: Does the uniform sampler error power of: (delta squared / 12) only apply
> in cases when the number of steps is relatively large (is that why we
> can assume uniform density function)?

This is part of the reason. Additional factors are given on a page in your handouts called QUANTIZATION OF SAMPLED SIGNALS. To dig deeper, Rabiner & Schafer, p. 181-6 has a nice treatment.

#####

Date: Mon, 1 Apr 1996 12:02:32 -0400
From: Dr. Joe Campbell
Subject: Re: Alg #3

Some questions on Alg #3:

> For Algorithm #3, I would like to use delta modulation with adaptive
> quantization for one of my coders. The book says I should sample the
> input by at least 5 times the Nyquist rate. My speech synthesizer is
> currently sampled at 10KHz (as per the 10 tube example in the
> handout).
> Should I just crank up the sample rate to about 5 * 8KHz
> (40KHz)?

You need to do a good job of approximating the oversampling of a speech input signal and I'm not sure this is what you mean by "just crank up the sample rate." One approach is to use sample rate conversion as you probably learned about in DSP class (i.e., upsample->filter->downsample). Another approach is to modify your speech synthesizer.

> Also, if I decided to build an ADPCM coder, can I use a first order
> filter? How should I estimate my predictor coefficients without
> building a Linear Predictor? (Linear Predictor is Algorithm #4). Can
> I use a first order predictor with:
>
> a(1) = R(1) / R(0); (This is a first order linear predictor isn't it?)

You can build a 1st order predictor as above. It's also simple to work out a 2nd order predictor by hand (I think we did this in class). You may want to reconsider the linear predictor to get credit for it on 2 projects instead of one (and also get the bugs ironed out before the last project).

> Could you send me the email address of the grader? I forgot to copy
> it down last time.

jack@entropic.com

#####

Date: Tue, 2 Apr 1996 13:06:46 -0400
From: Dr. Joe Campbell
Subject: Re: Alg #3...

More Alg #3 questions:

> Q1: What's a good way to check it? My first attempt calculates
> the first two reflection coefficients correctly using cooked
> up autocorrelation coefficients.

Congratulations! This is one good test. The time and frequency domain errors (as asked for) are also good tests.

> Q2: Should I use the biased form of the autocorrelation calculation?
> (e.g., should I include the 1/N term?).

You should calculate the autocorrelation function in accordance with the development I gave in class (also in your book).

> Q3: Can I use Matlab's xcorr function?

Yes, but it's up to you to use it correctly. You need to ensure proper summation limits, proper windowing, and decide if you need normalization (e.g., maximum likelihood or unbiased estimates).
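
To make the summation limits and normalization choices concrete, here is a minimal Python sketch (illustration only; the projects are in Matlab). The function names and the `norm` option names "none", "biased" (divide by N), and "unbiased" (divide by N-k) are assumptions made for this example, and which normalization matches the class development is not settled here. The second helper is the first-order predictor a(1) = R(1)/R(0) quoted in the earlier Alg #3 question.

```python
def autocorr(x, max_lag, norm="none"):
    """Short-time autocorrelation R(k) = sum_{n=0}^{N-1-k} x[n] * x[n+k].

    The input is assumed to be already windowed; samples outside
    0..N-1 are treated as zero, which fixes the upper summation limit.
    """
    n_pts = len(x)
    if not 0 <= max_lag < n_pts:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    r = []
    for k in range(max_lag + 1):
        acc = sum(x[n] * x[n + k] for n in range(n_pts - k))
        if norm == "biased":
            acc /= n_pts          # same divisor at every lag
        elif norm == "unbiased":
            acc /= n_pts - k      # divisor shrinks with the lag
        elif norm != "none":
            raise ValueError("norm must be 'none', 'biased', or 'unbiased'")
        r.append(acc)
    return r


def first_order_predictor(x):
    """a(1) = R(1) / R(0); fails for an all-zero frame, where R(0) = 0."""
    r = autocorr(x, 1)
    return r[1] / r[0]
```

Note that a(1) is a ratio of two lags, so a common scale factor such as 1/N cancels, while the lag-dependent "unbiased" divisor does not.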

It may be easier to roll your own using Matlab's nice vector operations.

> Q4: I can only generate up to 8192 byte vectors. What's a good frame
> size to start with?

Frame sizes around 20 milliseconds are common in speech processing. A number of standard coders use 20, 22.5 (LPC-10e), 25, and 30 (CELP) milliseconds. Some coders use multiple frame rates, e.g., CELP uses a 30 ms spectral analysis frame and 7.5 ms codebook subframes.

#####

Date: Sat, 13 Apr 1996 00:43:11 -0400
From: Dr. Joe Campbell
Subject: Re: Algorithm #4

>In playing with the Covariance method of determining my linear predictor
>coefficients I encountered a problem that I didn't have with the Autocorrelation
>method. If I use synthetic data (e.g., a test signal composed of three sinusoids
>of differing frequencies added together), I get an ATA matrix that is very
>nearly singular. This doesn't happen with my speech data. I think the problem
>is caused by the fact that my test data has a very high degree of correlation
>and that for higher order predictors each column in the ATA matrix is very
>nearly the same. Does this make any sense? Is there a solution for the
>covariance method in this case?

Excellent observation! I tried to emphasize in class that the Covariance Method can lead to unstable LP filters. A nearly singular covariance matrix is a sign of this (assuming you don't have a programming error)...

As for solutions, here are 6 separate solutions:

1. Whiten the spectrum to reduce dynamic range before Cholesky Decomposition (this tends to improve the conditioning of the covariance matrix)
   a) Apply preemphasis (see D,P,H p. 330). Complementary deemphasis is needed (in LPC-10e, the preemphasis and deemphasis are slightly mismatched).
   b) Add a small amount of noise to the diagonal of the covariance matrix (sometimes the amount of noise added to the diagonal is based upon the energy of the residual, sometimes it's just a fixed number like 1e-6).
      This is called high frequency compensation and is covered in: Atal & Schroeder, Predictive Coding of Speech Signals and Subjective Error Criteria, IEEE TASSP, June 1979 (eq. 16)
2. Modify the poles
   a) Fold the poles back inside the unit circle while maintaining their frequency (like homework problem 1.7). Spectral magnitude can be preserved with stability by reflecting any poles z(k) (the roots of A(z)) that lie outside the unit circle, i.e., replacing a pole at z(k) by 1/z(k). (Spectral phase is changed, but the LPC model ignores phase anyway.)
   b) Clip k(m) coefficients (see O'Shaughnessy p. 344)
3. Use the Autocorrelation Method in regions where the Covariance Method fails. (The Autocorrelation Method is stable if you have adequate precision arithmetic.)
4. Repeat the previous stable frame.

Solutions 1a and/or 1b are the most common approaches (2a is often mentioned in the literature, but root finding is nontrivial in real time and isn't too common in practice). Hopefully, solution 1a will be adequate for this project.

#####

Date: Tue, 16 Apr 1996 13:27:56 -0400
From: Dr. Joe Campbell
Subject: Re: Alg. #4

Some additional questions on Algorithm #4:

>I find that if I add some low level noise to my test signal the covariance
>method works beautifully!!! I suppose the noise helped decrease the
>correlation between successive samples thereby making my matrix
>less "singular".

Exactly! This is another method to add to the list in my last mailing of methods to stabilize the Covariance Method. In speech coding systems, this low level noise is usually added for us by acoustic and A/D noises.

>Also, since the additive noise is uncorrelated with my
>signal, I'm able to extract the signal's spectrum from the noise. We did
>something like this in the DSP lab where we extracted radar data from
>noise (I think).

Yes, the Covariance Method is a very powerful model-based spectral estimation tool. Adding noise will also add some noise to your model (e.g., predictor coefficients).
Adding white noise flattens the formant peaks and smooths out the model spectrum (this effect should be negligible when adding a minute amount of white noise). The poles are moved inward, away from the unit circle and toward the origin (Kay, The effects of noise on the autoregressive spectral estimator, 1979, IEEE Trans on ASSP V27, N5, p. 478-485). On the other hand, noise can be reduced by correcting the reflection coefficients (Kay, 1980, Noise compensation for autoregressive spectral estimates, IEEE TASSP V28, N3, p. 292-303).

>Can we interpret each column of the ATA matrix as an
>individual random variable and the entire matrix as an ensemble of samples
>of the random variable?

Let me think about that...

>Is it required that we use the Cholesky decomposition technique along
>with the forward elimination and back-substitution algorithms (ref. D,P,H
>pp. 312 and 313) for determining the LP coefficients in the Covariance case?

I would give students who program Cholesky Decomposition more credit than those who do not, but this is difficult to answer out of context. Perhaps our grader will be able to offer additional guidance...

>Matlab's matrix inversion is pretty fast. They also support a matrix division
>operation that uses Gaussian elimination. Can I use these?

I think these alternate algorithms are best used as checks. However, if it's a matter of using one of these alternate algorithms versus skipping the Autocorrelation Method, the Covariance Method, or the Lattice Method, go ahead and use the alternate algorithms to maximize your grade (and hopefully your understanding).

> In the project description, you ask to "...compute the spectrum of the
>resultant filter and compare it to the spectrum of the input speech signal."
>
>Question 1:
>
>Would you like an overlay plot of filter response with the signal spectrum?

That would be nice (you might want to use different symbols and/or add an offset to differentiate your spectra).
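
As a rough illustration of preparing one curve for such an overlay, here is a minimal Python sketch (illustration only; the projects are in Matlab). It assumes an all-pole model of the form G / (1 - sum_i a(i) z^-i) and evaluates its magnitude in dB on the upper half of the unit circle. The function names, the frequency grid, and the choice to shift the peak to a chosen dB level are assumptions made for this example.

```python
import cmath
import math


def allpole_db(a, gain=1.0, n_freqs=256):
    """Magnitude in dB of G / (1 - sum_i a[i-1] * z^-i) at w = pi*m/n_freqs.

    a[0] holds a(1), a[1] holds a(2), and so on. A pole exactly on the
    unit circle at a grid frequency would make the denominator zero, so
    this sketch assumes a stable filter with poles strictly inside.
    """
    db = []
    for m in range(n_freqs):
        w = math.pi * m / n_freqs
        denom = 1 - sum(a_i * cmath.exp(-1j * w * (i + 1)) for i, a_i in enumerate(a))
        db.append(20 * math.log10(gain / abs(denom)))
    return db


def shift_peak(db_values, peak_db=0.0):
    """Shift a dB curve so its maximum sits at peak_db (0 dB by default)."""
    peak = max(db_values)
    return [v - peak + peak_db for v in db_values]
```

Shifting the model curve with, say, `shift_peak(curve, 10.0)` and the signal spectrum with `shift_peak(spectrum)` is one way to get the offset mentioned above; the same helper applies to any list of dB values.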

>Question 2:
>
>The resultant filter is
>
>  $$ P(z) = \sum_{i=1}^{M} a(i) z^{-i} $$   (eqn. 5.28; D,P,H)
>
>which is the prediction error filter. If we plot P(z), this will correspond to
>a filter with zeros. And comparing it to the spectrum of the input speech signal
>will not match up.
>
>Should we plot
>
>  $$ O(z) = \frac{G}{1 - P(z)} $$   (eqn 5.3; D,P,H)
>
>which is the actual vocal tract response instead? This will give a spectrum
>with poles that match up with the input signal spectrum.

Yes (actually, eqn 5.3 is the resultant filter and eqn 5.28 is the inverse filter). You may want to normalize the gain for your plots (I typically normalize my magnitude response plots so that the maximum magnitude of each plot is 0 dB).

#####

Date: Wed, 17 Apr 1996 12:34:52 -0400
From: Jack Unverferth
Subject: Re: Alg. #4

"Campbell - Joseph P." writes:
>
>>Is it required that we use the Cholesky decomposition technique along
>>with the forward elimination and back-substitution algorithms (ref. D,P,H
>>pp. 312 and 313) for determining the LP coefficients in the Covariance case?
>
>I would give students who program Cholesky Decomposition more credit
>than those who do not, but this is difficult to answer out of context.
>Perhaps our grader will be able to offer additional guidance...
>

I concur with this statement. The basic correct operation of your project is a baseline. The difference in grades after that is based upon the effort that you show in your report (note that this is not directly correlated with the number of pages in your report!). This level of effort includes your programming, testing and write-up. I can say that the inclusion of the Cholesky Decomposition would show better level-of-effort and understanding than using Matlab functions.

#####

Date: Fri, 19 Apr 1996 10:39:05 -0400
From: Dr. Joe Campbell
Subject: Re: Alg #4 - Cholesky Decomp - Errata!

>I think there's a bug in Fig 5.15 Page 312 of John, John and John.
>
>The counter "i" should go from M to 1.
>In the back-substitution...

You are right, this is a serious error in the text! Also note that the forward elimination and back substitution algorithms assume that L and U are nonsingular (the text doesn't make this clear).

By the way, Golub and Van Loan, Matrix Computations, JHU Press (the bible of linear algebra) covers forward and back substitution and presents algorithms using Matlab style code (e.g., in the 2nd edition, Algorithm 3.1.4 Back Substitution: Column Version).

I believe a student is missing the first page of my midterm exam solutions (Q1 and Q2) because I ended up with an extra copy. Contact me if you're missing it and would like it.

Next week I'll have solutions to the final homework assignment and a sample final exam for your enjoyment.

#####

Date: Tue, 23 Apr 1996 17:36:39 -0400
From: jpcampb (Joseph P Campbell)
Subject: Alg 4: LP Energy Reduction

The normalized mean-square linear-prediction error (the LP energy reduction) isn't entirely clear in your text. Given that e(m) is the output of a p-th order prediction error filter corresponding to an N-sample speech segment s(m), the LP energy reduction (Ep/E0) is:

Autocorrelation Method:

$$ \frac{E_p}{E_0} = \frac{\sum_{m=0}^{N-1+p} e^2(m)}{\sum_{m=0}^{N-1} [w(m)\, s(m)]^2} $$   <- note the upper summation limit N-1+p in the numerator and the window w(m) in the denominator

$$ \frac{E_p}{E_0} = \prod_{i=1}^{p} \left(1 - k^2(i)\right) $$   <- not true for Covariance Method

Covariance Method:

$$ \frac{E_p}{E_0} = \frac{\sum_{m=0}^{N-1} e^2(m)}{\sum_{m=0}^{N-1} s^2(m)} $$

Please refer to Table 8.2 in Rabiner and Schafer's text Digital Processing of Speech Signals for backup.

#####

Date: Mon, 29 Apr 96 18:46:07 EDT
From: Joe Campbell
Subject: Re: Final Exam

> I need to narrow my HMM study scope down a little. Do you expect us
> to know the details of the various algorithms that are involved in
> the solutions to the HMM three questions (e.g., are we going to have
> to crank numbers through the forward-backward procedure, the Viterbi
> algorithm and the Baum-Welch reestimation formulas)?

No, but you should understand the general concepts. For example, I recommend reviewing the easier problems in Ch 12 and the examples given in the Juang and Rabiner article from IEEE ASSP Magazine that I handed out.

> How about continuous observation HMM?

Thanks for suggesting this exam question for next year's class :-)! You should understand the general concepts of generally using a VQ in the discrete case and Gaussian mixture densities (i.e., a weighted sum of Gaussians having different means and variances) in the continuous case.

> Considering we're basically teaching the details to ourselves are you
> going to cut us a break (your HMM lecture was pretty general)?

You bet!

#####

Date: Tue, 30 Apr 1996 00:48:49 -0400
From: Dr. Joe Campbell
Subject: Re: Chapter 12 problems.

>Problem 12.7
>
>Is the main idea of the problem (parts a-c) the fact that the feature
>vectors are statistically indistinguishable and therefore all equally
>as likely (that is, any state will generate any observation with equal
>probability?).

To be precise: For any observation sequence, the difference in the likelihoods between the two models will be a constant,

$$ D^*_1 - D^*_2 = \bar{D}^*_1 - \bar{D}^*_2 $$

which is independent of the observations. The same model will always be chosen, the one with the state probabilities yielding the higher $\bar{D}^*$.

One-half of the utterances will be correctly "recognized" - exactly the chance level. Therefore, I'd like to live about 100 light-years away (assuming I only have 100 years left to live :-).

>Problem 12.11
>
>Since we're setting the expected values equal does this mean that the
>probability of any observed sequence is the same no matter which model
>is used?

Yes, they tend to generate the same observation sequences because the two models are statistically equivalent (even though their A and B matrices look different).

>If I solve the equation:
>
> O1 = y1,y2
> P(O1 | HMM1) = P(O1 | HMM2)
>
>I get a bunch of square terms running around that I can't simplify out.
>
>Could you give me a hint on what I'm doing wrong?

$$
\begin{aligned}
E\{y(t) \mid M_1\} &= y_1 P(y(t)=y_1 \mid M_1) + y_2 P(y(t)=y_2 \mid M_1) \\
&= [\, y_1 \;\; y_2 \,] \begin{bmatrix} P(y(t)=y_1 \mid M_1) \\ P(y(t)=y_2 \mid M_1) \end{bmatrix} && \text{(product of 2 matrices)} \\
&= [\, y_1 \;\; y_2 \,]\, p_1(t) \\
&= [\, y_1 \;\; y_2 \,]\, B_1 A_1^{t-1} \pi_1(1) && \text{($\pi$ is the initial state prob)} \\
&= [\, y_1 \;\; y_2 \,]\, B_1 A_1 \pi_1(t-1)
\end{aligned}
$$

Similarly,

$$ E\{y(t) \mid M_2\} = [\, y_1 \;\; y_2 \,]\, B_2 A_2 \pi_2(t-1) $$

Using $\pi(t) = A \pi(t-1)$ for either model, we discover that

$$ \pi_1(t) = \pi_2(t) = [\, 0.5 \;\; 0.5 \,] \quad \text{for every } t \ge 1 $$

Therefore, in order for the means to be equal, it must be true that $A_1 B_1 = A_2 B_2$. Equating these matrix products yields:

$$ pq + (1-p)(1-q) = rs + (1-r)(1-s) $$

which yields eq 12.191.

#####
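
As a numerical sanity check on that last step, here is a minimal Python sketch. It assumes the symmetric 2x2 forms A1 = [[p, 1-p], [1-p, p]] and B1 = [[q, 1-q], [1-q, q]] (likewise r and s for the second model). That form is inferred from the scalar condition above and is not quoted from the problem statement; the helper names are hypothetical.

```python
def sym(p):
    """Assumed 2x2 symmetric stochastic matrix [[p, 1-p], [1-p, p]]."""
    return [[p, 1 - p], [1 - p, p]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def matvec(a, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [sum(a[i][k] * v[k] for k in range(2)) for i in range(2)]


def solve_s(p, q, r):
    """Solve pq + (1-p)(1-q) = rs + (1-r)(1-s) for s (requires r != 0.5)."""
    c = p * q + (1 - p) * (1 - q)
    return (c - (1 - r)) / (2 * r - 1)


# Example values chosen for illustration only.
p, q, r = 0.9, 0.8, 0.95
s = solve_s(p, q, r)
a1b1 = matmul(sym(p), sym(q))   # diagonal entry is pq + (1-p)(1-q)
a2b2 = matmul(sym(r), sym(s))   # diagonal entry is rs + (1-r)(1-s)
uniform = matvec(sym(p), [0.5, 0.5])  # pi(t) = A pi(t-1) leaves [0.5, 0.5] fixed
```

Under the assumed form, each product is again symmetric with off-diagonal entries equal to one minus the diagonal, so the two products agree exactly when the scalar condition holds.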