This tutorial is mostly for people (especially my AI students) who want to find the formants of any sound sample. Here is what you will need:
- a good mic (recommended but not necessary)
- a quiet room
- any software which can cut and save wav files. I use Praat.
- the console version of Praat, called PraatCon. This will be used to extract the formants from wav files.
- about 30 mins.
Once you have all these things, here is how to record the phone sounds:
- Start Praat. You will see two windows. You can close the other window but keep the one titled Praat Objects open.
- In the menu, select New/Record mono sound. Set the sampling frequency to 16000 Hz, press Record and start speaking into the mic. When you are through, press Stop. Speak all the phones you want in a single go, but leave silence in between. In the screenshot below, AA (as in father) is spoken multiple times.
- Press Save to list and then Close. You should now see an entry in the Objects listbox. Make sure the entry is selected and then press Edit.
- Select the phone for which you want to extract the formants by pressing the mouse button just after the start of the phone and dragging to the point right before it finishes. Please note that we will be using the mean formant value for a phone, so to ensure accuracy it is essential that only the middle part of the phone is selected and that no noise or silence is included. The selected area will be highlighted.
- From the File menu, select Write selected sound to WAV file, enter the name of the wav file and save.
- Repeat steps 4 and 5 for all the phones you want to save.
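If you would rather trim a recording programmatically than by dragging in the editor, the "middle part only" idea from step 4 can be sketched in Python. This is only a rough sketch and not part of the Praat workflow above: the function name trim_to_middle and the keep_fraction parameter are made up for illustration, and it assumes each input file is an uncompressed PCM WAV that already contains exactly one phone.

```python
import wave


def trim_to_middle(src_path, dst_path, keep_fraction=0.5):
    """Copy the central keep_fraction of the frames in src_path to dst_path.

    Hypothetical helper: it assumes the file holds a single phone, so the
    central frames are the steady part of that phone.
    Returns the number of frames written.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")

    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        n_frames = src.getnframes()
        if n_frames == 0:
            raise ValueError("input file contains no audio frames")
        n_keep = max(1, int(n_frames * keep_fraction))
        start = (n_frames - n_keep) // 2  # equal amounts cut from both ends
        src.setpos(start)
        frames = src.readframes(n_keep)

    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width and rate; the frame count in the
        # header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return n_keep
```

For example, trim_to_middle("aa_full.wav", "aa_mid.wav", 0.5) would keep the central half of a one-phone recording (both file names are placeholders). You still need to listen to the result, since a fixed fraction cannot tell a phone from background noise.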
That’s it for the recording part. Alternatively, if you already have a collection of sound files that each contain a single phone, you can use that. You can download one such collection here. The zip file has 40 files in which 2 speakers have each spoken AA (as in father) and II (as in been) 10 times.
Although you can use the GUI-based Praat to find the formant values for each phone, we want to automate the task. To do so, we will use the console version of Praat, called PraatCon, and run a script to extract the formants. The small Praat script below takes the name of a phone file as its argument and outputs the mean of the first two formants, F1 and F2, to the console.
form Display mean F1 and F2
    sentence filename
endform
filename_noext$ = replace$ (filename$, ".wav", "", 0)
Read from file... 'filename$'
select Sound 'filename_noext$'
To Formant (burg)... 0 5 5500 0.025 50
f1 = Get mean... 1 0 0 Hertz
f2 = Get mean... 2 0 0 Hertz
clearinfo
print 'filename$' 'f1''tab$''f2''newline$'
This script, along with a small batch file (run.bat) that runs it for every wav file in the current directory, can be downloaded together with all the sound files by clicking on the link below. To get this running, extract all the files into a directory and place your downloaded copy of PraatCon in the same directory. Then just click on the run.bat icon and watch the magic!
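If you want to see what such a batch loop does, or collect the numbers for plotting, here is a rough Python equivalent. It is a sketch under several assumptions, not the contents of run.bat: the script name formants.praat is a placeholder, the command form "praatcon <script> <arguments>" is assumed, and the output is assumed to be one line in the "filename F1<tab>F2" layout printed by the script above. If Praat reports an undefined formant, the float conversion will fail and you will need to handle that file by hand.

```python
import glob
import os
import subprocess


def parse_formant_line(line):
    """Split 'name.wav F1<tab>F2' into (name, f1, f2)."""
    name, f1, f2 = line.strip().rsplit(None, 2)
    return name, float(f1), float(f2)


def run_praatcon(wav_name, directory, script="formants.praat", exe="praatcon"):
    """Run the formant script on one file and return whatever it printed.

    Assumptions: the executable, the script and the wav file all sit in
    `directory`, and PraatCon is called as `praatcon <script> <arguments>`.
    """
    result = subprocess.run(
        [exe, script, wav_name],
        cwd=directory,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_all(directory=".", runner=run_praatcon):
    """Return a list of (filename, F1, F2) for every .wav file in directory."""
    rows = []
    for wav_path in sorted(glob.glob(os.path.join(directory, "*.wav"))):
        output = runner(os.path.basename(wav_path), directory)
        lines = [ln for ln in output.splitlines() if ln.strip()]
        if lines:
            # Use the last non-empty line in case anything else was printed.
            rows.append(parse_formant_line(lines[-1]))
    return rows
```

Calling extract_all(".") from the directory holding the extracted files would give one (filename, F1, F2) row per recording. The runner argument exists only so the loop can be tried without PraatCon by passing a function that returns a canned line.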
When the F1 and F2 values of the sound files in the archive are plotted as an XY scatter graph, they give the nice little graph shown below. All this was done to construct a linearly separable dataset that can be given to students for training a perceptron to distinguish between two phones. Any suggestions/comments are welcome.
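For students wondering what "training a perceptron" on such data looks like, here is a minimal sketch in plain Python. The six (F1, F2) points are made-up placeholders in a plausible range, not measurements from the archive, and the function names are my own. The values are converted from Hz to kHz so that the features and the bias are on a similar scale.

```python
# Placeholder (F1, F2) values in Hz: illustrative only, NOT measured data.
AA_HZ = [(700, 1100), (750, 1200), (680, 1050)]
II_HZ = [(300, 2300), (280, 2200), (320, 2400)]

# Work in kHz; label AA as +1 and II as -1.
SAMPLES = [(f1 / 1000.0, f2 / 1000.0) for f1, f2 in AA_HZ + II_HZ]
LABELS = [1] * len(AA_HZ) + [-1] * len(II_HZ)


def train_perceptron(samples, labels, epochs=1000, lr=1.0):
    """Classic perceptron rule: adjust the weights only on misclassified points."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(samples, labels):
            activation = w[0] * x1 + w[1] * x2 + b
            if y * activation <= 0:  # wrong side of (or exactly on) the boundary
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every point classified correctly: stop early
            break
    return w, b


def predict(w, b, point):
    """Return +1 (AA) or -1 (II) for an (F1, F2) point given in kHz."""
    x1, x2 = point
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1


w, b = train_perceptron(SAMPLES, LABELS)
print("weights:", w, "bias:", b)
```

Because the two placeholder clusters are linearly separable, the perceptron rule is guaranteed to stop making mistakes after a finite number of updates. With real recordings, replace the placeholder lists with the F1 and F2 values printed by the Praat script.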