T.J.Moir Personal pages

The Massey Speech Project



Picture of acoustic direction finder
Key words: speech recognition, smart house, acoustic direction finder, time-delay estimation.



Welcome to the Massey University speech project. This part concerns a system to find the (x,y,z) co-ordinates of a moving sound source. In essence this is a type of sonar - similar systems are used in the marine industry. The system is passive in that it does not emit any sound waves but only receives from the acoustic source. The current system uses 5 microphones and computes the co-ordinates based on time-delay estimation between the source and the microphones. Time-delay estimation is harder than it looks since there are many problems in a real environment, such as additive noise and room reverberation. Reverberation is the acoustic equivalent of multi-path propagation in communication systems: an acoustic wave may well reach a microphone via more than one path, and this can lead to false readings. There is a huge literature on acoustic bearing estimation - too much for this webpage. If you have LabVIEW you can run a simulation of a 4-microphone acoustic bearing estimator where the source moves in a circle.


Acoustic bearing estimation link.
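To give a feel for how a measured time delay turns into a direction, here is a minimal Python sketch for a single pair of microphones. It is not the method used in the project or in the LabVIEW simulation. It assumes a far-field source (plane wavefront), a speed of sound of 343 m/s, and a hypothetical function name `bearing_from_delay`. Under those assumptions the extra path length to the further microphone is c times the delay, and the bearing follows from simple trigonometry.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def bearing_from_delay(tau, mic_spacing, c=SPEED_OF_SOUND):
    """Far-field bearing (degrees from broadside) for one microphone pair.

    tau         -- time delay between the two microphones, in seconds
    mic_spacing -- distance between the two microphones, in metres
    """
    # Path-length difference divided by spacing gives sin(bearing).
    ratio = c * tau / mic_spacing
    # Noise can push the ratio slightly outside [-1, 1], so clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A single pair only gives a bearing, and only up to a front-back ambiguity. A full (x,y,z) fix needs the delays from several microphone pairs combined, which is why the system described here uses more than two microphones.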


The present system in the picture link above was built by a final-year Mechatronics student for his year 4 project. The system uses a camera which tracks a radio around the room. Time-delay estimation is done using the PHAT (phase transform) algorithm. There are other similar algorithms, e.g. the SCOT algorithm, the Hannan-Thomson method and so on. All these methods use the so-called Generalized Cross-Correlation method, which is an extension of ordinary cross-correlation, since the ordinary method works effectively only when the source is white noise. The methods differ only in the weightings that are used within the cross-correlation integral. The method uses the fast Fourier transform (FFT). The accuracy of the system is down to how accurately a time delay can be measured. Delays can (normally) only be an integer number of sample instants, so this in itself introduces errors. The higher the sample rate, the more accurate the estimate. Here we use a sample rate of 40 kHz for each channel.
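As an illustration of the idea, here is a minimal Python/NumPy sketch of generalized cross-correlation with the PHAT weighting. It is not the project's own code. The function name `gcc_phat`, the zero-padding length and the small constant that guards against division by zero are all assumptions made for this example. The cross-spectrum of the two channels is divided by its own magnitude, so that only phase information remains, and the lag of the largest peak in the inverse transform is taken as the delay.

```python
import numpy as np


def gcc_phat(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref`, in seconds.

    A positive result means `sig` lags behind `ref`.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-spectrum, then PHAT weighting: normalise the magnitude to 1
    # so that only the phase is kept. The small constant avoids 0/0.
    cross = SIG * np.conj(REF)
    cross = cross / (np.abs(cross) + 1e-12)

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Reorder so negative lags come first, then locate the peak.
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs
```

The estimate returned by this sketch is always a whole number of samples divided by `fs`. At a 40 kHz sample rate one sample instant is 25 microseconds, which at an assumed speed of sound of 343 m/s corresponds to a path-length difference of about 8.6 mm.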


Tel: +64 9 414 0800 ext 9805
Mail me at .... :tom@speechresearch.co.nz