the first signal (x) are multiplied by the time shifted
values of the second signal (y) and the N product
terms are added. The lag step-size is determined
by the sample rate; for example, if the signals are
sampled at 20 kHz, lag values would change in
increments of 50 microseconds.
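To make the multiply-and-add idea concrete, here is a minimal Python sketch of cross-correlation over a range of lags. It is an illustration only, not the ROBOEAR firmware: the function names, the test pulse, and the sign convention (x[n] is multiplied by y[n + lag]) are assumptions made for this example, and products that would fall off the end of the sample set are simply skipped.

```python
SAMPLE_RATE_HZ = 20_000
LAG_STEP_US = 1e6 / SAMPLE_RATE_HZ  # 50 microseconds per lag step at 20 kHz


def cross_correlation(x, y, max_lag):
    """Return {lag: sum of x[n] * y[n + lag]} for lags in [-max_lag, max_lag].

    Assumed convention for this sketch: a positive lag means y is delayed
    relative to x.
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):  # skip products that fall outside y
                total += x[n] * y[m]
        result[lag] = total
    return result


def best_lag(x, y, max_lag):
    """Return the lag with the largest cross-correlation value."""
    corr = cross_correlation(x, y, max_lag)
    return max(corr, key=corr.get)


# A short pulse, and a copy of it delayed by three samples.
x = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0]

lag = best_lag(x, y, max_lag=6)
print(lag, lag * LAG_STEP_US)  # 3 150.0
```

With these made-up samples, the peak lands at a lag of three steps, or 150 microseconds at a 20 kHz sample rate.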
Consider Figure 1. The baseline between two
microphones is shown in red. Sound from A will
arrive at mic1 and mic2 at the same time, so the
maximum cross-correlation value will be at zero lag.
Sound from B will arrive at mic1 last; in this case, the
sampled values from mic1 would have to be shifted
backward in time (-time) to line up with the sampled
values from mic2. So, the maximum cross-correlation value
would be at a negative lag.
Conversely for C, the values from mic1 would have to
be shifted forward in time (+time) to line up with the signal
from mic2, so the maximum cross-correlation value would
be at a positive lag. Any source to the left of A will have the
maximum cross-correlation value at a positive lag. Because D
is on the extension of the baseline (green), this produces the
greatest time difference — the maximum lag — between the
two microphones. If the distance between the microphones
is 10 cm and the speed of sound is 34,320 cm/sec, then it
will take 291 μs for the sound to travel from mic1 to mic2.
If the signals are sampled at 20 kHz, the maximum cross-correlation would probably occur at a lag of +6 because
(291 μs / 50 μs) = 5.8, which rounds to the nearest whole lag step.
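The same arithmetic can be checked with a few lines of Python. This is just a sketch of the calculation above; the function name and its parameters are made up for illustration.

```python
SPEED_OF_SOUND_CM_PER_S = 34_320


def max_lag_samples(baseline_cm, sample_rate_hz, speed_cm_per_s=SPEED_OF_SOUND_CM_PER_S):
    """Travel time along the baseline, expressed in sample periods."""
    travel_time_s = baseline_cm / speed_cm_per_s
    return travel_time_s * sample_rate_hz


travel_time_us = 10 / SPEED_OF_SOUND_CM_PER_S * 1e6
lag = max_lag_samples(baseline_cm=10, sample_rate_hz=20_000)

print(round(travel_time_us))  # 291
print(round(lag, 1))          # 5.8
print(round(lag))             # 6
```

Any lag beyond this value cannot come from a single source, which gives a handy limit on how many lags need to be searched.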
However, there’s a problem with the simple diagram in
Figure 1. What if A was on the other side of the baseline?
Could we tell this from the sampled values at mic1 and
mic2? The answer is no. We could not tell which side of the
baseline the sound came from; the solution is ambiguous.
There is always a second potential solution as shown in
Figure 2. That’s why the ROBOEAR has three microphones
and three baselines which yield three ambiguous solutions
from which we are usually able to produce one unique solution.
The time difference between sounds arriving at mic2
and mic1 is the same for sources on different sides of the
baseline if they are at equal angles from the baseline, as
shown by angles A and B in Figure 2. So, how do we
determine angle A? We take the ratio of the lag distance to
the baseline length, where the lag distance equals the lag
time (in multiples of the sample period) times the speed of
sound. The angle A is the arcsine of this ratio.
For example, if the maximum cross-correlation occurs at
L = 4, the lag distance is 4 * (50 μs) * (34,320 cm/sec) = 6.86
cm. If we assume the same 10 cm baseline, then the angle
A is the arcsine of 6.86/10 or 43.3 degrees. This illustrates
how we can use phase or time difference to determine the
bearing angle to a sound source.
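Here is the bearing calculation as a short Python sketch. The function name and parameters are hypothetical, and the clamp on the ratio is an added safeguard (not something described in the article) so that a noisy lag slightly beyond the maximum doesn't break the arcsine.

```python
import math


def bearing_angle_deg(lag_samples, sample_rate_hz, baseline_cm, speed_cm_per_s=34_320):
    """Angle (degrees) whose sine is the lag distance divided by the baseline."""
    lag_distance_cm = lag_samples / sample_rate_hz * speed_cm_per_s
    ratio = lag_distance_cm / baseline_cm
    ratio = max(-1.0, min(1.0, ratio))  # assumption: clamp to the valid arcsine range
    return math.degrees(math.asin(ratio))


angle = bearing_angle_deg(lag_samples=4, sample_rate_hz=20_000, baseline_cm=10)
print(round(angle, 1))  # 43.3
```

A negative lag simply produces a negative angle, and a lag of zero gives an angle of zero.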
As mentioned above, differences in amplitude are more
effective in determining direction at higher frequencies
because there’s more shadowing effect from the head
producing a more pronounced amplitude difference
between the ears. The ROBOEAR doesn’t have a head and
it has three ears (microphones) instead of two. So, the way
the ROBOEAR uses amplitude is to consider the amplitude
of the samples from each microphone as a vector from the
origin to the microphone. Figure 3 shows the geometry.
The amplitude of the signal from each microphone
is the maximum value minus the minimum value in a
127-sample set. Let:
h1 = the amplitude of the samples from mic1
R1x = the X component of the vector from the origin in the
direction of mic1
R1y = the Y component of the vector from the origin in the
direction of mic1
and similar definitions for mic2 and mic3.
R1x = cos(60°)*h1 = 0.5*h1
R1y = sin(60°)*h1 = 0.866*h1
R2x = cos(-60°)*h2 = 0.5*h2
R2y = sin(-60°)*h2 = -0.866*h2
R3x = -h3
R3y = 0
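The component formulas above can be sketched in Python as follows. The names here are made up for illustration, and placing mic3 at 180° is inferred from R3x = -h3 and R3y = 0; this is not the ROBOEAR's actual code.

```python
import math

# Direction of each microphone from the origin, in degrees.
# mic3 at 180 degrees is inferred from R3x = -h3, R3y = 0.
MIC_ANGLES_DEG = {"mic1": 60.0, "mic2": -60.0, "mic3": 180.0}


def peak_to_peak(samples):
    """Amplitude as the maximum value minus the minimum value of a sample set."""
    return max(samples) - min(samples)


def amplitude_vector(mic, samples):
    """Return (Rx, Ry): the amplitude scaled along the direction of the mic."""
    h = peak_to_peak(samples)
    angle = math.radians(MIC_ANGLES_DEG[mic])
    return (math.cos(angle) * h, math.sin(angle) * h)


# Made-up samples with a peak-to-peak amplitude of 30.
samples = [512, 520, 500, 530, 515]
r1x, r1y = amplitude_vector("mic1", samples)
print(round(r1x, 2), round(r1y, 2))  # 15.0 25.98
```

In practice, each sample set would hold 127 values, as described above. Note that floating-point math gives a Y component for mic3 that is very close to zero rather than exactly zero.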
Figure 1. Sound sources
relation to microphone
Figure 2. The
SERVO 09/10.2018 43