76 SERVO 07.2014
Few people use the terms that I use for speech recognition. Although I started this article with the term speech understanding, I will use the term speech recognition in the rest of this article, since speech implies the transmission of information using a voice. I will also delve into speaker-dependent speech recognition systems, which require training, versus speaker-independent systems, which understand most (if not all) speakers.
From Crude Relays in the 1960s to Digital
These days, voice control devices are affordable for robotics hobbyists. However, it’s been a long, circuitous route. As I recounted in previous articles, many years ago I outfitted a robot with what passed for a speech recognition system. This glorified electronic abacus counted syllables as I spoke and controlled a circuit by stepping a 10-position telephone relay to a particular position. What this device lacked in speech recognition functionality, it made up for in consistency. It consistently mistook each command in the second string I issued for the command in the first string that had the same number of syllables. For example, if I said “stop,” “now, go,” “go right now,” and “you can go left,” the robot would continue with those movements, even as I told it to “go,” “now stop,” “go left now,” and “you can go right.” Crude doesn’t begin to describe my robot, which hit one wall after another as I struggled to remember the right number of syllables. It impressed my friends for about a minute before they burst into peals of laughter.
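To make that failure mode concrete, here is a minimal Python sketch of a syllable-counting controller like the one described above. It is not a model of the actual circuit: the vowel-group counter is an assumed stand-in for whatever sound-burst detection the hardware performed, the wrap-around after ten steps assumes the relay behaved like a rotary stepping switch, and the names `count_syllables`, `relay_position`, `ACTIONS`, and `robot_hears` are hypothetical.

```python
import re

RELAY_POSITIONS = 10  # the stepping relay described above had 10 positions


def count_syllables(phrase: str) -> int:
    """Crude stand-in for the syllable detector: count vowel groups per word.

    Assumption: the real device reacted to bursts of sound, not to letters.
    """
    words = re.findall(r"[a-z]+", phrase.lower())
    return sum(len(re.findall(r"[aeiou]+", word)) for word in words)


def relay_position(phrase: str) -> int:
    """Step the relay once per syllable (assumed to wrap after 10 steps)."""
    return count_syllables(phrase) % RELAY_POSITIONS


# Hypothetical action table: the first string of commands claims
# one relay position each (1, 2, 3 and 4 syllables).
ACTIONS = {}
for command in ["stop", "now, go", "go right now", "you can go left"]:
    ACTIONS[relay_position(command)] = command


def robot_hears(phrase: str) -> str:
    """Return the action selected by the relay position the phrase reaches."""
    return ACTIONS.get(relay_position(phrase), "no action")


if __name__ == "__main__":
    for spoken in ["go", "now stop", "go left now", "you can go right"]:
        print(f"said {spoken!r}, robot does {robot_hears(spoken)!r}")
```

Because the only feature the controller sees is the syllable count, "go" lands on the same relay position as "stop," and every command in the second string collides with its same-length partner from the first.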
In sharp contrast to my
rudimentary device of yesteryear,
today’s digital signal processing (DSP)
technology uses specialized
microcontroller-driven chips with
firmware-based speech recognition
algorithms. Industrial and university
experimenters have only had access to
the computer power required for high-level DSP and speech recognition for
about 20 years. It’s been available to
hobbyists for a fraction of that time.
The following sections discuss the
evolution and application of speech
recognition products for robots over
the past 50+ years.
The IBM Shoebox Leads the Way
William C. Dersch built the “Shoebox” speech recognizer (see Figure 4), which was demonstrated at the 1962 Seattle World’s Fair. An earlier prototype was housed in a wooden box (see Figure 5). IBM researchers
had long used pattern recognition
and artificial intelligence as stepping
stones to true speech recognition. The
Shoebox machine recognized 10 digits
and six control words — including
“plus,” “minus,” and “total” — which
Figure 4. William Dersch demonstrates the IBM Shoebox speech recognizer.
Figure 5. The original prototype of the “Shoebox.”
Figure 6. Final Shoebox prototype with digital “digit heard” display on top.