Problem Domain Books


Jurafsky and Martin

Speech and Language Processing, 2nd Ed.
Daniel Jurafsky, James H. Martin

This book introduces important concepts in natural language processing and speech recognition. It covers topics such as edit distance between words, speech synthesis, and algorithms for processing relationships among words.
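
As a rough illustration of the word edit distance topic mentioned above, here is a minimal Python sketch of the standard dynamic-programming approach. The function name `edit_distance` and the unit cost for every insertion, deletion, and substitution are assumptions made for this example; the book also discusses other cost schemes.

```python
def edit_distance(source: str, target: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn source into target.
    Assumption for this sketch: every operation costs 1."""
    # previous[j] holds the distance between the first i-1 characters
    # of source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        # Turning i source characters into an empty target takes i deletions.
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (source_char != target_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Only two rows of the table are kept at a time, which is enough when just the distance is needed and not the alignment itself.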

Thrun, Burgard, and Fox

Probabilistic Robotics
Sebastian Thrun, Wolfram Burgard, Dieter Fox

We are not actually using this book, but it is a good reference on probabilistic methods in robotics.

Academic Literature


Deep Speech: Scaling up end-to-end speech recognition
https://arxiv.org/abs/1412.5567

This paper from Baidu Research describes a deep learning-based model for transcribing human speech. The model achieves a 16% error rate on the Switchboard Hub5'00 test set. Mozilla’s open-source implementation of this paper, one of the best open-source speech-to-text models available, is what will be used to transcribe speech in the car.
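
To make the error-rate figure above more concrete, the following Python sketch shows how a word error rate (WER) is commonly computed, assuming the reported figure is a WER, the usual metric for speech recognition. The function name `word_error_rate` and the sample transcripts are made up for this illustration and do not come from the paper or from Mozilla's code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two transcripts,
    divided by the number of words in the reference."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] is the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            current.append(min(
                previous[j - 1] + (ref_word != hyp_word),  # substitution or match
                previous[j] + 1,                           # word missing from hypothesis
                current[j - 1] + 1,                        # extra word in hypothesis
            ))
        previous = current
    return previous[-1] / len(ref_words)


if __name__ == "__main__":
    # Hypothetical spoken command and an imperfect transcription of it.
    print(word_error_rate("turn left at the next light",
                          "turn left at next night"))
```

In this made-up example the transcription drops one word and gets one word wrong out of six, so the error rate is 2/6, or about 33%. A 16% error rate corresponds to roughly one word error for every six words in the reference.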



Deep Speech 2: End-to-end speech recognition in English and Mandarin
https://arxiv.org/abs/1512.02595

This paper continues Baidu's first Deep Speech paper, applying a deep learning approach to recognizing both English and Mandarin Chinese speech. It achieves greater accuracy than the previous Deep Speech paper by exploring more architectures and algorithms, an exploration made possible by better high-performance computing techniques. The resulting system's accuracy is comparable to that of a human transcriber, with an error rate of about 6%.



Learning Naturally Spoken Commands for a Robot
https://www.isca-speech.org

This paper describes a study by Honda Research on training a robot to understand spoken commands. Because the autonomous car will also need to understand spoken commands, the paper is useful for understanding one method of accomplishing this.