What is pod|fanatic?
What makes pod|fanatic’s search engine unique?
Most podcast search engines index only the episode titles and descriptions that podcast makers submit themselves. pod|fanatic is different: we use state-of-the-art speech recognition technology to generate a machine transcript of every episode. When you search our index, you are actually searching these speech-to-text transcripts. This means that if someone mentions a subject but doesn't list it in the episode description, the episode will still show up in our search results.
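To make the difference concrete, here is a minimal sketch of searching transcripts instead of descriptions, assuming a simple inverted index (word to set of episode IDs) with AND semantics for multi-word queries. The names `tokenize`, `build_index`, and `search`, and the sample episodes, are hypothetical illustrations, not pod|fanatic's actual implementation.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of episode IDs whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for episode_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(episode_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the episodes whose transcripts contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    # .get() never inserts into a defaultdict, so unknown words stay absent.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical episodes: "sourdough" is spoken in ep1 but absent from its description.
descriptions = {"ep1": "A chat with a baker", "ep2": "Music talk"}
transcripts = {
    "ep1": "today we talk about sourdough and the sourdough starter",
    "ep2": "a conversation about jazz records",
}
index = build_index(transcripts)
print(search(index, "sourdough"))     # {'ep1'}
print(search(index, "jazz records"))  # {'ep2'}
```

A description-only search for "sourdough" would return nothing here, while the transcript index finds `ep1`.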
So, I’ve seen some of the transcriptions and they’re pretty much unreadable. Why is that?
The transcriptions on our episode pages are generated automatically by running speech recognition on the actual podcast audio files. If you've ever used speech-to-text software, you've probably noticed that it isn't very accurate. Without audio training (the process by which a speech recognition engine learns what a particular speaker's voice sounds like), most engines have an accuracy rate of only about 20%–40%. Unfortunately, since every podcast host has a different voice, training isn't an option for us. In other words, the custom speech data that would let us recognize Ira Glass' voice somewhat reliably would be almost useless for understanding Marc Maron (and vice versa).
20% accuracy!? But doesn’t that make these transcriptions useless?
No! While the transcriber definitely gets many words wrong (enough to make the transcripts useless for human reading), it tends not to make the same mistake twice: its errors are scattered rather than repeated. That makes the transcripts very useful as the basis for a search engine. Topics that are the focal point of a conversation are usually mentioned more than once, and the more often a word is spoken, the more likely it is to be recognized correctly at least some of the time. Once a transcript is complete, we remove every word that appears only once (which filters out most of the one-off misrecognitions) and use the remaining words to generate a keyword list for searching.
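The filtering step can be sketched as a word count with a minimum-frequency threshold. This is a minimal illustration, not pod|fanatic's code: `extract_keywords`, its `min_count` parameter, and the sample transcript are hypothetical. The second function adds a rough model of why repetition helps, under the loud assumption that each mention of a word is recognized correctly and independently with probability `p`; a word survives the filter only if it is recognized correctly at least twice.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_keywords(transcript: str, min_count: int = 2) -> list[str]:
    """Keep only words that occur at least `min_count` times (sorted)."""
    counts = Counter(tokenize(transcript))
    return sorted(word for word, count in counts.items() if count >= min_count)


def prob_survives_filter(p: float, n: int) -> float:
    """P(at least 2 of n mentions recognized correctly), assuming each
    mention is recognized independently with probability p (an assumption)."""
    none_correct = (1 - p) ** n
    exactly_one = n * p * (1 - p) ** (n - 1)
    return 1 - none_correct - exactly_one


# Hypothetical noisy transcript: "sourdough" was misheard once as "sour dough".
noisy = ("sourdough starter is the key the sour dough sourdough "
         "needs a warm spot for the starter")
print(extract_keywords(noisy))  # ['sourdough', 'starter', 'the']

# With 20% per-mention accuracy, more mentions make survival far more likely.
for n in (2, 5, 10):
    print(n, round(prob_survives_filter(0.2, n), 3))
```

The one-off misrecognitions ("sour", "dough") are dropped, while the repeated topic words survive. Very common words such as "the" also pass a pure frequency filter, so a real system would probably need some additional weighting; that part is not described above.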