There is a lot of excitement among startups and other companies about building for Amazon’s and Google’s voice platforms. My reaction to this can be summed up as follows:
Google and Amazon will have a perfect copy of every end-user interaction. So anything that takes off they can wrap into the core based on what they have learned. And something that can be accessed directly by saying just “Alexa” or “OK Google” will always get more use than something where you need to add another provider’s name explicitly to invoke it.
So what can be done instead? I believe there is an opportunity to create an open voice platform. It would consist of an open hardware reference design with microphones, compute, and network connectivity, combined with open software for speech-to-text and text-to-speech. Open as in anyone can build on top of it, and there is no underlying platform provider who also receives everything that is said (other than for the purpose of improving speech-to-text).
People should be able to choose any name or word to activate the device. We should be able to install “skills” from anywhere, meaning there should be no app market monopoly on the device. It should also be possible to easily install new voices. I see a huge amount of potential for innovation on such a platform, including much better support for keeping state, connecting multiple devices in the same home, etc.
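To make that concrete, here is a rough sketch of what a user-chosen wake word and skills installed from anywhere might look like once speech has already been turned into text. Everything in it (`OpenVoiceDevice`, `install_skill`, the idea that a skill is just a function from text to text) is a hypothetical illustration, not an existing API.

```python
from typing import Callable, Dict, Optional

# Assumption: a "skill" is simply a function from the user's text to a text reply.
Skill = Callable[[str], str]


def _normalize(text: str) -> str:
    """Lowercase, drop commas, and collapse whitespace."""
    return " ".join(text.lower().replace(",", " ").split())


def _strip_prefix(text: str, prefix: str) -> Optional[str]:
    """Return the text after `prefix` if it matches on a word boundary, else None."""
    if text == prefix:
        return ""
    if text.startswith(prefix + " "):
        return text[len(prefix) + 1:]
    return None


class OpenVoiceDevice:
    """Hypothetical device: any wake word, skills from any source."""

    def __init__(self, wake_word: str) -> None:
        self.wake_word = _normalize(wake_word)
        self.skills: Dict[str, Skill] = {}

    def install_skill(self, trigger: str, skill: Skill) -> None:
        # No central store: whoever owns the device decides what gets installed.
        self.skills[_normalize(trigger)] = skill

    def handle(self, transcript: str) -> Optional[str]:
        command = _strip_prefix(_normalize(transcript), self.wake_word)
        if command is None:
            return None  # Wake word not spoken, so the device ignores the utterance.
        for trigger, skill in self.skills.items():
            rest = _strip_prefix(command, trigger)
            if rest is not None:
                return skill(rest)
        return "Sorry, no installed skill handles that."


device = OpenVoiceDevice(wake_word="Jarvis")
device.install_skill("echo", lambda rest: rest)
reply = device.handle("Jarvis, echo hello world")
```

Note that no skill has to announce a provider name of its own: the owner picks the wake word, and every installed skill sits directly behind it.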
All of this is now possible because of the breakthroughs that have been made in speech recognition. We finally have speaker- and domain-independent speech-to-text. This allows for a clean separation of layers, where applications can be developed that live only at the text level.
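A minimal sketch of that layering, under the assumption that the speech engines and the application only ever meet at a text boundary. The interface names (`SpeechToText`, `TextToSpeech`, `run_turn`) are made up for illustration, and the "engines" are stand-ins that treat audio as UTF-8 bytes so the example runs without any speech library.

```python
from typing import Callable, Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# The application layer never sees audio: text in, text out.
TextApp = Callable[[str], str]


def run_turn(audio: bytes, stt: SpeechToText, app: TextApp, tts: TextToSpeech) -> bytes:
    """One interaction: audio -> text -> application -> text -> audio."""
    text_in = stt.transcribe(audio)
    text_out = app(text_in)
    return tts.synthesize(text_out)


class FakeSTT:
    """Stand-in engine: pretends the audio bytes are already UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeTTS:
    """Stand-in voice: 'speaks' by encoding the text back to bytes."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


def shout_app(text: str) -> str:
    # A trivial text-only application.
    return text.upper()


spoken = run_turn(b"what time is it", FakeSTT(), shout_app, FakeTTS())
```

Because the application depends only on text, swapping in a different recognizer or a new voice would not require touching it.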
If this were a Kickstarter project, I would back it instantly.