Google’s Pixel phones are the company’s preferred way of showcasing its AI chops to consumers. Pixel phones consistently set the bar for smartphone cameras, thanks to Google’s AI prowess. But many of the AI features have nothing to do with the camera. The Pixel 4 and Pixel 4 XL, unveiled this week at the Made by Google hardware event in New York City, continue this tradition. Camera improvements aside, the Pixel 4 makes a play for a new arena that Google clearly wants to rule: offline natural language processing.
At Google’s I/O 2019 developer conference in May, multiple executives touted being able to shrink the company’s cloud-based language model, which is over 100GB, to less than 100MB. The smaller model isn’t as accurate, of course, but it can work offline. The competition, whether that be Apple, Amazon, Samsung, or Microsoft, has nothing like it.
Live Caption and Recorder, which debut exclusively when the Pixel 4 and Pixel 4 XL ship on October 22, are the direct result of this improvement. The former was first shown off at I/O and the latter leaked weeks ago. In fact, as a result of the leaks, Google didn’t even talk about Live Caption onstage this week and quickly skimmed over Recorder. But a closer look shows that they are indeed cut from the same cloth. Update: Google confirmed to me that Live Caption and Recorder use the same underlying speech model with some custom training for the different use cases.
Live Caption and Recorder work only in English. For Live Caption, Google plans to support more languages “in the near future.” For Recorder’s transcription and search functions, more languages are “coming soon.” Coincidence? I think not.
How Live Caption and Recorder work
Live Caption provides real-time, continuous speech transcription of whatever is playing on your phone. The feature can caption nearly any media, including songs, audio recordings, podcasts, and so on. Live Caption can be accessed via the volume buttons; it appears as a software icon when the volume UI pops up. As soon as speech is detected, captions appear on your phone screen. You can double-tap to show more, and drag the captions anywhere on your screen. You don’t need to open another app, and you don’t need a Wi-Fi or data connection.
The Recorder app records meetings, lectures, and anything else you point your phone’s microphone at. As with any similar app, you can save recordings and listen to them later. Recorder goes further, however, by simultaneously transcribing speech and automatically recognizing audio events like applause, birds, cats, dogs, laughter, music, roosters, speech, phones, and whistling. Furthermore, you can search within your recordings to find a specific word or sound. Here as well, you don’t need a Wi-Fi or data connection.
The new Recorder app uses speech recognition and AI to transcribe lectures, meetings, interviews and more—and makes them easy for you to find later. (English only right now, with more languages to come.) #madebygoogle pic.twitter.com/fdKRItuS4b
— Google (@Google) October 15, 2019
So Live Caption is for anything coming out of your phone’s speakers, and Recorder is for anything going into your phone’s microphone. That said, neither Live Caption nor Recorder works while you’re on a phone call, voice call, or video call.
Back at I/O, Brian Kemler, Android accessibility product manager, told me Google had no plans to let Live Caption save transcripts. “Not for Live Caption. Obviously, we thought about that. But we want the captions to be truly captions in the sense that they’re ephemeral, if they help you understand or consume that experience. But we want to protect the people, the publishers, content, and content owners. We don’t want to give you the ability to pull out all that audio, transcribe it, and then do [whatever you want with it].”
That’s what Recorder is for.
Android 10 required
Don’t confuse Live Caption and Recorder with Live Transcribe, which Google released in February. That tool uses machine learning algorithms to turn audio into real-time captions, but it relies on the cloud (specifically, the Google Cloud Speech API). Live Transcribe is available on 1.8 billion Android devices. Live Caption and Recorder work on-device, but they are available on far fewer devices.
Google says that the Pixel 4 and Pixel 4 XL use a Pixel Neural Core for on-device processing. Live Caption is coming to the Pixel 3, Pixel 3 XL, Pixel 3a, and Pixel 3a XL “later this year.” Google is also “working closely with other Android phone manufacturers to make it more widely available in the coming year.” None of these have a Pixel Neural Core (the Pixel 3 and Pixel 3 XL have a Pixel Visual Core; the Pixel 3a and Pixel 3a XL have neither).
We can conclude that Live Caption will work best on the Pixel 4 and Pixel 4 XL, but Google is clearly able to get it to work without the Pixel Neural Core. (In fact, Kemler showed it to me on a Pixel 3a back in May.)
We can conclude the same for Recorder. The app leaked late last month. Enthusiasts were able to get it to work on various devices, including non-Pixel phones. The only real requirement seemed to be Android 10.
Google’s strategy here seems obvious to me. The company will use the Pixel 4 and Pixel 4 XL to show off Live Caption and Recorder in English. As the company adds more languages and gets comfortable with performance, Live Caption and Recorder will become more widely available: first on older Pixel phones, and eventually on other Android devices.
That way, Google will be able to say it’s bringing cool AI features to more and more people. At the same time, it will ensure that anyone buying the latest Pixel phone is getting its cutting-edge AI features first.
ProBeat is a column in which Emil rants about whatever crosses him that week.