Highlights –
- Google is working toward a capability that will let users conduct a multisearch to gain quick insights about various objects in a scene.
- With its “look and talk” feature, users will no longer be required to say “Hey Google” each time for the system to recognize that they are talking to it.
People today want to interact and engage with the world around them in ways fueled by technology.
Keeping this in mind, Google announced several AI-enabled features for voice search, Lens, Assistant, Maps, and Translate.
This includes “search within a scene,” which builds on voice search and Google Lens and lets users point at an object, or combine live images with text, to shape what they are searching for.
“It allows devices to understand the world in the way that we do, so we can easily find what we’re looking for,” said Nick Bell, who leads search experience products at Google. “The possibilities and capabilities of this are hugely significant.”
For example, Bell said, when a cactus he had recently bought for his home office started to wither, he took a picture of it and searched for care instructions that helped him bring it back to life.
Another capability hinges on multimodal understanding. While browsing a food blog, a user may come across an image of a dish they want to try; before making it, they might want to know the ingredients and find a good nearby restaurant that offers delivery. Multimodal understanding comes into play here by perceiving the dish in detail and combining that with the stated intent, drawing on millions of images, reviews, and community contributions, Bell said.
This function will launch globally later this year in English and roll out to additional languages over time.
Similarly, Google is working toward a capability that lets users conduct a multisearch to gain quick insights about multiple objects in a scene at once. For example, at a bookstore, users could scan an entire shelf and get details, recommendations, and reviews for every book. This is done with the help of computer vision, natural language processing (NLP), knowledge from the web, and on-device technologies.
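Google has not published the details of this pipeline, but a minimal sketch of the general idea, pairing hypothetical computer-vision detections with a text query to rank candidate results, might look like the toy Python below; every name, score, and data item in it is invented for illustration and does not reflect any Google API.

```python
# Illustrative sketch only: a toy "scene multisearch" that pairs pretend
# computer-vision detections with a text query to rank candidate results.
# Nothing here reflects Google's actual implementation.

from dataclasses import dataclass


@dataclass
class Detection:
    label: str          # e.g., a book title recognized on the shelf
    confidence: float   # detector confidence in [0, 1]


def rank_results(detections, text_query, knowledge_base):
    """Score each knowledge-base entry by detection confidence plus
    overlap with the user's text query (a stand-in for multimodal fusion)."""
    query_terms = set(text_query.lower().split())
    scored = []
    for det in detections:
        entry = knowledge_base.get(det.label)
        if entry is None:
            continue
        overlap = len(query_terms & set(entry["tags"]))
        scored.append((det.confidence + overlap, det.label, entry))
    return sorted(scored, reverse=True)


# Toy data standing in for on-device detections and knowledge from the web.
shelf = [Detection("dune", 0.92), Detection("neuromancer", 0.88)]
kb = {
    "dune": {"tags": {"sci-fi", "classic"}, "rating": 4.3},
    "neuromancer": {"tags": {"sci-fi", "cyberpunk"}, "rating": 4.1},
}

for score, title, entry in rank_results(shelf, "classic sci-fi", kb):
    print(f"{title}: score={score:.2f}, rating={entry['rating']}")
```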
“Search should not just be constrained to typing words into the search box,” Bell said. “We want to help people find information wherever they are, however they want to, based around what they see, hear and experience.”
Goodbye ‘Hey Google’
It will now be easier for users to initiate a conversation with Google Assistant. With its “look and talk” feature, users will no longer be required to say “Hey Google” each time for the system to recognize that they are talking to it.
“A digital assistant is really only as good as its ability to understand users,” said Nino Tasca, director of Google Assistant. “And by ‘understand,’ we don’t just mean ‘understand’ the words that you’re saying but holding conversations that feel natural and easy.”
Google has been working to improve conversational experiences and handle the nuances and imperfections of human speech. This has involved significant investment in artificial intelligence (AI), speech, natural language understanding (NLU), and text-to-speech (TTS). This work has been bundled into what Google has dubbed “conversational mechanics,” Tasca said.
While examining the AI capabilities, researchers found that they needed six different machine learning models, processing well over 100 signals – including proximity, head orientation, gaze detection, user phrasing, voice, and voice-match signals – to recognize that a user is speaking to Google Assistant. Tasca said the new capability, debuting on Nest Hub Max, allows the system to process these signals and recognize users so conversations can start much more easily.
The same will be launched for Android this week and iOS in the coming weeks.
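Google has not detailed how those six models are combined, but a minimal sketch of that kind of signal fusion, a weighted sum of per-signal scores compared against a threshold, could look like the following; the signal names come from the description above, while the weights and threshold are invented for illustration.

```python
# Illustrative sketch of fusing per-signal scores into a single
# "is the user addressing the Assistant?" decision. The signal names come
# from the article; the weights and threshold are made up for this example.

SIGNAL_WEIGHTS = {
    "proximity": 0.15,
    "head_orientation": 0.20,
    "gaze": 0.25,
    "phrasing": 0.15,
    "voice": 0.10,
    "voice_match": 0.15,
}

ACTIVATION_THRESHOLD = 0.7  # arbitrary cut-off for this sketch


def is_addressing_assistant(signal_scores: dict) -> bool:
    """Combine per-model scores (each in [0, 1]) into a weighted sum and
    compare the result against a threshold."""
    total = sum(
        SIGNAL_WEIGHTS[name] * signal_scores.get(name, 0.0)
        for name in SIGNAL_WEIGHTS
    )
    return total >= ACTIVATION_THRESHOLD


# Example: user is close, looking at the device, and voice-matched.
scores = {
    "proximity": 0.9,
    "head_orientation": 0.8,
    "gaze": 0.85,
    "phrasing": 0.7,
    "voice": 0.6,
    "voice_match": 1.0,
}
print(is_addressing_assistant(scores))  # True for these example scores
```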
Yet another feature announced by Google involves quick phrases – very popular commands such as “turn it up,” “answer a phone call,” or stopping or snoozing a timer.
“It’s just so much easier and faster to say ‘Set a timer for 10 minutes’ than to say ‘Hey Google’ each and every time,” Tasca said.
Natural language enhancements are also being made to Google Assistant based on how users interact in their everyday lives. Real conversations are full of nuances – people say “um,” pause, or correct themselves. Such nuanced cues can occur in under 100 or 200 milliseconds, yet people understand them and respond accordingly, Tasca pointed out.
“With two humans communicating, these things are natural,” Tasca said. “They don’t really get in the way of people understanding each other. We want people to be able to just talk to the Google Assistant like they would another human and understand the meaning and be able to fulfill the intent.”
Google plans to introduce natural language enhancements to Google Assistant early in 2023.
Mapping the world with AI
Other new features leverage advances in AI and computer vision to combine billions of images from Street View and aerial photography to deliver immersive views in Google Maps. According to Miriam Daniel, vice president of Google Maps, these capabilities will roll out in Los Angeles, London, New York, San Francisco, and Tokyo by the end of the year.
“Over the last few years, we’ve been pushing ourselves to continuously redefine what a map can be by making new and helpful information available to our 1 billion users,” Daniel said. “AI is powering the next generation of experiences to explore the world in a whole new way.”
For example, with the new Google Maps functions, a user planning a trip to London may want to know about some of the best sights and dining options there. They will now be able to “virtually soar” over locations like Westminster Abbey or Big Ben and use a time slider to see how these landmarks look at different times of day. Daniel said they can also glide down to street level to explore restaurants and shops in the area.
“You can make informed decisions about when and where to go,” she said. “You can look inside to quickly understand the vibe of a place before you book your reservations.”
Google Maps recently launched a capability to identify eco-friendly and fuel-efficient routes. To date, people have used it to travel 86 billion miles, and Google estimates this has saved more than half a million metric tons of carbon emissions – the equivalent of taking 100,000 cars off the road, Daniel said. The capability is now available in the U.S. and Canada and will expand to Europe later this year.
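As a rough sanity check on the 100,000-car equivalence, using the EPA's commonly cited figure of about 4.6 metric tons of CO2 per typical passenger vehicle per year (an assumption not stated in the article), the arithmetic works out as shown below.

```python
# Back-of-envelope check of the reported equivalence. The 4.6 t/year figure
# is the EPA's oft-cited estimate for a typical passenger vehicle and is an
# assumption here, not something stated by Google.

saved_tons = 500_000          # "more than half a million metric tons"
tons_per_car_per_year = 4.6   # assumed typical passenger-vehicle emissions

cars_equivalent = saved_tons / tons_per_car_per_year
print(f"{cars_equivalent:,.0f} cars")  # roughly 108,696, i.e. ~100,000 cars
```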
“All these experiences are supercharged by the power of AI,” Daniel said.
At the same time, Google announced updates to Translate, including 24 new languages, bringing the total number of supported languages to 133. According to Isaac Caswell, a research scientist with Google Translate, these languages are spoken by more than 300 million people worldwide.
He added that there are still roughly 6,000 languages that are not supported, but emphasized that the newly added languages represent a big step forward. “Because how can you communicate naturally if it’s not in the language you’re most comfortable with?”