Highlights:

  • Since 2019, Alphabet has been developing robots that can perform simple jobs such as fetching beverages and cleaning surfaces.
  • According to Google, its robots could correctly execute user orders 74% of the time once PaLM-SayCan was integrated into them.

Alphabet, the parent company of Google, is combining two of its most ambitious research initiatives – robotics and AI language understanding – to create a “helper robot” that’s capable of comprehending instructions in natural language.

Since 2019, Alphabet has been developing robots that can perform simple jobs such as fetching beverages and cleaning surfaces. This Everyday Robots project is still in its infancy – the robots are slow and hesitant – but the bots have now been given an upgrade: improved language understanding, thanks to Google’s large language model (LLM) PaLM.

Most robots can follow only short, explicit instructions, such as “bring me a bottle of water.” However, LLMs such as GPT-3 and Google’s MUM can better discern the purpose behind more indirect commands. In the scenario presented by Google, you might tell one of the Everyday Robots prototypes, “I spilled my drink; can you help?” This request is run through the robot’s internal list of possible actions, and the robot concludes that it means “bring the sponge from the kitchen to me.”

Even though the bar has been set quite low for an “intelligent” robot, this is still a step in the right direction. What would make the robot intelligent is if it witnessed you spill your drink, heard you yell, “gah, oh my god, my stupid drink,” and helped you clean up the mess.

PaLM-SayCan is the name that Google has given to the resulting system. The name reflects how the model combines the language understanding skills of LLMs (“Say”) with the “affordance grounding” of its robots (“Can”), which filters instructions through the actions the robot can actually perform. PaLM stands for Pathways Language Model, the LLM that supplies the “Say” half.
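The “Say” and “Can” combination can be pictured with a small sketch. This is an illustration only, not Google’s implementation: the skill names, the numeric scores, and the choice to multiply the two scores are all assumptions made for the example.

```python
# Illustrative sketch only: skill names, scores, and the multiplication rule
# are assumptions, not taken from Google's system.

def choose_skill(say_scores, can_scores):
    """Return the skill with the highest combined score, plus all combined scores.

    say_scores: how relevant a language model rates each skill for the request.
    can_scores: how feasible each skill is for the robot right now.
    """
    combined = {
        skill: say_scores[skill] * can_scores.get(skill, 0.0)
        for skill in say_scores
    }
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical scores for the request "I spilled my drink; can you help?"
say_scores = {
    "bring the sponge": 0.50,
    "bring a bottle of water": 0.10,
    "use the vacuum cleaner": 0.40,
}
can_scores = {
    "bring the sponge": 0.80,
    "bring a bottle of water": 0.90,
    "use the vacuum cleaner": 0.05,  # assumed to be nearly impossible for this robot
}

best, combined = choose_skill(say_scores, can_scores)
print(best)
```

In this toy setup, a skill that sounds relevant but that the robot cannot perform scores low, so the choice falls to a skill that is both relevant and feasible.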

According to Google, its robots could correctly execute user orders 74% of the time once PaLM-SayCan was integrated into them. Additionally, the robots could plan appropriate responses to 101 user requests 84% of the time. Even though that’s a great hit rate, you must take those stats with a pinch of salt. Because we do not know the complete list of 101 commands, it is unclear how restrictive these instructions were. Have they truly captured the full scope and complexity of language that we would expect a genuine home helper robot to understand? It’s not very likely.

That is because real life is inherently messy, which is a significant obstacle for Google and other companies working on home robots. There are just too many complicated orders that we would want to give a genuine home robot, from “clean up the cereal I just spilled under the couch” to “sauté the onions for a pasta sauce.” Both commands contain a vast amount of implied knowledge, from how to clean up cereal to where the onions in the fridge are and how to prepare them. An actual home robot is just not feasible currently.

For this reason, the sole home robot to achieve even a modicum of success in this century — the robot vacuum cleaner — has but one life goal: Suctioning dirt.

As AI offers advancements in abilities such as vision and navigation, we are now seeing new types of bots enter the market, but these are still purposely constrained in what they can achieve. Take, for instance, the Retriever bot developed by Labrador Systems. It’s essentially a shelf on wheels that can carry things from one room to another in the house. This straightforward idea has a lot of untapped potential. For example, the Retriever robot might be of tremendous assistance to people with mobility issues. But we are still a long way off from having robot butlers that can do anything we ask of them.