When can I get my new household robot?

We’d all love to have a smart robot in our house that understands spoken natural language, just like those in sci-fi films. Ingrid Zukerman* answers the question ‘How far are we from this dream?’

An anonymous author once mused that “AI is the science of making computers act like the ones in the movies”.

Assuming they meant something like the computer on the starship USS Enterprise or Rosie, the household robot from The Jetsons, what would be required to achieve this dream? And where do we stand?

A modest version of a household robot should be able to understand and respond to simple spoken instructions in natural language (any language spoken by people, such as English, Spanish or Chinese).

It should be able to run errands and perform basic chores, and its responses should be reasonable.

That is, we can’t expect robots to be correct all the time, but in order to be trustworthy, a robot’s responses should make sense.

For our household robot to react reasonably to requests such as “get the blue mug on the table”, it should be able to deal with several issues, such as perceptual homonymy (words that mean different things under different perceptual conditions), syntactic ambiguity, and user vagueness and inaccuracy.

It should also be able to recognise users’ intentions and the potential risks of actions, and to adapt to different users.

Perceptual homonymy applies to intrinsic features of objects, such as colour and size, and to spatial relations.

For example, when talking about a red flower or about a person’s red hair, the two colours are usually completely different.

In other cases, the intended colour may be hard to determine, or an object could have several colours.

Size depends on the type of an object.

For instance, a particular mug may be considered large in comparison to mugs in general, but it is usually smaller than a small vase.

In addition, context matters: objects seem smaller when placed in large spaces, and if there are two mugs on a table — a larger one and a smaller one — and a user requests a large mug, our robot should retrieve the former.

Spatial relations can be divided into topological relations (indicated by prepositions such as “on” and “in”) and projective relations (signalled by prepositional phrases such as “in front of” and “to the left of”).

Looking at topological relations, “the note on the fridge” may be vertically on top of the fridge or attached to the front of the fridge with a magnet.

Also, if we ask our household robot for “the apple in the bowl”, an apple sitting inside a fruit bowl would satisfy this requirement. So would an apple on top of a pile of apples in a bowl, even if this apple exceeds the height of the bowl, because it is within the control of the bowl: if we move the bowl, the apple will move with it.

However, if an apple were glued to the outside of the bowl, it would still be within the control of the bowl, but we wouldn’t say it is in the bowl.

Projective relations depend on a frame of reference, which may be the speaker, the robot or a landmark.

For example, if we ask our household robot to pick up the plant to the left of the table, do we mean our left or its left?

A similar decision would have to be made when interpreting “the plant in front of the table”, but not for “the plant in front of the mirror”, as a mirror has a “face” (it only has one front).
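The frame-of-reference problem can be made concrete with a small sketch. The Python below is illustrative only: the floor-plane coordinates, the `Frame` type and the `is_left_of` function are assumptions made for this example, not part of any system described in this article. It shows the same plant counting as “to the left of the table” for a speaker, but not for a robot facing the opposite way.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    """A hypothetical observer (speaker or robot) and the direction it faces."""
    name: str
    heading: float  # radians, measured anticlockwise from the +x axis


def is_left_of(target, landmark, frame):
    """Return True if target is to the left of landmark as seen from frame.

    The observer's "left" is its facing direction rotated 90 degrees
    anticlockwise. Positions are (x, y) tuples on the floor plane.
    """
    left_x = -math.sin(frame.heading)
    left_y = math.cos(frame.heading)
    dx = target[0] - landmark[0]
    dy = target[1] - landmark[1]
    # Positive projection onto the "left" axis means the target is on the left.
    return dx * left_x + dy * left_y > 0


table = (0.0, 0.0)
plant = (-1.0, 0.0)

# The speaker and the robot face each other across the table.
speaker = Frame("speaker", math.pi / 2)   # facing +y
robot = Frame("robot", -math.pi / 2)      # facing -y

speaker_view = is_left_of(plant, table, speaker)
robot_view = is_left_of(plant, table, robot)
```

With these made-up positions, `speaker_view` is true and `robot_view` is false, which is exactly the disagreement a robot would have to resolve before acting.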

These problems are exacerbated by errors in Automated Speech Recognition — the technology that allows people to speak to computers.

Automated Speech Recognition errors may arise from out-of-vocabulary or rare words, which a speech recogniser may mishear as common words, or from words that are used outside their usual context.

Our AI should be able to cope with misheard and out-of-vocabulary words.

For instance, if we request “the shiny blue mug”, and our robot can’t identify shiny objects, it should still be able to generate a useful response, such as “I can’t see ‘shiny’, but there are two blue mugs on the table, which one do you want?”.

Eventually, our robot should be able to learn the meaning of some out-of-vocabulary words.

The robot will also have to contend with syntactic ambiguity, vagueness and inaccuracy.

Syntactic ambiguity occurs when the phrasing of a description licenses several spatial relations.

For instance, if we ask for “the flower on the table near the lamp”, which should be near the lamp: the flower or the table?

A request for “the blue mug on the table” is vague when there are several blue mugs on the table, and inaccurate when the mug on the table is green, or the blue mug is on a chair.
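The difference between a vague request and an inaccurate one can be expressed as a count of the objects that match the description. The sketch below assumes a toy scene made of dictionaries with hypothetical `type`, `colour` and `on` keys, and a hypothetical `classify_request` function; a real robot would work from uncertain perception rather than exact labels.

```python
def classify_request(scene, wanted):
    """Label a description as 'unique', 'vague' or 'inaccurate' for a scene.

    scene  -- list of dicts describing the objects the robot can see
    wanted -- dict of the properties mentioned in the request
    """
    matches = [
        obj for obj in scene
        if all(obj.get(key) == value for key, value in wanted.items())
    ]
    if len(matches) == 1:
        return "unique"
    if len(matches) > 1:
        return "vague"        # several objects fit the description
    return "inaccurate"       # nothing fits the description exactly


request = {"type": "mug", "colour": "blue", "on": "table"}

two_blue_mugs = [
    {"type": "mug", "colour": "blue", "on": "table"},
    {"type": "mug", "colour": "blue", "on": "table"},
]
green_mug = [{"type": "mug", "colour": "green", "on": "table"}]
mug_on_chair = [{"type": "mug", "colour": "blue", "on": "chair"}]

results = [
    classify_request(two_blue_mugs, request),
    classify_request(green_mug, request),
    classify_request(mug_on_chair, request),
]
```

An exact match like this is deliberately brittle. In the inaccurate cases a useful robot would still look for the nearest partial match, such as the green mug on the table, rather than give up.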

Having some concept of a speaker’s intention, and of the implications of requested actions, would help our robot respond appropriately.

If we are thirsty, then even if our request is ambiguous or inaccurate, the robot could bring one of several mugs.

But this is not the case if we want to show our special mug to a friend.

What if we ask the robot to throw a chair?

When would it be appropriate for our robot to question our request, and when should it just comply?

An implicit assumption made by optimisation-based response generation systems is that there is one optimal response for each dialogue state.

However, our response-generation experiments have shown that different users prefer different responses under the same circumstances, and that several responses are acceptable to the same user.

Therefore, it is worth investigating user-related factors, such as habits, preferences and capabilities, which influence the suitability of an AI’s responses.

Moving forward, in order to generate suitable responses to a user’s request, an AI should be designed with the ability to assess how good its favourite candidate interpretation is, how many other good candidates there are, and how they differ from this favourite interpretation.

To achieve that, our AI would have to keep track of alternative interpretations; and for each interpretation, the AI would compute the probability that it was intended by the speaker and the utility associated with it.

This probability, in turn, would incorporate the probabilities of the following factors: the output of the speech recogniser, the syntactic and semantic structures of the user’s request, and the pragmatic aspects of the interpretation.
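As a rough illustration of this bookkeeping, the sketch below scores each candidate interpretation by combining one probability per factor with a utility. It is not the computational model referred to in this article: the `Interpretation` class, the `rank` function, the numbers, and the choice to multiply the factors as if they were independent are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    """One hypothetical candidate reading of a spoken request."""
    label: str
    p_speech: float      # the recogniser heard these words
    p_syntax: float      # this parse of the words
    p_semantics: float   # the description matches this object
    p_pragmatics: float  # the reading fits the situation
    utility: float       # value of acting on this reading

    def probability(self):
        # Assumption: factors are treated as independent and multiplied.
        return (self.p_speech * self.p_syntax
                * self.p_semantics * self.p_pragmatics)


def rank(candidates):
    """Normalise probabilities and sort candidates by probability x utility.

    Returns a list of (label, normalised probability, utility) tuples.
    """
    total = sum(c.probability() for c in candidates)
    if total == 0:
        return []
    scored = [(c.label, c.probability() / total, c.utility)
              for c in candidates]
    return sorted(scored, key=lambda s: s[1] * s[2], reverse=True)


# Made-up numbers for "the blue mug on the table".
candidates = [
    Interpretation("blue mug on chair", 0.9, 0.8, 0.3, 0.9, 1.0),
    Interpretation("blue mug on table", 0.9, 0.8, 0.9, 0.9, 1.0),
]
ranked = rank(candidates)
```

Keeping the whole ranked list, rather than only the top entry, is what lets the robot see how many other good candidates there are and how close they come to its favourite.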

Previous work has offered a computational model that implements this idea with respect to descriptions comprising simple colours, sizes and spatial relations.

To reach a desirable endpoint, this approach would have to be extended to consider the more complicated issues raised above.

Designed correctly, AIs of the future should consider all these factors to determine whether their interpretations make sense; and they should be able to distinguish between several plausible interpretations, and decide when to ask and when to act.
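One simple way to frame the choice between asking and acting is to compare the favourite interpretation with its nearest rival. The sketch below is a hypothetical decision rule, not a published method: the `ask_or_act` function and both thresholds are assumptions chosen for illustration, and a fuller version would also weigh the risk of the requested action.

```python
def ask_or_act(scored, min_confidence=0.7, min_margin=0.3):
    """Decide whether to act on the best interpretation or ask the user.

    scored -- list of (label, probability) pairs
    Returns ("act", label) or ("ask", [labels of the close contenders]).
    Both thresholds are illustrative assumptions.
    """
    if not scored:
        return ("ask", [])
    ranked = sorted(scored, key=lambda s: s[1], reverse=True)
    best_label, best_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0

    # Act only when the favourite is both likely and clearly ahead.
    if best_p >= min_confidence and best_p - runner_up_p >= min_margin:
        return ("act", best_label)

    # Otherwise ask, offering the candidates that are close to the favourite.
    contenders = [label for label, p in ranked if best_p - p < min_margin]
    return ("ask", contenders)


clear_case = ask_or_act([("blue mug A", 0.9), ("blue mug B", 0.1)])
close_case = ask_or_act([("blue mug A", 0.5), ("blue mug B", 0.45),
                         ("green mug", 0.05)])
```

In the close case the robot would ask about the two blue mugs only, which matches the kind of clarification question described earlier: “there are two blue mugs on the table, which one do you want?”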

*Professor Ingrid Zukerman works in the Department of Data Science and Artificial Intelligence in the Faculty of Information Technology at Monash University.

This article first appeared at lens.monash.edu.
