Whilst Mark Zuckerberg is out touting the power of Facebook to deploy news, ads and videos to the masses, behind the scenes at Facebook a highly credentialed team of scientists is working on innovations far more exciting than trending cat pictures and Buzzfeed quizzes.
Backed by a team of 50 researchers, Yann LeCun – one of the earliest pioneers in biologically based artificial intelligence – is working at the cutting edge of a recently revived field of computer science known as ‘deep learning’. His mission? Create an artificial intelligence with real conversational skills.
On the surface, teaching a computer to speak may sound like a relatively simple task – after all, we already have voice-activated systems and the perpetually annoying yet entertaining Siri.
However, the mandate given to LeCun and his FAIR (Facebook AI Research) group is a far cry from Siri’s stilted, pre-programmed dialogue or the rudimentary language recognition in voice response systems.
To understand the complexity and potentially revolutionary effect of what FAIR is working on, a basic understanding of computational processes and the different forms of AI is essential. In a (hugely simplified) nutshell, computers traditionally process tasks and information in a linear fashion, one instruction at a time, using what is known as von Neumann processing.
Simple machine learning (basic artificial intelligence) retains a degree of linearity, but applies parallel processing, allowing a computer to execute many processes simultaneously.
Struggling with ambiguity
Structurally this approach resembles the way a human brain processes information, and it allows a computer to ‘learn’ based on codified outcomes (correct/incorrect). Whilst basic neural networks can produce impressive logical learning on tasks with clear outcomes, they struggle with ambiguity.
Tasks such as recognising human faces in photos, or parsing natural language into procedural commands, still require human experts to spend extraordinary lengths of time teaching computers how to deal with variations in pixel location, feature shape or syntax, making their commercial use prohibitively costly.
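The ‘codified outcomes’ style of learning described above can be sketched in a few lines of code: a classic perceptron – the simplest neural unit – nudges its weights only when its answer is marked incorrect. This is purely an illustrative sketch (the task, learning the logical AND function, and all names here are our own invention, not anything from FAIR):

```python
# A minimal sketch of learning from codified outcomes (correct/incorrect):
# a perceptron adjusts its weights whenever its prediction is wrong.

def train_perceptron(samples, labels, epochs=10, lr=0.1):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(samples, labels):
            output = weights[0] * x1 + weights[1] * x2 + bias
            prediction = 1 if output > 0 else 0
            error = target - prediction       # 0 if correct, +/-1 if incorrect
            weights[0] += lr * error * x1     # weights move only on a wrong answer
            weights[1] += lr * error * x2
            bias += lr * error
    return weights, bias

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]                         # logical AND of the two inputs
w, b = train_perceptron(samples, labels)
predictions = [1 if (w[0] * x1 + w[1] * x2 + b) > 0 else 0
               for x1, x2 in samples]         # matches labels after training
```

A task like AND has a perfectly clear right answer every time, which is exactly why this style of learning works here – and exactly why it breaks down on the ambiguous tasks, like face recognition, described above.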
By contrast, the ‘deep learning’ approach pioneered by LeCun in the 1980s, and now revived into popularity by FAIR and comparable research units at Google, IBM and Microsoft, teaches computers to ‘learn’ in a far more human way. Some systems already developed using this approach can recognise images as accurately as humans; LeCun aims to take the next step in sophistication, creating machines with both the linguistic skills and the human-like common sense to have a basic conversation.
“Language in itself is not that complicated,” he says. “What’s complicated is having a deep understanding of language and the world that gives you common sense. That’s what we’re really interested in building into machines.”
Whilst this work is undoubtedly groundbreaking from a technical standpoint, one wonders what the practical application and benefit will be should the FAIR team achieve their goal of creating a conversational intelligence.
Facebook is tentatively looking to use it in a soon-to-be-released virtual concierge called Moneypenny, as an alternative to staffing Moneypenny with a call centre full of human assistants – a harmless yet sadly uninspiring use for such a significant technical achievement.
Thinking beyond its application to virtual concierges, being able to control machines with conversational instructions could have profound impacts on how we use computerised systems.
At the most basic level, a shift from button presses and mouse clicks to natural language commands would require a significant rethink of how human-computer interfaces are designed. Linguistic cues would potentially reduce the role of the touchscreen and the mouse, which in turn may change the way we feel about and engage with various devices.
Although shifting to natural language could create a less stilted and more free-flowing interaction with our automated peers, it could also introduce the same types of misunderstanding and ambiguity that are common in human interactions.
If deployed in critical operations systems, such as heavy machinery or vehicles, the introduction of this type of ambiguity could be fatal. Telling your car to turn left when you mean right could well be the last thing you do.
And of course there is the issue of which human language a system is programmed in – what if a Japanese user tried to talk to an English-speaking machine? You only have to travel to a country whose language you don’t speak to see how much frustration (and hilarity) this could cause.
Beyond these simplistic and practical considerations however, LeCun suggests provocatively that there may be deeper shifts in the human-machine relationship, with machines able to ‘get to know’ their users well enough to understand when what they want isn’t actually what they need.
“Systems like this should be able to understand not just what people would be entertained by but what they need to see regardless of whether they will enjoy it,” he says.
To reiterate our earlier point: although clever in concept, it would appear that LeCun and Facebook’s aims for this conversational intelligence are profoundly superficial.
Sure, a helpfully pragmatic concierge or an information source with common sense would be handy – but it’s hardly solving world hunger. One hopes that, as this capability develops further, researchers will consider more socially and economically beneficial applications for conversational intelligence than a virtual Moneypenny.