Can Conversational AI Shed its Dunce Cap?

Conversational AI is everywhere (e.g. Alexa, Siri, and Google Assistant, as well as thousands of lesser-known chatbots). However, as discussed in the last post, these systems currently function well only with simple tasks, because the bots cannot understand normal everyday conversation the way a human assistant does. As a result, we’ve largely given up using them for complex tasks.

Current chatbots are not very smart

But as Google demonstrated at this year’s I/O conference, we are on the cusp of change: by using several new techniques, AI is now capable of understanding complex conversations and responding in natural and intelligent ways.

The most popular technique for helping machines understand conversational speech is word embeddings, which attempt to encapsulate the “meaning” of a word in a vector after reading massive amounts of text and analyzing how each word appears in various contexts across a dataset. The idea is that words with similar meanings will have similar vectors. Word2Vec and GloVe are currently the most popular word embedding algorithms. But as Sebastian Ruder, Research Scientist at AYLIEN, notes, “learning word vectors is like [an image recognition system] only learning edges.” It works for situations where the problem is simple or straightforward, but not when things are complex.
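To make the “similar meaning, similar vectors” idea concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented purely for illustration (real Word2Vec or GloVe vectors are learned from text and have hundreds of dimensions), and the function names are our own, not part of any library.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings are learned from large text corpora.
TOY_VECTORS = {
    "malaria": [0.9, 0.1, 0.0],
    "yellow_fever": [0.8, 0.2, 0.1],
    "guitar": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors=TOY_VECTORS):
    """Return the other word whose vector points in the closest direction."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )
```

With these toy numbers, most_similar("malaria") picks "yellow_fever" rather than "guitar", which is the behavior a trained embedding is meant to produce for related words.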

Let’s look at several new techniques that attempt to move beyond Word2Vec’s shallow approach and embed text with meaning in a richer way.

1) Narrow the Scope and Train Intensively

While this technique works, it is more of an “idiot savant” approach, where the bot will be able to converse across a narrow domain quite well but will be really dumb about everything else. This is okay in some situations, but when using this approach, it is especially important that the users know the chatbot is a computer, so that when the bot says something silly the user knows why.

This was a core technique used by Google in its I/O conference demo, when Google Assistant booked an appointment at a hair salon, and then made a restaurant reservation. As Google explained, the training was intensive and narrow in scope. But what would have happened if the human had decided to make small talk and asked, “How about them Red Sox?” Google noted that Google Assistant was not ready to “carry out general conversations,” so the response would probably have been hilarious or embarrassing.

2) Next Generation Word Embeddings

A paradigm shift is occurring within word embeddings, driven by new techniques such as ELMo, ULMFiT, and the OpenAI transformer. As Sebastian Ruder puts it, “if learning word vectors [e.g. Word2Vec] is like only learning edges, these approaches are like learning the full hierarchy of features, from edges to shapes to high-level semantic concepts.” In essence, these new techniques produce a much richer semantic representation of words and sentences and thus enable the bot to understand words in a deeper way.

Possibly even more exciting is the idea that with these newer systems we may be able to build transferable pre-trained universal word/sentence embeddings that we can use with virtually any bot and achieve excellent comprehension and results, which sounds a lot like human intelligence!

3) Use an Ontology and Sentiment Detection to Label the Text for Meaning

While word embeddings are one way to embed a text dataset with meaning, data labeling is the tried and true method used in other AI domains such as image recognition. (See our white paper on “Data Labeling Full-Text Datasets for AI Predictive Lift” for more comprehensive treatment of this topic.) The problem with data labeling is that in the past this has been done by humans and thus is very expensive. But automated data labeling for text is now a possibility, using an Entity Ontology and Sentiment Detection.

An entity ontology is like a dictionary and a thesaurus; its job is to define the meaning of words by: a) encoding commonalities between concepts in a specific domain (e.g. both “yellow fever” and “malaria” are “diseases spread by mosquitoes”), and b) encoding how words relate to concepts when the meaning varies with the context (e.g. that Mercury is sometimes a “metal,” sometimes a “planet” and sometimes a “Roman god”). Humans can create entity ontologies and use them to label a dataset with meaning, but only at great cost. Now these tasks can be automated: high-quality ontologies can be generated using NLP and AI techniques, further edited by domain experts (“human-in-the-loop”), and then used to label datasets in bulk or in real time (e.g. streaming).
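As a rough sketch of how ontology-based labeling might look, the Python below tags terms with concepts and uses nearby context words to pick a sense for an ambiguous term. The miniature ontology, its cue words, and the label_text function are all hypothetical and invented for illustration; a production ontology would be far larger and would likely use a trained disambiguation model rather than simple word overlap.

```python
import re

# Hypothetical miniature ontology, invented for illustration.
# Each term maps to candidate concepts; each concept lists context
# words that hint it is the intended sense.
ONTOLOGY = {
    "mercury": {
        "metal": {"thermometer", "toxic", "liquid"},
        "planet": {"orbit", "sun", "solar"},
        "god": {"myth", "messenger", "temple"},
    },
    "malaria": {"disease spread by mosquitoes": set()},
    "yellow fever": {"disease spread by mosquitoes": set()},
}


def label_text(text, ontology=ONTOLOGY):
    """Map each ontology term found in the text to a concept label."""
    lowered = text.lower()
    context = set(re.findall(r"[a-z]+", lowered))
    labels = {}
    for term, senses in ontology.items():
        if not re.search(r"\b" + re.escape(term) + r"\b", lowered):
            continue
        if len(senses) == 1:
            # Unambiguous term: only one concept is possible.
            labels[term] = next(iter(senses))
            continue
        # Ambiguous term: count context cue words for each sense.
        scores = {c: len(cues & context) for c, cues in senses.items()}
        best = max(scores, key=scores.get)
        labels[term] = best if scores[best] > 0 else "ambiguous"
    return labels
```

When no cue word appears near an ambiguous term, the sketch returns "ambiguous" instead of guessing, which is the kind of case a domain expert in the loop could then review.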

Understanding text also requires a nuanced, fine-grained understanding of sentiment. Document-level or even sentence-level sentiment is essentially useless for AI. Consider the sentence, “My neighbor’s garden is awesome, the vegetables are really fresh, but they also attract deer, which is how I got Lyme disease.” The bot needs to see the first part of the sentence as positive (i.e. the fresh vegetables produced by my neighbor’s garden are excellent), and the second part of the sentence as negative (i.e. getting Lyme disease because of my neighbor’s garden is awful), rather than treating the whole as neutral (half good plus half bad).
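Here is a minimal sketch of the clause-level idea in Python. It assumes a tiny hand-made word list and splits only on the word “but”; both are simplifications invented for illustration, and a real system would use a trained sentiment model and a proper clause parser.

```python
import re

# Tiny hand-made lexicon, invented for illustration only.
POSITIVE = {"awesome", "fresh", "excellent", "great"}
NEGATIVE = {"disease", "awful", "terrible", "sick"}


def clause_sentiments(sentence):
    """Split a sentence at 'but' and label each clause separately."""
    clauses = re.split(r"\bbut\b", sentence, flags=re.IGNORECASE)
    results = []
    for clause in clauses:
        clause = clause.strip(" ,.")
        if not clause:
            continue
        words = set(re.findall(r"[a-z']+", clause.lower()))
        # Count distinct positive words minus distinct negative words.
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        results.append((clause, label))
    return results
```

Run on the garden sentence, this labels the clause before “but” as positive and the clause after it as negative, so the two opinions are kept apart rather than averaged together.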

Labeling datasets with ontologies and sentiment often results in a better chatbot than using word embeddings alone, as the ontology and sentiment detection capture additional meaning, allowing the bot to achieve a more human-like understanding of the text.

Losing the Dunce Cap in 2019?

While we cannot be sure these newer techniques will make bots super-smart next month or next year, we do know that they are making conversational AI systems smarter all the time. If you are using these new techniques, we’d love to hear how they are working for you. Or if you want help moving your bot to the head of the class, give us a call.

Why Conversational AI's Great Expectations Met Dumb and Dumber

Conversational artificial intelligence is surrounded by a lot of hype and promise. The seeds of these great expectations were sown in the 1960s by Star Trek (1966) and 2001: A Space Odyssey (1968). We all want to ask the computer to help us with almost any imaginable task, just like they do on TV and in the movies. Think back to the first time you talked to Alexa or Siri or Google Assistant. You were hoping for HAL-like human conversations and were maybe a little afraid of falling in love, as Theodore Twombly (Joaquin Phoenix) does in “Her.” But then reality smacked you in the face (cue the sound of a car crash).

Conversational AI’s roller coaster ride

As the Nielsen Norman Group aptly points out, “people are learning that ‘intelligent’ assistants are actually not that smart.” They also note that “[people] simply avoid usability agony by limiting their use to a subset of simple features.” Yup, I think that sums it up pretty well. Siri, Alexa and Google Assistant are pretty good at simple tasks, but falling in love? Not gonna happen.

So why is the reality so far from our expectations? To be blunt (as with human relationships that just did not work out), it’s because they just cannot understand us. Why is it that IBM’s Watson can defeat top Jeopardy! players, yet for the clue “What do grasshoppers eat,” Watson answered “Kosher”? And as Gregory Barber of Wired points out, while we’ve made great progress in AI image recognition, “understanding language…has proved elusive.”

But this is beginning to change, as we saw at this year’s Google I/O conference, where Google demonstrated a flawless Google Assistant booking an appointment at a hair salon, and then making a restaurant reservation. The humans on the other end of the call had no idea they were talking to a chatbot. But, and there is a big but, Google explained that this was only possible because the scope of the problem was narrow and the training was intensive. Google noted that Google Assistant was not ready to “carry out general conversations.”

So, through daily experience, people have had their expectations surrounding virtual assistants reset. We’ve learned to converse only around simple tasks. I call this hitting rock bottom. The good news is that, just like when a roller coaster hits bottom, we are now starting the long-expected ride up to the top, where the fun can really begin.

In the next post, we will take a look at how cutting edge technologies are enabling machines to understand us better and why the hype around virtual assistants will eventually meet our expectations.

Conversational Systems / Chatbots: The Best Ways to Achieve Success

Or “the times they are a-changin….” (Bob Dylan)

Conversational systems (e.g. chatbots) divide into two camps:

  • Rules-based chatbots – this is the old guard that makes minor use of AI (they currently dominate the market)
  • Intelligent chatbots – this is the emerging new guard that makes intensive use of AI and is striving for human-like intelligence


Lorie Shaull (Wikimedia)

Let’s examine how to achieve success in each camp.

Rules-based Chatbots

Rules-based chatbots are the reigning world champs of conversational systems and are the Rocky Balboas of the industry. They achieve their capabilities through lots of hard work, time and effort. They have no actual smarts but mimic intelligence via rules and programming. Some of these systems can function at a very acceptable level, but they are rarely confused with a person. In fact, informing the human upfront that they are dealing with a robot is the generally accepted practice because it’s a real embarrassment to trick someone at the beginning of the conversation and then to have the robot fall flat on its face when asked an unexpected question.

The weakness of a rules-based chatbot is ultimately its greatest strength. If you are not trying to be a human, then it’s okay to admit it and focus on what you do well.

  Achieving success with rules-based conversational systems:

a) Use the 80/20 rule to focus the chatbot. Understand your call volume and focus the chatbot on your high-volume, simple requests. If you can off-load 10% to 30% of the total call volume by answering the repetitive, low-value questions, it’s a huge win.

b) Admit defeat and keep the humans happy. Do not worry about handling the tough problems. People do this well and chatbots (especially rules-based ones) do this poorly. So as soon as your confidence level on a response drops below a reasonably high level, route the person to a service rep. This makes them happy and keeps the chatbot from getting embarrassed.
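The hand-off rule in (b) can be sketched in a few lines of Python. This is only an illustration: the BotReply type, the route function and the 0.8 threshold are assumptions of ours, and the right threshold for a real bot has to be tuned against actual conversations.

```python
from dataclasses import dataclass


@dataclass
class BotReply:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the bot's matcher


# Assumed cut-off for illustration; tune it on real traffic.
HANDOFF_THRESHOLD = 0.8

HANDOFF_MESSAGE = "Let me connect you with a service representative."


def route(reply, threshold=HANDOFF_THRESHOLD):
    """Answer when the bot is confident, otherwise hand off to a person."""
    if reply.confidence >= threshold:
        return ("bot", reply.text)
    return ("human", HANDOFF_MESSAGE)
```

Keeping the threshold as a parameter makes it easy to start conservatively (hand off often) and loosen it as the bot proves itself on the high-volume questions.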

A colleague of mine just built an Alexa skill for a municipality. The chatbot was focused on the frequently asked, but easy to answer questions (such as “What is the recycling schedule?”). It will be a huge win for both residents and the town.

Intelligent Chatbots

We are now on the verge of a sea change in conversational systems from rules to understanding. Think of this as the Dick Fosbury of chatbots - a whole new approach to conversational systems. Rather than mimic understanding with rules, intelligent chatbots attempt to use AI and machine learning to understand a domain at a deep enough level so they can handle questions and provide high-quality responses without rules and programming.

An early example of this shift was on display at this year’s Google I/O conference. At the conference, Google CEO Sundar Pichai demonstrated Google Assistant running Duplex, Google’s experimental AI voice system. The demonstration consisted of two parts: first, Google Assistant booked an appointment at a hair salon, and then made a restaurant reservation. The demonstration was flawless and the humans on the other end of the call were clearly fooled and had no idea they were talking to a chatbot.

In their AI blog, Google explained that “one of the key research insights was to constrain Duplex to closed domains, which are narrow enough to explore extensively. Duplex can only carry out natural conversations after being deeply trained in such domains. It cannot carry out general conversations.”

Informatics4AI is working with a number of customers focused on intelligent conversational systems, and we fully agree with Google’s engineers.

Achieving success with intelligent conversational systems:

  • Focus the chatbot and constrain the domain. Do not attempt to train your chatbot over an open/expansive domain. Focus the system on a specific topic and you may be able to get the intelligence you are seeking. For some customers with open domains, we are experimenting with the creation of subdomains to enable learning on focused topics and a triage-bot to help direct humans to the appropriate subtopic.
  • Better data makes a better model. You will need a high-quality dataset for training the chatbot. Labeling your full text for meaning will assist the required deep training and you will achieve better results. (See this post for more information on labeling full text for meaning.)
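To illustrate the triage-bot idea from the first point, here is a deliberately simple Python sketch that directs a message to a subdomain by keyword overlap. The subdomain names and keywords are hypothetical, and this is not a description of any production system; a real triage-bot would more likely use a trained classifier.

```python
import re

# Hypothetical subdomains and keywords, invented for illustration.
SUBDOMAINS = {
    "billing": {"invoice", "payment", "refund", "charge"},
    "scheduling": {"appointment", "reschedule", "cancel", "booking"},
}


def triage(message, subdomains=SUBDOMAINS):
    """Return the best-matching subdomain, or None if nothing matches."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {name: len(words & keys) for name, keys in subdomains.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Returning None when nothing matches gives the caller a clear signal to ask a clarifying question or route the person to a human instead of guessing a subdomain.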

Are you thinking about a truly intelligent bot? Let us know your experiences as well as your successes and failures. We also welcome your questions.

Why Do Machines (AI) Find Full Text Hard to Understand?

In this multi-post topic, we examine the problem and reveal the secrets for successfully training AI models on full-text datasets. First, let’s understand how hard this is and why.

The following statement by Indrek Vainu CEO of AlphaBlues, an enterprise chatbot company, summarizes the current situation. “Extraction of meaning — or more specifically, semantic relations between words in free text — is a complex task. The complexity is mostly due to the rich web of relations between the conceptual entities the words represent.” He goes on to say that machine learning is “largely clueless when fed unstructured data, such as free text.”  

IMImobile, another chatbot company, states that “Machine learning is a powerful technology and promises an exciting future where machines can come to understand our needs and our intent, perhaps better than we do ourselves. However, at this moment in time we only recommend machine learning for scenarios where there is little scope for ambiguity, and where vectorisation (converting non-numeric input to numeric inputs) is straightforward.”
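For readers unfamiliar with vectorisation, one of the simplest forms is a bag-of-words count, sketched below in Python. The function names are our own and the approach is chosen only for illustration; it is not necessarily what IMImobile has in mind. Note how word order and context are thrown away, which is part of why free text is hard.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Assign each distinct word an index, in alphabetical order."""
    words = sorted({w for doc in documents for w in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def vectorize(document, vocabulary):
    """Turn a document into a list of word counts over the vocabulary."""
    counts = Counter(tokenize(document))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:  # words outside the vocabulary are dropped
            vector[vocabulary[word]] = count
    return vector
```

In this scheme “dog bites man” and “man bites dog” receive identical vectors, a small example of the ambiguity that the quote warns about.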

A recent customer engagement at Informatics4AI supports these statements. Our customer was working with a dataset composed of unstructured doctors’ notes. They found their machine learning efforts created a model that was highly effective for straightforward diagnostic situations (e.g. a patient passing a common screening test). But when fed notes relating to complex tests and multiple patient conditions, the model did not produce predictions with the accuracy that they needed.

As an illustration of the difficulty that AI has with full text (and for a bit of fun) let's take a look at the results that Janelle C Shane got when she trained a neural network on a database of about 30,000 recipes and then asked the machine to produce a new recipe: 


2 pkg hershey’s can be prepared in unpeeled

1 smaller

½ cup yellow onions you may

1 cup egg; chilled, coursely chopped

½ lb bacon, chopped

1 ½ cup sugar, grated

4 oz square oil

Halve the finely chopped fresh garlic salt and pepper. Break the meat into the pineapples and pat them, scraping the room off the skillet. Add ghees and beer and bring to a boil; cover and simmer, uncovered, on High for 20 to 30 minutes or until the onion thickens.

To be fair and to clarify, this model was built by an AI enthusiast and not an AI professional, but I think it illustrates the issue: the machine has no clue what a recipe is really all about.

However, all is not lost when trying to apply machine learning to full text. The key is adding structure and meaning to the raw data, thereby enabling the machine to understand the text and thus begin to learn. We will review these techniques in the next blog post.