When people first use Juji, they are often amazed by how easy it is to create an intelligent chatbot with the platform. This reaction of pleasant surprise is particularly pronounced for people in the know, i.e. technical people who have actually done relevant work before. I am talking about the CTOs, the NLP researchers, and the employees of big technology firms.
A fun anecdote: our first Facebook ad, submitted a couple of weeks ago, was rejected due to "Unacceptable Business Practices", even though the ad was just a screencast of a user using Juji to create a chatbot. Obviously they reverted the decision after I complained, but it did indicate that what Juji offers is often considered too good to be true.
I give full credit to our wonderful engineers and designers, whose ingenuity and heroic efforts make Juji possible, and to my partner Michelle, whose relentless advocacy for users drove us here. On the other hand, there might also be a few things that I did not screw up completely as the architect. In retrospect, here are three high-level ideas that might have helped, or in marketing speak, "three big ideas that make Juji stand out".
To give more context, my point of comparison is the chatbot platforms that truly have elements of artificial intelligence, not the numerous button bot platforms that offer no more than a graphical user interface in a message box. In this regard, I am mainly comparing Juji with chatbot offerings from a few major technology companies: Amazon Lex, Facebook Messenger, Google DialogFlow, IBM Watson, Microsoft Bot Framework. If you insist on knowing my opinions on these platforms, my ranking of them in terms of technical capability is the following: IBM > Microsoft > Google = Amazon > Facebook. Obviously, Juji is a mile above them all ;-).
Call me crazy, but I actually want to build artificial general intelligence (AGI), someday. Of course, I am not crazy enough to bet a company on AGI. However, I do believe that AGI could be reached incrementally, by building smaller and practical intelligent things one at a time. As with anything impressive that humans build, the practice will always be ahead of the big and beautiful theories. Chatbots surely sound like one of those smaller and practical goal posts that might lead to something big in the future. All this is to say, I want my chatbot platform to have the potential to grow into something more ambitious, perhaps an operating system for intelligence?
1. Agency
The first technical capability necessary for intelligence is what I call agency. Animals have agency because they act on their own, whereas objects such as stones, cups, or keyboards do not. Objects do not have agency because they only react, but do not act on their own. Obviously, there is no possibility of intelligence if there is no agency. In terms of computer software, for a platform to support building software components that have agency, it must allow the components to have their own execution loops.
There is an important distinction between bots that just respond to user input or external events and those that run their own execution loops. The former share a system event loop and are reactive. The latter let each agent have its own execution loop, and hence can potentially be proactive, i.e. acting on their own or having their own mind, so to speak. It is the latter that should be regarded as real agents.
Among the major chatbot offerings above, only IBM Watson bots seem to have agency. The rest of the offerings are either deployed as webhooks (DialogFlow, Lex), or run as callback handlers (Bot Framework). These are all reactive bots that cannot act on their own. In a Watson bot, the dialog is defined as trees of nodes, where each node is a production rule (i.e. If-Then). The developer does not control the execution of the dialog; instead, the agent itself runs an execution loop to go through these trees and act accordingly. In principle, this enables the agent to be proactive. I do not know if IBM Watson actually does this proactive firing of rules in practice, but this is precisely how a Juji bot works.
When an end user comes to one of the Juji chatbot deployments on either Facebook or the Web, a new Juji bot is instantiated on the spot for this encounter. Each Juji bot runs two execution loops simultaneously, one reactive and another proactive. The end result is that the bot may speak at any time on its own, not just react to user input. It is also easy to keep a Juji bot instance running indefinitely, so that it acts on its own to proactively send information out to a user, based on a schedule or on some environmental contexts that the bot creator has determined. Essentially, each end user gets her own unique Juji bot, with which she could potentially develop a unique relationship over time. This is a far cry from the universal bots in your living room now that are not personalized and are only reactive.
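To make the idea of an agent-owned rule loop concrete, here is a minimal Python sketch. It is not Watson's or Juji's actual engine; the `Rule` class, `run_agent` function, and the two example rules are hypothetical names made up for illustration. The point is only that the loop, not the developer, decides which If-Then rule fires next.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Rule:
    """A production rule: if condition(state) holds, run action(state)."""
    name: str
    condition: Callable[[State], bool]
    action: Callable[[State], str]


def run_agent(rules: List[Rule], state: State, max_steps: int = 10) -> List[str]:
    """The agent's own loop: keep firing the first rule whose condition holds."""
    said: List[str] = []
    for _ in range(max_steps):
        rule = next((r for r in rules if r.condition(state)), None)
        if rule is None:  # nothing left to do, so the agent goes quiet
            break
        said.append(rule.action(state))
    return said


def greet(state: State) -> str:
    state["greeted"] = True
    return "Hi there!"


def ask_name(state: State) -> str:
    state["asked_name"] = True
    return "What is your name?"


# Hypothetical rules; note that none of them waits for user input.
rules = [
    Rule("greet", lambda s: not s.get("greeted"), greet),
    Rule("ask-name",
         lambda s: bool(s.get("greeted")) and not s.get("asked_name"),
         ask_name),
]

utterances = run_agent(rules, {})
print(utterances)
```

Starting from an empty state, the loop fires the greeting rule and then the name question without any user event triggering it, which is the proactive behavior a callback handler cannot produce by itself.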
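The two-loop arrangement can be sketched with Python's `asyncio`. This is an illustration under assumptions, not Juji's implementation: `run_session`, the `None` sentinel, and the reminder texts are all invented here. One task reacts to incoming messages, while a second task speaks on a timer even though the user has gone silent.

```python
import asyncio
from typing import List, Tuple


async def run_session(user_inputs: List[str],
                      reminders: List[str],
                      interval: float = 0.01) -> List[Tuple[str, str]]:
    """Run one bot instance with a reactive loop and a proactive loop."""
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: List[Tuple[str, str]] = []

    async def reactive() -> None:
        # Reactive loop: speaks only in response to user input.
        while True:
            text = await inbox.get()
            if text is None:  # hypothetical sentinel: the session is over
                return
            outbox.append(("reactive", f"You said: {text}"))

    async def proactive() -> None:
        # Proactive loop: speaks on its own schedule, with no user input.
        for reminder in reminders:
            await asyncio.sleep(interval)
            outbox.append(("proactive", reminder))

    reactive_task = asyncio.create_task(reactive())
    proactive_task = asyncio.create_task(proactive())

    for text in user_inputs:
        await inbox.put(text)

    await proactive_task   # the user is silent; the bot keeps talking
    await inbox.put(None)  # end the session
    await reactive_task
    return outbox


transcript = asyncio.run(
    run_session(["hello"], ["Still there?", "Here is a tip."])
)
for source, line in transcript:
    print(f"[{source}] {line}")
```

A real deployment would presumably drive the proactive loop from a schedule or environmental signal rather than a fixed list, but the structure is the same: each bot instance owns both loops.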
2. Topic Abstraction
The second technical aspect I care about is the abstraction of conversation embedded in the system. One factor I look at is the concept of natural language understanding (NLU) used. The majority of NLU systems are modeled on a concept of intent, referring to what the user wants to do. This reflects a fundamental bias of these systems, which originated in academic research, where the narrow goal the academics set for themselves was to help users accomplish certain tasks; hence understanding the intent of user utterances is central.
I regard the reliance on intent as a severe limitation, because human-bot conversation may not be about the user’s intent at all. For example, what about the bot’s intent? Considering only the user’s intent limits the application of bots to some boring application domains such as customer support, question answering, internet of things, or e-commerce, where users ask questions that bots try to answer or speak their wishes that bots try to fulfill.
For more interesting applications such as marketing research, job interviews, gaming characters, customer on-boarding, educational companions, mental health assistants, and so on, the chatbots need to have their own agendas, which the often-used intent concept simply does not cover. Among the major systems, the only exception is Microsoft BotKit, where intent is not explicitly hard coded in the system. On this front, Juji goes a step further: a Juji bot can have complex and explicit agendas that go beyond the simple intent of either the user or the bot. These agendas are also present in Juji's question answering system, so a user question may lead to a completely different agenda, hence a completely different conversation.
Another factor in the conversation abstraction is the unit of dialog considered in the system. Most systems treat a turn as the basic unit of conversation. This is too granular, because the developers of the bot then have to think of all the possible user utterances at each turn and respond to them accordingly. This is not a task that developers are well suited to, and the system should give them as much help as possible. BotKit is again the only solution that introduces a higher-level concept. BotKit has a concept of a thread, which handles a sequence of turns. However, this is not good enough, because a thread can only be executed, and the only thing one can do with threads is to jump among them.
Juji’s abstraction is called a topic, which may have zero, one or multiple turns in it. Most importantly, a topic is a first-class citizen in the Juji platform, where one can create a topic on the fly, pass arguments to it, pass a topic around, look up a topic, and do all kinds of things with it. This flexibility enables Juji to supply a large library of reusable mini-conversations (represented as topics of course) that users can simply compose into a full bot. Developers are largely relieved of the burden of trying to anticipate the user’s next input, because Juji has many reusable topics that handle all kinds of user digressions and misbehaviors that a bot developer is not well equipped to anticipate.
The topic abstraction also enables easy representation of complex conversational logic and contexts as plain data structures. This data-centric view leads to pervasive code generation throughout our system. Some data in, other data out. Everything flows as data and can be generated on the fly. Such maximum flexibility makes it easy to create an easy-to-use chatbot creation user interface, without forcing users to learn strange NLP jargon, such as intents, entities, slots, and so on.
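As a rough illustration of what "first-class" means here, the Python sketch below treats a topic as an ordinary value that can be created on the fly, parameterized, stored in a lookup table, and composed. None of these names (`Topic`, `ask`, `sequence`, `REGISTRY`) come from Juji; they are hypothetical, and a real topic would also consume user input rather than only emit bot lines.

```python
from typing import Callable, Dict, List

# Assumption for this sketch: a topic is a function from a context
# dictionary to the bot utterances it produces (possibly none).
Topic = Callable[[dict], List[str]]

REGISTRY: Dict[str, Topic] = {}  # hypothetical topic library


def register(name: str, topic: Topic) -> Topic:
    """Store a topic under a name so it can be looked up later."""
    REGISTRY[name] = topic
    return topic


def ask(template: str) -> Topic:
    """Create a one-turn topic on the fly from a parameterized template."""
    def topic(ctx: dict) -> List[str]:
        return [template.format(**ctx)]
    return topic


def silent(ctx: dict) -> List[str]:
    """A zero-turn topic: it takes part in the flow but says nothing."""
    return []


def sequence(*topics: Topic) -> Topic:
    """Compose several topics into one larger topic."""
    def topic(ctx: dict) -> List[str]:
        lines: List[str] = []
        for t in topics:
            lines.extend(t(ctx))
        return lines
    return topic


register("greet", ask("Hi {name}, nice to meet you."))
register("ask-role", ask("What do you do at {company}?"))

# Topics are looked up, passed around, and composed like any other value.
interview = sequence(REGISTRY["greet"], silent, REGISTRY["ask-role"])
lines = interview({"name": "Ada", "company": "Acme"})
print(lines)
```

Because the composed `interview` is itself a topic, it can be registered and reused inside yet another composition, which is what makes a library of reusable mini-conversations practical.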
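The "data in, behavior out" idea can be sketched as follows. The spec format and the `compile_topic` function are assumptions made for this example, and building a closure from the spec stands in for real code generation; Juji's actual representation is not shown in this post.

```python
from typing import Callable, Dict, Tuple

# A conversation fragment described as plain data (hypothetical format).
# A chatbot-building UI could emit a structure like this from a form.
spec: Dict[str, object] = {
    "topic": "favorite-color",
    "question": "What is your favorite color?",
    "replies": {
        "red": "Bold choice!",
        "blue": "Calm and classic.",
    },
    "fallback": "Interesting!",
}


def compile_topic(spec: Dict[str, object]) -> Tuple[str, Callable[[str], str]]:
    """Turn a data spec into a question plus a function that answers replies."""
    replies: Dict[str, str] = spec["replies"]  # type: ignore[assignment]
    fallback: str = spec["fallback"]           # type: ignore[assignment]

    def respond(user_text: str) -> str:
        lowered = user_text.lower()
        for keyword, reply in replies.items():
            if keyword in lowered:
                return reply
        return fallback

    return str(spec["question"]), respond


question, respond = compile_topic(spec)
print(question)
print(respond("I like blue a lot"))
print(respond("Green, I guess"))
```

The person filling in the spec never sees the words intent, entity, or slot; they only supply a question and some expected answers.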
3. Symbolics as the bones and ML as the flesh
With the popularity of deep learning (DL) and machine learning (ML) technology, it is not surprising that most of the systems listed above have natural language processing (NLP) capabilities based on them. These capabilities are must-haves in an AI chatbot platform if the tasks require understanding free-text utterances. Although most button bot platforms on the market today do not offer any NLP capabilities, I expect some of them will integrate them eventually. However, in my opinion, pursuing competitive advantage in raw NLP model performance alone is a rather futile exercise, because these technologies are rapidly being commoditized and the differences among vendors are minimal.
The real competitive advantage is the ease and speed with which a new NLP model can be deployed in production. Here DialogFlow and Lex seem to be very capable, as they are essentially plumbing mechanisms for data flows, so new NLP models should be easy to plug in. The NLP integration story of Watson and Facebook (wit.ai) is not as clear, because the NLP capability seems to be built into the system. That is actually a weakness, because one wants to iterate on these models often and fast.
Juji takes a unique and practical hybrid approach to integrate DL and ML based NLP with the so-called traditional AI approach. As described in my 2018 talk, our slogan for the integration is "Symbolics as the bones, and Machine Learning as the flesh". Consequently, Juji has a complete story on NLP integration. One can either use Juji’s built-in NLP models, run one’s own code in Juji’s sandbox, or call out to third party code easily. The components are loosely coupled, yet are all within the comfort and convenience of a single system.
In summary, all three big ideas fit together nicely to create the unique Juji chatbot creation experience that is both easy and powerful. For example, we can easily integrate any end-to-end DL based conversational techniques to support lively chitchat. At the same time, because our customers have placed a high emphasis on controllability of the bot utterances, our system can easily control the flow of the conversation and lead it naturally towards the customer's business goals. We are able to accommodate these seemingly conflicting requirements, because we have set out to design a system that produces true agents, has the right level of abstraction and is practical rather than ideological about how to achieve AI.
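One way to picture the "bones and flesh" split is the Python sketch below. It is an assumption-laden illustration, not Juji's architecture: the `FLOW` table, `next_step`, and the keyword-based `keyword_classifier` (a stand-in for a real ML model) are all hypothetical. The symbolic table fixes where the conversation may go, while the classifier that interprets free text can be swapped without touching that structure.

```python
from typing import Callable, Dict

# "Flesh": anything that maps free text to a label. In practice this would
# be a trained model or a call to a third-party service.
Classifier = Callable[[str], str]


def keyword_classifier(text: str) -> str:
    """Toy stand-in for an ML sentiment model."""
    lowered = text.lower()
    if any(word in lowered for word in ("love", "great", "good")):
        return "positive"
    if any(word in lowered for word in ("hate", "bad", "awful")):
        return "negative"
    return "neutral"


# "Bones": an explicit, inspectable table that controls the conversation flow.
FLOW: Dict[str, str] = {
    "positive": "ask-for-review",
    "negative": "offer-support",
    "neutral": "ask-follow-up",
}


def next_step(text: str, classify: Classifier = keyword_classifier) -> str:
    """Pick the next conversation step; the classifier is pluggable."""
    return FLOW[classify(text)]


print(next_step("I love it"))
print(next_step("This is awful"))
# Swapping in a different model does not change the symbolic flow.
print(next_step("whatever", classify=lambda text: "negative"))
```

Keeping the classifier behind a plain function signature is what allows a new model to be dropped in quickly, while the bot creator retains control over what the bot actually does with the result.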