Communication as Intelligence: Methods for Social Exchange using Natural Language

Currently in peer review for the June 2003 International Conference on Artificial Intelligence in Las Vegas, Nevada.

Dario Nardi, Assistant Adjunct Professor, Program in Computing

University of California, Los Angeles


Abstract- A general framework and various methods are described for creating virtual agents that use natural language to detect, process, recall and share socially relevant information. Since information exchange necessarily involves data, the basic element chosen for the agent framework is the variable: variables act as containers to label and store information. Methods building on this framework are demonstrated for capturing information using grammatical patterns, checking information, making inferences, committing information to memory, recalling what is learned, performing math operations, implementing situation agendas, capturing late responses to past questions and ambiguous responses, Internet connectivity, general context evaluation, and cross (multi) user interaction. Additional methods are suggested for future exploration. The working software tool based on these methods is ideal for teaching AI and Computational Social Science, and it offers one way to approach functional natural language systems.

Key Terms

agents, communication, social exchange, natural language

Motivation

Intelligent agents are found today in places as diverse as virtual worlds and automated commercial transactions. Natural language is often mentioned as a choice medium for many situations, in order for agents to more easily collect, process, remember and share everyday information in a way familiar to people. Social information includes such data as your name and age, the various languages you speak, your mother's maiden name, your spouse's favorite color, a past career position, or any number of other facts. Many times the exchange process is structured; other times much less so. Some data is explicit; other data is implicit. What operations are needed to communicate socially valuable information back and forth between agents using natural language? A general framework, set of ten methods and a software prototype useful for pedagogy are presented. Common needs, potential solutions, and several problematic areas are also covered.

Project Background

Natural language based social exchange is defined here as the passing back and forth of socially-relevant information between social agents. An example of social exchange might consist of giving one’s name, requesting and capturing a partner agent’s name, and then making inferences based on what is captured such as probable gender or possible ethnic background. The challenges: there are many ways grammatically to provide one’s name, not all of them direct or equally clear; the exchange process itself may happen in different ways, with or without reciprocity; exchanges may occur at any time, although there are social conventions of what comes when; and there are many situations where a social exchange must be processed for inferences, discarded as questionable, augmented by a mathematical operation, or modified by other information – for example, a greeting appropriate to the time of day. Further, multiple social exchanges may happen at once, may be interwoven, or may consist of several steps (such as a “knock-knock” joke). Passing information between agents (gossip) is also an important aspect of social exchange. Agents also have identities and agendas, with communication occurring in a context or a set of nested or overlapping contexts.

 The set of social-exchange operations presented here addresses many though not all of these challenges within a single framework. A working XML (eXtensible Markup Language) based prototype has been developed and is being used as a teaching tool for both computing students and liberal arts and sciences students to explore AI concepts. The framework uses a distributed approach: hundreds of objects, or “behaviors”, each handling one specific social-exchange function while having access to a global pool of shared resources. Modification, addition or deletion of individual social exchange functions is easily done by modifying the XML data file.

 The software prototype and operations suggested here should not be confused with a “chatbot” or similar non-serious types of programs meant to fool people. The objective has been an intelligent agent capable of practical social functions such as capturing, recalling, and using exchanged information. The prototype is currently being adapted for use with robotic agents, virtual world agents and Internet agents.

 For the purpose of classroom instruction on Artificial Intelligence or on Computational Social Science topics, or for other applications, a focus on social exchange conveniently limits the scope of the natural language problem to something manageable. The problem space is greatly reduced. Only hundreds or several thousand non-rare social exchange functions need to be addressed as opposed to, for example, handling hundreds of millions of concepts an informed speaker must know to truly converse. The challenge of social-exchange also asks us to consider questions related to social intelligence. In the spirit of modeling after real agents, who are living systems, the methods presented combine organic (historical, unstructured, arbitrary) techniques as well as mechanistic (a-temporal, structured, algorithmic) techniques.

 The Basic Framework

Since exchanging information necessarily involves data, the basic element chosen is the variable. Variables act as containers to label (name) and store information. For example:

            user name = Joe

            robot feeling about user = happy

            user mother age = 50

 An agent may know some values at the start, such as its name; an agent might also infer values or request values for variables as it processes conversational exchanges. For example:

            user sex = male IF user name = Joe OR user sex = female IF user name = Mary

 In English, the syntax reads: the user’s sex is male if the user’s name is Joe, or the user’s sex is female if the user’s name is Mary. Often, instead of a literal value like "Joe" or "Mary" we refer to a list of possible values. For example:

            user sex = male IF user name = (male name)

 The parenthetical "(male name)" refers to a list of commonly known male names. If the user's name is "Joe" and "Joe" is on the list of male names, then the user's sex will be male. This list, along with many other categorical lists of “commonly known” items, is stored in a data file for use by the main software program. A list-based scheme is helpful for categorizing everyday concepts such as knowing a cat is a pet or giving examples of common categories such as offering a dog as a typical pet. A single term can also appear on multiple lists.
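The list-based rule above can be sketched in a few lines of Python. This is only an illustration, not the prototype's implementation: the dictionary `CATEGORY_LISTS`, the helper names, and the sample names on each list are assumptions made here, whereas the prototype keeps its lists in a data file.

```python
# Hypothetical category lists; the prototype loads such lists from a data file.
CATEGORY_LISTS = {
    "male name": {"joe", "john", "david"},
    "female name": {"mary", "anna", "susan"},
}


def in_list(value, list_name):
    """Return True if the value appears on the named category list."""
    return value.lower() in CATEGORY_LISTS.get(list_name, set())


def infer_sex(variables):
    """Apply: user sex = male IF user name = (male name), likewise for female."""
    name = variables.get("user name")
    if name is None:
        return variables
    if in_list(name, "male name"):
        variables["user sex"] = "male"
    elif in_list(name, "female name"):
        variables["user sex"] = "female"
    # A name on neither list leaves "user sex" unset.
    return variables


pool = infer_sex({"user name": "Joe"})
```

A name found on neither list leaves the variable unset, so a later behavior could still ask for the value directly.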

 Variables are also appropriate in matching to grammatical patterns. For example, if a user says at some point "my name is Joe" and the agent matches this against the grammatical pattern, "my name is X", where X is a variable, then we might have:

            user sex = male IF X = (male name)

 A variable might also point to a numerical range. For example, if someone says, “I am X years and Y months old”, then we have expectations about the speaker:

            X = (2 to 120) AND Y = (0 to 11)

In English, this reads, X falls in the range from 2 to 120, inclusive, and Y falls in the range from 0 to 11, inclusive.
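As a rough sketch of the range expectation, the following Python matches the pattern with a regular expression and accepts the values only if both fall within the stated ranges. The function name and the variable names "user age years" and "user age months" are assumptions for illustration.

```python
import re

# Grammar pattern "I am X years and Y months old", with X and Y as numbers.
AGE_PATTERN = re.compile(r"i am (\d+) years and (\d+) months old", re.IGNORECASE)


def capture_age(sentence):
    """Return captured variables, or None if the pattern or ranges fail."""
    match = AGE_PATTERN.search(sentence)
    if match is None:
        return None
    years, months = int(match.group(1)), int(match.group(2))
    # X = (2 to 120) AND Y = (0 to 11), both inclusive.
    if 2 <= years <= 120 and 0 <= months <= 11:
        return {"user age years": years, "user age months": months}
    return None
```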

Variable names can themselves be formed from variables. For example, the grammar pattern “my X is Y years old” picks up the sentence “my sister is 40 plus years old” and forms a variable with a value like this:

            X age = Y

A variable named “sister age” with the value 40 is created.
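A minimal Python sketch of forming a variable name from a captured value follows. The regular expression and the decision to keep only the leading number (dropping a trailing qualifier such as "plus") are assumptions; the paper does not say how the prototype treats the qualifier.

```python
import re

# Grammar pattern "my X is Y years old"; one optional word such as "plus"
# may follow the number and is ignored (an assumption of this sketch).
RELATIVE_AGE = re.compile(r"my (\w+) is (\d+)(?: \w+)? years old", re.IGNORECASE)


def capture_relative_age(sentence):
    """Build a variable named "<X> age" whose value is the number Y."""
    match = RELATIVE_AGE.search(sentence)
    if match is None:
        return None
    relation, age = match.group(1).lower(), int(match.group(2))
    return {f"{relation} age": age}
```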

With human social interaction, all our knowledge is available to us in all situations although we may explicitly use only one or two data. Similarly, the prototype developed makes all variables and their values global, always available in a shared pool to any of the hundreds or thousands of behaviors. In this way, variable values can be easily recalled when needed and used in later interactions or with other users.

Methods for Social Exchange

Variables are not enough. A way to capture, process, remember and share variable values is also necessary. Ten methods are presented. Each method addresses a particular issue and uses an example exchange of an agent’s name to illustrate how it works. Other examples are given as appropriate.

1) Capturing information

Capturing information involves extracting key data from a literal or implied grammatical pattern. An agent might say “my name is X” to indicate his or her name, or might simply say “it’s X” or “X” in response to an inquiry. The latter case requires that the listener mentally supply a grammar structure, knowing that what is heard is a response to a question. Also, multiple data may be involved. For example, capturing a full name requires two or three pieces of information, such as: “my name is X Y Z” or “my last name is Z my first name is X and my middle name is Y”. All languages have an organic set of commonly expected grammar structures.
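The difference between a literal and an implied pattern can be sketched as follows. In this hypothetical Python example, the explicit pattern is always tried, while the implied patterns ("it's X" and a bare "X") are tried only when a question about the name is pending. The `pending_question` parameter and the pattern lists are assumptions, not part of the prototype.

```python
import re

# Pattern that carries its own grammar.
EXPLICIT_PATTERNS = [re.compile(r"^my name is (\w+)", re.IGNORECASE)]
# Patterns that only make sense as a reply to "what is your name?"
IMPLIED_PATTERNS = [
    re.compile(r"^it['’]?s (\w+)$", re.IGNORECASE),
    re.compile(r"^(\w+)$"),
]


def capture_name(sentence, pending_question=None):
    """Return the captured name, or None if nothing matches."""
    text = sentence.strip().rstrip(".!")
    patterns = list(EXPLICIT_PATTERNS)
    if pending_question == "user name":
        # The listener supplies the missing grammar structure.
        patterns += IMPLIED_PATTERNS
    for pattern in patterns:
        match = pattern.match(text)
        if match:
            return match.group(1)
    return None
```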

2) Checking information

Part of the capturing process involves listening for what is expected, sorting out what is not relevant, and in some cases seeking clarification. A person might say, “My name is nothing.” Or, a behavior for detecting first and last names might look for “my name is X Y” and similar grammar patterns, and then check: is X a commonly known name? If not, the agent might ask for clarification unless the sentence better matches what another module is primed to listen for. Here, extracting the last name is more difficult since there are many unique last names. Repeating back to the user is helpful and offers the opportunity for correction – the system will remember the last name only if the user says “yes” in confirmation. Finally consider, “my name is Joe Black but you can call me anything.” The conjunction “but” is a cue to English speakers for where to delimit the name. There are roughly two dozen common delimiters that cover most cases. Truncating data units at delimiters “smartens” the interaction.
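Truncating at a delimiter and checking the first name against a list might look like the Python sketch below. The `DELIMITERS` tuple is a small hypothetical subset of the two dozen or so common delimiters, and the return shape (first name, last name, whether the first name is known) is an assumption made for illustration.

```python
# Hypothetical subset of the common delimiters mentioned in the text.
DELIMITERS = ("but", "and", "or", "because", "so")


def truncate_at_delimiter(words):
    """Keep only the words that come before the first delimiter."""
    kept = []
    for word in words:
        if word.lower() in DELIMITERS:
            break
        kept.append(word)
    return kept


def capture_full_name(sentence, known_first_names):
    """Match "my name is X Y"; return (X, Y, X is a known name) or None."""
    prefix = "my name is "
    text = sentence.strip().rstrip(".")
    if not text.lower().startswith(prefix):
        return None
    words = truncate_at_delimiter(text[len(prefix):].split())
    if len(words) != 2:
        return None
    first, last = words
    # An unknown first name is a cue for the agent to ask for clarification.
    return first, last, first.lower() in known_first_names
```

In this sketch, "My name is nothing." yields no capture because only one word follows the prefix, which leaves the sentence free for another behavior to handle.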

 3) Making inferences

People often make inferences. For example, someone named Joe is likely male, only women can be literally pregnant, and someone born in 1995 isn't going to have kids in the year 2003. If an agent hears “my name is X”, then it tries the inference:

user sex = male IF X = (male name) OR user sex = female IF X = (female name)

 In most daily social exchange, inference is based on common sense expectations. A male might be pregnant with an idea; a situation might be pregnant with possibilities, or a ship referred to as a “she” might be pregnant with passengers. Some of these situations can be disambiguated by a grammar pattern (for example, “she” or “I am” instead of “it” or “the X is”). At this stage of the project, metaphorical speech can be considered outside the intelligence and purpose of the virtual agent.

 Inference may also be done with logical operations. Given the grammar pattern, “if W is like X and Y is like Z then what”, an agent makes a comparison to see if X equals Y. If so, there is transitive closure and the agent infers that W equals Z.
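The comparison step can be illustrated with a short Python sketch. The regular expression and function name are assumptions; the function returns the pair (W, Z) only when X and Y match, and otherwise draws no inference.

```python
import re

ANALOGY = re.compile(
    r"if (\w+) is like (\w+) and (\w+) is like (\w+) then what", re.IGNORECASE
)


def infer_analogy(sentence):
    """Given "if W is like X and Y is like Z then what", link W and Z if X equals Y."""
    match = ANALOGY.search(sentence)
    if match is None:
        return None
    w, x, y, z = (group.lower() for group in match.groups())
    if x == y:
        return w, z  # the chain closes, so W is related to Z
    return None
```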

 4) Committing information to memory

Agents expect other agents to remember most of what they say. Committing to memory what is learned involves variable value assignment. If a speaker says, “my name is Joe” and the listener hears “my name is X” and X checks out as a name then the agent remembers:

            user name = X

 And if an inference is made as in method 3 above, then the agent also remembers:

            user sex = male

 Additionally, a pre-existing identity can be established for an agent as a list of data “learned” without reference to any input. For example, at the very start the virtual agent learns:

            robot name = HAL

            robot sex = male

            robot IQ = 200

 In this way, the agent can be easily configured to present itself as any number of personas: an on-line gamer, virtual secretary, oneself or a person from history.

 5)  Recalling what is learned

How can an agent recall what it has learned? One way involves behaviors that handle the retrieval of information. For example, if a user asks, “what is my name” then the agent’s “recall name” behavior can check if the variable “user name” has a value, and if so can share that and if not can say so and put in a request (see the “agenda” method 7 below) to get the user’s name. A second way is simpler. A user can simply pose a question to the virtual agent that includes the variable name of interest. The software automatically looks for a questioning tone and adjusts variables named with "user" or "robot" or such to match appropriate pronoun usage. For example, the variable named "user first name" can be accessed by posing the question, "what is my first name?"
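The second, simpler recall route can be sketched in Python as below. The pronoun table, the restriction to "what is ..." questions, and the "unknown" reply are assumptions of this sketch rather than details confirmed for the prototype.

```python
# Map pronouns in the question to the owner prefix used in variable names.
PRONOUN_TO_OWNER = {"my": "user", "your": "robot"}


def recall(question, variables):
    """Answer "what is my/your <variable>?" from the shared variable pool."""
    words = question.lower().strip().rstrip("?").split()
    if len(words) < 4 or words[:2] != ["what", "is"]:
        return None
    owner = PRONOUN_TO_OWNER.get(words[2])
    if owner is None:
        return None
    variable_name = " ".join([owner] + words[3:])
    return variables.get(variable_name, "unknown")


pool = {"user first name": "Joe", "robot name": "HAL"}
```

With this pool, "what is my first name?" resolves to the variable "user first name", and "what is your name?" resolves to "robot name".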

 6) Mathematical operations

Math is useful for common functions such as calculating the length of the conversation, determining the year of birth given a person’s current age, and doing simple algebra and geometry. As an example, this particularly challenging math-based behavior determines the distance between two countries:

  1. Detect the basic request, such as “what is the distance between X and Y”
  2. Detect the names of the countries (X = England and Y = Kenya)
  3. Make inferences to locate each country in a particular global region; for example: global zone = 1 IF X = (European country) OR global zone = 2 IF X = (African country), and so on.
  4. Calculate the difference between the global zones and multiply the result by a standard value, such as 4000 miles
  5. Format the results for output
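
The five steps above can be sketched in Python as follows. The country lists, zone numbers beyond the two given in the text, and the wording of the output are assumptions; the 4000-mile multiplier is the standard value mentioned in step 4.

```python
import re

# Hypothetical category lists and the zone numbers assigned to them.
REGION_LISTS = {
    "European country": {"england", "france"},
    "African country": {"kenya", "egypt"},
}
GLOBAL_ZONE = {"European country": 1, "African country": 2}
MILES_PER_ZONE = 4000

DISTANCE_PATTERN = re.compile(
    r"what is the distance between (\w+) and (\w+)", re.IGNORECASE
)


def zone_of(country):
    """Step 3: infer the global zone from the list the country appears on."""
    for list_name, members in REGION_LISTS.items():
        if country.lower() in members:
            return GLOBAL_ZONE[list_name]
    return None


def answer_distance(sentence):
    match = DISTANCE_PATTERN.search(sentence)          # steps 1 and 2
    if match is None:
        return None
    x, y = match.group(1), match.group(2)
    zone_x, zone_y = zone_of(x), zone_of(y)
    if zone_x is None or zone_y is None:
        return None
    miles = abs(zone_x - zone_y) * MILES_PER_ZONE      # step 4
    return f"{x} and {y} are roughly {miles} miles apart"  # step 5
```

Two countries in the same zone produce a distance of zero in this sketch, which shows how coarse the estimate is.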

 Since computers have internal clocks, an agent has automatic access to variables such as the current month, year, and conversation starting time, which makes scheduling easier.

 7) Situational Agendas

When people go into a situation they often have an agenda. This agenda might include:

  1. Getting information (asking for or inferring the value of a variable)
  2. Sharing information (informing the partner agent of a variable’s value)
  3. Making an offer (randomly picking and presenting a value for a variable, and keeping that value if the partner agent gives a positive response)
  4. Setting a goal (checking if a variable value equals or, if appropriate, exceeds a desired threshold value)

Each of these four (and potentially others) is handled in its own way, although getting information is the most complex. For example, a query for the user’s name might involve:

  1. Posting the variable to query (for example, “user name”)
  2. Formatting and verbalizing the query (what is your name?)
  3. Inferring grammatical structures to capture a response that may have little or no grammatical support (such as understanding that a response of “Joe” means, “my name is Joe”)
  4. Allowing the “user name” behavior to capture the user’s response
  5. Noting this information is found and requires no further follow up
  6. If a user didn’t respond in an expected way, seek clarification and continue asking.

This method is easy as long as the “user name” behavior is primed to process the input for that variable name. Thus, choice of variable names is pragmatic and not arbitrary.
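A minimal Python sketch of the "getting information" agenda item is given below. The `Agenda` class, its method names, and the rule that a single alphabetic word counts as a usable reply are all assumptions for illustration; in the prototype the matching is delegated to the behavior primed for that variable name.

```python
from collections import deque

# Hypothetical wording for queries, keyed by variable name.
QUESTIONS = {"user name": "What is your name?"}


class Agenda:
    def __init__(self):
        self.pending = deque()   # variable names still to be filled
        self.variables = {}      # shared pool of captured values

    def post(self, name):
        """Step 1: post a variable to query, unless already known or queued."""
        if name not in self.variables and name not in self.pending:
            self.pending.append(name)

    def next_query(self):
        """Step 2: verbalize the query for the first pending variable."""
        if not self.pending:
            return None
        name = self.pending[0]
        return QUESTIONS.get(name, f"What is the value of {name}?")

    def hear(self, response):
        """Steps 3 to 6: accept a bare one-word reply, otherwise keep asking."""
        if not self.pending:
            return False
        words = response.strip().rstrip(".").split()
        if len(words) != 1 or not words[0].isalpha():
            return False  # unexpected reply: the item stays pending
        self.variables[self.pending.popleft()] = words[0]
        return True
```

An unexpected reply leaves the item at the head of the queue, so the same query is produced again on the next turn.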

 8) Internet Connectivity

Real people have a physical environment so it is only fitting that virtual agents enjoy an Internet environment. Assuming a sufficiently speedy Internet connection, the agent can load and parse web site HTML and use a web robot to gather information such as recent news events, the local weather, and encyclopedia entries of historical events, scientific terms and so on. These data are stored in variables, just as all data are. The variable names are constructed dynamically to reflect their contents (temperature, sports news, and so on).

 9) Evaluating Context

A bare-bones virtual agent has to worry about too few responses; an agent with a rich set of behaviors in the thousands has to worry about too many responses. For example, a general inquiry about “time” might call on: the current local time, a definition of time, a metaphor relating the nominalization “time” to something said earlier, the time remaining before something happens, the length of the current conversation, and so on. As a different example, what does the exact statement “test me” refer to? Does it refer to a request to give the partner agent a math problem? Or some other request? Simply, what is the “right” thing to say? The answer is context-variant subsumption architecture. This can be done several ways. One involves tagging each behavior with one or more applicable contexts, and sorting possible outputs by those that best match the current context. Unfortunately, computer-bound agents usually lack senses to determine context. Robotic agents like the Evolution Robotics ER1 are equipped with object recognition, and object- or motion-determined context can work somewhat better in determining context. Context can also be explicated by speakers (for example, “let’s talk about music”) although this appears uncommon in real conversations. A history of interactions and gathered data about a user can sometimes help determine context using multiple inferences. The current prototype only tags some behaviors for specific contexts and leaves the majority to function whenever they are called upon, on a first-come, first-reply (but don’t repeat oneself) basis.
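The tagging approach can be sketched in Python as a sort over candidate responses. The candidate texts and context tags below are invented for illustration. Because Python's sort is stable, candidates with equal overlap keep their original order, which approximates the first-come, first-reply fallback.

```python
def rank_responses(candidates, current_context):
    """Order (response, tags) pairs by how many tags match the current context."""
    return sorted(
        candidates,
        key=lambda candidate: len(candidate[1] & current_context),
        reverse=True,
    )


# Hypothetical behaviors that could all answer an inquiry about "time".
candidates = [
    ("It is 3 pm.", {"scheduling"}),
    ("Time is what clocks measure.", {"definitions"}),
    ("We have talked for 5 minutes.", {"conversation", "scheduling"}),
]

best = rank_responses(candidates, {"definitions"})[0][0]
```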

 10) Cross-user Interaction

Sharing information between people is a common purpose of social exchange. Sharing information might include gossip or be more practical, such as linking together partner agents who share similar interests or goals. We can treat the global pool of variables as a single object whose owner is the current user. We can then reference information gathered from other agents by referring to data in an object-oriented manner. For example:

            User42 user name = Joe

            User19 user name = Mary

            User19 user age = 40

An inquiry made about “Mary’s age” returns “40” while an inquiry into Joe’s age returns “unknown”. Like context determination, cross-user interaction is complicated by user recognition issues. How do we refer to someone whose name we don’t know? What about live interaction with multiple partner agents at once? And what about information that should not be shared? While the object-oriented approach works well technically, these questions remain an area for ongoing research and development.
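The object-oriented reference to other users' data might be sketched in Python as a dictionary of per-user variable pools. The structure, the `lookup` function, and matching people by their stored name are assumptions of this sketch; it does not address the recognition and privacy questions raised above.

```python
# One variable pool per user, keyed by a hypothetical user identifier.
POOLS = {
    "User42": {"user name": "Joe"},
    "User19": {"user name": "Mary", "user age": 40},
}


def lookup(person_name, attribute, pools):
    """Find the pool whose "user name" matches and return the attribute."""
    for pool in pools.values():
        if pool.get("user name", "").lower() == person_name.lower():
            return pool.get(f"user {attribute}", "unknown")
    return "unknown"
```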

 Other Social Exchange Methods

Several methods not covered here are currently in development. One issue is filtering (understanding that “mom”, “momma” and “mother” mean the same thing in most situations). A second issue is handling anaphora: for example, understanding that “he is nice” refers to a previously named male, or that “my name is Joe what’s yours” means “my name is Joe, what is your name?” A third issue relates to meta-cognitive skills. For example, what is a smooth and useful way for an agent to explain the inferences it is making, or has made in the past? Other issues, including metaphor, giving step-by-step instructions, and reading from texts instead of engaging in conversation, are interesting related problems being explored.

 Conclusion

A general framework and ten specific methods were presented for intelligent agents to engage in social exchange using natural language. The framework is based on the use of variables and their values – for example, “user name = Joe”. Hundreds of independent behaviors each handle the specifics of interaction, such as detecting grammar structures, making inferences, and doing math calculations. The prototype software has demonstrated that this approach is viable and accessible to students and others from various disciplines who are exploring AI and Computational Social Science. Particularly difficult challenges remain such as context determination and partner-agent determination, and these may be better handled for agents with senses in virtual or real physical environments. The narrow focus on exchange of social information also does not address broader areas of inquiry such as determination of parts of speech or general knowledge representation and situational understanding. Rather, the focus is more inter-agent as opposed to intra-agent, and more behavioral than cognitive.

 References

Will be included in final Camera-Ready copy.

 

The full paper, link to a relevant newspaper article, and related research papers and references will be posted after the conference.