Hierarchical Text Generation and Planning for Strategic Dialogue using RL

August 5, 2020
Posted by: Salim Ansari
Category: Artificial General Intelligence (AGI), Artificial Intelligence, Natural Language Processing, Reinforcement Learning, Retail

Introduction

Moving up the value chain, CellStrat would like to encourage discussions and webinars focused on applying AI to real-life problem solving. A beginning has already been made, and this post is another step in that direction. Using Deep Learning and Reinforcement Learning (RL) to handle a complex strategic negotiation is a good example of how RL can optimize a decision-making process.

The topic is divided into two parts. Part I introduces the hierarchical text generation process. Part II will take up a case study.

Word-by-word approaches to text generation have been successful in many tasks. However, they have limitations in under-constrained generation settings, such as dialogue response or summarization, where models have significant freedom in the semantics of the text to generate. In such settings, models tend to produce overly generic responses that may be valid but are not necessarily the best ones. Further, such models are hard to interpret and at times intellectually dissatisfying, because they do not clearly distinguish between the semantics of language and its surface realization. Entangling form and meaning is problematic for reinforcement learning, where back-propagating errors caused by semantic decisions can adversely affect the linguistic quality of the text (Lewis et al., 2017), and for candidate generation in long-term planning, because linguistically diverse text may still lack semantic diversity. Here we concentrate on negotiation dialogues, and more specifically on strategic negotiation.

negotiation dialogues

Overall Approach

The method learns discrete latent representations of sentences (z_t) based on their effect on the continuation of the dialogue. It consists of:

Decoupling the semantics of a dialogue utterance from its linguistic realization.
Using the latent sentence representations (z_t) for hierarchical language generation, planning, and reinforcement learning.
Improving the model's ability to plan ahead by sampling z_t to create a set of semantically diverse candidate messages, then using rollouts to estimate the expected reward of each candidate.
Applying RL to learn from the end-task reward.
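
The planning step can be illustrated with a minimal sketch. It assumes a small set of candidate plans z_t and a simulator that plays the dialogue out to its end and returns a reward. The names `plan_with_rollouts` and `toy_simulate`, the candidate labels, and the probabilities and payoffs are all hypothetical; in the actual system the candidates would be sampled latent codes and the rollouts would come from the learned dialogue model.

```python
import random

def plan_with_rollouts(candidates, simulate, n_rollouts=20, seed=0):
    """Estimate the expected reward of each candidate plan by averaging
    the rewards of several simulated dialogue continuations (rollouts),
    then return the best candidate and all estimates."""
    rng = random.Random(seed)
    expected = {}
    for z in candidates:
        total = sum(simulate(z, rng) for _ in range(n_rollouts))
        expected[z] = total / n_rollouts
    best = max(expected, key=expected.get)
    return best, expected

def toy_simulate(z, rng):
    """Hypothetical stand-in for rolling out the rest of the dialogue:
    each plan has an assumed chance of reaching a deal and a payoff."""
    success_prob, payoff = {
        "ask_all": (0.2, 10),   # ambitious demand, often rejected
        "ask_half": (0.7, 6),   # moderate demand
        "concede": (1.0, 2),    # always accepted, low payoff
    }[z]
    return payoff if rng.random() < success_prob else 0

if __name__ == "__main__":
    best, expected = plan_with_rollouts(
        ["ask_all", "ask_half", "concede"], toy_simulate
    )
    print(best, expected)
```

Because the candidates differ in meaning rather than only in wording, the rollout estimates compare genuinely different strategies, which is the motivation for sampling z_t instead of sampling word sequences directly.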

Advantages of this approach

It increases the end-task reward achieved by the model.
It improves the effectiveness of long-term planning using rollouts.
It allows self-play reinforcement learning to improve decision making without diverging from human language.

In negotiation, the text generated by the model has consequences that can be easily measured with reference to human responses. The strategic dialogue agent uses a hierarchical, or phased, generation approach:

In the first phase, the agent samples a short-term plan in the form of a latent sentence representation.
In the second phase, the agent conditions on this plan during generation, which allows precise and consistent generation of text to achieve a short-term goal.
In doing so, the aim is to disentangle the concepts of "what to say" and "how to say it".
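
The two phases can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the plan labels, the probabilities in `PLAN_PRIOR`, and the sentences in `REALIZATIONS` are invented for the example, and simple lookup tables stand in for the learned models. In the approach described above, the plan is a learned discrete latent code, not a hand-named intent.

```python
import random

# Hypothetical prior over short-term plans ("what to say").
PLAN_PRIOR = {"propose": 0.5, "accept": 0.3, "reject": 0.2}

# Hypothetical surface realizations for each plan ("how to say it").
REALIZATIONS = {
    "propose": ["I would like the book and the hat.",
                "Can I have the book and the hat?"],
    "accept": ["Deal.", "Okay, that works for me."],
    "reject": ["No, I need more than that.", "That does not work for me."],
}

def sample_plan(rng):
    """Phase one: sample a short-term plan z from the prior."""
    plans = list(PLAN_PRIOR)
    weights = [PLAN_PRIOR[p] for p in plans]
    return rng.choices(plans, weights=weights, k=1)[0]

def realize(plan, rng):
    """Phase two: generate text conditioned on the sampled plan."""
    return rng.choice(REALIZATIONS[plan])

def respond(rng):
    """Run both phases and return the plan with its realization."""
    z = sample_plan(rng)
    return z, realize(z, rng)

if __name__ == "__main__":
    print(respond(random.Random(0)))
```

Keeping the two functions separate mirrors the design choice: a learning signal about strategy can adjust `sample_plan` without touching how sentences are worded.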

Negotiation Dialogue Sequence

Separated Strategic and NLG aspect

Hierarchical generation of dialogue responses

In a straightforward formulation, the latent variable z_t is inferred to maximize the likelihood of a message x_t given the previous messages x_{0:t-1} ≡ (x_0, …, x_{t-1}), which has the effect of clustering similar message strings.

Approach
- The latent variable z_t is optimized to maximize the likelihood of the messages and actions in the continuation of the dialogue, but not of the message x_t itself.
- z_t therefore learns to represent x_t's effect on the dialogue, not the words of x_t.
- The distinction is important because messages with similar words can have very different semantics and, conversely, the same meaning can be conveyed with different sentences.
- Empirical results and human evaluation indicate that the method leads to:
  - better perplexities and end-task rewards;
  - representations that, qualitatively, group sentences that are semantically coherent but linguistically diverse;
  - improved strategic decision making by the dialogue agent when this message representation is used.
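
The difference between representing a message's words and representing its effect can be written schematically as two training objectives. The notation below is an assumption made for illustration (in particular the symbols x_{t+1:T} for the remaining messages and a for the final action) and is not the exact formulation from the paper.

```latex
\begin{align*}
\text{representing the words of } x_t:\quad
  & \max \; \log p\left(x_t \mid z_t,\, x_{0:t-1}\right) \\
\text{representing the effect of } x_t:\quad
  & \max \; \log p\left(x_{t+1:T},\, a \mid z_t,\, x_{0:t-1}\right)
\end{align*}
```

Under the second objective, two differently worded messages that lead to the same continuation are pushed toward the same z_t, while two similarly worded messages with different consequences are pushed apart.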

Ground rules

The agents X and Y are initially given a space A of possible agreements.

Value functions v^X and v^Y specify a non-negative reward for each agreement a ∈ A.
The agents cannot directly observe each other's value functions and can only infer them through dialogue.
The agents sequentially exchange turns of natural language x_t, each consisting of n + 1 words x_t^{0:n} = (x_t^0, …, x_t^n), until one agent enters a special turn that ends the dialogue.
Both agents then independently enter agreements a^X, a^Y ∈ A respectively.
If the agreements are compatible, both agents receive a reward based on their agreements and their value functions.
If the agreements are incompatible, neither agent receives any reward.
Training dialogues from an agent's perspective consist of the agreement space A, the value function v, the messages x_{0:T}, and the agreement a.
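
The reward rule can be made concrete with a short sketch. It assumes an item-division setting in which an agreement assigns each agent a count of every item type, and in which two agreements are compatible when together they allocate exactly the shared pool. That setting, and the names `compatible`, `reward`, and `score`, are assumptions for illustration; the text above only requires that some compatibility check exists.

```python
from typing import Dict, Tuple

# Assumed representation: an agreement maps item names to claimed counts.
Agreement = Dict[str, int]

def compatible(a_x: Agreement, a_y: Agreement, pool: Dict[str, int]) -> bool:
    """Assumed rule: the two claims must add up to the pool exactly."""
    return all(a_x.get(item, 0) + a_y.get(item, 0) == count
               for item, count in pool.items())

def reward(agreement: Agreement, values: Dict[str, int]) -> int:
    """An agent's reward is the value of the items it ends up with."""
    return sum(values[item] * count for item, count in agreement.items())

def score(a_x: Agreement, a_y: Agreement,
          v_x: Dict[str, int], v_y: Dict[str, int],
          pool: Dict[str, int]) -> Tuple[int, int]:
    """Return (reward_X, reward_Y); both are zero if the agreements clash."""
    if not compatible(a_x, a_y, pool):
        return 0, 0
    return reward(a_x, v_x), reward(a_y, v_y)

if __name__ == "__main__":
    pool = {"book": 2, "hat": 1}
    v_x = {"book": 3, "hat": 4}   # private to agent X
    v_y = {"book": 5, "hat": 0}   # private to agent Y
    print(score({"book": 1, "hat": 1}, {"book": 1, "hat": 0}, v_x, v_y, pool))
    print(score({"book": 1, "hat": 1}, {"book": 2, "hat": 0}, v_x, v_y, pool))
```

In the first call the claims split the pool cleanly, so each agent is paid according to its own private values. In the second call both agents claim too many books, so neither receives anything.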

References

https://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture15-nlg.pdf
https://arxiv.org/pdf/1712.05846.pdf