Improving LLM Performance with Emotional Intelligence

Introduction

Is it possible to improve the performance of the output generated by LLMs (Large Language Models) by using emotional intelligence?

An empirical study and the related paper have shown this to be possible, with statistically significant improvements. In this article, I summarize the findings, but first a short word about the transformer architecture that makes all this possible.

Transformer architecture

Intuitively, the results presented in the study make sense: words that appear close to each other, or that appear in the same text, are conceptually related to each other. The Transformer architecture, which LLMs use to encode their knowledge, captures the semantic and grammatical meaning of words better than earlier models.

The Transformer architecture's attention mechanism gives the model an understanding of the broader contexts and relationships between words in a sentence, both in the input and output. The transformer model sees the relationships both locally, in the sentence, and globally, in the text, and is able to discern the meaning in more detail than previous models.
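
To make the attention mechanism concrete, here is a minimal numpy sketch of scaled dot-product attention, the core operation described above. This is an illustrative simplification, not the full multi-head Transformer layer: each token's output becomes a weighted mix of all value vectors, with the weights derived from query-key similarity, which is how the model relates every word to every other word in the context.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch of Transformer attention: each token's output is a
    weighted combination of all value vectors, weighted by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V

# Three tokens with 4-dimensional embeddings (random, for illustration only).
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # each of the 3 tokens gets a 4-dimensional output
```

Because the softmax weights span every token in the input, each output position can draw on both local and global context, which is the property the paragraph above describes.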

Large Language Models Understand and Can be Enhanced by Emotional Stimuli

The paper Large Language Models Understand and Can be Enhanced by Emotional Stimuli by authors Cheng Li, Jindong Wang, Yixuan Zhang, et al., describes a technique dubbed EmotionPrompt by the authors. This technique, according to empirical evidence presented in the paper, significantly enhances the output quality of Large Language Models (LLMs).

Psychology Background

The EmotionPrompt technique is based on three well-established psychological phenomena:

  • Self-monitoring: someone who is self-monitoring carefully manages how others see them and how they feel about themselves in social situations.

  • Social Cognitive Theory (SCT): central to SCT are concepts like perceived self-efficacy, which involves beliefs about one's ability to perform actions to achieve certain outcomes, and outcome expectancies, which are beliefs about the consequences of those actions.

  • Cognitive emotion regulation: this strategy for regulating emotions draws on a person's cognitive abilities, such as adopting different perspectives, questioning interpretations, and reassessing the significance of various situations.

EmotionPrompt

The EmotionPrompt technique consists of writing a prompt and then appending an additional emotional stimulus, based on the psychological phenomena discussed above, to improve the performance of the LLM.

Below is an example from the paper:

Original:

Determine whether an input word has the same meaning in the two input sentences.

Enhanced prompt (EmotionPrompt):

Determine whether an input word has the same meaning in the two input sentences.

This is very important to my career.

EmotionPrompt performance

EmotionPrompt was tested on six different LLMs:

  • ChatGPT
  • GPT-4
  • Llama 2
  • BLOOM
  • Vicuna
  • T5

The following eleven EmotionPrompts were tested and scored:

  • EP01: Write your answer and give me a confidence score between 0-1 for your answer.
  • EP02: This is very important to my career.
  • EP03: You'd better be sure.
  • EP04: Are you sure?
  • EP05: Are you sure that's your final answer? It might be worth taking another look.
  • EP06: EP06 is the compound of EP01, EP02, and EP03.
  • EP07: Are you sure that's your final answer? Believe in your abilities and strive for excellence. Your hard work will yield remarkable results.
  • EP08: Embrace challenges as opportunities for growth. Each obstacle you overcome brings you closer to success.
  • EP09: Stay focused and dedicated to your goals. Your consistent efforts will lead to outstanding achievements.
  • EP10: Take pride in your work and give it your best. Your commitment to excellence sets you apart.
  • EP11: Remember that progress is made one step at a time. Stay determined and keep moving forward​​.
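
The eleven stimuli above can be kept in a small lookup table and appended to any base prompt. The sketch below does exactly that; the `apply_stimulus` helper is a hypothetical convenience function, not part of the paper, and EP06 is built as the compound of EP01, EP02, and EP03 as the list describes.

```python
# The eleven emotional stimuli from the paper, keyed by their labels.
EMOTION_PROMPTS = {
    "EP01": "Write your answer and give me a confidence score between 0-1 for your answer.",
    "EP02": "This is very important to my career.",
    "EP03": "You'd better be sure.",
    "EP04": "Are you sure?",
    "EP05": "Are you sure that's your final answer? It might be worth taking another look.",
    "EP07": "Are you sure that's your final answer? Believe in your abilities and strive for excellence. Your hard work will yield remarkable results.",
    "EP08": "Embrace challenges as opportunities for growth. Each obstacle you overcome brings you closer to success.",
    "EP09": "Stay focused and dedicated to your goals. Your consistent efforts will lead to outstanding achievements.",
    "EP10": "Take pride in your work and give it your best. Your commitment to excellence sets you apart.",
    "EP11": "Remember that progress is made one step at a time. Stay determined and keep moving forward.",
}
# EP06 is defined as the compound of EP01, EP02, and EP03.
EMOTION_PROMPTS["EP06"] = " ".join(EMOTION_PROMPTS[k] for k in ("EP01", "EP02", "EP03"))

def apply_stimulus(prompt: str, ep: str) -> str:
    """Append the chosen emotional stimulus to a base task prompt."""
    return f"{prompt}\n\n{EMOTION_PROMPTS[ep]}"

base = "Determine whether an input word has the same meaning in the two input sentences."
print(apply_stimulus(base, "EP02"))
```

For example, `apply_stimulus(base, "EP02")` reproduces the enhanced prompt shown in the earlier example.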

The performance of the output from all six LLMs improved by using the EmotionPrompt technique.

EmotionPrompt improved performance by 8.0% in the Instruction Induction benchmark and 115% in the BIG-Bench benchmark. These results indicate that EmotionPrompt is an effective way of improving performance in many general use-cases.

Few-shot vs. zero-shot learning

EmotionPrompt showed more improvement in few-shot settings (2.05 average improvement) compared to zero-shot settings (0.33 average improvement).

Few-shot learning refers to a scenario where the model learns from a very small amount of data or examples, for instance by providing the model with a handful of input/output examples in the prompt itself:

Example 1 (Positive Review): "The movie was breathtaking with stunning visuals and a captivating plot. I loved it!"
Example 2 (Negative Review): "It was a disappointment. The story was predictable and the acting mediocre."
Example 3 (Positive Review): "Absolutely fantastic! Great characters and a story that kept me on the edge of my seat."
Example 4 (Negative Review): "I was bored throughout the whole movie. The pacing was slow and uninteresting."

Review: The story was lacking depth.

Question: Is this a positive or negative review?
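
Assembling a few-shot prompt like the one above is mechanical enough to script. The helper below is a hypothetical sketch: it takes labeled examples, the review to classify, and produces the prompt text in the same layout.

```python
def build_few_shot_prompt(examples, review):
    """Assemble a few-shot classification prompt: labeled examples first,
    then the review to classify, then the question (illustrative helper)."""
    lines = [f'Example {i} ({label} Review): "{text}"'
             for i, (label, text) in enumerate(examples, start=1)]
    lines.append(f"Review: {review}")
    lines.append("Question: Is this a positive or negative review?")
    return "\n".join(lines)

examples = [
    ("Positive", "The movie was breathtaking with stunning visuals and a captivating plot. I loved it!"),
    ("Negative", "It was a disappointment. The story was predictable and the acting mediocre."),
]
print(build_few_shot_prompt(examples, "The story was lacking depth."))
```

The same structure extends to any classification task: swap the example pairs and the closing question for your own labels.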

Zero-shot learning refers to a scenario where the model has zero examples to learn from.

Few-shot learning can be effectively applied in a broad range of tasks where only a few examples are available for the model to learn the new context or task.

Human study

The authors also conducted a human study where they found a 10.9% average improvement in generative tasks.

The scoring criteria used in the paper were based on three metrics:

  • Performance
  • Truthfulness
  • Responsibility

Each of these metrics was rated on a scale from 1 to 5.

EmotionPrompt vs. Other Prompt Engineering Techniques

The authors compared EmotionPrompt to other prompt-engineering techniques such as CoT (Chain of Thought) and APE (Automatic Prompt Engineering), showing that EmotionPrompt outperformed the tested alternatives in most cases.

CoT (Chain of Thought)

CoT is a prompting technique that improves an LLM's performance by asking the model to break the problem down into steps, much as a human engineer would approach a difficult problem.

Original prompt:

Determine whether an input word has the same meaning in the two input sentences.

Enhanced prompt (CoT):

Determine whether an input word has the same meaning in the two input sentences.

Let’s think step by step

Automatic Prompt Engineering (APE)

In the paper Large Language Models Are Human-Level Prompt Engineers the authors describe a prompt-engineering technique for improving the output of LLMs.

The APE technique can be used to generate a prompt that performs better than the human-written prompt.

In the paper, this technique outperformed humans on 21/24 tasks.

The technique consists of these steps:

  • Writing the initial prompt
  • Generation of additional prompts using the initial prompt and an LLM
  • Evaluation of the generated and initial prompt
  • Selection of the best-performing prompt
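
The generate-evaluate-select loop above can be sketched in a few lines. The two callbacks are hypothetical hooks, not APIs from the paper: `generate_fn(prompt, n)` would ask an LLM for n candidate rewrites of the initial prompt, and `eval_fn(prompt)` would score a candidate, for example by measuring task accuracy on a held-out set.

```python
def ape_select(initial_prompt, generate_fn, eval_fn, n_candidates=8):
    """Sketch of the APE loop: generate candidate prompts from an initial one,
    score every candidate (including the original), and keep the best."""
    candidates = [initial_prompt] + generate_fn(initial_prompt, n_candidates)
    return max(candidates, key=eval_fn)
```

In a real setup, `eval_fn` is the expensive part, since each call means running the LLM over an evaluation set with that candidate prompt.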

The automatic_prompt_engineer project on GitHub shows how to implement this technique.

Conclusion

Improving the performance of large language models by using natural language is possible. This intuitively makes sense because the output of a large language model is a reflection of the text it was trained on.

From a statistical viewpoint, it's safe to say that EmotionPrompt is a general way of improving performance on most problems. However, it's not universally applicable across all scenarios.

Some things to consider when using the EmotionPrompt technique include the following:

Context

Emotional stimuli are highly context-specific and must be tailored to the context of the prompt; a stimulus that works in one context might not work in a different setting.

Prompt evaluation and selection

The best-performing prompt needs to be selected from a set of candidates by evaluating each candidate prompt on your specific use-case. However, it's also safe to say that, on average, EmotionPrompt is a good way of optimizing your prompt.

Model size

The larger the model, the more significant the improvements from using EmotionPrompt.

Few-shot prompting

Providing the LLM with examples of the input and desired output in the prompt significantly increases the accuracy of the generated output.

Positive words

Positive words, such as “confidence”, “sure”, “success” and “achievement”, play an important role in improving performance.

In conclusion, EmotionPrompt is a state-of-the-art prompt-engineering technique that can improve performance. However, the performance of a specific prompt should always be evaluated against alternatives.

For more ideas on how to use prompt engineering to improve results, please read the official prompt-engineering guidelines from OpenAI.
