Nikolay Donets

Prompt design for Large Language Models

How to combine several instructions? Divide and conquer?

Intro

A prompt is the input to an LLM. Typically, a prompt contains a set of instructions that tell the LLM how to solve a task. The way the prompt is written has a significant impact on the LLM's output and on the final task results.

Today, creating an LLM-based solution typically follows the Build-Measure-Learn cycle:

  1. Build
  2. Measure
  3. Learn

From a systems design and engineering standpoint, this is not optimal: it mirrors the Specify-Design-Build-Test-Fix paradigm and the Design Process Model documented in the 1960s, an approach described as inconsistent, inefficient, and chaotic1

The process of constructing a solution should follow a structured progression rather than being opportunistic. However, there are no established prompt design workflows, so prompt design typically becomes an iterative process: evaluating different prompts on a set of inputs and systematically assessing them on a larger dataset2.

Traditional Approach and Single-Turn Prompting

Single-turn prompting aims to get a result in a single run of an LLM. The conventional method adopts a three-stage design3. The steps necessary for troubleshooting issues within this process are depicted in the diagrams below. The principal inputs for the process are the “Initial Prompt” and the identified problems that require resolution4.

[Diagram: direct task setting, Input → LLM → Output]
Here is the general flow (a minimal code sketch follows the diagram below):

  1. Start with an initial prompt and issues
  2. Select the most problematic issue from the identified issues
  3. Revise the prompt
  4. Check whether the revised prompt addresses the original issue
  5. Check whether the revised prompt addresses all the issues in the current context
  6. Confirm that the revised prompt has not introduced any new issues in other contexts
  7. If any of the checks fails, revise the prompt again. If all checks pass, the new finalized prompt is the output
  8. The output, therefore, is the “New Prompt”, which has addressed the most problematic issue without causing further concerns in the current and other contexts
[Diagram: Prompt design flow. Inputs: the initial prompt and the identified issues. Issue selection picks the most problematic issue, prompt revision produces a candidate prompt, and three checks follow: does the prompt address the original issue in the context? does it address all issues across the context? does it break anything in other contexts? Failed checks loop back to prompt revision; once all checks pass, the result is the new prompt.]
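
A minimal sketch of this loop in Python, where every helper (the prompt revision and the three checks) is a hypothetical placeholder standing in for a human reviewer or an LLM-backed evaluation on test inputs:

```python
# Sketch of the revision loop above. Every helper is a hypothetical placeholder
# standing in for a human reviewer or an LLM-backed check on evaluation inputs.

def revise(prompt: str, issue: str) -> str:
    raise NotImplementedError  # human or LLM-assisted rewrite of the prompt

def fixes_issue(prompt: str, issue: str, context) -> bool:
    raise NotImplementedError  # check 4: is the original issue resolved in this context?

def fixes_all_issues(prompt: str, issues, context) -> bool:
    raise NotImplementedError  # check 5: are all known issues resolved in this context?

def breaks_other_contexts(prompt: str, other_contexts) -> bool:
    raise NotImplementedError  # check 6: did the revision cause regressions elsewhere?

def revision_loop(initial_prompt, issues, context, other_contexts, max_rounds=10):
    prompt = initial_prompt
    issue = issues[0]  # step 2: issues are assumed ordered by severity
    for _ in range(max_rounds):
        prompt = revise(prompt, issue)  # step 3
        if (fixes_issue(prompt, issue, context)
                and fixes_all_issues(prompt, issues, context)
                and not breaks_other_contexts(prompt, other_contexts)):
            return prompt  # steps 7-8: all checks pass, this is the "New Prompt"
    raise RuntimeError("no acceptable revision within the round budget")
```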

The traditional approach has the following problems:

  • Inevitable Breakdowns. Complex LLM-based systems will inevitably experience breakdowns as a result of combining instructions, affecting one or more parts
  • Decreasing Returns on Efforts. As we try to improve complex LLM-based systems, it becomes harder to add new instructions without causing errors due to their interdependencies, similar to the concept of diminishing returns
  • Continuous Improvement Journey. The process of improvement in complex LLM-based systems is a never-ending journey, as there is no ultimate endpoint. Entropy plays a role, making it an ongoing challenge to manage and optimize these systems

Findings

  • Impact of Minor Problems. Even small issues in the queries can have significant negative consequences for the conversation. These problems can create a downward spiral, leading to less effective interactions
  • Effectiveness of Explicit Instructions. Providing clear and specific instructions yields better results and higher reliability compared to vague or ambiguous ones
  • Balancing Spontaneity and Control. When designing prompts, there is a tradeoff between allowing spontaneous interactions without strict controls and constraining behavior with prompts. Deciding the level of risk appetite is essential in this process
  • Suitability of Prompt Design. While prompt design is useful for risk-tolerant domains, it may not be fully ready for high-stakes domains, where more precision and reliability are required
  • Avoiding Over-Restriction. Attempting to eliminate all issues through prompt design alone may have unintended consequences. It could turn the solution into a rigid system with limited capabilities, resembling a simple rule tree and a collection of predefined actions
  • Allowing Dynamics. Instead of aiming to restrict a language model, LLM-based solutions should allow more natural and free-flowing interactions. One way to achieve this is by introducing dynamic prompt changes and allowing branching of interactions into different LLM-based subsystems. The focus should be on preventing critical failures while embracing a degree of “controlled chaos” to maintain flexibility

Hints

To improve prompts, consider the following guidelines (a small illustration follows the list):

  1. Include examples of desired interactions in your prompts
  2. Craft prompts that resemble code snippets to make them more familiar to users
  3. Repetition of important instructions can reinforce understanding
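
As an illustration of these three hints, here is a sketch of a prompt template that embeds example interactions, is structured like a code snippet, and repeats the key instruction near the end. The task, labels, and wording are invented for this example:

```python
# Illustrative prompt template combining the three hints: few-shot examples,
# a code-like structure, and repetition of the key instruction.
# The task, labels, and wording are invented for this example.
PROMPT_TEMPLATE = """\
### Task
Classify the sentiment of the customer message as POSITIVE, NEGATIVE, or NEUTRAL.
Reply with the label only.

### Examples
message: "The delivery was fast and the package was intact."
label: POSITIVE

message: "I waited two weeks and nobody answered my emails."
label: NEGATIVE

### Message
message: "{message}"
Reminder: reply with POSITIVE, NEGATIVE, or NEUTRAL only.
label:
"""

def build_prompt(message: str) -> str:
    return PROMPT_TEMPLATE.format(message=message)

print(build_prompt("The app crashes every time I open it."))
```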

Ensure that task requirements are clearly stated, leaving no room for ambiguity. The task specification should be explicit and not implied.

Plan the reuse of instructions thoughtfully to enhance efficiency.

When it comes to testing and verification, proceed cautiously and systematically, avoiding opportunistic and overconfident behaviour.

For debugging, adopt a systematic approach that targets identified, solvable underlying problems.

Measure the success of the solution based on multiple metrics to get a comprehensive evaluation.
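
For instance, here is a minimal sketch of measuring one prompt template against a small labelled test set with more than one metric. The metrics and the `call_llm` stub are assumptions, not a particular evaluation framework:

```python
# Sketch: score one prompt template on a small labelled test set with two
# metrics (exact match and a crude length budget). `call_llm` is a placeholder.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in a real model client here

def evaluate(prompt_template: str, test_set: list[tuple[str, str]]) -> dict:
    exact, within_budget = 0, 0
    for inp, expected in test_set:
        output = call_llm(prompt_template.format(input=inp)).strip()
        exact += output == expected                  # metric 1: exact match
        within_budget += len(output.split()) <= 50   # metric 2: verbosity check
    n = len(test_set)
    return {"exact_match": exact / n, "within_length_budget": within_budget / n}
```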

Single-turn methods

A summary of single-turn methods from 5 (a prompt-construction sketch follows the list):

  • In-Context Learning. Problem-solving by LLM inference only.
  • Instruction-Following. Instructions describing the task are given at the beginning of the prompt. By providing specific instructions up front, the LLM is directed to perform a particular task or produce the desired outputs.
  • Chain-of-Thought6 7. A technique for constructing short prompts by breaking down the thought process into a series of intermediate steps. These steps lead to the final output, helping the LLM to perform complex tasks more effectively with limited training examples.
  • Impersonation8. The prompt instructs the LLM to impersonate a domain expert when answering a domain-specific question. By assuming the role of an expert, the model can generate answers that appear more knowledgeable and relevant, even if it does not have in-depth knowledge of the subject.
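
A sketch of how these single-turn variants differ only in how the prompt string is assembled; the template wording is an assumption made for illustration:

```python
# Three single-turn prompt variants built from the same question.
# Only the prompt text changes between them; template wording is illustrative.

question = "A train travels 120 km in 1.5 hours. What is its average speed?"

instruction_prompt = (
    "Answer the question with a single number and a unit.\n"
    f"Question: {question}\nAnswer:"
)

chain_of_thought_prompt = (
    f"Question: {question}\n"
    "Let's think step by step, then state the final answer."
)

impersonation_prompt = (
    "You are an experienced physics teacher.\n"
    f"Explain to a student: {question}"
)
```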

Chaining and Multi-Turn Prompting

Chaining is a way of using LLMs to solve complex tasks by breaking them down into smaller subtasks. The output of one subtask is used as the input for the next subtask, and so on. This allows LLMs to leverage their strengths in handling a variety of tasks to solve complex problems that would be difficult or impossible to solve in a single run9.

Chains offer new ways to interact with LLMs:

  1. Subtask calibration. Chaining allows users to calibrate the expectations of the LLM by breaking down a complex task into smaller, more manageable subtasks. This helps the LLM to better understand the task and generate more accurate results
  2. Parallel downstream effects. Chaining allows users to compare and contrast different strategies for solving a task by observing the parallel downstream effects of each strategy. This can help to identify the best strategy for a particular task
  3. Unit testing. Chaining allows users to “unit test” subcomponents of a Chain to debug unexpected model outputs. This can help users to identify and fix problems with their Chains more quickly (see the testing sketch after this list)
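
To illustrate the unit-testing idea, here is a sketch of testing one chain step in isolation against a stubbed model. The extraction step, its prompt, and the expected behaviour are invented for the example:

```python
# Sketch: unit test for a single chain step (an extraction step) with a
# stubbed model, so a failure points at this step only. The step is invented.
import unittest

def extract_order_id(llm, message: str) -> str:
    prompt = f"Extract the order id from the message. Reply with the id only.\n{message}"
    return llm(prompt).strip()

class TestExtractionStep(unittest.TestCase):
    def test_returns_bare_id(self):
        fake_llm = lambda prompt: "  A-1042\n"  # canned model output
        self.assertEqual(extract_order_id(fake_llm, "Order A-1042 is late"), "A-1042")

if __name__ == "__main__":
    unittest.main()
```

Passing the model in as a callable keeps the step easy to stub and test on its own.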

The process of Chaining includes the following steps (a code sketch follows the diagram below):

  1. Splitting. Identification of the subtasks that need to be solved to complete the overall task
  2. Ideation. Generation of suggestions for how to solve each subtask
  3. Composition. Combination of the suggestions from the ideation step to create a solution to the overall task
[Diagram: Chaining flow, Input → Splitting (distinct subtasks) → Ideation (suggestions for subtasks) → Composition (combination of the suggestions into a final output) → Output]
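
A minimal sketch of this splitting, ideation, and composition flow, with each stage as a separate LLM call. The prompts and the `call_llm` stub are assumptions; intermediate results are kept so they can be inspected or corrected between stages:

```python
# Sketch of a three-stage chain: split -> ideate -> compose.
# `call_llm` is a placeholder; each stage's output is kept for inspection.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def run_chain(task: str) -> dict:
    subtasks_text = call_llm(
        f"Break the task into numbered subtasks, one per line.\nTask: {task}"
    )
    subtasks = [line for line in subtasks_text.splitlines() if line.strip()]

    ideas = [
        call_llm(f"Suggest a short solution for this subtask:\n{subtask}")
        for subtask in subtasks
    ]

    composition = call_llm(
        "Combine the following partial solutions into one coherent answer:\n"
        + "\n".join(ideas)
    )
    # Returning intermediates makes each stage inspectable and correctable.
    return {"subtasks": subtasks, "ideas": ideas, "output": composition}
```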

The research9 suggests that we can make it easier to understand and fix problems with black-box LLMs by breaking down complex problems into smaller ones. The model would then solve each smaller problem independently, and show the intermediate results to users, so they can see how the model is working and make changes and corrections if necessary.

To begin implementing chains, the first step is to identify “Primitive tasks”, which are the fundamental operations for constructing chains. As mentioned, these primitive tasks are9:

  1. Classification. Useful for branching and validation
  2. Factual query. Answering a question about a given piece of information
  3. Generation. Creating a new piece of text from a given input
  4. Ideation. Providing a list of ideas for a given input
  5. Information extraction. Retrieving relevant information from the input
  6. Rewriting. 1-1 mapping to modify the input
  7. Split points. 1-N mapping operation used to split inputs into multiple parts
  8. Compose points. N-1 mapping operation that merges multiple items into one; the reverse of split points (item 7)

Using these primitives, complex tasks can be broken down into blocks, making it easier to assess each step, for example, using the Likert Scale. This approach enables the quantification of LLM behaviour and, ultimately, the assessment of task outcomes. Each primitive can have its own guardrails and detailed use cases, promoting a high level of reuse.

In a testing environment, when a human evaluates the LLM output, chains provide the ability to isolate interventions and preserve progress for further analysis. This allows addressing specific subtasks individually, unlike the traditional approach where resolving an issue might involve changing the prompt, leading to unpredictable consequences. With chains, interventions can be precisely targeted, streamlining the debugging process and enhancing the overall efficiency and performance.

The use of scoped objectives for subtasks can help prevent any unintended tangents and provide clear boundaries for each task. These scoped objectives can also have their own guardrails to help define the limits and constraints within which the LLM operates. By setting up guardrails, the LLM's ability for free exploration is limited, and its outputs can be constrained to conform to specific forms or guidelines.
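
One common way to implement such a guardrail is to constrain the output to a fixed set of forms and retry or fall back when the model does not conform. A sketch, where the allowed labels, the retry policy, and the `call_llm` stub are all assumptions:

```python
# Sketch of an output guardrail: accept only one of a fixed set of labels,
# retry a bounded number of times otherwise. Labels and policy are illustrative.

ALLOWED_LABELS = {"APPROVE", "REJECT", "ESCALATE"}

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def guarded_decision(prompt: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        answer = call_llm(prompt).strip().upper()
        if answer in ALLOWED_LABELS:
            return answer
        prompt += "\nAnswer with exactly one of: APPROVE, REJECT, ESCALATE."
    return "ESCALATE"  # safe fallback when the model never conforms
```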

Findings

While chaining is a valuable technique for tackling tasks that cannot be accomplished in one run of an LLM, there are several drawbacks to consider:

  1. Complex management. Chaining involves more complex management compared to using a single prompt and run. As the number of primitives and branching scenarios increases, the complexity of managing the system grows. This complexity can be similar to managing a rule-based system, leading to potential negative outcomes and challenges

  2. Additional Overhead and Latency. Each primitive introduced in the flow requires an additional call to the LLM. This introduces overhead in terms of computational resources and can result in a less smooth user experience due to increased latency caused by multiple requests

  3. Risk of Infinite Loops. In complex tasks, there is a risk of encountering cycles that could break the entire flow and lead to an infinite loop. Dealing with such cases requires oversight and handling, adding to the complexity of the system and increasing maintenance costs
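
A common mitigation for the loop risk is a hard step budget on the whole chain. A sketch, where the step limit and the `next_step` contract are assumptions:

```python
# Sketch: bound the number of chain steps so a cyclic flow cannot run forever.
# `next_step` is a placeholder returning the next intermediate result, or None when done.
from typing import Optional

def next_step(state: dict) -> Optional[str]:
    raise NotImplementedError

def run_with_budget(state: dict, max_steps: int = 20) -> dict:
    for _ in range(max_steps):
        result = next_step(state)
        if result is None:  # chain finished normally
            return state
        state.setdefault("history", []).append(result)
    raise RuntimeError("step budget exhausted; possible cycle in the chain")
```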

While Chaining has its advantages, it is essential to carefully weigh these drawbacks against the benefits to determine whether it is the right approach for a particular task or scenario.

Multi-turn methods

A summary of multi-turn methods from 5:

  • Ask Me Anything10. Uses multiple prompt templates to convert a few samples into an open-ended question-answer format. The final output is obtained by aggregating the LLM predictions
  • Self-consistency11. Extends Chain-of-Thought prompting by sampling multiple reasoning paths and selecting the most consistent answer among them (a minimal sketch follows this list)
  • Least-to-Most12. Uses a set of constant prompts to decompose a complex problem into a series of sub-problems. The LLM solves the sub-problems sequentially, with later sub-problems containing previously generated solutions, iteratively building the final output
  • Scratchpad13. A method for fine-tuning LLMs on multistep computational tasks so that they output intermediate reasoning steps, e.g., intermediate calculations when performing addition, into a “scratchpad” before generating the final result
  • ReAct14. Combines reasoning and action by prompting LLMs to generate reasoning traces and action plans. These action plans can be executed to allow the model to interact with external environments
  • Automatic Reasoning and Tool-Use15. A method for automatically generating multistep reasoning prompts that include symbolic calls to external tools such as search and code generation or execution. Demonstrations of related tasks, together with their reasoning steps, are retrieved from a library, and a frozen language model generates the intermediate reasoning steps
  • Self-refine16. Iterative refinement, improving an initial solution over several steps
  • Tree of Thoughts17. Generalizes the Chain-of-Thought approach to maintain a tree of thoughts with multiple different paths. Each thought is a language sequence serving as an intermediate step. This allows the LLM to self-evaluate the progress of intermediate thoughts in solving the problem and incorporate search algorithms, such as breadth-first or depth-first search, for systematic exploration of the tree with lookahead and backtracking
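
As an example of these methods, here is a sketch of Self-consistency as a majority vote over several sampled reasoning paths. The `sample_llm` stub and the answer-extraction convention are assumptions:

```python
# Sketch of Self-consistency: sample several chain-of-thought completions
# and return the most frequent final answer. `sample_llm` is a placeholder.
from collections import Counter

def sample_llm(prompt: str, temperature: float = 0.8) -> str:
    raise NotImplementedError  # placeholder for a sampling-enabled model call

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    prompt = (
        f"Question: {question}\n"
        "Let's think step by step and finish with 'Answer: <value>'."
    )
    answers = []
    for _ in range(n_samples):
        completion = sample_llm(prompt)
        if "Answer:" in completion:
            answers.append(completion.rsplit("Answer:", 1)[1].strip())
    if not answers:
        raise ValueError("no parsable answers sampled")
    return Counter(answers).most_common(1)[0][0]  # majority vote
```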

Conclusion

In this review, two approaches were discussed: Traditional and Chaining. Each has its own advantages and disadvantages:

  • Traditional Approach and Single-Turn Prompting
    • Simpler and easier to start with
    • It can become inadequate for high-stakes domains due to the risk of uncontrolled behaviour
    • Hard to tune as complexity increases
    • It requires fewer resources from the LLM side as it involves only one run, resulting in better latency
  • Chaining Approach and Multi-Turn Prompting
    • Introduces overhead from the beginning with the need to create and develop primitive blocks separately before combining them to solve a task
    • More suitable for high-stakes domains due to its transparent behaviour that can be broken down into manageable steps for further review
    • Easier to tune as the complexity of problems grows
    • Requires more resources from the LLM side due to multiple calls, resulting in higher latency

Considering these points, the best approach could be to start with the Traditional Approach and then switch to Chaining as the project grows and requires more sophisticated and controlled behaviour.

The choice between these approaches should be based on the specific requirements, complexity, and goals of the task at hand, ensuring that the chosen approach aligns with the desired outcomes and constraints.

Footnotes

  1. Wasson, C.S. “System Engineering Analysis, Design, and Development: Concepts, Principles, and Practices.” (2015)

  2. Zamfirescu-Pereira, J.D. et al. “Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts.” Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (2023): n. pag.

  3. Wei, Jason et al. “Chain of Thought Prompting Elicits Reasoning in Large Language Models.” ArXiv abs/2201.11903 (2022): n. pag.

  4. Zamfirescu-Pereira, J.D. et al. “Herding AI Cats: Lessons from Designing a Chatbot by Prompting GPT-3.” Proceedings of the 2023 ACM Designing Interactive Systems Conference (2023): n. pag.

  5. Kaddour, Jean et al. “Challenges and Applications of Large Language Models.” (2023).

  6. Ling, Wang et al. “Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems.” Annual Meeting of the Association for Computational Linguistics (2017): n. pag.

  7. Wei, Jason et al. “Chain of Thought Prompting Elicits Reasoning in Large Language Models.” ArXiv abs/2201.11903 (2022): n. pag.

  8. Salewski, Leonard et al. “In-Context Impersonation Reveals Large Language Models’ Strengths and Biases.” ArXiv abs/2305.14930 (2023): n. pag.

  9. Wu, Tongshuang Sherry et al. “AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts.” Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (2021): n. pag.

  10. Arora, Simran et al. “Ask Me Anything: A simple strategy for prompting language models.” ArXiv abs/2210.02441 (2022): n. pag.

  11. Wang, Xuezhi et al. “Self-Consistency Improves Chain of Thought Reasoning in Language Models.” ArXiv abs/2203.11171 (2022): n. pag.

  12. Zhou, Denny et al. “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.” ArXiv abs/2205.10625 (2022): n. pag.

  13. Nye, Maxwell et al. “Show Your Work: Scratchpads for Intermediate Computation with Language Models.” ArXiv abs/2112.00114 (2021): n. pag.

  14. Yao, Shunyu et al. “ReAct: Synergizing Reasoning and Acting in Language Models.” ArXiv abs/2210.03629 (2022): n. pag.

  15. Paranjape, Bhargavi et al. “ART: Automatic multi-step reasoning and tool-use for large language models.” ArXiv abs/2303.09014 (2023): n. pag.

  16. Madaan, Aman et al. “Self-Refine: Iterative Refinement with Self-Feedback.” ArXiv abs/2303.17651 (2023): n. pag.

  17. Yao, Shunyu et al. “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.” ArXiv abs/2305.10601 (2023): n. pag.