Nikolay Donets

Prompt design for Large Language Models

How to combine several instructions? Divide and conquer?

Intro

A prompt is the input to an LLM. Typically, a prompt contains a set of instructions that tell the LLM how to solve a task. The way the prompt is written has a significant impact on the LLM's output and on the final task results.

Today, creating an LLM-based solution typically follows the Build-Measure-Learn cycle:

  1. Build
  2. Measure
  3. Learn

From a systems design and engineering standpoint, this is not optimal: it mirrors the Specify-Design-Build-Test-Fix paradigm and the Design Process Model documented in the 1960s, an approach described as inconsistent, inefficient, and chaotic1

The process of constructing a solution should follow a structured progression rather than being opportunistic. However, there are no established prompt design workflows, so prompt design typically becomes an iterative process: evaluating different prompts on a set of inputs and systematically assessing them on a larger dataset2.

Traditional Approach and Single-Turn Prompting

Single-turn prompting aims to get a result in a single run of an LLM. The conventional method adopts a three-stage design3. The steps necessary for troubleshooting issues within this process are depicted in the diagrams below. The principal inputs for the process are the “Initial Prompt” and the identified problems that require resolution4.

[Diagram: direct task setting, Input → LLM → Output]
Here is the general flow (a minimal code sketch follows the diagram below):

  1. Start with an initial prompt and issues
  2. Select the most problematic issue from the identified issues
  3. Revise the prompt
  4. Check whether the revised prompt addresses the original issue
  5. Check whether the revised prompt addresses all the issues in the current context
  6. Confirm that the revised prompt has not introduced any new issues in other contexts
  7. If any of the checks fails, revise the prompt again. If all checks pass, the new finalized prompt is the output
  8. The output, therefore, is the “New Prompt”, which has addressed the most problematic issue without causing further concerns in the current and other contexts
[Diagram: Prompt design flow. Inputs: the initial prompt and the identified issues. Issue selection picks the most problematic issue, prompt revision produces a candidate prompt, and three checks follow: does the prompt address the original issue in the context? does it address all issues across the context? does it break anything in other contexts? Failed checks loop back to prompt revision; once all checks pass, the result is the new prompt.]
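
A minimal sketch of this loop in Python, where every helper (the prompt revision and the three checks) is a hypothetical placeholder standing in for a human reviewer or an LLM-backed evaluation on test inputs:

```python
# Sketch of the revision loop above. Every helper is a hypothetical placeholder
# standing in for a human reviewer or an LLM-backed check on evaluation inputs.

def revise(prompt: str, issue: str) -> str:
    raise NotImplementedError  # human or LLM-assisted rewrite of the prompt

def fixes_issue(prompt: str, issue: str, context) -> bool:
    raise NotImplementedError  # check 4: is the original issue resolved in this context?

def fixes_all_issues(prompt: str, issues, context) -> bool:
    raise NotImplementedError  # check 5: are all known issues resolved in this context?

def breaks_other_contexts(prompt: str, other_contexts) -> bool:
    raise NotImplementedError  # check 6: did the revision cause regressions elsewhere?

def revision_loop(initial_prompt, issues, context, other_contexts, max_rounds=10):
    prompt = initial_prompt
    issue = issues[0]  # step 2: issues are assumed ordered by severity
    for _ in range(max_rounds):
        prompt = revise(prompt, issue)  # step 3
        if (fixes_issue(prompt, issue, context)
                and fixes_all_issues(prompt, issues, context)
                and not breaks_other_contexts(prompt, other_contexts)):
            return prompt  # steps 7-8: all checks pass, this is the "New Prompt"
    raise RuntimeError("no acceptable revision within the round budget")
```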

The traditional approach has the following problems:

  • Inevitable Breakdowns. Complex LLM-based systems will inevitably experience breakdowns as a result of combining instructions, affecting one or more parts
  • Decreasing Returns on Efforts. As we try to improve complex LLM-based systems, it becomes harder to add new instructions without causing errors due to their interdependencies, similar to the concept of diminishing returns
  • Continuous Improvement Journey. The process of improvement in complex LLM-based systems is a never-ending journey, as there is no ultimate endpoint. Entropy plays a role, making it an ongoing challenge to manage and optimize these systems

Findings

  • Impact of Minor Problems. Even small issues in the queries can have significant negative consequences for the conversation. These problems can create a downward spiral, leading to less effective interactions
  • Effectiveness of Explicit Instructions. Providing clear and specific instructions yields better results and higher reliability compared to vague or ambiguous ones
  • Balancing Spontaneity and Control. When designing prompts, there is a tradeoff between allowing spontaneous interactions without strict controls and constraining behavior with prompts. Deciding the level of risk appetite is essential in this process
  • Suitability of Prompt Design. While prompt design is useful for risk-tolerant domains, it may not be fully ready for high-stakes domains, where more precision and reliability are required
  • Avoiding Over-Restriction. Attempting to eliminate all issues through prompt design alone may have unintended consequences. It could turn the solution into a rigid system with limited capabilities, resembling a simple rule tree and a collection of predefined actions
  • Allowing Dynamics. Instead of aiming to restrict a language model, LLM-based solutions should allow more natural and free-flowing interactions. One way to achieve this is by introducing dynamic prompt changes and allowing branching of interactions into different LLM-based subsystems. The focus should be on preventing critical failures while embracing a degree of “controlled chaos” to maintain flexibility

Hints

To improve prompts, consider the following guidelines (a small illustration follows the list):

  1. Include examples of desired interactions in your prompts
  2. Craft prompts that resemble code snippets to make them more familiar to users
  3. Repetition of important instructions can reinforce understanding
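
As an illustration of these three hints, here is a sketch of a prompt template that embeds example interactions, is structured like a code snippet, and repeats the key instruction near the end. The task, labels, and wording are invented for this example:

```python
# Illustrative prompt template combining the three hints: few-shot examples,
# a code-like structure, and repetition of the key instruction.
# The task, labels, and wording are invented for this example.
PROMPT_TEMPLATE = """\
### Task
Classify the sentiment of the customer message as POSITIVE, NEGATIVE, or NEUTRAL.
Reply with the label only.

### Examples
message: "The delivery was fast and the package was intact."
label: POSITIVE

message: "I waited two weeks and nobody answered my emails."
label: NEGATIVE

### Message
message: "{message}"
Reminder: reply with POSITIVE, NEGATIVE, or NEUTRAL only.
label:
"""

def build_prompt(message: str) -> str:
    return PROMPT_TEMPLATE.format(message=message)

print(build_prompt("The app crashes every time I open it."))
```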

Ensure that task requirements are clearly stated, leaving no room for ambiguity. The task specification should be explicit and not implied.

Plan the reuse of instructions thoughtfully to enhance efficiency.

When it comes to testing and verification, proceed cautiously and systematically, avoiding opportunistic and overconfident behaviour.

For debugging, adopt a systematic approach that targets identified, solvable underlying problems.

Measure the success of the solution based on multiple metrics to get a comprehensive evaluation.
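
For instance, here is a minimal sketch of measuring one prompt template against a small labelled test set with more than one metric. The metrics and the `call_llm` stub are assumptions, not a particular evaluation framework:

```python
# Sketch: score one prompt template on a small labelled test set with two
# metrics (exact match and a crude length budget). `call_llm` is a placeholder.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in a real model client here

def evaluate(prompt_template: str, test_set: list[tuple[str, str]]) -> dict:
    exact, within_budget = 0, 0
    for inp, expected in test_set:
        output = call_llm(prompt_template.format(input=inp)).strip()
        exact += output == expected                  # metric 1: exact match
        within_budget += len(output.split()) <= 50   # metric 2: verbosity check
    n = len(test_set)
    return {"exact_match": exact / n, "within_length_budget": within_budget / n}
```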

Single-turn methods

A summary of single-turn methods from 5 (a prompt-construction sketch follows the list):

  • In-Context Learning. Problem-solving by LLM inference only.
  • Instruction-Following. Instructions describing the task are given at the beginning of the prompt. By providing specific instructions up front, the LLM is directed to perform a particular task or produce the desired outputs.
  • Chain-of-Thought6 7. A technique for constructing short prompts by breaking down the thought process into a series of intermediate steps. These steps lead to the final output, helping the LLM to perform complex tasks more effectively with limited training examples.
  • Impersonation8. The prompt instructs the LLM to impersonate a domain expert when answering a domain-specific question. By assuming the role of an expert, the model can generate answers that appear more knowledgeable and relevant, even if it does not have in-depth knowledge of the subject.
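
A sketch of how these single-turn variants differ only in how the prompt string is assembled; the template wording is an assumption made for illustration:

```python
# Three single-turn prompt variants built from the same question.
# Only the prompt text changes between them; template wording is illustrative.

question = "A train travels 120 km in 1.5 hours. What is its average speed?"

instruction_prompt = (
    "Answer the question with a single number and a unit.\n"
    f"Question: {question}\nAnswer:"
)

chain_of_thought_prompt = (
    f"Question: {question}\n"
    "Let's think step by step, then state the final answer."
)

impersonation_prompt = (
    "You are an experienced physics teacher.\n"
    f"Explain to a student: {question}"
)
```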

Chaining and Multi-Turn Prompting

Chaining is a way of using LLMs to solve complex tasks by breaking them down into smaller subtasks. The output of one subtask is used as the input for the next subtask, and so on. This allows LLMs to leverage their strengths in handling a variety of tasks to solve complex problems that would be difficult or impossible to solve in a single run9.

Chains offer new ways to interact with LLMs:

  1. Subtask calibration. Chaining allows users to calibrate the expectations of the LLM by breaking down a complex task into smaller, more manageable subtasks. This helps the LLM to better understand the task and generate more accurate results
  2. Parallel downstream effects. Chaining allows users to compare and contrast different strategies for solving a task by observing the parallel downstream effects of each strategy. This can help to identify the best strategy for a particular task
  3. Unit testing. Chaining allows users to “unit test” subcomponents of a Chain to debug unexpected model outputs. This can help users to identify and fix problems with their Chains more quickly (see the testing sketch after this list)
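
To illustrate the unit-testing idea, here is a sketch of testing one chain step in isolation against a stubbed model. The extraction step, its prompt, and the expected behaviour are invented for the example:

```python
# Sketch: unit test for a single chain step (an extraction step) with a
# stubbed model, so a failure points at this step only. The step is invented.
import unittest

def extract_order_id(llm, message: str) -> str:
    prompt = f"Extract the order id from the message. Reply with the id only.\n{message}"
    return llm(prompt).strip()

class TestExtractionStep(unittest.TestCase):
    def test_returns_bare_id(self):
        fake_llm = lambda prompt: "  A-1042\n"  # canned model output
        self.assertEqual(extract_order_id(fake_llm, "Order A-1042 is late"), "A-1042")

if __name__ == "__main__":
    unittest.main()
```

Passing the model in as a callable keeps the step easy to stub and test on its own.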

The process of Chaining includes the following steps (a code sketch follows the diagram below):

  1. Splitting. Identification of the subtasks that need to be solved to complete the overall task
  2. Ideation. Generation of suggestions for how to solve each subtask
  3. Composition. Combination of the suggestions from the ideation step to create a solution to the overall task
[Diagram: Chaining flow, Input → Splitting (distinct subtasks) → Ideation (suggestions for subtasks) → Composition (combination of the suggestions into a final output) → Output]
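
A minimal sketch of this splitting, ideation, and composition flow, with each stage as a separate LLM call. The prompts and the `call_llm` stub are assumptions; intermediate results are kept so they can be inspected or corrected between stages:

```python
# Sketch of a three-stage chain: split -> ideate -> compose.
# `call_llm` is a placeholder; each stage's output is kept for inspection.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def run_chain(task: str) -> dict:
    subtasks_text = call_llm(
        f"Break the task into numbered subtasks, one per line.\nTask: {task}"
    )
    subtasks = [line for line in subtasks_text.splitlines() if line.strip()]

    ideas = [
        call_llm(f"Suggest a short solution for this subtask:\n{subtask}")
        for subtask in subtasks
    ]

    composition = call_llm(
        "Combine the following partial solutions into one coherent answer:\n"
        + "\n".join(ideas)
    )
    # Returning intermediates makes each stage inspectable and correctable.
    return {"subtasks": subtasks, "ideas": ideas, "output": composition}
```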

The research9 suggests that we can make it easier to understand and fix problems with black-box LLMs by breaking down complex problems into smaller ones. The model would then solve each smaller problem independently, and show the intermediate results to users, so they can see how the model is working and make changes and corrections if necessary.

To begin implementing chains, the first step is to identify “Primitive tasks”, which are the fundamental operations for constructing chains. As mentioned, these primitive tasks are9:

  1. Classification. Useful for branching and validation
  2. Factual query. Answering a question about a given piece of information
  3. Generation. Creating a new piece of text from a given input
  4. Ideation. Providing a list of ideas for a given input
  5. Information extraction. Retrieving relevant information from the input
  6. Rewriting. 1-1 mapping to modify the input
  7. Split points. 1-N mapping operation used to split inputs into multiple parts
  8. Compose points. N-1 mapping operation that merges multiple items into one; the reverse of split points (item 7)

Using these primitives, complex tasks can be broken down into blocks, making it easier to assess each step, for example, using the Likert Scale. This approach enables the quantification of LLM behaviour and, ultimately, the assessment of task outcomes. Each primitive can have its own guardrails and detailed use cases, promoting a high level of reuse.

In a testing environment, when a human evaluates the LLM output, chains provide the ability to isolate interventions and preserve progress for further analysis. This allows addressing specific subtasks individually, unlike the traditional approach where resolving an issue might involve changing the prompt, leading to unpredictable consequences. With chains, interventions can be precisely targeted, streamlining the debugging process and enhancing the overall efficiency and performance.

The use of scoped objectives for subtasks can help prevent any unintended tangents and provide clear boundaries for each task. These scoped objectives can also have their own guardrails to help define the limits and constraints within which the LLM operates. By setting up guardrails, the LLM's ability for free exploration is limited, and its outputs can be constrained to conform to specific forms or guidelines.
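
One common way to implement such a guardrail is to constrain the output to a fixed set of forms and retry or fall back when the model does not conform. A sketch, where the allowed labels, the retry policy, and the `call_llm` stub are all assumptions:

```python
# Sketch of an output guardrail: accept only one of a fixed set of labels,
# retry a bounded number of times otherwise. Labels and policy are illustrative.

ALLOWED_LABELS = {"APPROVE", "REJECT", "ESCALATE"}

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def guarded_decision(prompt: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        answer = call_llm(prompt).strip().upper()
        if answer in ALLOWED_LABELS:
            return answer
        prompt += "\nAnswer with exactly one of: APPROVE, REJECT, ESCALATE."
    return "ESCALATE"  # safe fallback when the model never conforms
```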

Findings

While chaining is a valuable technique for tackling tasks that cannot be accomplished in one run of an LLM, there are several drawbacks to consider:

  1. Complex management. Chaining involves more complex management compared to using a single prompt and run. As the number of primitives and branching scenarios increases, the complexity of managing the system grows. This complexity can be similar to managing a rule-based system, leading to potential negative outcomes and challenges

  2. Additional Overhead and Latency. Each primitive introduced in the flow requires an additional call to the LLM. This introduces overhead in terms of computational resources and can result in a less smooth user experience due to increased latency caused by multiple requests

  3. Risk of Infinite Loops. In complex tasks, there is a risk of encountering cycles that could break the entire flow and lead to an infinite loop. Dealing with such cases requires oversight and handling, adding to the complexity of the system and increasing maintenance costs
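
A common mitigation for the loop risk is a hard step budget on the whole chain. A sketch, where the step limit and the `next_step` contract are assumptions:

```python
# Sketch: bound the number of chain steps so a cyclic flow cannot run forever.
# `next_step` is a placeholder returning the next intermediate result, or None when done.
from typing import Optional

def next_step(state: dict) -> Optional[str]:
    raise NotImplementedError

def run_with_budget(state: dict, max_steps: int = 20) -> dict:
    for _ in range(max_steps):
        result = next_step(state)
        if result is None:  # chain finished normally
            return state
        state.setdefault("history", []).append(result)
    raise RuntimeError("step budget exhausted; possible cycle in the chain")
```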

While Chaining has its advantages, it is essential to carefully weigh these drawbacks against the benefits to determine whether it is the right approach for a particular task or scenario.

Multi-turn methods

A summary of multi-turn methods from 5:

  • Ask Me Anything10. Uses multiple prompt templates to convert a few samples into an open-ended question-answer format. The final output is obtained by aggregating the LLM predictions
  • Self-consistency11. Extends Chain-of-Thought prompting by sampling multiple reasoning paths and selecting the most consistent answer among them (a minimal sketch follows this list)
  • Least-to-Most12. Uses a set of constant prompts to decompose a complex problem into a series of sub-problems. The LLM solves the sub-problems sequentially, with later sub-problems containing previously generated solutions, iteratively building the final output
  • Scratchpad13. A method for fine-tuning LLMs on multistep computational tasks so that they output intermediate reasoning steps, e.g., intermediate calculations when performing addition, into a “scratchpad” before generating the final result
  • ReAct14. Combines reasoning and action by prompting LLMs to generate reasoning traces and action plans. These action plans can be executed to allow the model to interact with external environments
  • Automatic Reasoning and Tool-Use15. A method for automatically generating multistep reasoning prompts that include symbolic calls to external tools such as search and code generation or execution. Demonstrations of related tasks, together with their reasoning steps, are retrieved from a library, and a frozen language model generates the intermediate reasoning steps
  • Self-refine16. Iterative refinement, improving an initial solution over several steps
  • Tree of Thoughts17. Generalizes the Chain-of-Thought approach to maintain a tree of thoughts with multiple different paths. Each thought is a language sequence serving as an intermediate step. This allows the LLM to self-evaluate the progress of intermediate thoughts in solving the problem and incorporate search algorithms, such as breadth-first or depth-first search, for systematic exploration of the tree with lookahead and backtracking
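
As an example of these methods, here is a sketch of Self-consistency as a majority vote over several sampled reasoning paths. The `sample_llm` stub and the answer-extraction convention are assumptions:

```python
# Sketch of Self-consistency: sample several chain-of-thought completions
# and return the most frequent final answer. `sample_llm` is a placeholder.
from collections import Counter

def sample_llm(prompt: str, temperature: float = 0.8) -> str:
    raise NotImplementedError  # placeholder for a sampling-enabled model call

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    prompt = (
        f"Question: {question}\n"
        "Let's think step by step and finish with 'Answer: <value>'."
    )
    answers = []
    for _ in range(n_samples):
        completion = sample_llm(prompt)
        if "Answer:" in completion:
            answers.append(completion.rsplit("Answer:", 1)[1].strip())
    if not answers:
        raise ValueError("no parsable answers sampled")
    return Counter(answers).most_common(1)[0][0]  # majority vote
```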

Conclusion

In this review, two approaches were discussed: Traditional and Chaining. Each has its own advantages and disadvantages:

  • Traditional Approach and Single-Turn Prompting
    • Simpler and easier to start with
    • It can become inadequate for high-stakes domains due to the risk of uncontrolled behaviour
    • Hard to tune as complexity increases
    • It requires fewer resources from the LLM side as it involves only one run, resulting in better latency
  • Chaining Approach and Multi-Turn Prompting
    • Introduces overhead from the beginning with the need to create and develop primitive blocks separately before combining them to solve a task
    • More suitable for high-stakes domains due to its transparent behaviour that can be broken down into manageable steps for further review
    • Easier to tune as the complexity of problems grows
    • Requires more resources from the LLM side due to multiple calls, resulting in higher latency

Considering these points, the best approach could be to start with the Traditional Approach and then switch to Chaining as the project grows and requires more sophisticated and controlled behaviour.

The choice between these approaches should be based on the specific requirements, complexity, and goals of the task at hand, ensuring that the chosen approach aligns with the desired outcomes and constraints.

Footnotes

  1. Wasson, C.S. “System Engineering Analysis, Design, and Development: Concepts, Principles, and Practices.” (2015)

  2. Zamfirescu-Pereira, J.D. et al. “Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts.” Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (2023): n. pag.

  3. Wei, Jason et al. “Chain of Thought Prompting Elicits Reasoning in Large Language Models.” ArXiv abs/2201.11903 (2022): n. pag.

  4. Zamfirescu-Pereira, J.D. et al. “Herding AI Cats: Lessons from Designing a Chatbot by Prompting GPT-3.” Proceedings of the 2023 ACM Designing Interactive Systems Conference (2023): n. pag.

  5. Kaddour, Jean et al. “Challenges and Applications of Large Language Models.” (2023).

  6. Ling, Wang et al. “Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems.” Annual Meeting of the Association for Computational Linguistics (2017): n. pag.

  7. Wei, Jason et al. “Chain of Thought Prompting Elicits Reasoning in Large Language Models.” ArXiv abs/2201.11903 (2022): n. pag.

  8. Salewski, Leonard et al. “In-Context Impersonation Reveals Large Language Models’ Strengths and Biases.” ArXiv abs/2305.14930 (2023): n. pag.

  9. Wu, Tongshuang Sherry et al. “AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts.” Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (2021): n. pag.

  10. Arora, Simran et al. “Ask Me Anything: A simple strategy for prompting language models.” ArXiv abs/2210.02441 (2022): n. pag.

  11. Wang, Xuezhi et al. “Self-Consistency Improves Chain of Thought Reasoning in Language Models.” ArXiv abs/2203.11171 (2022): n. pag.

  12. Zhou, Denny et al. “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.” ArXiv abs/2205.10625 (2022): n. pag.

  13. Nye, Maxwell et al. “Show Your Work: Scratchpads for Intermediate Computation with Language Models.” ArXiv abs/2112.00114 (2021): n. pag.

  14. Yao, Shunyu et al. “ReAct: Synergizing Reasoning and Acting in Language Models.” ArXiv abs/2210.03629 (2022): n. pag.

  15. Paranjape, Bhargavi et al. “ART: Automatic multi-step reasoning and tool-use for large language models.” ArXiv abs/2303.09014 (2023): n. pag.

  16. Madaan, Aman et al. “Self-Refine: Iterative Refinement with Self-Feedback.” ArXiv abs/2303.17651 (2023): n. pag.

  17. Yao, Shunyu et al. “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.” ArXiv abs/2305.10601 (2023): n. pag.