Prompt design

In this vignette, you’ll learn the basics of writing an LLM prompt, i.e. the text that you send to an LLM asking it to do a job for you. If you’ve never written a prompt before, a good way to think about it is as writing a set of instructions for a technically skilled but busy human. You’ll need to clearly and concisely state what you want, resolve any potential ambiguities that are likely to arise, and provide a few examples. Don’t expect to write the perfect prompt on your first attempt. You’ll need to iterate a few times, but in my experience, this iteration is very worthwhile because it forces you to clarify your understanding of the problem.

As well as the general advice in this vignette, it’s also a good idea to read the specific advice for the model that you’re using. Here are some pointers to the prompt engineering guides for a few popular models:

If you have a Claude account, you can use its prompt generator: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prompt-generator. This prompt generator has been specifically tailored for Claude, but I suspect it will help with many other LLMs, or at least give you some ideas as to what else you might want to include in your prompt.

Mechanics

Store prompts in separate files using markdown. Because prompts can be quite long, we suggest writing them in markdown. LLMs, like humans, appear to find markdown quite readable, and it lets you use headers to divide up the prompt and other tools, like itemised lists, to enumerate multiple options.

Store them in git. prompt.md is a good name if you only have one. If you have multiple, give them informative names.
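Storing prompts as files also makes them trivial to load at runtime. A minimal sketch (the `prompt.md` path and `load_prompt` helper are illustrative, not chatlas API):

```python
from pathlib import Path

def load_prompt(path: str = "prompt.md") -> str:
    """Read a prompt stored as a markdown file alongside your code."""
    return Path(path).read_text(encoding="utf-8")
```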

For prompts that are configurable or dynamically generated, use f-strings to insert variables, or a templating language like jinja for more complex scenarios.
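For example, a configurable prompt built with an f-string might look like this (the `language` and `library` parameters are hypothetical knobs, not chatlas API):

```python
def make_prompt(language: str, library: str) -> str:
    """Build a system prompt from a couple of configurable pieces."""
    return (
        f"You are a helpful {language} programming assistant "
        f"who prefers {library}.\n"
        "Just give me the code without any text explanation.\n"
    )
```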

Additionally, build up a small set of challenge examples that you can use to verify that the prompt does what you expect. (Eventually, you might want to formally evaluate different prompts for the problem you’re tackling, but that’s currently outside the scope of chatlas.)
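One lightweight way to structure such challenges is to pair each input with a substring you expect in the answer. A sketch (`run_prompt` stands in for a real chatlas call; the challenges themselves are made up):

```python
# Hypothetical challenge set: each question is paired with a substring
# that a correct answer should contain.
CHALLENGES = [
    ("How do I compute a grouped mean?", "groupby"),
    ("How do I read a CSV file?", "read_csv"),
]

def failing_challenges(run_prompt, challenges=CHALLENGES) -> list[str]:
    """Return the questions whose responses miss the expected substring."""
    return [
        question
        for question, expected in challenges
        if expected not in run_prompt(question)
    ]
```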

The following projects have good, non-trivial, examples of prompts:

However, for the purposes of this vignette, we’ll keep the prompts fairly short and just use a string so you can more easily read them.

Code generation

Let’s explore prompt design for a simple code generation task:

from chatlas import ChatAnthropic, ChatOpenAI

question = """
  How can I compute the mean and median of variables a, b, c, and so on,
  all the way up to z, grouped by age and sex.
"""

Basic flavour

When I don’t provide a system prompt, I sometimes get answers in a different language (like R):

chat = ChatAnthropic()
_ = chat.chat(question)

Here’s how to compute mean and median for variables a through z, grouped by age and sex:

# Using dplyr
library(dplyr)

df %>%
  group_by(age, sex) %>%
  summarise(across(a:z, list(
    mean = ~mean(., na.rm = TRUE),
    median = ~median(., na.rm = TRUE)
  )))

# Alternative base R approach
aggregate(. ~ age + sex, data = df[,c("age", "sex", letters[1:26])], 
         FUN = function(x) c(mean = mean(x), median = median(x)))

This will:

1. Group the data by age and sex
2. Calculate both mean and median for each variable a through z
3. Handle missing values with na.rm = TRUE
4. Return a dataframe with results for each age-sex combination

The output will have columns for age, sex, and mean/median values for each variable.

I can ensure that I always get Python code by providing a system prompt:

chat.system_prompt = "You are a helpful Python (not R) programming assistant."
_ = chat.chat(question)

Here’s how to compute mean and median for variables a through z in Python, using pandas:

import pandas as pd

# Assuming your data is in a DataFrame called 'df'
# and has columns 'age', 'sex', and variables 'a' through 'z'

# Create a list of variables from 'a' to 'z'
vars_to_analyze = list('abcdefghijklmnopqrstuvwxyz')

# Calculate both mean and median
result = df.groupby(['age', 'sex'])[vars_to_analyze].agg(['mean', 'median'])

# If you want to handle missing values explicitly:
result = df.groupby(['age', 'sex'])[vars_to_analyze].agg({
    col: ['mean', 'median'] for col in vars_to_analyze
}).fillna(0)  # or use .dropna() instead of fillna if you prefer to remove NAs

# To make the result more readable, you can flatten the column names
result.columns = [f'{col}_{stat}' for col, stat in result.columns]

This will:

1. Group your data by age and sex
2. Calculate both mean and median for each variable from a to z
3. Return a DataFrame where:
   - The index contains the age and sex combinations
   - The columns contain the mean and median for each variable
   - Column names will be like ‘a_mean’, ‘a_median’, ‘b_mean’, ‘b_median’, etc.

You can then view or further process the results as needed:

# View the results
print(result)

# Reset index if you want age and sex as columns instead of index
result_reset = result.reset_index()

Note that I’m using both a system prompt (which defines the general behaviour) and a user prompt (which asks the specific question). You could put all of the content in the user prompt and get similar results, but I think it’s helpful to use both to cleanly divide the general framing of the response from the specific questions that you want to ask.

Since I’m mostly interested in the code, I ask it to drop the explanation:

chat.system_prompt = """
  You are a helpful Python (not R) programming assistant.
  Just give me the code without any text explanation.
"""
_ = chat.chat(question)
import pandas as pd

vars_to_analyze = list('abcdefghijklmnopqrstuvwxyz')
result = df.groupby(['age', 'sex'])[vars_to_analyze].agg(['mean', 'median'])
result.columns = [f'{col}_{stat}' for col, stat in result.columns]

In this case, I seem to mostly get pandas code. But if you want a different style, you can ask for it:

chat.system_prompt = """
  You are a helpful Python (not R) programming assistant who prefers polars to pandas.
  Just give me the code without any text explanation.
"""
_ = chat.chat(question)
import polars as pl

vars_to_analyze = list('abcdefghijklmnopqrstuvwxyz')

result = (df
    .groupby(['age', 'sex'])
    .agg([
        pl.col(col).mean().alias(f'{col}_mean') 
        for col in vars_to_analyze
    ] + [
        pl.col(col).median().alias(f'{col}_median')
        for col in vars_to_analyze
    ])
)

Be explicit

If there’s something about the output that you don’t like, you can try being more explicit about it. For example, the code isn’t styled quite how I like, so I provide more details about what I do want:

chat.system_prompt = """
  You are a helpful Python (not R) programming assistant who prefers siuba to pandas.
  Just give me the code. I don't want any explanation or sample data.
  * Spread long function calls across multiple lines.
  * Where needed, always indent function calls with two spaces.
  * Always use double quotes for strings.
"""
_ = chat.chat(question)
from siuba import _, select, group_by, summarize
from siuba.dply.vector import across
import pandas as pd

result = (df
  >> group_by(_.age, _.sex)
  >> summarize(
    across(
      select(_[list("abcdefghijklmnopqrstuvwxyz")]),
      ["mean", "median"]
    )
  )
)

This still doesn’t yield exactly the code that I’d write, but it’s pretty close.

You could provide a different prompt if you were looking for more explanation of the code:

chat.system_prompt = """
  You are an expert Python (not R) programmer and a warm and supportive teacher.
  Help me understand the code you produce by explaining each function call with
  a brief comment. For more complicated calls, add documentation to each
  argument. Just give me the code without any text explanation.
"""
_ = chat.chat(question)
import pandas as pd

# Create list of column names a-z
vars_to_analyze = list('abcdefghijklmnopqrstuvwxyz')

# Group by age and sex, compute mean and median for each variable
result = (df
    .groupby(['age', 'sex'])                    # Group the data by age and sex
    [vars_to_analyze]                           # Select only a-z columns
    .agg(['mean', 'median'])                    # Calculate mean and median  
    .rename_axis(columns=['variable', 'stat'])  # Name the column levels
    .melt(                                      # Reshape to long format
        ignore_index=False,
        value_name='value'
    )
    .reset_index()                              # Convert indices to columns
)

Teach it about new features

You can imagine LLMs as being a sort of average of the internet at a given point in time. That means they will provide popular answers, which will tend to reflect older coding styles (either because the new features aren’t well represented in their training data, or because the older features are so much more popular). So if you want your code to use specific features that are relatively recent, you might need to provide the examples yourself:

chat.system_prompt = """
  You are an expert R programmer.
  Just give me the code; no explanation in text.
  Use the `.by` argument rather than `group_by()`.
  dplyr 1.1.0 introduced per-operation grouping with the `.by` argument.
  e.g., instead of:

  transactions |>
    group_by(company, year) |>
    mutate(total = sum(revenue))

  write this:
  transactions |>
    mutate(
      total = sum(revenue),
      .by = c(company, year)
    )
"""
chat.chat(question)
df |>
  summarise(
    across(a:z, list(
      mean = \(x) mean(x, na.rm = TRUE),
      median = \(x) median(x, na.rm = TRUE)
    )),
    .by = c(age, sex)
  )

Structured data

Providing a rich set of examples is a great way to encourage the model to produce exactly the output you want. This is also known as multi-shot prompting. Here we’ll work through a prompt that I designed to extract structured data from recipes, but the same ideas apply in many other situations.

Getting started

My overall goal is to turn a list of ingredients, like the following, into a nicely structured JSON that I can then analyse in Python (e.g. to compute the total weight, scale the recipe up or down, or to convert the units from volumes to weights).

ingredients = """
  ¾ cup (150g) dark brown sugar
  2 large eggs
  ¾ cup (165g) sour cream
  ½ cup (113g) unsalted butter, melted
  1 teaspoon vanilla extract
  ¾ teaspoon kosher salt
  ⅓ cup (80ml) neutral oil
  1½ cups (190g) all-purpose flour
  150g plus 1½ teaspoons sugar
"""
chat = ChatOpenAI(model="gpt-4o-mini")

(This isn’t the ingredient list for a real recipe but it includes a sampling of styles that I encountered in my project.)

If you don’t have strong feelings about what the data structure should look like, you can start with a very loose prompt and see what you get back. I find this a useful pattern for underspecified problems where a big part of the problem is just defining precisely what problem you want to solve. Seeing the LLM’s attempt at coming up with a data structure gives me something to immediately react to, rather than having to start from a blank page.

instruct_json = """
  You're an expert baker who also loves JSON. I am going to give you a list of
  ingredients and your job is to return nicely structured JSON. Just return the
  JSON and no other commentary.
"""
chat.system_prompt = instruct_json
_ = chat.chat(ingredients)

{
  "ingredients": [
    {"name": "dark brown sugar", "amount": "¾ cup", "weight": "150g"},
    {"name": "large eggs", "amount": "2"},
    {"name": "sour cream", "amount": "¾ cup", "weight": "165g"},
    {"name": "unsalted butter", "amount": "½ cup", "weight": "113g", "state": "melted"},
    {"name": "vanilla extract", "amount": "1 teaspoon"},
    {"name": "kosher salt", "amount": "¾ teaspoon"},
    {"name": "neutral oil", "amount": "⅓ cup", "volume": "80ml"},
    {"name": "all-purpose flour", "amount": "1½ cups", "weight": "190g"},
    {"name": "sugar", "amount": "150g plus 1½ teaspoons"}
  ]
}

(I don’t know if the colour text, “You’re an expert baker who also loves JSON”, does anything, but I like to think this helps the LLM get into the right mindset of a very nerdy baker.)

Provide examples

This isn’t a bad start, but I prefer to cook with weight, so I only want to see volumes if weight isn’t available. So I provide a couple of examples of what I’m looking for. I was pleasantly surprised that I can provide the input and output examples in such a loose format.

instruct_weight = """
  Here are some examples of the sort of output I'm looking for:

  ¾ cup (150g) dark brown sugar
  {"name": "dark brown sugar", "quantity": 150, "unit": "g"}

  ⅓ cup (80ml) neutral oil
  {"name": "neutral oil", "quantity": 80, "unit": "ml"}

  2 t ground cinnamon
  {"name": "ground cinnamon", "quantity": 2, "unit": "teaspoon"}
"""

chat.system_prompt = instruct_json + "\n" + instruct_weight
_ = chat.chat(ingredients)

{
  "ingredients": [
    {"name": "dark brown sugar", "quantity": 150, "unit": "g"},
    {"name": "large eggs", "quantity": 2, "unit": "count"},
    {"name": "sour cream", "quantity": 165, "unit": "g"},
    {"name": "unsalted butter", "quantity": 113, "unit": "g", "state": "melted"},
    {"name": "vanilla extract", "quantity": 1, "unit": "teaspoon"},
    {"name": "kosher salt", "quantity": ¾, "unit": "teaspoon"},
    {"name": "neutral oil", "quantity": 80, "unit": "ml"},
    {"name": "all-purpose flour", "quantity": 190, "unit": "g"},
    {"name": "sugar", "quantity": "150g plus 1½ teaspoons", "unit": "g"}
  ]
}

Just providing the examples seems to work remarkably well. But I found it useful to also include a description of what the examples are trying to accomplish. I’m not sure if this helps the LLM or not, but it certainly makes it easier for me to understand the organisation and check that I’ve covered the key pieces that I’m interested in.

instruct_weight = """
  * If an ingredient has both weight and volume, extract only the weight:

  ¾ cup (150g) dark brown sugar
  [
    {"name": "dark brown sugar", "quantity": 150, "unit": "g"}
  ]

* If an ingredient only lists a volume, extract that.

  2 t ground cinnamon
  ⅓ cup (80ml) neutral oil
  [
    {"name": "ground cinnamon", "quantity": 2, "unit": "teaspoon"},
    {"name": "neutral oil", "quantity": 80, "unit": "ml"}
  ]
"""

This structure also allows me to give the LLM a hint about how I want multiple ingredients to be stored, i.e. as a JSON array.

I then just iterated on this task, looking at the results from different recipes to get a sense of what the LLM was getting wrong. Much of this felt like I was iterating on my understanding of the problem as I didn’t start by knowing exactly how I wanted the data. For example, when I started out I didn’t really think about all the various ways that ingredients are specified. For later analysis, I always want quantities to be numbers, even if they were originally fractions, or if the units aren’t precise (like a pinch). It also forced me to realise that some ingredients are unitless.

instruct_unit = """
* If the unit uses a fraction, convert it to a decimal.

  ⅓ cup sugar
  ½ teaspoon salt
  [
    {"name": "sugar", "quantity": 0.33, "unit": "cup"},
    {"name": "salt", "quantity": 0.5, "unit": "teaspoon"}
  ]

* Quantities are always numbers

  pinch of kosher salt
  [
    {"name": "kosher salt", "quantity": 1, "unit": "pinch"}
  ]

* Some ingredients don't have a unit.
  2 eggs
  1 lime
  1 apple
  [
    {"name": "egg", "quantity": 2},
    {"name": "lime", "quantity": 1},
    {"name": "apple", "quantity": 1}
  ]
"""
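If you later want to verify the fraction rule mechanically, the standard library can decode unicode vulgar-fraction characters; a small sketch (it handles single-character fractions like ⅓ only):

```python
import unicodedata

def fraction_to_decimal(char: str, ndigits: int = 2) -> float:
    """Convert a unicode vulgar-fraction character (e.g. ⅓) to a decimal."""
    return round(unicodedata.numeric(char), ndigits)
```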

You might want to take a look at the full prompt to see what I ended up with.

Structured data

Now that I’ve iterated to get a data structure that I like, it seems useful to formalise it and tell the LLM exactly what I’m looking for using structured data. This guarantees that the LLM will only return JSON, the JSON will have the fields that you expect, and then chatlas will automatically convert it into a Python data structure for you.

from pydantic import BaseModel, Field

class Ingredient(BaseModel):
    "Ingredient name"
    name: str = Field(description="Ingredient name")
    quantity: float
    unit: str | None = Field(description="Unit of measurement")

class Ingredients(BaseModel):
    items: list[Ingredient]

chat.system_prompt = instruct_json + "\n" + instruct_weight
chat.extract_data(ingredients, data_model=Ingredients)
{'items': [{'name': 'dark brown sugar', 'quantity': 150, 'unit': 'g'},
  {'name': 'large eggs', 'quantity': 2, 'unit': 'count'},
  {'name': 'sour cream', 'quantity': 165, 'unit': 'g'},
  {'name': 'unsalted butter', 'quantity': 113, 'unit': 'g'},
  {'name': 'vanilla extract', 'quantity': 1, 'unit': 'teaspoon'},
  {'name': 'kosher salt', 'quantity': 0.75, 'unit': 'teaspoon'},
  {'name': 'neutral oil', 'quantity': 80, 'unit': 'ml'},
  {'name': 'all-purpose flour', 'quantity': 190, 'unit': 'g'},
  {'name': 'sugar', 'quantity': 150, 'unit': 'g'}]}
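With the data in this shape, the analyses mentioned at the start become straightforward. For example, a rough total weight of the gram-denominated ingredients (the `items` list below just mirrors the output above):

```python
# Mirrors the structure returned by extract_data() above.
items = [
    {"name": "dark brown sugar", "quantity": 150, "unit": "g"},
    {"name": "large eggs", "quantity": 2, "unit": "count"},
    {"name": "sour cream", "quantity": 165, "unit": "g"},
    {"name": "unsalted butter", "quantity": 113, "unit": "g"},
    {"name": "all-purpose flour", "quantity": 190, "unit": "g"},
    {"name": "sugar", "quantity": 150, "unit": "g"},
]

def total_weight_g(items: list[dict]) -> float:
    """Sum the quantities measured in grams, ignoring other units."""
    return sum(item["quantity"] for item in items if item.get("unit") == "g")
```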

Capturing raw input

One thing that I’d do next time would also be to include the raw ingredient name in the output. This doesn’t make much difference here, in this simple example, but it makes it much easier to align the input and the output and start to develop automated measures of how well my prompt is doing.

instruct_weight_input = """
  * If an ingredient has both weight and volume, extract only the weight:

    ¾ cup (150g) dark brown sugar
    [
      {"name": "dark brown sugar", "quantity": 150, "unit": "g", "input": "¾ cup (150g) dark brown sugar"}
    ]

  * If an ingredient only lists a volume, extract that.

    2 t ground cinnamon
    ⅓ cup (80ml) neutral oil
    [
      {"name": "ground cinnamon", "quantity": 2, "unit": "teaspoon", "input": "2 t ground cinnamon"},
      {"name": "neutral oil", "quantity": 80, "unit": "ml", "input": "⅓ cup (80ml) neutral oil"}
    ]
"""
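Once each item carries its input field, a simple alignment check becomes possible; a sketch (it assumes one parsed item per input line):

```python
def unmatched_lines(ingredients: str, parsed: list[dict]) -> list[str]:
    """Return input lines that no parsed item claims via its "input" field."""
    claimed = {item.get("input", "").strip() for item in parsed}
    lines = [line.strip() for line in ingredients.splitlines() if line.strip()]
    return [line for line in lines if line not in claimed]
```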

I think this is particularly important if you’re working with even less structured text. For example, imagine you had this text:

recipe = """
  In a large bowl, cream together one cup of softened unsalted butter and a
  quarter cup of white sugar until smooth. Beat in an egg and 1 teaspoon of
  vanilla extract. Gradually stir in 2 cups of all-purpose flour until the
  dough forms. Finally, fold in 1 cup of semisweet chocolate chips. Drop
  spoonfuls of dough onto an ungreased baking sheet and bake at 350°F (175°C)
  for 10-12 minutes, or until the edges are lightly browned. Let the cookies
  cool on the baking sheet for a few minutes before transferring to a wire
  rack to cool completely. Enjoy!
"""

Including the input text in the output makes it easier to see if it’s doing a good job:

chat.system_prompt = instruct_json + "\n" + instruct_weight_input
_ = chat.chat(ingredients)

{
  "ingredients": [
    {"name": "dark brown sugar", "quantity": 150, "unit": "g"},
    {"name": "large eggs", "quantity": 2, "unit": "count"},
    {"name": "sour cream", "quantity": 165, "unit": "g"},
    {"name": "unsalted butter", "quantity": 113, "unit": "g", "state": "melted"},
    {"name": "vanilla extract", "quantity": 1, "unit": "teaspoon"},
    {"name": "kosher salt", "quantity": 0.75, "unit": "teaspoon"},
    {"name": "neutral oil", "quantity": 80, "unit": "ml"},
    {"name": "all-purpose flour", "quantity": 190, "unit": "g"},
    {"name": "sugar", "quantity": "150g plus 1.5 teaspoons", "unit": "g"}
  ]
}

When I ran it while writing this vignette, it seemed to be working out the weight of the ingredients specified in volume, even though the prompt specifically asks it not to do that. This may suggest I need to broaden my examples.

Token usage

from chatlas import token_usage
token_usage()
[{'name': 'Anthropic', 'input': 6146, 'output': 1248},
 {'name': 'OpenAI', 'input': 2909, 'output': 1044}]