Prompt Engineering for Named Entity Recognition (NER)
Prompt Engineering Tips & Tricks for Named Entity Recognition (NER). Learn how to effectively extract entities from text using Large Language Models with practical techniques and examples.
For many organizations, 2023 was the year of exploration, testing, proof-of-concepts, or deployment of smaller LLM-powered workflows and use cases, while 2024 will likely be the year where we see even more production systems leveraging LLMs. Compared to a traditional ML system, where data (examples, labels), the model, and its weights are some of the main artifacts, prompts are instead the main artifacts. Prompts and prompt engineering are fundamental in driving a certain behavior of an assistant or agent for your use case.
Therefore, many of the large players as well as academia have provided guides on how to prompt LLMs effectively:
- 💻 OpenAI Prompt Engineering
- 💻 Guide to Anthropic's prompt engineering resources
- 💻 Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
- 💻 Prompt Engineering Guide
Many of these guides are quite generic and can work for many different use cases. However, there are very few guides mentioning best practices for constructing prompts for Named Entity Recognition (NER). OpenAI has, for instance, a cookbook for NER, and there are also two papers which both suggest a method called Prompt-NER.
In this article, we will discuss some techniques that might be helpful when using LLMs for NER use cases. To set the stage, we will first define what Prompt Engineering and Named Entity Recognition are.
Prompt Engineering
Prompt Engineering sometimes feels more like an art than a science, but more and more best practices are showing up (see some of the references in the previous section).
One definition of Prompt Engineering is shown below:
Definition: Prompt Engineering
Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs).
Prompt engineering is not just about designing and developing prompts. It encompasses a wide range of skills and techniques that are useful for interacting and developing with LLMs. It’s an important skill to interface, build with, and understand the capabilities of LLMs. You can use prompt engineering to improve the safety of LLMs and build new capabilities like augmenting LLMs with domain knowledge and external tools.
— Prompt Engineering Guide
Like any other artifact, prompts may become "outdated" or "drift", which is why it is important to have systems in place for:
- Experiment tracking of prompts
- Evaluating your prompts (either via the “golden dataset” approach, LLM-based evals or both)
- Observability of how your prompts are being used
- Versioning of your prompts.
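As a minimal sketch (the names here are hypothetical, not from a specific tool), prompts can be versioned alongside metadata so that experiments and evaluations always reference an exact prompt version:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class PromptVersion:
    """A single, immutable version of a prompt template."""
    name: str
    version: int
    template: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# A tiny in-memory registry; in practice this would live in a database
# or a dedicated prompt-management tool.
registry: dict[tuple[str, int], PromptVersion] = {}

def register_prompt(name: str, template: str) -> PromptVersion:
    """Store a new version of a prompt and return it."""
    version = 1 + max((v for (n, v) in registry if n == name), default=0)
    prompt = PromptVersion(name=name, version=version, template=template)
    registry[(name, version)] = prompt
    return prompt

v1 = register_prompt("ner-food", "Extract all food-related entities from: {recipe}")
print(v1.name, v1.version)  # ner-food 1
```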
Named Entity Recognition (NER)
Extracting entities or tags from data can be very useful for many different domains and businesses, and can be used for many different things such as classification, knowledge retrieval, search, etc.
See one definition of Named Entity Recognition below:
Definition: Named Entity Recognition
Named Entity Recognition (NER) is a task of Natural Language Processing (NLP) that involves identifying and classifying named entities in a text into predefined categories such as person names, organizations, locations, and others. The goal of NER is to extract structured information from unstructured text data and represent it in a machine-readable format. Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.
— Papers with Code
Below is an example of the BIO notation, which is a common format used for NER:
Mac [B-PER]
Doe [I-PER]
ate [O]
a [O]
hamburger [O]
at [O]
Mcdonalds [B-LOC]
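In code, BIO-tagged text is often represented as (token, tag) pairs. A minimal sketch (plain Python, no particular library) of collecting the entity spans from the example above:

```python
# BIO-tagged tokens from the example above as (token, tag) pairs.
tagged = [
    ("Mac", "B-PER"),        # beginning of a person entity
    ("Doe", "I-PER"),        # inside the same person entity
    ("ate", "O"),
    ("a", "O"),
    ("hamburger", "O"),
    ("at", "O"),
    ("Mcdonalds", "B-LOC"),  # beginning of a location entity
]

# Collect contiguous entity spans from the BIO tags.
entities, current = [], []
for token, tag in tagged:
    if tag.startswith("B-"):
        if current:
            entities.append(" ".join(current))
        current = [token]
    elif tag.startswith("I-"):
        current.append(token)
    elif current:  # an "O" tag closes any open entity
        entities.append(" ".join(current))
        current = []
if current:
    entities.append(" ".join(current))

print(entities)  # ['Mac Doe', 'Mcdonalds']
```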
However, the BIO notation does not make sense for all use cases. If you are, say, interested in extracting food entities, then hamburger above might be a key entity or tag that you want to predict.
Usually, a “typical” NER pipeline comprises the following steps:
- tokenizer: Turn text into tokens
- tagger: Assign part-of-speech tags
- parser: Assign dependency labels
- ner: Detect and label named entities
NER has previously been solved using:
- Rule-based systems
- Statistical & ML systems
- Deep-Learning systems
- A mix of the above.
Here, spaCy and Transformer architectures from e.g. Hugging Face, such as BERT and XLM-RoBERTa, have been the go-to methods or architectures.
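For reference, a minimal spaCy sketch (assuming the en_core_web_sm model has been downloaded) that runs such a pipeline and prints the detected entities:

```python
import spacy

# Load a small English pipeline (tokenizer, tagger, parser, ner).
# Install the model first with: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Mac Doe ate a hamburger at Mcdonalds")
for ent in doc.ents:
    print(ent.text, ent.label_)
```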
Before we start with prompt engineering let’s set the stage with an example use case.
Food Entities from recipes
In the following section we will assume the following:
- We are a food tech startup that creates custom smart purchase lists from online food recipes and sells them to retailers.
- We want to extract the following types of entities: FOOD, QUANTITY, UNIT, PHYSICAL_QUALITY, COLOR
- When we have the entities, we want to populate smart purchase lists with recommendations on where to get the food.
The example recipe that we will be using:
### Chashu pork (for ramen and more)
Chashu pork is a classic way to prepare pork belly for Japanese dishes such as ramen.
While it takes a little time, it's relatively hands-off and easy, and the result is delicious.
### Ingredients
2 lb pork belly or a little more/less
2 green onions spring onions, or 3 if small
1 in fresh ginger (a chunk that will give around 4 - 6 slices)
2 cloves garlic
⅔ cup sake
⅔ cup soy sauce
¼ cup mirin
½ cup sugar
2 cups water or a little more as needed
### Instructions
Have some string/kitchen twine ready and a pair of scissors before you prepare the pork.
If needed, trim excess fat from the outside of the pork but you still want a layer over the
...
Technique #1 - Use temperature=0.0
LLMs are non-deterministic by nature, and different generations using e.g. the chat completions API from OpenAI will return different responses. However, one way to mitigate this is to use the temperature parameter often provided in these types of APIs. In short, the lower the temperature, the more deterministic the results, in the sense that the most probable next token is always picked. Note that this is still not a guarantee of deterministic output, but rather a best-effort attempt to always select the most likely token as model output.
This is of course useful in an NER case where you would like the same or similar input to produce the same or similar output.
Tip 1: Temperature Control
Use temperature=0.0 to get more deterministic output:
- Lower temperature = more predictable responses
- Higher temperature = more creative/random responses
- For NER tasks, you want consistency, so use 0.0
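As a minimal sketch using the OpenAI Python SDK (the model name and prompt are illustrative), you can compare how outputs vary across temperatures:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "Extract all food-related entities from: '2 lb pork belly, 2 green onions'"

# The same prompt at temperature 0.0 should produce (near-)identical
# responses across runs, while higher temperatures vary more.
for temperature in (0.0, 1.0):
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    print(temperature, response.choices[0].message.content)
```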
Technique #2 - Set seed for your runs
Recently OpenAI added the functionality to set a seed to make runs more reproducible. If all other parameters are set to the same values (e.g. temperature=0), the seed is set, and the system_fingerprint is the same, then the output will be mostly deterministic.
Tip 2: Seed Control
Set the seed parameter for reproducible results:
import openai

response = openai.chat.completions.create(
model="gpt-4",
temperature=0.0,
seed=42, # Same seed = same output
messages=[...]
)
Note: Only works when combined with temperature=0 and the same system_fingerprint.
Both the previous section and this section were more focused on model parameters. The following sections will instead focus on what Prompt Engineering techniques we can use to extract named entities.
Technique #3 - Use clear instructions
The next step is to give clear instructions to the assistant. Typically, the:
- System prompt is used for instructions
- User prompt is used to provide context and data
- (Assistant) prompt is used to provide examples
Starting with a prompt like the below:
System:
You are a food AI assistant who is an expert in natural language processing
and especially named entity recognition.
User:
Extract all food-related entities from the recipe below in backticks:
```{recipe}```
...
We get the following output:
#### extract_food_entities
I will now proceed to extract all the food-related entities from the given recipe.
#### extract_food_entities
The food-related entities present in the recipe are as follows:
* Pork belly
* Green onions
* Fresh ginger
* Garlic
* Sake
* Soy sauce
* Mirin
* Sugar
* Water
* Rice
These entities cover the main ingredients used for the Chashu pork recipe.
You can also use the Assistant message to prompt with some examples:
Assistant:
Example:
```json
{
  "food": "minced meat",
  "quantity": 500,
  "unit": "grams (g)",
  "physicalQuality": "minced",
  "color": "brown"
}
…
```
Tip 3
Use clear instructions to guide the LLM effectively:
- Use the System prompt for instructions
- Use the User prompt to provide context and data
- Use the Assistant prompt to provide examples
This structured approach helps the model understand exactly what you want it to do.
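A minimal sketch of how these roles map onto the OpenAI chat completions API (the recipe string and the assistant example here are placeholders based on this article):

```python
from openai import OpenAI

client = OpenAI()

recipe = "..."  # the Chashu pork recipe from above

messages = [
    # System prompt: instructions.
    {
        "role": "system",
        "content": "You are a food AI assistant who is an expert in natural "
                   "language processing and especially named entity recognition.",
    },
    # User prompt: context and data.
    {
        "role": "user",
        "content": f"Extract all food-related entities from the recipe below:\n{recipe}",
    },
    # Assistant prompt: an example of the desired output format.
    {
        "role": "assistant",
        "content": '{"food": "minced meat", "quantity": 500, "unit": "grams (g)", '
                   '"physicalQuality": "minced", "color": "brown"}',
    },
]

response = client.chat.completions.create(model="gpt-4", temperature=0.0, messages=messages)
print(response.choices[0].message.content)
```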
Technique #4 - Use functions or tools
When we use functions or tools, we prompt the model to provide input arguments for an actual function in a downstream manner. This is similar to what is mentioned in section 4.2 here. You can use these arguments as they are (they will be valid JSON) or do some further processing by performing the function call in your own logic. One example could be that we want to trigger certain actions or do certain formatting based on the function arguments.
The functions will also be part of the system prompt. Many of the latest models have been fine-tuned to work with function calling and thus produce valid JSON output in that way. To define a function, we define a jsonSchema as below:
{
"name": "extract_food_entities",
"description": "You are a food AI assistant. Your task is to extract food entities from a recipe based on the JSON schema. You are to return the output as valid JSON.",
"parameters": {
"type": "object",
"properties": {
"food-metadata": {
"type": "object",
"properties": {
"food": {
"type": "string",
"description": "The name of the food item"
},
"quantity": {
"type": "string",
"description": "The quantity of the food item"
},
"unit": {
"type": "string",
"description": "The unit of the food item"
},
"physicalQuality": {
"type": "string",
"description": "The physical quality of the food item"
},
"color": {
"type": "string",
"description": "The color of the food item"
}
},
"required": [
"food",
"quantity",
"unit",
"physicalQuality",
"color"
]
}
},
"required": [
"food-metadata"
]
}
}
The example output of using extract_food_entities is shown below:
{
"food-metadata": {
"food": "Pork Belly",
"quantity": "2",
"unit": "lb",
"physicalQuality": "-",
"color": "-"
}
}
Tip 4
Use tools or function calling with a jsonSchema to extract the wanted metadata.
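A minimal sketch of passing the schema above via the tools parameter of the OpenAI chat completions API and parsing the returned function arguments (the schema is abbreviated here for brevity):

```python
import json
from openai import OpenAI

client = OpenAI()

# The extract_food_entities schema from above, abbreviated here.
extract_food_entities = {
    "name": "extract_food_entities",
    "description": "Extract food entities from a recipe and return valid JSON.",
    "parameters": {
        "type": "object",
        "properties": {
            "food-metadata": {
                "type": "object",
                "properties": {
                    "food": {"type": "string", "description": "The name of the food item"},
                    "quantity": {"type": "string", "description": "The quantity of the food item"},
                    # ... unit, physicalQuality and color as in the full schema
                },
                "required": ["food", "quantity"],
            }
        },
        "required": ["food-metadata"],
    },
}

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0.0,
    tools=[{"type": "function", "function": extract_food_entities}],
    # Force the model to call our function rather than answer in free text.
    tool_choice={"type": "function", "function": {"name": "extract_food_entities"}},
    messages=[{"role": "user", "content": "Extract food entities from: '2 lb pork belly'"}],
)

# The model returns its answer as function arguments (a valid JSON string).
tool_call = response.choices[0].message.tool_calls[0]
print(json.loads(tool_call.function.arguments))
```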
Technique #5 - Use domain prompts
As seen above, using the jsonSchema gives us metadata in a structured format that we can use for downstream processing. However, there are limitations on the number of characters you can put in the description of each property in the jsonSchema. One way to give further instructions to the LLM is to add domain-specific instructions to e.g. the system prompt:
System:
You are a food AI assistant who is an expert in natural language processing
and especially named entity recognition. The entities we are interested in are: "food", "quantity", "unit", "physicalQuality" and "color".
See further instructions below for each entity:
"food": This can be both liquid and solid food such as meat, vegetables, alcohol, etc.
"quantity": The exact quantity or amount of the food that should be used in the recipe. Answer in both full units such as 1,2,3, etc but also fractions e.g. 1/3, 2/4, etc.
"unit": The unit being used e.g. grams, milliliters, pounds, etc. The unit must always be returned.
"physicalQuality": The characteristic of the ingredient (e.g. boneless for chicken breast, frozen for
spinach, fresh or dried for basil, powdered for sugar).
"color": The color of the food e.g. green, black, white. If no color is identified respond with colorless.
User:
Extract all food-related entities from the recipe below in backticks:
```{recipe}```
...
Example output with this updated prompt is shown below:
{
"food-metadata": {
"food": "pork belly",
"quantity": "2 lb",
"unit": "pound",
"physicalQuality": "raw",
"color": "colorless"
}
}
{
"food-metadata": {
"food": "green onions",
"quantity": "2",
"unit": "pieces",
"physicalQuality": "fresh",
"color": "green"
}
}
Tip 5
Incorporate domain knowledge to help the LLM with extracting the entities you are looking for.
Technique #6 - Use Chain-of-Thought
Chain-of-Thought (CoT)
Chain-of-Thought (CoT) is a prompting technique where each input question is followed by an intermediate reasoning step that leads to the final answer. This has been shown to improve the output from LLMs. There is also a slight variation of CoT called Zero-Shot Chain-of-Thought, where you introduce "Let's think step by step" to guide the LLM's reasoning.
An update to the prompt now using Zero-Shot Chain-of-Thought would be:
System:
You are a food AI assistant who is an expert in natural language processing
and especially named entity recognition. The entities we are interested in are: "food", "quantity", "unit", "physicalQuality" and "color".
See further instructions below for each entity:
"food": This can be both liquid and solid food such as meat, vegetables, alcohol, etc.
"quantity": The exact quantity or amount of the food that should be used in the recipe. Answer in both full units such as 1,2,3, etc but also fractions e.g. 1/3, 2/4, etc.
"unit": The unit being used e.g. grams, milliliters, pounds, etc. The unit must always be returned.
"physicalQuality": The characteristic of the ingredient (e.g. boneless for chicken breast, frozen for
spinach, fresh or dried for basil, powdered for sugar).
"color": The color of the food e.g. green, black, white. If no color is identified respond with colorless.
Let's think step-by-step.
User:
Extract all food-related entities from the recipe below in backticks:
```{recipe}```
...
By adding “Let’s think step by step” we can see some slight improvements for the extraction:
{
"food-metadata": {
"food": "pork belly", "quantity": "2 lb",
"unit": "ounce",
"physicalQuality": "trimmed",
"color": "colorless"
}
}
Trimmed is likely a better physicalQuality to describe the pork belly than raw, as the pork belly is sliced and used as a topping, e.g. in a bowl of ramen.
Tip 6
Chain-of-thought may improve performance, especially for fields that need some calculation or reasoning steps.
Technique #7 - Use Prompt Chaining
Prompt Chaining
To improve the reliability and performance of LLMs, one of the important prompting engineering techniques is to break tasks into subtasks. Once those subtasks have been identified, the LLM is prompted with a subtask and then its response is used as input to another prompt. This is what’s referred to as prompt chaining where a task is split into subtasks with the idea to create a chain of prompt operations.
— Prompt Engineering Guide
Prompt Chaining is somewhat similar to CoT in that the idea is to break the problem down into sub-problems or tasks. However, the main difference is that the response from one prompt is passed on to the next prompt as refined context.
In our recipe example, we might construct a prompt looking like the below:
CONTEXT_PROMPT = """
You are a food AI assistant who is an expert in natural language processing
and especially named entity recognition. The entities we are interested in are: "food", "quantity", "unit", "physicalQuality" and "color".
Your task is to extract the entity, the value of the entity and the reasoning,
for why the extracted value is the correct value.
If you cannot extract the entity add "N/A".
Make sure to output the results according to the output format below.
<Food-Metadata>:
"<metadata>":
- Entity: // the actual entity i.e. pork, onions etc
- Value: // Value of the metadata field
- Reasoning: // Reasoning for why the entity is correct.
Let's think step-by-step.
"""
The output from the first prompt is:
"pork":
- Entity: food
- Value: pork belly
- Reasoning: The text mentions preparing pork belly for Japanese dishes such as ramen, and the recipe specifically calls for 2 lb of pork belly.
"green onions":
- Entity: food
- Value: green onions
- Reasoning: The text mentions using 2 green onions and specifies that 3 should be used if they are small.
"ginger":
- Entity: food
- Value: ginger
- Reasoning: The recipe specifies using 1 inch of fresh ginger.
"garlic":
- Entity: food
- Value: garlic
- Reasoning: The recipe specifies using 2 cloves of garlic.
"sake":
- Entity: food
- Value: sake
- Reasoning: The recipe calls for â…” cup of sake.
"soy sauce":
- Entity: food
- Value: soy sauce
- Reasoning: The recipe calls for â…” cup of soy sauce.
"mirin":
- Entity: food
- Value: mirin
- Reasoning: The recipe calls for ÂĽ cup of mirin.
"sugar":
- Entity: food
- Value: sugar
- Reasoning: The recipe calls for ½ cup of sugar.
"water":
- Entity: food
- Value: water
- Reasoning: The recipe specifies using 2 cups of water.
We then use the output from the prompt above as input to our extract_food_entities prompt from before. This approach may be helpful when you have entities that need to be calculated with some reasoning around them, or that may not be in the exact format of your JSON schema.
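A minimal sketch of this two-step chain (the complete helper below is a hypothetical convenience; CONTEXT_PROMPT is the prompt defined above):

```python
from openai import OpenAI

client = OpenAI()

def complete(system_prompt: str, user_prompt: str) -> str:
    """A hypothetical helper: one chat completion call, returning the text."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0.0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

recipe = "..."  # the Chashu pork recipe from above

# Step 1: extract entities, values and reasoning using CONTEXT_PROMPT (defined above).
reasoning = complete(CONTEXT_PROMPT, f"Extract the entities from the recipe:\n{recipe}")

# Step 2: feed the refined context into the structured extraction prompt.
entities = complete(
    "You are a food AI assistant. Map the pre-extracted entities below onto "
    "the extract_food_entities JSON schema and return valid JSON.",
    reasoning,
)
print(entities)
```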
Tip 7
Prompt chaining can help as an important pre-processing step to provide more relevant context.
Closing Remarks
In this post, we have walked through some useful prompt-engineering techniques that might be helpful when you tackle Named Entity Recognition (NER) using LLMs such as OpenAI's models.
Key Takeaways
Depending on your use case, one or several of these techniques may help improve your NER solution. In general, writing clear instructions and using CoT and/or prompt chaining together with tools or functions tend to improve the NER extraction.
Was this helpful?
Let me know what you think!