
Datasets Are All You Need

Can an LLM teach itself how to prompt just by looking at a dataset?

Spoiler alert: it sure can 😉

In this simple example, we use Gemini 2.5 Flash, Google DeepMind’s fast and inexpensive (yet very capable) model with built-in “reasoning” abilities, to iteratively compare the inputs and outputs in a dataset and refine a prompt that transforms the inputs into the outputs with high accuracy.

Similar setups work just as well with other reasoning models.

Why should you care? While this example is simple, it demonstrates how datasets can drive development in Generative AI projects. The analogy to traditional ML processes is stretched a bit, but it holds: we use our dataset as input for training, as validation data for discovering our “hyperparameters” (a prompt), and for testing the final results.

%pip install --upgrade python-dotenv nest_asyncio google-genai pandas pyyaml

from IPython.display import clear_output ; clear_output()
import os
import json
import asyncio

from dotenv import load_dotenv
import nest_asyncio

from textwrap import dedent
from IPython.display import display, Markdown

import pandas as pd

import yaml

from google import genai

load_dotenv()

nest_asyncio.apply()

_gemini_client_aio = genai.Client(api_key=os.getenv('GEMINI_API_KEY')).aio

async def gemini(prompt):
    response = await _gemini_client_aio.models.generate_content(
        model='gemini-2.5-flash-preview-04-17',
        contents=prompt,
    )
    return response.text

def md(str): display(Markdown(str))

def display_df(df):
    display(df.style.set_properties(
        **{'text-align': 'left', 'vertical-align': 'top', 'white-space': 'pre-wrap', 'width': '50%'},
    ))

We’ve installed and imported some packages, and created some helper facilities.
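As mentioned in the intro, similar setups work with other reasoning models. Here is a hedged sketch only (not part of this notebook’s setup): a drop-in replacement for the gemini() helper, assuming the openai package is installed and an OPENAI_API_KEY is set in the environment; the model name is a placeholder for whichever reasoning model you prefer.

from openai import AsyncOpenAI  # assumes: pip install openai

_openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def other_model(prompt):
    # Same signature as gemini(), so it can be swapped in below.
    response = await _openai_client.chat.completions.create(
        model='o4-mini',  # placeholder; use any reasoning model you have access to
        messages=[{'role': 'user', 'content': prompt}],
    )
    return response.choices[0].message.content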

Now, let’s look at our dataset.

The dataset is of very short stories (input), parsed into YAML (output). The dataset was generated purposefully for this example, since relying on a publicly available dataset would mean accepting that the LLM would have seen it during pre-training.

The task is pretty straightforward and, as you’ll see, can be discovered by the LLM in only a few steps. More complex tasks can be achieved too, ideally with larger datasets, stronger LLMs, higher “reasoning” budget, and more iteration.
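On the “reasoning” budget: the google-genai SDK lets you set a thinking budget per request. Below is a hedged variant of the gemini() helper, assuming the SDK’s GenerateContentConfig and ThinkingConfig options; the 1024-token budget is an arbitrary example.

from google.genai import types

async def gemini_with_budget(prompt, thinking_budget=1024):
    # Like gemini(), but with an explicit thinking budget (0 disables thinking).
    response = await _gemini_client_aio.models.generate_content(
        model='gemini-2.5-flash-preview-04-17',
        contents=prompt,
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=thinking_budget),
        ),
    )
    return response.text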

dataset = pd.read_csv('dataset.csv')

display_df(dataset.head(3))

print(f'{len(dataset)} items in dataset.')
  input output
0 The campfire crackled softly, a gentle murmur in the profound quiet of the night. Sarah, a woman of 35, sat cross-legged, her eyes reflecting the low flames, a look of peaceful contemplation on her face. Beside her, Mark, a man aged 38, reclined against a fallen log, hands clasped behind his head, utterly absorbed in the vast, star-dusted canvas above. His breathing was deep and relaxed. A few feet away, Lily, an 8-year-old girl, hummed a soft, tuneless song while gently sifting cool earth through her fingers. Her older brother, Tom, a 10-year-old boy, lay flat on his back on a thick blanket, his gaze fixed on the silvery path the moon cast across the smooth, dark surface of the lake just beyond the trees. The air was cool and still, carrying only the clean scents of pine needles, damp soil, and woodsmoke. Aside from the occasional distant hoot of an owl or the almost inaudible lapping of water, the stillness was complete. They were deep in a secluded clearing, surrounded by ancient trees and the quiet presence of the water. A palpable sense of peace and tranquility settled over the small group, each person finding their own quiet space in the communal calm under the expansive night sky. location: nature mood: calm people: - age: 8 gender: female name: Lily - age: 38 gender: male name: Mark - age: 35 gender: female name: Sarah - age: 10 gender: male name: Tom time_of_day: night
1 The late afternoon sun cast long, gentle shadows across the quiet workspace. Sarah, a female aged 32, sat at her desk, her gaze fixed on the monitor in front of her. The only sounds were the low hum of the computer equipment and the occasional soft click of a mouse. It was clearly an office environment, with rows of tidy desks, filing cabinets lining one wall, and a whiteboard covered in faded markers. Across the aisle, Mark, a male aged 45, was leaning back in his chair, a half-empty mug of lukewarm coffee beside his keyboard. He was looking out the large window, his expression peaceful and relaxed. The air in the room felt still and calm, a sense of tranquil productivity pervading the space. Neither person spoke, content in the quiet routine of the workday winding down. The light filtering through the blinds indicated the sun was lower in the sky, firmly placing the time in the afternoon. Everything felt unhurried and serene. location: office mood: calm people: - age: 45 gender: male name: Mark - age: 32 gender: female name: Sarah time_of_day: afternoon
2 Sarah, a 28 year old female, nervously hugged her arms, the silence of the place pressing in. Beside her, Mark, a 35 year old male, kept glancing towards the darker corners of the room. They were the only two left. The office occupied the entire fifth floor, a sprawling maze of empty cubicles and silent computers under the glow of emergency lights. Desks stretched out into the gloom, monitors reflecting only the dim light. Outside the large windows, the city was a distant sparkle against the absolute black of the night sky. It was well after midnight, the building quiet and deserted. An unsettling stillness hung in the air, broken only by the faint, inexplicable sounds the old building seemed to make in the dark. A cold draft slithered through the space, raising goosebumps on Sarah's arms. Mark shifted his weight, the sound unnaturally loud. They both felt it, the palpable sense of not being alone, of unseen eyes watching them from the shadows that clung to the edges of the empty office. It was an oppressive, chilling feeling. location: office mood: scary people: - age: 35 gender: male name: Mark - age: 28 gender: female name: Sarah time_of_day: night
100 items in dataset.

Just like in a traditional ML project, we’ll split our dataset into training, validation, and testing subsets. We want to avoid testing on data that was seen during training. Note that the analogy isn’t perfect: some data from the validation set leaks into training as we provide feedback to the LLM on previous runs. The testing set, however, is clean.

training_dataset = dataset.iloc[:25].reset_index(drop=True)
validation_dataset = dataset.iloc[25:50].reset_index(drop=True)
testing_dataset = dataset.iloc[50:100].reset_index(drop=True)

print(f'training: {training_dataset.shape}')
display_df(training_dataset.tail(1))

print(f'validation: {validation_dataset.shape}')
display_df(validation_dataset.tail(1))

print(f'testing: {testing_dataset.shape}')
display_df(testing_dataset.tail(1))
training: (25, 2)
  input output
24 The air in The Rusty Tankard felt heavy, thick with the scent of stale beer and unspoken words. Outside, the streetlights cast long, weak shadows, the sky a deep, starless black. It was late, well past nine, the usual evening bustle having faded into a quiet murmur. Around a small, sticky table sat four people. There was David, a man of 58, his shoulders slumped, eyes fixed on the condensation ring his glass left on the wood. Across from him was his daughter, Maria, a woman of 32, her fingers tracing patterns on the tabletop, a deep sigh escaping her lips every so often. To David's left sat their neighbour, Mr. Henderson, a quiet man of 65, who simply held his drink, his gaze distant. Completing the group was Sarah, a young woman of 25, a friend of Maria's, who looked pale and drawn, her voice barely a whisper when she spoke. The clinking of glasses from the bar seemed miles away, unable to break the palpable sense of loss that hung over their small gathering. Each person seemed wrapped in their own private world of sorrow, the shared space offering little comfort against the cold reality they were facing together, yet alone. location: pub mood: sad people: - age: 58 gender: male name: David - age: 32 gender: female name: Maria - age: 65 gender: male name: Mr. Henderson - age: 25 gender: female name: Sarah time_of_day: night
validation: (25, 2)
  input output
24 The afternoon sun bathed the quiet street in a warm, golden light. Long, soft shadows stretched across the pavement, indicating the day was gently progressing. A sense of calm hung in the air, peaceful and still, disturbed only by the soft rustle of leaves on the mature trees lining the sidewalk. Clara, a female aged sixty five, sat on a simple wooden bench positioned off the main path. Her expression was one of deep peace, her hands resting calmly in her lap as she watched the scene. Nearby, a young boy named Leo, eight years old and male, rode a bicycle back and forth with slow, deliberate movements on the smooth asphalt of the street's edge. His focus was intense but his manner was unhurried. Approaching them was David, a man of thirty five years, walking with a relaxed gait. He carried a single bag of groceries, his shoulders loose, contributing to the overall tranquil feeling of the scene. As he neared the bench, a soft smile touched his lips. The three figures shared the peaceful afternoon space, each content in the quiet street setting. location: street mood: calm people: - age: 65 gender: female name: Clara - age: 35 gender: male name: David - age: 8 gender: male name: Leo time_of_day: afternoon
testing: (50, 2)
  input output
49 Elara, a woman aged 68, sat peacefully on a sun-warmed stone. Beside her, Liam, a boy of 12, carefully stacked smooth, flat stones he found in the streambed. They were nestled in a secluded clearing deep within a sprawling forest. Ancient oak and pine trees stood sentinel around the edges, their leaves whispering in the soft breeze. The air held the cool dampness of dawn, mingled with the fresh scent of moss and earth. Above, the sky was a pale blue canvas, painted with the first delicate strokes of golden light as the sun began its ascent over the distant hills. The sounds were hushed and gentle: the continuous, soothing trickle of the stream flowing over polished pebbles, the sleepy cooing of a dove hidden high in the canopy, the soft rustle of leaves. Elara closed her eyes for a moment, breathing deeply, a serene smile touching her lips. Liam, finished with his stone tower, simply watched the water flow, his young face reflecting the quiet peace of the surroundings. The tranquility of the morning was palpable, a quiet stillness that settled deep within them both. It was early morning, and a profound sense of calm enveloped the clearing. location: nature mood: calm people: - age: 68 gender: female name: Elara - age: 12 gender: male name: Liam time_of_day: morning

In the training process, we iteratively feed the samples from the training set to the LLM, along with a request to analyse the samples and craft a prompt for transforming the input into the output. We then apply the generated prompt to all the samples in our validation set, calculate the accuracy, and use the results as feedback for the LLM in a subsequent run. We continue iterating until we have a prompt that achieves high accuracy on the validation set.

def compare_responses(res1, res2):
    # Compare semantically: parse both YAML strings and compare the resulting objects.
    try:
        return yaml.safe_load(res1) == yaml.safe_load(res2)
    except yaml.YAMLError:
        return False
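Since the comparison happens on parsed YAML, cosmetic differences (indentation, key order) don’t count as mismatches, while different values or a different list order do. A quick illustration with made-up literals:

# Illustrative literals only - not taken from the dataset.
a = 'people:\n- name: Anya\n  age: 10'
b = 'people:\n  - age: 10\n    name: Anya'   # same data, different indentation and key order
c = 'people:\n- name: Ben\n  age: 10'        # different value

print(compare_responses(a, b))  # True
print(compare_responses(a, c))  # False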

async def discover_prompt(training_dataset, validation_dataset):
    epochs = []
    run_again = True

    while run_again:
        print(f'Epoch {len(epochs) + 1}\n\n')

        epoch_prompt = None

        training_sample_prompt = '<training-samples>\n'
        for i, row in training_dataset.iterrows():
            training_sample_prompt += (
                "<sample>\n"
                "<input>\n"
                + str(row['input']) + "\n"
                "</input>\n"
                "<output>\n"
                + str(row['output']) + "\n"
                "</output>\n"
                "</sample>\n"
            )
        training_sample_prompt += '</training-samples>'
        training_sample_prompt = dedent(training_sample_prompt)

        if len(epochs) == 0:
            epoch_prompt = dedent(f"""
            You are an expert AI engineer.
            Your goal is to create the most accurate and effective prompt for an LLM.
            Below you are provided with a set of training samples.
            Each sample consists of an input and an output.
            You should create a prompt that will generate the output given the input.

            Instructions: think carefully about the training samples to understand the exact transformation required.
            Output: output only the generated prompt, without any additional text or structure (no quoting, no JSON, no XML, etc...)

            {training_sample_prompt}
            """)
        else:
            epoch_prompt = dedent(f"""
            You are an expert AI engineer.
            Your goal is to create the most accurate and effective prompt for an LLM.
            Below you are provided with a set of training samples.
            Each sample consists of an input and an output.
            You should create a prompt that will generate the output given the input.

            Instructions: think carefully about the training samples to understand the exact transformation required.
            Output: output only the generated prompt, without any additional text or structure (no quoting, no JSON, no XML, etc...)

            You have information about the previous training epochs:
            <previous-epochs>
            {json.dumps(epochs)}
            </previous-epochs>

            You need to improve the prompt.
            Remember that you can rewrite the prompt completely if needed -
            the previous prompt is provided here for your review.
            
            {training_sample_prompt}
        """)

        transform_prompt = await gemini(epoch_prompt)

        validation_prompts = []
        expected = []
        for _, row in validation_dataset.iterrows():
            expected.append(str(row['output']))
            validation_prompts.append(f"""{transform_prompt}

            <input>
            {str(row['input'])}
            </input>
            """)

        results = await asyncio.gather(*(gemini(p) for p in validation_prompts))

        validation_results = [
            {'expected': exp, 'result': res, 'match': compare_responses(exp, res)}
            for exp, res in zip(expected, results)
        ]

        validation_accuracy = sum([1 for r in validation_results if r['match']]) / len(validation_results)
        epochs.append({
            'epoch_number': len(epochs),
            'prompt': transform_prompt,
            'validation_accuracy': validation_accuracy,
            'validation_results': validation_results
        })                

        print(f'New prompt:\n___\n{transform_prompt}\n___\n')
        print(f"Validation accuracy: {validation_accuracy:.2%}\n___\n\n")

        run_again = len(epochs) <= 23 and epochs[-1]['validation_accuracy'] <= 0.9
    return epochs[-1]['prompt'], epochs[-1]['validation_accuracy']
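One practical note: the validation step fires 25 concurrent requests via asyncio.gather (and the test run below fires 50). If you run into rate limits, a small concurrency guard with retries around gemini() helps. A minimal sketch follows, where the limit of 5 concurrent calls and the retry count are arbitrary assumptions; you could substitute gemini_limited(p) for gemini(p) in the gather calls.

_gemini_semaphore = asyncio.Semaphore(5)  # assumed concurrency limit; tune to your quota

async def gemini_limited(prompt, retries=3):
    # Bound concurrency and retry transient failures with exponential backoff.
    for attempt in range(retries):
        try:
            async with _gemini_semaphore:
                return await gemini(prompt)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)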

transform_prompt, transform_validation_accuracy = await discover_prompt(training_dataset, validation_dataset)

print(f"Transform prompt:\n___\n{transform_prompt}\n___\n")
print(f"Validation accuracy: {transform_validation_accuracy:.2%}\n___\n")
Epoch 1


New prompt:
___
You are an expert information extraction bot. Your task is to analyze the provided text and extract specific details about the scene described.

Extract the following information:
1.  `location`: Identify the primary setting (e.g., nature, office, street, home, pub).
2.  `mood`: Determine the prevailing atmosphere or emotional tone. Choose from 'calm', 'scary', 'sad', 'happy'.
3.  `people`: Create a list of each named individual mentioned. For each person, extract or infer their `age` (use a reasonable number if an approximate range is given), `gender` (choose from 'male', 'female', or 'other' if specified), and `name`. List the people in alphabetical order by name.
4.  `time_of_day`: Determine the time of day based on descriptions of light, activities, or explicit mentions. Choose from 'morning', 'afternoon', 'night'.

Format your output strictly as YAML, following this exact structure:

```text
location: <extracted location>
mood: <inferred mood>
people:
- age: <person 1 age>
  gender: <person 1 gender>
  name: <person 1 name>
- age: <person 2 age>
  gender: <person 2 gender>
  name: <person 2 name>
... # continue for all identified people, sorted alphabetically by name
time_of_day: <inferred time of day>
```

Ensure the list of people is correctly formatted with `-` for each item and indented sub-fields.
Provide *only* the YAML output. Do not include any introductory text, explanations, or surrounding characters like markdown fences (```).

Here is the text to process:
___

Validation accuracy: 4.00%
___


Epoch 2


New prompt:
___
Analyze the provided text to extract specific details and structure the output as YAML.

Extract the following information:
1.  `location`: Identify the primary setting. Choose from 'nature', 'office', 'street', 'home', 'pub'.
2.  `mood`: Determine the prevailing atmosphere or emotional tone. Choose from 'calm', 'scary', 'sad', 'happy'.
3.  `people`: Create a list of each named individual mentioned. For each person, extract their `name`, `age` (use a single, reasonable number if an approximate range is given), and `gender` (choose from 'male', 'female', or 'other' if specified). List the people in alphabetical order by `name`.
4.  `time_of_day`: Determine the time of day based on descriptions of light, activities, or explicit mentions. Choose from 'morning', 'afternoon', 'night'.

Format your output strictly as YAML, following this exact structure:

```text
location: <extracted location>
mood: <inferred mood>
people:
- age: <person 1 age>
  gender: <person 1 gender>
  name: <person 1 name>
- age: <person 2 age>
  gender: <person 2 gender>
  name: <person 2 name>
... # continue for all identified people, sorted alphabetically by name
time_of_day: <inferred time of day>
```

Ensure the list of people is correctly formatted with `-` for each item and indented sub-fields (`age`, `gender`, `name`). Maintain the exact indentation and spacing shown in the example.

Output *only* the YAML. Do not include any introductory text, explanations, or surrounding characters like markdown fences (```).
___

Validation accuracy: 40.00%
___


Epoch 3


New prompt:
___
You are an expert AI assistant specializing in structured data extraction from text. Your task is to analyze the provided scene description and extract specific details, presenting them in a precise YAML format.

Extract the following information:
1.  `location`: Identify the primary setting. Choose one of: 'nature', 'office', 'street', 'home', 'pub'.
2.  `mood`: Determine the dominant atmosphere or emotional tone. Choose one of: 'calm', 'scary', 'sad', 'happy'.
3.  `people`: Identify every named individual. For each person, extract or infer their `name`, `age`, and `gender`.
    *   `name`: The full name as it appears, including titles like 'Mr.', 'Mrs.', etc., if used.
    *   `age`: A single integer number. Extract the exact age if explicitly stated. If a range (e.g., 'late fifties', 'around sixty') or approximation is given, determine the most reasonable single integer based on context and the provided examples (e.g., 'late fifties' -> 58, 'early sixties' -> 62, 'around sixty' -> 60, 'perhaps fifty' -> 50, 'mid-thirties' -> 35).
    *   `gender`: Determine their gender. Choose one of: 'male', 'female', or 'other'. Infer if necessary based on pronouns, names, or titles (e.g., 'Mr.' implies male, 'Mrs.' implies female). Use 'other' only if explicitly stated or strongly implied as non-binary.
    List these individuals alphabetically by their `name`.
4.  `time_of_day`: Determine the time period based on context clues (light, activities, explicit mentions). Choose one of: 'morning', 'afternoon', 'night'.

Format your output strictly as YAML. The top-level keys (`location`, `mood`, `people`, `time_of_day`) should be at the root. `people` should be a YAML list where each item is an object (`-` followed by indented keys) with keys `age`, `gender`, and `name`. Maintain the *exact* structure and indentation shown in the training samples.

**CRITICAL:** Your output must be *only* the YAML content. Do not include any introductory text, explanations, conversation, or surrounding characters like markdown fences (```text) or similar wrappers.

Here is the text to process:
___

Validation accuracy: 68.00%
___


Epoch 4


New prompt:
___
Extract specific information from the provided text about a scene and format it strictly as YAML.

Fields to extract:
1.  `location`: The primary setting. Must be one of: nature, office, street, home, pub.
2.  `mood`: The dominant atmosphere or emotional tone. Must be one of: calm, scary, sad, happy.
3.  `people`: A list of each named individual. For each person:
    *   `name`: The full name, including titles (e.g., Mr., Mrs.) if present.
    *   `age`: A single integer. Use the exact age if given. If a range or approximation is given, estimate a reasonable single integer (e.g., 'late fifties' -> 58, 'early sixties' -> 62, 'around sixty' -> 60, 'perhaps fifty' -> 50, 'mid-thirties' -> 35, 'early twenties' -> 22, 'late twenties' -> 28).
    *   `gender`: The gender. Must be one of: male, female, other. Infer if necessary using pronouns, names, or titles. Use 'other' only if explicitly stated or strongly implied as non-binary.
    The list of people must be sorted alphabetically by `name`.
4.  `time_of_day`: The time period. Must be one of: morning, afternoon, night.

Output Format:
Produce the extracted information in the following YAML structure. Maintain the *exact* indentation and spacing as shown below.

location: <extracted location>
mood: <inferred mood>
people:
  - age: <person 1 age>
    gender: <person 1 gender>
    name: <person 1 name>
  - age: <person 2 age>
    gender: <person 2 gender>
    name: <person 2 name>
... # continue for all identified people, sorted alphabetically by name
time_of_day: <inferred time of day>

Strict Output Requirement:
Your response must contain *only* the YAML content specified above. Do not include any markdown code block fences (```text or ```), introductory sentences, conversational text, explanations, or any other characters before or after the YAML. Start your response immediately with the YAML, at the very first character.

Text to process:

___

Validation accuracy: 96.00%
___


Transform prompt:
___
Extract specific information from the provided text about a scene and format it strictly as YAML.

Fields to extract:
1.  `location`: The primary setting. Must be one of: nature, office, street, home, pub.
2.  `mood`: The dominant atmosphere or emotional tone. Must be one of: calm, scary, sad, happy.
3.  `people`: A list of each named individual. For each person:
    *   `name`: The full name, including titles (e.g., Mr., Mrs.) if present.
    *   `age`: A single integer. Use the exact age if given. If a range or approximation is given, estimate a reasonable single integer (e.g., 'late fifties' -> 58, 'early sixties' -> 62, 'around sixty' -> 60, 'perhaps fifty' -> 50, 'mid-thirties' -> 35, 'early twenties' -> 22, 'late twenties' -> 28).
    *   `gender`: The gender. Must be one of: male, female, other. Infer if necessary using pronouns, names, or titles. Use 'other' only if explicitly stated or strongly implied as non-binary.
    The list of people must be sorted alphabetically by `name`.
4.  `time_of_day`: The time period. Must be one of: morning, afternoon, night.

Output Format:
Produce the extracted information in the following YAML structure. Maintain the *exact* indentation and spacing as shown below.

location: <extracted location>
mood: <inferred mood>
people:
  - age: <person 1 age>
    gender: <person 1 gender>
    name: <person 1 name>
  - age: <person 2 age>
    gender: <person 2 gender>
    name: <person 2 name>
... # continue for all identified people, sorted alphabetically by name
time_of_day: <inferred time of day>

Strict Output Requirement:
Your response must contain *only* the YAML content specified above. Do not include any markdown code block fences (```text or ```), introductory sentences, conversational text, explanations, or any other characters before or after the YAML. Start your response immediately with the YAML, at the very first character.

Text to process:

___

Validation accuracy: 96.00%
___

Pretty cool! In only a few steps, we managed to refine the prompt and push validation accuracy from 4% to 96%.

Let’s try the resulting prompt on our testing set. Can it perform as well on examples it hasn’t encountered yet?

async def test_prompt(prompt_to_test, test_data):
    """Tests the given prompt against the test dataset and reports accuracy."""
    test_prompts = []
    expected_outputs = []
    for _, row in test_data.iterrows():
        expected_outputs.append(str(row['output']))
        test_prompts.append(f"""{prompt_to_test}

<input>
{str(row['input'])}
</input>
""")

    print(f"Running test on {len(test_prompts)} samples...")
    results = await asyncio.gather(*(gemini(p) for p in test_prompts))
    print("Testing complete.")

    test_results = [
        {'input': test_data.iloc[i]['input'], 'expected': exp, 'result': res, 'match': compare_responses(exp, res)}
        for i, (exp, res) in enumerate(zip(expected_outputs, results))
    ]

    test_accuracy = sum([1 for r in test_results if r['match']]) / len(test_results)
    
    # Display mismatches
    mismatches = [r for r in test_results if not r['match']]
    if mismatches:
        print(f"\nFound {len(mismatches)} mismatches:")
        for i, mismatch in enumerate(mismatches[:5]): # Show first 5 mismatches
             md(f"""**Mismatch {i+1}:**
Input:
```
{mismatch['input']}
```
Expected:
```
{mismatch['expected']}
```
Result:
```
{mismatch['result']}
```
___""")
    else:
        print("\nNo mismatches found!")


    return test_accuracy, test_results

test_accuracy, test_results_details = await test_prompt(transform_prompt, testing_dataset)

print(f"\nTesting Accuracy: {test_accuracy:.2%}")
Running test on 50 samples...
Testing complete.

Found 2 mismatches:

Mismatch 1: Input:

Anya, a small female of ten years, kept close to her older brother, Ben. Ben, a boy of twelve, tried to walk with confidence, but his twelve year old male legs hurried slightly on the slick, wet cobblestones of Elm Street. It was a narrow street where tall, dark buildings seemed to lean in over them, almost touching the grey sky above. The early morning mist hung thick, clinging to the damp stone walls and obscuring details further down the road. It was morning, but the light was weak and sickly, barely pushing back the deep shadows. The street was deserted, utterly silent save for their quickening footsteps. No cars, no other people hurrying to work, just an unnerving quiet that pressed in on them.

Ahead, standing perfectly still near a shadowed doorway, was a person. It was Mr. Silas, a male figure in his fifties. He wore a dark coat that seemed to absorb the little light available. He didn't move a muscle, just watched them with eyes that seemed too wide and unnaturally bright in the dim light. His face was pale, unsmiling, a fixed, unsettling expression. Anya whimpered softly, her ten year old female hand tightening its death grip on Ben's sleeve. Ben swallowed hard, his twelve year old male bravado melting away like the mist. The air felt colder now, heavy with unspoken dread. They quickened their pace, their footsteps echoing unnervingly on the wet stone, acutely aware of the fifty year old male's unblinking stare following their progress down the deserted morning street. The oppressive silence felt like a physical weight, and the only thing they could think of was getting away from the still figure watching them.

Expected:

location: street
mood: scary
people:
- age: 10
  gender: female
  name: Anya
- age: 12
  gender: male
  name: Ben
- age: 50
  gender: male
  name: Mr. Silas
time_of_day: morning

Result:

location: street
mood: scary
people:
  - age: 10
    gender: female
    name: Anya
  - age: 12
    gender: male
    name: Ben
  - age: 55
    gender: male
    name: Mr. Silas
time_of_day: morning

Mismatch 2: Input:

Anya, a female aged 32, leaned back in her ergonomic chair, a wide, triumphant smile illuminating her face. Across the large, dark-wood desk from her sat Ben, a male aged 28, shaking his head slightly in disbelief, a soft chuckle escaping his lips. The only light in the vast space came from the small, warm lamp on Anya's desk, casting long, distorted shadows across the silent rows of gray cubicles and towering filing cabinets stretching into the deep darkness of the deserted office. Outside the tall, uncurtained windows, the city below sparkled with a million tiny lights against the deep black canvas of the night sky; it was clearly well past the time most people were home and asleep. A forgotten cleaning cart stood parked near the elevator bank down the silent hall, a clear sign of the late hour. The air felt still and quiet, the usual daytime energy of ringing phones and clicking keyboards having long since dissipated, leaving a peaceful calm. Ben pushed a final stack of printed papers to the side, a triumphant grin spreading across his face. "We actually did it, Anya," he said, his voice resonating with a mix of exhaustion, relief, and genuine pleasure. "Finished! Hours logged, report submitted, everything done. The big project is officially complete!" Anya laughed, a clear, light, joyful sound that echoed slightly in the quiet, empty space, but it wasn't lonely. They exchanged a look of shared endurance and pure, unadulterated happiness, the late hour and the deserted office forgotten in the warm glow of their success and the bright feeling of elation.

Expected:

location: office
mood: happy
people:
- age: 32
  gender: female
  name: Anya
- age: 28
  gender: male
  name: Ben
time_of_day: night

Result:

location: office
mood: happy
people:
  - age: 28
    gender: male
    name: Ben
  - age: 32
    gender: female
    name: Anya
time_of_day: night


Testing Accuracy: 96.00%

Not perfect, but very high accuracy for very little effort. The two mismatches are minor: one age extracted as 55 where the text states 50, and one people list that wasn’t sorted alphabetically.
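If you want to take the discovered prompt beyond this notebook, one option is to persist it and wrap it in a small helper for new inputs. This is a hedged sketch: the file name and parse_story helper are illustrative, not part of the original flow.

# Save the discovered prompt for later reuse.
with open('transform_prompt.txt', 'w') as f:
    f.write(transform_prompt)

async def parse_story(story_text):
    # Apply the discovered prompt to a new story and parse the YAML it returns.
    response = await gemini(f'{transform_prompt}\n\n<input>\n{story_text}\n</input>\n')
    return yaml.safe_load(response)

# Usage (with any new story text): parsed = await parse_story(new_story)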

In this example:

1. We provided a dataset, but no instructions on how to prompt to achieve the transformation from inputs to outputs.
2. We iteratively fed a subset of our samples to the LLM, getting it to discover an effective prompt.
3. Testing the resulting prompt, we can see that it performs well on new examples.

Datasets really are all you need!


PS If you liked this demo and are looking for more, visit my AI Expertise hub and subscribe to my newsletter (low volume, high value).