Hosted with nbsanity. See source notebook on GitHub.


Advent of Haystack: Day 1

In this first challenge, we are going to build a pipeline that answers questions based on the contents of a URL. The given pipeline works as is, but your task is to complete the connections and add the two other required components 👇

You should complete the missing sections in steps 3, 4, and 5.

Initial Components to use:

https://docs.haystack.deepset.ai/docs/documentsplitter
https://docs.haystack.deepset.ai/docs/cohereranker

1) Installation

Install the haystack-ai, cohere-haystack, trafilatura and sentence-transformers packages with pip:

!pip install -q haystack-ai cohere-haystack trafilatura "sentence-transformers>=3.0.0"
!pip install -q --upgrade openai # not to get the OpenAI proxies error: https://community.openai.com/t/error-with-openai-1-56-0-client-init-got-an-unexpected-keyword-argument-proxies/1040332/2

(Optional) Change the Logging Level

This way, not only warnings but also informational and debug messages from Haystack are displayed in the console output.

import logging

logging.basicConfig(
    format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING
)
logging.getLogger("haystack").setLevel(logging.DEBUG)

2) Enter API keys for OpenAI and Cohere

To use OpenAI and Cohere models, save your API keys as the OPENAI_API_KEY and COHERE_API_KEY environment variables.

from getpass import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
os.environ["COHERE_API_KEY"] = getpass("Enter Cohere API key:")
Enter OpenAI API key:··········
Enter Cohere API key:··········

3) Create components

Initialize two components to complete the pipeline.

Hints:

- One component should split the documents into smaller chunks of 10 sentences each.
- For the other component, consider how you can filter out irrelevant chunks by ranking them before injecting the context into the prompt.
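As a mental model for the first hint, sentence-based chunking can be sketched in plain Python. This is a toy stand-in, not Haystack's actual DocumentSplitter implementation (which handles sentence boundaries and overlap more carefully):

```python
# Toy stand-in for sentence-based chunking: split on '.', then group
# sentences into fixed-size windows of `split_length` sentences each.
def chunk_sentences(text, split_length=10, split_overlap=0):
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    step = split_length - split_overlap
    return [
        " ".join(sentences[i : i + split_length])
        for i in range(0, len(sentences), step)
    ]

doc = " ".join(f"Sentence number {i}." for i in range(1, 24))
chunks = chunk_sentences(doc, split_length=10)
print(len(chunks))  # -> 3 chunks: 10 + 10 + 3 sentences
```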

from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.preprocessors import DocumentSplitter
from haystack_integrations.components.rankers.cohere import CohereRanker


fetcher = LinkContentFetcher()
converter = HTMLToDocument()
splitter = DocumentSplitter(split_by="sentence", split_length=10, split_overlap=0)
ranker = CohereRanker()
template = """Given the information below, answer the query. Only use the provided context to generate the answer and output the used document links
            Context:
            {% for document in documents %}
                {{ document.content }}
                URL: {{ document.meta.url }}
            {% endfor %}

            Question: {{ query }}
            Answer:"""

prompt_builder = PromptBuilder(template=template)
generator = OpenAIGenerator(model="gpt-4o-mini")
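To see what the prompt_builder hands to the generator, here is a pure-Python stand-in for the Jinja rendering above (illustrative only; the real rendering happens inside PromptBuilder via Jinja2):

```python
# Stand-in showing roughly what the rendered prompt looks like:
# each ranked document contributes its content plus its source URL.
def build_prompt(documents, query):
    context = "\n".join(
        f"{d['content']}\nURL: {d['meta']['url']}" for d in documents
    )
    return (
        "Given the information below, answer the query. Only use the provided "
        "context to generate the answer and output the used document links\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

docs = [
    {
        "content": "Query expansion improves recall.",
        "meta": {"url": "https://haystack.deepset.ai/blog/query-expansion"},
    }
]
prompt = build_prompt(docs, "How can I improve retrieval?")
```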

4) Add them to a Haystack Pipeline

from haystack import Pipeline

pipeline = Pipeline()
pipeline.add_component(name="fetcher", instance=fetcher)
pipeline.add_component(name="converter", instance=converter)

pipeline.add_component(name="splitter", instance=splitter)
pipeline.add_component(name="ranker", instance=ranker)

pipeline.add_component(name="prompt_builder", instance=prompt_builder)
pipeline.add_component(name="generator", instance=generator)

5) Connect the components

Complete the pipeline connections to achieve a working pipeline that can be run.

Note: Every time you want to re-run cells 4 and 5, you need to re-run cell 3 first to reinitialize all components.

pipeline.connect("fetcher.streams", "converter.sources")
pipeline.connect("converter.documents", "splitter.documents")
pipeline.connect("splitter.documents", "ranker.documents")
pipeline.connect("ranker.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder", "generator")
DEBUG:haystack.core.pipeline.base:Connecting 'fetcher.streams' to 'converter.sources'
DEBUG:haystack.core.pipeline.base:Connecting 'converter.documents' to 'splitter.documents'
DEBUG:haystack.core.pipeline.base:Connecting 'splitter.documents' to 'ranker.documents'
DEBUG:haystack.core.pipeline.base:Connecting 'ranker.documents' to 'prompt_builder.documents'
DEBUG:haystack.core.pipeline.base:Connecting 'prompt_builder.prompt' to 'generator.prompt'
<haystack.core.pipeline.pipeline.Pipeline object at 0x7c306ac8a620>
🚅 Components
  - fetcher: LinkContentFetcher
  - converter: HTMLToDocument
  - splitter: DocumentSplitter
  - ranker: CohereRanker
  - prompt_builder: PromptBuilder
  - generator: OpenAIGenerator
🛤️ Connections
  - fetcher.streams -> converter.sources (List[ByteStream])
  - converter.documents -> splitter.documents (List[Document])
  - splitter.documents -> ranker.documents (List[Document])
  - ranker.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> generator.prompt (str)

6) Visualize the Pipeline 🎨

Display the pipeline image with the show() method to understand the connections between components.

pipeline.show()

7) Run the Pipeline

Use the URLs below and try the example queries.

urls = [
    "https://haystack.deepset.ai/blog/extracting-metadata-filter",
    "https://haystack.deepset.ai/blog/query-expansion",
    "https://haystack.deepset.ai/blog/query-decomposition",
    "https://haystack.deepset.ai/cookbook/metadata_enrichment",
]

queries = [
    "What is the difference between metadata filtering and metadata enrichment?",
    "Which methods can I use to transform query for better retrieval?",
    "How can I use metadata to improve retrieval?",
    "What's preprocessing?",  # Should return no answer
]

## Add parameters
for query in queries:
    result = pipeline.run(
        data={
            "fetcher": {"urls": urls},
            "ranker": {"query": query, "top_k": 2},
            "prompt_builder": {"query": query},
        }
    )
    print(f"\n\n{query=}")
    # Print the LLM response
    print(result["generator"]["replies"][0])
    print("-" * 100)
INFO:haystack.core.pipeline.pipeline:Running component fetcher
INFO:haystack.core.pipeline.pipeline:Running component converter
INFO:haystack.core.pipeline.pipeline:Running component splitter
INFO:haystack.core.pipeline.pipeline:Running component ranker
INFO:haystack.core.pipeline.pipeline:Running component prompt_builder
INFO:haystack.core.pipeline.pipeline:Running component generator
INFO:haystack.core.pipeline.pipeline:Running component fetcher
INFO:haystack.core.pipeline.pipeline:Running component converter


query='What is the difference between metadata filtering and metadata enrichment?'
Metadata filtering is a process of applying filters to queries that limit the scope of search results based on existing metadata attributes, such as date, topic, or source. It helps in retrieving documents that match specific metadata criteria, ensuring that the results are relevant to the user's query.

On the other hand, metadata enrichment involves the process of automatically enhancing the metadata of documents by extracting new information and adding custom attributes. This is done using a Pydantic Model to define metadata fields and using a custom component to populate these fields with relevant data, thus improving the quality and usability of the metadata.

In summary, while metadata filtering focuses on retrieving documents based on predefined metadata characteristics, metadata enrichment is about adding new and structured metadata to improve document categorization and retrieval.

Used document links:
- https://haystack.deepset.ai/cookbook/metadata_enrichment
- https://haystack.deepset.ai/blog/extracting-metadata-filter
----------------------------------------------------------------------------------------------------
INFO:haystack.core.pipeline.pipeline:Running component splitter
INFO:haystack.core.pipeline.pipeline:Running component ranker
INFO:haystack.core.pipeline.pipeline:Running component prompt_builder
INFO:haystack.core.pipeline.pipeline:Running component generator
INFO:haystack.core.pipeline.pipeline:Running component fetcher
INFO:haystack.core.pipeline.pipeline:Running component converter


query='Which methods can I use to transform query for better retrieval?'
To transform queries for better retrieval, you can use methods such as:

1. **Query Decomposition**: This method breaks down a complex question into smaller, independent sub-questions. By answering these sub-questions individually, you can reason about the final answer, which enhances the retrieval process.

2. **Query Expansion**: This method involves expanding keyword queries to improve recall and provide more context for the retrieval process. It helps in increasing the number of results, thereby improving the chances of retrieving relevant information.

These techniques can significantly enhance the effectiveness of retrieval augmented generation (RAG) systems.

**Used Document Links:**
- https://haystack.deepset.ai/blog/query-decomposition
- https://haystack.deepset.ai/blog/query-expansion
----------------------------------------------------------------------------------------------------
INFO:haystack.core.pipeline.pipeline:Running component splitter
INFO:haystack.core.pipeline.pipeline:Running component ranker
INFO:haystack.core.pipeline.pipeline:Running component prompt_builder
INFO:haystack.core.pipeline.pipeline:Running component generator
INFO:haystack.core.pipeline.pipeline:Running component fetcher
INFO:haystack.core.pipeline.pipeline:Running component converter


query='How can I use metadata to improve retrieval?'
You can use metadata to improve retrieval by implementing metadata filtering in your Retrieval-Augmented Generation (RAG) applications. Metadata allows you to categorize, sort, and filter information based on attributes like date, topic, source, or other relevant information. By applying metadata filters to your queries, you can limit the search space and ensure that the retrieved documents align well with the specific query or topic, enhancing relevance and accuracy. This leads to improved efficiency in processing and retrieval response times.

For example, if you filter based on metadata such as the year and company, you can ensure that only relevant documents are returned for queries about revenue increases. This enhances the quality of the generated responses by providing precise context.

Used document links:
- https://haystack.deepset.ai/blog/extracting-metadata-filter
----------------------------------------------------------------------------------------------------
INFO:haystack.core.pipeline.pipeline:Running component splitter
INFO:haystack.core.pipeline.pipeline:Running component ranker
INFO:haystack.core.pipeline.pipeline:Running component prompt_builder
INFO:haystack.core.pipeline.pipeline:Running component generator


query="What's preprocessing?"
Preprocessing in the context of LLM-based applications refers to the process of extracting metadata filters from a natural language query before using it with a retriever. This involves identifying valuable hints in the query that can improve retrieval accuracy, such as identifying specific years or companies mentioned. For example, in the query "What was the revenue of Nvidia in 2022?", the year "2022" and the company "Nvidia" can be extracted and formulated into structured metadata filters to enhance the information retrieval process.

Used Document Links:
1. https://haystack.deepset.ai/blog/extracting-metadata-filter
2. https://haystack.deepset.ai/blog/query-decomposition
----------------------------------------------------------------------------------------------------

(Optional) Bonus Task

Try to recreate the same pipeline, but this time use OpenAIChatGenerator and ChatPromptBuilder.

from haystack import Pipeline
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack.components.preprocessors import DocumentSplitter
from haystack_integrations.components.rankers.cohere import CohereRanker


fetcher = LinkContentFetcher()
converter = HTMLToDocument()
splitter = DocumentSplitter(split_by="sentence", split_length=10, split_overlap=0)
ranker = CohereRanker()
template = """Given the information below, answer the query. Only use the provided context to generate the answer and output the used document links
            Context:
            {% for document in documents %}
                {{ document.content }}
                URL: {{ document.meta.url }}
            {% endfor %}

            Question: {{ query }}
            Answer:"""


system_message = """You are a technical expert. Use only the provided content and source URLs to answer questions. Don't use your own knowledge.
Do not add any extra information or search the web.
Keep your answers clear, accurate, and to the point, including the document links you used.
If the documents don’t have the answer, say "no answer"
"""

system_message = ChatMessage.from_system(system_message)
messages = [
    system_message,
    ChatMessage.from_user(template),
]  # passed to the ChatPromptBuilder as its template in pipeline.run


pipeline = Pipeline()
pipeline.add_component(name="fetcher", instance=fetcher)
pipeline.add_component(name="converter", instance=converter)

pipeline.add_component(name="splitter", instance=splitter)
pipeline.add_component(name="ranker", instance=ranker)

pipeline.add_component(
    name="prompt_builder", instance=ChatPromptBuilder(variables=["query", "documents"])
)
pipeline.add_component(
    name="generator", instance=OpenAIChatGenerator(model="gpt-4o-mini")
)

pipeline.connect("fetcher.streams", "converter.sources")
pipeline.connect("converter.documents", "splitter.documents")
pipeline.connect("splitter.documents", "ranker.documents")
pipeline.connect("ranker.documents", "prompt_builder")
pipeline.connect("prompt_builder", "generator")
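The notebook stops before running this chat pipeline. As a continuation sketch (reusing the urls and queries from step 7, and requiring valid OpenAI and Cohere API keys), the run mirrors step 7, except the ChatMessage list is passed to the prompt_builder as its template at run time, and each reply is a ChatMessage rather than a plain string; recent Haystack releases expose its text via the .text attribute (older ones used .content):

```python
# Continuation sketch: run the chat pipeline (not shown in the original notebook).
for query in queries:
    result = pipeline.run(
        data={
            "fetcher": {"urls": urls},
            "ranker": {"query": query, "top_k": 2},
            # ChatPromptBuilder receives the [system, user] message list
            # as its template, plus the `query` template variable:
            "prompt_builder": {"template": messages, "query": query},
        }
    )
    print(f"\n\n{query=}")
    # Chat generators reply with ChatMessage objects, not plain strings:
    print(result["generator"]["replies"][0].text)
    print("-" * 100)
```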
DEBUG:haystack.core.pipeline.base:Connecting 'ranker.documents' to 'prompt_builder.documents'
DEBUG:haystack.core.pipeline.base:Connecting 'prompt_builder.prompt' to 'generator.messages'
<haystack.core.pipeline.pipeline.Pipeline object at 0x7c3068256200>
🚅 Components
  - fetcher: LinkContentFetcher
  - converter: HTMLToDocument
  - splitter: DocumentSplitter
  - ranker: CohereRanker
  - prompt_builder: ChatPromptBuilder
  - generator: OpenAIChatGenerator
🛤️ Connections
  - fetcher.streams -> converter.sources (List[ByteStream])
  - converter.documents -> splitter.documents (List[Document])
  - splitter.documents -> ranker.documents (List[Document])
  - ranker.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> generator.messages (List[ChatMessage])
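To build intuition for what the `splitter` contributes to this graph, here is a rough plain-Python illustration of sentence-based chunking with 10 sentences per chunk (the hint from step 3). This is a naive sketch, not Haystack's actual `DocumentSplitter` implementation, which also handles overlap, metadata, and trickier sentence boundaries:

```python
# Naive sketch of sentence-based chunking: split on sentence-ending
# punctuation, then group every N sentences into one chunk.
import re

def split_into_chunks(text: str, sentences_per_chunk: int = 10) -> list[str]:
    # Crude sentence-boundary detection: ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        " ".join(sentences[i : i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]

chunks = split_into_chunks("One. Two. Three.", sentences_per_chunk=2)
# chunks == ["One. Two.", "Three."]
```

Smaller chunks like these give the ranker finer-grained units to score, so only the passages most relevant to the query end up in the prompt.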
pipeline.show()
(Renders a Mermaid diagram of the pipeline graph: fetcher → converter → splitter → ranker → prompt_builder → generator, with `urls` and `query` as pipeline inputs and `replies` as the output.)

5) Add parameters
for query in queries:
    result = pipeline.run(
        data={
            "fetcher": {"urls": urls},
            "ranker": {"query": query, "top_k": 2},
            "prompt_builder": {
                "template_variables": {"query": query},
                "template": messages,
            },
        }
    )
    print(f"\n\n{query=}")
    # Print the LLM response
    print(result["generator"]["replies"][0].content)
    print("-" * 100)
INFO:haystack.core.pipeline.pipeline:Running component fetcher
INFO:haystack.core.pipeline.pipeline:Running component converter
INFO:haystack.core.pipeline.pipeline:Running component splitter
INFO:haystack.core.pipeline.pipeline:Running component ranker
INFO:haystack.core.pipeline.pipeline:Running component prompt_builder
INFO:haystack.core.pipeline.pipeline:Running component generator
INFO:haystack.core.pipeline.pipeline:Running component fetcher
INFO:haystack.core.pipeline.pipeline:Running component converter


query='What is the difference between metadata filtering and metadata enrichment?'
No answer.
----------------------------------------------------------------------------------------------------
INFO:haystack.core.pipeline.pipeline:Running component splitter
INFO:haystack.core.pipeline.pipeline:Running component ranker
INFO:haystack.core.pipeline.pipeline:Running component prompt_builder
INFO:haystack.core.pipeline.pipeline:Running component generator
INFO:haystack.core.pipeline.pipeline:Running component fetcher
INFO:haystack.core.pipeline.pipeline:Running component converter


query='Which methods can I use to transform query for better retrieval?'
You can use the following methods to transform queries for better retrieval:

1. **Query Decomposition**: Break down the original question into smaller sub-questions that can be answered independently.
2. **Query Expansion**: Expand keyword queries to improve recall, thus retrieving more relevant context.

Used Document Links:
- https://haystack.deepset.ai/blog/query-decomposition
- https://haystack.deepset.ai/blog/query-expansion
----------------------------------------------------------------------------------------------------
INFO:haystack.core.pipeline.pipeline:Running component splitter
INFO:haystack.core.pipeline.pipeline:Running component ranker
INFO:haystack.core.pipeline.pipeline:Running component prompt_builder
INFO:haystack.core.pipeline.pipeline:Running component generator
INFO:haystack.core.pipeline.pipeline:Running component fetcher
INFO:haystack.core.pipeline.pipeline:Running component converter


query='How can I use metadata to improve retrieval?'
You can use metadata to improve retrieval by implementing metadata filtering, which narrows down the search space based on specific attributes such as domain, source, date, or topic. This enhances the relevance, accuracy, and efficiency of the information being retrieved, ensuring that the generated responses are aligned with the specific query or topic.

URL: https://haystack.deepset.ai/blog/extracting-metadata-filter
----------------------------------------------------------------------------------------------------
INFO:haystack.core.pipeline.pipeline:Running component splitter
INFO:haystack.core.pipeline.pipeline:Running component ranker
INFO:haystack.core.pipeline.pipeline:Running component prompt_builder
INFO:haystack.core.pipeline.pipeline:Running component generator


query="What's preprocessing?"
no answer

Used document links: 
- https://haystack.deepset.ai/blog/extracting-metadata-filter
- https://haystack.deepset.ai/blog/query-decomposition
----------------------------------------------------------------------------------------------------
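The `top_k=2` passed to the ranker above means only the two highest-scoring chunks reach the prompt. Conceptually, the reranking step reduces to scoring each chunk against the query and keeping the best k. A plain-Python sketch with made-up scores (the real `CohereRanker` obtains relevance scores from Cohere's rerank endpoint):

```python
# Conceptual top-k selection as a reranker performs it: sort chunks by
# relevance score (highest first) and keep only the best k. Scores here
# are invented for illustration.
def top_k_by_score(scored_chunks: list[tuple[str, float]], k: int) -> list[str]:
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _score in ranked[:k]]

scored = [
    ("chunk about document splitting", 0.2),
    ("chunk about metadata filtering", 0.9),
    ("chunk about query expansion", 0.7),
]
print(top_k_by_score(scored, k=2))
# ['chunk about metadata filtering', 'chunk about query expansion']
```

Tuning `top_k` trades recall against prompt size: a larger value gives the LLM more context but also more noise, which is why the "no answer" cases above can sometimes be fixed by raising it.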