Skip to content

Web Search and Summarization

Search the web for any topic, collect the top results, and distil them into a structured summary using an LLM. The result is persisted to the blackboard so downstream pipelines can read it.

Tools used: search_web, llm_job, store
Requires: LLM API key (OPENAI_API_KEY or equivalent)


The pipeline

pipelines/web_search_summarize.yaml
pipeline:
  id: web_search_summarize
  goal: "Search for '{{params.topic}}' and produce a structured summary"

  params:
    topic:
      type: string
      description: "Search topic or question"
    num_results:
      type: integer
      description: "Number of web results to retrieve"
      default: 5
    store_key:
      type: string
      description: "Blackboard key to write the summary to"
      default: web_summary

  tasks:
    - id: search
      tool: search_web
      inputs:
        query: "{{params.topic}}"
        top_n: "{{params.num_results}}"

    - id: summarize
      tool: llm_job
      inputs:
        prompt: |
          You are a research assistant. Review the web search results below and
          produce a concise summary with three sections:

          1. **Key findings** — 3–5 bullet points distilling the most important facts
          2. **Notable sources** — list the 2–3 most authoritative URLs
          3. **Gaps** — what the results do not cover or answer

          Keep each section brief and factual. Do not pad with filler.
        results: "{{search.output.results}}"

    - id: persist
      tool: store
      inputs:
        key: "{{params.store_key}}"
        value: "{{summarize.output}}"

How it works

  • search_web queries the web (DuckDuckGo by default; SerpAPI if configured) and returns a list of {title, snippet, url, source_query} objects.
  • llm_job receives the results list as the results context key. Trellis prepends it to the prompt automatically, so the model sees the raw result objects alongside the instruction.
  • store writes the LLM response string to the blackboard under the configured key.

The three tasks form a linear chain: each wave runs one task. summarize cannot start until search has completed because {{search.output.results}} must be resolved first.


Run it — Python SDK

Save the YAML above as pipelines/web_search_summarize.yaml, then:

import asyncio
from trellis.models.pipeline import Pipeline
from trellis.execution.orchestrator import Orchestrator

async def main():
    pipeline = Pipeline.from_yaml_file("pipelines/web_search_summarize.yaml")
    orch = Orchestrator()

    result = await orch.run_pipeline(
        pipeline,
        params={
            "topic": "Apple Inc earnings Q1 2025",
            "num_results": 5,
        },
    )

    # The LLM summary string
    print(result.outputs["summarize"])

    # What was written to the blackboard
    print(result.outputs["persist"])

asyncio.run(main())

Or define the pipeline inline without a file:

import asyncio
from trellis.models.pipeline import Pipeline
from trellis.execution.orchestrator import Orchestrator

YAML = """
pipeline:
  id: web_search_summarize
  goal: Search and summarize
  params:
    topic:
      type: string
  tasks:
    - id: search
      tool: search_web
      inputs:
        query: "{{params.topic}}"
        top_n: 5
    - id: summarize
      tool: llm_job
      inputs:
        prompt: "Summarize these results in 3 bullet points."
        results: "{{search.output.results}}"
"""

async def main():
    pipeline = Pipeline.from_yaml(YAML)
    orch = Orchestrator()
    result = await orch.run_pipeline(pipeline, params={"topic": "Python 3.13 release"})
    print(result.outputs["summarize"])

asyncio.run(main())

Run it — CLI

trellis run pipelines/web_search_summarize.yaml \
  --params '{"topic": "Apple Inc earnings Q1 2025", "num_results": 5}'

Expected output

result.outputs is keyed by task ID:

{
    "search": {
        "status": "success",
        "results": [
            {
                "title": "Apple Reports First Quarter Results",
                "snippet": "Apple today announced financial results for its fiscal 2025 first quarter...",
                "url": "https://www.apple.com/newsroom/2025/01/apple-reports-first-quarter-results/",
                "source_query": "Apple Inc earnings Q1 2025"
            },
            {
                "title": "AAPL Q1 2025 Earnings Beat Estimates",
                "snippet": "Apple's revenue came in at $124.3 billion, ahead of analyst expectations...",
                "url": "https://finance.yahoo.com/...",
                "source_query": "Apple Inc earnings Q1 2025"
            },
            # ... up to num_results entries
        ]
    },
    "summarize": (
        "**Key findings**\n"
        "- Apple reported Q1 2025 revenue of $124.3B, up 4% year-over-year\n"
        "- iPhone revenue of $69.1B drove the quarter; Services hit a record $26.3B\n"
        "- EPS of $2.40 beat the consensus estimate of $2.35\n\n"
        "**Notable sources**\n"
        "- https://www.apple.com/newsroom/2025/01/apple-reports-first-quarter-results/\n"
        "- https://finance.yahoo.com/...\n\n"
        "**Gaps**\n"
        "- Results don't cover geographic revenue breakdown or Vision Pro sell-through"
    ),
    "persist": {
        "status": "success",
        "key": "web_summary",
        "append": False,
        "value": "**Key findings**\n..."
    }
}

result.waves_executed will be 3 — one wave per task since each depends on the previous.


Variations

Multiple queries in parallel

Fan out a list of queries and run them concurrently:

params:
  queries:
    type: list

tasks:
  - id: searches
    tool: search_web
    parallel_over: "{{params.queries}}"
    inputs:
      query: "{{item}}"
      top_n: 3

result.outputs["searches"] is a list with one result dict per query, in the same order as params.queries.

Use SerpAPI instead of DuckDuckGo

- id: search
  tool: search_web
  inputs:
    query: "{{params.topic}}"
    provider: serpapi
    top_n: 10

Set SERPAPI_KEY in your environment or .env file.


Next steps