Web Search and Summarization¶
Search the web for any topic, collect the top results, and distil them into a structured summary using an LLM. The result is persisted to the blackboard so downstream pipelines can read it.
Tools used: search_web, llm_job, store
Requires: LLM API key (OPENAI_API_KEY or equivalent)
The pipeline¶
pipeline:
id: web_search_summarize
goal: "Search for '{{params.topic}}' and produce a structured summary"
params:
topic:
type: string
description: "Search topic or question"
num_results:
type: integer
description: "Number of web results to retrieve"
default: 5
store_key:
type: string
description: "Blackboard key to write the summary to"
default: web_summary
tasks:
- id: search
tool: search_web
inputs:
query: "{{params.topic}}"
top_n: "{{params.num_results}}"
- id: summarize
tool: llm_job
inputs:
prompt: |
You are a research assistant. Review the web search results below and
produce a concise summary with three sections:
1. **Key findings** — 3–5 bullet points distilling the most important facts
2. **Notable sources** — list the 2–3 most authoritative URLs
3. **Gaps** — what the results do not cover or answer
Keep each section brief and factual. Do not pad with filler.
results: "{{search.output.results}}"
- id: persist
tool: store
inputs:
key: "{{params.store_key}}"
value: "{{summarize.output}}"
How it works¶
search_webqueries the web (DuckDuckGo by default; SerpAPI if configured) and returns a list of{title, snippet, url, source_query}objects.llm_jobreceives the results list as theresultscontext key. Trellis prepends it to the prompt automatically, so the model sees the raw result objects alongside the instruction.storewrites the LLM response string to the blackboard under the configured key.
The three tasks form a linear chain: each wave runs one task. summarize cannot start until search has completed because {{search.output.results}} must be resolved first.
Run it — Python SDK¶
Save the YAML above as pipelines/web_search_summarize.yaml, then:
import asyncio
from trellis.models.pipeline import Pipeline
from trellis.execution.orchestrator import Orchestrator
async def main():
pipeline = Pipeline.from_yaml_file("pipelines/web_search_summarize.yaml")
orch = Orchestrator()
result = await orch.run_pipeline(
pipeline,
params={
"topic": "Apple Inc earnings Q1 2025",
"num_results": 5,
},
)
# The LLM summary string
print(result.outputs["summarize"])
# What was written to the blackboard
print(result.outputs["persist"])
asyncio.run(main())
Or define the pipeline inline without a file:
import asyncio
from trellis.models.pipeline import Pipeline
from trellis.execution.orchestrator import Orchestrator
YAML = """
pipeline:
id: web_search_summarize
goal: Search and summarize
params:
topic:
type: string
tasks:
- id: search
tool: search_web
inputs:
query: "{{params.topic}}"
top_n: 5
- id: summarize
tool: llm_job
inputs:
prompt: "Summarize these results in 3 bullet points."
results: "{{search.output.results}}"
"""
async def main():
pipeline = Pipeline.from_yaml(YAML)
orch = Orchestrator()
result = await orch.run_pipeline(pipeline, params={"topic": "Python 3.13 release"})
print(result.outputs["summarize"])
asyncio.run(main())
Run it — CLI¶
trellis run pipelines/web_search_summarize.yaml \
--params '{"topic": "Apple Inc earnings Q1 2025", "num_results": 5}'
Expected output¶
result.outputs is keyed by task ID:
{
"search": {
"status": "success",
"results": [
{
"title": "Apple Reports First Quarter Results",
"snippet": "Apple today announced financial results for its fiscal 2025 first quarter...",
"url": "https://www.apple.com/newsroom/2025/01/apple-reports-first-quarter-results/",
"source_query": "Apple Inc earnings Q1 2025"
},
{
"title": "AAPL Q1 2025 Earnings Beat Estimates",
"snippet": "Apple's revenue came in at $124.3 billion, ahead of analyst expectations...",
"url": "https://finance.yahoo.com/...",
"source_query": "Apple Inc earnings Q1 2025"
},
# ... up to num_results entries
]
},
"summarize": (
"**Key findings**\n"
"- Apple reported Q1 2025 revenue of $124.3B, up 4% year-over-year\n"
"- iPhone revenue of $69.1B drove the quarter; Services hit a record $26.3B\n"
"- EPS of $2.40 beat the consensus estimate of $2.35\n\n"
"**Notable sources**\n"
"- https://www.apple.com/newsroom/2025/01/apple-reports-first-quarter-results/\n"
"- https://finance.yahoo.com/...\n\n"
"**Gaps**\n"
"- Results don't cover geographic revenue breakdown or Vision Pro sell-through"
),
"persist": {
"status": "success",
"key": "web_summary",
"append": False,
"value": "**Key findings**\n..."
}
}
result.waves_executed will be 3 — one wave per task since each depends on the previous.
Variations¶
Multiple queries in parallel
Fan out a list of queries and run them concurrently:
params:
queries:
type: list
tasks:
- id: searches
tool: search_web
parallel_over: "{{params.queries}}"
inputs:
query: "{{item}}"
top_n: 3
result.outputs["searches"] is a list with one result dict per query, in the same order as params.queries.
Use SerpAPI instead of DuckDuckGo
Set SERPAPI_KEY in your environment or .env file.
Next steps¶
- PDF Ingest & Extraction — apply similar LLM processing to document content
- SEC Filing Extraction — structured field extraction against a schema