Configuration

Customize LLMs, tools, memory, and output behavior.

Environment Variables

All configuration is managed through a .env file. Create it by copying .env.example.

Core Configuration

# LLM Selection (required)
OPENROUTER_API_KEY=sk-your-api-key
OPENROUTER_MODEL_NAME=openrouter/openai/gpt-4-turbo

# Web Search (required)
EXA_API_KEY=your-exa-api-key

# Optional: Embeddings for memory
GOOGLE_API_KEY=your-google-api-key  # For Google Gemini embeddings
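
Before a run it can help to confirm that the required settings are present. The following is a minimal sketch, not part of ResearchCrew itself: `missing_required` is a hypothetical helper, and it assumes the .env file has already been loaded into the process environment by whatever loader the project uses.

```python
import os

# Variable names taken from the Core Configuration block above.
REQUIRED_VARS = ("OPENROUTER_API_KEY", "OPENROUTER_MODEL_NAME", "EXA_API_KEY")


def missing_required(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper: `env` defaults to the process environment, so the
    .env file must already have been loaded before this is called.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]
```

Calling `missing_required()` at startup and stopping when the returned list is non-empty gives a clearer failure than an authentication error halfway through a run.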

LLM Selection

Choose an LLM based on the tradeoff between quality, speed, and cost.

Fast Research (Cost-Optimized)

OPENROUTER_MODEL_NAME=openrouter/openai/gpt-4o-mini

- **Speed:** Fast pipeline execution
- **Cost:** Lowest
- **Quality:** Good for initial exploration, may have more hallucinations
- **Best for:** Quick research, exploratory passes

Balanced Research

OPENROUTER_MODEL_NAME=openrouter/openai/gpt-4o

- **Speed:** Moderate
- **Cost:** Medium
- **Quality:** Reliable, good citation accuracy
- **Best for:** Most production research

High-Accuracy Research

OPENROUTER_MODEL_NAME=openrouter/anthropic/claude-3-opus

- **Speed:** Slower
- **Cost:** Higher
- **Quality:** Excellent, minimal hallucinations
- **Best for:** Critical research, high-stakes decisions

Cutting-Edge Models

# Set only one of these:
OPENROUTER_MODEL_NAME=openrouter/openai/gpt-4o  # GPT-4o (GPT-4 Omni)
OPENROUTER_MODEL_NAME=openrouter/anthropic/claude-3.5-sonnet  # Claude 3.5 Sonnet

- **Speed:** Varies
- **Cost:** Varies
- **Quality:** State-of-the-art
- **Best for:** Maximum quality requirements

- Find more models: OpenRouter models list
- Free models: OpenRouter free models
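
If you switch tiers often, the model strings above can be kept in one place. This is a rough sketch only: the tier names and the `model_for` helper are hypothetical, and only the model identifiers come from this guide.

```python
# Model identifiers copied from the tiers described above; the tier names
# themselves are illustrative, not something ResearchCrew defines.
MODEL_TIERS = {
    "fast": "openrouter/openai/gpt-4o-mini",
    "balanced": "openrouter/openai/gpt-4o",
    "high_accuracy": "openrouter/anthropic/claude-3-opus",
}


def model_for(tier):
    """Return the model string for a tier name, or raise ValueError."""
    try:
        return MODEL_TIERS[tier]
    except KeyError:
        expected = ", ".join(sorted(MODEL_TIERS))
        raise ValueError(f"Unknown tier {tier!r}; expected one of: {expected}") from None
```

The returned string is what you would assign to OPENROUTER_MODEL_NAME in .env.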

Memory Configuration

Memory is handled automatically, but you can customize it.

Enable/Disable Memory

In crew.py, modify the Crew initialization:

@crew
def crew(self) -> Crew:
    return Crew(
        agents=self.agents,
        tasks=self.tasks,
        memory=True,              # Enable long-term memory (set to False to disable)
        verbose=True,
        embedder={               # Embeddings for memory (Google provider uses GOOGLE_API_KEY)
            "provider": "google",
            "config": {"model": "models/embedding-001"}
        }
    )

Memory stores:

- Prior research outputs
- Extracted claims and URLs
- Deduplication history

Memory benefits:

- Avoids re-crawling the same sources
- Provides context for iterative research
- Improves the quality of subsequent runs
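
To switch memory on and off without editing the Crew call each time, the memory-related arguments can be built in one place. The sketch below is an illustration under assumptions: `memory_settings` is a hypothetical helper, and the embedder values are simply the ones shown in the crew.py snippet above.

```python
def memory_settings(enabled=True):
    """Build the memory-related keyword arguments for the Crew call.

    Hypothetical helper: when memory is disabled, no embedder is configured.
    """
    if not enabled:
        return {"memory": False}
    return {
        "memory": True,
        "embedder": {
            "provider": "google",
            "config": {"model": "models/embedding-001"},
        },
    }
```

In crew.py this could be used as `Crew(agents=self.agents, tasks=self.tasks, verbose=True, **memory_settings(enabled=False))` for a one-off run without memory.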

Web Search Configuration

ResearchCrew uses EXASearch for semantic web search.

Enable Quality Filters

In tools/ai_tools.py, quality filters are already enabled:

# Excluded domains (unreliable sources)
EXCLUDED_DOMAINS = [
    "medium.com",
    "reddit.com",
    "stackoverflow.com",  # For research, not coding Q&A
    "twitter.com",
    "linkedin.com",
    "youtube.com",
]

Modify this list to add/remove excluded domains.
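
How tools/ai_tools.py applies this list is not shown here. As a rough illustration of the idea, the sketch below filters result URLs on the client side, treating subdomains such as www.reddit.com as excluded too. The `is_excluded` and `filter_urls` helpers are hypothetical, not part of the project.

```python
from urllib.parse import urlparse

# Same list as in the snippet above.
EXCLUDED_DOMAINS = [
    "medium.com",
    "reddit.com",
    "stackoverflow.com",
    "twitter.com",
    "linkedin.com",
    "youtube.com",
]


def is_excluded(url, excluded=EXCLUDED_DOMAINS):
    """Return True if the URL's host is an excluded domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in excluded)


def filter_urls(urls, excluded=EXCLUDED_DOMAINS):
    """Keep only the URLs whose host is not excluded, preserving order."""
    return [url for url in urls if not is_excluded(url, excluded)]
```

Matching on the parsed host rather than on a substring avoids false positives such as a domain that merely ends in "reddit.com".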

Search Results

By default, the web crawler returns 3-5 URLs per search query. To change this, edit the web crawler task description in config/tasks.yaml:

web_crawler_task:
  description: |
    Find 3-5 high-quality URLs using semantic search.
    (Change this number as needed)
  # ...

Output Configuration

Report Location

By default, reports are saved to:

researchcrew/
├── outputs/
│   ├── 20250516.md  # initial report
│   ├── 20250517.md  # subsequent report (if running multiple rounds)
│   └── ...

To change the output directory, set OUTPUT_DIR in .env:

OUTPUT_DIR=my_research_output  # Change this path
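
The naming scheme above (one yyyymmdd.md file per run inside the output directory) can be sketched as follows. This is an assumption-laden illustration: `report_path` is a hypothetical helper, and the fallback to "outputs" when OUTPUT_DIR is unset is inferred from the default layout shown earlier.

```python
import os
from datetime import date
from pathlib import Path


def report_path(run_date, env=None):
    """Return the report path for a run date, e.g. outputs/20250516.md.

    Hypothetical helper: uses OUTPUT_DIR when set and falls back to
    "outputs" otherwise (assumed default).
    """
    env = os.environ if env is None else env
    output_dir = env.get("OUTPUT_DIR") or "outputs"
    return Path(output_dir) / f"{run_date:%Y%m%d}.md"
```

For example, `report_path(date.today())` gives the path where today's report would be expected under this scheme.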

Report Format

Reports are generated in Markdown and saved to outputs/[yyyymmdd].md. The overall format is fixed (publication-ready), but you can adjust the content through the task descriptions in tasks.yaml.

Agent Customization

All agent behavior is defined in config/agents.yaml and config/tasks.yaml.

Modify Agent Instructions

Edit config/agents.yaml:

research_planner:
  role: Research Planner
  goal: Create comprehensive research plans    # Change goal
  backstory: You are an expert research strategist...  # Change backstory

Modify Task Instructions

Edit config/tasks.yaml:

research_planner_task:
  description: |
    Create a detailed research plan for: {{ topic }}

    Consider: (add custom considerations here)
    - Existing research
    - Knowledge gaps
    - Search strategy
  expected_output: |
    Structured research plan with:
    - Summary of prior research
    - Identified gaps
    - 2-3 search queries (customize this)
  agent: research_planner

Performance Tuning

Faster Runs

Use a faster LLM:

OPENROUTER_MODEL_NAME=openrouter/openai/gpt-4o-mini

Reduce search scope (in tasks.yaml):

web_crawler_task:
  description: |
    Find 2-3 high-quality URLs per search query (reduced from 3-5)

Better Quality

Use a higher-quality LLM:

OPENROUTER_MODEL_NAME=openrouter/anthropic/claude-3-opus

- Increase search depth: find more URLs per query
- Use iterative refinement: run multiple rounds with feedback

Debugging Configuration

Verbose Output

Enable detailed logging in crew.py:

@crew
def crew(self) -> Crew:
    return Crew(
        # ...
        verbose=True,  # Shows agent thinking and reasoning
    )

Common Configuration Issues

- "API key is invalid"
  - Verify that the .env file exists and is readable
  - Check the OpenRouter API key format (starts with sk-)
  - Verify that each key is for the correct provider
- "Import error: No module named 'crewai'"
  - Reinstall dependencies: crewai install
  - Or: pip install crewai
- "Rate limited by API"
  - Cause: too many requests sent too quickly
  - Add delays between runs
  - Check your API quota/credits
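
One way to add delays between runs is to retry with exponential backoff. The sketch below is a generic illustration, not something ResearchCrew provides: `run_with_backoff` is a hypothetical helper, `run` stands for whatever callable starts your research run, and in practice you would catch only your provider's rate-limit error instead of every exception.

```python
import random
import time


def run_with_backoff(run, retries=4, base_delay=2.0, sleep=time.sleep):
    """Call run(); on failure wait base_delay * 2**attempt plus jitter, then retry.

    Re-raises the last exception once `retries` retries have been used up.
    `sleep` is injectable so the waiting can be stubbed out in tests.
    """
    for attempt in range(retries + 1):
        try:
            return run()
        except Exception:  # assumption: narrow this to the rate-limit error you see
            if attempt == retries:
                raise
            # Waits of roughly 2s, 4s, 8s, ... with up to 1s of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

The jitter spreads out retries so that several processes hitting the same limit do not all retry at the same moment.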

Best Practices

Do:

- Start with GPT-4o-mini (cheaper for initial testing)
- Switch to Claude 3 Opus for production research
- Use iterative feedback for complex topics
- Add .env to .gitignore (don't commit API keys!)

Don't:

- Use free or unknown LLM APIs (quality and reliability are unverified)
- Keep API keys in code or version control
- Run unlimited research runs without monitoring costs
- Disable memory for multi-round research

Next Steps

- Usage Guide: single and multi-round workflows
- Examples: real research examples