Data Sources

Overview

Data sources are the foundation of your AI agent’s knowledge base. They provide the information the agent uses to answer customer questions.

Types of Data Sources

Files

Upload documents (PDF, Word, text, and similar formats), which are processed automatically.

Supported formats:

PDF (application/pdf)
Word Documents (.docx, .doc)
Text files (.txt)
HTML (.html)
Markdown (.md)
PowerPoint (.pptx, .ppt)

File size limit: 50MB
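As a quick illustration of the limits above, a file can be checked before upload. This is a minimal sketch, not part of the product: `upload_problems` and both constants are hypothetical names, and it assumes "50MB" means 50 × 1024 × 1024 bytes.

```ruby
# Hypothetical pre-upload check based on the supported formats and size limit listed above.
SUPPORTED_EXTENSIONS = %w[.pdf .docx .doc .txt .html .md .pptx .ppt].freeze
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024 # assumption: 50MB is interpreted as 50 MiB

# Returns an array of problems; an empty array means the file looks uploadable.
def upload_problems(filename, size_bytes)
  problems = []
  ext = File.extname(filename).downcase
  unless SUPPORTED_EXTENSIONS.include?(ext)
    problems << "unsupported format: #{ext.empty? ? '(none)' : ext}"
  end
  problems << "file exceeds 50MB limit" if size_bytes > MAX_FILE_SIZE_BYTES
  problems
end
```

For a file on disk, this could be called as `upload_problems(path, File.size(path))`.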

Q&A Pairs

Create question-answer pairs for common queries.

Text Content

Add custom text snippets.

Websites

Scrape content from website pages.

Adding Data Sources

Via Web Interface

Navigate to Agents → Your Agent → Data Sources
Choose data source type
Upload file, add Q&A, or enter text
Data source is automatically linked to agent

Processing Status

File data sources track processing status:

file_ds = FileDataSource.find(id)
metadata = file_ds.metadata

# Check status
metadata["processing_status"]  # "processing", "completed", "failed"

# Results
metadata["chunk_count"]        # Number of chunks uploaded
metadata["total_chunks"]       # Total chunks extracted
metadata["error"]              # Error message if failed
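Because processing is asynchronous, code that depends on a new file may need to wait for a final status. The helper below is a hedged sketch: `wait_for_processing`, its `interval` and `max_attempts` parameters, and the callable argument are hypothetical. It relies only on the `processing_status` and `error` metadata keys shown above.

```ruby
# Hypothetical helper: polls until processing reaches "completed" or "failed".
# `fetch_metadata` is any callable that returns the current metadata hash.
def wait_for_processing(fetch_metadata, interval: 2, max_attempts: 30)
  max_attempts.times do
    metadata = fetch_metadata.call
    case metadata["processing_status"]
    when "completed"
      return metadata
    when "failed"
      raise "Processing failed: #{metadata['error']}"
    end
    sleep interval # still "processing"; wait before checking again
  end
  raise "Timed out waiting for processing"
end

# Possible usage inside the app (assumes FileDataSource as used above):
# metadata = wait_for_processing(-> { FileDataSource.find(id).metadata })
```

Passing a callable keeps the helper independent of how the record is loaded, so it can be exercised with a stub in tests.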

Querying Data Sources

Once processed, data sources are searchable via vector search:

# Get Turbopuffer service
turbopuffer = TurbopufferService.for_organization(organization_id)

# Search for similar content
result = turbopuffer.query_similar(
  "What is the revenue forecast?",
  top_k: 10
)

# Filter by specific file
result = turbopuffer.query_similar(
  "What is the revenue forecast?",
  top_k: 10,
  filters: { file_id: ["Equals", file_data_source.id] }
)
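To make the idea of vector search concrete, the sketch below ranks stored chunks by cosine similarity to a query vector and keeps the `top_k` best matches. It is a conceptual illustration only: it is not how Turbopuffer or `TurbopufferService` is implemented, and the `top_k_similar` name and the chunk hash layout are assumptions.

```ruby
# Conceptual sketch of similarity ranking; not the service's actual implementation.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# `chunks` is an array of hashes with a :vector key (hypothetical layout).
# Returns the top_k chunks, each with a :score, ordered from most to least similar.
def top_k_similar(query_vector, chunks, top_k: 10)
  chunks
    .map { |chunk| chunk.merge(score: cosine_similarity(query_vector, chunk[:vector])) }
    .sort_by { |chunk| -chunk[:score] }
    .first(top_k)
end
```

In practice the query text is first converted to an embedding vector, and the vector store performs this ranking at scale.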

Best Practices

Organize by topic: Group related data sources together
Keep content updated: Update data sources as information changes
Use Q&As for common questions: Faster and more accurate than searching documents
Test after adding: Verify agent can find and use new content
Monitor processing: Check file processing status before relying on new files

Troubleshooting

Files not processing

Check that Solid Queue is running; if it is not, start it with: bin/rails solid_queue:start
Verify environment variables: UNSTRUCTURED_API_KEY, OPENAI_API_KEY, TURBOPUFFER_API_KEY
Check job dashboard: http://localhost:3000/jobs
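The environment variable check can be scripted. This is a small sketch using the three variable names listed above; `missing_env_vars` is a hypothetical helper, and it treats blank values as missing.

```ruby
# Variable names come from the troubleshooting list above.
REQUIRED_ENV_VARS = %w[UNSTRUCTURED_API_KEY OPENAI_API_KEY TURBOPUFFER_API_KEY].freeze

# Hypothetical helper: returns the names of variables that are unset or blank.
def missing_env_vars(env = ENV)
  REQUIRED_ENV_VARS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_env_vars
if missing.empty?
  puts "All required environment variables are set."
else
  puts "Missing environment variables: #{missing.join(', ')}"
end
```

Run it in the same environment as the job workers (for example from a Rails console), since that process is the one that needs the keys.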

Agent not finding content

Verify data source is linked to agent
Check processing status is “completed”
Test vector search directly
Review agent’s knowledge base in UI

Next Steps

Procedures

Create structured procedures with conditional logic

File Processing Setup

Configure file processing pipeline
