Overview

Data sources are the foundation of your AI agent’s knowledge base. They provide the information the agent uses to answer customer questions.

Types of Data Sources

Files

Upload documents (PDF, Word, text, etc.); they are processed automatically.

Supported formats:
  • PDF (application/pdf)
  • Word Documents (.docx, .doc)
  • Text files (.txt)
  • HTML (.html)
  • Markdown (.md)
  • PowerPoint (.pptx, .ppt)
File size limit: 50 MB
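
A minimal sketch of a client-side check against the formats and size limit listed above, so that an unsupported file can be caught before it is uploaded. The constant names and the `upload_problems` helper are hypothetical and not part of the product's API. The sketch also assumes the limit means 50 × 1024 × 1024 bytes, which the source does not specify.

```ruby
# Hypothetical pre-upload check; names are illustrative, not part of the product API.
SUPPORTED_EXTENSIONS = %w[.pdf .docx .doc .txt .html .md .pptx .ppt].freeze

# Assumption: the 50 MB limit is interpreted as 50 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024

# Returns an array of human-readable problems; an empty array means the file looks acceptable.
def upload_problems(filename, size_bytes)
  problems = []
  ext = File.extname(filename).downcase
  problems << "unsupported format: #{ext}" unless SUPPORTED_EXTENSIONS.include?(ext)
  problems << "file exceeds size limit" if size_bytes > MAX_FILE_SIZE_BYTES
  problems
end
```

For example, `upload_problems("notes.csv", 2048)` reports the unsupported extension, while a small `.pdf` returns an empty array.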

Q&A Pairs

Create question-answer pairs for common queries.

Text Content

Add custom text snippets.

Websites

Scrape content from websites.

Adding Data Sources

Via Web Interface

  1. Navigate to Agents → Your Agent → Data Sources
  2. Choose a data source type
  3. Upload a file, add a Q&A pair, or enter text
  4. The data source is automatically linked to the agent

Processing Status

File data sources track processing status:
file_ds = FileDataSource.find(id)
metadata = file_ds.metadata

# Check status
metadata["processing_status"]  # "processing", "completed", "failed"

# Results
metadata["chunk_count"]        # Number of chunks uploaded
metadata["total_chunks"]       # Total chunks extracted
metadata["error"]              # Error message if failed
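
As an illustration of how the metadata keys above fit together, the following sketch turns a metadata hash into a one-line summary. The `processing_summary` helper is hypothetical. It relies only on the four keys and three status values shown above and operates on a plain Hash, so it makes no assumptions about the `FileDataSource` model itself.

```ruby
# Hypothetical helper: summarizes the metadata hash of a file data source.
# Uses only the keys documented above ("processing_status", "chunk_count",
# "total_chunks", "error").
def processing_summary(metadata)
  status = metadata["processing_status"]
  case status
  when "completed"
    "completed: #{metadata['chunk_count']} of #{metadata['total_chunks']} chunks uploaded"
  when "failed"
    "failed: #{metadata['error'] || 'no error message recorded'}"
  when "processing"
    "still processing"
  else
    # Any other value (including a missing key) is reported rather than guessed at.
    "unknown status: #{status.inspect}"
  end
end
```

In a Rails console this could be called as `processing_summary(FileDataSource.find(id).metadata)`.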

Querying Data Sources

Once processed, data sources are searchable via vector search:
# Get Turbopuffer service
turbopuffer = TurbopufferService.for_organization(organization_id)

# Search for similar content
result = turbopuffer.query_similar(
  "What is the revenue forecast?",
  top_k: 10
)

# Filter by specific file
result = turbopuffer.query_similar(
  "What is the revenue forecast?",
  top_k: 10,
  filters: { file_id: ["Equals", file_data_source.id] }
)

Best Practices

  1. Organize by topic: Group related data sources together
  2. Keep content updated: Update data sources as information changes
  3. Use Q&As for common questions: Faster and more accurate than searching documents
  4. Test after adding: Verify agent can find and use new content
  5. Monitor processing: Check file processing status before relying on new files
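
One way to act on the "monitor processing" advice in a script or test is to poll until a file reaches a terminal status. The sketch below is an assumption-laden illustration: `wait_until_processed` is a hypothetical helper, and the block passed to it is expected to return the current `processing_status` string each time it is called.

```ruby
# Hypothetical polling helper. The block should return the current
# processing status ("processing", "completed", or "failed").
# Returns the terminal status, or "timeout" if none is reached in time.
def wait_until_processed(timeout: 60, interval: 2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    status = yield
    return status if %w[completed failed].include?(status)
    return "timeout" if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end
```

A call might look like `wait_until_processed { FileDataSource.find(id).metadata["processing_status"] }`. The timeout and interval defaults are arbitrary and should be tuned to typical file sizes.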

Troubleshooting

Files not processing

  • Check that SolidQueue is running; if it is not, start it with bin/rails solid_queue:start
  • Verify that these environment variables are set: UNSTRUCTURED_API_KEY, OPENAI_API_KEY, TURBOPUFFER_API_KEY
  • Check the job dashboard: http://localhost:3000/jobs
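
The environment variable check above can be scripted. This is a minimal sketch: the `missing_env_vars` helper and the constant name are hypothetical, and only the three variable names come from the list above. A variable that is set to an empty or whitespace-only string is treated as missing.

```ruby
# Variable names taken from the troubleshooting list above.
REQUIRED_ENV_VARS = %w[UNSTRUCTURED_API_KEY OPENAI_API_KEY TURBOPUFFER_API_KEY].freeze

# Hypothetical helper: returns the names of required variables that are
# unset or blank. Accepts any Hash-like object so it can be tested without
# touching the real environment.
def missing_env_vars(env = ENV)
  REQUIRED_ENV_VARS.select { |name| env[name].to_s.strip.empty? }
end
```

Running `puts missing_env_vars` in a Rails console prints the missing names, one per line, and prints nothing if all three are set.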

Agent not finding content

  • Verify that the data source is linked to the agent
  • Check that the processing status is “completed”
  • Test vector search directly, using a query the new content should answer
  • Review the agent’s knowledge base in the UI

Next Steps

Procedures

Create structured procedures with conditional logic

File Processing Setup

Configure file processing pipeline