Overview
When a file is uploaded to Active Storage (S3), the system automatically:
- Uploads to S3 via Active Storage
- Triggers background job ProcessFileJob on the file_processing queue
- Parses document using Unstructured.io API
- Generates embeddings using OpenAI
- Uploads vectors to Turbopuffer in batches (default: 100)
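The batching in the last step can be sketched in plain Ruby. This is a minimal illustration, not the application's actual code: `upload_in_batches` and `DEFAULT_BATCH_SIZE` are hypothetical names, and the block stands in for whatever call writes one batch to Turbopuffer.

```ruby
# Hypothetical helper: splits vectors into fixed-size batches and hands each
# batch to the caller's block. The real upload call is not shown in this guide.
DEFAULT_BATCH_SIZE = 100

def upload_in_batches(vectors, batch_size: DEFAULT_BATCH_SIZE)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

  batches_sent = 0
  vectors.each_slice(batch_size) do |batch|
    yield batch # stand-in for one write request to the vector store
    batches_sent += 1
  end
  batches_sent
end

# Usage: 250 placeholder vectors go out as batches of 100, 100, and 50.
sizes = []
sent = upload_in_batches(Array.new(250) { |i| { id: i } }) { |batch| sizes << batch.size }
```

Yielding each batch keeps the slicing logic separate from the upload client, so the same helper works with any batch size.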
Environment Variables
Required Variables
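The Troubleshooting section names three API keys, so the pipeline presumably needs at least those. A small check such as the following sketch can confirm they are present before a worker starts. `missing_env_vars` is a hypothetical helper, and the list may be incomplete (for example, storage credentials are not covered).

```ruby
# Key names are taken from the Troubleshooting section of this guide.
REQUIRED_ENV_VARS = %w[UNSTRUCTURED_API_KEY OPENAI_API_KEY TURBOPUFFER_API_KEY].freeze

# Hypothetical helper: returns the names of required variables that are
# unset or blank in the given environment (ENV by default).
def missing_env_vars(env = ENV)
  REQUIRED_ENV_VARS.select { |name| env[name].to_s.strip.empty? }
end

# Usage with an example hash instead of the real environment.
example_env = { "UNSTRUCTURED_API_KEY" => "abc", "OPENAI_API_KEY" => " " }
missing = missing_env_vars(example_env)
# => ["OPENAI_API_KEY", "TURBOPUFFER_API_KEY"]
```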
Optional Worker Concurrency
Setting Secrets in Production (Fly.io)
Supported File Types
- PDF: application/pdf
- Word Documents: .docx, .doc
- Text: .txt
- HTML: .html
- Markdown: .md
- PowerPoint: .pptx, .ppt
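A client can check a filename against this list before uploading. The sketch below is one way to do that, not the application's own validation. `supported_file?` is a hypothetical helper, and it assumes PDFs are identified by the `.pdf` extension, although the list gives a MIME type for them.

```ruby
# Extensions from the list above; ".pdf" is an assumption, since the list
# names the MIME type application/pdf rather than an extension.
SUPPORTED_EXTENSIONS = %w[.pdf .docx .doc .txt .html .md .pptx .ppt].freeze

# Hypothetical helper: true when the filename's extension is supported.
# The comparison is case-insensitive.
def supported_file?(filename)
  SUPPORTED_EXTENSIONS.include?(File.extname(filename).downcase)
end

supported_file?("report.PDF") # => true
supported_file?("photo.png")  # => false
```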
Usage
Automatic Processing (Default)
Files are automatically processed when uploaded:
Manual Processing
You can manually trigger processing:
Checking Processing Status
Querying Uploaded Chunks
Once processed, you can query the chunks:
Monitoring
View Logs
Troubleshooting
Files not processing
- Check Solid Queue: Ensure bin/rails solid_queue:start is running
- Check environment variables: Verify all API keys are set
- Check job dashboard: Visit http://localhost:3000/jobs
Processing failed
- Check API keys: Verify UNSTRUCTURED_API_KEY, OPENAI_API_KEY, TURBOPUFFER_API_KEY
- Check file size: Files over 50MB are automatically discarded
- Check file type: Verify file type is supported
- Review logs: Check Rails logs for specific errors
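The file size limit can be checked before upload, which avoids a silent discard. This is a rough sketch under one stated assumption: "50MB" is read as 50 MiB (50 × 1024 × 1024 bytes), and the real threshold may be defined differently. `within_size_limit?` is a hypothetical helper.

```ruby
# Assumes "50MB" means 50 MiB; the application's exact limit is not shown.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024

# Hypothetical helper: true when a file of the given byte size would not
# be discarded for exceeding the limit.
def within_size_limit?(byte_size)
  byte_size <= MAX_FILE_SIZE_BYTES
end

within_size_limit?(10 * 1024 * 1024) # => true
within_size_limit?(60 * 1024 * 1024) # => false
```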
Performance Optimization
Batch Size Tuning
Adjust batch size based on:
- Smaller batches (50-75): Faster feedback, less memory
- Larger batches (100-200): Better throughput, more memory
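To make the trade-off concrete, the following sketch counts how many upload requests a document needs at different batch sizes. The 1,000-chunk document is an illustrative assumption, and `batch_count` is a hypothetical helper.

```ruby
# Number of upload requests needed for a given chunk count and batch size
# (integer ceiling division).
def batch_count(chunk_count, batch_size)
  (chunk_count + batch_size - 1) / batch_size
end

# Requests needed for a hypothetical 1,000-chunk document.
requests = [50, 100, 200].map { |size| [size, batch_count(1_000, size)] }
# => [[50, 20], [100, 10], [200, 5]]
```

Fewer, larger requests reduce per-request overhead, while each request holds more vectors in memory at once.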
Concurrency Settings
For high-volume processing:
Cost Estimation
Per 1000-Page Document
- Unstructured.io: ~$0.50 per document
- OpenAI Embeddings: ~$0.01 per 1000 pages
- Turbopuffer: ~$0.0004/month storage
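Adding these figures gives a rough total per 1000-page document. The sketch below assumes the Unstructured.io figure is in US dollars like the others and treats parsing and embedding as one-time costs, with storage recurring monthly. `estimated_cost` is a hypothetical helper, and actual pricing may differ.

```ruby
# Approximate figures from the list above, per 1000-page document (USD).
UNSTRUCTURED_COST = 0.50   # assumed to be US dollars
EMBEDDING_COST    = 0.01
STORAGE_PER_MONTH = 0.0004

# Hypothetical helper: one-time processing cost plus storage for the
# given number of months, rounded to four decimal places.
def estimated_cost(months_stored: 1)
  one_time = UNSTRUCTURED_COST + EMBEDDING_COST
  (one_time + STORAGE_PER_MONTH * months_stored).round(4)
end

first_month = estimated_cost                    # about $0.51
first_year  = estimated_cost(months_stored: 12) # about $0.51 as well
```

Under these figures the parsing step dominates the total, and storage is negligible even over a year.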
Next Steps
Billing Setup
Configure credit-based billing