How-To Guides Overview
This section provides focused, task-based guides for working with rs_document. Each guide covers a specific workflow or use case with practical examples.
Getting Started
- Loading Documents - Create documents from files and manage metadata
- Cleaning Tasks - Remove bullets, ligatures, and special characters
- Splitting Tasks - Split documents with different strategies
Advanced Workflows
- Batch Operations - Process multiple documents efficiently
- Vector DB Preparation - Prepare documents for embedding and retrieval
- LangChain Integration - Use rs_document with LangChain
Quick Examples
Basic Document Processing
from rs_document import Document

# Create, clean, and split a document
doc = Document(
    page_content="Your text here...",
    metadata={"source": "example.txt"},
)
doc.clean()
chunks = doc.recursive_character_splitter(1000)
Batch Processing
from rs_document import clean_and_split_docs, Document

# Process multiple documents at once
documents = [...]  # Your documents
chunks = clean_and_split_docs(documents, chunk_size=1000)
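One way to build the `documents` list is to read text files from a directory into the keyword arguments that the `Document` constructor takes. The sketch below uses only the standard library. `load_text_records` is a hypothetical helper, not part of rs_document, and the use of the file name as the `source` value is an assumption made for illustration.

```python
import tempfile
from pathlib import Path


def load_text_records(directory, pattern="*.txt"):
    """Hypothetical helper: read matching files into Document keyword arguments."""
    records = []
    # Sort the paths so the output order is stable across runs.
    for path in sorted(Path(directory).glob(pattern)):
        records.append(
            {
                "page_content": path.read_text(encoding="utf-8"),
                # Metadata keys and values are kept as plain strings.
                "metadata": {"source": path.name},
            }
        )
    return records


# Small demonstration using a temporary directory with two files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("first document", encoding="utf-8")
    Path(tmp, "b.txt").write_text("second document", encoding="utf-8")
    records = load_text_records(tmp)

# With rs_document installed, each record maps onto the constructor:
#   documents = [Document(**record) for record in records]
#   chunks = clean_and_split_docs(documents, chunk_size=1000)
```

Keeping file reading separate from `Document` construction lets the loading step be checked without the library installed.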
Common Use Cases
RAG Applications
Text Processing Pipeline
- Create documents with metadata
- Apply specific cleaners
- Split with context overlap
- Filter and organize chunks
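The four steps above can be sketched in plain Python to show how data moves through such a pipeline. This is an illustration only and makes no claim about how rs_document implements cleaning or splitting. `Chunk`, `collapse_whitespace`, `split_with_overlap`, and `run_pipeline` are hypothetical names, and the fixed-width sliding window is a simplified stand-in for the library's splitters.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk (not an rs_document type)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def collapse_whitespace(text: str) -> str:
    """One specific cleaner: squeeze whitespace runs into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Fixed-width windows; consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    pieces = []
    start = 0
    while start < len(text):
        pieces.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no trailing piece is
        # entirely contained in the previous one.
        if start + chunk_size >= len(text):
            break
        start += step
    return pieces


def run_pipeline(text: str, source: str, chunk_size: int = 40,
                 overlap: int = 10, min_length: int = 5) -> List[Chunk]:
    cleaned = collapse_whitespace(text)                         # apply a cleaner
    pieces = split_with_overlap(cleaned, chunk_size, overlap)   # split with overlap
    kept = [p for p in pieces if len(p.strip()) >= min_length]  # filter short pieces
    # Organize: attach the source and a positional index as string metadata.
    return [
        Chunk(p, {"source": source, "chunk_index": str(i)})
        for i, p in enumerate(kept)
    ]


chunks = run_pipeline("hello   world", "x.txt", chunk_size=8, overlap=3, min_length=4)
```

In a real pipeline the stand-in functions would be replaced by the library's own cleaning and splitting methods. The create, clean, split, filter order stays the same.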
Need More Help?
- See the API Reference for detailed method documentation
- Check the Tutorial to learn the basics first