RS Document

How-To Guides Overview

This section provides focused, task-based guides for working with rs_document. Each guide covers a specific workflow or use case with practical examples.

Getting Started

Loading Documents - Create documents from files and manage metadata
Cleaning Tasks - Remove bullets, ligatures, and special characters
Splitting Tasks - Split documents with different strategies

Advanced Workflows

Batch Operations - Process multiple documents efficiently
Vector DB Preparation - Prepare documents for embedding and retrieval
LangChain Integration - Use rs_document with LangChain

Quick Examples

Basic Document Processing

from rs_document import Document

# Create, clean, and split a document
doc = Document(
    page_content="Your text here...",
    metadata={"source": "example.txt"}
)

doc.clean()
chunks = doc.recursive_character_splitter(1000)
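
After splitting, it can help to check how large the resulting chunks are. The sketch below is not part of rs_document: `summarize_chunks` is a hypothetical helper, and it assumes each chunk exposes a readable `page_content` attribute, mirroring the constructor argument above. Stand-in objects are used so the snippet runs without the library installed.

```python
from statistics import mean
from types import SimpleNamespace


def summarize_chunks(chunks):
    """Return count and length statistics for objects exposing `page_content`."""
    lengths = [len(chunk.page_content) for chunk in chunks]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
    }


# Stand-in chunks (assumption: real chunks expose `page_content` the same way).
demo_chunks = [
    SimpleNamespace(page_content="abc"),
    SimpleNamespace(page_content="abcde"),
]
summary = summarize_chunks(demo_chunks)
print(summary)
```

With the library installed, the same helper could be called as `summarize_chunks(chunks)` on the list returned by `recursive_character_splitter`, provided the attribute assumption holds.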

Batch Processing

from rs_document import clean_and_split_docs, Document

# Process multiple documents at once
documents = [...]  # Your documents
chunks = clean_and_split_docs(documents, chunk_size=1000)
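
One possible way to build the `documents` list is to read text files from disk and record each file path as metadata. This is only a sketch: `build_documents` is a hypothetical helper, not an rs_document function, and the `factory` parameter is assumed to accept the same `page_content` and `metadata` keyword arguments as `Document`. The demo passes `dict` as a stand-in factory so it runs without the library.

```python
import tempfile
from pathlib import Path


def build_documents(paths, factory):
    """Read each UTF-8 text file and wrap it with `factory`.

    `factory` is any callable accepting `page_content` and `metadata`
    keyword arguments, for example rs_document's `Document`.
    """
    documents = []
    for path in map(Path, paths):
        text = path.read_text(encoding="utf-8")
        # Record the originating file so each chunk can be traced back later.
        documents.append(
            factory(page_content=text, metadata={"source": str(path)})
        )
    return documents


# Demo with `dict` as a stand-in factory (assumption: Document takes the
# same keyword arguments, as in the basic example).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.txt"
    sample.write_text("Your text here...", encoding="utf-8")
    demo_docs = build_documents([sample], dict)

print(demo_docs[0]["page_content"])
```

With rs_document installed, calling `build_documents(paths, Document)` should yield a list suitable for `clean_and_split_docs(documents, chunk_size=1000)`.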

Common Use Cases

RAG Applications
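
For retrieval-augmented generation, chunks usually need a stable ID, their text, and their metadata before they are embedded and stored. The following is a minimal sketch under stated assumptions: `chunks_to_records` is a hypothetical helper, the `source#index` ID scheme is an illustrative choice, and chunks are assumed to expose `page_content` and a dict-like `metadata` attribute. No embedding model or vector store API is shown.

```python
from types import SimpleNamespace


def chunks_to_records(chunks):
    """Turn chunk objects into plain dicts ready for embedding and storage."""
    records = []
    for index, chunk in enumerate(chunks):
        source = chunk.metadata.get("source", "unknown")
        records.append(
            {
                "id": f"{source}#{index}",  # illustrative ID scheme
                "text": chunk.page_content,
                "metadata": dict(chunk.metadata),  # copy to avoid shared state
            }
        )
    return records


# Stand-in chunks so the sketch runs without rs_document installed.
demo_chunks = [
    SimpleNamespace(page_content="first part", metadata={"source": "example.txt"}),
    SimpleNamespace(page_content="second part", metadata={"source": "example.txt"}),
]
records = chunks_to_records(demo_chunks)
print([record["id"] for record in records])
```

Each record's `text` field would then be passed to whichever embedding model and vector store the application uses.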

Text Processing Pipeline
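
A text processing pipeline can be expressed as a loop that cleans each document and then collects its chunks, using the two methods from the basic example. The sketch below is illustrative only: `run_pipeline` is a hypothetical helper, and `StubDocument` is a simplified stand-in whose cleaning and splitting logic does not reproduce rs_document's actual behavior. It exists so the snippet runs without the library.

```python
class StubDocument:
    """Simplified stand-in for rs_document's Document (not the real logic)."""

    def __init__(self, page_content, metadata):
        self.page_content = page_content
        self.metadata = metadata

    def clean(self):
        # Stand-in cleaning: collapse runs of whitespace, in place.
        self.page_content = " ".join(self.page_content.split())

    def recursive_character_splitter(self, chunk_size):
        # Stand-in splitting: fixed-size slices with no overlap.
        text = self.page_content
        return [
            StubDocument(text[i:i + chunk_size], dict(self.metadata))
            for i in range(0, len(text), chunk_size)
        ]


def run_pipeline(docs, chunk_size):
    """Clean each document in place, then gather all of its chunks."""
    chunks = []
    for doc in docs:
        doc.clean()
        chunks.extend(doc.recursive_character_splitter(chunk_size))
    return chunks


demo_docs = [StubDocument("  hello   world  ", {"source": "demo"})]
pipeline_chunks = run_pipeline(demo_docs, chunk_size=5)
print([chunk.page_content for chunk in pipeline_chunks])
```

Because `run_pipeline` relies only on `clean()` and `recursive_character_splitter()`, it should accept real `Document` objects as well. For many documents, the `clean_and_split_docs` function shown under Batch Processing is likely the better fit.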

Need More Help?

See the API Reference for detailed method documentation
Check the Tutorial to learn the basics first