A paraphrase generation system built on the OpenRouter API, with batch processing, quality evaluation, caching, rate limiting, and a REST API.
.
├── data/
│ ├── custom_train.tsv # Custom training data (tab-separated: source<TAB>target) - Used for structure reference
│ └── custom_eval.tsv # Custom evaluation data (tab-separated: source<TAB>target) - Used for structure reference
├── src/
│ ├── api.py # FastAPI REST endpoints with automatic docs
│ ├── batch_processor.py # High-performance batch processing with concurrency
│ ├── cache.py # Redis/memory caching system with TTL support
│ ├── config.py # Comprehensive configuration management
│ ├── evaluation.py # Paraphrase quality evaluation with multiple metrics
│ ├── exceptions.py # Custom exception hierarchy with detailed error info
│ ├── logging_config.py # Structured logging with performance monitoring
│ ├── main.py # Enhanced CLI with multiple operation modes
│ ├── provider_facade.py # Provider abstraction
│ ├── rate_limiter.py # Rate limiting system
│ ├── data_processing/ # Legacy data processing modules
│ └── provider_openrouter.py # OpenRouter API integration
├── requirements.txt # Runtime dependencies (includes FastAPI, caching, ML libraries)
├── requirements-dev.txt # Development and testing dependencies
├── logs/ # Application logs (created automatically)
├── .venv/ # Virtual environment (created by uv)
├── model_output/ # Default directory for output (less critical now)
└── README.md
Clone the repository:
git clone <repository_url>
cd <repository_name>
Obtain API Key:
Configure API Key:
```bash
# Option 1: Environment variable (recommended)
export OPENROUTER_API_KEY="your-openrouter-key"
# Option 2: Key file in your home directory
echo "your-openrouter-key" > ~/.api-openrouter
```
Create Virtual Environment:
python -m uv venv .venv
.venv/Scripts/python.exe -m ensurepip
.venv/Scripts/python.exe -m pip install uv
Install Dependencies:
```bash
# Install runtime dependencies
.venv/Scripts/python.exe -m uv pip install -r requirements.txt
# Install development dependencies (optional)
.venv/Scripts/python.exe -m uv pip install -r requirements-dev.txt
```
Setup Optional Components:
```bash
# For Redis caching (optional)
# Install and start Redis server, then set REDIS_URL=redis://localhost:6379/0
# For semantic similarity evaluation (optional)
.venv/Scripts/python.exe -m uv pip install sentence-transformers
```
The custom dataset files (data/custom_train.tsv, data/custom_eval.tsv) are included for historical context and potential future use, but are not directly used by the current OpenRouter-based paraphrase generation logic. They contain tab-separated pairs of sentences, where the first column is the source sentence and the second column is the target paraphrase. Example:
Original sentence one.<TAB>Paraphrased sentence one.
Original sentence two.<TAB>Paraphrased sentence two.
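The two-column layout above can be read with the standard `csv` module. The following is a minimal sketch, not code from this repository: the helper name `read_pairs` is hypothetical, and it assumes that rows without exactly two non-empty columns should be skipped.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated (source, target) rows, skipping blank or malformed lines."""
    pairs = []
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # blank line, or not exactly two columns
        source, target = row[0].strip(), row[1].strip()
        if source and target:
            pairs.append((source, target))
    return pairs


# In-memory sample that mirrors the example rows above.
sample = io.StringIO(
    "Original sentence one.\tParaphrased sentence one.\n"
    "\n"
    "Original sentence two.\tParaphrased sentence two.\n"
)
pairs = read_pairs(sample)
print(pairs)
```

To read one of the real files, pass an open file handle instead, for example `open("data/custom_train.tsv", newline="", encoding="utf-8")`.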
The system supports multiple operation modes for different use cases.
.venv/Scripts/python.exe -m src.main --mode cli --decode_input "This is the sentence to paraphrase."
# Process multiple texts from a file
.venv/Scripts/python.exe -m src.main --mode batch --batch_input input.txt --batch_output results.txt
# Process a single text in batch mode
.venv/Scripts/python.exe -m src.main --mode batch --decode_input "Text to paraphrase"
.venv/Scripts/python.exe -m src.main --mode interactive
.venv/Scripts/python.exe -m src.main --mode evaluate \
--evaluate_original "Original text" \
--evaluate_paraphrase "Generated paraphrase"
Start the REST API server with automatic documentation:
.venv/Scripts/python.exe -m src.main --mode api --port 8000
The API will be available at:
- Main API: http://localhost:8000
- Interactive Docs: http://localhost:8000/docs
- Alternative Docs: http://localhost:8000/redoc
- Health Check: http://localhost:8000/health
POST /paraphrase: Generate a single paraphrase
{
  "text": "This is the sentence to paraphrase.",
  "provider": "openrouter",
  "model": "anthropic/claude-3-sonnet"
}
POST /paraphrase/batch: Generate multiple paraphrases
{
  "texts": ["Text 1", "Text 2", "Text 3"],
  "provider": "openrouter"
}
POST /evaluate: Evaluate paraphrase quality
{
  "original": "Original text",
  "paraphrase": "Generated paraphrase",
  "include_semantic": true
}
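The request bodies above can be sent from any HTTP client. Here is a minimal sketch using only the Python standard library; the helpers `build_request` and `post_json` are hypothetical names, and the response schema is not documented in this README, so the sketch returns the decoded JSON without assuming any field names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address when started with --port 8000


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the endpoints above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {"text": "This is the sentence to paraphrase.", "provider": "openrouter"}
request = build_request("/paraphrase", payload)
print(request.get_method(), request.full_url)

# With the server running in API mode, uncomment to send the request:
# print(post_json("/paraphrase", payload))
```

The same helper works for `/paraphrase/batch` and `/evaluate` by changing the path and payload.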
from src.main import generate_paraphrase, paraphrase_batch
from src.evaluation import evaluate_paraphrase
from src.config import setup_system
# Setup system components
setup_system()
# Single paraphrase
result = generate_paraphrase("This is a test sentence.")
print(f"Paraphrase: {result}")
# Batch processing
texts = ["Text 1", "Text 2", "Text 3"]
results = paraphrase_batch(texts)
print(f"Results: {results}")
# Quality evaluation
evaluation = evaluate_paraphrase(
    original="Original text",
    paraphrase="Generated paraphrase"
)
print(f"Quality score: {evaluation['overall_score']}")
| Variable | Description | Default |
|---|---|---|
| OPENROUTER_API_KEY | OpenRouter API key | None |
| REDIS_URL | Redis connection URL | redis://localhost:6379/0 |
| LOG_LEVEL | Logging level | INFO |
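One way these variables and their defaults could be resolved at startup is sketched below. This is an illustration, not the logic in `src/config.py`: the names `DEFAULTS`, `resolve_api_key`, and `load_settings` are hypothetical, and the precedence (environment variable first, then the `~/.api-openrouter` key file from the setup steps) is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Defaults as documented for the environment variables.
DEFAULTS = {
    "REDIS_URL": "redis://localhost:6379/0",
    "LOG_LEVEL": "INFO",
}


def resolve_api_key(env: Mapping[str, str], key_file: Path) -> Optional[str]:
    """Return the key from the environment, else from the key file, else None."""
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if key:
        return key
    if key_file.is_file():
        # Assumed fallback: a one-line key file, as written by `echo ... > ~/.api-openrouter`.
        return key_file.read_text(encoding="utf-8").strip() or None
    return None


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Overlay environment variables on the documented defaults."""
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings["OPENROUTER_API_KEY"] = resolve_api_key(env, Path.home() / ".api-openrouter")
    return settings


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"LOG_LEVEL": "DEBUG", "OPENROUTER_API_KEY": "example-key"})
print(settings["LOG_LEVEL"], settings["REDIS_URL"])
```

Passing the environment as a mapping keeps the function easy to test without touching real environment variables.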
Advanced configuration can be modified in src/config.py:
# API Settings
API_HOST = "0.0.0.0"
API_PORT = 8000
API_DEBUG = False
# Cache Settings
CACHE_TYPE = "redis" # "redis" or "memory"
CACHE_TTL = 3600 # 1 hour default
# Rate Limiting
RATE_LIMIT_REQUESTS_PER_MINUTE = 60
# Batch Processing
BATCH_MAX_SIZE = 10
BATCH_MAX_WORKERS = 4
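To illustrate how the two batch settings might interact, the sketch below splits the input into chunks of at most `BATCH_MAX_SIZE` items and runs each chunk on a thread pool capped at `BATCH_MAX_WORKERS`. It is an assumption about the general approach, not the implementation in `src/batch_processor.py`; `chunked`, `run_batches`, and `fake_paraphrase` are hypothetical names, and the stand-in worker avoids any network call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence

BATCH_MAX_SIZE = 10      # same values as the settings above
BATCH_MAX_WORKERS = 4


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batches(
    texts: Sequence[str],
    worker: Callable[[str], str],
    max_size: int = BATCH_MAX_SIZE,
    max_workers: int = BATCH_MAX_WORKERS,
) -> List[str]:
    """Process texts chunk by chunk, with at most `max_workers` concurrent calls."""
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in chunked(texts, max_size):
            # pool.map returns results in input order, so output order matches `texts`.
            results.extend(pool.map(worker, batch))
    return results


def fake_paraphrase(text: str) -> str:
    """Stand-in for a real provider call."""
    return text.upper()


outputs = run_batches([f"text {i}" for i in range(25)], fake_paraphrase)
print(len(outputs), outputs[:2])
```

Threads suit this workload because provider calls are I/O-bound; a real worker would also need to respect the rate limit configured above.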
The system provides comprehensive quality evaluation:
Create a Dockerfile for containerized deployment:
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ ./src/
COPY data/ ./data/
EXPOSE 8000
CMD ["python", "-m", "src.main", "--mode", "api", "--port", "8000"]
The system includes comprehensive monitoring:
# Install development dependencies
.venv/Scripts/python.exe -m uv pip install -r requirements-dev.txt
# Run tests
.venv/Scripts/python.exe -m pytest tests/
# Run with coverage
.venv/Scripts/python.exe -m pytest --cov=src tests/
# Format code
.venv/Scripts/python.exe -m black src/
# Type checking
.venv/Scripts/python.exe -m mypy src/
# Linting
.venv/Scripts/python.exe -m flake8 src/
Enable verbose logging for troubleshooting:
.venv/Scripts/python.exe -m src.main --mode api --verbose
This project is licensed under the MIT-0 License. See the LICENSE file for details.