Project Description

API-Based Paraphrase Prompt Refinement (OpenRouter)

This project uses OpenRouter to iteratively generate paraphrases and refine the generation prompt based on classification feedback.

Core Idea: A loop generates paraphrases for input phrases using a dynamic prompt, classifies them (human vs machine), and then refines the generator prompt for the next iteration based on the classification results.

Provider: OpenRouter

Model file:
- ~/.model-openrouter: OpenRouter model id (default: openrouter/free)

Credentials:
- OpenRouter: OPENROUTER_API_KEY environment variable, or ~/.api-openrouter as a fallback
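The env-first lookup with a file fallback could be sketched as below. This is a minimal illustration, not the project's code: the helper names load_api_key and resolve_model are hypothetical, and the real helpers in src/utils.py and src/config.py may be named and structured differently.

```python
import os
from pathlib import Path

DEFAULT_MODEL = "openrouter/free"


def load_api_key(key_file: Path = Path.home() / ".api-openrouter") -> str:
    """Hypothetical helper: prefer OPENROUTER_API_KEY, then fall back to the key file."""
    env_key = os.environ.get("OPENROUTER_API_KEY", "").strip()
    if env_key:
        return env_key
    if not key_file.exists():
        raise FileNotFoundError(f"API key file not found at {key_file}")
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file {key_file} is empty")
    return key


def resolve_model(model_file: Path = Path.home() / ".model-openrouter") -> str:
    """Hypothetical helper: use the model id from the model file, else the default."""
    if model_file.exists():
        model_id = model_file.read_text(encoding="utf-8").strip()
        if model_id:
            return model_id
    return DEFAULT_MODEL
```

Stripping whitespace matters because a file created with echo ends in a newline.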

Project Structure

src/: Contains the main source code.
- main.py: The main script for running the prompt refinement workflow. Handles setup, API key loading, client initialization, data loading, and the main loop execution.
- prompt_loop.py: Contains the logic for a single iteration of the refinement loop (run_prompt_refinement_iteration), including generation, classification, result processing, and calling the prompt refinement logic.
- utils.py: Contains helper functions for interacting with the OpenRouter API (generate_paraphrase, classify_paraphrase), API key loading, directory creation, logging setup, mock data generation (generate_mock_paraphrases), and post-processing classifications.
- config.py: Holds configuration settings (paths, filenames, model details, API key location, prompt templates, loop control parameters) and the placeholder prompt refinement function (refine_generator_prompt).
- tests/: Contains unit tests.
  - test_gan.py: Unit tests for various components.
  - conftest.py: Defines pytest fixtures.
data/: Contains data used and generated by the loop.
- raw/: Input data (e.g., mock_input_phrases.tsv).
- processed/: Data generated during the loop.
  - selected/: Selected paraphrases classified as 'human' (e.g., selected_paraphrases_1.tsv).
  - prompts/: History of generator prompts used (e.g., generator_prompt_1.txt).
  - loop_results_1.json: JSON file summarizing metrics for each iteration.
logs/: Stores log files.
.api-openrouter: (Important; git-ignored) File containing your OpenRouter API key. By default it is read from your home directory (~/.api-openrouter), as configured in src/config.py.

Setup and Installation

Prerequisites:
- Python 3.8 or higher
- pip
- Git
- uv (optional, but recommended for environment management)
- OpenRouter API Key: Obtain an API key from OpenRouter (https://openrouter.ai/).
Clone the repository:

```bash
git clone <repository_url>
cd <repository_directory>
```
Create and activate a virtual environment:

```bash
# Using uv (recommended)
# Ensure uv is installed (e.g., pip install uv or python -m pip install uv)
python -m uv venv .venv

# Activate:
# Linux/macOS: source .venv/bin/activate
# Windows: .venv\Scripts\activate

# OR using standard venv
python -m venv .venv

# Activate:
# Linux/macOS: source .venv/bin/activate
# Windows: .venv\Scripts\activate
```
Install dependencies (these commands use the Windows interpreter path; on Linux/macOS use .venv/bin/python instead):

```bash
.venv/Scripts/python.exe -m uv pip install -r requirements.txt

# For contributors/CI
.venv/Scripts/python.exe -m uv pip install -r requirements-dev.txt
```
Set up API keys (the environment variable is checked first; the key file is the fallback):
OpenRouter: export OPENROUTER_API_KEY="***", or create ~/.api-openrouter containing only the key.
Select model: write the model id to ~/.model-openrouter, e.g. echo "openrouter/free" > ~/.model-openrouter. If the file is absent, the default model (openrouter/free) is used.
(Optional) Configure loop parameters: you can override the default loop parameters with environment variables:

```bash
export MAX_ITERATIONS=5          # Default: 10
export MOCK_DATA_SAMPLES=100     # Default: 50
export BATCH_SIZE=10             # Default: 5
export SLEEP_BETWEEN_BATCHES=2   # Default: 1 (seconds)
```
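Reading these overrides could look roughly like the sketch below. LoopControl and env_number are hypothetical names; only the variable names and defaults come from the list above, and the project's actual handling in src/config.py may differ (for example, in whether SLEEP_BETWEEN_BATCHES accepts fractional seconds).

```python
import os
from dataclasses import dataclass
from typing import Callable


def env_number(name: str, default: float, cast: Callable[[str], float] = int) -> float:
    """Return the environment variable `name` converted with `cast`, or `default` if unset or empty."""
    raw = os.environ.get(name, "").strip()
    return cast(raw) if raw else default


@dataclass(frozen=True)
class LoopControl:
    """Hypothetical loop settings; the defaults mirror the documented ones."""
    max_iterations: int = 10
    mock_data_samples: int = 50
    batch_size: int = 5
    sleep_between_batches: float = 1.0  # seconds

    @classmethod
    def from_env(cls) -> "LoopControl":
        return cls(
            max_iterations=env_number("MAX_ITERATIONS", cls.max_iterations, int),
            mock_data_samples=env_number("MOCK_DATA_SAMPLES", cls.mock_data_samples, int),
            batch_size=env_number("BATCH_SIZE", cls.batch_size, int),
            sleep_between_batches=env_number("SLEEP_BETWEEN_BATCHES", cls.sleep_between_batches, float),
        )
```

Calling loop_control = LoopControl.from_env() once at startup would give an object shaped like the loop_control.max_iterations setting mentioned under Usage.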

Usage

Running the Prompt Refinement Loop

The main script (src/main.py) orchestrates the workflow:

Initialization:
- Creates necessary directories.
- Loads the OpenRouter API key.
- Configures the API client.
- Loads input phrases from data/raw/mock_input_phrases.tsv or generates mock data if the file doesn't exist.
Prompt Refinement Loop:
- Starts with the initial generator prompt defined in src/config.py.
- Runs for a configured number of iterations (loop_control.max_iterations).
- In each iteration (run_prompt_refinement_iteration in src/prompt_loop.py):
  - Processes input phrases in batches.
  - For each phrase:
    - Calls the OpenRouter API using the current generator prompt to generate a paraphrase (generate_paraphrase).
    - If generation succeeds, calls the OpenRouter API using the classification prompt to classify the paraphrase as 'human' or 'machine' (classify_paraphrase).
  - Collects all results (input, generated text, classification).
  - Filters the results to get pairs classified as 'human'.
  - Saves the selected pairs to a TSV file in data/processed/selected/.
  - Calculates and logs summary metrics for the iteration (generation rate, selection rate, etc.). Saves summary to a JSON file in data/processed/.
  - Saves the generator prompt used for the current iteration to data/processed/prompts/.
  - Calls the refine_generator_prompt function (from src/config.py) to potentially modify the generator prompt based on the iteration's results.
  - Uses the (potentially) refined prompt for the next iteration.
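The iteration described above can be outlined as follows. This is a hypothetical sketch, not the contents of src/prompt_loop.py: generate, classify and refine are injected callables standing in for generate_paraphrase, classify_paraphrase and refine_generator_prompt, whose real signatures may differ, and saving the TSV, JSON and prompt files is omitted.

```python
import time
from typing import Callable, Dict, List, Optional

Result = Dict[str, Optional[str]]


def run_iteration_sketch(phrases: List[str], prompt: str,
                         generate: Callable[[str, str], Optional[str]],
                         classify: Callable[[str], str],
                         batch_size: int = 5, sleep_between_batches: float = 1.0) -> List[Result]:
    """One pass over the inputs: generate, then classify only successful generations."""
    results: List[Result] = []
    for start in range(0, len(phrases), batch_size):
        for phrase in phrases[start:start + batch_size]:
            paraphrase = generate(prompt, phrase)
            label = classify(paraphrase) if paraphrase else None
            results.append({"input_text": phrase, "generated_text": paraphrase, "classification": label})
        if start + batch_size < len(phrases):
            time.sleep(sleep_between_batches)  # throttle between batches, not after the last one
    return results


def run_loop_sketch(phrases: List[str], initial_prompt: str,
                    generate: Callable[[str, str], Optional[str]],
                    classify: Callable[[str], str],
                    refine: Callable[[str, List[Result]], str],
                    max_iterations: int = 10, **batch_options) -> List[Dict[str, object]]:
    """Run the iterations, keeping 'human' results and feeding all results into `refine`."""
    prompt = initial_prompt
    history: List[Dict[str, object]] = []
    for iteration in range(1, max_iterations + 1):
        results = run_iteration_sketch(phrases, prompt, generate, classify, **batch_options)
        selected = [r for r in results if r["classification"] == "human"]
        # The project also saves the selected pairs, metrics and prompt here; omitted in this sketch.
        history.append({"iteration": iteration, "prompt": prompt, "selected": selected})
        prompt = refine(prompt, results)
    return history
```

Passing the API calls in as callables keeps the loop testable with stubs, without network access.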

To run the loop, execute the following command from the project root directory:

```bash
# Ensure your virtual environment is activated
python -m src.main
```

The script will run for the configured number of iterations, making calls to the OpenRouter API. Monitor your API usage and costs. Stop with Ctrl+C if needed.

Example Usage

Basic run with default settings:

```bash
python -m src.main
```

Quick test with fewer iterations:

```bash
export MAX_ITERATIONS=2
export MOCK_DATA_SAMPLES=10
python -m src.main
```

Using custom input data:
- Place a TSV file with an 'input_text' column in data/raw/ (e.g., data/raw/custom_phrases.tsv).
- The file must be tab-separated with a header row.
- By default the loop reads data/raw/mock_input_phrases.tsv, so either use that filename or update the input filename in src/config.py.

Analyzing results: after a run, check:
- data/processed/selected/: selected human-like paraphrases
- data/processed/prompts/: evolution of the generator prompts
- data/processed/loop_results_*.json: metrics per iteration
- logs/: detailed execution logs
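Before pointing the loop at a custom file, you can check that it matches the expected format. The project uses pandas for input data; this hypothetical read_input_phrases helper uses the standard-library csv module instead so it runs without extra dependencies, and its choice to skip blank rows may differ from the project's own loader.

```python
import csv
from pathlib import Path
from typing import List


def read_input_phrases(path: Path, column: str = "input_text") -> List[str]:
    """Read a tab-separated file with a header row and return the non-empty values of `column`."""
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"{path} has no '{column}' column in its header row")
        # Short rows yield None for missing fields, so guard before stripping.
        return [row[column].strip() for row in reader if (row[column] or "").strip()]
```

A ValueError here usually means the header row is missing or the file is comma-separated rather than tab-separated.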

Understanding Outputs

Selected Paraphrases (selected_paraphrases_*.tsv)
- Contains input phrases paired with the paraphrases the classifier labeled 'human'.
- Use this data to evaluate prompt effectiveness.
- Higher selection rates mean the prompt more often produces text that the classifier judges human-like.

Loop Results (loop_results_*.json)
- total_processed: total input phrases processed
- total_generated: paraphrases generated successfully
- generation_rate: success rate of the API generation calls
- selection_rate_of_generated: percentage of generated paraphrases classified as 'human'
- total_selected_human: final count of human-classified paraphrases

Prompt Evolution (generator_prompt_*.txt)
- Shows how the generator prompt changes over iterations.
- Use it to judge whether the refinement strategy is working.
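The metric definitions above can be made concrete with a small sketch. summarize_iteration and save_summary are hypothetical helpers; the result dictionaries assume the field names generated_text and classification, and the rates are computed as fractions between 0 and 1, whereas the project may store them as percentages.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def summarize_iteration(results: List[Dict[str, Optional[str]]]) -> Dict[str, float]:
    """Compute the documented summary fields from one iteration's results."""
    total_processed = len(results)
    total_generated = sum(1 for r in results if r.get("generated_text"))
    total_selected_human = sum(1 for r in results if r.get("classification") == "human")
    return {
        "total_processed": total_processed,
        "total_generated": total_generated,
        # Guard against division by zero on empty runs.
        "generation_rate": total_generated / total_processed if total_processed else 0.0,
        "selection_rate_of_generated": total_selected_human / total_generated if total_generated else 0.0,
        "total_selected_human": total_selected_human,
    }


def save_summary(summary: Dict[str, float], out_dir: Path, iteration: int) -> Path:
    """Write the summary as loop_results_<iteration>.json and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"loop_results_{iteration}.json"
    path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return path
```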

Security Notes

API Keys: Never commit API keys to version control. Use environment variables or secure files.
File Permissions: Keep API key files private (chmod 600 ~/.api-openrouter)
Network Security: API calls are made over HTTPS, but monitor for sensitive data in logs
Cost Management: Set iteration limits and monitor usage to control API costs
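To create the key file from Python with the same restriction as chmod 600, a POSIX-oriented sketch could look like this. write_key_file and key_file_is_private are hypothetical helpers, not part of the project, and Windows does not honor these permission bits in the same way.

```python
import os
import stat
from pathlib import Path


def write_key_file(key: str, path: Path = Path.home() / ".api-openrouter") -> None:
    """Write the API key so that only the owner can read or write it (mode 600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(key.strip() + "\n")
    # The mode passed to os.open only applies on creation, so tighten existing files too.
    os.chmod(path, 0o600)


def key_file_is_private(path: Path) -> bool:
    """True when neither group nor others have any permission bits on the file."""
    mode = path.stat().st_mode
    return not (mode & (stat.S_IRWXG | stat.S_IRWXO))
```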

Testing

The src/tests/ directory contains unit tests. Tests should mock API calls so they run reliably without actual API usage.

To run the tests:

```bash
# Ensure your virtual environment is activated
python -m pytest src/tests/
```
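A test in this spirit could live in src/tests/. The sketch mocks the client with unittest.mock.MagicMock so no request leaves the machine; generate_paraphrase_sketch and its client.complete method are hypothetical stand-ins, so adapt the mock to how src/utils.py actually calls OpenRouter. A fake client like this would also be a natural pytest fixture in conftest.py.

```python
from typing import Optional
from unittest.mock import MagicMock


def generate_paraphrase_sketch(client, prompt_template: str, phrase: str) -> Optional[str]:
    """Hypothetical stand-in for utils.generate_paraphrase; `client.complete` is an assumed method."""
    reply = client.complete(prompt_template.format(phrase=phrase))
    text = (reply or "").strip()
    return text or None  # treat empty or whitespace-only replies as a failed generation


def test_generation_uses_prompt_without_network():
    fake_client = MagicMock()
    fake_client.complete.return_value = "  A reworded sentence.\n"
    result = generate_paraphrase_sketch(fake_client, "Paraphrase: {phrase}", "Some sentence.")
    assert result == "A reworded sentence."
    fake_client.complete.assert_called_once_with("Paraphrase: Some sentence.")


def test_blank_reply_counts_as_failure():
    fake_client = MagicMock()
    fake_client.complete.return_value = "   "
    assert generate_paraphrase_sketch(fake_client, "{phrase}", "x") is None
```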

Core Components

main.py: Main script, setup, loop orchestration.
prompt_loop.py: Logic for a single refinement iteration.
utils.py: API wrappers, helpers, mock data generation.
config.py: Configuration, initial prompts, refinement logic placeholder.
pandas: Used for handling input data and saving results.

Troubleshooting

Common Issues

API Key Not Found
Error: FileNotFoundError: API key file not found at ~/.api-openrouter
Solutions:
- Ensure your API key file exists: ls -la ~/.api-openrouter
- Create the file: echo "your-api-key-here" > ~/.api-openrouter
- Or set the environment variable: export OPENROUTER_API_KEY="***"
Permission Denied
Error: PermissionError: Permission denied for API key file
Solutions:
- Fix file permissions: chmod 600 ~/.api-openrouter
- Ensure the file is owned by, and readable by, the current user
API Rate Limiting
Symptom: Getting 429 errors or requests timing out
Solutions:
- Increase SLEEP_BETWEEN_BATCHES environment variable
- Reduce BATCH_SIZE to process fewer items at once
- Check your API provider's rate limits and quota
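If 429 responses persist after tuning the batch settings, retrying with exponential backoff is a common remedy, in line with the error-handling improvement listed under Further Improvements. The sketch is generic: RateLimitError is a hypothetical exception that your API wrapper would need to raise on HTTP 429, and the delay values are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception an API wrapper would raise on HTTP 429."""


def call_with_backoff(fn: Callable[[], T], max_retries: int = 5, base_delay: float = 1.0,
                      max_delay: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with capped exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out concurrent retries
    raise AssertionError("unreachable")
```

Usage would wrap each API call, e.g. call_with_backoff(lambda: generate(prompt, phrase)); the injectable sleep makes the behavior testable without real waiting.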
Empty Generated Text
Symptom: Paraphrases are empty strings
Solutions:
- Check the generator prompt template in src/config.py
- Ensure the model has sufficient context to generate meaningful text
- Try a different model by changing the model id in ~/.model-openrouter
Classification Always Returns 'machine'
Symptom: All paraphrases are classified as machine-generated
Solutions:
- Review the classification prompt template
- Try adjusting the prompt to be more specific about what constitutes "human-like" text
- Consider using a different model for classification
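It may also be worth checking how the model's raw reply is mapped to a label. If the post-processing step compares replies strictly (an assumption; the actual logic is in src/utils.py), verbose replies such as "Human-written." could fall through to 'machine'. A more tolerant mapping could look like this hypothetical parse_label, which returns None for unparseable replies so they can be counted separately instead of silently becoming 'machine'.

```python
import re
from typing import Optional

# Match the first standalone occurrence of either label, case-insensitively.
_LABEL_PATTERN = re.compile(r"\b(human|machine)\b", re.IGNORECASE)


def parse_label(reply: Optional[str]) -> Optional[str]:
    """Return 'human' or 'machine' from the first label word in the reply, else None."""
    match = _LABEL_PATTERN.search(reply or "")
    return match.group(1).lower() if match else None
```

Because it takes the first label word it finds, a reply such as "not human" still maps to 'human', so it works best with a classification prompt that asks for a single word.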

Debug Mode

To keep a complete record of a run, set PYTHONPATH and mirror all console output (stdout and stderr) into debug.log:

export PYTHONPATH=src
python -m src.main 2>&1 | tee debug.log
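For more detail than the default output, one option is Python's standard logging module at DEBUG level, writing to both the console and a file under logs/. This is a sketch only: setup_debug_logging and the LOG_LEVEL variable are hypothetical, and the project's actual logging setup lives in src/utils.py.

```python
import logging
import os
import sys
from pathlib import Path


def setup_debug_logging(log_dir: Path = Path("logs"), filename: str = "debug.log") -> logging.Logger:
    """Send log records at LOG_LEVEL (default DEBUG) to stderr and to log_dir/filename."""
    level = logging.getLevelName(os.environ.get("LOG_LEVEL", "DEBUG").upper())
    if not isinstance(level, int):
        level = logging.DEBUG  # unknown level names fall back to DEBUG
    log_dir.mkdir(parents=True, exist_ok=True)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=[logging.StreamHandler(sys.stderr),
                  logging.FileHandler(log_dir / filename, encoding="utf-8")],
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return logging.getLogger()
```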

Checking API Usage

Monitor your API costs by checking the logs:
- Generation calls are logged with input/output details.
- Classification calls are tracked separately.
- Total counts are shown in iteration summaries.

Resetting the Loop

To start fresh:

rm -rf data/processed/*
rm -f logs/*.log

Further Improvements

Implement Prompt Refinement Logic: Develop actual strategies in config.py:refine_generator_prompt based on iteration results (e.g., analyze rejected phrases, adjust instructions).
Error Handling: Enhance error handling for API calls (rate limits, specific errors).
Input Data: Use more diverse and realistic input phrases instead of basic mock data.
Advanced Classification: Improve the classification prompt or use more robust methods if simple 'human'/'machine' is insufficient.
Metrics: Track more detailed metrics (e.g., semantic similarity between input and selected paraphrase).
Testing: Implement comprehensive tests with API mocking.
Configuration: Move prompts to separate files for easier management.
Cost Management: Add checks or limits based on estimated API costs.
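As a starting point for real refinement logic, a simple threshold-based strategy might look like the hypothetical refine_generator_prompt_sketch below. Its signature, the 0.5 target rate and the hint wording are assumptions, not the placeholder's actual interface in src/config.py. It appends a style hint and a few rejected outputs when too few generations pass as human, and otherwise leaves the prompt unchanged.

```python
from typing import Dict, List, Optional

# Hypothetical hint text; tune it for your own experiments.
HUMAN_STYLE_HINT = "Write naturally, as a person would: vary sentence length and avoid stock phrasing."


def refine_generator_prompt_sketch(prompt: str, results: List[Dict[str, Optional[str]]],
                                   target_rate: float = 0.5, max_examples: int = 3) -> str:
    """Append guidance when the share of 'human' labels among generated texts is below target."""
    generated = [r for r in results if r.get("generated_text")]
    if not generated or HUMAN_STYLE_HINT in prompt:
        return prompt  # nothing to learn from, or guidance was already added
    human = sum(1 for r in generated if r.get("classification") == "human")
    if human / len(generated) >= target_rate:
        return prompt
    rejected = [r["generated_text"] for r in generated if r.get("classification") == "machine"]
    # Escape braces in case the template is later filled with str.format (an assumption).
    examples = [text.replace("{", "{{").replace("}", "}}") for text in rejected[:max_examples]]
    refined = f"{prompt}\n\n{HUMAN_STYLE_HINT}"
    if examples:
        refined += "\nAvoid outputs that read like these:\n" + "\n".join(f"- {e}" for e in examples)
    return refined
```

Adding the hint only once keeps the prompt from growing without bound across iterations; a fuller strategy might instead rewrite the instructions based on which phrases were rejected.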

paraphrase-gan
