This project uses OpenRouter to iteratively generate paraphrases and refine the generation prompt based on classification feedback.
Core Idea: A loop generates paraphrases for input phrases using a dynamic prompt, classifies them (human vs machine), and then refines the generator prompt for the next iteration based on the classification results.
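The generate, classify, refine cycle described above can be sketched in a few lines. This is a minimal illustration rather than the project's implementation: run_refinement_loop and the three toy callables are hypothetical names, and in the real project the generation and classification steps call the OpenRouter API.

```python
from typing import Callable, List, Tuple

# (input phrase, paraphrase, label) for one generated item.
Result = Tuple[str, str, str]


def run_refinement_loop(
    phrases: List[str],
    initial_prompt: str,
    generate: Callable[[str, str], str],
    classify: Callable[[str], str],
    refine: Callable[[str, List[Result]], str],
    max_iterations: int = 3,
) -> Tuple[str, List[float]]:
    """Generate, classify, and refine for a fixed number of iterations."""
    prompt = initial_prompt
    selection_rates: List[float] = []
    for _ in range(max_iterations):
        results: List[Result] = []
        for phrase in phrases:
            paraphrase = generate(prompt, phrase)
            results.append((phrase, paraphrase, classify(paraphrase)))
        human = sum(1 for _, _, label in results if label == "human")
        selection_rates.append(human / len(results) if results else 0.0)
        # Feed the labelled results back into the prompt for the next round.
        prompt = refine(prompt, results)
    return prompt, selection_rates


# Toy stand-ins (hypothetical) so the sketch runs without any API calls.
def toy_generate(prompt: str, phrase: str) -> str:
    return phrase.upper() if "casual" in prompt else phrase


def toy_classify(text: str) -> str:
    return "human" if text.isupper() else "machine"


def toy_refine(prompt: str, results: List[Result]) -> str:
    if any(label == "machine" for _, _, label in results):
        return prompt + " Be casual."
    return prompt
```

Passing the three steps in as callables keeps the loop itself free of network code, which also makes it straightforward to exercise in tests.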
Default Provider and Model Files:
- ~/.model-openrouter: model id for OpenRouter (default: openrouter/free)

Provider: OpenRouter

Credentials:
- OpenRouter: OPENROUTER_API_KEY or ~/.api-openrouter
- src/: Contains the main source code.
  - main.py: The main script for running the prompt refinement workflow. Handles setup, API key loading, client initialization, data loading, and the main loop execution.
  - prompt_loop.py: Contains the logic for a single iteration of the refinement loop (run_prompt_refinement_iteration), including generation, classification, result processing, and calling the prompt refinement logic.
  - utils.py: Contains helper functions for interacting with the OpenRouter API (generate_paraphrase, classify_paraphrase), API key loading, directory creation, logging setup, mock data generation (generate_mock_paraphrases), and post-processing classifications.
  - config.py: Holds configuration settings (paths, filenames, model details, API key location, prompt templates, loop control parameters) and the placeholder prompt refinement function (refine_generator_prompt).
  - tests/: Contains unit tests.
    - test_gan.py: Unit tests for various components.
    - conftest.py: Defines pytest fixtures.
- data/: Contains data used and generated by the loop.
  - raw/: Input data (e.g., mock_input_phrases.tsv).
  - processed/: Data generated during the loop.
    - selected/: Selected paraphrases classified as 'human' (e.g., selected_paraphrases_1.tsv).
    - prompts/: History of generator prompts used (e.g., generator_prompt_1.txt).
    - loop_results_1.json: JSON file summarizing metrics for each iteration.
- logs/: Stores log files.
- .api-openrouter: (Important/Ignored) File containing your OpenRouter API key (should be placed in your home directory by default, as configured in src/config.py).

Prerequisites:
- pip
- uv (optional, but recommended for environment management)

Clone the repository:
```bash
git clone <repository_url>
cd <repository_directory>
```
Create and activate a virtual environment:
```bash
python -m uv venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
Install dependencies:
```bash
.venv/Scripts/python.exe -m uv pip install -r requirements.txt
.venv/Scripts/python.exe -m uv pip install -r requirements-dev.txt
```
On Linux or macOS the interpreter is .venv/bin/python rather than .venv/Scripts/python.exe.
Set up API Keys (env-first with file fallback):
- OpenRouter: export OPENROUTER_API_KEY="***", or create ~/.api-openrouter containing only the key. The environment variable is checked first; the file is used only as a fallback.
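The env-first lookup with file fallback could look roughly like the sketch below. The function name load_api_key and its parameters are assumptions made for illustration; the project's own loader in src/utils.py may be structured differently.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(
    env_var: str = "OPENROUTER_API_KEY",
    key_file: Optional[Path] = None,
) -> str:
    """Return the key from the environment, else from the key file."""
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    # Fall back to the key file (default location assumed from this README).
    path = key_file if key_file is not None else Path.home() / ".api-openrouter"
    if not path.is_file():
        raise FileNotFoundError(f"API key file not found at {path}")
    key = path.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file {path} is empty")
    return key
```

Stripping whitespace matters here because a key file created with echo ends with a trailing newline.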
Select Model:
- Model resolution: the model id is read from ~/.model-openrouter. To set it, run: echo "openrouter/free" > ~/.model-openrouter
- If the file is absent, the default (openrouter/free) is used.
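A minimal sketch of this model resolution, assuming the file holds nothing but the model id. The name resolve_model and the DEFAULT_MODEL constant are hypothetical; only the file path and the default value come from this README.

```python
from pathlib import Path
from typing import Optional

DEFAULT_MODEL = "openrouter/free"


def resolve_model(
    model_file: Optional[Path] = None,
    default: str = DEFAULT_MODEL,
) -> str:
    """Read the model id from the model file, or return the default."""
    path = model_file if model_file is not None else Path.home() / ".model-openrouter"
    try:
        model_id = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return default
    # Treating an empty file like a missing one is an assumption of this sketch.
    return model_id or default
```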
(Optional) Configure Loop Parameters:
You can override the default loop parameters with environment variables:
```bash
export MAX_ITERATIONS=5          # Default: 10
export MOCK_DATA_SAMPLES=100     # Default: 50
export BATCH_SIZE=10             # Default: 5
export SLEEP_BETWEEN_BATCHES=2   # Default: 1 (seconds)
```
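To illustrate how such overrides might be consumed, the sketch below reads an integer setting with a default and uses BATCH_SIZE and SLEEP_BETWEEN_BATCHES to pace work. The helper names env_int and batches are hypothetical, and falling back to the default on an unparsable value is an assumption; the actual handling lives in src/config.py.

```python
import os
import time
from typing import Iterator, List, Sequence


def env_int(name: str, default: int) -> int:
    """Read an integer environment variable, using the default if unset or invalid."""
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        return int(raw)
    except ValueError:
        # Assumption: an unparsable value silently falls back to the default.
        return default


def batches(
    items: Sequence[str], batch_size: int, sleep_seconds: float = 0.0
) -> Iterator[List[str]]:
    """Yield fixed-size batches, pausing between consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        if start > 0 and sleep_seconds > 0:
            time.sleep(sleep_seconds)
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    batch_size = env_int("BATCH_SIZE", 5)
    pause = env_int("SLEEP_BETWEEN_BATCHES", 1)
    for batch in batches(["a", "b", "c"], batch_size, pause):
        print(batch)
```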
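The main script (src/main.py) orchestrates the workflow:

Initialization:
- Loads input phrases from data/raw/mock_input_phrases.tsv, or generates mock data if the file doesn't exist.

Prompt Refinement Loop:
- Starts from the initial generator prompt defined in src/config.py.
- Runs for the configured number of iterations (loop_control.max_iterations).
- Runs each iteration (run_prompt_refinement_iteration in src/prompt_loop.py):
  - Generates a paraphrase for each input phrase (generate_paraphrase).
  - Classifies each paraphrase as human or machine (classify_paraphrase).
  - Saves the paraphrases classified as 'human' to data/processed/selected/.
  - Saves the iteration metrics to data/processed/.
  - Saves the generator prompt used to data/processed/prompts/.
  - Calls the refine_generator_prompt function (from src/config.py) to potentially modify the generator prompt based on the iteration's results.

To run the loop, execute the following command from the project root directory: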
```bash
# Ensure your virtual environment is activated
python -m src.main
```
The script will run for the configured number of iterations, making calls to the OpenRouter API. Monitor your API usage and costs. Stop with Ctrl+C if needed.
Basic Run with Default Settings:
```bash
python -m src.main
```
Quick Test with Fewer Iterations:
```bash
export MAX_ITERATIONS=2
export MOCK_DATA_SAMPLES=10
python -m src.main
```
Using Custom Input Data:
- Place your input file in data/raw/
- Example: data/raw/custom_phrases.tsv
Analyzing Results: After running, check:
- data/processed/selected/: Selected human-like paraphrases
- data/processed/prompts/: Evolution of generator prompts
- data/processed/loop_results_*.json: Metrics per iteration
- logs/: Detailed execution logs

Selected Paraphrases (selected_paraphrases_*.tsv)
- Higher selection rates indicate better prompt performance

Loop Results (loop_results_*.json)
- total_processed: Total input phrases processed
- total_generated: Successfully generated paraphrases
- generation_rate: Success rate of API generation calls
- selection_rate_of_generated: Percentage of generated text classified as human
- total_selected_human: Final count of human-classified paraphrases
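The relationship between these metrics can be made concrete with a small sketch. The function name summarize_iteration is hypothetical, and the sketch assumes both rates are stored as fractions between 0 and 1; the actual files may store percentages instead.

```python
from typing import Dict, Union


def summarize_iteration(
    total_processed: int,
    total_generated: int,
    total_selected_human: int,
) -> Dict[str, Union[int, float]]:
    """Derive the two rates from the three raw counts."""
    # Guard against division by zero when nothing was processed or generated.
    generation_rate = total_generated / total_processed if total_processed else 0.0
    selection_rate = total_selected_human / total_generated if total_generated else 0.0
    return {
        "total_processed": total_processed,
        "total_generated": total_generated,
        "generation_rate": generation_rate,
        "selection_rate_of_generated": selection_rate,
        "total_selected_human": total_selected_human,
    }
```

Note that the selection rate is taken relative to the generated paraphrases, not the processed inputs, so failed generation calls do not lower it.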
Prompt Evolution (generator_prompt_*.txt)

- Restrict access to the API key file (chmod 600 ~/.api-openrouter)

The src/tests/ directory contains unit tests. Tests should mock API calls so that they run reliably without actual API usage.
To run the tests:
```bash
# Ensure your virtual environment is activated
python -m pytest src/tests/
```
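As an illustration of mocking API calls, the sketch below tests a wrapper against a fake client built with unittest.mock. The wrapper generate_with_client and the client's complete method are hypothetical stand-ins; the real generate_paraphrase in src/utils.py and the actual OpenRouter client interface may differ.

```python
from unittest.mock import Mock


def generate_with_client(client, prompt: str, phrase: str) -> str:
    """Hypothetical wrapper: send prompt and phrase to the client, return clean text."""
    return client.complete(f"{prompt}\n{phrase}").strip()


def test_generate_uses_fake_client():
    # The Mock records calls and returns a canned response, so no network is used.
    fake_client = Mock()
    fake_client.complete.return_value = "  hi there  "

    result = generate_with_client(fake_client, "Paraphrase:", "hello")

    assert result == "hi there"
    fake_client.complete.assert_called_once_with("Paraphrase:\nhello")
```

Because the client is passed in as an argument, the test needs no patching of module globals; if the real helpers create their client internally, unittest.mock.patch would be the closer fit.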
- main.py: Main script, setup, loop orchestration.
- prompt_loop.py: Logic for a single refinement iteration.
- utils.py: API wrappers, helpers, mock data generation.
- config.py: Configuration, initial prompts, refinement logic placeholder.

pandas is used for handling input data and saving results.

FileNotFoundError: API key file not found at ~/.api-openrouter
Solutions:
- Check that the file exists: ls -la ~/.api-openrouter
- Create the file: echo "your-api-key-here" > ~/.api-openrouter
- Or set environment variable: export OPENROUTER_API_KEY="***"
Permission Denied Error
PermissionError: Permission denied for API key file
Solutions:
- chmod 600 ~/.api-openrouter
- Ensure the file is readable by the current user
API Rate Limiting
Solutions:
- Increase the SLEEP_BETWEEN_BATCHES environment variable
- Reduce BATCH_SIZE to process fewer items at once

Empty Generated Text
Solutions:
- Review the generator prompt and model settings in src/config.py

Classification Always Returns 'machine'
To capture the full console output for debugging, run:
```bash
export PYTHONPATH=src
python -m src.main 2>&1 | tee debug.log
```
Monitor your API costs by checking the logs:
- Generation calls are logged with input/output details
- Classification calls are tracked separately
- Total counts are shown in iteration summaries
To start fresh:
```bash
rm -rf data/processed/*
rm -f logs/*.log
```
- Implement config.py:refine_generator_prompt based on iteration results (e.g., analyze rejected phrases, adjust instructions).
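One possible shape for such a refinement step is sketched below. It is only an illustration under stated assumptions: the function name, its signature, the target_rate threshold, and the idea of quoting rejected outputs back to the generator are all hypothetical, and the placeholder in src/config.py may take different arguments.

```python
from typing import Dict, List


def refine_generator_prompt_sketch(
    current_prompt: str,
    metrics: Dict[str, float],
    rejected: List[str],
    target_rate: float = 0.5,
    max_examples: int = 3,
) -> str:
    """Append guidance built from rejected outputs when the selection rate is low."""
    rate = metrics.get("selection_rate_of_generated", 0.0)
    if rate >= target_rate or not rejected:
        # Good enough, or nothing to learn from: keep the prompt unchanged.
        return current_prompt
    examples = "\n".join(f"- {text}" for text in rejected[:max_examples])
    return (
        f"{current_prompt}\n\n"
        "Avoid phrasing similar to these outputs, "
        "which were classified as machine-written:\n"
        f"{examples}"
    )
```

Applied repeatedly, this version keeps appending guidance, so a real implementation would likely need to cap or replace earlier additions.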