paraphrase-back-translate

Project Description

Back Translate CLI (OpenRouter Version)

This project performs back-translation for paraphrase generation using the OpenRouter API. It takes text files from an input pool, translates them (e.g., English to French), then translates the result back (e.g., French to English) over a specified number of cycles using a specified OpenRouter model.

Files

Model Information

This version uses OpenRouter for translation. You can specify the model name via the command line (default: openrouter/free). Access requires an OpenRouter API key.

Installation

It is recommended to use a virtual environment. This project uses uv for environment and package management.

  1. Create virtual environment: bash python -m uv venv .venv
  2. Activate environment: bash # On Windows bash shells like Git Bash source .venv/Scripts/activate # On Linux/macOS # source .venv/bin/activate
  3. Install tooling into the venv (if needed) and dependencies: bash .venv/Scripts/python.exe -m ensurepip .venv/Scripts/python.exe -m pip install uv # Runtime only .venv/Scripts/python.exe -m uv pip install -r requirements.txt # Or for contributors/CI .venv/Scripts/python.exe -m uv pip install -r requirements-dev.txt

Setup

  1. API Key:

    • Preferred: set OPENROUTER_API_KEY in your environment.
    • Fallback: save the key in a plain text file at ~/.api-openrouter. You can specify a different path using the --api-key-path argument.
    • Security: keep this secret and out of version control.
  2. Directory Structure: Before running, create the necessary directory structure for input/output files (relative to where you run the command): ./data/pooling/ ├── input_pool/ # Place initial English .txt files here ├── french_pool/ # Initially empty, will hold en->fr results ├── output_pool/ # Initially empty, will hold fr->en results (final paraphrases) ├── input_pool_completed/ # Processed English files moved here └── french_pool_completed/ # Processed French files moved here ./logs/ # Directory for log files (created automatically)

Usage

Run the CLI from the project root directory using the Python interpreter from the virtual environment:

# Example: Run 2 back-translation cycles starting with en->fr, using defaults
.venv/Scripts/python.exe -m src.main --cycles 2 --translation-type en_to_fr

# Example: Run 1 cycle, fr->en, specifying directories and model
.venv/Scripts/python.exe -m src.main --cycles 1 --translation-type fr_to_en --pooling-dir ./my_data --log-dir ./my_logs --model openrouter/free --api-key-path /path/to/my/key

Arguments:

The script will randomly select a file from the appropriate input pool (input_pool for en_to_fr, french_pool for fr_to_en), translate it using the OpenRouter API, move the original to the corresponding _completed directory, and place the translated file in the output pool (french_pool for en_to_fr, output_pool for fr_to_en). This process repeats, alternating the translation direction for the specified number of cycles. Log messages indicating progress and any errors will be printed to the console and saved in the file specified by --log-dir (default: ./logs/backtranslate.log).

Dependencies

Two requirement files are provided to balance consumer flexibility and contributor reproducibility:

Install patterns:

# Runtime only
.venv/Scripts/python.exe -m uv pip install -r requirements.txt
# Development/CI
.venv/Scripts/python.exe -m uv pip install -r requirements-dev.txt
# Run tests
.venv/Scripts/python.exe -m pytest -q

Source Code

GitHub repository