v0.9.3 — Modular Refactoring

Added

Modular Code Architecture: Complete refactoring into separate, focused modules
batch_processor.py: Batch operations and model management functionality
benchmark_runner.py: Core benchmarking logic and model execution
benchmark_utils.py: Logging, file I/O, and general utility functions
leaderboard_exports.py: Leaderboard export and display functionality
manual_ingestion.py: Manual output ingestion and processing
Enhanced Logging System: Comprehensive timestamped logging functionality
Automatic log file creation in logs/YYYYMMDD-HHMM.txt format (e.g., logs/20251210-0413.txt)
Complete capture of all benchmark output including [SKIP], [RUN-ALL], [TESTING], [DONE], [ERROR], [LIMITED] messages
Dual output system: each message is written to the log file and also displayed in the console (preserving existing behavior)
Automatic logging initialization when script starts
Full leaderboard logging for historical reference
Enhanced audit trail for model run results with timestamps
Advanced Model Management: Enhanced model organization and tracking
move_model_to_skip(): Automatic movement of failed models to skip list
move_model_to_limited(): Automatic movement of rate-limited models to limited list
Rate-limit detection that automatically categorizes rate-limited runs separately from other failures
Enhanced error handling with [LIMITED] status for rate-limited models
Comprehensive Leaderboard System: Separated concerns with dedicated export module
Export functionality moved to leaderboard_exports.py
Enhanced markdown export with medal emojis (🥇🥈🥉) for top 3 performers
CLI table formatting for terminal display
Multiple run support with historical tracking
Automatic LEADERBOARD.md file updates
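
For illustration, the medal-annotated markdown export could look roughly like the sketch below. This is not the code in leaderboard_exports.py: the function name render_leaderboard_markdown, the column layout, and the assumption that a higher score ranks first are all hypothetical.

```python
# Hypothetical sketch of a medal-annotated markdown leaderboard.
# Names and table layout are assumptions, not the project's actual export code.
MEDALS = ["🥇", "🥈", "🥉"]


def render_leaderboard_markdown(results):
    """Render {model name: score} as a markdown table, best score first.

    Assumes a higher score is better.
    """
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    lines = ["| Rank | Model | Score |", "| --- | --- | --- |"]
    for position, (model, score) in enumerate(ranked, start=1):
        # Top three positions get a medal; the rest get their numeric rank.
        rank = MEDALS[position - 1] if position <= len(MEDALS) else str(position)
        lines.append(f"| {rank} | {model} | {score:.2f} |")
    return "\n".join(lines) + "\n"
```

Writing the returned string to LEADERBOARD.md after each run would give the automatic update described above.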

Changed

Code Organization: Complete modular refactoring for better maintainability
Separated concerns across multiple specialized modules
Enhanced import structure with proper module dependencies
Improved code readability and organization
Model File Management: Reorganized model tracking system
models.txt removed (replaced with models_todo.txt and models_skip.txt)
Enhanced model state management with separate files for different statuses
Better separation between active, skipped, and rate-limited models
Error Handling Strategy: Improved error categorization and handling
Rate-limited models automatically moved to models_limited.txt
Failed models automatically moved to models_skip.txt
Enhanced feedback with specific error categories
Better progress tracking with accurate error counting
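
The file-based categorization above can be sketched as follows. This is a minimal illustration, not the project's implementation: the helper names (is_rate_limited, move_model, handle_failure), the rate-limit marker strings, and the choice of [ERROR] as the tag for non-rate-limit failures are assumptions. Only the file names models_todo.txt, models_skip.txt, and models_limited.txt come from these notes.

```python
from pathlib import Path

# Assumption: rate limits are recognized by substrings in the error message.
RATE_LIMIT_MARKERS = ("429", "rate limit", "too many requests")


def is_rate_limited(error_message):
    """Return True if the error message looks like a rate-limit response."""
    lowered = error_message.lower()
    return any(marker in lowered for marker in RATE_LIMIT_MARKERS)


def read_models(path):
    """Read one model name per line, ignoring blank lines."""
    if not path.exists():
        return []
    return [line.strip() for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]


def write_models(path, models):
    path.write_text("".join(f"{name}\n" for name in models), encoding="utf-8")


def move_model(model, source, destination):
    """Remove the model from the source list and add it to the destination list."""
    write_models(source, [name for name in read_models(source) if name != model])
    existing = read_models(destination)
    if model not in existing:
        write_models(destination, existing + [model])


def handle_failure(model, error_message, directory="."):
    """Route a failed model to the limited or skip list and return a status tag."""
    base = Path(directory)
    todo = base / "models_todo.txt"
    if is_rate_limited(error_message):
        move_model(model, todo, base / "models_limited.txt")
        return "[LIMITED]"
    move_model(model, todo, base / "models_skip.txt")
    return "[ERROR]"  # assumed tag for non-rate-limit failures
```

Keeping each status in its own file means a later run can retry models_limited.txt once the rate limit resets without touching models that failed for other reasons.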

Improved

Maintainability: Modular architecture with clear separation of concerns
Debugging & Monitoring: Complete audit trail of benchmark runs for troubleshooting
Performance: Better code organization and optimized imports
User Experience: Enhanced feedback with detailed logging and error categorization
Scalability: Modular design supports easier feature additions and maintenance
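
As a rough picture of the audit trail described in these notes, the dual-output logging with logs/YYYYMMDD-HHMM.txt file names might be sketched like this. The function names init_logging and log, the module-level _log_file variable, and the per-line timestamp format are assumptions for illustration, not the actual benchmark_utils.py API.

```python
from datetime import datetime
from pathlib import Path

_log_file = None  # set by init_logging(); hypothetical module-level state


def init_logging(log_dir="logs", now=None):
    """Create the log directory and choose a logs/YYYYMMDD-HHMM.txt file name."""
    global _log_file
    now = now or datetime.now()
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    _log_file = Path(log_dir) / f"{now:%Y%m%d-%H%M}.txt"
    return _log_file


def log(message):
    """Print the message to the console and append it, timestamped, to the log file."""
    print(message)
    if _log_file is not None:
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        with _log_file.open("a", encoding="utf-8") as handle:
            handle.write(f"{stamp} {message}\n")
```

Calling init_logging() once at startup and routing every status message such as [DONE] or [LIMITED] through log() keeps the console output unchanged while leaving a timestamped record on disk.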