# Paradisomatch Property Analysis System

A comprehensive property analysis system for evaluating rural properties across Europe using AI-powered analysis and custom objective criteria.

## Quick Start

### Prerequisites
- Python 3.9+
- OpenAI API key
- Properstar account credentials

### Installation
```bash
pip install -r requirements.txt
```

### Configuration
1. Create a `.env` file containing your OpenAI API key:
   ```
   OPENAI_API_KEY=your_key_here
   ```
2. Update `config.json` with your Properstar credentials
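
As a rough sketch of how a script might read these two settings: the example below assumes that `config.json` stores the credentials under a hypothetical `properstar` object with `username` and `password` keys (the real key names in this project may differ), and that the `.env` value has already been exported into the environment, for example with a tool such as python-dotenv.

```python
import json
import os
from pathlib import Path


def load_settings(config_path="config.json", env=os.environ):
    """Return the OpenAI key and Properstar credentials, or raise if missing."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; check your .env file")

    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # "properstar", "username" and "password" are assumed key names.
    creds = config.get("properstar", {})
    missing = [key for key in ("username", "password") if not creds.get(key)]
    if missing:
        raise RuntimeError(f"config.json is missing Properstar fields: {missing}")
    return {"openai_api_key": api_key, "properstar": creds}
```

Failing early with a clear message is usually easier to debug than a later authentication error from the scraper or the OpenAI client.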

### Launch System
```bash
# Start API server
python3 criteria_api.py

# Open web interface (`open` is macOS-specific; on Linux use `xdg-open`)
open http://localhost:5002/criteria_manager.html
```

### Importing Favorites from FrenchEstateAgents.com (optional)
This step is additive: it leaves the Properstar flow unchanged and only appends new URLs to `extracted_property_urls.csv`.
```bash
# From the repo root (adjust the paths if you run from scraper/); quote any paths that contain spaces:
venv/bin/python3.14 scraper/import_french_favorites.py --keep-browser-open --wait-seconds 20
# Once your favorites are visible in Chromium, press Enter at the terminal prompts.
# The script saves french_favorites.csv and appends new URLs into extracted_property_urls.csv.
```
Then continue with the usual pipeline (`analyze_from_urls.py`, `parse_criteria.py`, `quality_gate.py`).

### Objective Risk Profiling (optional)
Adds a deterministic risk band (Low/Med/High) based on distance to the nearest hospital, looked up via the Overpass API (OpenStreetMap).
```bash
../venv/bin/python3.14 risk_features.py
# Re-run quality_gate.py to see updated risk coverage
```
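
The banding idea can be illustrated with a minimal sketch. This is not the code in `risk_features.py`: the haversine formula is the standard great-circle distance, while the 15 km and 40 km thresholds and both function names are illustrative assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def risk_band(distance_km, low_max_km=15.0, med_max_km=40.0):
    """Map hospital distance to Low/Med/High (thresholds are assumed values)."""
    if distance_km <= low_max_km:
        return "Low"
    if distance_km <= med_max_km:
        return "Med"
    return "High"
```

Because the band depends only on coordinates and fixed thresholds, re-running it on the same data gives the same result, which is what makes this kind of score deterministic.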

## System Architecture

### Core Components

**1. Criteria Manager** (`criteria_manager.html`)
- Web-based admin interface
- Manage GPT and custom criteria weights
- Monitor system status and pipeline progress
- Trigger analysis and scraping operations

**2. Map Viewer** (`map_viewer_advanced.html`)
- Interactive property map with filtering
- Property details and analysis scores
- Favorites management
- Export functionality

**3. Analysis Pipeline**
- `analyze_from_urls_optimized.py` - GPT-4o-mini analysis (about 70% cheaper than GPT-3.5-turbo)
- `custom_criteria.py` - Objective data-driven scoring
- Intelligent caching to avoid re-analysis

**4. Data Collection**
- `favorites_scraper.py` - Scrape favorites from Properstar
- `check_availability.py` - Monitor property availability
- `bulletproof_geocoding.py` - GPS coordinate extraction

### Data Flow

```
Properstar Favorites
    ↓
favorites_scraper.py → enriched_data.json
    ↓
analyze_from_urls_optimized.py (GPT-4o-mini)
    ↓
custom_criteria.py (objective data)
    ↓
Map Viewer (filtered results)
```

## Key Features

### 1. Dual Scoring System
- **GPT Score**: AI analysis of 6 subjective criteria (market garden, guest accommodation, workshop, rental units, location, local market)
- **Custom Score**: Objective data (rainfall, temperature, airport distance, population density)
- **Overall Score**: Weighted combination of both scores
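
A weighted combination of this kind can be sketched as follows. The default 60/40 split and the function name are assumptions for illustration; the weights actually applied are whatever is configured in the Criteria Manager.

```python
def overall_score(gpt_score, custom_score, gpt_weight=0.6, custom_weight=0.4):
    """Weighted average of the two scores; weights need not sum to 1."""
    total = gpt_weight + custom_weight
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return (gpt_score * gpt_weight + custom_score * custom_weight) / total
```

Dividing by the sum of the weights keeps the result on the same scale as the inputs, so slider values do not have to add up to exactly 1.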

### 2. Intelligent Caching
- Content-based caching prevents re-analysis
- Only analyzes properties with changes or missing scores
- Saves ~99% of API costs on re-runs
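
Content-based caching is commonly implemented by hashing the listing text and skipping any property whose hash is unchanged. The sketch below shows that general idea only; the cache layout, field names, and helper functions are hypothetical and may not match `analyze_from_urls_optimized.py`.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a listing's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_analysis(url, text, cache):
    """True if the listing is new, has no stored score, or its text changed."""
    entry = cache.get(url)
    if entry is None or entry.get("score") is None:
        return True
    return entry.get("hash") != content_hash(text)


def record_result(url, text, score, cache):
    """Store the score together with the hash of the text it was computed from."""
    cache[url] = {"hash": content_hash(text), "score": score}
```

With a check like this in front of every API call, a re-run over unchanged listings makes no requests at all, which is where the savings on re-runs come from.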

### 3. Cost Optimization
- GPT-4o-mini: 70% cheaper than GPT-3.5-turbo
- Optimized prompts: 30% token savings
- Typical cost: $0.048 for 89 properties (first run), $0.00 (cached)

### 4. Real-time Progress Tracking
- Live progress updates in UI
- Job status monitoring
- Background processing with logging

## Documentation

### User Guides
- **[Workflow Guide](DOCS/WORKFLOW_GUIDE.md)** - Complete system workflow
- **[Automation Guide](DOCS/AUTOMATION_GUIDE.md)** - Automated scraping and analysis
- **[Availability Checker](DOCS/AVAILABILITY_CHECKER_GUIDE.md)** - Monitor property status

### Technical Documentation
- **[GPT Model Analysis](DOCS/GPT_MODEL_ANALYSIS.md)** - Model comparison and cost optimization
- **[Custom Criteria Guide](DOCS/CUSTOM_CRITERIA_GUIDE.md)** - Objective scoring system
- **[Scoring Flow](DOCS/SCORING_FLOW.md)** - How scores are calculated
- **[Testing Guide](DOCS/TESTING_GUIDE.md)** - Quality assurance

### Reference
- **[API Endpoints](DOCS/WORKFLOW_GUIDE.md#api-endpoints)** - All available API routes
- **[File Structure](#file-structure)** - Project organization

## File Structure

### Core Python Scripts
```
analyze_from_urls_optimized.py  # GPT analysis with caching
custom_criteria.py              # Objective data scoring
criteria_api.py                 # Flask API server
favorites_scraper.py            # Scrape Properstar favorites
check_availability.py           # Property availability monitoring
bulletproof_geocoding.py        # GPS extraction
auto_workflow.py                # Automated pipeline
```

### Web Interfaces
```
criteria_manager.html           # Admin dashboard
map_viewer_advanced.html        # Interactive map viewer
```

### Data Files
```
enriched_data.json             # Master property database
extracted_property_urls.csv    # Property URLs to analyze
analysis_output.csv            # GPT analysis results
config.json                    # System configuration
gpt_cost_tracker.json          # Cost tracking
```

## Common Operations

### Analyze Properties
```bash
# Via UI: Click "Analyze Only" in Criteria Manager
# Via CLI:
python3 analyze_from_urls_optimized.py
```

### Scrape New Favorites
```bash
# Via UI: Click "Scrape Favorites" in Criteria Manager
# Via CLI:
python3 favorites_scraper.py
```

### Check Availability
```bash
# Via UI: Click "Check Availability" in Criteria Manager
# Via CLI:
python3 check_availability.py
```

### Update Criteria Weights
1. Open Criteria Manager
2. Adjust sliders in GPT or Custom Criteria tabs
3. Click "Save & Recalculate"

## Performance & Costs

### Analysis Performance
- **First run**: ~2-5 minutes for 89 properties
- **Cached run**: <1 minute (0 API calls)
- **Cache hit rate**: 99%+ on subsequent runs

### Cost Breakdown (GPT-4o-mini)
- **Input**: $0.15 per 1M tokens
- **Output**: $0.60 per 1M tokens
- **Typical property**: ~$0.0005 per analysis
- **89 properties**: ~$0.048 (first run), $0.00 (cached)
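
These figures can be reproduced with simple arithmetic. In this sketch the per-token prices come from the list above, while the token counts (2,000 input and 400 output per property) are assumed round numbers chosen to be consistent with the ~$0.0005 figure, not measurements.

```python
INPUT_USD_PER_M = 0.15   # price per 1M input tokens, from the list above
OUTPUT_USD_PER_M = 0.60  # price per 1M output tokens, from the list above


def analysis_cost(input_tokens, output_tokens):
    """USD cost of one GPT-4o-mini call at the listed prices."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


per_property = analysis_cost(2_000, 400)  # assumed token counts
first_run = per_property * 89
print(f"per property: ${per_property:.5f}, 89 properties: ${first_run:.3f}")
```

Under those assumptions the cost works out to about $0.00054 per property and $0.048 for 89 properties.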

**Annual savings vs GPT-3.5**: ~$252 at current volume

## Troubleshooting

### API Server Not Responding
```bash
# Check whether anything is listening on port 5002 (prints the PID if so)
lsof -ti:5002

# Restart
pkill -f criteria_api.py
python3 criteria_api.py &
```
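
If `lsof` is not available, a quick reachability check can be done from Python. This is a generic TCP probe using only the standard library; it confirms that something is listening on port 5002, not that the listener is `criteria_api.py`.

```python
import socket


def port_is_open(host="localhost", port=5002, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("API server reachable" if port_is_open() else "Nothing listening on port 5002")
```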

### Properties Not Analyzing
1. Check that the API server is running
2. Verify OpenAI API key in `.env`
3. Check logs: `/tmp/paradisomatch_job_*.log`
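
To find the most recent job log quickly, a small helper along these lines may be useful. The glob pattern is the one given above; the function names and the 20-line default are illustrative choices.

```python
import glob
import os


def latest_log(pattern="/tmp/paradisomatch_job_*.log"):
    """Return the most recently modified log matching the pattern, or None."""
    paths = glob.glob(pattern)
    return max(paths, key=os.path.getmtime) if paths else None


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.readlines()[-n:]


if __name__ == "__main__":
    log = latest_log()
    print("".join(tail(log)) if log else "No job logs found")
```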

### Missing GPS Coordinates
```bash
# Run geocoding
python3 bulletproof_geocoding.py
```

## Project Status

**Current Version**: 2.2
- 199 properties in database
- 100 with GPT scores
- 99 pending analysis
- All properties have GPS coordinates

## Contributing

When adding new features:
1. Update relevant documentation
2. Add cost tracking if using APIs
3. Implement progress tracking for long operations
4. Test with sample data first

## Archive

Old files and documentation are preserved in:
- `_archive/` - Old scripts and session docs
- `_backups/` - Data file backups
- `_old_html/` - Previous UI versions