# Custom Criteria + GPT Integration

## Overview

GPT analysis now receives custom criteria data as context, so its assessments can draw on objective measurements instead of inferring everything from the listing text. This implements Option 1, "GPT Receives Custom Data as Context".

## What Changed

### 1. Enhanced Prompt Templates

**New and Updated Files:**
- `prompt_english.txt` - New English version of the prompt, intended to improve GPT results
- `prompt_with_custom_data.txt` - Updated Dutch prompt with custom data integration

**Key Additions:**
- Short-stay accommodation emphasis (not long-term rental)
- Livability assessment for guest accommodations
- Placeholders for custom criteria data
- Instructions for GPT to reference objective data in analysis

### 2. Data Formatting

**File:** `format_custom_data_for_gpt.py`

Formats custom criteria into readable text blocks for GPT. Example output for the Dutch prompt, covering rainfall, climate, accessibility, and local market:

```
💡 BESCHIKBARE OBJECTIEVE DATA (gebruik deze in je analyse!):

🌧️ NEERSLAG: 950 mm per jaar
   → Uitstekend voor landbouw (ideaal voor groenteteelt)

🌡️ KLIMAAT:
   • Groeidagen per jaar (>5°C): 280 dagen
   • Gemiddelde temperatuur: 15.5°C
   → Uitstekend klimaat (jaar-rond teelt mogelijk)

✈️ BEREIKBAARHEID:
   • Afstand tot Valencia Airport: 45 km
   • Geschatte rijtijd: ~36 minuten
   → Uitstekende bereikbaarheid (ideaal voor gastenverblijf)

👥 LOKALE MARKT:
   • Bevolking binnen 20km: ~35,000 inwoners
   • Dichtheid: suburban
   → Goed afzetgebied (voldoende lokale vraag)
```
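
The formatting step can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `format_custom_data_for_gpt.py`: the four top-level field names come from this document, but the inner keys (`annual_mm`, `growing_days`, `avg_temp_c`, `distance_km`, `population_20km`) and the English labels are assumptions, and the interpretive "→" lines are left out.

```python
from typing import Any, Dict, List


def format_custom_data(prop: Dict[str, Any]) -> str:
    """Render whichever custom criteria are present as a plain-text block.

    All inner keys used below are hypothetical; adjust them to the real
    structure of enriched_data.json.
    """
    lines: List[str] = ["AVAILABLE OBJECTIVE DATA (use this in your analysis):"]

    rainfall = prop.get("custom_rainfall_data") or {}
    if "annual_mm" in rainfall:
        lines.append(f"RAINFALL: {rainfall['annual_mm']} mm per year")

    temperature = prop.get("custom_temperature_data") or {}
    if "growing_days" in temperature:
        lines.append(f"GROWING DAYS (>5°C): {temperature['growing_days']} per year")
    if "avg_temp_c" in temperature:
        lines.append(f"AVERAGE TEMPERATURE: {temperature['avg_temp_c']}°C")

    airport = prop.get("custom_airport_distance_data") or {}
    if "distance_km" in airport:
        lines.append(f"AIRPORT DISTANCE: {airport['distance_km']} km")

    population = prop.get("custom_population_density_data") or {}
    if "population_20km" in population:
        lines.append(
            f"POPULATION WITHIN 20 KM: ~{population['population_20km']:,} inhabitants"
        )

    return "\n".join(lines)
```

Each section is emitted only when its data exists, so a property with partial data still produces a usable block.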

### 3. Analysis Script Integration

**File:** `analyze_from_urls_optimized.py`

**Changes:**
- Loads `enriched_data.json` to access custom criteria
- Creates URL-indexed lookup dictionary for fast access
- Chooses appropriate prompt based on configuration and data availability
- Inserts formatted custom data into prompt
- Supports English/Dutch prompts
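
The URL-indexed lookup mentioned above might look like the sketch below. It assumes that `enriched_data.json` holds a JSON list of property objects, each with a `url` key. That structure is not confirmed by this document, and `load_property_index` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_property_index(path: str = "enriched_data.json") -> Dict[str, Dict[str, Any]]:
    """Return {url: property_record}, or an empty dict if the file is missing."""
    file = Path(path)
    if not file.exists():
        return {}
    records = json.loads(file.read_text(encoding="utf-8"))
    # Assumption: the top-level value is a list of objects with a "url" key.
    return {
        rec["url"]: rec
        for rec in records
        if isinstance(rec, dict) and "url" in rec
    }
```

Building the dictionary once means each property lookup during analysis is a single key access and avoids scanning the full list every time.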

**Prompt Selection Logic:**
1. If optimized prompt → use short OPTIMIZED_PROMPT
2. Else if English + has custom data → use `prompt_english.txt`
3. Else if has custom data → use `prompt_with_custom_data.txt`
4. Else if English → use `prompt_english.txt`
5. Else → use `prompt.txt` (basic Dutch)
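
The selection order above can be expressed as a small function. This is a sketch of the listed rules only; `select_prompt` and its parameter names are illustrative and may not match the names used in `analyze_from_urls_optimized.py`.

```python
def select_prompt(use_optimized: bool, use_english: bool, has_custom_data: bool) -> str:
    """Return the prompt to use, following the five rules in order."""
    if use_optimized:
        return "OPTIMIZED_PROMPT"  # short built-in prompt, no custom data section
    if use_english and has_custom_data:
        return "prompt_english.txt"
    if has_custom_data:
        return "prompt_with_custom_data.txt"
    if use_english:
        return "prompt_english.txt"
    return "prompt.txt"  # basic Dutch prompt
```

Note that rules 2 and 4 resolve to the same file, so with English enabled and the optimized prompt disabled, `prompt_english.txt` is used whether or not custom data exists.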

**New Configuration:**
- `USE_ENGLISH` - Use English prompts (default: Y)
- Environment variable: `USE_ENGLISH=y/n`
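
One way to read such a y/n flag with a default is sketched below. The helper name `env_flag` is hypothetical, and accepting values other than `y` and `n` (such as `yes` or `1`) is an assumption, not documented behavior.

```python
import os


def env_flag(name: str, default: bool) -> bool:
    """Interpret an environment variable as a yes/no switch."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default  # unset or empty: fall back to the default
    return raw.strip().lower() in {"y", "yes", "1", "true"}


# English prompts are on unless USE_ENGLISH is explicitly set to a "no" value.
use_english = env_flag("USE_ENGLISH", default=True)
```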

## How It Works

### When Property Has Custom Data

GPT receives:
- Property description (as before)
- **NEW:** Objective climate data (rainfall, temperature, growing days)
- **NEW:** Location data (airport distance, population density)
- **NEW:** Custom score (for example, 3.5/5.0)

GPT analysis is informed by objective facts:
- "With 950mm annual rainfall..." instead of "probably has adequate rainfall"
- "Given 280 growing days..." instead of guessing climate suitability
- "Airport is 45km away..." instead of "appears reasonably accessible"

### When Property Has No Custom Data

GPT receives:
```
[NO OBJECTIVE DATA AVAILABLE]
Objective climate and location data has not yet been collected for this property.
Base your analysis on the description and general knowledge of the region.
```
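
A possible way to decide between the formatted data and this fallback text is shown below. The check treats a property as having custom data when at least one of the four `custom_*_data` fields is non-empty. That rule, and the helper names, are assumptions made for illustration.

```python
from typing import Any, Callable, Dict

NO_DATA_BLOCK = (
    "[NO OBJECTIVE DATA AVAILABLE]\n"
    "Objective climate and location data has not yet been collected for this property.\n"
    "Base your analysis on the description and general knowledge of the region."
)

CUSTOM_FIELDS = (
    "custom_rainfall_data",
    "custom_temperature_data",
    "custom_airport_distance_data",
    "custom_population_density_data",
)


def has_custom_data(prop: Dict[str, Any]) -> bool:
    """True if any custom criteria field is present and non-empty."""
    return any(prop.get(field) for field in CUSTOM_FIELDS)


def custom_data_or_fallback(
    prop: Dict[str, Any], formatter: Callable[[Dict[str, Any]], str]
) -> str:
    """Use the formatter when data exists, otherwise the fallback notice."""
    return formatter(prop) if has_custom_data(prop) else NO_DATA_BLOCK
```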

## Usage

### Interactive Mode

```bash
python3 analyze_from_urls_optimized.py
```

You'll be prompted:
- Use optimized prompt? (saves ~30% tokens) [y/N]
- Use intelligent caching? (skip unchanged properties) [Y/n]
- **NEW:** Use English prompt? (potentially better GPT results) [Y/n]

### Non-Interactive Mode (API/Automation)

```bash
USE_ENGLISH=y USE_CACHE=y python3 analyze_from_urls_optimized.py
```

Environment variables:
- `USE_ENGLISH=y` - Use English prompts (default)
- `USE_CACHE=y` - Enable caching (default)
- `USE_OPTIMIZED_PROMPT=y` - Use short prompt (off by default)

## Expected Benefits

### 1. More Accurate Scoring

**Before:**
- Market Garden: 5/5 based on "looks like agricultural land"
- Reality: Only 400mm rainfall/year → needs irrigation

**After:**
- Market Garden: 3/5 - "With only 400mm rainfall per year, irrigation would be essential..."

### 2. Consistent with Custom Criteria

GPT scores should align better with custom criteria because GPT sees the same objective data.

**Example:**
- Custom score: 4.5/5 (excellent climate, close to airport)
- GPT sees this data and scores accordingly
- The overall score should be more coherent as a result

### 3. Richer Analysis

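GPT can now say:
> "Met 950mm neerslag per jaar en 280 groeidagen is dit klimaat uitstekend geschikt voor biologische groenteteelt. De afstand van 45km tot het vliegveld maakt het goed bereikbaar voor gasten..."
>
> (English: "With 950 mm of rainfall per year and 280 growing days, this climate is excellently suited to organic vegetable growing. The 45 km distance to the airport makes it easy for guests to reach...")

Instead of the generic:
> "Het klimaat lijkt geschikt voor landbouw. Bereikbaarheid lijkt redelijk."
>
> (English: "The climate seems suitable for agriculture. Accessibility seems reasonable.")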

## Testing

Test the integration:

```bash
python3 test_custom_integration.py
```

This shows how custom data is formatted for a sample property.

## Next Steps

### 1. Run Custom Criteria Analysis

To populate detailed custom criteria data:

```bash
python3 custom_criteria.py
```

This adds the following fields to `enriched_data.json`:
- `custom_rainfall_data`
- `custom_temperature_data`
- `custom_airport_distance_data`
- `custom_population_density_data`

### 2. Re-analyze Properties

After custom criteria data is populated:

```bash
# Clear GPT cache to force re-analysis with new data
rm -rf .gpt_cache/

# Run analysis with custom data
USE_ENGLISH=y USE_CACHE=n python3 analyze_from_urls_optimized.py
```

### 3. Compare Results

Compare results from before and after the integration:
- Check if GPT scores are more accurate
- Verify GPT references objective data in analysis text
- Confirm alignment between GPT and custom scores
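
These checks could be partly automated with a helper like the one below. It is a rough sketch under stated assumptions: the function name, the `tolerance` threshold of 1.0 points, and the idea of passing the expected figures as strings are all hypothetical and not part of the existing scripts.

```python
import re
from typing import Any, Dict, List


def compare_scores(
    gpt_score: float,
    custom_score: float,
    analysis_text: str,
    facts: List[str],
    tolerance: float = 1.0,
) -> Dict[str, Any]:
    """Report the score gap and which objective figures the analysis cites."""
    gap = abs(gpt_score - custom_score)
    # Match each figure as a whole number, so "45" does not match "450".
    referenced = [
        fact
        for fact in facts
        if re.search(rf"(?<!\d){re.escape(fact)}(?!\d)", analysis_text)
    ]
    return {
        "score_gap": gap,
        "aligned": gap <= tolerance,
        "referenced_facts": referenced,
    }


report = compare_scores(
    gpt_score=4.0,
    custom_score=3.5,
    analysis_text="With 950mm annual rainfall and 280 growing days...",
    facts=["950", "280", "45"],
)
```

A property whose analysis cites none of the expected figures is a candidate for manual review, since it suggests the custom data section did not reach the prompt.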

## Troubleshooting

### Properties show "NO OBJECTIVE DATA AVAILABLE"

**Cause:** The custom criteria script has not run yet, or it ran but did not collect detailed data

**Fix:**
```bash
python3 custom_criteria.py
```

### GPT analysis doesn't reference custom data

**Cause:** The optimized prompt is in use, and it does not include the custom data section

**Fix:** Use full prompt:
```bash
USE_OPTIMIZED_PROMPT=n python3 analyze_from_urls_optimized.py
```

### Want to force re-analysis with new data

**Fix:** Clear cache and re-run:
```bash
rm -rf .gpt_cache/
python3 analyze_from_urls_optimized.py
```

## Technical Details

### Data Flow

```
enriched_data.json
    ↓
property_data_by_url[url]
    ↓
format_custom_data_for_prompt()
    ↓
custom_data_text (formatted string)
    ↓
prompt.replace("{custom_criteria_data}", custom_data_text)
    ↓
GPT-4o-mini
    ↓
analysis with objective data references
```
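
The substitution step in this flow can be illustrated with a short sketch. The placeholder string is taken from the diagram above. The `build_prompt` helper and its error on a missing placeholder are assumptions added here for illustration.

```python
PLACEHOLDER = "{custom_criteria_data}"


def build_prompt(template: str, custom_data_text: str) -> str:
    """Insert the formatted custom data into a prompt template."""
    if PLACEHOLDER not in template:
        # Hypothetical guard: a template without the placeholder would
        # silently drop the custom data, so fail loudly instead.
        raise ValueError(f"template has no {PLACEHOLDER} placeholder")
    return template.replace(PLACEHOLDER, custom_data_text)


template = 'Objective data:\n{custom_criteria_data}\n\nReply as JSON: {"score": 1}'
prompt = build_prompt(template, "RAINFALL: 950 mm per year")
```

Plain `str.replace` is a reasonable fit here because `str.format` would treat every other brace pair in the template, such as an inline JSON example, as a replacement field.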

### File Dependencies

- `enriched_data.json` - Source of custom criteria data
- `format_custom_data_for_gpt.py` - Formatting function
- `prompt_english.txt` - English prompt template
- `prompt_with_custom_data.txt` - Dutch prompt template
- `analyze_from_urls_optimized.py` - Main analysis script

## Architecture Notes

This implements **Option 1: GPT Receives Custom Data as Context** from the architectural decision.

**Advantages:**
- ✓ GPT sees objective data when making subjective decisions
- ✓ No changes to custom criteria calculation
- ✓ GPT can explain reasoning using objective facts
- ✓ More transparent and auditable

**Trade-offs:**
- Slightly longer prompts (more tokens)
- The GPT cache must be invalidated whenever custom data changes
- Need to format data readably for GPT
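
The cache invalidation trade-off could be reduced by folding the custom data into the cache key, so that a change in the data produces a cache miss automatically. This document does not describe how `.gpt_cache/` keys are actually computed, so the sketch below is one hypothetical approach and not the current behavior.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def cache_key(url: str, description: str, custom_data: Optional[Dict[str, Any]]) -> str:
    """Derive a stable key from everything that influences the GPT prompt."""
    payload = json.dumps(
        {"url": url, "description": description, "custom_data": custom_data},
        sort_keys=True,       # key order must not change the hash
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With a key like this, manually deleting the cache directory would only be needed when the prompt templates themselves change.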

**Alternative approaches considered:**
- Option 2: GPT Scores Replace Custom Criteria (rejected - loses objectivity)
- Option 3: Separate Independent Scoring (the previous system; this integration addresses its shortcomings)
