# Complete Criteria System: Review & Improvement Proposals

## System Overview

**Total Criteria: 14**
- **6 GPT Criteria** (analyzed via OpenAI GPT-4o-mini)
- **8 Custom Criteria** (calculated via free APIs: Open-Meteo, Nominatim)

---

# PART 1: GPT CRITERIA (6 total)

These criteria use GPT-4o-mini to score each property description for suitability.

## 1. 🌱 Regeneratieve Market Garden

**Current Definition:**
> bodemgeschiktheid, zon, water, oppervlakte minimaal 2500 m²

**Issues:**
- No enforcement of 2500m² minimum
- Doesn't check if property actually mentions land/plot
- No water source validation

**IMPROVED:**
```
Regeneratieve market garden – Commerciële groenteteelt potentieel

HARDE EISEN (score = 1 als NIET voldaan):
- Minimaal 2500 m² grond beschikbaar
- Zonnige ligging (zuid/zuidwest of vlak terrein)
- Waterbron realiseerbaar binnen 100m

SCORE-CRITERIA:
5 = 5000+ m², perfect zon, eigen bron/rivier, vruchtbare grond, vlak
4 = 3500-5000 m², goede zon, water <50m, redelijke grond
3 = 2500-3500 m², gemiddeld zon, water haalbaar, grond verbeterbaar
2 = circa 2500 m², matige zon OF hellend OF water ver (50-100m)
1 = <2500 m² OF geen water OF te schaduwrijk OF te hellend

VALIDATIE:
- Check of beschrijving "grond", "land", "perceel", "hectare" vermeldt
- Als alleen "bebouwde kavel" → maximaal score 2
- Als "stedelijk perceel" → score 1
```
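
The VALIDATIE rules above can also be checked in code after GPT returns a score. The following is a minimal sketch, not the project's implementation: the function name `market_garden_cap` is hypothetical, and capping at 2 when no land keyword appears is an assumption, since the rules only say to check for those words.

```python
# Keywords taken from the VALIDATIE rules above.
LAND_WORDS = ("grond", "land", "perceel", "hectare")


def market_garden_cap(description: str) -> int:
    """Return the highest market-garden score the description text allows."""
    text = description.lower()
    # Check the most restrictive phrase first: "stedelijk perceel" also
    # contains the land keyword "perceel".
    if "stedelijk perceel" in text:
        return 1
    if "bebouwde kavel" in text:
        return 2
    # Assumption: a description that never mentions land is capped at 2.
    if not any(word in text for word in LAND_WORDS):
        return 2
    return 5


gpt_score = 4
capped = min(gpt_score, market_garden_cap("Bebouwde kavel met woning"))
```

Plain substring matching is deliberately simple here; "land" also matches words such as "Nederland", so a production check would likely need word boundaries.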

---

## 2. 🏡 Gastenverblijf (Guest Accommodation)

**Current Definition:**
> VEREIST BESTAANDE WONING/GEBOUW met slaapkamers

**Issues:**
- Property with no building scored 5/5
- No bedroom count requirement
- Doesn't check habitability status

**IMPROVED:**
```
Gastenverblijf – Airbnb / B&B accommodatie potentieel

HARDE EISEN (score = 1 als NIET voldaan):
- BESTAAND gebouw met minimaal 2 slaapkamers
- Beschrijving vermeldt "slaapkamers", "kamers", "bedrooms"
- Woning bewoonbaar (NIET "te renoveren", "puin", "bouwval")

SCORE-CRITERIA:
5 = 4+ kamers, prachtig uitzicht, rust, zwembad, nabij attracties, verhuurklaar
4 = 3 kamers, mooi uitzicht, rustig, goed bereikbaar, kleine renovatie
3 = 2-3 kamers, redelijk uitzicht, bereikbaar, medium renovatie
2 = 2 kamers, matige locatie, afgelegen, grote renovatie
1 = <2 kamers OF geen gebouw OF bouwval OF slechte locatie

VALIDATIE:
- Als bedrooms KPI = 0 → automatisch score 1
- Als bedrooms KPI = 1 → maximaal score 2
- Als geen vermelding van kamers in tekst → score 1
- Check expliciet op "geen gebouw", "only land", "terrein" → score 1
```

---

## 3. 🔧 Werkplaats/Voedselverwerking (Workshop)

**Current Definition:**
> VEREIST BESTAANDE GEBOUWEN

**Issues:**
- No size requirement (too small is useless)
- Doesn't distinguish house from workshop space
- No utility check

**IMPROVED:**
```
Werkplaats/voedselverwerking – Ambachtelijke productie ruimte

HARDE EISEN (score = 1 als NIET voldaan):
- BESTAAND gebouw/schuur/loods minimaal 40 m²
- Beschrijving vermeldt "schuur", "loods", "stal", "werkplaats", "bijgebouw"
- NIET alleen woonhuis zonder werkruimte

SCORE-CRITERIA:
5 = 100+ m² werkruimte, elektra + water, verharde vloer, goede staat, toegang voor laden/lossen
4 = 60-100 m², elektra OF water, redelijke staat, toegankelijk voor leveringen
3 = 40-60 m², nutsvoorzieningen mogelijk, renovatie nodig, bereikbaar
2 = 40 m², geen nutsvoorzieningen, grote renovatie, moeilijk bereikbaar
1 = <40 m² OF geen werkruimte OF alleen woonhuis OF puin

VALIDATIE:
- Als building_size_m2 KPI < 40 → maximaal score 2
- Als building_size_m2 KPI = 0 → score 1
- Check op "schuur", "loods", "werkplaats" in tekst
- Als alleen "huis" zonder bijgebouwen → maximaal score 2
```

---

## 4. 🏘️ Zelfstandige Verhuureenheden (Rental Units)

**Current Definition:**
> VEREIST BESTAANDE WONING/GEBOUW met meerdere slaapkamers

**Issues:**
- Land with no building scored 4/5
- No minimum bedroom requirement
- Doesn't check for separate units/entrances

**IMPROVED:**
```
Zelfstandige verhuureenheden – Langetermijn/seizoensverhuur potentieel

HARDE EISEN (score = 1 als NIET voldaan):
- BESTAAND gebouw met minimaal 3 slaapkamers OF meerdere aparte units/appartementen
- Beschrijving vermeldt kamers/units/appartementen

SCORE-CRITERIA:
5 = 5+ kamers OF 2+ units, elk eigen keuken/badkamer, goede staat, aparte ingangen
4 = 4 kamers OF opsplitsen mogelijk, aparte ingangen mogelijk, kleine renovatie
3 = 3-4 kamers, opsplitsen mogelijk, renovatie nodig, één keuken
2 = 3 kamers, moeilijk op te splitsen, grote renovatie nodig
1 = <3 kamers OF geen gebouw OF niet op te splitsen

VALIDATIE:
- Als bedrooms KPI < 3 → maximaal score 2
- Als bedrooms KPI = 0 → score 1
- Check op "units", "appartementen", "studios" in tekst
- Voor verhuur essentieel: aparte ingangen, keukens, badkamers
```

---

## 5. 📍 Ligging t.o.v. Kust, Stad en Vliegveld

**Current Definition:**
> afstand en bereikbaarheid vanuit Nederland

**Issues:**
- Too vague - no specific distances
- Doesn't use location_context data
- No differentiation between direct flights vs connections

**IMPROVED:**
```
Ligging – Bereikbaarheid voor Nederlandse eigenaar

SCORE-CRITERIA (op basis van afstanden):
5 = <15 km kust + <20 km stad (20k+ inwoners) + <60 km international airport + goede wegen
4 = 15-30 km kust + 20-40 km stad + 60-100 km airport + redelijke wegen
3 = 30-60 km kust + 40-60 km stad + 100-150 km airport + matige wegen
2 = 60-100 km kust + 60-80 km stad + 150-200 km airport + slechte wegen
1 = >100 km kust + >80 km stad + >200 km airport + zeer afgelegen

BONUS FACTOREN (+0.5 tot +1 punt, eindscore maximaal 5):
+ Directe snelweg toegang
+ <2 uur rijden vanaf grote luchthaven (BCN, Málaga, Porto, etc.)
+ Direct flight vanuit NL (Schiphol, Eindhoven, Rotterdam)
+ Nabij toeristische attracties

GEBRUIK:
- {locatie_context} voor afstanden en toegankelijkheid
- Check op "remote", "afgelegen", "difficult access" → lagere score
```
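
Because the distance tiers are numeric, they can be cross-checked outside GPT once `location_context` supplies distances. This is a sketch under stated assumptions: the function names are hypothetical, the weakest of the three distances decides the tier, each range excludes its upper limit, the city limits are taken as 20/40/60/80 km, and bonus points cannot push the result above 5.

```python
def tier(value_km, limits):
    """Map a distance to a score of 5..1 using upper limits for scores 5, 4, 3, 2."""
    for score, limit in zip((5, 4, 3, 2), limits):
        if value_km < limit:
            return score
    return 1


def location_score(coast_km, city_km, airport_km, bonus=0.0):
    """Combine the three distance tiers; the weakest component decides (assumption)."""
    base = min(
        tier(coast_km, (15, 30, 60, 100)),
        tier(city_km, (20, 40, 60, 80)),
        tier(airport_km, (60, 100, 150, 200)),
    )
    # Assumption: bonus factors never lift the score above the 5-point scale.
    return min(5.0, base + bonus)


# Close to coast and city, but 120 km from the airport, with one bonus factor.
example = location_score(10, 15, 120, bonus=0.5)
```

Road quality and the text checks for "remote" or "afgelegen" are not covered; those still depend on the description.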

---

## 6. 🛒 Afstand tot Lokale Markt

**Current Definition:**
> relevante afzet voor producten of diensten

**Issues:**
- No distance thresholds
- Doesn't consider tourism seasonality
- Missing restaurant/bio shop opportunities

**IMPROVED:**
```
Lokale markt – Verkooppotentieel voor producten/groenten

SCORE-CRITERIA:
5 = <5 km dorp (1000+ inwoners) + wekelijkse markt + toeristen + restaurants nabij
4 = 5-10 km dorp + maandelijkse markt OF zomertoerisme + lokale winkels
3 = 10-20 km dorp + weinig toerisme + supermarkt bereikbaar
2 = 20-40 km dorp + geen markt + moeilijk bereikbaar + weinig afzet
1 = >40 km dorp + zeer afgelegen + geen lokale afzet

BONUS FACTOREN (+0.5 tot +1 punt, eindscore maximaal 5):
+ Wijnroute/culinaire route
+ Biowinkel binnen 15 km
+ Restaurants die lokaal inkopen
+ Toeristisch gebied (kust, bergen, natuurpark)
+ Grote stad (<30 km) met farmers markets

GEBRUIK:
- {locatie_context} voor populatie en toerisme data
- Check op "touristic", "wine region", "national park"
```

---

# PART 2: CUSTOM CRITERIA (8 total)

These criteria calculate objective scores from free API data (Open-Meteo, Nominatim) and rule-based regional lookups, without GPT.

## 7. 🌧️ Rainfall (RainfallCriterion)

**Current Logic:**
- Uses Open-Meteo API (FREE)
- Optimal: 600-1200mm/year
- Score 5 = 700-1000mm, Score 1 = <400mm or >1600mm

**Status:** ✅ **GOOD - No changes needed**

**Scoring:**
- 5 = 700-1000mm (ideal for market gardening)
- 4 = 600-700mm or 1000-1200mm (good)
- 3 = 500-600mm or 1200-1400mm (acceptable)
- 2 = 400-500mm or 1400-1600mm (challenging)
- 1 = <400mm or >1600mm (difficult)
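
The ranges above share their endpoints (700mm appears under both score 5 and score 4), so an implementation has to pick one. The sketch below is illustrative only; the function name `rainfall_score` is hypothetical, and it assumes a shared endpoint resolves to the higher score.

```python
def rainfall_score(annual_mm: float) -> int:
    """Map annual rainfall in mm to a 1-5 score using the table above."""
    # Bands are checked from best to worst, so a boundary value such as
    # 700 or 1000 gets the higher of the two scores (assumption).
    if 700 <= annual_mm <= 1000:
        return 5
    if 600 <= annual_mm <= 1200:
        return 4
    if 500 <= annual_mm <= 1400:
        return 3
    if 400 <= annual_mm <= 1600:
        return 2
    return 1
```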

---

## 8. 🌡️ Temperature (TemperatureCriterion)

**Current Logic:**
- Uses Open-Meteo API (FREE)
- Checks average, max, min temps
- Evaluates growing season length

**Status:** ✅ **GOOD - Minor refinement suggested**

**Suggested Improvement:**
```python
# Add frost-free days count
# (assumes min_temps holds one year of daily minimum temperatures in °C)
frost_free_days = sum(1 for t in min_temps if t > 0)
if frost_free_days > 300:
    score += 0.5
    reasoning.append(f"{frost_free_days} frost-free days - excellent growing season")
elif frost_free_days < 180:
    score -= 0.5
    reasoning.append(f"Only {frost_free_days} frost-free days - short season")
```

---

## 9. 🌍 Climate Change Risk (ClimateChangeCriterion)

**Current Logic:**
- Evaluates based on latitude (tropical = higher risk)
- Mediterranean/drought-prone regions get penalties
- Coastal areas penalized for sea level rise

**Issues:**
- Too pessimistic for Mediterranean properties
- Doesn't consider altitude (mountains safer from heat)
- Missing wildfire risk

**IMPROVED:**
```python
# Add altitude protection
altitude = property_data.get('altitude_m', 0)
if altitude > 500 and abs_lat > 35:
    score += 0.5
    reasoning.append(f"Higher altitude ({altitude}m) - cooler microclimate")

# Refine Mediterranean scoring (not always bad!)
if 'mediterranean' in climate_zone.lower():
    if altitude > 300 and rainfall > 500:
        # Neutral: altitude plus decent rainfall offsets the drought risk
        reasoning.append("Mediterranean but with altitude and rainfall")
    else:
        score -= 0.5  # Penalize when the site is low-lying or dry
        reasoning.append("Mediterranean climate - drought risk")

# Add wildfire risk (Spain, Portugal, Greece, Southern France)
high_fire_risk_regions = ['Spain', 'Portugal', 'Greece', 'Provence', 'Catalonia']
if any(region in country or region in str(property_data.get('region', ''))
       for region in high_fire_risk_regions):
    if altitude < 500:
        score -= 0.5
        reasoning.append("High wildfire risk region")
```

---

## 10. ✈️ Airport Distance (AirportDistanceCriterion)

**Current Logic:**
- Calculates distance to 8 major European airports
- Closer = better score
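
For reference, a distance lookup of this kind is commonly done with the haversine (great-circle) formula. The sketch below is an assumption about the approach, not a copy of `AirportDistanceCriterion`; the function names and the two-entry `SAMPLE_AIRPORTS` table are illustrative.

```python
import math

# Illustrative subset; coordinates are (latitude, longitude) in degrees.
SAMPLE_AIRPORTS = {
    "Málaga": (36.6749, -4.4990),
    "Porto": (41.2481, -8.6814),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points on a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def find_nearest_airport(lat, lon, airports):
    """Return (name, distance_km) of the closest airport in the table."""
    name = min(airports, key=lambda n: haversine_km(lat, lon, *airports[n]))
    return name, haversine_km(lat, lon, *airports[name])


name, distance_km = find_nearest_airport(36.72, -4.42, SAMPLE_AIRPORTS)
```

This measures straight-line distance, so actual driving distance on mountain roads will be longer.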

**Issues:**
- Only checks 8 airports (missing many regional ones)
- Doesn't check direct flights from NL

**IMPROVED:**
```python
# Add more airports (especially for Spain, Portugal, Italy)
additional_airports = {
    'Málaga': (36.6749, -4.4990),
    'Alicante': (38.2822, -0.5581),
    'Porto': (41.2481, -8.6814),
    'Faro': (37.0144, -7.9659),
    'Valencia': (39.4893, -0.4817),
    'Seville': (37.4180, -5.8931),
    'Nice': (43.6584, 7.2159),
    'Toulouse': (43.6293, 1.3638),
    'Bordeaux': (44.8283, -0.7156),
    'Nantes': (47.1532, -1.6108),
    'Pisa': (43.6839, 10.3927),
    'Venice': (45.5053, 12.3519)
}

# Add direct flight bonus
direct_flight_airports_from_nl = [
    'Barcelona', 'Madrid', 'Málaga', 'Alicante', 'Valencia',
    'Porto', 'Faro', 'Nice', 'Toulouse', 'Pisa'
]
if nearest_airport in direct_flight_airports_from_nl:
    score += 0.5
    reasoning.append(f"Direct flights from NL to {nearest_airport}")
```

---

## 11. 🏘️ Rural Character (PopulationDensityCriterion)

**Current Logic:**
- Uses country-level population density
- Lower density = higher score (more rural)

**Issues:**
- Country-level too broad (e.g., all of Spain = same score)
- Doesn't use actual local population from location data

**IMPROVED:**
```python
# Use locality population from geocoding data
locality = property_data.get('locality', '')
city = property_data.get('city', '')
municipality = property_data.get('municipality', '')

# Try to extract population from location context
population = property_data.get('population', 0)

# Score based on local population
if population > 0:
    if population < 500:
        score = 5
        reasoning.append(f"Very rural - hamlet/village of {population} people")
    elif population < 2000:
        score = 4
        reasoning.append(f"Rural village - {population} people")
    elif population < 5000:
        score = 3
        reasoning.append(f"Small town - {population} people")
    elif population < 20000:
        score = 2
        reasoning.append(f"Town - {population} people")
    else:
        score = 1
        reasoning.append(f"Urban - {population} people")
else:
    # Fallback to country density
    pass  # ... existing logic ...
```

---

## 12. 🌱 Soil Quality (SoilQualityCriterion)

**Current Logic:**
- Uses regional knowledge (Netherlands = 5, Mediterranean = 3)
- Based on country and latitude

**Issues:**
- Too simplified - doesn't account for local variation
- Missing mountains (rocky) vs valleys (fertile)
- Could integrate actual soil data

**Status:** ⚠️ **ACCEPTABLE but could improve with SoilGrids API**

**Suggested Addition:**
```python
# Add altitude-based refinement
altitude = property_data.get('altitude_m', 0)
if altitude > 800:
    score = min(score, 3)
    reasoning.append(f"High altitude ({altitude}m) - rocky/thin soil likely")
elif altitude > 400:
    score = min(score, 4)
    reasoning.append(f"Mountain altitude ({altitude}m) - soil quality variable")

# Add terrain description check
description = (property_data.get('summary') or '').lower()
if any(word in description for word in ['rocky', 'stone', 'gravel', 'pebble']):
    score -= 1
    reasoning.append("Description mentions rocky terrain")
if any(word in description for word in ['fertile', 'rich soil', 'loam', 'clay']):
    score += 1
    reasoning.append("Description mentions fertile soil")

# Keep the adjusted score on the 1-5 scale
score = max(1, min(score, 5))
```

---

## 13. 💧 Water Availability (WaterAvailabilityCriterion)

**Current Logic:**
- Combines rainfall with regional water stress
- Spain/Portugal/Greece penalized for water scarcity

**Issues:**
- Too harsh on Mediterranean regions
- Doesn't check for wells, rivers, irrigation systems
- Misses rainwater collection potential

**IMPROVED:**
```python
# Check property description for water sources
# Note: plain substring matching means 'well' also matches "well maintained"
# and 'spring' matches the season; use more specific phrases if that causes
# false positives
description = (property_data.get('summary') or '').lower()
has_well = any(word in description for word in ['well', 'borehole', 'water source', 'spring', 'pozo'])
has_river = any(word in description for word in ['river', 'stream', 'creek', 'arroyo', 'río'])
has_irrigation = any(word in description for word in ['irrigation', 'water rights', 'acequia', 'riego'])

# Adjust score based on on-site water
if has_well or has_river:
    score += 1.5
    reasoning.append("Property has its own water source (well/river)")
elif has_irrigation:
    score += 1
    reasoning.append("Property has irrigation rights")

# Mediterranean with own water = not a problem
if water_stress_country and (has_well or has_river or has_irrigation):
    score += 1  # Cancel out the water stress penalty
    reasoning.append("Water stress mitigated by on-site source")

# Keep the final score on the 1-5 scale
score = max(1, min(score, 5))
```

---

## 14. 🏠 Airbnb Potential (AirbnbRentabilityCriterion)

**Current Logic:**
- Checks if property in tourist regions
- Coastal areas score higher
- Wine regions get bonus

**Issues:**
- Doesn't check bedroom count (essential for Airbnb!)
- Missing ski resort areas
- No check for tourist attractions nearby

**IMPROVED:**
```python
# MUST have bedrooms to be viable Airbnb
bedrooms = property_data.get('bedrooms', 0)
if bedrooms == 0:
    return {
        'score': 1,
        'reasoning': ["No bedrooms - not suitable for Airbnb"],
        'raw_data': {'bedrooms': 0}
    }
elif bedrooms == 1:
    score = min(score, 2)  # Cap at 2
    reasoning.append("Only 1 bedroom - limited Airbnb potential")

# Add ski resort regions
ski_regions = ['Alps', 'Pyrenees', 'Sierra Nevada', 'Dolomites']
if any(region in region_name for region in ski_regions):
    score += 1
    reasoning.append("Ski resort area - year-round tourism")

# Check for specific attractions
description = property_data.get('summary', '').lower()
if any(word in description for word in ['beach', 'sea view', 'ocean', 'costa', 'playa']):
    score += 0.5
    reasoning.append("Beach/sea view - high Airbnb demand")
if any(word in description for word in ['castle', 'historic', 'heritage', 'monument']):
    score += 0.5
    reasoning.append("Historic/heritage site nearby")

# Keep the final score on the 1-5 scale
score = max(1, min(score, 5))
```

---

# PART 3: IMPLEMENTATION PLAN

## Priority 1: Update GPT Prompt (HIGH)

File: `prompt.txt`

Update all 6 GPT criteria with improved definitions from above.

## Priority 2: Add KPI Validation (HIGH)

File: `custom_criteria.py` or `analyze_from_urls.py`

```python
def validate_gpt_scores(property_data, gpt_scores):
    """
    Post-process GPT scores with hard validation rules
    """
    land_size = property_data.get('land_size_m2', 0)
    building_size = property_data.get('building_size_m2', 0)
    bedrooms = property_data.get('bedrooms', 0)
    bathrooms = property_data.get('bathrooms', 0)

    # RULE 1: No building = no guest/rental/workshop
    if building_size == 0 and bedrooms == 0:
        gpt_scores['guest_accommodation'] = 1
        gpt_scores['rental_units'] = 1
        gpt_scores['workshop'] = min(gpt_scores.get('workshop', 1), 2)

    # RULE 2: Insufficient bedrooms
    if bedrooms == 0:
        gpt_scores['guest_accommodation'] = 1
        gpt_scores['rental_units'] = 1
    elif bedrooms == 1:
        gpt_scores['guest_accommodation'] = min(gpt_scores.get('guest_accommodation', 1), 2)
        gpt_scores['rental_units'] = 1
    elif bedrooms == 2:
        gpt_scores['rental_units'] = min(gpt_scores.get('rental_units', 1), 2)

    # RULE 3: Insufficient land
    if 0 < land_size < 2500:
        gpt_scores['market_garden'] = min(gpt_scores.get('market_garden', 1), 2)
    elif land_size == 0:
        gpt_scores['market_garden'] = 1

    # RULE 4: Small building = limited workshop
    if 0 < building_size < 40:
        gpt_scores['workshop'] = min(gpt_scores.get('workshop', 1), 2)

    return gpt_scores
```

## Priority 3: Enhance Custom Criteria (MEDIUM)

Update files: individual criterion classes in `custom_criteria.py`

- Water: Add well/river detection
- Airbnb: Require bedrooms
- Population: Use local data
- Soil: Add altitude/terrain checks
- Airport: Add more airports

## Priority 4: Improve KPI Extraction (MEDIUM)

Better scraping to get:
- Land size (m² or hectares)
- Building size (m²)
- Bedrooms count
- Bathrooms count
- Property type (house, land, farm, etc.)
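
Land size is the KPI most likely to appear in mixed units and number formats. The following is a rough sketch of one way to normalize it to m²; `parse_area_m2` is a hypothetical helper, and the rule that a separator followed by exactly three digits is a thousands separator is a heuristic assumption that will misread some listings.

```python
import re

# A number followed by an area unit: m2, m², ha, hectare, hectares, hectaren.
_AREA = re.compile(r"(\d[\d.,]*)\s*(m2|m²|ha\b|hectare[ns]?\b)", re.IGNORECASE)


def _to_number(raw):
    """Parse '2.500', '2,500', '1,5' or '12.500,50' into a float."""
    raw = raw.rstrip(".,")
    # Heuristic: '.' or ',' followed by exactly three digits is a thousands separator.
    cleaned = re.sub(r"[.,](?=\d{3}(?:\D|$))", "", raw)
    return float(cleaned.replace(",", "."))


def parse_area_m2(text):
    """Return the first area mentioned in the text in m², or None if none is found."""
    match = _AREA.search(text)
    if not match:
        return None
    try:
        value = _to_number(match.group(1))
    except ValueError:
        return None
    # Units starting with 'h' are hectares (1 ha = 10,000 m²).
    if match.group(2).lower().startswith("h"):
        value *= 10_000
    return value
```

The helper returns the first area it finds, which may be the building rather than the plot, so a real scraper would still need context around the match.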

## Priority 5: Add Warning System (LOW)

Flag properties for manual review:
- High score but missing KPIs
- Price too low for score (<€50k with 4+ rating)
- Description doesn't match scores
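
The first two flags can be expressed as simple rules. This is a minimal sketch under assumptions: `review_flags` and the `price_eur` key are hypothetical, a KPI value of 0 is treated as missing, and "4+ rating" is read as the highest individual criterion score. The third flag (description not matching scores) needs text analysis and is left out.

```python
def review_flags(property_data, scores):
    """Return human-readable reasons why a property needs manual review."""
    flags = []
    top = max(scores.values(), default=0)

    # Flag 1: high score but missing KPIs (0 or absent counts as missing).
    kpis = ("land_size_m2", "building_size_m2", "bedrooms")
    missing = [key for key in kpis if not property_data.get(key)]
    if top >= 4 and missing:
        flags.append(f"High score ({top}) but missing KPIs: {', '.join(missing)}")

    # Flag 2: price below EUR 50k combined with a 4+ score.
    price = property_data.get("price_eur")  # hypothetical key
    if price is not None and price < 50_000 and top >= 4:
        flags.append(f"Price {price} EUR looks too low for a score of {top}")

    return flags


flags = review_flags({"bedrooms": 0, "price_eur": 30_000}, {"guest_accommodation": 4})
```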

---

# SUMMARY

**Total Criteria: 14**

**Status:**
- ✅ **2 Good**: Rainfall, Temperature
- ⚠️ **6 Need Improvement**: All GPT criteria
- 🔧 **6 Need Enhancement**: Climate, Airport, Population, Soil, Water, Airbnb

**Next Steps:**
1. Update prompt.txt with improved GPT criteria
2. Add validation layer for GPT scores
3. Enhance custom criteria with better data
4. Re-run all analysis

**Expected Impact:**
- Eliminate false positives (land scoring high for Guest/Rental)
- More accurate market garden assessments
- Better Airbnb potential evaluation
- Reduce manual review needed by an estimated 60%+

