# Criteria Optimization Analysis

## Current Issues Identified

### 1. **CRITICAL BUG: Rainfall Criterion**
**Location**: `custom_criteria.py:108`
**Issue**: Logic error in rainfall scoring
```python
# WRONG (current):
elif annual_rainfall >= 400 or annual_rainfall <= 2000:
    score = 3

# This condition is always true: every value is either >= 400 or <= 2000.
# Any property that reaches this branch scores 3, including rainfall below 400 or above 2000.
```

**Fix**:
```python
elif 400 <= annual_rainfall <= 2000:
    score = 3
```

**Impact**: ~85% of properties incorrectly score 3 on rainfall when their scores should vary between 2 and 4.
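
A quick check makes the failure mode concrete. The sketch below isolates only the two boolean conditions; the helper names and sample values are illustrative and do not come from `custom_criteria.py`.

```python
# Illustrative helpers: only the two conditions are taken from the analysis above.
def buggy_condition(annual_rainfall: float) -> bool:
    # `or` makes this true for every number: anything below 400 is also <= 2000.
    return annual_rainfall >= 400 or annual_rainfall <= 2000


def fixed_condition(annual_rainfall: float) -> bool:
    # Chained comparison: true only inside the 400-2000 mm band.
    return 400 <= annual_rainfall <= 2000


samples = [0, 250, 400, 850, 2000, 3500]  # made-up rainfall values in mm/year
buggy_results = [buggy_condition(mm) for mm in samples]
fixed_results = [fixed_condition(mm) for mm in samples]

for mm, buggy, fixed in zip(samples, buggy_results, fixed_results):
    print(f"{mm:>5} mm  buggy={buggy!s:<5}  fixed={fixed}")
```

The buggy condition holds for every sample, including 250 mm and 3500 mm, while the fixed one holds only inside the band.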

---

### 2. **Poor Differentiation - Most Scores = 3.00**
**Observation**: In `enriched_data.json`, most properties score exactly 3.00.

**Root Causes**:
- Missing coordinates → default score 3
- Logic bugs (see rainfall above)
- Too wide "moderate" ranges
- Insufficient granularity in scoring

---

### 3. **Temperature/Growing Season - Too Lenient**
**Current thresholds**:
- ≥250 days + 12-18°C → Score 5 (excellent)
- ≥200 days → Score 4
- ≥150 days → Score 3
- <150 days → Score 2

**Problem**: Almost all of Southern Europe scores 4-5, so the criterion does not differentiate between properties there.

**Optimized thresholds** (regenerative market garden focus):
- ≥280 days + 13-17°C → Score 5 (optimal for year-round production)
- ≥240 days + 11-19°C → Score 4
- ≥200 days → Score 3
- ≥160 days → Score 2
- <160 days → Score 1
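
As a minimal sketch, the five optimized bands above could be written as a single function. The name `growing_season_score` is hypothetical, and the sketch assumes that a property with a long season but a temperature outside both bands falls through to the days-only rules.

```python
def growing_season_score(growing_days: int, avg_temp: float) -> int:
    """Return a 1-5 score using the optimized thresholds listed above."""
    if growing_days >= 280 and 13 <= avg_temp <= 17:
        return 5  # optimal for year-round production
    if growing_days >= 240 and 11 <= avg_temp <= 19:
        return 4
    if growing_days >= 200:
        return 3  # also reached by long seasons with out-of-band temperatures
    if growing_days >= 160:
        return 2
    return 1


print(growing_season_score(300, 15.0))
print(growing_season_score(300, 20.0))
```

Under these rules a 300-day season at 20°C scores 3 rather than 5, which is the kind of separation the tighter thresholds aim for.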

---

### 4. **Climate Risk - Not Granular Enough**
**Current**: Very broad regional assessments

**Improvements Needed**:
- Add drought frequency data
- Consider future climate projections (2030-2050)
- Include wildfire risk for Mediterranean
- Water table depletion risk
- Extreme heat days (>35°C)
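
Of the items above, the extreme-heat count is the simplest to compute once daily maximum temperatures are available. The following is a sketch only: the function name and the sample readings are assumptions, and the data source for daily maxima is not specified in this analysis.

```python
from typing import Iterable

EXTREME_HEAT_THRESHOLD_C = 35.0  # threshold named in the list above


def count_extreme_heat_days(
    daily_max_temps_c: Iterable[float],
    threshold_c: float = EXTREME_HEAT_THRESHOLD_C,
) -> int:
    """Count days whose maximum temperature is strictly above the threshold."""
    return sum(1 for temp in daily_max_temps_c if temp > threshold_c)


# Made-up daily maxima for one week, in °C.
sample_week = [31.2, 34.9, 35.0, 35.1, 38.4, 36.0, 29.5]
print(count_extreme_heat_days(sample_week))
```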

---

### 5. **Population Density - Country-Level Only**
**Current**: Uses rough country averages
- Spain: 94 people/km²
- France: 119 people/km²

**Problem**: Massive variation within countries!
- Paris area: 20,000+/km²
- Rural Dordogne: <30/km²

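**Solution**: Use OpenStreetMap Nominatim reverse geocoding to resolve each property's actual region (county or equivalent), then look up the density for that region instead of the country average.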
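
A rough sketch of the lookup, kept offline so it can be run without network access. It builds the Nominatim `/reverse` request URL and extracts a region name from an already-parsed JSON response. The function names, the density table, and the sample response are all hypothetical; real density figures would have to come from a separate statistics source.

```python
from typing import Optional
from urllib.parse import urlencode

NOMINATIM_REVERSE_URL = "https://nominatim.openstreetmap.org/reverse"


def build_reverse_url(lat: float, lon: float, zoom: int = 8) -> str:
    """Build a reverse-geocoding URL; zoom 8 asks for roughly county-level detail."""
    params = {"lat": lat, "lon": lon, "format": "jsonv2", "zoom": zoom, "addressdetails": 1}
    return f"{NOMINATIM_REVERSE_URL}?{urlencode(params)}"


def region_from_response(response: dict) -> Optional[str]:
    """Pick the most specific administrative region present in the address block."""
    address = response.get("address", {})
    for key in ("county", "state_district", "state"):
        if key in address:
            return address[key]
    return None


def density_for(response: dict, density_table: dict, fallback: Optional[float] = None):
    """Look up people/km² for the resolved region, or return the fallback."""
    return density_table.get(region_from_response(response), fallback)


# Hypothetical response and density table, for illustration only.
sample_response = {"address": {"county": "Example County", "state": "Example State"}}
sample_table = {"Example County": 28.0}
print(build_reverse_url(44.9, 0.7))
print(density_for(sample_response, sample_table, fallback=100.0))
```

The public Nominatim instance has a usage policy (an identifying User-Agent and a low request rate), so results for the full property set would likely need caching.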

---

### 6. **Soil Quality - Too Simplistic**
**Current**: Country-based rules
```python
if country == 'Netherlands': score = 5
elif country == 'Spain' and lat < 40: score = 2
```

**Better Approach**:
- Use SoilGrids API (free, global)
- Consider: pH, organic carbon, clay content, drainage
- Regional factors: Loire Valley = excellent, Spanish meseta = poor
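
One possible way to turn measured soil properties into a score is sketched below. It assumes pH, organic carbon, and clay content have already been fetched and converted to plain units (SoilGrids reports values in its own mapped units). The function name and every threshold are illustrative assumptions, not agronomic recommendations, and drainage is approximated only through clay content.

```python
def soil_quality_score(ph: float, organic_carbon_pct: float, clay_pct: float) -> float:
    """Combine three soil properties into a 1-5 score (illustrative thresholds)."""
    score = 3.0

    # Assumed preference: slightly acidic to neutral soil.
    if 6.0 <= ph <= 7.0:
        score += 1.0
    elif ph < 5.0 or ph > 8.0:
        score -= 1.0

    # Assumed bands for organic carbon.
    if organic_carbon_pct >= 2.0:
        score += 1.0
    elif organic_carbon_pct < 1.0:
        score -= 1.0

    # Heavy clay is treated here as a proxy for poor drainage.
    if clay_pct > 40.0:
        score -= 0.5

    # Clamp to the 1-5 scale used by the other criteria.
    return max(1.0, min(5.0, score))


print(soil_quality_score(ph=6.5, organic_carbon_pct=2.5, clay_pct=20.0))
print(soil_quality_score(ph=8.5, organic_carbon_pct=0.5, clay_pct=50.0))
```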

---

### 7. **Water Availability - Needs Real Data**
**Current**: Relies on rainfall + hardcoded water stress regions

**Improvements**:
- Check proximity to rivers/lakes (OSM data)
- Groundwater availability by region
- Recent drought history
- Water rights/regulations by region
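
For the proximity check, a great-circle distance to the nearest known water feature is a reasonable first approximation. The sketch below assumes the river and lake coordinates have already been extracted from OSM as `(lat, lon)` pairs; the function names are hypothetical and the extraction step itself is not shown.

```python
import math
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_water_km(
    lat: float, lon: float, water_points: Iterable[Tuple[float, float]]
) -> Optional[float]:
    """Distance to the closest water feature, or None if no features are known."""
    distances = [haversine_km(lat, lon, wlat, wlon) for wlat, wlon in water_points]
    return min(distances) if distances else None


# Made-up coordinates: one and two degrees of longitude away along the equator.
print(nearest_water_km(0.0, 0.0, [(0.0, 1.0), (0.0, 2.0)]))
```

Point-to-point distance underestimates nothing for lakes but can overestimate the distance to a river whose nearest bank lies between two sampled points, so dense sampling of river geometry would matter.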

---

### 8. **Airbnb Potential - Good Structure, Needs Tuning**
**Current implementation is solid** but could improve:
- Add tourism statistics by region
- Consider seasonality (year-round vs summer-only)
- Account for saturation (too many Airbnbs)
- Regulations (some areas ban short-term rentals)

---

### 9. **GPT Prompt - Not Using Available Data**
**Current prompt**: Generic questions about suitability

**Missing opportunities**:
- Include actual rainfall data: "This property has 850mm annual rainfall"
- Include temperature: "Average 16°C, 240 growing days"
- Include distance data: "85km from Barcelona airport"

**Better approach**: Give GPT the FACTS, ask for interpretation

---

## Optimization Strategy

### Phase 1: Fix Critical Bugs (Immediate)
1. ✅ Fix rainfall logic bug
2. ✅ Tighten temperature thresholds
3. ✅ Add more score granularity (use .5 increments)

### Phase 2: Add Missing Data (High Impact)
1. ✅ Real population density (OSM reverse geocoding)
2. ✅ Soil data (SoilGrids API)
3. ✅ Water features proximity (OSM)
4. Elevation data (for microclimates, drainage)

### Phase 3: Enhance GPT Integration (Medium Impact)
1. ✅ Pass custom criteria results to GPT
2. ✅ Ask GPT to interpret the data, not guess
3. Add risk factor analysis
4. Market potential scoring

### Phase 4: Advanced Features (Lower Priority)
1. Permaculture zone mapping
2. Solar irradiance (for greenhouse potential)
3. Wind patterns
4. Frost risk days
5. Biodiversity indicators

---

## Expected Improvements

### Before Optimization:
- Average score range: 2.8 - 3.2 (very narrow!)
- 85% of properties score 3.0
- Limited differentiation

### After Phase 1-2:
- Expected score range: 1.5 - 4.5 (much wider)
- Normal distribution around 3.0
- Clear winners and losers

### Estimated Time to Implement:
- Phase 1: 30 minutes
- Phase 2: 2-3 hours
- Phase 3: 1-2 hours
- Phase 4: 4-6 hours

---

## Priority Recommendations

1. **FIX THE RAINFALL BUG** (5 min, huge impact)
2. Tighten scoring thresholds (30 min, high impact)
3. Add soil data from SoilGrids (1 hour, medium impact)
4. Add real population density (45 min, medium impact)
5. Enhance GPT prompt with data (45 min, high impact)
6. Add water features proximity (1 hour, medium impact)

---

## Specific Code Changes

### 1. Rainfall Criterion Fix
```python
# Line 108 in custom_criteria.py
# CHANGE FROM:
elif annual_rainfall >= 400 or annual_rainfall <= 2000:

# CHANGE TO:
elif 400 <= annual_rainfall < 600:
    score = 2.5
    reasoning.append(f"Adequate rainfall: {annual_rainfall:.0f}mm/year (irrigation recommended)")
elif 600 <= annual_rainfall <= 1500:
    ...  # score and reasoning for this band go here
```

### 2. Temperature Thresholds
```python
# More granular scoring
if growing_days >= 280 and 13 <= avg_temp <= 17:
    score = 5
    reasoning.append(f"Optimal climate: {growing_days} days, {avg_temp:.1f}°C")
elif growing_days >= 260 and 11 <= avg_temp <= 19:
    score = 4.5
elif growing_days >= 240:
    score = 4
elif growing_days >= 210:
    score = 3.5
elif growing_days >= 180:
    score = 3
elif growing_days >= 160:
    score = 2.5
elif growing_days >= 140:
    score = 2
else:
    score = 1.5
```

### 3. Enhanced GPT Prompt Template
```
You are analyzing property: {title}

HARD DATA AVAILABLE:
- Location: {location}, {country}
- Annual rainfall: {rainfall}mm
- Growing season: {growing_days} days
- Average temperature: {avg_temp}°C
- Distance to airport: {airport_distance}km
- Population density: {density} people/km²
- Soil pH: {soil_ph}, Organic Carbon: {soil_oc}%

Based on this DATA (not guesses), evaluate suitability for:
1. Regenerative market garden
2. Guest accommodation
3. Workshop/processing
...

Be CRITICAL. A property with only 180 growing days and low rainfall should score LOW on market garden, even if the description sounds nice.
```
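
A minimal sketch of filling the data section of this template in Python. It uses the placeholder names from the template above; `build_prompt`, `REQUIRED_FIELDS`, and the example property are assumptions for illustration. Refusing to build a prompt when a value is missing keeps GPT from silently guessing.

```python
PROMPT_TEMPLATE = """You are analyzing property: {title}

HARD DATA AVAILABLE:
- Location: {location}, {country}
- Annual rainfall: {rainfall}mm
- Growing season: {growing_days} days
- Average temperature: {avg_temp}°C
- Distance to airport: {airport_distance}km
- Population density: {density} people/km²
- Soil pH: {soil_ph}, Organic Carbon: {soil_oc}%
"""

REQUIRED_FIELDS = (
    "title", "location", "country", "rainfall", "growing_days",
    "avg_temp", "airport_distance", "density", "soil_ph", "soil_oc",
)


def build_prompt(facts: dict) -> str:
    """Fill the template, failing loudly if any measured value is missing."""
    missing = [name for name in REQUIRED_FIELDS if facts.get(name) is None]
    if missing:
        raise ValueError(f"Missing data for prompt: {', '.join(missing)}")
    return PROMPT_TEMPLATE.format(**facts)


# Made-up property; the rainfall, temperature, and airport figures echo the examples in issue 9.
example_facts = {
    "title": "Example farmhouse", "location": "Example village", "country": "Spain",
    "rainfall": 850, "growing_days": 240, "avg_temp": 16,
    "airport_distance": 85, "density": 40, "soil_ph": 6.8, "soil_oc": 1.9,
}
prompt = build_prompt(example_facts)
print(prompt)
```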

---

## Testing Strategy

1. Run the optimized criteria on the same 186 properties
2. Compare score distribution:
   - Mean score
   - Standard deviation (should increase!)
   - Range (min to max)
   - Properties with extreme scores (1-2 and 4-5)

3. Manual spot-checks:
   - Top 10 properties - do they make sense?
   - Bottom 10 properties - legitimate reasons?
   - Properties near 3.0 - mixed attributes?
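
The distribution comparison in step 2 can be sketched with the standard library. The function name, the cutoffs used for "extreme" scores (at most 2, at least 4), and both score lists are illustrative assumptions; real inputs would be the before and after scores for the 186 properties.

```python
import statistics
from typing import Sequence


def summarize_scores(scores: Sequence[float]) -> dict:
    """Summary statistics for one run of the scoring pipeline."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
        "low_count": sum(1 for s in scores if s <= 2),   # extreme low scores
        "high_count": sum(1 for s in scores if s >= 4),  # extreme high scores
    }


# Made-up scores standing in for the real before/after runs.
before = [3.0, 3.0, 3.0, 2.8, 3.2]
after = [1.5, 2.5, 3.0, 3.5, 4.5]

before_summary = summarize_scores(before)
after_summary = summarize_scores(after)
print(before_summary)
print(after_summary)
```

A higher standard deviation with a similar mean would indicate that the optimization spread the scores out rather than shifting them all in one direction.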

---

## Next Steps

1. User approval for Phase 1 optimizations
2. Implement fixes in custom_criteria.py
3. Re-run evaluation on all properties
4. Analyze new score distribution
5. Proceed to Phase 2 if results improve
