# 🎯 FarmMatch Criteria System - Complete Analysis

## Executive Summary

The FarmMatch system has **two parallel criteria systems** that are not fully integrated:

1. **GPT-Based Criteria** (Currently Used) - Subjective, AI-interpreted scores
2. **Custom Data-Driven Criteria** (Not Integrated) - Objective, API-based measurements

**Current State**: Only GPT criteria are used. Custom criteria code exists but is dormant.

**Recommendation**: Integrate both systems for a hybrid approach combining subjective expertise with objective data.

---

## 🔍 Current System Architecture

### **1. GPT-Based Criteria (Active)**

**Source**: [prompt.txt](prompt.txt) → [analyze_from_urls_optimized.py](analyze_from_urls_optimized.py)

**Criteria** (6 total):
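1. **Regeneratieve Market Garden** (Regenerative Market Garden) (1-5)
2. **Gastenverblijf / Bed & Breakfast** (Guest Accommodation / Bed & Breakfast) (1-5)
3. **Werkplaats / Voedselverwerking** (Workshop / Food Processing) (1-5)
4. **Zelfstandige Verhuureenheden** (Independent Rental Units) (1-5)
5. **Ligging t.o.v. Kust, Stad en Vliegveld** (Location relative to Coast, City and Airport) (1-5)
6. **Afstand tot Lokale Markt** (Distance to Local Market) (1-5)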

**How It Works**:
```
Property Description
        ↓
  GPT-3.5-turbo (with detailed prompt)
        ↓
  6 Scores (1-5 each)
        ↓
  Weighted Average → GPT Score (0-5)
```

**Weighting** (from [parse_criteria.py](parse_criteria.py:48-54)):
```python
CRITERION_WEIGHTS = {
    'market_garden': 3.0,          # Highest weight
    'guest_accommodation': 2.5,
    'workshop': 2.0,
    'rental_units': 2.5,
    'location': 2.0,
    'local_market': 1.5            # Lowest weight
}
```

**GPT Score Calculation**:
```python
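# scores and weights are parallel sequences, one entry per criterion
gpt_score = sum(score * weight for score, weight in zip(scores, weights)) / sum(weights)
# Result: 1-5 when every criterion is scored 1-5 (reported on the 0-5 scale)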
```
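
As a worked example of the weighted average, here is a minimal sketch that reuses the `CRITERION_WEIGHTS` values shown above. The function name `weighted_gpt_score` and the sample scores are assumptions made for illustration; they are not taken from `parse_criteria.py`.

```python
CRITERION_WEIGHTS = {
    'market_garden': 3.0,
    'guest_accommodation': 2.5,
    'workshop': 2.0,
    'rental_units': 2.5,
    'location': 2.0,
    'local_market': 1.5,
}


def weighted_gpt_score(scores, weights=CRITERION_WEIGHTS):
    """Weighted average of per-criterion scores, keyed by criterion name."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical scores for a single property (each 1-5).
example_scores = {
    'market_garden': 4,
    'guest_accommodation': 3,
    'workshop': 2,
    'rental_units': 3,
    'location': 5,
    'local_market': 4,
}
print(round(weighted_gpt_score(example_scores), 2))  # 47 / 13.5 -> 3.48
```

With all six scores present and each between 1 and 5, the weighted average cannot fall below 1.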

**Strengths**:
- ✅ Holistic evaluation (considers intangibles)
- ✅ Nuanced assessment (understands context)
- ✅ Flexible (adapts to different property types)
- ✅ Human-like reasoning

**Weaknesses**:
- ❌ Subjective (prone to AI hallucinations, as with property 104459931)
- ❌ Inconsistent (the same property can receive different scores across runs)
- ❌ Expensive (€0.002-0.01 per property)
- ❌ Not verifiable (no hard data backing)
- ❌ No confidence scores

---

### **2. Custom Data-Driven Criteria (Dormant)**

**Source**: [custom_criteria.py](custom_criteria.py)

**Criteria** (Partially Implemented):
1. **🌧️ Rainfall** - Annual precipitation (Open-Meteo API)
2. **🌡️ Temperature** - Growing season, frost days
3. **✈️ Airport Distance** - Travel accessibility
4. **🏙️ Population Density** - Market potential
5. **🌍 Climate Change Risk** - Future viability

**How It Should Work**:
```
Property Coordinates
        ↓
  External APIs (Open-Meteo, etc.)
        ↓
  Raw Data (rainfall mm, temp °C, distance km)
        ↓
  Scoring Algorithm (data → 1-5 scale)
        ↓
  Weighted Average → Custom Score (0-5)
```
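
The "Scoring Algorithm (data → 1-5 scale)" step can be sketched as a threshold lookup. This is an illustration only: the helper name `score_from_thresholds` is hypothetical, and the cut-offs reuse the land-size values proposed later in this document rather than anything found in `custom_criteria.py`.

```python
from bisect import bisect_right


def score_from_thresholds(value, thresholds):
    """Map a raw measurement to a 1-5 score using four ascending cut-offs."""
    if len(thresholds) != 4:
        raise ValueError('expected exactly four thresholds for a 1-5 scale')
    # bisect_right counts how many cut-offs the value has reached (0-4).
    return 1 + bisect_right(thresholds, value)


# Cut-offs in m², taken from the Land Size Criterion proposal in Phase 2.
LAND_SIZE_M2 = [2000, 5000, 10000, 20000]
print([score_from_thresholds(v, LAND_SIZE_M2) for v in (500, 2000, 7500, 25000)])
# -> [1, 2, 3, 5]
```

This form suits measurements where higher is better. For measurements where lower is better, such as airport distance, the resulting score would need to be inverted (for example `6 - score`).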

**Weighting** (from [custom_criteria.py](custom_criteria.py:63-64)):
```python
class RainfallCriterion(Criterion):
    def __init__(self):
        self.weight = 2.0  # Individual criterion weight
```

**Custom Score Calculation** (theoretical):
```python
total_weight = sum(criterion.weight for criterion in active_criteria)
custom_score = sum(criterion.evaluate(property_data)['score'] * criterion.weight
                   for criterion in active_criteria) / total_weight
# Result: 0-5 scale
```
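
To make the theoretical calculation concrete, the sketch below assumes a `Criterion` interface in which `evaluate()` returns a dict with a `'score'` key and each instance carries a `weight`. `FixedScoreCriterion` is a hypothetical stand-in used only for the demonstration; the real classes in `custom_criteria.py` may differ.

```python
class Criterion:
    """Assumed base class: subclasses set `weight` and implement `evaluate`."""
    weight = 1.0

    def evaluate(self, property_data):
        raise NotImplementedError


class FixedScoreCriterion(Criterion):
    """Stand-in criterion that returns a constant score (demonstration only)."""

    def __init__(self, score, weight):
        self.score = score
        self.weight = weight

    def evaluate(self, property_data):
        return {'score': self.score}


def custom_score(property_data, active_criteria):
    """Weighted average of criterion scores; 0.0 when no criteria are active."""
    total_weight = sum(c.weight for c in active_criteria)
    if total_weight == 0:
        return 0.0
    weighted = sum(c.evaluate(property_data)['score'] * c.weight
                   for c in active_criteria)
    return weighted / total_weight


criteria = [FixedScoreCriterion(4, 2.0), FixedScoreCriterion(2, 1.0)]
print(round(custom_score({'lat': 0.0, 'lon': 0.0}, criteria), 2))  # (8 + 2) / 3 -> 3.33
```

Guarding against a zero total weight keeps the function safe when every criterion is disabled or skipped.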

**Strengths**:
- ✅ Objective (real measured data)
- ✅ Consistent (same input = same output)
- ✅ Verifiable (traceable to source)
- ✅ Free/cheap (most APIs free)
- ✅ Updatable (can refresh data)

**Weaknesses**:
- ❌ Limited scope (only what APIs provide)
- ❌ Requires coordinates (available for only 70-80% of properties)
- ❌ API dependencies (rate limits, downtime)
- ❌ Doesn't understand context (e.g., building quality)

---

## 📊 Combined Scoring System

### **Overall Score Calculation** (from [parse_criteria.py](parse_criteria.py:67-91))

```python
overall_score = (gpt_score * 0.6 + custom_score * 0.4) * risk_factor

# Where:
# - gpt_score: 0-5 (weighted average of 6 GPT criteria)
# - custom_score: 0-5 (currently always 0 because not integrated)
# - risk_factor: 0.7 (high risk), 0.9 (medium/low risk and default); 1.0 (ideal) is never returned by get_risk_factor as shown below
# - weights: 60% GPT, 40% Custom (configurable)
```

**Risk Factor** (from [parse_criteria.py](parse_criteria.py:55-65)):
```python
def get_risk_factor(risk_profile):
    if 'laag' in risk_profile:
        return 0.9  # Low risk: 10% penalty
    elif 'gemiddeld' in risk_profile:
        return 0.9  # Medium risk: 10% penalty
    elif 'hoog' in risk_profile:
        return 0.7  # High risk: 30% penalty
    return 0.9
```

**Current Reality**:
```python
# Since custom_score = 0 for all properties:
overall_score = gpt_score * 0.6 * risk_factor
# This means only 60% of potential score is used!
```
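
The effect of the missing custom score is easiest to see with numbers. The sketch below wraps the formula above in a hypothetical helper, `overall_score`; the input values are invented for illustration.

```python
def overall_score(gpt_score, custom_score, risk_factor,
                  gpt_weight=0.6, custom_weight=0.4):
    """Blend the two scores, then apply the risk penalty."""
    return (gpt_score * gpt_weight + custom_score * custom_weight) * risk_factor


# Today: custom_score is 0, so a GPT score of 4.0 at risk factor 0.9 gives 2.16.
current = overall_score(4.0, 0.0, 0.9)

# With a hypothetical custom score of 4.0, the same property would reach 3.6.
integrated = overall_score(4.0, 4.0, 0.9)

print(round(current, 2), round(integrated, 2))  # 2.16 3.6
```

With the risk factors shown above (at most 0.9), the highest overall score a property can reach today is 5 × 0.6 × 0.9 = 2.7 on a nominal 0-5 scale.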

---

## 🚨 Current Problems

### **Problem 1: Custom Criteria Not Integrated**

**Evidence**:
- custom_criteria.py exists with working code
- RainfallCriterion, TemperatureCriterion implemented
- BUT: analyze_from_urls_optimized.py doesn't call it
- BUT: full_update.sh doesn't include custom criteria step
- Result: custom_score = 0 for ALL properties

**Impact**:
- Overall scores are capped at 60% of their potential (gpt_score * 0.6) because the 40% custom share is always zero
- No objective data to validate GPT assessments

---

### **Problem 2: AI Hallucinations**

**Example**: Property 104459931
- Reality: Empty forested land (no buildings)
- GPT Scores: Guest 5/5, Workshop 3/5, Rental 4/5
- Should Be: All 1/5 (no buildings exist)

**Root Cause**:
- GPT-3.5-turbo misinterprets or ignores explicit requirements
- No post-analysis validation
- No objective data to cross-check

**Solution**:
- Add custom criteria that check building existence (from KPIs)
- Cross-validate GPT scores against objective data
- Flag properties where GPT and data conflict
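
A minimal sketch of that cross-check follows. It assumes the KPI field names `building_size_m2` and `bedrooms` used in Phase 2 of this document and the criterion keys from `CRITERION_WEIGHTS`; the function names and the flagging threshold of 3 are hypothetical.

```python
BUILDING_DEPENDENT = ('guest_accommodation', 'workshop', 'rental_units')


def has_building(property_data):
    """True when the KPIs report any floor area or bedrooms."""
    building_m2 = property_data.get('building_size_m2') or 0
    bedrooms = property_data.get('bedrooms') or 0
    return building_m2 > 0 or bedrooms > 0


def find_conflicts(gpt_scores, property_data, threshold=3):
    """List building-dependent criteria that GPT scored at or above the
    threshold although the KPIs show no building."""
    if has_building(property_data):
        return []
    return [name for name in BUILDING_DEPENDENT
            if gpt_scores.get(name, 0) >= threshold]


# GPT scores reported for property 104459931 (bare forested land).
scores = {'guest_accommodation': 5, 'workshop': 3, 'rental_units': 4}
print(find_conflicts(scores, {'building_size_m2': 0, 'bedrooms': 0}))
# -> ['guest_accommodation', 'workshop', 'rental_units']
```

Returning the conflicting criteria rather than overwriting the scores keeps the GPT output intact and leaves the decision (review, rescore, or clamp to 1) to a later step.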

---

### **Problem 3: Missing Objective Filters**

**Current Map Viewer Filters** ([map_viewer_advanced.html](map_viewer_advanced.html:328-456)):
- ✅ 6 GPT Criteria sliders (Market Garden, Guest, Workshop, etc.)
- ✅ Overall Score range
- ✅ Price range
- ✅ Risk profile checkboxes
- ✅ Text search
- ❌ NO Rainfall filter
- ❌ NO Temperature/Climate filter
- ❌ NO Distance filters (airport, city)
- ❌ NO Land size filter (objective)
- ❌ NO Building size filter (objective)

**Why This Matters**:
- Users can't filter by hard requirements (e.g., "min 1 hectare")
- Can't filter by climate needs (e.g., "800-1200mm rainfall")
- Can't filter by practical constraints (e.g., "< 2hrs from airport")

---

### **Problem 4: Criteria Not Detailed Enough**

**Current GPT Prompt Issues**:

1. **Vague Definitions**:
   ```
   "Minimaal 1500 m² bruikbare grond"
   ```
   - The prompt asks for "at least 1500 m² of usable land", but what counts as "bruikbare" (usable)?
   - Does forest count? Steep slopes? Rocky ground?

2. **No Quantitative Thresholds**:
   ```
   "Zonlicht: minimaal 6 uur direct zonlicht per dag"
   ```
   - The prompt requires "at least 6 hours of direct sunlight per day", but how can GPT know sun hours from a description?
   - Should use lat/lon + topography data

3. **Subjective Assessments**:
   ```
   "Karakteristiek pand met charme"
   ```
   - "Charme" is subjective
   - No measurable criteria (the quoted phrase means "characterful property with charm")

4. **Missing Measurements**:
   - No actual distance calculations (airport, market, coast)
   - No climate data integration
   - No soil quality assessment
   - No topography analysis

---

## ✅ Recommended Solution: Hybrid System

### **Phase 1: Integrate Existing Custom Criteria**

**Step 1: Add Custom Criteria Evaluation to Pipeline**

Modify [full_update.sh](full_update.sh) to add:
```bash
# STEP 6.5: Evaluate Custom Criteria
echo -e "${BLUE}📊 STEP 6.5/8: Evaluating Custom Data Criteria${NC}"
python3 evaluate_custom_criteria.py
```

**Step 2: Create evaluate_custom_criteria.py**

New script that:
1. Reads analysis_output.csv
2. For each property with coordinates:
   - Calls RainfallCriterion.evaluate()
   - Calls TemperatureCriterion.evaluate()
   - Calls AirportDistanceCriterion.evaluate()
   - etc.
3. Calculates weighted custom_score
4. Writes to analysis_output.csv (new column: custom_overall_score)
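
The core of such a script might look like the sketch below. It is not the final `evaluate_custom_criteria.py`: the column names `id`, `lat` and `lon`, the function `add_custom_scores`, and the stand-in `ConstantCriterion` are all assumptions, and an in-memory CSV replaces `analysis_output.csv` so the example runs on its own.

```python
import csv
import io


def add_custom_scores(rows, criteria):
    """Return copies of the rows with a 'custom_overall_score' column added.

    Rows without usable coordinates get 0, the same default that
    parse_criteria.py applies when the column is missing.
    """
    total_weight = sum(c.weight for c in criteria)
    scored = []
    for row in rows:
        row = dict(row)
        try:
            data = dict(row, lat=float(row['lat']), lon=float(row['lon']))
        except (KeyError, TypeError, ValueError):
            row['custom_overall_score'] = 0
            scored.append(row)
            continue
        weighted = sum(c.evaluate(data)['score'] * c.weight for c in criteria)
        row['custom_overall_score'] = (
            round(weighted / total_weight, 2) if total_weight else 0
        )
        scored.append(row)
    return scored


class ConstantCriterion:
    """Hypothetical stand-in for RainfallCriterion and the other criteria."""
    weight = 2.0

    def evaluate(self, property_data):
        return {'score': 4}


# In-memory CSV standing in for analysis_output.csv.
source = io.StringIO("id,lat,lon\nA,40.1,-8.2\nB,,\n")
rows = add_custom_scores(csv.DictReader(source), [ConstantCriterion()])
print([(r['id'], r['custom_overall_score']) for r in rows])
# -> [('A', 4.0), ('B', 0)]
```

In the real script the rows would come from `csv.DictReader` over the file, and the result would be written back with `csv.DictWriter` using the original field names plus the new column.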

**Step 3: Update parse_criteria.py**

Already supports custom_score! Just needs data:
```python
custom_score = float(row.get('custom_overall_score', 0))
# This line already exists! Just needs column to be populated
```

---

### **Phase 2: Add Objective KPI-Based Criteria**

**New Criteria Based on Extracted KPIs**:

1. **Land Size Criterion** (from extract_gps_and_kpis.py)
   ```python
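   class LandSizeCriterion(Criterion):
       def evaluate(self, property_data):
           # A missing or null KPI counts as 0 m² and gets the lowest score
           land_m2 = property_data.get('land_size_m2') or 0
           # Return a dict so the custom_score formula can read ['score']
           if land_m2 >= 20000: return {'score': 5}  # 2+ hectares
           if land_m2 >= 10000: return {'score': 4}  # 1-2 hectares
           if land_m2 >= 5000: return {'score': 3}   # 0.5-1 hectare
           if land_m2 >= 2000: return {'score': 2}   # 0.2-0.5 hectare
           return {'score': 1}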
   ```

2. **Building Existence Criterion** (validates GPT scores)
   ```python
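   class BuildingExistenceCriterion(Criterion):
       def evaluate(self, property_data):
           # Missing or null KPIs are treated as 0 (no evidence of a building)
           building_m2 = property_data.get('building_size_m2') or 0
           bedrooms = property_data.get('bedrooms') or 0

           # Return a dict so the custom_score formula can read ['score']
           if building_m2 > 0 or bedrooms > 0:
               return {'score': 5}  # Building exists
           return {'score': 1}      # No building (bare land)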
   ```

3. **Realistic Distance Criterion** (using actual coordinates)
   ```python
   class AirportDistanceCriterion(Criterion):
       def evaluate(self, property_data):
           # Calculate actual distance (km) to the nearest airport
           distance_km = calculate_distance_to_airports(
               property_data['lat'],
               property_data['lon']
           )
           # Hour labels assume an average travel speed of roughly 60 km/h
           if distance_km < 60: return {'score': 5}   # < 1hr
           if distance_km < 120: return {'score': 4}  # 1-2hrs
           if distance_km < 180: return {'score': 3}  # 2-3hrs
           if distance_km < 300: return {'score': 2}  # 3-5hrs
           return {'score': 1}                         # > 5hrs
   ```
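
The helper `calculate_distance_to_airports` used above is not defined anywhere in this document. One possible sketch uses the haversine great-circle formula. The airport list here is a placeholder with made-up coordinates and must be replaced with real data.

```python
import math

# Hypothetical airport list as (name, latitude, longitude); replace with real data.
AIRPORTS = [
    ('Example Airport A', 0.0, 0.0),
    ('Example Airport B', 10.0, 10.0),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def calculate_distance_to_airports(lat, lon, airports=AIRPORTS):
    """Distance in km from (lat, lon) to the nearest airport in the list."""
    return min(haversine_km(lat, lon, a_lat, a_lon)
               for _, a_lat, a_lon in airports)


# One degree of longitude along the equator is about 111.2 km.
print(round(calculate_distance_to_airports(0.0, 1.0), 1))  # -> 111.2
```

This gives straight-line distance, not driving distance or travel time, so the hour labels in the criterion are approximations. A routing service would be needed for real travel times.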

---

### **Phase 3: Add Filters to Map Viewer**

**New Filter Section** in [map_viewer_advanced.html](map_viewer_advanced.html):

```html
<div class="filter-section">
    <h3>🌍 Climate & Environment</h3>

    <div class="criterion-filter">
        <div class="criterion-header">
            <span class="criterion-label">🌧️ Annual Rainfall</span>
            <span class="criterion-value" id="rainfall-value">Any</span>
        </div>
        <input type="range" id="rainfall" min="0" max="2000" value="0" step="50">
        <div class="score-labels">
            <span>Any</span>
            <span>2000mm+</span>
        </div>
    </div>

    <div class="criterion-filter">
        <div class="criterion-header">
            <span class="criterion-label">🌡️ Growing Season</span>
            <span class="criterion-value" id="growing-season-value">Any</span>
        </div>
        <input type="range" id="growing-season" min="0" max="365" value="0" step="30">
        <div class="score-labels">
            <span>Any</span>
            <span>365 days</span>
        </div>
    </div>
</div>

<div class="filter-section">
    <h3>📏 Property Size</h3>

    <div class="criterion-filter">
        <div class="criterion-header">
            <span class="criterion-label">🌾 Minimum Land Size</span>
            <span class="criterion-value" id="land-size-value">Any</span>
        </div>
        <input type="range" id="min-land-size" min="0" max="100000" value="0" step="1000">
        <div class="score-labels">
            <span>Any</span>
            <span>10 hectares</span>
        </div>
    </div>

    <div class="criterion-filter">
        <div class="criterion-header">
            <span class="criterion-label">🏠 Minimum Building Size</span>
            <span class="criterion-value" id="building-size-value">Any</span>
        </div>
        <input type="range" id="min-building-size" min="0" max="500" value="0" step="10">
        <div class="score-labels">
            <span>Any</span>
            <span>500 m²</span>
        </div>
    </div>
</div>

<div class="filter-section">
    <h3>✈️ Distance Filters</h3>

    <div class="criterion-filter">
        <div class="criterion-header">
            <span class="criterion-label">✈️ Max Airport Distance</span>
            <span class="criterion-value" id="airport-distance-value">Any</span>
        </div>
        <input type="range" id="max-airport-distance" min="0" max="300" value="300" step="10">
        <div class="score-labels">
            <span>Close</span>
            <span>300+ km</span>
        </div>
    </div>
</div>
```

**JavaScript Filter Logic**:
```javascript
filteredProperties = allProperties.filter(prop => {
    // Existing GPT criteria filters...

    // Properties with missing data (null/undefined) pass every new filter;
    // a stored value of 0 (e.g. bare land) is still checked.

    // NEW: Climate filters
    if (prop.rainfall_mm != null && prop.rainfall_mm < minRainfall) return false;

    // NEW: Size filters
    if (prop.land_size_m2 != null && prop.land_size_m2 < minLandSize) return false;
    if (prop.building_size_m2 != null && prop.building_size_m2 < minBuildingSize) return false;

    // NEW: Distance filters
    if (prop.airport_distance_km != null && prop.airport_distance_km > maxAirportDistance) return false;

    return true;
});
```

---

## 📋 Implementation Plan

### **Quick Wins** (1-2 hours):

1. **Add KPI-Based Filters to Map Viewer**
   - Land size filter (data already in enriched_data.json)
   - Building size filter (data already in enriched_data.json)
   - Bedrooms filter (data already in enriched_data.json)
   - **Impact**: Immediate ability to filter by hard requirements

2. **Add "Building Exists" Validation**
   - Flag properties where GPT gave high Guest/Workshop/Rental scores but no building exists
   - **Impact**: Catch AI hallucinations like property 104459931

---

### **Medium Effort** (4-8 hours):

3. **Integrate Rainfall Criterion**
   - Create evaluate_custom_criteria.py
   - Run RainfallCriterion for all properties with coordinates
   - Add rainfall to enriched_data.json
   - Add rainfall filter to map viewer
   - **Impact**: Objective climate-based filtering

4. **Calculate Real Distances**
   - Add AirportDistanceCriterion using lat/lon
   - Calculate distance to nearest major airport
   - Add to custom_score
   - Add filter to map viewer
   - **Impact**: Realistic travel time assessments

---

### **Full Integration** (16-24 hours):

5. **Complete Custom Criteria System**
   - Implement all criteria in custom_criteria.py
   - Add evaluate_custom_criteria.py to full_update.sh
   - Update enriched_data.json schema
   - Add all custom criteria to map viewer filters
   - Create criteria weight configuration UI
   - **Impact**: Full hybrid GPT + objective data system

6. **Add Confidence Scores**
   - GPT confidence (based on description detail)
   - Data confidence (based on API coverage)
   - Overall confidence (combined)
   - Display in map viewer
   - **Impact**: Users know which scores to trust

7. **Improve GPT Prompt Specificity**
   - Add explicit "IF NO BUILDING: score = 1" rules
   - Add quantitative thresholds where possible
   - Add cross-validation checks
   - **Impact**: Reduce AI hallucinations

---

## 🎯 Recommended Priority

### **Highest Priority** (Do First):
1. ✅ Add KPI-based filters (land size, building size, bedrooms)
2. ✅ Add building existence validation to catch AI errors

### **High Priority** (Do Soon):
3. ✅ Integrate rainfall criterion
4. ✅ Add airport distance calculation

### **Medium Priority** (Nice to Have):
5. ⚠️ Complete all custom criteria
6. ⚠️ Add confidence scores

### **Lower Priority** (Future Enhancement):
7. 🔵 Criteria weight configuration UI
8. 🔵 Advanced climate predictions

---

## 💡 Key Insights

1. **Custom criteria code exists but isn't used** - Easy win to activate it
2. **Overall scores only use 60% of potential** - Integrating custom criteria would unlock full scoring
3. **No objective filters** - Users can't filter by hard data (land size, rainfall, etc.)
4. **AI hallucinations happen** - Need objective validation
5. **Criteria could be more specific** - But integrating objective data is a better solution than prompt tweaking

---

## 🔗 Files to Modify

**For Quick Wins** (KPI filters):
- [map_viewer_advanced.html](map_viewer_advanced.html) - Add new filter sections
- [enriched_data.json](enriched_data.json) - Already has KPI data

**For Custom Criteria Integration**:
- Create: `evaluate_custom_criteria.py` - Main evaluation script
- Modify: [full_update.sh](full_update.sh) - Add Step 6.5
- Modify: [custom_criteria.py](custom_criteria.py) - Complete unfinished criteria
- Modify: [parse_criteria.py](parse_criteria.py) - Already supports custom_score!

**For Advanced Features**:
- Create: `criteria_config.json` - User-configurable weights
- Create: `confidence_calculator.py` - Score confidence assessment
- Modify: [prompt.txt](prompt.txt) - Improve specificity

---

**Ready to implement? Start with the Quick Wins for immediate impact!** 🚀
