# GPT Analysis Quality Improvements

## Problem Identified

**Example**: Property https://www.properstar.nl/listing/71224777 (€11,000 land plot with olive trees)
- Scored 5/5 for Guest Accommodation
- Scored 4/5 for Rental Units
- **BUT: No building exists** (just bare land)

### Root Causes

1. **Incomplete property descriptions**: The summary field is truncated and omits critical information
2. **Vague prompt instructions**: The prompt did not explicitly require existing buildings for the Guest/Rental/Workshop criteria
3. **GPT assumptions**: The model scored the property's potential rather than its current state
4. **No KPI validation**: GPT scores are not cross-checked against the extracted KPIs (bedrooms, building_size_m2)

## Solutions Implemented

### 1. Improved Prompt (prompt.txt)

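**Added explicit requirements** (the prompt is in Dutch; in short: read the description carefully, give low scores (1-2) for building-dependent criteria when no existing building is mentioned, and score 1 for Guest Accommodation, Workshop/Food Processing, and Independent Rental Units in that case):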

```
BELANGRIJK: Lees de beschrijving ZORGVULDIG. Als er GEEN bestaande gebouwen of woningen worden genoemd, geef dan lage scores (1-2) voor criteria die gebouwen vereisen.

2. Gastenverblijf – VEREIST BESTAANDE WONING/GEBOUW met slaapkamers.
   Als er geen gebouw is of wordt genoemd, score = 1.

3. Werkplaats/voedselverwerking – VEREIST BESTAANDE GEBOUWEN.
   Als er geen gebouwen zijn of worden genoemd, score = 1.

4. Zelfstandige verhuureenheden – VEREIST BESTAANDE WONING/GEBOUW met meerdere slaapkamers.
   Als er geen gebouw is of wordt genoemd, score = 1.
```

### 2. Recommended: Post-Processing Validation

**Create validation rules** to override GPT scores based on KPIs:

```python
def validate_scores(property_data, gpt_scores):
    """Cap GPT scores that contradict the extracted KPIs.

    Modifies gpt_scores in place and returns it. A criterion that GPT
    did not score is treated as 1.
    """

    # No building = no guest/rental/workshop
    if not property_data.get('building_size_m2') and not property_data.get('bedrooms'):
        gpt_scores['guest_accommodation'] = min(gpt_scores.get('guest_accommodation', 1), 1)
        gpt_scores['rental_units'] = min(gpt_scores.get('rental_units', 1), 1)
        gpt_scores['workshop'] = min(gpt_scores.get('workshop', 1), 2)

    # No bedrooms = no guest/rental (a building may still exist)
    if not property_data.get('bedrooms'):
        gpt_scores['guest_accommodation'] = min(gpt_scores.get('guest_accommodation', 1), 1)
        gpt_scores['rental_units'] = min(gpt_scores.get('rental_units', 1), 2)

    # Small land = low market garden score.
    # `or 0` handles KPIs that are present but null (None), which would
    # otherwise raise a TypeError in the comparison below.
    land_size = property_data.get('land_size_m2') or 0
    if 0 < land_size < 2500:
        gpt_scores['market_garden'] = min(gpt_scores.get('market_garden', 1), 2)

    return gpt_scores
```
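The rules above overwrite scores silently, which makes it hard to tell later which values came from GPT and which were capped. A minimal sketch of a variant that leaves the input untouched and records every override is shown here. The function name `cap_scores_with_audit` and the `NO_BUILDING_CAPS` table are hypothetical and not part of the existing code; only the "no building" rule is covered.

```python
from typing import Any

# Hypothetical cap table: maximum allowed score per criterion when the
# listing has neither building_size_m2 nor bedrooms.
NO_BUILDING_CAPS = {"guest_accommodation": 1, "rental_units": 1, "workshop": 2}


def cap_scores_with_audit(property_data: dict[str, Any], gpt_scores: dict[str, int]):
    """Return (new_scores, overrides) without mutating the inputs.

    overrides is a list of (criterion, old_score, new_score) tuples.
    """
    new_scores = dict(gpt_scores)
    overrides = []
    has_building = bool(
        property_data.get("building_size_m2") or property_data.get("bedrooms")
    )
    if not has_building:
        for criterion, cap in NO_BUILDING_CAPS.items():
            old = new_scores.get(criterion)
            # Only touch criteria that GPT actually scored above the cap.
            if old is not None and old > cap:
                new_scores[criterion] = cap
                overrides.append((criterion, old, cap))
    return new_scores, overrides


# KPIs and scores taken from the example listing in this document.
listing = {"building_size_m2": None, "bedrooms": None}
scores = {"guest_accommodation": 5, "rental_units": 4, "workshop": 3}

validated, overrides = cap_scores_with_audit(listing, scores)
for criterion, old, new in overrides:
    print(f"{criterion}: {old} -> {new}")
```

The override list could be stored next to the scores so that a reviewer can see what was changed and why.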

### 3. Recommended: Better Property Scraping

**Fetch the full property page** instead of only the truncated summary:

```python
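from playwright.async_api import async_playwright


async def scrape_full_property_page(url):
    """Scrape the complete property description from the listing page."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            await page.goto(url)

            # Get the full description (not the truncated summary).
            # The selector is site-specific; if it does not match,
            # an empty string is returned.
            description = await page.query_selector('.property-description')
            full_text = await description.inner_text() if description else ""

            return full_text
        finally:
            # Always release the browser, even if navigation fails
            await browser.close()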
```
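Because the scraper returns an empty string when the selector does not match, the caller needs a rule for choosing between the scraped text and the existing summary. The sketch below is one way to do that. Both helper names are hypothetical, and the ellipsis check is only an assumed heuristic for how a truncated summary looks; the real summary field may be cut off differently.

```python
def looks_truncated(summary: str) -> bool:
    """Assumed heuristic: a summary ending in an ellipsis was cut off."""
    stripped = (summary or "").rstrip()
    return stripped.endswith("...") or stripped.endswith("…")


def choose_description(summary: str, full_text: str) -> str:
    """Prefer the scraped text unless it is empty or shorter than the summary."""
    full_text = (full_text or "").strip()
    summary = (summary or "").strip()
    if full_text and len(full_text) >= len(summary):
        return full_text
    return summary


# Illustrative strings, not real listing data.
summary = "Plot of land with olive..."
scraped = "Plot of land with olive trees. No buildings on the plot."

print(looks_truncated(summary))
print(choose_description(summary, scraped))
print(choose_description(summary, ""))
```

Falling back to the summary keeps the analysis running when the page layout changes, at the cost of the same incomplete input that caused the original problem.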

## Testing

### Before Fix:
```json
{
  "url": "https://www.properstar.nl/listing/71224777",
  "building_size_m2": null,
  "bedrooms": null,
  "criteria": {
    "guest_accommodation": 5,  ❌ NO BUILDING
    "rental_units": 4,          ❌ NO BUILDING
    "workshop": 3               ❌ NO BUILDING
  }
}
```

### After Fix (Expected):
```json
{
  "url": "https://www.properstar.nl/listing/71224777",
  "building_size_m2": null,
  "bedrooms": null,
  "criteria": {
    "guest_accommodation": 1,  ✅ Correctly low
    "rental_units": 1,          ✅ Correctly low
    "workshop": 1,              ✅ Correctly low
    "market_garden": 4          ✅ Good for land
  }
}
```

## Implementation Priority

1. ✅ **DONE**: Update prompt.txt with explicit building requirements
2. **HIGH**: Add post-processing validation in custom_criteria.py
3. **MEDIUM**: Improve property scraping to get full descriptions
4. **LOW**: Add a warning in the UI when KPIs are missing but scores are high

## Next Steps

1. Re-run GPT analysis on all properties with updated prompt
2. Implement validation logic in analyze_from_urls.py
3. Add "Needs Review" flag for properties with score/KPI mismatches
4. Show validation warnings in Criteria Manager UI
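
The "Needs Review" flag from step 3 could be derived without changing any scores, as in the following minimal sketch. The function name `review_warnings`, the `HIGH_SCORE` threshold of 3, and the warning wording are assumptions for illustration; the record layout follows the JSON shown under Testing.

```python
# Criteria that only make sense when a building exists.
BUILDING_CRITERIA = ("guest_accommodation", "rental_units", "workshop")
HIGH_SCORE = 3  # assumed threshold for "suspiciously high"


def review_warnings(record: dict) -> list[str]:
    """Return one warning per building-dependent score that contradicts the KPIs."""
    warnings = []
    has_building = bool(record.get("building_size_m2") or record.get("bedrooms"))
    criteria = record.get("criteria") or {}
    if not has_building:
        for name in BUILDING_CRITERIA:
            score = criteria.get(name)
            if score is not None and score >= HIGH_SCORE:
                warnings.append(
                    f"{name} scored {score} but no building KPIs were extracted"
                )
    return warnings


# The "Before Fix" record from the Testing section.
record = {
    "url": "https://www.properstar.nl/listing/71224777",
    "building_size_m2": None,
    "bedrooms": None,
    "criteria": {"guest_accommodation": 5, "rental_units": 4, "workshop": 3},
}

warnings = review_warnings(record)
needs_review = bool(warnings)
for message in warnings:
    print(message)
```

The same list of messages could feed the validation warnings in the Criteria Manager UI mentioned in step 4.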