# How to Prevent Data Issues - Complete Guide

## Issues Identified

### 1. Missing Breadcrumbs (113 properties)
**Problem**: `extract_breadcrumbs.py` reads only from `extracted_property_urls.csv`, which contains 163 properties, while `analysis_output.csv` contains 186. Properties that appear only in `analysis_output.csv` never get a breadcrumb.

**Why This Happens**:
- Breadcrumb extraction runs only on the existing `extracted_property_urls.csv`
- It skips properties that already have breadcrumbs
- It never checks `analysis_output.csv` for new properties

**Prevention**: Run the `fix_missing_breadcrumbs.py` script
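
The gap between the two CSVs can be checked with a plain set difference on property IDs. The following is a minimal sketch, not the code of `fix_missing_breadcrumbs.py`; the `id` column name and the helper names `read_ids` and `find_missing` are assumptions made for illustration.

```python
import csv


def read_ids(lines, id_column="id"):
    """Collect the non-empty property IDs from CSV lines (or an open file).

    The column name "id" is an assumption; adjust it to the real header.
    """
    reader = csv.DictReader(lines)
    return {(row.get(id_column) or "").strip() for row in reader} - {""}


def find_missing(analysis_lines, extracted_lines, id_column="id"):
    """Return IDs present in the analysis CSV but absent from the extracted CSV."""
    known = read_ids(extracted_lines, id_column)
    return sorted(read_ids(analysis_lines, id_column) - known)
```

To use it on the real files, open `analysis_output.csv` and `extracted_property_urls.csv` with `open(path, newline="", encoding="utf-8")` and pass both file objects to `find_missing`.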

### 2. Missing Prices (ALL 186 properties)
**Problem**: Price data was never scraped into the CSVs.

**Why This Happens**:
- `extracted_property_urls.csv` has an empty `Prijs` column
- `parse_criteria.py` tries to merge prices but finds none
- Price filter in map viewer has nothing to filter

**Prevention**: Run the favorites scraper (`sync_favorites.py`), which includes price extraction
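
A quick way to confirm whether prices actually reached the CSV is to count the empty `Prijs` cells. This is a small sketch under the assumption that the file has a header row containing `Prijs`; the function name `count_empty` is hypothetical.

```python
import csv


def count_empty(lines, column="Prijs"):
    """Return (empty, total): rows with a blank value in `column`, and all rows."""
    empty = total = 0
    for row in csv.DictReader(lines):
        total += 1
        # A missing column, an empty string and whitespace all count as empty.
        if not (row.get(column) or "").strip():
            empty += 1
    return empty, total
```

If `empty` equals `total`, the scraper wrote no prices at all and the price filter in the map viewer will have nothing to work with.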

### 3. Wrong Cached Coordinates (Property 102754054)
**Problem**: Old coordinates from `enriched_data.json` are preserved when new geocoding fails.

**Why This Happens**:
- `parse_criteria.py`, lines 186-187, keeps the old lat/lon when the new values are NaN
- Properties missing from breadcrumb extraction keep old wrong coordinates

**Prevention**: Always validate breadcrumbs (`validate_breadcrumbs.py`) before geocoding
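
One way to make the fallback visible instead of silent is to return a status alongside the coordinates. The sketch below is not the code in `parse_criteria.py`; the function name `merge_coordinates` and the status labels are assumptions chosen for illustration.

```python
import math


def _is_valid(pair):
    """True when pair is a (lat, lon) tuple of real, non-NaN numbers."""
    return (
        pair is not None
        and len(pair) == 2
        and all(isinstance(v, (int, float)) and not math.isnan(v) for v in pair)
    )


def merge_coordinates(new, old):
    """Prefer freshly geocoded coordinates and flag any fallback to cached ones.

    Returns (lat, lon, status) where status is "fresh", "stale" or "missing".
    """
    if _is_valid(new):
        return new[0], new[1], "fresh"
    if _is_valid(old):
        # Cached values are kept, but the caller can now warn about them.
        return old[0], old[1], "stale"
    return None, None, "missing"
```

With this shape, a property such as 102754054 would come back as `"stale"` rather than being displayed as if its cached coordinates were current.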

## Automated Prevention Workflow

### Step 1: Data Collection
```bash
# Scrape favorites (includes prices)
python3 sync_favorites.py

# Extract breadcrumbs for ALL properties
python3 fix_missing_breadcrumbs.py
```

### Step 2: Validation
```bash
# Check for missing data
python3 validate_breadcrumbs.py
```

### Step 3: Geocoding
```bash
# Geocode using breadcrumbs
python3 geocode_with_breadcrumbs.py
```

### Step 4: Generate Output
```bash
# Create enriched_data.json with all data
python3 parse_criteria.py
```

### Step 5: Verify
```bash
# Final validation
python3 validate_breadcrumbs.py
```

## Quick Fix Commands

### Fix Missing Breadcrumbs Now:
```bash
python3 fix_missing_breadcrumbs.py
python3 geocode_with_breadcrumbs.py
python3 parse_criteria.py
```

### Fix Missing Prices:
```bash
# Re-scrape favorites to get prices
python3 sync_favorites.py

# Prices will be in extracted_property_urls.csv
# Then regenerate enriched data
python3 parse_criteria.py
```

### Verify Everything:
```bash
python3 validate_breadcrumbs.py
```

## Root Cause Analysis

### Why 102754054 Showed Paris Instead of Tribehou:

1. Property was NOT in `extracted_property_urls.csv` (no breadcrumb)
2. Old `enriched_data.json` had cached Paris coordinates (48.83, 2.38)
3. `parse_criteria.py` preserved old coordinates because new ones were missing
4. Map viewer displayed the cached wrong coordinates

### Solution Applied:
1. ✅ Added breadcrumb manually: "Frankrijk > Normandië > Manche > Tribehou"
2. ✅ Ran geocoding: Got correct coordinates (49.21, -1.24)
3. ✅ Ran `parse_criteria.py`: Updated `enriched_data.json`
4. ✅ Property now shows correctly in Tribehou, Normandy

## Prevention Checklist

Before viewing the map, always run:
- [ ] `python3 validate_breadcrumbs.py` - Check for missing data
- [ ] Fix any issues identified
- [ ] `python3 parse_criteria.py` - Regenerate enriched_data.json
- [ ] Verify map loads correctly

## Scripts Created

1. **validate_breadcrumbs.py** - Detects missing breadcrumbs and coordinates
2. **fix_missing_breadcrumbs.py** - Extracts breadcrumbs for missing properties

## Future Improvements

1. Make `extract_breadcrumbs.py` read from `analysis_output.csv` instead of `extracted_property_urls.csv`
2. Add price scraping to breadcrumb extraction
3. Add validation step to `parse_criteria.py` to warn about missing data
4. Create a single `sync_all_data.py` script that runs the full workflow
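
A combined runner could look roughly like the sketch below, which chains the five workflow steps in order and stops at the first failure. It assumes every script sits in the current directory and signals failure with a non-zero exit code, which this guide does not confirm; the `run_pipeline` name and the injectable `runner` parameter are illustrative.

```python
import subprocess
import sys

# Script order taken from the Automated Prevention Workflow section.
PIPELINE = [
    "sync_favorites.py",
    "fix_missing_breadcrumbs.py",
    "validate_breadcrumbs.py",
    "geocode_with_breadcrumbs.py",
    "parse_criteria.py",
    "validate_breadcrumbs.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with the current interpreter; return how many ran.

    With the default runner, check=True raises CalledProcessError as soon
    as a script exits with a non-zero status, so later steps are skipped.
    """
    for script in scripts:
        print(f"==> {script}")
        runner([sys.executable, script], check=True)
    return len(scripts)
```

Calling `run_pipeline()` from the project directory would run the whole chain; the `runner` argument exists only so the order can be checked without executing the real scripts.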
