# Breadcrumb Extraction & Geocoding Example

## Overview
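Breadcrumb extraction reads the location hierarchy (country, region, province, city) from each Properstar.nl listing page and uses it to geocode listings that have no coordinates in their metadata. In the run documented here, this raised the number of properties with coordinates from 3 to 73 out of 186.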

## Before vs After

**Before breadcrumb extraction:**
- Only 3 properties had coordinates (from Properstar metadata)
- 183 properties had no location data

**After breadcrumb extraction:**
- 73 properties now have coordinates (39% of 186): the 3 from metadata plus 70 geocoded from breadcrumbs
- 67 geocoded directly from the breadcrumb
- 2 geocoded from the breadcrumb with a 0.70 threshold
- 1 geocoded via the `breadcrumb_fixed` method

## Example Breadcrumbs

### Successful Geocoding
```
URL: https://www.properstar.nl/listing/97922528
Breadcrumb: Italië > Calabrië > Cosenza > Bisignano > Agrarische exploitatie
Geocoded to: Bisignano, Cosenza, Calabria, 87043, Italia
Coordinates: 39.5001974, 16.2744614
```

### Breadcrumb Structure
```
Country > Region > Province > City > Property Type
Example: Frankrijk > Nouvelle-Aquitaine > Charente-Maritime > Saintes > Huis
```
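
To make the structure concrete, here is a minimal sketch of splitting a breadcrumb string into named levels. The `Breadcrumb` type and `parse_breadcrumb` function are hypothetical names, not taken from the project scripts, and the sketch assumes the last element is always the property type, with the location levels before it in the order shown above.

```python
from typing import NamedTuple, Optional


class Breadcrumb(NamedTuple):
    country: Optional[str]
    region: Optional[str]
    province: Optional[str]
    city: Optional[str]
    property_type: Optional[str]


def parse_breadcrumb(text: str) -> Breadcrumb:
    """Split 'Country > Region > Province > City > Property Type' into fields.

    Assumption: the last element is the property type; any missing
    location levels are returned as None.
    """
    parts = [p.strip() for p in text.split(">") if p.strip()]
    if not parts:
        return Breadcrumb(None, None, None, None, None)
    *places, property_type = parts
    # Pad (or truncate) the location levels to exactly four entries.
    places = places[:4] + [None] * (4 - len(places[:4]))
    return Breadcrumb(*places, property_type)


crumb = parse_breadcrumb(
    "Frankrijk > Nouvelle-Aquitaine > Charente-Maritime > Saintes > Huis"
)
print(crumb.city, crumb.country)  # Saintes Frankrijk
```

A shorter breadcrumb such as `Italië > Calabrië > Agrarische exploitatie` yields `city=None` under this assumption, which is the kind of listing that cannot be geocoded precisely.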

## How It Works

### 1. Extract Breadcrumbs
```bash
python3 extract_breadcrumbs.py
```
Scrapes breadcrumb navigation from each property page and saves to `extracted_property_urls.csv`.
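
The extraction step can be illustrated with a minimal sketch using only the standard library. The HTML below is invented for illustration: the real Properstar markup is not documented here, so the `<ol class="breadcrumb">` structure, the `BreadcrumbParser` class, and the `extract_breadcrumb` function are all assumptions and may differ from what `extract_breadcrumbs.py` actually does.

```python
from html.parser import HTMLParser


class BreadcrumbParser(HTMLParser):
    """Collects the text of each <li> inside an <ol class="breadcrumb">."""

    def __init__(self):
        super().__init__()
        self.in_list = False
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ol" and "breadcrumb" in classes:
            self.in_list = True
        elif tag == "li" and self.in_list:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "ol":
            self.in_list = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data


def extract_breadcrumb(html: str) -> str:
    """Return the breadcrumb as 'A > B > C', or '' if none is found."""
    parser = BreadcrumbParser()
    parser.feed(html)
    return " > ".join(item.strip() for item in parser.items if item.strip())


# Hypothetical markup, not copied from a real listing page.
SAMPLE_HTML = """
<ol class="breadcrumb">
  <li><a href="#">Italië</a></li>
  <li><a href="#">Calabrië</a></li>
  <li><a href="#">Cosenza</a></li>
  <li><a href="#">Bisignano</a></li>
  <li>Agrarische exploitatie</li>
</ol>
"""
print(extract_breadcrumb(SAMPLE_HTML))
```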

### 2. Geocode with Breadcrumbs
```bash
python3 geocode_with_breadcrumbs.py
```
Geocodes each property from its breadcrumb, falling back to progressively less specific queries:
- City, province, and country
- City and country only
- Province and country
- Fuzzy matching, if needed
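
The fallback order above can be sketched as a list of candidate queries, tried in sequence. This is an illustration under stated assumptions, not the code in `geocode_with_breadcrumbs.py`: `candidate_queries` and `is_fuzzy_match` are hypothetical helpers, and `difflib.SequenceMatcher` stands in for whatever similarity measure the script really applies. The default of 0.70 mirrors the threshold mentioned in the results.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def candidate_queries(
    country: str, province: Optional[str] = None, city: Optional[str] = None
) -> List[str]:
    """Build geocoder queries from most to least specific."""
    candidates = [
        (city, province, country),  # full: city, province, country
        (city, country),            # city and country only
        (province, country),        # province and country
    ]
    queries: List[str] = []
    for fields in candidates:
        if all(fields):  # skip combinations with a missing level
            query = ", ".join(fields)
            if query not in queries:
                queries.append(query)
    return queries


def is_fuzzy_match(expected: str, returned: str, threshold: float = 0.70) -> bool:
    """Accept a geocoder result whose name is similar enough to the breadcrumb."""
    ratio = SequenceMatcher(None, expected.lower(), returned.lower()).ratio()
    return ratio >= threshold


print(candidate_queries("Italië", province="Cosenza", city="Bisignano"))
# ['Bisignano, Cosenza, Italië', 'Bisignano, Italië', 'Cosenza, Italië']
print(is_fuzzy_match("Calabrië", "Calabria"))  # True
```

Fuzzy comparison helps here because the breadcrumbs are in Dutch ("Calabrië") while the geocoder may answer in the local language ("Calabria").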

### 3. View on Map
```bash
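# Serve the project directory (runs in the foreground until stopped)
python3 -m http.server 8000
# From a second terminal (`open` is macOS; on Linux use `xdg-open`)
open http://localhost:8000/map_viewer_advanced.html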
```

## Statistics

From 186 properties:
- **162 (87%)** have breadcrumbs extracted
- **73 (39%)** successfully geocoded
- **113 (61%)** still need location data
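
As a quick check on these figures, the arithmetic can be reproduced in a few lines. The dictionary keys are hypothetical labels, and the sketch assumes the 73 geocoded properties are the 3 with Properstar metadata coordinates plus the 70 geocoded from breadcrumbs (67 + 2 + 1).

```python
TOTAL = 186
WITH_BREADCRUMBS = 162

# Assumed breakdown of the properties that have coordinates.
coords_by_source = {
    "properstar_metadata": 3,
    "breadcrumb": 67,
    "breadcrumb_0.70_threshold": 2,
    "breadcrumb_fixed": 1,
}


def pct(count: int, total: int = TOTAL) -> int:
    """Percentage of the total, rounded to the nearest whole number."""
    return round(100 * count / total)


geocoded = sum(coords_by_source.values())
missing = TOTAL - geocoded

print(f"breadcrumbs extracted: {WITH_BREADCRUMBS} ({pct(WITH_BREADCRUMBS)}%)")  # 162 (87%)
print(f"geocoded: {geocoded} ({pct(geocoded)}%)")                                # 73 (39%)
print(f"still missing: {missing} ({pct(missing)}%)")                             # 113 (61%)
```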

## Why Some Properties Fail

Properties that could not be geocoded typically have:
1. No breadcrumb data on Properstar (generic listings)
2. Only vague location information (just a country or region)
3. A property type where a place name is expected (e.g. "Agrarische exploitatie" with no city level)
4. New listings that are not yet fully populated

## Integration with Pipeline

Breadcrumb extraction and geocoding run as steps 2 and 3 of the pipeline, between scraping favorites and analyzing properties:

```bash
# 1. Scrape favorites
python3 sync_favorites.py

# 2. Extract breadcrumbs (NEW!)
python3 extract_breadcrumbs.py

# 3. Geocode with breadcrumbs (NEW!)
python3 geocode_with_breadcrumbs.py

# 4. Analyze properties
python3 analyze_from_urls.py
python3 custom_criteria.py

# 5. View on map (the server runs in the foreground; open the URL from a second terminal)
python3 -m http.server 8000
open http://localhost:8000/map_viewer_advanced.html
```

## Benefits

1. **Better Visualization**: More properties visible on the map
2. **Location-Based Filtering**: Filter by region or country
3. **Climate Data**: Enable climate analysis for more properties
4. **Distance Calculations**: Calculate distances to airports and cities
5. **Market Analysis**: Understand geographic distribution of opportunities

## Future Improvements

Potential enhancements:
- Use alternative geocoding services (Google Maps, Mapbox)
- Implement caching to avoid re-geocoding
- Add manual coordinate override for specific properties
- Parse property descriptions for additional location clues
- Use AI to extract location from property text
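
The caching idea could look roughly like the sketch below. It is not part of the current scripts: `cached_geocode`, the JSON cache file, and the injected `geocode` callable are all hypothetical, chosen so the example runs without any network access or third-party library.

```python
import json
from pathlib import Path
from typing import Callable, Optional, Tuple

Coords = Optional[Tuple[float, float]]


def cached_geocode(
    query: str, geocode: Callable[[str], Coords], cache_path: str
) -> Coords:
    """Return cached coordinates for `query`, calling `geocode` only on a miss."""
    cache_file = Path(cache_path)
    cache = {}
    if cache_file.exists():
        cache = json.loads(cache_file.read_text(encoding="utf-8"))

    if query in cache:
        hit = cache[query]
        # Failed lookups are cached as null so they are not retried either.
        return tuple(hit) if hit is not None else None

    result = geocode(query)
    cache[query] = list(result) if result is not None else None
    cache_file.write_text(
        json.dumps(cache, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return result
```

Passing the geocoder in as a function keeps the cache independent of the geocoding service, so switching services would not require changing the cache logic. Whether failed lookups should be cached is a design choice; caching them saves requests but means a listing that gains location data later needs its cache entry cleared.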

---

**Result:** From 3 to 73 properties with coordinates, a roughly 24x increase! 🎉
