# Property Availability Checker - Complete Guide

## Overview

The availability checker automatically verifies whether properties are still listed and available, marks unavailable ones as `Removed` so they can be left out of your analysis, and can permanently remove them on request.

## Features

- ✅ **Automated checking** - Checks all properties for availability
- ✅ **Smart caching** - Skips recently checked properties (configurable)
- ✅ **Detailed reporting** - Generates JSON reports with statistics
- ✅ **Backup system** - Creates backups before removing properties
- ✅ **Rate limiting** - Respects server resources (2 seconds between requests)
- ✅ **Multiple detection methods**:
  - HTTP status codes (404, redirects)
  - Page content analysis (removed/sold indicators)
  - Essential elements check (title, price)

## Installation

```bash
cd /Users/jonathan/SynologyDrive/Since\ Today/PROJECTEN/farmmatch/scraper

# Install required packages
pip3 install beautifulsoup4 requests schedule
```

## Usage

### 1. Manual Check (Recommended First Time)

```bash
# Check all properties (skips those checked in the last 24 hours)
python3 check_availability.py

# Force check ALL properties (ignore recent checks)
python3 check_availability.py --force
```

**What happens:**
- Checks each property URL
- Marks unavailable properties as `status: "Removed"`
- Saves results to `enriched_data.json`
- Generates `availability_check_report.json`
- Properties marked "Removed" are hidden in the map viewer when "Show Removed" is unchecked

### 2. Permanently Remove Unavailable Properties

```bash
# Review and permanently delete properties marked as "Removed"
python3 check_availability.py --remove
```

**What happens:**
- Creates backup: `enriched_data_backup_YYYYMMDD_HHMMSS.json`
- Shows the number of properties to be removed
- Asks for confirmation
- Permanently removes from `enriched_data.json`

### 3. Automated Scheduling

#### Option A: Python Scheduler (Recommended)

```bash
# Run daily at 3 AM (default)
python3 auto_availability_check.py

# Run daily at specific time (24-hour format)
python3 auto_availability_check.py "14:30"

# Keep running in the background (survives closing the terminal)
nohup python3 auto_availability_check.py > availability_checker.log 2>&1 &
```
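`auto_availability_check.py` is not reproduced in this guide. The `schedule` package it depends on exposes calls such as `schedule.every().day.at("03:00").do(job)`; the stdlib-only sketch below shows roughly what that involves: validating the `"HH:MM"` argument and working out how long to sleep until the next run. Both function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def parse_run_time(value: str = "03:00") -> Tuple[int, int]:
    """Validate a 24-hour "HH:MM" string and return (hour, minute).

    Raises ValueError for input such as "25:00" or "3pm".
    """
    parsed = datetime.strptime(value, "%H:%M")
    return parsed.hour, parsed.minute


def seconds_until_next_run(value: str, now: datetime) -> float:
    """Seconds from `now` until the next daily occurrence of `value`."""
    hour, minute = parse_run_time(value)
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed (or is exactly now): run tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()


if __name__ == "__main__":
    now = datetime(2025, 10, 8, 19, 30)
    print(seconds_until_next_run("03:00", now))  # 27000.0 (7.5 hours)
    print(seconds_until_next_run("14:30", now))  # 68400.0 (19 hours)
```

Naive local datetimes ignore daylight-saving changes, so on transition days the computed delay can be an hour off; recomputing it after every run keeps the error from accumulating.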

#### Option B: macOS Launchd (System-Level Scheduling)

Create the file `~/Library/LaunchAgents/com.farmmatch.availability.plist`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.farmmatch.availability</string>

    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/Users/jonathan/SynologyDrive/Since Today/PROJECTEN/farmmatch/scraper/check_availability.py</string>
    </array>

    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>

    <key>StandardOutPath</key>
    <string>/Users/jonathan/SynologyDrive/Since Today/PROJECTEN/farmmatch/scraper/availability_checker.log</string>

    <key>StandardErrorPath</key>
    <string>/Users/jonathan/SynologyDrive/Since Today/PROJECTEN/farmmatch/scraper/availability_checker_error.log</string>

    <key>WorkingDirectory</key>
    <string>/Users/jonathan/SynologyDrive/Since Today/PROJECTEN/farmmatch/scraper</string>
</dict>
</plist>
```
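If you prefer not to hand-edit XML, the same job can be generated with Python's standard `plistlib` module. This is an optional helper, not part of the scraper; the paths are copied from the plist above, and the function name `build_launchd_job` is hypothetical.

```python
import plistlib

# Paths copied from the plist above; adjust if the scraper lives elsewhere.
SCRAPER_DIR = "/Users/jonathan/SynologyDrive/Since Today/PROJECTEN/farmmatch/scraper"


def build_launchd_job(hour: int = 3, minute: int = 0) -> dict:
    """Return the launchd job definition as a plain dict."""
    return {
        "Label": "com.farmmatch.availability",
        "ProgramArguments": ["/usr/bin/python3", f"{SCRAPER_DIR}/check_availability.py"],
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
        "StandardOutPath": f"{SCRAPER_DIR}/availability_checker.log",
        "StandardErrorPath": f"{SCRAPER_DIR}/availability_checker_error.log",
        "WorkingDirectory": SCRAPER_DIR,
    }


if __name__ == "__main__":
    # plistlib.dumps produces the XML plist format by default.
    print(plistlib.dumps(build_launchd_job()).decode("utf-8"))
```

Save it as, for example, `make_plist.py` (a hypothetical name), run `python3 make_plist.py > ~/Library/LaunchAgents/com.farmmatch.availability.plist`, and check the result with `plutil -lint` before loading it. `plistlib` sorts dictionary keys by default, which does not matter to launchd.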

Load the job:
```bash
launchctl load ~/Library/LaunchAgents/com.farmmatch.availability.plist

# Check if running
launchctl list | grep farmmatch

# Unload (stop) the job
launchctl unload ~/Library/LaunchAgents/com.farmmatch.availability.plist
```

#### Option C: Cron (Alternative)

```bash
# Edit crontab
crontab -e

# Add this line (runs daily at 3 AM)
0 3 * * * cd /Users/jonathan/SynologyDrive/Since\ Today/PROJECTEN/farmmatch/scraper && /usr/bin/python3 check_availability.py >> availability_checker.log 2>&1
```

## Detection Logic

The checker identifies unavailable properties through:

### 1. HTTP Status Codes
- **404** → Property removed
- **5xx** → Server error (assumes available, retry later)
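The status-code rules translate into a small first step of the check. This is a sketch, not the code in `check_availability.py`: the 5xx reason string is made up, and how other codes such as 403 or 410 are handled is not documented, so here they simply fall through to the later checks.

```python
from typing import Optional, Tuple


def classify_status_code(status_code: int) -> Optional[Tuple[str, str]]:
    """Map an HTTP status code to (status, reason), or None if inconclusive."""
    if status_code == 404:
        return "Removed", "Page not found (404)"
    if 500 <= status_code <= 599:
        # Server trouble says nothing about the listing: stay conservative.
        return "Active", f"Server error ({status_code}); assuming available"
    # Other codes (200, 403, ...; 3xx are normally followed by requests)
    # fall through to the redirect, keyword, and essential-element checks.
    return None
```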

### 2. URL Redirects
- Redirect to search page → Property removed
- Redirect to homepage → Property removed
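Redirect detection needs the URL you asked for and the URL you ended up on; with `requests`, the latter is `response.url`, and `response.history` is non-empty when redirects were followed. The sketch below assumes that search pages have `search` or `zoeken` in their path; those hints and the function name are assumptions, since the target sites' URL schemes are not documented here.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical path fragments identifying a search page; adjust per site.
SEARCH_PATH_HINTS = ("search", "zoeken")


def redirect_reason(requested_url: str, final_url: str) -> Optional[str]:
    """Return a removal reason if we were redirected away from the listing."""
    requested = urlparse(requested_url)
    final = urlparse(final_url)
    # A trailing-slash difference on the same host is not a real redirect.
    if (requested.netloc, requested.path.rstrip("/")) == (final.netloc, final.path.rstrip("/")):
        return None
    path = final.path.lower()
    if path in ("", "/"):
        return "Redirected to homepage"
    if any(hint in path for hint in SEARCH_PATH_HINTS):
        return "Redirected to search page"
    return None  # redirected somewhere else: inconclusive
```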

### 3. Page Content Analysis
Searches for keywords:
- "not available", "niet beschikbaar"
- "no longer available", "niet meer beschikbaar"
- "sold", "verkocht"
- "under offer", "in optie"
- "listing removed", "verwijderd"
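A plain substring test on these keywords is easy to trip: `"sold" in text` is also true for "unsold" or "resold". This sketch matches whole words and phrases with a case-insensitive regular expression; whether `check_availability.py` does the same is not stated in this guide, and `find_unavailable_indicator` is a hypothetical name.

```python
import re
from typing import Optional

# Keywords from the list above (English and Dutch).
UNAVAILABLE_INDICATORS = [
    "not available", "niet beschikbaar",
    "no longer available", "niet meer beschikbaar",
    "sold", "verkocht",
    "under offer", "in optie",
    "listing removed", "verwijderd",
]

# \b word boundaries stop "sold" from matching inside e.g. "unsold".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in UNAVAILABLE_INDICATORS) + r")\b",
    re.IGNORECASE,
)


def find_unavailable_indicator(page_text: str) -> Optional[str]:
    """Return the first matching keyword (lower-cased), or None."""
    match = _PATTERN.search(page_text)
    return match.group(1).lower() if match else None
```

Words like "sold" or "verwijderd" can also appear in navigation menus or cookie banners on pages that are still live, so restricting the scan to the listing's main content reduces false positives.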

### 4. Essential Elements
- Missing title → Likely removed
- Missing price indicators → Possibly removed
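The essential-elements check can be sketched with the standard library's `html.parser` instead of BeautifulSoup. The price pattern (a euro amount, `EUR`, "prijs", or "price") is an assumption, as is treating a missing price as only a hint: in keeping with the checker's assume-available-when-uncertain approach, the property stays `Active` and only the reason records the doubt. All names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical price indicators: a euro amount, "EUR 123", or the words
# "prijs"/"price" ("prijs" also matches e.g. "vraagprijs").
PRICE_PATTERN = re.compile(r"€\s*\d|\bEUR\s*\d|prijs|\bprice\b", re.IGNORECASE)


class _PageScanner(HTMLParser):
    """Collects the <title> text and all text content of a page."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns "&euro;" into "€"
        self._in_title = False
        self.title = ""
        self.text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        self.text_parts.append(data)


def check_essential_elements(page_html: str) -> Optional[Tuple[str, str]]:
    """Return (status, reason) if essential elements are missing, else None."""
    scanner = _PageScanner()
    scanner.feed(page_html)
    scanner.close()
    if not scanner.title.strip():
        return "Removed", "Missing page title"
    if not PRICE_PATTERN.search(" ".join(scanner.text_parts)):
        # Only "possibly removed": stay conservative and keep it active.
        return "Active", "No price indicator found (possibly removed)"
    return None
```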

## Property Status Fields

After checking, each property gets:

```jsonc
{
  "url": "https://...",
  "status": "Active",  // or "Removed"
  "availability_last_checked": "2025-10-08T19:30:00",
  "availability_status_code": 200,
  "availability_reason": "Property appears to be active",
  "removed_at": "2025-10-08T19:30:00",  // only if removed
  "removal_reason": "Page not found (404)"  // only if removed
}
```
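A check result can be written back onto a property record using these field names. This is a sketch: the field names come from the example above, but keeping the first `removed_at` timestamp on repeat checks, and leaving the removal fields in place if a property reappears, are assumptions. `apply_check_result` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def apply_check_result(prop: Dict[str, Any], status: str, reason: str,
                       status_code: Optional[int],
                       now: Optional[datetime] = None) -> Dict[str, Any]:
    """Record one availability check on a property dict (modified in place)."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    prop["availability_last_checked"] = stamp
    prop["availability_status_code"] = status_code
    prop["availability_reason"] = reason
    if status == "Removed" and prop.get("status") != "Removed":
        # First time this property is seen as unavailable: stamp the removal.
        prop["removed_at"] = stamp
        prop["removal_reason"] = reason
    prop["status"] = status
    return prop
```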

## Reports

### availability_check_report.json

```json
{
  "timestamp": "2025-10-08T19:30:00",
  "total_properties": 186,
  "checked": 150,
  "skipped": 36,
  "still_available": 140,
  "newly_unavailable": 10,
  "already_unavailable": 0,
  "availability_rate": "93.3%"
}
```
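The sample numbers are internally consistent (150 checked + 36 skipped = 186; 140 + 10 = 150), and 140 / 150 gives the 93.3% rate, which suggests the rate is computed over checked properties only. The sketch below rebuilds the report on that inferred basis; `build_report` is a hypothetical helper.

```python
from datetime import datetime
from typing import Dict, Optional


def build_report(total: int, checked: int, still_available: int,
                 newly_unavailable: int, already_unavailable: int = 0,
                 now: Optional[datetime] = None) -> Dict[str, object]:
    """Assemble report fields; rate = still_available / checked (inferred)."""
    rate = (still_available / checked * 100) if checked else 0.0
    return {
        "timestamp": (now or datetime.now()).isoformat(timespec="seconds"),
        "total_properties": total,
        "checked": checked,
        "skipped": total - checked,
        "still_available": still_available,
        "newly_unavailable": newly_unavailable,
        "already_unavailable": already_unavailable,
        "availability_rate": f"{rate:.1f}%",
    }
```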

## Best Practices

### Recommended Schedule

1. **Daily checks** (3 AM): `auto_availability_check.py`
2. **Weekly cleanup** (Sunday): `check_availability.py --remove`
3. **Monthly full check**: `check_availability.py --force`

### Performance Considerations

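- **Rate limiting**: 2 seconds between requests, i.e. at most ~1,800 properties/hour (3,600 s ÷ 2 s); real throughput is lower because each request's own response time comes on top of the delay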
- **Timeout**: 10 seconds per property
- **Smart caching**: Skips properties checked in last 24 hours
- **Incremental saves**: Saves progress every 10 properties
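Saving every 10 properties limits how much work an interrupted run loses, but a crash in the middle of writing `enriched_data.json` could still leave it truncated. A common remedy, sketched here as a suggestion rather than a description of the current script, is to write to a temporary file in the same directory and swap it into place with `os.replace`, which is atomic when both paths are on the same filesystem. `save_atomically` is a hypothetical name.

```python
import json
import os
import tempfile
from typing import Any


def save_atomically(data: Any, path: str = "enriched_data.json") -> None:
    """Write `data` as JSON to `path` without leaving a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so os.replace is a rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, ensure_ascii=False, indent=2)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error.
        os.unlink(tmp_path)
        raise
```

In the check loop, call `save_atomically(properties)` whenever the loop counter is a multiple of 10 and once more at the end. The `indent=2` layout is an assumption; match whatever formatting the existing file uses.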

### Workflow Integration

```bash
#!/bin/bash
# Daily automation script

cd /Users/jonathan/SynologyDrive/Since\ Today/PROJECTEN/farmmatch/scraper || exit 1

# 1. Check availability
python3 check_availability.py

# 2. If Sunday (date +%u prints 7 on Sunday), clean up;
#    piping "y" answers the --remove confirmation prompt
if [ "$(date +%u)" -eq 7 ]; then
    echo "y" | python3 check_availability.py --remove
fi

# 3. Update map viewer (properties are auto-filtered)
# Map viewer already hides properties with status="Removed"
```

## Troubleshooting

### Problem: Too many false positives (properties marked unavailable when they're not)

**Solution**: The checker is conservative and assumes availability if uncertain. Check these:
- Is the website blocking requests? (for example, rejecting requests that send a missing or default User-Agent header)
- Are there temporary server errors? (5xx errors are treated as "available")
- Check `availability_reason` field for details

### Problem: Properties not being removed

**Possible causes**:
1. Not running with the `--remove` flag (properties are only marked, not deleted)
2. Properties recently checked (use `--force` to recheck)
3. Website structure changed (detection keywords need updating)

### Problem: Script timing out

**Solution**:
- Increase timeout in code: `timeout=10` → `timeout=20`
- If a long run is interrupted, rerun it without `--force`: progress is saved every 10 properties, and recently checked properties are skipped
- Run with `--force` less frequently

## Customization

### Adjust Check Frequency

```python
# In check_availability.py, line ~200
recent_threshold_hours=24  # Change to 12, 48, 72, etc.
```
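The `recent_threshold_hours` setting drives the skip decision. A sketch of that decision, assuming `availability_last_checked` holds an ISO-8601 timestamp as in the status-field example; `needs_check` is a hypothetical name and the real code around line ~200 may differ.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def needs_check(prop: Dict[str, Any], recent_threshold_hours: float = 24,
                force: bool = False, now: Optional[datetime] = None) -> bool:
    """Decide whether a property is due for an availability check."""
    if force:
        return True  # --force ignores previous checks
    last = prop.get("availability_last_checked")
    if not last:
        return True  # never checked before
    age = (now or datetime.now()) - datetime.fromisoformat(last)
    return age >= timedelta(hours=recent_threshold_hours)
```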

### Add Custom Detection Keywords

```python
# In check_availability.py, line ~52
unavailable_indicators = [
    'not available',
    'your custom keyword here',
    # Add more...
]
```

### Change Rate Limiting

```python
# In check_availability.py, line ~267
time.sleep(2)  # Change to 1, 3, 5, etc. seconds
```
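A fixed `time.sleep(2)` after each request makes every iteration take 2 seconds plus the request time. If you would rather have 2 seconds between the *starts* of requests, a small limiter can sleep only for whatever part of the interval remains. This is an optional alternative, not what the script currently does; `MinIntervalLimiter` is a hypothetical class, and the injectable clock and sleep functions exist only to make it testable.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Ensure at least `interval` seconds between consecutive wait() calls."""

    def __init__(self, interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last_start: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last_start is not None:
            remaining = self.interval - (now - self._last_start)
            if remaining > 0:
                # Sleep only for the part of the interval not already used up.
                self._sleep(remaining)
                now += remaining
        self._last_start = now
```

Call `limiter.wait()` right before each request. With fast responses this approaches the ~1,800 properties/hour ceiling; slow responses are never delayed further.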

## Integration with Existing System

The availability checker fits into the existing system as follows:

1. **Map Viewer** - Already filters removed properties when "Show Removed" is unchecked
2. **Custom Criteria** - Processes all properties, including removed ones; add the filter below to skip them
3. **GPT Analysis** - Processes all properties, including removed ones (add the same filter if needed)

### Filter in Custom Criteria (Optional)

Add to `custom_criteria.py`:

```python
# At the start of main()
properties = [p for p in properties if p.get('status') != 'Removed']
```

## Success Metrics

After running for 1 week, you should see:

- **Availability rate**: 85-95% (normal for real estate)
- **False positives**: <5% (check `availability_reason`)
- **Time saved**: Hours of manual checking
- **Data quality**: Only active properties in analysis

## Quick Reference

```bash
# First time setup
pip3 install beautifulsoup4 requests schedule

# Manual check
python3 check_availability.py

# Force recheck all
python3 check_availability.py --force

# Remove unavailable
python3 check_availability.py --remove

# Start scheduler (daily 3 AM)
python3 auto_availability_check.py

# Background scheduler
nohup python3 auto_availability_check.py > availability_checker.log 2>&1 &

# Check scheduler status (pgrep does not match its own process, unlike ps | grep)
pgrep -fl auto_availability_check.py

# View logs
tail -f availability_checker.log
```

## Support

If you encounter issues:
1. Check `availability_check_report.json` for statistics
2. Review `availability_reason` fields in `enriched_data.json`
3. Open a single property URL manually in a browser to confirm its status
4. Adjust detection keywords or thresholds as needed
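For step 2, tallying why properties were marked `Removed` makes patterns easy to spot, such as a site that suddenly returns 404 for everything. A sketch, assuming `enriched_data.json` holds a JSON list of property dicts with the fields shown under Property Status Fields; the function names are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def summarize_removal_reasons(properties: List[Dict[str, Any]]) -> Counter:
    """Count removal reasons across properties whose status is "Removed"."""
    return Counter(
        p.get("removal_reason") or p.get("availability_reason") or "unknown"
        for p in properties
        if p.get("status") == "Removed"
    )


def summarize_file(path: str = "enriched_data.json") -> Counter:
    """Load the property list from disk and summarize it."""
    with open(path, encoding="utf-8") as fh:
        properties = json.load(fh)  # assumed to be a JSON list of dicts
    return summarize_removal_reasons(properties)
```

From the scraper directory, `summarize_file().most_common()` lists the reasons from most to least frequent.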
