# 🧪 FarmMatch Update Pipeline - Testing Guide

## Quick Test: Is Everything Working?

Run these commands to test the entire system:

```bash
cd /Users/jonathan/SynologyDrive/Since\ Today/PROJECTEN/farmmatch/scraper
python3 test_update_pipeline.py
```

**Expected result**: 8 of 9 tests pass. Test 9 (auth.json format) may fail because the file's format differs from what the test expects; that is OK.

---

## What Gets Tested

### ✅ Tests That Should Pass

1. **API Server Status** - Is the API running on port 5001?
2. **Required Data Files** - Do enriched_data.json and auth.json exist?
3. **Current Data State** - How many properties, how fresh is the data?
4. **Availability Check** - Can we check if a property is still available?
5. **API Endpoints** - Do /api/system-status and /api/get-api-key work?
6. **Progress Tracking** - Can we create and read job progress?
7. **Scrape-Only Command** - Does auto_scrape_favorites.py exist?
8. **Full Pipeline Command** - Do all 4 pipeline scripts exist?

### ⚠️ Tests That May Fail

9. **Scraping Capability** - Checks auth.json format (may differ from test expectations)
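When one of the first checks fails, it can help to reproduce it by hand. The sketch below approximates the "API Server Status" and "Required Data Files" checks with the standard library. It is a hypothetical helper, not how `test_update_pipeline.py` is implemented; the file list only contains names that appear in this guide, so add the other pipeline scripts yourself.

```python
import socket
from pathlib import Path

# Names taken from this guide; the remaining pipeline scripts are not named here.
REQUIRED_FILES = ["enriched_data.json", "auth.json",
                  "auto_scrape_favorites.py", "check_availability.py"]


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, timeout
        return False


def missing_files(directory: Path, names: list[str]) -> list[str]:
    """Return the entries of `names` that do not exist in `directory`."""
    return [name for name in names if not (directory / name).exists()]


if __name__ == "__main__":
    scraper_dir = Path(".")  # assumes you run this from the scraper directory
    api_up = port_is_open("localhost", 5001)
    print("API server on port 5001:", "up" if api_up else "down")
    missing = missing_files(scraper_dir, REQUIRED_FILES)
    print("Missing files:", ", ".join(missing) if missing else "none")
```

A TCP connect only proves that *something* listens on port 5001; the real test also calls the API endpoints.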

---

## Manual Testing Methods

### Method 1: Test via UI (Recommended)

**Best for**: Non-technical users, visual feedback

1. Open http://localhost:8000/criteria_manager.html
2. Look at the **System Status** bar at the top:
   - ✅ API Server: Running (green)
   - 🟢 Data Freshness: Should show how old your data is
   - ⚙️ Active Jobs: Should be 0

3. Click **"🔄 Full Update"** button
4. Watch the progress bar:
   ```
   🚀 Full Update Started
   [████████████░░░░░░░░] 60%
   Analyzing property 108 / 180 (60%)
   ```

5. Wait 20-30 minutes for completion
6. You should see: **✅ Update Complete!**

**What to verify**:
- Progress bar moves from 0% → 100% (it may stay at 0%; see Issue 2 below)
- Current step updates ("Scraping...", "Checking availability...", etc.)
- No errors appear
- Data refreshes automatically when done

---

### Method 2: Test via Command Line

**Best for**: Developers, seeing detailed logs

#### Test 1: Scrape Only (5-10 min)
```bash
cd /Users/jonathan/SynologyDrive/Since\ Today/PROJECTEN/farmmatch/scraper
python3 auto_scrape_favorites.py scrape-only
```

**Expected output**:
```
======================================================================
📥 SCRAPING FAVORITES ONLY
======================================================================
[Playwright browser output...]
✅ Favorites scraping completed
```

**Verify**:
- `extracted_property_urls.csv` is updated
- No errors about authentication
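"Is updated" can be checked from the file's modification time. A minimal sketch, assuming you run it from the scraper directory right after the scrape; the 15-minute window is an arbitrary choice that covers the 5-10 minute scrape.

```python
import time
from pathlib import Path


def modified_within(path: Path, minutes: float) -> bool:
    """True if `path` exists and was modified in the last `minutes` minutes."""
    if not path.exists():
        return False
    age_seconds = time.time() - path.stat().st_mtime
    return age_seconds <= minutes * 60


if __name__ == "__main__":
    csv_path = Path("extracted_property_urls.csv")
    print("CSV updated by the last scrape:", modified_within(csv_path, 15))
```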

---

#### Test 2: Availability Check Only (5-10 min)
```bash
python3 check_availability.py
```

**Expected output**:
```
🔍 Checking availability for 186 properties...
[1/186] Checking: https://www.properstar.nl/listing/86083770
✅ Active (200 - Property appears to be active)
[2/186] Checking: https://www.properstar.nl/listing/106994589
✅ Active (200 - Property appears to be active)
...
✅ Availability check complete!
   Active: 165
   Removed: 21
```
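The output above suggests that a listing answering HTTP 200 is treated as active. The sketch below shows one way such a check *could* classify responses; treating 404/410 as "Removed" is an assumption, and `check_availability.py` may use different rules (for example, inspecting the page body). Note that `urlopen` follows redirects, so a removed listing that redirects to a search page would look "Active" here.

```python
import urllib.error
import urllib.request


def classify_status(http_status: int) -> str:
    """Map an HTTP status code to a listing status (assumed rules)."""
    if http_status == 200:
        return "Active"
    if http_status in (404, 410):  # assumption: gone listings return 404/410
        return "Removed"
    return "Unknown"  # rate limits, server errors, etc.: recheck later


def check_listing(url: str, timeout: float = 10.0) -> str:
    """Fetch a listing URL and classify the final response."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return classify_status(err.code)
    except OSError:  # network errors and timeouts
        return "Unknown"
```

For example, `check_listing("https://www.properstar.nl/listing/86083770")` returns one of `"Active"`, `"Removed"` or `"Unknown"`.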

**Verify**:
- Properties get `status: "Active"` or `status: "Removed"`
- `enriched_data.json` is updated with `availability_last_checked` timestamps
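To check these two points for every property at once, a short script can list the records that are missing them. This sketch assumes `enriched_data.json` holds a JSON list of property objects (as the one-liners in the Verification Checklist do); the `url` key used for labels is an assumption, with a fallback to the list index.

```python
import json
from pathlib import Path

VALID_STATUSES = {"Active", "Removed"}


def availability_problems(properties: list[dict]) -> list[str]:
    """List properties whose availability fields look incomplete."""
    problems = []
    for index, prop in enumerate(properties):
        label = prop.get("url", f"#{index}")  # 'url' key is an assumption
        if prop.get("status") not in VALID_STATUSES:
            problems.append(f"{label}: unexpected status {prop.get('status')!r}")
        if not prop.get("availability_last_checked"):
            problems.append(f"{label}: missing availability_last_checked")
    return problems


if __name__ == "__main__":
    path = Path("enriched_data.json")
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        issues = availability_problems(data)
        print("\n".join(issues) if issues else "All properties have a status and timestamp")
    else:
        print(f"{path} not found; run this from the scraper directory")
```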

---

#### Test 3: Full Pipeline (20-30 min)
```bash
python3 auto_scrape_favorites.py now
```

**Expected output**:
```
======================================================================
🚀 STARTING FULL UPDATE PIPELINE
======================================================================

📥 Step 1/4: Scraping favorites from Properstar...
[... scraping output ...]
✅ Favorites scraped successfully

🔍 Step 2/4: Checking property availability...
   (Filtering out sold/removed properties before geocoding)
[... availability check output ...]
✅ Availability check completed
   📊 165 active properties, 21 removed/sold (skipping these)

📍 Step 3/4: Geocoding active properties...
[... geocoding output ...]
✅ Geocoding completed

🎯 Step 4/4: Running custom criteria evaluation...
[... analysis output ...]
✅ Custom criteria evaluation completed

======================================================================
✅ FULL UPDATE PIPELINE COMPLETED SUCCESSFULLY
======================================================================

📊 Current Status:
   Total properties: 186
   Active: 165
   Removed: 21
```

**Verify**:
- All 4 steps complete without errors
- Step 2 shows removed properties count
- Steps 3 and 4 only process active properties
- Final summary shows total/active/removed counts
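If you save the pipeline output to a file, the first three points can be checked automatically by looking for the markers shown in the expected output above. A sketch under that assumption; `check_log.py` is a hypothetical file name, and the check breaks if the script's messages change.

```python
import re
import sys

STEP_PATTERN = re.compile(r"Step (\d)/4:")
SUCCESS_BANNER = "FULL UPDATE PIPELINE COMPLETED SUCCESSFULLY"


def check_pipeline_log(text: str) -> list[str]:
    """Return problems found in captured pipeline output (empty list if OK)."""
    problems = []
    seen = {int(n) for n in STEP_PATTERN.findall(text)}
    missing = sorted({1, 2, 3, 4} - seen)
    if missing:
        problems.append(f"steps never started: {missing}")
    if "removed/sold" not in text:
        problems.append("step 2 did not report a removed count")
    if SUCCESS_BANNER not in text:
        problems.append("success banner missing")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        issues = check_pipeline_log(handle.read())
    print("\n".join(issues) if issues else "Pipeline log looks complete")
```

Usage: `python3 auto_scrape_favorites.py now 2>&1 | tee run.log`, then `python3 check_log.py run.log`.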

---

### Method 3: Test via API (Advanced)

**Best for**: Testing the API endpoints behind the UI buttons without opening the browser

#### Start a Full Update via API:
```bash
curl -X POST http://localhost:5001/api/scrape-favorites \
  -H "Content-Type: application/json" \
  -d '{"full_pipeline": true}'
```

**Expected response**:
```json
{
  "success": true,
  "job_id": "a7f3b2c1",
  "message": "Favorites scraping started in background",
  "full_pipeline": true,
  "estimated_time": "20-30 minutes"
}
```

#### Check Job Progress:
```bash
# Replace a7f3b2c1 with your job_id from above
curl http://localhost:5001/api/job-status/a7f3b2c1
```

**Expected response**:
```json
{
  "success": true,
  "job_id": "a7f3b2c1",
  "status": "running",
  "progress": 2,
  "current_step": "Checking property availability...",
  "total_steps": 4,
  "started_at": "2025-10-08T22:45:00"
}
```

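#### Poll every 2 seconds until `status` is `"completed"`:
```bash
watch -n 2 'curl -s http://localhost:5001/api/job-status/a7f3b2c1 | jq .'
```

`watch` and `jq` are not installed on macOS by default; install them with Homebrew (`brew install watch jq`).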
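If `watch` or `jq` isn't available, the same polling loop can be written in standard-library Python. This sketch uses the `/api/job-status/<job_id>` endpoint and fields shown above; only `"completed"` appears in this guide, so the `"failed"`/`"error"` terminal statuses are assumptions. The `fetch` parameter lets you swap in another client.

```python
import json
import sys
import time
import urllib.request
from typing import Callable

API_BASE = "http://localhost:5001"
# "completed" is documented above; "failed" and "error" are assumed names.
TERMINAL_STATUSES = {"completed", "failed", "error"}


def fetch_job_status(job_id: str) -> dict:
    """GET /api/job-status/<job_id> and decode the JSON body."""
    url = f"{API_BASE}/api/job-status/{job_id}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def wait_for_job(job_id: str, fetch: Callable[[str], dict] = fetch_job_status,
                 interval: float = 2.0, timeout: float = 45 * 60) -> dict:
    """Poll until the job reaches a terminal status or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(job_id)
        print(f"{status.get('status')}: step {status.get('progress')}/"
              f"{status.get('total_steps')} {status.get('current_step', '')}")
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status.get('status')!r} after {timeout}s")
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    final = wait_for_job(sys.argv[1])  # e.g. python3 wait_job.py a7f3b2c1
    print("Final status:", final.get("status"))
```

The 45-minute default timeout matches the point at which Issue 4 below considers a run too slow.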

---

## Verification Checklist

After running an update, check the following:

### Data File Checks
```bash
# Check file was updated recently
ls -lh enriched_data.json

# Check total properties
cat enriched_data.json | python3 -c "import json, sys; print(len(json.load(sys.stdin)))"

# Check active vs removed
cat enriched_data.json | python3 -c "
import json, sys
data = json.load(sys.stdin)
active = sum(1 for p in data if p.get('status') != 'Removed')
print(f'Active: {active}, Removed: {len(data) - active}')
"

# Check how many have coordinates
cat enriched_data.json | python3 -c "
import json, sys
data = json.load(sys.stdin)
with_coords = sum(1 for p in data if p.get('lat'))
print(f'With coordinates: {with_coords}/{len(data)}')
"
```

### Expected Results:
- `enriched_data.json` modified timestamp is recent
- All active properties have `lat`, `lon`, and `analysis` (removed properties are skipped by geocoding and analysis)
- Active properties have `status: "Active"`
- Removed properties have `status: "Removed"`
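The coordinates and analysis check can be made stricter than the "with coordinates" count by listing exactly which active properties are incomplete. A sketch, assuming the same JSON-list layout as the one-liners above and, like them, treating every property not marked `"Removed"` as active; it mirrors the success criterion that all active properties have coordinates and analysis.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("lat", "lon", "analysis")


def incomplete_active(properties: list[dict]) -> list[tuple[int, list[str]]]:
    """Return (index, missing_fields) for active properties lacking fields."""
    result = []
    for index, prop in enumerate(properties):
        if prop.get("status") == "Removed":
            continue  # removed listings are not geocoded or analyzed
        # A latitude of 0.0 is a real value, so only None/empty counts as missing.
        missing = [f for f in REQUIRED_FIELDS if prop.get(f) in (None, "", {})]
        if missing:
            result.append((index, missing))
    return result


if __name__ == "__main__":
    path = Path("enriched_data.json")
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        gaps = incomplete_active(data)
        for index, missing in gaps:
            print(f"property #{index}: missing {', '.join(missing)}")
        print(f"{len(gaps)} active properties incomplete")
    else:
        print(f"{path} not found; run this from the scraper directory")
```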

---

## Common Issues & Solutions

### Issue 1: "API server not running"
**Symptom**: Tests fail with "Cannot connect to API server"

**Solution**:
```bash
cd /Users/jonathan/SynologyDrive/Since\ Today/PROJECTEN/farmmatch/scraper
python3 criteria_api.py
```

Or use the startup script:
```bash
./start_system.sh
```

---

### Issue 2: "Progress bar stuck at 0%"
**Symptom**: The UI shows a progress bar, but it stays at 0%

**Possible causes**:
1. Python scripts don't write progress files yet (known limitation)
2. Job crashed immediately

**Debug**:
```bash
# Check if job process is running
ps aux | grep auto_scrape_favorites

# Check progress file
cat /tmp/farmmatch_progress_*.json

# Check API logs
tail -f /tmp/farmmatch_api.log
```

**Note**: The progress API is implemented, but the pipeline scripts don't write progress updates to it yet. This known limitation is documented in PROGRESS_TRACKING_IMPLEMENTATION.md.
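Until the scripts report progress, the sketch below shows one possible shape for it: writing a snapshot to `/tmp/farmmatch_progress_<job_id>.json` (the pattern the debug commands above look for) at each step. The field names mirror the `/api/job-status` response shown in Method 3, but the format `criteria_api.py` actually reads is not documented here, and how a script learns its `job_id` is left open; treat all of this as hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path

PROGRESS_DIR = Path("/tmp")  # matches /tmp/farmmatch_progress_*.json above


def write_progress(job_id: str, progress: int, total_steps: int,
                   current_step: str, directory: Path = PROGRESS_DIR) -> Path:
    """Atomically write a progress snapshot for `job_id`; return its path."""
    path = directory / f"farmmatch_progress_{job_id}.json"
    payload = {
        "job_id": job_id,
        "progress": progress,
        "total_steps": total_steps,
        "current_step": current_step,
        "updated_at": datetime.now().isoformat(timespec="seconds"),
    }
    # Write a temp file in the same directory, then rename it over the target.
    # os.replace is atomic, so a reader never sees a half-written JSON file.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return path
```

A script would call, for example, `write_progress(job_id, 2, 4, "Checking property availability...")` when it starts step 2.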

---

### Issue 3: "Scraping fails with authentication error"
**Symptom**: "Invalid credentials" or "Login failed"

**Solution**:
1. Check auth.json has valid Properstar credentials
2. Try logging in manually at properstar.nl to verify credentials
3. Check whether the session cookies have expired (the scraper may need to log in again)

---

### Issue 4: "Pipeline takes too long"
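**Symptom**: Full update takes more than 45 minutes (a first run on an empty database can legitimately take 45-60 minutes; see Performance Benchmarks)

**Possible causes**:
1. Removed properties are being processed (should now be fixed: step 2 filters them out before geocoding)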
2. Network is slow
3. Many new properties to geocode

**Debug**:
```bash
# Check how many properties being processed
cat enriched_data.json | python3 -c "
import json, sys
data = json.load(sys.stdin)
active = sum(1 for p in data if p.get('status') != 'Removed')
print(f'{active} active properties will be processed')
"
```

**Expected timing**:
- Scraping: 5-10 min
- Availability: 5-10 min (depends on property count)
- Geocoding: 2-5 min (depends on new properties)
- Analysis: 5-10 min (depends on new properties)

---

### Issue 5: "Data doesn't refresh in UI"
**Symptom**: Clicked update, it completed, but map shows old data

**Solution**:
1. Hard refresh browser: Cmd+Shift+R (Mac) or Ctrl+Shift+R (Windows)
2. Clear browser cache
3. Close and reopen the page

---

## Performance Benchmarks

### Typical Times (165 active properties)

| Operation | Time | Notes |
|-----------|------|-------|
| **Scrape Favorites** | 5-8 min | Depends on Properstar server |
| **Check Availability** | 5-10 min | 2 sec per property × 165 |
| **Geocode (0 new)** | <1 min | Skip already geocoded |
| **Geocode (5 new)** | 1-2 min | 1.5 sec per property |
| **Analyze (0 new)** | <1 min | Skip already analyzed |
| **Analyze (5 new)** | 2-3 min | GPT API calls |
| **Full Pipeline (no new)** | 12-15 min | Scrape + availability only |
| **Full Pipeline (5 new)** | 20-25 min | All steps |
| **Full Pipeline (fresh DB)** | 45-60 min | All 165 properties |

---

## Quick Reference Commands

### System Control
```bash
# Start system
./start_system.sh

# Stop system
./stop_system.sh

# Check if running
curl http://localhost:5001/api/system-status
```

### Data Operations
```bash
# Scrape only
python3 auto_scrape_favorites.py scrape-only

# Check availability only
python3 check_availability.py

# Full pipeline
python3 auto_scrape_favorites.py now
```

### Testing
```bash
# Run all tests (dry run)
python3 test_update_pipeline.py

# Test actual full update
python3 test_update_pipeline.py full-update-test
```

### Monitoring
```bash
# Watch API logs
tail -f /tmp/farmmatch_api.log

# Watch web server logs
tail -f /tmp/farmmatch_web.log

# Check active jobs (-s hides curl's progress meter)
curl -s http://localhost:5001/api/system-status | jq '.active_jobs'

# Check data freshness
curl -s http://localhost:5001/api/system-status | jq '.data_age_hours'
```
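On a machine without `jq`, the same two fields can be read with standard-library Python. A minimal sketch using the `/api/system-status` endpoint and the `active_jobs` and `data_age_hours` fields from the commands above; the types of those fields aren't documented here, so they are printed as-is.

```python
import json
import urllib.request

STATUS_URL = "http://localhost:5001/api/system-status"


def summarize_status(status: dict) -> str:
    """Format the two fields the jq commands above extract."""
    return (f"active_jobs={status.get('active_jobs')} "
            f"data_age_hours={status.get('data_age_hours')}")


def fetch_status(url: str = STATUS_URL) -> dict:
    """GET the system status endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        print(summarize_status(fetch_status()))
    except OSError as err:  # URLError is a subclass of OSError
        print(f"API server not reachable: {err}")
```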

---

## Best Practices

### Recommended Workflow
1. **Morning**: Check availability only (5-10 min)
   ```bash
   python3 check_availability.py
   ```

2. **Weekly**: Run full update (20-30 min)
   ```bash
   python3 auto_scrape_favorites.py now
   ```

3. **Before Important Decisions**: Refresh data
   ```bash
   # Quick refresh
   curl -X POST http://localhost:5001/api/check-availability
   ```
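The daily/weekly cadence above can be expressed as a small decision rule, for example to drive a script that runs whichever update is due. In this sketch `data_age_hours` would come from `/api/system-status`; `hours_since_full_update` is a hypothetical input you would have to track yourself, because an availability check also refreshes the data file and so the data age alone cannot tell when the last full update ran.

```python
def recommended_action(data_age_hours: float, hours_since_full_update: float) -> str:
    """Pick the cheapest refresh that keeps the daily/weekly cadence."""
    if hours_since_full_update >= 7 * 24:
        return "full-update"   # python3 auto_scrape_favorites.py now
    if data_age_hours >= 24:
        return "availability"  # python3 check_availability.py
    return "none"
```

For example, `recommended_action(30, 48)` returns `"availability"`: the data is over a day old, but the last full update was only two days ago.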

### Automated Scheduling
```bash
# Schedule weekly updates (Sunday 2am)
python3 auto_scrape_favorites.py schedule sunday 02:00

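# Or use crontab
crontab -e
# Add the line below. Quote the path (the project path contains a space), and
# use the absolute path to python3 if cron cannot find it (cron uses a minimal PATH):
# 0 2 * * 0 cd "/path/to/scraper" && python3 auto_scrape_favorites.py now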
```
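The cron fields `0 2 * * 0` mean minute 0, hour 2, any day of the month, any month, day-of-week 0 (Sunday). To sanity-check when such a schedule fires next, the sketch below computes the next weekly run. It uses naive local datetimes and ignores daylight-saving transitions; note that Python numbers weekdays Monday=0 through Sunday=6, while cron uses Sunday=0.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int = 6, hour: int = 2,
                    minute: int = 0) -> datetime:
    """Next datetime at weekday/hour/minute strictly after `now`.

    `weekday` uses Python's convention (Monday=0 ... Sunday=6), so cron's
    "0 2 * * 0" (Sunday 02:00) corresponds to the defaults.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:  # today is the right day but the time has passed
        candidate += timedelta(days=7)
    return candidate
```

For instance, `next_weekly_run(datetime(2025, 10, 8, 22, 45))` (a Wednesday evening) returns `datetime(2025, 10, 12, 2, 0)`, the following Sunday at 02:00.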

---

## Success Criteria

✅ **System is working correctly if**:
- Test suite shows 8/9 or 9/9 passes
- UI shows system status as green
- Progress bars update during operations (once the scripts report progress; see Issue 2)
- Data file timestamps are recent after updates
- No "Removed" properties appear in active results
- All active properties have coordinates and analysis

---

**Last Updated**: October 8, 2025
**Status**: Testing system ready for use
