# Pipeline Logging - Implementation Summary

## What Was Added

### 1. Log File Creation
**File**: `criteria_api.py`

When jobs are started (Full Update or Availability Check), the system now:
- Creates a log file: `/tmp/farmmatch_job_<job_id>.log`
- Redirects the job subprocess's stdout and stderr to this file
- Returns the log file path in the API response

**Changes Made**:
- Lines 187-189: Create the log file handle for scrape jobs
- Lines 191-197: Redirect subprocess output to the log file
- Line 206: Store `log_file` in `active_jobs`
- Line 215: Return the `log_file` path in the API response
- Lines 240-248: Apply the same changes for availability check jobs
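
The redirection described above can be sketched roughly as follows. This is a minimal illustration, not the code in `criteria_api.py`: the helper name `start_logged_job` and its parameters are hypothetical, and only standard-library `subprocess` calls are used.

```python
import subprocess
import sys


def start_logged_job(command, log_path):
    """Start `command` with stdout and stderr written to `log_path`.

    Hypothetical helper for illustration; the real logic lives in
    criteria_api.py and may differ.
    """
    # Append in binary mode so the child's raw output is stored unchanged.
    log_handle = open(log_path, "ab")
    try:
        process = subprocess.Popen(
            command,
            stdout=log_handle,
            stderr=subprocess.STDOUT,  # merge stderr into the same log file
        )
    finally:
        # The child keeps its own copy of the file descriptor,
        # so the parent can close its handle right away.
        log_handle.close()
    return process
```

If the job is itself a Python script, its output is block-buffered when it is not attached to a terminal, so passing `-u` (or setting `PYTHONUNBUFFERED=1`) is likely needed for the log to update in real time.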

### 2. Log Viewer Script
**File**: `view_job_log.sh` (new file)

Helper script to easily view job logs:
```bash
./view_job_log.sh                    # List all recent jobs
./view_job_log.sh <job_id>           # View a job's log
./view_job_log.sh <job_id> follow    # Follow log in real-time
```

### 3. Documentation
**File**: `LOGGING_GUIDE.md` (new file)

Comprehensive guide covering:
- How logging works
- How to view logs (3 methods)
- What gets logged in each pipeline step
- Troubleshooting with logs
- Log retention and monitoring

## How It Works

### When You Click "Full Update":

1. **API receives request** → Generates job ID (e.g., "a1b2c3d4")

2. **Creates two files**:
   - `/tmp/farmmatch_progress_a1b2c3d4.json` (progress tracking)
   - `/tmp/farmmatch_job_a1b2c3d4.log` (output log)

3. **Starts the subprocess** with its output redirected to the log file

4. **Returns response**:
   ```json
   {
     "success": true,
     "job_id": "a1b2c3d4",
     "log_file": "/tmp/farmmatch_job_a1b2c3d4.log",
     "estimated_time": "20-30 minutes"
   }
   ```

5. **As the pipeline runs**, all output goes to the log file:
   - Scraping progress
   - Availability checks
   - Geocoding results
   - Analysis scores
   - Errors and warnings
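
As a rough sketch of steps 1, 2, and 4, the job ID, the two file paths, and the response could be built like this. The function names are hypothetical, and the 8-character hex ID is an assumption based on the `a1b2c3d4` example; the actual ID scheme in `criteria_api.py` is not shown here.

```python
import uuid


def new_job_files(job_id=None, tmp_dir="/tmp"):
    """Return the job ID plus the progress and log file paths for one job."""
    # Assumption: IDs are 8 hex characters, like the "a1b2c3d4" example.
    job_id = job_id or uuid.uuid4().hex[:8]
    return {
        "job_id": job_id,
        "progress_file": f"{tmp_dir}/farmmatch_progress_{job_id}.json",
        "log_file": f"{tmp_dir}/farmmatch_job_{job_id}.log",
    }


def build_start_response(job_id, log_file, estimated_time="20-30 minutes"):
    """Build the JSON-serializable response returned when a job starts."""
    return {
        "success": True,
        "job_id": job_id,
        "log_file": log_file,
        "estimated_time": estimated_time,
    }
```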

### Viewing Progress:

**Option 1 - Real-time log following**:
```bash
./view_job_log.sh a1b2c3d4 follow
```

**Option 2 - Check progress file**:
```bash
cat /tmp/farmmatch_progress_a1b2c3d4.json
```

**Option 3 - API endpoint**:
```bash
curl http://localhost:5001/api/job-status/a1b2c3d4
```
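
If you would rather follow a log from Python than from the shell script, the idea can be approximated by re-reading the file from a saved byte offset. The helper below is a hypothetical sketch and is not part of `view_job_log.sh`; it returns only complete lines, so a line that is still being written (or a partially written emoji) is never split.

```python
def read_new_lines(log_path, offset=0):
    """Return (lines, new_offset) for complete lines appended since `offset`."""
    with open(log_path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Only consume up to the last newline; a trailing partial line is
    # left in place and picked up on the next call.
    end = data.rfind(b"\n") + 1  # 0 when no complete line is available yet
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Calling this in a loop with a short `time.sleep` between calls, and passing the returned offset back in, gives behavior similar to following the log.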

## What Gets Logged

### Full Update Pipeline

**Step 1: Scraping** (`auto_scrape_favorites.py`)
```
======================================================================
🎯 SCRAPING FAVORITES
======================================================================

🔍 Found 186 properties in favorites
📥 Extracting URLs...
✅ Saved to: extracted_property_urls.csv
```

**Step 2: Availability** (`check_availability.py`)
```
======================================================================
🔍 CHECKING AVAILABILITY
======================================================================

[1/186] https://www.properstar.nl/listing/12345
  ✅ Status: 200 - Property is ACTIVE

[2/186] https://www.properstar.nl/listing/67890
  ❌ Status: 404 - Property REMOVED
```

**Step 3: Geocoding** (`geocode_properties.py`)
```
======================================================================
🗺️ GEOCODING PROPERTIES
======================================================================

🔍 [1/150] Finca in Spain
  → Coordinates: 41.23, 2.45
  ✅ Geocoded successfully
```

**Step 4: Analysis** (`custom_criteria.py`)
```
======================================================================
🎯 CUSTOM CRITERIA EVALUATION
======================================================================

📋 Active Criteria:
  • 🌧️ Rainfall (weight: 2.0)
  • 🌡️ Temperature (weight: 1.5)
  ...

🔍 [1/150] https://www.properstar.nl/listing/12345
  → Evaluating 🌧️ Rainfall...
  → Evaluating 🌡️ Temperature...
  ...
  ✓ Overall custom score: 4.2
```
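
Because the availability, geocoding, and analysis steps prefix each item with a `[n/total]` marker, a rough progress figure can be recovered from the log alone. The following is a small sketch under that assumption; `latest_progress` is a hypothetical helper, and the marker format is taken only from the sample output above.

```python
import re

# Matches counters such as "[2/186]" in the sample log output.
PROGRESS_RE = re.compile(r"\[(\d+)/(\d+)\]")


def latest_progress(lines):
    """Return (done, total) from the last line with a [n/total] marker, or None."""
    for line in reversed(list(lines)):
        match = PROGRESS_RE.search(line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None
```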

## Benefits

1. **Debugging**: See exactly what went wrong when a pipeline fails
2. **Monitoring**: Follow progress in real-time
3. **Transparency**: Know what the system is doing at each step
4. **Troubleshooting**: Identify issues like a missing `auth.json` immediately
5. **Audit Trail**: Keep a per-job history of pipeline runs for as long as the files in `/tmp` are retained

## Example Usage

### Scenario: User clicks "Full Update" but nothing happens

**Before (no logging)**:
- Progress stuck at 0%
- No way to know what's wrong
- Have to guess the issue

**After (with logging)**:
```bash
# View the log
./view_job_log.sh a1b2c3d4

# The log shows the error:
FileNotFoundError: [Errno 2] No such file or directory: 'auth.json'
```

**Solution**: Run `./login.sh` to create `auth.json`
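
For long logs, the same check can be scripted. The sketch below scans log lines for Python traceback headers and exception lines such as the `FileNotFoundError` above. The helper name and the pattern are illustrative assumptions and will not catch every kind of failure message.

```python
import re

# Matches a traceback header or a line starting with an exception class name,
# e.g. "FileNotFoundError: ..." or "requests.exceptions.ConnectionError: ...".
ERROR_RE = re.compile(
    r"^(Traceback \(most recent call last\):|[\w.]*(Error|Exception)\b)"
)


def find_error_lines(lines):
    """Return (line_number, text) pairs for lines that look like Python errors."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if ERROR_RE.match(line)
    ]
```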

### Scenario: Want to see scraping progress

**Terminal 1 - Follow log**:
```bash
./view_job_log.sh a1b2c3d4 follow
```

**Terminal 2 - Watch progress**:
```bash
watch -n 2 'curl -s http://localhost:5001/api/job-status/a1b2c3d4'
```

See real-time updates as each property is scraped, checked, geocoded, and analyzed.

## Files Added or Modified

1. **criteria_api.py** - Added log file creation and redirection
2. **view_job_log.sh** (new) - Log viewer helper script
3. **LOGGING_GUIDE.md** (new) - Comprehensive logging documentation
4. **LOGGING_SUMMARY.md** (this file) - Implementation summary

## Next Steps

The logging system is now active. To try it:

1. **Log in to Properstar**: `./login.sh`
2. **Click "Full Update"** in the UI
3. **Note the job ID** from the response
4. **Follow the log**: `./view_job_log.sh <job_id> follow`

You'll see detailed progress through all 4 pipeline steps.
