# 💰 Cost Reduction Guide - Minimize GPT Usage

## Problem: Current System

**Current cost per property:** ~$0.02-0.03
**186 properties:** ~$3-5
**Every re-analysis:** $3-5

## Solution: Use Deterministic Analysis

### New Approach: Extract Facts, Not Opinions

Instead of asking GPT to evaluate everything, we:
1. ✅ **Extract objective facts** from Properstar pages (FREE)
2. ✅ **Score based on rules** (FREE)
3. ❌ **Skip GPT entirely** OR use it sparingly for edge cases

---

## 🆕 New Script: deterministic_analyzer.py

**Cost:** $0.00
**Speed:** Much faster (no API calls)

### What It Extracts (Deterministic):

#### Hard Facts:
- Land size (m²)
- Building size (m²)
- Number of bedrooms
- Number of bathrooms
- Property type (farm, villa, house, land)
- Price
- Features (pool, garage, barn, etc.)
- Location (municipality, region, country)
- Coordinates (from embedded maps)
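
As a rough illustration of what "deterministic extraction" means here, the sketch below pulls two of these facts out of listing text with regular expressions. The wording it matches ("Land: 12,500 m²", "4 bedrooms"), the function name `extract_facts`, and the `bedrooms` key are assumptions for the example; the real Properstar markup and the actual logic in deterministic_analyzer.py may differ.

```python
import re


def extract_facts(text):
    """Return a dict of hard facts; anything not found stays None."""
    facts = {"land_size_m2": None, "bedrooms": None}

    # Hypothetical wording such as "Land: 12,500 m²" or "land 800 m2".
    land = re.search(r"\bland\b\D*(\d[\d.,]*)\s*m[²2]", text, re.IGNORECASE)
    if land:
        # Assumption: dots and commas are thousands separators, not decimals.
        facts["land_size_m2"] = int(re.sub(r"[.,]", "", land.group(1)))

    # Hypothetical wording such as "4 bedrooms" or "1 bedroom".
    beds = re.search(r"(\d+)\s*bedrooms?", text, re.IGNORECASE)
    if beds:
        facts["bedrooms"] = int(beds.group(1))

    return facts
```

Leaving missing values as `None` (rather than guessing) is what lets the risk rating further down count how many key facts are absent.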

#### Rule-Based Scoring:

**Market Garden (1-5):**
- Land size ≥ 5 hectares → 5
- Land size ≥ 2 hectares → 4
- Land size ≥ 0.5 hectares → 3
- Land size ≥ 0.2 hectares → 2
- Land size < 0.2 hectares → 1
- +1 if has garden feature
- +1 if property type is "farm"
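
The Market Garden rules above could be sketched as follows. This is not the code from deterministic_analyzer.py: the function name and parameters are hypothetical, and capping the result at 5 after bonuses is an assumption made to keep the score on the 1-5 scale.

```python
def market_garden_score(land_size_m2, has_garden=False, property_type=""):
    """Score market-garden potential from land size plus two bonuses."""
    # Thresholds in m² (1 hectare = 10,000 m²).
    if land_size_m2 >= 50_000:      # 5+ ha
        base = 5
    elif land_size_m2 >= 20_000:    # 2+ ha
        base = 4
    elif land_size_m2 >= 5_000:     # 0.5+ ha
        base = 3
    elif land_size_m2 >= 2_000:     # 0.2+ ha
        base = 2
    else:
        base = 1

    bonus = int(has_garden) + int(property_type == "farm")
    return min(5, base + bonus)  # assumption: bonuses never push past 5
```

For example, a 3,000 m² farm with a garden gets a base of 2 plus two bonuses, so it scores 4.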

**Guest Accommodation (1-5):**
- Bedrooms ≥ 5 → 5
- Bedrooms ≥ 3 → 4
- Bedrooms ≥ 2 → 3
- Bedrooms < 2 → 2
- +1 if has pool/airco/terrace
- +1 if property type is villa/farm
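
A minimal sketch of the Guest Accommodation rules, with hypothetical names. It assumes the pool/airco/terrace bonus is a single +1 no matter how many of those features are present, and that the total is capped at 5.

```python
GUEST_BONUS_FEATURES = {"pool", "airco", "terrace"}


def guest_score(bedrooms, features=(), property_type=""):
    """Score guest-accommodation potential on the 1-5 scale."""
    if bedrooms >= 5:
        base = 5
    elif bedrooms >= 3:
        base = 4
    elif bedrooms >= 2:
        base = 3
    else:
        base = 2

    bonus = 0
    if GUEST_BONUS_FEATURES & set(features):  # any one of them counts once
        bonus += 1
    if property_type in {"villa", "farm"}:
        bonus += 1
    return min(5, base + bonus)
```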

**Workshop (1-5):**
- Building size ≥ 200m² → 4
- Building size ≥ 100m² → 3
- Building size < 100m² → 2
- +1 if has barn/garage/workshop
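
The Workshop rules can be sketched in the same way (hypothetical names; the cap at 5 is an assumption). Note that under these rules a score of 5 is only reachable with the barn/garage/workshop bonus.

```python
WORKSHOP_FEATURES = {"barn", "garage", "workshop"}


def workshop_score(building_size_m2, features=()):
    """Score workshop potential from building size and outbuildings."""
    if building_size_m2 >= 200:
        base = 4
    elif building_size_m2 >= 100:
        base = 3
    else:
        base = 2

    bonus = 1 if WORKSHOP_FEATURES & set(features) else 0
    return min(5, base + bonus)
```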

**Rental Units (1-5):**
- Based on bedrooms + building size
- ≥4 bedrooms + ≥200m² → 5
- ≥3 bedrooms → 4
- ≥2 bedrooms → 3
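
A sketch of the Rental Units rules with hypothetical names. The list above does not say what happens below 2 bedrooms, so the fallback of 2 is an assumption borrowed from the other categories.

```python
def rental_score(bedrooms, building_size_m2):
    """Score rental-unit potential from bedrooms and building size."""
    if bedrooms >= 4 and building_size_m2 >= 200:
        return 5
    if bedrooms >= 3:
        return 4
    if bedrooms >= 2:
        return 3
    return 2  # assumption: fallback is not stated in the rules above
```

A 4-bedroom house of 150 m² misses the size requirement for 5 and falls through to the "≥3 bedrooms" rule, so it scores 4.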

**Location (1-5):**
- Netherlands/Belgium → 5
- France/Germany → 4
- Spain/Portugal/Italy → 3
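
The Location rule is a plain lookup, which might look like the sketch below. The names are hypothetical, and because the list above gives no score for other countries, the sketch returns `None` for them instead of inventing a default.

```python
LOCATION_SCORES = {
    "netherlands": 5, "belgium": 5,
    "france": 4, "germany": 4,
    "spain": 3, "portugal": 3, "italy": 3,
}


def location_score(country):
    """Return the 1-5 location score, or None for an unlisted country."""
    return LOCATION_SCORES.get(country.strip().lower())
```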

**Local Market (1-5):**
- Defaults to 3 (needs manual assessment or GPT)

**Risk:**
- Missing >3 key facts → High
- Missing 1-2 key facts → Medium
- All key facts present → Low
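
The risk rating only depends on how many key facts are missing. The sketch below is a hedged illustration: which facts count as "key" is not listed here, so `KEY_FACTS` is a guess (only `land_size_m2` appears elsewhere in this guide), and the case of exactly 3 missing facts, which the rules above leave open, is treated as High.

```python
# Hypothetical set of key facts; adjust to whatever the analyzer tracks.
KEY_FACTS = ("land_size_m2", "building_size_m2", "bedrooms", "bathrooms", "price")


def risk_level(facts):
    """Rate risk by counting key facts that are absent or None."""
    missing = sum(1 for key in KEY_FACTS if facts.get(key) is None)
    if missing == 0:
        return "Low"
    if missing <= 2:
        return "Medium"
    return "High"  # assumption: 3 missing is treated like "more than 3"
```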

---

## 📊 Comparison

| Feature | GPT Analysis | Deterministic |
|---------|-------------|---------------|
| Cost per property | $0.02-0.03 | $0.00 |
| Cost for 186 properties | $3-5 | $0.00 |
| Speed per property | ~3-5 seconds | ~1-2 seconds |
| Accuracy | High (subjective) | Good (objective) |
| Customizable | Edit prompt | Edit rules |
| Scalable | Limited by $ | Unlimited |

---

## 🔄 Three Analysis Modes

### Mode 1: Pure Deterministic (Recommended)
**Cost:** $0
**Use:** Default for all properties

```bash
cd scraper
/usr/bin/python3 deterministic_analyzer.py
```

Extracts facts and scores based on rules. NO GPT CALLS.

### Mode 2: Hybrid (Smart)
**Cost:** <$1 for 186 properties
**Use:** Deterministic first, GPT only for unclear cases

```bash
# First run deterministic
/usr/bin/python3 deterministic_analyzer.py

# Then run GPT only on properties missing critical info
/usr/bin/python3 analyze_from_urls.py --only-incomplete
```

### Mode 3: Full GPT (Expensive)
**Cost:** $3-5
**Use:** When you want subjective quality assessment

```bash
/usr/bin/python3 analyze_from_urls.py
```

---

## 🎯 Recommended Workflow

### Initial Setup (One Time):
```bash
# 1. Scrape favorites
/usr/bin/python3 sync_favorites.py

# 2. Extract facts + score deterministically (FREE!)
/usr/bin/python3 deterministic_analyzer.py

# 3. Parse criteria
/usr/bin/python3 parse_criteria.py
```

**Cost: $0**

### Weekly Maintenance:
```bash
# 1. Check for new properties
/usr/bin/python3 sync_favorites.py

# 2. If new properties: analyze deterministically
/usr/bin/python3 deterministic_analyzer.py

# 3. Parse criteria
/usr/bin/python3 parse_criteria.py

# 4. Smart unfavorite
/usr/bin/python3 smart_unfavorite.py
```

**Cost: $0** (unless you choose to use GPT on specific properties)

---

## 🔧 Customizing Rules

Edit **deterministic_analyzer.py** lines 137-250 to adjust scoring rules:

### Example: Make Land Size More Important

```python
# Current:
if facts['land_size_m2'] >= 50000:  # 5+ ha
    mg_score = 5
elif facts['land_size_m2'] >= 20000:  # 2+ ha
    mg_score = 4

# More strict:
if facts['land_size_m2'] >= 100000:  # 10+ ha
    mg_score = 5
elif facts['land_size_m2'] >= 50000:  # 5+ ha
    mg_score = 4
```

### Example: Prioritize Properties with Pools

```python
# Add in guest accommodation scoring ('zwembad' is Dutch for 'pool'):
if 'zwembad' in facts['features'] or 'pool' in facts['features']:
    guest_score = min(5, guest_score + 2)  # +2 instead of +1
    guest_reasons.append("Has pool (major plus!)")
```

---

## 💡 When to Use GPT

### Use GPT when:
- ❓ Property description is vague/incomplete
- ❓ Subjective quality matters (views, charm, uniqueness)
- ❓ Need cultural/local knowledge
- ❓ Final decision on expensive properties

### DON'T use GPT when:
- ✅ Basic facts are clear (land size, bedrooms, etc.)
- ✅ Objective criteria are sufficient
- ✅ Initial filtering/screening
- ✅ Just need to remove obvious bad matches

---

## 🎨 Hybrid Approach Example

**Scenario:** You have 186 properties

**Step 1:** Deterministic analysis on ALL
- Cost: $0
- Time: ~5 minutes
- Filters out ~50% of properties

**Step 2:** Manual review of top 93 properties
- Look at photos and descriptions
- Select ~20 most interesting

**Step 3:** GPT analysis on ONLY those 20
- Cost: ~$0.40
- Time: ~1 minute
- Get subjective quality assessment

**Total Cost:** $0.40 instead of $3-5!
**Savings:** roughly 90% (only 20 of 186 properties go to GPT)
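
The arithmetic behind this scenario, as a small sketch using the per-property cost range quoted at the top of this guide (the function and variable names are illustrative only):

```python
COST_PER_PROPERTY = (0.02, 0.03)  # USD, low and high estimate from this guide


def gpt_cost(n_properties):
    """Return the (low, high) GPT cost in USD for n_properties."""
    low, high = COST_PER_PROPERTY
    return (round(n_properties * low, 2), round(n_properties * high, 2))


full_run = gpt_cost(186)  # GPT on every property
top_only = gpt_cost(20)   # GPT on the shortlisted 20 only

# The saving depends only on the share of properties sent to GPT,
# not on the per-property price.
savings = 1 - 20 / 186

print(f"Full run: ${full_run[0]:.2f}-${full_run[1]:.2f}")
print(f"Top 20:   ${top_only[0]:.2f}-${top_only[1]:.2f}")
print(f"Savings:  {savings:.0%}")
```

The quoted ~$0.40 is the low end of the range for 20 properties.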

---

## 📈 Cost Projection

### Current System (GPT for everything):
- Initial analysis: $5
- Re-analysis after changes: $5
- 4 re-analyses per year: $20/year

### New System (Deterministic):
- Initial analysis: $0
- Re-analysis after changes: $0
- 4 re-analyses per year: $0/year
- Optional GPT on top 20: $1.60/year

**Annual Savings:** $18+
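
The projection can be reproduced with a few lines. This sketch follows the figures above, which count the four yearly re-analyses at about $5 each and leave the one-time initial run out of the annual total; the names are illustrative.

```python
def annual_cost(cost_per_run, runs_per_year=4):
    """Yearly GPT spend in USD for a given cost per analysis run."""
    return round(cost_per_run * runs_per_year, 2)


full_gpt = annual_cost(5.00)     # GPT on all properties, 4 runs
top20_gpt = annual_cost(0.40)    # GPT on the top 20 only, 4 runs
annual_savings = round(full_gpt - top20_gpt, 2)

print(f"Full GPT: ${full_gpt:.2f}/year")
print(f"Top 20:   ${top20_gpt:.2f}/year")
print(f"Savings:  ${annual_savings:.2f}/year")
```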

---

## 🔄 Migration Path

### Option A: Fresh Start (Recommended)
```bash
# Backup current data
cp analysis_output.csv analysis_output_gpt_backup.csv

# Run deterministic analyzer
/usr/bin/python3 deterministic_analyzer.py

# Compare results
# Deterministic scores are in columns: det_market_garden, det_guest, etc.
# GPT scores are in columns: market_garden, guest_accommodation, etc.
```
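
To make the "Compare results" step concrete, here is a hedged sketch that lists rows where the deterministic and GPT scores disagree by 2 points or more. It assumes both sets of columns live in `analysis_output.csv`; only the two column pairs named above are included, and `score_gaps`, `load_rows`, and `min_gap` are hypothetical names.

```python
import csv

# Column pairs named in this guide; add the remaining pairs as needed.
SCORE_PAIRS = [
    ("det_market_garden", "market_garden"),
    ("det_guest", "guest_accommodation"),
]


def score_gaps(rows, min_gap=2):
    """Return (row_number, det_column, det_score, gpt_score) for big gaps."""
    gaps = []
    for number, row in enumerate(rows, start=1):
        for det_col, gpt_col in SCORE_PAIRS:
            try:
                det, gpt = float(row[det_col]), float(row[gpt_col])
            except (KeyError, ValueError, TypeError):
                continue  # skip rows where either score is missing or blank
            if abs(det - gpt) >= min_gap:
                gaps.append((number, det_col, det, gpt))
    return gaps


def load_rows(path="analysis_output.csv"):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `score_gaps(load_rows())` from the `scraper` directory would print nothing by itself; wrap it in `print(...)` or loop over the result to review the disagreements by hand.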

### Option B: Keep Both
Use deterministic for initial screening, GPT for final decisions:

```bash
# Screen with deterministic
/usr/bin/python3 deterministic_analyzer.py

# Filter to top properties
/usr/bin/python3 smart_unfavorite.py --dry-run

# Then manually run GPT on remaining high-potential properties
```

---

## 📊 Accuracy Comparison

### What Deterministic Gets Right:
- ✅ Land size assessment (very accurate)
- ✅ Building suitability (accurate)
- ✅ Bedroom count (exact when the listing states it)
- ✅ Location preference (rule-based)
- ✅ Risk assessment (based on data completeness)

### What GPT Does Better:
- 🎯 Subjective quality ("charming", "authentic")
- 🎯 Local market assessment
- 🎯 Regulatory/legal considerations
- 🎯 Cultural factors
- 🎯 Nuanced descriptions

### Reality Check:
For screening 186 properties down to 20-30 interesting ones, **deterministic is MORE than sufficient**. Save GPT for the final candidates.

---

## 🎯 Recommended Configuration

**config.json** for deterministic analysis. The `max_risk` value `"Gemiddeld"` is Dutch for "Medium":

```json
{
  "use_deterministic": true,
  "use_gpt": false,
  "gpt_only_for_top": 20,
  "unfavorite_thresholds": {
    "overall_score": 0.0,
    "det_market_garden": 3,
    "det_guest": 3,
    "det_workshop": 2,
    "det_rental": 3,
    "det_location": 3,
    "det_market": 3,
    "max_risk": "Gemiddeld",
    "require_all": false
  }
}
```
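
One way to read the `det_*` thresholds is as minimum scores. The sketch below lists which thresholds a property falls below, under that assumption. It does not reproduce smart_unfavorite.py: how `require_all`, `overall_score`, and `max_risk` are combined there is not described in this guide, so they are left out, and `failed_thresholds` is a hypothetical name.

```python
import json

# A subset of the config shown above, inlined so the sketch is self-contained.
CONFIG_TEXT = """
{
  "unfavorite_thresholds": {
    "overall_score": 0.0,
    "det_market_garden": 3,
    "det_guest": 3,
    "det_workshop": 2,
    "max_risk": "Gemiddeld",
    "require_all": false
  }
}
"""


def failed_thresholds(scores, thresholds):
    """Return the det_* keys where the property's score is below the minimum."""
    failed = []
    for key, minimum in thresholds.items():
        if not key.startswith("det_"):
            continue  # skip non-score settings such as max_risk
        score = scores.get(key)
        if score is not None and score < minimum:
            failed.append(key)
    return failed


thresholds = json.loads(CONFIG_TEXT)["unfavorite_thresholds"]
```

For instance, `failed_thresholds({"det_market_garden": 2, "det_guest": 4}, thresholds)` reports only `det_market_garden`, since a missing `det_workshop` score is not counted as a failure in this sketch.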

---

## 🆘 FAQ

**Q: Is deterministic analysis less accurate?**
A: For objective facts (land size, bedrooms), it's MORE accurate. For subjective quality, GPT is better. Use deterministic for screening.

**Q: Can I still use GPT?**
A: Yes! Run deterministic first, then run GPT on interesting properties only.

**Q: Will my existing data be lost?**
A: No. Deterministic scores go in separate columns (det_*). Your GPT scores stay in the original columns.

**Q: How do I switch back to GPT?**
A: Just run analyze_from_urls.py anytime. Both systems can coexist.

**Q: What if I want BOTH?**
A: Perfect! Run deterministic first (free), then selectively run GPT on properties you're serious about.

---

## 💰 Bottom Line

**Current annual cost:** ~$20
**With deterministic:** ~$0-2
**Savings:** 90%+
**Speed:** 2-3x faster
**Scalability:** Unlimited

**Recommendation:** Use deterministic for screening, GPT for final decisions on top candidates.

---

**Ready to save money? Run:**
```bash
cd scraper
/usr/bin/python3 deterministic_analyzer.py
```
