# Evaluation Results Summary

## Quick Overview

- **Dataset**: 56 document samples
- **Best Approach**: Crop (No Shadow Removal)
- **Performance Gain**: +18.8 F1 points over baseline

## Performance Comparison (Ranked from Lowest to Highest)

| Approach | Precision | Recall | F1-Score | Field Accuracy | F1 Gain vs. Baseline (points) |
|----------|-----------|--------|----------|----------------|-------------------------------|
| **No Preprocessing** | 79.0% | 68.7% | 73.5% | 68.7% | Baseline |
| **Crop + PaddleOCR + Shadow Removal + Cache** | 92.5% | 88.3% | 90.3% | 88.3% | +16.8 |
| **Crop + Shadow Removal + Cache** | 93.6% | 88.5% | 91.0% | 88.5% | +17.5 |
| **Crop + PaddleOCR + Shadow Removal** | 93.6% | 89.4% | 91.5% | 89.4% | +18.0 |
| **Crop** | 94.8% | 89.9% | 92.3% | 89.9% | +18.8 |

## Top Performing Fields

- **Gender**: 85.1% F1 (Crop + PaddleOCR + Shadow Removal)
- **Birth Date**: 80.5% F1 (Crop + PaddleOCR + Shadow Removal)
- **Document Type**: 85.4% F1 (Crop + PaddleOCR + Shadow Removal)
- **Surname**: 82.9% F1 (Crop + PaddleOCR + Shadow Removal)

## Key Insights

1. **Cropping** provides the biggest performance boost
2. **PaddleOCR + Shadow Removal** adds small but consistent improvement
3. **Shadow removal** shows mixed results depending on field type
4. **Caching** has minimal impact on accuracy

## Recommendations

- Use **Crop + PaddleOCR + Shadow Removal** for production
- Focus on optimizing high-value fields
- Investigate MRZ line extraction further
- Target 65%+ overall F1-score

---

*See README.md for detailed analysis*
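
The F1 scores and baseline gains in the comparison table can be sanity-checked from the reported precision and recall. The following is a minimal sketch, assuming F1 is the harmonic mean of precision and recall (the summary does not state how it was computed) and that the improvement column is the absolute difference in F1 between an approach and the baseline. The function name `f1_score` and the `results` dictionary are illustrative, not part of the evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) in percent, copied from the comparison table.
results = {
    "No Preprocessing": (79.0, 68.7),
    "Crop": (94.8, 89.9),
}

baseline_f1 = f1_score(*results["No Preprocessing"])

for name, (precision, recall) in results.items():
    f1 = f1_score(precision, recall)
    # Gain is the absolute difference in F1, not a relative change.
    print(f"{name}: F1={f1:.1f}%, gain={f1 - baseline_f1:+.1f} points")
```

Because the table's precision and recall are already rounded to one decimal, a recomputed F1 can differ from the tabulated value by 0.1 for some rows.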