V1, V2, and V3 comparison

Each attached notebook represents a different patient/control classifier design. V3 is highlighted because it is the most defensible model for presentation and testing.

V3 recommended

V1 Underfitted

Individual-Level Baseline

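Uses one row per person with an 80/10/10 train/validation/test split. It is simple and leakage-aware, but with only 10 individuals the split leaves about 8 rows for training and 1 each for validation and testing, which is too little signal to learn from or to evaluate reliably.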

Notebook: patient_classifier_colab.ipynb
Strength: Clean baseline
Weakness: Too few rows to learn stable patterns
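
To make the sample-size problem concrete, here is a minimal sketch of an individual-level 80/10/10 split. It assumes the 10 examples are 10 distinct people; the `split_subjects` helper and the `subject_XX` IDs are hypothetical and are not taken from the notebook.

```python
import random


def split_subjects(subject_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle subject IDs and cut them into train/validation/test lists."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split is repeatable
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


# Hypothetical IDs standing in for the 10 individuals.
subjects = [f"subject_{i:02d}" for i in range(10)]
train, val, test = split_subjects(subjects)
print(len(train), len(val), len(test))  # prints: 8 1 1
```

With a single test subject, the test accuracy can only be 0% or 100%, so the number says very little about how the model would behave on new people.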

V2 Overfitted

Session-Level Exploratory Model

Uses session-level rows and leave-one-subject-out evaluation. It reports stronger performance, but its modeling is more aggressive and likely over-tuned to the small workbook.

Notebook: patient_classifier_colabV2.ipynb
Strength: Uses more rows
Weakness: Exploratory and overfit-prone
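
Leave-one-subject-out evaluation on session-level rows can be sketched as follows. This is an illustration of the general technique, not the notebook's code; the `leave_one_subject_out` helper and the subject labels are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_rows, test_rows), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train_rows = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_rows = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_rows, test_rows


# Hypothetical session table: one entry per session, valued by its subject.
session_subjects = ["s1", "s1", "s2", "s2", "s2", "s3"]

folds = list(leave_one_subject_out(session_subjects))
for held_out, train_rows, test_rows in folds:
    # No session from the held-out subject may appear in the training rows.
    assert held_out not in {session_subjects[i] for i in train_rows}
    print(held_out, "train rows:", train_rows, "test rows:", test_rows)
```

Grouping by subject keeps all of a person's sessions on the same side of the split. It does not protect against overfitting that comes from repeatedly adjusting the model after looking at these fold scores, which is the concern with an exploratory version.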

V3 Most defensible

Conservative Locked Model

Uses session-level features, leave-one-subject-out testing, non-distributional metrics, feature selection, and a locked regularized logistic model. This is the best version to present.

Notebook: patient_classifier_colabV3.ipynb
Reported target: ~80% balanced accuracy
Why it wins: Best balance of performance and restraint
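
A minimal sketch of this kind of setup with scikit-learn is shown below. It is not the notebook's implementation. The synthetic data, the number of selected features (`k=5`), and the regularization strength (`C=0.1`) are assumptions chosen for illustration, and "locked" is read here as "hyperparameters fixed before evaluation rather than tuned on the fold scores".

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 10 subjects, 4 sessions each, 20 features.
n_subjects, n_sessions, n_features = 10, 4, 20
groups = np.repeat(np.arange(n_subjects), n_sessions)  # subject ID per session row
y = np.repeat(np.array([0, 1] * (n_subjects // 2)), n_sessions)  # 0=control, 1=patient
X = rng.normal(size=(n_subjects * n_sessions, n_features))
X[y == 1, :3] += 1.0  # give the first three features some class signal

# Scaling and feature selection live inside the pipeline, so they are refit
# on the training subjects of every fold and never see the held-out subject.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=5),            # k is an illustrative choice
    LogisticRegression(C=0.1, max_iter=1000),  # fixed L2 strength, not tuned
)

preds = cross_val_predict(model, X, y, groups=groups, cv=LeaveOneGroupOut())
score = balanced_accuracy_score(y, preds)
print(f"balanced accuracy: {score:.2f}")
```

Balanced accuracy is the mean of the per-class recalls, so a model that always predicts the majority class scores 50% rather than benefiting from class imbalance.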

Recommendation

Use V3 for the final workflow. Keep V1 as the baseline and V2 as an exploratory comparison, but avoid presenting V2 as the primary model because its higher performance may not generalize.