SWI Classification — ML Results

Model performance overview

Results from 5-fold stratified cross-validation. Balanced accuracy (the mean of per-class recall) corrects for class imbalance; macro-F1 averages the F1-score of the three classes with equal weight, regardless of class frequency.

Random Forest

Gradient Boosting

Accuracy is inflated by class dominance (fresh ≫ brackish/saline). Balanced accuracy and macro-F1 are the reliable metrics here.
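
To make the inflation concrete, the following minimal sketch computes plain accuracy and balanced accuracy from one confusion matrix. The counts are invented for illustration and are not results from this study; the function names are hypothetical.

```python
# Illustrative confusion matrix (rows = true class, columns = predicted class).
# The counts are invented for this example, NOT taken from the study.
CLASSES = ["fresh", "brackish", "saline"]
CM = [
    [90, 5, 5],   # true fresh
    [6, 2, 2],    # true brackish
    [4, 1, 5],    # true saline
]


def accuracy(cm):
    """Fraction of all observations that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def balanced_accuracy(cm):
    """Mean of per-class recall, so every class counts equally."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(recalls) / len(recalls)


print(f"accuracy          = {accuracy(CM):.2f}")
print(f"balanced accuracy = {balanced_accuracy(CM):.2f}")
```

With these invented counts, accuracy is about 0.81 while balanced accuracy is about 0.53: the dominant fresh class hides the poor recall on the two minority classes.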

Confusion matrix

Aggregated across all 5 CV folds. Rows = true class, columns = predicted class. Off-diagonal cells are misclassifications.
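
One way to build such an aggregated matrix is to count (true, predicted) pairs per fold and sum the per-fold matrices element-wise. The sketch below assumes this approach; the helper names and the tiny label lists are hypothetical.

```python
CLASSES = ["fresh", "brackish", "saline"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Count (true, predicted) pairs; rows = true class, columns = predicted."""
    index = {name: i for i, name in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def aggregate(matrices):
    """Element-wise sum of per-fold confusion matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


# Two tiny hypothetical folds (labels invented for illustration).
fold_1 = confusion_matrix(
    ["fresh", "fresh", "brackish", "saline"],
    ["fresh", "fresh", "fresh", "saline"],
)
fold_2 = confusion_matrix(
    ["fresh", "brackish", "saline", "saline"],
    ["fresh", "brackish", "brackish", "saline"],
)
total = aggregate([fold_1, fold_2])
# total == [[3, 0, 0], [1, 1, 0], [0, 1, 2]]
```

Because every observation falls in exactly one test fold, the summed matrix covers each observation once.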

Misclassification analysis

Feature importance

Two complementary methods are reported. MDI (Mean Decrease in Impurity) is taken from the Random Forest fitted on the full dataset. Permutation importance measures the drop in balanced accuracy when each feature is randomly shuffled on a held-out split, which avoids the bias of MDI toward high-cardinality features.
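
The permutation procedure can be sketched from scratch as follows. This is an illustration only: `toy_model` is a hypothetical stand-in for a fitted classifier, the data are synthetic, and the number of repeats is an assumption. scikit-learn offers `sklearn.inspection.permutation_importance` for the same purpose; whether the pipeline uses it is not stated here.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def toy_model(row):
    """Hypothetical stand-in for a fitted classifier: uses only column 0."""
    distance_coast_m = row[0]
    if distance_coast_m < 500:
        return "saline"
    if distance_coast_m < 1500:
        return "brackish"
    return "fresh"


def permutation_importance(predict, X, y, column, n_repeats=10, seed=0):
    """Mean drop in balanced accuracy after shuffling one column."""
    rng = random.Random(seed)
    baseline = balanced_accuracy(y, [predict(r) for r in X])
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in X]
        rng.shuffle(shuffled)
        # Rebuild each row with only the chosen column replaced.
        X_perm = [r[:column] + [v] + r[column + 1:] for r, v in zip(X, shuffled)]
        drops.append(baseline - balanced_accuracy(y, [predict(r) for r in X_perm]))
    return sum(drops) / n_repeats


# Synthetic held-out split: column 0 drives the label, column 1 is pure noise.
rng = random.Random(42)
X_test = [[float(d), rng.random()] for d in range(0, 3000, 50)]
y_test = [toy_model(r) for r in X_test]

imp_distance = permutation_importance(toy_model, X_test, y_test, column=0)
imp_noise = permutation_importance(toy_model, X_test, y_test, column=1)
```

Shuffling the noise column leaves every prediction unchanged, so its importance is exactly zero, while shuffling the distance column breaks the link to the label and yields a positive drop.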

Per-class metrics

Precision, recall and F1-score for each salinity class across both models. Aggregated from 5 CV folds.
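
For reference, per-class precision, recall and F1 can be derived from a confusion matrix as in the sketch below. The counts are invented for illustration and are not results from this study; the function names are hypothetical.

```python
def per_class_metrics(cm, classes):
    """Return {class: (precision, recall, f1)} from a confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    metrics = {}
    for i, name in enumerate(classes):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)   # column sum
        actual = sum(cm[i])                     # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = (precision, recall, f1)
    return metrics


def macro_f1(metrics):
    """Unweighted mean of per-class F1."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)


# Invented counts for illustration only (not results from this study).
classes = ["fresh", "brackish", "saline"]
cm = [
    [8, 2, 0],
    [2, 2, 0],
    [0, 0, 4],
]
metrics = per_class_metrics(cm, classes)
for name, (p, r, f1) in metrics.items():
    print(f"{name:9s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro-F1 = {macro_f1(metrics):.2f}")
```

The zero-division guards matter for rare classes: a class that is never predicted has undefined precision, which this sketch reports as 0.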

Methodology

Design decisions and scientific rationale for the ML pipeline.

Feature engineering

Lag windows	7d · 30d · 90d · 180d
Precip / PET / ET0	sum over window
Temp · humidity · wind · radiation	mean over window
Soil moisture (4 depths)	mean over window
Soil temperature (4 depths)	mean over window
Water balance (P − PET)	sum over window
Static features	distance_coast_m · depth · doy · year · month
Total features	85
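
The windowed aggregation in the table can be sketched as below. Several details are assumptions that the table does not specify: whether the window includes the sampling day, the feature naming scheme, and how windows with insufficient history are handled. The daily series are synthetic.

```python
LAG_WINDOWS_DAYS = (7, 30, 90, 180)


def window_features(daily, name, how, end, windows=LAG_WINDOWS_DAYS):
    """Aggregate the w days ending at index `end` (inclusive) for each window.

    daily: list of daily values, oldest first.
    how:   "sum" for fluxes (precipitation, PET), "mean" for state variables.
    Returns None for a window that reaches before the start of the record.
    """
    features = {}
    for w in windows:
        start = end - w + 1
        key = f"{name}_{how}_{w}d"
        if start < 0:
            features[key] = None  # not enough history (assumed handling)
            continue
        values = daily[start:end + 1]
        total = sum(values)
        features[key] = total if how == "sum" else total / w
    return features


# Synthetic daily series, invented for illustration: 200 days.
precip_mm = [1.0] * 200                        # constant 1 mm/day
temp_c = [float(d % 10) for d in range(200)]   # repeating 0..9 pattern

sample_day = 199  # index of the sampling date in the daily record
row = {}
row.update(window_features(precip_mm, "precip", "sum", sample_day))
row.update(window_features(temp_c, "temp", "mean", sample_day))
```

Each variable contributes one feature per lag window, so a sample row is the union of these dictionaries plus the static features.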

Class thresholds (Cl⁻ mg/L)

Fresh	< 250 mg/L
Brackish	250 – 1000 mg/L
Saline	> 1000 mg/L
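
A minimal sketch of the labelling rule follows. The function name is hypothetical, and the boundary handling is an assumption read from the table: values of exactly 250 and 1000 mg/L are treated as brackish.

```python
def salinity_class(chloride_mg_l):
    """Map a chloride concentration (mg/L) to a salinity class.

    Boundary handling is an assumption: 250 and 1000 mg/L are both
    treated as brackish, following "< 250" and "> 1000" in the table.
    """
    if chloride_mg_l < 0:
        raise ValueError("chloride concentration cannot be negative")
    if chloride_mg_l < 250:
        return "fresh"
    if chloride_mg_l <= 1000:
        return "brackish"
    return "saline"


labels = [salinity_class(c) for c in (40.0, 250.0, 1000.0, 5200.0)]
# labels == ["fresh", "brackish", "brackish", "saline"]
```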

Model configuration

RF n_estimators	200 (CV) · 500 (importance)
RF class_weight	balanced
GB n_estimators	200
GB max_depth	4
GB learning_rate	0.05
GB imbalance	sample_weight (balanced)
CV strategy	StratifiedKFold · 5 folds
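
The configuration above maps onto scikit-learn roughly as in this sketch. It runs on synthetic data, not the study data. The settings `shuffle=True` and every `random_state` value are assumptions, since the table only states StratifiedKFold with 5 folds. The 500-tree forest used for importance is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic imbalanced 3-class data standing in for the real feature table.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3,
    weights=[0.8, 0.12, 0.08], random_state=0,
)


def make_models():
    """Fresh, unfitted models using the settings listed in the table."""
    return {
        "rf": RandomForestClassifier(
            n_estimators=200, class_weight="balanced", random_state=0
        ),
        "gb": GradientBoostingClassifier(
            n_estimators=200, max_depth=4, learning_rate=0.05, random_state=0
        ),
    }


# shuffle=True and random_state=0 are assumptions (see lead-in).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {"rf": [], "gb": []}
for train_idx, test_idx in cv.split(X, y):
    for name, model in make_models().items():
        if name == "gb":
            # GradientBoostingClassifier has no class_weight parameter, so
            # imbalance is handled through per-sample weights at fit time.
            weights = compute_sample_weight("balanced", y[train_idx])
            model.fit(X[train_idx], y[train_idx], sample_weight=weights)
        else:
            model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        scores[name].append(
            (balanced_accuracy_score(y[test_idx], pred),
             f1_score(y[test_idx], pred, average="macro"))
        )

# Mean (balanced accuracy, macro-F1) over the 5 folds for each model.
summary = {name: np.mean(vals, axis=0) for name, vals in scores.items()}
```

Computing the sample weights on the training indices only keeps the test fold out of the weighting step.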

Tordera: severe class imbalance warning. Only 2 saline and 10 brackish observations out of 293 total (0.7% and 3.4%). With 5-fold CV, at least three folds contain no saline test observation, so no model can reliably learn or be evaluated on the minority classes with this distribution. Results for the fresh class are valid; brackish and saline predictions are not meaningful. More balanced sampling, or combining both datasets, is recommended before deployment.
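
The arithmetic behind this warning can be checked with a short sketch. It uses the counts quoted above; the fresh count is derived as the remainder under the assumption that there are only three classes, and the weight formula is the usual "balanced" heuristic, n_samples / (n_classes * class count).

```python
# Counts from the text: 293 observations, 10 brackish, 2 saline.
# The fresh count is derived as the remainder, assuming three classes only.
counts = {"fresh": 293 - 10 - 2, "brackish": 10, "saline": 2}
n_samples = sum(counts.values())
n_classes = len(counts)

# "Balanced" heuristic: weight_c = n_samples / (n_classes * count_c).
balanced_weight = {c: n_samples / (n_classes * n) for c, n in counts.items()}

# In 5-fold CV each observation is tested exactly once, so a class with k
# observations can appear in at most k test folds.
n_folds = 5
min_folds_without_saline = max(0, n_folds - counts["saline"])

for c, w in balanced_weight.items():
    print(f"{c:9s} n={counts[c]:3d} weight={w:.2f}")
print("test folds without any saline sample: at least", min_folds_without_saline)
```

Under this weighting each saline observation counts about 140 times as much as a fresh one (281 / 2 = 140.5), so two individual samples dominate the minority-class signal.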

Ter: Gradient Boosting performs well overall. Macro-F1 = 0.73 and balanced accuracy = 0.71 under 5-fold CV. The brackish class remains the hardest (F1 = 0.32), as expected with only 19 observations. The saline class achieves F1 = 0.91, driven strongly by the static predictors distance_coast_m and depth.

Key drivers identified. Distance from coast and depth dominate both MDI and permutation importance; these are the primary physical controls. Among the meteorological features, 30-day precipitation and deep soil moisture (100–255 cm) emerge as the most hydrogeologically meaningful lags, consistent with the expected aquifer response time.