Model performance improvement tips
Created on: January 10, 2025
Below is a function that trains an ML model for a trading bot. After training we currently get ROC-AUC=0.537, Precision=0.452, Recall=0.999, F1=0.622, Best threshold=0.35, and I need to improve these values.
Code:
```python
import logging
import pickle
from pathlib import Path

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# get_tickers_usdt, load_all_data, add_technical_indicators, add_target,
# encode_categorical, prepare_features, TIMEFRAMES and DATA_LIMIT are
# defined elsewhere in the bot.

async def train_unified_model_with_calibration_upd():
    symbols = await get_tickers_usdt()
    if not symbols:
        logging.error("No symbols found for USDT.")
        return

    # 2) Load the data
    df_all = await load_all_data(symbols, TIMEFRAMES, DATA_LIMIT)
    if df_all.empty:
        logging.error("No data loaded.")
        return
    logging.info(f"Loaded data shape: {df_all.shape}")

    # 3) Add technical indicators
    df_all = add_technical_indicators(df_all)

    # 4) Add the target
    df_all = add_target(df_all, forecast_horizon=5)

    # 5) Encode categorical features
    df_all = encode_categorical(df_all)

    # 6) Prepare the features and the target variable
    X, y, feature_cols = prepare_features(df_all)
    logging.info(f"Feature columns: {feature_cols}")

    # 7) Split the data into train / test first
    X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
    logging.info(f"After 1st split: trainval={len(X_trainval)}, test={len(X_test)}")

    # Now split trainval (80%) into the actual train (60%) and val (20%)
    X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, shuffle=False)
    logging.info(f"Final split sizes: train={len(X_train)}, val={len(X_val)}, test={len(X_test)}")

    # Handle class imbalance
    smote = SMOTE(random_state=42)
    X_train, y_train = smote.fit_resample(X_train, y_train)

    # 8) Hyperparameter tuning for RandomForest
    rf_param_grid = {
        'n_estimators': [50, 100, 200],
        'max_depth': [4, 6, 8],
        'min_samples_split': [2, 5, 10]
    }
    rf_grid_search = GridSearchCV(RandomForestClassifier(random_state=42), rf_param_grid,
                                  cv=5, scoring='roc_auc', n_jobs=-1)
    rf_grid_search.fit(X_train, y_train)
    best_rf_model = rf_grid_search.best_estimator_
    train_score = best_rf_model.score(X_train, y_train)
    val_score = best_rf_model.score(X_val, y_val)
    logging.info(f"Best RF Model Train Accuracy: {train_score:.3f}, Val Accuracy: {val_score:.3f}")

    # 9) Probability calibration
    calibrated_model = CalibratedClassifierCV(estimator=best_rf_model, method='sigmoid', cv='prefit')
    calibrated_model.fit(X_val, y_val)

    # Test
    proba_val = calibrated_model.predict_proba(X_val)[:, 1]
    y_pred_thr = (proba_val >= 0.5).astype(int)
    logging.info(f"Predicted positive samples: {np.sum(y_pred_thr)}")

    # 10) Threshold selection, also on val
    proba_val = calibrated_model.predict_proba(X_val)[:, 1]
    thresholds = np.linspace(0, 1, 101)
    best_thr = 0.5
    best_f1 = 0.0
    for thr in thresholds:
        y_pred_thr = (proba_val >= thr).astype(int)
        precision, recall, f1, _ = precision_recall_fscore_support(y_val, y_pred_thr,
                                                                   average='binary', zero_division=0)
        if f1 > best_f1:
            best_f1 = f1
            best_thr = thr
    logging.info(f"Best threshold on val={best_thr:.2f} with F1={best_f1:.3f}")

    # 11) Evaluate the final model on the test set (with the selected threshold, without "extra" calibration)
    proba_test = calibrated_model.predict_proba(X_test)[:, 1]
    y_pred_test = (proba_test >= best_thr).astype(int)

    # Compute metrics on test
    test_precision, test_recall, test_f1, _ = precision_recall_fscore_support(y_test, y_pred_test, average='binary')
    test_roc_auc = roc_auc_score(y_test, proba_test)
    logging.info(
        f"Final Test metrics: ROC-AUC={test_roc_auc:.3f}, "
        f"Precision={test_precision:.3f}, Recall={test_recall:.3f}, F1={test_f1:.3f}, Best threshold={best_thr:.2f}"
    )

    # 12) Save the calibrated model, feature list, and best_thr
    Path("models").mkdir(exist_ok=True)
    with open("models/ml_unified_model.pkl", "wb") as f:
        pickle.dump((calibrated_model, feature_cols, best_thr), f)

    # 13) Save the symbol / tf mappings
    symbol_mapping = {symbol: idx for idx, symbol in enumerate(df_all['symbol'].unique())}
    tf_mapping = {tf: idx for idx, tf in enumerate(df_all['tf'].unique())}
    with open("models/symbol_mapping.pkl", "wb") as f:
        pickle.dump(symbol_mapping, f)
    with open("models/tf_mapping.pkl", "wb") as f:
        pickle.dump(tf_mapping, f)

    logging.info("Calibrated model + threshold saved to models/ml_unified_model.pkl")
    return calibrated_model, feature_cols, best_thr
```
Below are some suggestions and ideas you can try to improve model performance (i.e., achieve higher ROC-AUC, precision, and F1, even if that means giving up some of the current near-perfect recall). Some may or may not apply exactly to your scenario, but in practice they often help boost predictive power for trading models.
Since you're working on a trading (time-series) problem, a standard train_test_split(..., shuffle=False) may not fully reflect the temporal dependencies in your data. Instead, consider using TimeSeriesSplit (or a custom walk-forward split) to respect the order of time:
```python
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)  # or however many splits you want
rf_grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=rf_param_grid,
    cv=tscv,
    scoring='roc_auc',
    n_jobs=-1
)
```
TimeSeriesSplit ensures each split trains on past data and tests on future data, preventing look-ahead bias and giving you a more realistic performance estimate.
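For instance, a minimal walk-forward sketch could look like this. It assumes X and y are the pandas objects returned by your prepare_features and that rows are already in chronological order:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    # Each fold trains on everything up to a point in time and validates on the block after it
    model = RandomForestClassifier(random_state=42)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    proba = model.predict_proba(X.iloc[val_idx])[:, 1]
    print(f"Fold {fold}: ROC-AUC={roc_auc_score(y.iloc[val_idx], proba):.3f}")
```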
Right now, you’re exploring only a small parameter grid:
```python
rf_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [4, 6, 8],
    'min_samples_split': [2, 5, 10]
}
```
Consider widening it: try more values for max_depth (3, 5, 7, 9, 11, None), add min_samples_leaf, or incorporate max_features. You can also use RandomizedSearchCV or libraries like optuna for more efficient hyperparameter searches (a RandomizedSearchCV sketch follows the grid below). Example:
```python
rf_param_grid = {
    'n_estimators': [50, 100, 200, 300, 500],
    'max_depth': [3, 5, 7, 9, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 5, 10],
    'max_features': ['sqrt', 'log2']
}
```
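If you go the RandomizedSearchCV route, a minimal sketch might look like this (n_iter and the distributions are illustrative; tscv is the TimeSeriesSplit from earlier):

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Sample 30 random parameter combinations instead of exhaustively trying every one
rf_random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions={
        'n_estimators': randint(50, 500),
        'max_depth': [3, 5, 7, 9, None],
        'min_samples_split': randint(2, 11),
        'min_samples_leaf': randint(1, 11),
        'max_features': ['sqrt', 'log2'],
    },
    n_iter=30,
    cv=tscv,              # the TimeSeriesSplit defined earlier
    scoring='roc_auc',
    n_jobs=-1,
    random_state=42,
)
rf_random_search.fit(X_train, y_train)
best_rf_model = rf_random_search.best_estimator_
```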
You have add_target(df_all, forecast_horizon=5), which presumably labels whether the price moves up/down in the next 5 candles. Sometimes the definition of an up/down move or the threshold for "significant" movement can cause extreme class imbalance or lead to many false positives.
Making your target less noisy (i.e. only “1” if a certain price change threshold is exceeded) often yields a better signal.
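As an illustration only (your add_target implementation may differ), a threshold-based target could be defined roughly like this, assuming the frame has 'symbol', 'tf', and 'close' columns:

```python
import pandas as pd

def add_threshold_target(df: pd.DataFrame, horizon: int = 5, min_move: float = 0.002) -> pd.DataFrame:
    # Forward return over the next `horizon` candles, computed per symbol/timeframe
    # so the shift never crosses series boundaries
    future_close = df.groupby(['symbol', 'tf'])['close'].shift(-horizon)
    forward_return = (future_close - df['close']) / df['close']
    # Label 1 only when the move exceeds min_move (e.g. 0.2%); rows without a full
    # future window are dropped instead of getting a bogus label
    df = df.assign(target=(forward_return > min_move).astype(int))
    return df[forward_return.notna()]
```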
In trading, good features often matter more than the exact choice of model. On top of the standard indicators, you can run a feature-selection step such as SelectKBest, RFE, or Boruta
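A rough RFE sketch (the number of features to keep is arbitrary; X_train, X_val, and feature_cols come from your existing pipeline):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Keep the 20 "best" features according to RF importances, dropping 5 per iteration
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=100, random_state=42),
    n_features_to_select=20,
    step=5,
)
selector.fit(X_train, y_train)
selected_cols = [col for col, keep in zip(feature_cols, selector.support_) if keep]

X_train_sel = selector.transform(X_train)
X_val_sel = selector.transform(X_val)
```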
on top of the RF to prune unhelpful indicators.

You are already using SMOTE on the training set:
```python
smote = SMOTE(random_state=42)
X_train, y_train = smote.fit_resample(X_train, y_train)
```
As alternatives, you can let the model weight classes instead of resampling (class_weight='balanced' in RandomForest), or make the oversampling less aggressive (SMOTE's sampling_strategy parameter); both options are sketched below.
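A minimal sketch of both options (the 0.5 ratio is illustrative and only applies if the minority class currently makes up less than 50% of the majority class):

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

# Option A: skip resampling and let the forest reweight classes instead
rf_balanced = RandomForestClassifier(random_state=42, class_weight='balanced')

# Option B: oversample the minority class only part of the way
# (to 50% of the majority class size rather than a full 1:1 ratio)
smote = SMOTE(random_state=42, sampling_strategy=0.5)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
```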
Your high recall (0.999) indicates you are predicting almost everything as positive at the chosen threshold (0.35). That kills precision. One approach is to rethink how the threshold is chosen:
Sometimes the best F1 threshold isn’t the best for actual trading. If you’d rather have fewer signals but more accurate ones, you might want to increase the threshold to push precision up.
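One way to do that, sketched here under the assumption that proba_val and y_val are the validation probabilities and labels from your function (the 0.60 target is arbitrary):

```python
from sklearn.metrics import precision_recall_curve

target_precision = 0.60
precisions, recalls, thresholds = precision_recall_curve(y_val, proba_val)
# precision_recall_curve returns one more precision/recall value than thresholds,
# so pair precisions[:-1] with thresholds
candidates = [thr for prec, thr in zip(precisions[:-1], thresholds) if prec >= target_precision]
# The lowest threshold that still meets the precision target keeps as much recall
# as possible; fall back to 0.5 if the target is never reached on the validation set
best_thr = min(candidates) if candidates else 0.5
```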
Double-check that your calibration approach (CalibratedClassifierCV
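For example, swapping in isotonic calibration on the same pre-fit RF is a one-line change (isotonic generally needs a reasonably large validation set to avoid overfitting):

```python
from sklearn.calibration import CalibratedClassifierCV

# Same pre-fit best_rf_model, calibrated on the validation set with isotonic regression
calibrated_iso = CalibratedClassifierCV(estimator=best_rf_model, method='isotonic', cv='prefit')
calibrated_iso.fit(X_val, y_val)
proba_val_iso = calibrated_iso.predict_proba(X_val)[:, 1]
```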
with sigmoid) is appropriate. Sometimes isotonic calibration works better, or an ensemble approach might do better than a single RF.

Finally, a simple classification metric like F1 or precision/recall is not always enough for a trading system. You might find it more useful to evaluate the model on trading-oriented criteria as well, for example by backtesting the signals it generates, rather than relying on classification metrics alone.
By iterating on these points, especially the quality of your features, time-series cross-validation, and a threshold balanced for your trading goals, you should see improvements in ROC-AUC, precision, and overall F1 (while likely reducing that near-100% recall, which is usually a sign of many false positives). Good luck!