Model performance improvement tips
Created on: January 10, 2025
Triaging areas for improvement
I’m mapping out options to enhance the trading bot’s performance, such as refining data pre-processing, addressing class imbalance, tuning hyperparameters, and introducing advanced models like gradient boosting.
Refining threshold strategy
Considering threshold tuning to boost precision, I’m analyzing the high recall and low precision to potentially adjust the threshold from 0.35 to a range like 0.6 or 0.7.
Assessing current performance
I’m noting the low precision and high recall, suggesting the threshold might need adjustment. To improve, focus on feature engineering, data cleaning, and comprehensive hyperparameter tuning.
Advancing model strategy
I'm crafting an advanced approach, focusing on classifiers like XGBoost, time-series cross-validation, and feature selection. Considering label definition, class imbalance, horizon, and calibration for a sophisticated model progression.
Below are some suggestions and ideas you can try to improve model performance (i.e., achieve higher ROC-AUC, precision, and F1 while not sacrificing recall quite so drastically). Some may or may not apply exactly to your scenario, but in practice they often help boost predictive power for trading models.
Since you're working on a trading (time-series) problem, a standard `train_test_split(..., shuffle=False)` may not fully reflect the temporal dependencies in your data. Instead, consider using `TimeSeriesSplit` (or a custom walk-forward split) to respect the order of time:
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)  # or however many splits you want

rf_grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=rf_param_grid,
    cv=tscv,
    scoring='roc_auc',
    n_jobs=-1
)
```
TimeSeriesSplit ensures each split trains on past data and tests on future data, preventing look-ahead bias and giving you a more realistic performance estimate.
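To see that ordering concretely, here is a tiny runnable demonstration on a toy series (the indices, not the data, are the point):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy series of 10 "candles" to show how the splits are ordered in time.
X = np.arange(10).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Every training index precedes every test index -- no look-ahead.
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Each fold trains on an expanding window of the past and tests on the next block of the future.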
Right now, you’re exploring only a small parameter grid:
```python
rf_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [4, 6, 8],
    'min_samples_split': [2, 5, 10]
}
```
A few ways to widen the search: explore more `max_depth` values (3, 5, 7, 9, 11, None), try `min_samples_leaf`, or incorporate `max_features`. For larger search spaces, `RandomizedSearchCV` or libraries like optuna give more efficient hyperparameter searches. Example:

```python
rf_param_grid = {
    'n_estimators': [50, 100, 200, 300, 500],
    'max_depth': [3, 5, 7, 9, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 5, 10],
    'max_features': ['sqrt', 'log2']
}
```
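A minimal sketch of the `RandomizedSearchCV` route (using a synthetic stand-in for your data — swap in your real `X_train`/`y_train` and grid):

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic stand-in for your feature matrix and labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
y = (rng.random(300) > 0.5).astype(int)

# Distributions are sampled instead of exhaustively enumerated.
param_distributions = {
    'n_estimators': randint(50, 500),
    'max_depth': [3, 5, 7, 9, None],
    'min_samples_split': randint(2, 11),
    'min_samples_leaf': randint(1, 11),
    'max_features': ['sqrt', 'log2'],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=20,                      # tries 20 random combinations, not the full grid
    cv=TimeSeriesSplit(n_splits=3),
    scoring='roc_auc',
    n_jobs=-1,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```

The same `cv=TimeSeriesSplit(...)` idea from above carries over, so the random search also respects temporal order.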
You have `add_target(df_all, forecast_horizon=5)`, which presumably labels whether the price moves up/down in the next 5 candles. Sometimes the definition of an up/down move, or the threshold for a "significant" movement, can cause extreme class imbalance or lead to many false positives. Making your target less noisy (i.e., labeling "1" only if a certain price-change threshold is exceeded) often yields a better signal.
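A sketch of such a thresholded target (I'm assuming a `close` column and the `add_target` name from your code; the `min_move` parameter is my addition for illustration):

```python
import pandas as pd

def add_target(df, forecast_horizon=5, min_move=0.002):
    """Label 1 only when the forward return over `forecast_horizon` candles
    exceeds `min_move` (e.g. 0.2%), instead of labeling any up-tick."""
    future_return = df['close'].shift(-forecast_horizon) / df['close'] - 1.0
    df = df.copy()
    df['target'] = (future_return > min_move).astype(int)
    # Drop the tail rows whose future price is unknown.
    return df.iloc[:-forecast_horizon]

# Tiny demonstration with a made-up price series.
prices = pd.DataFrame({'close': [100, 100.1, 100.05, 100.4, 100.2,
                                 100.9, 101.5, 100.8, 101.0, 101.2]})
labelled = add_target(prices, forecast_horizon=3, min_move=0.005)
print(labelled['target'].tolist())
```

Raising `min_move` trades fewer positive labels (more imbalance) for a cleaner signal, so tune it together with your resampling strategy.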
In trading, good features often matter more than the exact choice of model. In addition to standard indicators, consider feature selection with `SelectKBest`, `RFE`, or Boruta on top of RF to prune unhelpful indicators.

You are already using SMOTE on the training set:
```python
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_train, y_train = smote.fit_resample(X_train, y_train)
```
Alternatives include setting `class_weight='balanced'` in RandomForest, or tempering the oversampling (SMOTE has a `sampling_strategy` parameter).

Your high recall (0.999) indicates you are predicting almost everything as positive at the chosen threshold (0.35). That kills precision. One approach:
Sometimes the best F1 threshold isn’t the best for actual trading. If you’d rather have fewer signals but more accurate ones, you might want to increase the threshold to push precision up.
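One way to make this concrete: sweep thresholds with `precision_recall_curve` and pick the lowest threshold that reaches a target precision (shown here on synthetic data — in practice, use your model's out-of-sample probabilities):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve

# Synthetic imbalanced stand-in for your problem.
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=42)
split = 400  # chronological-style split: train on the first 400, test on the rest
clf = RandomForestClassifier(random_state=42).fit(X[:split], y[:split])
proba = clf.predict_proba(X[split:])[:, 1]

precision, recall, thresholds = precision_recall_curve(y[split:], proba)
# Pick the lowest threshold that reaches a target precision, rather than max-F1.
target_precision = 0.8
ok = precision[:-1] >= target_precision   # the last precision value has no threshold
chosen = thresholds[ok][0] if ok.any() else None
print("threshold for >=80% precision:", chosen)
```

Tune `target_precision` to your appetite for missed trades: a higher target means fewer, more confident signals.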
If you act on predicted probabilities, check whether your current calibration (e.g., `CalibratedClassifierCV` with sigmoid) is appropriate. Sometimes isotonic calibration works better, or an ensemble approach might do better than a single RF.

A simple classification metric like F1 or precision/recall is sometimes not enough for a trading system; you might find it useful to backtest the signals on held-out periods and evaluate trading-oriented metrics (e.g., cumulative P&L, Sharpe ratio, drawdown) rather than classification scores alone.
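A minimal sketch comparing raw versus sigmoid-calibrated probabilities via the Brier score (lower is better), on synthetic data standing in for yours:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)

raw = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
# method='sigmoid' is Platt scaling; try method='isotonic' as well and compare.
cal = CalibratedClassifierCV(RandomForestClassifier(random_state=42),
                             method='sigmoid', cv=3).fit(X_tr, y_tr)

for name, model in [('raw', raw), ('calibrated', cal)]:
    print(name, brier_score_loss(y_te, model.predict_proba(X_te)[:, 1]))
```

Whichever method scores better on a proper scoring rule like Brier loss is usually the safer choice for probability-driven position sizing.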
By iterating on these points—especially focusing on the quality of your features, time-series cross-validation, and a balanced threshold for your trading goals—you should see improvements in ROC-AUC, precision, and a better overall F1 score (while likely reducing that near-100% recall, which is usually a sign of many false positives). Good luck!