In this part, we will build machine learning ensemble models that can generate intraday trading signals for almost any forex pair.
The dataset we are going to use contains 17 CSV files of historical OHLCV (Open, High, Low, Close, Volume) data, one per forex pair. Historical data can be downloaded from here.
Available FX pairs: USDJPY, USDCHF, USDCAD, NZDUSD, GBPUSD, GBPCAD, GBPAUD, GBPJPY, EURCAD, EURAUD, EURJPY, EURGBP, EURCHF, EURCAD, CHFJPY, AUDUSD, AUDJPY.
Disclaimer
- This article is for educational purposes only.
- Machine learning models are NOT totally reliable. Please bear in mind that there is NO PERFECT model that provides risk-free trading signals.
- The cross-validation section will be posted in part 3, where we will also cover advanced backtesting methods (using synthetic data and Monte-Carlo simulations) and backtesting libraries like Zipline.
What is this post about
- Using pandas dataframes to manipulate datasets, and matplotlib for data visualization
- Coding functions in Python for technical indicators like MACD and RSI
- Feature engineering (selecting, manipulating, and transforming data into features that our models can learn from)
- Creating, training, and testing ensemble machine learning models using the scikit-learn (sklearn) library
- Saving and loading ML models using joblib
- Generating training and testing reports
- Backtesting
- Real-time testing using a free API


Getting started
Import the required libraries
Define the target FX pair (do the same for each model)
Read the data
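As a minimal sketch of the loading step, assuming each CSV file has a timestamp column followed by OHLCV columns: the column names, the timestamp format, and the inline sample below are hypothetical stand-ins for a real file such as the one for the target pair.

```python
import io

import pandas as pd

# Hypothetical sample standing in for one OHLCV CSV file.
# Column names and the timestamp format are assumptions.
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    "2021-01-04 01:00,1.2240,1.2250,1.2232,1.2236,950\n"
    "2021-01-04 00:00,1.2230,1.2245,1.2225,1.2240,1200\n"
    "2021-01-04 02:00,1.2236,1.2242,1.2221,1.2229,1100\n"
)

# Parse the timestamp column and use it as the index.
df = pd.read_csv(sample_csv, parse_dates=["Date"], index_col="Date")

# Sort chronologically so that rolling indicators see bars in order.
df = df.sort_index()

print(df.head())
```

With a real file, the `io.StringIO` object would be replaced by the path to the CSV.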

Add RSI, MACD indicators to df
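The two indicators can be sketched as follows. This is one common formulation (Wilder-style smoothing for RSI, 12/26/9 exponential averages for MACD), not necessarily the exact one used for the results in this post; the function names, default periods, and the synthetic price series are assumptions.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder-style exponential smoothing."""
    delta = close.diff()
    gain = delta.clip(lower=0)       # upward moves only
    loss = (-delta).clip(lower=0)    # downward moves, as positive numbers
    avg_gain = gain.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """MACD line, signal line, and histogram from exponential moving averages."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    line = ema_fast - ema_slow
    sig = line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame({"macd": line, "macd_signal": sig, "macd_hist": line - sig})


# Synthetic random-walk closes, used here only to exercise the functions.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Close": 1.10 + rng.normal(0, 0.001, 200).cumsum()})

df["rsi"] = rsi(df["Close"])
df = df.join(macd(df["Close"]))
print(df.tail())
```

The first rows of `rsi` are NaN until enough bars are available, so they should be dropped before training.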
Feature engineering
Define risk/reward ratio and compute the risk and the target rate
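To make the arithmetic concrete, here is a small sketch. It assumes that the risk rate is the stop-loss distance as a fraction of the entry price and that the target rate is the take-profit distance; the numeric values and variable names are hypothetical, not the ones used for the results in this post.

```python
# Hypothetical values, chosen only for illustration.
risk_reward_ratio = 2.0   # aim to win 2 units for every unit risked
risk_rate = 0.001         # stop-loss 0.1% away from the entry price

# The take-profit distance follows from the ratio.
target_rate = risk_rate * risk_reward_ratio

# Price levels for a long position entered at a hypothetical price.
entry = 1.2000
long_stop = entry * (1 - risk_rate)
long_target = entry * (1 + target_rate)

# Win rate needed to break even (ignoring costs): p * target = (1 - p) * risk.
breakeven_win_rate = 1 / (1 + risk_reward_ratio)

print(f"target rate: {target_rate:.4f}")
print(f"long stop: {long_stop:.5f}, long target: {long_target:.5f}")
print(f"break-even win rate: {breakeven_win_rate:.2%}")
```

Under these assumptions, a higher risk/reward ratio lowers the win rate a model needs in order to break even.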
Calculate max profit on each point and determine potential entry points
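One possible way to do this is to look ahead a fixed number of bars and measure the best achievable move in each direction. The sketch below is an assumption about how such labels could be built: the `horizon` parameter, the function name `label_entries`, and the toy prices are hypothetical, and the rule ignores which level is reached first inside the window.

```python
import pandas as pd


def label_entries(df: pd.DataFrame, horizon: int, risk_rate: float, target_rate: float) -> pd.DataFrame:
    """Label each bar 1 (BUY), -1 (SELL) or 0 from the next `horizon` bars."""
    # Highest high / lowest low over bars t+1 .. t+horizon (forward-looking window).
    fwd_high = df["High"][::-1].rolling(horizon).max()[::-1].shift(-1)
    fwd_low = df["Low"][::-1].rolling(horizon).min()[::-1].shift(-1)

    out = df.copy()
    out["max_long"] = fwd_high / df["Close"] - 1    # best gain for a long entry
    out["max_short"] = 1 - fwd_low / df["Close"]    # best gain for a short entry

    out["signal"] = 0
    buy = (out["max_long"] >= target_rate) & (out["max_short"] < risk_rate)
    sell = (out["max_short"] >= target_rate) & (out["max_long"] < risk_rate)
    out.loc[buy, "signal"] = 1
    out.loc[sell, "signal"] = -1
    return out


# Toy prices (High = Low = Close) to keep the example easy to follow.
close = [1.000, 1.001, 1.003, 1.002, 0.999, 0.998]
prices = pd.DataFrame({"High": close, "Low": close, "Close": close})

labeled = label_entries(prices, horizon=2, risk_rate=0.001, target_rate=0.002)
print(labeled[["Close", "max_long", "max_short", "signal"]])
```

Because the labels use future bars, the last `horizon` rows cannot be labeled and stay at 0; these columns must only serve as the training target, never as input features.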
Filter “weak signals” due to insufficient volume
Output: 5834.241929210424
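A volume filter of this kind can be sketched as below. The choice of the mean volume as the threshold is an assumption made for illustration, and the toy volumes and signals are hypothetical.

```python
import pandas as pd

# Toy data: per-bar volume and the signals produced by the labeling step.
df = pd.DataFrame(
    {
        "Volume": [100, 900, 400, 1200, 50],
        "signal": [1, -1, 1, 0, -1],
    }
)

# Assumption: a signal is "weak" when the bar's volume is below the mean volume.
volume_threshold = df["Volume"].mean()
print(volume_threshold)

# Reset weak signals to 0 (no trade); strong signals are left untouched.
weak = (df["signal"] != 0) & (df["Volume"] < volume_threshold)
df.loc[weak, "signal"] = 0

print(df)
```

Other thresholds, such as a rolling average or a quantile of the volume, would follow the same pattern.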
Implement Random Forest Model
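A minimal sketch of this step with scikit-learn follows. The synthetic features and labels stand in for the engineered dataset, and the split ratio and random seeds are assumptions; only the default hyperparameters of `RandomForestClassifier` are used (apart from the seed), as in this post.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and the -1 / 0 / 1 signal labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 4))
score = X[:, 0] + 0.5 * X[:, 1]
y = np.where(score > 0.3, 1, np.where(score < -0.3, -1, 0))

# shuffle=False keeps the chronological order: train on the past, test on the future.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
```

The report lists precision and recall per class, which is usually more informative than a single accuracy figure when most bars carry no signal.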

Our model predicts SELL positions (-1) with 75% accuracy and BUY positions with 78% accuracy.
Save the model using joblib to reuse it later in part 2
['/content/drive/MyDrive/data/FXEURUSD.joblib']
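For reference, saving and reloading with joblib can be sketched as follows. A temporary directory is used here instead of the Google Drive path shown above, and the tiny model is a hypothetical placeholder for the trained one.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier

# Tiny placeholder model standing in for the trained forest.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1]
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Save to a temporary location; joblib.dump returns the list of files written.
path = os.path.join(tempfile.mkdtemp(), "FXEURUSD.joblib")
saved = joblib.dump(model, path)
print(saved)

# Reload and check that the restored model predicts the same values.
restored = joblib.load(path)
print(restored.predict(X))
```

A joblib file should only be loaded from a trusted source and, ideally, with the same scikit-learn version that produced it.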
Visualize the Model’s feature importance
The mean decrease in Gini impurity measures how much each variable contributes to the homogeneity of the nodes and leaves in the resulting random forest. The higher a variable's mean decrease in accuracy or mean decrease in Gini score, the more important that variable is to the model.
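In scikit-learn, the impurity-based importances of a fitted forest are exposed through the `feature_importances_` attribute. The sketch below uses synthetic data with hypothetical feature names, where only the first feature actually drives the label.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features with hypothetical names; the label depends on "rsi" only.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["rsi", "macd", "volume"])
y = np.where(X["rsi"] > 0, 1, -1)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Calling `importances.plot.barh()` with matplotlib installed would draw the usual bar chart.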

Plot the predicted signals on the test dataset

Looks good.
Basic backtest
Note: we’ll demonstrate advanced backtesting methods (using synthetic data and Monte-Carlo simulations) using Zipline in part 3.

Test in Real-time using an API
Compute target features
Generate signals using our model

Further improvements
- Hyperparameter tuning (we only used default hyperparameter values)
- Possible use of a hard voting ensemble to increase accuracy
- Trying other indicators at the feature engineering stage
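As an illustration of the hard voting idea listed above, scikit-learn provides `VotingClassifier`, which predicts the class chosen by the majority of its member models. The choice of member models and the synthetic dataset below are assumptions, not a tested recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data standing in for the engineered features.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, random_state=0
)
split = 300  # first 300 rows for training, the rest for testing

# Hard voting: each model casts one vote and the majority class wins.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
ensemble.fit(X[:split], y[:split])

accuracy = ensemble.score(X[split:], y[split:])
print(f"hold-out accuracy: {accuracy:.3f}")
```

Whether an ensemble like this beats a single random forest on the forex data would have to be checked with the backtests described in this series.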