Generating Forex Intraday Trading Signals using Machine Learning Ensemble and NLP Sentiment Analysis Models – Part 1

In this part, we will build machine learning ensemble models that can generate intraday trading signals for almost any forex pair.

The dataset we are going to use contains 17 CSV files of historical OHLCV (Open, High, Low, Close, Volume) data, one for each of 17 forex pairs. The historical data can be downloaded from here.



  • This article is for educational purposes only.
  • Machine learning models are NOT totally reliable. Please bear in mind that there is NO PERFECT model that provides risk-free trading signals.
  • Cross-validation will be covered in part 3, together with advanced backtesting methods (using synthetic data and Monte-Carlo simulations) and backtesting libraries like Zipline.

What is this post about

  • Using pandas DataFrames to manipulate datasets, and matplotlib for data visualization.
  • Coding functions in Python for technical indicators like MACD and RSI
  • Feature engineering (selecting, manipulating, and transforming data into features that our models can learn from)
  • Creating, training, and testing ensemble machine learning models using the sklearn library.
  • Saving and loading ML models using joblib
  • Generating training and testing reports
  • Backtesting
  • Real-time testing using a free API

[Figure: Intraday signals provided by our EURUSD model (test dataset)]

[Figure: Signals and sentiment analysis from several reliable alternative public data sources (Part 2)]

Getting started

Import the required libraries

Define the target FX pair (do the same for each model)

Read the data
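
As a minimal sketch of this step, assuming each CSV has columns named Date, Open, High, Low, Close and Volume (these names are an assumption; check them against the downloaded files), the data can be loaded into a pandas DataFrame indexed by timestamp. An in-memory string stands in for the file here so that the example runs as-is.

```python
import io

import pandas as pd

# Stand-in for one of the 17 CSV files. The column names and values are
# illustrative assumptions, not taken from the real dataset.
csv_text = """Date,Open,High,Low,Close,Volume
2021-01-04 00:00,1.2230,1.2245,1.2225,1.2240,1500
2021-01-04 01:00,1.2240,1.2251,1.2236,1.2248,1320
2021-01-04 02:00,1.2248,1.2250,1.2229,1.2233,1710
"""

# For a real file, pass its path instead of the StringIO buffer.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
df = df.sort_index()  # make sure the bars are in chronological order

print(df.head())
```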

Add RSI, MACD indicators to df
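
One possible way to code these two indicators with pandas is sketched below. It assumes the common default periods (14 for RSI, 12/26/9 for MACD) and Wilder-style smoothing for RSI; the function names, column names, and the synthetic price series are ours, used only for illustration.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder's smoothing (EMA, alpha = 1/period)."""
    delta = close.diff()
    gain = delta.clip(lower=0)       # upward moves, 0 otherwise
    loss = (-delta).clip(lower=0)    # downward moves as positive numbers
    avg_gain = gain.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """MACD line, signal line and histogram from exponential moving averages."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame(
        {"MACD": macd_line, "MACD_signal": signal_line, "MACD_hist": macd_line - signal_line}
    )


# Synthetic random-walk prices standing in for a real Close column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Close": 1.10 + rng.normal(0, 0.0005, 300).cumsum()})

df["RSI"] = rsi(df["Close"])
df = df.join(macd(df["Close"]))
print(df.tail())
```

The first few RSI values are NaN because the smoothing needs a full period of price changes before it produces a value.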

Feature engineering

Define risk/reward ratio and compute the risk and the target rate
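
To make the arithmetic concrete, here is a small sketch. It assumes the risk is expressed as a fraction of the entry price and the target as a multiple of that risk; the constant names and the values (0.1% risk, reward of 2 per unit of risk) are hypothetical and not the settings used for the model in this post.

```python
# Hypothetical settings, chosen only to illustrate the calculation.
RISK_RATE = 0.001            # accept a 0.1% adverse move per trade
REWARD_PER_UNIT_RISK = 2.0   # aim for twice the risk as profit

target_rate = RISK_RATE * REWARD_PER_UNIT_RISK

# Price levels these rates imply for a long position.
entry_price = 1.2000
stop_loss_long = entry_price * (1 - RISK_RATE)
take_profit_long = entry_price * (1 + target_rate)

print(f"target rate: {target_rate:.4f}")
print(f"stop loss:   {stop_loss_long:.4f}")
print(f"take profit: {take_profit_long:.4f}")
```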

Calculate max profit on each point and determine potential entry points
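
A simplified sketch of this labeling step follows. It assumes we look a fixed number of bars ahead (horizon), measure the best achievable long and short profit within that window, and mark a bar as an entry when the target is reachable while the opposite move stays below the risk. The helper names, the horizon, and the rates are our assumptions, and the sketch ignores whether the stop or the target would be hit first.

```python
import pandas as pd


def forward_extreme(series: pd.Series, horizon: int, how: str) -> pd.Series:
    """Max or min of the next `horizon` values, excluding the current bar."""
    reversed_roll = series[::-1].rolling(horizon)
    agg = reversed_roll.max() if how == "max" else reversed_roll.min()
    return agg[::-1].shift(-1)


def label_entries(df: pd.DataFrame, horizon: int = 12,
                  risk_rate: float = 0.001, target_rate: float = 0.002) -> pd.DataFrame:
    """Add max profit columns and a signal column: 1 = BUY, -1 = SELL, 0 = none."""
    close = df["Close"]
    max_long = forward_extreme(df["High"], horizon, "max") / close - 1
    max_short = 1 - forward_extreme(df["Low"], horizon, "min") / close

    out = df.copy()
    out["max_long_profit"] = max_long
    out["max_short_profit"] = max_short
    out["signal"] = 0
    # Bars without a full look-ahead window have NaN profits and stay at 0.
    out.loc[(max_long >= target_rate) & (max_short < risk_rate), "signal"] = 1
    out.loc[(max_short >= target_rate) & (max_long < risk_rate), "signal"] = -1
    return out


# Tiny hand-made example: a spike up after bar 0 and a drop at bar 3.
bars = pd.DataFrame({
    "Close": [1.000, 1.000, 1.000, 1.000, 1.000],
    "High":  [1.000, 1.003, 1.000, 1.000, 1.000],
    "Low":   [1.000, 1.000, 1.000, 0.997, 1.000],
})
labeled = label_entries(bars, horizon=2)
print(labeled[["max_long_profit", "max_short_profit", "signal"]])
```

Because the labels use future prices, they are only valid as training targets and cannot be used as input features.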

Filter “weak signals” due to insufficient volume

Output: 5834.241929210424
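
One way such a volume filter could look is sketched below. It assumes a signal counts as weak when the bar's volume is below a rolling average of recent volume; the window, the ratio, and the function name are hypothetical, and the actual rule used for the model may differ.

```python
import pandas as pd


def filter_low_volume(df: pd.DataFrame, window: int = 20, min_ratio: float = 1.0) -> pd.DataFrame:
    """Reset the signal to 0 where volume is below min_ratio x its rolling mean."""
    avg_volume = df["Volume"].rolling(window, min_periods=1).mean()
    weak = df["Volume"] < min_ratio * avg_volume
    out = df.copy()
    out.loc[weak, "signal"] = 0
    return out


# Hand-made example: the fourth bar has unusually low volume.
bars = pd.DataFrame({
    "Volume": [100, 100, 100, 10, 300],
    "signal": [1, -1, 1, 1, -1],
})
filtered = filter_low_volume(bars, window=3)
print(filtered)
```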

Implement Random Forest Model

Our model predicts SELL positions (-1) with 75% accuracy and BUY positions with 78% accuracy.
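
The accuracy figures above come from the EURUSD data. The sketch below shows the general shape of the training step with sklearn on synthetic features, so it will not reproduce those numbers. The feature names, the synthetic labels, and the 75/25 chronological split are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for engineered features.
rng = np.random.default_rng(42)
n = 2000
X = pd.DataFrame({
    "RSI": rng.uniform(0, 100, n),
    "MACD_hist": rng.normal(0, 1, n),
    "volume_ratio": rng.lognormal(0, 0.3, n),
})

# Synthetic labels loosely tied to the features: 1 = BUY, -1 = SELL, 0 = no trade.
score = (50 - X["RSI"]) / 25 + X["MACD_hist"] + rng.normal(0, 0.5, n)
y = np.select([score > 1, score < -1], [1, -1], default=0)

# shuffle=False keeps the time order, so the test set is the most recent data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=False)

model = RandomForestClassifier(random_state=42)  # default hyperparameters
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision and recall for SELL (-1), no trade (0) and BUY (1).
print(classification_report(y_test, y_pred))
```

Splitting without shuffling matters for time series: a random split would let the model train on bars that come after the ones it is tested on.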

Save the model using joblib to reuse it later in part 2
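
A minimal sketch of saving and reloading with joblib is shown below. The small model is trained on synthetic data and the file name is hypothetical; the point is only the dump and load round trip.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small throwaway model so the example is self-contained.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0, 1, -1)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Hypothetical file name; a temporary directory keeps the example tidy.
model_path = os.path.join(tempfile.mkdtemp(), "EURUSD_rf.joblib")
joblib.dump(model, model_path)

# Later (for example in part 2): load the model and use it as before.
restored = joblib.load(model_path)
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("Predictions identical after reload:", same_predictions)
```

joblib files are pickles, so only load files you trust, and load them with the same scikit-learn version that saved them where possible.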


Visualize the Model’s feature importance

The mean decrease in Gini impurity measures how much each variable contributes to the homogeneity of the nodes and leaves in the resulting random forest. The higher the mean decrease in accuracy or mean decrease in Gini score of a variable, the more important that variable is to the model.
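
In sklearn, the impurity-based importances of a fitted random forest are exposed through the feature_importances_ attribute. The sketch below ranks them on synthetic data in which only one feature actually drives the label; the feature names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the label depends on RSI only, the other columns are noise.
rng = np.random.default_rng(1)
n = 1000
X = pd.DataFrame({
    "RSI": rng.uniform(0, 100, n),
    "MACD_hist": rng.normal(0, 1, n),
    "noise": rng.normal(0, 1, n),
})
y = np.where(X["RSI"] < 50, 1, -1)

model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# Mean decrease in impurity per feature, normalized to sum to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)

# With matplotlib installed, importances.plot.barh() draws the usual bar chart.
```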

Plot the predicted signals on the test dataset

Looks good.

Basic backtest
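
A very simple vectorized backtest can be sketched as follows. It assumes the signal is held as a position (1 long, -1 short, 0 flat) from the bar after it is generated, and it ignores spreads, slippage, stop-loss and take-profit levels unless a cost is passed in. The function name and the price series are illustrative.

```python
import pandas as pd


def backtest(close: pd.Series, signal: pd.Series, cost_per_trade: float = 0.0) -> pd.Series:
    """Equity curve (starting at 1.0) from holding the previous bar's signal."""
    returns = close.pct_change().fillna(0.0)
    position = signal.shift(1).fillna(0.0)       # act one bar after the signal
    trades = position.diff().abs().fillna(0.0)   # size of each position change
    strategy_returns = position * returns - trades * cost_per_trade
    return (1 + strategy_returns).cumprod()


# Hand-made example: long while the price rises, short while it falls.
close = pd.Series([1.00, 1.01, 1.02, 1.01, 1.00])
signal = pd.Series([1, 1, -1, -1, 0])

equity = backtest(close, signal)
print(equity)
print(f"Total return: {equity.iloc[-1] - 1:.2%}")
```

Shifting the signal by one bar avoids look-ahead bias: a signal computed from a bar's close can only be traded on the following bar.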

Note: we’ll demonstrate advanced backtesting methods (using synthetic data and Monte-Carlo simulations) using Zipline in part 3.

Test in Real-time using an API

Compute target features

Generate signals using our model

Further improvements

  • Hyperparameter tuning (we only used default hyperparameter values)
  • Possible use of a hard voting ensemble to increase accuracy
  • Trying other indicators at the feature engineering stage
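
As an illustration of the hard voting idea from the list above, here is a minimal sketch using sklearn's VotingClassifier. The choice of base models and the synthetic data are assumptions; with voting="hard", each model casts one vote and the majority class wins.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic features and BUY (1) / SELL (-1) labels, for illustration only.
rng = np.random.default_rng(7)
X = rng.normal(size=(1500, 4))
y = np.where(X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 1500) > 0, 1, -1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=False)

# Three different model families, combined by majority vote.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=7)),
        ("gb", GradientBoostingClassifier(random_state=7)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)

accuracy = ensemble.score(X_test, y_test)
print(f"Hard voting accuracy on the test split: {accuracy:.3f}")
```

Whether the ensemble beats a single random forest depends on the data, so it should be compared on the same test split.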
