Predicting the Trend - Introduction and Feature Enrichment

 Predicting the Trend

This article explores how machine learning can help predict a trend from historical data.  We will use foreign exchange (Forex) rates as an example to see whether the daily rise and fall of currency pairs can be predicted from their previous market data.  To keep both the machine learning model and the market data requirements simple, we take a pure technical analysis approach that does not take market news or reports into account.
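
To make the prediction target concrete, here is a minimal sketch of one plausible way to label the "daily rise and fall".  The price values are made up, and defining a rise as "the next day's close is higher than today's close" is an assumption for illustration, not necessarily the labelling used in this project.

```python
import pandas as pd

# Hypothetical daily closing prices for one currency pair (illustrative values only).
close = pd.Series(
    [1.1000, 1.1020, 1.1010, 1.1010, 1.1050],
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
    name="close",
)

# Label each day 1 if the NEXT day's close is higher than today's, otherwise 0.
next_close = close.shift(-1)
label = (next_close > close).astype(int)

# The last day has no "next day", so it cannot be labelled and is dropped.
label = label.iloc[:-1]
```

Under this assumed definition an unchanged close counts as "no rise"; a real system would have to decide how to treat flat days.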

The following diagram shows the output of the targeting system with four currency pairs:

As is common in Forex technical analysis, we will use technical indicators derived from historical price data.  The daily open, high, low, close and volume of each currency pair are used to generate the technical indicator values.
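
As an illustration of how an indicator value is derived from daily price data, the sketch below computes Williams %R with plain pandas.  The function name `williams_r`, the sample prices and the 3-day period are assumptions chosen for the example; they are not taken from the system described here.

```python
import pandas as pd


def williams_r(df: pd.DataFrame, period: int) -> pd.Series:
    """Williams %R: -100 * (highest high - close) / (highest high - lowest low)."""
    highest_high = df["high"].rolling(period).max()
    lowest_low = df["low"].rolling(period).min()
    return -100 * (highest_high - df["close"]) / (highest_high - lowest_low)


# Hypothetical daily data indexed by date (illustrative values only).
prices = pd.DataFrame(
    {
        "high": [1.10, 1.12, 1.14, 1.13],
        "low": [1.08, 1.09, 1.10, 1.11],
        "close": [1.09, 1.11, 1.13, 1.12],
    },
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# The first (period - 1) rows are NaN because the rolling window is incomplete.
wr = williams_r(prices, period=3)
```

Values lie between -100 and 0; readings near 0 mean the close is near the top of the recent high-low range.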

However, unlike manual technical analysis, we will let the system itself determine which indicators are most applicable to a currency pair under the current market conditions.  To do this, we first have to teach the system how to generate the values of the common technical indicators used in Forex analysis.  Fortunately, there is no need to write the logic for every indicator ourselves, as Python libraries are available that provide most of the common technical indicators that are helpful in predicting Forex trends.  This is one of the reasons for using Python in this project, in addition to its strong support for machine learning modeling.  The following are the key indicators to be considered in the system:

Some of the indicators are available in a Python library called talib (TA-Lib), and some you will have to create on your own.  I'll try to explain how some of them can be created in upcoming blogs if you are interested.  Once the indicator logic is ready, we can start building the machine learning model.  I'll use the following steps to build the model:
  1. Feature Enrichment
  2. Feature Selection and Extraction
  3. Classification
You can use any platform of your choice as the environment to load the Forex data, execute the Python commands and store the generated results.  The following is a sample architecture based on Alibaba PAI:

Feature Enrichment

The following process is used to generate the indicator values and incorporate them into the master dataset:

  1. First of all, we'll read the historical data into a pandas DataFrame and index the data by date.
  2. Because different indicator periods work well for different currency pairs under different market conditions, we want to allow multiple period values to be generated for each indicator.  We'll create a list of periods for each indicator, e.g. momentumKey = [3,4,5,8,9,10], meaning the momentum indicator will be evaluated with periods of 3, 4, 5, 8, 9 and 10 days.
  3. Next, we'll define a list holding the period lists of all the indicators that will be incorporated into the feature list, e.g. keyList = [momentumKey,stochasticKey,williamsKey,procKey,weightedcpKey,wadlKey,adoscKey,macdKey,cciKey,bbandsKey,fourierKey,sineKey]
  4. The values generated for each indicator period are stored in dictionaries: each indicator is computed inside a class method that returns a dictionary.  The following is a snippet of the method for the momentum indicator:


    • We can then assign the result to a variable, such as momentumDict, and repeat the same for the other indicators.
  5. We can then put all dictionaries into a dictionary list, e.g. dictList = [momentumDict.close,stochasticDict.close,williamsDict.close,procDict.close,weightedcpDict.close,wadlDict.wadl,adoscDict.adosc,macdDict.line,cciDict.close,bbandsDict.close,heikenDict.candles,hpaDict.high,lpaDict.low,hlpaDict.highlow,paverageDict.avs,slopeDict.slope,fourierDict.coeffs,sineDict.coeffs,garchDict.garch]
  6. To make the master dataset a little bit more readable, we can also add a list of column names for each feature, e.g. colFeat = ['momentum','stochastic','williams','proc','weightedcp','wadl','adosc','macd','cci','bbands','heiken','hpa','lpa','hlpa','paverage','slope','fourier','sine','garch']
  7. Lastly, to actually add the indicator results into the master dataset, we can use the following snippet:

    • The 'macd' indicator is treated differently because it requires two periods for each calculation.
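
The steps above can be sketched end to end for a single indicator.  This is a minimal illustration under stated assumptions, not the code used in this project: the `Holder` class, the `momentum` function, the `masterFrame` name, the sample prices, the shortened period list and the definition of momentum as the close-to-close difference over the period are all hypothetical.  Only the names momentumKey, keyList, momentumDict, dictList and colFeat come from the steps above, and the special handling of 'macd' is left out.

```python
import pandas as pd


class Holder:
    """Empty container; indicator results are attached as attributes."""


def momentum(prices: pd.DataFrame, periods: list) -> Holder:
    """Return a Holder whose `.close` dict maps each period to a one-column frame.

    Momentum is assumed here to be close[t] - close[t - period].
    """
    results = Holder()
    close = {}
    for period in periods:
        close[period] = pd.DataFrame(prices["close"].diff(period))
    results.close = close
    return results


# Step 1: hypothetical daily data indexed by date (a real run would load a file).
prices = pd.DataFrame(
    {"close": [1.10, 1.11, 1.13, 1.12, 1.15, 1.14]},
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)
prices.index.name = "date"

# Steps 2 and 3: periods per indicator (shortened to fit the tiny sample).
momentumKey = [2, 3]
keyList = [momentumKey]

# Step 4: compute the values for every period.
momentumDict = momentum(prices, momentumKey)

# Steps 5 and 6: result dictionaries and readable column prefixes.
dictList = [momentumDict.close]
colFeat = ["momentum"]

# Step 7: add one column per indicator and period to the master dataset.
masterFrame = prices.copy()
for name, periods, values in zip(colFeat, keyList, dictList):
    for period in periods:
        column = values[period].iloc[:, 0].rename(f"{name}{period}")
        masterFrame = masterFrame.join(column)

# Rows without enough history for the longest period contain NaN and are dropped.
masterFrame = masterFrame.dropna()
```

Naming each column after the indicator and its period (momentum2, momentum3) keeps the features traceable when they are later selected or discarded.
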
The above is a high-level summary of the feature enrichment process.  After this, we have to select and extract the relevant features, and then optimize the classification result across different combinations of the selected and extracted indicators, to complete the process.

For those who are interested in understanding the selection and extraction process used, stay tuned for my upcoming blogs.








 







