Introducing neural networks to predict stock prices

IntroNeuralNetworks in Python: A Template Project

IntroNeuralNetworks is a project that introduces neural networks and illustrates how one can use them to predict stock prices. It is built to help beginners understand the fundamentals of how neural network models are built and to walk through the entire machine learning workflow. This model is in no way sophisticated, so feel free to improve upon this base project in any way.

The core steps involved are: downloading stock price data from Yahoo Finance, preprocessing the dataframes according to the specifications of neural network libraries, and finally training the neural network model and backtesting it over historical data.

This model is not meant to be used for live trading. However, with further extensions, it can be used to support your trading strategies.

I hope you find this project useful in your journey as a trader or a machine learning engineer. Personally, this is my first major machine learning and Python project, so I'd appreciate it if you left a star.

As a disclaimer, this is a purely educational project. Backtested results do not guarantee performance in live trading; trade live at your own risk. This guide and further analysis have been cross-posted on my blog, Engineer Quant.

Overview

The overall workflow for this project is as follows:

  1. Acquire the stock price data: this will give us our features for the model.
  2. Preprocess the data: make the train and test datasets.
  3. Use the neural network to learn from the training data.
  4. Backtest the model across a date range.
  5. Make useful stock price predictions.
  6. Supplement your trading strategies with the predictions.

Although this is very general, it is essentially what you need to build your own machine learning or neural network model.

Getting Started

For those of you who do not want to learn about how the model is constructed (although I highly suggest that you do), clone or download the project, unzip it to your preferred folder and run the following commands on your computer:

pip install -r requirements.txt
python LSTM_model.py

It's as simple as that!

Requirements

For those who want a more detailed manual: this program is built in Python 3.6. If you are using an earlier version of Python, such as Python 3.5, you will run into syntax errors with f-strings, which were introduced in Python 3.6. I suggest that you update to Python 3.6 and then install the dependencies:

pip install -r requirements.txt

Stock Price Data

Now we come to the most dreaded part of any machine learning project: data acquisition and data preprocessing. As tedious and hard as it might be, it is vital to have high-quality data to feed into your model. As the saying goes, "Garbage in, garbage out." This is especially true of machine learning models, as a model is only as good as the data it is fed. Processing the data comes in two parts: downloading the data and forming our datasets for the model. Thanks to the Yahoo Finance API, downloading the stock price data is relatively simple (though sadly, I doubt it will stay that way for long).

To download the stock price data, we use pandas_datareader, which stopped working after a while, so we apply this fix and use fix_yahoo_finance. If this also fails (maybe in the near future), you can download the stock data directly from Yahoo for free and save it as stock_price.csv.
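If you fall back to a manually downloaded stock_price.csv, a minimal sketch like the following could load it. It assumes the file uses Yahoo's usual export columns (Date, Open, High, Low, Close, Adj Close, Volume) and that you want the adjusted close; the helper name `load_prices` is hypothetical and not part of this project.

```python
import pandas as pd


def load_prices(path="stock_price.csv", column="Adj Close"):
    """Load a Yahoo Finance CSV export and return a single price column.

    Hypothetical helper: assumes a 'Date' column plus the requested
    price column, as in Yahoo's standard CSV download.
    """
    df = pd.read_csv(path, parse_dates=["Date"], index_col="Date")
    df = df.sort_index()          # make sure prices are in chronological order
    prices = df[column].dropna()  # drop days with missing prices
    return prices
```

The result is a pandas Series indexed by date, which can then be fed into the preprocessing step.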

Preprocessing

Once we have the stock price data for the stocks we are going to predict, we now need to create the training and testing datasets.

Preparing Train Dataset

The goal for our training dataset is to have rows of a given length (the number of prices used to make each prediction), each paired with the correct target value to evaluate our model against. When calling the Preprocessing class, you can choose how much of the stock price data to use for training. Generating the training data is done quite simply using NumPy arrays and a for loop. You can do this by running:

Preprocessing.get_train(seq_len)
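The windowing idea described above can be sketched in plain NumPy. This is a minimal illustration, not the project's `get_train` implementation: the helper name `make_windows` is hypothetical, and it assumes the target for each window is the price immediately after it.

```python
import numpy as np


def make_windows(prices, seq_len):
    """Turn a 1-D price series into (X, y) pairs for supervised learning.

    Each row of X holds `seq_len` consecutive prices; the matching entry
    of y is the price that immediately follows that window (an assumption
    of this sketch).
    """
    prices = np.asarray(prices, dtype=np.float64)
    X, y = [], []
    for start in range(len(prices) - seq_len):
        X.append(prices[start:start + seq_len])  # input window
        y.append(prices[start + seq_len])        # next price to predict
    return np.array(X), np.array(y)
```

For example, `make_windows([1, 2, 3, 4, 5], 3)` gives the rows `[1, 2, 3]` and `[2, 3, 4]` with targets `4` and `5`.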

Preparing Test Dataset

The test dataset is prepared in precisely the same way as the training dataset; the only difference is the length of the data. This is done with the following code:

Preprocessing.get_test(seq_len)
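Because the two datasets differ only in which slice of the price history they come from, the split itself can be as simple as cutting the series at a fraction of its length. The sketch below assumes a chronological split (no shuffling, so the test data always comes after the training data); `split_series` and `train_fraction` are hypothetical names standing in for the option passed to the Preprocessing class.

```python
import numpy as np


def split_series(prices, train_fraction):
    """Split a price series chronologically into train and test parts.

    The first `train_fraction` of the observations go to training and the
    rest to testing. Nothing is shuffled, so no future prices leak into
    the training set.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    prices = np.asarray(prices, dtype=np.float64)
    cut = int(train_fraction * len(prices))
    return prices[:cut], prices[cut:]
```

Each part can then be windowed separately, so no window straddles the boundary between training and test data.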

Neural Network Models

Since the main goal of this project is to get acquainted with machine learning and neural networks, I will explain which models I have used and why they may be effective at predicting stock prices. If you want a more detailed explanation of neural networks, check out my blog.

Multilayer Perceptron Model

A multilayer perceptron is the most basic type of neural network: it stacks fully connected layers and uses backpropagation to learn from the training dataset. If you want more details about how the multilayer perceptron works, do read this article.
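To make "learns using backpropagation" concrete, here is a tiny one-hidden-layer perceptron written in plain NumPy, fitted by gradient descent on mean squared error. It is an illustration only, not the project's Keras model; the layer size, learning rate and epoch count are arbitrary choices.

```python
import numpy as np


def train_mlp(X, y, hidden=16, lr=0.1, epochs=2000, seed=0):
    """Fit a 1-hidden-layer MLP (tanh hidden layer, linear output) by backprop.

    Returns the learned parameters and the MSE loss recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)  # hidden activations, shape (n, hidden)
        pred = h @ W2 + b2        # linear output, shape (n, 1)
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the MSE loss via the chain rule
        d_pred = 2 * err / n
        dW2 = h.T @ d_pred
        db2 = d_pred.sum(axis=0)
        d_h = (d_pred @ W2.T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Gradient-descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict_mlp(params, X):
    """Run the forward pass with trained parameters."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
```

In the project itself, a library such as Keras computes these gradients automatically; writing them out by hand simply shows what "backpropagation" refers to.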

LSTM Model

The benefit of using a Long Short-Term Memory (LSTM) neural network is the extra element of long-term memory: the network carries a cell state forward from earlier time steps in the input sequence as a 'memory', which allows the model to find relationships both within the data itself and between the data and the output. Again, for more details, please read this article.
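The "memory" is easiest to see in a single LSTM time step. The NumPy sketch below follows the standard LSTM equations (input, forget and output gates plus a candidate cell update); the stacked weight layout and the names `lstm_step` and `run_lstm` are assumptions for illustration, not the project's code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x: input at this step, shape (d,)
    h_prev, c_prev: previous hidden and cell state, shape (n,)
    W: (4n, d), U: (4n, n), b: (4n,), rows stacked as
    [input gate, forget gate, candidate, output gate]
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])      # forget gate: how much old memory to keep
    g = np.tanh(z[2 * n:3 * n])  # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])  # output gate: how much memory to expose
    c = f * c_prev + i * g       # cell state carries memory across time steps
    h = o * np.tanh(c)           # new hidden state
    return h, c


def run_lstm(sequence, W, U, b, n):
    """Feed a sequence through the cell, carrying (h, c) from step to step."""
    h = np.zeros(n)
    c = np.zeros(n)
    for x in sequence:
        h, c = lstm_step(np.atleast_1d(x), h, c, W, U, b)
    return h, c
```

When the forget gate is close to 1 and the input gate close to 0, the cell state passes through a step almost unchanged, which is how information from early prices can survive to the end of the sequence.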

Backtesting

My backtest system is simple in the sense that it only evaluates how well the model predicts the stock price. It does not consider how to trade based on these predictions (that is the topic of developing trading strategies using this model). To run just the backtest, call:

back_test(strategy, seq_len, ticker, start_date, end_date, dim)

The dim argument specifies the dimensions of the dataset, and it is required to train the models successfully.
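Since the backtest only scores how close the predictions are to the real prices, its core can be summarised by a few error metrics computed over the backtest window. The sketch below is an assumption about what such scoring might look like, not the project's `back_test` function; `score_predictions` is a hypothetical helper that takes already-computed predictions and actual prices.

```python
import numpy as np


def score_predictions(predicted, actual):
    """Compare predicted and actual prices over a backtest window.

    Returns the mean squared error, mean absolute error and mean absolute
    percentage error (in percent).
    """
    predicted = np.asarray(predicted, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    if predicted.shape != actual.shape:
        raise ValueError("predicted and actual must have the same shape")
    err = predicted - actual
    return {
        "mse": float(np.mean(err ** 2)),
        "mae": float(np.mean(np.abs(err))),
        "mape": float(np.mean(np.abs(err) / np.abs(actual)) * 100),
    }
```

MAE and MAPE are expressed in price units and percent respectively, which makes them easier to interpret than MSE when comparing stocks at different price levels.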

Stock Predictions

Now that your model has been trained and backtested, we can use it to make stock price predictions. To do so, download the most recent data and use the predict method of the trained Keras model. Run the following code after training and backtesting the model:

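import numpy as np
import fix_yahoo_finance as yf
from pandas_datareader import data as pdr

yf.pdr_override()  # route pdr.get_data_yahoo through fix_yahoo_finance
data = pdr.get_data_yahoo("AAPL", "2017-12-19", "2018-01-03")
X_predict = data["Adj Close"].values[-10:].reshape((1, 10)) / 200  # last 10 prices, scaled down by 200
print(model.predict(X_predict) * 200)  # model: the trained Keras model; scale back up to a price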

Extensions

As mentioned before, this project is highly extendable. Here are some ideas for improving it.

Getting Data

Getting data is pretty standard using Yahoo Finance. However, you may want to look into clustering data in terms of trends of stocks (maybe by sector, or if you want to be really precise, use k-means clustering?).
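As one possible starting point for the k-means idea, the sketch below clusters stocks by the shape of their price paths. It is a plain NumPy k-means (Lloyd's algorithm); dividing each path by its first price is an assumption so that stocks trading at very different price levels can be compared, and `kmeans` and `cluster_stocks` are hypothetical names. In practice, a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
import numpy as np


def kmeans(points, k, iters=100, seed=0):
    """Cluster the rows of `points` into k groups with Lloyd's algorithm.

    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=np.float64)
    # Start from k distinct rows chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def cluster_stocks(price_paths, k):
    """Cluster stocks by trend: each row is one stock's price history."""
    paths = np.asarray(price_paths, dtype=np.float64)
    normalised = paths / paths[:, :1]  # every path starts at 1.0
    return kmeans(normalised, k)
```

Stocks that end up in the same cluster could then share a model, or be used to check whether a model trained on one stock generalises to similar ones.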

Neural Network Model

This neural network can be improved in many ways:

  1. Tuning hyperparameters: find the optimal hyperparameters that give the best predictions.
  2. Backtesting: make the backtesting system more robust (I have left certain important aspects out for you to figure out). Maybe include buying and shorting?
  3. Trying different neural networks: there are plenty of options, so see which works best for your stocks.
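For the first idea, an exhaustive grid search is the simplest way to tune hyperparameters. In the sketch below, `grid_search` is a hypothetical helper and `evaluate` is a callback you would supply yourself (for example, one that trains the network with the given settings and returns its error on held-out data); the parameter names in the test are illustrative only.

```python
import itertools


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid` and keep the lowest score.

    param_grid: dict mapping a hyperparameter name to a list of candidate values
    evaluate:   callable taking those names as keyword arguments and
                returning a score to minimise
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The number of combinations grows multiplicatively with each hyperparameter, so keep the grid small when every evaluation means training a network.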

Supporting Trade

As I said earlier, this model can support trading by feeding its predictions into your trading strategy. Examples include:

  1. Simple long-short strategy: buy if the predicted price is higher than the current price, and short if it is lower.
  2. Intraday trading: if you can get your hands on minute data or even tick data, you can use this predictor to trade.
  3. Statistical arbitrage: you can also use the predictions of various stock prices to find the correlation between stocks.
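The first strategy can be expressed as a signal function. This sketch assumes you already have arrays of current prices and the model's predictions for the next price; `long_short_signals` and `strategy_returns` are hypothetical helpers. The return calculation ignores transaction costs, slippage and position sizing.

```python
import numpy as np


def long_short_signals(current, predicted):
    """+1 (buy) where the predicted price is above the current price,
    -1 (short) where it is below, 0 where they are equal."""
    current = np.asarray(current, dtype=np.float64)
    predicted = np.asarray(predicted, dtype=np.float64)
    return np.sign(predicted - current)


def strategy_returns(current, next_actual, signals):
    """Per-period return of holding each signal for one period
    (no transaction costs or slippage)."""
    current = np.asarray(current, dtype=np.float64)
    next_actual = np.asarray(next_actual, dtype=np.float64)
    return signals * (next_actual - current) / current
```

For example, with a current price of 100, a prediction of 105 produces a long signal, and if the price actually moves to 102 the strategy earns 2% for that period.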

Contributing

Feel free to fork this and submit PRs. I am open and grateful for any suggestions or bug fixes. Hope you enjoy this project!


For more content like this, check out my academic blog at https://medium.com/engineer-quant

Owner
Vivek Palaniappan
Keen on finding effective solutions to complex problems - looking into the broad intersection between engineering, finance and AI.