Implementation of the bachelor's thesis "Real-time stock predictions with deep learning and news scraping".

Overview

Real-time stock predictions with deep learning and news scraping

This repository contains a partial implementation of my bachelor's thesis "Real-time stock predictions with deep learning and news scraping". The code has been built using PyTorch Lightning, read its documentation to get a complete overview of how this repository is structured.

Disclaimer: Neither the pipeline nor the model published in this repository are the ones used in the thesis. On the pipeline side, notice that the model tries to match headlines and prices of the same day, while in the thesis we used news published the day before. For the case of the model, the one shared here has nothing to do with the original and should be considered a toy model.

Preparing the data

The data used in the thesis has been completely crawled and put together from scratch. Specifically, you can find the titles and descriptions of the news published on Reuters.com from January 2010 to May 2018. In addition to that, you also have the stock prices (end of the day) of S&P 500 companies extracted from AlphaVantage.co. Everything is compressed in a H5DF file that you can download from this link.

The first step is to clone this repository and install its dependencies:

git clone https://github.com/davidalvarezdlt/bachelor_thesis.git
cd bachelor_thesis
pip install -r requirements.txt

Move both bachelor_thesis_data.hdf5 and word2vec.bin inside ./data. The resulting folder structure should look like this:

bachelor_thesis/
    bachelor_thesis/
    data/
        bachelor_thesis_data.hdf5
        word2vec.bin
    lightning_logs/
    .gitignore
    .pre-commit-config.yaml
    LICENSE
    README.md
    requirements.txt

Training the model

In short, you can train the model by calling:

python -m bachelor_thesis

You can modify the default parameters of the code by using CLI parameters. Get a complete list of the available parameters by calling:

python -m bachelor_thesis --help

For instance, if we want to train the model using GOOGL stock prices, with a batch size of 32 and using one GPUs, we would call:

python -m bachelor_thesis --symbol GOOGL --batch_size 32 --gpus 1

Every time you train the model, a new folder inside ./lightning_logs will be created. Each folder represents a different version of the model, containing its checkpoints and auxiliary files.

Testing the model

You can measure the loss and the accuracy of the model (number of times the prediction is correct) and store it in TensorBoard by calling:

python -m bachelor_thesis --test --test_checkpoint <test_checkpoint>

Where --test_checkpoint is a valid path to the model checkpoint that should be used.

Citation

If you use the data provided in this repository or if you find this thesis useful, please use the following citation:

@thesis{Alvarez2018,
    type = {Bachelor's Thesis},
    author = {David Álvarez de la Torre},
    title = {Real-time stock predictions with Deep Learning and news scrapping},
    school = {Universitat Politècnica de Catalunya},
    year = 2018,
}
Owner
David Álvarez de la Torre
Founder of @lemonplot. Alumni of UPC and ETH.
David Álvarez de la Torre
QilingLab challenge writeup

qiling lab writeup shielder 在 2021/7/21 發布了 QilingLab 來幫助學習 qiling framwork 的用法,剛好最近有用到,順手解了一下並寫了一下 writeup。 前情提要 Qiling 是一款功能強大的模擬框架,和 qemu user mode

Yuan 17 Nov 17, 2022
Optimized primitives for collective multi-GPU communication

NCCL Optimized primitives for inter-GPU communication. Introduction NCCL (pronounced "Nickel") is a stand-alone library of standard communication rout

NVIDIA Corporation 2k Jan 09, 2023
abess: Fast Best-Subset Selection in Python and R

abess: Fast Best-Subset Selection in Python and R Overview abess (Adaptive BEst Subset Selection) library aims to solve general best subset selection,

297 Dec 21, 2022
Hierarchical Memory Matching Network for Video Object Segmentation (ICCV 2021)

Hierarchical Memory Matching Network for Video Object Segmentation Hongje Seong, Seoung Wug Oh, Joon-Young Lee, Seongwon Lee, Suhyeon Lee, Euntai Kim

Hongje Seong 72 Dec 14, 2022
Python implementation of ADD: Frequency Attention and Multi-View based Knowledge Distillation to Detect Low-Quality Compressed Deepfake Images, AAAI2022.

ADD: Frequency Attention and Multi-View based Knowledge Distillation to Detect Low-Quality Compressed Deepfake Images Binh M. Le & Simon S. Woo, "ADD:

2 Oct 24, 2022
Machine Learning with JAX Tutorials

The purpose of this repo is to make it easy to get started with JAX. It contains my "Machine Learning with JAX" series of tutorials (YouTube videos and Jupyter Notebooks) as well as the content I fou

Aleksa Gordić 372 Dec 28, 2022
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )

Yolo v4, v3 and v2 for Windows and Linux (neural networks for object detection) Paper YOLO v4: https://arxiv.org/abs/2004.10934 Paper Scaled YOLO v4:

Alexey 20.2k Jan 09, 2023
YolactEdge: Real-time Instance Segmentation on the Edge

YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds. Specifically, YolactEdge runs at up to 30.8 FPS on a Jetson AGX Xavier (and 172.7

Haotian Liu 1.1k Jan 06, 2023
An open-source, low-cost, image-based weed detection device for fallow scenarios.

Welcome to the OpenWeedLocator (OWL) project, an opensource hardware and software green-on-brown weed detector that uses entirely off-the-shelf compon

Guy Coleman 145 Jan 05, 2023
A platform to display the carbon neutralization information for researchers, decision-makers, and other participants in the community.

Welcome to Carbon Insight Carbon Insight is a platform aiming to display the carbon neutralization roadmap for researchers, decision-makers, and other

Microsoft 14 Oct 24, 2022
MLSpace: Hassle-free machine learning & deep learning development

MLSpace: Hassle-free machine learning & deep learning development

abhishek thakur 293 Jan 03, 2023
Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. CVPR 2018

Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning Tensorflow code and models for the paper: Large Scale Fine-Grained Categ

Yin Cui 187 Oct 01, 2022
TuckER: Tensor Factorization for Knowledge Graph Completion

TuckER: Tensor Factorization for Knowledge Graph Completion This codebase contains PyTorch implementation of the paper: TuckER: Tensor Factorization f

Ivana Balazevic 296 Dec 06, 2022
An SMPC companion library for Syft

SyMPC A library that extends PySyft with SMPC support SyMPC /ˈsɪmpəθi/ is a library which extends PySyft ≥0.3 with SMPC support. It allows computing o

Arturo Marquez Flores 0 Oct 13, 2021
Python 3 module to print out long strings of text with intervals of time inbetween

Python-Fastprint Python 3 module to print out long strings of text with intervals of time inbetween Install: pip install fastprint Sync Usage: from fa

Kainoa Kanter 2 Jun 27, 2022
Deep-learning X-Ray Micro-CT image enhancement, pore-network modelling and continuum modelling

EDSR modelling A Github repository for deep-learning image enhancement, pore-network and continuum modelling from X-Ray Micro-CT images. The repositor

Samuel Jackson 7 Nov 03, 2022
YOLOv5 🚀 is a family of object detection architectures and models pretrained on the COCO dataset

YOLOv5 🚀 is a family of object detection architectures and models pretrained on the COCO dataset, and represents Ultralytics open-source research int

阿才 73 Dec 16, 2022
QA-GNN: Question Answering using Language Models and Knowledge Graphs

QA-GNN: Question Answering using Language Models and Knowledge Graphs This repo provides the source code & data of our paper: QA-GNN: Reasoning with L

Michihiro Yasunaga 434 Jan 04, 2023
Powerful unsupervised domain adaptation method for dense retrieval.

Powerful unsupervised domain adaptation method for dense retrieval

Ubiquitous Knowledge Processing Lab 191 Dec 28, 2022
Unsupervised Attributed Multiplex Network Embedding (AAAI 2020)

Unsupervised Attributed Multiplex Network Embedding (DMGI) Overview Nodes in a multiplex network are connected by multiple types of relations. However

Chanyoung Park 114 Dec 06, 2022