Creating a statistical model to predict 10-year Treasury yields

Overview

Predicting 10-Year Treasury Yields

Initially, I wanted to see whether stock market volatility, represented by the VIX index (data source), had a tangible impact on 10-Year Treasury yields (data source). Below are the results of my exploration of the VIX's effect on 10Y yields:

Line Graph Comparing VIX Price and Yield over the last 31 years

VIX and Yield TS

As the graph above shows, the two series do not appear strongly correlated when comparing their annual trends. Yields have dropped quite dramatically over the last 31 years, with little visible reaction to major changes in volatility. The VIX, meanwhile, has swung much more widely, with plenty of large ups and downs. A time-series view can hide a relationship, though, so a scatter plot with a fitted regression line is a better check.

VIX vs. Yield Scatter Plot

VIX vs. Yield

The red line in the scatter plot is the fitted regression line. It slopes downward, indicating a negative relationship: when stock market volatility goes up, 10Y Treasury yields tend to go down. The regression equation, 10-Year Treasury Yield = 4.71 - 0.02(VIX Price), indicates that a one-point increase in the VIX is associated with a 0.02 percentage point decrease in the yield. Since the VIX will never be 0, it does not make sense to interpret the y-intercept of 4.71. The nonzero slope suggests that the VIX may affect the yield, but a slope alone does not establish statistical significance; for that, the t-statistic is needed.
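As a rough illustration of how such a regression line is obtained, the sketch below fits y = intercept + slope * x by ordinary least squares in plain Python. The function name `fit_line` and the data points are hypothetical: the points are constructed to lie exactly on the reported line, so this is not the actual VIX or yield data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical VIX levels; yields are generated from the reported equation,
# so the fit should recover its coefficients (illustration only).
vix = [10.0, 20.0, 30.0, 40.0]
yields = [4.71 - 0.02 * v for v in vix]

intercept, slope = fit_line(vix, yields)
print(round(intercept, 2), round(slope, 2))
```

With real data the points scatter around the line, and the same formulas give the line that minimizes the sum of squared vertical distances.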

Stata Analysis

I therefore ran some statistical analysis in Stata, contained here. The first regression was of 10Y yields on VIX Price, to see whether stock volatility had a statistically significant effect on yields. At the 5% significance level, the t-statistic of a coefficient must be either above 1.96 or below -1.96 for the coefficient to be considered significant. In this case the t-statistic was -1.46, so stock volatility was not statistically significant in this simple regression.
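The decision rule above can be sketched in a few lines of Python. This is not the Stata output: the function names are hypothetical, and the p-value uses a standard normal approximation, which is the large-sample assumption behind the 1.96 cutoff.

```python
from statistics import NormalDist


def is_significant(t_stat, critical_value=1.96):
    """Two-sided test: significant if |t| exceeds the critical value."""
    return abs(t_stat) > critical_value


def two_sided_p_value(t_stat):
    """Approximate two-sided p-value, assuming a standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


t_vix = -1.46  # t-statistic reported for the simple VIX regression
print(is_significant(t_vix))
print(round(two_sided_p_value(t_vix), 3))
```

Under this approximation, a t-statistic of exactly ±1.96 corresponds to a p-value of about 0.05, which is why the two ways of stating the threshold agree.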

...Not so fast. One issue with simplifying trends in this way is that omitted variables can strongly affect the estimated significance of the variables that are included. I therefore added 4 more key macroeconomic datasets: unemployment rate, interest rate, change in CPI, and inflationary expectations. With these 4 key parts of the economy accounted for, I ran another regression of the yield on all of the variables.

The new results were quite interesting. I had expected the change in CPI and inflationary expectations to be important factors, but both turned out to be statistically insignificant: the t-statistic was 0.12 for change in CPI and -1.71 for inflationary expectations, neither beyond the ±1.96 thresholds. On the other hand, the t-statistic for VIX Price moved to -3.49, making it statistically significant; some of the added variables had been masking the effect of volatility in the simple regression. The unemployment rate and interest rate were both statistically significant, with t-statistics of 10.99 and 37.20 respectively. Overall, my model explained 80.19% of the variation in the 10-Year Treasury yield.
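To illustrate how an omitted variable can hide an effect, here is a minimal sketch on synthetic data (not the project's dataset). The variable names `x1` and `x2`, the helper functions, and the generating coefficients are all assumptions made for the example. The outcome truly depends negatively on `x1`, but `x1` is positively correlated with `x2`, so leaving `x2` out distorts the estimated slope on `x1`.

```python
def centered_cross(a, b):
    """Sum of products of deviations from the mean for two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))


def simple_slope(x, y):
    """OLS slope from regressing y on a single regressor x."""
    return centered_cross(x, y) / centered_cross(x, x)


def two_regressor_slopes(x1, x2, y):
    """OLS slopes from regressing y on x1 and x2 together (closed form)."""
    s11 = centered_cross(x1, x1)
    s22 = centered_cross(x2, x2)
    s12 = centered_cross(x1, x2)
    s1y = centered_cross(x1, y)
    s2y = centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Synthetic data: y = 3 - 0.5 * x1 + 1.0 * x2, with x1 and x2 positively correlated
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.0 - 0.5 * a + 1.0 * b for a, b in zip(x1, x2)]

slope_alone = simple_slope(x1, y)           # x2 omitted: slope comes out positive
b1, b2 = two_regressor_slopes(x1, x2, y)    # x2 included: recovers -0.5 and 1.0
print(round(slope_alone, 3), round(b1, 3), round(b2, 3))
```

In this toy case the simple regression even gets the sign wrong, while the two-regressor fit recovers the negative coefficient. The actual regressions here show a milder version of the same pattern: the VIX coefficient only becomes significant once the other variables are included.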

Interest Rate vs. 10-Year Treasury Yield Graph

ir vs. yield

Having seen the scatter plot of a variable that was statistically insignificant in the simple regression, I wanted to plot an extremely significant variable to see the contrast. There is a clear positive relationship between the interest rate and the 10-Year Treasury yield. The regression line, 10-Year Treasury Yield = 2.31 + 0.73(Interest Rate), indicates that a 1 percentage point increase in the interest rate is associated with a 0.73 percentage point increase in the yield. It is possible for rates to come down to 0, so the y-intercept is meaningful here: the 10Y Treasury Note is predicted to yield 2.31% when the interest rate hits 0. The contrast between the two red regression lines, and between the spread of the points in the two scatter plots, shows how differently the two variables relate to the yield.
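The reported line can be turned into a small prediction helper. This is only a sketch of the arithmetic: `predict_yield` is a hypothetical name, and the coefficients are the rounded values quoted above, so predictions inherit that rounding.

```python
def predict_yield(interest_rate, intercept=2.31, slope=0.73):
    """Predicted 10Y yield (in %) for an interest rate (in %), using the reported line."""
    return intercept + slope * interest_rate


for rate in (0.0, 1.0, 5.0):
    print(rate, round(predict_yield(rate), 2))
```

Each additional percentage point in the interest rate raises the predicted yield by the slope, 0.73 percentage points, and a rate of 0 returns the intercept.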

Project instructions

10Y Treasury data citation:

OECD, "Main Economic Indicators - complete database", Main Economic Indicators (database), http://dx.doi.org/10.1787/data-00052-en (October 23, 2021) Copyright, 2016, OECD. Reprinted with permission.

Change in CPI data citation:

OECD, "Main Economic Indicators - complete database", Main Economic Indicators (database), http://dx.doi.org/10.1787/data-00052-en (October 23, 2021) Copyright, 2016, OECD. Reprinted with permission.

Inflation Expectation data citation:

Surveys of Consumers, University of Michigan, University of Michigan: Inflation Expectation© [MICH], retrieved from FRED, Federal Reserve Bank of St. Louis https://fred.stlouisfed.org/series/MICH/, (October 23, 2021)
