SAT Project - My first project at General Assembly, in which I performed EDA, cleaned the data, and created data visualizations

Project 1: Standardized Test Analysis by Adam Klesc

Overview

This project covers:

  • Basic statistics and probability
  • Many Python programming concepts
  • Programmatically interacting with files and directories
  • Visualizations
  • EDA
  • Working with Jupyter notebooks for development and reporting

The SAT is a standardized test that many colleges and universities in the United States require for their admissions process. SAT scores are used alongside other materials, such as grade point average (GPA) and essay responses, to determine whether a prospective student will be accepted to the university.

The SAT has two sections: Evidence-Based Reading and Writing, and Math (source).

The SAT changed its format in 2016 to address problems that students and schools had with its style of questions and questionable grading. Because of these problems, and a perception that the SAT's questions were "class-biased," the ACT surpassed the SAT in popularity. The ACT also gained ground because many states made it free statewide and a requirement to pass high school, a practice the SAT had not caught up on.


Problem Statement

When crafting my problem statement, I was initially stuck on what to analyze and which problem to solve. Looking at the yearly datasets simplified the process: once I saw how the SAT and ACT participation rates progressed from year to year, it clicked.

My problem statement, in one sentence, is:

Has the College Board solved the problems it was seeking to fix with the 2016 format change?

I considered three factors when writing my problem statement: who my audience is, what data I am using, and what value I am bringing to that audience.

  • The first factor was fairly easy to answer. The people most interested in the answer to my problem statement would be the College Board and those involved in the change, who would want to see whether it has actually worked.

  • The second factor was what data I would use. I primarily used the SAT 2017-2019 data along with the ACT 2017-2019 data, and added some outside sources over the course of the project.

  • The final factor was bringing clarity to the changes and to whether they have actually been productive in solving what they were meant to solve.


Datasets

Provided Data

These are the datasets I used from the provided data folder. I cleaned each of them and am sharing both the raw and cleaned versions.

Additional Data

These are the datasets I acquired through outside research of my own.


Data Dictionary

A dictionary compiling the primary columns from the two main datasets used throughout the analysis.

| Feature | Type | Dataset | Description |
|---|---|---|---|
| participation17 | float | SATTotal | Participation numbers for the 2017 SAT (in decimal percentage) |
| participation17 | float | ACTTotal | Participation numbers for the 2017 ACT (in decimal percentage) |
| participation18 | float | ACTTotal | Participation numbers for the 2018 ACT (in decimal percentage) |
| participation18 | float | SATTotal | Participation numbers for the 2018 SAT (in decimal percentage) |
| participation19 | float | SATTotal | Participation numbers for the 2019 SAT (in decimal percentage) |
| participation19 | float | ACTTotal | Participation numbers for the 2019 ACT (in decimal percentage) |
| total17 | int | SATTotal | Average total score for the 2017 SAT (median score) |
| total18 | int | SATTotal | Average total score for the 2018 SAT (median score) |
| total19 | int | SATTotal | Average total score for the 2019 SAT (median score) |
| composite17 | float | ACTTotal | Average total score for the 2017 ACT (mean score) |
| composite18 | float | ACTTotal | Average total score for the 2018 ACT (mean score) |
| composite19 | float | ACTTotal | Average total score for the 2019 ACT (mean score) |
| sat_free | boolean | ACTTotal/SATTotal | States that have the SAT free statewide |
| act_free | boolean | ACTTotal/SATTotal | States that have the ACT free statewide |
| population_rank | float | ACTTotal/SATTotal | Population rank for each state |
| gdp_rank_19 | float/int | ACTTotal/SATTotal | GDP rank for each state |
| state | object | SATTotal | The states where the data is located |
| state | object | ACTTotal | The states where the data is located |
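
The participation columns above are stored as decimal fractions. As a minimal sketch of how such a column could be produced during cleaning, the snippet below assumes the raw files store participation as percentage strings such as "5%"; that format, the helper name parse_participation, and the sample rows are assumptions for illustration and are not taken from the project notebooks.

```python
def parse_participation(raw):
    """Convert a percentage string such as '5%' into a decimal fraction (0.05)."""
    text = str(raw).strip().rstrip("%")
    return float(text) / 100


# Made-up rows for illustration only; these are NOT values from the project data.
raw_rows = [
    {"state": "State A", "participation17": "5%"},
    {"state": "State B", "participation17": "100%"},
    {"state": "State C", "participation17": "38%"},
]

# Build cleaned rows with participation expressed as a float between 0 and 1.
cleaned_rows = [
    {"state": row["state"],
     "participation17": parse_participation(row["participation17"])}
    for row in raw_rows
]

for row in cleaned_rows:
    print(row["state"], row["participation17"])
```

The same conversion could be applied to each participation column before writing the cleaned CSVs, so that every year is on the same 0 to 1 scale.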

Deliverables

Notebook that contains my data cleaning and the creation of new CSVs

cleaning_and_creating.ipynb

Notebook that contains my analysis and visualization

visualization_and_analysis.ipynb

Starter Code

starter-code.ipynb

ReadMe

README-ADAMK.md

Presentation

project_1_presentation.pdf


Technical Report Starter Code

I restructured the starter code notebook by creating separate notebooks for cleaning and for analysis and visualization. The starter code tasks are completed either in those two notebooks or in the starter code notebook itself.


Analysis

The analysis started with understanding the problem statement. To do that, I had to find out why the SAT changed, then look at each of those problems in isolation in the data to see whether the College Board has actually succeeded in solving the problems it tried to address with the change.

  • Increase participation to overtake the ACT in market share

  • Eliminate the stigma that the SAT is class-biased and unfair towards those with lower incomes

I narrowed it down to these two problems because of these two sources:

greentestprep

cnn.com

These sources helped clarify what the SAT was trying to achieve with the change and helped me in my data collection process.

  • Looking at the mean and median, it was evident that the SAT was steadily gaining popularity while the ACT was losing it. What makes the change a great success on the first problem is that the SAT was significantly more popular than the ACT in states with a higher population. The SAT has also been made free in three states since 2017 (the first full year of the SAT change): Colorado, West Virginia, and Illinois. New Hampshire, Michigan, and Connecticut also switched to the SAT in the year of the change. The change appears to have substantially improved participation and won states over to the test.

  • This lines up with outside sources reporting that the SAT surpassed the ACT in overall market share in 2019, with 2.2 million students to the ACT's 1.8 million.

  • Looking at the correlation between each state's GDP and SAT participation, it was clear that the change had not had the effect College Board executives wanted. The coefficient stayed the same despite the added help of a new Khan Academy course that was free to all and fee waivers (for college applications) for those who met a certain income threshold. source

  • While many studies disagree on whether the SAT is a class-biased test, many universities are clearly taking note of a potentially inherent class disadvantage in standardized testing as a whole. Some have banned it outright, not only during the pandemic but for the foreseeable future, the most prominent being the University of California.
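
The two checks described in the bullets above, comparing mean and median participation across years and comparing the correlation coefficient between GDP rank and participation, could be sketched roughly as follows. This is only an illustration: the source does not say which coefficient was used, so Pearson's r is an assumption here, the helper pearson_r is hypothetical, and the rows are made-up values rather than project data.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up rows for illustration only; these are NOT values from the project data.
rows = [
    {"gdp_rank_19": 1, "participation17": 0.60, "participation19": 0.70},
    {"gdp_rank_19": 2, "participation17": 0.50, "participation19": 0.60},
    {"gdp_rank_19": 3, "participation17": 0.30, "participation19": 0.40},
    {"gdp_rank_19": 4, "participation17": 0.10, "participation19": 0.20},
]

gdp_rank = [row["gdp_rank_19"] for row in rows]

# Summarize each year, then correlate participation with GDP rank.
for year in ("17", "19"):
    column = f"participation{year}"
    values = [row[column] for row in rows]
    print(
        column,
        "mean:", round(mean(values), 3),
        "median:", round(median(values), 3),
        "r vs GDP rank:", round(pearson_r(gdp_rank, values), 3),
    )
```

In this toy data every state's participation rises by the same amount, so the mean and median increase while the coefficient against GDP rank is unchanged. That is the pattern the analysis describes: participation grew, but its relationship to state wealth did not shift.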


In Conclusion

From my analysis, I concluded that the SAT has once again overtaken the ACT as the market leader in standardized testing, primarily because of the changes made to its format and structure. The changes made the test more appealing not only to students but also to states, a number of which made it free between the date of the change and 2019. This is evidenced by the SAT regaining its position as market leader while the ACT regressed after the SAT change. While solving participation was a massive success, both in the numbers and in how quickly it happened, the second problem is where the SAT still needs work.

The SAT change occurred partly because of the SAT's reputation as a class-biased test. The strange scoring of the 2400 scale and the test's detachment from the actual school work that students were learning led to the ACT being more popular. While the change did fix the participation problem, the lingering problem of class bias is still there. With more colleges pulling away from standardized testing, and with geographical location so heavily tied to participation and success, more changes are needed to rectify this issue and make the SAT a more equitable test.

Owner
Adam Muhammad Klesc
Hopeful data scientist. Currently in General Assembly and taking their data science immersive course!