Python KNN model: Predicting a probability of getting a work visa. Tableau: Non-immigrant visas over the years.

Last update: Nov 21, 2021

Overview

The value of international students to the United States, and the probability of obtaining a non-immigrant visa.

Project timeline: January 2021 to April 2021

Project team:

Zinaida Dvoskina (myself)
Kirill Ilin
Johnathan Conley
Cindy Ye Fung

Analyzed publicly available data on U.S. non-immigrant visa acquisition from two sources: the USCIS (the number of visas issued per country, category, political party in office, and year) and the U.S. Department of Labor Office of Foreign Labor Certification (employment-based immigration applications, including each application's received date, decision date, and the most recent date a case determination decision was issued).

Created a Tableau timelapse of the world map in which visa numbers can be filtered by region and country and compared across years. Other visualizations showed no strong trend suggesting that the political party in office affects a foreigner's likelihood of obtaining a visa.

Created a KNN classification model with the following predictors: Received month, Agent representing employer, Annual wage rate, Annual prevailing wage, PW wage level, H-1B dependent status, and Support H1B status. Almost 97% of the records in the datasets are approved visa applications. As a result, the models were heavily biased towards positive outcomes: they performed very well at predicting approvals, but they were not trustworthy overall.
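The project's own code is not reproduced here. As a rough illustration of how KNN classification works, the sketch below labels an application by majority vote among its k nearest neighbors in feature space. It is written in plain Python rather than with the library the project used, and the three features in the example (full-time position as 0/1, a wage level from 1 to 4, new employer as 0/1) and their encoding are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify one query point by majority vote of its k nearest training points."""
    # Sort all training points by distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [full_time, wage_level, new_employer]
train_X = [[1, 2, 0], [1, 3, 0], [1, 2, 1], [0, 1, 1], [0, 1, 0], [0, 2, 1]]
train_y = ["certified", "certified", "certified", "denied", "denied", "denied"]

print(knn_predict(train_X, train_y, [1, 2, 0]))  # certified
print(knn_predict(train_X, train_y, [0, 1, 1]))  # denied
```

Because KNN compares raw distances, continuous predictors such as the annual wage would in practice need to be rescaled (for example to the range 0 to 1) so they do not dominate the distance. An odd k also avoids tied votes between two classes.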

To address this, randomly removed data points so that the numbers of positive and negative outcomes were equal, which gives a less biased prediction. Due to limited computing power, the number of predictors had to be reduced to three (Full Time Position, PW, and New Employer), and the model was run only on 2020 data.
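The balancing step described above is a form of random undersampling: rows from the larger class are dropped at random until every class is the same size. A minimal sketch, assuming the data are held as parallel lists of feature rows and labels (the function name `undersample` and its `seed` parameter are illustrative, not taken from the project code):

```python
import random
from collections import defaultdict


def undersample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for row, label in zip(X, y):
        by_label[label].append(row)

    n_min = min(len(rows) for rows in by_label.values())
    X_bal, y_bal = [], []
    for label, rows in by_label.items():
        # Sample without replacement; the smallest class is kept in full.
        for row in rng.sample(rows, n_min):
            X_bal.append(row)
            y_bal.append(label)

    # Shuffle so that the classes are not grouped together in the output.
    order = list(range(len(y_bal)))
    rng.shuffle(order)
    return [X_bal[i] for i in order], [y_bal[i] for i in order]


# Toy data mirroring the ~97% approval rate: 97 approvals, 3 denials.
X = [[i] for i in range(100)]
y = ["certified"] * 97 + ["denied"] * 3
X_bal, y_bal = undersample(X, y)
print(len(y_bal), y_bal.count("certified"), y_bal.count("denied"))  # 6 3 3
```

The trade-off is that most of the majority class is discarded, which is one reason undersampled models can end up with modest accuracy.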

A new KNN model trained on the undersampled data produced predictions that were not biased towards a positive outcome. The chosen predictors had an impact on visa decisions, but only in approximately 60% of cases. Increasing the number of predictors could further improve the model.
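One way to check whether a model is biased towards approvals is to look at recall for each class rather than at overall accuracy alone. The sketch below (the helper names are hypothetical, not from the project code) shows why: on data that is 97% approved, a model that always predicts approval reaches 97% accuracy while catching none of the denials.

```python
def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its rows the model labeled correctly."""
    recalls = {}
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls[label] = hits / len(idx)
    return recalls


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# A trivial "always approve" model on 97 approvals and 3 denials.
y_true = ["certified"] * 97 + ["denied"] * 3
y_pred = ["certified"] * 100
print(accuracy(y_true, y_pred))           # 0.97
print(per_class_recall(y_true, y_pred))   # {'certified': 1.0, 'denied': 0.0}
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

On a balanced test set, plain accuracy and balanced accuracy coincide, so a figure such as 60% on undersampled data is not inflated by the majority class the way the original 97% split would inflate it.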

An interesting finding was that Software Engineer is the top job title for obtaining a work visa; however, it also has the most denials.

In this repository you can find our code, Tableau workbooks, the project report, and a presentation with our major findings. The data file is too large to upload here.
