Improve current data preprocessing for FTM's WOB data to analyze Shell and Dutch Governmental contacts.

Overview

data-preprocessing_toogoodtogo_threatlines

We're the hackathon leftovers, but we are Too Good To Go ;-). A repo by Lukas Schubotz, Stef van Buuren, and Raymon van Dinter. We aim to improve current data preprocessing for FTM's WOB data to analyze Shell and Dutch Governmental contacts.

Synchronous visualisation of email threads

Publications from the FTM "Dossier SHELL papers" https://www.ftm.nl/dossier/shell-papers suggest that timing of events is critical in the interactions between actors. It would therefore be useful if we could visualise the mail exchanges in time.

The idea is to visualise threads of mail exchanges between actors over time. When this is done for multiple threads, the display would give rapid insight into the structure and timing of exchanges between actors. For example, suppose we are able to construct a single thread from "RE:" and "FW:" mails in the data. A simple visualisation would be

See https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.9825&rep=rep1&type=pdf for variations on this display, for example by adding the interactions between the actors by fancy arcs and resorting the mails according to actor pairs.

A generalisation to multiple simulataneous threads would stack multiple lines, similar to a dot plot. Such a design calls for relatively simple thread displays that are synchronised in time. Therefore we will concentrate on using a simple thread line that plots mail chronology against calender time.

A somewhat grander idea would be to create a "film of events". The user would place a cursor on the time axis, and scroll through time. The new information per mail is displayed as the cursor passes the send time of the email.

Issues to resolve

We need complex/advanced text processing. Some of the issues include:

  1. How can we split multiple emails in a RE/FW into a set of elementary mails, each corresponding to just one sender?
  2. How well can we form threads by matching on subject lines?
  3. Do duplicates extracted from RE/FW serve any useful purpose?
  4. What is the percentage of threads for which we can find the parent mail (the mail that started the thread)?

Experiment 1

The first design plots all thread lines between 2016 and 2020 on one chart.

Experiment 2

The second design uses trelliscopejs to plot the same information in smaller pieces.

The user can switch between 27 panes, each containing about 20 threads.

Try out the interactive version

Experiment 3

Back to figure 1, but now plotted with rbokeh, so that we may zoom and use tooltips (interaction not supported by GitHub markdown)

Owner
ASReview hackathon for Follow the Money
ASReview hackathon for Follow the Money
DSG - Source code for Digital Scholarship Grant project.

DSG Source code for Dr. Stephanie Tsang's Digital Scholarship Grant project. Work performed by Mr. Wang Minghao while as her Research Assistant. The s

1 Jan 04, 2022
Генератор отчетов на Python с использованием библиотеки docx для работы с word-файлами и запросов к сервису

Генератор отчетов на Python с использованием библиотеки docx для работы с word-файлами и запросов к сервису

Semyon Esaev 2 Jun 24, 2022
Linux Pressure Stall Information (PSI) Status App

Linux Pressure Stall Information (PSI) Status App psistat is a simple python3 program to display the PSIs and to capture/display exception events. psi

Joe D 3 Sep 18, 2022
Meower a social media platform written in Scratch 3.0 and Python

Meower Meower is a social media platform written in Scratch 3.0 and Python, ported to HTML for self-hosting. Try Beta 4.6 Changelog for 4.6 Start impl

Meower Media Co. 23 Dec 02, 2022
Minimalistic Gridworld Environment (MiniGrid)

Minimalistic Gridworld Environment (MiniGrid) There are other gridworld Gym environments out there, but this one is designed to be particularly simple

Maxime Chevalier-Boisvert 1.7k Jan 03, 2023
Vaksina - Vaksina COVID QR Validation Checker With Python

Vaksina COVID QR Validation Checker Vaksina is a general purpose library intende

Michael Casadevall 33 Aug 20, 2022
Scripts to integrate DFIR-IRIS, MISP and TimeSketch

Scripts to integrate DFIR-IRIS, MISP and TimeSketch

Koen Van Impe 20 Dec 16, 2022
Scitizen - Help scientific research for the benefit of mankind and humanity 🔬

Scitizen - Help scientific research for the benefit of mankind and humanity 🔬 Scitizen has been built from the ground up to give everyone the possibi

Pierre CORBEL 21 Mar 08, 2022
A basic animation modding workflow for FFXIV

AnimAssist Provides a quick and easy way to mod animations in FFXIV. You will need: Before anything, the VC++2012 32-bit Redist from here. Havok will

liam 37 Dec 16, 2022
Clackety Keyboards Powered by Python

KMK: Clackety Keyboards Powered by Python KMK is a feature-rich and beginner-friendly firmware for computer keyboards written and configured in Circui

KMK Firmware 780 Jan 03, 2023
Antchain-MPC is a library of MPC (Multi-Parties Computation)

Antchain-MPC Antchain-MPC is a library of MPC (Multi-Parties Computation). It include Morse-STF: A tool for machine learning using MPC. Others: Commin

Alipay 37 Nov 22, 2022
For when you really need to rank things

Comparisonator For when you really need to rank things. Do you know that feeling when there's this urge deep within you that tells you to compare thin

Maciej Wilczyński 1 Nov 01, 2021
eyes is a Public Opinion Mining System focusing on taiwanese forums such as PTT, Dcard.

eyes is a Public Opinion Mining System focusing on taiwanese forums such as PTT, Dcard. Features 🔥 Article monitor: helps you capture the trend at a

Sean 116 Dec 29, 2022
automate some stuff so I can be more noob

dota automate some stuff so I can be more noob This is a simple project, but one that I've wanted forever! I use pyautogui, time, smtplib and datetime

Aaron Allen 17 Oct 18, 2022
Certipy is a Python tool to enumerate and abuse misconfigurations in Active Directory Certificate Services (AD CS).

Certipy Certipy is a Python tool to enumerate and abuse misconfigurations in Active Directory Certificate Services (AD CS). Based on the C# variant Ce

ollypwn 1.3k Jan 01, 2023
Tool for running a high throughput data ingestion/transformation workload with MongoDB

Mongo Mangler The mongo-mangler tool is a lightweight Python utility, which you can run from a low-powered machine to execute a high throughput data i

Paul Done 9 Jan 02, 2023
Taking the fight to the establishment.

Throwdown Taking the fight to the establishment. Wat? I wanted a simple markdown interpreter in python and/or javascript to output html for my website

Trevor van Hoof 1 Feb 01, 2022
Fuzz introspector for python

Fuzz introspector High-level goals: Show fuzzing-relevant data about each function in a given project Show reachability of fuzzer(s) Integrate seamles

14 Mar 25, 2022
Create beautiful diagrams just by typing mathematical notation in plain text.

Penrose Penrose is an early-stage system that is still in development. Our system is not ready for contributions or public use yet, but hopefully will

Penrose 5.6k Jan 08, 2023
PyGo custom language, New but similar language programming

New but similar language programming. Now we are capable to program in a very similar language to Python but at the same time get the efficiency of Go.

Fernando Perez 4 Nov 19, 2022