Block fingerprinting for the beacon chain, for client identification & client diversity metrics

Overview

blockprint

This is a repository for discussion and development of tools for Ethereum block fingerprinting.

The primary aim is to measure beacon chain client diversity using on-chain data, as described in this tweet:

https://twitter.com/sproulM_/status/1440512518242197516

The latest estimate using the improved k-NN classifier for slots 2048001 to 2164916 is:

Getting Started

The raw data for block fingerprinting needs to be sourced from Lighthouse's block_rewards API.

This is a new API that is currently only available on the block-rewards-api branch, i.e. this pull request: https://github.com/sigp/lighthouse/pull/2628

Lighthouse can be built from source by following the instructions here.

VirtualEnv

All Python commands should be run from a virtualenv with the dependencies from requirements.txt installed.

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

k-NN Classifier

The best classifier implemented so far is a k-nearest neighbours classifier in knn_classifier.py.

It requires a directory of structered training data to run, and can be used either via a small API server, or in batch mode.

You can download a large (886M) training data set here.

To run in batch mode against a directory of JSON batches (individual files downloaded from LH), use this command:

./knn_classifier.py training_data_proc data_to_classify

Expected output is:

classifier score: 0.9886800869904645
classifying rewards from file slot_2048001_to_2050048.json
total blocks processed: 2032
Lighthouse,0.2072
Nimbus or Prysm,0.002
Nimbus or Teku,0.0025
Prysm,0.6339
Prysm or Teku,0.0241
Teku,0.1304

Training the Classifier

The classifier is trained from a directory of reward batches. You can fetch batches with the load_blocks.py script by providing a start slot, end slot and output directory:

./load_blocks.py 2048001 2048032 testdata

The directory testdata now contains 1 or more files of the form slot_X_to_Y.json downloaded from Lighthouse.

To train the classifier on this data, use the prepare_training_data.py script:

./prepare_training_data.py testdata testdata_proc

This will read files from testdata and write the graffiti-classified training data to testdata_proc, which is structured as directories of single block reward files for each client.

$ tree testdata_proc
testdata_proc
├── Lighthouse
│   ├── 0x03ae60212c73bc2d09dd3a7269f042782ab0c7a64e8202c316cbcaf62f42b942.json
│   └── 0x5e0872a64ea6165e87bc7e698795cb3928484e01ffdb49ebaa5b95e20bdb392c.json
├── Nimbus
│   └── 0x0a90585b2a2572305db37ef332cb3cbb768eba08ad1396f82b795876359fc8fb.json
├── Prysm
│   └── 0x0a16c9a66800bd65d997db19669439281764d541ca89c15a4a10fc1782d94b1c.json
└── Teku
    ├── 0x09d60a130334aa3b9b669bf588396a007e9192de002ce66f55e5a28309b9d0d3.json
    ├── 0x421a91ebdb650671e552ce3491928d8f78e04c7c9cb75e885df90e1593ca54d6.json
    └── 0x7fedb0da9699c93ce66966555c6719e1159ae7b3220c7053a08c8f50e2f3f56f.json

You can then use this directory as the first argument to ./knn_classifier.py.

Classifier API

With pre-processed training data installed in ./training_data_proc, you can host a classification API server like this:

gunicorn --reload api_server --timeout 1800

It will take a few minutes to start-up while it loads all of the training data into memory.

Initialising classifier, this could take a moment...
Start-up complete, classifier score is 0.9886800869904645

Once it has started up, you can make POST requests to the /classify endpoint containing a single JSON-encoded block reward. There is an example input file in examples.

curl -s -X POST -H "Content-Type: application/json" --data @examples/single_teku_block.json "http://localhost:8000/classify"

The response is of the following form:

{
  "block_root": "0x421a91ebdb650671e552ce3491928d8f78e04c7c9cb75e885df90e1593ca54d6",
  "best_guess_single": "Teku",
  "best_guess_multi": "Teku",
  "probability_map": {
    "Lighthouse": 0.0,
    "Nimbus": 0.0,
    "Prysm": 0.0,
    "Teku": 1.0
  }
}
  • best_guess_single is the single client that the classifier deemed most likely to have proposed this block.
  • best_guess_multi is a list of 1-2 client guesses. If the classifier is more than 95% sure of a single client then the multi guess will be the same as best_guess_single. Otherwise it will be a string of the form "Lighthouse or Teku" with 2 clients in lexicographic order. 3 client splits are never returned.
  • probability_map is a map from each known client label to the probability that the given block was proposed by that client.

TODO

  • Improve the classification algorithm using better stats or machine learning (done, k-NN).
  • Decide on data representations and APIs for presenting data to a frontend (done).
  • Implement a web backend for the above API (done).
  • Polish and improve all of the above.
Owner
Sigma Prime
Blockchain & Information Security Services
Sigma Prime
WhyNotWin11 - Detection Script to help identify why your PC isn't Windows 11 Release Ready

WhyNotWin11 - Detection Script to help identify why your PC isn't Windows 11 Release Ready

Robert C. Maehl 5.9k Dec 31, 2022
msImpersonate - User account impersonation written in pure Python3

msImpersonate v1.0 msImpersonate is a Python-native user impersonation tool that is capable of impersonating local or network user accounts with valid

Joe Helle 90 Dec 16, 2022
(Pre-)compromise operations for MITRE CALDERA

(Pre-)compromise operations for CALDERA Extend your CALDERA operations over the entire adversary killchain. In contrast to MITRE's access plugin, cald

Diederik Bakker 3 Aug 22, 2022
The update manager for the ERA App (era.sh)

ERA Update Manager This is the official update manager used in the ERA app (see era.sh) How it works Once a new version of ERA is available, the app l

Kian Shahriyari 1 Dec 29, 2021
Automatically remove user join messages when the user leaves the server.

CleanLeave Automatically remove user join messages when the user leaves the server. Installation You will need to install poetry to run this bot local

11 Sep 19, 2022
A webapp for taking fast notes, designed for business, school, and collaboration with groups.

JOTS Journal of the Session A webapp for taking fast notes, designed for business, school, and collaboration with groups.

Zebadiah S. Taylor 2 Jun 10, 2022
Some basic sorting algos

Sorting-Algos Some basic sorting algos HacktoberFest 2021 This repository consists of mezzo-level projects that undertake a simple task and perform it

Manthan Ghasadiya 7 Dec 13, 2022
Bionic is Python Framework for crafting beautiful, fast user experiences for web and is free and open source.

Bionic is Python Framework for crafting beautiful, fast user experiences for web and is free and open source. Getting Started This is an example of ho

14 Apr 10, 2022
a wordle-solver written in python

Wordle Solver Overview This is yet another wordle solver. It is built with the word list of the official wordle website, but it should also work with

Shoubhit Dash 10 Sep 24, 2022
Implemented Exploratory Data Analysis (EDA) using Python.Built a dashboard in Tableau and found that 45.87% of People suffer from heart disease.

Heart_Disease_Diagnostic_Analysis Objective 🎯 The aim of this project is to use the given data and perform ETL and data analysis to infer key metrics

Sultan Shaikh 4 Jan 28, 2022
Python dictionaries with advanced dot notation access

from box import Box movie_box = Box({ "Robin Hood: Men in Tights": { "imdb stars": 6.7, "length": 104 } }) movie_box.Robin_Hood_Men_in_Tights.imdb_s

Chris Griffith 2.1k Dec 28, 2022
Demo Python project using Conda and Poetry

Conda Poetry This is a demonstration of how Conda and Poetry can be used in a Python project for dev dependency management and production deployment.

Ryan Allen 2 Apr 26, 2022
A web interface for a soft serve Git server.

Soft Serve monitor Soft Sevre is a very nice git server. It offers a really nice TUI to browse the repositories on the server. Unfortunately, it does

Maxime Bouillot 5 Apr 26, 2022
This Python library searches through a static directory and appends artist, title, track number, album title, duration, and genre to a .json object

This Python library searches through a static directory (needs to match your environment) and appends artist, title, track number, album title, duration, and genre to a .json object. This .json objec

Edan Ybarra 1 Jun 20, 2022
Its a simple and fun to use application. You can make your own quizes and send the lik of the quiz to your friends.

Quiz Application Its a simple and fun to use application. You can make your own quizes and send the lik of the quiz to your friends. When they would a

Atharva Parkhe 1 Feb 23, 2022
A package selector for building your confy nest

Hornero A package selector for building your comfy nest About Hornero helps you to install your favourite packages on your fresh installed Linux distr

Santiago Soler 1 Nov 22, 2021
A simple PID tuner and simulator.

PIDtuner-V0.1 PlantPy PID tuner version 0.1 Features Supports first order and ramp process models. Supports Proportional action on PV or error or a sp

3 Jun 23, 2022
Url-check-migration-python - A python script using Apica API's to migrate URL checks between environments

url-check-migration-python A python script using Apica API's to migrate URL chec

Angelo Aquino 1 Feb 16, 2022
Data wrangling & common calculations for results from qMem measurement software

qMem Datawrangler This script processes output of qMem measurement software into an Origin ® compatible *.csv files and matplotlib graphs to quickly v

Julian 1 Nov 30, 2021
A repository of study materials related to Think Python 2nd Edition by Allen B. Downey. More information about the book can be found here: https://greenteapress.com/wp/think-python-2e/

Intro-To-Python This content is based on the book Think Python 2nd Edition by Allen B. Downey. More information about the book can be found here: http

Brent Eskridge 63 Jan 07, 2023