☀️ Measuring the accuracy of BBC weather forecasts in Honolulu, USA

Overview

Accuracy of BBC Weather forecasts for Honolulu

This repository records the forecasts made by BBC Weather for the city of Honolulu, USA. Essentially, there's a GitHub Action that runs at each 30 minute mark and saves the latest forecasts. The data is stored in a separate branch called data. Therefore, the data is versioned. This allows going back into the past to see the forecasts that were made for any given hour in the (relative) future.

I made this after watching Git scraping, the five minute lightning talk by Simon Willison. It blew my mind! I agree with Simon that collecting and versioning API data via git is a powerful pattern. You could use this pattern to keep a ledger of any dynamic forecasting system, such as the predicted outcomes of football games. In these dynamical systems, the forecasts are updated when new information becomes available. Therefore, the forecasted values depend on the point in time when they were made. I think that it's super interesting to analyse how these forecasts evolve through time.

The build_database.py script iterates through all the commits in the data branch and consolidates the data into an SQLite database. You can run the script yourself by simply cloning this repository. Then, go into a terminal, navigate to the cloned repository, and install the necessary Python dependencies:

python -m venv .env
source .env/bin/activate
pip install -r requirements.txt

Then, run the consolidation script:

python build_database.py

This will create a bbc_weather.sqlite file. You can load the latter into your preferred database access tool — I have a personal preference for DataGrip — to analyse the data. At present, the database contains two tables:

forecasts

These are the predicted weather values made at one point in time for a future point in time.

issued_at at celsius feels_like_celsius wind_speed_kph
2021-03-10 09:00:00 2021-03-10 11:00:00 24 30 16
2021-03-10 09:00:00 2021-03-10 12:00:00 25 31 17
2021-03-10 09:00:00 2021-03-10 13:00:00 26 32 17
2021-03-10 09:00:00 2021-03-10 14:00:00 27 33 17
2021-03-10 09:00:00 2021-03-10 15:00:00 26 33 17

observations

These are the weather values that actually occurred — as opposed to those that were forecasted.

at celsius wind_speed_kph
2021-03-09 19:00:00 23 0
2021-03-09 20:00:00 22 8
2021-03-09 21:00:00 22 0
2021-03-09 22:00:00 21 9
2021-03-09 23:00:00 21 0

Check out measure_accuracy.sql for an example of how to evaluate the correctness of the forecasts.

Owner
Max Halford
Data wizard @alan-eu. PhD in machine learning applied to query optimization. Kaggle competitions Master when it was cool. Online machine learning nut.
Max Halford
Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classifi

186 Dec 24, 2022
Ecco is a python library for exploring and explaining Natural Language Processing models using interactive visualizations.

Visualize, analyze, and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BER

Jay Alammar 1.6k Dec 25, 2022
Blazing fast language detection using fastText model

Luga A blazing fast language detection using fastText's language models Luga is a Swahili word for language. fastText provides a blazing fast language

Prayson Wilfred Daniel 18 Dec 20, 2022
BookNLP, a natural language processing pipeline for books

BookNLP BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including: Part-of-speech taggin

654 Jan 02, 2023
Implementation of TF-IDF algorithm to find documents similarity with cosine similarity

NLP learning Trying to learn NLP to use in my projects! Table of Contents About The Project Built With Getting Started Requirements Run Usage License

Faraz Farangizadeh 3 Aug 25, 2022
Mirco Ravanelli 2.3k Dec 27, 2022
pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

297 Dec 29, 2022
Pretrain CPM - 大规模预训练语言模型的预训练代码

CPM-Pretrain 版本更新记录 为了促进中文自然语言处理研究的发展,本项目提供了大规模预训练语言模型的预训练代码。项目主要基于DeepSpeed、Megatron实现,可以支持数据并行、模型加速、流水并行的代码。 安装 1、首先安装pytorch等基础依赖,再安装APEX以支持fp16。 p

Tsinghua AI 37 Dec 06, 2022
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.

Linear Transformers Are Secretly Fast Weight Programmers This repository contains the code accompanying the paper Linear Transformers Are Secretly Fas

Imanol Schlag 77 Dec 19, 2022
p-tuning for few-shot NLU task

p-tuning_NLU Overview 这个小项目是受乐于分享的苏剑林大佬这篇p-tuning 文章启发,也实现了个使用P-tuning进行NLU分类的任务, 思路是一样的,prompt实现方式有不同,这里是将[unused*]的embeddings参数抽取出用于初始化prompt_embed后

3 Dec 29, 2022
End-to-end MLOps pipeline of a BERT model for emotion classification.

image source EmoBERT-MLOps The goal of this repository is to build an end-to-end MLOps pipeline based on the MLOps course from Made with ML, but this

Dimitre Oliveira 4 Nov 06, 2022
Beyond Paragraphs: NLP for Long Sequences

Beyond Paragraphs: NLP for Long Sequences

AI2 338 Dec 02, 2022
Creating a Feed of MISP Events from ThreatFox (by abuse.ch)

ThreatFox2Misp Creating a Feed of MISP Events from ThreatFox (by abuse.ch) What will it do? This will fetch IOCs from ThreatFox by Abuse.ch, convert t

17 Nov 22, 2022
मराठी भाषा वाचविण्याचा एक प्रयास. इंग्रजी ते मराठीचा शब्दकोश. An attempt to preserve the Marathi language. A lightweight and ad free English to Marathi thesaurus.

For English, scroll down मराठी शब्द मराठी भाषा वाचवण्यासाठी मी हा ओपन सोर्स प्रोजेक्ट सुरू केला आहे. माझ्या मते, आपली भाषा हळूहळू आणि कोणाचाही लक्षात

मुक्त स्त्रोत 20 Oct 11, 2022
This repository collects together basic linguistic processing data for using dataset dumps from the Common Voice project

Common Voice Utils This repository collects together basic linguistic processing data for using dataset dumps from the Common Voice project. It aims t

Francis Tyers 40 Dec 20, 2022
Ukrainian TTS (text-to-speech) using Coqui TTS

title emoji colorFrom colorTo sdk app_file pinned Ukrainian TTS 🐸 green green gradio app.py false Ukrainian TTS 📢 🤖 Ukrainian TTS (text-to-speech)

Yurii Paniv 85 Dec 26, 2022
Long text token classification using LongFormer

Long text token classification using LongFormer

abhishek thakur 161 Aug 07, 2022
SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

SNCSE SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples This is the repository for SNCSE. SNCSE aims to allev

Sense-GVT 59 Jan 02, 2023
Finally, some decent sample sentences

tts-dataset-prompts This repository aims to be a decent set of sentences for people looking to clone their own voices (e.g. using Tacotron 2). Each se

hecko 19 Dec 13, 2022
基于“Seq2Seq+前缀树”的知识图谱问答

KgCLUE-bert4keras 基于“Seq2Seq+前缀树”的知识图谱问答 简介 博客:https://kexue.fm/archives/8802 环境 软件:bert4keras=0.10.8 硬件:目前的结果是用一张Titan RTX(24G)跑出来的。 运行 第一次运行的时候,会给知

苏剑林(Jianlin Su) 65 Dec 12, 2022