Genshin Impact gacha data (原神抽卡记录数据集)

Last update: Dec 27, 2022

Related tags

Text Data & NLP genshin-impact

Overview

Summary

Genshin Impact gacha records are still being collected.

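You can export your gacha records as JSON with a gacha record export tool and send the JSON file to [email protected]. I will remove personal information and then add the file here. Either of the two export tools below will work.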

A gacha record export tool from sunfkny (usage demo video)

Another gacha record export tool, built on Electron, from lvlvl

The dataset currently contains 195,917 gacha records.

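Data usage terms

You may freely use this project's data, as an individual, for research on the gacha mechanics. You may also freely modify and publish my analysis code (although you would probably be better off rewriting it from scratch).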

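However, do not republish or merge the gacha dataset into other platforms. If that happens, anyone who later combines gacha data from several sources may run into serious data duplication. Please direct anyone who wants the data to download it from GitHub, or state that the data comes from this project.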

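Before drawing any conclusion from this dataset, ask yourself whether the process is rigorous and the conclusion credible. Do not publish gacha models that are obviously incorrect, or incorrect models that could cause harm. If harm results, the dataset maintainer and the players who contributed data bear no responsibility.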

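After some time studying the data, I have worked out essentially all of the Genshin Impact gacha mechanics:

Summary of all Genshin Impact gacha mechanics

Some tools for analyzing the gacha mechanics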

Data format

Entries in the dataset_02 folder are numbered sequentially starting from 0001.

Each folder contains the gacha records of one account.

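gacha100.csv holds the Beginners' Wish (初行者推荐祈愿) records

gacha200.csv holds the Standard Wish (常驻祈愿) records

gacha301.csv holds the Character Event Wish (角色活动祈愿) records

gacha302.csv holds the Weapon Event Wish (武器活动祈愿) records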
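The layout described above can be walked with a few lines of Python. This is a minimal sketch, not part of the project's own code: the function name `list_account_files` is hypothetical, and it assumes the per-account folders sit directly under the dataset root that you pass in (for example a local copy of dataset_02).

```python
from pathlib import Path

# File name -> banner, as listed in this section.
BANNER_FILES = {
    "gacha100.csv": "Beginners' Wish",
    "gacha200.csv": "Standard Wish",
    "gacha301.csv": "Character Event Wish",
    "gacha302.csv": "Weapon Event Wish",
}


def list_account_files(dataset_root):
    """Yield (account_folder, banner, csv_path) for each banner file that exists.

    Assumption: every sub-folder of dataset_root is one account.
    """
    root = Path(dataset_root)
    for account_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for file_name, banner in BANNER_FILES.items():
            csv_path = account_dir / file_name
            if csv_path.is_file():  # an account may lack some banners
                yield account_dir.name, banner, csv_path
```

Accounts that never pulled on a given banner may simply lack that file, so the sketch skips missing files rather than raising an error.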

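The records in each csv file have the following format:

抽卡时间 (pull time)	名称 (name)	类别 (type)	星级 (rarity)
YYYY-MM-DD HH:MM:SS	full item name	角色/武器 (character/weapon)	3/4/5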
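As an illustration of how this record format can be used, the sketch below counts how many pulls it took to reach each 5-star item. It is not the project's DataAnalysis.py. It assumes that each file is comma-separated, starts with a header row, has its columns in the order shown above, and lists pulls in chronological order. The function name `five_star_gaps` and the sample rows are made up for the example.

```python
import csv
import io


def five_star_gaps(csv_text):
    """Return the number of pulls needed to reach each 5-star item, in order."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row (assumed to be present)
    gaps = []
    pulls_since_last = 0
    for row in reader:
        if len(row) < 4:
            continue  # ignore blank or malformed lines
        pulls_since_last += 1
        if row[3].strip() == "5":  # fourth column is the rarity
            gaps.append(pulls_since_last)
            pulls_since_last = 0
    return gaps


# Made-up rows with placeholder item names, for illustration only.
sample = (
    "time,name,type,rarity\n"
    "2021-01-01 00:00:00,item A,weapon,3\n"
    "2021-01-01 00:00:01,item B,character,5\n"
    "2021-01-01 00:00:02,item C,weapon,3\n"
    "2021-01-01 00:00:03,item D,weapon,4\n"
    "2021-01-01 00:00:04,item E,character,5\n"
)
print(five_star_gaps(sample))  # [2, 3]
```

Reading the columns by position rather than by header name keeps the sketch independent of the exact header text. For a file on disk, read it with `open(path, encoding="utf-8").read()` first; the encoding is also an assumption.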

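Analysis tools

DataAnalysis.py analyzes the csv gacha files. The code is still being rewritten and is very hard to use, so treat it as a reference only. When run, it prints reference statistics and plots distribution charts. The theoretical values in the charts come from a probability growth model that I inferred from the actual data and some game files.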

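DistributionMatrix.py analyzes the pull probabilities and distributions of a designed model when 4-star and 5-star results are coupled. It is a powerful tool for computing a gacha model's overall probability and expected value.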
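To make the idea of a model's distribution and expected value concrete, here is a much simpler sketch than DistributionMatrix.py. It handles a single rarity only and ignores the 4-star/5-star coupling, so it is not the author's method. The names `pull_distribution`, `expected_pulls` and `toy_model` are hypothetical, and the probabilities in `toy_model` are invented for the example; they are not the game's rates.

```python
def pull_distribution(prob_at, max_pulls):
    """Return dist, where dist[n - 1] is the chance the first success is on pull n.

    prob_at(n) is the success probability on the n-th pull since the last
    success, so a probability that grows with n can be plugged in.
    """
    dist = []
    still_waiting = 1.0  # probability of no success so far
    for n in range(1, max_pulls + 1):
        p = prob_at(n)
        dist.append(still_waiting * p)
        still_waiting *= 1.0 - p
    return dist


def expected_pulls(dist):
    """Expected number of pulls until the first success."""
    return sum(n * q for n, q in enumerate(dist, start=1))


def toy_model(n):
    # Invented numbers: 50% on pulls 1 and 2, guaranteed on pull 3.
    return 1.0 if n >= 3 else 0.5


dist = pull_distribution(toy_model, 3)
print(dist)                  # [0.5, 0.25, 0.25]
print(expected_pulls(dist))  # 1.75
```

The expected value is only meaningful when `max_pulls` reaches a guaranteed pull, so that the distribution sums to 1.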
