An open collection of annotated voices in Japanese language

Last update: Dec 14, 2022

Related tags

Text Data & NLP koniwa

Overview

Koniwa (声庭): An open collection of annotated voices in Japanese language

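Summary

Koniwa (声庭) is an open collection of speech recordings and annotations that anyone is free to use, modify, and redistribute.
(Commercial use is also permitted.)

Annotation work has only just begun. Contributions are welcome.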

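File links

sound: audio data (Google Drive)
source: reference data (Google Drive): original texts and other materials useful as references during annotation
data: bibliographic information and annotation data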

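Series

This collection currently draws on the following open audio data. We are deeply grateful to everyone involved in making it public.

amagasaki: CC BY 4.0
- April 2011 to November 2015
- Radio programs from Amagasaki City, Hyogo Prefecture (FMあまがさき)
  - いなむら市長の「ひと咲きまち咲きあまがさき」
  - いなむら市長の「い～なこの街あまがさき」 (retitled as of November 2014)
free_culture_2012: CC BY 3.0
- August 2012
- J-WAVE radio program "J-WAVE 360° Forum 〜Seek and Find〜"
higashiyodogawa: CC BY 4.0
- November 2017 to July 2021
- Audio edition of 「広報ひがしよどがわ」, the public bulletin of Higashiyodogawa Ward, Osaka City
librivox: Public domain
- Works recorded on LibriVox.org
- Some items, such as songs, are excluded
minato: CC BY 4.0
- May 2019 to December 2020
- Audio edition of 「広報みなと」, the public bulletin of Minato Ward, Osaka City
nishiyodogawa: CC BY 4.0
- August 2018 to July 2021
- Audio edition of 「きらり☆にしよど」, the public bulletin of Nishiyodogawa Ward, Osaka City
roudoku_toshokan: CC BY 2.1 JP (the original texts are in the public domain)
- Recorded readings distributed by 池田英生's 朗読図書館
tnc: CC BY 3.0 (the original texts are in the public domain)
- Recorded readings by announcers of テレビ西日本 (TNC)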

Licence

Licences for original texts and audio

This collection includes only audio released under one of the following licences.

Public domain
- PDM
- CC0
Creative Commons
- CC BY

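Licences for annotations and documents

All of the following are licensed under CC0 1.0:

For annotations that constitute derivative works, the derivative portion
Original works in this repository, such as annotation comments and the annotation manual (excluding programs)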

Licence for programs

Programs are licensed under the Apache License 2.0.

Maintainer

shirayu

Owner

Koniwa project
