🏆 • 5050 most frequent words in 109 languages

Last update: Nov 24, 2022

Overview

🏆 Most Common Words Multilingual

5000 most frequent words in 109 languages. Uses wordfrequency.info as a source.

🔗 License

source code license
data is released under different licenses, as it is taken from online sources. Feel free to contribute your own data!

🌐 Language	📁 File
Afrikaans (af)	.txt
Albanian (sq)	.txt
Amharic (am)	.txt
Arabic (ar)	.txt
Armenian (hy)	.txt
Azerbaijani (az)	.txt
Basque (eu)	.txt
Belarusian (be)	.txt
Bengali (bn)	.txt
Bosnian (bs)	.txt
Bulgarian (bg)	.txt
Catalan (ca)	.txt
Cebuano (ceb)	.txt
Chichewa (ny)	.txt
Chinese (simplified) (zh-CN)	.txt
Chinese (traditional) (zh-TW)	.txt
Corsican (co)	.txt
Croatian (hr)	.txt
Czech (cs)	.txt
Danish (da)	.txt
Dutch (nl)	.txt
English (en)	.txt
Esperanto (eo)	.txt
Estonian (et)	.txt
Filipino (tl)	.txt
Finnish (fi)	.txt
French (fr)	.txt
Frisian (fy)	.txt
Galician (gl)	.txt
Georgian (ka)	.txt
German (de)	.txt
Greek (el)	.txt
Gujarati (gu)	.txt
Haitian creole (ht)	.txt
Hausa (ha)	.txt
Hawaiian (haw)	.txt
Hebrew (iw)	.txt
Hindi (hi)	.txt
Hmong (hmn)	.txt
Hungarian (hu)	.txt
Icelandic (is)	.txt
Igbo (ig)	.txt
Indonesian (id)	.txt
Irish (ga)	.txt
Italian (it)	.txt
Japanese (ja)	.txt
Javanese (jw)	.txt
Kannada (kn)	.txt
Kazakh (kk)	.txt
Khmer (km)	.txt
Kinyarwanda (rw)	.txt
Korean (ko)	.txt
Kurdish (ku)	.txt
Kyrgyz (ky)	.txt
Lao (lo)	.txt
Latin (la)	.txt
Latvian (lv)	.txt
Lithuanian (lt)	.txt
Luxembourgish (lb)	.txt
Macedonian (mk)	.txt
Malagasy (mg)	.txt
Malay (ms)	.txt
Malayalam (ml)	.txt
Maltese (mt)	.txt
Maori (mi)	.txt
Marathi (mr)	.txt
Mongolian (mn)	.txt
Myanmar (my)	.txt
Nepali (ne)	.txt
Norwegian (no)	.txt
Odia (or)	.txt
Pashto (ps)	.txt
Persian (fa)	.txt
Polish (pl)	.txt
Portuguese (pt)	.txt
Punjabi (pa)	.txt
Romanian (ro)	.txt
Russian (ru)	.txt
Samoan (sm)	.txt
Scots gaelic (gd)	.txt
Serbian (sr)	.txt
Sesotho (st)	.txt
Shona (sn)	.txt
Sindhi (sd)	.txt
Sinhala (si)	.txt
Slovak (sk)	.txt
Slovenian (sl)	.txt
Somali (so)	.txt
Spanish (es)	.txt
Sundanese (su)	.txt
Swahili (sw)	.txt
Swedish (sv)	.txt
Tajik (tg)	.txt
Tamil (ta)	.txt
Tatar (tt)	.txt
Telugu (te)	.txt
Thai (th)	.txt
Turkish (tr)	.txt
Turkmen (tk)	.txt
Ukrainian (uk)	.txt
Urdu (ur)	.txt
Uyghur (ug)	.txt
Uzbek (uz)	.txt
Vietnamese (vi)	.txt
Welsh (cy)	.txt
Xhosa (xh)	.txt
Yiddish (yi)	.txt
Yoruba (yo)	.txt
Zulu (zu)	.txt
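
The table above links one .txt file per language. As a minimal sketch of how such a list might be consumed, the helper below reads a local copy and returns the first few entries. It assumes, without confirmation from the source, that each file holds one word per line ordered from most to least frequent. The function name `load_top_words` and the file name `en.txt` are hypothetical.

```python
from pathlib import Path


def load_top_words(path, limit=None, encoding="utf-8"):
    """Return words from a word-list file, keeping the file's order.

    Assumption (not confirmed by the source): one word per line,
    most frequent first. Blank lines are skipped.
    """
    words = []
    with Path(path).open(encoding=encoding) as handle:
        for line in handle:
            # Stop as soon as the requested number of words is collected.
            if limit is not None and len(words) >= limit:
                break
            word = line.strip()
            if word:
                words.append(word)
    return words


if __name__ == "__main__":
    # "en.txt" is a hypothetical local copy of the English list.
    sample = Path("en.txt")
    if sample.exists():
        print(load_top_words(sample, limit=10))
```

Reading line by line keeps memory use low and lets the function stop early when only the top few words are needed. If the real files use another layout (for example, a word followed by a count), the parsing step would need to change.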

Count the frequency of letters or words in a text file and show a graph.

Word Counter, by EBUS Coding Club. Count the frequency of letters or words in a text file and show a graph. Requirements: Python 3.9 or higher, matplotlib

0 Apr 9, 2022

This program translates English words to Portuguese

Python-Dictionary This program is used to translate English words to Portuguese. Web-Scraping This program uses BeautifulSoup for web scraping, so

1 Oct 10, 2022

Python-powered crossword generator with a database of 20k+ Polish words

crossword_generator Generate simple crossword puzzle from words and definitions fetched from krzyżowki.edu.pl endpoints -/ string:word - returns js

0 Jan 4, 2022

This project is based on NLTK. It generates a random word from a predefined list of words, then reads out the word, its meaning with part of speech, its antonyms, and its synonyms

This project is based on NLTK (Natural Language Toolkit). It generates a random word from a predefined list of words, then reads out the word, its meaning with part of speech, its antonyms, and its synonyms

2 Nov 17, 2021

Russian words synonyms and antonyms

ru_synonyms Russian words synonyms and antonyms. Install pip install git+https://github.com/ahmados/rusynonyms.git Usage from ru_synonyms import Anto

7 Dec 14, 2022

This project lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques

Unsupervised technique to Glossary and Definition Extraction Code Files GPT2-DefinitionModel.ipynb - GPT-2 model for definition generation. Data_Gener

28 May 25, 2021

Turkish Stop Words Türkçe Dolgu Sözcükleri

trstop Turkish Stop Words Türkçe Dolgu Sözcükleri In this repository I put Turkish stop words that are contained in the first 10 thousand words with th

103 Nov 12, 2022

The ability of computer software to identify words and phrases in spoken language and convert them to human-readable text

speech-recognition-py Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to huma

1 Apr 3, 2022

Words_And_Phrases - Just a repo for useful words and phrases that might come in handy in some scenarios. Feel free to add yours

Words_And_Phrases Just a repo for useful words and phrases that might come in handy in some scenarios. Feel free to add yours Abbreviations Abbreviation

1 Feb 1, 2022

Comments

build(deps): bump certifi from 2021.10.8 to 2022.12.7
Bumps certifi from 2021.10.8 to 2022.12.7.

Commits

9e9e840 2022.12.07

b81bdb2 2022.09.24

939a28f 2022.09.14

aca828a 2022.06.15.2

de0eae1 Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...

b8eb5e9 2022.06.15.1

47fb7ab Fix deprecation warning on Python 3.11 (#199)

b0b48e0 fixes #198 -- update link in license

9d514b4 2022.06.15

4151e88 Add py.typed to MANIFEST.in to package in sdist (#196)

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
build(deps): bump numpy from 1.21.4 to 1.22.0
Bumps numpy from 1.21.4 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update __init__.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

Releases(0.1.0)

0.1.0(Dec 19, 2021)

Data comes from wordfrequency.info.
Source code(tar.gz)
Source code(zip)
all.json(13.20 MB)
most-common-words.zip(2.24 MB)

Owner

🃏 effectively learn new languages by using cool methods, such as flashcards and most common words!

GitHub Repository

Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

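English | 简体中文 ERNIE is a knowledge-enhanced continual-learning framework for semantic understanding, pioneered by Baidu. It combines large-scale pre-training with rich knowledge from multiple sources and, through continual learning, keeps absorbing lexical, structural, and semantic knowledge from massive text data so that model performance keeps improving. ERNIE has achieved SOTA results on more than 40 typical NLP tasks, and on G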

5.4k Jan 03, 2023

Coreference resolution for English, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, msg systems ag 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 German 1.2.3 Polish 1

169 Dec 21, 2022

Data manipulation and transformation for audio signal processing, powered by PyTorch

torchaudio: an audio library for PyTorch The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the

1.9k Jan 08, 2023

🎐 a Python library for doing approximate and phonetic matching of strings.

jellyfish Jellyfish is a Python library for doing approximate and phonetic matching of strings. Written by James Turk

1.8k Dec 21, 2022

ASCEND Chinese-English code-switching dataset

ASCEND (A Spontaneous Chinese-English Dataset) introduces a high-quality resource of spontaneous multi-turn conversational dialogue Chinese-English code-switching corpus collected in Hong Kong.

11 Dec 09, 2022

An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode.

WordleSolver An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode. How to use the program Copy this proje

3 Mar 02, 2022

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Simplemma: a simple multilingual lemmatizer for Python Purpose Lemmatization is the process of grouping together the inflected forms of a word so they

70 Dec 29, 2022

Tool to add main subject to items on Wikidata using WMF's CirrusSearch for named entity recognition or a manually supplied list of QIDs

ItemSubjector Tool made to add main subject statements to items based on the title using a home-brewed CirrusSearch-based Named Entity Recognition alg

9 Nov 17, 2022

Resources for "Natural Language Processing" Coursera course.

Natural Language Processing course resources This github contains practical assignments for Natural Language Processing course by Higher School of Eco

1.1k Jan 01, 2023

🍊 PAUSE (Positive and Annealed Unlabeled Sentence Embedding), accepted by EMNLP'2021 🌴

PAUSE: Positive and Annealed Unlabeled Sentence Embedding Sentence embedding refers to a set of effective and versatile techniques for converting raw

21 Dec 15, 2022

Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

BP-Transformer This repo contains the code for our paper BP-Transformer: Modeling Long-Range Context via Binary Partition Zihao Ye, Qipeng Guo, Quan G

119 Nov 14, 2022

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

464 Jan 04, 2023

Simple program that translates the name of files into English

Simple program that translates the name of files into English. Useful for when editing/inspecting programs that were developed in a foreign language.

0 Dec 22, 2021

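A baseline for short-text semantic matching in Xiaobu Assistant dialogues

oppo-text-match A baseline for short-text semantic matching in Xiaobu Assistant dialogues. Model reference: https://kexue.fm/archives/8213 The base version scores about 0.952 offline and 0.866 online (single model, without K-fold ensembling). Training and test environment: tensorflow 1.15 + keras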

132 Dec 14, 2022

String Gen + Word Checker

Creates random strings and checks whether any of them are real words. Mostly a waste of time ngl, but it is cool to see it work, and it can generate a real random word within 10 sec

1 Jan 06, 2022

Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

2.1k Jan 01, 2023

Code to use Augmented Shapiro Wilks Stopping, as well as code for the paper "Statistically Significant Stopping of Neural Network Training"

This codebase is being actively maintained, please create an issue if you have issues using it Basics All data files are included under losses and ea

32 Nov 09, 2021