trstop

Turkish Stop Words (Türkçe Dolgu Sözcükleri)

This repository contains the Turkish stop words that appear among the 10,000 most frequent Turkish words. To make it possible to test new candidate words in the future, it also includes a small Python script and a list of the 10,000 most frequent words.

Some Turkish stop words are also listed at https://github.com/sgsinclair/trombone/blob/master/src/main/resources/org/voyanttools/trombone/keywords/stop.tr.turkish-lucene.txt. However, some of the words in that list are not among the 10,000 most frequent words.
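The frequency check on candidate words could look roughly like the sketch below. This is not the script shipped in the repository: the function names are hypothetical, and the code assumes the frequency list holds one word per line, ordered from most to least frequent.

```python
def load_top_words(lines, limit=10_000):
    """Return the set of the first `limit` non-empty words from an iterable of lines.

    Assumption: one word per line, ordered from most to least frequent.
    """
    words = []
    for line in lines:
        word = line.strip()
        if word:
            words.append(word)
        if len(words) == limit:
            break
    return set(words)


def split_candidates(candidates, top_words):
    """Split candidates into (accepted, rejected) by membership in top_words."""
    accepted = [w for w in candidates if w in top_words]
    rejected = [w for w in candidates if w not in top_words]
    return accepted, rejected


# Tiny in-memory stand-in for the real 10,000-word frequency list.
sample_list = ["ve", "bir", "bu", "da", "için"]
top_words = load_top_words(sample_list, limit=4)
accepted, rejected = split_candidates(["bir", "için", "ancak"], top_words)
print(accepted, rejected)
```

With a real list, the lines would come from an open file object instead of `sample_list`. A candidate that falls outside the cutoff is rejected even if it is a plausible stop word, which matches the criterion stated above.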

To use the module:

import trstop

print(trstop.is_stop_word(parameter))
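A common use is dropping stop words from a list of tokens. The sketch below takes the predicate as an argument so that it runs without the package installed. The stand-in predicate and its three-word set are made up for illustration; passing `trstop.is_stop_word` instead assumes it accepts a single word and returns a truthy value for stop words.

```python
def remove_stop_words(tokens, is_stop_word):
    """Return the tokens for which is_stop_word(token) is falsy."""
    return [t for t in tokens if not is_stop_word(t)]


# Hypothetical stand-in for trstop.is_stop_word, for illustration only.
DEMO_STOP_WORDS = {"ve", "bir", "bu"}


def demo_is_stop_word(word):
    return word in DEMO_STOP_WORDS


tokens = "bu kitap ve bir kalem".split()
filtered = remove_stop_words(tokens, demo_is_stop_word)
print(filtered)
```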

Contributors:

Ahmet Aksoy
Toprak Öztürk

Stop words (Turkish: dolgu sözcükleri) are words that are used frequently but whose removal does not significantly change the meaning of the sentence they are taken from.

I used the term "dolgu sözcükleri" as the Turkish equivalent of "stop words". If there is a better option, I am ready to change it. The "turkce-stop-words-dict.py" script in the repository can be used to check usage frequency when new words are proposed for the list in the future.


Last updated: 29.06.2018


Owner: Ahmet Aksoy
