Proposed n-stage Latent Dirichlet Allocation method - A Novel Approach for LDA

Overview

n-stage Latent Dirichlet Allocation (n-LDA)

Proposed n-LDA & A Novel Approach for classical LDA

Latent Dirichlet Allocation (LDA) is a generative probabilistic topic model for a given text collection. Topics have a probability distribution over words and text documents over topics. Each subject has a probability distribution over the fixed word corpus [1]. The method exemplifies a mix of these topics for each document. Then, a model is produced by sampling words from this mixture [2].

The coherence value, which is the topic modeling criterion, is used to determine the number of K topic in the system. The coherence value calculates the closeness of words to each other. The topic value of the highest one among the calculated consistency values is chosen as the topic number of the system [3].

After modeling the system with classical LDA, an LDA-based n-stage method is proposed to increase the success of the model. The value of n in the method may vary according to the size of the dataset. With the method, it is aimed to delete the words in the corpus that negatively affect the success. Thus, with the increase in the weight values of the words in the topics formed with the remaining words, the class labels of the topics can be determined more easily [4].

image

The steps of the method are shown in above Figure. In order to reduce the number of words in the dictionary, the threshold value for each topic is calculated. The threshold value is obtained by dividing the sum of the weights of all the words to the word count in the relevant topic. Words with a weight less than the specified threshold value are deleted from the topics and a new dictionary is created for the model. Finally, the system is re-modeled using the LDA algorithm with the new dictionary. These steps can be repeated n times [4].

This method was applied for Turkish and English language. n-stage LDA method was better than classic LDA according to related studies.

Related papers & articles for n-stage LDA

!!! Please citation first paper:

@inproceedings{guven2019comparison,
  title={Comparison of Topic Modeling Methods for Type Detection of Turkish News},
  author={G{\"u}ven, Zekeriya Anil and Diri, Banu and {\c{C}}akalo{\u{g}}lu, Tolgahan},
  booktitle={2019 4th International Conference on Computer Science and Engineering (UBMK)},
  pages={150--154},
  year={2019},
  organization={IEEE}
  doi={10.1109/UBMK.2019.8907050}
}

1-Guven, Z. A., Diri, B., & Cakaloglu, T. (2018, October). Classification of New Titles by Two Stage Latent Dirichlet Allocation. In 2018 Innovations in Intelligent Systems and Applications Conference (ASYU) (pp. 1-5). Ieee.

2-Guven, Z. A., Diri, B., & Cakaloglu, T. (2021). Evaluation of Non-Negative Matrix Factorization and n-stage Latent Dirichlet Allocation for Emotion Analysis in Turkish Tweets. arXiv preprint arXiv:2110.00418.

3-Güven, Z. A., Diri, B., & Çakaloğlu, T. (2020). Comparison of n-stage Latent Dirichlet Allocation versus other topic modeling methods for emotion analysis. Journal of the Faculty of Engineering and Architecture of Gazi University, 35(4), 2135-2146.

4-Güven, Z. A., Diri, B., & Çakaloğlu, T. (2018, April). Classification of TurkishTweet emotions by n-stage Latent Dirichlet Allocation. In 2018 Electric Electronics, Computer Science, Biomedical Engineerings' Meeting (EBBT) (pp. 1-4). IEEE.

5-Güven, Z. A., Diri, B., & Çakaloğlu, T. (2019, September). Comparison of Topic Modeling Methods for Type Detection of Turkish News. In 2019 4th International Conference on Computer Science and Engineering (UBMK) (pp. 150-154). IEEE.

6-GÜVEN, Z. A., Banu, D. İ. R. İ., & ÇAKALOĞLU, T. (2019). Emotion Detection with n-stage Latent Dirichlet Allocation for Turkish Tweets. Academic Platform Journal of Engineering and Science, 7(3), 467-472.

7-Güven, Z. A., Diri, B., & Çakaloğlu, T. Comparison Method for Emotion Detection of Twitter Users. In 2019 Innovations in Intelligent Systems and Applications Conference (ASYU) (pp. 1-5). IEEE.

References

[1] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation.Journal of Machine LearningResearch, 2003. ISSN 15324435. doi:10.1016/b978-0-12-411519-4.00006-9.

[2] Yong Chen, Hui Zhang, Rui Liu, Zhiwen Ye, and Jianying Lin.Experimental explorations on short texttopic mining between LDA and NMF based Schemes.Knowledge-Based Systems, 2019. ISSN 09507051.doi:10.1016/j.knosys.2018.08.011.

[3] Zekeriya Anil Güven, Banu Diri, and Tolgahan Çakaloˇglu. Classification of New Titles by Two Stage Latent DirichletAllocation. InProceedings - 2018 Innovations in Intelligent Systems and Applications Conference, ASYU 2018, 2018.ISBN 9781538677865. doi:10.1109/ASYU.2018.8554027.

[4] Guven, Zekeriya Anil, Banu Diri, and Tolgahan Cakaloglu. "Evaluation of Non-Negative Matrix Factorization and n-stage Latent Dirichlet Allocation for Emotion Analysis in Turkish Tweets." arXiv preprint arXiv:2110.00418 (2021).

Owner
Anıl Güven
Anıl Güven
Data visualization app for H&M competition in kaggle

handm_data_visualize_app Data visualization app by streamlit for H&M competition in kaggle. competition page: https://www.kaggle.com/competitions/h-an

Kyohei Uto 12 Apr 30, 2022
Improved Fitness Optimization Landscapes for Sequence Design

ReLSO Improved Fitness Optimization Landscapes for Sequence Design Description Citation How to run Training models Original data source Description In

Krishnaswamy Lab 44 Dec 20, 2022
Get started with Machine Learning with Python - An introduction with Python programming examples

Machine Learning With Python Get started with Machine Learning with Python An engaging introduction to Machine Learning with Python TL;DR Download all

Learn Python with Rune 130 Jan 02, 2023
ML course - EPFL Machine Learning Course, Fall 2021

EPFL Machine Learning Course CS-433 Machine Learning Course, Fall 2021 Repository for all lecture notes, labs and projects - resources, code templates

EPFL Machine Learning and Optimization Laboratory 1k Jan 04, 2023
Camera calibration & 3D pose estimation tools for AcinoSet

AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fre

African Robotics Unit 42 Nov 16, 2022
Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021.

SphereRPN Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021. Authors: Th

Thang Vu 15 Dec 02, 2022
Python scripts form performing stereo depth estimation using the HITNET model in Tensorflow Lite.

TFLite-HITNET-Stereo-depth-estimation Python scripts form performing stereo depth estimation using the HITNET model in Tensorflow Lite. Stereo depth e

Ibai Gorordo 22 Oct 20, 2022
ChebLieNet, a spectral graph neural network turned equivariant by Riemannian geometry on Lie groups.

ChebLieNet: Invariant spectral graph NNs turned equivariant by Riemannian geometry on Lie groups Hugo Aguettaz, Erik J. Bekkers, Michaël Defferrard We

haguettaz 12 Dec 10, 2022
NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation (ACL-IJCNLP 2021)

NeuralWOZ This code is official implementation of "NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation". Sungdong Kim, Mi

NAVER AI 31 Oct 25, 2022
Robotic Process Automation in Windows and Linux by using Driagrams.net BPMN diagrams.

BPMN_RPA Robotic Process Automation in Windows and Linux by using BPMN diagrams. With this Framework you can draw Business Process Model Notation base

23 Dec 14, 2022
Per-Pixel Classification is Not All You Need for Semantic Segmentation

MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation Bowen Cheng, Alexander G. Schwing, Alexander Kirillov [arXiv] [Proj

Facebook Research 1k Jan 08, 2023
[ICCV 2021 Oral] Mining Latent Classes for Few-shot Segmentation

Mining Latent Classes for Few-shot Segmentation Lihe Yang, Wei Zhuo, Lei Qi, Yinghuan Shi, Yang Gao. This codebase contains baseline of our paper Mini

Lihe Yang 66 Nov 29, 2022
Element selection for functional materials discovery by integrated machine learning of atomic contributions to properties

Element selection for functional materials discovery by integrated machine learning of atomic contributions to properties 8.11.2021 Andrij Vasylenko I

Leverhulme Research Centre for Functional Materials Design 4 Dec 20, 2022
Visyerres sgdf woob - Modules Woob pour l'intranet et autres sites Scouts et Guides de France

Vis'Yerres SGDF - Modules Woob Vous avez le sentiment que l'intranet des Scouts

Thomas Touhey (pas un pseudonyme) 3 Dec 24, 2022
One Million Scenes for Autonomous Driving

ONCE Benchmark This is a reproduced benchmark for 3D object detection on the ONCE (One Million Scenes) dataset. The code is mainly based on OpenPCDet.

148 Dec 28, 2022
Repository for Driving Style Recognition algorithms for Autonomous Vehicles

Driving Style Recognition Using Interval Type-2 Fuzzy Inference System and Multiple Experts Decision Making Created by Iago Pachêco Gomes at USP - ICM

Iago Gomes 9 Nov 28, 2022
Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting (ICCV, 2021)

DKPNet ICCV 2021 Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting Baseline of DKPNet is availa

19 Oct 14, 2022
Explaining Hyperparameter Optimization via PDPs

Explaining Hyperparameter Optimization via PDPs This repository gives access to an implementation of the methods presented in the paper submission “Ex

2 Nov 16, 2022
A toolset of Python programs for signal modeling and indentification via sparse semilinear autoregressors.

SPAAR Description A toolset of Python programs for signal modeling via sparse semilinear autoregressors. References Vides, F. (2021). Computing Semili

Fredy Vides 0 Oct 30, 2021
Cmsc11 arcade - Final Project for CMSC11

cmsc11_arcade Final Project for CMSC11 Developers: Limson, Mark Vincent Peñafiel

Gregory 1 Jan 18, 2022