Applying Word Co-Occurrence Graph in Enhancing LDA Model for Topic Discovering in Large-Scaled Text Corpus
Phu Pham1, Phuc Do2

1Phu Pham, University of Information Technology (UIT), VNU-HCM, Vietnam, Asia.
2Phuc Do, University of Information Technology (UIT), VNU-HCM, Vietnam, Asia.
Manuscript received on 21 August 2019 | Revised Manuscript received on 11 September 2019 | Manuscript Published on 17 September 2019 | PP: 1366-1371 | Volume-8 Issue-2S8 August 2019 | Retrieval Number: B10680882S819/2019©BEIESP | DOI: 10.35940/ijrte.B1068.0882S819
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Topic models such as LDA are considered useful tools for the statistical analysis of text document collections and other text-based data. Topic modeling has recently become an attractive research field owing to its wide range of applications. However, traditional topic models such as LDA still have disadvantages, owing to the shortcomings of the bag-of-words (BOW) model and to low performance when handling large text corpora. Therefore, in this paper, we present a novel topic modeling approach, called LDA-GOW, which combines the word co-occurrence graph, also called the graph-of-words (GOW) model, with the traditional LDA topic discovery model. The LDA-GOW topic model not only extracts more informative topics from text but also facilitates topic discovery in large-scale text corpora. To demonstrate its effectiveness, we compare our proposed model with the traditional LDA topic model on several standard datasets: WebKB, Reuters-R8, and annotated scientific documents collected from the ACM Digital Library. Across all experiments, our proposed LDA-GOW model achieves approximately 70.86% accuracy.
Keywords: Topic Model, LDA, Graph-Of-Words (GOW), Frequent Subgraph Mining, Word Co-Occurrence Graph, Graph-Based Concept.
Scope of the Article: Graph Algorithms and Graph Drawing
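
Illustrative note: the abstract does not spell out how the graph-of-words is constructed, so the following is only a minimal sketch of one common way to build a word co-occurrence graph, using a sliding window over the tokens of a document. The function name `build_graph_of_words`, the window size of 3, and the sample sentence are assumptions made for illustration and are not taken from the paper; the LDA-GOW pipeline itself is not reproduced here.

```python
from collections import Counter


def build_graph_of_words(tokens, window=3):
    """Build an undirected, weighted word co-occurrence graph.

    Hypothetical helper (not from the paper): two words are linked when
    they appear within `window` consecutive tokens of each other, and the
    edge weight counts how often that happens.
    """
    edges = Counter()
    for i, word in enumerate(tokens):
        # Link the current word to the next (window - 1) tokens.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                # Sort the pair so (a, b) and (b, a) share one edge.
                edges[tuple(sorted((word, other)))] += 1
    return edges


# Assumed sample document, tokenized by whitespace.
doc = "topic model extracts topic from text corpus".split()
graph = build_graph_of_words(doc, window=3)
```

Unlike a bag-of-words vector, the resulting edge set retains which words occur near one another, which is the kind of local context the abstract says the BOW model lacks.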