Conference Proceedings

Fifth International Conference on Advances in Computing, Communication and Information Technology - CCIT 2017

Using bipartite graphs projected onto two dimensions for text classification

Author(s) : ELENI ROZAKI, STEPHEN REDMOND

Abstract

In our Big Data world, the amount of text being gathered is ever expanding. For many years, data curators have sought ways to group these documents and identify common topics. As the size of the problem increases, solutions that will scale are needed. The purpose of this work is to present a novel text classifier that can be used for text-mining and interactive information access. The model that is demonstrated can be used to extract hierarchical relations between topics, as well as to conduct unsupervised clustering of documents and keywords. The approach taken with this model is the use of graph-of-words key term extraction and a dimensional projection of the bipartite graph of documents and key terms. This projection makes it possible for terms to be co-clustered efficiently in relation to their documents, and the documents in relation to their terms. Furthermore, the key term extraction process that is outlined can be scaled to a large corpus using a distributed processing system such as Apache Spark, and users can interact visually with the resultant model.
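The first two stages named in the abstract, graph-of-words key term extraction and the bipartite graph of documents and key terms, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes whitespace tokenisation, a sliding co-occurrence window, and ranking of terms by their degree in the co-occurrence graph; the abstract does not specify the ranking criterion, so degree is a stand-in. The function names and the `window` and `top_k` parameters are hypothetical. The two-dimensional projection and the Apache Spark distribution are not shown.

```python
from collections import defaultdict


def graph_of_words(tokens, window=3):
    """Build an undirected co-occurrence graph as an adjacency map.

    Two distinct terms are linked when they appear within the same
    sliding window of `window` consecutive tokens (assumed definition).
    """
    adj = defaultdict(set)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                adj[tokens[i]].add(tokens[j])
                adj[tokens[j]].add(tokens[i])
    return adj


def key_terms(tokens, top_k=3, window=3):
    """Return the top_k terms ranked by degree in the graph-of-words.

    Degree ranking is an assumption made for this sketch; ties are
    broken alphabetically so the result is deterministic.
    """
    adj = graph_of_words(tokens, window)
    ranked = sorted(adj, key=lambda term: (-len(adj[term]), term))
    return ranked[:top_k]


def bipartite_edges(docs, top_k=3, window=3):
    """Link each document id to its extracted key terms.

    The returned set of (doc_id, term) pairs is the edge list of a
    bipartite graph: documents on one side, key terms on the other.
    """
    edges = set()
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for term in key_terms(tokens, top_k, window):
            edges.add((doc_id, term))
    return edges


if __name__ == "__main__":
    corpus = {
        "doc1": "graph mining supports text mining at scale",
        "doc2": "text clustering groups documents by key terms",
    }
    for edge in sorted(bipartite_edges(corpus, top_k=2)):
        print(edge)
```

Because each document is processed independently in `bipartite_edges`, the extraction step is the kind of per-document work that a distributed system could parallelise, which is consistent with the scaling claim in the abstract.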

Conference Title : Fifth International Conference on Advances in Computing, Communication and Information Technology - CCIT 2017
Conference Date(s) : 02-03 September, 2017
Place : Novotel Zurich City-West Schiffbaustrasse 13, 8005 Zurich, Switzerland
No of Author(s) : 2
DOI : 10.15224/978-1-63248-131-3-19
Page(s) : 55 - 59
Electronic ISBN : 978-1-63248-131-3