International Conference on Advances in Information Processing and Communication Technology - IPCT 2014
Author(s): COSKUN SONMEZ, METIN TURAN
Summarization requires selecting the most informative sentences from a set of documents. The process generally assumes that the document set covers topics related to a single subject. However, some documents may be outliers, and an outlier document can degrade the quality of an extractive summary. This research focuses on filtering out such outlier documents at the extraction stage. Outliers are identified as the documents that lie far from a representative document set word vector (DSWV). The DUC 2006 data set is used for the tests, and the system summaries are compared with the summaries generated by the DUC participants' systems. The results indicate that filtering outlier documents lets the system outperform all of the participating systems by a fair margin.
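The abstract does not say how the DSWV is built, which distance is used, or where the "far" cutoff lies. The following is a minimal sketch of distance-based outlier filtering under stated assumptions, not the authors' method: each document is a raw term-frequency vector, the DSWV is taken to be the mean of those vectors, distance is cosine distance, and a document is treated as an outlier when its distance exceeds the mean distance plus `k` standard deviations. The names `word_vector`, `dswv`, `cosine_distance`, `filter_outliers`, and the parameter `k` are all hypothetical.

```python
import math
from collections import Counter


def word_vector(text):
    # Assumption: simple lowercase whitespace tokenization, raw term counts.
    return Counter(text.lower().split())


def dswv(vectors):
    # Assumed DSWV: the average term-frequency vector over all documents.
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {w: c / n for w, c in total.items()}


def cosine_distance(a, b):
    # 1 - cosine similarity; treats an empty vector as maximally distant.
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def filter_outliers(docs, k=1.0):
    # Hypothetical filter: keep documents whose distance from the DSWV
    # is at most mean + k * std of all document distances.
    if not docs:
        return []
    vectors = [word_vector(d) for d in docs]
    centroid = dswv(vectors)
    dists = [cosine_distance(v, centroid) for v in vectors]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    threshold = mean + k * std
    return [doc for doc, dist in zip(docs, dists) if dist <= threshold]


if __name__ == "__main__":
    docs = [
        "storm flood rain damage",
        "flood rain storm warning",
        "rain storm flood relief",
        "stock market shares earnings",  # off-topic document
    ]
    print(filter_outliers(docs))
```

On this toy set the three weather documents sit close to the averaged vector, while the finance document lies far from it and is dropped before any sentence extraction. A mean-plus-deviation cutoff is only one plausible choice; a fixed distance threshold or a TF-IDF weighting would fit the same idea.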