This project explores the Topical Multi-Document Summarization problem and attempts to propose a new summarization algorithm for it. Given a topic, Topical Multi-Document Summarization generates a comprehensive summary from a large collection of documents, based on the topic-related content found in some or all of those documents.
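To make the task definition concrete, here is a minimal sketch of a topic-focused extractive baseline in Python. It is not the algorithm proposed in this project. It only illustrates the input (a topic and a document collection) and the output (a short list of topic-related sentences). The function names `tokenize`, `cosine`, and `topical_summary`, the bag-of-words cosine scoring, and the sample documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topical_summary(topic, documents, max_sentences=2):
    """Return the sentences most similar to the topic, across all documents.

    Hypothetical baseline: every sentence is scored against the topic
    independently, so redundancy between selected sentences is not handled.
    """
    topic_vec = Counter(tokenize(topic))
    scored = []
    for doc_id, doc in enumerate(documents):
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sent in re.split(r"(?<=[.!?])\s+", doc.strip()):
            score = cosine(topic_vec, Counter(tokenize(sent)))
            if score > 0:  # keep only sentences that share a topic term
                scored.append((score, doc_id, sent))
    # Highest score first; ties fall back to document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sent for _, _, sent in scored[:max_sentences]]


if __name__ == "__main__":
    # Made-up sample collection, for illustration only.
    docs = [
        "Non-profit publishing keeps journals affordable. The weather was mild.",
        "A survey ranked cars by reliability. Commercial publishing funds new journals.",
        "The car survey also measured fuel economy.",
    ]
    for sentence in topical_summary("publishing journals", docs):
        print(sentence)
```

Sentences that share no term with the topic are dropped, so off-topic documents contribute nothing to the summary. A realistic system would also need to handle redundancy between the selected sentences, which this sketch ignores.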
A total of 60 documents are used for the experiment in this project. Thirty are documents selected from the Brown corpus; the rest are articles on the debate over for-profit versus non-profit publishing and articles about car surveys.
1. Web page for the clustering results (snapshot)
2. Web page for the reduced clusters and the documents used for summarization (snapshot)
A detailed explanation can be found in the final paper.