CS359 Project - APPROACH OF TOPICAL MULTI-DOCUMENT SUMMARIZATION


Objective

This project explores the topical multi-document summarization problem and tries to propose a new summarization algorithm for it. Given a topic, topical multi-document summarization generates a comprehensive summary from a large collection of documents, drawing on the topic-related content of some or all of those documents.
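To make the task definition concrete, here is a minimal sketch of a topic-focused extractive baseline: it scores every sentence in the collection by cosine similarity to the topic and returns the best ones. It is not the algorithm proposed in this project; the function names (tokenize, cosine, summarize_by_topic), the bag-of-words scoring, and the naive sentence splitter are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize_by_topic(topic, documents, max_sentences=3):
    """Hypothetical baseline: return the sentences most similar to the topic."""
    topic_vec = Counter(tokenize(topic))
    scored = []
    for doc in documents:
        # Naive sentence split: ., ! or ? followed by whitespace (an assumption).
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            score = cosine(topic_vec, Counter(tokenize(sentence)))
            if score > 0:  # ignore sentences that share no word with the topic
                scored.append((score, sentence))
    # Highest-scoring sentences first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:max_sentences]]


docs = [
    "Car surveys rank safety. The weather was nice.",
    "Publishers debate profit models.",
]
summary = summarize_by_topic("car survey", docs, max_sentences=1)
```

A baseline like this only matches exact words, so "survey" does not match "surveys"; a real system would add stemming, term weighting, and redundancy control across documents.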

Reports

  1. Project proposal: html, pdf
  2. Final paper: pdf

Texts for Experiment

A total of 60 documents are used in the experiment. Thirty are selected documents from the Brown corpus; the rest are articles on the debate over for-profit versus non-profit publishing and articles about car surveys.

Tools

  1. Borrowed
  2. Developed

Experiment Result

  1. Web page for clustering result

          snapshot

  2. Web page for reduced clusters and documents used for summarizing

          snapshot

   A detailed explanation can be found in the final paper.
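
The two result pages correspond to a clustering step and a cluster-reduction step. The sketch below illustrates one plausible shape of such a pipeline: k-means over term-frequency vectors using cosine similarity, followed by keeping only the clusters whose centroid is close to the topic. The details are assumptions, not the project's actual implementation: the names (vectorize, kmeans, reduce_clusters), seeding with the first k documents, and the min_similarity threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector for a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum of the member vectors; cosine similarity ignores the scale."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def kmeans(vectors, k, iterations=10):
    """Assign each vector to the most similar centroid until labels settle."""
    k = min(k, len(vectors))
    centroids = [Counter(v) for v in vectors[:k]]  # assumption: seed with first k
    labels = None
    for _ in range(iterations):
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = centroid(members)
    return labels


def reduce_clusters(topic, vectors, labels, min_similarity=0.1):
    """Return indices of documents in clusters whose centroid matches the topic."""
    topic_vec = vectorize(topic)
    kept = []
    for c in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == c]
        if cosine(topic_vec, centroid(vectors[i] for i in members)) >= min_similarity:
            kept.extend(members)
    return sorted(kept)


docs = [
    "car survey rates car safety",
    "publishing profit debate among publishers",
    "new car survey on engine reliability",
    "non profit publishing debate",
]
vectors = [vectorize(d) for d in docs]
labels = kmeans(vectors, k=2)
selected = reduce_clusters("car survey", vectors, labels)
```

In this toy run the two car-survey documents land in one cluster and the two publishing documents in the other, and only the car-survey cluster survives the reduction for the topic "car survey". Sentences for the summary would then be drawn from the selected documents only.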

References

  1. Gees Stein, Amit Bagga and G. Wise. 2000. Multi-Document Summarization: Methodologies and Evaluations. Conference TALN 2000.

  2. www.summarization.com

  3. www.cs.columbia.edu/nlp/projects.html

  4. www.cs.columbia.edu/~hjing/summarization.htm

  5. Regina Barzilay, Kathleen R. McKeown, Michael Elhadad. 1999. Information Fusion in the Context of Multi-Document Summarization.

  6. Jaime Carbonell and Jade Goldstein. 1998. The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries. SIGIR ’98.

  7. Inderjit Dhillon and James Fan. 2001. Efficient Clustering of Very Large Document Collections.

  8. www.cs.utexas.edu/users/dml/

  9. www.site.uottawa.ca/tanka/ts.html

  10. transend.labs.bt.com

  11. CCS format, www.netlib.org/linalg/html_templates/node92.html

  12. K-means clustering, www.engr.sjsu.edu/~knapp/HCIRDFSC/C/k_means.htm


Xuehai Zhang 12-05-01