| LDC Catalog Information for 2001 | |||||
| Catalog ID | Catalog Name | Requested By | Invoice Date | Location | Inv # |
| LDC2001S91 | 1997 HUB-4 Broadcast News Evaluation Non English Test Material | Gina-Anne Levow | Jan 16, 2002 | /data/lorien1/data/HUB4_1997NE/ | 7584 |
| LDC2001S97 | 2000 NIST Speaker Recognition Evaluation | ||||
| LDC2001T55 | Arabic Newswire Part 1 | ||||
| LDC2001T61 | CALLHOME Spanish Dialogue Act Annotation | Gina-Anne Levow | Jan 16, 2002 | merry:/export/data/1/Data/LDCdownloads/levowb7.LDC2001T61.tar | 7584 |
| LDC2001T62 | Cetempublico | ||||
| LDC2001T11 | Chinese Treebank Version 2.0 | Gina-Anne Levow | Jan 16, 2002 | merry:/export/data/1/Data/LDCdownloads/levow28.LDC2001T11.tar | 7584 |
| LDC2001S16 | Grassfields Bantu Fieldwork: Ngomba Tone Paradigms | Nadine Di Vito | Mar 15, 2005 | /data/lorien1/Data/Tone/Ngomba | 13272 |
| LDC2001T02 | Message Understanding Conference (MUC) 7 | ||||
| LDC2001T10 | Prague Dependency Treebank 1.0 | ||||
| LDC2001S04 | Speech in Noisy Environments (SPINE2) Part 1 Audio | ||||
| LDC2001T05 | Speech in Noisy Environments (SPINE2) Part 1 Transcripts | ||||
| LDC2001S06 | Speech in Noisy Environments (SPINE2) Part 2 Audio | ||||
| LDC2001T07 | Speech in Noisy Environments (SPINE2) Part 2 Transcripts | ||||
| LDC2001S08 | Speech in Noisy Environments (SPINE2) Part 3 Audio | ||||
| LDC2001T09 | Speech in Noisy Environments (SPINE2) Part 3 Transcripts | ||||
| LDC2001S99 | Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio | ||||
| LDC2001S13 | Switchboard Cellular Part 1 Audio | ||||
| LDC2001S15 | Switchboard Cellular Part 1 Transcribed Audio | ||||
| LDC2001T14 | Switchboard Cellular Part 1 Transcription | Barbara Need | Feb 26, 2004 | | 11067 |
| LDC2001T60 | Syllable-Final /s/ Lenition | ||||
| LDC2001S93 | TDT2 Mandarin Audio Corpus | Gina-Anne Levow | Jan 16, 2002 | /data/lorien1/Data/TDT2-Audio | 7584 |
| LDC99T38 | TDT2 Mandarin Text | | | /data/lorien1/Data/TDT2-Text | |
| LDC99T39 | TDT2 Multilanguage Text Version 3.0 | ||||
| LDC2001T57 | TDT2 Multilanguage Text Version 4.0 | Gina-Anne Levow | Jan 16, 2002 | /data/lorien1/Data/TDT2-Text-4.0 | 7584 |
| LDC2001S94 | TDT3 English Audio | Gina-Anne Levow | Jan 23, 2002 | /data/lorien1/Data/TDT3-Audio | 7613 |
| LDC2001S95 | TDT3 Mandarin Audio | Gina-Anne Levow | Jan 16, 2002 | /data/lorien1/Data/TDT3-Audio | 7584 |
| LDC2001T58 | TDT3 Multilanguage Text Version 2.0 | Gina-Anne Levow | Jan 16, 2002 | /data/lorien1/Data/TDT3-Text | 7584 |
| LDC Catalog Information for 2002 | |||||
| Catalog ID | Catalog Name | Requested By | Invoice Date | Location | Inv # |
| LDC2002S11 | 1997 HUB4 English Evaluation Speech and Transcripts | ||||
| LDC2002S22 | 1997 HUB5 Arabic Evaluation | ||||
| LDC2002T39 | 1997 HUB5 Arabic Transcripts | ||||
| LDC2002S24 | 1997 HUB5 German Evaluation | ||||
| LDC2003T03 | 1997 HUB5 German Transcripts | ||||
| LDC2002S25 | 1997 HUB5 Spanish Evaluation | ||||
| LDC2003T04 | 1997 HUB5 Spanish Transcripts | ||||
| LDC2002S10 | 1998 HUB5 English Evaluation | ||||
| LDC2003T02 | 1998 HUB5 English Transcripts | Barbara Need | Feb 26, 2004 | | 11067 |
| LDC2002S56 | 2000 Communicator Evaluation | Gina-Anne Levow | Jan 21, 2003 | /data/lorien1/Data/Communicator2000 | 9044 |
| LDC2002S13 | 2001 HUB5 English Evaluation | Gina-Anne Levow | May 16, 2002 | /data/lorien1/data/HUB5E_01 | 8107 |
| LDC2002S12 | 2001 HUB5 Mandarin Evaluation | Gina-Anne Levow | May 16, 2002 | /data/lorien1/Data/HUB5 | 8107 |
| LDC2003T01 | 2001 HUB5 Mandarin Transcripts | Gina-Anne Levow | Jun 11, 2003 | /data/lorien1/Data/HUB5 | 9702 |
| LDC2003T01 | 2001 HUB5 Mandarin Transcripts | Jun Yang | May 25, 2004 | | 11609 |
| LDC2002S34 | 2001 NIST Speaker Recognition Evaluation Corpus | ||||
| LDC2002E36 | 2002 DUC Evaluation Version 0.1 | ||||
| LDC2002E33 | ACE Phase 2 Training Data Version 6 | ||||
| LDC2002E55 | Arabic Treebank: Part 1 v1.0 | ||||
| LDC2002E49 | Buckwalter Arabic Morphological Analyzer | ||||
| LDC2002S37 | Callhome Egyptian Arabic Speech Supplement | ||||
| LDC2002T38 | Callhome Egyptian Arabic Transcripts Supplement | ||||
| LDC2002E27 | Chinese English Translation Dictionary v3.0 | ||||
| LDC2002E14 | Chinese English Translation Lexicon Version 3-beta | ||||
| LDC2002S28 | Emotional Prosody Speech and Transcripts | Gina-Anne Levow | Jan 21, 2003 | /data/lorien1/Data/EMOTIONAL_PROSODY | 9044 |
| LDC2002E17 | English Translation of Chinese Treebank Version 1 beta | ||||
| LDC2001S16 | Grassfields Bantu Fieldwork: Ngomba Tone Paradigms | Nadine Di Vito | Mar 15, 2005 | /data/lorien1/Data/Tone/Ngomba | 13272 |
| LDC2002E19 | Hong Kong Hansard Parallel Text Version 2 beta | ||||
| LDC2002E16 | Hong Kong News Parallel Text Version 2 beta | ||||
| LDC2002T26 | Korean English Treebank Annotations | ||||
| LDC2002E54 | Multiple-Translation Arabic Corpus | ||||
| LDC2002T01 | Multiple-Translation Chinese Corpus | Gina-Anne Levow | May 16, 2002 | merry:/export/data/1/Data/Chinese_translations | 8107 |
| LDC2002E53 | Multiple-Translation Chinese Corpus 2.0 | ||||
| LDC2002E50 | Name-Annotated TDT Corpus Supplement for ACE | ||||
| LDC2002T07 | RST Discourse Treebank | Gina-Anne Levow | May 16, 2002 | /data/lorien1/Data/discourse_treebank | 8107 |
| LDC2002E58 | Sinorama Chinese English Parallel Text | ||||
| LDC2001S08 | Speech in Noisy Environments (SPINE2) Part 3 Audio | ||||
| LDC2001T09 | Speech in Noisy Environments (SPINE2) Part 3 Transcripts | ||||
| LDC2002S06 | Switchboard-2 Phase III Audio | Gina-Anne Levow | May 16, 2002 | on shelf | 8107 |
| LDC2002E32 | TDT3 Arabic Text Version 0.1 | ||||
| LDC2002E52 | TDT4 Multilanguage Text Corpus | ||||
| LDC2002T31 | The AQUAINT Corpus of English News Text | Gina-Anne Levow | Oct 15, 2003 | /data/lorien1/Data/NTCIR4 | 10301 |
| LDC2002S04 | Translanguage English Database (TED) Speech | Gina-Anne Levow | May 16, 2002 | /data/lorien1/Data/TED* | 8107 |
| LDC2002T03 | Translanguage English Database (TED) Transcripts | Gina-Anne Levow | May 16, 2002 | /data/lorien1/Data/Translanguage | 8107 |
| LDC2002E15 | UN Arabic English Parallel Text Version 1 beta | ||||
| LDC2002E48 | Ummah Arabic English Parallel News Text | ||||
| LDC2002S35 | Voicemail Corpus Part II | Gina-Anne Levow | Jan 21, 2003 | /data/lorien1/data/VOICEMAIL | 9044 |
| LDC2002S02 | West Point Arabic Speech Corpus | ||||
| LDC2002E18 | Xinhua Chinese English Parallel News Text Version 1 beta | ||||
| LDC Catalog Information for 2003 | |||||
| Catalog ID | Catalog Name | Requested By | Invoice Date | Location | Inv # |
| LDC2003T03 | 1997 HUB5 German Transcripts | ||||
| LDC2002T42 | 1997 HUB5 Spanish Transcripts | ||||
| LDC2003T04 | 1997 HUB5 Spanish Transcripts | ||||
| LDC2003T02 | 1998 HUB5 English Transcripts | Barbara Need | Feb 26, 2004 | | 11067 |
| LDC2003S01 | 2001 Communicator Evaluation | Gina-Anne Levow | Jun 11, 2003 | /data/lorien1/Data/Communicator2001 | 9702 |
| LDC2003T01 | 2001 HUB5 Mandarin Transcripts | Gina-Anne Levow | Jun 11, 2003 | /data/lorien1/Data/HUB5 | 9702 |
| LDC2003T01 | 2001 HUB5 Mandarin Transcripts | Jun Yang | May 25, 2004 | | 11609 |
| LDC2003E26 | ACE 2004 Pilot Corpus V1.0 | ||||
| LDC2003T11 | ACE-2 Version 1.0 | ||||
| LDC2003E18 | ACE3-V1.3 | ||||
| LDC2003T20 | ANC First Release | Gina-Anne Levow | Dec 02, 2003 | | 10538 |
| LDC2003E10 | Aquaint Xinhua for NTCIR Evaluation | ||||
| LDC2003T12 | Arabic Gigaword | ||||
| LDC2003E05 | Arabic Translation Corpus Part 1 | ||||
| LDC2003T07 | Arabic Treebank: Part 1 - 10K-word English Translation | Gina-Anne Levow | Jun 11, 2003 | /data/lorien1/Data/Arabic_treebank | 9702 |
| LDC2003T06 | Arabic Treebank: Part 1 v 2.0 | Gina-Anne Levow | Jun 11, 2003 | /data/lorien1/Data/Arabic_treebank | 9702 |
| LDC2003E17 | Arabic Treebank: Part 2 v 1.0 | ||||
| LDC2003E24 | Arabic Treebank: Part 2 v 1.1 | ||||
| LDC2004E14 | Articulation Index Speech V1.0 | ||||
| LDC2003E01 | Chinese <-> English Name Entity Lists Version 1.0 beta | ||||
| LDC2003T09 | Chinese Gigaword | Gina-Anne Levow | Jun 11, 2003 | /data/lorien1/data/GIGAWORD_MAN// | 9702 |
| LDC2003E06 | Chinese Treebank 3.0 | ||||
| LDC2003E07 | Chinese Treebank English Parallel Corpus | ||||
| LDC2003S04 | Cross-Channel Forensic Speech for Automatic Speaker Recognition | ||||
| LDC2003E27 | EARS MDE RT-03 DevTest and Evaluation Corpus | ||||
| LDC2003E19 | EARS MDE RT-03F Training Corpus | ||||
| LDC2003T05 | English Gigaword | Gina-Anne Levow | Jun 11, 2003 | /data/lorien1/data/GIGAWORD_ENG/ | 9702 |
| LDC2003E14 | FBIS Multilanguage Texts | ||||
| LDC2003V01 | FORM2 Kinematic Gesture | ||||
| LDC2003E13 | Fisher Quick Transcription Part 1 Version 1.0 | ||||
| LDC2003E13C | Fisher Quick Transcription Part 3 Version 1.0 | ||||
| LDC2003E12D | Fisher Training Speech Data, Part 4 | ||||
| LDC2003E12 | Fisher Training Speech Part 1 | ||||
| LDC2003E12B | Fisher Training Speech Part 2 | ||||
| LDC2003E12C | Fisher Training Speech Part 3 | ||||
| LDC2003E13D | Fisher Training Transcripts Part 4, v1.0 | ||||
| LDC2003L01 | Grassfields Bantu Fieldwork: Dschang Lexicon | Barbara Need | Nov 07, 2003 | | 10445 |
| LDC2003S02 | Grassfields Bantu Fieldwork: Dschang Tone Paradigms | Barbara Need | Nov 07, 2003 | /data/lorien1/Data/Tone/Dschang | 10445 |
| LDC2003E15 | HARD GovDocs | ||||
| LDC2003E25 | Hong Kong News Parallel Text | ||||
| LDC2003P01 | Korean Telephone Conversations Complete Set | ||||
| LDC2003L02 | Korean Telephone Conversations Lexicon | Gina-Anne Levow | Jun 11, 2003 | /data/lorien1/Data/Korean | 9702 |
| LDC2003S03 | Korean Telephone Conversations Speech | Gina-Anne Levow | Jun 11, 2003 | /data/lorien1/data/KOR_SPEECH_1,2,3 | 9702 |
| LDC2003T08 | Korean Telephone Conversations Transcripts | Gina-Anne Levow | Jun 11, 2003 | /data/lorien1/Data/Korean | 9702 |
| LDC2003T13 | Message Understanding Conference (MUC) 6 | ||||
| LDC2003E04 | Multiple Translation Chinese Corpus Part 3 | ||||
| LDC2003T18 | Multiple-Translation Arabic (MTA) Part 1 | ||||
| LDC2003T17 | Multiple-Translation Chinese (MTC) Part 2 | ||||
| LDC2003T10 | SAID | ||||
| LDC2003E16 | SIGHAN Bakeoff | ||||
| LDC2003T15 | SLX Corpus of Classic Sociolinguistic Interviews | ||||
| LDC2003S06 | Santa Barbara Corpus of Spoken American English Part-II | Gina-Anne Levow | Dec 02, 2003 | /data/lorien1/data/SBCSAE_P2/ | 10538 |
| LDC2003T16 | SummBank 1.0 | ||||
| LDC2003E02 | TDT4 Multilanguage Speech | ||||
| LDC2003E20 | TDT4 Multilanguage Text Subset for TIDES Extraction 2003 | ||||
| LDC2003E21 | TDT4 Multilanguage Text Version 1.1 | ||||
| LDC2003E03 | TDT4 Multilanguage Transcripts | ||||
| LDC2003E22 | The SLX Corpus of Classic Sociolinguistic Interviews | ||||
| LDC2003E11 | UN Chinese English Parallel Text Version 1.0 beta | ||||
| LDC2003S05 | West Point Russian Speech | ||||
| LDC Catalog Information for 2004 | |||||
| Catalog ID | Catalog Name | Requested By | Invoice Date | Location | Inv # |
| LDC2004T15 | 2000 Communicator Dialogue Act Tagged | Gina-Anne Levow | Aug 24, 2004 | /data/lorien1/Data/CommDA | 12259 |
| LDC2004T16 | 2001 Communicator Dialogue Act Tagged | Gina-Anne Levow | Aug 24, 2004 | /data/lorien1/data/2001_COMM_DIALOG_ACT | 12259 |
| LDC2004S04 | 2002 NIST Speaker Recognition Evaluation | ||||
| LDC2004S11 | 2002 Rich Transcription Broadcast News and Conversational Telephone Speech | ||||
| LDC2004E27 | ACE 2004 English and Chinese Training Data Superset | ||||
| LDC2004E03 | ACE 2004 Pilot Corpus V1.3 | ||||
| LDC2004E06 | AQUAINT Supplement for DUC2004 | ||||
| LDC2004E71 | ATB Part 3 (a) v.1.1 | ||||
| LDC2004E22 | Arabic CTS Levantine Fisher Training Data Set 1, Transcriptions | ||||
| LDC2004E21 | Arabic CTS Levantine Fisher Training Data Set 1: Speech | ||||
| LDC2004E65 | Arabic CTS Levantine Fisher Training Data Set 2, Speech | ||||
| LDC2004E66 | Arabic CTS Levantine Fisher Training Data Set 2, Transcripts | ||||
| LDC2004T18 | Arabic English Parallel News Part 1 | ||||
| LDC2004E08 | Arabic English Parallel News Text Part 1 | ||||
| LDC2004E07 | Arabic News Translation Corpus Part 3 | ||||
| LDC2004E11 | Arabic News Translation Corpus Part 4 | ||||
| LDC2004T17 | Arabic News Translation Text Part 1 | ||||
| LDC2004T02 | Arabic Treebank: Part 2 v 2.0 | ||||
| LDC2004T11 | Arabic Treebank: Part 3 v 1.0 | ||||
| LDC2004E14 | Articulation Index Speech V1.0 | ||||
| LDC2004T27 | Buckwalter(Bad catalog No. do not use) Arabic. Morph Analyzer | ||||
| LDC2004T05 | Chinese Treebank Version 4.0 | Jun Yang | May 25, 2004 | | 11609 |
| LDC2004T05 | Chinese Treebank Version 4.0 | Gina-Anne Levow | Aug 24, 2004 | | 12259 |
| LDC2004S01 | Czech Broadcast News Speech | ||||
| LDC2004T01 | Czech Broadcast News Transcripts | ||||
| LDC2004V01 | FORM1 Kinematic Gesture | ||||
| LDC2004S13 | Fisher English Training Speech Part 1 Speech | ||||
| LDC2004T19 | Fisher English Training Speech Part 1, Transcripts | ||||
| LDC2004E30 | HARD 2004 Corpus | Gina-Anne Levow | Jun 29, 2004 | /data/lorien1/Data/HARD2004 | 11875 |
| LDC2004E34 | HARD 2004 Evaluation Topics V1.1 | Gina-Anne Levow | Jun 29, 2004 | /data/lorien1/Data/HARD2004 | 11875 |
| LDC2004E34 | HARD 2004 Evaluation Topics V1.1 | Gina-Anne Levow | Jul 09, 2004 | /data/lorien1/Data/HARD2004 | 11999 |
| LDC2004E34 | HARD 2004 Evaluation Topics V1.1 | Gina-Anne Levow | Jul 21, 2004 | /data/lorien1/Data/HARD2004 | 12101 |
| LDC2004E32 | HARD 2004 Training Data | Gina-Anne Levow | Jun 29, 2004 | /data/lorien1/Data/HARD2004 | 11875 |
| LDC2004E09 | Hong Kong Hansard Parallel Text | ||||
| LDC2004T08 | Hong Kong Parallel Text | ||||
| LDC2004S02 | ICSI Meeting Speech | Gina-Anne Levow | Mar 16, 2004 | /data/lorien1/Data/icsi/ | 11180 |
| LDC2004T04 | ICSI Meeting Transcripts | Gina-Anne Levow | Mar 16, 2004 | /data/lorien1/Data/icsi | 11180 |
| LDC2004E04 | ISL Meeting Corpus Speech | | | /data/lorien1/dataISL_MEETING_SPEECH_2// | |
| LDC2004E05 | ISL Meeting Corpus Transcripts | | | /data/lorien1/Data/isl | |
| LDC2004S05 | ISL Meeting Speech Part 1 | Gina-Anne Levow | Aug 24, 2004 | /data/lorien1/data/ISL_MEETING_SPEECH_2 | 12259 |
| LDC2004T10 | ISL Meeting Transcripts Part 1 | Gina-Anne Levow | Aug 24, 2004 | /data/lorien1/Data/isl | 12259 |
| LDC2004L01 | Klex: Finite-State Lexical Transducer for Korean | ||||
| LDC2004S08 | MDE RT-03 Training Data Speech | Gina-Anne Levow | Aug 24, 2004 | /data/lorien1/data/MDE_RT03_TRAIN_SP_EECH_1,2 | 12259 |
| LDC2004T12 | MDE RT-03 Training Data Text and Annotations | Gina-Anne Levow | Aug 24, 2004 | /data/lorien1/data/MDE_RT03_TRAIN_TE_XT/ | 12259 |
| LDC2004T03 | Morphologically Annotated Korean Text | Gina-Anne Levow | Mar 16, 2004 | | 11180 |
| LDC2004T07 | Multiple-Translation Chinese (MTC) Part 3 | ||||
| LDC2004E15 | NIST Meeting Evaluation Corpus | ||||
| LDC2004S09 | NIST Meeting Pilot Corpus Speech | Gina-Anne Levow | Aug 24, 2004 | /data/lorien1/data/NIST_MEET_PILOT_SP_8,9/ | 12259 |
| LDC2004T13 | NIST Meeting Pilot Corpus Transcripts and Metadata | Gina-Anne Levow | Aug 24, 2004 | /data/lorien1/data/NIST_MEET_PILOT/ | 12259 |
| LDC2004E01 | NIST Pilot Meeting Corpus Speech | ||||
| LDC2004E02 | NIST Pilot Meeting Corpus Transcripts V1.4 | ||||
| LDC2004T23 | Prague Arabic Dependency Treebank 1.0 | ||||
| LDC2004T25 | Prague Czech-English Dependency Treebank Version 1.0 | ||||
| LDC2004E26 | Proposition Bank 1 V1.0 | ||||
| LDC2004T14 | Proposition Bank I | ||||
| LDC2004E24 | RT-04 MDE Annotation Consistency Study | ||||
| LDC2004E16 | RT-04 MDE DevTest Set #1 Version 1.2 | ||||
| LDC2004E29 | RT-04 MDE DevTest Set #2 V1.2 | ||||
| LDC2004E31 | RT-04 MDE Training Data V1.2 | ||||
| LDC2004E28 | RT-04 STT Transcription Consistency Study | ||||
| LDC2004E67 | RT-04F STT Chinese CTS Development Data Speech | ||||
| LDC2004E68 | RT-04F STT Chinese CTS Development Data Transcripts | ||||
| LDC2004E69 | RT-04F STT Chinese CTS Training Data Speech | ||||
| LDC2004E70 | RT-04F STT Chinese CTS Training Data Transcripts | ||||
| LDC2004E10 | RT-04F STT Multilingual Speech Development Data - Supplement | ||||
| LDC2004E18 | RT-04F STT Multilingual Speech Development Data V1.1 Re-release | ||||
| LDC2004E19 | RT-04F STT Multilingual Transcripts Development Data V1.2 | ||||
| LDC2004S10 | Santa Barbara Corpus of Spoken American English III | ||||
| LDC2004S07 | Switchboard Cellular Part 2 Audio | ||||
| LDC2004E20 | TDT-4 Annotations | ||||
| LDC2004E36 | TDT4 (Chinese, Arabic) Reformatted for MT Processing | ||||
| LDC2004E35 | TDT5 (Chinese, Arabic) Reformatted for MT Processing | ||||
| LDC2004E23 | TERN 2004 Training Data V1.3 | ||||
| LDC2004T09 | TIDES Extraction (ACE) 2003 Multilingual Training Data | Gina-Anne Levow | Aug 24, 2004 | /data/lorien1/data/ace_tides_multling_train/ | 12259 |
| LDC2004E17 | TIDES Extraction ACE 2004 Training Data V1.4 | ||||
| LDC2004S12 | Talkbank Ethology Data: Field Recordings of Vervet Monkey Calls | ||||
| LDC2004E13 | UN Arabic English Parallel Text | ||||
| LDC2004E12 | UN Chinese English Parallel Text | ||||
| LDC2004E72 | eTIRR Arabic English News Text | ||||
| LDC Catalog Information for 2005 | |||||
| Catalog ID | Catalog Name | Requested By | Invoice Date | Location | Inv # |
| LDC2005E12 | 2005 MSE Arabic-English Clusters V1.2 | ||||
| LDC2005T09 | ACE 2004 Multilingual Training Corpus | Michael Berger | Mar 16, 2005 | /data/lorien1/data/ace_tides_multling_train/ | 13293 |
| LDC2005E22 | ACE 2005 Arabic Unsupervised Training Data | ||||
| LDC2005E21 | ACE 2005 Chinese Unsupervised Training Data | ||||
| LDC2005E20 | ACE 2005 English Unsupervised Training Data | ||||
| LDC2005E18 | ACE 2005 Multilingual Training Data V6.0 | ||||
| LDC2005T07 | ACE Time Normalization (TERN) 2004 English Training Data V1.0 | Michael Berger | Feb 18, 2005 | /data/lorien1/data/TERN | 13132 |
| LDC2005T35 | ANC 2nd Release | ||||
| LDC2005S07 | Arabic CTS Levantine Fisher Training Data Set 3, Speech | Michael Berger | Feb 18, 2005 | /data/lorien1/data/cts_arabic_la_td3_speech/ | 13132 |
| LDC2005T03 | Arabic CTS Levantine Fisher Training Data Set 3, Transcripts | Michael Berger | Feb 18, 2005 | /data/lorien1/data/cts_arabic_la_td3_trans/ | 13132 |
| LDC2005E46 | Arabic Treebank English Translation | ||||
| LDC2005T02 | Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis) | Michael Berger | Feb 18, 2005 | /data/lorien1/data/ATB_PT1_VER3/ | 13132 |
| LDC2005T20 | Arabic Treebank: Part 3 (full corpus) v2.0 (MPG + Syntactic Analysis) | Michael Berger | Jun 20, 2005 | /data/lorien1/data/ATB_PT1_VER3/ | 13812 |
| LDC2005T30 | Arabic Treebank: Part 4 v1.0 (MPG Annotation) | Michael Berger | Oct 14, 2005 | /data/lorien1/data/atb_p4_v1/ | 14458 |
| LDC2005S22 | Articulation Index | Michael Berger | Oct 14, 2005 | /data/lorien1/data/artic_index/ | 14458 |
| LDC2005T33 | BBN Pronoun Coreference and Entity Type Corpus | Michael Berger | Sep 16, 2005 | /data/lorien1/data/bbn-pcet/ | 14266 |
| LDC2005S08 | BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts | Michael Berger | Feb 18, 2005 | /data/lorien1/data/bablyon_arabic/ | 13132 |
| LDC2005T13 | CCGbank | Michael Berger | May 23, 2005 | /data/lorien1/data/ccgbank | 13702 |
| LDC2005T34 | Chinese <-> English Name Entity Lists (v1.beta) | ||||
| LDC2005E47 | Chinese English News Magazine Parallel Text | ||||
| LDC2005T10 | Chinese English News Magazine Parallel Text | Michael Berger | Jun 20, 2005 | | 13812 |
| LDC2005T14 | Chinese Gigaword Second Edition | Michael Berger | Aug 18, 2005 | /data/lorien1/data/gigaword_cmn_v2// | 14123 |
| LDC2005T06 | Chinese News Translation Text Part 1 | Michael Berger | Mar 16, 2005 | /data/lorien1/data/CH_NEWS_TRANS/ | 13293 |
| LDC2005T23 | Chinese Proposition Bank 1.0 | Michael Berger | Sep 16, 2005 | /data/lorien1/data/cpb_ver1/ | 14266 |
| LDC2005T01 | Chinese Treebank 5.0 | Michael Berger | Feb 18, 2005 | /data/lorien1/data/ctb5 | 13132 |
| LDC2005T01U01 | Chinese Treebank 5.1 | ||||
| LDC2005E43 | CoNLL-2005 Shared Task Datasets | ||||
| LDC2005T08 | Discourse Graphbank | Michael Berger | Mar 16, 2005 | /data/lorien1/data/DISCOURSE_GRPH_B/ | 13293 |
| LDC2005T12 | English Gigaword Second Edition | Michael Berger | Jul 22, 2005 | /data/lorien1/data/gigaword_end_v2/ | 14005 |
| LDC2005E40B | Fisher English Phase 2, Part 2, Training Speech | ||||
| LDC2005E39B | Fisher English Phase 2, Part 2, Training Transcripts | ||||
| LDC2005S13 | Fisher English Training Part 2, Speech | Michael Berger | Apr 14, 2005 | /data/lorien1/data/FE_03_SP/ | 13460 |
| LDC2005T19 | Fisher English Training Part 2, Transcripts | Michael Berger | Apr 14, 2005 | /data/lorien1/data/FE_03_P2_TRAN/ | 13460 |
| LDC3001X01 | Frank Test Corpus I | ||||
| LDC3001X10 | Frank Test Corpus XV1 | ||||
| LDC2005E69 | GALE Kickoff Release - English-Arabic Parallel Treebank V1.0 | ||||
| LDC2005S15 | HKUST Mandarin Telephone Speech, Part 1 | Michael Berger | Jul 22, 2005 | /data/lorien1/data/hkust_mcts_p1/ | 14005 |
| LDC2005T32 | HKUST Mandarin Telephone Transcript Data, Part 1 | Michael Berger | Jul 22, 2005 | /data/lorien1/data/hkust_mcts_p1tr/ | 14005 |
| LDC2005S14 | Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) | Michael Berger | Jun 20, 2005 | | 13812 |
| LDC2005T24 | MDE RT-04 Training Data Text/Annotations | Michael Berger | Aug 18, 2005 | /data/lorien1/data/mde_04_text_annot/ | 14123 |
| LDC2005S16 | MDE RT04 Training Data Speech | Michael Berger | Aug 18, 2005 | /data/lorien1/data/mde_04_speech_bnews/ | 14123 |
| LDC2005E14 | MSE 2005 Sample Summary Topic | ||||
| LDC2005L01 | Mawukakan Lexicon | Michael Berger | Apr 14, 2005 | /data/lorien1/data/MAWU_LEXICON/ | 13460 |
| LDC2005T05 | Multiple-Translation Arabic (MTA) Part 2 | Michael Berger | Feb 18, 2005 | /data/lorien1/data/MTA_P2 | 13132 |
| LDC2005S27 | NIST 2003 Language Recognition Development Data II | ||||
| LDC2005E50 | NTCIR Evaluation | ||||
| LDC2005S25 | Santa Barbara Corpus of Spoken American English Part-IV | Michael Berger | Sep 16, 2005 | /data/lorien1/data/sbcsae_4 | 14266 |
| LDC2005S11 | TDT4 Multilingual Broadcast News Speech Corpus | Michael Berger | May 23, 2005 | /data/lorien1/data/TDT4* | 13702 |
| LDC2005T16 | TDT4 Multilingual Text and Annotations | Michael Berger | May 23, 2005 | /data/lorien1/data/tdt4_aem_txt/ | 13702 |
| LDC2005E44 | TIDES MT 2003 Arabic Evaluation Set | ||||
| LDC2005E45 | TIDES MT 2003 Chinese Evaluation Set | ||||
| LDC2005S28 | West Point Croatian Speech Corpus | Michael Berger | Oct 14, 2005 | /data/lorien1/data/wp_croatian/ | 14458 |
| Catalog Information for Non-Member Years | |||||
| Catalog ID | Catalog Name | Requested By | Invoice Date | Location | Inv # |
| LDC98T24 | 1997 Mandarin Broadcast News Transcripts (Hub-4NE) | Gina-Anne Levow | Apr 01, 2005 | /data/lorien1/data/HUB4_1997NE/ | 13392 |
| LDC2004E25 | 2003 HARD Annotations | Gina-Anne Levow | Jun 29, 2004 | /data/lorien1/Data/HARD2004 | 11875 |
| LDC96S36 | Boston University Radio Speech Corpus | Barbara Need | Nov 16, 1998 | /data/lorien1/Data/Tone/BURadio | 4451 |
| LDC96L14 | CELEX2 | Partha Niyogi | Feb 06, 2001 | | 6461 |
| LDC96S30 | CTIMIT | Partha Niyogi | Jun 20, 2002 | Z399061 | 8268 |
| LDC2002L27 | Chinese-English Translation Lexicon Version 3.0 | Gina-Anne Levow | Jan 21, 2003 | | 9044 |
| LDC2004E42 | HARD 2004 Reference Annotations | Gina-Anne Levow | Oct 18, 2004 | /data/lorien1/Data/HARD2004 | 12522 |
| LDC93S12 | HCRC Map Task Corpus | Gina-Anne Levow | Apr 23, 2004 | /data/lorien1/Data/Maptask | 11412 |
| LDC93S2 | NTIMIT | Partha Niyogi | Jun 20, 2002 | Z399061 | 8268 |
| LDC93S10 | TIDIGITS | Professor A. Murua | Feb 19, 1998 | Z895150 | 3738 |
| LDC93S1 | TIMIT Acoustic-Phonetic Continuous Speech Corpus | Partha Niyogi | Jun 20, 2002 | /data/lorien1/Data/TIMIT | 8268 |
| LDC98S72 | Taiwanese Putonghua Speech and Transcripts | Gina-Anne Levow | Apr 01, 2005 | /data/lorien1/Data/TWPTH | 13392 |
| LDC0000 | To be filled | Gina-Anne Levow | Dec 01, 2004 | | 12736 |
| LDC99T42 | Treebank-3 | Derrick Higgins | Apr 26, 2001 | | 6735 |
| N/A | 20 Newsgroups | Gina-Anne Levow | | /data/lorien1/Data/20_newsgroups | |
| N/A | Cross-language Evaluation Forum -2000; E/F/G/I | Gina-Anne Levow | | /data/lorien1/Data/CLEF | |
| N/A | Cross-language Evaluation Forum -2004; E/Fr/Fi/Ru | Gina-Anne Levow | | /data/lorien1/Data/CLEF2004 | |
| N/A | CUHK Broadcast Cantonese-CUSENT | Gina-Anne Levow | | /data/lorien1/Data/CUSENT | |
| N/A | ICSI Switchboard Close Transcription | Gina-Anne Levow | | /data/lorien1/Data/ICSI97 | |
| N/A | ICSI Meeting Recorder Dialogue Acts | Gina-Anne Levow | | /data/lorien1/Data/MRDA | |
| N/A | LEAP Learners' English | Gina-Anne Levow | | /data/lorien1/data/LEAP_ENG | |
| N/A | LEAP Learners' German | Gina-Anne Levow | | /data/lorien1/data/LEAP_GERMAN | |
| N/A | NTCIR 4 Data: CJKE | Gina-Anne Levow | | /data/lorien1/Data/NTCIR4 | |
| N/A | Sun SpeechActs Data | Gina-Anne Levow | | /data/lorien1/Data/SpeechActs | |
| N/A | TRAINS Dialogue Data | Gina-Anne Levow | | /data/lorien1/Data/Trains | |
| N/A | Xu Focus Speech Data - Pitch tracks | Gina-Anne Levow | | /data/lorien1/Data/XuFocus1999 | |
| N/A | Xu Focus Raw Speech | Gina-Anne Levow | | /data/lorien1/Data/XuFocus1999_audio | |
| N/A | Mini-newsgroups | Gina-Anne Levow | | /data/lorien1/Data/mini_newsgroups | |
| N/A | Reuters Text Classification Data | Gina-Anne Levow | | /data/lorien1/Data/reuters | |
| N/A | Switchboard Dialogue Acts | Gina-Anne Levow | | /data/lorien1/Data/swbda | |
| N/A | McNeill Wombat Dialogues - DSP | Gina-Anne Levow | | /data/lorien1/Data/wombat | |