congressional-speech dataset available
The "congressional speech" corpus and associated graph information used in our EMNLP 2006 paper, "Get out the vote: Determining support or opposition from Congressional floor-debate transcripts", are now available.

Specifically, the data includes speeches as individual documents, together with:

* automatically-derived labels for whether the speakers supported the legislation under discussion or not, allowing for experiments with this kind of sentiment analysis

* indications of which debate each speech comes from (and the position within the debate), allowing for consideration of conversational structure

* indications of by-name references between speakers, allowing for experiments with agreement classification (if one determines the "true" labels from the support/oppose labels assigned to the pair of speakers in question)

* the edge weights and other information we derived to create the graphs we used for our experiments on this data, facilitating implementation of alternative graph-based classification methods on the graphs we constructed

The download site is:
http://www.cs.cornell.edu/home/llee/data/convote.html

Matt Thomas, Bo Pang, and Lillian Lee
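As an illustration of the agreement-classification item above, the following is a minimal sketch of how "true" agreement labels might be derived from the support/oppose labels of a pair of speakers. It does not read the released files: the in-memory structures, the speaker identifiers, and the function name `agreement_labels` are all hypothetical, chosen only for this example.

```python
from typing import Dict, List, Tuple


def agreement_labels(
    support: Dict[str, bool],
    references: List[Tuple[str, str]],
) -> Dict[Tuple[str, str], bool]:
    """Label each by-name reference as agreement (True) or disagreement (False).

    support: hypothetical mapping from speaker id to True (supports the
        legislation) or False (opposes it).
    references: hypothetical list of (referring speaker, referenced speaker).

    A pair counts as "agreeing" when both speakers carry the same
    support/oppose label. Pairs with an unlabeled speaker are skipped.
    """
    labels: Dict[Tuple[str, str], bool] = {}
    for source, target in references:
        if source in support and target in support:
            labels[(source, target)] = support[source] == support[target]
    return labels


# Made-up speakers, for illustration only.
support = {"speaker_a": True, "speaker_b": True, "speaker_c": False}
references = [
    ("speaker_a", "speaker_b"),  # both support: agreement
    ("speaker_a", "speaker_c"),  # support vs. oppose: disagreement
    ("speaker_c", "speaker_d"),  # speaker_d has no label: skipped
]
labels = agreement_labels(support, references)
```

Skipping pairs that lack a label is one possible choice; treating them as a separate "unknown" class would be an equally reasonable alternative.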