Re: Searching Japanese corpora (msg#00148, science.linguistics.corpora)
Hi Eric,

It is my understanding that the pronunciation of any kanji or kanji compound can be written in either hiragana or katakana, and that each kanji or kanji compound can have multiple pronunciations. In most types of written Japanese, however, it is uncommon to write out the pronunciation of kanji, and many words are always written in katakana or hiragana and never in kanji. So a tool that automatically searched for a kanji word and its kana representations at the same time would not be that useful.

I should confess that some words are written in both kanji and kana with higher frequency, such as some older loanwords, some place names, some proper names, some low-frequency kanji, and a few other types of words. My gut feeling is that the number of words that fall into these categories is not that large.

I don't know of any tools that do the kind of query you mentioned, but it has been a few years since I worked on Japanese text. In the meantime, I can only suggest making several queries: one with the kanji or kanji compound, and others with the hiragana and katakana spellings of all the possible pronunciations.

Yours,
Cyrus
http://www.psych.ualberta.ca/~westburylab/

Eric J. M. Smith wrote:
Greetings,
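
For what it is worth, the several-queries workaround suggested in the reply could be scripted roughly as follows. This is only a minimal sketch, not an existing tool: the function names (`hiragana_to_katakana`, `query_variants`, `search`) are hypothetical, the corpus is assumed to be a plain list of text lines, and the hiragana readings are assumed to be supplied by hand or from a dictionary. The only fact it relies on is that hiragana letters U+3041 to U+3096 sit exactly 0x60 code points below their katakana counterparts in Unicode.

```python
# Offset between the hiragana and katakana blocks in Unicode.
KANA_OFFSET = 0x60


def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana letters (U+3041..U+3096) into the katakana block.

    Any other character (kanji, punctuation, Latin) is left unchanged.
    """
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def query_variants(kanji: str, readings: list[str]) -> list[str]:
    """Return the kanji form plus the hiragana and katakana spelling of each reading.

    `readings` is assumed to be given in hiragana; duplicates are dropped.
    """
    variants = [kanji]
    for reading in readings:
        for form in (reading, hiragana_to_katakana(reading)):
            if form not in variants:
                variants.append(form)
    return variants


def search(corpus: list[str], kanji: str, readings: list[str]) -> dict[str, list[str]]:
    """Run one plain substring query per variant and collect the matching lines."""
    return {
        variant: [line for line in corpus if variant in line]
        for variant in query_variants(kanji, readings)
    }


if __name__ == "__main__":
    # Toy corpus, made up for illustration only.
    corpus = ["東京に行く", "トウキョウは広い", "大阪に住む"]
    for variant, lines in search(corpus, "東京", ["とうきょう"]).items():
        print(variant, lines)
```

Because Japanese text is not segmented by spaces, a plain substring match on a short kana string may also hit inside longer, unrelated words, so the kana queries would probably need manual filtering or a tokenizer on top of this.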