Searching Japanese corpora: msg#00144
science.linguistics.corpora
Greetings,

Following up on our recent thread about grep with Unicode, I'm curious about how people search for text in Japanese-language corpora. My understanding of Japanese is rudimentary, but is it not possible (potentially at least) for the same text to be written in hiragana, katakana, or kanji? In order to find all occurrences of a particular string in a corpus, would I have to do the search three times, once for each script?

I assume that would be the case for something like grep. But are there more sophisticated query tools which abstract away the question of which script is actually used for the data within the corpus?

Thanks,

Eric J. M. Smith
Dept. of Linguistics
University of Toronto
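
To make the question concrete, here is a minimal sketch of what script-independent matching could look like for the two kana scripts only, assuming Python 3 and its standard `unicodedata` module. The names `fold_kana` and `search_kana_insensitive` are hypothetical, chosen for illustration. It relies on the fact that the Unicode hiragana and katakana blocks are laid out in parallel, 0x60 code points apart. Kanji is not handled: mapping kanji to a reading needs a dictionary or a morphological analyzer, which this sketch does not attempt.

```python
import unicodedata

# Distance between parallel code points in the katakana and hiragana
# blocks (e.g. U+30A1 KATAKANA SMALL A vs. U+3041 HIRAGANA SMALL A).
KANA_OFFSET = 0x60


def fold_kana(text: str) -> str:
    """Return text with katakana U+30A1..U+30F6 mapped to hiragana.

    NFKC normalization is applied first so that half-width katakana
    are widened to their full-width forms before folding.
    """
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in normalized
    )


def search_kana_insensitive(query: str, lines: list[str]) -> list[str]:
    """Return the lines containing query, ignoring hiragana/katakana differences."""
    folded_query = fold_kana(query)
    return [line for line in lines if folded_query in fold_kana(line)]


if __name__ == "__main__":
    # Hypothetical three-line corpus: the same word in hiragana, katakana, kanji.
    corpus = ["すし を 食べる", "スシ を 食べる", "寿司 を 食べる"]
    for hit in search_kana_insensitive("すし", corpus):
        print(hit)
```

In this toy corpus a single query finds the hiragana and katakana lines, but the kanji line is missed, which is exactly the part of the problem that would need a reading-aware tool.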