Foreign Language Help
May 12, 2009
I’m currently doing research that involves online communities and multiple languages. As part of this, I’m analyzing some exceedingly popular languages (Spanish, German, Japanese…) as well as some less studied ones (Volapuk, Ukrainian, Esperanto).
The idea is that we’re doing some basic text processing. To reduce the amount of time this takes and increase the value of the analysis, we want to exclude a standard list of stop words. In English, these are words such as “in”, “to”, “a”, “and”, “the”, and “that”. (Examples in English, German, French) I can find these lists for most European languages, and I have learned that some other languages (Japanese, Chinese) don’t really have a concept of stop words.
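To make the filtering step concrete, here is a minimal sketch in Python of what excluding stop words could look like. The function name `remove_stop_words`, the tiny word list, and the regex-based tokenization are all assumptions made for illustration, not the actual code from the project; a real run would load a full list for each language, and splitting on `\w+` will not suit every script.

```python
import re

# Tiny illustrative list (an assumption); a real list would be loaded per language.
ENGLISH_STOP_WORDS = {"in", "to", "a", "and", "the", "that"}


def remove_stop_words(text, stop_words):
    """Lowercase the text, split it into word tokens, and drop the stop words."""
    # Simple tokenization: runs of word characters. This is a simplifying
    # assumption and may split words incorrectly in some scripts.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stop_words]


sample = "The idea is that we go to a community in the forum"
print(remove_stop_words(sample, ENGLISH_STOP_WORDS))
```

Using a set for the stop words keeps each lookup fast, which matters when the same check runs over every token in a large collection of posts.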
While I’ve found stop word lists for most of the languages, I’m stumped on four: Esperanto, Volapuk, Ukrainian, and Bengali. Any insights would be appreciated.