Request for help concerning an LSA problem: msg#00031, science.linguistics.corpora
Hello all,

I'm currently trying to implement Latent Semantic Analysis as part of an automatic classification system. I'm programming in Java and using the Jama Matrix package for the matrix operations. I have stumbled over some strange problems and would be grateful if anyone on this list could offer some help.

My problem is this: I have implemented a class which builds a matrix representation of a corpus and performs SVD over the term-by-document matrix. Most of the operations are done by the Jama class "Matrix". This works fine, except that when I ran the program over various small test corpora (for instance, the one from Chapter 15 of Manning and Schütze's book Foundations of Statistical NLP), most of the right and left singular vectors contained the correct values but with the wrong (reversed) sign?! E.g. a vector that should have the values [-0.75, -0.28, -0.20, ...] is assigned the values [0.75, 0.28, ...].

Unfortunately, I have limited experience with linear algebra and the like, so I now find myself completely at a loss in debugging this. As far as I can understand, this means that my vectors are pointing in the opposite direction from the one they should, but why this happens escapes my understanding :)

Any help, hints, tricks and the like are extremely welcome! I can also send over the source code on request.

Regards,
--
Cecilie D. Widsteen
Department of Linguistics
University of Oslo
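A note on the symptom described above: a singular vector and its negation are equally valid, provided the matching left and right vectors flip together, because (-u)·s·(-v)ᵀ = u·s·vᵀ. Different SVD implementations may therefore return opposite signs for the same matrix. The sketch below illustrates this with plain Java arrays standing in for Jama's Matrix class; the class name `SvdSignDemo`, its helper methods, and the 2x2 factors are all hypothetical and chosen only for illustration. It is not the poster's code.

```java
/** Illustrative sketch only: plain arrays stand in for Jama's Matrix class. */
final class SvdSignDemo {

    /** Reconstructs A = U * diag(s) * V^T; singular vectors are the columns of u and v. */
    static double[][] reconstruct(double[][] u, double[] s, double[][] v) {
        int m = u.length, n = v.length, r = s.length;
        double[][] a = new double[m][n];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < r; k++)
                    a[i][j] += u[i][k] * s[k] * v[j][k];
        return a;
    }

    /** Returns a copy of the matrix with column k negated. */
    static double[][] flipColumn(double[][] matrix, int k) {
        double[][] out = new double[matrix.length][];
        for (int i = 0; i < matrix.length; i++) {
            out[i] = matrix[i].clone();
            out[i][k] = -out[i][k];
        }
        return out;
    }

    /** Largest absolute element-wise difference between two equally sized matrices. */
    static double maxAbsDiff(double[][] a, double[][] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                max = Math.max(max, Math.abs(a[i][j] - b[i][j]));
        return max;
    }

    /** True if x matches y or -y within tol: a sign-insensitive comparison. */
    static boolean equalUpToSign(double[] x, double[] y, double tol) {
        if (x.length != y.length) return false;
        boolean same = true, negated = true;
        for (int i = 0; i < x.length; i++) {
            if (Math.abs(x[i] - y[i]) > tol) same = false;
            if (Math.abs(x[i] + y[i]) > tol) negated = false;
        }
        return same || negated;
    }

    public static void main(String[] args) {
        double c = Math.sqrt(0.5);
        double[][] u = {{c, c}, {c, -c}};   // made-up orthonormal left vectors
        double[] s = {3.0, 1.0};            // made-up singular values
        double[][] v = {{c, -c}, {c, c}};   // made-up orthonormal right vectors

        double[][] original = reconstruct(u, s, v);
        // Negate the first left AND first right singular vector together.
        double[][] flipped = reconstruct(flipColumn(u, 0), s, flipColumn(v, 0));

        System.out.println("max difference after paired flip: "
                + maxAbsDiff(original, flipped));
        System.out.println("equal up to sign: " + equalUpToSign(
                new double[] {-0.75, -0.28, -0.20},
                new double[] {0.75, 0.28, 0.20}, 1e-9));
    }
}
```

When checking output against a textbook, a comparison along the lines of `equalUpToSign` may be more appropriate than exact equality, as long as each left vector and its corresponding right vector are flipped consistently.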