
Re: Request for help concerning a LSA problem: msg#00046

science.linguistics.corpora

Subject: Re: Request for help concerning a LSA problem

Cecilie,
I suggest you try, e.g., the notorious "Human machine interface..." corpus from Landauer's paper "An Introduction to Latent Semantic Analysis". I have 'tested' the tools I use, scipy (Python) and svdlibc (C), against it. I have also tried to reproduce the results from Ch. 15 with scipy and svdlibc, but both give the same results, i.e. .75 .28 ... so the figures in the book seem strange, but I only gave it a quick look.
Good luck
Petr

Cecilie Desiree Widsteen wrote:
Hello all,

I'm currently trying to implement Latent Semantic Analysis, as part of
an automatic classification system. I'm programming in Java, and using
the Jama Matrix package for the matrix stuff. I have stumbled over some
strange problems, and would be grateful if anyone on this list could
offer some help.
My problem is: I have implemented a class which takes care of building a
matrix representation of a corpus, and performs SVD over the
term-by-document matrix. Most of the operations are done by the Jama
class "Matrix". This works fine, except for the fact that when I ran
the program over various small test corpora (like, for instance, the one
from Chapter 15 in Manning and Schütze's book Foundations of Statistical
NLP) most of the right and left singular vectors contained the correct
values but with the wrong/reversed sign?! E.g. a vector that should have the
values [-0.75,-0.28,-0.20, ...] is assigned the values [0.75,0.28,
...]. Unfortunately, I have limited experience with linear algebra and
the like so now I find myself completely at a loss in debugging this...
As far as I can understand, this means that my vectors are pointing in
the opposite direction from the one they should, but why this is escapes
my understanding :)
Any help, hints, tricks and the like are extremely welcome! I can also
send over the source code on request.

Regards,
--
Cecilie D. Widsteen
Department of Linguistics
University of Oslo
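
A note on the sign question raised above: a singular value decomposition is only determined up to the sign of each pair of left and right singular vectors. If the k-th left vector and the k-th right vector are negated together, the product U * diag(s) * V^T is unchanged, so different SVD implementations can legitimately return vectors with opposite signs. The following is only a minimal sketch of that fact, in plain Python rather than Jama or scipy. The small U, s and Vt values are made up for illustration (they are not taken from either book), and the helper names reconstruct, flip_component and same_up_to_sign are hypothetical.

```python
def reconstruct(U, s, Vt):
    """Return the product U * diag(s) * Vt as a list of rows."""
    rows, cols, rank = len(U), len(Vt[0]), len(s)
    return [[sum(U[i][k] * s[k] * Vt[k][j] for k in range(rank))
             for j in range(cols)]
            for i in range(rows)]


def flip_component(U, Vt, k):
    """Negate column k of U and row k of Vt together (copies, no mutation)."""
    U2 = [[-x if j == k else x for j, x in enumerate(row)] for row in U]
    Vt2 = [[-x for x in row] if i == k else list(row)
           for i, row in enumerate(Vt)]
    return U2, Vt2


def same_up_to_sign(a, b, tol=1e-9):
    """True if vector a equals b or -b within the tolerance tol."""
    if len(a) != len(b):
        return False
    same = all(abs(x - y) <= tol for x, y in zip(a, b))
    flipped = all(abs(x + y) <= tol for x, y in zip(a, b))
    return same or flipped


# Illustrative factors only (assumed values): two orthogonal 2x2 matrices
# and two singular values.
U = [[0.6, -0.8],
     [0.8, 0.6]]
s = [2.0, 1.0]
Vt = [[0.8, 0.6],
      [-0.6, 0.8]]

A = reconstruct(U, s, Vt)

# Flip the sign of the first singular vector pair and rebuild the matrix.
U_flipped, Vt_flipped = flip_component(U, Vt, 0)
A_flipped = reconstruct(U_flipped, s, Vt_flipped)

# Both factorizations describe the same matrix.
matrices_match = all(abs(x - y) <= 1e-12
                     for row_a, row_b in zip(A, A_flipped)
                     for x, y in zip(row_a, row_b))
print(matrices_match)
```

When checking an implementation against published figures, it may therefore be more useful to compare vectors up to sign, for example same_up_to_sign([-0.75, -0.28, -0.20], [0.75, 0.28, 0.20]), than to expect identical signs.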









