
Re: Parameter Settings in BaumWelchTraining]: msg#00020

java.bio.general

Subject: Re: Parameter Settings in BaumWelchTraining]

Sorry for the previous error.
---------------------------- Original Message ----------------------------
Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining
From: sacoca@xxxxxxxxxxxxx
Date: Fri, March 12, 2004 12:27 am
To: mark.schreiber@xxxxxxxxxxxxxxxxxx
--------------------------------------------------------------------------

Here is the code I have for the training. Using what you told me below, I
can retrieve all of the weights that I calculated manually for the HMM
(distributions for the transitions and distributions for the alphabet of
each state). What I do not understand is how to use this information and
the sequences stored in a file to run the Baum-Welch algorithm and then
retrieve the optimized values calculated by the algorithm to set them back
into my HMM.

//Retrieve the alphabet of all states
FiniteAlphabet SA = hmm.stateAlphabet();
Iterator i = SA.iterator();

SimpleModelTrainer MT = new SimpleModelTrainer();
MT.registerModel(hmm);

//go through each state
while(i.hasNext())
{
  Symbol Currentstate = (Symbol)i.next();

  //Retrieve the distribution of all transitions from the current state
  FiniteAlphabet From = hmm.transitionsFrom((State)Currentstate);
  Distribution d = hmm.getWeights((State)Currentstate);
  Iterator i2 = From.iterator();

  //go through it and look at the weight of each transition
  while(i2.hasNext())
  {
    Symbol s = (Symbol)i2.next();
    System.out.println("From state "+Currentstate.getName()+
                       " To State "+s.getName()+
                       " Weight "+d.getWeight(s));
  }

  //get the distribution for the alphabet of the current state
  Distribution d2 = ((EmissionState)Currentstate).getDistribution();
  FiniteAlphabet IN = (FiniteAlphabet)hmm.emissionAlphabet();
  Iterator i3 = IN.iterator();
  //you can go through it the same way as above using a while loop
*****************************************************************
This is what I don't understand!!!!
*****************************************************************
Here, we have a set of training sequences stored in a file in FASTA format
that I'd like to use with the Baum-Welch algorithm to optimize the
transition distributions mentioned above.

//This is the file with all the training sequences
BufferedInputStream is = new BufferedInputStream(new
FileInputStream("z:/Sequences.faa"));

//Load the file with the SequenceDB class
SequenceDB DB = SeqIOTools.readFasta(is, ProtAlphabet);

//use 100 cycles as the stop criteria
StoppingCriteria stopper = new StoppingCriteria()
{
  public boolean isTrainingComplete(TrainingAlgorithm ta)
  {
    return (ta.getCycle() > 100);
  }
};

*****************************************
This part is what I am clueless about
*****************************************
//How do I optimize my hmm with the BaumWelch algorithm and retrieve the
//optimized values? How do I train the distribution above with the
//Baum-Welch algorithm and the sequences that I have?
DP dp= DPFactory.DEFAULT.createDP(hmm);
BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
}

PS : I do not know why you are helping all of us here but thank you. It
makes Biojava a lot easier to deal with.

Steve
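
For the step asked about above, the BioJava 1.x cookbook example continues, as far as I recall, with bwt.train(DB, 1.0, stopper), where 1.0 is the null model weight; that call is quoted from memory and should be checked against the Javadoc of the installed release. Because that cannot be run without BioJava on the classpath, what follows is a minimal, library-free sketch of what Baum-Welch does with hand-set starting values. It assumes a single short training sequence, integer-coded symbols and no scaling (so long sequences would underflow). BaumWelchSketch, its method names and all numbers are hypothetical and are not part of BioJava.

```java
import java.util.Arrays;

public class BaumWelchSketch {

    /** alpha[t][i] = P(obs[0..t], state at time t is i). */
    static double[][] forward(double[] pi, double[][] a, double[][] b, int[] obs) {
        int n = pi.length, len = obs.length;
        double[][] alpha = new double[len][n];
        for (int i = 0; i < n; i++) alpha[0][i] = pi[i] * b[i][obs[0]];
        for (int t = 1; t < len; t++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) sum += alpha[t - 1][i] * a[i][j];
                alpha[t][j] = sum * b[j][obs[t]];
            }
        }
        return alpha;
    }

    /** beta[t][i] = P(obs[t+1..end] | state at time t is i). */
    static double[][] backward(double[][] a, double[][] b, int[] obs) {
        int n = a.length, len = obs.length;
        double[][] beta = new double[len][n];
        Arrays.fill(beta[len - 1], 1.0);
        for (int t = len - 2; t >= 0; t--) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    beta[t][i] += a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j];
                }
            }
        }
        return beta;
    }

    /** P(obs | model): the sum of the last forward column. */
    static double likelihood(double[] pi, double[][] a, double[][] b, int[] obs) {
        double p = 0.0;
        for (double v : forward(pi, a, b, obs)[obs.length - 1]) p += v;
        return p;
    }

    /** Scales counts to sum to 1; keeps the old row if the counts are all zero. */
    static double[] normalized(double[] counts, double[] oldRow) {
        double sum = 0.0;
        for (double c : counts) sum += c;
        if (sum <= 0.0) return oldRow;
        double[] row = new double[counts.length];
        for (int k = 0; k < row.length; k++) row[k] = counts[k] / sum;
        return row;
    }

    /** One Baum-Welch cycle; pi, a and b are overwritten with the re-estimates. */
    static void reestimate(double[] pi, double[][] a, double[][] b, int[] obs) {
        int n = pi.length, len = obs.length;
        double[][] alpha = forward(pi, a, b, obs);
        double[][] beta = backward(a, b, obs);
        double[][] transCounts = new double[n][n];
        double[][] emitCounts = new double[n][b[0].length];
        double[] startCounts = new double[n];
        for (int t = 0; t < len; t++) {
            for (int i = 0; i < n; i++) {
                // Expected visits to state i at time t, up to the constant P(obs).
                emitCounts[i][obs[t]] += alpha[t][i] * beta[t][i];
                if (t == 0) startCounts[i] = alpha[0][i] * beta[0][i];
                if (t == len - 1) continue;
                for (int j = 0; j < n; j++) {
                    transCounts[i][j] += alpha[t][i] * a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j];
                }
            }
        }
        double[] newPi = normalized(startCounts, pi);
        System.arraycopy(newPi, 0, pi, 0, n);
        for (int i = 0; i < n; i++) {
            a[i] = normalized(transCounts[i], a[i]);
            b[i] = normalized(emitCounts[i], b[i]);
        }
    }

    public static void main(String[] args) {
        // Hypothetical hand-set starting values for a 2-state, 2-symbol HMM.
        double[] pi = {0.6, 0.4};
        double[][] a = {{0.8, 0.2}, {0.3, 0.7}};
        double[][] b = {{0.9, 0.1}, {0.2, 0.8}};
        int[] obs = {0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1};
        System.out.println("likelihood before: " + likelihood(pi, a, b, obs));
        for (int cycle = 0; cycle < 100; cycle++) reestimate(pi, a, b, obs);
        System.out.println("likelihood after:  " + likelihood(pi, a, b, obs));
        System.out.println("transitions: " + Arrays.deepToString(a));
    }
}
```

The arrays passed in are both the starting point and the place where the optimized values end up, so reading them after the loop gives the trained parameters. Each cycle cannot lower the likelihood of the training sequence, which is why a good hand-set starting point matters: the procedure only climbs to a nearby local maximum.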

> Hi Stephane -
>
> Within EmissionState you can set a Distribution that contains emission
> probabilities for the Symbols in the state's emission alphabet using the
> setDistribution method. This Distribution will be your predetermined
> weights.
>
> To set the transition probabilities you can use setWeights(State
> source, Distribution weights). The source is the state you are
> transitioning from and the weights are the probabilities of transitioning
> to any State that the source connects to. Because States implement
> Symbol you can put them in a Distribution.
>
> To make a Distribution of States that state 'a' could connect to, use the
> following pseudo code:
>
> State a;
> Model m;
> FiniteAlphabet endPoints;
>
> endPoints = m.transitionsFrom(a);
> Distribution d =
> DistributionFactory.DEFAULT.createDistribution(endPoints);
>
> //You can then train d or set its weights and put it back in the model with
>
> m.setWeights(a, d);
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 1 Science Park Road
> #04-14 The Capricorn, Science Park II
> Singapore 117528
>
> phone +65 6722 2973
> fax +65 6722 2910
>
>
>
>
>
> sacoca@xxxxxxxxxxxxx
> Sent by: biojava-l-bounces@xxxxxxxxxxxxxxxxxxx
> 03/12/2004 06:11 AM
>
>
> To: "Biojava Mailing List" <biojava-l@xxxxxxxxxxx>
> cc:
> Subject: [Biojava-l] Parameter Settings in
> BaumWelchTraining
>
>
> Hi all. I'm trying to optimize the state transition probabilities for
> my HMM. I have already set them to values which I think are pretty good.
> Since I know Baum-Welch can only optimize the scores up to a local
> maximum, I thought of using the parameters I calculated as a starting
> point. The problem is that I don't know how!
> I followed the example in biojava:
>
> ....
> //train the model to have uniform parameters
> ModelTrainer mt = new SimpleModelTrainer();
> //register the model to train
> mt.registerModel(hmm);
>
> I want to use the values already set in my hmm as the starting
> parameters in the BaumWelch. I don't want to use the uniform
> distribution as indicated below!
>
> //as no other counts are being used the null weight will cause
> //everything to be uniform
> mt.setNullModelWeight(1.0);
> mt.train();
>
> I tried adding counts and looking up examples on the net but ended up
> more confused than when I started. How do I use addCounts to make this
> work?
>
> Stephane Acoca
> Master's Student
> McGill Center for Bioinformatics
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l@xxxxxxxxxxx
> http://biojava.org/mailman/listinfo/biojava-l
>
>
>
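
The uniform result described in the quoted question is what one would expect if a count-based trainer blends the counts it has been given with a null model used as pseudocounts. The sketch below illustrates that general idea in plain Java. It is an assumption about the arithmetic, not a copy of what BioJava's SimpleModelTrainer does internally, and PseudocountSketch and estimate are hypothetical names.

```java
import java.util.Arrays;

public class PseudocountSketch {

    /**
     * Blends observed counts with a null model:
     * weight[k] = (counts[k] + nullWeight * nullModel[k]) / total.
     */
    static double[] estimate(double[] counts, double[] nullModel, double nullWeight) {
        double[] weights = new double[counts.length];
        double total = 0.0;
        for (int k = 0; k < counts.length; k++) {
            weights[k] = counts[k] + nullWeight * nullModel[k];
            total += weights[k];
        }
        for (int k = 0; k < weights.length; k++) weights[k] /= total;
        return weights;
    }

    public static void main(String[] args) {
        double[] uniform = {0.25, 0.25, 0.25, 0.25};
        // No counts at all: only the null model contributes.
        System.out.println(Arrays.toString(estimate(new double[4], uniform, 1.0)));
        // Some counts added first: the estimate moves toward the counts.
        System.out.println(Arrays.toString(estimate(new double[] {9, 1, 0, 0}, uniform, 1.0)));
    }
}
```

Under this assumption, training with no counts returns the null model itself, which is uniform here, while counts of 9, 1, 0, 0 give the first symbol a weight of 9.25/11. If the real trainer behaves this way, skipping the count-free train() call would presumably leave the weights already set in the model as the starting point for Baum-Welch.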



_______________________________________________
Biojava-l mailing list - Biojava-l@xxxxxxxxxxx
http://biojava.org/mailman/listinfo/biojava-l


