Re: [pm-h] [SPAM] [PBML] complex data structure help (msg#00012, lang.perl.perl-mongers.houston)

Thanks for the suggestions. I'm torn between using Perl structures (easier
in the short term) and a database (harder in the short term, but better for
long-term storage). Since we're planning on being able to store anywhere
from months to years' worth of data, a database is probably my best bet.
Now I just gotta put on my (tiny, ill-fitting) DBA hat, and scratch out a
schema. 8-)

Paul

3:59am, Kevin Shaum wrote:

> On Monday 27 March 2006 5:13 pm, Paul Archer wrote:
>> I'm writing a log analyzer (a la Webalizer) to analyze Solaris' nfslog
>> files. They're in the same format as wu-ftpd xferlog files. I'd use an
>> existing solution, but I can't find anything that keeps track of reads vs
>> writes, which is critical for us.
>> Anyway, I need to be able to sort by filesystem, client machine, user, time
>> (with a one-hour base period), read, write, or total usage.
>> Can anyone suggest a data structure (or pointers to same) that will allow
>> me to pull data out in an arbitrary fashion (i.e., users on X day sorted by
>> data written)?
>> Once I have the structure, I can deal with doing the reports, but I want to
>> make sure I don't shoot myself in the foot with the structure.
>>
>> I was thinking of a hash of hashes, where the keys are filesystems pointing
>> to hashes where the keys are client machines, etc, etc. But it seems that
>> approach would be inefficient for lookups based on times or users (for
>> example).
>
> The simplest thing to do would be to store it all as a simple list of
> (references to) lists, then 'grep' and 'sort' the big list as the query
> requires.
>
>   @result = sort { $a->[1] cmp $b->[1] }
>             grep { $_->[2] >= $time0 and $_->[2] <= $time1 }
>             grep { $_->[0] eq 'myhost' }
>             @dataset;
>
> A more readable (but possibly less efficient) version would store each entry
> in the big list as (a reference to) a hash:
>
>   @result = sort { $a->{username} cmp $b->{username} }
>             grep { $_->{time} >= $time0 and $_->{time} < $time1 }
>             grep { $_->{hostname} eq 'myhost' }
>             @dataset;
>
> If the data set is large enough that that's not practical, then the suggestion
> to go to a relational database (e.g., SQLite) makes sense. But it sounds like
> you're thinking of keeping it all in RAM anyway.
>
> Hope this helps.
>
> Kevin
> _______________________________________________
> Houston mailing list
> Houston@xxxxxx
> http://mail.pm.org/mailman/listinfo/houston
>

-----------------------------------------------
"Working with babies had its problems... but then I tried
working with chickens."
      Jim Henson, talking about making "Labyrinth"
-----------------------------------------------
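
For illustration only, here is a minimal sketch of how the list-of-hashes approach quoted above might answer the original question ("users on X day sorted by data written"). The field names (hostname, username, time, op, bytes), the sample records, and the helper names bytes_by_user and ranked_users are all assumptions made for this example; none of them come from the thread or from the nfslog format itself.

```perl
use strict;
use warnings;

# Hypothetical records: the field names and values are illustrative only,
# not taken from the real nfslog/xferlog layout.
my @dataset = (
    { hostname => 'myhost',    username => 'alice', time => 1143500000, op => 'write', bytes => 4096  },
    { hostname => 'myhost',    username => 'bob',   time => 1143503600, op => 'write', bytes => 10240 },
    { hostname => 'otherhost', username => 'alice', time => 1143507200, op => 'write', bytes => 8192  },
    { hostname => 'myhost',    username => 'bob',   time => 1143510800, op => 'read',  bytes => 2048  },
    { hostname => 'myhost',    username => 'carol', time => 1143600000, op => 'write', bytes => 512   },
);

# Sum bytes per user for one operation type inside the half-open
# window [$time0, $time1).
sub bytes_by_user {
    my ($records, $op, $time0, $time1) = @_;
    my %total;
    for my $r (@$records) {
        next unless $r->{op} eq $op;
        next unless $r->{time} >= $time0 and $r->{time} < $time1;
        $total{ $r->{username} } += $r->{bytes};
    }
    return \%total;
}

# Order users by descending byte count; ties fall back to the user name
# so the output order is deterministic.
sub ranked_users {
    my ($total) = @_;
    return sort { $total->{$b} <=> $total->{$a} or $a cmp $b } keys %$total;
}

# An arbitrary 24-hour window standing in for "X day".
my ($time0, $time1) = (1143500000, 1143500000 + 86400);
my $written = bytes_by_user(\@dataset, 'write', $time0, $time1);

for my $user (ranked_users($written)) {
    printf "%-8s %8d\n", $user, $written->{$user};
}
```

With the sample data, alice (12288 bytes written) is listed ahead of bob (10240), and carol is left out because her record falls outside the window. Each query is a full pass over the list, which is the trade-off noted in the thread: fine while the data fits comfortably in RAM, less so for months or years of logs.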