Brian Hurt wrote:
> On Wed, 30 Jun 2004, Jesse Guardiani wrote:
>
>> Are you saying that the following code (for int lists) is faster than the
>> generic list intersect function I posted for large lists?
>
> Yes. Even faster would be to use a specialized compare function, like:
>
> module IntSet = Set.Make (
> struct
> type t = int
> let compare (x: int) y =
> if (x = y) then 0
> else if (x < y) then -1
> else 1
> end
> );;
>
>> let list_to_set = List.fold_left
>> (fun x y -> IntSet.add y x)
>> IntSet.empty
>> ;;
>
> This is O(N log N).
>
>>
>>
>> let inter_int_list il1 il2 =
>> let t1 = list_to_set il1 in
>> let t2 = list_to_set il2 in
>> IntSet.elements (IntSet.inter t1 t2);
>> ;;
>
> This is O(N log N) as well. Maybe O(N), I haven't looked.
>
> Converting the set back to a list is O(N) as well.
>
> How much do you know about big-O notation? Sounds like little or nothing.
> Here's the core of the idea. Measure how long the algorithms take for
> various lengths of time, and you'll come up with some formula- given n,
> the number of elements in the set, computing the intersection will take
> a*n^2 + b*n + c seconds, for some constants a, b, and c, using your list
> method. Using a set, the formula will instead look like x*n*log(n) + y*n
> + z, for some constants x, y, and z.
>
> No matter what the constants are, there will be an n such that
> a*n^2 + b*n + c > x*n*log(n) + y*n + z. At that point or larger, the set
> based implementation starts being faster. This becomes more obvious if
> you look at how the two functions scale. If n elements takes t time, then
> how long does 2*n elements take? With the list-based algorithm, it'll
> take 4*t time, more or less. With the set based implementation, it'll
> take 2*t + c time, for some small c. The time the list based
> implementation takes grows faster than the set based implementation, and
> sooner or later it'll overtake and pass it.
>
> Worse yet, this generally happens sooner than you think. I'd predict that
> the list based implementation and the set based implementation will take
> the same amount of time for n somewhere between 3 and 100. Let's be
> generous and assume the balance point is n-64 (note: I haven't measured
> this, so don't depend upon this number! I'm just picking a number to make
> my math easier). At n=64 elements, both algorithms take the same time-
> 100 microseconds. How much time do they take at n=1024 elements? The
> list based algorithm will take about 25,600 microseconds. The Set based
> implementation, on the other hand, will take only about 2,700 microseconds
> (I get this number the following way- I know that k*64*log2(64) = 100
> usec, solving for k = 0.26, and then I plug that into k*1024*log2(1024)) .
> At 64 elements they're the same speed, at 1024 elements, the set based
> implementation is 10 times faster. At 2^20th elements, the set based
> implementation is almost 5,000 times faster.
>
> This is why experienced programmers are much more interested in optimizing
> algorithsm- it's where the big payoffs are. Basically, you're not going
> to see 10-fold improvements in speed on the same machine. Maybe, just
> maybe, badly interepreted code vr.s super optimized compiled code has a
> 10-fold speed difference. Not much more than that. And 10-fold increases
> in computer performance take the better part of a decade- basically,
> 350MHz PIIIs vr.s 3.5GHz P4s. How long ago were 350MHz PIIIs hot? Six
> years? Seven years? At n=1024, the set based intersection on a 350MHz
> PIII is about as fast as the list based intersection on a 3.5GHz P4.
That's all incredibly interesting. I plan to print it out and study it,
but do you "experienced programmers" ever actually write code that
DOES something? Or do you just fiddle with algorithms all day? :)
I'm going to use my simplified list intersect function for now (my lists
are very small, usually less than 10 elements), and later when I have some
time I'll benchmark the two algorithms and see where they intersect. If the
intersection point is high enough, then it may well be worth it to provide
two versions of the function designed for different sized lists.
I just don't know how much time I can spare to benchmark algorithms when
I should be writing code that does actual work. Sometimes you have to
leave theory behind and get your hands dirty...
--
Jesse Guardiani, Systems Administrator
WingNET Internet Services,
P.O. Box 2605 // Cleveland, TN 37320-2605
423-559-LINK (v) 423-559-5145 (f)
http://www.wingnet.net
-------------------------------------------------------
This SF.Net email sponsored by Black Hat Briefings & Training.
Attend Black Hat Briefings & Training, Las Vegas July 24-29 -
digital self defense, top technical experts, no vendor pitches,
unmatched networking opportunities. Visit www.blackhat.com
|