|
String>>hash performance: msg#01364lang.smalltalk.squeak.general
I was doing some work in Squeak 3.0 at home, and found that (for the particular thing I was doing) most of the time was going in String>>hash. Looking in Squeak 3.2, I see that String>>hash has been replaced by code that calls a primitive. It has made an improvement to the time, no two ways about that. It hasn't made as much of an improvement as I expected. Number of instances of String : 37,074 The sum of #size of each: 996,905 Time to hash all of them: 3.648 sec Average time to hash one: 98.5 usec Time to hash all of them in C: 0.078 sec Average time to hash one in C: 2.1 usec Measuring the time in Squeak: s := String allInstances. t0 := [s do: [:each | each]] timeToRun. t1 := [s do: [:each | each hash]] timeToRun. (t1 - t0) * 1.0e-3 "Print It" (t1 - t0) * 1.0e3 / s size "Print It" Measuring the time in C: f := StandardFileStream newFileNamed: 'strpool.dat'. s do: [:each | f print: each size; space; nextPutAll: each]. f close. Then a C program loads the strings, hashes them all 400 times, and writes out the times. Controlling for hardware: Squeak and the C program were both run on the same machine. How do I know the C code was doing the same thing as the Squeak hash code? String>>hash ^String stringHash: self initialHash: self species hash String class>>stringHash: aString initialHash: speciesHash |stringSize hash low| <primitive: 'primitiveStringHash' module: 'MiscPrimitivePlugin'> self var: #aHash declareC: 'int speciesHash'. self var: #aString declareC: 'unsigned char *aString'. stringSize := aString size. hash := speciesHash bitAnd: 16rFFFFFFF. 1 to: stringSize do: [:pos | hash := hash + (aString at: pos) asciiValue. "Begin hashMultiply" low := hash bitAnd: 16383. hash := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low) bitAnd: 16383)) * 16384)) bitAnd: 16r0FFFFFFF. ]. ^hash I turned that into C, that's how. I don't actually understand the rules for Slang; I presume that (aString at: pos) asciiValue turns into aString[pos] (because at any rate that's the number that always used to be used). So if #stringHash:initialHash: is compiled into C, how come C that does the same thing is running about 47 times faster? Where is the time going? What have I misunderstood?
|
|
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| Previous by Date: | Re: Design Patterns and Collection Classes, Richard A. O'Keefe |
|---|---|
| Next by Date: | Re: Design Patterns and Collection Classes, Richard A. O'Keefe |
| Previous by Thread: | Re: [BUG]Collection>>removeAll:], David Griswold |
| Next by Thread: | Re: String>>hash performance, Tim Rowledge |
| Indexes: | [Date] [Thread] [Top] [All Lists] |
| News | FAQ | advertise |