[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

I wrote Python's sort, so I may know something about it ;-) To my eyes, no, there's not "an issue" here, but a full explanation would be very involved. For some sorting algorithms, it's possible to guarantee a redundant comparison is never made. For example, a pure insertion sort. But Python's sort is quite complex, using a number of distinct strategies dynamically adapting to structure found in the data as it goes along. There's a possibility for a small bit of redundant work whenever switching from one strategy to another. That's typical of "hybrid" strategies. Your case involves a switch between Python's sort's two simplest strategies. Let's just look at what matters: [22, 33, 11, 11] The first step is searching for "a natural run", an initial sub-sequence that's already sorted. 22 < 33 says the first two elements are already in order. Then 11 < 33 says that's the end of the run. That part is over now. The next step is using a binary insertion sort to move 11 into the already-sorted [22, 33] prefix. A binary search starts looking "in the middle" first, but a 2-element run is so short that 33 "is the middle", so it first compares 11 to 33 _again_. So it goes. Code to detect that special case could be added, but it would probably cost more than it saves (it always costs time to test for it, but that time is pure waste unless the tests succeed). This is specific to an ascending natural run of length 2, so is quite a special case. For example, in [22, 33, 44, 11, 11] 11 < 44 ends the natural run, and then the binary search first compares 11 to 33 ("the middle" of the 3-element natural run), and no redundant comparison gets made. But in [22, 33, 44. 43, 11] 43 gets compared to 44 again on the _second_ iteration of the binary search loop. In general, this particular strategy switch ends up doing at most one redundant comparison only if the first element after an ascending natural run belongs immediately before the last element of the run. For technical reasons, this can happen at most min(len(the_list) / 32, number_of_natural_runs_in_the_list) times, so it's generally trivial in an average-case O(N log N) process. It's so rare it would be minor in an O(N) process too, unless N is very small. A principled way to avoid this would be to skip the "find a run" step if N is very small, leaving the whole thing to binary insertion sort. Then a redundant comparison would never be done for small N. But that would end up doing more comparisons _overall_ in common cases where a short list starts with a relatively (compared to N) sizable ascending or descending run. So I'm happy enough with the tradeoffs already in place. On Wed, Aug 22, 2018 at 10:37 AM ??? <1520306395 at qq.com> wrote: > Why compare twice? > > class Student(object): > > def __init__(self, name, score): > self.name = name > self.score = score > > def __str__(self): > return '(%s: %s)' % (self.name, self.score) > > __repr__ = __str__ > > def __lt__(self, s): > #print(self, '--', s) > if(self.score<s.score): > print(self, '<', s) > return True > if(self.score>s.score): > print(self, '>', s) > return False > if(self.score==s.score): > if(self.name>s.name): > print(self, '>', s) > return False > if(self.name<s.name): > print(self, '<', s) > return True > if(self.name==s.name): > print(self, '==', s) > return False > > def __eq__(self, s): > return (self.score == s.score) and (self.name == s.name) > def __gt__(self, s): > return not ((self == s) or (self < s)) > def __le__(self, s): > return ((self == s) or (self < s)) > def __ge__(self, s): > return ((self == s) or (self > s)) > def __nq__(self, s): > return not (self == s) > > L = [Student('Tim', 22), Student('Bob', 33), Student('Kevin', 11), Student('Alice', 11)] > print(sorted(L)) > > Output: > (Bob: 33) > (Tim: 22) > (Kevin: 11) < (Bob: 33) > (Kevin: 11) < (Bob: 33) > (Kevin: 11) < (Tim: 22) > (Alice: 11) < (Tim: 22) > (Alice: 11) < (Kevin: 11) > [(Alice: 11), (Kevin: 11), (Tim: 22), (Bob: 33)] > > Best regards, > Xiaofeng > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tim.peters%40gmail.com

- Prev by Date:
**[Python-Dev] Starting to use gcc-8 on upstream Python project CI** - Next by Date:
**[Python-Dev] Let's change to C API!** - Previous by thread:
**[Python-Dev] Issue?** - Index(es):