
join and split with empty delimiter

On 18/07/2019 12:27, Ben Bacarisse wrote:
> Irv Kalb <Irv at furrypants.com> writes:
>> I have always thought that split and join are opposite functions.  For
>> example, you can use a comma as a delimiter:
>>>>> myList = ['a', 'b', 'c', 'd', 'e']
>>>>> myString = ','.join(myList)
>>>>> print(myString)
>> a,b,c,d,e
>>>>> myList = myString.split(',')
>>>>> print(myList)
>> ['a', 'b', 'c', 'd', 'e']
>> Works great.
> Note that join and split do not always recover the same list:
>>>> ','.join(['a', 'b,c', 'd']).split(',')
> ['a', 'b', 'c', 'd']
> You don't even have to have the delimiter in one of the strings:
>>>> '//'.join(['a', 'b/', 'c']).split('//')
> ['a', 'b', '/c']
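A small check makes Ben's point testable. The name `round_trips` is a hypothetical helper for illustration, not a standard function; it just compares a list with the result of joining it and splitting it again.

```python
def round_trips(items, sep):
    """Hypothetical helper: True if sep.join(items).split(sep) gives back items."""
    return sep.join(items).split(sep) == items

print(round_trips(['a', 'b', 'c', 'd', 'e'], ','))  # True
print(round_trips(['a', 'b,c', 'd'], ','))          # False: 'b,c' is cut in two
print(round_trips(['a', 'b/', 'c'], '//'))          # False: 'a//b///c' splits as 'a', 'b', '/c'
print(round_trips([], ','))                         # False: ''.split(',') is ['']
```

The empty-list case shows that even an innocent separator can fail to round-trip, because splitting an empty string always yields one (empty) field.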
>> But I've found a case where they don't work that way.  If
>> I join the list with the empty string as the delimiter:
>>>>> myList = ['a', 'b', 'c', 'd']
>>>>> myString = ''.join(myList)
>>>>> print(myString)
>> abcd
>> That works great.  But attempting to split using the empty string
>> generates an error:
>>>>> myString.split('')
>> Traceback (most recent call last):
>>   File "<pyshell#9>", line 1, in <module>
>>     myString.split('')
>> ValueError: empty separator
>> I know that this can be accomplished using the list function:
>>>>> myString = list(myString)
>>>>> print(myString)
>> ['a', 'b', 'c', 'd']
>> But my question is:  Is there any good reason why the split function
>> should give an "empty separator" error?  I think the meaning of trying
>> to split a string into a list using the empty string as a delimiter is
>> unambiguous: it should just create a list of single-character
>> strings, as the list function does here.
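For comparison, here is a minimal sketch of the behaviour Irv asks for, written as a wrapper. `split_or_chars` is a hypothetical name, not part of Python; it assumes an empty separator should mean "single characters", exactly like `list(s)`, and otherwise defers to `str.split`.

```python
def split_or_chars(s, sep=None):
    """Hypothetical helper: like str.split, except that an empty sep
    returns the individual characters (what list(s) already does)
    instead of raising ValueError."""
    if sep == '':
        return list(s)
    return s.split(sep)

print(split_or_chars('abcd', ''))    # ['a', 'b', 'c', 'd']
print(split_or_chars('a,b,c', ','))  # ['a', 'b', 'c']
```

The special case has to be chosen by whoever writes the wrapper, which is exactly the kind of explicit decision the built-in leaves to the programmer.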
> One reason might be that str.split('') is not unambiguous.  For example,
> there's a case to be made that there is a '' delimiter at the start and
> the end of the string as well as between letters.  '' is a very special
> delimiter because every string that gets joined using it includes it!
> It's a wild version of ','.join(['a', 'b,c', 'd']).split(',').
> Of course str.split('') could be defined to work the way you expect, but
> it's possible that the error is there to prompt the programmer to be
> more explicit.
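The "delimiter at the start and the end" reading is not only theoretical. In Python 3.7 and later, `re.split` accepts a pattern that matches the empty string, and an empty pattern matches at every position, including both ends. A short sketch, assuming Python 3.7 or later:

```python
import re

s = ''.join(['a', 'b', 'c', 'd'])  # 'abcd'

# The empty pattern matches before 'a', between every pair of
# characters, and after 'd', so the result has empty fields at both ends.
parts = re.split('', s)
print(parts)  # ['', 'a', 'b', 'c', 'd', '']

# Both this list and list(s) join back to the same string, so join
# alone cannot decide which one an empty-separator split should return.
print(''.join(parts) == ''.join(list(s)) == s)  # True
```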

It is even more ambiguous if you consider that any string starts with an
infinite number of empty strings, followed by a character, followed by
an infinite number of empty strings, followed by ...
The result wouldn't fit on screen, or in memory for that matter!
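The "infinite number of empty strings" point can be made concrete. `str.count('')` reports one empty match per position, and any number of extra `''` items can be inserted into a list without changing what `''.join` returns, so an inverse of `''.join` has no unique answer. `pad_with_empties` is a hypothetical helper for illustration, not a library function.

```python
def pad_with_empties(chars, n):
    """Hypothetical helper: put n empty strings before, between and after the items."""
    padded = [''] * n
    for c in chars:
        padded.append(c)
        padded.extend([''] * n)
    return padded

s = 'abcd'
print(s.count(''))  # 5, i.e. len(s) + 1 positions

# Every padded list, however long, joins back to 'abcd'.
for n in range(4):
    p = pad_with_empties(list(s), n)
    print(n, len(p), ''.join(p) == s)
```

Since `n` can be any non-negative integer, there are infinitely many lists that an empty-separator split could return with equal justification.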