


join and split with empty delimiter


Danilo Coccia <daniloco at acm.org> writes:

> On 18/07/2019 12:27, Ben Bacarisse wrote:
>> Irv Kalb <Irv at furrypants.com> writes:
>> 
>>> I have always thought that split and join are opposite functions.  For
>>> example, you can use a comma as a delimiter:
>>>
>>> >>> myList = ['a', 'b', 'c', 'd', 'e']
>>> >>> myString = ','.join(myList)
>>> >>> print(myString)
>>> a,b,c,d,e
>>>
>>> >>> myList = myString.split(',')
>>> >>> print(myList)
>>> ['a', 'b', 'c', 'd', 'e']
>>>
>>> Works great.
>> 
>> Note that join and split do not always recover the same list:
>> 
>> >>> ','.join(['a', 'b,c', 'd']).split(',')
>> ['a', 'b', 'c', 'd']
>> 
>> You don't even have to have the delimiter in one of the strings:
>> 
>> >>> '//'.join(['a', 'b/', 'c']).split('//')
>> ['a', 'b', '/c']
>> 
>>> But I've found a case where they don't work that way.  If
>>> I join the list with the empty string as the delimiter:
>>>
>>> >>> myList = ['a', 'b', 'c', 'd']
>>> >>> myString = ''.join(myList)
>>> >>> print(myString)
>>> abcd
>>>
>>> That works great.  But attempting to split using the empty string
>>> generates an error:
>>>
>>> >>> myString.split('')
>>> Traceback (most recent call last):
>>>   File "<pyshell#9>", line 1, in <module>
>>>     myString.split('')
>>> ValueError: empty separator
>>>
>>> I know that this can be accomplished using the list function:
>>>
>>> >>> myString = list(myString)
>>> >>> print(myString)
>>> ['a', 'b', 'c', 'd']
>>>
>>> But my question is:  Is there any good reason why the split function
>>> should give an "empty separator" error?  I think the meaning of trying
>>> to split a string into a list using the empty string as a delimiter is
>>> unambiguous - it should just create a list of single-character
>>> strings, like the list function does here.
>> 
>> One reason might be that str.split('') is not unambiguous.  For example,
>> there's a case to be made that there is a '' delimiter at the start and
>> the end of the string as well as between letters.  '' is a very special
>> delimiter because every string that gets joined using it includes it!
>> It's a wild version of ','.join(['a', 'b,c', 'd']).split(',').
>> 
>> Of course str.split('') could be defined to work the way you expect, but
>> it's possible that the error is there to prompt the programmer to be
>> more explicit.
>
> It is even more ambiguous if you consider that any string starts with an
> infinite number of empty strings, followed by a character, followed by
> an infinite number of empty strings, followed by ...
> The result wouldn't fit on screen, or in memory, for that matter!

Right, but that can be finessed by saying that two delimiters can't
overlap, which is the usual rule.  A reasonable interpretation of "not
overlapping" might well exclude having more than one delimiter in the
same place.
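
Even with that rule, two readings remain: either the empty delimiter
sits only between characters, or it also sits at the start and the end
of the string.  Here is a minimal sketch of both.  The helper name
split_on_empty and its include_ends flag are made up for illustration
and are not part of str or any library.  For comparison, re.split with
an empty pattern (accepted since Python 3.7, as far as I know) takes the
second reading.

```python
import re


def split_on_empty(s, include_ends=False):
    """Hypothetical helper: split s on '' with at most one delimiter per position."""
    chars = list(s)  # delimiters between characters only
    if include_ends:
        # Also count one empty delimiter before the first character
        # and one after the last, which yields an empty piece at each end.
        return [''] + chars + ['']
    return chars


print(split_on_empty('abcd'))                     # ['a', 'b', 'c', 'd']
print(split_on_empty('abcd', include_ends=True))  # ['', 'a', 'b', 'c', 'd', '']
print(re.split('', 'abcd'))                       # ['', 'a', 'b', 'c', 'd', '']
```

Both results join back to 'abcd' with ''.join, so the round trip alone
does not tell you which reading is the right one.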

-- 
Ben.