join and split with empty delimiter

Irv Kalb <Irv at furrypants.com> writes:

> I have always thought that split and join are opposite functions.  For
> example, you can use a comma as a delimiter:
> >>> myList = ['a', 'b', 'c', 'd', 'e']
> >>> myString = ','.join(myList)
> >>> print(myString)
> a,b,c,d,e
> >>> myList = myString.split(',')
> >>> print(myList)
> ['a', 'b', 'c', 'd', 'e']
> Works great.

Note that join and split do not always recover the same list:

>>> ','.join(['a', 'b,c', 'd']).split(',')
['a', 'b', 'c', 'd']

The full delimiter doesn't even have to appear in any of the strings:

>>> '//'.join(['a', 'b/', 'c']).split('//')
['a', 'b', '/c']
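
A quick way to check a particular list and delimiter is to do the round
trip and compare.  This is only a sketch, and round_trips is a name I made
up for illustration, not anything in the standard library:

```python
def round_trips(items, sep):
    """Return True if sep.join(items).split(sep) gives back the same list."""
    return sep.join(items).split(sep) == list(items)


print(round_trips(['a', 'b', 'c'], ','))     # True
print(round_trips(['a', 'b,c', 'd'], ','))   # False: 'b,c' contains the delimiter
print(round_trips(['a', 'b/', 'c'], '//'))   # False: 'b/' + '//' forms an earlier '//'
```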

> But I've found a case where they don't work that way.  If
> I join the list with the empty string as the delimiter:
> >>> myList = ['a', 'b', 'c', 'd']
> >>> myString = ''.join(myList)
> >>> print(myString)
> abcd
> That works great.  But attempting to split using the empty string
> generates an error:
> >>> myString.split('')
> Traceback (most recent call last):
>   File "<pyshell#9>", line 1, in <module>
>     myString.split('')
> ValueError: empty separator
> I know that this can be accomplished using the list function:
> >>> myString = list(myString)
> >>> print(myString)
> ['a', 'b', 'c', 'd']
> But my question is:  Is there any good reason why the split function
> should give an "empty separator" error?  I think the meaning of trying
> to split a string into a list using the empty string as a delimiter is
> unambiguous - it should just create a list of single-character
> strings like the list function does here.

One reason might be that str.split('') is not unambiguous.  For example,
there's a case to be made that there is a '' delimiter at the start and
the end of the string as well as between letters.  '' is a very special
delimiter because every string that gets joined using it includes it!
It's a wild version of ','.join(['a', 'b,c', 'd']).split(',').
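
To make that concrete, here is a small sketch showing that several
different lists all join to 'abcd' with the empty delimiter, so any of
them could be argued to be "the" result of splitting on it.  The candidate
lists are just ones I picked for illustration:

```python
candidates = [
    ['a', 'b', 'c', 'd'],          # one character per element
    ['', 'a', 'b', 'c', 'd', ''],  # also counting '' at the start and the end
    ['ab', 'cd'],                  # any other grouping joins the same way
]

# Every candidate joins back to the same string.
joined = [''.join(candidate) for candidate in candidates]
print(joined)               # ['abcd', 'abcd', 'abcd']

# str.count agrees that '' occurs at both ends as well as between letters.
print('abcd'.count(''))     # 5
```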

Of course str.split('') could be defined to work the way you expect, but
it's possible that the error is there to prompt the programmer to be
more explicit.
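
If you do want the behaviour you describe, being explicit is cheap.  A
minimal sketch, assuming you want '' to mean "one element per character";
split_chars is a hypothetical helper name, not part of str:

```python
def split_chars(text, sep):
    """Hypothetical helper: like text.split(sep), but '' splits per character."""
    if sep == '':
        # Spell out the choice that str.split refuses to guess.
        return list(text)
    return text.split(sep)


print(split_chars('abcd', ''))       # ['a', 'b', 'c', 'd']
print(split_chars('a,b,c', ','))     # ['a', 'b', 'c']
```

With this definition ''.join(split_chars(s, '')) gives back s, but the
reverse round trip still fails for lists such as ['ab', 'cd'].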