Unicode filenames


Bob van der Poel wrote:

> I have some files which came off the net with, I'm assuming, unicode
> characters in the names. I have a very short program which takes the
> filename and puts into an emacs buffer, and then lets me add information
> to that new file (it's a poor man's DB).
> 
> Next, I can look up text in the file and open the saved filename.
> Everything works great until I hit those darn unicode filenames.
> 
> Just to confuse me even more, the error seems to be coming from a bit of
> tkinter code:
>     if sresults.has_key(textAtCursor):
>         bookname = os.path.expanduser(sresults[textAtCursor].strip())
> 
> which generates
> 
>   UnicodeWarning: Unicode equal comparison failed to convert both
>   arguments to Unicode - interpreting them as being unequal
>     if sresults.has_key(textAtCursor):
> 
> I really don't understand the business about "both arguments". Not sure
> how to proceed here. Hoping for a guideline!

I cannot provoke the error with dict.has_key() over here, only with direct 
comparisons:

>>> u"a" == u"ä"
False
>>> u"a" == "ä"
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both 
arguments to Unicode - interpreting them as being unequal
False
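
As far as I can tell, the "both arguments" wording comes from Python 2 trying to convert the str operand to unicode with the default ASCII codec before comparing. When the str holds non-ASCII bytes (a character such as ä, say), that conversion fails and you get the warning. A minimal sketch of the failing step; the byte values assume a UTF-8 terminal and try_ascii_decode is just an illustrative helper name:

```python
# Assumption: the non-ASCII character arrives as these two UTF-8 bytes.
raw = b"\xc3\xa4"

def try_ascii_decode(data):
    """Return the decoded text, or None if the bytes are not pure ASCII."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None

ascii_result = try_ascii_decode(raw)   # the implicit conversion fails like this
utf8_result = raw.decode("utf-8")      # decoding with the right codec succeeds
```

So the comparison is not really "unequal"; Python just cannot turn one side into unicode without being told the encoding.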

The problem is that you are mixing strings of type str and type unicode, and 
generally speaking the remedy is to use unicode throughout. In your case
this means opening files with io.open() or codecs.open() instead of the 
builtin open(), and invoking os.listdir() with a unicode argument so that it 
returns unicode filenames.
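
A rough sketch of what that could look like. The helper names list_unicode and read_text are made up, and both the UTF-8 default and the use of the filesystem encoding for byte-string paths are assumptions you may need to adjust for your files:

```python
import io
import os
import sys

def list_unicode(directory):
    """List a directory so that the returned names are unicode."""
    if isinstance(directory, bytes):
        # Assumption: byte-string paths use the filesystem encoding.
        directory = directory.decode(sys.getfilesystemencoding())
    # A unicode argument makes os.listdir() return unicode names.
    return os.listdir(directory)

def read_text(path, encoding="utf-8"):
    """Read a whole file as unicode text; the encoding is an assumption."""
    with io.open(path, encoding=encoding) as f:
        return f.read()
```

With that in place, every name and every line of text your program handles is unicode from the start, so nothing has to be converted implicitly later on.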

I don't remember the details for Tkinter; I think it provides ASCII-only 
strings as str and everything else as unicode. If that's correct, you could 
play it safe with a conversion function:

def ensure_unicode(s):
    # Byte strings (str in Python 2) are assumed to be pure ASCII here.
    if isinstance(s, bytes):
        return s.decode("ascii")
    return s
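
Applied to the lookup from your snippet, that might look like the sketch below. The function name lookup_book and the dictionary contents are invented for illustration, and I use `in` instead of has_key(), which no longer exists in Python 3:

```python
import os

def ensure_unicode(s):
    # Same helper as above: decode ASCII-only byte strings, pass text through.
    if isinstance(s, bytes):
        return s.decode("ascii")
    return s

def lookup_book(sresults, text_at_cursor):
    """Return the expanded filename stored under text_at_cursor, or None."""
    key = ensure_unicode(text_at_cursor)
    if key in sresults:
        return os.path.expanduser(sresults[key].strip())
    return None

# Hypothetical data: keys and values are unicode throughout.
sresults = {
    u"B\xfccher": u"/tmp/B\xfccher.txt\n",
    u"plain": u"/tmp/plain.txt\n",
}
```

Because the key is normalized before the lookup, the dictionary never has to compare a str against a unicode key.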

Your other option is to live with the *warning* -- it's not an error, just a 
reminder that you have to rethink your types once you switch to Python 3.

You can also switch off the message with

python -W ignore::UnicodeWarning yourscript

or by setting the PYTHONWARNINGS environment variable to 
ignore::UnicodeWarning.
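
If you would rather not depend on how the script is started, the same filter can be installed from inside the program with the standard warnings module. A minimal sketch, to be placed near the top of the script:

```python
import warnings

# Suppress only UnicodeWarning; all other warnings are still reported.
warnings.filterwarnings("ignore", category=UnicodeWarning)
```

Keep in mind that this only hides the message; the mixed str/unicode comparison still evaluates as unequal.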