[Python-Dev] The future of the wchar_t cache
On 22Oct2018 0413, Victor Stinner wrote:
> For code like "for name in os.listdir(): open(name): ...." (replace
> listdir with scandir if you want to get file metadata), the cache is
> useless, since the fresh string has to be converted to wchar_t*
> anyway, and the cache is destroyed at the end of the loop iteration,
> whereas the cache has never been used...
Agreed, the cache is useless here. But since the listdir() result came
in as wchar_t, we could keep it that way (assuming we'd only be changing
it to char), and then no conversion would be needed when we immediately
pass it back to open().
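To make the pattern concrete, here is a rough sketch of the single-use
pass-through case. The helper name total_size is made up for
illustration, and the sketch assumes the directory holds only regular
files. It shows the access pattern only, not anything about how the
cache is implemented.

```python
import os
import tempfile


def total_size(directory):
    """Open every entry in `directory` once and sum the bytes read.

    Assumes `directory` contains only regular files.
    """
    total = 0
    for name in os.listdir(directory):
        # `path` is a fresh str that is passed to exactly one call.
        path = os.path.join(directory, name)
        with open(path, "rb") as f:
            total += len(f.read())
        # `name` and `path` are rebound on the next iteration, so
        # anything cached on the old objects is discarded unused.
    return total


with tempfile.TemporaryDirectory() as tmp:
    for name, data in (("a.txt", b"abc"), ("b.txt", b"de")):
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    print(total_size(tmp))  # prints 5
```

Each string here lives for one iteration and reaches the OS once, which
is the case where a per-string cache cannot pay for itself.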
That said, I spent some time yesterday converting the importlib cache to
use scandir, with separate caches for directories and files (to avoid
the stat calls), and it made very little overall difference. I have to
assume the string manipulation dominates. (Making DirEntry calculate its
.path lazily had a bigger impact. Also, I didn't try to make Windows
flush its own stat cache, and accessing warm files is much faster than
accessing cold ones.)
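For anyone who wants to picture the experiment, this is a minimal
sketch of the idea, not the actual importlib change. The class
DirectoryCache and its path_of method are hypothetical names. It fills
separate sets for directories and files from one scandir() pass and
joins full paths only on request.

```python
import os
import tempfile


class DirectoryCache:
    """Hypothetical cache filled by a single os.scandir() pass."""

    def __init__(self, directory):
        self.directory = directory
        self.dirs = set()
        self.files = set()
        with os.scandir(directory) as entries:
            for entry in entries:
                # is_dir() can usually answer from what the directory
                # scan already returned, so the common case needs no
                # separate stat call. Only entry.name is stored.
                if entry.is_dir():
                    self.dirs.add(entry.name)
                else:
                    self.files.add(entry.name)

    def path_of(self, name):
        """Build the full path lazily, only for names that get used."""
        return os.path.join(self.directory, name)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "pkg"))
    with open(os.path.join(tmp, "mod.py"), "w") as f:
        f.write("x = 1\n")
    cache = DirectoryCache(tmp)
    print(sorted(cache.dirs), sorted(cache.files))
```

Storing names only and joining on demand mirrors the lazy .path idea at
the Python level: entries that are never looked up never pay for the
string concatenation.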
> I'm not saying that the cache is useless. I just doubt that it's so
> common that it really provides any performance benefit.
I think it is mostly useless, but if we can transparently keep many
strings at their "native" size, that will handle many of the useful
cases, such as the single-use pass-through scenario above.