array and struct 64-bit Linux change in behavior Python 3.7 and 2.7
> On 3 Dec 2019, at 01:50, Richard Damon <Richard at Damon-Family.org> wrote:
> On 12/2/19 4:25 PM, Barry Scott wrote:
>>> On 2 Dec 2019, at 17:55, Rob Gaddi <rgaddi at highlandtechnology.invalid> wrote:
>>> On 12/2/19 9:26 AM, Chris Clark wrote:
>>>> Test case:
>>>> import array
>>>> x = array.array('L')
>>>> # x.itemsize == 8 rather than 4
>>>> This works fine (returns 4) under Windows Python 3.7.3 64-bit build.
>>>> Under Ubuntu, Python 2.7.15rc1, 3.6.5 and 3.7.0b3 (all 64-bit) return 8. Documentation at https://docs.python.org/3/library/array.html explicitly states 'L' is for size 4.
>>>> It impacts all uses of array (e.g. reading from byte strings).
>>>> The struct module is a little different:
>>>> import struct
>>>> x = struct.pack('L', 0)
>>>> # len(x) == 8 rather than 4
>>>> This can be worked around by using '=L' - which is not well documented - so this may be a doc issue.
>>>> Wanted to post here for comments before opening a bug at https://bugs.python.org/
>>>> Is anyone seeing this under Debian/Ubuntu?
>>> I'd say not a bug, at least in array. Reading that array documentation you linked, 4 is explicitly the MINIMUM size in bytes, not the guaranteed size.
>> I'm wondering how useful it is that with array you can read from a file but have no idea how many bytes each item needs.
>> If I have a file with int32_t in it I cannot from the docs know how to read that file into an array.
>>> The struct situation is, as you said, a bit different. I believe that with the default native alignment @, you're seeing 4-byte data padded to an 8-byte alignment, not 8-byte data. That does seem to go against what the struct documentation says, "Padding is only automatically added between successive structure members. No padding is added at the beginning or the end of the encoded struct."
>> The 'L' in struct is documented for 3.7 to use 4 bytes, but in fact uses 8 on Fedora 31. Doc bug?
>> Given I have exact control with b, h, i, and q but L is not fixed in size I'm not sure how it can be used with certainty across OS and versions.
> Actually, you DON'T have exact control with those sizes, it just happens
> that all the platforms you are using happen to have the same size for
> those types.
According to the docs for struct (Python 2.7 and Python 3.8) I do have exact control for the types I listed.
Or did I miss a caveat on that page?
The docs for array indeed show that you have no exact control and that is what I'm commenting on.
As others have observed, that makes array the wrong tool to read data of a fixed format.
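To make the struct side of this concrete, here is a minimal sketch (not from the thread; the sample bytes are made up for illustration). It relies on the documented behaviour that a byte-order prefix such as '=' or '<' selects standard sizes with no padding, while the default native mode follows the platform's C compiler.

```python
import struct

# Native mode ('@', the default) follows the platform's C compiler, so the
# size of 'L' varies: typically 8 on 64-bit Linux and 4 on Windows.
native_size = struct.calcsize('L')

# Standard mode ('=', '<', '>' or '!') uses fixed sizes and no padding.
standard_size = struct.calcsize('=L')

# Illustrative data: a little-endian int32_t followed by a uint32_t.
data = bytes([0xFE, 0xFF, 0xFF, 0xFF, 0x01, 0x00, 0x00, 0x00])
signed_value, unsigned_value = struct.unpack('<iI', data)

print(native_size, standard_size, signed_value, unsigned_value)
```

With an explicit prefix the same format string should decode the same bytes identically on every platform, which is the control being discussed here.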
> Welcome to the ambiguity in the C type system, the basic
> types are NOT fixed in size.
Of course, that is why int32_t etc. were added to the C standards.
> L means 'Long' and as Christian said, that
> is 8 byte long on Linux-64 bit. 'L' is exactly the right type for
> interfacing with a routine defined as taking a long. The issue is that
> you don't know what type an int32_t will be (it might be int, or it might
> be long, and long might not be 32 bits, it will be at least 32 bits).
> Perhaps array could be extended so that it took '4' for a 4 byte integer
> and '8' for an 8 byte integer (maybe 'U4' and 'U8' for unsigned). Might
> as well also allow 1 and 2 for completeness for char and short (but
> those are currently consistent).
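Until something like that exists, one possible workaround is to pick the typecode by its item size at run time. This is only a sketch: `typecode_for` is a hypothetical helper, not part of the array module, and it assumes the data is in the machine's native byte order, since array does no byte swapping on read.

```python
import array
import sys

def typecode_for(size, signed=True):
    """Return an array typecode whose itemsize is exactly `size` bytes here.

    Hypothetical helper for illustration; not part of the array module.
    """
    candidates = 'bhilq' if signed else 'BHILQ'
    for code in candidates:
        if array.array(code).itemsize == size:
            return code
    raise ValueError('no %d-byte integer typecode on this platform' % size)

# Illustrative data: two uint32_t values in native byte order.
raw = (1).to_bytes(4, sys.byteorder) + (2).to_bytes(4, sys.byteorder)

values = array.array(typecode_for(4, signed=False))
values.frombytes(raw)
print(values.typecode, list(values))
```

The typecode that comes back may differ between platforms, but the item size does not, which is the property needed for reading a fixed-format file.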
Personally I have never thought to use array.
I have used struct and ctypes extensively and they give me the
documented control I need to work with data structures and APIs.
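For the ctypes side, a minimal sketch of what that control looks like. The `Header` layout, its field names and the sample bytes are invented for illustration; the fixed-width types (`c_uint32`, `c_int32`) and `LittleEndianStructure` are standard ctypes names.

```python
import ctypes

class Header(ctypes.LittleEndianStructure):
    """Hypothetical on-disk header: a uint32_t followed by an int32_t."""
    _fields_ = [
        ('magic', ctypes.c_uint32),
        ('length', ctypes.c_int32),
    ]

# Illustrative bytes: magic = 0x12345678, length = -1, both little-endian.
raw = bytes([0x78, 0x56, 0x34, 0x12, 0xFF, 0xFF, 0xFF, 0xFF])
header = Header.from_buffer_copy(raw)

# c_long follows the platform's C long; the fixed-width types do not vary.
long_size = ctypes.sizeof(ctypes.c_long)
uint32_size = ctypes.sizeof(ctypes.c_uint32)

print(hex(header.magic), header.length, long_size, uint32_size)
```

Here `c_long` plays the same role as 'L' in native mode, while the explicitly sized types behave like the prefixed struct formats.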
> Richard Damon