
Re: bug while wrapping const char arrays without an explicitly defined size: msg#00190

programming.swig

Subject: Re: bug while wrapping const char arrays without an explicitly defined size

David Beazley wrote:

Marcelo Matus writes:
> Yes, the strings are NULL-terminated, but the null character appears on the
> Python side,
>
IMHO, the NULL character should *not* appear. It's the string
terminator, it's not part of the string.

> char hi_a []  = {'h','e','l','l','o'};     => 'hello'
> char hi_b []  = "hello";                   => 'hello\0'
> char hi_c []  = {'h','e','l','l','o', 0};  => 'hello\0'
> char hi_d [5] = {'h','e',0,'l','o'};       => 'he\0lo'
> char hi_e [6] = "he\0lo";                  => 'he\0lo\0'
>
> The last '\0' char is preserved since Python preserves all the null chars if
> you use a known size. It is done this way since people also use char arrays
> as binary data, where a '\0' char can appear in any place, like
>
>     char octect[8];
>
> Hence, it is not necessary to have a NULL ending character, and if there is
> one, it is not clear that you can ignore it.
>
If people are not using NULL-terminated strings in C, they can write
special typemaps to deal with it. In my experience, most of the cases
you've shown above are rare and/or non-existent in most programs.

> Note that if you pass back to C/C++ either 'hello' or 'hello\0', both
> will work as expected

Except that if you try to test in Python

    if a == "hello":

it won't work.
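
A minimal sketch of that failure, assuming the wrapper returns the five
characters plus the terminator, as in the hi_b example above (the value
assigned to `a` here is an assumption for illustration, not actual wrapper
output):

```python
# Assumed value handed to Python for: char hi_b[] = "hello";
a = "hello\0"

print(len(a))              # 6: the terminator counts as a character in Python
print(a == "hello")        # False
print(a[:-1] == "hello")   # True once the trailing NULL is dropped
```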

> But, anyway, I remember only a few users worrying about fully preserving
> the intra or final NULL characters in an array. So, if you prefer, we can
> always chop all the NULL characters from the end.
>
The SWIG documentation is pretty clear about the handling of strings.

"The char * datatype is handled as a NULL-terminated ASCII
string. SWIG maps this into a 8-bit character string in the target
scripting language. SWIG converts character strings in the target
language to NULL terminated strings before passing them into
C/C++. The default handling of these strings does not allow them to
have embedded NULL bytes. Therefore, the char * datatype is not
generally suitable for passing binary data. However, it is possible to
change this behavior by defining a SWIG typemap. See the chapter on
Typemaps for details about this."


And that is how the 'char *' datatype works; nothing has been changed.
That is what we call "strings", i.e., the ones that always have a NULL ending
character.

The thing is that 'char*' and 'char[ANY]' are different: in C you can always get
sizeof(char[ANY]) -> ANY, you don't need to use strlen because the size
is known, C doesn't require the char[ANY] to be NULL-terminated, etc.

And the size part was the thing the char[ANY] typemap was trying to capture, i.e.,
if you define char[20], you always get a string of size 20, whether or not it
contains a NULL char.
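
The two views of a sized array can be sketched from the Python side with the
standard ctypes module. This is only an illustration of the distinction, not
the code of the char[ANY] typemap; the buffer mirrors the hi_e example, and
the variable names are made up here:

```python
import ctypes

# A 6-byte buffer initialised like: char hi_e[6] = "he\0lo";
buf = ctypes.create_string_buffer(b"he\0lo", 6)

size = ctypes.sizeof(buf)   # array size, known without scanning for a NULL
fixed_view = buf.raw        # all 6 bytes, embedded and trailing NULLs included
string_view = buf.value     # bytes up to the first NULL, as strlen-based code sees it

print(size, fixed_view, string_view)
```

The fixed-size view keeps every byte, which is what binary data needs; the
string view stops at the first NULL, which is what 'char *' handling does.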

But I will add the code to delete the ending NULL chars by default, so that
char *, char[ANY] and char[] will look more alike.
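
The effect of chopping only the ending NULL chars can be sketched as follows.
The helper name chop_trailing_nulls is hypothetical and this is not the code
that went into SWIG; it only shows the intended result on the earlier examples:

```python
def chop_trailing_nulls(raw: bytes) -> bytes:
    """Drop NULL bytes from the end only; embedded ones are kept."""
    return raw.rstrip(b"\0")

print(chop_trailing_nulls(b"hello\0"))    # b'hello'
print(chop_trailing_nulls(b"he\0lo\0"))   # b'he\x00lo'
print(chop_trailing_nulls(b"he\0lo"))     # b'he\x00lo'
```

Note that an embedded NULL still survives under this rule, so such a value
would still differ from what a strlen-based 'char *' conversion produces.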

Should we also add something to the docs saying that
the same rules for the char * datatype will apply to char[ANY]
and char[]?

Marcelo

All C strings, regardless of specification (array, pointers,
whatever), should default to NULL-terminated. Typemaps can be used to
deal with exceptions to that rule.

-- Dave




_______________________________________________
Swig maillist - Swig@xxxxxxxxxxxxxxx
http://mailman.cs.uchicago.edu/mailman/listinfo/swig


