
python 2 to 3 converter

Chris Angelico wrote:
> On Tue, Dec 10, 2019 at 12:15 PM songbird <songbird at anthive.com> wrote:
>> Chris Angelico wrote:
>> ...
>> >
>> > Here's an example piece of code.
>> >
>> > sock = socket.socket(...)
>> > name = input("Enter your username: ")
>> > code = input("Enter the base64 code: ")
>> > code = base64.b64decode(code)
>> > sock.write("""GET /foo HTTP/1.0
>> > Authentication: Demo %s/%s
>> >
>> > """ % (name, code))
>> > match = re.search(r"#[A-Za-z0-9]+#", sock.read())
>> > if match: print("Response: " + match.group(0))
>> >
>> > Your challenge: Figure out which of those strings should be a byte
>> > string and which should be text. Or alternatively, prove that this is
>> > a hard problem. There are only a finite number of types - two, to be
>> > precise - so by your argument, this should be straightforward, right?
>>   this isn't a process of looking at isolated code.  this
>> is a process of looking at the code, but also the test cases
>> or working examples.  so the inputs are known and the code
>> itself gives clues about what it is expecting.
> Okay. The test cases are also written in Python, and they use
> unadorned string literals to provide mock values for input() and the
> socket response. Now what?

  wouldn't there be clues in how that string is used in
the program itself (either calls to converters or when
the literal is assigned to some variable or used in a
print statement)?
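
One way to picture looking for such clues is a small scan of the syntax tree for calls that hint at a type. This is only a sketch: the hint tables and the collect_hints name are made up for illustration, and a real converter would need far more than this.

```python
import ast

# Hypothetical hint tables: call names whose Python 3 result type is known.
BYTES_HINTS = {"b64decode", "encode"}
TEXT_HINTS = {"input", "decode"}

def collect_hints(source):
    """Return sorted (call_name, guessed_type) pairs found in the source."""
    hints = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Method calls carry the name in .attr, plain calls in .id.
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name in BYTES_HINTS:
            hints.append((name, "bytes"))
        elif name in TEXT_HINTS:
            hints.append((name, "text"))
    return sorted(hints)

SAMPLE = 'code = input("Enter the base64 code: ")\ncode = base64.b64decode(code)\n'
hints = collect_hints(SAMPLE)
```

The scan only says something about values that pass through a known call; a bare literal that never meets a converter gets no hint at all.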

> What if the test cases are entirely ASCII characters?

  it all goes utf in that case and the string is not 

> What if the test cases are NOT entirely ASCII characters?

  if the program has more than one language then you may
have to see what the character set falls into.  is it hex,
is it octal or binary or some language?  i'd guess there
will be clues in the code as to how that string is used.

>>   regular expressions can be matched in finite time as well
>> as a fixed length text of any type can be scanned as a match
>> or rejected.
>>   if you examined a thousand uses of match and found the
>> pattern used above and then examined what those programs did
>> with that match what would you select as the first type, the
>> one used the most first, if that doesn't work go with the 2nd,
>> etc.
> That's not really the point. Are your regular expressions working with
> text or bytes? Does your socket return text or bytes?

  clues in the program again.  you're not limited to looking
only at the string itself, but the context of the entire
program.  i'm sure patterns are there to be found; if you
can scan enough programs they'll start showing up.  once
you've found a viable pattern then you have a way to
generate a test case to see if it works or not.
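
To make the text-or-bytes question concrete, here is a minimal sketch of how Python 3's re module treats the pattern from the example. The response data and the try_search helper are invented for illustration.

```python
import re

PATTERN_TEXT = r"#[A-Za-z0-9]+#"
PATTERN_BYTES = rb"#[A-Za-z0-9]+#"

def try_search(pattern, data):
    """Return the matched token, None if no match, or 'TypeError' on a type mix."""
    try:
        match = re.search(pattern, data)
    except TypeError:
        # Python 3 refuses to mix a str pattern with bytes data, and vice versa.
        return "TypeError"
    return match.group(0) if match else None

# Invented sample response; a real socket would hand back bytes.
response_bytes = b"HTTP/1.0 200 OK\r\n\r\n#abc123#"
response_text = response_bytes.decode("ascii")

results = {
    "text/text": try_search(PATTERN_TEXT, response_text),
    "bytes/bytes": try_search(PATTERN_BYTES, response_bytes),
    "text/bytes": try_search(PATTERN_TEXT, response_bytes),
    "bytes/text": try_search(PATTERN_BYTES, response_text),
}
```

Both matching combinations work, so the pattern by itself does not settle which one the programmer meant; only the mixed combinations fail when run.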

> I've deliberately chosen these examples because they are hard. And I
> didn't even get into an extremely hard problem, with the inclusion of
> text inside binary data inside of text inside of bytes. (It does
> happen.)
> These problems are fundamentally hard because there is insufficient
> information in the source code alone to determine the programmer's
> intent.

  that is why we would be running the program itself and
examining test case results.

  none of these programs run in isolation; information is
known about what they expect as input or produce as output.
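
As a rough sketch of what running the code and checking the result can show, here is the request-building step from the example under Python 3. The function names, the "alice" username, and the "c2VjcmV0" code are assumptions made up for this illustration, as is the choice of ASCII for decoding.

```python
import base64

TEMPLATE = "GET /foo HTTP/1.0\nAuthentication: Demo %s/%s\n\n"

def build_request_naive(name, code):
    """Straight port: the decoded code is bytes and gets formatted via repr."""
    decoded = base64.b64decode(code)
    return TEMPLATE % (name, decoded)

def build_request_explicit(name, code):
    """Assumes the decoded code is ASCII text; that is a guess about intent."""
    decoded = base64.b64decode(code).decode("ascii")
    return TEMPLATE % (name, decoded)

naive = build_request_naive("alice", "c2VjcmV0")
explicit = build_request_explicit("alice", "c2VjcmV0")
```

A test that compares the request against a known-good output would catch the naive version, since it embeds b'secret' rather than secret. Whether ASCII is the right decoding is still something the test data has to tell you.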