[Python-Dev] Status of PEP 3145 - Asynchronous I/O for subprocess.popen


*This* is the type of conversation that I wanted to avoid. But I'll answer
your questions because I used to do exactly the same thing.

On Fri, Mar 28, 2014 at 3:20 AM, Victor Stinner <victor.stinner at gmail.com> wrote:

> 2014-03-28 2:16 GMT+01:00 Josiah Carlson <josiah.carlson at gmail.com>:
> > def do_login(...):
> >     proc = subprocess.Popen(...)
> >     current = proc.recv(timeout=5)
> >     last_line = current.rstrip().rpartition('\n')[-1]
> >     if last_line.endswith('login:'):
> >         proc.send(username)
> >         if proc.readline(timeout=5).rstrip().endswith('password:'):
> >             proc.send(password)
> >             if 'welcome' in proc.recv(timeout=5).lower():
> >                 return proc
> >     proc.kill()
>
> I don't understand this example. How is it "asynchronous"? It looks
> like blocking calls. In my definition, asynchronous means that you can
> call this function twice on two processes, and they will run in
> parallel.
>

In this context, async means not necessarily blocking. If you didn't
provide a timeout, it would default to 0, which would return immediately
with what was sent and/or received from the subprocess. If you don't
believe me, that's fine, but it prevents meaningful discussion.
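
For illustration only (this is not the recipe or the patch), the timeout
semantics described above can be sketched on POSIX with select(). The
function name recv and the parameter maxsize are assumptions made for this
sketch, and it is POSIX-only because select() on Windows works only with
sockets, not pipes. A timeout of 0 polls and returns immediately.

```python
import os
import select

def recv(fd, timeout=0, maxsize=1024):
    """Read up to maxsize bytes from fd without blocking longer than timeout.

    Returns None if nothing became readable in time, and b"" at EOF.
    """
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return None
    return os.read(fd, maxsize)

if __name__ == "__main__":
    r, w = os.pipe()
    print(recv(r))               # nothing written yet: returns immediately
    os.write(w, b"login: ")
    print(recv(r, timeout=5))    # data is already waiting, so no real wait
    os.close(r)
    os.close(w)
```

The same function could be pointed at proc.stdout.fileno() of a Popen
object created with stdout=subprocess.PIPE.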

> Using greenlet/eventlet, you can write code which looks blocking, but
> runs asynchronously. But I don't think that you are using greenlet or
> eventlet here.
>

You are right. And you are talking about something that is completely out
of scope.


> I took a look at the implementation:
> http://code.google.com/p/subprocdev/source/browse/subprocess.py
>
> It doesn't look portable. On Windows, WriteFile() is used. This
> function is blocking, or I missed something huge :-) It's much better
> if a PEP is portable. Adding time.monotonic() only to Linux would make
> the PEP 418 much shorter (4 sentences instead of 10 pages? :-))!
>

Of course it's not portable. Windows does things differently from other
platforms. That's one of the reasons why early versions required pywin32.
Before you reply to another message, I would encourage you to read the bug,
the pep, and perhaps the recipe I just posted: http://pastebin.com/0LpyQtU5

Or you can try to believe that I have done all of those and believe what I
say, especially when I say that I don't believe that spending a lot of time
worrying about the original patch/recipe and the GSoC entry is worthwhile.
They would all require a lot of work to make reasonably sane, which is why
I wrote the minimal recipe above.

> The implementation doesn't look reliable:
>
>   def get_conn_maxsize(self, which, maxsize):
>     # Not 100% certain if I get how this works yet.
>     if maxsize is None:
>       maxsize = 1024
>     ...
>
> This constant 1024 looks arbitrary. On UNIX, a write into a pipe may
> block with fewer bytes (512 bytes).
>

Testing now I seem to be able to send non-reading subprocesses somewhat
arbitrary amounts of data without leading to a block. But I can't test all
Linux installations or verify that I'm correct. But whether or not this
makes sense is moot, as I don't think it should be merged, and I don't
believe anyone thinks it should be merged at this point.
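
For anyone who wants to check this on their own system, here is a rough
sketch (POSIX only, not taken from the patch) that counts how many bytes a
pipe accepts before a non-blocking write would block. The function name
pipe_capacity and the 512-byte chunk size are assumptions for the sketch;
the number it reports depends on the platform.

```python
import os

def pipe_capacity(chunk=512):
    """Count the bytes an unread pipe accepts before a write would block."""
    r, w = os.pipe()
    os.set_blocking(w, False)    # writes now raise instead of blocking
    total = 0
    try:
        while True:
            try:
                total += os.write(w, b"x" * chunk)
            except BlockingIOError:
                # EAGAIN/EWOULDBLOCK: the kernel buffer is full.
                return total
    finally:
        os.close(r)
        os.close(w)

if __name__ == "__main__":
    print("pipe accepted", pipe_capacity(), "bytes before blocking")
```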

> asyncio has a completely different design. On Windows, it uses
> overlapped operations with an IOCP event loop. Such operations can be
> cancelled. Windows takes care of the buffering. On UNIX, non-blocking mode
> is used with select() (or something faster like epoll) and asyncio
> retries to write more data when the pipe (or any file descriptor used
> for process stdin/stdout/stderr) becomes ready (for reading/writing).
>
> asyncio design is more reliable and portable.
>

More reliable, sure. More portable... only because all of the portability
heavy lifting has been done and included in Python core. That's one other
thing that you aren't understanding - the purpose of trying to have this in
the standard library is so that people can use the functionality (async
subprocesses) on multiple platforms without needing to write it themselves
(poorly), ask on forums of one kind or another, copy and paste from some
recipe posted to the internet, etc. It's a strict increase in the
functionality and usefulness of the Python standard library and has
literally zero backwards compatibility issues.

This is the absolute minimum functionality necessary to make people who
need this functionality happy. No, really. Absolute minimum. Sort of what
asyncore was - the minimum functionality necessary to have async sockets in
Python. Was it dirty? Sure. Was it difficult to use? Some people had
issues. Did it work? It worked well enough that people were making money
building applications based on asyncore (myself included 10 years ago).

> I don't see how you can implement asynchronous communication with a
> subprocess without the complex machinery of an event loop.
>

Words can have multiple meanings. The meaning of "async" in this context is
different from what you believe it to mean, which is part of your
confusion. I tried to address this in my last message, but either you
didn't read that part, didn't understand that part, or don't believe what I
wrote. So let me write it again:

In this context, "async subprocesses" means the ability to interactively
interrogate a subprocess without necessarily blocking on input or output.
Everyone posting questions about this on StackOverflow or other forums
understands it this way. It *does not mean* that it needs to participate in
an event loop, needs to be usable with asyncore, asyncio, Twisted,
greenlets, gevent, or otherwise.

If there is one thing that *I* need for you (and everyone else) to
understand and believe in this conversation, it is the above. Do you? Yes?
Okay. Now read everything that I've written again. No? Can you explain
*why* you don't believe or understand me?


> > The API above can be very awkward (as shown :P ), but that's okay. From
> > those building blocks a (minimally) enterprising user would add
> > functionality to suit their needs. The existing subprocess module only
> > offers two methods for *any* amount of communication over pipes with the
> > subprocess: check_output() and communicate(), only the latter of which
> > supports sending data (once, limited by system-level pipe buffer
> > lengths).
>
> As I wrote, it's complex to handle non-blocking file descriptors. You
> have to catch EWOULDBLOCK and retry later when the file descriptor
> becomes ready. The main thread has to watch for such events on the file
> descriptor, or you need a dedicated thread. By the way,
> subprocess.communicate() is currently implemented using threads on
> Windows.
>

I know what it takes, I've been writing async sockets for 12 years. I used
to maintain asyncore/asynchat and related libraries. Actually, you can
thank me for asyncore existing in Python 2.6+ (Giampaolo has done a great
job and kept asyncore alive after I stopped participating daily on python-dev
about 5 years ago, and I can't thank him enough for that).

But to the point: stop bagging on the old patches. No one likes them. We
all agree. The question is where do we go from here.

> > Neither allow for nontrivial interactions from a single subprocess.Popen()
> > invocation. The purpose was to be able to communicate in a bidirectional
> > manner with a subprocess without blocking, or practically speaking,
> > blocking with a timeout. That's where the "async" term comes from.
>
> I call this "non-blocking functions", not "async functions".
>
> It's quite simple to check if a read will block or not on UNIX. It's
> more complex to implement it on Windows. And even more complex to
> add a buffer to write().
>

Okay, call it non-blocking subprocess reads and writes. Whatever you want
to call it. And yes, I know what it takes to read and write on Windows...
I've done it 3 times now (the original recipe, the original patch, now the
above recipe).
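
One portable way to get a read with a timeout, which is not how the recipe
above does it, is a dedicated reader thread feeding a queue. A minimal
sketch, assuming line-oriented output; start_reader and read_line are
made-up names for the illustration:

```python
import queue
import subprocess
import sys
import threading

def start_reader(stream):
    """Pump lines from a blocking stream into a queue on a daemon thread."""
    lines = queue.Queue()

    def pump():
        for line in iter(stream.readline, b""):
            lines.put(line)
        lines.put(None)  # sentinel: the stream reached EOF

    threading.Thread(target=pump, daemon=True).start()
    return lines

def read_line(lines, timeout=0):
    """Return one line, None at EOF, or b"" if nothing arrived in time."""
    try:
        if timeout:
            return lines.get(timeout=timeout)
        return lines.get_nowait()
    except queue.Empty:
        return b""

if __name__ == "__main__":
    proc = subprocess.Popen(
        [sys.executable, "-c", "print('hello')"], stdout=subprocess.PIPE
    )
    lines = start_reader(proc.stdout)
    print(read_line(lines, timeout=10))
    proc.wait()
```

The cost is one thread per pipe, but the caller never blocks longer than
the timeout it asked for, on any platform.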

But the other piece is that *this* doesn't necessarily need to be 100%
robust - I'm not even advocating it to be in the Python standard library
anywhere! I've given up on that. But a short example hanging out in the
docs? Someone will use it. Someone will run into issues. They will add
robustness. They will add functionality. And it will grow into something
worth using before being posted to the cheeseshop.

The status quo is that people don't get answers anywhere in the Python docs
or the Python stdlib. Python core is noticeably absent as a source of
information about how someone would go about using the subprocess module in
a completely reasonable and sane manner.

> > Your next questions will be: But why bother at all? Why not just build the
> > piece you need *inside* asyncio? Why does this need anything more? The
> > answer to those questions are wants and needs. If I'm a user that needs
> > interactive subprocess handling, I want to be able to do something like
> > the code snippet above. The last thing I need is to have to rewrite the
> > way my application/script/whatever handles *everything* just because a new
> > asynchronous IO library has been included in the Python standard library -
> > it's a bit like selling you a $300 bicycle when you need a $20 wheel for
> > your scooter.
>
> You don't have to rewrite your whole application. If you only want to
> use asyncio event loop in a single function, you can use
> loop.run_until_complete(do_login) which blocks until the function
> completes. The "function" is an asynchronous coroutine in fact.
>

The point of this conversation is that I was offering to write the handful
of wrappers that would make interactions of the form that I showed earlier
easy and possible with asyncio. So that a user didn't have to write them
themselves.
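
To make that concrete, here is a rough sketch of the kind of wrapper I
mean, written with the async/await syntax of later Python versions rather
than "yield from". It is an illustration under assumptions, not the
wrappers themselves: CHILD is a made-up stand-in for an interactive
program, and do_login / do_login_blocking are hypothetical names. The
caller sees an ordinary blocking function with a timeout on each step.

```python
import asyncio
import sys

# Hypothetical child program: print a prompt, read one line, greet the user.
CHILD = (
    "import sys\n"
    "print('login:', flush=True)\n"
    "name = sys.stdin.readline().strip()\n"
    "print('welcome ' + name, flush=True)\n"
)

async def do_login(username, timeout=5.0):
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    try:
        # Each read gets its own timeout, as in the snippet quoted earlier.
        prompt = await asyncio.wait_for(proc.stdout.readline(), timeout)
        if not prompt.rstrip().endswith(b"login:"):
            return None
        proc.stdin.write(username.encode() + b"\n")
        await proc.stdin.drain()
        reply = await asyncio.wait_for(proc.stdout.readline(), timeout)
        return reply.decode().strip()
    finally:
        # Always clean up the child, whether or not the exchange succeeded.
        if proc.returncode is None:
            try:
                proc.kill()
            except ProcessLookupError:
                pass
        await proc.wait()

def do_login_blocking(username, timeout=5.0):
    """Blocking entry point: the event loop lives only inside this call."""
    return asyncio.run(do_login(username, timeout))

if __name__ == "__main__":
    print(do_login_blocking("alice"))
```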

[snip]

> Even if eval_python_async() is asynchronous, eval_python() function is
> blocking so you can write: print("1+1 = %r" % eval_python("1+1"))
> without callback nor "yield from".
>
> Running tasks in parallel is faster than running them in sequence
> (almost 5 times faster on my PC).
>

This is completely unrelated to the conversation.

> The syntax in eval_python_async() is close to the API you proposed,
> except that you have to add "yield from" in front of "blocking"
> functions like read() or drain() (it's the function to flush the stdin
> buffer, I'm not sure that it is needed in this example).
>
> The timeout is on the whole eval_python_async(), but you can also
> use a finer timeout on each read/write.
>
> > But here's the thing: I can build enough using asyncio in 30-40 lines of
> > Python to offer something like the above API. The problem is that it
> > really has no natural home.
>
> I agree that writing explicit asynchronous code is more complex than
> using eventlet. Asynchronous programming is hard.
>

No, it's not hard. It just requires thinking in a different way. It's the
thinking in a different way that's difficult. But I've been doing async
sockets programming on and off for 13 years now, so I get it. What I'm
offering is to help people *not* do that, because some people have
difficulty thinking in that way.

> > But in the docs? It would show an atypical, but not
> > wholly unreasonable use of asyncio (the existing example already shows
> > what I would consider to be an atypical use of asyncio).
>
> The asyncio documentation is still a work-in-progress. I tried to
> document all APIs, but there are too few examples and the
> documentation is still focused on the API instead of being oriented to
> the user of the API.
>
> Don't hesitate to contribute to the documentation!
>

So is this the "okay" that I've been waiting with bated breath for?

 - Josiah

> We can probably write a simple example showing how to interact with an
> interactive program like Python.
>
> Victor
>