osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Python-Dev] A more flexible task creation


> To be honest, I see "async with" being abused everywhere in asyncio,
> lately.? I like to have objects with start() and stop() methods, but
> everywhere I see async context managers.>
> Fine, add nursery or whatever, but please also have a simple start() /
> stop() public API.
> 
> "async with" is only good for functional programming.? If you want to go
> more of an object-oriented style, you tend to have start() and stop()
> methods in your classes, which will call start() & stop() (or close())
> methods recursively on nested resources.? So of the libraries (aiopg,
> I'm looking at you) don't support start/stop or open/close well.

Wouldn't calling __enter__ and __exit__ manually works for you ? I
started coding begin() and stop(), but I removed them, as I couldn't
find a use case for them.

And what exactly is the use case that doesn't work with `async with` ?
The whole point is to spot the boundaries of the tasks execution easily.
If you start()/stop() randomly, it kinda defeat the purpose.

It's a genuine question though. I can totally accept I overlooked a
valid use case.


> 
> I tend to slightly agree, but OTOH if asyncio had been designed to not
> schedule tasks automatically on __init__ I bet there would have been
> other users complaining that "why didn't task XX run?", or "why do tasks
> need a start() method, that is clunky!".? You can't please everyone...

Well, ensure_future([schedule_immediatly=True]) and
asyncio.create_task([schedule_immediatly=True] would take care of that.
They are the entry point for the task creation and scheduling.

> 
> Also, in
> ?? ? ? ? ? ? task_list = run.all(foo(), foo(), foo())
> 
> As soon as you call?foo(),?you are instantiating a coroutine, which
> consumes memory, while the task may not even be scheduled for a long
> time (if you have 5000 potential tasks but only execute 10 at a time,
> for example).

Yes but this has the benefit of accepting any awaitable, not just
coroutine. You don't have to wonder what to pass, or which form. It's
always the same. Too many APi are hard to understand because you never
know if it accept a callback, a coroutine function, a coroutine, a task,
a future...

For the same reason, request.get() create and destroys a session every
time. It's inefficient, but way easier to understand, and fits the
majority of the use cases.

> 
> But if you do as Yuri suggested, you'll instead accept a function
> reference, foo, which is a singleton, you can have many foo references
> to the function, but they will only create coroutine objects when the
> task is actually about to be scheduled, so it's more efficient in terms
> of memory.

I made some test, and the memory consumption is indeed radically smaller
if you just store references if you just compare it to the same unique
raw coroutine.

However, this is a rare case. It assumes that:

- you have a lot of tasks
- you have a max concurrency
- the max concurrency is very small
- most tasks reuse a similar combination of callables and parameters

It's a very specific narrow case. Also, everything you store on the
scope will be wrapped into a Future object no matter if it's scheduled
or not, so that you can cancel it later. So the scale of the memory
consumption is not as much.

I didn't want to compromise the quality of the current API for the
general case for an edge case optimization.

On the other hand, this is a low hanging fruit and on plateforms such as
raspi where asyncio has a lot to offer, it can make a big difference to
shave up 20 of memory consumption of a specific workload.

So I listened and implemented an escape hatch:

import random
import asyncio

import ayo

async def zzz(seconds):
    await asyncio.sleep(seconds)
    print(f'Slept for {seconds} seconds')


@ayo.run_as_main()
async def main(run_in_top):

    async with ayo.scope(max_concurrency=10) as run:
        for _ in range(10000):
            run.from_callable(zzz, 0.005) # or run.asap(zzz(0.005))

This would only lazily create the awaitable (here the coroutine) on
scheduling. I see a 15% of memory saving for the WHOLE program if using
`from_callable()`.

So definitly a good feature to have, thank you.

But again, and I hope Yuri is reading this because he will implement
that for uvloop, and this will trickles down to asyncio, I think we
should not compromise the main API for this.

asyncio is hard enough to grok, and too many concepts fly around. The
average Python programmer has been experienced way easier things from
past Python encounter.

If we want, one day, that asyncio is consider the clean AND easy way to
do async, we need to work on the API.

asyncio.run() is a step in the right direction (although again I wish we
implemented that 2 years ago when I talked about it instead of telling
me no).

Now if we add nurseries, it should hide the rest of the complexity. Not
add to it.