Subject: Matthew Quinney is out of the office. - msg#00164
List: linux.rpm.yum
I will be out of the office starting 13/10/2003 and will not return until
20/10/2003.
I will respond to your message when I return.
Thread at a glance:
Previous Message by Date:
Re: The Future of urlgrabber
On Mon, 2003-10-13 at 09:58, Michael Stenner wrote:
> > Also, it's the same as arguing that you should just suck copies of
> > every library into every application or at the minimum that you
> > should always do static linking.
>
> Slow down there. Every library into every application? Saying you
> should write _a_ program in python is not the same as saying you
> should write EVERY program in python. By no means would a decision to
> write urlgrabber for "slurping" imply that all libraries should be
> written that way.
Yes, I'm somewhat sensationalist -- but being reasonable never seems to
get results ;) Also, everything here is generalizations; there are
cases where what I describe doesn't happen, but they're the exception
rather than the rule. And counting on being the exception tends to be a
losing strategy in my experience.
Unfortunately, it's pretty clear from my watching things that when you
start sucking libraries (which is what urlgrabber is, really), things
are ugly in almost all cases. eg, problems with libegg getting out of
date in various GNOME apps using it and thus causing problems, having to
do upgrades of six programs because of an exploit in one library
(*cough*zlib*cough*), etc. It also tends to encourage a lack of API
stability which is just as painful for users of your library when they
want to upgrade for bugfixes :/
Plus, sucking in copies tends to lead to forks because "well, I've got
the code here, why not make this little change that makes my life
easier". It's far harder to do that when the library is external.
> We now have three alternatives on the table:
> 1) simple external
> + very tidy
> - major constraints on the code (preserve backward compat, etc)
> + apps get automatic bugfixes with urlgrabber upgrade
Well, you obviously start with this in any case.
> 2) parallel external
> - not so tidy
> + fewer constraints (change major num when BC breaks)
> + apps get automatic bugfixes with urlgrabber upgrade
You only do this when you have to. It's not the sort of thing that
should be happening often. It should be planned well in advance and
gone into knowing that you're breaking compatibility against all wants,
hopes and desires :)
> 3) slurped internal
> + very tidy
> + no constraints (apps slurp whenever they want/can)
> - no automatic bugfixes - must slurp new version
See above :)
Cheers,
Jeremy
Next Message by Date:
Re: The Future of urlgrabber
On Mon, Oct 13, 2003 at 03:25:19PM -0400, Jeremy Katz wrote:
> Unfortunately, it's pretty clear from my watching things that when you
> start sucking libraries (which is what urlgrabber is, really), things
> are ugly in almost all cases. eg, problems with libegg getting out of
> date in various GNOME apps using it and thus causing problems, having to
> do upgrades of six programs because of an exploit in one library
> (*cough*zlib*cough*), etc. It also tends to encourage a lack of API
> stability which is just as painful for users of your library when they
> want to upgrade for bugfixes :/
>
> Plus, sucking in copies tends to lead to forks because "well, I've got
> the code here, why not make this little change that makes my life
> easier". It's far harder to do that when the library is external.
OK, those are all excellent points. I'm 99% convinced.
> > 2) parallel external
> > - not so tidy
> > + fewer constraints (change major num when BC breaks)
> > + apps get automatic bugfixes with urlgrabber upgrade
>
> You only do this when you have to. It's not the sort of thing that
> should be happening often. It should be planned well in advance and
> gone into knowing that you're breaking compatibility against all wants,
> hopes and desires :)
Just a curiosity. How would this work for a python module? I'm
thinking that urlgrabber will take on a structure like this:
urlgrabber/__init__.py
urlgrabber/<main-file-formerly-"urlgrabber.py">
urlgrabber/keepalive.py
urlgrabber/progress_meter.py
And then if the parallel external route gets taken, it would be done as:
urlgrabber2/__init__.py
...
Is that what you would have in mind?
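For illustration, an application could cope with such a parallel install by trying the newer package name first and falling back to the old one. This is only a sketch: import_first_available is a hypothetical helper, and urlgrabber2 is just the example name from the layout above, not a package that is known to exist.

```python
import importlib


def import_first_available(names):
    """Return the first importable module from names, tried in order."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This name is not installed; try the next candidate.
            continue
    raise ImportError("none of these modules could be imported: %r" % (names,))


# Hypothetical usage: prefer the parallel-installed package, else the old one.
# grabber = import_first_available(["urlgrabber2", "urlgrabber"])
```

The application then codes against whichever interface it got, so the fallback only helps if the two versions agree on the calls it uses.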
-Michael
--
Michael Stenner Office Phone: 919-660-2513
Duke University, Dept. of Physics mstenner@xxxxxxxxxxxx
Box 90305, Durham N.C. 27708-0305
Previous Message by Thread:
gpgcheck
When doing 'yum update' yum downloads all the required packages - and
then does the GPG check. If it fails - it gives an error (with the
package name for the failed check) and aborts.
Can this behavior be changed - so that it does the GPG check for all
packages - and gives the complete list of packages that the check
failed on - before aborting?
thanks,
Satish
Next Message by Thread:
new urlgrabber design
Again, if you don't know what urlgrabber is, you don't need to read
this. I am actively requesting input from Jeremy, Seth, and Icon. I
would love input from others as well (Ryan?), but these are the
ones that will get the beatings.
Here is the basic design that I have in mind. This (intentionally)
has no mention of internal workings. It only discusses things that
matter to someone that would USE the module. Internal design is
certainly open for discussion, but I only want to talk about it now to
the extent that it affects interface.
-Michael
=======================================================================
MAIN FUNCTIONS:
urlgrab -- Fetch a url and make a local copy. Return the filename
urlopen -- Return a file object for the specified url.
urlread -- Read the specified file into a string and return it.
retrygrab -- Wrapper for urlgrab that retries given certain errors.
retryopen -- Wrapper for urlopen that retries given certain errors.
retryread -- Wrapper for urlread that retries given certain errors.
NOTE: retryopen can't protect you from errors that occur AFTER the
connection is made. It can only retry setting up the connection.
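A rough sketch of what such a retry wrapper might look like follows. It is not the actual urlgrabber code: GrabError and retry are hypothetical names, and the default numtries and retrycodes values are simply copied from the retrygrab signature shown under Option 1. Note that only the call to func() is protected, which is why a wrapper around urlopen cannot help once the file object has been returned.

```python
class GrabError(Exception):
    """Stand-in error type carrying a numeric code (hypothetical)."""

    def __init__(self, code, message=""):
        super().__init__(code, message)
        self.code = code


def retry(func, numtries=3, retrycodes=(-1, 2, 4, 5, 6, 7)):
    """Call func() up to numtries times, retrying only on listed error codes."""
    for attempt in range(1, numtries + 1):
        try:
            return func()
        except GrabError as err:
            # Give up on non-retryable codes or when attempts are exhausted.
            if err.code not in retrycodes or attempt == numtries:
                raise
```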
FEATURES:
* identical behavior for http, ftp, and file
Options that change the behavior for one protocol (like
copy_local) are OK as long as they don't affect the other
protocols. However, something like byte-ranges MUST work for
all protocols. These are different because byte-ranges CHANGE
the return value for a given input. copy_local only modifies
the internal behavior.
All options must be syntactically legal for ALL urls. The whole
point is to have the library not care what sort of url is passed
in.
* smart url interpretation
- handle "normal local filenames" also
- handle url-encoded username/password for ftp and http (and file? smb?)
* byte ranges
* reget support
- internally supported via byte ranges
- several reget modes
+ never: always start from the beginning
+ force: always pick up from the end of the local file
+ smart: check timestamps, length, etc.
* throttling
* progress meter
* i18n support (if the calling application provides translations)
* settable User-Agent
* http keepalive (via the keepalive module)
* timestamp preservation
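To make the three reget modes listed above concrete, here is a minimal sketch of how a resume offset could be chosen. The function name reget_offset, its parameters, and especially the "smart" policy are assumptions made for illustration; the design above only says that smart mode checks "timestamps, length, etc."

```python
def reget_offset(mode, local_size, local_mtime=None,
                 remote_size=None, remote_mtime=None):
    """Return the byte offset at which a download should resume.

    local_size is the size of the existing local file, or None if absent.
    """
    if mode == "never" or local_size is None:
        return 0
    if mode == "force":
        return local_size
    if mode == "smart":
        # Assumed policy: resume only if the local copy looks like a
        # partial download of the current remote file.
        if remote_size is None or remote_mtime is None or local_mtime is None:
            return 0
        if local_mtime >= remote_mtime and local_size <= remote_size:
            return local_size
        return 0
    raise ValueError("unknown reget mode: %r" % (mode,))
```

A nonzero result would then become the start of a byte range, which is how reget can be built on top of byte-range support.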
INTERFACE:
I'm considering changing the function interface a little. There is
getting to be an insane number of options, and I'm not sure how
to deal with it. There is also the issue of passing options through
retry*.
Option 1 (the way it is now, everything is a kwarg)
def urlgrab(url, filename=None, copy_local=0, close_connection=0,
            progress_obj=None, throttle=None, bandwidth=None):
def retrygrab(url, filename=None, copy_local=0, close_connection=0,
              progress_obj=None, throttle=None, bandwidth=None,
              numtries=3, retrycodes=[-1,2,4,5,6,7], checkfunc=None):
This is REALLY ugly and it makes it very hard to cleanly add
options. Specifically, what if someone does:
retrygrab(url, fn, 1, 0, None, None, None, 5) # the last is numtries
and then we later add more options to urlgrab? Sure, it's not
likely, and sure, I put a warning to only use these as kwargs in
the doc, but still. It's very icky. However, it is very clear
and very normal.
Option 2
def urlgrab(url, filename=None, **kwargs):
def retrygrab(url, filename=None, **kwargs):
retrygrab could then strip out the options it cares about and pass
on the rest. This makes the function definition very clean, but
completely useless to look at. The legal args would have to go in
the docs. One of the up-sides is that things could ONLY be called as
keyword args so the ordering is irrelevant.
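As a sketch of the "strip out the options it cares about" step, something like the following could work. RETRY_DEFAULTS and split_retry_options are hypothetical names; the three retry option names and their defaults are taken from the Option 1 signature above.

```python
# Retry-related options and their defaults (taken from the Option 1 signature).
RETRY_DEFAULTS = {
    "numtries": 3,
    "retrycodes": (-1, 2, 4, 5, 6, 7),
    "checkfunc": None,
}


def split_retry_options(kwargs):
    """Split kwargs into (retry_opts, grab_opts) without modifying the input."""
    grab_opts = dict(kwargs)
    retry_opts = {key: grab_opts.pop(key, default)
                  for key, default in RETRY_DEFAULTS.items()}
    return retry_opts, grab_opts


# Hypothetical use inside retrygrab(url, filename=None, **kwargs):
#     retry_opts, grab_opts = split_retry_options(kwargs)
#     ... call urlgrab(url, filename, **grab_opts) under the retry policy ...
```

One consequence of this approach is that a misspelled retry option is silently passed on to urlgrab, so urlgrab would need to reject unknown keywords for typos to be caught.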
Option 3
def urlgrab(url, filename=None, options=None):
def retrygrab(url, filename=None, options=None):
Same as 2, but instead of calling as:
urlgrab(url, copy_local=1)
it must be
urlgrab(url, options={'copy_local':1})
I don't really like this option. It's just a step on the way to
the next one :)
Option 4
def urlgrab(url, filename=None, options=None):
def retrygrab(url, filename=None, options=None, retry_options=None):
Here, the options arg to retrygrab would get passed through
untouched, and retry_options would be ONLY for options related to
the retry process.
I'm open to other ideas... If I had to pick now, I'd probably go
with (2), but I'm still quite open.
STRUCTURE:
Because urlgrabber already consists of at least two files
(urlgrabber.py and keepalive.py), I'm thinking of making it a
"package" (directory with sub-modules inside). One might argue that
this is the only sane way to go if it's going to be a tidy library.
This will also make life much easier if we need to do "parallel
installs" farther down the road.
Then again, maybe keepalive.py and progress_meter.py should be
separate!
--
Michael Stenner Office Phone: 919-660-2513
Duke University, Dept. of Physics mstenner@xxxxxxxxxxxx
Box 90305, Durham N.C. 27708-0305