Subject: Re: issues with buildbot - msg#00019
List: python.buildbot.devel
> I'm in the process of setting up a new buildbot system for a large
> project. It's a great tool and I've got it mostly running now, but I
> thought I'd give you some feedback on the few issues I've noticed.

Hey, welcome!

> 1. Contrary to what I would have thought, the 'Force Build' doesn't
> replace a build that is pending (due to changes with a stable timer), it
> instead starts a build of HEAD independently - and then the build of
> the changed revision (usually the same) will begin immediately after.

Right, the "Force" button is really about "force a brand new build" rather than "accelerate the currently pending build and skip its tree-stable timer". It is most commonly used when something has changed out-of-band: you've just added a new Builder and want to see if it works even though no source code changes have happened, you just installed some library on the buildslave and want to re-run a build that failed because it was missing before, etc.

With the new Scheduler design in the upcoming release, it will be easier to implement a "skip the timeout" button (you would make a subclass of Scheduler with a method that takes any pending build and resets its timer down to zero). The "Force" button will still have the same behavior, though: it bypasses the scheduler altogether, injecting a brand new build into the Builder.

> This is a bit odd, and also is a problem in my setup where (for various
> reasons) I assume the revision each build is based on is newer than the
> previous.

Just so you know, the internal architecture is designed to make it possible to build arbitrary revisions of the code (specifically the BuildRequest that is handed to the Builder has a "SourceStamp" object, which provides a branch and revision number, and ideally those can change arbitrarily). Someone could force a build of a specific revision, then force a second build of an *earlier* revision, etc. The usual build-upon-Change use case doesn't exercise this functionality, but the 'buildbot try' and build-on-branch features of the upcoming release do.
Apart from that, the only problem I see with a buildbot setup where you assume revisions are monotonically increasing is that you wouldn't be able to build the same source code twice, which is kind of a pity. The buildslaves are supposed to behave a lot like a human developer doing checkouts and compiles.. hopefully your human developers aren't limited to one compile per source tree too.

> 2. The 'Stop Build' button sends a SIGKILL to the build process. It
> really should use SIGTERM so that the process can clean up any temporary
> files, locks or such. It can still be followed by SIGKILL if it fails
> to exit in a timely manner.

Yeah, I've got a TODO somewhere in the code about doing it this way, but I hadn't yet gotten around to implementing it. I think the "Stop Build" button is under-tested.. currently the SIGKILL-sending code is mostly used to implement the "your build process appears to be stuck" timeout, and it seemed unlikely to me that such a wedged build would respond any better to a SIGTERM than a SIGKILL. Also, most of my experience is with using 'make' as the top-level build tool, and it doesn't do anything special with SIGTERM (and can't really pass it on to its children). But yeah, it should be implemented just as you said.

> 3. The PBChangeSource doesn't allow you to specify a prefix of more than
> one directory - if you do it just treats it as an always failed match.

The prefix is a strict string.startswith() comparison (actually comparison and removal). The idea is that your project is paying attention to exactly one source tree, defined by whatever tree you check out from the repository in the first Step of your build.

This prefix-stripping serves two purposes. The first is that your source tree may be kept in a larger repository, one that tracks other projects (or other components of whatever you're using the buildbot on), so you might get VC notification for files that are outside the tree of interest.
Discarding filenames that don't have the prefix is equivalent to ignoring files outside your tree of interest.

The second is that some BuildSteps find it useful to have a list of exactly which files were changed for any particular build (this is stored as the .files list in a Change object). For example, step_twisted.Trial (which runs a unit-test utility) can be set up in a mode which only runs unit tests for the files that were changed in this build. (Specifically, you can put special test-case-name tags in python source files, and /usr/bin/trial can be told to look for these tags and only run the unit tests named by them. This is handy for configuring "quick" Builders that provide immediate feedback on the stuff that was just modified, but you usually follow it up with a full test suite run a couple of minutes later.)

But, for this list of changed file names to be useful, it needs to be relative to your source tree, not relative to some remote VC repository's internal directory structure. The prefix is stripped from the filenames to make sure that changes[0].files[0] is a valid relative path from the top of the tree that you've checked out to the file that was just changed.

So, changing PBChangeSource (or any of the change sources) to accept a list of prefixes would invalidate the second purpose. Unless you've got a source checkout step that does multiple checkouts and somehow merges them into the same directory, I think you would wind up with filenames that don't line up with anything in the builder's local tree. What's the use case for multiple prefixes?

Or, having just written all that, I think I may have misunderstood you. When you say "more than one directory", do you mean "foo/bar/baz" as opposed to just "foo" ? In that case, it's just a bug. If you could provide me with an example of a working and a non-working case, I'll write up a test case and fix it. Oops.

> 4.
> It seems that the output to the build step logs is overly buffered,
> making it impossible to watch the build process output in anything close
> to real time.

Hm, my experience has been that it isn't delayed by more than a few seconds. The chain of pipes looks like:

 child process writes to stdout/stderr
  -> maybe a libc FILE buffer
  -> pipe
  -> buildslave reads from pipe, immediately packages output and sends to master
  -> TCP socket
  -> master receives output, appends to logfile, publishes to status targets
  -> TCP socket (HTTP)
  -> web browser appends text to the bottom of the page

The "Nagle" algorithm in TCP will delay small amounts of data for small amounts of time (I want to say that 100ms is common) in the hopes of sending out one big packet instead of several small ones, but I doubt that's the issue here. Unix pipes typically have some buffering (4k at most), but in my experience they are usually flush-as-soon-as-possible rather than as-late-as-possible. The libc buffered FILE object typically has as-late-as-possible semantics, but most of the test processes I've seen don't wind up with huge latencies because of this (possibly because they use lots of small commands instead of one command that produces huge amounts of output).

It's possible that your build process has a couple layers of child processes, each doing their own buffering, which might cause the kinds of delays you're talking about. How bad is it?

To track this one down, I would recommend adding some log.msg() calls in buildbot.slave.command.ShellCommandPP.outReceived, which is called each time the buildslave receives some stdout from the process it has just spawned. Something like:

 import time
 log.msg("got stdout %d bytes time %s" % (len(data), time.time()))

If this is reporting frequent small updates, then the problem is somewhere in buildbot. If it is reporting infrequent large updates, then the buffering is happening somewhere in the child process.
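A minimal sketch of that diagnostic, kept separate from buildbot so it can be tried on its own. The ChunkLog class and its summary() method are hypothetical names, not part of buildbot; the idea is that outReceived would hand each chunk to record() as it arrives. It notes the size and arrival time of every chunk, so the two cases above (frequent small updates versus infrequent large ones) can be told apart from the numbers:

```python
import time


class ChunkLog:
    """Hypothetical helper: remembers when each chunk of stdout arrived."""

    def __init__(self, clock=time.time):
        self.clock = clock      # injectable so the class can be exercised in tests
        self.entries = []       # list of (timestamp, byte_count)

    def record(self, data):
        """Note one chunk and return a log line in the format suggested above."""
        now = self.clock()
        self.entries.append((now, len(data)))
        return "got stdout %d bytes time %s" % (len(data), now)

    def summary(self):
        """Return (chunk count, mean chunk size, longest gap in seconds)."""
        if not self.entries:
            return (0, 0.0, 0.0)
        sizes = [size for _, size in self.entries]
        times = [t for t, _ in self.entries]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        return (len(sizes), sum(sizes) / len(sizes), max(gaps, default=0.0))
```

Many chunks with a small mean size and short gaps would point at buildbot; a few large chunks separated by long gaps would point at buffering in the child process.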
I don't remember if the pipes that are created to the child process have their buffering flag turned off or not. It's a tradeoff between immediacy and efficiency, of course, so it may default to the more-efficient True setting. If you're interested, they get created in twisted.internet.process.Process (just search for calls to os.pipe). This might be related, but even with the default setting I see stdout messages coming from a single 20-minute Trial process showing up every second or two.

> 5. The ShellCommand build step doesn't allow you to set the description
> via the init call parameters. Maybe my python is bad, but I'm not sure
> how to set it aside from here, since the steps are actually in the tuple
> of s(step.ShellCommand, arg = x, arg2 = y, ...) - so there's no actual
> instance of the object to set the description attribute on. (In my
> configuration I've created a subclass of ShellCommand that does
> understand a description argument to init and sets the attribute from
> there).

I think you can use s(step.ShellCommand, name="my description") to set it. Most of the useful attributes of a BuildStep subclass will be copied from the kwargs you pass to __init__ (see BuildStep.__init__ where it loops through the names listed in BuildStep.parms).

Yeah, the BuildStep instances don't exist until the Build actually starts, because each Build gets a separate copy of each BuildStep. (There is state kept in the BuildStep instance that is specific to a particular build, so they can't be shared or pre-created in the config file.)

> 6. (wishlist) In my environment I actually have only one supported
> platform and hence only a single builder. But I do have multiple build
> trees of the same code (different released versions that are still
> actively maintained). It would be nice if the same master could manage
> all the code branches (since the build process is identical).

It's your lucky day! :).
The next release will include the build-on-branch feature we've been talking about for the last few months. Take a look at the user's manual on the web site (the CVS HEAD one, not the 0.6.6 version) and see if the functionality described in there will meet your needs. If not, let me know, I need more use cases.

> It would also be nice if the WaterFall display could show the status of all
> the different build trees in the one page, rather than having to create a
> separate html.WaterFall object using a different port for each.

If I understand you correctly, then I think the build-on-branch feature will accommodate this desire (but I may not quite understand what you want to do). In the next release, the main Waterfall display will have one column per builder, just as we've got now, and you can either have each Builder handle multiple branches (so the builds for those branches would be interleaved in a single column), or you could create multiple Builders (each with the same BuildFactory) and assign one branch per Scheduler and one Scheduler per Builder (so the branches would be built in parallel).

Hmm, that description wasn't very clear. I'll make a note to try and write up some use cases in the documentation. But, in short, the next release will let you build multiple branches in a single buildmaster, and thus display all their status on the same page.

> 7. (wishlist) This may be already possible, but it would be nice if the
> builder could have access to the build output files easily. My build
> steps include archiving successful builds and making them available
> to users, and it would be nice to include the build logs in the archive.

In the current code, each BuildStep has access to the logfiles that were generated as it runs (see ShellCommand.createSummary).
In addition, each StatusTarget (like the Waterfall page, or the IRC bot) can get access to each LogFile (see buildbot.status.mail.MailNotifier.buildMessage, at the end where it uses build.getLogs() and log.getText() ).

The idea is that the BuildStep is responsible for process-specific things, like creating filtered versions of the main log file (just the warnings, just the errors, or parsing pass/fail test counts from the output). The StatusTargets are responsible for distributing the logs somewhere. Long-term archiving should be implemented in a new StatusTarget, which can just pull all the logfiles from the IBuildStatus object (along with build results, the SourceStamp, etc) and stash them somewhere.

I'm working on a diagram of how Build/BuildStep/BuildStatus/BuildStepStatus instances are related, but in general it is always possible to get from the buildbot.process -side object (like Build or BuildStep) to the buildbot.status -side object (like BuildStatus or BuildStepStatus). (It is *not* possible to go in the other direction.. one reason is that the status objects are persisted, so they can't hold references to things that shouldn't be persisted.. the other reason is that status objects are supposed to passively accept data from the build process rather than influence the build, so maintaining a unidirectional connection makes everything cleaner.)

So, I'd recommend doing archiving in a StatusTarget, but if you really need access to the logs from the BuildStep or the Build (or even the Builder, but that would get kind of ugly), you can do it.

hope that helps.. feel free to describe your use cases a bit more, I'll do what I can to accommodate them in the code.
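A rough sketch of the "stash them somewhere" half of such an archiver, assuming the StatusTarget has already pulled the log names and text out of the build (for example via build.getLogs() and log.getText(), as MailNotifier does). The function name archive_logs, its arguments, and the directory layout are made up for illustration and are not buildbot API:

```python
import os


def archive_logs(dest_root, build_label, logs):
    """Write each (name, text) pair to dest_root/build_label/<name>.log.

    Hypothetical helper: `logs` is a plain list of (name, text) tuples
    that the caller has already extracted from the build's LogFiles.
    Returns the list of paths written, in the same order as `logs`.
    """
    target = os.path.join(dest_root, build_label)
    os.makedirs(target, exist_ok=True)
    written = []
    for name, text in logs:
        # Keep names filesystem-safe: log names may contain '/' or spaces.
        safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in name)
        path = os.path.join(target, safe + ".log")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        written.append(path)
    return written
```

Keeping the file-writing part free of any buildbot objects means the same function could be called from a StatusTarget or, if really necessary, from a BuildStep.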
cheers,
 -Brian

-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
Thread at a glance:
Previous Message by Date: issues with buildbot

Hi all,

I'm in the process of setting up a new buildbot system for a large project. It's a great tool and I've got it mostly running now, but I thought I'd give you some feedback on the few issues I've noticed.

1. Contrary to what I would have thought, the 'Force Build' doesn't replace a build that is pending (due to changes with a stable timer), it instead starts a build of HEAD independently - and then the build of the changed revision (usually the same) will begin immediately after. This is a bit odd, and also is a problem in my setup where (for various reasons) I assume the revision each build is based on is newer than the previous.

2. The 'Stop Build' button sends a SIGKILL to the build process. It really should use SIGTERM so that the process can clean up any temporary files, locks or such. It can still be followed by SIGKILL if it fails to exit in a timely manner.

3. The PBChangeSource doesn't allow you to specify a prefix of more than one directory - if you do it just treats it as an always failed match.

4. It seems that the output to the build step logs is overly buffered, making it impossible to watch the build process output in anything close to real time.

5. The ShellCommand build step doesn't allow you to set the description via the init call parameters. Maybe my python is bad, but I'm not sure how to set it aside from here, since the steps are actually in the tuple of s(step.ShellCommand, arg = x, arg2 = y, ...) - so there's no actual instance of the object to set the description attribute on. (In my configuration I've created a subclass of ShellCommand that does understand a description argument to init and sets the attribute from there).

6. (wishlist) In my environment I actually have only one supported platform and hence only a single builder. But I do have multiple build trees of the same code (different released versions that are still actively maintained).
It would be nice if the same master could manage all the code branches (since the build process is identical). It would also be nice if the WaterFall display could show the status of all the different build trees in the one page, rather than having to create a separate html.WaterFall object using a different port for each.

7. (wishlist) This may be already possible, but it would be nice if the builder could have access to the build output files easily. My build steps include archiving successful builds and making them available to users, and it would be nice to include the build logs in the archive.

cheers,
chris

Next Message by Date: Re: issues with buildbot

Thanks for the great response Brian! A few more comments inline...

On Tue, Aug 30, 2005 at 12:49:12PM -0700, Brian Warner wrote:
<snip>
> Apart from that, the only problem I see with a buildbot setup where you
> assume revisions are monotonically increasing is that you wouldn't be able to
> build the same source code twice, which is kind of a pity. The buildslaves
> are supposed to behave a lot like a human developer doing checkouts and
> compiles.. hopefully your human developers aren't limited to one compile per
> source tree too.

No - they're not. I probably should describe the use case a bit more, and you can tell me how what I'm doing is evil and wrong ;-)

Basically I want builds to be produced and made available for developers and for QA, and also for eventual selection of a final release. All these builds come out of the build system, so there is full accountability of the build. My build steps do the following:

1. "Clobber & check out"

2. "Calculate next version". Our code uses a versioning scheme of:

   major.minor-buildnumber

   The major and minor represent the usual, and the build number is incremented for every successful build of the tree. This version number is stored in the source tree and included in the built product so that it can be determined which exact build a system is running. For builds done outside the build system, there is also a '-dev' postfix added to the version included in the built product to indicate that it did not come from the build system and, whilst it is based on similar code to the official build with that version, may also contain additional changes not under revision control. So this step reads the current version file, calculates the next build number, and rewrites the file.

3. "Build & test". If this fails, the entire build is stopped at this point.

4. "Commit the new version". This does a subversion commit of the new version file.

5. "Tag source". This does a subversion copy operation to create a copy that exactly captures the code used in the build, and puts it under "project/tags/builds/<version>".

6. "Copy to archive". The build is copied to a build archive (which is available via HTTP, ftp, etc) for use by all the different groups. This includes the final product image and a changelog. (It would be nice to also include the build logs and the list of changes that caused the build).

Most of these steps are implemented using ShellCommand objects to run the various scripts required.

My problem, of course, stems from the fact that I modify external resources (the repository and the build archive) as a side effect of the build process. This causes two major issues: the system doesn't support building 'out-of-order', as then it will be unable to modify and commit the version file (since it's not the HEAD version), and it cannot re-build the same version for the same reason and because the build will already be tagged and exist in the archive.
<snip>
> Or, having just written all that, I think I may have misunderstood you. When
> you say "more than one directory", do you mean "foo/bar/baz" as opposed to
> just "foo" ? In that case, it's just a bug. If you could provide me with an
> example of a working and a non-working case, I'll write up a test case and
> fix it. Oops.

Haha. You gave a very nice overview, but yeah - I meant a prefix like "project/trunk". I just had a look, and the exact issue is here:

changes/pb.py:

    def perspective_addChange(self, changedict):
        ...
            if self.prefix:
                bits = path.split(self.sep)
                log.msg("bits[0]: "+bits[0]);
                if bits[0] == self.prefix:
                    if bits[1:]:
                        path = self.sep.join(bits[1:])
                    else:
                        path = ''
                else:
                    break
        ....

So it's only checking the first part of the split path against the prefix. I can supply a patch if you want, but I'll have to go read up on my python first. I imagine you can either just do a string match to see if the path begins with the prefix (before splitting) or split both and iterate through the prefix bits, comparing each bit to the equivalent path bit.

<snip>
> I think you can use s(step.ShellCommand, name="my description") to set it.
> Most of the useful attributes of a BuildStep subclass will be copied from the
> kwargs you pass to __init__ (see BuildStep.__init__ where it loops through
> the names listed in BuildStep.parms).

The name and the description seem to get used differently. The one I was interested in was the string that gets used in the waterfall display to show what it's currently up to. Having a step that read like "install -d /opt/builds/blah && copy XYZ ..." wasn't as nice as "copying to build archive" ;-)

> It's your lucky day! :). The next release will include the build-on-branch
> feature we've been talking about for the last few months. Take a look at the
> user's manual on the web site (the CVS HEAD one, not the 0.6.6 version) and
> see if the functionality described in there will meet your needs.
> If not, let
> me know, I need more use cases.

I'll go have a read, but so far it sounds great!

<snip>
> So, I'd recommend doing archiving in a StatusTarget, but if you really need
> access to the logs from the BuildStep or the Build (or even the Builder, but
> that would get kind of ugly), you can do it.

I'll have to have a bit of a think about how this all works before I can comment. But would the StatusTarget also have access to the files built on the builder?

Thanks again for your great work! I'll have to really start learning python so I can be a bit more detailed in future. I played a bit with it a while ago, but wasn't impressed and just stuck to perl. But people keep telling me it's a lot better now.

I'll also do some more investigation on the buffering issue and get back to you with more detail.

cheers,
chris
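The two fixes suggested in the reply above for multi-directory prefixes such as "project/trunk" can be sketched as follows. These are standalone illustrations with hypothetical function names, not the code in changes/pb.py; both return the path relative to the prefix, or None when the path lies outside it:

```python
def strip_prefix_components(path, prefix, sep="/"):
    """Compare prefix and path one directory component at a time."""
    prefix_bits = [bit for bit in prefix.split(sep) if bit]
    path_bits = path.split(sep)
    if path_bits[:len(prefix_bits)] != prefix_bits:
        return None                      # outside the tree of interest
    return sep.join(path_bits[len(prefix_bits):])


def strip_prefix_string(path, prefix, sep="/"):
    """Plain string match done before any splitting.

    The prefix is normalised to end with `sep`, so that 'project/trunk'
    does not accidentally match 'project/trunk2/...'.
    """
    if not prefix.endswith(sep):
        prefix = prefix + sep
    if path.startswith(prefix):
        return path[len(prefix):]
    if path + sep == prefix:
        return ""                        # the prefix directory itself
    return None
```

The component-wise version avoids the "trunk" versus "trunk2" trap without any special handling, which is probably why splitting both sides is the safer of the two.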