Subject: Re: Data file corruption and recovery - msg#00080
List: web.zope.zodb
Re: Data file corruption and recovery
Jeremy,
I never got the zeo server to tell me anything (I started it in debug
mode to make sure). Now I wasn't in a very patient mood so I didn't
wait more than 3 mins or so. I figured that the problem was some sort
of corruption as the older file loaded with no problem. I don't see any
.trN file but maybe this is due to my impatience. I have the original
file but it is fairly big (67MB when gzipped). fsrecover didn't say
anything other than that 0 data was lost during recovery. As Toby
suggests it may be possible that the machine was writing bad data for a
while as a result of the cpu. Is it possible that there was no problem
and that I should have been more patient with the startup time of zeo?
-EAD
Jeremy Hylton wrote:
> On Thu, 2003-02-13 at 09:33, Erik Dahl wrote:
> > Yesterday I had a cpu failure on a box that caused the sudden reboot of
> > a zeo server. When the service was brought up on the other side of the
> > cluster it didn't start. I figured this was due to data corruption and
> > when using a backup the server started fine. The problem was the backup
> > was a little stale so I wanted to try recovering the corrupt file. I
> > found two methods for fixing the file running fsrecover.py or running
> > tranalyzer.py then using its output to truncate the data file.
> > fsrecover.py did fix my problem but only after running for around 6
> > hours and generating no output other than to say that no data was lost.
> > The tranalyzer method never worked. My questions are:
>
> What happened when you tried to start up the zeo server? I must admit
> that I haven't run into this problem in real deployment, and I don't
> remember what the storage / server is supposed to say.
>
> Did it create a .trN file? That would indicate it figured out what
> transactions to delete.
>
> Do you have the original file? If you run into file storage corruption
> problems, it's helpful to us developers if you keep a copy of the
> damaged file.
>
> > 1. how can you figure out what the server is doing when you have a
> > corrupted file (I tried setting STUPID_LOG_SEVERITY to -300 with no
> > results).
>
> It did say something, right? Otherwise you would not have known that
> the storage was corrupted. There should be some complaint during the
> initial startup, but if it starts up successfully I wouldn't expect
> further error reports in the log.
>
> > 2. any idea why taking transactions off the end of the file didn't fix
> > the problem?
>
> Do you know what changes fsrecover made to fix the problem?
>
> > 3. would directory storage handle this situation better or do I need to
> > go to a berkeley db backend?
>
> It's surprising that truncating the file didn't solve the problem.
> FileStorage should be pretty robust against these sorts of crashes. It
> calls sync() in tpc_finish(), so it's quite unlikely that a reboot would
> do anything other than leave incomplete transaction data at the end.
>
> Jeremy
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@xxxxxxxx
http://mail.zope.org/mailman/listinfo/zodb-dev
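The truncation approach discussed in this message can be illustrated on a toy append-only log. The sketch below is not FileStorage's on-disk format or code: the 8-byte length prefix and the names `append_record`, `last_good_offset`, and `truncate_incomplete_tail` are assumptions made purely for illustration. It only shows why syncing after every commit means a sudden reboot should, at worst, leave a partial record at the end of the file that can be cut off.

```python
import os
import struct

# Toy record format (an assumption, not FileStorage's layout):
# an 8-byte big-endian payload length followed by the payload bytes.
HEADER = struct.Struct(">Q")


def append_record(path, payload):
    """Append one length-prefixed record and force it to disk."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload)))
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # similar in spirit to syncing at commit time


def last_good_offset(path):
    """Return the offset just past the last complete record."""
    size = os.path.getsize(path)
    pos = 0
    with open(path, "rb") as f:
        while pos + HEADER.size <= size:
            f.seek(pos)
            (length,) = HEADER.unpack(f.read(HEADER.size))
            end = pos + HEADER.size + length
            if end > size:
                break  # record claims more bytes than exist: torn tail
            pos = end
    return pos


def truncate_incomplete_tail(path):
    """Cut off any partial record at the end; return the bytes removed."""
    size = os.path.getsize(path)
    good = last_good_offset(path)
    if good < size:
        with open(path, "r+b") as f:
            f.truncate(good)
            f.flush()
            os.fsync(f.fileno())
    return size - good
```

This only repairs a torn tail. If bad data was written earlier in the file, as Toby suggests might have happened before the reboot, a length-prefix scan like this one would not notice it. Always work on a copy of the damaged file.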
Re: AdaptableStorage and ZClass instances
On Thu, 13 Feb 2003 15:47:38 -0500
Shane Hathaway <shane@xxxxxxxx> wrote:
> However, at configuration time, you have access to the classes of the
> objects to be loaded/stored. During configuration, you could take the
> opportunity to automatically configure mappers and gateways based on
> class inspection.
Can I only access the class at configuration time? Can't one
get access to the class at runtime through the MetaTypeClassifier?
> Application code might look very much like the
> SQLObject example (see Christian's post), except that classes would use
> ZODB.Persistent as a base class (rather than SQLObject) and the same
> code would work in regular ZODB.
Mmm, I don't know if I want to put any storage specific details in my
application code just to make the OR Mapping work. What I think is
genius about AS is that I can keep writing apps in the way I'm used to
and in a way that fits the framework; in other words, I don't have to subclass from
this or that to persist my classes. I'd rather do more work on the side
of the OR mapper and teach it how to persist objects in my problem
domain.
For a start one can definitely have a SQL base class that implements
basic load and store methods which leaves you with only having to
specify sql strings in the subclass, like Rocky suggested.
But then one still has to write SQL for each class that you want to
persist. My idea was to have a default properties gateway that generates
SQL at runtime for all classes that don't have a specific gateway.
This seems problematic if one doesn't have access to the object and
since this is a default gateway one wouldn't know anything about the
class either. All one really needs is the classname and the property
types and values on the object. Is this at all possible? The classname
can then be used in the load and store methods of the gateway to query
the right table and property types will be looked up in a typemap for
the RDBMS.
--
Roché Compaan
Upfront Systems http://www.upfrontsystems.co.za
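A minimal sketch of the "default properties gateway" idea described above, assuming the gateway is handed the object itself. This is not AdaptableStorage's API: `TYPEMAP`, `table_name`, `create_table_sql`, and `store_sql` are hypothetical names, and the `?` placeholders assume a DB-API driver that uses the qmark style (such as `sqlite3`). It derives the table from the class name and the column types from the property values, as the message proposes.

```python
# Hypothetical typemap from Python property types to column types.
TYPEMAP = {int: "INTEGER", float: "REAL", str: "TEXT", bool: "BOOLEAN"}


def _property_names(obj):
    """Public instance attributes, in a stable order."""
    return sorted(n for n in vars(obj) if not n.startswith("_"))


def table_name(obj):
    """Use the lower-cased class name as the table name."""
    return type(obj).__name__.lower()


def create_table_sql(obj):
    """Build a CREATE TABLE statement from the object's property types."""
    cols = [f"{n} {TYPEMAP[type(getattr(obj, n))]}" for n in _property_names(obj)]
    return f"CREATE TABLE {table_name(obj)} ({', '.join(cols)})"


def store_sql(obj):
    """Build a parameterized INSERT and its values for the object.

    Table and column names are interpolated directly, so this is only
    suitable for trusted class and attribute names.
    """
    names = _property_names(obj)
    placeholders = ", ".join("?" for _ in names)
    sql = (
        f"INSERT INTO {table_name(obj)} "
        f"({', '.join(names)}) VALUES ({placeholders})"
    )
    return sql, tuple(getattr(obj, n) for n in names)
```

The sketch fails with a `KeyError` for any property type missing from the typemap, which is the point at which a class-specific gateway would have to take over.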
Re: Data file corruption and recovery
On Thursday 13 February 2003 2:33 pm, Erik Dahl wrote:
> Yesterday I had a cpu failure on a box that caused the sudden reboot of
> a zeo server.
cpu failure - That sounds bad. How do you know the failure really was sudden,
and that your server hadn't been writing badness into this storage for a while
before the reboot? Did you check the filesystem containing the storage?
> 3. would directory storage handle this situation better
I've never been able to provoke a failure by pulling the power plug from a
loaded FileStorage.
--
Toby Dickenson
http://www.geminidataloggers.com/people/tdickenson