Re: Seeing memory corruption, GC moves my objects around: msg#00348

lisp.cmucl.devel

Subject: Re: Seeing memory corruption, GC moves my objects around

I will try some of the suggestions.

I will also try to maintain a separate list of references to the
objects that are getting destroyed to see whether the GC keeps them
that way (in case there is a problem with that global hash table).

Your message, quoted for reference...

> > Well, yes. However, in this case we are in a loop which contains
> > purely read-only code, and the data in that hash table becomes corrupted
> > from one iteration to the next, without ever leaving the loop. The only
> > exception is the GC: if the GC runs, we get the corruption.
> >
> FWIW, there was a memory corruption problem in matlisp. A few
> iterations of some loop would work ok, and maybe even a few GCs would be
> ok, but eventually cmucl would die with a segfault (or something).
> Eventually, the user isolated the problem to one routine and we figured
> it out. A foreign routine was scribbling past the end of a Lisp array
> because the Lisp array was only half the required size.

There is a lot of C code running, but I don't think it is responsible
here, because:

- as I said, I kind of caught the GC "in the act"

- the objects that are splattered over my poor innocent data are
usually valid, tagged Lisp objects

> > So I am not creating the damaged data myself, it is the GC not
> > realizing that this area of memory is in use by my program and
> > overwriting the location I point to with other stuff.
>
> I am betting that the hash table is already corrupted in some way by the
> time you get to your test code. You've obviously initialized the hash
> tables by the time your code runs, so the damage could have occurred
> then. Somewhat like what happens in C when you corrupt the stack or
> heap. The error might not show up until much later in the program.

No, when I enter this tight read-only loop the hash table is OK. My
function which checks all entries for validity is running inside that
loop. Usually, in the first 2 of 6 runs it finds all entries OK, and the
3rd run finds one entry corrupt that was still OK in the previous
iteration.

But this observation is crappy for now. I don't understand why the GC
is triggered in this loop at all; it shouldn't cons. The disassembly
of the defun is 23 pages, so I need some time to figure out what that
loop is actually doing. Since the bug only occurs with safety=0 and
debug=0, even reading the stuff bends my brain >;=P

> I don't envy you and the task ahead. :-(

Actually it's fun for now. If I pull this stunt off I will probably earn a
"hero of the <insert employer name here> union" medal or the hacking
equivalent of a Purple Heart :-)

Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cracauer@xxxxxxxx> http://www.cons.org/cracauer/
No warranty. This email is probably produced by one of my cats
stepping on the keys. No, I don't have an infinite number of cats.
