Re: Seeing memory corruption, GC moves my objects around: msg#00334 lisp.cmucl.devel

Gerd Moellmann wrote on Fri, Aug 29, 2003 at 12:08:28AM +0200:
> Martin Cracauer <cracauer@xxxxxxxx> writes:
>
> > I am on late CMUCL 18 on x86-linux. I have all memory spaces moved
> > and resized but I am otherwise not noticeably different from stock
> > CMUCL 18.
>
> How does your CMUCL relate to CVS CMUCL or CMUCL releases?

What I run is pretty much the state of the RELEASE_18e branch as of
March 27, 2003. None of the new features, including none of your
fancier PCL work, are in.

Local changes are mainly the changed memory space sizes and locations,
otherwise it is only cosmetics, except that speed = 3 + safety = 0
still leaves array bounds checking on in my build.

> > 1) assume we have wrong type declarations. What would be a scenario
> > where we promise to be of one type, put in another and confuse the
> > GC?
>
> I've quite often managed to get there by using (SETF %INSTANCE-REF) or
> SVREF or similar things with invalid indices in SAFETY 0 code, thereby
> overwriting random stuff beyond an instance or vector or whatever.

Well, yes. However, in this case we are in a loop which contains
purely read-only code, and the data in that hashtable becomes corrupted
from one iteration to the next, without ever leaving the loop. The
only exception is the GC: if the GC runs, we get the corruption.

So I am not creating the damaged data myself. It is the GC not
realizing that this area of memory is in use by my program and
overwriting the location I point to with other stuff.

That is what puzzles me. I have had plenty of memory overwritten by
direct action of my code, but I never managed to trick the GC into
assuming a memory area is unused when it wasn't.

> > If we promised something to be a fixnum and put a pointer-containing
> > object in it I assume we could do that.
>
> If "it" is a structure slot with no type declaration as you say, yes.

Can you elaborate a little? Why do you say the slot wouldn't have to
have a type declaration? I was rather thinking that violating an
existing declaration would cause this trouble.

> In the cases I encountered, compiling with SAFETY 3 caught the error
> I was making.

I tested a lot in different safety settings. Safe mode does not come
up with this memory corruption nor with any error. I can only
reproduce this in speed = 3, safety = 0 (where array bounds checking
is still on).

That doesn't mean the memory corruption isn't there in safe mode, but
I cannot trigger it.

In addition, CMUCL SAFETY=3 mode is not completely safe: you can
assign an object of the wrong type to a structure slot that is
declared to be of a different type. So a violated slot type might
still go unnoticed in safe mode.

> (If it happens with one particular kernel version only, there is of
> course also the possibility that the bug is something completely
> different, maybe a kernel bug, configuration bug (libc + kernel), or
> even a hardware defect that a changed VM system exposes. Slightly
> mismatching libc + kernel on GNU/Linux were my personal favourites
> while maintaining Emacs ("oh yes, now that you mention it, I really
> forgot to tell that Emacs started to crash after I installed/upgraded
> ...". @!$%$%!!)

A hardware defect is unlikely, since it occurs on several machines.

But all machines we have ever seen it on have redhat-7.3, running
various kernels of (Redhat patched) 2.4.18-27.7.xsmp -
2.4.18-26.7.xsmp.

I don't trust the Redhat kernels further than I can throw them, but I
am not quick to assign blame to them. If I use a different kernel, I
believe the bug would only be hidden, so I cannot require everybody to
change kernels, especially not from Redhat to a stock kernel.

Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cracauer@xxxxxxxx> http://www.cons.org/cracauer/
No warranty. This email is probably produced by one of my cats
stepping on the keys. No, I don't have an infinite number of cats.