Subject: Couple of Emacs hacks - msg#00302
List: lisp.cmucl.devel
Ahoy again,
Just wanted to send pointers to a couple of Emacs hacks.
The first is a debugger written by Antonio Menezes Leitao and posted
to this list in January. Eric Marsden just told me about it today, and
I must say it's pretty amazing. My only contribution here is a
screenshot, in case other people didn't realise what it does:
http://www.bluetail.com/~luke/misc/lisp/debug-shot.png
The original mail, including the code, is here:
http://article.gmane.org/gmane.lisp.cmucl.devel/1478/match=antonio
The other is a previously unreleased program by Eric Marsden that I've
been hacking on recently: the Superior Lisp Interaction Mode, Extended
("slime"). This is essentially a CMUCL-specific clone of 'ilisp', but
with WIRE-based interaction between Emacs and CMUCL -- perhaps a
"Hemlock for Emacs". It does a couple of things that Ilisp does:
completion on symbols and showing arglists. But the interesting feature
is collecting compiler notes from CMUCL and presenting them as
annotations in Emacs buffers. This screenshot demonstrates:
http://www.bluetail.com/~luke/misc/lisp/slime-shot.png
Slime has too few features to be a whizzy-bang development environment
just yet. But it's easy to set up (provided you have a recent CMUCL
snapshot), and should work fine with GNU Emacs 21 or XEmacs
21. Comments and suggestions welcome!
Cheers,
Luke
Previous Message by Date:
[HEADS UP] Byte-code interpreter
I've tuned the byte-code interpreter today, so that running ANSI tests
byte-compiled takes a little less time. It passes the tests, so I'm
quite confident it works, but please watch out for problems.
Next Message by Date:
gencgc performance and soft-real time
Ahoy!
Gencgc doesn't run as fast as it should!
In particular it seems to always take a long time to do a garbage
collection, which is no fun when I'm trying to write a program with
consistently good (<1ms) response times. I am particularly interested
in the case of programs that generate a lot of short-lived data, and
thus frequently GC a nursery with very little live data (because
that's how I write programs :-)). The gencgc design _should_ be
excellent for this as far as I can see.
Hopefully this is the smallest useful demonstration of the issue with
CMUCL 18e on Linux x86:
* (progn (gc :full t) (time (gc :gen 0)))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 0.02 seconds of real time
; 0.017998 seconds of user run time
; 0.0 seconds of system run time
; 30,291,620 CPU cycles
; [Run times include 0.02 seconds GC run time]
; 0 page faults and
; 128 bytes consed.
;
NIL
Twenty milliseconds to GC an empty nursery on my 1.7 GHz machine!
Surely this should take less than one millisecond.
Not to suggest that collecting an empty generation is that
interesting, but I'm thinking this is a lower-bound on GC times, and
that collecting a generation with virtually no live data should be
ultra-fast by gencgc's design.
That's the "problem statement".
A few weeks ago I was looking into this and trying to figure out why
it takes so long to GC a generation with virtually no live data, with
the help of people on IRC. I was not CMUCL-hacker enough to make a
fix, but in case someone else wants to pursue this, here's how it
looked to me:
First off, recompiling the 'lisp' program with gcc -O3 halved the GC
time. But it still seems an order of magnitude more than necessary.
Then I tried using the 'oprofile' profiler to figure out where the
time was being spent. I profiled while doing (gc :gen 0) in a loop,
and here were the results (ignoring functions that used < 1% CPU):
vma       samples  %        symbol name                            image name
08058744      155  1.49196  scav_fdefn                             /tmp/cmu18e/bin/lisp
08058624      265  2.55077  scav_immediate                         /tmp/cmu18e/bin/lisp
08058674      376  3.61921  scav_boxed                             /tmp/cmu18e/bin/lisp
0805d000      697  6.70902  update_x86_dynamic_space_free_pointer  /tmp/cmu18e/bin/lisp
0805e2cc     2414  23.2361  from_space_p                           /tmp/cmu18e/bin/lisp
080575fc     6351  61.132   scavenge                               /tmp/cmu18e/bin/lisp
So the main hotspot is the 'scavenge' function -- and I believe
'scavenge' is the one calling from_space_p, though you can't see this
directly from the profile.
What 'scavenge' does is update pointers to objects that were moved
during GC. To do this it rummages through and updates objects in
generations other than the one that was collected and objects in
"static space". There is an optimization here: a "write-barrier" is
used to keep track of objects in other generations that have not been
modified since the last GC. Those objects can be skipped when
scavenging, because they can't point to freshly moved objects.
The trouble is that the write-barrier optimization is only used for
other generations, and not objects in static space -- so all static
objects are scavenged every time. This is typically several megabytes
of data. This is where the GC is spending all that time!
That's my theory anyway. I've checked by getrusage+printf and by
observing the effect of scavenging static space multiple times per GC,
and both showed that scavenging static space was the "hot
spot". Because I'm not actually modifying objects in static space (as
far as I know), it seems likely we could optimize away all this
scavenging and drastically shorten GC times.
My first thought was to put a write barrier over static space. While I
was still struggling to make that work (but never did), Dan Barlow
suggested a more holistic approach -- why do we have so many objects
in static space, and could we not put them into a "tenured" generation
that doesn't get GC'd instead? That would give us a write-barrier for
free.
Alas, my attention span was not great enough to read enough of
gencgc.c and purify.c to understand the full picture, so this is as
far as I've gotten.
Is anyone able to shed some more light?
(Wouldn't it be wonderful if a little optimization rendered the common
wisdom that "if response time matters then you mustn't cons"
obsolete?)
Cheers,
Luke (hoping he hasn't made any really embarrassing errors!)
Next Message by Thread:
Re: Couple of Emacs hacks
Luke Gorrie <luke@xxxxxxxxxxxx> writes:
> Slime has too few features to be a whizzy-bang development environment
> just yet. But it's easy to setup (provided you have a recent CMUCL
> snapshot), and should work fine with GNU Emacs 21 or XEmacs
> 21. Comments and suggestions welcome!
Forgot one minor detail -- the download URL :-)
http://www.bluetail.com/~luke/misc/lisp/slime-0.2.tar.gz