osdir.com
mailing list archive

Subject: Re: [arch] Interpreter vs. JIT for Harmony VM - msg#00099

List: java.harmony.devel

In my experience GC tuning often makes a larger difference than fully
optimized code generation. Thus anything that doubles our footprint
will probably tend to be perceptibly slower in larger systems under
load (these things don't seem to be so perceptible with microbenchmarking).

-andy

Santiago Gala wrote:
On Wed, 21-09-2005 at 08:29 -0700, will pugh wrote:

I think having a FastJIT and forgoing the interpreter is a pretty elegant solution, however, there are a few things that may come out of this:

1) Implementing JVMTI will probably be more difficult than it would be with a straight interpreter
2) The FastJIT needs to be Fast! Otherwise, you run the risk of people not wanting to use it for IDEs and Apps because the startup time is too slow.



3) Memory. A typical fast, non-optimizing JIT will generate 10-15 bytes
of machine code *per bytecode*. This means that, say, Tomcat plus typical
web applications will generate more than 20 MB of jitted code that will
be executed just a few times. A fast interpreter plus an optimizing
compiler would achieve similar performance and save most of those 20 MB.

I've seen this going on in my efforts to get jetspeed running on top of
jikesRVM+classpath (which is leading to a series of bug reports to both
projects).

I have it running on my linux-ppc TiBook. Only one problem remains
before I can call it a full success: an issue with
ClassLoader.getResource that is proving difficult to solve. :)

Tomcat+Jetspeed runs (qualitatively) faster on my TiBook using an
optimized JikesRVM+classpath build than using IBM-jdk-1.4.2, but it
requires a 200 MB heap, while the IBM JDK runs it in 100 MB. Also,
startup time is about the same or slightly higher, but this is mostly
because I don't opt-compile the optimizing compiler itself, to save
build time.

Example output from a typical run:

Compilation Subsystem Report
Comp   #Meths      Time   bcb/ms  mcb/bcb      MCKB    BCKB
JNI        35      2.44       NA       NA      15.5      NA
Base    26074   8082.06   194.01    10.51   22977.7  2186.7
Opt       722  14685.43     2.46     6.76     226.7    33.5
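
As a rough cross-check of the report above, here is a minimal sketch
of the footprint arithmetic. It assumes that mcb/bcb means machine-code
bytes per bytecode byte and that MCKB and BCKB are kilobytes of machine
code and bytecode; that reading is inferred from how the columns line
up, not stated in the report. The function name is made up for
illustration.

```c
#include <assert.h>

/* Estimated machine-code size in KB for a given amount of compiled
 * bytecode. bytes_per_bytecode is the expansion ratio (assumed to be
 * the mcb/bcb column of the report). */
double jitted_code_kb(double bytecode_kb, double bytes_per_bytecode)
{
    assert(bytecode_kb >= 0.0 && bytes_per_bytecode >= 0.0);
    return bytecode_kb * bytes_per_bytecode;
}
```

For the Base row, jitted_code_kb(2186.7, 10.51) gives about 22982 KB,
which matches the reported 22977.7 KB up to rounding of the ratio and
is where the "more than 20 MB" figure comes from. The Opt row works
out the same way: 33.5 KB of bytecode at 6.76 is about 226 KB.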


Regards
Santiago


--Will

Tom Tromey wrote:


"Geir" == Geir Magnusson <geirm@xxxxxxxxxx> writes:





On the other hand, a fast code-generating JIT can call runtime
helpers and native methods without additional glue code whereas an
interpreter has to have special glue code to make it work in a JIT
environment.


Geir> I believe you, but I don't understand this. Can you explain in more
Geir> detail?

It is about handling calling conventions.

There are conceptually (at least) 2 cases to consider when
implementing the Java 'invoke' family of opcodes in an interpreter.

In the first case, suppose you're invoking another method that you
know is interpreted. In this case you might simply make a recursive
call to the interpreter function itself, passing in new locals as an
array or something. The interpreter itself might look something like
(I'm just making this up, but it is reasonably close to, e.g., what
libgcj does):

    void interpret (jclass declaringClass, jmethodID method,
                    union jslot *locals)

... where jslot corresponds to a single stack or local variable slot
as discussed in the JVM spec.

So to make your call you would look up the method and pass slots from
your current stack as the 'locals' argument.
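
To make the first case concrete, here is a minimal sketch of such a
recursive interpreter. Everything in it is an assumption made for
illustration: the four opcodes are invented rather than real JVM
bytecodes, struct method stands in for the jclass/jmethodID pair, and
interpret returns its result as a slot instead of being void. It is
not libgcj's code.

```c
#include <assert.h>
#include <stdint.h>

/* One stack or local-variable slot (simplified: ints and refs only). */
union jslot {
    int32_t i;
    void   *ref;
};

/* Hypothetical opcodes, not real JVM bytecodes. */
enum { OP_LOAD, OP_ADD, OP_INVOKE, OP_RETURN };

/* Hypothetical method descriptor. */
struct method {
    const uint8_t *code;
    int nargs;                            /* slots popped from the caller */
    const struct method *const *callees;  /* OP_INVOKE targets, by index */
};

#define MAX_STACK 16

union jslot interpret(const struct method *m, union jslot *locals)
{
    union jslot stack[MAX_STACK];
    int sp = 0;                           /* next free operand-stack slot */
    const uint8_t *pc = m->code;

    for (;;) {
        switch (*pc++) {
        case OP_LOAD:                     /* push locals[n] */
            assert(sp < MAX_STACK);
            stack[sp++] = locals[*pc++];
            break;
        case OP_ADD:                      /* pop two ints, push the sum */
            assert(sp >= 2);
            stack[sp - 2].i += stack[sp - 1].i;
            sp--;
            break;
        case OP_INVOKE: {                 /* call an interpreted method */
            const struct method *callee = m->callees[*pc++];
            assert(sp >= callee->nargs);
            sp -= callee->nargs;
            assert(sp < MAX_STACK);
            /* The callee's locals are the argument slots that were on
             * top of our own operand stack: no copying, no glue. */
            union jslot result = interpret(callee, &stack[sp]);
            stack[sp++] = result;
            break;
        }
        case OP_RETURN:                   /* return the top of stack */
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
        }
    }
}
```

The point of the sketch is the OP_INVOKE case: because caller and
callee share one representation of arguments, the call is just a C
function call with a pointer into the caller's stack.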

(Note that you aren't required to do things this way; in libgcj we
only use the native ABI and we don't special case calls to interpreted
methods at all. We probably pay some performance penalty for
this... though the interpreter is plenty slow on its own :-)


In the second case, you're calling some function that is not an
interpreted function, e.g. a native method. In this case the
underlying function will be using whatever low-level function calling
ABI is defined by the current platform (and implemented in the C
compiler).

There is no standard way in C to make such calls. Instead you end up
having to use something like libffi -- a piece of code that translates
from some array-of-arguments view to the low-level register twiddling
required to make an arbitrary C call.
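
A minimal sketch of why this is awkward in plain C follows. It is a
hand-rolled stand-in, not libffi: the interpreter holds its arguments
as an array of slots, and the only portable way to call a native
function from that array is to enumerate each signature by hand. The
names (generic_fn, call_native_ints, native_add, native_negate) are
hypothetical, and only int arguments with up to three parameters are
covered.

```c
#include <assert.h>
#include <stdint.h>

/* Argument slot as held on an interpreter's operand stack. */
union jslot {
    int32_t i;
    void   *ref;
};

/* Type-erased function pointer; it must be cast back to the real
 * signature before being called. */
typedef void (*generic_fn)(void);

/* Call fn, which takes nargs int32_t parameters and returns int32_t,
 * with arguments taken from an array of slots. Every supported
 * signature needs its own case. */
int32_t call_native_ints(generic_fn fn, const union jslot *args, int nargs)
{
    switch (nargs) {
    case 0:
        return ((int32_t (*)(void))fn)();
    case 1:
        return ((int32_t (*)(int32_t))fn)(args[0].i);
    case 2:
        return ((int32_t (*)(int32_t, int32_t))fn)(args[0].i, args[1].i);
    case 3:
        return ((int32_t (*)(int32_t, int32_t, int32_t))fn)(
            args[0].i, args[1].i, args[2].i);
    default:
        assert(!"arity not supported by this sketch");
        return 0;
    }
}

/* Two sample "native methods" using the platform's normal C ABI. */
int32_t native_add(int32_t a, int32_t b) { return a + b; }
int32_t native_negate(int32_t a)         { return -a; }
```

The switch grows with every combination of argument count, argument
type and return type, which is why a general solution has to drop
below C and place arguments in registers and on the stack itself.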


For a JIT the situation is different. A JIT already understands a lot
about register twiddling. I don't know whether it is common to use
the C ABI when writing a JIT, but in any case it would seem that
putting this in there as well is no big deal. Then instead of
figuring out at call time how to make a given call, you simply
determine it at compile time and generate the appropriate code.




Our experience is that a fast, zero optimizing JIT can yield
low-enough response time. So, I think at least Harmony has the option
of having a decent system without an interpreter. Thoughts?


Geir> Basic thought is yes, I always figured we'd have this pluggable, with
Geir> an interpreter for ease of porting, and then platform-specific JIT.

It seems to me that there's a design question here. For instance, if
you want to eventually take interpreted code and compile it (when it
is "hot"), for full pluggability your JIT(s) and your interpreter need
to agree on some set of bookkeeping details in order to make this
possible. OTOH, you could make other decisions that make this problem
go away, for instance having a single choice of execution engine up
front; so the "fast JIT" and the "optimizing JIT" are just part of the
same code base and only need to talk to each other, and can be built
in an ad hoc way.
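
One example of the bookkeeping in question, as a minimal sketch: if
interpreted methods are to be compiled once they are "hot", both
engines have to share at least a per-method record with an entry point
and an invocation counter. The layout, the threshold and all names
below are assumptions made for illustration, and compiled_double is an
ordinary C function standing in for code a JIT would generate.

```c
#include <assert.h>
#include <stdint.h>

struct method;
typedef int32_t (*entry_fn)(struct method *self, int32_t arg);

/* Per-method record that interpreter and JIT must agree on. */
struct method {
    entry_fn entry;        /* where callers jump: interpreter or compiled */
    uint32_t invocations;  /* bumped by the interpreter on each call */
    entry_fn compiled;     /* stand-in for JIT output */
};

enum { HOT_THRESHOLD = 3 };  /* arbitrary value for the sketch */

/* Stand-in for compiled code: same result, no counting. */
int32_t compiled_double(struct method *self, int32_t arg)
{
    (void)self;
    return arg * 2;
}

/* Stand-in for the interpreter path: counts calls and, once the
 * method is hot, redirects later callers to the compiled entry. */
int32_t interpreted_double(struct method *self, int32_t arg)
{
    if (++self->invocations >= HOT_THRESHOLD)
        self->entry = self->compiled;
    return arg + arg;
}

/* All calls go through the shared entry pointer. */
int32_t invoke(struct method *m, int32_t arg)
{
    assert(m->entry != 0);
    return m->entry(m, arg);
}
```

With a single execution engine the record can take whatever shape that
engine likes; with pluggable engines its layout and the rules for
updating it become part of the interface between them.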


Personally I'd be just as happy if we only had a JIT. There are
already plenty of interpreters out there.

Tom





--
Andrew C. Oliver
SuperLink Software, Inc.

Java to Excel using POI
http://www.superlinksoftware.com/services/poi
Commercial support including features added/implemented, bugs fixed.




