Subject: Re: Patches to get oprofile to work with perfmon2 on amd64 - msg#00035
List: linux.oprofile
Will,
On Mon, Mar 27, 2006 at 11:09:57AM -0500, William Cohen wrote:
> I have gotten oprofile to make use of the new perfmon2 mechanism to
> collect samples. I currently have this running on my AMD64 laptop. The
> oprof_perfmon2-20060327.diff patches the oprofile user space code and
> perfmon2_oprof20060327.diff is for the kernel. The patches are still
> "work in progress" and there are certainly things that need to be
> corrected. The patches borrow heavily from the previous ia64
> oprofile/perfmon support.
Looking at /arch/i386/oprofile/perfmon.c, it is identical to the
IA-64 version and the experimental i386 version I developed. I think
we can move this format into the generic perfmon code in perfmon/.
This way we only have one version to maintain.
> Due to the different sampling mechanisms that could be used for x86,
> /dev/oprofile/implementation has been added so that the sampling
> mechanism being used can be identified.
>
Yes. I think there are things to do in this area. Perfmon2 does not support
NMI-based sampling. On Itanium there is no NMI. On other architectures,
if I understand correctly, NMI is used because it provides better coverage
of kernel code. An NMI cannot be masked, so you can collect samples
in code sections where interrupts are masked.
Is that the ONLY motivation for this?
> Rather than directly setting up the bits for the performance monitoring
> hardware libpfm is used to map the name to the appropriate bits. For
> processors with complicated constraints on the performance monitoring
> hardware this makes more sense than trying to duplicate the constraints
> mechanism in oprofile.
>
Yes, you could use libpfm to simplify this part of the job. My understanding
is that the logic about events/encodings/constraints already exists
in OProfile. The only missing piece would be how to map the OProfile register
naming scheme to the perfmon2 naming scheme. Using libpfm just for this may look
like overkill in a sense. I need to look at how register names are handled across
the various architectures OProfile supports. Maybe there is a simpler way that
would not introduce a dependency on libpfm.
> Below are issues that still need to be fixed in the various areas of the
> oprofile/perfmon2 monitoring.
> kernel:
> - separating oprofiles processor id code from i386 nmi mechanism setup
> - have oprofile/perfmon2 identify cpu for real (currently just hardwired
> to amd64)
This is something I don't quite understand in OProfile. Why is it that user
code relies on CPU detection done by the OProfile kernel code? The user
code could just as well detect the CPU model (via cpuid or equivalent). If you
assume that the kernel code probes on init and disables itself if the CPU
is not supported, then nothing bad can happen.
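As a rough illustration of the user-space detection suggested above, the
following C sketch reads the vendor string and family/model through the CPUID
instruction. It assumes GCC or Clang on x86 (the <cpuid.h> header and its
__get_cpuid() helper); the function name detect_cpu() is made up for this
example and is not part of OProfile or perfmon2.

```c
#include <assert.h>
#include <string.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Hypothetical helper: fills vendor (13 bytes incl. NUL) and the CPU
 * family/model. Returns 0 on success, -1 if CPUID is not available. */
int detect_cpu(char vendor[13], unsigned *family, unsigned *model)
{
    assert(vendor != NULL && family != NULL && model != NULL);
    vendor[0] = '\0';
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx, edx, base_family;

    /* Leaf 0: the vendor string comes back in EBX, EDX, ECX order. */
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return -1;
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';

    /* Leaf 1: the family/model/stepping signature is in EAX. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    base_family = (eax >> 8) & 0xf;
    *family = base_family;
    *model = (eax >> 4) & 0xf;
    if (base_family == 0xf)
        *family += (eax >> 20) & 0xff;        /* add extended family */
    if (base_family == 0x6 || base_family == 0xf)
        *model |= ((eax >> 16) & 0xf) << 4;   /* prepend extended model */
    return 0;
#else
    return -1;   /* not an x86 build: no CPUID instruction */
#endif
}
```

A tool would still have to translate the vendor/family/model triple into an
OProfile-style name such as "x86-64/hammer"; that table is not shown here.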
> - oprofile always uses perfmon2 if kernel configured with perfmon
I think we have to do this otherwise we may have PMU access conflicts.
> - module installation a bit odd:
> -install oprofile modules
> -opcontrol reads information to determine if perfmon2 used
Yes that makes sense.
> -opcontrol installs appropriate perfmon module
Yes, or it could be builtin.
> - oprofile lies that it needs buffer space (perfmon_get_size()) so
> perfmon2 actually calls oprofile's perfmon_handler()
I fixed that. This was a bug. The format detection code was wrong.
>
> oprofile:
> - make translation of events names to bit patterns more robust:
> can hang if event is not found
> - verify that the event masking support works
> - get rid of fatal_error() function in opd_perfmon.c
> - ophelp get the available events from libpfm when possible
>
> libpfm:
> -make event mapping complete (lots of events missing for various processors)
> -libpfm isn't available on some processors that perfmon supports (e.g.
> p4/ppc64)
Yes, I know that for non Itanium, there are some events missing, sometimes
because of umask combinations.
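On the "can hang if event is not found" item above: one way to make the name
translation fail cleanly is a bounded table lookup that reports a miss to the
caller. The C sketch below is only an illustration under that assumption. The
struct, the function name lookup_event() and the two sample rows are
hypothetical stand-ins for whatever table or library supplies the real data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table entry: an event name and its event-select code. */
struct event_map {
    const char *name;
    unsigned    code;
};

/* Two example AMD64 rows, for demonstration only. */
static const struct event_map example_events[] = {
    { "CPU_CLK_UNHALTED", 0x76 },
    { "RETIRED_INSNS",    0xc0 },
};

/* Returns 0 and stores the encoding in *code, or -1 if the name is unknown.
 * The loop is bounded by the table size, so a miss cannot spin forever. */
int lookup_event(const char *name, unsigned *code)
{
    size_t i;
    size_t n = sizeof(example_events) / sizeof(example_events[0]);

    assert(name != NULL && code != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(example_events[i].name, name) == 0) {
            *code = example_events[i].code;
            return 0;
        }
    }
    return -1;   /* caller decides how to report the unknown event */
}
```

Returning an error code rather than exiting would also help with removing
fatal_error() from opd_perfmon.c, since the caller can print the message.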
Thanks for your patches.
--
-Stephane
Previous Message by Date:
Patches to get oprofile to work with perfmon2 on amd64
I have gotten oprofile to make use of the new perfmon2 mechanism to
collect samples. I currently have this running on my AMD64 laptop. The
oprof_perfmon2-20060327.diff patches the oprofile user space code and
perfmon2_oprof20060327.diff is for the kernel. The patches are still
"work in progress" and there are certainly things that need to be
corrected. The patches borrow heavily from the previous ia64
oprofile/perfmon support.
Due to the different sampling mechanisms that could be used for x86,
/dev/oprofile/implementation has been added so that the sampling
mechanism being used can be identified.
Rather than directly setting up the bits for the performance monitoring
hardware libpfm is used to map the name to the appropriate bits. For
processors with complicated constraints on the performance monitoring
hardware this makes more sense than trying to duplicate the constraints
mechanism in oprofile.
Below are issues that still need to be fixed in the various areas of the
oprofile/perfmon2 monitoring.
kernel:
- separating oprofiles processor id code from i386 nmi mechanism setup
- have oprofile/perfmon2 identify cpu for real (currently just hardwired
to amd64)
- oprofile always uses perfmon2 if kernel configured with perfmon
- module installation a bit odd:
-install oprofile modules
-opcontrol reads information to determine if perfmon2 used
-opcontrol installs appropriate perfmon module
- oprofile lies that it needs buffer space (perfmon_get_size()) so
perfmon2 actually calls oprofile's perfmon_handler()
oprofile:
- make translation of events names to bit patterns more robust:
can hang if event is not found
- verify that the event masking support works
- get rid of fatal_error() function in opd_perfmon.c
- ophelp get the available events from libpfm when possible
libpfm:
-make event mapping complete (lots of events missing for various processors)
-libpfm isn't available on some processors that perfmon supports (e.g.
p4/ppc64)
-Will
--- oprofile-0.9.2-0.20060309-perfmon2/utils/opcontrol.perfmon2 2006-03-18 20:50:11.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/utils/opcontrol 2006-03-23 17:13:28.000000000 -0500
@@ -267,6 +267,14 @@
OP_COUNTERS=`ls $MOUNT/ | grep "^[0-9]\+\$" | tr "\n" " "`
NR_CHOSEN=0
+ OP_IMPLEMENTATION_DIR=$MOUNT/implementation
+ if test -f $OP_IMPLEMENTATION_DIR; then
+ OP_IMPLEMENTATION=`cat $OP_IMPLEMENTATION_DIR`
+ else
+ OP_IMPLEMENTATION="unspecified"
+ fi
+
+
DEFAULT_EVENT=`$OPHELP --get-default-event`
IS_TIMER=0
@@ -274,10 +282,42 @@
if test "$CPUTYPE" = "timer"; then
IS_TIMER=1
else
- case "$CPUTYPE" in
+ case $OP_IMPLEMENTATION in
+ perfmon2)
+ IS_PERFMON=$KERNEL_SUPPORT
+ # need to get the appropriate perfmon module installed
+ # FIXME need to remove them when they are not needed
+ case "$CPUTYPE" in
+ i386/ppro|i386/pii|i386/piii)
+ PERFMON_MOD="perfmon_p6"
+ ;;
+ i386/p6_mobile)
+ PERFMON_MOD="perfmon_pm"
+ ;;
+ #FIXME need to handle em64t
+ i386/p4|i386/p4-ht)
+ PERFMON_MOD="perfmon_p4"
+ ;;
+ i386/athlon|x86-64/hammer)
+ PERFMON_MOD="perfmon_amd"
+ ;;
+ esac
+ modprobe $PERFMON_MOD
+ if test "$?" != "0"; then
+ echo "Unable to load module $PERFMON_MOD."
+ # couldn't load the module
+ exit 1
+ fi
+ ;;
+ unspecified)
+ case "$CPUTYPE" in
ia64/*)
IS_PERFMON=$KERNEL_SUPPORT
;;
+ esac
+ ;;
+ *)
+ ;;
esac
fi
}
--- oprofile-0.9.2-0.20060309-perfmon2/daemon/Makefile.am.perfmon2 2006-03-10 13:35:12.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/daemon/Makefile.am 2006-03-10 13:37:15.000000000 -0500
@@ -25,7 +25,7 @@
opd_anon.h \
opd_anon.c
-LIBS=@POPT_LIBS@ @LIBERTY_LIBS@
+LIBS=@POPT_LIBS@ @LIBERTY_LIBS@ @PFM_LIBS@
AM_CPPFLAGS = \
-I ${top_srcdir}/libabi \
--- oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.c.perfmon2 2006-03-10 13:35:24.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.c 2006-03-10 16:04:36.000000000 -0500
@@ -8,7 +8,7 @@
* @author John Levon
*/
-#ifdef __ia64__
+#if defined( __ia64__) || defined(OPROF_PERFMON2)
/* need this for sched_setaffinity() in <sched.h> */
#define _GNU_SOURCE
@@ -33,6 +33,25 @@
#ifdef HAVE_SCHED_SETAFFINITY
#include <sched.h>
#endif
+#ifdef OPROF_PERFMON2
+#include <perfmon/perfmon.h>
+#include <perfmon/pfmlib.h>
+#endif
+
+/* FIXME fatal_error is just temporary */
+static void fatal_error(char *fmt,...) __attribute__((noreturn));
+
+static void
+fatal_error(char *fmt, ...)
+{
+ va_list ap;
+
+ va_start(ap, fmt);
+ vfprintf(stderr, fmt, ap);
+ va_end(ap);
+
+ exit(1);
+}
extern op_cpu cpu_type;
@@ -63,7 +82,7 @@
}
#endif
-
+#ifndef OPROF_PERFMON2
#ifndef HAVE_PERFMONCTL
#ifndef __NR_perfmonctl
#define __NR_perfmonctl 1175
@@ -74,6 +93,7 @@
return syscall(__NR_perfmonctl, fd, cmd, arg, narg);
}
#endif
+#endif
static unsigned char uuid[16] = {
@@ -97,7 +117,7 @@
static void perfmon_start_child(int ctx_fd)
{
- if (perfmonctl(ctx_fd, PFM_START, 0, 0) == -1) {
+ if (op_pfm_start(ctx_fd, NULL) == -1) {
perror("Couldn't start perfmon: ");
exit(EXIT_FAILURE);
}
@@ -106,7 +126,7 @@
static void perfmon_stop_child(int ctx_fd)
{
- if (perfmonctl(ctx_fd, PFM_STOP, 0, 0) == -1) {
+ if (op_pfm_stop(ctx_fd) == -1) {
perror("Couldn't stop perfmon: ");
exit(EXIT_FAILURE);
}
@@ -149,11 +169,12 @@
static void set_affinity(size_t cpu)
{
cpu_set_t set;
+ int err;
CPU_ZERO(&set);
CPU_SET(cpu, &set);
- int err = sched_setaffinity(getpid(), sizeof(set), &set);
+ err = sched_setaffinity(getpid(), sizeof(set), &set);
if (err == -1) {
fprintf(stderr, "Failed to set affinity: %s\n",
@@ -205,14 +226,18 @@
/** create the per-cpu context */
static void create_context(struct child * self)
{
+#ifdef OPROF_PERFMON2
+ pfarg_ctx_t ctx;
+#else
pfarg_context_t ctx;
+#endif
int err;
- memset(&ctx, 0, sizeof(pfarg_context_t));
+ memset(&ctx, 0, sizeof(ctx));
memcpy(&ctx.ctx_smpl_buf_id, &uuid, 16);
ctx.ctx_flags = PFM_FL_SYSTEM_WIDE;
- err = perfmonctl(0, PFM_CREATE_CONTEXT, &ctx, 1);
+ err = op_pfm_create_context(&ctx);
if (err == -1) {
fprintf(stderr, "CREATE_CONTEXT failed: %s\n",
strerror(errno));
@@ -223,17 +248,39 @@
}
+/* FIXME need to factor out machine specific ia64 stuff */
/** program the perfmon counters */
static void write_pmu(struct child * self)
{
+ int err;
+ size_t i, j;
+#ifndef OPROF_PERFMON2
pfarg_reg_t pc[OP_MAX_COUNTERS];
pfarg_reg_t pd[OP_MAX_COUNTERS];
- int err;
- size_t i;
+#else
+ pfmlib_input_param_t inp;
+ pfmlib_output_param_t outp;
+ pfarg_pmc_t pc[OP_MAX_COUNTERS];
+ pfarg_pmd_t pd[OP_MAX_COUNTERS];
+ pfmlib_options_t pfmlib_options;
+
+ /*
+ * pass options to library (optional)
+ */
+ memset(&pfmlib_options, 0, sizeof(pfmlib_options));
+ pfmlib_options.pfm_debug = 1; /* set to 1 for debug */
+ pfmlib_options.pfm_verbose = 1; /* set to 1 for debug */
+ pfm_set_options(&pfmlib_options);
+
+ memset(&inp,0, sizeof(inp));
+ memset(&outp,0, sizeof(outp));
+#endif /* OPROF_PERFMON2 */
memset(pc, 0, sizeof(pc));
memset(pd, 0, sizeof(pd));
+#ifndef OPROF_PERFMON2
+
#define PMC_GEN_INTERRUPT (1UL << 5)
#define PMC_PRIV_MONITOR (1UL << 6)
/* McKinley requires pmc4 to have bit 23 set (enable PMU).
@@ -257,22 +304,72 @@
pc[i].reg_value &= ~(0xf << 16);
pc[i].reg_value |= ((event->um & 0xf) << 16);
pc[i].reg_smpl_eventid = event->counter;
- }
- for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) {
- struct opd_event * event = &opd_events[i];
pd[i].reg_value = ~0UL - event->count + 1;
pd[i].reg_short_reset = ~0UL - event->count + 1;
pd[i].reg_num = event->counter + 4;
+ pd[i].reg_smpl_eventid = event->counter;
}
+#else
- err = perfmonctl(self->ctx_fd, PFM_WRITE_PMCS, pc, i);
+ /* setup inp */
+ inp.pfp_dfl_plm = PFM_PLM0;
+
+ for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) {
+ struct opd_event * event = &opd_events[i];
+ /* Find the matching event */
+ if (pfm_find_event(event->name, &inp.pfp_events[i].event)
+ != PFMLIB_SUCCESS) {
+ fatal_error("Cannot find %s event\n", event->name);
+ }
+ (event->user) ? (inp.pfp_events[i].plm |= PFM_PLM3)
+ : (inp.pfp_events[i].plm &= ~PFM_PLM3);
+ (event->kernel) ? (inp.pfp_events[i].plm |= PFM_PLM0)
+ : (inp.pfp_events[i].plm &= ~PFM_PLM0);
+
+ /* set to sampling */
+ /* interval between samples */
+ }
+ inp.pfp_event_count = i;
+
+ /* generate outp */
+ err = pfm_dispatch_events(&inp, NULL, &outp, NULL);
+ if (err != PFMLIB_SUCCESS) {
+ fatal_error("cannot configure events: %s\n", pfm_strerror(err));
+
+ exit(EXIT_FAILURE);
+ }
+
+ /* copy outp over */
+ for (i=0; i < outp.pfp_pmc_count; i++) {
+ pc[i].reg_num = outp.pfp_pmcs[i].reg_num;
+ pc[i].reg_value = outp.pfp_pmcs[i].reg_value;
+ }
+
+ /*
+ * figure out pmd mapping from output pmc
+ */
+ for (i=0, j=0; i < inp.pfp_event_count; i++) {
+ struct opd_event * event = &opd_events[i];
+ pd[i].reg_num = outp.pfp_pmcs[j].reg_pmd_num;
+ for(; j < outp.pfp_pmc_count; j++) if (outp.pfp_pmcs[j].reg_evt_idx != i) break;
+ /* fill out the rest of the information pmd */
+ pd[i].reg_smpl_pmds[0] = 0;
+ pd[i].reg_flags |= PFM_REGFL_OVFL_NOTIFY;
+ pd[i].reg_reset_pmds[0] = 0;
+ pd[i].reg_value = - event->count;
+ pd[i].reg_short_reset = - event->count;
+ pd[i].reg_long_reset = - event->count;
+ }
+#endif
+
+ err = op_pfm_write_pmcs(self->ctx_fd, pc, i);
if (err == -1) {
perror("Couldn't write PMCs: ");
exit(EXIT_FAILURE);
}
- err = perfmonctl(self->ctx_fd, PFM_WRITE_PMDS, pd, i);
+ err = op_pfm_write_pmds(self->ctx_fd, pd, i);
if (err == -1) {
perror("Couldn't write PMDs: ");
exit(EXIT_FAILURE);
@@ -288,7 +385,7 @@
memset(&load_args, 0, sizeof(load_args));
load_args.load_pid = self->pid;
- err = perfmonctl(self->ctx_fd, PFM_LOAD_CONTEXT, &load_args, 1);
+ err = op_pfm_load_context(self->ctx_fd, &load_args);
if (err == -1) {
perror("Couldn't load context: ");
exit(EXIT_FAILURE);
@@ -316,6 +413,11 @@
{
struct child * self = &children[cpu];
+ if (pfm_initialize() != PFMLIB_SUCCESS) {
+ printf("Can't initialize library\n");
+ exit(1);
+ }
+
self->pid = getpid();
self->sigusr1 = 0;
self->sigusr2 = 0;
@@ -461,4 +563,4 @@
kill(children[i].pid, SIGUSR2);
}
-#endif /* __ia64__ */
+#endif /* defined(__ia64__) || defined(OPROF_PERFMON2) */
--- oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.h.perfmon2 2006-03-10 13:35:34.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.h 2006-03-18 21:15:35.000000000 -0500
@@ -11,7 +11,7 @@
#ifndef OPD_PERFMON_H
#define OPD_PERFMON_H
-#ifdef __ia64__
+#if defined(__ia64__) || defined(OPROF_PERFMON2)
#include <stdlib.h>
@@ -20,6 +20,8 @@
void perfmon_start(void);
void perfmon_stop(void);
+#if (!defined(OPROF_PERFMON2))
+
/* The following is from asm/perfmon.h. When it's installed on
* enough boxes, we can remove this and include the platform
* perfmon.h
@@ -80,6 +82,53 @@
#define PFM_LOAD_CONTEXT 0x10
#define PFM_FL_SYSTEM_WIDE 0x02
+/* wrapper to allow older perfmon interface to be used */
+/* FIXME need to be set correctly for older perfmon */
+#define op_pfm_create_context(ctx) perfmonctl(0, PFM_CREATE_CONTEXT, ctx, 1)
+#define op_pfm_write_pmcs(fd, pmcs, count) \
+ perfmonctl(fd, PFM_WRITE_PMCS, pmcs, count)
+#define op_pfm_write_pmds(fd, pmds, count) \
+ perfmonctl(fd, PFM_WRITE_PMDS, pmds, count)
+#define op_pfm_read_pmds(fd, pmds, count) \
+ perfmonctl(fd, PFM_READ_PMDS, pmds, count)
+#define op_pfm_load_context(fd, load) \
+ perfmonctl(fd, PFM_LOAD_CONTEXT, load, 1)
+#define op_pfm_start(fd, start) \
+ perfmonctl(fd, PFM_START, start, 1)
+#define op_pfm_stop(fd) \
+ perfmonctl(fd, PFM_STOP, NULL, 0)
+#define op_pfm_restart(fd) \
+ perfmonctl(fd, PFM_RESTART, NULL, 0)
+#define op_pfm_create_evtsets(fd, setd, count) \
+ perfmonctl(fd, PFM_CREATE_EVTSETS, setd, count)
+#define op_pfm_getinfo_evtsets(fd, info, count) \
+ perfmonctl(fd, PFM_GETINOF, info, count)
+#define op_pfm_delete_evtsets(fd, setd, count) \
+ perfmonctl(fd, PFM_DELETE_EVTSETS, setd, count)
+#define op_pfm_unload_context(fd) \
+ perfmonctl(fd, PFM_UNLOAD_CONTEXT, NULL, 0)
+
+#else
+
+/* wrapper to allow older perfmon interface to be used */
+#define op_pfm_create_context(ctx) pfm_create_context(ctx, NULL, 0)
+#define op_pfm_write_pmcs(fd, pmcs, count) pfm_write_pmcs(fd, pmcs, count)
+#define op_pfm_write_pmds(fd, pmds, count) pfm_write_pmds(fd, pmds, count)
+#define op_pfm_read_pmds(fd, pmds, count) pfm_read_pmds(fd, pmds, count)
+#define op_pfm_load_context(fd, load) pfm_load_context(fd, load)
+#define op_pfm_start(fd, start) pfm_start(fd, start)
+#define op_pfm_stop(fd) pfm_stop(fd)
+#define op_pfm_restart(fd) pfm_restart(fd)
+#define op_pfm_create_evtsets(fd, setd, count) \
+ pfm_create_evtsets(fd, setd, count)
+#define op_pfm_getinfo_evtsets(fd, info, count) \
+ pfm_getinfo_evtsets(fd, info, count)
+#define op_pfm_delete_evtsets(fd, setd, count) \
+ pfm_delete_evtsets(fd, setd, count)
+#define op_pfm_unload_context(fd) pfm_unload_context(fd)
+
+#endif /* (!defined(OPROF_PERFMON2)) */
+
#else
void perfmon_init(void)
@@ -101,6 +150,6 @@
{
}
-#endif /* __ia64__ */
+#endif /* defined(__ia64__) || defined(OPROF_PERFMON2) */
#endif /* OPD_PERFMON_H */
--- oprofile-0.9.2-0.20060309-perfmon2/configure.in.perfmon2 2006-03-10 13:35:04.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/configure.in 2006-03-10 13:36:41.000000000 -0500
@@ -133,6 +133,18 @@
AC_SUBST(BFD_LIBS)
AC_SUBST(POPT_LIBS)
+dnl enable option to use perfmon use on processors other than ia64
+AC_ARG_ENABLE(perfmon2,
+ [ --enable-perfmon2 enable option for perfmon2 use on non-ia64 processors (default is disabled)],
+ enable_perfmon2=$enableval, enable_perfmon2=no)
+if test "$enable_perfmon2" = yes; then
+ AC_CHECK_LIB(pfm, pfm_start,, AC_MSG_ERROR([pfm library not found]))
+ PFM_LIBS="-lpfm"
+ AC_SUBST(PFM_LIBS)
+ AX_CFLAGS_OPTION(OP_CFLAGS,[-DOPROF_PERFMON2])
+ AX_CXXFLAGS_OPTION(OP_CXXFLAGS,[-DOPROF_PERFMON2])
+fi
+
# do NOT put tests here, they will fail in the case X is not installed !
AM_CONDITIONAL(have_qt, test -n "$QT_LIB")
--- linux-2.6.16-perfmon2/drivers/oprofile/timer_int.c.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/drivers/oprofile/timer_int.c 2006-03-23 10:53:01.000000000 -0500
@@ -43,4 +43,5 @@
ops->start = timer_start;
ops->stop = timer_stop;
ops->cpu_type = "timer";
+ ops->implementation = "timer";
}
--- linux-2.6.16-perfmon2/drivers/oprofile/oprofile_files.c.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/drivers/oprofile/oprofile_files.c 2006-03-23 10:53:01.000000000 -0500
@@ -65,13 +65,24 @@
{
return oprofilefs_str_to_user(oprofile_ops.cpu_type, buf, count, offset);
}
-
-
+
+
static struct file_operations cpu_type_fops = {
.read = cpu_type_read,
};
-
-
+
+
+static ssize_t implementation(struct file * file, char __user * buf, size_t count, loff_t * offset)
+{
+ return oprofilefs_str_to_user(oprofile_ops.implementation, buf, count, offset);
+}
+
+
+static struct file_operations implementation_fops = {
+ .read = implementation,
+};
+
+
static ssize_t enable_read(struct file * file, char __user * buf, size_t count, loff_t * offset)
{
return oprofilefs_ulong_to_user(oprofile_started, buf, count, offset);
@@ -126,7 +137,8 @@
oprofilefs_create_ulong(sb, root, "buffer_size", &fs_buffer_size);
oprofilefs_create_ulong(sb, root, "buffer_watershed", &fs_buffer_watershed);
oprofilefs_create_ulong(sb, root, "cpu_buffer_size", &fs_cpu_buffer_size);
- oprofilefs_create_file(sb, root, "cpu_type", &cpu_type_fops);
+ oprofilefs_create_file(sb, root, "cpu_type", &cpu_type_fops);
+ oprofilefs_create_file(sb, root, "implementation", &implementation_fops);
oprofilefs_create_file(sb, root, "backtrace_depth", &depth_fops);
oprofilefs_create_file(sb, root, "pointer_size", &pointer_size_fops);
oprofile_create_stats_files(sb, root);
--- linux-2.6.16-perfmon2/arch/x86_64/oprofile/Makefile.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/x86_64/oprofile/Makefile 2006-03-23 10:53:01.000000000 -0500
@@ -15,5 +15,6 @@
OPROFILE-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_athlon.o op_model_p4.o \
op_model_ppro.o
OPROFILE-$(CONFIG_X86_IO_APIC) += nmi_timer_int.o
+OPROFILE-$(CONFIG_PERFMON) += perfmon.o
oprofile-y = $(DRIVER_OBJS) $(addprefix ../../i386/oprofile/, $(OPROFILE-y))
--- linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_int.c.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_int.c 2006-03-23 10:53:01.000000000 -0500
@@ -415,6 +415,7 @@
ops->start = nmi_start;
ops->stop = nmi_stop;
ops->cpu_type = cpu_type;
+ ops->implementation = "oprofile";
printk(KERN_INFO "oprofile: using NMI interrupt.\n");
return 0;
}
--- linux-2.6.16-perfmon2/arch/i386/oprofile/Makefile.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/Makefile 2006-03-23 10:53:02.000000000 -0500
@@ -10,3 +10,4 @@
oprofile-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_athlon.o \
op_model_ppro.o op_model_p4.o
oprofile-$(CONFIG_X86_IO_APIC) += nmi_timer_int.o
+oprofile-$(CONFIG_PERFMON) += perfmon.o
--- linux-2.6.16-perfmon2/arch/i386/oprofile/init.c.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/init.c 2006-03-23 10:53:02.000000000 -0500
@@ -15,8 +15,10 @@
* with the NMI mode driver.
*/
+extern int op_perfmon_init(struct oprofile_operations * ops);
extern int op_nmi_init(struct oprofile_operations * ops);
extern int op_nmi_timer_init(struct oprofile_operations * ops);
+extern void op_perfmon_exit(void);
extern void op_nmi_exit(void);
extern void x86_backtrace(struct pt_regs * const regs, unsigned int depth);
@@ -27,8 +29,12 @@
ret = -ENODEV;
+#ifdef CONFIG_PERFMON
+ ret = op_perfmon_init(ops);
+#endif
#ifdef CONFIG_X86_LOCAL_APIC
- ret = op_nmi_init(ops);
+ if (ret < 0)
+ ret = op_nmi_init(ops);
#endif
#ifdef CONFIG_X86_IO_APIC
if (ret < 0)
@@ -42,6 +48,9 @@
void oprofile_arch_exit(void)
{
+#ifdef CONFIG_PERFMON
+ op_perfmon_exit();
+#endif
#ifdef CONFIG_X86_LOCAL_APIC
op_nmi_exit();
#endif
--- /dev/null 2006-03-27 09:20:43.000437500 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/perfmon.c 2006-03-27 09:54:16.000000000 -0500
@@ -0,0 +1,116 @@
+/**
+ * @file perfmon.c
+ *
+ * @remark Copyright 2003 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon <levon@xxxxxxxxxxxxxxxxx>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/config.h>
+#include <linux/oprofile.h>
+#include <linux/sched.h>
+#include <linux/perfmon.h>
+#include <asm/ptrace.h>
+#include <asm/errno.h>
+
+static int allow_ints;
+
+static int
+perfmon_get_size(unsigned int flags, void *data, size_t *size)
+{
+ /* This is just a dummy size. OProfile uses its own buffer
+ for the time being. */
+ *size = sizeof (int);
+
+ return 0;
+}
+
+static int
+perfmon_handler(void *buf, struct pfm_ovfl_arg *arg,
+ unsigned long ip, u64 stamp, void *data)
+{
+ int event = arg->pmd_eventid;
+ struct pt_regs * const regs = (struct pt_regs *) data;
+
+ PFM_DBG_ovfl("oprofile overflow ip=%lx, event=%d",
+ instruction_pointer(regs), event);
+
+ arg->ovfl_ctrl = PFM_OVFL_CTRL_RESET;
+
+ /* the owner of the oprofile event buffer may have exited
+ * without perfmon being shutdown (e.g. SIGSEGV)
+ */
+ if (allow_ints)
+ oprofile_add_sample(regs, event);
+ return 0;
+}
+
+
+static int perfmon_start(void)
+{
+ allow_ints = 1;
+ return 0;
+}
+
+
+static void perfmon_stop(void)
+{
+ allow_ints = 0;
+}
+
+
+#define OPROFILE_FMT_UUID { \
+ 0x77, 0x7a, 0x6e, 0x61, 0x20, 0x65, 0x73, 0x69, \
+ 0x74, 0x6e, 0x72, 0x20, 0x61, 0x65, 0x0a, 0x6c \
+}
+
+static struct pfm_smpl_fmt oprofile_fmt = {
+ .fmt_name = "oprofile_format",
+ .fmt_uuid = OPROFILE_FMT_UUID,
+ .fmt_getsize = perfmon_get_size,
+ .fmt_handler = perfmon_handler,
+ .fmt_flags = PFM_FMT_BUILTIN_FLAG,
+ .owner = THIS_MODULE,
+};
+
+
+static char * get_cpu_type(void)
+{
+ /* FIXME: right now just dummied up for amd64.
+ This will need to do the right thing for the
+ various x86 processors.
+ */
+ return "x86-64/hammer";
+}
+
+
+/* all the ops are handled via userspace for i386 oprofile using perfmon */
+
+static int using_perfmon;
+
+int __init op_perfmon_init(struct oprofile_operations * ops)
+{
+ int ret = pfm_register_smpl_fmt(&oprofile_fmt);
+ if (ret)
+ return -ENODEV;
+
+ ops->cpu_type = get_cpu_type();
+ ops->start = perfmon_start;
+ ops->stop = perfmon_stop;
+ ops->implementation = "perfmon2";
+ using_perfmon = 1;
+ printk(KERN_INFO "oprofile: using perfmon.\n");
+ return 0;
+}
+
+
+void __exit op_perfmon_exit(void)
+{
+ if (!using_perfmon)
+ return;
+
+ pfm_unregister_smpl_fmt(oprofile_fmt.fmt_uuid);
+}
--- linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_timer_int.c.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_timer_int.c 2006-03-23 10:53:02.000000000 -0500
@@ -50,6 +50,7 @@
ops->start = timer_start;
ops->stop = timer_stop;
ops->cpu_type = "timer";
+ ops->implementation = "nmi_timer";
printk(KERN_INFO "oprofile: using NMI timer interrupt.\n");
return 0;
}
--- linux-2.6.16-perfmon2/include/linux/oprofile.h.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/include/linux/oprofile.h 2006-03-23 10:53:02.000000000 -0500
@@ -39,6 +39,8 @@
void (*backtrace)(struct pt_regs * const regs, unsigned int depth);
/* CPU identification string. */
char * cpu_type;
+ /* Sampling mechanism identification string. */
+ char * implementation;
};
/**
Next Message by Date:
Re: Patches to get oprofile to work with perfmon2 on amd64
Stephane Eranian wrote:
> Will,
> On Mon, Mar 27, 2006 at 11:09:57AM -0500, William Cohen wrote:
> > I have gotten oprofile to make use of the new perfmon2 mechanism to
> > collect samples. I currently have this running on my AMD64 laptop. The
> > oprof_perfmon2-20060327.diff patches the oprofile user space code and
> > perfmon2_oprof20060327.diff is for the kernel. The patches are still
> > "work in progress" and there are certainly things that need to be
> > corrected. The patches borrow heavily from the previous ia64
> > oprofile/perfmon support.
> Looking at /arch/i386/oprofile/perfmon.c, it is identical to the
> IA-64 version and the experimental i386 version I developed. I think
> we can move this format into the generic perfmon code in perfmon/.
> This way we only have one version to maintain.
Yes, the changes for /arch/i386/oprofile/perfmon.c were pretty
straightforward and would be the same for other architectures. Factoring
out the code and making it common to the platforms is reasonable.
> > Due to the different sampling mechanisms that could be used for x86,
> > /dev/oprofile/implementation has been added so that the sampling
> > mechanism being used can be identified.
> Yes. I think there are things to do in this area. Perfmon2 does not support
> NMI-based sampling. On Itanium there is no NMI. On other architectures,
> if I understand correctly, NMI is used because it provides better coverage
> of kernel code. An NMI cannot be masked, so you can collect samples
> in code sections where interrupts are masked.
> Is that the ONLY motivation for this?
Depending on which kernel someone is using, the same oprofile code for i386
and x86-64 platforms could use either the original oprofile or perfmon2
to access the performance monitoring hardware. It seemed easiest to have
/dev/oprofile contain a file that explicitly states the mechanism being
used. This could also be used by GUIs and other tools to directly
determine the profiling mechanism. I wanted to avoid inferring the mechanism
in use by looking at a bunch of files.
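For a tool that wants to check the mechanism directly, the lookup could be as
small as the C sketch below. It mirrors the fallback in the opcontrol patch
(report "unspecified" when the file is absent, as on an unpatched kernel). The
function name read_implementation() is invented for this example, and the path
is passed in by the caller rather than assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads the first line of `path` into buf. Falls back to "unspecified"
 * when the file is missing or empty, as the opcontrol patch does. */
void read_implementation(const char *path, char *buf, size_t len)
{
    FILE *fp;

    assert(path != NULL && buf != NULL && len >= sizeof("unspecified"));
    fp = fopen(path, "r");
    if (fp == NULL || fgets(buf, (int)len, fp) == NULL)
        snprintf(buf, len, "unspecified");
    if (fp != NULL)
        fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';   /* strip a trailing newline, if any */
}
```

With the kernel patch applied and oprofilefs mounted on /dev/oprofile, calling
it on /dev/oprofile/implementation would be expected to yield one of the
strings the patch sets: "perfmon2", "oprofile", "nmi_timer" or "timer".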
The native OProfile driver on x86-64 and i386 uses the NMI. This does
allow sampling in IRQ routines. However, we need to make sure that the
amount of time spent in the NMI handler is limited. Using the NMI
routine appears to cause problems on some machines (e.g. laptops where
the NMI could happen when the BIOS is doing some power management
operation).
Is there some idea of the overhead in the perfmon2 timer interval and
sampling mechanisms?
> > Rather than directly setting up the bits for the performance monitoring
> > hardware libpfm is used to map the name to the appropriate bits. For
> > processors with complicated constraints on the performance monitoring
> > hardware this makes more sense than trying to duplicate the constraints
> > mechanism in oprofile.
> Yes, you could use libpfm to simplify this part of the job. My understanding
> is that the logic about events/encodings/constraints already exists
> in OProfile. The only missing piece would be how to map the OProfile register
> naming scheme to the perfmon2 naming scheme. Using libpfm just for this may look
> like overkill in a sense. I need to look at how register names are handled across
> the various architectures OProfile supports. Maybe there is a simpler way that
> would not introduce a dependency on libpfm.
OProfile has event and unit_mask files for each of the supported
architectures in /usr/share/oprofile/{arch}/{model}. For example, the
x86-64 amd64 machine would use the event and unit_mask files in
/usr/share/oprofile/x86-64/hammer.
The constraints are much more complicated for the Pentium 4 and
Power processors. I would expect that libpfm will be able to do a better
job there, once support is in libpfm for them. For the Pentium 4, OProfile
made a number of simplifications and reduced the available counters to 8
independent counters on non-HT processors and 4 independent counters on HT
processors. There are also tagging events that are not handled by
OProfile's mechanism. The Power (ppc64) processors' event selection
mechanism is relatively complex. OProfile does have events for it, but it
isn't ideal.
The goal here is to factor out the event mapping logic and have it in
one place.
> > Below are issues that still need to be fixed in the various areas of the
> > oprofile/perfmon2 monitoring.
> > kernel:
> > - separating oprofiles processor id code from i386 nmi mechanism setup
> > - have oprofile/perfmon2 identify cpu for real (currently just hardwired
> > to amd64)
> This is something I don't quite understand in OProfile. Why is it that user
> code relies on CPU detection done by the OProfile kernel code? The user
> code could just as well detect the CPU model (via cpuid or equivalent). If you
> assume that the kernel code probes on init and disables itself if the CPU
> is not supported, then nothing bad can happen.
The cpu identification is required for two purposes:
1) figure out how the oprofile module accesses the performance
monitoring hardware. There are different methods of accessing the
performance monitoring registers in ppro/p2/p3, p4, and athlon.
2) the user space needs to get the correct list of events to map event
names to numbers and unit masks.
The user space could find out the cpuid on its own, but the oprofile
native driver has to determine the information anyway.
How would perfmon2 tools handle the case of multiple
architectures? Do the cpuid in user space and modprobe the appropriate
module? What happens if an attempt is made to load the wrong perfmon
kernel module? Is there a check in the initialization to make sure that it
will work on the processor?
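For reference, the module choice the opcontrol patch makes from the cpu type
string can be written as a plain lookup. The C sketch below copies the mapping
from the patch's case statement; the function name perfmon_module_for() is
invented here. Unlike the shell version, it returns NULL for an unmatched cpu
type, so a caller can skip modprobe instead of invoking it with an empty
module name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mapping taken from the case statement in the opcontrol patch. */
static const struct {
    const char *cpu_type;
    const char *module;
} perfmon_modules[] = {
    { "i386/ppro",      "perfmon_p6"  },
    { "i386/pii",       "perfmon_p6"  },
    { "i386/piii",      "perfmon_p6"  },
    { "i386/p6_mobile", "perfmon_pm"  },
    { "i386/p4",        "perfmon_p4"  },
    { "i386/p4-ht",     "perfmon_p4"  },
    { "i386/athlon",    "perfmon_amd" },
    { "x86-64/hammer",  "perfmon_amd" },
};

/* Returns the perfmon module name for an OProfile cpu type string,
 * or NULL when the cpu type has no entry in the table. */
const char *perfmon_module_for(const char *cpu_type)
{
    size_t i;
    size_t n = sizeof(perfmon_modules) / sizeof(perfmon_modules[0]);

    assert(cpu_type != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(perfmon_modules[i].cpu_type, cpu_type) == 0)
            return perfmon_modules[i].module;
    }
    return NULL;   /* unknown CPU: do not attempt to load anything */
}
```

EM64T is still missing from the table, matching the FIXME in the patch.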
> > - oprofile always uses perfmon2 if kernel configured with perfmon
> I think we have to do this otherwise we may have PMU access conflicts.
I was thinking about the case where someone would prefer to use one of
the other sampling mechanisms, e.g. the nmi or timer mechanism. In
OProfile you can force the timer mechanism to be used.
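To make the trade-off concrete, here is a small user-space C model of the
selection order in the init.c patch (perfmon2 first, then NMI, then timer)
with an optional override. It is only a sketch: struct backend and
select_backend() are hypothetical, and the model ignores the PMU access
conflict raised above, which a real override of perfmon2 by the NMI driver
would have to resolve.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe result for one sampling mechanism. In the kernel the
 * `available` flag would come from op_perfmon_init(), op_nmi_init(), etc. */
struct backend {
    const char *name;
    int available;
};

/* Returns the first available backend in priority order. If `forced` is
 * non-NULL, only a backend with that name is accepted. Returns NULL when
 * nothing usable matches. */
const char *select_backend(const struct backend *b, size_t n,
                           const char *forced)
{
    size_t i;

    assert(b != NULL);
    for (i = 0; i < n; i++) {
        if (!b[i].available)
            continue;
        if (forced == NULL || strcmp(b[i].name, forced) == 0)
            return b[i].name;
    }
    return NULL;
}
```

The names used for the backends can be the same strings the patch stores in
ops->implementation, so the result could be reported unchanged through the
implementation file.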
- module installation a bit odd:
-install oprofile modules
-opcontrol reads information to determine if perfmon2 used
Yes that makes sense.
-opcontrol install appropropriate perfmon module
Yes, or it could be builtin.
Has perfmon2 built-in been verified to work with multiple architectures?
Don't want to have different kernels for EM64T and AMD64 or P6, Pentium
M, P4.
Is there some way of identifying that perfmon2 is available on the
machine? Right now the oprofile/perfmon2 patch assumes it is always a
module.
- oprofile lies that it needs buffer space (perfmon_get_size()) so
perfmon2 actually calls oprofile's perfmon_handler()
I fixed that. This was a bug. The format detection code was wrong.
Excellent.
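As a rough illustration of the user-space CPU detection discussed in the kernel items above, the sketch below decodes family, model and stepping from a raw CPUID leaf 1 EAX value. The struct and function names are made up for this example and are not part of oprofile or perfmon2. On x86 with GCC or Clang the raw value could presumably be obtained with __get_cpuid() from <cpuid.h>; it is passed in as a parameter here so the decoding itself stays portable.

```c
#include <stdint.h>

/* Hypothetical container for the fields of the CPUID leaf 1 EAX value. */
struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/*
 * Decode a raw CPUID(1).EAX value. The caller supplies the raw value,
 * e.g. from __get_cpuid(1, &eax, &ebx, &ecx, &edx) on x86.
 */
static struct cpu_signature decode_cpu_signature(uint32_t eax)
{
    struct cpu_signature sig;
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model = (eax >> 4) & 0xf;
    unsigned ext_family = (eax >> 20) & 0xff;
    unsigned ext_model = (eax >> 16) & 0xf;

    sig.stepping = eax & 0xf;
    /* The extended family is only added when the base family is 0xf. */
    sig.family = (base_family == 0xf) ? base_family + ext_family
                                      : base_family;
    /* The extended model only applies to base families 6 and 0xf. */
    sig.model = (base_family == 0x6 || base_family == 0xf)
                    ? (ext_model << 4) | base_model
                    : base_model;
    return sig;
}
```

User space would still need a table that maps vendor, family and model to an event list, which is the part the kernel-provided cpu_type string supplies today.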
oprofile:
- make translation of events names to bit patterns more robust:
can hang if event is not found
- verify that the event masking support works
- get rid of fatal_error() function in opd_perfmon.c
- ophelp get the available events from libpfm when possible
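One possible way to make the event-to-register translation fail cleanly instead of looping is sketched below. It is only an illustration: struct pmc_entry is a hypothetical stand-in for the per-PMC output that libpfm returns, and map_events_to_pmds() is not a function from the patch. Every loop is bounded by an explicit count, and a missing event is reported through the return value.

```c
#include <stddef.h>

/* Hypothetical stand-in for one PMC entry in the libpfm output. */
struct pmc_entry {
    unsigned evt_idx;   /* index of the event this PMC was programmed for */
    unsigned pmd_num;   /* data register paired with this PMC */
};

/*
 * For each of n_events events, store the PMD number of the first PMC
 * entry programmed for that event. Returns 0 on success, or -1 if some
 * event has no PMC entry, so the caller can report an error.
 */
static int map_events_to_pmds(const struct pmc_entry *pmcs, size_t n_pmcs,
                              size_t n_events, unsigned *pmd_nums)
{
    size_t i, j;

    for (i = 0; i < n_events; i++) {
        int found = 0;

        for (j = 0; j < n_pmcs; j++) {
            if (pmcs[j].evt_idx == i) {
                pmd_nums[i] = pmcs[j].pmd_num;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
    }
    return 0;
}
```

The quadratic scan is acceptable here because the number of counters is small; the point is only that an unknown event ends in an error code rather than a hang.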
libpfm:
-make event mapping complete (lots of events missing for various processors)
-libpfm isn't available on some processors that perfmon supports (e.g.
p4/ppc64)
Yes, I know that for non Itanium, there are some events missing, sometimes
because of umask combinations.
Thanks for your patches.
Thanks for perfmon2.
-Will
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
Previous Message by Thread:
Patches to get oprofile to work with perfmon2 on amd64
I have gotten oprofile to make use of the new perfmon2 mechanism to
collect samples. I currently have this running on my AMD64 laptop. The
oprof_perfmon2-20060327.diff patches the oprofile user space code and
perfmon2_oprof20060327.diff is for the kernel. The patches are still
"work in progress" and there are certainly things that need to be
corrected. The patches borrow heavily from the previous ia64
oprofile/perfmon support.
Due to the different sampling mechanisms that could be used for x86,
/dev/oprofile/implementation has been added so that the sampling
mechanism in use can be identified.
Rather than directly setting up the bits for the performance monitoring
hardware libpfm is used to map the name to the appropriate bits. For
processors with complicated constraints on the performance monitoring
hardware this makes more sense than trying to duplicate the constraints
mechanism in oprofile.
Below are issues that still need to be fixed in the various areas of the
oprofile/perfmon2 monitoring.
kernel:
- separating oprofile's processor id code from i386 nmi mechanism setup
- have oprofile/perfmon2 identify cpu for real (currently just hardwired
to amd64)
- oprofile always uses perfmon2 if kernel configured with perfmon
- module installation a bit odd:
-install oprofile modules
-opcontrol reads information to determine if perfmon2 used
-opcontrol installs the appropriate perfmon module
- oprofile lies that it needs buffer space (perfmon_get_size()) so
perfmon2 actually calls oprofile's perfmon_handler()
oprofile:
- make translation of events names to bit patterns more robust:
can hang if event is not found
- verify that the event masking support works
- get rid of fatal_error() function in opd_perfmon.c
- ophelp get the available events from libpfm when possible
libpfm:
-make event mapping complete (lots of events missing for various processors)
-libpfm isn't available on some processors that perfmon supports (e.g.
p4/ppc64)
-Will
--- oprofile-0.9.2-0.20060309-perfmon2/utils/opcontrol.perfmon2 2006-03-18 20:50:11.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/utils/opcontrol 2006-03-23 17:13:28.000000000 -0500
@@ -267,6 +267,14 @@
OP_COUNTERS=`ls $MOUNT/ | grep "^[0-9]\+\$" | tr "\n" " "`
NR_CHOSEN=0
+ OP_IMPLEMENTATION_DIR=$MOUNT/implementation
+ if test -f $OP_IMPLEMENTATION_DIR; then
+ OP_IMPLEMENTATION=`cat $OP_IMPLEMENTATION_DIR`
+ else
+ OP_IMPLEMENTATION="unspecified"
+ fi
+
+
DEFAULT_EVENT=`$OPHELP --get-default-event`
IS_TIMER=0
@@ -274,10 +282,42 @@
if test "$CPUTYPE" = "timer"; then
IS_TIMER=1
else
- case "$CPUTYPE" in
+ case $OP_IMPLEMENTATION in
+ perfmon2)
+ IS_PERFMON=$KERNEL_SUPPORT
+ # need to get the appropriate perfmon module installed
+ # FIXME need to remove them when they are not needed
+ case "$CPUTYPE" in
+ i386/ppro|i386/pii|i386/piii)
+ PERFMON_MOD="perfmon_p6"
+ ;;
+ i386/p6_mobile)
+ PERFMON_MOD="perfmon_pm"
+ ;;
+ #FIXME need to handle em64t
+ i386/p4|i386/p4-ht)
+ PERFMON_MOD="perfmon_p4"
+ ;;
+ i386/athlon|x86-64/hammer)
+ PERFMON_MOD="perfmon_amd"
+ ;;
+ esac
+ modprobe $PERFMON_MOD
+ if test "$?" != "0"; then
+ echo "Unable to load module $PERFMON_MOD."
+ # couldn't load the module
+ exit 1
+ fi
+ ;;
+ unspecified)
+ case "$CPUTYPE" in
ia64/*)
IS_PERFMON=$KERNEL_SUPPORT
;;
+ esac
+ ;;
+ *)
+ ;;
esac
fi
}
--- oprofile-0.9.2-0.20060309-perfmon2/daemon/Makefile.am.perfmon2 2006-03-10 13:35:12.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/daemon/Makefile.am 2006-03-10 13:37:15.000000000 -0500
@@ -25,7 +25,7 @@
opd_anon.h \
opd_anon.c
-LIBS=@POPT_LIBS@ @LIBERTY_LIBS@
+LIBS=@POPT_LIBS@ @LIBERTY_LIBS@ @PFM_LIBS@
AM_CPPFLAGS = \
-I ${top_srcdir}/libabi \
--- oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.c.perfmon2 2006-03-10 13:35:24.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.c 2006-03-10 16:04:36.000000000 -0500
@@ -8,7 +8,7 @@
* @author John Levon
*/
-#ifdef __ia64__
+#if defined( __ia64__) || defined(OPROF_PERFMON2)
/* need this for sched_setaffinity() in <sched.h> */
#define _GNU_SOURCE
@@ -33,6 +33,25 @@
#ifdef HAVE_SCHED_SETAFFINITY
#include <sched.h>
#endif
+#ifdef OPROF_PERFMON2
+#include <perfmon/perfmon.h>
+#include <perfmon/pfmlib.h>
+#endif
+
+/* FIXME fatal_error is just temporary */
+static void fatal_error(char *fmt,...) __attribute__((noreturn));
+
+static void
+fatal_error(char *fmt, ...)
+{
+ va_list ap;
+
+ va_start(ap, fmt);
+ vfprintf(stderr, fmt, ap);
+ va_end(ap);
+
+ exit(1);
+}
extern op_cpu cpu_type;
@@ -63,7 +82,7 @@
}
#endif
-
+#ifndef OPROF_PERFMON2
#ifndef HAVE_PERFMONCTL
#ifndef __NR_perfmonctl
#define __NR_perfmonctl 1175
@@ -74,6 +93,7 @@
return syscall(__NR_perfmonctl, fd, cmd, arg, narg);
}
#endif
+#endif
static unsigned char uuid[16] = {
@@ -97,7 +117,7 @@
static void perfmon_start_child(int ctx_fd)
{
- if (perfmonctl(ctx_fd, PFM_START, 0, 0) == -1) {
+ if (op_pfm_start(ctx_fd, NULL) == -1) {
perror("Couldn't start perfmon: ");
exit(EXIT_FAILURE);
}
@@ -106,7 +126,7 @@
static void perfmon_stop_child(int ctx_fd)
{
- if (perfmonctl(ctx_fd, PFM_STOP, 0, 0) == -1) {
+ if (op_pfm_stop(ctx_fd) == -1) {
perror("Couldn't stop perfmon: ");
exit(EXIT_FAILURE);
}
@@ -149,11 +169,12 @@
static void set_affinity(size_t cpu)
{
cpu_set_t set;
+ int err;
CPU_ZERO(&set);
CPU_SET(cpu, &set);
- int err = sched_setaffinity(getpid(), sizeof(set), &set);
+ err = sched_setaffinity(getpid(), sizeof(set), &set);
if (err == -1) {
fprintf(stderr, "Failed to set affinity: %s\n",
@@ -205,14 +226,18 @@
/** create the per-cpu context */
static void create_context(struct child * self)
{
+#ifdef OPROF_PERFMON2
+ pfarg_ctx_t ctx;
+#else
pfarg_context_t ctx;
+#endif
int err;
- memset(&ctx, 0, sizeof(pfarg_context_t));
+ memset(&ctx, 0, sizeof(ctx));
memcpy(&ctx.ctx_smpl_buf_id, &uuid, 16);
ctx.ctx_flags = PFM_FL_SYSTEM_WIDE;
- err = perfmonctl(0, PFM_CREATE_CONTEXT, &ctx, 1);
+ err = op_pfm_create_context(&ctx);
if (err == -1) {
fprintf(stderr, "CREATE_CONTEXT failed: %s\n",
strerror(errno));
@@ -223,17 +248,39 @@
}
+/* FIXME need to factor out machine specific ia64 stuff */
/** program the perfmon counters */
static void write_pmu(struct child * self)
{
+ int err;
+ size_t i, j;
+#ifndef OPROF_PERFMON2
pfarg_reg_t pc[OP_MAX_COUNTERS];
pfarg_reg_t pd[OP_MAX_COUNTERS];
- int err;
- size_t i;
+#else
+ pfmlib_input_param_t inp;
+ pfmlib_output_param_t outp;
+ pfarg_pmc_t pc[OP_MAX_COUNTERS];
+ pfarg_pmd_t pd[OP_MAX_COUNTERS];
+ pfmlib_options_t pfmlib_options;
+
+ /*
+ * pass options to library (optional)
+ */
+ memset(&pfmlib_options, 0, sizeof(pfmlib_options));
+ pfmlib_options.pfm_debug = 1; /* set to 1 for debug */
+ pfmlib_options.pfm_verbose = 1; /* set to 1 for debug */
+ pfm_set_options(&pfmlib_options);
+
+ memset(&inp,0, sizeof(inp));
+ memset(&outp,0, sizeof(outp));
+#endif /* OPROF_PERFMON2 */
memset(pc, 0, sizeof(pc));
memset(pd, 0, sizeof(pd));
+#ifndef OPROF_PERFMON2
+
#define PMC_GEN_INTERRUPT (1UL << 5)
#define PMC_PRIV_MONITOR (1UL << 6)
/* McKinley requires pmc4 to have bit 23 set (enable PMU).
@@ -257,22 +304,72 @@
pc[i].reg_value &= ~(0xf << 16);
pc[i].reg_value |= ((event->um & 0xf) << 16);
pc[i].reg_smpl_eventid = event->counter;
- }
- for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) {
- struct opd_event * event = &opd_events[i];
pd[i].reg_value = ~0UL - event->count + 1;
pd[i].reg_short_reset = ~0UL - event->count + 1;
pd[i].reg_num = event->counter + 4;
+ pd[i].reg_smpl_eventid = event->counter;
}
+#else
- err = perfmonctl(self->ctx_fd, PFM_WRITE_PMCS, pc, i);
+ /* setup inp */
+ inp.pfp_dfl_plm = PFM_PLM0;
+
+ for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) {
+ struct opd_event * event = &opd_events[i];
+ /* Find the matching event */
+ if (pfm_find_event(event->name, &inp.pfp_events[i].event)
+ != PFMLIB_SUCCESS) {
+ fatal_error("Cannot find %s event\n", event->name);
+ }
+ (event->user) ? (inp.pfp_events[i].plm |= PFM_PLM3)
+ : (inp.pfp_events[i].plm &= ~PFM_PLM3);
+ (event->kernel) ? (inp.pfp_events[i].plm |= PFM_PLM0)
+ : (inp.pfp_events[i].plm &= ~PFM_PLM0);
+
+ /* set to sampling */
+ /* interval between samples */
+ }
+ inp.pfp_event_count = i;
+
+ /* generate outp */
+ err = pfm_dispatch_events(&inp, NULL, &outp, NULL);
+ if (err != PFMLIB_SUCCESS) {
+ fatal_error("cannot configure events: %s\n", pfm_strerror(err));
+
+ exit(EXIT_FAILURE);
+ }
+
+ /* copy outp over */
+ for (i=0; i < outp.pfp_pmc_count; i++) {
+ pc[i].reg_num = outp.pfp_pmcs[i].reg_num;
+ pc[i].reg_value = outp.pfp_pmcs[i].reg_value;
+ }
+
+ /*
+ * figure out pmd mapping from output pmc
+ */
+ for (i=0, j=0; i < inp.pfp_event_count; i++) {
+ struct opd_event * event = &opd_events[i];
+ pd[i].reg_num = outp.pfp_pmcs[j].reg_pmd_num;
+ for(; j < outp.pfp_pmc_count; j++) if (outp.pfp_pmcs[j].reg_evt_idx != i) break;
+ /* fill out the rest of the information pmd */
+ pd[i].reg_smpl_pmds[0] = 0;
+ pd[i].reg_flags |= PFM_REGFL_OVFL_NOTIFY;
+ pd[i].reg_reset_pmds[0] = 0;
+ pd[i].reg_value = - event->count;
+ pd[i].reg_short_reset = - event->count;
+ pd[i].reg_long_reset = - event->count;
+ }
+#endif
+
+ err = op_pfm_write_pmcs(self->ctx_fd, pc, i);
if (err == -1) {
perror("Couldn't write PMCs: ");
exit(EXIT_FAILURE);
}
- err = perfmonctl(self->ctx_fd, PFM_WRITE_PMDS, pd, i);
+ err = op_pfm_write_pmds(self->ctx_fd, pd, i);
if (err == -1) {
perror("Couldn't write PMDs: ");
exit(EXIT_FAILURE);
@@ -288,7 +385,7 @@
memset(&load_args, 0, sizeof(load_args));
load_args.load_pid = self->pid;
- err = perfmonctl(self->ctx_fd, PFM_LOAD_CONTEXT, &load_args, 1);
+ err = op_pfm_load_context(self->ctx_fd, &load_args);
if (err == -1) {
perror("Couldn't load context: ");
exit(EXIT_FAILURE);
@@ -316,6 +413,11 @@
{
struct child * self = &children[cpu];
+ if (pfm_initialize() != PFMLIB_SUCCESS) {
+ printf("Can't initialize library\n");
+ exit(1);
+ }
+
self->pid = getpid();
self->sigusr1 = 0;
self->sigusr2 = 0;
@@ -461,4 +563,4 @@
kill(children[i].pid, SIGUSR2);
}
-#endif /* __ia64__ */
+#endif /* defined(__ia64__) || defined(OPROF_PERFMON2) */
--- oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.h.perfmon2 2006-03-10 13:35:34.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.h 2006-03-18 21:15:35.000000000 -0500
@@ -11,7 +11,7 @@
#ifndef OPD_PERFMON_H
#define OPD_PERFMON_H
-#ifdef __ia64__
+#if defined(__ia64__) || defined(OPROF_PERFMON2)
#include <stdlib.h>
@@ -20,6 +20,8 @@
void perfmon_start(void);
void perfmon_stop(void);
+#if (!defined(OPROF_PERFMON2))
+
/* The following is from asm/perfmon.h. When it's installed on
* enough boxes, we can remove this and include the platform
* perfmon.h
@@ -80,6 +82,53 @@
#define PFM_LOAD_CONTEXT 0x10
#define PFM_FL_SYSTEM_WIDE 0x02
+/* wrapper to allow older perfmon interface to be used */
+/* FIXME need to be set correctly for older perfmon */
+#define op_pfm_create_context(ctx) perfmonctl(0, PFM_CREATE_CONTEXT, ctx, 1)
+#define op_pfm_write_pmcs(fd, pmcs, count) \
+ perfmonctl(fd, PFM_WRITE_PMCS, pmcs, count)
+#define op_pfm_write_pmds(fd, pmds, count) \
+ perfmonctl(fd, PFM_WRITE_PMDS, pmds, count)
+#define op_pfm_read_pmds(fd, pmds, count) \
+ perfmonctl(fd, PFM_READ_PMDS, pmds, count)
+#define op_pfm_load_context(fd, load) \
+ perfmonctl(fd, PFM_LOAD_CONTEXT, load, 1)
+#define op_pfm_start(fd, start) \
+ perfmonctl(fd, PFM_START, start, 1)
+#define op_pfm_stop(fd) \
+ perfmonctl(fd, PFM_STOP, NULL, 0)
+#define op_pfm_restart(fd) \
+ perfmonctl(fd, PFM_RESTART, NULL, 0)
+#define op_pfm_create_evtsets(fd, setd, count) \
+ perfmonctl(fd, PFM_CREATE_EVTSETS, setd, count)
+#define op_pfm_getinfo_evtsets(fd, info, count) \
+ perfmonctl(fd, PFM_GETINOF, info, count)
+#define op_pfm_delete_evtsets(fd, setd, count) \
+ perfmonctl(fd, PFM_DELETE_EVTSETS, setd, count)
+#define op_pfm_unload_context(fd) \
+ perfmonctl(fd, PFM_UNLOAD_CONTEXT, NULL, 0)
+
+#else
+
+/* wrapper to allow older perfmon interface to be used */
+#define op_pfm_create_context(ctx) pfm_create_context(ctx, NULL, 0)
+#define op_pfm_write_pmcs(fd, pmcs, count) pfm_write_pmcs(fd, pmcs, count)
+#define op_pfm_write_pmds(fd, pmds, count) pfm_write_pmds(fd, pmds, count)
+#define op_pfm_read_pmds(fd, pmds, count) pfm_read_pmds(fd, pmds, count)
+#define op_pfm_load_context(fd, load) pfm_load_context(fd, load)
+#define op_pfm_start(fd, start) pfm_start(fd, start)
+#define op_pfm_stop(fd) pfm_stop(fd)
+#define op_pfm_restart(fd) pfm_restart(fd)
+#define op_pfm_create_evtsets(fd, setd, count) \
+ pfm_create_evtsets(fd, setd, count)
+#define op_pfm_getinfo_evtsets(fd, info, count) \
+ pfm_getinfo_evtsets(fd, info, count)
+#define op_pfm_delete_evtsets(fd, setd, count) \
+ pfm_delete_evtsets(fd, setd, count)
+#define op_pfm_unload_context(fd) pfm_unload_context(fd)
+
+#endif /* (!defined(OPROF_PERFMON2)) */
+
#else
void perfmon_init(void)
@@ -101,6 +150,6 @@
{
}
-#endif /* __ia64__ */
+#endif /* defined(__ia64__) || defined(OPROF_PERFMON2) */
#endif /* OPD_PERFMON_H */
--- oprofile-0.9.2-0.20060309-perfmon2/configure.in.perfmon2 2006-03-10 13:35:04.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/configure.in 2006-03-10 13:36:41.000000000 -0500
@@ -133,6 +133,18 @@
AC_SUBST(BFD_LIBS)
AC_SUBST(POPT_LIBS)
+dnl enable option to use perfmon use on processors other than ia64
+AC_ARG_ENABLE(perfmon2,
+ [ --enable-perfmon2 enable option for perfmon2 use on non-ia64 processors (default is disabled)],
+ enable_perfmon2=$enableval, enable_perfmon2=no)
+if test "$enable_perfmon2" = yes; then
+ AC_CHECK_LIB(pfm, pfm_start,, AC_MSG_ERROR([pfm library not found]))
+ PFM_LIBS="-lpfm"
+ AC_SUBST(PFM_LIBS)
+ AX_CFLAGS_OPTION(OP_CFLAGS,[-DOPROF_PERFMON2])
+ AX_CXXFLAGS_OPTION(OP_CXXFLAGS,[-DOPROF_PERFMON2])
+fi
+
# do NOT put tests here, they will fail in the case X is not installed !
AM_CONDITIONAL(have_qt, test -n "$QT_LIB")
--- linux-2.6.16-perfmon2/drivers/oprofile/timer_int.c.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/drivers/oprofile/timer_int.c 2006-03-23 10:53:01.000000000 -0500
@@ -43,4 +43,5 @@
ops->start = timer_start;
ops->stop = timer_stop;
ops->cpu_type = "timer";
+ ops->implementation = "timer";
}
--- linux-2.6.16-perfmon2/drivers/oprofile/oprofile_files.c.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/drivers/oprofile/oprofile_files.c 2006-03-23 10:53:01.000000000 -0500
@@ -65,13 +65,24 @@
{
return oprofilefs_str_to_user(oprofile_ops.cpu_type, buf, count, offset);
}
-
-
+
+
static struct file_operations cpu_type_fops = {
.read = cpu_type_read,
};
-
-
+
+
+static ssize_t implementation(struct file * file, char __user * buf, size_t count, loff_t * offset)
+{
+ return oprofilefs_str_to_user(oprofile_ops.implementation, buf, count, offset);
+}
+
+
+static struct file_operations implementation_fops = {
+ .read = implementation,
+};
+
+
static ssize_t enable_read(struct file * file, char __user * buf, size_t count, loff_t * offset)
{
return oprofilefs_ulong_to_user(oprofile_started, buf, count, offset);
@@ -126,7 +137,8 @@
oprofilefs_create_ulong(sb, root, "buffer_size", &fs_buffer_size);
oprofilefs_create_ulong(sb, root, "buffer_watershed", &fs_buffer_watershed);
oprofilefs_create_ulong(sb, root, "cpu_buffer_size", &fs_cpu_buffer_size);
- oprofilefs_create_file(sb, root, "cpu_type", &cpu_type_fops);
+ oprofilefs_create_file(sb, root, "cpu_type", &cpu_type_fops);
+ oprofilefs_create_file(sb, root, "implementation", &implementation_fops);
oprofilefs_create_file(sb, root, "backtrace_depth", &depth_fops);
oprofilefs_create_file(sb, root, "pointer_size", &pointer_size_fops);
oprofile_create_stats_files(sb, root);
--- linux-2.6.16-perfmon2/arch/x86_64/oprofile/Makefile.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/x86_64/oprofile/Makefile 2006-03-23 10:53:01.000000000 -0500
@@ -15,5 +15,6 @@
OPROFILE-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_athlon.o op_model_p4.o \
op_model_ppro.o
OPROFILE-$(CONFIG_X86_IO_APIC) += nmi_timer_int.o
+OPROFILE-$(CONFIG_PERFMON) += perfmon.o
oprofile-y = $(DRIVER_OBJS) $(addprefix ../../i386/oprofile/, $(OPROFILE-y))
--- linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_int.c.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_int.c 2006-03-23 10:53:01.000000000 -0500
@@ -415,6 +415,7 @@
ops->start = nmi_start;
ops->stop = nmi_stop;
ops->cpu_type = cpu_type;
+ ops->implementation = "oprofile";
printk(KERN_INFO "oprofile: using NMI interrupt.\n");
return 0;
}
--- linux-2.6.16-perfmon2/arch/i386/oprofile/Makefile.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/Makefile 2006-03-23 10:53:02.000000000 -0500
@@ -10,3 +10,4 @@
oprofile-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_athlon.o \
op_model_ppro.o op_model_p4.o
oprofile-$(CONFIG_X86_IO_APIC) += nmi_timer_int.o
+oprofile-$(CONFIG_PERFMON) += perfmon.o
--- linux-2.6.16-perfmon2/arch/i386/oprofile/init.c.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/init.c 2006-03-23 10:53:02.000000000 -0500
@@ -15,8 +15,10 @@
* with the NMI mode driver.
*/
+extern int op_perfmon_init(struct oprofile_operations * ops);
extern int op_nmi_init(struct oprofile_operations * ops);
extern int op_nmi_timer_init(struct oprofile_operations * ops);
+extern void op_perfmon_exit(void);
extern void op_nmi_exit(void);
extern void x86_backtrace(struct pt_regs * const regs, unsigned int depth);
@@ -27,8 +29,12 @@
ret = -ENODEV;
+#ifdef CONFIG_PERFMON
+ ret = op_perfmon_init(ops);
+#endif
#ifdef CONFIG_X86_LOCAL_APIC
- ret = op_nmi_init(ops);
+ if (ret < 0)
+ ret = op_nmi_init(ops);
#endif
#ifdef CONFIG_X86_IO_APIC
if (ret < 0)
@@ -42,6 +48,9 @@
void oprofile_arch_exit(void)
{
+#ifdef CONFIG_PERFMON
+ op_perfmon_exit();
+#endif
#ifdef CONFIG_X86_LOCAL_APIC
op_nmi_exit();
#endif
--- /dev/null 2006-03-27 09:20:43.000437500 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/perfmon.c 2006-03-27 09:54:16.000000000 -0500
@@ -0,0 +1,116 @@
+/**
+ * @file perfmon.c
+ *
+ * @remark Copyright 2003 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon <levon@xxxxxxxxxxxxxxxxx>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/config.h>
+#include <linux/oprofile.h>
+#include <linux/sched.h>
+#include <linux/perfmon.h>
+#include <asm/ptrace.h>
+#include <asm/errno.h>
+
+static int allow_ints;
+
+static int
+perfmon_get_size(unsigned int flags, void *data, size_t *size)
+{
+ /* This is just a dummy size. OProfile uses its own buffer
+ for the time being. */
+ *size = sizeof (int);
+
+ return 0;
+}
+
+static int
+perfmon_handler(void *buf, struct pfm_ovfl_arg *arg,
+ unsigned long ip, u64 stamp, void *data)
+{
+ int event = arg->pmd_eventid;
+ struct pt_regs * const regs = (struct pt_regs *) data;
+
+ PFM_DBG_ovfl("oprofile overflow ip=%lx, event=%d",
+ instruction_pointer(regs), event);
+
+ arg->ovfl_ctrl = PFM_OVFL_CTRL_RESET;
+
+ /* the owner of the oprofile event buffer may have exited
+ * without perfmon being shutdown (e.g. SIGSEGV)
+ */
+ if (allow_ints)
+ oprofile_add_sample(regs, event);
+ return 0;
+}
+
+
+static int perfmon_start(void)
+{
+ allow_ints = 1;
+ return 0;
+}
+
+
+static void perfmon_stop(void)
+{
+ allow_ints = 0;
+}
+
+
+#define OPROFILE_FMT_UUID { \
+ 0x77, 0x7a, 0x6e, 0x61, 0x20, 0x65, 0x73, 0x69, \
+ 0x74, 0x6e, 0x72, 0x20, 0x61, 0x65, 0x0a, 0x6c \
+}
+
+static struct pfm_smpl_fmt oprofile_fmt = {
+ .fmt_name = "oprofile_format",
+ .fmt_uuid = OPROFILE_FMT_UUID,
+ .fmt_getsize = perfmon_get_size,
+ .fmt_handler = perfmon_handler,
+ .fmt_flags = PFM_FMT_BUILTIN_FLAG,
+ .owner = THIS_MODULE,
+};
+
+
+static char * get_cpu_type(void)
+{
+ /* FIXME: right now just dummied up for amd64.
+ This will need to do the right thing for the
+ various x86 processors.
+ */
+ return "x86-64/hammer";
+}
+
+
+/* all the ops are handled via userspace for i386 oprofile using perfmon */
+
+static int using_perfmon;
+
+int __init op_perfmon_init(struct oprofile_operations * ops)
+{
+ int ret = pfm_register_smpl_fmt(&oprofile_fmt);
+ if (ret)
+ return -ENODEV;
+
+ ops->cpu_type = get_cpu_type();
+ ops->start = perfmon_start;
+ ops->stop = perfmon_stop;
+ ops->implementation = "perfmon2";
+ using_perfmon = 1;
+ printk(KERN_INFO "oprofile: using perfmon.\n");
+ return 0;
+}
+
+
+void __exit op_perfmon_exit(void)
+{
+ if (!using_perfmon)
+ return;
+
+ pfm_unregister_smpl_fmt(oprofile_fmt.fmt_uuid);
+}
--- linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_timer_int.c.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_timer_int.c 2006-03-23 10:53:02.000000000 -0500
@@ -50,6 +50,7 @@
ops->start = timer_start;
ops->stop = timer_stop;
ops->cpu_type = "timer";
+ ops->implementation = "nmi_timer";
printk(KERN_INFO "oprofile: using NMI timer interrupt.\n");
return 0;
}
--- linux-2.6.16-perfmon2/include/linux/oprofile.h.orig 2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/include/linux/oprofile.h 2006-03-23 10:53:02.000000000 -0500
@@ -39,6 +39,8 @@
void (*backtrace)(struct pt_regs * const regs, unsigned int depth);
/* CPU identification string. */
char * cpu_type;
+ /* Sampling method identification string. */
+ char * implementation;
};
/**
Next Message by Thread:
Re: Patches to get oprofile to work with perfmon2 on amd64
Stephane Eranian wrote:
Will,
On Mon, Mar 27, 2006 at 11:09:57AM -0500, William Cohen wrote:
I have gotten oprofile to make use of the new perfmon2 mechanism to
collect samples. I currently have this running on my AMD64 laptop. The
oprof_perfmon2-20060327.diff patches the oprofile user space code and
perfmon2_oprof20060327.diff is for the kernel. The patches are still
"work in progress" and there are certainly things that need to be
corrected. The patches borrow heavily from the previous ia64
oprofile/perfmon support.
Looking at /arch/i386/oprofile/perfmon.c, it is identical to the
IA-64 version and the experimental i386 version I developed. I think
we can move this format into the generic perfmon code in perfmon/.
This way we only have one version to maintain.
Yes, the changes for /arch/i386/oprofile/perfmon.c were pretty
straightforward and would be the same for other architectures. Factoring
out the code and making it common to the platforms is reasonable.
Due to the different sampling mechanisms that could be used for x86,
/dev/oprofile/implementation has been added so that the sampling
mechanism in use can be identified.
Yes. I think there are things to do in this area. Perfmon2 does not support
NMI-based sampling. On Itanium there is no NMI. On other architectures,
if I understand clearly, NMI is used because it provides better coverage
of kernel code. NMI cannot be masked therefore you can collect samples
in code sections where interrupts are masked.
Is that the ONLY motivation for this?
Depending on which kernel someone is using, the same oprofile code for
i386 and x86-64 platforms could use either the original oprofile
mechanism or perfmon2 to access the performance monitoring hardware. It
seemed easiest to have /dev/oprofile contain a file that explicitly
states the mechanism being used. This could also be used by GUIs and
other tools to directly determine the profiling mechanism. I wanted to
avoid inferring the mechanism in use by looking at a bunch of files.
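A tool that wants to use this file could read it roughly as sketched below. This is an assumption-laden example rather than code from the patch: the function name read_implementation() is invented here, and the fallback string "unspecified" simply mirrors what the opcontrol change does when the file does not exist.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the sampling mechanism name from `path` into `buf` (of `size`
 * bytes, which should be large enough for the fallback string). Falls
 * back to "unspecified" when the file is missing or empty.
 */
static const char *read_implementation(const char *path, char *buf,
                                       size_t size)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL || fgets(buf, (int)size, fp) == NULL)
        snprintf(buf, size, "unspecified");
    if (fp != NULL)
        fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';   /* drop a trailing newline, if any */
    return buf;
}
```

A caller would pass the path of the new oprofilefs file, for example "/dev/oprofile/implementation", and compare the result against strings such as "perfmon2", "oprofile", "timer" or "nmi_timer" that the kernel patch sets.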
The native OProfile driver on x86-64 and i386 uses the NMI. This does
allow sampling in IRQ routines. However, we need to make sure that the
amount of time spent in the NMI handler is limited. Using the NMI
routine appears to cause problems on some machines (e.g. laptops where
the NMI could happen when the BIOS is doing some power management
operation).
Is there some idea of the overhead in the perfmon2 timer interval and
sampling mechanisms?
Rather than directly setting up the bits for the performance monitoring
hardware libpfm is used to map the name to the appropriate bits. For
processors with complicated constraints on the performance monitoring
hardware this makes more sense than trying to duplicate the constraints
mechanism in oprofile.
Yes, you could use libpfm to simplify this part of the job. My understanding
here is that there is already that logic about events/encodings/constraints
in OProfile. The only missing piece would be how to map the OProfile register naming
scheme to the perfmon2 naming scheme. Using libpfm just for this may look
like overkill in a sense. I need to look at how register names are handled across
the various architectures OProfile supports. Maybe there is a simpler way that
would not introduce a dependency on libpfm.
OProfile has event and unit_mask files for each of the supported
architectures in /usr/share/oprofile/{arch}/{model}. For example the
x86-64 amd64 machine would use the event and unit_mask files in
/usr/share/oprofile/x86-64/hammer.
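To make the layout concrete, here is a small sketch of how a tool might turn the cpu_type string into the path of one of those files. The helper name build_event_path() is hypothetical, and the file name passed in (e.g. "events") is an assumption about how the files are named on disk, not something this message states.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<base>/<cpu_type>/<file>" into buf. Returns 0 on success, or -1
 * if cpu_type is empty or the result would not fit in `size` bytes.
 */
static int build_event_path(const char *base, const char *cpu_type,
                            const char *file, char *buf, size_t size)
{
    int n;

    if (strlen(cpu_type) == 0)
        return -1;
    n = snprintf(buf, size, "%s/%s/%s", base, cpu_type, file);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With base "/usr/share/oprofile" and cpu_type "x86-64/hammer", the directory part matches the example path above.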
The constraints are much more complicated for the Pentium 4 and
Power processors. I would expect that libpfm will be able to do a better
job there, once support is in libpfm for them. For the Pentium 4, OProfile
made a number of simplifications and reduced the available counters to 8
independent counters on non-ht processors and 4 independent counters on ht
processors. There are also tagging events that are not handled by
OProfile's mechanism. The Power (ppc64) processors' event selection
mechanism is relatively complex. OProfile does have events for it, but it
isn't ideal.
The goal here is to factor out the event mapping logic and have it in
one place.