osdir.com
mailing list archive

Subject: Re: Patches to get oprofile to work with perfmon2 on amd64 - msg#00035

List: linux.oprofile

Will,

On Mon, Mar 27, 2006 at 11:09:57AM -0500, William Cohen wrote:
> I have gotten oprofile to make use of the new perfmon2 mechanism to
> collect samples. I currently have this running on my AMD64 laptop. The
> oprof_perfmon2-20060327.diff patches the oprofile user space code and
> perfmon2_oprof20060327.diff is for the kernel. The patches are still
> "work in progress" and there are certainly things that need to be
> corrected. The patches borrow heavily from the previous ia64
> oprofile/perfmon support.

Looking at /arch/i386/oprofile/perfmon.c, it is identical to the
IA-64 version and the experimental i386 version I developed. I think
we can move this format into the generic perfmon code in perfmon/.
This way we only have one version to maintain.

> Due to the different sampling mechanisms that could be used for x86,
> /dev/oprofile/implement has been added so that tools can identify
> which mechanism is being used to collect the samples.
>

Yes. I think there are things to do in this area. Perfmon2 does not support
NMI-based sampling. On Itanium there is no NMI. On other architectures,
if I understand correctly, NMI is used because it provides better coverage
of kernel code. An NMI cannot be masked, so you can collect samples
in code sections where interrupts are masked.

Is that the ONLY motivation for this?

> Rather than directly setting up the bits for the performance monitoring
> hardware libpfm is used to map the name to the appropriate bits. For
> processors with complicated constraints on the performance monitoring
> hardware this makes more sense than trying to duplicate the constraints
> mechanism in oprofile.
>

Yes, you could use libpfm to simplify this part of the job. My understanding
here is that the logic for events/encodings/constraints already exists
in OProfile. The only missing piece would be how to map the OProfile register
naming scheme to the perfmon2 naming scheme. Using libpfm just for this may look
like overkill. I need to look at how register names are handled across
the various architectures OProfile supports. Maybe there is a simpler way that
would not introduce a dependency on libpfm.
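
To give an idea of what such a mapping could look like, here is a minimal
sketch. All names in it (op_pfm_regmap, op_counter_to_pfm) are made up for
illustration and exist in neither OProfile nor perfmon2. The ia64 PMD offset
follows the existing daemon code (counter + 4); the ia64 PMC offset and the
x86 entries are assumptions on my part and would need checking per model.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: none of these names exist in OProfile or perfmon2. */
struct op_pfm_regmap {
    const char *cpu_prefix; /* leading part of the OProfile cpu_type string */
    int pmc_base;           /* perfmon2 PMC number used for OProfile counter 0 */
    int pmd_base;           /* perfmon2 PMD number used for OProfile counter 0 */
};

static const struct op_pfm_regmap op_pfm_regmaps[] = {
    { "ia64/",   4, 4 }, /* PMD: counter + 4 as in the ia64 daemon code; PMC assumed equal */
    { "x86-64/", 0, 0 }, /* assumption: registers numbered from 0 */
    { "i386/",   0, 0 }, /* assumption: registers numbered from 0 */
};

/* Translate an OProfile counter index into perfmon2 register numbers.
 * Returns 0 on success, -1 for an unknown CPU type or a negative counter. */
int op_counter_to_pfm(const char *cpu_type, int counter, int *pmc, int *pmd)
{
    size_t i;

    if (cpu_type == NULL || counter < 0)
        return -1;
    for (i = 0; i < sizeof(op_pfm_regmaps) / sizeof(op_pfm_regmaps[0]); i++) {
        const struct op_pfm_regmap *m = &op_pfm_regmaps[i];

        if (strncmp(cpu_type, m->cpu_prefix, strlen(m->cpu_prefix)) == 0) {
            *pmc = m->pmc_base + counter;
            *pmd = m->pmd_base + counter;
            return 0;
        }
    }
    return -1;
}
```

A per-architecture offset is probably too simple for processors such as the
P4, where counters and configuration registers do not pair up one to one, so
this only shows the easy case.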


> Below are issues that still need to be fixed in the various areas of the
> oprofile/perfmon2 monitoring.

> kernel:
> - separating oprofiles processor id code from i386 nmi mechanism setup
> - have oprofile/perfmon2 identify cpu for real (currently just hardwired
> to amd64)

This is something I don't quite understand in OProfile. Why does user
code rely on CPU detection done by the OProfile kernel code? The user
code could just as well detect the CPU model (via cpuid or equivalent). If you
assume that the kernel code probes on init and disables itself if the CPU
is not supported, then nothing bad can happen.
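
As a rough sketch of user-level detection on x86, assuming GCC or clang and
their <cpuid.h> helper __get_cpuid(): the op_* function names are
hypothetical, and the only table entry filled in is the "x86-64/hammer" case
that the current patch hardwires. A real version would need entries for every
supported model.

```c
#include <stddef.h>
#include <string.h>

#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
#include <cpuid.h>
#define OP_HAVE_CPUID 1
#endif

/* Rebuild the 12-character vendor string from CPUID leaf 0.
 * The characters are stored in EBX, EDX, ECX, in that order. */
void op_cpuid_vendor(unsigned ebx, unsigned ecx, unsigned edx, char out[13])
{
    const unsigned regs[3] = { ebx, edx, ecx };
    int r, b;

    for (r = 0; r < 3; r++)
        for (b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}

/* Family from CPUID leaf 1 EAX: the base family, plus the extended
 * family field when the base family is 0xf. */
unsigned op_cpuid_family(unsigned eax)
{
    unsigned family = (eax >> 8) & 0xf;

    if (family == 0xf)
        family += (eax >> 20) & 0xff;
    return family;
}

/* Hypothetical mapping to an OProfile cpu_type string. */
const char *op_guess_cpu_type(const char *vendor, unsigned family)
{
    if (strcmp(vendor, "AuthenticAMD") == 0 && family == 15)
        return "x86-64/hammer";
    return NULL; /* not in the table: the caller should refuse to run */
}

/* Probe the running CPU. Returns NULL when CPUID is not available in
 * this build or the model is not in the table above. */
const char *op_detect_cpu_type(void)
{
#ifdef OP_HAVE_CPUID
    unsigned eax, ebx, ecx, edx;
    char vendor[13];

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return NULL;
    op_cpuid_vendor(ebx, ecx, edx, vendor);
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return NULL;
    return op_guess_cpu_type(vendor, op_cpuid_family(eax));
#else
    return NULL;
#endif
}
```

Returning NULL for anything unknown keeps the behavior I describe above: if
neither side recognizes the CPU, nothing gets programmed.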

> - oprofile always uses perfmon2 if kernel configured with perfmon

I think we have to do this; otherwise we may have PMU access conflicts.
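
The point is that exactly one mechanism should end up owning the PMU. A
user-space sketch of that selection, mirroring the order your init.c change
uses (perfmon first, then NMI, then the timer): struct op_backend,
op_select_backend and the two probe stubs are invented for this example, and
only the implementation strings come from your patch.

```c
#include <errno.h>
#include <stddef.h>

struct op_backend {
    const char *implementation; /* string to report, e.g. "perfmon2" */
    int (*init)(void);          /* 0 on success, negative errno on failure */
};

/* Try each backend in priority order and stop at the first one that
 * initializes, so that only one of them ever programs the PMU. */
const char *op_select_backend(const struct op_backend *backends, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (backends[i].init != NULL && backends[i].init() == 0)
            return backends[i].implementation;
    return NULL;
}

/* Stand-in probes for the example; real ones would touch the hardware. */
static int probe_absent(void)  { return -ENODEV; }
static int probe_present(void) { return 0; }
```

With a table of { "perfmon2", "oprofile", "nmi_timer" } entries, the NMI
driver is only reached when the perfmon probe fails, which is the behavior
needed to avoid two users of the same counters.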

> - module installation a bit odd:
> -install oprofile modules
> -opcontrol reads information to determine if perfmon2 used

Yes that makes sense.

> -opcontrol installs the appropriate perfmon module

Yes, or it could be builtin.
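
As I read the opcontrol part of your patch, the module choice boils down to
a lookup like the one below. The CPU type and module names are the ones from
your shell code; the C function op_perfmon_module() is only my illustration
of the same table, not existing code. EM64T is still missing, as your FIXME
already notes.

```c
#include <stddef.h>
#include <string.h>

struct op_pfm_module {
    const char *cpu_type; /* OProfile cpu_type string */
    const char *module;   /* perfmon2 PMU description module to load */
};

/* Same pairs as the case statement in the opcontrol patch. */
static const struct op_pfm_module op_pfm_modules[] = {
    { "i386/ppro",      "perfmon_p6"  },
    { "i386/pii",       "perfmon_p6"  },
    { "i386/piii",      "perfmon_p6"  },
    { "i386/p6_mobile", "perfmon_pm"  },
    { "i386/p4",        "perfmon_p4"  },
    { "i386/p4-ht",     "perfmon_p4"  },
    { "i386/athlon",    "perfmon_amd" },
    { "x86-64/hammer",  "perfmon_amd" },
};

/* Return the module to load, or NULL when perfmon2 is not the active
 * implementation or the CPU type has no entry in the table. */
const char *op_perfmon_module(const char *implementation, const char *cpu_type)
{
    size_t i;

    if (implementation == NULL || cpu_type == NULL)
        return NULL;
    if (strcmp(implementation, "perfmon2") != 0)
        return NULL;
    for (i = 0; i < sizeof(op_pfm_modules) / sizeof(op_pfm_modules[0]); i++)
        if (strcmp(cpu_type, op_pfm_modules[i].cpu_type) == 0)
            return op_pfm_modules[i].module;
    return NULL;
}
```

Returning NULL for an unknown CPU type lets the caller skip the modprobe
instead of running it with an empty module name.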

> - oprofile lies that it needs buffer space (perfmon_get_size()) so
> perfmon2 actually calls oprofile's perfmon_handler()

I fixed that. This was a bug. The format detection code was wrong.


>
> oprofile:
> - make translation of events names to bit patterns more robust:
> can hang if event is not found
> - verify that the event masking support works
> - get rid of fatal_error() function in opd_perfmon.c
> - ophelp get the available events from libpfm when possible
>
> libpfm:
> -make event mapping complete (lots of events missing for various processors)
> -libpfm isn't available on some processors that perfmon supports (e.g.
> p4/ppc64)

Yes, I know that for non-Itanium processors there are some events missing,
sometimes because of umask combinations.
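
To make the umask remark concrete: in the P6-style event-select layout, which
AMD64 (K8) shares for these fields, the unit mask is an 8-bit field next to
the event code, so every valid combination of unit-mask bits gives a
different register value. The sketch below only shows that encoding; the
macro and function names are mine and are not taken from libpfm or OProfile.

```c
#include <stdint.h>

/* Bit positions in a P6-style event-select register. */
#define OP_EVTSEL_USR (1u << 16) /* count at user level */
#define OP_EVTSEL_OS  (1u << 17) /* count at kernel level */
#define OP_EVTSEL_INT (1u << 20) /* raise an interrupt on overflow */
#define OP_EVTSEL_EN  (1u << 22) /* enable the counter */

/* Build an event-select value for sampling: event code in bits 0-7,
 * unit mask in bits 8-15, privilege levels as requested, interrupt on
 * overflow and enable always set. */
uint32_t op_evtsel_encode(uint8_t event, uint8_t umask, int user, int kernel)
{
    uint32_t value = event;

    value |= (uint32_t)umask << 8;
    if (user)
        value |= OP_EVTSEL_USR;
    if (kernel)
        value |= OP_EVTSEL_OS;
    value |= OP_EVTSEL_INT | OP_EVTSEL_EN;
    return value;
}
```

An event table that lists only some unit-mask values for an event cannot
express the remaining combinations, which is one way events end up looking
"missing".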

Thanks for your patches.

--
-Stephane


Previous Message by Thread:

Patches to get oprofile to work with perfmon2 on amd64

I have gotten oprofile to make use of the new perfmon2 mechanism to
collect samples. I currently have this running on my AMD64 laptop. The
oprof_perfmon2-20060327.diff patches the oprofile user space code and
perfmon2_oprof20060327.diff is for the kernel. The patches are still
"work in progress" and there are certainly things that need to be
corrected. The patches borrow heavily from the previous ia64
oprofile/perfmon support.

Due to the different sampling mechanisms that could be used for x86,
/dev/oprofile/implement has been added so that the sampling mechanism in
use can be identified.

Rather than directly setting up the bits for the performance monitoring
hardware, libpfm is used to map the name to the appropriate bits. For
processors with complicated constraints on the performance monitoring
hardware this makes more sense than trying to duplicate the constraints
mechanism in oprofile.

Below are issues that still need to be fixed in the various areas of the
oprofile/perfmon2 monitoring.

kernel:
- separating oprofile's processor id code from i386 nmi mechanism setup
- have oprofile/perfmon2 identify cpu for real (currently just hardwired
  to amd64)
- oprofile always uses perfmon2 if kernel configured with perfmon
- module installation a bit odd:
  -install oprofile modules
  -opcontrol reads information to determine if perfmon2 used
  -opcontrol install appropriate perfmon module
- oprofile lies that it needs buffer space (perfmon_get_size()) so
  perfmon2 actually calls oprofile's perfmon_handler()

oprofile:
- make translation of events names to bit patterns more robust: can hang
  if event is not found
- verify that the event masking support works
- get rid of fatal_error() function in opd_perfmon.c
- ophelp get the available events from libpfm when possible

libpfm:
-make event mapping complete (lots of events missing for various
 processors)
-libpfm isn't available on some processors that perfmon supports (e.g.
 p4/ppc64)

-Will

--- oprofile-0.9.2-0.20060309-perfmon2/utils/opcontrol.perfmon2	2006-03-18 20:50:11.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/utils/opcontrol	2006-03-23 17:13:28.000000000 -0500
@@ -267,6 +267,14 @@
 	OP_COUNTERS=`ls $MOUNT/ | grep "^[0-9]\+\$" | tr "\n" " "`
 	NR_CHOSEN=0

+	OP_IMPLEMENTATION_DIR=$MOUNT/implementation
+	if test -f $OP_IMPLEMENTATION; then
+		OP_IMPLEMENTATION=`cat $OP_IMPLEMENTATION_DIR`
+	else
+		OP_IMPLEMENTATION="unspecified"
+	fi
+
+
 	DEFAULT_EVENT=`$OPHELP --get-default-event`

 	IS_TIMER=0
@@ -274,10 +282,42 @@
 	if test "$CPUTYPE" = "timer"; then
 		IS_TIMER=1
 	else
-		case "$CPUTYPE" in
+		case $OP_IMPLEMENTATION in
+		perfmon2)
+			IS_PERFMON=$KERNEL_SUPPORT
+			# need to get the appropriate perfmon module installed
+			# FIXME need to remove them when they are not needed
+			case "$CPUTYPE" in
+			i386/ppro|i386/pii|i386/piii)
+				PERFMON_MOD="perfmon_p6"
+				;;
+			i386/p6_mobile)
+				PERFMON_MOD="perfmon_pm"
+				;;
+			#FIXME need to handle em64t
+			i386/p4|i386/p4-ht)
+				PERFMON_MOD="perfmon_p4"
+				;;
+			i386/athlon|x86-64/hammer)
+				PERFMON_MOD="perfmon_amd"
+				;;
+			esac
+			modprobe $PERFMON_MOD
+			if test "$?" != "0"; then
+				echo "Unable to load module $PERFMON_MOD."
+				# couldn't load the module
+				exit 1
+			fi
+			;;
+		unspecified)
+			case "$CPUTYPE" in
 			ia64/*)
 				IS_PERFMON=$KERNEL_SUPPORT
 				;;
+			esac
+			;;
+		*)
+			;;
 		esac
 	fi
 }
--- oprofile-0.9.2-0.20060309-perfmon2/daemon/Makefile.am.perfmon2	2006-03-10 13:35:12.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/daemon/Makefile.am	2006-03-10 13:37:15.000000000 -0500
@@ -25,7 +25,7 @@
 	opd_anon.h \
 	opd_anon.c

-LIBS=@POPT_LIBS@ @LIBERTY_LIBS@
+LIBS=@POPT_LIBS@ @LIBERTY_LIBS@ @PFM_LIBS@

 AM_CPPFLAGS = \
 	-I ${top_srcdir}/libabi \
--- oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.c.perfmon2	2006-03-10 13:35:24.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.c	2006-03-10 16:04:36.000000000 -0500
@@ -8,7 +8,7 @@
  * @author John Levon
  */

-#ifdef __ia64__
+#if defined( __ia64__) || defined(OPROF_PERFMON2)

 /* need this for sched_setaffinity() in <sched.h> */
 #define _GNU_SOURCE
@@ -33,6 +33,25 @@
 #ifdef HAVE_SCHED_SETAFFINITY
 #include <sched.h>
 #endif
+#ifdef OPROF_PERFMON2
+#include <perfmon/perfmon.h>
+#include <perfmon/pfmlib.h>
+#endif
+
+/* FIXME fatal_error is just temporary */
+static void fatal_error(char *fmt,...) __attribute__((noreturn));
+
+static void
+fatal_error(char *fmt, ...)
+{
+	va_list ap;
+
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	va_end(ap);
+
+	exit(1);
+}

 extern op_cpu cpu_type;

@@ -63,7 +82,7 @@
 }
 #endif

-
+#ifndef OPROF_PERFMON2
 #ifndef HAVE_PERFMONCTL
 #ifndef __NR_perfmonctl
 #define __NR_perfmonctl 1175
@@ -74,6 +93,7 @@
 	return syscall(__NR_perfmonctl, fd, cmd, arg, narg);
 }
 #endif
+#endif


 static unsigned char uuid[16] = {
@@ -97,7 +117,7 @@

 static void perfmon_start_child(int ctx_fd)
 {
-	if (perfmonctl(ctx_fd, PFM_START, 0, 0) == -1) {
+	if (op_pfm_start(ctx_fd, NULL) == -1) {
 		perror("Couldn't start perfmon: ");
 		exit(EXIT_FAILURE);
 	}
@@ -106,7 +126,7 @@

 static void perfmon_stop_child(int ctx_fd)
 {
-	if (perfmonctl(ctx_fd, PFM_STOP, 0, 0) == -1) {
+	if (op_pfm_stop(ctx_fd) == -1) {
 		perror("Couldn't stop perfmon: ");
 		exit(EXIT_FAILURE);
 	}
@@ -149,11 +169,12 @@
 static void set_affinity(size_t cpu)
 {
 	cpu_set_t set;
+	int err;

 	CPU_ZERO(&set);
 	CPU_SET(cpu, &set);

-	int err = sched_setaffinity(getpid(), sizeof(set), &set);
+	err = sched_setaffinity(getpid(), sizeof(set), &set);

 	if (err == -1) {
 		fprintf(stderr, "Failed to set affinity: %s\n",
@@ -205,14 +226,18 @@
 /** create the per-cpu context */
 static void create_context(struct child * self)
 {
+#ifdef OPROF_PERFMON2
+	pfarg_ctx_t ctx;
+#else
 	pfarg_context_t ctx;
+#endif
 	int err;

-	memset(&ctx, 0, sizeof(pfarg_context_t));
+	memset(&ctx, 0, sizeof(ctx));
 	memcpy(&ctx.ctx_smpl_buf_id, &uuid, 16);
 	ctx.ctx_flags = PFM_FL_SYSTEM_WIDE;

-	err = perfmonctl(0, PFM_CREATE_CONTEXT, &ctx, 1);
+	err = op_pfm_create_context(&ctx);
 	if (err == -1) {
 		fprintf(stderr, "CREATE_CONTEXT failed: %s\n",
 			strerror(errno));
@@ -223,17 +248,39 @@
 }


+/* FIXME need to factor out machine specific ia64 stuff */
 /** program the perfmon counters */
 static void write_pmu(struct child * self)
 {
+	int err;
+	size_t i, j;
+#ifndef OPROF_PERFMON2
 	pfarg_reg_t pc[OP_MAX_COUNTERS];
 	pfarg_reg_t pd[OP_MAX_COUNTERS];
-	int err;
-	size_t i;
+#else
+	pfmlib_input_param_t inp;
+	pfmlib_output_param_t outp;
+	pfarg_pmc_t pc[OP_MAX_COUNTERS];
+	pfarg_pmd_t pd[OP_MAX_COUNTERS];
+	pfmlib_options_t pfmlib_options;
+
+	/*
+	 * pass options to library (optional)
+	 */
+	memset(&pfmlib_options, 0, sizeof(pfmlib_options));
+	pfmlib_options.pfm_debug = 1; /* set to 1 for debug */
+	pfmlib_options.pfm_verbose = 1; /* set to 1 for debug */
+	pfm_set_options(&pfmlib_options);
+
+	memset(&inp,0, sizeof(inp));
+	memset(&outp,0, sizeof(outp));
+#endif /* OPROF_PERFMON2 */

 	memset(pc, 0, sizeof(pc));
 	memset(pd, 0, sizeof(pd));

+#ifndef OPROF_PERFMON2
+
 #define PMC_GEN_INTERRUPT (1UL << 5)
 #define PMC_PRIV_MONITOR (1UL << 6)
 /* McKinley requires pmc4 to have bit 23 set (enable PMU).
@@ -257,22 +304,72 @@
 		pc[i].reg_value &= ~(0xf << 16);
 		pc[i].reg_value |= ((event->um & 0xf) << 16);
 		pc[i].reg_smpl_eventid = event->counter;
-	}

-	for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) {
-		struct opd_event * event = &opd_events[i];
 		pd[i].reg_value = ~0UL - event->count + 1;
 		pd[i].reg_short_reset = ~0UL - event->count + 1;
 		pd[i].reg_num = event->counter + 4;
+		pd[i].reg_smpl_eventid = event->counter;
 	}
+#else

-	err = perfmonctl(self->ctx_fd, PFM_WRITE_PMCS, pc, i);
+	/* setup inp */
+	inp.pfp_dfl_plm = PFM_PLM0;
+
+	for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) {
+		struct opd_event * event = &opd_events[i];
+		/* Find the matching event */
+		if (pfm_find_event(event->name, &inp.pfp_events[i].event)
+		    != PFMLIB_SUCCESS) {
+			fatal_error("Cannot find %s event\n", event->name);
+		}
+		(event->user) ? (inp.pfp_events[i].plm |= PFM_PLM3)
+			: (inp.pfp_events[i].plm &= ~PFM_PLM3);
+		(event->kernel) ? (inp.pfp_events[i].plm |= PFM_PLM0)
+			: (inp.pfp_events[i].plm &= ~PFM_PLM0);
+
+		/* set to sampling */
+		/* interval between samples */
+	}
+	inp.pfp_event_count = i;
+
+	/* generate outp */
+	err = pfm_dispatch_events(&inp, NULL, &outp, NULL);
+	if (err != PFMLIB_SUCCESS) {
+		fatal_error("cannot configure events: %s\n", pfm_strerror(err));
+
+		exit(EXIT_FAILURE);
+	}
+
+	/* copy outp over */
+	for (i=0; i < outp.pfp_pmc_count; i++) {
+		pc[i].reg_num = outp.pfp_pmcs[i].reg_num;
+		pc[i].reg_value = outp.pfp_pmcs[i].reg_value;
+	}
+
+	/*
+	 * figure out pmd mapping from output pmc
+	 */
+	for (i=0, j=0; i < inp.pfp_event_count; i++) {
+		struct opd_event * event = &opd_events[i];
+		pd[i].reg_num = outp.pfp_pmcs[j].reg_pmd_num;
+		for(; j < outp.pfp_pmc_count; j++) if (outp.pfp_pmcs[j].reg_evt_idx != i) break;
+		/* fill out the rest of the information pmd */
+		pd[i].reg_smpl_pmds[0] = 0;
+		pd[i].reg_flags |= PFM_REGFL_OVFL_NOTIFY;
+		pd[i].reg_reset_pmds[0] = 0;
+		pd[i].reg_value = - event->count;
+		pd[i].reg_short_reset = - event->count;
+		pd[i].reg_long_reset = - event->count;
+	}
+#endif
+
+	err = op_pfm_write_pmcs(self->ctx_fd, pc, i);
 	if (err == -1) {
 		perror("Couldn't write PMCs: ");
 		exit(EXIT_FAILURE);
 	}

-	err = perfmonctl(self->ctx_fd, PFM_WRITE_PMDS, pd, i);
+	err = op_pfm_write_pmds(self->ctx_fd, pd, i);
 	if (err == -1) {
 		perror("Couldn't write PMDs: ");
 		exit(EXIT_FAILURE);
@@ -288,7 +385,7 @@
 	memset(&load_args, 0, sizeof(load_args));
 	load_args.load_pid = self->pid;

-	err = perfmonctl(self->ctx_fd, PFM_LOAD_CONTEXT, &load_args, 1);
+	err = op_pfm_load_context(self->ctx_fd, &load_args);
 	if (err == -1) {
 		perror("Couldn't load context: ");
 		exit(EXIT_FAILURE);
@@ -316,6 +413,11 @@
 {
 	struct child * self = &children[cpu];

+	if (pfm_initialize() != PFMLIB_SUCCESS) {
+		printf("Can't initialize library\n");
+		exit(1);
+	}
+
 	self->pid = getpid();
 	self->sigusr1 = 0;
 	self->sigusr2 = 0;
@@ -461,4 +563,4 @@
 		kill(children[i].pid, SIGUSR2);
 }

-#endif /* __ia64__ */
+#endif /* defined(__ia64__) || defined(OPROF_PERFMON2) */
--- oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.h.perfmon2	2006-03-10 13:35:34.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/daemon/opd_perfmon.h	2006-03-18 21:15:35.000000000 -0500
@@ -11,7 +11,7 @@
 #ifndef OPD_PERFMON_H
 #define OPD_PERFMON_H

-#ifdef __ia64__
+#if defined(__ia64__) || defined(OPROF_PERFMON2)

 #include <stdlib.h>

@@ -20,6 +20,8 @@
 void perfmon_start(void);
 void perfmon_stop(void);

+#if (!defined(OPROF_PERFMON2))
+
 /* The following is from asm/perfmon.h. When it's installed on
  * enough boxes, we can remove this and include the platform
  * perfmon.h
@@ -80,6 +82,53 @@
 #define PFM_LOAD_CONTEXT 0x10
 #define PFM_FL_SYSTEM_WIDE 0x02

+/* wrapper to allow older perfmon interface to be used */
+/* FIXME need to be set correcly for older perfmon */
+#define op_pfm_create_context(ctx) perfmonctl(0, PFM_CREATE_CONTEXT, ctx, 1)
+#define op_pfm_write_pmcs(fd, pmcs, count) \
+	perfmonctl(fd, PFM_WRITE_PMCS, pmcs, count)
+#define op_pfm_write_pmds(fd, pmds, count) \
+	perfmonctl(fd, PFM_WRITE_PMDS, pmds, count)
+#define op_pfm_read_pmds(fd, pmds, count) \
+	perfmonctl(fd, PFM_READ_PMDS, pmds, count)
+#define op_pfm_load_context(fd, load) \
+	perfmonctl(fd, PFM_LOAD_CONTEXT, load, 1)
+#define op_pfm_start(fd, start) \
+	perfmonctl(fd, PFM_START, start, 1)
+#define op_pfm_stop(fd) \
+	perfmonctl(fd, PFM_STOP, NULL, 0)
+#define op_pfm_restart(fd) \
+	perfmonctl(fd, PFM_RESTART, NULL, 0)
+#define op_pfm_create_evtsets(fd, setd, count) \
+	perfmonctl(fd, PFM_CREATE_EVTSETS, setd, count)
+#define op_pfm_getinfo_evtsets(fd, info, count) \
+	perfmonctl(fd, PFM_GETINOF, info, count)
+#define op_pfm_delete_evtsets(fd, setd, count) \
+	perfmonctl(fd, PFM_DELETE_EVTSETS, setd, count)
+#define op_pfm_unload_context(fd) \
+	perfmonctl(fd, PFM_UNLOAD_CONTEXT, NULL, 0)
+
+#else
+
+/* wrapper to allow older perfmon interface to be used */
+#define op_pfm_create_context(ctx) pfm_create_context(ctx, NULL, 0)
+#define op_pfm_write_pmcs(fd, pmcs, count) pfm_write_pmcs(fd, pmcs, count)
+#define op_pfm_write_pmds(fd, pmds, count) pfm_write_pmds(fd, pmds, count)
+#define op_pfm_read_pmds(fd, pmds, count) pfm_read_pmds(fd, pmds, count)
+#define op_pfm_load_context(fd, load) pfm_load_context(fd, load)
+#define op_pfm_start(fd, start) pfm_start(fd, start)
+#define op_pfm_stop(fd) pfm_stop(fd)
+#define op_pfm_restart(fd) pfm_restart(fd)
+#define op_pfm_create_evtsets(fd, setd, count) \
+	pfm_create_evtsets(fd, setd, count)
+#define op_pfm_getinfo_evtsets(fd, info, count) \
+	pfm_getinfo_evtsets(fd, info, count)
+#define op_pfm_delete_evtsets(fd, setd, count) \
+	pfm_delete_evtsets(fd, setd, count)
+#define op_pfm_unload_context(fd) pfm_unload_context(fd)
+
+#endif /* (!defined(OPROF_PERFMON2)) */
+
 #else

 void perfmon_init(void)
@@ -101,6 +150,6 @@
 {
 }

-#endif /* __ia64__ */
+#endif /* defined(__ia64__) || defined(OPROF_PERFMON2) */

 #endif /* OPD_PERFMON_H */
--- oprofile-0.9.2-0.20060309-perfmon2/configure.in.perfmon2	2006-03-10 13:35:04.000000000 -0500
+++ oprofile-0.9.2-0.20060309-perfmon2/configure.in	2006-03-10 13:36:41.000000000 -0500
@@ -133,6 +133,18 @@
 AC_SUBST(BFD_LIBS)
 AC_SUBST(POPT_LIBS)

+dnl enable option to use perfmon use on processors other than ia64
+AC_ARG_ENABLE(perfmon2,
+	[ --enable-perfmon2 enable option for perfmon2 use on non-ia64 processors (default is disabled)],
+	enable_perfmon2=$enableval, enable_perfmon2=no)
+if test "$enable_perfmon2" = yes; then
+	AC_CHECK_LIB(pfm, pfm_start,, AC_MSG_ERROR([pfm library not found]))
+	PFM_LIBS="-lpfm"
+	AC_SUBST(PFM_LIBS)
+	AX_CFLAGS_OPTION(OP_CFLAGS,[-DOPROF_PERFMON2])
+	AX_CXXFLAGS_OPTION(OP_CXXFLAGS,[-DOPROF_PERFMON2])
+fi
+
 # do NOT put tests here, they will fail in the case X is not installed !
 AM_CONDITIONAL(have_qt, test -n "$QT_LIB")

--- linux-2.6.16-perfmon2/drivers/oprofile/timer_int.c.orig	2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/drivers/oprofile/timer_int.c	2006-03-23 10:53:01.000000000 -0500
@@ -43,4 +43,5 @@
 	ops->start = timer_start;
 	ops->stop = timer_stop;
 	ops->cpu_type = "timer";
+	ops->implementation = "timer";
 }
--- linux-2.6.16-perfmon2/drivers/oprofile/oprofile_files.c.orig	2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/drivers/oprofile/oprofile_files.c	2006-03-23 10:53:01.000000000 -0500
@@ -65,13 +65,24 @@
 {
 	return oprofilefs_str_to_user(oprofile_ops.cpu_type, buf, count, offset);
 }
-
-
+
+
 static struct file_operations cpu_type_fops = {
 	.read = cpu_type_read,
 };
-
-
+
+
+static ssize_t implementation(struct file * file, char __user * buf, size_t count, loff_t * offset)
+{
+	return oprofilefs_str_to_user(oprofile_ops.implementation, buf, count, offset);
+}
+
+
+static struct file_operations implementation_fops = {
+	.read = implementation,
+};
+
+
 static ssize_t enable_read(struct file * file, char __user * buf, size_t count, loff_t * offset)
 {
 	return oprofilefs_ulong_to_user(oprofile_started, buf, count, offset);
@@ -126,7 +137,8 @@
 	oprofilefs_create_ulong(sb, root, "buffer_size", &fs_buffer_size);
 	oprofilefs_create_ulong(sb, root, "buffer_watershed", &fs_buffer_watershed);
 	oprofilefs_create_ulong(sb, root, "cpu_buffer_size", &fs_cpu_buffer_size);
-	oprofilefs_create_file(sb, root, "cpu_type", &cpu_type_fops);
+	oprofilefs_create_file(sb, root, "cpu_type", &cpu_type_fops);
+	oprofilefs_create_file(sb, root, "implementation", &implementation_fops);
 	oprofilefs_create_file(sb, root, "backtrace_depth", &depth_fops);
 	oprofilefs_create_file(sb, root, "pointer_size", &pointer_size_fops);
 	oprofile_create_stats_files(sb, root);
--- linux-2.6.16-perfmon2/arch/x86_64/oprofile/Makefile.orig	2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/x86_64/oprofile/Makefile	2006-03-23 10:53:01.000000000 -0500
@@ -15,5 +15,6 @@
 OPROFILE-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_athlon.o op_model_p4.o \
 	op_model_ppro.o
 OPROFILE-$(CONFIG_X86_IO_APIC) += nmi_timer_int.o
+OPROFILE-$(CONFIG_PERFMON) += perfmon.o

 oprofile-y = $(DRIVER_OBJS) $(addprefix ../../i386/oprofile/, $(OPROFILE-y))
--- linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_int.c.orig	2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_int.c	2006-03-23 10:53:01.000000000 -0500
@@ -415,6 +415,7 @@
 	ops->start = nmi_start;
 	ops->stop = nmi_stop;
 	ops->cpu_type = cpu_type;
+	ops->implementation = "oprofile";
 	printk(KERN_INFO "oprofile: using NMI interrupt.\n");
 	return 0;
 }
--- linux-2.6.16-perfmon2/arch/i386/oprofile/Makefile.orig	2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/Makefile	2006-03-23 10:53:02.000000000 -0500
@@ -10,3 +10,4 @@
 oprofile-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_athlon.o \
 	op_model_ppro.o op_model_p4.o
 oprofile-$(CONFIG_X86_IO_APIC) += nmi_timer_int.o
+oprofile-$(CONFIG_PERFMON) += perfmon.o
--- linux-2.6.16-perfmon2/arch/i386/oprofile/init.c.orig	2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/init.c	2006-03-23 10:53:02.000000000 -0500
@@ -15,8 +15,10 @@
  * with the NMI mode driver.
  */

+extern int op_perfmon_init(struct oprofile_operations * ops);
 extern int op_nmi_init(struct oprofile_operations * ops);
 extern int op_nmi_timer_init(struct oprofile_operations * ops);
+extern void op_perfmon_exit(void);
 extern void op_nmi_exit(void);
 extern void x86_backtrace(struct pt_regs * const regs, unsigned int depth);

@@ -27,8 +29,12 @@

 	ret = -ENODEV;

+#ifdef CONFIG_PERFMON
+	ret = op_perfmon_init(ops);
+#endif
 #ifdef CONFIG_X86_LOCAL_APIC
-	ret = op_nmi_init(ops);
+	if (ret < 0)
+		ret = op_nmi_init(ops);
 #endif
 #ifdef CONFIG_X86_IO_APIC
 	if (ret < 0)
@@ -42,6 +48,9 @@

 void oprofile_arch_exit(void)
 {
+#ifdef CONFIG_PERFMON
+	op_perfmon_exit();
+#endif
 #ifdef CONFIG_X86_LOCAL_APIC
 	op_nmi_exit();
 #endif
--- /dev/null	2006-03-27 09:20:43.000437500 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/perfmon.c	2006-03-27 09:54:16.000000000 -0500
@@ -0,0 +1,116 @@
+/**
+ * @file perfmon.c
+ *
+ * @remark Copyright 2003 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon <levon@xxxxxxxxxxxxxxxxx>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/config.h>
+#include <linux/oprofile.h>
+#include <linux/sched.h>
+#include <linux/perfmon.h>
+#include <asm/ptrace.h>
+#include <asm/errno.h>
+
+static int allow_ints;
+
+static int
+perfmon_get_size(unsigned int flags, void *data, size_t *size)
+{
+	/* This is just a dummy size. OProfile uses its own buffer
+	   for the time being. */
+	*size = sizeof (int);
+
+	return 0;
+}
+
+static int
+perfmon_handler(void *buf, struct pfm_ovfl_arg *arg,
+		unsigned long ip, u64 stamp, void *data)
+{
+	int event = arg->pmd_eventid;
+	struct pt_regs * const regs = (struct pt_regs *) data;
+
+	PFM_DBG_ovfl("oprofile overflow ip=%lx, event=%d",
+		     instruction_pointer(regs), event);
+
+	arg->ovfl_ctrl = PFM_OVFL_CTRL_RESET;
+
+	/* the owner of the oprofile event buffer may have exited
+	 * without perfmon being shutdown (e.g. SIGSEGV)
+	 */
+	if (allow_ints)
+		oprofile_add_sample(regs, event);
+	return 0;
+}
+
+
+static int perfmon_start(void)
+{
+	allow_ints = 1;
+	return 0;
+}
+
+
+static void perfmon_stop(void)
+{
+	allow_ints = 0;
+}
+
+
+#define OPROFILE_FMT_UUID { \
+	0x77, 0x7a, 0x6e, 0x61, 0x20, 0x65, 0x73, 0x69, \
+	0x74, 0x6e, 0x72, 0x20, 0x61, 0x65, 0x0a, 0x6c \
+}
+
+static struct pfm_smpl_fmt oprofile_fmt = {
+	.fmt_name = "oprofile_format",
+	.fmt_uuid = OPROFILE_FMT_UUID,
+	.fmt_getsize = perfmon_get_size,
+	.fmt_handler = perfmon_handler,
+	.fmt_flags = PFM_FMT_BUILTIN_FLAG,
+	.owner = THIS_MODULE,
+};
+
+
+static char * get_cpu_type(void)
+{
+	/* FIXME: right now just dummied up for amd64.
+	   This will need to list do the right thing for the
+	   various x86 processors.
+	*/
+	return "x86-64/hammer";
+}
+
+
+/* all the ops are handled via userspace for i386 oprofile using perfmon */
+
+static int using_perfmon;
+
+int __init op_perfmon_init(struct oprofile_operations * ops)
+{
+	int ret = pfm_register_smpl_fmt(&oprofile_fmt);
+	if (ret)
+		return -ENODEV;
+
+	ops->cpu_type = get_cpu_type();
+	ops->start = perfmon_start;
+	ops->stop = perfmon_stop;
+	ops->implementation = "perfmon2";
+	using_perfmon = 1;
+	printk(KERN_INFO "oprofile: using perfmon.\n");
+	return 0;
+}
+
+
+void __exit op_perfmon_exit(void)
+{
+	if (!using_perfmon)
+		return;
+
+	pfm_unregister_smpl_fmt(oprofile_fmt.fmt_uuid);
+}
--- linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_timer_int.c.orig	2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/arch/i386/oprofile/nmi_timer_int.c	2006-03-23 10:53:02.000000000 -0500
@@ -50,6 +50,7 @@
 	ops->start = timer_start;
 	ops->stop = timer_stop;
 	ops->cpu_type = "timer";
+	ops->implementation = "nmi_timer";
 	printk(KERN_INFO "oprofile: using NMI timer interrupt.\n");
 	return 0;
 }
--- linux-2.6.16-perfmon2/include/linux/oprofile.h.orig	2006-03-20 00:53:29.000000000 -0500
+++ linux-2.6.16-perfmon2/include/linux/oprofile.h	2006-03-23 10:53:02.000000000 -0500
@@ -39,6 +39,8 @@
 	void (*backtrace)(struct pt_regs * const regs, unsigned int depth);
 	/* CPU identification string. */
 	char * cpu_type;
+	/* Identify method of string. */
+	char * implementation;
 };

 /**
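
The two write_pmu() branches in the patch preload the counters with different-looking expressions: the ia64 path writes ~0UL - event->count + 1, while the perfmon2 path writes - event->count. The following is a minimal sketch, assuming the counters are handled as 64-bit unsigned values; the helper names are hypothetical and are not part of OProfile or perfmon2. It shows that both expressions produce the same preload value, and that a counter started at that value wraps to zero (raising the overflow that yields a sample) after exactly count increments.

```c
#include <stdint.h>

/* Preload as written in the perfmon2 branch: unsigned negation,
 * i.e. 2^64 - count in modular arithmetic. */
uint64_t preload_negated(uint64_t count)
{
	return (uint64_t)0 - count;
}

/* Preload as written in the ia64 branch: all-ones minus count, plus one. */
uint64_t preload_complement(uint64_t count)
{
	return ~(uint64_t)0 - count + 1;
}

/* Number of increments before a counter holding `preload` wraps to 0. */
uint64_t events_until_overflow(uint64_t preload)
{
	return (uint64_t)0 - preload;
}
```

With a sampling interval of 100000 events, both helpers return the same value, so the choice between the two spellings in the patch is cosmetic as long as the arithmetic is done on an unsigned type.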
