Re: Turn off PROFILE option and remove WITH_PROFILE after FreeBSD 13?

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Fri, 17 Jan 2020 11:29:26 -0800
On Fri, Jan 17, 2020 at 01:12:32PM -0500, Ed Maste wrote:
> On Fri, 17 Jan 2020 at 12:19, Steve Kargl
> <sgk_at_troutmask.apl.washington.edu> wrote:
> >
> > Why?  Because adding -pg to the gfortran command line is sufficient
> > to get profiling information for long-running, numerically
> > intensive codes.  'gfortran -pg', of course, looks for libc_p.a
> > and libm_p.a.
> 
> Have you tried sampling-based profiling (i.e., hwpmc)? I'm curious if
> it provides equal utility for you, or if there's some shortcoming.

Never needed to try hwpmc.

% gfortran9 -o z -pg fortran_file.f90

just works if libc_p.a and libm_p.a are present.  There is a link-time
failure if the libraries are missing.  Here's an example of using -pg
that found a bottleneck in my code (which I haven't profiled recently).

Each sample counts as 0.000123062 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 46.80    275.68   275.68 1178817696   0.00     0.00  __lum_MOD_cludet_dble
 11.55    343.73    68.05 19458348     0.00     0.00  __sjnm_MOD_csjn_dble
  7.09    385.47    41.73 19458348     0.00     0.00  __sphere_MOD_sphere_shell_formfcn
  5.97    420.63    35.16 97291740     0.00     0.00  __sjnm_MOD_sjn_dble
  3.84    443.24    22.61 23712564606  0.00     0.00  cabs (w_cabs.c:17 @ 4968f0)

The cludet_dble() routine, which makes heavy use of cabs(), is the bottleneck.
It so happens that cludet_dble doesn't need cabs and can instead
look at the squared magnitude.  Replacing cabs(z) with creal(z)**2 + cimag(z)**2
gives

Each sample counts as 0.000123062 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 53.93    232.70   232.70 1178817696   0.00     0.00  __lum_MOD_cludet_dble
 15.84    301.02    68.32 19458348     0.00     0.00  __sjnm_MOD_csjn_dble
 10.63    346.91    45.88 19458348     0.00     0.00  __sphere_MOD_sphere_shell_formfcn
  7.84    380.71    33.81 97291740     0.00     0.00  __sjnm_MOD_sjn_dble

Nominally, that is a decrease of 43 CPU seconds (self time in cludet_dble
drops from 275.68 s to 232.70 s).  Those 43 seconds add up quickly
when the code is executed a few thousand times for Monte Carlo simulations.
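
For illustration only, here is a minimal C sketch of the cabs() replacement
described above.  It assumes the magnitudes are only compared with one
another, as in a pivot search; the actual use inside cludet_dble is not shown
in this message, and cabs2() and argmax_abs() are hypothetical names.  In
Fortran the same quantity would be real(z)**2 + aimag(z)**2.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Squared magnitude |z|^2.  Unlike cabs(), this takes no square root and
 * does no scaling, so it can overflow or underflow for extreme values. */
double cabs2(double complex z)
{
    double re = creal(z);
    double im = cimag(z);
    return re * re + im * im;
}

/* Hypothetical helper: index of the element with the largest magnitude.
 * Comparing |z|^2 selects the same element as comparing |z|, because the
 * square root is monotonic on non-negative values. */
size_t argmax_abs(const double complex *v, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    double best2 = cabs2(v[0]);
    for (size_t i = 1; i < n; i++) {
        double m2 = cabs2(v[i]);
        if (m2 > best2) {
            best2 = m2;
            best = i;
        }
    }
    return best;
}
```

If the magnitude itself is needed afterwards, a single sqrt() on the winning
value would still avoid calling cabs() on every element.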

Is there a trivially stupid way of using hwpmc that requires no changes
to fortran_file.f90?

PS: For those snickering about the word Fortran: go read the Fortran 2018
standard and educate yourselves.  You want document 007 from
https://j3-fortran.org/doc/standing.

-- 
Steve
Received on Fri Jan 17 2020 - 18:29:29 UTC
