With changeset 415e971ad9, support for easy CPU profiling using Google Perftools. There are changes to the build system to link with the perftools library, and there is a startup plugin to automatically start profiling without the need to set environment variables.
Usage
First, build and install Google Perftools using the normal installation procedure. By default, /usr/local is searched by CMake, so keeping this default is easiest. CMake contains two relevant variables:
- K3D_ENABLE_PROFILING: Enabling this links the SDK and all modules against the profiling library, so profiling can be enabled. It also adds the google perftools include directory to the include path.
- K3D_BUILD_GOOGLE_PERFTOOLS_MODULE: Builds the startup module, which automatically activates profiling on startup. If you want to restrict profiling by adding ProfilerStart/ProfilerStop calls in your C++ code, you should disable this module.
By default, profiles are saved to the temp dir (/tmp on linux), this can be configured using the “profile_files” path option in .k3d/options.k3d. If all goes well, you should should see a line like
DEBUG: google_perftools: saving profile data to file /tmp/k3d-profile-0.pprof
when running K-3D. A profile file is never overwritten, the index is incremented as with the preview renders. The recommended build type for profiling is RelWithDebugInfo. This ensures performance is close to the real-life case. If the call graph is unclear, a full debug build will show more details, as there are no optimizations.
Adapting pprof
By default, pprof (the profile data processing tool) only supports libs with a .so, .dylib or .dll extension. Since K-3D plugins use .module, you need to add this to the regular expression on line 3246 in the pprof perl script, like this:
if ($l =~ /^($h)-($h)\s+..x.\s+($h)\s+\S+:\S+\s+\d+\s+(\S+\.(so|dll|dylib|module)(\.\d+)*\w*)/i) {
Examples
Just for fun, I tried profiling the SDS code, by applying 6 iterations of CatmullClark on the default polysphere. The profiling file was processed using:
pprof --gv bin/k3d /tmp/k3d-profile-0.pprof
Here, the call was issued from the build dir. If you execute from somewhere else, bin/k3d should be replaced with the path to the K-3D executable. The command produces a nice call graph of the entire K-3D run, where the biggest functions represent the biggest CPU consmers. For SDS, this surprisingly turns out to be __gettimeofday:

SDS profile, full
We can get more information about this by “focusing” on the call of interest and showing more nodes, using
pprof --gv bin/k3d --focus=__gettimeofday /tmp/k3d-profile-0.pprof
Note that this was done using a debug build, not a RelWithDebInfo build, since some intermediate calls are optimized out otherwise. The resulting graph is:

The problem is obvious now: During the development, SDS code was profiled manually by adding k3d::timer calls at different points in the code. In this case, there is one in the face_center_calculator, in order to evaluate the importance of attribute data updates. As it turns out, this timing call is now dominant over the SDS calculation itself! In retrospect, adding that many timing calls was silly, but proper profiling pointed out the error very quickly indeed.
Since we have profiling support now, all timing calls were removed from the SDS code in changeset f5c14e68f4. As a result, the Catmull-Clark operation is twice as fast, and the update step is more than 3 times as fast. Not bad for a 5 minute profiling run!
f5c14e68f4
Tweet This Post