Topic: execution profile

programming > Group: testing

code generation
algorithmic complexity analysis
code optimization
code optimization by advice and statistics
execution tracing
function cost
program statistics
testing testing

topics a-e


Two statistics have been found particularly useful in measuring program performance: statement counts and procedure timings. The first, statement counts, is used for locating frequently executed code, for testing, for debugging, and for performance information. Procedure timing is used for optimization and program comparison. (cbb 5/80)
Subtopic: importance of profiling up

Quote: system design requires 'real' implementations to test heavy loads and expansion [birrAD4_1982]
Quote: every programming language processor should include a profiler; for tuning and debugging [woolJD_1973]
Quote: every compiler needs a profiler for statement counts and procedure timings [siteRL12_1978]
Quote: post mortem dump and execution profile are important parts of a programming system [sattEH5_1975]

Subtopic: timing analysis up

Quote: timing analysis of safety-critical embedded systems; bound the execution time of instruction sequences based on all possible execution histories; account for caches, deep pipelines, out-of-order execution, and branch prediction [wilhR2_2014]

Subtopic: actionable profile up

Quote: a profile is actionable if acting on it yields the expected outcome; slowing down a method is easier than speeding it up; retain memory layout [mytkT6_2010]

Subtopic: bloat up

Quote: identify bloat with copy profiling, chains of data copies, and copy graphs; found hot copy chains and temporary data structures [xuG6_2009]
Quote: java suffers from systemic runtime bloat; e.g., extracting name-value pairs costs 1000 method invocations and 35 temporary objects [xuG6_2009]
Quote: large-scale Java applications suffer from runtime bloat and poor scalability; e.g., instantiating SimpleDateFormat takes 123 calls and 44 objects [mitcN1_2010]
Quote: few hot spots in large Java applications; in a document management system only 14 methods contributed more than 1% of time apiece [xuG6_2009]

Subtopic: avoid garbage up

Quote: Go has a stop-the-world garbage collector; avoid creating garbage via profiling tools; no need to tune the garbage collector [meyeJ10_2014]

Subtopic: idleness up

Quote: WAIT diagnosis of the root cause of idleness in multi-tier server applications; samples Java activity on one tier; easy to use and understand [altmE10_2010]
Quote: assign each thread a Wait State whether delayed by disk, GC, or network; helps identify bottlenecks [altmE10_2010]

Subtopic: flame graph up

Quote: a flame graph is a visualization of stack traces by function name; y-axis is stack depth; x-axis is alphabetical names with warm colors; the width of a name is its frequency at this stack-depth [gregB6_2016]
Quote: flame graphs -- a large plateau is a frequent stack trace, reading top-down shows ancestry, reading bottom-up shows code flow, major forks are two or more large towers atop a function [gregB6_2016]
Quote: flame graphs of profiler output -- stall cycles of blocked code paths; cycles per instruction, malloc/brk/mmap, I/O, descheduled threads, wakeups [gregB6_2016]

Subtopic: path profile up

Quote: efficient path profiling; 2x longer paths than edge profiling [ballT12_1996]
Quote: most routines have less than 3000 potential paths, allowing path coverage testing; but millions of potential paths for the program as a whole [ballT12_1996]
Quote: encode potential, execution paths as states; compute transitions with arithmetic operations; transform control-flow graph into an acyclic graph [ballT12_1996]
Quote: path profiles from short, training runs covered most of the paths found in full runs; useful for profile-driven compilation [ballT12_1996]
Quote: preferential path profiling assigns compact numbers to interesting paths; indexed by array instead of hash; reduced overhead from 50% to 15% [vaswK1_2007]

Subtopic: reuse profile up

Quote: SLO profiles temporal data locality; arrows indicate distant, use-reuse pairs, horizontal bars indicate cache misses, vertical bars suggest refactoring; improved speedup by 9% to 400% [beylK2_2009]

Subtopic: stack profile up

Quote: assign a Category to each stack frame; identify the Primary Category for a thread; e.g., Database Query or Client Communication [altmE10_2010]
Quote: hierarchical categorization by Wait State, Primary Category, Category stack, stack details; natural model for an intuitive user interface [altmE10_2010]

Subtopic: profiling loops up

Quote: the loop call tree contains a node for each function call, loop execution, and loop iteration; track and classify reuse paths for temporal data locality [beylK2_2009]
Quote: recursive time counts the number of recursive calls, charging only for loops; ignores the real execution time for each operation [hehnECR10_1990]

Subtopic: time bound up

Quote: a time bound divides shorter computations from longer ones; cannot promise results in a finite but unbounded time [hehnECR10_1990]

Subtopic: timing services up

Quote: timing services are woefully inadequate, even for computational tasks [kernBW7_1998]

Subtopic: time vs. operation count up

Quote: for search, speed and character comparisons may not be related; e.g., diagram search is 386x better by comparisons and 240x to 13x better by speed [fenwP7_2001a]
Quote: speed varies widely across test data, computers, compilers, and optimization level; simple measures such as comparisons may be ineffectual [fenwP7_2001a]
Quote: simple algorithms often faster because easily optimized and good cache performance [fenwP7_2001a]
Quote: gprof estimates child time from call frequency; inaccurate for functional programs, common design patterns in object-oriented programs, and mutual recursion [spivJM3_2004]

Subtopic: counts, total time, local time up

Quote: gprof combines call graph arcs with program histogram data for elapsed time, call counts, and time per subroutine [grahSL6_1982]
Quote: a profiler needs statement counts for testing and procedure times for tuning [siteRL12_1978]
QuoteRef: sattEH5_1975 ;;12 gives profile of execution counts on first statement of each block and rest of block by;
QuoteRef: knutDE1_1971 ;;11 "profile" of execution frequency counts very useful

Subtopic: in situ aggregation up

Quote: DTrace's quantize() generates a power-of-two histogram, in situ; for example, frequency of write system calls by size by application [cantB2_2006]
Quote: DTrace aggregates data at the source by CPU; keyed by arbitrary n-tuple; periodically aggregated across CPUs [cantB2_2006]

Subtopic: memory profiling up

Quote: use adaptive profiling to identify memory leaks in long running programs; sample code segments inversely to execution frequency; a leak is a stale object that is not accessed; SWAT has a low false positive rate [chilTM10_2004]
Quote: memory management statistics needed to improve algorithms and memory management strategies [riplGD3_1978]
Quote: use low overhead profiling to reorganize layout for a generational garbage collector; reduces cache miss by 21-42% [chilTM10_1998]

Subtopic: startup profiling up

Quote: can not charge startup costs consistently; depends on dynamically loaded code [kernBW7_1998]

Subtopic: embedded programs up

Quote: MicroTool measures min/max execution time and stack requirements from interrupt rates and min/max iteration counts; annotated flowchart [elshJL1_1991]

Subtopic: acyclic paths up

Quote: efficiently profile intraprocedural, acyclic paths--the longest paths that do not traverse a loop's back edge [ballT7_2000]
Quote: need design for testability since seldom execute most intraprocedural, acyclic paths [ballT7_2000]
Quote: go/gcc--90% of execution profile from 1000 intraprocedural paths; other SPEC95 programs in 10-100 paths [ballT7_2000]
Quote: log the call context for execution profiling; record the contexts (i.e., active subroutines) and the time spent in each context [spivJM3_2004]
Quote: aprof has the same overhead as gprof; both about twice as slow [spivJM3_2004]

Subtopic: buffer operations up

Quote: OS6 profile showed almost all execution time spent in unpacking buffers, testing contents, and storing elsewhere [stoyJE3_1972]

Subtopic: frequently used sequences up

Quote: large program frequently reuse small sequences of instructions [luccS6_2000]

Subtopic: profile-based optimization up

Quote: use execution statistics from profiler to optimize compilation of case statements [bierKH1_1988]
Quote: use execution profile data to reposition code; identify chains of basic blocks and split never executed code from procedures; 8-10% faster [pettK6_1990]
Quote: hardware performance monitoring improves performance for Java virtual machines [buytD1_2008]

Subtopic: estimate performance up

Quote: the CADES animator uses individual holon performance estimates to evaluate overall performance [pearDJ7_1973]
Quote: FORTRAN estimated execution profile of a program by a Monte_Carlo method [alleFE_1982]

Subtopic: call path up

Quote: efficient recording of dynamic calling context in one word; search algorithm to reconstruct calling context; 10-20% tunable overhead [bondMD6_2010b]
Quote: efficient call path from height of call stack and address; can change size of activation record; 80% success for C++, 95% for C [mytkT10_2010]
Quote: infer a hierarchical categorization of thread state over time; expert rules implemented with simple pattern-matching and decision trees; using thread monitor graph and names of methods on call stack [altmE10_2010]

Subtopic: calling context up

Quote: hot calling context tree (HCCT) of hot nodes and their ancestors; several orders of magnitude smaller than full tree, while providing similar precision [deliDC6_2011]

Subtopic: shadowed memory up

Quote: efficient memory shadowing for 64-bit architectures; EMS64 reduces overhead by 50% [zhaoQ6_2010]
Quote: 64-bit address space is mostly empty; EMS64 uses direct mapping for shadowed memory; page fault on rare, incorrect cases [zhaoQ6_2010]
Quote: store run-time type information in a mirror of memory, 4 bits/byte; continuation bit, type, and size; two nibbles identifies an object [logiA4_2001]

Subtopic: profile implementation up

Quote: DTrace provides safe and heterogenous dynamic instrumentation of production systems; arbitrary actions, predicates, and in situ aggregation [cantB2_2006]
Quote: DTrace providers create probes via the DTrace framework; allows tracing at any level of the system, even across protection boundaries [cantB2_2006]
Quote: JRun instruments Java code by patching its bytecode; annotated class file format with constant pool [mossK10_2000]
Quote: gprof has problems with skewed times and mutual recursion; fix by gathering complete call stacks; requires walking the stack [grahSL6_1982]
Quote: gprof uses call site as primary hash key and callee address as secondary key; 5-30% overhead; no planning required [grahSL6_1982]
Quote: problem of profiling under time-sharing; gprof samples the program counter to infer execution time [grahSL6_1982]

Subtopic: random samples up

Quote: collect random samples every t+r milliseconds where r is random in [-t,t]; may be biased without r, e.g., hardware performance monitors [mytkT6_2010]

Subtopic: infrequent samples up

Quote: samples once or twice a minute can identify root cause of system bottlenecks [altmE10_2010]
Quote: sampling-based data race detector with proportionality guarantee; PACER builds on FASTRACK; 52-86% overhead at 1-3% sampling rate [bondMD6_2010]

Subtopic: manual profiling up

Quote: if display program trace at full speed, brightly lit lines are heavily executed [teitT6_1981]

Subtopic: problems with profiling up

Quote: a profiler may be biased, e.g., ignore native methods; a profiler may perturb the program, i.e., observer effect [mytkT6_2010]
Quote: Java profilers frequently disagree on identity of hot methods and time spent in methods; cannot be all correct [mytkT6_2010]
Quote: Java profilers typically sample at yield points; better results with random sampling [mytkT6_2010]
Quote: found surprising variations between compilers on execution time for very similar programs [parlBN3_1975]
Quote: gprof estimates child time from call frequency; inaccurate for functional programs, common design patterns in object-oriented programs, and mutual recursion [spivJM3_2004]

Related up

Group: code generation
Topic: algorithmic complexity analysis
Topic: code optimization
Topic: code optimization by advice and statistics
Topic: execution tracing
Topic: function cost
Topic: histogram
Topic: program statistics
Topic: testing testing

Subtopics up

actionable profile
acyclic paths
avoid garbage
buffer operations
call path
calling context
counts, total time, local time
embedded programs
estimate performance
flame graph
frequently used sequences
importance of profiling
in situ aggregation
infrequent samples
manual profiling
memory profiling
path profile
problems with profiling
profile implementation
profile-based optimization
profiling loops
random samples
reuse profile
shadowed memory
stack profile
startup profiling
time bound
time vs. operation count
timing analysis
timing services

Updated barberCB 7/05
Copyright © 2002-2023 by C.B. Barber
Thesa, Avev, and thid-... are trademarks of C.B. Barber