Writing Efficient Programs
by
Jon Louis Bentley
Summary of the Rules
Taken from Appendix C.
Space-For-Time Rules.
- Space-For-Time Rule 1---Data Structure Augmentation.
- Space-For-Time Rule 2---Store Precomputed Results.
- Space-For-Time Rule 3---Caching.
- Space-For-Time Rule 4---Lazy Evaluation.
Time-For-Space Rules.
- Time-For-Space Rule 1---Packing.
- Time-For-Space Rule 2---Interpreters.
Loop Rules.
- Loop Rule 1---Code Motion out of Loops.
- Loop Rule 2---Combining Tests.
- Loop Rule 3---Loop Unrolling.
- Loop Rule 4---Transfer-Driven Loop Unrolling.
- Loop Rule 5---Unconditional Branch Removal.
- Loop Rule 6---Loop Fusion.
Logic Rules.
- Logic Rule 1---Exploit Algebraic Identities.
- Logic Rule 2---Short-circuiting Monotone Functions.
- Logic Rule 3---Reordering Tests.
- Logic Rule 4---Precomputing Logical Functions.
- Logic Rule 5---Boolean Variable Elimination.
Procedure Rules.
- Procedure Rule 1---Collapsing Procedure Hierarchies.
- Procedure Rule 2---Exploit Common Cases.
- Procedure Rule 3---Coroutines.
- Procedure Rule 4---Transformations on Recursive Procedures.
- Procedure Rule 5---Parallelism.
Expression Rules.
- Expression Rule 1---Compile-Time Initialization.
- Expression Rule 2---Exploit Algebraic Identities.
- Expression Rule 3---Common Subexpression Elimination.
- Expression Rule 4---Pairing Computation.
- Expression Rule 5---Exploit Word Parallelism.
Fundamental Rules.
- Fundamental Rule 1---Code Simplification.
- Fundamental Rule 2---Problem Simplification.
- Fundamental Rule 3---Relentless Suspicion.
- Fundamental Rule 4---Early Binding.
Space-For-Time Rule 1---Data Structure Augmentation: The
time required for a common operation on data can often be reduced by
augmenting the structure with extra information or by changing the
information within the structure so that it can be accessed more easily.
- Reference counters facilitate garbage collection by keeping additional
information in dynamically allocated nodes.
- Hints augment data structures by keeping a fast but possibly inaccurate
structure along with a slow but robust structure.
Space-For-Time Rule 2---Store Precomputed Results: The
cost of recomputing an expensive function can be reduced by computing the
function only once and storing the results. Subsequent requests for the
function are then handled by table lookup rather than by computing the
function.
- Peterson stored the value of evaluated board positions to reduce the
time of a game playing program from 27.10 seconds to 0.18 seconds.
- A procedure for computing Fibonacci numbers can be replaced by a table
of the numbers.
- Stu Feldman precomputed the number of ones in all eight-bit strings to
reduce run time from over a week to less than two hours.
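Feldman's technique can be sketched in C: build a 256-entry table of bit counts once, then count the one bits of a full word by a few table lookups instead of a loop over every bit. (The names and the 32-bit word size are illustrative, not from the book.)

```c
#include <stdint.h>

/* Precompute the number of one bits in every eight-bit string. */
static uint8_t ones_in_byte[256];

void init_ones_table(void) {
    for (int i = 0; i < 256; i++) {
        int count = 0;
        for (int b = i; b != 0; b >>= 1)
            count += b & 1;
        ones_in_byte[i] = (uint8_t)count;
    }
}

/* Count one bits in a 32-bit word with four table lookups
   instead of a 32-iteration loop. */
int count_ones(uint32_t w) {
    return ones_in_byte[w & 0xff]
         + ones_in_byte[(w >> 8) & 0xff]
         + ones_in_byte[(w >> 16) & 0xff]
         + ones_in_byte[w >> 24];
}
```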
Space-For-Time Rule 3---Caching: Data that is accessed
most often should be the cheapest to access.
- Jalics found that caching the last element retrieved from a table
reduced the access cost from 2004 instructions to 4 instructions in 99% of
the queries.
- Chris Van Wyk's storage allocator cached the most popular kind of node
and reduced the system run time by over fifty percent; Peter Deutsch cached
activation records in an allocator and reduced run time by thirty
percent.
- In implementing a dictionary, keep most of the dictionary on disk but
cache the most common words in core.
- Rick Cattell cached recently-used tuples in a database system to reduce
the time of an access from 8 milliseconds to 1.4 milliseconds.
- Caching can "backfire" and increase the run time of a program if
locality is not present in the underlying data.
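A one-element cache in the style of the Jalics example can be sketched in C: remember the last key looked up, so a repeated query costs a couple of instructions instead of a linear search. (The table layout and names are illustrative; keys are assumed nonnegative so -1 can mark an empty cache.)

```c
#define TABLE_SIZE 1000
static int keys[TABLE_SIZE];
static int values[TABLE_SIZE];
static int n_entries = 0;

static int cached_key = -1;     /* last key looked up; -1 = empty */
static int cached_value = -1;

void table_insert(int key, int value) {
    keys[n_entries] = key;
    values[n_entries] = value;
    n_entries++;
}

int lookup(int key) {
    if (key == cached_key)              /* cache hit: trivial cost */
        return cached_value;
    for (int i = 0; i < n_entries; i++) /* cache miss: full search */
        if (keys[i] == key) {
            cached_key = key;
            cached_value = values[i];
            return cached_value;
        }
    return -1;                          /* not found */
}
```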
Space-For-Time Rule 4---Lazy Evaluation: The strategy of
never evaluating an item until it is needed avoids evaluations of unnecessary
items.
- In building a table of Fibonacci numbers, only compute the numbers
actually used.
- Al Aho evaluated the elements of a table as they were needed and
reduced the run time of a program from 30 seconds to less than half a
second.
- Brian Kernighan reduced the run time of a document formatter by twenty
percent by calculating the width of the current line only as needed rather
than for every input character.
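The Fibonacci example can be sketched in C: keep a table that is extended only as far as a request requires, so entries that are never asked for are never computed. (The bound of 46 is illustrative; it is the largest index whose value fits in a 32-bit int.)

```c
#define MAX_FIB 46              /* fib(46) still fits in a 32-bit int */
static int fib_table[MAX_FIB + 1];
static int fib_known = 1;       /* entries 0..fib_known are valid */

int fib(int n) {
    fib_table[0] = 0;
    fib_table[1] = 1;
    while (fib_known < n) {     /* extend the table only as needed */
        fib_known++;
        fib_table[fib_known] =
            fib_table[fib_known - 1] + fib_table[fib_known - 2];
    }
    return fib_table[n];
}
```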
Time-For-Space Rule 1---Packing: Dense storage
representations can decrease storage costs by increasing the time required to
store and retrieve data.
- Storing integers one decimal digit per eight-bit byte, two digits per
byte, or in binary format represents three increasing levels of packing.
- The space of a database system could be reduced by one-third by packing
three integers (between 0 and 1000) in two 16-bit words.
- John Laird reduced the time required to read a file of real numbers by
a factor of over 80 by packing the file.
- Stu Feldman found that by unpacking a table he increased the
data space slightly but decreased the code space by over four thousand
words.
- Overlaying reduces data space by storing data items that are
never simultaneously active in the same memory space.
- Code overlaying reduces code space by using the same storage for
routines that are never simultaneously needed. Many operating systems
provide this service automatically in their virtual memory systems.
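The database example can be sketched in C: three integers in the range 0..1000 fit in two 16-bit words (32 bits) because 1001^3 < 2^32, at the cost of a few multiplications and divisions on every access. (Function names are illustrative.)

```c
#include <stdint.h>

/* Pack three integers in [0, 1000] into 32 bits, base-1001. */
uint32_t pack3(int a, int b, int c) {
    return ((uint32_t)a * 1001u + (uint32_t)b) * 1001u + (uint32_t)c;
}

/* Unpacking costs two divisions and two remainders: time for space. */
void unpack3(uint32_t p, int *a, int *b, int *c) {
    *c = (int)(p % 1001u);  p /= 1001u;
    *b = (int)(p % 1001u);  p /= 1001u;
    *a = (int)p;
}
```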
Time-For-Space Rule 2---Interpreters: The space required
to represent a program can often be decreased by the use of interpreters in
which common sequences of operations are represented compactly.
- Finite State Machines (FSMs) can be implemented by small tables; they
are easy to define, code, prove correct, and maintain.
- Brooks describes how an interpreter led to small space requirements for
a console interpreter, and how the time spent in decoding a dense
representation of a FORTRAN compiler was paid for by drastically reduced
input and output costs.
- In some systems the programmer should use the interpreter provided by
the underlying machine architecture and "compile" common operations into
machine code.
Loop Rule 1---Code Motion out of Loops: Instead of
performing a certain computation in each iteration of a loop, it is better to
perform it only once, outside the loop.
- Moving the calculation of a constant factor outside a for loop reduced
its time from 138N microseconds to 7.9N microseconds.
- Code cannot be moved out of loops if it has side effects that are
desired on every iteration.
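A sketch of the transformation in C (the constant factor here is illustrative, not Bentley's original): the loop-invariant computation is hoisted out and done once.

```c
#include <math.h>

/* Before: the invariant sqrt(2.0) is recomputed on every iteration. */
void scale_before(double *x, int n) {
    for (int i = 0; i < n; i++)
        x[i] = x[i] / sqrt(2.0);
}

/* After: the reciprocal is computed once, outside the loop. */
void scale_after(double *x, int n) {
    double factor = 1.0 / sqrt(2.0);
    for (int i = 0; i < n; i++)
        x[i] = x[i] * factor;
}
```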
Loop Rule 2---Combining Tests: An efficient inner loop
should contain as few tests as possible, and preferably only one. The
programmer should therefore try to simulate some of the exit conditions of
the loop by other exit conditions.
- Adding a sentinel in the last element of an unsorted vector reduced the
time to search it from 7.3C microseconds to 4.1C
microseconds.
- Sentinels can decrease the robustness of a program. Improper use of a
sentinel caused a C compiler to generate non-reentrant code; the bug
surfaced rarely, but was fatal in those circumstances.
- Sentinels are a common application of Loop Rule 2: we place a
sentinel at the boundary of a data structure to reduce the cost of testing
whether our search has exhausted the structure.
- Bob Sproull described how the lexical analyzer of the SAIL compiler
used a control character at the end of the input buffer as a sentinel to
avoid testing for end-of-buffer on each input character.
- Combining tests in the sequential search of a sorted array
increased the run time from 6.8C microseconds to
7.3C microseconds (due to a system-dependent peculiarity); using
sentinels finally reduced the search time to 4.1C
microseconds.
- Bob Sproull described how three tests could be combined into one to
increase the speed of an inner loop of a screen editor.
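The sentinel search can be sketched in C: storing the sought key just past the last element guarantees the scan terminates, so the inner loop makes one test (key comparison) instead of two (key comparison plus end-of-array test). The caller must reserve a spare slot at x[n].

```c
/* Sequential search with a sentinel; x must have room for x[n]. */
int search_with_sentinel(int *x, int n, int key) {
    x[n] = key;                 /* sentinel: the loop must stop */
    int i = 0;
    while (x[i] != key)         /* the only test in the inner loop */
        i++;
    return i < n ? i : -1;      /* -1: only the sentinel matched */
}
```

As the summary notes, the spare slot is exactly the kind of hidden precondition that can make sentinels fragile if a caller forgets it.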
Loop Rule 3---Loop Unrolling: A large cost of some short
loops is in modifying the short indices. The cost can often be reduced by
unrolling the loop.
- Unrolling a loop to sum an array of ten real numbers reduced the run
time from 63.4 microseconds to 22.1 microseconds.
- Unrolling the loop of a sequential search reduced its run time from
4.1C microseconds to 3.4C microseconds.
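The ten-element summation can be sketched in C: in the unrolled version the loop control (index increment and bound test) disappears entirely.

```c
/* Rolled: ten additions plus ten increments and tests. */
double sum10_rolled(const double *x) {
    double s = 0.0;
    for (int i = 0; i < 10; i++)
        s += x[i];
    return s;
}

/* Unrolled: the same ten additions, no loop overhead at all. */
double sum10_unrolled(const double *x) {
    return x[0] + x[1] + x[2] + x[3] + x[4]
         + x[5] + x[6] + x[7] + x[8] + x[9];
}
```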
Loop Rule 4---Transfer-Driven Loop Unrolling: If a large
cost of an inner loop is devoted to trivial assignments, then those
assignments can often be removed by repeating the code and changing the use
of variables. Specifically, to remove the assignment I := J, the
subsequent code must treat J as though it were I.
- Unrolling the inner loop of a routine for Fibonacci numbers reduced its
time from 273 microseconds to 143 microseconds.
- Knuth used unrolling to decrease the time of inserting an element into
a linked list by 16 percent.
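The Fibonacci example can be sketched in C: the rolled loop spends much of its time on the trivial assignments that shuffle a and b; unrolling the body twice and letting a and b swap roles removes them.

```c
/* Rolled: each step does one addition plus two trivial assignments. */
int fib_rolled(int n) {
    int a = 0, b = 1;
    for (int i = 0; i < n; i++) {
        int c = a + b;
        a = b;                  /* trivial assignments that the */
        b = c;                  /* unrolled version eliminates  */
    }
    return a;                   /* a is fib(n) */
}

/* Unrolled two steps at a time: a and b alternate roles, so the
   shuffling assignments vanish. */
int fib_unrolled(int n) {
    int a = 0, b = 1, i = 0;
    for (; i + 2 <= n; i += 2) {
        a = a + b;              /* a takes the next value ... */
        b = a + b;              /* ... then b takes the one after */
    }
    if (i < n) {                /* at most one leftover step */
        int c = a + b; a = b; b = c;
    }
    return a;
}
```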
Loop Rule 5---Unconditional Branch Removal: A fast loop
should contain no unconditional branches. An unconditional branch at the end
of a loop can be removed by "rotating" the loop to have a conditional branch
at the bottom.
- This technique is applicable only in low-level languages.
Loop Rule 6---Loop Fusion: If two nearby loops operate on
the same set of elements, then combine their operational parts and use only
one set of loop control operations.
- To find the maximum and minimum elements in an array, we make only one
iterative pass through the array.
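The max/min example can be sketched in C: one pass shares a single set of loop-control operations between the two computations.

```c
/* Find both extremes of a nonempty array in one fused pass. */
void min_max(const int *x, int n, int *min, int *max) {
    *min = *max = x[0];
    for (int i = 1; i < n; i++) {   /* one increment, one test */
        if (x[i] < *min) *min = x[i];
        if (x[i] > *max) *max = x[i];
    }
}
```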
Logic Rule 1---Exploit Algebraic Identities: If the
evaluation of a logical expression is costly, replace it by an algebraically
equivalent expression that is cheaper to evaluate.
- Simple optimizations are often done by compilers; programmers must be
careful that a change of this type does not result in slower code.
- An algebraic identity allowed us to remove the square root in Fragment
A2 to yield Fragment A3; this gave a speedup of almost a factor of
two.
Logic Rule 2---Short-circuiting Monotone Functions: If we
wish to test whether some monotone nondecreasing function of several
variables is over a certain threshold, then we need not evaluate any of the
variables once the threshold has been reached.
- A simple application is evaluating "and" and "or": to evaluate
A and B, we need not test B if A is false.
- Short-circuiting the distance evaluation in Fragment A5 reduced the
time of Fragment A6 by forty percent.
- A more complex application of this rule exits from a loop as soon as
the purpose of the loop has been accomplished.
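A nearest-neighbor scan in the spirit of Fragments A5/A6 can be sketched in C: squared distance is a nondecreasing sum of two terms, so a candidate point is abandoned as soon as its x-term alone reaches the best distance found so far. (The function and its interface are illustrative, not the book's fragments.)

```c
/* Index of the point in (x[i], y[i]) nearest to (px, py), with the
   distance evaluation short-circuited. Assumes n >= 1. */
int nearest(const double *x, const double *y, int n,
            double px, double py) {
    int best = 0;
    double best_d = 1e30;       /* effectively infinity */
    for (int i = 0; i < n; i++) {
        double dx = (x[i] - px) * (x[i] - px);
        if (dx >= best_d)       /* the y-term can only add more */
            continue;
        double d = dx + (y[i] - py) * (y[i] - py);
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;
}
```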
Logic Rule 3---Reordering Tests: Logical tests should be
arranged such that inexpensive and often successful tests precede expensive
and rarely successful tests.
- This was used in testing the character types in a lexical
analyzer.
- This rule is used to push an expensive test inside a cheaper test.
- Peter Weinberger used a single-line test in a Scrabble program that was
able to avoid an expensive test in over 99% of the cases.
Logic Rule 4---Precomputing Logical Functions: A logical
function over a small finite domain can be replaced by a lookup in a table
that represents the domain.
- Testing character types in a lexical analyzer can often be implemented
by a table of character types indexed by characters; Brian Kernighan
reports that this reduced the run time of some programs by thirty to
forty percent.
- David Moon designed a fast interpreter for a PDP-8 that had one table
entry for each of the 4096 possible instructions.
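The character-type table can be sketched in C: a 256-entry table built once replaces a chain of range comparisons with a single indexed load per character. (The classes chosen here are illustrative.)

```c
enum ctype { OTHER = 0, DIGIT = 1, LETTER = 2 };
static unsigned char char_class[256];

/* Precompute the classification of every possible character. */
void init_char_class(void) {
    for (int c = '0'; c <= '9'; c++) char_class[c] = DIGIT;
    for (int c = 'a'; c <= 'z'; c++) char_class[c] = LETTER;
    for (int c = 'A'; c <= 'Z'; c++) char_class[c] = LETTER;
}

/* One table load replaces several range tests per character. */
int is_identifier_char(unsigned char c) {
    return char_class[c] != OTHER;
}
```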
Logic Rule 5---Boolean Variable Elimination: We can
remove boolean variables from a program by replacing the assignment to a
boolean variable V by an if-then-else statement in which one branch
represents the case that V is true and the other represents the case
that V is false. (This generalizes to case statements and other logical
control structures.)
- This rule usually decreases time slightly (say, less than 25 percent),
but greatly increases code space.
- More complex applications of this rule remove boolean variables from
data structures by keeping separate structures for the true and false
records.
Procedure Rule 1---Collapsing Procedure Hierarchies: The
run times of the elements of a set of procedures that (nonrecursively)
call one another can often be reduced by rewriting procedures in line
and binding the passed variables.
- Rewriting the distance procedure in line reduced the run time of
Fragment A4 from 21.2N^2 microseconds to 14.0N^2 microseconds.
- Dennis Ritchie increased the speed of a macro processor by a factor of
four by writing procedures in line.
Procedure Rule 2---Exploit Common Cases: Procedures
should be organized to handle all cases correctly and common cases
efficiently.
- Mary Shaw used this technique to increase the efficiency of the
register SAVE and UNSAVE operations on the Rice University Computer;
efficiently handling the special case of operating on all possible
registers reduced the run time of some programs by thirty percent.
- This rule encourages us to remove unneeded generality from subroutines;
Chris Van Wyk increased the speed of a program by a factor of three by
using a special purpose procedure for intersecting line segments.
- We should organize systems so that efficient cases are common cases; by
ensuring that bit fields always start in the same positions in words, Rob
Pike increased the efficiency of a raster graphics operation by a factor of
two.
Procedure Rule 3---Coroutines: A multiple-pass algorithm
can often be turned into a single-pass algorithm by use of coroutines.
- An intermediate file that is written sequentially and then read
sequentially can often be removed by linking together the two programs as
coroutines; this increases space requirements but reduces costly
input/output operations.
Procedure Rule 4---Transformations on Recursive
Procedures: The run time of recursive procedures can often be
reduced by applying the following transformations:
- Code the recursion explicitly by use of a program stack.
- If the final action of a procedure P is to call itself recursively,
replace that call by a goto to its first statement; this is usually
known as removing tail recursion. That goto can often be transformed
into a loop.
- If a procedure contains only one recursive call on itself, then it is
not necessary to store the return address on the stack.
- It is often more efficient to solve small subproblems by use of an
auxiliary procedure, rather than by recurring down to problems of size zero
or one.
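Removing tail recursion can be sketched in C with a linked-list search: the recursive call becomes a rebinding of the parameter inside a loop, which is exactly the "goto to the first statement" turned into structured code. (The list type is illustrative.)

```c
struct node { int key; struct node *next; };

/* Recursive form: the final action is a tail call. */
struct node *find_recursive(struct node *p, int key) {
    if (p == 0 || p->key == key)
        return p;
    return find_recursive(p->next, key);    /* tail call */
}

/* Tail recursion removed: the call becomes a loop that rebinds p. */
struct node *find_iterative(struct node *p, int key) {
    while (p != 0 && p->key != key)
        p = p->next;
    return p;
}
```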
Procedure Rule 5---Parallelism: A program should be
structured to exploit as much of the parallelism as possible in the
underlying hardware.
- Kulsrud, Sedgewick, Smith, and Szymanski used techniques at many design
levels to build a Quicksort program on a Cray-1 that can sort 800,000
elements in less than 1.5 seconds.
Expression Rule 1---Compile-Time Initialization: As many
variables as possible should be initialized before program execution.
- John Laird preprocessed data unchanged between runs of a program to
reduce the program's run time from 120 seconds to 4 seconds.
Expression Rule 2---Exploit Algebraic Identities: If the
evaluation of an expression is costly, replace it by an algebraically
equivalent expression that is cheaper to evaluate.
- An algebraic identity yields a fast range test that compiler writers
can use on two's-complement architectures.
- We can often multiply or divide by powers of two by shifting left or
right.
- Strength reduction on a loop that iterates through the elements of an
array replaces a multiplication by an addition. This technique generalizes
to a large class of incremental algorithms.
- David Jefferson used an incremental algorithm to reduce the number of
characters sent to a terminal by a factor of over five.
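Two of these identities can be sketched in C. The range test lo <= x && x <= hi collapses to one unsigned comparison on a two's-complement machine, because x - lo wraps to a huge unsigned value when x < lo; and multiplication or division by a power of two becomes a shift. (The function names are illustrative; the shift examples assume nonnegative operands.)

```c
#include <stdint.h>

/* Single-comparison range test; arithmetic is done in unsigned to
   get well-defined two's-complement wraparound. */
int in_range(int32_t x, int32_t lo, int32_t hi) {
    return ((uint32_t)x - (uint32_t)lo) <= ((uint32_t)hi - (uint32_t)lo);
}

int32_t  times8(int32_t x) { return x << 3; }  /* x * 8, x >= 0 */
uint32_t div4(uint32_t x)  { return x >> 2; }  /* x / 4 */
```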
Expression Rule 3---Common Subexpression Elimination: If
the same expression is evaluated twice with none of its variables altered
between evaluations, then the second evaluation can be avoided by storing the
result of the first and using that in place of the second.
- We cannot eliminate the common evaluation of an expression with
important side-effects.
Expression Rule 4---Pairing Computation: If two similar
expressions are frequently evaluated together, then we should make a new
procedure that evaluates them as a pair.
- Knuth reported that both the sine and the cosine of a given angle can
be computed together for 1.5 times the cost of computing either
individually. Similarly, the maximum and the minimum elements of a vector
can be found at about 1.5 times the cost of finding either one.
Expression Rule 5---Exploit Word Parallelism: Use the
full word width of the underlying computer architecture to evaluate expensive
expressions.
- When we OR two 32-bit sets together giving as output their 32-bit
union, we are performing 32 operations in parallel.
- Stu Feldman's program to count one bits in a word (described in
Space-For-Time Rule 1) and Peter Weinberger's Scrabble program (described
in Logic Rule 3) both use this rule.
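The 32-bit set example can be sketched in C: a set of small integers lives in the bits of one word, so union, intersection, and membership each cost one machine operation covering 32 elements at once. (The type and function names are illustrative.)

```c
#include <stdint.h>

typedef uint32_t set32;         /* a subset of {0, ..., 31} */

set32 set_insert(set32 s, int e)      { return s | (1u << e); }
set32 set_union(set32 a, set32 b)     { return a | b; }  /* 32 ORs at once */
set32 set_intersect(set32 a, set32 b) { return a & b; }
int   set_member(set32 s, int e)      { return (s >> e) & 1; }
```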
Fundamental Rule 1---Code Simplification: Most fast
programs are simple. Therefore, keep code simple to make it faster.
Fundamental Rule 2---Problem Simplification: To increase
the efficiency of a program, simplify the problem it solves.
Fundamental Rule 3---Relentless Suspicion: Question the
necessity of each instruction in a time-critical piece of code and each field
in a space-critical data structure.
Fundamental Rule 4---Early Binding: Move work forward in
time. Specifically, do work now just once in hope of avoiding doing it many
times later.
[ Miscellaneous | Krishna Kunchithapadam ]
Last updated: Sun Jun 27 17:00:19 PDT 2004
URL: http://geocities.datacellar.net/krishna_kunchith/misc/wep.html