Notes on the Code Complete book

I started reading Code Complete on vacation in Italy in 2014, not realizing it would take me more than 3 years to finish it. At the end of 2017 I finally read the last page. Do I recommend it? Yes, absolutely. Although it can be dry, even boring at times, it has a lot of value for every kind of programmer. And its claims are supported by research and other books, which are cited throughout. This post is a reference for myself of the notes I took while reading the book.


Key Construction Decisions

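There is a difference between programming in a language and programming into a language. A programmer who programs in a language limits his thoughts to the constructs the language directly supports. A programmer who programs into a language first decides what he wants to express and then works out how to express it with the tools the language provides. Always program into the chosen language.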

Programmers that work in a language they've used for 3 years or more are 30% more productive.

Design in Construction

Software design is sloppy because a good solution is often only subtly different from a poor one. It's hard to know when your design is "good enough". How much detail in the design is enough?

Focus on identifying real-world objects and determine what each object is allowed to do to other objects. Objects either contain other objects or inherit from them.
Form consistent abstractions. Abstraction is a big part of how we deal with complexity in the real world. Encapsulate implementation details.

Abstraction -> you're allowed to look at an object at a high level of detail
Encapsulation -> you're not allowed to look at an object at any other level of detail

Hide secrets (information hiding): Hiding a design decision makes a huge difference in the amount of code affected by a change. Information hiding is useful at all levels of design.
There are 2 categories of secrets:

  • Hiding complexity.
  • Hiding sources of change.

Get into the habit of asking: "What should I hide?". You'll be surprised how many difficult design issues dissolve before your eyes. Information hiding is as simple as using a named constant MAX_EMPLOYEES to hide a value of 100.

Coupling should be kept loose. The more easily other modules can call a module, the more loosely coupled it is. For example: when a function needs only one property of a class, only pass the property instead of the entire object. The function otherwise gets coupled to the class. Prefer passing primitive datatypes over classes.
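A minimal Python sketch of the "pass only the property" advice. The Employee class and both function names are hypothetical, made up only for this illustration:

```python
from dataclasses import dataclass


@dataclass
class Employee:  # hypothetical class, used only for illustration
    name: str
    birth_year: int
    salary: float


def age_tightly_coupled(employee: Employee, current_year: int) -> int:
    # Depends on the whole Employee class although it reads one field.
    return current_year - employee.birth_year


def age_loosely_coupled(birth_year: int, current_year: int) -> int:
    # Depends only on two integers, so any caller can use it.
    return current_year - birth_year


ada = Employee(name="Ada", birth_year=1990, salary=5000.0)
print(age_loosely_coupled(ada.birth_year, 2020))  # prints 30
```

The second function can be called and tested without constructing an Employee at all, which is what makes it the more loosely coupled of the two.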

Partially initializing objects for specific scenarios is a form of tight coupling.

Classes and routines are tools for reducing complexity. If they're not making your job simpler, they're not doing their job.

Working classes

A class is a collection of data and routines that share a cohesive, well-defined responsibility. A class can also be a collection of routines that provides a cohesive set of services even if no common data is involved.

An abstract data type (ADT) is a collection of data and operations that work on that data. You can represent virtually any data type as an ADT. If a stack, list or queue is represented as an ADT you need to ask: "What does this stack, list or queue represent?". Treat the ADT as such instead of treating it as a stack, list or queue.
A class is an abstract data type with the additional concepts of inheritance and polymorphism.

When exposing a routine in a class, ask yourself if exposing the routine is consistent with the abstraction. If you program through the interface, relying on implementation details instead of programming to the interface, encapsulation is broken.
Be critical of classes that contain more than 7 data members. 7±2 is the magical number of discrete items a person can remember while performing other tasks.

Liskov substitution principle: subclasses must be usable through the base class interface without the need for the user to know the difference.

The more classes a class uses, the higher the fault rate tends to be.

Law of Demeter:

  • Object A can call own routines
  • Object A instantiates object B
  • Object A can call any routines of object B
  • Object A should avoid calling routines on objects provided by object B
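The rules above could look like this in Python. Car, Engine and Driver are hypothetical classes chosen for the sketch, not examples taken from the book:

```python
class Engine:
    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True


class Car:
    def __init__(self) -> None:
        self.engine = Engine()  # Car instantiates Engine, so it may call it

    def start(self) -> None:
        # Car calls a routine on an object it created itself.
        self.engine.start()

    def is_running(self) -> bool:
        return self.engine.running


class Driver:
    def drive(self, car: Car) -> bool:
        # Avoid: car.engine.start(), which reaches through Car into Engine.
        car.start()
        return car.is_running()
```

Driver only talks to Car, so Car stays free to change how (or whether) it uses an Engine.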

The most important reason to create a class is to reduce a program's complexity. Create a class to hide information so that you won't need to think about it.
Passing lots of data around suggests that a different class organization might work better.

High-Quality Routines

The upper limit for the number of parameters for a routine: 7.
Reducing complexity is a reason to create a routine. But some jobs are performed better in a single, large routine.

Mental block: reluctance to create a simple routine for a simple purpose. But small operations tend to turn into larger operations.

Inheritance = "is a" relationship, adhere completely to the same interface contract.
Cohesion = strength -> how strongly related are the operations in a routine

Functional cohesion: strongest and best kind of cohesion, occurring when a routine performs one and only one operation

Less than ideal cohesions:

  • Sequential cohesion: Fix sequential cohesion by letting one function call the other.
  • Communicational cohesion.
  • Temporal cohesion: Occurs when operations are combined into a routine because they are all done at the same time. For example: Startup. It's sometimes associated with bad programming practices.
  • Procedural cohesion: Occurs when operations in a routine are done in a specified order.
  • Logical cohesion: If the operations use some of the same code or share data, the code should be moved into a lower-level routine and the routines should be packaged into a class.
  • Coincidental cohesion: Occurs when the operations in a routine have no discernible relationship to each other.

Focus on functional cohesion.

Avoid long, silly routine names. The optimum average length for a routine name is 9 to 15 characters.
The theoretical best maximum routine length is one screen or one or two pages of program listing, approximately 50 to 150 lines. A routine should be allowed to grow organically up to 100-200 lines.

Interfaces between routines are some of the most error-prone areas of a program. 39% of all errors were internal interface errors.

If several routines use similar parameters, put the similar parameters in a consistent order. Put state or error variables last. Don't use routine parameters as working variables. If you find yourself passing the same data to many different routines, group the routines in a class and treat the frequently used data as class data.

Function: returns a value.
Procedure: does not return a value.

Use a function if the primary purpose is to return a value, otherwise use a procedure.

Routine naming: If the name is bad and it's accurate, the routine might be poorly designed. If the name is bad and it's inaccurate, it's not telling you what the program does. If the name is bad the program needs to be changed!

Defensive Programming

In defensive programming the main idea is that if a routine is passed bad data, it won't be hurt even if the bad data is another routine's fault.

For production software, garbage in, garbage out isn't good enough. It is the mark of a sloppy, nonsecure program. Better:

  • Garbage in, nothing out.
  • Garbage in, error message out.
  • No garbage allowed in.

Assertions: A routine to assert the result of a highly optimized, complicated routine used during development. Use assertions for conditions that should never occur and to check for bugs in the code. Assertions are executable documentation.
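As a rough illustration of checking an optimized routine during development, here is a Python sketch. The function name and the choice of a closed-form sum are my own assumptions, not an example from the book:

```python
def fast_sum_of_squares(n: int) -> int:
    """Closed-form sum of 1^2 + 2^2 + ... + n^2."""
    # Precondition that should never fail if the callers are correct.
    assert n >= 0, f"n must be non-negative, got {n}"
    result = n * (n + 1) * (2 * n + 1) // 6
    # Development-time check against the slow, obviously correct version.
    assert result == sum(i * i for i in range(1, n + 1))
    return result
```

Both asserts document what the routine expects and guarantees. Running Python with the -O flag strips them, so the slow cross-check does not have to ship.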

Robustness: Try to do something that will allow the software to keep operating.

Correctness: Never return an inaccurate result. No result is better than returning an inaccurate result.

Only use exceptions for conditions that are truly exceptional. When using exceptions, include all info. For example: When an array index error occurs, include the upper and lower array limits and value of the illegal index.
Sometimes the best response to a serious runtime error is to abort.
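The array index example could be sketched like this in Python. IndexOutOfBoundsError and element_at are hypothetical names invented for the illustration:

```python
class IndexOutOfBoundsError(Exception):
    """Hypothetical exception that carries everything needed to diagnose it."""

    def __init__(self, index: int, lower: int, upper: int) -> None:
        self.index = index
        self.lower = lower
        self.upper = upper
        super().__init__(
            f"index {index} is outside the valid range [{lower}, {upper}]"
        )


def element_at(values: list, index: int):
    lower, upper = 0, len(values) - 1
    if not lower <= index <= upper:
        # Include both limits and the illegal index in the exception.
        raise IndexOutOfBoundsError(index, lower, upper)
    return values[index]
```

Calling element_at([10, 20, 30], 5) raises an error whose message reads "index 5 is outside the valid range [0, 2]", which is usually enough to start debugging without re-running the program.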

Use barricades: convert input data to the proper type at input time.
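One possible shape of a barricade in Python. The routine names and the upper bound of 150 are assumptions made for the sketch:

```python
def parse_age(raw: str) -> int:
    """Barricade routine: the only place that sees untrusted text."""
    try:
        age = int(raw.strip())
    except ValueError:
        raise ValueError(f"age must be a whole number, got {raw!r}") from None
    if not 0 <= age <= 150:  # 150 is an assumed upper bound
        raise ValueError(f"age must be between 0 and 150, got {age}")
    return age


def is_adult(age: int) -> bool:
    # Inside the barricade: age is already a validated int,
    # so a violation here would be a bug and an assertion fits.
    assert isinstance(age, int)
    return age >= 18
```

Code outside the barricade handles bad data with errors. Code inside it can assume clean data and use assertions.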

Be willing to trade speed and resource usage during development in exchange for built-in tools that make development go more smoothly.

The Pseudocode Programming Process

Pseudocode turns into programming language comments. It is a tool for detailed design.

It is usually a waste of effort to work on efficiency at the level of individual routines. Big optimizations come from refining the high-level design.
Once you start coding, you get emotionally involved and it becomes harder to throw away bad design.

5% of all errors are hardware, compiler or operating system errors.

A working routine is not enough. Don't compile until late in the process, once you are convinced the routine is right. Otherwise you get into the habit of "I'll get it right with just one more compile", the "Just One More Compile" syndrome. Rise above the cycle of hacking something together and running it to see if it works.

General Issues in Using Variables

The principle of proximity: keep related actions together. Declare variables as close as possible to where they are used. Low span and low live time of a variable reduces the window of vulnerability. The span is the number of lines between declaration and usage of a variable. The live time is the total number of statements over which a variable is live.

Maximizing scope by making variables global makes it easy and convenient to write the code but hard to understand and read the code.
Create access routines instead of sharing a variable of a class by making it public.

Binding time: time at which the variable and its value are bound together.

Bind at:

  • Code writing:
    x = 0xFF (hex value for blue)
  • Compile-time:
    COLOR_BLUE = 0xFF
    x = COLOR_BLUE
  • Run time:
    x = readBlueColor()

The earlier the binding time, the lower the flexibility and complexity. When binding early, use named constants.

Use each variable for one purpose only. Avoid variables with hidden meanings, a practice known as "hybrid coupling".

The Power of Variable Names

An effective technique for coming up with a good name for a variable is to state in words what the variable represents. Variable names should be as specific as possible. Variable names averaging 10 to 16 characters minimize the debugging effort. The part that gives the variable most of its meaning should be at the front. An exception is made for num:

  • numCustomers: number of customers
  • customerNum: number/index of the current customer

To avoid confusion use count and index instead of num.

Loop indexes: score[teamIndex][eventIndex] instead of score[i][j].

Rename success variables with a more specific name, for example: found, processingComplete. For boolean variables, put is in front of the name. A benefit of this is that it won't work with vague names: isStatus?.
Use positive boolean variable names. Any convention is better than no convention.

If you can't read your code to someone over the phone, rename your variables. If you have similar sounding variable names, rename your variables.

Fundamental Data Types

The only literals that should occur in the body of a program are 0 and 1. Any other literals should be replaced with something more descriptive.

Use boolean variables to document your code: bool finished, bool repeatedEntry. Assign an expression that is being tested to a variable that makes the implication of the test unmistakable.
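A small Python sketch of the idea, reusing the finished and repeatedEntry names from the note above (in snake_case). The function and its parameters are hypothetical:

```python
def should_stop(element_index: int, last_element_index: int,
                entry: str, previous_entries: set) -> bool:
    # Each flag names what its test means, so the final
    # expression reads like a sentence.
    finished = element_index >= last_element_index
    repeated_entry = entry in previous_entries
    return finished or repeated_entry
```

Compare that with returning `element_index >= last_element_index or entry in previous_entries` directly: same behavior, but the reader has to work out what each half means.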

Define the first and last entries of an enumeration to use as loop limits. Example:

    country_first = 0
    country_china = 0
    ...
    country_usa = 6
    country_last = 6

    for i = country_first to country_last
    ...

Reserve the first entry in an enum as invalid. Declaring the element that's mapped to 0 to be invalid helps to catch variables that were not properly initialized. Be careful when assigning explicit values to enum entries: gaps between the values make a loop from the first to the last entry hit invalid values.
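Both enum tips combined in a Python sketch using the standard enum module. The member list is shortened, FRANCE is a made-up filler entry, and the values are renumbered so that 0 can be the invalid entry:

```python
from enum import IntEnum


class Country(IntEnum):
    INVALID = 0  # reserved so an uninitialized value of 0 is easy to spot
    CHINA = 1
    FRANCE = 2   # hypothetical filler entry
    USA = 3
    # Aliases for the loop limits; they add no new members.
    FIRST = 1
    LAST = 3


def valid_countries() -> list:
    # Loop from the first to the last entry, skipping INVALID.
    return [Country(value) for value in range(Country.FIRST, Country.LAST + 1)]
```

Because the values have no gaps, every integer between FIRST and LAST maps to a real member. With gaps, Country(value) would raise a ValueError in the loop.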

Random accesses in arrays are similar to random gotos in a program. Consider using container classes that you can access sequentially - sets, stacks, queues, ... - as alternatives before you automatically choose an array.
Watch out for index cross talk: when you use nested loops it's easy to confuse the i and j counters.

Unusual Data Types

Use a structure to simplify operations on blocks of data. Avoid passing a structure as a parameter when only 1 or 2 fields from the structure are needed, pass the specific fields instead.

  • Isolate pointer operations in routines and classes.
  • Keep allocation and deallocation symmetric.
  • Use dog-tag fields to check for corrupted memory or add explicit redundancies.
  • Delete pointers in a linked list in the right order.

Organizing Straight-Line Code

  1. Try to write code without order dependencies.
  2. Try to write code that makes order dependencies obvious.
  3. If order dependency isn't explicit enough, document it.

Add asserts with booleans to check if data is initialized, but keep in mind that with extra code come extra possibilities for errors and bugs.

Keep related actions together.

Using conditionals

In conditionals, put the case you normally expect to process first. Put code that results from a decision as close as possible to the decision.
Research shows that 50 to 80% of if statements should have had an else clause.

Create a final else clause with an assert to catch cases you didn't plan for.
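A sketch of such a final else in Python. The shipping methods and rates are made-up values for the illustration:

```python
def shipping_cost(method: str) -> float:
    # Rates are made-up numbers for illustration.
    if method == "standard":
        return 5.0
    elif method == "express":
        return 15.0
    else:
        # Catch any case that was not planned for.
        raise AssertionError(f"unexpected shipping method: {method!r}")
```

I raise AssertionError explicitly here instead of writing `assert False`, because plain assert statements are removed when Python runs with the -O flag.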

Effective ordering of cases:

  • If all cases are equally important, order alphabetically or numerically.
  • Put the normal case first, add a comment.
  • Order by frequency.

Use the default clause only to detect legitimate defaults. If it isn't used, use it to detect errors.
If the default clause is used for some other purpose than error detection the implication is that every case selector is correct. This means that every value that could possibly enter the case statement would be legitimate.

Controlling Loops

The for loop is for simple uses. Most complicated looping tasks are handled by a while loop.
Make each loop perform only one function. If it seems inefficient to write two loops:

  • Write the code as two loops.
  • Comment that they should be combined for efficiency.
  • Wait until benchmarks show that the section of the program poses a performance problem.

Break: Terminate loop.
Continue: Skip the rest of the loop body and continue executing at the beginning of the next iteration.

Efficient programmers do the work of mental simulations and hand calculations.

Make your loops short enough to view all at once. They should rarely be longer than 15 to 20 lines and nesting should be limited to three levels.
When coding a complex loop, start with 1 case using literals. Then indent it, put a loop around it and replace the literals with loop indexes. If necessary put another loop around it and replace the literals again. Start from the inside and work your way out.

Unusual Control Structures

Use return when it enhances readability.
Replace nested ifs with guard statements. If the language does not have a guard statement, use a negated if (if !condition) with an early return.
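Python has no dedicated guard statement, so this sketch uses negated ifs with early returns. The discount rule and the dictionary keys are invented for the example:

```python
from typing import Optional


def discount_nested(customer: Optional[dict]) -> float:
    if customer is not None:
        if customer.get("active"):
            if customer.get("orders", 0) > 10:
                return 0.1
    return 0.0


def discount_with_guards(customer: Optional[dict]) -> float:
    # Each guard handles one unusual case and exits early.
    if customer is None:
        return 0.0
    if not customer.get("active"):
        return 0.0
    if customer.get("orders", 0) <= 10:
        return 0.0
    return 0.1  # nominal case, no longer buried in nesting
```

Both functions return the same results. The second one keeps the nominal path at the lowest indentation level.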

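Limit recursion to one routine. Cyclic recursion (A calls B, which calls A) is hard to follow.
Keep an eye on the stack when recursing: use new to create objects on the heap rather than letting the compiler create auto objects on the stack.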
Consider alternatives to recursion before using it!

Rewrite goto with a state variable:

    DO ACTION
    IF STATUS = SUCCEED
        DO NEXT ACTION
        ...

This eliminates the need for nested ifs and complicated if ... else structures.

Table-Driven Methods

Ways to look up an entry in a table:

  • Direct access.
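  • Indexed access: Look the key up in a small index table first and use the result to reach the main table. Gaps then waste space only in the cheap index table instead of in the bigger main table.
  • Stair-step access: When entries are valid for ranges of data rather than for exact keys, for example floating-point values.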
General Control Issues

Compare boolean values to true and false implicitly.
Put boolean tests into a well-named function even if a test is only used once.
Convert negative expressions to positive, change a variable name to create a positive expression. For example: Rename !statusOk to errorDetected.
In Java: Know the difference between a == b and a.equals(b).

Use blocks to clarify your intentions regardless of whether the code inside the block is 1 line or 20 lines.

Few people can understand more than 3 levels of nesting ifs. Simplify nested ifs by retesting part of the condition.

Case statements virtually always indicate poorly factored code.

A program's complexity is defined by its control flow.
Measuring complexity by counting decision points in a routine:

  • Start at 1.
  • Add 1 for each of the following keywords: if, while, repeat, for, and, or
  • Add 1 for each case in a case statement

0-5: Routine is fine.
6-10: Think about ways to simplify.
10+: Break routine into a second routine and call from the first.
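To try the counting rule on real code, here is a rough Python sketch built on the standard ast module. It is an approximation under my own assumptions: it counts if/elif, while, for, and, or, and it ignores constructs such as conditional expressions, comprehensions and match statements:

```python
import ast


def decision_points(source: str) -> int:
    """Count decision points roughly as the list above describes."""
    count = 1  # start at 1 for the straight path through the routine
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While, ast.For)):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" holds two operators in one BoolOp node.
            count += len(node.values) - 1
    return count


SAMPLE = '''
def classify(x, y):
    if x > 0 and y > 0:
        return "both"
    for item in range(x):
        if item == y or item == 0:
            return "hit"
    return "none"
'''
print(decision_points(SAMPLE))  # prints 6
```

The sample scores 6 (1 to start, two ifs, one for, one and, one or), which lands in the "think about ways to simplify" band.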

Structured programming: You can build any program out of a combination of sequences, selections and iterations.

The Software-Quality Landscape

Quality gates: Periodic tests or reviews that determine whether the quality of the product at one stage is sufficient to support moving onto the next.
Programmers have high achievement motivation: they will work to the objectives specified, but they must be told what the objectives are.

The defect detection rate of unit and integration tests is around 30 to 35%. Unit, system and functional testing have a cumulative defect detection of less than 60%. Code reviews were several times as cost-effective as testing.

Code inspection: One-step technique
Testing: Two-step technique

One-step techniques are substantially cheaper overall than two-step ones.

General principle of software quality: Improving quality reduces development costs. The industry-average productivity is about 10 to 50 lines of code delivered per person per day. Software defect removal is the most expensive and time-consuming form of work for software.

Quality is free in the end:

  • Needs reallocation of resources.
  • Prevent defects cheaply instead of fixing them expensively.

Collaborative Construction

Developers insert an average of 1 to 3 defects per hour into their designs and 5 to 8 defects per hour into code.
The cost of full-up pair programming is 10 to 25% higher than solo development. Reduction in development time is on the order of 45%. Each hour of inspection prevents about 100 hours of related work. When people know their work will be reviewed, they scrutinize it more carefully.

Key to success with pair programming:

  • Support pair programming with coding standards.
  • Don't let it turn into watching.
  • Don't force it for the easy stuff.
  • Rotate pairs and work assignments regularly.
  • Encourage pairs to match each other's pace.
  • Make sure both partners can see the monitor.
  • Don't force people who don't like each other to pair.
  • Avoid pairing all newbies.
  • Assign a team leader.

Formal inspections

  • Is the technical work being done?
  • Is the technical work being done well?

Having more than 2 or 3 reviewers doesn't increase the number of defects found.

Procedure of an inspection:

  1. Planning
  2. Overview
  3. Preparation
  4. Inspection meeting
  5. Inspection report
  6. Rework
  7. Follow-up
  8. Third-hour meeting

Don't discuss solutions during meetings. Stay focused on identifying defects. Do not criticize the author of the design or the code. The author should not try to defend the work under review. Each reviewer must respect the author's ultimate right to decide how to resolve an error.

Inspections are more focused than walkthroughs and generally pay off better.

Code reading: 3.3 defects per hour of effort.
Testing: 1.8 defects per hour of effort. Code reading focuses more on individual review.

Developer Testing

Black-box testing: Tester cannot see the inner workings.
White-box or glass box testing: Tester is aware of the inner workings.

If you want to improve your software don't just test more, develop better. Developer testing should probably take 8 to 25% of total project time.
It's hard to write a test case for a poor requirement.

Clean test: Test whether the code works.
Dirty test: Test for all the ways the code breaks.

Classes of bad data:

  • Too little (or no data)
  • Too much
  • Wrong kind of data
  • Wrong size of data
  • Uninitialized data

80% of the errors are in 20% of the code.
50% of the errors are in 5% of the code.

General principle of software quality: Improving quality improves the development schedule and reduces development costs.

95% of errors are caused by the programmer.

It's cheaper to build high-quality software than it is to build and fix low-quality software.

Debugging

Is your debugging approach weak? Do you feel anguish and frustration?

Analyzing and changing the way you debug might be the quickest way to decrease the total amount of time it takes you to develop a program.
Finding the defect and understanding it is usually 90% of the work.
Classes that had defects before are likely to continue to have defects.

Confessional debugging: Discover your own defect in the act of explaining it to another person.

Syntax errors:

  • Don't trust the line numbers in compiler messages.
  • Don't trust the compiler messages.
  • Don't trust the compiler's second message.

Defect corrections have a more than 50% chance of being wrong the first time.
Before you fix a problem make sure you understand it to the core. The debugger isn't a substitute for good thinking.

Refactoring

If quality is degrading it is a warning that a program is evolving in the wrong direction.
Cardinal rule of software: Evolution should improve the internal quality of the program.
Copy and paste is a design error.

Watch for a chain of routines passing tramp data. Tramp data is data passed to one routine only so that routine can pass it on to another routine.

Don't document bad code, rewrite it. Be aware of setup and takedown code.

The best way to prepare for future requirements is not to write speculative code.

Create and use null objects instead of testing for null values. For example: if a customer whose name is unknown should be addressed as "occupant", don't test for null in every client. Let the Customer class itself supply "occupant" for the unknown customer.
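One way the occupant example might look in Python. UnknownCustomer and address_label are hypothetical names chosen for the sketch:

```python
class Customer:
    def __init__(self, name: str) -> None:
        self.name = name


class UnknownCustomer(Customer):
    """Null object: stands in for a customer whose name is not known."""

    def __init__(self) -> None:
        super().__init__("occupant")


def address_label(customer: Customer) -> str:
    # No "if customer is None" test needed in client code.
    return f"To: {customer.name}"


print(address_label(UnknownCustomer()))  # prints To: occupant
```

The decision about what an unknown customer is called now lives in one class instead of being repeated at every call site.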

Separate query operations from modification operations. A method like GetTotals should only query. If it changes an object's state separate the query functionality from the state-changing functionality.
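A minimal Python sketch of keeping the two kinds of operation apart. The Invoice class is a hypothetical stand-in for whatever object owns the totals:

```python
class Invoice:
    def __init__(self) -> None:
        self._lines = []

    def add_line(self, amount: float) -> None:
        # Modification: changes state, returns nothing.
        self._lines.append(amount)

    def get_total(self) -> float:
        # Query: returns a value, leaves state untouched.
        return sum(self._lines)
```

Because get_total has no side effects, it can be called any number of times, in any order, without changing the result.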

A routine should return the most specific type of an object.

Class A calls class B which has a reference to class C.
Should class A call class C? Ask yourself what the right abstraction is for class A's interaction with class B. Remove a middleman.

Programmers have more than 50% chance of making an error on their first attempt to make a change.
Treat simple changes as if they were complicated.

Code-Tuning Strategies

Performance is only loosely related to code speed.
No one but you and other programmers care how tight your code is.
Efficient code isn't necessarily better.

20% of a program consumes 80% of its execution time. Less than 4% of a program accounts for 50% of its runtime. The part that needs to be perfect is usually small.

Jackson's rules for optimization:

  1. Don't do it.
  2. (For experts only) Don't do it yet: not until you have a perfectly clear unoptimized solution.

Experience doesn't help much with optimization. The only result you can be sure of without measuring performance is that the code becomes harder to read. More than half of all code-tuning attempts yield negligible improvements or degrade performance.

Code-Tuning Techniques

Sentinel value: Value you put just past the end of the search range and that's guaranteed to terminate the search.
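The sentinel idea, sketched in Python. This only illustrates the technique: in Python itself the built-in list.index is the practical choice, and the function name is hypothetical:

```python
def find_with_sentinel(values: list, target) -> int:
    """Return the index of target in values, or -1 if it is absent."""
    values.append(target)  # sentinel just past the end of the search range
    index = 0
    while values[index] != target:  # no bounds check needed in the loop
        index += 1
    values.pop()  # remove the sentinel, restoring the caller's list
    return index if index < len(values) else -1
```

The loop tests one condition per element instead of two (match and end of range). If the search stops only at the sentinel, the target was not in the original data.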

Reducing strength: Replacing an expensive operation with a cheaper one. Example:

    if sqrt(x) < sqrt(y)

Replace with:

    if x < y

For non-negative x and y the two tests are equivalent, and the second one skips both square roots. Big performance gain!

For many common kinds of programs, computers have become so powerful that this level of performance optimization is irrelevant. The first optimization is often not the best. Keep looking for one that's better.

How Program Size Affects Construction

Number of errors increases dramatically as project size increases.

Small project: Construction takes 65% of the time.
Medium size project: Construction takes 50% of the time.
Very large project: Construction becomes less dominant.

Software program: Used by the person who created it and a few others.
Software product: Program intended to be used by people other than the programmer.

A software product costs 3 times as much as a software program to develop. A product needs extra polish compared to a program.

Don't estimate creating a 32 000 lines of code program based on your experience of developing a 2 000 lines of code program.
Successful project planners choose their strategies for large projects explicitly.

1 000 lines of code: 7% of its effort on paper.
100 000 lines of code: 26% of its effort on paper.

Managing Construction

Start with small methods and scale them up instead of starting with an all-inclusive method and paring it down.
One of the most successful projects of all time: 83 000 lines of code. It had 1 system error in the first 13 months. The key to the success was that the identification of all computer runs was public instead of private.

SCM: Software Configuration Management.

Off-the-cuff estimates are often mistaken by a factor of 2 or more. A significant percentage of projects would be on time if they accounted for the impact of untracked but agreed-upon changes.

Don't let your fear of bureaucracy stop you. The average project is 1 year late and 100% over budget. Developers' estimates tend to have an optimism factor of 20 to 30%.

Approach to estimating a project:

  • Establish objectives.
  • Allow time for the estimate and plan it. Rushed estimates are inaccurate estimates.
  • Spell out requirements.
  • Estimate at a low level of detail. The more detailed your examination, the more accurate your estimate.
  • Reestimate periodically.

The average project overruns its planned schedule by 100%. Projects don't make up lost time, they fall further behind.
Adding people to a late software project makes it later. But if a project's tasks are partitionable you can divide them further and assign them to different people.

For any project attribute, it's possible to measure that attribute in a way that's superior to not measuring at all.
What gets measured gets done. Measurement has a motivational effect.
Collect data for a reason.

Good programmers tend to cluster, so do bad programmers.
Programmers who performed in the top 25% had bigger, quieter, more private offices and fewer interruptions.

In a hierarchy every employee tends to rise to his level of incompetence.

Don't distract your manager with unnecessary implementation details.

Integration

Phased integration is a big bang integration. Integration happens late in the project.
Incremental integration helps a project build momentum. It's like a snowball going downhill. You encounter fewer problems at once.
Top down integration: The class at the top of the hierarchy is written and integrated first. The most troublesome errors to debug arise from simple interactions between classes.

Integrate classes that exercise the system interfaces early to minimize problems bubbling up to the top.

Integration: Make up a unique approach/strategy tailored to your specific project.
A daily build feels like it slows progress but the project team gets a more accurate picture of how it's been working all along. Daily builds enforce discipline and keep pressure-cooker projects on track.

Testing buddy: Tester who focuses on that developer's code.

For most projects, literal continuous integration is too much of a good thing.

We will always need people who can bridge the gap between the real-world problem to be solved and the computer that is supposed to be solving the problem.

Layout and Style

Good visual layout shows the logical structure of a program. A good layout scheme tells the same story to the human that it tells to the computer.
White space is grouping. A paragraph of code should contain statements that accomplish a single task. The optimal number of blank lines is 8 to 16%. The optimal indentation is 2 to 4 spaces. An indentation of 6 spaces looks pleasing but turns out to be less readable. This is a collision between aesthetic appeal and readability.

Use more parentheses than you think you need.

Fundamental theorem of formatting: Formatting should show the logical structure of the code.
Outdated rule: limit statement length to 80 characters. It's probably all right to exceed it occasionally. A single 90-character line is usually more readable than a line broken into two.

Extra spaces hardly ever hurt.

Self-Documenting Code

Can you treat the class as a black box?
Normal case should follow the if instead of the else.

Good comments don't repeat the code or explain it. They clarify its intent. Comments should explain what you're trying to do. If it's hard to comment, it's either bad code or you don't understand it well enough. Poor comments are worse than no comments. You'd rather have accurate comments than nice-looking ones.

If you use pseudocode to clarify your thoughts, coding is straightforward.

There should be 1 comment every 10 statements, this is the comment density at which clarity seems to peak. Comments should explain why the code works now, not why it didn't work at some point in the past. Think about what you would name the routine that did the same thing as the code you want to comment.

Focus your documentation efforts on the code itself. A comment should always precede the code it describes. Don't document bad code, rewrite it.

If a loop is complicated enough to need an end-of-loop comment, treat the comment as a warning sign.
Comment any assumption you made of the state of the variables you retrieve.

Personal Character

If you can't learn at your job, find a new one. Make learning a continuous commitment.

People will have an emotional investment in the design because they will already have written code for it.

A professional programmer writes readable code.

Themes in Software Craftsmanship

Difficulties in:

  • Writing comments.
  • Naming variables.
  • Decomposing the problem into cohesive classes with clear interfaces.

These difficulties indicate that you need to think harder about the design before coding.

The single most common cause of not finding errors was simply overlooking them.