Home > Computing > Boilerplate

Boilerplate

Programming languages have different levels of verbosity. Some languages have terse syntax, so you need less text to express what you want the computer to do. Others require you to repeatedly type elaborate constructs, often multiple times, to achieve the same.

Usually you don’t have a choice of programming language. You are hired by a company who already has some existing code and you have to work with that code base. Or you are targeting a specific platform and you have no choice, but to use a particular language.

Regardless of the language you use, you still have to make many choices when designing the software you write, and the choices you make will contribute to the size of the source code and may indirectly affect maintainability, extensibility and robustness.

So what makes a program a good program? I have one theory.

Copy&paste

They don’t teach how to write good programs in schools. In most schools they only teach you the mechanics of programing: they show you the tools, but they don’t teach you how to use them effectively.

My programming adventure started in high school with Turbo Pascal. One of my first projects was a simple game. One time I found a bug and I realized, that I have already fixed it once in another function. I noticed that both pieces of code which had the bug were originally copied from another function.

This was one of my first lessons, and as a programmer you never stop learning. The lesson learnt was that copy&paste approach to programming is a bad practice. If you had to modify one piece of a copied code for whatever reason, you likely have to modify all of them – that’s a lot of unnecessary manual labor, which is something programmers hate. If you just wrote a new expression, which looks similar to or exactly like an existing piece of code, you should instead put it in a new function and call it in both places.

Summary: copy&paste == bad programming practice.

Beyond copy&paste

Not too long ago I’ve been reading some articles criticizing C++, the language I use the most. One of the rightful points was that C++ needs you to type the same code at least twice in more than one place. A typical location of duplicate code is class definitions. First you define a class in a header file, so you type the function declarations there, then you type exactly the same function signatures in a source file where you define the functions.

class Vehicle {
public:
    void StartMotor();
    void Accelerate(double acc);
};

void Vehicle::StartMotor() // you had to type this again!
{
    :::
}

void Vehicle::Accelerate(double acc) // and this too!
{
    :::
}

If you later need to modify a function, e.g. change the number of arguments or their types, you have to do it at least twice. It is actually even worse if you have a derived class and you overload virtual functions. To change the interface, you have to change it in 2*C places, where C is the number of classes which declare that function.

Yet worse, it may happen that you change the function’s signature in the derived class, but forget to change it in the base class. In result, you will have a bug in your program. If you use a pointer to the base class to call the function, the function from the derived class will not be called, since it has a different signature. Fortunately modern compilers issue a warning when this happens, but you still have to write the same piece of code twice.

Sounds like copy&paste? Well, that’s how I write new classes in C++, I define class’ functions in a header file, then copy them to a source file, then use a macro in my editor to expand them by removing semicolons and adding braces on new lines.

Boilerplate code

Welcome to boilerplate code. The definition of boilerplate code is exactly that: redundant code, which you have to write to make a well-formed program source code, but which is unnecessary from your perspective as a programmer.

But the definition of boilerplate code extends beyond what you actually have to write to make a well-formed program. Consider a C++ program where you use a well-known libpng library to load a PNG image. In the basic version, you could write a single function like this:

bool LoadPng(const char* filename, std::vector< char>* image)
{
    :::
}

Inside the function you call the PNG library, which has the C interface, to verify whether the file exists, is a PNG, you load the headers, determine dimensions of the image and color format, and finally you load the image data. Without going into the details, here is how a piece of that function could look like:

bool LoadPng(const char* filename, std::vector< char>* image)
{
    :::

    png_structp png_ptr = png_create_read_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL);
    if (!png_ptr)
    {
        printf("Failed to create png structure\n");
        return false;
    }

    png_infop info_ptr = png_create_info_struct(png_ptr);
    if (!info_ptr)
    {
        printf("Failed to create png info structure\n");
        png_destroy_read_struct(&png_ptr, (png_infopp)NULL, (png_infopp)NULL);
        return false;
    }

    :::
}

This is maybe 10% of the code you have to write to load a PNG file. There are only two lines of code above, which actually do something potentially useful or necessary. The rest of the lines are boilerplate code.

Especially please notice that every time an error can potentially occur, you have to handle it. So you have to write error handling code many, many times. If an error occurs later in the function, you will have to delete the allocated data structures before returning from the function, and you have to write exactly the same code many times, once for every function which could fail.

I bet that there are many programs out there, which have a bug in their PNG loading code and don’t handle the error conditions fully correctly. So in some circumstances these programs will leak memory or behave unpredictably.

Now, the situation above can be improved by writing a C++ wrapper class for loading PNG.  But I wouldn’t go too far with it, or we will just shift the boilerplate code elsewhere, instead of reducing it.

I can imagine somebody writing a PNG loader class, and declaring one function of the class per function of the libpng library. Such PNG loader class would simply build a C++ interface over the library’s C interface. That approach may be appealing to some, but the problem is that 90% of that class will be… boilerplate code! There will only be one single place in the whole program where this class would be used – the LoadPng() function. So all that code would be written in vain and only be a maintenance chore, plus a potential place for bugs to hide. Moreover, the compiler would generate much more unnecessary code, contributing to the program’s final size.

class PNGLoader {
public:
    PNGLoader();
    ~PNGLoader();
    static bool SigCmp(...);
    void CreateReadStruct(...); // throw on error
    void CreateInfoStruct(...); // throw on error
    :::
    void ReadImage(...); // throw on error
};

A fact of life is that many programmers call the above approach a “design”. They create beautiful class designs and hierarchies, which are only taking space and engineering time, but contribute little to the program.

And if you happen to be a C++ hater, please know that the above problem affects not only object-oriented languages, but all programming languages in general. Programmers often tend to put too much work and thought into the form instead of focusing on the contents.

So it seems to me that the best approach is to preserve a balance between the amount of code and functionality. Sure, you can write a beautiful command line argument parser, but what good is it if your program only handles two arguments anyway? Handle them correctly, but avoid too much boilerplate code which you will never use.

In case of the PNG loader, a good choice is a function like LoadPng(), which inside uses something like Boost.ScopeExit to handle errors and corner cases. Boost.ScopeExit is actually a good way of safely handling many kinds of resources in C++.

Quality

In general, programs in source code form consist of:

  1. Comments and whitespaces, generally harmless if used wisely and not to comment out dead code,
  2. Data structures, describing the internal state of the program,
  3. Algorithms, which are mathematical transformations of the program state, and last but not least:
  4. Boilerplate code, which clutters the programs, makes them harder to understand, hides bugs and generally causes programs to be big and slow.

To write good programs, avoid boilerplate code like the plague. It’s not the only rule for writing good programs, but I think it’s an important one.

Advertisements
Categories: Computing
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: