Preface | Scalable C (in progress)

Why This Book Exists

The C programming language does not have a sense of humor. If you write in C, you know that it does not forgive mistakes. It does not try to interpret what you mean. It does what you tell it, no more and no less. In return, it gives you full control over the results of your work.

Modern languages focus on comfort, abstraction, and automation. C, which was born around 1970, focuses on minimalism, portability, and performance. Well-written C code can run on a $1 embedded computer as well as on a massive server.

If you know C well enough to understand these trade-offs, then you know where C stops working, as a language. C has many problems, yet three stand out:

While C lends itself to building libraries, it has no consistent API model. This makes C code much harder to read and understand than it should be.
The standard approach for concurrency is POSIX threads that share their state. This is complex and fragile. We know how to do this better, using message passing between threads.
To compile and link C code for arbitrary platforms is a complex black art. This creates a real cost for C projects. Even CMake, perhaps the best cross-platform answer, uses autotools to bootstrap itself.

This adds up to extra effort and cost for anyone using C. It is uneconomic to write large applications in C. Even for system-level applications, many people prefer C++, Go, and Erlang. Yet there are good reasons to use C, which are not going away.

The most powerful argument for using C is that it works well with all other languages. This is a result of its age, and its wide use as a low-level systems language. If you make a library in C, you can offer it to developers in every one of the hundred most popular languages.

Over time, C's relative popularity is falling. The high costs of using C in the real world of the 21st century are throttling it.

Yet we have solved these problems. We have good, tested answers. Today these answers are still well-hidden, and known only to a few people. This book aims to change that. It aims to bring C into the 21st century and make it a cheap, useful material in which to build.

What is "Scalable C?"

We use C most often to write libraries, which we then call from applications in other languages. This layer of C libraries sits between the operating system and the application.

This layer provides security, user interfaces, audio and video, maths, graphics, databases, communications, compression, and so on. I call this the "fabric layer."

For the most part, this fabric layer sees the world as a single processor. It has no concept of concurrency. It cannot take advantage of many cores on a machine, let alone many machines in a cloud. Every library has its own style, standards, and API model. Every library has a custom build process.

A scalable technology can solve large problems as well as small ones. Our current fabric layer is not scalable. It costs too much to write and to use.

What I will explain in this book is how to build a scalable fabric layer, written in "Scalable C."

Scalable C has specific properties:

It is cheap to create a Scalable C project.
It is cheap to use, with consistent and obvious APIs.
It is cheap to deploy, with powerful tools and packagers.
It is cheap to scale to many cores, with actor-based concurrency.
It is cheap to scale to many servers, with clustering across a cloud or data center.
It is cheap to build community, with a modern collaborative process.

Scalable C is standard portable C plus a mix of other technologies:

The CLASS RFC defines the Scalable C language style.
ZeroMQ provides message passing between threads and processes.
CZMQ provides a core C library for Scalable C.
Zyre provides local-area clustering for Scalable C projects.
zproject provides packaging (builds, bindings, and distributions).
zproto provides client-server meta-programming.
The C4.1 RFC defines a collaborative process for scalability.

How This Book Works

This book takes the same approach that I take in distributed programming workshops. That is, start with simple worked examples, and then add more and more depth. Each step aims to answer a problem you'll hit soon, or have already hit.

We will see a lot of example code. All the examples work, and you can build and play with them. The Scalable C repository holds this book, and the code. If you find things you want to change, just send a pull request. I'll explain how that works, when we get started.

If you read the whole book and follow the examples, you will learn how to:

Write C code using the Scalable C style, called CLASS.
Build and package your C projects, using zproject.
Use the CZMQ generic list and hash containers.
Pass messages between threads and processes, using ZeroMQ.
Write non-blocking multithreaded C code as CZMQ actors.
Design good APIs and wire-level protocols.
Use git to collaborate with others on a project.
Build an open source community.
Make secure encrypted communications.
Build clustering across a local network, using Zyre.
Build multithreaded clients and servers, using zproto.
Generate C code using model oriented programming.
Use your C code from other languages, including Java.
Build and ship your C code for Android.
Write portable code that runs on all platforms.

This sounds like a lot, and it might be, if I had to explain everything from scratch. I'll keep things simple by focusing on patterns that work, without too much argumentation. For example I'll explain patterns for using git, that avoid the most common pitfalls. I expect you to be able to learn git yourself.

Before You Start

Here is a list of ingredients:

One working PC. It does not need to be new, or fancy.
An operating system you are comfortable with. Linux will give you the best results. OS/X and Windows are usable if you have no choice.
An Internet connection, at least to get started.
A GitHub account. If you are not already registered on github.com, do that now.
Conversational Bash skills. You can run commands, install packages, and so on.
A basic knowledge of C. You at least understand pointers, and the standard library.
A basic knowledge of compute models. You have written programs as a job for a few years at least.

Here's my current set-up:

A second hand X220 Thinkpad from LapStore. Costs about EUR 300, with an SSD. It's not the lightest or fastest laptop. Yet the battery lasts all day and it runs Linux well.
Ubuntu Linux with default configuration.

To start with, you need at least these packages:

git-all -- git is how we share code with other people.
build-essential, libtool, pkg-config - the C compiler and related tools.
autotools-dev, autoconf, automake - the GNU autoconf makefile generators.
cmake - the CMake makefile generators (an alternative to autoconf).

Plus some others:

uuid-dev, libpcre3-dev - utility libraries.
valgrind - a useful tool for checking your code.

Which we install like this (using the Debian-style apt-get package manager):

sudo apt-get install -y
    git-all build-essential libtool \
    pkg-config autotools-dev autoconf automake cmake \
    asciidoc uuid-dev libpcre3-dev valgrind

The LearnXinYMinutes project has good quick guides to many languages. Here are its guides to Bash, to C, and to git.

Before you use git, on a new laptop, always tell it your name, and email address. Use the same email address for your GitHub account:

git config --global user.name "Your Name"
git config --global user.email your.name@example.com

Why not C++?

Don't laugh. This is a serious question people sometimes ask, even when "C" is clearly in the title of the book. The answer is roughly: "C++ encourages you to make worse code than even C does."

Learning a large language (and C++ is a large language) is like memorizing the first thousand prime numbers. It is to fill your brain with junk without benefit. Yes, it is good to learn, for the sake of learning. Yet to learn complexity is like joining a cult. You start to think the knowledge is worth something for its own sake.

The C language is small and yet it takes years to master it. I wrote this book to speed people along that path. Yet inevitably, your first projects will be weak, no matter how smart you are. If you're coding every day you'll be decent after five years, and good after ten. And after twenty years you may become great.

Yet during that process, if you can keep it going, you must be making useful things, from day one. In a small language this is doable. You can learn enough to contribute to projects, or start your own, in a few days or weeks. It is like learning to tap a metal triangle. It adds to an orchestra, if you stay on rhythm.

C++ is a language that speaks to the inner intellectual. The more C++ you know the worse you become at working with others. First, because your particular dialects of C++ tend to isolate you. Second, because you sit in an ivory tower that few can approach. This is a problem with all highly abstract languages.

Any language that depends on inheritance leads you to build large monoliths. Worse, there are no reliable internal contracts. Change one piece of code and you can break a hundred.

I'll explain later how we design classes in C, so we get neatly isolated APIs. We don't need inheritance. Each class does some work. We wrap that up, expose it to the world. If we need to share code between classes, we make further APIs. This gives us layers of classes.

This gives us a neat, compact syntax. Let's take one example to compare C++ and C. We'll make a linked list and push some values to it, then print them out.

First, in C++:

#include <iostream>
#include <list>
using namespace std;

int main ()
{
    list<string> List;
    List.push_back ("apple");
    List.push_back ("orange");
    List.push_front ("grape");
    List.push_front ("tomato");
    cout << List.size () << ": ";

    list<string>::iterator i;
    for (i = List.begin (); i != List.end (); ++i)
        cout << *i << " ";
    cout << endl;
    return 0;
}

And now in Scalable C:

#include <czmq.h>

int main (void)
{
    zlist_t *list = zlist_new ();
    zlist_append (list, "apple");
    zlist_append (list, "orange");
    zlist_push (list, "grape");
    zlist_push (list, "tomato");
    printf ("%zd: ", zlist_size (list));

    char *fruit = (char *) zlist_first (list);
    while (fruit) {
        printf ("%s ", fruit);
        fruit = (char *) zlist_next (list);
    }
    puts ("");
    zlist_destroy (&list);
    return 0;
}

The C code is a little more verbose. It has to pass the list object to every method. It cannot do tricks like overloading "++" to move the list pointer. It must destroy its own objects. Yet it's clear, and explicit.

Sure, there are more compact ways of writing this example, both in C and in C++. That isn't the point.

I'm not claiming that Scalable C can do everything C++ can. Nor will it be as compact. Yet it is immediately familiar, explicit, and transparent. I'd much rather write and read 100 lines of code like this than 10 lines of code that rely on inheritance, operator overloading, and other syntactic tricks.