Why C is Almost Always the Wrong Choice

C has no true string data type.

The common arguments defending this as a feature rather than a shortcoming go something like this:

  • Performance. The argument here is that statically allocated, null-terminated char arrays are faster than anything that touches the heap, and that forcing programmers to manage their own memory yields huge performance gains.
  • Portability. The claim is that introducing a string type could create portability problems, since the semantics of such a type could differ wildly from architecture to architecture.
  • Closeness to the machine. C is intended to be as “close to the machine” as possible, providing minimal abstraction: since the machine has no concept of a string, neither should C.
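For anyone who hasn't been bitten by it, here is a rough sketch of what "null-terminated char array" means in practice. It uses Python's standard ctypes module to build a C-style buffer; the variable names are mine, chosen for illustration only. The point is that the C view of the data ends at the first NUL byte, while a string type that carries its own length keeps everything.

```python
import ctypes

# A C "string" is just a char array whose end is marked by a NUL byte.
# create_string_buffer adds one trailing NUL after the bytes we pass in.
buf = ctypes.create_string_buffer(b"ab\x00cd")

raw_bytes = buf.raw   # every byte in the array, terminator included
c_view = buf.value    # what C string functions see: bytes before the first NUL

# A string type that stores its length keeps all five bytes.
py_view = b"ab\x00cd"

print(len(raw_bytes), len(c_view), len(py_view))
```

Everything after the embedded NUL is invisible to code that treats the buffer as a C string, which is exactly the kind of surprise a real string type avoids.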

If these arguments are true, then we shouldn’t be using C for more than a tiny fraction of what it is being used for today. The reality of these arguments is more like this:

  • Performance: I’m a programmer of great hubris who actually believes that I can reinvent the manual memory management wheel better than the last million programmers before me (especially those snooty implementers of high-level languages), and I think that demonstrating my use of pointers, malloc(), gdb, and valgrind makes me look cooler than you.
  • Portability: I’m actually daft enough to think that the unintelligible spaghetti of preprocessor macros in this project constitutes some example of elegant, portable code, and that such things make me look cooler than you.
  • Closeness to the machine: I’ve never actually developed anything that runs in ring zero, but using the same language that Linus Torvalds does makes me look cooler than you.

The technical debt this attitude has incurred is tremendous: nearly every application that gets a steady stream of security patches is written in C, and the vast majority of those vulnerabilities are buffer overflows made possible by bugs in manual memory management code. How many times has bind or sendmail been patched for these problems?

The truth is that most software will work better and run faster with the dynamic memory management provided by high-level language runtimes: the best algorithms for most common cases are well known and have been implemented better than most programmers could ever manage. For most corner cases, writing a shared library in C and linking it into your application (written in a high-level language) is a better choice than going all-in on C. This isolates the unsafe code and leaves the majority of your application easier to read and easier for open-source developers to contribute to. And most applications won’t need any C code at all. Let’s face it: the majority of us are not writing kernels, database management systems, compilers, or graphics-intensive code (and in that last case, C’s strengths are very much debatable).
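To make the "small C library, high-level application" split concrete, here is a minimal sketch using Python's standard ctypes module. It assumes a POSIX system, where ctypes.CDLL(None) exposes the C library already loaded into the process, and it uses libc's strlen as a stand-in for whatever C routine you would actually write. The wrapper name c_strlen is hypothetical.

```python
import ctypes

# Assumption: a POSIX system, where CDLL(None) gives access to the
# symbols of the C library already loaded into the Python process.
libc = ctypes.CDLL(None)

# Declare the C signature so ctypes converts and checks the arguments.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Hypothetical wrapper: the one place that touches the C call."""
    if not isinstance(data, bytes):
        raise TypeError("c_strlen expects bytes")
    return libc.strlen(data)


print(c_strlen(b"hello"))  # prints 5
```

The rest of the application only ever calls the wrapper, so the unsafe surface is a handful of lines that can be reviewed in one sitting.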

The long and short of it is that most software today is I/O-bound and not CPU-bound: almost every single one of the old network services (DNS servers, mail servers, IRC servers, HTTP servers, etc.) stands to gain absolutely nothing from being implemented in C, and should be written in a high-level language so that it can benefit from run-time bounds checking, type checking, and leak-free memory management.
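Run-time bounds checking is easy to take for granted, so here is a small sketch of what it buys you. The helper read_at and the sample list are made up for illustration; the behavior shown (an IndexError on an out-of-range index) is standard Python.

```python
def read_at(items, index):
    """Hypothetical helper: return items[index], or None if out of range."""
    try:
        return items[index]
    except IndexError:
        # The runtime refused the access instead of reading whatever
        # happened to sit next to the list in memory.
        return None


queue = ["a", "b", "c"]
print(read_at(queue, 1))   # prints b
print(read_at(queue, 10))  # prints None
```

The equivalent out-of-range read on a C array is undefined behavior: it may crash, or it may quietly hand back someone else's data.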

Can I put out a CVE on this?