The Problem With Package Managers

As Linux moves farther away from its UNIX roots, and more towards being yet another appliance for the drooling masses (the same drooling masses who just five years ago couldn’t grok the difference between a CD-ROM tray and a cup holder), our once great proliferation of usable choices has dwindled. Developers tend to target only Debian- or Red Hat-based distributions, with a strong bias towards Ubuntu on the Debian side; a few of the more generous developers will also target SuSE, and even fewer will distribute software as a distribution-agnostic tarball. This situation leaves users of other distributions in a precarious position, especially those of us who, like the author of this article, believe that systemd is a baroque, labyrinthine monument to bogosity (how Lennart Poettering manages to get hired by any reputable software development firm boggles the mind; his other big “hit” is a three-coil, peanut-laden steamer of a solution-looking-for-a-problem called PulseAudio) and would seek out one of the increasingly rare sysvinit-based distributions to get away from it.

This is a problem mostly due to package managers. If you’re on a Debian-based system, you get apt. Red Hat, yum. SuSE, zypper. These utilities should need no introduction, and are often praised by Linux users: a single command will install a package and all of its required shared libraries and dependencies, and another command will upgrade packages to the latest and greatest versions, all from a centralized, cloud-based repository or list of repositories. They do provide some convenience, but at a cost: the days of reliably being able to find a simple tarball that will work with the incantation of ./configure; make; make install seem to be numbered. This was a nice, cross-platform solution, and had the added benefit of producing binaries that were well-optimized for your particular machine.

One bright light in all this darkness is the pkgsrc tool in NetBSD: you check out a full source tree from a CVS repository, and this creates a directory structure of categories (editors, databases, utilities, etc.), each containing further subdirectories that represent packages. All you need to do is descend into the desired subdirectory and type an appropriate make incantation to download the package and its dependencies, build them, and install them to your system. Updates are similar: fetch the latest updates from the CVS repo, and repeat the process.

However, not even pkgsrc has solved the other big problem with most package managers, and that is the politics of getting new packages into the repositories. The Node.js package manager, npm, is the only one that does this correctly (in the FOSS sense) in any way: you go to the npmjs.org website, create an account, choose a package name (and hope it hasn’t already been taken by another developer), and you are in charge of that little corner of the npm world. You manage your dependencies, your release schedule, your version scheme, the whole nine yards. With Linux distributions, it seems that only a blood sacrifice to the gatekeepers will allow you to contribute your own packages, and even when you get past their arcane requirements, it is still a mass of red tape just to publish patches and updated versions of your software. Node.js, for instance, has not been updated in the mainline distribution repositories since v0.10, which is by all measures an antique.

To meet my standards, three solutions should be employed together:

  • Publicly and brutally shame developers who release only deb and rpm packages but no ./configure; make; make install tarball until they are so insecure that they cry into their chocolate milk and do the right thing (or strengthen the developer gene pool by quitting altogether and opting for a job wiping viruses for drooling PC users with The Geek Squad)
  • Push the Linux distributions to abandon the brain-dead cathedral approach to repo management and opt for a more bazaar-like egalitarian approach like npm
  • Make countless, humiliating memes of Lennart Poettering in embarrassing and compromising contexts (this bit is more for the health of UNIX as a whole than for package managers, but it’s the duty of every good UNIX citizen)

 


Why C is Almost Always the Wrong Choice

C has no true string data type.

The common arguments defending this as a feature rather than a shortcoming go something like this:

  • Performance. The argument here is that statically-allocated, null-terminated char arrays are faster than accessing the heap, and by forcing the programmer to manage his own memory, huge performance gains will result.
  • Portability. This one goes along the lines that introducing a string type could introduce portability problems, as the semantics of such a type could be wildly different from architecture to architecture.
  • Closeness to the machine. C is intended to be as “close to the machine” as possible, providing minimal abstraction: since the machine has no concept of a string, neither should C.

If these arguments are true, then we shouldn’t be using C for more than a tiny fraction of what it is being used for today. The reality of these arguments is more like this:

  • Performance: I’m a programmer of great hubris who actually believes that I can reinvent the manual memory management wheel better than the last million programmers before me (especially those snooty implementers of high-level languages), and I think that demonstrating my use of pointers, malloc(), gdb, and valgrind makes me look cooler than you.
  • Portability: I’m actually daft enough to think that the unintelligible spaghetti of preprocessor macros in this project constitutes some example of elegant, portable code, and that such things make me look cooler than you.
  • Closeness to the machine: I’ve never actually developed anything that runs in ring zero, but using the same language that Linus Torvalds does makes me look cooler than you.

The technical debt this attitude has incurred is tremendous: nearly every application that gets a steady stream of security vulnerability patches is written in C, and the vast majority of those vulnerabilities are buffer overflows made possible by bugs in manual memory management code. How many times has bind or sendmail been patched for these problems?
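To make the failure mode concrete, here is a minimal sketch of the bookkeeping C demands for something as mundane as copying a string. The helper name copy_bounded is hypothetical (it comes from no particular project); it only illustrates that the programmer, not the language, has to carry the buffer size around and check for truncation on every single copy.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into dst without ever writing more
   than dstsize bytes (including the terminating NUL).
   Returns 0 if the whole string fit, 1 if it had to be truncated. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return 1;   /* no room even for the terminator */

    /* snprintf never writes past dstsize and always NUL-terminates;
       its return value is the length the full string would have needed. */
    int needed = snprintf(dst, dstsize, "%s", src);

    return (needed < 0 || (size_t)needed >= dstsize) ? 1 : 0;
}
```

Called with an 8-byte buffer, copy_bounded(buf, sizeof buf, "sendmail-8.x") stores the seven characters that fit and returns 1. An unchecked strcpy in the same spot would have written past the end of the buffer.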

The truth is that most software will work better and run faster with the dynamic memory management provided by high-level language runtimes: the best algorithms for most common cases are well-known and have been implemented better than most programmers could ever do. For most corner cases, writing a shared library in C and linking it into your application (written in a high-level language) is a better choice than going all-in on C. This provides isolation of unsafe code, and results in the majority of your application being easier to read, and easier for open-source developers to contribute to. And most applications won’t even need any C code at all. Let’s face it: the majority of us are not writing kernels, database management systems, compilers, or graphics-intensive code (and in the latter case, C’s strengths are very much debatable).

The long and short of it is that most software today is I/O-bound and not CPU-bound: almost every single one of the old network services (DNS servers, mail servers, IRC servers, HTTP servers, etc.) stands to gain absolutely nothing from being implemented in C, and should be written in high-level languages so that they can benefit from run-time bounds checking, type checking, and leak-free memory management.

Can I put out a CVE on this?

Using GT.M external calls to access shared libraries

Many times, when developing MUMPS applications, you may come upon a situation where you need a bit of functionality exposed by a Linux shared library. FIS GT.M provides support for this via its external call mechanism. The syntax and semantics are a bit odd, so we’ll step through the implementation of a wrapper for some of the trigonometry functions exposed by the C standard library. The example is intended to be complete and usable.

Assumptions on the Reader

I will assume that you have a reasonably recent GT.M release configured on a Linux system, and that you have access to a bash shell prompt. For those of you using VistA, a captive account which takes you directly into a VistA entry point is not sufficient for this tutorial, as you will be creating several files in the Linux filesystem with tools which don’t exist in, or are inaccessible from, GT.M programmer mode. If this applies to you, please see your local guru for help.

I will assume that your local GT.M routines are in $HOME/p, that $gtmroutines is set accordingly, that you have a $HOME/lib directory available, and that your GT.M environment is set up to the point where you can read and set MUMPS globals.

I will also assume that you have working knowledge of MUMPS and C programming, including basic knowledge of pointers and their use for the latter language.

I will also assume that you have gcc and make installed, and that you or your system manager has made them available in your search path. If you are working through this example on your own machine, here are some instructions for getting gcc and make running:

Ubuntu

$ sudo apt-get install build-essential

Other Distributions

For other distributions, install gcc and make using your distribution’s package manager.

First, let’s talk a bit about the overall architecture of the GT.M external call mechanism.

Architecture

The GT.M external call mechanism uses a layered architecture. The GT.M runtime looks in the GTMXC_yourcalltable environment variable to find the location of a .xc file which contains the GT.M to C mappings. The .xc file also contains a path to a shared library (a file ending with the .so extension) in which the external routines are defined. Here’s an abstract overview of what the architecture looks like:

GT.M Callout Architecture

Figure 1.1: Abstract GT.M external call architecture

If we were creating new functionality without trying to access an existing shared library, the wrapper_library.so and wrapped_library.so pieces of the stack would likely be replaced with a single .so file containing the new functionality.

In this case, we’ll derive from this abstract architecture a more concrete architecture to apply to our trigonometry wrapper:

Concrete Architecture

Figure 1.2: Concrete architecture for our application

GT.M Type System

When working with external calls to non-MUMPS shared libraries in GT.M, you need to first come to terms with the fact that you are going to end up writing C wrapper functions for every function you use. Unfortunately, GT.M lacks the intelligence to directly call external functions of arbitrary types and parameter lists, and requires an external call table to map the weakly-typed data of C to the untyped data of MUMPS.

Return Values

In order for an external function to be callable by GT.M, it can only return one of three types:

  • gtm_long_t (a long integer)
  • gtm_status_t (an int)
  • void (function does not return a value)

For our purposes, we will use gtm_status_t as the return type of each wrapper. This will allow us to return 0 from our wrapper functions when they succeed; returning a nonzero value indicates to GT.M that an error has occurred. For the sake of clarity, we will leave extensive error handling as an exercise for the reader.

Parameters

Each parameter can be either an input parameter (GT.M passes the value of this parameter to C) or an output parameter (GT.M passes a reference to this parameter to C, which populates it with a return value). Parameters can be any of the types listed in Chapter 11 of the GT.M UNIX Programmer’s Guide. For our purposes, we will be using gtm_double_t* for both our input parameters and output parameters.

External Call Table

Okay, we’re now ready to look at the format of the external call table (trig.xc in our example). The first line must be a full UNIX path to the shared library that GT.M will call, for example:

$HOME/lib/trig.so

This will tell GT.M to look in /home/your_username/lib/trig.so when trying to resolve the functions defined within the trig.xc external call table.

The remainder of the external call table is a list of definitions, one per line, which map the names used from MUMPS to C functions. Ours will look like this:


sin: gtm_status_t m_sin(I:gtm_double_t*, O:gtm_double_t*)
cos: gtm_status_t m_cos(I:gtm_double_t*, O:gtm_double_t*)
tan: gtm_status_t m_tan(I:gtm_double_t*, O:gtm_double_t*)

Using the sin function, let’s break down the format of one of these lines:

  • sin: is the name by which this function will be referred when called by our MUMPS code
  • gtm_status_t is the data type which will be returned by our C wrapper
  • m_sin is the name of our C wrapper function
  • I:gtm_double_t* specifies that the first parameter of our C wrapper is a pointer to a double-precision floating point value. The I specifies that this parameter is used for input to our C wrapper. In this case, this is the number to which the sin function will be applied.
  • O:gtm_double_t* specifies that the second and final parameter of our C wrapper is a pointer to a double-precision floating point value. The O (output) specifier indicates that GT.M will be passing a variable by reference for this parameter, and that our C wrapper will be populating it with a return value.

Here’s the completed external call table, trig.xc:


$HOME/lib/trig.so
sin: gtm_status_t m_sin(I:gtm_double_t*, O:gtm_double_t*)
cos: gtm_status_t m_cos(I:gtm_double_t*, O:gtm_double_t*)
tan: gtm_status_t m_tan(I:gtm_double_t*, O:gtm_double_t*)

C Wrapper Functions

For our C wrapper functions, there are a couple of important conventions to note:

  • The first parameter to each wrapper function must be an int. Although this is not (and must not be) specified in the external call table (trig.xc), it must be included in each wrapper function. GT.M will automatically pass to this parameter a value indicating the total number of parameters passed to our wrapper function. It is essentially the GT.M external call mechanism’s own implicit version of argc.
  • We must tell the preprocessor to include gtmxc_types.h, which is located in $gtm_dist.

Let’s look at the complete trig.c:

#include <math.h>
#include "gtmxc_types.h"

gtm_status_t m_sin(int c, gtm_double_t *x, gtm_double_t *out)
{
    *out = sin(*x);
    return(0);
}

gtm_status_t m_cos(int c, gtm_double_t *x, gtm_double_t *out)
{
    *out = cos(*x);
    return(0);
}

gtm_status_t m_tan(int c, gtm_double_t *x, gtm_double_t *out)
{
    *out = tan(*x);
    return(0);
}

The points that bear further discussion are the function definitions, such as m_sin(int c, gtm_double_t *x, gtm_double_t *out), and the pointer assignments, such as *out = sin(*x);

The function definitions are unique in that they use the typedefs (defined in $gtm_dist/gtmxc_types.h) for the return type and parameter types. This should facilitate portability among the various flavors of UNIX and Linux that GT.M supports.

The assignments use pointers so that the output value (in this case, gtm_double_t *out) can be accessed from within the GT.M environment. The assignment *out = sin(*x); means that we are assigning the value of the sin function of the data pointed to by x into the memory location pointed to by out. This is what allows GT.M to retrieve the value from within its native environment. When dealing with C pointers, I find it useful to read the “flow” of the operation from right-to-left.
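The same conventions extend to wrappers that do a little checking before they write through the output pointer. What follows is only a sketch: m_deg2rad is a hypothetical wrapper that is not part of trig.c, and the two typedefs are stand-ins I am assuming match gtmxc_types.h, included so the snippet compiles on a machine without GT.M. It uses the implicit count parameter and returns a nonzero status when something is wrong.

```c
#include <stddef.h>

/* Stand-in typedefs (an assumption) so this sketch compiles without GT.M
   installed. In a real build, delete them and #include "gtmxc_types.h". */
typedef int gtm_status_t;
typedef double gtm_double_t;

/* Hypothetical wrapper, not part of trig.c: converts degrees to radians
   so that MUMPS callers can feed the result to sin, cos, or tan. */
gtm_status_t m_deg2rad(int c, gtm_double_t *deg, gtm_double_t *out)
{
    const double pi = 3.14159265358979323846;

    /* c is the implicit argument count; this wrapper expects two
       arguments (one input, one output) and refuses null pointers. */
    if (c < 2 || deg == NULL || out == NULL)
        return 1;   /* nonzero tells GT.M that an error occurred */

    *out = *deg * pi / 180.0;
    return 0;       /* zero means success */
}
```

To call it from MUMPS, the external call table would need a matching line in the same format as the others, for example deg2rad: gtm_status_t m_deg2rad(I:gtm_double_t*, O:gtm_double_t*).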

MUMPS Routine

Next, we will build a MUMPS routine to hide our use of the GT.M external call mechanism. This is a good idea in case you ever need to port your MUMPS application to a platform that uses different mechanisms for external calls, such as InterSystems Caché.

Here is our MUMPS routine, trig.m:

trig ;; trigonometry wrappers

sin(num)
 new result
 do &trig.sin(num,.result)
 quit result

cos(num)
 new result
 do &trig.cos(num,.result)
 quit result

tan(num)
 new result
 do &trig.tan(num,.result)
 quit result

Although you could call the external routines directly, the wrapper-within-a-wrapper approach provides more readable code, hides the call-by-reference from the programmer using your routines, and gives you MUMPS code that is more idiomatically representative of the 1995 MUMPS ANSI standard.

What is unique about this routine is the &package.function() syntax. Writing &trig.function() instructs GT.M to check the $GTMXC_trig environment variable to find the external call table it should use to execute function(). In official GT.M parlance, trig is a package. There is also a $GTMXC environment variable, which points to the callout table for what is referred to as the “default” package, but in the interests of portability and modularity, we will not cover its use here.

Compiling and Linking

Now that we have our MUMPS code (trig.m), our callout table (trig.xc), and our C wrappers (trig.c), we can create a Makefile to compile and link our wrapper functions into a shared library. Please note that this Makefile is specific to Linux and may or may not work on other UNIX or UNIX-like operating systems. Also note that make requires each command line under a rule to begin with a tab character, not spaces.

CFLAGS=-Wall

all: trig.so

trig.so: trig.o
        gcc $(CFLAGS) -o trig.so -shared trig.o -lm

trig.o: trig.c
        gcc $(CFLAGS) -c -fPIC -I$(gtm_dist) trig.c

clean:
        rm -f trig.so trig.o

install: trig.so
        cp trig.so $(HOME)/lib

Let’s break this down line by line:

  • CFLAGS=-Wall

This line gives us a variable for flags that we will always pass to the C compiler. -Wall instructs the compiler to enable all warning messages. This should always be used, as it will help you to write cleaner code.

  • all: trig.so

This is the first rule of the Makefile, which will be invoked automatically if make is run with no command-line arguments. It simply says that the rule “all” depends on “trig.so” to be considered complete. So, if “trig.so” does not exist, make will then scan the Makefile to find a rule to use to build “trig.so”.

  • trig.so: trig.o

This is the rule for building “trig.so”. It simply means that “trig.so” depends on the existence of “trig.o” in order to be built. If “trig.o” does not exist, make will search for a rule to build “trig.o”.

  • gcc $(CFLAGS) -o trig.so -shared trig.o -lm

This is the command necessary to generate “trig.so” from “trig.o”. The “-o” flag tells the linker to name the output “trig.so”. The “-shared” flag tells the linker that we wish to generate a shared library from “trig.o”. “-lm” tells the linker that this production depends on libm.so. The “-l” flag automatically prepends “lib” onto the library name we specify, so “-llibm” does not work.

  • trig.o: trig.c

This is the rule for building “trig.o” from “trig.c”. It informs make that trig.o requires trig.c in order to be built. Since there is no rule in this Makefile for generating “trig.c”, make will look for “trig.c” in the current working directory.

  • gcc $(CFLAGS) -c -fPIC -I$(gtm_dist) trig.c

This is the command necessary to generate “trig.o” from “trig.c”. The “-c” flag instructs gcc to compile, but not link, the specified file. The “-fPIC” flag instructs gcc to generate position-independent code, which can be loaded at any address in memory and have its symbols resolved at load time rather than link time; this is necessary for shared libraries. The “-I$(gtm_dist)” flag tells the compiler that it should search $(gtm_dist) for header (.h) files. This is necessary because gtmxc_types.h is not likely to be in a place that the compiler knows about. $(gtm_dist) is an environment variable which is required in order for GT.M to function, and will always contain the path in which gtmxc_types.h is located.

Putting it all together

When you have trig.m, trig.c, trig.xc, and Makefile typed in, run the following commands from the shell prompt:

$ cp trig.m $HOME/p/
$ make
$ make install
$ export GTMXC_trig=$HOME/lib/trig.xc
$ mumps -dir
GTM> w $$sin^trig(2.53)

GT.M will respond by writing .574172 to the screen.
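If GT.M writes something else, it can help to exercise a wrapper from plain C first, so that you know whether the problem is in the shared library or in the GT.M configuration. The following is a rough sketch under some assumptions: wrapper_agrees and m_identity are hypothetical names of my own, and the typedefs are stand-ins I am assuming match gtmxc_types.h so the snippet compiles by itself. It calls a wrapper the way this article describes GT.M calling it (the count first, then the input and output pointers) and compares the result against an expected value.

```c
/* Stand-in typedefs (an assumption) so this sketch compiles without GT.M
   installed. In a real build, delete them and #include "gtmxc_types.h". */
typedef int gtm_status_t;
typedef double gtm_double_t;

/* Signature shared by m_sin, m_cos, and m_tan in trig.c. */
typedef gtm_status_t (*trig_wrapper_t)(int, gtm_double_t *, gtm_double_t *);

/* Hypothetical helper: returns 1 if the wrapper succeeds and its output
   is within tol of expected, and 0 otherwise. */
int wrapper_agrees(trig_wrapper_t fn, double input, double expected, double tol)
{
    gtm_double_t in = input;
    gtm_double_t out = 0.0;

    if (fn(2, &in, &out) != 0)
        return 0;               /* the wrapper reported an error */

    double diff = out - expected;
    if (diff < 0.0)
        diff = -diff;           /* absolute difference without libm */
    return diff <= tol;
}

/* Hypothetical stand-in wrapper that copies its input to its output,
   so the helper can be tried without linking trig.c. */
gtm_status_t m_identity(int c, gtm_double_t *x, gtm_double_t *out)
{
    (void)c;                    /* count is not needed here */
    *out = *x;
    return 0;
}
```

With trig.c compiled into the same program (and linked with -lm), wrapper_agrees(m_sin, 2.53, 0.574172, 1e-5) should return 1 if the library matches the GT.M session above.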

Where to go next

Refer to the GT.M UNIX Programmer’s Guide for more information on advanced uses of the GT.M external call mechanism.

I hope this article proves useful, and welcome your feedback!