Wednesday, December 16, 2009

Give up the func(tion)

strace is one of my favorite tools for debugging in Linux. If you've never used strace, it's a tool that prints (on stderr) the system calls executed by a program, along with each call's arguments and return value. Looking at the output of strace, you can step through many of a program's most interesting interactions and see where something unexpected happens.

However, strace only shows you the system calls. So, while you see every file that is opened or closed, every byte that is written to a file descriptor, and every signal that is received, you don't see any indication of how any user functions are called or their results.

Typically, if you want to see this level of detail, you run your program in a debugger, where you can see its moment-by-moment execution. But sometimes what you want is just to insert some logging into an existing program. This is no problem if you have the source, but what if you don't?

As it happens, you can do this pretty easily, using the LD_PRELOAD environment variable and the dynamic linking loader. I'll run through a contrived example here.

Suppose I want to log every time my program calls the function gets(), because it's a really bad function for anyone to use.

I would write a wrapper function like this:

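char *gets(char *s) {
    static char *(*original_gets)(char *) = NULL;
    char *result;

    if (original_gets == NULL) {
        /* Look up the next definition of gets() after this library. */
        original_gets = dlsym(RTLD_NEXT, "gets");
    }
    fprintf(stderr, "Someone called gets!!!\n");
    result = original_gets(s);
    /* gets() returns NULL on end-of-file or error; don't pass NULL to %s. */
    fprintf(stderr, "And here is what they got with gets: %s\n",
            result ? result : "(null)");
    return result;
}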


There are just two important things here. First, the wrapper has the same name and signature as the original function. Second, it uses dlsym() with the RTLD_NEXT handle to get a pointer to the "next" implementation of the gets() function.
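To show that the same two rules carry over to other functions, here is a minimal sketch of a wrapper for puts(). It is not part of the sources for this post; the name original_puts and the log message are made up for illustration, and it assumes Linux with glibc.

```c
/* _GNU_SOURCE must be defined before any #include so that <dlfcn.h>
 * exposes RTLD_NEXT on Linux. */
#define _GNU_SOURCE

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Same name and signature as the C library's puts(). */
int puts(const char *s) {
    /* Illustrative name: caches the pointer to the next puts(). */
    static int (*original_puts)(const char *) = NULL;

    if (original_puts == NULL) {
        original_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
        if (original_puts == NULL) {
            abort();  /* no further definition of puts() was found */
        }
    }
    fprintf(stderr, "puts called with: %s\n", s);
    return original_puts(s);
}
```

Only the function pointer type and the name passed to dlsym() change from one wrapper to the next; the rest of the pattern stays the same.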

dlsym() is one of the functions that POSIX defines for interfacing with the dynamic linking loader. It resolves a symbol name to an address, and the handle you pass controls where it looks. Passing RTLD_NEXT in place of a library handle asks for the next definition of the given name in the library search order, after the object that makes the call. This allows application writers to do exactly what we're doing here: writing wrappers.
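dlsym() returns NULL when it cannot find the name, and calling through a NULL function pointer would crash the wrapped program. As a hedged sketch (next_symbol is a hypothetical helper of mine, not something the dynamic loader provides), the lookup can be checked and reported with dlerror():

```c
/* _GNU_SOURCE must be defined before any #include so that <dlfcn.h>
 * exposes RTLD_NEXT on Linux. */
#define _GNU_SOURCE

#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return the next definition of `name` in the
 * search order, or print the loader's error message and abort. */
static void *next_symbol(const char *name) {
    void *sym;

    assert(name != NULL);
    dlerror();                      /* clear any stale error state */
    sym = dlsym(RTLD_NEXT, name);
    if (sym == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlsym(RTLD_NEXT, \"%s\") failed: %s\n",
                name, msg ? msg : "symbol not found");
        abort();
    }
    return sym;
}
```

With a helper like this, the lookup in the wrapper could read original_gets = (char *(*)(char *))next_symbol("gets");. On older versions of glibc you still need to link with -ldl.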

So, now that I've defined a wrapper for the normal gets() function, I put it into a shared library, libfunc.so (the Makefile at the end of this post builds it).

Finally, to use my wrapper, I set it as the value of the LD_PRELOAD environment variable when running a program, like this:

$ env LD_PRELOAD=./libfunc.so ./getter

This runs a program named "getter" in the current directory.

Here is the sample output:

Someone called gets!!!
Hello
And here is what they got with gets: Hello
Here is your input: Hello


So, now you know how to write a wrapper in C. Enjoy.

Here are the full sources for the code discussed in this post:

func.c:

/* Need to define _GNU_SOURCE for RTLD_NEXT to be available on Linux.
 * (Since it's not part of POSIX.)
 */
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

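char *gets(char *s) {
    static char *(*original_gets)(char *) = NULL;
    char *result;

    if (original_gets == NULL) {
        /* Look up the next definition of gets() after this library. */
        original_gets = dlsym(RTLD_NEXT, "gets");
    }
    fprintf(stderr, "Someone called gets!!!\n");
    result = original_gets(s);
    /* gets() returns NULL on end-of-file or error; don't pass NULL to %s. */
    fprintf(stderr, "And here is what they got with gets: %s\n",
            result ? result : "(null)");
    return result;
}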


getter.c:

#include <stdio.h>

int main(int ac, char **av) {
    char buf[64]; /* should be enough, right? */
    gets(buf);
    printf("Here is your input: %s\n", buf);
    return 0;
}


Makefile:

CFLAGS = -fPIC
LDFLAGS = -shared -ldl

all: libfunc.so getter

libfunc.so: func.o
	$(CC) -o libfunc.so func.o $(LDFLAGS)

func.o: func.c

getter: getter.o
	$(CC) -o getter getter.o

getter.o: getter.c

test: getter libfunc.so
	env LD_PRELOAD=./libfunc.so ./getter

clean:
	rm -f *.so *.o getter
