Moohar Archive

Build dates

17th April 2024

This was supposed to be a quick simple thing but it turned into an adventure down a rabbit warren.

The problem

I’ve got my development laptop, a test machine and the main server. Sometimes I lose track of what version of the Tinn executable is on each machine. To address this I added the TINN_VERSION macro to the code a while ago. When the program starts, the first thing it does is output that value to the terminal. It works except I need to remember to update that value when the version changes. This is a semantic version number and currently only the core version with the major, minor and patch numbers. It doesn’t therefore change every time I compile the code. This means there could be different executables with the same version number, which is especially true when I’m testing (which is all the time). I need something more, I need a build number or build date and I need that to be added to the program automatically every time I compile it.

Plan A

My first instinct was I could just do this with a preprocessor macro. In my head I had equated macros with code that is run at compilation time, so I could use some macros to inject the date or a build number into the code somewhere. As I’ll explain in a moment, my understanding was correct but incomplete so (spoilers) this won’t work.

There are some predefined macros I could use, __DATE__ and __TIME__ look promising. The description for __DATE__ is “the compilation date of the current source file” which at first seems to be exactly what I need. Except it’s not. There are two problems, first the format of this date is unhelpful, second the “current source file” is not what I really need.

The format of this date is mmm dd yyyy for example Apr 17 2024. It hopefully goes without saying I can read and understand that date, so strictly speaking it would work for my purposes of identifying when the file was compiled, but I’m not an American so month-day-year looks ugly to me. Being from the UK I use day-month-year when dates are written as text. But when writing a date as a number I use year-month-day, because that's how numbers work, the most significant part should be on the left. If you’ve ever tried to sort dates you know I’m right.

Unfortunately there are no options in the C preprocessor to change this. There are no options to get the date in a numeric or ISO 8601 format. There are no options to manipulate the value generated by __DATE__ into a standard format. I found some verbose examples on-line that claim to work, but when you dig a little deeper they only work with the C++ preprocessor.
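The preprocessor can't reformat the date, but the conversion itself is simple at run time. This is a different technique from the preprocessor-only approach discussed above. Below is a minimal sketch, assuming a hypothetical helper called date_macro_to_iso (not part of Tinn) that takes a string in the __DATE__ layout and writes it out as yyyy-mm-dd.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from Tinn): converts a "Mmm dd yyyy" string,
   the layout used by __DATE__, into "yyyy-mm-dd".
   Returns 0 on success and -1 if the input can't be parsed or the
   output buffer is too small. */
int date_macro_to_iso(const char *date, char *out, size_t out_size) {
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char mon[4];
    int day, year;

    /* __DATE__ pads single-digit days with a space; %d skips that padding. */
    if (sscanf(date, "%3s %d %d", mon, &day, &year) != 3) return -1;
    if (strlen(mon) != 3) return -1;

    /* The month number is the position of the name in the table. */
    const char *hit = strstr(months, mon);
    if (hit == NULL || (hit - months) % 3 != 0) return -1;
    int month = (int)((hit - months) / 3) + 1;

    int n = snprintf(out, out_size, "%04d-%02d-%02d", year, month, day);
    return (n > 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Calling date_macro_to_iso(__DATE__, buf, sizeof buf) with a buffer of at least 11 bytes fills buf with the ISO form of the compilation date. It tidies up the format but does nothing about which date gets captured.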

All that said, the format is a moot point because even if I reformat it, the value will be incorrect. To understand why you have to understand the C compilation process a little bit. There are better writeups out there than what I’m about to present, and I’ve squished some steps together for the purposes of this post. Broadly speaking the C compilation process goes like this:

  1. The preprocessor takes the source code and converts all the macros it contains into normal C code
  2. The compiler and assembler turn the processed C source code into object code
  3. The linker joins all the objects together to make an executable

By default, the C compiler does all this for you in sequence, source code goes in, executable comes out and you don’t need to worry about the intermediate steps.

You can, if you desire, break the process down and save the intermediate results. For example with the GCC compiler the -E option will only run the preprocessor and output the expanded preprocessed C code to the terminal (technically standard output). Possibly useful if you want to debug what your macros are doing.

The -c option will run the first two steps and then save the resulting object code in a .o file. Then quite separately you can link the object files together along with any libraries and whatnot into an executable. Why do it this way? As I described in my last post, it can save compilation time. Each step can be quite time consuming. If you have a massive project with lots of source files, you can run steps one and two in advance and save the resulting object file. Then if you make any modification to the .c source files, you only need to re-run steps one and two on the modified files. Then link the resulting object code to the object code from all the other files you’ve already compiled.

Since I broke the Tinn project into multiple source files and started using Make to manage the build, this is basically the process I follow. New object files are created for any source files that have been modified and saved. The object files are then combined into the executable.

What does this have to do with build numbers? Well in my example, I updated the main function in tinn.c to output the build date using the __DATE__ macro. The first time I compiled the code, step one turned the __DATE__ macro into a string literal with the current date and inserted that into the expanded C source code. The second step compiled this into object code and saved the result in a tinn.o file. Then, after all the other source files had also been compiled to object code, the linker generated the executable. So far so good. When I run the program it outputs the current date as a build date, which is correct.

The problem occurs on the next day when I work on another part of the program. If for example I make an update to the client.c source file, when I compile the program, Make will recognise I’ve updated that file and will re-run steps one and two, generating a new client.o file. tinn.c has not been touched, so the linker can re-use the tinn.o file generated the day before. All the object code is linked together and the new executable is ready to go. When I run the program it outputs the previous day’s date as the build date, which is incorrect. This is because the date was computed the day before, when the preprocessor ran on the tinn.c file, and that was baked into the object file as a literal value.
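To make the "baked in" part concrete: by the time the compiler proper sees the code, __DATE__ and __TIME__ are already ordinary string literals. A minimal sketch, using a hypothetical function named build_stamp (not from Tinn):

```c
/* Hypothetical helper (not from Tinn). The compiler joins adjacent string
   literals, which only works here because the preprocessor has already
   replaced both macros with literals such as "Apr 17 2024" and "23:06:00". */
const char *build_stamp(void) {
    return __DATE__ " " __TIME__;
}
```

The returned string is fixed inside whichever object file contains this function. It changes only when that one source file goes through the preprocessor again, not when the executable is re-linked.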

So while macros do evaluate at compile time, depending on your build process they may not be evaluated every time. So to make this work I would need to force any file that uses the build date to be completely compiled every time. Currently this would only affect one file (tinn.c) but it seems like a trap. Added to my discomfort at the format of __DATE__ value, it was time for plan B.

Plan B

Google it. This isn’t an original problem, other smarter people than me have surely solved this already. Oh, apparently not really. Well yes, people have solved it, but it’s all a bit hacky.

I found a lot of hits for people using rc files in Visual Studio to add version information to the executable. This seemed like a very Windows-centric approach, something to investigate later if (when) I switch to getting this to work on Windows. It did however point me in the direction of using the linker instead of the preprocessor.

I found some posts about how in Visual Studio you can create pre-build scripts to generate a source file with the date/build number. Not helpful to me, I’m not using Visual Studio or an IDE really. I am using Sublime as my text editor, which does have a build system and scripting, so a possible option if I can’t find something else. I’m also already using Make, and this seemed like the obvious next step.

I found a tutorial by Mitch Frazier that uses Make and linker symbols. It all seemed very straightforward: the linker (step three) could inject a value (the build date or number) into the executable. Except I would soon discover that’s not how linker symbols work. Anyway, using the linker over the preprocessor made sense to me as it is used every time the executable is generated, so it would always generate an accurate date.

Adding a linker symbol was achieved by passing some arguments to the compiler. Specifically:

-Xlinker --defsym -Xlinker __BUILD_DATE=$$(date +'%Y%m%d')

This uses the shell command date to generate the date value and the compiler arguments to create the __BUILD_DATE symbol.

To access this symbol in the C code you define an external char variable with the same name.

#include <stdio.h>

extern char __BUILD_DATE;

int main(void) {
    printf("Build date  : %lu\n", (unsigned long) &__BUILD_DATE);
}

Then, to get the value of the symbol you read that variable’s address…wait what now?

Symbols are not variables. They look like variables, but really they are just memory addresses, I think. I have a very loose understanding of this, but here goes. When an executable is generated, all the instructions and functions and constants and literals etc need to be stored somewhere in the resulting file. Each one of these normally has a name and the linker will list all these names and what address you can find its definition at. For example in the last build of Tinn, the main function can be found at address 4a10. This is what a linker symbol is, it maps names to addresses.

The arguments I passed to the compiler simply created a new symbol called __BUILD_DATE which points to memory address 20240417. It didn’t create anything at that address and I’ve got no idea what is at that address. In fact reading that address is undefined behaviour, although very probably a segmentation fault. The trick is I don’t care what is at that memory address, I just want the address which I can then use as a value.

It’s a clever idea, a bit hacky, but should fit my needs. Problem is I couldn’t get it to work. The tutorial is from 2008 and I get the feeling things have changed a little since then. Specifically the number the program output never matched the number I passed to the compiler. I used the nm command to list all the symbols and their values from the executable file. This confirmed __BUILD_DATE was added and set to 20240417 aka the date, but when I ran the program it would always output a much larger value.

My assumption is there is some relative address thing going on. When the program runs the address of the __BUILD_DATE is the address in the symbol table plus the start address of the program. I sort of confirmed this by adding a second linker symbol called __START with the value of 0 then outputting the difference between the address of this and __BUILD_DATE.

-Xlinker --defsym -Xlinker __START=0 -Xlinker --defsym -Xlinker __BUILD_DATE=$$(date +'%Y%m%d')

#include <stdio.h>

extern char __START;
extern char __BUILD_DATE;

int main(void) {
    printf("Build date  : %lu\n", ( (unsigned long) &__BUILD_DATE) - ((unsigned long) &__START) );
}

This gave the correct output of 20240417.

I could have declared victory at this point and moved on, but I felt like I was abusing linker symbols and I wasn’t really sure what I was doing was safe, so I decided to look for other solutions.

Plan C

Use a linker script to insert a value properly into the executable? Maybe, one day. But that seems like a whole other language I would need to learn to just add a build date. Next.

Plan D

Use Make to generate a C source file with the build date in it. Some of my googling way back at the start of plan B pointed me in this direction, and having got my head around makefiles some more and the C compilation process a lot more, I think I could do it. The key is to regenerate the file whenever at least one other source file has changed, and only then. This way the date is always updated when it is needed but only when it is needed.

I started with some prep-work and created a version.h header file. This contains the semantic version information (which I update manually as needed) and the declaration of an external constant string called BUILD_DATE:

#ifndef VERSION_H
#define VERSION_H

#include "utils.h"

#define VERSION_MAJOR 0
#define VERSION_MINOR 9
#define VERSION_PATCH 0

#define VERSION "v" STR(VERSION_MAJOR) "." STR(VERSION_MINOR) "." STR(VERSION_PATCH)
extern const char* BUILD_DATE;

#endif

If I was doing this by hand, I would then create a version.c file that contained the definition of that variable:

const char* BUILD_DATE = "2024-04-17T23:06Z";

But I’m not. I want to generate that file automatically. Time to modify the makefile. The makefile has this code to build the executable.

# link .o objects into an executable
$(BUILD)/$(TARGET): $(OBJS)
	@$(CC) $(COMP_ARGS) $(OBJS) -o $@

The target $(BUILD)/$(TARGET) translates to ./build/tinn. The dependency $(OBJS) is the list of object files, computed earlier in the file by searching for all the .c files in the src directory and translating the file names to .o files.

# dirs
BUILD := ./build
SRC := ./src

# build list of source files
SRCS := $(shell find $(SRC) -name "*.c")
# turn source file names into object file names
OBJS := $(SRCS:$(SRC)/%=$(BUILD)/tmp/%)
OBJS := $(OBJS:.c=.o)

Through some recursion magic, whenever I go to build the executable, Make will check each of those object files and each of its dependencies and, if anything has changed, recompile that object file and then the executable. If none of the object files’ dependencies have changed, then none of the object files need to be recompiled and nor does the executable. This is therefore where I need to insert my new build date generating code.

# link .o objects into an executable
$(BUILD)/$(TARGET): $(OBJS)
	# !! generate build date here !!!
	@$(CC) $(COMP_ARGS) $(OBJS) -o $@

I don’t really want to store the version.c file in the src directory. If I did that, it would get picked up by the call to find in the makefile and added to the list of source and object files to monitor. I don’t really need the source file at all, I only really need its object file so I can link it in later.

The compiler can be set to accept input from standard input instead of a file. For example:

gcc -xc -c - -o test.o

In this call to the GCC compiler the -xc informs it that it’s compiling C code. This is required because with no file and no file extension, the compiler needs to be told what language it’s compiling. The -c instructs it to stop before linking and generate an object file. The lone - is where the input file path would normally be; in this case it informs the compiler to use standard input instead. Finally the -o test.o sets the output path.

I can use a call to echo to generate the text for the source file with a call to date to generate the date. I can then pipe that to the call to gcc.

VERSION := $(BUILD)/tmp/version.o

# link .o objects into an executable
$(BUILD)/$(TARGET): $(OBJS)
	@echo "const char* BUILD_DATE = \""$(shell date -u "+%Y-%m-%dT%H:%MZ")"\";" | $(CC) -xc -c - -o $(VERSION)
	@$(CC) $(COMP_ARGS) $(OBJS) $(VERSION) -o $@

The only other change required is to include the version object file in the final call to the compiler to build the executable, because I’ve intentionally not included it in the list of other object files.

That’s it. It works. The build date is updated whenever any other source file is changed. If there are no other changes it leaves the executable alone and doesn’t recompile it unnecessarily.
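For completeness, here is how the C side might consume these values at startup. This is a minimal sketch under a few assumptions. STR is taken to be the usual two-level stringify macro (utils.h isn't shown in this post). The BUILD_DATE definition stands in for the generated version.o. The format_banner function and its output layout are hypothetical.

```c
#include <stdio.h>

/* Assumed definition of STR from utils.h (not shown in the post):
   the two levels let VERSION_MAJOR expand to 0 before it is stringified. */
#define STR_(x) #x
#define STR(x) STR_(x)

#define VERSION_MAJOR 0
#define VERSION_MINOR 9
#define VERSION_PATCH 0
#define VERSION "v" STR(VERSION_MAJOR) "." STR(VERSION_MINOR) "." STR(VERSION_PATCH)

/* Stand-in for the object file the makefile generates on every link. */
const char *BUILD_DATE = "2024-04-17T23:06Z";

/* Hypothetical helper: writes a one-line banner into out.
   Returns 0 on success, -1 if the buffer is too small. */
int format_banner(char *out, size_t out_size) {
    int n = snprintf(out, out_size, "Tinn %s (built %s)", VERSION, BUILD_DATE);
    return (n > 0 && (size_t)n < out_size) ? 0 : -1;
}
```

In the real build the BUILD_DATE line would not appear in any source file. Only the extern declaration from version.h would be visible, and the linker would resolve it against version.o.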

I think I’m happy, I’ll let you know.

TC

Later...

Friend
Hi there. Been up to much? Have you watched Fallout yet?
No, I’ve been working on my project most of the weekend.
Friend
Yer? What have you been working on?
It’s taken me longer than anticipated getting automatic build dates working.
Friend
Build dates. Doesn’t that just happen?
Not with C. You have to build your own solution it seems.
Friend
Really? Isn’t that just one line of code?
Actually…three…and a bit..
Friend
And that took you all weekend?
Not all weekend. I had to sleep on the problem. Twice. AND then write it up for the blog, that took almost as long.
And I know so much more now.
Friend
such a geek
I am one with the code.