Routing

24th February 2024

Parameters

I had to fix the routing: I hadn't considered URLs that contain parameters. Parameters (also called the query string) are the part of the URL after the question mark. The bit before the question mark identifies the resource you are looking for; the parameters are extra options you can pass to the server to do magic things with that resource.

I was re-hosting an old project that uses parameters (more on this later), and the server was attempting to map the entire incoming URL, parameters included, to a file. This obviously failed.

The fix was quite simple. I used the strchr function to get a pointer to the first instance of the ? character, replaced the question mark with a null terminator, and then moved the pointer on by one so that it points at the start of the parameters. This effectively means that if I treat the path variable as an array, it still contains both the path and the query; if I treat it as a string, it now contains only the path. I think this is a valid approach, but it feels a little wrong.

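/* Find the first '?', which separates the path from the parameters. */
char *parameters = strchr(path, '?');
if (parameters != NULL) {
	*parameters = '\0';	/* end the path string here */
	parameters++;		/* step past the terminator to the parameters */
}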

I was worried initially that when the path array is freed, it would only free up memory to the null terminator I added. I realise this is not a valid concern, because that is not how memory management works. If the array is on the stack, the whole stack frame is released when the function exits, including the entire array. If I allocated it on the heap using malloc, a call to free with the pointer malloc returned releases the whole allocation no matter what I’ve done with the data in it.

Having slept on it, I’m fairly confident this is in fact an efficient way of parsing the URL. It may also be a good approach to parsing the entire incoming request. It’s a destructive approach: by inserting null terminators at key points to break up the request into individual strings, I would no longer be able to treat the whole thing as a single string. I think this would be OK, and it’s an approach to keep in mind.
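A minimal sketch of how the same trick might extend to the request line, assuming a well-formed line such as "GET /foo?x=1 HTTP/1.1" that has already been cut from the rest of the request and sits in a writable buffer. The struct request_line type and the parse_request_line function are hypothetical names made up for illustration, not code from this server.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container: every pointer refers into the caller's buffer. */
struct request_line {
	char *method;
	char *path;
	char *query;	/* NULL when the target has no '?' */
	char *version;
};

/*
 * Splits "METHOD SP target SP version" in place by overwriting the two
 * spaces and the first '?' with '\0'.
 * Returns 0 on success, -1 if a space is missing. On failure the buffer
 * may already have been modified, which is the price of a destructive parse.
 */
int parse_request_line(char *line, struct request_line *out)
{
	assert(line != NULL && out != NULL);

	char *first_space = strchr(line, ' ');
	if (first_space == NULL)
		return -1;
	*first_space = '\0';

	char *target = first_space + 1;
	char *second_space = strchr(target, ' ');
	if (second_space == NULL)
		return -1;
	*second_space = '\0';

	out->method = line;
	out->path = target;
	out->version = second_space + 1;

	/* Same approach as before: cut the parameters off the path. */
	out->query = strchr(target, '?');
	if (out->query != NULL) {
		*out->query = '\0';
		out->query++;
	}
	return 0;
}
```

Nothing is copied here. The four fields are just pointers into the original buffer, so they stay valid only for as long as that buffer does.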

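Getting back to the URL for a second: parameters are not the only thing I need to worry about. URLs can also have anchors (fragments), which is everything after the hash (#) character. Browsers normally strip the anchor before sending the request, but the server shouldn't rely on that. The same approach works here too.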
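A sketch of handling both cases together, assuming the usual "path?query#fragment" ordering. The split_target function is a hypothetical helper, not the code this server uses. The anchor is cut first so that a question mark appearing after the hash stays inside the anchor rather than being mistaken for the start of the parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Cuts "path?query#fragment" into up to three strings in place.
 * *query and *fragment are set to NULL when that part is absent.
 */
void split_target(char *path, char **query, char **fragment)
{
	assert(path != NULL && query != NULL && fragment != NULL);

	/* Cut at '#' first; everything after it belongs to the fragment. */
	*fragment = strchr(path, '#');
	if (*fragment != NULL) {
		**fragment = '\0';
		(*fragment)++;
	}

	/* The string now ends before the fragment, so this '?' is genuine. */
	*query = strchr(path, '?');
	if (*query != NULL) {
		**query = '\0';
		(*query)++;
	}
}
```

After the call, path holds only the part that should be mapped to a file.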

Directories

I also changed the way I route requests for directories. Previously if the URL mapped to a directory I would look for and serve the index.html from that directory. This is fairly standard behaviour. It gets complicated because you can have two different paths for the same directory. For example, /foo/bar and /foo/bar/ both map to the same “bar” directory. One has a trailing /, the other doesn’t.

Technically there is nothing stopping you from returning different content for each of the URLs, but that is considered confusing and bad practice. So I want to treat them the same.

I had catered for this by serving the index.html file (if present) in both cases. The problem is with relative URLs. When responding to the /foo/bar path, the current directory is considered to be the “foo” directory. When responding to /foo/bar/, the current directory is considered to be the “bar” directory. If the index.html file contains relative URLs for other resources, those URLs will point to different places in the two scenarios. The correction I've made is that when the client requests a directory without a trailing forward slash, I respond with the HTTP code “301 - Moved Permanently” to redirect the client to the URL with the forward slash.
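A rough sketch of what building that redirect could look like. It assumes the caller has already confirmed the path maps to a directory (for example with stat). Both function names are hypothetical, and carrying the parameters over to the new URL is an assumption rather than something this server is known to do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if path does not end in '/', 0 if it does. */
int lacks_trailing_slash(const char *path)
{
	assert(path != NULL);
	size_t len = strlen(path);
	return len == 0 || path[len - 1] != '/';
}

/*
 * Writes a minimal 301 response into out, pointing at path + "/".
 * query may be NULL; when present it is appended after the slash
 * (an assumption, so the parameters survive the redirect).
 * Returns 0 on success, -1 if out is too small for the response.
 */
int build_directory_redirect(char *out, size_t out_size,
                             const char *path, const char *query)
{
	assert(out != NULL && path != NULL);

	int n = snprintf(out, out_size,
	                 "HTTP/1.1 301 Moved Permanently\r\n"
	                 "Location: %s/%s%s\r\n"
	                 "Content-Length: 0\r\n"
	                 "\r\n",
	                 path,
	                 query != NULL ? "?" : "",
	                 query != NULL ? query : "");

	/* snprintf returns the length it wanted to write; compare to detect truncation. */
	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

A request for /foo/bar would produce a Location header of /foo/bar/. The browser then repeats the request against that URL, and relative links in index.html resolve against the “bar” directory as intended.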

Except not in all cases. At the moment all the content on this site is stored in static files. That includes the blog posts, which are in plain, boring HTML files. At some point this will change and I will move the posts into a database. Each of these posts also has a direct link and, thinking ahead, I do not want the link to change when I move them into the DB. So I’ve added some hackery to the routing code for individual blog posts. For example, the path for this post is /blog/routing, but the HTML is currently in the file /blog/routing/post.html, and any other resources like images will also be stored in the directory /blog/routing/. It works for now, but at some point when I have a proper blog module/app, I’ll need to revisit how routing works.
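One way such a special case might be written, purely as an illustration. The map_blog_post function and its rule (a single path segment directly under /blog/ maps to post.html inside the directory of the same name) are assumptions based on the example above, not the actual routing code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * If path looks like "/blog/<slug>" with no further slashes, writes
 * "/blog/<slug>/post.html" into out and returns 1.
 * Returns 0 for any other path, or if out is too small, so the caller
 * can fall back to the normal file and directory routing.
 */
int map_blog_post(const char *path, char *out, size_t out_size)
{
	static const char prefix[] = "/blog/";
	const size_t prefix_len = sizeof prefix - 1;

	assert(path != NULL && out != NULL);

	if (strncmp(path, prefix, prefix_len) != 0)
		return 0;

	/* The slug must be non-empty and must not contain another '/'. */
	const char *slug = path + prefix_len;
	if (*slug == '\0' || strchr(slug, '/') != NULL)
		return 0;

	int n = snprintf(out, out_size, "%s/post.html", path);
	return n >= 0 && (size_t)n < out_size;
}
```

Checking this before the directory redirect would let /blog/routing be served directly, while a resource such as an image under /blog/routing/ falls through to the ordinary static file handling.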

Colour

I also added some colour. The blog looks much better now, well at least I think so.

I'm not particularly skilled at web design. I can tell when something is working and when it's not, but when it's not I don't exactly know what to do to fix it, so I get there by trial and error. It's going to be an iterative process.

At some point I need to add a masthead, but right now I'm not sure what that will look like. I did briefly have a play with DALL-E in an attempt to make a logo for the site. It generated this, which is far too busy for a logo and makes me out to be far cooler than I really am.

Pixel art of a coder

I do like it though.