I really like statistics.

I especially like collecting data about myself. Whether it be running, drinking, diet, sleeping or drinking again, I like to know details about my progress, and goals I have, and how I might be best suited to going about achieving those goals. I crave knowledge about myself and I enjoy comparing myself to others to motivate myself to improve (especially important when drinking – one can never try enough different types of booze).

Those of you who care about my life may already know that I recently moved away from Bath, where I work, to live with my girlfriend. I now live in a medium sized town called Amesbury, in Wiltshire. Amesbury is a lovely place, and I couldn’t be happier with how the move has gone, but there is one obvious downside – I’ve traded my two minute walk to work for a much larger drive:

Now, that Google estimation is very pessamistic (on an average day it takes me about 50-55mins I reckon, but more on that later), and I quite enjoy driving, but it’s fair to say that the commute is not very fun. I reckon the way I can make it a lot more fun is by gathering and analysing data on it.

Before I can find out anything fun or insightful about my commute, I need to gather lots of a data about my commute. I reasoned the easiest way to do this is at the start and end of the journey, and since I didn’t spend three years at uni learning how to type numbers into a spreadsheet, it seemed only reasonable to streamline the collection process as much as possible.

Important Metrics

The first thing I needed to decide was what metrics I was interested in. I came up with the following list:

  • Time – the obvious one. I want to spend the least amount of time driving as possible. This whole process is basically a complex minimisation function for that metric;

  • Fuel efficiency – my car very kindly measures miles per gallon of fuel consumped on each trip. I had already become somewhat obsessed with this before I started thinking about it in this much detail. Since diesel is so expensive, I want to try and make my commute as fuel-efficient as possible;

  • Journey Length – There are several different options in terms of routes to take, mostly around Bath but also before Warminster. I can categorise them by length to eaisly differentiate between them, and do partitioned analysis on each route varient;

  • Intermediate “Laps” – the slowest part of the journey by far (I postulate) is the last section, from Limpley Stoke to outside my office in Bath. I’m interested in timings for this section especially, so I should measure as much as possible for this section.

  • Commute direction - I’m actually doing two commutes, to work and from work. I sometimes go different ways, traffic levels are different, intermediate stops are at different times. I need a way to group these separately.

What I Built

TL;DR: it’s a tiny Mojolicious web server which has big shiny buttons that I can press on my phone.

I have been experimenting with Mojolicious a fair bit recently (I should write a blog post about that at some point), and wanted to keep playing with it. This seemed like the obvious opportunity. The full source is on GitHub for everyone to laugh at. The basic idea is:

post '/commutes/start' => sub {
    my $c = shift;

    my $hour   = localtime->[2];
    my $dir    = int($hour) <= 12 ? 'in' : 'out';
    my $status = $c->app->dbh->do(q(
        INSERT INTO commutes (start_time, direction)
        VALUES (CURRENT_TIMESTAMP, ?)
    ), undef, $dir);
    return $c->render(json => { message => ($status ? 'Started' : 'Failed') });
};

post '/commutes/end' => sub {
    my $c = shift;
    my $mpg = $c->param('mpg');
    my $len = $c->param('length');

    my $status = $c->app->dbh->do(q(
        UPDATE commutes SET end_time = NOW(), mpg = ?, length = ?,
        total_time = TIMESTAMPDIFF(SECOND, start_time, NOW())
        ORDER BY id DESC LIMIT 1
    ), undef, $mpg, $len);
    my ($commute_id) = $c->app->dbh->selectrow_array('SELECT MAX(id) FROM commutes');
    return $c->render(json => {
        message => ($status ? 'Ended' : 'Failed'),
        commute => $commute_id,
    });
};

These two routes define the start and end of a commuting event (not exactly the most RESTful thing ever, I know, but they’re descriptive at least). We check which direction we’re going in using the current hour (and just hope the time on the server is correct). Then we add a row to our database table to signify the start of the event and return a JSON response.

Ending the commute is slightly more complex. We take some user-defined parameters, and insert those, plus the calculated end times and total times into the database. We’re even nice enough to return the newly inserted ID in the JSON response.

Obiovusly there are tonnes of problems with this when there’s more than one user, but there isn’t so I don’t really care. Also a database is probably overkill, and I should probably write some tests, and so on and so forth. But it works, and it’s quite neat.

The next thing required is some sort of interface to allow me to post to these routes from my phone, in the car. So I pulled together a bit of HTML and JS to act as a frontend. It looks like this:

All the Bootstrap buttons

It’s all very obvious. The only thing previously unexplained is the support for fuel stops, which works just as you would expect, and allows me to measure fuel cost and regularity, and allows me to assoicate journies with fuel stops so I can work out interesting things like how long a fuel stop takes on average.

That’s all for now. Once I’ve got a reasonable amount of data (I’m thinking around a month), I can start building a visualisation and analysis solution. If I’m lazy, it’ll be a load of Highcharts and some JQuery. I kind of want to experiment with d3 though, so I may get a little adventurous. I also want to play around with weather APIs, to do some regression analysis with historic weather data and metrics I’ve collected. I’ll have a nice thing going if I get all of that done, I think.

Who knows, I may even get some R out…