How often do you actually get wet going to work? Using pandas, Python, and some graphs, we find out.

How often do you get wet cycling to work? Cycling in Ireland is taking off. The DublinBikes scheme is a massive success with over 10 million journeys, there are large increases in people cycling in Irish cities, there’s a good cyclist community, and infrastructure is slowly improving around the country. However, Ireland is a rainy place! It turns out that […]

The most recent post on this site was an analysis of how often people cycling to work actually get rained on in different cities around the world. You can check it out here. The analysis was completed using data from the Wunderground weather website and Python, specifically the Pandas and Seaborn libraries. In this post, I will […]

[Short version] The S3 ingestion script for Amazon applications provided by Logentries will not work for the gzip-compressed log files produced by the Elastic Beanstalk log rotation system. A slightly edited script will work instead and can be found on GitHub here. [/Short version] Logentries is a brilliant startup originating here in Dublin for collecting […]

This is a very quick post to help people out with installation problems with Office for Mac 2016. While I was excitedly installing Excel 2016 on my MacBook, the following error threatened to ruin the day: “An unknown error has occurred, the error code is: 0xD0000006”. The error was seemingly undocumented on the internet; the solution, oddly enough, was to ensure […]

Aggregating statistics for multiple columns in pandas with groupby

I’ve recently started using Python’s excellent Pandas library as a data analysis tool, and, while finding the transition from R’s excellent data.table library frustrating at times, I’m finding my way around and finding most things work quite well. One aspect that I’ve recently been exploring is the task of grouping large data frames by different […]

I recently had an issue with a long running web process that I needed to substantially speed up due to timeouts. The delay arose because the system needed to fetch data from a number of URLs. The total number of URLs varied from user to user, and the response time for each URL was quite long (circa […]
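The excerpt above does not say which approach the post takes, but one common way to reduce this kind of delay is to issue the slow requests concurrently rather than one after another. The sketch below assumes a thread pool from the standard library; `fetch` and `fetch_all` are hypothetical names, and `fetch` only simulates a slow response with a sleep instead of making a real HTTP request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(url, delay=0.2):
    """Hypothetical stand-in for a slow HTTP request (no network access)."""
    time.sleep(delay)  # simulate a slow remote response
    return f"response from {url}"


def fetch_all(urls, max_workers=8):
    """Fetch every URL concurrently; results are returned in input order."""
    if not urls:
        return []
    # Never start more threads than there are URLs to fetch.
    workers = min(max_workers, len(urls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    urls = [f"http://example.com/item/{i}" for i in range(8)]
    start = time.perf_counter()
    results = fetch_all(urls)
    elapsed = time.perf_counter() - start
    print(f"{len(results)} responses in {elapsed:.2f}s")
```

Because the waits overlap, the total time is governed by the slowest response rather than the sum of all of them, which matters when the number of URLs varies from user to user.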

This post is about creating Python Flask web pages that can be asynchronously updated by your Python Flask application at any point without any user interaction. We’ll be using Python Flask, and the Flask-SocketIO plug-in to achieve this. In short, the final result is hosted on GitHub. What I want to achieve here is a […]

FAST TRACK: There is some Python code that allows you to scrape bike availability from bike schemes at the bottom of this post… SLOW TRACK: As a recent aside, I was interested in collecting Dublin Bikes usage data over a long time period for data visualisation and exploration purposes. The Dublinbikes scheme was launched in […]

Self-Organising Maps (SOMs) are an unsupervised data visualisation technique that can be used to visualise high-dimensional data sets in lower (typically 2) dimensional representations. In this post, we examine the use of R to create a SOM for customer segmentation. The figures shown here use the 2011 Irish Census information for the greater Dublin […]

“Is there any online reading or courses I can do to get into data analysis?” At my workplace, I get asked the question above. The question is usually posed by people with a finance background who are working as management consultants. In this post I propose a learning path for such people to “get […]
