This post shows how to load, use, and train your own word embeddings using Python. Use the Gensim and spaCy libraries to load pre-trained word vector models from Google and Facebook, or train custom models using your own data and the Word2Vec algorithm. This post is a direct follow-on from the introductory Word Embeddings post, and will show you how to get started using word vectors with your own models and systems.

Read More →

This post provides an introduction to “word embeddings” or “word vectors”. Word embeddings are real-valued vectors that represent words from a vocabulary, and have broad applications in natural language processing (NLP). We examine the training, use, and properties of word embedding models, and look at how and why you should use word embeddings over older bag-of-words techniques in your data science and language modelling tasks.

Read More →
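
To make the idea of words as real-valued vectors concrete, the sketch below compares vectors with cosine similarity, a common way of measuring how close two embeddings are. The three-dimensional vectors are made-up values for illustration only; real embeddings typically have tens or hundreds of dimensions and are learned from text.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.0, 0.9],
}

# Related words should point in similar directions.
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
```

With these invented values, `sim_royal` comes out larger than `sim_fruit`, which is the kind of relationship a trained embedding model aims to capture.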

The Pandas DataFrame – this blog post covers the basics of loading, editing, and viewing data in Python, and getting to grips with Python's all-important data structure, the Pandas DataFrame. Learn by example to load CSV files, rename columns, extract statistics, and select rows and columns.

Read More →
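
The operations listed in the teaser above can be sketched in a few lines. The CSV content and column names here are hypothetical, and the data is read from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path to read_csv.
csv_text = """name,price,county
House A,250000,Dublin
House B,180000,Cork
House C,320000,Dublin
"""

# Load the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Rename a column.
df = df.rename(columns={"price": "price_eur"})

# Extract statistics for a numeric column.
mean_price = df["price_eur"].mean()
summary = df["price_eur"].describe()

# Select rows by condition and columns by name.
dublin = df[df["county"] == "Dublin"]
names_and_prices = df[["name", "price_eur"]]
```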

This post makes geocoded data available for all property sales in Ireland from 2012 to 2017. The data is sourced from the Irish Property Price Register and geocoded using the Google geocoding script in Python. All of the GPS latitude/longitude coordinates are further tied to census small area and electoral division boundaries.

Read More →

Merging and joining data sets are key activities for any data scientist or analyst. In this tutorial, we explore how to combine datasets based on common columns quickly and easily with the Python Pandas library and its fast merge() functionality. Finally conquer merging and become a master with this 2-part tutorial.

Read More →
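
As a small sketch of combining datasets on a common column, the example below merges two hypothetical DataFrames on a shared `property_id` key. The table contents and column names are invented for illustration.

```python
import pandas as pd

# Two hypothetical tables that share the "property_id" column.
sales = pd.DataFrame(
    {"property_id": [1, 2, 3], "price_eur": [250000, 180000, 320000]}
)
locations = pd.DataFrame(
    {"property_id": [1, 2, 4], "county": ["Dublin", "Cork", "Galway"]}
)

# Inner merge: keep only keys present in both tables.
inner = pd.merge(sales, locations, on="property_id", how="inner")

# Left merge: keep every row of `sales`, filling missing matches with NaN.
left = sales.merge(locations, on="property_id", how="left")
```

The `how` argument controls which keys survive the merge; `"inner"` and `"left"` are shown here, and `"right"` and `"outer"` behave analogously.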

Geocode your addresses for free with Python and Google. For a recent project, I ported the “batch geocoding in R” script over to Python. The script allows geocoding of large numbers of string addresses to latitude and longitude values using the Google Maps Geocoding API. The Google Geocoding API is one of the most accurate geocoding […]

Read More →
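
The following is not the script from the post, only a minimal sketch of the two steps any such script needs: building a request URL for the Google Maps Geocoding API and pulling the latitude and longitude out of the JSON response. The helper names and the sample response are hypothetical, the API key is a placeholder, and no network request is made here.

```python
import json
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address, api_key):
    """Build the request URL for one address (hypothetical helper)."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})

def extract_lat_lng(response_text):
    """Return (lat, lng) from a geocoding response, or None if nothing matched."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Illustrative response body with made-up coordinates, not real API output.
sample_response = json.dumps(
    {
        "status": "OK",
        "results": [{"geometry": {"location": {"lat": 53.35, "lng": -6.26}}}],
    }
)

url = build_geocode_url("Dublin, Ireland", "YOUR_API_KEY")
coords = extract_lat_lng(sample_response)
```

For batch geocoding, you would fetch each URL with an HTTP client of your choice, pass the response text to `extract_lat_lng`, and handle rate limits and failed lookups along the way.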

Pandas Data Selection. There are multiple ways to select and index rows and columns from Pandas DataFrames. I find that online tutorials focusing on advanced row and column selections are a little complex for my requirements. Selection Options: there are three main options for selecting and indexing in Pandas, which can be confusing. The three selection cases and […]

Read More →
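
The teaser above is cut off before it names its selection options, so the sketch below simply shows three commonly used approaches (position-based `.iloc`, label-based `.loc`, and boolean masks), which may not match the post's own list. The DataFrame contents are hypothetical.

```python
import pandas as pd

# Hypothetical data with string index labels.
df = pd.DataFrame(
    {
        "name": ["House A", "House B", "House C"],
        "price_eur": [250000, 180000, 320000],
        "county": ["Dublin", "Cork", "Dublin"],
    },
    index=["a", "b", "c"],
)

# Position-based selection with .iloc (integer positions, end-exclusive slices).
first_row = df.iloc[0]
subset_by_position = df.iloc[0:2, 0:2]

# Label-based selection with .loc (index labels and column names).
row_b = df.loc["b"]
subset_by_label = df.loc[["a", "c"], ["name", "county"]]

# Boolean-mask selection: rows matching a condition, one column returned.
expensive = df.loc[df["price_eur"] > 200000, "name"]
```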