data science | Shane Lynn

Data visualisation with Python and pandas is a key element of data science work.

Data Visualisation in Python – PyCon Dublin 2018 Presentation

blog, Data Visualisation, python, Talks / By Shane

The ability to explore and grasp data structures through quick and intuitive visualisation is a key skill for any data scientist. At PyConIE 2018 in Dublin, I presented a talk on the various libraries available for data visualisation in Python. This post contains the slides from that talk, along with a video recording of it.
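
As a small illustration of the kind of quick, exploratory plotting discussed here (not code from the talk itself), pandas' built-in `.plot` methods wrap matplotlib and give you a first look at a DataFrame in a couple of lines. The data and column names below are made up for the example, and the `Agg` backend is an assumption so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly figures, purely for illustration.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "visitors": [120, 135, 160, 150, 180, 210],
    "signups": [12, 15, 21, 18, 25, 30],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Line chart of both numeric columns against month, drawn on the left axes.
df.plot(x="month", y=["visitors", "signups"], ax=axes[0], title="Monthly trend")

# Scatter plot to eyeball the relationship between the two columns.
df.plot.scatter(x="visitors", y="signups", ax=axes[1], title="Visitors vs sign-ups")

fig.tight_layout()

# Render to an in-memory PNG; pass a file name instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Dedicated libraries such as seaborn or plotly offer more options, but pandas' own plotting is often the fastest route to a first look.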

Read CSV data quickly into Pandas DataFrames with read_csv

blog, data science, Pandas, python, Tutorials / By Shane

CSV (comma-separated value) files are a common file format for transferring and storing data. The ability to read, manipulate, and write data to and from CSV files using Python is a key skill to master for any data scientist or business analyst. In this post, we’ll go over what CSV files are, how to read CSV files into Pandas DataFrames, and how to write DataFrames back to CSV files after analysis.
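
A minimal sketch of that read, manipulate, write round trip with `pd.read_csv` and `DataFrame.to_csv`. To keep it self-contained, the CSV text lives in an `io.StringIO` buffer; with a real file you would pass a path instead, such as a hypothetical `pd.read_csv("data.csv")`. The column names and values are invented for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """name,age,city
Alice,34,Dublin
Bob,28,Cork
Cara,45,Galway
"""

# Read: the header row becomes the column names.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 3)

# Manipulate: add a derived column.
df["age_next_year"] = df["age"] + 1

# Write: index=False stops the integer row index being saved as an extra column.
output = io.StringIO()
df.to_csv(output, index=False)
print(output.getvalue())

# Reading the written text back gives the same DataFrame.
round_trip = pd.read_csv(io.StringIO(output.getvalue()))
```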

Word Embeddings in Python with Spacy and Gensim

blog, data science, Natural Language Processing, python, Tutorials / By Shane

This post shows how to load, use, and make your own word embeddings using Python. Use the Gensim and Spacy libraries to load pre-trained word vector models from Google and Facebook, or train custom models using your own data and the Word2Vec algorithm. This post is a direct follow-on from the introductory Word Embeddings post, and will show you how to get started using word vectors with your own models and systems.

An introduction to word embeddings for text analysis

blog, data science, python, Tutorials / By Shane

This post provides an introduction to “word embeddings” or “word vectors”. Word embeddings are real-number vectors that represent words from a vocabulary, and have broad applications in the area of natural language processing (NLP). We examine the training, use, and properties of word embedding models, and explain how and why you should consider using word embeddings over older bag-of-words techniques in your data science and language modelling tasks.
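
One way to see why dense vectors beat bag-of-words style one-hot encodings: one-hot vectors for different words are always orthogonal, so every pair of distinct words has cosine similarity 0, whereas embeddings can place related words close together. The sketch below uses hand-picked, hypothetical 3-dimensional vectors purely to illustrate the geometry; real embeddings are learned from data and typically have hundreds of dimensions.

```python
import numpy as np

vocabulary = ["king", "queen", "apple"]

# One-hot representation: each word gets its own axis.
one_hot = {word: np.eye(len(vocabulary))[i] for i, word in enumerate(vocabulary)}

# Hypothetical dense embeddings, hand-picked for illustration (not learned).
dense = {
    "king": np.array([0.80, 0.65, 0.10]),
    "queen": np.array([0.75, 0.70, 0.15]),
    "apple": np.array([0.05, 0.10, 0.90]),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# One-hot: distinct words are never similar.
print(cosine_similarity(one_hot["king"], one_hot["queen"]))  # 0.0

# Dense: related words score higher than unrelated ones.
print(cosine_similarity(dense["king"], dense["queen"]))
print(cosine_similarity(dense["king"], dense["apple"]))
```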

The Pandas DataFrame – loading, editing, and viewing data in Python

blog, data science, Data Visualisation, Pandas, python, Tutorials / By Shane

This blog post covers the basics of loading, editing, and viewing data in Python, and getting to grips with the all-important data structure for data work in Python: the Pandas DataFrame. Learn by example to load CSV files, rename columns, extract statistics, and select rows and columns.
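
A compact sketch of those everyday operations (renaming columns, summary statistics, and selecting rows and columns with `[]`, `.loc`, and `.iloc`). The DataFrame is built inline from hypothetical data so the example runs without a CSV file.

```python
import pandas as pd

# Hypothetical data; in practice this might come from pd.read_csv(...).
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan"],
    "Age": [34, 28, 45, 39],
    "City": ["Dublin", "Cork", "Galway", "Dublin"],
})

# Rename columns with a mapping of old name -> new name.
df = df.rename(columns={"Name": "name", "Age": "age", "City": "city"})

# Statistics: a single column's mean, and a summary of numeric columns.
print(df["age"].mean())  # 36.5
print(df.describe())

# Column selection: one name gives a Series, a list gives a DataFrame.
ages = df["age"]
name_and_city = df[["name", "city"]]

# Label-based row selection with a boolean condition, plus chosen columns.
dubliners = df.loc[df["city"] == "Dublin", ["name", "age"]]

# Position-based row selection: the first two rows.
first_two = df.iloc[0:2]
```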

Learn To Merge and Join DataFrames Easily with Pandas

blog, data science, Pandas, python, Tutorials / By Shane

Merging and joining datasets are key activities for any data scientist or analyst. In this tutorial, we explore how to combine datasets based on common columns quickly and easily with the Python Pandas library and its fast merge() functionality. Finally conquer merging and become a master with this 2-part tutorial.
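
As a taste of what `merge()` does, here is a minimal sketch joining two small, hypothetical tables on a shared `customer_id` column. The `how` parameter controls which keys survive, and `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Hypothetical tables sharing a customer_id column.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Cara"],
})
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "customer_id": [1, 1, 2, 4],  # customer 4 has no entry in customers
    "amount": [25.0, 40.0, 15.5, 60.0],
})

# Inner join: only orders whose customer_id exists in both tables.
inner = pd.merge(orders, customers, on="customer_id", how="inner")

# Left join: every order is kept; unmatched customer details become NaN.
left = pd.merge(orders, customers, on="customer_id", how="left")

# Outer join: all keys from both sides; _merge shows left_only/right_only/both.
outer = pd.merge(customers, orders, on="customer_id", how="outer", indicator=True)
print(outer)
```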