tutorial

Read CSV data quickly into Pandas DataFrames with read_csv

CSV (comma-separated values) files are a common format for transferring and storing data. The ability to read, manipulate, and write data to and from CSV files using Python is a key skill for any data scientist or business analyst. In this post, we’ll go over what CSV files are, how to read CSV files into Pandas DataFrames, and how to write DataFrames back to CSV files after analysis.
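As a taste of the read, manipulate, write cycle described above, here is a minimal sketch. The sample data and the column names (`name`, `score`, `passed`) are made up for illustration, and an in-memory buffer stands in for a real file path so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = "name,score\nalice,90\nbob,85\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# A simple manipulation: add a boolean column derived from an existing one.
df["passed"] = df["score"] >= 88

# Write the DataFrame back out as CSV; index=False leaves out the row index.
out = io.StringIO()
df.to_csv(out, index=False)
result = out.getvalue()
print(result)
```

With a real file you would pass a path instead, for example `pd.read_csv("data.csv")` and `df.to_csv("output.csv", index=False)`, where both file names are placeholders.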

Word Embeddings in Python with Spacy and Gensim

This post shows how to load, use, and make your own word embeddings using Python. Use the Gensim and Spacy libraries to load pre-trained word vector models from Google and Facebook, or train custom models using your own data and the Word2Vec algorithm. This post is a direct follow-on from the introductory Word Embeddings post, and will show you how to get started using word vectors with your own models and systems.

Pandas iloc and loc – quickly select rows and columns in DataFrames

Pandas Data Selection: There are multiple ways to select and index rows and columns from Pandas DataFrames. I find that online tutorials focusing on advanced row and column selections are a little complex for my requirements, but mastering the Pandas iloc, loc, and ix selectors can actually be made quite simple. Selection Options: There are three main options to …

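To give a flavour of the selectors named in the excerpt, here is a small sketch of `iloc` (position-based) and `loc` (label-based) selection. The DataFrame contents and index labels are invented for the example. The `ix` selector is left out because it was deprecated and then removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical data with string row labels so position and label differ.
df = pd.DataFrame(
    {"item": ["pen", "book", "lamp"], "price": [1.5, 12.0, 30.0]},
    index=["a", "b", "c"],
)

# iloc selects by integer position: the first row, returned as a Series.
first_row = df.iloc[0]

# loc selects by label: row "b", column "price".
price_b = df.loc["b", "price"]

# Position slices exclude the end point, like normal Python slicing.
first_two = df.iloc[0:2]

# Label slices include the end label.
a_to_b = df.loc["a":"b"]

# loc also accepts a boolean mask for rows plus a column label.
expensive_items = df.loc[df["price"] > 10, "item"]
print(list(expensive_items))
```

The main thing to remember is the slicing difference: `iloc[0:2]` returns two rows, while `loc["a":"b"]` returns both the "a" and "b" rows.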

Parallel programming allows you to speed up your code execution - very useful for data science and data processing

Using Python Threading and Returning Multiple Results (Tutorial)

Threading in Python is simple. It allows you to manage concurrent threads doing work at the same time. The library is called “threading”: you create “Thread” objects, and they run target functions for you. You can start potentially hundreds of threads that will operate concurrently. Speed up long-running tasks by parallelising and threading computation where you can.
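One common way to get results back from several threads is to have each thread write into its own slot of a shared list. The sketch below assumes that pattern; the `square` worker and its arguments are hypothetical and not taken from the tutorial itself. Note that in CPython the global interpreter lock means threads mostly help with I/O-bound work rather than CPU-bound number crunching.

```python
import threading


def square(n, results, index):
    """Hypothetical worker: store n squared in this thread's own slot."""
    results[index] = n * n


numbers = [1, 2, 3, 4]

# One pre-allocated slot per thread, so no two threads write to the same place.
results = [None] * len(numbers)

threads = [
    threading.Thread(target=square, args=(n, results, i))
    for i, n in enumerate(numbers)
]

# Start every thread, then wait for all of them to finish.
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Because each thread only touches its own index, no lock is needed here. If threads had to update a shared value, a `threading.Lock` would be the usual safeguard.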