machine learning | Shane Lynn

An introduction to word embeddings for text analysis

14 Comments / blog, data science, python, Tutorials / By Shane

This post provides an introduction to “word embeddings” or “word vectors”. Word embeddings are real-number vectors that represent words from a vocabulary, and have broad applications in the area of natural language processing (NLP). We examine the training, use, and properties of word embedding models, and look at how and why you should use word embeddings over older bag-of-words techniques in your data science and language modelling tasks.
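As a rough illustration of the idea, the sketch below compares words by the cosine similarity of their vectors. The three-dimensional vectors are hypothetical values picked by hand for this example; real embedding models learn vectors with many more dimensions from large text corpora.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
# They are assumptions for this sketch, not the output of any trained model.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Related words should score higher than unrelated ones.
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(sim_royal > sim_fruit)
```

In a one-hot bag-of-words representation, by contrast, any two distinct words have a cosine similarity of exactly zero, so this kind of graded comparison between words is not available.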

Use Pandas Groupby to Group and Summarise DataFrames

139 Comments / blog, data science, python, Uncategorized / By Shane

Aggregation and data grouping of DataFrames is accomplished in Python Pandas using the “groupby()” and “agg()” functions. In this post, we’ll look at every aspect of grouping by single or multiple columns, applying aggregation functions such as max, min, and count, and naming the resulting DataFrames and Pandas Series.
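A minimal sketch of the pattern, assuming a small made-up DataFrame (the column names `network`, `item`, and `duration` and all values are invented for this example):

```python
import pandas as pd

# Made-up sample data; column names and values are assumptions for this sketch.
df = pd.DataFrame({
    "network": ["A", "A", "B", "B", "B"],
    "item": ["call", "sms", "call", "call", "sms"],
    "duration": [10.0, 1.0, 20.0, 30.0, 1.0],
})

# Group by a single column and name each aggregated output column
# using pandas "named aggregation": new_name=(source_column, function).
summary = df.groupby("network").agg(
    max_duration=("duration", "max"),
    min_duration=("duration", "min"),
    n_records=("duration", "count"),
)

# Group by multiple columns; selecting one column yields a Series
# indexed by (network, item).
by_two = df.groupby(["network", "item"])["duration"].sum()

print(summary)
print(by_two)
```

Here `summary` is a DataFrame with one row per network, while `by_two` is a Series with a two-level index.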

Parallel programming allows you to speed up your code execution - very useful for data science and data processing

Online Learning Curriculum for Data Scientists

15 Comments / blog, data science / By Shane

“Is there any online reading or courses I can do to get into data analysis?” At my workplace, I often get asked the question above. It is usually posed by people with a finance background who are working as management consultants. In this post I propose a learning path for such people to “get …
