“Is there any online reading or courses I can do to get into data analysis?”

At my workplace, I am often asked the question above, usually by people with a finance background who are working as management consultants. In this post I propose a learning path for such people to “get into data analysis”. I will assume that the prospective student is someone with decent Excel skills, who is not afraid of a VLOOKUP or a touch of VB and can throw together decent plots / dashboards using the same Microsoft package, but who has little or no knowledge of programming / command line operations.

A data scientist can be defined by Drew Conway’s Data Science Venn diagram, which suggests that data scientists must have a solid mathematical background, skills in coding and computer hacking, and a healthy dose of subject matter expertise.

[Figure: Data Science Venn diagram]

The courses mentioned below are by no means an “over a weekend” type of engagement – if you are serious about entering the world of data science as a profession, allow yourself at least 3-6 months to complete and study the content of the courses below.

  1. Learn to program.

    R and Python are the two primary scripting languages that are taking over the world of data science. There is very little that cannot be done with knowledge of these two languages, and I would recommend getting to grips with both during your learning. R is a statistical programming language with a huge number of packages available for almost every function you could think of. Python is a more general-purpose language whose data science capabilities are built on the numpy and scipy libraries.



    Try R: Start your journey into R and data visualisation with the “Try R” free online course from CodeSchool.com. Learn the basic syntax and get loading and plotting small data sets.
    Computing for Data Analysis: Augment your fundamental R knowledge with “Computing for Data Analysis” at Coursera.org.
    Python Track: Take a trip into Python and get to grips with the basic syntax with the Python track at Codecademy.com.
    Introduction to Computer Science: Expand this preliminary Python know-how with a full-blown project to create a working search engine in Udacity.com’s “Introduction to Computer Science”.


  2. Learn some maths.

    Data scientists are one part statistician. Gathering meaningful information from large data sets requires skills in summarising and correlating variables on a regular basis. A solid understanding of the maths behind statistical transformations and machine learning techniques helps ensure that results are valid and stand up to scrutiny. Note that a lot of the necessary statistics and maths knowledge can be picked up from the machine learning-focussed courses.



    Introduction to Statistics: Start off with some preliminary statistics at Udacity.com’s “Introduction to Statistics”.
    Statistics One: Go a bit deeper with “Statistics One” from Princeton at Coursera.org.
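
As a taste of what “summarising and correlating variables” looks like in practice, here is a minimal sketch in plain Python. The `ad_spend` and `revenue` figures are made up purely for illustration, and the helper functions are hypothetical names written out by hand so the maths is visible; in real work you would more likely reach for numpy or R, which provide the same calculations.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up example data: five observations of two variables.
ad_spend = [1, 2, 3, 4, 5]
revenue = [2, 4, 5, 4, 5]

print("mean revenue:", mean(revenue))
print("std of revenue:", round(sample_std(revenue), 3))
print("correlation:", round(pearson_r(ad_spend, revenue), 3))
```

A correlation close to 1 or -1 suggests a strong linear relationship; the statistics courses above explain when that interpretation is and is not safe.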


  3. Learn machine learning and data visualisation.

    The core skill that separates data scientists from data analysts is the ability to move beyond reporting and apply more sophisticated analytical techniques to model variance, extract meaning, and predict variables of interest from your data.
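
To make the step from reporting to modelling a little more concrete, here is a minimal sketch of a one-variable least-squares regression in plain Python. The data and the function names (`fit_line`, `r_squared`) are invented for illustration; the courses below cover the theory properly and use dedicated tools rather than hand-rolled code like this.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up example data: predict y from x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predict y for an unseen x = 6

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("R^2:", round(r_squared(xs, ys, slope, intercept), 2))
print("prediction for x = 6:", round(prediction, 2))
```

The fitted line is the “model”, the R² value says how much of the variance it explains, and the last line is a prediction for a value that was not in the data.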



    Data Analysis: Start with the excellent “Data Analysis” course at Coursera.org, which will give you direct experience in loading, visualising, and modelling real data sets using R. This course is considerably more advanced than the previous “Computing for Data Analysis”, covers a range of data analysis techniques, and focuses on teaching students how to structure data analysis reports.
    Machine Learning: Make sure that you take the brilliant “Machine Learning” or “ml-class” course with Andrew Ng at Coursera.org, one of the courses that kick-started the MOOC movement. Note that the programming exercises use Octave/MATLAB rather than Python, so solid general programming skills are a must for this course, which covers linear algebra, regression, neural networks, support vector machines, and recommender systems, among others. Andrew Ng provides an excellent background for the topics that are covered.
    Artificial Intelligence for Robotics: Sebastian Thrun’s “Artificial Intelligence for Robotics” class is a brilliant introduction to more applied machine learning techniques such as the Kalman Filter and Particle Filters. While perhaps slightly off-topic, the course has a range of interesting and worthwhile Python-based exercises that will only add to your learning journey.
    Algorithms / Neural Networks: More detailed specific-topic courses can be taken in Algorithms and Network Analysis at Udacity, or Neural Networks for Machine Learning – both of which I’ve personally found useful. The Neural Networks course dips into the realm of “Deep Learning”, a hot, but advanced, topic in machine learning at Google and Facebook at the moment.
    Introduction to Hadoop and MapReduce: At some point, you’re going to need to get your feet wet with some Big Data, Hadoop, and MapReduce knowledge. Get a basic introduction with “Introduction to Hadoop and MapReduce” at Udacity.com, in conjunction with Cloudera.
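
If MapReduce sounds mysterious, the core idea fits in a few lines. The sketch below is a single-process imitation of the classic word count example in plain Python; it is not the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`) and sample documents are my own illustrative choices. On a real cluster, the same three phases run in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Made-up input: each string stands in for a file stored on the cluster.
documents = ["big data is big", "data science needs data"]
counts = word_count(documents)
print(counts)
```

The point of splitting the work this way is that each `map_phase` call only needs one document and each `reduce_phase` call only needs one key, so both can be spread across as many machines as you have.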


When you have completed the majority of the courses listed above, you’ll be in a very strong position to put your knowledge to use. And practice is the key. Get on Kaggle, download a data set, and get involved!!


  1. Thanks for providing us with the course curriculum.
    This course is designed to introduce students to the data management, storage and manipulation tools common in data science and will apply those tools to real scenarios. An overview of different SQL and No-SQL database technologies is presented and the course finishes with a discussion of choosing the appropriate tool to get the job done.
    Topics include:

    Introduction to data (data types, data movement, terminology, etc.)
    Storage and Concurrency Preliminaries
    Files and File-based data systems
    Relational Database Management Systems
    Hadoop Introduction
    NoSQL – MapReduce vs. Parallel RDBMS
    Search and Text Analysis
    Thanks! http://www.intellipaat.com/
