Online Learning Curriculum for Data Scientists
“Is there any online reading or courses I can do to get into data analysis?”
At my workplace, I often get asked the question above. It usually comes from people with a finance background who are working as management consultants. In this post I propose a learning path for such people to “get into data analysis”. I will assume that the prospective student is someone with decent Excel skills, who is not afraid of a VLOOKUP or a touch of VB and can throw together decent plots / dashboards using the same Microsoft package, but who has little or no knowledge of programming / command line operations.
A data scientist can be defined by Drew Conway’s Data Science Venn diagram, which suggests that data scientists must have a solid mathematical background, skills in coding and computer hacking, and a healthy mix of subject matter expertise.
The courses mentioned below are by no means an “over a weekend” type of engagement. If you are serious about entering the world of data science as a profession, allow yourself at least 3-6 months to complete and study the content of the courses below.

Learn to program.
R and Python are the two primary scripting languages that are taking over the world of data science. There is very little that cannot be done with knowledge of these two languages, and I would recommend getting to grips with both during your learning. R is a statistical programming language with a huge number of packages available for almost every function you could think of. Python is a more general-purpose language whose data science capabilities are built on the numpy and scipy libraries.
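For readers coming from Excel, here is a minimal sketch of what a VLOOKUP-style exact-match lookup might look like in plain Python. The product codes, prices, and the `order_totals` helper are hypothetical, made up purely for illustration, and this is only one of several ways to write it.

```python
# Hypothetical lookup table: product code -> unit price (the VLOOKUP "range").
prices = {"A100": 2.50, "B200": 4.00, "C300": 1.25}

# Hypothetical order lines: (product code, quantity).
orders = [("A100", 4), ("C300", 10), ("Z999", 1)]


def order_totals(orders, prices):
    """Return (code, total) for each order; total is None for unknown codes."""
    totals = []
    for code, quantity in orders:
        # dict.get behaves like an exact-match VLOOKUP; None stands in for #N/A.
        unit_price = prices.get(code)
        total = None if unit_price is None else unit_price * quantity
        totals.append((code, total))
    return totals


totals = order_totals(orders, prices)
for code, total in totals:
    print(code, total)
```

The dictionary plays the role of the lookup range, and a missing key is handled explicitly rather than surfacing as an error cell.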
Course | Description
Try R | Start your journey into R and data visualisation with the “Try R” free online course from CodeSchool.com. Learn the basic syntax and get loading and plotting small data sets.
Computing for Data Analysis | Augment your fundamental R knowledge with “Computing for Data Analysis” at Coursera.org.
Python Track | Take a trip into Python and get to grips with the basic syntax with the Python track at Codecademy.com.
Introduction to Computer Science | Expand this preliminary Python know-how with a full-blown project to create a working search engine in Udacity.com’s “Introduction to Computer Science”.
Learn some maths.
Data scientists are one part statistician. Gathering meaningful information from large data sets requires regularly summarising and correlating variables. A solid understanding of the maths behind statistical transformations and machine learning techniques helps ensure that results are valid and stand up to scrutiny. Note that a lot of the necessary statistics and maths knowledge can be picked up from the machine learning-focussed courses.
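As a small illustration of what “summarising and correlating variables” can mean in practice, the sketch below computes a mean, a standard deviation, and a Pearson correlation coefficient using only Python's standard library. The `ad_spend` and `sales` figures are invented for the example, and `pearson` is a hypothetical helper written out by hand so that each step of the formula is visible.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented example data: monthly advertising spend and sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 58, 79, 102]

print("mean sales:", statistics.mean(sales))
print("std dev of sales:", statistics.stdev(sales))  # sample standard deviation
r = pearson(ad_spend, sales)
print("correlation:", round(r, 3))
```

A coefficient close to 1 indicates a strong positive linear relationship between the two variables; it says nothing on its own about cause and effect.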
Course | Description
Introduction to Statistics | Start off with some preliminary statistics at Udacity.com’s “Introduction to Statistics”.
Statistics One | Go a bit deeper with “Statistics One” from Princeton at Coursera.org.
Learn machine learning and data visualisation.
What separates data scientists from data analysts is the ability to move beyond reporting and apply more sophisticated analytical techniques to model variance, extract meaning, and predict variables of interest from your data.
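To make “predict variables of interest” a little more concrete, here is a minimal sketch of the simplest such model, a straight line fitted by ordinary least squares. The data and the `fit_line` and `predict` helpers are hypothetical, and the courses below cover far richer techniques than this.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x


# Invented example data: advertising spend (x) and sales (y).
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 58, 79, 102]

slope, intercept = fit_line(ad_spend, sales)
forecast = predict(60, slope, intercept)
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("forecast for spend of 60:", round(forecast, 1))
```

Reporting would stop at describing the five observed months; the fitted line is what lets you estimate an unseen sixth one.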
Course Name | Description
Data Analysis | Start with the excellent “Data Analysis” course at Coursera.org, which will give you direct experience in loading, visualising, and modelling real data sets using R. This course is considerably more advanced than the previous “Computing for Data Analysis”, covers some data analysis techniques, and focuses on teaching students how to structure data analysis reports.
Machine Learning | Make sure that you take the brilliant and MOOC-starting “Machine Learning” or “mlclass” course with Andrew Ng at Coursera.org. The course uses Octave rather than Python for its programming exercises, and covers linear algebra, regression, neural networks, support vector machines, and recommender systems, among others. Andrew Ng provides an excellent background for the topics that are covered.
Artificial Intelligence for Robotics | Sebastian Thrun’s “Artificial Intelligence for Robotics” class is a brilliant introduction to more applied machine learning techniques such as the Kalman Filter and Particle Filters. While perhaps slightly off-topic, the course has a range of interesting and worthwhile Python-based exercises that will only add to your learning journey.
Algorithms / Neural Networks | More detailed specific-topic courses can be taken in Algorithms and Network Analysis at Udacity, or Neural Networks for Machine Learning, both of which I’ve personally found useful. The Neural Networks course dips into the realm of “Deep Learning”, a hot, but advanced, topic in machine learning at Google and Facebook at the moment.
Introduction to Hadoop and MapReduce | At some point, you’re going to need to get your feet wet with some Big Data, Hadoop, and MapReduce knowledge. Get a basic introduction with “Introduction to Hadoop and MapReduce” at Udacity.com, in conjunction with Cloudera.
When you have completed the majority of the courses listed above, you’ll be in a very strong position to put your knowledge to use. Practice is the key: get on Kaggle, download a data set, and get involved!
vojko
Nice article!
P.S.
“Machine Learning” class uses Octave not Python.
shanelynn
Hey, thanks for the update. Will change the text to reflect this!
Steve
Thanks for providing us with the course curriculum.
This course is designed to introduce students to the data management, storage and manipulation tools common in data science and will apply those tools to real scenarios. An overview of different SQL and NoSQL database technologies is presented and the course finishes with a discussion of choosing the appropriate tool to get the job done.
Topics include:
Introduction to data (data types, data movement, terminology, etc.)
Storage and Concurrency Preliminaries
Files and File-based data systems
Relational Database Management Systems
Hadoop Introduction
NoSQL – MapReduce vs. Parallel RDBMS
Search and Text Analysis
Thanks! http://www.intellipaat.com/
MindMajix
Very useful material to learn the differences between data analysis and machine learning.