Academic

We are a faculty team aiming to bring Applied Data Science with Python skills to everyone through a Coursera specialization this fall AMA!

Aug 24th 2016 by UMichiganAI • 13 Questions • 106 Points

My short bio: There’s a big team behind this University of Michigan Coursera specialization and we want to share with you what we’re doing to bring applied data science and python skills to everyone! From pedagogy and technology through to curricular design and content please feel free to ask us anything! Want to know why we think python is great for data science? Or what it takes to put a MOOC together?

  • Christopher Brooks is faculty in the University of Michigan School of Information, and does research in learning analytics and educational technologies, such as predictive models of student success.
  • Kevyn Collins-Thompson is faculty in the University of Michigan School of Information and does research in information retrieval and text analysis.
  • Daniel Romero is faculty in the University of Michigan School of Information and does research in networks and complex systems.
  • V.G. Vinod Vydiswaran is faculty in the University of Michigan Medical School and the School of Information and does research in text mining and natural language processing, such as mining health information from patient records and social media.

In addition to the faculty, we are joined by our coordinator Stephanie Haley and course tutorial assistant Filip Jankovic!

Here’s the course we are planning to teach:

My Proof:


Why did you pick python as the language you're using to teach with?


There are a couple of reasons. First, Python is wonderful specifically for data science - lots of great libraries for machine learning (scikit-learn), natural language processing (nltk), network analysis (networkx) and basic visualizations (matplotlib). The data analysis and cleaning ability of python is great - I (Chris) am regularly writing up pandas manipulations to clean and transform research data.
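As a rough illustration of the kind of pandas clean-up mentioned above (this is not the team's actual code; the table, column names, and values are invented for the example), a short sketch might look like this:

```python
import pandas as pd

# Hypothetical raw records, made up purely for illustration.
raw = pd.DataFrame({
    "student": [" ana", "Ben ", "ana", None],
    "score": ["81", "n/a", "92", "77"],
})

cleaned = (
    raw.dropna(subset=["student"])  # drop rows with no student name
       .assign(
           # normalize whitespace and capitalization in the names
           student=lambda df: df["student"].str.strip().str.title(),
           # turn score text into numbers; unparseable values become NaN
           score=lambda df: pd.to_numeric(df["score"], errors="coerce"),
       )
       .dropna(subset=["score"])  # drop rows whose score was not numeric
)

# Average score per student after cleaning.
mean_scores = cleaned.groupby("student")["score"].mean()
print(mean_scores)
```

The chained style keeps each cleaning step on its own line, which makes it easy to add or remove a step while exploring a dataset.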

Python also is a comprehensive programming language, so if you're a software developer you've got a full toolkit including multiprocessing and cloud computing libraries and not just a specialized stats language.

But we also took a look at what exists out there for free educational data science material - there are lots of great resources in R, but I think the python world was a little underrepresented, so we figured we would share our workflows (though I think all of us use a variety of tools when solving data science problems!).


or statistical testing


Today has been filled with insightful conversations around data science and python - thanks to all who participated in this AMA by posing questions and sharing their thoughts!

If we haven’t gotten to your question yet we apologize and will try to circle back on it soon. For those interested in learning more about our work, check out our Applied Data Science with Python Specialization on Coursera:


What value does your specialization offer the job seeker? I'm curious if you had that demographic in mind while designing the course.

EDIT: for example, some specializations have industry partnerships, or large / capstone projects to put on your CV.


All here: We are very interested in this demographic, and talked about how to support these learners at some length in course planning. This course is more introductory, so it depends on the kind of job you are seeking, and what other background (current employment, previous academic background, etc.) you might have. For instance, if you're a programmer who is looking to shift positions away from (say) front end development to business intelligence, we hope this specialization is for you. That's of course just one example of a job seeker!

We also hope to support students who are thinking of going to graduate school and want some solid skills to put on their applications.

And while we don't have an omnibus capstone, each of the courses ends in a larger project assignment. My experience in talking with learners who had done data science MOOCs was that, even if they paid for the specialization, they tended not to do the separate capstone project. So we wanted to try larger projects on a per course basis to see if this would help create a compelling portfolio for learners!

In the end, I think the best bet for a job seeker is to differentiate themselves by applying their skills to a novel project that is wholly their own!


What do you believe is a reasonable expectation for ROI for a paid data science specialization on Coursera? How does this compare to the expectations for other certification paths (e.g. post-graduate certificates)?


We all have lots of thoughts on this!

First, you can take all of the courses in this specialization along with assessments for free. You need to find the individual course and then there should be that enrollment option available. Of course, investment isn't just $, it's time too.

There are lots of different post graduate certificate options, and I think they differ heavily on price and how you take them. The coursera specializations are probably the cheapest and most flexible. Bootcamps come next, and have some significant constraints (location) and costs. Another option is university certificates, which require dedicated time and can have significant costs (e.g. in the case of 2 year Master's degrees).

Accessibility is another issue - if you live in a town that has meetups, a strong university, or boot camps, you have different resources available than if you are in a rural area (for instance).

I think the ROI argument probably comes down to understanding goals, background, and willingness to accept risk (e.g. move across the country and put out significant $). In offering this specialization we hope to help people get involved in applying data science while minimizing their costs and risk. But we teach at a residential university and think the on campus experience is amazing too, so it comes down to your risk profile in part.

But you asked about the ROI for the certificate. This is a tough question! We're starting to see students listing their co-curricular (e.g. moocs) work on their submissions to grad school. I think certificates like those from Coursera will help people not only get introduced to the area, but help get them that interview where they can pitch their case.


Hey I'm a recent Michigan alumnus and I've started learning data science for fun. One of the problems with a lot of the resources I've read online is that they don't go into enough detail about how the math works. I feel more comfortable using methods, such as PCA, when I have an idea what the method is actually doing. How do you plan to address the fact that you need to have some experience with linear algebra, stats and diff eq to have a reasonable grasp of how basic data science methods work?

Also, will people have to pay for the quizzes in your coursera class?

Thank you and go blue!


Go Blue!

First, the quizzes are free, you can sign up for each course and get the full experience!

This course is very much applied in nature, so we're not expecting significant knowledge of linear algebra and diff eq, for instance. The aim is to make the course accessible to a broad group of learners. At the same time, we don't aim to hide the details, but to bring them about through other course resources.

Also, there are some excellent courses that go into detail on specific techniques, like Andrew Ng's course, which we hope will help fill in background for those who want to dig deeper.


When designing an online course are there parts that just don't scale to a mooc? What are the results of a mooc vs a traditional course in terms of student knowledge retention?


Chris here: Yes, there are challenges in scaling up some activities in particular, but we see some of that in traditional higher ed too. Discussion is the big one - MOOC discussion forums are largely unused by students, which means that there might be only hundreds or thousands of messages in the discussion forums. We see this in big first year courses too - how do you have a discussion with 300 people in a classroom?

Many MOOC faculty I've talked to handle this in a couple of ways. One is to engage in peer review - to break the discussion up into an activity like a short piece of writing which others in the course then have to grade or comment on. In this way the discussion boards aren't really used.

Another way is to have really focused discussion prompts. One of the instructors at UM, Caren Stalburg, teaches a course on instructional skills. She is deeply involved in the discussion forums, but has created her course to have very structured activities.

As far as retention, I don't know that I've seen literature on this. I think it likely comes down to learning by doing. If you don't do the programming (in this case), you won't gain the skills. There's only so much you'll be able to absorb through lectures (though that's fine if that's all you're looking for!). So one of the cool things Coursera worked on for us was integrating a coding environment (jupyter notebooks) right into the course experience. You don't have to install anything to start doing data science practice, which I think is going to be awesome.

(and we're starting to see this happen in traditional lecture halls too!)


How proficient does one need to be in Python going into the class to be successful?


All of us chiming in: If you don't know python but you have a programming background I think it's very attainable - we provide some material in the first week which will help bridge the gap. If you don't have a programming background or want a review, then we would recommend checking out Dr. Chuck's MOOC, "programming for everybody".


Python is great for rapid development and "getting things done", but can be a nightmare to debug when things go wrong (compared to, e.g., Java where everything is so strict that the compiler often saves you from yourself). How do you solve that problem with newbie programmers in a MOOC?


Chris here: I'll sidestep the discussion of static vs. dynamically typed languages a bit, and focus on how we are supporting newbie programmers. There are really two innovations with this course that we are leaning on. First, the coursera platform has evolved to allow programming examples in in-video quizzes. So instead of just multiple choice, learners can see scaffolded code and fill in some pieces to get immediate feedback on potential problem solutions. I think this is going to be awesome for supporting learning during the video.

Second, coursera has integrated the excellent jupyter notebook environment right into the course shell. So you don't have to download or set up anything in order to start programming, and we will have notebooks for all of the code examples in the lecture, allowing learners to not only follow along with the lecture but go on tangents to explore their own ideas.

To jump back to python v. java, I'm a big fan of both languages. What I appreciate about python for newbie programmers is the simple syntax and lack of boilerplate. We don't have to jump into a discussion of classes and inheritance in order to start writing code, which allows us to do some basic data cleaning.
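To make the boilerplate point concrete, here is a minimal sketch of basic data cleaning using only the standard library, with no classes or inheritance. The CSV text and field names are hypothetical, invented for this example, and it is only one of many ways to write this:

```python
import csv
import io

# Hypothetical CSV text, made up for illustration.
raw_text = """name,age
 Alice ,34
bob,
Carol,not available
dave,41
"""

rows = []
for record in csv.DictReader(io.StringIO(raw_text)):
    name = record["name"].strip().title()  # tidy whitespace and capitalization
    age_text = record["age"].strip()
    if not age_text.isdigit():
        continue  # skip rows with a missing or non-numeric age
    rows.append({"name": name, "age": int(age_text)})

average_age = sum(row["age"] for row in rows) / len(rows)
print(rows)
print(average_age)
```

The whole script is a loop and a couple of string methods, so a newcomer can read it top to bottom without first learning about class definitions or a `main` method.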

(n.b. I'm not the biggest fan of the syntax for interpreter hints for python, that might be a middle ground but that's probably another discussion)


Do many companies now use CPython?
How do you ensure a MOOC gets proper publicity and supplementary texts?


Chris here: Python is certainly one of the top data science languages, along with R. There are many other tools of course, SPSS, SAS, STATA, etc. Python is particularly nice because of the large toolkit support (nltk, networkx, scikit-learn, pandas, matplotlib) for data science workflows.

For publicity we rely on word of mouth, the coursera portal, and of course activities like a reddit ama. For supplementary material I feel that there are plenty of solid data science resources on the web we can link learners out to - kaggle is a great example, where someone might want to take this specialization then get engaged in kaggle competitions to hone their skills.


I haven't used coursera in quite some time, but as I recall the courses were free. This one appears to have a fee. Can I take the course for free if I don't want the certificate?


Chris here: Yes! The course is available completely free. It's a bit circuitous if you want to sign up for all five courses for free. To do this you have to find each course's individual page then enroll from there. At the moment coursera doesn't allow signing up for a specialization for free.

Here's a list of the five courses that make up the specialization:

Also, if you're unemployed, underemployed, or otherwise can't afford the fee but the certificate is valuable to you, Coursera has a financial aid option (I think you have to do this for each course as well):


Would completing the programming for everybody specialization (which I am currently taking) provide enough knowledge for me to take this specialization, or should I wait until I finish the first year of my CS degree at university?


Chris here: Yes, I think it would provide enough of a background, especially if you are planning to go into a technology field and consider yourself keen. Some of the later courses get more intense and more challenging as they require some basic statistics knowledge, but I think this is generally achievable by any CS student in the late part of either their first or second year of undergraduate study. I think this specialization would help you experience techniques that you might not normally get to experience until you are a senior undergraduate.


What would you say to someone who is still in the process of obtaining a college degree and has little math/statistics/programming knowledge, but has an enormous amount of interest in data science and wants to pursue a career in this field? Also, what can you say about data analysis and the health care field?


Chris here: It depends a lot on what you want to do with data science and in what capacity. I think data analysis skills are becoming important for everyone - they help you think about the world, information, and computational resources in different ways. And I think a basic understanding of data analysis and how it is applied is important for communicating in an increasingly data-driven world.

If you're an undergraduate student I think this course is a good place to start. It requires some programming and stats knowledge, but nothing that can't be learnt from existing online resources like the Python for Everybody specialization by my colleague Chuck Severance. If you want to continue the technical burn, adding in courses like Andrew Ng's Machine Learning course is an excellent next step.

You ask about the health care field. Vinod Vydiswaran is teaching the fourth course in the specialization on text mining, and he is interested in understanding how people communicate about their health in online forums. This is a great example of one way you could go with data science, but there are many more. From diagnosing disease in individuals and studying population health to making predictions from genetic sequences, there are lots of applications of data science in health care. The question you should ask yourself is: are you a data scientist who applies the craft to health/medical domains, or are you a health/medical expert who understands data science and can do your own relevant analyses?

There is room for both of these roles, but knowing which one you are aiming for might help in choosing options while in undergraduate study. And shamelessly I'll share that the School of Information at the University of Michigan has a Master's of Health Informatics (and Vinod teaches in that specific track!).


Currently Coursera shows that the first course runs from 9/26 to 10/9, so that is 2 weeks? There are a lot of topics in the syllabus - will they all be covered in just 2 weeks? Also, how long are the other 4 courses of the specialization?



Hi: No, each of the courses runs for four weeks. I'm not sure why it is showing up as only two weeks on Coursera!