AI News, Lessons learned from teaching an 11-week data science course

General Assembly's Data Science course is a substantial introductory course that covers the entire 'data science pipeline': getting and cleaning data, data exploration and analysis, machine learning, visualization, and communicating results.

The course includes 66 hours of classroom instruction (twice per week for three hours), and each student also completes a course project of their choosing.

Since I have already begun teaching a fourth session of the course, I've been giving a lot of thought to what worked and what didn't during the past three sessions so that I can make the current course even better for my students.

(My course materials can be found on GitHub, both for the third session and the fourth session.) Below are my findings from the past three sessions, based on explicit and implicit feedback from the students, student performance in each area, and conversations with my excellent co-instructor from the last session, Josiah Davis.

Some latitude is given to General Assembly instructors as to how to deliver the curriculum, including the choice of language(s) in which to teach the core content.

Because the vast majority of students who take the Data Science course (at least in DC) are relatively inexperienced at programming, reaching a baseline proficiency in two languages in 11 weeks is out of reach for most of our students.

Undoubtedly, gaining a deep understanding of every machine learning technique that we teach would require students to have a thorough understanding of probability, statistics, linear algebra, calculus, and more.

Instead, we have found that when we teach the material primarily at a conceptual level, students can obtain a practical understanding of how an algorithm works, its strengths and weaknesses, and how to properly apply it.

We point students toward additional resources if they want to go deeper into the math, but most General Assembly students (and adult learners in general) are focused on learning things that they can immediately apply to their own work.

Although the simplest approach to acquiring data (and the most expeditious for teaching purposes) is to download it from the web, we spent an entire class demonstrating how (and why) to gather data using APIs and web scraping.
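To make the two approaches concrete, here is a minimal sketch (written for Python 3, standard library only). The JSON payload and HTML fragment are made-up stand-ins for what an API or a web page might return, so the example runs offline; the names `parse_api_records`, `TableScraper`, and `scrape_table` are hypothetical and not taken from the course materials.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: many web APIs return JSON text like this.
API_RESPONSE = (
    '{"results": [{"title": "Alpha", "year": 2014},'
    ' {"title": "Beta", "year": 2015}]}'
)

# Hypothetical web page fragment: the same data embedded in an HTML table.
HTML_PAGE = """
<table>
  <tr><td>Alpha</td><td>2014</td></tr>
  <tr><td>Beta</td><td>2015</td></tr>
</table>
"""


def parse_api_records(text):
    """Turn a JSON API response into a list of (title, year) tuples."""
    payload = json.loads(text)
    return [(item["title"], item["year"]) for item in payload["results"]]


class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self.rows:
            self.rows[-1].append(data.strip())


def scrape_table(html):
    """Scrape table rows from HTML into a list of (title, year) tuples."""
    scraper = TableScraper()
    scraper.feed(html)
    return [(title, int(year)) for title, year in scraper.rows]


api_records = parse_api_records(API_RESPONSE)
scraped_records = scrape_table(HTML_PAGE)
print(api_records)
print(scraped_records)
```

The API version hands back structured data directly, while the scraping version has to recover the structure (and the integer type) from markup, which is a large part of why scraping tends to be more fragile.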

We gave Git an entire class period (3 hours), which I have found is the minimum instructional time required to take a complete Git novice and provide them with a 'functional understanding' of Git, including the ability to clone and fork repositories and contribute on GitHub using pull requests.

I've specifically found that teaching Git in the context of the GitHub workflow is the fastest path to student comprehension, though branches are way too complicated for most novices to quickly grasp.

Because we want students to get a lot of practice using Git in a collaborative environment, we set up a GitHub repository for student work, and required that they use pull requests to submit their homework and course project.

For example, before presenting an overview of machine learning, we explored the famous iris data set and then wrote a simple algorithm in Python for classifying each flower in the data set.
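As a rough illustration of that kind of exercise (not the code used in class), the sketch below classifies a few iris flowers using hand-picked thresholds on petal length. The six rows are copied from the iris data set; the cutoffs of 2.5 cm and 4.9 cm and the function name `classify_iris` are assumptions chosen for this example.

```python
# Each row: (sepal length, sepal width, petal length, petal width, species).
# All measurements are in centimeters.
IRIS_SAMPLE = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
]


def classify_iris(petal_length):
    """Predict the species from petal length using two assumed cutoffs."""
    if petal_length < 2.5:
        return "setosa"
    if petal_length < 4.9:
        return "versicolor"
    return "virginica"


predictions = [classify_iris(row[2]) for row in IRIS_SAMPLE]
correct = sum(pred == row[4] for pred, row in zip(predictions, IRIS_SAMPLE))
accuracy = correct / len(IRIS_SAMPLE)
print(predictions)
print(accuracy)
```

On the full 150-row data set, a rule like this misclassifies some versicolor and virginica flowers because their petal lengths overlap, which is a natural way to motivate letting an algorithm learn the boundaries instead.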

(Here are the project requirements and some past student projects.) Students are responsible for coming up with their own project question, and we encourage them to choose a project connected to their personal or professional interests.

Because a lot of student learning occurs when they apply classroom knowledge to real-world data, we placed a heavy emphasis on the project, including milestone deadlines throughout the course.

Although most students did not think they were competent enough to provide feedback, we found that peer feedback was a valuable supplement to the feedback that we (as instructors) provided, as well as useful practice for students in reading and analyzing the work of others.

The primary motivation for excluding grades is that the educational backgrounds of our students vary widely, and thus it would be unfair to compare the work of a student with significant programming experience with the work of a student lacking that experience.

Scikit-learn is the most popular machine learning library for Python, and for good reason: it supports a wide variety of algorithms, has a consistent interface for accessing those algorithms, is thoughtfully designed, and has excellent documentation.
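The consistent interface is easiest to see in code. The sketch below (an illustration, not taken from the course materials) trains two unrelated models on the iris data set with the same `fit` and `predict` calls. The choice of models, the `random_state` value, and `max_iter=1000` are assumptions made for this example, and it requires scikit-learn to be installed.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the features (X) and labels (y), then hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two very different algorithms, accessed through the same methods.
models = {
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)          # same training call for every model
    predictions = model.predict(X_test)  # same prediction call for every model
    scores[name] = float((predictions == y_test).mean())  # test accuracy
    print(name, round(scores[name], 3))
```

Because every estimator follows this pattern, swapping in a different algorithm usually means changing one line, which lets students spend their time comparing models rather than learning new syntax.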

In my view, there are four ways for students to learn coding: watching in-class code walkthroughs, doing in-class exercises, doing homework assignments, and individual project work.

We emphasized in-class walkthroughs over exercises because walkthroughs allow us to present high-quality, well-commented code that students can later reference, whereas meaningful in-class exercises can take up a lot of class time (that could otherwise be used for instruction).

I did spend part of a class demonstrating how I work a real-world data problem from scratch (using the Kaggle Avazu competition as an example), which students said was incredibly helpful, but they also wished it had been shown earlier in the course instead of at the end.

In my current course, we are dedicating one class (halfway through the course) to students working a data problem on data they have never seen before, with the hope that it will help them to synthesize a lot of what we have taught up to that point.

I'm actively looking for good opportunities to try a flipped classroom approach, in which students read about a topic before class, and then we use the in-class time to discuss that topic in more depth and answer their questions.

The challenge, of course, is finding the right material to give students before class: it has to be well-explained, at the right difficulty level, and cover the particular points we think are important.

Tools for online learning such as Vowpal Wabbit are becoming more widely known, and online learning is an especially useful paradigm for thinking about machine learning, so I've considered teaching that particular topic near the end of the course.
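For readers unfamiliar with the paradigm, the core idea is that the model is updated one example at a time rather than being fit on the whole data set at once. The following sketch shows that idea with logistic regression trained by stochastic gradient descent in plain Python. It is not how Vowpal Wabbit is implemented, and the toy data stream, the learning rate, and the class name `OnlineLogisticRegression` are all assumptions made for illustration.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Return the predicted probability that the label is 1."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """Take a single gradient step on the log loss for one example."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


# Toy stream (made up): the label is 1 when the single feature is positive.
stream = [([1.0], 1), ([-1.0], 0), ([2.0], 1), ([-2.0], 0)] * 50

model = OnlineLogisticRegression(n_features=1)
for features, label in stream:
    # The model sees each example once, in order, and never stores it.
    model.update(features, label)

print(model.predict_proba([1.5]))
print(model.predict_proba([-1.5]))
```

Since no example is kept after its update, memory use stays constant no matter how long the stream is, which is what makes the approach attractive for data that does not fit in memory.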

(I'd love to hear your thoughts!) Like most Python users, I continue to use (and teach) Python 2 because it is still being supported by the community, a lot of teaching resources are written for Python 2, and it's unclear whether the benefits of Python 3 outweigh the downside (no backwards compatibility).

However, when watching students actually write their code within a Notebook, I've found that the interface seems to encourage sloppy coding practices that make it hard to actually debug code.

However, there are certainly times when the command line is the easiest (or the only) way to accomplish a task, and there is a general expectation that data scientists will be somewhat familiar with it, so I have debated including it more fully in the course.

We didn't spend any time on big data in the last session of the course, primarily because it's such a broad topic, but also because it's not clear what aspect of big data (or what big data tool) is suitable to teach as an introduction to the topic.

We could certainly teach the basics of the MapReduce algorithm, but I tend to think that as an application-focused course, time is usually better spent on content that students can immediately apply without having to learn 'yet another tool.'
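For context, the basic idea fits in a few lines. The sketch below is a single-machine imitation of the map, shuffle, and reduce phases applied to word counting. Real frameworks distribute these phases across many machines, and the function names and sample documents here are invented for this illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data science is fun"]  # made-up input

# Map: each document is processed independently of the others.
mapped = [pair for doc in documents for pair in map_phase(doc)]
# Shuffle: bring together all values that share a key.
grouped = shuffle_phase(mapped)
# Reduce: each key is processed independently of the others.
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(word_counts)
```

The map and reduce steps each work on one document or one key in isolation, and that independence is what allows the work to be spread across a cluster.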
