# Predictive Data Analytics with Python

In this course, trainees learn how to read, clean, visualize, and analyze data using Python and its free libraries Pandas, Seaborn, Scipy, Numpy, Matplotlib, and Statsmodels. They also learn how to interpret the results and use them to make predictions.

Note: This course is only available as part of NCLab’s Data Analyst Career Training program.

## Course Overview

The course focuses on using Python libraries to solve practical problems rather than on the underlying mathematics. The tutorials teach enough statistical and analytical concepts and procedures for trainees to use these libraries effectively. This foundation is valuable whether trainees continue to use free Python libraries for analysis and visualization in their own work or move on to a commercial analytics or visualization product specific to their industry. Using Pandas, Numpy, Scipy, Matplotlib, Seaborn, and Statsmodels, trainees read data from files, clean data, present data in visual form, perform qualitative and quantitative analysis, interpret data, and make predictions.

## Prerequisites

This course has Python Fundamentals as a prerequisite.

## Student Learning Outcomes (SLO)

Students will be able to:

• Import data from CSV files and clean data.
• Enter data into Pandas data frames and manipulate them.
• Calculate variance and standard deviation.
• Visualize data using scatterplots.
• Use Seaborn to display regression and residual plots.
• Discuss the residual plot as part of regression analysis.
• Make predictions based on the results of simple linear regression analysis.
• Display boxplots with Seaborn and interpret them.
• Plot histograms with Pandas.
• Calculate the Pearson coefficient of correlation R (R-value).
• Visualize correlation graphically via heatmaps.
• Describe the purpose of the R-squared value and its advantages over the R-value.
• Distinguish between continuous, discrete and categorical random variables.
• Calculate the P-value and the standard error of the estimate using Scipy and interpret the results.
• Interpret data using statistical hypothesis testing and the null hypothesis.
• Use training and testing datasets to make predictions.
• Perform multiple linear regression using Statsmodels.
• Make predictions based on multiple linear regression.
• Evaluate variable independence in multiple linear regression based on multicollinearity.
• Perform logistic regression with Seaborn and Statsmodels.
• Recognize when the results of logistic regression are wrong.
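
To give a feel for several of these outcomes, here is a minimal sketch, not taken from the course materials, that reads a small CSV, cleans it, computes variance and standard deviation, and runs a simple linear regression with Scipy. The dataset (hours studied vs. exam score) and all variable names are made up for illustration.

```python
import io

import pandas as pd
from scipy import stats

# Hypothetical data, invented for this sketch: hours studied vs. exam score.
# One row has a missing "hours" value so there is something to clean.
CSV_TEXT = """hours,score
1,52
2,55
3,61
4,64
5,70
,73
7,79
8,81
"""

# Read the CSV into a data frame and drop rows with missing values
# (a deliberately simple cleaning step).
df = pd.read_csv(io.StringIO(CSV_TEXT)).dropna()

# Sample variance and standard deviation (Pandas uses ddof=1 by default).
variance = df["score"].var()
std_dev = df["score"].std()

# Simple linear regression. The result carries the slope, intercept,
# R-value, P-value, and the standard error of the estimated slope.
result = stats.linregress(df["hours"], df["score"])
r_squared = result.rvalue ** 2

# Use the fitted line to predict the score for a new input.
new_hours = 6
predicted = result.intercept + result.slope * new_hours

print(f"variance={variance:.2f}, std={std_dev:.2f}")
print(f"R={result.rvalue:.3f}, R^2={r_squared:.3f}, p={result.pvalue:.2g}")
print(f"slope stderr={result.stderr:.3f}")
print(f"predicted score for {new_hours} hours: {predicted:.1f}")
```

In practice the data would come from a file path passed to `pd.read_csv`, and cleaning usually involves more than dropping incomplete rows; the in-memory string only keeps the sketch self-contained.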

## Equipment Requirements

Computer, laptop or tablet with Internet access, email, and one of the following browsers: