Section 1
- Importing the Pandas, Seaborn and Matplotlib libraries.
 - Understanding that Pandas DataFrames behave much like Python dictionaries that map column names to columns of data.
 - Entering data into DataFrames using dictionaries.
 - Visualizing pairwise data using scatterplots.
 - Entering data into DataFrames by zipping lists of column values.
 - Entering data into DataFrames by using a list of pairs (or tuples) representing rows.
 - Understanding that adding a new column to a DataFrame uses the same syntax as adding a new item to a dictionary.
 - Knowing the equation of a line and the meaning of its y-intercept and slope.
 - Using Seaborn to display the regression plot.
 - Using the regression line to make predictions.
 - Understanding confidence intervals.
 - Using Seaborn to display the residual plot.
 - Displaying the regression and residual plots either in the same figure or in separate figures.
 - Adjusting the horizontal limits of the regression and residual plots.
 - Discussing the residual plot as part of every regression analysis.
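 - Knowing that the residuals of a least-squares regression line (with an intercept) always add up to zero, so an acceptable fit is judged by how the residuals are scattered around zero rather than by their sum.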
 - Understanding that the residuals must look like random numbers, without showing any non-random patterns such as lines or curves.
 - Understanding time series, and generating sequences of integers to work with time series data.
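
Several Section 1 items can be sketched in a few lines of Pandas and NumPy. This is a minimal illustration, not the course's own code: the hours/score numbers are made up, and `np.polyfit` stands in for the least-squares line that Seaborn's `regplot` draws. It builds the same DataFrame three ways, adds columns dictionary-style, fits y = slope * x + intercept, makes a prediction, and shows that the residuals sum to (numerically) zero.

```python
import numpy as np
import pandas as pd

# Hypothetical data: hours studied vs. test score (illustrative values only)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

# 1) From a dictionary: keys become column names, lists become columns
df_dict = pd.DataFrame({"hours": hours, "score": scores})

# 2) By zipping lists of column values into row tuples
df_zip = pd.DataFrame(list(zip(hours, scores)), columns=["hours", "score"])

# 3) From a list of pairs, one pair per row
rows = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 68)]
df_rows = pd.DataFrame(rows, columns=["hours", "score"])

# Adding a column works like adding an item to a dictionary
df_dict["passed"] = df_dict["score"] >= 60

# A sequence of integers, e.g. a day counter for time series data
df_dict["day"] = range(1, len(df_dict) + 1)

# Least-squares line y = slope * x + intercept (deg=1 returns [slope, intercept])
slope, intercept = np.polyfit(df_dict["hours"], df_dict["score"], deg=1)

# Use the line to predict the score for 6 hours of study
predicted_6h = slope * 6 + intercept

# Residuals = observed - fitted; for a least-squares line with an intercept they sum to ~0
residuals = df_dict["score"] - (slope * df_dict["hours"] + intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted_6h:.2f}")
print(f"sum of residuals: {residuals.sum():.1e}")
```

With Seaborn imported as `sns`, the matching plots can be drawn with `sns.regplot(data=df_dict, x="hours", y="score")` and `sns.residplot(data=df_dict, x="hours", y="score")`.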
 
Section 2
- Knowing what types of data files are supported by Pandas.
 - Reading files from the hard disk or from a URL.
 - Distinguishing between continuous, discrete and categorical random variables.
 - Identifying CSV files, understanding their structure, and reading all or selected columns.
 - Checking for delimiters, quotes and empty spaces when reading CSV files.
 - Specifying which values in the CSV file should be treated as missing values.
 - Obtaining the number of rows and number of columns of a DataFrame.
 - Obtaining the list of all column names of a DataFrame.
 - Displaying the beginning and the end of a DataFrame.
 - Using the Titanic data set to practice analytics.
 - Adding new data columns to a DataFrame.
 - Iterating over columns in a DataFrame.
 - Accessing columns using integer indices.
 - Appending new rows.
 - Accessing rows using integer indices.
 - Understanding the difference between global and local row indices.
 - Deleting selected rows and/or columns from a DataFrame.
 - Identifying missing values in DataFrames using Boolean masks.
 - Summing row and column values in DataFrames.
 - Counting missing values in the rows and columns of a DataFrame.
 - Dropping rows and/or columns with missing values.
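
A minimal sketch of several Section 2 operations, assuming a tiny made-up CSV passed through `io.StringIO` rather than the real Titanic file (no file path or URL is assumed). It uses standard Pandas calls only: `read_csv` with `na_values`, `shape`, `columns`, `head`/`tail`, `isnull`, `items`, `pd.concat`, `iloc`, `dropna` and `drop`.

```python
import io
import pandas as pd

# A tiny made-up CSV (not the real Titanic file); "?" and empty fields mark missing data
csv_text = """Name,Age,Fare,Embarked
Alice,29,71.28,C
Bob,?,8.05,S
Carol,35,?,S
Dan,54,51.86,
"""

# Empty fields are read as NaN by default; na_values adds "?" as another missing marker
df = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

n_rows, n_cols = df.shape
column_names = list(df.columns)
print(df.head(2))   # first rows
print(df.tail(2))   # last rows

# Boolean mask of missing values, summed down columns and across rows
mask = df.isnull()
missing_per_column = mask.sum()
missing_per_row = mask.sum(axis=1)

# Iterate over columns as (name, Series) pairs
non_missing = {name: int(col.notnull().sum()) for name, col in df.items()}

# Append a row (DataFrame.append was removed in pandas 2.0; pd.concat works instead)
new_row = pd.DataFrame([{"Name": "Eve", "Age": 22, "Fare": 7.25, "Embarked": "Q"}])
df = pd.concat([df, new_row], ignore_index=True)

# Add a new column computed from an existing one
df["FareRounded"] = df["Fare"].round()

# Access a row and a column by integer position
first_row = df.iloc[0]
second_column = df.iloc[:, 1]

# Drop rows with any missing value, and delete a column by name
complete_rows = df.dropna()
without_fare = df.drop(columns=["Fare"])
```

Because `ignore_index=True` renumbers the rows after `pd.concat`, integer positions (`iloc`) and row labels coincide here; after `dropna()` they no longer do, since the surviving rows keep their original labels.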
 
Section 3
- Calculating the minimum, maximum, sum and mean of column values.
 - Calculating variance and standard deviation.
 - Understanding why standard deviation is used more often in practice.
 - Plotting histograms with Pandas.
 - Using the 68–95–99.7 rule in predictive analysis.
 - Filtering rows (selecting rows with a given property) in DataFrames.
 - Calculating the median and mode with Pandas.
 - Understanding how the median differs from the mean.
 - Knowing the five values which define a boxplot.
 - Displaying boxplots with Seaborn and interpreting them.
 - Confirming information read from a boxplot by calling the method describe().
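
The descriptive statistics of Section 3 map directly onto Pandas Series methods. In this sketch the ages are invented, with 62 included as an outlier so that the median and mean visibly differ. It also checks the 68–95–99.7 rule's "about 95% within two standard deviations" on this small sample, and shows that `describe()` reports the five values a boxplot is drawn from.

```python
import pandas as pd

# Hypothetical ages (illustrative values only); 62 is a deliberate outlier
ages = pd.Series([22, 25, 25, 28, 30, 31, 35, 40, 62], name="Age")

stats = {
    "min": ages.min(),
    "max": ages.max(),
    "sum": ages.sum(),
    "mean": ages.mean(),
    "var": ages.var(),        # sample variance (ddof=1 by default), in squared units
    "std": ages.std(),        # square root of the variance, in the data's own units
    "median": ages.median(),  # middle value; less sensitive to the outlier than the mean
    "mode": ages.mode().tolist(),  # mode() returns a Series because ties are possible
}

# Filtering: keep only the values with a given property
over_30 = ages[ages > 30]

# 68–95–99.7 rule: for roughly normal data, about 95% of values lie within mean ± 2 std
low = stats["mean"] - 2 * stats["std"]
high = stats["mean"] + 2 * stats["std"]
share_within_2std = ages.between(low, high).mean()

# describe() lists min, 25%, 50% (median), 75% and max: the five values behind a boxplot
summary = ages.describe()
print(summary)
```

If Matplotlib is installed, `ages.plot.hist()` draws the histogram of the same Series.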
 
Section 4
- Understanding how a pair of random variables can be correlated.
 - Calculating the Pearson coefficient of correlation R (R-value).
 - Using Pandas to quickly see which quantities in a DataFrame are correlated.
 - Visualizing correlation graphically via heatmaps.
 - Annotating heatmaps, setting limits for the values, and changing color maps.
 - Understanding the purpose of using the R-squared value, and its advantages over the R-value.
 - Calculating R-squared values by squaring every value of the correlation matrix element-wise.
 - Searching and replacing in DataFrames using the method replace().
 - Modifying values in entire DataFrames and in individual columns by applying functions.
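
The Section 4 workflow can be sketched on a small invented car dataset: `corr()` gives the matrix of Pearson R-values, squaring it element-wise gives the pairwise R-squared values, `replace()` does search and replace, and `apply()` runs a function on one column or on every column. The column names and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical car data (illustrative values only)
df = pd.DataFrame({
    "engine_l": [1.0, 1.6, 2.0, 2.5, 3.0],
    "hp": [70, 110, 140, 170, 210],
    "mpg": [45, 38, 33, 30, 25],
    "fuel": ["gas", "diesel", "gas", "gas", "diesel"],
})

numeric_cols = ["engine_l", "hp", "mpg"]

# Pearson correlation matrix (R-values) for every pair of numeric columns
r = df[numeric_cols].corr()

# Element-wise square gives the R-squared value for each pair
r_squared = r ** 2

# Search and replace: rename a category label throughout one column
df["fuel"] = df["fuel"].replace("gas", "petrol")

# Apply a function to one column: horsepower to kilowatts (1 hp ≈ 0.7457 kW)
df["kw"] = df["hp"].apply(lambda hp: round(hp * 0.7457, 1))

# Apply a function to every column of a DataFrame: scale each column to a max of 1
scaled = df[numeric_cols].apply(lambda col: col / col.max())

print(r.round(3))
```

Squaring discards the sign, so R-squared says how strongly two columns are linearly related but not in which direction; here `engine_l` and `mpg` have a negative R yet a large R-squared. With Seaborn imported as `sns`, `sns.heatmap(r, annot=True, vmin=-1, vmax=1, cmap="coolwarm")` displays the matrix as an annotated heatmap with fixed limits.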
 
Section 5
- Why it is important to quantify the results of linear regression analysis.
 - What statistical hypothesis testing and the null hypothesis are.
 - What the P-value and the standard error of the estimate are.
 - How to use the function linregress of the SciPy stats module (scipy.stats).
 - How to use Matplotlib to plot SciPy stats results.
 - How to use the calculated y-intercept and slope for predictions.
 - How to perform simple linear regression with Statsmodels.
 - About the difference between the training and testing datasets.
 - About two “gotchas” that one needs to pay attention to when using Statsmodels.
 - How to use simple linear regression results to make predictions.
 - What multiple linear regression is, and how to do it with Statsmodels.
 - How to use multiple linear regression results to make predictions.
 - How to plot the results of multiple linear regression (and understand the plots).
 - What multicollinearity of independent variables is in multiple linear regression.
 - What are the main limitations of linear regression models.
 - That linear regression does not work for categorical dependent variables.
 - What logistic regression is, and what types of models it is used for.
 - How to recognize when the results of logistic regression are wrong.
 - How to perform logistic regression with Seaborn and Statsmodels.
 - Finally, trainees will use the testing Titanic dataset to make a (simplified) prediction of the survival of Titanic passengers.
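
A sketch of the regression ideas in Section 5, under stated assumptions: the data are invented, `scipy.stats.linregress` handles simple regression, and NumPy's `lstsq` is swapped in for Statsmodels for the multiple-regression part (the explicit column of ones plays the role of the intercept column that `sm.add_constant` adds in Statsmodels). The last part shows why a straight line is a poor model for a 0/1 outcome.

```python
import numpy as np
from scipy import stats

# Hypothetical data (illustrative values only)
x = np.arange(1, 11, dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2])

# Train on the first 8 points, keep the last 2 as a testing set
x_train, y_train = x[:8], y[:8]
x_test, y_test = x[8:], y[8:]

fit = stats.linregress(x_train, y_train)
# fit.slope, fit.intercept -> fitted line y = slope * x + intercept
# fit.rvalue -> Pearson R; fit.pvalue -> p-value for the null hypothesis "slope = 0"
# fit.stderr -> standard error of the estimated slope

# Predict the held-out points and compare with the observed values
y_pred = fit.slope * x_test + fit.intercept
test_errors = y_test - y_pred

# Multiple linear regression y = b0 + b1*x1 + b2*x2 by least squares (NumPy, not Statsmodels)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y_mr = 1 + 2 * x1 + 3 * x2                       # exact relationship, for illustration
X = np.column_stack([np.ones_like(x1), x1, x2])  # column of ones = intercept term
coef, *_ = np.linalg.lstsq(X, y_mr, rcond=None)

# A straight line fitted to a 0/1 outcome can leave the [0, 1] range
hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
lin = stats.linregress(hours, passed)
value_at_10h = lin.slope * 10 + lin.intercept  # > 1, so it cannot be a probability

print(f"slope={fit.slope:.3f}, p-value={fit.pvalue:.2e}, predictions={y_pred.round(2)}")
print(f"multiple regression coefficients: {coef.round(3)}")
print(f"linear 'probability' at 10 hours: {value_at_10h:.2f}")
```

Logistic regression avoids the last problem by modelling the probability with the S-shaped logistic function, whose output always lies strictly between 0 and 1.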
 
Capstone Project
Trainees will complete a Capstone Project under the supervision of an NCLab senior Data Analytics instructor in order to graduate and obtain a career certificate.