Intro to Python for Data Science
This is a college-level introductory Python course geared towards Data Analytics and Data Science applications. Trainees learn Python by solving programming problems of gradually increasing complexity ranging from simple calculations, working with text strings, loops, conditions, variables, and functions to file operations and data visualization.
- Trainees learn at their own pace by reading tutorials, watching videos, going through examples, and solving programming challenges.
- Every short lesson is followed by self-assessment, so that trainees instantly know whether they have mastered the concept.
- Trainees obtain real-time help from the NCLab AI tutorial engine, as well as remote assistance from live course instructors as needed.
- Trainees learn how to use powerful Python libraries including Matplotlib, Numpy and Scipy.
- An interactive Python coding app allows trainees to create portfolio artifacts and easily share them online.
This is not an introductory computer programming course. Trainees should be familiar with basic concepts of computer programming including syntax, counting and conditional loops, conditions, local and global variables, functions, and recursion. For complete beginners who have little or no computer programming experience, NCLab provides an excellent introductory course, Introduction to Computer Programming, as part of the training program.
One of the following browsers:
- Google Chrome
- Mozilla Firefox
- Microsoft Internet Explorer (9.0 or above)
Course Structure and Length
This course is self-paced, and trainees practice each skill and concept as they go. Automatic feedback is built into the course for both practice exercises and quizzes. The course is divided into four Units, and each Unit is composed of five Sections. Each Section consists of 7 instructional/practice levels, a quiz, and a master (proficiency) level. Quizzes can be retaken after 12 hours.
Introduction to Python for Data Science is designed to take approximately 80 hours. Since the course is self-paced, the amount of time required to complete the course will vary from trainee to trainee.
Unit 1 (Introduction)
In this Unit trainees review basic concepts of computer programming including loops, conditions, variables and functions. They will learn:
- When Python was introduced, who its author is, and where its name comes from.
- That Python can be used by non-programmers as a powerful scientific calculator.
- How to add, subtract, multiply and divide numbers using the operators +, -, *, /.
- About the priorities of arithmetic operators, and that parentheses have the highest priority.
- That in Python 3, dividing two integers with the / operator yields a real (floating-point) result rather than a truncated integer.
- That Python has many powerful free libraries and how to import them correctly.
- That the keyword from should be avoided when importing libraries.
- How to import the fractions library and work with fractions.
- How to use the built-in function help().
- How to define numerical (integer and real) variables and text strings.
- How to import Numpy and use the functions it provides.
- How to display combinations of text strings and variables using the built-in function print().
- How to use the floor division operator //, the modulo operator %, and the power operator **.
- That one needs to be careful when using // with negative and real numbers.
- That real numbers are not represented exactly in the computer, which can lead to problems with the floor division and modulo operators.
- That = is the assignment operator and == is the comparison operator.
- That True and False are Boolean values meaning “true” and “not true”, respectively.
- That when == is used to compare numbers, the result is True or False.
- That one should never compare real numbers using the == operator.
- That the built-in function abs() can be used to calculate the absolute value of numbers.
- That the symbol < is the inequality symbol meaning “less than”.
- That when < is used to compare numbers, the result is True or False.
- That the correct way to compare real numbers is to subtract them, and then check if the absolute value of the result is less than a given small tolerance.
- How to reach the limit of the finite computer arithmetic on any computer.
- How to use the arithmetic operators +=, -=, *=, /=, //=, %= and **=.
- How to work with the most important units of data size including B, KB, MB and GB.
- That one should not confuse 1 KB (1024 B) with 1 kB (1000 B).
- How to correctly define functions in Python.
- That a function must be called, not only defined, to do something.
- That it is very important to write docstrings and comment the code well.
- The difference between function parameters and arguments.
- That functions do not have to accept any arguments and they do not have to return any values.
- That functions can accept multiple arguments and return multiple values.
- The difference between the global scope (main program) and local scopes (inside of functions).
- That global variables are defined in the entire program, including in all functions.
- That local variables are only defined inside of functions, and undefined in the main program.
- That functions should never change the values of global variables.
- How to work with tuples, including how to obtain their length.
- How to unpack tuples, and access individual items via indices.
- How to parse tuples one item at a time using the for loop.
- That the for loop is often used in combination with the range() function.
- How to use nested for loops.
- Create empty and non-empty lists.
- Obtain the length of lists using the function len().
- Add items to lists using the methods append() and insert().
- Remove items from lists using the method pop() and the keyword del.
- Add lists and multiply them with integers.
- Lists are mutable objects in Python.
- Parse lists using the for loop.
- Empty lists using the while loop.
- Access individual list items via their indices.
- Slice lists, and create copies and reversed copies of lists via slicing.
- Reverse lists and sort them using list methods reverse() and sort().
- Reverse lists and sort them using built-in functions reversed() and sorted().
- Make list and tuple items unique.
- Work with Boolean expressions and variables.
- Work with the if, if-else and if-elif-else statements.
- Use the keyword in to check if a given item is present in a tuple or list.
- Use the method count() to count occurrences of given items in tuples and lists.
- Use the method index() to obtain positions of given items in tuples and lists.
- Work with the Boolean operators and, or, not.
- Chain arithmetic comparison operators.
- Use the break and continue statements in loops.
- Work with infinite while loops.
- Use the command pass.
- How to generate random numbers.
- Use the else branch in for and while loops.
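Several Unit 1 points about real numbers can be illustrated with a short sketch. The helper name nearly_equal and the default tolerance 1e-9 are assumptions chosen for this illustration, not part of the course material.

```python
def nearly_equal(a, b, tol=1e-9):
    """Return True if a and b differ by less than the tolerance tol."""
    return abs(a - b) < tol


total = 0.1 + 0.2

# Real numbers are stored inexactly, so == can give a surprising answer.
print(total == 0.3)               # False
print(nearly_equal(total, 0.3))   # True

# Floor division rounds towards minus infinity, not towards zero.
print(7 // 2, -7 // 2)            # 3 -4

# 0.1 is stored as a value slightly larger than 0.1,
# so it "fits" into 1.0 only nine whole times.
print(1.0 // 0.1)                 # 9.0
```

The tolerance should be chosen to match the magnitude of the numbers being compared; a fixed value such as 1e-9 is only a reasonable default for values of moderate size.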
Unit 2 (Working with Text Strings)
In this Unit trainees will learn how to work with text strings and regular expressions.
- Text strings are groups of characters enclosed in single or double quotes.
- It does not matter whether one uses single or double quotes.
- A text string enclosed in double quotes may contain single quotes, and vice versa.
- Trailing spaces are deceiving since they can make different text strings appear to be the same.
- Trailing spaces can be identified using the built-in function repr().
- Text strings can be compared with the == operator whose result is either True or False.
- The built-in function print() has useful optional parameters sep and end.
- The built-in function help() can be used to obtain help for any Python keyword or function.
- If a function contains a docstring, it will be known to the function help().
- Strings can be added and multiplied with positive integers.
- They can also be updated with the operators += and *=.
- Python is an interpreted, dynamically typed, case-sensitive language.
- PEP 8, the Style Guide for Python Code.
- Combine single and double quotes in one text string.
- Obtain the length of text strings with the function len().
- Work with the special characters \n, \" and \'.
- Special characters are text strings of length 1, same as regular text characters.
- Cast numbers to text strings with the function str(), and insert numbers into text strings.
- Cast text strings to numbers with the functions int() and float().
- Display the type of variables using the function type().
- Check the type of variables at runtime using the function isinstance().
- Use the text string methods lower(), upper() and title().
- Text string methods do not change the original text string.
- Clean text strings – automatically eliminate trailing spaces and unwanted special characters using the methods rstrip(), lstrip() and strip().
- Split a text string into a list of words with the method split().
- Check for substrings using the keyword in.
- Make a text search case-insensitive by first lowercasing both text strings.
- Count the occurrences of substrings in text strings with the method count().
- Search for and replace substrings in text strings with the method replace().
- Zip two lists, and use the for-loop to parse them at the same time.
- Erase parts of text strings by replacing them with an empty string.
- Clean text strings from unwanted characters.
- Shrink large empty spaces in text strings.
- Swap the contents of two text strings.
- Swap two substrings in a text string.
- Implement the ROT13 cipher and a Morse code translator.
- Review of the while-loop, the break statement, and functions returning multiple values.
- The concept of mutability in Python.
- Text strings are immutable objects in Python.
- Access individual characters in text strings via their indices.
- Slice text strings and reverse them using slicing.
- Use system date and time in Python programs.
- Obtain the position of a substring in a given text string.
- How text strings are represented in computer memory.
- Access ASCII/Unicode codes of text characters.
- Access ASCII/Unicode characters via their codes.
- Text strings can be compared using the operators <, <=, >, >= just like numbers.
- Create text strings with characters which are not present on the keyboard.
- What regular expressions are and where they are used.
- What functionality is available in Python’s regular expressions module “re”.
- How to use the functions search(), match() and findall().
- About greedy and non-greedy repeating patterns.
- How to use character classes and groups of characters.
- How to work with the most important metacharacters and special sequences.
- How to mine unknown file names and email addresses from text data.
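The regular expression topics above can be sketched with the standard re module. The sample text and both patterns are made up for this illustration; the email pattern is deliberately simple and is not a complete validator.

```python
import re

# Made-up sample text for the illustration.
text = ("Contact alice@example.com or bob.smith@mail.example.org; "
        "see report_2020.csv and notes.txt.")

# Simplified email pattern: name, @, then a domain with at least one dot.
EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
emails = re.findall(EMAIL, text)
print(emails)   # ['alice@example.com', 'bob.smith@mail.example.org']

# File names with a known extension (here only .csv and .txt).
FILENAME = r"\b[\w-]+\.(?:csv|txt)\b"
files = re.findall(FILENAME, text)
print(files)    # ['report_2020.csv', 'notes.txt']

# Greedy vs. non-greedy repetition.
html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.+>", html)    # one match spanning the whole string
lazy = re.findall(r"<.+?>", html)     # each tag separately
print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
```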
Unit 3 (Python Libraries and API Design)
In this Unit trainees learn how to use Matplotlib and Numpy to perform calculations, draw images, display graphs of functions, plot curves, and more.
- Import a library and abbreviate its name.
- Define lines and polylines using X and Y arrays.
- Create a plot using the function plot().
- Assign colors to shapes.
- Display the plot using the function show().
- Display two or more shapes simultaneously.
- Make both axes equally-scaled using axis("equal").
- Hide axes using axis("off").
- Fill closed areas with color using the function fill().
- Remove borders by setting linewidth=0.
- Make holes in shapes.
- Use the function linspace() to create equidistant grids.
- Plot graphs of functions using the linspace() array as the x-variable.
- Draw circles centered at (0, 0) using the formula x = r*cos(t), y = r*sin(t).
- Draw regular polygons by reducing the number of edges of the circle.
- Draw ellipses using the formula x = r_x*cos(t), y = r_y*sin(t).
- Draw spirals using the formula x = t*cos(t), y = t*sin(t).
- Cast linspace() arrays into lists so that other lists can be added to them.
- Draw circles centered at (cx, cy) using x = cx + r*cos(t), y = cy + r*sin(t).
- Reverse the orientation of Numpy and Matplotlib drawings by flipping the sign of the Y array.
- Add a title to drawings.
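The circle and polygon formulas above can be tried out with the standard library alone. In this sketch, the helper linspace is a hypothetical stand-in for Numpy's linspace() so that the example runs without any installed libraries, and circle_points is a name chosen for illustration.

```python
import math


def linspace(start, stop, num):
    """Stand-in for Numpy's linspace(): num equally spaced values,
    with both endpoints included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def circle_points(cx, cy, r, num=101):
    """Return X and Y lists for a circle centered at (cx, cy),
    using x = cx + r*cos(t), y = cy + r*sin(t)."""
    t = linspace(0, 2 * math.pi, num)
    x = [cx + r * math.cos(ti) for ti in t]
    y = [cy + r * math.sin(ti) for ti in t]
    return x, y


# Many points give a smooth circle.
x, y = circle_points(0, 0, 1)

# Reducing the number of edges turns the circle into a regular polygon:
# 7 points (the last one repeats the first) give 6 edges, i.e. a hexagon.
hx, hy = circle_points(2, 3, 1, num=7)
print(len(hx) - 1)   # 6
```

With Matplotlib available, the resulting X and Y lists could be passed to plot(), followed by axis("equal") and show(), as described in the list above.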
Automate Numpy and Matplotlib drawings by combining functions, for-loops and lists.
- Create functions to draw lines, rectangles, circles and regular polygons.
- Define and use functions returning multiple values.
- Use tuples in Python, and know that a function returning multiple values in fact is returning a tuple.
- Define and use functions with default arguments.
- Access items in linspace() arrays via their indices.
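As a minimal sketch of a function that has a default argument and returns multiple values, consider the following. The function name rectangle and its parameters are assumptions made for this example.

```python
def rectangle(x0, y0, width, height=None):
    """Return the X and Y lists of a closed rectangle outline.

    If height is omitted, a square of side width is returned.
    """
    if height is None:
        height = width
    x = [x0, x0 + width, x0 + width, x0, x0]
    y = [y0, y0, y0 + height, y0 + height, y0]
    return x, y


# A function returning multiple values is in fact returning one tuple.
result = rectangle(0, 0, 2)
print(type(result))   # <class 'tuple'>
print(len(result))    # 2

# The tuple can be unpacked directly into separate variables.
x, y = rectangle(1, 1, 3, height=2)
print(x)   # [1, 4, 4, 1, 1]
print(y)   # [1, 1, 3, 3, 1]
```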
Create a Graphics Editor API with the following functionality:
- Create a drawing.
- Display a drawing.
- Add shapes to a drawing.
- Create lines, polylines, squares, rectangles, triangles, quads, polygons, circles, and arcs.
- Fill shapes with color.
- Move, rotate and scale shapes.
- Merge shapes.
- Reverse the orientation of shapes and create holes.
- Learn about software design, and in particular the necessity to keep software modules self-contained. Know that one should not expose the internal data structures of the program to users. The same holds for modules that exist within a software system: they too should not expose their internal data structures to other modules.
- Use the break and continue statements.
- Learn additional list features including three types of list comprehension.
- Use list comprehension to duplicate, rotate, and move shapes.
- Add additional functions for the API.
- Understand why it is important to communicate with individual shapes through the Drawing, and not with all of them individually.
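The design principle of communicating with shapes through the Drawing can be sketched as follows. This is not the API built in the course; the class Drawing, its methods add(), move() and points(), and the use of integer handles are all hypothetical choices made for this illustration.

```python
class Drawing:
    """Hypothetical drawing that owns its shapes and hides how they are stored."""

    def __init__(self):
        # Internal data structure; callers are not meant to touch it directly.
        self._shapes = []

    def add(self, points):
        """Store a copy of the shape and return a handle (its index)."""
        self._shapes.append(list(points))
        return len(self._shapes) - 1

    def move(self, handle, dx, dy):
        """Shift one shape by (dx, dy) using a list comprehension."""
        self._shapes[handle] = [(x + dx, y + dy)
                                for x, y in self._shapes[handle]]

    def points(self, handle):
        """Return a copy of the shape's points, so the caller cannot
        modify the internal list by accident."""
        return list(self._shapes[handle])


d = Drawing()
triangle = d.add([(0, 0), (1, 0), (1, 1)])
d.move(triangle, 2, 3)
print(d.points(triangle))   # [(2, 3), (3, 3), (3, 4)]
```

Because callers only hold a handle, the internal list could later be replaced by another data structure without changing any code that uses the Drawing.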
Unit 4 (Files, Data, and Visualization)
In this Unit trainees will learn how to work with data and files. They will work with various data formats, including ASCII art, bitmap images, dictionaries, and CSV files, and visualize data obtained from scientific measurements and computations.
- Open a text file for reading.
- Parse a text file line-by-line.
- Close the file.
- Cleaning text strings with strip(), lstrip() and rstrip().
- Splitting a text string into words.
- Counting lines, words and characters in a text file.
- Rewinding a file and when this is useful.
- Reading selected lines and the file method readline().
- Understand the differences and similarities between sets and lists:
- Set items are not ordered and not indexed.
- Sets cannot contain duplicate items.
- Create empty and non-empty sets.
- Add elements to sets; remove elements from sets.
- Check set length; check for the presence of an element in a set.
- Check for subsets and supersets.
- Create set unions, intersections, and differences.
- Learn how to write data to files.
- Save programs from crashing by catching exceptions.
- Trainees will learn additional things about text files such as:
- How to read all lines into a list at once.
- How to write a list of text strings into a file at once.
- How to read the whole file into a single text string at once.
- Reverse text strings and lists.
- Learn about the history of the Internet and ASCII Art.
- The difference between bitmap (raster) and vector images.
- PBM (portable bitmap), PGM (portable grey map) and PPM (portable pix map) images and why they are useful.
- The structure of PBM, PGM and PPM image files.
- How to leave out comments while reading a text file.
- How to read a sequence of numbers from file and convert it into a two-dimensional array.
- How to work with two-dimensional arrays using nested loops and indices.
- How to write image files to disk.
- How to upload image and data files to NCLab.
Trainees will also create their own image viewers for PBM, PGM and PPM images based on Numpy 2D arrays.
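Reading a plain-text PBM image while skipping comments can be sketched as below. The function name read_pbm and the tiny sample image are made up for this example. The sketch assumes the plain "P1" variant with whitespace-separated pixel values, and it returns a nested list rather than a Numpy 2D array so that it runs with the standard library only.

```python
import os
import tempfile

# A tiny made-up 3x2 image in the plain-text PBM format.
PBM_TEXT = """P1
# a tiny 3x2 test image
3 2
0 1 0
1 0 1
"""


def read_pbm(path):
    """Read a plain (P1) PBM file and return its pixels as a list of rows."""
    tokens = []
    with open(path) as f:
        for line in f:
            line = line.split("#")[0]     # leave out comments
            tokens.extend(line.split())   # split the rest into words
    if tokens[0] != "P1":
        raise ValueError("not a plain PBM file")
    width, height = int(tokens[1]), int(tokens[2])
    values = [int(t) for t in tokens[3:]]
    # Convert the flat sequence of numbers into a two-dimensional array.
    return [values[r * width:(r + 1) * width] for r in range(height)]


# Write the sample image to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.pbm")
    with open(path, "w") as f:
        f.write(PBM_TEXT)
    image = read_pbm(path)

print(image)   # [[0, 1, 0], [1, 0, 1]]

# Catching an exception keeps the program from crashing on a missing file.
try:
    read_pbm("no_such_file.pbm")
except FileNotFoundError:
    print("File not found.")
```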
- Learn what dictionaries are and why they are useful.
- Create an empty and a non-empty dictionary.
- Add key:value pairs to a dictionary and remove them.
- Access values using keys.
- Parse the dictionary using a for-loop.
- Extract from a dictionary the lists of keys, values, and items.
- Zip the lists of keys and values to create a dictionary.
- Combine dictionaries together.
- Understand that dict is a mutable type in Python (it can be changed by functions).
- Access Google Translate API from Python programs to get real-time translations.
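The basic dictionary operations listed above can be sketched in a few lines. The keys, values and the helper restock() are made up for this illustration.

```python
# Zip a list of keys and a list of values to create a dictionary.
keys = ["apple", "pear", "plum"]
values = [3, 5, 2]
stock = dict(zip(keys, values))

# Add a key:value pair, then remove another one.
stock["fig"] = 7
del stock["pear"]

# Parse the dictionary with a for-loop over its items.
for name, count in stock.items():
    print(name, count)

# Combine two dictionaries; on duplicate keys the right-hand one wins.
merged = {**stock, **{"plum": 10, "kiwi": 1}}
print(merged)   # {'apple': 3, 'plum': 10, 'fig': 7, 'kiwi': 1}


def restock(d, name, amount):
    """Increase the count stored under name; this changes d in place."""
    d[name] = d.get(name, 0) + amount


# dict is mutable, so the function changes the caller's dictionary.
restock(stock, "apple", 2)
print(stock)    # {'apple': 5, 'plum': 2, 'fig': 7}
```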
Visualize data obtained from measurements and computations. Use CSV and other data formats. Use Numpy, Matplotlib, and in particular Matplotlib’s mplot3d toolkit to display:
- Measurement data using graphs and bar charts.
- Percentages using pie charts.
- Graphs of functions of two variables.
- 2D measurement data on structured grids using wireframe plots, surface plots, contour plots and color maps.
- Scientific data computed on unstructured triangular grids.
- 2D data represented as 2D Numpy arrays.
- Visualize MRI data of the human brain.
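Before data can be plotted, it usually has to be read and converted to numbers. The following sketch prepares percentages for a pie chart using the standard csv module. The CSV content and column names are made-up sample data, and the plotting step itself is left out so that the example runs without Matplotlib.

```python
import csv
import io

# Made-up sample data; in practice this would come from a CSV file on disk.
CSV_TEXT = """label,count
A,30
B,50
C,20
"""

# DictReader uses the first row as column names.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

labels = [row["label"] for row in rows]
counts = [int(row["count"]) for row in rows]   # CSV values arrive as text

# Convert raw counts to percentages of the total.
total = sum(counts)
percentages = [100 * c / total for c in counts]

for label, pct in zip(labels, percentages):
    print(label, pct)
```

With Matplotlib available, the lists percentages and labels could then be passed to its pie() function to draw the chart.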