Labs

Installing and configuring all the software needed for this course on your own machine can be tedious. We have prepared a virtual machine (VM) with most of the tools preinstalled, which you can download and use.

If you use VMware Workstation Player (recommended, free for personal use from here), you can download the VM from here (size: ~8.7GB).

If you use Oracle VirtualBox (latest version 7.0.4, free from here), you can download the VM from here (size: ~9.3GB).

The username and password are both csdeptucy.

PLEASE INSTALL THE VM BEFORE THE FIRST LAB.

If you want to resize your VM, please follow these instructions.

| Week | Description | Useful Links | Material | Exercises to deliver |
|---|---|---|---|---|
| 1 | Introduction to Apache Hadoop | | LAB01.pdf, Source Code, Dataset | |
| 2 | Programming with Apache Hadoop | | LAB02.pdf, WordCount.java, SalesJan2009.csv | 🔴 |
| - | Introduction to Python | Students who are not familiar with Python can review this lab in their free time and reach out to me if they need additional assistance. | LAB03.pdf | |
| 3 | Data Manipulation | | LAB04.pdf, Lab04.ipynb, iris_data.csv, iris_data2.csv | |
| 4 | Data Visualization | | LAB05.pdf, Lab05.ipynb, iris.csv, haberman.csv | |
| 5 | Data Preparation I: Cleaning, Encoding, Scaling, Resampling Data | | LAB06, Lab06.ipynb, NFL Play by Play 2009-2016 (v3).zip, house_prices_train.csv, shampoo.csv | |
| 6 | Data Preparation II: Dimensionality Reduction: Feature Selection and Extraction | | LAB07, Lab07.ipynb | |
| 7 | Machine Learning: Regression | | LAB08, Lab8_LinearRegression.ipynb, Lab8_PolynomialRegression.ipynb, Advertising.csv, Boston.csv | |
| 8 | Machine Learning: Regression (cont'd) | | LAB08, Lab08.ipynb, Advertising.csv, Boston.csv | |
| 9 | Machine Learning: Classification and Clustering | | LAB09, Lab09-classification.ipynb, Lab09-clustering.ipynb, telco.csv, wine_data.csv, fleet_data.csv, WineAnalysis.ipynb | 🔴 |
| 10 | Public holiday (25/3) | | | |
| 11 | Public holiday (1/4) | | | |
| 12 | Introduction to Apache Spark | | LAB10, kmeans-rdd.py, kmeans-dataframe.py | |
| 13 | No Lab | Project Presentation Week | | |