DAT-320
Building Scalable Machine Learning Pipelines, presented by Roderick Paulino as a capstone project for DAT-320 at University of Calgary Continuing Education.
The dataset and calculations are for educational purposes only.
PRELIMINARY ARCHITECTURE
This is the preliminary architecture of the machine learning data pipeline, before any cloud services are used. The intent is to test the workflow of engineering calculations, simulations, and predictions before moving to the cloud. This infrastructure was simulated on a single node using a combination of VMs (on a bare-metal hypervisor) and Docker images.
PYSPARK AND HADOOP WITH DOCKER
SPARK CLUSTER AND AIRFLOW NETWORK
These are the PySpark DAGs, not the Airflow DAGs. The worker's status could have been visible if the worker had been on a different node.
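As a rough illustration of how the same PySpark job could be pointed either at an in-process local run or at the simulated Spark cluster, the sketch below builds the Spark master URL for each case. The helper names `spark_master_url` and `build_session`, the hostname `spark-master`, and the application name are assumptions made for illustration and are not taken from the project. Port 7077 is Spark's default standalone master port.

```python
def spark_master_url(mode: str, host: str = "spark-master", port: int = 7077) -> str:
    """Return a Spark master URL for the given mode (hypothetical helper)."""
    if mode == "local":
        # Run Spark inside the current process, using all available cores.
        return "local[*]"
    if mode == "standalone":
        # Connect to a standalone Spark master; the hostname is an assumed
        # container/VM name, and 7077 is Spark's default master port.
        return f"spark://{host}:{port}"
    raise ValueError(f"unknown mode: {mode!r}")


def build_session(mode: str, app_name: str = "dat320-pipeline"):
    """Create a SparkSession for the chosen mode (requires pyspark)."""
    # Imported lazily so spark_master_url() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(spark_master_url(mode))
        .appName(app_name)
        .getOrCreate()
    )
```

With pyspark installed, `build_session("local")` would be enough to test the workflow on one machine, while `build_session("standalone")` would target the cluster, assuming a master is reachable under the assumed hostname.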