This repo contains projects relating to data engineering concepts
Background notes on specific concepts can be found in the Intro to Basics folder
Projects
Linux and Shell Scripting
This project applies my Linux and shell scripting skills to a fictional scenario in which I work as a Linux developer at a top tech company
Building Data Pipelines with Airflow
Apache Airflow is an open source workflow orchestration tool that lets you define, schedule, and monitor workflows as code
This project will collect data available in different formats, and consolidate it into a single file
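As a rough illustration of the consolidation step, here is a minimal sketch in plain Python using only the standard library. It is not the project's actual DAG: the function names, the sample records, and the `id`/`amount` columns are all made up for the example. In an Airflow setup, each function could be the kind of callable that a task wraps.

```python
import csv
import io
import json


def extract_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json_lines(text):
    """Parse newline-delimited JSON into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def consolidate(sources, fieldnames):
    """Merge records from every source into one CSV string with fixed columns."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for records in sources:
        for record in records:
            # Keep only the agreed columns; missing values become empty strings.
            writer.writerow({name: record.get(name, "") for name in fieldnames})
    return out.getvalue()


# Hypothetical sample inputs standing in for the real source files.
CSV_INPUT = "id,amount\n1,2.50\n2,4.00\n"
JSON_INPUT = '{"id": 3, "amount": 1.75}\n'

combined = consolidate(
    [extract_csv(CSV_INPUT), extract_json_lines(JSON_INPUT)],
    fieldnames=["id", "amount"],
)
print(combined)
```

Keeping extraction and consolidation as separate functions means a new input format only needs one more extractor.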
Building Data Pipelines with Kafka
Apache Kafka is a widely used open source event streaming platform
This project will create a data pipeline that collects streaming data and loads it into a database using Kafka
Building Data Pipelines with Shell
Create shell scripts to extract, transform, and load data
Create and populate a PostgreSQL table
Data Warehousing with Postgres
Apply my knowledge and skills to design a data warehouse with fact and dimension tables and load data into it
Write aggregation queries using the CUBE and ROLLUP grouping clauses and create materialized views (materialized query tables)
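To make the fact/dimension layout and the ROLLUP idea concrete, here is a small sketch that assumes a made-up sales schema (`fact_sales`, `dim_country`, `dim_category`). It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere. SQLite has no ROLLUP, so the query spells out the same three grouping levels with UNION ALL; in PostgreSQL the equivalent would be a single `GROUP BY ROLLUP (co.name, ca.name)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table referencing two dimension tables.
conn.executescript("""
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_country  (country_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        category_id INTEGER REFERENCES dim_category(category_id),
        country_id  INTEGER REFERENCES dim_country(country_id),
        amount      INTEGER
    );
    INSERT INTO dim_category VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO dim_country  VALUES (1, 'Canada'), (2, 'India');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 10), (2, 1, 2, 20), (3, 2, 1, 5), (4, 2, 1, 15);
""")

# ROLLUP(country, category) produces three grouping levels:
# (country, category), (country), and the grand total.
ROLLUP_EQUIVALENT = """
    SELECT co.name AS country, ca.name AS category, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_country  co ON co.country_id  = f.country_id
    JOIN dim_category ca ON ca.category_id = f.category_id
    GROUP BY co.name, ca.name
    UNION ALL
    SELECT co.name, NULL, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_country co ON co.country_id = f.country_id
    GROUP BY co.name
    UNION ALL
    SELECT NULL, NULL, SUM(amount) FROM fact_sales
"""

rows = conn.execute(ROLLUP_EQUIVALENT).fetchall()
for row in rows:
    print(row)
```

A NULL in the `category` column marks a per-country subtotal, and the row with both columns NULL is the grand total.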
NoSQL with MongoDB, Cassandra and IBM Cloudant
This project applies my abilities to work with several NoSQL databases to move and analyze data
Move data from one type of database to another and run basic queries on various databases
Data Engineering and Machine Learning with Spark
Use Apache Spark for Data Engineering and Machine Learning
Create a Spark application end-to-end that includes ETL and model training
Python projects
Built an ETL pipeline with Python that extracted data from different file types, transformed it into the required format, and saved it to a CSV file ready to be loaded into an RDBMS.
Analyze the HTML code of a webpage and use requests and BeautifulSoup to extract its contents. The relevant information is then saved to a CSV file and loaded into a SQLite database
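The scrape-to-CSV-to-SQLite flow from the second item can be sketched roughly as follows. To keep the example self-contained, it swaps in the standard library's `html.parser` for requests and BeautifulSoup and parses an inline HTML string rather than fetching a page. The `TableParser` class, the sample table, and the `items` table are all hypothetical.

```python
import csv
import io
import sqlite3
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content; a real run would fetch this over HTTP.
HTML = """
<table>
  <tr><th>name</th><th>year</th></tr>
  <tr><td>Alpha</td><td>1999</td></tr>
  <tr><td>Beta</td><td>2004</td></tr>
</table>
"""

parser = TableParser()
parser.feed(HTML)
header, *records = parser.rows

# Save to CSV (in memory here; writing to a file works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(records)
csv_text = buffer.getvalue()

# Load the CSV into SQLite using named placeholders that match the header.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, year INTEGER)")
reader = csv.DictReader(io.StringIO(csv_text))
conn.executemany("INSERT INTO items VALUES (:name, :year)", reader)
loaded = conn.execute("SELECT name, year FROM items ORDER BY year").fetchall()
print(loaded)
```

Going through CSV as an intermediate step mirrors the description above and leaves a plain-text copy of the extracted data that can be inspected before loading.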
Relational database fundamentals with MySQL and PostgreSQL
Build databases and tables with the MySQL CLI and PostgreSQL/pgAdmin
Use database design to create ERDs and work through the conceptual and logical design of a database before implementing the physical design