YoAdrianCodes/Data_Engineering_Projects


Description


  • This repo contains projects covering core data engineering concepts
  • Further information and details about the concepts involved can be found in the Intro to Basics folder

Projects


  1. Linux and Shell Scripting
  • This project applies my Linux and shell-scripting skills to complete a fictional scenario as a Linux developer at a top tech company
  2. Building Data Pipelines with Airflow
  • Apache Airflow is a popular open-source workflow orchestration tool for building and running data pipelines
  • This project collects data available in different formats and consolidates it into a single file
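A minimal sketch of the consolidation step such a pipeline might perform (file names and fields are hypothetical; in the project, logic like this would run inside an Airflow task):

```python
import csv
import json

def consolidate(csv_path, json_path, out_path):
    """Read records from a CSV file and a JSON file, write them to one CSV."""
    rows = []
    with open(csv_path, newline="") as f:
        rows.extend(csv.DictReader(f))
    with open(json_path) as f:
        rows.extend(json.load(f))  # expects a list of objects with matching keys
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows(rows)
```

In Airflow this function could be wrapped in a `PythonOperator` task inside a DAG, with one upstream task per source format.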
  3. Building Data Pipelines with Kafka
  • Apache Kafka is a widely used open-source event-streaming platform
  • This project creates a data pipeline that collects streaming data and loads it into a database using Kafka
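The consume-and-load pattern can be sketched without a running broker: below, an in-memory iterator stands in for a Kafka consumer and SQLite for the target database (the topic, table, and field names are invented):

```python
import json
import sqlite3

def load_stream(messages, conn):
    """Insert each streamed JSON event into the events table."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (ts TEXT, value REAL)")
    # with the kafka-python package this loop would iterate a KafkaConsumer
    for raw in messages:
        event = json.loads(raw)
        conn.execute("INSERT INTO events VALUES (?, ?)",
                     (event["ts"], event["value"]))
    conn.commit()

conn = sqlite3.connect(":memory:")
stream = (json.dumps({"ts": f"2024-01-0{i}", "value": i * 1.5})
          for i in range(1, 4))
load_stream(stream, conn)
```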
  4. Building Data Pipelines with Shell
  • Create shell scripts to extract, transform, and load data
  • Create and populate a PostgreSQL table
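The same extract/transform/load steps, sketched here in Python for brevity with SQLite standing in for PostgreSQL (the colon-delimited record layout is an assumption, chosen to resemble typical shell-pipeline input):

```python
import sqlite3

def etl(lines, conn):
    """Extract colon-delimited records, uppercase the name, load into a table."""
    conn.execute("CREATE TABLE users (name TEXT, shell TEXT)")
    for line in lines:  # extract: one record per line, e.g. "alice:/bin/bash"
        name, shell = line.strip().split(":")
        # transform (uppercase) + load
        conn.execute("INSERT INTO users VALUES (?, ?)", (name.upper(), shell))
    conn.commit()

conn = sqlite3.connect(":memory:")
etl(["alice:/bin/bash", "bob:/bin/zsh"], conn)
```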
  5. Data Warehousing with Postgres
  • Apply my knowledge and skills to design and load data into a data warehouse using fact and dimension tables
  • Write aggregation queries using the CUBE and ROLLUP functions and create materialized query tables (materialized views)
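What ROLLUP computes can be shown in miniature: per-group totals, subtotals per leading column, and a grand total. A pure-Python emulation over made-up sales rows (in PostgreSQL this would be a single `GROUP BY ROLLUP (region, product)` query, with NULLs marking the subtotal rows):

```python
from collections import defaultdict

# (region, product, amount) — hypothetical fact rows
sales = [("east", "widget", 10), ("east", "gadget", 5), ("west", "widget", 7)]

totals = defaultdict(int)
for region, product, amount in sales:
    totals[(region, product)] += amount  # finest grouping level
    totals[(region, None)] += amount     # subtotal per region
    totals[(None, None)] += amount       # grand total (ROLLUP's all-NULL row)
```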
  6. NoSQL with MongoDB, Cassandra, and IBM Cloudant
  • This project applies my skills in working with several NoSQL databases to move and analyze data
  • Move data from one type of database to another and run basic queries against each
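Moving data between database types is mostly a matter of reshaping records; a sketch of the relational-to-document step, with an invented table and fields (with pymongo, the final load would be `collection.insert_many(docs)`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [("Alien", 1979), ("Heat", 1995)])

# reshape each relational row into a JSON-style document for a NoSQL store
docs = [{"title": t, "year": y}
        for t, y in conn.execute("SELECT title, year FROM movies")]
```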
  7. Data Engineering and Machine Learning with Spark
  • Use Apache Spark for data engineering and machine learning
  • Create an end-to-end Spark application that includes ETL and model training
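The shape of such a pipeline — extract rows, cast and clean them, then fit a model — in miniature, with the standard library standing in for Spark and the data made up (with PySpark, the same steps would use `spark.read`, DataFrame transforms, and an MLlib estimator):

```python
import csv
import io
import statistics

raw = "hours,score\n1,52\n2,55\n3,61\n4,64\n"  # stand-in for a file Spark would read

# ETL: parse and cast each row
rows = [(float(r["hours"]), float(r["score"]))
        for r in csv.DictReader(io.StringIO(raw))]

# "model training": ordinary least-squares slope and intercept
xs, ys = zip(*rows)
mx, my = statistics.mean(xs), statistics.mean(ys)
slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
```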
  8. Python Projects
  • Built an ETL pipeline in Python that extracts data from different file types, transforms it into the required format, and saves it to a CSV file ready to be loaded into an RDBMS
  • Analyze the HTML of a webpage and use requests and BeautifulSoup to extract its contents; the relevant information is then saved to a CSV file and loaded into a SQLite database
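A rough sketch of the parse-and-load flow using only the standard library on an inline HTML snippet (with requests and BeautifulSoup, the fetch would be `requests.get(url).text` and the cell extraction `soup.find_all("td")`; the table contents here are invented):

```python
import sqlite3
from html.parser import HTMLParser

HTML = ("<table><tr><td>Alien</td><td>1979</td></tr>"
        "<tr><td>Heat</td><td>1995</td></tr></table>")

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in document order."""
    def __init__(self):
        super().__init__()
        self.cells, self._in_td = [], False
    def handle_starttag(self, tag, attrs):
        self._in_td = tag == "td"
    def handle_data(self, data):
        if self._in_td:
            self.cells.append(data)
    def handle_endtag(self, tag):
        self._in_td = False

parser = CellCollector()
parser.feed(HTML)
rows = list(zip(parser.cells[0::2], parser.cells[1::2]))  # pair title with year

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, year TEXT)")
conn.executemany("INSERT INTO films VALUES (?, ?)", rows)
```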
  9. Relational Database Fundamentals with MySQL and PostgreSQL
  • Build databases and tables with the MySQL CLI and PostgreSQL/pgAdmin
  • Use database design to create ERDs and understand the conceptual and logical design behind a database before implementing the physical design
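An ERD's one-to-many relationship translates directly into a foreign key; a small sketch run through Python's sqlite3 module (the schema is invented, and in the project the DDL would be issued via the MySQL CLI or pgAdmin):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- one author has many books: the ERD's 1:N line becomes a foreign key
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(author_id)
    );
    INSERT INTO authors VALUES (1, 'Ursula K. Le Guin');
    INSERT INTO books VALUES (1, 'The Dispossessed', 1);
""")
row = conn.execute(
    "SELECT a.name, b.title FROM books b JOIN authors a USING (author_id)"
).fetchone()
```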