Data Engineering Course

Chapter 1: Introduction to Database
  • Database
  • Terminology
    • Primary Key
    • Foreign/Referencing Key
    • Candidate/Alternate Key
    • Unique Constraint
  • OLTP and OLAP
  • Key
  • Relationships and Referential Integrity
  • Normalization
  • Types of Normalization
    • First Normal Form (1NF)
    • Second Normal Form (2NF)
    • Third Normal Form (3NF)
  • Denormalization
Chapter 2: Introduction to SQL
  • SQL
  • Database Vendor
  • Types of SQL Statements
    • DDL
    • DML
    • TCL
  • Data Types
    • Integer
    • Floating-Point
    • String
    • Text
    • Date and DateTime
Chapter 3: SQL Statement, Operators and Functions
  • SQL Statement Syntax
    • SELECT
    • FROM
    • WHERE
    • GROUP BY and HAVING
    • ORDER BY
  • Utility Statement
  • Query Clauses
  • Filtering
  • Operators
    • Assignment Operator
    • Arithmetic Operator
    • Logical Operator
    • Comparison Operator
    • Bitwise Operator
  • Operator Precedence
  • Functions
  • Types of Functions
    • Numeric Function
    • String Function
    • Date and Time Function
    • Control Flow Function
    • Cast Function
    • Encryption and Compression Function
    • Aggregate/Grouping Function
    • Window Function
    • Information Function
    • JSON Function
  • Install MySQL Database, Create and Populate Employee Database
Chapter 4: Filtering, Join and Subquery
  • Filtering
  • Evaluate Condition
  • Types of Condition
    • Equality Condition
    • Inequality Condition
    • Range Condition
    • Membership Condition
    • Matching Condition
  • Join
  • Types of Join
    • Inner Join
    • Left Join
    • Right Join
    • Full Join
    • Cross Join
  • Subquery
  • Types of Subquery
    • Noncorrelated Subquery
      • Single Row and Column Subquery
      • Multiple Row and Single Column Subquery
      • Multiple Column Subquery
    • Correlated Subquery
Chapter 5: Set Operators
  • Set
  • Set Operators
    • Union
    • Intersect
    • Except
Project
Chapter 1: Introduction to Application Design and Python
  • Overview of Course
  • Object Oriented Design
  • Python
  • Python Standard Library
  • Data Analytics Library
  • Big Data
  • Artificial Intelligence
  • Anaconda Distribution
  • Python Installation
Chapter 2: Introduction to Python Programming
  • Variables
  • Operators
  • Operator Precedence
  • Data Types
  • Assignment Statements
  • Comments
  • Escape Sequences
  • Strings
  • Keywords
  • Data Analytics
Chapter 3: Control Statements
  • Control Statements
  • if Statement
  • if-else Statement
  • Nested if-else Statement
  • while Statement
  • for Statement
  • break and continue Statement
  • Formatted Strings
  • Data Analytics
Chapter 4: Functions
  • Functions
  • User-Defined Functions
  • Module
  • Recursion
  • Lambda
  • Comments
  • Data Analytics
Chapter 5: Data Structure
  • Lists
  • Tuples
  • Slicing
  • Sorting
  • Searching
  • Dictionary
  • Sets
  • Data Analytics
Chapter 6: Classes
  • Class
  • Inheritance
  • Encapsulation
  • Polymorphism
  • Object
  • Instance Attributes
  • Class Attributes
  • Class Decorators
  • Overriding Methods
  • Data Analytics
Chapter 7: File and Exception Handling
  • File Processing
  • JSON File
  • Serialization and Deserialization
  • Exception Handling
  • try Statements
  • try-except-finally Clause
  • Data Analytics
Project
Chapter 1: Introduction to Spark
  • Big Data and Distributed Computing
  • Apache Spark
  • History of Spark
  • Spark Architecture
  • Spark Distributed Components
  • Spark API
  • Spark Users
  • Download Spark
Chapter 2: Structured API
  • RDD
  • DataFrames API
  • Spark Data Types
  • DataFrames
    • Schemas
    • Columns and Expressions
    • Rows
  • Spark Data Sources
    • DataFrameReader
    • DataFrameWriter
    • Rows
  • DataFrame Operations
    • Creating DataFrames
    • Projections
    • Renaming Columns
    • Dropping Columns
    • Adding Columns
    • Changing Column Types
    • Filtering Rows
    • Limit Rows
    • Distinct Rows
    • Sorting Rows
Chapter 3: Filtering and Data Manipulation
  • Filtering
    • Filtering Column
    • Filtering Row
    • Filter on a single column value
    • Filter multiple columns with AND operator
    • Filter multiple columns with OR operator
    • Filter with Boolean expression
  • PySpark SQL Module
  • Numeric Type Manipulation
  • String Type Manipulation
  • Date and Timestamp Type Manipulation
  • Complex Type Manipulation
    • Arrays Type
    • Maps Type
    • Structs Type
  • Handling Nulls
    • Dropping Null Values
    • Filling Null Values
    • Filtering Null Values
  • User Defined Functions
Chapter 4: Aggregation
  • Aggregation
  • Aggregate Functions
  • Grouping Type
    • Simple Grouping
    • GroupBy
    • Window
    • Grouping Set
      • Rollup
      • Cube
  • Pivot
  • User Defined Aggregate Functions
Chapter 5: Join
  • Join
  • Join Types
    • Inner Join
    • Left Outer Join
    • Left Semi Join
    • Left Anti Join
    • Right Outer Join
    • Outer Join
    • Natural Join
    • Cross Join
  • Complex Data Types Join
  • Duplicate Columns in Join
  • Optimization and Performance Tuning
Chapter 6: Spark SQL
  • SQL
  • Spark SQL
  • Running Spark SQL Queries
    • Spark SQL CLI
    • Spark Programmatic SQL Interface
  • SparkSQL JDBC/ODBC
  • Thrift Server
  • Catalog
  • Database
    • Create Database
    • Set Database
    • List Database
    • Display Current Database
    • Use Default Database
    • Drop Database
  • Tables
    • Managed Tables
    • Unmanaged Tables
  • Create Tables
    • Create Managed Tables
    • Create Unmanaged Tables
  • Describe Table
  • Display Table
  • Drop Table
  • Refresh Table Metadata
  • Cache Table
  • Views
    • Creating Views
    • Creating Temporary Views
    • Creating Global Temporary Views
    • Overwrite Views
    • Explain Views
    • Drop Views
  • Select Statements
  • Case Statements
  • Complex Types
    • Structs
    • Lists
    • Maps
  • Functions
  • User-Defined Functions
  • Subqueries
  • Interoperate SQL and DataFrames
  • Catalog API
Chapter 7: DataSets
  • DataSets
  • Creating Datasets
    • Scala
    • Java
  • DataFrame and Dataset
  • Encoder
  • Actions
  • Transformations
  • Joins
  • Grouping and Aggregations
  • Write Output to File
Chapter 8: RDD
  • Low Level APIs
  • RDDs
    • Partitions
  • Types of RDDs
  • Creating RDDs
    • Convert Dataset or DataFrame into RDD
    • Convert RDD into DataFrame or Dataset
  • Creating RDD from Data Sources
  • RDD Operations
  • Transformations
  • Actions
  • Saving output to external source
  • Caching
  • Checkpointing
  • Pipe RDDs
  • Advanced RDDs
    • Key-Value RDDs
    • Aggregation
    • Joins
Chapter 9: Streaming
  • TBD
Project
Updating soon..