Learn from Curated Curricula Developed by Industry Experts
Topics:
What is Data Engineering
Data Engineer Roles & Responsibilities
Difference Between ETL Developer & Data Engineer
Types of Data
Streaming vs. Batch Data
Topics:
Cloud Introduction and GCP Basics
Cloud Service Models in GCP: IaaS, PaaS, SaaS
Overview of GCP Data Engineer Role
Understanding GCP Storage Components
Introduction to GCP ETL & Streaming Components
Topics:
Google Cloud SQL Deployment and Management
Introduction to BigQuery: Serverless Data Warehouse
Performance Tuning: Understanding Slots and Query Pricing
Managing IAM Roles and Secure Connections (e.g., VPC, Firewalls)
Topics:
GCP Resources and Resource Types
Introduction to Google Dataflow and Google Dataproc
Basic Concepts of Data Movement and Processing
Topics:
BigQuery Architecture: Storage, Compute, and the Dremel Query Engine
Loading Data into and Exporting Data from BigQuery
Table Creation, Partitioning, and Clustering for Performance Optimization
Managing Workloads and Query Optimization
Topics:
Google Dataflow Concepts: Pipelines, PCollections, and Transforms
Constructing ETL Pipelines with Dataflow
Integrating Dataflow with GCS, Cloud SQL, BigQuery, and other GCP Services
Monitoring and Debugging Dataflow Jobs
Topics:
Incremental Data Loading and Handling On-Premise Data Sources
Advanced Dataflow Features: Windows, Triggers, and Stateful Processing
Implementing Real-Time Data Integration with Pub/Sub
Topics:
Integrating BigQuery with Google Cloud Storage for Big Data Queries
Utilizing BigQuery ML for Machine Learning Inside the Data Warehouse
Performance Optimization and Data Transformation Techniques
Topics:
Security Measures with Google Cloud Identity and Access Management (IAM) and Role-Based Access Control
Managing Encryption and Security in Dataflow and BigQuery
Utilizing Google Cloud Marketplace Datasets for Advanced Analytics
Topics:
GCP Storage Essentials: Buckets, Objects, and Classes
Introduction to Google Cloud Storage (GCS)
Configuring and Managing GCS Buckets
GCS Object Lifecycle Policies and Versioning
Topics:
Managing GCS Objects: Nearline and Coldline Storage Classes for Archival
Utilizing Google Cloud Console and gsutil for Efficient Storage Management
Directory and File Operations in GCS
Best Practices for Organizing Data in GCS
Topics:
Implementing Security Measures in Google Cloud Storage
Access Control with Bucket IAM Policies, ACLs, and IAM Roles
Encryption Options: Customer-Managed Encryption Keys (CMEK) and Default Encryption
Compliance Features: HIPAA, PCI-DSS, and Data Sovereignty
Topics:
Strategies for Database Migrations to GCP
Integrating Google Cloud SQL with GCS
Utilizing Storage Transfer Service and Transfer Appliance
Data Migration Tools and Techniques (e.g., Database Migration Service)
Topics:
Advanced Concepts in GCS: Object Lock, Multi-Part Uploads, and Signed URLs
Data Replication Across Regions with Dual-Region and Multi-Region Buckets
Optimizing Storage Costs with GCS Storage Classes
Leveraging GCS for Big Data Analytics
Topics:
Fundamentals of Google Cloud Pub/Sub
Developing Pub/Sub Pipelines for Real-Time Insights
Integrating IoT Devices with GCP for Data Streaming
Processing and Analyzing Streaming Data
Topics:
Understanding GCP Event Services: Cloud Functions, Cloud Tasks, and Pub/Sub
Configuring Pub/Sub with Cloud Functions for Real-Time Processing
Patterns for Real-Time and Event-Driven Data Processing
Use Cases for Event-Driven Architectures
Topics:
Monitoring GCP Storage and Pub/Sub Resources with Cloud Monitoring and Logging
Performance Tuning for GCP Data Services
Implementing Disaster Recovery and High Availability
Using Google Cloud Security Command Center for Security and Compliance
Topics:
GCP Overview: Understanding IaaS, PaaS, and SaaS
Introduction to Google Cloud Dataproc: Configuration and Cluster Management
Spark on Google Cloud Dataproc: Configurations, Node Types, and Resource Management
Using HDFS, GCS, and BigQuery with Dataproc
Topics:
Integrating Python with Spark: PySpark Basics
Data Loading Techniques: Using PySpark for Data Ingestion and Processing
Utilizing SQL in Dataproc: Creating and Managing Spark DataFrames and SQL Queries
Advanced Data Transformation: Working with Spark SQL for Data Analytics
Topics:
Configuring Google Cloud Storage (GCS) for use with Dataproc
Data Management: Reading and Writing Data to GCS using PySpark and Scala
Secure Data Access: Managing Permissions and Security between Dataproc and GCS
Topics:
Understanding Dataproc Architecture: Master, Worker, and Preemptible Worker Nodes; Spark RDDs and DAGs
Building and Monitoring Dataproc Jobs: Scheduling, Task Management, and Optimization
Implementing Best Practices for Reliable Data Lakes with Delta Lake Concepts
Topics:
Machine Learning Fundamentals in Dataproc: Using MLlib and AI Platform (now Vertex AI) for Predictive Modeling
Data Exploration and Visualization: Leveraging Notebooks for Insights
Advanced Analytic Techniques: Utilizing Scala and Python for Complex Data Analysis
Topics:
Dataproc Security: Integrating with Google Cloud IAM and VPCs
Managing Permissions: IAM Policies, Firewall Rules, and Data Security
Compliance and Data Governance: Best Practices in Dataproc Environments
Topics:
Streaming Data with Dataproc: Concepts and Practical Applications
Integrating Pub/Sub and BigQuery with Dataproc for Real-Time Analytics
Processing Live Data Streams: Building and Deploying Stream Analytics Solutions
Topics:
Automating Workflows with GCP Cloud Composer and Dataproc
CI/CD for Dataproc: Automation and Version Control Integration
Deployment Strategies: Best Practices for Production Deployments in GCP
Topics:
1. Introduction to Python
Overview of Python's history, key features, and comparison with other languages.
Setting up the Python environment and writing your first program.
2. Core Programming Concepts
Variables, data types, conditional statements, loops, control flow.
Introduction to strings, string manipulation, and basic functions.
Topics:
1. Deep Dive into Collections
Understanding lists, tuples, dictionaries, sets, and frozen sets.
Functions, methods, and comprehensions for collections.
2. Functional Programming in Python
Exploring function arguments, anonymous (lambda) functions, and higher-order functions (map, filter, and reduce).
3. Object-Oriented Programming (OOP)
Classes, objects, constructors, destructors, inheritance, polymorphism.
Encapsulation, data hiding, magic methods, and operator overloading.
Topics:
1. Mastering Exception Handling
Exception handling mechanisms, try/except/finally clauses, and user-defined exceptions.
2. File Handling Essentials
Basics of file operations, handling Excel and CSV files.
3. Database Programming
Introduction to database connections and operations with MySQL.
Topics:
1. Getting Started with Flask
Setting up Flask, creating simple applications, routing, and middleware.
2. Exploring Django
Introduction to Django, the MVT (Model-View-Template) pattern, views, and URL mapping.
Topics:
1. Automation and Scripting
Enhancing file handling, database automation, and web scraping with BeautifulSoup.
2. GUI Development with Tkinter
Basics of Tkinter for developing desktop applications.
3. Version Control with Git
Managing projects with Git, understanding repository management, commits, merging, and basic Git commands.
Topics:
Cloud Computing Fundamentals: Overview of cloud service models (IaaS, PaaS, SaaS) and deployment models (public, private, hybrid).
Basics of DevOps: Understanding the DevOps culture, practices, and its significance in cloud environments.
Data on the Cloud: Exploring cloud storage solutions, databases, and big data services provided by major cloud providers (AWS, Azure, Google Cloud).
Introduction to Infrastructure as Code (IaC): Concepts and tools for managing infrastructure through code.
Topics:
Cloud Storage Solutions: Differences between object storage, file storage, and block storage. Use cases for each.
Cloud Databases: Overview of relational and NoSQL database services in the cloud (e.g., AWS RDS, Azure SQL Database, Google Cloud Firestore).
Data Warehousing and Big Data Solutions: Introduction to cloud-based data warehousing services (e.g., Amazon Redshift, Google BigQuery, Azure Synapse Analytics).
Data Migration to Cloud: Strategies and tools for migrating data to cloud environments.
Topics:
Automated Data Pipelines: Designing and implementing automated data pipelines using cloud services.
Continuous Integration and Continuous Delivery (CI/CD) for Data: Applying CI/CD practices to data pipeline development, including version control, testing, and deployment strategies.
Monitoring and Logging: Tools and practices for monitoring cloud resources and data pipelines, understanding logs and metrics for troubleshooting.
Infrastructure as Code (IaC) for Data Systems: Using IaC tools (e.g., Terraform, CloudFormation) to provision and manage cloud data infrastructure.
Topics:
Serverless Data Processing: Leveraging serverless architectures for data processing tasks (e.g., AWS Lambda, Azure Functions).
Containerization and Data Services: Using containers and orchestration (e.g., Docker, Kubernetes) to deploy and scale data applications and services in the cloud.
Machine Learning and AI in the Cloud: Introduction to cloud-based machine learning services and integrating AI capabilities into data pipelines.
Data Analytics and Visualization: Tools and services for analyzing and visualizing data directly in the cloud (e.g., Amazon QuickSight, Google Data Studio (now Looker Studio), Microsoft Power BI).
Topics:
Introduction to Databases and SQL: Understanding relational databases and the role of SQL.
SQL Syntax Overview: Keywords, statements, and clauses.
Basic SQL Commands: SELECT, FROM, WHERE, and ORDER BY.
Filtering Data: Using conditions to retrieve specific data (AND, OR, NOT).
Topics:
Understanding Table Relationships: Primary keys, foreign keys, and the importance of relationships in databases.
Join Operations: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
Subqueries and Nested Queries: Using subqueries in the SELECT, FROM, and WHERE clauses.
Aggregating Data: Using GROUP BY and aggregate functions (COUNT, SUM, AVG, MIN, MAX).
Topics:
Data Manipulation Commands: INSERT, UPDATE, DELETE.
Managing Tables: Creating and altering tables (CREATE TABLE, ALTER TABLE, DROP TABLE).
Advanced Filtering Techniques: Using LIKE, IN, BETWEEN, and wildcard characters.
Working with Dates and Times: Understanding and manipulating date and time data.
Topics:
Advanced SQL Functions: String functions, mathematical functions, and date functions.
Window Functions: Overview of ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, and their applications.
Query Performance Optimization: Indexes, query planning, and execution paths.
Common Table Expressions (CTEs): Writing cleaner and more readable queries with the WITH clause.
Topics:
Analytical SQL for Reporting: Building complex queries to answer analytical questions.
Pivoting Data: Transforming rows to columns (PIVOT) and columns to rows (UNPIVOT).
Data Warehousing Concepts: Introduction to data warehousing practices and how they apply to SQL querying.
Integrating SQL with Data Analysis Tools: Connecting SQL databases with tools like Excel, Power BI, and Python for deeper data analysis.
25th Sept 2023, Monday, 8 AM (IST), 1 to 1.5 hours per session
27th Sept 2023, Wednesday, 10 AM (IST), 1 to 1.5 hours per session
29th Sept 2023, Friday, 12 PM (IST), 1 to 1.5 hours per session