Learn from Curated Curricula Developed by Industry Experts
Topics
What is Data Engineering
Data Engineer Roles & Responsibilities
Difference Between ETL Developer & Data Engineer
Types of Data
Streaming vs. Batch Data
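To make the streaming versus batch distinction concrete, here is a minimal pure-Python sketch, not tied to any AWS service, that computes the same per-minute event counts two ways. The batch version sees the complete dataset at once. The streaming version consumes events one at a time and emits each one-minute tumbling window as soon as it closes. The event tuples and the 60-second window size are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Iterator

WINDOW_SECONDS = 60  # assumed tumbling-window size


def batch_counts(events: list[tuple[int, str]]) -> dict[int, int]:
    """Batch: the whole dataset is available, so aggregate it in one pass."""
    return dict(Counter(ts // WINDOW_SECONDS for ts, _ in events))


def streaming_counts(events: Iterable[tuple[int, str]]) -> Iterator[tuple[int, int]]:
    """Streaming: consume events one by one (assumed ordered by timestamp)
    and emit (window, count) as soon as a window is complete."""
    current_window = None
    count = 0
    for ts, _ in events:
        window = ts // WINDOW_SECONDS
        if current_window is not None and window != current_window:
            yield current_window, count  # previous window has closed
            count = 0
        current_window = window
        count += 1
    if current_window is not None:
        yield current_window, count  # flush the last open window


events = [(5, "click"), (42, "view"), (61, "click"), (119, "view"), (130, "click")]
print(batch_counts(events))            # {0: 2, 1: 2, 2: 1}
print(list(streaming_counts(events)))  # [(0, 2), (1, 2), (2, 1)]
```

The streaming version assumes in-order events. Production stream processors also have to deal with late and out-of-order data, which is where most of their extra complexity comes from.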
Topics
Cloud Introduction and AWS Basics
Cloud Service Models on AWS: IaaS, PaaS, SaaS
Overview of AWS Data Engineer Role
Understanding AWS Storage Components
Introduction to AWS ETL & Streaming Components
Topics
Amazon RDS (Relational Database Service) Deployment
Amazon Redshift (Data Warehousing) Overview and Setup
Performance Tuning: Understanding Compute, Memory, and Storage
Managing Security Groups and Secure Connections (e.g., SSH, IAM Roles)
Topics
AWS Resources and Resource Types
Introduction to AWS Glue and AWS Lake Formation
Basic Concepts of Data Movement and Processing
Topics
Redshift Clusters, Nodes, and Data Distribution
Data Loading and Unloading (COPY, UNLOAD) and Querying S3 Data with Redshift Spectrum
Table Creation, Compression, and Distribution Keys for Performance
Managing Workloads and Query Optimization
Topics
AWS Glue Concepts: Crawlers, Jobs, and Triggers
Constructing ETL Pipelines with Glue
Integrating Glue with S3, RDS, Redshift, and other AWS Services
Monitoring and Debugging Glue Jobs
Topics
Incremental Data Loading and Handling On-Premise Data Sources
Advanced Glue Features: Data Catalog, Data Batching, and Error Handling
Implementing Real-Time Data Integration with Kinesis Data Firehose
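Incremental loading usually means extracting only the rows that changed since the last successful run. A common way to do that is a high-water mark on a timestamp column. The sketch below simulates the idea in plain Python. The `updated_at` column, the row layout and the in-memory watermark are hypothetical stand-ins for a source table and for wherever a real pipeline would persist its state. Glue job bookmarks serve a similar purpose for the sources they support.

```python
from datetime import datetime


def incremental_extract(rows, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark.

    rows: iterable of dicts with a hypothetical 'updated_at' datetime field.
    """
    changed = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in changed), default=last_watermark)
    return changed, new_watermark


source = [
    {"id": 1, "updated_at": datetime(2023, 9, 1, 10, 0)},
    {"id": 2, "updated_at": datetime(2023, 9, 2, 9, 30)},
    {"id": 3, "updated_at": datetime(2023, 9, 3, 14, 15)},
]

# First run: no watermark yet, so every row is loaded.
batch1, wm = incremental_extract(source, datetime.min)

# Before the next run, one row is updated and a new row arrives.
source[0]["updated_at"] = datetime(2023, 9, 4, 8, 0)
source.append({"id": 4, "updated_at": datetime(2023, 9, 4, 9, 0)})

# Second run: only rows newer than the stored watermark are picked up.
batch2, wm = incremental_extract(source, wm)
print([r["id"] for r in batch1], [r["id"] for r in batch2], wm)
# [1, 2, 3] [1, 4] 2023-09-04 09:00:00
```

A strict "greater than" comparison assumes a change is never committed with a timestamp older than the stored watermark. Sources that break that assumption need an overlap window or a change-data-capture tool such as AWS DMS.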
Topics
Integrating Redshift with Athena for Big Data Queries
Using Redshift ML to Run Machine Learning Inside the Data Warehouse
Performance Optimization and Data Transformation Techniques
Topics
Security Measures with AWS Identity and Access Management (IAM) and Role-Based Access Control
Managing Encryption and Security in Glue and Redshift
Utilizing AWS Marketplace Datasets and S3 for Advanced Analytics
Topics
AWS Storage Essentials: Files, Buckets, and Objects
Introduction to Amazon S3 (Simple Storage Service)
Configuring and Managing S3 Buckets
S3 Object Lifecycle Policies and Versioning
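Lifecycle rules are declared as configuration attached to a bucket. As an illustration, the function below builds a rule in the dictionary shape accepted by boto3's `put_bucket_lifecycle_configuration`. Objects under a prefix move to Standard-IA after 30 days and to Glacier after 90, and they expire after 365. On versioned buckets, old versions are removed 30 days after they become noncurrent. The prefix, bucket name and day counts are assumptions. The call that actually applies the rule is left commented out because it needs AWS credentials.

```python
import json


def build_lifecycle_rules(prefix: str) -> dict:
    """Build an S3 lifecycle configuration for objects under `prefix`."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.rstrip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Expire current object versions after a year
                # (on a versioned bucket this adds a delete marker).
                "Expiration": {"Days": 365},
                # On versioned buckets, delete old versions 30 days
                # after they become noncurrent.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    }


config = build_lifecycle_rules("logs/")
print(json.dumps(config, indent=2))

# Applying it requires boto3 and AWS credentials (bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-data-bucket", LifecycleConfiguration=config
# )
```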
Topics
Managing S3: Object Storage and Glacier for Archival
Utilizing AWS S3 Console and CLI for Efficient Storage Management
Folder (Prefix) and Object Operations in AWS S3
Best Practices for Organizing Data in S3
Topics
Implementing Security Measures in AWS S3
Access Control with S3 Bucket Policies, ACLs and IAM Roles
Encryption Options: SSE-S3 (S3-Managed Keys), SSE-KMS, and Client-Side Encryption
Compliance Features: HIPAA, PCI-DSS, and Data Sovereignty
Topics
Strategies for Database Migrations to AWS
Integrating Amazon RDS with S3
Utilizing AWS Data Pipeline for Data Movement and Transformation
Data Migration Tools and Techniques (e.g., AWS DMS)
Topics
Advanced Concepts in S3: Object Lock, Multi-Part Uploads, and Presigned URLs
Data Replication and Cross-Region Replication
Optimizing Storage Costs with S3 Intelligent-Tiering and Storage Classes
Leveraging S3 for Big Data Analytics
Topics
Fundamentals of AWS Kinesis (Data Streams, Firehose, and Analytics)
Developing Stream Analytics Jobs for Real-Time Insights
Integrating IoT Devices with AWS for Data Streaming
Processing and Analyzing Streaming Data
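Kinesis Data Streams routes each record by MD5-hashing its partition key into a 128-bit integer and sending the record to the shard whose hash-key range contains that value. The sketch below reproduces the mapping locally. It assumes shards that split the key space evenly, as is typical for a freshly created stream but not necessarily after resharding. `shard_for_key` is a hypothetical helper, not an AWS SDK call.

```python
import hashlib

HASH_KEY_SPACE = 2**128  # MD5 yields a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash-key range contains the key,
    assuming shard_count shards that split the key space evenly."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")  # 128-bit unsigned integer
    range_size = HASH_KEY_SPACE // shard_count
    # Clamp so any remainder at the top of the range goes to the last shard.
    return min(hash_key // range_size, shard_count - 1)


for key in ["sensor-1", "sensor-2", "sensor-3", "sensor-1"]:
    print(key, "->", shard_for_key(key, 4))
```

Because the mapping is deterministic, records that share a partition key always land on the same shard and keep their relative order. The same property explains why a low-cardinality partition key can overload a single "hot" shard.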
Topics
Understanding AWS Event Services: SNS, SQS, and Lambda
Configuring Kinesis with Lambda for Real-Time Processing
Patterns for Real-Time and Event-Driven Data Processing
Use Cases for Event-Driven Architectures
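When Lambda is attached to a Kinesis stream, it receives batches of records whose payloads are base64-encoded. The handler below is a minimal sketch that assumes JSON payloads. It decodes each record and reports records that fail to parse using the partial-batch response format, which only takes effect when `ReportBatchItemFailures` is enabled on the event source mapping. The "processing" step is a placeholder, and the test event is hand-built with made-up values.

```python
import base64
import json


def handler(event, context):
    """Sketch of a Lambda handler for a Kinesis event source mapping."""
    processed = []
    failures = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        try:
            # Kinesis delivers each payload base64-encoded; we assume JSON inside.
            payload = json.loads(base64.b64decode(kinesis["data"]))
        except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
            failures.append({"itemIdentifier": kinesis["sequenceNumber"]})
            continue
        processed.append(payload)  # placeholder for real work (enrich, store, ...)
    print(f"processed={len(processed)} failed={len(failures)}")
    # Honoured only when ReportBatchItemFailures is enabled on the mapping.
    return {"batchItemFailures": failures}


# Local test with a hand-built event (field values are made up).
def make_record(data: bytes, seq: str) -> dict:
    return {"kinesis": {"partitionKey": "sensor-1", "sequenceNumber": seq,
                        "data": base64.b64encode(data).decode("ascii")}}


event = {"Records": [make_record(b'{"temp": 21.5}', "1"), make_record(b"not json", "2")]}
print(handler(event, None))  # {'batchItemFailures': [{'itemIdentifier': '2'}]}
```

Lambda retries the batch starting from the failed record. For that reason, permanently malformed records are usually sent to an on-failure destination after a bounded number of retries, rather than retried indefinitely.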
Topics
Monitoring AWS Storage and Kinesis Resources with CloudWatch
Performance Tuning for AWS Data Services
Implementing Disaster Recovery and High Availability
Using AWS Config, CloudTrail, and GuardDuty for Security and Compliance
Topics
AWS Cloud Overview: Understanding SaaS, PaaS, IaaS in AWS
Introduction to AWS EMR (Elastic MapReduce): Configuration, Cluster Management
Spark on AWS EMR: Configurations, Node Types, and Resource Management
Using HDFS, S3, and Glue with EMR
Topics
Integrating Python with Spark: PySpark Basics
Data Loading Techniques: Using PySpark for Data Ingestion and Processing
Utilizing SQL in EMR: Creating and Managing Spark DataFrames and SQL Queries
Advanced Data Transformation: Working with Spark SQL for Data Analytics
Topics
Configuring AWS S3 for use with EMR
Data Management: Reading and Writing Data to S3 using PySpark and Scala
Secure Data Access: Managing Permissions and Security between EMR and S3
Topics
Understanding EMR Architecture: Master (Primary), Core, and Task Nodes; Spark RDDs and DAGs
Building and Monitoring EMR Jobs: Scheduling, Task Management, and Optimization
Implementing Best Practices for Reliable Data Lakes with Delta Lake Concepts
Topics
Machine Learning Fundamentals in EMR: Using MLlib and SageMaker for Predictive Modeling
Data Exploration and Visualization: Leveraging Notebooks for Insights
Advanced Analytic Techniques: Utilizing Scala and Python for Complex Data Analysis
Topics
EMR Security: Integrating with AWS IAM and VPCs
Managing Permissions: IAM Policies, Security Groups, and Data Security
Compliance and Data Governance: Best Practices in EMR Environments
Topics
Streaming Data with EMR: Concepts and Practical Applications
Integrating Kinesis and Redshift with EMR for Real-Time Analytics
Processing Live Data Streams: Building and Deploying Stream Analytics Solutions
Topics
Automating Workflows with AWS Step Functions and EMR
CI/CD for EMR: Automation and Version Control Integration
Deployment Strategies: Best Practices for Production Deployments in AWS
Topics
1. Introduction to Python
Overview of Python's history, key features, and comparison with other languages.
Setting up the Python environment and writing your first program.
2. Core Programming Concepts
Variables, data types, conditional statements, loops, control flow.
Introduction to strings, string manipulation, and basic functions.
Topics:
1. Deep Dive into Collections
Understanding lists, tuples, dictionaries, sets, and frozen sets.
Functions, methods, and comprehensions for collections.
2. Functional Programming in Python
Exploring function arguments, anonymous (lambda) functions, and higher-order functions (map, filter, and functools.reduce).
3. Object-Oriented Programming (OOP)
Classes, objects, constructors, destructors, inheritance, polymorphism.
Encapsulation, data hiding, magic methods, and operator overloading.
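Several of the ideas above fit into one small example: comprehensions over lists, dicts and sets; `map`, `filter` and `reduce` with lambdas; and a class that uses a "private-by-convention" attribute, a read-only property and magic methods for operator overloading. The `Money` class and the order data are illustrative assumptions. Amounts are kept as integers to avoid floating-point rounding.

```python
from functools import reduce


class Money:
    """Small value type showing encapsulation and operator overloading."""

    def __init__(self, amount: int, currency: str = "INR"):
        self._amount = amount        # leading underscore: internal by convention
        self._currency = currency

    @property
    def amount(self) -> int:         # read-only access to the hidden field
        return self._amount

    def __add__(self, other: "Money") -> "Money":   # enables m1 + m2
        if self._currency != other._currency:
            raise ValueError("currency mismatch")
        return Money(self._amount + other._amount, self._currency)

    def __eq__(self, other: object) -> bool:        # enables m1 == m2
        return (isinstance(other, Money)
                and (self._amount, self._currency) == (other._amount, other._currency))

    def __repr__(self) -> str:
        return f"Money({self._amount}, {self._currency!r})"


orders = [("pen", 20), ("book", 350), ("bag", 1200), ("pencil", 5)]

# Comprehensions for the main collection types
names = [name for name, _ in orders]                  # list
price_by_name = {name: price for name, price in orders}  # dict
cheap = {name for name, price in orders if price < 100}  # set

# map / filter / reduce with anonymous functions
prices = map(lambda item: Money(item[1]), orders)
big = filter(lambda m: m.amount >= 100, prices)
total = reduce(lambda a, b: a + b, big)               # calls Money.__add__

print(names, price_by_name["book"], sorted(cheap), total)
# ['pen', 'book', 'bag', 'pencil'] 350 ['pen', 'pencil'] Money(1550, 'INR')
```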
Topics:
1. Mastering Exception Handling
Exception handling mechanisms: try, except, else, and finally clauses; user-defined exceptions.
2. File Handling Essentials
Basics of file operations, handling Excel and CSV files.
3. Database Programming
Introduction to database connections and operations with MySQL.
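The sketch below combines exception handling with CSV file handling using only the standard library. It defines a user-defined exception and uses `try`/`except`/`else` to skip bad rows while still recording them. A `finally` clause guarantees that the temporary file is deleted. The file contents and the `InvalidRowError` name are illustrative assumptions. Excel files would need a third-party library, so they are not shown here.

```python
import csv
import os
import tempfile


class InvalidRowError(Exception):
    """User-defined exception for rows that fail validation."""

    def __init__(self, line_no: int, reason: str):
        super().__init__(f"line {line_no}: {reason}")
        self.line_no = line_no


def parse_row(line_no: int, row: dict) -> tuple[str, int]:
    try:
        qty = int(row["quantity"])
    except ValueError as exc:
        # Re-raise as our own exception, keeping the original as the cause.
        raise InvalidRowError(line_no, f"bad quantity {row['quantity']!r}") from exc
    return row["product"], qty


def load_quantities(path: str) -> tuple[dict[str, int], list[str]]:
    totals: dict[str, int] = {}
    errors: list[str] = []
    with open(path, newline="", encoding="utf-8") as f:
        # start=2 because line 1 holds the header
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            try:
                product, qty = parse_row(line_no, row)
            except InvalidRowError as err:
                errors.append(str(err))      # record the problem and keep going
            else:
                totals[product] = totals.get(product, 0) + qty
    return totals, errors


# Write a small sample CSV, read it back, and always clean up.
fd, path = tempfile.mkstemp(suffix=".csv")
try:
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["product", "quantity"])
        writer.writerows([["pen", "3"], ["book", "two"], ["pen", "4"]])
    totals, errors = load_quantities(path)
finally:
    os.remove(path)  # runs whether or not an exception was raised

print(totals)  # {'pen': 7}
print(errors)  # ["line 3: bad quantity 'two'"]
```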
Topics:
1. Getting Started with Flask
Setting up Flask, creating simple applications, routing, and middleware.
2. Exploring Django
Introduction to Django, its MVT (Model-View-Template) pattern (Django's variant of MVC), views, and URL mapping.
Topics:
1. Automation and Scripting
Enhancing file handling, database automation, and web scraping with BeautifulSoup.
2. GUI Development with Tkinter
Basics of Tkinter for developing desktop applications.
3. Version Control with Git
Managing projects with Git, understanding repository management, commits, merging, and basic Git commands.
Topics
Cloud Computing Fundamentals: Overview of cloud service models (IaaS, PaaS, SaaS) and deployment models (public, private, hybrid).
Basics of DevOps: Understanding the DevOps culture, practices, and its significance in cloud environments.
Data on the Cloud: Exploring cloud storage solutions, databases, and big data services provided by major cloud providers (AWS, Azure, Google Cloud).
Introduction to Infrastructure as Code (IaC): Concepts and tools for managing infrastructure through code.
Topics:
Cloud Storage Solutions: Differences between object storage, file storage, and block storage. Use cases for each.
Cloud Databases: Overview of relational and NoSQL database services in the cloud (e.g., AWS RDS, Azure SQL Database, Google Cloud Firestore).
Data Warehousing and Big Data Solutions: Introduction to cloud-based data warehousing services (e.g., Amazon Redshift, Google BigQuery, Azure Synapse Analytics).
Data Migration to Cloud: Strategies and tools for migrating data to cloud environments.
Topics:
Automated Data Pipelines: Designing and implementing automated data pipelines using cloud services.
Continuous Integration and Continuous Delivery (CI/CD) for Data: Applying CI/CD practices to data pipeline development, including version control, testing, and deployment strategies.
Monitoring and Logging: Tools and practices for monitoring cloud resources and data pipelines, understanding logs and metrics for troubleshooting.
Infrastructure as Code (IaC) for Data Systems: Using IaC tools (e.g., Terraform, CloudFormation) to provision and manage cloud data infrastructure.
Topics:
Serverless Data Processing: Leveraging serverless architectures for data processing tasks (e.g., AWS Lambda, Azure Functions).
Containerization and Data Services: Using containers (e.g., Docker, Kubernetes) for deploying and scaling data applications and services in the cloud.
Machine Learning and AI in the Cloud: Introduction to cloud-based machine learning services and integrating AI capabilities into data pipelines.
Data Analytics and Visualization: Tools and services for analyzing and visualizing data directly in the cloud (e.g., Amazon QuickSight, Looker Studio, formerly Google Data Studio, and Microsoft Power BI).
Topics:
Introduction to Databases and SQL: Understanding relational databases and the role of SQL.
SQL Syntax Overview: Keywords, statements, and clauses.
Basic SQL Commands: SELECT, FROM, WHERE, and ORDER BY.
Filtering Data: Using conditions to retrieve specific data (AND, OR, NOT).
Topics:
Understanding Table Relationships: Primary keys, foreign keys, and the importance of relationships in databases.
Join Operations: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
Subqueries and Nested Queries: Using subqueries in the SELECT, FROM, and WHERE clauses.
Aggregating Data: Using GROUP BY and aggregate functions (COUNT, SUM, AVG, MIN, MAX).
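The join, subquery and aggregation topics can be tried without a database server by using Python's built-in `sqlite3` module with an in-memory database. The two tables and their rows below are made up for illustration. The first query uses a LEFT JOIN with GROUP BY so that customers without orders still appear. The second uses a nested subquery to compare each customer's total against the average order amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),  -- foreign key
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# LEFT JOIN keeps customers with no orders; GROUP BY aggregates per customer.
per_customer = conn.execute("""
    SELECT c.name, COUNT(o.order_id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY total DESC
""").fetchall()

# Nested subqueries: customers whose total spend exceeds the average order amount.
above_avg = conn.execute("""
    SELECT name FROM customers
    WHERE customer_id IN (
        SELECT customer_id FROM orders
        GROUP BY customer_id
        HAVING SUM(amount) > (SELECT AVG(amount) FROM orders)
    )
""").fetchall()

print(per_customer)  # [('Asha', 2, 350.0), ('Ravi', 1, 75.0), ('Meera', 0, 0)]
print(above_avg)     # [('Asha',)]
```

Replacing the LEFT JOIN with an INNER JOIN would drop Meera from the first result, because she has no matching rows in `orders`.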
Topics:
Data Manipulation Commands: INSERT, UPDATE, DELETE.
Managing Tables: Creating and altering tables (CREATE TABLE, ALTER TABLE, DROP TABLE).
Advanced Filtering Techniques: Using LIKE, IN, BETWEEN, and wildcard characters.
Working with Dates and Times: Understanding and manipulating date and time data.
Topics:
Advanced SQL Functions: String functions, mathematical functions, and date functions.
Window Functions: Overview of ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, and their applications.
Query Performance Optimization: Indexes, query planning, and execution paths.
Common Table Expressions (CTEs): Writing cleaner and more readable queries with WITH clause.
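The next example shows how ROW_NUMBER, RANK and DENSE_RANK differ on tied values, and how LAG compares each month with the previous one. The window calculations are wrapped in a CTE so the outer query stays readable. It again uses an in-memory SQLite database, which needs SQLite 3.25 or newer for window functions. The sales table is made-up data in which two North months deliberately tie at 130. LEAD works like LAG but looks forward instead of back.

```python
import sqlite3  # window functions require SQLite 3.25+

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, month TEXT, revenue INTEGER);
INSERT INTO sales VALUES
    ('North', '2023-01', 100), ('North', '2023-02', 130), ('North', '2023-03', 130),
    ('South', '2023-01',  90), ('South', '2023-02',  60);
""")

rows = conn.execute("""
    WITH ranked AS (   -- CTE: name the intermediate result once
        SELECT region, month, revenue,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC, month) AS row_num,
               RANK()       OVER (PARTITION BY region ORDER BY revenue DESC) AS rnk,
               DENSE_RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS dense_rnk,
               LAG(revenue) OVER (PARTITION BY region ORDER BY month) AS prev_revenue
        FROM sales
    )
    SELECT region, month, revenue, row_num, rnk, dense_rnk,
           revenue - prev_revenue AS month_over_month  -- NULL for the first month
    FROM ranked
    ORDER BY region, month
""").fetchall()

for row in rows:
    print(row)
# ('North', '2023-01', 100, 3, 3, 2, None)
# ('North', '2023-02', 130, 1, 1, 1, 30)
# ('North', '2023-03', 130, 2, 1, 1, 0)
# ('South', '2023-01', 90, 1, 1, 1, None)
# ('South', '2023-02', 60, 2, 2, 2, -30)
```

For the tied North months, ROW_NUMBER still assigns distinct numbers (the extra `month` sort key makes them deterministic). RANK gives both a 1 and then skips to 3, while DENSE_RANK continues with 2.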
Topics:
Analytical SQL for Reporting: Building complex queries to answer analytical questions.
Pivoting Data: Transforming rows to columns (PIVOT) and columns to rows (UNPIVOT).
Data Warehousing Concepts: Introduction to data warehousing practices and how they apply to SQL querying.
Integrating SQL with Data Analysis Tools: Connecting SQL databases with tools like Excel, Power BI, and Python for deeper data analysis.
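Pivoting can also be practiced in SQLite, which is used here so the example runs anywhere. SQLite has no PIVOT/UNPIVOT keywords, so this sketch uses the portable equivalents. Conditional aggregation (SUM over CASE) turns rows into columns, and UNION ALL turns columns back into rows. Engines such as SQL Server and Oracle also offer native PIVOT and UNPIVOT syntax. The quarterly sales data and the `wide` view name are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, quarter TEXT, revenue INTEGER);
INSERT INTO sales VALUES
    ('North', 'Q1', 100), ('North', 'Q2', 150),
    ('South', 'Q1',  80), ('South', 'Q2',  95);

-- Rows -> columns: one output column per quarter (conditional aggregation).
CREATE VIEW wide AS
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN revenue END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN revenue END) AS q2
    FROM sales
    GROUP BY region;
""")

pivoted = conn.execute("SELECT region, q1, q2 FROM wide ORDER BY region").fetchall()

# Columns -> rows: each quarter column becomes its own set of rows again.
unpivoted = conn.execute("""
    SELECT region, 'Q1' AS quarter, q1 AS revenue FROM wide
    UNION ALL
    SELECT region, 'Q2', q2 FROM wide
    ORDER BY region, quarter
""").fetchall()

print(pivoted)    # [('North', 100, 150), ('South', 80, 95)]
print(unpivoted)  # [('North', 'Q1', 100), ('North', 'Q2', 150), ('South', 'Q1', 80), ('South', 'Q2', 95)]
```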
25th Sept 2023, Monday, 8 AM (IST): 1 to 1.5 hrs per session
27th Sept 2023, Wednesday, 10 AM (IST): 1 to 1.5 hrs per session
29th Sept 2023, Friday, 12 PM (IST): 1 to 1.5 hrs per session