Learn from Curated Curriculums developed by Industry Experts
Topics
What is Data Engineering
Data Engineer Roles & Responsibilities
Difference Between ETL Developer & Data Engineer
Types of Data
Streaming vs. Batch Data
Topics
Cloud Introduction and Azure Basics
Azure Implementation Models: IaaS, PaaS, SaaS
Overview of Azure Data Engineer Role
Understanding Azure Storage Components
Introduction to Azure ETL & Streaming Components
Topics
Azure SQL Server and Database Deployment
DTU vs. DWU: Understanding Performance Levels
Managing Firewall Rules and Secure SSMS Connections
Azure Account and Subscription Management
Topics
Azure Resources and Resource Types
Introduction to Azure Data Factory (ADF) and Azure Synapse Analytics
Basic Concepts of Data Movement and Processing
Topics
Synapse SQL Pools (Data Warehousing) and Massively Parallel Processing (MPP)
Data Movement with DMS and SQL Pool Management
Table Creations, Distributions, and Indexing for Performance
Topics
Azure Data Factory Pipeline Architecture and Integration Runtime
Constructing ETL Pipelines with DIU Considerations
Data Flow Activities and Monitoring
Topics
Incremental Data Loading and Handling On-Premises Data Sources
Advanced ADF Features: Data Flows, ETL Logging, and Performance Tuning
Implementing CDC with ADF for Real-Time Data Capture
Topics
Integrating Spark with Synapse Analytics for Big Data Processing
Utilizing Python Notebooks and Spark Pools for Data Analysis
Performance Optimization and Data Transformation Techniques
Topics
Security Measures with Azure Active Directory and Role-Based Access Control
Managing Parameters and Security in Synapse and ADF Pipelines
Utilizing Azure Open Datasets and Parquet Files for Advanced Analytics
Azure Storage Essentials: Files, Tables, and Queues
Introduction to Azure Data Lake Storage Gen2 (ADLS Gen2)
Configuring and Managing Storage Accounts
Hierarchical Namespace (HNS) and its Advantages
Managing Blob Storage: Binary Large Objects Explained
Utilizing Azure Storage Explorer for Efficient Storage Management
Directory and File Operations in Azure Data Lake
Best Practices for Organizing Data in ADLS Gen2
Implementing Security Measures in Azure Data Lake Storage
Access Control with Shared Access Signatures (SAS) and Access Control Lists (ACLs)
Role-Based Access Control (RBAC) in Azure Storage
Encryption, Authentication, and Compliance Features
Strategies for SQL Database Migrations to Azure
Integrating Azure SQL with Data Lake Storage
Utilizing Azure Data Factory for Data Movement and Transformation
Data Migration Tools and Techniques
Advanced Concepts in Azure Table Storage
Data Replication and Geo-Redundancy Options
Optimizing Storage Costs and Performance
Leveraging Data Lake for Big Data Analytics
Fundamentals of Azure Stream Analytics
Developing Stream Analytics Jobs for Real-Time Insights
Integrating IoT Devices with Azure for Data Streaming
Processing and Analyzing Streaming Data
Understanding Azure Event Hubs for Large-Scale Event Processing
Configuring Event Hubs and Event Hub Namespaces
Connecting Event Hubs with Azure Stream Analytics
Patterns for Real-Time and Event-Driven Data Processing
Monitoring Azure Storage and Stream Analytics Resources
Performance Tuning for Azure Data Services
Implementing Disaster Recovery Strategies
Using Azure Monitor and Key Vaults for Operational Excellence
Azure Cloud Overview: Understanding SaaS, PaaS, IaaS
Introduction to Azure Databricks: Configuration, Compute Resources, and Workspace Usage
Spark Clusters in Azure Databricks: Configurations, Types, and Resource Management
Databricks File System (DBFS): Utilizing Files and Tables with Spark
Integrating Python with Spark: PySpark Basics
Data Loading Techniques: Using PySpark for Data Ingestion and Processing
Utilizing SQL in Databricks: Creating and Managing Spark Databases and Tables
Advanced Data Transformation: Working with DataFrames and Spark SQL for Data Analytics
Configuring Azure Data Lake Storage (ADLS) for use with Databricks
Data Management: Reading and Writing Data to ADLS using PySpark and Scala
Secure Data Access: Managing Access and Security between Databricks and ADLS
Understanding Databricks Architecture: Driver and Worker Nodes, RDDs, and DAGs
Building and Monitoring Databricks Jobs: Scheduling, Task Management, and Optimization
Implementing Delta Lake for Reliable Data Lakes: ACID Transactions and Performance Tuning
Machine Learning Fundamentals in Databricks: Using MLlib for Predictive Modeling
Data Exploration and Visualization: Leveraging Notebooks for Insights
Advanced Analytic Techniques: Utilizing Scala and Python for Complex Data Analysis
Databricks Security: Integrating with Azure Active Directory (AD)
Managing Permissions: Workspace, Notebooks, and Data Security
Compliance and Data Governance: Best Practices in Databricks Environments
Streaming Data with Databricks: Concepts and Practical Applications
Integrating Azure Event Hubs with Databricks for Real-Time Analytics
Processing Live Data Streams: Building and Deploying Stream Analytics Solutions
Automating Workflows with Azure Logic Apps and Databricks
CI/CD for Databricks: Automation and Version Control Integration
Deployment Strategies: Best Practices for Production Deployments in Azure
Topics
1. Introduction to Python
Overview of Python's history, key features, and comparison with other languages.
Setting up the Python environment, writing your first program.
2. Core Programming Concepts
Variables, data types, conditional statements, loops, control flow.
Introduction to strings, string manipulation, and basic functions.
Topics:
1. Deep Dive into Collections
Understanding lists, tuples, dictionaries, sets, and frozensets.
Functions, methods, and comprehensions for collections.
2. Functional Programming in Python
Exploring function arguments, anonymous functions, and special functions (map, reduce, filter).
3. Object-Oriented Programming (OOP)
Classes, objects, constructors, destructors, inheritance, polymorphism.
Encapsulation, data hiding, magic methods, and operator overloading.
Topics:
1. Mastering Exception Handling
Exception handling mechanisms: try, except, and finally clauses; user-defined exceptions.
2. File Handling Essentials
Basics of file operations, handling Excel and CSV files.
3. Database Programming
Introduction to database connections and operations with MySQL.
Topics:
1. Getting Started with Flask
Setting up Flask, creating simple applications, routing, and middleware.
2. Exploring Django
Introduction to Django, its MVT (Model-View-Template) take on the MVC model, views, URL mapping.
Topics:
1. Automation and Scripting
Enhancing file handling, database automation, and web scraping with BeautifulSoup.
2. GUI Development with Tkinter
Basics of Tkinter for developing desktop applications.
3. Version Control with Git
Managing projects with Git, understanding repository management, commits, merging, and basic Git commands.
Topics
Cloud Computing Fundamentals: Overview of cloud service models (IaaS, PaaS, SaaS) and deployment models (public, private, hybrid).
Basics of DevOps: Understanding the DevOps culture, practices, and its significance in cloud environments.
Data on the Cloud: Exploring cloud storage solutions, databases, and big data services provided by major cloud providers (AWS, Azure, Google Cloud).
Introduction to Infrastructure as Code (IaC): Concepts and tools for managing infrastructure through code.
Topics:
Cloud Storage Solutions: Differences between object storage, file storage, and block storage. Use cases for each.
Cloud Databases: Overview of relational and NoSQL database services in the cloud (e.g., AWS RDS, Azure SQL Database, Google Cloud Firestore).
Data Warehousing and Big Data Solutions: Introduction to cloud-based data warehousing services (e.g., Amazon Redshift, Google BigQuery, Azure Synapse Analytics).
Data Migration to Cloud: Strategies and tools for migrating data to cloud environments.
Topics:
Automated Data Pipelines: Designing and implementing automated data pipelines using cloud services.
Continuous Integration and Continuous Delivery (CI/CD) for Data: Applying CI/CD practices to data pipeline development, including version control, testing, and deployment strategies.
Monitoring and Logging: Tools and practices for monitoring cloud resources and data pipelines, understanding logs and metrics for troubleshooting.
Infrastructure as Code (IaC) for Data Systems: Using IaC tools (e.g., Terraform, CloudFormation) to provision and manage cloud data infrastructure.
Topics:
Serverless Data Processing: Leveraging serverless architectures for data processing tasks (e.g., AWS Lambda, Azure Functions).
Containerization and Data Services: Using containers (e.g., Docker, Kubernetes) for deploying and scaling data applications and services in the cloud.
Machine Learning and AI in the Cloud: Introduction to cloud-based machine learning services and integrating AI capabilities into data pipelines.
Data Analytics and Visualization: Tools and services for analyzing and visualizing data directly in the cloud (e.g., Amazon QuickSight, Google Data Studio, Power BI on Azure).
Topics:
Introduction to Databases and SQL: Understanding relational databases and the role of SQL.
SQL Syntax Overview: Keywords, statements, and clauses.
Basic SQL Commands: SELECT, FROM, WHERE, and ORDER BY.
Filtering Data: Using conditions to retrieve specific data (AND, OR, NOT).
Topics:
Understanding Table Relationships: Primary keys, foreign keys, and the importance of relationships in databases.
Join Operations: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
Subqueries and Nested Queries: Using subqueries in the SELECT, FROM, and WHERE clauses.
Aggregating Data: Using GROUP BY and aggregate functions (COUNT, SUM, AVG, MIN, MAX).
Topics:
Data Manipulation Commands: INSERT, UPDATE, DELETE.
Managing Tables: Creating and altering tables (CREATE TABLE, ALTER TABLE, DROP TABLE).
Advanced Filtering Techniques: Using LIKE, IN, BETWEEN, and wildcard characters.
Working with Dates and Times: Understanding and manipulating date and time data.
Topics:
Advanced SQL Functions: String functions, mathematical functions, and date functions.
Window Functions: Overview of ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, and their applications.
Query Performance Optimization: Indexes, query planning, and execution paths.
Common Table Expressions (CTEs): Writing cleaner and more readable queries with the WITH clause.
Topics:
Analytical SQL for Reporting: Building complex queries to answer analytical questions.
Pivoting Data: Transforming rows to columns (PIVOT) and columns to rows (UNPIVOT).
Data Warehousing Concepts: Introduction to data warehousing practices and how they apply to SQL querying.
Integrating SQL with Data Analysis Tools: Connecting SQL databases with tools like Excel, Power BI, and Python for deeper data analysis.
25th Sept 2023, Monday, 8 AM (IST), 1 hr to 1 hr 30 min per session
27th Sept 2023, Wednesday, 10 AM (IST), 1 hr to 1 hr 30 min per session
29th Sept 2023, Friday, 12 PM (IST), 1 hr to 1 hr 30 min per session