DP-203: Data Engineering on Microsoft Azure


Launch your career in data engineering. Master designing and implementing data solutions that use Microsoft Azure data services.

This Professional Certificate is intended for data engineers and developers who want to demonstrate their expertise in designing and implementing data solutions that use Microsoft Azure data services, as well as for anyone preparing for Exam DP-203: Data Engineering on Microsoft Azure. You will learn how to integrate, transform, and consolidate data from various structured and unstructured data systems into structures suitable for building analytics solutions on Microsoft Azure data services. The program consists of 10 courses, each covering the concepts and skills measured by the exam, and by the end of the Professional Certificate you will be ready to sign up for and take Exam DP-203: Data Engineering on Microsoft Azure.

Applied Learning Project

Learners will engage in interactive exercises throughout this program that offer opportunities to practice and apply what they are learning. They use the Microsoft Learn Sandbox, a free environment that lets them explore Microsoft Azure and get hands-on experience with live Azure resources and services.

Skills measured on the Microsoft Azure DP-203 exam

Design and implement data storage (40–45%)

Design and develop data processing (25–30%)

Design and implement data security (10–15%)

Monitor and optimize data storage and data processing (10–15%)

The exam measures your ability to accomplish the following technical tasks: design and implement data storage; design and develop data processing; design and implement data security; and monitor and optimize data storage and data processing.

Functional groups

Design and implement data storage (40–45%)

Design a data storage structure

Design an Azure Data Lake solution

Recommend file types for storage

Recommend file types for analytical queries

Design for efficient querying

Design for data pruning

Design a folder structure that represents the levels of data transformation

Design a distribution strategy

Design a data archiving solution

Design a partition strategy

Design a partition strategy for files

Design a partition strategy for analytical workloads

Design a partition strategy for efficiency/performance

Design a partition strategy for Azure Synapse Analytics

Identify when partitioning is needed in Azure Data Lake Storage Gen2
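
As a rough illustration of the file-partitioning tasks listed above, the following PySpark sketch writes date-partitioned, compressed Parquet to a Data Lake Storage Gen2 path. The storage account, container, and folder names are placeholders, and the Spark session is assumed to already hold credentials for the account.

```python
# Minimal sketch: date-partitioned, compressed Parquet in ADLS Gen2 with PySpark.
# Account, container, and folder names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("partitioning-sketch").getOrCreate()

# Example source data; in practice this comes from an ingestion step.
sales = spark.createDataFrame(
    [("2024-01-15", "EU", 100.0), ("2024-01-16", "US", 250.0)],
    ["order_date", "region", "amount"],
).withColumn("order_date", F.to_date("order_date"))

# Derive partition columns so queries can prune by year/month.
sales = (sales
         .withColumn("year", F.year("order_date"))
         .withColumn("month", F.month("order_date")))

target = "abfss://curated@mydatalake.dfs.core.windows.net/sales"  # placeholder path

(sales.write
      .mode("append")
      .option("compression", "snappy")   # columnar compression for analytical queries
      .partitionBy("year", "month")      # folder-per-partition layout in the lake
      .parquet(target))
```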

Design the serving layer

Design star schemas

Design slowly changing dimensions

Design a dimensional hierarchy

Design a solution for temporal data

Design for incremental loading

Design analytical stores

Design metastores in Azure Synapse Analytics and Azure Databricks

Implement physical data storage structures

Implement compression

Implement partitioning

Implement sharding

Implement different table geometries with Azure Synapse Analytics pools

Implement data redundancy

Implement distributions

Implement data archiving

Implement logical data structures

Build a temporal data solution

Build a slowly changing dimension

Build a logical folder structure

Build external tables

Implement file and folder structures for efficient querying and data pruning

Implement the serving layer

Deliver data in a relational star schema

Deliver data in Parquet files

Maintain metadata

Implement a dimensional hierarchy
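
One way to picture the serving-layer items above is the PySpark sketch below: it builds a small star schema (a product dimension with generated surrogate keys plus a fact table) and delivers both as Parquet. Table names, columns, and the output path are illustrative only.

```python
# Minimal sketch: deliver a star schema (dimension + fact) as Parquet files.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("serving-layer-sketch").getOrCreate()

orders = spark.createDataFrame(
    [("o1", "Bike", 120.0), ("o2", "Helmet", 35.0), ("o3", "Bike", 120.0)],
    ["order_id", "product_name", "amount"],
)

# Dimension: one row per product with a generated surrogate key.
dim_product = (orders.select("product_name").distinct()
               .withColumn("product_sk",
                           F.row_number().over(Window.orderBy("product_name"))))

# Fact: replace the natural key with the surrogate key.
fact_sales = (orders.join(dim_product, "product_name")
                    .select("order_id", "product_sk", "amount"))

base = "/tmp/serving"  # placeholder; typically an ADLS Gen2 path
dim_product.write.mode("overwrite").parquet(f"{base}/dim_product")
fact_sales.write.mode("overwrite").parquet(f"{base}/fact_sales")
```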

Design and develop data processing (25–30%)

Ingest and transform data

Transform data by using Apache Spark

Transform data by using Transact-SQL

Transform data by using Data Factory

Transform data by using Azure Synapse Pipelines

Transform data by using Stream Analytics

Cleanse data

Split data

Shred JSON

Encode and decode data

Configure error handling for the transformation

Normalize and denormalize values

Transform data by using Scala

Perform data exploratory analysis
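
To make the transformation tasks above concrete, here is a minimal PySpark sketch that shreds a JSON payload, cleanses and fills missing values, and removes duplicates. The sample records and column names are invented for illustration.

```python
# Minimal sketch: shred JSON, cleanse nulls, and deduplicate with PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("transform-sketch").getOrCreate()

raw = spark.createDataFrame(
    [('{"id": "1", "name": " Ada ", "amount": 10.5}',),
     ('{"id": "1", "name": " Ada ", "amount": 10.5}',),   # duplicate record
     ('{"id": "2", "name": null, "amount": 7.0}',)],
    ["body"],
)

schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
    StructField("amount", DoubleType()),
])

cleaned = (raw
    .withColumn("data", F.from_json("body", schema))   # shred JSON into columns
    .select("data.*")
    .withColumn("name", F.trim(F.col("name")))         # cleanse stray whitespace
    .na.fill({"name": "unknown"})                       # handle missing values
    .dropDuplicates(["id"]))                            # handle duplicate data

cleaned.show()
```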

Design and develop a batch processing solution

Develop batch processing solutions by using Data Factory, Data Lake, Spark, Azure Synapse Pipelines, PolyBase, and Azure Databricks

Create data pipelines

Design and implement incremental data loads

Design and develop slowly changing dimensions

Handle security and compliance requirements

Scale resources

Configure the batch size

Design and create tests for data pipelines

Integrate Jupyter/Python notebooks into a data pipeline

Handle duplicate data

Handle missing data

Handle late-arriving data

Upsert data

Regress to a previous state

Design and configure exception handling

Configure batch retention

Design a batch processing solution

Debug Spark jobs by using the Spark UI
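
As a hedged sketch of the incremental-load and upsert items above, the example below merges a new batch into an existing table, assuming Delta Lake is available (as it is on Azure Databricks and Synapse Spark pools). The table path and key column are placeholders.

```python
# Minimal sketch: incremental batch load with upsert semantics via Delta Lake MERGE.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("upsert-sketch").getOrCreate()

target_path = "/tmp/delta/customers"  # placeholder; typically an ADLS Gen2 path

# Initial batch load.
initial = spark.createDataFrame([(1, "Ada"), (2, "Grace")], ["id", "name"])
initial.write.format("delta").mode("overwrite").save(target_path)

# Incremental batch: one updated row and one new row.
updates = spark.createDataFrame([(2, "Grace H."), (3, "Alan")], ["id", "name"])

(DeltaTable.forPath(spark, target_path).alias("t")
    .merge(updates.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()      # update rows whose keys already exist
    .whenNotMatchedInsertAll()   # insert rows with new keys
    .execute())
```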

Design and develop a stream processing solution

Develop a stream processing solution by using Stream Analytics, Azure Databricks, and Azure Event Hubs

Process data by using Spark structured streaming

Monitor for performance and functional regressions

Design and create windowed aggregates

Handle schema drift

Process time series data

Process across partitions

Process within one partition

Configure checkpoints/watermarking during processing

Scale resources

Design and create tests for data pipelines

Optimize pipelines for analytical or transactional purposes

Handle interruptions

Design and configure exception handling

Upsert data

Replay archived stream data

Design a stream processing solution
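
A minimal Structured Streaming sketch of the windowed-aggregate, watermarking, and checkpointing items above is shown below. The built-in rate source stands in for a real stream such as Azure Event Hubs (read through its Spark connector in practice), and the window and watermark durations are arbitrary examples.

```python
# Minimal sketch: windowed aggregate with watermark and checkpointing.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# 1-minute tumbling windows; events arriving more than 5 minutes late are dropped.
counts = (events
    .withWatermark("timestamp", "5 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .count())

query = (counts.writeStream
    .outputMode("update")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/rate-demo")  # recovery/replay point
    .start())

# query.awaitTermination()  # uncomment to run until stopped
```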

Manage batches and pipelines

Trigger batches

Handle failed batch loads

Validate batch loads

Manage data pipelines in Data Factory/Synapse Pipelines

Schedule data pipelines in Data Factory/Synapse Pipelines

Implement version control for pipeline artifacts

Manage Spark jobs in a pipeline

Design and implement data security (10–15%)

Design security for data policies and standards

Design data encryption for data at rest and in transit

Design a data auditing strategy

Design a data masking strategy

Design for data privacy

Design a data retention policy

Design to purge data based on business requirements

Design Azure role-based access control (Azure RBAC) and POSIX-like Access Control List (ACL) for Data Lake Storage Gen2

Design row-level and column-level security

Implement data security

Implement data masking

Encrypt data at rest and in motion

Implement row-level and column-level security

Implement Azure RBAC

Implement POSIX-like ACLs for Data Lake Storage Gen2

Implement a data retention policy

Implement a data auditing strategy

Manage identities, keys, and secrets across different data platform technologies

Implement secure endpoints (private and public)

Implement resource tokens in Azure Databricks

Load a DataFrame with sensitive information

Write encrypted data to tables or Parquet files

Manage sensitive information
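
To illustrate the masking-related items above, this PySpark sketch hashes one sensitive column and partially redacts another before writing. It shows masking logic in code only; platform features such as Dynamic Data Masking, Azure RBAC, and ACLs are configured on the services themselves.

```python
# Minimal sketch: column-level masking of sensitive values before delivery.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("masking-sketch").getOrCreate()

people = spark.createDataFrame(
    [("ada@example.com", "111-22-3333", "EU")],
    ["email", "ssn", "region"],
)

masked = (people
    .withColumn("email", F.sha2(F.col("email"), 256))                              # pseudonymize with a hash
    .withColumn("ssn", F.concat(F.lit("***-**-"), F.col("ssn").substr(-4, 4))))    # partial redaction

masked.write.mode("overwrite").parquet("/tmp/masked_people")  # placeholder output path
```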

Monitor and optimize data storage and data processing (10–15%)

Monitor data storage and data processing

Implement logging used by Azure Monitor

Configure monitoring services

Measure performance of data movement

Monitor and update statistics about data across a system

Monitor data pipeline performance

Measure query performance

Monitor cluster performance

Understand custom logging options

Schedule and monitor pipeline tests

Interpret Azure Monitor metrics and logs

Interpret a Spark directed acyclic graph (DAG)
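
A small sketch of ad-hoc monitoring from a notebook follows: Python logging as a custom logging option, wall-clock timing to measure query performance, and explain() to inspect the physical plan behind the Spark DAG. In production, diagnostic settings would normally route such signals into Azure Monitor and Log Analytics rather than notebook-level logging.

```python
# Minimal sketch: custom logging plus simple query-performance measurement.
import logging
import time

from pyspark.sql import SparkSession

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline-metrics")

spark = SparkSession.builder.appName("monitoring-sketch").getOrCreate()

df = spark.range(10_000_000).selectExpr("id % 100 AS key", "id AS value")

start = time.perf_counter()
result = df.groupBy("key").count().collect()      # force execution so the timing is real
elapsed = time.perf_counter() - start

log.info("aggregation returned %d rows in %.2f s", len(result), elapsed)
df.groupBy("key").count().explain()               # inspect the physical plan / DAG stages
```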

Optimize and troubleshoot data storage and data processing

Compact small files

Rewrite user-defined functions (UDFs)

Handle skew in data

Handle data spill

Tune shuffle partitions

Find shuffling in a pipeline

Optimize resource management

Tune queries by using indexers

Tune queries by using cache

Optimize pipelines for analytical or transactional purposes

Optimize pipeline for descriptive versus analytical workloads

Troubleshoot a failed Spark job

Troubleshoot a failed pipeline run
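
Finally, a hedged PySpark sketch of three optimizations from the list above: tuning shuffle partitions, broadcasting a small dimension table so a skewed join avoids a large shuffle, and compacting many small files into fewer larger ones. Paths, sizes, and partition counts are illustrative.

```python
# Minimal sketch: shuffle tuning, broadcast join for skew, and small-file compaction.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("optimize-sketch").getOrCreate()

# Tune shuffle partitions to match the data volume and cluster size.
spark.conf.set("spark.sql.shuffle.partitions", "64")

fact = spark.range(1_000_000).selectExpr("id % 10 AS product_sk", "id AS amount")
dim = spark.createDataFrame([(i, f"product_{i}") for i in range(10)],
                            ["product_sk", "product_name"])

# Broadcast the small dimension so the join avoids shuffling the large fact table.
joined = fact.join(F.broadcast(dim), "product_sk")
joined.count()

# Compact small files: rewrite a folder of many small Parquet files into fewer files.
source = "/tmp/landing/events"   # placeholder input containing many small files
target = "/tmp/curated/events"   # placeholder compacted output
spark.read.parquet(source).coalesce(8).write.mode("overwrite").parquet(target)
```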