Azure Data Fundamentals DP-900: Exercises and Questions, Exams of Computer Science

A collection of exercises and questions for Azure Data Fundamentals (DP-900). It covers various aspects of Azure data services, including data storage, processing, and analytics. Useful for students preparing for the DP-900 certification exam, or for anyone seeking to deepen their understanding of Azure data solutions.

Typology: Exams

2024/2025

Available from 01/23/2025

Uploaded by Shantelle



Azure Data Fundamentals DP-900

Which service can you use to perpetually retrieve data from a Kafka queue, process the data, and write the data to Azure Data Lake? - ✔ ✔ Azure Stream Analytics

Which three services can be used to ingest data for stream processing? - ✔ ✔ Azure Data Lake Storage. Azure Event Hubs. Azure IoT Hub.

Apache Spark in Azure - ✔ ✔ A distributed processing framework, available through:

Azure Synapse Analytics

Azure Databricks

Azure HDInsight

Delta Lake - ✔ ✔ Delta Lake is an open-source storage layer that adds support for transactional consistency, schema enforcement, and other common data warehousing features to data lake storage. It also unifies storage for streaming and batch data, and can be used in Spark to define relational tables for both batch and stream processing. When used for stream processing, a Delta Lake table can serve as a streaming source for queries against real-time data, or as a sink to which a stream of data is written.

The Spark runtimes in Azure Synapse Analytics and Azure Databricks include support for Delta Lake.

Delta Lake combined with Spark Structured Streaming is a good solution when you need to abstract batch and stream processed data in a data lake behind a relational schema for SQL-based querying and analysis.

Stream processing technologies in Azure - ✔ ✔ Azure Stream Analytics: a PaaS solution used to define streaming jobs.

Spark Structured Streaming: an open-source library used to develop streaming solutions on Apache Spark-based services, including Azure Synapse Analytics, Azure Databricks, and Azure HDInsight.

Azure Data Explorer: a high-performance database and analytics service optimized for ingesting and querying batch or streaming data with a time-series element.
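What these streaming engines do can be made concrete with a toy tumbling-window aggregation. This is plain Python, not the Stream Analytics or Spark API; the function name and event shape are invented for illustration:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping
    (tumbling) windows and count events per window -- the kind of
    aggregation a streaming job performs continuously over a live
    source such as Event Hubs or Kafka."""
    counts = defaultdict(int)
    for ts, _value in events:
        # Each event belongs to the window containing its timestamp.
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

# Three events fall in the first 10-second window, one in the next.
events = [(1, "a"), (4, "b"), (9, "c"), (12, "d")]
print(tumbling_window_counts(events, 10))  # {0: 3, 10: 1}
```

A real streaming engine runs the same logic incrementally as events arrive, rather than over a finished list.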

Sources for stream processing (ingestion) - ✔ ✔ Azure Event Hubs

Azure IoT Hub

Azure Data Lake Store Gen2

Apache Kafka (used with Apache Spark)

Sinks (output) - ✔ ✔ Azure Event Hubs

Azure Data Lake Store Gen2 or Azure Blob Storage

Azure SQL Database, Azure Synapse Analytics, or Azure Databricks

Power BI

You need to aggregate and store multiple JSON files that contain records for sales transactions. The solution must minimize the development effort.

ORC - ✔ ✔ Optimized Row Columnar format. Organizes data into columns rather than rows.

Parquet - ✔ ✔ Columnar data format. Contains row groups.
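The difference between row-oriented storage and columnar formats like ORC and Parquet can be sketched in plain Python. The sample records are invented, and real readers (e.g. pyarrow) handle this layout on disk:

```python
# Row-oriented storage keeps whole records together; columnar formats
# like Parquet and ORC keep each column's values together, so a query
# that touches one column reads far less data.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 25.0},
    {"id": 3, "region": "east", "amount": 5.0},
]

# Column-oriented view of the same data:
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Summing one column only needs that column's contiguous values.
total = sum(columns["amount"])
print(columns["region"])  # ['east', 'west', 'east']
print(total)              # 40.0
```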

Which type of Azure Storage is used for VHDs and is optimized for random read and write operations? - ✔ ✔ Page blob.

Page blobs are optimized for random access and used for VHDs. Append blobs cannot be updated. Block blobs are not used for VHDs.

Blobs - ✔ ✔ Binary large objects for massive amounts of unstructured data.

Block blobs - ✔ ✔ A set of blocks. Each block can vary in size up to 100 MB. A block blob can contain up to 50,000 blocks, giving a maximum size of over 4.7 TB. The block is the smallest amount of data that can be read or written as an individual unit. Block blobs are best used to store discrete, large, binary objects that change infrequently.

Page blobs - ✔ ✔ A page blob is organized as a collection of fixed-size 512-byte pages. Optimized to support random read and write operations; it can fetch and store data for a single page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to implement virtual disk storage for virtual machines.
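The size limits above check out with a little arithmetic (binary units are assumed here, i.e. 1 MB is treated as 1 MiB):

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

# Block blob: up to 50,000 blocks of up to 100 MB each.
max_block_blob = 50_000 * 100 * MIB
print(round(max_block_blob / TIB, 2))  # 4.77 -> "over 4.7 TB"

# Page blob: fixed 512-byte pages, up to 8 TB in total.
pages_in_8_tb = 8 * TIB // 512
print(pages_in_8_tb)  # 17179869184 pages
```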

Append blobs - ✔ ✔ An append blob is a block blob optimized to support append operations. You can only add blocks to the end of an append blob; updating or deleting existing blocks isn't supported.

Hot tier - ✔ ✔ The default access tier for blob storage. Use it for blobs that are accessed frequently. The blob data is stored on high-performance media.

Cool tier - ✔ ✔ Lower performance and reduced storage charges compared to the hot tier. Use it for data accessed infrequently. A blob can start hot and move to cool.

Archive tier - ✔ ✔ Lowest storage cost but increased latency. It can take hours for data to become available, and blobs must be rehydrated before they can be read.

Which two storage solutions can be mounted in Azure Synapse Analytics and used to process large volumes of data? - ✔ ✔ Azure Blob Storage. Azure Data Lake Storage.

What are two characteristics of Azure Table storage? - ✔ ✔ Each RowKey value is unique within a table partition. Items in the same partition are stored in RowKey order.
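Those two Table storage characteristics can be sketched in plain Python. The `insert` helper and in-memory layout are invented for illustration; this is not an Azure SDK API:

```python
from bisect import insort

# Sketch of Azure Table storage semantics: entities are grouped by
# PartitionKey, RowKey must be unique within a partition, and entities
# in a partition are kept in RowKey order.
table = {}

def insert(partition_key, row_key, entity):
    rows = table.setdefault(partition_key, [])
    if any(rk == row_key for rk, _ in rows):
        raise ValueError("RowKey must be unique within the partition")
    insort(rows, (row_key, entity))  # list stays sorted by RowKey

insert("sales", "002", {"amount": 25})
insert("sales", "001", {"amount": 10})
print([rk for rk, _ in table["sales"]])  # ['001', '002']
```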

You need to replace an existing on-premises SMB shared folder with a cloud solution. Which storage option should you choose? - ✔ ✔ Azure Files. Azure Files allows you to create cloud-based network shares to make documents and other files available to multiple users.

Which Azure Cosmos DB API should you use for data in a graph structure? - ✔ ✔ Apache Gremlin. The Gremlin API is used for graph databases. The MongoDB API stores data in the BSON format. The Table API is used to retrieve key/value pairs. The Cassandra API is used to retrieve tabular data.

Azure Cosmos DB - ✔ ✔ A fully managed and serverless distributed database for applications of any size or scale, with support for both relational and non-relational (NoSQL) workloads.

Azure Cosmos DB for Table - ✔ ✔ Azure Cosmos DB for Table is used to work with data in key-value tables, similar to Azure Table Storage. It offers greater scalability and performance than Azure Table Storage.

Azure Cosmos DB for Apache Cassandra - ✔ ✔ Azure Cosmos DB for Apache Cassandra is compatible with Apache Cassandra, which is a popular open source database that uses a column-family storage structure. Column families are tables, similar to those in a relational database, with the exception that it's not mandatory for every row to have the same columns. Can be queried using Cassandra Query Language (CQL), which is syntactically similar to SQL.

Azure Cosmos DB for Apache Gremlin - ✔ ✔ Azure Cosmos DB for Apache Gremlin is used with data in a graph structure, in which entities are defined as vertices that form nodes in a connected graph. Nodes are connected by edges that represent relationships.

Which type of data store uses star schemas, fact tables, and dimension tables? - ✔ ✔ Data warehouses use fact and dimension tables in a star/snowflake schema. Relational databases do not use fact and dimension tables. Cubes are generated from a data warehouse but are not a data store themselves. Data lakes store files.
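A minimal star schema can be sketched with SQLite from the Python standard library: one fact table joined to a dimension table and aggregated, which is exactly the access pattern a data warehouse is optimized for. The table and column names are made up:

```python
import sqlite3

# A tiny star schema: fact_sales references dim_product by its key.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'bikes'), (2, 'helmets');
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 20.0);
""")

# Aggregate the facts, sliced by a dimension attribute.
rows = con.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.category ORDER BY d.category
""").fetchall()
print(rows)  # [('bikes', 150.0), ('helmets', 20.0)]
```

A snowflake schema differs only in that the dimension tables are themselves normalized into further tables.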

Which is the best type of database to use for an organizational chart? - ✔ ✔ Graph. Graph databases are the best option for hierarchical data. Azure SQL Database is the best option for create, read, update, and delete (CRUD) operations and uses the least amount of storage space, but this scenario does not require a database management system (DBMS). Object storage is the best option for file storage, not hierarchical databases. Table storage is not suited for hierarchical data.
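An org chart really is a graph: people are nodes (vertices) and reporting lines are edges, so finding everyone under a manager is a simple traversal. A plain-Python sketch with invented names:

```python
# Adjacency list: each manager maps to their direct reports (edges).
reports = {
    "CEO": ["VP Eng", "VP Sales"],
    "VP Eng": ["Dev Lead"],
    "VP Sales": [],
    "Dev Lead": [],
}

def all_reports(manager):
    """Everyone below a manager, direct and indirect (depth-first)."""
    found = []
    for person in reports.get(manager, []):
        found.append(person)
        found.extend(all_reports(person))
    return found

print(all_reports("CEO"))  # ['VP Eng', 'Dev Lead', 'VP Sales']
```

A graph database such as Cosmos DB for Apache Gremlin runs this kind of traversal natively, instead of via recursive joins.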

You design an application that needs to store data based on the following requirements:

Store historical data from multiple data sources

Load data on a scheduled basis

Use a denormalized star or snowflake schema

Which type of database should you use? - ✔ ✔ OLAP (online analytical processing).

OLAP databases are used for snowflake schemas with historical data.

Which type of database can be used for semi-structured data that will be processed by an Apache Spark pool in Azure Synapse Analytics? - ✔ ✔ Column-family. Column-family databases store semi-structured, tabular data comprising rows and columns. Azure Synapse Analytics Spark pools do not directly support graph or relational databases.

Which two attributes are characteristics of an analytical data workload? - ✔ ✔ Optimized for read operations; highly denormalized. (Denormalized data is consolidated into fewer tables so it can be read quickly.)

Which feature of transactional data processing guarantees that concurrent processes cannot see the data in an inconsistent state? - ✔ ✔ Isolation. Isolation in transactional data processing ensures that concurrent transactions cannot interfere with one another and must result in a consistent database state.
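Isolation can be demonstrated with SQLite from the Python standard library: a second connection never sees the first connection's uncommitted row. SQLite's behavior stands in here for the general ACID guarantee; this is not an Azure service:

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file: the reader cannot see the
# writer's uncommitted row, illustrating the Isolation guarantee.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
writer.commit()

writer.execute("INSERT INTO accounts VALUES (1, 100.0)")  # transaction open
before = reader.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]

writer.commit()  # transaction committed, row now durable and visible
after = reader.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]

print(before, after)  # 0 1
```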

What should you include in the recommendation? - ✔ ✔ A stored procedure.

A stored procedure can encapsulate any type of business logic that can be reused in the application. A stored procedure can modify existing data as well as add new entries to tables. A stored procedure can be run from an application as well as from the server.

Which service is managed and serverless, avoids the use of Windows Server licenses, and allows for each workload to have its own instance of the service being used? - ✔ ✔ Azure SQL Database.

Azure SQL Database is a serverless platform as a service (PaaS) SQL instance. SQL Managed Instance is a PaaS service, but databases are maintained in the same SQL Managed Instance cluster. SQL Server on Azure Virtual Machines running Windows or Linux are not serverless options.

Which three open-source databases are available as platform as a service (PaaS) in Azure? - ✔ ✔ MariaDB, PostgreSQL, and MySQL, each offered as an Azure Database service (Azure Database for MariaDB, Azure Database for PostgreSQL, Azure Database for MySQL).

Which data service allows you to control the amount of RAM, change the I/O subsystem configuration, and add or remove CPUs? - ✔ ✔ SQL Server on Azure Virtual Machines.

Which open-source database has built-in support for temporal data? - ✔ ✔ MariaDB.

Which two services allow you to pre-process a large volume of data by using Scala? Each correct answer presents a complete solution. - ✔ ✔ Azure Databricks; a serverless Apache Spark pool in Azure Synapse Analytics.

What is a characteristic of batch processing? - ✔ ✔ Batch processing is used to execute complex analysis. It handles a large amount of data at a time, and its latency is usually measured in minutes or hours.
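The contrast with stream processing can be sketched in plain Python: a batch job processes the whole accumulated data set in one scheduled run, instead of handling each event as it arrives. The order records and function name are invented:

```python
# A day's worth of accumulated records, processed together in one pass.
daily_orders = [
    {"order_id": 1, "amount": 30.0},
    {"order_id": 2, "amount": 45.0},
    {"order_id": 3, "amount": 25.0},
]

def nightly_batch(orders):
    """One scheduled pass over the accumulated data set."""
    return {"orders": len(orders),
            "revenue": sum(o["amount"] for o in orders)}

print(nightly_batch(daily_orders))  # {'orders': 3, 'revenue': 100.0}
```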