
Databricks Certified Data Engineer Associate (Databricks-Certified-Data-Engineer-Associate) Exam Questions and Answers

Questions 4

Structured Streaming needs to reliably track the exact progress of its processing so that it can handle any kind of failure by restarting and/or reprocessing. Which two approaches does Spark use to record the offset range of the data being processed in each trigger?

Options:

A.

Checkpointing and Write-ahead Logs

B.

Structured Streaming cannot record the offset range of the data being processed in each trigger.

C.

Replayable Sources and Idempotent Sinks

D.

Write-ahead Logs and Idempotent Sinks

E.

Checkpointing and Idempotent Sinks
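
For reference, the checkpoint location is what enables both mechanisms. A minimal PySpark sketch, assuming a Databricks notebook where spark is predefined; the table names and path are hypothetical.

# Offsets for each trigger are recorded under the checkpoint location,
# which backs both checkpointing and the write-ahead log.
(spark.readStream
    .table("events")                                          # hypothetical source
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events")  # hypothetical path
    .toTable("events_sink"))                                  # hypothetical target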

Questions 5

A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performance.

Which of the following keywords can be used to compact the small files?

Options:

A.

REDUCE

B.

OPTIMIZE

C.

COMPACTION

D.

REPARTITION

E.

VACUUM
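
For reference, a minimal sketch of file compaction, assuming a Delta table named my_table (hypothetical) and a predefined spark session; the ZORDER clause is optional.

# OPTIMIZE rewrites many small files into fewer, larger ones.
spark.sql("OPTIMIZE my_table")
# Optionally co-locate related data (column name is hypothetical):
# spark.sql("OPTIMIZE my_table ZORDER BY (event_date)")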

Questions 6

A Databricks single-task workflow fails at the last task due to an error in a notebook. The data engineer fixes the mistake in the notebook. What should the data engineer do to rerun the workflow?

Options:

A.

Repair the task

B.

Rerun the pipeline

C.

Restart the Cluster

D.

Switch the cluster

Questions 7

A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly because the clusters take a long time to start.

Which of the following actions can the data engineer perform to improve the startup time for the clusters used for the Job?

Options:

A.

They can use endpoints available in Databricks SQL

B.

They can use jobs clusters instead of all-purpose clusters

C.

They can configure the clusters to be single-node

D.

They can use clusters that are from a cluster pool

E.

They can configure the clusters to autoscale for larger data sizes

Questions 8

An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.

Which of the following approaches can the manager use to ensure the results of the query are updated each day?

Options:

A.

They can schedule the query to refresh every 1 day from the SQL endpoint's page in Databricks SQL.

B.

They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL.

C.

They can schedule the query to refresh every 1 day from the query's page in Databricks SQL.

D.

They can schedule the query to run every 1 day from the Jobs UI.

E.

They can schedule the query to run every 12 hours from the Jobs UI.

Questions 9

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which of the following approaches can be used to identify the owner of new_table?

Options:

A.

Review the Permissions tab in the table's page in Data Explorer

B.

All of these options can be used to identify the owner of the table

C.

Review the Owner field in the table's page in Data Explorer

D.

Review the Owner field in the table's page in the cloud storage solution

E.

There is no way to identify the owner of the table
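
Besides the Data Explorer UI, ownership can also be checked from a notebook; a sketch assuming the table name from the question and a predefined spark session.

# The Owner appears as a row in the detailed table metadata.
spark.sql("DESCRIBE TABLE EXTENDED new_table").show(truncate=False)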

Questions 10

A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is equal to 1 and the Python variable review_period is True.

Which of the following control flow statements should the data engineer use to begin this conditionally executed code block?

Options:

A.

if day_of_week = 1 and review_period:

B.

if day_of_week = 1 and review_period = "True":

C.

if day_of_week == 1 and review_period == "True":

D.

if day_of_week == 1 and review_period:

E.

if day_of_week = 1 & review_period: = "True":
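
For reference, a minimal standalone sketch of the correct form: == compares (a single = is an assignment and is a syntax error in a condition), and a boolean variable is tested directly rather than against the string "True".

day_of_week = 1
review_period = True

if day_of_week == 1 and review_period:
    print("running final block")  # placeholder for the final block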

Questions 11

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which location can the data engineer review their permissions on the table?

Options:

A.

Jobs

B.

Dashboards

C.

Catalog Explorer

D.

Repos

Questions 12

Which of the following Git operations must be performed outside of Databricks Repos?

Options:

A.

Commit

B.

Pull

C.

Push

D.

Clone

E.

Merge

Questions 13

Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?

Options:

A.

SELECT * FROM my_table WHERE age > 25;

B.

UPDATE my_table WHERE age > 25;

C.

DELETE FROM my_table WHERE age > 25;

D.

UPDATE my_table WHERE age <= 25;

E.

DELETE FROM my_table WHERE age <= 25;
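
A sketch of the deletion, assuming my_table is an existing Delta table and spark is predefined; note the predicate names the rows to remove, not the rows to keep.

# Removes matching rows and commits a new version of the Delta table.
spark.sql("DELETE FROM my_table WHERE age > 25")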

Questions 14

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which change will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

A.

The pipeline can have different notebook sources in SQL & Python.

B.

The pipeline will need to be written entirely in SQL.

C.

The pipeline will need to be written entirely in Python.

D.

The pipeline will need to use a batch source in place of a streaming source.

Questions 15

An engineering manager wants to monitor the performance of a recent project using a Databricks SQL query. For the first week following the project’s release, the manager wants the query results to be updated every minute. However, the manager is concerned that the compute resources used for the query will be left running and cost the organization a lot of money beyond the first week of the project’s release.

Which of the following approaches can the engineering team use to ensure the query does not cost the organization any money beyond the first week of the project’s release?

Options:

A.

They can set a limit to the number of DBUs that are consumed by the SQL Endpoint.

B.

They can set the query’s refresh schedule to end after a certain number of refreshes.

C.

They cannot ensure the query does not cost the organization money beyond the first week of the project’s release.

D.

They can set a limit to the number of individuals that are able to manage the query’s refresh schedule.

E.

They can set the query’s refresh schedule to end on a certain date in the query scheduler.

Questions 16

A data architect has determined that a table of the following format is necessary:

[Image: required table format, not reproduced]

Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above format regardless of whether a table already exists with this name?

[Image: answer option code blocks, not reproduced]

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E
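
Since the option images are not reproduced here, the construct this question turns on is CREATE OR REPLACE TABLE, which succeeds whether or not the table already exists; a sketch with a hypothetical schema standing in for the image.

# CREATE OR REPLACE TABLE creates the table if absent and replaces it if present.
spark.sql("""
    CREATE OR REPLACE TABLE table_name (
        id INT,
        name STRING
    )
""")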

Questions 17

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

Options:

A.

They could submit a feature request with Databricks to add this functionality.

B.

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.

C.

They could only run the entire program on Sundays.

D.

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.

E.

They could redesign the data model to separate the data used in the final query into a new table.
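
A sketch of the PySpark wrapper approach, assuming the queries are available as SQL strings and spark is predefined; datetime.date.weekday() returns 6 for Sunday.

from datetime import date

spark.sql("SELECT 1")  # placeholder for the queries that run every day

# Gate the final query on the day of the week (Monday=0 ... Sunday=6).
if date.today().weekday() == 6:
    spark.sql("SELECT 2")  # placeholder for the Sunday-only final query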

Questions 18

A data engineer needs to create a table in Databricks using data from a CSV file at location /path/to/csv.

They run the following command:

[Image: CREATE TABLE command with a blank, not reproduced]

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

A.

None of these lines of code are needed to successfully complete the task

B.

USING CSV

C.

FROM CSV

D.

USING DELTA

E.

FROM "path/to/csv"
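
Since the command image is not reproduced here, the completed statement would look roughly like the sketch below; the table name and options are hypothetical.

# USING CSV declares the file format backing the table.
spark.sql("""
    CREATE TABLE my_csv_table
    USING CSV
    OPTIONS (header = 'true')
    LOCATION '/path/to/csv'
""")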

Questions 19

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

Options:

A.

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."

B.

They can turn on the Auto Stop feature for the SQL endpoint.

C.

They can increase the cluster size of the SQL endpoint.

D.

They can turn on the Serverless feature for the SQL endpoint.

E.

They can increase the maximum bound of the SQL endpoint's scaling range.

Questions 20

What is the structure of an Asset Bundle?

Options:

A.

A single plain text file enumerating the names of assets to be migrated to a new workspace.

B.

A compressed archive (ZIP) that solely contains workspace assets without any accompanying metadata.

C.

A YAML configuration file that specifies the artifacts, resources, and configurations for the project.

D.

A Docker image containing runtime environments and the source code of the assets.

Questions 21

A data engineer is attempting to write Python and SQL in the same command cell and is running into an error. The engineer thought that it was possible to use a Python variable in a SELECT statement.

Why does the command fail?

Options:

A.

Databricks supports multiple languages but only one per notebook.

B.

Databricks supports language interoperability in the same cell but only between Scala and SQL.

C.

Databricks supports language interoperability but only if a special character is used.

D.

Databricks supports one language per cell.
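
Although each cell runs a single language, a Python cell can still drive SQL through spark.sql, which also lets Python variables be interpolated; a minimal sketch with a hypothetical variable and table.

threshold = 25  # hypothetical Python variable

# spark.sql runs a SQL string from Python, so the variable can be formatted in.
df = spark.sql(f"SELECT * FROM my_table WHERE age > {threshold}")
df.show()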

Questions 22

A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.

Which of the following explains why the data files are no longer present?

Options:

A.

The VACUUM command was run on the table

B.

The TIME TRAVEL command was run on the table

C.

The DELETE HISTORY command was run on the table

D.

The OPTIMIZE command was run on the table

E.

The HISTORY command was run on the table
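
For reference, VACUUM permanently deletes data files older than the retention window, after which time travel to versions that depended on those files fails; a sketch assuming my_table and the default 7-day retention.

# Files outside the retention period are removed for good.
spark.sql("VACUUM my_table RETAIN 168 HOURS")  # 168 hours = 7 days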

Questions 23

A new data engineering team, team, has been assigned to an ELT project. The new data engineering team will need full privileges on the database customers to fully manage the project.

Which of the following commands can be used to grant full permissions on the database to the new data engineering team?

Options:

A.

GRANT USAGE ON DATABASE customers TO team;

B.

GRANT ALL PRIVILEGES ON DATABASE team TO customers;

C.

GRANT SELECT PRIVILEGES ON DATABASE customers TO teams;

D.

GRANT SELECT CREATE MODIFY USAGE PRIVILEGES ON DATABASE customers TO team;

E.

GRANT ALL PRIVILEGES ON DATABASE customers TO team;

Questions 24

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using the Continuous Pipeline Mode.

What is the expected outcome after clicking Start to update the pipeline assuming previously unprocessed data exists and all definitions are valid?

Options:

A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

C.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

D.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

Questions 25

Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?

Options:

A.

Parquet files can be partitioned

B.

CREATE TABLE AS SELECT statements cannot be used on files

C.

Parquet files have a well-defined schema

D.

Parquet files have the ability to be optimized

E.

Parquet files will become Delta tables
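
For context, a CTAS over Parquet needs no declared schema or parsing options because the schema travels with the files; a sketch with hypothetical names and paths.

# Parquet carries its own schema, so none has to be declared here.
spark.sql("""
    CREATE TABLE sales
    AS SELECT * FROM parquet.`/path/to/parquet/files`
""")
# A CSV source would instead need options such as a header flag and schema inference.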

Questions 26

A data engineer wants to create a new table containing the names of customers who live in France.

They have written the following command:

CREATE TABLE customersInFrance
_____ AS
SELECT id,
       firstName,
       lastName
FROM customerLocations
WHERE country = 'FRANCE';

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).

Which line of code fills in the above blank to successfully complete the task?

Options:

A.

COMMENT "Contains PII"

B.

PII

C.

"COMMENT PII"

D.

TBLPROPERTIES PII
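
For reference, both clauses sit between the table name and AS in a CTAS; a sketch showing the comment form, with the explicit-property alternative noted (the property name is hypothetical).

spark.sql("""
    CREATE TABLE customersInFrance
    COMMENT 'Contains PII'
    AS SELECT id, firstName, lastName
    FROM customerLocations
    WHERE country = 'FRANCE'
""")
# Explicit property form: TBLPROPERTIES ('contains_pii' = 'true')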

Questions 27

A data engineer is reviewing the documentation on audit logs in Databricks for compliance purposes and needs to understand the format in which audit logs output events.

How are events formatted in Databricks audit logs?

Options:

A.

In Databricks, audit logs output events in a plain text format.

B.

In Databricks, audit logs output events in a JSON format.

C.

In Databricks, audit logs output events in an XML format.

D.

In Databricks, audit logs output events in a CSV format.

Questions 28

A data engineer who is new to Python needs to create a function that adds two integers together and returns the sum.

Which of the following code blocks can the data engineer use to complete this task?

A) [Image: code block option, not reproduced]

B) [Image: code block option, not reproduced]

C) [Image: code block option, not reproduced]

D) [Image: code block option, not reproduced]

E) [Image: code block option, not reproduced]

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E
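
Since the option images are not reproduced here, the correct shape is an ordinary function definition that returns (rather than prints) the sum; a minimal sketch.

def add_integers(a: int, b: int) -> int:
    # Return the result so the caller receives the sum.
    return a + b

print(add_integers(2, 3))  # 5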

Questions 29

A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.

Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?

Options:

A.

Databricks Repos automatically saves development progress

B.

Databricks Repos supports the use of multiple branches

C.

Databricks Repos allows users to revert to previous versions of a notebook

D.

Databricks Repos provides the ability to comment on specific changes

E.

Databricks Repos is wholly housed within the Databricks Lakehouse Platform

Questions 30

In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?

Options:

A.

When another task needs to be replaced by the new task

B.

When another task needs to fail before the new task begins

C.

When another task has the same dependency libraries as the new task

D.

When another task needs to use as few compute resources as possible

E.

When another task needs to successfully complete before the new task begins

Questions 31

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

A.

None of these changes will need to be made

B.

The pipeline will need to stop using the medallion-based multi-hop architecture

C.

The pipeline will need to be written entirely in SQL

D.

The pipeline will need to use a batch source in place of a streaming source

E.

The pipeline will need to be written entirely in Python

Questions 32

In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?

Options:

A.

When the location of the data needs to be changed

B.

When the target table is an external table

C.

When the source table can be deleted

D.

When the target table cannot contain duplicate records

E.

When the source is not a Delta table
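
For contrast with INSERT INTO, MERGE INTO matches incoming rows on a key and inserts only the ones not already present; a sketch with hypothetical table and column names.

# Rows whose id already exists in target are skipped, so no duplicates appear.
spark.sql("""
    MERGE INTO target t
    USING source s
    ON t.id = s.id
    WHEN NOT MATCHED THEN INSERT *
""")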

Questions 33

Which of the following statements regarding the relationship between Silver tables and Bronze tables is always true?

Options:

A.

Silver tables contain a less refined, less clean view of data than Bronze data.

B.

Silver tables contain aggregates while Bronze data is unaggregated.

C.

Silver tables contain more data than Bronze tables.

D.

Silver tables contain a more refined and cleaner view of data than Bronze tables.

E.

Silver tables contain less data than Bronze tables.

Questions 34

A data engineering team is using Kafka to capture event data and then ingest it into Databricks. The team wants to be able to see these historical events. Medallion architecture is already in place. The team wants to be mindful of costs.

Where should this historical event data be stored?

Options:

A.

Gold

B.

Silver

C.

Bronze

D.

Raw layer

Questions 35

Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?

Options:

A.

DROP

B.

IGNORE

C.

MERGE

D.

APPEND

E.

INSERT

Questions 36

Which of the following commands will return the number of null values in the member_id column?

Options:

A.

SELECT count(member_id) FROM my_table;

B.

SELECT count(member_id) - count_null(member_id) FROM my_table;

C.

SELECT count_if(member_id IS NULL) FROM my_table;

D.

SELECT null(member_id) FROM my_table;

E.

SELECT count_null(member_id) FROM my_table;
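
A sketch of the null count, with an equivalent that avoids count_if, assuming my_table exists and spark is predefined.

spark.sql("SELECT count_if(member_id IS NULL) FROM my_table").show()

# Equivalent without count_if:
spark.sql("SELECT count(*) FROM my_table WHERE member_id IS NULL").show()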

Questions 37

A data engineer is attempting to drop a Spark SQL table my_table and runs the following command:

DROP TABLE IF EXISTS my_table;

After running this command, the engineer notices that the data files and metadata files have been deleted from the file system.

Which of the following describes why all of these files were deleted?

Options:

A.

The table was managed

B.

The table's data was smaller than 10 GB

C.

The table's data was larger than 10 GB

D.

The table was external

E.

The table did not have a location
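
For reference, a table created without an explicit LOCATION is managed, so dropping it deletes both metadata and data files; a sketch of the contrast, with hypothetical names and paths.

# Managed: Spark owns the storage; DROP TABLE deletes the files too.
spark.sql("CREATE TABLE managed_tbl (id INT)")

# External: data stays at the user-supplied path after DROP TABLE.
spark.sql("CREATE TABLE external_tbl (id INT) LOCATION '/path/to/external'")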

Questions 38

Which SQL keyword can be used to convert a table from a long format to a wide format?

Options:

A.

TRANSFORM

B.

PIVOT

C.

SUM

D.

CONVERT
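
A sketch of PIVOT turning one row per (product, quarter) into one column per quarter; the table and column names are hypothetical.

# Each value listed in IN becomes its own output column.
spark.sql("""
    SELECT * FROM sales
    PIVOT (
        SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')
    )
""").show()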

Questions 39

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:

A.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

B.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

C.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

E.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

Questions 40

Which of the following can be used to simplify and unify siloed data architectures that are specialized for specific use cases?

Options:

A.

None of these

B.

Data lake

C.

Data warehouse

D.

All of these

E.

Data lakehouse

Questions 41

A data engineer is working on a personal laptop and needs to perform complex transformations on data stored in a Delta Lake on cloud storage. The engineer decides to use Databricks Connect to interact with Databricks clusters and work in their local IDE.

How does Databricks Connect enable the engineer to develop, test, and debug code seamlessly on their local machine while interacting with Databricks clusters?

Options:

A.

By allowing direct execution of Spark jobs from the local machine without needing a network connection

B.

By providing a local environment that mimics the Databricks runtime, enabling the engineer to develop, test, and debug code using a specific IDE that is required by Databricks

C.

By providing a local environment that mimics the Databricks runtime, enabling the engineer to develop, test, and debug code using their preferred IDE

D.

By providing a local environment that mimics the Databricks runtime, enabling the engineer to develop, test, and debug code only through Databricks' own web interface
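
A sketch of the Databricks Connect entry point, assuming databricks-connect (v13 or later) is installed locally and authentication is already configured.

from databricks.connect import DatabricksSession

# The session is created locally, but execution happens on the remote cluster.
spark = DatabricksSession.builder.getOrCreate()
spark.sql("SELECT 1").show()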

Questions 42

Which tool is used by Auto Loader to process data incrementally?

Options:

A.

Spark Structured Streaming

B.

Unity Catalog

C.

Checkpointing

D.

Databricks SQL
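
For reference, Auto Loader is the cloudFiles source built on Structured Streaming; a sketch with hypothetical paths and table names.

# Auto Loader discovers new files incrementally via Structured Streaming.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")                      # source file format
    .option("cloudFiles.schemaLocation", "/tmp/schemas")      # hypothetical path
    .load("/path/to/landing")                                 # hypothetical input
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/bronze")  # hypothetical
    .toTable("bronze_events"))                                # hypothetical target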

Questions 43

Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint violation.

A data engineer has created an ETL pipeline using Delta Live Tables to manage their company's travel reimbursement details. They want to ensure that if the location details have not been provided by the employee, the pipeline is terminated.

How can the scenario be implemented?

Options:

A.

CONSTRAINT valid_location EXPECT (location = NULL)

B.

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL UPDATE

C.

CONSTRAINT valid_location EXPECT (location != NULL) ON DROP ROW

D.

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL
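
The Python DLT equivalent of a FAIL UPDATE expectation looks like the sketch below, which only runs inside a Delta Live Tables pipeline; the table and source names are hypothetical.

import dlt

@dlt.table
@dlt.expect_or_fail("valid_location", "location IS NOT NULL")
def reimbursements():
    # The pipeline update fails if any row violates the expectation.
    return spark.read.table("raw_reimbursements")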

Questions 44

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.

Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?

Options:

A.

It is not possible to use SQL in a Python notebook

B.

They can attach the cell to a SQL endpoint rather than a Databricks cluster

C.

They can simply write SQL syntax in the cell

D.

They can add %sql to the first line of the cell

E.

They can change the default language of the notebook to SQL
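
A sketch of such a cell: the magic command on the first line switches only that cell to SQL, leaving the notebook's default language untouched (the table name is hypothetical).

%sql
SELECT count(*) FROM my_table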

Questions 45

A data engineer is getting a partner organization up to speed with their Databricks account. Both teams share some business use cases. The data engineer has to share some of their Unity Catalog-managed Delta tables, and the notebook jobs that create those tables, with the partner organization.

How can the data engineer seamlessly share the required information?

Options:

A.

Zip all the code, share it via email, and allow data ingestion from their data lake.

B.

Data and Notebooks can be shared simply using Unity Catalog.

C.

Share access to the codebase via GitHub and allow them to ingest datasets from their data lake.

D.

Share required datasets and notebooks via Delta Sharing. Manage permissions via Unity Catalog.

Questions 46

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[Image: streaming write code with a blank trigger clause, not reproduced]

If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?

Options:

A.

processingTime(1)

B.

trigger(availableNow=True)

C.

trigger(parallelBatch=True)

D.

trigger(processingTime="once")

E.

trigger(continuous="once")
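
Since the code image is not reproduced here, the sketch below shows where the trigger setting sits in a streaming write; everything around trigger(availableNow=True) is hypothetical.

# availableNow processes all currently available data, in as many
# batches as needed, and then stops the query.
(spark.readStream
    .table("source_table")                                 # hypothetical
    .writeStream
    .trigger(availableNow=True)
    .option("checkpointLocation", "/tmp/checkpoints/job")  # hypothetical
    .toTable("target_table"))                              # hypothetical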

Questions 47

A team creates YAML manifests that declare jobs, resources, and dependencies, then deploys them to Databricks using the Databricks CLI. The deployment succeeds.

Which feature are they using?

Options:

A.

Databricks Asset Bundles

B.

GitHub

C.

Terraform

D.

DataOps

Exam Name: Databricks Certified Data Engineer Associate Exam
Last Update: Mar 19, 2026
Questions: 159
