Spring Sale Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: pass65

Databricks-Certified-Data-Engineer-Associate Databricks Certified Data Engineer Associate Exam Questions and Answers

Questions 4

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The table is configured to run in Production mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:

A.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

B.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

C.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

E.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

Buy Now
Questions 5

A data engineer is managing a data pipeline in Databricks, where multiple Delta tables are used for various transformations. The team wants to track how data flows through the pipeline, including identifying dependencies between Delta tables, notebooks, jobs, and dashboards. The data engineer is utilizing the Unity Catalog lineage feature to monitor this process.

How does Unity Catalog’s data lineage feature support the visualization of relationships between Delta tables, notebooks, jobs, and dashboards?

Options:

A.

Unity Catalog lineage visualizes dependencies between Delta tables, notebooks, and jobs, but does not provide column-level tracing or relationships with dashboards.

B.

Unity Catalog lineage only supports visualizing relationships at the table level and does not extend to notebooks, jobs, or dashboards.

C.

Unity Catalog lineage provides an interactive graph that tracks dependencies between tables and notebooks but excludes any job-related dependencies or dashboard visualizations.

D.

Unity Catalog provides an interactive graph that visualizes the dependencies between Delta tables, notebooks, jobs, and dashboards, while also supporting column-level tracking of data transformations.

Buy Now
Questions 6

Which of the following describes the storage organization of a Delta table?

Options:

A.

Delta tables are stored in a single file that contains data, history, metadata, and other attributes.

B.

Delta tables store their data in a single file and all metadata in a collection of files in a separate location.

C.

Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.

D.

Delta tables are stored in a collection of files that contain only the data stored within the table.

E.

Delta tables are stored in a single file that contains only the data stored within the table.

Buy Now
Questions 7

A data engineer has left the organization. The data team needs to transfer ownership of the data engineer’s Delta tables to a new data engineer. The new data engineer is the lead engineer on the data team.

Assuming the original data engineer no longer has access, which of the following individuals must be the one to transfer ownership of the Delta tables in Data Explorer?

Options:

A.

Databricks account representative

B.

This transfer is not possible

C.

Workspace administrator

D.

New lead data engineer

E.

Original data engineer

Buy Now
Questions 8

A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:

Databricks-Certified-Data-Engineer-Associate Question 8

Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?

Options:

A.

Replace predict with a stream-friendly prediction function

B.

Replace schema(schema) with option ( " maxFilesPerTrigger " , 1)

C.

Replace " transactions " with the path to the location of the Delta table

D.

Replace format( " delta " ) with format( " stream " )

E.

Replace spark.read with spark.readStream

Buy Now
Questions 9

Which of the following tools is used by Auto Loader process data incrementally?

Options:

A.

Checkpointing

B.

Spark Structured Streaming

C.

Data Explorer

D.

Unity Catalog

E.

Databricks SQL

Buy Now
Questions 10

Which of the following describes the relationship between Gold tables and Silver tables?

Options:

A.

Gold tables are more likely to contain aggregations than Silver tables.

B.

Gold tables are more likely to contain valuable data than Silver tables.

C.

Gold tables are more likely to contain a less refined view of data than Silver tables.

D.

Gold tables are more likely to contain more data than Silver tables.

E.

Gold tables are more likely to contain truthful data than Silver tables.

Buy Now
Questions 11

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which location can the data engineer review their permissions on the table?

Options:

A.

Jobs

B.

Dashboards

C.

Catalog Explorer

D.

Repos

Buy Now
Questions 12

What are the transformations typically included in building the Bronze layer ?

Options:

A.

Perform extensive data cleansing

B.

Aggregate data from multiple sources

C.

Business rules and transformations

D.

Include columns Load date/time, process ID

Buy Now
Questions 13

Which of the following statements regarding the relationship between Silver tables and Bronze tables is always true?

Options:

A.

Silver tables contain a less refined, less clean view of data than Bronze data.

B.

Silver tables contain aggregates while Bronze data is unaggregated.

C.

Silver tables contain more data than Bronze tables.

D.

Silver tables contain a more refined and cleaner view of data than Bronze tables.

E.

Silver tables contain less data than Bronze tables.

Buy Now
Questions 14

A data engineer is standardizing repository layouts for multiple teams adopting Databricks Asset Bundles. The engineer wants to ensure every project has a single authoritative configuration file at the repository root that defines the bundle name, targets, workspace settings, permissions, and resource mappings (for jobs and pipelines).

Which strategy should the data engineer use to meet this goal?

Options:

A.

Place multiple databricks.yml files under each subfolder (for example, jobs/, pipelines/, workspace/) and merge them at deploy time using the include mapping.

B.

Place exactly one databricks.yml at the repository root; it is the main configuration file and may reference additional configuration files via the include mapping.

C.

Place a databricks.yml in a .databricks/ hidden folder at the repository root; only hidden locations are valid for bundle configs.

D.

Place a databricks.yml at the repository root and optional databricks.yml in subfolders; the CLI prefers .yaml over .yml when both exist.

Buy Now
Questions 15

A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.

Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?

Options:

A.

Databricks Repos automatically saves development progress

B.

Databricks Repos supports the use of multiple branches

C.

Databricks Repos allows users to revert to previous versions of a notebook

D.

Databricks Repos provides the ability to comment on specific changes

E.

Databricks Repos is wholly housed within the Databricks Lakehouse Platform

Buy Now
Questions 16

A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.

Which of the following relational objects should the data engineer create?

Options:

A.

Spark SQL Table

B.

View

C.

Database

D.

Temporary view

E.

Delta Table

Buy Now
Questions 17

A data engineer runs a statement every day to copy the previous day’s sales into the table transactions. Each day’s sales are in their own file in the location " /transactions/raw " .

Today, the data engineer runs the following command to complete this task:

Databricks-Certified-Data-Engineer-Associate Question 17

After running the command today, the data engineer notices that the number of records in table transactions has not changed.

Which of the following describes why the statement might not have copied any new records into the table?

Options:

A.

The format of the files to be copied were not included with the FORMAT_OPTIONS keyword.

B.

The names of the files to be copied were not included with the FILES keyword.

C.

The previous day’s file has already been copied into the table.

D.

The PARQUET file format does not support COPY INTO.

E.

The COPY INTO statement requires the table to be refreshed to view the copied rows.

Buy Now
Questions 18

A data engineering project involves processing large batches of data on a daily schedule using ETL. The jobs are resource-intensive and vary in size, requiring a scalable, cost-efficient compute solution that can automatically scale based on the workload.

Which compute approach will satisfy the needs described?

Options:

A.

Databricks SQL Serverless

B.

Dedicated Cluster

C.

All-Purpose Cluster

D.

Job Cluster

Buy Now
Questions 19

Which of the following can be used to simplify and unify siloed data architectures that are specialized for specific use cases?

Options:

A.

None of these

B.

Data lake

C.

Data warehouse

D.

All of these

E.

Data lakehouse

Buy Now
Questions 20

A data engineer is developing a small proof of concept in a notebook. When running the entire notebook, cluster usage spikes. The data engineer wants to keep the development experience and get real-time results.

Which cluster meets these requirements?

Options:

A.

All-Purpose Cluster with a large fixed memory size

B.

All-Purpose Cluster with autoscaling

C.

Job Cluster with autoscaling enabled

D.

Job Cluster with Photon enabled and autoscaling

Buy Now
Questions 21

A Data Engineer is building a simple data pipeline using Delta Live Tables (DLT) in Databricksto ingest customer data. The raw customer data is stored in a cloud storage location in JSON format. The task is to create a DLT pipeline that reads the rawJSON data and writes it into a Delta table for further processing.

Which code snippet will correctly ingest the raw JSON data and create a Delta table using DLT?

A)

Databricks-Certified-Data-Engineer-Associate Question 21

B)

Databricks-Certified-Data-Engineer-Associate Question 21

C)

Databricks-Certified-Data-Engineer-Associate Question 21

D)

Databricks-Certified-Data-Engineer-Associate Question 21

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

Buy Now
Questions 22

Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint violation.

A data engineer has created an ETL pipeline using Delta Live table to manage their company travel reimbursement detail, they want to ensure that the if the location details has not been provided by the employee, the pipeline needs to be terminated.

How can the scenario be implemented?

Options:

A.

CONSTRAINT valid_location EXPECT (location = NULL)

B.

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL UPDATE

C.

CONSTRAINT valid_location EXPECT (location != NULL) ON DROP ROW

D.

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL

Buy Now
Questions 23

A data engineer is decommissioning a sandbox schema in Unity Catalog. Some tables are ephemeral staging outputs that can be safely removed entirely, but a few tables point at shared cloud storage used by downstream jobs outside Databricks. The engineer must avoid deleting any shared files when cleaning up catalog objects.

How does Unity Catalog behave when dropping Managed vs External tables?

Options:

A.

Drop all tables; Databricks will only remove metadata for both managed and external tables

B.

Drop managed tables that are ephemeral and drop external tables; files for both remain for 7 days

C.

Drop managed staging tables to remove data and metadata, and drop external tables to remove only metadata

D.

Drop external tables first to delete their files, then drop managed tables to keep their files for recovery

Buy Now
Questions 24

A data engineer needs to optimize the data layout and query performance for an e-commerce transactions Delta table. The table is partitioned by " purchase_date " a date column which helps with time-based queries but does not optimize searches on user statistics " customer_id " , a high-cardinality column.

The table is usually queried with filters on " customer_i

d " within specific date ranges, but since this data is spread across multiple files in each partition, it results in full partition scans and increased runtime and costs.

How should the data engineer optimize the Data Layout for efficient reads?

Options:

A.

Alter table implementing liquid clustering on " customerid " while keeping the existing partitioning.

B.

Alter the table to partition by " customer_id " .

C.

Enable delta caching on the cluster so that frequent reads are cached for performance.

D.

Alter the table implementing liquid clustering by " customer_id " and " purchase_date " .

Buy Now
Questions 25

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.

Which of the following tools can the data engineer use to solve this problem?

Options:

A.

Unity Catalog

B.

Data Explorer

C.

Delta Lake

D.

Delta Live Tables

E.

Auto Loader

Buy Now
Questions 26

A data engineer has been given a new record of data:

id STRING = ' a1 '

rank INTEGER = 6

rating FLOAT = 9.4

Which of the following SQL commands can be used to append the new record to an existing Delta table my_table?

Options:

A.

INSERT INTO my_table VALUES ( ' a1 ' , 6, 9.4)

B.

my_table UNION VALUES ( ' a1 ' , 6, 9.4)

C.

INSERT VALUES ( ' a1 ' , 6, 9.4) INTO my_table

D.

UPDATE my_table VALUES ( ' a1 ' , 6, 9.4)

E.

UPDATE VALUES ( ' a1 ' , 6, 9.4) my_table

Buy Now
Questions 27

A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

A.

They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints.

B.

They can set up the dashboard’s SQL endpoint to be serverless.

C.

They can turn on the Auto Stop feature for the SQL endpoint.

D.

They can reduce the cluster size of the SQL endpoint.

E.

They can ensure the dashboard’s SQL endpoint is not one of the included query’s SQL endpoint.

Buy Now
Questions 28

In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which of the following two approaches is used by Spark to record the offset range of the data being processed in each trigger?

Options:

A.

Checkpointing and Write-ahead Logs

B.

Structured Streaming cannot record the offset range of the data being processed in each trigger.

C.

Replayable Sources and Idempotent Sinks

D.

Write-ahead Logs and Idempotent Sinks

E.

Checkpointing and Idempotent Sinks

Buy Now
Questions 29

What is the primary function of the Silver layer in the Databricks medallion architecture?

Options:

A.

lngest raw data in its original state

B.

Validate, clean, and deduplicate data for further processing

C.

Aggregate and enrich data for business analytics

D.

Store historical data solely for auditing purposes

Buy Now
Questions 30

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The cade block used by the data engineer is below:

Databricks-Certified-Data-Engineer-Associate Question 30

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

Options:

A.

trigger( " 5 seconds " )

B.

trigger()

C.

trigger(once= " 5 seconds " )

D.

trigger(processingTime= " 5 seconds " )

E.

trigger(continuous= " 5 seconds " )

Buy Now
Questions 31

A data engineer works for an organization that must meet a stringent Service Level Agreement (SLA) that demands minimal runtime errors and high availability for its data processing pipelines. The data engineer wants to avoid the operational overhead of managing and tuning clusters.

Which architectural solution will meet the requirements?

Options:

A.

Implement a hybrid approach with scheduled batch jobs on custom cloud VMs.

B.

Use an auto-scaling cluster configured and monitored by the user.

C.

Utilize Databricks serverless compute that automatically optimizes resources and abstracts cluster management.

D.

Deploy a dedicated, manually managed cluster optimized by in-house IT staff.

Buy Now
Questions 32

What is stored in a Databricks customer ' s cloud account?

Options:

A.

Data

B.

Cluster management metadata

C.

Databricks web application

D.

Notebooks

Buy Now
Questions 33

A data engineer manages multiple external tables linked to various data sources. The data engineer wants to manage these external tables efficiently and ensure that only the necessary permissions are granted to users for accessing specific external tables.

How should the data engineer manage access to these external tables?

Options:

A.

Create a single user role with full access to all external tables and assign it to all users.

B.

Use Unity Catalog to manage access controls and permissions for each external table individually.

C.

Set up Azure Blob Storage permissions at the container level, allowing access to all external tables.

D.

Grant permissions on the Databricks workspace level, which will automatically apply to all external tables.

Buy Now
Questions 34

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A.

A data lakehouse provides storage solutions for structured and unstructured data.

B.

A data lakehouse supports ACID-compliant transactions.

C.

A data lakehouse allows the use of SQL queries to examine data.

D.

A data lakehouse stores data in open formats.

E.

A data lakehouse enables machine learning and artificial Intelligence workloads.

Buy Now
Questions 35

A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data.

They run the following command:

DROP TABLE IF EXISTS my_table

While the object no longer appears when they run SHOW TABLES, the data files still exist.

Which of the following describes why the data files still exist and the metadata files were deleted?

Options:

A.

The table’s data was larger than 10 GB

B.

The table’s data was smaller than 10 GB

C.

The table was external

D.

The table did not have a location

E.

The table was managed

Buy Now
Questions 36

A data architect has determined that a table of the following format is necessary:

Databricks-Certified-Data-Engineer-Associate Question 36

Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above format regardless of whether a table already exists with this name?

Databricks-Certified-Data-Engineer-Associate Question 36

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Buy Now
Questions 37

Which of the following is stored in the Databricks customer ' s cloud account?

Options:

A.

Databricks web application

B.

Cluster management metadata

C.

Repos

D.

Data

E.

Notebooks

Buy Now
Questions 38

What is the structure of an Asset Bundle?

Options:

A.

A single plain text file enumerating the names of assets to be migrated to a new workspace.

B.

A compressed archive (ZIP) that solely contains workspace assets without any accompanying metadata.

C.

A YAML configuration file that specifies the artifacts, resources, and configurations for the project.

D.

A Docker image containing runtime environments and the source code of the assets

Buy Now
Questions 39

Databricks-Certified-Data-Engineer-Associate Question 39

Calculate the total sales amount for each region and store the results in a new dataframe called region_sales.

Given the expected result:

Databricks-Certified-Data-Engineer-Associate Question 39

Which code will generate the expected result?

Options:

A.

region_sales = sales_df.groupBy( " region " ).agg(sum( " sales_amountM).alias( " total_sales_amount " ))

B.

region_sales = sales_df. sum ( " salen_aiTiount " ) . groupBy ( " region " ) .alias ( " total_sale3_amount " )

C.

region_sales= sales_df.groupBy( " category " ).sum(nsales_amount " ).alias( " t_otal_sales_amounl " )

D.

region sales - sales_df.agg(sum( " sales_amount " ).groupBy( " region " ).alias( " total sales amount " ))

Buy Now
Questions 40

Which of the following describes the type of workloads that are always compatible with Auto Loader?

Options:

A.

Dashboard workloads

B.

Streaming workloads

C.

Machine learning workloads

D.

Serverless workloads

E.

Batch workloads

Buy Now
Questions 41

An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.

Which of the following approaches can the manager use to ensure the results of the query are updated each day?

Options:

A.

They can schedule the query to refresh every 1 day from the SQL endpoint ' s page in Databricks SQL.

B.

They can schedule the query to refresh every 12 hours from the SQL endpoint ' s page in Databricks SQL.

C.

They can schedule the query to refresh every 1 day from the query ' s page in Databricks SQL.

D.

They can schedule the query to run every 1 day from the Jobs UI.

E.

They can schedule the query to run every 12 hours from the Jobs UI.

Buy Now
Questions 42

A data engineer streams customer orders into a Kafka topic (orders_topic) and is currently writing the ingestion script of a DLT pipeline. The data engineer needs to ingest the data from Kafka brokers to DLT using Databricks

What is the correct code for ingesting the data?

A)

Databricks-Certified-Data-Engineer-Associate Question 42

B)

Databricks-Certified-Data-Engineer-Associate Question 42

C)

Databricks-Certified-Data-Engineer-Associate Question 42

D)

Databricks-Certified-Data-Engineer-Associate Question 42

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

Buy Now
Questions 43

A data engineer needs to combine sales data from an on-premises PostgreSQL database with customer data in Azure Synapse for a comprehensive report. The goal is to avoid data duplication and ensure up-to-date information

How should the data engineer achieve this using Databricks?

Options:

A.

Develop custom ETL pipelines to ingest data into Databricks

B.

Use Lakehouse Federation to query both data sources directly

C.

Manually synchronize data from both sources into a single database

D.

Export data from both sources to CSV files and upload them to Databricks

Buy Now
Questions 44

Which two components function in the DB platform architecture’s control plane? (Choose two.)

Options:

A.

Virtual Machines

B.

Compute Orchestration

C.

Serverless Compute

D.

Compute

E.

Unity Catalog

Buy Now
Questions 45

Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?

Options:

A.

SELECT * FROM my_table WHERE age > 25;

B.

UPDATE my_table WHERE age > 25;

C.

DELETE FROM my_table WHERE age > 25;

D.

UPDATE my_table WHERE age < = 25;

E.

DELETE FROM my_table WHERE age < = 25;

Buy Now
Questions 46

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?

Options:

A.

PIVOT

B.

CONVERT

C.

WHERE

D.

TRANSFORM

E.

SUM

Buy Now
Questions 47

Which SQL code snippet will correctly demonstrate a Data Definition Language (DDL) operation used to create a table?

Options:

A.

DROP TABLE employees;

B.

INSERT INTO employees (id, name) VALUES (1, ' Alice ' );

C.

CRFATF tabif employees ( id INT, name suing

D.

ALTFR TABIF employees add column salary DECTMA(10,2);

Buy Now
Questions 48

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which of the following approaches can be used to identify the owner of new_table?

Options:

A.

Review the Permissions tab in the table ' s page in Data Explorer

B.

All of these options can be used to identify the owner of the table

C.

Review the Owner field in the table ' s page in Data Explorer

D.

Review the Owner field in the table ' s page in the cloud storage solution

E.

There is no way to identify the owner of the table

Buy Now
Questions 49

A data engineer is attempting to write Python and SQL in the same command cell and is running into an error The engineer thought that it was possible to use a Python variable in a select statement.

Why does the command fail?

Options:

A.

Databricks supports multiple languages but only one per notebook.

B.

Databricks supports language interoperability in the same cell but only between Scala and SQL

C.

Databricks supports language interoperability but only if a special character is used.

D.

Databricks supports one language per cell.

Buy Now
Questions 50

A Databricks workflow fails at the last stage due to an error in a notebook. This workflow runs daily. The data engineer fixes the mistake and wants to rerun the pipeline. This workflow is very costly and time-intensive to run.

Which action should the data engineer do in order to minimise downtime and cost?

Options:

A.

Switch to another cluster

B.

Repair run

C.

Re-run the entire workflow

D.

Restart the cluster

Buy Now
Questions 51

A data engineer is running code in a Databricks Repo that is cloned from a central Git repository. A colleague of the data engineer informs them that changes have been made and synced to the central Git repository. The data engineer now needs to sync their Databricks Repo to get the changes from the central Git repository.

Which of the following Git operations does the data engineer need to run to accomplish this task?

Options:

A.

Merge

B.

Push

C.

Pull

D.

Commit

E.

Clone

Buy Now
Questions 52

Which of the following must be specified when creating a new Delta Live Tables pipeline?

Options:

A.

A key-value pair configuration

B.

The preferred DBU/hour cost

C.

A path to cloud storage location for the written data

D.

A location of a target database for the written data

E.

At least one notebook library to be executed

Buy Now
Exam Name: Databricks Certified Data Engineer Associate Exam
Last Update: Apr 30, 2026
Questions: 176

PDF + Testing Engine

$63.52  $181.49

Testing Engine

$50.57  $144.49
buy now Databricks-Certified-Data-Engineer-Associate testing engine

PDF (Q&A)

$43.57  $124.49
buy now Databricks-Certified-Data-Engineer-Associate pdf