
Databricks Certified Associate Developer for Apache Spark 3.5 - Python: Questions and Answers

Question 4

A data engineer is working on the DataFrame:

[Table image: the DataFrame has columns Id, Name, count, and timestamp.]

Which code fragment should the engineer use to extract the unique values in the Name column into an alphabetically ordered list?

Options:

A.

df.select("Name").orderBy(df["Name"].asc())

B.

df.select("Name").distinct().orderBy(df["Name"])

C.

df.select("Name").distinct()

D.

df.select("Name").distinct().orderBy(df["Name"].desc())

Question 5

Which command overwrites an existing JSON file when writing a DataFrame?

Options:

A.

df.write.mode("overwrite").json("path/to/file")

B.

df.write.overwrite.json("path/to/file")

C.

df.write.json("path/to/file", overwrite=True)

D.

df.write.format("json").save("path/to/file", mode="overwrite")

Question 6

A data engineer replaces the exact percentile() function with approx_percentile() to improve performance, but the results are drifting too far from expected values.

Which change should be made to solve the issue?

[Image: code fragment for Question 6]

Options:

A.

Decrease the first value of the percentage parameter to increase the accuracy of the percentile ranges

B.

Decrease the value of the accuracy parameter in order to decrease the memory usage but also improve the accuracy

C.

Increase the last value of the percentage parameter to increase the accuracy of the percentile ranges

D.

Increase the value of the accuracy parameter in order to increase the memory usage but also improve the accuracy
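For reference, a sketch of how the accuracy parameter is passed to the approximate percentile function in the DataFrame API (percentile_approx), assuming a hypothetical df with a price column:

from pyspark.sql import functions as F

# A larger accuracy value uses more memory but yields estimates closer
# to the exact percentile; the default is 10000
df.select(F.percentile_approx("price", [0.25, 0.5, 0.75], accuracy=100000)).show()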

Question 7

Which configuration can be enabled to optimize the conversion between Pandas and PySpark DataFrames using Apache Arrow?

Options:

A.

spark.conf.set("spark.pandas.arrow.enabled", "true")

B.

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

C.

spark.conf.set("spark.sql.execution.arrow.enabled", "true")

D.

spark.conf.set("spark.sql.arrow.pandas.enabled", "true")

Question 8

A Data Analyst needs to retrieve employees with 5 or more years of tenure.

Which code snippet filters and shows the list?

Options:

A.

employees_df.filter(employees_df.tenure >= 5).show()

B.

employees_df.where(employees_df.tenure >= 5)

C.

filter(employees_df.tenure >= 5)

D.

employees_df.filter(employees_df.tenure >= 5).collect()
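A minimal sketch of filtering and displaying rows, assuming a hypothetical employees_df:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
employees_df = spark.createDataFrame([("Ana", 7), ("Ben", 2)], ["name", "tenure"])

# filter() (alias: where()) keeps matching rows; show() prints them to the console
employees_df.filter(employees_df.tenure >= 5).show()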

Question 9

A data engineer is reviewing a Spark application that applies several transformations to a DataFrame but notices that the job does not start executing immediately.

Which two characteristics of Apache Spark's execution model explain this behavior?

Choose 2 answers:

Options:

A.

The Spark engine requires manual intervention to start executing transformations.

B.

Only actions trigger the execution of the transformation pipeline.

C.

Transformations are executed immediately to build the lineage graph.

D.

The Spark engine optimizes the execution plan during the transformations, causing delays.

E.

Transformations are evaluated lazily.
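A sketch illustrating lazy evaluation, assuming a hypothetical df with a numeric value column:

# Transformations only build up a logical plan; no job runs here
filtered = df.filter(df.value > 10).select("value")

# An action such as count() triggers the actual execution of the pipeline
print(filtered.count())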

Question 10

A data engineer observes that an upstream streaming source sends duplicate records, where duplicates share the same key and have at most a 30-minute difference in event_timestamp. The engineer adds:

dropDuplicatesWithinWatermark("event_timestamp", "30 minutes")

What is the result?

Options:

A.

It is not able to handle deduplication in this scenario

B.

It removes duplicates that arrive within the 30-minute window specified by the watermark

C.

It removes all duplicates regardless of when they arrive

D.

It accepts watermarks in seconds and the code results in an error
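For context, the usual deduplication pattern pairs a watermark with the deduplication call; a sketch assuming a hypothetical streaming_df with key and event_timestamp columns:

# Records sharing the same key that arrive within the 30-minute watermark
# delay are treated as duplicates and dropped
deduped = (
    streaming_df
    .withWatermark("event_timestamp", "30 minutes")
    .dropDuplicatesWithinWatermark(["key"])
)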

Question 11

A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.

How should this issue be resolved?

Options:

A.

Add more executor instances to the cluster

B.

Increase the driver memory on the client machine

C.

Switch the deployment mode to cluster mode

D.

Switch the deployment mode to local mode

Question 12

A data engineer is working on a Streaming DataFrame streaming_df with the given streaming data:

[Image: sample streaming data for Question 12]

Which operation is supported with streaming_df?

Options:

A.

streaming_df.select(countDistinct("Name"))

B.

streaming_df.groupby("Id").count()

C.

streaming_df.orderBy("timestamp").limit(4)

D.

streaming_df.filter(col("count") < 30).show()
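For reference, a sketch of a streaming aggregation written out with writeStream, assuming streaming_df is an unbounded streaming DataFrame with an Id column:

# Streaming aggregations are emitted via writeStream rather than show()
counts = streaming_df.groupBy("Id").count()

query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)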

Question 13

The following code fragment results in an error:

[Image: code fragment for Question 13]

Which code fragment should be used instead?

Options:

A) [Image: code fragment for option A]

B) [Image: code fragment for option B]

C) [Image: code fragment for option C]

D) [Image: code fragment for option D]

Question 14

You have:

DataFrame A: 128 GB of transactions

DataFrame B: 1 GB user lookup table

Which strategy is correct for broadcasting?

Options:

A.

DataFrame B should be broadcasted because it is smaller and will eliminate the need for shuffling itself

B.

DataFrame B should be broadcasted because it is smaller and will eliminate the need for shuffling DataFrame A

C.

DataFrame A should be broadcasted because it is larger and will eliminate the need for shuffling DataFrame B

D.

DataFrame A should be broadcasted because it is smaller and will eliminate the need for shuffling itself
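For context, a sketch of a broadcast join, assuming hypothetical DataFrames transactions_df (large) and users_df (small lookup) joined on user_id:

from pyspark.sql.functions import broadcast

# The small lookup table is copied to every executor, so the large
# transactions DataFrame does not need to be shuffled for the join
joined = transactions_df.join(broadcast(users_df), on="user_id", how="left")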

Question 15

A Spark developer is building an app to monitor task performance. They need to track the maximum task processing time per worker node and consolidate it on the driver for analysis.

Which technique should be used?

Options:

A.

Use an RDD action like reduce() to compute the maximum time

B.

Use an accumulator to record the maximum time on the driver

C.

Broadcast a variable to share the maximum time among workers

D.

Configure the Spark UI to automatically collect maximum times
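For reference, a sketch of consolidating a per-record metric on the driver with an RDD action, assuming a hypothetical task_times_rdd of numeric processing times:

# reduce() is an action: it aggregates across partitions and returns
# the result to the driver
max_time = task_times_rdd.reduce(lambda a, b: max(a, b))
print(max_time)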

Question 16

A data engineer wants to create a Streaming DataFrame that reads from a Kafka topic called feed.


Which code fragment should be inserted in line 5 to meet the requirement?

Code context:

spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .[LINE5] \
    .load()

Options:

A.

.option("subscribe", "feed")

B.

.option("subscribe.topic", "feed")

C.

.option("kafka.topic", "feed")

D.

.option("topic", "feed")

Question 17

A data scientist is working on a project that requires processing large amounts of structured data, performing SQL queries, and applying machine learning algorithms. The data scientist is considering using Apache Spark for this task.

Which combination of Apache Spark modules should the data scientist use in this scenario?

Options:

A.

Spark DataFrames, Structured Streaming, and GraphX

B.

Spark SQL, Pandas API on Spark, and Structured Streaming

C.

Spark Streaming, GraphX, and Pandas API on Spark

D.

Spark DataFrames, Spark SQL, and MLlib

Question 18

A data engineer wants to write a Spark job that creates a new managed table. If the table already exists, the job should fail and not modify anything.

Which save mode and method should be used?

Options:

A.

saveAsTable with mode ErrorIfExists

B.

saveAsTable with mode Overwrite

C.

save with mode Ignore

D.

save with mode ErrorIfExists
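A sketch of creating a managed table with the error-if-exists behavior, assuming a hypothetical table name:

# "errorifexists" (also spelled "error") is the default save mode:
# the write fails and leaves the table untouched if it already exists
df.write.mode("errorifexists").saveAsTable("sales.new_orders")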

Question 19

Given a CSV file with the content:

[Image: CSV file content for Question 19]

And the following code:

from pyspark.sql.types import *

schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType())
])

spark.read.schema(schema).csv(path).collect()

What is the resulting output?

Options:

A.

[Row(name='bambi'), Row(name='alladin', age=20)]

B.

[Row(name='alladin', age=20)]

C.

[Row(name='bambi', age=None), Row(name='alladin', age=20)]

D.

The code throws an error due to a schema mismatch.

Question 20

What is the behavior of the function date_sub(start, days) if a negative value is passed into the days parameter?

Options:

A.

The same start date will be returned

B.

An error message of an invalid parameter will be returned

C.

The number of days specified will be added to the start date

D.

The number of days specified will be removed from the start date
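For reference, a sketch of date_sub with a negative days value, assuming a hypothetical df with a start_date column:

from pyspark.sql import functions as F

# date_sub subtracts the given number of days, so a negative value
# moves the date forward instead
df.select(F.date_sub(F.col("start_date"), -5).alias("shifted_date")).show()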

Question 21

A Spark application developer wants to identify which operations cause shuffling, leading to a new stage in the Spark execution plan.

Which operation results in a shuffle and a new stage?

Options:

A.

DataFrame.groupBy().agg()

B.

DataFrame.filter()

C.

DataFrame.withColumn()

D.

DataFrame.select()
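For context, a sketch contrasting a wide (shuffle-inducing) and a narrow transformation, assuming a hypothetical df with customer_id and amount columns:

from pyspark.sql import functions as F

# groupBy().agg() must co-locate all rows with the same key,
# which triggers a shuffle and starts a new stage
totals = df.groupBy("customer_id").agg(F.sum("amount").alias("total"))

# filter() works partition-by-partition, so no shuffle is needed
positive = df.filter(F.col("amount") > 0)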

Question 22

A data engineer is building a Structured Streaming pipeline and wants the pipeline to recover from failures or intentional shutdowns by continuing where the pipeline left off.

How can this be achieved?

Options:

A.

By configuring the option checkpointLocation during readStream

B.

By configuring the option recoveryLocation during the SparkSession initialization

C.

By configuring the option recoveryLocation during writeStream

D.

By configuring the option checkpointLocation during writeStream
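For reference, a sketch of setting a checkpoint location on a streaming sink, assuming a hypothetical events_df and output/checkpoint paths:

query = (
    events_df.writeStream
    .format("parquet")
    .option("checkpointLocation", "/tmp/checkpoints/events")  # enables restart recovery
    .start("/tmp/output/events")
)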

Question 23

An engineer notices a significant increase in the job execution time during the execution of a Spark job. After some investigation, the engineer decides to check the logs produced by the Executors.

How should the engineer retrieve the Executor logs to diagnose performance issues in the Spark application?

Options:

A.

Locate the executor logs on the Spark master node, typically under the /tmp directory.

B.

Use the command spark-submit with the --verbose flag to print the logs to the console.

C.

Use the Spark UI to select the stage and view the executor logs directly from the stages tab.

D.

Fetch the logs by running a Spark job with the spark-sql CLI tool.

Question 24

What is a feature of Spark Connect?

Options:

A.

It supports DataStreamReader, DataStreamWriter, StreamingQuery, and Streaming APIs

B.

It supports the DataFrame, Functions, Column, and SparkContext PySpark APIs

C.

It supports only PySpark applications

D.

It has built-in authentication

Question 25

What is the benefit of Adaptive Query Execution (AQE)?

Options:

A.

It allows Spark to optimize the query plan before execution but does not adapt during runtime.

B.

It enables the adjustment of the query plan during runtime, handling skewed data, optimizing join strategies, and improving overall query performance.

C.

It optimizes query execution by parallelizing tasks and does not adjust strategies based on runtime metrics like data skew.

D.

It automatically distributes tasks across nodes in the clusters and does not perform runtime adjustments to the query plan.
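For context, a sketch of the configurations that control AQE (enabled by default in recent Spark versions):

# AQE re-optimizes the physical plan at runtime using shuffle statistics
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Optional knobs for skewed-join handling and post-shuffle partition coalescing
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")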

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.5-Python
Last Update: Jul 1, 2025
Questions: 85
