
Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Questions 4

Which of the following code blocks returns a single-row DataFrame that only has a column corr which shows the Pearson correlation coefficient between columns predError and value in DataFrame

transactionsDf?

Options:

A.

transactionsDf.select(corr(["predError", "value"]).alias("corr")).first()

B.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr")).first()

C.

transactionsDf.select(corr(predError, value).alias("corr"))

D.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))

(Correct)

E.

transactionsDf.select(corr("predError", "value"))
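Note, as a minimal reference sketch based on the standard pyspark.sql.functions API and the transactionsDf DataFrame from the question: selecting the aggregate keeps the result as a single-row DataFrame, whereas appending .first() would return a Row object instead of a DataFrame.

from pyspark.sql.functions import corr, col

# single-row, single-column DataFrame with the column named "corr"
transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))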

Questions 5

Which of the following code blocks applies the Python function to_limit on column predError in table transactionsDf, returning a DataFrame with columns transactionId and result?

Options:

A.

1. spark.udf.register("LIMIT_FCN", to_limit)

2. spark.sql("SELECT transactionId, LIMIT_FCN(predError) AS result FROM transactionsDf")

(Correct)

B.

1. spark.udf.register("LIMIT_FCN", to_limit)

2. spark.sql("SELECT transactionId, LIMIT_FCN(predError) FROM transactionsDf AS result")

C.

1. spark.udf.register("LIMIT_FCN", to_limit)

2. spark.sql("SELECT transactionId, to_limit(predError) AS result FROM transactionsDf")

D.

spark.sql("SELECT transactionId, udf(to_limit(predError)) AS result FROM transactionsDf")

E.

1. spark.udf.register(to_limit, "LIMIT_FCN")

2. spark.sql("SELECT transactionId, LIMIT_FCN(predError) AS result FROM transactionsDf")
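For reference, an end-to-end sketch of registering a Python function as a SQL UDF; to_limit here is a hypothetical stand-in for the function from the question, and the temp view is created explicitly so the SQL query can resolve transactionsDf:

from pyspark.sql.types import IntegerType

def to_limit(x):  # hypothetical Python function standing in for the one from the question
    return None if x is None else min(x, 10)

spark.udf.register("LIMIT_FCN", to_limit, IntegerType())  # name first, then the function
transactionsDf.createOrReplaceTempView("transactionsDf")  # make the DataFrame queryable via SQL
spark.sql("SELECT transactionId, LIMIT_FCN(predError) AS result FROM transactionsDf")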

Questions 6

Which of the following describes the difference between client and cluster execution modes?

Options:

A.

In cluster mode, the driver runs on the worker nodes, while the client mode runs the driver on the client machine.

B.

In cluster mode, the driver runs on the edge node, while the client mode runs the driver in a worker node.

C.

In cluster mode, each node will launch its own executor, while in client mode, executors will exclusively run on the client machine.

D.

In client mode, the cluster manager runs on the same host as the driver, while in cluster mode, the cluster manager runs on a separate node.

E.

In cluster mode, the driver runs on the master node, while in client mode, the driver runs on a virtual machine in the cloud.

Questions 7

The code block shown below should store DataFrame transactionsDf on two different executors, utilizing the executors' memory as much as possible, but not writing anything to disk. Choose the

answer that correctly fills the blanks in the code block to accomplish this.

1. from pyspark import StorageLevel

2. transactionsDf.__1__(StorageLevel.__2__).__3__

Options:

A.

1. cache

2. MEMORY_ONLY_2

3. count()

B.

1. persist

2. DISK_ONLY_2

3. count()

C.

1. persist

2. MEMORY_ONLY_2

3. select()

D.

1. cache

2. DISK_ONLY_2

3. count()

E.

1. persist

2. MEMORY_ONLY_2

3. count()
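For context, a minimal sketch using the standard PySpark API (transactionsDf as in the question): a replicated in-memory storage level plus an action to materialize the cache.

from pyspark import StorageLevel

# MEMORY_ONLY_2 keeps the DataFrame in memory only, replicated on two executors
transactionsDf.persist(StorageLevel.MEMORY_ONLY_2)
transactionsDf.count()  # count() is an action, so the caching actually happens here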

Questions 8

Which of the following code blocks creates a new 6-column DataFrame by appending the rows of the 6-column DataFrame yesterdayTransactionsDf to the rows of the 6-column DataFrame

todayTransactionsDf, ignoring that both DataFrames have different column names?

Options:

A.

union(todayTransactionsDf, yesterdayTransactionsDf)

B.

todayTransactionsDf.unionByName(yesterdayTransactionsDf, allowMissingColumns=True)

C.

todayTransactionsDf.unionByName(yesterdayTransactionsDf)

D.

todayTransactionsDf.concat(yesterdayTransactionsDf)

E.

todayTransactionsDf.union(yesterdayTransactionsDf)
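As a reference sketch (standard PySpark API): union appends rows purely by column position, so differing column names are ignored, while unionByName matches columns by name and is therefore not suited to DataFrames whose column names differ.

# appends yesterdayTransactionsDf below todayTransactionsDf by position,
# keeping todayTransactionsDf's column names
combinedDf = todayTransactionsDf.union(yesterdayTransactionsDf)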

Questions 9

The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively.

Find the error.

Code block:

1. spark.createDataFrame([("red",), ("blue",), ("green",)], "color")

Options:

A.

The commas in the tuples with the colors should be eliminated.

B.

The colors red, blue, and green should be expressed as a simple Python list, and not a list of tuples.

C.

Instead of color, a data type should be specified.

D.

The "color" expression needs to be wrapped in brackets, so it reads ["color"].

E.

Instead of calling spark.createDataFrame, just DataFrame should be called.
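For reference, a minimal sketch (standard PySpark API) of the intended single-column DataFrame, passing the schema as a list with one column name:

# three rows, one string column named "color"
df = spark.createDataFrame([("red",), ("blue",), ("green",)], ["color"])
df.show()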

Questions 10

The code block shown below should return a column that indicates through boolean variables whether rows in DataFrame transactionsDf have values greater or equal to 20 and smaller or equal to

30 in column storeId and have the value 2 in column productId. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__((__2__.__3__) __4__ (__5__))

Options:

A.

1. select

2. col("storeId")

3. between(20, 30)

4. and

5. col("productId")==2

B.

1. where

2. col("storeId")

3. geq(20).leq(30)

4. &

5. col("productId")==2

C.

1. select

2. "storeId"

3. between(20, 30)

4. &&

5. col("productId")==2

D.

1. select

2. col("storeId")

3. between(20, 30)

4. &&

5. col("productId")=2

E.

1. select

2. col("storeId")

3. between(20, 30)

4. &

5. col("productId")==2
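As a reference sketch (standard PySpark API): between() yields a boolean Column, and boolean Columns are combined with the bitwise & operator, with each condition wrapped in parentheses.

from pyspark.sql.functions import col

# one boolean column: True where 20 <= storeId <= 30 and productId == 2
transactionsDf.select((col("storeId").between(20, 30)) & (col("productId") == 2))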

Questions 11

Which of the following statements about storage levels is incorrect?

Options:

A.

The cache operator on DataFrames is evaluated like a transformation.

B.

In client mode, DataFrames cached with the MEMORY_ONLY_2 level will not be stored in the edge node's memory.

C.

Caching can be undone using the DataFrame.unpersist() operator.

D.

MEMORY_AND_DISK replicates cached DataFrames both on memory and disk.

E.

DISK_ONLY will not use the worker node's memory.

Questions 12

Which of the following code blocks writes DataFrame itemsDf to disk at storage location filePath, making sure to substitute any existing data at that location?

Options:

A.

itemsDf.write.mode("overwrite").parquet(filePath)

B.

itemsDf.write.option("parquet").mode("overwrite").path(filePath)

C.

itemsDf.write(filePath, mode="overwrite")

D.

itemsDf.write.mode("overwrite").path(filePath)

E.

itemsDf.write().parquet(filePath, mode="overwrite")
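For reference, a minimal sketch (standard DataFrameWriter API; itemsDf and filePath as in the question) of overwriting existing data at the target location:

# mode("overwrite") replaces whatever is already stored at filePath;
# .parquet(filePath) is shorthand for .format("parquet").save(filePath)
itemsDf.write.mode("overwrite").parquet(filePath)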

Questions 13

The code block displayed below contains an error. The code block is intended to write DataFrame transactionsDf to disk as a parquet file in location /FileStore/transactions_split, using column

storeId as key for partitioning. Find the error.

Code block:

transactionsDf.write.format("parquet").partitionOn("storeId").save("/FileStore/transactions_split")

Options:

A.

The format("parquet") expression is inappropriate to use here; "parquet" should be passed as the first argument to the save() operator and "/FileStore/transactions_split" as the second argument.

B.

Partitioning data by storeId is possible with the partitionBy expression, so partitionOn should be replaced by partitionBy.

C.

Partitioning data by storeId is possible with the bucketBy expression, so partitionOn should be replaced by bucketBy.

D.

partitionOn("storeId") should be called before the write operation.

E.

The format("parquet") expression should be removed and instead, the information should be added to the write expression like so: write("parquet").
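For reference, a minimal sketch (standard DataFrameWriter API) of partitioned parquet output; partitionBy creates one subdirectory per distinct value of the partitioning column:

transactionsDf.write.format("parquet").partitionBy("storeId").save("/FileStore/transactions_split")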

Questions 14

Which of the following code blocks returns a copy of DataFrame transactionsDf in which column productId has been renamed to productNumber?

Options:

A.

transactionsDf.withColumnRenamed("productId", "productNumber")

B.

transactionsDf.withColumn("productId", "productNumber")

C.

transactionsDf.withColumnRenamed("productNumber", "productId")

D.

transactionsDf.withColumnRenamed(col(productId), col(productNumber))

E.

transactionsDf.withColumnRenamed(productId, productNumber)

Questions 15

In which order should the code blocks shown below be run in order to read a JSON file from location jsonPath into a DataFrame and return only the rows that do not have value 3 in column

productId?

1. importedDf.createOrReplaceTempView("importedDf")

2. spark.sql("SELECT * FROM importedDf WHERE productId != 3")

3. spark.sql("FILTER * FROM importedDf WHERE productId != 3")

4. importedDf = spark.read.option("format", "json").path(jsonPath)

5. importedDf = spark.read.json(jsonPath)

Options:

A.

4, 1, 2

B.

5, 1, 3

C.

5, 2

D.

4, 1, 3

E.

5, 1, 2

Questions 16

The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code

block to accomplish this.

Code block:

__1__.__2__.__3__(__4__, 100)

Options:

A.

1. spark

2. conf

3. set

4. "spark.sql.shuffle.partitions"

B.

1. pyspark

2. config

3. set

4. spark.shuffle.partitions

C.

1. spark

2. conf

3. get

4. "spark.sql.shuffle.partitions"

D.

1. pyspark

2. config

3. set

4. "spark.sql.shuffle.partitions"

E.

1. spark

2. conf

3. set

4. "spark.sql.aggregate.partitions"
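For reference, a minimal sketch (standard SparkSession.conf API) of setting the shuffle-partition count at runtime; the configuration key is passed as a string:

spark.conf.set("spark.sql.shuffle.partitions", 100)  # partitions used when shuffling for joins/aggregations
spark.conf.get("spark.sql.shuffle.partitions")       # returns "100"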

Questions 17

The code block shown below should return a DataFrame with columns transactionId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the

code block to accomplish this.

transactionsDf.__1__(__2__)

Options:

A.

1. filter

2. "transactionId", "predError", "value", "f"

B.

1. select

2. "transactionId, predError, value, f"

C.

1. select

2. ["transactionId", "predError", "value", "f"]

D.

1. where

2. col("transactionId"), col("predError"), col("value"), col("f")

E.

1. select

2. col(["transactionId", "predError", "value", "f"])

Questions 18

Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?

Options:

A.

transactionsDf.sort(asc(value)).show(10)

B.

transactionsDf.sort(col("value")).show(10)

C.

transactionsDf.sort(col("value").desc()).head()

D.

transactionsDf.sort(col("value").asc()).print(10)

E.

transactionsDf.orderBy("value").asc().show(10)

Questions 19

Which of the following code blocks reads JSON file imports.json into a DataFrame?

Options:

A.

spark.read().mode("json").path("/FileStore/imports.json")

B.

spark.read.format("json").path("/FileStore/imports.json")

C.

spark.read("json", "/FileStore/imports.json")

D.

spark.read.json("/FileStore/imports.json")

E.

spark.read().json("/FileStore/imports.json")

Questions 20

The code block shown below should return a DataFrame with only columns from DataFrame transactionsDf for which there is a corresponding transactionId in DataFrame itemsDf. DataFrame

itemsDf is very small and much smaller than DataFrame transactionsDf. The query should be executed in an optimized way. Choose the answer that correctly fills the blanks in the code block to

accomplish this.

__1__.__2__(__3__, __4__, __5__)

Options:

A.

1. transactionsDf

2. join

3. broadcast(itemsDf)

4. transactionsDf.transactionId==itemsDf.transactionId

5. "outer"

B.

1. transactionsDf

2. join

3. itemsDf

4. transactionsDf.transactionId==itemsDf.transactionId

5. "anti"

C.

1. transactionsDf

2. join

3. broadcast(itemsDf)

4. "transactionId"

5. "left_semi"

D.

1. itemsDf

2. broadcast

3. transactionsDf

4. "transactionId"

5. "left_semi"

E.

1. itemsDf

2. join

3. broadcast(transactionsDf)

4. "transactionId"

5. "left_semi"
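As a reference sketch (standard PySpark API; broadcast comes from pyspark.sql.functions): a left-semi join keeps only rows and columns of the left DataFrame that have a match on the right, and broadcasting the small DataFrame avoids shuffling the large one.

from pyspark.sql.functions import broadcast

# keeps only transactionsDf's columns, for rows whose transactionId also appears in itemsDf
transactionsDf.join(broadcast(itemsDf), "transactionId", "left_semi")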

Questions 21

The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code

block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__

Options:

A.

1. select

2. "storeId"

3. print_schema()

B.

1. limit

2. 1

3. columns

C.

1. select

2. "storeId"

3. printSchema()

D.

1. limit

2. "storeId"

3. printSchema()

E.

1. select

2. storeId

3. dtypes

Questions 22

In which order should the code blocks shown below be run in order to create a table of all values in column attributes next to the respective values in column supplier in DataFrame itemsDf?

1. itemsDf.createOrReplaceView("itemsDf")

2. spark.sql("FROM itemsDf SELECT 'supplier', explode('Attributes')")

3. spark.sql("FROM itemsDf SELECT supplier, explode(attributes)")

4. itemsDf.createOrReplaceTempView("itemsDf")

Options:

A.

4, 3

B.

1, 3

C.

2

D.

4, 2

E.

1, 2

Questions 23

The code block shown below should return a new 2-column DataFrame that shows one attribute from column attributes per row next to the associated itemName, for all suppliers in column supplier

whose name includes Sports. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Sample of DataFrame itemsDf:

1. +------+----------------------------------+-----------------------------+-------------------+

2. |itemId|itemName |attributes |supplier |

3. +------+----------------------------------+-----------------------------+-------------------+

4. |1 |Thick Coat for Walking in the Snow|[blue, winter, cozy] |Sports Company Inc.|

5. |2 |Elegant Outdoors Summer Dress |[red, summer, fresh, cooling]|YetiX |

6. |3 |Outdoors Backpack |[green, summer, travel] |Sports Company Inc.|

7. +------+----------------------------------+-----------------------------+-------------------+

Code block:

itemsDf.__1__(__2__).select(__3__, __4__)

Options:

A.

1. filter

2. col("supplier").isin("Sports")

3. "itemName"

4. explode(col("attributes"))

B.

1. where

2. col("supplier").contains("Sports")

3. "itemName"

4. "attributes"

C.

1. where

2. col(supplier).contains("Sports")

3. explode(attributes)

4. itemName

D.

1. where

2. "Sports".isin(col("Supplier"))

3. "itemName"

4. array_explode("attributes")

E.

1. filter

2. col("supplier").contains("Sports")

3. "itemName"

4. explode("attributes")

Questions 24

Which of the following statements about garbage collection in Spark is incorrect?

Options:

A.

Garbage collection information can be accessed in the Spark UI's stage detail view.

B.

Optimizing garbage collection performance in Spark may limit caching ability.

C.

Manually persisting RDDs in Spark prevents them from being garbage collected.

D.

In Spark, using the G1 garbage collector is an alternative to using the default Parallel garbage collector.

E.

Serialized caching is a strategy to increase the performance of garbage collection.

Questions 25

Which of the following describes properties of a shuffle?

Options:

A.

Operations involving shuffles are never evaluated lazily.

B.

Shuffles involve only single partitions.

C.

Shuffles belong to a class known as "full transformations".

D.

A shuffle is one of many actions in Spark.

E.

In a shuffle, Spark writes data to disk.

Questions 26

Which of the following statements about data skew is incorrect?

Options:

A.

Spark will not automatically optimize skew joins by default.

B.

Broadcast joins are a viable way to increase join performance for skewed data over sort-merge joins.

C.

In skewed DataFrames, the largest and the smallest partition consume very different amounts of memory.

D.

To mitigate skew, Spark automatically disregards null values in keys when joining.

E.

Salting can resolve data skew.

Questions 27

The code block displayed below contains an error. The code block should return DataFrame transactionsDf, but with the column storeId renamed to storeNumber. Find the error.

Code block:

transactionsDf.withColumn("storeNumber", "storeId")

Options:

A.

Instead of withColumn, the withColumnRenamed method should be used.

B.

Arguments "storeNumber" and "storeId" each need to be wrapped in a col() operator.

C.

Argument "storeId" should be the first and argument "storeNumber" should be the second argument to the withColumn method.

D.

The withColumn operator should be replaced with the copyDataFrame operator.

E.

Instead of withColumn, the withColumnRenamed method should be used and argument "storeId" should be the first and argument "storeNumber" should be the second argument to that method.

Questions 28

Which of the following code blocks returns a one-column DataFrame of all values in column supplier of DataFrame itemsDf that do not contain the letter X? In the DataFrame, every value should

only be listed once.

Sample of DataFrame itemsDf:

1. +------+--------------------+--------------------+-------------------+

2. |itemId| itemName| attributes| supplier|

3. +------+--------------------+--------------------+-------------------+

4. | 1|Thick Coat for Wa...|[blue, winter, cozy]|Sports Company Inc.|

5. | 2|Elegant Outdoors ...|[red, summer, fre...| YetiX|

6. | 3| Outdoors Backpack|[green, summer, t...|Sports Company Inc.|

7. +------+--------------------+--------------------+-------------------+

Options:

A.

itemsDf.filter(col(supplier).not_contains('X')).select(supplier).distinct()

B.

itemsDf.select(~col('supplier').contains('X')).distinct()

C.

itemsDf.filter(not(col('supplier').contains('X'))).select('supplier').unique()

D.

itemsDf.filter(~col('supplier').contains('X')).select('supplier').distinct()

E.

itemsDf.filter(!col('supplier').contains('X')).select(col('supplier')).unique()
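As a reference sketch (standard PySpark API; itemsDf as in the sample): Column.contains gives a boolean Column, ~ negates it, and distinct() removes duplicate values.

from pyspark.sql.functions import col

# one column, one row per distinct supplier whose name does not contain "X"
itemsDf.filter(~col("supplier").contains("X")).select("supplier").distinct()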

Questions 29

Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?

Options:

A.

spark.mode("parquet").read("/FileStore/imports.parquet")

B.

spark.read.path("/FileStore/imports.parquet", source="parquet")

C.

spark.read().parquet("/FileStore/imports.parquet")

D.

spark.read.parquet("/FileStore/imports.parquet")

E.

spark.read().format('parquet').open("/FileStore/imports.parquet")

Questions 30

Which of the following code blocks returns a copy of DataFrame transactionsDf where the column storeId has been converted to string type?

Options:

A.

transactionsDf.withColumn("storeId", convert("storeId", "string"))

B.

transactionsDf.withColumn("storeId", col("storeId", "string"))

C.

transactionsDf.withColumn("storeId", col("storeId").convert("string"))

D.

transactionsDf.withColumn("storeId", col("storeId").cast("string"))

E.

transactionsDf.withColumn("storeId", convert("storeId").as("string"))
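For reference, a minimal sketch (standard PySpark API): Column.cast accepts a type-name string or a DataType instance, and withColumn with an existing column name replaces that column.

from pyspark.sql.functions import col

transactionsDf.withColumn("storeId", col("storeId").cast("string"))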

Questions 31

Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?

Options:

A.

transactionsDf.dropna("any")

B.

transactionsDf.dropna(thresh=4)

C.

transactionsDf.drop.na("", 2)

D.

transactionsDf.dropna(thresh=2)

E.

transactionsDf.dropna("", 4)
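For context (standard DataFrame.dropna semantics): thresh is the minimum number of non-null values a row must have to be kept, so in a 6-column DataFrame a row missing at least 3 values has at most 3 non-null values and is removed by thresh=4.

# keeps only rows with at least 4 non-null values out of the 6 columns
transactionsDf.dropna(thresh=4)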

Questions 32

The code block shown below should return a copy of DataFrame transactionsDf without columns value and productId and with an additional column associateId that has the value 5. Choose the

answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, __3__).__4__(__5__, 'value')

Options:

A.

1. withColumn

2. 'associateId'

3. 5

4. remove

5. 'productId'

B.

1. withNewColumn

2. associateId

3. lit(5)

4. drop

5. productId

C.

1. withColumn

2. 'associateId'

3. lit(5)

4. drop

5. 'productId'

D.

1. withColumnRenamed

2. 'associateId'

3. 5

4. drop

5. 'productId'

E.

1. withColumn

2. col(associateId)

3. lit(5)

4. drop

5. col(productId)

Questions 33

Which of the following code blocks returns a DataFrame where columns predError and productId are removed from DataFrame transactionsDf?

Sample of DataFrame transactionsDf:

1. +-------------+---------+-----+-------+---------+----+

2. |transactionId|predError|value|storeId|productId|f |

3. +-------------+---------+-----+-------+---------+----+

4. |1 |3 |4 |25 |1 |null|

5. |2 |6 |7 |2 |2 |null|

6. |3 |3 |null |25 |3 |null|

7. +-------------+---------+-----+-------+---------+----+

Options:

A.

transactionsDf.withColumnRemoved("predError", "productId")

B.

transactionsDf.drop(["predError", "productId", "associateId"])

C.

transactionsDf.drop("predError", "productId", "associateId")

D.

transactionsDf.dropColumns("predError", "productId", "associateId")

E.

transactionsDf.drop(col("predError", "productId"))

Questions 34

The code block shown below should return a DataFrame with all columns of DataFrame transactionsDf, but only maximum 2 rows in which column productId has at least the value 2. Choose the

answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__).__3__

Options:

A.

1. where

2. "productId" > 2

3. max(2)

B.

1. where

2. transactionsDf[productId] >= 2

3. limit(2)

C.

1. filter

2. productId > 2

3. max(2)

D.

1. filter

2. col("productId") >= 2

3. limit(2)

E.

1. where

2. productId >= 2

3. limit(2)

Questions 35

Which of the elements in the labeled panels represent the operation performed for broadcast variables?


Options:

A.

2, 5

B.

3

C.

2, 3

D.

1, 2

E.

1, 3, 4

Questions 36

Which of the following code blocks returns approximately 1000 rows, some of them potentially being duplicates, from the 2000-row DataFrame transactionsDf that only has unique rows?

Options:

A.

transactionsDf.sample(True, 0.5)

B.

transactionsDf.take(1000).distinct()

C.

transactionsDf.sample(False, 0.5)

D.

transactionsDf.take(1000)

E.

transactionsDf.sample(True, 0.5, force=True)

Questions 37

Which of the following code blocks sorts DataFrame transactionsDf both by column storeId in ascending and by column productId in descending order, in this priority?

Options:

A.

transactionsDf.sort("storeId", asc("productId"))

B.

transactionsDf.sort(col(storeId)).desc(col(productId))

C.

transactionsDf.order_by(col(storeId), desc(col(productId)))

D.

transactionsDf.sort("storeId", desc("productId"))

E.

transactionsDf.sort("storeId").sort(desc("productId"))

Questions 38

The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to

accomplish this.

transactionsDf.__1__(__2__.__3__(__4__))

Options:

A.

1. select

2. col("storeId")

3. cast

4. StringType

B.

1. select

2. col("storeId")

3. as

4. StringType

C.

1. cast

2. "storeId"

3. as

4. StringType()

D.

1. select

2. col("storeId")

3. cast

4. StringType()

E.

1. select

2. storeId

3. cast

4. StringType()
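For reference, a minimal sketch (standard PySpark API; StringType lives in pyspark.sql.types) of casting inside a select so that only the converted column is returned:

from pyspark.sql.functions import col
from pyspark.sql.types import StringType

# one-column DataFrame with storeId converted to string
transactionsDf.select(col("storeId").cast(StringType()))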

Questions 39

Which of the following code blocks selects all rows from DataFrame transactionsDf in which column productId is zero or smaller, or equal to 3?

Options:

A.

transactionsDf.filter(productId==3 or productId < 1)

B.

transactionsDf.filter((col("productId")==3) or (col("productId") < 1))

C.

transactionsDf.filter(col("productId")==3 | col("productId") < 1)

D.

transactionsDf.where("productId"=3).or("productId" < 1))

E.

transactionsDf.filter((col("productId")==3) | (col("productId") < 1))

Questions 40

The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before

2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.

Schema:

1. root

2. |-- itemId: integer (nullable = true)

3. |-- attributes: array (nullable = true)

4. | |-- element: string (containsNull = true)

5. |-- supplier: string (nullable = true)

Code block:

1. schema = StructType([

2. StructType("itemId", IntegerType(), True),

3. StructType("attributes", ArrayType(StringType(), True), True),

4. StructType("supplier", StringType(), True)

5. ])

6.

7. spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)

Options:

A.

The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.

B.

Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.

C.

The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.

D.

Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.

E.

Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.

Questions 41

Which of the following describes Spark actions?

Options:

A.

Writing data to disk is the primary purpose of actions.

B.

Actions are Spark's way of exchanging data between executors.

C.

The driver receives data upon request by actions.

D.

Stage boundaries are commonly established by actions.

E.

Actions are Spark's way of modifying RDDs.

Questions 42

Which of the following describes Spark's standalone deployment mode?

Options:

A.

Standalone mode uses a single JVM to run Spark driver and executor processes.

B.

Standalone mode means that the cluster does not contain the driver.

C.

Standalone mode is how Spark runs on YARN and Mesos clusters.

D.

Standalone mode uses only a single executor per worker per application.

E.

Standalone mode is a viable solution for clusters that run multiple frameworks, not only Spark.

Questions 43

Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?

Options:

A.

spark.read.schema(fileSchema).format("parquet").load(filePath)

B.

spark.read.schema("fileSchema").format("parquet").load(filePath)

C.

spark.read().schema(fileSchema).parquet(filePath)

D.

spark.read().schema(fileSchema).format(parquet).load(filePath)

E.

spark.read.schema(fileSchema).open(filePath)

Questions 44

Which of the following statements about DAGs is correct?

Options:

A.

DAGs help direct how Spark executors process tasks, but are a limitation to the proper execution of a query when an executor fails.

B.

DAG stands for "Directing Acyclic Graph".

C.

Spark strategically hides DAGs from developers, since the high degree of automation in Spark means that developers never need to consider DAG layouts.

D.

In contrast to transformations, DAGs are never lazily executed.

E.

DAGs can be decomposed into tasks that are executed in parallel.

Questions 45

Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?

Options:

A.

transactionsDf.groupBy(col(storeId).avg())

B.

transactionsDf.groupBy("storeId").avg(col("value"))

C.

transactionsDf.groupBy("storeId").agg(avg("value"))

D.

transactionsDf.groupBy("storeId").agg(average("value"))

E.

transactionsDf.groupBy("value").average()
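For reference, a minimal sketch (avg comes from pyspark.sql.functions) of grouping and aggregating the mean:

from pyspark.sql.functions import avg

# one row per storeId with the mean of column "value"
transactionsDf.groupBy("storeId").agg(avg("value"))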

Questions 46

The code block displayed below contains an error. The code block should configure Spark to split data in 20 parts when exchanging data between executors for joins or aggregations. Find the error.

Code block:

spark.conf.set(spark.sql.shuffle.partitions, 20)

Options:

A.

The code block uses the wrong command for setting an option.

B.

The code block sets the wrong option.

C.

The code block expresses the option incorrectly.

D.

The code block sets the incorrect number of parts.

E.

The code block is missing a parameter.

Questions 47

Which of the following code blocks saves DataFrame transactionsDf in location /FileStore/transactions.csv as a CSV file and throws an error if a file already exists in the location?

Options:

A.

transactionsDf.write.save("/FileStore/transactions.csv")

B.

transactionsDf.write.format("csv").mode("error").path("/FileStore/transactions.csv")

C.

transactionsDf.write.format("csv").mode("ignore").path("/FileStore/transactions.csv")

D.

transactionsDf.write("csv").mode("error").save("/FileStore/transactions.csv")

E.

transactionsDf.write.format("csv").mode("error").save("/FileStore/transactions.csv")

Questions 48

Which of the following code blocks immediately removes the previously cached DataFrame transactionsDf from memory and disk?

Options:

A.

array_remove(transactionsDf, "*")

B.

transactionsDf.unpersist()

(Correct)

C.

del transactionsDf

D.

transactionsDf.clearCache()

E.

transactionsDf.persist()

Questions 49

The code block displayed below contains an error. The code block should return all rows of DataFrame transactionsDf, but including only columns storeId and predError. Find the error.

Code block:

spark.collect(transactionsDf.select("storeId", "predError"))

Options:

A.

Instead of select, DataFrame transactionsDf needs to be filtered using the filter operator.

B.

Columns storeId and predError need to be represented as a Python list, so they need to be wrapped in brackets ([]).

C.

The take method should be used instead of the collect method.

D.

Instead of collect, collectAsRows needs to be called.

E.

The collect method is not a method of the SparkSession object.

Questions 50

In which order should the code blocks shown below be run in order to return the number of records that are not empty in column value in the DataFrame resulting from an inner join of DataFrame

transactionsDf and itemsDf on columns productId and itemId, respectively?

1. .filter(~isnull(col('value')))

2. .count()

3. transactionsDf.join(itemsDf, col("transactionsDf.productId")==col("itemsDf.itemId"))

4. transactionsDf.join(itemsDf, transactionsDf.productId==itemsDf.itemId, how='inner')

5. .filter(col('value').isnotnull())

6. .sum(col('value'))

Options:

A.

4, 1, 2

B.

3, 1, 6

C.

3, 1, 2

D.

3, 5, 2

E.

4, 6
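For reference, a minimal sketch (standard PySpark API; col and isnull come from pyspark.sql.functions) of one such chain, counting non-null values of column value after an inner join of the two DataFrames:

from pyspark.sql.functions import col, isnull

(transactionsDf
    .join(itemsDf, transactionsDf.productId == itemsDf.itemId, how="inner")
    .filter(~isnull(col("value")))  # keep rows where value is not null
    .count())                       # the action that returns the number of rows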

Questions 51

The code block shown below should add a column itemNameBetweenSeparators to DataFrame itemsDf. The column should contain arrays of maximum 4 strings. The arrays should be composed of

the values in column itemName, which are separated at - or whitespace characters. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Sample of DataFrame itemsDf:

1. +------+----------------------------------+-------------------+

2. |itemId|itemName |supplier |

3. +------+----------------------------------+-------------------+

4. |1 |Thick Coat for Walking in the Snow|Sports Company Inc.|

5. |2 |Elegant Outdoors Summer Dress |YetiX |

6. |3 |Outdoors Backpack |Sports Company Inc.|

7. +------+----------------------------------+-------------------+

Code block:

itemsDf.__1__(__2__, __3__(__4__, "[\s\-]", __5__))

Options:

A.

1. withColumn

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 4

(Correct)

B.

1. withColumnRenamed

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 4

C.

1. withColumnRenamed

2. "itemName"

3. split

4. "itemNameBetweenSeparators"

5. 4

D.

1. withColumn

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 5

E.

1. withColumn

2. itemNameBetweenSeparators

3. str_split

4. "itemName"

5. 5
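For reference, a minimal sketch (standard pyspark.sql.functions.split; itemsDf as in the sample): the third argument of split is a limit, so a limit of 4 caps each resulting array at 4 elements.

from pyspark.sql.functions import split

# arrays of at most 4 substrings, split at "-" or whitespace characters
itemsDf.withColumn("itemNameBetweenSeparators", split("itemName", r"[\s\-]", 4))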

Questions 52

Which of the following is the idea behind dynamic partition pruning in Spark?

Options:

A.

Dynamic partition pruning is intended to skip over the data you do not need in the results of a query.

B.

Dynamic partition pruning concatenates columns of similar data types to optimize join performance.

C.

Dynamic partition pruning performs wide transformations on disk instead of in memory.

D.

Dynamic partition pruning reoptimizes physical plans based on data types and broadcast variables.

E.

Dynamic partition pruning reoptimizes query plans based on runtime statistics collected during query execution.

Questions 53

The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that

correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)

Options:

A.

1. save

2. mode

3. "ignore"

4. "compression"

5. path

B.

1. store

2. with

3. "replacement"

4. "compression"

5. path

C.

1. write

2. mode

3. "overwrite"

4. "compression"

5. save

(Correct)

D.

1. save

2. mode

3. "replace"

4. "compression"

5. path

E.

1. write

2. mode

3. "overwrite"

4. compression

5. parquet

Questions 54

The code block shown below should return the number of columns in the CSV file stored at location filePath. From the CSV file, only lines should be read that do not start with a # character. Choose

the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__(__2__.__3__.csv(filePath, __4__).__5__)

Options:

A.

1. size

2. spark

3. read()

4. escape='#'

5. columns

B.

1. DataFrame

2. spark

3. read()

4. escape='#'

5. shape[0]

C.

1. len

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

D.

1. size

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

E.

1. len

2. spark

3. read

4. comment='#'

5. columns
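For reference, a minimal sketch (standard DataFrameReader.csv; filePath as in the question): the comment option makes the reader skip lines starting with that character, columns returns the column names, and len counts them.

# number of columns in the CSV, ignoring lines that start with #
len(spark.read.csv(filePath, comment='#').columns)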

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Last Update: May 7, 2026
Questions: 180
