Which of the following code blocks returns a single-row DataFrame that only has a column corr which shows the Pearson correlation coefficient between columns predError and value in DataFrame
transactionsDf?
Which of the following code blocks applies the Python function to_limit on column predError in table transactionsDf, returning a DataFrame with columns transactionId and result?
Which of the following describes the difference between client and cluster execution modes?
The code block shown below should store DataFrame transactionsDf on two different executors, utilizing the executors' memory as much as possible, but not writing anything to disk. Choose the
answer that correctly fills the blanks in the code block to accomplish this.
from pyspark import StorageLevel
transactionsDf.__1__(StorageLevel.__2__).__3__
Which of the following code blocks creates a new 6-column DataFrame by appending the rows of the 6-column DataFrame yesterdayTransactionsDf to the rows of the 6-column DataFrame
todayTransactionsDf, ignoring that both DataFrames have different column names?
The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively.
Find the error.
Code block:
spark.createDataFrame([("red",), ("blue",), ("green",)], "color")
Instead of calling spark.createDataFrame, just DataFrame should be called.
The code block shown below should return a column that indicates through boolean variables whether rows in DataFrame transactionsDf have values greater than or equal to 20 and less than or equal to
30 in column storeId and have the value 2 in column productId. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__((__2__.__3__) __4__ (__5__))
Which of the following code blocks writes DataFrame itemsDf to disk at storage location filePath, making sure to substitute any existing data at that location?
The code block displayed below contains an error. The code block is intended to write DataFrame transactionsDf to disk as a parquet file in location /FileStore/transactions_split, using column
storeId as key for partitioning. Find the error.
Code block:
transactionsDf.write.format("parquet").partitionOn("storeId").save("/FileStore/transactions_split")
Which of the following code blocks returns a copy of DataFrame transactionsDf in which column productId has been renamed to productNumber?
In which order should the code blocks shown below be run in order to read a JSON file from location jsonPath into a DataFrame and return only the rows that do not have value 3 in column
productId?
1. importedDf.createOrReplaceTempView("importedDf")
2. spark.sql("SELECT * FROM importedDf WHERE productId != 3")
3. spark.sql("FILTER * FROM importedDf WHERE productId != 3")
4. importedDf = spark.read.option("format", "json").path(jsonPath)
5. importedDf = spark.read.json(jsonPath)
The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code
block to accomplish this.
spark.sql.shuffle.partitions
__1__.__2__.__3__(__4__, 100)
The code block shown below should return a DataFrame with columns transactionsId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the
code block to accomplish this.
transactionsDf.__1__(__2__)
Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?
Which of the following code blocks reads JSON file imports.json into a DataFrame?
The code block shown below should return a DataFrame with only columns from DataFrame transactionsDf for which there is a corresponding transactionId in DataFrame itemsDf. DataFrame
itemsDf is very small and much smaller than DataFrame transactionsDf. The query should be executed in an optimized way. Choose the answer that correctly fills the blanks in the code block to
accomplish this.
__1__.__2__(__3__, __4__, __5__)
The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code
block to accomplish this.
Code block:
transactionsDf.__1__(__2__).__3__
In which order should the code blocks shown below be run in order to create a table of all values in column attributes next to the respective values in column supplier in DataFrame itemsDf?
1. itemsDf.createOrReplaceView("itemsDf")
2. spark.sql("FROM itemsDf SELECT 'supplier', explode('Attributes')")
3. spark.sql("FROM itemsDf SELECT supplier, explode(attributes)")
4. itemsDf.createOrReplaceTempView("itemsDf")
The code block shown below should return a new 2-column DataFrame that shows one attribute from column attributes per row next to the associated itemName, for all suppliers in column supplier
whose name includes Sports. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Sample of DataFrame itemsDf:
+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+
Code block:
itemsDf.__1__(__2__).select(__3__, __4__)
Which of the following statements about garbage collection in Spark is incorrect?
The code block displayed below contains an error. The code block should return DataFrame transactionsDf, but with the column storeId renamed to storeNumber. Find the error.
Code block:
transactionsDf.withColumn("storeNumber", "storeId")
Which of the following code blocks returns a one-column DataFrame of all values in column supplier of DataFrame itemsDf that do not contain the letter X? In the DataFrame, every value should
only be listed once.
Sample of DataFrame itemsDf:
+------+--------------------+--------------------+-------------------+
|itemId|            itemName|          attributes|           supplier|
+------+--------------------+--------------------+-------------------+
|     1|Thick Coat for Wa...|[blue, winter, cozy]|Sports Company Inc.|
|     2|Elegant Outdoors ...|[red, summer, fre...|              YetiX|
|     3|   Outdoors Backpack|[green, summer, t...|Sports Company Inc.|
+------+--------------------+--------------------+-------------------+
Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?
Which of the following code blocks returns a copy of DataFrame transactionsDf where the column storeId has been converted to string type?
Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?
The code block shown below should return a copy of DataFrame transactionsDf without columns value and productId and with an additional column associateId that has the value 5. Choose the
answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__, __3__).__4__(__5__, 'value')
Which of the following code blocks returns a DataFrame where columns predError and productId are removed from DataFrame transactionsDf?
Sample of DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|f   |
+-------------+---------+-----+-------+---------+----+
|1            |3        |4    |25     |1        |null|
|2            |6        |7    |2      |2        |null|
|3            |3        |null |25     |3        |null|
+-------------+---------+-----+-------+---------+----+
The code block shown below should return a DataFrame with all columns of DataFrame transactionsDf, but at most 2 rows, in which column productId has a value of at least 2. Choose the
answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__).__3__
Which of the elements in the labeled panels represent the operation performed for broadcast variables?
Which of the following code blocks returns approximately 1000 rows, some of them potentially being duplicates, from the 2000-row DataFrame transactionsDf that only has unique rows?
Which of the following code blocks sorts DataFrame transactionsDf both by column storeId in ascending and by column productId in descending order, in this priority?
The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to
accomplish this.
transactionsDf.__1__(__2__.__3__(__4__))
Which of the following code blocks selects all rows from DataFrame transactionsDf in which column productId is zero or smaller or equal to 3?
The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before
2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.
Schema:
root
 |-- itemId: integer (nullable = true)
 |-- attributes: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- supplier: string (nullable = true)
Code block:
schema = StructType([
    StructType("itemId", IntegerType(), True),
    StructType("attributes", ArrayType(StringType(), True), True),
    StructType("supplier", StringType(), True)
])

spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)
Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?
Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?
The code block displayed below contains an error. The code block should configure Spark to split data in 20 parts when exchanging data between executors for joins or aggregations. Find the error.
Code block:
spark.conf.set(spark.sql.shuffle.partitions, 20)
Which of the following code blocks saves DataFrame transactionsDf in location /FileStore/transactions.csv as a CSV file and throws an error if a file already exists in the location?
Which of the following code blocks immediately removes the previously cached DataFrame transactionsDf from memory and disk?
The code block displayed below contains an error. The code block should return all rows of DataFrame transactionsDf, but including only columns storeId and predError. Find the error.
Code block:
spark.collect(transactionsDf.select("storeId", "predError"))
In which order should the code blocks shown below be run in order to return the number of records that are not empty in column value in the DataFrame resulting from an inner join of DataFrame
transactionsDf and itemsDf on columns productId and itemId, respectively?
1. .filter(~isnull(col('value')))
2. .count()
3. transactionsDf.join(itemsDf, col("transactionsDf.productId")==col("itemsDf.itemId"))
4. transactionsDf.join(itemsDf, transactionsDf.productId==itemsDf.itemId, how='inner')
5. .filter(col('value').isnotnull())
6. .sum(col('value'))
The code block shown below should add a column itemNameBetweenSeparators to DataFrame itemsDf. The column should contain arrays of at most 4 strings, composed of
the values in column itemName split at - or whitespace characters. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Sample of DataFrame itemsDf:
+------+----------------------------------+-------------------+
|itemId|itemName                          |supplier           |
+------+----------------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |YetiX              |
|3     |Outdoors Backpack                 |Sports Company Inc.|
+------+----------------------------------+-------------------+
Code block:
itemsDf.__1__(__2__, __3__(__4__, "[\s\-]", __5__))
The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that
correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)
The code block shown below should return the number of columns in the CSV file stored at location filePath. From the CSV file, only lines should be read that do not start with a # character. Choose
the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
__1__(__2__.__3__.csv(filePath, __4__).__5__)