
Databricks-Machine-Learning-Associate: Databricks Certified Machine Learning Associate Exam Questions and Answers

Question 4

A data scientist is using MLflow to track their machine learning experiment. As part of each of their MLflow runs, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values. All parent and child runs are being started manually with mlflow.start_run.

Which of the following approaches can the data scientist use to accomplish this MLflow run organization?

Options:

A.

They can turn on Databricks Autologging

B.

They can specify nested=True when starting the child run for each unique combination of hyperparameter values

C.

They can start each child run inside the parent run's indented code block using mlflow.start_run()

D.

They can start each child run with the same experiment ID as the parent run

E.

They can specify nested=True when starting the parent run for the tuning process

Question 5

A machine learning engineer would like to develop a linear regression model with Spark ML to predict the price of a hotel room. They are using the Spark DataFrame train_df to train the model.

The Spark DataFrame train_df has the following schema:

[Image not available: schema of train_df]

The machine learning engineer shares the following code block:

[Image not available: the engineer's code block]

Which of the following changes does the machine learning engineer need to make to complete the task?

Options:

A.

They need to call the transform method on train_df

B.

They need to convert the features column to be a vector

C.

They do not need to make any changes

D.

They need to utilize a Pipeline to fit the model

E.

They need to split the features column out into one column for each feature

Question 6

A machine learning engineer is using the following code block to scale the inference of a single-node model on a Spark DataFrame with one million records:

[Image not available: the code block]

Assuming the default Spark configuration is in place, which of the following is a benefit of using an Iterator?

Options:

A.

The data will be limited to a single executor preventing the model from being loaded multiple times

B.

The model will be limited to a single executor preventing the data from being distributed

C.

The model only needs to be loaded once per executor rather than once per batch during the inference process

D.

The data will be distributed across multiple executors during the inference process
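
The iterator variant of a pandas UDF (type hints `Iterator[pd.Series] -> Iterator[pd.Series]` in PySpark) receives a stream of batches, so expensive setup such as loading a model can run once before the loop rather than once per batch. The sketch below simulates only that control flow in plain Python, with no Spark involved; `load_model`, the toy model, and the `LOADS` counter are hypothetical stand-ins.

```python
from typing import Callable, Iterator, List

# Counts how many times the stand-in model is "loaded" (illustration only).
LOADS = {"count": 0}


def load_model() -> Callable[[float], float]:
    """Hypothetical stand-in for an expensive model load; not a real API."""
    LOADS["count"] += 1
    return lambda x: 2.0 * x + 1.0  # toy "model"


def predict_iterator(batches: Iterator[List[float]]) -> Iterator[List[float]]:
    # Iterator pattern: load the model once, then reuse it for every batch.
    model = load_model()
    for batch in batches:
        yield [model(x) for x in batch]


def predict_per_batch(batches: Iterator[List[float]]) -> Iterator[List[float]]:
    # For comparison: a per-batch function reloads the model for each batch.
    for batch in batches:
        model = load_model()
        yield [model(x) for x in batch]
```

Run on three batches, `predict_iterator` loads the stand-in model once and `predict_per_batch` loads it three times, while both return the same predictions.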

Question 7

The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.

Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?

Options:

A.

Logistic regression

B.

Spark ML cannot distribute linear regression training

C.

Iterative optimization

D.

Least-squares method

E.

Singular value decomposition
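
Iterative optimization replaces a single closed-form solve with many cheap update steps, and each step only needs sums over the rows, which is what makes it straightforward to distribute. The sketch below uses plain batch gradient descent on a one-feature problem in pure Python; it illustrates the idea only and is not Spark ML's actual solver (Spark ML's `LinearRegression` exposes a `solver` parameter with options such as `"normal"` and `"l-bfgs"`). The function name, learning rate, and step count are assumptions chosen for this toy data.

```python
from typing import List, Tuple


def gradient_descent_linreg(
    xs: List[float], ys: List[float], lr: float = 0.05, steps: int = 2000
) -> Tuple[float, float]:
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Each gradient is a sum over rows; in a distributed setting,
        # partial sums could be computed per partition and then combined.
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) * 2 / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = gradient_descent_linreg([0.0, 1.0, 2.0, 3.0, 4.0], [2.0, 5.0, 8.0, 11.0, 14.0])
print(round(w, 3), round(b, 3))  # 3.0 2.0
```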

Question 8

A data scientist wants to tune a set of hyperparameters for a machine learning model. They have wrapped a Spark ML model in the objective function objective_function, and they have defined the search space search_space.

As a result, they have the following code block:

[Image not available: the code block]

Which of the following changes do they need to make to the above code block in order to accomplish the task?

Options:

A.

Change SparkTrials() to Trials()

B.

Reduce num_evals to be less than 10

C.

Change fmin() to fmax()

D.

Remove the trials=trials argument

E.

Remove the algo=tpe.suggest argument

Question 9

Which of the following hyperparameter optimization methods automatically makes informed selections of hyperparameter values based on previous trials for each iterative model evaluation?

Options:

A.

Random Search

B.

Halving Random Search

C.

Tree of Parzen Estimators

D.

Grid Search

Question 10

A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:

● Hyperparameter 1: [2, 5, 10]

● Hyperparameter 2: [50, 100]

Which of the following represents the number of machine learning models that can be trained in parallel during this process?

Options:

A.

3

B.

5

C.

6

D.

18
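
The count follows from simple arithmetic: grid search evaluates every combination of hyperparameter values, and k-fold cross-validation fits one model per fold for each combination. The small sketch below enumerates those fits; the names `hp1` and `hp2` are placeholders for the two unnamed hyperparameters.

```python
from itertools import product

# Grid from the question; the names hp1/hp2 are placeholders.
grid = {"hp1": [2, 5, 10], "hp2": [50, 100]}
n_folds = 3

combinations = list(product(*grid.values()))  # 3 * 2 = 6 combinations
fits = [(combo, fold) for combo in combinations for fold in range(n_folds)]

# Each (combination, fold) pair trains with its own settings on its own
# data split, so no fit depends on the result of another.
print(len(combinations), len(fits))  # 6 18
```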

Question 11

Which of the following machine learning algorithms typically uses bagging?

Options:

A.

Gradient boosted trees

B.

K-means

C.

Random forest

D.

Decision tree
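
Bagging (bootstrap aggregating) trains each base learner on a bootstrap sample drawn with replacement and then aggregates their predictions, by majority vote for classification; random forests apply this to decision trees and additionally sample features at each split. The sketch below shows only the bootstrap-and-vote mechanics on one-dimensional toy data, using a hypothetical threshold "stump" (`fit_stump`) as the base learner, so it is not a random forest.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (feature value, class label)


def bootstrap_sample(data: List[Example], rng: random.Random) -> List[Example]:
    # Draw len(data) rows with replacement.
    return [rng.choice(data) for _ in data]


def fit_stump(data: List[Example]) -> Callable[[float], int]:
    """Hypothetical base learner: predict 1 when x exceeds a learned threshold."""
    best_t, best_err = None, None
    for t, _ in data:
        err = sum((x > t) != bool(y) for x, y in data)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: int(x > t)


def bagging(data: List[Example], n_models: int, seed: int = 0) -> Callable[[float], int]:
    rng = random.Random(seed)
    # Each base learner sees a different bootstrap sample of the data.
    models = [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

    def predict(x: float) -> int:
        # Aggregate by majority vote over the base learners.
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict


data = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
predict = bagging(data, n_models=25)
print(predict(0.0), predict(5.0))
```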

Question 12

A data scientist wants to explore the Spark DataFrame spark_df. The data scientist wants the exploration to include visual histograms displaying the distribution of the numeric features.

Which of the following lines of code can the data scientist run to accomplish the task?

Options:

A.

spark_df.describe()

B.

dbutils.data(spark_df).summarize()

C.

This task cannot be accomplished in a single line of code.

D.

spark_df.summary()

E.

dbutils.data.summarize(spark_df)

Question 13

A data scientist is developing a single-node machine learning model. They have a large number of model configurations to test as a part of their experiment. As a result, the model tuning process takes too long to complete. Which of the following approaches can be used to speed up the model tuning process?

Options:

A.

Implement MLflow Experiment Tracking

B.

Scale up with Spark ML

C.

Enable autoscaling clusters

D.

Parallelize with Hyperopt
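
Hyperopt's `SparkTrials` distributes the trials of a single-node model across a Spark cluster; the underlying point is that independent trials can run concurrently. As a local stand-in (this is not Hyperopt), the sketch below evaluates a hypothetical grid of configurations with `concurrent.futures`; `evaluate_config` and its toy loss are placeholders for real training code. Threads keep the sketch simple, but CPU-bound pure-Python training would need processes or a cluster to see a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, List, Tuple

Config = Dict[str, float]


def evaluate_config(config: Config) -> float:
    """Placeholder for training a single-node model and returning a validation loss."""
    # Toy loss with its minimum at max_depth=6, learning_rate=0.1.
    return (config["max_depth"] - 6) ** 2 + (config["learning_rate"] - 0.1) ** 2


def tune_in_parallel(configs: List[Config], max_workers: int = 4) -> Tuple[Config, float]:
    # Trials are independent, so they can be submitted concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        losses = list(pool.map(evaluate_config, configs))
    best = min(range(len(configs)), key=losses.__getitem__)
    return configs[best], losses[best]


configs = [
    {"max_depth": d, "learning_rate": lr}
    for d, lr in product([2, 4, 6, 8], [0.01, 0.1, 0.3])
]
best_config, best_loss = tune_in_parallel(configs)
print(best_config, best_loss)
```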

Question 14

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.

Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

Options:

A.

import pyspark.pandas as ps

df = ps.DataFrame(spark_df)

B.

import pyspark.pandas as ps

df = ps.to_pandas(spark_df)

C.

spark_df.to_pandas()

D.

import pandas as pd

df = pd.DataFrame(spark_df)

Question 15

Which of the following approaches can be used to view the notebook that was run to create an MLflow run?

Options:

A.

Open the MLmodel artifact in the MLflow run page

B.

Click the "Models" link in the row corresponding to the run in the MLflow experiment page

C.

Click the "Source" link in the row corresponding to the run in the MLflow experiment page

D.

Click the "Start Time" link in the row corresponding to the run in the MLflow experiment page

Question 16

A data scientist wants to explore summary statistics for the Spark DataFrame spark_df. The data scientist wants to see the count, mean, standard deviation, minimum, maximum, and interquartile range (IQR) for each numerical feature.

Which of the following lines of code can the data scientist run to accomplish the task?

Options:

A.

spark_df.summary()

B.

spark_df.stats()

C.

spark_df.describe().head()

D.

spark_df.printSchema()

E.

spark_df.toPandas()
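
With no arguments, PySpark's `DataFrame.summary()` reports count, mean, stddev, min, the 25%, 50% and 75% percentiles, and max, and the IQR is the 75% value minus the 25% value. To make those quantities concrete, the sketch below computes them for a single column in pure Python with the `statistics` module; the `summarize` helper and its `IQR` entry are illustrative additions, and because Spark computes approximate percentiles, its values can differ slightly from these.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Pure-Python analogue of the per-column statistics in spark_df.summary()."""
    # method="inclusive" interpolates between data points; Spark's percentiles
    # are approximate, so the two can disagree slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
        "IQR": q3 - q1,
    }


stats = summarize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
print(stats["IQR"])  # 4.0
```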

Question 17

A machine learning engineer has been notified that a new Staging version of a model registered to the MLflow Model Registry has passed all tests. As a result, the machine learning engineer wants to put this model into production by transitioning it to the Production stage in the Model Registry.

From which of the following pages in Databricks Machine Learning can the machine learning engineer accomplish this task?

Options:

A.

The home page of the MLflow Model Registry

B.

The experiment page in the Experiments observatory

C.

The model version page in the MLflow Model Registry

D.

The model page in the MLflow Model Registry

Question 18

A new data scientist has started working on an existing machine learning project. The project is a scheduled Job that retrains every day. The project currently exists in a Repo in Databricks. The data scientist has been tasked with improving the feature engineering of the pipeline’s preprocessing stage. The data scientist wants to make necessary updates to the code that can be easily adopted into the project without changing what is being run each day.

Which approach should the data scientist take to complete this task?

Options:

A.

They can create a new branch in Databricks, commit their changes, and push those changes to the Git provider.

B.

They can clone the notebooks in the repository into a Databricks Workspace folder and make the necessary changes.

C.

They can create a new Git repository, import it into Databricks, and copy and paste the existing code from the original repository before making changes.

D.

They can clone the notebooks in the repository into a new Databricks Repo and make the necessary changes.

Question 19

A machine learning engineer wants to parallelize the inference of group-specific models using the Pandas Function API. They have developed the apply_model function, which looks up and loads the correct model for each group, and they want to apply it to each group of the DataFrame df.

They have written the following incomplete code block:

[Image not available: the incomplete code block]

Which piece of code can be used to fill in the above blank to complete the task?

Options:

A.

applyInPandas

B.

groupedApplyInPandas

C.

mapInPandas

D.

predict
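
In PySpark, a grouped map is written as `df.groupBy(...).applyInPandas(func, schema)`: rows are brought together by key, the function is called once per group, and the returned rows are concatenated. The pure-Python sketch below mimics only that flow, without Spark or pandas; the record layout, `apply_model_to_group`, and the per-group "models" (simple multipliers in `MODELS`) are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical per-group "models": each is just a multiplier keyed by group.
MODELS = {"a": 2.0, "b": 10.0}


def apply_model_to_group(key_value: str, rows: List[Row]) -> List[Row]:
    """Stand-in for apply_model: look up the group's model once, then score its rows."""
    factor = MODELS[key_value]
    return [{**row, "prediction": row["x"] * factor} for row in rows]


def grouped_apply(
    rows: List[Row], key: str, func: Callable[[str, List[Row]], List[Row]]
) -> List[Row]:
    # Bring each group's rows together (Spark does this with a shuffle),
    # call func once per group, and concatenate the results.
    ordered = sorted(rows, key=itemgetter(key))
    out: List[Row] = []
    for key_value, group_rows in groupby(ordered, key=itemgetter(key)):
        out.extend(func(key_value, list(group_rows)))
    return out


rows = [
    {"group": "b", "x": 1.0},
    {"group": "a", "x": 3.0},
    {"group": "a", "x": 4.0},
]
print(grouped_apply(rows, "group", apply_model_to_group))
```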

Question 20

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

Options:

A.

pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata

B.

pandas API on Spark DataFrames are more performant than Spark DataFrames

C.

pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata

D.

pandas API on Spark DataFrames are less mutable versions of Spark DataFrames

Question 21

Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?

Options:

A.

Keras

B.

pandas

C.

PyTorch

D.

Spark ML

E.

Scikit-learn

Question 22

What is the name of the method that transforms categorical features into a series of binary indicator feature variables?

Options:

A.

Leave-one-out encoding

B.

Target encoding

C.

One-hot encoding

D.

Categorical

E.

String indexing
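
One-hot encoding maps each category to its own 0/1 indicator column, with exactly one 1 per row. The minimal pure-Python sketch below uses a hypothetical `one_hot` helper. In Spark ML the same step is usually done with `StringIndexer` followed by `OneHotEncoder`, which by default drops the last category (`dropLast=True`) and emits sparse vectors, whereas this sketch keeps one dense column per category.

```python
from typing import List, Tuple


def one_hot(values: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted category list and one 0/1 indicator row per input value."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # set the indicator for this value's category
        rows.append(row)
    return categories, rows


cats, encoded = one_hot(["red", "green", "red", "blue"])
print(cats)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```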

Exam Name: Databricks Certified Machine Learning Associate Exam
Last Update: Apr 30, 2026
Questions: 74
