MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Questions 4

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.

A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.

Which solution will meet these requirements with the LEAST implementation effort?

Options:

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Questions 5

A company is developing an application that reads animal descriptions from user prompts and generates images based on the information in the prompts. The application reads a message from an Amazon Simple Queue Service (Amazon SQS) queue. Then the application uses Amazon Titan Image Generator on Amazon Bedrock to generate an image based on the information in the message. Finally, the application removes the message from the SQS queue.

Which IAM permissions should the company assign to the application's IAM role? (Select TWO.)

Options:

Allow the bedrock:InvokeModel action for the Amazon Titan Image Generator resource.

Allow the bedrock:Get* action for the Amazon Titan Image Generator resource.

Allow the sqs:ReceiveMessage action and the sqs:DeleteMessage action for the SQS queue resource.

Allow the sqs:GetQueueAttributes action and the sqs:DeleteMessage action for the SQS queue resource.

Allow the sagemaker:PutRecord* action for the Amazon Titan Image Generator resource.

Questions 6

An ML engineer wants to run a training job on Amazon SageMaker AI. The training job will train a neural network by using multiple GPUs. The training dataset is stored in Parquet format.

The ML engineer discovered that the Parquet dataset contains files too large to fit into the memory of the SageMaker AI training instances.

Which solution will fix the memory problem?

Options:

Attach an Amazon Elastic Block Store (Amazon EBS) Provisioned IOPS SSD volume to the instance. Store the files in the EBS volume.

Repartition the Parquet files by using Apache Spark on Amazon EMR. Use the repartitioned files for the training job.

Change the instance type to Memory Optimized instances with sufficient memory for the training job.

Use the SageMaker AI distributed data parallelism (SMDDP) library with multiple instances to split the memory usage.

Questions 7

A company uses an Amazon EMR cluster to run a data ingestion process for an ML model. An ML engineer notices that the processing time is increasing.

Which solution will reduce the processing time MOST cost-effectively?

Options:

Use Spot Instances to increase the number of primary nodes.

Use Spot Instances to increase the number of core nodes.

Use Spot Instances to increase the number of task nodes.

Use On-Demand Instances to increase the number of core nodes.

Questions 8

A company has a large, unstructured dataset. The dataset includes many duplicate records across several key attributes.

Which solution on AWS will detect duplicates in the dataset with the LEAST code development?

Options:

Use Amazon Mechanical Turk jobs to detect duplicates.

Use Amazon QuickSight ML Insights to build a custom deduplication model.

Use Amazon SageMaker Data Wrangler to pre-process and detect duplicates.

Use the AWS Glue FindMatches transform to detect duplicates.

Questions 9

An ML engineer is training an ML model to identify medical patients for disease screening. The tabular dataset for training contains 50,000 patient records: 1,000 with the disease and 49,000 without the disease.

The ML engineer splits the dataset into a training dataset, a validation dataset, and a test dataset.

What should the ML engineer do to transform the data and make the data suitable for training?

Options:

Apply principal component analysis (PCA) to oversample the minority class in the training dataset.

Apply Synthetic Minority Oversampling Technique (SMOTE) to generate new synthetic samples of the minority class in the training dataset.

Randomly oversample the majority class in the validation dataset.

Apply k-means clustering to undersample the minority class in the test dataset.

Questions 10

An ML engineer wants to use Amazon SageMaker Data Wrangler to perform preprocessing on a dataset. The ML engineer wants to use the processed dataset to train a classification model. During preprocessing, the ML engineer notices that a text feature has a range of thousands of values that differ only by spelling errors. The ML engineer needs to apply an encoding method so that after preprocessing is complete, the text feature can be used to train the model.

Which solution will meet these requirements?

Options:

Perform ordinal encoding to represent categories of the feature.

Perform similarity encoding to represent categories of the feature.

Perform one-hot encoding to represent categories of the feature.

Perform target encoding to represent categories of the feature.

Questions 11

A company has a team of data scientists who use Amazon SageMaker notebook instances to test ML models. When the data scientists need new permissions, the company attaches the permissions to each individual role that was created during the creation of the SageMaker notebook instance.

The company needs to centralize management of the team's permissions.

Which solution will meet this requirement?

Options:

Create a single IAM role that has the necessary permissions. Attach the role to each notebook instance that the team uses.

Create a single IAM group. Add the data scientists to the group. Associate the group with each notebook instance that the team uses.

Create a single IAM user. Attach the AdministratorAccess AWS managed IAM policy to the user. Configure each notebook instance to use the IAM user.

Create a single IAM group. Add the data scientists to the group. Create an IAM role. Attach the AdministratorAccess AWS managed IAM policy to the role. Associate the role with the group. Associate the group with each notebook instance that the team uses.

Answer: A (create a single IAM role and attach it to each notebook instance)

Explanation:

Managing permissions for multiple Amazon SageMaker notebook instances can become complex when handled individually. To centralize and streamline permission management, AWS recommends creating a single IAM role with the necessary permissions and attaching this role to each notebook instance used by the data science team.

Steps to Implement the Solution:

Create a Single IAM Role with Necessary Permissions:

Define an IAM role that encompasses all permissions required by the data scientists for their tasks. This includes permissions for SageMaker operations and any other AWS services they interact with.

AWS provides managed policies like AmazonSageMakerFullAccess that can be attached to the role to grant comprehensive SageMaker permissions. (IAM Policies for SageMaker)

Attach the IAM Role to Each Notebook Instance:

When creating or updating a SageMaker notebook instance, specify the IAM role created in the previous step. This ensures that all notebook instances operate under a consistent set of permissions.

In the SageMaker console, during the notebook instance setup, you can choose an existing IAM role to associate with the instance. (Creating SageMaker Workspaces)
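The two steps above can also be sketched in code. This is a minimal illustration rather than a definitive implementation: the helper names, the role ARN, and the notebook instance names are hypothetical, and the client argument is assumed to be a boto3 SageMaker client (or anything exposing the same `update_notebook_instance` call). In practice a notebook instance generally has to be stopped before it can be updated.

```python
import json


def sagemaker_trust_policy():
    """Trust policy that lets the SageMaker service assume the shared role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def attach_role_to_notebooks(sagemaker_client, role_arn, notebook_names):
    """Point every listed notebook instance at the same execution role.

    sagemaker_client is assumed to behave like boto3.client("sagemaker").
    Returns the names that were updated.
    """
    updated = []
    for name in notebook_names:
        sagemaker_client.update_notebook_instance(
            NotebookInstanceName=name,
            RoleArn=role_arn,
        )
        updated.append(name)
    return updated


if __name__ == "__main__":
    # Print the trust policy that would be used when creating the shared role.
    print(json.dumps(sagemaker_trust_policy(), indent=2))

    # Hypothetical usage with real AWS credentials (not executed here):
    #   import boto3
    #   client = boto3.client("sagemaker")
    #   attach_role_to_notebooks(
    #       client,
    #       "arn:aws:iam::123456789012:role/DataScienceTeamNotebookRole",
    #       ["team-notebook-1", "team-notebook-2"],
    #   )
```

Because every notebook instance references the same role ARN, a later permission change is made once on the role's policies instead of once per instance.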

Benefits of This Approach:

Centralized Permission Management: By using a single IAM role, you simplify the process of updating permissions. Changes to the role's policies automatically propagate to all associated notebook instances, ensuring consistent access control.

Adherence to Best Practices: AWS recommends using IAM roles to manage permissions for applications running on services like SageMaker. This approach avoids the need to manage individual user permissions separately. (IAM Best Practices for SageMaker)

Alternative Options and Their Drawbacks:

Option B: Creating a single IAM group and adding data scientists to it does not directly associate the group with notebook instances. IAM groups are used to manage user permissions, not to assign roles to AWS resources like notebook instances.

Option C: Using a single IAM user with the AdministratorAccess policy is not recommended due to security risks associated with granting broad permissions and the challenges in managing shared user credentials.

Option D: Associating an IAM group with a role and then with notebook instances is not a valid approach, as IAM groups cannot be directly associated with AWS resources.

Conclusion: Option A is the most effective solution to centralize and manage permissions for SageMaker notebook instances, aligning with AWS best practices for IAM role management.

References: AWS Documentation: IAM Policies for SageMaker; AWS Documentation: Creating SageMaker Workspaces; AWS Documentation: IAM Best Practices for SageMaker

Questions 12

An ML engineer is deploying a generative AI model-based customer support agent that uses Amazon SageMaker AI for inference. The customer support agent must respond to customer questions about topics such as shipping policies, refund processes, and account management. The generative AI model generates one token at a time.

Customers report dissatisfaction with how long the customer support agent takes to generate lengthy responses to questions. The ML engineer must apply an inference optimization technique to improve the performance of the customer support agent.

Which solution will meet this requirement?

Options:

Compilation

Speculative decoding

Quantization

Fast model loading

Questions 13

An ML engineer is analyzing a classification dataset before training a model in Amazon SageMaker AI. The ML engineer suspects that the dataset has a significant imbalance between class labels that could lead to biased model predictions. To confirm class imbalance, the ML engineer needs to select an appropriate pre-training bias metric.

Which metric will meet this requirement?

Options:

Mean squared error (MSE)

Difference in proportions of labels (DPL)

Silhouette score

Structural similarity index measure (SSIM)

Questions 14

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.

Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

Options:

AWS::SageMaker::Model

AWS::SageMaker::Endpoint

AWS::SageMaker::NotebookInstance

AWS::SageMaker::Pipeline

Questions 15

An ML engineer is using Amazon SageMaker JumpStart to fine-tune a Llama 3.2 model for text generation. The ML engineer is using an instruction-based fine-tuning method. The model uses 70 billion parameters.

Select the correct fine-tuning term from the following list to match each description. Select each term one time or not at all. (Select THREE.)

• Hyperparameter tuning

• Low-rank adaptation (LoRA)

• Fully Sharded Data Parallel (FSDP)

• Learning rate

• Int8 quantization

MLA-C01 Question 15

Options:

Questions 16

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Options:

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

Deploy the models by using Amazon SageMaker batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

Questions 17

A company uses Amazon SageMaker AI to support ML workflows such as model training and deployment.

Select the correct registry from the following list to meet the requirements for each use case with the LEAST operational overhead. Each registry should be selected one or more times. (Select FOUR.)

• Amazon Elastic Container Registry (Amazon ECR)

• SageMaker Model Registry

MLA-C01 Question 17

Options:

Questions 18

A company is exploring generative AI and wants to add a new product feature. An ML engineer is making API calls from existing Amazon EC2 instances to Amazon Bedrock.

The EC2 instances are in a private subnet and must remain private during the implementation. The EC2 instances have a security group that allows access to all IP addresses in the private subnet.

What should the ML engineer do to establish a connection between the EC2 instances and Amazon Bedrock?

Options:

Modify the security group to allow inbound and outbound traffic to and from Amazon Bedrock.

Use AWS PrivateLink to access Amazon Bedrock through an interface VPC endpoint.

Configure Amazon Bedrock to use the private subnet where the EC2 instances are deployed.

Use AWS Direct Connect to link the VPC to Amazon Bedrock.

Questions 19

A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models.

The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data.

Which solution will provide the HIGHEST performance for data retrieval?

Options:

Keep all the time-series data without partitioning in the S3 bucket. Manually move data that is older than 30 days to separate S3 buckets.

Create AWS Lambda functions to copy the time-series data into separate S3 buckets. Apply S3 Lifecycle policies to archive data that is older than 30 days to S3 Glacier Flexible Retrieval.

Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.

Put each day's time-series data into its own S3 bucket. Use S3 Lifecycle policies to archive S3 buckets that hold data that is older than 30 days to S3 Glacier Flexible Retrieval.

Questions 20

A gaming company needs to deploy a natural language processing (NLP) model to moderate a chat forum in a game. The workload experiences heavy usage during evenings and weekends but minimal activity during other hours.

Which solution will meet these requirements MOST cost-effectively?

Options:

Use an Amazon SageMaker AI batch transform job with fixed capacity.

Use Amazon SageMaker Serverless Inference.

Use a single Amazon EC2 GPU instance with reserved capacity.

Use Amazon SageMaker Asynchronous Inference.

Questions 21

A company is developing an ML model by using Amazon SageMaker AI. The company must monitor bias in the model and display the results on a dashboard. An ML engineer creates a bias monitoring job.

How should the ML engineer capture bias metrics to display on the dashboard?

Options:

Capture AWS CloudTrail metrics from SageMaker Clarify.

Capture Amazon CloudWatch metrics from SageMaker Clarify.

Capture SageMaker Model Monitor metrics from Amazon EventBridge.

Capture SageMaker Model Monitor metrics from Amazon SNS.

Questions 22

An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets.

Which solution will meet these requirements?

Options:

Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.

Questions 23

A company is uploading thousands of PDF policy documents into Amazon S3 and Amazon Bedrock Knowledge Bases. Each document contains structured sections. Users often search for a small section but need the full section context. The company wants accurate section-level search with automatic context retrieval and minimal custom coding.

Which chunking strategy meets these requirements?

Options:

Hierarchical

Maximum tokens

Semantic

Fixed-size

Questions 24

An ML engineer is configuring auto scaling for an inference component of a model that runs behind an Amazon SageMaker AI endpoint. The ML engineer configures SageMaker AI auto scaling with a target tracking scaling policy set to 100 invocations per model per minute. The SageMaker AI endpoint scales appropriately during normal business hours. However, the ML engineer notices that at the start of each business day, there are zero instances available to handle requests, which causes delays in processing.

The ML engineer must ensure that the SageMaker AI endpoint can handle incoming requests at the start of each business day.

Which solution will meet this requirement?

Options:

Reduce the SageMaker AI auto scaling cooldown period to the minimum supported value. Add an auto scaling lifecycle hook to scale the SageMaker AI instances.

Change the target metric to CPU utilization.

Modify the scaling policy target value to one.

Apply a step scaling policy that scales based on an Amazon CloudWatch alarm. Apply a second CloudWatch alarm and scaling policy to scale the minimum number of instances from zero to one at the start of each business day.

Questions 25

A company has trained an ML model in Amazon SageMaker. The company needs to host the model to provide inferences in a production environment.

The model must be highly available and must respond with minimum latency. The size of each request will be between 1 KB and 3 MB. The model will receive unpredictable bursts of requests during the day. The inferences must adapt proportionally to the changes in demand.

How should the company deploy the model into production to meet these requirements?

Options:

Create a SageMaker real-time inference endpoint. Configure auto scaling. Configure the endpoint to present the existing model.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster. Use ECS scheduled scaling that is based on the CPU of the ECS cluster.

Install SageMaker Operator on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Deploy the model in Amazon EKS. Set horizontal pod auto scaling to scale replicas based on the memory metric.

Use Spot Instances with a Spot Fleet behind an Application Load Balancer (ALB) for inferences. Use the ALBRequestCountPerTarget metric as the metric for auto scaling.

Questions 26

An ML engineer is using an Amazon SageMaker AI shadow test to evaluate a new model that is hosted on a SageMaker AI endpoint. The shadow test requires significant GPU resources for high performance. The production variant currently runs on a less powerful instance type.

The ML engineer needs to configure the shadow test to use a higher performance instance type for a shadow variant. The solution must not affect the instance type of the production variant.

Which solution will meet these requirements?

Options:

Modify the existing ProductionVariant configuration in the endpoint to include a ShadowProductionVariants list. Specify the larger instance type for the shadow variant.

Create a new endpoint configuration with two ProductionVariant definitions. Configure one definition for the existing production variant and one definition for the shadow variant with the larger instance type. Use the UpdateEndpoint action to apply the new configuration.

Create a separate SageMaker AI endpoint for the shadow variant that uses the larger instance type. Create an AWS Lambda function that routes a portion of the traffic to the shadow endpoint. Assign the Lambda function to the original endpoint.

Use the CreateEndpointConfig action to define a new configuration. Specify the existing production variant in the configuration and add a separate ShadowProductionVariants list. Specify the larger instance type for the shadow variant. Use the CreateEndpoint action and pass the new configuration to the endpoint.

Questions 27

A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon S3 to provide customers with a live conversational engine.

The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Questions 28

An ML engineer is building a generative AI application on Amazon Bedrock by using large language models (LLMs).

Select the correct generative AI term from the following list for each description. Each term should be selected one time or not at all. (Select three.)

• Embedding

• Retrieval Augmented Generation (RAG)

• Temperature

• Token

MLA-C01 Question 28

Options:

Questions 29

A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (CI/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.

Select and order the pipeline's correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)

• An S3 event notification invokes the pipeline when new data is uploaded.

• An S3 Lifecycle rule invokes the pipeline when new data is uploaded.

• SageMaker retrains the model by using the data in the S3 bucket.

• The pipeline deploys the model to a SageMaker endpoint.

• The pipeline deploys the model to SageMaker Model Registry.

MLA-C01 Question 29

Options:

Questions 30

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

Options:

Enable network isolation.

Configure traffic encryption by using security groups.

Enable inter-container traffic encryption.

Enable VPC flow logs.

Questions 31

An ML engineer is evaluating several ML models and must choose one model to use in production. The cost of false negative predictions by the models is much higher than the cost of false positive predictions.

Which metric finding should the ML engineer prioritize the MOST when choosing the model?

Options:

Low precision

High precision

Low recall

High recall

Questions 32

An ML engineer is tuning an image classification model that performs poorly on one of two classes. The poorly performing class represents an extremely small fraction of the training dataset.

Which solution will improve the model’s performance?

Options:

Optimize for accuracy. Use image augmentation on the less common images.

Optimize for F1 score. Use image augmentation on the less common images.

Optimize for accuracy. Use SMOTE to generate synthetic images.

Optimize for F1 score. Use SMOTE to generate synthetic images.

Questions 33

A company has developed a computer vision model. The company needs to deploy the model into production on Amazon SageMaker AI. The company has not hosted a model on SageMaker AI previously.

An ML engineer needs to implement a solution to track model versions. The solution also must provide recommendations about which Amazon EC2 instance types to use to host the model.

Which solution will meet these requirements?

Options:

Register the model in Amazon Elastic Container Registry (Amazon ECR). Use AWS Compute Optimizer for recommendations about instance types.

Register the model in the SageMaker Model Registry. Use SageMaker Inference Recommender for recommendations about instance types.

Register the model in Amazon Elastic Container Registry (Amazon ECR). Use SageMaker Experiments for recommendations about instance types.

Questions 34

An ML engineer is building a model to predict house and apartment prices. The model uses three features: Square Meters, Price, and Age of Building. The dataset has 10,000 data rows. The data includes data points for one large mansion and one extremely small apartment.

The ML engineer must perform preprocessing on the dataset to ensure that the model produces accurate predictions for the typical house or apartment.

Which solution will meet these requirements?

Options:

Remove the outliers and perform a log transformation on the Square Meters variable.

Keep the outliers and perform normalization on the Square Meters variable.

Remove the outliers and perform one-hot encoding on the Square Meters variable.

Keep the outliers and perform one-hot encoding on the Square Meters variable.

Questions 35

A company has an existing Amazon SageMaker AI model (v1) on a production endpoint. The company develops a new model version (v2) and needs to test v2 in production before substituting v2 for v1.

The company needs to minimize the risk of v2 generating incorrect output in production and must prevent any disruption of production traffic during the change.

Which solution will meet these requirements?

Options:

Create a second production variant for v2. Assign 1% of the traffic to v2 and 99% to v1. Collect all output of v2 in Amazon S3. If v2 performs as expected, switch all traffic to v2.

Create a second production variant for v2. Assign 10% of the traffic to v2 and 90% to v1. Collect all output of v2 in Amazon S3. If v2 performs as expected, switch all traffic to v2.

Deploy v2 to a new endpoint. Turn on data capture for the production endpoint. Send 100% of the input data to v2.

Deploy v2 into a shadow variant that samples 100% of the inference requests. Collect all output in Amazon S3. If v2 performs as expected, promote v2 to production.

Questions 36

A company has an application that uses different APIs to generate embeddings for input text. The company needs to implement a solution to automatically rotate the API tokens every 3 months.

Which solution will meet this requirement?

Options:

Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation.

Store the tokens in AWS Systems Manager Parameter Store. Create an AWS Lambda function to perform the rotation.

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS managed key to perform the rotation.

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS owned key to perform the rotation.

Questions 37

A hospital wants to predict patient outcomes for the coming year. An ML engineer must improve several existing ML models that currently perform poorly.

Select the correct regularization method from the following list to improve each model. Select each regularization method one time, more than one time, or not at all. (Select THREE.)

• L1 regularization

• L2 regularization

• Early stopping

MLA-C01 Question 37

Options:

Questions 38

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

After the data is aggregated, the ML engineer must implement a solution to automatically detect anomalies in the data and to visualize the result.

Which solution will meet these requirements?

Options:

Use Amazon Athena to automatically detect the anomalies and to visualize the result.

Use Amazon Redshift Spectrum to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

Use Amazon SageMaker Data Wrangler to automatically detect the anomalies and to visualize the result.

Use AWS Batch to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

Answer: C (use Amazon SageMaker Data Wrangler to detect the anomalies and to visualize the result)

Explanation:

Amazon SageMaker Data Wrangler is a comprehensive tool that streamlines the process of data preparation and offers built-in capabilities for anomaly detection and visualization.

Key Features of SageMaker Data Wrangler:

Data Importation: Connects seamlessly to various data sources, including Amazon S3 and on-premises databases, facilitating the aggregation of transaction logs, customer profiles, and MySQL tables.

Anomaly Detection: Provides built-in analyses to detect anomalies in time series data, enabling the identification of outliers that may indicate fraudulent activities.

Visualization: Offers a suite of visualization tools, such as histograms and scatter plots, to help understand data distributions and relationships, which are crucial for feature engineering and model development.

Implementation Steps:

Data Aggregation:

Import data from Amazon S3 and on-premises MySQL databases into SageMaker Data Wrangler.

Use Data Wrangler's data flow interface to combine and preprocess datasets, ensuring a unified dataset for analysis.

Anomaly Detection:

Apply the anomaly detection analysis feature to identify outliers in the dataset.

Configure parameters such as the anomaly threshold to fine-tune the detection sensitivity.

Visualization:

Use built-in visualization tools to create charts and graphs that depict data distributions and highlight anomalies.

Interpret these visualizations to gain insights into potential fraud patterns and feature interdependencies.
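To make the role of an anomaly threshold concrete, the following is a minimal sketch that runs outside Data Wrangler. It is not Data Wrangler's implementation: it swaps in a plain z-score rule, and the function name `flag_anomalies`, the default threshold of 3.0, and the sample transaction amounts are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(values, threshold=3.0):
    """Return one boolean per value: True if its z-score exceeds the threshold.

    A z-score measures how many sample standard deviations a value lies
    from the mean. Lowering the threshold flags more points as anomalies.
    """
    if len(values) < 2:
        # A standard deviation needs at least two observations.
        return [False] * len(values)
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


if __name__ == "__main__":
    # Hypothetical transaction amounts: twenty ordinary ones and one large one.
    amounts = [10.0] * 20 + [1000.0]
    flags = flag_anomalies(amounts, threshold=3.0)
    suspicious = [a for a, is_anomaly in zip(amounts, flags) if is_anomaly]
    print(suspicious)
```

A lower threshold makes the check more sensitive and a higher one makes it stricter, which is the same trade-off the threshold parameter controls in the analysis step described above.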

Advantages of Using SageMaker Data Wrangler:

Integrated Workflow: Combines data preparation, anomaly detection, and visualization within a single interface, streamlining the ML development process.

Operational Efficiency: Reduces the need for multiple tools and complex integrations, thereby minimizing operational overhead.

Scalability: Handles large datasets efficiently, making it suitable for extensive transaction logs and customer profiles.

By leveraging SageMaker Data Wrangler, the ML engineer can effectively detect anomalies and visualize results, facilitating the development of a robust fraud detection model.

Analyze and Visualize - Amazon SageMaker

Transform Data - Amazon SageMaker

Questions 39

A company needs to host a custom ML model to perform forecast analysis. The forecast analysis will occur with predictable and sustained load during the same 2-hour period every day.

Multiple invocations during the analysis period will require quick responses. The company needs AWS to manage the underlying infrastructure and any auto scaling activities.

Which solution will meet these requirements?

Options:

Schedule an Amazon SageMaker batch transform job by using AWS Lambda.

Configure an Auto Scaling group of Amazon EC2 instances to use scheduled scaling.

Use Amazon SageMaker Serverless Inference with provisioned concurrency.

Run the model on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on Amazon EC2 with pod auto scaling.

Questions 40

A company wants to use large language models (LLMs) that are supported by Amazon Bedrock to develop a chat interface for the company's internal technical documentation. The company stores the documentation as dozens of text files that are several megabytes in total size. The company updates the text files often.

Which solution will meet these requirements MOST cost-effectively?

Options:

Create a new LLM on Amazon Bedrock. Train the new LLM on the original dataset and the company documentation. Make the new model available in Bedrock for calls from the chat interface.

Integrate the company documentation with Amazon Bedrock guardrails. Invoke the guardrails for all Amazon Bedrock calls from the chat interface.

Use all the text files to fine tune a model in Amazon Bedrock. Use the fine-tuned model to process user prompts.

Upload all the text files to an Amazon Bedrock knowledge base. Use the knowledge base to provide context when the chat interface makes calls to Amazon Bedrock.

Questions 41

A company is building a conversational AI assistant on Amazon Bedrock. The company is using Retrieval Augmented Generation (RAG) to reference the company's internal knowledge base. The AI assistant uses the Anthropic Claude 4 foundation model (FM).

The company needs a solution that uses a vector embedding model, a vector store, and a vector search algorithm.

Which solution will develop the AI assistant with the LEAST development effort?

Options:

Use Amazon Kendra Experience Builder.

Use Amazon Aurora PostgreSQL with the pgvector extension.

Use Amazon RDS for PostgreSQL with the pgvector extension.

Use the AWS Glue Data Catalog metadata repository.

Questions 42

A travel company wants to create an ML model to recommend the next airport destination for its users. The company has collected millions of data records about user location, recent search history on the company's website, and 2,000 available airports. The data has several categorical features with a target column that is expected to have a high-dimensional sparse matrix.

The company needs to use Amazon SageMaker AI built-in algorithms for the model. An ML engineer converts the categorical features by using one-hot encoding.

Which algorithm should the ML engineer implement to meet these requirements?

Options:

Use the CatBoost algorithm to recommend the next airport destination.

Use the DeepAR forecasting algorithm to recommend the next airport destination.

Use the Factorization Machines algorithm to recommend the next airport destination.

Use the k-means algorithm to cluster users into groups and map each group to the next airport destination.

Questions 43

An ML engineer has an Amazon Comprehend custom model in Account A in the us-east-1 Region. The ML engineer needs to copy the model to Account B in the same Region.

Which solution will meet this requirement with the LEAST development effort?

Options:

Use Amazon S3 to make a copy of the model. Transfer the copy to Account B.

Create a resource-based IAM policy. Use the Amazon Comprehend ImportModel API operation to copy the model to Account B.

Use AWS DataSync to replicate the model from Account A to Account B.

Create an AWS Site-to-Site VPN connection between Account A and Account B to transfer the model.

Questions 44

An ML engineer wants to use, prepare, and load data from Amazon S3 for analytics. The ML engineer must run an extract, transform, and load (ETL) job to discover the schema of the data and to store the metadata.

Which solution will meet these requirements with the LEAST manual effort?

Options:

Use AWS Glue to run the ETL job. Use the job to discover the schema and to store the associated metadata in the AWS Glue Data Catalog.

Create an Amazon SageMaker Data Wrangler flow to run the ETL job. Use the job to discover the schema and to store the associated metadata in an S3 bucket.

Create an ETL pipeline by using Amazon Athena integrated with AWS Step Functions. Use the pipeline to run the ETL job to discover the schema and to store the associated metadata in an S3 bucket.

Launch an Amazon EC2 instance that includes the scikit-learn library to run the ETL job. Use the job to discover the schema and to store the associated metadata in Amazon Redshift.

Questions 45

An ML engineer is analyzing potential biases in a customer dataset before training an ML model. The dataset contains customer age (numeric), product reviews (text), and purchase outcomes (categorical).

Which statistical metrics should the ML engineer use to identify potential biases in the dataset before model training?

Options:

Calculate the statistical mean and standard deviation of customer age distribution. Count word frequencies in product reviews.

Calculate the class imbalance metric of purchase outcomes. Use product reviews to check sentiment distribution to capture bias.

Calculate the class imbalance metric of purchase outcomes and the difference in proportions of labels (DPL) across customer age groups.

Calculate the correlation coefficient between customer age and purchase outcomes. Calculate unique word counts in product reviews.
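
Two of the metrics named in the options, class imbalance (CI) and difference in proportions of labels (DPL), can be computed by hand. The sketch below uses the definitions commonly given for these pre-training bias metrics: CI compares the sizes of two groups (facets), and DPL compares their rates of positive labels. The age groups and counts are hypothetical and are not taken from the question.

```python
def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are group sizes."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(pos_a, n_a, pos_d, n_d):
    """DPL = q_a - q_d, where q is the share of positive labels in each group."""
    return pos_a / n_a - pos_d / n_d


# Hypothetical counts: 800 customers under 40 (400 purchased),
# 200 customers aged 40 or older (50 purchased).
ci = class_imbalance(800, 200)
dpl = difference_in_proportions_of_labels(400, 800, 50, 200)

print(f"CI  = {ci:.2f}")
print(f"DPL = {dpl:.2f}")
```

Values near zero suggest balanced groups or similar label rates; values far from zero point to a group that is underrepresented or that receives positive outcomes at a different rate.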

Questions 46

A company has trained and deployed an ML model by using Amazon SageMaker. The company needs to implement a solution to record and monitor all the API call events for the SageMaker endpoint. The solution also must provide a notification when the number of API call events breaches a threshold.

Which solution will meet these requirements?

Options:

Use SageMaker Debugger to track the inferences and to report metrics. Create a custom rule to provide a notification when the threshold is breached.

Use SageMaker Debugger to track the inferences and to report metrics. Use the tensor_variance built-in rule to provide a notification when the threshold is breached.

Log all the endpoint invocation API events by using AWS CloudTrail. Use an Amazon CloudWatch dashboard for monitoring. Set up a CloudWatch alarm to provide notification when the threshold is breached.

Add the Invocations metric to an Amazon CloudWatch dashboard for monitoring. Set up a CloudWatch alarm to provide notification when the threshold is breached.

Questions 47

An ML engineer is developing a classification model. The ML engineer needs to use custom libraries in processing jobs, training jobs, and pipelines in Amazon SageMaker AI.

Which solution will provide this functionality with the LEAST implementation effort?

Options:

Manually install the libraries in the SageMaker AI containers.

Build a custom Docker container that includes the required libraries. Host the container in Amazon Elastic Container Registry (Amazon ECR). Use the ECR image in the SageMaker AI jobs and pipelines.

Use a SageMaker AI notebook instance and install libraries at startup.

Run code externally on Amazon EC2 and import results into SageMaker AI.

Questions 48

A company wants to build an anomaly detection ML model. The model will use large-scale tabular data that is stored in an Amazon S3 bucket. The company does not have expertise in Python, Spark, or other languages for ML.

An ML engineer needs to transform and prepare the data for ML model training.

Which solution will meet these requirements?

Options:

Prepare the data by using Amazon EMR Serverless applications that host Amazon SageMaker Studio notebooks.

Prepare the data by using the Amazon SageMaker Data Wrangler visual interface in Amazon SageMaker Canvas.

Run SQL queries from a JupyterLab space in Amazon SageMaker Studio. Process the data further by using pandas DataFrames.

Prepare the data by using a JupyterLab notebook in Amazon SageMaker Studio.

Questions 49

A company runs an Amazon SageMaker domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker domain.

Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.

Which update to the network configuration will meet this requirement?

Options:

Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network ACL for the subnet where the domain is located.

Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Questions 50

A company's ML engineer is creating a classification model. The ML engineer explores the dataset and notices a column named day_of_week. The column contains the following values: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday.

Which technique should the ML engineer use to convert this column’s data to binary values?

Options:

Binary encoding

Label encoding

One-hot encoding

Tokenization
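
The three encoding techniques listed above produce differently shaped outputs for the same column. The sketch below is a plain-Python illustration, not a SageMaker API; the helper names are hypothetical.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def label_encode(day):
    """Map each category to a single integer index."""
    return DAYS.index(day)


def binary_encode(day):
    """Write the integer index in base 2, using the fewest bits that fit."""
    width = (len(DAYS) - 1).bit_length()  # 3 bits cover indices 0..6
    return [int(bit) for bit in format(DAYS.index(day), f"0{width}b")]


def one_hot_encode(day):
    """One 0/1 column per category, with exactly one column set to 1."""
    return [1 if d == day else 0 for d in DAYS]


print(label_encode("Wednesday"))    # 2
print(binary_encode("Wednesday"))   # [0, 1, 0]
print(one_hot_encode("Wednesday"))  # [0, 0, 1, 0, 0, 0, 0]
```

Label encoding yields one integer column that implies an order, binary encoding yields a few 0/1 columns that share bits across categories, and one-hot encoding yields one independent 0/1 column per category.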

Questions 51

A company has deployed an XGBoost prediction model in production to predict if a customer is likely to cancel a subscription. The company uses Amazon SageMaker Model Monitor to detect deviations in the F1 score.

During a baseline analysis of model quality, the company recorded a threshold for the F1 score. After several months of no change, the model's F1 score decreases significantly.

What could be the reason for the reduced F1 score?

Options:

Concept drift occurred in the underlying customer data that was used for predictions.

The model was not sufficiently complex to capture all the patterns in the original baseline data.

The original baseline data had a data quality issue of missing values.

Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.

Questions 52

A company uses Amazon Athena to query a dataset in Amazon S3. The dataset has a target variable that the company wants to predict.

The company needs to use the dataset in a solution to determine if a model can predict the target variable.

Which solution will provide this information with the LEAST development effort?

Options:

Create a new model by using Amazon SageMaker Autopilot. Report the model's achieved performance.

Implement custom scripts to perform data pre-processing, multiple linear regression, and performance evaluation. Run the scripts on Amazon EC2 instances.

Configure Amazon Macie to analyze the dataset and to create a model. Report the model's achieved performance.

Select a model from Amazon Bedrock. Tune the model with the data. Report the model's achieved performance.

Questions 53

A digital media entertainment company needs real-time video content moderation to ensure compliance during live streaming events.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use Amazon Rekognition and AWS Lambda to extract and analyze the metadata from the videos' image frames.

Use Amazon Rekognition and a large language model (LLM) hosted on Amazon Bedrock to extract and analyze the metadata from the videos’ image frames.

Use Amazon SageMaker AI to extract and analyze the metadata from the videos' image frames.

Use Amazon Transcribe and Amazon Comprehend to extract and analyze the metadata from the videos' image frames.

Questions 54

A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.

How should the company deploy the model on Amazon SageMaker to meet these requirements?

Options:

Use a multi-model serverless endpoint. Enable caching.

Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.

Questions 55

A company wants to improve the sustainability of its ML operations.

Which actions will reduce the energy usage and computational resources that are associated with the company's training jobs? (Choose two.)

Options:

Use Amazon SageMaker Debugger to stop training jobs when non-converging conditions are detected.

Use Amazon SageMaker Ground Truth for data labeling.

Deploy models by using AWS Lambda functions.

Use AWS Trainium instances for training.

Use PyTorch or TensorFlow with the distributed training option.

Questions 56

A hospital is using an ML model to validate x-ray results. The hospital runs a nightly batch inference job. The hospital needs to produce a daily report about model data quality and model performance.

Which solution will meet these requirements?

Options:

Schedule a monitoring job in Amazon SageMaker Model Monitor. Generate the monitoring results for the model and data.

Create an Amazon CloudWatch dashboard that includes the metrics for processing steps in the nightly batch inference job. Compare the baseline resource metrics. Share the dashboard link.

Use AWS Glue DataBrew to create a custom recipe job that uses the Numerical Statistics data quality check for the model file. Generate the results.

Create a SageMaker AI pipeline that includes a QualityCheck step to run monitoring jobs. Generate the monitoring results for the model and the data.

Questions 57

An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML.

Which solution will meet these requirements?

Options:

Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.

Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.

Questions 58

An ML engineer is using Amazon SageMaker AI to train an ML model. The ML engineer needs to use SageMaker AI automatic model tuning (AMT) features to tune the model hyperparameters over a large parameter space.

The model has 20 categorical hyperparameters and 7 continuous hyperparameters that can be tuned. The ML engineer needs to run the tuning job a maximum of 1,000 times. The ML engineer must ensure that each parameter trial is built based on the performance of the previous trial.

Which solution will meet these requirements?

Options:

Define the search space as categorical parameters of 1,000 possible combinations. Use grid search.

Define the search space as continuous parameters. Use random search. Set the maximum number of tuning jobs to 1,000.

Define the search space as categorical parameters and continuous parameters. Use Bayesian optimization. Set the maximum number of training jobs to 1,000.

Define the search space as categorical parameters and continuous parameters. Use grid search. Set the maximum number of tuning jobs to 1,000.
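
The size of the search space can be put in perspective with a little arithmetic. The question does not say how many values each categorical hyperparameter can take, so the sketch below assumes only two values each, which is the smallest meaningful case; the variable names are hypothetical.

```python
from math import prod

# Assumption: each of the 20 categorical hyperparameters has just 2 values.
values_per_categorical = [2] * 20

# An exhaustive search over the categorical parameters alone would need
# one trial per combination.
combinations = prod(values_per_categorical)

budget = 1000  # maximum number of trials stated in the question

print(f"categorical combinations: {combinations:,}")
print(f"trial budget:             {budget:,}")
print(f"share of combinations the budget can cover: {budget / combinations:.4%}")
```

Even under this minimal assumption, and before the 7 continuous hyperparameters are considered, the number of combinations is about a thousand times larger than the trial budget.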

Questions 59

A company uses a batching solution to process daily analytics. The company wants to provide near real-time updates, use open-source technology, and avoid managing or scaling infrastructure.

Which solution will meet these requirements?

Options:

Create Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless clusters.

Create Amazon MSK Provisioned clusters.

Create Amazon Kinesis Data Streams with Application Auto Scaling.

Create self-hosted Apache Flink applications on Amazon EC2.

Questions 60

A music streaming company constantly streams song ratings from an application to an Amazon S3 bucket. The company wants to use the ratings as an input for training and inference of an Amazon SageMaker AI model.

The company has an AWS Glue Data Catalog that is configured with the S3 bucket as the source. An ML engineer needs to implement a solution to create a repository for this data. The solution must ensure that the data stays synchronized during batch training and real-time inference.

Which solution will meet these requirements?

Options:

Ingest data into SageMaker Feature Store from the S3 bucket. Apply tags and indexes.

Use Amazon Athena. Create tables by using CREATE TABLE AS SELECT (CTAS) queries to group data.

Use AWS Lake Formation. Apply tag-based control on the data.

Use the Generate Data Insights function in SageMaker Data Wrangler.

Questions 61

An ML engineer wants to deploy a workflow that processes streaming IoT sensor data and periodically retrains ML models. The most recent model versions must be deployed to production.

Which service will meet these requirements?

Options:

Amazon SageMaker Pipelines

Amazon Managed Workflows for Apache Airflow (MWAA)

AWS Lambda

Apache Spark

Questions 62

A company is developing an ML model for a customer. The training data is stored in an Amazon S3 bucket in the customer's AWS account (Account A). The company runs Amazon SageMaker AI training jobs in a separate AWS account (Account B).

The company defines an S3 bucket policy and an IAM policy to allow reads to the S3 bucket.

Which additional steps will meet the cross-account access requirement?

Options:

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

Questions 63

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.

The company needs to implement a scalable solution on AWS to identify anomalous data points.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Ingest real-time data into Amazon Kinesis Data Streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

Ingest real-time data into Amazon Kinesis Data Streams. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

Send real-time data to an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Create an AWS Lambda function to consume the queue messages. Program the Lambda function to start an AWS Glue extract, transform, and load (ETL) job for batch processing and anomaly detection.

Questions 64

A company is using an Amazon S3 bucket to collect data that will be used for ML workflows. The company needs to use AWS Glue DataBrew to clean and normalize the data.

Which solution will meet these requirements?

Options:

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew profile job.

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew recipe job.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a profile job.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a recipe job.

Questions 65

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Options:

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Questions 66

A company is using an AWS Lambda function to monitor the metrics from an ML model. An ML engineer needs to implement a solution to send an email message when the metrics breach a threshold.

Which solution will meet this requirement?

Options:

Log the metrics from the Lambda function to AWS CloudTrail. Configure a CloudTrail trail to send the email message.

Log the metrics from the Lambda function to Amazon CloudFront. Configure an Amazon CloudWatch alarm to send the email message.

Log the metrics from the Lambda function to Amazon CloudWatch. Configure a CloudWatch alarm to send the email message.

Log the metrics from the Lambda function to Amazon CloudWatch. Configure an Amazon CloudFront rule to send the email message.

Questions 67

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data.

Which file format will meet these requirements?

Options:

CSV files compressed with Snappy

JSON objects in JSONL format

JSON files compressed with gzip

Apache Parquet files

Questions 68

A company has deployed a model to predict the churn rate for its games by using Amazon SageMaker Studio. After the model is deployed, the company must monitor the model performance for data drift and inspect the report. Select and order the correct steps from the following list to perform the model monitoring actions. Select each step one time. (Select and order THREE.)

Check the analysis results on the SageMaker Studio console.

Create a Shapley Additive Explanations (SHAP) baseline for the model by using Amazon SageMaker Clarify.

Schedule an hourly model explainability monitor.

Questions 69

A customer call center uses Amazon Transcribe to convert hundreds of audio recordings of conversations between customers and support agents to text files. The call center wants to use the text files to train an ML model. To comply with industry regulations, the call center must remove customer names, addresses, and phone numbers from the training text files.

Which solution will meet these requirements with the LEAST development effort?

Options:

Use Amazon Bedrock Guardrails to process and redact personal information from the text files.

Use the AWS Glue Detect PII transform to remove personal information from the text files.

Store the text files in Amazon S3 buckets. Use S3 Object Lambda functions to redact personal information.

Configure an Amazon SageMaker Data Wrangler custom transformation to remove personal information from the text files.

Questions 70

A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required.

What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

Options:

Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.

Create an Apache Spark job that uses a custom processing script on Amazon EMR.

Create a SageMaker processing job by calling the SageMaker Python SDK.

Create a data flow in SageMaker Data Wrangler. Configure a transform step.

Questions 71

An ML engineer has trained a neural network by using stochastic gradient descent (SGD). The neural network performs poorly on the test set. The values for training loss and validation loss remain high and show an oscillating pattern. The values decrease for a few epochs and then increase for a few epochs before repeating the same cycle.

What should the ML engineer do to improve the training process?

Options:

Introduce early stopping.

Increase the size of the test set.

Increase the learning rate.

Decrease the learning rate.
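
How the learning rate shapes the loss curve can be seen in a toy setting. The sketch below runs plain (not stochastic) gradient descent on the one-dimensional loss f(w) = w^2, which is a deliberate simplification of the neural network in the question; the function name `run_gradient_descent` and the starting point are hypothetical.

```python
def run_gradient_descent(learning_rate, w0=5.0, steps=5):
    """Minimize f(w) = w**2 and return the loss after each update."""
    w = w0
    losses = []
    for _ in range(steps):
        grad = 2 * w                  # derivative of w**2
        w = w - learning_rate * grad  # gradient descent update
        losses.append(w * w)
    return losses


large_step = run_gradient_descent(learning_rate=1.0)
small_step = run_gradient_descent(learning_rate=0.1)

print("learning rate 1.0:", large_step)
print("learning rate 0.1:", [round(loss, 3) for loss in small_step])
```

With the large step, each update overshoots the minimum and lands on the opposite side, so the loss never improves. With the small step, the loss shrinks on every update.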

Questions 72

A company wants to deploy an Amazon SageMaker AI model that can queue requests. The model needs to handle payloads of up to 1 GB that take up to 1 hour to process. The model must return an inference for each request. The model also must scale down when no requests are available to process.

Which inference option will meet these requirements?

Options:

Asynchronous inference

Batch transform

Serverless inference

Real-time inference

Exam Code: MLA-C01

Exam Name: AWS Certified Machine Learning Engineer - Associate

Last Update: May 30, 2026

Questions: 241