NCP-AII NVIDIA AI Infrastructure Questions and Answers

Question 4

A systems engineer is updating firmware across a large DGX cluster using automation. What is the best practice for minimizing risk and ensuring cluster health during and after the process?

Options:

A.

Drain nodes from the scheduler, run pre-update diagnostics, update firmware in batches, and verify health post-update before scaling to the next batch.

B.

To save time, simultaneously update all nodes in the cluster without draining or diagnostics.

C.

Update nodes that have reported faults, leaving others on older firmware.

D.

Drain nodes from the scheduler, update firmware in batches, skip diagnostics, and verify health post-update before scaling to the next batch.
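
The sequence in option A (drain, pre-update diagnostics, batched update, post-update verification) can be sketched as a simple loop. This is a minimal illustration, not an NVIDIA tool: `drain`, `diagnose`, `update`, and `verify` are hypothetical callables standing in for whatever the scheduler and firmware tooling at a given site provide.

```python
from typing import Callable, Iterable, List


def rolling_update(
    nodes: Iterable[str],
    batch_size: int,
    drain: Callable[[str], None],
    diagnose: Callable[[str], bool],
    update: Callable[[str], None],
    verify: Callable[[str], bool],
) -> List[str]:
    """Update nodes batch by batch; stop after the first batch that is not clean.

    All four callables are hypothetical placeholders for site-specific tooling.
    Returns the nodes that were updated and passed the post-update check.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending = list(nodes)
    done: List[str] = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for node in batch:
            drain(node)  # take the node out of scheduling first
        # Pre-update diagnostics: do not flash nodes that already look unhealthy.
        ready = [node for node in batch if diagnose(node)]
        for node in ready:
            update(node)
        healthy = [node for node in ready if verify(node)]
        done.extend(healthy)
        if len(healthy) != len(batch):
            break  # do not scale to the next batch until this one is clean
    return done
```

Stopping at the first incomplete batch limits how many nodes a bad firmware image can affect to one batch.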

Question 5

A team is validating a DGX BasePOD deployment. Using cmsh, they run a command to check GPU health across all nodes. What indicates that the system is ready for AI workloads?

Options:

A.

The command output is ignored if the system powers on without errors.

B.

At least half of the GPUs report Status_Health = OK.

C.

All GPUs report Status_Health = OK and Health = OK for each device.

D.

Only the head node's GPUs need to be healthy.

Question 6

A system administrator needs to change the scheduling behavior of a single GPU to use a fixed share scheduler. What command achieves this?

Options:

A.

esxcli system module parameters set -m nvidia -p

B.

esxcli -i 0 -mig 18

C.

nvidia-smi -i 0 -mig 1

D.

mlxconfig -d /dev/mst/mt4123_pciconf0 set LINK_TYPE_P1=2

Question 7

During HPL execution on a DGX cluster, the benchmark fails with "not enough memory" errors despite sufficient physical RAM. Which HPL.dat parameter adjustment is most effective?

Options:

A.

Reduce the problem size while maintaining the same block size.

B.

Set PMAP to 1 to enable process mapping.

C.

Increase block size to 6144 to maximize GPU utilization.

D.

Disable double-buffering via BCAST parameter.
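
As background for the options above: HPL factors a dense N x N matrix of 8-byte double-precision values, so its memory footprint grows with N squared. The sketch below estimates the largest N that fits in a given memory budget and rounds it down to a multiple of the block size NB. The function name is hypothetical, and the default 80% fill fraction is a common rule of thumb, not a value taken from this question.

```python
import math


def max_problem_size(mem_bytes: int, nb: int, fill_fraction: float = 0.8) -> int:
    """Largest N (a multiple of nb) whose N*N float64 matrix fits in the budget."""
    if mem_bytes <= 0 or nb <= 0 or not 0 < fill_fraction <= 1:
        raise ValueError("invalid arguments")
    usable = mem_bytes * fill_fraction
    n = math.isqrt(int(usable // 8))  # 8 bytes per double-precision element
    return (n // nb) * nb  # round down to a whole number of blocks
```

Because the footprint is quadratic in N, even a modest reduction in N frees a disproportionate amount of memory.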

Question 8

A 24-hour HPL burn-in fails with "illegal value" errors during the first iteration. Which initial troubleshooting step resolves this without compromising burn-in validity?

Options:

A.

Switch from FP64 to FP32 precision.

B.

Disable GPU affinity.

C.

Reduce test duration to 12 hours.

D.

Verify the matrix size is divisible by block size.
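
The check that option D describes is a one-line modulo test. Since HPL.dat can list several problem sizes (Ns) and block sizes (NBs), the sketch below reports every combination where N is not a multiple of NB. The function name is hypothetical and the values in the usage example are illustrative only.

```python
from typing import Iterable, List, Tuple


def misaligned_pairs(ns: Iterable[int], nbs: Iterable[int]) -> List[Tuple[int, int]]:
    """Return every (N, NB) combination where N is not divisible by NB."""
    nb_list = list(nbs)
    if any(nb <= 0 for nb in nb_list):
        raise ValueError("block sizes must be positive")
    return [(n, nb) for n in ns for nb in nb_list if n % nb != 0]


# Usage: 1000 is not a multiple of 128 or 256, while 1024 is a multiple of both.
print(misaligned_pairs([1000, 1024], [128, 256]))
```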

Question 9

A system administrator receives an alert about a potential hardware fault on an NVIDIA DGX A100. The GPU performance seems degraded, and the system fans are operating loudly. What step should be recommended to identify and troubleshoot the hardware fault?

Options:

A.

Run a deep learning workload to stress test the GPUs and check whether the issue persists.

B.

Check the NVIDIA System Management Interface (nvidia-smi) for GPU status and temperatures.

C.

Perform a power drain, then restart the DGX and check whether the performance degradation resolves.

D.

Increase the fan speed to maximum and check whether the performance improves.
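
Option B refers to `nvidia-smi`. One way to read GPU temperatures programmatically is its CSV query mode, for example `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits`. The sketch below parses text in that format. The sample text and the 85 °C threshold are made-up illustration values, not readings or limits from a real DGX A100, and the function names are hypothetical.

```python
from typing import Dict, List


def parse_gpu_temps(csv_text: str) -> Dict[int, int]:
    """Map GPU index to temperature (deg C) from 'index, temperature.gpu' rows."""
    temps: Dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def hot_gpus(temps: Dict[int, int], limit_c: int) -> List[int]:
    """Return the indices of GPUs at or above the given temperature."""
    return sorted(i for i, t in temps.items() if t >= limit_c)


# Made-up sample in the CSV layout described above.
SAMPLE = """0, 41
1, 88
2, 43
"""

print(hot_gpus(parse_gpu_temps(SAMPLE), limit_c=85))
```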

Question 10

You are installing the operating system as part of the initial setup for a new NVIDIA Base Command Manager (BCM) cluster. Which two of the following actions are essential for a successful OS installation on the cluster's head node? (Pick the 2 correct responses below)

Options:

A.

Configure network switches for PXE boot to all compute nodes before installing the OS on the head node.

B.

Download the latest BCM ISO and verify its integrity using the provided checksum, then start the installation.

C.

Start the head node OS installation process with the system BIOS set to legacy boot mode instead of UEFI.

D.

Set the desired time zone and configure NTP synchronization during the OS installation wizard.

Question 11

One of the nodes in a cluster is not running as fast as the others and the system administrator needs to check the status of the GPUs on that system. What command should be used?

Options:

A.

lspci | grep NVIDIA

B.

nvidia-smi

C.

nvidia-gpu-status

D.

iblinkinfo

Question 12

An engineer needs to verify the current firmware versions of all components (ATF, BSP, NIC, UEFI) on a BlueField-3 DPU's BMC. Which Redfish API command provides this information?

Options:

A.

mlxconfig -d <device> q

B.

curl -k -u root:<password> -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareList

C.

mstflint -d <device> query full

D.

curl -k -u root:<password> -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory
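
For reference, a Redfish resource collection is a JSON document whose `Members` array holds `@odata.id` links to the individual resources. The sketch below extracts the last path segment of each link from such a document. The sample response and its component names are hypothetical placeholders, not output from a real BlueField-3 BMC.

```python
import json
from typing import List

# Hypothetical sample in the general shape of a Redfish collection.
SAMPLE_RESPONSE = """
{
  "@odata.id": "/redfish/v1/UpdateService/FirmwareInventory",
  "Members": [
    {"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/ExampleComponentA"},
    {"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/ExampleComponentB"}
  ],
  "Members@odata.count": 2
}
"""


def member_ids(collection_json: str) -> List[str]:
    """Return the final path segment of each member link in a collection."""
    doc = json.loads(collection_json)
    return [m["@odata.id"].rsplit("/", 1)[-1] for m in doc.get("Members", [])]


print(member_ids(SAMPLE_RESPONSE))
```

Each member link can then be fetched in turn to read that component's version details.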

Question 13

You are evaluating the integration of NVIDIA BlueField DPUs into your data center's storage architecture to optimize AI workloads. The storage solution chosen has incorporated BlueField DPUs to enhance performance and efficiency. Which of the following benefits directly results from this integration?

Options:

A.

Unlimited scalability by adding more DPUs without architectural changes.

B.

Elimination of latency issues in data processing tasks.

C.

Reduced CPU load by offloading data processing tasks to DPUs.

D.

Enhanced I/O performance with NVMe storage access speeds.

Question 14

What command is needed to measure BER (Bit Error Rate)?

Options:

A.

mlxconfig -d <device> q

B.

ethtool -S <interface>

C.

mlxlink -d <device> -c -e

D.

mstflint -d <device> q full
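
For context, a bit error rate is the number of bit errors divided by the number of bits transferred over the same interval. The helper below is a small worked calculation with a hypothetical function name. The error count, link rate, and duration in the usage example are illustrative numbers, not measurements.

```python
def bit_error_rate(bit_errors: int, link_rate_gbps: float, seconds: float) -> float:
    """BER = errors / total bits, with total bits = rate (Gb/s) * 1e9 * time (s)."""
    total_bits = link_rate_gbps * 1e9 * seconds
    if total_bits <= 0:
        raise ValueError("link rate and duration must be positive")
    return bit_errors / total_bits


# Usage: 4 errors in 10 seconds on a 400 Gb/s link is 4 errors in 4e12 bits.
print(bit_error_rate(4, 400, 10))
```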

Question 15

ClusterKit's NCCL bandwidth test shows 350 GB/s on a 400G InfiniBand fabric. How should this result be interpreted?

Options:

A.

Optimal performance, indicating healthy fabric and GPUDirect RDMA.

B.

Suboptimal performance; requires FEC tuning to reach 380+ GB/s.

C.

Critical failure; expected is >390 GB/s for HDR InfiniBand.

D.

Inconclusive; rerun with --stress=cpu to validate.
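
When reading figures like these, keep the units apart: link speeds are quoted in gigabits per second, while bandwidth tests usually report gigabytes per second. A 400 Gb/s link carries at most 50 GB/s, so a figure such as 350 GB/s only makes sense as an aggregate over several links. The sketch below assumes 8 links per node, which the question does not state, and the function name is hypothetical.

```python
def link_efficiency(measured_gbytes_per_s: float, link_gbits_per_s: float, links: int) -> float:
    """Measured bandwidth as a fraction of the theoretical aggregate line rate."""
    if link_gbits_per_s <= 0 or links < 1:
        raise ValueError("invalid link parameters")
    peak_gbytes_per_s = link_gbits_per_s / 8 * links  # 8 bits per byte
    return measured_gbytes_per_s / peak_gbytes_per_s


# Usage (assumed topology): 8 links at 400 Gb/s give a 400 GB/s theoretical peak.
print(link_efficiency(350, 400, 8))
```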

Question 16

A customer has just completed the first boot of their DGX system and is prompted to create an administrative user. What is the correct approach for setting up this user to ensure secure BMC and GRUB access?

Options:

A.

Create separate usernames for BMC and GRUB to maximize flexibility.

B.

Skip the creation of a new user and retain the default admin account for BMC and GRUB access.

C.

Create a unique, strong, lower-case username and password that will be used for both BMC and GRUB access, avoiding default or weak credentials.

D.

Use “sysadmin” as the username and a simple password for ease of management.

Question 17

An InfiniBand administrator needs to run performance benchmarks on new devices added to the fabric. What tool should be used to check the latency?

Options:

A.

tcpdump

B.

ib_write_lat

C.

ibdiagnet

D.

perfmon

Question 18

After initial setup and health checks, the DGX H100 system administrator wants to verify that containers can access GPUs before running production workloads. Which method is recommended for this validation?

Options:

A.

sudo docker run --gpus all --rm nvcr.io/nvidia/cuda:12.1.1-base-ubuntu22.04 systemctl

B.

sudo docker run --gpus all --rm nvcr.io/nvidia/cuda:12.1.1-base-ubuntu22.04 ls -la

C.

sudo docker run --rm nvcr.io/nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi

D.

sudo docker run --gpus all --rm nvcr.io/nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi

Question 19

A system administrator noticed a failure on a DGX H100 server. After a reboot, only the BMC is available. What could be the reason for this behavior?

Options:

A.

The network card has no link / connection.

B.

A boot disk has failed.

C.

Multiple GPUs have failed.

D.

There are more than two failed power supplies.

Question 20

Your company is planning to expand its AI capabilities significantly over the next five years. To future-proof your storage infrastructure, you need a solution that can scale in both capacity and performance. Which of the following strategies best ensures that your storage infrastructure remains adaptable to future AI demands?

Options:

A.

Deploy an all-flash array and remove data tiering to reduce latency.

B.

Implement a single-tier cloud storage solution to leverage cloud scalability.

C.

Use a hybrid cloud model combining scalable cloud resources with on-premises infrastructure.

D.

Implement an on-premises block storage system with periodic hardware upgrades.

Question 21

A user wants to restrict a Docker container to use only GPUs 0 and 2. Which command achieves this?

Options:

A.

docker run --gpus '"device=0,2"' nvidia/cuda:12.1-base nvidia-smi

B.

docker run -e NVIDIA_VISIBLE_DEVICES=0,2 nvidia/cuda:12.1-base nvidia-smi

C.

docker run --gpus all nvidia/cuda:12.1-base nvidia-smi -id=0,2

D.

docker run --device /dev/nvidia0,/dev/nvidia2 nvidia/cuda:12.1-base nvidia-smi
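
The nested quoting in option A is easy to misread. The sketch below uses Python's `shlex.split` to show what a POSIX shell would hand to `docker` for that command line: the outer single quotes are removed and the inner double quotes survive as part of the `--gpus` value, which keeps the comma-separated device list together as one field. It only tokenizes the string and does not run Docker.

```python
import shlex

# Command line copied from option A.
COMMAND = """docker run --gpus '"device=0,2"' nvidia/cuda:12.1-base nvidia-smi"""

argv = shlex.split(COMMAND)
gpus_value = argv[argv.index("--gpus") + 1]

# The value still carries its inner double quotes.
print(gpus_value)
```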

Exam Code: NCP-AII
Exam Name: NVIDIA AI Infrastructure
Last Update: Feb 28, 2026
Questions: 71