NCP-AII NVIDIA AI Infrastructure Questions and Answers

Questions 4

A customer is designing an AI Factory for enterprise-scale deployments and wants to ensure redundancy and load balancing for the management and storage networks. Which feature should be implemented on the Ethernet switches?

Options:

A.

Implement redundant switches with spanning tree protocol.

B.

MLAG for bonded interfaces across redundant switches.

C.

Use only one switch for all management and storage traffic.

D.

Disable VLANs and use unmanaged switches.

Questions 5

A leaf switch shows "FW Version Mismatch" alerts for transceivers after cluster expansion. Which tool validates transceiver firmware against expected versions?

Options:

A.

flint

B.

iblinkinfo

C.

mlxconfig

D.

ethtool

Questions 6

An administrator is configuring node categories in BCM for a DGX BasePOD cluster. They need to group all NVIDIA DGX H200 nodes under a dedicated category for GPU-accelerated workloads. Which approach aligns with NVIDIA's recommended BCM practices?

Options:

A.

Assign nodes to the "login" category to simplify Slurm integration.

B.

Create a new "dgx-h200" category, assign all DGX H200 nodes to it.

C.

Use the existing "dgxnodes" category without modification, as it is preconfigured for all DGX systems.

D.

Avoid categories and configure each DGX node individually via CLI.

Questions 7

An engineer needs to verify the current firmware versions of all components (ATF, BSP, NIC, UEFI) on a BlueField-3 DPU's BMC. Which Redfish API command provides this information?

Options:

A.

mlxconfig -d q

B.

curl -k -u root: -X GET https:// /redfish/v1/UpdateService/FirmwareList

C.

mstflint -d query full

D.

curl -k -u root: -X GET https:// /redfish/v1/UpdateService/FirmwareInventory
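For orientation, the Redfish request pattern in the curl options above can be sketched in Python. This is a minimal sketch, not a BlueField-specific client: the host name `bmc.example`, the helper names, and the sample member entries are hypothetical. It relies only on the standard Redfish convention that a collection resource returns a `Members` array of `@odata.id` links. No network call is made.

```python
import json


def firmware_inventory_url(bmc_host: str) -> str:
    """Build the Redfish firmware inventory collection URL for a BMC.

    bmc_host is a placeholder; substitute the real BMC address.
    """
    return f"https://{bmc_host}/redfish/v1/UpdateService/FirmwareInventory"


def member_uris(collection: dict) -> list:
    """Return the resource paths listed in a Redfish collection response."""
    return [member["@odata.id"] for member in collection.get("Members", [])]


# Hypothetical response body, for illustration only (not captured from a device).
SAMPLE_RESPONSE = json.loads("""
{
  "Members": [
    {"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/component_a"},
    {"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/component_b"}
  ]
}
""")

if __name__ == "__main__":
    print(firmware_inventory_url("bmc.example"))
    for uri in member_uris(SAMPLE_RESPONSE):
        print(uri)
```

Each member path would then be fetched individually to read that component's version field.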

Questions 8

An InfiniBand server stops working, and a system administrator runs the "ibstat" command that provides the following output:

CA 'mlx5_1'
    CA type: MT4115
    Number of ports: 2
    Firmware version: 10.20.1010
    Hardware version: 0
    Node GUID: 0x0002c90300002f78
    System image GUID: 0x0002c90300002f7b
    Port 1:
        State: Initializing
        Physical state: Linkup
        Rate: 100
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0251086a
        Port GUID: 0x0002c90300002f79
        Link layer: InfiniBand

What is the cause of the issue?

Options:

A.

The HCA port is faulty.

B.

There is no running SM in the fabric.

C.

The neighboring switch port is faulty.

D.

The cable is disconnected.
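The output in this question shows a port whose physical link is up while its logical state is still Initializing and both Base lid and SM lid are 0. That combination generally means the port has not yet been configured by a subnet manager. The sketch below scans ibstat-style text for that pattern. The function name and the parsing approach are illustrative assumptions, and the parser handles only the simple "key: value" layout shown in the question.

```python
import re

SAMPLE_IBSTAT = """\
CA 'mlx5_1'
    CA type: MT4115
    Number of ports: 2
    Port 1:
        State: Initializing
        Physical state: Linkup
        Base lid: 0
        SM lid: 0
        Link layer: InfiniBand
"""


def ports_waiting_for_sm(ibstat_text: str) -> list:
    """Return 'CA/port' labels for ports that are physically linked but
    still Initializing with SM lid 0."""
    flagged = []
    ca = None
    port = None
    fields = {}

    def flush():
        # Evaluate the port collected so far, if any.
        if (ca and port
                and fields.get("Physical state", "").lower() == "linkup"
                and fields.get("State") == "Initializing"
                and fields.get("SM lid") == "0"):
            flagged.append(f"{ca}/{port}")

    for raw in ibstat_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        ca_match = re.match(r"CA '(.+)'", line)
        if ca_match:
            flush()
            ca, port, fields = ca_match.group(1), None, {}
            continue
        port_match = re.match(r"Port (\d+):$", line)
        if port_match:
            flush()
            port, fields = port_match.group(1), {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    flush()
    return flagged


if __name__ == "__main__":
    print(ports_waiting_for_sm(SAMPLE_IBSTAT))
```

A port flagged this way is a prompt to check the fabric for a running subnet manager before suspecting the hardware.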

Questions 9

An administrator installs NVIDIA GPU drivers on a DGX H100 system with UEFI Secure Boot enabled. After reboot, the drivers fail to load. What is the first action to resolve this issue?

Options:

A.

Disable Secure Boot permanently in BIOS/UEFI settings.

B.

Delete /etc/X11/xorg.conf to force driver reconfiguration.

C.

Enroll the Machine Owner Key (MOK) during system reboot and enter the recorded password.

D.

Reinstall drivers using apt-get install nvidia-driver-550 without rebooting.

Questions 10

Which command is used to measure the bit error rate (BER)?

Options:

A.

mlxconfig -d q

B.

ethtool -S

C.

mlxlink -d -c -e

D.

mstflint -d q full

Questions 11

For an NVIDIA Enterprise AI Factory with 256 GPUs, which storage solution characteristic is most critical to validate during scaling tests?

Options:

A.

Consistent per-node throughput >8 GiB/s.

B.

Single-node write performance during idle clusters.

C.

RAID rebuild times under disk failure.

D.

Maximum 4K random read IOPS exceeding 1 million.

Questions 12

After running a 24-hour stress test on a DGX node, which two key metrics should the administrator verify to confirm system stability?

Options:

A.

Average CPU usage >80% and Docker container uptime.

B.

No thermal throttling events and consistent GPU utilization >95% throughout the test.

C.

SSD write endurance and RAM capacity.

D.

Total energy consumption and NVLink bandwidth.

Questions 13

A company has a registered NGC account and their server has NGC CLI installed. What step should be taken first to gain access to NGC?

Options:

A.

ngc config get

B.

ngc init

C.

ngc config set

D.

ngc config update

Questions 14

Two ports must be connected, but one is SFP and the other is QSFP: for example, a 25 GbE host channel adapter must connect to a QSFP port capable of both 100 GbE and 25 GbE. Which of the following solutions would best meet this requirement?

Options:

A.

SFP Connectors

B.

SFP to 1G BASE-T (RJ45) adapter

C.

QSA Adapter

Questions 15

An engineer needs to validate 400G DAC cable signal integrity in a DGX cluster. Which CVT metric best identifies marginal cables needing replacement?

Options:

A.

Lane power variance < 3dB across all transceivers.

B.

Transceiver model matching QSFP-DD specifications.

C.

Temperature fluctuations > 5°C during validation.

D.

Effective BER > 1.5E-254 during a <6-hour monitoring window.

Questions 16

An engineer needs to verify NVLink isolation on a single node with 8 GPUs. Which NCCL test configuration stresses switch bisection bandwidth?

Options:

A.

Use NCCL_TESTS_SPLIT="DIV 8" with point-to-point tests

B.

Use all_reduce_perf -b 8 -e 16G -f 2 -g 8 with NCCL_TESTS_SPLIT="AND 0x1"

C.

Use reduce_scatter_perf -b 8 -e 16G -f 2 -g 4

D.

Use all_reduce_perf -b 8 -e 16G -f 2 -g 8 without splits

Questions 17

For a 48-hour NCCL burn-in test, which parameters ensure sustained fabric stress while detecting silent data corruption?

Options:

A.

broadcast_perf -b 4G -e 16G -w 160

B.

all_reduce_perf -b 8G -e 32G -c 1000 -z 1 -G 1000

C.

all_reduce_perf -b 8G -e 32G -z 1 -G 1000

D.

reduce_scatter_perf -f 2 -g 8

Questions 18

You are following the official steps to install the NVIDIA Container Toolkit using a package manager on Ubuntu. After importing the NVIDIA package repository and GPG key, what is the next action?

Options:

A.

Reboot the host system to apply the repository changes and proceed.

B.

Install the nvidia-container-toolkit package using your package manager.

C.

Format the disk to clear any existing NVIDIA-related dependencies first.

D.

Download the CUDA toolkit installer from NVIDIA's official website.

Questions 19

A system administrator needs to configure a BlueField DPU and enable RShim on the baseboard management controller (BMC). Which command should be executed?

Options:

A.

ipmitool raw 0x32 0x6a 1

B.

systemctl restart rshim

C.

systemctl enable bmc-rshim.service

D.

scp root@:/dev/rshim0/boot

Questions 20

During cluster validation, the Cable Validation Tool (CVT) reports "Underperforming (BER)" for an InfiniBand link. Which BER thresholds indicate a critical signal quality issue requiring cable replacement?

Options:

A.

Rx power variance > 3dB between lanes

B.

Effective BER > 0 during the first 125 minutes of link operation

C.

Raw BER > 1e-12 or Effective BER > 1.5E-254 for <6hr measurements

D.

Temperature > 85°C on transceiver module
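A threshold rule of the kind quoted in the options can be expressed as a simple comparison. The sketch below uses the two BER figures from option C purely as illustrative constants. The function and constant names are hypothetical, and the sketch does not reproduce how CVT itself evaluates a link.

```python
# Illustrative limits taken from the option text above; treat them as
# assumptions, not as values read from the CVT tool or its documentation.
RAW_BER_LIMIT = 1e-12
EFFECTIVE_BER_LIMIT = 1.5e-254


def ber_exceeds_limits(raw_ber: float, effective_ber: float) -> bool:
    """Return True when either measured BER is above its limit."""
    return raw_ber > RAW_BER_LIMIT or effective_ber > EFFECTIVE_BER_LIMIT


if __name__ == "__main__":
    # Hypothetical measurements for two links.
    print(ber_exceeds_limits(raw_ber=1e-15, effective_ber=0.0))
    print(ber_exceeds_limits(raw_ber=1e-10, effective_ber=0.0))
```

Because the effective-BER limit is so small, any effective error count that registers at all will exceed it. Both constants are well within the range of a 64-bit float.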

Questions 21

A system engineer needs to set the vGPU scheduling behavior for all GPUs to the equal share scheduler with the default time slice length. Which command should be used?

Options:

A.

esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x01"

B.

esxcli graphics module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x01"

C.

esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=FRL=0x01"

D.

esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x00"

Exam Code: NCP-AII
Exam Name: NVIDIA AI Infrastructure
Last Update: Apr 11, 2026
Questions: 71