<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MohamedELGAMASY</title>
    <description>The latest articles on DEV Community by MohamedELGAMASY (@gamasy).</description>
    <link>https://dev.to/gamasy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2810647%2F6117b041-3ab4-4fc9-ac32-7f35f3de8c47.jpeg</url>
      <title>DEV Community: MohamedELGAMASY</title>
      <link>https://dev.to/gamasy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gamasy"/>
    <language>en</language>
    <item>
      <title>GPU as a Service (GPUaaS): Empowering Accelerated AI and High-Performance Computing</title>
      <dc:creator>MohamedELGAMASY</dc:creator>
      <pubDate>Mon, 15 Sep 2025 18:14:14 +0000</pubDate>
      <link>https://dev.to/gamasy/gpu-as-a-service-gpuaas-empowering-accelerated-ai-and-high-performance-computing-31ie</link>
      <guid>https://dev.to/gamasy/gpu-as-a-service-gpuaas-empowering-accelerated-ai-and-high-performance-computing-31ie</guid>
<description>&lt;h2&gt;Table of Contents&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;Definition and Core Concept of GPU as a Service (GPUaaS)&lt;/li&gt;
&lt;li&gt;
How GPUaaS Operates: The Underlying Architecture and Mechanisms

&lt;ul&gt;
&lt;li&gt;3.1 Hardware Infrastructure: The Powerhouse
&lt;/li&gt;
&lt;li&gt;3.2 Orchestration Layers: Managing Complexity
&lt;/li&gt;
&lt;li&gt;3.3 APIs and SDKs: Facilitating Integration
&lt;/li&gt;
&lt;li&gt;3.4 Virtualization and Resource Sharing: Maximizing Utilization
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Benefits of GPU as a Service: A Catalyst for Innovation and Efficiency

&lt;ul&gt;
&lt;li&gt;4.1 Unparalleled Cost Efficiency
&lt;/li&gt;
&lt;li&gt;4.2 Superior Scalability and Flexibility for Dynamic Workloads
&lt;/li&gt;
&lt;li&gt;4.3 Accelerated Speed and Enhanced Convenience
&lt;/li&gt;
&lt;li&gt;4.4 Simplified Resource Management and Minimized Downtime
&lt;/li&gt;
&lt;li&gt;4.5 Continuous Access to Latest Technology
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Flexible Deployment Models in GPU as a Service

&lt;ul&gt;
&lt;li&gt;5.1 Bare Metal GPU
&lt;/li&gt;
&lt;li&gt;5.2 Virtual GPU (vGPU)
&lt;/li&gt;
&lt;li&gt;5.3 Managed Kubernetes
&lt;/li&gt;
&lt;li&gt;5.4 Self-managed GPUaaS
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Use Cases and Applications of GPU as a Service

&lt;ul&gt;
&lt;li&gt;6.1 Artificial Intelligence (AI) and Machine Learning (ML) Model Training
&lt;/li&gt;
&lt;li&gt;6.2 Real-Time AI Inferencing
&lt;/li&gt;
&lt;li&gt;6.3 High-Performance Computing (HPC)
&lt;/li&gt;
&lt;li&gt;6.4 3D Rendering and Animation
&lt;/li&gt;
&lt;li&gt;6.5 Autonomous Systems and Robotics
&lt;/li&gt;
&lt;li&gt;6.6 Healthcare and Life Sciences
&lt;/li&gt;
&lt;li&gt;6.7 Manufacturing and Industrial Automation
&lt;/li&gt;
&lt;li&gt;6.8 Government and Public Sector
&lt;/li&gt;
&lt;li&gt;6.9 Banking and Financial Sector
&lt;/li&gt;
&lt;li&gt;6.10 Telecommunications
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;GPU as a Service Architecture Overview&lt;/li&gt;
&lt;li&gt;GPUaaS Platform Overview&lt;/li&gt;
&lt;li&gt;
Leading Providers in the GPU as a Service Market

&lt;ul&gt;
&lt;li&gt;9.1 Hyperscale Cloud Providers
&lt;/li&gt;
&lt;li&gt;9.2 Specialized GPUaaS Platforms
&lt;/li&gt;
&lt;li&gt;9.3 Comparative Overview of Selected Providers
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
OpenShift Integration with GPU as a Service

&lt;ul&gt;
&lt;li&gt;10.1 NVIDIA GPU Operator on OpenShift
&lt;/li&gt;
&lt;li&gt;10.2 Resource Management and Orchestration
&lt;/li&gt;
&lt;li&gt;10.3 OpenShift AI and Data Science Workflows
&lt;/li&gt;
&lt;li&gt;10.4 Bare Metal and Virtualized Deployments
&lt;/li&gt;
&lt;li&gt;10.5 Monitoring and Optimization
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;OpenShift GPU Integration Architecture&lt;/li&gt;
&lt;li&gt;NVIDIA GPU Operator Architecture on OpenShift&lt;/li&gt;
&lt;li&gt;Conclusion for GPU as a Service&lt;/li&gt;
&lt;li&gt;References for GPU as a Service&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Introduction&lt;/h2&gt;

&lt;p&gt;In the rapidly evolving landscape of Artificial Intelligence (AI), Machine Learning (ML), and High-Performance Computing (HPC), the demand for specialized and powerful computational resources has surged dramatically. Traditional CPU-based infrastructures often struggle to keep pace with the parallel processing requirements of modern AI models and complex simulations. This challenge has given rise to GPU as a Service (GPUaaS), a transformative cloud computing model that democratizes access to Graphics Processing Units (GPUs), the workhorses of accelerated computing.&lt;/p&gt;

&lt;p&gt;GPUaaS offers a flexible, scalable, and cost-effective alternative to the traditional approach of purchasing, deploying, and maintaining expensive on-premises GPU hardware. By leveraging cloud infrastructure, organizations can access cutting-edge GPU technology on-demand, paying only for the resources they consume. This paradigm shift not only reduces significant capital expenditures but also accelerates innovation by providing immediate access to the computational power necessary for developing, training, and deploying advanced AI models and executing complex scientific workloads.&lt;/p&gt;

&lt;p&gt;This comprehensive article delves into the intricacies of GPU as a Service, exploring its fundamental definition, the underlying operational mechanisms, the myriad benefits it offers, various deployment models, and its wide-ranging applications across diverse industries. We will also touch upon the key players in the GPUaaS market, providing a holistic view of how this service model is empowering businesses and researchers to unlock the full potential of accelerated computing.&lt;/p&gt;

&lt;h2&gt;Definition and Core Concept of GPU as a Service (GPUaaS)&lt;/h2&gt;

&lt;p&gt;GPU as a Service (GPUaaS) represents a paradigm shift in how organizations acquire and utilize high-performance computing resources. At its core, GPUaaS is a cloud computing model that provides on-demand access to Graphics Processing Units (GPUs) hosted within remote data centers. Instead of incurring the substantial capital expenditure and operational overhead associated with purchasing, deploying, maintaining, and upgrading physical GPU hardware, users can rent GPU resources through a cloud-based platform, adopting a pay-as-you-go consumption model where costs are directly tied to the computational power actively consumed [1, 2].&lt;/p&gt;

&lt;p&gt;This model has become indispensable in the current technological landscape, particularly with the explosive growth of AI and ML. Traditional Central Processing Units (CPUs), while versatile, are inherently designed for sequential processing and often prove inefficient for the massive parallel computations demanded by modern AI algorithms, deep learning model training, complex data simulations, and High-Performance Computing (HPC) tasks [2]. GPUs, in contrast, are architecturally optimized for parallel processing, featuring thousands of smaller, specialized cores that can execute numerous computations simultaneously. This inherent design allows GPUs to complete tasks in minutes that would typically take hours or even days on CPUs, thereby significantly accelerating AI optimization, facilitating the processing of massive datasets, and enabling complex computations with unparalleled efficiency [1, 2].&lt;/p&gt;

&lt;p&gt;The fundamental appeal of GPUaaS lies in its ability to transform capital expenditures (CapEx) into predictable operational expenses (OpEx). This financial restructuring simplifies budgeting, reduces the financial barrier to entry for advanced computing, and makes high-performance computing capabilities accessible to a diverse range of entities, from nascent startups and academic institutions to established enterprises. By abstracting the complexities of hardware management, GPUaaS allows innovators to focus their resources and expertise on developing groundbreaking AI applications and scientific discoveries, rather than on infrastructure procurement and maintenance [1].&lt;/p&gt;

&lt;h2&gt;How GPUaaS Operates: The Underlying Architecture and Mechanisms&lt;/h2&gt;

&lt;p&gt;The operational efficacy of GPUaaS is predicated on a sophisticated cloud infrastructure that seamlessly abstracts the intricate complexities of hardware management from the end-user. This abstraction allows developers and data scientists to provision and utilize GPU resources without needing deep expertise in hardware maintenance, driver management, or physical infrastructure setup. The core components and operational mechanisms that underpin a typical GPUaaS offering include:&lt;/p&gt;

&lt;h3&gt;1. Hardware Infrastructure: The Powerhouse&lt;/h3&gt;

&lt;p&gt;At the foundation of any GPUaaS offering is a robust hardware infrastructure comprising high-performance GPUs. These GPUs are strategically deployed in secure, geographically distributed data centers to ensure low latency and high availability. Providers invest in the latest and most powerful GPU architectures to meet the demanding computational needs of AI/ML, graphics rendering, and scientific research. Examples of such cutting-edge GPUs include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;NVIDIA A100 and H100/H200&lt;/strong&gt;: These are enterprise-grade GPUs specifically designed for AI training, inference, and HPC workloads. They feature advanced Tensor Cores, significant memory bandwidth, and specialized interconnect technologies (like NVLink) to accelerate complex computations [2]. The H100, for instance, offers up to 9x performance improvements for AI training tasks compared to its predecessor, the A100, due to its fourth-generation Tensor Cores and transformer optimizations [2].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;NVIDIA L40 and L40S&lt;/strong&gt;: These GPUs are optimized for a balance of performance and cost-efficiency, suitable for a broader range of workloads including graphics, visualization, and mid-range AI inference [2].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AMD MI300x&lt;/strong&gt;: AMD's offerings, such as the Instinct MI300x, provide competitive performance for HPC and AI, leveraging Infinity Fabric interconnects to deliver the high bandwidth that multi-node HPC workloads demand [2].&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These GPUs are housed in state-of-the-art data centers equipped with advanced cooling systems, redundant power supplies, and robust network connectivity to ensure optimal performance and reliability.&lt;/p&gt;

&lt;h3&gt;2. Orchestration Layers: Managing Complexity&lt;/h3&gt;

&lt;p&gt;To efficiently manage and allocate these powerful GPU resources, GPUaaS platforms employ sophisticated orchestration layers. These layers are crucial for automating the deployment, scaling, and optimization of GPU workloads, ensuring that resources are utilized effectively and dynamically. Key orchestration technologies often include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Kubernetes&lt;/strong&gt;: As the de facto standard for container orchestration, Kubernetes plays a pivotal role in GPUaaS. It enables the deployment of containerized AI/ML applications, automatically managing the allocation of GPU resources to pods, scaling workloads based on demand, and ensuring high availability. Kubernetes simplifies the management of complex, distributed AI training and inference jobs [2, 3].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;NVIDIA GPU Cloud (NGC)&lt;/strong&gt;: NGC is a comprehensive catalog of GPU-optimized software, including AI frameworks, HPC applications, and pre-trained models. It provides containerized environments that are pre-configured and tested to run efficiently on NVIDIA GPUs, further streamlining the deployment and management of AI workloads within GPUaaS platforms [2].&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These orchestration tools work in concert to provide a seamless experience for users, abstracting away the underlying infrastructure complexities and allowing them to focus on their AI development.&lt;/p&gt;
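&lt;p&gt;As a concrete illustration of how Kubernetes expresses a GPU request, the sketch below builds a minimal pod manifest that asks for one GPU through the &lt;code&gt;nvidia.com/gpu&lt;/code&gt; extended resource, the name the NVIDIA device plugin registers with the cluster. The pod name and NGC image tag are illustrative placeholders, not taken from any specific deployment.&lt;/p&gt;

```python
import json

# Minimal pod spec requesting one NVIDIA GPU. The "nvidia.com/gpu"
# resource limit is the standard extended-resource name registered by
# the NVIDIA device plugin; name and image are hypothetical examples.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-training-job"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "trainer",
            "image": "nvcr.io/nvidia/pytorch:24.01-py3",  # example NGC image
            "resources": {"limits": {"nvidia.com/gpu": 1}},
        }],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

The scheduler only places such a pod on a node advertising a free GPU, which is how the orchestration layer turns a pool of physical cards into an on-demand resource.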

&lt;h3&gt;3. APIs and SDKs: Facilitating Integration&lt;/h3&gt;

&lt;p&gt;Integration is a critical aspect of GPUaaS, allowing developers to easily incorporate GPU capabilities into their existing workflows and applications. This is achieved through well-documented Application Programming Interfaces (APIs) and Software Development Kits (SDKs). These interfaces enable programmatic access to GPU resources and services, supporting popular AI/ML frameworks and libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;TensorFlow, PyTorch, and Keras&lt;/strong&gt;: These are leading open-source machine learning frameworks widely used for deep learning. GPUaaS platforms provide optimized environments and APIs that allow seamless execution of models built with these frameworks on their GPU infrastructure [2, 3].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CUDA&lt;/strong&gt;: NVIDIA's CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model that enables dramatic increases in computing performance by harnessing the power of GPUs. GPUaaS environments often provide direct or abstracted access to CUDA, allowing for highly optimized custom GPU programming [2].&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These APIs and SDKs streamline the AI model development, training, and deployment lifecycle, making it easier for developers to leverage the full power of GPUs without needing to manage low-level hardware interactions.&lt;/p&gt;
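&lt;p&gt;A common pattern these SDKs enable is device-agnostic code that runs unchanged on a CPU-only workstation and on a rented GPU instance. The small helper below is a sketch of that pattern, assuming PyTorch as the framework; it degrades gracefully to CPU when the library or a GPU is absent.&lt;/p&gt;

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable GPU is visible, else "cpu".

    Treats PyTorch as an optional dependency so the same script runs
    on a laptop and on a GPUaaS instance without modification.
    """
    try:
        import torch  # optional: only present where the framework is installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print(pick_device())
```

In practice the returned string is passed to calls like `tensor.to(device)`, so moving a workload from local experimentation to a cloud GPU requires no code changes.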

&lt;h3&gt;4. Virtualization and Resource Sharing: Maximizing Utilization&lt;/h3&gt;

&lt;p&gt;To achieve maximum resource utilization and cost-efficiency, GPUaaS platforms employ advanced virtualization technologies. This allows multiple users or applications to share a single physical GPU without interference, a concept often referred to as multi-tenancy. Key aspects include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Virtual GPU (vGPU)&lt;/strong&gt;: Technologies like NVIDIA vGPU enable the partitioning of a physical GPU into multiple virtual instances. Each vGPU can be assigned to a virtual machine (VM) or container, providing dedicated GPU memory and compute resources. This allows for optimized resource sharing across various workloads or users, enhancing overall utilization and flexibility [3].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Time-Slicing&lt;/strong&gt;: For certain workloads, GPUs can be time-sliced, allowing different processes to take turns using the GPU's compute units. This method is particularly useful for smaller, bursty workloads that don't require continuous, exclusive access to a full GPU.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By enabling efficient resource sharing, GPUaaS providers can offer more flexible and cost-effective solutions, ensuring that computational power is delivered precisely where and when it is needed [2].&lt;/p&gt;
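&lt;p&gt;To make the time-slicing idea concrete, the toy scheduler below models round-robin sharing of a single GPU among several jobs; the job names and work units are invented for illustration.&lt;/p&gt;

```python
from collections import deque

def time_slice(jobs: dict, quantum: int = 2) -> list:
    """Simulate round-robin time-slicing of one GPU.

    `jobs` maps a job name to its remaining units of work; each turn a
    job holds the GPU for at most `quantum` units before yielding.
    Returns the order in which jobs occupied the GPU.
    """
    queue = deque(jobs.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        schedule.append(name)        # job takes its turn on the GPU
        remaining -= quantum
        if remaining > 0:            # unfinished jobs rejoin the queue
            queue.append((name, remaining))
    return schedule

order = time_slice({"inference-a": 3, "inference-b": 5})
```

Interleaving like this keeps the GPU busy across bursty workloads, which is exactly why time-slicing suits small jobs that do not need exclusive access.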

&lt;h2&gt;Benefits of GPU as a Service: A Catalyst for Innovation and Efficiency&lt;/h2&gt;

&lt;p&gt;The adoption of GPU as a Service (GPUaaS) offers a multitude of compelling advantages that directly address the inherent challenges associated with traditional on-premises GPU infrastructure. These benefits collectively position GPUaaS as a critical enabler for organizations striving to innovate, optimize costs, and accelerate their AI and HPC initiatives.&lt;/p&gt;

&lt;h3&gt;1. Unparalleled Cost Efficiency&lt;/h3&gt;

&lt;p&gt;One of the most significant benefits of GPUaaS is its profound impact on cost efficiency. Investing in high-performance GPUs, especially the latest generations, involves substantial upfront capital expenditure. Beyond the initial purchase, there are ongoing costs associated with maintenance, power consumption, cooling systems, and the physical space required to house these powerful machines. These factors can create a significant financial barrier for many businesses, particularly startups and smaller enterprises [1, 2].&lt;/p&gt;

&lt;p&gt;GPUaaS mitigates this burden by operating on a pay-as-you-go model. Users are charged only for the compute time they actively consume, whether it's on an hourly, daily, or monthly basis. This model is exceptionally beneficial for projects with fluctuating or intermittent computational demands, as it prevents underutilization of expensive hardware. By converting large capital expenditures (CapEx) into predictable operational expenses (OpEx), GPUaaS simplifies budgeting and allows organizations to redirect saved funds towards higher-ROI initiatives, fostering greater financial agility [1, 2, 3].&lt;/p&gt;
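&lt;p&gt;A rough break-even calculation shows when this trade-off favors renting. The sketch below amortizes an owned card over a fixed horizon and compares it with an hourly cloud rate; all figures are illustrative placeholders, not real price quotes.&lt;/p&gt;

```python
def break_even_hours(purchase_cost: float, monthly_upkeep: float,
                     hourly_rate: float, horizon_months: int = 36) -> float:
    """Hours of rented GPU time per month at which owning becomes cheaper.

    Amortize the purchase price over `horizon_months`, add monthly
    upkeep (power, cooling, space), and divide by the hourly rate.
    """
    monthly_cost_of_owning = purchase_cost / horizon_months + monthly_upkeep
    return monthly_cost_of_owning / hourly_rate

# Hypothetical numbers: a $30,000 card amortized over 3 years with
# $400/month upkeep, versus a $3/hour on-demand rate.
hours = break_even_hours(30_000, 400, 3.0)
```

With these made-up figures, ownership only pays off above roughly 400 GPU-hours per month; below that threshold, the pay-as-you-go model is the cheaper option.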

&lt;h3&gt;2. Superior Scalability and Flexibility for Dynamic Workloads&lt;/h3&gt;

&lt;p&gt;AI and ML projects are characterized by their dynamic and often unpredictable compute needs. The early phases of model development and experimentation might require minimal resources, while the intensive training phases of large deep learning models can demand hundreds or even thousands of GPUs simultaneously. Scaling physical GPU infrastructure to match these fluctuating demands is a slow, complex, and costly endeavor, often leading to either over-provisioning (and wasted resources) or under-provisioning (and performance bottlenecks) [2].&lt;/p&gt;

&lt;p&gt;GPUaaS elegantly solves this challenge by offering elastic, on-demand scaling. Users can instantly provision additional GPU resources during peak computational periods and seamlessly scale down when demand subsides, optimizing both performance and cost. This flexibility extends to choosing from a diverse array of GPU types (e.g., NVIDIA H100, A100, L40, AMD MI300x) tailored to specific project requirements, ensuring that the right computational power is always available. Furthermore, the cloud-based nature of GPUaaS allows access to these resources from anywhere with an internet connection, facilitating remote work and global collaboration [1, 2, 3].&lt;/p&gt;
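&lt;p&gt;The elastic-scaling behavior can be sketched with the proportional rule Kubernetes' Horizontal Pod Autoscaler applies: scale the replica count by the ratio of the observed metric to its target, then clamp to configured bounds. The queue-depth metric and limits below are hypothetical.&lt;/p&gt;

```python
import math

def desired_gpu_replicas(current: int, queue_depth: float,
                         target_per_replica: float,
                         min_r: int = 1, max_r: int = 64) -> int:
    """Scale GPU workers toward a target queue depth per replica.

    Mirrors the HPA formula desired = ceil(current * observed / target),
    clamped so the fleet never shrinks below min_r or grows past max_r.
    """
    desired = math.ceil(current * queue_depth / target_per_replica)
    return max(min_r, min(max_r, desired))

# Four workers each seeing 25 queued requests against a target of 10:
replicas = desired_gpu_replicas(4, 25, 10)
```

When demand subsides the same rule scales the fleet back down, which is the mechanism behind paying only for the GPUs actually needed at each moment.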

&lt;h3&gt;3. Accelerated Speed and Enhanced Convenience&lt;/h3&gt;

&lt;p&gt;The traditional procurement and setup of on-premises GPU infrastructure can take weeks or even months, delaying critical AI projects. GPUaaS dramatically shortens this timeline, enabling organizations to initiate AI projects within minutes. Providers offer pre-configured environments, often bundled with popular AI/ML frameworks and libraries, which eliminate the need for complex setup processes, driver installations, and software configurations. This immediate readiness allows developers and data scientists to bypass infrastructure hurdles and dive directly into their core work [1, 2].&lt;/p&gt;

&lt;p&gt;Moreover, GPUaaS ensures continuous access to the latest GPU technology. Providers regularly update their hardware, meaning users automatically benefit from cutting-edge advancements without the burden of managing hardware upgrades or dealing with obsolescence. This continuous access to state-of-the-art technology is crucial for maintaining a competitive edge in fast-paced AI research and development [1, 2].&lt;/p&gt;

&lt;h3&gt;4. Simplified Resource Management and Minimized Downtime&lt;/h3&gt;

&lt;p&gt;Managing physical GPU setups demands specialized IT staff with expertise in hardware maintenance, driver updates, performance monitoring, and troubleshooting. These tasks are time-consuming and resource-intensive, and any issues can lead to significant downtime, impacting productivity and project timelines. In contrast, GPUaaS providers assume full responsibility for backend management, including hardware upkeep, driver updates, security patches, and infrastructure maintenance [2, 3].&lt;/p&gt;

&lt;p&gt;This abstraction minimizes downtime and frees internal teams from operational burdens, allowing them to concentrate on strategic AI development. GPUaaS platforms typically include built-in performance monitoring tools that provide users with clear insights into resource utilization and application performance, all without requiring manual intervention. This simplified management significantly reduces operational complexity and enhances overall system reliability [2, 3].&lt;/p&gt;

&lt;h3&gt;5. Continuous Access to Latest Technology&lt;/h3&gt;

&lt;p&gt;The rapid pace of innovation in GPU technology means that physical hardware can become outdated within a relatively short period, leading to a constant cycle of costly upgrades. GPUaaS providers, however, continuously refresh their infrastructure with the newest offerings from manufacturers like NVIDIA (e.g., Ampere, Hopper architectures) and AMD (e.g., RDNA series, Instinct accelerators). This ensures that users always have access to the most advanced and powerful GPUs available on the market [2, 3].&lt;/p&gt;

&lt;p&gt;This continuous access to cutting-edge technology provides businesses with a crucial first-mover advantage, boosting computational efficiency, enabling the development of more sophisticated AI models, and helping innovation-driven companies remain competitive and operate at peak performance. It eliminates the risk of hardware obsolescence and ensures that computational capabilities are always aligned with the latest technological advancements.&lt;/p&gt;

&lt;h2&gt;Flexible Deployment Models in GPU as a Service&lt;/h2&gt;

&lt;p&gt;GPUaaS providers offer a variety of deployment models, each designed to cater to specific enterprise requirements, workload characteristics, and levels of control desired by the user. These models provide flexibility in how GPU resources are consumed and managed, allowing organizations to choose the approach that best aligns with their technical and operational needs [3].&lt;/p&gt;

&lt;h3&gt;1. Bare Metal GPU&lt;/h3&gt;

&lt;p&gt;This deployment model provides users with direct, exclusive access to physical GPU hardware without any virtualization layer. In a bare metal GPU setup, the user has maximum control over the operating system, drivers, and software stack. This model is particularly favored for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Maximum Performance&lt;/strong&gt;: Eliminating the virtualization overhead ensures the highest possible performance, making it ideal for highly demanding workloads such as large-scale AI model training, complex scientific simulations, and real-time data processing where every millisecond counts.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Complete Control&lt;/strong&gt;: Users have full administrative access to the GPU server, allowing for deep customization of the environment, installation of specialized drivers, and fine-tuning of performance parameters.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Specific Compliance Needs&lt;/strong&gt;: Certain regulatory or compliance requirements may necessitate direct access to physical hardware, which bare metal GPUaaS can provide.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While offering unparalleled performance and control, bare metal GPUaaS typically comes with a higher cost and requires more expertise from the user to manage the software stack [3].&lt;/p&gt;

&lt;h3&gt;2. Virtual GPU (vGPU)&lt;/h3&gt;

&lt;p&gt;Virtual GPU (vGPU) technology allows a single physical GPU to be partitioned and shared among multiple virtual machines (VMs) or containers. Each vGPU instance is presented to the VM as a dedicated GPU, complete with its own allocated memory and compute resources. This model is optimized for resource sharing and is particularly beneficial for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Optimized Resource Utilization&lt;/strong&gt;: By sharing a single powerful GPU among multiple users or workloads, vGPU significantly increases the overall utilization rate of the hardware, leading to cost savings.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Flexibility and Isolation&lt;/strong&gt;: Each vGPU operates independently, providing a consistent and isolated experience for each user or application. This is crucial in multi-tenant environments where different users might have varying demands.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Desktop Virtualization and Graphics-Intensive Applications&lt;/strong&gt;: vGPU is widely used in virtual desktop infrastructure (VDI) environments to deliver high-performance graphics to remote users, as well as for running graphics-intensive applications like CAD/CAM, medical imaging, and content creation tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NVIDIA vGPU solutions, for example, allow administrators to precisely allocate GPU resources, ensuring that each virtual machine receives the necessary performance for its tasks [3].&lt;/p&gt;
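&lt;p&gt;Because vGPU carves a card's framebuffer into fixed-size profiles, the capacity arithmetic reduces to integer division. The sketch below shows how many instances an 80 GB card yields for a few example profile sizes; the sizes are illustrative, not a vendor's actual profile catalog.&lt;/p&gt;

```python
def vgpu_capacity(physical_memory_gb: int, profiles_gb: list) -> dict:
    """Count the uniform vGPU instances each profile yields from one card.

    Each physical GPU is split into equal framebuffer slices, so the
    instance count per profile is simple integer division.
    """
    return {p: physical_memory_gb // p for p in profiles_gb}

# An 80 GB card split into example profile sizes (in GB):
capacity = vgpu_capacity(80, [8, 10, 20, 40])
```

Smaller profiles maximize tenant density for light workloads like VDI, while larger profiles suit memory-hungry inference; the administrator picks the split per card.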

&lt;h3&gt;3. Managed Kubernetes&lt;/h3&gt;

&lt;p&gt;In this deployment model, the GPUaaS provider manages the underlying Kubernetes clusters, which are specifically configured to orchestrate GPU-accelerated workloads. Users deploy their containerized AI/ML applications onto these managed clusters without needing to manage the Kubernetes control plane or the underlying infrastructure. Key advantages include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Simplified Orchestration&lt;/strong&gt;: The complexities of setting up, configuring, and maintaining Kubernetes are handled by the provider, allowing users to focus solely on their applications.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Scalability and Automation&lt;/strong&gt;: Managed Kubernetes environments inherently offer dynamic scaling of GPU resources based on workload demands, automated deployment, and self-healing capabilities.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Containerization Benefits&lt;/strong&gt;: Leveraging containers (e.g., Docker) ensures application portability, consistency across different environments, and efficient resource packaging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model is ideal for organizations that want to leverage the power of Kubernetes for their AI workloads but prefer to offload the operational burden of cluster management [3].&lt;/p&gt;

&lt;h3&gt;4. Self-managed GPUaaS&lt;/h3&gt;

&lt;p&gt;This model offers a higher degree of control to the tenant, who manages their environment up to the Kubernetes control plane or even the operating system level, while the provider typically handles the physical hardware and basic infrastructure. This approach provides a balance between full control and managed services, suitable for users who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Require Customization&lt;/strong&gt;: Need to install specific software, drivers, or configurations that might not be supported in fully managed environments.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Have Specific Security Policies&lt;/strong&gt;: Can implement their own security measures and access controls at a deeper level within their managed environment.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Possess Internal Expertise&lt;/strong&gt;: Have the in-house expertise to manage and optimize their Kubernetes clusters or operating systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Self-managed GPUaaS still benefits from the cloud provider's robust infrastructure and automated provisioning, billing, and governance mechanisms, but grants the user more granular control over their compute environment [3].&lt;/p&gt;

&lt;h2&gt;Use Cases and Applications of GPU as a Service&lt;/h2&gt;

&lt;p&gt;GPU as a Service (GPUaaS) is a versatile technology that underpins a vast array of demanding applications across numerous industries. Its ability to provide scalable, on-demand access to high-performance computational power makes it indispensable for workloads that require intensive parallel processing. The following are some of the key sectors and applications where GPUaaS is making a significant impact:&lt;/p&gt;

&lt;h3&gt;1. Artificial Intelligence (AI) and Machine Learning (ML) Model Training&lt;/h3&gt;

&lt;p&gt;One of the most prominent applications of GPUaaS is in the training of large-scale AI and ML models, particularly deep learning networks. Models such as OpenAI&#8217;s GPT-4 represent the pinnacle of this demand, requiring immense computational power to train. Even lighter alternatives and custom models, however, can be trained or fine-tuned efficiently on cloud GPUs, making cost-effective experimentation and deployment possible for smaller teams and specialized projects [2].&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Deep Learning&lt;/strong&gt;: GPUs excel at the matrix multiplications and parallel computations central to deep neural networks. GPUaaS provides the necessary infrastructure for training complex models like Convolutional Neural Networks (CNNs) for image recognition, Recurrent Neural Networks (RNNs) for natural language processing, and Transformer models for advanced language understanding.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Accelerated Training&lt;/strong&gt;: Modern GPUs, such as NVIDIA H100, feature advanced Tensor Cores and transformer optimizations that can offer significant performance improvements (e.g., up to 9x compared to prior generations like A100) for AI training tasks. GPUaaS platforms make these cutting-edge GPUs readily available, allowing researchers and developers to iterate faster and achieve higher accuracy in their models [2].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Hyperparameter Tuning&lt;/strong&gt;: The process of finding the optimal set of hyperparameters for an ML model often involves running numerous training experiments. GPUaaS allows for parallel execution of these experiments, drastically reducing the time required for model optimization.&lt;/li&gt;
&lt;/ul&gt;
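&lt;p&gt;The parallel fan-out behind hyperparameter tuning can be sketched as follows. The scoring function is a toy stand-in for a real GPU training run, and threads stand in for concurrent job submissions to separate GPU instances; the grid values are invented for illustration.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def train_once(params):
    """Toy stand-in for one GPU training run: returns (params, score).

    The score peaks at lr=0.01, batch=64 so the search has a
    well-defined winner; a real run would launch a cloud GPU job.
    """
    lr, batch = params
    score = 1.0 / (abs(lr - 0.01) + abs(batch - 64) / 1000 + 1e-6)
    return params, score

grid = list(product([0.001, 0.01, 0.1], [32, 64, 128]))

# On a GPUaaS platform each call would run on its own GPU instance;
# the thread pool here just models the concurrent fan-out.
with ThreadPoolExecutor(max_workers=len(grid)) as pool:
    results = list(pool.map(train_once, grid))

best_params, _ = max(results, key=lambda r: r[1])
```

Running all nine configurations concurrently instead of sequentially is what collapses a multi-day sweep into roughly the duration of a single training run.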

&lt;h3&gt;2. Real-Time AI Inferencing&lt;/h3&gt;

&lt;p&gt;Beyond training, GPUaaS is crucial for deploying trained AI models into production for real-time inferencing. Inferencing involves using a trained model to make predictions or decisions on new, unseen data. For many applications, this process requires extremely low latency and high throughput to deliver immediate results.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Low-Latency Predictions&lt;/strong&gt;: Applications like customer support chatbots, fraud detection systems, and autonomous driving require AI models to process data and respond in milliseconds. GPUaaS facilitates the deployment of inferencing pipelines on-demand, ensuring that these applications can meet their performance requirements.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;High-Throughput Processing&lt;/strong&gt;: For scenarios involving a large volume of concurrent requests, such as real-time natural language processing (NLP) tasks or image analysis from multiple video streams, GPUaaS platforms can efficiently manage the workload. Tools like NVIDIA Triton Inference Server, often hosted on GPUaaS, are designed to optimize the deployment and execution of AI models in production environments [2].&lt;/li&gt;
&lt;/ul&gt;
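&lt;p&gt;The dynamic-batching idea behind such inference servers can be sketched in a few lines: hold a request briefly so neighbours arriving within a small window share one GPU pass, flushing when the batch is full or the oldest request has waited too long. The timestamps and limits below are invented for illustration.&lt;/p&gt;

```python
def batch_requests(arrivals: list, max_batch: int,
                   max_wait_ms: float) -> list:
    """Greedy dynamic batching over sorted arrival timestamps (ms).

    Flush the open batch when it reaches `max_batch` requests or when
    a new arrival would make the oldest request exceed `max_wait_ms`.
    """
    batches, current = [], []
    for t in arrivals:
        if current and (len(current) == max_batch
                        or t - current[0] > max_wait_ms):
            batches.append(current)   # one shared GPU pass
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches

# Eight requests in a 20 ms burst, batched up to 4 with a 5 ms budget:
batches = batch_requests([0, 1, 2, 3, 9, 10, 11, 18], 4, 5)
```

Batching trades a few milliseconds of added latency per request for a large throughput gain, since one GPU pass over a batch costs little more than a pass over a single input.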

&lt;h3&gt;3. High-Performance Computing (HPC)&lt;/h3&gt;

&lt;p&gt;HPC encompasses a broad range of scientific, engineering, and business applications that require massive computational power to solve complex problems. GPUaaS provides the necessary infrastructure for these demanding workloads, which often involve large-scale simulations and data analysis.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Scientific Simulations&lt;/strong&gt;: Fields such as molecular dynamics (e.g., protein folding simulations), climate modeling, astrophysics, and computational fluid dynamics (CFD) rely heavily on HPC. GPUs accelerate these simulations by performing parallel calculations much faster than CPUs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Data Analytics&lt;/strong&gt;: Processing and analyzing massive datasets in fields like genomics, financial modeling, and seismic exploration benefit from the parallel processing capabilities of GPUs. GPUaaS enables researchers to run complex algorithms and extract insights more rapidly.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Accelerators&lt;/strong&gt;: Advanced GPU accelerators available via GPUaaS, such as AMD’s Instinct MI200 series, enable multi-node HPC workloads by leveraging high-bandwidth interconnects like Infinity Fabric, and offer peak memory bandwidth of up to 3.2 TB/s [2].&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. 3D Rendering and Animation
&lt;/h3&gt;

&lt;p&gt;The media and entertainment industry, particularly in areas like film production, video game development, and architectural visualization, heavily relies on GPUs for rendering complex 3D graphics and animations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Accelerated Rendering&lt;/strong&gt;: Rendering a single frame of a high-fidelity 3D animated film can take hours or even days on traditional CPU-based systems. With GPUaaS, studios can scale their rendering pipelines across hundreds or thousands of GPUs, significantly cutting down production times and enabling faster iteration on creative designs [2].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Distributed Rendering&lt;/strong&gt;: Services like AWS Thinkbox, often integrated with GPUaaS, enable distributed rendering, allowing large rendering jobs to be broken down and processed concurrently across multiple GPU instances, leading to massive time savings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Autonomous Systems and Robotics
&lt;/h3&gt;

&lt;p&gt;Autonomous vehicles, drones, and robotics generate and process vast amounts of data from various sensors (cameras, LiDAR, radar) in real-time. GPUaaS plays a critical role in both the development and operational phases of these systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Sensor Data Processing&lt;/strong&gt;: GPUs are essential for processing and interpreting the continuous streams of sensor data, enabling tasks like object detection, scene understanding, and path planning.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Simulation and Testing&lt;/strong&gt;: Before deployment in the real world, autonomous systems undergo extensive simulation and testing in virtual environments. GPUaaS allows developers to run these simulations at scale, using platforms like NVIDIA Drive AGX, which are optimized for autonomous workloads, to validate algorithms and ensure safety [2].&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Healthcare and Life Sciences
&lt;/h3&gt;

&lt;p&gt;GPUaaS is transforming healthcare and life sciences by accelerating research, diagnosis, and treatment development.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Medical Imaging&lt;/strong&gt;: Accelerating the processing and analysis of high-resolution medical images (e.g., MRI, CT scans) for faster and more accurate diagnoses, including AI-powered anomaly detection.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Genomics and Drug Discovery&lt;/strong&gt;: Speeding up genomic sequencing, protein folding simulations, and drug discovery processes, leading to breakthroughs in personalized medicine and new therapies [3].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Clinical AI Models&lt;/strong&gt;: Deployment of AI models for predictive analytics in clinical settings, such as predicting patient outcomes or identifying at-risk individuals [3].&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Manufacturing and Industrial Automation
&lt;/h3&gt;

&lt;p&gt;In manufacturing, GPUaaS supports the digital transformation of production processes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Digital Twin Simulations&lt;/strong&gt;: Creating and simulating digital twins of factories, products, and processes to optimize operations, predict failures, and test new designs virtually [3].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Quality Control&lt;/strong&gt;: Implementing computer vision systems for automated quality inspection, identifying defects with high precision and speed.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Process Optimization&lt;/strong&gt;: Using AI models trained on GPUaaS to optimize complex manufacturing processes, reduce waste, and improve efficiency [3].&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  8. Government and Public Sector
&lt;/h3&gt;

&lt;p&gt;Government entities are leveraging GPUaaS for various public services and national security applications.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;E-governance Platforms&lt;/strong&gt;: Accelerating data processing and AI capabilities for more efficient and responsive e-governance services [3].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cybersecurity&lt;/strong&gt;: Enhancing AI-driven threat detection and anomaly identification in vast network traffic to bolster national cybersecurity [3].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Advanced Analytics&lt;/strong&gt;: Powering advanced analytics for public safety, urban planning, and resource management [3].&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  9. Banking and Financial Sector
&lt;/h3&gt;

&lt;p&gt;Financial institutions use GPUaaS for high-speed data processing and complex algorithmic tasks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Fraud Detection&lt;/strong&gt;: Deploying real-time anomaly detection models to identify and prevent fraudulent transactions [3].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Algorithmic Trading&lt;/strong&gt;: Executing complex trading strategies that require rapid analysis of market data and high-frequency decision-making.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Risk Management&lt;/strong&gt;: Running sophisticated simulations for risk assessment and portfolio optimization [3].&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  10. Telecommunications
&lt;/h3&gt;

&lt;p&gt;In the telecom industry, GPUaaS supports network optimization and service innovation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Network Optimization&lt;/strong&gt;: Analyzing vast amounts of network data to predict congestion, optimize traffic flow, and enhance service quality [3].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Digital Twin Simulations&lt;/strong&gt;: Creating digital twins of network infrastructure to simulate changes and predict their impact before physical implementation [3].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AI-driven Operations&lt;/strong&gt;: Implementing AI for predictive maintenance of network equipment and automating operational tasks [3].&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These diverse applications underscore the transformative power of GPUaaS, making high-performance computing accessible and scalable for virtually any organization seeking to harness the power of AI and advanced analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPU as a Service Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffdbdcbffrzdh2d9iro7j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffdbdcbffrzdh2d9iro7j.png" alt="GPU as a Service Architecture" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 1: Secure, Deliver and Optimize Gen AI App - an Enterprise View [Source: F5]&lt;/p&gt;

&lt;p&gt;This architectural diagram provides a high-level enterprise view of how GPU as a Service can be strategically integrated to secure, deliver, and optimize Generative AI applications. It illustrates a comprehensive ecosystem that spans from edge computing environments to core data centers, highlighting key components such as secure gateways, the AI Factory (which represents the GPUaaS layer), and various data management and processing units. The diagram emphasizes the end-to-end workflow for deploying and managing AI applications, ensuring both performance and security across distributed infrastructures. It showcases how GPUaaS acts as a foundational element within a broader AI deployment strategy, enabling scalable and efficient execution of AI workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Leading Providers in the GPU as a Service Market
&lt;/h2&gt;

&lt;p&gt;The GPUaaS market is characterized by a diverse ecosystem of providers, ranging from hyperscale cloud giants to specialized platforms, each offering unique strengths and catering to different customer segments. Understanding the competitive landscape is crucial for organizations to make informed decisions about which provider best suits their specific needs [1].&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Hyperscale Cloud Providers
&lt;/h3&gt;

&lt;p&gt;Major cloud service providers (CSPs) like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) dominate the GPUaaS market in terms of sheer infrastructure scale and global reach. They offer a vast array of GPU instances, integrated services, and extensive ecosystems. Their strengths include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Massive Infrastructure&lt;/strong&gt;: Unparalleled scale and global distribution of data centers, ensuring high availability and low latency for users worldwide.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Extensive Integrations&lt;/strong&gt;: Deep integration with a wide range of other cloud services (e.g., storage, databases, networking, AI/ML platforms), enabling end-to-end cloud solutions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Robust Enterprise Focus&lt;/strong&gt;: Designed to meet the stringent security, compliance, and support requirements of large enterprises.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Diverse GPU Offerings&lt;/strong&gt;: Access to a broad portfolio of GPU types, often including the latest NVIDIA and AMD accelerators.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, their pricing models can sometimes be geared towards enterprises with high minimum spend commitments, which might be less flexible for smaller teams or startups [1].&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Specialized GPUaaS Platforms
&lt;/h3&gt;

&lt;p&gt;Alongside the hyperscalers, a growing number of specialized GPUaaS providers focus specifically on delivering optimized GPU resources for AI/ML and HPC workloads. These platforms often differentiate themselves through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Competitive Pricing&lt;/strong&gt;: More flexible, usage-based pricing models without long-term contracts, making them attractive for startups and projects with fluctuating demands [1].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Faster Access to New Architectures&lt;/strong&gt;: Often quicker to adopt and offer the very latest GPU hardware generations, providing a cutting-edge advantage.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Optimized Environments&lt;/strong&gt;: Pre-configured environments and tools specifically tailored for AI workflows, reducing setup time and complexity.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Specialized Support&lt;/strong&gt;: Dedicated support teams with deep expertise in GPU-accelerated computing and AI/ML applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples of such providers include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Northflank&lt;/strong&gt;: Offers a comprehensive deployment platform that integrates GPUs with CI/CD pipelines, monitoring, and deployment tools, providing a complete development workflow [1].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;NVIDIA DGX Cloud&lt;/strong&gt;: Provides access to NVIDIA’s powerful DGX systems in the cloud, optimized for high-performance AI training with the latest hardware and NVIDIA’s full software stack [1].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Neysa.ai&lt;/strong&gt;: Focuses on providing GPUaaS for AI/ML, graphics, and scientific research activities, emphasizing cost-efficiency and scalability [2].&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Edarat Group&lt;/strong&gt;: Offers enterprise-grade GPU infrastructure hosted in sovereign data centers, purpose-built for AI/ML workloads and simulations, with flexible deployment models [3].&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Comparative Overview of Selected Providers
&lt;/h3&gt;

&lt;p&gt;To illustrate the diverse offerings, here's a comparative overview of some leading GPUaaS providers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Strengths&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Northflank&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GPU + full deployment platform, integrated CI/CD, competitive pricing&lt;/td&gt;
&lt;td&gt;Teams seeking a complete, simplified development workflow with cost-effective, usage-based pricing for GPU-intensive workloads [1]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Largest infrastructure, extensive integrations, broad service portfolio&lt;/td&gt;
&lt;td&gt;Enterprise applications, existing AWS users, projects requiring deep integration with other cloud services [1]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Microsoft Azure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Robust enterprise focus, strong integration with Microsoft ecosystem&lt;/td&gt;
&lt;td&gt;Corporate environments, hybrid cloud strategies, users leveraging Office 365 and other Microsoft services [1]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Advanced AI/ML services, competitive pricing, strong data analytics tools&lt;/td&gt;
&lt;td&gt;Data analytics, research projects, users leveraging Google's AI platform and open-source contributions [1]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NVIDIA DGX Cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Latest hardware, optimized performance, NVIDIA software stack&lt;/td&gt;
&lt;td&gt;High-performance AI training, large-scale deep learning, users requiring NVIDIA's full ecosystem and support [1]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Neysa.ai&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cost-efficient, scalable, focus on AI/ML, graphics, scientific research&lt;/td&gt;
&lt;td&gt;Businesses with fluctuating or short-term workloads, startups, and academic research [2]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edarat Group&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sovereign data centers, flexible deployment models, enterprise-grade&lt;/td&gt;
&lt;td&gt;Enterprises with specific compliance needs, those requiring bare metal or managed Kubernetes GPUaaS in specific regions [3]&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table highlights that while hyperscalers provide broad capabilities, specialized providers often offer tailored solutions that can be more advantageous for specific use cases, particularly for AI development teams seeking an optimal balance of performance, efficiency, and cost [1]. The choice of provider ultimately depends on an organization's specific technical requirements, budget constraints, existing infrastructure, and strategic objectives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion for GPU as a Service
&lt;/h2&gt;

&lt;p&gt;GPU as a Service (GPUaaS) has emerged as a critical enabler for the rapid advancement and widespread adoption of Artificial Intelligence, Machine Learning, and High-Performance Computing. By offering on-demand, scalable, and cost-effective access to powerful GPU resources, GPUaaS effectively removes the traditional barriers of high capital investment, complex infrastructure management, and rapid hardware obsolescence. This cloud-based model empowers organizations of all sizes to leverage cutting-edge GPU technology, accelerating their AI initiatives, fostering innovation, and driving efficiency across a multitude of demanding workloads.&lt;/p&gt;

&lt;p&gt;The diverse deployment models, ranging from bare metal to managed Kubernetes, provide flexibility to cater to specific performance, control, and operational requirements. Furthermore, the extensive range of applications, from AI model training and real-time inferencing to scientific simulations and autonomous systems, underscores the transformative impact of GPUaaS across virtually every industry. As the demand for accelerated computing continues to grow, GPUaaS will undoubtedly remain a cornerstone technology, enabling businesses and researchers to push the boundaries of what is possible in the age of AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  References for GPU as a Service
&lt;/h2&gt;

&lt;p&gt;[1] Northflank. (2025, September 10). &lt;em&gt;What is GPU-as-a-Service (GPUaaS)? Use cases and leading providers&lt;/em&gt;. Retrieved from &lt;a href="https://northflank.com/blog/gpu-as-a-service" rel="noopener noreferrer"&gt;https://northflank.com/blog/gpu-as-a-service&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] Neysa.ai. (2025, September 3). &lt;em&gt;GPU as a Service (GPUaaS): What is, Benefits &amp;amp; Top Providers [2025]&lt;/em&gt;. Retrieved from &lt;a href="https://neysa.ai/blog/gpu-as-a-service/" rel="noopener noreferrer"&gt;https://neysa.ai/blog/gpu-as-a-service/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] Edarat Group. (n.d.). &lt;em&gt;GPU As A Service&lt;/em&gt;. Retrieved from &lt;a href="https://edaratgroup.com/gpu-as-a-service/" rel="noopener noreferrer"&gt;https://edaratgroup.com/gpu-as-a-service/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  GPUaaS Platform Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3ihwgz54pv6i5ktiquz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3ihwgz54pv6i5ktiquz.png" alt="GPUaaS Platform Overview" width="800" height="634"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 2: Platform Overview - NVIDIA AI Enterprise: VMware Deployment [Source: NVIDIA Documentation]&lt;/p&gt;

&lt;p&gt;This diagram illustrates a platform overview for GPUaaS, specifically in the context of NVIDIA AI Enterprise and VMware deployment. It highlights the various layers involved, from the underlying hardware (GPUs, networking, virtualization drivers) to the infrastructure software (Kubernetes Operators, Cluster Management) and application software (NVIDIA NIM, NeMo, Community Models, Partner Models, AI Application SDKs, frameworks, and libraries). This provides a detailed view of the software stack and components that enable GPUaaS within an enterprise environment, emphasizing the integration of NVIDIA technologies.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenShift Integration with GPU as a Service
&lt;/h2&gt;

&lt;p&gt;Red Hat OpenShift, as a leading enterprise Kubernetes platform, plays a pivotal role in enabling and optimizing GPU as a Service (GPUaaS) deployments. OpenShift provides a robust, scalable, and secure foundation for managing GPU resources, orchestrating GPU-accelerated workloads, and delivering GPUaaS capabilities to developers and data scientists. The integration of OpenShift with NVIDIA technologies, particularly through the NVIDIA GPU Operator, streamlines the consumption and management of GPUs, transforming raw hardware into a flexible, on-demand service.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;NVIDIA GPU Operator on OpenShift&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The cornerstone of GPUaaS on OpenShift is the NVIDIA GPU Operator. This operator automates the deployment and management of all necessary NVIDIA software components, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;NVIDIA Drivers&lt;/strong&gt;: Ensures the correct drivers are installed and maintained for the GPUs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Container Runtimes&lt;/strong&gt;: Configures the container runtime (CRI-O on OpenShift) via the &lt;code&gt;nvidia-container-toolkit&lt;/code&gt; to enable GPU access within containers.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Kubernetes Device Plugins&lt;/strong&gt;: Exposes GPU resources to Kubernetes, allowing pods to request GPUs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Monitoring Tools&lt;/strong&gt;: Integrates with monitoring solutions to track GPU utilization and performance.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;MIG (Multi-Instance GPU) Support&lt;/strong&gt;: For NVIDIA Ampere and later architectures, the operator can configure MIG, allowing a single physical GPU to be partitioned into multiple, isolated GPU instances, each with its own dedicated resources. This significantly enhances GPU utilization and enables fine-grained resource allocation for diverse workloads.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;GPU Feature Discovery&lt;/strong&gt;: The Node Feature Discovery (NFD) Operator is deployed alongside the GPU Operator to label nodes with hardware capabilities, and GPU Feature Discovery (GFD) adds GPU-specific labels, enabling intelligent scheduling of GPU-accelerated workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By automating these complex tasks, the NVIDIA GPU Operator simplifies the process of making GPUs available as a service within an OpenShift cluster, reducing operational overhead and accelerating time to value for GPU-dependent applications.&lt;/p&gt;
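&lt;p&gt;As a sketch, installing the operator culminates in creating a &lt;code&gt;ClusterPolicy&lt;/code&gt; custom resource, which drives the rollout of the components above. The field values below are illustrative defaults, not a prescriptive configuration:&lt;/p&gt;

```yaml
# Illustrative ClusterPolicy sketch; real deployments usually start
# from the defaults generated by the operator's console wizard.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  driver:
    enabled: true        # deploy NVIDIA drivers as containers
  toolkit:
    enabled: true        # configure the container runtime for GPU access
  devicePlugin:
    enabled: true        # advertise nvidia.com/gpu to the scheduler
  dcgmExporter:
    enabled: true        # export GPU metrics to Prometheus
  mig:
    strategy: single     # how MIG instances are advertised (single/mixed)
```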

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Resource Management and Orchestration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;OpenShift's Kubernetes-native architecture provides powerful mechanisms for managing and orchestrating GPU resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Declarative Resource Management&lt;/strong&gt;: Developers can declaratively request GPU resources for their applications using Kubernetes manifests (e.g., &lt;code&gt;resources.limits.nvidia.com/gpu: 1&lt;/code&gt;). OpenShift's scheduler then intelligently places these workloads on nodes with available GPUs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Workload Isolation&lt;/strong&gt;: OpenShift's containerization and namespace capabilities ensure strong isolation between GPU-accelerated workloads, preventing resource contention and enhancing security.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Scaling and High Availability&lt;/strong&gt;: OpenShift can automatically scale GPU-accelerated applications based on demand, ensuring that GPU resources are efficiently utilized. It also provides high availability features, such as self-healing and automated restarts, for GPU-dependent services.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Multi-tenancy&lt;/strong&gt;: OpenShift enables secure multi-tenancy, allowing multiple teams or users to share a single GPU-enabled cluster while maintaining isolation and resource quotas.&lt;/li&gt;
&lt;/ul&gt;
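&lt;p&gt;A minimal pod manifest requesting one GPU might look like the following; the CUDA image tag is an illustrative assumption:&lt;/p&gt;

```yaml
# Minimal sketch: a pod that requests one GPU and prints device info.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubi9  # illustrative tag
    command: ["nvidia-smi"]                      # list visible GPUs
    resources:
      limits:
        nvidia.com/gpu: 1   # scheduler places this pod on a GPU node
```

&lt;p&gt;Because &lt;code&gt;nvidia.com/gpu&lt;/code&gt; is an extended resource, the scheduler only considers nodes where the device plugin has advertised free GPUs.&lt;/p&gt;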

&lt;h3&gt;
  
  
  3. &lt;strong&gt;OpenShift AI and Data Science Workflows&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;OpenShift AI (formerly Red Hat OpenShift Data Science, built on the upstream Open Data Hub project) further enhances GPUaaS by providing a comprehensive platform for AI/ML development and deployment. It integrates various tools and services that leverage GPU resources managed by OpenShift:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Jupyter Notebooks&lt;/strong&gt;: Data scientists can access GPU-accelerated Jupyter environments directly within OpenShift AI, enabling interactive development and experimentation with large datasets and complex models.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Model Training and Inferencing&lt;/strong&gt;: OpenShift AI facilitates the training of ML models on GPUs and their deployment for inferencing, providing a streamlined MLOps pipeline.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Integration with NVIDIA AI Enterprise&lt;/strong&gt;: OpenShift AI is designed to work seamlessly with NVIDIA AI Enterprise, a software suite that includes optimized AI frameworks, libraries, and tools. This integration ensures that GPU-accelerated workloads running on OpenShift benefit from NVIDIA's performance optimizations and enterprise-grade support.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Bare Metal and Virtualized Deployments&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;OpenShift supports various deployment models for GPUaaS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Bare Metal&lt;/strong&gt;: For maximum performance and direct access to GPU hardware, OpenShift can be deployed on bare metal servers with NVIDIA GPUs. The GPU Operator handles the necessary configurations for these environments.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Virtualized Environments&lt;/strong&gt;: OpenShift Virtualization allows for the creation of virtual machines (VMs) within the OpenShift cluster. The NVIDIA GPU Operator can be configured to deploy GPU components within these VMs, enabling GPUaaS in virtualized settings. This is particularly useful for scenarios requiring strong isolation or specific operating system requirements for GPU workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Monitoring and Optimization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;OpenShift, in conjunction with the NVIDIA GPU Operator, provides robust monitoring capabilities for GPU resources. This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;GPU Metrics&lt;/strong&gt;: Access to detailed GPU metrics (e.g., utilization, memory usage, temperature) through OpenShift's monitoring stack (Prometheus and Grafana).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Time-Slicing&lt;/strong&gt;: For GPUs that do not support MIG, OpenShift can leverage time-slicing to allow multiple containers to share a single GPU, improving utilization for smaller workloads.&lt;/li&gt;
&lt;/ul&gt;
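&lt;p&gt;As a hedged sketch, time-slicing is typically enabled through a device-plugin ConfigMap that the ClusterPolicy references; the namespace, config name, and replica count below are example values only:&lt;/p&gt;

```yaml
# Illustrative time-slicing config: each physical GPU is advertised
# as 4 schedulable nvidia.com/gpu replicas (no memory isolation).
apiVersion: v1
kind: ConfigMap
metadata:
  name: device-plugin-config      # referenced from the ClusterPolicy
  namespace: nvidia-gpu-operator  # assumed operator namespace
data:
  time-sliced: |
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
```

&lt;p&gt;Unlike MIG, time-sliced replicas share the full GPU memory, so this pattern suits many small, bursty workloads rather than memory-hungry ones.&lt;/p&gt;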

&lt;p&gt;By providing a unified platform for managing, orchestrating, and monitoring GPU resources, OpenShift significantly simplifies the delivery of GPU as a Service, empowering organizations to accelerate their AI, ML, and HPC initiatives with greater efficiency and agility.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPU as a Service Architecture Diagram
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F633ntybvsn5jrxw95r75.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F633ntybvsn5jrxw95r75.webp" alt="GPU as a Service Architecture" width="800" height="696"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  GPUaaS Platform Overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyx4tobdxdfi05nji9w6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyx4tobdxdfi05nji9w6.png" alt="GPUaaS Platform Overview" width="800" height="595"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenShift GPU Integration Architecture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F656q7iku4puv47gw5imb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F656q7iku4puv47gw5imb.png" alt="OpenShift GPU Integration Architecture" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  NVIDIA GPU Operator Architecture on OpenShift
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64givp6zmnamjgoaseic.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64givp6zmnamjgoaseic.png" alt="NVIDIA GPU Operator Architecture" width="502" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>openshift</category>
      <category>ai</category>
      <category>gpuaas</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>[Part01] Getting Started with Red Hat OpenShift with NVIDIA</title>
      <dc:creator>MohamedELGAMASY</dc:creator>
      <pubDate>Sat, 05 Jul 2025 09:07:51 +0000</pubDate>
      <link>https://dev.to/gamasy/getting-started-with-red-hat-openshift-with-nvidia-k23</link>
      <guid>https://dev.to/gamasy/getting-started-with-red-hat-openshift-with-nvidia-k23</guid>
      <description>&lt;h1&gt;
  
  
  Getting Started with Red Hat OpenShift with NVIDIA
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;
Remote Direct Memory Access (RDMA)

&lt;ul&gt;
&lt;li&gt;Introduction to RDMA&lt;/li&gt;
&lt;li&gt;RDMA Protocols and Network Technologies&lt;/li&gt;
&lt;li&gt;Verifying RDMA Capability in OpenShift&lt;/li&gt;
&lt;li&gt;RDMA Configuration Options in OpenShift&lt;/li&gt;
&lt;li&gt;RDMA Network Configuration&lt;/li&gt;
&lt;li&gt;Testing and Verification&lt;/li&gt;
&lt;li&gt;Performance Optimization&lt;/li&gt;
&lt;li&gt;Common Issues and Troubleshooting&lt;/li&gt;
&lt;li&gt;Integration with Storage and GPU Workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
NVIDIA GPU Architecture

&lt;ul&gt;
&lt;li&gt;Introduction to GPU Concurrency and Sharing Mechanisms&lt;/li&gt;
&lt;li&gt;GPU Sharing Technologies&lt;/li&gt;
&lt;li&gt;Deployment Considerations for Different OpenShift Scenarios&lt;/li&gt;
&lt;li&gt;Implementation Guidelines&lt;/li&gt;
&lt;li&gt;Performance Optimization&lt;/li&gt;
&lt;li&gt;Integration with RDMA for High-Performance Computing&lt;/li&gt;
&lt;li&gt;Troubleshooting&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This guide is written for architects, consultants, and practitioners implementing Red Hat OpenShift with NVIDIA networking hardware and GPU technologies. It offers methodologies, best practices, and configuration examples to help organizations effectively leverage NVIDIA technologies in OpenShift environments for high-performance computing, AI/ML workloads, and other latency-sensitive applications.&lt;/p&gt;

&lt;p&gt;NVIDIA technologies, when integrated with Red Hat OpenShift, provide high-bandwidth, low-latency connectivity and powerful GPU acceleration essential for modern data-intensive workloads. This guide covers both RDMA (Remote Direct Memory Access) networking and NVIDIA GPU architecture to provide a complete reference for implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Remote Direct Memory Access (RDMA)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Introduction to RDMA
&lt;/h3&gt;

&lt;p&gt;Remote Direct Memory Access (RDMA) is a technology that enables direct memory access from the memory of one computer to the memory of another without involving either computer's operating system, CPU, or cache. RDMA provides high-throughput, low-latency networking by bypassing traditional networking stacks and reducing CPU overhead, making it ideal for data-intensive workloads in OpenShift environments.&lt;/p&gt;

&lt;p&gt;Key benefits of RDMA include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reduced Latency&lt;/strong&gt;: By bypassing the OS kernel and CPU, RDMA significantly reduces communication latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher Bandwidth&lt;/strong&gt;: Enables near line-rate data transfer speeds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower CPU Utilization&lt;/strong&gt;: Offloads data transfer operations from the CPU to the network adapter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-Copy Networking&lt;/strong&gt;: Data is transferred directly between application memory spaces without intermediate copies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kernel Bypass&lt;/strong&gt;: Communication bypasses the operating system kernel, reducing context switches and interrupts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RDMA is particularly valuable for OpenShift deployments running high-performance computing (HPC) workloads, AI/ML training and inference, database applications, and storage systems that require high-bandwidth, low-latency communication.&lt;/p&gt;

&lt;h3&gt;
  
  
  RDMA Protocols and Network Technologies
&lt;/h3&gt;

&lt;p&gt;RDMA can be implemented using several protocols and network technologies:&lt;/p&gt;

&lt;h4&gt;
  
  
  InfiniBand
&lt;/h4&gt;

&lt;p&gt;InfiniBand is a specialized high-performance network technology designed specifically for high-throughput, low-latency communications. It provides native support for RDMA and is commonly used in HPC environments.&lt;/p&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Purpose-built for high reliability, high bandwidth, and low latency&lt;/li&gt;
&lt;li&gt;Uses a cut-through approach with 16-bit LID for faster forwarding&lt;/li&gt;
&lt;li&gt;Provides end-to-end flow control for lossless networking&lt;/li&gt;
&lt;li&gt;Built-in software-defined networking with subnet manager&lt;/li&gt;
&lt;li&gt;Requires specialized hardware and infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: OpenShift installation requires Ethernet connectivity for the cluster API traffic. InfiniBand can only be used as a secondary network for application traffic after the cluster is installed.&lt;/p&gt;

&lt;h4&gt;
  
  
  RDMA over Converged Ethernet (RoCE)
&lt;/h4&gt;

&lt;p&gt;RoCE enables RDMA functionality over standard Ethernet networks, making it more accessible and cost-effective than InfiniBand while still providing many of the performance benefits.&lt;/p&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RoCEv1: Layer 2 protocol that works within a single broadcast domain&lt;/li&gt;
&lt;li&gt;RoCEv2: Routable protocol that runs on top of UDP/IP (IPv4 or IPv6)&lt;/li&gt;
&lt;li&gt;Widely supported by modern network adapters&lt;/li&gt;
&lt;li&gt;Currently the most popular protocol for implementing RDMA&lt;/li&gt;
&lt;li&gt;Compatible with existing Ethernet infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Internet Wide Area RDMA Protocol (iWARP)
&lt;/h4&gt;

&lt;p&gt;iWARP layers RDMA on top of TCP/IP, providing RDMA semantics over standard, routable TCP connections.&lt;/p&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leverages TCP or SCTP for reliable transport&lt;/li&gt;
&lt;li&gt;Works over standard TCP/IP networks without specialized hardware&lt;/li&gt;
&lt;li&gt;Generally has higher latency than RoCE or InfiniBand&lt;/li&gt;
&lt;li&gt;More tolerant of packet loss and congestion&lt;/li&gt;
&lt;li&gt;Easier to deploy in existing TCP/IP networks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Verifying RDMA Capability in OpenShift
&lt;/h3&gt;

&lt;p&gt;Before implementing RDMA in your OpenShift environment, you need to verify that your nodes have RDMA-capable hardware and that it's properly recognized by the system.&lt;/p&gt;

&lt;h4&gt;
  
  
  Using Node Feature Discovery (NFD)
&lt;/h4&gt;

&lt;p&gt;Node Feature Discovery automatically detects and labels nodes with RDMA capabilities. To verify RDMA capability:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ensure NFD is installed and running in your cluster&lt;/li&gt;
&lt;li&gt;Check for RDMA-related labels on your nodes:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;oc get node &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.items[0].metadata.labels | with_entries(select(.key | startswith("feature.node.kubernetes.io")))'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look for the following labels which indicate RDMA capability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;feature.node.kubernetes.io/rdma.available: "true"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;feature.node.kubernetes.io/rdma.capable: "true"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;feature.node.kubernetes.io/pci-15b3.present: "true"&lt;/code&gt; (for Mellanox/NVIDIA NICs)&lt;/li&gt;
&lt;/ul&gt;
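&lt;p&gt;As a quick sanity check, the label lookup can be reduced to a single jq filter. The inline JSON below is a stand-in for real &lt;code&gt;oc get node NODE -o json&lt;/code&gt; output, so the node object shown is illustrative:&lt;/p&gt;

```shell
# Extract the NFD RDMA label from a node object. The inline JSON stands in
# for real `oc get node NODE -o json` output; pipe the real output instead.
echo '{"metadata":{"labels":{"feature.node.kubernetes.io/rdma.available":"true"}}}' \
  | jq -r '.metadata.labels["feature.node.kubernetes.io/rdma.available"] // "not labeled"'
```

&lt;p&gt;A result of &lt;code&gt;true&lt;/code&gt; confirms NFD has labeled the node as RDMA-capable; &lt;code&gt;not labeled&lt;/code&gt; means either NFD is not running or the hardware was not detected.&lt;/p&gt;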

&lt;h4&gt;
  
  
  Manual Verification
&lt;/h4&gt;

&lt;p&gt;You can also manually verify RDMA capability on your nodes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check for RDMA devices:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;rdma &lt;span class="nb"&gt;link&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;For Mellanox/NVIDIA NICs, verify the presence of InfiniBand devices:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lspci &lt;span class="nt"&gt;-nn&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; infiniband
ibstat | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"Link layer"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Check the RDMA subsystem mode:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;rdma system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  RDMA Configuration Options in OpenShift
&lt;/h3&gt;

&lt;p&gt;OpenShift with NVIDIA networking supports three primary RDMA configuration methods, each with different characteristics and use cases.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. RDMA Shared Device
&lt;/h4&gt;

&lt;p&gt;The RDMA Shared Device configuration allows multiple pods on a worker node to share the same RDMA device. This method is suitable for development environments or applications where maximum performance is not critical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;doca-driver&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/mellanox&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25.04-0.6.1.0-2&lt;/span&gt;
    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;rdmaSharedDevicePlugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"configList": [&lt;/span&gt;
          &lt;span class="s"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"resourceName": "rdma_shared_device_ib",&lt;/span&gt;
            &lt;span class="s"&gt;"rdmaHcaMax": 63,&lt;/span&gt;
            &lt;span class="s"&gt;"selectors": {&lt;/span&gt;
              &lt;span class="s"&gt;"ifNames": ["ibs2f0"]&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
          &lt;span class="s"&gt;},&lt;/span&gt;
          &lt;span class="s"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"resourceName": "rdma_shared_device_eth",&lt;/span&gt;
            &lt;span class="s"&gt;"rdmaHcaMax": 63,&lt;/span&gt;
            &lt;span class="s"&gt;"selectors": {&lt;/span&gt;
              &lt;span class="s"&gt;"ifNames": ["ens8f0np0"]&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
          &lt;span class="s"&gt;}&lt;/span&gt;
        &lt;span class="s"&gt;]&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k8s-rdma-shared-dev-plugin&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/cloud-native&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1.5.3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Parameters&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rdmaHcaMax&lt;/code&gt;: Maximum number of pods that can share the device&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;selectors.ifNames&lt;/code&gt;: Network interface names to be used for RDMA&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Development and testing environments&lt;/li&gt;
&lt;li&gt;Applications where multiple pods need RDMA functionality but not maximum performance&lt;/li&gt;
&lt;li&gt;Environments with limited hardware resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All pods sharing the device compete for bandwidth and resources&lt;/li&gt;
&lt;li&gt;No isolation between pods using the same device&lt;/li&gt;
&lt;li&gt;Performance may degrade as more pods use the device&lt;/li&gt;
&lt;/ul&gt;
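&lt;p&gt;A pod consumes the shared device by requesting the advertised extended resource. The manifest below is a minimal sketch: the &lt;code&gt;rdma/&lt;/code&gt; resource prefix is the plugin's default, and the pod and image names are placeholders:&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rdma-shared-consumer              # illustrative name
spec:
  containers:
  - name: app
    image: registry.example.com/rdma-app:latest   # placeholder image
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]                 # needed for RDMA memory registration
    resources:
      limits:
        rdma/rdma_shared_device_eth: 1    # resource name from the NicClusterPolicy above
```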

&lt;h4&gt;
  
  
  2. RDMA SR-IOV Legacy Device
&lt;/h4&gt;

&lt;p&gt;The SR-IOV (Single Root I/O Virtualization) configuration segments a network device at the hardware layer, creating multiple virtual functions (VFs) that can be assigned to different pods. This provides better isolation and performance compared to the shared device method.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# NicClusterPolicy for OFED driver&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;doca-driver&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/mellanox&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25.04-0.6.1.0-2&lt;/span&gt;
    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;

&lt;span class="c1"&gt;# SR-IOV Network Node Policy&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriovnetwork.openshift.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SriovNetworkNodePolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriov-legacy-policy&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openshift-sriov-network-operator&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;deviceType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;netdevice&lt;/span&gt;
  &lt;span class="na"&gt;mtu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1500&lt;/span&gt;
  &lt;span class="na"&gt;nicSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;vendor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15b3"&lt;/span&gt;
    &lt;span class="na"&gt;pfNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ens8f0np0#0-7"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;feature.node.kubernetes.io/pci-15b3.present&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;numVfs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
  &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;90&lt;/span&gt;
  &lt;span class="na"&gt;isRdma&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;resourceName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriovlegacy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Parameters&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;numVfs&lt;/code&gt;: Number of virtual functions to create&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pfNames&lt;/code&gt;: Physical function names to use for SR-IOV&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;isRdma&lt;/code&gt;: Enable RDMA capability for the VFs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production environments requiring high performance&lt;/li&gt;
&lt;li&gt;Workloads sensitive to latency and bandwidth&lt;/li&gt;
&lt;li&gt;Applications requiring isolation between network resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limited by the maximum number of VFs supported by the hardware&lt;/li&gt;
&lt;li&gt;Requires SR-IOV capable network adapters&lt;/li&gt;
&lt;li&gt;May require system reboot when changing configuration&lt;/li&gt;
&lt;/ul&gt;
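&lt;p&gt;After the node policy creates the VFs, a &lt;code&gt;SriovNetwork&lt;/code&gt; exposes them to pods as a network attachment. The sketch below assumes the &lt;code&gt;sriovlegacy&lt;/code&gt; resource from the policy above; the network name, target namespace, and IPAM range are illustrative:&lt;/p&gt;

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-legacy-network              # illustrative name
  namespace: openshift-sriov-network-operator
spec:
  resourceName: sriovlegacy               # matches the SriovNetworkNodePolicy above
  networkNamespace: default               # namespace where the attachment is created
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.100.0/24"
    }
```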

&lt;h4&gt;
  
  
  3. RDMA Host Device
&lt;/h4&gt;

&lt;p&gt;The Host Device configuration passes the entire physical network device from the host to a pod. This provides maximum performance but limits the device to a single pod at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;doca-driver&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/mellanox&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25.04-0.6.1.0-2&lt;/span&gt;
    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;sriovDevicePlugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriov-network-device-plugin&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/k8snetworkplumbingwg&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v3.7.0&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"resourceList": [&lt;/span&gt;
          &lt;span class="s"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"resourcePrefix": "nvidia.com",&lt;/span&gt;
            &lt;span class="s"&gt;"resourceName": "hostdev",&lt;/span&gt;
            &lt;span class="s"&gt;"selectors": {&lt;/span&gt;
              &lt;span class="s"&gt;"vendors": ["15b3"],&lt;/span&gt;
              &lt;span class="s"&gt;"isRdma": true&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
          &lt;span class="s"&gt;}&lt;/span&gt;
        &lt;span class="s"&gt;]&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Parameters&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;resourcePrefix&lt;/code&gt;: Prefix for the resource name&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;resourceName&lt;/code&gt;: Name of the resource to be exposed&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;selectors.vendors&lt;/code&gt;: Vendor IDs to match (15b3 for Mellanox/NVIDIA)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workloads requiring maximum performance&lt;/li&gt;
&lt;li&gt;Systems where SR-IOV is not supported&lt;/li&gt;
&lt;li&gt;Applications needing features only available in the physical function driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Device is exclusive to a single pod&lt;/li&gt;
&lt;li&gt;Limited scalability as each device can only be used by one pod at a time&lt;/li&gt;
&lt;li&gt;May not be suitable for environments with many pods requiring RDMA&lt;/li&gt;
&lt;/ul&gt;
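&lt;p&gt;Before scheduling workloads, confirm the host device is advertised as an allocatable extended resource on the node. The inline JSON below is a stand-in for real &lt;code&gt;oc get node NODE -o json&lt;/code&gt; output:&lt;/p&gt;

```shell
# Report how many nvidia.com/hostdev devices a node advertises.
# The inline JSON stands in for real `oc get node NODE -o json` output.
echo '{"status":{"allocatable":{"nvidia.com/hostdev":"2"}}}' \
  | jq -r '.status.allocatable["nvidia.com/hostdev"] // "0"'
```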

&lt;h3&gt;
  
  
  RDMA Network Configuration
&lt;/h3&gt;

&lt;h4&gt;
  
  
  InfiniBand Network Configuration
&lt;/h4&gt;

&lt;p&gt;For InfiniBand networks, additional configuration is required:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ensure the host has an InfiniBand adapter and that its driver is loaded&lt;/li&gt;
&lt;li&gt;Verify the RDMA subsystem mode:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   rdma system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For exclusive mode (recommended for production):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   rdma system &lt;span class="nb"&gt;set &lt;/span&gt;netns exclusive
   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"options ib_core netns_mode=0"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/modprobe.d/ib_core.conf
   reboot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Configure SR-IOV for InfiniBand:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriovnetwork.openshift.io/v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SriovNetworkNodePolicy&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ib-sriov&lt;/span&gt;
     &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-system&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;kubernetes.io/os&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linux"&lt;/span&gt;
     &lt;span class="na"&gt;resourceName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanoxibsriov&lt;/span&gt;
     &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;99&lt;/span&gt;
     &lt;span class="na"&gt;numVfs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;12&lt;/span&gt;
     &lt;span class="na"&gt;nicSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;deviceID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1017"&lt;/span&gt;
         &lt;span class="na"&gt;rootDevices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;0000:86:00.0&lt;/span&gt;
         &lt;span class="na"&gt;vendor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15b3"&lt;/span&gt;
     &lt;span class="na"&gt;deviceType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;netdevice&lt;/span&gt;
     &lt;span class="na"&gt;isRdma&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Create a network attachment definition:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spiderpool.spidernet.io/v2beta1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SpiderMultusConfig&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ib-sriov&lt;/span&gt;
     &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-system&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;cniType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ib-sriov&lt;/span&gt;
     &lt;span class="na"&gt;ibsriov&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;resourceName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spidernet.io/mellanoxibsriov&lt;/span&gt;
       &lt;span class="na"&gt;ippools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;ipv4&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v4-91"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  RoCE Network Configuration
&lt;/h4&gt;

&lt;p&gt;For RoCE networks, ensure:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The network adapters support RoCE (typically Mellanox ConnectX-4 or newer)&lt;/li&gt;
&lt;li&gt;Priority Flow Control (PFC) is configured on the switches&lt;/li&gt;
&lt;li&gt;Explicit Congestion Notification (ECN) is enabled&lt;/li&gt;
&lt;li&gt;Appropriate QoS settings are configured&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Testing and Verification
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Verifying RDMA Functionality
&lt;/h4&gt;

&lt;p&gt;To verify RDMA functionality between pods:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy test pods with RDMA capabilities:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rdma-test-pod-1&lt;/span&gt;
     &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;k8s.v1.cni.cncf.io/networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rdma-network&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rdma-test-container&lt;/span&gt;
       &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox/rping-test&lt;/span&gt;
       &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IPC_LOCK"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
       &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="na"&gt;nvidia.com/hostdev&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
         &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="na"&gt;nvidia.com/hostdev&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
       &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sleep infinity&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Verify RDMA devices in the pods:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; rdma-test-pod-1 &lt;span class="nt"&gt;--&lt;/span&gt; rdma &lt;span class="nb"&gt;link&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Run RDMA performance tests:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# In pod 1 (server)&lt;/span&gt;
   oc &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; rdma-test-pod-1 &lt;span class="nt"&gt;--&lt;/span&gt; ib_read_lat

   &lt;span class="c"&gt;# In pod 2 (client)&lt;/span&gt;
   oc &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; rdma-test-pod-2 &lt;span class="nt"&gt;--&lt;/span&gt; ib_read_lat &amp;lt;server-ip&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Optimization
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Network Tuning
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;MTU Size&lt;/strong&gt;: Configure jumbo frames (MTU 9000) for improved throughput:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriovnetwork.openshift.io/v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SriovNetworkNodePolicy&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriov-policy&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;mtu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9000&lt;/span&gt;
     &lt;span class="c1"&gt;# other parameters...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;NUMA Alignment&lt;/strong&gt;: Ensure RDMA devices are aligned with CPU and memory resources:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rdma-numa-aligned-pod&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rdma-container&lt;/span&gt;
       &lt;span class="c1"&gt;# ...&lt;/span&gt;
     &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;kubernetes.io/hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node-with-aligned-resources&lt;/span&gt;
     &lt;span class="na"&gt;topologySpreadConstraints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;maxSkew&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
       &lt;span class="na"&gt;topologyKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/hostname&lt;/span&gt;
       &lt;span class="na"&gt;whenUnsatisfiable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DoNotSchedule&lt;/span&gt;
       &lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rdma-app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;IRQ Affinity&lt;/strong&gt;: Configure IRQ affinity for RDMA devices to specific CPU cores:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# On the host&lt;/span&gt;
   set_irq_affinity.sh &amp;lt;interface_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Application Tuning
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Buffer Sizes&lt;/strong&gt;: Adjust RDMA buffer sizes for optimal performance:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# Example for increasing queue pairs&lt;/span&gt;
   &lt;span class="nb"&gt;echo &lt;/span&gt;8192 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /sys/module/mlx4_core/parameters/log_num_qp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Transport Selection&lt;/strong&gt;: Choose the appropriate RDMA transport based on your network:

&lt;ul&gt;
&lt;li&gt;InfiniBand: Use native InfiniBand transport for lowest latency&lt;/li&gt;
&lt;li&gt;RoCE: Use RoCEv2 for routable RDMA over Ethernet&lt;/li&gt;
&lt;li&gt;iWARP: Use for compatibility with standard TCP/IP networks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
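
&lt;p&gt;The transport choice maps onto the SR-IOV policy configuration. As an illustrative fragment (the &lt;code&gt;linkType&lt;/code&gt; and &lt;code&gt;isRdma&lt;/code&gt; fields come from the OpenShift SR-IOV Network Operator API; the name and values here are examples, not a complete policy):&lt;/p&gt;

```yaml
# Illustrative SriovNetworkNodePolicy fragment for transport selection.
# linkType: ib selects native InfiniBand; linkType: eth with isRdma: true
# exposes the VFs for RoCE. Not a complete, deployable policy.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: rdma-transport-policy
spec:
  isRdma: true
  linkType: eth        # use "ib" for native InfiniBand
  # other parameters...
```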

&lt;h3&gt;
  
  
  Common Issues and Troubleshooting
&lt;/h3&gt;

&lt;h4&gt;
  
  
  RDMA Device Not Visible in Pod
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Verify the OFED driver is installed:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; nvidia-network-operator | &lt;span class="nb"&gt;grep &lt;/span&gt;ofed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Check RDMA device allocation:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc describe pod &amp;lt;pod-name&amp;gt;
   &lt;span class="c"&gt;# Look for resource allocation in the Events section&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Verify RDMA capability is enabled:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc get sriovnetworknodestates &lt;span class="nt"&gt;-n&lt;/span&gt; openshift-sriov-network-operator &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
   &lt;span class="c"&gt;# Check for isRdma: true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Performance Issues
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Check for network congestion:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# On the host&lt;/span&gt;
   perfquery &lt;span class="nt"&gt;-r&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Verify PFC is working:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# On the switch&lt;/span&gt;
   show priority-flow-control
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Monitor RDMA statistics:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# On the host&lt;/span&gt;
   rdma statistic show link mlx5_0/1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Connectivity Issues
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Verify subnet manager is running (for InfiniBand):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# On the host&lt;/span&gt;
   sminfo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Check link state:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# On the host&lt;/span&gt;
   ibv_devinfo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Test basic connectivity:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# In the pod&lt;/span&gt;
   ping &amp;lt;remote-ip&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Integration with Storage and GPU Workloads
&lt;/h3&gt;

&lt;h4&gt;
  
  
  NVMe over Fabrics (NVMe-oF)
&lt;/h4&gt;

&lt;p&gt;RDMA is a key transport for NVMe over Fabrics, providing high-performance access to NVMe storage devices over a network:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Configure NVMe-oF target:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# Example configuration&lt;/span&gt;
   nvmetcli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Connect to NVMe-oF using RDMA:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# In the pod&lt;/span&gt;
   nvme connect &lt;span class="nt"&gt;-t&lt;/span&gt; rdma &lt;span class="nt"&gt;-a&lt;/span&gt; &amp;lt;target-ip&amp;gt; &lt;span class="nt"&gt;-s&lt;/span&gt; 4420 &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;subsystem-nqn&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  GPUDirect RDMA
&lt;/h4&gt;

&lt;p&gt;GPUDirect RDMA enables direct data transfer between GPU memory and network adapters, bypassing CPU and system memory:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ensure NVIDIA GPU Operator is installed&lt;/li&gt;
&lt;li&gt;Configure GPUDirect RDMA:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="c1"&gt;# ...&lt;/span&gt;
     &lt;span class="na"&gt;rdmaSharedDevicePlugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="c1"&gt;# ...&lt;/span&gt;
     &lt;span class="na"&gt;gpuDirectRdma&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Verify GPUDirect RDMA functionality:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# In the pod&lt;/span&gt;
   nvidia-smi topo &lt;span class="nt"&gt;-m&lt;/span&gt;
   &lt;span class="c"&gt;# Look for "NV" in the RDMA column&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  NVIDIA GPU Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Introduction to GPU Concurrency and Sharing Mechanisms
&lt;/h3&gt;

&lt;p&gt;In enterprise-level OpenShift environments, applications typically have varying compute requirements that can leave GPUs underutilized. Providing the right amount of compute resources for each workload is critical to reduce deployment costs and maximize GPU utilization. Red Hat and NVIDIA have developed GPU concurrency and sharing mechanisms to simplify GPU-accelerated computing on OpenShift clusters.&lt;/p&gt;

&lt;p&gt;GPU concurrency mechanisms for improving utilization range from programming model APIs to system software and hardware partitioning, including virtualization. These mechanisms allow multiple workloads to share GPU resources efficiently, improving overall utilization and reducing costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPU Sharing Technologies
&lt;/h3&gt;

&lt;h4&gt;
  
  
  CUDA Streams
&lt;/h4&gt;

&lt;p&gt;Compute Unified Device Architecture (CUDA) is a parallel computing platform and programming model developed by NVIDIA for general computing on GPUs. CUDA streams provide a mechanism for parallel execution of operations on the GPU.&lt;/p&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A stream is a sequence of operations that executes in issue-order on the GPU&lt;/li&gt;
&lt;li&gt;CUDA commands are typically executed sequentially in a default stream&lt;/li&gt;
&lt;li&gt;Asynchronous processing across different streams allows for parallel execution&lt;/li&gt;
&lt;li&gt;Tasks in different streams can run before, during, or after each other&lt;/li&gt;
&lt;li&gt;Enables the GPU to run multiple tasks simultaneously in no prescribed order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Applications with multiple independent tasks that can be executed in parallel&lt;/li&gt;
&lt;li&gt;Workloads that can benefit from overlapping data transfers and computations&lt;/li&gt;
&lt;li&gt;Scenarios where multiple small kernels need to be executed concurrently&lt;/li&gt;
&lt;/ul&gt;
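
&lt;p&gt;The issue-order semantics above can be illustrated with a small host-side sketch (plain Python, no GPU required; the round-robin scheduler below is a toy model standing in for the hardware scheduler, not the CUDA runtime):&lt;/p&gt;

```python
def toy_schedule(streams):
    """Toy model of CUDA stream semantics: operations within one stream
    execute in issue order, while operations from different streams may
    interleave. Round-robin stands in for the hardware scheduler."""
    queues = {name: list(ops) for name, ops in streams.items()}
    order = []
    while any(queues.values()):
        for name, q in queues.items():
            if q:
                order.append((name, q.pop(0)))  # take the head: preserves issue order
    return order

# Two streams issued concurrently: their operations interleave,
# but each stream's own operations stay in issue order.
timeline = toy_schedule({
    "stream0": ["H2D copy", "kernel A", "D2H copy"],
    "stream1": ["kernel B", "kernel C"],
})
```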

&lt;h4&gt;
  
  
  Time-Slicing
&lt;/h4&gt;

&lt;p&gt;GPU time-slicing interleaves multiple CUDA workloads on an oversubscribed GPU, giving each workload a share of compute time in turn. This approach improves utilization of GPU resources without requiring hardware-level partitioning.&lt;/p&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enables sharing of GPUs by defining a set of replicas for a GPU&lt;/li&gt;
&lt;li&gt;Each replica can be independently distributed to a pod&lt;/li&gt;
&lt;li&gt;No memory or fault isolation between replicas&lt;/li&gt;
&lt;li&gt;Uses GPU time-slicing to multiplex workloads from replicas&lt;/li&gt;
&lt;li&gt;Can be applied cluster-wide or to specific nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;devicePlugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;time-slicing-config&lt;/span&gt;
      &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;any-gpu-time-slicing&lt;/span&gt;
      &lt;span class="na"&gt;sharing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;timeSlicing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;any-gpu-time-slicing&lt;/span&gt;
            &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
            &lt;span class="na"&gt;renameByDefault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
            &lt;span class="na"&gt;failRequestsGreaterThanOne&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Older NVIDIA cards with no MIG support on bare metal&lt;/li&gt;
&lt;li&gt;Workloads that don't require strict isolation&lt;/li&gt;
&lt;li&gt;Development and testing environments&lt;/li&gt;
&lt;/ul&gt;
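
&lt;p&gt;With &lt;code&gt;replicas: 4&lt;/code&gt; as in the configuration above, each physical GPU is advertised as four schedulable resources. A minimal sketch of that arithmetic (plain Python; function names are illustrative):&lt;/p&gt;

```python
def advertised_gpus(physical_gpus, replicas):
    """With time-slicing, the device plugin advertises one schedulable
    nvidia.com/gpu resource per replica of each physical GPU. Replicas
    share the device with no memory or fault isolation."""
    return physical_gpus * replicas

def fits(pod_requests, physical_gpus, replicas):
    """True if a list of per-pod GPU requests fits the advertised capacity."""
    return sum(pod_requests) <= advertised_gpus(physical_gpus, replicas)
```

&lt;p&gt;For example, a node with 2 physical GPUs and &lt;code&gt;replicas: 4&lt;/code&gt; advertises 8 schedulable GPUs, so eight pods each requesting one GPU can land on it.&lt;/p&gt;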

&lt;h4&gt;
  
  
  CUDA Multi-Process Service (MPS)
&lt;/h4&gt;

&lt;p&gt;CUDA Multi-Process Service (MPS) allows multiple CUDA processes to share a single GPU. The processes run concurrently on the GPU, improving utilization when no single process can saturate the GPU's compute resources on its own.&lt;/p&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enables concurrent execution of kernel operations from different processes&lt;/li&gt;
&lt;li&gt;Allows overlapping of memory copying from different processes&lt;/li&gt;
&lt;li&gt;Enhances GPU utilization by enabling multiple processes to share the GPU&lt;/li&gt;
&lt;li&gt;Provides a server process that manages access to the GPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HPC workloads with multiple MPI ranks&lt;/li&gt;
&lt;li&gt;Applications with multiple small CUDA kernels&lt;/li&gt;
&lt;li&gt;Scenarios where multiple processes need to share a single GPU&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Multi-Instance GPU (MIG)
&lt;/h4&gt;

&lt;p&gt;Multi-Instance GPU (MIG) is a feature introduced with the NVIDIA Ampere architecture that partitions a GPU's compute units and memory into multiple MIG instances. Each instance appears as a standalone GPU device from a system perspective.&lt;/p&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each MIG instance appears as an individual GPU to the system&lt;/li&gt;
&lt;li&gt;Provides hardware-level isolation between instances&lt;/li&gt;
&lt;li&gt;Supported on MIG-capable cards such as the NVIDIA A100 and A30 (Ampere) and later architectures&lt;/li&gt;
&lt;li&gt;Can support up to seven independent CUDA applications&lt;/li&gt;
&lt;li&gt;Offers complete isolation with dedicated hardware resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;single&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;gpuIds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;0&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
        &lt;span class="na"&gt;migEnabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;profile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1g.5gb"&lt;/span&gt;
            &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;profile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2g.10gb"&lt;/span&gt;
            &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production environments requiring strict isolation&lt;/li&gt;
&lt;li&gt;Mixed workloads with different resource requirements&lt;/li&gt;
&lt;li&gt;Bare metal deployments with MIG-enabled cards&lt;/li&gt;
&lt;/ul&gt;
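
&lt;p&gt;MIG profile names encode their size: &lt;code&gt;2g.10gb&lt;/code&gt; means 2 compute slices and 10 GB of memory. A small sketch that checks whether a layout like the configuration example above (two &lt;code&gt;1g.5gb&lt;/code&gt; plus one &lt;code&gt;2g.10gb&lt;/code&gt;) fits on one GPU, assuming A100-40GB limits (7 compute slices, 40 GB) purely for illustration:&lt;/p&gt;

```python
import re

def mig_profile(name):
    """Parse a MIG profile like '2g.10gb' into (compute_slices, memory_gb)."""
    m = re.fullmatch(r"(\d+)g\.(\d+)gb", name)
    if m is None:
        raise ValueError(f"not a MIG profile: {name}")
    return int(m.group(1)), int(m.group(2))

def fits_one_gpu(devices, max_slices=7, max_mem_gb=40):
    """True if the (profile, count) entries fit one GPU's slice and memory
    budget. Defaults assume an A100 40GB (7 slices, 40 GB) as an example."""
    slices = sum(mig_profile(p)[0] * c for p, c in devices)
    mem = sum(mig_profile(p)[1] * c for p, c in devices)
    return slices <= max_slices and mem <= max_mem_gb
```

&lt;p&gt;The example layout uses 4 of 7 slices and 20 of 40 GB, so it fits with room for further instances.&lt;/p&gt;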

&lt;h4&gt;
  
  
  Virtualization with vGPU
&lt;/h4&gt;

&lt;p&gt;NVIDIA vGPU allows multiple virtual machines (VMs) to share a single physical GPU. This capability combines the power of GPU performance with the management and security benefits provided by virtualization.&lt;/p&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates virtual GPUs that can be shared by VMs across the enterprise&lt;/li&gt;
&lt;li&gt;Provides management and monitoring for VM environments&lt;/li&gt;
&lt;li&gt;Enables workload balancing for mixed VDI and compute workloads&lt;/li&gt;
&lt;li&gt;Allows resource sharing across multiple VMs&lt;/li&gt;
&lt;li&gt;Offers proactive management capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VM environments requiring GPU acceleration&lt;/li&gt;
&lt;li&gt;OpenShift Virtualization deployments&lt;/li&gt;
&lt;li&gt;Mixed VDI and compute workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deployment Considerations for Different OpenShift Scenarios
&lt;/h3&gt;

&lt;p&gt;When implementing GPU sharing in OpenShift, consider the following recommendations for different scenarios:&lt;/p&gt;

&lt;h4&gt;
  
  
  Bare Metal Deployments
&lt;/h4&gt;

&lt;p&gt;For bare metal OpenShift deployments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vGPU is not available&lt;/li&gt;
&lt;li&gt;Consider using MIG-enabled cards (A100, A30) for hardware-level partitioning&lt;/li&gt;
&lt;li&gt;If using older NVIDIA cards without MIG support, consider time-slicing&lt;/li&gt;
&lt;li&gt;For maximum performance, use direct GPU assignment without sharing&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Virtual Machine Deployments
&lt;/h4&gt;

&lt;p&gt;For OpenShift deployments on virtual machines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vGPU is the best choice for sharing GPU resources&lt;/li&gt;
&lt;li&gt;Consider using separate VMs when you need both passthrough and vGPU&lt;/li&gt;
&lt;li&gt;Ensure the hypervisor supports GPU passthrough or vGPU&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Mixed Environments with OKD Virtualization
&lt;/h4&gt;

&lt;p&gt;For bare metal with OKD Virtualization and multiple GPUs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consider using pass-through for hosted VMs&lt;/li&gt;
&lt;li&gt;Use time-slicing for containers&lt;/li&gt;
&lt;li&gt;Align NUMA topology for optimal performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementation Guidelines
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Enabling Time-Slicing
&lt;/h4&gt;

&lt;p&gt;To enable time-slicing of GPUs on Kubernetes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a ConfigMap with the time-slicing configuration:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;time-slicing-config&lt;/span&gt;
     &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-operator&lt;/span&gt;
   &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;config.yaml&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
       &lt;span class="s"&gt;version: v1&lt;/span&gt;
       &lt;span class="s"&gt;sharing:&lt;/span&gt;
         &lt;span class="s"&gt;timeSlicing:&lt;/span&gt;
           &lt;span class="s"&gt;resources:&lt;/span&gt;
           &lt;span class="s"&gt;- name: nvidia.com/gpu&lt;/span&gt;
             &lt;span class="s"&gt;replicas: 4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Apply the configuration to the NVIDIA GPU Operator:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia.com/v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-cluster-policy&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;devicePlugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;time-slicing-config&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Verify the configuration:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc get nodes &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.items[].status.allocatable | select(has("nvidia.com/gpu"))'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
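
&lt;p&gt;The &lt;code&gt;jq&lt;/code&gt; filter above can be mirrored in Python to confirm the replica count took effect; the sample JSON below is an illustrative stand-in for real cluster output:&lt;/p&gt;

```python
import json

def gpu_allocatable(nodes_json):
    """Collect allocatable nvidia.com/gpu per node, like the jq filter
    '.items[].status.allocatable | select(has("nvidia.com/gpu"))'."""
    doc = json.loads(nodes_json)
    return [int(node["status"]["allocatable"]["nvidia.com/gpu"])
            for node in doc["items"]
            if "nvidia.com/gpu" in node.get("status", {}).get("allocatable", {})]

# Illustrative output for one node with 1 physical GPU and replicas: 4
sample = '{"items": [{"status": {"allocatable": {"cpu": "64", "nvidia.com/gpu": "4"}}}]}'
```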



&lt;h4&gt;
  
  
  Configuring MIG
&lt;/h4&gt;

&lt;p&gt;To configure Multi-Instance GPU (MIG) on OpenShift:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Ensure you have MIG-capable GPUs (A100, A30)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a MIG strategy configuration:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia.com/v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-cluster-policy&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;mig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;single&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Apply the configuration and wait for the GPU Operator to configure MIG&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify MIG instances:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; nvidia-gpu-operator nvidia-device-plugin-daemonset-xyz &lt;span class="nt"&gt;--&lt;/span&gt; nvidia-smi mig &lt;span class="nt"&gt;-lgi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Setting Up vGPU
&lt;/h4&gt;

&lt;p&gt;To set up vGPU in an OpenShift environment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Install the NVIDIA vGPU software on the hypervisor&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure vGPU profiles for your VMs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Install the GPU Operator in the OpenShift cluster:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   helm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--wait&lt;/span&gt; &lt;span class="nt"&gt;--generate-name&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-n&lt;/span&gt; gpu-operator &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     nvidia/gpu-operator &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--set&lt;/span&gt; driver.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Verify vGPU detection:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc get nodes &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.items[].status.allocatable | select(has("nvidia.com/gpu"))'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Optimization
&lt;/h3&gt;

&lt;h4&gt;
  
  
  NUMA Alignment
&lt;/h4&gt;

&lt;p&gt;For optimal GPU performance, ensure NUMA alignment between GPUs, NICs, and CPU cores:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identify NUMA topology:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc debug node/&amp;lt;node-name&amp;gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nb"&gt;chroot&lt;/span&gt; /host nvidia-smi topo &lt;span class="nt"&gt;-m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
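
&lt;p&gt;The connectivity matrix printed by &lt;code&gt;nvidia-smi topo -m&lt;/code&gt; can be post-processed to find each GPU's nearest NIC. A sketch against a simplified, illustrative matrix (real output adds CPU/NUMA affinity columns and a legend, which this parser ignores):&lt;/p&gt;

```python
def parse_topo(matrix_text):
    """Parse a simplified `nvidia-smi topo -m` connectivity matrix into
    {row_device: {col_device: link_type}} (illustrative format only)."""
    rows = [line.split() for line in matrix_text.strip().splitlines()]
    header = rows[0]
    return {row[0]: dict(zip(header, row[1:])) for row in rows[1:]}

# Simplified example matrix (not real output).
# PIX = same PCIe switch, SYS = crosses the interconnect between NUMA nodes.
sample_topo = """\
     GPU0 GPU1 NIC0
GPU0 X    NV2  PIX
GPU1 NV2  X    SYS
"""
```

&lt;p&gt;Here GPU0 shares a PCIe switch with NIC0 (PIX), so GPU0 is the NUMA-aligned choice for RDMA traffic through that NIC.&lt;/p&gt;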



&lt;ol&gt;
&lt;li&gt;Configure pod placement to respect NUMA boundaries:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-numa-aligned-pod&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-container&lt;/span&gt;
       &lt;span class="c1"&gt;# ...&lt;/span&gt;
     &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;nvidia.com/gpu.present&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
     &lt;span class="na"&gt;topologySpreadConstraints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;maxSkew&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
       &lt;span class="na"&gt;topologyKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/hostname&lt;/span&gt;
       &lt;span class="na"&gt;whenUnsatisfiable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DoNotSchedule&lt;/span&gt;
       &lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  GPU Monitoring
&lt;/h4&gt;

&lt;p&gt;Monitor GPU utilization to identify opportunities for optimization:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy NVIDIA DCGM-Exporter:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/NVIDIA/dcgm-exporter/main/deployment/openshift/dcgm-exporter.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Configure Prometheus to scrape metrics:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring.coreos.com/v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceMonitor&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dcgm-exporter&lt;/span&gt;
     &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia-gpu-operator&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metrics&lt;/span&gt;
       &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/metrics&lt;/span&gt;
       &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;
     &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dcgm-exporter&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Create Grafana dashboards to visualize GPU metrics&lt;/li&gt;
&lt;/ol&gt;
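&lt;p&gt;Before building dashboards, it is worth confirming that metrics are actually being exported. A quick sanity check, assuming the exporter is reachable through a service named &lt;code&gt;dcgm-exporter&lt;/code&gt; (the service name may differ in your cluster; DCGM-Exporter listens on port 9400 by default):&lt;/p&gt;

```shell
# Port-forward the DCGM exporter and inspect a key metric
oc port-forward -n nvidia-gpu-operator svc/dcgm-exporter 9400:9400 &
curl -s http://localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL
# One DCGM_FI_DEV_GPU_UTIL sample per GPU indicates the exporter is working
```

&lt;p&gt;&lt;code&gt;DCGM_FI_DEV_GPU_UTIL&lt;/code&gt; (utilization) and &lt;code&gt;DCGM_FI_DEV_FB_USED&lt;/code&gt; (framebuffer memory in use) are useful starting metrics for a Grafana dashboard.&lt;/p&gt;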

&lt;h3&gt;
  
  
  Integration with RDMA for High-Performance Computing
&lt;/h3&gt;

&lt;h4&gt;
  
  
  GPUDirect RDMA
&lt;/h4&gt;

&lt;p&gt;GPUDirect RDMA enables direct data transfer between GPU memory and network adapters, bypassing CPU and system memory. This integration is particularly valuable for high-performance computing workloads.&lt;/p&gt;

&lt;p&gt;To configure GPUDirect RDMA:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Ensure both NVIDIA GPU Operator and NVIDIA Network Operator are installed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable GPUDirect RDMA in the NicClusterPolicy:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="c1"&gt;# ...&lt;/span&gt;
     &lt;span class="na"&gt;rdmaSharedDevicePlugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="c1"&gt;# ...&lt;/span&gt;
     &lt;span class="na"&gt;gpuDirectRdma&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Verify GPUDirect RDMA functionality:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# In the pod&lt;/span&gt;
   nvidia-smi topo &lt;span class="nt"&gt;-m&lt;/span&gt;
   &lt;span class="c"&gt;# Look for "NV" in the RDMA column&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Performance Considerations
&lt;/h4&gt;

&lt;p&gt;When using GPUDirect RDMA with GPU sharing mechanisms:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;MIG instances can use GPUDirect RDMA independently&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Time-slicing may introduce additional latency for RDMA operations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;vGPU requires SR-IOV network adapters for optimal RDMA performance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Align NUMA topology for GPUs and network adapters to minimize PCIe traffic across NUMA nodes&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
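&lt;p&gt;A practical way to validate the direct GPU-to-NIC path is a bandwidth test with the &lt;code&gt;perftest&lt;/code&gt; suite built with CUDA support. This is a sketch: the RDMA device, GPU index, and server address below are illustrative and must match your environment.&lt;/p&gt;

```shell
# Server side (in one pod):
ib_write_bw -d mlx5_0 --use_cuda=0 --report_gbits
# Client side (in a second pod; 10.0.0.1 is an illustrative server address):
ib_write_bw -d mlx5_0 --use_cuda=0 --report_gbits 10.0.0.1
# With GPUDirect RDMA active, reported bandwidth should approach the link rate
```

&lt;p&gt;Running the same test with and without &lt;code&gt;--use_cuda&lt;/code&gt; makes the effect of the direct path visible.&lt;/p&gt;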

&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Common Issues
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GPU Not Detected&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc get nodes &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.items[].status.allocatable | select(has("nvidia.com/gpu"))'&lt;/span&gt;
   oc logs &lt;span class="nt"&gt;-n&lt;/span&gt; nvidia-gpu-operator nvidia-driver-daemonset-xyz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;MIG Configuration Failures&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc logs &lt;span class="nt"&gt;-n&lt;/span&gt; nvidia-gpu-operator nvidia-mig-manager-xyz
   oc &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; nvidia-gpu-operator nvidia-device-plugin-daemonset-xyz &lt;span class="nt"&gt;--&lt;/span&gt; nvidia-smi &lt;span class="nt"&gt;-mig&lt;/span&gt; 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Time-Slicing Issues&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc describe configmap &lt;span class="nt"&gt;-n&lt;/span&gt; gpu-operator time-slicing-config
   oc logs &lt;span class="nt"&gt;-n&lt;/span&gt; nvidia-gpu-operator nvidia-device-plugin-daemonset-xyz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Debugging GPU Workloads
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Check GPU allocation:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc describe pod &amp;lt;pod-name&amp;gt;
   &lt;span class="c"&gt;# Look for resource allocation in the Events section&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Verify GPU visibility in the pod:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;--&lt;/span&gt; nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Check GPU utilization:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;--&lt;/span&gt; nvidia-smi dmon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
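&lt;p&gt;To see which pods in the cluster are currently holding GPU resources, the allocation can be filtered with &lt;code&gt;jq&lt;/code&gt;. In the cluster you would pipe &lt;code&gt;oc get pods -A -o json&lt;/code&gt; into the filter; here it is demonstrated on a minimal sample document:&lt;/p&gt;

```shell
# jq filter that prints namespace/name for every pod requesting nvidia.com/gpu
filter='.items[] | select(.spec.containers[].resources.limits["nvidia.com/gpu"] != null) | "\(.metadata.namespace)/\(.metadata.name)"'
echo '{"items":[{"metadata":{"namespace":"ml","name":"train-1"},"spec":{"containers":[{"resources":{"limits":{"nvidia.com/gpu":"1"}}}]}}]}' | jq -r "$filter"
# → ml/train-1
```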



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This comprehensive guide has covered both RDMA networking and NVIDIA GPU architecture in OpenShift environments. By understanding and implementing these technologies, organizations can significantly improve the performance and efficiency of their data-intensive workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  RDMA Benefits and Best Practices
&lt;/h3&gt;

&lt;p&gt;RDMA provides significant performance benefits for high-throughput, low-latency applications in OpenShift environments. When implementing RDMA in OpenShift:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose the appropriate RDMA configuration based on your performance and isolation requirements&lt;/li&gt;
&lt;li&gt;Ensure proper network configuration for optimal performance&lt;/li&gt;
&lt;li&gt;Align RDMA resources with CPU, memory, and GPU resources for best results&lt;/li&gt;
&lt;li&gt;Monitor and tune your RDMA deployment to maintain peak performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GPU Sharing Benefits and Best Practices
&lt;/h3&gt;

&lt;p&gt;NVIDIA GPU architecture in OpenShift provides flexible options for sharing and utilizing GPU resources efficiently. When implementing GPU sharing in OpenShift:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the appropriate sharing mechanism based on your hardware capabilities and isolation requirements&lt;/li&gt;
&lt;li&gt;Consider the deployment scenario (bare metal, VMs, or mixed) when choosing a sharing approach&lt;/li&gt;
&lt;li&gt;Optimize performance through proper NUMA alignment and monitoring&lt;/li&gt;
&lt;li&gt;Integrate with RDMA for high-performance computing workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By combining RDMA networking with GPU acceleration and implementing the appropriate sharing mechanisms, organizations can build high-performance, cost-effective OpenShift environments for AI/ML, HPC, and other data-intensive workloads.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>[Part02] Getting Started with Red Hat OpenShift with NVIDIA</title>
      <dc:creator>MohamedELGAMASY</dc:creator>
      <pubDate>Fri, 04 Jul 2025 16:35:10 +0000</pubDate>
      <link>https://dev.to/gamasy/-getting-started-with-red-hat-openshift-with-nvidia-kbk</link>
      <guid>https://dev.to/gamasy/-getting-started-with-red-hat-openshift-with-nvidia-kbk</guid>
      <description>&lt;h1&gt;
  
  
  Getting Started with Red Hat OpenShift with NVIDIA
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before deploying NVIDIA networking solutions on OpenShift, ensure the following prerequisites are met:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A functioning Red Hat OpenShift Container Platform cluster (version 4.10 or later recommended)&lt;/li&gt;
&lt;li&gt;NVIDIA networking hardware (Mellanox ConnectX or BlueField series) installed in worker nodes&lt;/li&gt;
&lt;li&gt;Node Feature Discovery (NFD) operator installed and configured&lt;/li&gt;
&lt;li&gt;SR-IOV Network Operator installed (if using SR-IOV capabilities)&lt;/li&gt;
&lt;li&gt;GPU Operator installed (if using GPUDirect RDMA)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Understanding Networking Technologies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Ethernet vs. InfiniBand
&lt;/h3&gt;

&lt;p&gt;When planning your OpenShift deployment with NVIDIA networking hardware, it's crucial to understand the fundamental differences between Ethernet and InfiniBand technologies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Ethernet&lt;/th&gt;
&lt;th&gt;InfiniBand&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Design Purpose&lt;/td&gt;
&lt;td&gt;General data movement between systems&lt;/td&gt;
&lt;td&gt;High reliability, high bandwidth, low latency for supercomputing clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency Handling&lt;/td&gt;
&lt;td&gt;Store-and-forward with MAC address transport&lt;/td&gt;
&lt;td&gt;Cut-through approach with 16-bit LID for faster forwarding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network Reliability&lt;/td&gt;
&lt;td&gt;No scheduling-based flow control, potential for congestion&lt;/td&gt;
&lt;td&gt;End-to-end flow control providing lossless networking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network Mode&lt;/td&gt;
&lt;td&gt;MAC addresses with ARP protocol&lt;/td&gt;
&lt;td&gt;Built-in software-defined networking with subnet manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenShift Compatibility&lt;/td&gt;
&lt;td&gt;Native support&lt;/td&gt;
&lt;td&gt;Requires special configuration and cannot be used for cluster API traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Important Note&lt;/strong&gt;: OpenShift installation requires Ethernet connectivity for the cluster API traffic. InfiniBand can only be used as a secondary network for application traffic after the cluster is installed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommended Network Architecture
&lt;/h3&gt;

&lt;p&gt;For deployments requiring both OpenShift functionality and high-speed InfiniBand connectivity, a dual-network architecture is recommended:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Primary Network (Ethernet)&lt;/strong&gt;: Used for OpenShift cluster API traffic, management, and standard application networking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secondary Network (InfiniBand or Ethernet with RDMA)&lt;/strong&gt;: Used for high-performance, low-latency application traffic&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture can be implemented using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dual single-port NICs (one Ethernet, one InfiniBand)&lt;/li&gt;
&lt;li&gt;Dual-port NICs with one port configured for Ethernet and one for InfiniBand&lt;/li&gt;
&lt;li&gt;Multiple Ethernet NICs with RDMA capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  NVIDIA Network Operator Installation
&lt;/h2&gt;

&lt;p&gt;The NVIDIA Network Operator is the primary tool for deploying and managing NVIDIA networking components in OpenShift. It can be installed using either the OpenShift web console or the command-line interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before installing the Network Operator, ensure:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Node Feature Discovery (NFD) is properly configured&lt;/li&gt;
&lt;li&gt;Worker nodes with NVIDIA networking hardware are labeled with &lt;code&gt;feature.node.kubernetes.io/pci-15b3.present=true&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
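&lt;p&gt;The NFD labeling can be checked directly before installing the operator:&lt;/p&gt;

```shell
# Nodes that NFD has labeled as carrying a Mellanox device (PCI vendor ID 15b3)
oc get nodes -l feature.node.kubernetes.io/pci-15b3.present=true -o name
```

&lt;p&gt;If no nodes are returned, verify that the NFD operator is running and that the NICs are visible on the hosts (for example with &lt;code&gt;lspci | grep Mellanox&lt;/code&gt; on a worker node).&lt;/p&gt;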

&lt;h3&gt;
  
  
  Installation Using OpenShift Web Console
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;In the OpenShift Container Platform web console, navigate to &lt;strong&gt;Operators &amp;gt; OperatorHub&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Search for "NVIDIA Network Operator"&lt;/li&gt;
&lt;li&gt;Select the operator and click &lt;strong&gt;Install&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Follow the on-screen instructions to complete the installation&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Installation Using OpenShift CLI
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create a namespace for the Network Operator:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc create namespace nvidia-network-operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Determine the current channel version:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc get packagemanifest nvidia-network-operator &lt;span class="nt"&gt;-n&lt;/span&gt; openshift-marketplace &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.defaultChannel}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Create a subscription file (network-operator-sub.yaml):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;operators.coreos.com/v1alpha1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Subscription&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia-network-operator&lt;/span&gt;
     &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia-network-operator&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;channel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v25.4"&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with the current channel from step 2&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia-network-operator&lt;/span&gt;
     &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;certified-operators&lt;/span&gt;
     &lt;span class="na"&gt;sourceNamespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openshift-marketplace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Apply the subscription:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc create &lt;span class="nt"&gt;-f&lt;/span&gt; network-operator-sub.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Switch to the network-operator project:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc project nvidia-network-operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;

&lt;p&gt;Verify the operator deployment with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; nvidia-network-operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A successful deployment will show the controller manager pod with a Running status.&lt;/p&gt;
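&lt;p&gt;The installation can also be checked through the operator's ClusterServiceVersion, which should reach the &lt;code&gt;Succeeded&lt;/code&gt; phase:&lt;/p&gt;

```shell
oc get csv -n nvidia-network-operator -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
```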

&lt;h2&gt;
  
  
  RDMA Configuration Options
&lt;/h2&gt;

&lt;p&gt;Remote Direct Memory Access (RDMA) allows computers to directly access each other's memory without involving the CPU or operating system, providing high bandwidth and low latency. NVIDIA offers three configuration methods for RDMA in OpenShift:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. RDMA Shared Device
&lt;/h3&gt;

&lt;p&gt;The RDMA Shared Device configuration allows multiple pods on a worker node to share the same RDMA device. This method is suitable for development environments or applications where maximum performance is not critical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;doca-driver&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/mellanox&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25.04-0.6.1.0-2&lt;/span&gt;
    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;rdmaSharedDevicePlugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"configList": [&lt;/span&gt;
          &lt;span class="s"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"resourceName": "rdma_shared_device_ib",&lt;/span&gt;
            &lt;span class="s"&gt;"rdmaHcaMax": 63,&lt;/span&gt;
            &lt;span class="s"&gt;"selectors": {&lt;/span&gt;
              &lt;span class="s"&gt;"ifNames": ["ibs2f0"]&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
          &lt;span class="s"&gt;},&lt;/span&gt;
          &lt;span class="s"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"resourceName": "rdma_shared_device_eth",&lt;/span&gt;
            &lt;span class="s"&gt;"rdmaHcaMax": 63,&lt;/span&gt;
            &lt;span class="s"&gt;"selectors": {&lt;/span&gt;
              &lt;span class="s"&gt;"ifNames": ["ens8f0np0"]&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
          &lt;span class="s"&gt;}&lt;/span&gt;
        &lt;span class="s"&gt;]&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k8s-rdma-shared-dev-plugin&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/cloud-native&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1.5.3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Development and testing environments&lt;/li&gt;
&lt;li&gt;Applications where multiple pods need RDMA functionality but not maximum performance&lt;/li&gt;
&lt;li&gt;Environments with limited hardware resources&lt;/li&gt;
&lt;/ul&gt;
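&lt;p&gt;A pod consumes a shared RDMA device by requesting the resource that the plugin exposes under the &lt;code&gt;rdma/&lt;/code&gt; prefix. A minimal sketch matching the configuration above (the container image is illustrative; &lt;code&gt;IPC_LOCK&lt;/code&gt; is typically required for RDMA memory registration):&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rdma-shared-test-pod
spec:
  containers:
  - name: rdma-app
    image: quay.io/example/rdma-tools:latest  # illustrative image with RDMA userspace tools
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    resources:
      limits:
        rdma/rdma_shared_device_eth: 1
```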

&lt;h3&gt;
  
  
  2. RDMA SR-IOV Legacy Device
&lt;/h3&gt;

&lt;p&gt;The SR-IOV (Single Root I/O Virtualization) configuration segments a network device at the hardware layer, creating multiple virtual functions (VFs) that can be assigned to different pods. This provides better isolation and performance compared to the shared device method.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# NicClusterPolicy for OFED driver&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;doca-driver&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/mellanox&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25.04-0.6.1.0-2&lt;/span&gt;
    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;

&lt;span class="c1"&gt;# SR-IOV Network Node Policy&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriovnetwork.openshift.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SriovNetworkNodePolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriov-legacy-policy&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openshift-sriov-network-operator&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;deviceType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;netdevice&lt;/span&gt;
  &lt;span class="na"&gt;mtu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1500&lt;/span&gt;
  &lt;span class="na"&gt;nicSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;vendor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15b3"&lt;/span&gt;
    &lt;span class="na"&gt;pfNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ens8f0np0#0-7"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;feature.node.kubernetes.io/pci-15b3.present&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;numVfs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
  &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;90&lt;/span&gt;
  &lt;span class="na"&gt;isRdma&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;resourceName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriovlegacy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production environments requiring high performance&lt;/li&gt;
&lt;li&gt;Workloads sensitive to latency and bandwidth&lt;/li&gt;
&lt;li&gt;Applications requiring isolation between network resources&lt;/li&gt;
&lt;/ul&gt;
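&lt;p&gt;A pod attaches to a VF by referencing an &lt;code&gt;SriovNetwork&lt;/code&gt; through a network annotation and requesting the SR-IOV resource (the network name and image below are illustrative; the resource is exposed under the &lt;code&gt;openshift.io/&lt;/code&gt; prefix by default):&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sriov-rdma-test-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network  # assumes an SriovNetwork of this name exists
spec:
  containers:
  - name: rdma-app
    image: quay.io/example/rdma-tools:latest  # illustrative image
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    resources:
      limits:
        openshift.io/sriovlegacy: 1
```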

&lt;h3&gt;
  
  
  3. RDMA Host Device
&lt;/h3&gt;

&lt;p&gt;The Host Device configuration passes the entire physical network device from the host to a pod. This provides maximum performance but limits the device to a single pod at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;doca-driver&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/mellanox&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25.04-0.6.1.0-2&lt;/span&gt;
    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;sriovDevicePlugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriov-network-device-plugin&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/k8snetworkplumbingwg&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v3.7.0&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"resourceList": [&lt;/span&gt;
          &lt;span class="s"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"resourcePrefix": "nvidia.com",&lt;/span&gt;
            &lt;span class="s"&gt;"resourceName": "hostdev",&lt;/span&gt;
            &lt;span class="s"&gt;"selectors": {&lt;/span&gt;
              &lt;span class="s"&gt;"vendors": ["15b3"],&lt;/span&gt;
              &lt;span class="s"&gt;"isRdma": true&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
          &lt;span class="s"&gt;}&lt;/span&gt;
        &lt;span class="s"&gt;]&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workloads requiring maximum performance&lt;/li&gt;
&lt;li&gt;Systems where SR-IOV is not supported&lt;/li&gt;
&lt;li&gt;Applications needing features only available in the physical function driver&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deployment Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Example 1: Network Operator with Host Device Network
&lt;/h3&gt;

&lt;p&gt;This example demonstrates deploying the Network Operator with the SR-IOV device plugin and a single SR-IOV resource pool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;doca-driver&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/mellanox&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25.04-0.6.1.0-2&lt;/span&gt;
    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;sriovDevicePlugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriov-network-device-plugin&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/k8snetworkplumbingwg&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v3.9.0&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"resourceList": [&lt;/span&gt;
          &lt;span class="s"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"resourcePrefix": "nvidia.com",&lt;/span&gt;
            &lt;span class="s"&gt;"resourceName": "hostdev",&lt;/span&gt;
            &lt;span class="s"&gt;"selectors": {&lt;/span&gt;
              &lt;span class="s"&gt;"vendors": ["15b3"],&lt;/span&gt;
              &lt;span class="s"&gt;"isRdma": true&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
          &lt;span class="s"&gt;}&lt;/span&gt;
        &lt;span class="s"&gt;]&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# HostDeviceNetwork&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HostDeviceNetwork&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hostdev-net&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;networkNamespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default"&lt;/span&gt;
  &lt;span class="na"&gt;resourceName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hostdev"&lt;/span&gt;
  &lt;span class="na"&gt;ipam&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;{&lt;/span&gt;
      &lt;span class="s"&gt;"type": "whereabouts",&lt;/span&gt;
      &lt;span class="s"&gt;"datastore": "kubernetes",&lt;/span&gt;
      &lt;span class="s"&gt;"kubernetes": {&lt;/span&gt;
        &lt;span class="s"&gt;"kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"&lt;/span&gt;
      &lt;span class="s"&gt;},&lt;/span&gt;
      &lt;span class="s"&gt;"range": "192.168.3.225/28",&lt;/span&gt;
      &lt;span class="s"&gt;"exclude": [&lt;/span&gt;
        &lt;span class="s"&gt;"192.168.3.229/30",&lt;/span&gt;
        &lt;span class="s"&gt;"192.168.3.236/32"&lt;/span&gt;
      &lt;span class="s"&gt;],&lt;/span&gt;
      &lt;span class="s"&gt;"log_file" : "/var/log/whereabouts.log",&lt;/span&gt;
      &lt;span class="s"&gt;"log_level" : "info"&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Example Pod&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hostdev-test-pod&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;k8s.v1.cni.cncf.io/networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hostdev-net&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OnFailure&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;rdma image&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;doca-test-ctr&lt;/span&gt;
    &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IPC_LOCK"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;nvidia.com/hostdev&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;nvidia.com/hostdev&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sleep inf&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 2: Network Operator with SR-IOV Legacy Mode
&lt;/h3&gt;

&lt;p&gt;This example demonstrates deploying the Network Operator with SR-IOV in legacy mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;doca-driver&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/mellanox&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25.04-0.6.1.0-2&lt;/span&gt;
    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;

&lt;span class="c1"&gt;# SriovNetworkNodePolicy&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriovnetwork.openshift.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SriovNetworkNodePolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;policy-1&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openshift-sriov-network-operator&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;deviceType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;netdevice&lt;/span&gt;
  &lt;span class="na"&gt;mtu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1500&lt;/span&gt;
  &lt;span class="na"&gt;nicSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;vendor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15b3"&lt;/span&gt;
    &lt;span class="na"&gt;pfNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ens2f0"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;feature.node.kubernetes.io/pci-15b3.present&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;numVfs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
  &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;90&lt;/span&gt;
  &lt;span class="na"&gt;isRdma&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;resourceName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriovlegacy&lt;/span&gt;

&lt;span class="c1"&gt;# SriovNetwork&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriovnetwork.openshift.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SriovNetwork&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriov-network&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openshift-sriov-network-operator&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;vlan&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
  &lt;span class="na"&gt;networkNamespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default"&lt;/span&gt;
  &lt;span class="na"&gt;resourceName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sriovlegacy"&lt;/span&gt;
  &lt;span class="na"&gt;ipam&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;{&lt;/span&gt;
      &lt;span class="s"&gt;"datastore": "kubernetes",&lt;/span&gt;
      &lt;span class="s"&gt;"kubernetes": {&lt;/span&gt;
        &lt;span class="s"&gt;"kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"&lt;/span&gt;
      &lt;span class="s"&gt;},&lt;/span&gt;
      &lt;span class="s"&gt;"log_file": "/tmp/whereabouts.log",&lt;/span&gt;
      &lt;span class="s"&gt;"log_level": "debug",&lt;/span&gt;
      &lt;span class="s"&gt;"type": "whereabouts",&lt;/span&gt;
      &lt;span class="s"&gt;"range": "192.168.101.0/24"&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Example Pod&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;testpod1&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;k8s.v1.cni.cncf.io/networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriov-network&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;appcntr1&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;image&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;imagePullPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IfNotPresent&lt;/span&gt;
    &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IPC_LOCK"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sleep inf&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;openshift.io/sriovlegacy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;
      &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;openshift.io/sriovlegacy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 3: Network Operator with RDMA Shared Device
&lt;/h3&gt;

&lt;p&gt;This example demonstrates deploying the Network Operator with the RDMA shared device plugin and a Macvlan network:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NicClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nic-cluster-policy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ofedDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;doca-driver&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/mellanox&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25.04-0.6.1.0-2&lt;/span&gt;
    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;rdmaSharedDevicePlugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"configList": [&lt;/span&gt;
          &lt;span class="s"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"resourceName": "rdmashared",&lt;/span&gt;
            &lt;span class="s"&gt;"rdmaHcaMax": 1000,&lt;/span&gt;
            &lt;span class="s"&gt;"selectors": {&lt;/span&gt;
              &lt;span class="s"&gt;"ifNames": ["enp4s0f0np0"]&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
          &lt;span class="s"&gt;}&lt;/span&gt;
        &lt;span class="s"&gt;]&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k8s-rdma-shared-dev-plugin&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvcr.io/nvidia/cloud-native&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1.5.3&lt;/span&gt;

&lt;span class="c1"&gt;# MacvlanNetwork&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mellanox.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MacvlanNetwork&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rdmashared-net&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;networkNamespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;master&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;enp4s0f0np0&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bridge&lt;/span&gt;
  &lt;span class="na"&gt;mtu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1500&lt;/span&gt;
  &lt;span class="na"&gt;ipam&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{"type":"whereabouts","range":"16.0.2.0/24","gateway":"16.0.2.1"}'&lt;/span&gt;

&lt;span class="c1"&gt;# Example Pod&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-rdma-shared-1&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;k8s.v1.cni.cncf.io/networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rdmashared-net&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myimage&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rdma-shared-1&lt;/span&gt;
    &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;IPC_LOCK&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;rdma/rdmashared&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;rdma/rdmashared&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OnFailure&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Network Design Considerations
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Separate Control and Data Planes&lt;/strong&gt;: Use dedicated control-plane nodes in OpenShift deployments that run the NVIDIA Network Operator, keeping cluster management off the high-performance data path&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dual Network Architecture&lt;/strong&gt;: Use Ethernet for cluster API traffic and InfiniBand for high-performance application traffic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Planning&lt;/strong&gt;: Plan the number of virtual functions (VFs) and resource-pool sizing around expected workload requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NUMA Alignment&lt;/strong&gt;: Ensure NUMA alignment between GPUs, NICs, and CPU cores for optimal performance&lt;/li&gt;
&lt;/ol&gt;
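
&lt;p&gt;The NUMA-alignment guidance above can be enforced at the kubelet level. The following sketch is illustrative rather than a verified configuration: it assumes the default worker MachineConfigPool label, and sets the CPU and Topology Manager policies so that exclusive CPUs and devices for Guaranteed pods are placed on the same NUMA node:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Illustrative KubeletConfig: request single-NUMA-node placement
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: numa-aligned-workers
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""  # assumes default worker pool label
  kubeletConfig:
    cpuManagerPolicy: static                 # exclusive CPUs for Guaranteed pods
    topologyManagerPolicy: single-numa-node  # keep CPUs, GPUs, and NIC VFs on one NUMA node
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;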

&lt;h3&gt;
  
  
  Performance Optimization
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;RDMA Configuration Selection&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use RDMA Shared Device for development or when maximum performance is not critical&lt;/li&gt;
&lt;li&gt;Use SR-IOV Legacy for production workloads requiring high performance with multiple pods&lt;/li&gt;
&lt;li&gt;Use Host Device when maximum performance is required for a single pod&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MTU Configuration&lt;/strong&gt;: Configure jumbo frames (MTU 9000) for improved throughput when supported by the network infrastructure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Huge Pages Configuration&lt;/strong&gt;: For DPDK workloads, configure huge pages by following the &lt;a href="https://docs.openshift.com/container-platform/latest/scalability_and_performance/what-huge-pages-do-and-how-they-are-consumed-by-apps.html" rel="noopener noreferrer"&gt;OpenShift documentation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
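
&lt;p&gt;As a concrete sketch of the MTU and huge-pages recommendations (values are illustrative and must match your fabric and node configuration), an SR-IOV pool can be switched to jumbo frames and a DPDK pod can request pre-allocated huge pages:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# SriovNetworkNodePolicy fragment: jumbo frames (other fields as in Example 2)
spec:
  deviceType: netdevice
  mtu: 9000            # requires jumbo-frame support end to end on the fabric

# Pod container fragment: huge pages for a DPDK workload
resources:
  requests:
    hugepages-2Mi: 1Gi # nodes must have 2Mi huge pages pre-allocated
    memory: 1Gi
    cpu: "4"
  limits:
    hugepages-2Mi: 1Gi
    memory: 1Gi
    cpu: "4"
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;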

&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Verify Node Feature Discovery&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc describe node | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'Roles|pci'&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"control-plane"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ensure that nodes with NVIDIA hardware carry the label &lt;code&gt;feature.node.kubernetes.io/pci-15b3.present=true&lt;/code&gt;.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Check Network Operator Status&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; nvidia-network-operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Verify OFED Driver Installation&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; nvidia-network-operator | &lt;span class="nb"&gt;grep &lt;/span&gt;ofed
   oc logs &amp;lt;ofed-driver-pod-name&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; nvidia-network-operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Check SR-IOV Resources&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc get sriovnetworknodestates &lt;span class="nt"&gt;-n&lt;/span&gt; openshift-sriov-network-operator &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;
&lt;strong&gt;Verify Available Resources&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   oc get node &amp;lt;node-name&amp;gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.status.allocatable'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://docs.nvidia.com/networking/display/kubernetes2540/getting-started-openshift.html" rel="noopener noreferrer"&gt;NVIDIA Network Operator Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openshift.com/container-platform/latest/welcome/index.html" rel="noopener noreferrer"&gt;Red Hat OpenShift Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nvidia.com/doca/" rel="noopener noreferrer"&gt;NVIDIA DOCA Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>nvidia</category>
      <category>openshift</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Prompt Engineering for Generative AI: Comprehensive Guide</title>
      <dc:creator>MohamedELGAMASY</dc:creator>
      <pubDate>Sat, 21 Jun 2025 19:55:08 +0000</pubDate>
      <link>https://dev.to/gamasy/prompt-engineering-for-generative-ai-comprehensive-guide-19ph</link>
      <guid>https://dev.to/gamasy/prompt-engineering-for-generative-ai-comprehensive-guide-19ph</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91y6sszlmfabhmj8club.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91y6sszlmfabhmj8club.png" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Prompt Engineering for Generative AI: Comprehensive Guide
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Introduction to Prompt Engineering&lt;/li&gt;
&lt;li&gt;
Fundamentals of Prompt Engineering

&lt;ul&gt;
&lt;li&gt;What Makes an Effective Prompt&lt;/li&gt;
&lt;li&gt;The Prompt Engineering Workflow&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Best Practices for Prompt Engineering&lt;/li&gt;

&lt;li&gt;Advanced Prompt Engineering Techniques&lt;/li&gt;

&lt;li&gt;Implementing Prompt Engineering in Enterprise Contexts&lt;/li&gt;

&lt;li&gt;Industry-Specific Applications&lt;/li&gt;

&lt;li&gt;Ethical Considerations in Prompt Engineering&lt;/li&gt;

&lt;li&gt;Future Trends in Prompt Engineering&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;li&gt;References&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction to Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;Prompt engineering is the art and science of designing and refining inputs (prompts) to elicit the desired output from generative AI models. As organizations increasingly adopt AI technologies, the ability to effectively communicate with these models has become a critical skill for maximizing their value and utility. This guide provides a comprehensive framework for understanding and implementing prompt engineering techniques in enterprise contexts.&lt;/p&gt;

&lt;p&gt;Generative AI models, particularly large language models (LLMs), are trained on vast amounts of text data to learn patterns and relationships between units of language (tokens). When given a prompt, these models predict what is likely to come next, functioning as sophisticated autocompletion tools. The quality of the input significantly influences the relevance, accuracy, and usefulness of the AI's response, making prompt engineering a pivotal discipline in the AI implementation lifecycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fundamentals of Prompt Engineering
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Makes an Effective Prompt
&lt;/h3&gt;

&lt;p&gt;An effective prompt combines two essential aspects: content and structure. The content provides all relevant information associated with the task, while the structure helps the model parse this information efficiently.&lt;/p&gt;

&lt;h4&gt;
  
  
  Content Components
&lt;/h4&gt;

&lt;p&gt;The content of a prompt should include all necessary information for the model to understand and complete the requested task. Key components include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Objective&lt;/strong&gt;: A clear statement of what you want the model to achieve&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instructions&lt;/strong&gt;: Step-by-step guidance on how to perform the task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt;: Background information or reference materials needed to inform the response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constraints&lt;/strong&gt;: Boundaries and limitations that the model must adhere to&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Examples&lt;/strong&gt;: Sample inputs and outputs that demonstrate the expected format and quality&lt;/li&gt;
&lt;/ul&gt;
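&lt;p&gt;The five components above can be assembled mechanically into a prompt. A minimal Python sketch; the template layout and section labels are assumptions for illustration, not a prescribed format:&lt;/p&gt;

```python
# Assembles a prompt from the five content components listed above.
# The section layout is an assumption, not a standard.
def build_prompt(objective, instructions, context, constraints, examples):
    """Join the five content components into a single prompt string."""
    sections = [
        f"Objective: {objective}",
        "Instructions:\n" + "\n".join(f"{i}. {s}" for i, s in enumerate(instructions, 1)),
        f'Context: """\n{context}\n"""',
        "Constraints: " + "; ".join(constraints),
        "Examples:\n" + "\n".join(examples),
    ]
    return "\n\n".join(sections)

prompt = build_prompt(
    objective="Summarize the quarterly report for executives.",
    instructions=["Read the context.", "Extract the three key findings."],
    context="{report text here}",
    constraints=["Under 150 words", "No financial advice"],
    examples=["Input: ... / Output: ..."],
)
print(prompt)
```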

&lt;h4&gt;
  
  
  Structural Elements
&lt;/h4&gt;

&lt;p&gt;The structure of a prompt helps the model interpret the provided information correctly. Important structural elements include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ordering&lt;/strong&gt;: The sequence in which information is presented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delimiters&lt;/strong&gt;: Special characters or formatting that separate different components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Labeling&lt;/strong&gt;: Clear identification of different sections within the prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Formatting&lt;/strong&gt;: Consistent use of spacing, paragraphs, and other visual elements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Prompt Engineering Workflow
&lt;/h3&gt;

&lt;p&gt;Prompt engineering is a test-driven and iterative process that can significantly enhance model performance. The workflow typically involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define Objectives&lt;/strong&gt;: Clearly articulate what you want the model to accomplish&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Draft Initial Prompt&lt;/strong&gt;: Create a first version based on best practices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test and Evaluate&lt;/strong&gt;: Assess the model's responses against your objectives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refine and Iterate&lt;/strong&gt;: Make targeted improvements based on test results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document and Share&lt;/strong&gt;: Record successful prompts and techniques for reuse&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This systematic approach ensures continuous improvement and knowledge sharing across the organization, leading to more effective AI implementations over time.&lt;/p&gt;
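&lt;p&gt;The five steps above can be sketched as a loop. Everything here is a placeholder sketch: &lt;code&gt;call_model&lt;/code&gt; stands in for a real model call, and the keyword-based &lt;code&gt;evaluate&lt;/code&gt; stands in for a real evaluation metric:&lt;/p&gt;

```python
# Sketch of the define/draft/test/refine/document workflow.
# call_model and evaluate are placeholders you would replace.
def call_model(prompt):
    # Placeholder for a real model API call.
    return "model response to: " + prompt

def evaluate(response, criteria):
    """Score a response 0..1 by simple keyword coverage (step 3)."""
    hits = sum(1 for kw in criteria if kw in response)
    return hits / len(criteria)

criteria = ["incident", "root cause"]              # step 1: define objectives
prompt = "Summarize the incident report."          # step 2: draft initial prompt
history = []                                       # step 5: document attempts

for attempt in range(3):                           # step 4: refine and iterate
    response = call_model(prompt)
    score = evaluate(response, criteria)
    history.append((prompt, score))
    if score == 1.0:
        break
    prompt += " Include the root cause of the incident."

print(history)
```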

&lt;h2&gt;
  
  
  Best Practices for Prompt Engineering
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use the Latest Model
&lt;/h3&gt;

&lt;p&gt;For optimal results, use the most recent and capable models available. Newer models generally offer improved performance and are often easier to prompt engineer due to their enhanced capabilities and more sophisticated understanding of instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structure Instructions Effectively
&lt;/h3&gt;

&lt;p&gt;Place instructions at the beginning of the prompt for maximum impact. Use clear delimiters such as triple quotes (&lt;code&gt;"""&lt;/code&gt;) or triple hashes (&lt;code&gt;###&lt;/code&gt;) to separate instructions from context or examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summarize the text below as a bullet point list of the most important points.

Text: """
{text input here}
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Be Specific and Detailed
&lt;/h3&gt;

&lt;p&gt;Provide specific, descriptive, and detailed instructions about the desired context, outcome, length, format, and style. Vague prompts lead to unpredictable results, while detailed prompts guide the model toward more precise outputs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write a short inspiring poem about OpenAI, focusing on the recent DALL-E product launch (DALL-E is a text to image ML model) in the style of Robert Frost.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Articulate Desired Output Format
&lt;/h3&gt;

&lt;p&gt;Show the model exactly what format you expect by providing examples or explicit formatting instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract the important entities mentioned in the text below. First extract all company names, then extract all people names, then extract specific topics which fit the content and finally extract general overarching themes.

Desired format:
Company names: &amp;lt;comma_separated_list_of_company_names&amp;gt;
People names: &amp;lt;comma_separated_list_of_people_names&amp;gt;
Specific topics: &amp;lt;comma_separated_list_of_topics&amp;gt;
General themes: &amp;lt;comma_separated_list_of_themes&amp;gt;

Text: {text}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use Progressive Prompting Techniques
&lt;/h3&gt;

&lt;p&gt;Start with simpler approaches and progressively increase complexity as needed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Zero-shot&lt;/strong&gt;: Provide instructions without examples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Few-shot&lt;/strong&gt;: Include a few examples of desired inputs and outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning&lt;/strong&gt;: For specialized applications, consider fine-tuning the model&lt;/li&gt;
&lt;/ol&gt;
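&lt;p&gt;The difference between the first two levels is easiest to see side by side. A sketch that builds a zero-shot and a few-shot variant of the same classification task, with made-up reviews:&lt;/p&gt;

```python
# Build zero-shot and few-shot variants of the same task.
# The task and reviews are made up for illustration.
task = "Classify the sentiment of the review as Positive or Negative."

# Zero-shot: instructions only, no examples
zero_shot = f"{task}\n\nReview: \"Great battery life.\"\nSentiment:"

# Few-shot: the same instructions plus labeled examples
examples = [
    ("The screen cracked in a week.", "Negative"),
    ("Fast shipping and works as described.", "Positive"),
]
few_shot = task + "\n\n" + "\n\n".join(
    f'Review: "{text}"\nSentiment: {label}' for text, label in examples
) + '\n\nReview: "Great battery life."\nSentiment:'

print(zero_shot)
print(few_shot)
```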

&lt;h3&gt;
  
  
  Reduce Ambiguity
&lt;/h3&gt;

&lt;p&gt;Replace vague or imprecise language with specific, quantifiable instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Instead of: "The description for this product should be fairly short, a few sentences only, and not too much more."

Use: "Write a 3 to 5 sentence paragraph to describe this product."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Provide Positive Direction
&lt;/h3&gt;

&lt;p&gt;Instead of focusing on what not to do, clearly articulate what the model should do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Instead of: "DO NOT ASK USERNAME OR PASSWORD. DO NOT REPEAT."

Use: "The agent will attempt to diagnose the problem and suggest a solution, whilst refraining from asking any questions related to PII. Instead of asking for PII, such as username or password, refer the user to the help article www.samplewebsite.com/help/faq"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use Leading Words for Code Generation
&lt;/h3&gt;

&lt;p&gt;When generating code, provide "leading words" to nudge the model toward the desired programming language or pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Write a simple python function that
# 1. Asks the user for a number in miles
# 2. Converts miles to kilometers

import
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Experiment with Model Parameters
&lt;/h3&gt;

&lt;p&gt;Adjust model parameters to fine-tune outputs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Temperature&lt;/strong&gt;: Controls randomness (0 for near-deterministic responses, higher values for more varied, creative outputs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Max tokens&lt;/strong&gt;: Limits the length of the response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stop sequences&lt;/strong&gt;: Define strings at which the model stops generating text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top-p/Top-k&lt;/strong&gt;: Restrict sampling to the most probable tokens, controlling the diversity of generated text&lt;/li&gt;
&lt;/ul&gt;
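&lt;p&gt;These parameters typically travel alongside the prompt in the API request body. A sketch of an OpenAI-style chat completions payload; the model id is an assumption, and other providers expose similar (not identical) field names:&lt;/p&gt;

```python
import json

# Parameter names follow the OpenAI-style chat completions API.
# The model id is an assumption for illustration.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Write a haiku about supply chains."}],
    "temperature": 0.2,   # low randomness for reproducible output
    "max_tokens": 60,     # cap the response length
    "stop": ["\n\n"],     # stop generating at the first blank line
    "top_p": 0.9,         # nucleus sampling: top 90% of probability mass
}
print(json.dumps(payload, indent=2))
```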

&lt;h2&gt;
  
  
  Advanced Prompt Engineering Techniques
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Role Assignment
&lt;/h3&gt;

&lt;p&gt;Assign a specific role to the model to influence its perspective and expertise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are an experienced data scientist specializing in time series analysis. Explain the ARIMA model to a business analyst who has basic statistical knowledge but no experience with time series forecasting.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Chain-of-Thought Prompting
&lt;/h3&gt;

&lt;p&gt;Instruct the model to explain its reasoning step by step, which often leads to more accurate results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Solve the following math problem, explaining your reasoning at each step:

A store sells notebooks for $4 each and pens for $1.50 each. If a customer buys 3 notebooks and twice as many pens, how much will they spend in total?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Few-Shot Learning
&lt;/h3&gt;

&lt;p&gt;Provide examples of desired inputs and outputs to guide the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Classify the sentiment of the following customer reviews as positive, negative, or neutral:

Review: "The product arrived on time and works perfectly."
Sentiment: Positive

Review: "I've had better experiences with similar products."
Sentiment: Neutral

Review: "This is the worst purchase I've ever made. Completely disappointed."
Sentiment: Negative

Review: "The interface is intuitive, but it crashes occasionally."
Sentiment:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Task Decomposition
&lt;/h3&gt;

&lt;p&gt;Break complex tasks into smaller, manageable subtasks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I need to analyze customer feedback for our new product. Please help me with this process by:

1. First, categorizing the feedback into themes (e.g., usability, performance, features)
2. Then, identifying the most common issues within each theme
3. Finally, suggesting potential improvements based on the feedback

Customer feedback: {text}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Contextual Priming
&lt;/h3&gt;

&lt;p&gt;Provide relevant context before asking the model to perform a task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Context: Our company is a B2B SaaS provider specializing in supply chain management solutions. Our target audience is procurement managers and supply chain directors at mid to large enterprises.

Task: Draft three potential email subject lines for our upcoming webinar on "Resilient Supply Chains in Uncertain Times."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Implementing Prompt Engineering in Enterprise Contexts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Creating a Prompt Library
&lt;/h3&gt;

&lt;p&gt;Develop and maintain a centralized repository of effective prompts for common tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Categorize prompts&lt;/strong&gt; by function, department, or use case&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document performance metrics&lt;/strong&gt; for each prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version control&lt;/strong&gt; prompts as they evolve&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include metadata&lt;/strong&gt; such as intended model, parameters, and use cases&lt;/li&gt;
&lt;/ol&gt;
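&lt;p&gt;A prompt library can start as a small in-memory catalog before graduating to a real datastore. A minimal sketch covering the four points above; the field names are illustrative, not a standard schema:&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Illustrative catalog entry: category, metadata, version, and metrics
# mirror the four practices listed above.
@dataclass
class PromptEntry:
    name: str
    category: str          # function, department, or use case
    template: str
    version: str = "1.0"   # bump when the prompt changes
    model: str = ""        # intended model
    parameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)  # recorded performance

library = {}

def register(entry):
    """Key entries by name@version so older versions stay retrievable."""
    library[f"{entry.name}@{entry.version}"] = entry

register(PromptEntry(
    name="ticket-summary",
    category="support",
    template="Summarize the ticket below in two sentences.\n\nTicket: {ticket}",
    model="any chat model",
    parameters={"temperature": 0.0},
    metrics={"reviewer_rating": 4.6},
))
print(sorted(library))
```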

&lt;h3&gt;
  
  
  Establishing Prompt Engineering Guidelines
&lt;/h3&gt;

&lt;p&gt;Create organizational standards for prompt design:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Style guide&lt;/strong&gt; for consistent formatting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality criteria&lt;/strong&gt; for evaluating prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review process&lt;/strong&gt; for new or modified prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training materials&lt;/strong&gt; for prompt engineering skills development&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Integrating with Existing Workflows
&lt;/h3&gt;

&lt;p&gt;Incorporate prompt engineering into established business processes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identify integration points&lt;/strong&gt; where AI can add value&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design workflow-specific prompts&lt;/strong&gt; tailored to each use case&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create feedback loops&lt;/strong&gt; to continuously improve prompt performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Develop handoff protocols&lt;/strong&gt; between AI and human workers&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Measuring and Optimizing Prompt Performance
&lt;/h3&gt;

&lt;p&gt;Establish metrics and processes for ongoing improvement:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define success metrics&lt;/strong&gt; specific to each use case&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement A/B testing&lt;/strong&gt; for prompt variations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collect user feedback&lt;/strong&gt; on AI-generated outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyze error patterns&lt;/strong&gt; to identify improvement opportunities&lt;/li&gt;
&lt;/ol&gt;
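&lt;p&gt;A/B testing of prompt variations reduces, at its simplest, to comparing success rates per variant. A sketch with made-up accept/reject outcomes standing in for logged user feedback:&lt;/p&gt;

```python
# Logged outcomes per prompt variant: 1 = output accepted, 0 = rejected.
# The numbers are made up for illustration.
outcomes = {
    "A (terse instructions)":    [1, 0, 1, 1, 0, 1, 0, 1],
    "B (detailed instructions)": [1, 1, 1, 0, 1, 1, 1, 1],
}

rates = {variant: sum(xs) / len(xs) for variant, xs in outcomes.items()}
for variant, rate in sorted(rates.items()):
    print(f"{variant}: {rate:.2f}")

winner = max(rates, key=rates.get)
print("winner:", winner)
```

In practice you would also check the sample size is large enough for the difference to be statistically meaningful before switching variants.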

&lt;h2&gt;
  
  
  Industry-Specific Applications
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Healthcare
&lt;/h3&gt;

&lt;p&gt;Prompt engineering in healthcare requires particular attention to accuracy, privacy, and ethical considerations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clinical documentation&lt;/strong&gt;: Summarizing patient encounters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medical research&lt;/strong&gt;: Literature review and hypothesis generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Patient education&lt;/strong&gt;: Creating accessible explanations of medical concepts&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Financial Services
&lt;/h3&gt;

&lt;p&gt;Financial applications demand precision, compliance awareness, and risk sensitivity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Investment analysis&lt;/strong&gt;: Summarizing market trends and company performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulatory compliance&lt;/strong&gt;: Checking documents against regulatory requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer service&lt;/strong&gt;: Generating responses to common financial queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Manufacturing and Supply Chain
&lt;/h3&gt;

&lt;p&gt;These sectors benefit from prompts that focus on operational efficiency and technical accuracy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process optimization&lt;/strong&gt;: Analyzing production data for improvement opportunities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality control&lt;/strong&gt;: Generating inspection checklists and procedures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inventory management&lt;/strong&gt;: Forecasting demand and suggesting reorder points&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Marketing and Customer Experience
&lt;/h3&gt;

&lt;p&gt;Creative applications require prompts that balance brand voice, creativity, and strategic alignment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content creation&lt;/strong&gt;: Generating marketing copy and social media posts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer segmentation&lt;/strong&gt;: Analyzing customer data for targeting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Campaign analysis&lt;/strong&gt;: Evaluating performance metrics and suggesting optimizations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Ethical Considerations in Prompt Engineering
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mitigating Bias
&lt;/h3&gt;

&lt;p&gt;Strategies to reduce bias in AI-generated content:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit prompts&lt;/strong&gt; for potentially biased language or assumptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test with diverse inputs&lt;/strong&gt; to identify disparate outcomes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include explicit instructions&lt;/strong&gt; for fairness and inclusivity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement human review&lt;/strong&gt; for sensitive applications&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Ensuring Transparency
&lt;/h3&gt;

&lt;p&gt;Approaches to maintain transparency in AI applications:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clearly identify AI-generated content&lt;/strong&gt; to users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document prompt design decisions&lt;/strong&gt; and their rationale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide explanations&lt;/strong&gt; of how outputs were generated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain audit trails&lt;/strong&gt; of prompt versions and their effects&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Protecting Privacy and Security
&lt;/h3&gt;

&lt;p&gt;Safeguards for sensitive information:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Avoid including PII in prompts&lt;/strong&gt; unless absolutely necessary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement data minimization principles&lt;/strong&gt; in prompt design&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use secure channels&lt;/strong&gt; for transmitting prompts and responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Establish clear data retention policies&lt;/strong&gt; for prompts and outputs&lt;/li&gt;
&lt;/ol&gt;
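&lt;p&gt;The first point can be partly automated with a redaction pass before a prompt leaves your system. A deliberately minimal sketch: two regexes that catch only the simplest email and phone formats, not a substitute for a dedicated PII scrubbing tool:&lt;/p&gt;

```python
import re

# Minimal pre-send redaction pass. These patterns catch only simple
# email addresses and US-style phone numbers; real PII scrubbing needs
# a dedicated tool.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text):
    """Replace each matched pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text

prompt = "Summarize this ticket from jane.doe@example.com, callback 555-867-5309."
print(redact(prompt))
```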

&lt;h2&gt;
  
  
  Future Trends in Prompt Engineering
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Automated Prompt Optimization
&lt;/h3&gt;

&lt;p&gt;Emerging techniques for algorithmic improvement of prompts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Genetic algorithms&lt;/strong&gt; that evolve prompts based on performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reinforcement learning&lt;/strong&gt; from human feedback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meta-prompting&lt;/strong&gt; where AI helps design prompts for other AI systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt testing frameworks&lt;/strong&gt; that automatically evaluate effectiveness&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Multimodal Prompting
&lt;/h3&gt;

&lt;p&gt;The evolution toward prompts that combine multiple types of inputs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Text-image combinations&lt;/strong&gt; for visual tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio-text integration&lt;/strong&gt; for speech-related applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured data with natural language&lt;/strong&gt; for analytics applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive prompting&lt;/strong&gt; with real-time feedback loops&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Domain-Specific Prompt Engineering
&lt;/h3&gt;

&lt;p&gt;The specialization of prompt techniques for particular fields:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Legal prompt engineering&lt;/strong&gt; for contract analysis and legal research&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scientific prompt engineering&lt;/strong&gt; for research and experimentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Educational prompt engineering&lt;/strong&gt; for personalized learning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative prompt engineering&lt;/strong&gt; for art, music, and literature&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Prompt engineering represents a critical capability for organizations seeking to maximize the value of their generative AI investments. By applying the principles, techniques, and best practices outlined in this guide, practitioners can significantly improve the quality, reliability, and usefulness of AI-generated outputs.&lt;/p&gt;

&lt;p&gt;As generative AI continues to evolve, prompt engineering will remain a dynamic field requiring ongoing learning and adaptation. Organizations that develop robust prompt engineering capabilities will be better positioned to leverage these powerful technologies for competitive advantage and innovation.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI. "Best practices for prompt engineering with the OpenAI API." OpenAI Help Center.&lt;/li&gt;
&lt;li&gt;Google Cloud. "Overview of prompting strategies." Generative AI on Vertex AI.&lt;/li&gt;
&lt;li&gt;DigitalOcean. "Prompt Engineering Best Practices: Tips, Tricks, and Tools."&lt;/li&gt;
&lt;li&gt;Atlassian. "Best practices for generating AI prompts." Work Life by Atlassian.&lt;/li&gt;
&lt;li&gt;Microsoft. "Understanding Prompt Engineering Fundamentals." Generative AI for Beginners.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>promptengineering</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
