High Performance Computing Storage in OCI using Lustre File System
As cloud workloads evolve, especially in areas like high-performance computing (HPC), machine learning, and big data analytics, traditional storage systems often become a bottleneck. These workloads require high throughput, low latency, and parallel file access.
In Oracle Cloud Infrastructure, high-performance storage requirements can be addressed using the Lustre File System, a distributed file system designed for large-scale workloads.
This article explores how Lustre works and how it can be used in OCI environments.
What is Lustre File System?
Lustre is a parallel distributed file system designed for environments that require high-speed access to large datasets.
It is commonly used in:
- High Performance Computing (HPC)
- Artificial Intelligence and Machine Learning
- Scientific simulations
- Big data processing
Unlike traditional file systems, Lustre distributes data across multiple storage nodes to achieve high performance.
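To make the idea of distributing data across storage nodes concrete, here is a minimal Python sketch of round-robin striping, the general technique behind this kind of layout. It assumes a fixed stripe size and a fixed number of storage targets. The function names are illustrative and are not part of any Lustre tool or API, and real layouts are configurable and can be more elaborate.

```python
from collections import defaultdict


def target_for_offset(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Return the 0-based index of the storage target holding the byte at `offset`."""
    return (offset // stripe_size) % stripe_count


def bytes_per_target(file_size: int, stripe_size: int, stripe_count: int) -> dict:
    """Count how many bytes of a file land on each storage target."""
    totals = defaultdict(int)
    offset = 0
    while offset < file_size:
        # The last chunk may be shorter than a full stripe.
        chunk = min(stripe_size, file_size - offset)
        totals[target_for_offset(offset, stripe_size, stripe_count)] += chunk
        offset += chunk
    return dict(totals)


if __name__ == "__main__":
    MIB = 1024 * 1024
    # A 10 MiB file striped in 1 MiB chunks over 4 targets (assumed values).
    for target, size in sorted(bytes_per_target(10 * MIB, MIB, 4).items()):
        print(f"target {target}: {size // MIB} MiB")
```

Because consecutive chunks sit on different targets, a large sequential read can be served by several servers at once instead of queuing on one.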
Why Use Lustre in OCI?
Cloud-based HPC workloads demand:
- High throughput
- Scalable storage
- Parallel access from multiple compute nodes
Lustre provides:
- Parallel read/write operations
- Horizontal scalability
- High bandwidth performance
This makes it ideal for workloads where multiple compute instances process large datasets simultaneously.
Lustre Architecture Overview
Lustre is built using multiple components working together.
Key Components
- Metadata Server (MDS) → Stores file metadata
- Object Storage Servers (OSS) → Store actual data
- Clients → Compute instances accessing the file system
Architecture Flow
Compute Nodes (Clients)
│
▼
Metadata Server (MDS)
│
▼
Object Storage Servers (OSS)
│
▼
Distributed Storage
In this architecture:
- Clients request metadata from MDS
- Data is read/written from OSS nodes
- Operations happen in parallel for high performance
How Lustre Works
When a client accesses a file:
- The client sends a metadata request to the MDS
- The MDS returns the file's location information
- The client reads or writes the data directly on the OSS nodes
- Data transfer happens in parallel across those nodes
This parallel architecture significantly improves performance.
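The access flow described above can be sketched as a toy in-memory model in Python. Everything here is hypothetical: the `MetadataServer` and `ObjectStorageServer` classes are stand-ins written for this illustration, not Lustre interfaces, and the model simplifies by storing one object per chunk. It only shows the shape of the flow: one metadata lookup, followed by parallel data reads that bypass the metadata server.

```python
from concurrent.futures import ThreadPoolExecutor


class MetadataServer:
    """Toy stand-in: maps a path to the locations of its chunks."""

    def __init__(self):
        self.layouts = {}  # path -> list of (server_index, object_id)

    def lookup(self, path):
        return self.layouts[path]


class ObjectStorageServer:
    """Toy stand-in: holds chunk data keyed by object id."""

    def __init__(self):
        self.objects = {}

    def read(self, object_id):
        return self.objects[object_id]


def write_file(mds, servers, path, data, stripe_size):
    """Split data into chunks and spread them round-robin over the servers."""
    layout = []
    for n, start in enumerate(range(0, len(data), stripe_size)):
        server_index = n % len(servers)
        object_id = f"{path}#{n}"
        servers[server_index].objects[object_id] = data[start:start + stripe_size]
        layout.append((server_index, object_id))
    mds.layouts[path] = layout


def read_file(mds, servers, path):
    """Ask the metadata server where the chunks are, then fetch them in parallel."""
    layout = mds.lookup(path)  # metadata request and location response
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # Data reads go straight to the storage servers, not through the MDS.
        chunks = pool.map(lambda loc: servers[loc[0]].read(loc[1]), layout)
        return b"".join(chunks)  # map() preserves chunk order


if __name__ == "__main__":
    mds = MetadataServer()
    servers = [ObjectStorageServer() for _ in range(3)]
    write_file(mds, servers, "/data/sample.bin", b"abcdefghij", stripe_size=4)
    print(read_file(mds, servers, "/data/sample.bin"))
```

The metadata server is contacted once per file, so it does not sit in the data path. That is the design choice that lets throughput grow with the number of storage servers.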
Real-World Use Cases
Lustre is widely used in scenarios such as:
1. Machine Learning Training
Training large models requires fast access to massive datasets.
2. Scientific Research
Simulations generate huge amounts of data that must be processed quickly.
3. Media Rendering
Video processing and rendering workflows benefit from high throughput.
Benefits of Lustre in OCI
- High throughput storage
- Scalable architecture
- Parallel data access
- Optimized for HPC workloads
Best Practices
When using Lustre in OCI:
- Use multiple compute nodes for parallel processing
- Design workloads for distributed execution
- Monitor performance and I/O usage
- Use high-performance networking for better throughput
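As one way to design a workload for distributed execution, the sketch below splits a list of input files across compute nodes so that each node reads a disjoint share from the shared file system. The function name, the file names, and the idea of passing a node rank are assumptions made for illustration. In practice the rank would come from your scheduler or launcher.

```python
def shard_for_node(items, node_rank: int, node_count: int) -> list:
    """Return the subset of `items` that the node with this rank should process.

    Sorting first gives every node the same view of the list, so the
    shards are disjoint and together cover every item exactly once.
    """
    if node_count < 1 or not 0 <= node_rank < node_count:
        raise ValueError("node_rank must satisfy 0 <= node_rank < node_count")
    return sorted(items)[node_rank::node_count]


if __name__ == "__main__":
    # Hypothetical input files on a shared mount.
    files = [f"part-{i:03d}.dat" for i in range(10)]
    for rank in range(4):
        print(rank, shard_for_node(files, rank, 4))
```

Interleaved slicing keeps the shards within one item of each other in size, which helps all nodes finish at roughly the same time when the files are similar in size.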
Lustre File System Limits
Lustre limits apply per availability domain:
| Resource | Limit |
| --- | --- |
| Max file systems | 8 per tenancy per availability domain |
| Max capacity per file system | 200 TB |
| Aggregate throughput | 200 Gbps per tenancy per availability domain |
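The limits above can be turned into quick planning arithmetic. The Python sketch below checks a hypothetical deployment plan against the three figures quoted in this section and estimates a lower bound on the time needed to read a dataset. It assumes decimal units (1 TB = 1000 GB) and that Gbps means gigabits per second. The function names are illustrative.

```python
# Limits as quoted in this article, per tenancy per availability domain.
MAX_FILE_SYSTEMS = 8
MAX_CAPACITY_TB_PER_FS = 200
MAX_AGGREGATE_GBPS = 200


def gbps_to_gigabytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def min_seconds_to_read(dataset_tb: float, throughput_gbps: float) -> float:
    """Best-case time to stream a dataset once at the given throughput."""
    dataset_gb = dataset_tb * 1000  # assumption: decimal units
    return dataset_gb / gbps_to_gigabytes_per_second(throughput_gbps)


def check_plan(capacities_tb, total_throughput_gbps: float) -> list:
    """Return a list of human-readable problems; an empty list means the plan fits."""
    problems = []
    if len(capacities_tb) > MAX_FILE_SYSTEMS:
        problems.append(
            f"{len(capacities_tb)} file systems requested, limit is {MAX_FILE_SYSTEMS}"
        )
    for index, capacity in enumerate(capacities_tb):
        if capacity > MAX_CAPACITY_TB_PER_FS:
            problems.append(
                f"file system {index} requests {capacity} TB, "
                f"limit is {MAX_CAPACITY_TB_PER_FS} TB"
            )
    if total_throughput_gbps > MAX_AGGREGATE_GBPS:
        problems.append(
            f"{total_throughput_gbps} Gbps requested, limit is {MAX_AGGREGATE_GBPS} Gbps"
        )
    return problems


if __name__ == "__main__":
    print(gbps_to_gigabytes_per_second(MAX_AGGREGATE_GBPS), "GB/s at the aggregate limit")
    print(min_seconds_to_read(200, 200) / 3600, "hours to read 200 TB at 200 Gbps")
    print(check_plan([100, 250], total_throughput_gbps=150))
```

Under these assumptions, 200 Gbps works out to 25 GB/s, so streaming a full 200 TB file system once takes at least 8000 seconds, a little over two hours.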
Any VM or compute instance that needs to access a Lustre file system must have the Lustre client installed.
On Oracle Linux, the Lustre client works only with the Red Hat Compatible Kernel (RHCK).
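Once a client is set up, a quick sanity check is to confirm that a Lustre file system is actually mounted. The sketch below parses text in the Linux `/proc/mounts` format and returns the mount points whose file system type is `lustre`. The sample line, including the address and the `/mnt/lustre` path, is made up for illustration.

```python
from pathlib import Path

# Hypothetical sample in /proc/mounts format:
# device, mount point, file system type, options, dump, pass
SAMPLE_MOUNTS = """\
/dev/sda3 / xfs rw,relatime 0 0
10.0.0.5@tcp:/lustrefs /mnt/lustre lustre rw,flock 0 0
"""


def lustre_mounts(mounts_text: str) -> list:
    """Return the mount points of all entries whose type field is 'lustre'."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "lustre":
            points.append(fields[1])
    return points


if __name__ == "__main__":
    print(lustre_mounts(SAMPLE_MOUNTS))
    proc_mounts = Path("/proc/mounts")
    if proc_mounts.exists():  # only present on Linux
        print(lustre_mounts(proc_mounts.read_text()))
```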
Syncing Lustre with Object Storage
OCI Lustre can sync data with Object Storage for cost-effective long-term storage:
Import
- Pull objects from Object Storage → Lustre
- Use case: AI training, data processing
Export
- Push files from Lustre → Object Storage
- Use case: Save processed results
OCI Lustre file systems require a Lustre client kernel module on every client.
However:
- Oracle Linux normally runs the Unbreakable Enterprise Kernel (UEK), which is not compatible with the Lustre client
- You must therefore switch to the Red Hat Compatible Kernel (RHCK)
- You must then build the Lustre client from source unless a prebuilt package exists
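Before installing or building the client, it helps to confirm which kernel an instance is actually running. The sketch below relies on the convention that UEK release strings on Oracle Linux contain `uek`. Treat that as a heuristic rather than an official check. The two sample release strings are illustrative, and the function name is hypothetical.

```python
import platform


def is_uek_kernel(release: str) -> bool:
    """Heuristic: UEK release strings on Oracle Linux contain 'uek'."""
    return "uek" in release.lower()


if __name__ == "__main__":
    # Illustrative release strings, not taken from a specific system.
    for sample in ("5.15.0-205.149.5.1.el8uek.x86_64", "4.18.0-513.24.1.el8_9.x86_64"):
        print(sample, "->", "UEK" if is_uek_kernel(sample) else "not UEK")

    running = platform.release()  # same value that `uname -r` reports
    if is_uek_kernel(running):
        print(f"{running}: UEK detected, switch to RHCK before installing the Lustre client")
    else:
        print(f"{running}: no UEK marker found")
```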