"Infrastructure in the Era of Big Models" Storage Design and Implementation of GPU Clusters

ltaodream

Distributed block storage

Business requirements for block storage

Local disk: the host machine's physical hard disk is assigned directly to the virtual machine

Creating a new virtual machine requires copying the full system disk image to the local disk, so creation takes a long time
Live migration requires a full copy of the system disk, so live migration performs poorly
Snapshots cannot be created for a system disk on a local hard disk

Use fast remote storage, reached over the network, as the virtual machine's system disk

Operating system boot: the block storage device must be discoverable at boot time
Normal mount: the block storage device is recognized, and the file system's read/write commands are sent to it through the network card
Redundant data backup: snapshots

Centralized block storage

FC-SAN:

HBA Card
SAN Switch
Centralized storage device based on FC-SAN storage controller

The drawbacks are limited performance and poor scalability as the number of connected clients grows

Distributed block storage

Use industry-standard servers with large-capacity disks to form a cluster, and protect stored data with redundancy through either multiple copies or erasure coding (EC)
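The trade-off between the two redundancy modes above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 3-copy and 4+2 layouts are example parameters, not figures from the book.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data when keeping full copies."""
    return Fraction(copies, 1)


def ec_overhead(data_shards: int, parity_shards: int) -> Fraction:
    """Raw bytes stored per byte of user data with a k+m erasure code."""
    return Fraction(data_shards + parity_shards, data_shards)


def replication_tolerated_failures(copies: int) -> int:
    """Copies that may be lost while one intact copy still remains."""
    return copies - 1


def ec_tolerated_failures(parity_shards: int) -> int:
    """Shards that may be lost while the data is still reconstructible."""
    return parity_shards


# Example parameters (assumptions): 3 full copies versus a 4+2 erasure code.
print("3 copies:", replication_overhead(3), "x raw space,",
      replication_tolerated_failures(3), "losses tolerated")
print("EC 4+2:  ", ec_overhead(4, 2), "x raw space,",
      ec_tolerated_failures(2), "losses tolerated")
```

Under these example parameters both layouts survive two losses, but the erasure-coded layout uses half the raw space, at the cost of extra computation on writes and rebuilds.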

Ceph: the CRUSH algorithm places data pseudo-randomly, which wastes space
Self-developed block storage: Raft algorithm; data is located by a (node, disk, disk offset) triple
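As a rough illustration of addressing by a (node, disk, disk offset) triple, the sketch below maps a byte offset in a virtual volume to the triples of its replicas. Everything here is an assumption made for the example: the `Extent` and `VolumeMap` names, the 4 MiB extent size, and the in-memory table. It does not implement Raft, which in such a design would keep the replicas consistent.

```python
from dataclasses import dataclass

EXTENT_SIZE = 4 * 1024 * 1024  # assumed extent size: 4 MiB


@dataclass(frozen=True)
class Extent:
    node: str    # storage server holding the replica
    disk: int    # disk index on that server
    offset: int  # byte offset on that disk


class VolumeMap:
    """Maps each extent of a virtual volume to its replica locations."""

    def __init__(self) -> None:
        self._extents: dict[int, list[Extent]] = {}

    def assign(self, extent_index: int, replicas: list[Extent]) -> None:
        self._extents[extent_index] = list(replicas)

    def locate(self, byte_offset: int) -> list[Extent]:
        """Return the exact (node, disk, offset) of every replica."""
        index, within = divmod(byte_offset, EXTENT_SIZE)
        return [
            Extent(r.node, r.disk, r.offset + within)
            for r in self._extents[index]
        ]


volume = VolumeMap()
volume.assign(1, [Extent("node-a", 0, 0), Extent("node-b", 2, 8 * EXTENT_SIZE)])
for location in volume.locate(EXTENT_SIZE + 512):
    print(location)
```

An explicit table like this lets the allocator decide exactly where each extent lands, in contrast to computing placement from a hash.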

Distributed Object Storage

Used to efficiently store and retrieve unstructured data (documents, images, video, audio)
Accessed through an open HTTP-based interface
Objects are organized into buckets, and each object in a bucket has a globally unique identifier
Key-value tags can be added to objects for easy retrieval

Ceph

A unified storage platform that exposes three interfaces to upper-layer applications: Object, Block, and File.
Ceph uses a strong consistency algorithm: a write transaction completes only after every copy of the data has been written and acknowledged. Writes are therefore less efficient, which makes Ceph better suited to read-heavy workloads with few writes.
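To see why waiting for every copy slows writes, consider a simplified model in which each replica acknowledges after its own latency and the write commits once enough acknowledgements have arrived. The function name and the latency figures below are assumptions for illustration, not measurements of Ceph.

```python
def commit_latency(replica_latencies_ms: list[float], required_acks: int) -> float:
    """Time until `required_acks` replicas have acknowledged the write.

    Assumes all replicas are written in parallel, so the commit time is the
    latency of the slowest replica among the first `required_acks` to finish.
    """
    if not 1 <= required_acks <= len(replica_latencies_ms):
        raise ValueError("required_acks must be between 1 and the replica count")
    return sorted(replica_latencies_ms)[required_acks - 1]


# Hypothetical latencies: two fast replicas and one slow one.
latencies = [2.0, 5.0, 40.0]

strong = commit_latency(latencies, required_acks=len(latencies))
relaxed = commit_latency(latencies, required_acks=2)
print(f"all copies acknowledged: {strong} ms")
print(f"two copies acknowledged: {relaxed} ms")
```

In this model a write that waits for all copies is always gated by the slowest replica, whereas a policy that commits after fewer copies is not; the price of the latter is that reads can no longer trust any single copy.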

Swift

A component of OpenStack
Only eventual consistency is guaranteed. A write can be committed once two copies have been written, which means a read has to compare and verify the copies, so reads are relatively inefficient.
Consistent hashing is used to compute data placement. To isolate replicas across failure domains, objects are first mapped to logical zones (Zones); consistent hashing then places the objects in Buckets. Bucket nodes are organized in a Ring structure, and data is distributed less evenly than in Ceph.
Data must be accessed through Proxy nodes rather than by clients contacting data nodes directly, so data access is less efficient than in Ceph.
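For readers unfamiliar with consistent hashing, here is a minimal textbook sketch of a hash ring with virtual nodes. It is not Swift's actual Ring implementation: the `HashRing` class, the node names, and the choice of 64 virtual nodes per server are assumptions for the example.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of MD5)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._points: list[tuple[int, str]] = []  # sorted (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node owns many points so load spreads more evenly.
        for i in range(self._vnodes):
            self._points.append((ring_hash(f"{node}#{i}"), node))
        self._points.sort()

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def lookup(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        positions = [position for position, _ in self._points]
        index = bisect.bisect_right(positions, ring_hash(key)) % len(positions)
        return self._points[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("photos/2024/cat.jpg"))
```

The useful property is that removing a node only remaps the keys that node owned; keys on the other nodes stay where they are. How evenly keys spread depends on the number of virtual nodes.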

Commercial object storage

Consistent Hashing Algorithm
Tiered management of hot and cold data
Generally divided into an HTTP service layer, a storage node layer, and a key-value database layer
A VIP is exposed to clients outside the cloud through a cloud load balancing service instance, and service discovery and service routing are provided to virtual machines inside the VPC through VPCGW, which allows unlimited horizontal scaling

MinIO

A high-performance, distributed, open source storage project written in Go
Provides deep integration with mainstream container technologies such as Kubernetes, etcd, and Docker
Different MinIO clusters can be federated to form a global namespace that spans multiple data centers
Compatible with the Amazon S3 API
Erasure coding and checksums protect against hardware failures and silent data corruption. In the highest redundancy configuration, data can still be recovered even if up to half of the disks are lost.
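The way checksums and erasure coding work together can be shown with a deliberately simple stand-in: SHA-256 checksums detect a silently corrupted shard, and a single XOR parity shard rebuilds it. This is only a sketch of the idea. A single parity shard survives one loss, whereas tolerating the loss of half the disks needs a stronger code, and nothing here reflects MinIO's actual implementation.

```python
import hashlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_shards: list[bytes]) -> tuple[list[bytes], list[str]]:
    """Append one XOR parity shard and record a checksum per shard.

    All data shards are assumed to have the same length.
    """
    parity = reduce(xor_bytes, data_shards)
    shards = list(data_shards) + [parity]
    checksums = [hashlib.sha256(s).hexdigest() for s in shards]
    return shards, checksums


def repair(shards: list[bytes], checksums: list[str]) -> list[bytes]:
    """Detect a corrupted shard by checksum and rebuild it from the others."""
    bad = [
        i for i, shard in enumerate(shards)
        if hashlib.sha256(shard).hexdigest() != checksums[i]
    ]
    if len(bad) > 1:
        raise ValueError("single parity can rebuild only one shard")
    repaired = list(shards)
    if bad:
        others = [s for i, s in enumerate(shards) if i != bad[0]]
        # XOR of all remaining shards (data and parity) yields the missing one.
        repaired[bad[0]] = reduce(xor_bytes, others)
    return repaired


shards, checksums = encode([b"AAAA", b"BBBB", b"CCCC"])
shards[1] = b"BxBB"  # simulate silent corruption on one disk
print(repair(shards, checksums)[1])
```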

Distributed file storage

JuiceFS

Adopts an architecture that stores "data" and "metadata" separately.
Supports multiple access interfaces, including a POSIX-compatible file system interface, the Hadoop Java SDK, FUSE, and Kubernetes.
Provides a multi-level local cache to improve data access speed and throughput.
JuiceFS has good compatibility and supports multiple interfaces such as POSIX, HDFS, and the S3 API.
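The separation of data and metadata, together with a local cache, can be illustrated with a toy model: a metadata table maps each file path to a list of chunk keys, the chunks live in a separate object store, and reads are served from a cache when possible. The class name, the chunk size, and the key format are all hypothetical and far simpler than JuiceFS's real layout.

```python
CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


class ToyFileSystem:
    def __init__(self) -> None:
        self.metadata: dict[str, list[str]] = {}  # stand-in for a metadata engine
        self.objects: dict[str, bytes] = {}       # stand-in for object storage
        self.cache: dict[str, bytes] = {}         # stand-in for a local cache
        self.backend_reads = 0

    def write(self, path: str, data: bytes) -> None:
        keys = []
        for n, start in enumerate(range(0, len(data), CHUNK_SIZE)):
            key = f"{path}#{n}"                   # assumed chunk key format
            self.objects[key] = data[start:start + CHUNK_SIZE]
            self.cache.pop(key, None)             # drop any stale cached chunk
            keys.append(key)
        self.metadata[path] = keys

    def read(self, path: str) -> bytes:
        parts = []
        for key in self.metadata[path]:           # metadata lookup first
            if key not in self.cache:             # cache miss: go to the backend
                self.backend_reads += 1
                self.cache[key] = self.objects[key]
            parts.append(self.cache[key])
        return b"".join(parts)


fs = ToyFileSystem()
fs.write("/data/sample.bin", b"0123456789")
fs.read("/data/sample.bin")
fs.read("/data/sample.bin")  # second read is served entirely from the cache
print(fs.backend_reads)
```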

Jacktang

Why can't snapshots be created for a system disk on a local hard disk? How should I understand this?

huwuren1a

Just claiming a spot in this thread to earn points. Thank you so much!

ltaodream

Jacktang posted on 2024-9-26 07:21: The system disk cannot create a snapshot of the local hard disk. How do I understand this?

It means there is no support at the underlying software layer. The hard disk is simply assigned to the virtual machine, so other supporting features are hard to implement.