"Infrastructure in the Era of Big Models" Storage Design and Implementation of GPU Clusters[Copy link]
Distributed block storage
Business requirements for block storage
Local disk: the host machine's physical hard disk is assigned directly to the virtual machine
Creating a new virtual machine requires copying an entire system-disk image to the local disk, so creation takes a long time
Live migration requires a full copy of the system disk, so migration performance is poor
Snapshots cannot be created for a system disk that lives on a local hard disk
Use fast block storage on the far end of the network as the virtual machine's system disk
Operating system boot: the block storage device must be discoverable when the system boots
Normal mount: the block storage device is recognized, and the file system's read/write commands are sent to it over the network card
Data redundancy and backup: snapshots (see the RBD sketch below)
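A minimal sketch of how a network block device might be created and snapshotted, using the Ceph RBD Python bindings (python3-rados / python3-rbd) as a representative distributed block store; the config path, pool name, and image name are assumptions, not details from the original text.

```python
# Sketch: create a network-attached block device and snapshot it with Ceph RBD.
# The conffile path and the pool name "vm-disks" are assumptions.
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("vm-disks")                   # pool holding VM system disks
    rbd_inst = rbd.RBD()
    rbd_inst.create(ioctx, "vm-0001-sysdisk", 20 * 1024**3)  # 20 GiB image

    with rbd.Image(ioctx, "vm-0001-sysdisk") as image:
        image.write(b"bootloader-bytes", 0)                  # block-level write over the network
        image.create_snap("after-install")                   # point-in-time snapshot for backup
    ioctx.close()
finally:
    cluster.shutdown()
```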
Centralized block storage
FC-SAN:
HBA Card
SAN Switch
Centralized storage devices built around FC-SAN storage controllers
The problems are performance and the scalability of the number of attached hosts
Distributed block storage
Industry-standard servers with large-capacity disks form a cluster, and stored data is protected by redundancy through multiple replicas or erasure coding (EC); a minimal parity sketch follows
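To make the redundancy idea concrete, here is a toy sketch of single-parity erasure coding: k data blocks plus one XOR parity block, so any one lost block can be rebuilt. Production systems such as Ceph EC pools use Reed-Solomon codes that tolerate multiple simultaneous failures; this only illustrates the principle.

```python
# Toy erasure coding: k data blocks + 1 XOR parity block per stripe.
def make_parity(blocks: list[bytes]) -> bytes:
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    # XOR of the parity with all surviving data blocks recovers the lost block.
    return make_parity(surviving + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]                 # stripe of k = 3 equal-size blocks
parity = make_parity(data)
recovered = rebuild([data[0], data[2]], parity)    # block 1 was "lost"
assert recovered == data[1]
```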
Object storage
Used to efficiently store and retrieve unstructured data (documents, images, video, audio)
Accessed through open HTTP-based interfaces
Objects are organized into buckets, and each object in a bucket has a globally unique identifier
Key-value tags can be attached to objects for easy retrieval (see the S3-style sketch below)
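A minimal sketch of object-storage access through the S3-style HTTP interface, using boto3; the endpoint URL, credentials, bucket name, and object key are placeholders, and any S3-compatible service exposes this kind of API.

```python
# Sketch: bucket + key form the globally unique identifier; tags aid retrieval.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://object-store.example.com:9000",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="training-data")

s3.put_object(
    Bucket="training-data",
    Key="datasets/images/cat-0001.jpg",
    Body=b"fake-image-bytes",                # object payload sent over HTTP
    Tagging="label=cat&split=train",         # key-value tags for later retrieval
)

obj = s3.get_object(Bucket="training-data", Key="datasets/images/cat-0001.jpg")
print(obj["ContentLength"])
```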
Ceph
A unified storage platform that exposes three interfaces to upper-layer applications: Object, Block, and File
Ceph uses a strong-consistency model: a write is acknowledged only after all replicas have been written, so write efficiency is lower and Ceph suits read-heavy, write-light workloads (see the librados sketch below)
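A minimal sketch of writing and reading an object through Ceph's native librados Python bindings, assuming a reachable cluster and an existing pool named "mypool" (an assumption); the write acknowledgement only arrives after the cluster has persisted the replicas, which is the strong-consistency behavior described above.

```python
# Sketch: object write/read via librados; replication happens server-side
# before write_full returns.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("mypool")
    ioctx.write_full("training-manifest", b'{"epoch": 3, "shards": 128}')
    print(ioctx.read("training-manifest"))
    ioctx.close()
finally:
    cluster.shutdown()
```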
Swift
A component of OpenStack
Only eventual consistency is guaranteed: a write can commit once two replicas are written, which means reads must compare and verify replicas, so read efficiency is relatively low
Data placement is computed with consistent hashing: objects are first mapped to zones to keep replicas fault-isolated, then distributed across buckets by the consistent-hashing algorithm, with a Ring structure organizing the bucket nodes; the resulting distribution is less uniform than Ceph's (see the ring sketch below)
Data must be accessed through proxy nodes rather than directly from the data nodes as with Ceph clients, so data-access efficiency is lower than Ceph's
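A toy sketch of a consistent-hashing ring with zone-aware replica placement, to illustrate the idea behind Swift's Ring rather than its actual ring-builder implementation; node names and zones are invented.

```python
# Sketch: hash nodes (with virtual nodes) onto a ring, then walk the ring to
# pick replicas from distinct zones for fault isolation.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        # nodes: {"node-name": "zone-name"}; virtual nodes smooth the distribution
        self.zone_of = nodes
        self.ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def place(self, obj_key, replicas=3):
        start = bisect.bisect(self.keys, _hash(obj_key)) % len(self.ring)
        chosen, zones = [], set()
        for _, node in self.ring[start:] + self.ring[:start]:
            if node not in chosen and self.zone_of[node] not in zones:
                chosen.append(node)
                zones.add(self.zone_of[node])
                if len(chosen) == replicas:
                    break
        return chosen

ring = Ring({"s1": "zone-a", "s2": "zone-a", "s3": "zone-b", "s4": "zone-c"})
print(ring.place("datasets/images/cat-0001.jpg"))  # three replicas in distinct zones
```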
Commercial object storage
Consistent Hashing Algorithm
Tiered management of hot and cold data
Generally divided into an HTTP service layer, a storage-node layer, and a key-value database layer
A VIP is exposed outside the cloud through a cloud load-balancing service instance, while a VPC gateway (VPCGW) provides service discovery and routing for virtual machines inside the VPC, enabling nearly unlimited horizontal scaling (a hot/cold tiering sketch follows)
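A toy sketch of hot/cold tiering: objects that have not been read for a threshold period are demoted from a hot tier to a cold tier. The tier names, threshold, and in-memory catalogue are assumptions; real services implement this as server-side lifecycle rules over the key-value metadata layer.

```python
# Sketch: demote objects untouched for 30 days from the hot tier to the cold tier.
import time
from dataclasses import dataclass

COLD_AFTER_SECONDS = 30 * 24 * 3600          # assumed demotion threshold

@dataclass
class ObjectMeta:
    key: str
    tier: str                                 # "hot" or "cold"
    last_access: float

def demote_cold_objects(catalogue: list[ObjectMeta], now: float | None = None):
    now = now or time.time()
    for meta in catalogue:
        if meta.tier == "hot" and now - meta.last_access > COLD_AFTER_SECONDS:
            # In a real system this triggers a background copy to the cold
            # storage pool and an update of the key-value metadata layer.
            meta.tier = "cold"
    return catalogue

catalogue = [
    ObjectMeta("ckpt/epoch-001.pt", "hot", time.time() - 90 * 24 * 3600),
    ObjectMeta("ckpt/epoch-120.pt", "hot", time.time() - 3600),
]
for meta in demote_cold_objects(catalogue):
    print(meta.key, meta.tier)
```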
MinIO
A high-performance, open-source distributed object storage project written in Go
Provides deep integration with mainstream containerization technologies such as Kubernetes, etcd, and Docker
Different MinIO clusters can be federated to form a global namespace that spans multiple data centers.
Amazon S3 Compatibility
Erasure coding and checksums protect against hardware failures and silent data corruption; in the highest-redundancy configuration, data remains recoverable even if up to half of the drives are lost (see the client sketch below)
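A minimal client-side sketch using the MinIO Python SDK; since MinIO speaks the S3 API, the boto3 example above would also work against the same endpoint. The endpoint, credentials, bucket, and object names are placeholders, and the erasure coding and checksumming happen server-side, transparently to this code.

```python
# Sketch: bucket creation, upload, and listing with the MinIO Python SDK.
import io
from minio import Minio

client = Minio(
    "minio.example.com:9000",        # placeholder endpoint
    access_key="ACCESS_KEY",
    secret_key="SECRET_KEY",
    secure=False,
)

if not client.bucket_exists("model-checkpoints"):
    client.make_bucket("model-checkpoints")

payload = b"\x00" * 1024             # stand-in for checkpoint bytes
client.put_object("model-checkpoints", "llama/epoch-003.pt",
                  io.BytesIO(payload), length=len(payload))

for obj in client.list_objects("model-checkpoints", prefix="llama/"):
    print(obj.object_name, obj.size)
```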
Distributed file storage
JuiceFS
Adopts an architecture in which data and metadata are stored separately
Supports multiple access interfaces, including a POSIX-compatible file system interface, the Hadoop Java SDK, FUSE, and Kubernetes
Provides a local multi-level cache mechanism to improve data-access speed and throughput
JuiceFS has good compatibility, supporting multiple file-system interfaces such as POSIX, HDFS, and the S3 API (see the POSIX sketch below)
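A minimal sketch of POSIX access on a JuiceFS volume, assuming the file system has already been formatted and mounted (here at the hypothetical path /jfs) with the juicefs client; the application then uses ordinary file I/O while JuiceFS stores chunked data in object storage and metadata in its metadata engine.

```python
# Sketch: plain POSIX file I/O against an assumed JuiceFS mount point.
import os

mount_point = "/jfs"                              # assumed JuiceFS mount point
ckpt_dir = os.path.join(mount_point, "checkpoints")
os.makedirs(ckpt_dir, exist_ok=True)

path = os.path.join(ckpt_dir, "epoch-003.bin")
with open(path, "wb") as f:                       # ordinary POSIX write
    f.write(b"\x00" * 1024)

with open(path, "rb") as f:                       # ordinary POSIX read (may hit local cache)
    print(len(f.read()))
```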
On the plain local-disk approach: there is no support at the underlying software layer; a physical hard disk is simply allocated to the virtual machine, and the surrounding capabilities are difficult to implement