"Infrastructure in the Era of Big Models": Storage Design and Implementation of GPU Clusters

 

Distributed block storage

Business requirements for block storage

Local disk: the host machine's physical hard disk is assigned to the virtual machine

  • Creating a new virtual machine requires copying the entire system disk image to the local disk, so creation takes a long time
  • Live migration requires a full copy of the system disk, so live migration performance is poor
  • A system disk on the local hard disk does not support snapshots

Remote storage: fast storage on the far side of the network is used as the virtual machine's system disk

  • Operating system boot: the block storage device must be discoverable at boot time
  • Normal mounting: the block storage device is identified, and the file system's read/write commands are sent to it through the network card
  • Data redundancy and backup: snapshots
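One common way networked block storage makes snapshots cheap, and lets a new VM start from a shared image without a full copy, is copy-on-write at the block-map level. The sketch below is a minimal, hypothetical redirect-on-write volume (a common copy-on-write variant); `CowVolume` and its methods are illustrative names, not any product's API. A snapshot copies only the logical-to-physical block map, and later writes go to fresh blocks so the snapshot's data is never overwritten.

```python
class CowVolume:
    """Hypothetical redirect-on-write block volume with cheap snapshots."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.store = {}                        # physical block id -> bytes
        self.next_id = 0
        self.block_map = [None] * num_blocks   # logical block -> physical id (None = zeros)
        self.snapshots = {}                    # snapshot name -> frozen block map

    def write(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        # Redirect every write to a fresh physical block, so blocks referenced
        # by a snapshot are never overwritten in place.
        self.store[self.next_id] = bytes(data)
        self.block_map[lba] = self.next_id
        self.next_id += 1

    def _read(self, block_map, lba):
        block_id = block_map[lba]
        return bytes(self.block_size) if block_id is None else self.store[block_id]

    def read(self, lba):
        return self._read(self.block_map, lba)

    def snapshot(self, name):
        # Copies only the block map (metadata); no data blocks are copied.
        self.snapshots[name] = list(self.block_map)

    def read_snapshot(self, name, lba):
        return self._read(self.snapshots[name], lba)


vol = CowVolume(num_blocks=2)
vol.write(0, b"AAAA")
vol.snapshot("s1")
vol.write(0, b"BBBB")
print(vol.read(0))                 # b'BBBB'
print(vol.read_snapshot("s1", 0))  # b'AAAA'
print(vol.read_snapshot("s1", 1))  # b'\x00\x00\x00\x00'
```

A real system would also garbage-collect physical blocks that no map references any more; that step is omitted. Cloning a VM system disk from an image follows the same idea: the clone starts from a copy of the image's block map rather than a copy of its data.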

Centralized block storage

FC-SAN:

  • HBA card
  • SAN switch
  • Centralized storage device built on an FC-SAN storage controller

Its limitations are performance and how far the number of connected terminals can scale

Distributed block storage

Industry-standard servers with large-capacity disks are formed into a cluster, and stored data is protected redundantly through multiple replicas or erasure coding (EC)

  • Ceph: the CRUSH algorithm places data pseudo-randomly, so disks fill unevenly and some capacity is wasted
  • Self-developed block storage: uses the Raft algorithm and locates data by a (node, disk, disk offset) triple
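To make the placement trade-off concrete, the sketch below compares hash-based pseudo-random placement with an allocator that records an explicit (node, disk, offset) triple per chunk. Assumptions: `hash_place` is a plain SHA-256-modulo stand-in, not the real CRUSH algorithm (which is weighted and hierarchical); `TripleAllocator`, the 4 MiB chunk size, and the cluster shape are hypothetical; and the Raft replication of the metadata table is not shown.

```python
import hashlib
from collections import Counter

NODES, DISKS_PER_NODE = 4, 3     # hypothetical cluster shape
CHUNK_SIZE = 4 * 1024 * 1024     # hypothetical 4 MiB chunk size


def hash_place(chunk_id):
    """Pseudo-random placement; a stand-in for CRUSH, NOT the real algorithm."""
    h = int.from_bytes(hashlib.sha256(chunk_id.encode()).digest()[:8], "big")
    return divmod(h % (NODES * DISKS_PER_NODE), DISKS_PER_NODE)  # (node, disk)


class TripleAllocator:
    """Hypothetical allocator that puts each chunk on the least-used disk and
    records its exact (node, disk, byte offset) triple in a metadata table."""

    def __init__(self):
        self.used = {(n, d): 0 for n in range(NODES) for d in range(DISKS_PER_NODE)}
        self.table = {}  # chunk id -> (node, disk, offset); Raft would replicate this

    def allocate(self, chunk_id):
        node, disk = min(self.used, key=lambda nd: (self.used[nd], nd))
        offset = self.used[(node, disk)]
        self.used[(node, disk)] += CHUNK_SIZE
        self.table[chunk_id] = (node, disk, offset)
        return self.table[chunk_id]


def imbalance(counts):
    """Fullest disk divided by the average disk; 1.0 means perfectly even."""
    values = list(counts.values())
    return max(values) / (sum(values) / len(values))


chunks = [f"vol1/chunk{i}" for i in range(1200)]
hashed = Counter(hash_place(c) for c in chunks)
alloc = TripleAllocator()
for c in chunks:
    alloc.allocate(c)
explicit = Counter(loc[:2] for loc in alloc.table.values())
print(f"hash placement imbalance:     {imbalance(hashed):.3f}")
print(f"explicit placement imbalance: {imbalance(explicit):.3f}")  # 1.000
print(alloc.table["vol1/chunk12"])  # (0, 0, 4194304)
```

Because a cluster is effectively full once its fullest disk is full, the max-to-average ratio from `imbalance` is a rough proxy for wasted capacity. The price of the explicit approach is a metadata table that must itself be kept consistent, which is where a consensus protocol such as Raft comes in.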

Distributed Object Storage

  • Used to store and retrieve unstructured data (documents, images, videos, audio) efficiently
  • Accessed through open, HTTP-based interfaces
  • Objects are grouped into buckets, and each object in a bucket has a globally unique identifier
  • Key-value tags can be attached to objects to make them easy to retrieve
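The bucket/key/tag data model can be sketched as a toy in-memory store. This is a hypothetical illustration with no HTTP layer: in an S3-style service, `put_object` and `get_object` would correspond to HTTP PUT and GET on a `/bucket/key` path, and `find_by_tag` is an assumed convenience for the tag-based retrieval described above, not a standard API call.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    etag: str
    tags: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory model of buckets, keys, and tags (hypothetical API)."""

    def __init__(self):
        self.buckets = {}  # bucket name -> {object key: StoredObject}

    def create_bucket(self, bucket):
        if bucket in self.buckets:
            raise ValueError(f"bucket {bucket!r} already exists")
        self.buckets[bucket] = {}

    def put_object(self, bucket, key, data, tags=None):
        # Content hash kept as an ETag-like integrity check (assumed scheme).
        etag = hashlib.sha256(data).hexdigest()
        self.buckets[bucket][key] = StoredObject(data, etag, dict(tags or {}))
        return etag

    def get_object(self, bucket, key):
        return self.buckets[bucket][key].data

    def find_by_tag(self, bucket, tag_key, tag_value):
        # Linear scan; a real service would index tags in its key-value layer.
        return sorted(key for key, obj in self.buckets[bucket].items()
                      if obj.tags.get(tag_key) == tag_value)

    @staticmethod
    def object_path(bucket, key):
        # The (bucket, key) pair names the object, shown as a path-style URL path.
        return f"/{bucket}/{key}"


store = ObjectStore()
store.create_bucket("media")
store.put_object("media", "img/cat.jpg", b"\xff\xd8 fake jpeg", tags={"type": "image"})
store.put_object("media", "docs/a.txt", b"hello", tags={"type": "doc"})
print(store.find_by_tag("media", "type", "image"))     # ['img/cat.jpg']
print(store.get_object("media", "docs/a.txt"))          # b'hello'
print(ObjectStore.object_path("media", "docs/a.txt"))  # /media/docs/a.txt
```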

Ceph

  • A unified storage platform that exposes three interfaces to upper-layer applications: object, block, and file.
  • Uses a strongly consistent replication scheme: a write transaction is considered complete only after all replicas of the data have been written and acknowledged. This lowers write efficiency, so Ceph is better suited to read-heavy, write-light workloads.
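The write path described for Ceph can be modeled as follows: the write commits only when every replica acknowledges, so its latency is set by the slowest replica, while a read can be served by any single replica. This is a simplified sketch; `Replica`, `strong_write`, `read_any`, the OSD-style names, and the latency figures are illustrative assumptions, and replicas are contacted sequentially here although a real system sends to them in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    latency_ms: float
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value
        return self.latency_ms


def strong_write(replicas, key, value):
    """Commit only when every replica has written and acknowledged.

    Returns (committed, latency_ms). With parallel sends, the client waits
    for the slowest acknowledgement, hence max().
    """
    latencies = []
    for r in replicas:
        try:
            latencies.append(r.write(key, value))
        except ConnectionError:
            # A real system would retry or remap placement; here we just fail.
            return False, None
    return True, max(latencies)


def read_any(replicas, key):
    """After a committed strong write all replicas agree, so one read suffices."""
    for r in replicas:
        if r.up:
            return r.data.get(key)
    raise ConnectionError("no replica reachable")


rs = [Replica("osd.0", 2.0), Replica("osd.1", 3.5), Replica("osd.2", 11.0)]
ok1 = strong_write(rs, "obj", b"v1")
print(ok1)                   # (True, 11.0)
print(read_any(rs, "obj"))   # b'v1'
rs[2].up = False
ok2 = strong_write(rs, "obj", b"v2")
print(ok2)                   # (False, None)
```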

Swift

  • A component of OpenStack
  • Only eventual consistency is guaranteed: a write can be committed once two replicas have been written. As a result, reads must compare and verify the replicas, so read efficiency is relatively low.
  • Data distribution is computed with consistent hashing. To isolate replicas across failure domains, objects are first mapped to logical failure domains (Zones), and the placement of objects within Buckets is then computed with consistent hashing. Bucket nodes are organized in a Ring structure, and the resulting data distribution is less uniform than Ceph's.
  • Data must be accessed through proxy nodes, whereas Ceph clients access data nodes directly, so Swift's data access efficiency is lower than Ceph's.
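The Swift trade-off runs the other way: a proxy commits after a write quorum (2 of 3 here), and a read compares the copies it can reach, keeping the one with the newest timestamp. The sketch follows the description above rather than Swift's actual code; `Node`, `Proxy`, the quorum size, and the last-write-wins rule by timestamp are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)  # key -> (timestamp, value)


class Proxy:
    """Hypothetical proxy: clients talk to it, never to storage nodes directly."""

    def __init__(self, nodes, write_quorum=2):
        self.nodes = nodes
        self.w = write_quorum

    def put(self, key, value, ts=None):
        ts = time.time() if ts is None else ts
        acks = 0
        for n in self.nodes:
            if n.up:
                n.store[key] = (ts, value)
                acks += 1
        # Commit once W replicas succeed; lagging replicas are repaired later.
        return acks >= self.w

    def get(self, key):
        # Compare every reachable copy and return the newest (last-write-wins).
        copies = [n.store[key] for n in self.nodes if n.up and key in n.store]
        if not copies:
            raise KeyError(key)
        return max(copies, key=lambda c: c[0])[1]


nodes = [Node("a"), Node("b"), Node("c")]
proxy = Proxy(nodes)
proxy.put("photo", b"v1", ts=1.0)
nodes[2].up = False
committed = proxy.put("photo", b"v2", ts=2.0)
print(committed)          # True: 2 of 3 replicas acknowledged
nodes[2].up = True        # node c comes back still holding the stale v1
latest = proxy.get("photo")
print(latest)             # b'v2'
```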

Commercial object storage

  • Consistent hashing algorithm
  • Tiered management of hot and cold data
  • Generally divided into an HTTP service layer, a storage node layer, and a key-value database layer
  • A VIP is exposed outside the cloud through a cloud load balancer instance, and a VPC gateway (VPCGW) provides service discovery and routing for virtual machines inside the VPC, allowing effectively unlimited horizontal scaling
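Consistent hashing, used both by Swift and by the commercial design above, can be sketched as a ring of virtual nodes: a key is owned by the first virtual node clockwise from its hash, and replicas go to the next distinct physical nodes. `HashRing`, the SHA-256 hash, and 100 virtual nodes per server are assumptions, not parameters from any particular system.

```python
import bisect
import hashlib


def _hash(key):
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._keys = []   # sorted virtual-node hashes
        self._owner = {}  # virtual-node hash -> physical node
        for n in nodes:
            self.add_node(n)

    def add_node(self, node):
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            self._owner[h] = node
            bisect.insort(self._keys, h)

    def remove_node(self, node):
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            del self._owner[h]
            self._keys.remove(h)

    def lookup(self, key, replicas=1):
        """Walk clockwise from hash(key), collecting distinct physical nodes."""
        if not self._keys:
            raise LookupError("empty ring")
        start = bisect.bisect(self._keys, _hash(key))
        found = []
        for i in range(len(self._keys)):
            node = self._owner[self._keys[(start + i) % len(self._keys)]]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"obj{i}" for i in range(10000)]
before = {k: ring.lookup(k)[0] for k in keys}
ring.add_node("node-d")
after = {k: ring.lookup(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved) / len(keys))          # roughly a quarter of the keys
print(ring.lookup("obj42", replicas=3))
```

Adding `node-d` only moves keys whose nearest clockwise virtual node now belongs to `node-d`; every other key stays where it was, which is what lets such a service scale out without reshuffling existing data. More virtual nodes per server give a more even spread at the cost of a larger ring.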

MinIO

  • A high-performance, open-source distributed storage project written in Go
  • Deeply integrated with mainstream cloud-native technologies such as Kubernetes (k8s), etcd, and Docker
  • Multiple MinIO clusters can be federated into a single global namespace spanning multiple data centers.
  • Amazon S3 compatible
  • Erasure coding and checksums protect against hardware failures and silent data corruption. In the highest-redundancy configuration, data can still be recovered even if half of the disks are lost.
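The erasure-coding idea can be illustrated with the simplest code, a single XOR parity shard, plus a per-shard checksum that turns silent corruption into a detectable erasure. This is a teaching sketch, not MinIO's implementation: MinIO uses Reed-Solomon codes that support many parity shards, SHA-256 here merely stands in for its bit-rot checksum, and `encode`, `decode`, and `ec_profile` are hypothetical names. `ec_profile` shows the arithmetic for a general k+m layout: with m parity shards out of n drives, up to m drives can be lost, so "half the disks" corresponds to m = n/2 at 50% usable capacity.

```python
import hashlib
from functools import reduce


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal data shards plus one XOR parity shard, and
    store a checksum alongside each shard to detect silent corruption."""
    shard_len = -(-len(data) // k)                # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    shards.append(reduce(xor_bytes, shards))      # parity = d0 ^ d1 ^ ... ^ d(k-1)
    return [(s, hashlib.sha256(s).hexdigest()) for s in shards]


def decode(stored, k, length):
    """Rebuild the original bytes, tolerating one lost or corrupted shard."""
    # A shard whose checksum no longer matches is treated exactly like a lost one.
    shards = [s if s is not None and hashlib.sha256(s).hexdigest() == c else None
              for s, c in stored]
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair at most one shard")
    if missing:
        i = missing[0]
        # XOR of all surviving shards (data and parity) reproduces the lost one.
        shards[i] = reduce(xor_bytes, [s for j, s in enumerate(shards) if j != i])
    return b"".join(shards[:k])[:length]


def ec_profile(total_drives, parity_drives):
    """Capacity/durability arithmetic for a general k+m (e.g. Reed-Solomon) layout."""
    data_drives = total_drives - parity_drives
    return {"usable_fraction": data_drives / total_drives,
            "tolerated_losses": parity_drives}


data = b"GPU checkpoint shard"
stored = encode(data, k=4)
shard, checksum = stored[2]
stored[2] = (b"X" + shard[1:], checksum)     # silent bit rot: data changed, checksum not
print(decode(stored, k=4, length=len(data)))  # b'GPU checkpoint shard'
print(ec_profile(16, 8))  # {'usable_fraction': 0.5, 'tolerated_losses': 8}
```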

Distributed file storage

JuiceFS

  • Uses an architecture that stores "data" and "metadata" separately.
  • Supports multiple access interfaces, including a POSIX-compatible file system interface (via FUSE), the Hadoop Java SDK, and Kubernetes.
  • Provides a local multi-level cache to improve data access speed and throughput.
  • JuiceFS is highly compatible, supporting POSIX, HDFS, and S3 API interfaces.
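The data/metadata split and the client-side cache can be sketched together. In this greatly simplified model, file contents are cut into immutable blocks stored under random keys in an object store, while a separate metadata engine maps each path to its size and block list, so an operation like `stat` never touches object storage. All class names and the tiny 4-byte block size are hypothetical; JuiceFS's real chunk/slice/block layout and its multi-level (memory and local disk) cache are more elaborate, and a single in-memory LRU stands in for them here.

```python
import uuid
from collections import OrderedDict

BLOCK_SIZE = 4  # bytes; deliberately tiny for the demo (hypothetical)


class MetadataEngine:
    """Stands in for the metadata database (e.g. a key-value store)."""
    def __init__(self):
        self.inodes = {}  # path -> {"size": int, "blocks": [block ids]}


class ObjectStorage:
    """Stands in for an S3-like object store holding immutable data blocks."""
    def __init__(self):
        self.objects = {}
        self.gets = 0  # counts remote reads so the cache effect is visible

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        self.gets += 1
        return self.objects[key]


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)   # mark as most recently used
            return self.items[key]
        return None

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used block


class ToyFS:
    def __init__(self, cache_blocks=8):
        self.meta = MetadataEngine()
        self.data = ObjectStorage()
        self.cache = LRUCache(cache_blocks)

    def write_file(self, path, content):
        blocks = []
        for off in range(0, len(content), BLOCK_SIZE):
            block_id = uuid.uuid4().hex
            self.data.put(block_id, content[off:off + BLOCK_SIZE])
            blocks.append(block_id)
        # Metadata is updated only after all data blocks are stored.
        self.meta.inodes[path] = {"size": len(content), "blocks": blocks}

    def stat(self, path):
        return self.meta.inodes[path]["size"]  # metadata only, no data transfer

    def read_file(self, path):
        out = []
        for block_id in self.meta.inodes[path]["blocks"]:
            block = self.cache.get(block_id)
            if block is None:                  # cache miss: fetch from object storage
                block = self.data.get(block_id)
                self.cache.put(block_id, block)
            out.append(block)
        return b"".join(out)


fs = ToyFS()
fs.write_file("/data/train.txt", b"hello world!")
print(fs.stat("/data/train.txt"))   # 12
first = fs.read_file("/data/train.txt")
second = fs.read_file("/data/train.txt")
print(second)                       # b'hello world!'
print(fs.data.gets)                 # 3: the second read was served from the cache
```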


How should I understand "The system disk cannot create a snapshot of the local hard disk"?


I'm here to claim a spot in the thread and earn points. Thank you so much!
 
 
 

Jacktang posted on 2024-9-26 07:21: "The system disk cannot create a snapshot, the local hard disk": how should this be understood?

It means there is no support at the underlying software level. The hard disk is simply allocated to the virtual machine as-is, so other features built on top of it are difficult to provide.

 
 
 

Copyright © 2005-2024 EEWORLD.com.cn, Inc. All rights reserved