Changes in Project Architecture for Image Storage
In the previous session, we stored uploaded images in a directory on the same server that hosts the project, and other modules accessed these images through HTTP requests.
While this approach enabled cross-module access to images, several issues emerged:
- Images were scattered across multiple servers.
- Servers hosting large numbers of images faced significant load, potentially affecting other functionalities.
- Storing images within the project path led to data loss on restarts, and moving them to external file storage instead introduced I/O performance bottlenecks.
To address these concerns, a dedicated image server can be set up solely for storing and serving images, which calls for specialized image storage technologies or tools.
Overview of Distributed File Systems
1. Classification
1.1 General-Purpose Distributed File Systems
These systems are the distributed counterparts of traditional local file systems such as ext3 or NTFS. Notable examples include Lustre and MooseFS.
Advantages
They support standard file system operations, lowering the barrier for developers.
Disadvantages
High system complexity, because full POSIX (Portable Operating System Interface) compliance requires features such as directory structures, file permissions, and locks.
Performance may suffer because of the overhead required to maintain POSIX compatibility.
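To make "standard file system operations" concrete, the sketch below uses Python's local file API on a POSIX system to exercise three features that a general-purpose distributed file system must also offer to remote clients: nested directories, permission bits, and modifying bytes in the middle of an existing file. The directory and file names are made up for illustration, and locking is left out to keep the example short.

```python
import os
import stat
import tempfile
from typing import Tuple


def posix_demo() -> Tuple[bytes, int]:
    with tempfile.TemporaryDirectory() as root:
        # Directory hierarchy (illustrative path)
        img_dir = os.path.join(root, "images", "2024")
        os.makedirs(img_dir)
        path = os.path.join(img_dir, "logo.png")
        with open(path, "wb") as f:
            f.write(b"AAAAAAAA")

        # File permissions: owner may read and write, nobody else has access
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
        mode = stat.S_IMODE(os.stat(path).st_mode)

        # In-place modification: overwrite two bytes inside the existing file
        with open(path, "r+b") as f:
            f.seek(2)
            f.write(b"BB")

        with open(path, "rb") as f:
            return f.read(), mode


content, mode = posix_demo()
print(content, oct(mode))  # b'AABBAAAA' 0o600
```

Every one of these calls must behave identically when the file lives on a POSIX-compliant distributed file system, which is where the complexity and overhead described above come from.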
1.2 Specialized Distributed File Systems
Inspired by the design of the Google File System (GFS), these systems do not allow files to be modified after upload. They require proprietary APIs for access and are also referred to as distributed file storage services. Examples include MogileFS, FastDFS, and TFS.
Advantages
Simpler architecture since they do not need to implement POSIX features, resulting in better performance.
Disadvantages
Higher learning curve for developers due to the use of custom APIs (often wrapped into utility classes).
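As a rough picture of what such a custom API looks like, here is a minimal in-memory sketch of a write-once store: uploading returns an ID chosen by the store, files are read and deleted by that ID, and there is no call to modify a stored file. The class `WriteOnceStore` and its method names are hypothetical; they do not reproduce the actual API of MogileFS, FastDFS, or TFS.

```python
import uuid
from typing import Dict


class WriteOnceStore:
    """Hypothetical write-once store: upload, download, delete; never modify."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def upload(self, data: bytes, ext: str) -> str:
        # The store generates the ID; callers get no directories or permissions to manage.
        file_id = f"{uuid.uuid4().hex}.{ext}"
        self._files[file_id] = bytes(data)
        return file_id

    def download(self, file_id: str) -> bytes:
        return self._files[file_id]

    def delete(self, file_id: str) -> None:
        del self._files[file_id]

    # Deliberately no update() or append(): to "change" a file, upload a new one
    # and point references at the new ID.


store = WriteOnceStore()
fid = store.upload(b"fake image bytes", "jpg")
print(store.download(fid))  # b'fake image bytes'
```

In a project, calls like these are what typically get wrapped into a utility class, so application code only ever deals with file IDs.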
2. Google FS Architecture
2.1 Roles
- Name Server (indexing server)
- Storage Servers
2.2 Key Features
- No support for file modification.
- Files are split into chunks, so an index server is needed to track where each chunk is stored.
- Replication of files across multiple storage servers using dynamic allocation strategies.
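The roles and features above can be illustrated with a toy in-memory model: a name server splits each file into fixed-size chunks, places each chunk on several storage servers, records the locations in its index, and refuses to overwrite an existing file. The class names, the 4-byte chunk size, the replica count of 2, and the "least-loaded servers first" placement rule are all assumptions chosen for readability (the GFS paper describes 64 MB chunks, three replicas by default, and a more elaborate placement policy). For brevity the name server also moves the bytes here; in GFS, clients transfer data directly to storage servers after asking the master for chunk locations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StorageServer:
    name: str
    # (file name, chunk index) -> chunk bytes held by this server
    chunks: Dict[Tuple[str, int], bytes] = field(default_factory=dict)

    def used(self) -> int:
        return sum(len(c) for c in self.chunks.values())


class NameServer:
    """Toy index server: tracks which storage servers hold each chunk of each file."""

    def __init__(self, servers: List[StorageServer], chunk_size: int = 4, replicas: int = 2) -> None:
        self.servers = servers
        self.chunk_size = chunk_size
        self.replicas = replicas
        # file name -> for each chunk, the names of the servers holding a replica
        self.index: Dict[str, List[List[str]]] = {}

    def put(self, name: str, data: bytes) -> None:
        if name in self.index:
            # Write-once: an existing file can never be modified in place.
            raise FileExistsError(f"{name} is already stored and cannot be modified")
        placement: List[List[str]] = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            # Assumed dynamic policy: pick the currently least-loaded servers.
            targets = sorted(self.servers, key=lambda s: s.used())[: self.replicas]
            for server in targets:
                server.chunks[(name, offset // self.chunk_size)] = chunk
            placement.append([server.name for server in targets])
        self.index[name] = placement

    def get(self, name: str) -> bytes:
        by_name = {s.name: s for s in self.servers}
        parts = []
        for idx, locations in enumerate(self.index[name]):
            # Any replica would do; this sketch reads from the first one listed.
            parts.append(by_name[locations[0]].chunks[(name, idx)])
        return b"".join(parts)


servers = [StorageServer("s1"), StorageServer("s2"), StorageServer("s3")]
ns = NameServer(servers)
ns.put("logo.png", b"hello world!")
print(ns.index["logo.png"])  # [['s1', 's2'], ['s3', 's1'], ['s2', 's3']]
print(ns.get("logo.png"))    # b'hello world!'
```

Because placement is decided per chunk from current load, the three chunks end up spread evenly over all three servers, and losing any single server still leaves a copy of every chunk.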
Introduction to FastDFS
1. Overview
FastDFS is a lightweight open-source distributed file system started in April 2008. Its design is a simplified form of the Google File System; it is written entirely in C and runs on Linux, FreeBSD, AIX, and other UNIX-like operating systems.
It is designed for large-scale file storage under high-concurrency access. It supports load balancing when files are retrieved, provides software RAID on low-cost IDE drives, allows storage nodes to be added online, and deduplicates identical files to save disk space.
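The general idea behind deduplicating identical files can be sketched with a content-hash index: if an upload's digest has been seen before, the existing file ID is returned and nothing new is written. This is a minimal sketch of the technique under that assumption, not FastDFS's own mechanism; the `DedupStore` class, the SHA-256 choice, and the `file-N` ID scheme are hypothetical.

```python
import hashlib
from typing import Dict


class DedupStore:
    """Hypothetical store that keeps one physical copy per distinct file content."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}    # file ID -> stored content
        self.by_digest: Dict[str, str] = {}  # SHA-256 hex digest -> file ID

    def upload(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        existing = self.by_digest.get(digest)
        if existing is not None:
            # Identical content already stored: return the existing ID, write nothing.
            return existing
        file_id = f"file-{len(self.blobs)}"  # hypothetical ID scheme
        self.blobs[file_id] = data
        self.by_digest[digest] = file_id
        return file_id

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
a = store.upload(b"same image bytes")
b = store.upload(b"same image bytes")
c = store.upload(b"other image")
print(a, b, c)               # file-0 file-0 file-1
print(store.stored_bytes())  # 27
```

Relying on digest equality alone is a deliberate simplification; a more cautious store could also compare the bytes before treating two uploads as the same file, and would need reference counting before allowing deletes.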
FastDFS can only be accessed through its client API; POSIX-style file access is not supported.
It is well-suited for medium to large websites aiming to store resources such as images, documents, audio, and video files.
2. Resources
FastDFS does not have an official website. However, its creator, Yu Qing (happy_fish100), serves as a moderator for the FastDFS section on ChinaUnix forums, regularly updating content there:
http://bbs.chinaunix.net/
The software can be downloaded from SourceForge; the latest version at the time of writing was 5.08:
https://sourceforge.net/projects/fastdfs/files/