Design a high-availability file sync system like Dropbox
An excellent talk is here: https://www.youtube.com/watch?v=PE4gwstWhmc
Supported Operations
- Users should be able to upload any file from their PC to the service and download any file from the service to their PC
- Users should be able to sync their entire repo on the service with their PC
- Undo: Users should be able to restore a file to a previous state
- ACID operations:
  - Atomicity: A file upload should be all or nothing
  - Consistency: The versions on the PC and on the server must be the same
  - Isolation: ?
  - Durability: A committed upload must not be lost, and the service must be highly available
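One common way to get all-or-nothing uploads on a single storage node is to write to a temporary file and rename it into place only once it is complete. The sketch below assumes a local filesystem rather than the distributed one these notes have in mind. The function name `atomic_save` and the SHA-256 checksum comparison are illustrative assumptions, not part of the design above.

```python
import hashlib
import os
import tempfile


def atomic_save(dest_path: str, data: bytes, expected_sha256: str) -> None:
    """Write data so readers see the old file or the new one, never a partial file."""
    # Consistency check (assumed): reject uploads that do not match the client's checksum.
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch: upload rejected")

    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file in the destination directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before publishing the file
        os.replace(tmp_path, dest_path)  # swap the finished file into place in one step
    except BaseException:
        # On any failure, remove the partial temp file; dest_path is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the checksum is wrong or the write fails partway, the previously stored version of the file stays in place, which is the "all or nothing" behavior listed above.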
Scale of the Problem
- 10 million users
- 100 million requests per day
- Very high write/read ratio (almost 1:1)
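As a rough back-of-the-envelope check, the numbers above can be turned into average per-second rates. This sketch assumes traffic is spread evenly over the day, which real traffic is not, so peak load would be higher than these averages.

```python
USERS = 10_000_000
REQUESTS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

avg_rps = REQUESTS_PER_DAY / SECONDS_PER_DAY
requests_per_user_per_day = REQUESTS_PER_DAY / USERS

# With a write/read ratio of almost 1:1, each side gets roughly half the traffic.
avg_writes_per_sec = avg_rps / 2
avg_reads_per_sec = avg_rps / 2

print(f"requests per user per day: {requests_per_user_per_day:.0f}")  # 10
print(f"average requests/sec:      {avg_rps:.0f}")                    # 1157
print(f"average writes/sec:        {avg_writes_per_sec:.0f}")         # 579
print(f"average reads/sec:         {avg_reads_per_sec:.0f}")          # 579
```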
Abstract Architecture
- Read-Write App Servers
  - A sync operation happens on a read-write server
- Database
  - NoSQL database to store metadata related to users and files
- Memcache
  - All app servers periodically write their results to memcache
- Load Balancers
  - All requests to app servers go through load balancers
- Metadata servers
  - All logs of user interactions, user sessions, etc. are stored on these
- Distributed filesystem
  - Stores the actual file objects
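The split between file metadata (in the database) and file objects (in the distributed filesystem) can be sketched with in-memory stand-ins. Everything named here is hypothetical: the `FileMetadata` fields, the `AppServer` class, and the `user_id:path` key format are assumptions for illustration, and plain dictionaries replace the real NoSQL store and filesystem.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    # Hypothetical fields; the notes only say metadata relates to users and files.
    user_id: str
    path: str
    size_bytes: int
    blob_key: str
    modified_at: float = field(default_factory=time.time)


class AppServer:
    """Toy app server: file bytes go to the filesystem, metadata goes to the database."""

    def __init__(self) -> None:
        self.database = {}    # stand-in for the NoSQL store: (user_id, path) -> FileMetadata
        self.filesystem = {}  # stand-in for the distributed filesystem: blob_key -> bytes

    def upload(self, user_id: str, path: str, data: bytes) -> FileMetadata:
        blob_key = f"{user_id}:{path}"  # assumed key format
        self.filesystem[blob_key] = data  # store the object first
        meta = FileMetadata(user_id, path, len(data), blob_key)
        self.database[(user_id, path)] = meta  # then record where it lives
        return meta

    def download(self, user_id: str, path: str) -> bytes:
        meta = self.database[(user_id, path)]  # look up metadata to find the object
        return self.filesystem[meta.blob_key]
```

Keeping the two stores separate means listing a user's files only touches the small metadata records, while the large file objects are read only on an actual download.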
Operations
- Login: A user's login goes through the load balancer to one of the metadata servers, which performs authentication and session management and sends a cookie back to the user. The cookie, along with the usual session-related parameters, also contains the internal ID of the read-write server to which all subsequent requests from this user must be directed.
- File download/upload: A read-write server communicates with the distributed FS to retrieve or save the corresponding file. The server also logs the user's interaction and the change delta, and commits the changes to Memcached.
- Memcache then stores these changes in the NoSQL database.
- Undo: The app server queries the database (through memcache) for the change and then writes it out to the filesystem while holding a lock.
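The login, upload, and undo steps above can be sketched as a toy model. This is a simplification under stated assumptions: the change log stores whole previous file contents instead of deltas, a single dictionary stands in for memcache plus the NoSQL database, and the server-assignment rule in `login` is invented for illustration since the notes do not specify one. All class and function names are hypothetical.

```python
import threading


class ReadWriteServer:
    """Toy read-write server that logs changes so an upload can be undone."""

    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.filesystem = {}  # stand-in for the distributed FS: path -> bytes
        self.change_log = {}  # stand-in for memcache + database: path -> list of old versions
        self._lock = threading.Lock()

    def upload(self, path: str, data: bytes) -> None:
        with self._lock:
            previous = self.filesystem.get(path)
            if previous is not None:
                # Log the previous contents so the change can be reverted later.
                self.change_log.setdefault(path, []).append(previous)
            self.filesystem[path] = data

    def undo(self, path: str) -> bool:
        """Restore the previous version of path; return False if there is none."""
        with self._lock:  # hold the lock while rewriting the file
            history = self.change_log.get(path)
            if not history:
                return False
            self.filesystem[path] = history.pop()
            return True


def login(servers: dict, user_id: str) -> dict:
    """Return a cookie that pins the user to one read-write server."""
    ids = sorted(servers)
    # Assumed policy: a deterministic pick based on the user ID bytes.
    server_id = ids[sum(user_id.encode()) % len(ids)]
    return {"user_id": user_id, "server_id": server_id}


def route(servers: dict, cookie: dict) -> ReadWriteServer:
    """Direct a request to the server named in the cookie."""
    return servers[cookie["server_id"]]
```

A short usage example: after `cookie = login(servers, "alice")`, every call to `route(servers, cookie)` returns the same server, so two uploads to the same path followed by `undo` restore the first version on that server.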