Keep bragging

Notes on technologies, coding, and algorithms

System design involves 6 steps:

Summary

During requirements clarification, focus on 4 key areas:

User	Scale	Performance	Cost
Who uses it and how	QPS/TPS, data size per query, traffic spikes	p99 latency, write-to-read delay	Development and maintenance cost
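
As a rough illustration of the Scale and Performance columns, here is a minimal back-of-envelope sketch. The function names (`estimate_load`, `p99`) and every input number are hypothetical, chosen only to show the arithmetic; real figures come from the requirement discussion.

```python
def estimate_load(daily_requests, bytes_per_request, peak_factor):
    """Return (average QPS, peak QPS, peak bandwidth in bytes/second).

    All three inputs are assumptions agreed on during requirement clarification.
    """
    seconds_per_day = 24 * 60 * 60
    avg_qps = daily_requests / seconds_per_day
    peak_qps = avg_qps * peak_factor          # spike = multiple of the average
    peak_bandwidth = peak_qps * bytes_per_request
    return avg_qps, peak_qps, peak_bandwidth


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.99 * n, to avoid floating-point rounding surprises.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical workload: 86.4M requests/day, 1 KB each, 3x spike.
    print(estimate_load(86_400_000, 1000, 3))
    print(p99(list(range(1, 101))))
```

The p99 figure matters more than the average because it describes what the slowest 1% of requests experience.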

Functional vs non-functional requirements

High-level design covers the big-picture workflow and services

What to store?
- Data
- Data schema
- Requirements: read/write, latency, scalability, availability, failover
- Transactional or analytical workload?
Where to store?
- Compare database options against the non-functional requirements
- How to scale writes and reads?
- How to make both reads and writes faster?
- How to avoid losing data?
- How to maintain data consistency?
- How to ensure data integrity?
How to store?

Problem 1: Aggregate data

Should we pre-aggregate data in processing logic?
- Design 1: 3 events trigger 3 separate +1 increments in the database (3 x +1)
- Design 2: 3 events are aggregated in the processing logic, then written as 1 increment (+3)
- Choice: design 2, since it cuts the number of database writes
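
The two designs can be compared with a minimal sketch. `FakeCounterStore` is a hypothetical in-memory stand-in for the database, and the function names are made up for illustration; the point is only the number of writes each design issues.

```python
from collections import Counter


class FakeCounterStore:
    """Hypothetical stand-in for a database table of counters."""

    def __init__(self):
        self.counts = Counter()
        self.writes = 0  # how many write calls reached the "database"

    def increment(self, key, delta):
        self.counts[key] += delta
        self.writes += 1


def write_each_event(store, events):
    """Design 1: one +1 write per event."""
    for key in events:
        store.increment(key, 1)


def write_pre_aggregated(store, events):
    """Design 2: aggregate in memory first, then one write per distinct key."""
    for key, total in Counter(events).items():
        store.increment(key, total)


if __name__ == "__main__":
    events = ["video_a", "video_a", "video_a"]
    d1, d2 = FakeCounterStore(), FakeCounterStore()
    write_each_event(d1, events)
    write_pre_aggregated(d2, events)
    print(d1.writes, d2.writes)  # same final count, fewer writes in design 2
```

The trade-off is that design 2 holds counts in memory until they are flushed, so the processing unit needs a way to recover them if it fails before writing.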
Push or pull
- The push model cannot handle a processing unit failing: events in flight to it can be lost
- The push model does not scale when a processing unit takes a long time to process each event
- The pull model adds a persistent queue between the event source and the processing units, which avoids both problems
- Checkpointing: the queue remembers each consumer's offset, which preserves ordering and lets a consumer resume after failover
- partitioning:
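
The pull model with checkpointing described above can be sketched as follows. `EventLog` and `consume` are hypothetical names, and the log lives in memory here; a real queue would persist both the events and the offsets. The sketch assumes at-least-once delivery: a consumer that fails mid-batch re-reads the whole batch from its last committed offset.

```python
class EventLog:
    """Hypothetical append-only queue that keeps a committed offset per consumer."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer name -> index of the next event to read

    def append(self, event):
        self._events.append(event)

    def poll(self, consumer, max_events=10):
        """Return (start offset, batch) beginning at the consumer's checkpoint."""
        start = self._offsets.get(consumer, 0)
        return start, self._events[start:start + max_events]

    def commit(self, consumer, next_offset):
        """Checkpoint: record where this consumer should resume."""
        self._offsets[consumer] = next_offset


def consume(log, consumer, handler):
    """Pull one batch, process it in order, and commit only after success."""
    start, batch = log.poll(consumer)
    for event in batch:
        handler(event)  # an exception here skips the commit below
    log.commit(consumer, start + len(batch))
    return len(batch)


if __name__ == "__main__":
    log = EventLog()
    for e in ["a", "b", "c"]:
        log.append(e)
    print(consume(log, "worker-1", print))
```

Because the offset only advances after the handler succeeds, a replacement consumer started under the same name picks up where the failed one left off, in the original order.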

Client-side

Load balancer

Messaging systems

Data processing

Storage

Cache

Master

Monitoring

Partition

How to identify bottlenecks? How to monitor system health? How to make sure results are accurate?