Site Architecture
Summary
The whole process of evolution
The initial tiny site
The application server needs a stronger CPU to handle complex business logic.
The database server needs faster disks and more memory to speed up disk retrieval and data caching.
The file server also depends mainly on its disks.
Note: 80% of business access concentrates on 20% of the data.
The site is growing
Database reads and writes are separated.
CDN & Reverse Proxy
Both CDN and Reverse Proxy use the cache model.
The difference: a CDN serves you from the network provider's node closest to you, while a reverse proxy sits in the site's central data center.
- Distributed file system and distributed database.
- As the business gets more complex, the demands on data storage and retrieval grow, so the site adopts non-relational databases such as NoSQL, or non-database query technology such as a search engine.
Business split
distributed service
Value
Large sites grow step by step; they are not rebuilt or created in one go.
The real driving force is business development.
Business makes the technology, just as a career makes the person.
Common misconceptions
Blindly pursuing large-site solutions.
Technology for technology's sake (it should serve the business).
Sometimes technology is not the real problem (12306).
Architecture Pattern
Pattern
Layering (horizontal): application, service, data.
Advantage: as long as the interfaces stay stable, each layer can do its own work independently.
Disadvantage: layer boundaries and interfaces need careful planning.
Segmentation (vertical): divide by function and business.
Distribution: both of the patterns above serve this goal.
distributed application and business
distributed static sources
distributed data and storage
distributed computing
- Server clustering
Several servers deploy the same application and serve requests together behind a load balancer.
- Cache
CDN
Reverse Proxy
Local cache
distributed cache
Preconditions
cached data must stay valid for a while (it should not expire almost immediately)
access must be skewed toward hot spots, and that hot data is what belongs in the cache
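The expiry precondition can be illustrated with a tiny in-process cache. This is only a sketch: `TTLCache`, the `loader` callback, and the injectable `clock` are hypothetical names introduced here, not part of any library mentioned in these notes.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be simulated
        self.store = {}         # key -> (value, expiry time)

    def get(self, key, loader):
        entry = self.store.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]                      # hit: entry is still valid
        value = loader(key)                      # miss or expired: reload
        self.store[key] = (value, now + self.ttl)
        return value
```

Hot keys are served from memory until they expire; a key that is rarely read gains little, because it has usually expired by the next read.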
- Asynchronous
improves system availability
speeds up site response
smooths out concurrent access peaks
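A minimal sketch of the asynchronous idea, using a standard-library queue and one worker thread in place of a real message queue. `start_worker` and `handle_request` are hypothetical names; the request path only enqueues, and the slow work happens in the background.

```python
import queue
import threading


def start_worker(tasks, results):
    """Consume tasks in the background until a None sentinel arrives."""
    def run():
        while True:
            order = tasks.get()
            if order is None:
                break
            results.append(f"processed {order}")   # the slow work goes here

    worker = threading.Thread(target=run)
    worker.start()
    return worker


def handle_request(tasks, order):
    """Request path: enqueue the work and return immediately."""
    tasks.put(order)
    return "accepted"
```

Because requests wait in the queue instead of hitting the back end directly, a burst of traffic turns into a longer queue rather than a spike in load.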
- Redundancy
cold backup
hot backup
- Automation
code management
testing
security
deployment
monitoring
alerting
failover
failure recovery
degradation
resource allocation
Sina application example
Initial: LAMP
Base server layer: supports database, storage, cache, search, and other technology.
The middle layer is platform services and application services.
The top layer is the API, third-party services, and the Sina business layer.
MPSS: the common solution now is to virtualize the physical machine. That way instances can even use the same port, which MPSS cannot.
Multi-Level Cache
Architecture kernel point
What is architecture?
The highest level of planning: decisions that are difficult to change.
Keep their balance
Performance
Browser: caching, page compression, fewer cookies in transit, sensible page layout.
Server: CDN, local and distributed cache, asynchronous message queue
Code: multithreading, memory management
Database: indexes, caching, SQL optimisation, NoSQL
Availability
Application servers must not store session information.
Storage servers need real-time backup.
Measure: check whether the whole site still works when some servers die.
Scalability
Application cluster: add new servers behind the load balancer.
Cache cluster: cache routing algorithm
Database cluster: routing and partitioning
Extensibility
Event-driven architecture: message queue
Distributed services: separate business from reusable services, and call them through a distributed service framework
Safety
Architecture
Instant response
Web Site Performance Test
- Different views of site performance
User view: speed
- Mostly about the front end: optimise HTML and CSS, CDN, reverse proxy, caching strategy
Developer view:
Cache to speed up data access, distributed processing
Improve read and write capacity with clusters
Asynchronous messages to speed up responses.
Operations view:
MNO bandwidth
Server hardware configuration
Data center network architecture
- Performance test metrics
Response time: measured by timing test requests
Concurrency: tested with multiple threads
Throughput: TPS, QPS, HPS
Performance counters: system load (top command)
top - 18:38:43 up 5 days, 8:33, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 74 total, 1 running, 73 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1883860 total, 106724 free, 456832 used, 1320304 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1115392 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 43076 3528 2408 S 0.0 0.2 0:06.31 systemd
- Performance test methods
Performance test
Load test
Stress test
Stability test
- Performance test report example
Front-end optimisation
- Browser
Reduce HTTP requests (merge images, CSS, and JS)
Use the browser cache (headers: Cache-Control, Expires). To update a static file, change the file name referenced from the HTML instead of overwriting the file, and update icons one at a time rather than all at once.
Turn on gzip
CSS in the head, JS at the end of the body.
Reduce cookie transfer (especially for static resources)
- CDN & Reverse Proxy
- Faster loading, fewer requests, security; load balancing, caching
Application server optimisation
- Distributed cache
- Using the cache sensibly
Frequently updated data (a poor fit for caching)
Access without hot spots (a poor fit for caching)
Data inconsistency and dirty reads
Cache availability
Cache warm-up (metadata)
Cache penetration (keys that do not exist should also be cached, with a null value)
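The cache penetration point can be sketched as follows. The `cache` and `db` arguments are plain dict-like objects assumed for illustration, and `_MISSING` is a hypothetical sentinel; a real deployment would talk to a cache server and would normally give the cached "missing" marker a short lifetime.

```python
_MISSING = object()   # sentinel stored in place of "no such record"


def cached_lookup(cache, db, key):
    """Look up key through the cache; misses are cached too."""
    if key in cache:
        value = cache[key]
        return None if value is _MISSING else value
    value = db.get(key)                  # the expensive back-end call
    cache[key] = _MISSING if value is None else value
    return value
```

Repeated requests for a key that does not exist now stop at the cache instead of reaching the database every time.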
- Distributed cache architecture
JBoss cache and Memcached
Libevent
Memory management: slab, chunk, LRU
- Code optimisation
Multithreaded code
Threads to start = [task execution time / (task execution time - I/O wait time)] * number of CPU cores
Stateless objects
Local objects
Locks for concurrent access to shared resources
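The thread-count formula above can be written as a small helper. `threads_to_start` is a hypothetical name, and rounding up to a whole thread is an assumption added here; the formula itself does not say how to round.

```python
import math


def threads_to_start(task_time, io_wait_time, cpu_cores):
    """threads = task_time / (task_time - io_wait_time) * cpu_cores."""
    cpu_time = task_time - io_wait_time
    if cpu_time <= 0:
        raise ValueError("task_time must be greater than io_wait_time")
    return math.ceil(task_time / cpu_time * cpu_cores)
```

For example, a task that takes 100 ms and spends 80 ms of that waiting on I/O gives 100 / 20 * 4 = 20 threads on a 4-core machine, while a purely CPU-bound task gives one thread per core.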
Resources reuse
Singleton
Object pool
Data Structure & Algorithm
Hash: Time33
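Time33 multiplies the running hash by 33 and adds the next byte. The sketch below is one common variant; the seed 5381 and the 32-bit mask are assumptions, since implementations differ on both.

```python
def time33(text, seed=5381):
    """Times-33 string hash, kept within 32 bits."""
    h = seed
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h
```

Passing `seed=0` gives the other variant often seen in the wild.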
Garbage collection
Stack: function args, local variables
Heap: create & delete object & garbage collection
Storage performance optimisation
SSD (B+ potential)
B+ tree VS LSM tree
RAID VS HDFS
- RAID
- HDFS: name node & data node(Map reduce)
High availability (nothing is allowed to go wrong)
Measurement and assessment
Cluster session
Session replication (small clusters)
Session binding (…)
Session stored in a cookie
Session server (the best method)
High availability data
CAP (sites usually keep A and P and relax C)
Backup
Cold (abandoned), hot (master-slave)
Failover
Detect: heartbeat checks, reports of failed access
Switch: recompute the route to find a working server
Recover: restore the number of backup copies
Monitor
- Data collection
User behaviors collection
Server logs collection
Client browser log collection via JavaScript (tool: Storm for log analysis)
Server performance monitoring
- Load, memory, disk I/O, network I/O (tool: Ganglia)
Data report
- Monitoring management
System alert
Failover
Automatic degradation
Scalable architecture
Architecture design
Different functions are separated onto different physical servers
A single function is scaled out with a cluster
Application server cluster design (load balancing)
- HTTP redirect (302; works poorly for SEO)
- DNS
- Reverse proxy
- IP
- Data link layer (direct routing) [Linux tool: LVS]
- Algorithms
Round robin
Weighted round robin
Random
Least connections
Source hashing
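The five selection algorithms listed above can each be sketched in a few lines. All function names are hypothetical, servers are plain strings, and CRC32 is just one convenient stable hash for the source-hashing case; real load balancers such as LVS implement these inside the network stack.

```python
import itertools
import random
import zlib


def round_robin(servers):
    """Yield servers in turn, forever."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """weights: dict of server -> positive integer weight."""
    expanded = [s for s, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def random_choice(servers):
    """Pick any server with equal probability."""
    return random.choice(servers)


def least_connections(active):
    """active: dict of server -> number of open connections."""
    return min(active, key=active.get)


def source_hash(client_ip, servers):
    """The same client IP always maps to the same server."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]
```

Source hashing keeps a client on one server, which is what session binding needs, at the cost of uneven load when a few clients dominate the traffic.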
Distributed Cache Cluster Design
- Memcached model
- Memcached challenge
Problem: the distributed cache cluster needs to be expanded
Design requirement: after expansion, as much of the already cached data as possible should still be reachable
- Distributed Cache Hash Algorithm
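A common hash algorithm for this purpose is consistent hashing, sketched below as an assumption about what this item refers to. `HashRing`, the 100 virtual nodes per server, and the use of SHA-256 are illustrative choices, not details taken from Memcached or its clients.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []                  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Each server appears at many points on the ring to even out load.
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def get(self, key):
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[idx % len(self.ring)][1]
```

When a server is added, only the keys that now fall on its virtual nodes move to it; every other key keeps its old server, which is exactly the property a plain `hash(key) % n` lacks.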
Data Storage Server Cluster Design
Relational database scaling (Cobar, Greenplum)
NoSQL Database(Apache HBase)
Site Extensibility
Structure Extensibility Architecture
Module Coupling Decoupling
Distributed Message Queue
Event Driven Architecture
Distributed Message Queue
- ESB SOA
Reuse Platform
Problems
Compiling and deploying is difficult
Code patch management is difficult
Database connections are exhausted
Adding new business is difficult
Splitting
Vertical: various applications
Horizontal: distributed business
Web Service & Enterprise Service
Server[WSDL] -> Service Broker[UDDI] <- SOAP [Client]
Disadvantages
Bloated registration and discovery mechanism
Inefficient XML serialization
Expensive HTTP communication
Complex deployment and maintenance
Distributed service requirements and features
Load balancing, failover, efficient remote communication
Heterogeneous system integration, minimal intrusion into applications
Version management, real-time monitoring
- Distributed Service Framework Design
Dubbo(NIO communication)
Extensible data structure
- ColumnFamily
Open platform theory
Safety Architecture
- XSS
Escape or filter dangerous characters
HttpOnly
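Both XSS defenses can be shown with the Python standard library. `render_comment` and `session_cookie` are hypothetical helpers; `html.escape` and `http.cookies.SimpleCookie` are the real standard-library pieces they rely on.

```python
import html
from http.cookies import SimpleCookie


def render_comment(user_text):
    """Escape user input before embedding it in HTML."""
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


def session_cookie(session_id):
    """Build a cookie value that page scripts cannot read."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["httponly"] = True
    return cookie["sid"].OutputString()
```

Escaping keeps injected markup from running, and HttpOnly limits the damage if a script does run, because it cannot read the session cookie.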
Injection
- Open source, error echo, blind injection, escape filtering, parameter binding (OS injection)
CSRF
- Form token, CAPTCHA, Referer check
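A form token can be sketched as an HMAC of the session id. This is one possible scheme, not the only one; `new_secret_key`, `issue_token`, and `is_valid` are hypothetical names, and the secret key is assumed to stay on the server.

```python
import hashlib
import hmac
import secrets


def new_secret_key():
    """Server-side secret, generated once and kept private."""
    return secrets.token_bytes(32)


def issue_token(secret_key, session_id):
    """Token to embed in a hidden form field for this session."""
    return hmac.new(secret_key, session_id.encode(), hashlib.sha256).hexdigest()


def is_valid(secret_key, session_id, submitted):
    """Constant-time check of the token that came back with the form."""
    return hmac.compare_digest(issue_token(secret_key, session_id), submitted)
```

A forged cross-site request carries the victim's cookies but cannot know the token, so the check fails.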
Others
- Error codes, HTML comments, file upload, path traversal
Web application firewall
- ModSecurity
Web security scanner
Info encryption and secret key management
One-way hash encryption
- MD5, SHA
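One-way hashing is typically used for stored passwords. The sketch below deliberately swaps in PBKDF2 over SHA-256 with a random salt instead of a bare MD5 or SHA digest, because a single unsalted hash is easy to attack with precomputed tables; the function names and the iteration count are assumptions.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest); the plaintext cannot be recovered from either."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected, iterations=100_000):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Only the salt and digest are stored; login re-computes the digest from the submitted password.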
Symmetric encryption
- DES, RC
Asymmetric encryption
- Https, RSA
Secret key management
Info filter & anti-spam
- Text Match
Trie (base array: storage, check array: status)
Multilevel hash match
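Text matching with a trie can be sketched with nested dicts. This is a simplification: the notes mention base and check arrays (a double-array trie), which is a more compact layout of the same structure. `build_trie` and `find_matches` are hypothetical names.

```python
def build_trie(words):
    """Build a nested-dict trie from a list of sensitive words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = True          # marks the end of a complete word
    return root


def find_matches(trie, text):
    """Return every (start, word) where a listed word occurs in text."""
    matches = []
    for start in range(len(text)):
        node = trie
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break
            if "$end" in node:
                matches.append((start, text[start:end + 1]))
    return matches
```

Each position in the text walks the trie only as far as some listed word still matches, so long texts are checked against the whole word list in one pass.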
- Classify
- Bayes (Advance) -> TAN -> ARCS
- Blacklist
- Bloom Filter
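A Bloom filter answers "possibly in the blacklist" or "definitely not" using a fixed-size bit array. The sketch below is a minimal version; the size, the number of hash functions, and the trick of deriving them from SHA-256 with an index prefix are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size set membership with false positives but no false negatives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from independent-looking digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A blacklist of many addresses fits in a small, fixed amount of memory this way; the price is that an address never added can occasionally test as present.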
Electronic Commerce Control
- Risk
- Account, Sellers, Buyers, Trade
- Risk control
rule engine
statistics model
Example
TaoBao
Evolution
LAMP
2004, eBay: PHP -> Java, MySQL -> Oracle, MVC: Webx, ORM: iBatis, management: antx, server: WebLogic
Note: Taobao chose free software at the beginning and paid software once the site started growing fast. Both were the right decisions at the time.
Later: abandoned EJB and adopted Spring; replaced WebLogic with JBoss (and later Jetty).
From this point Taobao began to develop its own technology; much of the technology it is built on dates from this period.
Wiki
The whole wiki
- GeoDNS, LVS, Squid, Lighttpd, PHP, Memcached, Lucene, MySQL
Wiki performance optimisation strategy
- Front end
- Server
APC
Imagemagick
Tex
Replace the strtr function
- Back end
Doris
Solutions
- Fault classification
- Transient, temporary, permanent
Normal status access
Transient fault high availability solutions
- Temporary fault high availability solutions
- Permanent fault high availability solutions
A machine cannot distinguish a temporary fault from a permanent one,
so a person has to make that judgment.
Online shopping flash sale (spike)
Challenges
Impact on existing business
Application and database load
Bandwidth
Direct ordering through the URL
Strategy
Independent deployment
Static pages (fewer requests)
Rented bandwidth (CDN)
Dynamically generated random URL for the order page
Architecture Design
- Spike button control
- Spike process & Architecture Design
Fault example analysis
Faults caused by logging
Log output level: global debug
Lessons:
The application's own logs and third-party logs should be configured separately
Set the log level to at least warn, and check that the logging calls in the code match the intended level.
Turn off third-party logs that are not needed (most are error logs)
Faults caused by highly concurrent database access
Lessons:
The home page should not access the database
The home page is better served as a static page
Faults caused by lock contention under high concurrency
Faults caused by the cache
Faults caused by applications not starting in sync
Apache, JBoss
Start JBoss, then request it with curl; on success, start Apache
Large-file I/O monopolizing the disk
- Small files should have their own storage instead of sharing a distributed storage system with large files.
Abuse of the production environment
- Access to the production environment should be regulated (DBA)
Non-standard process
Diff before you push the code
Strengthen code review
Code habits
Check for null when you are not sure about the state of the input object
Null object pattern
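A minimal sketch of the null object pattern, using hypothetical `User`, `NullUser`, and `find_user` names. Instead of returning `None` and forcing every caller to check, the lookup returns an object that behaves safely.

```python
class User:
    def __init__(self, name):
        self.name = name

    def display_name(self):
        return self.name


class NullUser(User):
    """Stands in for 'no user' so callers never receive None."""

    def __init__(self):
        super().__init__("")

    def display_name(self):
        return "guest"


def find_user(users, user_id):
    """Return the matching user, or a NullUser when there is none."""
    return users.get(user_id, NullUser())
```

Callers can write `find_user(users, uid).display_name()` without a null check, which removes a whole class of null pointer faults.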
Architects
The art of leadership
Focus on people, not the product
Discover the excellence in people
Share the blueprint
Learn to compromise
Engage and develop others
Career guide
Find questions
Ask questions and support
Site architecture
Effect
- Design, Fire Fighting, Sermon, Geek
Result
- Sherpa, Spartan, VIP
Duty
- Productions, Basic service, Basic equipment
Attention Level
- Function, Not function(Performance & others), Team organization, Production Future, Production operative
Public praise
- Best, good, normal, bad, worst
Non-mainstream
- Normal, Literature, 1+1
Appendix
Front End
Browser optimisation
- Cache, fewer HTTP requests, page compression
CDN
Static resources should be stored on their own server cluster
Images (not the logo… but user uploads such as avatars)
- Own servers & subdomain
Reverse Proxy
DNS
Application layer architecture
Development framework
Page Rendering
Loading balance
Session management
Static rendering of dynamic pages
Business split
Virtual server
Service layer architecture
Distributed message
Distributed service
Distributed cache
Distributed configuration
Storage architecture
Distributed files
Relational database
NoSQL database
Data synchronization
Back-end architecture
Search engine
Data warehouse
Recommendation system
Data collection and monitor
Browser data collection
Server business data collection
Server performance collection
System monitor
System alert
Security Architecture
Web Attack
Data protection
Data Center architecture
Computer room
Cabinet
Server architecture
Appendix B
Web: static HTML only
CGI brought dynamic page content
The flow: the server passes the request to a CGI program, which computes the result and generates the HTML.
CGI programs were commonly written in Perl; in Java, the web container calls a servlet instead.
PHP (like ASP and JSP) eased the coupling between business code and page markup that CGI caused
MVC (combines CGI and the web server)
Done
- Post title:Site Architecture
- Post author:ReZero
- Create time:2020-07-18 22:15:30
- Post link:https://rezeros.github.io/2020/07/18/site-architechture/
- Copyright Notice:All articles in this blog are licensed under BY-NC-SA unless stating additionally.