Analysis of Five Characteristics of Big Data Open Source Frameworks
On January 04, 2021 by Tom Routley
Big data is now a hot industry, and each open source big data framework has its own characteristics. I believe that the following analysis can help you.
Elasticsearch (ES) has the following advantages. First, high concurrency. In a measured example, a single-machine ES instance allocated 10 GB of memory reached a write capacity of 1200 qps. With 60 GB of memory and 12-core CPUs across 3 instances, it is expected to reach 6000 qps.
Second, the average write time for a single record within the same machine room is 3 ms. Fault tolerance is better than MG.
Third, it is easy to scale out. Concurrency and capacity can be expanded through configuration between instances, and writes are allocated across instances automatically.
Fourth, it supports more complex conditional queries, and sorting is not a problem.
Disadvantages: transactions are not supported, there is a certain delay in reading and writing, and there is no permission management.
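To make the point about complex conditional queries and sorting in ES more concrete, here is a minimal sketch of a search request body written as a Python dictionary in the Elasticsearch Query DSL. The field names (`title`, `price`, `created_at`) and the helper `build_search_body` are hypothetical and only serve the illustration. The sketch builds the request body and does not contact a cluster.

```python
import json


def build_search_body(keyword, min_price, max_price, page_size=10):
    """Build a Query DSL body that combines several conditions with a sort.

    The field names used here are hypothetical examples.
    """
    return {
        "query": {
            "bool": {
                # Full-text condition: contributes to the relevance score.
                "must": [{"match": {"title": keyword}}],
                # Range condition: restricts results without affecting the score.
                "filter": [
                    {"range": {"price": {"gte": min_price, "lte": max_price}}}
                ],
            }
        },
        # Newest documents first, ties broken by relevance score.
        "sort": [{"created_at": {"order": "desc"}}, "_score"],
        "size": page_size,
    }


body = build_search_body("big data", 10, 100)
print(json.dumps(body, indent=2))
```

A body like this would normally be sent to an index's `_search` endpoint. Combining `must` and `filter` inside one `bool` query is what lets several conditions apply in a single request.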
Lucene is a Java search class library. It is not a complete full-text retrieval engine.
Advantages: there are many mature cases. It is an Apache top-level project and is making rapid progress. It has a large and active development community with many developers.
Disadvantages: additional development work is required. Scaling, distribution, reliability and so on all have to be implemented by ourselves.
Redis has the following advantages. First, it has excellent read and write performance.
Second, it supports data persistence. Both AOF and RDB are supported.
Third, it supports master-slave replication. The master automatically synchronizes data to the slave, and reads and writes can also be separated.
Fourth, its data structures are rich. In addition to string values, it supports Hash, Set, Sorted Set, List and other data structures.
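The difference between the two persistence modes named above, RDB and AOF, can be sketched with a toy in-memory store. This is an illustration of the two ideas only (a point-in-time snapshot versus a replayable log of writes) and not how Redis implements them. The class `ToyStore` and its methods are hypothetical.

```python
import json


class ToyStore:
    """In-memory key-value store with two toy persistence modes."""

    def __init__(self):
        self.data = {}
        self.aof = []  # append-only list of write commands

    def set(self, key, value):
        self.data[key] = value
        self.aof.append(["SET", key, value])  # AOF style: log every write

    def delete(self, key):
        self.data.pop(key, None)
        self.aof.append(["DEL", key])

    def snapshot(self):
        """RDB style: serialize the whole dataset at one point in time."""
        return json.dumps(self.data, sort_keys=True)

    @classmethod
    def from_snapshot(cls, blob):
        store = cls()
        store.data = json.loads(blob)
        return store

    @classmethod
    def from_aof(cls, commands):
        """Rebuild the state by replaying the logged commands in order."""
        store = cls()
        for command in commands:
            if command[0] == "SET":
                store.set(command[1], command[2])
            elif command[0] == "DEL":
                store.delete(command[1])
        return store


store = ToyStore()
store.set("user:1", "alice")
blob = store.snapshot()      # snapshot taken here
store.set("user:2", "bob")   # written after the snapshot
store.delete("user:1")

from_rdb = ToyStore.from_snapshot(blob)
from_aof = ToyStore.from_aof(store.aof)
print(from_rdb.data)  # {'user:1': 'alice'}
print(from_aof.data)  # {'user:2': 'bob'}
```

The snapshot restores only what existed when it was taken, while replaying the log recovers the later writes as well, at the cost of a log that grows with every write.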
Redis also has disadvantages. First, it does not have automatic fault tolerance and recovery. When the master or a slave goes down, some front-end read and write requests fail. Recovery requires waiting for the machine to restart or manually switching the front-end IP.
Second, if the master goes down, some data may not have been synchronized to the slave in time. Switching the IP then introduces data inconsistency, and the availability of the system is reduced.
Third, Redis master-slave replication uses full replication. During replication, the master forks a child process to take a snapshot of memory, saves the child process's memory snapshot as a file, and sends it to the slave. This process requires the master to have enough free memory.
Fourth, it is difficult for Redis to support online expansion. When the cluster capacity reaches its upper limit, online capacity expansion becomes very complex. Operations and maintenance personnel must therefore make sure there is enough space when the system goes online, which wastes a great deal of resources.
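The second point above, data that has not reached the slave when the master fails, can be illustrated with a toy model of asynchronous replication. `ToyMaster` and its `pending` buffer are hypothetical names for this sketch and do not correspond to Redis internals.

```python
class ToyMaster:
    """Master that acknowledges writes before shipping them to a slave."""

    def __init__(self):
        self.data = {}
        self.pending = []  # writes accepted but not yet sent to the slave

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def replicate_to(self, slave):
        """Ship all pending writes to the slave, then clear the buffer."""
        for key, value in self.pending:
            slave[key] = value
        self.pending.clear()


master = ToyMaster()
slave = {}

master.write("a", 1)
master.replicate_to(slave)  # "a" reaches the slave
master.write("b", 2)        # master goes down before this write is shipped

# After switching to the slave, anything still pending on the master is missing.
lost = {k: v for k, v in master.data.items() if k not in slave}
print(lost)  # {'b': 2}
```

In this model the client was told that `b` had been written, yet a switch to the slave serves a dataset without it. This is the inconsistency described above.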
HBase has the following advantages. First, it saves storage space.
Second, HBase automatically splits data.
Third, HBase supports highly concurrent read and write operations.
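As a rough illustration of automatic data splitting, the sketch below splits a sorted run of rows at its middle key whenever it grows past a threshold. It is a toy model under stated assumptions: a row count stands in for the real size-based split criteria, and `split_region` is a hypothetical helper, not an HBase API.

```python
def split_region(rows, max_rows):
    """Recursively split a sorted list of (row_key, value) pairs.

    A region holding more than max_rows rows is cut at its middle key,
    and each half is checked again.
    """
    if len(rows) <= max_rows:
        return [rows]
    mid = len(rows) // 2
    return split_region(rows[:mid], max_rows) + split_region(rows[mid:], max_rows)


rows = sorted((f"row{i:03d}", i) for i in range(10))
regions = split_region(rows, max_rows=4)

for region in regions:
    # Print the first key, the last key and the row count of each region.
    print(region[0][0], "..", region[-1][0], len(region))
```

Because the rows are kept sorted by key, every resulting region covers one contiguous key range, so each region can be served independently.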
HBase also has disadvantages. First, conditional queries are not supported. Only queries by row key are supported.
Second, it does not support failover of the Master server for the time being. When the Master is down, the entire storage system will crash.
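The first limitation above, access by row key only, can be sketched with a toy table that keeps its keys sorted. Lookups and range scans on the key are cheap, while a condition on a value has to visit every row. `ToyTable` and the `user#date` key layout are hypothetical choices for this sketch.

```python
import bisect


class ToyTable:
    """Table addressed only by row key, with keys kept in sorted order."""

    def __init__(self):
        self.keys = []    # sorted row keys
        self.values = {}  # row key -> row data

    def put(self, row_key, value):
        if row_key not in self.values:
            bisect.insort(self.keys, row_key)
        self.values[row_key] = value

    def get(self, row_key):
        """Point lookup by exact row key."""
        return self.values.get(row_key)

    def scan(self, start, stop):
        """Return rows with start <= row_key < stop, in key order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return [(k, self.values[k]) for k in self.keys[lo:hi]]


table = ToyTable()
table.put("user1#2021-01-03", {"clicks": 7})
table.put("user2#2021-01-03", {"clicks": 2})
table.put("user1#2021-01-04", {"clicks": 9})

print(table.get("user2#2021-01-03"))  # {'clicks': 2}
# "$" sorts directly after "#", so this range covers every "user1#" key.
print(table.scan("user1#", "user1$"))
# A condition on a value has no index to use: every row is checked.
busy = [k for k in table.keys if table.values[k]["clicks"] > 5]
print(busy)
```

This is why row key design matters so much in such a store: whatever needs to be queried efficiently has to be encoded in the key.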
Hadoop has the following advantages. First, it has excellent scalability. It can scale to thousands of nodes, which makes it suitable for users with huge data demands.
Second, it not only saves on software cost but also has low hardware requirements.
Third, the Hadoop ecosystem is active, and the surrounding open source projects are rich.
Hadoop MapReduce (MR) also has disadvantages. First, in all scenarios, execution within a task is serial. It is heavy on throughput, and response time is not guaranteed at all. Intermediate results are not visible and cannot be shared.
Second, each job has a single input and a single output, so chaining jobs is seriously wasteful. Chained MR jobs cannot run in parallel, and coarse-grained fault tolerance may cause pitfalls.
Third, it is not friendly to graph computing or iterative computation. It cannot deliver results within seconds, so it is only suitable for offline data analysis.
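The chaining behaviour described above can be sketched with a toy map, shuffle and reduce pass in plain Python. This is a single-process illustration of the programming model, not Hadoop code, and `run_job` and the mapper and reducer functions are hypothetical names.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one map -> shuffle -> reduce pass over (key, value) records."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # shuffle: group by output key
    return [(k, reducer(k, vs)) for k, vs in sorted(groups.items())]


# Job 1: count words.
def split_words(_, line):
    return [(word, 1) for word in line.split()]


def sum_counts(_, counts):
    return sum(counts)


# Job 2: group words by their count.
def invert(word, count):
    return [(count, word)]


def collect(_, words):
    return sorted(words)


lines = [(0, "big data big ideas"), (1, "data lakes")]

# The second job cannot start until the first has produced its full output.
word_counts = run_job(lines, split_words, sum_counts)
by_count = run_job(word_counts, invert, collect)

print(word_counts)  # [('big', 2), ('data', 2), ('ideas', 1), ('lakes', 1)]
print(by_count)     # [(1, ['ideas', 'lakes']), (2, ['big', 'data'])]
```

Each job takes one input and produces one output, and the second job consumes the complete result of the first. In a real cluster that intermediate result is written to storage between jobs, which is where the waste in long chains comes from.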
People should analyze their own use cases against the characteristics above, identifying and solving problems in advance so as not to delay the work.