Analysis of the Characteristics of Five Open-Source Big Data Frameworks
On January 04, 2021 by Tom Routley
Big data is now a hot industry, and each open-source big data framework has its own characteristics. I believe the following analysis can help you.
1.Elasticsearch
Advantages:
First, high concurrency. In one measurement, a single ES instance allocated 10 GB of memory sustained a write rate of 1,200 qps. With 60 GB of memory and 12 CPU cores across 3 instances, it is expected to reach 6,000 qps.
Second, low write latency. The average time to write a single record within the same data center is 3 ms. Fault tolerance is better than MG.
Third, it is easy to scale. Concurrency and capacity can be expanded through configuration across instances, and writes are allocated to instances automatically.
Fourth, it supports fairly complex conditional queries, and sorting is not a problem.
Disadvantages: transactions are not supported. There is a certain delay between writing data and being able to read it. There is no permission management.
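The automatic allocation of writes mentioned in the third advantage can be pictured as hash-based routing: the document's routing value (its ID by default) is hashed, and the result modulo the number of primary shards picks the shard. The sketch below is only an illustration of that idea, not Elasticsearch code. It uses CRC32 as a stand-in hash (Elasticsearch itself uses Murmur3), and the names NUM_SHARDS, route_to_shard and distribute are hypothetical.

```python
import zlib
from collections import Counter

NUM_SHARDS = 3  # hypothetical: one primary shard per instance


def route_to_shard(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard for a document by hashing its ID (stand-in hash: CRC32)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribute(doc_ids, num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many documents land on each shard."""
    return Counter(route_to_shard(d, num_shards) for d in doc_ids)


if __name__ == "__main__":
    counts = distribute(f"doc-{i}" for i in range(9000))
    for shard, n in sorted(counts.items()):
        print(f"shard {shard}: {n} documents")
```

Because the shard is derived from the ID alone, the client never has to choose an instance, and the same ID always lands on the same shard.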
2.Lucene
Lucene is a Java search library, not a complete full-text search engine.
Advantages: there are many mature use cases. It is an Apache top-level project and is making rapid progress. It has a large and active development community.
Disadvantages: additional development work is required. Scaling, distribution, reliability and so on all have to be implemented by yourself.
3.Redis
Advantages:
First, excellent read and write performance.
Second, it supports data persistence, in both AOF and RDB forms.
Third, it supports master-slave replication. The master automatically synchronizes data to the slaves, which also makes it possible to separate reads from writes.
Fourth, it has rich data structures. Besides plain string values, it supports Hash, Set, Sorted Set, List and other data structures.
Disadvantages:
First, Redis does not have automatic fault tolerance or recovery. When a master or slave goes down, some front-end read and write requests fail. You need to wait for the machine to restart or manually switch the front-end IP to recover.
Second, if the master goes down, some data may not have been synchronized to the slave in time. Data inconsistency is introduced after the IP switch, and the availability of the system is reduced.
Third, the master-slave replication of Redis uses full replication. During replication, the master forks a child process to take a snapshot of its memory, saves the snapshot to a file, and sends the file to the slave. The master must have enough free memory for this process.
Fourth, it is difficult for Redis to support online expansion. When the cluster reaches its capacity limit, expanding it online becomes very complex, so operations staff must reserve enough space when the system goes live. This wastes a great deal of resources.
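The second and third disadvantages can be illustrated with a toy model. The sketch below is plain Python, not Redis code or a Redis client API; ToyMaster, ToyReplica and their methods are hypothetical names. It assumes a full synchronization that serializes the whole dataset (a stand-in for the snapshot file) and an asynchronous step that pushes later writes to the replica.

```python
import pickle


class ToyReplica:
    """Hypothetical in-memory replica; serves reads only."""

    def __init__(self):
        self.data = {}

    def load_snapshot(self, snapshot: bytes) -> None:
        # Full synchronization: replace everything with the master's snapshot.
        self.data = pickle.loads(snapshot)

    def get(self, key):
        return self.data.get(key)


class ToyMaster:
    """Hypothetical in-memory master; accepts writes and replicates lazily."""

    def __init__(self):
        self.data = {}
        self.pending = []   # writes accepted but not yet sent to replicas
        self.replicas = []

    def attach(self, replica: ToyReplica) -> int:
        # Serialize the whole dataset, as a stand-in for the snapshot file.
        snapshot = pickle.dumps(self.data)
        replica.load_snapshot(snapshot)
        self.replicas.append(replica)
        return len(snapshot)  # size of the extra copy that had to be built

    def set(self, key, value) -> None:
        self.data[key] = value
        self.pending.append((key, value))

    def propagate(self) -> None:
        # Asynchronous replication step: push buffered writes to replicas.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


if __name__ == "__main__":
    master, replica = ToyMaster(), ToyReplica()
    master.set("user:1", "alice")
    size = master.attach(replica)   # full sync copies the existing data
    master.set("user:2", "bob")
    master.propagate()              # the replica catches up
    master.set("user:3", "carol")   # the master "crashes" before propagating
    print("snapshot bytes:", size)
    print("replica sees user:2:", replica.get("user:2"))
    print("replica sees user:3:", replica.get("user:3"))
```

In this model the last write exists only on the master, so switching traffic to the replica would silently lose it. The snapshot is also a second copy of the data, which is why the master needs spare memory during a full synchronization.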
4.HBase
Advantages:
First, it saves storage space.
Second, HBase automatically splits data.
Third, HBase supports highly concurrent read and write operations.
Disadvantages:
First, conditional queries are not supported. Only queries by row key are supported.
Second, Master server failover is not supported for the time being. When the Master goes down, the entire storage system becomes unavailable.
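Row-key-only lookup and automatic splitting both follow from keeping rows sorted by row key. The sketch below is a toy model in plain Python, not the HBase client API; ToyRegion, ToyTable and the max_rows_per_region threshold are hypothetical, and it assumes a threshold of at least 1.

```python
import bisect


class ToyRegion:
    """Hypothetical region: a sorted run of (row_key, value) pairs."""

    def __init__(self, keys=None, values=None):
        self.keys = keys or []
        self.values = values or []

    def put(self, key: str, value) -> None:
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value        # overwrite an existing row
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key: str):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def split(self):
        # Cut the sorted run in half; each half becomes its own region.
        mid = len(self.keys) // 2
        return (ToyRegion(self.keys[:mid], self.values[:mid]),
                ToyRegion(self.keys[mid:], self.values[mid:]))


class ToyTable:
    """Hypothetical table that splits a region once it holds too many rows."""

    def __init__(self, max_rows_per_region: int = 4):
        self.max_rows = max_rows_per_region
        self.regions = [ToyRegion()]

    def _region_index(self, key: str) -> int:
        # Every region after the first starts at its smallest row key.
        starts = [r.keys[0] for r in self.regions[1:]]
        return bisect.bisect_right(starts, key)

    def put(self, key: str, value) -> None:
        i = self._region_index(key)
        region = self.regions[i]
        region.put(key, value)
        if len(region.keys) > self.max_rows:
            self.regions[i:i + 1] = region.split()

    def get(self, key: str):
        return self.regions[self._region_index(key)].get(key)


if __name__ == "__main__":
    table = ToyTable()
    for n in range(10):
        table.put(f"row-{n:02d}", {"value": n})
    print("regions:", len(table.regions))
    print("row-07:", table.get("row-07"))
```

A lookup by row key only needs two binary searches. A query on any other column would have to visit every row in every region, which is the cost behind the first disadvantage.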
5.Hadoop
Advantages:
First, it has excellent scalability. It can scale to thousands of nodes, which suits users with huge data volumes.
Second, it saves on software cost and also has low hardware requirements.
Third, the Hadoop ecosystem is active, with a rich set of surrounding open-source projects.
Disadvantages:
First, it processes the full data set, and execution within a task is serial. It favors throughput, and response time is not guaranteed at all. Intermediate results are not visible and cannot be shared.
Second, each job has a single input and a single output, so chaining jobs is wasteful. Chained MapReduce (MR) jobs cannot run in parallel. Coarse-grained fault tolerance may cause pitfalls.
Third, it is not friendly to graph computing or iterative computation. It cannot return results within seconds and is only suitable for offline data analysis.
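The cost of chaining can be seen in a toy version of the MapReduce model. The sketch below is plain Python, not the Hadoop API; run_job and the mapper and reducer functions are hypothetical names. It assumes each job reads one input and fully materializes one output before the next job starts.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one toy MapReduce job: map, shuffle by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)   # shuffle: group values by key
    # The full output is materialized before any later job can read it.
    return [reducer(key, values) for key, values in sorted(groups.items())]


def word_mapper(line):
    # Emit (word, 1) for every word in a line of text.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    return key, sum(values)


def freq_mapper(pair):
    # Input is a (word, count) pair from job 1; emit (count, 1).
    _word, count = pair
    yield count, 1


if __name__ == "__main__":
    lines = ["big data is big", "data is data"]
    word_counts = run_job(lines, word_mapper, sum_reducer)      # job 1
    histogram = run_job(word_counts, freq_mapper, sum_reducer)  # job 2
    print(word_counts)
    print(histogram)
```

Job 2 cannot begin until job 1 has written all of its output, and an iterative algorithm would repeat this write-then-read cycle on every pass.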
You should analyze your use scenarios against the characteristics above and identify and resolve problems in advance, so that work is not delayed.