Elasticsearch grew out of Compass, the first open source project of its author, Shay Banon. He originally wrote the library to build a recipe search engine for his wife, who was learning to cook at the time.
This immediately brings to mind Pinterest's founder and creative source, Ben Silbermann: to make it easier for his girlfriend to compare engagement rings, he built a way to pin the relevant pages side by side with virtual thumbtacks.
Yes, programmers really are the most romantic people!
ES itself is schemaless; only special fields need to be configured explicitly through a Mapping. Each record is a Document (the equivalent of a row of data), and ES organizes data at the Index level.
The basic concepts of the Elasticsearch data model are compared with those of a traditional relational database in the table below.

Elasticsearch      Relational database
Index              Database / Table
Document           Row
Field              Column
Mapping            Schema
The architecture is divided into two layers: ES on top and Lucene underneath. An ES cluster is composed of multiple nodes; each ES instance hosts some shards, and the shards are spread across different machines. P1 and P2 are two different shards, and shards come in primary and replica roles. Each ES shard is itself a Lucene index.
An ES host running a single instance is called a node. A node is a member of the cluster; it can store data and participate in cluster indexing and search operations. Like a cluster, a node is identified by its name, which by default is a random Marvel character name generated at startup. Users can set any name they like, but for manageability the name should be as recognizable as possible. A node decides which cluster to join through the cluster name configured for it.
2.3.3 Shard and Replica
ES's "shard" mechanism distributes the data in an index across multiple nodes: it splits an index into multiple underlying physical Lucene indexes, which together provide partitioned storage of the index data. One physical Lucene index is called a shard. Each shard is internally a fully functional, independent index, so it can be stored on any host in the cluster. When creating an index, the user can specify its number of shards; the default is 5.
There are two types of shards: the primary shard and the replica shard. Primary shards store the documents. Each new index automatically gets 5 primary shards (changed to 1 in the latest versions); this number can be set in the configuration before the index is created, but once the index exists, the number of primary shards cannot be changed. A replica shard is a copy of a primary shard, used for data redundancy and to improve search performance. Each primary shard gets one replica shard by default, but more can be configured, and the replica count can be changed dynamically; ES will increase or decrease the number of replica shards as needed.
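As a sketch, the shard counts described above map onto index settings like the following (`my_index` is a placeholder name; note that in ES 7+ the default `number_of_shards` is 1):

```
PUT /my_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```

`number_of_replicas` can be changed later through the settings API, while `number_of_shards` is fixed at creation time.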
An ES cluster can be composed of multiple nodes, and shards are stored across these nodes in a distributed manner.
ES automatically moves shards between nodes as needed, for example when nodes are added or fail. In short, sharding provides the cluster's distributed storage, while replicas provide its distributed processing and redundancy.
When the index is updated dynamically, the inverted indexes that have already been written are never modified; instead, a new segment is generated.
Each segment is itself an inverted index, and a separate commit file records all the segments in the index. The data from which segments are generated is first placed in an in-memory buffer.
There are four main steps to persistence:
Each time ES writes a new record, it double-writes the data: once to the translog and once to the in-memory buffer.
At this point the data is not yet searchable; it only becomes searchable after a refresh has written it into a segment.
By default, ES executes a refresh every 1 s, which is quite expensive; the refresh interval can be changed through the index.refresh_interval setting.
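The refresh interval is controlled per index by the dynamic `index.refresh_interval` setting; for example (the index name is a placeholder):

```
PUT /my_index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```

Setting it to `-1` disables automatic refresh entirely, a common optimization during bulk loading.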
Specifically, a refresh does the following: it writes the contents of the in-memory buffer into a new segment (in the filesystem cache, without an fsync), opens the new segment so that it becomes searchable, and clears the buffer.
As the translog file grows larger and larger, the in-memory data should be written to disk; this process is called a flush.
A flush writes the in-memory data (both the translog and the segments) to disk, whereas fsync only persists the translog to disk (to ensure the data is not lost).
Because the automatic refresh every second creates a new segment each time, a large number of segments pile up in a short time. Each segment consumes file handles, memory, and CPU cycles, and, most importantly, every search request must check all segments in turn: the more segments, the slower the search.
ES solves this problem by merging segments in the background, combining small segments into large ones.
The merge is also the moment when "deleted" documents are actually purged from the file system, because documents marked as deleted are not copied into the large segment.
During indexing, the refresh process keeps creating new segments and opening them for search.
The merge process selects some small segments and merges them into one large segment, without interrupting search or indexing. Once the merge completes:
The new segment is flushed to disk
A new commit point is written that includes the new segment and excludes the old small segments
The new segment is opened for search
The old segments are deleted
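The merge behavior described above can be sketched as a toy model (an assumed simplification, not Lucene's actual data structures): small segments are combined into one large segment, and documents marked deleted are simply not copied over.

```python
# Toy model of segment merging: each segment is just a list of doc ids,
# and deleted_ids plays the role of the .del file.
def merge_segments(segments, deleted_ids):
    """Merge all small segments into one large segment, dropping
    documents that were marked as deleted."""
    merged = [doc for seg in segments for doc in seg if doc not in deleted_ids]
    return [merged]  # one large segment replaces the small ones

segments = [["d1", "d2"], ["d3"], ["d4", "d5"]]
print(merge_segments(segments, deleted_ids={"d3"}))  # [['d1', 'd2', 'd4', 'd5']]
```

Note how "d3" disappears during the merge: this is the point at which a deleted document is physically reclaimed.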
The following is some supplementary material on the translog.
To guard against node crashes, ES persists the translog on every write, similar to a database binlog. The difference is that a db binlog expires in days, set via expire_logs_days=7 (the default is 7 days).
The translog is cleared after each flush. The flush-triggering strategy can be configured along several dimensions:
index.translog.flush_threshold_ops: execute a flush after this many operations have been performed; default unlimited
index.translog.flush_threshold_size: flush when the translog exceeds this size; default 512mb
index.translog.flush_threshold_period: force a flush at this interval; default 30m
index.translog.interval: how often ES checks whether the translog meets the flush conditions
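As a sketch, these thresholds can be set per index through the settings API (the index name is a placeholder, and note that these parameter names come from older ES versions; several were removed or renamed in later releases):

```
PUT /my_index/_settings
{
  "index": {
    "translog.flush_threshold_size": "512mb",
    "translog.flush_threshold_period": "30m"
  }
}
```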
The translog provides a persistent record of all operations that have not yet been flushed to disk. When ES starts, it uses the latest commit point to restore the existing segments from disk, and then replays all operations in the translog that occurred after that last commit.
The translog also makes real-time CRUD possible. When you read, update, or delete a document by its ID, ES first checks the translog for the latest change before fetching the document from the corresponding segment. This means every operation sees the latest version of the document, in real time.
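This real-time read path can be sketched as follows (a hypothetical, simplified model using plain dicts, not ES's actual internals): check the translog for the newest version of a document before falling back to the possibly stale segments.

```python
# Simplified real-time GET: translog holds not-yet-flushed versions,
# segments holds the flushed (possibly stale) versions.
def get_document(doc_id, translog, segments):
    if doc_id in translog:       # newest version has not been flushed yet
        return translog[doc_id]
    return segments.get(doc_id)  # otherwise read the flushed version

translog = {"1": {"title": "new"}}
segments = {"1": {"title": "old"}, "2": {"title": "stable"}}
print(get_document("1", translog, segments))  # {'title': 'new'}
print(get_document("2", translog, segments))  # {'title': 'stable'}
```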
In theory, by adjusting this expiration strategy, the translog could be read before it is flushed and used for cross-datacenter synchronization or downstream message notification; this could work well, but its feasibility has yet to be tested.
A write returns success only after both the primary shard and at least one of the replica shards have been written successfully.
1 The client sends a document write request to Node1.
2 Node1 computes from the document ID (the _id field) that the document belongs to shard 0, and routes the request to the primary shard P0 on Node3. The shard is chosen by:
shard = hash(routing) % number_of_primary_shards
3 After Node3 executes the write on P0, if it succeeds it forwards the request in parallel to the replica R0 on Node1 and Node2; once all succeed, Node1 reports success to the client.
wait_for_active_shards configures the replica synchronization strategy:
- set to 1: return after only the primary shard has been written
- set to all: wait for all replica shards to be written before returning
- set to a value between 1 and number_of_replicas + 1: for example, with 2 replica shards, return once one of them has been written successfully
timeout controls how long to wait when a replica shard is unavailable due to a cluster problem.
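The routing formula in step 2 can be sketched as follows. This is illustrative only: real ES uses a Murmur3 hash of the routing value, while this sketch substitutes MD5 to get a deterministic integer.

```python
import hashlib

def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document _id) to a primary shard,
    implementing shard = hash(routing) % number_of_primary_shards.
    MD5 stands in for ES's actual Murmur3 hash."""
    h = int(hashlib.md5(routing.encode("utf-8")).hexdigest(), 16)
    return h % number_of_primary_shards

# The same _id always lands on the same shard -- which is why the number
# of primary shards cannot be changed after index creation: a different
# shard count would route existing documents to different shards.
print(route_to_shard("doc-42", 5) == route_to_shard("doc-42", 5))  # True
```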
A document can be read from any primary or replica shard.
An update operation is actually a read followed by a write.
Since segments are immutable, a document can neither be deleted from an old segment nor updated in place to point at a new document version. Instead, every commit point includes a
.del file listing which document in which segment has been deleted. When a document is "deleted", it is merely marked in the .del file: the "deleted" document can still match a query, but it is removed when the final results are returned.
Document updates work the same way: when a document is updated, the old version is marked as deleted and the new version is indexed into a new segment. Both the old and new versions may match a query, but the old version is filtered out of the final results.
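This mark-and-filter behavior can be sketched as a toy model (an assumed simplification: segments are dicts, and the .del file is a set of (segment, doc) marks):

```python
# Query-time filtering of deleted/updated documents: segments are never
# modified, so deletes only add marks, and search skips marked entries.
def search(segments, del_marks, match):
    hits = []
    for seg_id, docs in segments.items():
        for doc_id, body in docs.items():
            if (seg_id, doc_id) in del_marks:
                continue            # marked deleted: dropped from results
            if match(body):
                hits.append(body)
    return hits

segments = {
    "seg_old": {"1": {"v": 1}},     # old version of doc 1
    "seg_new": {"1": {"v": 2}},     # updated version indexed in a new segment
}
del_marks = {("seg_old", "1")}      # .del entry for the old version
print(search(segments, del_marks, lambda d: True))  # [{'v': 2}]
```

Only the new version survives filtering; the old version is physically removed later, when a merge drops it from the large segment.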