Elasticsearch basics summary: a romance from programmers

created at 07-08-2021

1 ES origin

Elasticsearch grew out of Compass, the first open source project by its author Shay Banon. The original purpose of that library was to build a recipe search engine for Shay's wife, who was learning to cook at the time.

This immediately reminds me of the origin of Pinterest and its founder Ben Silbermann: to make it easier for his girlfriend to choose an engagement ring, he wanted to pin the candidates to one page, as if with thumbtacks, for easy comparison.

Yes, indeed! Programmers are the most romantic people.

2 ES basics

2.1 Data model

Figure: data model of Elasticsearch

As shown in the figure:

  • An ES cluster, which hosts indices (for example a product search index or an order search index), is composed of multiple nodes. Each node is one ES instance.
  • Each node holds multiple shards. P1 and P2 are primary shards; R1 and R2 are replica shards.
  • Each shard corresponds to a Lucene index (the underlying index files).
  • “Lucene index” is a collective term: it is composed of multiple segments (segment files, each of which is an inverted index), and each segment stores documents (Docs).

2.2 Logical concept

ES itself is schemaless; only fields that need special handling have to be defined through a Mapping. Each record is a Document, comparable to a row of data, and ES groups data at the Index level.
The table below compares the basic concepts of the Elasticsearch data model with those of a traditional database.

Elasticsearch    SQL database
type             table

2.3 Physical storage

Physical storage has two layers: ES on top and Lucene underneath. An ES cluster is composed of multiple nodes. Each ES instance hosts some shards, and the shards of one index are spread across different machines. P1 and P2 are two different shards, and every shard is either a primary or a replica. Each ES shard is a Lucene index.

2.3.1 Node

A host running a single ES instance is called a node. It is a member of the cluster and can store data and take part in the cluster's indexing and search operations. Like a cluster, a node is identified by its name; in early ES versions the default was a random Marvel character name generated at startup. Users can choose any name they like, but for ease of management it should be as recognizable as possible. A node decides which cluster to join from the ES cluster name configured for it.

2.3.3 Shard and Replica

ES's sharding mechanism distributes the data of an index across multiple nodes. It splits an index into multiple underlying physical Lucene indexes, which partitions the storage of the index data. Each of these physical Lucene indexes is called a shard. Every shard is a fully functional, independent index internally, so it can be stored on any host in the cluster. When creating an index, the user can specify its number of shards; the default was 5 in older versions and is 1 in recent ones.

There are two types of shards: primary and replica. A primary shard stores documents. Each new index used to get 5 primary shards automatically (recent versions changed the default to 1). This number can be set at index creation, but once the index has been created, its number of primary shards cannot be changed. A replica shard is a copy of a primary shard, used for data redundancy and to improve search performance. Each primary shard has one replica by default, but more can be configured, and the number of replicas can be changed dynamically. With the index.auto_expand_replicas setting, ES can also increase or decrease the number of replicas automatically as nodes join or leave.

The ES cluster can be composed of multiple nodes, and each shard is stored on these nodes in a distributed manner.

ES automatically moves shards between nodes as needed, for example when nodes are added or fail. In short, sharding gives the cluster distributed storage, while replicas give it distributed processing and redundancy.
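As a minimal sketch of the shard settings described above, the snippet below builds the JSON body one might send when creating an index. The setting names number_of_shards and number_of_replicas are real index settings; the helper names build_index_settings and total_shard_copies, and the values 3 and 1, are assumptions made for illustration.

```python
import json


def build_index_settings(primary_shards: int, replicas: int) -> dict:
    """Build a settings body for index creation (illustrative helper)."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            # Fixed once the index has been created.
            "number_of_shards": primary_shards,
            # Can be changed later on a live index.
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each primary has `replicas` copies, so the cluster holds P * (1 + R) shards."""
    return primary_shards * (1 + replicas)


body = build_index_settings(primary_shards=3, replicas=1)
print(json.dumps(body, indent=2))
print(total_shard_copies(3, 1))
```

With 3 primaries and 1 replica each, the cluster has 6 shards to place, which is why adding nodes gives ES room to spread them out.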

3 Data persistence

Figure: data persistence of Elasticsearch

4 Lucene index principle

4.1 Newly received data is written to a new index file

When the index is updated dynamically, the inverted indexes that have already been generated are not modified; instead, a new segment is generated.
Each segment is an inverted index, and a separate commit file records all the segments in the index. The source data for a new segment is first held in an in-memory buffer.

Persistence has four main steps, write -> refresh -> flush -> merge:

  • Write: the data is written to the in-memory buffer and the transaction log (translog)
  • Refresh: the buffer is periodically written to a new segment, which makes the data searchable
  • Flush: segments are periodically persisted to disk and the translog is cleared
  • Merge: segments are periodically merged to keep search efficient
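The four steps can be sketched with a toy in-memory model. This is only an illustration of the order of events, not how Lucene or ES store data; the class ToyShard and all of its attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of one shard; everything lives in plain Python lists."""
    buffer: list = field(default_factory=list)            # in-memory buffer
    translog: list = field(default_factory=list)          # operations not yet flushed
    cached_segments: list = field(default_factory=list)   # searchable, not yet fsynced
    durable_segments: list = field(default_factory=list)  # searchable and "on disk"

    def write(self, doc: dict) -> None:
        # Step 1: double write to the buffer and the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self) -> None:
        # Step 2: turn the buffer into a new searchable segment.
        if self.buffer:
            self.cached_segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self) -> None:
        # Step 3: persist all segments, then clear the translog.
        self.refresh()
        self.durable_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        self.translog.clear()

    def merge(self) -> None:
        # Step 4: combine the durable segments into one larger segment.
        if len(self.durable_segments) > 1:
            merged = [doc for seg in self.durable_segments for doc in seg]
            self.durable_segments = [merged]

    def search(self, key: str, value) -> list:
        # Only documents inside segments are visible, never the buffer.
        segments = self.durable_segments + self.cached_segments
        return [d for seg in segments for d in seg if d.get(key) == value]


shard = ToyShard()
shard.write({"id": 1, "tag": "a"})
print(shard.search("tag", "a"))  # nothing yet: the document is only in the buffer
shard.refresh()
print(shard.search("tag", "a"))  # now visible
```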

4.2 write

Each time ES receives a new record, it double-writes the data to the translog and the in-memory buffer.

Figure: writing data

At this point the data cannot be searched yet; it must be refreshed into a segment before it can be retrieved.

4.3 refresh

By default, ES runs a refresh every 1s, which can be expensive under heavy indexing. You can change the interval through index.refresh_interval.
A refresh does the following:

  • All documents in the memory buffer are written to a new segment, but fsync is not called, so the data could still be lost. At this stage the segment only lives in the kernel's file system cache
  • The segment is opened, so the documents in it become visible to search
  • The in-memory buffer is cleared, as shown in the figure below
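As a small sketch of changing the refresh interval, the snippet below builds the body one might send to the update index settings endpoint (PUT /<index>/_settings). The setting index.refresh_interval is real; the helper name refresh_interval_body and the value "30s" are assumptions for illustration.

```python
import json


def refresh_interval_body(interval: str) -> dict:
    """Body for an index settings update that changes the refresh interval."""
    return {"index": {"refresh_interval": interval}}


# Refresh less often during heavy indexing ("30s" is only an example value).
print(json.dumps(refresh_interval_body("30s")))
# "-1" turns periodic refresh off, for instance for the duration of a bulk load.
print(json.dumps(refresh_interval_body("-1")))
```

A longer interval trades search freshness for indexing throughput, since fewer small segments are created.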


4.4 flush

As the translog file grows larger and larger, the data held in memory should be written to disk. This process is called flush:

  1. Write all the documents in the memory buffer to a new segment
  2. Empty the memory buffer
  3. Write the commit point information to disk
  4. fsync the segments in the file system page cache to disk
  5. Delete the old translog file. The segments that were only in memory are now on disk, so the translog is no longer needed to keep that data safe. The state after a flush is shown below

Figure: flush clears data

4.5 difference between flush and fsync

Flush writes the data held in memory (both the translog and the segments) to disk, whereas fsync in this context only syncs the translog to disk, to ensure that no data is lost.

4.6 segment merge

Because the automatic refresh creates a new segment every second, a large number of segments accumulates in a short time. Each segment consumes file handles, memory, and CPU cycles. More importantly, every search request has to check each segment in turn, so the more segments there are, the slower the search.

ES solves this problem by merging segments in the background, combining small segments into larger ones.

This process is also the process in which the "deleted" documents are actually cleared out of the file system, because the documents marked as deleted will not be copied to the large segment.

Figure: segment merge

During indexing, the refresh process creates new segments and opens them for search.
The merge process selects some small segments and merges them into a larger one, without interrupting indexing or search. Once the merge is complete, the old segments are deleted.

Figure: old segments are deleted

When a merge finishes:

  • The new segment is flushed to disk
  • A new commit point is written that includes the new segment and excludes the old, smaller segments
  • The new segment is opened for search
  • The old segments are deleted
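The key property of a merge, that documents marked as deleted are not copied into the larger segment, can be sketched as follows. The function merge_segments and the data layout (a segment as a list of dicts, deletions as a set of IDs) are assumptions for illustration, not Lucene's format.

```python
def merge_segments(segments: list, deleted_ids: set) -> list:
    """Merge segments into one, skipping documents marked as deleted."""
    merged = []
    for segment in segments:
        for doc in segment:
            if doc["id"] not in deleted_ids:
                merged.append(doc)
    return merged


small_segments = [
    [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}],
    [{"id": 3, "text": "c"}],
]
big_segment = merge_segments(small_segments, deleted_ids={2})
print([doc["id"] for doc in big_segment])  # prints [1, 3]
```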

The following is a supplementary introduction to translog

4.7 Translog

To guard against node failure, ES appends to the translog on every write. This is similar to a database binlog, with one difference: a binlog expires by age (for example, expire_logs_days=7 keeps it for 7 days), whereas the translog is cleared after each flush. When a flush is triggered can be controlled along several dimensions (some of these settings exist only in older ES versions):

  • index.translog.flush_threshold_ops: flush after this many operations; unlimited by default
  • index.translog.flush_threshold_size: flush when the translog grows beyond this size; 512mb by default
  • index.translog.flush_threshold_period: force a flush at least this often; 30m by default
  • index.translog.interval: how often ES checks whether the translog meets a flush condition

The translog provides a persistent record of all operations that have not yet been flushed to disk. On startup, ES uses the latest commit point to restore all known segments from disk, and then replays every operation in the translog, that is, every change that happened after the latest commit.
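Recovery as described here, starting from the last commit and replaying the translog, can be sketched like this. The function recover and the shape of the translog entries (dicts with "op", "id", and "doc" keys) are hypothetical and chosen only to keep the example small.

```python
def recover(committed_docs: dict, translog: list) -> dict:
    """Rebuild shard state: start from the last commit, then replay the translog."""
    state = dict(committed_docs)  # documents in the segments named by the commit point
    for op in translog:           # operations recorded after that commit
        if op["op"] == "index":
            state[op["id"]] = op["doc"]
        elif op["op"] == "delete":
            state.pop(op["id"], None)
    return state


committed = {"1": {"title": "old"}}
pending = [
    {"op": "index", "id": "2", "doc": {"title": "new"}},
    {"op": "index", "id": "1", "doc": {"title": "updated"}},
    {"op": "delete", "id": "2"},
]
print(recover(committed, pending))
```

Replaying in order matters: document 2 is created and then deleted, so it must not survive recovery.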

The translog is also used to provide real-time CRUD. When you read, update, or delete a document by ID, ES first checks the translog for recent changes before fetching the document from the corresponding segment. This means it always works on the latest version of the document, in real time.
In theory, with a suitable retention strategy, the translog could be captured before it is flushed and used for cross-datacenter synchronization or for downstream message notification. This might work well, but its feasibility has yet to be tested.

4.8 Write operation

Success is returned only after the primary shard and at least one replica shard have been written successfully.

Figure: write operation

Specific process:

1 The client sends a document write request to Node1

2 Node1 uses the document ID (the _id field) to work out that the document belongs to shard 0, and routes the request to the primary shard P0 on Node3

Routing formula: shard = hash(routing) % number_of_primary_shards

3 Node3 executes the write on P0. If it succeeds, Node3 forwards the request in parallel to the replicas R0 on Node1 and Node2. Once all of them succeed, Node3 reports success to Node1, which reports it to the client.
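The routing formula can be tried out with a short sketch. Note the assumption: ES uses its own Murmur3-based hash, while this example substitutes zlib.crc32 from the Python standard library, so the shard numbers it prints will not match a real cluster. The function name route_to_shard is hypothetical.

```python
import zlib


def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards"""
    # Stand-in hash: ES uses a Murmur3-based hash, not CRC32.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % number_of_primary_shards


# By default the routing value is the document ID.
for doc_id in ["order-1", "order-2", "order-3"]:
    print(doc_id, "->", route_to_shard(doc_id, 3))
```

Because the number of primary shards is part of the formula, changing it would send existing document IDs to different shards, which is why that number is fixed once the index exists.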

wait_for_active_shards configures how many shard copies must be active before a write proceeds:

  • Set to 1: return after only the primary shard has been written
  • Set to all: wait for all replica shards to be written before returning
  • Set to any value from 1 to number_of_replicas + 1: for example, with 2 replica shards and a value of 2, the write returns once the primary and one replica have succeeded

timeout controls how long to wait when the required shard copies are unavailable, for example because the cluster is in an abnormal state
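How the wait_for_active_shards values translate into a number of shard copies can be sketched as below. The functions required_active_copies and can_proceed are hypothetical helpers that only mirror the rules listed above; they are not part of any ES client.

```python
def required_active_copies(wait_for_active_shards, number_of_replicas: int) -> int:
    """Translate the setting into a number of shard copies (primary included)."""
    total_copies = number_of_replicas + 1
    if wait_for_active_shards == "all":
        return total_copies
    required = int(wait_for_active_shards)
    if not 1 <= required <= total_copies:
        raise ValueError(f"value must be between 1 and {total_copies}")
    return required


def can_proceed(active_copies: int, wait_for_active_shards, number_of_replicas: int) -> bool:
    """True when enough shard copies are active for the write to go ahead."""
    return active_copies >= required_active_copies(wait_for_active_shards, number_of_replicas)


# With 2 replicas there are 3 copies in total.
print(required_active_copies(1, 2))
print(required_active_copies("all", 2))
print(can_proceed(active_copies=2, wait_for_active_shards=2, number_of_replicas=2))
```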

Read operation

A document can be read from its primary shard or from any of its replica shards.

Figure: read operation

Reading process:

  1. The client sends a read request to Node1
  2. Node1 uses the document ID (the _id field) to work out that the document belongs to shard 0, a copy of which exists on every node. This time it routes the request to Node2 for load balancing
  3. Node2 returns the document to Node1, and Node1 returns it to the client
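The load balancing in step 2 can be pictured as rotating through the nodes that hold a copy of the shard. The sketch below uses plain round-robin as a simplifying assumption; the real selection logic in ES is more elaborate, and the class CopySelector is hypothetical.

```python
from itertools import cycle


class CopySelector:
    """Pick a shard copy for each read in round-robin order (a simplification)."""

    def __init__(self, nodes_with_copy: list):
        if not nodes_with_copy:
            raise ValueError("no shard copy is available")
        self._nodes = cycle(nodes_with_copy)

    def next_node(self) -> str:
        return next(self._nodes)


selector = CopySelector(["Node1", "Node2", "Node3"])
print([selector.next_node() for _ in range(4)])
```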

Update operation

An update is really a read followed by a write.

Figure: update operation

Update process:

  1. The client sends the update request to Node1
  2. Node1 uses the document ID (the _id field) to work out that the document belongs to shard 0, whose primary shard is on Node3, so it routes the request to Node3
  3. Node3 reads the document from P0, changes the JSON content of the _source field, and re-indexes the modified document in P0. If another process has modified the document in the meantime, step 3 is executed again; once the number of retries set by retry_on_conflict is exceeded, the update is abandoned
  4. If Node3 updates the document successfully, it sends the new version of the document to the replica shards on Node1 and Node2 to be re-indexed. Once all replicas report success, Node3 returns success to the requesting node, Node1, which then returns success to the client
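The read-modify-write loop in step 3 can be sketched with a toy primary shard that tracks a version per document. Only the parameter name retry_on_conflict comes from ES; ToyPrimary, VersionConflict, and update are hypothetical names, and the version check is a simplified stand-in for the real conflict detection.

```python
class VersionConflict(Exception):
    """Raised when the document changed between the read and the write."""


class ToyPrimary:
    """Toy primary shard storing (version, source) per document ID."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, expected_version):
        current_version, _ = self.docs.get(doc_id, (0, None))
        if current_version != expected_version:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = (current_version + 1, source)


def update(primary, doc_id, changes, retry_on_conflict=0):
    """Read, modify, re-index; retry when another writer got there first."""
    for _ in range(retry_on_conflict + 1):
        version, source = primary.get(doc_id)
        new_source = {**source, **changes}
        try:
            primary.index(doc_id, new_source, expected_version=version)
            return new_source
        except VersionConflict:
            continue  # re-read the document and try again
    raise VersionConflict(f"gave up on {doc_id} after {retry_on_conflict} retries")


primary = ToyPrimary()
primary.index("1", {"name": "shoes"}, expected_version=0)
print(update(primary, "1", {"color": "red"}, retry_on_conflict=3))
```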

Update and delete

Because segments are immutable, a document cannot be removed from an old segment, nor can an old segment be updated to reflect a new version of a document. Instead, each commit point includes a .del file that lists which documents in which segments have been deleted. When a document is "deleted", it is only marked in the .del file. A "deleted" document can still match a query, but it is removed before the final result is returned.

Document updates work the same way: when a document is updated, the old version is marked as deleted and the new version is indexed in a new segment. Both versions may match a query, but the old version is removed before the final result is returned.
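The effect of the deletion markers on search can be sketched as follows. The set named deleted plays the role of the .del file, and the layout (segments as lists of dicts, markers as (segment, document) pairs) is an assumption for illustration, not the on-disk format.

```python
def search(segments: list, deleted: set, term: str) -> list:
    """Match across all segments, then drop hits that are marked as deleted."""
    hits = []
    for seg_no, segment in enumerate(segments):
        for doc_no, doc in enumerate(segment):
            if term in doc["text"] and (seg_no, doc_no) not in deleted:
                hits.append(doc)
    return hits


segments = [[{"id": 1, "version": 1, "text": "red shoes"}]]
deleted = set()  # stands in for the .del file: (segment, document) pairs

# Updating document 1: mark the old copy as deleted,
# then index the new copy in a new segment.
deleted.add((0, 0))
segments.append([{"id": 1, "version": 2, "text": "red boots"}])

print(search(segments, deleted, "red"))
```

Both versions contain the term "red", but only version 2 is returned because the old copy is filtered out by its marker.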

edited at: 07-08-2021