Bulk Format
Last updated: 2022-04-01 00:39:09
### Why the funny format?
When we learned about bulk requests earlier, you may have asked yourself: "Why does the `bulk` API require the funny format with the newline characters, instead of just sending the requests wrapped in a JSON array, like the `mget` API?"
To answer this, we need to explain a little background:
Each document referenced in a bulk request may belong to a different primary shard, each of which may be allocated to any of the nodes in the cluster. This means that every _action_ inside a `bulk` request needs to be forwarded to the correct shard on the correct node.
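To make the routing step concrete, here is a minimal sketch of how a document might be mapped to a shard and then to a node. It assumes the simplest form of routing, a hash of the routing value (the document `_id` by default) modulo the number of primary shards. The shard count, the `SHARD_TO_NODE` table, and both function names are hypothetical, and `zlib.crc32` is only a stand-in hash, not the one Elasticsearch uses.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # hypothetical index setting

# Hypothetical allocation table: primary shard number -> node name.
SHARD_TO_NODE = {0: "node-1", 1: "node-2", 2: "node-3", 3: "node-1", 4: "node-2"}


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a primary shard from a routing value (the document _id by default)."""
    # crc32 is a stand-in for illustration; it is NOT the hash Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def group_by_node(doc_ids):
    """Group document IDs by the node that holds their primary shard."""
    groups = {}
    for doc_id in doc_ids:
        node = SHARD_TO_NODE[shard_for(doc_id)]
        groups.setdefault(node, []).append(doc_id)
    return groups
```

The point of the sketch is only that the destination of each action depends on a small piece of metadata, not on the document body itself.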
If the individual requests were wrapped up in a JSON array, that would mean that we would need to:
- parse the JSON into an array (including the document data, which can be very large)
- look at each request to determine which shard it should go to
- create an array of requests for each shard
- serialize these arrays into the internal transport format
- send the requests to each shard
It would work, but would need a lot of RAM to hold copies of essentially the same data, and would create many more data structures that the JVM would have to spend time garbage collecting.
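The JSON-array approach described above could be sketched roughly as follows. The array layout (objects with `action`, `_id`, and `doc` keys) is a hypothetical format invented for this illustration, since the `bulk` API does not accept one. The function name `route_json_array` and the `pick_shard` callback are likewise assumptions.

```python
import json


def route_json_array(body: bytes, pick_shard):
    """Naive routing: parse everything, regroup, then serialize again."""
    # Step 1: parse the whole payload, including every (possibly large) document.
    requests = json.loads(body)

    # Steps 2-3: look at each request and build a new array per shard.
    per_shard = {}
    for request in requests:
        per_shard.setdefault(pick_shard(request["_id"]), []).append(request)

    # Step 4: serialize each per-shard array again before sending it on.
    return {
        shard: json.dumps(reqs).encode("utf-8")
        for shard, reqs in per_shard.items()
    }
```

Every document is held in memory at least three times here: once in the raw request, once as parsed objects, and once in the re-serialized output.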
Instead, Elasticsearch reaches up into the networking buffer, where the raw request has been received, and reads the data directly. It uses the newline characters to identify and parse just the small _action/metadata_ lines in order to decide which shard should handle each request.
These raw requests are forwarded directly to the correct shard. There is no redundant copying of data, no wasted data structures. The entire request process is handled in the smallest amount of memory possible.
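The newline-delimited approach can be approximated in a few lines. This is only a sketch of the idea, not Elasticsearch's actual implementation: `route_bulk` and `pick_shard` are hypothetical names, every action is assumed to carry an explicit `_id`, and a Python `memoryview` stands in for reading directly from the network buffer.

```python
import json


def route_bulk(body: bytes, pick_shard):
    """Group raw bulk lines by shard, parsing only the action/metadata lines."""
    view = memoryview(body)  # slicing a memoryview does not copy the data
    per_shard = {}
    pos = 0
    while pos < len(body):
        end = body.index(b"\n", pos)       # every line must end with a newline
        action_line = view[pos:end + 1]    # slice includes the trailing newline
        pos = end + 1

        # Parse only the small action/metadata line, e.g. {"index": {"_id": "1"}}.
        (action, meta), = json.loads(bytes(action_line)).items()
        lines = per_shard.setdefault(pick_shard(meta["_id"]), [])
        lines.append(action_line)

        if action != "delete":             # a delete has no document line
            end = body.index(b"\n", pos)
            lines.append(view[pos:end + 1])  # document line is passed on unparsed
            pos = end + 1
    return per_shard
```

A request body such as `b'{"index": {"_id": "1"}}\n{"title": "first"}\n{"delete": {"_id": "2"}}\n'` is split into per-shard lists of slices that all point into the original buffer. The document line `{"title": "first"}` is never parsed as JSON.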