Elasticsearch Document Operations, Mapping Configuration, and Search APIs
Document Lifecycle and Batch Operations
Index a document with auto-generated identifier:
POST inventory/_doc
{
"item_code": "SKU-8842",
"timestamp": "2023-08-21T14:35:22Z",
"notes": "Initial stock entry"
}
Create with explicit ID, failing if document exists:
PUT inventory/_doc/8842?op_type=create
{
"item_code": "SKU-8842",
"timestamp": "2023-08-21T15:00:00Z",
"notes": "Reserved inventory"
}
Alternative syntax for conditional creation:
PUT inventory/_create/8842
{
"item_code": "SKU-8842",
"timestamp": "2023-08-21T15:00:00Z",
"notes": "Reserved inventory"
}
Retrieve by identifier:
GET inventory/_doc/8842
Full document replacement:
PUT inventory/_doc/8842
{
"item_code": "SKU-8842",
"status": "active"
}
Partial update with doc merging:
POST inventory/_update/8842
{
"doc": {
"last_updated": "2023-08-21T16:00:00Z",
"warehouse": "East-01"
}
}
Remove document:
DELETE inventory/_doc/8842
Bulk Processing
Execute multiple operations in a single request. Bulk requests are not atomic: each action succeeds or fails independently. The create action fails on duplicates, while index creates or replaces the target document:
POST _bulk
{"index":{"_index":"transactions","_id":"txn-001"}}
{"amount":150.00,"currency":"USD"}
{"delete":{"_index":"transactions","_id":"txn-099"}}
{"create":{"_index":"archive","_id":"txn-001"}}
{"amount":150.00,"archived":true}
{"update":{"_id":"txn-001","_index":"transactions"}}
{"doc":{"status":"processed"}}
First execution creates each document at version 1. Re-running the same bulk request overwrites existing documents (incrementing _version) when using index, fails on create where the ID already exists, and returns not_found for deletions of non-existent documents.
Multi-Document Retrieval
Fetch multiple documents across indices:
GET /_mget
{
"docs": [
{"_index": "transactions", "_id": "txn-001"},
{"_index": "transactions", "_id": "txn-002"}
]
}
With implicit index context:
GET /transactions/_mget
{
"docs": [
{"_id": "txn-001"},
{"_id": "txn-002"}
]
}
Control source field inclusion:
GET /_mget
{
"docs": [
{"_index": "transactions", "_id": "txn-001", "_source": false},
{"_index": "transactions", "_id": "txn-002", "_source": ["amount", "currency"]},
{"_index": "transactions", "_id": "txn-003", "_source": {"include": ["metadata"], "exclude": ["metadata.internal"]}}
]
}
Inverted Index Fundamentals
Elasticsearch utilizes inverted indices for high-performance full-text retrieval. This structure maintains a vocabulary of unique terms extracted from the document corpus, with each term mapping to a posting list containing document references and positional metadata.
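As a simplified sketch, two short documents and the posting lists they would generate (real postings also carry term frequencies and positions):
Doc 1: "quick fox"
Doc 2: "lazy fox"
Term | Postings
fox | [1, 2]
lazy | [2]
quick | [1]
A search for "fox" resolves to the posting list [1, 2] directly, without scanning document contents.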
Text Analysis and Tokenization
Built-in analyzers process text during indexing and search:
- Standard: Grammar-aware tokenization with lowercase normalization
- Simple: Non-letter character delimiting with lowercase conversion
- Stop: Simple analyzer with stop word removal (articles, prepositions)
- Whitespace: Space-delimited splitting preserving original case
- Keyword: No-op analyzer treating input as single token
- Pattern: Regular expression-based splitting (default: non-word characters)
- Language: Language-specific tokenization with stemming (30+ languages available)
Standard Analyzer Behavior:
GET _analyze
{
"analyzer": "standard",
"text": "The Quick Brown-Fox jumps over 3 lazy dogs!"
}
Produces tokens: the, quick, brown, fox, jumps, over, 3, lazy, dogs
Simple Analyzer Behavior:
GET _analyze
{
"analyzer": "simple",
"text": "The Quick Brown-Fox jumps over 3 lazy dogs!"
}
Produces tokens: the, quick, brown, fox, jumps, over, lazy, dogs (numeric tokens excluded)
Stop Analyzer Behavior:
GET _analyze
{
"analyzer": "stop",
"text": "The Quick Brown-Fox jumps over 3 lazy dogs!"
}
Produces tokens: quick, brown, fox, jumps, over, lazy, dogs (removes "the"; "over" is not in the default English stop list, and numbers are dropped because the stop analyzer builds on the simple analyzer)
Whitespace Analyzer Behavior:
GET _analyze
{
"analyzer": "whitespace",
"text": "The Quick Brown-Fox jumps over 3 lazy dogs!"
}
Preserves case and punctuation: The, Quick, Brown-Fox, jumps, over, 3, lazy, dogs!
Keyword Analyzer Behavior:
GET _analyze
{
"analyzer": "keyword",
"text": "The Quick Brown-Fox jumps over 3 lazy dogs!"
}
Single token: The Quick Brown-Fox jumps over 3 lazy dogs!
English Analyzer with Stemming:
GET _analyze
{
"analyzer": "english",
"text": "The Quick Brown-Fox jumps over 3 lazy dogs!"
}
Produces stemmed tokens: quick, brown, fox, jump, over, 3, lazi, dog
CJK Text Processing (icu_analyzer requires the analysis-icu plugin):
POST _analyze
{
"analyzer": "icu_analyzer",
"text": "搜索引擎技术"
}
Segments text into meaningful units: 搜索, 引擎, 技术
Compared to Standard analyzer on CJK text which produces single character tokens: 搜, 索, 引, 擎, 技, 术
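The contrast can be verified directly by running the same text through the Standard analyzer, which has no CJK segmentation rules:
POST _analyze
{
"analyzer": "standard",
"text": "搜索引擎技术"
}
Returns one token per character: 搜, 索, 引, 擎, 技, 术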
Search APIs
URI-based Search:
GET ecommerce/_search?q=customer_name:Alice
GET ecommerce*/_search?q=status:pending
GET /_all/_search?q=amount:>100
Request Body Search:
POST sales/_search
{
"query": {
"match": {
"description": "laptop computer"
}
}
}
Relevance Metrics:
- Precision: Ratio of relevant documents in retrieved results
- Recall: Ratio of retrieved relevant documents to total relevant documents
- Ranking: Ordering by relevance score
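A worked example: a query returns 10 documents, 8 of which are relevant, while the index contains 40 relevant documents in total. Precision = 8/10 = 0.8; recall = 8/40 = 0.2. Tuning for broader matching typically raises recall at the cost of precision, and vice versa.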
URI Search Syntax
Query string parameters:
GET /products/_search?q=wireless&df=name&sort=price:asc&from=0&size=20&timeout=1s
- q: Query expression using Query String Syntax
- df: Default field to search (all fields if omitted)
- sort, from, size: Sorting and pagination controls
- profile: Include execution plan details
Field Specification vs Generic Search:
GET /products/_search?q=name:headphones
GET /products/_search?q=headphones
Term vs Phrase Queries:
GET /products/_search?q=name:wireless headphones
Executes as: name:wireless OR headphones, where the unqualified second term searches the default field, not name
GET /products/_search?q=name:"wireless headphones"
Executes as phrase match requiring exact word adjacency and order.
Boolean Logic:
GET /products/_search?q=name:(wireless AND headphones)
GET /products/_search?q=name:(wireless NOT bluetooth)
GET /products/_search?q=name:(+wireless +noise-canceling)
Range Queries:
GET /products/_search?q=price:>100
GET /products/_search?q=created:[2023-01-01 TO 2023-12-31}
GET /products/_search?q=stock:[10 TO *]
Square brackets denote inclusive bounds, curly braces exclusive bounds; the two can be mixed, and * leaves a bound open.
Wildcard and Fuzzy Matching:
GET /products/_search?q=name:head*
GET /products/_search?q=name:headphon~1
Query DSL
Match query with OR logic (default):
POST articles/_search
{
"query": {
"match": {
"content": "machine learning"
}
}
}
Match query with AND logic:
POST articles/_search
{
"query": {
"match": {
"content": {
"query": "machine learning",
"operator": "and"
}
}
}
}
Phrase matching:
POST articles/_search
{
"query": {
"match_phrase": {
"content": {
"query": "artificial intelligence"
}
}
}
}
Phrase with slop (word distance tolerance):
POST articles/_search
{
"query": {
"match_phrase": {
"content": {
"query": "artificial general",
"slop": 2
}
}
}
}
Multi-field phrase match:
POST articles/_search
{
"query": {
"multi_match": {
"query": "neural networks",
"type": "phrase",
"fields": ["title", "abstract", "content"],
"slop": 1
}
}
}
Query String Syntax:
POST users/_search
{
"query": {
"query_string": {
"default_field": "bio",
"query": "(Java AND Spring) OR (Python AND Django)"
}
}
}
Simple Query String:
POST users/_search
{
"query": {
"simple_query_string": {
"query": "developer engineer",
"fields": ["title", "skills"],
"default_operator": "AND"
}
}
}
Note: Simple Query String supports only its own limited operator set (+ for AND, | for OR, - for NOT, quotes for phrases). It never raises syntax errors; reserved characters it cannot parse are discarded or treated as literals.
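For comparison, roughly the same intent as the query_string example above, expressed with Simple Query String's own operators (+ for AND, | for OR; field choice is illustrative):
POST users/_search
{
"query": {
"simple_query_string": {
"query": "(Java + Spring) | (Python + Django)",
"fields": ["bio"]
}
}
}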
Mapping Configuration
Dynamic Mapping Behavior:
PUT sensor_data/_doc/1
{
"device_id": "dev-001",
"temperature": 23.5,
"is_active": true,
"metadata": {
"location": "building-a"
}
}
Elasticsearch infers:
- device_id: text with keyword subfield
- temperature: float
- is_active: boolean
- metadata: object
Dynamic Mapping Controls:
- dynamic: true (default) - New fields indexed and searchable
- dynamic: false - New fields ignored in indexing but stored in _source
- dynamic: strict - Reject documents with unrecognized fields (HTTP 400)
PUT strict_index
{
"mappings": {
"dynamic": "strict",
"properties": {
"title": { "type": "text" }
}
}
}
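With strict mapping in place, indexing a document that carries an unmapped field is rejected (the extra field name below is illustrative):
PUT strict_index/_doc/1
{
"title": "Quarterly report",
"author": "unknown"
}
Returns HTTP 400 with a strict_dynamic_mapping_exception.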
Explicit Mapping Definition:
PUT customer_profiles
{
"mappings": {
"properties": {
"email": {
"type": "keyword",
"index": false
},
"full_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"phone": {
"type": "keyword",
"null_value": "N/A"
},
"join_date": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
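The null_value substitution applies at index time only; _source keeps the original null. A sketch against the mapping above:
PUT customer_profiles/_doc/1
{
"full_name": "Ada Example",
"phone": null
}
GET customer_profiles/_search?q=phone:N/A
The search matches the document even though its stored phone value is null.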
Copy_to for Composite Fields:
PUT contact_directory
{
"mappings": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
Search across both fields:
GET contact_directory/_search?q=full_name:(John Smith)
Array Handling:
Arrays require no special mapping configuration. Any field accepting single values accepts multiple values:
PUT tags/_doc/1
{
"category": "electronics",
"labels": ["gadget", "mobile", "wireless"]
}
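All values in an array must share the same mapped type, and a query matches if any element matches. A hypothetical lookup against the document above:
GET tags/_search?q=labels:mobile
Matches document 1 because one element of the labels array is "mobile".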
Custom Analysis Pipelines
Character Filters:
Remove HTML entities:
POST _analyze
{
"tokenizer": "standard",
"char_filter": ["html_strip"],
"text": "<p>Device configuration</p>"
}
Mapping filter for synonym preprocessing:
POST _analyze
{
"tokenizer": "standard",
"char_filter": [
{
"type": "mapping",
"mappings": ["- => _", ":) => positive", ":( => negative"]
}
],
"text": "State-of-the-art :)"
}
Tokenizers:
Path hierarchy tokenizer:
POST _analyze
{
"tokenizer": "path_hierarchy",
"text": "/var/log/elasticsearch/cluster/nodes"
}
Produces: /var, /var/log, /var/log/elasticsearch, etc.
Token Filters:
Snowball stemmer with stop word removal:
GET _analyze
{
"tokenizer": "whitespace",
"filter": ["lowercase", "stop", "snowball"],
"text": "The computers are computing computational problems"
}
Produces: comput, comput, comput, problem ("the" and "are" removed as stop words; the remaining terms reduced to their stems)
Custom Analyzer Definition:
PUT log_index
{
"settings": {
"analysis": {
"analyzer": {
"log_analyzer": {
"type": "custom",
"tokenizer": "standard",
"char_filter": ["html_strip"],
"filter": ["lowercase", "stop"]
}
}
}
},
"mappings": {
"properties": {
"message": {
"type": "text",
"analyzer": "log_analyzer"
}
}
}
}
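The custom analyzer can be exercised directly against its index via the _analyze API (sample text is illustrative):
GET log_index/_analyze
{
"analyzer": "log_analyzer",
"text": "<b>Connection REFUSED on port 8080</b>"
}
Produces: connection, refused, port, 8080 (tags stripped, terms lowercased, stop words removed)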
Index Templates
Index templates apply configurations to new indices matching patterns:
PUT _template/base_configuration
{
"index_patterns": ["logs-*", "metrics-*"],
"order": 0,
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1
},
"mappings": {
"dynamic_templates": [
{
"strings_as_keywords": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
Template inheritance by order (higher values override lower):
PUT _template/priority_configuration
{
"index_patterns": ["logs-critical-*"],
"order": 1,
"settings": {
"number_of_replicas": 2,
"index.refresh_interval": "1s"
}
}
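Note that _template is the legacy template API; from Elasticsearch 7.8 onward, composable templates via _index_template are preferred, with priority taking the role of order. A sketch of the equivalent definition:
PUT _index_template/priority_configuration
{
"index_patterns": ["logs-critical-*"],
"priority": 1,
"template": {
"settings": {
"number_of_replicas": 2,
"index.refresh_interval": "1s"
}
}
}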
Dynamic Templates
Dynamic templates control field type inference based on naming conventions:
PUT dynamic_content
{
"mappings": {
"dynamic_templates": [
{
"boolean_flags": {
"match_mapping_type": "string",
"match": "is_*",
"unmatch": "*_text",
"mapping": {
"type": "boolean"
}
}
},
{
"text_content": {
"match_mapping_type": "string",
"mapping": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
]
}
}
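A document exercising both rules (field names are hypothetical):
PUT dynamic_content/_doc/1
{
"is_verified": "true",
"summary_text": "quarterly summary"
}
is_verified matches is_* and is mapped as boolean; summary_text is excluded by unmatch, falls through to the text_content rule, and becomes text with a keyword subfield.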
Path-based matching:
PUT hierarchical_data
{
"mappings": {
"dynamic_templates": [
{
"path_based_copy": {
"path_match": "user.*",
"path_unmatch": "user.password",
"mapping": {
"type": "text",
"copy_to": "user_profile"
}
}
}
]
}
}
Aggregation Framework
Aggregations enable data summarization and analytics:
Bucket Aggregation:
GET orders/_search
{
"size": 0,
"aggs": {
"by_category": {
"terms": {
"field": "category.keyword"
}
}
}
}
Metric Aggregations within Buckets:
GET orders/_search
{
"size": 0,
"aggs": {
"by_category": {
"terms": {
"field": "category.keyword"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
},
"max_price": {
"max": {
"field": "price"
}
},
"min_price": {
"min": {
"field": "price"
}
}
}
}
}
}
Nested Sub-aggregations:
GET orders/_search
{
"size": 0,
"aggs": {
"by_region": {
"terms": {
"field": "region"
},
"aggs": {
"price_stats": {
"stats": {
"field": "total_amount"
}
},
"by_payment_method": {
"terms": {
"field": "payment_type",
"size": 5
}
}
}
}
}
}
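Aggregations can also be scoped by a query, so that only matching documents are summarized (field names follow the examples above):
GET orders/_search
{
"size": 0,
"query": {
"range": {
"total_amount": { "gte": 100 }
}
},
"aggs": {
"by_region": {
"terms": {
"field": "region"
}
}
}
}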
Schema Evolution
Add fields to existing mappings:
PUT existing_index/_mapping
{
"properties": {
"processed_date": {
"type": "date",
"format": "yyyy-MM-dd"
},
"processing_time": {
"type": "integer"
}
}
}
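Existing field types cannot be changed in place. The usual path is to create a new index with the desired mapping and copy documents across with the _reindex API (index names are illustrative):
POST _reindex
{
"source": { "index": "existing_index" },
"dest": { "index": "existing_index_v2" }
}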
Performance Tuning
Script Compilation Limits:
When encountering max_compilations_rate errors during bulk updates:
PUT _cluster/settings
{
"transient": {
"script.max_compilations_rate": "1000/1m"
}
}
Result Window Expansion:
For deep pagination requirements:
PUT large_dataset/_settings
{
"index": {
"max_result_window": 100000
}
}
Alternatively, use search_after or scroll APIs for deep pagination instead of expanding max_result_window.
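A search_after sketch: sort on a stable key, then feed the last hit's sort values into the next request (the timestamp field is assumed; the first page simply omits search_after):
GET large_dataset/_search
{
"size": 100,
"sort": [
{ "timestamp": "asc" }
],
"search_after": ["2023-08-21T14:35:22Z"],
"query": { "match_all": {} }
}
Because each page resumes from explicit sort values rather than a from offset, memory cost stays flat regardless of depth.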
Total Hit Count Accuracy:
For accurate hit counts beyond 10,000 documents:
POST large_dataset/_search
{
"track_total_hits": true,
"query": {
"match_all": {}
}
}