Elasticsearch Usage Guide
Elasticsearch is an open-source search engine built on Apache Lucene™, designed to simplify full-text search by exposing a consistent RESTful API while handling the complexity of Lucene internally. Key features include:
- Distributed, near-real-time document storage in which every field can be indexed
- Near-real-time distributed search and analytics
- Horizontal scalability to hundreds of nodes and petabytes of structured or unstructured data
Elasticsearch operates on documents rather than rows and columns, enabling powerful full-text search capabilities.
Operations
Indexing
Data in Elasticsearch is stored in indices. An index can contain multiple document types, each holding numerous documents with various properties. A document's path is structured as:
/index/type/id
For example: "/corporate/employee/123" represents an employee document with ID 123 in the "corporate" index and "employee" type.
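The path layout above can be sketched as a small helper. This is only an illustration: `document_path` is a hypothetical name, not part of any Elasticsearch client, and it assumes each path segment should be URL-escaped.

```python
from urllib.parse import quote


def document_path(index, doc_type, doc_id):
    """Join index, type, and ID into a /index/type/id path, escaping each part."""
    parts = (index, doc_type, str(doc_id))
    return "/" + "/".join(quote(part, safe="") for part in parts)


print(document_path("corporate", "employee", 123))  # /corporate/employee/123
```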
Basic Document Operations
- Insert: POST /corporate/employee/123
{
"name": "Alex",
"department": "Engineering",
"position": "Software Developer",
"skills": ["Java", "Python", "Elasticsearch"]
}
- Update (replaces the whole document): PUT /corporate/employee/123
- Delete: DELETE /corporate/employee/123
- Check Existence: HEAD /corporate/employee/123
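As a rough sketch, the four operations can be prepared from Python with only the standard library. The base URL `http://localhost:9200` and the helper name `build_request` are assumptions for illustration, and the path is the `/corporate/employee/123` example from the Indexing section. Nothing is sent here; the requests are only constructed.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local cluster on the default port


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for one document operation."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


employee = {
    "name": "Alex",
    "department": "Engineering",
    "position": "Software Developer",
    "skills": ["Java", "Python", "Elasticsearch"],
}
path = "/corporate/employee/123"

insert = build_request("POST", path, employee)   # create the document
update = build_request("PUT", path, employee)    # replace the whole document
delete = build_request("DELETE", path)           # remove the document
exists = build_request("HEAD", path)             # status code only, no body
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running cluster.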
Search Types
Query DSL Search
Search criteria are specified in the request body:
GET /corporate/employee/_search
{
"query": {
"match": {
"department": "Engineering"
}
}
}
Lightweight Search
Search parameters are passed directly in the URL:
GET /corporate/employee/_search?q=department:Engineering
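The two search styles differ only in where the criteria travel. The sketch below builds both forms for the same field and text; `dsl_search` and `lightweight_search` are hypothetical helper names, and the colon is deliberately left unescaped so the URL matches the form shown above.

```python
import json
from urllib.parse import urlencode


def dsl_search(index, doc_type, field, text):
    """Request-body form: return (path, JSON body) for a match query."""
    path = f"/{index}/{doc_type}/_search"
    body = {"query": {"match": {field: text}}}
    return path, json.dumps(body)


def lightweight_search(index, doc_type, field, text):
    """URL form: return a path whose q parameter carries field:text."""
    query = urlencode({"q": f"{field}:{text}"}, safe=":")  # keep ':' readable
    return f"/{index}/{doc_type}/_search?{query}"


print(dsl_search("corporate", "employee", "department", "Engineering"))
print(lightweight_search("corporate", "employee", "department", "Engineering"))
```

The URL form is convenient for quick checks, while the request-body form scales better to nested or combined queries.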
Query Examples
Phrase Matching
GET /corporate/employee/_search
{
"query": {
"match_phrase": {
"skills": "Java Python"
}
}
}
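Returns only documents where "Java" and "Python" appear next to each other, in that order, within a single value of the skills field. By default, separate elements of an array are not treated as adjacent.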
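The difference between a phrase match and a plain term match can be illustrated in memory. This is a simplified model, not how Elasticsearch is implemented: it assumes whitespace tokenization and lowercasing in place of a real analyzer, and the function names are made up for the example.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matches_phrase(field_text, phrase):
    """True if the phrase tokens occur consecutively and in order."""
    tokens = tokenize(field_text)
    wanted = tokenize(phrase)
    n = len(wanted)
    return n > 0 and any(
        tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
    )


def matches_any_term(field_text, query):
    """True if at least one query token occurs anywhere in the field."""
    tokens = set(tokenize(field_text))
    return any(term in tokens for term in tokenize(query))


print(matches_phrase("Java Python Elasticsearch", "Java Python"))  # True
print(matches_phrase("Python and Java", "Java Python"))            # False
print(matches_any_term("Python and Java", "Java Python"))          # True
```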
Highlighting
Add highlighted matches to the response:
GET /corporate/employee/_search
{
"query": {
"match": {
"about": "data analysis"
}
},
"highlight": {
"fields": {
"about": {}
}
}
}
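To give a feel for what a highlighted fragment looks like, the sketch below wraps each matched query term in `<em>` tags, which are the default highlight tags. It is a local approximation using a regular expression; the real highlighter works on analyzed text and returns fragments rather than the whole field. The `highlight` function is hypothetical.

```python
import re


def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap every whole-word occurrence of a query term in highlight tags."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


print(highlight("She enjoys data analysis and data pipelines", "data analysis"))
# She enjoys <em>data</em> <em>analysis</em> and <em>data</em> pipelines
```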
Index Management
- Global Search: GET /_search
- Multi-Index Search: GET /index1,index2/_search
- Pagination: GET /_search?from=0&size=10
- Filter Search: GET /products/_search
{
"query": {
"bool": {
"must": [
{"term": {"status": "active"}},
{"range": {"price": {"gte": 100, "lte": 500}}}
],
"must_not": [{"term": {"category": "deprecated"}}]
}
}
}
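The filter and pagination semantics above can be mimicked on a plain list to see which documents would qualify. This is a sketch with made-up sample data: `matches_filter` hard-codes the same clauses as the bool query, and `page` imitates the `from` and `size` parameters.

```python
def matches_filter(doc):
    """Every must clause has to hold, and no must_not clause may hold."""
    price = doc.get("price")
    must = (
        doc.get("status") == "active"
        and price is not None
        and 100 <= price <= 500
    )
    must_not = doc.get("category") == "deprecated"
    return must and not must_not


def page(hits, from_=0, size=10):
    """Return one page of hits, like ?from=...&size=..."""
    return hits[from_:from_ + size]


# Hypothetical sample documents
products = [
    {"id": 1, "status": "active", "price": 250, "category": "tools"},
    {"id": 2, "status": "active", "price": 50, "category": "tools"},
    {"id": 3, "status": "inactive", "price": 300, "category": "tools"},
    {"id": 4, "status": "active", "price": 400, "category": "deprecated"},
    {"id": 5, "status": "active", "price": 500, "category": "toys"},
]

hits = [doc for doc in products if matches_filter(doc)]
print([doc["id"] for doc in page(hits, from_=0, size=10)])  # [1, 5]
```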
Aggregations
Aggregations group documents into buckets and calculate metrics on them.
Simple Aggregation
Count documents per distinct value of a field:
GET /vehicles/sales/_search
{
"size": 0,
"aggs": {
"popular_colors": {
"terms": {
"field": "color"
}
}
}
}
Result:
{
"aggregations": {
"popular_colors": {
"buckets": [
{"key": "red", "doc_count": 120},
{"key": "blue", "doc_count": 80},
{"key": "green", "doc_count": 50}
]
}
}
}
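Conceptually, a terms aggregation is a group-and-count. The sketch below reproduces the bucket shape with `collections.Counter` on a few made-up documents; the `size` default of 10 mirrors the default number of buckets returned, and `terms_aggregation` is a hypothetical name.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Bucket documents by field value, most frequent first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]


# Hypothetical sample data
sales = (
    [{"color": "red"}] * 3
    + [{"color": "blue"}] * 2
    + [{"color": "green"}]
)
print(terms_aggregation(sales, "color"))
```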
Combined Search and Aggregation
GET /vehicles/sales/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"price_stats": {
"stats": {
"field": "price"
}
}
}
}
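The stats aggregation returns count, min, max, avg, and sum for a numeric field. A local equivalent, assuming a list of dictionaries as stand-in documents and a hypothetical `stats_aggregation` helper, might look like this:

```python
def stats_aggregation(docs, field):
    """Compute count, min, max, avg, and sum over one numeric field."""
    values = [doc[field] for doc in docs if field in doc]
    if not values:
        # Assumption: mirror an empty result with nulls and a zero sum
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


# Hypothetical sample data
sales = [{"price": 10000}, {"price": 20000}, {"price": 30000}]
print(stats_aggregation(sales, "price"))
```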
Cluster Operations
Cluster Health
GET /_cluster/health
Status values:
- green: All primary and replica shards are active.
- yellow: All primary shards are active, but at least one replica shard is not.
- red: At least one primary shard is not active.
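The three status values follow directly from the state of the shards. As a minimal sketch, assuming each shard is described by a hypothetical `(is_primary, is_active)` pair, the rule can be written as:

```python
def cluster_status(shards):
    """Derive green/yellow/red from (is_primary, is_active) pairs."""
    shards = list(shards)
    if any(primary and not active for primary, active in shards):
        return "red"      # a primary is missing
    if any(not primary and not active for primary, active in shards):
        return "yellow"   # primaries fine, a replica is missing
    return "green"        # everything is active


print(cluster_status([(True, True), (False, True)]))    # green
print(cluster_status([(True, True), (False, False)]))   # yellow
print(cluster_status([(True, False), (False, True)]))   # red
```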
Index Configuration
Set index settings at creation:
PUT /blogs
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1
}
}
Shards are distributed across nodes for load balancing and redundancy.
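The two settings determine how many shard copies the cluster has to place. The arithmetic below is a small worked example; both function names are hypothetical. It relies on the fact that a replica is never allocated to the same node as its primary, so a single-node cluster cannot host the replicas.

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Total shard copies: every primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_all_copies(number_of_replicas):
    """Nodes needed so each copy of a shard can sit on a different node."""
    return number_of_replicas + 1


print(shard_copies(3, 1))            # 6 (3 primaries + 3 replicas)
print(min_nodes_for_all_copies(1))   # 2
```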
Node Types
Nodes in Elasticsearch can be:
- Master nodes: Manage cluster-wide operations like index creation.
- Data nodes: Store data and perform searches.
- Ingest nodes: Preprocess documents before indexing.
Shard Allocation
Elasticsearch automatically balances shards across nodes and maintains redundancy through replication. When nodes are added or removed:
- Shards are relocated so that each node holds a similar share.
- Replicas lost with a removed node are re-created on the remaining nodes to preserve data availability.
- Queries are routed to the nodes that hold the relevant shards.
Clusters scale horizontally by adding more nodes, with Elasticsearch handling the distribution transparently.
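The effect of adding nodes can be illustrated with a toy allocator. This is not Elasticsearch's allocation algorithm, only a simplified model under two assumptions: each shard copy goes to the node currently holding the fewest copies, and two copies of the same shard never share a node. The `allocate` function and the node names are made up for the example.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Place shard copies on the least-loaded node that lacks that shard."""
    assignment = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        used = set()  # nodes already holding a copy of this shard
        for copy in range(1 + num_replicas):
            label = f"P{shard}" if copy == 0 else f"R{shard}"
            target = min(
                (node for node in nodes if node not in used),
                key=lambda node: len(assignment[node]),
                default=None,
            )
            if target is None:
                unassigned.append(label)  # no eligible node left
            else:
                assignment[target].append(label)
                used.add(target)
    return assignment, unassigned


print(allocate(3, 1, ["node-1"]))
print(allocate(3, 1, ["node-1", "node-2", "node-3"]))
```

With one node, every replica stays unassigned, which corresponds to a yellow cluster. With three nodes, each node ends up with two of the six copies.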