Fading Coder



Elasticsearch Usage Guide

Tech Apr 22 9

Elasticsearch is an open-source search engine built on Apache Lucene™, designed to simplify full-text search by exposing a consistent RESTful API while handling the complexity of Lucene internally. Key features include:

  • Distributed, real-time document storage in which every field is indexed and searchable
  • Real-time distributed search and analytics
  • Horizontal scalability supporting hundreds of nodes and petabytes of structured or unstructured data

Elasticsearch is document-oriented: it stores entire JSON documents rather than rows and columns, and indexes the contents of their fields so they can be searched in full text.

Operations

Indexing

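Data in Elasticsearch is stored in indices, and each index holds JSON documents made up of fields. Older versions also grouped documents into types (several per index before 6.0, one per index in 6.x), giving document paths of the form:

/index/type/id

For example, "/corporate/employee/123" represents an employee document with ID 123 in the "corporate" index and "employee" type. Mapping types were deprecated in 7.0 and removed in 8.0, so current versions use the typeless path /index/_doc/id, such as "/corporate/_doc/123".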

Basic Document Operations

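Using the typeless endpoints (Elasticsearch 7.x and later):

  • Index (create, or replace the whole document): PUT /corporate/_doc/123
{
  "name": "Alex",
  "department": "Engineering",
  "position": "Software Developer",
  "skills": ["Java", "Python", "Elasticsearch"]
}
  • Partial update (change only the listed fields): POST /corporate/_update/123
{
  "doc": {
    "position": "Senior Software Developer"
  }
}
  • Delete: DELETE /corporate/_doc/123
  • Check existence: HEAD /corporate/_doc/123 (returns 200 if the document exists, 404 otherwise)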
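For readers who drive these endpoints from code rather than a console, here is a minimal Python sketch using only the standard library. It assumes a local, unsecured cluster at http://localhost:9200 (a hypothetical address) and the typeless 7.x+ endpoints, including POST /corporate/_update/123 for a partial update; the helper name es_request is made up for illustration. It only builds the requests, so it runs without a cluster.

```python
import json
import urllib.request

# Hypothetical address of a local, unsecured cluster.
BASE_URL = "http://localhost:9200"


def es_request(method, path, body=None):
    """Build (but do not send) a request for the Elasticsearch REST API."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


employee = {
    "name": "Alex",
    "department": "Engineering",
    "position": "Software Developer",
    "skills": ["Java", "Python", "Elasticsearch"],
}

ops = {
    # Create the document, or replace it entirely if ID 123 already exists.
    "index": es_request("PUT", "/corporate/_doc/123", employee),
    # Partial update: only the fields under "doc" are changed.
    "update": es_request("POST", "/corporate/_update/123",
                         {"doc": {"position": "Senior Software Developer"}}),
    "delete": es_request("DELETE", "/corporate/_doc/123"),
    # HEAD answers 200 if the document exists and 404 if it does not.
    "exists": es_request("HEAD", "/corporate/_doc/123"),
}

for name, req in ops.items():
    print(name, req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```

Keeping request construction separate from sending makes the calls easy to inspect or log before they hit a real cluster.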

Search Types

Query DSL Search

Search criteria are specified in the request body:

GET /corporate/_search
{
  "query": {
    "match": {
      "department": "Engineering"
    }
  }
}

Lightweight Search

Search parameters are passed directly in the URL:

GET /corporate/_search?q=department:Engineering
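The two search styles carry the same criteria in different places. This small Python sketch builds both against the typeless /corporate/_search path: the Query DSL version as a JSON body, and the lightweight version as a q parameter in Lucene query-string syntax. Encoding the parameter with urlencode keeps spaces and reserved characters intact; it also escapes ":" as %3A, which Elasticsearch decodes back. The variable names are illustrative only.

```python
import json
from urllib.parse import urlencode

# Query DSL: the search criteria travel as JSON in the request body.
dsl_body = json.dumps({"query": {"match": {"department": "Engineering"}}})

# Lightweight search: the criteria travel in the q parameter (Lucene
# query-string syntax), URL-encoded so special characters survive.
uri_path = "/corporate/_search?" + urlencode({"q": "department:Engineering"})

print("GET /corporate/_search", dsl_body)
print("GET", uri_path)
```

The URL form is handy for quick checks from a browser or curl; the body form scales to compound queries that would be unreadable in a URL.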

Query Examples

Phrase Matching

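GET /corporate/_search
{
  "query": {
    "match_phrase": {
      "position": "Software Developer"
    }
  }
}

Returns documents whose position field contains the words "Software" and "Developer" next to each other and in that order. A plain match query, by contrast, matches documents containing either word, in any position.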
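To see why word order and adjacency matter, here is a toy Python model of phrase matching. It is not how Lucene is implemented: analyze is a crude stand-in for the standard analyzer, and all names are hypothetical. It does use Elasticsearch's default position_increment_gap of 100 for text fields, which separates the values of an array. That gap is why "Java Python" does not match as a phrase against a skills array such as ["Java", "Python", "Elasticsearch"], even though a plain match query finds both words.

```python
import re

POSITION_INCREMENT_GAP = 100  # Elasticsearch's default for text fields


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def index_positions(value):
    """Record the positions of each term; array values are separated by the gap."""
    values = value if isinstance(value, list) else [value]
    positions = {}
    pos = 0
    for i, v in enumerate(values):
        if i > 0:
            pos += POSITION_INCREMENT_GAP
        for term in analyze(v):
            positions.setdefault(term, set()).add(pos)
            pos += 1
    return positions


def match_any(positions, query):
    """A match query (default OR operator) needs at least one query term to be present."""
    return any(t in positions for t in analyze(query))


def phrase_match(positions, phrase):
    """match_phrase with slop 0: term i must sit at position start + i for some start."""
    terms = analyze(phrase)
    if not terms or any(t not in positions for t in terms):
        return False
    return any(
        all(start + i in positions[t] for i, t in enumerate(terms))
        for start in positions[terms[0]]
    )


skills = index_positions(["Java", "Python", "Elasticsearch"])
position = index_positions("Software Developer")
print(match_any(skills, "Java Python"))              # True
print(phrase_match(skills, "Java Python"))           # False: values are 100 positions apart
print(phrase_match(position, "Software Developer"))  # True
```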

Highlighting

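Request highlighted snippets that show where the query matched. Each hit then includes a highlight section in which the matched terms are wrapped in <em> tags by default:

GET /corporate/_search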
{
  "query": {
    "match": {
      "about": "data analysis"
    }
  },
  "highlight": {
    "fields": {
      "about": {}
    }
  }
}
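Elasticsearch's highlighters work from the analyzed terms and return fragments under each hit's highlight key. As a rough illustration only, this Python sketch wraps whole-word, case-insensitive occurrences of the query terms in <em> and </em>, the default pre_tags and post_tags. The function name and sample text are hypothetical.

```python
import re


def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap every whole-word, case-insensitive occurrence of a query term in tags."""
    terms = [t for t in re.split(r"\W+", query.lower()) if t]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    # Keep the original casing of the matched word inside the tags.
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("I enjoy data analysis and Data visualisation", "data analysis"))
# I enjoy <em>data</em> <em>analysis</em> and <em>Data</em> visualisation
```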

More Search Options

  • Global Search: GET /_search
  • Multi-Index Search: GET /index1,index2/_search
  • Pagination: GET /_search?from=0&size=10
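  • Filter Search: GET /products/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"status": "active"}},
        {"range": {"price": {"gte": 100, "lte": 500}}}
      ],
      "must_not": [
        {"term": {"category": "deprecated"}}
      ]
    }
  }
}
Clauses under "filter" and "must_not" only include or exclude documents; they do not affect the relevance score, and Elasticsearch can cache them.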
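Pagination and filtering are often combined. This Python sketch (helper names are hypothetical) turns a 1-based page number into from/size, rejects pages beyond the default index.max_result_window of 10,000 hits, and builds the product query with the term and range clauses in filter context rather than must, since they only need a yes/no answer and should not affect scoring.

```python
import json

# Default value of the index.max_result_window setting.
MAX_RESULT_WINDOW = 10_000


def page_params(page, per_page):
    """Translate a 1-based page number into from/size, respecting the default window."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    if start + per_page > MAX_RESULT_WINDOW:
        # Deep pages are rejected by default; search_after is the usual alternative.
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": per_page}


def filtered_search(min_price, max_price, page=1, per_page=10):
    """Build the filtered product search body with pagination added."""
    body = {
        "query": {
            "bool": {
                # Filter context: yes/no matching, no scoring, cacheable.
                "filter": [
                    {"term": {"status": "active"}},
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                ],
                "must_not": [{"term": {"category": "deprecated"}}],
            }
        }
    }
    body.update(page_params(page, per_page))
    return body


print(json.dumps(filtered_search(100, 500, page=3), indent=2))
```

Page 3 with 10 hits per page becomes from 20, size 10; requesting page 1001 raises instead of sending a query the cluster would refuse.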
    
    

Aggregations

Aggregations group documents into buckets and calculate metrics on them.

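Simple Aggregation

Count the documents for each distinct value of a field with a terms aggregation. Setting "size": 0 skips the search hits so that only the aggregation results are returned. The field must be aggregatable, such as a keyword field; if the index relies on dynamic mapping, strings are mapped as text with a keyword subfield, so you would aggregate on "color.keyword" instead:

GET /vehicles/_search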
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      }
    }
  }
}

Result (abridged):

{
  "aggregations": {
    "popular_colors": {
      "buckets": [
        {"key": "red", "doc_count": 120},
        {"key": "blue", "doc_count": 80},
        {"key": "green", "doc_count": 50}
      ]
    }
  }
}
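The bucketing can be mimicked in plain Python to make the semantics concrete: one bucket per distinct value, ordered by doc_count descending (ties are broken by key in this sketch), and truncated to size, which defaults to 10 in Elasticsearch. A document holding an array counts once per distinct value. This is a single-node model with hypothetical names and sample data; on a sharded index each shard returns its own top terms, so real counts can be approximate, which Elasticsearch signals through doc_count_error_upper_bound.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Mimic a terms aggregation: one bucket per distinct value, most frequent first."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        # A document counts once per distinct value it contains.
        for v in set(values):
            if v is not None:
                counts[v] += 1
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {"buckets": [{"key": k, "doc_count": c} for k, c in ordered[:size]]}


sales = [
    {"color": "red"}, {"color": "blue"}, {"color": "red"},
    {"color": "green"}, {"color": "red"}, {"color": "blue"},
]
print(terms_agg(sales, "color"))
```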

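Combined Search and Aggregation

Aggregations are computed over the documents that match the query. match_all includes every document; replacing it with a narrower query restricts the statistics to the matching documents. The stats aggregation returns the count, min, max, avg, and sum of a numeric field:

GET /vehicles/_search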
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}
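A Python sketch of the same idea, with a predicate standing in for the query clause: the statistics cover only documents the predicate accepts. The function name, the make field, and the prices are hypothetical sample data; the empty-result shape follows what Elasticsearch reports for a stats aggregation with no values.

```python
def stats_agg(docs, field, query=lambda doc: True):
    """Mimic a stats aggregation over the documents that match `query`."""
    values = [
        doc[field]
        for doc in docs
        if query(doc)
        and isinstance(doc.get(field), (int, float))
        and not isinstance(doc.get(field), bool)
    ]
    if not values:
        # No values: min/max/avg are null and the sum is zero.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": total,
    }


sales = [
    {"make": "honda", "price": 10000},
    {"make": "ford", "price": 30000},
    {"make": "ford", "price": 25000},
]
print(stats_agg(sales, "price"))                                       # like match_all
print(stats_agg(sales, "price", query=lambda d: d["make"] == "ford"))  # narrower query
```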

Cluster Operations

Cluster Health

GET /_cluster/health

Status values:

  • green: All shards and replicas are active.
  • yellow: All primary shards are active, but some replicas are missing.
  • red: Some primary shards are missing or failed.
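The three status values follow directly from which shard copies are assigned. This Python sketch encodes those rules with a hypothetical ShardCopy record. Its example is the classic single-node case: because Elasticsearch never allocates a replica on the same node as its primary, an index with one replica on one node leaves every replica unassigned and the cluster yellow.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    index: str
    shard: int
    primary: bool
    assigned: bool


def cluster_status(copies):
    """Derive green/yellow/red from shard assignment."""
    if any(c.primary and not c.assigned for c in copies):
        return "red"     # some data is unavailable
    if any(not c.primary and not c.assigned for c in copies):
        return "yellow"  # all data is available, but redundancy is reduced
    return "green"


# One node, index "blogs" with 3 primaries and 1 replica each: the replicas
# cannot share a node with their primaries, so they stay unassigned.
single_node = [ShardCopy("blogs", s, True, True) for s in range(3)] + [
    ShardCopy("blogs", s, False, False) for s in range(3)
]
print(cluster_status(single_node))  # yellow
```

Adding a second node lets the replicas be assigned, which turns this example green.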

Index Configuration

Set index settings at creation:

PUT /blogs
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

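Primary shards split the index's data across nodes, and each replica is a full copy of a primary that provides redundancy and extra capacity for searches. The number of primary shards is fixed when the index is created (changing it requires reindexing or the split and shrink APIs), whereas number_of_replicas can be updated at any time.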
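Elasticsearch chooses a document's primary shard from a hash of its routing value (the _id by default) reduced modulo the number of primary shards. The Python sketch below illustrates why that makes number_of_shards hard to change. It uses zlib.crc32 as a stand-in hash (Elasticsearch actually uses Murmur3 plus an extra routing factor), so the exact shard numbers differ from a real cluster, and the function names are hypothetical. Python's built-in hash() is avoided because it is randomized per process for strings.

```python
import zlib


def shard_for(routing, number_of_shards):
    """Stand-in routing: crc32 replaces Elasticsearch's Murmur3 for illustration."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


ids = [str(i) for i in range(1000)]
with_3 = {i: shard_for(i, 3) for i in ids}
with_4 = {i: shard_for(i, 4) for i in ids}
moved = sum(1 for i in ids if with_3[i] != with_4[i])

print(total_shard_copies(3, 1))  # 6 shard copies for the blogs index above
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

Because most documents would land on a different shard, the primary count cannot change in place; the data has to be rewritten into a new layout.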

Node Types

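Each node has one or more roles, including:

  • Master-eligible nodes: One of them is elected master and manages cluster-wide state, such as creating or deleting indices and tracking which shards live on which nodes.
  • Data nodes: Hold shards and execute indexing, search, and aggregation requests against them.
  • Ingest nodes: Preprocess documents through ingest pipelines before indexing.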

Shard Allocation

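Elasticsearch automatically balances shards across nodes and maintains redundancy through replication. When nodes are added or removed:

  • Shards are relocated between nodes to keep the load balanced.
  • If a node holding a primary shard leaves, one of that shard's replicas is promoted to primary, and new replicas are allocated on the remaining nodes to restore redundancy.
  • A replica is never allocated on the same node as its primary.
  • Any node can coordinate a request: it forwards the request to the nodes holding the relevant shards and merges their results.

Clusters scale horizontally by adding more nodes, with Elasticsearch handling the redistribution transparently.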
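A toy allocator makes the placement rules concrete. This Python sketch, with hypothetical node names and a deliberately naive round-robin placement, only considers nodes with the data role and, like Elasticsearch, never puts two copies of the same shard on one node. Elasticsearch's real allocator also weighs factors such as disk usage and relocates shards incrementally; this sketch simply recomputes the whole table.

```python
from collections import Counter


def allocate(num_primaries, num_replicas, nodes):
    """Place every shard copy on a data node; copies of one shard never share a node.

    Copy c of shard s goes to data node (s + c) mod n. Copies beyond the
    number of data nodes cannot be placed and stay unassigned (None).
    """
    data_nodes = sorted(name for name, roles in nodes.items() if "data" in roles)
    n = len(data_nodes)
    table = {}
    for shard in range(num_primaries):
        for copy in range(1 + num_replicas):
            kind = "primary" if copy == 0 else f"replica{copy}"
            table[(shard, kind)] = data_nodes[(shard + copy) % n] if copy < n else None
    return table


nodes = {
    "node-a": {"master", "data"},
    "node-b": {"data"},
    "node-c": {"data", "ingest"},
    "node-m": {"master"},  # master-only: holds no shards
}
table = allocate(3, 1, nodes)
print(Counter(table.values()))  # two shard copies per data node

# Remove node-b and recompute: every copy still finds a home on the remaining data nodes.
after_loss = allocate(3, 1, {k: v for k, v in nodes.items() if k != "node-b"})
print(after_loss)
```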
