Elasticsearch Notes

From Federal Burro of Information
Revision as of 15:14, 12 September 2019 by David (talk | contribs) (→‎also see)
Jump to navigationJump to search

how to secure

Tough, use an app proxy to be sure. For now: local access only. Not designed with security in mind.

to file /etc/elasticsearch/elasticsearch.yml added to the end

script.disable_dynamic: true


quick stuff

HEAD is no longer used, instead use kibana, which is it's own service.

elasticsearch-head and elastic search plugin ( https://github.com/mobz/elasticsearch-head )

_search?search_type=count

{
 "aggs" : {
  "all_users": {
   "terms": {
    "field": "screen_name"
   }
  }
 }
}

list indexes and summary:

curl 'localhost:9200/_cat/indices?v'

show health

curl 'localhost:9200/_cat/health?v'

list nodes:

curl 'localhost:9200/_cat/nodes?v'

delete an index

curl -XDELETE 'http://localhost:9200/twitterindex_v2/'

created an index with mappings from a file:

curl -XPUT localhost:9200/twitterindex_v2 -T 'mapping.1'

get the mappoings for an index

curl -XGET "http://localhost:9200/test-index/_mapping" | jsonlint > mapping

pattern of data import

  1. import data
  2. dump mapping
  3. edit mapping
  4. create new index with new mapping
  5. import data again.


Explicitly mapping date fields

from: http://joelabrahamsson.com/dynamic-mappings-and-dates-in-elasticsearch/

curl -XPUT "http://localhost:9200/myindex" -d'
{
   "mappings": {
      "tweet": {
         "date_detection": false,
         "properties": {
             "postDate": {
                 "type": "date"
             }
         }
      }
   }
}'
curl -XPUT 'https://search-myiotcatcher-eq4tipuq24ltctdtgz5hydwvb4.us-east-1.es.amazonaws.com/iotworld_v4' -H 'Content-Type: application/json' -d'
{
    "container" : {
    "_timestamp" : {"enabled": true, "type":"date", "format": "epoch_second", "store":true, "path" : "timestamp"}
    },
  "mappings": {
    "sensordata": {
        "properties": {
            "temperature": {
                "type": "float"
            },
            "humidity": {
                "type": "float"
            },
            "timestamp": {
                "type": "date"
            }
        }
    }
  }
}
'


curl -XGET 'https://search-myiotcatcher-eq4tipuq24ltctdtgz5hydwvb4.us-east-1.es.amazonaws.com/iotworld_v4/_mapping' | python -m json.tool


curl -XGET 'https://search-myiotcatcher-eq4tipuq24ltctdtgz5hydwvb4.us-east-1.es.amazonaws.com/iotworld_v4/sensordata/_search' | python -m json.tool

Changing mappings

so you don't like the data mapping and you want to change it:

first dump the existing mapping to a file:

curl -XGET 'http://localhost:9200/fitstat_v1/_mapping' | python -m json.tool > fitstat_v1_mapping

then copy that mapping to the new version:

cp fitstat_v1_mapping fitstat_v2_mapping

edit the new mapping, for example adding "type": "nested", to you nested objects.

then create a new index specifying the new mapping:

curl -XPUT 'http://localhost:9200/fitstat_v2' -d @fitstat_v2_mapping

next: extractin from old, puting into new and nuking old.

... FIXME

backup

from: https://www.elastic.co/guide/en/elasticsearch/guide/current/backing-up-your-cluster.html

add to the end of /etc/elasticsearch/elasticsearch.yml :

path.repo: ["/mnt/freenas/dataset_elasticsearch/backup"]
root@keres /mnt/freenas/dataset_elasticsearch/backup # curl -XPUT "http://localhost:9200/_snapshot/freenas_backup" -d'
{
    "type": "fs",
    "settings": {
        "location": "/mnt/freenas/dataset_elasticsearch/backup"
    }
}'


https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html

example searches

{
  "query": { "match_all": {} }
}


{
  "query": { "match": { "filter_level": "low" } }
}
{
  "query": { "match": { "source": "iPad" } },
   "_source": [ "source" , "text"]
}
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "source"
      }
    }
  }
}

"size": 0, - print agg only and not hits. PERFORMANCE!!

{
  "fields": [],
  "sort": [
    {
      "zkb.totalValue": {
        "order": "asc"
      }
    },
    "_score"
  ],
  "query": {
    "range": {
      "zkb.totalValue": {
        "lt": 200000000
      }
    }
  }
}
{
"fields" : [
    "victim.shipTypeID" ,
    "victim.corporationName",
    "victim.characterID" ,
    "victim.characterName"],
"sort" : [
        { "zkb.totalValue" : {"order" : "asc"}},
        "_score"
    ],
    "query": {
    "range": {
      "zkb.totalValue": {
        "lt": 200000000
      }
    }
  }
}

changing-mapping-with-zero-downtime

https://www.elastic.co/blog/changing-mapping-with-zero-downtime

aggregates

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html

https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html

moving data between indexes

Use ElasticDump ( https://www.npmjs.com/package/elasticdump )

1) yum install epel-release

2) yum install nodejs

3) yum install nodejs npm

4) npm install elasticdump

5) cd node_modules/elasticdump/bin

6)

./elasticdump \
  --input=http://192.168.1.1:9200/original \
  --output=http://192.168.1.2:9200/newCopy \
  --type=data
elasticdump \
  --input=http://localhost:9700/.kibana \
  --output=http://localhost:9700/.kibana_read_only \
  --type=mapping

elasticdump \
  --input=http://localhost:9700/.kibana \
  --output=http://localhost:9700/.kibana_read_only \
  --type=data

Dumping to a file

In this example I dump my AWS Elasticsearch cluster to a file.

it's one index with 20k records, not huge.

time /home/david/node_modules/.bin/elasticdump \
  --input=https://search-myiotcatcher-eq4tipuq24ltctdtgz5hydwvb4.us-east-1.es.amazonaws.com/iotworld_v5 \
  --output=/mnt/freenas/dataset_elasticsearch/iotworld_v5/iotworld_v5_mapping.json \
  --type=mapping
time /home/david/node_modules/.bin/elasticdump \
  --input=https://search-myiotcatcher-eq4tipuq24ltctdtgz5hydwvb4.us-east-1.es.amazonaws.com/iotworld_v5 \
  --output=/mnt/freenas/dataset_elasticsearch/iotworld_v5/iotworld_v5.json \
  --type=data

Disk full -> readonly lock

If the disk fills up the indexes will got into "read-only" mode.

reset it like this:

curl -X PUT http://${HOST}:9200/.kibana/_settings -d '
{
"index": {
"blocks": {
"read_only_allow_delete": "false"
}
}
}' -H'Content-Type: application/json'

and you will get back if it worked:

{"acknowledged":true}

Trim data

#!/bin/sh

export HOST=es.staging.thecarrotlab.com
for i in `curl -s -XGET "http://es.staging.thecarrotlab.com:9200/_cat/indices?v" | grep logsta | sort -k 3 -n -r | awk '{print $3}' | tail -n +32`
do
echo $i

curl -XDELETE "http://es.staging.thecarrotlab.com:9200/$i"
echo

done

clean up old logstash indexes 32 days old +

#!/bin/sh
export HOST=servername
for i in `curl -s -XGET "http://$HOST:9200/_cat/indices?v" | grep logsta | sort -k 3 -n -r | awk '{print $3}' | tail -n +32`
do
echo $i
echo curl -XDELETE "http://$HOST:9200/$i"
done

serverconfig notes

stuff I've added to my default config:

# for backups
path.repo: ["/mnt/freenas/dataset_elasticsearch/backup"]
# to disallow remote code execution
script.disable_dynamic: true


/etc/sysconfig/sysconfig/elasticsearch ( grep -v ^# )

DATA_DIR=/data/elasticsearch/data
LOG_DIR=/data/elasticsearch/log
WORK_DIR=data/elasticsearch/tmp
ES_HEAP_SIZE=2g
ES_GC_LOG_FILE=/data/elasticsearch/log/gc.log

Network

/etc/services updated:

$ grep 9200 /etc/services
elasticsearch-rest 9200/tcp             # elasticsearch-restful api
#wap-wsp         9200/tcp                # WAP connectionless session service
wap-wsp         9200/udp                # WAP connectionless session service
$ grep 9300 /etc/services
elasticsearch-transport 9300/tcp        # elasticsearch-transpost
# vrace           9300/tcp                # Virtual Racing Service
vrace           9300/udp                # Virtual Racing Service

python es

sudo pip install elasticsearch
sudo pip install certifi

to read

Also See