Parsing data from ElasticSearch in R

   Data indexed in ElasticSearch can be read into R with little effort.
   In this post, I show a simple way to read data from ElasticSearch in R. We need the elastic package, which provides several functions for working with ElasticSearch from R; install it first if it is not already available on your machine.
   We'll start by loading the elastic library.

> library(elastic)

   The image below shows a Kibana page displaying the ElasticSearch data. Our goal is to read those rows and collect them into a data frame. I assume that ElasticSearch and Kibana are already installed on your machine, and that you have similar data on your ElasticSearch server.






To connect to the ElasticSearch server, we use the connect() function.
By default it targets localhost (127.0.0.1) on port 9200 with no username or password, so it works as-is when ElasticSearch runs on your own machine. Otherwise, you need to specify the server.

> connect()
transport:  http 
host:       127.0.0.1 
port:       9200 
path:       NULL 
username:   NULL 
password:   <secret> 
errors:     simple 
headers (names):  NULL

To point at a different server, pass its address in es_host:

> connect(es_host = "127.0.0.1")  # change ip
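
The argument names accepted by connect() depend on the installed version of the elastic package. The sketch below assumes, as far as I understand the package's release history, that versions before 1.0.0 use the es_-prefixed arguments shown in this post, while 1.0.0 and later use host and port and return a connection object that must be passed as the first argument to functions such as Search. The wrapper name es_connect and the address in the usage comment are hypothetical.

```r
# Hypothetical wrapper (not part of elastic) that picks the argument names
# matching the installed package version.
es_connect <- function(host = "127.0.0.1", port = 9200) {
  if (utils::packageVersion("elastic") >= "1.0.0") {
    # Assumed newer style: returns a connection object to pass to Search().
    elastic::connect(host = host, port = port)
  } else {
    # Older style used in this post: settings are stored for the session.
    elastic::connect(es_host = host, es_port = port)
  }
}

# Usage (placeholder address; replace with your server's):
# conn <- es_connect("192.168.0.10")
```

If the version check does not match your installation, ?connect lists the argument names your copy of the package actually accepts.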

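To read data, we use the Search function. In Search, we can set query options such as the index name, the document type, and the fields to return. If no index is given, the search runs across every index on the server. Note that ElasticSearch returns only the first 10 matching documents by default, however many match.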

> head(Search(index="abcd", type="rtype")) 
$took
[1] 4

$timed_out
[1] FALSE

$`_shards`
$`_shards`$total
[1] 5

$`_shards`$successful
[1] 5

$`_shards`$skipped
[1] 0

$`_shards`$failed
[1] 0


$hits
$hits$total
[1] 31

$hits$max_score
[1] 1

$hits$hits
$hits$hits[[1]]
$hits$hits[[1]]$`_index`
[1] "abcd"

$hits$hits[[1]]$`_type`
[1] "rtype"

$hits$hits[[1]]$`_id`
[1] "abcd.7.2017-02-11.02:52:00.Qbi"

$hits$hits[[1]]$`_score`
.............. 

The result above is the JSON response, parsed into a nested R list. We can clean this up and turn it into a data frame: setting the asdf parameter to TRUE makes Search return the hits as a data frame.

> Search(index="abcd", type="rtype", asdf = TRUE) 
$took
[1] 0

$timed_out
[1] FALSE

$`_shards`
$`_shards`$total
[1] 5

$`_shards`$successful
[1] 5

$`_shards`$skipped
[1] 0

$`_shards`$failed
[1] 0


$hits
$hits$total
[1] 31

$hits$max_score
[1] 1

$hits$hits
   _index _type                               _id _score _source.id        _source.time _source.item _source.value
1    abcd rtype    abcd.7.2017-02-11.02:52:00.Qbi      1          7 2017-02-11 02:52:00          Qbi       6461271
2    abcd rtype    abcd.8.2017-02-11.05:38:40.Dxw      1          8 2017-02-11 05:38:40          Dxw       4239156
3    abcd rtype    abcd.8.2017-03-02.05:38:40.Nik      1          8 2017-03-02 05:38:40          Nik       5358013
4    abcd rtype    abcd.7.2017-05-02.02:52:00.Oas      1          7 2017-05-02 02:52:00          Oas       9079367
5    abcd rtype    abcd.1.2017-02-10.10:12:00.Jfg      1          1 2017-02-10 10:12:00          Jfg        346032
6    abcd rtype    abcd.5.2017-02-10.21:18:40.Mls      1          5 2017-02-10 21:18:40          Mls       2838305
7    abcd rtype    abcd.6.2017-02-11.00:05:20.Zur      1          6 2017-02-11 00:05:20          Zur       1934962
8    abcd rtype   abcd.10.2017-05-02.11:12:00.Ekg      1         10 2017-05-02 11:12:00          Ekg        633164
9    abcd rtype y1.2017-06-06.01:12:00.857183.Fut      1          1 2017-02-10 10:12:00          Nog       2976649
10   abcd rtype    abcd.3.2017-02-10.15:45:20.Zqv      1          3 2017-02-10 15:45:20          Zqv       1834427


The result still carries metadata that we don't need; we only want the last part. The command below selects it with hits[3], the third element of the hits list:

> Search(index="abcd", type="rtype", asdf = TRUE)$hits[3] 
$hits
   _index _type                               _id _score _source.id        _source.time _source.item _source.value
1    abcd rtype    abcd.7.2017-02-11.02:52:00.Qbi      1          7 2017-02-11 02:52:00          Qbi       6461271
2    abcd rtype    abcd.8.2017-02-11.05:38:40.Dxw      1          8 2017-02-11 05:38:40          Dxw       4239156
3    abcd rtype    abcd.8.2017-03-02.05:38:40.Nik      1          8 2017-03-02 05:38:40          Nik       5358013
4    abcd rtype    abcd.7.2017-05-02.02:52:00.Oas      1          7 2017-05-02 02:52:00          Oas       9079367
5    abcd rtype    abcd.1.2017-02-10.10:12:00.Jfg      1          1 2017-02-10 10:12:00          Jfg        346032
6    abcd rtype    abcd.5.2017-02-10.21:18:40.Mls      1          5 2017-02-10 21:18:40          Mls       2838305
7    abcd rtype    abcd.6.2017-02-11.00:05:20.Zur      1          6 2017-02-11 00:05:20          Zur       1934962
8    abcd rtype   abcd.10.2017-05-02.11:12:00.Ekg      1         10 2017-05-02 11:12:00          Ekg        633164
9    abcd rtype y1.2017-06-06.01:12:00.857183.Fut      1          1 2017-02-10 10:12:00          Nog       2976649
10   abcd rtype    abcd.3.2017-02-10.15:45:20.Zqv      1          3 2017-02-10 15:45:20          Zqv       1834427

These are the rows we wanted to parse from ElasticSearch.
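
One detail worth knowing: single-bracket indexing such as hits[3] returns a list that still wraps the data frame (hence the $hits header in the output above), whereas $hits$hits or [[3]] returns the data frame itself. The sketch below illustrates this on a small hand-built list that mimics the shape of the Search result, so it runs without a server; the values are copied from the first two rows above, and the column renaming at the end is just an optional convenience.

```r
# Hand-built stand-in for Search(..., asdf = TRUE); not real server output.
res <- list(
  took = 0,
  timed_out = FALSE,
  hits = list(
    total = 31,
    max_score = 1,
    hits = data.frame(
      `_id` = c("abcd.7.2017-02-11.02:52:00.Qbi",
                "abcd.8.2017-02-11.05:38:40.Dxw"),
      `_source.item` = c("Qbi", "Dxw"),
      `_source.value` = c(6461271, 4239156),
      check.names = FALSE,
      stringsAsFactors = FALSE
    )
  )
)

wrapped <- res$hits[3]    # a list of length 1 that contains the data frame
rows    <- res$hits$hits  # the data frame itself

class(wrapped)  # "list"
class(rows)     # "data.frame"

# Optional: drop the "_source." prefix so columns are easier to work with.
names(rows) <- sub("^_source\\.", "", names(rows))
names(rows)     # "_id" "item" "value"
```

With a real result, the same $hits$hits access gives a data frame that can be passed directly to functions such as summary() or subset().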

   The Search function has many more options, including sending queries, selecting specific fields, and sorting. See the function's help page (?Search) for details.
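
As a rough illustration of those options, the helper below wraps Search and forwards extra arguments such as q, size and sort. It follows the older call style used in this post (with elastic 1.0.0 or later, I believe a connection object from connect() has to be passed as the first argument instead). The name fetch_hits and the default of 50 rows are hypothetical; the field names item and time come from the _source columns shown earlier, and sorting on time assumes that field is mapped as a sortable type such as a date.

```r
# Hypothetical helper (not part of elastic): run a search and return only
# the data frame of hits. Written in the older call style used in this post.
fetch_hits <- function(index, size = 50, ...) {
  # size raises the number of returned hits above the default of 10.
  res <- elastic::Search(index = index, size = size, asdf = TRUE, ...)
  res$hits$hits
}

# Example calls; they need a reachable server and a prior connect():
# fetch_hits("abcd", q = "item:Qbi")      # query-string filter on item
# fetch_hits("abcd", sort = "time:desc")  # sort by the time field
```

Asking for 50 rows is enough to cover the 31 hits reported in hits$total above; for much larger indices a paging approach is usually a better fit than one large request.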
