DataTechNotes: Inserting data into ElasticSearch with R

In this post, I will show how to insert data into ElasticSearch with R. I assume that ElasticSearch and Kibana are already installed on your machine.
We need the 'RCurl' and 'jsonlite' packages; install them first if they are not already available.

> library(RCurl)
> library(jsonlite)
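
To see whether either package still has to be installed, a small check like the one below can help. This is only a sketch; the helper name missing_packages is made up for illustration and is not part of RCurl or jsonlite.

```r
# Hypothetical helper: return the packages from 'pkgs' that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("RCurl", "jsonlite"))
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
}
```

Anything reported here can then be passed to install.packages().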

First, we generate sample data.

n <- 10
id <- seq(1, n, 1)
timestamp <- seq(as.POSIXct("2017/05/01 10:12"), by = 10000, len = n)
item <- paste(sample(LETTERS, n, replace = TRUE),
              sample(letters, n, replace = TRUE),
              sample(letters, n, replace = TRUE), sep = "")
value <- sample(10000000, n)

dataframe <- data.frame(id = id, time = timestamp, item = item, value = value)

Our sample data looks like this:

> head(dataframe)
  id                time item   value
1  1 2017-05-01 10:12:00  Wsl 1315126
2  2 2017-05-01 12:58:40  Eya 3025197
3  3 2017-05-01 15:45:20  Del 5796777
4  4 2017-05-01 18:32:00  Etv 4884738
5  5 2017-05-01 21:18:40  Vtl 3875728
6  6 2017-05-02 00:05:20  Bbv 1912280

We insert data into ElasticSearch with the httpPUT() function from RCurl. The request body must be JSON, so each row needs to be converted into a JSON string first. The function below does this conversion.

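rowdata_to_json <- function(row_dat)
{
  # toJSON() turns a data frame row into a JSON array: [ { ... } ]
  dat_to_json <- jsonlite::toJSON(row_dat, pretty = TRUE)
  # strip the first and last characters (the square brackets) to keep only the object
  dat_to_json <- substr(dat_to_json, start = 2, nchar(dat_to_json) - 1)
  return(dat_to_json)
}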

The output of rowdata_to_json() for the first row is:

> rowdata_to_json(dataframe[1,])

  {
    "id": 1,
    "time": "2017-05-01 10:12:00",
    "item": "Wsl",
    "value": 1315126
  }
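
As an alternative sketch (not the method used in this post), jsonlite can also produce a single JSON object directly, without trimming brackets: convert the row to a list and pass auto_unbox = TRUE. The helper name row_to_json_object and the one-row sample below are assumptions made for illustration.

```r
library(jsonlite)

# Hypothetical alternative to rowdata_to_json(): converting the row to a list
# and unboxing length-one vectors yields a JSON object instead of an array.
row_to_json_object <- function(row_dat) {
  as.character(toJSON(as.list(row_dat), auto_unbox = TRUE))
}

# One-row sample with the same columns as the data frame in this post
sample_row <- data.frame(id = 1,
                         time = as.POSIXct("2017-05-01 10:12:00", tz = "UTC"),
                         item = "Wsl",
                         value = 1315126,
                         stringsAsFactors = FALSE)

json_row <- row_to_json_object(sample_row)
cat(json_row, "\n")
```

The result is compact rather than pretty-printed, which makes no difference to ElasticSearch.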

We need several definitions for our data, such as index_name, type_name, and es_path. The document id is built later, inside the loop.

index_name <- "abcd" # index name, also used for the index pattern in Kibana
type_name <- "rtype" # document type within the index
es_path <- paste("localhost", "9200", sep = ":")

es_path defines the ElasticSearch URL path. Replace localhost with your own server name if needed. 9200 is the default ElasticSearch HTTP port, so it usually remains unchanged.
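
If the server name changes often, the path can be built by a small function instead of a fixed string. The sketch below is an assumption for illustration: the name es_base_url and the explicit "http" scheme are not part of the original script, which uses the bare "localhost:9200" form.

```r
# Hypothetical helper: build the base URL from host, port, and scheme.
es_base_url <- function(host = "localhost", port = 9200, scheme = "http") {
  sprintf("%s://%s:%d", scheme, host, as.integer(port))
}

es_base_url()                   # local default
es_base_url("es.example.org")   # placeholder host name, same port
```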

We insert the data frame rows in a loop as below:

for (i in seq_len(nrow(dataframe)))
{
  # here, I use id, time, and item to create the document id
  time <- gsub(" ", ".", strftime(dataframe[i, "time"], format = "%Y-%m-%d %H:%M:%S"))
  var_id <- paste(index_name, dataframe[i, "id"], time, dataframe[i, "item"], sep = ".")

  # convert data frame row into json data
  json_data <- rowdata_to_json(dataframe[i, ])

  # insert json data into ElasticSearch
  # (ElasticSearch 6.0 and later reject requests without a Content-Type header)
  httpPUT(paste(es_path, index_name, type_name, var_id, sep = "/"), json_data,
          httpheader = c("Content-Type" = "application/json"))
}

Here, httpPUT() sends the JSON data to the target server. The URL path (index, type, and document id) tells ElasticSearch where to save it. We run the script and check the result in Kibana.
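
Because every row is a separate request, it can be useful to record which inserts failed instead of stopping at the first error. The following is a minimal sketch under stated assumptions: put_all() is a hypothetical wrapper, and fake_put() is a stub standing in for httpPUT() so that the example runs without a server.

```r
# Hypothetical wrapper: call put_fun(url, body) for every pair and record
# which requests raised an error instead of stopping at the first failure.
put_all <- function(urls, bodies, put_fun) {
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    ok[i] <- tryCatch({
      put_fun(urls[i], bodies[i])
      TRUE
    }, error = function(e) {
      message("Insert failed for ", urls[i], ": ", conditionMessage(e))
      FALSE
    })
  }
  ok
}

# Stub used in place of httpPUT(); it fails for one URL to show the reporting.
fake_put <- function(url, body) {
  if (grepl("bad", url)) stop("connection refused")
  invisible(NULL)
}

urls <- c("localhost:9200/abcd/rtype/doc1", "localhost:9200/abcd/rtype/bad")
bodies <- c('{"id": 1}', '{"id": 2}')
result <- put_all(urls, bodies, fake_put)
```

In real use, httpPUT would be passed as put_fun. The wrapper only catches errors raised by R, so the server's response should still be checked.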

The data was inserted into ElasticSearch, and the rows can now be viewed in Kibana.
Since the above script inserts row by row and sends one HTTP request per row, it is slow for large data frames. Consider another method, such as ElasticSearch's bulk API, when you work with large data frames.
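
One such method is the ElasticSearch bulk API, which accepts many documents in a single request as newline-delimited JSON: an action line followed by the document, one pair per row, with a newline at the end. The sketch below only builds that request body. The helper name bulk_body and the sample data are assumptions, and the "_type" field matches the typed index used in this post (newer ElasticSearch versions drop it). No "_id" is given, so ElasticSearch would generate the ids itself.

```r
library(jsonlite)

# Hypothetical helper: build a newline-delimited bulk request body in which
# every row contributes an action line and a document line.
bulk_body <- function(df, index_name, type_name) {
  action <- as.character(toJSON(
    list(index = list(`_index` = index_name, `_type` = type_name)),
    auto_unbox = TRUE))
  docs <- vapply(seq_len(nrow(df)), function(i) {
    as.character(toJSON(as.list(df[i, ]), auto_unbox = TRUE))
  }, character(1))
  # Interleave action and document lines; the body must end with a newline
  lines <- as.vector(rbind(rep(action, length(docs)), docs))
  paste0(paste(lines, collapse = "\n"), "\n")
}

sample_df <- data.frame(id = 1:2, item = c("Wsl", "Eya"),
                        value = c(1315126, 3025197),
                        stringsAsFactors = FALSE)
body <- bulk_body(sample_df, "abcd", "rtype")
cat(body)
```

The body would then be sent in one POST request to the _bulk endpoint with the Content-Type application/x-ndjson. The exact RCurl call is left out here because it is not part of the original script.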

   Thank you for reading!
