Working with csv data

Csv data is very common data type and we may often deal with it during our data analysis. R can easily read/write/process csv data. In this post I will show some simple steps with working csv data.
Below is a content of data.csv file that we use in this topic. You may copy and save it for your test.
id,name,score,datetime
101,Aaaa,78,2017/05/25 10:05
102,Bbbb,84,2017/05/26 11:15
103,Cccc,89,2017/05/25 11:35
104,Dddd,,2017/05/25 10:00
105,Eeee,90,2017/05/26 11:20



read.csv function loads csv data into dataframe.
> data=read.csv("data.csv")
> data
   id name score         datetime
1 101 Aaaa    78 2017/05/25 10:05
2 102 Bbbb    84 2017/05/26 11:15
3 103 Cccc    89 2017/05/25 11:35
4 104 Dddd    NA 2017/05/25 10:00
5 105 Eeee    90 2017/05/26 11:20

NA conversion

Empty spaces are shown as NA in R dataframe. In 4th row empty score value is shown as NA.
To do some operation with data (e.g. sum, mean) NA should be avoided. na.omit() function can help to solve this problem.
> na.omit(data)
   id name score         datetime
1 101 Aaaa    78 2017/05/25 10:05
2 102 Bbbb    84 2017/05/26 11:15
3 103 Cccc    89 2017/05/25 11:35
5 105 Eeee    90 2017/05/26 11:20
 or complete.case() also can be used.
> data[complete.cases(data),]
   id name score         datetime
1 101 Aaaa    78 2017/05/25 10:05
2 102 Bbbb    84 2017/05/26 11:15
3 103 Cccc    89 2017/05/25 11:35
5 105 Eeee    90 2017/05/26 11:20

There are cases we may want to change NA into certain number, for example 0.
NA can be changed to 0 as a following way.
> data[is.na(data)]=0
> data
   id name score         datetime
1 101 Aaaa    78 2017/05/25 10:05
2 102 Bbbb    84 2017/05/26 11:15
3 103 Cccc    89 2017/05/25 11:35
4 104 Dddd     0 2017/05/25 10:00
5 105 Eeee    90 2017/05/26 11:20


Datetime conversion

When csv loaded from file, date (timestamp) values are not parsed yet as an R date/time type.
The following command converts datetime column into R date/time type.
> data$datetime=as.POSIXct(data$datetime,format="%Y/%m/%d %H:%M")
 Now we can do time related operation with datetime column data.
> data$datetime             # origanal data 
[1] "2017-05-25 10:05:00 JST" "2017-05-26 11:15:00 JST" "2017-05-25 11:35:00 JST"
[4] "2017-05-25 10:00:00 JST" "2017-05-26 11:20:00 JST"
> data[,"datetime"]+60*60   # add 1 hour into datetime data
[1] "2017-05-25 11:05:00 JST" "2017-05-26 12:15:00 JST" "2017-05-25 12:35:00 JST"
[4] "2017-05-25 11:00:00 JST" "2017-05-26 12:20:00 JST"

Append another csv file

> data2=read.csv("data2.csv")
> data2
   id name score         datetime
1 112 Xxxx    81 2017-05-27 10:05
2 112 Yyyy    84 2017-05-27 11:15
> data=rbind(data,data2)
> data
   id name score            datetime
1 101 Aaaa    78 2017-05-25 10:05:00
2 102 Bbbb    84 2017-05-26 11:15:00
3 103 Cccc    89 2017-05-25 11:35:00
4 104 Dddd     0 2017-05-25 10:00:00
5 105 Eeee    90 2017-05-26 11:20:00
6 112 Xxxx    81 2017-05-27 10:05:00
7 112 Yyyy    84 2017-05-27 11:15:00


No comments:

Post a Comment