Csv data is very common data type and we may often deal with it during our data analysis. R can easily read/write/process csv data. In this post I will show some simple steps with working csv data.
Below is a content of data.csv file that we use in this topic. You may copy and save it for your test.
id,name,score,datetime
101,Aaaa,78,2017/05/25 10:05
102,Bbbb,84,2017/05/26 11:15
103,Cccc,89,2017/05/25 11:35
104,Dddd,,2017/05/25 10:00
105,Eeee,90,2017/05/26 11:20
read.csv function loads csv data into dataframe.
> data=read.csv("data.csv")
> data
id name score datetime
1 101 Aaaa 78 2017/05/25 10:05
2 102 Bbbb 84 2017/05/26 11:15
3 103 Cccc 89 2017/05/25 11:35
4 104 Dddd NA 2017/05/25 10:00
5 105 Eeee 90 2017/05/26 11:20
|
|
|
NA conversion
Empty spaces are shown as NA in R dataframe. In 4th row empty score value is shown as NA.
To do some operation with data (e.g. sum, mean) NA should be avoided. na.omit() function can help to solve this problem.
> na.omit(data)
id name score datetime
1 101 Aaaa 78 2017/05/25 10:05
2 102 Bbbb 84 2017/05/26 11:15
3 103 Cccc 89 2017/05/25 11:35
5 105 Eeee 90 2017/05/26 11:20
|
|
|
or complete.case() also can be used.
> data[complete.cases(data),]
id name score datetime
1 101 Aaaa 78 2017/05/25 10:05
2 102 Bbbb 84 2017/05/26 11:15
3 103 Cccc 89 2017/05/25 11:35
5 105 Eeee 90 2017/05/26 11:20
|
|
|
There are cases we may want to change NA into certain number, for example 0.
NA can be changed to 0 as a following way.
> data[is.na(data)]=0
> data
id name score datetime
1 101 Aaaa 78 2017/05/25 10:05
2 102 Bbbb 84 2017/05/26 11:15
3 103 Cccc 89 2017/05/25 11:35
4 104 Dddd 0 2017/05/25 10:00
5 105 Eeee 90 2017/05/26 11:20
|
|
|
Datetime conversion
When csv loaded from file, date (timestamp) values are not parsed yet as an R date/time type.
The following command converts datetime column into R date/time type.
> data$datetime=as.POSIXct(data$datetime,format="%Y/%m/%d %H:%M")
|
|
|
Now we can do time related operation with datetime column data.
> data$datetime # origanal data
[1] "2017-05-25 10:05:00 JST" "2017-05-26 11:15:00 JST" "2017-05-25 11:35:00 JST"
[4] "2017-05-25 10:00:00 JST" "2017-05-26 11:20:00 JST"
|
|
|
> data[,"datetime"]+60*60 # add 1 hour into datetime data
[1] "2017-05-25 11:05:00 JST" "2017-05-26 12:15:00 JST" "2017-05-25 12:35:00 JST"
[4] "2017-05-25 11:00:00 JST" "2017-05-26 12:20:00 JST"
|
|
|
Append another csv file
> data2=read.csv("data2.csv")
> data2
id name score datetime
1 112 Xxxx 81 2017-05-27 10:05
2 112 Yyyy 84 2017-05-27 11:15
|
|
|
> data=rbind(data,data2)
> data
id name score datetime
1 101 Aaaa 78 2017-05-25 10:05:00
2 102 Bbbb 84 2017-05-26 11:15:00
3 103 Cccc 89 2017-05-25 11:35:00
4 104 Dddd 0 2017-05-25 10:00:00
5 105 Eeee 90 2017-05-26 11:20:00
6 112 Xxxx 81 2017-05-27 10:05:00
7 112 Yyyy 84 2017-05-27 11:15:00
|
|
|
No comments:
Post a Comment