[R Programming] tidyr / dplyr
Data processing with tidyr (pipe)
The pipe concept
- The pipe takes the data frame produced by the function on its left and passes it in as the first argument of the function on its right.
The pipe operator %>%
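Using the pipe (piping)
- Chain calls with the pipe operator %>%
- Requires the tidyr package (%>% itself comes from the magrittr package; tidyr and dplyr re-export it)
- install.packages("tidyr")
- library(tidyr)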
# Without piping (some_function is a placeholder; `function` is a reserved word in R)
some_function(dataframe, argument_2, argument_3)
# With piping
dataframe %>%
  some_function(argument_2, argument_3)
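To make the template concrete, here is a minimal sketch that uses R's built-in mtcars data frame (chosen only because it ships with R) with dplyr's filter() standing in for the placeholder function. Both forms return the same rows; the pipe simply supplies mtcars as the first argument.

```r
library(dplyr)  # provides filter() and re-exports %>%

# Without piping: the data frame is passed explicitly as the first argument
without_pipe <- filter(mtcars, cyl == 4, mpg > 30)

# With piping: mtcars flows into filter() as its first argument
with_pipe <- mtcars %>%
  filter(cyl == 4, mpg > 30)

identical(without_pipe, with_pipe)  # TRUE
```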
📌 ext_tracks hurricane dataset: 11,824 obs. of 29 variables
📌 What if you don't use the pipe?
- For example, without piping, if you wanted to see the time, date, and maximum winds for Katrina from the first three rows of the ext_tracks hurricane data, you could run:
- This code creates a new R object at each step, which clutters the code and also requires copying the data frame into memory several times.
library(dplyr)  # filter() and select() come from dplyr
katrina <- filter(ext_tracks, storm_name == "KATRINA")
katrina_reduced <- select(katrina, month, day, hour, max_wind)
head(katrina_reduced, 3)
- As an alternative, you could just wrap one function inside another:
- This avoids reassigning the data frame at each step, but quickly becomes unwieldy because the calls have to be read from the inside out.
head(select(filter(ext_tracks, storm_name == "KATRINA"),
            month, day, hour, max_wind), 3)
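With the pipe, the same Katrina query reads top to bottom, one step per line. The real ext_tracks data is not loaded here, so this sketch builds a tiny hypothetical stand-in called toy_tracks with only the columns the query touches; its values are invented for illustration and are not taken from the hurricane dataset.

```r
library(dplyr)

# Hypothetical stand-in for ext_tracks: invented values, only the columns used below
toy_tracks <- data.frame(
  storm_name = c("KATRINA", "KATRINA", "RITA", "KATRINA", "KATRINA"),
  month      = c(8, 8, 9, 8, 8),
  day        = c(23, 24, 18, 24, 24),
  hour       = c(18, 0, 0, 6, 12),
  max_wind   = c(30, 30, 25, 35, 40)
)

katrina_first3 <- toy_tracks %>%
  filter(storm_name == "KATRINA") %>%      # keep only Katrina's rows
  select(month, day, hour, max_wind) %>%   # keep the four columns of interest
  head(3)                                  # first three rows

katrina_first3
```

Each intermediate result is handed straight to the next step, so no temporary objects such as katrina or katrina_reduced are needed.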
Operators
- Operators in R
- Arithmetic Operators
- Relational Operators
- Logical Operators
- Assignment Operators
Arithmetic Operators
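A quick sketch of R's arithmetic operators applied to two plain numbers (a and b are arbitrary example values):

```r
a <- 7
b <- 2

a + b    # 9    addition
a - b    # 5    subtraction
a * b    # 14   multiplication
a / b    # 3.5  division
a ^ b    # 49   exponentiation
a %% b   # 1    remainder (modulo)
a %/% b  # 3    integer division
```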
Relational Operators
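Relational operators compare values element by element and return a logical vector, which is exactly what a condition inside filter() evaluates to. A minimal sketch with an arbitrary example vector x:

```r
x <- c(1, 5, 10)

x < 5    # TRUE  FALSE FALSE
x > 4    # FALSE TRUE  TRUE
x <= 5   # TRUE  TRUE  FALSE
x >= 10  # FALSE FALSE TRUE
x == 5   # FALSE TRUE  FALSE  (equality uses ==, not =)
x != 5   # TRUE  FALSE TRUE
```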
Logical Operators
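Logical operators combine TRUE/FALSE values. &, | and ! work element by element on vectors, while && and || are meant for single values (for example inside if()) and stop evaluating as soon as the answer is known. A minimal sketch with arbitrary example vectors p and q:

```r
p <- c(TRUE, TRUE, FALSE)
q <- c(TRUE, FALSE, FALSE)

p & q       # TRUE  FALSE FALSE  element-wise AND
p | q       # TRUE  TRUE  FALSE  element-wise OR
!p          # FALSE FALSE TRUE   NOT
xor(p, q)   # FALSE TRUE  FALSE  exclusive OR

TRUE && FALSE  # FALSE  single-value AND
TRUE || FALSE  # TRUE   single-value OR
```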
Assignment Operators
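A minimal sketch of the assignment operators. <- is the conventional style; = also assigns at the top level; -> assigns to the right; <<- assigns to a variable in an enclosing environment, shown here with a hypothetical counter and increment() function:

```r
x <- 3      # leftward assignment (the usual style)
y = 4       # = also assigns at the top level
5 -> z      # rightward assignment

counter <- 0
increment <- function() {
  counter <<- counter + 1   # <<- updates counter in the enclosing (here, global) environment
  invisible(NULL)
}
increment()
increment()
counter     # 2
```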
Data frame processing: the dplyr package
Selecting Data
- The select function subsets certain columns of a data frame by specifying the full column names.
exam %>%
  select(class, english)
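The post does not show the contents of exam, so this sketch defines a small hypothetical version with invented scores and only the columns used in this post (id, class, english, science). It also shows that select() can drop a column with a minus sign.

```r
library(dplyr)

# Hypothetical exam data: invented values for illustration only
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(20, 60, 78, 58, 30, 98)
)

kept    <- exam %>% select(class, english)  # keep only class and english
dropped <- exam %>% select(-science)        # keep every column except science
```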
Filtering Data
- The filter function keeps only the rows for which a condition evaluates to TRUE.
exam %>%
  filter(class == 1)
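Conditions in filter() can be combined with the logical operators, and %in% matches against several values at once. A minimal sketch on the same kind of hypothetical exam data (invented values):

```r
library(dplyr)

# Hypothetical exam data: invented values for illustration only
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(20, 60, 78, 58, 30, 98)
)

class1      <- exam %>% filter(class == 1)                  # class is 1
class1_top  <- exam %>% filter(class == 1 & english >= 98)  # both conditions must hold
two_classes <- exam %>% filter(class %in% c(1, 3))          # class is 1 or 3
```

Separating conditions with a comma, as in filter(class == 1, english >= 98), has the same effect as joining them with &.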
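arrange - sorting rows
- Sorts the rows by the given column in ascending order (the default) or, with desc(), in descending order
exam %>%
  arrange(id)              # sort by id, ascending
exam %>%
  arrange(desc(science))   # sort by science, descending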
- To sort by id in ascending order and, among rows with the same id, by science in descending order?
exam %>%
  arrange(id, desc(science))
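The second sort key only breaks ties in the first, so if id is unique (as a student ID presumably is) desc(science) never changes the order. Sorting by a column with repeated values, such as class, shows the tie-breaking at work. A minimal sketch on hypothetical exam data (invented values):

```r
library(dplyr)

# Hypothetical exam data: invented values for illustration only
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(20, 60, 78, 58, 30, 98)
)

# class ascending; within each class, science descending
sorted <- exam %>%
  arrange(class, desc(science))

sorted$id   # 2 1 3 4 6 5
```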
mutate - adding new variables (columns)
- Add a new column computed from existing columns
exam %>%
  mutate(total = english + science)
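mutate() returns a new data frame and leaves exam itself unchanged, so a column added this way disappears unless the result is assigned. A minimal sketch on hypothetical exam data (invented values):

```r
library(dplyr)

# Hypothetical exam data: invented values for illustration only
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(20, 60, 78, 58, 30, 98)
)

exam %>% mutate(total = english + science)           # prints a copy that has total
before <- "total" %in% names(exam)                   # FALSE: exam is unchanged

exam <- exam %>% mutate(total = english + science)   # assign back to keep the column
after <- "total" %in% names(exam)                    # TRUE
```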
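- Add a mean column holding the average of english and science!
exam %>%
  mutate(total = english + science,  # total must be created here: the earlier result was not saved
         mean = total / 2)           # later arguments can use columns created earlier in the same call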
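- Add a test column that marks "pass" if mean is 60 or above and "fail" if it is below 60!
exam %>%
  mutate(mean = (english + science) / 2,   # exam has no mean column yet, so compute it first
         test = ifelse(mean >= 60, "pass", "fail"))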
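Putting the three mutate steps into one call and saving the result makes it easy to keep working with it, for example counting how many students passed with dplyr's count(). A minimal sketch on hypothetical exam data (invented values):

```r
library(dplyr)

# Hypothetical exam data: invented values for illustration only
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(20, 60, 78, 58, 30, 98)
)

results <- exam %>%
  mutate(total = english + science,
         mean  = total / 2,
         test  = ifelse(mean >= 60, "pass", "fail"))

counts <- results %>% count(test)   # one row per outcome, with the number of students in n
counts
```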
group_by & summarise - summarizing by group
exam %>%
  group_by(class) %>%
  summarise(english_sum = sum(english),
            english_mean = mean(english),
            english_median = median(english),
            english_sd = sd(english),
            n = n())   # n() counts the rows in each group
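The summary is itself a data frame, so it can be piped on, for example into arrange() to rank the classes. A minimal sketch on hypothetical exam data (invented values):

```r
library(dplyr)

# Hypothetical exam data: invented values for illustration only
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(20, 60, 78, 58, 30, 98)
)

class_summary <- exam %>%
  group_by(class) %>%
  summarise(english_mean = mean(english),
            science_mean = mean(science),
            n = n()) %>%
  arrange(desc(science_mean))   # classes with the highest science average first

class_summary
```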