[R Programming] tidyr / dplyr

tidyr / dplyr

Data processing with tidyr (pipe)

The pipe concept

The pipe takes the data frame returned by the function immediately before it and passes that data frame as the first argument of the function immediately after it.

The pipe operator %>%

Using the pipe (piping)

Using the %>% pipe
- Requires the tidyr package, which re-exports %>% from magrittr (dplyr does the same)
- install.packages("tidyr")
- library(tidyr)

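# Without piping (some_function is a placeholder for any function
# that takes a data frame as its first argument)
some_function(dataframe, argument_2, argument_3)

# With piping
dataframe %>%
  some_function(argument_2, argument_3)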
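
As a concrete instance of the template above, the following sketch calls base R's head() on the built-in mtcars data frame both ways and checks that the results match. The choice of mtcars and head() is an assumption made for illustration, and the code assumes the magrittr package (which defines %>%) is installed.

```r
library(magrittr)  # defines %>%; tidyr and dplyr re-export the same operator

# Without piping: the data frame is passed explicitly as the first argument
without_pipe <- head(mtcars, 3)

# With piping: the left-hand side becomes the first argument of head()
with_pipe <- mtcars %>%
  head(3)

# Both forms produce the same three rows
identical(without_pipe, with_pipe)
```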

📌 ext_tracks hurricane dataset: 11,824 obs. of 29 variables

📌 What if you don't use the pipe?

For example, without piping, if you wanted to see the date, time, and maximum winds for the first three Katrina rows of the ext_tracks hurricane data, you could run:
- This code creates a new R object at each step, which clutters the code and can require copying the data frame several times in memory.

katrina <- filter(ext_tracks, storm_name == "KATRINA")
katrina_reduced <- select(katrina, month, day, hour, max_wind)
head(katrina_reduced, 3)

As an alternative, you could wrap one function call inside another:
- This avoids re-assigning the data frame at each step, but the nesting quickly becomes ungainly because it has to be read from the inside out.

head(select(filter(ext_tracks, storm_name == "KATRINA"),
            month, day, hour, max_wind), 3)
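
With the pipe, the same three steps can be written top to bottom with no intermediate objects. The sketch below assumes dplyr is installed and uses a tiny stand-in for ext_tracks, since the real dataset is not included in this post; its rows and values are invented for illustration.

```r
library(dplyr)  # filter(), select(), and %>%

# Hypothetical stand-in for ext_tracks; values are invented for illustration
ext_tracks <- data.frame(
  storm_name   = c("KATRINA", "KATRINA", "RITA", "KATRINA", "KATRINA"),
  month        = c(8, 8, 9, 8, 8),
  day          = c(23, 24, 18, 24, 25),
  hour         = c(18, 0, 0, 6, 12),
  max_wind     = c(30, 30, 25, 35, 40),
  min_pressure = c(1008, 1007, 1009, 1006, 1003)
)

# Each step receives the result of the previous step as its first argument
katrina_head <- ext_tracks %>%
  filter(storm_name == "KATRINA") %>%
  select(month, day, hour, max_wind) %>%
  head(3)

katrina_head
```

Reading the chain line by line gives the same order as the description: filter the rows, select the columns, then take the first three rows.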

Operators

Operators in R
- Arithmetic Operators
- Relational Operators
- Logical Operators
- Assignment Operators

Arithmetic Operators
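
A short illustration of R's arithmetic operators, using two example values chosen here:

```r
a <- 7
b <- 2

a + b    # addition: 9
a - b    # subtraction: 5
a * b    # multiplication: 14
a / b    # division: 3.5
a ^ b    # exponentiation: 49
a %% b   # remainder (modulus): 1
a %/% b  # integer division: 3
```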

Relational Operators
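
A short illustration of R's relational (comparison) operators. The values are examples chosen here; the last lines show that comparisons are applied element by element to vectors, which is what filter conditions rely on.

```r
x <- 5
y <- 8

x < y    # TRUE
x > y    # FALSE
x <= y   # TRUE
x >= y   # FALSE
x == y   # FALSE
x != y   # TRUE

# Comparisons are vectorized: one result per element
scores <- c(45, 60, 72)
passed <- scores >= 60
passed   # FALSE TRUE TRUE
```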

Logical Operators
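
A short illustration of R's logical operators, using example vectors chosen here. The single-character forms work element by element; the doubled forms take single values and are meant for if() conditions.

```r
p <- c(TRUE, TRUE, FALSE)
q <- c(TRUE, FALSE, FALSE)

!p       # NOT: FALSE FALSE TRUE
p & q    # element-wise AND: TRUE FALSE FALSE
p | q    # element-wise OR: TRUE TRUE FALSE

# && and || operate on single values and stop as soon as the result is known
p[1] && q[2]   # FALSE
p[3] || q[1]   # TRUE
```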

Assignment Operators
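
A short illustration of R's assignment operators. The variable names and the helper function f are examples chosen here.

```r
x <- 10       # leftward assignment, the conventional form
20 -> y       # rightward assignment
z = 30        # also assigns at the top level; inside a call, = names an argument

f <- function() {
  x <<- 99    # <<- assigns in an enclosing environment (here, the x defined above)
}
f()
x             # 99
```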

Data frame processing with the dplyr package
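
The examples in this section operate on a data frame named exam, which the post does not define. To run them, a small stand-in with the columns the code refers to (id, class, english, science) could be built as follows; the values are invented for illustration.

```r
# Hypothetical stand-in for exam; the values are invented for illustration
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(50, 60, 78, 58, 65, 98)
)

# Show the column names and types
str(exam)
```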

Selecting Data

The select function subsets certain columns of a data frame by specifying the full column names.

exam %>%
  select(class, english)

Filtering Data

The filter function keeps only the rows for which a logical condition is TRUE.

exam %>%
  filter(class == 1)

arrange - sorting rows in order

Sorts the rows by the given column in ascending or descending order.

exam %>%
  arrange(id)             # sort by id in ascending order

exam %>%
  arrange(desc(science))  # sort by science in descending order

What if you want to sort by id in ascending order and, among rows with the same id, by science in descending order?

exam %>%
  arrange(id, desc(science))

mutate - adding new variables (columns)

Add a new column:

exam %>%
  mutate(total = english + science)

Add a mean column that holds the average of english and science.

exam %>%
  mutate(mean = (english + science) / 2)  # exam has no total column, so compute from the scores

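Add a test column marked "pass" if the mean is 60 or higher and "fail" if it is below 60.

exam %>%
  mutate(mean = (english + science) / 2,             # mean must exist before test can use it
         test = ifelse(mean >= 60, "pass", "fail"))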

group_by & summarise - summarizing by group

exam %>%
  group_by(class) %>%
  summarise(english_sum = sum(english),
            english_mean = mean(english),
            english_median = median(english),
            english_sd = sd(english),
            n = n())
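
To see the shape of the result, the sketch below runs a shortened version of this summary on a small stand-in for exam and produces one row per class. It assumes dplyr is installed; the data values are invented for illustration and are not from the original post.

```r
library(dplyr)  # group_by(), summarise(), n(), and %>%

# Hypothetical stand-in for exam; the values are invented for illustration
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(50, 60, 78, 58, 65, 98)
)

# One output row per class, with the statistics computed within each group
by_class <- exam %>%
  group_by(class) %>%
  summarise(english_sum  = sum(english),
            english_mean = mean(english),
            n            = n())

by_class
```

With this data, by_class has three rows (classes 1, 2, and 3), english_sum is 195, 184, and 169, and n is 2 for every class.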

LEE CHANWOO
