
Learning R - 6.2. Manipulating data - Factors


R data comes as vectors and factors, and you can convert between numeric vectors, character vectors, and factors. Anyway, it can be done.
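Here is a rough sketch of those conversions in base R. The variable names are made up for illustration and are not from the original post. The one step worth watching is factor to numeric: going through as.character() first is the usual way, because as.numeric() on a factor returns the internal level codes rather than the values.

```r
# Illustrative names only (not from the original post).
num <- c(10, 20, 10, 30)

# numeric vector -> character vector
chr <- as.character(num)

# character vector -> factor (levels become "10" "20" "30")
f <- factor(chr)

# factor -> character vector
back_chr <- as.character(f)

# factor -> numeric vector: convert to character first
back_num <- as.numeric(as.character(f))

# as.numeric() directly on the factor gives the level codes, not the values
codes <- as.numeric(f)

back_num   # 10 20 10 30
codes      # 1 2 1 3
```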


What is a factor?

(The thing Mewtwo likes? No, that's a fact.) Anyway, unlike a vector, printing a factor on its own shows one extra thing: its 'levels'.

 

> v=factor(c("A","B","C","D","E","F"))
> v
[1] A B C D E F
Levels: A B C D E F
> w=factor(c("35S Promoter","pHellsgate","pStargate","pWatergate","pHellsgate"))
> w
[1] 35S Promoter pHellsgate   pStargate    pWatergate   pHellsgate  
Levels: 35S Promoter pHellsgate pStargate pWatergate

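A factor derives its levels from the elements it contains, and duplicate elements are counted only once. "Teacher, I have a question: why is there a p in front of Hellsgate?" "It's a vector." "Huh?" "I said it's a vector." "Why?" "That part I don't know."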

 

Reference: https://www.addgene.org/vector-database/5976/

 


It's a real vector.

 

https://www.csiro.au/en/work-with-us/services/sample-procurement/rnai-material-transfer-agreement

 


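pStargate and pWatergate aren't in the vector database, but papers that used them do turn up, and all three are RNAi-related transformation vectors.

So, say my Pokémon party's levels after beating Cynthia are 60, 61, 59, 60, 62, 60. Turn that into a factor and the levels are 59, 60, 61, 62. What's with the 62 one, did it take down Garchomp? Guess the almighty Togekiss carried the team.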
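A quick sketch of that party example, using the six numbers quoted above (the variable names are mine). Note that the levels of a factor built from numbers are stored as character strings, sorted by numeric value.

```r
# Party levels quoted in the text; variable names are illustrative.
party <- c(60, 61, 59, 60, 62, 60)
f <- factor(party)

levels(f)    # "59" "60" "61" "62"  (duplicates collapse into one level)
nlevels(f)   # 4

# table() counts how many elements fall under each level
counts <- table(f)
counts
```

table() is handy here because it shows at a glance that three party members share level 60.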

 

Changing levels

Factor levels aren't immutable like a Python tuple, so you can change them... and the plyr library comes in handy here too.

 

> library(plyr)
> revalue(v,c("A"="A+"))
[1] A+ B  C  D  E  F 
Levels: A+ B C D E F
> mapvalues(w,from=c("35S Promoter"),to=c("pH2GW7"))
[1] pH2GW7     pHellsgate pStargate  pWatergate pHellsgate
Levels: pH2GW7 pHellsgate pStargate pWatergate

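With the plyr library, you can use revalue() and mapvalues(). Both return a new factor, so v and w stay unchanged unless you assign the result back.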

 

> levels(v)[levels(v)=="B"]="B+"
> v
[1] A  B+ C  D  E  F 
Levels: A B+ C D E F
> levels(v)[1]="A+"
> v
[1] A+ B+ C  D  E  F 
Levels: A+ B+ C D E F
> levels(v)=c("A+","B+","C+","D+","E","F")
> v
[1] A+ B+ C+ D+ E  F 
Levels: A+ B+ C+ D+ E F

You can also change them without any library, like this. Factor levels can be replaced in bulk by assigning a vector.
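If you want something like a from/to mapping without loading plyr, one possible base R sketch is a named lookup vector. The `lookup` vector and the other names below are hypothetical, picked just for this illustration.

```r
v <- factor(c("A", "B", "C", "A"))

# old name = new name (a hypothetical mapping for illustration)
lookup <- c(A = "A+", B = "B+")

lv <- levels(v)
hit <- lv %in% names(lookup)   # which levels have a replacement
lv[hit] <- lookup[lv[hit]]     # swap in the new names, leave the rest alone
levels(v) <- lv

v
# [1] A+ B+ C  A+
# Levels: A+ B+ C
```

Levels that are not in the lookup (here "C") are left as they are.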

 

sub() and gsub()

Honestly, I didn't know the difference between the two either... so I tried them out.

 

> v=factor(c("alpha","beta","gamma","alpha","beta"))
> levels(v)=sub("^alpha$","psi",levels(v))
> v
[1] psi   beta  gamma psi   beta 
Levels: psi beta gamma
> levels(v)=sub("m","M",levels(v))
> v
[1] psi   beta  gaMma psi   beta 
Levels: psi beta gaMma

sub() can replace a whole element just the same, but when it replaces a character inside an element, it only changes the first match.

 

> v=factor(c("alpha","beta","gamma","alpha","beta"))
> levels(v)=sub("^alpha$","psi",levels(v))
> levels(v)=gsub("psi","pi",levels(v))
> v
[1] pi    beta  gamma pi    beta 
Levels: pi beta gamma
> levels(v)=gsub("m","M",levels(v))
> v
[1] pi    beta  gaMMa pi    beta 
Levels: pi beta gaMMa

Unlike sub(), gsub() replaces every occurrence of the matched character within an element.
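The difference is easiest to see on plain strings, outside of any factor. A small sketch with made-up words:

```r
# Words are arbitrary examples.
word <- "banana"

first_only <- sub("a", "A", word)    # replaces only the first match
every_one  <- gsub("a", "A", word)   # replaces every match

first_only   # "bAnana"
every_one    # "bAnAnA"

# Both functions are vectorised: each element is handled separately.
words <- c("gamma", "mammal")
sub_words  <- sub("m", "M", words)    # "gaMma"  "Mammal"
gsub_words <- gsub("m", "M", words)   # "gaMMa"  "MaMMal"
```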

 

Adjusting levels - Dropping unused levels

> v=factor(c("grass","water","grass"),levels=c("grass","water","fire"))
> v
[1] grass water grass
Levels: grass water fire

Here's a factor. But looking through its elements... there's a level that has no elements?

> v=factor(v)
> v
[1] grass water grass
Levels: grass water

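So I removed it for you. Calling factor() again rebuilds the levels from the elements that are actually present.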
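One way to spot an unused level before removing it is table(), which reports a count of 0 for it. The sketch below reuses the factor from above; `counts`, `unused` and `v2` are names I made up.

```r
v <- factor(c("grass", "water", "grass"), levels = c("grass", "water", "fire"))

# table() lists every level, including ones with no elements
counts <- table(v)
counts

# Levels whose count is zero are the unused ones
unused <- names(counts)[counts == 0]
unused        # "fire"

# droplevels() also works on a single factor, not just on data frames
v2 <- droplevels(v)
levels(v2)    # "grass" "water"
```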

 

> x=factor(c("A","B","A"),levels=c("A","B","C"))
> y=c(1,2,3)
> z=factor(c("R","G","G"),levels=c("R","G","B"))
> df=data.frame(x,y,z)
> df
  x y z
1 A 1 R
2 B 2 G
3 A 3 G

Factors can of course go into a data frame. When the data frame is printed they just show up as a table, but

> df$x
[1] A B A
Levels: A B C

if you pull out a factor column on its own, the levels show up like this.

And here too, there's a level with no elements?

> df=droplevels(df)
> df
  x y z
1 A 1 R
2 B 2 G
3 A 3 G
> df$x
[1] A B A
Levels: A B

Use droplevels() and they're gone.
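droplevels() on a data frame cleans up every factor column at once. If some column should keep its full set of levels, the data frame method takes an `except` argument with the column indices to skip. A sketch using the same data frame as above (`kept` is my own name):

```r
x <- factor(c("A", "B", "A"), levels = c("A", "B", "C"))
y <- c(1, 2, 3)
z <- factor(c("R", "G", "G"), levels = c("R", "G", "B"))
df <- data.frame(x, y, z)

# Drop unused levels everywhere except column 3 (z)
kept <- droplevels(df, except = 3)

levels(kept$x)   # "A" "B"        (unused "C" dropped)
levels(kept$z)   # "R" "G" "B"    (left untouched)
```

This can matter when a level like "B" is simply not observed yet but should still appear in later tables or plots.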

 

Adjusting levels - Reordering

> v=factor(c("S","M","L","XL","M","L"))
> v
[1] S  M  L  XL M  L 
Levels: L M S XL

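Gou... Gourgeist-chan, haa haa... wait, what was I saying. Anyway, Pumpkaboo and Gourgeist come in sizes. But look at the levels... that's not the size order. By default the levels are sorted alphabetically, and sizes don't go in alphabetical order... OTL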

> v=factor(v,levels=c("S","M","L","XL"))
> v
[1] S  M  L  XL M  L 
Levels: S M L XL

So I fixed it for you ^^ Passing levels= to factor() sets the order explicitly.
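Why bother with the order? Functions such as sort() and table() follow the level order, not the alphabet. A small sketch comparing the two factors (`sizes`, `alpha_f` and `sized_f` are illustrative names):

```r
sizes <- c("S", "M", "L", "XL", "M", "L")

alpha_f <- factor(sizes)                                  # default: alphabetical levels
sized_f <- factor(sizes, levels = c("S", "M", "L", "XL")) # explicit size order

# sort() orders the elements by level position
sorted_alpha <- as.character(sort(alpha_f))   # "L" "L" "M" "M" "S" "XL"
sorted_sized <- as.character(sort(sized_f))   # "S" "M" "M" "L" "L" "XL"

# table() lists its counts in level order too
names(table(sized_f))                         # "S" "M" "L" "XL"
```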

There are plenty of other ways besides this one.

> w=factor(c("pokeball","superball","ultraball","masterball","pokeball"))
> w
[1] pokeball   superball  ultraball  masterball pokeball  
Levels: masterball pokeball superball ultraball
> w=ordered(c("pokeball","superball","ultraball","masterball"))
> w
[1] pokeball   superball  ultraball  masterball
Levels: masterball < pokeball < superball < ultraball

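You could use ordered() to get an ordered factor... but wait, this isn't the order it should be in...? Without a levels argument, ordered() still sorts the levels alphabetically.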
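A possible fix, sketched under the assumption that the intended order is pokeball < superball < ultraball < masterball: pass levels= to ordered() as well. Once the order is right, comparison operators work on the elements. The name `balls` is made up.

```r
balls <- ordered(c("pokeball", "superball", "ultraball", "masterball"),
                 levels = c("pokeball", "superball", "ultraball", "masterball"))

levels(balls)            # "pokeball" "superball" "ultraball" "masterball"

# Ordered factors support comparisons based on level position
balls[1] < balls[4]      # TRUE: pokeball ranks below masterball
as.character(max(balls)) # "masterball"
balls >= "ultraball"     # FALSE FALSE TRUE TRUE
```

A plain factor() with levels= only controls the display and sort order; comparisons like `<` need an ordered factor.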

 

> w=factor(c("pokeball","superball","ultraball","pokeball","masterball"),levels=c("pokeball","superball","ultraball","masterball"))
> w
[1] pokeball   superball  ultraball  pokeball   masterball
Levels: pokeball superball ultraball masterball

Or just set the order when you create the factor...

 

> w=factor(c("pokeball","superball","ultraball","masterball","pokeball"))
> w=relevel(w,"masterball")
> w
[1] pokeball   superball  ultraball  masterball pokeball  
Levels: masterball pokeball superball ultraball
> w=relevel(w,"pokeball")
> w
[1] pokeball   superball  ultraball  masterball pokeball  
Levels: pokeball masterball superball ultraball

If the order is fine except for one level that needs to come first, use relevel() to pull that one level to the front.

 

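> w=factor(c("pokeball","superball","ultraball","pokeball","masterball"),levels=c("pokeball","superball","ultraball","masterball"))
> x=factor(w,levels=rev(levels(w)))
> x
[1] pokeball   superball  ultraball  pokeball   masterball
Levels: masterball ultraball superball pokeball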

To flip the order completely, use rev() on the levels.