Log in

View Full Version : Tutorial: How to make 3D PCA in RStudio



Lemgrant
07-29-2019, 11:32 PM
How to make 3D PCA in RStudio and perform hierarchical k-means clustering
https://youtu.be/_lZ_EqV-cZw

https://youtu.be/_lZ_EqV-cZw

Eurogenes K13 datasheet:
https://drive.google.com/file/d/1cDv9fq4TyQuI21Y3Sbs8p0ffwoAHewJ6/view

Script:

install.packages("randomcoloR")
install.packages("factoextra")
install.packages("rgl")
install.packages("rmarkdown", repos = "https://cran.revolutionanalytics.com")

data01 <- read.csv(file.choose(), row.names = 'Population')
pca <- princomp(data01[,1:13], cor=TRUE, scores=TRUE)
summary(pca)
library(rgl) #load the package
plot3d(pca$scores[,1:3], col="red", type="p")
text3d(pca$scores[,1:3],texts=rownames(data01),font=2)
grid3d('x')
grid3d('y')
grid3d('z')

text3d(pca$loadings[,1:3], texts=rownames(pca$loadings), col="red")
coords <- NULL
for (i in 1:nrow(pca$loadings)) {
coords <- rbind(coords, rbind(c(0,0,0),pca$loadings[i,1:3]))
}
lines3d(coords, col="red", lwd=4)

library(factoextra) #load the package
cl <- hkmeans(data01[,1:13],130) #130 clusters
data01$cluster <- as.factor(cl$cluster)
library(randomcoloR) #load the package
palette <- distinctColorPalette(130) #130 colors
palette(palette)
plot3d(pca$scores[,1:3], col=data01$cluster, main="Hierarchical k-means clusters")
text3d(pca$scores[,1:3],texts=rownames(data01),font=2, col=data01$cluster)

observer3d(0, 0, 40) # x, y, z

Kaspias
07-31-2019, 02:15 PM
I got error while running data01 command, probably something is wrong on my spreadsheet file but i couldn't find where is the problem.

Error in data[[rowvar]] : attempt to select less than one element in get1index


Row/Column format:

https://i.ibb.co/mbsbPKd/Ads-z.png

Tried with both .csv and .xlsx despite i know it should be .csv, both gave the same error.

Changed the decimal points (./,), both are not working.

Deleted the row.names(here: data01 <- read.csv(file.choose(), row.names = 'Population')) by thinking it is just a vector name and not required, it worked and read my spreadsheet. But didn't read as 13 component. See: 49 obs. of 1 variable.


Excel makes me mad.

Lemgrant
07-31-2019, 06:33 PM
I got error while running data01 command, probably something is wrong on my spreadsheet file but i couldn't find where is the problem.

Error in data[[rowvar]] : attempt to select less than one element in get1index


Row/Column format:

https://i.ibb.co/mbsbPKd/Ads-z.png

Tried with both .csv and .xlsx despite i know it should be .csv, both gave the same error.

Changed the decimal points (./,), both are not working.

Deleted the row.names(here: data01 <- read.csv(file.choose(), row.names = 'Population')) by thinking it is just a vector name and not required, it worked and read my spreadsheet. But didn't read as 13 component. See: 49 obs. of 1 variable.


Excel makes me mad.

send me your csv file to check if the error is replicated on my pc

Lemgrant
07-31-2019, 06:45 PM
.
Do you get the same error with my csv file ?
https://drive.google.com/file/d/1cDv9fq4TyQuI21Y3Sbs8p0ffwoAHewJ6/view

Lemgrant
07-31-2019, 06:58 PM
I got error while running data01 command, probably something is wrong on my spreadsheet file but i couldn't find where is the problem.

Error in data[[rowvar]] : attempt to select less than one element in get1index


Row/Column format:

https://i.ibb.co/mbsbPKd/Ads-z.png

Tried with both .csv and .xlsx despite i know it should be .csv, both gave the same error.

Changed the decimal points (./,), both are not working.

Deleted the row.names(here: data01 <- read.csv(file.choose(), row.names = 'Population')) by thinking it is just a vector name and not required, it worked and read my spreadsheet. But didn't read as 13 component. See: 49 obs. of 1 variable.


Excel makes me mad.

Your csv file is not valid, you have a semicolon in your file.

On the left is how it should be:
https://i.imgur.com/un3m1TR.png

you can fix it with replace function in notepad. Replace ; with ,

Lucas
07-31-2019, 07:02 PM
https://plot.ly/online-chart-maker/

Lemgrant
08-02-2019, 12:14 PM
>>

Komintasavalta
02-24-2021, 07:20 PM
This uses plotly which is also used by Vahaduo's 3D plot viewer. It assigns colors to samples by clustering them with hclust.

You can install R in macOS by running `brew install R`. Then run `R` and run `install.packages("plotly")`.


library(plotly)
download.file("https://drive.google.com/uc?export=download&id=1wZr-UOve0KUKo_Qbgeo27m-CQncZWb8y","modernave")
t<-read.csv("modernave",header=T,row.names=1)
k<-cutree(hclust(dist(t)),k=32)
p<-plot_ly(t,x=~PC1,y=~PC2,z=~PC3,mode="text",text=row.names(t),color=as.factor(k))
p<-layout(p,paper_bgcolor="#1f1f1f",showlegend=F)
add_trace(p,type="scatter3d")

https://i.ibb.co/KyLBnQK/a.png

The plot above took almost a minute to generate even though it only has about 600 points. You can add this to select only 50 random rows: `t<-t[sample(nrow(t),50),]`.

This plots the 128 ancient individuals closest to the Mari population average.


library(plotly)
download.file("https://drive.google.com/uc?export=download&id=1wZr-UOve0KUKo_Qbgeo27m-CQncZWb8y","modernave")
download.file("https://drive.google.com/uc?export=download&id=1UrhcfNMLW0oMXIbHGUE60v2taCM7PFw1","ancientind")
t<-read.csv("modernave",header=T,row.names=1)
t2<-read.csv("ancientind",header=T,row.names=1)
target<-t["Mansi",]
picks<-t2[names(head(sort(apply(t2,1,function(x)sqrt(sum((x-target)^2)))),128)),]
k<-cutree(hclust(dist(picks)),k=32)
p<-plot_ly(picks,x=~PC1,y=~PC2,z=~PC3,mode="text",text=row.names(picks),color=as.factor(k))
p<-layout(p,paper_bgcolor="#1f1f1f",showlegend=F)
add_trace(p,type="scatter3d")

https://i.ibb.co/0BB7n0S/b.png

(I think Munkhkhairkhan sounds very cool, and it's one of the usernames I thought of registering after I got banned. It means "eternal holy mountain" in Mongolian.)

This plots users from Feiichy's phenotype calculator thread (https://www.theapricity.com/forum/showthread.php?339081-Calculate-your-phenotype!-(for-people-inside-Euro-phenotype-spectrum)):


library(plotly);library(dplyr)
download.file("https://pastebin.com/raw/7nc9hNyD","feiichyphenotypecalculator")
t<-read.csv("feiichyphenotypecalculator",header=T,row.names=1,check.names=F)
p<-prcomp(t)
k<-cutree(hclust(dist(p$x),method="ward.D2"),k=4)
plot_ly(as.data.frame(p$x),x=~PC1,y=~PC2,z=~PC3,mo de="text",text=row.names(as.data.frame(p$x)),color=as.facto r(k))%>%
layout(paper_bgcolor="#1f1f1f",showlegend=F)%>%add_trace(type="scatter3d")

https://i.ibb.co/2YfTCxv/c.png