Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dataframe - Dropping Columns of Specific Name in R

I'm working in RStudio with data for patients who are either normal, have Crohn's disease, or have ulcerative colitis. The data is structured so that patient information is in one data frame (called sampleInfo), and the data I want to analyze is in a different data frame (called expressionData). For my analysis, I would like to remove the 'normal' patients from the dataset and keep only those with Crohn's disease or ulcerative colitis.

So, first I made a new data frame from sampleInfo containing all the patients (i.e., rows) with the normal disease state, using the following command:
bad_patients <- sampleInfo[sampleInfo$characteristics_ch1.3 == "disease state: normal", ]

bad_patients has a column called geo_accession, which contains the patient IDs; these IDs also match the column names for the same patients in expressionData. I saved these IDs using
patient_names <- bad_patients$geo_accession
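For anyone who wants to reproduce these two steps, here is a minimal sketch on a toy version of sampleInfo. The IDs GSM0001 to GSM0004 and the non-normal disease-state strings are invented placeholders; the real values come from the asker's GEO dataset.

```r
# Hypothetical toy sampleInfo: four patients, two of them "normal"
sampleInfo <- data.frame(
  geo_accession = c("GSM0001", "GSM0002", "GSM0003", "GSM0004"),
  characteristics_ch1.3 = c("disease state: normal",
                            "disease state: Crohn's disease",
                            "disease state: normal",
                            "disease state: ulcerative colitis"),
  stringsAsFactors = FALSE
)

# Rows (patients) whose disease state is normal
bad_patients <- sampleInfo[sampleInfo$characteristics_ch1.3 == "disease state: normal", ]

# Their IDs, which should match column names of expressionData
patient_names <- bad_patients$geo_accession
print(patient_names)  # "GSM0001" "GSM0003"
```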

Now, I want to remove the columns with these names from expressionData. I looked at a lot of different Stack Overflow posts, as well as posts on the R help forum, and found two main approaches, both of which I tried. The first uses the following command:
newDataFrame <- expressionData[ , !names(expressionData) %in% patient_names]
Although this method does produce a new matrix called newDataFrame, trying to view it in RStudio gives the following error:
Error in View : 'names' attribute [1] must be the same length as the vector [0]

I also tried a second method, using subset(), with the following command:
newDataFrame <- subset(expressionData, -patient_names)
which raises the error: Error in -patient_names : invalid argument to unary operator
I also tried this subset method by explicitly typing out the columns I wanted to remove, as follows:
newDataFrame <- subset(expressionData, -c('ID090190', ...)) (where ... corresponds to the rest of the IDs) and got exactly the same error.
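The subset() error comes from applying unary minus to a character vector, which R does not allow; note also that the second positional argument of subset() filters rows, while columns are chosen with the select argument. A minimal sketch on an invented expressionData (placeholder IDs and gene names), passing the names to keep via select:

```r
# Hypothetical toy expression table: one gene per row, one patient per column
expressionData <- data.frame(
  GSM0001 = c(1.2, 3.4),
  GSM0002 = c(2.1, 0.5),
  GSM0003 = c(0.9, 1.1),
  GSM0004 = c(4.0, 2.2),
  row.names = c("geneA", "geneB")
)
patient_names <- c("GSM0001", "GSM0003")

# Negating a character vector is what triggers the original error
res <- tryCatch(-patient_names, error = function(e) conditionMessage(e))
print(res)  # the error message from the unary minus

# Columns go in `select`; here we pass the names that should be kept
newDataFrame <- subset(expressionData,
                       select = setdiff(names(expressionData), patient_names))
print(names(newDataFrame))  # "GSM0002" "GSM0004"
```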

Can someone tell me what I'm doing wrong, or how to work around this?

Question from: https://stackoverflow.com/questions/65874486/dropping-columns-of-specific-name-in-r

He who fights dragons too long becomes a dragon himself; gaze into the abyss too long, and the abyss gazes back…

1 Answer

A couple of solutions:

Subsetting based on names

newDataFrame <- expressionData[!(names(expressionData) %in% patient_names)]

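Wrapping the whole expression evaluated by ! in parentheses makes the intent clear, but it isn't strictly necessary: in R, %in% has higher precedence than !, so !names(expressionData) %in% patient_names is already parsed as !(names(expressionData) %in% patient_names). The more likely problem is that expressionData is a matrix (see the edit below). names() of a matrix is usually NULL, so the logical index is empty and every column is dropped, which would explain the View error.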
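A small sketch illustrating both points, using invented placeholder IDs: the two spellings of the test give the same result, and names() on a matrix yields an empty index that keeps no columns at all.

```r
patient_names <- c("GSM0001", "GSM0003")
ids <- c("GSM0001", "GSM0002", "GSM0003", "GSM0004")

# %in% binds more tightly than !, so both forms give the same logical vector
a <- !ids %in% patient_names
b <- !(ids %in% patient_names)
stopifnot(identical(a, b))
print(a)  # FALSE TRUE FALSE TRUE

# A matrix usually has colnames() but no names()
m <- matrix(1:8, nrow = 2, dimnames = list(c("geneA", "geneB"), ids))
print(names(m))                         # NULL
keep <- !(names(m) %in% patient_names)  # logical(0): nothing to match against
print(ncol(m[, keep]))                  # 0: every column is dropped
```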

I've subset with only one index (x[this] rather than x[, this]). This works for the columns of a data frame because a data frame is a list of its columns. This form always returns a data.frame, whereas the two-dimensional form returns a plain vector if you select only one column (unless you add drop = FALSE). Tibbles return a tibble with both forms, which is one big advantage of tibbles.
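A quick sketch of the difference on a two-column toy data frame (column names are placeholders):

```r
df <- data.frame(GSM0001 = c(1.2, 3.4), GSM0002 = c(2.1, 0.5))

one_dim <- df["GSM0002"]                  # list-style: still a data.frame
two_dim <- df[, "GSM0002"]                # matrix-style: dropped to a numeric vector
kept    <- df[, "GSM0002", drop = FALSE]  # drop = FALSE keeps the data.frame

print(class(one_dim))  # "data.frame"
print(class(two_dim))  # "numeric"
print(class(kept))     # "data.frame"
```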

Tidyverse solution: use dplyr::select with dplyr::all_of

newDataFrame <- dplyr::select(expressionData, -dplyr::all_of(patient_names))
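A minimal sketch of the dplyr route on invented toy data, assuming the dplyr package is installed (the block does nothing otherwise). all_of() is strict and errors if any listed name is absent; dplyr::any_of() would silently skip missing names instead.

```r
# Requires the dplyr package; skipped if it is not installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Hypothetical toy data with placeholder patient IDs
  expressionData <- data.frame(
    GSM0001 = c(1.2, 3.4), GSM0002 = c(2.1, 0.5),
    GSM0003 = c(0.9, 1.1), GSM0004 = c(4.0, 2.2)
  )
  patient_names <- c("GSM0001", "GSM0003")

  # Drop every column whose name is in patient_names
  newDataFrame <- dplyr::select(expressionData, -dplyr::all_of(patient_names))
  print(names(newDataFrame))  # "GSM0002" "GSM0004"
}
```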

Edit: Make sure your data really is a data.frame

If you're getting this error Error in UseMethod("select_") : no applicable method for 'select_' applied to an object of class "c('matrix', 'array', 'double', 'numeric')", it's because your data is a matrix, rather than a data frame. You may have inadvertently coerced it in processing.

Use as.data.frame to convert back to a data frame, which will be compatible with the methods above. If you wish to keep your data as a matrix, use colnames:

expressionData[ , !(colnames(expressionData) %in% patient_names)] to subset the columns.
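Both options side by side, as a sketch on an invented matrix (placeholder IDs and gene names). drop = FALSE is added to the matrix version so the result stays a matrix even if only one column survives.

```r
ids <- c("GSM0001", "GSM0002", "GSM0003", "GSM0004")
expressionData <- matrix(c(1.2, 3.4, 2.1, 0.5, 0.9, 1.1, 4.0, 2.2), nrow = 2,
                         dimnames = list(c("geneA", "geneB"), ids))
patient_names <- c("GSM0001", "GSM0003")

# Option 1: keep the matrix and subset its columns by colnames()
kept_matrix <- expressionData[, !(colnames(expressionData) %in% patient_names),
                              drop = FALSE]

# Option 2: convert to a data.frame, after which names()-based subsetting works
expression_df <- as.data.frame(expressionData)
kept_df <- expression_df[!(names(expression_df) %in% patient_names)]

print(colnames(kept_matrix))  # "GSM0002" "GSM0004"
print(names(kept_df))         # "GSM0002" "GSM0004"
```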

If expressionData is a matrix, you'll need to subset the columns with colnames rather than names. The names of a data.frame are identical to its colnames (because a data frame is a list of its columns), but the names of a matrix, if set at all, are names for each individual element, because a matrix is just a vector with dimensions; usually they are NULL. You'll want to check colnames(expressionData) to make sure that there are column names to subset on.
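A short sketch of the names/colnames distinction and of the suggested check, on an invented 2 x 2 matrix with placeholder IDs:

```r
m <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("GSM0001", "GSM0002")))
df <- as.data.frame(m)

# For a data.frame, names() and colnames() agree
stopifnot(identical(names(df), colnames(df)))

# For a matrix, names() refers to individual elements and is usually NULL
print(names(m))     # NULL
print(colnames(m))  # "GSM0001" "GSM0002"

# Element names, if set, have one entry per cell rather than one per column
names(m) <- letters[1:4]
print(length(names(m)))  # 4

# Before subsetting, check that the IDs actually appear among the column names
patient_names <- c("GSM0001")
print(all(patient_names %in% colnames(m)))  # TRUE
```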
