
How does one reorder columns in a data frame?

How would one change this input (with the sequence: time, in, out, files):

Time   In    Out  Files
1      2     3    4
2      3     4    5

To this output (with the sequence: time, out, in, files)?

Time   Out   In  Files
1      3     2    4
2      4     3    5

Here's the dummy R data:

table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
##  Time In Out Files
##1    1  2   3     4
##2    2  3   4     5
help(Extract) also known as ?'['
In addition to @Joris's suggestion, try reading sections 2.7 and 5 of the "An Introduction to R" manual: cran.r-project.org/doc/manuals/R-intro.html
One additional issue: all the answers require the full list of columns, otherwise they result in subsetting. What if we only want to list a few columns to be placed first, while retaining all the others?

Braiam

Your dataframe has four columns like so df[,c(1,2,3,4)]. Note the first comma means keep all the rows, and the 1,2,3,4 refers to the columns.

To change the order as in the question above, do df2 <- df[,c(1,3,2,4)]

If you want to output this file as a csv, do write.csv(df2, file="somedf.csv")
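As a minimal sketch of the index-based approach on the question's data (the names df and df2 follow this answer; nothing beyond base R is assumed):

```r
# Rebuild the question's data frame
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Empty row index keeps all rows; the column index gives the new order
df2 <- df[, c(1, 3, 2, 4)]

names(df2)  # columns are now Time, Out, In, Files
```

Note that the result has to be assigned; indexing alone does not modify df.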


This is OK when you have a limited number of columns, but what if you have, for example, 50 columns? It would take too much time to type all the column numbers or names. What would be a quicker solution?
@user4050: in that case you can use the ":" syntax, e.g. df[,c(1,3,2,4,5:50)].
To put the columns in idcols at the start: idcols <- c("name", "id2", "start", "duration"); cols <- c(idcols, setdiff(names(df), idcols)); df <- df[cols]
@user4050: you can also use df[,c(1,3,2,4:ncol(df))] when you don't know how many columns there are.
You can also use dput(colnames(df)), it prints column names in R character format. You can then rearrange the names.
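For the case raised in the comments (list only a few columns to come first, keep all the others), one possible base R sketch uses setdiff(). The choice of columns in front is hypothetical, picked only for illustration:

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Columns to bring to the front (hypothetical choice for illustration)
front <- c("Files", "Out")

# setdiff() returns the remaining names in their original order
df_front <- df[c(front, setdiff(names(df), front))]

names(df_front)  # Files, Out, Time, In
```

This scales to wide data frames because only the columns being moved are typed out.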
salac33
# reorder by column name
data <- data[, c("A", "B", "C")] # leave the row index blank to keep all rows

#reorder by column index
data <- data[, c(1,3,2)] # leave the row index blank to keep all rows

Question as a beginner, can you combine ordering by index and by name? E.g. data <- data[c(1,3,"Var1", 2)]?
@BramVanroy nope, c(1,3,"Var1", 2) will be read as c("1","3","Var1", "2") because vectors can contain data of only one type, so types are promoted to the most general type present. Because there are no columns with the character names "1", "3", etc. you'll get "undefined columns". list(1,3,"Var1", 2) keeps values without type promotion, but you can't use a list in the above context.
Why does the mtcars[c(1,3,2)] subsetting work? I would have expected an error relating to incorrect dimensions or similar... Shouldn't it be mtcars[,c(1,3,2)]?
Data frames are lists under the hood, with the columns as the list elements, so [ with a single index selects columns.
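The two points in these comments can be checked with a short sketch. The data frame and its Var1 column are hypothetical, and using match() to turn a name into a position is one possible workaround, not something the commenters proposed:

```r
# Mixing numbers and strings in c() coerces everything to character
idx <- c(1, 3, "Var1", 2)
class(idx)  # "character"

# Hypothetical data frame with a column called Var1
data <- data.frame(A = 1:2, B = 3:4, C = 5:6, Var1 = 7:8)

# Translate the name to a position with match(), then index by position only
pos <- c(1, 3, match("Var1", names(data)), 2)
reordered <- data[pos]
names(reordered)  # A, C, Var1, B

# List-style and matrix-style indexing select the same columns
identical(names(data[c(1, 3, 2)]), names(data[, c(1, 3, 2)]))  # TRUE
```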
guyabel

You can also use the subset function:

data <- subset(data, select=c(3,2,1))

It is generally better to use the [ operator as in the other answers, but it may be useful to know that subset() lets you subset rows and reorder columns in a single command.

Update:

You can also use the select function from the dplyr package:

data = data %>% select(Time, Out, In, Files)

I am not sure about the efficiency, but thanks to dplyr's syntax this solution should be more flexible, especially if you have a lot of columns. For example, the following reverses the column order of the mtcars dataset:

mtcars %>% select(carb:mpg)

And the following will reorder only some columns, and discard others:

mtcars %>% select(mpg:disp, hp, wt, gear:qsec, starts_with('carb'))

Read more about dplyr's select syntax.


There are some reasons not to use subset(), see this question.
Thank you. In any case I would now use the select function from the dplyr package, instead of subset.
When you want to bring a couple of columns to the left hand side and not drop the others, I find everything() particularly awesome; mtcars %>% select(wt, gear, everything())
Here is another way to use the everything() select helper to move columns to the right-hand end: stackoverflow.com/a/44353144/4663008 and github.com/tidyverse/dplyr/issues/2838. It seems you need two select() calls to move some columns to the right end and others to the left.
The new function dplyr::relocate is exactly for this. See H 1's answer below.
Community

As mentioned in this comment, the standard suggestions for re-ordering columns in a data.frame are generally cumbersome and error-prone, especially if you have a lot of columns.

This function lets you rearrange columns by position: specify a variable name and the desired position, and don't worry about the other columns.

##arrange df vars by position
##'vars' must be a named vector, e.g. c("var.name"=1)
arrange.vars <- function(data, vars){
    ##stop if not a data.frame (but should work for matrices as well)
    stopifnot(is.data.frame(data))

    ##sort out inputs
    data.nms <- names(data)
    var.nr <- length(data.nms)
    var.nms <- names(vars)
    var.pos <- vars
    ##sanity checks
    stopifnot( !any(duplicated(var.nms)), 
               !any(duplicated(var.pos)) )
    stopifnot( is.character(var.nms), 
               is.numeric(var.pos) )
    stopifnot( all(var.nms %in% data.nms) )
    stopifnot( all(var.pos > 0), 
               all(var.pos <= var.nr) )

    ##prepare output
    out.vec <- character(var.nr)
    out.vec[var.pos] <- var.nms
    out.vec[-var.pos] <- data.nms[ !(data.nms %in% var.nms) ]
    stopifnot( length(out.vec)==var.nr )

    ##re-arrange vars by position
    data <- data[ , out.vec]
    return(data)
}

Now the OP's request becomes as simple as this:

table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
##  Time In Out Files
##1    1  2   3     4
##2    2  3   4     5

arrange.vars(table, c("Out"=2))
##  Time Out In Files
##1    1   3  2     4
##2    2   4  3     5

To additionally swap Time and Files columns you can do this:

arrange.vars(table, c("Out"=2, "Files"=1, "Time"=4))
##  Files Out In Time
##1     4   3  2    1
##2     5   4  3    2

Very nice function. I added a modified version of this function to my personal package.
This is really useful - it's going to save me a lot of time when I just want to move one column from the end of a really wide tibble to the beginning
David Tonhofer

A dplyr solution (part of the tidyverse package set) is to use select:

select(table, "Time", "Out", "In", "Files") 

# or

select(table, Time, Out, In, Files)

The best option for me. Even if I had to install it, it is clearly the clearest possibility.
Tidyverse (dplyr in fact) also has the option to select groups of columns, for example to move the Species variable to the front: select(iris, Species, everything()). Also note that quotes are not needed.
It's important to note that this will drop all columns which are not explicitly specified unless you include everything() as in PaulRougieux's comment
dplyr's group will also rearrange the variables, so watch out when using that in a chain.
As of dplyr version 1.0.0 they added a relocate() function that's intuitive and easy to read. It's especially helpful if you just want to add columns after or before a specific column.
Ritchie Sacramento

dplyr version 1.0.0 includes the relocate() function to easily reorder columns:

dat <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))

library(dplyr) # from version 1.0.0 only

dat %>%
  relocate(Out, .before = In)

or

dat %>%
  relocate(Out, .after = Time)

That's a very neat solution. Thanks!
This is probably the most flexible and simple solution. Thanks!
user3482899

Maybe it's a coincidence that the column order you want happens to have column names in descending alphabetical order. Since that's the case you could just do:

df<-df[,order(colnames(df),decreasing=TRUE)]

That's what I use when I have large files with many columns.


!! WARNING !! With a data.table, this turns TARGET into an int vector: TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)]. To fix that, convert to a data frame first: TARGET <- as.data.frame(TARGET); TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)]
andschar

You can use the data.table package:

How to reorder data.table columns (without copying)

require(data.table)
setcolorder(DT,myOrder)
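A minimal sketch of the call above on the question's data, assuming the data.table package is installed. DT and myOrder are the answer's placeholders, filled in here with illustrative values:

```r
library(data.table)  # assumes the data.table package is installed

# DT and myOrder stand in for the answer's placeholders
DT <- data.table(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))
myOrder <- c("Time", "Out", "In", "Files")

# setcolorder() reorders by reference: no assignment, no copy of the data
setcolorder(DT, myOrder)

names(DT)  # Time, Out, In, Files
```

Because the reordering happens in place, every other variable that refers to the same data.table sees the new column order as well.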

Vrokipal

The three top-rated answers have a weakness.

If your dataframe looks like this

df <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))

> df
  Time In Out Files
1    1  2   3     4
2    2  3   4     5

then it's a poor solution to use

> df[,c(1,3,2,4)]

It does the job, but you have just introduced a dependence on the order of the columns in your input.

This style of brittle programming is to be avoided.

The explicit naming of the columns is a better solution

data[,c("Time", "Out", "In", "Files")]
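To make the brittleness concrete, here is a small sketch with two inputs that hold the same columns in a different order. The data frames a and b are hypothetical and used only for illustration:

```r
# Two inputs with the same columns but a different column order
a <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))
b <- a[, c("Files", "Time", "In", "Out")]  # hypothetical reshuffled input

wanted <- c("Time", "Out", "In", "Files")

# Position-based reordering silently depends on the input layout
names(a[, c(1, 3, 2, 4)])  # Time, Out, In, Files
names(b[, c(1, 3, 2, 4)])  # Files, In, Time, Out

# Name-based reordering gives the same layout for both inputs
by_name_a <- a[, wanted]
by_name_b <- b[, wanted]
identical(names(by_name_a), names(by_name_b))  # TRUE
```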

Plus, if you intend to reuse your code in a more general setting, you can simply

out.column.name <- "Out"
in.column.name <- "In"
data[,c("Time", out.column.name, in.column.name, "Files")]

which is also quite nice because it fully isolates literals. By contrast, if you use dplyr's select

data <- data %>% select(Time, Out, In, Files)

then you'd be setting up those who will read your code later, yourself included, for a bit of a deception. The column names are being used as literals without appearing in the code as such.
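If bare names in select() are the concern, one possible middle ground is the all_of() helper, which takes a character vector. This is a sketch rather than part of the answer: it assumes dplyr 1.0.0 or later is installed, and the variable cols is introduced here for illustration:

```r
library(dplyr)  # assumes dplyr >= 1.0.0 is installed

data <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Column names held in variables, as in the base R version above
out.column.name <- "Out"
in.column.name <- "In"
cols <- c("Time", out.column.name, in.column.name, "Files")

# all_of() selects by the strings stored in cols and errors on a missing name
reordered <- data %>% select(all_of(cols))

names(reordered)  # Time, Out, In, Files
```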


Hossein Noorazar
data.table::setcolorder(table, c("Time", "Out", "In", "Files"))

pls state the library you take the function setcolorder from.
Cybernetic

The only one I have seen work well is from here.

shuffle_columns <- function (invec, movecommand) {
  movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]],
                                 ",|\\s+"), function(x) x[x != ""])
  movelist <- lapply(movecommand, function(x) {
    Where <- x[which(x %in% c("before", "after", "first",
                              "last")):length(x)]
    ToMove <- setdiff(x, Where)
    list(ToMove, Where)
  })
  myVec <- invec
  for (i in seq_along(movelist)) {
    temp <- setdiff(myVec, movelist[[i]][[1]])
    A <- movelist[[i]][[2]][1]
    if (A %in% c("before", "after")) {
      ba <- movelist[[i]][[2]][2]
      if (A == "before") {
        after <- match(ba, temp) - 1
      }
      else if (A == "after") {
        after <- match(ba, temp)
      }
    }
    else if (A == "first") {
      after <- 0
    }
    else if (A == "last") {
      after <- length(myVec)
    }
    myVec <- append(temp, values = movelist[[i]][[1]], after = after)
  }
  myVec
}

Use like this:

new_df <- iris[shuffle_columns(names(iris), "Sepal.Width before Sepal.Length")]

Works like a charm.


Pau

dplyr has a function that lets you move specific columns before or after other columns. It is a critical tool when you work with large data frames (with only 4 columns, it is faster to use select as mentioned before).

https://dplyr.tidyverse.org/reference/relocate.html

In your case, it would be:

df <- df %>% relocate(Out, .before = In)

Simple and elegant. It also lets you move several columns together, to the beginning or to the end:

df <- df %>% relocate(any_of(c('ColX', 'ColY', 'ColZ')), .after = last_col())

Again: super powerful when you work with big dataframes :)