How would one change this input (with the sequence: time, in, out, files):
Time In Out Files
1 2 3 4
2 3 4 5
To this output (with the sequence: time, out, in, files)?
Time Out In Files
1 3 2 4
2 4 3 5
Here's the dummy R data:
table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
## Time In Out Files
##1 1 2 3 4
##2 2 3 4 5
help(Extract), also known as ?'['
Your data frame has four columns, like so: df[,c(1,2,3,4)]. Note that the first comma means keep all the rows, and the 1,2,3,4 refers to the columns.
To change the order as in the above question, do df2 <- df[,c(1,3,2,4)]
If you want to output this file as a csv, do write.csv(df2, file="somedf.csv")
# reorder by column name
data <- data[, c("A", "B", "C")] # leave the row index blank to keep all rows
#reorder by column index
data <- data[, c(1,3,2)] # leave the row index blank to keep all rows
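Applied to the data frame from the question, both forms give the same result. A minimal sketch in base R (the name `tbl` is only an illustrative stand-in, chosen here to avoid masking `base::table`):

```r
# Dummy data from the question
tbl <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Reorder by column name (row index left blank to keep all rows)
by_name <- tbl[, c("Time", "Out", "In", "Files")]

# Reorder by column index
by_index <- tbl[, c(1, 3, 2, 4)]

# Both approaches yield the same column order
names(by_name)
names(by_index)
```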
Why doesn't data <- data[c(1,3,"Var1", 2)] work?
c(1,3,"Var1", 2) will be read as c("1","3","Var1", "2") because vectors can contain data of only one type, so types are promoted to the most general type present. Because there are no columns with the character names "1", "3", etc., you'll get "undefined columns". list(1,3,"Var1", 2) keeps values without type promotion, but you can't use a list in the above context.
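The promotion is easy to see directly. The sketch below uses a made-up data frame (columns `A`, `B`, `C`, `Var1` are assumptions for illustration) and shows one possible workaround: translate the name to its position with `match()` so the index vector stays numeric.

```r
# Hypothetical data frame; the column names are illustrative only
df <- data.frame(A = 1:2, B = 3:4, C = 5:6, Var1 = 7:8)

# Mixing numbers and a string coerces every element to character
idx <- c(1, 3, "Var1", 2)
class(idx)

# Indexing with it fails, because there are no columns named "1", "3" or "2"
msg <- tryCatch(df[idx], error = function(e) conditionMessage(e))
msg

# Workaround: look up the position of the named column first
pos <- c(1, 3, match("Var1", names(df)), 2)
reordered <- df[pos]
names(reordered)
```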
Why does the mtcars[c(1,3,2)] subsetting work? I would have expected an error relating to incorrect dimensions or similar... Shouldn't it be mtcars[,c(1,3,2)]?
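On the question of why the single-index form works: a data frame is a list of columns, so indexing with one argument selects list elements, that is, columns. A short sketch comparing the two forms on `mtcars`:

```r
# One index: treated as list-style subsetting, selects columns
a <- mtcars[c(1, 3, 2)]

# Two indices with the row index left blank: matrix-style subsetting
b <- mtcars[, c(1, 3, 2)]

# Same columns in the same order
names(a)
names(b)

# The two forms differ when a single column is selected:
one_list_style   <- mtcars[1]    # stays a data frame
one_matrix_style <- mtcars[, 1]  # simplified to a plain vector (drop = TRUE)
class(one_list_style)
class(one_matrix_style)
```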
You can also use the subset function:
data <- subset(data, select=c(3,2,1))
It is usually better to use the [ operator as in the other answers, but it may be useful to know that you can do a subset and a column reorder operation in a single command.
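As a small illustration of that single-command form, here is a sketch on the question's data (the row condition `Time > 1` is an arbitrary assumption, chosen only to show the filter):

```r
tbl <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Filter rows and reorder columns in one call
res <- subset(tbl, Time > 1, select = c(Time, Out, In, Files))
res
```

Note that `subset()` uses non-standard evaluation, which is convenient interactively but can be surprising inside functions.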
Update:
You can also use the select function from the dplyr package:
data <- data %>% select(Time, Out, In, Files)
I am not sure about the efficiency, but thanks to dplyr's syntax this solution should be more flexible, especially if you have a lot of columns. For example, the following will reorder the columns of the mtcars dataset in the opposite order:
mtcars %>% select(carb:mpg)
And the following will reorder only some columns, and discard others:
mtcars %>% select(mpg:disp, hp, wt, gear:qsec, starts_with('carb'))
Read more about dplyr's select syntax.
subset(), see this question.
everything() particularly awesome; mtcars %>% select(wt, gear, everything())
As mentioned in this comment, the standard suggestions for reordering columns in a data.frame are generally cumbersome and error-prone, especially if you have a lot of columns.
This function lets you rearrange columns by position: specify a variable name and the desired position, and don't worry about the other columns.
##arrange df vars by position
##'vars' must be a named vector, e.g. c("var.name"=1)
arrange.vars <- function(data, vars){
    ##stop if not a data.frame (but should work for matrices as well)
    stopifnot(is.data.frame(data))

    ##sort out inputs
    data.nms <- names(data)
    var.nr <- length(data.nms)
    var.nms <- names(vars)
    var.pos <- vars

    ##sanity checks
    stopifnot( !any(duplicated(var.nms)),
               !any(duplicated(var.pos)) )
    stopifnot( is.character(var.nms),
               is.numeric(var.pos) )
    stopifnot( all(var.nms %in% data.nms) )
    stopifnot( all(var.pos > 0),
               all(var.pos <= var.nr) )

    ##prepare output
    out.vec <- character(var.nr)
    out.vec[var.pos] <- var.nms
    out.vec[-var.pos] <- data.nms[ !(data.nms %in% var.nms) ]
    stopifnot( length(out.vec)==var.nr )

    ##re-arrange vars by position
    data <- data[ , out.vec]
    return(data)
}
Now the OP's request becomes as simple as this:
table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
## Time In Out Files
##1 1 2 3 4
##2 2 3 4 5
arrange.vars(table, c("Out"=2))
## Time Out In Files
##1 1 3 2 4
##2 2 4 3 5
To additionally swap the Time and Files columns you can do this:
arrange.vars(table, c("Out"=2, "Files"=1, "Time"=4))
## Files Out In Time
##1 4 3 2 1
##2 5 4 3 2
A dplyr solution (part of the tidyverse package set) is to use select:
select(table, "Time", "Out", "In", "Files")
# or
select(table, Time, Out, In, Files)
select(iris, Species, everything()). Also note that quotes are not needed.
everything() as in PaulRougieux's comment
dplyr's group will also rearrange the variables, so watch out when using that in a chain.
In dplyr version 1.0.0 they added a relocate() function that's intuitive and easy to read. It's especially helpful if you just want to add columns after or before a specific column.
dplyr version 1.0.0 includes the relocate() function to easily reorder columns:
dat <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
library(dplyr) # from version 1.0.0 only
dat %>%
relocate(Out, .before = In)
or
dat %>%
relocate(Out, .after = Time)
Maybe it's a coincidence that the column order you want happens to have column names in descending alphabetical order. Since that's the case you could just do:
df<-df[,order(colnames(df),decreasing=TRUE)]
That's what I use when I have large files with many columns.
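To see why this works for the question's data, it helps to look at the index vector that `order()` produces. A minimal sketch (the name `tbl` is an illustrative stand-in):

```r
tbl <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Positions of the columns when names are sorted in descending order
ord <- order(colnames(tbl), decreasing = TRUE)
ord

# Use those positions as the column index
sorted <- tbl[, ord]
names(sorted)
```

This only gives the wanted layout because the target order happens to be alphabetical; for any other layout the names or positions have to be listed explicitly.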
!! WARNING !! data.table turns TARGET into an int vector: TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)]
To fix that: TARGET <- as.data.frame(TARGET)
TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)]
You can use the data.table package:
How to reorder data.table columns (without copying)
require(data.table)
setcolorder(DT,myOrder)
The three top-rated answers have a weakness.
If your dataframe looks like this
df <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
> df
Time In Out Files
1 1 2 3 4
2 2 3 4 5
then it's a poor solution to use
> df[,c(1,3,2,4)]
It does the job, but you have just introduced a dependence on the order of the columns in your input.
This style of brittle programming is to be avoided.
The explicit naming of the columns is a better solution
data[,c("Time", "Out", "In", "Files")]
Plus, if you intend to reuse your code in a more general setting, you can simply write
out.column.name <- "Out"
in.column.name <- "In"
data[,c("Time", out.column.name, in.column.name, "Files")]
which is also quite nice because it fully isolates literals. By contrast, if you use dplyr's select
data <- data %>% select(Time, Out, In, Files)
then you'd be setting up those who will read your code later, yourself included, for a bit of a deception. The column names are being used as literals without appearing in the code as such.
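To make the brittleness point concrete, the sketch below feeds the same reordering code two inputs whose columns arrive in different orders. The shuffled input is an assumption made up for illustration.

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Same data, but the columns arrive in a different order (hypothetical input)
shuffled <- df[c("Files", "In", "Time", "Out")]

wanted <- c("Time", "Out", "In", "Files")

# Name-based selection gives the intended layout for both inputs
names(df[, wanted])
names(shuffled[, wanted])

# Index-based selection silently depends on the input order
names(df[, c(1, 3, 2, 4)])
names(shuffled[, c(1, 3, 2, 4)])
```

The index-based call does not fail on the shuffled input; it just returns a different layout, which is what makes the problem easy to miss.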
data.table::setcolorder(table, c("Out", "In", "Files"))
setcolorder from.
The only one I have seen work well is from here.
shuffle_columns <- function (invec, movecommand) {
    movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]],
                                   ",|\\s+"), function(x) x[x != ""])
    movelist <- lapply(movecommand, function(x) {
        Where <- x[which(x %in% c("before", "after", "first",
                                  "last")):length(x)]
        ToMove <- setdiff(x, Where)
        list(ToMove, Where)
    })
    myVec <- invec
    for (i in seq_along(movelist)) {
        temp <- setdiff(myVec, movelist[[i]][[1]])
        A <- movelist[[i]][[2]][1]
        if (A %in% c("before", "after")) {
            ba <- movelist[[i]][[2]][2]
            if (A == "before") {
                after <- match(ba, temp) - 1
            }
            else if (A == "after") {
                after <- match(ba, temp)
            }
        }
        else if (A == "first") {
            after <- 0
        }
        else if (A == "last") {
            after <- length(myVec)
        }
        myVec <- append(temp, values = movelist[[i]][[1]], after = after)
    }
    myVec
}
Use like this:
new_df <- iris[shuffle_columns(names(iris), "Sepal.Width before Sepal.Length")]
Works like a charm.
dplyr has a function that allows you to move specific columns to before or after other columns. That is a critical tool when you work with big data frames (if there are only 4 columns, it's faster to use select as mentioned before).
https://dplyr.tidyverse.org/reference/relocate.html
In your case, it would be:
df <- df %>% relocate(Out, .before = In)
Simple and elegant. It also allows you to move several columns together and move them to the beginning or to the end:
df <- df %>% relocate(any_of(c('ColX', 'ColY', 'ColZ')), .after = last_col())
Again: super powerful when you work with big dataframes :)
Use df[,c(1,3,2,4:ncol(df))] when you don't know how many columns there are.
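One caveat with `4:ncol(df)`: if the data frame has fewer than four columns, the colon operator counts downwards instead of returning an empty range. A possible guard, sketched with a hypothetical helper `swap_2_and_3()` that is not part of any package:

```r
# Hypothetical helper: swap columns 2 and 3, keep any remaining columns in place
swap_2_and_3 <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 3)
  rest <- setdiff(seq_len(ncol(df)), 1:3)  # empty when there are exactly 3 columns
  df[, c(1, 3, 2, rest)]
}

three <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4))
five  <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4),
                    Files = c(4, 5), Extra = c(6, 7))

# With three columns, 4:ncol(three) is 4:3, which counts down
countdown <- 4:ncol(three)
countdown

names(swap_2_and_3(three))
names(swap_2_and_3(five))
```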