Reorder levels of a factor without changing order of values

I have a data frame with some numerical variables and some categorical factor variables. The levels of those factors are not in the order I want.

numbers <- 1:4
letters <- factor(c("a", "b", "c", "d"))
df <- data.frame(numbers, letters)
df
#   numbers letters
# 1       1       a
# 2       2       b
# 3       3       c
# 4       4       d

If I change the order of the levels, the letters no longer are with their corresponding numbers (my data is total nonsense from this point on).

levels(df$letters) <- c("d", "c", "b", "a")
df
#   numbers letters
# 1       1       d
# 2       2       c
# 3       3       b
# 4       4       a

I simply want to change the level order, so when plotting, the bars are shown in the desired order - which may differ from default alphabetical order.

Could someone give me a hint to why assignment to levels(...) changes the order of the entries in the data frame, as crangos shows in the question? It seems terribly unintuitive and undesired to me. I spent some time debugging an issue caused by this today myself. I am thinking there might be a reason for this behaviour that I cannot see though, or at least a reasonable explanation for why it happens.

Henrik

Use the levels argument of factor:

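# factor() is needed since R 4.0.0, where data.frame() no longer converts strings to factors
df <- data.frame(f = 1:4, g = factor(letters[1:4]))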
df
#   f g
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d

levels(df$g)
# [1] "a" "b" "c" "d"

df$g <- factor(df$g, levels = letters[4:1])
levels(df$g)
# [1] "d" "c" "b" "a"

df
#   f g
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d

Thank you, this worked. For some strange reason ggplot now correctly changed the order in the legend, but not in the plot. Weird.
ggplot2 required me to change both, the order of the levels (see above) and the order of the values of the data frame. df <- df[nrow(df):1, ] # reverse
@crangos, I think ggplot uses alphabetical ordering of levels, and sometimes ignores custom factor levels. Please confirm, and include version number.
rawr

some more, just for the record

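## reorder() is a generic from the stats package; the new.order argument
## belongs to the reorder.factor method in gdata, so load gdata first
library(gdata)
df$letters <- reorder(df$letters, new.order = letters[4:1])

## or call the gdata method directly
df$letters <- reorder.factor(df$letters, new.order = letters[4:1])

You may also find Relevel and combine_factor useful.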


Your first answer doesn't work for me. But this works: reorder(df$letters, seq(4,1))
I have a very strange situation where 'reorder' works on one dataset but not on another. On the other dataset, it throws the error "Error in tapply(X = X, INDEX = x, FUN = FUN, ...) : argument "X" is missing, with no default". I am not sure what the solution is. I can't find any relevant difference between the datasets.
Joe

Since this question was last active Hadley has released his new forcats package for manipulating factors and I'm finding it outrageously useful. Examples from the OP's data frame:

levels(df$letters)
# [1] "a" "b" "c" "d"

To reverse levels:

library(forcats)
library(magrittr)  # provides %>%
fct_rev(df$letters) %>% levels
# [1] "d" "c" "b" "a"

To add more levels:

fct_expand(df$letters, "e") %>% levels
# [1] "a" "b" "c" "d" "e"

And many more useful fct_xxx() functions.


Is this still available?
You want to write code like this (with dplyr loaded): df %>% mutate(letters = fct_rev(letters)).
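
For an arbitrary order rather than a plain reversal, forcats also provides fct_relevel(). A minimal sketch, assuming forcats is installed; the data frame name dat and the chosen order are only illustrative:

```r
library(forcats)

# Illustrative data mirroring the question (the name dat is an assumption)
dat <- data.frame(numbers = 1:4,
                  letters = factor(c("a", "b", "c", "d")))

# Put the levels into an explicit order; the values themselves are untouched
dat$letters <- fct_relevel(dat$letters, "c", "a", "d", "b")

levels(dat$letters)
# [1] "c" "a" "d" "b"
as.character(dat$letters)
# [1] "a" "b" "c" "d"
```
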
rawr

so what you want, in R lexicon, is to change only the labels for a given factor variable (ie, leave the data as well as the factor levels, unchanged).

df$letters = factor(df$letters, labels=c("d", "c", "b", "a"))

given that you want to change only the datapoint-to-label mapping and not the data or the factor schema (how the datapoints are binned into individual bins or factor values), it helps to know how the mapping is set when you first create the factor.

the rules are simple:

labels are mapped to levels by index value (i.e., the value at levels[2] is given the label labels[2]);

factor levels can be set explicitly by passing them in via the levels argument; or

if no value is supplied for the levels argument, the default is used, which is the sorted unique values of the data vector passed in (the x argument);

labels can be set explicitly via the labels argument; or

if no value is supplied for the labels argument, the default is used, which is just the levels vector
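
The rules above can be checked with a small sketch. The vector x and the label names are made up for illustration; note how labels renames the values while levels only reorders the levels:

```r
x <- c("b", "a", "c", "a")

# Default levels: the sorted unique values of the data
f <- factor(x)
levels(f)            # "a" "b" "c"
as.integer(f)        # 2 1 3 1

# levels = ... changes the level order; every value keeps its own name
f_lev <- factor(x, levels = c("c", "b", "a"))
as.character(f_lev)  # "b" "a" "c" "a"
as.integer(f_lev)    # 2 3 1 3

# labels = ... renames levels by position: "a" -> "z", "b" -> "y", "c" -> "x"
f_lab <- factor(x, labels = c("z", "y", "x"))
as.character(f_lab)  # "y" "z" "x" "z"
as.integer(f_lab)    # 2 1 3 1
```
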


I don't know why this isn't as up voted as the accepted answer. This is much more informative.
If you use this approach, your data is mislabeled.
actually yeah I don't know what to do with this, the answer seems to intend to mislabel the data for sake of plotting? ugh. rolled back to original. users beware
aL3xa

Dealing with factors in R is quite a peculiar job, I must admit... When you assign new levels, you're not reordering the underlying numerical values. Here's a little demonstration:

> numbers = 1:4
> letters = factor(letters[1:4])
> dtf <- data.frame(numbers, letters)
> dtf
  numbers letters
1       1       a
2       2       b
3       3       c
4       4       d
> sapply(dtf, class)
  numbers   letters 
"integer"  "factor" 

Now, if you convert this factor to numeric, you'll get:

> # return underlying numerical values
> with(dtf, as.numeric(letters))
[1] 1 2 3 4
> # change levels
> levels(dtf$letters) <- letters[4:1]
> dtf
  numbers letters
1       1       d
2       2       c
3       3       b
4       4       a
> # return numerical values once again
> with(dtf, as.numeric(letters))
[1] 1 2 3 4

As you can see... by changing levels, you change levels only (who would tell, eh?), not the numerical values! But, when you use factor function as @Jonathan Chang suggested, something different happens: you change numerical values themselves.

You're getting error once again 'cause you do levels and then try to relevel it with factor. Don't do it!!! Do not use levels or you'll mess things up (unless you know exactly what you're doing).

One lil' suggestion: avoid giving your objects the same names as R's own objects (df is the density function of the F distribution, letters holds the lowercase alphabet). In this particular case your code would not be faulty, but masking built-in names can create confusion, and we don't want that, do we?!? =)

Instead, use something like this (I'll go from the beginning once again):

> dtf <- data.frame(f = 1:4, g = factor(letters[1:4]))
> dtf
  f g
1 1 a
2 2 b
3 3 c
4 4 d
> with(dtf, as.numeric(g))
[1] 1 2 3 4
> dtf$g <- factor(dtf$g, levels = letters[4:1])
> dtf
  f g
1 1 a
2 2 b
3 3 c
4 4 d
> with(dtf, as.numeric(g))
[1] 4 3 2 1

Note that you can also name your data.frame df and the column letters instead of g, and the result will be OK. Actually, this code is identical to the one you posted, only the names are changed. The call factor(dtf$letters, levels = letters[4:1]) wouldn't throw an error, but it can be confusing!

Read the ?factor manual thoroughly! What's the difference between factor(g, levels = letters[4:1]) and factor(g, labels = letters[4:1])? What's similar in levels(g) <- letters[4:1] and g <- factor(g, labels = letters[4:1])?
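
As a hedged sketch of what those questions point at (the object names by_levels, by_labels and by_assign are only illustrative), the three calls can be compared side by side:

```r
g <- factor(c("a", "b", "c", "d"))
new <- c("d", "c", "b", "a")

# levels = ...: reorders the levels, the values stay "a" "b" "c" "d"
by_levels <- factor(g, levels = new)

# labels = ...: renames the levels by position, so the values become "d" "c" "b" "a"
by_labels <- factor(g, labels = new)

# levels(g) <- ...: the same renaming as labels = ...
by_assign <- g
levels(by_assign) <- new

as.character(by_levels)  # "a" "b" "c" "d"
as.integer(by_levels)    # 4 3 2 1
as.character(by_labels)  # "d" "c" "b" "a"
as.integer(by_labels)    # 1 2 3 4
as.character(by_assign)  # "d" "c" "b" "a"
as.integer(by_assign)    # 1 2 3 4
```
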

You can put ggplot syntax, so we can help you more on this one!

Cheers!!!

Edit:

ggplot2 actually requires to change both levels and values? Hm... I'll dig this one out...


joel.wilson

I wish to add another case, where the levels are strings that contain numbers along with some special characters, as in the example below:

# stringsAsFactors = TRUE is needed since R 4.0.0 to get a factor column
df <- data.frame(x = c("15-25", "0-4", "5-10", "11-14", "100+"), stringsAsFactors = TRUE)

The default levels of x are:

df$x
# [1] 15-25 0-4   5-10  11-14 100+ 
# Levels: 0-4 100+ 11-14 15-25 5-10

Here if we want to reorder the factor levels according to the numeric value, without explicitly writing out the levels, what we could do is

library(gtools)
df$x <- factor(df$x, levels = mixedsort(df$x))

df$x
# [1] 15-25 0-4   5-10  11-14 100+ 
# Levels: 0-4 5-10 11-14 15-25 100+
as.numeric(df$x)
# [1] 4 1 2 3 5

I hope this can be considered as useful information for future readers.
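
If gtools is not available, a similar ordering can be sketched in base R by sorting the labels on their leading number. This is only an illustration for labels shaped like the ones above; the names x, lvls and start are assumptions:

```r
x <- c("15-25", "0-4", "5-10", "11-14", "100+")

# Leading number of each distinct label, e.g. "15-25" -> 15, "100+" -> 100
lvls <- unique(x)
start <- as.numeric(sub("[^0-9].*$", "", lvls))

# Order the labels by that number and use them as the levels
x_f <- factor(x, levels = lvls[order(start)])

levels(x_f)
# [1] "0-4"   "5-10"  "11-14" "15-25" "100+"
as.integer(x_f)
# [1] 4 1 2 3 5
```
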


Boern

Here's my function to reorder factors of a given dataframe:

reorderFactors <- function(df, column = "my_column_name", 
                           desired_level_order = c("fac1", "fac2", "fac3")) {

  x = df[[column]]
  lvls_src = levels(x) 

  idxs_target <- vector(mode="numeric", length=0)
  for (target in desired_level_order) {
    idxs_target <- c(idxs_target, which(lvls_src == target))
  }

  x_new <- factor(x,levels(x)[idxs_target])

  df[[column]] <- x_new

  return (df)
}

Usage: reorderFactors(df, "my_col", desired_level_order = c("how","I","want"))
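
A compact sketch of the same idea with a concrete call. The helper name reorder_levels, the data frame dat and its columns are illustrative assumptions; intersect() stands in for the loop, and as in the function above, values whose level is not listed would become NA:

```r
reorder_levels <- function(df, column, desired_level_order) {
  x <- df[[column]]
  # Keep only the requested levels that actually exist, in the requested order
  new_levels <- intersect(desired_level_order, levels(x))
  df[[column]] <- factor(x, levels = new_levels)
  df
}

dat <- data.frame(id = 1:4, size = factor(c("low", "high", "mid", "low")))
dat <- reorder_levels(dat, "size", c("low", "mid", "high"))

levels(dat$size)        # "low" "mid" "high"
as.character(dat$size)  # "low" "high" "mid" "low"
```
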


Maria

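I would simply use the levels argument of factor(). Assigning with levels(df$letters) <- ... would relabel the values instead of reordering the levels:

df$letters <- factor(df$letters, levels = levels(df$letters)[4:1])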

xaviescacs

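Here is yet another approach, which is quite useful because it frees us from remembering functions from different packages. The levels of a factor are just attributes, so one can do the following. Note that this relabels the values in the same way as levels(df$letters) <- ... does, as the output below shows: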

numbers <- 1:4
letters <- factor(c("a", "b", "c", "d"))
df <- data.frame(numbers, letters)

# Original attributes
> attributes(df$letters)
$levels
[1] "a" "b" "c" "d"

$class
[1] "factor"

# Modify attributes
attr(df$letters,"levels") <- c("d", "c", "b", "a")

> df$letters
[1] d c b a
Levels: d c b a

# New attributes
> attributes(df$letters)
$levels
[1] "d" "c" "b" "a"

$class
[1] "factor"
