I have a nested list of data. Its length is 132 and each item is a list of length 20. Is there a quick way to convert this structure into a data frame that has 132 rows and 20 columns of data?
Here is some sample data to work with:
l <- replicate(
132,
as.list(sample(letters, 20)),
simplify = FALSE
)
With rbind
do.call(rbind.data.frame, your_list)
Edit: The previous version returned a data.frame of lists instead of vectors (as @IanSudbery pointed out in the comments).
Update July 2020:
The default for the parameter stringsAsFactors is now default.stringsAsFactors(), which in turn yields FALSE as its default.
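A minimal sketch of the rbind approach on a small list with named inner elements (the names x and y and the sample values are made up for illustration). It checks that the resulting columns are plain vectors rather than lists:

```r
# Hypothetical sample data: two inner lists with the same named elements.
your_list <- list(
  list(x = "a", y = 1),
  list(x = "b", y = 2)
)

# Each inner list is passed to rbind.data.frame() as one row.
df <- do.call(rbind.data.frame, your_list)

# TRUE for list columns, FALSE for ordinary vector columns.
vapply(df, is.list, logical(1))
```

With unnamed inner lists the call still works, but the column names are generated automatically and are usually worth replacing.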
Assuming your list of lists is called l:
df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE))
The above will convert all character columns to factors (in R versions before 4.0.0). To avoid this, add a parameter to the data.frame() call:
df <- data.frame(matrix(unlist(l), nrow=132, byrow=TRUE),stringsAsFactors=FALSE)
To set the column names, run names(df) <- names(unlist(l[1])) after the above command.
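Putting the matrix steps together, here is a small sketch. The inner names x, y and z are assumptions made for the example, and note that unlist() coerces everything to one common type:

```r
# Hypothetical sample data: two inner lists of three character values each.
l <- list(
  list(x = "a", y = "b", z = "c"),
  list(x = "d", y = "e", z = "f")
)

# Flatten to one long vector, then fill a matrix row by row.
m <- matrix(unlist(l), nrow = length(l), byrow = TRUE)

# Convert to a data frame and reuse the inner names as column names.
df <- data.frame(m, stringsAsFactors = FALSE)
names(df) <- names(l[[1]])
df
```

Because of the coercion in unlist(), this approach fits best when all values share one type, as in the question's sample data.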
You can use the plyr package. For example, a nested list of the form
l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
, b = list(var.1 = 4, var.2 = 5, var.3 = 6)
, c = list(var.1 = 7, var.2 = 8, var.3 = 9)
, d = list(var.1 = 10, var.2 = 11, var.3 = 12)
)
now has a length of 4, and each list in l contains another list of length 3. Now you can run
library(plyr)
df <- ldply(l, data.frame)
and you should get the same result as in the answers by @Marek and @nico.
Fixing the sample data so it matches the original description 'each item is a list of length 20'
mylistlist <- replicate(
132,
as.list(sample(letters, 20)),
simplify = FALSE
)
we can convert it to a data frame like this:
data.frame(t(sapply(mylistlist,c)))
sapply converts it to a matrix. data.frame converts the matrix to a data frame, resulting in:
https://i.stack.imgur.com/Lv0kn.png
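One thing to check with the sapply approach: when the inner elements are lists, the matrix has mode list, so the data frame columns are lists too. A small sketch, using a made-up 3 x 4 stand-in for the 132 x 20 data, that inspects the columns and flattens them:

```r
# Small stand-in for the sample data: 3 inner lists of length 4.
mylistlist <- list(
  list("a", "b", "c", "d"),
  list("e", "f", "g", "h"),
  list("i", "j", "k", "l")
)

df <- data.frame(t(sapply(mylistlist, c)))

# Each column is still a list at this point.
list_cols <- vapply(df, is.list, logical(1))

# Flatten every column into an atomic vector, keeping the data frame shape.
flat <- df
flat[] <- lapply(df, unlist)
str(flat)
```

The flattening step is an addition for illustration, not part of the original answer.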
c
to play here, one instance of the list's data? Oh wait, c for the concatenate function right? Getting confused with @mnel's usage of c. I also concur with @dchandler, getting the column names right was a valuable need in my use case. Brilliant solution.
?c: Combine Values into a Vector or List
Assume your list is called L:
data.frame(Reduce(rbind, L))
data.frame(Reduce(rbind, list(c('col1','col2'))))
produces a data frame with 2 rows, 1 column (I expected 1 row 2 columns)
reduce(L, rbind). This outputs a single data frame and assumes that each data frame in your list (L) is organized the same way (i.e. contains the same number of columns in the same order).
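The single-element case raised in the comment above can be reproduced in base R. With only one element, Reduce() returns that element unchanged and never calls rbind, so data.frame() receives a plain vector. A short sketch with made-up values:

```r
# One element: Reduce() returns the vector itself, so we get 2 rows, 1 column.
one <- list(c("col1", "col2"))
df_one <- data.frame(Reduce(rbind, one))

# Two elements: rbind is called once, so we get 2 rows, 2 columns.
two <- list(c("a", "b"), c("c", "d"))
df_two <- data.frame(Reduce(rbind, two))

# do.call() calls rbind even for a single element: 1 row, 2 columns.
df_fixed <- data.frame(do.call(rbind, one))

dim(df_one)
dim(df_two)
dim(df_fixed)
```

If the list may contain just one element, do.call(rbind, L) is the safer choice of the two.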
The package data.table has the function rbindlist, which is a superfast implementation of do.call(rbind, list(...)).
It can take a list of lists, data.frames or data.tables as input.
library(data.table)
ll <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
, b = list(var.1 = 4, var.2 = 5, var.3 = 6)
, c = list(var.1 = 7, var.2 = 8, var.3 = 9)
, d = list(var.1 = 10, var.2 = 11, var.3 = 12)
)
DT <- rbindlist(ll)
This returns a data.table, which inherits from data.frame.
If you really want to convert back to a data.frame, use as.data.frame(DT).
setDF now allows for returning to data.frame by reference.
The tibble package has a function enframe() that solves this problem by coercing nested list objects to nested tibble ("tidy" data frame) objects. Here's a brief example from R for Data Science:
x <- list(
a = 1:5,
b = 3:4,
c = 5:6
)
df <- enframe(x)
df
#> # A tibble: 3 × 2
#> name value
#> <chr> <list>
#> 1 a <int [5]>
#> 2 b <int [2]>
#> 3 c <int [2]>
Since you have several nests in your list, l, you can use unlist(recursive = FALSE) to remove unnecessary nesting to get just a single hierarchical list and then pass it to enframe(). I use tidyr::unnest() to unnest the output into a single-level "tidy" data frame, which has your two columns (one for the group, name, and one for the observations within the groups, value). If you want a wide layout, you can add a column using add_column() that just repeats the order of the values 132 times. Then just spread() the values.
library(tidyverse)
l <- replicate(
132,
list(sample(letters, 20)),
simplify = FALSE
)
l_tib <- l %>%
unlist(recursive = FALSE) %>%
enframe() %>%
unnest()
l_tib
#> # A tibble: 2,640 x 2
#> name value
#> <int> <chr>
#> 1 1 d
#> 2 1 z
#> 3 1 l
#> 4 1 b
#> 5 1 i
#> 6 1 j
#> 7 1 g
#> 8 1 w
#> 9 1 r
#> 10 1 p
#> # ... with 2,630 more rows
l_tib_spread <- l_tib %>%
add_column(index = rep(1:20, 132)) %>%
spread(key = index, value = value)
l_tib_spread
#> # A tibble: 132 x 21
#> name `1` `2` `3` `4` `5` `6` `7` `8` `9` `10` `11`
#> * <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 d z l b i j g w r p y
#> 2 2 w s h r i k d u a f j
#> 3 3 r v q s m u j p f a i
#> 4 4 o y x n p i f m h l t
#> 5 5 p w v d k a l r j q n
#> 6 6 i k w o c n m b v e q
#> 7 7 c d m i u o e z v g p
#> 8 8 f s e o p n k x c z h
#> 9 9 d g o h x i c y t f j
#> 10 10 y r f k d o b u i x s
#> # ... with 122 more rows, and 9 more variables: `12` <chr>, `13` <chr>,
#> # `14` <chr>, `15` <chr>, `16` <chr>, `17` <chr>, `18` <chr>,
#> # `19` <chr>, `20` <chr>
Depending on the structure of your lists, there are some tidyverse options that work nicely with unequal length lists:
l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
, b = list(var.1 = 4, var.2 = 5)
, c = list(var.1 = 7, var.3 = 9)
, d = list(var.1 = 10, var.2 = 11, var.3 = NA))
df <- dplyr::bind_rows(l)
df <- purrr::map_df(l, dplyr::bind_rows)
df <- purrr::map_df(l, ~.x)
# all create the same data frame:
# A tibble: 4 x 3
var.1 var.2 var.3
<dbl> <dbl> <dbl>
1 1 2 3
2 4 5 NA
3 7 NA 9
4 10 11 NA
You can also mix vectors and data frames:
library(dplyr)
bind_rows(
list(a = 1, b = 2),
tibble(a = 3:4, b = 5:6),
c(a = 7)
)
# A tibble: 4 x 2
a b
<dbl> <dbl>
1 1 2
2 3 5
3 4 6
4 7 NA
X2
can't be converted from integer to character
This method uses a tidyverse package (purrr).
The list:
x <- as.list(mtcars)
Converting it into a data frame (a tibble, more specifically):
library(purrr)
map_df(x, ~.x)
EDIT: May 30, 2021
This can actually be achieved with the bind_rows() function in dplyr.
x <- as.list(mtcars)
dplyr::bind_rows(x)
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# ... with 22 more rows
Reshape2 yields the same output as the plyr example above:
library(reshape2)
l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
, b = list(var.1 = 4, var.2 = 5, var.3 = 6)
, c = list(var.1 = 7, var.2 = 8, var.3 = 9)
, d = list(var.1 = 10, var.2 = 11, var.3 = 12)
)
l <- melt(l)
dcast(l, L1 ~ L2)
yields:
L1 var.1 var.2 var.3
1 a 1 2 3
2 b 4 5 6
3 c 7 8 9
4 d 10 11 12
If you were almost out of pixels you could do this all in 1 line w/ recast().
Extending @Marek's answer: if you want to avoid strings being turned into factors and efficiency is not a concern, try
do.call(rbind, lapply(your_list, data.frame, stringsAsFactors=FALSE))
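A small worked sketch of this variant, assuming inner lists with the same named elements (the names name and value are made up for illustration). Each inner list becomes a one-row data frame, and the rows are then stacked:

```r
# Hypothetical sample data with mixed column types.
your_list <- list(
  list(name = "a", value = 1),
  list(name = "b", value = 2),
  list(name = "c", value = 3)
)

# lapply() builds one single-row data frame per inner list;
# do.call(rbind, ...) stacks them into one data frame.
df <- do.call(rbind, lapply(your_list, data.frame, stringsAsFactors = FALSE))
str(df)
```

Unlike the matrix approach, this keeps each column's own type, at the cost of building many small data frames.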
For the general case of deeply nested lists with 3 or more levels like the ones obtained from a nested JSON:
{
"2015": {
"spain": {"population": 43, "GNP": 9},
"sweden": {"population": 7, "GNP": 6}},
"2016": {
"spain": {"population": 45, "GNP": 10},
"sweden": {"population": 9, "GNP": 8}}
}
consider the approach of melt() to convert the nested list to a tall format first:
myjson <- jsonlite::fromJSON(file("test.json"))
tall <- reshape2::melt(myjson)[, c("L1", "L2", "L3", "value")]
L1 L2 L3 value
1 2015 spain population 43
2 2015 spain GNP 9
3 2015 sweden population 7
4 2015 sweden GNP 6
5 2016 spain population 45
6 2016 spain GNP 10
7 2016 sweden population 9
8 2016 sweden GNP 8
followed by dcast() to go wide again, into a tidy dataset where each variable forms a column and each observation forms a row:
wide <- reshape2::dcast(tall, L1+L2~L3)
# left side of the formula defines the rows/observations and the
# right side defines the variables/measurements
L1 L2 GNP population
1 2015 spain 9 43
2 2015 sweden 6 7
3 2016 spain 10 45
4 2016 sweden 8 9
More answers, along with timings in the answer to this question: What is the most efficient way to cast a list as a data frame?
The quickest way that doesn't produce a data frame with lists rather than vectors for columns appears to be (from Martin Morgan's answer):
l <- list(list(col1="a",col2=1),list(col1="b",col2=2))
f = function(x) function(i) unlist(lapply(x, `[[`, i), use.names=FALSE)
as.data.frame(Map(f(l), names(l[[1]])))
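The closure above can be hard to read at first. The following sketch spells out the same idea step by step, as a paraphrase for illustration rather than the original author's code: for each column name, pull that element out of every inner list and unlist the result into one vector.

```r
l <- list(list(col1 = "a", col2 = 1), list(col1 = "b", col2 = 2))

# Column names are taken from the first inner list.
cols <- names(l[[1]])

# Build one atomic vector per column by extracting that element from every row.
columns <- lapply(cols, function(nm) unlist(lapply(l, `[[`, nm), use.names = FALSE))
names(columns) <- cols

# A named list of equal-length vectors converts directly to a data frame.
df <- as.data.frame(columns, stringsAsFactors = FALSE)
df
```

Working column by column is what keeps each column a vector with its own type.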
Sometimes your data may be a list of lists of vectors of the same length.
lolov = list(list(c(1,2,3),c(4,5,6)), list(c(7,8,9),c(10,11,12),c(13,14,15)) )
(The inner vectors could also be lists, but I'm simplifying to make this easier to read).
Then you can make the following modification. Remember that you can unlist one level at a time:
lov = unlist(lolov, recursive = FALSE )
> lov
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5 6
[[3]]
[1] 7 8 9
[[4]]
[1] 10 11 12
[[5]]
[1] 13 14 15
Now use your favorite method mentioned in the other answers:
library(plyr)
> ldply(lov)
V1 V2 V3
1 1 2 3
2 4 5 6
3 7 8 9
4 10 11 12
5 13 14 15
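If you would rather not load plyr for this step, a base R sketch of the same flatten-then-bind idea (using do.call(rbind, ...) in place of ldply) could look like this:

```r
lolov <- list(
  list(c(1, 2, 3), c(4, 5, 6)),
  list(c(7, 8, 9), c(10, 11, 12), c(13, 14, 15))
)

# Remove exactly one level of nesting: a list of five numeric vectors.
lov <- unlist(lolov, recursive = FALSE)

# Stack the vectors as rows of a matrix, then convert to a data frame.
df <- as.data.frame(do.call(rbind, lov))
df
```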
The following simple command worked for me:
myDf <- as.data.frame(myList)
Reference (Quora answer)
> myList <- list(a = c(1, 2, 3), b = c(4, 5, 6))
> myList
$a
[1] 1 2 3
$b
[1] 4 5 6
> myDf <- as.data.frame(myList)
a b
1 1 4
2 2 5
3 3 6
> class(myDf)
[1] "data.frame"
But this will fail if it’s not obvious how to convert the list to a data frame:
> myList <- list(a = c(1, 2, 3), b = c(4, 5, 6, 7))
> myDf <- as.data.frame(myList)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 3, 4
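One way around the error above, sketched under the assumption that padding the shorter vectors with NA is acceptable for your data (this is not part of the original answer):

```r
myList <- list(a = c(1, 2, 3), b = c(4, 5, 6, 7))

# Length of the longest element.
n <- max(lengths(myList))

# Assigning a larger length to a vector pads it with NA.
padded <- lapply(myList, function(v) {
  length(v) <- n
  v
})

myDf <- as.data.frame(padded)
myDf
```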
Note: This answer addresses the title of the question and may skip some details of the question.
If your list has elements with the same dimensions, you could use the bind_rows function from the tidyverse.
# Load the tidyverse
library(tidyverse)
# make a list with elements having same dimensions
My_list <- list(a = c(1, 4, 5), b = c(9, 3, 8))
## Bind the rows
My_list %>% bind_rows()
The result is a data frame with three rows and two columns (a and b).
This is what finally worked for me:
do.call("rbind", lapply(S1, as.data.frame))
For a parallel (multicore, multisession, etc.) solution using the purrr family of functions, use:
library(furrr)
plan(multisession) # see below for how to find which plan() is the most efficient
myTibble <- future_map_dfc(l, ~.x)
where l is the list.
To benchmark the most efficient plan(), you can use:
library(tictoc)
plan(sequential) # reference time
# plan(multisession) # benchmark plan() goes here. See ?plan().
tic()
myTibble <- future_map_dfc(l, ~.x)
toc()
A short (but perhaps not the fastest) way to do this would be to use base R, since a data frame is just a list of equal-length vectors. Thus the conversion between your input list and a 20 x 132 data.frame would be:
df <- data.frame(l)
From there we can transpose it to a 132 x 20 matrix, and convert it back to a data frame:
new_df <- data.frame(t(df))
As a one-liner:
new_df <- data.frame(t(data.frame(l)))
The rownames will be pretty annoying to look at, but you could always rename those with
rownames(new_df) <- 1:nrow(new_df)
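A scaled-down sketch of the transpose approach, assuming each element of the list is an atomic vector (here 3 made-up vectors of length 4 instead of 132 of length 20):

```r
# Hypothetical small input: each element becomes one column of the first data frame.
l <- list(
  c("a", "b", "c", "d"),
  c("e", "f", "g", "h"),
  c("i", "j", "k", "l")
)

df <- data.frame(l)          # 4 rows, 3 columns
new_df <- data.frame(t(df))  # 3 rows, 4 columns

# Replace the auto-generated row names with plain row numbers.
rownames(new_df) <- NULL
new_df
```

Note that t() goes through a matrix, so all values end up with one common type.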
l <- replicate(10,list(sample(letters, 20)))
a <-lapply(l[1:10],data.frame)
do.call("cbind", a)
Every solution I have found seems to apply only when every object in a list has the same length. I needed to convert a list to a data.frame when the objects in the list were of unequal length. Below is the base R solution I came up with. It is no doubt very inefficient, but it does seem to work.
x1 <- c(2, 13)
x2 <- c(2, 4, 6, 9, 11, 13)
x3 <- c(1, 1, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 9, 9, 10, 11, 11, 12, 13, 13)
my.results <- list(x1, x2, x3)
# identify length of each list
my.lengths <- unlist(lapply(my.results, function (x) { length(unlist(x))}))
my.lengths
#[1] 2 6 20
# create a vector of values in all lists
my.values <- as.numeric(unlist(c(do.call(rbind, lapply(my.results, as.data.frame)))))
my.values
#[1] 2 13 2 4 6 9 11 13 1 1 2 3 3 4 5 5 6 7 7 8 9 9 10 11 11 12 13 13
my.matrix <- matrix(NA, nrow = max(my.lengths), ncol = length(my.lengths))
my.cumsum <- cumsum(my.lengths)
mm <- 1
for(i in 1:length(my.lengths)) {
my.matrix[1:my.lengths[i],i] <- my.values[mm:my.cumsum[i]]
mm <- my.cumsum[i]+1
}
my.df <- as.data.frame(my.matrix)
my.df
# V1 V2 V3
#1 2 2 1
#2 13 4 1
#3 NA 6 2
#4 NA 9 3
#5 NA 11 3
#6 NA 13 4
#7 NA NA 5
#8 NA NA 5
#9 NA NA 6
#10 NA NA 7
#11 NA NA 7
#12 NA NA 8
#13 NA NA 9
#14 NA NA 9
#15 NA NA 10
#16 NA NA 11
#17 NA NA 11
#18 NA NA 12
#19 NA NA 13
#20 NA NA 13
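For comparison, a shorter base R sketch that should give the same kind of NA-padded layout. It relies on the fact that indexing a vector past its end returns NA. This is an alternative offered for illustration, with a shortened third vector, not a rewrite of the code above:

```r
my.results <- list(
  c(2, 13),
  c(2, 4, 6, 9, 11, 13),
  c(1, 1, 2, 3, 3)
)

# Length of the longest vector.
n <- max(lengths(my.results))

# Indexing each vector with 1..n pads the shorter ones with NA;
# sapply() then binds the results as columns of a matrix.
padded <- sapply(my.results, function(x) x[seq_len(n)])

my.df <- as.data.frame(padded)
my.df
```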
How about using a map_ function together with a for loop? Here is my solution:
library(purrr) # provides map_dfr()
list_to_df <- function(list_to_convert) {
tmp_data_frame <- data.frame()
for (i in seq_along(list_to_convert)) {
tmp <- map_dfr(list_to_convert[[i]], data.frame)
tmp_data_frame <- rbind(tmp_data_frame, tmp)
}
return(tmp_data_frame)
}
where map_dfr converts each of the list elements into a data.frame and then row-binds them all together.
In your case, I guess it would be:
converted_list <- list_to_df(l)
map(list_to_convert, ~map_dfr(., data.frame))
but still it is wrong.
Try collapse::unlist2d (shorthand for 'unlist to data.frame'):
l <- replicate(
132,
list(sample(letters, 20)),
simplify = FALSE
)
library(collapse)
head(unlist2d(l))
.id.1 .id.2 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1 1 1 e x b d s p a c k z q m u l h n r t o y
2 2 1 r t i k m b h n s e p f o c x l g v a j
3 3 1 t r v z a u c o w f m b d g p q y e n k
4 4 1 x i e p f d q k h b j s z a t v y l m n
5 5 1 d z k y a p b h c v f m u l n q e i w j
6 6 1 l f s u o v p z q e r c h n a t m k y x
head(unlist2d(l, idcols = FALSE))
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1 e x b d s p a c k z q m u l h n r t o y
2 r t i k m b h n s e p f o c x l g v a j
3 t r v z a u c o w f m b d g p q y e n k
4 x i e p f d q k h b j s z a t v y l m n
5 d z k y a p b h c v f m u l n q e i w j
6 l f s u o v p z q e r c h n a t m k y x
Or you could use the tibble package (from tidyverse):
#create examplelist
l <- replicate(
132,
as.list(sample(letters, 20)),
simplify = FALSE
)
#package tidyverse
library(tidyverse)
#make a tibble (as_data_frame() is deprecated in favor of as_tibble())
df <- as_tibble(l, .name_repair = "unique")
I want to suggest this solution as well. Although it looks similar to other solutions, it uses rbind.fill from the plyr package. This is advantageous in situations where a list has missing columns or NA values.
l <- replicate(10,as.list(sample(letters,10)),simplify = FALSE)
res<-data.frame()
for (i in 1:length(l))
res<-plyr::rbind.fill(res,data.frame(t(unlist(l[i]))))
res
From a different perspective:
install.packages("smotefamily")
library(smotefamily)
library(dplyr)
data_example = sample_generator(5000,ratio = 0.80)
genData = BLSMOTE(data_example[,-3],data_example[,3])
# There are many lists in genData. To convert one of them to a data frame:
sentetic <- as.data.frame.array(genData$syn_data)
# as.data.frame.array seems to work.
rbind(your_list) returns a 1x32 list matrix?
do.call passes the elements of your_list as arguments to rbind. It's equivalent to rbind(your_list[[1]], your_list[[2]], your_list[[3]], ....., your_list[[length of your_list]]).
your_list contain equally sized vectors. NULL has length 0 so it should fail.
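The difference between rbind(your_list) and do.call(rbind, your_list) described in these comments can be checked on a small list. A minimal sketch with a made-up three-element your_list:

```r
your_list <- list(c(1, 2), c(3, 4), c(5, 6))

# Passing the list itself: rbind sees ONE argument and returns a 1 x 3 list matrix.
wrapped <- rbind(your_list)

# do.call() splices the elements in as separate arguments: a 3 x 2 numeric matrix.
spliced <- do.call(rbind, your_list)

# The same call written out by hand.
explicit <- rbind(your_list[[1]], your_list[[2]], your_list[[3]])

dim(wrapped)
dim(spliced)
identical(spliced, explicit)
```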