With this data frame ("df"):
year pollution
1 1999 346.82000
2 2002 134.30882
3 2005 130.43038
4 2008 88.27546
I try to create a line chart like this:
plot5 <- ggplot(df, aes(year, pollution)) +
  geom_point() +
  geom_line() +
  labs(x = "Year", y = "Particulate matter emissions (tons)",
       title = "Motor vehicle emissions in Baltimore")
The error I get is:
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
The chart appears as a scatter plot even though I want a line chart. I tried to replace geom_line() with geom_line(aes(group = year)), but that didn't work.
In an answer I was told to convert year to a factor variable. I did, and the problem persists. This is the output of str(df) and dput(df):
'data.frame': 4 obs. of 2 variables:
$ year : num 1 2 3 4
$ pollution: num [1:4(1d)] 346.8 134.3 130.4 88.3
..- attr(*, "dimnames")=List of 1
.. ..$ : chr "1999" "2002" "2005" "2008"
structure(list(year = c(1, 2, 3, 4), pollution = structure(c(346.82,
134.308821199349, 130.430379885892, 88.275457392443), .Dim = 4L, .Dimnames = list(
c("1999", "2002", "2005", "2008")))), .Names = c("year",
"pollution"), row.names = c(NA, -4L), class = "data.frame")
df is not what you think it is. Please state your question in reproducible form, i.e. show the output of dput(df).
You only have to add group = 1 into the ggplot or geom_line aes().
For line graphs, the data points must be grouped so that ggplot2 knows which points to connect. In this case it is simple: all points should be connected, so group = 1. When more variables are used and multiple lines are drawn, the grouping for lines is usually done by variable.
Reference: Cookbook for R, chapter "Graphs", page "Bar and line graphs (ggplot2)", section "Line graphs".
Try this:
plot5 <- ggplot(df, aes(year, pollution, group = 1)) +
  geom_point() +
  geom_line() +
  labs(x = "Year", y = "Particulate matter emissions (tons)",
       title = "Motor vehicle emissions in Baltimore")
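For the multi-line case mentioned above, here is a minimal sketch. It assumes a made-up data frame `emissions` with a hypothetical `city` column and invented values; none of this is in the question. When x is a factor, mapping only colour is not enough, so `group` is mapped to the variable that defines each line.

```r
library(ggplot2)

# Hypothetical data: two cities, years stored as a factor (discrete x axis)
emissions <- data.frame(
  year = factor(rep(c(1999, 2002, 2005, 2008), times = 2)),
  city = rep(c("A", "B"), each = 4),
  pollution = c(340, 135, 130, 90, 300, 250, 200, 150)
)

# group = city draws one line per city; colour only distinguishes them visually
p <- ggplot(emissions, aes(year, pollution, group = city, colour = city)) +
  geom_point() +
  geom_line()
print(p)
```

With a single series, group = 1 plays the same role: it puts every point into one group.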
You get this error because one of your variables is actually a factor variable. Execute str(df) to check this. Then do this double conversion to keep the year numbers instead of turning them into the level codes 1, 2, 3, 4:
df$year <- as.numeric(as.character(df$year))
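To see why the intermediate as.character() matters, here is a small standalone sketch. The vector `year` below is rebuilt by hand from the years in the question, and `codes` and `values` are just illustrative names.

```r
# A factor stores integer level codes plus their labels
year <- factor(c("1999", "2002", "2005", "2008"))

codes  <- as.numeric(year)                 # 1 2 3 4 (the level codes)
values <- as.numeric(as.character(year))   # 1999 2002 2005 2008 (the labels)

print(codes)
print(values)
```

This would also explain why str(df) in the question shows year as num 1 2 3 4, assuming the factor was converted with a plain as.numeric().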
EDIT: it appears that your data.frame has a variable of class "array", which might cause the problem. Try this:
df <- data.frame(apply(df, 2, unclass))
and plot again?
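A quick way to check for such a column, sketched with the dput() output from the question. Whether the array column is what actually triggers the message may depend on the ggplot2 version, so treat this as a diagnostic, not a guaranteed fix.

```r
# Rebuild the data frame exactly as posted in the question
df <- structure(list(year = c(1, 2, 3, 4), pollution = structure(c(346.82,
  134.308821199349, 130.430379885892, 88.275457392443), .Dim = 4L,
  .Dimnames = list(c("1999", "2002", "2005", "2008")))),
  .Names = c("year", "pollution"), row.names = c(NA, -4L),
  class = "data.frame")

# Flag columns that still carry a dim attribute (1-d arrays, matrices)
array_cols <- vapply(df, is.array, logical(1))
print(array_cols)

# as.numeric() drops dim and dimnames, leaving a plain numeric vector
df$pollution <- as.numeric(df$pollution)
still_array <- is.array(df$pollution)
print(still_array)
```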
I had a similar problem with this data frame:
group time weight.loss
1 Control wl1 4.500000
2 Diet wl1 5.333333
3 DietEx wl1 6.200000
4 Control wl2 3.333333
5 Diet wl2 3.916667
6 DietEx wl2 6.100000
7 Control wl3 2.083333
8 Diet wl3 2.250000
9 DietEx wl3 2.200000
I think the variable for the x axis should be numeric, so that geom_line knows how to connect the points to draw the line. After I changed the 2nd column to numeric:
group time weight.loss
1 Control 1 4.500000
2 Diet 1 5.333333
3 DietEx 1 6.200000
4 Control 2 3.333333
5 Diet 2 3.916667
6 DietEx 2 6.100000
7 Control 3 2.083333
8 Diet 3 2.250000
9 DietEx 3 2.200000
then it works.
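The conversion described above might look like this. It is only a sketch: the plotting call is not shown in this answer, so the names `wl` and `time_num` and the aesthetics used are my assumptions. The second plot shows the alternative of keeping time discrete and stating the grouping explicitly.

```r
library(ggplot2)

# Data copied from the table above
wl <- data.frame(
  group = rep(c("Control", "Diet", "DietEx"), times = 3),
  time = rep(c("wl1", "wl2", "wl3"), each = 3),
  weight.loss = c(4.500000, 5.333333, 6.200000,
                  3.333333, 3.916667, 6.100000,
                  2.083333, 2.250000, 2.200000)
)

# Option 1: strip the "wl" prefix so that time becomes numeric
wl$time_num <- as.numeric(sub("^wl", "", wl$time))
p1 <- ggplot(wl, aes(time_num, weight.loss, colour = group)) +
  geom_point() +
  geom_line()

# Option 2: keep time discrete and tell geom_line which points belong together
p2 <- ggplot(wl, aes(time, weight.loss, colour = group, group = group)) +
  geom_point() +
  geom_line()
```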
Start up R in a fresh session and paste this in:
library(ggplot2)
df <- structure(list(year = c(1, 2, 3, 4), pollution = structure(c(346.82,
134.308821199349, 130.430379885892, 88.275457392443), .Dim = 4L, .Dimnames = list(
c("1999", "2002", "2005", "2008")))), .Names = c("year",
"pollution"), row.names = c(NA, -4L), class = "data.frame")
df[] <- lapply(df, as.numeric) # make all columns numeric
ggplot(df, aes(year, pollution)) +
  geom_point() +
  geom_line() +
  labs(x = "Year",
       y = "Particulate matter emissions (tons)",
       title = "Motor vehicle emissions in Baltimore")
pollution is a 1d array rather than a plain vector. Look at str(df).
I got a similar message. It was because I had specified the x-axis values as percentage labels (for example: 10%A, 20%B, ...). An alternative approach is to convert these labels to plain numbers (multiplying the percentages out) and write them in the simplest form.
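One way to do that conversion, as a rough sketch. The labels below are hypothetical stand-ins for the kind described, and the regular expression assumes each label starts with the number followed by a percent sign.

```r
# Hypothetical labels: a percentage followed by some text
x_labels <- c("10%A", "20%B", "30%C")

# Drop everything from the percent sign onward, then convert to numeric
x_num <- as.numeric(sub("%.*$", "", x_labels))
print(x_num)
```

With x_num on the x axis the scale is continuous, so geom_line() can connect the points without an explicit group.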
I found this can also occur if most of the data plotted is outside the axis limits. In that case, adjust the axis scales accordingly.
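A sketch of that situation, using rounded values from the question and limits I picked for illustration. Limits set through ylim() or a scale turn out-of-range values into NA before drawing, so a group can be left with a single point. coord_cartesian() only zooms the view and keeps all the data.

```r
library(ggplot2)

# Rounded values from the question, for illustration only
df <- data.frame(year = c(1999, 2002, 2005, 2008),
                 pollution = c(346.8, 134.3, 130.4, 88.3))

base <- ggplot(df, aes(year, pollution)) +
  geom_point() +
  geom_line()

# Scale limits: three of the four rows fall outside 0-100 and are dropped
p_dropped <- base + ylim(0, 100)

# Coordinate limits: same visible range, but no rows are discarded
p_zoomed <- base + coord_cartesian(ylim = c(0, 100))
```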
group argument. Grouping only by, e.g., color would not be sufficient.
I just had this trouble and hope this helps someone running into the same
df %>% arrange(pollution) %>% ggplot()