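Translated and adapted from "Top 50 ggplot2 Visualizations - The Master List", with some parts cut or changed.

This is the last part, and I hope to finish it in one go 🙈. It took this long mainly because some of the later charts are not very commonly used, so I had little motivation to study them closely.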

4. Distribution

These charts are for when there is a lot of data and we only want to see how it is distributed.

Histogram

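By default, when only one variable is mapped, geom_bar() counts the values in that column and draws bars from the counts. If the data already hold the bar heights (values rather than raw observations to be counted), set stat = "identity"; in that case both x and y must be supplied.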

Histogram on a continuous variable

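Both geom_bar() and geom_histogram() can draw bars for a continuous variable. geom_histogram() lets you control the number of bars with bins, or the width of the interval each bar covers with binwidth. Because its parameters are more flexible, geom_histogram() is the recommended choice for histograms.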

library(ggplot2)
theme_set(theme_classic())

# Histogram on a Continuous (Numeric) Variable
g <- ggplot(mpg, aes(displ)) + scale_fill_brewer(palette = "Spectral")
g + geom_histogram(aes(fill=class), 
                   binwidth = .1, 
                   col="black", 
                   size=.1) +  # change binwidth
    labs(title="Histogram with Auto Binning", 
         subtitle="Engine Displacement across Vehicle Classes")

g + geom_histogram(aes(fill=class), 
                   bins=5, 
                   col="black", 
                   size=.1) +   # change number of bins
    labs(title="Histogram with Fixed Bins", 
         subtitle="Engine Displacement across Vehicle Classes") 

Histogram on a categorical variable

A bar chart of a categorical variable shows the count of each category. The width parameter controls how wide the bars are.

library(ggplot2)
theme_set(theme_classic())
# Histogram on a Categorical variable
g <- ggplot(mpg, aes(manufacturer))
g + geom_bar(aes(fill=class), width = 0.5) + 
    theme(axis.text.x = element_text(angle=65, vjust=0.6)) + 
    labs(title="Histogram on Categorical Variable", 
         subtitle="Manufacturer across Vehicle Classes") 

Density plot

Density plots are generally used to look at the distribution of a continuous variable.

library(ggplot2)
theme_set(theme_classic())
# Plot
g <- ggplot(mpg, aes(cty))
g + geom_density(aes(fill=factor(cyl)), alpha=0.8) + 
    labs(title="Density plot", 
         subtitle="City Mileage Grouped by Number of cylinders",
         caption="Source: mpg",
         x="City Mileage",
         fill="# Cylinders")
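
As a rough illustration of what a density curve represents, the sketch below computes a kernel density estimate for city mileage with base R's density(). To my knowledge geom_density() uses the same default bandwidth rule ("nrd0"), but treat that as an assumption rather than a guarantee. The area under the curve should come out close to 1.

```r
# Kernel density estimate of city mileage with base R.
# Assumption: ggplot2 is installed (used here only for the mpg data set).
library(ggplot2)

d <- density(mpg$cty)  # Gaussian kernel, bandwidth rule "nrd0" by default

# The curve is evaluated on an evenly spaced grid, so a Riemann sum
# approximates the area under it.
step <- d$x[2] - d$x[1]
area <- sum(d$y) * step

d$bw   # the bandwidth that was chosen
area   # should be close to 1
```

A smaller bandwidth gives a wigglier curve and a larger one a smoother curve, which is why the same data can look quite different in two density plots.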

Box Plot

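A box plot is another good way to show how data are distributed. It shows the median, the quartiles, the whiskers, and the outliers at once: the line inside the box is the median, and the top and bottom edges of the box are the 75% and 25% quantiles. The whiskers extend from the box to the most extreme data point that lies within 1.5 * IQR of the box edge (IQR, the interquartile range, is the distance between the 25% and 75% quantiles). Data beyond the whiskers are usually drawn as points and treated as outliers.

varwidth = TRUE makes the width of each box reflect the number of data points it represents (the width is proportional to the square root of the group size).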
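
To make the box plot definitions concrete, here is a small sketch that computes the same pieces by hand in base R. The vector x is made up for illustration, and the variable names are my own.

```r
# Hypothetical data with one obvious outlier.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q <- unname(quantile(x, c(0.25, 0.5, 0.75)))
q1 <- q[1]    # bottom edge of the box
med <- q[2]   # line inside the box
q3 <- q[3]    # top edge of the box
iqr <- q3 - q1

# Fences: the furthest a whisker is allowed to reach.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers stop at the most extreme data point inside the fences.
whisker_low  <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])

# Everything outside the fences is drawn as an individual point.
outliers <- x[x < lower_fence | x > upper_fence]

c(q1 = q1, median = med, q3 = q3, low = whisker_low, high = whisker_high)
outliers
```

Here the upper fence is 14.5, so the upper whisker ends at 9 (the largest value below the fence) and 30 is plotted as an outlier.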

library(ggplot2)
theme_set(theme_classic())
# Plot
g <- ggplot(mpg, aes(class, cty))
g + geom_boxplot(varwidth=T, fill="plum") + 
    labs(title="Box plot", 
         subtitle="City Mileage grouped by Class of vehicle",
         caption="Source: mpg",
         x="Class of Vehicle",
         y="City Mileage")

library(ggthemes)
g <- ggplot(mpg, aes(class, cty))
g + geom_boxplot(aes(fill=factor(cyl))) + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6)) + 
  labs(title="Box plot", 
       subtitle="City Mileage grouped by Class of vehicle",
       caption="Source: mpg",
       x="Class of Vehicle",
       y="City Mileage")

Dot + Box Plot

Data points can also be overlaid on top of the box plot.

library(ggplot2)
theme_set(theme_bw())

# plot
g <- ggplot(mpg, aes(manufacturer, cty))
g + geom_boxplot() + 
  geom_dotplot(binaxis='y', 
               stackdir='center', 
               dotsize = .5, 
               fill="red") +
  theme(axis.text.x = element_text(angle=65, vjust=0.6)) + 
  labs(title="Box plot + Dot plot", 
       subtitle="City Mileage vs Manufacturer: Each dot represents 1 row in source data",
       caption="Source: mpg",
       x="Manufacturer",
       y="City Mileage")

Tufte Boxplot

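The Tufte box plot is based on Edward Tufte's visualization principles and is provided by the ggthemes package. It is a minimalist and arguably more elegant version of the box plot.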

library(ggthemes)
library(ggplot2)
theme_set(theme_tufte())  # from ggthemes

# plot
g <- ggplot(mpg, aes(manufacturer, cty))
g + geom_tufteboxplot() + 
      theme(axis.text.x = element_text(angle=65, vjust=0.6)) + 
      labs(title="Tufte Styled Boxplot", 
           subtitle="City Mileage grouped by Manufacturer",
           caption="Source: mpg",
           x="Manufacturer",
           y="City Mileage")

Violin Plot

A violin plot is similar to a box plot, but it also shows the density of the data, which a box plot does not.

library(ggplot2)
theme_set(theme_bw())

# plot
g <- ggplot(mpg, aes(class, cty))
g + geom_violin() + 
  labs(title="Violin plot", 
       subtitle="City Mileage vs Class of vehicle",
       caption="Source: mpg",
       x="Class of Vehicle",
       y="City Mileage")

Population Pyramid

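A population pyramid shows the population, or the percentage of the population, in each category. The plot below shows the number of users at each stage of an email marketing campaign: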

library(ggplot2)
library(ggthemes)
options(scipen = 999)  # turns off scientific notation like 1e+40

# Read data
email_campaign_funnel <-
    read.csv(
        "https://raw.githubusercontent.com/selva86/datasets/master/email_campaign_funnel.csv"
    )

# X Axis Breaks and Labels
brks <- seq(-15000000, 15000000, 5000000)
lbls = paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")

# Plot
ggplot(email_campaign_funnel, aes(x = Stage, y = Users, fill = Gender)) +   # Fill column
    geom_bar(stat = "identity", width = .6) +   # draw the bars
    scale_y_continuous(breaks = brks,   # Breaks
                       labels = lbls) + # Labels
    coord_flip() +  # Flip axes
    labs(title = "Email Campaign Funnel") +
    theme_tufte() +  # Tufte theme from ggthemes
    theme(plot.title = element_text(hjust = .5),  # Centre plot title
          axis.ticks = element_blank()) +   # Hide axis ticks
    scale_fill_brewer(palette = "Dark2")  # Color palette

The trick behind this plot is to draw the bars for the two groups in one chart, with the values of one group turned negative.
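
The sign-flipping trick can be sketched on a tiny data frame. The numbers and column names below are invented for illustration and are not taken from the email campaign data, where the values for one gender already arrive negative.

```r
library(ggplot2)

# Hypothetical funnel counts, invented purely for illustration.
funnel <- data.frame(
    stage  = rep(c("Sent", "Opened", "Clicked"), times = 2),
    gender = rep(c("Male", "Female"), each = 3),
    users  = c(900, 400, 100, 850, 420, 120)
)

# Flip the sign for one group so its bars grow to the left of zero.
funnel$signed_users <- ifelse(funnel$gender == "Male",
                              -funnel$users,
                              funnel$users)

p <- ggplot(funnel, aes(x = stage, y = signed_users, fill = gender)) +
    geom_col(width = 0.6) +
    # Label the axis with absolute values so the minus sign stays hidden.
    scale_y_continuous(labels = function(b) as.character(abs(b))) +
    coord_flip()
p
```

Using a labelling function instead of hand-written breaks and labels keeps the axis correct if the data range changes, at the cost of less control over the exact tick text.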

5. Composition

Waffle Chart

A waffle chart shows how a total is made up of different categories. ggplot2 has no built-in geom for it, but geom_tile() can be used to build one:

var <- mpg$class  # the categorical data
## Prep data (nothing to change here)
nrows <- 10
df <- expand.grid(y = 1:nrows, x = 1:nrows)
categ_table <- round(table(var) * ((nrows * nrows) / (length(var))))
categ_table

df$category <- factor(rep(names(categ_table), categ_table))
# NOTE: if sum(categ_table) is not 100 (i.e. nrows^2), it will need adjustment to make the sum to 100.

## Plot
ggplot(df, aes(x = x, y = y, fill = category)) +
    geom_tile(color = "black", size = 0.5) +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0), trans = 'reverse') +
    scale_fill_brewer(palette = "Set3") +
    labs(title = "Waffle Chart",
         subtitle = "'Class' of vehicles",
         caption = "Source: mpg") +
    theme(
        panel.border = element_rect(size = 2),
        plot.title = element_text(size = rel(1.2)),
        axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        legend.title = element_blank(),
        legend.position = "right"
    ) + 
    theme_dark()
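
The NOTE in the code above warns that the rounded counts may not add up to nrows^2. For mpg$class they happen to sum to 100, but for other data one possible adjustment is the largest remainder method: round everything down, then hand the leftover tiles to the categories with the biggest fractional parts. The helper below, largest_remainder(), is my own sketch and not part of the original tutorial.

```r
# Scale counts to a fixed total of whole tiles (largest remainder method).
largest_remainder <- function(counts, total = 100) {
    exact <- counts / sum(counts) * total   # ideal, fractional tile counts
    tiles <- floor(exact)                   # round everything down first
    leftover <- total - sum(tiles)          # tiles still to hand out
    if (leftover > 0) {
        # Give one extra tile to the categories with the largest fractions.
        idx <- order(exact - tiles, decreasing = TRUE)[seq_len(leftover)]
        tiles[idx] <- tiles[idx] + 1
    }
    tiles
}

# Three equal categories: plain round() would give 33 + 33 + 33 = 99.
largest_remainder(c(a = 1, b = 1, c = 1))
```

The result can be used in place of categ_table when building df$category, since it keeps the names of the input.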

Pie Chart

Everyone knows the pie chart. Drawing one in ggplot2 is slightly tricky, though, and relies on coord_polar():

library(ggplot2)
theme_set(theme_classic())

# Source: Frequency table
df <- as.data.frame(table(mpg$class))
colnames(df) <- c("class", "freq")
pie <- ggplot(df, aes(x = "", y = freq, fill = factor(class))) +
    geom_bar(width = 1, stat = "identity") +
    theme(axis.line = element_blank(),
          plot.title = element_text(hjust = 0.5)) +
    labs(
        fill = "class",
        x = NULL,
        y = NULL,
        title = "Pie Chart of class",
        caption = "Source: mpg"
    )

pie + coord_polar(theta = "y", start = 0)

That is how to draw it when the data are a frequency table. When the data are the raw categorical variable, it is drawn like this:

# Source: Categorical variable.
# mpg$class
pie <- ggplot(mpg, aes(x = "", fill = factor(class))) +
    geom_bar(width = 1) +
    theme(axis.line = element_blank(),
          plot.title = element_text(hjust = 0.5)) +
    labs(
        fill = "class",
        x = NULL,
        y = NULL,
        title = "Pie Chart of class",
        caption = "Source: mpg"
    )

pie + coord_polar(theta = "y", start = 0)

Similar to the pie chart is the donut plot. The example below comes from "Most basic doughnut chart with ggplot2" (an interesting post that is worth reading):

# load library
library(ggplot2)
# Create test data.
data <- data.frame(category = c("A", "B", "C"),
                   count = c(10, 60, 30))
# Compute percentages
data$fraction <- data$count / sum(data$count)
# Compute the cumulative percentages (top of each rectangle)
data$ymax <- cumsum(data$fraction)
# Compute the bottom of each rectangle
data$ymin <- c(0, head(data$ymax, n = -1))
# Compute label position
data$labelPosition <- (data$ymax + data$ymin) / 2
# Compute a good label
data$label <- paste0(data$category, "\n value: ", data$count)

# Make the plot
ggplot(data, aes(
    ymax = ymax,
    ymin = ymin,
    xmax = 4,
    xmin = 3,
    fill = category
)) +
    geom_rect() +
    geom_label(x = 3.5,
               aes(y = labelPosition, label = label),
               size = 5) +
    scale_fill_brewer(palette = 4) +
    coord_polar(theta = "y") +
    xlim(c(2, 4)) +
    theme_void() +
    theme(legend.position = "none")

Treemap

Omitted.

Bar Chart

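By default, geom_bar() has stat set to count. As a result, supplying only a continuous variable as X, with no Y, produces a histogram-like chart of counts. To draw a bar chart of given values rather than counts, two things are needed:

Set stat = "identity".
Supply both X and Y inside aes(), where X is a factor or character variable and Y is numeric.

A bar chart can be drawn either from a single categorical column or from a precomputed frequency table. The width parameter adjusts the width of the bars. If the data are already a frequency table, stat = "identity" has to be set in geom_bar().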

library("ggplot2")
# prep frequency table
freqtable <- table(mpg$manufacturer)
df <- as.data.frame.table(freqtable)
head(df)
#        Var1 Freq
# 1      audi   18
# 2 chevrolet   19
# 3     dodge   37
# 4      ford   25
# 5     honda    9
# 6   hyundai   14

theme_set(theme_classic())
# Plot
g <- ggplot(df, aes(Var1, Freq))
g + geom_bar(stat = "identity", width = 0.5, fill = "tomato2") +
    labs(title = "Bar Chart",
         subtitle = "Manufacturer of vehicles",
         caption = "Source: Frequency of Manufacturers from 'mpg' dataset") +
    theme(axis.text.x = element_text(angle = 65, vjust = 0.6))

ggplot2 can also compute the frequencies itself, without a precomputed frequency table. In that case only the X variable is needed, and stat = "identity" must not be set:

# From a categorical column variable
g <- ggplot(mpg, aes(manufacturer))
g + geom_bar(aes(fill = class), width = 0.5) +
    theme(axis.text.x = element_text(angle = 65, vjust = 0.6)) +
    labs(title = "Categorywise Bar Chart",
         subtitle = "Manufacturer of vehicles",
         caption = "Source: Manufacturers from 'mpg' dataset")
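
The two routes above should give the same bar heights. The sketch below checks this on a small made-up column, and also uses geom_col(), which is ggplot2's shorthand for geom_bar(stat = "identity"). The data and object names are hypothetical.

```r
library(ggplot2)

# Hypothetical raw categorical data.
raw <- data.frame(brand = c("a", "b", "a", "c", "a", "b"))

# Route 1: let ggplot2 count the rows per category.
p_raw <- ggplot(raw, aes(brand)) + geom_bar()

# Route 2: count first, then draw the precomputed heights.
tab <- as.data.frame(table(brand = raw$brand))   # columns: brand, Freq
p_tab <- ggplot(tab, aes(brand, Freq)) + geom_col()

# layer_data() returns the values ggplot2 actually draws.
heights_raw <- layer_data(p_raw)$y
heights_tab <- layer_data(p_tab)$y
heights_raw
heights_tab
```

Counting up front is mostly useful when the table is needed elsewhere, for example to sort the bars by frequency before plotting.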

6. Change

Change here always refers to time series data, that is, change over time.

Time Series Plot From a Time Series Object (`ts`)

The ggfortify package lets autoplot() recognize time series objects and plot them directly:

## From Timeseries object (ts)
library("ggplot2")
library("ggfortify")
theme_set(theme_classic())

# Plot
autoplot(AirPassengers) +
    labs(title = "AirPassengers") +
    theme(plot.title = element_text(hjust = 0.5))

Time Series Plot From a Data Frame

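geom_line() can draw a time series line chart directly from a data frame. The X axis is then generated automatically from the data. In the example below, a tick is placed automatically every 10 years.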

library("ggplot2")
theme_set(theme_classic())

data("economics")
head(economics)
# # A tibble: 6 x 6
#   date         pce    pop psavert uempmed unemploy
#   <date>     <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
# 1 1967-07-01  507. 198712    12.6     4.5     2944
# 2 1967-08-01  510. 198911    12.6     4.7     2945
# 3 1967-09-01  516. 199113    11.9     4.6     2958
# 4 1967-10-01  512. 199311    12.9     4.9     3143
# 5 1967-11-01  517. 199498    12.8     4.7     3066
# 6 1967-12-01  525. 199657    11.8     4.8     3018

economics$returns_perc <-
    c(0,
      diff(economics$psavert) / economics$psavert[-length(economics$psavert)])
# Allow Default X Axis Labels
ggplot(economics, aes(x = date)) +
    geom_line(aes(y = returns_perc)) +
    labs(
        title = "Time Series Chart",
        subtitle = "Returns Percentage from 'Economics' Dataset",
        caption = "Source: Economics",
        y = "Returns %")
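
The returns_perc column above is the change from one month to the next, relative to the earlier month, with 0 filled in for the first row. A tiny worked example makes the indexing easier to follow; the helper name pct_change() and the input vector are mine.

```r
# Relative change between consecutive values, padded with 0 at the start.
pct_change <- function(x) {
    previous <- x[-length(x)]          # every value except the last
    c(0, diff(x) / previous)           # diff(x)[i] is x[i + 1] - x[i]
}

# 10 -> 12 is +20% of 10, and 12 -> 9 is -25% of 12.
pct_change(c(10, 12, 9))
```

Note that the result is a fraction (0.2), not a percentage (20), so the "Returns %" axis label in the plot is slightly loose.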

Time Series Plot For a Monthly Time Series

If the automatically generated date ticks are not what you want, use scale_x_date() with explicit breaks and labels to define a new X axis:

library("ggplot2")
library("lubridate")
theme_set(theme_bw())

economics_m <- economics[1:24,]

# labels and breaks for X axis text
lbls <-
    paste0(month.abb[month(economics_m$date)], 
           " ",
           lubridate::year(economics_m$date))
brks <- economics_m$date

# plot
ggplot(economics_m, aes(x = date)) +
    geom_line(aes(y = returns_perc)) +
    labs(
        title = "Monthly Time Series",
        subtitle = "Returns Percentage from Economics Dataset",
        caption = "Source: Economics",
        y = "Returns %"
    ) +  # title and caption
    scale_x_date(labels = lbls,
                 breaks = brks) +  # change to monthly ticks and labels
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
          # rotate x axis text
          panel.grid.minor = element_blank())  # turn off minor grid
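
As an alternative to building the breaks and labels by hand, scale_x_date() also accepts date_breaks and date_labels. The sketch below is a variation on the plot above, not the tutorial's own code; it plots psavert directly so that it does not depend on the returns_perc column.

```r
library(ggplot2)

# First two years of the economics data set that ships with ggplot2.
econ_2y <- economics[1:24, ]

p <- ggplot(econ_2y, aes(x = date, y = psavert)) +
    geom_line() +
    # One tick per month, labelled like "Jul 1967" (strftime format codes).
    scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
          panel.grid.minor = element_blank())
p
```

Changing date_breaks to "1 year" and date_labels to "%Y" gives yearly ticks with the same code. The month abbreviations follow the system locale.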

Time Series Plot For a Yearly Time Series

If the axis can be customized to show months, it can naturally be customized to show years as well. The approach is the same as above:

library("ggplot2")
library("lubridate")
theme_set(theme_bw())

economics_y <- economics[1:90,]

# labels and breaks for X axis text
brks <- economics_y$date[seq(1, length(economics_y$date), 12)]
lbls <- lubridate::year(brks)

# plot
ggplot(economics_y, aes(x = date)) +
    geom_line(aes(y = returns_perc)) +
    labs(
        title = "Yearly Time Series",
        subtitle = "Returns Percentage from Economics Dataset",
        caption = "Source: Economics",
        y = "Returns %"
    ) +  # title and caption
    scale_x_date(labels = lbls,
                 breaks = brks) +  # change to yearly ticks and labels
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
          # rotate x axis text
          panel.grid.minor = element_blank())  # turn off minor grid

Time Series Plot From Long Data Format

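Long format means the main data sit in just two columns: one holds the variable name and the other the value. The example below uses economics_long, the long-format version of the economics data above. Since a date column is also needed for the X axis, the plot uses three columns.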

library("ggplot2")
library("lubridate")
theme_set(theme_bw())

data(economics_long, package = "ggplot2")
head(economics_long)
# # A tibble: 6 x 4
#   date       variable value  value01
#   <date>     <chr>    <dbl>    <dbl>
# 1 1967-07-01 pce       507. 0       
# 2 1967-08-01 pce       510. 0.000265
# 3 1967-09-01 pce       516. 0.000762
# 4 1967-10-01 pce       512. 0.000471
# 5 1967-11-01 pce       517. 0.000916
# 6 1967-12-01 pce       525. 0.00157

df <-
    economics_long[economics_long$variable %in% c("psavert", "uempmed"),]
df <- df[lubridate::year(df$date) %in% c(1967:1981),]

# labels and breaks for X axis text
brks <- df$date[seq(1, length(df$date), 12)]
lbls <- lubridate::year(brks)

# plot
ggplot(df, aes(x = date)) +
    geom_line(aes(y = value, col = variable)) +
    labs(
        title = "Time Series of Returns Percentage",
        subtitle = "Drawn from Long Data format",
        caption = "Source: Economics",
        y = "Returns %",
        color = NULL
    ) +  # title and caption
    # change to yearly ticks and labels
    scale_x_date(labels = lbls, breaks = brks) +
    scale_color_manual(
        labels = c("psavert", "uempmed"),
        values = c("psavert" = "#00ba38", "uempmed" = "#f8766d")
    ) +  # line color
    theme(
        axis.text.x = element_text(
            angle = 90,
            vjust = 0.5,
            size = 8
        ),
        # rotate x axis text
        panel.grid.minor = element_blank()
    )  # turn off minor grid

Time Series Plot From Wide Data Format

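As mentioned earlier, whenever a column of data is mapped through a geom to a visual property (point shape/size/color, line width/type/color, and so on), ggplot2 generates a matching legend automatically. The original tutorial says that when a multi-series time series plot is built by calling geom_line() once per line, no legend is generated, even though one is usually needed to explain the lines. In that situation the scale_aesthetic_manual() family of functions can add the legend (for example scale_color_manual() if only the line color changes), and its name and values arguments set the legend title and the colors.

The plot below looks identical to the long-format one above, but the code shows that the approach is quite different. The long-format example also used scale_color_manual(), but only to change the line colors; without it the plot would still have a legend, just with ggplot2's default colors. Here, according to the tutorial, no legend would be generated at all without scale_color_manual(). (When I tried it myself, the plot still had a legend even with scale_color_manual() commented out; the lines just fell back to the default colors and the legend title was no longer removed. My guess is that ggplot2 added this behavior in a later update. The legend appears because col = "psavert" is written inside aes(), which makes it a mapping rather than a fixed color.)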

library("ggplot2")
library("lubridate")
theme_set(theme_bw())

df <- economics[, c("date", "psavert", "uempmed")]
df <- df[lubridate::year(df$date) %in% c(1967:1981),]

# labels and breaks for X axis text
brks <- df$date[seq(1, length(df$date), 12)]
lbls <- lubridate::year(brks)

# plot
ggplot(df, aes(x = date)) +
    geom_line(aes(y = psavert, col = "psavert")) +
    geom_line(aes(y = uempmed, col = "uempmed")) +
    labs(
        title = "Time Series of Returns Percentage",
        subtitle = "Drawn From Wide Data format",
        caption = "Source: Economics",
        y = "Returns %"
    ) +  # title and caption
    scale_x_date(labels = lbls, breaks = brks) +  # change to yearly ticks and labels
    scale_color_manual(name = "",
                       values = c("psavert" = "#00ba38", "uempmed" = "#f8766d")) +  # line color
    theme(panel.grid.minor = element_blank())  # turn off minor grid
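
The wide data frame above can also be reshaped into long form, after which a single geom_line() with a color mapping draws both lines and the legend. Below is a minimal base R sketch of that reshaping; the object names are my own, and tidyr::pivot_longer() would be a common alternative if tidyr is available.

```r
library(ggplot2)

wide <- as.data.frame(economics[, c("date", "psavert", "uempmed")])
series <- c("psavert", "uempmed")

# Stack the value columns on top of each other and record which
# column each value came from.
long <- data.frame(
    date     = rep(wide$date, times = length(series)),
    variable = rep(series, each = nrow(wide)),
    value    = unlist(wide[series], use.names = FALSE)
)

# One geom_line() call now covers every series.
p <- ggplot(long, aes(x = date, y = value, colour = variable)) +
    geom_line()
p
```

With more than a handful of series, the long form scales better, because the wide approach needs one geom_line() call per column.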

Stacked Area Chart

Omitted.

Calendar Heatmap

Omitted.

Slope Chart

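A slope chart is well suited to showing how values change and how categories rank relative to each other. It also works well for time series with only a few time points.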

library("dplyr")
theme_set(theme_classic())

url <- textConnection(RCurl::getURL("https://raw.githubusercontent.com/jkeirstead/r-slopegraph/master/cancer_survival_rates.csv"))
source_df <- read.csv(url)
head(source_df)
#                               group year value
# 1                       Oral cavity    5  56.7
# 2                        Oesophagus    5  14.2
# 3                           Stomach    5  23.8
# 4                             Colon    5  61.7
# 5                            Rectum    5  62.6
# 6  Liver and intrahepatic bile duct    5   7.5
# 7                          Pancreas    5   4.0
# 8                            Larynx    5  68.8
# 9                 Lung and bronchus    5  15.0
# 10                        Melanomas    5  89.0

# Define functions. Source: https://github.com/jkeirstead/r-slopegraph
tufte_sort <-
    function(df,
             x = "year",
             y = "value",
             group = "group",
             method = "tufte",
             min.space = 0.05) {
        ## First rename the columns for consistency
        ids <- match(c(x, y, group), names(df))
        df <- df[, ids]
        names(df) <- c("x", "y", "group")
        
        ## Expand grid to ensure every combination has a defined value
        tmp <- expand.grid(x = unique(df$x), group = unique(df$group))
        tmp <- merge(df, tmp, all.y = TRUE)
        df <- dplyr::mutate(tmp, y = ifelse(is.na(y), 0, y))
        
        ## Cast into a matrix shape and arrange by first column
        require("reshape2")
        tmp <- reshape2::dcast(df, group ~ x, value.var = "y")
        ord <- order(tmp[, 2])
        tmp <- tmp[ord, ]
        
        min.space <- min.space * diff(range(tmp[, -1]))
        yshift <- numeric(nrow(tmp))
        ## Start at "bottom" row
        ## Repeat for rest of the rows until you hit the top
        for (i in 2:nrow(tmp)) {
            ## Shift subsequent row up by equal space so gap between
            ## two entries is >= minimum
            mat <- as.matrix(tmp[(i - 1):i, -1])
            d.min <- min(diff(mat))
            yshift[i] <- ifelse(d.min < min.space, min.space - d.min, 0)
        }
        
        
        tmp <- cbind(tmp, yshift = cumsum(yshift))
        
        scale <- 1
        tmp <-
            reshape2::melt(
                tmp,
                id = c("group", "yshift"),
                variable.name = "x",
                value.name = "y"
            )
        ## Store these gaps in a separate variable so that they can be scaled ypos = a*yshift + y
        
        tmp <- transform(tmp, ypos = y + scale * yshift)
        return(tmp)
        
    }

plot_slopegraph <- function(df) {
    ylabs <- subset(df, x == head(x, 1))$group
    yvals <- subset(df, x == head(x, 1))$ypos
    fontSize <- 3
    gg <- ggplot(df, aes(x = x, y = ypos)) +
        geom_line(aes(group = group), colour = "grey80") +
        geom_point(colour = "white", size = 8) +
        geom_text(aes(label = y), size = fontSize, family = "American Typewriter") +
        scale_y_continuous(name = "",
                           breaks = yvals,
                           labels = ylabs)
    return(gg)
}

## Prepare data
df <- tufte_sort(
    source_df,
    x = "year",
    y = "value",
    group = "group",
    method = "tufte",
    min.space = 0.05
)

df <- transform(df,
                x = factor(
                    x,
                    levels = c(5, 10, 15, 20),
                    labels = c("5 years", "10 years", "15 years", "20 years")
                ),
                y = round(y))

## Plot
plot_slopegraph(df) + labs(title = "Estimates of % survival rates") +
    theme(
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(
            hjust = 0.5,
            family = "American Typewriter",
            face = "bold"
        ),
        axis.text = element_text(family = "American Typewriter",
                                 face = "bold"))

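Honestly, this function is too complicated and I have given up on reading the code. As the comment says, it is adapted from jkeirstead/r-slopegraph. I also found an R package, leeper/slopegraph, which wraps all of this up nicely and can be installed and used directly.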
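
For comparison, a bare-bones slope chart needs far less code if the label-spacing logic of tufte_sort() is left out. The sketch below uses invented data and is not a replacement for the functions above: labels can overlap when values are close together.

```r
library(ggplot2)

# Hypothetical scores for three groups at two time points.
scores <- data.frame(
    group  = rep(c("A", "B", "C"), times = 2),
    period = factor(rep(c("before", "after"), each = 3),
                    levels = c("before", "after")),
    value  = c(10, 20, 30, 25, 15, 32)
)

p <- ggplot(scores, aes(x = period, y = value, group = group)) +
    geom_line(colour = "grey60") +   # one sloped line per group
    geom_point() +
    # Name each line once, next to its starting point.
    geom_text(data = subset(scores, period == "before"),
              aes(label = group), hjust = 1.5)
p
```

The group aesthetic is what connects the two points of each category; without it geom_line() would not know which points belong together.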

Seasonal Plot

For time series objects of class ts or xts, forecast::ggseasonplot() can visualize seasonal patterns in the data. The example below uses two built-in time series, AirPassengers and nottem:

library("ggplot2")
library("forecast")
theme_set(theme_classic())

# Subset data for a smaller timewindow
nottem_small <- window(nottem,
                       start = c(1920, 1),
                       end = c(1925, 12))

# Plot
ggseasonplot(AirPassengers) +
    labs(title = "Seasonal plot: International Airline Passengers")
ggseasonplot(nottem_small) +
    labs(title = "Seasonal plot: Air temperatures at Nottingham Castle")

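The number of air passengers rises year after year and also follows a seasonal pattern.

The air temperature, by contrast, does not rise from year to year, but it clearly follows the same seasonal pattern every year.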
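
To see roughly what a seasonal plot is made of, the sketch below rebuilds one by hand with plain ggplot2: one line per year, with the month on the X axis. This is my own approximation, not how forecast implements ggseasonplot(), and it assumes the series starts in the first period of a cycle (true for AirPassengers, which starts in January 1949).

```r
library(ggplot2)

freq <- frequency(AirPassengers)        # 12 observations per year
idx  <- seq_along(AirPassengers) - 1    # 0-based position in the series

seasonal <- data.frame(
    # Integer arithmetic avoids rounding issues with time().
    year       = factor(start(AirPassengers)[1] + idx %/% freq),
    month      = factor(month.abb[as.integer(cycle(AirPassengers))],
                        levels = month.abb),
    passengers = as.numeric(AirPassengers)
)

# Grouping by year draws each year as its own line across the months.
p <- ggplot(seasonal, aes(x = month, y = passengers,
                          group = year, colour = year)) +
    geom_line()
p
```

Lines that sit on top of each other in order of year show the upward trend, while their similar shapes show the seasonal pattern.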

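The Hierarchical Dendrogram and Cluster plots in the later section 7, Groups, are fairly simple and I rarely use them, so they are omitted. Section 8, Spatial, covers map plotting, which I never need, so it is omitted too.

Code used: ggplot2.R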

Learning ggplot2, Part 3: The Master List (final part)
