A sina plot visualizes a single variable across classes, with jitter width reflecting the data's density in each class.
import pandas as pd
from lets_plot import *
LetsPlot.setup_html()
df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/mpg.csv")
print(df.shape)
df.head()
g = ggplot(df, aes("drv", "hwy"))
g + geom_sina(seed=42)
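The idea that jitter width follows density can be sketched without any plotting library. The snippet below is only an illustration of the concept, not how lets-plot computes the layout: the helper names (`gaussian_kde`, `sina_offsets`), the Gaussian kernel, the bandwidth, and the sample values are all assumptions made for this example.

```python
import math
import random


def gaussian_kde(values, bw):
    """Return a function that estimates density at y with a Gaussian kernel."""
    n = len(values)
    norm = 1.0 / (n * bw * math.sqrt(2 * math.pi))

    def density(y):
        return norm * sum(math.exp(-0.5 * ((y - v) / bw) ** 2) for v in values)

    return density


def sina_offsets(values, bw=1.0, max_width=0.4, seed=42):
    """Draw a horizontal offset for each value, bounded by the scaled density at that value."""
    rng = random.Random(seed)
    density = gaussian_kde(values, bw)
    dens = [density(v) for v in values]
    peak = max(dens)
    # Points in dense regions may spread up to max_width; sparse regions stay near the axis.
    return [rng.uniform(-1, 1) * max_width * d / peak for d in dens]


hwy = [12, 20, 20, 21, 21, 21, 22, 22, 30]  # made-up values, not taken from mpg.csv
offsets = sina_offsets(hwy)
for y, dx in zip(hwy, offsets):
    print(f"y={y:>2}  offset={dx:+.3f}")
```

The isolated values at both ends receive offsets close to zero, while values near the mode can spread much further, which is what gives a sina plot its violin-like outline.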
gggrid([
    g + geom_boxplot() + ggtitle("geom_boxplot()", "Show distribution but not sample size"),
    g + geom_violin() + ggtitle("geom_violin()", "Show distribution but not sample size"),
    g + geom_jitter(seed=42) + ggtitle("geom_jitter()", "Show sample size but not distribution"),
    g + geom_sina(seed=42) + ggtitle("geom_sina()", "Show both distribution and sample size"),
], ncol=2)
Adjusting points vertically can help in two situations:
overlapping values, where multiple observations share the exact same y-value;
integerish banding, where values are close to integers and appear artificially grouped into horizontal bands.
In these cases, you may consider using a position adjustment.
gggrid([
    g + geom_sina(seed=42) + ggtitle("Default position"),
    g + geom_sina(seed=42, position=position_jitter(width=0, seed=42)) + ggtitle("'jitter' position"),
])
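Whether a variable shows the overlapping values or integerish banding described above can be checked before choosing a position adjustment. The following is a rough sketch: the helper names, the `tol` threshold, and the sample values are assumptions introduced for illustration and are not part of lets-plot.

```python
from collections import Counter


def overlap_share(values):
    """Fraction of observations whose exact value occurs more than once."""
    counts = Counter(values)
    return sum(c for c in counts.values() if c > 1) / len(values)


def integerish_share(values, tol=0.05):
    """Fraction of observations lying within `tol` of an integer."""
    return sum(abs(v - round(v)) <= tol for v in values) / len(values)


sample = [26.0, 26.0, 27.0, 24.98, 25.02, 31.4]  # made-up values
print("overlapping:", overlap_share(sample))
print("integerish: ", integerish_share(sample))
```

High shares in either check suggest that points will stack on the same horizontal lines, which is the case where a vertical jitter is worth trying.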
Use the 'jitterdodge' position adjustment if additional grouping is required:
gggrid([
    g + geom_sina(aes(color=as_discrete("year")), seed=42) + \
        scale_color_discrete(format="d") + \
        ggtitle("Default position"),
    g + geom_sina(aes(color=as_discrete("year")), seed=42,
                  position=position_jitterdodge(jitter_width=0, seed=42)) + \
        scale_color_discrete(format="d") + \
        ggtitle("'jitterdodge' position"),
])
In a sina plot, points are randomly positioned within the violin boundaries, provided both layers use the same parameters.
g + \
    geom_violin(bw=1.5) + \
    geom_sina(bw=1.5, seed=42)
g + \
    geom_violin(aes(color='..quantile..', fill='..quantile..'), alpha=.5) + \
    geom_sina(aes(color='..quantile..'), size=2, seed=42) + \
    scale_continuous(['color', 'fill'], low="#1a9641", high="#d7191c")
scale Values
gggrid([
    g + \
        geom_violin(scale='width') + \
        geom_sina(scale='width', size=1.5, seed=42) + \
        ggtitle("scale='width'"),
    g + \
        geom_violin(scale='area') + \
        geom_sina(scale='area', size=1.5, seed=42) + \
        ggtitle("scale='area'"),
    g + \
        geom_violin(scale='count') + \
        geom_sina(scale='count', size=1.5, seed=42) + \
        ggtitle("scale='count'"),
])
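How the three `scale` values differ can be sketched numerically. This assumes lets-plot follows the convention ggplot2 uses for violins ('width' gives every group the same maximum width, 'area' scales by the overall density peak, 'count' additionally weights by group size). The function name, group labels, and numbers below are made up for illustration.

```python
def relative_max_widths(peak_density, n_obs, scale="area"):
    """Widest point of each group's outline, relative to the widest possible (1.0)."""
    max_peak = max(peak_density.values())
    max_n = max(n_obs.values())
    widths = {}
    for group, peak in peak_density.items():
        if scale == "width":
            # Every group is stretched to the same maximum width.
            widths[group] = 1.0
        elif scale == "area":
            # Densities share one scale, so flatter distributions look narrower.
            widths[group] = peak / max_peak
        elif scale == "count":
            # As 'area', but small groups are shrunk in proportion to their size.
            widths[group] = peak / max_peak * n_obs[group] / max_n
        else:
            raise ValueError(f"unknown scale: {scale!r}")
    return widths


peaks = {"a": 0.10, "b": 0.08, "c": 0.12}  # hypothetical peak densities per group
counts = {"a": 100, "b": 110, "c": 25}     # hypothetical group sizes

for mode in ("width", "area", "count"):
    print(mode, relative_max_widths(peaks, counts, mode))
```

Under these assumptions, the small group "c" is the widest with 'area' because its density peak is highest, but becomes the narrowest with 'count'.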
gggrid([
    g + geom_violin() + ggtitle("Violin\nstat='ydensity' (default)"),
    g + geom_violin(stat='sina') + ggtitle("Violin\nstat='sina'"),
    g + geom_sina(size=1.5, seed=42, stat='ydensity') + ggtitle("Sina\nstat='ydensity'"),
    g + geom_sina(size=1.5, seed=42) + ggtitle("Sina\nstat='sina' (default)"),
], ncol=2)
show_half Parameter
g + \
    geom_violin(show_half=-1, size=0, fill="gray85") + \
    geom_sina(show_half=1, seed=42)
g + \
    geom_violin(aes(fill="drv"), show_half=1, size=0, position=position_nudge(x=.07)) + \
    geom_boxplot(aes(fill="drv"), color="white", width=.1, outlier_alpha=0) + \
    geom_sina(aes(color="drv"), show_half=-1, seed=42,
              position=position_nudge(x=-.07)) + \
    scale_color_brewer(palette="Set2") + \
    scale_fill_brewer(palette="Pastel2") + \
    facet_grid(x="year") + \
    coord_flip() + \
    theme_light() + flavor_solarized_dark()