R - ggplot2 - Get histogram of difference between two groups
Let's say I have a histogram of two overlapping groups. Here's a possible command with ggplot2, and a pretend output graph.
ggplot(data, aes(x = variable1, fill = binaryvariable)) + geom_histogram(position = "identity")
So what I have is the frequency or count of each event. What I'd like instead is the difference between the two events in each bin. Is this possible? How?
For example, taking red minus blue:
- value at x=2 ~ -10
- value at x=4 ~ 40 - 200 = -160
- value at x=6 ~ 190 - 25 = 155
- value at x=8 ~ 10
I'd prefer to do this using ggplot2, but another way is fine. My data frame is set up with items like this toy example (the real dimensions are 25000 rows x 30 columns). Edited: here is example data to work with (gist example).
id    variable1  binaryvariable
1     50         t
2     55         t
3     51         n
..    ..         ..
1000  1001       t
1001  1944       t
1002  1042       n
As you can see from the example, I'm interested in a histogram that plots variable1 (a continuous variable) separately for each binaryvariable (t or n). But what I really want is the difference between their frequencies.
So, in order to do this you need to make sure that the "bins" you use for the histograms are the same for both levels of your indicator variable. Here's a somewhat naive solution (in base R):
df = data.frame(y = c(rnorm(50), rnorm(50, mean = 1)),
                x = rep(c(0, 1), each = 50))

# histogram of the full data; specify more breaks if necessary
fullhist = hist(df$y, breaks = 20)

# create histograms for 0 and 1 using the breaks from the full histogram
zerohist = with(subset(df, x == 0), hist(y, breaks = fullhist$breaks))
oneshist = with(subset(df, x == 1), hist(y, breaks = fullhist$breaks))

# combine the hists: reuse the full histogram object, replacing its counts
combhist = fullhist
combhist$counts = zerohist$counts - oneshist$counts
plot(combhist)
So you specify how many breaks should be used (based on the values from the histogram on the full data), and then compute the differences in counts at each of the breaks.
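Since the question asked for ggplot2, the same shared-breaks idea can be carried over by computing the per-bin differences first and then drawing them as bars. This is only a sketch under assumptions: the toy data, the t/n labels, and the names count_t, count_n and diff_df are made up for illustration, and geom_col() is just one way to draw precomputed bar heights.

```r
# Toy data (assumed): a continuous variable1 and a two-level binaryvariable
set.seed(1)
df <- data.frame(
  variable1 = c(rnorm(500, mean = 4), rnorm(500, mean = 6)),
  binaryvariable = rep(c("t", "n"), each = 500)
)

# Shared breaks computed from the full data, so both groups use the same bins
full <- hist(df$variable1, breaks = 20, plot = FALSE)

# Per-group counts on those breaks (plot = FALSE skips the base graphics plot)
count_t <- hist(df$variable1[df$binaryvariable == "t"],
                breaks = full$breaks, plot = FALSE)$counts
count_n <- hist(df$variable1[df$binaryvariable == "n"],
                breaks = full$breaks, plot = FALSE)$counts

# One row per bin: the bin midpoint and the difference in counts (t minus n)
diff_df <- data.frame(mid = full$mids, difference = count_t - count_n)

# Draw the differences as bars, one bar per bin, if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(diff_df, aes(x = mid, y = difference)) +
    geom_col(width = diff(full$breaks)[1]) +
    labs(x = "variable1", y = "count(t) - count(n)")
  print(p)
}
```

Bars below zero mark bins where the n group has more observations than the t group. Swap the subtraction order to get the opposite sign convention.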
PS: It might be helpful to examine what the non-graphical output of hist() is.
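As a quick illustration of that suggestion, hist() can be called with plot = FALSE so that it only returns the histogram object. The sample data and the names y and h below are assumptions made for this sketch.

```r
# Sample data (assumed) just to have something to bin
set.seed(42)
y <- rnorm(100)

# plot = FALSE returns the histogram object without drawing anything
h <- hist(y, breaks = 10, plot = FALSE)

# Overview of everything stored in the object
str(h)

# The pieces used for the difference plot
h$breaks   # bin boundaries
h$counts   # number of observations in each bin
h$mids     # bin midpoints
```

There is one more break than there are counts, because the breaks are the bin edges and the counts belong to the intervals between them.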