R - Plot many categories

- March 15, 2013

I have data like the following: each experiment leads to the appearance of a composition, and each composition belongs to one or more categories. I want to plot the number of occurrences of each composition:

    df <- read.table(text = "
    comp         category
    comp1             1
    comp2             1
    comp3             4,2
    comp4             1,3
    comp1             1,2
    comp3             3
    ", header = TRUE)

    barplot(table(df$comp))

So far, this works for me.

Next, each composition belongs to one or more categories, with commas separating the categories. I want a bar plot with the compositions on the x axis and the number of occurrences on the y axis, where each bar shows the share (%) of each category.

My idea is to duplicate each line that contains a comma, repeating it (number of commas + 1) times.

    df = table(df$category, df$comp)
    cats <- strsplit(rownames(df), ",", fixed = TRUE)
    df <- df[rep(seq_len(nrow(df)), sapply(cats, length)), ]
    df <- as.data.frame(unclass(df))
    df$cat <- unlist(cats)
    df <- aggregate(. ~ cat, df, FUN = sum)

This gives me, for example, for comp1:

              1     2     3     4
    comp1     2     1     0     0

But with this method, the total over the categories (3) no longer matches the total number of occurrences of the composition (comp1 = 2).

How should I proceed in such a case? Is the solution to divide by the number of commas + 1? If so, how do I do that in code, and is there a simpler way?

Thanks a lot!

Producing the plot requires two steps, as you noticed. First, one needs to prepare the data; then one can create the plot.

Preparing the data

You have already shown your efforts to bring the data into a suitable form, but let me propose an alternative way.

First, make sure that the category column of the data frame is character and not factor. Then store a vector of all the categories that appear in the data frame:

    df$category <- as.character(df$category)
    cats <- unique(unlist(strsplit(df$category, ",")))
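To make the two lines above more concrete, here is a small sketch on a hand-typed copy of the category column (the vector below is my own toy input, not read from the original data frame). It shows why the conversion to character matters and what the split produces; `n_per_row` is just an illustrative name.

```r
# Toy copy of the category column from the question (assumed to be a factor)
category <- factor(c("1", "1", "4,2", "1,3", "1,2", "3"))

# strsplit() only accepts character input, hence the conversion
category <- as.character(category)

# One list element per row, each holding that row's categories
parts <- strsplit(category, ",", fixed = TRUE)

n_per_row <- lengths(parts)     # number of categories in each row
cats <- unique(unlist(parts))   # distinct categories, in order of first appearance

print(n_per_row)
print(cats)
```

The order of `cats` follows the first appearance in the data, which is why the columns further down come out as 1, 4, 2, 3 rather than sorted.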

Next, I need to summarise the data. For this purpose, I need a function that gives, for each value in comp, the share of each category, scaled such that the sum of the values equals the number of rows in the original data for that comp.

The following function returns this information for an entire data frame, in the form of a data frame (the output needs to be a data frame because I want to use the function with do() later).

    cat_perc <- function(cats, vec) {
      # percentages
      nums <- sapply(cats, function(cat) sum(grepl(cat, vec)))
      perc <- nums/sum(nums)
      final <- perc * length(vec)
      df <- as.data.frame(as.list(final))
      names(df) <- cats
      return(df)
    }
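One caveat worth noting: grepl() treats the category as a pattern and matches it anywhere in the string, so a category "1" would also match a row labelled "10". That does not matter for the data in the question, but if your categories can have more than one digit, a variant along the following lines might be safer. This is only a sketch; the name `cat_perc_exact` and the toy input are made up for illustration.

```r
# Hypothetical variant of cat_perc() that compares whole category labels
cat_perc_exact <- function(cats, vec) {
  parts <- strsplit(vec, ",", fixed = TRUE)
  # Count the rows whose category list contains `cat` as a complete element
  nums <- sapply(cats, function(cat) {
    sum(sapply(parts, function(p) cat %in% p))
  })
  final <- nums / sum(nums) * length(vec)
  df <- as.data.frame(as.list(final))
  names(df) <- cats
  df
}

# Toy input where substring matching would over-count category "1"
toy_vec  <- c("1", "10", "1,2")
toy_cats <- c("1", "10", "2")
exact <- cat_perc_exact(toy_cats, toy_vec)
print(exact)
```

On the data from the question, this variant returns the same values as cat_perc(), since no category label there is contained in another.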

Running the function on the complete data frame gives:

    cat_perc(cats, df$category)
    ##          1         4        2        3
    ## 1 2.666667 0.6666667 1.333333 1.333333

The values sum to six, which is indeed the total number of rows in the original data frame.
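The question also asked about dividing by the number of commas + 1. That is a different weighting from the one cat_perc() uses: there, each row contributes a total weight of 1, shared equally among its categories. A minimal base R sketch of that alternative, again on a hand-typed copy of the category column (the names `weights` and `row_weighted` are mine):

```r
category <- c("1", "1", "4,2", "1,3", "1,2", "3")
parts <- strsplit(category, ",", fixed = TRUE)

# Each row has weight 1, split equally among its categories,
# i.e. 1 / (number of commas + 1) per category
weights <- rep(1 / lengths(parts), lengths(parts))

# Sum the weights per category (tapply sorts the category labels)
row_weighted <- tapply(weights, unlist(parts), sum)

print(row_weighted)
print(sum(row_weighted))
```

Both approaches sum to the number of rows, but the individual values differ (category 1 gets 3 here versus about 2.67 from cat_perc()), so which one to use depends on how you want a row with several categories to count.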

Now I want to run this function for each value of comp, which can be done using the dplyr package:

    library(dplyr)
    plot_data <- group_by(df, comp) %>%
      do(cat_perc(cats, .$category))
    plot_data
    ## Source: local data frame [4 x 5]
    ## Groups: comp [4]
    ##
    ##     comp        1         4         2         3
    ##   (fctr)    (dbl)     (dbl)     (dbl)     (dbl)
    ## 1  comp1 1.333333 0.0000000 0.6666667 0.0000000
    ## 2  comp2 1.000000 0.0000000 0.0000000 0.0000000
    ## 3  comp3 0.000000 0.6666667 0.6666667 0.6666667
    ## 4  comp4 0.500000 0.0000000 0.0000000 0.5000000

This first groups the data by comp and then applies the function cat_perc to the subset of the data frame for each comp.
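If you would rather not depend on dplyr for this step, the same group-then-apply idea can be sketched in base R with split() and sapply(). The helper `scaled_counts` below is a hypothetical stand-in that does the same arithmetic as cat_perc() but returns a plain named vector, and the data frame is typed in by hand so that the snippet runs on its own.

```r
df <- data.frame(
  comp     = c("comp1", "comp2", "comp3", "comp4", "comp1", "comp3"),
  category = c("1", "1", "4,2", "1,3", "1,2", "3"),
  stringsAsFactors = FALSE
)
cats <- unique(unlist(strsplit(df$category, ",", fixed = TRUE)))

# Same arithmetic as cat_perc(), returning a named numeric vector
scaled_counts <- function(cats, vec) {
  nums <- sapply(cats, function(cat) sum(grepl(cat, vec)))
  nums / sum(nums) * length(vec)
}

# One character vector of category strings per composition
by_comp <- split(df$category, df$comp)

# sapply() returns categories in rows; transpose to get one row per comp
result <- t(sapply(by_comp, scaled_counts, cats = cats))
print(result)
```

Each row of `result` sums to the number of times that composition occurs in the data, which is the property the plot relies on.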

I will plot the data with the ggplot2 package, which requires the data to be in so-called long format. This means that each data point to be plotted should correspond to one row in the data frame. (As it is now, each row contains four data points.) This can be done with the tidyr package as follows:

    library(tidyr)
    plot_data <- gather(plot_data, category, value, -comp)
    head(plot_data)
    ## Source: local data frame [6 x 3]
    ## Groups: comp [4]
    ##
    ##     comp category    value
    ##   (fctr)    (chr)    (dbl)
    ## 1  comp1        1 1.333333
    ## 2  comp2        1 1.000000
    ## 3  comp3        1 0.000000
    ## 4  comp4        1 0.500000
    ## 5  comp1        4 0.000000
    ## 6  comp2        4 0.000000

As you can see, there is now a single data point per row, characterised by comp, category and the corresponding value.
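A note for readers on a recent tidyr (1.0.0 or later): gather() still works, but the tidyr documentation recommends pivot_longer() for new code. A sketch of the equivalent reshaping, assuming the wide table looks like the output shown earlier (the data frame `wide` is typed in by hand here so that the snippet is self-contained):

```r
library(tidyr)

# Hand-typed copy of the wide table, one column per category
wide <- data.frame(
  comp = c("comp1", "comp2", "comp3", "comp4"),
  `1`  = c(4/3, 1, 0, 0.5),
  `4`  = c(0, 0, 2/3, 0),
  `2`  = c(2/3, 0, 2/3, 0),
  `3`  = c(0, 0, 2/3, 0.5),
  check.names = FALSE   # keep "1", "4", ... as column names
)

# Every column except comp becomes a (category, value) pair
long <- pivot_longer(wide, cols = -comp,
                     names_to = "category", values_to = "value")
print(head(long))
```

The rows come out in a different order than with gather() (all categories of comp1 first, then comp2, and so on), which makes no difference for the plot.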

Plotting the data

Now that the data are ready, we can plot them using ggplot:

    library(ggplot2)
    ggplot(plot_data, aes(x = comp, y = value, fill = category)) +
      geom_bar(stat = "identity")
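Since the question started from base barplot(), it may be worth mentioning that barplot() can also draw stacked bars when it is given a matrix with one column per bar and one row per stacked segment. The following is only a sketch of that alternative, with the matrix `m` typed in by hand from the values computed above; it is not a replacement for the ggplot2 code.

```r
# Categories in rows, compositions in columns (values copied from the summary table)
m <- matrix(
  c(4/3, 1, 0,   0.5,
    0,   0, 2/3, 0,
    2/3, 0, 2/3, 0,
    0,   0, 2/3, 0.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(category = c("1", "4", "2", "3"),
                  comp     = c("comp1", "comp2", "comp3", "comp4"))
)

# Stacked bars are the default for matrix input; the legend uses the row names
mids <- barplot(m, legend.text = TRUE, xlab = "comp", ylab = "value")
```

The height of each bar is the column sum of `m`, that is, the number of occurrences of that composition.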
