How to split a character column into multiple columns in R

- January 15, 2010

I have a data frame x:

dput(x)
structure(list(district = structure(c(6L, 6L, 6L, 6L, 6L, 6L), .Label = c("district - central (06)",
"district - east (04)", "district - new delhi (05)", "district - north (02)",
"district - north east (03)", "district - north west (01)", "district - south (09)",
"district - south west (08)", "district - west (07)"), class = "factor"),
    age = structure(c(103L, 1L, 2L, 14L, 25L, 36L), .Label = c("0",
    "1", "10", "100+", "11", "12", "13", "14", "15", "16", "17",
    "18", "19", "2", "20", "21", "22", "23", "24", "25", "26",
    "27", "28", "29", "3", "30", "31", "32", "33", "34", "35",
    "36", "37", "38", "39", "4", "40", "41", "42", "43", "44",
    "45", "46", "47", "48", "49", "5", "50", "51", "52", "53",
    "54", "55", "56", "57", "58", "59", "6", "60", "61", "62",
    "63", "64", "65", "66", "67", "68", "69", "7", "70", "71",
    "72", "73", "74", "75", "76", "77", "78", "79", "8", "80",
    "81", "82", "83", "84", "85", "86", "87", "88", "89", "9",
    "90", "91", "92", "93", "94", "95", "96", "97", "98", "99",
    "age not stated", "all ages"), class = "factor"), total = c(3656539L,
    56131L, 58644L, 63835L, 63859L, 64945L), rural = c(213950L,
    3589L, 3757L, 4200L, 4102L, 4223L), urban = c(3442589L, 52542L,
    54887L, 59635L, 59757L, 60722L)), .Names = c("district",
"age", "total", "rural", "urban"), row.names = c(NA, 6L), class = "data.frame")

I want to split the district column and extract the name of the district into a new column, name. E.g. "district - north west (01)" should be split to give "north west". I tried str_split_fixed and got:

x
                    district      age   total  rural   urban            name
1 district - north west (01) all ages 3656539 213950 3442589 north west (01)
2 district - north west (01)        0   56131   3589   52542 north west (01)
3 district - north west (01)        1   58644   3757   54887 north west (01)
4 district - north west (01)        2   63835   4200   59635 north west (01)
5 district - north west (01)        3   63859   4102   59757 north west (01)
6 district - north west (01)        4   64945   4223   60722 north west (01)

I tried using the same function again to split the name column and separate the district name from the code, but it gives me the following error:

Error in stri_split_regex(string, pattern, n = n, simplify = TRUE, opts_regex = attr(pattern, : Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)

Is there a way to split a character column into multiple columns based on a pattern in a single function?

You can do what you want with gsub:

gsub("^.* +- +([A-Za-z ]+) \\(.*$", "\\1", df$district)
[1] "north west" "north west" "north west" "north west" "north west" "north west"

The first argument to gsub ("^.* +- +([A-Za-z ]+) \\(.*$") is a regular expression. It can be interpreted as follows:

From the beginning of the string "^", match any characters ".*" followed by at least one space, a hyphen, and at least one space " +- +". Capture the next text "()" made of (at least one) letters and spaces "[A-Za-z ]+". Stop capturing when you reach a space followed by a parenthesis " \\(", then match until the end of the text ".*$".

The second argument of gsub, "\\1", says to replace the matched text with the text captured by the parentheses.
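
The explanation above can be checked on a small standalone vector. This is only a sketch: the vector s is made up for illustration, and its third element is included to show that gsub returns a string unchanged when the pattern does not match. gsub also accepts a factor such as the district column, since it coerces its input to character.

```r
# Illustrative input (hypothetical vector, not the question's data frame)
s <- c("district - north west (01)",
       "district - east (04)",
       "no separator here")

# Same pattern as in the answer: one capture group for the district name
pattern <- "^.* +- +([A-Za-z ]+) \\(.*$"

# "\\1" replaces the whole match with the text of the first capture group
result <- gsub(pattern, "\\1", s)
print(result)
# [1] "north west"        "east"              "no separator here"
```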

To assign it to a variable:

df$name <- gsub("^.* +- +([A-Za-z ]+) \\(.*$", "\\1", df$district)
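
The question also asks about getting several columns from one call. One possible way, which is not part of the answer above, is strcapture() from the utils package (available in R 3.4.0 and later). The sketch below assumes a second capture group for the code in parentheses; the column names name and code and the short district vector are chosen here only for illustration.

```r
# Illustrative input; in the question this would be x$district
district <- c("district - north west (01)", "district - new delhi (05)")

# Two capture groups: the district name and the code inside the parentheses
pattern <- "^.* +- +([A-Za-z ]+) \\((.*)\\)$"

# proto supplies the names and types of the output columns;
# code stays character so the leading zero in "01" is kept
parts <- strcapture(pattern, district,
                    proto = data.frame(name = character(),
                                       code = character(),
                                       stringsAsFactors = FALSE))
print(parts)
```

The result is a data frame with one row per input string, so it could be attached to the original data with something like cbind(x, parts).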
