How to split a character column into multiple columns in R
I have a data frame x:
dput(x)
structure(list(district = structure(c(6L, 6L, 6L, 6L, 6L, 6L), .Label = c("district - central (06)", "district - east (04)", "district - new delhi (05)", "district - north (02)", "district - north east (03)", "district - north west (01)", "district - south (09)", "district - south west (08)", "district - west (07)"), class = "factor"), age = structure(c(103L, 1L, 2L, 14L, 25L, 36L), .Label = c("0", "1", "10", "100+", "11", "12", "13", "14", "15", "16", "17", "18", "19", "2", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "3", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "4", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "5", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", "6", "60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "7", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79", "8", "80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "9", "90", "91", "92", "93", "94", "95", "96", "97", "98", "99", "age not stated", "all ages"), class = "factor"), total = c(3656539L, 56131L, 58644L, 63835L, 63859L, 64945L), rural = c(213950L, 3589L, 3757L, 4200L, 4102L, 4223L), urban = c(3442589L, 52542L, 54887L, 59635L, 59757L, 60722L)), .Names = c("district", "age", "total", "rural", "urban"), row.names = c(NA, 6L), class = "data.frame")
I want to split the district column and extract the name of the district into a new column, name. E.g. "district - north west (01)" should be split to give "north west". I tried str_split_fixed, and got:
x
                    district      age   total  rural   urban            name
1 district - north west (01) all ages 3656539 213950 3442589 north west (01)
2 district - north west (01)        0   56131   3589   52542 north west (01)
3 district - north west (01)        1   58644   3757   54887 north west (01)
4 district - north west (01)        2   63835   4200   59635 north west (01)
5 district - north west (01)        3   63859   4102   59757 north west (01)
6 district - north west (01)        4   64945   4223   60722 north west (01)
When I try using the same function again to split the name column, to separate the district name from the code, it gives me the following error:
Error in stri_split_regex(string, pattern, n = n, simplify = TRUE, opts_regex = attr(pattern, : Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)
Is there a way to split a character column into multiple columns based on a pattern in a single function?
You may want gsub:
gsub("^.* +- +([a-zA-Z ]+) \\(.*$", "\\1", df$district)
[1] "north west" "north west" "north west" "north west" "north west" "north west"
The first argument to gsub ("^.* +- +([a-zA-Z ]+) \\(.*$") is a regular expression. It can be interpreted as follows:
From the beginning of the string "^", match any characters ".*" followed by at least one space, a hyphen, and at least one space " +- +". Capture the next text "()", made of (at least one) letters and spaces "[a-zA-Z ]+". Stop capturing when you reach a space followed by an opening parenthesis " \\(", then match until the end of the text ".*$".
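To see what each part of the pattern picks up, here is a minimal sketch using base R's regexec and regmatches, which return the full match together with every captured group. The district vector is a hypothetical stand-in for df$district, and the names pattern, m and name are only for illustration.

```r
# Hypothetical sample values in the same shape as the district column
district <- c("district - north west (01)", "district - new delhi (05)")

pattern <- "^.* +- +([a-zA-Z ]+) \\(.*$"

# regexec() locates the full match and each capture group;
# regmatches() extracts them as one character vector per input string
m <- regmatches(district, regexec(pattern, district))

# Element 1 is the whole match, element 2 is the text captured by "()"
m[[1]]

# Keep only the first capture group (the district name)
name <- vapply(m, function(parts) parts[2], character(1))
name
```

Because "[a-zA-Z ]+" is greedy, it first takes the trailing space as well, then backtracks by one character so that " \\(" can match; that is why the captured name has no trailing space.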
The second argument of gsub, "\\1", says to replace the matched text with the text captured by the parentheses.
To assign it to a variable:
df$name <- gsub("^.* +- +([a-zA-Z ]+) \\(.*$", "\\1", df$district)
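If both the name and the code are wanted as separate columns from a single call, one possible approach (not part of the answer above) is strcapture from the utils package, available in R 3.4.0 and later. The sketch below assumes the code is always a run of digits inside the final parentheses; the district vector and the column names name and code are hypothetical.

```r
# Hypothetical sample values in the same shape as the district column
district <- c("district - north west (01)", "district - new delhi (05)")

# Two capture groups: the district name, then the digits inside "( )".
# The literal parentheses are escaped as \\( and \\); left unescaped, "("
# is read as the start of a group, which leads to the
# mismatched-parenthesis error shown in the question.
parts <- strcapture(
  "^.* +- +([a-zA-Z ]+) \\(([0-9]+)\\)$",
  district,
  proto = data.frame(name = character(), code = character(),
                     stringsAsFactors = FALSE)
)

parts
```

The result is a data frame with one column per capture group, so it can be attached to the original data with cbind(df, parts). The code is kept as character here so that a leading zero such as "01" is preserved.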