BASH - Splitting a string at a specific occurrence of a character (underscore), depending on the total number of underscores in the string
I have a data frame with several columns and lines. The ID columns contain different strings, each composed of a different number of underscores. I want to split each string in half, depending on the number of underscores it contains.
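The splitting rule described above can be read literally: count the underscores and cut at the middle one. The sketch below assumes each string is two identical halves joined by a single underscore (so the count is odd, e.g. 3 in a_b_a_b, where the cut falls at the 2nd underscore). `split_half` is a hypothetical helper name, and the check that both halves are equal is an added assumption so that fields such as id_1 are left unchanged. It uses only POSIX parameter expansion, so it also runs under bash.

```sh
# split_half: hypothetical helper, written for illustration only.
# The function body runs in a subshell ( ... ), so its variables do not leak.
split_half() (
    s=$1
    # Count the underscores by stripping the string up to each one in turn.
    n=0 t=$s
    while [ "${t#*_}" != "$t" ]; do
        t=${t#*_}
        n=$((n + 1))
    done
    out=$s                          # default: print the string unchanged
    # Only an odd count has a single middle underscore.
    if [ $((n % 2)) -eq 1 ]; then
        k=$(( (n + 1) / 2 ))        # index of the middle underscore
        first= rest=$s i=0
        while [ "$i" -lt "$k" ]; do
            first=$first${rest%%_*}_    # copy text up to the next underscore
            rest=${rest#*_}             # drop that part from the remainder
            i=$((i + 1))
        done
        first=${first%_}            # remove the trailing underscore
        # Assumption: only halve when both halves are identical.
        [ "$first" = "$rest" ] && out=$first
    fi
    printf '%s\n' "$out"
)

split_half a_b_a_b              # prints a_b
split_half a_b_c_d_e_a_b_c_d_e  # prints a_b_c_d_e
split_half id_1                 # prints id_1 (the halves differ)
```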
Example:
id_1 id_2 haplotypeid ...
a_b_a_b a_b_a_b hap.1.1 ...
a_b_c_a_b_c a_b_c_a_b_c hap.1.2 ...
a_b_c_d_a_b_c_d a_b_c_d_a_b_c_d hap.2.1 ...
a_b_c_d_e_a_b_c_d_e a_b_c_d_e_a_b_c_d_e hap.2.1 ...
... ... ... ...
The output should be:
id_1 id_2 haplotypeid ...
a_b a_b hap.1.1 ...
a_b_c a_b_c hap.1.2 ...
a_b_c_d a_b_c_d hap.2.1 ...
a_b_c_d_e a_b_c_d_e hap.2.1 ...
... ... ... ...
I hope you can help me. Thanks in advance!
You can use sed, like this:
$ cat input.txt
id_1 id_2 haplotypeid ...
a_b_a_b a_b_a_b hap.1.1 ...
a_b_d_a_b_d a_b_c_a_b_c hap.1.2 ...
a_b_c_d_a_b_c_d a_b_c_d_a_b_c_d hap.2.1 ...
a_b_c_d_e_a_b_c_d_e a_b_c_d_e_a_b_c_d_e hap.2.1 ...
... ... ... ...
$ sed -r 's/(^| )([^ ]*)_\2/\1\2/g' input.txt | column -t
id_1       id_2       haplotypeid  ...
a_b        a_b        hap.1.1      ...
a_b_d      a_b_c      hap.1.2      ...
a_b_c_d    a_b_c_d    hap.2.1      ...
a_b_c_d_e  a_b_c_d_e  hap.2.1      ...
...        ...        ...          ...
Or:
$ sed -r 's/(^| )( *)\2([^ ]*)_\3/\1\2\3/g' inp
id_1 id_2 haplotypeid ...
a_b a_b hap.1.1 ...
a_b_d a_b_c hap.1.2 ...
a_b_c_d a_b_c_d hap.2.1 ...
a_b_c_d_e a_b_c_d_e hap.2.1 ...
... ... ... ...
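For comparison, the same per-field halving can be done without regex back-references. This is a swapped-in awk technique, not the method used in this answer: a field is treated as X_X when its length is odd, its middle character is an underscore, and the two halves are equal. `halve_fields` is a hypothetical wrapper name, and the sample lines are taken from the input shown above.

```sh
# halve_fields: hypothetical wrapper; reads whitespace-separated lines on
# stdin and writes them to stdout with every X_X field reduced to X.
halve_fields() {
    awk '{
        for (i = 1; i <= NF; i++) {
            len = length($i)
            # A field of the form X_X has odd length with "_" exactly in the middle.
            if (len > 1 && len % 2 == 1) {
                h = (len - 1) / 2                  # length of each half
                left = substr($i, 1, h)
                if (substr($i, h + 1, 1) == "_" && substr($i, h + 2) == left)
                    $i = left                      # keep only the first half
            }
        }
        print
    }'
}

printf '%s\n' 'id_1 id_2 haplotypeid' 'a_b_a_b a_b_a_b hap.1.1' \
    'a_b_d_a_b_d a_b_c_a_b_c hap.1.2' | halve_fields
```

awk rebuilds any line it modifies with single spaces, so piping the result through column -t (for example `halve_fields < input.txt | column -t`) realigns the columns.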
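Logic:
replace (string)_(repeat of the same string) with (string).
In sed (and other regex-based tools), \1, \2, \3, etc. are backreferences to the text matched by the first, second, third, ... capture groups. In the first command, \1 is the start of line or the separating space, and \2 is the string that must appear again after the underscore.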
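A minimal illustration of backreferences: in a basic regular expression (sed's default), \( \) captures a group and \1 must match the same text again; with GNU sed's -E (or -r), plain parentheses capture. Backreferences inside an extended-regex pattern are a GNU extension, so other sed implementations may not accept the second command.

```sh
# BRE: \( \) captures "abc"; \1 requires the same text after the underscore.
echo 'abc_abc' | sed 's/\([^_]*\)_\1/\1/'        # prints abc
# ERE (GNU sed -E): ( ) captures without backslashes; the second field is untouched.
echo 'abc_abc xyz' | sed -E 's/([^ ]*)_\1/\1/'   # prints abc xyz
```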