BASH - Splitting a string at a specific occurrence of a character (underscore), depending on the total number of underscores in the string

- April 15, 2013

I have a data frame with several columns and lines. One column contains different strings, each with a different number of underscores. I want to split each string in half, depending on the number of underscores it contains.

Example:

               id_1                    id_2     haplotypeid    ...
            a_b_a_b                 a_b_a_b         hap.1.1    ...
        a_b_c_a_b_c             a_b_c_a_b_c         hap.1.2    ...
    a_b_c_d_a_b_c_d         a_b_c_d_a_b_c_d         hap.2.1    ...
a_b_c_d_e_a_b_c_d_e     a_b_c_d_e_a_b_c_d_e         hap.2.1    ...
                ...                     ...             ...    ...

The output should be:

           id_1             id_2      haplotypeid    ...
            a_b              a_b          hap.1.1    ...
          a_b_c            a_b_c          hap.1.2    ...
        a_b_c_d          a_b_c_d          hap.2.1    ...
      a_b_c_d_e        a_b_c_d_e          hap.2.1    ...
            ...              ...              ...    ...

I hope someone can help me. Thanks in advance!

You can use sed like this:

$ cat input.txt
               id_1                    id_2     haplotypeid    ...
            a_b_a_b                 a_b_a_b         hap.1.1    ...
        a_b_d_a_b_d             a_b_c_a_b_c         hap.1.2    ...
    a_b_c_d_a_b_c_d         a_b_c_d_a_b_c_d         hap.2.1    ...
a_b_c_d_e_a_b_c_d_e     a_b_c_d_e_a_b_c_d_e         hap.2.1    ...
                ...                     ...             ...    ...

$ sed -r 's/(^| )([^ ]*)_\2/\1\2/g' input.txt | column -t
id_1       id_2       haplotypeid  ...
a_b        a_b        hap.1.1      ...
a_b_d      a_b_c      hap.1.2      ...
a_b_c_d    a_b_c_d    hap.2.1      ...
a_b_c_d_e  a_b_c_d_e  hap.2.1      ...
...        ...        ...          ...

Without column, a variant that also shrinks the padding in front of each shortened string:

$ sed -r 's/(^| )( *)\2([^ ]*)_\3/\1\2\3/g' inp
               id_1                    id_2     haplotypeid    ...
      a_b         a_b         hap.1.1    ...
    a_b_d       a_b_c         hap.1.2    ...
  a_b_c_d     a_b_c_d         hap.2.1    ...
a_b_c_d_e   a_b_c_d_e         hap.2.1    ...
            ...                     ...             ...    ...

Logic:
Replace (string)_(repeat of the same string) with (string).
In sed (and other regex-based tools), \1, \2, \3, etc. are backreferences to the text matched by earlier capture groups.
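The backreference logic is easier to follow on one string at a time. The following is a minimal sketch, not part of the original answer: `halve_field` is a hypothetical helper name, and it uses `sed -E` (accepted by both GNU and BSD sed) in place of `-r`. It assumes a sed that supports backreferences in extended regular expressions, which GNU sed does.

```shell
#!/usr/bin/env bash
# halve_field is a hypothetical helper name used only for illustration.
# It prints the first half of a string of the form X_X and prints any
# other string unchanged.
halve_field() {
    # (.*) captures a candidate first half; _\1 requires an underscore
    # followed by an exact repeat of that capture; ^ and $ force the
    # pattern to cover the whole string.
    printf '%s\n' "$1" | sed -E 's/^(.*)_\1$/\1/'
}

halve_field a_b_a_b               # prints a_b
halve_field a_b_c_d_e_a_b_c_d_e   # prints a_b_c_d_e
halve_field id_1                  # no repeated half: prints id_1
halve_field hap.1.1               # no underscore: prints hap.1.1
```

Anchoring with both ^ and $ is a choice made for this sketch. The commands above anchor only the start of each field, so they would also shorten a field that merely begins with a repeated prefix (for example, x_x_y would become x_y).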
