linux - Pearson Correlation between two columns
Good morning. Here is my problem: I have several files like the one below:
104 0.1697 12.3513214 15.9136214
112 -0.3146 12.0517303 14.8027303
122 0.2718 10.881109 13.259109
123 -0.4185 11.2880142 14.0237142
128 0.0205 13.0585763 15.4365763
132 0.1562 13.3956582 16.9579582
136 -0.4602 12.2567041 14.6347041
157 0.8142 13.6455927 17.2078927
158 -0.9244 8.0012967 11.5635967
There are approximately 10000 files, and each file has several rows. I need to compute the Pearson correlation between columns 2 and 4 for each file. Afterwards, I need to take the average of these correlations. I would like to do this with Linux commands. Can anyone help me, please? Thanks.
Try this script. You need bash and bc (to operate on floating-point numbers).
- Give it execute permission:
chmod +x /path/to/pearson.sh
- Change the files variable in the script to the directory where your files are stored.
- Call the script with no parameters:
bash /path/to/pearson.sh
It should print the mean of the Pearson correlation coefficients calculated over the data files.
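#!/bin/bash
# Directory holding the data files (placeholder path; change it).
files=/path/to/files

# Floating-point helpers built on bc; -l keeps the decimal places.
function add  { echo "$1 + $2" | bc -l; }
function sub  { echo "$1 - $2" | bc -l; }
function mult { echo "$1 * $2" | bc -l; }
function div  { echo "$1 / $2" | bc -l; }
function sqrt { echo "sqrt($1)" | bc -l; }

X=0 X2=0 Y=0 Y2=0 XY=0   # per-file running sums
r=0                      # coefficient of the current file
R=0 N=0                  # sum of coefficients, number of files

for f in "$files"/*; do
    N=$((N+1))
    n=0                  # number of rows in the current file
    while read -r l; do
        n=$((n+1))
        read -r -a rows <<< "$l"
        x=${rows[1]}     # column 2
        y=${rows[3]}     # column 4
        X=$(add $X $x)
        X2=$(add $X2 $(mult $x $x))
        Y=$(add $Y $y)
        Y2=$(add $Y2 $(mult $y $y))
        XY=$(add $XY $(mult $x $y))
    done < "$f"
    # Numerator: sum(xy) - sum(x)*sum(y)/n
    r=$(add $r $XY)
    r=$(sub $r $(div $(mult $X $Y) $n))
    # Denominator terms: sum(x^2) - sum(x)^2/n and sum(y^2) - sum(y)^2/n
    d1=$(sub $X2 $(div $(mult $X $X) $n))
    d2=$(sub $Y2 $(div $(mult $Y $Y) $n))
    r=$(div $r $(sqrt $(mult $d1 $d2)))
    R=$(add $R $r)
    # Reset the per-file state before the next file.
    X=0 X2=0 Y=0 Y2=0 XY=0 r=0
done
echo mean=$(div $R $N)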
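For reference, the per-file sums the script accumulates plug into the usual computational form of Pearson's coefficient. Writing it out may make the loop easier to check; here x is column 2, y is column 4, and n is the number of rows in the file (this notation is mine, chosen for the illustration):

```latex
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}
         {\sqrt{\left(\sum x^{2} - \dfrac{\left(\sum x\right)^{2}}{n}\right)
                \left(\sum y^{2} - \dfrac{\left(\sum y\right)^{2}}{n}\right)}}
```

The reported mean is then the sum of the per-file r values divided by the number of files.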
PS: I assumed the files have the same format as the one you presented. The formula used to evaluate the coefficients was taken from the link you gave.
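If starting a bc process for every arithmetic step turns out to be slow on roughly 10000 files, something along these lines might be worth trying. It is only a sketch: it swaps the bc arithmetic for awk (my substitution, not part of the script above), the function names pearson_file and mean_pearson are made up for the example, and it assumes whitespace-separated columns as in the sample file. Files with fewer than two usable rows, or with a constant column, are silently skipped.

```shell
#!/bin/bash
# Hypothetical helper: print Pearson's r between columns 2 and 4 of one file.
pearson_file() {
    awk '
        NF >= 4 {                      # ignore blank or short lines
            n++
            sx  += $2;      sy  += $4
            sxx += $2 * $2; syy += $4 * $4
            sxy += $2 * $4
        }
        END {
            if (n < 2) exit            # not enough rows for a correlation
            dx = sxx - sx * sx / n
            dy = syy - sy * sy / n
            if (dx > 0 && dy > 0)      # skip constant columns
                printf "%.6f\n", (sxy - sx * sy / n) / sqrt(dx * dy)
        }
    ' "$1"
}

# Hypothetical helper: average the per-file coefficients of a directory.
mean_pearson() {
    for f in "$1"/*; do
        pearson_file "$f"
    done | awk '{ s += $1; k++ } END { if (k) printf "mean=%.6f\n", s / k }'
}
```

After sourcing or pasting the functions, a call would look like mean_pearson /path/to/files, with the path replaced by your own directory.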