linux - Pearson Correlation between two columns


Good morning. Here is my problem: I have several files like the one below:

104 0.1697 12.3513214 15.9136214
112 -0.3146 12.0517303 14.8027303
122 0.2718 10.881109 13.259109
123 -0.4185 11.2880142 14.0237142
128 0.0205 13.0585763 15.4365763
132 0.1562 13.3956582 16.9579582
136 -0.4602 12.2567041 14.6347041
157 0.8142 13.6455927 17.2078927
158 -0.9244 8.0012967 11.5635967

There are approximately 10000 files, and each file has several rows. I need to compute the Pearson correlation between columns 2 and 4 in each file. Afterwards, I need the average of these correlations, using Linux commands. Can anyone help me, please? Thanks.

Try this script. You need bash and bc (to operate on floating point numbers).

  • Give it execute permission: chmod +x /path/to/pearson.sh
  • Change the files variable to the directory where your files are stored
  • Call the script with no parameters: bash /path/to/pearson.sh

It should produce the mean of the Pearson correlation coefficients calculated over the data files.
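In symbols, and as far as the script can be read, each file with n rows contributes one coefficient r built from running sums over column 2 (x) and column 4 (y), and the final value is the plain average over the N files. The notation x_i, y_i and r_k is introduced here only for illustration:

```latex
r = \frac{\sum_i x_i y_i - \frac{1}{n}\sum_i x_i \sum_i y_i}
         {\sqrt{\left(\sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2\right)
                \left(\sum_i y_i^2 - \frac{1}{n}\left(\sum_i y_i\right)^2\right)}},
\qquad
\text{mean} = \frac{1}{N}\sum_{k=1}^{N} r_k
```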

#! /bin/bash

files=/path/to/files/

# Arithmetic helpers built on bc (-l keeps full precision in products and quotients)
function add {
  echo "$1 + $2" | bc
}
function sub {
  echo "$1 - $2" | bc
}
function mult {
  echo "$1 * $2" | bc -l
}
function div {
  echo "$1 / $2" | bc -l
}
function sqrt {
  echo "sqrt ($1)" | bc -l
}

# Per-file running sums: X, X2, Y, Y2, XY
X=0
X2=0
Y=0
Y2=0
XY=0

# R: sum of coefficients, r: coefficient of the current file, N: number of files
R=0
r=0
N=0

for f in $files/*; do
  N=$((N+1))
  n=0   # number of rows in the current file
  while read l; do
    n=$((n+1))
    read -r -a rows <<< $l
    x=${rows[1]}   # column 2
    y=${rows[3]}   # column 4
    X=$(add $X $x)
    X2=$(add $X2 $(mult $x $x))
    Y=$(add $Y $y)
    Y2=$(add $Y2 $(mult $y $y))
    XY=$(add $XY $(mult $x $y))
  done < "$f"
  r=$(add $r $XY)
  r=$(sub $r $(div $(mult $X $Y) $n))
  d1=$(sub $X2 $(div $(mult $X $X) $n))
  d2=$(sub $Y2 $(div $(mult $Y $Y) $n))
  r=$(div $r $(sqrt $(mult $d1 $d2)))
  R=$(add $R $r)
  # Reset the sums before the next file
  X=0
  X2=0
  Y=0
  Y2=0
  XY=0
  r=0
  n=0
done

echo mean=$(div $R $N)

PS: I assumed the files have the same format as the one you presented. The formula used to evaluate the coefficients was taken from the link you gave.
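Since the script starts a separate bc process for every arithmetic operation, it may be slow on roughly 10000 files. Below is a minimal alternative sketch that does the sums in awk instead. It assumes awk is installed and that the files have the four-column format from the question. The function names pearson_file and mean_pearson are hypothetical and not part of the script above. Files whose coefficient is undefined (no usable rows, or a constant column) are skipped in this sketch.

```shell
#!/bin/bash

# Hypothetical helper: print the Pearson coefficient between
# columns 2 and 4 of the file given as the first argument.
pearson_file() {
  awk '
    NF >= 4 {                      # ignore rows with fewer than 4 columns
      n++
      sx += $2;  sy += $4
      sxx += $2 * $2;  syy += $4 * $4
      sxy += $2 * $4
    }
    END {
      if (n == 0) exit 1           # no usable rows
      num = sxy - sx * sy / n
      den = (sxx - sx * sx / n) * (syy - sy * sy / n)
      if (den <= 0) exit 1         # constant column: coefficient undefined
      printf "%.10f\n", num / sqrt(den)
    }' "$1"
}

# Hypothetical helper: average the coefficients of all files
# passed as arguments; files that yield no coefficient are skipped.
mean_pearson() {
  local f
  for f in "$@"; do
    pearson_file "$f"
  done | awk '{ s += $1; n++ } END { if (n) printf "mean=%.10f\n", s / n }'
}
```

After sourcing these definitions, a call such as mean_pearson /path/to/files/* would print a single line of the form mean=..., in the same spirit as the bc-based script.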

