Perl: reading from different files without keeping the old content
So I have a function that opens a file to read from, analyses its lines, and writes the result to a new file. I call the function several times with different files. I noticed that with every new function call, the lines of the previous files are read in too. How can I prevent that?
Content of es.txt: el anarquismo es una filosofía política y social que llama
Content of dt.txt: der regel mit veränderungen der chemischen bindungen in
After the program has run, the created file profilede looks like this (although it should contain only tokens from "dt.txt", it also contains tokens from "es.txt"):
una ilos ism lític qui lí ti polí socia
Here is the actual code:

    #!/usr/bin/perl
    use utf8;
    use warnings;
    use strict;
    use List::Util qw(min);
    use open ':encoding(utf8)';
    binmode(STDOUT, ":utf8");
    binmode(STDIN, ":utf8");

    generateprofile("es.txt", "es");   # function call to read file es.txt
    generateprofile("dt.txt", "de");   # second call to read file dt.txt

    sub generateprofile {
        my $file = $_[0];   # taking the arguments
        my $lang = $_[1];
        open(IN, "<:utf8", $file) || die "error";   # to read the file
        open(OUT, ">:utf8", "profile$lang.txt");    # create and write to a file, e.g. profilede
        my (%ngraml);   # a hash for later
        my $line;
        my (@words);
        my (%ngraml);
        my (@uni, @bi, @tri, @quad, @five);   # arrays that keep letter combinations of different lengths

        while ($line = <IN>) {
            chomp $line;
            # print $line;   # testing: during the second function call, this prints the old content of "es.txt" instead of reading "dt.txt"
            push(@words, $line);
        }
        close IN;   # doesn't it get closed?

        foreach my $word (@words) {
            bigramm($word);   # split the word into different letter combinations
        }
        freql();         # fill the hash with frequencies: how many times a letter combination occurs, e.g. "ab" = 2, "tion" = 5
        print_hashl();   # print the hash

        sub bigramm {
            my $wort = $_[0];
            my $i;
            my $k;
            my @letters = split(//, $wort);
            for ($i = 0; $i < length($wort) - 0; $i++) {   ####!!!!! -1?
                my $bi = substr($wort, $i, 1);
                push(@uni, $bi);
            }
            for ($i = 0; $i < length($wort) - 1; $i++) {
                my $bi = substr($wort, $i, 2);
                push(@bi, $bi);
            }
            for ($i = 0; $i < length($wort) - 2; $i++) {
                my $bi = substr($wort, $i, 3);
                push(@tri, $bi);
            }
            for ($i = 0; $i < length($wort) - 3; $i++) {
                my $bi = substr($wort, $i, 4);
                push(@quad, $bi);
            }
            for ($i = 0; $i < length($wort) - 4; $i++) {
                my $bi = substr($wort, $i, 5);
                push(@five, $bi);
            }
        }

        sub freql {
            for my $duo (@uni, @bi, @tri, @quad, @five) {
                if (defined $ngraml{$duo}) { $ngraml{$duo}++; }
                else { $ngraml{$duo} = 1; }
            }
        }

        sub print_hashl {
            foreach my $elem (sort { $ngraml{$b} <=> $ngraml{$a} } keys %ngraml) {
                print OUT "$elem\n";
            }
        }
    }
Also, there are warnings, which may or may not be the cause of the problem:

    "my" variable %ngraml masks earlier declaration in same scope at stack.pl line 23.
    Variable "@uni" will not stay shared at stack.pl line 46.
    Variable "@bi" will not stay shared at stack.pl line 49.
    Variable "@tri" will not stay shared at stack.pl line 52.
    Variable "@quad" will not stay shared at stack.pl line 55.
    Variable "@five" will not stay shared at stack.pl line 58.
    Variable "@uni" will not stay shared at stack.pl line 63.
    Variable "@bi" will not stay shared at stack.pl line 63.
    Variable "@tri" will not stay shared at stack.pl line 63.
    Variable "@quad" will not stay shared at stack.pl line 63.
    Variable "@five" will not stay shared at stack.pl line 63.
    Variable "%ngraml" will not stay shared at stack.pl line 64.
    Variable "%ngraml" will not stay shared at stack.pl line 70.
    while ($line = <IN>) {
        chomp $line;
        print $line;   # during the second function call, this prints the old content of "es.txt" instead of reading "dt.txt"
        push(@words, $line);
        close IN;   # doesn't it get closed?
After you have read one line of the input file, you close the filehandle. Then, when you go on to the second line, you won't be able to read from the file because it was closed before.
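
One way to rule out both the filehandle question and the "will not stay shared" warnings is sketched below. Perl's documentation for that warning says that a named sub nested inside another sub only sees the outer sub's variables as they were during the first call, which would fit the symptom of old tokens showing up in the second profile. The sketch assumes the helpers can be moved out of generateprofile and given their data as arguments. The names read_lines and count_ngrams are made up for illustration and are not from the original script, and the alphabetical tie-break in the sort is an addition to make the output order repeatable.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of one file.
# The lexical handle $in is closed once, after the loop has finished.
sub read_lines {
    my ($file) = @_;
    open(my $in, '<:encoding(UTF-8)', $file) or die "Cannot open $file: $!";
    my @lines;
    while (my $line = <$in>) {
        chomp $line;
        push @lines, $line;
    }
    close $in;
    return @lines;
}

# Hypothetical helper: count every substring of length 1 to 5.
# %count is created on each call and returned by reference,
# so nothing is carried over from an earlier call.
sub count_ngrams {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        for my $n (1 .. 5) {
            # the range is empty when the line is shorter than $n
            for my $i (0 .. length($line) - $n) {
                $count{ substr($line, $i, $n) }++;
            }
        }
    }
    return \%count;
}

# Same interface as in the question: input file name and language code.
sub generateprofile {
    my ($file, $lang) = @_;
    my $count = count_ngrams(read_lines($file));
    open(my $out, '>:encoding(UTF-8)', "profile$lang.txt")
        or die "Cannot write profile$lang.txt: $!";
    # most frequent first; assumed tie-break: alphabetical order
    for my $ngram (sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count) {
        print {$out} "$ngram\n";
    }
    close $out;
    return;
}
```

With this layout, calling generateprofile("es.txt", "es") and then generateprofile("dt.txt", "de") should give each profile its own counts, because every call builds a fresh hash and opens its own lexical handles.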