java - IndexOutOfBoundsException when trying to add more instances to the training set using Weka


I am trying to add more instances to my training set and then perform 10-fold cross-validation.

My instances are in string format, so I use the StringToWordVector filter to transform them into numbers. Things work if I do not add the pages I want. When I add the command trainSet.addAll(data2); and then pass trainSet to the filter, I get a strange IndexOutOfBoundsException in the first iteration at Instances fTrainSet = Filter.useFilter(trainSet, filter);

    Instances data = getDataFromFile("pathtofile.arff");   // main dataset, 1821 instances
    Instances data2 = getDataFromFile("anotherpath.arff"); // 709 instances I want to add

    int folds = 10;
    for (int i = 0; i < folds; i++) {
        Instances trainSet = data.trainCV(folds, i); // training set
        System.out.println(trainSet.numInstances()); // prints 1638
        Instances testSet = data.testCV(folds, i);   // testing set

        // add more instances
        trainSet.addAll(data2);
        System.out.println(trainSet.numInstances()); // prints 2347

        // filter
        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(trainSet);
        filter.setWordsToKeep(10000);
        filter.setTFTransform(true);
        filter.setLowerCaseTokens(true);
        filter.setOutputWordCounts(true);
        Stemmer stemmer = new IteratedLovinsStemmer();
        filter.setStemmer(stemmer);
        WordsFromFile stopwords = new WordsFromFile();
        stopwords.setStopwords(new File(".data/stopwords2.txt"));
        filter.setStopwordsHandler(stopwords);

        Instances fTrainSet = Filter.useFilter(trainSet, filter); // error!!!
        Instances fTestSet = Filter.useFilter(testSet, filter);
        ....
        // classification and evaluation....

I get the following error when trying to use the filter:

    Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 2161, Size: 1749
        at java.util.ArrayList.rangeCheck(Unknown Source)
        at java.util.ArrayList.get(Unknown Source)
        at weka.core.Attribute.addStringValue(Attribute.java:924)
        at weka.core.StringLocator.copyStringValues(StringLocator.java:150)
        at weka.core.StringLocator.copyStringValues(StringLocator.java:91)
        at weka.filters.Filter.copyValues(Filter.java:399)
        at weka.filters.Filter.bufferInput(Filter.java:342)
        at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:655)
        at weka.filters.Filter.useFilter(Filter.java:692)
        at CrossValidationExample.main(CrossValidationExample.java:108)

What is wrong?

After some searching, I realized that the problem lies in the addAll function. One reason I can think of is that addAll adds only references to the instances, and the issue shows up when I try to pass them through the filter. Instead, I used the merge function proposed here: https://stackoverflow.com/a/12359788/3923800. I replaced trainSet.addAll(data2); with a call to merge(trainSet, data2), used the returned Instances object as the new training set, and it works fine.
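To illustrate why a plain append can fail while a merge works, here is a minimal toy model in plain Java. It is not Weka code and not the merge function from the linked answer. It assumes, as I understand Weka's design, that the values of a string attribute are kept in a table owned by the dataset and that each instance stores only an index into that table. The names ToyDataset, addText, addAllNaive and merge are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Weka): rows store indices into a per-dataset string table. */
class ToyDataset {
    final List<String> stringValues = new ArrayList<>(); // string table owned by this dataset
    final List<Integer> rows = new ArrayList<>();        // each row is an index into stringValues

    /** Registers the text in this dataset's own table (reusing an entry if present) and adds a row. */
    void addText(String text) {
        int index = stringValues.indexOf(text);
        if (index < 0) {
            stringValues.add(text);
            index = stringValues.size() - 1;
        }
        rows.add(index);
    }

    /** Resolves a row to its text; a stale index throws IndexOutOfBoundsException. */
    String textOf(int row) {
        return stringValues.get(rows.get(row));
    }

    /** Naive append: copies raw indices that still refer to the other dataset's table. */
    void addAllNaive(ToyDataset other) {
        rows.addAll(other.rows);
    }

    /** Merge: resolves every row to its text and re-registers it in a fresh dataset. */
    static ToyDataset merge(ToyDataset a, ToyDataset b) {
        ToyDataset merged = new ToyDataset();
        for (int i = 0; i < a.rows.size(); i++) {
            merged.addText(a.textOf(i));
        }
        for (int i = 0; i < b.rows.size(); i++) {
            merged.addText(b.textOf(i));
        }
        return merged;
    }

    public static void main(String[] args) {
        ToyDataset train = new ToyDataset();
        train.addText("a");
        train.addText("b");

        ToyDataset extra = new ToyDataset();
        extra.addText("c");
        extra.addText("d");
        extra.addText("e");

        // Merging first keeps every index valid in the new table.
        ToyDataset merged = merge(train, extra);
        System.out.println(merged.textOf(4));

        // The naive append leaves index 2 pointing past train's two-entry table.
        train.addAllNaive(extra);
        try {
            train.textOf(4);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("stale index: " + e.getMessage());
        }
    }
}
```

In this model the naive append fails in two ways: an index beyond the table size throws, as in the stack trace above, while a smaller index silently resolves to the wrong string. Rebuilding the set through a merge avoids both, because every string is registered again in the table of the dataset that will actually be filtered.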

