java - IndexOutOfBoundsException when trying to add more instances to training set using Weka
I am trying to add more instances to my training set and then perform 10-fold cross-validation.
My instances are in string format, so I use the StringToWordVector filter to transform them into numbers. Things work if I do not add the pages I want. But when I add the command trainset.addAll(data2); and pass trainset to the filter, I get a strange IndexOutOfBoundsException in the first iteration, at the line Instances ftrainset = Filter.useFilter(trainset, filter);
    Instances data = getDataFromFile("pathtofile.arff"); // main dataset, 1821 instances
    Instances data2 = getDataFromFile("anotherpath.arff"); // 709 instances I want to add
    int folds = 10;
    for (int i = 0; i < folds; i++) {
        Instances trainset = data.trainCV(folds, i); // training set
        System.out.println(trainset.numInstances()); // prints 1638
        Instances testset = data.testCV(folds, i); // testing set

        // add more instances
        trainset.addAll(data2);
        System.out.println(trainset.numInstances()); // prints 2347

        // filter
        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(trainset);
        filter.setWordsToKeep(10000);
        filter.setTFTransform(true);
        filter.setLowerCaseTokens(true);
        filter.setOutputWordCounts(true);
        Stemmer stemmer = new IteratedLovinsStemmer();
        filter.setStemmer(stemmer);
        WordsFromFile stopwords = new WordsFromFile();
        stopwords.setStopwords(new File(".data/stopwords2.txt"));
        filter.setStopwordsHandler(stopwords);

        Instances ftrainset = Filter.useFilter(trainset, filter); // error!!!
        Instances ftestset = Filter.useFilter(testset, filter);
        ....
        // classification and evaluation ....
I get the following error when trying to use the filter:
    Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 2161, Size: 1749
        at java.util.ArrayList.rangeCheck(Unknown Source)
        at java.util.ArrayList.get(Unknown Source)
        at weka.core.Attribute.addStringValue(Attribute.java:924)
        at weka.core.StringLocator.copyStringValues(StringLocator.java:150)
        at weka.core.StringLocator.copyStringValues(StringLocator.java:91)
        at weka.filters.Filter.copyValues(Filter.java:399)
        at weka.filters.Filter.bufferInput(Filter.java:342)
        at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:655)
        at weka.filters.Filter.useFilter(Filter.java:692)
        at CrossValidationExample.main(CrossValidationExample.java:108)
What is wrong?
After some searching, I realized that something goes wrong with the addAll function. One reason I can think of is that addAll adds only references to the instances, which causes a problem when I try to use them with the filter. Instead, I used the merge function proposed here: https://stackoverflow.com/a/12359788/3923800. So I replaced
    trainset.addAll(data2);
with
    Instances newtrainset = merge(trainset, data2);
and it works fine.
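To make the suspected cause concrete, here is a minimal plain-Java sketch. It assumes, as the stack trace through Attribute.addStringValue and ArrayList.get suggests, that each dataset keeps its own table of string values and that an instance stores only an index into that table. ToyDataset, addAllByIndex and this merge are hypothetical stand-ins written for illustration; they are not Weka classes and not the merge function from the linked answer.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a dataset with one string attribute (NOT Weka's real classes). */
class ToyDataset {
    // Per-dataset table of distinct string values (assumed model of a string attribute).
    final List<String> stringValues = new ArrayList<>();
    // Each row stores only an index into stringValues.
    final List<Integer> rows = new ArrayList<>();

    /** Adds a row from its text, registering the string in this dataset's own table. */
    void addText(String text) {
        int index = stringValues.indexOf(text);
        if (index < 0) {
            stringValues.add(text);
            index = stringValues.size() - 1;
        }
        rows.add(index);
    }

    /** Resolves a row back to its text; throws if the index is outside this table. */
    String textOf(int row) {
        int index = rows.get(row);
        return stringValues.get(index);
    }

    /** Naive append: copies raw indices, which still refer to the other dataset's table. */
    void addAllByIndex(ToyDataset other) {
        rows.addAll(other.rows);
    }

    /** Merge by value: resolves each text in its source and re-registers it in the result. */
    static ToyDataset merge(ToyDataset a, ToyDataset b) {
        ToyDataset out = new ToyDataset();
        for (int i = 0; i < a.rows.size(); i++) {
            out.addText(a.textOf(i));
        }
        for (int i = 0; i < b.rows.size(); i++) {
            out.addText(b.textOf(i));
        }
        return out;
    }
}

class StringIndexDemo {
    public static void main(String[] args) {
        ToyDataset train = new ToyDataset();
        train.addText("first page");
        ToyDataset extra = new ToyDataset();
        extra.addText("second page");
        extra.addText("third page");

        // Merging by value keeps every row resolvable in the new dataset.
        ToyDataset merged = ToyDataset.merge(train, extra);
        System.out.println(merged.textOf(2));

        // The naive append leaves row 2 pointing at index 1 of a one-entry table.
        train.addAllByIndex(extra);
        try {
            train.textOf(2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("naive append failed: " + e.getMessage());
        }
    }
}
```

In this toy model the naive append is worse than a crash alone: row 1 silently resolves to "first page" even though it came from the second dataset, and only row 2 throws. Copying by value avoids both problems, which is consistent with the merge approach working where addAll did not.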