Classification - Azure Machine Learning: even sampling

- July 15, 2015

I'm trying to do basic multi-class classification in Azure ML. I have basic data in the following format:

value_x  value_y  label
x1       y1       label1
x2       y2       label1
x3       y3       label2
.....

My problem is that some of the labels in my data (out of a total of five) are overrepresented: 40% of the data is label1, 20% is label2, and the rest are around 10% each.

I'd like to sample from this data to train the model so that each label is represented in equal amounts.
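Outside of Azure ML, the goal described above is plain random undersampling: shrink every label down to the size of the smallest label group. The following is a minimal sketch in plain Python, not an Azure ML API; the function name `balanced_undersample` and the generated rows are hypothetical, built only to match the 40/20/10/10/10 distribution from the question.

```python
import random
from collections import Counter, defaultdict

def balanced_undersample(rows, label_key="label", seed=0):
    """Return a sample with the same number of rows for every label.

    Each label group is randomly undersampled down to the size of the
    smallest group (plain random undersampling, no replacement).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    n_min = min(len(g) for g in groups.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for label in sorted(groups):
        sample.extend(rng.sample(groups[label], n_min))
    return sample

# Hypothetical data: 40% label1, 20% label2, 10% each for label3..label5.
counts = {"label1": 40, "label2": 20, "label3": 10, "label4": 10, "label5": 10}
rows = [{"value_x": i, "value_y": -i, "label": lab}
        for lab, n in counts.items() for i in range(n)]

balanced = balanced_undersample(rows)
# Each of the five labels now has 10 rows (the size of the smallest group).
print(Counter(r["label"] for r in balanced))
```

Undersampling throws data away; the trade-off is a smaller but balanced training set.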

I tried the stratification option in the Sampling module on the labels column, but it gives me a sample with the same label distribution as the initial dataset.

Any idea how to do this with a module?

I was able to do this using a combination of the Split Data, Partition and Sample, and Add Rows modules. There may be an easier way to do it, but I did confirm that it works. :) I published my work at http://gallery.azureml.net/details/1245147fd7004e91bc7a3683cda19cc7, so you can grab it directly from there and run it to confirm it does what you expect.

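Since you said you wanted to sample the data, I reduced each label to 10% of the original dataset so that all labels are represented equally. Because we know the label distribution in the dataset, I left labels 3, 4, and 5 at their 10%, and reduced label1 to 1/4 and label2 to 1/2 of their rows, so that they also come out at 10% of the original data.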
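The fractions above follow from dividing the smallest label's share by each label's share: 0.10 / 0.40 = 1/4 for label1 and 0.10 / 0.20 = 1/2 for label2. A small sketch that computes these keep fractions with exact arithmetic (the function name `keep_fractions` is hypothetical):

```python
from fractions import Fraction

def keep_fractions(proportions):
    """Fraction of each label's rows to keep so that every label ends
    up with as many rows as the smallest label."""
    smallest = min(proportions.values())
    return {label: smallest / p for label, p in proportions.items()}

# Label shares from the question, as exact fractions.
proportions = {
    "label1": Fraction(40, 100),
    "label2": Fraction(20, 100),
    "label3": Fraction(10, 100),
    "label4": Fraction(10, 100),
    "label5": Fraction(10, 100),
}

fractions_to_keep = keep_fractions(proportions)
for label, keep in fractions_to_keep.items():
    # prints e.g. "label1 1/4 1/10": keep 1/4 of label1, which is 1/10 of the original data
    print(label, keep, proportions[label] * keep)
```

After the reduction the sampled dataset is half the original size (5 × 10%), so within the sample each label makes up 20% of the rows.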

To explain what I did in the workspace linked above:

I used "Split Data" modules to filter out the label1 and label2 data. In the Split Data module, change the splitting mode to "Regular Expression" and set the regular expression to \"label" ^label1 (to get the label1 data, for example).
Then I used "Partition and Sample" modules to reduce the size of the label1 and label2 data appropriately.
Finally, I used "Add Rows" modules to join all of the data back together.
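The three steps above can be mirrored in plain Python to check the expected label counts before building the experiment. This is only a rough analogue, not Azure ML code: `split_by_regex` and `sample_fraction` are hypothetical helpers standing in for Split Data (Regular Expression mode) and Partition and Sample, and list concatenation stands in for Add Rows. The sketch draws exact counts; the module's sampling may give approximate counts.

```python
import random
import re
from collections import Counter

def split_by_regex(rows, column, pattern):
    """Rows whose column matches the pattern go to the first output,
    all other rows to the second (like Split Data's two outputs)."""
    regex = re.compile(pattern)
    matched = [r for r in rows if regex.search(r[column])]
    rest = [r for r in rows if not regex.search(r[column])]
    return matched, rest

def sample_fraction(rows, fraction, seed=0):
    """Keep a random subset containing the given fraction of the rows."""
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * fraction))

counts = {"label1": 40, "label2": 20, "label3": 10, "label4": 10, "label5": 10}
rows = [{"label": lab} for lab, n in counts.items() for _ in range(n)]

# Step 1: split off label1, then label2 (same patterns as ^label1 / ^label2).
label1_rows, rest = split_by_regex(rows, "label", r"^label1")
label2_rows, rest = split_by_regex(rest, "label", r"^label2")

# Step 2: reduce label1 to 1/4 and label2 to 1/2 of their rows.
# Step 3: concatenate the reduced groups with the untouched labels.
combined = sample_fraction(label1_rows, 1 / 4) + sample_fraction(label2_rows, 1 / 2) + rest
print(Counter(r["label"] for r in combined))  # every label has 10 rows
```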

Also, though I didn't include it in my work, you can look at the SMOTE module. It increases the number of low-occurring samples using synthetic minority oversampling.
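The core idea of SMOTE is to create new minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours, instead of dropping majority rows. The following is a minimal plain-Python sketch of that idea for numeric feature tuples; it is not the SMOTE module's implementation, whose parameters and details may differ, and the function name `smote` and the example points are hypothetical.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic new points from a list of minority-class
    feature tuples (needs at least two points).

    For each new point: pick a random minority sample x, pick one of its
    k nearest minority neighbours, and place the new point at a random
    position on the line segment from x to that neighbour.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical (value_x, value_y) points for an underrepresented label.
minority = [(1.0, 1.0), (2.0, 1.0), (1.5, 2.0), (3.0, 3.0)]
new_points = smote(minority, n_synthetic=6, k=2)
print(new_points)
```

Because every new point lies between two existing minority points, oversampling this way keeps all the original data while balancing the label counts.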
