scala - How to retrieve record with min value in Spark?


Let's say we have an RDD of (String, Date, Int):

[("sam", 02-25-2016, 2), ("sam",02-14-2016, 4), ("pam",03-16-2016, 1), ("pam",02-16-2016, 5)] 

and we want to convert it to:

[("sam", 02-14-2016, 4), ("pam",02-16-2016, 5)] 

where each key keeps the record with the minimum date. What is the best way to do this?
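
For concreteness, here is a minimal sketch of how such an RDD might be built, assuming the dates are java.time.LocalDate values and a SparkContext named sc (both assumptions for illustration; the actual date type could be anything comparable):

import java.time.LocalDate
import java.time.format.DateTimeFormatter

// parse the MM-dd-yyyy strings into LocalDate values
val fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy")
val rdd = sc.parallelize(Seq(
  ("sam", LocalDate.parse("02-25-2016", fmt), 2),
  ("sam", LocalDate.parse("02-14-2016", fmt), 4),
  ("pam", LocalDate.parse("03-16-2016", fmt), 1),
  ("pam", LocalDate.parse("02-16-2016", fmt), 5)
))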

I assume that since you tagged the question as being related to Spark, you mean an RDD as opposed to a List.

Making each record a 2-tuple with the key as the first element allows you to use the reduceByKey method, like this:

rdd
  .map(t => (t._1, (t._2, t._3)))
  .reduceByKey((a, b) => if (a._1 < b._1) a else b)
  .map(t => (t._1, t._2._1, t._2._2))

Alternatively, using pattern matching for clarity (I find the ._* accessors on tuples a bit confusing to read):

rdd
  .map { case (name, date, value) => (name, (date, value)) }
  .reduceByKey((a, b) => (a, b) match {
    case ((aDate, aVal), (bDate, bVal)) =>
      if (aDate < bDate) a else b
  })
  .map { case (name, (date, value)) => (name, date, value) }

Replace a._1 < b._1 with whatever comparison is appropriate for the date type you are working with.
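
For example, if the dates are java.time.LocalDate values as in the sketch above (an assumption; any type with a usable ordering works), the comparison could use isBefore:

rdd
  .map(t => (t._1, (t._2, t._3)))
  // keep the pair whose date comes first
  .reduceByKey((a, b) => if (a._1.isBefore(b._1)) a else b)
  .map(t => (t._1, t._2._1, t._2._2))
  .collect()
// e.g. Array((sam,2016-02-14,4), (pam,2016-02-16,5)) -- order may vary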

See http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs for documentation on reduceByKey and the other things you can do with key/value pairs in Spark.

If you are looking for a plain old Scala List, the following will work:

list
  .groupBy(_._1)
  .mapValues(l => l.reduce((a, b) => if (a._2 < b._2) a else b))
  .values
  .toList

And again, a pattern-matched version for clarity:

list
  .groupBy { case (name, date, value) => name }
  .mapValues(l => l.reduce((a, b) => (a, b) match {
    case ((aName, aDate, aValue), (bName, bDate, bValue)) =>
      if (aDate < bDate) a else b
  }))
  .values
  .toList
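
As a self-contained sketch of the List version, again assuming java.time.LocalDate for the dates (an assumption for illustration):

import java.time.LocalDate
import java.time.format.DateTimeFormatter

val fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy")
val list = List(
  ("sam", LocalDate.parse("02-25-2016", fmt), 2),
  ("sam", LocalDate.parse("02-14-2016", fmt), 4),
  ("pam", LocalDate.parse("03-16-2016", fmt), 1),
  ("pam", LocalDate.parse("02-16-2016", fmt), 5)
)

list
  .groupBy(_._1)
  // within each name, keep the earliest-dated record
  .mapValues(l => l.reduce((a, b) => if (a._2.isBefore(b._2)) a else b))
  .values
  .toList
// List((sam,2016-02-14,4), (pam,2016-02-16,5)) -- grouping order is not guaranteed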
