Lesson 19: Spark Advanced Sorting Demystified

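Sorting is used heavily in Spark applications, often on more than one dimension: secondary sort (two keys), tertiary sort (three keys), and so on. These patterns come up regularly in machine learning algorithms, so they are well worth mastering.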

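A secondary sort orders records by two columns: first by the first column, and by the second column only when the first columns are equal. Take the following test data: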

2 3
4 1
3 2
4 3
8 7
2 1

After the secondary sort (ascending), the result is:

2 1
2 3
3 2
4 1
4 3
8 7
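To see the ordering rule on its own, here is a minimal plain-Java sketch that does not use Spark; the class `SecondarySortDemo` and method `secondarySort` are hypothetical names for illustration. It parses each line into a pair of ints and sorts with a comparator that compares the first column and falls back to the second column only on a tie. Run on the six test lines, it should print the ascending result listed above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class: secondary sort of "a b" lines without Spark.
public class SecondarySortDemo {

    // Sort lines by the first number, then by the second number (both ascending).
    public static List<String> secondarySort(List<String> lines) {
        List<int[]> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            pairs.add(new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])});
        }
        // Primary key: column 1; tie-breaker: column 2.
        pairs.sort(Comparator.<int[]>comparingInt(p -> p[0]).thenComparingInt(p -> p[1]));
        List<String> result = new ArrayList<>();
        for (int[] p : pairs) {
            result.add(p[0] + " " + p[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("2 3", "4 1", "3 2", "4 3", "8 7", "2 1");
        secondarySort(input).forEach(System.out::println);
    }
}
```

A tertiary sort follows the same pattern with one more `thenComparingInt` for the third column.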

Before writing the secondary sort code, here is a simple example that sorts by a single key:

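import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("SortByKey").setMaster("local")
val sc = new SparkContext(conf)
val lines = sc.textFile("C:\\User\\Test.txt")
// Split lines into words and count each word: RDD[(word, count)]
val words = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// Swap to (count, word), sort by count in descending order, then swap back to (word, count)
val wordcount = words.map(word => (word._2, word._1)).sortByKey(false).map(word => (word._2, word._1))
wordcount.collect().foreach(println)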

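The above is a simple wordcount program that sorts with sortByKey: it swaps each (word, count) pair into (count, word) so that the count becomes the key, calls sortByKey(false) to sort by count in descending order, and then swaps the pairs back.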
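The swap, sort, swap-back pattern can be illustrated without a cluster. In this plain-Java sketch (the class `SwapSortDemo` and method `sortByCountDesc` are hypothetical), an in-memory list stands in for the RDD, `Map.merge` plays the role of `reduceByKey`, and sorting the swapped entries with a reversed key comparator plays the role of `sortByKey(false)`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class: mirrors words.map(swap).sortByKey(false).map(swap) in memory.
public class SwapSortDemo {

    public static List<Map.Entry<String, Integer>> sortByCountDesc(List<String> words) {
        // Step 1: count words, like reduceByKey(_ + _) -> (word, count)
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        // Step 2: swap to (count, word) so the count becomes the key
        List<Map.Entry<Integer, String>> swapped = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            swapped.add(new SimpleEntry<>(e.getValue(), e.getKey()));
        }
        // Step 3: sort by key in descending order, like sortByKey(false)
        swapped.sort(Map.Entry.<Integer, String>comparingByKey().reversed());
        // Step 4: swap back to (word, count)
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (Map.Entry<Integer, String> e : swapped) {
            result.add(new SimpleEntry<>(e.getValue(), e.getKey()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("spark", "sort", "spark", "key", "spark", "sort");
        sortByCountDesc(words).forEach(e -> System.out.println("(" + e.getKey() + "," + e.getValue() + ")"));
    }
}
```

Here words with equal counts stay in alphabetical order only because the `TreeMap` iterates alphabetically and `List.sort` is stable; the relative order of equal keys after a Spark `sortByKey` should not be relied on.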

First, we implement the secondary sort of the test data above in Java.

The most important part of sorting is preparing the key, so we start by writing the secondary sort key in Java. Reference code:

import java.io.Serializable;

import scala.math.Ordered;

// Composite key for secondary sort: order by first, then by second.
// Ordered (which extends java.lang.Comparable) defines the ordering;
// Serializable lets Spark ship keys between nodes.
public class SecondarySortKey implements Ordered<SecondarySortKey>, Serializable {

    private int first;
    private int second;

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + first;
        result = prime * result + second;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        SecondarySortKey other = (SecondarySortKey) obj;
        if (first != other.first)
            return false;
        if (second != other.second)
            return false;
        return true;
    }

    public int getFirst() {
        return first;
    }

    public void setFirst(int first) {
        this.first = first;
    }

    public int getSecond() {
        return second;
    }

    public void setSecond(int second) {
        this.second = second;
    }

    public SecondarySortKey(int first, int second) {
        this.first = first;
        this.second = second;
    }

    // Scala's ">" operator compiles to the method name $greater.
    // this > other if its first column is larger, or the first columns tie and its second column is larger.
    public boolean $greater(SecondarySortKey other) {
        if (this.first > other.getFirst()) {
            return true;
        } else if (this.first == other.getFirst() && this.second > other.getSecond()) {
            return true;