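Today, while iterating over the Iterable parameter of reduce, I ran into a problem: the Iterator's next() method returns the same object on every call. next() only overwrites the value held by the existing Writable object; it does not return a new Writable each time.
This can be verified with wordcount.
My code is as follows: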
protected void reduce(Text key, Iterable<IntWritable> values,
        Reducer<Text, IntWritable, Text, IntWritable>.Context context)
        throws IOException, InterruptedException {
    int sum = 0;
    // Keep every IntWritable handed out by the iterator in a list
    List<IntWritable> intWritables = new ArrayList<IntWritable>();
    for (IntWritable val : values) {
        intWritables.add(val);
        sum += val.get();
    }
    if (intWritables.size() > 1) {
        // With more than one element, check whether the first and second
        // elements are the very same object (reference comparison)
        System.out.println("objects is same -> "
                + (intWritables.get(0) == intWritables.get(1)));
    }
    // result is an IntWritable field of the reducer class (not shown here)
    result.set(sum);
    context.write(key, result);
}
Log output:
objects is same -> true
The Iterable implementation here is org.apache.hadoop.mapreduce.task.ReduceContextImpl.ValueIterable,
and the Iterator implementation is org.apache.hadoop.mapreduce.task.ReduceContextImpl.ValueIterator.
Its next() ends up calling the deserialize(Writable w) method of org.apache.hadoop.io.serializer.WritableSerialization (in the nested WritableDeserializer class):
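public Writable deserialize(Writable w) throws IOException {
    Writable writable;
    if (w == null) {
        // Only when no instance is passed in is a new one created
        writable
            = (Writable) ReflectionUtils.newInstance(writableClass, getConf());
    } else {
        // Otherwise the caller's instance is reused
        writable = w;
    }
    // Overwrite the instance's fields from the input stream
    writable.readFields(dataIn);
    return writable;
}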
This method only calls readFields on the argument w; it does not create a new object unless w is null.
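The effect can be reproduced without a Hadoop cluster. The following is only a minimal sketch under stated assumptions: MutableInt is a hypothetical stand-in for IntWritable, and reusingIterable is a hypothetical iterable that, like the reducer's value iterator described above, refills one shared instance on each next() call. None of these names come from Hadoop. It shows why storing the returned references goes wrong and how copying each value avoids it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {
    // Hypothetical stand-in for a mutable Writable such as IntWritable.
    static final class MutableInt {
        private int value;
        MutableInt(int value) { this.value = value; }
        int get() { return value; }
        void set(int value) { this.value = value; }
    }

    // Hypothetical iterable whose iterator overwrites and returns one shared
    // instance on every next() call, imitating the behavior observed above.
    static Iterable<MutableInt> reusingIterable(int... data) {
        return () -> new Iterator<MutableInt>() {
            private final MutableInt shared = new MutableInt(0);
            private int index = 0;

            @Override
            public boolean hasNext() { return index < data.length; }

            @Override
            public MutableInt next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(data[index++]);
                return shared;
            }
        };
    }

    // Problematic: every list slot ends up pointing at the same object.
    static List<MutableInt> collectReferences(Iterable<MutableInt> values) {
        List<MutableInt> out = new ArrayList<>();
        for (MutableInt v : values) {
            out.add(v);
        }
        return out;
    }

    // Safer: copy the current value into a fresh object before storing it.
    static List<MutableInt> collectCopies(Iterable<MutableInt> values) {
        List<MutableInt> out = new ArrayList<>();
        for (MutableInt v : values) {
            out.add(new MutableInt(v.get()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<MutableInt> refs = collectReferences(reusingIterable(1, 2, 3));
        List<MutableInt> copies = collectCopies(reusingIterable(1, 2, 3));
        System.out.println("references same object -> " + (refs.get(0) == refs.get(1)));
        System.out.println("copies same object -> " + (copies.get(0) == copies.get(1)));
        System.out.println("first stored value: " + refs.get(0).get()
                + " (references) vs " + copies.get(0).get() + " (copies)");
    }
}
```

In the reducer itself, the equivalent change would be to store new IntWritable(val.get()) instead of val. Summing inside the loop, as the wordcount code does, is unaffected because each value is read before the next call to next().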