implicit final class KeyValueOpsWrapper[K, V] extends AnyVal
Adds some important operations on a key/value-based RDD.
- K
the type of the keys
- V
the type of the values
Instance Constructors
-
new
KeyValueOpsWrapper(rdd: RDD[(K, V)])
- rdd
the RDD to process
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- Any
-
final
def
##(): Int
- Definition Classes
- Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
distinctByKey(partitioner: Partitioner)(implicit kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, V)]
Removes duplicate elements from the subject rdd efficiently. The operation is local on the workers; a shuffle is applied only if rdd is not partitioned or is partitioned differently from the specified partitioner.
- partitioner
the partitioner applied to the returned RDD
- returns
a new RDD with no duplicate elements
-
def
distinctByKey()(implicit kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, V)]
Removes duplicate elements from the subject rdd efficiently. The operation is local on the workers; a shuffle is applied only if rdd is not partitioned.
- returns
a new RDD with no duplicate elements
-
def
distinctKeys(partitioner: Partitioner)(implicit kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, Unit)]
Calculates an RDD with the keys of the subject rdd, with no values and with duplicates removed in an efficient way.
- partitioner
the partitioner applied to the returned RDD
- returns
the keys of rdd, without values and duplicates
-
def
distinctKeys()(implicit kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, Unit)]
Calculates an RDD with the keys of the subject rdd, with no values and with duplicates removed in an efficient way.
- returns
the keys of rdd, without values and duplicates
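The documented result of distinctKeys can be modelled locally with plain Scala collections. This is only a sketch of the semantics described above, not the library's implementation: DistinctKeysSketch is a hypothetical name, a Seq of pairs stands in for the RDD, and partitioning is ignored.

```scala
object DistinctKeysSketch {
  // Local stand-in for RDD[(K, V)]: an in-memory sequence of pairs.
  // Keeps each key once and replaces every value with Unit.
  def distinctKeys[K, V](pairs: Seq[(K, V)]): Seq[(K, Unit)] =
    pairs.map(_._1).distinct.map(k => (k, ()))
}
```

In this model, the pairs ("a", 1), ("b", 2), ("a", 3) reduce to the keys "a" and "b", each paired with Unit. Unlike this Seq-based model, an RDD gives no ordering guarantee.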
-
def
flatMapKeysAndRepartition[K2](f: (K) ⇒ Iterable[K2], partitioner: Partitioner)(implicit arg0: ClassTag[K2], kt: ClassTag[K], vt: ClassTag[V]): RDD[(K2, V)]
Flat maps the keys of a pair RDD, possibly duplicating the value.
- K2
The type of the resulting key.
- f
The function to transform the key.
- partitioner
The partitioner applied to the returned RDD.
- returns
An RDD with the same values, but new keys.
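The "possibly duplicating the value" part is easiest to see on a small local model. The sketch below uses plain Scala collections in place of an RDD and leaves the partitioner out; FlatMapKeysSketch and flatMapKeys are hypothetical names used only for illustration.

```scala
object FlatMapKeysSketch {
  // Each input key expands to zero or more new keys via f;
  // the original value is copied onto every new key.
  def flatMapKeys[K, K2, V](pairs: Seq[(K, V)])(f: K => Iterable[K2]): Seq[(K2, V)] =
    pairs.flatMap { case (k, v) => f(k).map(k2 => (k2, v)) }
}
```

For example, splitting the string key "ab" into its characters yields two elements, ('a', 1) and ('b', 1), from the single input ("ab", 1). A key that maps to an empty collection drops its element entirely.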
-
def
getClass(): Class[_ <: AnyVal]
- Definition Classes
- AnyVal → Any
-
def
intersectKeys[X](other: RDD[(K, X)], partitioner: Partitioner)(implicit arg0: ClassTag[X], kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, Unit)]
Calculates the intersection of the keys of two RDDs. Distinct keys are returned; values are stripped.
- X
any type; it is not used
- other
the RDD whose keys are intersected with those of the subject rdd
- partitioner
the partitioner applied to the returned RDD
- returns
the keys of rdd intersected with those of the other RDD, without values and duplicates
-
def
intersectKeys[X](other: RDD[(K, X)])(implicit arg0: ClassTag[X], kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, Unit)]
Calculates the intersection of the keys of two RDDs. Distinct keys are returned; values are stripped.
- X
any type; it is not used
- other
the RDD whose keys are intersected with those of the subject rdd
- returns
the keys of rdd intersected with those of the other RDD, without values and duplicates
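A local model of the key intersection, using plain Scala collections rather than RDDs, may help clarify what is returned. IntersectKeysSketch is a hypothetical name, and the sketch only mirrors the documented result (distinct common keys, values stripped), not how the library computes it.

```scala
object IntersectKeysSketch {
  // Returns each key that occurs in both inputs exactly once, paired with Unit.
  // The value types V and X play no role in the result.
  def intersectKeys[K, V, X](left: Seq[(K, V)], right: Seq[(K, X)]): Seq[(K, Unit)] = {
    val rightKeys = right.map(_._1).toSet
    left.map(_._1).distinct.filter(rightKeys.contains).map(k => (k, ()))
  }
}
```

The two inputs may carry different value types, which is why the signature has the otherwise unused type parameter X.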
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
mapKeysAndRepartition[K2](f: (K) ⇒ K2, partitioner: Partitioner)(implicit arg0: ClassTag[K2], kt: ClassTag[K], vt: ClassTag[V]): RDD[(K2, V)]
Maps the keys of a pair RDD, without changing the value.
- K2
The type of the resulting key.
- f
The function to transform the key.
- partitioner
The partitioner applied to the returned RDD.
- returns
An RDD with the same values, but new keys.
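One way to picture why a partitioner is part of this signature: once keys change, elements may belong to different partitions than before. The sketch below is a local model under stated assumptions. MapKeysSketch and partitionOf are hypothetical, partitionOf imitates a hash-style partitioner (non-negative hashCode modulo the partition count), and a Map from partition index to elements stands in for the partitioned RDD.

```scala
object MapKeysSketch {
  // Hypothetical stand-in for a partitioner: maps a key to a partition index
  // using a non-negative modulo of its hash code.
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Transforms every key with f, keeps the value untouched, then groups the
  // results by the partition that the NEW key falls into.
  def mapKeysAndRepartition[K, K2, V](pairs: Seq[(K, V)], numPartitions: Int)(
      f: K => K2): Map[Int, Seq[(K2, V)]] =
    pairs
      .map { case (k, v) => (f(k), v) }
      .groupBy { case (k2, _) => partitionOf(k2, numPartitions) }
}
```

With four partitions and f multiplying integer keys by 10, the keys 1, 2 and 3 become 10, 20 and 30 and land in partitions 2, 0 and 2 of this model.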
-
def
replaceAndDeleteByKey[X](replace: RDD[(K, V)], delete: RDD[(K, X)], partitioner: Partitioner)(implicit arg0: ClassTag[X], kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, V)]
Replaces and deletes elements of rdd by key in one single, efficient operation. Comparison is done by key: if a key is being replaced or deleted, every element with that key already present in rdd is replaced or deleted as well.
- X
any type; it is not used
- replace
the keys and values of new elements that should replace existing elements in rdd
- delete
the keys of elements that should be deleted from rdd
- partitioner
the partitioner applied to the returned RDD
- returns
an RDD with its elements replaced/deleted by key.
-
def
replaceAndDeleteByKey[X](replace: RDD[(K, V)], delete: RDD[(K, X)])(implicit arg0: ClassTag[X], kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, V)]
Replaces and deletes elements of rdd by key in one single, efficient operation. Comparison is done by key: if a key is being replaced or deleted, every element with that key already present in rdd is replaced or deleted as well.
- X
any type; it is not used
- replace
the keys and values of new elements that should replace existing elements in rdd
- delete
the keys of elements that should be deleted from rdd
- returns
an RDD with its elements replaced/deleted by key.
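The by-key comparison described above can be sketched with plain Scala collections. This is a local model, not the library's implementation: ReplaceAndDeleteSketch is a hypothetical name, Seq stands in for RDD, and partitioning is ignored. The documentation does not say what happens when a key appears in both replace and delete; this sketch assumes the replacement elements are kept in that case.

```scala
object ReplaceAndDeleteSketch {
  def replaceAndDeleteByKey[K, V, X](
      base: Seq[(K, V)],
      replace: Seq[(K, V)],
      delete: Seq[(K, X)]): Seq[(K, V)] = {
    // Every key mentioned in either input is removed from the base data,
    // regardless of how many elements carry that key.
    val dropped = replace.map(_._1).toSet ++ delete.map(_._1)
    // Assumption: replacement elements are always appended, even if their
    // key also appears in delete.
    base.filterNot { case (k, _) => dropped.contains(k) } ++ replace
  }
}
```

Given a base of ("a", 1), ("a", 2), ("b", 3), ("c", 4), a replacement ("a", 10) and a deletion of key "b", the model returns ("c", 4) and ("a", 10): both old "a" elements are gone, not just one of them.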
-
def
replaceByKey(replace: RDD[(K, V)], partitioner: Partitioner)(implicit kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, V)]
Replaces elements of rdd by key in one single, efficient operation. Comparison is done by key: if a key is being replaced, every element with that key already present in rdd is replaced as well.
- replace
the keys and values of new elements that should replace existing elements in rdd
- partitioner
the partitioner applied to the returned RDD
- returns
an RDD with its elements replaced by key.
-
def
replaceByKey(replace: RDD[(K, V)])(implicit kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, V)]
Replaces elements of rdd by key in one single, efficient operation. Comparison is done by key: if a key is being replaced, every element with that key already present in rdd is replaced as well.
- replace
the keys and values of new elements that should replace existing elements in rdd
- returns
an RDD with its elements replaced by key.
-
def
stripValues()(implicit kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, Unit)]
Strips values.
In many computations we do not need any value associated with the keys of an RDD; in that case we need to set the value part of the key-value RDD to scala.Unit.
- returns
an RDD having the same keys as the input one and scala.Unit as values
- Note
input partitioning is kept
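As a local illustration of the result type, the sketch below strips values from an in-memory sequence of pairs. StripValuesSketch is a hypothetical name, and the Seq-based model cannot show the partitioning guarantee mentioned in the note.

```scala
object StripValuesSketch {
  // Keeps every element (duplicates included) and replaces its value with Unit.
  def stripValues[K, V](pairs: Seq[(K, V)]): Seq[(K, Unit)] =
    pairs.map { case (k, _) => (k, ()) }
}
```

In this model duplicates are kept, because nothing in the description above mentions deduplication.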
-
def
toString(): String
- Definition Classes
- Any
-
def
unionKeys[X](other: RDD[(K, X)], partitioner: Partitioner)(implicit arg0: ClassTag[X], kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, Unit)]
Calculates an RDD with the union of the keys of the subject rdd and of another RDD. No duplicate keys are returned.
- X
any type; it is not used
- other
the other RDD to perform the union with
- partitioner
the partitioner applied to the returned RDD
- returns
the keys of rdd and of the other RDD, without values and duplicates
-
def
unionKeys[X](other: RDD[(K, X)])(implicit arg0: ClassTag[X], kt: ClassTag[K], vt: ClassTag[V]): RDD[(K, Unit)]
Calculates an RDD with the union of the keys of the subject rdd and of another RDD. No duplicate keys are returned.
- X
any type; it is not used
- other
the other RDD to perform the union with
- returns
the keys of rdd and of the other RDD, without values and duplicates
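The union of keys can be modelled in the same local style, with Seq standing in for RDD. UnionKeysSketch is a hypothetical name, and the sketch reflects only the documented result (all keys from both inputs, each once, values stripped).

```scala
object UnionKeysSketch {
  // Collects the keys of both inputs, removes duplicates, and pairs each key with Unit.
  def unionKeys[K, V, X](left: Seq[(K, V)], right: Seq[(K, X)]): Seq[(K, Unit)] =
    (left.map(_._1) ++ right.map(_._1)).distinct.map(k => (k, ()))
}
```

A key that occurs in both inputs, or several times in one of them, appears exactly once in the result.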