Packages

package partitioner

Type Members

  1. case class AdapterPartitioner[K](p: org.apache.spark.Partitioner)(implicit evidence$1: ClassTag[K]) extends Partitioner[K] with Product with Serializable

    Service class that adapts an existing org.apache.spark.Partitioner, which by definition can return the partition identifier for objects of type scala.Any, to the more restrictive interface of Partitioner of K, which works with keys of type K only.

    K

    the type of the keys to be partitioned
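
    Example (an illustrative sketch, not a prescribed usage pattern; the wrapped partitioner here is Spark's own built-in hash partitioner):

      // Spark's built-in hash partitioner is untyped: getPartition takes Any.
      val sparkHash = new org.apache.spark.HashPartitioner(8)

      // Adapted to the typed interface; only String keys are accepted now.
      // The ClassTag[String] is resolved implicitly.
      val typed: Partitioner[String] = AdapterPartitioner[String](sparkHash)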

  2. case class AdaptiveLevelingPartitioner(pattern: AdaptivePattern, fallbackPartitioner: Option[PartitionNamePartitioner] = None) extends PartitionNamePartitioner with Product with Serializable

    A Partitioner for com.here.platform.data.processing.catalog.Partition.Keys that uses a precalculated com.here.platform.data.processing.leveling.AdaptivePattern.

    Keys are distributed to Spark partitions strictly following the leveling points that the pattern specifies. Keys not aggregated by the pattern are distributed among a disjoint set of Spark partitions using the fallback partitioner, if specified; otherwise they are uniformly distributed over the existing partitions.

    The number of partitions used for aggregated keys is fixed and matches the number of leveling points of the pattern.

    pattern

    The adaptive leveling pattern that controls the partitioning.

    fallbackPartitioner

    The optional partitioner used for non-aggregated keys. If undefined, non-aggregated keys are uniformly distributed over the existing partitions.
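
    Example (an illustrative sketch; the pattern is assumed to be precalculated elsewhere and its construction is not shown, and the partition counts are arbitrary):

      // Placeholder for a pattern precalculated by the leveling utilities.
      val pattern: AdaptivePattern = ???

      // Aggregated keys follow the leveling points of the pattern; keys the
      // pattern leaves non-aggregated fall back to a 64-partition name-hash
      // partitioner. With None they would be spread uniformly instead.
      val leveling = AdaptiveLevelingPartitioner(pattern, Some(NameHashPartitioner(64)))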

  3. case class KeyUnpackPartitioner[K](p: Partitioner[K])(implicit evidence$1: ClassTag[K]) extends Partitioner[K] with Product with Serializable

    A partitioner for keys of type K that detects the key K even when it is nested inside a pair.

    Given a partitioner for keys K, this implements a partitioner that applies the given partitioner both when K itself is the key of the RDD and when the key of the RDD is a pair (K, _). In the latter case the second member of the pair is discarded and the partition is calculated on the first member K directly, as shown in the sketch after this entry.

    K

    the type of the keys to be partitioned

    p

    the actual partitioner used to calculate the partition for a key K

    Note

    In some cases it is necessary to repartition a state component from (Ko, Ki) tuple keys to Ko; partitioning on only the first component of the key, when the key is a tuple, can then be beneficial. Please note also that the partitioner currently does not offer real type safety with respect to (Ko, Ki) tuple types, which is why it derives from org.apache.spark.Partitioner instead of Partitioner.
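
    Example (an illustrative sketch of the unpacking behaviour, reusing the adapter sketched above; the key values are made up):

      // A typed partitioner for plain String keys.
      val byString = AdapterPartitioner[String](new org.apache.spark.HashPartitioner(8))
      val unpacking = KeyUnpackPartitioner(byString)

      // A plain key and a pair key with the same first member are expected
      // to land in the same partition: the second member is discarded.
      val p0 = unpacking.getPartition("road-42")
      val p1 = unpacking.getPartition(("road-42", 7))
      assert(p0 == p1)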

  4. case class LocalityAwarePartitioner(numPartitions: Int, level: Int) extends PartitionNamePartitioner with Product with Serializable

    Implements a Partitioner for com.here.platform.data.processing.catalog.Partition.Key that is aware of the geographic location of the keys and can therefore put keys that are close to each other in the same Spark partition. This increases data locality and speeds up processing on the Spark worker nodes.

    The partitioner detects which com.here.platform.data.processing.catalog.Partition.Keys are actually HereTiles: those keys are grouped at a fixed quadtree level, which is generally higher than the level of the keys (that is, a lower level number). Keys at a level even higher than the one specified, as well as keys that are not HERE tiles, are partitioned using their hash code and spread uniformly across all the available partitions. These partitions do not have data locality.

    numPartitions

    The overall number of partitions.

    level

    The quadtree level at which to group HereTiles into the same Spark partition.
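
    Example (an illustrative sketch; the partition count and level are arbitrary example values):

      // 256 Spark partitions overall; HERE tile keys are grouped at quadtree
      // level 8, so nearby tiles share a partition. Keys that are not HERE
      // tiles, or are above level 8, are hash-partitioned without locality.
      val locality = LocalityAwarePartitioner(numPartitions = 256, level = 8)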

  5. case class NameHashPartitioner(numPartitions: Int) extends PartitionNamePartitioner with Product with Serializable

    A Partitioner for com.here.platform.data.processing.catalog.Partition.Key that calculates the partition based on the hash code of the partition name.

  6. abstract class PartitionNamePartitioner extends Partitioner[KeyOrName]

    A base class for all Partitioners for com.here.platform.data.processing.catalog.Partition.Key that guarantees that only the partition name of the key is taken into account when calculating the partition. This is important so the processing library can change keys (which also contain catalog and layer) without changing the name, and still be sure that the partitioning is not destroyed, improving performance.

    A PartitionNamePartitioner accepts either a Partition.Key or a Partition.Name as a key; in both cases getPartitionForName will be used to determine the partition.
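
    Example (an illustrative sketch of a hypothetical subclass; the exact getPartitionForName signature is an assumption based on the description above, and Partition.Name is assumed to have a usable toString):

      import com.here.platform.data.processing.catalog.Partition

      // Hypothetical: partitions by a prefix of the name only. Because just
      // the name is consulted, changing catalog or layer inside a key cannot
      // move the key to a different partition.
      class PrefixPartitioner(val numPartitions: Int) extends PartitionNamePartitioner {
        def getPartitionForName(name: Partition.Name): Int = {
          val h = name.toString.take(4).hashCode
          // Keep the result non-negative even for negative hash codes.
          ((h % numPartitions) + numPartitions) % numPartitions
        }
      }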

  7. abstract class Partitioner[-K] extends org.apache.spark.Partitioner

    Abstract class implementing a type-safe Spark partitioner. This class is usually employed in the interfaces of the processing library to better specify the kind of partitioner needed; compiler developers must provide instances of this class.

    A Partitioner is also a Spark org.apache.spark.Partitioner, but the function that has to be implemented takes a key of type K instead of the scala.Any required by the Spark org.apache.spark.Partitioner.

    The equals and hashCode methods must be implemented properly; otherwise Spark may introduce unnecessary shuffle operations, or assertions in the data processing algorithms may fail.

    K

    the type of the keys
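
    Example (an illustrative sketch of why the equals contract matters: Spark skips the shuffle in partitionBy when the new partitioner equals the RDD's current one; AdapterPartitioner is a case class, so structural equality comes for free):

      // org.apache.spark.HashPartitioner defines equals by the number of
      // partitions, so the two adapters below compare equal.
      val a = AdapterPartitioner[String](new org.apache.spark.HashPartitioner(8))
      val b = AdapterPartitioner[String](new org.apache.spark.HashPartitioner(8))
      assert(a == b)

      // Because a == b, calling rdd.partitionBy(b) on an RDD already
      // partitioned by `a` is a no-op and does not shuffle the data again.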

Value Members

  1. object HashPartitioner

    Builder for a hash partitioner.
