Packages

package root

Definition Classes: root

package com

Definition Classes: root

package here

Definition Classes: com

package platform

Definition Classes: here

package data

Definition Classes: platform

package processing

This package provides the Data Processing Library for building distributed data processing applications.

A Runner implements the interface with the environment in which an application runs, and also starts the application. The application, in turn, is driven by a Driver, which controls and performs the distributed processing.

Choose a Runner best suited for the environment where the application runs.

The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.

The main entry point in the processing library is the com.here.platform.data.processing.driver.DriverBuilder class where you can add different kinds of tasks to the driver. The driver runs the tasks, and commits the final results to the output catalog.

Tasks are implemented using one or more compilers.

The simplest compiler is the direct compiler, which maps each input tile to N output tiles. The application needs to implement a com.here.platform.data.processing.compiler.Direct1ToNCompiler.

Other more complex compilation patterns are based on some kind of dependency tracking between input partitions and output partitions.

The processing library supports the following patterns:

- com.here.platform.data.processing.compiler.NonIncrementalCompiler: non-incremental compilation only
- com.here.platform.data.processing.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
- com.here.platform.data.processing.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
- com.here.platform.data.processing.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends only on one input tile, and this mapping is independent of the tile content
- com.here.platform.data.processing.compiler.DirectMToNCompiler: incremental compilation where every output tile depends on multiple input tiles, and this mapping is independent of the tile content
- com.here.platform.data.processing.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
- com.here.platform.data.processing.compiler.RefTreeCompiler: fully managed two-phase incremental compilation that can resolve references between input partitions. Input/output dependency management is already implemented, so the developer doesn't need to provide this logic

The application's main object normally mixes in a runner trait (like PipelineRunner) to set up the Driver, and interfaces with the environment where the application runs. For more details, see the Main classes in the example compilers.

com.here.platform.data.processing.catalog, com.here.platform.data.processing.blobstore, and com.here.platform.data.processing.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

Definition Classes: data

package java

This package provides Java bindings for the Data Processing Library, to build distributed data processing applications in Java.

Choose a Runner best suited for the environment where the application runs.

The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.

The main entry point in the processing library is the com.here.platform.data.processing.java.driver.DriverBuilder class where you can add different kinds of tasks to the driver. The driver runs the tasks, and commits the final results to the output catalog.

Tasks are implemented using one or more compilers.

The simplest compiler is the direct compiler, which maps each input tile to N output tiles. The application needs to implement a com.here.platform.data.processing.java.compiler.Direct1ToNCompiler.

Other more complex compilation patterns are based on different types of dependency tracking between input partitions and output partitions.

The processing library supports the following patterns:

- com.here.platform.data.processing.java.compiler.NonIncrementalCompiler: non-incremental compilation only
- com.here.platform.data.processing.java.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
- com.here.platform.data.processing.java.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
- com.here.platform.data.processing.java.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends only on one input tile, and this mapping is independent of the tile content
- com.here.platform.data.processing.java.compiler.DirectMToNCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping is independent of the tile content
- com.here.platform.data.processing.java.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
- com.here.platform.data.processing.java.compiler.RefTreeCompiler: fully managed two-phase incremental compilation that can resolve references between input partitions. Input/output dependency management is already implemented, so the developer doesn't need to provide this logic

The application's main object normally extends a runner class (like PipelineRunner) to set up the Driver, and interfaces with the environment where the application runs. For more details, see the Main classes in the example compilers.

com.here.platform.data.processing.java.catalog, com.here.platform.data.processing.java.blobstore, and com.here.platform.data.processing.java.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

Definition Classes: processing

package spark

Definition Classes: java

package partitioner

Definition Classes: spark

AdaptiveLevelingPartitioner

HashPartitioner

LocalityAwarePartitioner

NameHashPartitioner

PartitionNamePartitioner

PartitionerOfKey

com.here.platform.data.processing.java.spark

partitioner

package partitioner

Type Members

final class AdaptiveLevelingPartitioner extends PartitionNamePartitioner with ScalaPartitionNamePartitionerWrapper
A PartitionerOfKey that uses a precalculated leveling com.here.platform.data.processing.java.leveling.Pattern.
Keys are distributed to Spark partitions strictly following the leveling points that the pattern specifies. Keys left not aggregated by the pattern are distributed among a disjoint set of Spark partitions using a fallback partitioner, if specified. Otherwise they are uniformly distributed over the existing partitions.
The number of partitions used for aggregated keys is fixed and matches the number of leveling points of the pattern.
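The distribution rule described above can be illustrated with a minimal, self-contained sketch. It does not use the library: `LevelingPartitionerSketch` and its methods are hypothetical names, a leveling pattern is modeled simply as a list of tile IDs, and tile IDs are assumed to be encoded as a leading 1 bit followed by two bits per quadtree level. How the real Pattern aggregates keys may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the library's AdaptiveLevelingPartitioner.
final class LevelingPartitionerSketch {
    // Each leveling point (modeled as a tile ID) owns exactly one partition index.
    private final Map<Long, Integer> indexOfPoint = new HashMap<>();
    // Number of extra partitions reserved for the fallback; 0 means no fallback.
    private final int fallbackPartitions;

    LevelingPartitionerSketch(List<Long> levelingPoints, int fallbackPartitions) {
        if (levelingPoints.isEmpty() || fallbackPartitions < 0) {
            throw new IllegalArgumentException(
                "need at least one leveling point and a non-negative fallback size");
        }
        for (Long point : levelingPoints) {
            // Duplicates are ignored; new points get the next free index.
            indexOfPoint.putIfAbsent(point, indexOfPoint.size());
        }
        this.fallbackPartitions = fallbackPartitions;
    }

    int numPartitions() {
        return indexOfPoint.size() + fallbackPartitions;
    }

    int getPartition(long tileId) {
        // Walk from the tile up to the root; dropping two bits moves one level up
        // (assumed encoding: a leading 1 bit followed by two bits per level).
        for (long t = tileId; t >= 1; t >>>= 2) {
            Integer index = indexOfPoint.get(t);
            if (index != null) {
                return index; // aggregated key: partition of its leveling point
            }
        }
        int hash = Long.hashCode(tileId);
        if (fallbackPartitions > 0) {
            // Non-aggregated key with fallback: disjoint range after the leveling points.
            return indexOfPoint.size() + Math.floorMod(hash, fallbackPartitions);
        }
        // Non-aggregated key without fallback: spread over the existing partitions.
        return Math.floorMod(hash, indexOfPoint.size());
    }
}
```

In this sketch the partitions for aggregated keys are fixed by the number of leveling points, and the fallback range, when present, never overlaps them.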
final class HashPartitioner extends ScalaPartitionerWrapper
final class LocalityAwarePartitioner extends PartitionNamePartitioner with ScalaPartitionNamePartitionerWrapper
Implements a PartitionerOfKey that is aware of the geographic location of the keys and can therefore put keys that are close to each other in the same Spark partition. This increases the data locality and speeds up the processing of Spark worker nodes.
The partitioner detects which com.here.platform.data.processing.java.catalog.partition.Keys are actually HereTiles. Such keys are grouped at a fixed quadtree level, which is generally higher than the level of the keys (that is, a lower level number). Keys at a level even higher than the one specified, and keys that are not HERE tiles, are partitioned using their hash code and spread uniformly across all available partitions. Such keys do not benefit from data locality.
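As a rough illustration of grouping at a fixed quadtree level, the following sketch maps a tile to its ancestor at the grouping level and partitions by that ancestor. It is not the library's implementation: `LocalityPartitionerSketch` and its methods are hypothetical, tile detection is reduced to "the partition name parses as a positive number", and tile IDs are assumed to be encoded as a leading 1 bit followed by two bits per level.

```java
// Hypothetical sketch, not the library's LocalityAwarePartitioner.
final class LocalityPartitionerSketch {
    private final int numPartitions;
    private final int groupingLevel;

    LocalityPartitionerSketch(int numPartitions, int groupingLevel) {
        if (numPartitions <= 0 || groupingLevel < 0) {
            throw new IllegalArgumentException("invalid partition count or level");
        }
        this.numPartitions = numPartitions;
        this.groupingLevel = groupingLevel;
    }

    // Level of a tile under the assumed encoding: position of the top bit, halved.
    static int levelOf(long tileId) {
        return (63 - Long.numberOfLeadingZeros(tileId)) / 2;
    }

    // Ancestor at targetLevel; callers must ensure targetLevel <= levelOf(tileId).
    static long ancestorAt(long tileId, int targetLevel) {
        return tileId >>> (2 * (levelOf(tileId) - targetLevel));
    }

    // Simplified detection: returns -1 when the name is not a positive number.
    private static long parseTileId(String partitionName) {
        try {
            long id = Long.parseLong(partitionName);
            return id > 0 ? id : -1L;
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    int getPartition(String partitionName) {
        long tileId = parseTileId(partitionName);
        if (tileId > 0 && levelOf(tileId) >= groupingLevel) {
            // Tiles sharing an ancestor at the grouping level share a partition.
            long group = ancestorAt(tileId, groupingLevel);
            return Math.floorMod(Long.hashCode(group), numPartitions);
        }
        // Coarser tiles and non-tile names: plain hash, no locality.
        return Math.floorMod(partitionName.hashCode(), numPartitions);
    }
}
```

With a grouping level of 10, the level 12 tiles 23618402 and 23618403 share the level 10 ancestor 1476150 and therefore land in the same partition.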
final class NameHashPartitioner extends PartitionNamePartitioner with ScalaPartitionNamePartitionerWrapper
A PartitionerOfKey that assigns keys to Spark partitions based on the hash code of the partition name.
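A minimal sketch of this idea follows. It is an assumption-laden stand-in rather than the library's code: `NameHashKeySketch` is a hypothetical replacement for Key with only catalog, layer, and partition name, and `NameHashSketch` is a hypothetical partitioner. The point it shows is that only the partition name feeds the hash, and that the result must be reduced to a non-negative index.

```java
// Hypothetical stand-in for the library's Key: catalog, layer, and partition name.
final class NameHashKeySketch {
    final String catalog;
    final String layer;
    final String partitionName;

    NameHashKeySketch(String catalog, String layer, String partitionName) {
        this.catalog = catalog;
        this.layer = layer;
        this.partitionName = partitionName;
    }
}

// Hypothetical sketch, not the library's NameHashPartitioner.
final class NameHashSketch {
    private final int numPartitions;

    NameHashSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    int numPartitions() {
        return numPartitions;
    }

    int getPartition(NameHashKeySketch key) {
        // Catalog and layer are deliberately ignored. floorMod keeps the index
        // non-negative even when hashCode() is negative.
        return Math.floorMod(key.partitionName.hashCode(), numPartitions);
    }
}
```

Two keys with the same partition name but different catalogs or layers get the same partition in this sketch.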
abstract class PartitionNamePartitioner extends PartitionerOfKey
A base class for all Partitioners for com.here.platform.data.processing.java.catalog.partition.Key that guarantees that only the partition name of the key is taken into account when calculating the partition. This is important because the processing library can then change keys (which also contain the catalog and layer) without changing the name, and still be sure that the partitioning is preserved, which improves performance.
trait PartitionerOfKey extends Serializable
Abstract class implementing in Java a type-safe Spark partitioner. This class is usually employed in interfaces of the processing library to better specify the kind of partitioner needed; compiler developers must provide this class.
A PartitionerOfKey is similar to a Spark org.apache.spark.Partitioner, but the function that has to be implemented takes a key of type com.here.platform.data.processing.java.catalog.partition.Key instead of the generic Object required by the Spark org.apache.spark.Partitioner.
The equals and hashCode methods must be implemented properly; otherwise, Spark may introduce unnecessary shuffle operations, or assertions in the data processing algorithms may fail.
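Spark compares partitioners with equals to decide whether two datasets are already partitioned the same way, so two instances with the same configuration should compare equal. The sketch below shows one way to satisfy this; `ModuloPartitionerSketch` is a hypothetical, library-independent class that takes the partition name directly instead of a Key.

```java
// Hypothetical sketch: value-based equality for a partitioner-like class.
final class ModuloPartitionerSketch {
    private final int numPartitions;

    ModuloPartitionerSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    int getPartition(String partitionName) {
        return Math.floorMod(partitionName.hashCode(), numPartitions);
    }

    // Equal when the other object has the same class and the same configuration,
    // that is, when it would assign every key to the same partition.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ModuloPartitionerSketch)) {
            return false;
        }
        return numPartitions == ((ModuloPartitionerSketch) other).numPartitions;
    }

    // Must be consistent with equals: derived from the same fields.
    @Override
    public int hashCode() {
        return Integer.hashCode(numPartitions);
    }
}
```

Without these overrides, two separately constructed instances would fall back to identity comparison and be treated as different partitioners even though they distribute keys identically.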
