Packages

package processing

This package provides the Data Processing Library for building distributed data processing applications.

A Runner implements the interface between the application and the environment it runs in, and starts the application. The application, in turn, is driven by a Driver that controls and performs the distributed processing.

Choose a Runner best suited for the environment where the application runs.

The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.

The main entry point of the processing library is the com.here.platform.data.processing.driver.DriverBuilder class, where you can add different kinds of tasks to the driver. The driver runs the tasks and commits the final results to the output catalog.

Tasks are implemented using one or more compilers.

The simplest compiler is the direct compiler, which maps each input tile to N output tiles. The application needs to implement a com.here.platform.data.processing.compiler.Direct1ToNCompiler.

Other, more complex compilation patterns are based on some form of dependency tracking between input partitions and output partitions.

The processing library supports the following patterns:

- com.here.platform.data.processing.compiler.NonIncrementalCompiler: non-incremental compilation only
- com.here.platform.data.processing.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
- com.here.platform.data.processing.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
- com.here.platform.data.processing.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends on only one input tile, and this mapping is independent of tile content
- com.here.platform.data.processing.compiler.DirectMToNCompiler: incremental compilation where every output tile depends on multiple input tiles, and this mapping is independent of tile content
- com.here.platform.data.processing.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
- com.here.platform.data.processing.compiler.RefTreeCompiler: fully managed, two-phase incremental compilation that can resolve references between input partitions. Input/output dependency management is built in, so the developer does not need to provide this logic

The application's main object normally mixes in a runner trait (such as PipelineRunner) to set up the Driver and to interface with the environment where the application runs. See the Main classes in the example compilers for more details.

com.here.platform.data.processing.catalog, com.here.platform.data.processing.blobstore, and com.here.platform.data.processing.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

Definition Classes: data

package java

This package provides Java bindings for the Data Processing Library, to build distributed data processing applications in Java.

Choose a Runner best suited for the environment where the application runs.

The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.

The main entry point of the processing library is the com.here.platform.data.processing.java.driver.DriverBuilder class, where you can add different kinds of tasks to the driver. The driver runs the tasks and commits the final results to the output catalog.

Tasks are implemented using one or more compilers.

The simplest compiler is the direct compiler, which maps each input tile to N output tiles. The application needs to implement a com.here.platform.data.processing.java.compiler.Direct1ToNCompiler.

Other more complex compilation patterns are based on different types of dependency tracking between input partitions and output partitions.

The processing library supports the following patterns:

- com.here.platform.data.processing.java.compiler.NonIncrementalCompiler: non-incremental compilation only
- com.here.platform.data.processing.java.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
- com.here.platform.data.processing.java.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
- com.here.platform.data.processing.java.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends on only one input tile, and this mapping is independent of tile content
- com.here.platform.data.processing.java.compiler.DirectMToNCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping is independent of tile content
- com.here.platform.data.processing.java.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
- com.here.platform.data.processing.java.compiler.RefTreeCompiler: fully managed, two-phase incremental compilation that can resolve references between input partitions. Input/output dependency management is built in, so the developer does not need to provide this logic

The application's main object normally extends a runner class (such as PipelineRunner) to set up the Driver and to interface with the environment where the application runs. For more details, see the Main classes in the example compilers.

com.here.platform.data.processing.java.catalog, com.here.platform.data.processing.java.blobstore, and com.here.platform.data.processing.java.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

Definition Classes: processing

package compiler

Definition Classes: java

package direct

Definition Classes: compiler

package mapgroup

Definition Classes: compiler

package reftree

Definition Classes: compiler

IncrementalDepCompiler

NonIncrementalCompiler

com.here.platform.data.processing.java.compiler

MapGroupCompiler

trait MapGroupCompiler[T] extends InputLayers with InputOptPartitioner with CompileInFn[T] with OutputLayers with OutputOptPartitioner with CompileOutFn[T]

Compiler that implements a generic Map-Reduce pattern, where the reduce function is a group-by. The front-end compiles each input partition and produces the list of output partitions that this input affects, each with a value of a custom type. Values are then grouped per output partition and passed to the back-end, which produces the output map.

This pattern is a more general version of Direct1ToNCompiler and DirectMToNCompiler: it not only supports an M:N input/output relationship, but also lets this relationship be a function of the input payloads, that is, of the input content.

This pattern, however, compiles each input partition in isolation: compiling one input partition sees the data and metadata of that partition only. If the front-end needs to look up information from additional input partitions, see RefTreeCompiler.
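The map and group-by flow described above can be sketched with plain Java collections. This is only an illustration of the pattern, not the library's API: `MapGroupSketch`, `compile`, and `wordCountsByInitial` are hypothetical names, string keys stand in for the library's key types, and everything runs in a single process rather than on Spark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical, stand-alone sketch of the map/group-by pattern.
// None of these names belong to the Data Processing Library.
final class MapGroupSketch {

    // Runs the front-end on every input partition in isolation, groups the
    // emitted (output key, value) pairs by output key, then runs the back-end
    // once per output key.
    static <I, T, O> Map<String, O> compile(
            List<I> inputs,
            Function<I, List<Map.Entry<String, T>>> frontEnd,
            Function<List<T>, O> backEnd) {
        Map<String, List<T>> grouped = new TreeMap<>();
        for (I input : inputs) {
            for (Map.Entry<String, T> emitted : frontEnd.apply(input)) {
                grouped.computeIfAbsent(emitted.getKey(), k -> new ArrayList<>())
                       .add(emitted.getValue());
            }
        }
        Map<String, O> outputs = new TreeMap<>();
        grouped.forEach((outKey, values) -> outputs.put(outKey, backEnd.apply(values)));
        return outputs;
    }

    // Content-dependent mapping: the output key is the first letter of each
    // word found in the input payload, so it cannot be known without reading it.
    static Map<String, Integer> wordCountsByInitial(List<String> payloads) {
        Function<String, List<Map.Entry<String, Integer>>> frontEnd = text -> {
            List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
            for (String word : text.split("\\s+")) {
                if (!word.isEmpty()) {
                    emitted.add(Map.entry(word.substring(0, 1), 1));
                }
            }
            return emitted;
        };
        Function<List<Integer>, Integer> backEnd =
                values -> values.stream().mapToInt(Integer::intValue).sum();
        return compile(payloads, frontEnd, backEnd);
    }

    public static void main(String[] args) {
        // Two input partitions feed two output keys.
        System.out.println(wordCountsByInitial(List.of("road river", "rail bridge")));
        // prints {b=1, r=3}
    }
}
```

Note that the front-end function only ever receives one payload at a time, which mirrors the restriction that an input partition is compiled without access to other partitions.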

T: the custom type of the values passed between front-end and back-end

Note: the implementation must be scala.Serializable, as it is copied to workers and run inside Spark map functions.
Note: this is a Java-friendly version of com.here.platform.data.processing.compiler.MapGroupCompiler.
See also: extended interfaces for more details
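The serializability requirement can be checked locally before a job is submitted. The following is a minimal sketch, assuming plain Java serialization is what ships the object to workers; `SerializableSketch`, `SuffixCompiler`, and `roundTrip` are hypothetical names and do not come from the library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: verifies that a compiler-like object survives a
// serialization round trip, as it would when copied to a worker.
final class SerializableSketch {

    // Stand-in for a compiler implementation: only serializable state.
    static final class SuffixCompiler implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String suffix;
        // Scratch state is transient: it is not shipped and is rebuilt on demand.
        private transient StringBuilder scratch;

        SuffixCompiler(String suffix) {
            this.suffix = suffix;
        }

        String apply(String input) {
            if (scratch == null) {
                scratch = new StringBuilder();
            }
            scratch.setLength(0);
            return scratch.append(input).append(suffix).toString();
        }
    }

    // Serializes the value to a byte array and reads it back.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("value is not serializable", e);
        }
    }

    public static void main(String[] args) {
        SuffixCompiler copy = roundTrip(new SuffixCompiler("-out"));
        System.out.println(copy.apply("tile")); // prints tile-out
    }
}
```

A field that holds a non-serializable resource (a client, a connection) would make `roundTrip` fail with a `NotSerializableException` wrapped in the `IllegalStateException`, which is the kind of failure that otherwise only shows up once the job runs on the cluster.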

Linear Supertypes

CompileOutFn[T], OutputOptPartitioner, OutputLayers, CompileInFn[T], Serializable, InputOptPartitioner, InputLayers, AnyRef, Any

Abstract Value Members

abstract def compileInFn(in: Java.Pair[InKey, InMeta]): Iterable[Java.Pair[OutKey, T]]
Calculates the dependent output partitions and intermediate results from a single input partition.
in: the input partition to process
returns: all the impacted output partitions (OutKey) and the intermediate data of type T for this partition. The result may contain more than one element per output key. compileOutFn is called only for output keys that have at least one intermediate value from this phase. Other output keys are deleted automatically.
Definition Classes: CompileInFn
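The rule that output keys without intermediate values are deleted can be pictured as a set difference. The source does not say how the library determines which keys to delete, so the sketch below is one plausible reading only; `DeletionSketch` and `keysToDelete` are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the deletion rule, not the library's implementation.
final class DeletionSketch {

    // Output keys that were published earlier but received no intermediate
    // value from the front-end in the current run.
    static Set<String> keysToDelete(Set<String> previouslyPublished,
                                    Set<String> keysWithValues) {
        Set<String> stale = new TreeSet<>(previouslyPublished);
        stale.removeAll(keysWithValues);
        return stale;
    }

    public static void main(String[] args) {
        // "b" still has values and "d" is new, so only "a" and "c" are stale.
        System.out.println(keysToDelete(Set.of("a", "b", "c"), Set.of("b", "d")));
        // prints [a, c]
    }
}
```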
abstract def inLayers: Map[String, Set[String]]
Layers of the input catalogs that should be queried and provided to the compiler, grouped by input catalog and identified by catalog ID and layer ID.
Definition Classes: InputLayers
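The shape of the value returned by inLayers can be sketched as a map from catalog ID to a set of layer IDs. The example assumes java.util collections as stand-ins for the map and set types in the signature; the catalog and layer IDs (`roads`, `places`, `topology`, and so on) and the class `InLayersSketch` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of an inLayers-style value; all IDs are made up.
final class InLayersSketch {

    // Catalog ID -> IDs of the layers to read from that catalog.
    static Map<String, Set<String>> inLayers() {
        Map<String, Set<String>> layers = new TreeMap<>();
        layers.put("roads", new TreeSet<>(List.of("attributes", "topology")));
        layers.put("places", new TreeSet<>(List.of("pois")));
        return layers;
    }

    // Flattens the grouping into "catalog/layer" identifiers.
    static List<String> flatten(Map<String, Set<String>> layers) {
        List<String> ids = new ArrayList<>();
        layers.forEach((catalog, layerIds) ->
                layerIds.forEach(layer -> ids.add(catalog + "/" + layer)));
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(flatten(inLayers()));
        // prints [places/pois, roads/attributes, roads/topology]
    }
}
```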
abstract def inPartitioner(parallelism: Int): Option[PartitionerOfKey]
If the returned com.here.platform.data.processing.java.Java.Option is empty, the Executor uses a default partitioner. If it is defined, the given partitioner is applied when querying the input catalogs.
parallelism: the parallelism of the partitioner
returns: the optional input partitioner with the given parallelism
Definition Classes: InputOptPartitioner
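The fallback behavior of an optional partitioner can be sketched as follows. This is an illustration under assumptions: `java.util.Optional` stands in for the library's `Java.Option`, a `ToIntFunction<String>` stands in for `PartitionerOfKey`, and the hash-based default is a guess at what a default partitioner might do. `PartitionerSketch`, `defaultPartitioner`, and `resolve` are hypothetical names.

```java
import java.util.Optional;
import java.util.function.ToIntFunction;

// Hypothetical sketch of "use the custom partitioner if defined, else a default".
final class PartitionerSketch {

    // Assumed default: spread keys over [0, parallelism) by hash code.
    static ToIntFunction<String> defaultPartitioner(int parallelism) {
        return key -> Math.floorMod(key.hashCode(), parallelism);
    }

    // Returns the custom partitioner when present, otherwise the default one.
    static ToIntFunction<String> resolve(Optional<ToIntFunction<String>> custom,
                                         int parallelism) {
        return custom.orElseGet(() -> defaultPartitioner(parallelism));
    }

    public static void main(String[] args) {
        int parallelism = 4;

        // Empty option: the default partitioner is used.
        ToIntFunction<String> fallback = resolve(Optional.empty(), parallelism);
        System.out.println(fallback.applyAsInt("tile-42"));

        // Defined option: the custom partitioner wins.
        ToIntFunction<String> byLength = key -> key.length() % parallelism;
        ToIntFunction<String> chosen = resolve(Optional.of(byLength), parallelism);
        System.out.println(chosen.applyAsInt("tile-42")); // prints 3
    }
}
```

`Math.floorMod` is used instead of `%` in the default so that negative hash codes still map to a valid bucket index.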
abstract def outLayers: Set[String]
Layers that are expected to be produced by the compiler.
Definition Classes: OutputLayers
abstract def outPartitioner(parallelism: Int): Option[PartitionerOfKey]
If the returned com.here.platform.data.processing.java.Java.Option is empty, the Executor uses a default partitioner. If it is defined, the given partitioner is applied when querying the output catalog and producing data to be published.
parallelism: the parallelism of the partitioner
returns: the optional output partitioner with the given parallelism
Definition Classes: OutputOptPartitioner

Concrete Value Members

final def !=(arg0: Any): Boolean
Definition Classes: AnyRef → Any
final def ##: Int
Definition Classes: AnyRef → Any
final def ==(arg0: Any): Boolean
Definition Classes: AnyRef → Any
final def asInstanceOf[T0]: T0
Definition Classes: Any
def clone(): AnyRef
Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
final def eq(arg0: AnyRef): Boolean
Definition Classes: AnyRef
def equals(arg0: AnyRef): Boolean
Definition Classes: AnyRef → Any
final def getClass(): Class[_ <: AnyRef]
Definition Classes: AnyRef → Any
Annotations: @IntrinsicCandidate() @native()
def hashCode(): Int
Definition Classes: AnyRef → Any
Annotations: @IntrinsicCandidate() @native()
final def isInstanceOf[T0]: Boolean
Definition Classes: Any
final def ne(arg0: AnyRef): Boolean
Definition Classes: AnyRef
final def notify(): Unit
Definition Classes: AnyRef
Annotations: @IntrinsicCandidate() @native()
final def notifyAll(): Unit
Definition Classes: AnyRef
Annotations: @IntrinsicCandidate() @native()
final def synchronized[T0](arg0: => T0): T0
Definition Classes: AnyRef
def toString(): String
Definition Classes: AnyRef → Any
final def wait(arg0: Long, arg1: Int): Unit
Definition Classes: AnyRef
Annotations: @throws(classOf[java.lang.InterruptedException])
final def wait(arg0: Long): Unit
Definition Classes: AnyRef
Annotations: @throws(classOf[java.lang.InterruptedException]) @native()
final def wait(): Unit
Definition Classes: AnyRef
Annotations: @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

def finalize(): Unit
Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws(classOf[java.lang.Throwable]) @Deprecated
Deprecated: (Since version 9)