com.here.platform.data.processing.java.compiler

IncrementalDepCompiler

trait IncrementalDepCompiler[T, CarryOver] extends DepCompiler[T]

Improved version of DepCompiler that can update the dependencies from the previous run instead of recalculating them from scratch.

The non-incremental case works exactly as in DepCompiler.

The incremental case is implemented in three parts: 1) a function that updates the previous dependency graph given changes from the input. 2) a function that, given a subgraph of the dependency graph, calculates the Ts. 3) the compile function, the same as in DepCompiler, in invoked on the output partitions that are affected by the changes.

A mechanism to carry over an arbitrary intermediate result for the 1st function to the 2nd function is also provided.

T: the type of the values collected from dependencies for each output partition
CarryOver: the type of the opaque object carried over from 1st to 2nd function

Note: This is a Java friendly version of com.here.platform.data.processing.compiler.IncrementalDepCompiler.

Linear Supertypes

DepCompiler[T], OutputPartitioner, OutputLayers, InputPartitioner, InputLayers, AnyRef, Any

Ordering

Alphabetic
By Inheritance

Inherited

IncrementalDepCompiler
DepCompiler
OutputPartitioner
OutputLayers
InputPartitioner
InputLayers
AnyRef
Any

Hide All
Show All

Visibility

Public
Protected

Type Members

type DepGraph = JavaPairRDD[InKey, OutKey]
The input/output dependencies in the following form: as first key com.here.platform.data.processing.java.compiler.InKey, the input partition key, as value com.here.platform.data.processing.java.compiler.OutKey, the output partition key that depends on the input.
The input/output dependencies in the following form: as first key com.here.platform.data.processing.java.compiler.InKey, the input partition key, as value com.here.platform.data.processing.java.compiler.OutKey, the output partition key that depends on the input. In case one input partition originates dependencies to multiple outputs, each must be represented as single entry in this RDD with the same com.here.platform.data.processing.java.compiler.InKey as key.
Definition Classes
DepCompiler
type IntermediateResult = JavaPairRDD[OutKey, T]
The values calculated as part of the dependencies that are provided for each com.here.platform.data.processing.java.compiler.OutKey.
The values calculated as part of the dependencies that are provided for each com.here.platform.data.processing.java.compiler.OutKey. In case one output partition receives multiple values, these must be represented as multiple entries in this RDD with the same com.here.platform.data.processing.java.compiler.OutKey as key.
Definition Classes
DepCompiler
type ToCompile = JavaPairRDD[OutKey, Iterable[T]]
The aggregated Ts for each com.here.platform.data.processing.java.compiler.OutKey provided as input of compilation.
The aggregated Ts for each com.here.platform.data.processing.java.compiler.OutKey provided as input of compilation.
Definition Classes
DepCompiler

Abstract Value Members

abstract def compileIn(inData: InData, subDepGraph: DepGraph, carriedOver: CarryOver, parallelism: Int): IntermediateResult
Return the dependency values for a subset of the dependencies.
Return the dependency values for a subset of the dependencies.
The subset of dependencies has the form of a pruned dependency graph. It is guaranteed that, if an com.here.platform.data.processing.java.compiler.OutKey is present in the subgraph, every com.here.platform.data.processing.java.compiler.InKey corresponding to that com.here.platform.data.processing.java.compiler.OutKey in the complete dependency graph are present in the subgraph as well.
inData
the full input dataset. Keys are partitioned as specified in com.here.platform.data.processing.java.compiler.InputPartitioner
subDepGraph
subset of the dependency graph for which the values must be calculated. Keys are partitioned as specified in com.here.platform.data.processing.java.compiler.InputPartitioner
carriedOver
opaque value carried over from the updateDepGraph function
parallelism
the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.java.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.java.compiler.OutputOptPartitioner traits
returns
the dependency values equivalent to the one produced by com.here.platform.data.processing.java.compiler.DepCompiler.compileIn and later pruned to only the com.here.platform.data.processing.java.compiler.OutKeys present in the subgraph
Note
it's responsibility of this function to produce dependency values that are equivalent to the one produced by com.here.platform.data.processing.java.compiler.DepCompiler.compileIn if invoked on the full input dataset and then filtered to select the subset of output values corresponding to the subgraph. Failure to do so will results in invalid/inconsistent data committed to the output catalog.
,
please note and follow the RDD persistence policy described in com.here.platform.data.processing.driver.Executor
abstract def compileIn(inData: InData, parallelism: Int): Java.Pair[DepGraph, IntermediateResult]
Calculates the dependencies of the output partitions in terms of input partitions.
Calculates the dependencies of the output partitions in terms of input partitions. These dependencies are aggregated and provided later to the compileOut call. This method is invoked in both full and incremental re-compile cases with the whole input catalog as input.
inData
the metadata of the whole input catalog. Keys are partitioned as specified in com.here.platform.data.processing.java.compiler.InputPartitioner
parallelism
the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.java.compiler.InputPartitioner and/or from com.here.platform.data.processing.java.compiler.OutputOptPartitioner traits
returns
the input/output dependencies and the values for each of them. Returned graph must be partitioned by inPartitioner and the intermediate result by outPartitioner for optimal performances.
Definition Classes
DepCompiler
Note
please note and follow the RDD persistence policy described in com.here.platform.data.processing.driver.Executor
abstract def compileOut(toCompile: ToCompile, parallelism: Int): ToPublish
Compiles partitions and returns actual compiled data, if any.
Compiles partitions and returns actual compiled data, if any. This method is invoked in both full and incremental re-compile cases.
The required behaviour for this method is to return exactly the same number of elements, with the same values for the out keys, as were passed in toCompile.
toCompile
the first com.here.platform.data.processing.compiler.OutKey is the output partition key. The second iterable collection of T contains the dependency values for that key. The input is provided partitioned according to com.here.platform.data.processing.java.compiler.OutputPartitioner. The RDD is not persisted and, if used multiple times, it should be persisted.
parallelism
the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.java.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits
returns
com.here.platform.data.processing.java.compiler.OutKey is the key of compiled partition. com.here.platform.data.processing.java.blobstore.Payload is the output data, if any. The returned keys shall be partitioned as specified in outPartitioner, so it is expected that this function keeps using this partitioner passed in the input otherwise the produced data is shuffled and this must be avoided (and enforced)
Definition Classes
DepCompiler
abstract def inLayers: Map[String, Set[String]]
Layers of the input catalogs that should be queried and provided to the compiler, grouped by input catalog and identified by catalog id and layer ID.
Layers of the input catalogs that should be queried and provided to the compiler, grouped by input catalog and identified by catalog id and layer ID.
Definition Classes
InputLayers
abstract def inPartitioner(parallelism: Int): PartitionerOfKey
The partitioner to be applied when querying the input catalog.
The partitioner to be applied when querying the input catalog.
parallelism
the parallelism of the partitioner
returns
the input partitioner with the given parallelism
Definition Classes
InputPartitioner
abstract def outLayers: Set[String]
Layers that are expected to be produced by the compiler.
Layers that are expected to be produced by the compiler.
Definition Classes
OutputLayers
abstract def outPartitioner(parallelism: Int): PartitionerOfKey
The partitioner to be applied when querying the output catalog and producing output data.
The partitioner to be applied when querying the output catalog and producing output data.
parallelism
the parallelism of the partitioner
returns
the output partitioner with the given parallelism
Definition Classes
OutputPartitioner
abstract def updateDepGraph(inData: InData, inChanges: InChanges, prevDepGraph: DepGraph, parallelism: Int): Java.Pair[DepGraph, CarryOver]
Update the dependency graph of the previous run given what is changed in the input.
Update the dependency graph of the previous run given what is changed in the input.
inData
the full input dataset. Keys are partitioned as specified in com.here.platform.data.processing.java.compiler.InputPartitioner
inChanges
what is changed in the input dataset from the previous run. Keys are partitioned as specified in com.here.platform.data.processing.java.compiler.InputPartitioner
prevDepGraph
the dependency graph of the previous run. Keys are partitioned as specified in com.here.platform.data.processing.java.compiler.InputPartitioner
parallelism
the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.java.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.java.compiler.OutputOptPartitioner traits
returns
the updated complete dependency graph, which must be equivalent to the one produced by com.here.platform.data.processing.java.compiler.DepCompiler.compileIn. In addition, an opaque value is returned and later provided to the compileIn function defined in the IncrementalDepCompiler. This graph must be partitioned by inPartitioner.
Note
it's responsibility of this function to produce a dependency graph that is equivalent to the one produced by com.here.platform.data.processing.java.compiler.DepCompiler.compileIn if invoked on the full input dataset. Failure to do so will results in invalid/inconsistent data committed to the output catalog.
,
Follow the RDD persistence policy described in com.here.platform.data.processing.driver.Executor.

Concrete Value Members

final def !=(arg0: Any): Boolean
Definition Classes
AnyRef → Any
final def ##: Int
Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean
Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0
Definition Classes
Any
def clone(): AnyRef
Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
final def eq(arg0: AnyRef): Boolean
Definition Classes
AnyRef
def equals(arg0: AnyRef): Boolean
Definition Classes
AnyRef → Any
final def getClass(): Class[_ <: AnyRef]
Definition Classes
AnyRef → Any
Annotations
@IntrinsicCandidate() @native()
def hashCode(): Int
Definition Classes
AnyRef → Any
Annotations
@IntrinsicCandidate() @native()
final def isInstanceOf[T0]: Boolean
Definition Classes
Any
final def ne(arg0: AnyRef): Boolean
Definition Classes
AnyRef
final def notify(): Unit
Definition Classes
AnyRef
Annotations
@IntrinsicCandidate() @native()
final def notifyAll(): Unit
Definition Classes
AnyRef
Annotations
@IntrinsicCandidate() @native()
final def synchronized[T0](arg0: => T0): T0
Definition Classes
AnyRef
def toString(): String
Definition Classes
AnyRef → Any
final def wait(arg0: Long, arg1: Int): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException])
final def wait(arg0: Long): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException]) @native()
final def wait(): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

def finalize(): Unit
Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.Throwable]) @Deprecated
Deprecated
(Since version 9)

Packages

IncrementalDepCompiler

trait IncrementalDepCompiler[T, CarryOver] extends DepCompiler[T]

Type Members

Abstract Value Members

Concrete Value Members

Deprecated Value Members

Inherited from DepCompiler[T]

Inherited from OutputPartitioner

Inherited from OutputLayers

Inherited from InputPartitioner

Inherited from InputLayers

Inherited from AnyRef

Inherited from Any

Ungrouped

Packages

IncrementalDepCompiler

trait IncrementalDepCompiler[T, CarryOver] extends DepCompiler[T]

Type Members

Abstract Value Members

Concrete Value Members

Deprecated Value Members

Inherited from DepCompiler[T]

Inherited from OutputPartitioner

Inherited from OutputLayers

Inherited from InputPartitioner

Inherited from InputLayers

Inherited from AnyRef

Inherited from Any

Ungrouped

IncrementalDepCompiler