final class ConcreteIncrementalDepCompiler[T, CarryOver, W <: IncrementalDepCompiler[T, CarryOver]] extends ConcreteDepCompiler[T, W] with IncrementalDepCompilerWrapper[T, CarryOver, W]
- Alphabetic
- By Inheritance
- ConcreteIncrementalDepCompiler
- IncrementalDepCompilerWrapper
- IncrementalDepCompiler
- ConcreteDepCompiler
- DepCompilerWrapper
- InputPartitionerWrapper
- WrapperInputLayers
- OutputPartitionerWrapper
- WrapperOutputLayers
- Wrapper
- DepCompiler
- InputPartitioner
- InputLayers
- DepCompilerBase
- OutputPartitioner
- OutputLayers
- AnyRef
- Any
- Hide All
- Show All
- Public
- Protected
Instance Constructors
- new ConcreteIncrementalDepCompiler(impl: W)(implicit arg0: ClassTag[T])
Type Members
- type DepGraph = RDD[(InKey, OutKey)]
The input/output dependencies in the following form: as first key com.here.platform.data.processing.compiler.InKey, the input partition key, as value com.here.platform.data.processing.compiler.OutKey, the output partition key that depends on the input.
The input/output dependencies in the following form: as first key com.here.platform.data.processing.compiler.InKey, the input partition key, as value com.here.platform.data.processing.compiler.OutKey, the output partition key that depends on the input. In case one input partition originates dependencies to multiple outputs, each must be represented as single entry in this RDD with the same com.here.platform.data.processing.compiler.InKey as key.
- Definition Classes
- DepCompiler
- type IntermediateResult = RDD[(OutKey, T)]
The values calculated as part of the dependencies that are provided for each com.here.platform.data.processing.compiler.OutKey.
The values calculated as part of the dependencies that are provided for each com.here.platform.data.processing.compiler.OutKey. In case one output partition receives multiple values, these must be represented as multiple entries in this RDD with the same com.here.platform.data.processing.compiler.OutKey as key.
- Definition Classes
- DepCompilerBase
- type ToCompile = RDD[(OutKey, Iterable[T])]
The aggregated
Ts for each com.here.platform.data.processing.compiler.OutKey provided as input of compilation.The aggregated
Ts for each com.here.platform.data.processing.compiler.OutKey provided as input of compilation.- Definition Classes
- DepCompilerBase
Value Members
- implicit object intermediateToScala extends ScalaConvertible[T, T] with Serializable
- Attributes
- protected
- Definition Classes
- ConcreteDepCompiler
- implicit object iterableToJava extends JavaConvertible[Iterable[T], Iterable[T]] with Serializable
- Attributes
- protected
- Definition Classes
- ConcreteDepCompiler
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- def compileIn(inDataScala: InData, subDepGraphScala: DepGraph, carriedOver: CarryOver, parallelism: Int): IntermediateResult
Return the dependency values for a subset of the dependencies.
Return the dependency values for a subset of the dependencies.
The subset of dependencies has the form of a pruned dependency graph. It is guaranteed that, if an com.here.platform.data.processing.compiler.OutKey is present in the subgraph, every com.here.platform.data.processing.compiler.InKey corresponding to that com.here.platform.data.processing.compiler.OutKey in the complete dependency graph are present in the subgraph as well.
- carriedOver
opaque value carried over from the updateDepGraph function
- parallelism
the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits
- returns
the dependency values equivalent to the one produced by com.here.platform.data.processing.compiler.DepCompiler.compileIn and later pruned to only the com.here.platform.data.processing.compiler.OutKeys present in the subgraph
- Definition Classes
- ConcreteIncrementalDepCompiler → IncrementalDepCompiler
- Note
it's responsibility of this function to produce dependency values that are equivalent to the one produced by com.here.platform.data.processing.compiler.DepCompiler.compileIn if invoked on the full input dataset and then filtered to select the subset of output values corresponding to the subgraph. Failure to do so will results in invalid/inconsistent data committed to the output catalog.
,please note and follow the RDD persistence policy described in com.here.platform.data.processing.driver.Executor
- def compileIn(inDataScala: InData, parallelism: Int): (DepGraph, IntermediateResult)
Calculates the dependencies of the output partitions in terms of input partitions.
Calculates the dependencies of the output partitions in terms of input partitions. These dependencies are aggregated and provided later to the compileOut call. This method is invoked in both full and incremental re-compile cases with the whole input catalog as input.
- parallelism
the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits
- returns
the input/output dependencies and the values for each of them. Returned graph must be partitioned by inPartitioner and the intermediate result by com.here.platform.data.processing.compiler.OutputPartitioner.outPartitioner for optimal performances.
- Definition Classes
- ConcreteDepCompiler → DepCompiler
- Note
please note and follow the RDD persistence policy described in com.here.platform.data.processing.driver.Executor
- def compileOut(toCompileScala: ToCompile, parallelism: Int): ToPublish
Compiles partitions and returns actual compiled data, if any.
Compiles partitions and returns actual compiled data, if any. This method is invoked in both full and incremental re-compile cases.
The required behaviour for this method is to return exactly the same number of elements, with the same values for the out keys, as were passed in toCompile.
- parallelism
the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits
- returns
com.here.platform.data.processing.compiler.OutKey is the key of compiled partition. com.here.platform.data.processing.blobstore.Payload is the output data, if any. The returned keys shall be partitioned as specified in com.here.platform.data.processing.compiler.OutputPartitioner, so it is expected that this function keeps using this partitioner passed in the input otherwise the produced data is shuffled and this must be avoided (and enforced)
- Definition Classes
- ConcreteDepCompiler → DepCompilerBase
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(o: Any): Boolean
- Definition Classes
- Wrapper → AnyRef → Any
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- Wrapper → AnyRef → Any
- val impl: W
- Definition Classes
- ConcreteIncrementalDepCompiler → ConcreteDepCompiler → Wrapper
- def inLayers: Map[Id, Set[Id]]
Represents layers of the input catalogs that you should query and provide to the compiler.
Represents layers of the input catalogs that you should query and provide to the compiler. These layers are grouped by input catalog and identified by catalog ID and layer ID.
- Definition Classes
- WrapperInputLayers → InputLayers
- def inPartitioner(parallelism: Int): Partitioner[InKey]
Specifies the partitioner to use when querying the input catalogs.
Specifies the partitioner to use when querying the input catalogs.
- parallelism
The number of partitions the partitioner should partition the catalog into, this should match the parallelism of the Spark RDD containing the input partitions.
- returns
The input partitioner with the parallelism specified.
- Definition Classes
- InputPartitionerWrapper → InputPartitioner
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final val outCatalogId: Id
Identifier for the output catalog.
Identifier for the output catalog.
- Definition Classes
- OutputLayers
- def outLayers: Set[Id]
Layers to be produced by the compiler.
Layers to be produced by the compiler.
- Definition Classes
- WrapperOutputLayers → OutputLayers
- def outPartitioner(parallelism: Int): Partitioner[OutKey]
Specifies the partitioner to use when querying the output catalog and producing output data.
Specifies the partitioner to use when querying the output catalog and producing output data.
- parallelism
The number of partitions the partitioner should partition the catalog into, this should match the parallelism of the Spark RDD containing the output partitions.
- returns
The output partitioner with the parallelism specified.
- Definition Classes
- OutputPartitionerWrapper → OutputPartitioner
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- Wrapper → AnyRef → Any
- def updateDepGraph(inDataScala: InData, inChangesScala: InChanges, prevDepGraphScala: DepGraph, parallelism: Int): (DepGraph, CarryOver)
Update the dependency graph of the previous run given what is changed in the input.
Update the dependency graph of the previous run given what is changed in the input.
- parallelism
the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits
- returns
the updated complete dependency graph, it must be equivalent to the one produced by com.here.platform.data.processing.compiler.DepCompiler.compileIn. In addition, an opaque value is returned and later provided to the compileIn function defined in the IncrementalDepCompiler. Returned graph must be partitioned by inPartitioner
- Definition Classes
- ConcreteIncrementalDepCompiler → IncrementalDepCompiler
- Note
it's responsibility of this function to produce a dependency graph that is equivalent to the one produced by com.here.platform.data.processing.compiler.DepCompiler.compileIn if invoked on the full input dataset. Failure to do so will results in invalid/inconsistent data committed to the output catalog.
,please note and follow the RDD persistence policy described in com.here.platform.data.processing.driver.Executor
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)