trait DepCompiler[T] extends InputLayers with InputPartitioner with OutputLayers with OutputPartitioner
Interface for a basic incremental dependency-based compiler.
Calculated dependencies must be exhaustive, especially in the incremental case, because the compiler schedules compilation of the output partitions returned by the dependency calculation without any further intermediate processing.
Returning non-exhaustive dependencies produces a corrupted output catalog in incremental compilation, or an output catalog with missing or invalid data in full compilation.
The compiler is efficient only when the cost of calculating dependencies is negligible compared to the cost of producing the output map, because dependency calculation always runs on the full input map.
- T
The type of the values collected from dependencies for each output partition
- Note
This is a Java-friendly version of com.here.platform.data.processing.compiler.DepCompiler.
- See also
traits mixed in for more details
Type Members
-
type
DepGraph = JavaPairRDD[InKey, OutKey]
The input/output dependencies, in the following form: the key is the com.here.platform.data.processing.java.compiler.InKey of an input partition, and the value is the com.here.platform.data.processing.java.compiler.OutKey of an output partition that depends on that input. If one input partition originates dependencies to multiple outputs, each dependency must be represented as a separate entry in this RDD with the same com.here.platform.data.processing.java.compiler.InKey as key.
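The one-entry-per-dependency layout can be illustrated without Spark. The following is a minimal sketch that assumes plain strings as stand-ins for InKey and OutKey and a List of pairs as a stand-in for the JavaPairRDD; the class and method names are hypothetical and not part of the library. It also assumes that an incremental run recompiles the outputs reachable from changed inputs, which is why a missing entry would leave an output stale.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, Spark-free illustration of the DepGraph layout.
class DepGraphSketch {
    // Builds one (input key, output key) dependency entry.
    static Map.Entry<String, String> dep(String inKey, String outKey) {
        return new SimpleImmutableEntry<>(inKey, outKey);
    }

    // "in-A" feeds two outputs, so it appears in two separate entries.
    static List<Map.Entry<String, String>> depGraph() {
        return Arrays.asList(
            dep("in-A", "out-1"),
            dep("in-A", "out-2"),
            dep("in-B", "out-2"));
    }

    // Assumption: outputs to recompile are those that depend on a changed input.
    static Set<String> affectedOutputs(List<Map.Entry<String, String>> graph,
                                       Set<String> changedInputs) {
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, String> d : graph) {
            if (changedInputs.contains(d.getKey())) {
                affected.add(d.getValue());
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        Set<String> changed = new TreeSet<>(Arrays.asList("in-A"));
        System.out.println(affectedOutputs(depGraph(), changed)); // prints [out-1, out-2]
    }
}
```

If the ("in-A", "out-2") entry were omitted, a change to "in-A" would not select "out-2" in this sketch, which mirrors the corruption risk described for non-exhaustive dependencies.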
-
type
IntermediateResult = JavaPairRDD[OutKey, T]
The values calculated as part of the dependencies that are provided for each com.here.platform.data.processing.java.compiler.OutKey. If one output partition receives multiple values, they must be represented as multiple entries in this RDD with the same com.here.platform.data.processing.java.compiler.OutKey as key.
-
type
ToCompile = JavaPairRDD[OutKey, Iterable[T]]
The aggregated T values for each com.here.platform.data.processing.java.compiler.OutKey, provided as input of compilation.
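The relationship between IntermediateResult and ToCompile amounts to grouping values by output key. The following is a minimal sketch of that grouping on plain collections, assuming string keys as stand-ins for OutKey; the class and helper names are hypothetical, and how the library performs the aggregation on RDDs is not shown here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Spark-free illustration of IntermediateResult -> ToCompile.
class ToCompileSketch {
    // Builds one (output key, value) entry of the intermediate result.
    static <T> Map.Entry<String, T> entry(String outKey, T value) {
        return new SimpleImmutableEntry<>(outKey, value);
    }

    // Collects all values that share an output key into one list per key.
    static <T> Map<String, List<T>> aggregate(List<Map.Entry<String, T>> intermediate) {
        Map<String, List<T>> toCompile = new TreeMap<>();
        for (Map.Entry<String, T> e : intermediate) {
            toCompile.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return toCompile;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> intermediate = Arrays.asList(
            entry("out-1", 10),
            entry("out-2", 20),
            entry("out-2", 30));
        System.out.println(aggregate(intermediate)); // prints {out-1=[10], out-2=[20, 30]}
    }
}
```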
Abstract Value Members
-
abstract
def
compileIn(inData: InData, parallelism: Int): Java.Pair[DepGraph, IntermediateResult]
Calculates the dependencies of the output partitions in terms of input partitions. These dependencies are aggregated and provided later to the compileOut call. This method is invoked in both full and incremental re-compile cases with the whole input catalog as input.
- inData
the metadata of the whole input catalog. Keys are partitioned as specified in com.here.platform.data.processing.java.compiler.InputPartitioner
- parallelism
the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.java.compiler.InputPartitioner and/or from com.here.platform.data.processing.java.compiler.OutputOptPartitioner traits
- returns
the input/output dependencies and the values for each of them. For optimal performance, the returned graph must be partitioned by inPartitioner and the intermediate result by outPartitioner.
- Note
Follow the RDD persistence policy described in com.here.platform.data.processing.driver.Executor.
-
abstract
def
compileOut(toCompile: ToCompile, parallelism: Int): ToPublish
Compiles partitions and returns actual compiled data, if any. This method is invoked in both full and incremental re-compile cases.
The required behaviour for this method is to return exactly the same number of elements, with the same values for the out keys, as were passed in toCompile.
- toCompile
the first element, a com.here.platform.data.processing.compiler.OutKey, is the output partition key. The second element, an iterable collection of T, contains the dependency values for that key. The input is provided partitioned according to com.here.platform.data.processing.java.compiler.OutputPartitioner. The RDD is not persisted; if it is used multiple times, it should be persisted.
- parallelism
the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.java.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits
- returns
com.here.platform.data.processing.java.compiler.OutKey is the key of the compiled partition. com.here.platform.data.processing.java.blobstore.Payload is the output data, if any. The returned keys must be partitioned as specified in outPartitioner. This function is therefore expected to keep the partitioner of its input; otherwise the produced data is shuffled, which must be avoided (and this is enforced).
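The requirement that compileOut returns exactly the keys it received, with a payload only where there is data, can be sketched on plain collections. This is a minimal illustration, assuming string keys in place of OutKey, Optional<String> in place of an optional Payload, and a Map in place of the RDDs; all names are hypothetical, and the real ToPublish type is not modeled.

```java
import java.util.Collections;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, Spark-free illustration of the compileOut key contract.
class CompileOutContractSketch {
    // Emits one result per input key; the payload is empty when there is nothing to publish.
    static Map<String, Optional<String>> compileOut(Map<String, List<String>> toCompile) {
        Map<String, Optional<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : toCompile.entrySet()) {
            String joined = String.join(",", e.getValue());
            Optional<String> payload = joined.isEmpty() ? Optional.empty() : Optional.of(joined);
            out.put(e.getKey(), payload); // the key is kept even without a payload
        }
        return out;
    }

    // True when the result has exactly the same keys as the input.
    static boolean satisfiesContract(Map<String, ?> toCompile, Map<String, ?> result) {
        return toCompile.keySet().equals(result.keySet());
    }

    public static void main(String[] args) {
        Map<String, List<String>> toCompile = new LinkedHashMap<>();
        toCompile.put("out-1", Arrays.asList("a", "b"));
        toCompile.put("out-2", Collections.<String>emptyList());
        Map<String, Optional<String>> result = compileOut(toCompile);
        System.out.println(satisfiesContract(toCompile, result)); // prints true
    }
}
```

Dropping a key when its payload is empty would break the contract in this sketch, since the result would then have fewer elements than toCompile.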
-
abstract
def
inLayers: Map[String, Set[String]]
Layers of the input catalogs that should be queried and provided to the compiler, grouped by input catalog and identified by catalog ID and layer ID.
- Definition Classes
- InputLayers
-
abstract
def
inPartitioner(parallelism: Int): PartitionerOfKey
The partitioner to be applied when querying the input catalog.
- parallelism
the parallelism of the partitioner
- returns
the input partitioner with the given parallelism
- Definition Classes
- InputPartitioner
-
abstract
def
outLayers: Set[String]
Layers that are expected to be produced by the compiler.
- Definition Classes
- OutputLayers
-
abstract
def
outPartitioner(parallelism: Int): PartitionerOfKey
The partitioner to be applied when querying the output catalog and producing output data.
- parallelism
the parallelism of the partitioner
- returns
the output partitioner with the given parallelism
- Definition Classes
- OutputPartitioner
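To make the role of the parallelism argument concrete, here is a minimal sketch of a hash-based partitioner that maps a key to one of parallelism partitions. It is a generic illustration only: the class is hypothetical, it does not implement PartitionerOfKey, and the partitioners the library actually provides may distribute keys differently.

```java
// Hypothetical hash partitioner; not the library's PartitionerOfKey.
class HashPartitionerSketch {
    private final int parallelism;

    HashPartitionerSketch(int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        this.parallelism = parallelism;
    }

    // Number of partitions this partitioner distributes keys over.
    int numPartitions() {
        return parallelism;
    }

    // Maps a key to a partition index in [0, parallelism); floorMod avoids negative indices.
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        HashPartitionerSketch partitioner = new HashPartitionerSketch(4);
        // The same key always lands in the same partition.
        System.out.println(partitioner.partitionOf("out-1") == partitioner.partitionOf("out-1")); // prints true
    }
}
```

Using the same partitioner instance for the intermediate result and the compiled output keeps equal keys in the same partition, which is the property the no-shuffle requirement of compileOut relies on.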
Concrete Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()