package compiler
Type Members
-
trait
CompileOut1To1Fn[T] extends CompileOutFn[T]
Produces one com.here.platform.data.processing.java.blobstore.Payload for each "nominal" output partition the compiler generates.
- T
the custom type of the values passed between front-end and back-end
- Note
This is a Java friendly version of com.here.platform.data.processing.compiler.CompileOut1To1Fn.
-
trait
CompileOut1ToNFn[T] extends CompileOutFn[T]
Produces multiple OutKeys and com.here.platform.data.processing.java.blobstore.Payloads for each "nominal" output partition the compiler generates.
- T
the custom type of the values passed between front-end and back-end
- Note
This is a Java friendly version of com.here.platform.data.processing.compiler.CompileOut1ToNFn.
-
trait
CompileOutFn[T] extends Serializable
Back-end of DirectMToNCompiler, MapGroupCompiler and RefTreeCompiler. Receives values of a custom type produced by the front-end and produces the final payloads that are then uploaded and committed to the output catalog.
- T
the custom type of the values passed between front-end and back-end
- Note
The implementation must be scala.Serializable as this is copied to workers and run inside Spark map functions.
Users should add one of the interfaces extending this one to their implementation of DirectMToNCompiler, MapGroupCompiler or RefTreeCompiler, according to the wanted behaviour.
This is a Java friendly version of com.here.platform.data.processing.compiler.CompileOutFn.
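The difference between the 1:1 and 1:N back-ends described above can be sketched as follows. This is only an illustration: the interfaces `OneToOne` and `OneToN`, their `compileOut` signatures, and the use of plain strings for keys and payloads are assumptions made for the example, not the library's API. Both interfaces extend `Serializable` to mirror the note above.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BackEndSketch {
    /** Hypothetical 1:1 back-end: one payload per nominal output partition. */
    interface OneToOne<T> extends Serializable {
        String compileOut(String nominalKey, List<T> values);
    }

    /** Hypothetical 1:N back-end: several (key, payload) pairs per nominal partition. */
    interface OneToN<T> extends Serializable {
        Map<String, String> compileOut(String nominalKey, List<T> values);
    }

    // Sums the grouped values into a single payload for the nominal partition.
    static final OneToOne<Integer> SUM = (key, values) -> {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return key + ":" + total;
    };

    // Emits one sub-partition per grouped value under the nominal partition.
    static final OneToN<Integer> SPLIT = (key, values) -> {
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            out.put(key + "/" + i, String.valueOf(values.get(i)));
        }
        return out;
    };

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3);
        System.out.println(SUM.compileOut("tile-7", values));
        System.out.println(SPLIT.compileOut("tile-7", values));
    }
}
```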
-
trait
DepCompiler[T] extends InputLayers with InputPartitioner with OutputLayers with OutputPartitioner
Interface for a basic incremental dependency-based compiler.
Calculated dependencies must be exhaustive, especially in the incremental case, because the compiler schedules compilation of the output partitions returned by the dependency calculation without any further intermediate processing.
Returning non-exhaustive dependencies has the effect of producing a corrupted output catalog when doing incremental compilation, or an output catalog with missing/invalid data when doing full compilation.
The compiler is efficient only in case the cost of calculating dependencies is negligible compared to the cost of actually producing the output map, as dependency calculation is always triggered with the full input map.
- T
The type of the values collected from dependencies for each output partition
- Note
This is a Java friendly version of com.here.platform.data.processing.compiler.DepCompiler.
- See also
traits mixed in for more details
-
trait
Direct1ToNCompiler[T] extends InputLayers with InputOptPartitioner with CompileInFn[T] with OutputLayers with OutputOptPartitioner with compiler.direct.CompileOutFn[T]
This compiler allows a stateless incremental compilation, in the case the keys of output partitions are a function of the input keys only, so where the input/output mapping does not depend on the input content but only on the input keys.
This compiler is applicable only when each output partition is affected by one single input partition. This restriction is enforced, but it allows a very efficient implementation of the incremental compilation case. The limitation is on the output side only: one input partition may be mapped to zero, one or more output partitions, but each output partition can be mapped to by one input partition only, without overlaps.
In case multiple input partitions may affect the same output partition, the DirectMToNCompiler should be used instead.
- T
the type of the values passed between front-end and back-end
- Note
The implementation must be scala.Serializable as this is copied to workers and run inside Spark map functions.
This is a Java friendly version of com.here.platform.data.processing.compiler.Direct1ToNCompiler.
- See also
extended interfaces for more details
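The no-overlap restriction on the output side can be illustrated with a small check. This is a sketch under stated assumptions: keys are plain strings, and `owners` and `split` are hypothetical helpers written for this example, not part of the library. The mapping takes only the input key, never the partition content.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class Direct1ToNSketch {
    /**
     * Maps every output key to the single input key that produces it.
     * Throws if two different input keys map to the same output key.
     */
    static Map<String, String> owners(Collection<String> inKeys,
                                      Function<String, List<String>> mapping) {
        Map<String, String> owner = new TreeMap<>();
        for (String in : inKeys) {
            for (String out : mapping.apply(in)) {
                String previous = owner.putIfAbsent(out, in);
                if (previous != null && !previous.equals(in)) {
                    throw new IllegalStateException(
                        out + " is mapped by both " + previous + " and " + in);
                }
            }
        }
        return owner;
    }

    // Hypothetical key-only mapping: each input tile feeds two outputs of its own.
    static List<String> split(String inKey) {
        return List.of(inKey + "-north", inKey + "-south");
    }

    public static void main(String[] args) {
        System.out.println(owners(List.of("t1", "t2"), Direct1ToNSketch::split));
    }
}
```

Because each output key has exactly one owner in this model, a change to one input key identifies the affected output keys without looking at any other input.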
-
trait
DirectMToNCompiler[T] extends InputLayers with InputOptPartitioner with CompileInFn[T] with OutputLayers with OutputOptPartitioner with CompileOutFn[T]
This compiler allows a stateless incremental compilation, in the case the keys of output partitions are a function of the input keys only, so where the input/output mapping does not depend on the input content but only on the input keys.
This compiler allows multiple input partitions to map to the same output partition. In the more limited case where each output partition is affected by one single input partition, Direct1ToNCompiler should be used instead: it enforces this property and can perform incremental compilation even more efficiently.
- T
the type of the values passed between front-end and back-end
- Note
The implementation must be scala.Serializable as this is copied to workers and run inside Spark map functions.
This is a Java friendly version of com.here.platform.data.processing.compiler.DirectMToNCompiler.
- See also
extended interfaces for more details
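A key-only M:N mapping can be sketched like this. The key format `region/tile`, the string keys, and the helpers `mapping`, `affectedOutputs` and `inputsPerOutput` are all assumptions made for the illustration; they do not reproduce the library's interfaces. The point is that the affected output keys follow from the changed input keys alone, without reading any payload.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class DirectMToNSketch {
    // Hypothetical key-only mapping: input "region/tile" feeds output "region",
    // so several input partitions share one output partition.
    static List<String> mapping(String inKey) {
        return List.of(inKey.substring(0, inKey.indexOf('/')));
    }

    /** Output keys that need recompilation, derived from changed input keys only. */
    static Set<String> affectedOutputs(Collection<String> changedInKeys) {
        Set<String> affected = new TreeSet<>();
        for (String in : changedInKeys) {
            affected.addAll(mapping(in));
        }
        return affected;
    }

    /** Groups the input keys by the output key they contribute to. */
    static Map<String, List<String>> inputsPerOutput(Collection<String> allInKeys) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String in : allInKeys) {
            for (String out : mapping(in)) {
                grouped.computeIfAbsent(out, k -> new ArrayList<>()).add(in);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> all = List.of("eu/1", "eu/2", "us/1");
        System.out.println(affectedOutputs(List.of("eu/2")));
        System.out.println(inputsPerOutput(all));
    }
}
```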
- type InChange = Change
- type InChanges = JavaPairRDD[InKey, InChange]
- type InData = JavaPairRDD[InKey, InMeta]
- type InKey = Key
- type InMeta = Meta
-
trait
IncrementalDepCompiler[T, CarryOver] extends DepCompiler[T]
Improved version of DepCompiler that can update the dependencies from the previous run instead of recalculating them from scratch.
The non-incremental case works exactly as in DepCompiler.
The incremental case is implemented in three parts: 1) a function that updates the previous dependency graph given changes from the input; 2) a function that, given a subgraph of the dependency graph, calculates the values of type T; 3) the compile function, the same as in DepCompiler, which is invoked on the output partitions that are affected by the changes.
A mechanism to carry over an arbitrary intermediate result from the 1st function to the 2nd function is also provided.
- T
the type of the values collected from dependencies for each output partition
- CarryOver
the type of the opaque object carried over from 1st to 2nd function
- Note
This is a Java friendly version of com.here.platform.data.processing.compiler.IncrementalDepCompiler.
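The three parts of the incremental case can be sketched as below. This is a simplified model, not the library's interface: the graph is a plain map from output key to input keys, the collected values are the input keys themselves, and the carry-over object is assumed to be the set of affected output keys. The names `updateGraph`, `collect` and `compile` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class IncrementalDepSketch {
    /**
     * Part 1: updates the previous graph (output key -> input keys) in place.
     * changes maps each changed input key to the output keys that now depend on it.
     * Returns the affected output keys, used here as the carry-over object.
     */
    static Set<String> updateGraph(Map<String, Set<String>> deps,
                                   Map<String, Set<String>> changes) {
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> change : changes.entrySet()) {
            String in = change.getKey();
            // Remove the stale edges of the changed input.
            for (Map.Entry<String, Set<String>> edge : deps.entrySet()) {
                if (edge.getValue().remove(in)) {
                    affected.add(edge.getKey());
                }
            }
            // Add the current edges of the changed input.
            for (String out : change.getValue()) {
                deps.computeIfAbsent(out, k -> new TreeSet<>()).add(in);
                affected.add(out);
            }
        }
        return affected;
    }

    /** Part 2: collects the values (here: input keys) for the affected subgraph. */
    static Map<String, List<String>> collect(Map<String, Set<String>> deps,
                                             Set<String> carryOver) {
        Map<String, List<String>> values = new TreeMap<>();
        for (String out : carryOver) {
            Set<String> inputs = deps.get(out);
            List<String> collected = new ArrayList<>();
            if (inputs != null) {
                collected.addAll(inputs);
            }
            values.put(out, collected);
        }
        return values;
    }

    /** Part 3: compiles one affected output partition from its collected values. */
    static String compile(String outKey, List<String> values) {
        return outKey + "<-" + String.join("+", values);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = new TreeMap<>();
        deps.put("o1", new TreeSet<>(List.of("a", "b")));
        deps.put("o2", new TreeSet<>(List.of("b")));
        Map<String, Set<String>> changes = new TreeMap<>();
        changes.put("b", Set.of("o2", "o3")); // input b now feeds o2 and o3
        Set<String> carryOver = updateGraph(deps, changes);
        collect(deps, carryOver)
            .forEach((out, values) -> System.out.println(compile(out, values)));
    }
}
```

Only the output partitions touched by the change are compiled again; the rest of the graph is reused from the previous run.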
-
trait
InputLayers extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.InputLayers.
-
trait
InputOptPartitioner extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.InputOptPartitioner.
-
trait
InputPartitioner extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.InputPartitioner.
-
trait
MapGroupCompiler[T] extends InputLayers with InputOptPartitioner with CompileInFn[T] with OutputLayers with OutputOptPartitioner with CompileOutFn[T]
Compiler to implement a generic Map-Reduce pattern, where the reduce function is group-by. The front-end compiles each input partition and produces the list of output partitions that this input affects, each with a value of a custom type. Values are then grouped per output partition and passed to the back-end, which produces the output map.
This pattern is a more general version of Direct1ToNCompiler and DirectMToNCompiler: not only is an M:N input/output relationship supported, but this relationship is a function of the input payloads, that is, of the input content.
This pattern, however, compiles input partitions standalone, meaning that compiling one input partition sees the data and metadata of that partition only. If the front-end needs to look up information from additional input partitions, please refer to RefTreeCompiler.
- T
the custom type of the values passed between front-end and back-end
- Note
The implementation must be scala.Serializable as this is copied to workers and run inside Spark map functions.
This is a Java friendly version of com.here.platform.data.processing.compiler.MapGroupCompiler.
- See also
extended interfaces for more details
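The map and group-by flow described above can be sketched with plain collections. Everything here is assumed for the illustration: payloads are comma-separated words, keys are strings, and `compileIn`, `group` and `compileOut` are hypothetical stand-ins that run locally rather than on Spark. Unlike the direct compilers, the output keys are derived from the payload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MapGroupSketch {
    /** Front-end: the affected output keys depend on the payload, not only on the key. */
    static List<Map.Entry<String, String>> compileIn(String inKey, String payload) {
        List<Map.Entry<String, String>> emitted = new ArrayList<>();
        for (String word : payload.split(",")) {
            if (!word.isEmpty()) {
                // Output key: first letter of the word. Value: where the word came from.
                emitted.add(Map.entry(word.substring(0, 1), inKey + ":" + word));
            }
        }
        return emitted;
    }

    /** Group-by step: collects the emitted values per output key. */
    static Map<String, List<String>> group(Map<String, String> inputs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        // Sorted iteration keeps the example deterministic.
        for (Map.Entry<String, String> in : new TreeMap<>(inputs).entrySet()) {
            for (Map.Entry<String, String> e : compileIn(in.getKey(), in.getValue())) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    /** Back-end: builds the output payload from the grouped values. */
    static String compileOut(String outKey, List<String> values) {
        return outKey + "=" + String.join(";", values);
    }

    public static void main(String[] args) {
        Map<String, String> inputs = Map.of("p1", "apple,bee", "p2", "ant");
        group(inputs).forEach((out, values) -> System.out.println(compileOut(out, values)));
    }
}
```

Each call to `compileIn` sees one partition only, which matches the standalone limitation noted above.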
-
trait
NonIncrementalCompiler extends InputLayers with InputPartitioner with OutputLayers with OutputPartitioner
The simplest interface that an actual compiler can implement. This leaves the maximum flexibility to the compiler implementors, but has no support for incremental compilation.
- Note
This is a Java friendly version of com.here.platform.data.processing.compiler.NonIncrementalCompiler.
- See also
extended interfaces for more details
- type OutKey = Key
- type OutMeta = Meta
-
trait
OutputLayers extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.OutputLayers.
-
trait
OutputOptPartitioner extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.OutputOptPartitioner.
-
trait
OutputPartitioner extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.OutputPartitioner.
-
trait
RefTreeCompiler[T] extends InputLayers with InputOptPartitioner with ResolveInFn with CompileInFn[T] with OutputLayers with OutputOptPartitioner with CompileOutFn[T]
A RefTreeCompiler allows full and incremental compilation with complex reference structures. A condition is that all references of a layer can be calculated purely from the source data of this layer using a resolve function, which gets only the metadata of one partition as input. The structure of the references has to be predefined in a reftree.RefTree object.
Apart from the reference resolution pre-phase described above, the compilation itself is split into two phases, front-end and back-end, as in other compilers such as MapGroupCompiler.
In the first phase, compileIn from reftree.CompileInFn is called for every partition, with the full list of metadata for all of its referenced partitions. This method returns one or more values of type T for every impacted output partition. The compileIn function for the first phase of the compilation is defined in traits that extend reftree.CompileInFn. One of these needs to be mixed in, such as reftree.CompileInFnWithRefs or reftree.CompileInFnWithRefsReturnsReferences.
In the second phase, a method from one of the CompileOutFn traits is called for every output partition for which the first phase provided at least one element of type T. Elements coming from various input partitions are grouped together and provided as input of compilation for each output partition.
- T
The custom type of the values passed between front-end and back-end
- Note
The implementation must be scala.Serializable as this is copied to workers and run inside Spark map functions.
This is a Java friendly version of com.here.platform.data.processing.compiler.RefTreeCompiler.
- See also
extended interfaces for more details
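The resolve pre-phase and the two compilation phases can be sketched as follows. This is a heavily simplified model and not the library's API: the predefined reftree.RefTree structure is omitted, partition content stands in for metadata, references are encoded as hypothetical `@key` tokens, and `resolve`, `compileIn`, `run` and `compileOut` are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class RefTreeSketch {
    /** Resolve pre-phase: references are read from the partition itself ("@key" tokens). */
    static Set<String> resolve(String content) {
        Set<String> refs = new TreeSet<>();
        for (String token : content.split(" ")) {
            if (token.startsWith("@") && token.length() > 1) {
                refs.add(token.substring(1));
            }
        }
        return refs;
    }

    /** First phase: sees one partition together with all partitions it references. */
    static List<Map.Entry<String, String>> compileIn(String inKey,
                                                     Map<String, String> referenced) {
        // Hypothetical rule: the output key is derived from the first letter of the input key.
        String outKey = "out-" + inKey.charAt(0);
        return List.of(Map.entry(outKey, inKey + " refs " + referenced.keySet()));
    }

    /** Runs resolve and the first phase, then groups the values per output key. */
    static Map<String, List<String>> run(Map<String, String> catalog) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> in : new TreeMap<>(catalog).entrySet()) {
            Map<String, String> referenced = new TreeMap<>();
            for (String ref : resolve(in.getValue())) {
                if (catalog.containsKey(ref)) {
                    referenced.put(ref, catalog.get(ref));
                }
            }
            for (Map.Entry<String, String> e : compileIn(in.getKey(), referenced)) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    /** Second phase: called once per output key that received at least one value. */
    static String compileOut(String outKey, List<String> values) {
        return outKey + "=" + String.join(";", values);
    }

    public static void main(String[] args) {
        Map<String, String> catalog = Map.of("a1", "road @b1", "a2", "road", "b1", "sign");
        run(catalog).forEach((out, values) -> System.out.println(compileOut(out, values)));
    }
}
```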
- type ToPublish = JavaPairRDD[OutKey, Option[Payload]]