trait DepCompiler[T] extends DepCompilerBase[T] with InputLayers with InputPartitioner

Interface for a basic incremental dependency-based compiler.

The calculated dependencies must be exhaustive, especially in the incremental case: the compiler schedules compilation of exactly the output partitions returned by the dependency calculation, without any further intermediate processing.

Returning non-exhaustive dependencies produces a corrupted output catalog in incremental compilation, or an output catalog with missing or invalid data in full compilation.

The compiler is efficient only when the cost of calculating dependencies is negligible compared to the cost of actually producing the output map, because dependency calculation is always triggered with the full input map.
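
A minimal sketch of why exhaustiveness matters, using plain Scala collections instead of Spark RDDs. The `affectedOutputs` helper, the string keys, and the scheduling rule itself are illustrative assumptions, not the library's actual implementation: the model only assumes that an output is recompiled when the dependency graph links it to a changed input.

```scala
// Simplified model with plain collections; not the library's scheduler.
object IncrementalModel {
  // Hypothetical stand-ins for the library's InKey and OutKey.
  type InKey = String
  type OutKey = String

  /** Outputs to recompile: every output with an edge from a changed input. */
  def affectedOutputs(depGraph: Seq[(InKey, OutKey)], changed: Set[InKey]): Set[OutKey] =
    depGraph.collect { case (in, out) if changed(in) => out }.toSet

  // Exhaustive graph: output "o2" reads both "a" and "b".
  val exhaustive: Seq[(InKey, OutKey)] = Seq("a" -> "o1", "a" -> "o2", "b" -> "o2")

  // Non-exhaustive graph: the edge "b" -> "o2" was forgotten.
  val incomplete: Seq[(InKey, OutKey)] = Seq("a" -> "o1", "a" -> "o2")
}
```

With the exhaustive graph, a change to input "b" schedules "o2". With the incomplete graph, the same change schedules nothing, so "o2" would keep stale content in this model.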

T

The type of the values collected from dependencies for each output partition

See also

traits mixed in for more details

Type Members

  1. type DepGraph = RDD[(InKey, OutKey)]

    The input/output dependencies, in the following form: the key is the input partition key (com.here.platform.data.processing.compiler.InKey) and the value is the output partition key (com.here.platform.data.processing.compiler.OutKey) that depends on that input. If one input partition originates dependencies to multiple outputs, each dependency must be represented as a separate entry in this RDD with the same com.here.platform.data.processing.compiler.InKey as key.

  2. type IntermediateResult = RDD[(OutKey, T)]

    The values calculated as part of the dependencies, provided for each com.here.platform.data.processing.compiler.OutKey. If one output partition receives multiple values, they must be represented as multiple entries in this RDD with the same com.here.platform.data.processing.compiler.OutKey as key.

    Definition Classes
    DepCompilerBase
  3. type ToCompile = RDD[(OutKey, Iterable[T])]

    The aggregated Ts for each com.here.platform.data.processing.compiler.OutKey provided as input of compilation.

    Definition Classes
    DepCompilerBase
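
The relationship between the three type members can be sketched with plain Scala collections. `Seq` and `Map` stand in for `RDD` here, and the string keys, the `Int` value type, and the `aggregate` helper are hypothetical names chosen for illustration; the grouping performed by the actual compiler may differ in its details.

```scala
// Plain-collection stand-ins for DepGraph, IntermediateResult and ToCompile.
object TypeShapes {
  type InKey = String  // hypothetical stand-in for the library's InKey
  type OutKey = String // hypothetical stand-in for the library's OutKey
  type T = Int         // hypothetical value type collected per output

  // DepGraph shape: "in1" feeds two outputs, so it appears in two entries.
  val depGraph: Seq[(InKey, OutKey)] =
    Seq("in1" -> "outA", "in1" -> "outB", "in2" -> "outB")

  // IntermediateResult shape: "outB" receives two values, one entry each.
  val intermediate: Seq[(OutKey, T)] =
    Seq("outA" -> 10, "outB" -> 20, "outB" -> 30)

  /** ToCompile shape: all values of one OutKey gathered into one Iterable. */
  def aggregate(ir: Seq[(OutKey, T)]): Map[OutKey, Iterable[T]] =
    ir.groupBy(_._1).map { case (key, entries) => key -> entries.map(_._2) }

  val toCompile: Map[OutKey, Iterable[T]] = aggregate(intermediate)
}
```

In this sketch each output key appears exactly once in `toCompile`, no matter how many entries it had in `intermediate`.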

Abstract Value Members

  1. abstract def compileIn(inData: InData, parallelism: Int): (DepGraph, IntermediateResult)

    Calculates the dependencies of the output partitions in terms of input partitions. These dependencies are aggregated and later passed to the compileOut call. This method is invoked in both the full and the incremental re-compile cases, always with the whole input catalog as input.

    inData

    the metadata of the whole input catalog. Keys are partitioned as specified in com.here.platform.data.processing.compiler.InputPartitioner

    parallelism

    the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits

    returns

    the input/output dependencies and the values for each of them. For optimal performance, the returned graph must be partitioned by inPartitioner and the intermediate result by com.here.platform.data.processing.compiler.OutputPartitioner.outPartitioner.

    Note

    Follow the RDD persistence policy described in com.here.platform.data.processing.driver.Executor.

  2. abstract def compileOut(toCompile: ToCompile, parallelism: Int): ToPublish

    Compiles partitions and returns actual compiled data, if any. This method is invoked in both full and incremental re-compile cases.

    The required behaviour for this method is to return exactly the same number of elements, with the same values for the out keys, as were passed in toCompile.

    toCompile

    the first element, a com.here.platform.data.processing.compiler.OutKey, is the output partition key. The second element, an iterable collection of T, contains the dependency values for that key. The input is provided partitioned according to com.here.platform.data.processing.compiler.OutputPartitioner. The RDD is not persisted; persist it if you use it more than once.

    parallelism

    the parallelism of both the input and the output RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits

    returns

    com.here.platform.data.processing.compiler.OutKey is the key of the compiled partition. com.here.platform.data.processing.blobstore.Payload is the output data, if any. The returned keys must be partitioned as specified in com.here.platform.data.processing.compiler.OutputPartitioner. This function is therefore expected to keep the partitioner of its input; otherwise the produced data is shuffled. Shuffling must be avoided, and keeping the partitioner must be enforced.

    Definition Classes
    DepCompilerBase
  3. abstract def inLayers: Map[Id, Set[Id]]

    Represents layers of the input catalogs that you should query and provide to the compiler. These layers are grouped by input catalog and identified by catalog ID and layer ID.

    Definition Classes
    InputLayers
  4. abstract def inPartitioner(parallelism: Int): Partitioner[InKey]

    Specifies the partitioner to use when querying the input catalogs.

    parallelism

    The number of partitions the partitioner should partition the catalog into. This should match the parallelism of the Spark RDD containing the input partitions.

    returns

    The input partitioner with the parallelism specified.

    Definition Classes
    InputPartitioner
  5. abstract def outLayers: Set[Id]

    Layers to be produced by the compiler.

    Definition Classes
    OutputLayers
  6. abstract def outPartitioner(parallelism: Int): Partitioner[OutKey]

    Specifies the partitioner to use when querying the output catalog and producing output data.

    parallelism

    The number of partitions the partitioner should partition the catalog into. This should match the parallelism of the Spark RDD containing the output partitions.

    returns

    The output partitioner with the parallelism specified.

    Definition Classes
    OutputPartitioner
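
The compileOut requirement (same number of elements and same out keys as in toCompile) can be illustrated with a plain-collection sketch. Everything here is an assumption made for illustration: `Seq` stands in for the RDD types, the payload is modeled as `Option[String]` rather than the library's Payload, and `sameKeys` is a hypothetical check, not part of the API.

```scala
// Plain-collection sketch; Seq stands in for the RDD types.
object CompileOutContract {
  type OutKey = String // hypothetical stand-in for the library's OutKey

  /** Sketch of a compileOut-style body: exactly one result per input key.
    * The payload is modeled as Option[String]; None means nothing to publish. */
  def compileOut(toCompile: Seq[(OutKey, Iterable[Int])]): Seq[(OutKey, Option[String])] =
    toCompile.map { case (key, values) =>
      val payload = if (values.isEmpty) None else Some(values.sum.toString)
      key -> payload // the key is passed through unchanged
    }

  /** True when `out` has the same number of elements and the same keys as `in`. */
  def sameKeys[A, B](in: Seq[(OutKey, A)], out: Seq[(OutKey, B)]): Boolean =
    in.map(_._1).sorted == out.map(_._1).sorted

  val input: Seq[(OutKey, Iterable[Int])] =
    Seq("outA" -> Seq(1, 2), "outB" -> Seq.empty[Int])
  val output: Seq[(OutKey, Option[String])] = compileOut(input)
}
```

A key with no data still yields an entry (with an empty payload) instead of being dropped. On a Spark pair RDD, a value-only transformation such as `mapValues` keeps the existing partitioner, whereas `map` discards it, which is one way to avoid the shuffle mentioned in the compileOut description.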

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  14. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  15. final val outCatalogId: Id

    Identifier for the output catalog.

    Definition Classes
    OutputLayers
  16. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  17. def toString(): String
    Definition Classes
    AnyRef → Any
  18. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
