Packages

  • package root
  • package com
  • package here
  • package platform
  • package data
  • package processing

    This package provides the Data Processing Library for building distributed data processing applications.

    A Runner both implements the interface with the environment in which an application runs and starts the application. The application, in turn, is driven by a Driver, which controls and performs the distributed processing.

    Choose a Runner best suited for the environment where the application runs.

    The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.

    The main entry point in the processing library is the com.here.platform.data.processing.driver.DriverBuilder class where you can add different kinds of tasks to the driver. The driver runs the tasks, and commits the final results to the output catalog.

    Tasks are implemented using one or more compilers.

    The simplest compiler is the direct compiler, which maps each input tile to N output tiles. The application needs to implement com.here.platform.data.processing.compiler.Direct1ToNCompiler.
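    As a sketch, the content-independent 1-to-N mapping behind the direct compiler can be illustrated with a plain function. TileKey and the quad-splitting scheme below are simplified stand-ins for illustration, not the library's actual types.

```scala
// Simplified stand-in for the library's tile key type; not the real API.
final case class TileKey(level: Int, index: Long)

// A typical content-independent 1-to-N mapping (here N = 4): each input
// tile maps to its four child tiles one zoom level down. The mapping
// depends only on the key, never on the tile's content, which is what
// makes this pattern incrementally compilable.
def mapTile(in: TileKey): Seq[TileKey] =
  (0 until 4).map(i => TileKey(in.level + 1, in.index * 4 + i))
```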

    Other, more complex compilation patterns are based on dependency tracking between input partitions and output partitions.

    The processing library supports the following patterns:

    - com.here.platform.data.processing.compiler.NonIncrementalCompiler: non-incremental compilation only
    - com.here.platform.data.processing.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
    - com.here.platform.data.processing.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
    - com.here.platform.data.processing.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends on exactly one input tile, and this mapping is independent of tile content
    - com.here.platform.data.processing.compiler.DirectMToNCompiler: incremental compilation where every output tile depends on multiple input tiles, and this mapping is independent of tile content
    - com.here.platform.data.processing.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
    - com.here.platform.data.processing.compiler.RefTreeCompiler: fully managed, two-phase incremental compilation that can resolve references between input partitions; input/output dependency management is implemented by the library, so the developer doesn't need to provide this logic
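    The incremental patterns above all hinge on tracking which output partitions depend on which input partitions. A toy, self-contained illustration of that idea follows; the names are illustrative, not library API.

```scala
// Given the set of input partitions that changed and an output -> inputs
// dependency map, only outputs that depend on a changed input need to be
// recompiled; everything else can be reused from the previous run.
def outputsToRecompile(
    changedInputs: Set[String],
    deps: Map[String, Set[String]]): Set[String] =
  deps.collect { case (out, ins) if ins.exists(changedInputs) => out }.toSet
```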

    The application's main object normally mixes in a runner trait (such as PipelineRunner) to set up the Driver and to interface with the environment where the application runs. See the Main classes in the example compilers for more details.

    com.here.platform.data.processing.catalog, com.here.platform.data.processing.blobstore, and com.here.platform.data.processing.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

    Definition Classes
    data
  • package broadcast

    This package should be considered the preferred way of working with broadcast variables in a Data Processing Library compiler. It provides the basic functionality to create an instance of org.apache.spark.broadcast.Broadcast.

    This functionality is based on org.apache.spark.broadcast.Broadcast which offers a self-contained mechanism to create a single broadcast variable of a generic type.

    The toBroadcast() method is provided to create a broadcast variable and add the hash to the fingerprint of the com.here.platform.data.processing.driver.DriverContext, which is required in order for incremental compilations to work correctly.

    The package also enables developers to query the catalogs without the need to manage versions manually.

    Definition Classes
    processing
  • BroadcastCompiler

abstract class BroadcastCompiler[T] extends InputLayers with InputOptPartitioner

Offers the interface for compiling data that is to be stored in a Spark org.apache.spark.broadcast.Broadcast variable.

Compiled data is an object instance of type T. The suggested use of this functionality is to:

  • create the broadcast variables while setting up the driver.
  • pass the broadcast variables as parameters to the compiler being created.

This compiler class is the preferred way to work with broadcast variables in the Data Processing Library.

T

The type of the object to be stored in the broadcast variable.

Linear Supertypes
InputOptPartitioner, InputLayers, AnyRef, Any

Instance Constructors

  1. new BroadcastCompiler(context: DriverContext, id: String)(implicit arg0: ClassTag[T])

    BroadcastCompiler constructor that specifies the compiler ID.

    context

    The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.

    id

    The ID for the compiler. Used to store the fingerprint of the broadcast variable under an ID, which makes it independent of creation order.

  2. new BroadcastCompiler(context: DriverContext)(implicit arg0: ClassTag[T])

    BroadcastCompiler constructor.

    context

    The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.

Abstract Value Members

  1. abstract def calculateFingerprint(t: T): Int

    Calculates the fingerprint of the compiled data.

    t

    The data to calculate the fingerprint of.

    returns

    The calculated fingerprint.

    Attributes
    protected
    Note

    The fingerprint calculation needs to be stable, otherwise the correctness of incremental compilation will be compromised.
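    For example, a stable fingerprint for map-like compiled data can be obtained by sorting the entries before hashing, so the result does not depend on the map's iteration order. A minimal sketch, assuming the compiled data is a Map[String, Int]; the helper below is illustrative, not library API.

```scala
import scala.util.hashing.MurmurHash3

// Sort entries by key before hashing: the iteration order of an
// unordered Map is not guaranteed, so hashing entries in iteration
// order would produce an unstable fingerprint and compromise
// incremental compilation.
def fingerprintOf(data: Map[String, Int]): Int =
  MurmurHash3.orderedHash(data.toSeq.sortBy(_._1))
```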

  2. abstract def compile(input: RDD[(InKey, InMeta)], parallelism: Int): T

    Compiles an input metadata RDD into an object of type T.

    This function needs to be overridden to perform the compilation of the input metadata RDD.

    input

    The input metadata RDD.

    parallelism

    The parallelism of the RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits.

    returns

    The compilation result.

    Attributes
    protected
  3. abstract def inLayers: Map[Id, Set[Id]]

    Represents layers of the input catalogs that you should query and provide to the compiler. These layers are grouped by input catalog and identified by catalog ID and layer ID.

    Definition Classes
    InputLayers
  4. abstract def inPartitioner(parallelism: Int): Option[Partitioner[InKey]]

    Specifies the partitioner to use when querying the input catalogs. If no partitioner is provided (this function returns None), the executor uses the default partitioner.

    parallelism

    The number of partitions the partitioner should partition the catalog into; this should match the parallelism of the Spark RDD containing the input partitions.

    returns

    The optional input partitioner with the parallelism specified.

    Definition Classes
    InputOptPartitioner
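A sketch of the optional-partitioner contract of inPartitioner: a hypothetical implementation might fall back to the default partitioner below some parallelism threshold and install a custom one otherwise. The key and partitioner types below are simplified stand-ins, not the library's.

```scala
// Simplified stand-in for the library's input key type.
final case class InKey(catalog: String, layer: String, partition: String)

// Simplified stand-in partitioner: routes a key to one of numPartitions
// buckets by hashing its partition name.
final class HashKeyPartitioner(val numPartitions: Int) {
  def partitionOf(key: InKey): Int =
    math.floorMod(key.partition.hashCode, numPartitions)
}

// Returning None tells the executor to use its default partitioner;
// returning Some installs the custom one with the given parallelism.
def inPartitioner(parallelism: Int): Option[HashKeyPartitioner] =
  if (parallelism <= 1) None
  else Some(new HashKeyPartitioner(parallelism))
```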

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def compileBroadcast(parallelism: Int): Broadcast[T]

    Performs the compilation and the related operations to create the broadcast variable and to update the fingerprints. The id passed to the constructor, if specified, is used as the fingerprint ID.

    Class users should call this method. Note that this method updates mutable state shared through the com.here.platform.data.processing.driver.DriverContext object, so multiple calls to this function must happen in a deterministic order.

    parallelism

    The parallelism of the RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits.

    returns

    The resulting broadcast variable.
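    Why the call order must be deterministic can be illustrated with an order-sensitive fingerprint combination; the driver-level accumulator sketched below is a toy stand-in, not the DriverContext implementation.

```scala
import scala.util.hashing.MurmurHash3

// If the overall fingerprint is an ordered combination of per-broadcast
// fingerprints, creating the broadcasts in a different order changes the
// combined result, which would spuriously invalidate incremental state.
def combinedFingerprint(perBroadcast: Seq[Int]): Int =
  MurmurHash3.orderedHash(perBroadcast)
```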

  7. final def compileBroadcastNoUpdateFingerprints(parallelism: Int): Broadcast[T]

    Performs the compilation and the related operations to create the broadcast variable without updating the fingerprints. WARNING: recompilation won't be triggered when a change in the broadcast object is detected.

    parallelism

    The parallelism of the RDDs. This parameter is normally needed to get partitioners from com.here.platform.data.processing.compiler.InputOptPartitioner and/or from com.here.platform.data.processing.compiler.OutputOptPartitioner traits.

    returns

    The resulting broadcast variable.

  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  15. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  16. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  17. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  18. def toString(): String
    Definition Classes
    AnyRef → Any
  19. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()

Inherited from InputOptPartitioner

Inherited from InputLayers

Inherited from AnyRef

Inherited from Any
