package broadcast

This package is the preferred way to work with broadcast variables in a Data Processing Library compiler. It provides the basic functionality for creating an instance of org.apache.spark.broadcast.Broadcast.

This functionality is based on org.apache.spark.broadcast.Broadcast, which offers a self-contained mechanism for creating a single broadcast variable of a generic type.

The toBroadcast() method creates a broadcast variable and adds the hash of the broadcast object to the fingerprints of the com.here.platform.data.processing.driver.DriverContext. Incremental compilation requires this in order to work correctly.

The package also enables developers to query the catalogs without the need to manage versions manually.

Type Members

  1. abstract class BroadcastCompiler[T] extends InputLayers with InputOptPartitioner

    Defines the interface for compiling data that is to be stored in a Spark org.apache.spark.broadcast.Broadcast variable.

    The compiled data is an object instance of type T. The suggested use of this functionality is to:

    • create the broadcast variables during driver setup.
    • pass the broadcast variables as parameters to the compiler that is being created.

    This compiler class is the preferred way to work with broadcast variables in the Data Processing Library.

    T

    The type of the object to be stored in the broadcast variable.

Value Members

  1. def queryInputMeta(context: DriverContext, inLayers: Map[Id, Set[Id]], inPartitioner: InputOptPartitioner, parallelism: Int): RDD[(InKey, InMeta)]

    Gets the input metadata from the input catalogs at the version that needs to be compiled.

    context

    The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.

    inLayers

    The input layers containing the partitions to be queried.

    inPartitioner

    The input partitioner option. If None, a default hash-based partitioner with the default parallelism is used.

    parallelism

    The parallelism of the partitioner.

    returns

    The input metadata RDD.

  2. def toBroadcast[T](context: DriverContext, t: T, fingerprint: Int, fingerprintId: String)(implicit arg0: ClassTag[T]): Broadcast[T]

    Creates a broadcast variable out of an object instance of type T.

    Note that this method updates mutable content in the com.here.platform.data.processing.driver.DriverContext object, that is, it changes shared state. Multiple calls to this method should therefore happen in a deterministic order.

    The normal use of this functionality is to:

    • create the broadcast variables during driver setup.
    • pass the broadcast variables as parameters to the compiler that is being created.

    T

    The type of the broadcast variable under creation.

    context

    The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.

    t

    The object to create a broadcast variable out of.

    fingerprint

    The fingerprint of the parameter t that will be stored in the com.here.platform.data.processing.driver.Fingerprints.

    fingerprintId

    The ID used to store the fingerprint.

    returns

    The broadcast variable.

    Note

    As object hashes are added to the context fingerprints, objects of type T should have stable hashCode() methods, implemented based on the actual object content.
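When T has no content-based hashCode(), as is the case for arrays on the JVM, the explicit fingerprint parameter can carry a hash computed from the content instead. The following is a minimal sketch of computing such a value. The object ContentFingerprint and the helper ofInts are hypothetical names introduced for illustration and are not part of this package; the toBroadcast call itself is left out.

```scala
import java.util.Arrays

// Hypothetical helper, not part of the broadcast package.
object ContentFingerprint {
  // Arrays.hashCode hashes the elements in order, so equal content always
  // yields the same value. Array#hashCode, in contrast, is identity-based.
  def ofInts(values: Array[Int]): Int = Arrays.hashCode(values)

  val first: Array[Int]  = Array(1, 2, 3)
  val second: Array[Int] = Array(1, 2, 3) // same content, different instance

  val firstFingerprint: Int  = ofInts(first)
  val secondFingerprint: Int = ofInts(second)
}
```

The resulting Int would then be passed as the fingerprint argument of toBroadcast, together with a fingerprintId of your choice.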

  3. def toBroadcast[T](context: DriverContext, t: T, fingerprint: Int)(implicit arg0: ClassTag[T]): Broadcast[T]

    Creates a broadcast variable out of an object instance of type T.

    Note that this method updates mutable content in the com.here.platform.data.processing.driver.DriverContext object, that is, it changes shared state. Multiple calls to this method should therefore happen in a deterministic order.

    The normal use of this functionality is to:

    • create the broadcast variables during driver setup.
    • pass the broadcast variables as parameters to the compiler that is being created.

    T

    The type of the broadcast variable under creation.

    context

    The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.

    t

    The object to create a broadcast variable out of.

    fingerprint

    The fingerprint of the parameter t that will be stored in the com.here.platform.data.processing.driver.Fingerprints.

    returns

    The broadcast variable.

    Note

    As object hashes are added to the context fingerprints, objects of type T should have stable hashCode() methods, implemented based on the actual object content.
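The requirement that multiple toBroadcast calls happen in a deterministic order can be illustrated with a toy model of shared mutable state. This sketch assumes an order-sensitive way of combining fingerprints purely for illustration. ToyFingerprints and CallOrderDemo are hypothetical and do not reflect how DriverContext or Fingerprints are actually implemented.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for shared, mutable fingerprint state.
// This is NOT the library's DriverContext or Fingerprints implementation.
final class ToyFingerprints {
  private val entries = ArrayBuffer.empty[Int]

  def add(fingerprint: Int): Unit = entries += fingerprint

  // Assumed order-sensitive combination, chosen only for illustration.
  def combined: Int = entries.foldLeft(17)((acc, h) => 31 * acc + h)
}

object CallOrderDemo {
  // Registers the fingerprints in the given order and returns the combined value.
  def register(fingerprints: Seq[Int]): Int = {
    val state = new ToyFingerprints
    fingerprints.foreach(state.add)
    state.combined
  }

  // The same fingerprints registered in a different order give a different result.
  val forward: Int  = register(Seq(1, 2, 3))
  val backward: Int = register(Seq(3, 2, 1))

  // Sorting first makes the result independent of how the inputs arrived.
  val sortedForward: Int  = register(Seq(1, 2, 3).sorted)
  val sortedBackward: Int = register(Seq(3, 2, 1).sorted)
}
```

In this model, fixing the call order (here by sorting) is what keeps the combined value reproducible between runs.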

  4. def toBroadcast[T](context: DriverContext, t: T, fingerprintId: String)(implicit arg0: ClassTag[T]): Broadcast[T]

    Creates a broadcast variable out of an object instance of type T.

    Note that this method updates mutable content in the com.here.platform.data.processing.driver.DriverContext object, that is, it changes shared state. Multiple calls to this method should therefore happen in a deterministic order.

    The normal use of this functionality is to:

    • create the broadcast variables during driver setup.
    • pass the broadcast variables as parameters to the compiler that is being created.

    T

    The type of the broadcast variable under creation.

    context

    The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.

    t

    The object to create a broadcast variable out of.

    fingerprintId

    The ID used to store the fingerprint of the object.

    returns

    The broadcast variable.

    Note

    As object hashes are added to the context fingerprints, objects of type T should have stable hashCode() methods, implemented based on the actual object content.

  5. def toBroadcast[T](context: DriverContext, t: T)(implicit arg0: ClassTag[T]): Broadcast[T]

    Creates a broadcast variable out of an object instance of type T.

    Note that this method updates mutable content in the com.here.platform.data.processing.driver.DriverContext object, that is, it changes shared state. Multiple calls to this method should therefore happen in a deterministic order.

    The normal use of this functionality is to:

    • create the broadcast variables during driver setup.
    • pass the broadcast variables as parameters to the compiler that is being created.

    T

    The type of the broadcast variable under creation.

    context

    The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.

    t

    The object to create a broadcast variable out of.

    returns

    The broadcast variable.

    Note

    As object hashes are added to the context fingerprints, objects of type T should have stable hashCode() methods, implemented based on the actual object content.
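The note on stable hashCode() methods can be made concrete with plain Scala. The sketch below contrasts a case class, whose hashCode is derived from its field values, with a plain class, which inherits the identity-based hashCode of AnyRef. The types SpeedLimits and OpaqueLimits are hypothetical examples, not library types.

```scala
// Content-based: a case class derives hashCode and equals from its field values.
final case class SpeedLimits(country: String, limitsKmh: Map[String, Int])

// Identity-based: a plain class inherits hashCode from AnyRef, so two instances
// with the same content generally hash differently, also from run to run.
final class OpaqueLimits(val country: String, val limitsKmh: Map[String, Int])

object StableHashDemo {
  val a = SpeedLimits("DE", Map("urban" -> 50, "rural" -> 100))
  val b = SpeedLimits("DE", Map("rural" -> 100, "urban" -> 50)) // same content

  // Equal content implies equal values and therefore equal hash codes.
  val sameValue: Boolean = a == b
  val sameHash: Boolean  = a.hashCode == b.hashCode

  // Changing the content produces a value that no longer equals the original.
  val changed = a.copy(limitsKmh = a.limitsKmh.updated("urban", 30))
  val contentChanged: Boolean = a != changed
}
```

A type like SpeedLimits is a reasonable candidate for T. Note that a case class field whose own hashCode is identity-based, such as an Array, makes the derived hashCode unstable again.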

  6. def toBroadcastNoUpdateFingerprints[T](context: DriverContext, t: T)(implicit arg0: ClassTag[T]): Broadcast[T]

    Creates a broadcast variable out of an object instance of type T without updating the fingerprints. WARNING: recompilation is not triggered when the broadcast object changes.

    The normal use of this functionality is to:

    • create the broadcast variables during driver setup.
    • pass the broadcast variables as parameters to the compiler that is being created.

    T

    The type of the broadcast variable under creation.

    context

    The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.

    t

    The object to create a broadcast variable out of.

    returns

    The broadcast variable.

    Note

    Because object hashes are NOT added to the context fingerprints, recompilation is not triggered when the object changes.
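The consequence of skipping the fingerprint update can be shown with a toy model in which a run is summarized by the fingerprints it registered, and a recompilation is needed only when that summary differs from the previous run. UntrackedChangeDemo and its members are hypothetical, and the combination formula is an assumption made for illustration, not the library's logic.

```scala
object UntrackedChangeDemo {
  // Toy model: a run is summarized by combining the fingerprints it registered.
  // This is NOT how the library computes or compares fingerprints.
  def runSummary(registered: Seq[Int]): Int =
    registered.foldLeft(17)((acc, h) => 31 * acc + h)

  def needsRecompile(previous: Int, current: Int): Boolean = previous != current

  // Tracked: the object's fingerprint changes from 1 to 2 between two runs,
  // so the summaries differ and the change is detected.
  val trackedDetected: Boolean =
    needsRecompile(runSummary(Seq(1)), runSummary(Seq(2)))

  // Untracked: nothing is registered, so both runs look identical
  // even though the broadcast object changed.
  val untrackedDetected: Boolean =
    needsRecompile(runSummary(Seq.empty), runSummary(Seq.empty))
}
```

In this model, trackedDetected is true and untrackedDetected is false, which mirrors the warning above: an untracked broadcast object can change without causing a recompilation.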
