package broadcast
This package is the preferred way of working with broadcast variables in a Data Processing Library compiler. It provides the basic functionality for creating instances of org.apache.spark.broadcast.Broadcast.
This functionality is based on org.apache.spark.broadcast.Broadcast, which offers a self-contained mechanism for creating a single broadcast variable of a generic type.
The toBroadcast() methods create a broadcast variable and add the object's hash to the fingerprints of the com.here.platform.data.processing.driver.DriverContext. This is required for incremental compilation to work correctly.
The package also lets developers query the input catalogs without managing versions manually.
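The fingerprinting idea described above can be pictured with a small, self-contained model. It is a hypothetical sketch, not the Data Processing Library implementation: `FingerprintModel`, `FingerprintRegistry`, and `needsRecompilation` are illustrative names, and the model assumes that a run is recompiled whenever the fingerprints recorded for it differ from those of the previous run.

```scala
import scala.collection.mutable

// Hypothetical model of fingerprint-based change detection.
// None of these names come from the Data Processing Library.
object FingerprintModel {

  // Stand-in for the fingerprints kept in a driver context.
  final class FingerprintRegistry {
    private val entries = mutable.LinkedHashMap.empty[String, Int]

    // Records (or overwrites) the fingerprint stored under `id`.
    def add(id: String, fingerprint: Int): Unit = entries(id) = fingerprint

    // Immutable snapshot, e.g. to persist alongside the output of this run.
    def snapshot: Map[String, Int] = entries.toMap
  }

  // Assumed rule: recompile when any fingerprint differs from the previous
  // run, or when the set of recorded IDs changed.
  def needsRecompilation(previous: Map[String, Int], current: Map[String, Int]): Boolean =
    previous != current
}
```

In this model, equal content yields equal hashes and therefore no recompilation, which is why the hashed objects need content-based hashCode() implementations.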
Type Members
- abstract class BroadcastCompiler[T] extends InputLayers with InputOptPartitioner
Provides the interface for compiling data that is to be stored in a Spark org.apache.spark.broadcast.Broadcast variable.
Compiled data is an object instance of type T. The suggested use of this functionality is to:
- create the broadcast variables while setting up the driver.
- pass the broadcast variables as parameters to the compiler being created.
This compiler class is the preferred way to work with broadcast variables in the Data Processing Library.
- T
The type of the object to be stored in the broadcast variable.
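The suggested flow can be sketched without Spark as follows. All names here are hypothetical: `Shared[T]` stands in for org.apache.spark.broadcast.Broadcast[T] (only its read-only `value` accessor is modeled), `SharedDataCompiler[T]` stands in for a BroadcastCompiler[T] subclass, and `setUpDriver` plays the role of the driver setup phase.

```scala
// Hypothetical sketch of "create shared data during driver setup, then pass
// it to the compiler". These are stand-ins, not Data Processing Library types.
object BroadcastFlowSketch {

  // Stand-in for Spark's Broadcast[T]: read-only access through `value`.
  final class Shared[T](v: T) {
    def value: T = v
  }

  // Stand-in for a BroadcastCompiler[T] subclass: it only has to produce a T.
  trait SharedDataCompiler[T] {
    def compile(): T
  }

  // Example T: a small lookup table (hypothetical content).
  final case class CountryNames(byCode: Map[String, String])

  object CountryNamesCompiler extends SharedDataCompiler[CountryNames] {
    def compile(): CountryNames = CountryNames(Map("DE" -> "Germany", "FR" -> "France"))
  }

  // The downstream compiler receives the shared handle as a constructor
  // parameter instead of building the data itself.
  final class MainCompiler(names: Shared[CountryNames]) {
    def label(code: String): String = names.value.byCode.getOrElse(code, "unknown")
  }

  // "Driver setup": create the shared value once, then build the compiler.
  def setUpDriver(): MainCompiler = {
    val names = new Shared(CountryNamesCompiler.compile())
    new MainCompiler(names)
  }
}
```

Building the shared value once in the driver and injecting it keeps the downstream compiler free of setup logic; in a real application the wrapping step would be a toBroadcast call.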
Value Members
- def queryInputMeta(context: DriverContext, inLayers: Map[Id, Set[Id]], inPartitioner: InputOptPartitioner, parallelism: Int): RDD[(InKey, InMeta)]
Gets the input metadata from the input catalogs at the version that needs to be compiled.
- context
The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.
- inLayers
The input layers containing the partitions to be queried.
- inPartitioner
The input partitioner option. If None, a default hash-based partitioner with the default parallelism is used.
- parallelism
The parallelism of the partitioner.
- returns
The input metadata RDD.
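The fallback described for `inPartitioner` can be sketched as follows, assuming the `parallelism` argument determines the number of partitions of the fallback partitioner. `KeyPartitioner`, `HashKeyPartitioner`, and `resolve` are hypothetical names; the partition index mirrors the non-negative modulo of `hashCode` that Spark's HashPartitioner uses.

```scala
// Hypothetical sketch of "use the given partitioner, else a hash-based one".
// The names are illustrative, not Data Processing Library or Spark types.
object PartitionerFallbackSketch {

  trait KeyPartitioner {
    def numPartitions: Int
    def partitionOf(key: Any): Int
  }

  final class HashKeyPartitioner(val numPartitions: Int) extends KeyPartitioner {
    require(numPartitions > 0, "numPartitions must be positive")

    def partitionOf(key: Any): Int =
      if (key == null) 0
      else {
        // hashCode can be negative; shift the remainder into [0, numPartitions).
        val raw = key.hashCode % numPartitions
        if (raw < 0) raw + numPartitions else raw
      }
  }

  // Falls back to a hash partitioner with the given parallelism when none is supplied.
  def resolve(inPartitioner: Option[KeyPartitioner], parallelism: Int): KeyPartitioner =
    inPartitioner.getOrElse(new HashKeyPartitioner(parallelism))
}
```

For example, with four partitions the key -7 lands in partition 1, because -7 % 4 is -3 on the JVM and the sketch shifts it by 4.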
- def toBroadcast[T](context: DriverContext, t: T, fingerprint: Int, fingerprintId: String)(implicit arg0: ClassTag[T]): Broadcast[T]
Creates a broadcast variable out of an object instance of type T.
This method updates mutable state in the com.here.platform.data.processing.driver.DriverContext object, so it changes shared state. Multiple calls to it should therefore happen in a deterministic order.
The normal use of this functionality is to:
- create the broadcast variables while setting up the driver.
- pass the broadcast variables as parameters to the compiler being created.
- T
The type of the broadcast variable under creation.
- context
The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.
- t
The object to create a broadcast variable out of.
- fingerprint
The fingerprint of the parameter t that will be stored in the com.here.platform.data.processing.driver.Fingerprints.
- fingerprintId
The ID used to store the fingerprint.
- returns
The broadcast variable.
- Note
As object hashes are added to the context fingerprints, objects of type T should have stable hashCode() methods, implemented based on the actual object content.
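The note about stable hashCode() methods can be illustrated with plain Scala types. This is a sketch, not part of the library: case classes derive their hash from their fields, whereas plain classes and arrays inherit identity-based hashing, so two separately built but identical instances usually get different hashes. `StableHashSketch` and its classes are hypothetical.

```scala
import java.util.Arrays

// Hypothetical types contrasting content-based and identity-based hashing.
object StableHashSketch {

  // Case classes derive hashCode from their fields: equal content, equal hash.
  final case class SpeedLimits(kmh: List[Int])

  // A plain class inherits the identity-based hashCode from AnyRef, so two
  // instances with the same content usually hash differently. Avoid as T.
  final class IdentityLimits(val kmh: List[Int])

  // Arrays also hash by identity, so wrap them with content-based
  // hashCode/equals built on java.util.Arrays.
  final class ArrayLimits(val kmh: Array[Int]) {
    override def hashCode(): Int = Arrays.hashCode(kmh)
    override def equals(other: Any): Boolean = other match {
      case that: ArrayLimits => Arrays.equals(kmh, that.kmh)
      case _                 => false
    }
  }
}
```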
- def toBroadcast[T](context: DriverContext, t: T, fingerprint: Int)(implicit arg0: ClassTag[T]): Broadcast[T]
Creates a broadcast variable out of an object instance of type T.
This method updates mutable state in the com.here.platform.data.processing.driver.DriverContext object, so it changes shared state. Multiple calls to it should therefore happen in a deterministic order.
The normal use of this functionality is to:
- create the broadcast variables while setting up the driver.
- pass the broadcast variables as parameters to the compiler being created.
- T
The type of the broadcast variable under creation.
- context
The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.
- t
The object to create a broadcast variable out of.
- fingerprint
The fingerprint of the parameter t that will be stored in the com.here.platform.data.processing.driver.Fingerprints.
- returns
The broadcast variable.
- Note
As object hashes are added to the context fingerprints, objects of type T should have stable hashCode() methods, implemented based on the actual object content.
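When the own hashCode() of T is not content-based, this overload lets the caller supply the fingerprint. A minimal sketch of two ways to compute one using only the Scala standard library; `RawLookup`, `contentFingerprint`, and `versionFingerprint` are hypothetical, and the version-based variant assumes the version string changes whenever the content does.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical helpers for computing an explicit `fingerprint` argument.
object ExplicitFingerprintSketch {

  // Hypothetical payload: raw bytes (identity-hashed array) plus a version tag.
  final class RawLookup(val bytes: Array[Byte], val sourceVersion: String)

  // Option 1: hash the full content; cost grows with the payload size.
  def contentFingerprint(lookup: RawLookup): Int =
    MurmurHash3.bytesHash(lookup.bytes)

  // Option 2: hash a cheap proxy; only valid if the proxy changes whenever
  // the content does (an assumption the caller must guarantee).
  def versionFingerprint(lookup: RawLookup): Int =
    lookup.sourceVersion.hashCode
}
```

The resulting Int would be passed as the `fingerprint` argument. Hashing the full content is robust but costs time proportional to its size; a version proxy is cheap but only as reliable as the versioning behind it.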
- def toBroadcast[T](context: DriverContext, t: T, fingerprintId: String)(implicit arg0: ClassTag[T]): Broadcast[T]
Creates a broadcast variable out of an object instance of type T.
This method updates mutable state in the com.here.platform.data.processing.driver.DriverContext object, so it changes shared state. Multiple calls to it should therefore happen in a deterministic order.
The normal use of this functionality is to:
- create the broadcast variables while setting up the driver.
- pass the broadcast variables as parameters to the compiler being created.
- T
The type of the broadcast variable under creation.
- context
The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.
- t
The object to create a broadcast variable out of.
- fingerprintId
The ID used to store the fingerprint of the object.
- returns
The broadcast variable.
- Note
As object hashes are added to the context fingerprints, objects of type T should have stable hashCode() methods, implemented based on the actual object content.
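The ordering requirement and the role of `fingerprintId` can be illustrated with a hypothetical model. It assumes, which this page does not confirm, that overloads without a `fingerprintId` derive the ID from a per-context call counter; `FingerprintOrderSketch` and its methods are illustrative names only.

```scala
import scala.collection.mutable

// Hypothetical model of shared, mutable fingerprint state. The counter-based
// ID scheme is an assumption made for illustration.
object FingerprintOrderSketch {

  final class Registry {
    private var counter = 0
    private val entries = mutable.Map.empty[String, Int]

    // Explicit ID: the entry key does not depend on when the call happens.
    def add(id: String, hash: Int): Unit = entries(id) = hash

    // Implicit ID: the key is derived from the position of the call.
    def addAuto(hash: Int): Unit = {
      entries(s"broadcast-$counter") = hash
      counter += 1
    }

    def snapshot: Map[String, Int] = entries.toMap
  }

  // Records hashes in the given order using counter-based IDs.
  def withAutoIds(hashes: Seq[Int]): Map[String, Int] = {
    val r = new Registry
    hashes.foreach(r.addAuto)
    r.snapshot
  }

  // Records hashes in the given order using explicit IDs.
  def withExplicitIds(entries: Seq[(String, Int)]): Map[String, Int] = {
    val r = new Registry
    entries.foreach { case (id, h) => r.add(id, h) }
    r.snapshot
  }
}
```

With counter-based IDs, swapping two calls moves hashes to different IDs and looks like a change; explicit IDs keep the recorded fingerprints independent of call order.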
- def toBroadcast[T](context: DriverContext, t: T)(implicit arg0: ClassTag[T]): Broadcast[T]
Creates a broadcast variable out of an object instance of type T.
This method updates mutable state in the com.here.platform.data.processing.driver.DriverContext object, so it changes shared state. Multiple calls to it should therefore happen in a deterministic order.
The normal use of this functionality is to:
- create the broadcast variables while setting up the driver.
- pass the broadcast variables as parameters to the compiler being created.
- T
The type of the broadcast variable under creation.
- context
The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.
- t
The object to create a broadcast variable out of.
- returns
The broadcast variable.
- Note
As object hashes are added to the context fingerprints, objects of type T should have stable hashCode() methods, implemented based on the actual object content.
- def toBroadcastNoUpdateFingerprints[T](context: DriverContext, t: T)(implicit arg0: ClassTag[T]): Broadcast[T]
Creates a broadcast variable out of an object instance of type T without updating the fingerprints. WARNING: because the object's hash is not recorded, a change to the broadcast object does not trigger recompilation.
The normal use of this functionality is to:
- create the broadcast variables while setting up the driver.
- pass the broadcast variables as parameters to the compiler being created.
- T
The type of the broadcast variable under creation.
- context
The com.here.platform.data.processing.driver.DriverContext object that the compiler is running in.
- t
The object to create a broadcast variable out of.
- returns
The broadcast variable.
- Note
Because object hashes are not added to the context fingerprints, a change to the object does not trigger recompilation.
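The warning for this method can be illustrated with a hypothetical model in which a run is recompiled only when its recorded fingerprints differ from those of the previous run. `NoFingerprintSketch` and its members are illustrative names; passing `trackThresholds = false` mimics toBroadcastNoUpdateFingerprints.

```scala
// Hypothetical model: an untracked payload cannot influence change detection.
object NoFingerprintSketch {

  // Hypothetical broadcast payload with a content-based (case class) hashCode.
  final case class Thresholds(values: List[Int])

  // Fingerprints a run would record. With trackThresholds = false the payload
  // is still used by the job, but its hash is never stored.
  def fingerprintsFor(t: Thresholds, trackThresholds: Boolean): Map[String, Int] =
    if (trackThresholds) Map("thresholds" -> t.hashCode) else Map.empty

  // Assumed rule: recompile only when the recorded fingerprints differ.
  def needsRecompilation(previous: Map[String, Int], current: Map[String, Int]): Boolean =
    previous != current
}
```

With tracking disabled, both runs record an empty map, so changed thresholds go unnoticed.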