package broadcast
Provides the basic functionality to perform org.apache.spark.broadcast.Broadcast creation.
To use a broadcast variable in the processing library, developers are required to add the variable's hash to the fingerprints. This ensures that incremental compilations are not compromised. This package object provides the method toBroadcast() to use for that purpose. The package also offers a functionality to query catalogs by properly managing versions. The functionality is based on the org.apache.spark.broadcast.Broadcast which offers a self-contained mechanism to create a single broadcast variable of a generic type.
This compiler class is the preferred way of working with broadcast variables in the Data Processing Library.
- Alphabetic
- By Inheritance
- broadcast
- AnyRef
- Any
- Hide All
- Show All
- Public
- All
Type Members
-
abstract
class
BroadcastCompiler[T] extends InputLayers with InputOptPartitioner
Offers the interface for the data compilation supposed to be stored on a Spark org.apache.spark.broadcast.Broadcast variable.
Offers the interface for the data compilation supposed to be stored on a Spark org.apache.spark.broadcast.Broadcast variable.
Compiled data is an object instance of type T. The suggested use of this functionality is to: - perform the broadcast creations during the driver setting up. - send the broadcast variables to the compiler which is under creation as a parameter.
- T
the type of the object to be stored in the broadcast variable
Value Members
-
def
queryInputMeta(context: DriverContext, inLayers: Map[String, Set[String]], inPartitioner: InputOptPartitioner, parallelism: Int): JavaPairRDD[InKey, InMeta]
Gets the input metadata from the input catalogs at the version that needs to be compiled.
Gets the input metadata from the input catalogs at the version that needs to be compiled.
- context
the com.here.platform.data.processing.java.driver.DriverContext object
- inLayers
the input layers
- inPartitioner
the input partition option, if empty a default hash based partitioner on the default parallelism will be used
- parallelism
the parallelism of both the output RDD
- returns
the input metadata RDD
-
def
toBroadcast[T](context: DriverContext, t: T, fingerprint: Int, fingerprintId: String, clazz: Class[T]): Broadcast[T]
Creates a broadcast variable out of an object instance of type T.
Creates a broadcast variable out of an object instance of type T.
Please note that this method updates a mutable content in the com.here.platform.data.processing.java.driver.DriverContext object: it changes a shared status. Multiple calls to this functionality should then happen in a deterministic order.
The normal use of this functionality is to: - perform the broadcast creations during the driver setting up. - send the broadcast variables to the compiler which is under creation as a parameter.
- T
the type of the broadcast variable under creation
- context
the com.here.platform.data.processing.java.driver.DriverContext object
- t
the object to create a broadcast variable out ot it
- fingerprint
the fingerprint of the parameter t that will be stored in the com.here.platform.data.processing.driver.Fingerprints
- fingerprintId
The ID used to store the fingerprint.
- clazz
the Class of type T
- returns
the broadcast variable
- Note
the stability of the provided fingerprint calculation is crucial to not compromise subsequent incremental compilations
-
def
toBroadcast[T](context: DriverContext, t: T, fingerprint: Int, clazz: Class[T]): Broadcast[T]
Creates a broadcast variable out of an object instance of type T.
Creates a broadcast variable out of an object instance of type T.
Please note that this method updates a mutable content in the com.here.platform.data.processing.java.driver.DriverContext object: it changes a shared status. Multiple calls to this functionality should then happen in a deterministic order.
The normal use of this functionality is to: - perform the broadcast creations during the driver setting up. - send the broadcast variables to the compiler which is under creation as a parameter.
- T
the type of the broadcast variable under creation
- context
the com.here.platform.data.processing.java.driver.DriverContext object
- t
the object to create a broadcast variable out ot it
- fingerprint
the fingerprint of the parameter t that will be stored in the com.here.platform.data.processing.driver.Fingerprints
- clazz
the Class of type T
- returns
the broadcast variable
- Note
the stability of the provided fingerprint calculation is crucial to not compromise subsequent incremental compilations
-
def
toBroadcast[T](context: DriverContext, t: T, fingerprintId: String, clazz: Class[T]): Broadcast[T]
Creates a broadcast variable out of an object instance of type T.
Creates a broadcast variable out of an object instance of type T.
Please note that this method updates a mutable content in the com.here.platform.data.processing.java.driver.DriverContext object: it changes a shared status. Multiple calls to this functionality should then happen in a deterministic order.
The normal use of this functionality is to: - perform the broadcast creations during the driver setting up. - send the broadcast variables to the compiler which is under creation as a parameter.
- T
the type of the broadcast variable under creation
- context
the com.here.platform.data.processing.java.driver.DriverContext object
- t
the object to create a broadcast variable out ot it
- fingerprintId
The ID used to store the fingerprint.
- clazz
the Class of type T
- returns
the broadcast variable
- Note
as object hashes are added to fingerprints, objects of type T are supposed to have stable hashCode() methods, implemented based of the actual object content
-
def
toBroadcast[T](context: DriverContext, t: T, clazz: Class[T]): Broadcast[T]
Creates a broadcast variable out of an object instance of type T.
Creates a broadcast variable out of an object instance of type T.
Please note that this method updates a mutable content in the com.here.platform.data.processing.java.driver.DriverContext object: it changes a shared status. Multiple calls to this functionality should then happen in a deterministic order.
The normal use of this functionality is to: - perform the broadcast creations during the driver setting up. - send the broadcast variables to the compiler which is under creation as a parameter.
- T
the type of the broadcast variable under creation
- context
the com.here.platform.data.processing.java.driver.DriverContext object
- t
the object to create a broadcast variable out ot it
- clazz
the Class of type T
- returns
the broadcast variable
- Note
as object hashes are added to fingerprints, objects of type T are supposed to have stable hashCode() methods, implemented based of the actual object content
-
def
toBroadcastNoUpdateFingerprints[T](context: DriverContext, t: T, clazz: Class[T]): Broadcast[T]
Creates a broadcast variable out of an object instance of type T.
Creates a broadcast variable out of an object instance of type T. WARNING: the recompilation won't be triggered if a change of the broadcast object is detected.
The normal use of this functionality is to: - perform the broadcast creations during the driver setting up. - send the broadcast variables to the compiler which is under creation as a parameter.
- T
the type of the broadcast variable under creation
- context
the com.here.platform.data.processing.java.driver.DriverContext object
- t
the object to create a broadcast variable out ot it
- clazz
the Class of type T
- returns
the broadcast variable
- Note
As object hashes AREN'T added to the context fingerprints, so the recompilation won't be triggered if a change of the object is detected.