com.here.platform.data.processing.compiler

MapGroupCompiler

trait MapGroupCompiler[T] extends InputLayers with InputOptPartitioner with CompileInFn[T] with OutputLayers with OutputOptPartitioner with CompileOutFn[T]

Compiler that implements a generic Map-Reduce pattern in which the reduce function is a group-by. The front-end compiles each input partition and produces the list of output partitions that this input affects, each with a value of a custom type. Values are then grouped per output partition and passed to the back-end, which produces the output map.

This pattern is a more general version of Direct1ToNCompiler and DirectMToNCompiler: it not only supports an M:N input/output relationship, but also allows that relationship to be a function of the input payloads, that is, of the input content.

However, this pattern compiles each input partition in isolation: compiling one input partition sees the data and metadata of that partition only. If the front-end needs to look up information from additional input partitions, refer to RefTreeCompiler instead.
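The flow described above can be sketched with plain Scala collections. This is an illustration only, not the library's implementation: `Road`, `frontEnd`, `backEnd` and the string keys are hypothetical stand-ins, and the sketch passes decoded content directly to the front-end, whereas the real `compileInFn` receives an `(InKey, InMeta)` pair.

```scala
// Plain-Scala sketch of the map / group-by flow. Every name here is a
// hypothetical stand-in; none of them are types from the library.
object MapGroupSketch {
  type InKey  = String // stand-in for an input partition key
  type OutKey = String // stand-in for an output partition key

  // Hypothetical decoded payload: a road and the output tiles it touches.
  final case class Road(id: Int, tiles: Seq[OutKey])

  // Front-end: one input partition in, (output key, intermediate value) pairs out.
  // The output keys depend on the payload content, not only on the input key.
  def frontEnd(in: (InKey, Seq[Road])): Iterable[(OutKey, Int)] =
    for { road <- in._2; tile <- road.tiles } yield tile -> road.id

  // Back-end: all intermediate values grouped for one output key in, one result out.
  def backEnd(out: OutKey, values: Iterable[Int]): String =
    values.toSeq.sorted.mkString(",")

  val inputs: Map[InKey, Seq[Road]] = Map(
    "in-A" -> Seq(Road(1, Seq("t1", "t2")), Road(2, Seq("t2"))),
    "in-B" -> Seq(Road(3, Seq("t2", "t3")))
  )

  // Each input partition is compiled on its own; values are then grouped per output key.
  val grouped: Map[OutKey, Seq[Int]] =
    inputs.toSeq.flatMap(frontEnd).groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  val output: Map[OutKey, String] =
    grouped.map { case (k, vs) => k -> backEnd(k, vs) }

  def main(args: Array[String]): Unit =
    output.toSeq.sortBy(_._1).foreach(println)
}
```

Note that `frontEnd` never looks at any partition other than the one it is given, which mirrors the standalone-compilation constraint of this pattern.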

T: the custom type of the values passed between front-end and back-end

Note: the implementation must be scala.Serializable because it is copied to workers and run inside Spark map functions.
See also: traits mixed in for more details
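One way to check the serializability requirement early is to round-trip an instance through Java serialization in a unit test. The sketch below assumes nothing about the library: `TileScaler`, `Leaky`, `roundTrip` and `isSerializable` are hypothetical names used only for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.util.Try

object SerializableCheck {
  // Hypothetical compiler-like class that holds only serializable state.
  class TileScaler(val factor: Int) extends Serializable {
    def scale(x: Int): Int = x * factor
  }

  // Hypothetical class that is marked Serializable but holds a non-serializable field.
  class Leaky extends Serializable {
    val handle: AnyRef = new Object // java.lang.Object is not Serializable
  }

  // Writes the value with Java serialization and reads it back.
  def roundTrip[A <: AnyRef](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out   = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  // True if the value survives the round trip without an exception.
  def isSerializable(value: AnyRef): Boolean = Try(roundTrip(value)).isSuccess

  def main(args: Array[String]): Unit = {
    println(roundTrip(new TileScaler(3)).scale(2))
    println(isSerializable(new Leaky))
  }
}
```

Extending `Serializable` is not sufficient on its own: every field reachable from the instance must be serializable too, which is what the `Leaky` case illustrates.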

Linear Supertypes

CompileOutFn[T], OutputOptPartitioner, OutputLayers, CompileInFn[T], Serializable, Serializable, InputOptPartitioner, InputLayers, AnyRef, Any

Known Subclasses

ConcreteDirectMapGroupCompiler, WrapperMapGroupCompiler

Abstract Value Members

abstract def compileInFn(in: (InKey, InMeta)): Iterable[(OutKey, T)]
Calculates the dependent output partitions and intermediate results from a single input partition.
in: the input partition to process
returns: all the impacted output partitions (com.here.platform.data.processing.compiler.OutKey), each with intermediate data of type T, for this input partition. The result may contain more than one element per output key. compileOutFn is only called for output keys that have at least one intermediate value from this phase; all other output keys are automatically deleted.

Definition Classes
CompileInFn
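The return contract of compileInFn can be pictured with plain collections. In this sketch, the set of `existing` output partitions and the reading of "deleted" as "removed from that set" are assumptions made for illustration; all names are hypothetical and no library types are used.

```scala
object OutKeyLifecycleSketch {
  type OutKey = String // hypothetical stand-in for the library's OutKey

  // Hypothetical front-end results collected from all input partitions.
  // "t2" appears twice: more than one element per output key is allowed.
  val intermediate: Seq[(OutKey, Int)] = Seq("t1" -> 10, "t2" -> 20, "t2" -> 21)

  // Hypothetical output partitions left over from a previous run (assumption).
  val existing: Set[OutKey] = Set("t1", "t2", "t9")

  // Intermediate values grouped per output key.
  val grouped: Map[OutKey, Seq[Int]] =
    intermediate.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Only keys with at least one intermediate value reach the back-end.
  val toCompile: Set[OutKey] = grouped.keySet

  // Keys with no intermediate value are the ones that get deleted.
  val toDelete: Set[OutKey] = existing -- toCompile

  def main(args: Array[String]): Unit = {
    println(s"compile: ${toCompile.toSeq.sorted}")
    println(s"delete:  ${toDelete.toSeq.sorted}")
  }
}
```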
abstract def compileOutFnDefined(): Unit
Must be overridden as final by all subclasses, to block the mixin of different interfaces in the same compiler class and to ensure that at least one child interface is mixed in.

Attributes
protected
Definition Classes
CompileOutFn
abstract def inLayers: Map[Id, Set[Id]]
Represents layers of the input catalogs that you should query and provide to the compiler. These layers are grouped by input catalog and identified by catalog ID and layer ID.

Definition Classes
InputLayers
abstract def inPartitioner(parallelism: Int): Option[Partitioner[InKey]]
Specifies the partitioner to use when querying the input catalogs. If this function returns None, the Executor uses the default partitioner.
parallelism: the number of partitions the partitioner should partition the catalog into; this should match the parallelism of the Spark RDD containing the input partitions.
returns: the optional input partitioner with the parallelism specified.

Definition Classes
InputOptPartitioner
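The Option-based contract can be sketched as a simple fallback. `KeyPartitioner`, `HashKeyPartitioner`, `ModuloPartitioner` and `resolve` are hypothetical stand-ins, not the library's `Partitioner` or Executor logic, and the use of a hash partitioner as the default is an assumption made for illustration.

```scala
object PartitionerFallbackSketch {
  // Hypothetical stand-in for a partitioner; not the library's Partitioner type.
  trait KeyPartitioner[K] {
    def numPartitions: Int
    def partitionOf(key: K): Int
  }

  // Assumed default: spread keys by hash code.
  final class HashKeyPartitioner[K](val numPartitions: Int) extends KeyPartitioner[K] {
    def partitionOf(key: K): Int = Math.floorMod(key.hashCode, numPartitions)
  }

  // A custom partitioner for integer keys.
  final class ModuloPartitioner(val numPartitions: Int) extends KeyPartitioner[Int] {
    def partitionOf(key: Int): Int = Math.floorMod(key, numPartitions)
  }

  // None means "no preference": the caller falls back to its default,
  // built with the same parallelism.
  def resolve[K](custom: Option[KeyPartitioner[K]], parallelism: Int): KeyPartitioner[K] =
    custom.getOrElse(new HashKeyPartitioner[K](parallelism))

  def main(args: Array[String]): Unit = {
    val fallback = resolve[String](None, 8)
    println(fallback.numPartitions)
    val custom = resolve(Some(new ModuloPartitioner(8)), 8)
    println(custom.partitionOf(-3))
  }
}
```

In both branches the resulting partitioner has the requested parallelism, which is the property the `parallelism` parameter is meant to guarantee.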
abstract def outLayers: Set[Id]
Layers to be produced by the compiler.

Definition Classes
OutputLayers
abstract def outPartitioner(parallelism: Int): Option[Partitioner[OutKey]]
Specifies the partitioner to use when querying the output catalog and producing output data. If this function returns None, the Executor uses the default partitioner.
parallelism: the number of partitions the partitioner should partition the catalog into; this should match the parallelism of the Spark RDD containing the output partitions.
returns: the optional output partitioner with the parallelism specified.

Definition Classes
OutputOptPartitioner

Concrete Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native() @IntrinsicCandidate()
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native() @IntrinsicCandidate()
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native() @IntrinsicCandidate()
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native() @IntrinsicCandidate()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native() @IntrinsicCandidate()
final val outCatalogId: Id
Identifier for the output catalog.

Definition Classes
OutputLayers
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Deprecated Value Members

def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] ) @Deprecated
Deprecated
