HERE Data SDK - Scala API references

Packages

package root

Definition Classes
root
package com

Definition Classes
root
package here

Definition Classes
com
package platform

Definition Classes
here
package data

Definition Classes
platform
package processing
This package provides the Data Processing Library for building distributed data processing applications.
A Runner implements the interface with the environment in which an application runs, and starts the application. The application, in turn, is driven by a Driver, which controls and performs the distributed processing.
Choose the Runner best suited for the environment where the application runs.
The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.
The main entry point in the processing library is the com.here.platform.data.processing.driver.DriverBuilder class where you can add different kinds of tasks to the driver. The driver runs the tasks, and commits the final results to the output catalog.
Tasks are implemented using one or more compilers.
The simplest compiler is the direct compiler which maps each input tile to N output tiles. The application needs to define com.here.platform.data.processing.compiler.Direct1ToNCompiler.
Other more complex compilation patterns are based on some kind of dependency tracking between input partitions and output partitions.
The processing library supports the following patterns:
- com.here.platform.data.processing.compiler.NonIncrementalCompiler: non-incremental compilation only
- com.here.platform.data.processing.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
- com.here.platform.data.processing.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
- com.here.platform.data.processing.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends only on one input tile, and this mapping is independent from tile content
- com.here.platform.data.processing.compiler.DirectMToNCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping is independent from tile content
- com.here.platform.data.processing.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
- com.here.platform.data.processing.compiler.RefTreeCompiler: fully-managed two-phase incremental compilation that can resolve references between input partitions. Input/output dependency management is implemented by the library, so the developer doesn't need to provide this logic
The application's main object normally mixes in a runner trait (like PipelineRunner) to set up the Driver and to interface with the environment where the application runs. See the Main classes in the example compilers for more details.
com.here.platform.data.processing.catalog, com.here.platform.data.processing.blobstore, and com.here.platform.data.processing.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

Definition Classes
data
package java
This package provides Java bindings for the Data Processing Library, to build distributed data processing applications in Java.
A Runner implements the interface with the environment in which an application runs, and starts the application. The application, in turn, is driven by a Driver, which controls and performs the distributed processing.
Choose the Runner best suited for the environment where the application runs.
The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.
The main entry point in the processing library is the com.here.platform.data.processing.java.driver.DriverBuilder class where you can add different kinds of tasks to the driver. The driver runs the tasks, and commits the final results to the output catalog.
Tasks are implemented using one or more compilers.
The simplest compiler is the direct compiler which maps each input tile to N output tiles. The application needs to implement a com.here.platform.data.processing.java.compiler.Direct1ToNCompiler.
Other more complex compilation patterns are based on different types of dependency tracking between input partitions and output partitions.
The processing library supports the following patterns:
- com.here.platform.data.processing.java.compiler.NonIncrementalCompiler: non-incremental compilation only
- com.here.platform.data.processing.java.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
- com.here.platform.data.processing.java.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
- com.here.platform.data.processing.java.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends only on one input tile, and this mapping is independent from tile content
- com.here.platform.data.processing.java.compiler.DirectMToNCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping is independent from tile content
- com.here.platform.data.processing.java.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
- com.here.platform.data.processing.java.compiler.RefTreeCompiler: fully-managed two-phase incremental compilation that can resolve references between input partitions. Input/output dependency management is implemented by the library, so the developer doesn't need to provide this logic
The application's main object normally extends a runner class (like PipelineRunner) to set up the Driver and to interface with the environment where the application runs. For more details, see the Main classes in the example compilers.
com.here.platform.data.processing.java.catalog, com.here.platform.data.processing.java.blobstore, and com.here.platform.data.processing.java.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

Definition Classes
processing
package blobstore
Java bindings for the blobstore package.

Definition Classes
java
package broadcast
Provides the basic functionality to perform org.apache.spark.broadcast.Broadcast creation.
To use a broadcast variable in the processing library, developers must add the variable's hash to the fingerprints. This ensures that incremental compilations are not compromised. This package object provides the toBroadcast() method for that purpose. The package also offers functionality to query catalogs while properly managing versions. The functionality is based on org.apache.spark.broadcast.Broadcast, which offers a self-contained mechanism to create a single broadcast variable of a generic type.
This compiler class is the preferred way of working with broadcast variables in the Data Processing Library.

Definition Classes
java
package catalog
Java bindings for the catalog package.

Definition Classes
java
package compiler

Definition Classes
java
package direct
package mapgroup
package reftree
CompileOut1To1Fn
CompileOut1ToNFn
CompileOutFn
DepCompiler
Direct1ToNCompiler
DirectMToNCompiler
IncrementalDepCompiler
InputLayers
InputOptPartitioner
InputPartitioner
MapGroupCompiler
NonIncrementalCompiler
OutputLayers
OutputOptPartitioner
OutputPartitioner
RefTreeCompiler
package driver

Definition Classes
java
package impl

Definition Classes
java
package leveling

Definition Classes
java
package publisher

Definition Classes
java
package spark

Definition Classes
java
package utils

Definition Classes
java

com.here.platform.data.processing.java

compiler

package compiler

Linear Supertypes

AnyRef, Any

Type Members

trait CompileOut1To1Fn[T] extends CompileOutFn[T]
Produces one com.here.platform.data.processing.java.blobstore.Payload for each "nominal" output partition the compiler generates.
T
the custom type of the values passed between front-end and back-end

Note
This is a Java friendly version of com.here.platform.data.processing.compiler.CompileOut1To1Fn.
trait CompileOut1ToNFn[T] extends CompileOutFn[T]
Produces multiple OutKeys and com.here.platform.data.processing.java.blobstore.Payloads for each "nominal" output partition the compiler generates.
T
the custom type of the values passed between front-end and back-end

Note
This is a Java friendly version of com.here.platform.data.processing.compiler.CompileOut1ToNFn.
trait CompileOutFn[T] extends Serializable
Back-end of DirectMToNCompiler, MapGroupCompiler and RefTreeCompiler. Receives values of a custom type produced by the front-end and produces the final payloads that are then uploaded and committed to the output catalog.
T
the custom type of the values passed between front-end and back-end

Note
The implementation must be scala.Serializable as this is copied to workers and run inside Spark map functions.
Users should mix one of the interfaces extending this one into their implementation of DirectMToNCompiler, MapGroupCompiler or RefTreeCompiler, according to the desired behaviour.
This is a Java friendly version of com.here.platform.data.processing.compiler.CompileOutFn.
trait DepCompiler[T] extends InputLayers with InputPartitioner with OutputLayers with OutputPartitioner
Interface for a basic incremental dependency-based compiler.
Calculated dependencies must be exhaustive, especially for the incremental case, because the compiler schedules compilation of the output partitions returned by the dependency calculation without any further intermediate processing.
Returning non-exhaustive dependencies produces a corrupted output catalog when doing incremental compilation, or an output catalog with missing or invalid data when doing full compilation.
The compiler is efficient only when the cost of calculating dependencies is negligible compared to the cost of actually producing the output map, because dependency calculation is always triggered with the full input map.
T
The type of the values collected from dependencies for each output partition

Note
This is a Java friendly version of com.here.platform.data.processing.compiler.DepCompiler.
See also
traits mixed in for more details
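The requirement that dependencies be exhaustive can be illustrated with a small, self-contained sketch. It does not use the library's API: the class DepSketch, the method affectedOutputs, and the use of plain strings as partition keys are all hypothetical, chosen only to show how an incomplete dependency set causes an output partition to be skipped in an incremental run.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not the library API: partition keys are plain strings.
final class DepSketch {

    // Returns the output keys whose declared dependencies contain at least one changed input.
    // Only these outputs would be scheduled for recompilation in an incremental run.
    static Set<String> affectedOutputs(Map<String, Set<String>> dependencies,
                                       Set<String> changedInputs) {
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : dependencies.entrySet()) {
            for (String input : entry.getValue()) {
                if (changedInputs.contains(input)) {
                    affected.add(entry.getKey());
                    break;
                }
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        // outA really depends on in1 and in2; outB depends on in2 only.
        Map<String, Set<String>> exhaustive =
            Map.of("outA", Set.of("in1", "in2"), "outB", Set.of("in2"));
        // Same outputs, but the dependency of outA on in2 was not reported.
        Map<String, Set<String>> incomplete =
            Map.of("outA", Set.of("in1"), "outB", Set.of("in2"));
        Set<String> changed = Set.of("in2");

        System.out.println(affectedOutputs(exhaustive, changed)); // prints [outA, outB]
        System.out.println(affectedOutputs(incomplete, changed)); // prints [outB]
    }
}
```

In the second call outA is never rescheduled although one of its real inputs changed, so it would keep stale content in the output catalog.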
trait Direct1ToNCompiler[T] extends InputLayers with InputOptPartitioner with CompileInFn[T] with OutputLayers with OutputOptPartitioner with compiler.direct.CompileOutFn[T]
This compiler allows stateless incremental compilation when the keys of output partitions are a function of the input keys only, that is, when the input/output mapping does not depend on the input content.
This compiler is applicable only when each output partition is affected by a single input partition. This restriction is enforced, but it allows a very efficient implementation of incremental compilation. The limitation is on the output side only: one input partition may be mapped to zero, one or more output partitions, but each output partition can be mapped to by only one input partition, without overlaps.
In case multiple input partitions may affect the same output partition, the DirectMToNCompiler should be used instead.
T
the type of the values passed between front-end and back-end

Note
The implementation must be scala.Serializable as this is copied to workers and run inside Spark map functions.
This is a Java friendly version of com.here.platform.data.processing.compiler.Direct1ToNCompiler.
See also
extended interfaces for more details
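The 1:N restriction described for Direct1ToNCompiler can be sketched without the library. In the example below everything is an assumption made for illustration: the class Direct1ToNSketch, the mapping function mappingFn, and string partition keys. It shows a key-only mapping, a check that no output key is claimed by two input keys, and why a changed input identifies its outputs without any stored state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model, not the library API: partition keys are plain strings.
final class Direct1ToNSketch {

    // Hypothetical key-only mapping: input "N" feeds outputs "N/a" and "N/b".
    // The mapping looks at the key only, never at the partition content.
    static List<String> mappingFn(String inKey) {
        return List.of(inKey + "/a", inKey + "/b");
    }

    // Returns the output keys claimed by more than one input key.
    // An empty result means the 1:N restriction holds for the given inputs.
    static List<String> overlappingOutputs(List<String> inKeys,
                                           Function<String, List<String>> mapping) {
        Map<String, String> owner = new HashMap<>(); // output key -> first input key seen
        List<String> overlaps = new ArrayList<>();
        for (String in : inKeys) {
            for (String out : mapping.apply(in)) {
                String previous = owner.putIfAbsent(out, in);
                if (previous != null && !previous.equals(in)) {
                    overlaps.add(out);
                }
            }
        }
        return overlaps;
    }

    // With a key-only 1:N mapping, the outputs affected by a change are exactly
    // the outputs of the changed inputs; no dependency state has to be stored.
    static List<String> outputsToRecompile(List<String> changedInKeys) {
        List<String> result = new ArrayList<>();
        for (String in : changedInKeys) {
            result.addAll(mappingFn(in));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(overlappingOutputs(List.of("1", "2"), Direct1ToNSketch::mappingFn)); // prints []
        System.out.println(outputsToRecompile(List.of("2"))); // prints [2/a, 2/b]
    }
}
```

A mapping that sends two different inputs to the same output key would make overlappingOutputs return that key, which is the situation where DirectMToNCompiler applies instead.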
trait DirectMToNCompiler[T] extends InputLayers with InputOptPartitioner with CompileInFn[T] with OutputLayers with OutputOptPartitioner with CompileOutFn[T]
This compiler allows stateless incremental compilation when the keys of output partitions are a function of the input keys only, that is, when the input/output mapping does not depend on the input content.
This compiler allows multiple input partitions to map to the same output partition. In the more limited case where each output partition is affected by a single input partition, use Direct1ToNCompiler instead, which enforces this property and can perform incremental compilation even more efficiently.
T
the type of the values passed between front-end and back-end

Note
The implementation must be scala.Serializable as this is copied to workers and run inside Spark map functions.
This is a Java friendly version of com.here.platform.data.processing.compiler.DirectMToNCompiler.
See also
extended interfaces for more details
type InChange = Change
type InChanges = JavaPairRDD[InKey, InChange]
type InData = JavaPairRDD[InKey, InMeta]
type InKey = Key
type InMeta = Meta
trait IncrementalDepCompiler[T, CarryOver] extends DepCompiler[T]
Improved version of DepCompiler that can update the dependencies from the previous run instead of recalculating them from scratch.
The non-incremental case works exactly as in DepCompiler.
The incremental case is implemented in three parts:
1) a function that updates the previous dependency graph given changes from the input.
2) a function that, given a subgraph of the dependency graph, calculates the Ts.
3) the compile function, the same as in DepCompiler, which is invoked on the output partitions that are affected by the changes.
A mechanism to carry over an arbitrary intermediate result from the first function to the second function is also provided.
T
the type of the values collected from dependencies for each output partition
CarryOver
the type of the opaque object carried over from 1st to 2nd function

Note
This is a Java friendly version of com.here.platform.data.processing.compiler.IncrementalDepCompiler.
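The three parts of the incremental case can be sketched in plain Java. This is only an illustration under stated assumptions, not the library's interface: the class IncrementalDepSketch, its methods updateGraph, collectValues and compile, the Update holder, and the choice of the affected output set as the carried-over value are all hypothetical. Here T is modelled as the set of input keys contributing to an output.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model, not the library API: partition keys are plain strings.
final class IncrementalDepSketch {

    // Result of part 1: the updated graph plus a value carried over to part 2.
    static final class Update {
        final Map<String, Set<String>> graph; // input key -> output keys it contributes to
        final Set<String> affectedOutputs;    // carried over from part 1 to part 2
        Update(Map<String, Set<String>> graph, Set<String> affectedOutputs) {
            this.graph = graph;
            this.affectedOutputs = affectedOutputs;
        }
    }

    // Part 1: patch the previous graph using the outputs recomputed for changed inputs only.
    // An empty set in `changed` means the input was deleted.
    static Update updateGraph(Map<String, Set<String>> previous,
                              Map<String, Set<String>> changed) {
        Map<String, Set<String>> graph = new HashMap<>(previous);
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : changed.entrySet()) {
            affected.addAll(previous.getOrDefault(e.getKey(), Set.of())); // outputs it used to feed
            affected.addAll(e.getValue());                                // outputs it feeds now
            if (e.getValue().isEmpty()) {
                graph.remove(e.getKey());
            } else {
                graph.put(e.getKey(), e.getValue());
            }
        }
        return new Update(graph, affected);
    }

    // Part 2: for each affected output, collect its T from the relevant subgraph.
    static Map<String, Set<String>> collectValues(Update update) {
        Map<String, Set<String>> values = new TreeMap<>();
        for (String out : update.affectedOutputs) {
            values.put(out, new TreeSet<>());
        }
        for (Map.Entry<String, Set<String>> e : update.graph.entrySet()) {
            for (String out : e.getValue()) {
                if (values.containsKey(out)) {
                    values.get(out).add(e.getKey());
                }
            }
        }
        return values;
    }

    // Part 3: compile one affected output from its T; a string stands in for the payload.
    static String compile(String outKey, Set<String> inputs) {
        return outKey + "<-" + String.join("+", inputs);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> previous =
            Map.of("in1", Set.of("outA"), "in2", Set.of("outA", "outB"));
        // in2 changed: it no longer feeds outA and now also feeds outC.
        Update update = updateGraph(previous, Map.of("in2", Set.of("outB", "outC")));
        collectValues(update).forEach((out, t) -> System.out.println(compile(out, t)));
        // prints outA<-in1, outB<-in2 and outC<-in2, one per line
    }
}
```

Note that outA is recompiled even though in2 no longer contributes to it, because the previous graph shows that it used to.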
trait InputLayers extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.InputLayers.
trait InputOptPartitioner extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.InputOptPartitioner.
trait InputPartitioner extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.InputPartitioner.
trait MapGroupCompiler[T] extends InputLayers with InputOptPartitioner with CompileInFn[T] with OutputLayers with OutputOptPartitioner with CompileOutFn[T]
Compiler to implement a generic Map-Reduce pattern, where the reduce function is group-by. The front-end compiles each input partition and produces the list of output partitions that this input affects, each with a value of a custom type. Values are then grouped per output partition and passed to the back-end, which produces the output map.
This pattern is a more general version of Direct1ToNCompiler and DirectMToNCompiler: not only is an M:N input/output relationship supported, but this relationship is a function of the input payloads, that is, of the input content.
This pattern, however, compiles input partitions standalone, meaning that compiling one input partition sees the data and metadata of that partition only. If the front-end needs to look up information from additional input partitions, refer to RefTreeCompiler.
T
the custom type of the values passed between front-end and back-end

Note
The implementation must be scala.Serializable as this is copied to workers and run inside Spark map functions.
This is a Java friendly version of com.here.platform.data.processing.compiler.MapGroupCompiler.
See also
extended interfaces for more details
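The map and group-by flow described for MapGroupCompiler can be sketched with plain collections. The names below (MapGroupSketch, compileIn, group, compileOut) and the toy payload format are assumptions for illustration only; they mirror the front-end and back-end roles but are not the library's signatures. Each input payload is modelled as a list of non-empty words, and the output key of a word is its first letter, so the input/output mapping depends on content.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not the library API: keys are strings, payloads are word lists.
final class MapGroupSketch {

    // Front-end: sees one input partition only and emits (output key, value) pairs.
    // The output key is derived from the content (first letter of each non-empty word).
    static List<Map.Entry<String, String>> compileIn(String inKey, List<String> words) {
        List<Map.Entry<String, String>> emitted = new ArrayList<>();
        for (String word : words) {
            emitted.add(new SimpleEntry<>(word.substring(0, 1), inKey + ":" + word));
        }
        return emitted;
    }

    // Group-by step: values emitted by all inputs are collected per output key.
    static Map<String, List<String>> group(Map<String, List<String>> inputs) {
        Map<String, List<String>> ordered = new TreeMap<>(inputs); // deterministic input order
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> in : ordered.entrySet()) {
            for (Map.Entry<String, String> kv : compileIn(in.getKey(), in.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        return grouped;
    }

    // Back-end: builds one payload per output key from its grouped values.
    static String compileOut(String outKey, List<String> values) {
        return outKey + "=" + values;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inputs =
            Map.of("t1", List.of("apple", "bear"), "t2", List.of("ant"));
        group(inputs).forEach((out, values) -> System.out.println(compileOut(out, values)));
        // prints a=[t1:apple, t2:ant] and then b=[t1:bear]
    }
}
```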
trait NonIncrementalCompiler extends InputLayers with InputPartitioner with OutputLayers with OutputPartitioner
The simplest interface that an actual compiler can implement. This leaves the maximum flexibility to the compiler implementors, but has no support for incremental compilation.

Note
This is a Java friendly version of com.here.platform.data.processing.compiler.NonIncrementalCompiler.
See also
extended interfaces for more details
type OutKey = Key
type OutMeta = Meta
trait OutputLayers extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.OutputLayers.
trait OutputOptPartitioner extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.OutputOptPartitioner.
trait OutputPartitioner extends AnyRef
Java friendly version of com.here.platform.data.processing.compiler.OutputPartitioner.
trait RefTreeCompiler[T] extends InputLayers with InputOptPartitioner with ResolveInFn with CompileInFn[T] with OutputLayers with OutputOptPartitioner with CompileOutFn[T]
A RefTreeCompiler allows full and incremental compilation with complex reference structures. A condition is that all references of a layer can be calculated purely from the source data of this layer using a resolve function, which gets only the metadata of one partition as an input. The structure of the references has to be predefined in a reftree.RefTree object.
Apart from the reference resolution pre-phase described above, the compilation itself is split into two phases, front-end and back-end, as in other compilers such as MapGroupCompiler.
In the first phase, compileIn from reftree.CompileInFn is called for every partition, with the full list of metadata for all of its referenced partitions. This method returns one or more values of type T for every impacted output partition.
The compileIn function for the first phase of the compilation is defined in traits that extend reftree.CompileInFn. One of these needs to be mixed in, such as reftree.CompileInFnWithRefs or reftree.CompileInFnWithRefsReturnsReferences.
In the second phase, a method from one of the CompileOutFn traits is called for every output partition where the first phase provided at least one element of T. Elements coming from various input partitions are grouped together and provided as input of compilation for each output partition.
T
The custom type of the values passed between front-end and back-end

Note
The implementation must be scala.Serializable as this is copied to workers and run inside Spark map functions.
This is a Java friendly version of com.here.platform.data.processing.compiler.RefTreeCompiler.
See also
extended interfaces for more details
type ToPublish = JavaPairRDD[OutKey, Option[Payload]]
