Packages

package processing

This package provides the Data Processing Library for building distributed data processing applications.

A Runner implements the interface between the application and the environment it runs in, and starts the application. The application, in turn, is driven by a Driver, which controls and performs the distributed processing.

Choose a Runner best suited for the environment where the application runs.

The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.

The main entry point in the processing library is the com.here.platform.data.processing.driver.DriverBuilder class where you can add different kinds of tasks to the driver. The driver runs the tasks, and commits the final results to the output catalog.

Tasks are implemented using one or more compilers.

The simplest compiler is the direct compiler, which maps each input tile to N output tiles. The application needs to implement a com.here.platform.data.processing.compiler.Direct1ToNCompiler.

Other, more complex compilation patterns are based on dependency tracking between input partitions and output partitions.

The Data Processing Library supports the following patterns:

- com.here.platform.data.processing.compiler.NonIncrementalCompiler: non-incremental compilation only
- com.here.platform.data.processing.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
- com.here.platform.data.processing.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
- com.here.platform.data.processing.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends on only one input tile, and this mapping is independent of the tile content
- com.here.platform.data.processing.compiler.DirectMToNCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping is independent of the tile content
- com.here.platform.data.processing.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
- com.here.platform.data.processing.compiler.RefTreeCompiler: fully managed, two-phase incremental compilation that can resolve references between input partitions. Input/output dependency management is built in, so the developer doesn't need to provide this logic

The application's main object normally mixes in a runner trait (such as PipelineRunner) to set up the Driver and to interface with the environment where the application runs. See the Main classes in the example compilers for more details.

com.here.platform.data.processing.catalog, com.here.platform.data.processing.blobstore, and com.here.platform.data.processing.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

Definition Classes: data

package blobstore

Contains an abstract interface used to access BlobStore.

Obtain the retriever and uploader for a catalog directly from the com.here.platform.data.processing.catalog.Catalog instance.

Definition Classes: processing

package broadcast

This package is the preferred way of working with broadcast variables in a Data Processing Library compiler. It provides the basic functionality to create an instance of org.apache.spark.broadcast.Broadcast.

This functionality is based on org.apache.spark.broadcast.Broadcast, which offers a self-contained mechanism to create a single broadcast variable of a generic type.

The toBroadcast() method creates a broadcast variable and adds its hash to the fingerprint of the com.here.platform.data.processing.driver.DriverContext, which is required for incremental compilations to work correctly.

The package also enables developers to query the catalogs without the need to manage versions manually.

Definition Classes: processing

package catalog

Contains an abstract Scala interface for accessing catalogs from Spark.

Use the com.here.platform.data.processing.catalog.Catalog factory methods to obtain instances.

Definition Classes: processing

package clientfactory

Definition Classes: processing

package compiler

Definition Classes: processing

package driver

Definition Classes: processing

package exception

Definition Classes: processing

package java

This package provides Java bindings for the Data Processing Library, to build distributed data processing applications in Java.

Choose a Runner best suited for the environment where the application runs.

The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.

The main entry point in the processing library is the com.here.platform.data.processing.java.driver.DriverBuilder class where you can add different kinds of tasks to the driver. The driver runs the tasks, and commits the final results to the output catalog.

Tasks are implemented using one or more compilers.

The simplest compiler is the direct compiler which maps each input tile to N output tiles. The application needs to implement a com.here.platform.data.processing.java.compiler.Direct1ToNCompiler.

Other more complex compilation patterns are based on different types of dependency tracking between input partitions and output partitions.

The Data Processing Library supports the following patterns:

- com.here.platform.data.processing.java.compiler.NonIncrementalCompiler: non-incremental compilation only
- com.here.platform.data.processing.java.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
- com.here.platform.data.processing.java.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
- com.here.platform.data.processing.java.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends on only one input tile, and this mapping is independent of the tile content
- com.here.platform.data.processing.java.compiler.DirectMToNCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping is independent of the tile content
- com.here.platform.data.processing.java.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
- com.here.platform.data.processing.java.compiler.RefTreeCompiler: fully managed, two-phase incremental compilation that can resolve references between input partitions. Input/output dependency management is built in, so the developer doesn't need to provide this logic
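The "direct" patterns in the list above rely on a mapping that is independent of the tile content. The following is a minimal, library-free sketch in plain Java of why that property helps incremental compilation. It does not use the Data Processing Library API: the class Direct1ToNSketch, the methods outputsFor and affectedOutputs, and the string tile keys are hypothetical names chosen only for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the Data Processing Library.
class Direct1ToNSketch {

    // Maps one input tile key to its N output tile keys.
    // The mapping looks only at the key, never at the tile content.
    static List<String> outputsFor(String inputKey) {
        return List.of(inputKey + "/roads", inputKey + "/signs");
    }

    // Given the keys of the input tiles that changed, returns the output
    // tiles that must be recompiled. All other outputs can be left untouched.
    static Set<String> affectedOutputs(Set<String> changedInputs) {
        Set<String> affected = new TreeSet<>();
        for (String input : changedInputs) {
            affected.addAll(outputsFor(input));
        }
        return affected;
    }

    public static void main(String[] args) {
        // prints [tile-7/roads, tile-7/signs]
        System.out.println(affectedOutputs(Set.of("tile-7")));
    }
}
```

Because the mapping never inspects payloads, the set of outputs to recompile can be derived from the changed keys alone. A content-dependent pattern such as the one described for MapGroupCompiler would instead need to read the tiles before it could determine the dependencies.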

The application's main object normally extends a runner class (such as PipelineRunner) to set up the Driver and to interface with the environment where the application runs. For more details, see the Main classes in the example compilers.

com.here.platform.data.processing.java.catalog, com.here.platform.data.processing.java.blobstore, and com.here.platform.data.processing.java.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

Definition Classes: processing

package blobstore

Java bindings for the blobstore package.

package broadcast

Provides the basic functionality to create org.apache.spark.broadcast.Broadcast instances.

To use a broadcast variable in the processing library, developers must add the variable's hash to the fingerprints, so that incremental compilations are not compromised. This package object provides the toBroadcast() method for that purpose. The package also offers functionality to query catalogs while managing versions properly. The functionality is based on org.apache.spark.broadcast.Broadcast, which offers a self-contained mechanism to create a single broadcast variable of a generic type.

This package is the preferred way of working with broadcast variables in a Data Processing Library compiler.
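The reason a broadcast variable's hash has to enter the fingerprints can be illustrated without Spark or the library. The sketch below is plain Java under stated assumptions: FingerprintSketch, fingerprint, and canRunIncrementally are hypothetical names, and the real toBroadcast() method may compute and compare fingerprints quite differently. It shows only the general idea that a change in the broadcast data changes the fingerprint, which signals that the results of a previous run cannot be reused incrementally.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical illustration only; not the library's implementation.
class FingerprintSketch {

    // Combines the application configuration with every value that is
    // shared with the workers as a broadcast variable into one hash.
    static int fingerprint(String config, List<?> broadcastValues) {
        return Objects.hash(config, broadcastValues);
    }

    // An incremental run is only safe if the fingerprint matches the one
    // stored by the previous run; otherwise everything must be recompiled.
    static boolean canRunIncrementally(int previous, int current) {
        return previous == current;
    }

    public static void main(String[] args) {
        int before = fingerprint("v1", List.of(List.of("DE", "FR")));
        int same = fingerprint("v1", List.of(List.of("DE", "FR")));
        int after = fingerprint("v1", List.of(List.of("DE", "FR", "IT")));

        System.out.println(canRunIncrementally(before, same));  // prints true
        System.out.println(canRunIncrementally(before, after)); // prints false
    }
}
```

If the broadcast value were left out of the fingerprint, the second comparison would also succeed, and outputs compiled against the old broadcast data would be kept even though they may be stale.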

package catalog

Java bindings for the catalog package.

package compiler

package driver

package impl

package leveling

package publisher

package spark

package utils

Java

Pair

package leveling

Definition Classes: processing

package logging

Definition Classes: processing

package publisher

Definition Classes: processing

package spark

Definition Classes: processing

package statistics

Common statistics utilities.

Definition Classes: processing

package utils

Definition Classes: processing

package validation

Definition Classes: processing

com.here.platform.data.processing

java

package java

This package provides Java bindings for the Data Processing Library, to build distributed data processing applications in Java.



Package Members

package blobstore
Java bindings for the blobstore package.
package broadcast
Provides the basic functionality to create org.apache.spark.broadcast.Broadcast instances.
To use a broadcast variable in the processing library, developers must add the variable's hash to the fingerprints, so that incremental compilations are not compromised. This package object provides the toBroadcast() method for that purpose. The package also offers functionality to query catalogs while managing versions properly. The functionality is based on org.apache.spark.broadcast.Broadcast, which offers a self-contained mechanism to create a single broadcast variable of a generic type.
This package is the preferred way of working with broadcast variables in a Data Processing Library compiler.
package catalog
Java bindings for the catalog package.
package compiler
package driver
package impl
package leveling
package publisher
package spark
package utils

Type Members

final class Pair[K, V] extends org.apache.commons.lang3.tuple.Pair[K, V]
Immutable implementation of org.apache.commons.lang3.tuple.Pair.
K: type of the left member of the pair
V: type of the right member of the pair
