Packages

package root

Definition Classes: root

package com

Definition Classes: root

package here

Definition Classes: com

package platform

Definition Classes: here

package data

Definition Classes: platform

package processing

This package provides the Data Processing Library for building distributed data processing applications.

A Runner implements the interface between the application and the environment in which it runs, and starts the application. The application, in turn, is driven by a Driver, which controls and performs the distributed processing.

Choose a Runner best suited for the environment where the application runs.

The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.

The main entry point in the processing library is the com.here.platform.data.processing.driver.DriverBuilder class where you can add different kinds of tasks to the driver. The driver runs the tasks, and commits the final results to the output catalog.

Tasks are implemented using one or more compilers.

The simplest compiler is the direct compiler, which maps each input tile to N output tiles. The application needs to implement a com.here.platform.data.processing.compiler.Direct1ToNCompiler.

Other more complex compilation patterns are based on some kind of dependency tracking between input partitions and output partitions.

The processing library supports the following patterns:

- com.here.platform.data.processing.compiler.NonIncrementalCompiler: non-incremental compilation only
- com.here.platform.data.processing.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
- com.here.platform.data.processing.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
- com.here.platform.data.processing.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends on only one input tile, and this mapping is independent of the tile content
- com.here.platform.data.processing.compiler.DirectMToNCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping is independent of the tile content
- com.here.platform.data.processing.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
- com.here.platform.data.processing.compiler.RefTreeCompiler: fully managed, two-phase incremental compilation that can resolve references between input partitions. Input/output dependency management is implemented by the library, so the developer doesn't need to provide this logic

The application's main object normally mixes in a runner trait (like PipelineRunner) to set up the Driver, and interfaces with the environment where the application runs. See the Main classes in the example compilers for more details.

com.here.platform.data.processing.catalog, com.here.platform.data.processing.blobstore, and com.here.platform.data.processing.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

Definition Classes: data

package blobstore

Contains an abstract interface used to access BlobStore.

Obtain the retriever and uploader for a catalog directly from the com.here.platform.data.processing.catalog.Catalog instance.

Definition Classes: processing

package broadcast

This package should be considered as the preferred way of working with broadcast variables in a Data Processing Library compiler. It provides the basic functionality to create an instance of org.apache.spark.broadcast.Broadcast.

This functionality is based on org.apache.spark.broadcast.Broadcast which offers a self-contained mechanism to create a single broadcast variable of a generic type.

The toBroadcast() method is provided to create a broadcast variable and add the hash to the fingerprint of the com.here.platform.data.processing.driver.DriverContext, which is required in order for incremental compilations to work correctly.

The package also enables developers to query the catalogs without the need to manage versions manually.

Definition Classes: processing

package catalog

Contains an abstract Scala interface for accessing catalogs from Spark.

Use the com.here.platform.data.processing.catalog.Catalog factory methods to obtain instances.

Definition Classes: processing

package clientfactory

Definition Classes: processing

package compiler

Definition Classes: processing

package driver

Definition Classes: processing

package exception

Definition Classes: processing

package java

This package provides Java bindings for the Data Processing Library, to build distributed data processing applications in Java.

Choose a Runner best suited for the environment where the application runs.

The Driver performs one or more tasks, which read layers from input catalogs and write to one or more layers of an output catalog.

The main entry point in the processing library is the com.here.platform.data.processing.java.driver.DriverBuilder class where you can add different kinds of tasks to the driver. The driver runs the tasks, and commits the final results to the output catalog.

Tasks are implemented using one or more compilers.

The simplest compiler is the direct compiler which maps each input tile to N output tiles. The application needs to implement a com.here.platform.data.processing.java.compiler.Direct1ToNCompiler.

Other more complex compilation patterns are based on different types of dependency tracking between input partitions and output partitions.

The processing library supports the following patterns:

- com.here.platform.data.processing.java.compiler.NonIncrementalCompiler: non-incremental compilation only
- com.here.platform.data.processing.java.compiler.DepCompiler: non-incremental dependency calculation and incremental compilation
- com.here.platform.data.processing.java.compiler.IncrementalDepCompiler: incremental dependency calculation and compilation
- com.here.platform.data.processing.java.compiler.Direct1ToNCompiler: incremental compilation where every output tile depends on only one input tile, and this mapping is independent of the tile content
- com.here.platform.data.processing.java.compiler.DirectMToNCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping is independent of the tile content
- com.here.platform.data.processing.java.compiler.MapGroupCompiler: incremental compilation where every output tile can depend on multiple input tiles, and this mapping depends on the tile content
- com.here.platform.data.processing.java.compiler.RefTreeCompiler: fully managed, two-phase incremental compilation that can resolve references between input partitions. Input/output dependency management is implemented by the library, so the developer doesn't need to provide this logic

The application's main object normally extends a runner class (like PipelineRunner) to set up the Driver, and interfaces with the environment where the application runs. For more details, see the Main classes in the example compilers.

com.here.platform.data.processing.java.catalog, com.here.platform.data.processing.java.blobstore, and com.here.platform.data.processing.java.publisher contain utilities for accessing catalogs and payloads in a Spark-friendly way, providing an RDD-based abstraction over data and metadata. These classes are used by the processing library, but can also be used independently.

Definition Classes: processing

package leveling

Definition Classes: processing

AdaptivePattern

AdaptivePatternEstimateFn

AdaptivePatternEstimator

FixedPattern

Implicits

Pattern

package logging

Definition Classes: processing

package publisher

Definition Classes: processing

package spark

Definition Classes: processing

package statistics

Common statistics utilities.

Definition Classes: processing

package utils

Definition Classes: processing

package validation

Definition Classes: processing

package leveling

Type Members

final class AdaptivePattern extends Pattern
Levels partitions by parent com.here.platform.data.processing.catalog.Partition.HereTiles.
This Pattern is used to implement adaptive leveling of output tiles based on content density. Adaptive leveling can be used to output tiles at a lower level in geographic areas where the content is sparse or at a higher level in geographic areas where the content is dense.
This solution results in the following:
- the sizes of the output tiles are more uniform and distributed closer to the average size
- extremes such as few tiles that are too big or too many small tiles are avoided
- download times are more uniform and predictable, especially for interactive applications
This Pattern can also be used to balance the sizes of Spark partitions to obtain a more even, uniform distribution of content inside them, avoiding cases where some partitions are too heavy to process or there are too many light partitions. This results in smoother processing and better cluster resource utilization, without affecting the output.
The Pattern is controlled by a set of parent tiles that represent leveling points in the tiles tree. If a partition is a HereTile with a parent included in that set, then the partition is mapped to that parent; the parent is the leveling point. If there are multiple parents present in the controlling set, the closest parent is the leveling point. This is determined by navigating from the HereTile upwards toward the root.
Partition names that are not HereTiles, or that are HereTiles with no parent in the controlling set, are left unmapped.
In cases where every HereTile needs to be aggregated, make sure to include the root HereTile in the controlling set of parent tiles, so that every HereTile has at least one leveling point.
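The closest-parent lookup described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes a hypothetical stand-in in which a HereTile is a quadkey string, the parent is obtained by dropping the last digit, and the empty string is the root. It also assumes that only strict ancestors of a tile are considered as leveling points.

```scala
object ClosestLevelingPointSketch {
  // Hypothetical stand-in for a HereTile: a quadkey string. Dropping the last
  // digit yields the parent tile; the empty string is the root tile.
  type Quadkey = String

  /** Walks from the tile's parent towards the root and returns the first
    * ancestor found in the controlling set, if any. */
  def levelingPoint(tile: Quadkey, parents: Set[Quadkey]): Option[Quadkey] =
    Iterator
      .iterate(tile)(_.dropRight(1)) // tile, parent, grandparent, ...
      .slice(1, tile.length + 1)     // skip the tile itself, stop at the root
      .find(parents.contains)

  /** Maps a tile to its leveling point, or leaves it unmapped when none exists. */
  def level(tile: Quadkey, parents: Set[Quadkey]): Quadkey =
    levelingPoint(tile, parents).getOrElse(tile)
}
```

With the controlling set `Set("12", "1")`, the tile `"1203"` is leveled to `"12"` because that is the first ancestor met on the way up, while a tile such as `"3021"` has no ancestor in the set and stays unmapped. Adding the root (`""` in this model) to the set gives every tile at least one leveling point.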
Consider passing this object to Spark worker nodes inside a org.apache.spark.broadcast.Broadcast, as the set of controlling parents may be very large.
This solution applies to HereTiles only, as it requires tiles to have a chain of parents.
However, developers may implement a similar pattern for the Generic partitioning scheme with a custom Pattern. Suppose you want to level generic data in single partitions, one per country for small countries; or in multiple partitions, one per region/state for large countries. You can establish a convention to use Generic partition names with ISO country codes for small countries, such as AND or SLO, and country codes followed by region/state codes for large countries, such as USA_CA or CAN_BC. Then, you can implement and use a custom Pattern that holds the set of ISO country codes of the large countries. Ultimately, given a country code and a region/state code, the pattern returns just the country code if it's not in the set. Otherwise, the pattern concatenates the country code and the region/state code.
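The country/region convention above could be sketched like this. The class name `CountryRegionPattern` and the use of plain strings for partition names are assumptions made for illustration; a real implementation would extend the library's Pattern trait and operate on its partition name type.

```scala
object CountryRegionPatternSketch {
  // Hypothetical stand-in for a custom leveling pattern over Generic partition
  // names, modeled here as plain strings such as "USA_CA" (country_region).
  final class CountryRegionPattern(largeCountries: Set[String])
      extends (String => String) with java.io.Serializable {

    def apply(name: String): String = name.split("_", 2) match {
      // Small country: aggregate all of its regions into one partition.
      case Array(country, _) if !largeCountries.contains(country) => country
      // Large country, or a name without a region part: keep the name as is.
      case _ => name
    }
  }
}
```

For example, with `Set("USA", "CAN")` as the set of large countries, `USA_CA` and `CAN_BC` keep their own partitions, while a name such as `AND_07` is leveled to `AND`.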

Note
Use an AdaptivePatternEstimator to compute the pattern and a com.here.platform.data.processing.spark.partitioner.AdaptiveLevelingPartitioner to balance the size of Spark partitions.
trait AdaptivePatternEstimateFn extends InputLayers with InputOptPartitioner with Serializable
Main interface the user has to implement to calculate an com.here.platform.data.processing.leveling.AdaptivePattern.
Users estimate the contribution of input partitions to different tiles by returning a weight for each tile involved. An com.here.platform.data.processing.leveling.AdaptivePattern is then calculated by accumulating weights to find the leveling points whose total weight (the sum of the weights of their children) does not exceed a given threshold. Users are free to give any custom meaning to weights and thresholds.
This function is used by AdaptivePatternEstimator, which invokes it in a distributed fashion on every input partition of the layers declared as input. A com.here.platform.data.processing.blobstore.Retriever may be passed to the constructor and used internally; however, this approach is discouraged.
AdaptivePatternEstimator applies this function non-incrementally to all input layers on every run. Unless the input layers are few and small, using a com.here.platform.data.processing.blobstore.Retriever would therefore download the entire input data every time.
Estimates don't have to be precise, so the suggested approach is to use the payload size present in the metadata as an indication of the data size, without retrieving the payload.
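One plausible reading of the weight accumulation described above is sketched below. It is not the library's algorithm: tiles are modeled as quadkey strings (a hypothetical stand-in where `""` is the root), weights are assumed to be non-negative `Long` values, and a leveling point is taken to be an ancestor whose accumulated weight fits the threshold while its own parent's does not.

```scala
object WeightAccumulationSketch {
  // Hypothetical stand-in for a HereTile: a quadkey string, "" being the root.
  type Quadkey = String

  /** Sums every tile's weight into each of its strict ancestors. */
  def accumulate(weights: Map[Quadkey, Long]): Map[Quadkey, Long] =
    weights.toSeq
      .flatMap { case (tile, w) => tile.inits.drop(1).map(_ -> w) }
      .groupBy(_._1)
      .map { case (ancestor, ws) => ancestor -> ws.map(_._2).sum }

  /** Keeps the ancestors that fit the threshold and whose parent does not. */
  def levelingPoints(weights: Map[Quadkey, Long], threshold: Long): Set[Quadkey] = {
    val totals = accumulate(weights)
    def fits(t: Quadkey): Boolean = totals(t) <= threshold
    // The root has no parent, so it qualifies whenever it fits.
    totals.keySet.filter(t => fits(t) && (t.isEmpty || !fits(t.dropRight(1))))
  }
}
```

Given the weights `"00" -> 3`, `"01" -> 4`, `"10" -> 20`, `"11" -> 1` and a threshold of 10, tile `"0"` accumulates 7 and becomes a leveling point, whereas tile `"1"` accumulates 21 and its children stay at their own level. In practice the weights could come from the payload sizes in the metadata, as suggested above.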
class AdaptivePatternEstimator extends AnyRef
Computes an AdaptivePattern given an AdaptivePatternEstimateFn and a threshold.
case class FixedPattern(level: Int) extends Pattern with Product with Serializable
Levels partitions to a com.here.platform.data.processing.catalog.Partition.HereTile level not greater than a fixed one.
Partition names that are HereTiles with a level greater than the fixed level are aggregated into their parent HereTile at the fixed level. Partition names that are not HereTiles, or are HereTiles already at the given level or at a lower level, are left unmapped.
level
The fixed level.
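The fixed-level mapping can be illustrated with a small sketch. It assumes a hypothetical model in which a HereTile is a quadkey string whose length equals its level; the class `FixedLevel` below is a stand-in and not the library's FixedPattern, and it does not handle partition names that are not HereTiles.

```scala
object FixedLevelSketch {
  // Hypothetical stand-in: a HereTile is a quadkey string and its level is the
  // string length. This is not the library's partition name type.
  final class FixedLevel(level: Int) extends (String => String) with java.io.Serializable {
    def apply(quadkey: String): String =
      if (quadkey.length > level) quadkey.take(level) // aggregate into the ancestor at `level`
      else quadkey                                    // already at or above `level`: unmapped
  }
}
```

With a fixed level of 2, the tile `"1203"` is mapped to its ancestor `"12"`, while `"12"` and `"1"` are returned unchanged.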
trait Pattern extends Function[Name, Name] with Serializable
Represents a leveling pattern.
A Pattern controls a leveling algorithm by deciding whether a given partition should be mapped to another one, usually one of its parents at a different level. Partitions are mapped to other partitions to balance the density of the content, for example by mapping many small partitions to a single partition at a lower level; big partitions remain unmapped. Content is rebalanced and leveled at the "leveling point".
Typically a leveling point may not be defined for every partition.
A Pattern can be used in multiple ways:
- In com.here.platform.data.processing.compiler.direct.CompileInFn.mappingFn, com.here.platform.data.processing.compiler.mapgroup.CompileInFn.compileInFn, com.here.platform.data.processing.compiler.reftree.CompileInFnWithRefs.compileInFn or com.here.platform.data.processing.compiler.reftree.CompileInFnWithRefsReturnsReferences.compileInRefsFn to produce com.here.platform.data.processing.compiler.OutKeys that are functions of the density of the input, thus resulting in a density-based leveling of output partitions.
- In the specialized com.here.platform.data.processing.spark.partitioner.AdaptiveLevelingPartitioner to define Spark partitions based on content density and distribute processing more uniformly across Spark partitions. This does not affect the output catalog but only the runtime characteristics of the process.
Leveling patterns are defined on com.here.platform.data.processing.catalog.Partition.Name. This concept is applicable to both the Generic and the HereTile partitioning scheme.
It is possible to use the pattern as a scala.Predef.Function to map each partition to its leveling point. In case the partition passed is not supposed to be aggregated, it is returned unchanged.
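Because a pattern is an ordinary function, it can be passed wherever a function is expected. The sketch below uses a hypothetical `TablePattern` over plain string names (an assumption for illustration, not a library class) to group partition names by their leveling point.

```scala
object PatternAsFunctionSketch {
  // Hypothetical pattern: maps the listed partition names to a leveling point
  // and returns every other name unchanged.
  final class TablePattern(mapping: Map[String, String])
      extends (String => String) with java.io.Serializable {
    def apply(name: String): String = mapping.getOrElse(name, name)
  }

  /** Groups partition names by the leveling point the pattern assigns to them. */
  def groupByLevelingPoint(
      names: Seq[String],
      pattern: String => String): Map[String, Seq[String]] =
    names.groupBy(pattern)
}
```

Names that the pattern does not aggregate end up in a group of their own, keyed by the unchanged name.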
A Pattern must be scala.Serializable: it is usually calculated in the com.here.platform.data.processing.driver.Driver and transferred to worker nodes to implement the distributed adaptive leveling and/or the density-aware Spark partitioning.
Given the size and complexity of some Patterns, the org.apache.spark.broadcast.Broadcast mechanism should be used when capturing a Pattern. This happens, for example, when a Pattern is passed as a parameter to the constructor of your implementation of com.here.platform.data.processing.compiler.direct.CompileInFn, com.here.platform.data.processing.compiler.mapgroup.CompileInFn or com.here.platform.data.processing.compiler.reftree.CompileInFn.

Value Members

object AdaptivePattern extends Serializable
object AdaptivePatternEstimator
Contains the core functions of the algorithm
object Implicits
