package deltasets
Package Members
- package transformations
Type Members
- trait BaseSet extends AnyRef
The common base type for PublishedSet and DeltaSet.
- class BaseSetIdAssigner extends AnyRef
- final case class CanDetermine[+A](x: A) extends Determine[A] with Product with Serializable
A value that can be determined.
- sealed trait Changes[+K, +V] extends AnyRef
Represents the changes between two KeyValues. Changes are either NoChanges or SomeChanges; the latter is represented by replaced key-values and deleted keys.
- K
The type of the key
- V
The type of the value
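The distinction between the two cases can be illustrated with a simplified, Spark-free model: plain Maps and Sets stand in for the RDD-backed KeyValues and Keys, and the Simple-prefixed names are hypothetical, not library classes.

```scala
// Simplified model of the Changes hierarchy; the real classes wrap
// partitioned Spark RDDs, here plain Scala collections are used instead.
sealed trait SimpleChanges[+K, +V]
case object SimpleNoChanges extends SimpleChanges[Nothing, Nothing]
final case class SimpleSomeChanges[K, V](replaces: Map[K, V], deletes: Set[K])
    extends SimpleChanges[K, V]

// Applying a set of changes to a previous snapshot yields the new snapshot:
// deleted keys are removed, replaced keys are added or overwritten.
def applyChanges[K, V](prev: Map[K, V], changes: SimpleChanges[K, V]): Map[K, V] =
  changes match {
    case SimpleNoChanges                      => prev
    case SimpleSomeChanges(replaces, deletes) => (prev -- deletes) ++ replaces
  }
```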
- sealed trait Commits extends AnyRef
Represents the set of com.here.platform.data.processing.catalog.Partition.Commits created by a PublishedSetLike. This is either NoCommits, when the upstream DeltaSet has NoChanges, or SomeCommits, containing the possibly empty set of committed keys.
- trait DeltaContext extends AnyRef
Provides access to common resources that users of DeltaSets may require; used by com.here.platform.data.processing.driver.DeltaSetup.
- trait DeltaSet[K, V] extends BaseSet
The DeltaSet is the main processing abstraction used to implement custom processing patterns.
Pipeline developers implement their processing logic by applying transformations to DeltaSets.
The core of the incremental processing framework is the ability to expose not only the contents of a DeltaSet but also what has changed in those contents since a previous, reference run. Transformed DeltaSets may use this information to expose, in turn, only what has changed, enabling differential processing across any DAG of transformations.
Each DeltaSet is characterized by the type of keys and values it contains. The format of the input data needed to produce the output data, when applicable, is not specified here and depends on the DeltaSet implementation.
Data in a DeltaSet is not only strongly typed but also strongly partitioned.
The provided DeltaContext exposes the input catalogs as sources. When integrating with the com.here.platform.data.processing.driver.Driver via the com.here.platform.data.processing.driver.DeltaSetup interface and the related com.here.platform.data.processing.driver.DeltaDriverTaskBuilder, or directly using the com.here.platform.data.processing.driver.DeltaDriverTask, the result of processing must be exposed as sinks of fixed types.
DeltaSets are immutable, distributed, and de-duplicated.
DeltaSet transformations are incremental and lazy.
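The differential-processing idea can be sketched without Spark: if only the changes of an upstream set are known, a value transformation needs to touch only those changes to produce the changes of the downstream set. Delta and mapValuesDelta below are hypothetical names for illustration, not part of the library.

```scala
// Hypothetical, simplified delta: replaced key-values plus deleted keys.
final case class Delta[K, V](replaces: Map[K, V], deletes: Set[K])

// A value transformation applied only to the changed entries: unchanged
// keys are never re-processed, which is what makes the pipeline incremental.
def mapValuesDelta[K, V, W](delta: Delta[K, V])(f: V => W): Delta[K, W] =
  Delta(delta.replaces.map { case (k, v) => k -> f(v) }, delta.deletes)
```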
- case class DeltaSetConfig(intermediateStorageLevel: StorageLevel, validationLevel: DeltaSetConfig.ValidationLevel.Value, threads: Int, sorting: Boolean, incremental: Boolean, forceStateless: Boolean) extends Product with Serializable
- sealed trait Determine[+A] extends AnyRef
Represents a value that may or may not be determined. Several Determine values can be combined using Determine.zip and Determine.reduce.
- A
The type of the wrapped value.
- final case class KeyValues[K, V](rdd: RDD[(K, V)], partitioner: Partitioner[K]) extends Product with Serializable
Uses a Spark RDD to store key-value pairs. Compared to a normal RDD, the use of this class asserts and, where possible, ensures that two conditions are met:
1. There are no duplicate keys.
2. The RDD is partitioned with the given partitioner.
- K
The type of the keys.
- V
The type of the values.
- rdd
The Spark RDD storing a set of key-value pairs. Must be partitioned using partitioner.
- partitioner
The partitioner.
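The two invariants can be illustrated with a toy, Spark-free check; ToyPartitioner and checkInvariants are illustrative names, not library API.

```scala
// Toy stand-in for a Partitioner: assigns each key to one of n partitions.
final case class ToyPartitioner[K](numPartitions: Int) {
  def partitionOf(key: K): Int = math.floorMod(key.hashCode, numPartitions)
}

// Checks the two KeyValues invariants on data laid out as one Vector of
// pairs per partition: keys are globally distinct, and every pair sits in
// the partition that the partitioner assigns to its key.
def checkInvariants[K, V](partitions: Vector[Vector[(K, V)]],
                          p: ToyPartitioner[K]): Boolean = {
  val keys = partitions.flatten.map(_._1)
  val distinctKeys = keys.distinct.size == keys.size
  val wellPlaced = partitions.zipWithIndex.forall { case (part, i) =>
    part.forall { case (k, _) => p.partitionOf(k) == i }
  }
  distinctKeys && wellPlaced
}
```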
- final case class Keys[K](rdd: RDD[(K, Unit)], partitioner: Partitioner[K]) extends Product with Serializable
Uses a Spark RDD to store a set of keys. Compared to a normal RDD, the use of this class asserts and, where possible, ensures that two conditions are met:
1. There are no duplicate keys.
2. The RDD is partitioned with the given partitioner.
- K
The type of the keys.
- rdd
The Spark RDD storing a set of keys. Must be partitioned using partitioner.
- partitioner
The partitioner.
- class ManyToMany[S, T] extends (S) => Iterable[T] with Serializable
Represents an m-to-n relation by pairing the function mapFn with its inverse function inverseFn. If the function represented by this class is applied to a value for which inverseFn is not the inverse of mapFn, an exception is thrown.
Note that inverseFn can be called by the Data Processing Library on values that are not produced by mapFn. Define inverseFn as a partial function to correctly restrict its domain to the set of keys for which an inverse can be defined.
- S
The domain of the function.
- T
The co-domain of the function.
- class ManyToOne[S, T] extends (S) => T with Serializable
Represents an n-to-1 relation by pairing the function mapFn with its inverse function inverseFn. If the function represented by this class is applied to a value for which inverseFn is not the inverse of mapFn, an exception is thrown.
Note that inverseFn can be called by the Data Processing Library on values that are not produced by mapFn. Define inverseFn as a partial function to correctly restrict its domain to the set of keys for which an inverse can be defined.
- S
The domain of the function.
- T
The co-domain of the function.
- class OneToMany[S, T] extends (S) => Iterable[T] with Serializable
Represents a 1-to-n relation by pairing a function flatMapFn with its inverse function inverseFn. If the function represented by this class is applied to a value for which inverseFn is not the inverse of flatMapFn, an exception is thrown.
Note that inverseFn can be called by the Data Processing Library on values that are not produced by flatMapFn. Define inverseFn as a partial function to correctly restrict its domain to the set of keys for which an inverse can be defined.
- S
The domain of the function.
- T
The co-domain of the function.
- class OneToOne[S, T] extends (S) => T with Serializable
Represents a 1-to-1 relation by pairing a function mapFn with its inverse function inverseFn. If the function represented by this class is applied to a value for which inverseFn is not the inverse of mapFn, an exception is thrown.
Note that inverseFn can be called by the Data Processing Library on values that are not produced by mapFn. Define inverseFn as a partial function to correctly restrict its domain to the set of keys for which an inverse can be defined.
- S
The domain of the function.
- T
The co-domain of the function.
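The inverse-function contract shared by these four classes can be sketched with a quadtree-style parent/child mapping, a plausible HereTile-like use case. The definitions below are illustrative stand-ins, not the library constructors.

```scala
// A 1-to-n mapping in the style of OneToMany: each parent tile id maps to
// its four children, and inverseFn recovers the parent from a child.
val flatMapFn: Int => Iterable[Int] =
  parent => (0 until 4).map(i => 4 * parent + i)

// inverseFn is a partial function: it is only defined for ids that actually
// have a parent (here, ids >= 4), restricting its domain as recommended.
val inverseFn: PartialFunction[Int, Int] = {
  case child if child >= 4 => child / 4
}

// The consistency requirement: for every value produced by flatMapFn,
// inverseFn must map it back to the original input.
def isConsistent(parent: Int): Boolean =
  flatMapFn(parent).forall(child => inverseFn.lift(child).contains(parent))
```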
- case class PartMapperByLevel(levels: Set[Int]) extends PublishedPartMapper with Product with Serializable
A PublishedPartMapper that assigns each key to a publish part based on the level of its com.here.platform.data.processing.catalog.Partition.Name. Typically used with com.here.platform.data.processing.catalog.Partition.HereTile keys, to publish each zoom level independently.
- levels
The set of levels.
- sealed trait PartitioningStrategy[-K] extends AnyRef
Indicates, for a transformation that transforms keys of type K1 to keys of type K2, whether the transformation preserves the partitioning of the input DeltaSet or whether the output must be repartitioned with a partitioner.
- K
The type of the output keys.
- trait PublishedPart extends PublishedSetLike
The result of publishing a DeltaSet to blobstore.
The result of publishing a DeltaSet to blobstore. Unlike a PublishedSet, a PublishedPart corresponds to a single part of the output layers only.
- trait PublishedPartMapper extends Serializable
An object that specifies how the output keys are partitioned in a multi-part publishing.
- trait PublishedSet extends PublishedSetLike
The PublishedSet is the result of publishing a DeltaSet to blobstore.
- trait PublishedSetLike extends BaseSet
Base trait for classes that represent the result of publishing a DeltaSet to blobstore.
- case class RequiresRepartitioning[K](partitioner: Partitioner[K]) extends PartitioningStrategy[K] with Product with Serializable
Indicates that a transformation will not preserve the partitioning of the input DeltaSet. This means that an input key may be transformed into an output key in a different Spark partition and, therefore, the data must be repartitioned.
- K
The type of the output keys.
- partitioner
The partitioner to apply after the transformation.
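Whether a key transformation preserves partitioning can be checked with a toy modulo partitioner; this is an illustrative sketch, not the library's Partitioner.

```scala
// Toy partitioner: key k lives in partition floorMod(k, n).
def partitionOf(k: Int, n: Int): Int = math.floorMod(k, n)

val n = 4

// k => k + n keeps every key in its original partition, so a transformation
// using it could declare PreservesPartitioning.
val preserves = (0 until 100).forall(k => partitionOf(k + n, n) == partitionOf(k, n))

// k => k + 1 moves keys across partitions, so the output must be
// repartitioned, as RequiresRepartitioning declares.
val moves = (0 until 100).exists(k => partitionOf(k + 1, n) != partitionOf(k, n))
```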
- trait ResolutionStrategy[-K, -V] extends AnyRef
Defines a strategy that determines how metadata is resolved.
- K
The key type of the subject DeltaSet (the one transformed by mapValuesWithResolver).
- V
The value type of the subject DeltaSet (the one transformed by mapValuesWithResolver).
- trait Resolver extends AnyRef
Interface to resolve keys to metadata.
Interface to resolve keys to metadata. Provided to a mapping function used with mapValuesWithResolver and backed by one or more ResolutionStrategys.
- final case class SomeChanges[K, V](replaces: KeyValues[K, V], deletes: Keys[K]) extends Changes[K, V] with Product with Serializable
Represents changes between two KeyValues, which may be non-empty (as opposed to NoChanges, which always represents empty changes). Contains all keys with new values (added keys or keys with changed values) and all deleted keys.
- K
The type of the keys.
- V
The type of the values.
- replaces
All keys added or changed, with their new value.
- deletes
All keys deleted.
- final case class SomeCommits(commits: KeyValues[Key, Commit]) extends Commits with Product with Serializable
Represents a set of com.here.platform.data.processing.catalog.Partition.Commits created by a PublishedSetLike, which may be non-empty (as opposed to NoCommits, which always represents empty commits).
- commits
All commits.
- trait StateManager extends AnyRef
Interface for retrieving the state from within a DeltaSet implementation.
- trait Transformations extends AnyRef
Value Members
- object BaseSet
- case object CannotDetermine extends Determine[Nothing] with Product with Serializable
A value that cannot be determined.
- object DeltaContext
- object DeltaSetConfig extends Serializable
- object Determine
- object ManyToMany extends Serializable
- object ManyToOne extends Serializable
- case object NoChanges extends Changes[Nothing, Nothing] with Product with Serializable
A value that has not changed.
- case object NoCommits extends Commits with Product with Serializable
An empty set of commits, indicating that the upstream DeltaSet has no changes.
- object OneToMany extends Serializable
- object OneToOne extends Serializable
- case object PreservesPartitioning extends PartitioningStrategy[Any] with Product with Serializable
Indicates that a transformation will preserve the partitioning of the input DeltaSet. This means that output keys will reside in the same Spark partitions as the corresponding input keys.
Using this setting increases the performance of the transformation.
- object ResolutionStrategy
Default ResolutionStrategies to use in mapValuesWithResolver.
- object SomeChanges extends Serializable
- object SomeCommits extends Serializable