package leveling
Type Members
- final class AdaptivePattern extends Pattern
Levels partitions by parent com.here.platform.data.processing.catalog.Partition.HereTiles.
This Pattern is used to implement adaptive leveling of output tiles based on content density. Adaptive leveling can be used to output tiles at a lower level in geographic areas where the content is sparse or at a higher level in geographic areas where the content is dense.
This solution results in the following:
- the sizes of the output tiles are more uniform and distributed closer to the average size
- extremes, such as a few tiles that are too big or too many small tiles, are avoided
- download times are more uniform and predictable, especially for interactive applications
This Pattern can also be used to balance the sizes of Spark partitions to obtain a more even, uniform distribution of content inside them, avoiding cases where some partitions are too heavy to process or there are too many light partitions. This results in smoother processing and better cluster resource utilization, without affecting the output.
The Pattern is controlled by a set of parent tiles that represent leveling points in the tile tree. If a partition is a HereTile with a parent included in that set, then the partition is mapped to that parent; the parent is the leveling point. If multiple parents are present in the controlling set, the closest parent is the leveling point. This is determined by navigating from the HereTile upwards toward the root.
Partition names that are not HereTiles, or that are HereTiles with no parent in the controlling set (orphans), are left unmapped.
In cases where every HereTile needs to be aggregated, make sure to include the root HereTile in the controlling set of parent tiles, so that every HereTile has at least one leveling point.
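The lookup described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `Tile` is a hypothetical quadtree stand-in for a HereTile, and `levelingPoint` is an invented helper name. The sketch assumes that only strict ancestors are considered, so a tile that is itself in the controlling set is not treated as its own leveling point.

```scala
import scala.annotation.tailrec

object AdaptiveLookupSketch {
  // Hypothetical stand-in for a HereTile: a node in a quadtree.
  // The root is assumed to be Tile(0, 0, 0).
  final case class Tile(level: Int, x: Int, y: Int) {
    def parent: Option[Tile] =
      if (level == 0) None else Some(Tile(level - 1, x >> 1, y >> 1))
  }

  // Walks from the tile's parent toward the root and returns the first
  // ancestor found in the controlling set. If no ancestor is in the set,
  // the tile is an orphan and is returned unchanged (left unmapped).
  def levelingPoint(tile: Tile, controlling: Set[Tile]): Tile = {
    @tailrec
    def climb(current: Option[Tile]): Tile = current match {
      case Some(t) if controlling.contains(t) => t
      case Some(t)                            => climb(t.parent)
      case None                               => tile
    }
    climb(tile.parent)
  }
}
```

With a controlling set that contains both `Tile(2, 1, 1)` and the root, `Tile(4, 5, 6)` is mapped to `Tile(2, 1, 1)` because that ancestor is reached first. Including the root guarantees that every other tile has at least one leveling point.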
Consider passing this object to Spark worker nodes inside an org.apache.spark.broadcast.Broadcast, as the set of controlling parents may be very large.
This solution applies to HereTiles only, as it requires tiles to have a chain of parents.
However, developers may implement a similar pattern for the Generic partitioning scheme with a custom Pattern. Suppose you want to level generic data into a single partition per country for small countries, and into multiple partitions, one per region/state, for large countries. You can establish a convention to use Generic partition names with ISO country codes for small countries, such as AND or SVN, and country codes followed by region/state codes for large countries, such as USA_CA or CAN_BC. Then, you can implement and use a custom Pattern that holds the set of ISO country codes of the large countries. Given a country code and a region/state code, the pattern returns just the country code if the country is not in the set. Otherwise, the pattern returns the country code concatenated with the region/state code.
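A minimal sketch of such a custom pattern is shown below. It models Generic partition names as plain strings rather than the library's `Name` type, and the class name `CountryPattern`, the `_` separator handling, and the sample name `AND_07` are assumptions made for illustration only.

```scala
object CountryLevelingSketch {
  // Hypothetical custom pattern over Generic partition names, modeled as
  // plain strings of the form "<country>" or "<country>_<region>".
  final class CountryPattern(largeCountries: Set[String])
      extends (String => String) with Serializable {

    def apply(name: String): String = {
      // Everything before the first underscore is taken as the country code.
      val country = name.takeWhile(_ != '_')
      // Large countries keep their per-region partition name;
      // small countries collapse to a single partition per country.
      if (largeCountries.contains(country)) name else country
    }
  }
}
```

For example, with `Set("USA", "CAN")` as the large countries, `USA_CA` is returned unchanged, while `AND_07` is leveled to `AND`.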
- Note
Use an AdaptivePatternEstimator to compute the pattern and a com.here.platform.data.processing.spark.partitioner.AdaptiveLevelingPartitioner to balance the size of Spark partitions.
- trait AdaptivePatternEstimateFn extends InputLayers with InputOptPartitioner with Serializable
Main interface the user has to implement to calculate a com.here.platform.data.processing.leveling.AdaptivePattern.
Users estimate the contribution of input partitions to different tiles by returning a weight for each tile involved. A com.here.platform.data.processing.leveling.AdaptivePattern is then calculated by accumulating weights to find leveling points whose total weight (the sum of the weights of their children) does not exceed a given threshold. Users are free to give any custom meaning to weights and thresholds.
Used by AdaptivePatternEstimator, it is invoked in a distributed fashion on every input partition of the layers declared as input. A com.here.platform.data.processing.blobstore.Retriever may be passed in the constructor and used internally; however, this approach is discouraged.
AdaptivePatternEstimator applies this function to all input layers at every run, non-incrementally. Unless the input layers are few and small, using a com.here.platform.data.processing.blobstore.Retriever would therefore download the whole input data every time.
Estimates don't have to be precise, so the suggested approach is to use the payload size present in the metadata as an indication of the data size, without retrieving the payload.
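The idea of accumulating weights against a threshold can be sketched as follows. This is an assumed, single-machine illustration and not the estimator's actual algorithm: `Tile`, `totals` and `levelingPoints` are hypothetical names, weights are taken to be payload sizes in bytes, and all weighted tiles are assumed to sit at the same level.

```scala
object ThresholdSketch {
  // Hypothetical quadtree stand-in for a HereTile; the root is Tile(0, 0, 0).
  final case class Tile(level: Int, x: Int, y: Int) {
    def parent: Option[Tile] =
      if (level == 0) None else Some(Tile(level - 1, x >> 1, y >> 1))
    def children: Seq[Tile] =
      for (dx <- 0 to 1; dy <- 0 to 1) yield Tile(level + 1, 2 * x + dx, 2 * y + dy)
  }

  // A tile followed by all of its ancestors up to the root.
  def ancestorsAndSelf(t: Tile): List[Tile] =
    t :: t.parent.map(ancestorsAndSelf).getOrElse(Nil)

  // Adds each leaf weight to the leaf itself and to every one of its ancestors.
  def totals(leafWeights: Map[Tile, Long]): Map[Tile, Long] =
    leafWeights.foldLeft(Map.empty[Tile, Long]) { case (acc, (tile, w)) =>
      ancestorsAndSelf(tile).foldLeft(acc) { (m, t) =>
        m.updated(t, m.getOrElse(t, 0L) + w)
      }
    }

  // Descends from the root and stops at the first tile on each branch whose
  // accumulated weight does not exceed the threshold.
  def levelingPoints(leafWeights: Map[Tile, Long], threshold: Long): Set[Tile] = {
    val total = totals(leafWeights)
    def descend(t: Tile): Set[Tile] = total.get(t) match {
      case None                               => Set.empty // no content below this tile
      case Some(w) if w <= threshold          => Set(t)    // fits: leveling point
      case Some(_) if leafWeights.contains(t) => Set.empty // heavy leaf: cannot be split here
      case Some(_)                            => t.children.flatMap(descend).toSet
    }
    descend(Tile(0, 0, 0))
  }
}
```

In this sketch a single tile that already exceeds the threshold produces no leveling point, which corresponds to leaving big partitions unmapped.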
- class AdaptivePatternEstimator extends AnyRef
Computes an AdaptivePattern given an AdaptivePatternEstimateFn and a threshold.
- case class FixedPattern(level: Int) extends Pattern with Product with Serializable
Levels partitions to a com.here.platform.data.processing.catalog.Partition.HereTile level not greater than a fixed one.
Partition names that are HereTiles with a level greater than the fixed level are aggregated into their parent HereTile at the fixed level. Partition names that are not HereTiles, or are HereTiles already at the given level or at a lower level, are left unmapped.
- level
The fixed level.
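The fixed-level mapping can be illustrated with a short sketch. It uses a hypothetical quadtree `Tile` type instead of the library's HereTile, and `levelTo` is an invented helper name rather than part of FixedPattern's API.

```scala
object FixedLevelSketch {
  // Hypothetical stand-in for a HereTile in a quadtree.
  final case class Tile(level: Int, x: Int, y: Int)

  // Tiles deeper than the fixed level are replaced by their ancestor at that
  // level; tiles at the fixed level or above it are returned unchanged.
  def levelTo(fixed: Int)(tile: Tile): Tile =
    if (tile.level <= fixed) tile
    else {
      val shift = tile.level - fixed
      Tile(fixed, tile.x >> shift, tile.y >> shift)
    }
}
```

With a fixed level of 2, `Tile(4, 5, 6)` is aggregated into `Tile(2, 1, 1)`, while `Tile(1, 1, 0)` is left as it is.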
- trait Pattern extends Function[Name, Name] with Serializable
Represents a leveling pattern.
A Pattern controls a leveling algorithm by deciding whether a given partition should be mapped to another one, usually one of its parents at a different level. Partitions are mapped to other partitions to balance the density of the content, for example by mapping many small partitions to a single partition at a lower level; big partitions remain unmapped. Content is rebalanced and leveled at the "leveling point".
Typically a leveling point may not be defined for every partition.
A Pattern can be used in multiple ways:
- In com.here.platform.data.processing.compiler.direct.CompileInFn.mappingFn, com.here.platform.data.processing.compiler.mapgroup.CompileInFn.compileInFn, com.here.platform.data.processing.compiler.reftree.CompileInFnWithRefs.compileInFn or com.here.platform.data.processing.compiler.reftree.CompileInFnWithRefsReturnsReferences.compileInRefsFn to produce com.here.platform.data.processing.compiler.OutKeys that are functions of the density of the input, thus resulting in a density-based leveling of output partitions.
- In the specialized com.here.platform.data.processing.spark.partitioner.AdaptiveLevelingPartitioner to define Spark partitions based on content density and distribute processing more uniformly across Spark partitions. This does not affect the output catalog but only the runtime characteristics of the process.
Leveling patterns are defined on com.here.platform.data.processing.catalog.Partition.Name. This concept is applicable to both the Generic and the HereTile partitioning scheme.
It is possible to use the pattern as a scala.Predef.Function to map each partition to its leveling point. In case the partition passed is not supposed to be aggregated, it is returned unchanged.
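Because a pattern behaves like an ordinary function from name to name, it can be passed to standard collection operations. The sketch below assumes partition names are plain strings; `SketchPattern`, `LookupPattern` and `groupByLevelingPoint` are hypothetical names that only mirror the shape of the real trait.

```scala
object PatternAsFunctionSketch {
  // Stand-in for Partition.Name; the real type is not used in this sketch.
  type Name = String

  // Hypothetical trait with the same shape as the documented Pattern.
  trait SketchPattern extends (Name => Name) with Serializable

  // Maps the listed names to their leveling point and returns every
  // other name unchanged.
  final class LookupPattern(mapping: Map[Name, Name]) extends SketchPattern {
    def apply(name: Name): Name = mapping.getOrElse(name, name)
  }

  // Groups partition names by the leveling point the pattern assigns to them.
  def groupByLevelingPoint(names: Seq[Name], pattern: Name => Name): Map[Name, Seq[Name]] =
    names.groupBy(pattern)
}
```

Names that are not supposed to be aggregated end up in a group of their own, keyed by their unchanged name.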
A Pattern must be scala.Serializable: it is usually calculated in the com.here.platform.data.processing.driver.Driver and transferred to worker nodes to implement the distributed adaptive leveling and/or the density-aware Spark partitioning.
Given the size and complexity of some Patterns, the org.apache.spark.broadcast.Broadcast mechanism should be used when capturing a Pattern. This happens, for example, when a Pattern is passed as a parameter in the constructor of your implementation of com.here.platform.data.processing.compiler.direct.CompileInFn, com.here.platform.data.processing.compiler.mapgroup.CompileInFn or com.here.platform.data.processing.compiler.reftree.CompileInFn.
Value Members
- object AdaptivePattern extends Serializable
- object AdaptivePatternEstimator
Contains the core functions of the algorithm.
- object Implicits