Packages

package config

This package contains the configuration classes for all components of a Driver. Configuration is read from application.conf and provided to the developer as a com.here.platform.data.processing.driver.config.CompleteConfig instance when the driver is set up.

Type Members

  1. final case class CatalogConfig(additionalFields: Set[AdditionalField]) extends Product with Serializable
  2. final case class CatalogsConfig(default: CatalogConfig, overrides: Map[Id, Override]) extends Product with Serializable
  3. case class CompileInConfig(threads: Int, sorting: Boolean) extends Product with Serializable

    The configuration for the compileIn function.

    threads

    The number of parallel compileIn functions to execute per Spark task. This can be used to tune the CPU load/memory consumption of specific compileIn implementations.

    sorting

    If true, compileIn executes over the Spark partitions sorted by partition key. When compileIn retrieves additional payloads that are geographically close to the partition it is producing, consider using a com.here.platform.data.processing.spark.partitioner.LocalityAwarePartitioner together with a cache for the additional content; this avoids retrieving and decoding the same content multiple times. Because sorting orders the partitions being produced by partition key, and the key in this case encodes locality, it improves the hit ratio of the cache you implement in your code. A construction sketch for this class and CompileOutConfig follows the next entry.

  4. case class CompileOutConfig(threads: Int, sorting: Boolean) extends Product with Serializable

    The configuration for the compileOut function.

    threads

    The number of parallel compileOut functions to execute per Spark task. This can be used to tune the CPU load/memory consumption of your compileOut implementations.

    sorting

    If true, compileOut executes over the Spark partitions sorted by partition key. When compileOut retrieves additional payloads that are geographically close to the partition it is producing, consider using a com.here.platform.data.processing.spark.partitioner.LocalityAwarePartitioner together with a cache for the additional content; this avoids retrieving and decoding the same content multiple times. Because sorting orders the partitions being produced by partition key, and the key in this case encodes locality, it improves the hit ratio of the cache you implement in your code.
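
    Both case classes share the same shape and can be constructed directly. A minimal sketch with illustrative values; in practice these settings are read from application.conf:

      // Illustrative values only; a real pipeline configures these in application.conf.
      val compileInConfig  = CompileInConfig(threads = 4, sorting = true)   // sorted, e.g. to pair with a locality-aware cache
      val compileOutConfig = CompileOutConfig(threads = 2, sorting = false) // unsorted when no caching is involved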

  5. trait CompleteConfig extends AnyRef

    Provides the complete configuration available to the processing library and compilers. This configuration combines all the information specified in the library's reference.conf, in the compilers' application.conf, and via the command line parameters.
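
    The library's own loader is internal; as a rough sketch of the layering only, assuming standard Typesafe Config semantics (the same com.typesafe.config types used by PartitionKeyFilterConfig below):

      import com.typesafe.config.{Config, ConfigFactory}

      // ConfigFactory.load() merges application.conf over the reference.conf
      // files found on the classpath; JVM system properties override both.
      val merged: Config = ConfigFactory.load()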

  6. case class DriverConfig(appName: String, parallelUploads: Int, parallelRetrievers: Int, numCommitPartitions: Int, sparkStorageLevels: SparkStorageLevelsConfig, state: StateConfig, disableIncremental: Boolean, uniquePartitionLimitInBytes: Int, disableCommitIntegrityCheck: Boolean, allowEmptyPayloads: Boolean, catalogs: CatalogsConfig) extends Product with Serializable

    The configuration necessary to instantiate and configure a com.here.platform.data.processing.driver.Driver.

    appName

    The name of the application to be set in the Spark context.

    parallelUploads

    The number of parallel uploads the library should perform inside a Spark task, when data is published to the Blob API.

    parallelRetrievers

    The number of parallel retrievals the library should perform inside a Spark task, when data is retrieved from the Blob API.

    numCommitPartitions

    The maximum number of parts to commit within a multipart commit to the Data API.

    sparkStorageLevels

    The configuration of the Spark storage levels for each RDD category in the library.

    state

    The configuration for the state layer that specifies how the layer is stored.

    disableIncremental

    If true, incremental compilation is disabled.

    uniquePartitionLimitInBytes

    The size limit, in bytes, above which a partition's payload is always treated as unique. Below this limit, the data handle of partitions with identical content is reused to avoid uploading the same payload multiple times.

    disableCommitIntegrityCheck

    If true, the final integrity check on the committed partitions is disabled.

    allowEmptyPayloads

    Whether to allow publishing of empty (0 byte) payloads.

    catalogs

    Catalog-specific driver configurations.
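
    For illustration, the class can be assembled by hand from the constructors documented on this page (SparkStorageLevelsConfig, StateConfig, CatalogsConfig, and CatalogConfig). All values are assumptions, not recommended defaults, and the stateLayer parameter stands in for a layer Id obtained from your catalog definitions; in practice the library builds this instance from application.conf:

      import org.apache.spark.storage.StorageLevel

      // Illustrative assembly only; every value below is an assumption.
      def exampleDriverConfig(stateLayer: Id): DriverConfig =
        DriverConfig(
          appName             = "my-compiler",
          parallelUploads     = 8,
          parallelRetrievers  = 8,
          numCommitPartitions = 100,
          sparkStorageLevels  = SparkStorageLevelsConfig(
            default           = StorageLevel.MEMORY_AND_DISK,
            catalogQueries    = StorageLevel.MEMORY_AND_DISK,
            publishedPayloads = StorageLevel.MEMORY_AND_DISK,
            persistedState    = StorageLevel.DISK_ONLY
          ),
          state                       = StateConfig(layer = stateLayer, partitions = 16),
          disableIncremental          = false,
          uniquePartitionLimitInBytes = 1 << 20, // 1 MiB
          disableCommitIntegrityCheck = false,
          allowEmptyPayloads          = false,
          catalogs                    = CatalogsConfig(CatalogConfig(Set.empty), Map.empty)
        )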

  7. case class ExecutorConfig(compilein: CompileInConfig, compileout: CompileOutConfig, reftree: RefTreeExecutorConfig, partitionKeyFilters: Seq[PartitionKeyFilterConfig], debug: ExecutorDebugConfig) extends Product with Serializable

    The configuration for all driver executors.

    compilein

    The configuration for the compileIn function.

    compileout

    The configuration for the compileOut function.

    reftree

    The configuration specific to the RefTree executors.

    partitionKeyFilters

    The configurations of the partition key filters to apply, if any; see PartitionKeyFilterConfig.

    debug

    Configuration parameters to simplify the debugging process.
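
    As an illustration, the executor configuration can be assembled from the constructors on this page; the values are assumptions, and the library normally reads them from application.conf:

      // Illustrative values only.
      val executorConfig = ExecutorConfig(
        compilein           = CompileInConfig(threads = 4, sorting = true),
        compileout          = CompileOutConfig(threads = 2, sorting = false),
        reftree             = RefTreeExecutorConfig(parallelResolves = 4),
        partitionKeyFilters = Seq.empty, // no partition-key filtering
        debug               = ExecutorDebugConfig(collectStageErrors = false)
      )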

  8. case class ExecutorDebugConfig(collectStageErrors: Boolean) extends Product with Serializable

    Configures the executors' behavior to simplify the debugging process.

    collectStageErrors

    If true, runtime errors that occur in each stage are collected and logged in the Spark driver. Use this option only when you need to debug exceptions, as it splits the Spark workload into multiple jobs.

  9. case class PartitionKeyFilterConfig(className: Class[_ <: PartitionKeyFilter], param: Config) extends Product with Serializable

    The configuration of a single partition key filter.

    className

    The class implementing the filter; it must derive from com.here.platform.data.processing.driver.filter.PartitionKeyFilter.

    param

    The config that is passed to the appropriate com.here.platform.data.processing.driver.filter.PartitionKeyFilter subclass instance.

    Exceptions thrown

    scala.Exception if the class cannot be loaded by name, or if it does not derive from PartitionKeyFilter.
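
    For illustration, a helper that builds the config for a given filter class. The levels key is a made-up parameter name; the keys inside param are whatever the concrete PartitionKeyFilter subclass expects:

      import com.here.platform.data.processing.driver.filter.PartitionKeyFilter
      import com.typesafe.config.ConfigFactory

      // Hypothetical helper: `levels` is an illustrative parameter name, not a
      // key defined by the library.
      def filterConfigFor(filterClass: Class[_ <: PartitionKeyFilter]): PartitionKeyFilterConfig =
        PartitionKeyFilterConfig(
          className = filterClass,
          param     = ConfigFactory.parseString("levels = [12]")
        )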

  10. case class RefTreeExecutorConfig(parallelResolves: Int) extends Product with Serializable

    The configuration for RefTree executors.

    parallelResolves

    The number of parallel resolveFn functions to execute per Spark task. This can be used to tune the CPU load/memory consumption of your resolveFn implementations.

  11. case class SparkStorageLevelsConfig(default: StorageLevel, catalogQueries: StorageLevel, publishedPayloads: StorageLevel, persistedState: StorageLevel) extends Product with Serializable

    The configuration for Spark storage levels, used by the processing library when persisting important RDDs.

    default

    The default storage level used for all RDDs unless otherwise specified.

    catalogQueries

    The storage level used for metadata queried from the Data API.

    publishedPayloads

    The storage level used for metadata of payloads published to the Blob API and not yet committed.

    persistedState

    The storage level used for the state when saved to or loaded back from the Blob API.

  12. case class StateConfig(layer: Id, partitions: Int) extends Product with Serializable

    The configuration that specifies how the state is published in the output catalog.

    layer

    The ID of the layer where the state is saved.

    partitions

    The number of partitions the state layer should be divided into.

Value Members

  1. object CatalogConfig extends Serializable
  2. object CompleteConfig

    Common definitions for CompleteConfig trait implementations.

  3. object PartitionKeyFilterConfig extends Serializable
