package config
This package contains the configuration classes for all components of a Driver. Configuration is read from application.conf and provided to the developer as a com.here.platform.data.processing.driver.config.CompleteConfig instance when the driver is set up.
Type Members
- final case class CatalogConfig(additionalFields: Set[AdditionalField]) extends Product with Serializable
- final case class CatalogsConfig(default: CatalogConfig, overrides: Map[Id, Override]) extends Product with Serializable
- case class CompileInConfig(threads: Int, sorting: Boolean) extends Product with Serializable
The configuration for the compileIn function.
- threads
The number of parallel compileIn functions to execute per Spark task. This can be used to tune the CPU load/memory consumption of specific compileIn implementations.
- sorting
If true, compileIn processes the Spark partitions sorted by partition key. When compileIn retrieves additional payloads that are geographically close to the partition it is producing, consider using a com.here.platform.data.processing.spark.partitioner.LocalityAwarePartitioner together with a cache for the additional content: with that partitioner, partition keys encode locality, so processing them in sorted order improves the hit ratio of your cache and avoids retrieving and decoding the same content multiple times.
- case class CompileOutConfig(threads: Int, sorting: Boolean) extends Product with Serializable
The configuration for the compileOut function.
- threads
The number of parallel compileOut functions to execute per Spark task. This can be used to tune the CPU load/memory consumption of your compileOut implementations.
- sorting
If true, compileOut processes the Spark partitions sorted by partition key. When compileOut retrieves additional payloads that are geographically close to the partition it is producing, consider using a com.here.platform.data.processing.spark.partitioner.LocalityAwarePartitioner together with a cache for the additional content: with that partitioner, partition keys encode locality, so processing them in sorted order improves the hit ratio of your cache and avoids retrieving and decoding the same content multiple times.
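Both case classes can be constructed directly, for example in tests. A minimal sketch, using hypothetical tuning values (in a real run these are read from application.conf):

```scala
import com.here.platform.data.processing.driver.config.{CompileInConfig, CompileOutConfig}

// Hypothetical values: four parallel compileIn invocations per Spark task
// with key-sorted processing, two compileOut invocations without sorting.
val compileIn  = CompileInConfig(threads = 4, sorting = true)
val compileOut = CompileOutConfig(threads = 2, sorting = false)
```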
- trait CompleteConfig extends AnyRef
Provides the complete configuration available to the processing library and compilers. This configuration combines all the information specified in the library's reference.conf, in the compilers' application.conf, and via the command line parameters.
- case class DriverConfig(appName: String, parallelUploads: Int, parallelRetrievers: Int, numCommitPartitions: Int, sparkStorageLevels: SparkStorageLevelsConfig, state: StateConfig, disableIncremental: Boolean, uniquePartitionLimitInBytes: Int, disableCommitIntegrityCheck: Boolean, allowEmptyPayloads: Boolean, catalogs: CatalogsConfig) extends Product with Serializable
The configuration necessary to instantiate and configure a com.here.platform.data.processing.driver.Driver.
- appName
The name of the application to be set in the Spark context.
- parallelUploads
The number of parallel uploads the library performs inside a Spark task when data is published to the Blob API.
- parallelRetrievers
The number of parallel retrievals the library performs inside a Spark task when data is retrieved from the Blob API.
- numCommitPartitions
The maximum number of parts to commit within a multipart commit to the Data API.
- sparkStorageLevels
The configuration of the Spark storage levels for each RDD category in the library.
- state
The configuration for the state layer that specifies how the layer is stored.
- disableIncremental
If true, incremental compilation is disabled.
- uniquePartitionLimitInBytes
The size limit, in bytes, beyond which partitions are considered unique. For partitions within this limit, the data handle of partitions with identical content is reused to avoid uploading the same payload multiple times.
- disableCommitIntegrityCheck
If true, the final integrity check on the committed partitions is disabled.
- allowEmptyPayloads
Whether to allow publishing of empty (0 byte) payloads.
- catalogs
Catalog-specific driver configurations.
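A hedged construction sketch for DriverConfig, assembling the nested configuration classes defined in this package. All values are illustrative; in a real run this object is populated from application.conf and exposed through CompleteConfig. The construction of the state layer Id is left open because the Id type is defined outside this package, and StorageLevel is assumed to be Spark's org.apache.spark.storage.StorageLevel:

```scala
import com.here.platform.data.processing.driver.config._
import org.apache.spark.storage.StorageLevel

// Assumption: Id is the identifier type from StateConfig's signature; import
// it from wherever your project defines it. ??? throws if evaluated and only
// marks the gap in this sketch.
val stateLayerId: Id = ???

val driverConfig = DriverConfig(
  appName = "my-compiler",            // hypothetical Spark application name
  parallelUploads = 8,
  parallelRetrievers = 8,
  numCommitPartitions = 100,
  sparkStorageLevels = SparkStorageLevelsConfig(
    default = StorageLevel.MEMORY_AND_DISK,
    catalogQueries = StorageLevel.MEMORY_AND_DISK,
    publishedPayloads = StorageLevel.MEMORY_AND_DISK,
    persistedState = StorageLevel.MEMORY_AND_DISK
  ),
  state = StateConfig(layer = stateLayerId, partitions = 4),
  disableIncremental = false,
  uniquePartitionLimitInBytes = 1024,
  disableCommitIntegrityCheck = false,
  allowEmptyPayloads = false,
  catalogs = CatalogsConfig(
    default = CatalogConfig(additionalFields = Set.empty),
    overrides = Map.empty             // no per-catalog overrides in this sketch
  )
)
```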
- case class ExecutorConfig(compilein: CompileInConfig, compileout: CompileOutConfig, reftree: RefTreeExecutorConfig, partitionKeyFilters: Seq[PartitionKeyFilterConfig], debug: ExecutorDebugConfig) extends Product with Serializable
The configuration for all driver executors; a composition sketch follows the RefTreeExecutorConfig entry below.
- compilein
The configuration for the compileIn function.
- compileout
The configuration for the compileOut function.
- reftree
The configuration specific to the RefTree executors.
- partitionKeyFilters
The configurations of the partition key filters to apply.
- debug
Configuration parameters to simplify the debugging process.
- case class ExecutorDebugConfig(collectStageErrors: Boolean) extends Product with Serializable
Configures the executors' behavior to simplify the debugging process.
- collectStageErrors
If true, runtime errors that occur in each stage are collected and logged in the Spark driver. Use this option only when you need to debug exceptions, as it splits the Spark workload into multiple jobs.
- case class PartitionKeyFilterConfig(className: Class[_ <: PartitionKeyFilter], param: Config) extends Product with Serializable
- className
The class implementing the filter; it must derive from com.here.platform.data.processing.driver.filter.PartitionKeyFilter.
- param
The config that is passed to the appropriate com.here.platform.data.processing.driver.filter.PartitionKeyFilter subclass instance.
- Exceptions thrown
scala.Exception
if the class cannot be loaded by name, or if it does not derive from PartitionKeyFilter.
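A sketch, assuming a hypothetical PartitionKeyFilter subclass named MyRegionFilter that you implement yourself; the param block uses the standard Typesafe Config API:

```scala
import com.typesafe.config.ConfigFactory
import com.here.platform.data.processing.driver.config.PartitionKeyFilterConfig

// MyRegionFilter is hypothetical: a subclass of
// com.here.platform.data.processing.driver.filter.PartitionKeyFilter
// defined in your own code. Its parameters arrive through `param`.
val filterConfig = PartitionKeyFilterConfig(
  className = classOf[MyRegionFilter],
  param = ConfigFactory.parseString("region = berlin")
)
```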
- case class RefTreeExecutorConfig(parallelResolves: Int) extends Product with Serializable
The configuration for RefTree executors.
- parallelResolves
The number of parallel resolveFn functions to execute per Spark task. This can be used to tune the CPU load/memory consumption of your resolveFn implementations.
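Putting the pieces together, here is the composition sketch referenced under ExecutorConfig above; the tuning values are hypothetical and no partition key filters are applied:

```scala
import com.here.platform.data.processing.driver.config._

val executorConfig = ExecutorConfig(
  compilein = CompileInConfig(threads = 4, sorting = true),
  compileout = CompileOutConfig(threads = 2, sorting = false),
  reftree = RefTreeExecutorConfig(parallelResolves = 4),
  partitionKeyFilters = Seq.empty,  // no filters applied in this sketch
  debug = ExecutorDebugConfig(collectStageErrors = false)
)
```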
- case class SparkStorageLevelsConfig(default: StorageLevel, catalogQueries: StorageLevel, publishedPayloads: StorageLevel, persistedState: StorageLevel) extends Product with Serializable
The configuration for Spark storage levels, used by the processing library when persisting important RDDs.
- default
The default storage level used for all RDDs, unless otherwise specified below.
- catalogQueries
The storage level used for metadata queried from the Data API.
- publishedPayloads
The storage level used for metadata of payloads published to the Blob API and not yet committed.
- persistedState
The storage level used for the state when saved to or loaded back from the Blob API.
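Assuming StorageLevel is Spark's org.apache.spark.storage.StorageLevel (as the RDD persistence context suggests), a sketch with differentiated levels; the particular constants chosen here are illustrative, not recommendations:

```scala
import org.apache.spark.storage.StorageLevel
import com.here.platform.data.processing.driver.config.SparkStorageLevelsConfig

val levels = SparkStorageLevelsConfig(
  default = StorageLevel.MEMORY_AND_DISK,                // spill to disk under memory pressure
  catalogQueries = StorageLevel.MEMORY_ONLY,             // keep queried metadata in memory
  publishedPayloads = StorageLevel.MEMORY_AND_DISK_SER,  // serialized to shrink the footprint
  persistedState = StorageLevel.DISK_ONLY                // state is touched once per run
)
```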
- case class StateConfig(layer: Id, partitions: Int) extends Product with Serializable
The configuration that specifies how the state is published in the output catalog.
- layer
The ID of the layer where the state is saved.
- partitions
The number of partitions the state layer should be divided into.
Value Members
- object CatalogConfig extends Serializable
- object CompleteConfig
Common definitions for CompleteConfig trait implementations.
- object PartitionKeyFilterConfig extends Serializable