class Publisher extends AnyRef
Publishes payloads, in the form of an RDD of com.here.platform.data.processing.catalog.Partition.Key and com.here.platform.data.processing.blobstore.Payload, to the Blob API and returns an RDD of com.here.platform.data.processing.catalog.Partition.Commit to be committed to the Metadata API.
- Note
Both the full snapshot and the incremental snapshot are implemented here; the difference is mainly in the joining strategy adopted.
Metadata related to the published payloads needs to be committed in a subsequent operation.
- By Inheritance
- Publisher
- AnyRef
- Any
Instance Constructors
- new Publisher(uploader: Uploader, retriever: Retriever, numThreads: Int, uniquePartitionLimitInBytes: Int = 0, statistics: Option[Map[Id, PublisherStats]] = None)
Creates a new publisher.
- uploader
The com.here.platform.data.processing.blobstore.Uploader object.
- retriever
The com.here.platform.data.processing.blobstore.Retriever object. Needed when, during an incremental snapshot in conjunction with a rebase scenario, an unchanged partition is changed in the base catalog and must be overridden again, but the payload is only available in the base version of the output catalog. See com.here.platform.data.processing.publisher.PublishingAction.RebaseUpload.
- numThreads
The number of threads (per Spark partition) used to upload the data.
- uniquePartitionLimitInBytes
Payloads of size smaller than or equal to this limit are cached (locally to the Spark partition). When a layer has many identical partitions, which would end up having the same data handle, a cache is faster than gracefully handling a large number of conflict exceptions. A value of 0 means only the empty payload is cached; this is the recommended setting for layers that cannot have the same content in different partitions.
- statistics
The optional statistics counters.
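The deduplication behavior controlled by uniquePartitionLimitInBytes can be sketched as follows. This is a simplified, hypothetical illustration in plain Scala (all names are illustrative, not the library's actual internals): payloads at or below the limit are remembered by their content-derived data handle, so identical small partitions are uploaded only once.

```scala
import java.security.MessageDigest

// Simplified sketch of the size-bounded deduplication cache: small
// payloads hashing to an already-seen data handle skip the upload.
// Names and logic are illustrative only.
class UploadCache(limitInBytes: Int) {
  private val seen = scala.collection.mutable.Set.empty[String]
  var uploads = 0 // counts simulated uploads

  private def dataHandle(payload: Array[Byte]): String =
    MessageDigest.getInstance("SHA-1").digest(payload)
      .map(b => f"$b%02x").mkString

  /** Returns the data handle, "uploading" only on a cache miss. */
  def upload(payload: Array[Byte]): String = {
    val handle = dataHandle(payload)
    if (payload.length <= limitInBytes && seen(handle)) handle
    else {
      uploads += 1 // simulate the Blob API upload
      if (payload.length <= limitInBytes) seen += handle
      handle
    }
  }
}

val cacheDemo = new UploadCache(limitInBytes = 16)
val tile = "same tile".getBytes("UTF-8")
val h1 = cacheDemo.upload(tile)
val h2 = cacheDemo.upload(tile) // cache hit: no second upload
```

With limitInBytes = 0, only the empty payload fits under the limit, matching the recommended setting described above.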
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- val dataPublisher: DataPublisher
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- def publishFullSnapshot(current: RDD[(Key, Meta)], candidates: RDD[(Key, Option[Payload])], baseChanges: Option[RDD[(Key, Change)]] = None): RDD[(Key, Commit)]
Publishes a full snapshot of data.
- current
The RDD containing the catalog's current status.
- candidates
The RDD containing the candidate operations to upload.
- baseChanges
Optional RDD containing the changes inherited from the base catalog of all composite layers during a rebase. Should be defined if and only if the catalog has composite layers and if the dependency on the base catalog is being updated (rebase).
- returns
the RDD of metadata that has already been published but not yet committed.
- Note
Candidate operations are potential operations that may be performed, depending on the status of the catalog. Which operations are needed depends on whether content is duplicated, as duplicated content does not have to be published again. This function deletes content that is present in the current RDD but not in the candidates RDD.
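The full-snapshot semantics described in the note can be sketched with plain Scala maps in place of Spark RDDs. This is a hypothetical simplification: the Commit, Put, and Delete types, the handle derivation, and the treatment of None candidates are assumptions for illustration, not the library's actual behavior.

```scala
sealed trait Commit
case class Put(key: String, handle: String) extends Commit
case class Delete(key: String) extends Commit

// Sketch: a full snapshot compares the catalog's current state against
// the candidate set, emitting a deletion for every key no longer present
// and an upload for every candidate that carries a payload.
def publishFullSnapshotSketch(
    current: Map[String, String],           // key -> existing data handle
    candidates: Map[String, Option[String]] // key -> optional new content
): Seq[Commit] = {
  val deletions =
    (current.keySet -- candidates.keySet).toSeq.sorted.map(Delete(_))
  val puts = candidates.toSeq.sortBy(_._1).collect {
    case (k, Some(content)) => Put(k, s"handle-of-$content")
  }
  deletions ++ puts
}
```

The key point is the full join against the current state: every current key not re-proposed by the candidates is explicitly deleted.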
- def publishIncrementalSnapshot(current: RDD[(Key, Meta)], candidates: RDD[(Key, Option[Payload])], baseChanges: Option[RDD[(Key, Change)]] = None): RDD[(Key, Commit)]
Publishes an incremental snapshot of data.
- current
The RDD containing the catalog's current status.
- candidates
The RDD containing the candidate operations to upload.
- baseChanges
Optional RDD containing the changes inherited from the base catalog of all composite layers during a rebase. Should be defined if and only if the catalog has composite layers and if the dependency on the base catalog is being updated (rebase).
- returns
the RDD of metadata that has already been published but not yet committed.
- Note
Candidate operations are potential operations that may be performed, depending on the status of the catalog. Which operations are needed depends on whether content is duplicated, as duplicated content does not have to be published again.
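In contrast to the full snapshot, the incremental case only examines the candidate keys (a join against the current state rather than a full comparison), so unchanged partitions are never touched. The sketch below is a hypothetical simplification in plain Scala; the handle derivation and skip condition are assumptions for illustration.

```scala
// Sketch: an incremental snapshot publishes only candidates whose
// content differs from what is already committed under the same key;
// keys absent from the candidates are left untouched, never deleted.
def publishIncrementalSketch(
    current: Map[String, String],           // key -> committed data handle
    candidates: Map[String, Option[String]] // key -> optional new content
): Seq[(String, String)] =
  candidates.toSeq.sortBy(_._1).collect {
    case (k, Some(content))
        if !current.get(k).contains(s"handle-of-$content") =>
      k -> s"handle-of-$content"
  }
```

This is the joining-strategy difference the class note alludes to: the incremental path never emits deletions for keys missing from the candidates.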
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)