class Publisher extends AnyRef
Publishes payloads, in the form of an RDD of com.here.platform.data.processing.catalog.Partition.Key and com.here.platform.data.processing.blobstore.Payload, to the Blob API and returns RDD of com.here.platform.data.processing.catalog.Partition.Commit to be committed to the Metadata API.
- Note
both the full snapshot and the incremental one are implemented here the difference is mainly in the joining strategy adopted
,metadata related to the published payloads need to be committed in a subsequent operation
- Alphabetic
- By Inheritance
- Publisher
- AnyRef
- Any
- Hide All
- Show All
- Public
- All
Instance Constructors
-
new
Publisher(uploader: Uploader, retriever: Retriever, numThreads: Int, uniquePartitionLimitInBytes: Int = 0, statistics: Option[Map[Id, PublisherStats]] = None)
Creates a new publisher.
Creates a new publisher.
- uploader
The com.here.platform.data.processing.blobstore.Uploader object.
- retriever
The com.here.platform.data.processing.blobstore.Retriever object. Needed when, during an incremental snapshot in conjunction with a rebase scenario, an unchanged partition is changed in the base catalog and must be overridden again, but the payload is only available in the base version of the output catalog. See com.here.platform.data.processing.publisher.PublishingAction.RebaseUpload.
- numThreads
The number of threads (per Spark partition) used to upload the data.
- uniquePartitionLimitInBytes
the commit partition of payloads of size smaller or equal to this will be cached (locally to the Spark partition). In case a layer has many identical partitions, which would end up having the same data handle, a cache is faster than gracefully handling a large number of conflict exceptions. 0 means only the empty payload will be cached, and it is the recommended setting in case of layers that cannot have the same content in different partitions.
- statistics
The optional statistics counters.
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @IntrinsicCandidate()
- val dataPublisher: DataPublisher
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @IntrinsicCandidate()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @IntrinsicCandidate()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @IntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @IntrinsicCandidate()
-
def
publishFullSnapshot(current: RDD[(Key, Meta)], candidates: RDD[(Key, Option[Payload])], baseChanges: Option[RDD[(Key, Change)]] = None): RDD[(Key, Commit)]
Publishes a full snapshot of data.
Publishes a full snapshot of data.
- current
The RDD containing the catalog's current status.
- candidates
The RDD containing the candidate operations to upload.
- baseChanges
Optional RDD containing the changes inherited from the base catalog of all composite layers during a rebase. Should be defined if and only if the catalog has composite layers and if the dependency on the base catalog is being updated (rebase).
- returns
the metadata RDD that were already published but not yet committed.
- Note
Candidate operations are potential operations that can be done, depending on the status of the catalog. These operations depend on whether there is duplicated content or not, as duplicated content do not have to be published. This function deletes content in the current RDD that is not present in the candidates' RDD.
-
def
publishIncrementalSnapshot(current: RDD[(Key, Meta)], candidates: RDD[(Key, Option[Payload])], baseChanges: Option[RDD[(Key, Change)]] = None): RDD[(Key, Commit)]
Publishes an increment snapshot of data.
Publishes an increment snapshot of data.
- current
The RDD containing the catalog's current status.
- candidates
The RDD containing the candidate operations to upload.
- baseChanges
Optional RDD containing the changes inherited from the base catalog of all composite layers during a rebase. Should be defined if and only if the catalog has composite layers and if the dependency on the base catalog is being updated (rebase).
- returns
the metadata RDD that were already published but not yet committed.
- Note
Candidate operations are potential operations that can be done, depending on the status of the catalog. These operations depend on whether there is duplicated content or not, as duplicated content do not have to be published.
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated