Packages

trait Stateful[K, V] extends AnyRef

Groups transformations for a DeltaSet[K, V] that require storing metadata in the output catalog, to enable incremental processing.

K

The type of DeltaSet's keys.

V

The type of DeltaSet's values.

Linear Supertypes
AnyRef, Any
Known Subclasses
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. Stateful
  2. AnyRef
  3. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Abstract Value Members

  1. abstract val autoIdAssigner: BaseSetIdAssigner

    The ID assigner used to automatically assign DeltaSet IDs if not provided.

    The ID assigner used to automatically assign DeltaSet IDs if not provided.

    Attributes
    protected
  2. abstract def detectChanges(checkSum: (V) ⇒ IndexedSeq[Byte], id: Id = autoIdAssigner("detectChanges"), configOverride: Override = DeltaSetConfig.noOverride): DeltaSet[K, V]

    Returns the same DeltaSet but with more precise information about what has changed in the data since the last run.

    Returns the same DeltaSet but with more precise information about what has changed in the data since the last run. For this, values in this DeltaSet are compared to those in the previous run using a user-provided checkSum function. The checkSum function should be (with high probability) collision-free for the same key, and the return value should be relatively small, as the checksum of every value is stored in the output catalog.

    If the previous state is available, this DeltaSet returns changes even when the upstream DeltaSet runs non-incrementally.

    This function does not shuffle any data any data.

    checkSum

    The function to compute the checksum.

    id

    Deprecated.

    configOverride

    Deprecated.

    returns

    A DeltaSet that represents the same data as this DeltaSet, but with more precise information about changes.

    Note

    All arguments must be serializable, since this transformation is serialized and sent to workers nodes.

  3. abstract def flatMapGroup[K2, V2](flatMapFn: (K, V) ⇒ Iterable[(K2, V2)], partitioning: PartitioningStrategy[K2], id: Id = autoIdAssigner("flatMapGroup"), configOverride: Override = DeltaSetConfig.noOverride)(implicit arg0: ClassTag[K2], arg1: ClassTag[V2]): DeltaSet[K2, Iterable[V2]]

    Applies a function to all key-value pairs in this DeltaSet, partitions the result, and then groups all output key-value pairs by key.

    Applies a function to all key-value pairs in this DeltaSet, partitions the result, and then groups all output key-value pairs by key.

    If the partitioning is not PreservesPartitioning, this transformation will shuffle data. In particular, all key-value pairs that belong to a different partition after the key transformation may have to be transferred between executors.

    K2

    The type of output key.

    V2

    The type of output value.

    flatMapFn

    The function applied to the key-value pairs in the DeltaSet.

    partitioning

    Either a partitioner to use, or PreservesPartitioning to indicate that the mapped keys are always in the same Spark partition as the original keys. This will be checked at run-time.

    id

    Deprecated.

    configOverride

    Deprecated.

    returns

    A DeltaSet that represents the result of the transformation.

    Note

    All arguments must be serializable, since this transformation is serialized and sent to workers nodes.

  4. abstract def flatMapReduce[K2, V2](flatMapFn: (K, V) ⇒ Iterable[(K2, V2)], reduceFn: (V2, V2) ⇒ V2, partitioning: PartitioningStrategy[K2], id: Id = autoIdAssigner("flatMapReduce"), configOverride: Override = DeltaSetConfig.noOverride)(implicit arg0: ClassTag[K2], arg1: ClassTag[V2]): DeltaSet[K2, V2]

    Applies a function to all key-value pairs in this DeltaSet, partitions the result, and then reduces all values with the same key to a single value.

    Applies a function to all key-value pairs in this DeltaSet, partitions the result, and then reduces all values with the same key to a single value.

    If the partitioning is not PreservesPartitioning, this transformation will shuffle data. In particular, all key-value pairs that belong to a different partition after the key transformation may have to be transferred between executors.

    K2

    The type of output key.

    V2

    The type of output value.

    flatMapFn

    The function applied to the key-value pairs in the DeltaSet.

    reduceFn

    The associative and commutative reduce function.

    partitioning

    Either a partitioner to use, or PreservesPartitioning to indicate that the mapped keys are always in the same Spark partition as the original keys. This will be checked at run-time.

    id

    Deprecated.

    configOverride

    Deprecated.

    returns

    A DeltaSet that represents the result of the transformation.

    Note

    All arguments must be serializable, since this transformation is serialized and sent to workers nodes.

  5. abstract def flatMapUnique[K2, V2](flatMapFn: (K, V) ⇒ Iterable[(K2, V2)], partitioning: PartitioningStrategy[K2], id: Id = autoIdAssigner("flatMapUnique"), configOverride: Override = DeltaSetConfig.noOverride)(implicit arg0: ClassTag[K2], arg1: ClassTag[V2]): DeltaSet[K2, V2]

    Applies a function to all key-value pairs in this DeltaSet and partitions the result.

    Applies a function to all key-value pairs in this DeltaSet and partitions the result. This function fails if applying flatMapFn to all key-values in the DeltaSet produces duplicate keys.

    If the partitioning is not PreservesPartitioning, this transformation will shuffle data. In particular, all key-value pairs that belong to a different partition after the key transformation may have to be transferred between executors.

    K2

    The type of output key.

    V2

    The type of output value.

    flatMapFn

    The function applied to the key-value pairs in the DeltaSet.

    partitioning

    Either a partitioner to use, or PreservesPartitioning to indicate that the mapped keys are always in the same Spark partition as the original keys. This will be checked at run-time.

    id

    Deprecated.

    configOverride

    Deprecated.

    returns

    A DeltaSet that represents the result of the transformation.

    Note

    All arguments must be serializable, since this transformation is serialized and sent to workers nodes.

  6. abstract def join[W](other: DeltaSet[K, W], id: Id = autoIdAssigner("join"), configOverride: Override = DeltaSetConfig.noOverride)(implicit arg0: ClassTag[W]): DeltaSet[K, (V, W)]

    Computes the inner join of this DeltaSet with another DeltaSet.

    Computes the inner join of this DeltaSet with another DeltaSet. For each key contained in both DeltaSets, the result contains the pair of values associated with the key in each of the DeltaSets.

    This function does not shuffle any data any data.

    Both DeltaSets must be partitioned by the same partitioner. If they are not, you must explicitly repartition one or both DeltaSets.

    W

    The value type of the other DeltaSet.

    other

    The other DeltaSet.

    id

    Deprecated.

    configOverride

    Deprecated.

    returns

    A DeltaSet that represents the result of the join.

    Note

    All arguments must be serializable, since this transformation is serialized and sent to workers nodes.

  7. abstract def mapGroup[K2, V2](mapFn: (K, V) ⇒ (K2, V2), partitioning: PartitioningStrategy[K2], id: Id = autoIdAssigner("mapGroup"), configOverride: Override = DeltaSetConfig.noOverride)(implicit arg0: ClassTag[K2], arg1: ClassTag[V2]): DeltaSet[K2, Iterable[V2]]

    Applies a function to all key-value pairs in this DeltaSet, partitions the result, and then groups all output key-value pairs by key.

    Applies a function to all key-value pairs in this DeltaSet, partitions the result, and then groups all output key-value pairs by key.

    If the partitioning is not PreservesPartitioning, this transformation will shuffle data. In particular, all key-value pairs that belong to a different partition after the key transformation may have to be transferred between executors.

    K2

    The type of output key.

    V2

    The type of output value.

    mapFn

    The function applied to the key-value pairs in the DeltaSet.

    partitioning

    Either a partitioner to use, or PreservesPartitioning to indicate that the mapped keys are always in the same Spark partition as the original keys. This will be checked at run-time.

    id

    Deprecated.

    configOverride

    Deprecated.

    returns

    A DeltaSet that represents the result of the transformation.

    Note

    All arguments must be serializable, since this transformation is serialized and sent to workers nodes.

  8. abstract def mapReduce[K2, V2](mapFn: (K, V) ⇒ (K2, V2), reduceFn: (V2, V2) ⇒ V2, partitioning: PartitioningStrategy[K2], id: Id = autoIdAssigner("mapReduce"), configOverride: Override = DeltaSetConfig.noOverride)(implicit arg0: ClassTag[K2], arg1: ClassTag[V2]): DeltaSet[K2, V2]

    Applies a function to all key-value pairs in this DeltaSet, partitions the result, and then reduces all values with the same key to a single value.

    Applies a function to all key-value pairs in this DeltaSet, partitions the result, and then reduces all values with the same key to a single value.

    If the partitioning is not PreservesPartitioning, this transformation will shuffle data. In particular, all key-value pairs that belong to a different partition after the key transformation may have to be transferred between executors.

    K2

    The type of output key.

    V2

    The type of output value.

    mapFn

    The function applied to the key-value pairs in the DeltaSet.

    reduceFn

    The associative and commutative reduce function.

    partitioning

    Either a partitioner to use, or PreservesPartitioning to indicate that the mapped keys are always in the same Spark partition as the original keys. This will be checked at run-time.

    id

    Deprecated.

    configOverride

    Deprecated.

    returns

    A DeltaSet that represents the result of the transformation.

    Note

    All arguments must be serializable, since this transformation is serialized and sent to workers nodes.

  9. abstract def mapUnique[K2, V2](mapFn: (K, V) ⇒ (K2, V2), partitioning: PartitioningStrategy[K2], id: Id = autoIdAssigner("mapUnique"), configOverride: Override = DeltaSetConfig.noOverride)(implicit arg0: ClassTag[K2], arg1: ClassTag[V2]): DeltaSet[K2, V2]

    Applies a function to all key-value pairs in this DeltaSet and partitions the result.

    Applies a function to all key-value pairs in this DeltaSet and partitions the result. This function fails if applying mapFn to all key-values in the DeltaSet produces duplicate keys.

    If the partitioning is not PreservesPartitioning, this transformation will shuffle data. In particular, all key-value pairs that belong to a different partition after the key transformation may have to be transferred between executors.

    K2

    The type of output key.

    V2

    The type of output value.

    mapFn

    The function applied to the key-value pairs of the DeltaSet.

    partitioning

    Either a partitioner to use, or PreservesPartitioning to indicate that the mapped keys are always in the same Spark partition as the original keys. This will be checked at run-time.

    id

    Deprecated.

    configOverride

    Deprecated.

    returns

    A DeltaSet that represents the result of the transformation.

    Note

    All arguments must be serializable, since this transformation is serialized and sent to workers nodes.

  10. abstract def mapValuesWithResolver[V2](mapFn: (Resolver, K, V) ⇒ V2, strategies: Seq[ResolutionStrategy[K, V]], referencePartitioner: Partitioner[Key], id: Id = ..., configOverride: Override = DeltaSetConfig.noOverride)(implicit arg0: ClassTag[V2]): DeltaSet[K, V2]

    Transforms the DeltaSet, the _subjects_, while having access to a Resolver that allows finding the metadata for related partitions, the _references_.

    Transforms the DeltaSet, the _subjects_, while having access to a Resolver that allows finding the metadata for related partitions, the _references_.

    mapFn

    The function to transform the values of the subject DeltaSet. Takes the Resolver, the key, and the original value of the subject as input; returns the new value of the subject.

    strategies

    Defines a list of strategies to find the metadata corresponding to a reference key. See ResolutionStrategy.

    referencePartitioner

    The partitioner used for references.

    id

    Deprecated.

    configOverride

    Deprecated.

    Note

    The mapFn must be scala.Serializable as it is copied to workers and run inside Spark map functions.

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  14. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  15. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  16. def toString(): String
    Definition Classes
    AnyRef → Any
  17. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  18. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()

Inherited from AnyRef

Inherited from Any

Ungrouped