class PartitionerAwareUnionRDD2[T] extends PartitionerAwareUnionRDD[T]
Version of PartitionerAwareUnionRDD that fixes the exponential behaviour of the original PartitionerAwareUnionRDD.getPreferredLocations.
The original implementation, in an attempt to optimize which preferred locations are picked, delegates the SparkContext (hence the DAGScheduler) to find the preferred locations of the parent RDDs. But doing so makes the DAGScheduler recursively walk the RDD graph until a preferred location is found. Since memoization is in place for a single call of DAGScheduler.getPreferredLocs but not across different calls of the method, if in a DAG several PartitionerAwareUnionRDD are chained together, getPreferredLocations will, for each of the upstream PartitionerAwareUnionRDDs, visit the sub-graph again, recursively. This can soon become very slow (and can block the driver for hours). In this fix we give up on trying to use the current preferred locations, and use the static ones instead.
The issue is described in https://issues.apache.org/jira/browse/SPARK-33356.
This class uses internal APIs marked as "developer API" so special attention should be paid when
the software is updated to a newer version of Spark. As of today (latest Spark version 3.4.1)
PartitionerAwareUnionRDD hasn't undergone any noticeable change since v2.4.7.
- Alphabetic
 - By Inheritance
 
- PartitionerAwareUnionRDD2
 - PartitionerAwareUnionRDD
 - RDD
 - Logging
 - Serializable
 - AnyRef
 - Any
 
- Hide All
 - Show All
 
- Public
 - Protected
 
Instance Constructors
-  new PartitionerAwareUnionRDD2(sc: SparkContext, rdds: Seq[RDD[T]])(implicit arg0: ClassTag[T])
 
Value Members
-   final  def !=(arg0: Any): Boolean
- Definition Classes
 - AnyRef → Any
 
 -   final  def ##: Int
- Definition Classes
 - AnyRef → Any
 
 -    def ++(other: RDD[T]): RDD[T]
- Definition Classes
 - RDD
 
 -   final  def ==(arg0: Any): Boolean
- Definition Classes
 - AnyRef → Any
 
 -    def aggregate[U](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U)(implicit arg0: ClassTag[U]): U
- Definition Classes
 - RDD
 
 -   final  def asInstanceOf[T0]: T0
- Definition Classes
 - Any
 
 -    def barrier(): RDDBarrier[T]
- Definition Classes
 - RDD
 - Annotations
 - @Since("2.4.0") @Experimental()
 
 -    def cache(): PartitionerAwareUnionRDD2.this.type
- Definition Classes
 - RDD
 
 -    def cartesian[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(T, U)]
- Definition Classes
 - RDD
 
 -    def checkpoint(): Unit
- Definition Classes
 - RDD
 
 -    def cleanShuffleDependencies(blocking: Boolean): Unit
- Definition Classes
 - RDD
 - Annotations
 - @Since("3.1.0") @DeveloperApi()
 
 -    def clearDependencies(): Unit
- Definition Classes
 - PartitionerAwareUnionRDD → RDD
 
 -    def clone(): AnyRef
- Attributes
 - protected[lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
 
 -    def coalesce(numPartitions: Int, shuffle: Boolean, partitionCoalescer: Option[PartitionCoalescer])(implicit ord: Ordering[T]): RDD[T]
- Definition Classes
 - RDD
 
 -    def collect[U](f: PartialFunction[T, U])(implicit arg0: ClassTag[U]): RDD[U]
- Definition Classes
 - RDD
 
 -    def collect(): Array[T]
- Definition Classes
 - RDD
 
 -    def compute(s: Partition, context: TaskContext): Iterator[T]
- Definition Classes
 - PartitionerAwareUnionRDD → RDD
 
 -    def context: SparkContext
- Definition Classes
 - RDD
 
 -    def count(): Long
- Definition Classes
 - RDD
 
 -    def countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble]
- Definition Classes
 - RDD
 
 -    def countApproxDistinct(relativeSD: Double): Long
- Definition Classes
 - RDD
 
 -    def countApproxDistinct(p: Int, sp: Int): Long
- Definition Classes
 - RDD
 
 -    def countByValue()(implicit ord: Ordering[T]): Map[T, Long]
- Definition Classes
 - RDD
 
 -    def countByValueApprox(timeout: Long, confidence: Double)(implicit ord: Ordering[T]): PartialResult[Map[T, BoundedDouble]]
- Definition Classes
 - RDD
 
 -   final  def dependencies: Seq[Dependency[_]]
- Definition Classes
 - RDD
 
 -    def distinct(): RDD[T]
- Definition Classes
 - RDD
 
 -    def distinct(numPartitions: Int)(implicit ord: Ordering[T]): RDD[T]
- Definition Classes
 - RDD
 
 -   final  def eq(arg0: AnyRef): Boolean
- Definition Classes
 - AnyRef
 
 -    def equals(arg0: AnyRef): Boolean
- Definition Classes
 - AnyRef → Any
 
 -    def filter(f: (T) => Boolean): RDD[T]
- Definition Classes
 - RDD
 
 -    def first(): T
- Definition Classes
 - RDD
 
 -    def firstParent[U](implicit arg0: ClassTag[U]): RDD[U]
- Attributes
 - protected[spark]
 - Definition Classes
 - RDD
 
 -    def flatMap[U](f: (T) => TraversableOnce[U])(implicit arg0: ClassTag[U]): RDD[U]
- Definition Classes
 - RDD
 
 -    def fold(zeroValue: T)(op: (T, T) => T): T
- Definition Classes
 - RDD
 
 -    def foreach(f: (T) => Unit): Unit
- Definition Classes
 - RDD
 
 -    def foreachPartition(f: (Iterator[T]) => Unit): Unit
- Definition Classes
 - RDD
 
 -    def getCheckpointFile: Option[String]
- Definition Classes
 - RDD
 
 -   final  def getClass(): Class[_ <: AnyRef]
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @IntrinsicCandidate() @native()
 
 -    def getDependencies: Seq[Dependency[_]]
- Attributes
 - protected
 - Definition Classes
 - RDD
 
 -   final  def getNumPartitions: Int
- Definition Classes
 - RDD
 - Annotations
 - @Since("1.6.0")
 
 -    def getOutputDeterministicLevel: rdd.DeterministicLevel.Value
- Attributes
 - protected
 - Definition Classes
 - RDD
 - Annotations
 - @DeveloperApi()
 
 -    def getPartitions: Array[Partition]
- Definition Classes
 - PartitionerAwareUnionRDD → RDD
 
 -    def getPreferredLocations(s: Partition): Seq[String]
- Definition Classes
 - PartitionerAwareUnionRDD2 → PartitionerAwareUnionRDD → RDD
 
 -    def getResourceProfile(): ResourceProfile
- Definition Classes
 - RDD
 - Annotations
 - @Since("3.1.0") @Experimental()
 
 -    def getStorageLevel: StorageLevel
- Definition Classes
 - RDD
 
 -    def glom(): RDD[Array[T]]
- Definition Classes
 - RDD
 
 -    def groupBy[K](f: (T) => K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K]): RDD[(K, Iterable[T])]
- Definition Classes
 - RDD
 
 -    def groupBy[K](f: (T) => K, numPartitions: Int)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]
- Definition Classes
 - RDD
 
 -    def groupBy[K](f: (T) => K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]
- Definition Classes
 - RDD
 
 -    def hashCode(): Int
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @IntrinsicCandidate() @native()
 
 -    val id: Int
- Definition Classes
 - RDD
 
 -    def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def intersection(other: RDD[T], numPartitions: Int): RDD[T]
- Definition Classes
 - RDD
 
 -    def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T]): RDD[T]
- Definition Classes
 - RDD
 
 -    def intersection(other: RDD[T]): RDD[T]
- Definition Classes
 - RDD
 
 -    lazy val isBarrier_: Boolean
- Attributes
 - protected
 - Definition Classes
 - RDD
 - Annotations
 - @transient()
 
 -    def isCheckpointed: Boolean
- Definition Classes
 - RDD
 
 -    def isEmpty(): Boolean
- Definition Classes
 - RDD
 
 -   final  def isInstanceOf[T0]: Boolean
- Definition Classes
 - Any
 
 -    def isTraceEnabled(): Boolean
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -   final  def iterator(split: Partition, context: TaskContext): Iterator[T]
- Definition Classes
 - RDD
 
 -    def keyBy[K](f: (T) => K): RDD[(K, T)]
- Definition Classes
 - RDD
 
 -    def localCheckpoint(): PartitionerAwareUnionRDD2.this.type
- Definition Classes
 - RDD
 
 -    def log: Logger
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logDebug(msg: => String): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logError(msg: => String, throwable: Throwable): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logError(msg: => String): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logInfo(msg: => String): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logName: String
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logTrace(msg: => String): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def logWarning(msg: => String): Unit
- Attributes
 - protected
 - Definition Classes
 - Logging
 
 -    def map[U](f: (T) => U)(implicit arg0: ClassTag[U]): RDD[U]
- Definition Classes
 - RDD
 
 -    def mapPartitions[U](f: (Iterator[T]) => Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]
- Definition Classes
 - RDD
 
 -    def mapPartitionsWithIndex[U](f: (Int, Iterator[T]) => Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]
- Definition Classes
 - RDD
 
 -    def max()(implicit ord: Ordering[T]): T
- Definition Classes
 - RDD
 
 -    def min()(implicit ord: Ordering[T]): T
- Definition Classes
 - RDD
 
 -    var name: String
- Definition Classes
 - RDD
 
 -   final  def ne(arg0: AnyRef): Boolean
- Definition Classes
 - AnyRef
 
 -   final  def notify(): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @IntrinsicCandidate() @native()
 
 -   final  def notifyAll(): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @IntrinsicCandidate() @native()
 
 -    def parent[U](j: Int)(implicit arg0: ClassTag[U]): RDD[U]
- Attributes
 - protected[spark]
 - Definition Classes
 - RDD
 
 -    val partitioner: Option[Partitioner]
- Definition Classes
 - PartitionerAwareUnionRDD → RDD
 
 -   final  def partitions: Array[Partition]
- Definition Classes
 - RDD
 
 -    def persist(): PartitionerAwareUnionRDD2.this.type
- Definition Classes
 - RDD
 
 -    def persist(newLevel: StorageLevel): PartitionerAwareUnionRDD2.this.type
- Definition Classes
 - RDD
 
 -    def pipe(command: Seq[String], env: Map[String, String], printPipeContext: ((String) => Unit) => Unit, printRDDElement: (T, (String) => Unit) => Unit, separateWorkingDir: Boolean, bufferSize: Int, encoding: String): RDD[String]
- Definition Classes
 - RDD
 
 -    def pipe(command: String, env: Map[String, String]): RDD[String]
- Definition Classes
 - RDD
 
 -    def pipe(command: String): RDD[String]
- Definition Classes
 - RDD
 
 -   final  def preferredLocations(split: Partition): Seq[String]
- Definition Classes
 - RDD
 
 -    def randomSplit(weights: Array[Double], seed: Long): Array[RDD[T]]
- Definition Classes
 - RDD
 
 -    var rdds: Seq[RDD[T]]
- Definition Classes
 - PartitionerAwareUnionRDD
 
 -    def reduce(f: (T, T) => T): T
- Definition Classes
 - RDD
 
 -    def repartition(numPartitions: Int)(implicit ord: Ordering[T]): RDD[T]
- Definition Classes
 - RDD
 
 -    def sample(withReplacement: Boolean, fraction: Double, seed: Long): RDD[T]
- Definition Classes
 - RDD
 
 -    def saveAsObjectFile(path: String): Unit
- Definition Classes
 - RDD
 
 -    def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit
- Definition Classes
 - RDD
 
 -    def saveAsTextFile(path: String): Unit
- Definition Classes
 - RDD
 
 -    def setName(_name: String): PartitionerAwareUnionRDD2.this.type
- Definition Classes
 - RDD
 
 -    def sortBy[K](f: (T) => K, ascending: Boolean, numPartitions: Int)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]
- Definition Classes
 - RDD
 
 -    def sparkContext: SparkContext
- Definition Classes
 - RDD
 
 -    def subtract(other: RDD[T], p: Partitioner)(implicit ord: Ordering[T]): RDD[T]
- Definition Classes
 - RDD
 
 -    def subtract(other: RDD[T], numPartitions: Int): RDD[T]
- Definition Classes
 - RDD
 
 -    def subtract(other: RDD[T]): RDD[T]
- Definition Classes
 - RDD
 
 -   final  def synchronized[T0](arg0: => T0): T0
- Definition Classes
 - AnyRef
 
 -    def take(num: Int): Array[T]
- Definition Classes
 - RDD
 
 -    def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T]
- Definition Classes
 - RDD
 
 -    def takeSample(withReplacement: Boolean, num: Int, seed: Long): Array[T]
- Definition Classes
 - RDD
 
 -    def toDebugString: String
- Definition Classes
 - RDD
 
 -    def toJavaRDD(): JavaRDD[T]
- Definition Classes
 - RDD
 
 -    def toLocalIterator: Iterator[T]
- Definition Classes
 - RDD
 
 -    def toString(): String
- Definition Classes
 - RDD → AnyRef → Any
 
 -    def top(num: Int)(implicit ord: Ordering[T]): Array[T]
- Definition Classes
 - RDD
 
 -    def treeAggregate[U](zeroValue: U, seqOp: (U, T) => U, combOp: (U, U) => U, depth: Int, finalAggregateOnExecutor: Boolean)(implicit arg0: ClassTag[U]): U
- Definition Classes
 - RDD
 
 -    def treeAggregate[U](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U, depth: Int)(implicit arg0: ClassTag[U]): U
- Definition Classes
 - RDD
 
 -    def treeReduce(f: (T, T) => T, depth: Int): T
- Definition Classes
 - RDD
 
 -    def union(other: RDD[T]): RDD[T]
- Definition Classes
 - RDD
 
 -    def unpersist(blocking: Boolean): PartitionerAwareUnionRDD2.this.type
- Definition Classes
 - RDD
 
 -   final  def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @throws(classOf[java.lang.InterruptedException])
 
 -   final  def wait(arg0: Long): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @throws(classOf[java.lang.InterruptedException]) @native()
 
 -   final  def wait(): Unit
- Definition Classes
 - AnyRef
 - Annotations
 - @throws(classOf[java.lang.InterruptedException])
 
 -    def withResources(rp: ResourceProfile): PartitionerAwareUnionRDD2.this.type
- Definition Classes
 - RDD
 - Annotations
 - @Since("3.1.0") @Experimental()
 
 -    def zip[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(T, U)]
- Definition Classes
 - RDD
 
 -    def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D])(f: (Iterator[T], Iterator[B], Iterator[C], Iterator[D]) => Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]
- Definition Classes
 - RDD
 
 -    def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D], preservesPartitioning: Boolean)(f: (Iterator[T], Iterator[B], Iterator[C], Iterator[D]) => Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]
- Definition Classes
 - RDD
 
 -    def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C])(f: (Iterator[T], Iterator[B], Iterator[C]) => Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]
- Definition Classes
 - RDD
 
 -    def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C], preservesPartitioning: Boolean)(f: (Iterator[T], Iterator[B], Iterator[C]) => Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]
- Definition Classes
 - RDD
 
 -    def zipPartitions[B, V](rdd2: RDD[B])(f: (Iterator[T], Iterator[B]) => Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]
- Definition Classes
 - RDD
 
 -    def zipPartitions[B, V](rdd2: RDD[B], preservesPartitioning: Boolean)(f: (Iterator[T], Iterator[B]) => Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]
- Definition Classes
 - RDD
 
 -    def zipWithIndex(): RDD[(T, Long)]
- Definition Classes
 - RDD
 
 -    def zipWithUniqueId(): RDD[(T, Long)]
- Definition Classes
 - RDD
 
 
Deprecated Value Members
-    def finalize(): Unit
- Attributes
 - protected[lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws(classOf[java.lang.Throwable]) @Deprecated
 - Deprecated
 (Since version 9)