Packages

package here

Type Members

  1. class PartitionerAwareUnionRDD2[T] extends PartitionerAwareUnionRDD[T]

    Version of PartitionerAwareUnionRDD that fixes the exponential behaviour of the original PartitionerAwareUnionRDD.getPreferredLocations.

    The original implementation tries to optimize which preferred locations are picked by delegating to the SparkContext (and hence the DAGScheduler) to find the preferred locations of the parent RDDs. Doing so makes the DAGScheduler walk the RDD graph recursively until a preferred location is found. Memoization is in place within a single call of DAGScheduler.getPreferredLocs, but not across different calls of the method. As a result, if several PartitionerAwareUnionRDDs are chained together in a DAG, getPreferredLocations will, for each of the upstream PartitionerAwareUnionRDDs, visit the sub-graph again, recursively. This can quickly become very slow (and can block the driver for hours). This fix gives up on using the current preferred locations and uses the static ones instead.
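
    The blow-up can be illustrated with a small, Spark-free toy model. This is only a sketch under stated assumptions: Node, schedulerLocs, dynamicLocs and staticLocs are hypothetical names invented for the illustration, not Spark APIs, and the "static" variant assumes the fix simply asks each parent directly instead of going through the scheduler. Every node in the chain has no preferred location, which forces each lookup to walk all the way up.

```scala
import scala.collection.mutable

object PreferredLocsBlowup {
  // Toy stand-in for an RDD (hypothetical): only the parent links matter here.
  final class Node(val parents: Seq[Node])

  // Counts how many times any node is asked for its preferred locations.
  var lookups = 0L

  // Toy scheduler walk: remembers visited nodes within ONE call only.
  def schedulerLocs(node: Node, visited: mutable.Set[Node]): Seq[String] =
    if (!visited.add(node)) Nil
    else {
      val own = dynamicLocs(node)
      if (own.nonEmpty) own
      else node.parents.iterator
        .map(p => schedulerLocs(p, visited))
        .find(_.nonEmpty)
        .getOrElse(Nil)
    }

  // Original-style lookup: every parent is resolved through a fresh scheduler call,
  // so nothing learned in one call is reused by the next.
  def dynamicLocs(node: Node): Seq[String] = {
    lookups += 1
    node.parents.flatMap(p => schedulerLocs(p, mutable.Set.empty[Node]))
  }

  // Static-style lookup (assumed shape of the fix): ask each parent directly.
  def staticLocs(node: Node): Seq[String] = {
    lookups += 1
    node.parents.flatMap(staticLocs)
  }

  // Builds a chain of `depth` union-like nodes on top of one leaf.
  def chain(depth: Int): Node =
    (1 to depth).foldLeft(new Node(Nil))((parent, _) => new Node(Seq(parent)))

  // Runs one lookup strategy on a fresh chain and returns the number of lookups.
  def count(f: Node => Seq[String], depth: Int): Long = {
    lookups = 0L
    f(chain(depth))
    lookups
  }

  def main(args: Array[String]): Unit =
    for (depth <- Seq(5, 10, 20))
      println(s"depth=$depth dynamic=${count(dynamicLocs, depth)} static=${count(staticLocs, depth)}")
}
```

    In this model the dynamic lookup count doubles with every extra level of chaining (2^depth for a plain chain), while the static count grows as depth + 1. The per-call visited set does not help, because each nested lookup starts with an empty one.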

    The issue is described in https://issues.apache.org/jira/browse/SPARK-33356.

    This class relies on internal Spark APIs marked as "developer API", so it needs particular attention whenever the software is upgraded to a newer version of Spark. As of Spark 3.4.1 (the latest version at the time of writing), PartitionerAwareUnionRDD has not changed noticeably since v2.4.7.

Value Members

  1. object Extensions

    Extensions and fixes not yet ported to Spark.
