Class DistributedClustering<GeolocatedItem>

java.lang.Object
com.here.platform.location.spark.DistributedClustering<GeolocatedItem>
All Implemented Interfaces:
Serializable, scala.Serializable

public class DistributedClustering<GeolocatedItem> extends Object implements scala.Serializable
Performs distributed clustering on a given collection of GeolocatedItem events.

The clustering is performed by running the DBSCAN algorithm on each tile, extended by a buffer zone of the specified width. Clusters whose center lies in the buffer zone are rejected, because they are included in the output of a different tile.
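The rejection rule above can be illustrated with a minimal sketch. It is not the library's implementation: the `Tile` type, the planar coordinates in meters, and the use of the arithmetic mean as the cluster "center" are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class BufferZoneSketch {

    /** Hypothetical axis-aligned tile in a planar coordinate system, in meters. */
    static final class Tile {
        final double minX, minY, maxX, maxY;

        Tile(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        /** True if the point lies inside this tile (upper edges excluded). */
        boolean contains(double x, double y) {
            return x >= minX && x < maxX && y >= minY && y < maxY;
        }

        /** The same tile grown by the buffer width on every side. */
        Tile extendedBy(double bufferInMeters) {
            return new Tile(minX - bufferInMeters, minY - bufferInMeters,
                            maxX + bufferInMeters, maxY + bufferInMeters);
        }
    }

    /** Arithmetic mean of the points; an assumed definition of the cluster center. */
    static double[] center(List<double[]> cluster) {
        if (cluster.isEmpty()) {
            throw new IllegalArgumentException("empty cluster");
        }
        double sumX = 0, sumY = 0;
        for (double[] p : cluster) {
            sumX += p[0];
            sumY += p[1];
        }
        return new double[] {sumX / cluster.size(), sumY / cluster.size()};
    }

    /** A tile keeps a cluster only if the center is in the tile itself, not in its buffer. */
    static boolean keepsCluster(Tile tile, List<double[]> cluster) {
        double[] c = center(cluster);
        return tile.contains(c[0], c[1]);
    }

    public static void main(String[] args) {
        Tile left = new Tile(0, 0, 100, 100);
        Tile right = new Tile(100, 0, 200, 100);
        // A cluster that straddles the shared border at x = 100.
        List<double[]> cluster = Arrays.asList(
            new double[] {95, 50}, new double[] {98, 50}, new double[] {104, 50});
        System.out.println("left keeps:  " + keepsCluster(left, cluster));
        System.out.println("right keeps: " + keepsCluster(right, cluster));
    }
}
```

With a 10 m buffer, both extended tiles contain all three points, so each tile would find the same cluster. Only the left tile keeps it, because the center (x = 99) falls inside the left tile and inside the right tile's buffer zone.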

Limitations:

- Due to the distribution scheme, this implementation returns incorrect results if there are clusters larger than the buffer zone.
- DBSCAN identifies clusters by density, so the algorithm works best if all clusters have approximately the same spatial density.

param: neighborhoodRadiusInMeters The radius within which the algorithm searches for the neighboring events
param: minNeighbors Minimal number of neighbors required to add the event to a cluster
param: partitionBufferZoneInMeters Width of the zone that extends around each tile in order to support clusters spanning across the tile borders. The buffer zone should be set to a larger size than the expected size of clusters
param: partitionTileLevel The level of tiles used to distribute the clustering
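To show how a neighborhood radius and a minimum neighbor count interact in DBSCAN, here is a minimal single-machine sketch. It is a textbook-style DBSCAN, not this class's code. It assumes planar coordinates with Euclidean distance and counts neighbors without the point itself; the class name, method names, and label constants are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class DbscanSketch {
    static final int NOISE = -1;
    static final int UNVISITED = -2;

    /** Indices of all other points within the radius of point i (point i itself excluded). */
    static List<Integer> neighbors(double[][] points, int i, double radius) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < points.length; j++) {
            if (j == i) {
                continue;
            }
            double dx = points[i][0] - points[j][0];
            double dy = points[i][1] - points[j][1];
            if (dx * dx + dy * dy <= radius * radius) {
                result.add(j);
            }
        }
        return result;
    }

    /** Returns one label per point: a cluster id starting at 0, or NOISE. */
    static int[] cluster(double[][] points, double radius, int minNeighbors) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int nextClusterId = 0;
        for (int i = 0; i < points.length; i++) {
            if (labels[i] != UNVISITED) {
                continue;
            }
            List<Integer> seeds = neighbors(points, i, radius);
            if (seeds.size() < minNeighbors) {
                labels[i] = NOISE; // may later be claimed by a cluster as a border point
                continue;
            }
            int clusterId = nextClusterId++;
            labels[i] = clusterId;
            Deque<Integer> toVisit = new ArrayDeque<>(seeds);
            while (!toVisit.isEmpty()) {
                int j = toVisit.pop();
                if (labels[j] == NOISE) {
                    labels[j] = clusterId; // border point: joins, but does not expand
                }
                if (labels[j] != UNVISITED) {
                    continue;
                }
                labels[j] = clusterId;
                List<Integer> reachable = neighbors(points, j, radius);
                if (reachable.size() >= minNeighbors) {
                    toVisit.addAll(reachable); // dense point: keep expanding the cluster
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {
            {0, 0}, {1, 0}, {2, 0}, {100, 100}, {101, 100}, {50, 50}
        };
        System.out.println(Arrays.toString(cluster(points, 1.5, 1)));
    }
}
```

In this sketch, a larger radius or a smaller neighbor threshold merges more points into clusters, while the isolated point at (50, 50) stays labeled as noise.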

  • Constructor Details

  • Method Details

    • kryoClasses

      public static <GC> scala.collection.Seq<Class<?>> kryoClasses()
    • apply

      public org.apache.spark.rdd.RDD<Cluster<GeolocatedItem>> apply(org.apache.spark.rdd.RDD<GeolocatedItem> events, scala.reflect.ClassTag<GeolocatedItem> classTag)
      Apply clustering to the given events.
    • getExtendedTileKeysForPoint

      public scala.collection.Iterable<TileId> getExtendedTileKeysForPoint(GeolocatedItem coords)
    • getNeighboringIndices

      public scala.collection.Iterable<Object> getNeighboringIndices(scala.Tuple2<GeolocatedItem,Object>[] events, int i)
    • expandCluster

      public final scala.collection.immutable.List<GeolocatedItem> expandCluster(scala.collection.immutable.List<GeolocatedItem> currentCluster, boolean[] visited, scala.Tuple2<GeolocatedItem,Object>[] events, scala.collection.immutable.List<Object> toVisit)
    • clusterEventsOnTile

      public scala.Tuple2<TileId,scala.collection.Iterable<Cluster<GeolocatedItem>>> clusterEventsOnTile(scala.Tuple2<TileId,scala.collection.Iterable<GeolocatedItem>> tile)
    • removeClustersOnTileExtension

      public scala.collection.Iterable<Cluster<GeolocatedItem>> removeClustersOnTileExtension(scala.Option<org.apache.spark.util.LongAccumulator> clustersCount, scala.Tuple2<TileId,scala.collection.Iterable<Cluster<GeolocatedItem>>> tile)