Class DistributedClustering<GeolocatedItem>
- All Implemented Interfaces:
Serializable, scala.Serializable
Clustering is performed by running the DBSCAN algorithm on each tile, extended by a buffer zone of the specified width. Clusters whose center lies in the buffer zone are rejected, because they are included in the output for a different tile.
Limitations:
- Due to the distribution scheme, this implementation returns incorrect results if there are clusters larger than the buffer zone.
- DBSCAN identifies clusters by density, so the algorithm works best if all clusters have approximately the same spatial density.
param: neighborhoodRadiusInMeters The radius within which the algorithm searches for neighboring events.
param: minNeighbors Minimum number of neighbors required to add an event to a cluster.
param: partitionBufferZoneInMeters Width of the zone that extends around each tile in order to support clusters spanning tile borders. The buffer zone should be larger than the expected size of the clusters.
param: partitionTileLevel The level of the tiles used to distribute the clustering.
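The roles of neighborhoodRadiusInMeters and minNeighbors can be illustrated with a minimal single-machine DBSCAN sketch. This is not the code of this class: the name DbscanSketch, the planar (x, y) coordinates in meters, and the choice not to count a point as its own neighbor are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not the implementation behind DistributedClustering.
class DbscanSketch {
    static final int UNVISITED = -2;
    static final int NOISE = -1;

    /** Indices of all other points within `radius` of point i (planar distance, meters). */
    static List<Integer> neighbors(double[][] pts, int i, double radius) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < pts.length; j++) {
            if (j == i) continue; // assumption: a point is not its own neighbor
            if (Math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1]) <= radius) {
                out.add(j);
            }
        }
        return out;
    }

    /** Returns one cluster id per point; NOISE marks points that joined no cluster. */
    static int[] cluster(double[][] pts, double radius, int minNeighbors) {
        int[] label = new int[pts.length];
        Arrays.fill(label, UNVISITED);
        int nextId = 0;
        for (int i = 0; i < pts.length; i++) {
            if (label[i] != UNVISITED) continue;
            List<Integer> seed = neighbors(pts, i, radius);
            if (seed.size() < minNeighbors) {
                label[i] = NOISE; // may still be claimed later as a border point
                continue;
            }
            int id = nextId++;
            label[i] = id;
            Deque<Integer> toVisit = new ArrayDeque<>(seed);
            while (!toVisit.isEmpty()) {
                int j = toVisit.poll();
                if (label[j] == NOISE) label[j] = id;  // border point joins the cluster
                if (label[j] != UNVISITED) continue;   // already assigned
                label[j] = id;
                List<Integer> next = neighbors(pts, j, radius);
                if (next.size() >= minNeighbors) toVisit.addAll(next); // dense point: keep expanding
            }
        }
        return label;
    }
}
```

With a radius of 2 m and minNeighbors of 2, two well-separated triples of points form two clusters, and an isolated point is labeled as noise. A larger radius or a smaller minNeighbors merges sparser groups, which is why clusters of very different density are hard to separate with a single parameter pair.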
Constructor Summary
Constructors
- DistributedClustering(double neighborhoodRadiusInMeters, int minNeighbors, double partitionBufferZoneInMeters, HereTileLevel partitionTileLevel, GeoCoordinateOperations<GeolocatedItem> evidence$2)
Method Summary
Modifier and Type, Method, Description:
- org.apache.spark.rdd.RDD<Cluster<GeolocatedItem>> apply(org.apache.spark.rdd.RDD<GeolocatedItem> events, scala.reflect.ClassTag<GeolocatedItem> classTag): Apply clustering to the given events.
- scala.Tuple2<TileId, scala.collection.Iterable<Cluster<GeolocatedItem>>> clusterEventsOnTile(scala.Tuple2<TileId, scala.collection.Iterable<GeolocatedItem>> tile)
- final scala.collection.immutable.List<GeolocatedItem> expandCluster(scala.collection.immutable.List<GeolocatedItem> currentCluster, boolean[] visited, scala.Tuple2<GeolocatedItem, Object>[] events, scala.collection.immutable.List<Object> toVisit)
- scala.collection.Iterable<TileId> getExtendedTileKeysForPoint
- scala.collection.Iterable<Object> getNeighboringIndices(scala.Tuple2<GeolocatedItem, Object>[] events, int i)
- static <GC> scala.collection.Seq<Class<?>> kryoClasses
- scala.collection.Iterable<Cluster<GeolocatedItem>> removeClustersOnTileExtension(scala.Option<org.apache.spark.util.LongAccumulator> clustersCount, scala.Tuple2<TileId, scala.collection.Iterable<Cluster<GeolocatedItem>>> tile)
Constructor Details
DistributedClustering
public DistributedClustering(double neighborhoodRadiusInMeters, int minNeighbors, double partitionBufferZoneInMeters, HereTileLevel partitionTileLevel, GeoCoordinateOperations<GeolocatedItem> evidence$2)
Method Details
kryoClasses
apply
public org.apache.spark.rdd.RDD<Cluster<GeolocatedItem>> apply(org.apache.spark.rdd.RDD<GeolocatedItem> events, scala.reflect.ClassTag<GeolocatedItem> classTag)
Apply clustering to the given events.
getExtendedTileKeysForPoint
getNeighboringIndices
public scala.collection.Iterable<Object> getNeighboringIndices(scala.Tuple2<GeolocatedItem, Object>[] events, int i)
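This page does not describe how getNeighboringIndices measures distance. As a rough illustration of a neighbor lookup within a radius given in meters, the sketch below uses the haversine great-circle distance on latitude/longitude pairs. The class name NeighborSearchSketch, the array-based input, the mean Earth radius constant, and the exclusion of the query point from its own result are assumptions for the example, not the behavior of this method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the distance function of the real method is not documented here.
class NeighborSearchSketch {
    static final double EARTH_RADIUS_M = 6_371_008.8; // mean Earth radius (assumed constant)

    /** Great-circle distance in meters between two points given in degrees. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Indices of all other events within radiusMeters of event i; each row is {lat, lon}. */
    static List<Integer> neighboringIndices(double[][] latLon, int i, double radiusMeters) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < latLon.length; j++) {
            if (j == i) continue; // assumption: the event itself is not returned
            double d = haversineMeters(latLon[i][0], latLon[i][1], latLon[j][0], latLon[j][1]);
            if (d <= radiusMeters) out.add(j);
        }
        return out;
    }
}
```

This linear scan costs O(n) per query, so a full clustering pass over a tile is O(n²). Keeping tiles small through the tile level is one way to bound that cost.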
expandCluster
public final scala.collection.immutable.List<GeolocatedItem> expandCluster(scala.collection.immutable.List<GeolocatedItem> currentCluster, boolean[] visited, scala.Tuple2<GeolocatedItem, Object>[] events, scala.collection.immutable.List<Object> toVisit)
clusterEventsOnTile
public scala.Tuple2<TileId, scala.collection.Iterable<Cluster<GeolocatedItem>>> clusterEventsOnTile(scala.Tuple2<TileId, scala.collection.Iterable<GeolocatedItem>> tile)
removeClustersOnTileExtension
public scala.collection.Iterable<Cluster<GeolocatedItem>> removeClustersOnTileExtension(scala.Option<org.apache.spark.util.LongAccumulator> clustersCount, scala.Tuple2<TileId, scala.collection.Iterable<Cluster<GeolocatedItem>>> tile)
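The class description states that each tile is processed together with a buffer zone and that clusters centered in the buffer zone are rejected. The sketch below illustrates that scheme on a simplified planar grid of square tiles. It is not the HERE tiling used by this class: TileBufferSketch, the (col, row) tile keys, and the coordinates in meters are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; uses a planar square grid instead of HERE tiles.
class TileBufferSketch {
    /** Keys {col, row} of every tile whose bounds, grown by `buffer` on each side, contain (x, y). */
    static List<long[]> extendedTileKeys(double x, double y, double tileSize, double buffer) {
        long minCol = (long) Math.floor((x - buffer) / tileSize);
        long maxCol = (long) Math.floor((x + buffer) / tileSize);
        long minRow = (long) Math.floor((y - buffer) / tileSize);
        long maxRow = (long) Math.floor((y + buffer) / tileSize);
        List<long[]> keys = new ArrayList<>();
        for (long col = minCol; col <= maxCol; col++) {
            for (long row = minRow; row <= maxRow; row++) {
                keys.add(new long[] {col, row});
            }
        }
        return keys;
    }

    /** True if the cluster center lies in the tile proper, i.e. not in its buffer zone. */
    static boolean centerInsideTile(double cx, double cy, long col, long row, double tileSize) {
        return Math.floor(cx / tileSize) == col && Math.floor(cy / tileSize) == row;
    }
}
```

An event near a tile border is sent to every tile whose extended bounds contain it, so neighboring tiles each see the whole cluster as long as it fits within the buffer. Each tile then keeps only the clusters centered in its own area, so that a cluster found by several tiles is reported only once.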