Interface LayerDataFrameReader

All Known Subinterfaces:
IndexDataFrameReader, InteractiveMapDataFrameReader, VersionedDataFrameReader

public interface LayerDataFrameReader
Custom Spark DataFrameReader for querying data from a given layer.

The layer type is inferred from the layer configuration, so the API for reading from an index layer or a versioned layer is the same. To read from an index or versioned layer, your application must perform the following operations:

  1. Create an instance of a SparkSession.
  2. Use LayerDataFrameReader.SparkSessionExt.readLayer to create a LayerDataFrameReader.
  3. Call query to specify the query.
  4. Call load to create a DataFrame that contains the data. The method will infer the data format from the layer content type.

Below is an example written in Scala that demonstrates how to query data from an index layer:


 import com.here.platform.data.client.spark.LayerDataFrameReader.SparkSessionExt
 import org.apache.spark.sql.{DataFrame, SparkSession}

 val spark =
   SparkSession
     .builder()
     .appName(getClass.getSimpleName)
     .master("local[*]")
     .getOrCreate()

 // catalogHrn and indexLayer identify the catalog and the index layer to read from
 val dataFrame: DataFrame = spark
     .readLayer(catalogHrn, indexLayer)
     .query(
         "tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811) and eventType==SignRecognition")
     .load()
 

Java developers should use com.here.platform.data.client.spark.javadsl.JavaLayerDataFrameReader#readLayer instead of spark.readLayer:


   Dataset<Row> df =
         JavaLayerDataFrameReader.create(spark)
             .readLayer(catalogHrn, layerId)
             .query(
                 "tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811) and eventType==SignRecognition")
             .load();
 

Note:
If the load method cannot correctly infer the data format from the layer content type, the application can enforce the data format by calling the format method beforehand.
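For example, the format can be forced explicitly before loading. The following is a sketch; `catalogHrn` and `layerId` are placeholders for your own catalog and layer, and `"avro"` is an assumed data source name for illustration:

```scala
import com.here.platform.data.client.spark.LayerDataFrameReader.SparkSessionExt

// Force the "avro" data source instead of relying on content-type inference.
// catalogHrn and layerId are placeholders.
val df = spark
  .readLayer(catalogHrn, layerId)
  .format("avro")
  .query(
    "tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811) and eventType==SignRecognition")
  .load()
```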
  • Method Details

    • option

      LayerDataFrameReader option(String key, double value)
      Adds an input option for the underlying data source.
    • option

      LayerDataFrameReader option(String key, long value)
      Adds an input option for the underlying data source.
    • option

      LayerDataFrameReader option(String key, boolean value)
      Adds an input option for the underlying data source.
    • option

      LayerDataFrameReader option(String key, String value)
      Adds an input option for the underlying data source.
    • option

      LayerDataFrameReader option(String key, Enum<?> value)
      Adds an input option for the underlying data source.
    • format

      LayerDataFrameReader format(String source)
      Specifies the format of the data stored in the layer.
    • schema

      LayerDataFrameReader schema(org.apache.spark.sql.types.StructType schema)
      Specifies the schema of the data stored in the layer. Some data formats such as Apache Avro can infer the schema automatically from the data. By specifying the schema here, the underlying data source can skip the schema inference step, and thus speed up data loading.
    • query

      LayerDataFrameReader query(String query)
      Specifies the query to use when querying the layer.

      Parameters:
      query - Query string to retrieve layer data. The query must follow RSQL syntax. See https://github.com/jirutka/rsql-parser
    • queryMetadata

      LayerDataFrameReader queryMetadata(String query)
      Specifies the query to use when querying the layer partitions metadata.

      Parameters:
      query - Query string to retrieve layer partitions metadata. The query must follow RSQL syntax. See https://github.com/jirutka/rsql-parser
    • load

      org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> load()
      Retrieves the data in the user-defined format (see format(java.lang.String)) that satisfies the provided query (see query(java.lang.String)).

      If no format is set, the load() method infers the input data source format from the layer content type. For example, if the layer content type is application/x-parquet, the load() method will specify the parquet data source format.

      If no format is set and the load() method cannot infer the input data source format from the layer content type, the load() method will use the default format defined in the spark.sql.sources.default Spark property, whose default value is parquet.

      Returns:
      DataFrame with the data. Note that the structure of the returned DataFrame depends on the format (see format(java.lang.String)) or on the optional user-provided schema.
      Throws:
      DataClientNonRetriableException - in case of non-retriable error
      DataClientRetriableException - in case of retriable error
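The builder methods above can be combined in one chain. The following Scala sketch shows option, schema, query, and load used together; `catalogHrn`, `layerId`, and the option key `"olp.connector.download-parallelism"` are placeholder values chosen for illustration, not confirmed API constants:

```scala
import com.here.platform.data.client.spark.LayerDataFrameReader.SparkSessionExt
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Providing a schema lets the data source skip schema inference.
val schema = StructType(Seq(
  StructField("tileId", LongType, nullable = false),
  StructField("eventType", StringType, nullable = true)))

val df = spark
  .readLayer(catalogHrn, layerId)                    // catalogHrn and layerId are placeholders
  .option("olp.connector.download-parallelism", 8L)  // long overload; placeholder option key
  .schema(schema)
  .query("tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811)")
  .load()
```

Supplying the schema up front is mainly a performance optimization: formats such as Avro can infer it from the data, but inference requires an extra pass over the layer content.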