Interface LayerDataFrameReader

All Known Subinterfaces:
IndexDataFrameReader, InteractiveMapDataFrameReader, VersionedDataFrameReader

public interface LayerDataFrameReader
Custom Spark DataFrameReader for querying data from a given layer.

The layer type is inferred from the layer configuration, so the API for reading from an index layer or a versioned layer is the same. To read from an index or versioned layer, your application must perform the following operations:

  1. Create an instance of a SparkSession.
  2. Use LayerDataFrameReader.SparkSessionExt.readLayer to create a LayerDataFrameReader.
  3. Call query to specify the query.
  4. Call load to create a DataFrame that contains the data. The method will infer the data format from the layer content type.

Below is an example written in Scala that demonstrates how to query data from an index layer:


 import com.here.platform.data.client.spark.LayerDataFrameReader.SparkSessionExt
 import org.apache.spark.sql.{DataFrame, SparkSession}

 val spark =
   SparkSession
     .builder()
     .appName(getClass.getSimpleName)
     .master("local[*]")
     .getOrCreate()

 // catalogHrn and indexLayer identify the catalog and the index layer to read from
 val dataFrame: DataFrame = spark
     .readLayer(catalogHrn, indexLayer)
     .query(
         "tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811) and eventType==SignRecognition")
     .load()
 

Java developers should use com.here.platform.data.client.spark.javadsl.JavaLayerDataFrameReader#readLayer instead of spark.readLayer:


   Dataset<Row> df =
         JavaLayerDataFrameReader.create(spark)
             .readLayer(catalogHrn, layerId)
             .query(
                 "tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811) and eventType==SignRecognition")
             .load();
 

Note:
If the load method cannot correctly infer the data format from the layer content type, the application can enforce the data format by calling the format method beforehand.
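For example, the format can be forced explicitly before loading. The following is a sketch; `catalogHrn` and `layerId` are placeholders for your own catalog and layer, and `"avro"` is an assumed data source name for illustration:

```scala
import com.here.platform.data.client.spark.LayerDataFrameReader.SparkSessionExt

// Force the "avro" data source instead of relying on content-type inference.
// catalogHrn and layerId are placeholders.
val df = spark
  .readLayer(catalogHrn, layerId)
  .format("avro")
  .query(
    "tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811) and eventType==SignRecognition")
  .load()
```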
  • Method Details

    • option

      LayerDataFrameReader option(String key, double value)
      Adds an input option for the underlying data source.
    • option

      LayerDataFrameReader option(String key, long value)
      Adds an input option for the underlying data source.
    • option

      LayerDataFrameReader option(String key, boolean value)
      Adds an input option for the underlying data source.
    • option

      LayerDataFrameReader option(String key, String value)
      Adds an input option for the underlying data source.
    • option

      LayerDataFrameReader option(String key, Enum<?> value)
      Adds an input option for the underlying data source.
    • format

      LayerDataFrameReader format(String source)
      Specifies the format of the data stored in the layer.
    • schema

      LayerDataFrameReader schema(org.apache.spark.sql.types.StructType schema)
      Specifies the schema of the data stored in the layer. Some data formats such as Apache Avro can infer the schema automatically from the data. By specifying the schema here, the underlying data source can skip the schema inference step, and thus speed up data loading.
    • query

      LayerDataFrameReader query(String query)
      Specifies the query to use when querying the layer.

      Parameters:
      query - Query string to retrieve layer data. The query must follow RSQL syntax. See https://github.com/jirutka/rsql-parser
    • queryMetadata

      LayerDataFrameReader queryMetadata(String query)
      Specifies the query to use when querying the layer partitions metadata.

      Parameters:
      query - Query string to retrieve layer partitions metadata. The query must follow RSQL syntax. See https://github.com/jirutka/rsql-parser
    • load

      org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> load()
      Retrieves the data in the user-defined format (see format(java.lang.String)) that satisfies the provided query (see query(java.lang.String)).

      If no format is set, the load() method infers the input data source format from the layer content type. For example, if the layer content type is application/x-parquet, the load() method will specify the parquet data source format.

      If no format is set and the load() method cannot infer the input data source format from the layer content type, the load() method will use the default format defined in the spark.sql.sources.default Spark property, whose default value is parquet.

      Returns:
      DataFrame with the data. Note that the structure of the returned DataFrame depends on the format (see format(java.lang.String)) or on the optional user-provided schema.
      Throws:
      DataClientNonRetriableException - in case of non-retriable error
      DataClientRetriableException - in case of retriable error
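The builder methods above can be combined in one chain. The following Scala sketch shows option, schema, query, and load used together; `catalogHrn`, `layerId`, and the option key `"olp.connector.download-parallelism"` are placeholder values chosen for illustration, not confirmed API constants:

```scala
import com.here.platform.data.client.spark.LayerDataFrameReader.SparkSessionExt
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Providing a schema lets the data source skip schema inference.
val schema = StructType(Seq(
  StructField("tileId", LongType, nullable = false),
  StructField("eventType", StringType, nullable = true)))

val df = spark
  .readLayer(catalogHrn, layerId)                    // catalogHrn and layerId are placeholders
  .option("olp.connector.download-parallelism", 8L)  // long overload; placeholder option key
  .schema(schema)
  .query("tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811)")
  .load()
```

Supplying the schema up front is mainly a performance optimization: formats such as Avro can infer it from the data, but inference requires an extra pass over the layer content.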