Interface LayerDataFrameReader
- All Known Subinterfaces:
IndexDataFrameReader, InteractiveMapDataFrameReader, VersionedDataFrameReader
The layer type is inferred from the layer configuration; therefore, the API for reading from an index layer or a versioned layer is the same. To read from an index or versioned layer, your application must perform the following operations:
- Create an instance of a SparkSession.
- Use LayerDataFrameReader.SparkSessionExt.readLayer to create a LayerDataFrameReader.
- Call query to specify the query.
- Call load to create a DataFrame that contains the data. The method infers the data format from the layer content type.
Below is an example written in Scala that demonstrates how to query data from an index layer:
import com.here.platform.data.client.spark.LayerDataFrameReader.SparkSessionExt
val spark =
  SparkSession
    .builder()
    .appName(getClass.getSimpleName)
    .master("local[*]")
    .getOrCreate()

val dataFrame: DataFrame = spark
  .readLayer(catalogHrn, indexLayer)
  .query("tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811) and eventType==SignRecognition")
  .load()
Java developers should use com.here.platform.data.client.spark.javadsl.JavaLayerDataFrameReader#readLayer
instead of spark.readLayer:
Dataset<Row> df =
    JavaLayerDataFrameReader.create(spark)
        .readLayer(catalogHrn, layerId)
        .query("tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811) and eventType==SignRecognition")
        .load();
Nested Class Summary
- static class LayerDataFrameReader.SparkSessionExt
Implicit class to simplify the creation of custom Spark DataFrameReaders.
Method Summary
- format(java.lang.String) - Specifies the format of the data stored in the layer.
- load() - Returns org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>. Retrieves the data in the user-defined format (see format(java.lang.String)) that satisfies the provided query (see query(java.lang.String)).
- option - Adds an input option for the underlying data source (listed five times, once per overload).
- query(java.lang.String) - Specifies the query to use when querying the layer.
- queryMetadata(String query) - Specifies the query to use when querying the layer partitions metadata.
- schema(org.apache.spark.sql.types.StructType schema) - Specifies the schema of the data stored in the layer.
Method Details
- option
Adds an input option for the underlying data source.
- option
Adds an input option for the underlying data source.
- option
Adds an input option for the underlying data source.
- option
Adds an input option for the underlying data source.
- option
Adds an input option for the underlying data source.
- format
Specifies the format of the data stored in the layer.
- schema
Specifies the schema of the data stored in the layer. Some data formats, such as Apache Avro, can infer the schema automatically from the data. By specifying the schema here, the underlying data source can skip the schema-inference step and thus speed up data loading.
- query
Specifies the query to use when querying the layer.
Parameters:
query - Query string to retrieve layer data. The query format should follow RSQL; see https://github.com/jirutka/rsql-parser
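As a rough illustration of how a query string for this method can be composed, the sketch below builds the same RSQL-style query used in the examples on this page from its two predicates. The and helper and class name are hypothetical and exist only to demonstrate string composition; the operators (==, the INBOUNDINGBOX comparison, the and connector) come directly from the documented example.

```java
// Hypothetical sketch: composing an RSQL-style query string like the one
// used in the examples above. Only plain string handling is involved;
// the predicate syntax mirrors the documented example verbatim.
public class RsqlQueryExample {

  // Joins individual predicates with the RSQL "and" connector.
  static String and(String... predicates) {
    return String.join(" and ", predicates);
  }

  public static void main(String[] args) {
    String boundingBox =
        "tileId=INBOUNDINGBOX=(23.648524, 22.689013, 62.284241, 60.218811)";
    String eventType = "eventType==SignRecognition";

    // Same query string as in the Scala and Java examples above.
    System.out.println(and(boundingBox, eventType));
  }
}
```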
- queryMetadata
Specifies the query to use when querying the layer partitions metadata.
Parameters:
query - Query string to retrieve layer partitions metadata. The query format should follow RSQL; see https://github.com/jirutka/rsql-parser
- load
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> load()
Retrieves the data in the user-defined format (see format(java.lang.String)) that satisfies the provided query (see query(java.lang.String)).
If no format is set, the load() method infers the input data source format from the layer content type. For example, if the layer content type is application/x-parquet, the load() method will use the parquet data source format.
If no format is set and the load() method cannot infer the input data source format from the layer content type, it will use the default format defined in the spark.sql.sources.default Spark property, whose default value is parquet.
Returns:
DataFrame with the data. Note that the structure of the DataFrame depends on the format (see format(java.lang.String)) or on an optional user-provided schema.
Throws:
DataClientNonRetriableException - in case of a non-retriable error
DataClientRetriableException - in case of a retriable error
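The format-selection rule described for load() (a format inferred from the layer content type, else the spark.sql.sources.default property, whose own default is parquet) can be sketched as follows. This is a hypothetical illustration of the rule, not the library's actual implementation; the only content-type mapping shown is the application/x-parquet example given above.

```java
import java.util.Map;

// Hypothetical sketch of the format-selection rule described above:
// prefer a format derived from the layer content type, otherwise fall
// back to the spark.sql.sources.default property (default: "parquet").
public class FormatInference {

  // The application/x-parquet mapping is the example from the text;
  // any other entries would be assumptions and are omitted.
  static final Map<String, String> CONTENT_TYPE_FORMATS =
      Map.of("application/x-parquet", "parquet");

  // sparkDefault stands in for the spark.sql.sources.default property;
  // null models the property being left at its built-in default.
  static String inferFormat(String contentType, String sparkDefault) {
    String inferred = CONTENT_TYPE_FORMATS.get(contentType);
    if (inferred != null) {
      return inferred;
    }
    return sparkDefault != null ? sparkDefault : "parquet";
  }

  public static void main(String[] args) {
    System.out.println(inferFormat("application/x-parquet", null));
    System.out.println(inferFormat("application/octet-stream", "orc"));
  }
}
```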