Interface LayerDataFrameWriter

public interface LayerDataFrameWriter

Custom Spark DataFrameWriter for writing data to a given layer.

The layer type is inferred from the layer configuration, so the API for writing to an index layer or a versioned layer is the same. To write to an index or versioned layer, your application must perform the following operations:
- Have some data as a DataFrame.
- Use LayerDataFrameWriter.DataFrameExt.writeLayer to create a LayerDataFrameWriter.
- Call withDataConverter to specify how the data groupings should be merged into a single data file. This is not necessary if the data is stored as Avro, Parquet, or Protobuf, as the appropriate DataConverter will be inferred from the layer's content type.
- Call save to save the data stored in the DataFrame to the given layer.
Below is an example written in Scala that demonstrates how to write data to an index layer:
import com.here.platform.data.client.spark.LayerDataFrameWriter.DataFrameExt
val spark =
SparkSession
.builder()
.appName(getClass.getSimpleName)
.master("local[*]")
.getOrCreate()
val dataFrame: DataFrame = ???
dataFrame
.writeLayer(catalogHrn, indexLayer)
.save()
Java developers should use com.here.platform.data.client.spark.javadsl.JavaLayerDataFrameWriter#writeLayer instead of dataFrame.writeLayer:

Dataset<Row> df = ...;
JavaLayerDataFrameWriter.create(df)
    .writeLayer(catalogHrn, layerId)
    .save();
The batch size (number of Rows) of a grouping can be restricted to a certain amount by setting the option olp.groupedBatchSize (e.g. 2 for two Rows in each group):

val dataFrame: DataFrame = ???
dataFrame
  .writeLayer(catalogHrn, indexLayer)
  .option("olp.groupedBatchSize", 2)
  .save()
Note: If the save method cannot correctly infer the DataConverter from the layer content type, the application must provide one using the withDataConverter method.
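For layers whose content type does not allow inference, the converter can be supplied explicitly. Below is a minimal Scala sketch; the converter's construction is elided with ??? because the DataConverter trait's members are not shown on this page, and myConverter is a hypothetical name:

import com.here.platform.data.client.spark.LayerDataFrameWriter.DataFrameExt
import com.here.platform.data.client.spark.scaladsl.DataConverter

// A DataConverter implementation is required when the layer stores a
// content type other than Avro, Parquet, or Protobuf; its construction
// is elided here.
val myConverter: DataConverter = ???

dataFrame
  .writeLayer(catalogHrn, layerId)
  .withDataConverter(myConverter)
  .save()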
Nested Class Summary

Modifier and Type: static class
Interface: LayerDataFrameWriter.DataFrameExt
Description: Implicit class to simplify the creation of custom Spark DataFrameWriters.
Method Summary

- option: Adds an option for the underlying data source. (Declared in four overloads for different value types.)
- void save(): Save the data to the given layer.
- LayerDataFrameWriter withDataConverter(DataConverter dataConverter): Specify a DataConverter to convert a collection of data Rows into an aggregated data file as a byte array.
- LayerDataFrameWriter withDataConverter(com.here.platform.data.client.spark.scaladsl.DataConverter dataConverter): Specify a DataConverter to convert a collection of data Rows into an aggregated data file as a byte array.
- LayerDataFrameWriter withDependencies(scala.collection.immutable.Seq<VersionDependency> dependencies): Specify the dependencies to be used for the write operation.
Method Details

option
Adds an option for the underlying data source. (Declared in four overloads.)
withDataConverter
LayerDataFrameWriter withDataConverter(com.here.platform.data.client.spark.scaladsl.DataConverter dataConverter)
Specify a DataConverter to convert a collection of data Rows into an aggregated data file as a byte array.

withDataConverter
Specify a DataConverter to convert a collection of data Rows into an aggregated data file as a byte array.

withDependencies
LayerDataFrameWriter withDependencies(scala.collection.immutable.Seq<VersionDependency> dependencies)
Specify the dependencies to be used for the write operation.
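As a hedged sketch, withDependencies can be chained into the same writer as the other calls on this page. The package of VersionDependency and the way the dependency sequence is obtained are not shown on this page, so both are elided:

import com.here.platform.data.client.spark.LayerDataFrameWriter.DataFrameExt

// The construction of the dependency list is elided; it would typically
// describe the versions of the input catalogs this write depends on.
val dependencies: Seq[VersionDependency] = ???

dataFrame
  .writeLayer(catalogHrn, layerId)
  .withDependencies(dependencies)
  .save()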
save
void save()
Save the data to the given layer.