Interface LayerDataFrameWriter

public interface LayerDataFrameWriter

Custom Spark DataFrameWriter for writing data to a given layer.

The layer type is inferred from the layer configuration, so the API for writing to an index layer or a versioned layer is the same. To write to an index or versioned layer, your application must perform the following operations:
- Have some data as a DataFrame.
- Use LayerDataFrameWriter.DataFrameExt.writeLayer to create a LayerDataFrameWriter.
- Call withDataConverter to specify how the data groupings should be merged into a single data file. This is not necessary if the data is stored as Avro, Parquet, or Protobuf, as the appropriate DataConverter will be inferred from the layer's content type.
- Call save to save the data stored in the DataFrame to the given layer.
Below is an example written in Scala that demonstrates how to write data to an index layer:
import com.here.platform.data.client.spark.LayerDataFrameWriter.DataFrameExt
val spark =
SparkSession
.builder()
.appName(getClass.getSimpleName)
.master("local[*]")
.getOrCreate()
val dataFrame: DataFrame = ???
dataFrame
.writeLayer(catalogHrn, indexLayer)
.save()
Java developers should use com.here.platform.data.client.spark.javadsl.JavaLayerDataFrameWriter#writeLayer instead of dataFrame.writeLayer:

Dataset<Row> df = ...;
JavaLayerDataFrameWriter.create(df)
    .writeLayer(catalogHrn, layerId)
    .save();
The batch size (number of Rows) of a grouping can be restricted to a certain amount by setting the option olp.groupedBatchSize (e.g. 2 for two Rows in each group):

val dataFrame: DataFrame = ???
dataFrame
  .writeLayer(catalogHrn, indexLayer)
  .option("olp.groupedBatchSize", 2)
  .save()
Note: If the save method cannot correctly infer the DataConverter from the layer content type, the application must provide one using the withDataConverter method.
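For layers whose content type does not allow inference, the converter can be supplied explicitly. Below is a minimal Scala sketch; the converter's construction is elided with ??? because the DataConverter trait's members are not shown on this page, and myConverter is a hypothetical name:

import com.here.platform.data.client.spark.LayerDataFrameWriter.DataFrameExt
import com.here.platform.data.client.spark.scaladsl.DataConverter

// A DataConverter implementation is required when the layer stores a
// content type other than Avro, Parquet, or Protobuf; its construction
// is elided here.
val myConverter: DataConverter = ???

dataFrame
  .writeLayer(catalogHrn, layerId)
  .withDataConverter(myConverter)
  .save()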
Nested Class Summary

Modifier and Type: static class
Interface: LayerDataFrameWriter.DataFrameExt
Description: Implicit class to simplify the creation of custom Spark DataFrameWriters.
Method Summary

- option: Adds an option for the underlying data source. (Declared in four overloads for different value types.)
- void save(): Save the data to the given layer.
- LayerDataFrameWriter withDataConverter(DataConverter dataConverter): Specify a DataConverter to convert a collection of data Rows into an aggregated data file as a byte array.
- LayerDataFrameWriter withDataConverter(com.here.platform.data.client.spark.scaladsl.DataConverter dataConverter): Specify a DataConverter to convert a collection of data Rows into an aggregated data file as a byte array.
- LayerDataFrameWriter withDependencies(scala.collection.immutable.Seq<VersionDependency> dependencies): Specify the dependencies to be used for the write operation.
Method Details

option
Adds an option for the underlying data source. (Declared in four overloads.)
withDataConverter
LayerDataFrameWriter withDataConverter(com.here.platform.data.client.spark.scaladsl.DataConverter dataConverter)
Specify a DataConverter to convert a collection of data Rows into an aggregated data file as a byte array.

withDataConverter
Specify a DataConverter to convert a collection of data Rows into an aggregated data file as a byte array.

withDependencies
LayerDataFrameWriter withDependencies(scala.collection.immutable.Seq<VersionDependency> dependencies)
Specify the dependencies to be used for the write operation.
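As a hedged sketch, withDependencies can be chained into the same writer as the other calls on this page. The package of VersionDependency and the way the dependency sequence is obtained are not shown on this page, so both are elided:

import com.here.platform.data.client.spark.LayerDataFrameWriter.DataFrameExt

// The construction of the dependency list is elided; it would typically
// describe the versions of the input catalogs this write depends on.
val dependencies: Seq[VersionDependency] = ???

dataFrame
  .writeLayer(catalogHrn, layerId)
  .withDependencies(dependencies)
  .save()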
save
void save()
Save the data to the given layer.