Class ProtobufFileFormat
java.lang.Object
com.here.platform.data.client.spark.datasources.protobuf.ProtobufFileFormat
- All Implemented Interfaces:
org.apache.spark.sql.execution.datasources.FileFormat, org.apache.spark.sql.sources.DataSourceRegister
public class ProtobufFileFormat
extends Object
implements org.apache.spark.sql.execution.datasources.FileFormat, org.apache.spark.sql.sources.DataSourceRegister
To read protobuf data from a layer, the schema must be specified in the layer
configuration and must be available on the Artifact Service. Furthermore, the schema must have a
ds variant.
For more information on how to maintain schemas, see: https://developer.here.com/documentation/archetypes/dev_guide/topics/archetypes-schema.html
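A minimal read sketch follows. It assumes the format is resolved through its DataSourceRegister short name and that the Data Client Spark dependency is on the classpath; the short name "protobuf" and the option keys shown are assumptions for illustration, not the documented API.

```
// Illustrative sketch only: requires a Spark runtime plus the
// Data Client Spark library; the option keys are hypothetical.
SparkSession spark = SparkSession.builder()
        .appName("read-protobuf-layer")
        .getOrCreate();

// DataFrameReader.load triggers DataSource.lookupDataSource, which locates
// ProtobufFileFormat through the DataSourceRegister ServiceLoader mechanism.
Dataset<Row> df = spark.read()
        .format("protobuf")                 // assumed short name
        .option("hrn", "<catalog-hrn>")     // hypothetical option key
        .option("layer", "<layer-id>")      // hypothetical option key
        .load();
df.printSchema();
```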
-
Constructor Summary
Constructors -
Method Summary
scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> buildReader(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.types.StructType dataSchema, org.apache.spark.sql.types.StructType partitionSchema, org.apache.spark.sql.types.StructType requiredSchema, scala.collection.immutable.Seq<org.apache.spark.sql.sources.Filter> filters, scala.collection.immutable.Map<String, String> options, org.apache.hadoop.conf.Configuration hadoopConf)
deconstructURI(String uri)
static scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> enclose(scala.Function2<PartitionDescriptorHolder, scala.Option<String>, scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile, scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>> f, PartitionDescriptorHolder partitionDecoder, scala.Option<String> layerInfo)
boolean equals(Object obj)
static scala.Tuple2<PartitionDescriptorHolder,scala.Option<String>> getDescriptorHolderInfo(scala.collection.immutable.Map<String, String> options)
int hashCode()
scala.Option<org.apache.spark.sql.types.StructType> inferSchema(org.apache.spark.sql.SparkSession sparkSession, scala.collection.immutable.Map<String, String> options, scala.collection.immutable.Seq<org.apache.hadoop.fs.FileStatus> files)
With each call to org.apache.spark.sql.DataFrameReader.load, DataSource.lookupDataSource creates a new ServiceLoader[DataSourceRegister], which then lazily recreates instances of FileFormat.
org.apache.spark.sql.execution.datasources.OutputWriterFactory prepareWrite(org.apache.spark.sql.SparkSession sparkSession, org.apache.hadoop.mapreduce.Job job, scala.collection.immutable.Map<String, String> options, org.apache.spark.sql.types.StructType dataSchema)
static String schemaDirKey()
String shortName()
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.sql.execution.datasources.FileFormat
allowDuplicatedColumnNames, buildReaderWithPartitionValues, createFileMetadataCol, fileConstantMetadataExtractors, isSplitable, metadataSchemaFields, supportBatch, supportDataType, supportFieldName, vectorTypes
-
Constructor Details
-
ProtobufFileFormat
public ProtobufFileFormat()
-
-
Method Details
-
schemaDirKey
-
deconstructURI
-
getDescriptorHolderInfo
public static scala.Tuple2<PartitionDescriptorHolder,scala.Option<String>> getDescriptorHolderInfo(scala.collection.immutable.Map<String, String> options) -
enclose
public static scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> enclose(scala.Function2<PartitionDescriptorHolder, scala.Option<String>, scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile, scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>> f, PartitionDescriptorHolder partitionDecoder, scala.Option<String> layerInfo) -
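The shape of this binding can be sketched in plain Java with stand-in types. The DescriptorHolder and PartitionedFile records and the String payload below are simplifications for illustration, not the real Spark or Data Client classes: the sketch applies a two-argument reader builder once, so the returned per-file function captures only the descriptor holder and the layer info.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

public class EncloseSketch {
    // Stand-ins for PartitionDescriptorHolder and PartitionedFile.
    record DescriptorHolder(String schemaName) {}
    record PartitionedFile(String path) {}

    // Applies the two-argument builder once, returning a per-file reader
    // function that captures only the decoder and the layer info.
    static Function<PartitionedFile, Iterator<String>> enclose(
            BiFunction<DescriptorHolder, String,
                       Function<PartitionedFile, Iterator<String>>> f,
            DescriptorHolder decoder,
            String layerInfo) {
        return f.apply(decoder, layerInfo);
    }

    public static void main(String[] args) {
        Function<PartitionedFile, Iterator<String>> reader = enclose(
                (decoder, layer) -> file ->
                        List.of(decoder.schemaName() + ":" + layer
                                + ":" + file.path()).iterator(),
                new DescriptorHolder("my.schema"),
                "layer-1");
        // Each PartitionedFile can now be read without re-binding the decoder.
        System.out.println(reader.apply(new PartitionedFile("part-0")).next());
    }
}
```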
shortName
public String shortName()
- Specified by:
shortName in interface org.apache.spark.sql.sources.DataSourceRegister
-
inferSchema
public scala.Option<org.apache.spark.sql.types.StructType> inferSchema(org.apache.spark.sql.SparkSession sparkSession, scala.collection.immutable.Map<String, String> options, scala.collection.immutable.Seq<org.apache.hadoop.fs.FileStatus> files)
With each call to org.apache.spark.sql.DataFrameReader.load, DataSource.lookupDataSource creates a new ServiceLoader[DataSourceRegister], which then lazily recreates instances of FileFormat. In other words, with each call to org.apache.spark.sql.DataFrameReader.load a new ProtobufFileFormat is instantiated (and with it a new PartitionDecoder), allowing new schemas to be downloaded once again from the Artifact Service.
- Specified by:
inferSchema in interface org.apache.spark.sql.execution.datasources.FileFormat
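The practical consequence can be illustrated with a small stand-alone sketch. The PartitionDecoder class below is a simplified stand-in, not the real one: it caches its schema per instance, so reusing one instance downloads once, while a fresh instance, like the one created by each load call, downloads again.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SchemaCacheSketch {
    // Counts simulated Artifact Service downloads across all decoders.
    static final AtomicInteger downloads = new AtomicInteger();

    // Simplified stand-in for a PartitionDecoder with an instance-level cache.
    static class PartitionDecoder {
        private String schema;

        String schema() {
            if (schema == null) {            // downloaded at most once per instance
                downloads.incrementAndGet(); // simulate the Artifact Service call
                schema = "descriptor-bytes";
            }
            return schema;
        }
    }

    public static void main(String[] args) {
        PartitionDecoder first = new PartitionDecoder();
        first.schema();
        first.schema();                      // cached: no second download
        // A new load call re-instantiates the format (and with it a new
        // decoder), so the schema is fetched from the service again.
        new PartitionDecoder().schema();
        System.out.println(downloads.get()); // two downloads for three calls
    }
}
```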
-
hashCode
public int hashCode() -
equals
-
prepareWrite
public org.apache.spark.sql.execution.datasources.OutputWriterFactory prepareWrite(org.apache.spark.sql.SparkSession sparkSession, org.apache.hadoop.mapreduce.Job job, scala.collection.immutable.Map<String, String> options, org.apache.spark.sql.types.StructType dataSchema)
- Specified by:
prepareWrite in interface org.apache.spark.sql.execution.datasources.FileFormat
-
buildReader
public scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> buildReader(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.types.StructType dataSchema, org.apache.spark.sql.types.StructType partitionSchema, org.apache.spark.sql.types.StructType requiredSchema, scala.collection.immutable.Seq<org.apache.spark.sql.sources.Filter> filters, scala.collection.immutable.Map<String, String> options, org.apache.hadoop.conf.Configuration hadoopConf)
- Specified by:
buildReader in interface org.apache.spark.sql.execution.datasources.FileFormat
-