Class ProtobufFileFormat
java.lang.Object
com.here.platform.data.client.spark.datasources.protobuf.ProtobufFileFormat
- All Implemented Interfaces:
org.apache.spark.sql.execution.datasources.FileFormat, org.apache.spark.sql.sources.DataSourceRegister
public class ProtobufFileFormat
extends Object
implements org.apache.spark.sql.execution.datasources.FileFormat, org.apache.spark.sql.sources.DataSourceRegister
To read protobuf data from a layer, the schema must be specified in the layer
configuration and must be available on the Artifact Service. Furthermore, the schema must have a
ds variant.
For more information on how to maintain schemas, see: https://developer.here.com/documentation/archetypes/dev_guide/topics/archetypes-schema.html
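A minimal read sketch follows. It assumes the format is resolved through its DataSourceRegister short name and that the Data Client Spark dependency is on the classpath; the short name "protobuf" and the option keys shown are assumptions for illustration, not the documented API.

```
// Illustrative sketch only: requires a Spark runtime plus the
// Data Client Spark library; the option keys are hypothetical.
SparkSession spark = SparkSession.builder()
        .appName("read-protobuf-layer")
        .getOrCreate();

// DataFrameReader.load triggers DataSource.lookupDataSource, which locates
// ProtobufFileFormat through the DataSourceRegister ServiceLoader mechanism.
Dataset<Row> df = spark.read()
        .format("protobuf")                 // assumed short name
        .option("hrn", "<catalog-hrn>")     // hypothetical option key
        .option("layer", "<layer-id>")      // hypothetical option key
        .load();
df.printSchema();
```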
-
Constructor Summary
Constructors -
Method Summary
scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> buildReader(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.types.StructType dataSchema, org.apache.spark.sql.types.StructType partitionSchema, org.apache.spark.sql.types.StructType requiredSchema, scala.collection.immutable.Seq<org.apache.spark.sql.sources.Filter> filters, scala.collection.immutable.Map<String, String> options, org.apache.hadoop.conf.Configuration hadoopConf)
deconstructURI(String uri)
static scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> enclose(scala.Function2<PartitionDescriptorHolder, scala.Option<String>, scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile, scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>> f, PartitionDescriptorHolder partitionDecoder, scala.Option<String> layerInfo)
boolean equals(Object obj)
static scala.Tuple2<PartitionDescriptorHolder,scala.Option<String>> getDescriptorHolderInfo(scala.collection.immutable.Map<String, String> options)
int hashCode()
scala.Option<org.apache.spark.sql.types.StructType> inferSchema(org.apache.spark.sql.SparkSession sparkSession, scala.collection.immutable.Map<String, String> options, scala.collection.immutable.Seq<org.apache.hadoop.fs.FileStatus> files)
With each call to org.apache.spark.sql.DataFrameReader.load, DataSource.lookupDataSource creates a new ServiceLoader[DataSourceRegister], which then lazily recreates instances of FileFormat.
org.apache.spark.sql.execution.datasources.OutputWriterFactory prepareWrite(org.apache.spark.sql.SparkSession sparkSession, org.apache.hadoop.mapreduce.Job job, scala.collection.immutable.Map<String, String> options, org.apache.spark.sql.types.StructType dataSchema)
static String schemaDirKey()
String shortName()
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.sql.execution.datasources.FileFormat
allowDuplicatedColumnNames, buildReaderWithPartitionValues, createFileMetadataCol, fileConstantMetadataExtractors, isSplitable, metadataSchemaFields, supportBatch, supportDataType, supportFieldName, vectorTypes
-
Constructor Details
-
ProtobufFileFormat
public ProtobufFileFormat()
-
-
Method Details
-
schemaDirKey
-
deconstructURI
-
getDescriptorHolderInfo
public static scala.Tuple2<PartitionDescriptorHolder,scala.Option<String>> getDescriptorHolderInfo(scala.collection.immutable.Map<String, String> options) -
enclose
public static scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> enclose(scala.Function2<PartitionDescriptorHolder, scala.Option<String>, scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile, scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>> f, PartitionDescriptorHolder partitionDecoder, scala.Option<String> layerInfo) -
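The shape of this binding can be sketched in plain Java with stand-in types. The DescriptorHolder and PartitionedFile records and the String payload below are simplifications for illustration, not the real Spark or Data Client classes: the sketch applies a two-argument reader builder once, so the returned per-file function captures only the descriptor holder and the layer info.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

public class EncloseSketch {
    // Stand-ins for PartitionDescriptorHolder and PartitionedFile.
    record DescriptorHolder(String schemaName) {}
    record PartitionedFile(String path) {}

    // Applies the two-argument builder once, returning a per-file reader
    // function that captures only the decoder and the layer info.
    static Function<PartitionedFile, Iterator<String>> enclose(
            BiFunction<DescriptorHolder, String,
                       Function<PartitionedFile, Iterator<String>>> f,
            DescriptorHolder decoder,
            String layerInfo) {
        return f.apply(decoder, layerInfo);
    }

    public static void main(String[] args) {
        Function<PartitionedFile, Iterator<String>> reader = enclose(
                (decoder, layer) -> file ->
                        List.of(decoder.schemaName() + ":" + layer
                                + ":" + file.path()).iterator(),
                new DescriptorHolder("my.schema"),
                "layer-1");
        // Each PartitionedFile can now be read without re-binding the decoder.
        System.out.println(reader.apply(new PartitionedFile("part-0")).next());
    }
}
```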
shortName
public String shortName()
- Specified by:
shortName in interface org.apache.spark.sql.sources.DataSourceRegister
-
inferSchema
public scala.Option<org.apache.spark.sql.types.StructType> inferSchema(org.apache.spark.sql.SparkSession sparkSession, scala.collection.immutable.Map<String, String> options, scala.collection.immutable.Seq<org.apache.hadoop.fs.FileStatus> files)
With each call to org.apache.spark.sql.DataFrameReader.load, DataSource.lookupDataSource creates a new ServiceLoader[DataSourceRegister], which then lazily recreates instances of FileFormat. In other words, with each call to org.apache.spark.sql.DataFrameReader.load a new ProtobufFileFormat is instantiated (and with it a new PartitionDecoder), allowing new schemas to be downloaded once again from the Artifact Service.
- Specified by:
inferSchema in interface org.apache.spark.sql.execution.datasources.FileFormat
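The practical consequence can be illustrated with a small stand-alone sketch. The PartitionDecoder class below is a simplified stand-in, not the real one: it caches its schema per instance, so reusing one instance downloads once, while a fresh instance, like the one created by each load call, downloads again.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SchemaCacheSketch {
    // Counts simulated Artifact Service downloads across all decoders.
    static final AtomicInteger downloads = new AtomicInteger();

    // Simplified stand-in for a PartitionDecoder with an instance-level cache.
    static class PartitionDecoder {
        private String schema;

        String schema() {
            if (schema == null) {            // downloaded at most once per instance
                downloads.incrementAndGet(); // simulate the Artifact Service call
                schema = "descriptor-bytes";
            }
            return schema;
        }
    }

    public static void main(String[] args) {
        PartitionDecoder first = new PartitionDecoder();
        first.schema();
        first.schema();                      // cached: no second download
        // A new load call re-instantiates the format (and with it a new
        // decoder), so the schema is fetched from the service again.
        new PartitionDecoder().schema();
        System.out.println(downloads.get()); // two downloads for three calls
    }
}
```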
-
hashCode
public int hashCode() -
equals
-
prepareWrite
public org.apache.spark.sql.execution.datasources.OutputWriterFactory prepareWrite(org.apache.spark.sql.SparkSession sparkSession, org.apache.hadoop.mapreduce.Job job, scala.collection.immutable.Map<String, String> options, org.apache.spark.sql.types.StructType dataSchema)
- Specified by:
prepareWrite in interface org.apache.spark.sql.execution.datasources.FileFormat
-
buildReader
public scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> buildReader(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.types.StructType dataSchema, org.apache.spark.sql.types.StructType partitionSchema, org.apache.spark.sql.types.StructType requiredSchema, scala.collection.immutable.Seq<org.apache.spark.sql.sources.Filter> filters, scala.collection.immutable.Map<String, String> options, org.apache.hadoop.conf.Configuration hadoopConf)
- Specified by:
buildReader in interface org.apache.spark.sql.execution.datasources.FileFormat
-