java.lang.Object
com.here.platform.data.client.spark.datasources.protobuf.ProtobufFileFormat
All Implemented Interfaces:
org.apache.spark.sql.execution.datasources.FileFormat, org.apache.spark.sql.sources.DataSourceRegister

public class ProtobufFileFormat extends Object implements org.apache.spark.sql.execution.datasources.FileFormat, org.apache.spark.sql.sources.DataSourceRegister
To read protobuf data from a layer, the schema must be specified in the layer configuration and must be available on the Artifact Service. Furthermore, the schema must have a ds variant.

For more information on how to maintain schemas, see: https://developer.here.com/documentation/archetypes/dev_guide/topics/archetypes-schema.html

  • Constructor Summary

    Constructors
    Constructor
    Description
    ProtobufFileFormat()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>
    buildReader(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.types.StructType dataSchema, org.apache.spark.sql.types.StructType partitionSchema, org.apache.spark.sql.types.StructType requiredSchema, scala.collection.immutable.Seq<org.apache.spark.sql.sources.Filter> filters, scala.collection.immutable.Map<String,String> options, org.apache.hadoop.conf.Configuration hadoopConf)
     
    static scala.Tuple2<HRN,String>
    deconstructURI(String uri)
     
    static scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>
    enclose(scala.Function2<PartitionDescriptorHolder,scala.Option<String>,scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>> f, PartitionDescriptorHolder partitionDecoder, scala.Option<String> layerInfo)
     
    boolean
    equals(Object other)
     
    static scala.Tuple2<PartitionDescriptorHolder,scala.Option<String>>
    getDescriptorHolderInfo(scala.collection.immutable.Map<String,String> options)
     
    int
    hashCode()
     
    scala.Option<org.apache.spark.sql.types.StructType>
    inferSchema(org.apache.spark.sql.SparkSession sparkSession, scala.collection.immutable.Map<String,String> options, scala.collection.immutable.Seq<org.apache.hadoop.fs.FileStatus> files)
    With each call to org.apache.spark.sql.DataFrameReader.load, DataSource.lookupDataSource creates a new ServiceLoader[DataSourceRegister] which then lazily recreates instances of FileFormat.
    org.apache.spark.sql.execution.datasources.OutputWriterFactory
    prepareWrite(org.apache.spark.sql.SparkSession sparkSession, org.apache.hadoop.mapreduce.Job job, scala.collection.immutable.Map<String,String> options, org.apache.spark.sql.types.StructType dataSchema)
     
    static String
    schemaDirKey()
     
    String
    shortName()
     

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.sql.execution.datasources.FileFormat

    allowDuplicatedColumnNames, buildReaderWithPartitionValues, createFileMetadataCol, fileConstantMetadataExtractors, isSplitable, metadataSchemaFields, supportBatch, supportDataType, supportFieldName, vectorTypes
  • Constructor Details

    • ProtobufFileFormat

      public ProtobufFileFormat()
  • Method Details

    • schemaDirKey

      public static String schemaDirKey()
    • deconstructURI

      public static scala.Tuple2<HRN,String> deconstructURI(String uri)
    • getDescriptorHolderInfo

      public static scala.Tuple2<PartitionDescriptorHolder,scala.Option<String>> getDescriptorHolderInfo(scala.collection.immutable.Map<String,String> options)
    • enclose

      public static scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> enclose(scala.Function2<PartitionDescriptorHolder,scala.Option<String>,scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>> f, PartitionDescriptorHolder partitionDecoder, scala.Option<String> layerInfo)
    • shortName

      public String shortName()
      Specified by:
      shortName in interface org.apache.spark.sql.sources.DataSourceRegister
    • inferSchema

      public scala.Option<org.apache.spark.sql.types.StructType> inferSchema(org.apache.spark.sql.SparkSession sparkSession, scala.collection.immutable.Map<String,String> options, scala.collection.immutable.Seq<org.apache.hadoop.fs.FileStatus> files)
      With each call to org.apache.spark.sql.DataFrameReader.load, DataSource.lookupDataSource creates a new ServiceLoader[DataSourceRegister], which then lazily recreates instances of FileFormat. In other words, each call to org.apache.spark.sql.DataFrameReader.load instantiates a new ProtobufFileFormat (and with it a new PartitionDecoder), allowing schemas to be downloaded again from the Artifact Service.
      Specified by:
      inferSchema in interface org.apache.spark.sql.execution.datasources.FileFormat
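
      The re-instantiation behaviour described above is standard java.util.ServiceLoader semantics: every call to ServiceLoader.load returns a fresh loader, and providers are only instantiated when the loader is iterated. A minimal sketch of that mechanism, using the JDK's FileSystemProvider SPI as an illustrative stand-in for Spark's DataSourceRegister (this is not Data Client code):

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.ServiceLoader;

public class ServiceLoaderDemo {
    // Each call mirrors one DataFrameReader.load: a brand-new ServiceLoader
    // whose providers are instantiated lazily, on first iteration.
    static ServiceLoader<FileSystemProvider> freshLoader() {
        return ServiceLoader.load(FileSystemProvider.class);
    }

    public static void main(String[] args) {
        ServiceLoader<FileSystemProvider> first = freshLoader();
        ServiceLoader<FileSystemProvider> second = freshLoader();
        // Two loads never share a loader instance, so state cached inside a
        // provider (for ProtobufFileFormat: downloaded schemas held by its
        // PartitionDecoder) is not carried over between loads.
        System.out.println(first != second);
    }
}
```

      This is why repeated DataFrameReader.load calls can pick up newly published schema versions: nothing from the previous ProtobufFileFormat instance survives into the next one.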
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
    • prepareWrite

      public org.apache.spark.sql.execution.datasources.OutputWriterFactory prepareWrite(org.apache.spark.sql.SparkSession sparkSession, org.apache.hadoop.mapreduce.Job job, scala.collection.immutable.Map<String,String> options, org.apache.spark.sql.types.StructType dataSchema)
      Specified by:
      prepareWrite in interface org.apache.spark.sql.execution.datasources.FileFormat
    • buildReader

      public scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> buildReader(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.types.StructType dataSchema, org.apache.spark.sql.types.StructType partitionSchema, org.apache.spark.sql.types.StructType requiredSchema, scala.collection.immutable.Seq<org.apache.spark.sql.sources.Filter> filters, scala.collection.immutable.Map<String,String> options, org.apache.hadoop.conf.Configuration hadoopConf)
      Specified by:
      buildReader in interface org.apache.spark.sql.execution.datasources.FileFormat