Hudi Spark SQL Call Procedures Study Notes (Part 1): Querying Table File Statistics


Preface

This post summarizes my study of Spark SQL Call Procedures (also known as Stored Procedures). They were contributed in Hudi 0.11.0 by ForwardXu from Tencent, and besides the few procedures mentioned on the official website, many more are supported. Here I cover a few commands that I find particularly useful, mainly for querying statistics about the various files under a table path.

Versions

Hudi master 0.13.0-SNAPSHOT
Spark 3.1.2 (in fact, every Spark version supported by Hudi supports Call Procedures)
Kyuubi 1.5.2 (Kyuubi is used because it shows column names in the results; Spark's built-in spark-sql does not return column names)

Argument Forms

Pass arguments by name, in any order; optional arguments can be omitted:

CALL system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1, ... arg_name_n => arg_n)

Pass arguments by position, in order; optional arguments can be omitted:

CALL system.procedure_name(arg_1, arg_2, ... arg_n)

Supported Procedures

All Procedures supported by a given version can be found in the class HoodieProcedures. Currently the following are supported:

show_fs_path_detail
show_bootstrap_partitions
repair_deduplicate
create_metadata_table
stats_file_sizes
validate_metadata_table_files
show_commit_partitions
show_commit_extra_metadata
show_table_properties
run_clustering
run_bootstrap
show_commit_files
run_clean
show_rollback_detail
rollback_to_savepoint
show_fsview_all
show_compaction
copy_to_temp_view
show_invalid_parquet
delete_savepoint
show_bootstrap_mapping
show_archived_commits
show_fsview_latest
show_metadata_table_files
export_instants
show_commits_metadata
rollback_to_instant
delete_metadata_table
delete_marker
show_metadata_table_stats
sync_validate
copy_to_table
show_savepoints
init_metadata_table
repair_overwrite_hoodie_props
show_metadata_table_partitions
show_logfile_records
downgrade_table
show_clustering
repair_migrate_partition_meta
show_rollbacks
show_logfile_metadata
upgrade_table
repair_add_partition_meta
hive_sync
commits_compare
hdfs_parquet_import
show_commit_write_stats
show_commits
show_archived_commits_metadata
run_compaction
create_savepoint
repair_corrupted_clean_files
stats_wa

Their definitions:

  private def initProcedureBuilders: Map[String, Supplier[ProcedureBuilder]] = {
    Map((RunCompactionProcedure.NAME, RunCompactionProcedure.builder)
      ,(ShowCompactionProcedure.NAME, ShowCompactionProcedure.builder)
      ,(CreateSavepointProcedure.NAME, CreateSavepointProcedure.builder)
      ,(DeleteSavepointProcedure.NAME, DeleteSavepointProcedure.builder)
      ,(RollbackToSavepointProcedure.NAME, RollbackToSavepointProcedure.builder)
      ,(RollbackToInstantTimeProcedure.NAME, RollbackToInstantTimeProcedure.builder)
      ,(RunClusteringProcedure.NAME, RunClusteringProcedure.builder)
      ,(ShowClusteringProcedure.NAME, ShowClusteringProcedure.builder)
      ,(ShowCommitsProcedure.NAME, ShowCommitsProcedure.builder)
      ,(ShowCommitsMetadataProcedure.NAME, ShowCommitsMetadataProcedure.builder)
      ,(ShowArchivedCommitsProcedure.NAME, ShowArchivedCommitsProcedure.builder)
      ,(ShowArchivedCommitsMetadataProcedure.NAME, ShowArchivedCommitsMetadataProcedure.builder)
      ,(ShowCommitFilesProcedure.NAME, ShowCommitFilesProcedure.builder)
      ,(ShowCommitPartitionsProcedure.NAME, ShowCommitPartitionsProcedure.builder)
      ,(ShowCommitWriteStatsProcedure.NAME, ShowCommitWriteStatsProcedure.builder)
      ,(CommitsCompareProcedure.NAME, CommitsCompareProcedure.builder)
      ,(ShowSavepointsProcedure.NAME, ShowSavepointsProcedure.builder)
      ,(DeleteMarkerProcedure.NAME, DeleteMarkerProcedure.builder)
      ,(ShowRollbacksProcedure.NAME, ShowRollbacksProcedure.builder)
      ,(ShowRollbackDetailProcedure.NAME, ShowRollbackDetailProcedure.builder)
      ,(ExportInstantsProcedure.NAME, ExportInstantsProcedure.builder)
      ,(ShowAllFileSystemViewProcedure.NAME, ShowAllFileSystemViewProcedure.builder)
      ,(ShowLatestFileSystemViewProcedure.NAME, ShowLatestFileSystemViewProcedure.builder)
      ,(ShowHoodieLogFileMetadataProcedure.NAME, ShowHoodieLogFileMetadataProcedure.builder)
      ,(ShowHoodieLogFileRecordsProcedure.NAME, ShowHoodieLogFileRecordsProcedure.builder)
      ,(StatsWriteAmplificationProcedure.NAME, StatsWriteAmplificationProcedure.builder)
      ,(StatsFileSizeProcedure.NAME, StatsFileSizeProcedure.builder)
      ,(HdfsParquetImportProcedure.NAME, HdfsParquetImportProcedure.builder)
      ,(RunBootstrapProcedure.NAME, RunBootstrapProcedure.builder)
      ,(ShowBootstrapMappingProcedure.NAME, ShowBootstrapMappingProcedure.builder)
      ,(ShowBootstrapPartitionsProcedure.NAME, ShowBootstrapPartitionsProcedure.builder)
      ,(UpgradeTableProcedure.NAME, UpgradeTableProcedure.builder)
      ,(DowngradeTableProcedure.NAME, DowngradeTableProcedure.builder)
      ,(ShowMetadataTableFilesProcedure.NAME, ShowMetadataTableFilesProcedure.builder)
      ,(ShowMetadataTablePartitionsProcedure.NAME, ShowMetadataTablePartitionsProcedure.builder)
      ,(CreateMetadataTableProcedure.NAME, CreateMetadataTableProcedure.builder)
      ,(DeleteMetadataTableProcedure.NAME, DeleteMetadataTableProcedure.builder)
      ,(InitMetadataTableProcedure.NAME, InitMetadataTableProcedure.builder)
      ,(ShowMetadataTableStatsProcedure.NAME, ShowMetadataTableStatsProcedure.builder)
      ,(ValidateMetadataTableFilesProcedure.NAME, ValidateMetadataTableFilesProcedure.builder)
      ,(ShowFsPathDetailProcedure.NAME, ShowFsPathDetailProcedure.builder)
      ,(CopyToTableProcedure.NAME, CopyToTableProcedure.builder)
      ,(RepairAddpartitionmetaProcedure.NAME, RepairAddpartitionmetaProcedure.builder)
      ,(RepairCorruptedCleanFilesProcedure.NAME, RepairCorruptedCleanFilesProcedure.builder)
      ,(RepairDeduplicateProcedure.NAME, RepairDeduplicateProcedure.builder)
      ,(RepairMigratePartitionMetaProcedure.NAME, RepairMigratePartitionMetaProcedure.builder)
      ,(RepairOverwriteHoodiePropsProcedure.NAME, RepairOverwriteHoodiePropsProcedure.builder)
      ,(RunCleanProcedure.NAME, RunCleanProcedure.builder)
      ,(ValidateHoodieSyncProcedure.NAME, ValidateHoodieSyncProcedure.builder)
      ,(ShowInvalidParquetProcedure.NAME, ShowInvalidParquetProcedure.builder)
      ,(HiveSyncProcedure.NAME, HiveSyncProcedure.builder)
      ,(CopyToTempView.NAME, CopyToTempView.builder)
      ,(ShowCommitExtraMetadataProcedure.NAME, ShowCommitExtraMetadataProcedure.builder)
      ,(ShowTablePropertiesProcedure.NAME, ShowTablePropertiesProcedure.builder)
    )
  }

Printing them:

initProcedureBuilders.keySet.foreach(println)

Create a Table and Insert Data

create table test_hudi_call_cow (
  id int,
  name string,
  price double,
  ts long,
  dt string
) using hudi
partitioned by (dt)
options (
  primaryKey = 'id',
  preCombineField = 'ts',
  type = 'cow'
);
insert into test_hudi_call_cow values (1,'hudi',10,100,'2021-05-05');
insert into test_hudi_call_cow values (2,'hudi',10,100,'2021-05-05');
insert into test_hudi_call_cow values (3,'hudi',10,100,'2021-05-05');
insert into test_hudi_call_cow values (4,'hudi',10,100,'2021-05-05');

show_table_properties

Shows the table's properties, returning the table configuration from hoodie.properties as key/value pairs.

Parameters

  • table: table name
  • path: table path
  • limit: optional, default 10

One of table and path is required. table takes precedence over path: if both are specified, table wins and path is ignored.
Output columns: key, value

Example

call show_table_properties(table => 'test_hudi_call_cow');
call show_table_properties(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow');

By default the first 10 entries are shown:

+--------------------------------------------------+----------------+
|                       key                        |     value      |
+--------------------------------------------------+----------------+
| hoodie.table.precombine.field                    | ts             |
| hoodie.datasource.write.drop.partition.columns   | false          |
| hoodie.table.partition.fields                    | dt             |
| hoodie.table.type                                | COPY_ON_WRITE  |
| hoodie.archivelog.folder                         | archived       |
| hoodie.timeline.layout.version                   | 1              |
| hoodie.table.version                             | 5              |
| hoodie.table.recordkey.fields                    | id             |
| hoodie.table.metadata.partitions                 | files          |
| hoodie.datasource.write.partitionpath.urlencode  | false          |
+--------------------------------------------------+----------------+

You can set limit to a larger value to see all of the configuration:

call show_table_properties(table => 'test_hudi_call_cow', limit => 100);
+--------------------------------------------------+----------------------------------------------------+
|                       key                        |                       value                        |
+--------------------------------------------------+----------------------------------------------------+
| hoodie.table.precombine.field                    | ts                                                 |
| hoodie.datasource.write.drop.partition.columns   | false                                              |
| hoodie.table.partition.fields                    | dt                                                 |
| hoodie.table.type                                | COPY_ON_WRITE                                      |
| hoodie.archivelog.folder                         | archived                                           |
| hoodie.timeline.layout.version                   | 1                                                  |
| hoodie.table.version                             | 5                                                  |
| hoodie.table.recordkey.fields                    | id                                                 |
| hoodie.table.metadata.partitions                 | files                                              |
| hoodie.datasource.write.partitionpath.urlencode  | false                                              |
| hoodie.database.name                             | hudi                                               |
| hoodie.table.name                                | test_hudi_call_cow                                 |
| hoodie.table.keygenerator.class                  | org.apache.hudi.keygen.SimpleKeyGenerator          |
| hoodie.datasource.write.hive_style_partitioning  | true                                               |
| hoodie.table.create.schema                       | {"type":"record","name":"test_hudi_call_cow_record","namespace":"hoodie.test_hudi_call_cow","fields":[{"name":"_hoodie_commit_time","type":["string","null"]},{"name":"_hoodie_commit_seqno","type":["string","null"]},{"name":"_hoodie_record_key","type":["string","null"]},{"name":"_hoodie_partition_path","type":["string","null"]},{"name":"_hoodie_file_name","type":["string","null"]},{"name":"id","type":["int","null"]},{"name":"name","type":["string","null"]},{"name":"price","type":["double","null"]},{"name":"ts","type":["long","null"]},{"name":"dt","type":["string","null"]}]} |
| hoodie.table.checksum                            | 2721425243                                         |
| hoodie.allow.operation.metadata.field            | false                                              |
+--------------------------------------------------+----------------------------------------------------+

show_commits

Parameters

  • table: table name, required
  • limit: default 10, optional

Output columns: commit_time, action, total_bytes_written, total_files_added, total_files_updated, total_partitions_written, total_records_written, total_update_records_written, total_errors

Example

call show_commits(table => 'test_hudi_call_cow');
call show_commits(table => 'test_hudi_call_cow', limit => 1);
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+
|    commit_time     | action  | total_bytes_written  | total_files_added  | total_files_updated  | total_partitions_written  | total_records_written  | total_update_records_written  | total_errors  |
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+
| 20221123205701931  | commit  | 435308               | 0                  | 1                    | 1                         | 4                      | 0                             | 0             |
| 20221123205650038  | commit  | 435279               | 0                  | 1                    | 1                         | 3                      | 0                             | 0             |
| 20221123205636715  | commit  | 435246               | 0                  | 1                    | 1                         | 2                      | 0                             | 0             |
| 20221123205546254  | commit  | 435148               | 1                  | 0                    | 1                         | 1                      | 0                             | 0             |
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+

show_commits_metadata

Similar to show_commits, but with different output columns. Both are implemented by ShowCommitsProcedure; the difference is that show_commits_metadata sets includeExtraMetadata to true while show_commits sets it to false.

Parameters

  • table: table name, required
  • limit: default 10, optional

Output columns: commit_time, action, partition, file_id, previous_commit, num_writes, num_inserts, num_deletes, num_update_writes, total_errors, total_log_blocks, total_corrupt_log_blocks, total_rollback_blocks, total_log_records, total_updated_records_compacted, total_bytes_written

Example

call show_commits_metadata(table => 'test_hudi_call_cow');
call show_commits_metadata(table => 'test_hudi_call_cow', limit => 1);
+--------------------+---------+----------------+-----------------------------------------+--------------------+-------------+--------------+--------------+--------------------+---------------+-------------------+---------------------------+------------------------+--------------------+----------------------------------+----------------------+
|    commit_time     | action  |   partition    |                 file_id                 |  previous_commit   | num_writes  | num_inserts  | num_deletes  | num_update_writes  | total_errors  | total_log_blocks  | total_corrupt_log_blocks  | total_rollback_blocks  | total_log_records  | total_updated_records_compacted  | total_bytes_written  |
+--------------------+---------+----------------+-----------------------------------------+--------------------+-------------+--------------+--------------+--------------------+---------------+-------------------+---------------------------+------------------------+--------------------+----------------------------------+----------------------+
| 20221123205701931  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205650038  | 4           | 1            | 0            | 0                  | 0             | 0                 | 0                         | 0                      | 0                  | 0                                | 435308               |
| 20221123205650038  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205636715  | 3           | 1            | 0            | 0                  | 0             | 0                 | 0                         | 0                      | 0                  | 0                                | 435279               |
| 20221123205636715  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205546254  | 2           | 1            | 0            | 0                  | 0             | 0                 | 0                         | 0                      | 0                  | 0                                | 435246               |
| 20221123205546254  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | null               | 1           | 1            | 0            | 0                  | 0             | 0                 | 0                         | 0                      | 0                  | 0                                | 435148               |
+--------------------+---------+----------------+-----------------------------------------+--------------------+-------------+--------------+--------------+--------------------+---------------+-------------------+---------------------------+------------------------+--------------------+----------------------------------+----------------------+

show_commit_files

Returns the file information (such as fileId) for the given instantTime.

Parameters

  • table: table name, required
  • instant_time: required
  • limit: default 10, optional

Output columns: action, partition_path, file_id, previous_commit, total_records_updated, total_records_written, total_bytes_written, total_errors, file_size

Example

call show_commit_files(table => 'test_hudi_call_cow', instant_time => '20221123205701931');

Because the test data set is small and there is only one partition, there is only one file:

+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+
| action  | partition_path  |                 file_id                 |  previous_commit   | total_records_updated  | total_records_written  | total_bytes_written  | total_errors  | file_size  |
+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+
| commit  | dt=2021-05-05   | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205650038  | 0                      | 4                      | 435308               | 0             | 435308     |
+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+

Run the following SQL so that one commit involves two files:

merge into test_hudi_call_cow as t0
using (
  select 5 as id, 'hudi' as name, 112 as price, 98 as ts, '2022-11-23' as dt, 'INSERT' as opt_type union
  select 2 as id, 'hudi_2' as name, 10 as price, 100 as ts, '2021-05-05' as dt, 'UPDATE' as opt_type union
  select 4 as id, 'hudi' as name, 10 as price, 100 as ts, '2021-05-05' as dt, 'DELETE' as opt_type
) as s0
on t0.id = s0.id
when matched and opt_type != 'DELETE' then update set *
when matched and opt_type = 'DELETE' then delete
when not matched and opt_type != 'DELETE' then insert *;

First use show_commits to see that the latest commit_time is 20221123232449644:

call show_commits(table => 'test_hudi_call_cow', limit => 1);
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+
|    commit_time     | action  | total_bytes_written  | total_files_added  | total_files_updated  | total_partitions_written  | total_records_written  | total_update_records_written  | total_errors  |
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+
| 20221123232449644  | commit  | 870474               | 1                  | 1                    | 2                         | 4                      | 1                             | 0             |
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+

Then use show_commit_files to look at the files for 20221123232449644:

call show_commit_files(table => 'test_hudi_call_cow', instant_time => '20221123232449644');
+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+
| action  | partition_path  |                 file_id                 |  previous_commit   | total_records_updated  | total_records_written  | total_bytes_written  | total_errors  | file_size  |
+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+
| commit  | dt=2022-11-23   | 8f2aecfd-198f-405b-ab5d-46e0cc997d97-0  | null               | 0                      | 1                      | 435176               | 0             | 435176     |
| commit  | dt=2021-05-05   | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123231230786  | 1                      | 3                      | 435298               | 0             | 435298     |
+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+

show_commit_partitions

Returns, for the given instantTime, the file and record information of each partition involved.

Parameters

  • table: table name, required
  • instant_time: required
  • limit: default 10, optional

Output columns: action, partition_path, total_files_added, total_files_updated, total_records_inserted, total_records_updated, total_bytes_written, total_errors

Example

call show_commit_partitions(table => 'test_hudi_call_cow', instant_time => '20221123232449644');
+---------+-----------------+--------------------+----------------------+-------------------------+------------------------+----------------------+---------------+
| action  | partition_path  | total_files_added  | total_files_updated  | total_records_inserted  | total_records_updated  | total_bytes_written  | total_errors  |
+---------+-----------------+--------------------+----------------------+-------------------------+------------------------+----------------------+---------------+
| commit  | dt=2022-11-23   | 1                  | 0                    | 1                       | 0                      | 435176               | 0             |
| commit  | dt=2021-05-05   | 0                  | 1                    | 0                       | 1                      | 435298               | 0             |
+---------+-----------------+--------------------+----------------------+-------------------------+------------------------+----------------------+---------------+

show_commit_write_stats

Returns the write stats for the given instantTime.

Parameters

  • table: table name, required
  • instant_time: required
  • limit: default 10, optional

Output columns: action, total_bytes_written, total_records_written, avg_record_size

Example

call show_commit_write_stats(table => 'test_hudi_call_cow', instant_time => '20221123232449644');
+---------+----------------------+------------------------+------------------+
| action  | total_bytes_written  | total_records_written  | avg_record_size  |
+---------+----------------------+------------------------+------------------+
| commit  | 870474               | 4                      | 217619           |
+---------+----------------------+------------------------+------------------+
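avg_record_size is simply total_bytes_written divided by total_records_written. A quick sanity check against the output above (a sketch of mine, not Hudi's code; the half-up rounding mode is an assumption, chosen because it reproduces the displayed value):

```python
import math


def avg_record_size(total_bytes_written: int, total_records_written: int) -> int:
    """Average record size in bytes, rounded half-up (assumed rounding mode)."""
    return math.floor(total_bytes_written / total_records_written + 0.5)


# Values from the show_commit_write_stats output above: 870474 bytes over 4 records.
print(avg_record_size(870474, 4))  # 217619
```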

show_commit_extra_metadata

Returns the extraMetadata stored in .commit, .deltacommit, and .replacecommit files.

Parameters

  • table: table name, required
  • instant_time: optional
  • limit: default 100, optional
  • metadata_key: optional, e.g. schema

Output columns: instant_time, action, metadata_key, metadata_value

By default it returns the extraMetadata from the last commit file; if instant_time is specified, it returns the extraMetadata from that instant's commit file.

First, look at the contents of the .commit file. It contains an extraMetadata section with a schema key and its corresponding value:

hadoop fs -cat hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/.hoodie/20221123232449644.commit
{
  "partitionToWriteStats" : {
    "dt=2022-11-23" : [ {
      "fileId" : "8f2aecfd-198f-405b-ab5d-46e0cc997d97-0",
      "path" : "dt=2022-11-23/8f2aecfd-198f-405b-ab5d-46e0cc997d97-0_1-238-2983_20221123232449644.parquet",
      "cdcStats" : null,
      "prevCommit" : "null",
      "numWrites" : 1,
      "numDeletes" : 0,
      "numUpdateWrites" : 0,
      "numInserts" : 1,
      "totalWriteBytes" : 435176,
      "totalWriteErrors" : 0,
      "tempPath" : null,
      "partitionPath" : "dt=2022-11-23",
      "totalLogRecords" : 0,
      "totalLogFilesCompacted" : 0,
      "totalLogSizeCompacted" : 0,
      "totalUpdatedRecordsCompacted" : 0,
      "totalLogBlocks" : 0,
      "totalCorruptLogBlock" : 0,
      "totalRollbackBlocks" : 0,
      "fileSizeInBytes" : 435176,
      "minEventTime" : null,
      "maxEventTime" : null,
      "runtimeStats" : {
        "totalScanTime" : 0,
        "totalUpsertTime" : 0,
        "totalCreateTime" : 1013
      }
    } ],
    "dt=2021-05-05" : [ {
      "fileId" : "35b07424-6e63-4b65-9182-7c37cbe756b1-0",
      "path" : "dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-238-2982_20221123232449644.parquet",
      "cdcStats" : null,
      "prevCommit" : "20221123231230786",
      "numWrites" : 3,
      "numDeletes" : 1,
      "numUpdateWrites" : 1,
      "numInserts" : 0,
      "totalWriteBytes" : 435298,
      "totalWriteErrors" : 0,
      "tempPath" : null,
      "partitionPath" : "dt=2021-05-05",
      "totalLogRecords" : 0,
      "totalLogFilesCompacted" : 0,
      "totalLogSizeCompacted" : 0,
      "totalUpdatedRecordsCompacted" : 0,
      "totalLogBlocks" : 0,
      "totalCorruptLogBlock" : 0,
      "totalRollbackBlocks" : 0,
      "fileSizeInBytes" : 435298,
      "minEventTime" : null,
      "maxEventTime" : null,
      "runtimeStats" : {
        "totalScanTime" : 0,
        "totalUpsertTime" : 4162,
        "totalCreateTime" : 0
      }
    } ]
  },
  "compacted" : false,
  "extraMetadata" : {
    "schema" : "{\"type\":\"record\",\"name\":\"test_hudi_call_cow_record\",\"namespace\":\"hoodie.test_hudi_call_cow\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"price\",\"type\":\"double\"},{\"name\":\"ts\",\"type\":\"long\"},{\"name\":\"dt\",\"type\":\"string\"}]}"
  },
  "operationType" : "UPSERT"
}
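A commit file is plain JSON, so the same extraMetadata section can be pulled out with any JSON parser. A minimal sketch (the inline sample is a trimmed-down imitation of the structure above; in practice you would read the .commit file from the table's .hoodie directory):

```python
import json

# Trimmed-down sample of a .commit file (structure as shown above).
commit_json = """
{
  "partitionToWriteStats": {},
  "compacted": false,
  "extraMetadata": {
    "schema": "{\\"type\\":\\"record\\",\\"name\\":\\"test_hudi_call_cow_record\\"}"
  },
  "operationType": "UPSERT"
}
"""

commit = json.loads(commit_json)
# Print each extraMetadata key and its value (here, the Avro schema string).
for key, value in commit.get("extraMetadata", {}).items():
    print(key, "->", value)
```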

Example

call show_commit_extra_metadata(table => 'test_hudi_call_cow');
+--------------------+---------+---------------+----------------------------------------------------+
|    instant_time    | action  | metadata_key  |                   metadata_value                   |
+--------------------+---------+---------------+----------------------------------------------------+
| 20221123232449644  | commit  | schema        | {"type":"record","name":"test_hudi_call_cow_record","namespace":"hoodie.test_hudi_call_cow","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"},{"name":"price","type":"double"},{"name":"ts","type":"long"},{"name":"dt","type":"string"}]} |
+--------------------+---------+---------------+----------------------------------------------------+
call show_commit_extra_metadata(table => 'test_hudi_call_cow', instant_time => '20221123205701931', metadata_key => 'schema', limit => 10);
+--------------------+---------+---------------+----------------------------------------------------+
|    instant_time    | action  | metadata_key  |                   metadata_value                   |
+--------------------+---------+---------------+----------------------------------------------------+
| 20221123205701931  | commit  | schema        | {"type":"record","name":"test_hudi_call_cow_record","namespace":"hoodie.test_hudi_call_cow","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"},{"name":"price","type":"double"},{"name":"ts","type":"long"},{"name":"dt","type":"string"}]} |
+--------------------+---------+---------------+----------------------------------------------------+

The only extraMetadata key I have seen so far is schema, and there is only one entry, so a single record is returned and specifying metadata_key or not makes no difference. I am not sure whether other extraMetadata keys exist.

show_fs_path_detail

Shows statistics for the files and directories under the given path, sorted by file size by default.

Parameters

  • path: table file path, required
  • is_sub: optional, whether to list sub-directories (one level down only), default false
  • sort: optional, whether to sort by file size, default true

Output columns: path_num, file_num, storage_size, storage_size(unit), storage_path, space_consumed, quota, space_quota

Example

call show_fs_path_detail(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow');
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| path_num  | file_num  | storage_size  | storage_size(unit)  |                    storage_path                    | space_consumed  |  quota   | space_quota  |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| 22        | 53        | 3200109       | 3.05MB              | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow | -1              | 9600327  | -1           |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+

Querying first-level sub-directories:

call show_fs_path_detail(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow', is_sub => true);
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| path_num  | file_num  | storage_size  | storage_size(unit)  |                    storage_path                    | space_consumed  |  quota   | space_quota  |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| 1         | 7         | 2611728       | 2.49MB              | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05 | -1              | 7835184  | -1           |
| 1         | 2         | 435272        | 425.07KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2022-11-23 | -1              | 1305816  | -1           |
| 19        | 44        | 153109        | 149.52KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/.hoodie | -1              | 459327   | -1           |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+

Querying a second-level sub-directory:

call show_fs_path_detail(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05', is_sub => true);
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| path_num  | file_num  | storage_size  | storage_size(unit)  |                    storage_path                    | space_consumed  |  quota   | space_quota  |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| 0         | 1         | 435353        | 425.15KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-192-1549_20221123231230786.parquet | -1              | 1306059  | -1           |
| 0         | 1         | 435308        | 425.11KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-147-118_20221123205701931.parquet | -1              | 1305924  | -1           |
| 0         | 1         | 435298        | 425.10KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-238-2982_20221123232449644.parquet | -1              | 1305894  | -1           |
| 0         | 1         | 435279        | 425.08KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-105-83_20221123205650038.parquet | -1              | 1305837  | -1           |
| 0         | 1         | 435246        | 425.04KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-66-52_20221123205636715.parquet | -1              | 1305738  | -1           |
| 0         | 1         | 435148        | 424.95KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-27-21_20221123205546254.parquet | -1              | 1305444  | -1           |
| 0         | 1         | 96            | 96B                 | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/.hoodie_partition_metadata | -1              | 288      | -1           |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
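The storage_size(unit) column is just the raw byte count rendered in a human-readable unit. A sketch of that conversion (my own formatting helper, not Hudi's code; rounding to two decimals is an assumption that matches the figures above, e.g. 3200109 bytes shown as 3.05MB and 96 bytes as 96B):

```python
def human_readable(size_bytes: int) -> str:
    """Render a byte count the way the storage_size(unit) column displays it."""
    size = float(size_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            # Whole bytes are shown without decimals, larger units with two.
            return f"{size_bytes}B" if unit == "B" else f"{size:.2f}{unit}"
        size /= 1024


print(human_readable(3200109))  # 3.05MB
print(human_readable(96))       # 96B
```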
