Preface
This post summarizes Spark SQL Call Procedures in Hudi. Call Procedures (also known as Stored Procedures) were contributed in Hudi 0.11.0 by ForwardXu from Tencent. Besides the handful of procedures mentioned on the official site, many more procedure commands are supported. Here I go through a few that I find most useful, mainly for querying statistics about the files under a table path.
Versions
- Hudi master 0.13.0-SNAPSHOT
- Spark 3.1.2 (in fact every Spark version supported by Hudi supports Call Procedures)
- Kyuubi 1.5.2 (Kyuubi is used because its results include column names; Spark's built-in spark-sql does not print column names)
Parameter forms
Pass arguments by name; order does not matter, and optional arguments may be omitted:
CALL system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1, ... arg_name_n => arg_n)
Pass arguments by position; order matters, and trailing optional arguments may be omitted:
CALL system.procedure_name(arg_1, arg_2, ... arg_n)
Supported procedure commands
All procedures supported by a given version can be found in the class HoodieProcedures. Currently they are:
show_fs_path_detail, show_bootstrap_partitions, repair_deduplicate, create_metadata_table, stats_file_sizes, validate_metadata_table_files,
show_commit_partitions, show_commit_extra_metadata, show_table_properties, run_clustering, run_bootstrap, show_commit_files,
run_clean, show_rollback_detail, rollback_to_savepoint, show_fsview_all, show_compaction, copy_to_temp_view,
show_invalid_parquet, delete_savepoint, show_bootstrap_mapping, show_archived_commits, show_fsview_latest, show_metadata_table_files,
export_instants, show_commits_metadata, rollback_to_instant, delete_metadata_table, delete_marker, show_metadata_table_stats,
sync_validate, copy_to_table, show_savepoints, init_metadata_table, repair_overwrite_hoodie_props, show_metadata_table_partitions,
show_logfile_records, downgrade_table, show_clustering, repair_migrate_partition_meta, show_rollbacks, show_logfile_metadata,
upgrade_table, repair_add_partition_meta, hive_sync, commits_compare, hdfs_parquet_import, show_commit_write_stats,
show_commits, show_archived_commits_metadata, run_compaction, create_savepoint, repair_corrupted_clean_files, stats_wa
Their definitions:
private def initProcedureBuilders: Map[String, Supplier[ProcedureBuilder]] = {
  Map(
    (RunCompactionProcedure.NAME, RunCompactionProcedure.builder),
    (ShowCompactionProcedure.NAME, ShowCompactionProcedure.builder),
    (CreateSavepointProcedure.NAME, CreateSavepointProcedure.builder),
    (DeleteSavepointProcedure.NAME, DeleteSavepointProcedure.builder),
    (RollbackToSavepointProcedure.NAME, RollbackToSavepointProcedure.builder),
    (RollbackToInstantTimeProcedure.NAME, RollbackToInstantTimeProcedure.builder),
    (RunClusteringProcedure.NAME, RunClusteringProcedure.builder),
    (ShowClusteringProcedure.NAME, ShowClusteringProcedure.builder),
    (ShowCommitsProcedure.NAME, ShowCommitsProcedure.builder),
    (ShowCommitsMetadataProcedure.NAME, ShowCommitsMetadataProcedure.builder),
    (ShowArchivedCommitsProcedure.NAME, ShowArchivedCommitsProcedure.builder),
    (ShowArchivedCommitsMetadataProcedure.NAME, ShowArchivedCommitsMetadataProcedure.builder),
    (ShowCommitFilesProcedure.NAME, ShowCommitFilesProcedure.builder),
    (ShowCommitPartitionsProcedure.NAME, ShowCommitPartitionsProcedure.builder),
    (ShowCommitWriteStatsProcedure.NAME, ShowCommitWriteStatsProcedure.builder),
    (CommitsCompareProcedure.NAME, CommitsCompareProcedure.builder),
    (ShowSavepointsProcedure.NAME, ShowSavepointsProcedure.builder),
    (DeleteMarkerProcedure.NAME, DeleteMarkerProcedure.builder),
    (ShowRollbacksProcedure.NAME, ShowRollbacksProcedure.builder),
    (ShowRollbackDetailProcedure.NAME, ShowRollbackDetailProcedure.builder),
    (ExportInstantsProcedure.NAME, ExportInstantsProcedure.builder),
    (ShowAllFileSystemViewProcedure.NAME, ShowAllFileSystemViewProcedure.builder),
    (ShowLatestFileSystemViewProcedure.NAME, ShowLatestFileSystemViewProcedure.builder),
    (ShowHoodieLogFileMetadataProcedure.NAME, ShowHoodieLogFileMetadataProcedure.builder),
    (ShowHoodieLogFileRecordsProcedure.NAME, ShowHoodieLogFileRecordsProcedure.builder),
    (StatsWriteAmplificationProcedure.NAME, StatsWriteAmplificationProcedure.builder),
    (StatsFileSizeProcedure.NAME, StatsFileSizeProcedure.builder),
    (HdfsParquetImportProcedure.NAME, HdfsParquetImportProcedure.builder),
    (RunBootstrapProcedure.NAME, RunBootstrapProcedure.builder),
    (ShowBootstrapMappingProcedure.NAME, ShowBootstrapMappingProcedure.builder),
    (ShowBootstrapPartitionsProcedure.NAME, ShowBootstrapPartitionsProcedure.builder),
    (UpgradeTableProcedure.NAME, UpgradeTableProcedure.builder),
    (DowngradeTableProcedure.NAME, DowngradeTableProcedure.builder),
    (ShowMetadataTableFilesProcedure.NAME, ShowMetadataTableFilesProcedure.builder),
    (ShowMetadataTablePartitionsProcedure.NAME, ShowMetadataTablePartitionsProcedure.builder),
    (CreateMetadataTableProcedure.NAME, CreateMetadataTableProcedure.builder),
    (DeleteMetadataTableProcedure.NAME, DeleteMetadataTableProcedure.builder),
    (InitMetadataTableProcedure.NAME, InitMetadataTableProcedure.builder),
    (ShowMetadataTableStatsProcedure.NAME, ShowMetadataTableStatsProcedure.builder),
    (ValidateMetadataTableFilesProcedure.NAME, ValidateMetadataTableFilesProcedure.builder),
    (ShowFsPathDetailProcedure.NAME, ShowFsPathDetailProcedure.builder),
    (CopyToTableProcedure.NAME, CopyToTableProcedure.builder),
    (RepairAddpartitionmetaProcedure.NAME, RepairAddpartitionmetaProcedure.builder),
    (RepairCorruptedCleanFilesProcedure.NAME, RepairCorruptedCleanFilesProcedure.builder),
    (RepairDeduplicateProcedure.NAME, RepairDeduplicateProcedure.builder),
    (RepairMigratePartitionMetaProcedure.NAME, RepairMigratePartitionMetaProcedure.builder),
    (RepairOverwriteHoodiePropsProcedure.NAME, RepairOverwriteHoodiePropsProcedure.builder),
    (RunCleanProcedure.NAME, RunCleanProcedure.builder),
    (ValidateHoodieSyncProcedure.NAME, ValidateHoodieSyncProcedure.builder),
    (ShowInvalidParquetProcedure.NAME, ShowInvalidParquetProcedure.builder),
    (HiveSyncProcedure.NAME, HiveSyncProcedure.builder),
    (CopyToTempView.NAME, CopyToTempView.builder),
    (ShowCommitExtraMetadataProcedure.NAME, ShowCommitExtraMetadataProcedure.builder),
    (ShowTablePropertiesProcedure.NAME, ShowTablePropertiesProcedure.builder)
  )
}
They can be printed with:
initProcedureBuilders.keySet.foreach(println)
Create a table and insert test data
create table test_hudi_call_cow (
  id int,
  name string,
  price double,
  ts long,
  dt string
) using hudi
partitioned by (dt)
options (
  primaryKey = 'id',
  preCombineField = 'ts',
  type = 'cow'
);
insert into test_hudi_call_cow values (1,'hudi',10,100,'2021-05-05');
insert into test_hudi_call_cow values (2,'hudi',10,100,'2021-05-05');
insert into test_hudi_call_cow values (3,'hudi',10,100,'2021-05-05');
insert into test_hudi_call_cow values (4,'hudi',10,100,'2021-05-05');
show_table_properties
Shows the table's properties: the table configuration stored in hoodie.properties, returned as key/value pairs.
Parameters
- table: table name
- path: table path
- limit: optional, default 10
At least one of table and path is required. table takes precedence over path: if both are given, table wins and path is ignored.
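The precedence just described can be sketched as follows (a hypothetical helper for illustration, not Hudi's actual code):

```python
def resolve_target(table=None, path=None):
    # Hypothetical sketch of the documented precedence:
    # at least one of table/path is required, and table wins when both are set.
    if table is None and path is None:
        raise ValueError("either 'table' or 'path' must be provided")
    return ("table", table) if table is not None else ("path", path)

# path is ignored because table is also given
print(resolve_target(table="test_hudi_call_cow", path="hdfs://cluster1/..."))
```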
Output fields:
key, value
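Under the hood this amounts to reading the Java-properties file .hoodie/hoodie.properties under the table path. A minimal sketch of that parsing (assuming plain key=value lines; the real properties format also supports escapes and continuations):

```python
def parse_properties(text: str) -> dict:
    # Parse simple key=value lines, skipping blanks and '#' comments.
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# Illustrative snippet of a hoodie.properties file
sample = """\
#Properties saved on ...
hoodie.table.name=test_hudi_call_cow
hoodie.table.type=COPY_ON_WRITE
hoodie.table.version=5
"""
for k, v in parse_properties(sample).items():
    print(k, "=", v)
```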
Example
call show_table_properties(table => 'test_hudi_call_cow');
call show_table_properties(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow');
By default only the first 10 rows are shown:
+--------------------------------------------------+----------------+
| key                                              | value          |
+--------------------------------------------------+----------------+
| hoodie.table.precombine.field                    | ts             |
| hoodie.datasource.write.drop.partition.columns   | false          |
| hoodie.table.partition.fields                    | dt             |
| hoodie.table.type                                | COPY_ON_WRITE  |
| hoodie.archivelog.folder                         | archived       |
| hoodie.timeline.layout.version                   | 1              |
| hoodie.table.version                             | 5              |
| hoodie.table.recordkey.fields                    | id             |
| hoodie.table.metadata.partitions                 | files          |
| hoodie.datasource.write.partitionpath.urlencode  | false          |
+--------------------------------------------------+----------------+
To see all the properties, set limit to a larger value:
call show_table_properties(table => 'test_hudi_call_cow', limit => 100);
+--------------------------------------------------+----------------------------------------------------+
| key                                              | value                                              |
+--------------------------------------------------+----------------------------------------------------+
| hoodie.table.precombine.field                    | ts                                                 |
| hoodie.datasource.write.drop.partition.columns   | false                                              |
| hoodie.table.partition.fields                    | dt                                                 |
| hoodie.table.type                                | COPY_ON_WRITE                                      |
| hoodie.archivelog.folder                         | archived                                           |
| hoodie.timeline.layout.version                   | 1                                                  |
| hoodie.table.version                             | 5                                                  |
| hoodie.table.recordkey.fields                    | id                                                 |
| hoodie.table.metadata.partitions                 | files                                              |
| hoodie.datasource.write.partitionpath.urlencode  | false                                              |
| hoodie.database.name                             | hudi                                               |
| hoodie.table.name                                | test_hudi_call_cow                                 |
| hoodie.table.keygenerator.class                  | org.apache.hudi.keygen.SimpleKeyGenerator          |
| hoodie.datasource.write.hive_style_partitioning  | true                                               |
| hoodie.table.create.schema                       | {"type":"record","name":"test_hudi_call_cow_record","namespace":"hoodie.test_hudi_call_cow","fields":[{"name":"_hoodie_commit_time","type":["string","null"]},{"name":"_hoodie_commit_seqno","type":["string","null"]},{"name":"_hoodie_record_key","type":["string","null"]},{"name":"_hoodie_partition_path","type":["string","null"]},{"name":"_hoodie_file_name","type":["string","null"]},{"name":"id","type":["int","null"]},{"name":"name","type":["string","null"]},{"name":"price","type":["double","null"]},{"name":"ts","type":["long","null"]},{"name":"dt","type":["string","null"]}]} |
| hoodie.table.checksum                            | 2721425243                                         |
| hoodie.allow.operation.metadata.field            | false                                              |
+--------------------------------------------------+----------------------------------------------------+
show_commits
Parameters
- table: table name, required
- limit: optional, default 10
Output fields:
commit_time, action, total_bytes_written, total_files_added, total_files_updated, total_partitions_written, total_records_written, total_update_records_written, total_errors
Example
call show_commits(table => 'test_hudi_call_cow');
call show_commits(table => 'test_hudi_call_cow', limit => 1);
+--------------------+---------+---------------------+-------------------+---------------------+--------------------------+-----------------------+------------------------------+--------------+
| commit_time        | action  | total_bytes_written | total_files_added | total_files_updated | total_partitions_written | total_records_written | total_update_records_written | total_errors |
+--------------------+---------+---------------------+-------------------+---------------------+--------------------------+-----------------------+------------------------------+--------------+
| 20221123205701931  | commit  | 435308              | 0                 | 1                   | 1                        | 4                     | 0                            | 0            |
| 20221123205650038  | commit  | 435279              | 0                 | 1                   | 1                        | 3                     | 0                            | 0            |
| 20221123205636715  | commit  | 435246              | 0                 | 1                   | 1                        | 2                     | 0                            | 0            |
| 20221123205546254  | commit  | 435148              | 1                 | 0                   | 1                        | 1                     | 0                            | 0            |
+--------------------+---------+---------------------+-------------------+---------------------+--------------------------+-----------------------+------------------------------+--------------+
show_commits_metadata
Much like show_commits, except for the output fields. Both are implemented by the same ShowCommitsProcedure; the difference is that show_commits_metadata sets includeExtraMetadata to true, while show_commits sets it to false.
Parameters
- table: table name, required
- limit: optional, default 10
Output fields:
commit_time, action, partition, file_id, previous_commit, num_writes, num_inserts, num_deletes, num_update_writes, total_errors, total_log_blocks, total_corrupt_log_blocks, total_rollback_blocks, total_log_records, total_updated_records_compacted, total_bytes_written
Example
call show_commits_metadata(table => 'test_hudi_call_cow');
call show_commits_metadata(table => 'test_hudi_call_cow', limit => 1);
| commit_time        | action  | partition      | file_id                                 | previous_commit    | num_writes | num_inserts | num_deletes | num_update_writes | total_errors | total_log_blocks | total_corrupt_log_blocks | total_rollback_blocks | total_log_records | total_updated_records_compacted | total_bytes_written |
| 20221123205701931  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205650038  | 4          | 1           | 0           | 0                 | 0            | 0                | 0                        | 0                     | 0                 | 0                               | 435308              |
| 20221123205650038  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205636715  | 3          | 1           | 0           | 0                 | 0            | 0                | 0                        | 0                     | 0                 | 0                               | 435279              |
| 20221123205636715  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205546254  | 2          | 1           | 0           | 0                 | 0            | 0                | 0                        | 0                     | 0                 | 0                               | 435246              |
| 20221123205546254  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | null               | 1          | 1           | 0           | 0                 | 0            | 0                | 0                        | 0                     | 0                 | 0                               | 435148              |
show_commit_files
Returns the file information (e.g. fileId) for a given instantTime.
Parameters
- table: table name, required
- instant_time: required
- limit: optional, default 10
Output fields:
action, partition_path, file_id, previous_commit, total_records_updated, total_records_written, total_bytes_written, total_errors, file_size
Example
call show_commit_files(table => 'test_hudi_call_cow', instant_time => '20221123205701931');
Because the test data is small and there is only one partition, there is only one file:
+---------+-----------------+-----------------------------------------+--------------------+-----------------------+-----------------------+---------------------+--------------+-----------+
| action  | partition_path  | file_id                                 | previous_commit    | total_records_updated | total_records_written | total_bytes_written | total_errors | file_size |
+---------+-----------------+-----------------------------------------+--------------------+-----------------------+-----------------------+---------------------+--------------+-----------+
| commit  | dt=2021-05-05   | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205650038  | 0                     | 4                     | 435308              | 0            | 435308    |
+---------+-----------------+-----------------------------------------+--------------------+-----------------------+-----------------------+---------------------+--------------+-----------+
Now run the following SQL so that a single commit touches two files:
merge into test_hudi_call_cow as t0
using (
  select 5 as id, 'hudi' as name, 112 as price, 98 as ts, '2022-11-23' as dt, 'INSERT' as opt_type union
  select 2 as id, 'hudi_2' as name, 10 as price, 100 as ts, '2021-05-05' as dt, 'UPDATE' as opt_type union
  select 4 as id, 'hudi' as name, 10 as price, 100 as ts, '2021-05-05' as dt, 'DELETE' as opt_type
) as s0
on t0.id = s0.id
when matched and opt_type != 'DELETE' then update set *
when matched and opt_type = 'DELETE' then delete
when not matched and opt_type != 'DELETE' then insert *;
First use show_commits to find the latest commit_time, 20221123232449644:
call show_commits(table => 'test_hudi_call_cow', limit => 1);
+--------------------+---------+---------------------+-------------------+---------------------+--------------------------+-----------------------+------------------------------+--------------+
| commit_time        | action  | total_bytes_written | total_files_added | total_files_updated | total_partitions_written | total_records_written | total_update_records_written | total_errors |
+--------------------+---------+---------------------+-------------------+---------------------+--------------------------+-----------------------+------------------------------+--------------+
| 20221123232449644  | commit  | 870474              | 1                 | 1                   | 2                        | 4                     | 1                            | 0            |
+--------------------+---------+---------------------+-------------------+---------------------+--------------------------+-----------------------+------------------------------+--------------+
Then use show_commit_files to look at the files belonging to 20221123232449644:
call show_commit_files(table => 'test_hudi_call_cow', instant_time => '20221123232449644');
+---------+-----------------+-----------------------------------------+--------------------+-----------------------+-----------------------+---------------------+--------------+-----------+
| action  | partition_path  | file_id                                 | previous_commit    | total_records_updated | total_records_written | total_bytes_written | total_errors | file_size |
+---------+-----------------+-----------------------------------------+--------------------+-----------------------+-----------------------+---------------------+--------------+-----------+
| commit  | dt=2022-11-23   | 8f2aecfd-198f-405b-ab5d-46e0cc997d97-0  | null               | 0                     | 1                     | 435176              | 0            | 435176    |
| commit  | dt=2021-05-05   | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123231230786  | 1                     | 3                     | 435298              | 0            | 435298    |
+---------+-----------------+-----------------------------------------+--------------------+-----------------------+-----------------------+---------------------+--------------+-----------+
show_commit_partitions
Returns per-partition file and record information for a given instantTime.
Parameters
- table: table name, required
- instant_time: required
- limit: optional, default 10
Output fields:
action, partition_path, total_files_added, total_files_updated, total_records_inserted, total_records_updated, total_bytes_written, total_errors
Example
call show_commit_partitions(table => 'test_hudi_call_cow', instant_time => '20221123232449644');
+---------+-----------------+-------------------+---------------------+------------------------+-----------------------+---------------------+--------------+
| action  | partition_path  | total_files_added | total_files_updated | total_records_inserted | total_records_updated | total_bytes_written | total_errors |
+---------+-----------------+-------------------+---------------------+------------------------+-----------------------+---------------------+--------------+
| commit  | dt=2022-11-23   | 1                 | 0                   | 1                      | 0                     | 435176              | 0            |
| commit  | dt=2021-05-05   | 0                 | 1                   | 0                      | 1                     | 435298              | 0            |
+---------+-----------------+-------------------+---------------------+------------------------+-----------------------+---------------------+--------------+
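Conceptually, show_commit_partitions is a per-partition roll-up of the file-level stats that show_commit_files returns for the same instant. A rough sketch of that aggregation (illustrative only; the dict keys are made up, and the added/updated rule is inferred from the outputs above, where a file with no previous commit counts as added):

```python
from collections import defaultdict

# File-level stats as shown by show_commit_files for instant 20221123232449644
file_stats = [
    {"partition": "dt=2022-11-23", "prev_commit": None,
     "records_updated": 0, "bytes_written": 435176},
    {"partition": "dt=2021-05-05", "prev_commit": "20221123231230786",
     "records_updated": 1, "bytes_written": 435298},
]

def per_partition(stats):
    # Roll file-level stats up to partition level, mirroring show_commit_partitions.
    agg = defaultdict(lambda: {"total_files_added": 0, "total_files_updated": 0,
                               "total_records_updated": 0, "total_bytes_written": 0})
    for s in stats:
        p = agg[s["partition"]]
        if s["prev_commit"] is None:       # no previous commit => newly added file
            p["total_files_added"] += 1
        else:                              # otherwise an updated file slice
            p["total_files_updated"] += 1
        p["total_records_updated"] += s["records_updated"]
        p["total_bytes_written"] += s["bytes_written"]
    return dict(agg)

for part, stats in per_partition(file_stats).items():
    print(part, stats)
```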
show_commit_write_stats
Returns the write stats for a given instantTime.
Parameters
- table: table name, required
- instant_time: required
- limit: optional, default 10
Output fields:
action, total_bytes_written, total_records_written, avg_record_size
Example
call show_commit_write_stats(table => 'test_hudi_call_cow', instant_time => '20221123232449644');
+---------+---------------------+-----------------------+-----------------+
| action  | total_bytes_written | total_records_written | avg_record_size |
+---------+---------------------+-----------------------+-----------------+
| commit  | 870474              | 4                     | 217619          |
+---------+---------------------+-----------------------+-----------------+
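avg_record_size is simply total_bytes_written divided by total_records_written: 870474 / 4 = 217618.5, reported as 217619, which suggests the quotient is rounded half-up. The exact rounding rule Hudi uses is an assumption here; this sketch just reproduces the value in the table above:

```python
import math

total_bytes_written = 870474
total_records_written = 4

# Half-up rounding reproduces the 217619 shown above.
# (Python's built-in round() would give 217618 due to banker's rounding.)
avg_record_size = math.floor(total_bytes_written / total_records_written + 0.5)
print(avg_record_size)  # 217619
```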
show_commit_extra_metadata
Returns the extraMetadata stored in .commit, .deltacommit, and .replacecommit files.
Parameters
- table: table name, required
- instant_time: optional
- limit: optional, default 100
- metadata_key: optional, e.g. schema
Output fields:
instant_time, action, metadata_key, metadata_value
By default it returns the extraMetadata from the latest commit file; if instant_time is given, it returns the extraMetadata from that instant's commit file.
First look at the contents of a .commit file itself. It contains an extraMetadata block holding one key, schema, with the schema as its value:
hadoop fs -cat hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/.hoodie/20221123232449644.commit
{
  "partitionToWriteStats" : {
    "dt=2022-11-23" : [ {
      "fileId" : "8f2aecfd-198f-405b-ab5d-46e0cc997d97-0",
      "path" : "dt=2022-11-23/8f2aecfd-198f-405b-ab5d-46e0cc997d97-0_1-238-2983_20221123232449644.parquet",
      "cdcStats" : null,
      "prevCommit" : "null",
      "numWrites" : 1,
      "numDeletes" : 0,
      "numUpdateWrites" : 0,
      "numInserts" : 1,
      "totalWriteBytes" : 435176,
      "totalWriteErrors" : 0,
      "tempPath" : null,
      "partitionPath" : "dt=2022-11-23",
      "totalLogRecords" : 0,
      "totalLogFilesCompacted" : 0,
      "totalLogSizeCompacted" : 0,
      "totalUpdatedRecordsCompacted" : 0,
      "totalLogBlocks" : 0,
      "totalCorruptLogBlock" : 0,
      "totalRollbackBlocks" : 0,
      "fileSizeInBytes" : 435176,
      "minEventTime" : null,
      "maxEventTime" : null,
      "runtimeStats" : { "totalScanTime" : 0, "totalUpsertTime" : 0, "totalCreateTime" : 1013 }
    } ],
    "dt=2021-05-05" : [ {
      "fileId" : "35b07424-6e63-4b65-9182-7c37cbe756b1-0",
      "path" : "dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-238-2982_20221123232449644.parquet",
      "cdcStats" : null,
      "prevCommit" : "20221123231230786",
      "numWrites" : 3,
      "numDeletes" : 1,
      "numUpdateWrites" : 1,
      "numInserts" : 0,
      "totalWriteBytes" : 435298,
      "totalWriteErrors" : 0,
      "tempPath" : null,
      "partitionPath" : "dt=2021-05-05",
      "totalLogRecords" : 0,
      "totalLogFilesCompacted" : 0,
      "totalLogSizeCompacted" : 0,
      "totalUpdatedRecordsCompacted" : 0,
      "totalLogBlocks" : 0,
      "totalCorruptLogBlock" : 0,
      "totalRollbackBlocks" : 0,
      "fileSizeInBytes" : 435298,
      "minEventTime" : null,
      "maxEventTime" : null,
      "runtimeStats" : { "totalScanTime" : 0, "totalUpsertTime" : 4162, "totalCreateTime" : 0 }
    } ]
  },
  "compacted" : false,
  "extraMetadata" : {
    "schema" : "{\"type\":\"record\",\"name\":\"test_hudi_call_cow_record\",\"namespace\":\"hoodie.test_hudi_call_cow\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"price\",\"type\":\"double\"},{\"name\":\"ts\",\"type\":\"long\"},{\"name\":\"dt\",\"type\":\"string\"}]}"
  },
  "operationType" : "UPSERT"
}
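In essence, the procedure parses this commit JSON and emits one row per entry in extraMetadata. A sketch of that, run on a trimmed-down illustrative version of the file above:

```python
import json

# Trimmed-down, illustrative version of the 20221123232449644.commit content
commit_json = """
{
  "partitionToWriteStats": {"dt=2022-11-23": [{"numInserts": 1}]},
  "compacted": false,
  "extraMetadata": {"schema": "{\\"type\\":\\"record\\",\\"name\\":\\"test_hudi_call_cow_record\\"}"},
  "operationType": "UPSERT"
}
"""

commit = json.loads(commit_json)
# One output row (instant_time, action, metadata_key, metadata_value) per key
rows = [("20221123232449644", "commit", k, v)
        for k, v in commit.get("extraMetadata", {}).items()]
for row in rows:
    print(row)
```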
Example
call show_commit_extra_metadata(table => 'test_hudi_call_cow');
+--------------------+---------+---------------+----------------------------------------------------+
| instant_time       | action  | metadata_key  | metadata_value                                     |
+--------------------+---------+---------------+----------------------------------------------------+
| 20221123232449644  | commit  | schema        | {"type":"record","name":"test_hudi_call_cow_record","namespace":"hoodie.test_hudi_call_cow","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"},{"name":"price","type":"double"},{"name":"ts","type":"long"},{"name":"dt","type":"string"}]} |
+--------------------+---------+---------------+----------------------------------------------------+
call show_commit_extra_metadata(table => 'test_hudi_call_cow', instant_time => '20221123205701931', metadata_key => 'schema', limit => 10);
+--------------------+---------+---------------+----------------------------------------------------+
| instant_time       | action  | metadata_key  | metadata_value                                     |
+--------------------+---------+---------------+----------------------------------------------------+
| 20221123205701931  | commit  | schema        | {"type":"record","name":"test_hudi_call_cow_record","namespace":"hoodie.test_hudi_call_cow","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"},{"name":"price","type":"double"},{"name":"ts","type":"long"},{"name":"dt","type":"string"}]} |
+--------------------+---------+---------------+----------------------------------------------------+
As far as I know, schema is the only key in extraMetadata, and there is only one entry, so only one record comes back and specifying metadata_key makes no difference; I am not sure whether other extraMetadata keys exist.
show_fs_path_detail
Shows statistics for the files and directories under a given path, sorted by file size by default.
Parameters
- path: file path, required
- is_sub: optional, whether to list the direct (one-level) subdirectories instead, default false
- sort: optional, whether to sort by file size, default true
Output fields:
path_num, file_num, storage_size, storage_size(unit), storage_path, space_consumed, quota, space_quota
Example
call show_fs_path_detail(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow');
+-----------+-----------+---------------+---------------------+------------------------------------------------------------------------------+-----------------+----------+--------------+
| path_num  | file_num  | storage_size  | storage_size(unit)  | storage_path                                                                 | space_consumed  | quota    | space_quota  |
+-----------+-----------+---------------+---------------------+------------------------------------------------------------------------------+-----------------+----------+--------------+
| 22        | 53        | 3200109       | 3.05MB              | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow | -1              | 9600327  | -1           |
+-----------+-----------+---------------+---------------------+------------------------------------------------------------------------------+-----------------+----------+--------------+
Listing the direct subdirectories:
call show_fs_path_detail(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow', is_sub => true);
+-----------+-----------+---------------+---------------------+--------------------------------------------------------------------------------------------+-----------------+----------+--------------+
| path_num  | file_num  | storage_size  | storage_size(unit)  | storage_path                                                                               | space_consumed  | quota    | space_quota  |
+-----------+-----------+---------------+---------------------+--------------------------------------------------------------------------------------------+-----------------+----------+--------------+
| 1         | 7         | 2611728       | 2.49MB              | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05 | -1              | 7835184  | -1           |
| 1         | 2         | 435272        | 425.07KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2022-11-23 | -1              | 1305816  | -1           |
| 19        | 44        | 153109        | 149.52KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/.hoodie       | -1              | 459327   | -1           |
+-----------+-----------+---------------+---------------------+--------------------------------------------------------------------------------------------+-----------------+----------+--------------+
Listing a second-level subdirectory:
call show_fs_path_detail(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05', is_sub => true);
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| path_num  | file_num  | storage_size  | storage_size(unit)  | storage_path                                       | space_consumed  | quota    | space_quota  |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| 0         | 1         | 435353        | 425.15KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-192-1549_20221123231230786.parquet | -1 | 1306059 | -1 |
| 0         | 1         | 435308        | 425.11KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-147-118_20221123205701931.parquet  | -1 | 1305924 | -1 |
| 0         | 1         | 435298        | 425.10KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-238-2982_20221123232449644.parquet | -1 | 1305894 | -1 |
| 0         | 1         | 435279        | 425.08KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-105-83_20221123205650038.parquet   | -1 | 1305837 | -1 |
| 0         | 1         | 435246        | 425.04KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-66-52_20221123205636715.parquet    | -1 | 1305738 | -1 |
| 0         | 1         | 435148        | 424.95KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-27-21_20221123205546254.parquet    | -1 | 1305444 | -1 |
| 0         | 1         | 96            | 96B                 | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/.hoodie_partition_metadata                                                  | -1 | 288     | -1 |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
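The storage_size(unit) column appears to be the raw byte count rendered in binary units (1 KB = 1024 B) with two decimal places: 3200109 / 1024² ≈ 3.05. The exact formatting rule is my guess, but this sketch reproduces every value in the tables above:

```python
def human_size(num_bytes: int) -> str:
    # Binary units, two decimals, matching values like "3.05MB" / "425.07KB" above.
    if num_bytes >= 1024 ** 2:
        return f"{num_bytes / 1024 ** 2:.2f}MB"
    if num_bytes >= 1024:
        return f"{num_bytes / 1024:.2f}KB"
    return f"{num_bytes}B"

for n in (3200109, 435272, 153109, 96):
    print(n, "->", human_size(n))
```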