Hive Partition 分区
分区表练习
1. 创建分区表-单列分区
CREATE TABLE IF NOT EXISTS day_table (
id INT,
content string
)
PARTITIONED BY ( dt string)
row format delimited
fields terminated by ',';
这里分区字段不能和表中的字段重复
2. 加载数据
cat >> ./hive-data/day_table.txt <<EOF
1,tom
2,jerry
3,alice
4,bob
5,lily
6,mary
7,jim
8,jane
9,jack
10,judy
EOF
docker 映射 hive-data 目录到宿主机 ./hive-data/day_table.txt: /opt/hive/ddldata/day_table.txt
LOAD DATA local INPATH '/opt/hive/ddldata/day_table.txt'
OVERWRITE INTO TABLE day_table
PARTITION (dt='2025-01-01');
3. 查询数据
SELECT * FROM day_table WHERE dt='2021-01-01'
1 tom 2025-01-01
2 jerry 2025-01-01
3 alice 2025-01-01
4 bob 2025-01-01
5 lily 2025-01-01
6 mary 2025-01-01
7 jim 2025-01-01
8 jane 2025-01-01
9 jack 2025-01-01
10 judy 2025-01-01
存储到hdfs中
/user/hive/warehouse/day_table/dt=2025-01-01/000000_0
插入另一个分区
load data local inpath '/opt/hive/ddldata/day_table.txt'
OVERWRITE INTO TABLE day_table
PARTITION (dt='2025-02-01');
1 tom 2025-01-01
2 jerry 2025-01-01
3 alice 2025-01-01
...
1 tom 2025-02-01
2 jerry 2025-02-01
3 alice 2025-02-01
创建分区表-多列分区
create table day_hour_table (id int, content string)
partitioned by (dt string, hour string)
row format delimited fields terminated by ','
;
-- 向双分区加载数据
load data local inpath '/opt/hive/ddldata/day_table.txt' into table day_hour_table partition (dt='2025-01-01', hour='01');
load data local inpath '/opt/hive/ddldata/day_table.txt' into table day_hour_table partition (dt='2025-01-01', hour='02');
添加两个字段相应的插入数据时要指定两个字段,一个字段会报错
存储
/user/hive/warehouse/day_hour_table/dt=2025-01-01/hour=01/000000_0
/user/hive/warehouse/day_hour_table/dt=2025-01-01/hour=02/000000_0
查询数据
select * from day_hour_table where dt='2025-01-01' and hour='01';
1 tom 2025-01-01 01
2 jerry 2025-01-01 01
3 alice 2025-01-01 01
4 bob 2025-01-01 01
5 lily 2025-01-01 01
6 mary 2025-01-01 01
7 jim 2025-01-01 01
8 jane 2025-01-01 01
9 jack 2025-01-01 01
10 judy 2025-01-01 01
总结
- 创建分区表时,分区字段不能和表中的字段重复
- 加载数据时,指定分区字段的值
- 查询数据时,指定分区字段的值
- 存储数据时,分区字段的值会作为文件夹名
- 多分区表,分区字段值会作为文件夹名的一部分
- 多分区表,查询时,指定所有分区字段的值
- 多分区表,加载数据时,指定所有分区字段的值
- 多分区表,存储数据时,分区字段的值会作为文件夹名的一部分