Partitioning splits a table's data into smaller physical pieces to improve performance, availability, and manageability. It is a common way to optimize large tables in relational databases: the DBMS provides the partition management, and applications can query the partitioned table directly without changing their architecture, although good performance still depends on accessing the partitions sensibly.
Partitioning is a common technique, and partitioned tables in PostgreSQL have quite a few characteristics of their own, such as multiple partitioning implementations, partitions that are themselves ordinary tables, partition maintenance strategies, SQL optimization, and a number of partition-specific pitfalls.
PostgreSQL offers several ways to implement partitioning. Declarative partitioning and inheritance-based partitioning are supported out of the box, while third-party extensions include pg_pathman, pg_partman, and others. Since declarative partitioning became available, it is essentially the only recommended approach. Covering the features, details, and history of every implementation would make this far too long and has little future value, so this article focuses on declarative partitioning and only briefly touches on the other approaches. Because of existing installations and some functional differences, it is still worth understanding declarative partitioning, inheritance partitioning, and pg_pathman.
Declarative partitioning, also called native partitioning, has been available since PostgreSQL 10. It is the "officially supported" form of partitioning and the recommended one. Although it is distinct from inheritance partitioning, it is implemented internally on top of table inheritance. Declarative partitioning supports only three partitioning methods: range, list, and hash.
A range-partitioned table splits data by ranges; each partition's bounds form the half-open interval [t1, t2).
CREATE TABLE PUBLIC.LZLPARTITION1 ( id int, name varchar(50) NULL, DATE_CREATED timestamp NOT NULL DEFAULT now() ) PARTITION BY RANGE(DATE_CREATED);
alter table public.lzlpartition1 add primary key(id,DATE_CREATED)
create table LZLPARTITION1_202301 partition of LZLPARTITION1 for values from ('2023-01-01 00:00:00') to ('2023-02-01 00:00:00'); create table LZLPARTITION1_202302 partition of LZLPARTITION1 for values from ('2023-02-01 00:00:00') to ('2023-03-01 00:00:00');
--add some data to the partitioned table => INSERT INTO lzlpartition1 SELECT random() * 10000, md5(g::text),g FROM generate_series('2023-01-01'::date, '2023-02-28'::date, '1 minute') as g; INSERT 0 83521
In `FOR VALUES FROM (t1) TO (t2)`, the range is [t1, t2): the lower bound is inclusive and the upper bound is exclusive.
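A quick way to see the half-open boundary, sketched against the table above (the id and name values are just placeholders):

```sql
-- A row exactly on a boundary value lands in the partition whose range starts there,
-- because the previous partition's upper bound is exclusive.
INSERT INTO lzlpartition1 VALUES (1, 'boundary-check', '2023-02-01 00:00:00');
SELECT tableoid::regclass AS partition, date_created
FROM lzlpartition1
WHERE name = 'boundary-check';
-- expected: lzlpartition1_202302
```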
Looking at the partitioned table, each partition is also a table in its own right.
lzldb=> \d+ lzlpartition1 Partitioned table "public.lzlpartition1" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+------------- id | integer | | not null | | plain | | | name | character varying(50) | | | | extended | | | date_created | timestamp without time zone | | not null | now() | plain | | | Partition key: RANGE (date_created) Indexes: "lzlpartition1_pkey" PRIMARY KEY, btree (id, date_created) Partitions: lzlpartition1_202301 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-02-01 00:00:00'), lzlpartition1_202302 FOR VALUES FROM ('2023-02-01 00:00:00') TO ('2023-03-01 00:00:00')
lzldb=> \d+ lzlpartition1_202301 Table "public.lzlpartition1_202301" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+------------- id | integer | | not null | | plain | | | name | character varying(50) | | | | extended | | | date_created | timestamp without time zone | | not null | now() | plain | | | Partition of: lzlpartition1 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-02-01 00:00:00') Partition constraint: ((date_created IS NOT NULL) AND (date_created >= '2023-01-01 00:00:00'::timestamp without time zone) AND (date_created < '2023-02-01 00:00:00'::timestamp without time zone)) Indexes: "lzlpartition1_202301_pkey" PRIMARY KEY, btree (id, date_created) Access method: heap
The primary key, indexes, and NOT NULL/CHECK constraints are created automatically on each partition. Because a partition is also an independent table, constraints and indexes can additionally be created on an individual partition. (ATTACH does not create these automatically; see the ATTACH section.)
A list-partitioned table routes rows to partitions according to the listed partition values.
CREATE TABLE cities ( city_id bigserial not null, name text, population bigint ) PARTITION BY LIST (left(lower(name), 1));
CREATE TABLE cities_ab PARTITION OF cities FOR VALUES IN ('a', 'b'); CREATE TABLE cities_null PARTITION OF cities ( CONSTRAINT city_id_nonzero CHECK (city_id != 0) ) FOR VALUES IN (null);
insert into cities(name,population) values('Acity',10); insert into cities(name,population) values(null,20);
=> SELECT tableoid::regclass,* FROM cities; tableoid | city_id | name | population -------------+---------+--------+------------ cities_ab | 1 | Acity | 10 cities_null | 2 | [null] | 20
A list-partitioned table can have a NULL partition.
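A related point not shown above: a value that matches no list partition is rejected unless the table has a DEFAULT partition. A minimal sketch (DEFAULT partitions exist since PostgreSQL 11; the city name is made up):

```sql
-- Without a DEFAULT partition this insert fails with
-- "no partition of relation "cities" found for row":
-- INSERT INTO cities(name, population) VALUES ('Zcity', 30);

-- A DEFAULT partition catches every value not covered by the other partitions:
CREATE TABLE cities_default PARTITION OF cities DEFAULT;
INSERT INTO cities(name, population) VALUES ('Zcity', 30);
SELECT tableoid::regclass, name FROM cities WHERE name = 'Zcity';  -- cities_default
```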
Hash partitioning spreads rows across the partitions by a hash of the key, which helps scatter hot data.
CREATE TABLE orders (order_id int,name varchar(10)) PARTITION BY HASH (order_id); CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 3, REMAINDER 0); CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 3, REMAINDER 1); CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 3, REMAINDER 2);
A hash-partitioned table cannot have a default partition, nor a partition whose remainder is not smaller than the MODULUS.
=> CREATE TABLE orders_p2 PARTITION OF orders -> FOR VALUES WITH (MODULUS 3, REMAINDER 4); ERROR: 42P16: remainder for hash partition must be less than modulus LOCATION: transformPartitionBound, parse_utilcmd.c:3939 => CREATE TABLE orders_p4 PARTITION OF orders default; ERROR: 42P16: a hash-partitioned table may not have a default partition LOCATION: transformPartitionBound, parse_utilcmd.c:3909
Insert some data.
=>insert into orders values(generate_series(1,10000),'a'); INSERT 0 10000 =>SELECT tableoid::regclass,count(*) FROM orders group by tableoid::regclass; tableoid | count -----------+------- orders_p1 | 3277 orders_p3 | 3354 orders_p2 | 3369 =>select tableoid::regclass,* from orders limit 30; tableoid | order_id | name -----------+----------+------ orders_p1 | 2 | a orders_p1 | 4 | a orders_p1 | 6 | a orders_p1 | 8 | a orders_p1 | 15 | a orders_p1 | 16 | a orders_p1 | 18 | a orders_p1 | 19 | a orders_p1 | 20 | a
The rows are hash-distributed across the partitions.
--插入100条null数据 => insert into orders values(null,generate_series(1,100)::text); INSERT 0 100 => SELECT tableoid::regclass,count(*) FROM orders where order_id is null group by tableoid::regclass; tableoid | count -----------+------- orders_p1 | 100 --null数据全部都放在了remainder 0的分区上 =>\d+ orders_p1 Table "public.orders_p1" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ----------+-----------------------+-----------+----------+---------+----------+--------------+------------- order_id | integer | | | | plain | | name | character varying(10) | | | | extended | | Partition of: orders FOR VALUES WITH (modulus 3, remainder 0) Partition constraint: satisfies_hash_partition('412053'::oid, 3, 0, order_id)
A hash-partitioned table has no notion of a NULL partition, but it can still store NULL keys; the NULL rows all land in the remainder-0 partition.
Partitions can themselves be partitioned, forming a multi-level hierarchy, and the sub-partitions may use a different partitioning method; this is usually called composite (multi-level) partitioning.
Create a composite-partitioned table:
create table part_1000(id bigserial not null,name varchar(10),createddate timestamp) partition by range(createddate); create table part_2001 partition of part_1000 for values from ('2023-01-01 00:00:00') to ('2023-02-01 00:00:00') partition by list(name) ; create table part_2002 partition of part_1000 for values from ('2023-02-01 00:00:00') to ('2023-03-01 00:00:00') partition by list(name) ; create table part_2003 partition of part_1000 for values from ('2023-03-01 00:00:00') to ('2023-04-01 00:00:00') partition by list(name) ; create table part_3001 partition of part_2001 FOR VALUES IN ('abc'); create table part_3002 partition of part_2001 FOR VALUES IN ('def'); create table part_3003 partition of part_2001 FOR VALUES IN ('jkl');
\d+ only shows the next level of partitions.
\d+ part_1000 Partitioned table "dbmgr.part_1000" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description -------------+-----------------------------+-----------+----------+---------------------------------------+----------+--------------+------------- id | bigint | | not null | nextval('part_1000_id_seq'::regclass) | plain | | name | character varying(10) | | | | extended | | createddate | timestamp without time zone | | | | plain | | Partition key: RANGE (createddate) Partitions: part_2001 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-02-01 00:00:00'), PARTITIONED, part_2002 FOR VALUES FROM ('2023-02-01 00:00:00') TO ('2023-03-01 00:00:00'), PARTITIONED, part_2003 FOR VALUES FROM ('2023-03-01 00:00:00') TO ('2023-04-01 00:00:00'), PARTITIONED => \d+ part_2001 Partitioned table "dbmgr.part_2001" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description -------------+-----------------------------+-----------+----------+---------------------------------------+----------+--------------+------------- id | bigint | | not null | nextval('part_1000_id_seq'::regclass) | plain | | name | character varying(10) | | | | extended | | createddate | timestamp without time zone | | | | plain | | Partition of: part_1000 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-02-01 00:00:00') Partition constraint: ((createddate IS NOT NULL) AND (createddate >= '2023-01-01 00:00:00'::timestamp without time zone) AND (createddate < '2023-02-01 00:00:00'::timestamp without time zone)) Partition key: LIST (name) Partitions: part_3001 FOR VALUES IN ('abc'), part_3002 FOR VALUES IN ('def'), part_3003 FOR VALUES IN ('jkl')
Now insert a row.
=> insert into part_1000 values(random() * 10000,'abc','2023-01-01 08:00:00'); INSERT 0 1 => SELECT tableoid::regclass,* FROM part_1000; tableoid | id | name | createddate -----------+------+------+--------------------- part_3001 | 6385 | abc | 2023-01-01 08:00:00
The row is stored in the lowest-level leaf partition.
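Since \d+ only shows one level at a time, pg_partition_tree is a convenient way to see the whole hierarchy; a small sketch, assuming PostgreSQL 12 or later:

```sql
-- relid/parentrelid/isleaf/level describe every relation in the partition tree;
-- only the relations with isleaf = true actually store rows.
SELECT relid, parentrelid, isleaf, level
FROM pg_partition_tree('part_1000');
```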
Inheritance partitioning is also supported natively; it uses PostgreSQL's table-inheritance feature to implement a partitioned table, and it is more flexible than declarative partitioning.
An inheritance-based partitioned table relies on two PostgreSQL features: table inheritance and insert redirection. The redirection can be implemented with either a rule or a trigger.
Example of building an inheritance-based partitioned table:
1. Create the parent table
CREATE TABLE measurement ( city_id int not null, logdate date not null, peaktemp int, unitsales int );
2. Create the inheriting child tables, each with a CHECK constraint for its range
CREATE TABLE measurement_202308 ( CHECK ( logdate >= DATE '2023-08-01' AND logdate < DATE '2023-09-01' ) ) INHERITS (measurement); CREATE TABLE measurement_202309 ( CHECK ( logdate >= DATE '2023-09-01' AND logdate < DATE '2023-10-01' ) ) INHERITS (measurement);
3. Create a rule or trigger that redirects inserted rows into the matching child table
CREATE OR REPLACE FUNCTION measurement_insert_trigger() RETURNS TRIGGER AS $$ BEGIN IF ( NEW.logdate >= DATE '2023-08-01' AND NEW.logdate < DATE '2023-09-01' ) THEN INSERT INTO measurement_202308 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2023-09-01' AND NEW.logdate < DATE '2023-10-01' ) THEN INSERT INTO measurement_202309 VALUES (NEW.*); ELSE RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!'; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql;
CREATE TRIGGER insert_measurement_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE FUNCTION measurement_insert_trigger();
With that, a minimal inheritance-based partitioned table is in place.
=>\d+ measurement Table "public.measurement" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description -----------+---------+-----------+----------+---------+---------+--------------+------------- city_id | integer | | not null | | plain | | logdate | date | | not null | | plain | | peaktemp | integer | | | | plain | | unitsales | integer | | | | plain | | Triggers: insert_measurement_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE FUNCTION measurement_insert_trigger() Child tables: measurement_202308, measurement_202309 Access method: heap
Test inserting and querying some data.
--插入范围外的数据会报错 => insert into measurement values(1001, now() - interval '31' day ,1,1); ERROR: P0001: Date out of range. Fix the measurement_insert_trigger() function! CONTEXT: PL/pgSQL function measurement_insert_trigger() line 10 at RAISE LOCATION: exec_stmt_raise, pl_exec.c:3889 --插入数据会重定向到子表上 => insert into measurement values(1001,now(),1,1); INSERT 0 0 --查询父表会查到子表数据 => select tableoid::regclass,* from measurement; tableoid | city_id | logdate | peaktemp | unitsales --------------------+---------+------------+----------+----------- measurement_202308 | 1001 | 2023-08-03 | 1 | 1
RULE vs trigger
Besides triggers, PostgreSQL can also redirect inserts with rules.
Example rule statements:
CREATE RULE measurement_insert_202308 AS ON INSERT TO measurement WHERE ( logdate >= DATE '2023-08-01' AND logdate < DATE '2023-09-01' ) DO INSTEAD INSERT INTO measurement_202308 VALUES (NEW.*); CREATE RULE measurement_insert_202309 AS ON INSERT TO measurement WHERE ( logdate >= DATE '2023-09-01' AND logdate < DATE '2023-10-01' ) DO INSTEAD INSERT INTO measurement_202309 VALUES (NEW.*);
Differences between rules and triggers: a rule rewrites the statement once per query, so its overhead is paid per query rather than per row, which can be an advantage for bulk inserts; a trigger fires for every row; and COPY ignores rules (it does fire triggers), so with the rule approach bulk loads have to target the child tables directly.
Indexes
For performance you also need indexes and constraint_exclusion enabled. Indexes are all but indispensable on partitions, and for inheritance children they must be created on each child by hand.
Index creation example:
CREATE INDEX idx_measurement_202308_logdate ON measurement_202308 (logdate); CREATE INDEX idx_measurement_202309_logdate ON measurement_202309 (logdate);
Insert a little data and check the execution plan:
--'2023-08-04' has only 1 row, so that the index can be used => insert into measurement values(1001,now()+interval '1' day,1,1); INSERT 0 0 => insert into measurement values(generate_series(1,1000),now(),1,1); INSERT 0 0 => explain select * from measurement where logdate='2023-08-04'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Append (cost=0.00..5.17 rows=2 width=16) -> Seq Scan on measurement measurement_1 (cost=0.00..0.00 rows=1 width=16) Filter: (logdate = '2023-08-04'::date) -> Index Scan using idx_measurement_202308_logdate on measurement_202308 measurement_2 (cost=0.14..5.16 rows=1 width=16) Index Cond: (logdate = '2023-08-04'::date)
In the plan above, the August partition is accessed through its index. constraint_exclusion enables constraint exclusion for inheritance children by default, so the query excluded the September partition and only scanned August. The parent table has no constraint (and cannot have one), so it always appears in the plan, but since the parent is normally empty this hardly matters.
constraint_exclusion controls whether the optimizer uses table constraints to skip tables that don't need to be visited. It is most relevant for optimizing inheritance-partitioned tables, where skipping children speeds up the SQL (the equivalent for declarative partitioning is partition pruning, controlled by enable_partition_pruning). constraint_exclusion has three settings:
on: constraints are examined for all tables
partition: constraints are examined for inheritance children and UNION ALL subqueries (the default)
off: constraints are never examined
Constraint exclusion can only happen while the plan is being built, never during execution (partition pruning can). This means it does not kick in when the comparison value is a bind parameter or otherwise unknown at plan time.
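One way to make this visible is to force a generic plan for a prepared statement, so the planner never sees the parameter value. A sketch (plan_cache_mode requires PostgreSQL 12+; the table is the inheritance-partitioned measurement from above):

```sql
SET plan_cache_mode = force_generic_plan;   -- always plan without the parameter value
PREPARE q(date) AS SELECT * FROM measurement WHERE logdate = $1;
EXPLAIN EXECUTE q('2023-08-04');
-- Both measurement_202308 and measurement_202309 appear in the plan, because
-- constraint exclusion had no concrete value to compare against the CHECK constraints.
RESET plan_cache_mode;
```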
For example, with functions such as now(), whose value the planner does not know, the planner cannot exclude partitions that will never be needed:
=> select now(); now ------------------------------- 2023-08-03 17:12:04.772658+08 --优化器没有排除9月的分区 => explain select * from measurement where logdate<=now(); QUERY PLAN ----------------------------------------------------------------------------------------------------- Append (cost=0.00..55.98 rows=1628 width=16) -> Seq Scan on measurement measurement_1 (cost=0.00..0.00 rows=1 width=16) Filter: (logdate <= now()) -> Seq Scan on measurement_202308 measurement_2 (cost=0.00..21.15 rows=1010 width=16) Filter: (logdate <= now()) -> Bitmap Heap Scan on measurement_202309 measurement_3 (cost=7.44..26.69 rows=617 width=16) Recheck Cond: (logdate <= now()) -> Bitmap Index Scan on idx_measurement_202309_logdate (cost=0.00..7.28 rows=617 width=0) Index Cond: (logdate <= now())
Also, constraint exclusion has to examine the constraints of every child, so too many children slow down planning; with inheritance partitioning it is best not to create a very large number of child partitions.
Turning an inheritance child back into an ordinary table:
ALTER TABLE measurement_202308 NO INHERIT measurement;
Adding an existing, populated ordinary table to the inheritance-partitioned table as a child:
CREATE TABLE measurement_202310 (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS); ALTER TABLE measurement_202310 ADD CONSTRAINT measurement_202310_logdate_check CHECK ( logdate >= DATE '2023-10-01' AND logdate < DATE '2023-11-01' ); --insert into measurement_202310 values(2001,'20231010',3,3); ALTER TABLE measurement_202310 INHERIT measurement;
pg_pathman is a third-party extension that implements partitioning. The pg_pathman README on GitHub and the existing articles on using the extension already describe it in great detail, so here we only summarize a few key points and run some simple tests.
No longer maintained
NOTE: this project is not under development anymore
pg_pathman supports PostgreSQL 9.5 through 15; later PostgreSQL releases will not be supported, and the existing versions only receive bug fixes, no new features.
pg_pathman appeared because partitioning in old PostgreSQL versions was incomplete. Native (declarative) partitioning is now very mature, and pg_pathman itself recommends using it and migrating existing pg_pathman tables to native partitions. A project once embraced by many users is becoming history; still, even unmaintained it offers more functionality than today's native partitioning.
Feature overview
pg_pathman is quite powerful and supports several things native partitioning does not, but it is not perfect either, and plenty of problems show up in real use. A few of its characteristics deserve particular attention.
Creating a pathman range-partitioned table
--the ordinary table becomes the parent CREATE TABLE journal ( id SERIAL, dt TIMESTAMP NOT NULL, level INTEGER, msg TEXT); -- the child partitions automatically get the parent's indexes CREATE INDEX ON journal(dt); --create the partitions select create_range_partitions('journal'::regclass, 'dt', '2023-01-01 00:00:00'::timestamp, interval '1 month', 6, false) ;
--查看表定义 => \d+ journal Table "public.journal" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+-----------------------------+-----------+----------+-------------------------------------+----------+--------------+------------- id | integer | | not null | nextval('journal_id_seq'::regclass) | plain | | dt | timestamp without time zone | | not null | | plain | | level | integer | | | | plain | | msg | text | | | | extended | | Indexes: "journal_dt_idx" btree (dt) Child tables: journal_1, journal_2, journal_3, journal_4, journal_5, journal_6 Access method: heap => \d+ journal_6 Table "public.journal_6" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+-----------------------------+-----------+----------+-------------------------------------+----------+--------------+------------- id | integer | | not null | nextval('journal_id_seq'::regclass) | plain | | dt | timestamp without time zone | | not null | | plain | | level | integer | | | | plain | | msg | text | | | | extended | | Indexes: "journal_6_dt_idx" btree (dt) Check constraints: "pathman_journal_6_check" CHECK (dt >= '2023-06-01 00:00:00'::timestamp without time zone AND dt < '2023-07-01 00:00:00'::timestamp without time zone) Inherits: journal Access method: heap
--insert some data INSERT INTO journal (dt, level, msg) SELECT g, random() * 10000, md5(g::text) FROM generate_series('2023-01-01'::date, '2023-02-28'::date, '1 hour') as g;
--insert a row for which no partition exists yet => INSERT INTO journal (dt, level, msg) values('2023-07-01'::date,'11','1'); INSERT 0 1 --check the data distribution: the interval partition was created automatically => SELECT tableoid::regclass AS partition, count(*) FROM journal group by partition; partition | count -----------+------- journal_7 | 1 journal_2 | 649 journal_1 | 744
--look at the plan: partition pruning has happened => explain select * from journal where dt='2023-01-01 22:00:00'; QUERY PLAN ----------------------------------------------------------------------------------------------------- Append (cost=0.00..5.30 rows=2 width=48) -> Seq Scan on journal journal_1 (cost=0.00..0.00 rows=1 width=48) Filter: (dt = '2023-01-01 22:00:00'::timestamp without time zone) -> Index Scan using journal_1_dt_idx on journal_1 journal_1_1 (cost=0.28..5.29 rows=1 width=49) Index Cond: (dt = '2023-01-01 22:00:00'::timestamp without time zone)
Creating a pathman hash-partitioned table
--create the parent table CREATE TABLE items ( id SERIAL PRIMARY KEY, name TEXT, code BIGINT); --create hash partitions select create_hash_partitions('items'::regclass, 'id', 3, false) ; --insert data INSERT INTO items (id, name, code) SELECT g, md5(g::text), random() * 100000 FROM generate_series(1, 1000) as g;
=> SELECT tableoid::regclass AS partition, count(*) FROM items group by partition; partition | count -----------+------- items_2 | 344 items_0 | 318 items_1 | 338 => \d+ items Table "public.items" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------+-----------+----------+-----------------------------------+----------+--------------+------------- id | integer | | not null | nextval('items_id_seq'::regclass) | plain | | name | text | | | | extended | | code | bigint | | | | plain | | Indexes: "items_pkey" PRIMARY KEY, btree (id) Child tables: items_0, items_1, items_2 Access method: heap => \d+ items_1 Table "public.items_1" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------+-----------+----------+-----------------------------------+----------+--------------+------------- id | integer | | not null | nextval('items_id_seq'::regclass) | plain | | name | text | | | | extended | | code | bigint | | | | plain | | Indexes: "items_1_pkey" PRIMARY KEY, btree (id) Check constraints: "pathman_items_1_check" CHECK (get_hash_part_idx(hashint4(id), 3) = 1) Inherits: items Access method: heap => SELECT tableoid::regclass AS partition, count(*) FROM items group by partition; partition | count -----------+------- items_2 | 344 items_0 | 318 items_1 | 338
There is no built-in automatic partition creation.
Only partitioned (per-partition) indexes are supported; there are no global indexes.
The primary key must include the partition key, because PostgreSQL can currently only enforce uniqueness within each individual partition; Oracle, with its global indexes, has no such restriction (see the sketch after this list).
A unique index must include the partition key, for the same reason as the primary key.
Constraints that are global to the whole table cannot be created.
A BEFORE ROW trigger on INSERT cannot change which partition the row ends up in.
Temporary and permanent relations cannot be mixed in the same partition tree.
With declarative partitioning the partitions must have exactly the parent's columns; with inheritance partitioning the children may have extra columns.
With declarative partitioning, CHECK and NOT NULL constraints of the parent are always inherited and cannot be set or dropped on an individual partition.
A range partition cannot store NULL in the partition key; hash partitioning has no NULL partition but can store NULLs, which go to the remainder-0 partition; list partitioning can have an explicit NULL partition for NULL values.
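A quick illustration of the primary key / unique index limitation above, sketched on a throwaway table (the exact error wording may differ between versions):

```sql
CREATE TABLE uniq_demo (id int, created timestamp NOT NULL)
    PARTITION BY RANGE (created);

-- Fails: the constraint does not include the partition key column "created"
ALTER TABLE uniq_demo ADD CONSTRAINT uniq_demo_id UNIQUE (id);
-- ERROR:  unique constraint on partitioned table must include all partitioning columns

-- Works: the partition key is part of the constraint
ALTER TABLE uniq_demo ADD CONSTRAINT uniq_demo_id_created UNIQUE (id, created);
```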
Before partitioning a table you must understand the advantages, drawbacks, and restrictions it brings: for very large tables partitioning can improve performance, separating hot and cold data makes the data easier to manage, and so on. Whether and how to partition should be decided from your workload and hardware. Developers still keep asking "at what row count should I partition?", and the only honest answer is a rough rule of thumb based on your own situation; if you are not sure how to partition, some general guidelines can help (ignore this if you already know partitioning well).
Privileges are one of the less-discussed aspects of partitioned tables, but they deserve attention.
Because in PostgreSQL a partition is itself an ordinary table, the situation differs from other common databases (Oracle, MySQL): in Oracle you never worry about privileges on individual partitions, but in PostgreSQL you do.
partition of / attach does not propagate the parent table's privileges to the partition
--把分区去表的select授权给一个普通用户 => grant select on lzlpartition1 to userlzl; GRANT --查看权限,只有主表有授权,存量分区子表不会自动授权 => select grantee,table_schema,table_name,privilege_type from information_schema.table_privileges where grantee='userlzl'; grantee | table_schema | table_name | privilege_type ---------+--------------+---------------+---------------- userlzl | public | lzlpartition1 | SELECT --partition of方式创建一个分区 => create table LZLPARTITION1_202303 partition of LZLPARTITION1 for values from ('2023-03-01 00:00:00') to ('2023-04-01 00:00:00'); CREATE TABLE --attach方式创建一个分区 => CREATE TABLE lzlpartition1_202304 -> (LIKE lzlpartition1 INCLUDING DEFAULTS INCLUDING CONSTRAINTS); CREATE TABLE => alter table lzlpartition1 attach partition lzlpartition1_202304 for values from ('2023-04-01 00:00:00') to ('2023-05-01 00:00:00'); ALTER TABLE --再次查看权限,新增的子分区不会自动授权给其他用户(但是新增子分区权限会自动授权给owner) => select grantee,table_schema,table_name,privilege_type from information_schema.table_privileges where grantee='userlzl'; grantee | table_schema | table_name | privilege_type ---------+--------------+---------------+---------------- userlzl | public | lzlpartition1 | SELECT
At this point userlzl has no privileges on any of the partitions, only on the parent table.
So userlzl can read partition data through the parent table, but cannot query a partition directly.
=> \c - userlzl; You are now connected to database "dbmgr" as user "userlzl". => select * from LZLPARTITION1 where date_created='2023-01-02 10:00:00'; id | name | date_created ------+----------------------------------+--------------------- 2159 | d05d716da126ff4b44d934344cc4dd7a | 2023-01-02 10:00:00 => select * from LZLPARTITION1_202301 where date_created='2023-01-02 10:00:00'; ERROR: 42501: permission denied for table lzlpartition1_202301 LOCATION: aclcheck_error, aclchk.c:3466
Because attach/detach does not touch privileges, if we now detach a partition, the detached table still cannot be accessed by userlzl.
=> alter table lzlpartition1 detach partition lzlpartition1_202303; ALTER TABLE => \dp+ lzlpartition1_202303; Access privileges Schema | Name | Type | Access privileges | Column privileges | Policies --------+----------------------+-------+-------------------+-------------------+---------- dbmgr | lzlpartition1_202303 | table | | | => select * from LZLPARTITION1_202301 where date_created='2023-01-02 10:00:00'; ERROR: 42501: permission denied for table lzlpartition1_202301
So privileges stay with each individual table, whether or not it is currently attached as a partition.
However, partition privileges do more than decide whether you can access the data: missing privileges on a partition can also distort the execution plan; see the article 《不同用户查看到不同的执行计划》 (different users see different execution plans).
That issue shows up only occasionally: a superuser and a normal user get different plans for the same SQL, and the bad plan of the business user is hard to spot and diagnose. Each partition has its own statistics, and a partition's privileges can differ from the parent's (even for partitions created with partition of), so a user may be able to read a partition's data through the parent while being unable to see the partition's statistics. That privilege gap is what produces the plan difference.
This contradicts the usual assumption that "privileges only control whether you can access a table, not how you access it", so it is worth keeping in mind.
To make the partition statistics visible, the simple fix is to explicitly grant the user SELECT on all partitions, for example:
grant select on table_partition_allname to username;
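One hedged way to script that grant is to enumerate the children from pg_inherits and grant on each; the table and role names follow the earlier examples, and a multi-level partitioned table would need recursion:

```sql
DO $$
DECLARE
    child regclass;
BEGIN
    FOR child IN
        SELECT inhrelid::regclass
        FROM pg_inherits
        WHERE inhparent = 'public.lzlpartition1'::regclass
    LOOP
        -- grant on each direct child partition
        EXECUTE format('GRANT SELECT ON %s TO userlzl', child);
    END LOOP;
END $$;
```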
attach/detach turns an existing table into a partition of a partitioned table, or splits a partition back out into a standalone table; both are very useful in maintenance work.
First, let's look at the locking behaviour of adding a partition with "create table … partition of" and removing one with "drop table".
Lock conflict matrix: https://www.postgresql.org/docs/current/explicit-locking.html
Locks acquired by each statement: https://postgres-locks.husseinnasser.com
--session1 开启一个事务,只读数据 => select * from lzlpartition1 where date_created='2023-01-01 00:00:00'; id | name | date_created ------+----------------------------------+--------------------- 8249 | 256ac66bb53d31bc6124294238d6410c | 2023-01-01 00:00:00 --session3 查看锁情况。读取一个分区数据时,要在子分区和主表上同时获得锁 => select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+----------------------+------------+---------------+--------+-----------------+--------- relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 311449 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 311449 | AccessShareLock | t --session2 partition of方式添加分区 => create table LZLPARTITION1_202305 partition of LZLPARTITION1 for values from ('2023-05-01 00:00:00') to ('2023-06-01 00:00:00'); --等待 --session3 再次查看锁 => select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+----------------------+------------+---------------+--------+---------------------+--------- relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 311449 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 311449 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 308525 | AccessExclusiveLock | f --此为partition of会话 --session4 再随便来一个查询 => select * from lzlpartition1 where date_created='2023-01-01 00:00:00'; --等待 --session4再次查看锁 => select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+----------------------+------------+---------------+--------+---------------------+--------- relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 311449 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 311449 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 308525 | AccessExclusiveLock | f relation | dbmgr | lzlpartition1 | [null] | [null] | 84774 | AccessShareLock | f --查询阻塞
Adding a partition with partition of acquires AccessExclusiveLock on the parent table: it waits for every transaction on the parent and, while waiting, blocks every new one.
The partition of statement itself runs quickly, but if there happens to be a long transaction on the parent, every operation on the partitioned table will stall behind it for a long time. Without a maintenance window, adding partitions directly with partition of is not recommended.
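If partition of has to be run on a live system anyway, one common mitigation (an assumption on my part, not something tested above) is to cap how long the DDL may wait for its lock, so it fails fast instead of queueing behind a long transaction and blocking everything after it:

```sql
SET lock_timeout = '3s';   -- give up quickly instead of blocking the whole table
CREATE TABLE lzlpartition1_202306 PARTITION OF lzlpartition1
    FOR VALUES FROM ('2023-06-01 00:00:00') TO ('2023-07-01 00:00:00');
RESET lock_timeout;
```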
--session1 再次开启只读事务 --session2 删除分区表的子分区 => drop table lzlpartition1_202305; --等待 --session3 查看锁情况 => select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+----------------------+------------+---------------+--------+---------------------+--------- relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 311449 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 311449 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 308525 | AccessExclusiveLock | f
Dropping a partition with drop table likewise acquires AccessExclusiveLock on the parent, waiting for everything and blocking everything, so it also needs care in production.
attach adds an existing ordinary table to the partitioned table as a partition.
Although both attach and partition of can add partitions, note that ATTACH does not automatically create indexes, constraints, default values, or row-level triggers, which is different from partition of.
First create a table.
--use LIKE to avoid spelling out the DDL CREATE TABLE lzlpartition1_202305 (LIKE lzlpartition1 INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
Then watch whether the attach gets blocked.
--session1 开启读写事务 =>begin; BEGIN => insert into lzlpartition1 values('1234','abcd','2023-01-01 01:00:00'); INSERT 0 1 --session3 查看锁情况 => select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+----------------------+------------+---------------+--------+------------------+--------- relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 311449 | RowExclusiveLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 311449 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 311449 | RowExclusiveLock | t --dml语句会获得分区主表和对应分区子表的RowExclusiveLock --session2 attach新建的表到分区主表上 => alter table lzlpartition1 attach partition lzlpartition1_202305 for values from ('2023-05-01 00:00:00') to ('2023-06-01 00:00:00'); ALTER TABLE
attach only asks for SHARE UPDATE EXCLUSIVE, which is far weaker than ACCESS EXCLUSIVE.
attach neither blocks reads and writes nor is blocked by them, so attach is the recommended way to add partitions: it does not disturb the workload and can run online.
detach splits a partition off the partitioned table, turning it back into an ordinary table.
--session1 保持dml事务不提交 --session2 detach一个分区 alter table lzlpartition1 detach partition lzlpartition1_202305; --等待 --session3 查看锁情况 => select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+----------------------+------------+---------------+--------+---------------------+--------- relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 311449 | RowExclusiveLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 311449 | RowExclusiveLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 308525 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 308525 | AccessExclusiveLock | f
Unlike attach, detach acquires AccessExclusiveLock on the parent, waiting for everything and blocking everything.
detach concurrently
Starting with PostgreSQL 14, DETACH PARTITION supports two new options, CONCURRENTLY and FINALIZE.
ALTER TABLE [ IF EXISTS ] name
DETACH PARTITION partition_name [ CONCURRENTLY | FINALIZE ]
detach concurrently internally runs two transactions. The first acquires SHARE UPDATE EXCLUSIVE on both the parent and the partition and marks the partition as being detached, then waits for all transactions on the partitioned table to commit. Once they all have, the second transaction acquires SHARE UPDATE EXCLUSIVE on the parent and ACCESS EXCLUSIVE on that partition, and the detach completes.
Also, a partition detached with concurrently keeps its constraint: the partition constraint is turned into a CHECK constraint on the detached table.
Restrictions of DETACH CONCURRENTLY: it cannot be run inside a transaction block, and it cannot be used if the partitioned table has a default partition.
Blocking behaviour of concurrently:
--session1 lzldb=> begin; BEGIN lzldb=*> insert into lzlpartition1 values('1234','abcd','2023-01-01 01:00:00'); INSERT 0 1 --session2 detach concurrently lzldb=> alter table lzlpartition1 detach partition lzlpartition1_202301 concurrently; --等待 --session3 查看锁 3691 | insert into lzlpartition1 values('1234','abcd','2023-01-01 01:00:00'); | Client | ClientRead 3940 | alter table lzlpartition1 detach partition lzlpartition1_202301 concurrently; | Lock | virtualxid 3947 | select pid,query,wait_event_type,wait_event from pg_stat_activity; | | --detach会话是3940,非常奇怪,detach的等待事件是virtualxid,等待事件类型是Lock --查看锁的情况 lzldb=> select locktype,database,relation,virtualtransaction,pid,mode,granted from pg_locks where pid in (3691,3940); locktype | database | relation | virtualtransaction | pid | mode | granted ---------------+----------+----------+--------------------+------+------------------+--------- virtualxid | | | 6/9 | 3940 | ExclusiveLock | t relation | 16387 | 40969 | 5/179 | 3691 | RowExclusiveLock | t relation | 16387 | 40963 | 5/179 | 3691 | RowExclusiveLock | t virtualxid | | | 5/179 | 3691 | ExclusiveLock | t virtualxid | | | 6/9 | 3940 | ShareLock | f transactionid | | | 5/179 | 3691 | ExclusiveLock | t --此时的detach还没有等待表上的锁,而是在等待virtualxid的ShareLock --session4 做个插入 lzldb=> insert into lzlpartition1 values('12345','abcd','2023-01-01 01:00:00'); ERROR: no partition of relation "lzlpartition1" found for row DETAIL: Partition key of the failing row contains (date_created) = (2023-01-01 01:00:00). lzldb=> insert into lzlpartition1 values('12345','abcd','2023-02-01 01:00:00'); INSERT 0 1 --此时detach的分区已经不能插入,其他分区可以插入 --如果通过分区插入会怎样呢?可以正常插入 lzldb=> insert into lzlpartition1_202301 values('12345','abcd','2023-01-01 01:00:00'); INSERT 0 1 --注意此时它还是个分区表的分区,还不是一个普通表,不过它已经被标记为不可用了 --\d+查看分区处于DETACH PENDING状态 Partitions: lzlpartition1_202301 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-02-01 00:00:00') (DETACH PENDING), lzlpartition1_202302 FOR VALUES FROM ('2023-02-01 00:00:00') TO ('2023-03-01 00:00:00') --把insert的session1提交/回滚 lzldb=> rollback; ROLLBACK --session2立即完成 lzldb=> alter table lzlpartition1 detach partition lzlpartition1_202301 concurrently; ALTER TABLE
FINALIZE:
--session1 lzldb=> begin; BEGIN lzldb=*> insert into lzlpartition1 values('1234','abcd','2023-01-01 01:00:00'); INSERT 0 1 --session2 detach concurrently,手动cancel lzldb=> alter table lzlpartition1 detach partition lzlpartition1_202301 concurrently; ^CCancel request sent ERROR: canceling statement due to user request --\d+查看分区表,分区处于DETACH PENDING状态 Partitions: lzlpartition1_202301 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-02-01 00:00:00') (DETACH PENDING), lzlpartition1_202302 FOR VALUES FROM ('2023-02-01 00:00:00') TO ('2023-03-01 00:00:00') --DETACH PENDING的分区,SQL已经不会去访问了 lzldb=> explain select * from lzlpartition1; QUERY PLAN ----------------------------------------------------------------------------------------- Seq Scan on lzlpartition1_202302 lzlpartition1 (cost=0.00..752.81 rows=38881 width=45) --finalize使之完成 lzldb=> alter table lzlpartition1 detach partition lzlpartition1_202301 finalize; --等待 --查看锁情况 lzldb=> select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%';lzldb-# locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+----------------------+------------+---------------+------+--------------------------+--------- relation | lzldb | lzlpartition1 | | | 3691 | RowExclusiveLock | t relation | lzldb | lzlpartition1_202301 | | | 3940 | AccessExclusiveLock | f relation | lzldb | lzlpartition1 | | | 3940 | ShareUpdateExclusiveLock | t relation | lzldb | lzlpartition1_202301 | | | 3691 | RowExclusiveLock | t --3940,finalize申请了分区主表的ShareUpdateExclusiveLock和子表的AccessExclusiveLock --由于是插入的是数据的分区刚好是detach的分区,所以发生等待 --session1结束 lzldb=!> rollback; ROLLBACK --session2立即完成 lzldb=> alter table lzlpartition1 detach partition lzlpartition1_202301 finalize; ALTER TABLE
Although the final step of a detach takes the level-8 lock on the partition being detached, applications rarely write to a child partition directly, so in practice you mainly need to make sure long transactions on the partitioned table finish promptly; follow-up blocking on that child is normally not a concern.
Summary of online detach: detach concurrently only takes SHARE UPDATE EXCLUSIVE on the parent, so it waits for existing transactions instead of blocking new ones; while it waits, the partition is marked DETACH PENDING and stops receiving routed rows; if the command is cancelled, the partition stays DETACH PENDING and can be completed later with DETACH ... FINALIZE.
=> SELECT tableoid::regclass AS partition, count(*) FROM lzlpartition1 group by partition; partition | count ----------------------+--------- lzlpartition1_202301 | 2592001 lzlpartition1_202302 | 38881
Note that the 202301 partition has a partition constraint.
=> \d+ lzlpartition1_202301 Table "public.lzlpartition1_202301" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------------+-----------------------------+-----------+----------+---------+----------+--------------+------------- id | integer | | not null | | plain | | name | character varying(50) | | | | extended | | date_created | timestamp without time zone | | not null | now() | plain | | Partition of: lzlpartition1 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-02-01 00:00:00') Partition constraint: ((date_created IS NOT NULL) AND (date_created >= '2023-01-01 00:00:00'::timestamp without time zone) AND (date_created < '2023-02-01 00:00:00'::timestamp without t Indexes: "lzlpartition1_202301_pkey" PRIMARY KEY, btree (id, date_created) Access method: heap
alter table lzlpartition1 detach partition lzlpartition1_202301;
--after the detach, the partition constraint is gone => \d+ lzlpartition1_202301 Table "public.lzlpartition1_202301" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------------+-----------------------------+-----------+----------+---------+----------+--------------+------------- id | integer | | not null | | plain | | name | character varying(50) | | | | extended | | date_created | timestamp without time zone | | not null | now() | plain | | Indexes: "lzlpartition1_202301_pkey" PRIMARY KEY, btree (id, date_created) Access method: heap
=> alter table lzlpartition1 attach partition lzlpartition1_202301 for values from ('2023-01-01 00:00:00') to ('2023-02-01 00:00:00'); ALTER TABLE Time: 343.498 ms
Because attach has to scan the partition's data to verify that it fits the partition bounds, the attach here took 300+ ms.
=> alter table lzlpartition1 detach partition lzlpartition1_202301; => alter table lzlpartition1_202301 add constraint chk_202301 CHECK -> ((date_created IS NOT NULL) AND (date_created >= '2023-01-01 00:00:00'::timestamp without time zone) AND (date_created < '2023-02-01 00:00:00'::timestamp without time zone)); ALTER TABLE Time: 355.458 ms
Adding the CHECK constraint above takes about as long as the attach did without one, because adding the constraint also has to scan and validate all the rows.
With the CHECK constraint already in place, the subsequent attach finishes almost instantly.
=> alter table lzlpartition1 attach partition lzlpartition1_202301 for values from ('2023-01-01 00:00:00') to ('2023-02-01 00:00:00'); ALTER TABLE Time: 1.480 ms
=> \d+ lzlpartition1_202301; Table "public.lzlpartition1_202301" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------------+-----------------------------+-----------+----------+---------+----------+--------------+------------- id | integer | | not null | | plain | | name | character varying(50) | | | | extended | | date_created | timestamp without time zone | | not null | now() | plain | | Partition of: lzlpartition1 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-02-01 00:00:00') Partition constraint: ((date_created IS NOT NULL) AND (date_created >= '2023-01-01 00:00:00'::timestamp without time zone) AND (date_created < '2023-02-01 00:00:00'::timestamp without t Indexes: "lzlpartition1_202301_pkey" PRIMARY KEY, btree (id, date_created) Check constraints: "chk_202301" CHECK (date_created IS NOT NULL AND date_created >= '2023-01-01 00:00:00'::timestamp without time zone AND date_created < '2023-02-01 00:00:00'::timestamp without time Access method: heap
Note that a CHECK constraint and a partition constraint are different things, even when their contents are identical. attach uses the CHECK constraint to skip the validation scan but does not merge it into the partition constraint, so the now-redundant CHECK can be dropped explicitly.
=> alter table lzlpartition1_202301 drop constraint chk_202301; ALTER TABLE
Also note that drop constraint acquires AccessExclusiveLock on that partition, the highest lock level, which blocks everything; be careful running it while the partition has active transactions.
=> select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+----------------------+------------+---------------+--------+---------------------+--------- relation | dbmgr | lzlpartition1 | [null] | [null] | 448243 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 448243 | RowExclusiveLock | t relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 444399 | AccessShareLock | t relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 444399 | AccessExclusiveLock | f --this is the drop constraint session relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 448243 | RowExclusiveLock | t
So: adding the CHECK constraint before the attach is worthwhile; it moves the data validation ahead of time and keeps the attach itself short.
We also now know that attach can run online, whereas partition of / drop table / detach all acquire the wait-for-everything, block-everything AccessExclusiveLock.
So: prefer attach for adding partitions. partition of / detach wait for and block all transactions, while attach is not blocked by read-only or DML transactions.
In short, add partitions with attach, create the CHECK constraint beforehand, and watch for long transactions when dropping the redundant constraint afterwards.
The right way to add a partition to a partitioned table:
--use LIKE to avoid spelling out the DDL CREATE TABLE lzlpartition1_202303 (LIKE lzlpartition1 INCLUDING DEFAULTS INCLUDING CONSTRAINTS); --copy the partition constraint of an existing partition into a CHECK constraint, so the attach does not have to validate the data itself alter table lzlpartition1_202303 add constraint chk_202303 CHECK ((date_created IS NOT NULL) AND (date_created >= '2023-03-01 00:00:00'::timestamp without time zone) AND (date_created < '2023-04-01 00:00:00'::timestamp without time zone)); --add the partition with attach alter table LZLPARTITION1 attach partition LZLPARTITION1_202303 for values from ('2023-03-01 00:00:00') to ('2023-04-01 00:00:00'); --optional: drop the redundant CHECK constraint before the new partition sees any transactions alter table lzlpartition1_202303 drop constraint chk_202303;
When the partitions hold AccessShareLock, i.e. there are read-only transactions on the partitioned table:
CREATE INDEX on lzlpartition1 succeeds (note: without concurrently), while DROP INDEX on it blocks.
--session1: open a transaction and read from the partitioned table => begin; BEGIN => select count(*) from lzlpartition1 where date_created>='2023-01-01 00:00:00' and date_created<='2023-01-02 00:00:00'; count ------- 86401 (1 row) --session2: create the index, succeeds => create index idx_datecreated on lzlpartition1(date_created); CREATE INDEX
--session2 删除索引,等待 => drop index idx_datecreated; --session3 查看锁 => select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+---------------------------+------------+---------------+--------+---------------------+--------- relation | dbmgr | lzlpartition1_202301_pkey | [null] | [null] | 300371 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 99598 | AccessExclusiveLock | f relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 300371 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 300371 | AccessShareLock | t
create index did not ask for AccessExclusiveLock on the table, but drop index does.
From this example we can conclude:
read-only transactions do not block creating an index, but they do block dropping one.
--session1 开启更新事务 => begin; BEGIN => update lzlpartition1 set name='abc' where date_created='2023-01-01 10:00:00'; UPDATE 1 --session2 创建分区索引,等待 => create index idx_datecreated on lzlpartition1(date_created); --session3 查看锁情况 =>select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid -> where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+---------------------------+------------+---------------+--------+------------------+--------- relation | dbmgr | lzlpartition1_202301_pkey | [null] | [null] | 300371 | RowExclusiveLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 99598 | ShareLock | f relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 300371 | RowExclusiveLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 300371 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 300371 | RowExclusiveLock | t
The create index session (99598) asks for ShareLock on the parent table, while the DML session (300371) holds RowExclusiveLock on the parent and on the child partition.
create index (without concurrently) asks for ShareLock on the parent;
a read-only transaction holds AccessShareLock on the parent and the child;
a DML transaction holds RowExclusiveLock on the parent and the child;
==>
AccessShareLock does not conflict with ShareLock, so queries do not block create index (without concurrently);
RowExclusiveLock conflicts with ShareLock, so DML does block create index (without concurrently).
Note: you cannot create an index on a partitioned table with concurrently.
=> create index concurrently idx_datecreated on lzlpartition1(date_created); ERROR: 0A000: cannot create index on partitioned table "lzlpartition1" concurrently LOCATION: DefineIndex, indexcmds.c:665
There is a patch working on this: https://commitfest.postgresql.org/35/2815/
For now, concurrently builds can be done on the individual partitions.
--session1 仍然使用之前的DML事务 --session2 concurrently方式在子表上创建索引,等待 => create index concurrently idx_datecreated_202301 on lzlpartition1_202301(date_created); --session3 查询锁情况 => select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+---------------------------+------------+---------------+--------+--------------------------+--------- relation | dbmgr | lzlpartition1_202301_pkey | [null] | [null] | 300371 | RowExclusiveLock | t relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 99598 | ShareUpdateExclusiveLock | t relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 300371 | RowExclusiveLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 300371 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 300371 | RowExclusiveLock | t
The lock that concurrently asks for is one level lower and no longer conflicts with ROW EXCLUSIVE. If the locks no longer conflict, why is the concurrently build itself still blocked?
it must wait for all existing transactions that could potentially modify or use the index to terminate.
The documentation explains that concurrently has to wait for all existing transactions that could potentially modify or use the index to finish. Our update modified the indexed column, so the concurrent build waits for it.
Although the concurrent build is held up by the earlier DML, there is an upside: concurrently does not block subsequent DML.
--while the concurrently build is still unfinished --session4: update a row => update lzlpartition1 set name='abc' where date_created='2023-01-01 12:00:00'; UPDATE 1
To pull the locking of partitioned-table indexes together: you cannot create an index on the partitioned table itself with concurrently, but you can build the partition indexes concurrently, using three commands:
CREATE INDEX ON ONLY: creates an invalid index on the parent only, without automatically creating indexes on the partitions;
CREATE INDEX CONCURRENTLY: builds the index on each partition concurrently;
ALTER INDEX .. ATTACH PARTITION: attaches each partition index to the parent index; once every partition index is attached, the parent index automatically becomes valid.
You still need to watch the locks while running these commands.
Below we observe the locks and blocking of these statements
(a DML transaction in session1 is kept open the whole time).
=> CREATE INDEX IDX_DATECREATED ON ONLY lzlpartition1(date_created); --waits --check the locks locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+----------------------+------------+---------------+--------+------------------+--------- relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 448243 | RowExclusiveLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 448243 | RowExclusiveLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 444399 | ShareLock | f
Creating the index with ON ONLY asks for ShareLock, and ShareLock and RowExclusiveLock conflict with each other. So although the ON ONLY statement itself is fast, it is not something to run without thinking either.
--once the DML transaction ends, the ON ONLY index build completes "idx_datecreated" btree (date_created) INVALID
CREATE INDEX ON ONLY created an invalid index on the parent and did not create any indexes on the partitions.
--将ONLY索引创建完成后,再开启session1的DML显示事务 => begin; BEGIN => insert into lzlpartition1 values('1111','abc','2023-01-01 00:00:00'); INSERT 0 1 --session2 concurrently创建子分区的索引 => create index concurrently idx_datecreated_202302 on lzlpartition1_202302(date_created); CREATE INDEX --202302分区索引创建完成 => create index concurrently idx_datecreated_202304 on lzlpartition1_202304(date_created); CREATE INDEX --202302分区索引创建完成 => create index concurrently idx_datecreated_202301 on lzlpartition1_202301(date_created); ----创建202302分区索引等待
concurrently waits for transactions that might use the index. Our open transaction inserted only into the 202301 partition, and indeed only that partition's concurrent index build is left waiting.
--完成session1的DML显示事务,等待索引完成后,然后再次开启事务 => commit; COMMIT => begin; BEGIN => insert into lzlpartition1 values('1111','abc','2023-01-01 00:00:01'); INSERT 0 1 --session2 attach索引 => ALTER INDEX idx_datecreated ATTACH PARTITION idx_datecreated_202302; ALTER INDEX --成功ATTACH => \d+ idx_datecreated Partitioned index "public.idx_datecreated" Column | Type | Key? | Definition | Storage | Stats target --------------+-----------------------------+------+--------------+---------+-------------- date_created | timestamp without time zone | yes | date_created | plain | btree, for table "public.lzlpartition1", invalid Partitions: idx_datecreated_202302 --202302子分区索引已经attach,索引仍为invalid Access method: btree --将剩余的子分区索引全部attach => ALTER INDEX idx_datecreated ATTACH PARTITION idx_datecreated_202301; ALTER INDEX --成功ATTACH => ALTER INDEX idx_datecreated ATTACH PARTITION idx_datecreated_202304; ALTER INDEX --成功ATTACH --完成所有子分区索引attach后,主表索引自动有效 => \d+ idx_datecreated Partitioned index "public.idx_datecreated" Column | Type | Key? | Definition | Storage | Stats target --------------+-----------------------------+------+--------------+---------+-------------- date_created | timestamp without time zone | yes | date_created | plain | btree, for table "public.lzlpartition1" Partitions: idx_datecreated_202301, idx_datecreated_202302, idx_datecreated_202304 Access method: btree
The index attach is not blocked by DML and completes immediately. From this point on, a new partition created with partition of also gets its partition index automatically.
To sum up:
creating the index directly on the partitioned table blocks DML, so it is not advisable;
the right way to build an index on a partitioned table:
--ON ONLY: create the invalid index on the parent. Fast, but it blocks later DML while it waits, so it affects the workload and you must watch for long transactions CREATE INDEX IDX_DATECREATED ON ONLY lzlpartition1(date_created); --CONCURRENTLY: build the index on each partition. Slow, but it does not block later DML and does not affect the workload; long DML transactions can still delay it or make it fail create index concurrently idx_datecreated_202302 on lzlpartition1_202302(date_created); --ATTACH every partition index. Fast, does not block the workload ALTER INDEX idx_datecreated ATTACH PARTITION idx_datecreated_202302;
Functionally, a primary key is equivalent to a unique index plus NOT NULL constraints (except that a table can have only one primary key). So a unique index on a partitioned table can follow the index recipe above: ON ONLY on the parent, concurrently on each partition, then attach.
A primary key can be added with USING INDEX on an ordinary table, but that is currently not supported on a partitioned table:
=> ALTER TABLE lzlpartition1 ADD CONSTRAINT pk_id_date_created PRIMARY KEY USING INDEX idx_uniq; ERROR: 0A000: ALTER TABLE / ADD CONSTRAINT USING INDEX is not supported on partitioned tables LOCATION: ATExecAddIndexConstraint, tablecmds.c:8032
In other words, you can get the "NOT NULL constraint + attached unique index" combination in place ahead of time, but the final ADD CONSTRAINT ... PRIMARY KEY USING INDEX step is not possible on a partitioned table.
Let's look at the blocking caused by adding/dropping a primary key directly.
--session 1 => begin; BEGIN Time: 0.318 ms => select * from lzlpartition1 where date_created='2023-01-01 22:00:00'; id | name | date_created ------+----------------------------------+--------------------- 7715 | beee680a86e1d12790489e9ab4a4351b | 2023-01-01 22:00:00 --session2 删除主键等待 => alter table lzlpartition1 drop constraint lzlpartition1_pkey; --session3 观察 => select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+---------------------------+------------+---------------+-------+---------------------+--------- relation | dbmgr | lzlpartition1_202301_pkey | [null] | [null] | 21659 | AccessShareLock | t relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 21659 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 95016 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 95016 | AccessExclusiveLock | f relation | dbmgr | lzlpartition1 | [null] | [null] | 21659 | AccessShareLock | t
Dropping the primary key asks for AccessExclusiveLock and blocks everything.
--session1事务结束,session2的删除主键完成 --session1再次开启只读事务 --session2在分区表上添加主键,等待 => ALTER TABLE lzlpartition1 ADD PRIMARY KEY(id, date_created); --session3 观察锁 => select l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.mode,l.granted from pg_locks l left join pg_database d on l.database=d.oid left join pg_class r on l.relation=r.oid where relname like '%lzlpartition1%'; locktype | datname | relname | virtualxid | transactionid | pid | mode | granted ----------+---------+----------------------+------------+---------------+-------+---------------------+--------- relation | dbmgr | lzlpartition1_202301 | [null] | [null] | 21659 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 95016 | AccessShareLock | t relation | dbmgr | lzlpartition1 | [null] | [null] | 95016 | AccessExclusiveLock | f --添加主键的会话 relation | dbmgr | lzlpartition1 | [null] | [null] | 21659 | AccessShareLock | t
Adding a primary key also asks for AccessExclusiveLock on the parent and blocks everything.
Building the primary key index on a partitioned table is slow and the lock blocks everything queued behind it, so at the moment there is no low-impact way to add a primary key to a partitioned table. If that is a problem, consider the "attached unique index + NOT NULL constraint" substitute (a sketch follows below), or arrange a long enough maintenance window for the index build, or use an external replication/DTS tool to copy the data into a new partitioned table that already has the primary key.
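A sketch of that "attached unique index + NOT NULL" substitute; the columns are already NOT NULL here, and the index is built the same way as any partitioned index (index names are illustrative):

```sql
CREATE UNIQUE INDEX idx_uniq ON ONLY lzlpartition1 (id, date_created);   -- invalid parent index
CREATE UNIQUE INDEX CONCURRENTLY idx_uniq_202301
    ON lzlpartition1_202301 (id, date_created);                          -- one per partition
ALTER INDEX idx_uniq ATTACH PARTITION idx_uniq_202301;
-- repeat for the remaining partitions; once all are attached, the parent index becomes valid
```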
If the new number of hash partitions is an integer multiple of the old one, we know exactly which old partition each new partition's rows come from: if hash(key) mod 3 = r, then hash(key) mod 6 can only be r or r+3. So when a 3-partition hash table becomes a 6-partition one, the origin of every row is known.
In practice this property rarely helps, because the new hash partitions end up being repopulated by brute-force re-insertion anyway; operationally, going from 3 to 4 partitions is no different from going from 3 to 6.
There are plenty of mature data-sync tools these days; for example, using a DTS-style tool to copy the table into a new partitioned table and then swapping the tables keeps downtime very short, and that should be the preferred approach in production.
What follows is mainly a hands-on test of manually multiplying the partition count of a hash-partitioned table:
SELECT tableoid::regclass,count(*) FROM orders group by tableoid::regclass; tableoid | count -----------+------- orders_p1 | 3377 orders_p3 | 3354 orders_p2 | 3369
2. Detach the existing partitions (the goal is to turn the 3-partition native hash table into a 6-partition one).
ALTER TABLE orders DETACH PARTITION orders_p1; ALTER TABLE orders DETACH PARTITION orders_p2; ALTER TABLE orders DETACH PARTITION orders_p3;
ALTER TABLE orders_p1 RENAME TO bak_orders_p1; ALTER TABLE orders_p2 RENAME TO bak_orders_p2; ALTER TABLE orders_p3 RENAME TO bak_orders_p3;
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 6, REMAINDER 0); CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 6, REMAINDER 1); CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 6, REMAINDER 2); CREATE TABLE orders_p4 PARTITION OF orders FOR VALUES WITH (MODULUS 6, REMAINDER 3); CREATE TABLE orders_p5 PARTITION OF orders FOR VALUES WITH (MODULUS 6, REMAINDER 4); CREATE TABLE orders_p6 PARTITION OF orders FOR VALUES WITH (MODULUS 6, REMAINDER 5);
Note the function used in the partition constraint (satisfies_hash_partition).
\d+ orders_p1 Table "public.orders_p1" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ----------+-----------------------+-----------+----------+---------+----------+--------------+------------- order_id | integer | | | | plain | | name | character varying(10) | | | | extended | | Partition of: orders FOR VALUES WITH (modulus 6, remainder 0) Partition constraint: satisfies_hash_partition('412053'::oid, 6, 0, order_id) Access method: heap
5. Work out which new partition each old partition's rows have to go to.
For example, the rows of the old modulus-3, remainder-0 partition have to be split between the new modulus-6 partitions with remainder 0 and remainder 3.
select count(*) from bak_orders_p1 where satisfies_hash_partition('412053'::oid, 6, 0, order_id)=true; count ------- 1776 select count(*) from bak_orders_p1 where satisfies_hash_partition('412053'::oid, 6, 3, order_id)=true; count ------- 1601 select count(*) from bak_orders_p1; count ------- 3377
6. Insert the data through the partition children.
The rows can be inserted directly into the matching child partitions instead of going through the parent.
INSERT INTO orders_p1 SELECT * FROM bak_orders_p1 where satisfies_hash_partition('412053'::oid, 6, 0, order_id)=true; INSERT INTO orders_p2 SELECT * FROM bak_orders_p2 where satisfies_hash_partition('412053'::oid, 6, 1, order_id)=true; INSERT INTO orders_p3 SELECT * FROM bak_orders_p3 where satisfies_hash_partition('412053'::oid, 6, 2, order_id)=true; INSERT INTO orders_p4 SELECT * FROM bak_orders_p1 where satisfies_hash_partition('412053'::oid, 6, 3, order_id)=true; INSERT INTO orders_p5 SELECT * FROM bak_orders_p2 where satisfies_hash_partition('412053'::oid, 6, 4, order_id)=true; INSERT INTO orders_p6 SELECT * FROM bak_orders_p3 where satisfies_hash_partition('412053'::oid, 6, 5, order_id)=true;
SELECT tableoid::regclass,count(*) FROM orders group by tableoid::regclass; tableoid | count -----------+------- orders_p3 | 1665 orders_p5 | 1678 orders_p1 | 1776 orders_p6 | 1689 orders_p4 | 1601 orders_p2 | 1691
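One reassurance about inserting directly into the children in step 6: the child's partition constraint is still enforced, so a mis-routed row is rejected rather than silently stored. For example, with the range-partitioned table from earlier:

```sql
-- A row that does not satisfy the child's partition constraint is rejected:
INSERT INTO lzlpartition1_202301 VALUES (1, 'x', '2023-05-01 00:00:00');
-- ERROR:  new row for relation "lzlpartition1_202301" violates partition constraint
```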
Changing a column type raises three concerns: table rewrite, index rebuild, and loss of statistics.
For column type changes in general, see the PostgreSQL学徒 write-up on the topic.
Here we mainly test enlarging a column on a partitioned table; if the column has an index, this can block transactions on the partitioned table for a long time.
Ordinary table, enlarging an indexed column:
--新建普通表和索引 => create table t111(id int,name varchar(50)); CREATE TABLE => insert into t111 values(1001,'abc'); INSERT 0 1 => create index idx111 on t111(name); CREATE INDEX --索引文件relfilenode为417728 select pg_relation_filepath('idx111'); pg_relation_filepath ---------------------- base/16398/417728 (1 row) --将字段改长 => alter table t111 alter column name type varchar(60); ALTER TABLE --索引文件relfilenode为417728,未发生变化,普通表索引未重建 => select pg_relation_filepath('idx111'); pg_relation_filepath ---------------------- base/16398/417728
Partitioned table, enlarging an indexed column:
--在分区表上创建一个索引 => create index idx_name on lzlpartition1(name); CREATE INDEX --查看其中一个分区上的索引 => \d+ lzlpartition1_202301 Table "dbmgr.lzlpartition1_202301" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------------+-----------------------------+-----------+----------+---------+----------+--------------+------------- id | integer | | | | plain | | name | character varying(50) | | | | extended | | date_created | timestamp without time zone | | not null | now() | plain | | Partition of: lzlpartition1 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-02-01 00:00:00') Partition constraint: ((date_created IS NOT NULL) AND (date_created >= '2023-01-01 00:00:00'::timestamp without time zone) AND (date_created < '2023-02-01 00:00:00'::timestamp without time zone)) Indexes: "lzlpartition1_202301_name_idx" btree (name) Access method: heap => select pg_relation_filepath('lzlpartition1_202301_name_idx') idx,pg_relation_filepath('lzlpartition1_202301') tbl; idx | tbl -------------------+------------------- base/16398/417810 | base/16398/417800 (1 row) --将索引字段改大,分区表索引重建 => alter table lzlpartition1 alter column name type varchar(60); ALTER TABLE => select pg_relation_filepath('lzlpartition1_202301_name_idx') idx,pg_relation_filepath('lzlpartition1_202301') tbl; idx | tbl -------------------+------------------- base/16398/417814 | base/16398/417800 --将索引字段改小,分区表重写 => alter table lzlpartition1 alter column name type varchar(40); ALTER TABLE Time: 609.585 ms => select pg_relation_filepath('lzlpartition1_202301_name_idx') idx,pg_relation_filepath('lzlpartition1_202301') tbl; idx | tbl -------------------+------------------- base/16398/417828 | base/16398/417825 --将索引字段保持原样,分区表索引重建 => alter table lzlpartition1 alter column name type varchar(40); ALTER TABLE => select pg_relation_filepath('lzlpartition1_202301_name_idx') idx,pg_relation_filepath('lzlpartition1_202301') tbl; idx | tbl -------------------+------------------- base/16398/417834 | base/16398/417825
On an ordinary table, enlarging a varchar column only costs you the column statistics (int to bigint is the exception, which rewrites the table). On a partitioned table, however, enlarging a column that has an index not only drops statistics but also rebuilds the index, and because ALTER TABLE ... ALTER COLUMN takes the level-8 lock, everything is blocked for the duration of the rebuild.
Recommendation: drop the index first, alter the column, then rebuild the index with the "parent ON ONLY + child CONCURRENTLY + ATTACH" recipe.
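A sketch of that sequence, reusing the idx_name index from the test above (the partition name is illustrative, and every partition needs its own CONCURRENTLY build and ATTACH):

```sql
DROP INDEX idx_name;                                              -- brief AccessExclusiveLock
ALTER TABLE lzlpartition1 ALTER COLUMN name TYPE varchar(100);    -- no index left to rebuild
CREATE INDEX idx_name ON ONLY lzlpartition1 (name);               -- invalid parent index
CREATE INDEX CONCURRENTLY idx_name_202301 ON lzlpartition1_202301 (name);
ALTER INDEX idx_name ATTACH PARTITION idx_name_202301;            -- repeat per partition
```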
Partition pruning improves performance for declarative partitioning and is one of the most important optimizations for partitioned tables. Without pruning, a query scans every partition; with it, the optimizer uses the WHERE clause to skip partitions that cannot contain matching rows.
Pruning relies on the partition constraint (visible with \d+), so the query must filter on the partition key for pruning to happen. This constraint is not an ordinary constraint: it is created automatically when the partition is created.
Pruning is controlled by enable_partition_pruning, which defaults to on.
--没有分区裁剪时,会访问所有分区 => set enable_partition_pruning=off; SET => explain select count(*) from lzlpartition1 where date_created='2023-01-01'; QUERY PLAN -------------------------------------------------------------------------------------------------- Aggregate (cost=1872.08..1872.09 rows=1 width=8) -> Append (cost=0.00..1872.07 rows=4 width=0) -> Seq Scan on lzlpartition1_202301 lzlpartition1_1 (cost=0.00..992.30 rows=1 width=0) Filter: (date_created = '2023-01-01 00:00:00'::timestamp without time zone) -> Seq Scan on lzlpartition1_202302 lzlpartition1_2 (cost=0.00..864.12 rows=1 width=0) Filter: (date_created = '2023-01-01 00:00:00'::timestamp without time zone) -> Seq Scan on lzlpartition1_202304 lzlpartition1_3 (cost=0.00..15.62 rows=2 width=0) Filter: (date_created = '2023-01-01 00:00:00'::timestamp without time zone) --有分区裁剪时,不需要访问的分区被排除 => set enable_partition_pruning=on; SET => explain select count(*) from lzlpartition1 where date_created='2023-01-01'; QUERY PLAN ------------------------------------------------------------------------------------------ Aggregate (cost=992.30..992.31 rows=1 width=8) -> Seq Scan on lzlpartition1_202301 lzlpartition1 (cost=0.00..992.30 rows=1 width=0) Filter: (date_created = '2023-01-01 00:00:00'::timestamp without time zone) (3 rows)
(关于explain中的Subplans Removed字样:计划阶段被裁剪掉的分区根本不会出现在explain输出中,所以上面的explain例子看不到Subplans Removed;Subplans Removed出现在执行器启动阶段才能裁剪的场景,而执行过程中才裁剪掉的分区则显示never executed,见下文示例)
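一个可以观察到Subplans Removed的示意写法:now()是stable函数,其值要到执行器启动时才能确定,因此计划阶段无法裁剪,裁剪发生在执行器启动阶段(具体输出以实际环境为准):
--分区键与stable函数比较,计划阶段无法裁剪,执行器启动时裁剪
explain analyze select count(*) from lzlpartition1 where date_created >= now() - interval '1 month';
--预期输出的Append节点下会出现 Subplans Removed: N 字样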
分区裁剪可能发生在两个阶段:生成执行计划时、真正执行时
为什么会发生这样的情况呢?因为有时候只有执行时才知道哪些分区可以裁剪。有两种情况:
Parameterized Nested Loop Joins: The parameter from the outer side of the join can be used to determine the minimum set of inner side partitions to scan.
Initplans: Once an initplan has been executed we can then determine which partitions match the value from the initplan.
模拟执行时发生裁剪:子查询的值来自其他表,优化器在生成执行计划时并不知道这个值是什么,因此无法在计划阶段据此进行分区裁剪,只能在执行时裁剪:
--创建一个其他表 => create table x(date_created timestamp); CREATE TABLE => insert into x values('2023-01-01 09:00:00'); INSERT 0 1 --仅生成执行计划,不执行,没有发生裁剪 => explain select count(*) from lzlpartition1 where date_created=(select date_created from x); QUERY PLAN -------------------------------------------------------------------------------------------------- Aggregate (cost=1904.68..1904.69 rows=1 width=8) InitPlan 1 (returns $0) -> Seq Scan on x (cost=0.00..32.60 rows=2260 width=8) -> Append (cost=0.00..1872.07 rows=4 width=0) -> Seq Scan on lzlpartition1_202301 lzlpartition1_1 (cost=0.00..992.30 rows=1 width=0) Filter: (date_created = $0) -> Seq Scan on lzlpartition1_202302 lzlpartition1_2 (cost=0.00..864.12 rows=1 width=0) Filter: (date_created = $0) -> Seq Scan on lzlpartition1_202304 lzlpartition1_3 (cost=0.00..15.62 rows=2 width=0) Filter: (date_created = $0) (10 rows) --执行sql,发生裁剪。关键字never executed => explain analyze select count(*) from lzlpartition1 where date_created=(select date_created from x); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1904.68..1904.69 rows=1 width=8) (actual time=5.680..5.682 rows=1 loops=1) InitPlan 1 (returns $0) -> Seq Scan on x (cost=0.00..32.60 rows=2260 width=8) (actual time=0.013..0.014 rows=1 loops=1) -> Append (cost=0.00..1872.07 rows=4 width=0) (actual time=0.029..5.676 rows=2 loops=1) -> Seq Scan on lzlpartition1_202301 lzlpartition1_1 (cost=0.00..992.30 rows=1 width=0) (actual time=0.008..5.652 rows=2 loops=1) Filter: (date_created = $0) Rows Removed by Filter: 45382 -> Seq Scan on lzlpartition1_202302 lzlpartition1_2 (cost=0.00..864.12 rows=1 width=0) (never executed) Filter: (date_created = $0) -> Seq Scan on lzlpartition1_202304 lzlpartition1_3 (cost=0.00..15.62 rows=2 width=0) (never executed) Filter: (date_created = $0) Planning Time: 0.157 ms Execution Time: 5.732 ms (13 rows)
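上面演示的是initplan触发的执行期裁剪;参数化嵌套循环的情况可以用类似下面的SQL观察(示意写法,通过关闭其他连接方式强制走嵌套循环,具体计划以实际环境为准):
--关闭hash join和merge join,让优化器选择参数化嵌套循环
set enable_hashjoin = off;
set enable_mergejoin = off;
--外表x的每一行值作为参数传入内表lzlpartition1的分区扫描,执行时才能确定需要访问哪些分区
explain (analyze, costs off) select count(*) from x join lzlpartition1 p on p.date_created = x.date_created;
--计划中未被访问到的分区同样会显示 (never executed)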
partition wise join可以减少分区连接的代价。
假设有两个分区表t1、t2,它们都有3个分区(p1、p2、p3)且分区定义一致,t1的每个分区有10条数据,t2的每个分区有20条数据:
分区 | t1 | t2 |
---|---|---|
p1 | 10 rows | 20 rows |
p2 | 10 rows | 20 rows |
p3 | 10 rows | 20 rows |
此时t1和t2进行连接,如果没有partition wise join,需要把两表的全部数据做连接比较,比较次数为:
(10+10+10)*(20+20+20)=1800次
如果开启partition wise join,则对应分区各自连接:
t1.p1<=>t2.p1,
t1.p2<=>t2.p2,
t1.p3<=>t2.p3,
此时的连接比较次数为:
(10*20)*3=600次
在分区特别多的情况下,partition wise join的代价会小很多。
参数enable_partitionwise_join:是否开启partition wise join,默认关闭
partition wise join的前提条件非常苛刻:连接条件必须是分区键上的等值连接,且两个表的分区方式、分区边界需要完全一致(分区能够一一对应)。
条件虽然苛刻,但两个不同用途的表能满足partition wise join条件的情况确实比较少,比较常见的是两个表都按相同的时间range分区。还有一种情况:分区表自我连接,天然满足partition wise join的前提:
--未开启partition wise join的情况 => explain select p1.*,p2.name from lzlpartition1 p1,lzlpartition1 p2 where p1.date_created=p2.date_created and p2.name='256ac66bb53d31bc6124294238d6410c'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Hash Join (cost=546.64..9256.34 rows=182252 width=288) Hash Cond: (p1.date_created = p2.date_created) -> Append (cost=0.00..2085.46 rows=85364 width=150) -> Seq Scan on lzlpartition1_202301 p1_1 (cost=0.00..878.84 rows=45384 width=150) -> Seq Scan on lzlpartition1_202302 p1_2 (cost=0.00..765.30 rows=39530 width=150) -> Seq Scan on lzlpartition1_202304 p1_3 (cost=0.00..14.50 rows=450 width=150) -> Hash (cost=541.30..541.30 rows=427 width=146) -> Append (cost=7.17..541.30 rows=427 width=146) -> Bitmap Heap Scan on lzlpartition1_202301 p2_1 (cost=7.17..284.30 rows=227 width=146) Recheck Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) -> Bitmap Index Scan on lzlpartition1_202301_name_idx (cost=0.00..7.12 rows=227 width=0) Index Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) -> Bitmap Heap Scan on lzlpartition1_202302 p2_2 (cost=6.95..248.52 rows=198 width=146) Recheck Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) -> Bitmap Index Scan on lzlpartition1_202302_name_idx (cost=0.00..6.90 rows=198 width=0) Index Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) -> Bitmap Heap Scan on lzlpartition1_202304 p2_3 (cost=2.66..6.35 rows=2 width=146) Recheck Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) -> Bitmap Index Scan on lzlpartition1_202304_name_idx (cost=0.00..2.66 rows=2 width=0) Index Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) (20 rows) --开启partition wise join的情况 => set enable_partitionwise_join =on; SET => explain select p1.*,p2.name from lzlpartition1 p1,lzlpartition1 p2 where p1.date_created=p2.date_created and p2.name='256ac66bb53d31bc6124294238d6410c'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Append (cost=287.14..2529.83 rows=438 width=288) -> Hash Join (cost=287.14..1338.49 rows=232 width=288) Hash Cond: (p1_1.date_created = p2_1.date_created) -> Seq Scan on lzlpartition1_202301 p1_1 (cost=0.00..878.84 rows=45384 width=150) -> Hash (cost=284.30..284.30 rows=227 width=146) -> Bitmap Heap Scan on lzlpartition1_202301 p2_1 (cost=7.17..284.30 rows=227 width=146) Recheck Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) -> Bitmap Index Scan on lzlpartition1_202301_name_idx (cost=0.00..7.12 rows=227 width=0) Index Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) -> Hash Join (cost=250.99..1166.55 rows=202 width=288) Hash Cond: (p1_2.date_created = p2_2.date_created) -> Seq Scan on lzlpartition1_202302 p1_2 (cost=0.00..765.30 rows=39530 width=150) -> Hash (cost=248.52..248.52 rows=198 width=146) -> Bitmap Heap Scan on lzlpartition1_202302 p2_2 (cost=6.95..248.52 rows=198 width=146) Recheck Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) -> Bitmap Index Scan on lzlpartition1_202302_name_idx (cost=0.00..6.90 rows=198 width=0) Index Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) -> Hash Join (cost=6.37..22.60 rows=4 width=288) Hash Cond: (p1_3.date_created = p2_3.date_created) -> Seq Scan on lzlpartition1_202304 p1_3 (cost=0.00..14.50 rows=450 width=150) -> Hash (cost=6.35..6.35 rows=2 width=146) -> Bitmap Heap Scan on lzlpartition1_202304 p2_3 (cost=2.66..6.35 rows=2 width=146) Recheck Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) -> Bitmap Index Scan on lzlpartition1_202304_name_idx (cost=0.00..2.66 rows=2 width=0) Index Cond: ((name)::text = '256ac66bb53d31bc6124294238d6410c'::text) (25 rows)
在没有开启partition wise join的情况下,优化器需要先访问分区表p2的所有分区数据(符合条件的)放一起(append),然后与分区表p1的所有分区数据通过分区键连接(Hash Join)。
在开启partition wise join的情况下,优化器将p1、p2两个分区表(实际上是一个,访问了两次)所对应的分区相连接:
p1_1<=>p2_1 Hash Join
p1_2<=>p2_2 Hash Join
p1_3<=>p2_3 Hash Join
然后再把数据合到一起(append)。
如果数据分区足够多,再加上分区裁剪,partition wise join会有很好的优化效果。
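作为补充,可以用一个示意性的反例验证前面提到的前提条件:当两个分区表的分区边界不一致时,即使打开参数也不会产生partition wise join(以下表名、分区边界均为假设,计划结果以实际环境为准):
--创建一个分区边界与lzlpartition1不一致的分区表
create table lzlpartition2 (id int, name varchar(50), date_created timestamp not null) partition by range(date_created);
create table lzlpartition2_2023h1 partition of lzlpartition2 for values from ('2023-01-01') to ('2023-07-01');
set enable_partitionwise_join = on;
--预期:由于分区无法一一对应,计划仍是先Append所有分区再整体Hash Join,而不是按分区分别Join
explain select * from lzlpartition1 t1 join lzlpartition2 t2 on t1.date_created = t2.date_created;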
分区表在进行聚合计算时,各个分区可以各自计算,不需要把所有分区的数据汇到一起再做聚合,只需将各分区的聚合结果汇总返回即可。
没有partition wise grouping本质上是“先扫描所有分区,再聚合计算”;有partition wise grouping是“先分区聚合计算,再汇合数据”。
partition wise grouping的优势在于每个分区参与聚合的数据量更小,聚合用的哈希表更小、更省内存,也更利于按分区并行执行。
参数enable_partitionwise_aggregate:是否开启partition wise grouping/aggregation,默认关闭
partition wise aggregate示例:
=> vacuum (analyze) lzlpartition1; --未开启wise agg => set enable_partitionwise_aggregate =off; SET => explain select date_created,min(id),count(*) from lzlpartition1 group by date_created order by 1,2,3; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Sort (cost=10354.94..10562.89 rows=83180 width=20) Sort Key: lzlpartition1.date_created, (min(lzlpartition1.id)), (count(*)) -> HashAggregate (cost=2725.69..3557.49 rows=83180 width=20) Group Key: lzlpartition1.date_created -> Append (cost=0.00..2085.46 rows=85364 width=12) -> Seq Scan on lzlpartition1_202301 lzlpartition1_1 (cost=0.00..878.84 rows=45384 width=12) -> Seq Scan on lzlpartition1_202302 lzlpartition1_2 (cost=0.00..765.30 rows=39530 width=12) -> Seq Scan on lzlpartition1_202304 lzlpartition1_3 (cost=0.00..14.50 rows=450 width=12) --开启wise agg => set enable_partitionwise_aggregate =on; SET => explain select date_created,min(id),count(*) from lzlpartition1 group by date_created order by 1,2,3; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Sort (cost=10356.08..10564.32 rows=83296 width=20) Sort Key: lzlpartition1.date_created, (min(lzlpartition1.id)), (count(*)) -> Append (cost=1219.22..3548.31 rows=83296 width=20) -> HashAggregate (cost=1219.22..1663.09 rows=44387 width=20) Group Key: lzlpartition1.date_created -> Seq Scan on lzlpartition1_202301 lzlpartition1 (cost=0.00..878.84 rows=45384 width=12) -> HashAggregate (cost=1061.77..1448.86 rows=38709 width=20) Group Key: lzlpartition1_1.date_created -> Seq Scan on lzlpartition1_202302 lzlpartition1_1 (cost=0.00..765.30 rows=39530 width=12) -> HashAggregate (cost=17.88..19.88 rows=200 width=20) Group Key: lzlpartition1_2.date_created -> Seq Scan on lzlpartition1_202304 lzlpartition1_2 (cost=0.00..14.50 rows=450 width=12) (12 rows)
无wise aggregate时,先扫描所有数据再合并(Append),合并后再聚合计算(HashAggregate);
partition wise aggregate先在各分区上聚合计算(HashAggregate),然后再合并结果(Append)。
partial aggregation
聚合算法可以下放到分区上进行计算,此时聚合后的数据分为两种情况:聚合数据不重复(group包含分区键),聚合数据有重复(group不包含分区键)。
当聚合结果不重复时,只需要把各分区算出来的聚合结果简单地拼到一起(Append)即可(就像上面的案例);当各分区算出来的聚合结果可能重复时,仍然需要再做一次总的聚合计算(Finalize Aggregate)。不包含分区键的聚合计算就是partial aggregation。
partial aggregation示例:
--group by不是分区键时 => show enable_partitionwise_aggregate; enable_partitionwise_aggregate -------------------------------- on => explain select id,count(*) from lzlpartition1 group by id ; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Finalize HashAggregate (cost=2474.80..2573.80 rows=9900 width=12) Group Key: lzlpartition1.id -> Append (cost=1105.76..2377.47 rows=19467 width=12) -> Partial HashAggregate (cost=1105.76..1202.28 rows=9652 width=12) Group Key: lzlpartition1.id -> Seq Scan on lzlpartition1_202301 lzlpartition1 (cost=0.00..878.84 rows=45384 width=4) -> Partial HashAggregate (cost=962.95..1059.10 rows=9615 width=12) Group Key: lzlpartition1_1.id -> Seq Scan on lzlpartition1_202302 lzlpartition1_1 (cost=0.00..765.30 rows=39530 width=4) -> Partial HashAggregate (cost=16.75..18.75 rows=200 width=12) Group Key: lzlpartition1_2.id -> Seq Scan on lzlpartition1_202304 lzlpartition1_2 (cost=0.00..14.50 rows=450 width=4)
group by不包含分区键时,各分区也可以先做部分聚合,但之后必须再做一次总聚合(Finalize HashAggregate)。
即使没有group by,也可能发生Partial Aggregate:
=> show enable_partitionwise_aggregate; enable_partitionwise_aggregate -------------------------------- on => explain select count(*) from lzlpartition1; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Finalize Aggregate (cost=1872.10..1872.11 rows=1 width=8) -> Append (cost=992.30..1872.10 rows=3 width=8) -> Partial Aggregate (cost=992.30..992.31 rows=1 width=8) -> Seq Scan on lzlpartition1_202301 lzlpartition1 (cost=0.00..878.84 rows=45384 width=0) -> Partial Aggregate (cost=864.12..864.13 rows=1 width=8) -> Seq Scan on lzlpartition1_202302 lzlpartition1_1 (cost=0.00..765.30 rows=39530 width=0) -> Partial Aggregate (cost=15.62..15.63 rows=1 width=8) -> Seq Scan on lzlpartition1_202304 lzlpartition1_2 (cost=0.00..14.50 rows=450 width=0) => explain select max(date_created) from lzlpartition1; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Finalize Aggregate (cost=1872.10..1872.11 rows=1 width=8) -> Append (cost=992.30..1872.10 rows=3 width=8) -> Partial Aggregate (cost=992.30..992.31 rows=1 width=8) -> Seq Scan on lzlpartition1_202301 lzlpartition1 (cost=0.00..878.84 rows=45384 width=8) -> Partial Aggregate (cost=864.12..864.13 rows=1 width=8) -> Seq Scan on lzlpartition1_202302 lzlpartition1_1 (cost=0.00..765.30 rows=39530 width=8) -> Partial Aggregate (cost=15.62..15.63 rows=1 width=8) -> Seq Scan on lzlpartition1_202304 lzlpartition1_2 (cost=0.00..14.50 rows=450 width=8)
触发Partial Aggregate的前提并不是有group by。应该从Partial Aggregate的目的去理解:它的目的是把聚合下放到分区,那么没有group by的聚合其实也可以这么做,就像上面两个例子:它们都是先在各分区上做聚合(Partial Aggregate),再汇总到一起做一次最终聚合(Finalize Aggregate);如果没有打开参数,这些聚合只能发生在扫描完所有分区之后。
声明式分区经过了多个版本的增强,如今已非常成熟。对于历史版本的声明式分区,功能增强如下:
PG9.6以前
PG10
PG11
PG12
PG13
PG14
PG15
PG16
《PostgreSQL修炼之道》
https://mp.weixin.qq.com/s/NW8XOZNq0YlDZvx24H737Q
https://www.postgresql.org/docs/current/ddl-partitioning.html
https://www.postgresql.org/docs/current/ddl-inherit.html
https://www.postgresql.org/docs/13/sql-altertable.html
https://github.com/postgrespro/pg_pathman
https://developer.aliyun.com/article/62314
https://hevodata.com/learn/postgresql-partitions
https://www.postgresql.fastware.com/postgresql-insider-prt-ove
https://www.buckenhofer.com/2021/01/postgresql-partitioning-guide/
https://www.depesz.com/2018/05/01/waiting-for-postgresql-11-support-partition-pruning-at-execution-time/
https://blog.csdn.net/horses/article/details/86164273
http://www.pgsql.tech/article_0_10000102
https://brandur.org/fragments/postgres-partitioning-2022