
No Layoffs! No Layoffs! No Layoffs! One Move to Cut IT Database Costs

digoal (德哥) | PostgreSQL Chinese Community | 2020-02-27

Background

The 2020 epidemic has been merciless. Many businesses have been hit hard, especially small and medium ones, some to the point of laying people off. But people are a company's most valuable asset, and layoffs should be the very last resort. How do we get through this difficult period, and what should the epidemic make us rethink?

There is a new way to open up income and cut expenses. Databases typically account for close to half of an enterprise's IT spend, so reducing database cost has an outsized effect on total IT cost. Most companies have no professional DBA and find this hard to tackle on their own, but with cloud vendors it becomes simple. With Alibaba Cloud we found a way to cut enterprise IT database spend by half or more.

So what exactly is this method? Let's revisit the series of MySQL+PG joint-solution courses recorded before the New Year.

《阿里云 RDS PostgreSQL+MySQL 联合解决方案课程 - 汇总视频、课件》

Among the many databases available, PG is an enterprise-grade open-source database whose feature set lines up with Oracle's. It fits a wide range of workloads and can handle very large data volumes. Large adopters of PG include Ping An, Postal Savings Bank of China, Alibaba, Huawei, ZTE, PICC, China Merchants Bank, Foxconn, Apple, SAP, Salesforce, and many other Global Fortune 1000 companies. See 《外界对PostgreSQL 的评价》.

Advantages of Alibaba Cloud RDS PG:
· Full lifecycle management, including high availability, disaster recovery, backup, security, auditing, encryption, and Cloud DBA modules, greatly reducing usage and management cost.
· A professional kernel and DBA team providing 7x24 service.
· Parallel query, LLVM JIT, GPU acceleration, and vectorized execution for stronger analytics.
· A powerful optimizer that handles complex SQL efficiently, well suited to complex business scenarios: new retail, manufacturing, industry, online education, gaming, finance, government, enterprise ERP, and more.
· Kernel extensions customized for vertical domains:
o The Ganos extension: stronger, more professional GIS, supporting planar and spherical geometry, raster, spatio-temporal trajectory, point cloud, and topology network models.
o The pase extension: high-dimensional vector search, supporting exact image search, face recognition, and similarity queries.
o The roaringbitmap extension: real-time big-data user profiling and precision marketing.
o The rdkit extension: chemical analysis, molecular similarity search, and chemistry-oriented machine learning.
· Stronger multi-model capability, reflected in a richer set of index types: besides btree and hash, PG supports the gin, gist, spgist, brin, bloom, rum, and other index access methods, suitable for fuzzy search, full-text search, arbitrary multi-dimensional search, spatio-temporal search, and high-dimensional vectors (widely used in image recognition, similar-feature expansion, time-series search, user profiling, chemical analysis, DNA search, etc.); see the sketch after this list.
· Richer data types, plus extensible types: beyond the basic ones, PG supports network, full-text search, array, xml, JSON, range, domain, tree, multi-dimensional, molecular, GIS, and other types, covering a wider range of application scenarios.
· oss_fdw support, so the database's archived data can be stored in OSS at lower cost while the access method stays the same.
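As a concrete taste of the richer indexing, a GIN trigram index makes '%keyword%' fuzzy search indexable. A minimal sketch using the stock pg_trgm extension (the table and column names here are illustrative, not part of the benchmark):

create extension if not exists pg_trgm;
create table docs (id int, content text);
-- a GIN trigram index lets LIKE '%...%' predicates use the index
create index idx_docs_trgm on docs using gin (content gin_trgm_ops);
-- this query can now avoid a full table scan
select * from docs where content like '%postgres%';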
This article compares PG and MySQL across several dimensions; in some of them PG's overall performance is an order of magnitude higher than MySQL's. Using PG and MySQL together can substantially reduce enterprise cost.
The epidemic is merciless, but PG is not. Don't lay people off: building a diversified technology stack and strengthening enterprise IT capability matters more.
Environment
Alibaba Cloud RDS PG 12 instance: 8 cores, 32 GB RAM, 1500 GB ESSD
MySQL 8.0 on the same hardware configuration
User / password:
user123      
xxxxxxxx!      
Database:
db1      
Connection strings:
PG:
export PGPASSWORD=xxxxxxxx!
psql -h pgm-bp1z26gbo3gx893a129310.pg.rds.aliyuncs.com -p 1433 -U user123 db1
MySQL:
mysql -h rm-bp1wv992ym962k85888370.mysql.rds.aliyuncs.com -P 3306 -u user123 --password=xxxxxx! -D db1
Test client: an ECS instance (CentOS 7.x x64) with the mysql and pg clients installed:
yum install -y mysql-*
yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y postgresql12
MySQL 8.0 Test
Test table
CREATE TABLE employees (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  birth TIMESTAMP,
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT NOT NULL,
  store_id INT NOT NULL
);
Stored procedure for batch inserts
DROP PROCEDURE IF EXISTS BatchInsert;
delimiter //  -- switch the statement delimiter to //
CREATE PROCEDURE BatchInsert(IN init INT, IN loop_time INT)
-- first parameter: starting ID (customizable); second: number of rows to generate
BEGIN
    DECLARE Var INT;
    DECLARE ID INT;
    SET Var = 0;
    SET ID = init;
    WHILE Var < loop_time DO
        insert into employees (id, fname, lname, birth, hired, separated, job_code, store_id)
        values (ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);
        SET ID = ID + 1;
        SET Var = Var + 1;
    END WHILE;
END; //
delimiter ;  -- switch the delimiter back to a semicolon
Batch insert 200,000 rows
-- run the inserts inside one transaction, otherwise this is very slow
      
begin;      
CALL BatchInsert(1, 200000);      
commit;      
      
Query OK, 1 row affected (7.53 sec)      
Keep doubling the data with insert into ... select
mysql> insert into employees select * from employees;      
Query OK, 200000 rows affected (1.61 sec)      
Records: 200000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 400000 rows affected (3.25 sec)      
Records: 400000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 800000 rows affected (6.51 sec)      
Records: 800000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 1600000 rows affected (12.93 sec)      
Records: 1600000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 3200000 rows affected (28.61 sec)      
Records: 3200000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 6400000 rows affected (56.48 sec)      
Records: 6400000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 12800000 rows affected (1 min 55.30 sec)      
Records: 12800000  Duplicates: 0  Warnings: 0      
Query performance
mysql> select count(*) from employees;      
+----------+      
| count(*) |      
+----------+      
| 25600000 |      
+----------+      
1 row in set (6.15 sec)      
count(distinct) performance
mysql> select count(distinct id) from employees;
+--------------------+
| count(distinct id) |
+--------------------+
|             200000 |
+--------------------+
1 row in set (16.67 sec)
distinct-via-group-by performance
mysql> select count(*) from (select id from employees group by id) t;
+----------+
| count(*) |
+----------+
|   200000 |
+----------+
1 row in set (15.52 sec)
Insert another 2 million rows
begin;
CALL BatchInsert(1, 2000000);
commit;
Test table 2; insert 2 million rows.
CREATE TABLE employees1 (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  birth TIMESTAMP,
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT NOT NULL,
  store_id INT NOT NULL
);
DROP PROCEDURE IF EXISTS BatchInsert1;
delimiter //  -- switch the statement delimiter to //
CREATE PROCEDURE BatchInsert1(IN init INT, IN loop_time INT)
-- first parameter: starting ID (customizable); second: number of rows to generate
BEGIN
    DECLARE Var INT;
    DECLARE ID INT;
    SET Var = 0;
    SET ID = init;
    WHILE Var < loop_time DO
        insert into employees1 (id, fname, lname, birth, hired, separated, job_code, store_id)
        values (ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);
        SET ID = ID + 1;
        SET Var = Var + 1;
    END WHILE;
END; //
delimiter ;  -- switch the delimiter back to a semicolon
Insert 2 million rows with the loop-insert procedure

-- run the inserts inside one transaction, otherwise this is very slow
begin;
CALL BatchInsert1(1, 2000000);
commit;

Query OK, 1 row affected (1 min 7.06 sec)
25.6M rows many-to-one JOIN 2M rows, with grouping and sorting
select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10;
Simple query performance (the JOIN above ran for several hours without returning, so a fresh 2-million-row table had to be created for the point-query test):
CREATE TABLE employees2 (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  birth TIMESTAMP,
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT NOT NULL,
  store_id INT NOT NULL
);
DROP PROCEDURE IF EXISTS BatchInsert2;
delimiter //  -- switch the statement delimiter to //
CREATE PROCEDURE BatchInsert2(IN init INT, IN loop_time INT)
-- first parameter: starting ID (customizable); second: number of rows to generate
BEGIN
    DECLARE Var INT;
    DECLARE ID INT;
    SET Var = 0;
    SET ID = init;
    WHILE Var < loop_time DO
        insert into employees2 (id, fname, lname, birth, hired, separated, job_code, store_id)
        values (ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);
        SET ID = ID + 1;
        SET Var = Var + 1;
    END WHILE;
END; //
delimiter ;  -- switch the delimiter back to a semicolon
-- run the inserts inside one transaction, otherwise this is very slow
begin;
CALL BatchInsert2(1, 2000000);
commit;
Query OK, 1 row affected (1 min 7.06 sec)
Create an index
create index idx_employees2_1 on employees2(id);
Create a query stored procedure that performs 2 million point queries.
DROP PROCEDURE IF EXISTS select1;
delimiter //  -- switch the statement delimiter to //
CREATE PROCEDURE select1(IN init INT, IN loop_time INT)
-- first parameter: starting ID (customizable); second: number of queries to run
BEGIN
    DECLARE Var INT;
    DECLARE ID1 INT;
    DECLARE vid INT;
    DECLARE vfname VARCHAR(30);
    DECLARE vlname VARCHAR(30);
    DECLARE vbirth TIMESTAMP;
    DECLARE vhired DATE;
    DECLARE vseparated DATE;
    DECLARE vjob_code INT;
    DECLARE vstore_id INT;
    SET Var = 0;
    SET ID1 = init;
    WHILE Var < loop_time DO
        select t.id,t.fname,t.lname,t.birth,t.hired,t.separated,t.job_code,t.store_id
        into vid,vfname,vlname,vbirth,vhired,vseparated,vjob_code,vstore_id
        from employees2 t where t.id=ID1;
        SET ID1 = ID1 + 1;
        SET Var = Var + 1;
    END WHILE;
END; //
delimiter ;  -- switch the delimiter back to a semicolon
Time for 2 million simple key-based lookups:
-- run the queries inside one transaction
begin;
CALL select1(1, 2000000);
commit;

Query OK, 1 row affected (1 min 10.23 sec)
MySQL at 100M+ rows:
Scale the test up to roughly 100 million rows.
mysql> insert into employees select * from employees;
Query OK, 27600000 rows affected (4 min 38.62 sec)
Records: 27600000  Duplicates: 0  Warnings: 0

mysql> insert into employees select * from employees;
Query OK, 55200000 rows affected (11 min 13.40 sec)
Records: 55200000  Duplicates: 0  Warnings: 0

mysql> select count(*) from employees;
+-----------+
| count(*)  |
+-----------+
| 110400000 |
+-----------+
1 row in set (28.00 sec)

mysql> select count(distinct id) from employees;
+--------------------+
| count(distinct id) |
+--------------------+
|            2000000 |
+--------------------+
1 row in set (1 min 17.73 sec)

mysql> select count(*) from (select id from employees group by id) t;
+----------+
| count(*) |
+----------+
|  2000000 |
+----------+
1 row in set (1 min 24.64 sec)
Full update of 110 million rows
mysql> update employees set lname=lname||'new';
Query OK, 110400000 rows affected, 65535 warnings (21 min 30.34 sec)
Rows matched: 110400000  Changed: 110400000  Warnings: 220800000
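An aside on those warnings: in MySQL's default sql_mode, || is the logical OR operator rather than string concatenation, so the statement above does not actually append 'new' (the summary table below therefore uses concat() for this step). To make || concatenate, you would enable PIPES_AS_CONCAT, roughly like this sketch:

-- with PIPES_AS_CONCAT enabled, || becomes string concatenation as in standard SQL
set session sql_mode = concat(@@sql_mode, ',PIPES_AS_CONCAT');
update employees set lname = lname || 'new';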
110M rows many-to-one JOIN 2M rows, with grouping and sorting: no result after more than 3 hours.
select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10;
Create an index on 110 million rows
mysql> create index idx_employees_1 on employees(id);
Query OK, 0 rows affected (3 min 49.04 sec)
Records: 0  Duplicates: 0  Warnings: 0
Alibaba Cloud RDS PostgreSQL 12 Test
Test table
CREATE TABLE employees (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  birth TIMESTAMP,
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT NOT NULL,
  store_id INT NOT NULL
);
Write 200,000 rows quickly with an SRF (set-returning function)
\timing
insert into employees (id, fname, lname, birth, hired, separated, job_code, store_id)
select ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID
from generate_series(1,200000) id;

INSERT 0 200000
Time: 355.652 ms
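generate_series also composes with random() to fabricate more varied test data just as quickly; a minimal sketch (the randomized column values are illustrative and not part of the original benchmark):

insert into employees (id, fname, lname, birth, hired, separated, job_code, store_id)
select id,
       'chen' || (random()*1000)::int,                -- randomized first-name suffix
       left(md5(random()::text), 16),                 -- random 16-char last name, fits varchar(30)
       now() - (random()*365 || ' days')::interval,   -- birth spread over the past year
       current_date,
       current_date,
       (random()*9)::int + 1,                         -- job_code in 1..10
       id
from generate_series(1, 200000) as id;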
You can also load 200,000 rows with a loop insert, the same approach used for MySQL:
create or replace function BatchInsert(IN init INT, IN loop_time INT)
-- first parameter: starting ID (customizable); second: number of rows to generate
returns void as $$
declare
  Var INT := 0;
begin
  for id in init..init+loop_time-1 loop
    insert into employees (id, fname, lname, birth, hired, separated, job_code, store_id)
    values (ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);
  end loop;
end;
$$ language plpgsql strict;

db1=# select batchinsert(1,200000);
 batchinsert
-------------

(1 row)

Time: 1292.559 ms (00:01.293)
Keep doubling the data with insert into ... select
db1=> insert into employees select * from employees;
INSERT 0 400000
Time: 322.335 ms
db1=> insert into employees select * from employees;
INSERT 0 800000
Time: 835.365 ms
db1=> insert into employees select * from employees;
INSERT 0 1600000
Time: 1622.475 ms (00:01.622)
db1=> insert into employees select * from employees;
INSERT 0 3200000
Time: 3583.787 ms (00:03.584)
db1=> insert into employees select * from employees;
INSERT 0 6400000
Time: 7277.764 ms (00:07.278)
db1=> insert into employees select * from employees;
INSERT 0 12800000
Time: 15639.482 ms (00:15.639)
db1=> \dt+ employees
                     List of relations
 Schema |   Name    | Type  |  Owner  |  Size   | Description
--------+-----------+-------+---------+---------+-------------
 public | employees | table | user123 | 2061 MB |
(1 row)
Query performance
db1=> select count(*) from employees;
  count
----------
 25600000
(1 row)
Time: 604.982 ms
Distinct-count performance (via group by)
db1=> select count(*) from (select id from employees group by id) t;
 count
--------
 200000
(1 row)
Time: 2982.907 ms (00:02.983)
Insert another 2 million rows
insert into employees (id, fname, lname, birth, hired, separated, job_code, store_id)
select ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID
from generate_series(1,2000000) id;
Test table 2; insert 2 million rows.
CREATE TABLE employees1 (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  birth TIMESTAMP,
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT NOT NULL,
  store_id INT NOT NULL
);

insert into employees1 (id, fname, lname, birth, hired, separated, job_code, store_id)
select ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID
from generate_series(1,2000000) id;

INSERT 0 2000000
Time: 3037.777 ms (00:03.038)
25.6M rows many-to-one JOIN 2M rows, with grouping and sorting
select t1.lname,count(*) from employees t1 join employees1 t2 using (id)
group by t1.lname order by count(*) desc,lname limit 10;

     lname      | count
----------------+-------
 haixiang1      |   129
 haixiang10     |   129
 haixiang100    |   129
 haixiang1000   |   129
 haixiang10000  |   129
 haixiang100000 |   129
 haixiang100001 |   129
 haixiang100002 |   129
 haixiang100003 |   129
 haixiang100004 |   129
(10 rows)
Time: 8897.907 ms (00:08.898)
Simple query performance:
Create an index
create index idx_employees1_1 on employees1(id);
CREATE INDEX
Time: 1436.346 ms (00:01.436)
Time for 2 million simple key-based lookups:
do language plpgsql $$
declare
begin
  for i in 1..2000000 loop
    perform * from employees1 where id=i;
  end loop;
end;
$$;

DO
Time: 9515.728 ms (00:09.516)

db1=> select 9515.728/2000000;
        ?column?
------------------------
 0.00475786400000000000
(1 row)
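The DO block above measures one session running the lookups serially inside the server (about 0.0048 ms per lookup). To measure client-side latency under concurrency, pgbench can replay the same lookup from multiple connections; a sketch (the script file name and client counts are assumptions):

-- kv.sql (hypothetical file): one random key lookup per transaction
\set id random(1, 2000000)
select * from employees1 where id = :id;

pgbench -n -f kv.sql -c 8 -j 8 -T 60 -h pgm-bp1z26gbo3gx893a129310.pg.rds.aliyuncs.com -p 1433 -U user123 db1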
PG at 100M+ rows:
db1=> INSERT INTO employees select * from employees;
INSERT 0 27600000
Time: 25050.665 ms (00:25.051)
db1=> INSERT INTO employees select * from employees;
INSERT 0 55200000
Time: 64726.430 ms (01:04.726)
db1=> select count(*) from employees;
   count
-----------
 110400000
(1 row)

Time: 7286.152 ms (00:07.286)
db1=> select count(distinct id) from employees;
  count
---------
 2000000
(1 row)

Time: 39783.068 ms (00:39.783)
db1=> select count(*) from (select id from employees group by id) t;
  count
---------
 2000000
(1 row)

Time: 14668.305 ms (00:14.668)
db1=> select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10;
     lname      | count
----------------+-------
 haixiang1      |   516
 haixiang10     |   516
 haixiang100    |   516
 haixiang1000   |   516
 haixiang10000  |   516
 haixiang100000 |   516
 haixiang100001 |   516
 haixiang100002 |   516
 haixiang100003 |   516
 haixiang100004 |   516
(10 rows)

Time: 33731.431 ms (00:33.731)
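PG 12 runs these scans and aggregates in parallel out of the box, and the degree of parallelism can be inspected and raised per session. A minimal sketch (the worker count is illustrative; RDS defaults may differ):

-- expect a Gather + Partial/Finalize Aggregate over a Parallel Seq Scan
explain select count(*) from employees;

-- allow up to 8 parallel workers for this session only
set max_parallel_workers_per_gather = 8;
select count(*) from employees;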
Update 110 million rows
db1=> update employees set lname=lname||'new';
UPDATE 110400000
Time: 385372.063 ms (06:25.372)
Create an index:
db1=> create index idx_employees_1 on employees(id);
CREATE INDEX
Time: 70450.491 ms (01:10.450)
MySQL vs PG Performance Summary
8 cores, 32 GB RAM, 1500 GB ESSD cloud disk; MySQL 8.0 vs PG 12

| Data volume | SQL | MySQL time | PG time |
|---|---|---|---|
| 200k | {write} stored-procedure loop insert | 7.53 s | 1.29 s |
| 200k | {write} SRF insert | not supported | 0.36 s |
| 400k | {write} INSERT INTO employees select * from employees; | 3.25 s | 0.32 s |
| 800k | {write} INSERT INTO employees select * from employees; | 6.51 s | 0.84 s |
| 1.6M | {write} INSERT INTO employees select * from employees; | 12.93 s | 1.62 s |
| 3.2M | {write} INSERT INTO employees select * from employees; | 28.61 s | 3.58 s |
| 6.4M | {write} INSERT INTO employees select * from employees; | 56.48 s | 7.28 s |
| 12.8M | {write} INSERT INTO employees select * from employees; | 115.30 s | 15.64 s |
| 27.6M | {write} INSERT INTO employees select * from employees; | 278.62 s | 25.05 s |
| 55.2M | {write} INSERT INTO employees select * from employees; | 673.40 s | 64.73 s |
| 2M | {point query} 2 million KV lookups. PS: PG uses a process model; in real applications use a connection pool and keep total connections under ~1000. A built-in thread pool is planned, after which tens of thousands of connections will be no problem. | 70.23 s | 9.52 s |
| 25.6M | {complex query} select count(*) from employees; | 6.15 s | 0.60 s |
| 25.6M | {complex query} select count(distinct id) from employees; | 16.67 s | 7.85 s |
| 25.6M | {complex query} select count(*) from (select id from employees group by id) t; | 15.52 s | 2.98 s |
| 110M | {complex query} select count(*) from employees; | 28.00 s | 7.29 s |
| 110M | {complex query} select count(distinct id) from employees; | 77.73 s | 39.78 s |
| 110M | {complex query} select count(*) from (select id from employees group by id) t; | 84.64 s | 14.67 s |
| 27.6M JOIN 2M | {JOIN + aggregation} select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10; | no result after 3+ hours | 8.90 s |
| 110M JOIN 2M | {JOIN + aggregation} same query as above | no result after 3+ hours | 33.73 s |
| 110M | {update} update employees set lname=concat(lname,'new'); | 1290.34 s | 385.37 s |
| 110M | {create index} create index idx_employees_1 on employees(id); | 229.04 s | 70.45 s |



The tests above show that in most scenarios the overall performance of Alibaba Cloud RDS PG is an order of magnitude better than MySQL's. Using PG and MySQL together can substantially reduce enterprise cost. The epidemic is merciless, but PG is not. Don't lay people off: building a diversified technology stack and strengthening enterprise IT capability matters more.


For more application scenarios and how-tos, see the course replays, including how to sync MySQL data to PG with DTS:
《阿里云 RDS PostgreSQL+MySQL 联合解决方案课程 - 汇总视频、课件》
Course contents:
· 2019.12.30 19:30 RDS PG product overview; how to use it alongside MySQL
· 2019.12.31 19:30 How to connect to PG; GUI (pgadmin, navicat, dms) and cli usage
· 2020.1.3 19:30 How to load-test a PG database; how to generate massive test data instantly
· 2020.1.6 19:30 Mapping between MySQL and PG types, syntax, functions, etc.
· 2020.1.7 19:30 How to sync MySQL data to PG (DTS)
· 2020.1.8 19:30 Clever uses of PG foreign tables: mysql_fdw, oss_fdw (reading/writing MySQL directly, hot/cold data tiering); see the sketch after this list
· 2020.1.9 19:30 PG application scenarios: parallel computing, real-time analytics
· 2020.1.10 19:30 PG application scenarios: GIS
· 2020.1.13 19:30 PG application scenarios: user profiling and real-time marketing systems
· 2020.1.14 19:30 PG application scenarios: multi-dimensional search
· 2020.1.15 19:30 PG application scenarios: vector computing and image search
· 2020.1.16 19:30 PG application scenarios: full-text search and fuzzy queries
· 2020.1.17 19:30 PG data-analysis syntax
· 2020.1.18 19:30 More PG features: extension syntax, indexes, types, stored procedures and functions; how to join the PG community
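As a taste of the 2020.1.8 session, here is a minimal mysql_fdw sketch for querying a MySQL table directly from PG (the host and credentials are placeholders; the option names follow the open-source mysql_fdw extension, and availability on RDS may vary):

create extension if not exists mysql_fdw;

-- register the MySQL instance (placeholder host)
create server mysql_svr foreign data wrapper mysql_fdw
  options (host 'rm-xxx.mysql.rds.aliyuncs.com', port '3306');

create user mapping for current_user server mysql_svr
  options (username 'user123', password 'xxxxxxxx');

-- expose MySQL's employees table inside PG
create foreign table mysql_employees (
  id int,
  fname varchar(30),
  lname varchar(30)
) server mysql_svr options (dbname 'db1', table_name 'employees');

select count(*) from mysql_employees;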
A free trial of Alibaba Cloud PG is underway; scan the DingTalk QR code in the original post to join the group for consultation.
Links referenced in this article:
 
1、《阿里云 RDS PostgreSQL+MySQL 联合解决方案课程 - 汇总视频、课件》
https://github.com/digoal/blog/blob/master/202001/20200118_02.md
 
2、《外界对PostgreSQL 的评价》
https://github.com/digoal/blog/blob/master/202002/20200206_02.md
 
3、《PG buildin pool(内置连接池)版本 原理与测试》
https://github.com/digoal/blog/blob/master/201805/20180521_03.md

The PostgreSQL Chinese community welcomes submissions from technical writers.

Submission email: press@postgres.cn

