page contents

挑战在MySQL 8.0+ZFS上创建10亿张表

我在 MySQL8.0上创建了10亿+张InnoDB表(注意是表而不是行),如下:

attachments-2021-01-kwG2GcgI5ff528b9a2bf4.png

摘要

我在 MySQL8.0上创建了10亿+张InnoDB表(注意是表而不是行),如下:

$ mysql -A
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 1425329
Server version: 8.0.12 MySQL Community Server - GPL
 
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
 
mysql> select count(*) from information_schema.tables;
+------------+
| count(*)   |
+------------+
| 1011570298 |
+------------+
1 row in set (6 hours 57 min 6.31 sec)

是的,它耗费了6小时57分钟去统计表数目!

谁会需要创建10亿+张表?

在我之前的文章中,我创建和测试了MySQL 8.0上创建4000w张表(这是一个真实的案例)。不过10亿张表不是真实的案例场景,是因为我想挑战下在PG上创建了10亿张表的测试,所以准备在MySQL下创建下10亿张InnoDB表。

注:我认为MySQL8.0才是第一个具有创建10亿张InnoDB表可能性的MySQL版本。


挑战10亿张InnoDB表

磁盘空间

首先面临第一个也是最重要的挑战就是磁盘空间。创建.ibd文件时,InnoDB在磁盘上分配数据页。如果不做磁盘压缩,我们至少需要25T的存储容量。不过好消息是:我们的ZFS提供透明的磁盘压缩。以下是磁盘利用率的表现:

实际大小:

# du -sh --apparent-size /mysqldata/
26T     /mysqldata/

压缩后:

# du -sh /mysqldata/
2.4T    /mysqldata/

压缩率:

# zfs get compression,compressratio
...
mysqldata/mysql/data             compressratio         7.14x                      -
mysqldata/mysql/data             compression           gzip                       inherited from mysqldata/mysql

(看起来报告不是100%准确,我们达到了10倍+的压缩率)

许多小文件

为每张表要创建一个表空间文件,这是大问题。不过在MySQL 8.0中,我们可以创建一个通用表空间(General Tablespace)并在创建表时将表”分配“到表空间上。这里我为每个database创建一个通用表空间,每个database上创建了1000张表。

结果就是:

mysql> select count(*) from information_schema.schema;
+----------+
| count(*) |
+----------+
|  1011575 |
+----------+
1 row in set (1.31 sec)


创建表

另一个挑战点就是如何快速的创建表从而避免我们要耗费数月的时间。我用了三个锦囊妙计:

  • 禁用MySQL里面一切可能的一致性检测,减小innodb的page大小为4k(这些配置更改不适合生产环境)
  • 并发创建表。因为之前MySQL 8.0中的互斥量争用问题已经得到修复,所以并发创建表表现良好。
  • 在AWS ec2 i3.8 xlarge的实例上使用本地的NVMe卡

my.cnf config file (I repeat: do not use this in production):

my.cnf的配置信息如下(重申一遍:不要直接用在生产上):

[mysqld]
default-authentication-plugin = mysql_native_password
performance_schema=0
datadir=/mysqldata/mysql/data
socket=/mysqldata/mysql/data/mysql.sock
log-error = /mysqldata/mysql/log/error.log
skip-log-bin=1
innodb_log_group_home_dir = /mysqldata/mysql/log/
innodb_doublewrite = 0
innodb_checksum_algorithm=none
innodb_log_checksums=0
innodb_flush_log_at_trx_commit=0
innodb_log_file_size=2G
innodb_buffer_pool_size=100G
innodb_page_size=4k
innodb_flush_method=nosync
innodb_io_capacity_max=20000
innodb_io_capacity=5000
innodb_buffer_pool_instances=32
innodb_stats_persistent = 0
tablespace_definition_cache = 524288
schema_definition_cache = 524288
table_definition_cache = 524288
table_open_cache=524288
table_open_cache_instances=32
open-files-limit=1000000

ZFS pool:

# zpool status
 pool: mysqldata
 state: ONLINE
 scan: scrub repaired 0B in 1h49m with 0 errors on Sun Oct 14 02:13:17 2018
config:
 NAME        STATE     READ WRITE CKSUM
 mysqldata   ONLINE       0     0     0
 nvme0n1   ONLINE       0     0     0
 nvme1n1   ONLINE       0     0     0
 nvme2n1   ONLINE       0     0     0
 nvme3n1   ONLINE       0     0     0
errors: No known data errors

一个简单的并发创建表的脚本(表结构使用了sysbench里面的表):

#/bin/bash
function do_db {
 db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'")
 if [ "$db_exist" == "1" ]; then echo "Already exists: $db"; return 0; fi;
 tbspace="create database $db; use $db; CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB";
 #echo "Tablespace $db.ibd created!"
 tables=""
 for i in {1..1000}
 do
 table="CREATE TABLE sbtest$i ( id int(10) unsigned NOT NULL AUTO_INCREMENT, k int(10) unsigned NOT NULL DEFAULT '0', c varchar(120) NOT NULL DEFAULT '', pad varchar(60) NOT NULL DEFAULT '', PRIMARY KEY (id), KEY k_1 (k) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
 tables="$tables; $table;"
 done
 echo "$tbspace;$tables" | mysql
}
c=0
echo "starting..."
c=$(mysql -A -s -Nbe "select max(cast(SUBSTRING_INDEX(schema_name, '_', -1) as unsigned)) from information_schema.schemata where schema_name like 'sbtest_%'")
for m in {1..100000}
do
 echo "m=$m"
 for i in {1..30}
 do
 let c=$c+1
 echo $c
 db="sbtest_$c"
 do_db &
 done
 wait
done

我们创建表有多快呢?可以通过下面的状态量观测:

# mysqladmin -i 10 -r ex|grep Com_create_table
...
| Com_create_table                                      | 6497                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| Com_create_table                                      | 6449

我们约每秒创建650张表,上面是每10秒创建的表数量。

统计表数量

之前我们通过"count(*) from information_schema.tables"方式查看表数量耗费了6个多小时。因为:

  • MySQL 8.0 使用了一个新的数据字典(这很妙,避免创建10亿个frm文件)。所有的内容都存储在下面这个文件里:
# ls -lah /mysqldata/mysql/data/mysql.ibd
-rw-r----- 1 mysql mysql 6.1T Oct 18 15:02 /mysqldata/mysql/data/mysql.ibd
  • information_schema.tables实际上是一个视图:
mysql> show create table information_schema.tablesG
*************************** 1. row ***************************
 View: TABLES
 Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`mysql.infoschema`@`localhost` SQL SECURITY DEFINER VIEW `information_schema`.`TABLES` AS select `cat`.`name` AS `TABLE_CATALOG`,`sch`.`name` AS `TABLE_SCHEMA`,`tbl`.`name` AS `TABLE_NAME`,`tbl`.`type` AS `TABLE_TYPE`,if((`tbl`.`type` = 'BASE TABLE'),`tbl`.`engine`,NULL) AS `ENGINE`,if((`tbl`.`type` = 'VIEW'),NULL,10) AS `VERSION`,`tbl`.`row_format` AS `ROW_FORMAT`,internal_table_rows(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`table_rows`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `TABLE_ROWS`,internal_avg_row_length(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`avg_row_length`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `AVG_ROW_LENGTH`,internal_data_length(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`data_length`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `DATA_LENGTH`,internal_max_data_length(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`max_data_length`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `MAX_DATA_LENGTH`,internal_index_length(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`index_length`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `INDEX_LENGTH`,internal_data_free(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`data_free`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `DATA_FREE`,internal_auto_increment(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`auto_increment`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0),`tbl`.`se_private_data`) AS `AUTO_INCREMENT`,`tbl`.`created` AS `CREATE_TIME`,internal_update_time(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(cast(`stat`.`update_time` as unsigned),0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `UPDATE_TIME`,internal_check_time(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(cast(`stat`.`check_time` as unsigned),0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `CHECK_TIME`,`col`.`name` AS `TABLE_COLLATION`,internal_checksum(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`checksum`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `CHECKSUM`,if((`tbl`.`type` = 'VIEW'),NULL,get_dd_create_options(`tbl`.`options`,if((ifnull(`tbl`.`partition_expression`,'NOT_PART_TBL') = 'NOT_PART_TBL'),0,1))) AS `CREATE_OPTIONS`,internal_get_comment_or_error(`sch`.`name`,`tbl`.`name`,`tbl`.`type`,`tbl`.`options`,`tbl`.`comment`) AS `TABLE_COMMENT` from (((((`mysql`.`tables` `tbl` join `mysql`.`schemata` `sch` on((`tbl`.`schema_id` = `sch`.`id`))) join `mysql`.`catalogs` `cat` on((`cat`.`id` = `sch`.`catalog_id`))) left join `mysql`.`collations` `col` on((`tbl`.`collation_id` = `col`.`id`))) left join `mysql`.`tablespaces` `ts` on((`tbl`.`tablespace_id` = `ts`.`id`))) left join `mysql`.`table_stats` `stat` on(((`tbl`.`name` = `stat`.`table_name`) and (`sch`.`name` = `stat`.`schema_name`)))) where (can_access_table(`sch`.`name`,`tbl`.`name`) and is_visible_dd_object(`tbl`.`hidden`))
character_set_client: utf8
collation_connection: utf8_general_ci

而且通过explain看到它的执行计划如下:

mysql> explain select count(*) from information_schema.tables G
*************************** 1. row ***************************
 id: 1
 select_type: SIMPLE
 table: cat
 partitions: NULL
 type: index
possible_keys: PRIMARY
 key: name
 key_len: 194
 ref: NULL
 rows: 1
 filtered: 100.00
 Extra: Using index
*************************** 2. row ***************************
 id: 1
 select_type: SIMPLE
 table: tbl
 partitions: NULL
 type: ALL
possible_keys: schema_id
 key: NULL
 key_len: NULL
 ref: NULL
 rows: 1023387060
 filtered: 100.00
 Extra: Using where; Using join buffer (Block Nested Loop)
*************************** 3. row ***************************
 id: 1
 select_type: SIMPLE
 table: sch
 partitions: NULL
 type: eq_ref
possible_keys: PRIMARY,catalog_id
 key: PRIMARY
 key_len: 8
 ref: mysql.tbl.schema_id
 rows: 1
 filtered: 11.11
 Extra: Using where
*************************** 4. row ***************************
 id: 1
 select_type: SIMPLE
 table: stat
 partitions: NULL
 type: eq_ref
possible_keys: PRIMARY
 key: PRIMARY
 key_len: 388
 ref: mysql.sch.name,mysql.tbl.name
 rows: 1
 filtered: 100.00
 Extra: Using index
*************************** 5. row ***************************
 id: 1
 select_type: SIMPLE
 table: ts
 partitions: NULL
 type: eq_ref
possible_keys: PRIMARY
 key: PRIMARY
 key_len: 8
 ref: mysql.tbl.tablespace_id
 rows: 1
 filtered: 100.00
 Extra: Using index
*************************** 6. row ***************************
 id: 1
 select_type: SIMPLE
 table: col
 partitions: NULL
 type: eq_ref
possible_keys: PRIMARY
 key: PRIMARY
 key_len: 8
 ref: mysql.tbl.collation_id
 rows: 1
 filtered: 100.00
 Extra: Using index


结论

只是因为个人兴趣,我在MySQL 8.0上创建了10亿张InnoDB表和索引,我成功了。它花费了我大约2周的时间。

大概率MySQL 8.0是MySQL里面第一个支持能够创建10亿张InnoDB表的版本。

ZFS 的压缩再结合NVMe卡,可以降低成本。例如,选择AWS的i3.4xlarge或者i3.8xlarge实例。


attachments-2021-01-QYumdh9U5ff528d11f861.jpg

原文:

https://blog.csdn.net/n88Lpo/article/details/100058927?utm_medium=distribute.pc_category.none-task-blog-new-1.nonecase&depth_1-utm_source=distribute.pc_category.none-task-blog-new-1.nonecase

  • 发表于 2021-01-06 11:04
  • 阅读 ( 763 )

你可能感兴趣的文章

相关问题

0 条评论

请先 登录 后评论
Pack
Pack

1135 篇文章

作家榜 »

  1. 轩辕小不懂 2403 文章
  2. 小柒 1658 文章
  3. Pack 1135 文章
  4. Nen 576 文章
  5. 王昭君 209 文章
  6. 文双 71 文章
  7. 小威 64 文章
  8. Cara 36 文章