TiDB 6.0 新特性解读丨 Collation 规则
对数据库而言,合适的字符集和规则能够大大提升使用者运维和分析的效率。TiDB 从 v4.0 开始支持新 collation 规则,并于 TiDB 6.0 版本再次进行了更新。本文将深入解读 Collation 规则在 TiDB 6.0 中的变更和应用。
Ps. 本文已入选 TiDB 6.0 Book Rush,如果你也想分享 6.0 的使用体验,并获得丰厚奖励,欢迎加入我们!
引
默认启用新 Collation 规则
TiDB 从 v4.0 开始支持新 collation 规则,在大小写不敏感、口音不敏感、padding 规则上与 MySQL 行为保持一致。
TiDB [(none)] 18:45:27> select * from mysql.tidb where variable_name = 'new_collation_enabled';
+-----------------------+----------------+----------------------------------------------------+
| VARIABLE_NAME | VARIABLE_VALUE | COMMENT |
+-----------------------+----------------+----------------------------------------------------+
| new_collation_enabled | True | If the new collations are enabled. Do not edit it. |
+-----------------------+----------------+----------------------------------------------------+
1 row in set (0.003 sec)
TiDB [(none)] 18:45:51> select version()\G
*************************** 1. row ***************************
version(): 5.7.25-TiDB-v6.0.0
1 row in set (0.001 sec)
TiDB [(none)] 18:46:00> SELECT * FROM information_schema.collations;
+--------------------+--------------------+------+------------+-------------+---------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------+--------------------+------+------------+-------------+---------+
| ascii_bin | ascii | 65 | Yes | Yes | 1 |
| binary | binary | 63 | Yes | Yes | 1 |
| gbk_bin | gbk | 87 | | Yes | 1 | (#28645)
| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | (#28645)
| latin1_bin | latin1 | 47 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | Yes | Yes | 1 |
| utf8_general_ci | utf8 | 33 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 1 | (#18678)
| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 |
| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 1 | (#18678)
+--------------------+--------------------+------+------------+-------------+---------+
11 rows in set (0.001 sec)
TiDB [test] 19:05:14> SHOW CHARACTER SET;
+---------+-------------------------------------+-------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+-------------------------------------+-------------------+--------+
| ascii | US ASCII | ascii_bin | 1 |
| binary | binary | binary | 1 |
| gbk | Chinese Internal Code Specification | gbk_chinese_ci | 2 |
| latin1 | Latin1 | latin1_bin | 1 |
| utf8 | UTF-8 Unicode | utf8_bin | 3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_bin | 4 |
+---------+-------------------------------------+-------------------+--------+
6 rows in set (0.001 sec)
TiDB-v5.4 [test] 10:17:22> select version()\G
*************************** 1. row ***************************
version(): 5.7.25-TiDB-v5.4.0
1 row in set (0.001 sec)
TiDB-v5.4 [test] 10:19:39> SELECT * FROM information_schema.collations;
+----------------+--------------------+------+------------+-------------+---------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+----------------+--------------------+------+------------+-------------+---------+
| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 |
| latin1_bin | latin1 | 47 | Yes | Yes | 1 |
| binary | binary | 63 | Yes | Yes | 1 |
| ascii_bin | ascii | 65 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | Yes | Yes | 1 |
| gbk_bin | gbk | 87 | Yes | Yes | 1 |
+----------------+--------------------+------+------------+-------------+---------+
6 rows in set (0.001 sec)
新 Collation 注意事项
service_configs:
tidb:
new_collations_enabled_on_first_bootstrap: true
Collation Bug 修复
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
c1 char(20) CHARACTER SET utf8 COLLATE utf8_bin,
c2 char(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin);
INSERT INTO t1 VALUES ('UUtJeaV','snRXXCZHBPW');
SET names utf8mb4 collate utf8mb4_bin;
SELECT greatest( c1, c2 ) as expr1 FROM t1;
SELECT least( c1, c2 ) as expr1 FROM t1;
SET names utf8mb4 collate utf8mb4_general_ci;
SELECT greatest( c1, c2 ) as expr1 FROM t1;
SELECT least( c1, c2 ) as expr1 FROM t1;
// The collation of JSON is always utf8mb4_bin in builtin-func which is same as MySQL
// see details https://github.com/pingcap/tidb/issues/31320#issuecomment-1010599311
if isJSON {
dstCharset, dstCollation = charset.CharsetUTF8MB4, charset.CollationUTF8MB4
}
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (c1 json);
INSERT INTO t2 VALUES ('{\"测试\": \"你好\"}');
SELECT collation(c1), collation(upper(c1)), collation(elt(1, c1, 0x12)) FROM t2;
新增内置函数 CHARSET()
TiDB [test] 23:54:42> select version()\G
*************************** 1. row ***************************
version(): 5.7.25-TiDB-v6.0.0
1 row in set (0.001 sec)
TiDB [test] 00:03:51> set names utf8mb4;
Query OK, 0 rows affected (0.000 sec)
TiDB [test] 00:03:58> select charset(1);
+------------+
| charset(1) |
+------------+
| binary |
+------------+
1 row in set (0.001 sec)
TiDB [test] 00:04:03> select charset('1');
+--------------+
| charset('1') |
+--------------+
| utf8mb4 |
+--------------+
1 row in set (0.001 sec)
文档拾遗
小结
本文对 Collation 特性在 TiDB 6.0 中的变更进行了汇总阐释及举例说明,当前 TiDB 所提供的几种 Collation 已经可以支撑大部分业务场景和大部分系统迁移需求。建议在项目设计之初,就选用普适类型的字符集和规则,毕竟效率是提升生产力的重要因素之一。
活动预告
新发布的 TiDB 6.0 大幅增强了 TiDB 作为企业级产品的可管理性,具备更成熟的 HTAP 与容灾能力,加入多项云原生数据库所需的基础特性。抢先试用 TiDB 6.0、TiFlash、TiEM 和 PingCAP Clinic 等多款新品与服务,参与 Book Rush 贡献,将有机会享受“TiDB 荣誉体验官”在内的多项权益!
扫描下方二维码了解详情
关注公众号:拾黑(shiheibook)了解更多
[广告]赞助链接:
四季很好,只要有你,文娱排行榜:https://www.yaopaiming.com/
让资讯触达的更精准有趣:https://www.0xu.cn/
随时掌握互联网精彩
- 1 习近平同普京举行视频会晤 7915619
- 2 苹果一夜之间没了8000亿 7921033
- 3 王菲 春晚 7875671
- 4 商品供应有保障 假日消费活力足 7722736
- 5 王鹤棣在巴黎穿上了道明寺皮肤 7625031
- 6 中方是否解除对美新任国务卿制裁? 7515329
- 7 市民在大连街头发现4亿年前的化石 7484093
- 8 黄子韬给中奖员工直接转账18888元 7376728
- 9 男子开车坠冰湖 淡定用碗往外舀水 7228828
- 10 小米YU7最新实车图曝光 7182546