Feature #6720

Database/table character set and collation

Added by Evgeny Novikov about 4 years ago. Updated almost 3 years ago.

Status:
New
Priority:
Normal
Category:
Deployment
Target version:
-
Start date:
02/02/2016
Due date:
% Done:

0%

Estimated time:
Published in build:

Description

We need to understand whether the very strong and likely very inefficient character set utf8 and collation utf8_general_ci are needed for all tables [1] or just for particular ones. I am absolutely sure that only a few tables need such a strong character set and collation, but I am not sure that using different character sets and collations for different tables will help. I am also not sure whether we need such a strong collation at all.

[1] CREATE DATABASE db_name DEFAULT CHARACTER SET = utf8 COLLATE = utf8_general_ci;

History

#1

Updated by Evgeny Novikov about 4 years ago

BTW there are 44 occurrences of the string utf8. I am sure that most of them can easily be replaced with ascii, especially assuming #6643 is implemented.

#2

Updated by Evgeny Novikov almost 4 years ago

  • Assignee changed from Vladimir Gratinskiy to Evgeny Novikov
  • Priority changed from High to Urgent

It turns out that collation does matter, e.g. for report identifiers and values of report attributes (Linux 3.14 contains the modules net/netfilter/xt_dscp.ko and net/netfilter/xt_DSCP.ko, whose names differ only in letter case).

Here it is quite reasonably explained why collation utf8_unicode_ci is better than utf8_general_ci, but both of them are case insensitive. The case-sensitive collation is utf8_bin. That is why our first step will be to specify this inefficient collation for all tables and columns (I am going to point this out in the documentation). Then we will need to investigate how we can relax this strong condition.
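
To illustrate the problem with the two module names, here is a minimal sketch. It does not use MySQL: it uses SQLite (via Python's standard sqlite3 module) as a stand-in, with SQLite's built-in NOCASE collation playing the role of a case-insensitive collation and BINARY playing the role of utf8_bin. NOCASE folds only ASCII letters and is not equivalent to utf8_general_ci or utf8_unicode_ci, so this only demonstrates the general effect. The table name report and column name identifier are hypothetical.

```python
import sqlite3

# Module names from the issue: they differ only in letter case.
MODULES = ["net/netfilter/xt_dscp.ko", "net/netfilter/xt_DSCP.ko"]


def count_stored(collation: str) -> int:
    """Insert both names into a UNIQUE column that uses the given
    collation and return how many rows were accepted."""
    conn = sqlite3.connect(":memory:")
    # `report` and `identifier` are hypothetical names for illustration.
    conn.execute(
        f"CREATE TABLE report (identifier TEXT COLLATE {collation} UNIQUE)"
    )
    for name in MODULES:
        try:
            conn.execute("INSERT INTO report (identifier) VALUES (?)", (name,))
        except sqlite3.IntegrityError:
            # The collation treats this name as a duplicate of an earlier one.
            pass
    (count,) = conn.execute("SELECT COUNT(*) FROM report").fetchone()
    conn.close()
    return count


case_insensitive = count_stored("NOCASE")  # stand-in for a *_ci collation
case_sensitive = count_stored("BINARY")    # stand-in for utf8_bin
print(case_insensitive, case_sensitive)
```

With the case-insensitive collation the second identifier is rejected as a duplicate, while the binary collation keeps both rows.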

#3

Updated by Evgeny Novikov almost 4 years ago

  • Assignee changed from Evgeny Novikov to Vladimir Gratinskiy
  • Priority changed from Urgent to High

The first step was done in 6ec4ca4.

#4

Updated by Evgeny Novikov almost 3 years ago

  • Priority changed from High to Normal

I don't think that this is really important, especially since we switched to PostgreSQL.
