You create your data structure and define a table that will allow UPSERT (insert if the document does not exist) and get one of these errors:
1170 BLOB/TEXT column 'document_id' used in key specification without a key length
Specified key was too long; max key length is 3072 bytes
The Problem
When creating an index there is an implementation-specific limit on the maximum number of bytes allowed per key.
The limit is in bytes and not characters, because the charset of the collation determines how many bytes are needed per character.
Here is a sample table which uses the columns collection and document_id for the primary key:
CREATE TABLE type_varchar (
`collection` TEXT COLLATE utf8mb3_unicode_ci NOT NULL,
`document_id` varchar(255) COLLATE utf8mb3_unicode_ci NOT NULL,
`document_type` longtext COLLATE utf8mb3_unicode_ci NOT NULL,
`document` longtext COLLATE utf8mb3_unicode_ci NOT NULL COMMENT '(DC2Type:json)',
PRIMARY KEY (collection,document_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;
The result will be this error:
SQLSTATE[42000]: Syntax error or access violation: 1170 BLOB/TEXT column 'document_id' used in key specification without a key length
Some people or sources on the web will tell you not to use BLOB/TEXT for the index and suggest using VARCHAR instead:
CREATE TABLE type_varchar (
`collection` varchar(800) COLLATE utf8mb3_unicode_ci NOT NULL,
`document_id` varchar(255) COLLATE utf8mb3_unicode_ci NOT NULL,
`document_type` longtext COLLATE utf8mb3_unicode_ci NOT NULL,
`document` longtext COLLATE utf8mb3_unicode_ci NOT NULL COMMENT '(DC2Type:json)',
PRIMARY KEY (collection,document_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;
The result will be this error:
SQLSTATE[42000]: Syntax error or access violation: Specified key was too long; max key length is 3072 bytes
How To Fix?
Fixing both errors only requires adding a length value of (768) to the column specified in the primary key:
PRIMARY KEY (collection(768),document_id)
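Applied to the VARCHAR variant of the sample table, the full statement looks like this (same columns as above, only the key prefix added):
CREATE TABLE type_varchar (
`collection` varchar(800) COLLATE utf8mb3_unicode_ci NOT NULL,
`document_id` varchar(255) COLLATE utf8mb3_unicode_ci NOT NULL,
`document_type` longtext COLLATE utf8mb3_unicode_ci NOT NULL,
`document` longtext COLLATE utf8mb3_unicode_ci NOT NULL COMMENT '(DC2Type:json)',
PRIMARY KEY (collection(768),document_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;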
But how do you calculate the value for the length?
- get the collation of your column: COLLATE utf8mb3_unicode_ci
- look up the bytes per character for the charset utf8mb3 (or its alias utf8): utf8mb3 needs up to 3 bytes per character
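Instead of looking the byte width up in the documentation, you can also query it, assuming access to information_schema (depending on the server version the 3-byte charset is listed as utf8 or utf8mb3):
SELECT CHARACTER_SET_NAME, MAXLEN -- MAXLEN = maximum bytes per character
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3', 'utf8mb4');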
Based on the second error, a maximum of 3072 bytes is allowed per key column by the database engine (MySQL / MariaDB):
3072 / 3 = 1024
On my MariaDB 10.8.3 instance this value fails, as it looks like the index is calculated with utf8mb4, a 4-byte charset:
3072 / 4 = 768
which works!
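If you want to double-check which charset and collation your key columns are actually defined with, SHOW CREATE TABLE or information_schema will tell you (a quick diagnostic using the table name from the examples):
SHOW CREATE TABLE type_varchar;

SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_NAME = 'type_varchar'
AND COLUMN_NAME IN ('collection', 'document_id');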
Difference between TEXT and VARCHAR
At first glance both support up to 65,535 bytes and only need the space used by the real data, but MySQL imposes a limit of 65,535 bytes on the maximum size of each row.
The size of a VARCHAR column counts towards the maximum row size, while TEXT columns store their data by reference and only need 9-12 bytes inside the row.
That means even if the "theoretical" max size of your VARCHAR field is 65,535 characters, you won't be able to achieve that if you have more than one column in your table.
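To illustrate the row size limit, here is a hypothetical table (not part of the setup above) that is expected to be rejected with a "Row size too large" error: with utf8mb3 every character may need up to 3 bytes, so the two VARCHAR columns alone could require around 120,000 bytes per row:
CREATE TABLE row_size_demo (
`a` varchar(20000) COLLATE utf8mb3_unicode_ci NOT NULL,
`b` varchar(20000) COLLATE utf8mb3_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;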
TEXT
- fixed maximum of 65,535 bytes (no option to limit the size)
- needs 2 + c bytes on disk, where c is the length in bytes of the stored string (depending on the charset)
- no default value
- temporary tables will use a disk based table (the MEMORY engine does not support TEXT/BLOB)
- requires a defined maximum length for the part that becomes a key
VARCHAR(N)
- variable maximum size of N characters (the byte size depends on the charset), where N is between 1 and 65,535 (see Performance implications / Stackexchange)
- takes 1 + c bytes (for N ≤ 255) or 2 + c bytes (for 256 ≤ N ≤ 65,535) of disk space, where c is the length in bytes of the stored string
- works as part of an index, except for N > max key length, which requires manually setting the length of the key
Some of the implementation details depend on the engine, e.g. InnoDB or MyISAM.
Real content using utf8mb4 will use 1 to 4 bytes depending on the character!
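You can see the variable byte width with LENGTH() (bytes) versus CHAR_LENGTH() (characters), assuming the client sends the literals as UTF-8:
SELECT
LENGTH(_utf8mb4 'a') AS bytes_ascii, -- 1 byte
LENGTH(_utf8mb4 'ä') AS bytes_umlaut, -- 2 bytes
LENGTH(_utf8mb4 '€') AS bytes_euro, -- 3 bytes
LENGTH(_utf8mb4 '😀') AS bytes_emoji, -- 4 bytes
CHAR_LENGTH(_utf8mb4 '😀') AS chars_emoji; -- 1 character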
Do Primary/Unique Keys with limited length fail?
Does a shortened index key still provide a unique key that fulfills all requirements of a primary/unique key and does not allow duplicates?
Test with an index key of length 1 - collection(1):
CREATE TABLE type_varchar (
`collection` varchar(800) COLLATE utf8mb3_unicode_ci NOT NULL,
`document_id` varchar(255) COLLATE utf8mb3_unicode_ci NOT NULL,
`document_type` longtext COLLATE utf8mb3_unicode_ci NOT NULL,
`document` longtext COLLATE utf8mb3_unicode_ci NOT NULL COMMENT '(DC2Type:json)',
PRIMARY KEY (collection(1),document_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;
Insert the same content twice:
INSERT INTO `type_varchar` (`collection`, `document_id`, `document_type`, `document`)
VALUES
(REPEAT('1',800), '1', 't1', 'doc1');
INSERT INTO `type_varchar` (`collection`, `document_id`, `document_type`, `document`)
VALUES
(REPEAT('1',800), '1', 't1', 'doc1');
The result is the expected error:
Duplicate entry '1111111111111111111111111111111111111111111111111111111111111...' for key 'PRIMARY'
PRIMARY KEY or UNIQUE KEY with limited length are valid unique indexes!
What are the implications?
- performance will suffer, as lookups may need to read the full rows to compare values beyond the indexed prefix
- the size of the index will be very small
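With the limited-length primary key in place the UPSERT from the introduction works as expected; a minimal sketch using INSERT ... ON DUPLICATE KEY UPDATE against the sample table (REPLACE or INSERT IGNORE would be alternatives with different semantics):
INSERT INTO `type_varchar` (`collection`, `document_id`, `document_type`, `document`)
VALUES
('my-collection', '1', 't1', 'doc1')
ON DUPLICATE KEY UPDATE
`document_type` = VALUES(`document_type`),
`document` = VALUES(`document`);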