DEV Community

Pawan Kukreja
Pawan Kukreja

Posted on

SQL Syntax (Part 02)

Constants

In PostgreSQL, there are three categories of implicitly-typed constants: strings, bit strings, and numbers. Additionally, explicit types may be used to specify constants, allowing for a more precise representation and more effective system management. The subsections that follow cover these options.

String Constants

An arbitrary string of characters enclosed in single quotes (') is referred to as a string constant in SQL. Write two adjacent single quotes, such as 'Dianne's horse,' to include a single-quote character in a string constant. Keep in mind that this differs from a double-quote character (").

String Constants with C-Style Escapes

Additionally, PostgreSQL permits "escape" string constants, an addition to the SQL standard. Just before the first single quotation, write the letter E (in upper- or lowercase), for example, E'foo', to specify an escape string constant. (If you want to continue an escape string constant over many lines, only use E.) A backslash character () starts a backslash escape sequence that looks like C within an escape string.

You are accountable for ensuring that the byte sequences you generate, particularly when employing the octal or hexadecimal escapes, encode legitimate characters according to the server character set. Use of Unicode escapes or an alternate Unicode escape syntax is a good substitute.

String Constants with Unicode Escapes

Another form of escape syntax for strings is also supported by PostgreSQL and allows for the specification of any Unicode characters by code point. Before the starting quotation, without any spaces in between, a Unicode escape string constant begins with U& (an upper- or lower-case letter U followed by an ampersand), for instance, U&'foo'. (Remember that the operator & becomes ambiguous because of this. To get around this issue, use spaces around the operator.) A backslash, a four-digit hexadecimal code point number, or, alternatively, a backslash, a plus sign, and a six-digit hexadecimal code point number can be used to specify Unicode characters in escaped form inside of quotations.

Although the existence of the 6-digit form technically negates the need for this, it is possible to specify UTF-16 surrogate pairs to assemble characters with code points bigger than U+FFFF using either the 4-digit or 6-digit escape form. Surrogate pairs are concatenated into a single code point rather than being stored separately (directly).
The Unicode code point identified by one of these escape sequences is translated to the actual server encoding if the server's encoding is not UTF-8; if that is not feasible, an error is returned.

Additionally, the standard_conforming_strings configuration parameter must be enabled in order for the Unicode escape syntax for string constants to function. This is necessary because, in the absence of it, the syntax may mislead clients that interpret SQL statements, which may result in SQL injections and other security problems. This syntax will be denied with an error notice if the argument is set to off.

Top comments (0)