<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Florian Pigorsch</title>
    <description>The latest articles on DEV Community by Florian Pigorsch (@flopp).</description>
    <link>https://dev.to/flopp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F465517%2Fab82e46a-a444-47b2-93da-74beaf7b2aa7.jpeg</url>
      <title>DEV Community: Florian Pigorsch</title>
      <link>https://dev.to/flopp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/flopp"/>
    <language>en</language>
    <item>
      <title>Regular and Unusual "Space" Characters</title>
      <dc:creator>Florian Pigorsch</dc:creator>
      <pubDate>Sun, 22 Aug 2021 14:26:49 +0000</pubDate>
      <link>https://dev.to/flopp/regular-and-unusual-space-characters-40pm</link>
      <guid>https://dev.to/flopp/regular-and-unusual-space-characters-40pm</guid>
      <description>&lt;h2&gt;
  
  
  Regular Space Characters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/0020"&gt;U+0020 SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;This is the regular space character as produced by pressing the space bar of your keyboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/00A0"&gt;U+00A0 NO-BREAK SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A fixed space that prevents an automatic line break at its position. Abbreviation: NBSP&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2000"&gt;U+2000 EN QUAD&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A 1 en (= 1/2 em) wide space, where 1 em is the height of the current font.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2001"&gt;U+2001 EM QUAD&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A 1 em wide space, where 1 em is the height of the current font.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2002"&gt;U+2002 EN SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A 1 en (= 1/2 em) wide space, where 1 em is the height of the current font.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2003"&gt;U+2003 EM SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A 1 em wide space, where 1 em is the height of the current font.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2004"&gt;U+2004 THREE-PER-EM SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A 1/3 em wide space, where 1 em is the height of the current font. "Thick Space".&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2005"&gt;U+2005 FOUR-PER-EM SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A 1/4 em wide space, where 1 em is the height of the current font. "Mid Space".&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2006"&gt;U+2006 SIX-PER-EM SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A 1/6 em wide space, where 1 em is the height of the current font.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2007"&gt;U+2007 FIGURE SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A space character that is as wide as fixed-width digits. Usually used when typesetting vertically aligned numbers.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2008"&gt;U+2008 PUNCTUATION SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A space character that is as wide as a period (".").&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2009"&gt;U+2009 THIN SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A space between 1/6 em and 1/4 em wide, where 1 em is the height of the current font.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/200A"&gt;U+200A HAIR SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Narrower than the "THIN SPACE", usually the thinnest space character.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/202F"&gt;U+202F NARROW NO-BREAK SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A narrow form of a no-break space, typically the width of a "THIN SPACE". Abbreviation: NNBSP.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/205F"&gt;U+205F MEDIUM MATHEMATICAL SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A 4/18 em wide space, where 1 em is the height of the current font. Usually used when typesetting mathematical formulas.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/3000"&gt;U+3000 IDEOGRAPHIC SPACE&lt;/a&gt;
&lt;/h3&gt;
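
&lt;p&gt;All of the characters listed in this section have the Unicode general category &lt;code&gt;Zs&lt;/code&gt; ("Separator, space"). The following Go sketch shows one way to fold them into a plain U+0020. The helper name &lt;code&gt;normalizeSpaces&lt;/code&gt; is made up for illustration, and the result depends on the Unicode tables shipped with Go's standard &lt;code&gt;unicode&lt;/code&gt; package.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeSpaces is a hypothetical helper: it replaces every rune in the
// Unicode category Zs (space separators) with a plain U+0020 SPACE and
// leaves all other runes (including tabs and newlines) untouched.
func normalizeSpaces(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Zs, r) {
			return ' '
		}
		return r
	}, s)
}

func main() {
	// NO-BREAK SPACE, EM SPACE and IDEOGRAPHIC SPACE, written as escapes.
	in := "a\u00a0b\u2003c\u3000d"
	fmt.Printf("%q\n", normalizeSpaces(in)) // prints "a b c d"
}
```

&lt;p&gt;Note that this also discards the no-break behaviour of U+00A0 and U+202F, so it is probably only appropriate for comparing or searching text, not for typesetting it.&lt;/p&gt;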

&lt;h2&gt;
  
  
  Regular Space Characters with Zero Width
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/200B"&gt;U+200B ZERO WIDTH SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Literally a zero-width space character.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/200C"&gt;‌U+200C ZERO WIDTH NON-JOINER&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;When placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/200D"&gt;‍U+200D ZERO WIDTH JOINER&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms (ligature). Also used to join emoji with modifier characters.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2060"&gt;U+2060 WORD JOINER&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A zero width non-breaking space. Abbreviation: WJ.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/FEFF"&gt;U+FEFF ZERO WIDTH NO-BREAK SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The zero width no-break space (ZWNBSP) is a deprecated use of the Unicode character at code point U+FEFF. Character U+FEFF is intended for use as a Byte Order Mark (BOM) at the start of a file. However, if encountered elsewhere, it should, according to Unicode, be treated as a "zero width no-break space". The deliberate use of U+FEFF for this purpose is deprecated as of Unicode 3.2, with the word joiner strongly preferred.&lt;/p&gt;
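
&lt;p&gt;Because these characters are invisible, they easily survive copy and paste and end up in data where they are not expected. A minimal Go sketch for removing them, assuming a made-up helper named &lt;code&gt;stripZeroWidth&lt;/code&gt; that covers only the five code points described in this section:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// zeroWidth lists the five code points described in this section:
// ZWSP, ZWNJ, ZWJ, WORD JOINER and U+FEFF (BOM / ZWNBSP).
const zeroWidth = "\u200b\u200c\u200d\u2060\ufeff"

// stripZeroWidth is a hypothetical helper that drops those code points.
func stripZeroWidth(s string) string {
	return strings.Map(func(r rune) rune {
		if strings.ContainsRune(zeroWidth, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	in := "\ufeffzero\u200bwidth"
	// Byte lengths before and after stripping.
	fmt.Println(len(in), len(stripZeroWidth(in))) // prints 15 9
}
```

&lt;p&gt;Removing ZWJ and ZWNJ can change how text is rendered (for example joined emoji sequences), so a filter like this is probably best limited to identifiers, keys, and similar machine-oriented strings.&lt;/p&gt;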

&lt;h2&gt;
  
  
  Non-Space Characters that Act Like Spaces
&lt;/h2&gt;

&lt;p&gt;The following characters are probably the most interesting: they act like regular space characters, but are typically not considered as such. Because of this, they can often be used in places where a single (regular) space character is not allowed (e.g. as a YouTube video title, in nicknames in popular games, etc.).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/180E"&gt;U+180E MONGOLIAN VOWEL SEPARATOR&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The MVS is a word-internal thin whitespace that may occur only before the word-final vowels U+1820 MONGOLIAN LETTER A and U+1821 MONGOLIAN LETTER E. It determines the specific form of the character preceding it, selects a special variant shape of these vowels, and produces a small gap within the word. Since Unicode 6.3.0 it is no longer classified as a space character (general category Zs), although it was in earlier versions of the standard.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2800"&gt;U+2800 BRAILLE PATTERN BLANK&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The Braille pattern "dots-0", also called a "blank Braille pattern", is a 6-dot or 8-dot braille cell with no dots raised. It is represented by the Unicode code point U+2800, and in Braille ASCII by a space. In all Braille systems, the Braille pattern dots-0 is used to represent a space or the lack of content. In particular, some fonts display the character as a fixed-width blank. However, the Unicode standard explicitly states that it does not act as a space.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/3164"&gt;U+3164 HANGUL FILLER&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The Hangul Filler character is used to introduce eight-byte Hangul composition sequences and to stand in for an absent element (usually an empty final) in such a sequence. Unicode includes the Wansung code Hangul Filler in the Hangul Compatibility Jamo block for round-trip compatibility, but uses its own system (with its own, differently used, filler characters) for composing Hangul. &lt;/p&gt;
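
&lt;p&gt;The reason these characters slip through "must not be blank" checks can be illustrated with a small Go sketch. The function &lt;code&gt;isBlankNaive&lt;/code&gt; is a hypothetical validation routine, not taken from any particular application. It relies on &lt;code&gt;strings.TrimSpace&lt;/code&gt;, which only trims characters that Unicode marks as white space.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// lookalikes holds the three code points from this section:
// MONGOLIAN VOWEL SEPARATOR, BRAILLE PATTERN BLANK and HANGUL FILLER.
var lookalikes = []rune{'\u180e', '\u2800', '\u3164'}

// isBlankNaive is a hypothetical validation check: it treats a string as
// blank only if nothing remains after trimming Unicode white space.
func isBlankNaive(s string) bool {
	return strings.TrimSpace(s) == ""
}

func main() {
	for _, r := range lookalikes {
		// With current Go releases both columns are expected to be false,
		// i.e. the character is neither white space nor trimmed away.
		fmt.Printf("%U space=%t blank=%t\n", r, unicode.IsSpace(r), isBlankNaive(string(r)))
	}
}
```

&lt;p&gt;A stricter check would need an explicit list of such look-alike code points in addition to the white space test.&lt;/p&gt;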

&lt;h2&gt;
  
  
  Visible Space Characters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2420"&gt;␠ U+2420 SYMBOL FOR SPACE&lt;/a&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2422"&gt;␢ U+2422 BLANK SYMBOL&lt;/a&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://unicode-explorer.com/c/2423"&gt;␣ U+2423 OPEN BOX&lt;/a&gt;
&lt;/h3&gt;

</description>
      <category>unicode</category>
      <category>text</category>
      <category>ux</category>
      <category>hacking</category>
    </item>
    <item>
      <title>Go: Identifiers vs. Unicode</title>
      <dc:creator>Florian Pigorsch</dc:creator>
      <pubDate>Tue, 03 Aug 2021 08:23:29 +0000</pubDate>
      <link>https://dev.to/flopp/golang-identifiers-vs-unicode-1fe7</link>
      <guid>https://dev.to/flopp/golang-identifiers-vs-unicode-1fe7</guid>
      <description>&lt;p&gt;A recent &lt;a href="https://www.reddit.com/r/golang/comments/owe6hn/i_cant_understand_which_characters_are_allowed/"&gt;Reddit post about Unicode characters in Go identifiers&lt;/a&gt; sparked my interest to dive into the &lt;a href="https://golang.org/ref/spec#Identifiers"&gt;Go spec&lt;/a&gt; and look things up properly:&lt;/p&gt;

&lt;p&gt;According to the spec, the syntax for valid identifiers is&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;identifier = letter { letter | unicode_digit }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;with&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;letter = unicode_letter | "_"
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "Letter" category consists of the Unicode categories &lt;code&gt;Lu&lt;/code&gt; (uppercase letters), &lt;code&gt;Ll&lt;/code&gt; (lowercase letters), &lt;code&gt;Lt&lt;/code&gt; (titlecase letters), &lt;code&gt;Lm&lt;/code&gt; (modifier letters), and &lt;code&gt;Lo&lt;/code&gt; (other letters), where "Number, decimal digit" refers to the Unicode category &lt;code&gt;Nd&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So an identifier has to start with either a "letter" or an underscore ("_"), and must contain only "letters", "decimal digits" and "underscores" - according to what's defined as letters and digits in Unicode.&lt;br&gt;
The set of letters is not only the usual &lt;code&gt;A&lt;/code&gt;-&lt;code&gt;Z&lt;/code&gt;, &lt;code&gt;a&lt;/code&gt;-&lt;code&gt;z&lt;/code&gt;, but also letters from other scripts, like Greek letters (e.g. &lt;a href="https://unicode-explorer.com/c/03A3"&gt;&lt;code&gt;Σ&lt;/code&gt;&lt;/a&gt;), or CJK characters (e.g. &lt;a href="https://unicode-explorer.com/c/3B6A"&gt;&lt;code&gt;㭪&lt;/code&gt;&lt;/a&gt;). The same holds for digits - not only &lt;code&gt;0&lt;/code&gt;-&lt;code&gt;9&lt;/code&gt;, but also digits from other scripts are allowed: e.g. &lt;a href="https://unicode-explorer.com/c/0B69"&gt;&lt;code&gt;୩&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://unicode-explorer.com/c/0663"&gt;&lt;code&gt;٣&lt;/code&gt;&lt;/a&gt;, etc.&lt;/p&gt;

&lt;p&gt;Valid identifiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;abc_123&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;_myidentifier&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Σ&lt;/code&gt; (&lt;a href="https://unicode-explorer.com/c/03A3"&gt;U+03A3 GREEK CAPITAL LETTER SIGMA&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;㭪&lt;/code&gt; (&lt;a href="https://unicode-explorer.com/c/3B6A"&gt;some &lt;code&gt;CJK&lt;/code&gt; character&lt;/a&gt; from the &lt;code&gt;Lo&lt;/code&gt; category)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;x٣३߃૩୩3&lt;/code&gt; (&lt;code&gt;x&lt;/code&gt; + decimal digits &lt;code&gt;3&lt;/code&gt; from various scripts)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Invalid identifiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;42&lt;/code&gt; (does not start with a letter)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;😀&lt;/code&gt; (not a letter, but &lt;code&gt;So / Symbol, other&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;⽔&lt;/code&gt; (not a letter, but &lt;code&gt;So / Symbol, other&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;x🌞&lt;/code&gt; (starts with a letter, but contains non-letter/digit characters)&lt;/li&gt;
&lt;/ul&gt;
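
&lt;p&gt;The rule behind the two lists above can be sketched as a small Go function. This is a hand-written illustration of the grammar, not the compiler's actual implementation, and the name &lt;code&gt;isIdentifier&lt;/code&gt; is made up for this example. Unlike &lt;code&gt;token.IsIdentifier&lt;/code&gt; from the standard &lt;code&gt;go/token&lt;/code&gt; package, it does not reject keywords.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentifier is a sketch of the grammar
//   identifier = letter { letter | unicode_digit }
// where letter is a Unicode letter or "_". It does not reject keywords.
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if r == '_' || unicode.IsLetter(r) {
			continue // letters and underscores are allowed anywhere
		}
		if i != 0 {
			if unicode.IsDigit(r) {
				continue // decimal digits (Nd) are allowed after the first rune
			}
		}
		return false
	}
	return true
}

func main() {
	// Examples from the lists above; non-ASCII ones are written as escapes
	// to avoid rendering issues.
	samples := []string{
		"abc_123", "_myidentifier", "\u03a3", "\u3b6a",
		"x\u0663\u0969\u07c3\u0ae9\u0b693",
		"42", "\U0001F600", "\u2f54", "x\U0001F31E",
	}
	for _, s := range samples {
		fmt.Printf("%q valid=%t\n", s, isIdentifier(s))
	}
}
```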

&lt;p&gt;Although Go accepts identifiers that contain characters other than &lt;code&gt;A&lt;/code&gt;-&lt;code&gt;Z&lt;/code&gt;, &lt;code&gt;a&lt;/code&gt;-&lt;code&gt;z&lt;/code&gt;, &lt;code&gt;0&lt;/code&gt;-&lt;code&gt;9&lt;/code&gt;, and &lt;code&gt;_&lt;/code&gt;, it's generally advisable to avoid them, for the sake of readability and accessibility and to prevent rendering issues.&lt;/p&gt;

</description>
      <category>go</category>
      <category>unicode</category>
    </item>
  </channel>
</rss>
