<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rudy Zidan</title>
    <description>The latest articles on DEV Community by Rudy Zidan (@rudyzidan).</description>
    <link>https://dev.to/rudyzidan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F866487%2F3252dd94-e092-4295-9589-102e125af20d.jpeg</url>
      <title>DEV Community: Rudy Zidan</title>
      <link>https://dev.to/rudyzidan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rudyzidan"/>
    <language>en</language>
    <item>
      <title>Own Your Data: The Wake-Up Call</title>
      <dc:creator>Rudy Zidan</dc:creator>
      <pubDate>Fri, 03 Apr 2026 23:39:17 +0000</pubDate>
      <link>https://dev.to/rudyzidan/own-your-data-the-wake-up-call-2k3n</link>
      <guid>https://dev.to/rudyzidan/own-your-data-the-wake-up-call-2k3n</guid>
      <description>&lt;p&gt;Data plays a critical part in our lives. And with the rapid changes driven by the recent evolution of AI, owning your data is no longer optional!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, we need to answer the following question: "Is your data really safe?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On April 1st, 2026&lt;/strong&gt;, an article was published on the Proton blog revealing that Big Tech companies have shared data from 6.9 million user accounts with US authorities over the past decade.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On January 1st, 2026&lt;/strong&gt;, Google published its AI Training Data Transparency Summary, containing the following:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fxm84qyqm6ra5cds6mw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fxm84qyqm6ra5cds6mw.png" alt=" " width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is Google basically saying: "We use your data to train our AI models, but trust us, we're careful about it."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On November 24, 2025&lt;/strong&gt;, Al Jazeera published an article containing a striking statement: "Deleting your Meta accounts does not eliminate the possibility of Meta AI using your past public data," Meta’s spokesperson said.&lt;/p&gt;

&lt;p&gt;In summary, most Big Tech companies are using your data. They scrape the internet and use whatever they can. Even Anthropic, the company behind Claude, acknowledges in their Collection of Personal Data section that training data does incidentally include personal information. So watch out for what you share!&lt;/p&gt;

&lt;p&gt;"&lt;a href="https://www.youtube.com/watch?v=xc3VG9JZM6I" rel="noopener noreferrer"&gt;I can only show you the door. You're the one that has to walk through it&lt;/a&gt;" - Morpheus, The Matrix (1999).&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://proton.me/blog/big-tech-government-requests-parenting#:~:text=Over%20the%20past,to%206.9%20million." rel="noopener noreferrer"&gt;Proton's Article&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://proton.me/blog/big-tech-data-requests-surge" rel="noopener noreferrer"&gt;Proton's research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://transparencyreport.google.com/user-data/overview" rel="noopener noreferrer"&gt;Google's transparency report for user data requests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://storage.googleapis.com/transparencyreport/report-downloads/pdf-report-jj_2026-1-1_2026-1-1_en_v1.pdf" rel="noopener noreferrer"&gt;Google's AI Training Data Transparency Summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.aljazeera.com/news/2025/11/24/are-tech-companies-using-your-private-data-to-train-ai-models#:~:text=Deleting%20your%20Meta%20accounts%20does%20not%20eliminate%20the%20possibility%20of%20Meta%20AI%20using%20your%20past%20public%20data%2C%20Meta%E2%80%99s%20spokesperson%20said." rel="noopener noreferrer"&gt;Al Jazeera's Article&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://privacy.claude.com/en/articles/10023555-how-do-you-use-personal-data-in-model-training#:~:text=We%20do%20not%20actively,to%20any%20third%20party." rel="noopener noreferrer"&gt;Anthropic's Collection of Personal Data (Claude Privacy)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>data</category>
      <category>privacy</category>
    </item>
    <item>
      <title>CITEXT vs LOWER</title>
      <dc:creator>Rudy Zidan</dc:creator>
      <pubDate>Fri, 17 Mar 2023 18:28:25 +0000</pubDate>
      <link>https://dev.to/rudyzidan/citext-vs-lower-49of</link>
      <guid>https://dev.to/rudyzidan/citext-vs-lower-49of</guid>
      <description>&lt;p&gt;Today we're going to compare two approaches to case-insensitive text search in PostgreSQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting with CITEXT
&lt;/h2&gt;

&lt;p&gt;CITEXT is a case-insensitive character string type. It stores the value exactly as entered; it does not convert values to lower case.&lt;/p&gt;
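
&lt;p&gt;A minimal sketch of what that means in practice (assuming the &lt;strong&gt;citext&lt;/strong&gt; extension is installed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS citext;

-- Comparisons ignore case...
SELECT 'Hello'::citext = 'hello'::citext;
// = true

-- ...but the stored value keeps its original casing
SELECT 'HeLLo'::citext;
// = HeLLo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;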

&lt;p&gt;But how does it work?&lt;br&gt;
It actually uses &lt;strong&gt;LOWER()&lt;/strong&gt; behind the scenes whenever it does a comparison.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/*
 * citextcmp()
 * Internal comparison function for citext strings.
 * Returns int32 negative, zero, or positive.
 */
static int32
citextcmp(text *left, text *right, Oid collid)
{
    char       *lcstr,
               *rcstr;
    int32       result;

    /*
     * We must do our str_tolower calls with DEFAULT_COLLATION_OID, not the
     * input collation as you might expect.  This is so that the behavior of
     * citext's equality and hashing functions is not collation-dependent.  We
     * should change this once the core infrastructure is able to cope with
     * collation-dependent equality and hashing functions.
     */

    lcstr = str_tolower(VARDATA_ANY(left), VARSIZE_ANY_EXHDR(left), DEFAULT_COLLATION_OID);
    rcstr = str_tolower(VARDATA_ANY(right), VARSIZE_ANY_EXHDR(right), DEFAULT_COLLATION_OID);

    result = varstr_cmp(lcstr, strlen(lcstr),
                        rcstr, strlen(rcstr),
                        collid);

    pfree(lcstr);
    pfree(rcstr);

    return result;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, does using &lt;strong&gt;CITEXT&lt;/strong&gt; add overhead to the query?&lt;br&gt;
... actually, it depends. Performance will vary with your database collation, but with an index the results will be close.&lt;/p&gt;

&lt;p&gt;Let's dig deeper. My current collation is "en_US.utf8":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SHOW lc_collate;
// = en_US.utf8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TEXT&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE text_data(t text NOT NULL);

INSERT INTO text_data
   SELECT i||'text'
   FROM generate_series(1, 1000000) AS i;

VACUUM (FREEZE, ANALYZE) text_data;

explain analyze (
  SELECT * FROM text_data WHERE t = 'ted'
)

//output
"Gather  (cost=1000.00..11613.43 rows=1 width=10) (actual time=45.528..49.233 rows=0 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  -&amp;gt;  Parallel Seq Scan on text_data  (cost=0.00..10613.33 rows=1 width=10) (actual time=42.385..42.385 rows=0 loops=3)"
"        Filter: (t = 'ted'::text)"
"        Rows Removed by Filter: 333333"
"Planning Time: 0.112 ms"
"Execution Time: 49.258 ms"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CITEXT&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE citext_data(t citext NOT NULL);

INSERT INTO citext_data
   SELECT i||'text'
   FROM generate_series(1, 1000000) AS i;

VACUUM (FREEZE, ANALYZE) citext_data;

explain analyze (
  SELECT * FROM citext_data WHERE t = 'ted'
)

//output
"Gather  (cost=1000.00..11613.43 rows=1 width=10) (actual time=291.318..294.635 rows=0 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  -&amp;gt;  Parallel Seq Scan on citext_data  (cost=0.00..10613.33 rows=1 width=10) (actual time=288.713..288.714 rows=0 loops=3)"
"        Filter: (t = 'ted'::citext)"
"        Rows Removed by Filter: 333333"
"Planning Time: 0.045 ms"
"Execution Time: 294.655 ms"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So &lt;strong&gt;CITEXT&lt;/strong&gt; performed about 6 times slower than &lt;strong&gt;TEXT&lt;/strong&gt;.&lt;br&gt;
That actually makes sense, because the query runs a sequential scan with a million lower-cased comparisons. If we use an index instead, the two should be close to each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TEXT&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX ON text_data (t);

VACUUM (FREEZE, ANALYZE) text_data;

explain analyze (
SELECT * FROM text_data WHERE t = 'ted'
)

//output
"Index Only Scan using text_data_t_idx on text_data  (cost=0.42..4.44 rows=1 width=10) (actual time=0.101..0.101 rows=0 loops=1)"
"  Index Cond: (t = 'ted'::text)"
"  Heap Fetches: 0"
"Planning Time: 0.113 ms"
"Execution Time: 0.116 ms"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CITEXT&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX ON citext_data (t);

VACUUM (FREEZE, ANALYZE) citext_data;

explain analyze (
SELECT * FROM citext_data WHERE t = 'ted'
)

//output
"Index Only Scan using citext_data_t_idx on citext_data  (cost=0.42..4.44 rows=1 width=10) (actual time=0.091..0.091 rows=0 loops=1)"
"  Index Cond: (t = 'ted'::citext)"
"  Heap Fetches: 0"
"Planning Time: 0.111 ms"
"Execution Time: 0.105 ms"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  So where is LOWER() from all of this?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LOWER()&lt;/strong&gt; is just a string function. It converts the string to all lower case, according to the rules of the database's locale.&lt;/p&gt;
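
&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT lower('HeLLo World');
// = hello world
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;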

&lt;p&gt;By default, &lt;strong&gt;LOWER()&lt;/strong&gt; does not use the B-tree index we already created on the &lt;strong&gt;text_data&lt;/strong&gt; table.&lt;br&gt;
We can confirm this with the following example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;explain analyze (
  SELECT * FROM text_data WHERE lower(t) = 'ted'
)

//output
"Gather  (cost=1000.00..13155.00 rows=5000 width=10) (actual time=200.007..202.697 rows=0 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  -&amp;gt;  Parallel Seq Scan on text_data  (cost=0.00..11655.00 rows=2083 width=10) (actual time=197.425..197.426 rows=0 loops=3)"
"        Filter: (lower(t) = 'ted'::text)"
"        Rows Removed by Filter: 333333"
"Planning Time: 0.109 ms"
"Execution Time: 202.716 ms"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To improve its performance, we will have to add a functional index.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX lower_text_data ON text_data (lower(t));

VACUUM (FREEZE, ANALYZE) text_data;

explain analyze (
  SELECT * FROM text_data WHERE lower(t) = 'ted'
)

//output
"Index Scan using lower_text_data on text_data  (cost=0.42..8.44 rows=1 width=10) (actual time=0.057..0.057 rows=0 loops=1)"
"  Index Cond: (lower(t) = 'ted'::text)"
"Planning Time: 0.189 ms"
"Execution Time: 0.068 ms"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running multiple examples, &lt;strong&gt;CITEXT&lt;/strong&gt; and &lt;strong&gt;LOWER()&lt;/strong&gt; perform comparably once the right index is in place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CITEXT&lt;/strong&gt; uses a plain B-tree index, so if you already have one you don't need to create another, while &lt;strong&gt;LOWER()&lt;/strong&gt; needs a functional index in order to perform well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LOWER()&lt;/strong&gt; also makes your SQL statements verbose, and you always have to remember to apply &lt;strong&gt;LOWER()&lt;/strong&gt; to both the column and the value.&lt;/p&gt;
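
&lt;p&gt;For illustration, with a hypothetical &lt;strong&gt;users&lt;/strong&gt; table the lookup would look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- LOWER() must wrap both the column and the value
SELECT * FROM users WHERE lower(email) = lower('Ted@Example.com');

-- vs. with a citext column
SELECT * FROM users WHERE email = 'Ted@Example.com';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;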

&lt;p&gt;&lt;strong&gt;CITEXT&lt;/strong&gt; allows a primary key to be case-insensitive, while &lt;strong&gt;LOWER()&lt;/strong&gt; does not.&lt;/p&gt;
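
&lt;p&gt;A minimal sketch of that, using a hypothetical &lt;strong&gt;users&lt;/strong&gt; table (assuming the citext extension is installed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE users(email citext PRIMARY KEY);

INSERT INTO users VALUES ('Ted@example.com');
INSERT INTO users VALUES ('ted@EXAMPLE.com');
// ERROR: duplicate key value violates unique constraint "users_pkey"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;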

</description>
      <category>postgres</category>
      <category>citext</category>
      <category>btree</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
