<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SAVI</title>
    <description>The latest articles on DEV Community by SAVI (@savi444).</description>
    <link>https://dev.to/savi444</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3918444%2F27f5a2a3-a2a5-463a-8102-3d27f6ba83e7.png</url>
      <title>DEV Community: SAVI</title>
      <link>https://dev.to/savi444</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/savi444"/>
    <language>en</language>
    <item>
      <title>Tokens per Word: GPT-5 vs Claude vs GPT-4, Measured Across 7 Languages</title>
      <dc:creator>SAVI</dc:creator>
      <pubDate>Wed, 10 Jun 2026 20:32:58 +0000</pubDate>
      <link>https://dev.to/savi444/tokens-per-word-gpt-5-vs-claude-vs-gpt-4-measured-across-7-languages-4419</link>
      <guid>https://dev.to/savi444/tokens-per-word-gpt-5-vs-claude-vs-gpt-4-measured-across-7-languages-4419</guid>
      <description>&lt;p&gt;Most token-cost guides repeat the same rule of thumb: one token is about three quarters of an English word. That figure is roughly right for English on a modern tokenizer, and increasingly wrong for everything else. Published numbers are surprisingly thin, so we measured it.&lt;/p&gt;

&lt;h2&gt;Method&lt;/h2&gt;

&lt;p&gt;We built a 13-sample corpus: the same passage (94 words in its English version) in human translation across seven languages (English, Spanish, Portuguese, French, German, Chinese, and Japanese), so the comparison holds meaning constant rather than length, plus Python, JavaScript, JSON, Markdown, emoji-heavy social text, and CSV data.&lt;/p&gt;

&lt;p&gt;GPT counts come from tiktoken (o200k_base for GPT-5 and GPT-4o, cl100k_base for GPT-4, p50k_base for the GPT-3 era), so they are exact. Claude counts come from Anthropic's official count-tokens endpoint and are envelope-calibrated: we measured the fixed message wrapper of 6 to 7 tokens and subtracted it, and a doubling check came back with zero drift.&lt;/p&gt;
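
&lt;p&gt;The envelope calibration can be sketched in a few lines of Python. This is a minimal illustration of one plausible reading of the procedure, not the script behind the dataset: the function names and the raw counts are hypothetical, and it assumes the doubled text is joined so that no tokens merge across the boundary.&lt;/p&gt;

```python
# Sketch of envelope calibration: the counting endpoint reports
# content tokens plus a fixed per-message wrapper, so the wrapper
# is measured once and subtracted from every raw count.

def content_tokens(raw_count: int, envelope: int) -> int:
    """Remove the fixed message wrapper from a raw endpoint count."""
    return raw_count - envelope


def doubling_drift(raw_single: int, raw_double: int, envelope: int) -> int:
    """Doubling check: sending the text twice in one message should
    double the content tokens while the wrapper is counted only once.
    Returns the deviation from that expectation (0 means no drift)."""
    single = content_tokens(raw_single, envelope)
    double = content_tokens(raw_double, envelope)
    return double - 2 * single


# Hypothetical raw counts, chosen for illustration only.
ENVELOPE = 7
RAW_SINGLE = 123  # endpoint count for the passage sent once
RAW_DOUBLE = 239  # endpoint count for the passage sent twice

print(content_tokens(RAW_SINGLE, ENVELOPE))
print(doubling_drift(RAW_SINGLE, RAW_DOUBLE, ENVELOPE))
```

&lt;p&gt;A nonzero drift would suggest the wrapper is not a constant, in which case a simple subtraction would not be enough.&lt;/p&gt;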

&lt;h2&gt;Tokens per word, same passage&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Words&lt;/th&gt;
&lt;th&gt;GPT-5 (o200k) tokens&lt;/th&gt;
&lt;th&gt;GPT-5 tokens/word&lt;/th&gt;
&lt;th&gt;GPT-4 (cl100k) tokens&lt;/th&gt;
&lt;th&gt;Claude Sonnet 4.6 tokens&lt;/th&gt;
&lt;th&gt;Claude Opus 4.8 tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;94&lt;/td&gt;
&lt;td&gt;110&lt;/td&gt;
&lt;td&gt;1.17&lt;/td&gt;
&lt;td&gt;110&lt;/td&gt;
&lt;td&gt;116&lt;/td&gt;
&lt;td&gt;177&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spanish&lt;/td&gt;
&lt;td&gt;107&lt;/td&gt;
&lt;td&gt;143&lt;/td&gt;
&lt;td&gt;1.34&lt;/td&gt;
&lt;td&gt;172&lt;/td&gt;
&lt;td&gt;184&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Portuguese&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;137&lt;/td&gt;
&lt;td&gt;1.34&lt;/td&gt;
&lt;td&gt;176&lt;/td&gt;
&lt;td&gt;188&lt;/td&gt;
&lt;td&gt;241&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;French&lt;/td&gt;
&lt;td&gt;109&lt;/td&gt;
&lt;td&gt;153&lt;/td&gt;
&lt;td&gt;1.40&lt;/td&gt;
&lt;td&gt;194&lt;/td&gt;
&lt;td&gt;207&lt;/td&gt;
&lt;td&gt;275&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;German&lt;/td&gt;
&lt;td&gt;93&lt;/td&gt;
&lt;td&gt;159&lt;/td&gt;
&lt;td&gt;1.71&lt;/td&gt;
&lt;td&gt;203&lt;/td&gt;
&lt;td&gt;245&lt;/td&gt;
&lt;td&gt;324&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chinese&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;159&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;223&lt;/td&gt;
&lt;td&gt;217&lt;/td&gt;
&lt;td&gt;216&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Japanese&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;205&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;268&lt;/td&gt;
&lt;td&gt;241&lt;/td&gt;
&lt;td&gt;240&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
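
&lt;p&gt;The tokens-per-word column is simply the GPT-5 token count divided by the word count. A short Python sketch, using only numbers copied from the table (the dictionary and function names are ours), reproduces it:&lt;/p&gt;

```python
# Word counts and GPT-5 (o200k_base) token counts copied from the table.
WORDS = {"English": 94, "Spanish": 107, "Portuguese": 102,
         "French": 109, "German": 93}
GPT5_TOKENS = {"English": 110, "Spanish": 143, "Portuguese": 137,
               "French": 153, "German": 159}


def tokens_per_word(tokens: int, words: int) -> float:
    """Tokens divided by words, rounded to two decimals as in the table."""
    return round(tokens / words, 2)


for language, words in WORDS.items():
    print(language, tokens_per_word(GPT5_TOKENS[language], words))
```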

&lt;h2&gt;The four findings that surprised us&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Spanish costs 30% more than English on GPT-5, but it used to be much worse.&lt;/strong&gt; The same passage took 56% more tokens than English on GPT-4's cl100k_base and more than double on the GPT-3-era p50k_base. o200k_base roughly doubled the vocabulary and spent it on human languages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude has two counting regimes.&lt;/strong&gt; Anthropic's count-tokens endpoint reports identical numbers for Sonnet 4.6 and Haiku 4.5, but roughly 1.3x to 1.5x as many tokens for Opus 4.8 on Latin-script text (Chinese and Japanese barely change). Since billing follows each model's own count, Opus costs about 2.5x Sonnet per English word despite a 1.67x sticker ratio on the per-token price.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSV is the most expensive thing you can send.&lt;/strong&gt; It takes 57 tokens per 100 characters, versus 19 for English prose. Digits, dates, and separators fragment into many small tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;o200k did not help code.&lt;/strong&gt; Our JavaScript sample actually costs slightly more on o200k than on cl100k. The multilingual gains did not carry over to source code.&lt;/li&gt;
&lt;/ol&gt;
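
&lt;p&gt;The percentages in findings 1 and 2 follow from the token counts in the table. The sketch below redoes that arithmetic in Python; the helper name is ours, and the 1.67 price ratio is the sticker ratio quoted in finding 2.&lt;/p&gt;

```python
def overhead_pct(tokens: int, baseline_tokens: int) -> int:
    """Extra tokens relative to a baseline, as a whole percentage."""
    return round((tokens / baseline_tokens - 1) * 100)


# Spanish vs English on the same passage.
print(overhead_pct(143, 110))  # GPT-5 (o200k_base) counts
print(overhead_pct(172, 110))  # GPT-4 (cl100k_base) counts

# Opus vs Sonnet on the English passage: the effective cost ratio is
# the token-count ratio multiplied by the per-token price ratio.
token_ratio = 177 / 116
price_ratio = 1.67
print(round(token_ratio * price_ratio, 2))
```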

&lt;h2&gt;What a million words costs (input, measured)&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;GPT-5&lt;/th&gt;
&lt;th&gt;Claude Haiku 4.5&lt;/th&gt;
&lt;th&gt;Claude Sonnet 4.6&lt;/th&gt;
&lt;th&gt;Claude Opus 4.8&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;$1.46&lt;/td&gt;
&lt;td&gt;$1.23&lt;/td&gt;
&lt;td&gt;$3.70&lt;/td&gt;
&lt;td&gt;$9.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spanish&lt;/td&gt;
&lt;td&gt;$1.67&lt;/td&gt;
&lt;td&gt;$1.72&lt;/td&gt;
&lt;td&gt;$5.16&lt;/td&gt;
&lt;td&gt;$11.96&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;German&lt;/td&gt;
&lt;td&gt;$2.14&lt;/td&gt;
&lt;td&gt;$2.63&lt;/td&gt;
&lt;td&gt;$7.90&lt;/td&gt;
&lt;td&gt;$17.42&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
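
&lt;p&gt;Each cell is tokens per word multiplied by the model's input price per million tokens. The Python sketch below reproduces a few cells. The prices in it are assumptions inferred by working backwards from the table, not figures stated in this article, so check them against current price lists before reusing the numbers.&lt;/p&gt;

```python
# Input prices in USD per million tokens. ASSUMED: inferred from the
# cost table, not stated in the article.
PRICE_PER_MTOK = {
    "GPT-5": 1.25,
    "Claude Haiku 4.5": 1.00,
    "Claude Sonnet 4.6": 3.00,
    "Claude Opus 4.8": 5.00,
}


def cost_per_million_words(tokens: int, words: int, price: float) -> float:
    """tokens / words is also millions of tokens per million words,
    so multiplying by the per-million-token price gives USD."""
    return round(tokens / words * price, 2)


# English passage: 94 words, token counts from the first table.
print(cost_per_million_words(110, 94, PRICE_PER_MTOK["GPT-5"]))
print(cost_per_million_words(116, 94, PRICE_PER_MTOK["Claude Sonnet 4.6"]))
print(cost_per_million_words(177, 94, PRICE_PER_MTOK["Claude Opus 4.8"]))
```

&lt;p&gt;Haiku 4.5 uses the Sonnet token counts, since the endpoint reports identical numbers for both.&lt;/p&gt;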

&lt;h2&gt;Reproduce it&lt;/h2&gt;

&lt;p&gt;The full dataset, corpus, and methodology are free under CC BY 4.0:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://textkit.tech/blog/tokens-per-word-tokenizer-comparison-2026" rel="noopener noreferrer"&gt;Full article with all tables&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://textkit.tech/data/tokenizer-comparison-2026.csv" rel="noopener noreferrer"&gt;tokenizer-comparison-2026.csv&lt;/a&gt; and &lt;a href="https://textkit.tech/data/tokenizer-comparison-2026.json" rel="noopener noreferrer"&gt;JSON&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://textkit.tech/data/tokenizer-corpus-2026.json" rel="noopener noreferrer"&gt;The 13-sample corpus&lt;/a&gt;, so you can verify every count&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To sanity-check the GPT numbers interactively, our &lt;a href="https://textkit.tech/token-counter" rel="noopener noreferrer"&gt;browser-local token counter&lt;/a&gt; runs the real o200k encoding client-side, so counts match this dataset exactly and your text never leaves the page.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
      <category>openai</category>
    </item>
    <item>
      <title>Validating JSON in 2026: Browser vs VS Code vs jq</title>
      <dc:creator>SAVI</dc:creator>
      <pubDate>Thu, 07 May 2026 17:33:03 +0000</pubDate>
      <link>https://dev.to/savi444/validating-json-in-2026-browser-vs-vs-code-vs-jq-48i0</link>
      <guid>https://dev.to/savi444/validating-json-in-2026-browser-vs-vs-code-vs-jq-48i0</guid>
      <description>&lt;p&gt;Last week I watched a senior engineer paste a 4MB JSON payload into VS Code and wait 12 seconds for the linter to finish. Then they Cmd+F'd for the missing bracket. Then they gave up, re-ran the API call, and tried again.&lt;/p&gt;

&lt;p&gt;That whole detour was unnecessary. There are three places JSON validation lives in 2026 — the browser, the editor, and the shell — and each one wins in a different scenario. Mixing them up costs a few minutes a day. Once you stop mixing them up, you stop noticing the friction at all.&lt;/p&gt;

&lt;p&gt;This is a field guide to which tool actually wins for which task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why "just use VS Code" is wrong&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VS Code's built-in JSON validator is excellent. It is also the worst answer to several common questions, because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;It loads the whole file into memory. A 50MB log dump from CloudWatch hangs the renderer for seconds before you can scroll.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It validates syntax, not shape. Without a $schema reference, you get a green checkmark for any structurally valid blob — no help with "is this the response my API returned yesterday?"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is coupled to a workspace. Pasting an ad-hoc payload into a scratch buffer means picking a filename, picking a folder, and ignoring unsaved-buffer warnings on quit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is not on the production server. When you SSH into a box at 2am to inspect what an API returned, the editor is irrelevant.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;VS Code is right when JSON is part of a project — config files, fixtures, schema-validated payloads. It is wrong for almost everything else.&lt;/p&gt;
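
&lt;p&gt;The syntax-versus-shape gap from point 2 is easy to see in code. Here is a minimal Python sketch using only the standard library; the payload, the misspelled key, and the function names are made up for illustration, and a real project would more likely use a JSON Schema validator.&lt;/p&gt;

```python
import json


def is_valid_json(text: str) -> bool:
    """Syntax check only: does the text parse as JSON?"""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def missing_keys(text: str, required: set) -> set:
    """Minimal shape check: which required top-level keys are absent?"""
    payload = json.loads(text)
    if not isinstance(payload, dict):
        return set(required)
    return set(required) - set(payload)


# Hypothetical payload with a misspelled "email" key.
blob = '{"id": 7, "emial": "a@example.com"}'
print(is_valid_json(blob))                  # True: the syntax is fine
print(missing_keys(blob, {"id", "email"}))  # {'email'}: the shape is not
```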

&lt;p&gt;&lt;strong&gt;When the browser wins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most JSON validation in practice is one-shot: "is this blob I just pasted valid?" or "what does this nested structure actually look like?"&lt;/p&gt;

&lt;p&gt;For that, a browser-based formatter is the fastest answer. No file to save, no buffer to close, no schema to wire up. You paste, you see green or red, you keep moving.&lt;/p&gt;

&lt;p&gt;I use &lt;a href="https://textkit.tech/json-formatter" rel="noopener noreferrer"&gt;TextKit's JSON formatter&lt;/a&gt; for this. The reason is boring and important: it pretty-prints, minifies, and validates without sending the payload anywhere. Everything happens client-side. That matters when the JSON contains a customer email or an internal token — and it almost always does, because if it did not, you would not be poking at it.&lt;/p&gt;

&lt;p&gt;The browser wins specifically when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The payload is between 1KB and 5MB (above that, browsers struggle too).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You do not need to query the data, just look at it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You want to compare two payloads side by side. Format both, then drop them into TextKit's text diff tool — it respects whitespace and shows the structural drift line by line. This is the single fastest way to answer "what changed between yesterday's API response and today's?"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You are on someone else's machine and you do not trust their Node version.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A note on copy-paste hygiene. JSON pulled from server logs usually comes with line numbers, timestamps, or wrapper text glued to it. Cleaning that up by hand is irritating. A regex-capable &lt;a href="https://textkit.tech/find-replace" rel="noopener noreferrer"&gt;find and replace&lt;/a&gt; can strip the prefix in one pass — the pattern &lt;strong&gt;^\d{4}-\d{2}-\d{2}T[^\s]+\s+&lt;/strong&gt; covers the ISO-timestamp prefix that breaks most parsers.&lt;/p&gt;
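
&lt;p&gt;For reference, here is the same pattern applied with Python's standard re module. This is a sketch under assumptions: the log lines are invented, and each line is assumed to hold one timestamp followed by one JSON object.&lt;/p&gt;

```python
import json
import re

# The timestamp-prefix pattern quoted above, matched at each line start.
TIMESTAMP_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}T[^\s]+\s+", re.MULTILINE)


def strip_timestamps(log_text: str) -> str:
    """Remove a leading ISO timestamp from every line."""
    return TIMESTAMP_PREFIX.sub("", log_text)


# Hypothetical log lines for illustration.
log = (
    '2026-05-07T17:33:03Z {"user": "a", "status": "active"}\n'
    '2026-05-07T17:33:04Z {"user": "b", "status": "cancelled"}\n'
)

for line in strip_timestamps(log).splitlines():
    print(json.loads(line)["status"])
```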

&lt;p&gt;&lt;strong&gt;When VS Code wins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VS Code is the right tool when JSON is part of the project, not a one-off blob.&lt;/p&gt;

&lt;p&gt;The clearest signal: the file lives in your repo and someone else will edit it next week. That includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;tsconfig.json, package.json, .eslintrc.json, and the rest of the config family.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;API request and response fixtures in a test directory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenAPI specs and JSON Schema definitions where IntelliSense matters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Any file where Git history needs to track changes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these cases the editor's schema validation is the entire point. Reach for the browser and you lose Cmd-click navigation, autocomplete, type errors against the schema, and the "Format Document" keybind your hands already know.&lt;/p&gt;

&lt;p&gt;A useful trick: if you are handed a JSON payload and you know it is about to become a fixture, paste it into a scratch buffer first, save it as *.fixture.json, and let VS Code start nagging immediately. The friction is the value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When jq wins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;jq is the only correct answer for three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;JSON over 10MB. No browser, no editor, just pipe it through jq (its --stream mode helps when the file is too large to hold in memory).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;JSON you need to query, not just read. "Give me the email of every user where subscription.status == 'cancelled'" is a one-liner in jq and a ten-minute exercise in any UI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;JSON inside a pipeline. When the input is curl ... | jq and the output feeds another command, no GUI tool fits.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The jq syntax curve is real, but the floor is low. Three commands cover 80% of practical usage:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pretty-print and validate (exits non-zero on bad JSON)&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;response.json | jq &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Pull a nested field out&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;response.json | jq &lt;span class="s1"&gt;'.data.users[0].email'&lt;/span&gt;

&lt;span class="c"&gt;# Filter an array by predicate&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;response.json | jq &lt;span class="s1"&gt;'.data.users[] | select(.status == "cancelled") | .email'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are new to jq, the first command alone — jq . — replaces 90% of "is this valid?" workflows once you internalize it. The second and third are what make it irreplaceable.&lt;/p&gt;

&lt;p&gt;The cost of jq: it is installed-on-this-machine-or-not, and on Windows the experience is rough enough that most engineers do not bother. That is why the browser tool stays in the rotation even on jq-friendly setups.&lt;/p&gt;
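
&lt;p&gt;On a machine without jq but with Python, a rough stand-in for the filter query is a few lines of standard-library code. This is a sketch, not a jq replacement: the function name and the sample payload are hypothetical, and the payload is assumed to have the same .data.users shape as the jq examples.&lt;/p&gt;

```python
import json


def cancelled_emails(document: str) -> list:
    # Rough equivalent of the jq filter:
    #   .data.users[] | select(.status == "cancelled") | .email
    users = json.loads(document)["data"]["users"]
    return [u.get("email") for u in users if u.get("status") == "cancelled"]


# Hypothetical payload shaped like the jq examples.
sample = json.dumps({"data": {"users": [
    {"email": "a@example.com", "status": "active"},
    {"email": "b@example.com", "status": "cancelled"},
]}})

print(cancelled_emails(sample))
```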

&lt;p&gt;&lt;strong&gt;A decision tree that fits on an index card&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use this and you will be right 95% of the time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;File is in the repo? → VS Code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Payload over 10MB, or you need to query inside it? → jq.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;One-shot blob, you just want to look at it or share a clean version? → browser-based formatter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Comparing two payloads? → format both in the browser, diff them, then reach for jq if you need to drill in.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
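
&lt;p&gt;Written as code, the first three branches look roughly like this. It is only an illustrative Python sketch: the function and parameter names are ours, and the 10MB threshold is taken as 10 * 1024 * 1024 bytes, which is an assumption about how the limit is measured.&lt;/p&gt;

```python
TEN_MB = 10 * 1024 * 1024  # assumed byte value for the 10MB threshold


def pick_tool(in_repo: bool, size_bytes: int, needs_query: bool) -> str:
    """Return the tool the decision tree suggests for a JSON payload."""
    if in_repo:
        return "VS Code"
    if size_bytes > TEN_MB or needs_query:
        return "jq"
    return "browser formatter"


print(pick_tool(in_repo=True, size_bytes=2_000, needs_query=False))
print(pick_tool(in_repo=False, size_bytes=50_000_000, needs_query=False))
print(pick_tool(in_repo=False, size_bytes=200_000, needs_query=False))
```

&lt;p&gt;The comparison branch is a workflow rather than a single choice, so it is left out of the function.&lt;/p&gt;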

&lt;p&gt;The trap is reaching for VS Code by default because it is already open. It is the most powerful of the three for project work and the slowest for ad-hoc inspection. The browser tool exists for the second category, and the few seconds it saves per inspection compound into real time across a week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A note on "online JSON validators"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A surprising number of online JSON tools POST your payload to a server. For small public data this is fine. For anything containing PII, internal IDs, API keys, customer records, or anything that ends up in a SOC 2 audit trail — it is not. The fix is to use a tool that runs entirely in the browser and to verify it by opening the network tab and watching for outbound requests during a paste-and-format cycle. If you see none, you are safe.&lt;/p&gt;

&lt;p&gt;This is also why "JSON beautifier" search results from 2018 are not a great answer in 2026. Most of those domains were sold, the privacy posture changed, and there is no easy way to audit what they do with what you paste. Pick a tool whose source-of-truth you can verify, then stop thinking about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next time you are about to paste 200 lines of JSON into your editor's scratch buffer, try &lt;a href="https://textkit.tech/json-formatter" rel="noopener noreferrer"&gt;TextKit's JSON formatter&lt;/a&gt; instead. If it is a structural diff problem, &lt;a href="https://textkit.tech/text-diff" rel="noopener noreferrer"&gt;TextKit's text diff&lt;/a&gt; takes the formatted output and shows exactly which keys moved.&lt;br&gt;
Save VS Code for the files that actually live in the repo.&lt;/p&gt;

</description>
      <category>json</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
