<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Patrick Ryan</title>
    <description>The latest articles on DEV Community by Patrick Ryan (@patrickryankenneth).</description>
    <link>https://dev.to/patrickryankenneth</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4005742%2F57d82189-fdf8-4545-b29d-5b7a21602aa3.jpg</url>
      <title>DEV Community: Patrick Ryan</title>
      <link>https://dev.to/patrickryankenneth</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/patrickryankenneth"/>
    <language>en</language>
    <item>
      <title>Why *= 2 is slower than = col * 2 in Pandas — and what strace shows about abstraction cost</title>
      <dc:creator>Patrick Ryan</dc:creator>
      <pubDate>Sat, 27 Jun 2026 19:08:53 +0000</pubDate>
      <link>https://dev.to/patrickryankenneth/why-2-is-slower-than-col-2-in-pandas-and-what-strace-shows-about-abstraction-cost-10kb</link>
      <guid>https://dev.to/patrickryankenneth/why-2-is-slower-than-col-2-in-pandas-and-what-strace-shows-about-abstraction-cost-10kb</guid>
      <description>&lt;p&gt;Doubling one DataFrame column looks trivial, but I got curious about what the different Pandas update strategies actually cost.&lt;/p&gt;

&lt;p&gt;I separated the benchmark into two layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Warm in-memory execution&lt;/strong&gt; — how fast each Pandas update strategy runs after imports are already complete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cold-start syscall surface&lt;/strong&gt; — how much work the runtime performs before the useful operation even begins.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;On 20,000 rows with 1,000 randomized round-robin iterations, the result surprised me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct assignment, &lt;code&gt;df["salary"] = df["salary"] * 2&lt;/code&gt;, was the fastest standard Pandas API.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;df["salary"] *= 2&lt;/code&gt; was slower, even though it looks more "in-place."&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.loc[]&lt;/code&gt; assignment was much slower because it goes through label-indexing machinery.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.apply(lambda)&lt;/code&gt; was roughly 28x slower than direct assignment.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.iterrows()&lt;/code&gt; was roughly 2,800x slower than direct assignment.&lt;/li&gt;
&lt;li&gt;The fastest Pandas-backed path was dropping to the underlying NumPy buffer with &lt;code&gt;.to_numpy(copy=False)[:] *= 2&lt;/code&gt;, which beat standard Pandas assignment by about 5x.&lt;/li&gt;
&lt;li&gt;Even that shortcut still costs about 2.3x what the same operation costs on a bare NumPy array: the DataFrame wrapper overhead remains even when you bypass the Pandas API entirely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point was the most surprising. Escaping the Pandas API is not the same as escaping Pandas.&lt;/p&gt;
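
&lt;p&gt;For reference, the standard Pandas strategies in the list above can be sketched roughly as follows. This is a minimal illustration, not the benchmark harness itself: &lt;code&gt;make_frame&lt;/code&gt;, &lt;code&gt;STRATEGIES&lt;/code&gt;, and &lt;code&gt;best_time&lt;/code&gt; are hypothetical names, the test data is made up, and the exact &lt;code&gt;.loc[]&lt;/code&gt; and &lt;code&gt;.iterrows()&lt;/code&gt; forms are assumptions about how those updates are typically written.&lt;/p&gt;

```python
import time

import numpy as np
import pandas as pd


def make_frame(n_rows=20_000):
    # Hypothetical test data: a single float64 "salary" column.
    return pd.DataFrame({"salary": np.arange(n_rows, dtype="float64")})


def direct_assignment(df):
    # Build a new doubled Series, then assign it to the column.
    df["salary"] = df["salary"] * 2
    return df


def augmented_assignment(df):
    # Python expands this to a column lookup, an in-place multiply,
    # and a column assignment.
    df["salary"] *= 2
    return df


def loc_assignment(df):
    # Same arithmetic, written through the label-based indexer (assumed form).
    df.loc[:, "salary"] = df["salary"] * 2
    return df


def apply_lambda(df):
    # Calls the Python lambda once per element.
    df["salary"] = df["salary"].apply(lambda value: value * 2)
    return df


def iterrows_update(df):
    # Builds a Series per row and writes each result back one cell at a time
    # (assumed form of the row-wise update).
    for label, row in df.iterrows():
        df.at[label, "salary"] = row["salary"] * 2
    return df


STRATEGIES = {
    "direct": direct_assignment,
    "augmented": augmented_assignment,
    "loc": loc_assignment,
    "apply": apply_lambda,
    "iterrows": iterrows_update,
}


def best_time(strategy, n_rows=20_000, repeats=5):
    # Use a fresh frame for every run so no run sees already-doubled data.
    # Frame construction is excluded from the timed region.
    best = float("inf")
    for _ in range(repeats):
        df = make_frame(n_rows)
        start = time.perf_counter()
        strategy(df)
        best = min(best, time.perf_counter() - start)
    return best
```

&lt;p&gt;Calling &lt;code&gt;best_time(fn)&lt;/code&gt; for each entry in &lt;code&gt;STRATEGIES&lt;/code&gt; gives a rough ranking. It runs each strategy back to back rather than in randomized round-robin order, so the ratios it produces may differ from the ones reported here.&lt;/p&gt;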

&lt;p&gt;Then I looked at cold-start cost with &lt;code&gt;strace&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Before the actual salary multiplication runs, Pandas pays a large initialization tax: imports, filesystem probing, memory-mapping compiled extensions, dtype/index machinery, and runtime setup. In my local environment, the tiny Pandas script made about 8,500 syscalls before doing any useful work, compared with 4 for a tiny assembly program and 88 for a small C++ binary.&lt;/p&gt;

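&lt;p&gt;In a long-running production process, the import cost is paid once at startup. The row-wise cliff from &lt;code&gt;.apply(lambda)&lt;/code&gt; and &lt;code&gt;.iterrows()&lt;/code&gt; is paid on every call regardless.&lt;/p&gt;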

&lt;p&gt;The full benchmark table, with the Pandas methods, NumPy buffer mutation, ASM/C++/Rust/Polars comparisons, and the syscall breakdown, is in my &lt;a href="https://leetcode.com/problems/modify-columns/solutions/8360593/from-4-syscalls-to-8517-what-doubling-a-b1euy" rel="noopener noreferrer"&gt;benchmark results&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>pandas</category>
      <category>dataengineering</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
