<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: en Z (Zen)</title>
    <description>The latest articles on DEV Community by en Z (Zen) (@jayhua).</description>
    <link>https://dev.to/jayhua</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2840546%2F59bc86a6-863f-45da-b850-a6890dc8189d.jpg</url>
      <title>DEV Community: en Z (Zen)</title>
      <link>https://dev.to/jayhua</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jayhua"/>
    <language>en</language>
    <item>
      <title>Boost Your Data Transfer Speed by 100x with Arrow Flight SQL in Just 3 Minutes</title>
      <dc:creator>en Z (Zen)</dc:creator>
      <pubDate>Mon, 10 Feb 2025 07:33:57 +0000</pubDate>
      <link>https://dev.to/jayhua/boost-your-data-transfer-speed-by-100x-with-arrow-flight-sql-in-just-3-minutes-1p6g</link>
      <guid>https://dev.to/jayhua/boost-your-data-transfer-speed-by-100x-with-arrow-flight-sql-in-just-3-minutes-1p6g</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Data Analyst Xiao Hua rubs his sore eyes, staring blankly at the computer screen.&lt;br&gt;
He can’t help but complain, “This data export is so slow!”&lt;br&gt;
Indeed, waiting for MySQL protocol to transfer large volumes of data feels like trying to drink a barrel of water through a straw — when will it ever end?&lt;br&gt;
While waiting for the data transfer, Xiao Hua flips through the Doris official documentation and stumbles upon a game-changing tool that can boost data transfer efficiency by 100 times — Arrow Flight SQL!&lt;br&gt;
Traditional data transfer is like playing a “water pouring” game, involving several steps; Arrow Flight SQL, on the other hand, builds a highway for data to run smoothly without obstacles.&lt;br&gt;
Follow Xiao Hua’s journey to understand Arrow Flight SQL in just 3 minutes!&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h1&gt;
  
  
  Speed Up Data Transfer by 100 Times!
&lt;/h1&gt;

&lt;p&gt;Data Analyst Xiao Hua recently encountered a tricky problem.&lt;br&gt;
He needed to read massive amounts of data from Apache Doris for real-time analysis, but the traditional MySQL protocol’s cumbersome serialization and deserialization processes made each query painfully slow.&lt;br&gt;
“Is there a faster data transfer solution?” Xiao Hua wondered, scratching his head in frustration.&lt;br&gt;
After scouring the Doris official documentation, he finally found a revolutionary breakthrough in Apache Doris 2.1 — Arrow Flight SQL, the high-speed data link.&lt;br&gt;
This solution, based on Apache Arrow, boasts a staggering 100x improvement in data transfer performance compared to MySQL protocol!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3paough77qzhgurh9v8n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3paough77qzhgurh9v8n.png" alt="Image description" width="720" height="287"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h1&gt;
  
  
  How Does It Achieve a 100x Improvement?
&lt;/h1&gt;

&lt;p&gt;Xiao Hua muttered to himself, “Do you know how slow traditional MySQL protocol data transfer is?”&lt;br&gt;
It’s like pouring a barrel of water into another through a funnel. Data from Doris’s columnar storage format has to be converted to MySQL’s row storage format and then back to columnar format on the client side, wasting a lot of time.&lt;br&gt;
Arrow Flight SQL is completely different. It sets up a direct pipeline, allowing data to be transmitted from Doris to the client in Arrow columnar format without any conversion. This zero-copy transmission method boosts data transfer efficiency by nearly 100 times!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb54ddfxpz6vly5fxarqb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb54ddfxpz6vly5fxarqb.png" alt="Image description" width="720" height="372"&gt;&lt;/a&gt;&lt;br&gt;
As shown in the figure above, query results in Doris are organized in columnar format blocks. In versions before 2.1, data had to be deserialized from row storage format bytes back to columnar format when transmitted via MySQL Client or JDBC/ODBC drivers.&lt;br&gt;
By building a high-speed data transfer link with Arrow Flight SQL, if the target client also supports Arrow columnar format, the entire transfer process will completely avoid serialization/deserialization operations, eliminating the time and performance losses associated with them.&lt;/p&gt;

&lt;p&gt;What’s even more amazing is that Arrow Flight SQL supports parallel transmission across multiple nodes, fully leveraging the multi-core advantages of modern hardware. For data scientists and analysts, this means they can obtain massive amounts of data for analysis in just a few seconds, significantly improving work efficiency!&lt;/p&gt;


&lt;h1&gt;
  
  
  Let Your Data Transfer Take Off!
&lt;/h1&gt;

&lt;p&gt;After understanding the basic principles, Xiao Hua couldn’t wait to try out this “magic tool.”&lt;br&gt;
In fact, using Arrow Flight SQL in Python and Java is very straightforward.&lt;br&gt;
In Python, you can quickly set up a high-speed data channel in just a few steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Doris Arrow Flight SQL Test
&lt;/span&gt;
&lt;span class="c1"&gt;# Step 1: Install the library from PyPI.
# pip install adbc_driver_manager
# pip install adbc_driver_flightsql
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;adbc_driver_manager&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;adbc_driver_flightsql.dbapi&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;flight_sql&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Create a client to interact with the Doris Arrow Flight SQL service.
# Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090.
# Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091.
&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flight_sql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;adbc_driver_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DatabaseOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;USERNAME&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;root&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;adbc_driver_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DatabaseOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PASSWORD&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Interacting with Doris via SQL using Cursor
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;### execute query: ###&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;### result: ###&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchallarrow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;to_pandas&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: Execute DDL statements, create database/table, show stmt.
&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DROP DATABASE IF EXISTS arrow_flight_sql FORCE;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;show databases;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create database arrow_flight_sql;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;show databases;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use arrow_flight_sql;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;CREATE TABLE arrow_flight_sql_test
    (
         k0 INT,
         k1 DOUBLE,
         K2 varchar(32) NULL DEFAULT &lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="s"&gt; COMMENT &lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="s"&gt;,
         k3 DECIMAL(27,9) DEFAULT &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
         k4 BIGINT NULL DEFAULT &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,
         k5 DATE,
    )
    DISTRIBUTED BY HASH(k5) BUCKETS 5
    PROPERTIES(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;replication_num&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; = &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;);&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;show create table arrow_flight_sql_test;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# Step 4: Insert data
&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO arrow_flight_sql_test VALUES
        (&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, 0.1, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, 0.0001, 9999999999, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2023-10-21&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;),
        (&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, 0.20, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ID_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, 1.00000001, 0, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2023-10-21&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;),
        (&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, 3.4, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ID_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, 3.1, 123456, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2023-10-22&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;),
        (&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, 4, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, 4, 4, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2023-10-22&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;),
        (&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, 122345.54321, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, 122345.54321, 5, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2023-10-22&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;);&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# Step 5: Execute queries, aggregation, sort, set session variable
&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select * from arrow_flight_sql_test order by k0;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;set exec_mem_limit=2000;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;show variables like &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;%exec_mem_limit%&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select k5, sum(k1), count(1), avg(k3) from arrow_flight_sql_test group by k5;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 6: Close cursor
&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Java developers, Arrow Flight SQL also offers an elegant solution with JDBC-style APIs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.sql.Connection&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.sql.DriverManager&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.sql.ResultSet&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.sql.Statement&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;Class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;forName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;DB_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false"&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&amp;amp;cachePrepStmts=true&amp;amp;useSSL=false&amp;amp;useEncryption=false"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;USER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"root"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;PASS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;Connection&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DriverManager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getConnection&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;DB_URL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;USER&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;PASS&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;Statement&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;createStatement&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="nc"&gt;ResultSet&lt;/span&gt; &lt;span class="n"&gt;resultSet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;executeQuery&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"select * from information_schema.tables;"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resultSet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;next&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resultSet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;resultSet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After testing and experimenting, Xiao Hua summarized several key performance optimization tips:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Smart Batch Processing&lt;br&gt;
The default batch size is 1024 rows, which can be adjusted according to the actual scenario using setTargetBatchSize.&lt;br&gt;
For environments with ample memory, increasing the batch size can significantly boost throughput.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parallel Acceleration&lt;br&gt;
Java developers can use FlightClient to achieve parallel reading across multiple endpoints, more flexibly leveraging cluster resources.&lt;br&gt;
A typical optimization is:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;FlightClient&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FlightClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setHost&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setPort&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Columnar Computing
Keep data in Arrow format for computations to avoid unnecessary format conversions.
Python users can directly use pandas for efficient columnar computing:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetchallarrow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;to_pandas&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;Back to the beginning of the story, Xiao Hua reconstructed his data analysis workflow using Arrow Flight SQL, achieving nearly 100x faster query speeds and significantly reduced memory usage. His boss was very satisfied with the improvement and even gave him a raise and promotion!&lt;/p&gt;

&lt;p&gt;From this experience, Xiao Hua realized that technological innovation not only solves practical problems but also brings career development opportunities. Arrow Flight SQL is like giving data wings, truly “lifting” data analysis to new heights.&lt;/p&gt;

&lt;p&gt;In the next issue, we will explore other interesting, useful, and valuable content. Stay tuned!&lt;/p&gt;

&lt;p&gt;Welcome to connect with me: &lt;a href="http://www.linkedin.com/in/binhua-zhang-a5259834b" rel="noopener noreferrer"&gt;www.linkedin.com/in/binhua-zhang-a5259834b&lt;/a&gt;&lt;/p&gt;

</description>
      <category>doris</category>
      <category>olap</category>
      <category>bigdata</category>
    </item>
  </channel>
</rss>
