<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Manoranjan</title>
    <description>The latest articles on DEV Community by Manoranjan (@thecodecache).</description>
    <link>https://dev.to/thecodecache</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1477158%2Fbe01e0d9-f0bd-4d09-9783-d51c4cf1f331.jpeg</url>
      <title>DEV Community: Manoranjan</title>
      <link>https://dev.to/thecodecache</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thecodecache"/>
    <language>en</language>
    <item>
      <title>Transpile a SQL Script into PySpark DataFrame API equivalent code</title>
      <dc:creator>Manoranjan</dc:creator>
      <pubDate>Wed, 08 May 2024 07:27:23 +0000</pubDate>
      <link>https://dev.to/thecodecache/transpile-a-sql-script-into-pyspark-dataframe-api-equivalent-code-1f92</link>
      <guid>https://dev.to/thecodecache/transpile-a-sql-script-into-pyspark-dataframe-api-equivalent-code-1f92</guid>
      <description>&lt;p&gt;Input &lt;code&gt;SQL&lt;/code&gt; Script (assume any dialect) :&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT b.se10,
       b.se3,
       b.se_aggrtr_indctr,
       b.key_swipe_ind
FROM
  (SELECT se10,
          se3,
          se_aggrtr_indctr,
          ROW_NUMBER() OVER (PARTITION BY SE10
                             ORDER BY se_aggrtr_indctr DESC) AS rn,
          key_swipe_ind
   FROM fraud_details_data_whole
   GROUP BY se10,
            se3,
            se_aggrtr_indctr ,
            key_swipe_ind) b
WHERE b.rn&amp;lt;2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output &lt;code&gt;PySpark&lt;/code&gt; Code using &lt;code&gt;DataFrame&lt;/code&gt; API :&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pyspark.sql import SparkSession
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number

# Create a SparkSession
spark = SparkSession.builder \
    .appName("TranspileSQLExample") \
    .getOrCreate()

# Sample data (replace with your actual DataFrame)
data = [
    ("se10_value1", "se3_value1", "aggrtr_value1", "swipe_value1"),
    ("se10_value1", "se3_value2", "aggrtr_value2", "swipe_value2"),
    ("se10_value2", "se3_value3", "aggrtr_value3", "swipe_value3"),
    ("se10_value2", "se3_value4", "aggrtr_value4", "swipe_value4"),
    ("se10_value3", "se3_value5", "aggrtr_value5", "swipe_value5"),
    ("se10_value3", "se3_value6", "aggrtr_value6", "swipe_value6")
]

# Create DataFrame
fraud_details_data_whole = spark.createDataFrame(data, ["se10", "se3", "se_aggrtr_indctr", "key_swipe_ind"])

# Define Window specification for row_number() function
windowSpec = Window.partitionBy("se10").orderBy(fraud_details_data_whole["se_aggrtr_indctr"].desc())

# Add row number column
fraud_details_data_whole = fraud_details_data_whole.withColumn("rn", row_number().over(windowSpec))

# Filter rows where rn &amp;lt; 2
result_df = fraud_details_data_whole.filter("rn &amp;lt; 2")

# Select required columns
result_df = result_df.select("se10", "se3", "se_aggrtr_indctr", "key_swipe_ind")

# Show the result DataFrame
result_df.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Is there any way to &lt;code&gt;translate&lt;/code&gt; the above &lt;code&gt;Sql&lt;/code&gt; Query into its equivalent &lt;code&gt;PySpark&lt;/code&gt; DataFrame API code? The result must be equal when we execute &lt;code&gt;Sql&lt;/code&gt; Script and its transpiled &lt;code&gt;PySpark&lt;/code&gt; code separately. &lt;br&gt;
Please suggest&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
