esProc SPL, a programming language recommended for data analysts, is definitely worth a try.
Let’s go through its pros and cons. For some of them it’s hard to say which is which, so think of them more as characteristics.
First, the barrier to entry is very low: it works right after installation, with no database required. It can process files such as CSV directly, and reading and processing the data happen in a single step, which I find simpler than with most other tools.
Grouping and aggregating a CSV file takes one line:
T("order.csv").groups(area;sum(amount))
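For comparison, here is a rough pandas sketch of the same group-and-sum. It is not taken from the SPL documentation: the in-memory sample rows stand in for order.csv, and the column names area and amount as well as the helper name total_by_area are assumptions made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for order.csv (made-up rows, assumed column names).
ORDER_CSV = io.StringIO(
    "area,amount\n"
    "East,100\n"
    "West,250\n"
    "East,50\n"
)


def total_by_area(csv_source):
    """Read a CSV and return the summed amount per area (areas sorted A-Z)."""
    orders = pd.read_csv(csv_source)
    return orders.groupby("area", as_index=False)["amount"].sum()


totals = total_by_area(ORDER_CSV)
print(totals)
```

Reading and aggregating are two separate calls here, which is the small difference the one-liner above is getting at.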
SPL has an interesting characteristic: **grid code**. Its code is written in a grid of cells, similar to Excel. If you are used to SQL or Python, it may feel unfamiliar at first, since you have probably never seen code written this way. It looks a bit unconventional, but it is actually quite convenient.
Besides being relatively neat, grid code has two advantages:
First, there is no need to define temporary variables: later code can reference the results of earlier cells by cell name, such as A2 or A3. If rows or columns are added or deleted, the cell names in references are updated automatically, just as in Excel, so references don’t break.
Second, interactivity is good. A result panel sits on the right side; after the code runs, clicking any cell shows its result, with no need to print anything manually. The results are intuitive, so mistakes can be spotted and corrected quickly.
SPL has its own syntax, different from SQL, but it covers all the necessary operations such as grouping, filtering, and joins. In particular, SPL has made significant improvements to grouping and ordered operations, which makes it noticeably more concise than SQL and Python for complex analysis tasks.
This official example gives a better feel for it: find the players who score three consecutive times within one minute.
SPL
SQL
WITH numbered_scores AS (
    SELECT team, player, play_time, score,
           ROW_NUMBER() OVER (ORDER BY play_time) AS rn
    FROM ball_game
)
SELECT DISTINCT s1.player
FROM numbered_scores s1
JOIN numbered_scores s2 ON s1.player = s2.player AND s1.rn = s2.rn - 1
JOIN numbered_scores s3 ON s1.player = s3.player AND s1.rn = s3.rn - 2
WHERE s3.play_time - s1.play_time < 60;
Python
import pandas as pd

# Assumes the rows in the file are already in chronological order.
df = pd.read_csv("../ball_game.csv")
df["play_time"] = pd.to_datetime(df["play_time"])
result_players = []
player = None
consecutive_scores = 0
for i in range(len(df)):
    current = df.iloc[i]
    if player != current["player"]:
        # A different player scored: restart the streak.
        player = current["player"]
        consecutive_scores = 1
    else:
        consecutive_scores += 1
    # A streak of 3 or more guarantees i >= 2 and that row i-2 is the same player.
    if consecutive_scores >= 3:
        elapsed = (current["play_time"] - df.iloc[i - 2]["play_time"]).total_seconds()
        if elapsed < 60:
            result_players.append(player)
# Remove duplicates
result_players = list(set(result_players))
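The "ordered operations" idea can also be sketched in pandas without an explicit loop: treat each unbroken run of scores by the same player as one group, then compare every score with the one two rows earlier in the same run. This is only an illustration under assumptions, not the official example's code. The sample rows and the function name three_in_a_minute are made up, and the column names follow the ball_game data above.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
games = pd.DataFrame({
    "player": ["A", "A", "A", "B", "A", "B", "B", "B"],
    "play_time": pd.to_datetime([
        "2024-01-01 10:00:00",
        "2024-01-01 10:00:20",
        "2024-01-01 10:00:50",  # A: third in a row, 50 s after the first
        "2024-01-01 10:02:00",
        "2024-01-01 10:03:00",
        "2024-01-01 10:05:00",
        "2024-01-01 10:05:40",
        "2024-01-01 10:06:30",  # B: third in a row, but 90 s after the first
    ]),
})


def three_in_a_minute(df):
    """Return players with three consecutive scores in under 60 seconds."""
    df = df.sort_values("play_time", kind="stable").reset_index(drop=True)
    # A new run starts whenever the scorer differs from the previous row.
    run_id = (df["player"] != df["player"].shift()).cumsum()
    # Time elapsed since the score two rows earlier within the same run
    # (NaT for the first two rows of a run, which never passes the filter).
    elapsed = df["play_time"] - df.groupby(run_id)["play_time"].shift(2)
    hits = df.loc[elapsed < pd.Timedelta(seconds=60), "player"]
    return sorted(hits.unique())


fast_scorers = three_in_a_minute(games)
print(fast_scorers)
```

It is shorter than the loop, but you still have to build the run identifier by hand, which is the kind of step order-aware grouping is meant to remove.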
SPL uses some symbols, such as ~ and #, that may look strange at first glance. They aren’t really difficult; after a few more examples you’ll be comfortable with them.
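As far as I understand it, ~ refers to the current member of the sequence being processed and # to that member's position, counted from 1. A rough Python analogy, with made-up sample values and variable names, might look like this:

```python
# Made-up sample values for illustration.
scores = [12, 7, 25, 3]

# Roughly what "~" expresses: an expression applied to the current member.
doubled = [member * 2 for member in scores]

# Roughly what "#" expresses: the current member's position, counted from 1.
odd_positions = [
    member
    for position, member in enumerate(scores, start=1)
    if position % 2 == 1
]

print(doubled)
print(odd_positions)
```

In Python you have to name the loop variable and call enumerate yourself; the symbols are a shorthand for the same two ideas.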
My main complaint is that code completion in the SPL development environment is not smart enough; this should be improved.
The editor’s graphical interface also looks outdated. Was it developed with Swing? Hardly anyone uses that now, right? It has everything you need, but its look doesn’t match modern tastes.
The saved code files are not plain text, which makes version control and code review difficult outside the development environment. It also seems they cannot be used in VSCode.
That’s all for now. Overall, it’s pretty good: there are no major functional issues, it’s easy to get started with, and it’s convenient to use. It’s a good addition to a data analysis toolbox, and sometimes you may find it particularly useful. Of course, if it supported text-format code files and had a more modern interface and editor, it would probably be more popular.