Yes! You read that right. SPL (abbreviation of Structured Process Language) is an open-source programming language that codes in a grid and that specifically designed to process structured data.
As almost all programming languages code in the form of text, what does the SPL code written in the grid like? And what differences does it make? Let's first take a look at SPL’s programming environment.
Features of SPL
Code written in the grid
In the above screenshot, the middle part is SPL’s grid code.
But what are the advantages of writing code in the grid?
Coding always involves intermediate variables, to which names usually need to be given. Often, we do not need to do this in SPL. A later step can directly use the cell name appearing in a previous step (like A1) to reference the computing result of the cell, such as =A2.align@a(A6:~,date(Datetime))
.Therefore, we do not need to go to the trouble to define a variable (often its name should be meaningful and the naming is annoying). Of course, SPL supports defining variables, but without the need to define their types. Define a variable off-the-cuff and use it, such as=cod=A4.new(ID:Commodity,0:Stock,:OosTime,0:TotalOosTime)
; or define one ad hoc and use it, like= A1.group@o(a+=if(x,1,0)).
Yet there is a problem with using cell name as the variable name. The cell name (position) changes when a row/column is inserted or deleted, and error occurs if the original name is still used. Don’t worry, SPL IDE already deals with it and ensures that, after a row/column is inserted or deleted, cell name will automatically adjust with the change. As the following figure shows, a row is inserted and the related cell names after it are changed.
The cellset code is neat-looking as code written in the grid is naturally orderly. In the cellset code, a code block (such as the A13 ~ A19 for loop) is set out by indentation without using any modifiers, and therefore, is very neat and intuitive. In addition, even when the code in a cell for handling task details is long, it occupies this one cell only and readability of the whole code won’t be compromised (this coding format prevents a lengthy code line from spilling out its cell and affecting reading code on the right and below). The text format code, however, will be wholly displayed and affect code reading.
The debugging functionality is another shining point. Look at the programming environment again. Besides the coding zone in the middle, the toolbar above it offers various execution and debugging functionality icons, including “Run”, “Debug”, “Run to cursor”, “Step over”, “Set breakpoint”, and “Calculate active cell”, to meet the needs of code editing and debugging. One cell holds one operation; the code is separate and clear. This is unlike the text format code, where one line may contain multiple actions, which are not easy to distinguish from each other and which are difficult to locate when the statement is too long and covers multiple lines.
On the right of the interface, there is a result display panel. With the grid-style (step-by-step) programming, SPL keeps result of each cell (step) after execution/debugging. Click a cell (step) and the programmer can check in real-time whether its computing result is correct or not. The ability to view result of each step in real-time gets rid of the manual output and makes debugging more convenient.
- Multilevel result set
Function options
Each programming language offers a large number of built-in library functions. The more the library functions a language has, the more convenient it becomes when implementing various functionalities. Functions are identified through different names or parameters (and parameter types). But when a function cannot be differentiated from another even through the parameter type, option parameters will be explicitly added to it to tell the compiler or interpreter what functionality it is intended. For example, Java has multiple OpenOptions for manipulating files. To create a new file, we use:
Files.write(path, DUMMY_TEXT.getBytes(), StandardOpenOption.CREATE_NEW);
To open a file if it already exists but to create it if it does not exist, we use:
Files.write(path, DUMMY_TEXT.getBytes(), StandardOpenOption.CREATE)
To append data to a file while ensuring that data loss won’t occur at system crash, we use:
Files.write(path,ANOTHER_DUMMY_TEXT.getBytes(), StandardOpenOption.APPEND, StandardOpenOption.WRITE, StandardOpenOption.SYNC)
A function needs options to do different jobs, usually by treating them as parameters. This makes the uses of options complicated, and it is easy to confuse their real targets. And functions whose number of parameters is not fixed cannot use parameters to represent options.
SPL offers unique function option syntax. It makes functions doing similar jobs share the same name, and uses options to differentiate them. The design implements function options in a near two-layer structure, making it convenient to memorize and use. For example, pos() function searches a string for the position of a specified substring; and it works with @z option to perform the search backwards:
pos@z("abcdeffdef","def")
Work with @c option to become case-insensitive:
pos@c("abcdef","Def")
And options can work together:
pos@zc("abcdeffdef","Def")
With function options, users learn fewer functions and, to perform a different job within the same functionality, just look up and use the appropriate option. The SPL’s way of tiering functions makes it convenient to find and use them.
Cascaded parameters
Some functions have complex parameters, which could be multilayered. Traditional programming languages do not have special syntax to represent the multilayer parameters. They can only generate a multilayer data object and pass it in, which is very inconvenient. For example, to perform a join operation in Java (inner-join between Orders table and Employee table), we have the following code:
Map<Integer, Employee> EIds = Employees.collect(Collectors.toMap(Employee::EId, Function.identity()));
record OrderRelation(int OrderID, String Client, Employee SellerId, double Amount, Date OrderDate){}
Stream<OrderRelation> ORS=Orders.map(r -> {
Employee e=EIds.get(r.SellerId);
OrderRelation or=new OrderRelation(r.OrderID,r.Client,e,r.Amount,r.OrderDate);
return or;
}).filter(e->e.SellerId!=null);
We can see that a multilayer (multi-segment) parameter needs to be passed to Map to achieve the join operation, making the code difficult to read and let alone write. Let’s compute further as a join operation is often followed by the other operations. Here we group Employee.Dept and sum Orders.Amount:
Map<String, DoubleSummaryStatistics> c=ORS.collect(Collectors.groupingBy(r->r.SellerId.Dept,Collectors.summarizingDouble(r->r.Amount)));
for(String dept:c.keySet()){
DoubleSummaryStatistics r =c.get(dept);
System.out.println("group(dept):"+dept+" sum(Amount):"+r.getSum());
}
There is no need to explain how complex such a function works as all programmers are very familiar with the experience. By contrast, SQL is much simpler and more intuitive.
select Dept,sum(Amount) from Orders r inner join Employee e on r.SellerId=e. SellerId group by Dept
SQL uses keywords (such as FROM and JOIN) to distinguish codes of different operations. The code parts can be seen as multilayer parameters, though they have better readability by mimicking English. Yet the way is not universal enough. We need to find a special keyword for each clause, and this results in inconsistent structures.
Neither does SPL use keywords to separate different operations in a statement, nor it writes a nested query as Java does. It invents the cascaded parameter, which is stipulated as triple-layer, and uses semicolon, comma and colon respectively to separate the three layers. The semicolon represents the top layer; parameters separated by it are a suite. In a suite, if there is another layer of parameters, separate them with the comma; and use colon to separate the next layer. For example, the SPL statement for handling the above association computation is like this:
join(Orders:o,SellerId ; Employees:e,EId).groups(e.Dept;sum(o.Amount))
The triple layers are enough as there are rarely parameter relationships that cannot represented in three layers.
Grid-style code, option syntax, and cascaded parameters make SPL look interesting, but its true purpose is efficient data processing. SPL is a low-code language designed specifically for handling structured data, as its full name Structured Process Language shows.
Why was SPL invented?
There are many programming languages available for processing structured data today, such as commonly used SQL, Java, Python, and so on. Why to invent SPL?
Because these languages cannot generate low enough code, that is, their code for processing structured data is not concise enough.
Comparison of SPL and the other languages
Let’s make the comparisons through examples.
JAVA vs SPL
Grouping and aggregation on two fields in Java:
Calendar cal=Calendar.getInstance();
Map<Object, DoubleSummaryStatistics> c=Orders.collect(Collectors.groupingBy(
r->{
cal.setTime(r.OrderDate);
return cal.get(Calendar.YEAR)+"_"+r.SellerId;
},
Collectors.summarizingDouble(r->{
return r.Amount;
})));
for(Object sellerid:c.keySet()){
DoubleSummaryStatistics r =c.get(sellerid);
String year_sellerid[]=((String)sellerid).split("_");
System.out.println("group is (year):"+year_sellerid[0] +
"\t (sellerid):"+year_sellerid[1]+"\t sum is:“ +
r.getSum()+"\t count is:"+r.getCount());
}
Even a simple grouping & aggregation involves these many lines of code and becomes cumbersome. while SPL can get it done in just one line:
SQL vs SPL
Here is the SQL code for getting the maximum consecutively rising days for a stock:
SELECT CODE, MAX(con_rise) AS longest_up_days
FROM (
SELECT CODE, COUNT(*) AS con_rise
FROM (
SELECT CODE, DT, SUM(updown_flag) OVER (PARTITION BY CODE ORDER BY CODE, DT) AS no_up_days
FROM (
SELECT CODE, DT,
CASE WHEN CL > LAG(CL) OVER (PARTITION BY CODE ORDER BY CODE, DT) THEN 0
ELSE 1 END AS updown_flag
FROM stock
)
)
GROUP BY CODE, no_up_days
)
GROUP BY CODE
SQL uses a three-layer statement to perform the query, which may take a while to understand, not to mention write. But the SPL code is still one line:
The SPL code expresses the same computational logic. Even one who isn’t familiar with the SPL syntax can understand the general meaning.
Python vs SPL
For this task, Python implement it with the following code:
import pandas as pd
stock = pd.read_excel(‘stock.xlsx’)
inc_days=0
max_inc_days=0
for i in stock ['price'].shift(0)>stocl[‘price’].shift(1):
inc_days =0 if i==False else inc_days+1
max_inc_days = inc_days if max_inc_days < inc_days else max_inc_days
print(max_inc_days)
The Python code is not that roundabout, but it still needs to hardcode the loop and generates “high” code. SPL achieves the same logic without the need of writing loop, generating very concise code:
Why are those languages quite troublesome?
Because none of them has the characteristics of set-orientation and discreteness at the same time!
But what are set-orientation and discreteness?
We’ll explain them using Java and SQL, the two languages with exactly opposite styles.
Set-orientation
Structured data is typically in batches, meaning it exists in the form of sets. To facilitate the processing of such data, programming languages need to provide sufficient capability to perform set operations.
However, Java lacks sufficient set operation libraries, making the implementation of structured data processing cumbersome and often requiring dozens of, even nearly one hundred, lines of code. In contrast, SQL offers a rich library of set operations, making it much simpler to code filtering (where), grouping (group), sum (aggregation) and so on.
Isn’t it enough to add to Java some library functions for handling set operations?
It isn’t that simple!
Take sorting as an example:
SELECT * FROM T ORDER BY price
In SQL, we simply write “ORDER BY price” for the sorting, without caring about the data type of “price”.
Java is a type-strict compiled language. One same function cannot work with different data types. This means we need to write sorting functions respectively for integers, real numbers, and strings.
// Integer
public static void sortIntegers(int[] integers) {
Arrays.sort(integers);
}
// Real number
public static void sortDoubles(double[] doubles) {
Arrays.sort(doubles);
}
// String
public static void sortStrings(String[] strings) {
Arrays.sort(strings);
}
This is not just a hassle for library function developers; users also need to specify the data type so the compiler can find the right function. Later, Java introduced generics to simplify the process:
public static <T extends Comparable<T>> void sortArray(T[] array) {
Arrays.sort(array);
}
But there is still a pile of angle brackets, making the code messy and affecting understanding of the business logic.
Sorting can also involve multiple parameters. For example:
SELECT * FROM T ORDER BY price, quantity
in SQL, we write “ORDER BY price, quantity”.
But this is also an issue for Java, which does not support using functions having different numbers of parameters together. It is impossible that we are able to list all possibilities of the number of parameters in advance, as we do for handling the data type issue. The general way is writing a function having only one parameter and transforming the possible multiparameter operation to a uni-parameter one ad hoc. For example, combine price and quantity into a single parameter for the sorting. In some scenarios the language supports set type parameters, and to reference them we need to combine parameters as a set, resulting in an extra layer and making the coding cumbersome.
public static <T> Comparator<T> chainingComparator(List<Function<T, Comparable<?>>> keyExtractors) {
return keyExtractors.stream()
.map(Comparator::comparing)
.reduce(Comparator::thenComparing)
.orElseThrow(() -> new IllegalArgumentException("At least one key extractor must be provided"));
}
// Define sorting
List<Function<Order, Comparable<?>>> keyExtractors = Arrays.asList(
Order::getPrice,
Order::getQuantity
);
// Sort
Collections.sort(orders, chainingComparator(keyExtractors));
SQL doesn't have such an issue. Interpreted languages can dynamically decide how to do it according to data type and number of the parameters.
Moreover, the sorting may be performed by an expression. For example:
SELECT * FROM T ORDER BY price*quantity
In SQL’s “ORDER BY price * quantity”, “price * quantity” is computed only after traversal of members of the set begins, rather than before execution of the SQL statement. In essence, “price * quantity” is a function that takes the current set members as parameters. That is to say, a function defined through an expression is used as a parameter of the sorting operation.
In Java, if an expression is used as a function parameter, it will be first computed before being called rather than one by one on set members. Though Java allows passing a function as a parameter to another function, the syntax is more complex as it needs to define the parameter function beforehand.
// Auxiliary method
public double getTotal() {
return price * quantity;
}
// Lambda expression comparator
Collections.sort(orders, Comparator.comparingDouble(Order::getTotal));
// Output
orders.forEach(System.out::println);
Isn’t it Lambda syntax that allows passing functions as parameters without defining them beforehand? Java now supports this.
That’s right, Java now supports Lambda syntax for defining anonymous functions as parameters.
Collections.sort(orders, Comparator.comparingDouble(order -> order.price * order.quantity));
However, the parameter function cannot be written as a simple formula, which, otherwise, will be evaluated directly when the compiler cannot identify it. Same as ordinary functions, Lambda syntax also needs to define parameters and even their data types, and also involves a function body, except that the function is not named. Additionally, since the above-mentioned issues of data type and number of parameters often mix up with the Lambda syntax, the code become even more messy.
SQL adopts Lambda syntax but abandons its form. No one even considers it Lambda syntax, though it does use a formula to define a function as parameter.
Another issue with Java is that it cannot directly reference fields. Structured data isn’t a simple single value but a record consisting of fields. While SQL only requires specifying table names when multiple fields share the same name, Java must use the verbose “object.member” notation. This is because Java’s Lambda syntax doesn’t naturally recognize records; it treats them as parameters, and accessing record fields requires using the dot operator.
Only a language whose syntactic mechanism allows direct reference of fields is a true professional for processing structured data.
Also, Java does not yet support dynamic data structures.
Often result of a structured data computation is still structured data, whose structure is determined by types of the involved operations and cannot be predetermined before coding. Therefore, the ability to support dynamic data structures is essential.
Any SQL SELECT statement can generate a new data structure, allowing fields to be added or removed freely in the code without defining the structure beforehand. Compiled languages like Java cannot do this because all structures to be used must be defined before compilation:
class Order {
double price;
int quantity;
public Order(double price, int quantity) {
this.price = price;
this.quantity = quantity;
}
class Order {
double price;
int quantity;
double total;
public Order(double price, int quantity) {
this.price = price;
this.quantity = quantity;
this.total = price * quantity;
}
In principle, new structures cannot be dynamically generated during execution.
In a nutshell, set operation class libraries that support dynamic data type and number of parameters; the underlying Lambda syntax that allows direct reference of record fields; and dynamic data structure.
These are what we commonly refer to as the set-oriented features of programming languages! With these features, processing batch structured data becomes more convenient.
Interpreted languages like SQL have enough set-oriented features, but compiled languages such as Java face challenges in this regard due to their inherent characteristics. This limitation is not unique to Java; languages like Kotlin and Scala also struggle to fully support set-oriented features.
Discreteness
After set-orientation, let’s look at discreteness. As mentioned earlier, SQL lacks discreteness and its code is also not low. What is this discreteness?
Let’s look at an example: calculating the median of a sequence. Here we use a simple array instead of the structured data to avoid Java’s weaknesses and prevent the lengthy code from covering the key issues.
//SQL:
WITH TN AS (SELECT ROWNUMBER() OVER (ORDER BY V) RN,V FROM T),
N AS (SELECT COUNT(*) N FROM T)
SELECT AVERAGE(V) FROM TN,N
WHERE RN>=N.N/2 AND RN<=N.N/2+1
//Java:
Array.sort(v);
return (v[(v.length-1)/2] + v[v.length/2])/2;
Java often appears more concise than SQL when it isn’t handling structured data or using Lambda syntax. By examining the computing processes of the two blocks of code, we find that the Java code is not only more concise and but more efficient.
On the surface, SQL’s problem is that it cannot retrieve members of a sequence according to their sequence numbers due to the lack of the sequence number concept. It has to invent a sequence number column – here window functions are used to make invention of the sequence not that difficult. Java, however, can easily retrieve members from an array using sequence numbers for computation.
The fundamental reason is the difference in data models between Java and SQL.
The basic-level data in high-level programming languages like Java is atomic data that cannot be further split, such as numbers and strings. Atomic data can form more complex structures such as sets and records, which are also a type of data, and can be further combined into larger sets and records. The data that constitutes a set or record is independent of them, and able to exist on its own or participate in computations, naturally making it possible for the languages to offer the operation of splitting members from a set or record (retrieve members according to sequence numbers or by fields). This flexible form of data organization is called discreteness. With discreteness, constructing a set of sets or a record whose field values are records becomes easy.
SQL treats a table, which is a set of records, as atomic data, which does not consist of more basic data. SQL tables cannot form more complex structured data. Sets of sets and records of records do not exist in SQL. A SQL record cannot exist independent of a table, which means members must be attached to a set and any operations trying to split them from the current set is meaningless, as no data types are available to hold the separated data. And the splitting operation is actually implemented through the WHERE filtering, which isn’t straightforward and which, in essence, manipulates subsets and still returns a table (set). A single record is essentially a table with just one record (a set having only one member). This makes SQL’s data organization very inflexible, that is, lacking in discreteness.
Almost all high-level languages naturally support discreteness, but SQL is an exception.
Discreteness is a natural feature, as things inherently develop from simple to complex. That’s also the natural way of thinking. Lack of discreteness makes it difficult to implement computations involving only a part of a set, resulting in long-winded and roundabout code.
Look at a simple structured data computation scenario: list employees whose age and income are both greater than a specified employee’s. According to the natural way of thinking, one would typically write a SQL statement as follows:
WITH A AS (SELECT * FROM employee WHERE name=...)
SELECT * FROM employee WHERE age>A.age AND salary>A.salary
But this SQL statement is illegal. The second half of the statement requires using JOIN:
SELECT T.* FROM employee T,A WHERE T.age>A.age AND T.salary>A.salary
This can be a bit roundabout, and is a reflection of the difference between tables and records in SQL, as the language does not have a data type that can directly hold records.
Java’s support for set operations on structured data is limited, so we switch to SPL for its superior set-oriented capabilities:
This conforms to the natural way of thinking.
Set-orientation is reflected through the syntactic form, which affects the degree of code complexity; discreteness is the data model that determines the degree of code difficulty. Without set-orientation, Java generates cumbersome and lengthy code; and based not on discreteness, SQL produces possibly shorter yet circuitous and difficult code.
Let’s look back at the earlier example of calculating the number of consecutively rising days for a stock.
SELECT CODE, MAX(con_rise) AS longest_up_days
FROM (
SELECT CODE, COUNT(*) AS con_rise
FROM (
SELECT CODE, DT, SUM(updown_flag) OVER (PARTITION BY CODE ORDER BY CODE, DT) AS no_up_days
FROM (
SELECT CODE, DT,
CASE WHEN CL > LAG(CL) OVER (PARTITION BY CODE ORDER BY CODE, DT) THEN 0
ELSE 1 END AS updown_flag
FROM stock
)
)
GROUP BY CODE, no_up_days
)
GROUP BY CODE
This order-based computation is a typical example of requiring the cooperation of discreteness and set-orientation. As the order of members is meaningful only in the set, set-orientation becomes an essential; and to perform an order-based computation, each member needs to be separated from its adjacent members, for which discreteness is emphasized. Discreteness enables more thorough set-orientation, and on the other hand, set-orientation allows discreteness to play a broader role. SQL lacks discreteness and its set-orientation is not thorough, ultimately leading to roundabout and hard-to-read code.
The born of SPL
Set-orientation is implemented through syntax and influences the degree of code complexity; discreteness is a type of data model that decides the degree of code difficulty. Java where the set-oriented feature is absent produces cumbersome and tedious code, and SQL that does not support discreteness might generate shorter code but will be indirect and difficult. That’s why both Java code and SQL code are not “low”.
How about Python?
As we discussed earlier, SQL and Java have notable shortcomings in terms of set-orientation and discreteness respectively, resulting in poor overall performance. While Python has greatly improved in these two aspects, it is still not good enough. Computations, such as order-based ones, are still cumbersome and difficult. Our expected goal is to be excellent in both set-orientation and discreteness, so that the language’s overall performance can be great.
From the previous discussion, we can easily conclude that to achieve low-code solutions, we just need to effectively combine Java’s discreteness and SQL’s set-orientation.
This was the original intent behind creating SPL, a low-code programming language designed for structured data computations, to better integrate set-orientation and discreteness.
SPL offers a rich collection of set operation class libraries, has underlying and inexplicit Lambda syntax, and supports dynamic data structure, order-based computations and procedural programming, and so on. It inherits all advantages of both Java and SQL, and has more thorough set-orientation, making it a true low code programming language.
Compared with other structured data processing languages, SPL has more comprehensive and outstanding capabilities. Look back again at the example of getting the longest consecutively rising days for a stock to find the syntactic style of SPL.
The loop function, which supports both order-based computations and has strong Lambda-style, is very concise, without the need of loop statements, let alone nesting. A single statement is enough.
There is also a real-world e-commerce funnel analysis case:
WITH e1 AS (
SELECT uid,1 AS step1, MIN(etime) AS t1
FROM events
WHERE etime>=end_date-14 AND etime<end_date AND etype='etype1'
GROUP BY uid),
e2 AS (
SELECT uid,1 AS step2, MIN(e1.t1) as t1, MIN(e2.etime) AS t2
FROM events AS e2 JOIN e1 ON e2.uid = e1.uid
WHERE e2.etime>=end_date-14 AND e2.etime<end_date AND e2.etime>t1 AND e2.etime<t1+7 AND etype='etype2'
GROUP BY uid),
e3 as (
SELECT uid,1 AS step3, MIN(e2.t1) as t1, MIN(e3.etime) AS t3
FROM events AS e3 JOIN e2 ON e3.uid = e2.uid
WHERE e3.etime>=end_date-14 AND e3.etime<end_date AND e3.etime>t2 AND e3.etime<t1+7 AND etype='etype3'
GROUP BY uid)
SELECT SUM(step1) AS step1, SUM(step2) AS step2, SUM(step3) AS step3
FROM e1 LEFT JOIN e2 ON e1.uid = e2.uid LEFT JOIN e3 ON e2.uid = e3.uid
SPL:
SPL not only has simple code, but the program is more versatile and can handle any step of funnel analysis. All of them are benefits brought by the low code feature.
With both set-orientation and discreteness, SPL generates true low code. There are more examples of SPL vs SQL and Python comparisons in Example codes for comparing SPL, SQL, and Python where you can feel the SPL conciseness better and have a deeper understanding about why Python is far from satisfaction though it has been much improved.
High-performance
Let’s then talk about performance. Data processing involves two key issues: write simple and run fast.
We know that software improvement cannot speed up the hardware. The CPU and hard disk have fixed speeds. However, by designing low-complexity algorithms that require less amount of computation, the computer would perform fewer actions, leading to faster execution.
An algorithm blueprint is only at the halfway, it needs to be implemented. This means that low-complexity algorithms should be easy to implement to ensure the feasibility. From this perspective, simple code and fast execution are essentially the same thing!
Characterized by low-code, SPL makes it easy to implement low complexity algorithms and thus achieve high performance. In contrast, Java and SQL often struggle, sometimes even are unable, to implement the same algorithm, and this causes low performance.
This is the reason behind the performance increase by dozens of times brought by many SPL optimization cases for database SQL.
Low-code SPL is now open source.Open source address
Top comments (0)