WangLiwen

Posted on Oct 10, 2023

JavaScript Magic Tricks: From Lexical Analysis to Confusing Encryption

#webdev #javascript #obfuscation #source

Programmatic obfuscation and encryption of JavaScript code usually relies on a code analysis library such as esprima or babel, even when you don't call a mature product interface such as JShaman. These libraries let you skip low-level lexical and syntax analysis and work directly on an abstract syntax tree (AST).

That shortcut doesn't help readers who want to understand the complete process and principles behind JavaScript obfuscation and encryption.

In this article, we start from scratch and show how to obfuscate and encrypt JavaScript code with native JavaScript alone, doing the lexical and syntax analysis ourselves instead of relying on third-party tools.

Note: This article is a demonstration, not a complete tool, so some processing details are omitted. For example, it does not check whether code is nested in multiple "blocks", and it does not use a precise method to determine variable scope.

Example

This article obfuscates and encrypts the following JavaScript code.

function get_copyright(){
    var domain = "jshaman javascript obfuscator";
    var from_year = 2017;
    var copyright = "(c)" + from_year + "-" + (new Date).getFullYear() + "," + domain;
    return copyright;
}
console.log(get_copyright());

Lexical analysis

Lexical analysis, roughly speaking, breaks code into pieces according to fixed rules so that keywords, variable names, strings, and other elements can be identified. In this example, the code is split on delimiter characters: spaces, line breaks, tabs, angle brackets, and other punctuation. The result is an array of tokens.

Source Code

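// Delimiter characters used to split the code (\u3000 is the full-width space)
var word_split = "\u3000 ,.?!;:\\/<>(){}[]\"'\r\n\t=+-|*%@#$^&";
// Tokenization result
var code_array = [];
// Index of the token currently being built
var word_index = 0;

// Walk through the code and split it into tokens
for(var i=0; i<js_code.length; i++){
    // Not a delimiter: part of a word such as function, var, or a variable name
    if(word_split.indexOf(js_code.charAt(i)) == -1){
        // The current token slot is still empty
        if( typeof(code_array[word_index]) == "undefined" || code_array[word_index] == null){
            // Initialize the slot to an empty string
            code_array[word_index] = "";
        }
        // Append with +=: "function" builds up as f, fu, fun, and so on, until a delimiter is met
        code_array[word_index] += js_code.charAt(i);
    }else{
        // Delimiter handling
        // If the current slot has content, advance the index, which ends the previous token
        if( typeof(code_array[word_index]) != "undefined" && code_array[word_index].length > 0){
            word_index++;
        }
        // Note: charAt returns \n as an invisible line break; this could be handled more carefully
        code_array[word_index++] = js_code.charAt(i);
    }
}
console.log("Tokenization complete:",code_array);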
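As a quick check of the splitting logic above, the sketch below wraps the same idea in a function and runs it on one short statement. The function name `tokenize` and the shortened delimiter list are assumptions made for this illustration; they are not part of the original code.

```javascript
// Hypothetical helper: the same splitting idea as the loop above, wrapped in a function.
// The delimiter list is shortened for readability (an assumption for this sketch).
function tokenize(source) {
    var delimiters = " .;()\"=+\r\n\t";
    var tokens = [];
    var current = "";
    for (var i = 0; i < source.length; i++) {
        var ch = source.charAt(i);
        if (delimiters.indexOf(ch) === -1) {
            // Ordinary character: keep growing the current word.
            current += ch;
        } else {
            // Delimiter: close the pending word, then store the delimiter itself.
            if (current.length > 0) {
                tokens.push(current);
                current = "";
            }
            tokens.push(ch);
        }
    }
    // Flush a trailing word that was not followed by a delimiter.
    if (current.length > 0) {
        tokens.push(current);
    }
    return tokens;
}

var tokens = tokenize("var from_year = 2017;");
console.log(tokens);
// ["var", " ", "from_year", " ", "=", " ", "2017", ";"]
```

Because delimiters are stored as tokens too, joining the array with an empty separator reproduces the input exactly, which is what makes the later in-place rewriting possible.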

Execute

After lexical analysis, the code can go straight into the obfuscation and encryption stage. Here, syntax analysis and obfuscation happen in the same pass over the token array: as each token is recognized, the data that needs to be obfuscated or encrypted is rewritten in place.

The syntax analysis is a series of if checks. For example, "." marks a member access, so when it is detected, the next element of the array is the member name. Similarly, a token is treated as a number when parseInt returns a value equal to the original token.

Source code

// String encryption
if(code_array[i] == '"'){
    var before_string = code_array[++i];
    if(before_string.length > 1){
        console.log("String encryption:", before_string, "becomes", to_hex(before_string));
        code_array[i] = to_hex(before_string);
    }
}
// Member method encryption
if(code_array[i] == "."){
    var before_member_string = code_array[++i];
    console.log("Member method encryption:", before_member_string, "becomes", to_hex(before_member_string));
    code_array[i] = "[\"" + to_hex(before_member_string) + "\"]";
    code_array[i-1] = "";
}
// Number encryption
if(code_array[i].length <= 5){
    if(parseInt(code_array[i], 10) == code_array[i]){
        // XOR encryption
        var key = parseInt(Math.random() * (99999-10000) + 10000, 10);
        var cipher_num = code_array[i] ^ key;
        console.log("Number encryption:",code_array[i], "becomes", key + "^" + cipher_num);
        code_array[i] = key + "^" + cipher_num;
    }
}
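The number step relies on the fact that XOR is its own inverse: if cipher = n ^ key, then key ^ cipher gives n back. The sketch below shows this with a fixed key so the result is repeatable. The helper name `xorExpression` and the added parentheses are assumptions of this sketch, not part of the code above.

```javascript
// Hypothetical helper illustrating the XOR step: n is replaced by "key^cipher",
// where cipher = n ^ key, so that key ^ cipher evaluates back to n.
function xorExpression(n, key) {
    var cipher = n ^ key;
    // Parentheses are an addition in this sketch: ^ binds more loosely than + or *,
    // so a bare key^cipher could change meaning inside a larger expression.
    return "(" + key + "^" + cipher + ")";
}

var expr = xorExpression(2017, 54321);
console.log(expr);
// Evaluate the generated expression to confirm it restores the original value.
var restored = new Function("return " + expr + ";")();
console.log(restored); // 2017
```

In the article's sample the number only appears on the right-hand side of an assignment, where the unparenthesized form is safe.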

Real JavaScript obfuscation and encryption products offer many more options, such as AST execution protection, string array obfuscation, and control flow flattening. This article shows only a few simple examples. As noted earlier, a real tool must also check many preconditions and consider the impact of each modification before applying it.
For example, before encrypting a string, it must determine whether the string's parent node is an object property that cannot be encrypted, and whether the encrypted text would exceed the maximum string length that can be stored.
Before flattening control flow, it must determine whether the code is in strict mode. Before renaming variables, it must consider variable scope and whether the new names collide with existing ones. The encrypted code runs correctly only when all of these are handled.

Once the obfuscation and encryption pass is complete, the modifications are stored in the token array generated earlier. Finally, the array is converted back to code with the join method, which completes the process. Professional JavaScript obfuscation and encryption tools such as JShaman follow a similar logical flow.
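The final join step can be illustrated on a tiny hand-written token array. This is only a sketch: the array below is what the tokenizer would be expected to produce for one statement, and the rewritten token is filled in by hand instead of being computed.

```javascript
// A token array as the tokenizer would produce it for: console.log("hi");
var tokens = ["console", ".", "log", "(", "\"", "hi", "\"", ")", ";"];
// Untouched tokens join back into the original source text.
var original = tokens.join("");
// Rewrite the member access: blank the dot, then bracket the escaped name.
tokens[1] = "";
tokens[2] = "[\"\\x6c\\x6f\\x67\"]";
var rewritten = tokens.join("");
console.log(original);  // console.log("hi");
console.log(rewritten); // console["\x6c\x6f\x67"]("hi");
```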

Full source code

// JavaScript code to obfuscate
var js_code = `
function get_copyright(){
var domain = "jshaman混淆加密";
var from_year = 2017;
var copyright = "(c)" + from_year + "-" + (new Date).getFullYear() + "," + domain;
return copyright;
}
console.log(get_copyright());
`;

var var_count = 0;
// Delimiter characters used to split the code (\u3000 is the full-width space)
var word_split = "\u3000 ,.?!;:\\/<>(){}[]\"'\r\n\t=+-|*%@#$^&";
// Tokenization result
var code_array = [];
// Index of the token currently being built
var word_index = 0;

// Walk through the code and split it into tokens
for(var i=0; i<js_code.length; i++){
    // Not a delimiter: part of a word such as function, var, or a variable name
    if(word_split.indexOf(js_code.charAt(i)) == -1){
        // The current token slot is still empty
        if( typeof(code_array[word_index]) == "undefined" || code_array[word_index] == null){
            // Initialize the slot to an empty string
            code_array[word_index] = "";
        }
        // Append with +=: "function" builds up as f, fu, fun, and so on, until a delimiter is met
        code_array[word_index] += js_code.charAt(i);
    }else{
        // Delimiter handling
        // If the current slot has content, advance the index, which ends the previous token
        if( typeof(code_array[word_index]) != "undefined" && code_array[word_index].length > 0){
            word_index++;
        }
        // Note: charAt returns \n as an invisible line break; this could be handled more carefully
        code_array[word_index++] = js_code.charAt(i);
    }
}
console.log("Tokenization complete:",code_array);

// Start index for variable renaming
var encode_var_start_index = 0;
// Syntax analysis plus obfuscation and encryption
for(var i=0; i<code_array.length; i++){
    try{
        // Variable name obfuscation: remember where the current block starts
        if(code_array[i] == "{"){
            encode_var_start_index = i;
        }
        // Variable extraction
        // Check the preceding tokens: "var" followed by a space
        if(code_array[i-1] == " " && code_array[i-2] == "var"){
            // Check the following token
            if(code_array[i+1] == ";" || code_array[i+1] == " " || code_array[i+1] == "=" || code_array[i+1] == "\r" || code_array[i+1] == "\n"){
                // Original variable name
                var var_origin = code_array[i];
                // New variable name
                var var_now = get_rand_name();
                for(var j=encode_var_start_index; j<code_array.length; j++){
                    if(code_array[j] == var_origin){
                        console.log("Variable renaming:", var_origin, "becomes", var_now);
                        code_array[j] = var_now;
                    }
                    // Stop at the end of the block
                    if(code_array[j] == "}"){
                        break;
                    }
                }
            }
        }
        // String encryption
        if(code_array[i] == '"'){
            var before_string = code_array[++i];
            if(before_string.length > 1){
                console.log("String encryption:", before_string, "becomes", to_hex(before_string));
                code_array[i] = to_hex(before_string);
            }
        }
        // Member method encryption
        if(code_array[i] == "."){
            var before_member_string = code_array[++i];
            console.log("Member method encryption:", before_member_string, "becomes", to_hex(before_member_string));
            code_array[i] = "[\"" + to_hex(before_member_string) + "\"]";
            code_array[i-1] = "";
        }
        // Number encryption
        if(code_array[i].length <= 5){
            if(parseInt(code_array[i], 10) == code_array[i]){
                // XOR encryption
                var key = parseInt(Math.random() * (99999-10000) + 10000, 10);
                var cipher_num = code_array[i] ^ key;
                console.log("Number encryption:",code_array[i], "becomes", key + "^" + cipher_num);
                code_array[i] = key + "^" + cipher_num;
            }
        }
    }catch(e){
        console.log(e.message)
    }
}
console.log("Obfuscation and encryption complete:");
console.log(code_array.join(""));
// Generate a random variable name
function get_rand_name(){
    var rand_name = [];
    var_count++;
    var c = "abcdefghijklmnopqrstuvwxyz";
    for (var k=0; k<4; k++) {
        rand_name[rand_name.length] = c.charAt(Math.floor(Math.random() * 26));
    }
    // The counter suffix keeps generated names unique
    return rand_name.join("") + var_count;
}

// Convert a string to hexadecimal escape sequences
function to_hex(val) {
    var str = new Array();
    for (var i = 0; i < val.length; i++) {
        var c = val.charCodeAt(i);
        var hex = c.toString(16);
        if (c >= 0 && c < 256) {
            // \x escapes require exactly two hex digits
            str[str.length] = "\\x" + ("0" + hex).slice(-2);
        } else {
            // \u escapes require exactly four hex digits
            str[str.length] = "\\u" + ("000" + hex).slice(-4);
        }
    }
    return str.join("");
}
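To make the escape format concrete, here is a small standalone sketch of an escape encoder with zero padding. The name `toEscapes` is hypothetical and the function is written for illustration only; the padding matters because JavaScript requires exactly two hex digits after \x and exactly four after \u.

```javascript
// Hypothetical padded escape encoder, for illustration only.
function toEscapes(text) {
    var out = [];
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        var hex = code.toString(16);
        if (code < 256) {
            // \x escapes need exactly two hex digits.
            out.push("\\x" + ("0" + hex).slice(-2));
        } else {
            // \u escapes need exactly four hex digits.
            out.push("\\u" + ("000" + hex).slice(-4));
        }
    }
    return out.join("");
}

console.log(toEscapes("log"));    // \x6c\x6f\x67
console.log(toEscapes("\t"));     // \x09
console.log(toEscapes("\u03c0")); // \u03c0
```

Without the padding, a tab character would be emitted as \x9 and the Greek letter pi as \u3c0, neither of which is a valid escape sequence.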

Execute result

The significance and role of JavaScript obfuscation and encryption
Obfuscating and encrypting JavaScript code helps prevent it from being analyzed, copied, and misappropriated. It addresses the inherent transparency of JavaScript source and protects both the code and the product.

A common question: can obfuscated and encrypted JavaScript code be reversed and restored to its original form?
Strictly speaking, no. At the very least, it cannot be restored completely.

JavaScript obfuscation and encryption can be achieved through various technical means, generally classified as encoding, encryption algorithms, code transformation, and reordering of logic. Admittedly, some of these methods can be reversed. Character encoding is one example: once string content has been encoded as escape sequences, it can be decoded again.

For example:

console.log("hi");

After encoding, we get:

console['\x6c\x6f\x67']("\u0068\u0069");

Here "log" became "\x6c\x6f\x67", a hexadecimal escape, and "hi" became "\u0068\u0069", a Unicode escape. Encodings like these can indeed be reversed. In the snippets below, the backslashes are doubled so that the escape sequences are literal text, as they appear in the obfuscated source file:

Decoding \x6c\x6f\x67:

"\\x6c\\x6f\\x67".replace(/\\x(\w{2})/g, function(match, hex){ return String.fromCharCode(parseInt(hex, 16)); });

Decoding \u0068\u0069 (unescape is a legacy function that understands the %uXXXX form):

unescape("\\u0068\\u0069".replace(/\\/g, "%"));

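Both decoding steps can also be combined into one helper. The sketch below is one possible way to do it; the name `decodeEscapes` is hypothetical, and it only handles the two-digit \x and four-digit \u forms discussed here.

```javascript
// Hypothetical decoder: turns literal \xHH and \uHHHH sequences back into characters.
function decodeEscapes(text) {
    return text.replace(/\\x([0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})/g, function (match, hex2, hex4) {
        // Exactly one of the two capture groups is set for each match.
        return String.fromCharCode(parseInt(hex2 || hex4, 16));
    });
}

// The doubled backslashes keep the escapes as literal text, as in an obfuscated source file.
var decoded = decodeEscapes("console['\\x6c\\x6f\\x67'](\"\\u0068\\u0069\");");
console.log(decoded);
// console['log']("hi");
```
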
The point so far is that some transformations are reversible, but many more complex ones cannot be cracked or reversed. The variable renaming demonstrated in this article is one example.

For example:

var name = "Tom"; var age = "18";

These two lines define two variables whose purpose can be inferred from their names: one holds a name, the other an age. After obfuscation, we get:

var _ = "Tom"; var _2 = "18";

The two meaningful variable names have become meaningless underscores. Imagine you only had the encrypted code and it contained 3000 such variables: how would you know what the original names were? They cannot be restored.
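The renaming itself can be sketched in a few lines. This is a deliberately naive illustration, assuming a hypothetical helper `renameVariables` that replaces whole-word matches with numbered underscores; it ignores scope and would also rewrite matching words inside string literals.

```javascript
// Hypothetical, naive renamer: replaces each listed name with _1, _2, and so on.
// Limitation: it ignores scope and also matches words inside string literals.
function renameVariables(source, names) {
    var result = source;
    for (var i = 0; i < names.length; i++) {
        var pattern = new RegExp("\\b" + names[i] + "\\b", "g");
        result = result.replace(pattern, "_" + (i + 1));
    }
    return result;
}

var renamed = renameVariables('var name = "Tom"; var age = "18";', ["name", "age"]);
console.log(renamed);
// var _1 = "Tom"; var _2 = "18";
```

The mapping from old names to new ones exists only while the tool runs. Once it is discarded, nothing in the output records that `_1` used to be `name`.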

Here is another example:

var num1 = 3; var num2 = 1+2; var num3=function(s,h){return s^h;}(738158,738157);

After encryption, we can get:

var num1=function(s,h){return s^h;}(874506,874505); var num2=function(s,h){return s+h;}(722724^722725,768350^768348); var num3=function(s,h){return s^h;}(738158,738157);

This example transforms numeric values. The encrypted logic is admittedly simple, and it looks as if each final assignment could be reduced to 3:

var num1 = 3; var num2 = 3; var num3 = 3;

However, that is only a guess. Code restored by this kind of brute-force simplification has already been altered, so the result is not the original code.
Some may say the result is almost the same as the original values. It looks that way, but the logic has been destroyed, and every programmer understands how much logic matters to code.
Moreover, this is a toy example; real encrypted code is never this simple. Whatever the size of the code, such aggressive reduction inevitably distorts what the original code meant.
For example, "zombie code" can be inserted into the three lines above during encryption:

var _0x;
var num1 = function (s, h) { return s ^ h; }(467688, 467691);
_0x = 972677 ^ 972685;
var _0x2;
var num2 = function (s, h) { return s + h; }(754013 ^ 754012, 887759 ^ 887757);
_0x2 = "cpde";
var _0x3 = function (s, h) { return s + h; }(807363 ^ 807370, 383146 ^ 383147);
var num3 = function (s, h) { return s ^ h; }(738158, 738157);
_0x3 = 738774 ^ 738774;

Note: The three original variable names are left unchanged here so that they can be told apart.

In other words, useless "zombie code" is inserted into the code at random. It may contain variables, functions, logic, assignments, and references. In practice, it is impossible to tell which variables are original and which are newly generated zombie variables.
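A minimal sketch of zombie code insertion is shown below. It assumes a hypothetical helper `insertZombieCode` that places one unused declaration before each real statement; a real tool would generate far more varied filler.

```javascript
// Hypothetical sketch: interleave unused "zombie" declarations between real statements.
function insertZombieCode(statements) {
    var result = [];
    for (var i = 0; i < statements.length; i++) {
        var a = Math.floor(Math.random() * 900000) + 100000;
        var b = Math.floor(Math.random() * 900000) + 100000;
        // The zombie variable is assigned but never read by the real statements.
        result.push("var _0x" + i + " = " + a + " ^ " + b + ";");
        result.push(statements[i]);
    }
    return result.join("\n");
}

var source = insertZombieCode(["var num1 = 3;", "var num2 = 1 + 2;"]);
console.log(source);
// The real variables keep their values after the insertion.
var values = new Function(source + "\nreturn [num1, num2];")();
console.log(values); // [3, 3]
```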

Additionally, as mentioned earlier, JavaScript code protection offers many encryption options. Used together, they strengthen the protection further and make it increasingly unlikely that the original code can be cracked and recovered.

In summary: encrypted code cannot be fully reversed and restored, and obfuscating and encrypting JavaScript code is a useful, practical way to protect it.
