Original: https://lwebapp.com/en/post/regular-expression-to-match-multiple-lines-of-text
Question
In our daily work, we often write scripts to automate tasks and save time. Because many websites require users to log in, automatic login is an essential part of such scripts.
However, login pages often present verification codes (captchas), whose purpose is to block machine logins and automated scripts. Is there a way for a script to solve the verification code automatically and complete the login?
Next, I will use bilibili.com as an example to explain how to solve the verification code problem, the most critical part of an automatic login script.
Explore
First, try the website's login flow yourself to understand which type of verification code it uses.
Open https://www.bilibili.com/, open the browser developer tools, and click the login button. A small login box pops up in the middle of the page. The verification code box normally appears after you enter the account and password, so we can guess that the verification code interface has already been requested by this point.
Since the English word for verification code is captcha, we search for captcha in the Network panel.
This turns up an interface related to the verification code:
https://passport.bilibili.com/x/passport-login/captcha
Click the interface to inspect the response. It contains some useful information, including that the captcha type is geetest.
{
  "code": 0,
  "message": "0",
  "ttl": 1,
  "data": {
    "type": "geetest",
    "token": "b416c387953540608bb5da384b4e372b",
    "geetest": {
      "challenge": "aeb4653fb336f5dcd63baecb0d51a1f3",
      "gt": "ac597a4506fee079629df5d8b66dd4fe"
    },
    "tencent": {
      "appid": ""
    }
  }
}
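The fields we care about in this response can be pulled out with a small helper. This is only a sketch: `extractGeetestParams` is a hypothetical name that does not appear in the original script, and the checks on `code` and `type` are assumptions based on the sample response shown here.

```javascript
// Hypothetical helper (not part of the original script): validates a captcha
// response shaped like the sample above and returns the fields needed later.
function extractGeetestParams(payload) {
  // Assumption: code === 0 means the interface call succeeded, as in the sample.
  if (!payload || payload.code !== 0 || !payload.data) {
    throw new Error("Unexpected captcha response");
  }
  const { type, token, geetest } = payload.data;
  if (type !== "geetest" || !geetest) {
    throw new Error(`Unsupported captcha type: ${type}`);
  }
  return { token, gt: geetest.gt, challenge: geetest.challenge };
}

// Sample data copied from the response shown in the article.
const sample = {
  code: 0,
  message: "0",
  ttl: 1,
  data: {
    type: "geetest",
    token: "b416c387953540608bb5da384b4e372b",
    geetest: {
      challenge: "aeb4653fb336f5dcd63baecb0d51a1f3",
      gt: "ac597a4506fee079629df5d8b66dd4fe",
    },
    tencent: { appid: "" },
  },
};

console.log(extractGeetestParams(sample));
```

Failing early on an unexpected `type` makes it obvious when the site switches to a different captcha provider, instead of sending empty parameters to the solving service.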
Searching online, I found that the verification code service used by bilibili.com is provided by geetest, which many websites use. Geetest challenges typically ask the user to slide a puzzle piece into place or to click words or numbers in a given order.
So next, let's find a way to recognize the geetest verification code.
I looked at the verification code solving services on the market; the most effective ones are basically OCR service providers. After comparing them, I found 2Captcha's service to be very good, with fast decoding, stable server connections, API support for multiple languages, and a reasonable price, so I decided to try 2Captcha.
Next, we will use Node.js + Playwright + 2Captcha to solve the login verification code problem on bilibili.com.
If you want to use other languages and frameworks, such as Python + Selenium, you can still follow this tutorial, because the approach is the same.
Solution
- How to identify the verification code
First read the official 2Captcha API documentation for Geetest, which describes the solution in detail. In short:
  - Intercept the website's captcha interface to get the two verification code parameters `gt` and `challenge`, send them to http://2captcha.com/in.php, and get back the verification code ID.
  - After a period of time, request http://2captcha.com/res.php with that ID to get the `challenge`, `validate`, and `seccode` of the successful verification.
- How to apply the verification result
After getting `validate`, the most critical value, simulate the user filling in the account and password to log in, intercept the response of the verification code request interface, replace it with the parameters of the successful verification, and then trigger the login interface.
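The overall flow described above can be sketched as a single function that chains the four steps. All four step functions (`captureParams`, `submitTask`, `waitForResult`, `applyResult`) are hypothetical placeholders passed in by the caller; they only illustrate the order of operations and are not part of Playwright or the 2Captcha API.

```javascript
// Sketch of the flow only. Each step is injected so the order of operations
// is visible without any real browser or network access.
async function runCaptchaFlow({ captureParams, submitTask, waitForResult, applyResult }) {
  // 1. Intercept the site's captcha interface to read gt and challenge.
  const { gt, challenge } = await captureParams();
  // 2. Send them to in.php and receive the captcha ID.
  const id = await submitTask({ gt, challenge });
  // 3. Ask res.php for the result of that ID once it is ready.
  const solution = await waitForResult(id);
  // 4. Feed the solved values back into the page and trigger the login.
  await applyResult(solution);
  return solution;
}
```

Keeping the steps separate like this makes it easier to swap one of them out, for example to use a different solving service, without touching the rest.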
Next, we go through the steps in detail.
Environment
Let's set up the script execution environment first.
We use Node.js + Playwright for scripting.
Make sure that Node.js is installed locally on your computer.
Create a new empty project and install Playwright:
mkdir bypass-captcha
cd bypass-captcha
npm init
npm i -D playwright
We use Playwright in library mode; see the Playwright documentation for details.
- Create a new script file `captcha.js` in the project root directory with the following content, then run `node captcha.js` on the command line to check that the project starts normally:
const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch({
    headless: false,
  });
  const page = await browser.newPage();
  await page.goto("https://www.bilibili.com/");
  await browser.close();
})();
Under normal circumstances, a Chromium browser window will pop up, display the home page of bilibili.com, and then close automatically.
Request in.php interface
- First, sort out the parameters required to request the http://2captcha.com/in.php interface. The full parameter list is below; we focus on the parameters that are required.
| Parameter | Type | Required | Description |
|---|---|---|---|
| key | String | Yes | your API key |
| method | String | Yes | geetest - defines that you're sending a Geetest captcha |
| gt | String | Yes | Value of gt parameter you found on target website |
| challenge | String | Yes | Value of challenge parameter you found on target website |
| api_server | String | No | Value of api_server parameter you found on target website |
| pageurl | String | Yes | Full URL of the page where you see Geetest captcha |
| header_acao | Integer (default: 0) | No | 0 - disabled, 1 - enabled. If enabled, in.php will include the Access-Control-Allow-Origin:* header in the response. Used for cross-domain AJAX requests in web applications. Also supported by res.php. |
| pingback | String | No | URL for pingback (callback) response that will be sent when captcha is solved. URL should be registered on the server. |
| json | Integer (default: 0) | No | 0 - server will send the response as plain text, 1 - server will send the response as JSON |
| soft_id | Integer | No | ID of software developer. Developers who integrated their software with 2captcha get reward: 10% of spendings of their software users. |
| proxy | String | No | Format: login:password@123.123.123.123:3128 |
| proxytype | String | No | Type of your proxy: HTTP, HTTPS, SOCKS4, SOCKS5. |
| userAgent | String | No | Your userAgent that will be passed to our worker and used to solve the captcha. |
- `key` is obtained by registering on the 2Captcha official website; the API key is shown in the account settings of the dashboard. The account needs to be topped up with some balance.
- `method` is the fixed value `geetest`.
- `gt` and `challenge` were seen earlier in the captcha interface of the website's login page. Note, however, that `gt` has a single value per website (for bilibili.com it is `ac597a4506fee079629df5d8b66dd4fe`), while `challenge` is dynamic: each API request returns a new `challenge` value, and once the captcha is loaded on the page, that `challenge` value becomes invalid. So you need to listen for the request to https://passport.bilibili.com/x/passport-login/captcha when the login box loads and read the new `challenge` value each time. How to listen is explained below.
- `pageurl` is the address of the login page, https://www.bilibili.com/.
So we get a request like this:
http://2captcha.com/in.php?key=1abc234de56fab7c89012d34e56fa7b8&method=geetest&gt=ac597a4506fee079629df5d8b66dd4fe&challenge=12345678abc90123d45678ef90123a456b&pageurl=https://www.bilibili.com/
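Assembling this URL by hand is error-prone, especially because `pageurl` is itself a URL and should be percent-encoded. A minimal sketch using Node's built-in `URLSearchParams` follows; `buildInUrl` is a hypothetical helper name, and the key and challenge values are the placeholder values from the example above.

```javascript
// Hypothetical helper: builds the in.php request URL from the required
// parameters listed in the table above. URLSearchParams handles the
// percent-encoding of every value, including the nested page URL.
function buildInUrl({ key, gt, challenge, pageurl }) {
  const params = new URLSearchParams({
    key,
    method: "geetest", // fixed value for Geetest captchas
    gt,
    challenge,
    pageurl,
    json: "1", // ask for a JSON response instead of plain text
  });
  return `http://2captcha.com/in.php?${params.toString()}`;
}

const inUrl = buildInUrl({
  key: "1abc234de56fab7c89012d34e56fa7b8", // placeholder key from the article
  gt: "ac597a4506fee079629df5d8b66dd4fe",
  challenge: "12345678abc90123d45678ef90123a456b",
  pageurl: "https://www.bilibili.com/",
});
console.log(inUrl);
```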
- Next, solve the problem of getting a new `challenge` value every time you enter the home page
The process simulates a user clicking the login button:
Start the browser and open the home page of bilibili.com.
Click the login button at the top; a login box will pop up.
At this point the verification code request has been sent, and you can capture the values of `gt` and `challenge` by listening for the response of the verification code interface.
const { chromium } = require("playwright");

(async () => {
  // Launch Chromium; headless: false makes the browser window visible
  const browser = await chromium.launch({
    headless: false,
  });
  const page = await browser.newPage();
  // Open bilibili.com
  await page.goto("https://www.bilibili.com/");
  const [response] = await Promise.all([
    // Wait for the verification code interface to respond
    page.waitForResponse(
      (response) =>
        response.url().includes("/x/passport-login/captcha") &&
        response.status() === 200
    ),
    // Click the login button at the top
    page.click(".header-login-entry"),
  ]);
  // Parse the response body as JSON
  const json = await response.json();
  // Extract gt and challenge
  const gt = json.data.geetest.gt;
  const challenge = json.data.geetest.challenge;
  console.log("get gt", gt, "challenge", challenge);
  // Pause for 5 seconds so the browser does not close before we can see the result
  await sleep(5000);
  // Close the browser
  await browser.close();
})();

/**
 * Non-blocking sleep: resolves after the given number of milliseconds
 */
function sleep(delay) {
  return new Promise((resolve) => setTimeout(resolve, delay));
}
- Use the `request` library to call the in.php interface
Install request first:
npm i request
Now it is time to request the http://2captcha.com/in.php interface:
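const request = require("request");

// Request the in.php interface.
// API_KEY, METHOD ("geetest") and PAGE_URL are constants defined elsewhere in the script;
// gt and challenge are the values captured in the previous step.
const inData = {
  key: API_KEY,
  method: METHOD,
  gt: gt,
  challenge: challenge,
  pageurl: PAGE_URL,
  json: 1,
};
request.post(
  "http://2captcha.com/in.php",
  { json: inData },
  function (error, response, body) {
    if (!error && response.statusCode === 200) {
      console.log("response", body);
    }
  }
);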
Under normal circumstances, the verification code ID is returned at this point, for example {"status":1,"request":"2122988149"}; just take the request field.
If the interface returns the code ERROR_ZERO_BALANCE, your account balance is insufficient and you need to top up. I topped up the minimum amount for this demonstration; you can do so according to your own needs.
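The two outcomes described above can be told apart with a small parser. This is a sketch: `parseInResponse` is a hypothetical helper, and it assumes that error replies have the same shape as the success reply, with `status` set to 0 and the error code in the `request` field.

```javascript
// Hypothetical helper: returns the captcha ID from an in.php reply, or throws
// with the error code (for example ERROR_ZERO_BALANCE).
// Assumption: errors look like {"status":0,"request":"ERROR_..."}.
function parseInResponse(body) {
  // The body may already be an object (parsed by the HTTP client) or raw JSON text.
  const data = typeof body === "string" ? JSON.parse(body) : body;
  if (!data || data.status !== 1) {
    throw new Error(`2Captcha rejected the task: ${data && data.request}`);
  }
  return String(data.request);
}

console.log(parseInResponse('{"status":1,"request":"2122988149"}'));
```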
Extended Learning
To improve security, we read the API key from an environment variable file.
- Create a new environment variable file `.env` in the root directory and write the value of the API key
# .env file
API_KEY="d34y92u74en96yu6530t5p2i2oe3oqy9"
- Then install the `dotenv` library to load the environment variables
npm i dotenv
- Use it in js
require("dotenv").config();
In this way, the variables in .env are available through process.env.API_KEY. .env files are usually not committed to the code repository, which keeps personal information safe.
- If you don’t want to write the key to a file, you can also pass the environment variable directly on the command line, such as
API_KEY=d34y92u74en96yu6530t5p2i2oe3oqy9 node captcha.js
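Whichever way the key is supplied, it helps to fail early when it is missing rather than send an empty key to the service. A minimal sketch, where `getApiKey` is a hypothetical helper and the `env` parameter exists only so the function can be exercised without touching the real environment:

```javascript
// Hypothetical helper: reads API_KEY from the given environment object
// (process.env by default) and throws a clear error when it is absent.
function getApiKey(env = process.env) {
  const key = (env.API_KEY || "").trim();
  if (!key) {
    throw new Error("API_KEY is not set; add it to .env or pass it on the command line");
  }
  return key;
}

// Example with an explicit object instead of the real environment.
console.log(getApiKey({ API_KEY: "d34y92u74en96yu6530t5p2i2oe3oqy9" }));
```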
Request res.php interface
- Before requesting the interface, we again sort out the required parameters
| GET parameter | Type | Required | Description |
|---|---|---|---|
| key | String | Yes | your API key |
| action | String | Yes | get - get the answer for your captcha |
| id | Integer | Yes | ID of captcha returned by in.php. |
| json | Integer (default: 1) | No | Server will always return the response as JSON for Geetest captcha. |
- `key` is the API key, the same one used for the previous interface.
- `action` is the fixed value `get`.
- `id` is the captcha ID just returned by in.php.
- About 20 seconds after the previous request, request the http://2captcha.com/res.php interface to get the verification result
request.get(
  `http://2captcha.com/res.php?key=${API_KEY}&action=get&id=${ID}&json=1`,
  function (error, response, body) {
    if (!error && response.statusCode == 200) {
      const data = JSON.parse(body);
      if (data.status == 1) {
        console.log(data.request);
      }
    }
  }
);
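A single request after a fixed delay can arrive before the captcha has been solved. One way to handle that is to poll until a result is available. The sketch below is not the article's code: `pollResult` and its options are hypothetical, the HTTP call is passed in as `fetchResult` so that any client can be used, and it assumes that 2Captcha answers with `{"status":0,"request":"CAPCHA_NOT_READY"}` while the task is still pending.

```javascript
// Hypothetical polling helper. fetchResult must return the parsed JSON reply
// of one res.php call; how that call is made is left to the caller.
async function pollResult(fetchResult, { intervalMs = 5000, maxAttempts = 20 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const data = await fetchResult();
    if (data.status === 1) {
      return data.request; // solved: contains the geetest_* values
    }
    // Assumption: a pending task is reported as CAPCHA_NOT_READY.
    if (data.request !== "CAPCHA_NOT_READY") {
      throw new Error(`2Captcha error: ${data.request}`);
    }
    if (attempt < maxAttempts) {
      // Wait before the next attempt without blocking the event loop.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error("Timed out waiting for the captcha result");
}
```

Any error code other than the pending one stops the loop immediately, so a wrong key or an unsolvable task is not retried for the full timeout.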
The interface returns three values, challenge, validate and seccode, each of which is a string:
{
  "geetest_challenge": "aeb4653fb336f5dcd63baecb0d51a1f3",
  "geetest_validate": "9f36e8f3a928a7d382dad8f6c1b10429",
  "geetest_seccode": "9f36e8f3a928a7d382dad8f6c1b10429|jordan"
}
Here, challenge is the parameter we intercepted earlier, validate is the identifier of the verification result, and seccode is the same as validate with a suffix appended (|jordan in this example). We need to store validate for later use.
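To make the result easier to use in the login step, the prefixed field names can be mapped to the plain names used by the login interface. A small sketch, where `toLoginParams` is a hypothetical helper and the sample values are the ones shown above:

```javascript
// Hypothetical helper: converts the res.php result into the challenge,
// validate and seccode values needed later, and checks that none is missing.
function toLoginParams(result) {
  const solution = {
    challenge: result.geetest_challenge,
    validate: result.geetest_validate,
    seccode: result.geetest_seccode,
  };
  for (const [name, value] of Object.entries(solution)) {
    if (typeof value !== "string" || value === "") {
      throw new Error(`Missing ${name} in the res.php result`);
    }
  }
  return solution;
}

// Sample result copied from the article.
const solved = toLoginParams({
  geetest_challenge: "aeb4653fb336f5dcd63baecb0d51a1f3",
  geetest_validate: "9f36e8f3a928a7d382dad8f6c1b10429",
  geetest_seccode: "9f36e8f3a928a7d382dad8f6c1b10429|jordan",
});
console.log(solved.validate);
```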
Sometimes the verification fails at this step. You can try several times, or contact 2Captcha support to troubleshoot the problem.
At this point we have the verification result, and the next step is to log in with it.
Login
- Let’s first study the login flow after a normal user solves the verification code successfully
We found three interfaces:
- https://api.geetest.com/ajax.php: the verification code interface, used to generate the verification code and to check whether it was passed. The `validate` field in the data returned by this interface corresponds to the `geetest_validate` obtained from the 2Captcha service.
- https://passport.bilibili.com/x/passport-login/web/key?_=1649087831803: the password encryption interface, used to obtain the hash and public key.
- https://passport.bilibili.com/x/passport-login/web/login: the login interface; its input parameters include the account, password, `token`, `challenge`, `validate`, `seccode`, and so on.
Analyzing these interfaces, two login schemes are available.
- The first is to request the encryption interface and the login interface from the Node.js environment to obtain the user's cookie, which can then be used to log in directly. The difficulty with this scheme is that password encryption has to be handled separately, which is not very friendly to beginners.
- The second is to use Playwright to simulate the user filling in the account and password, click the verification code at random to trigger the login, intercept the response of the verification code interface, replace it with the successful verification result, and then trigger the login interface.
We take the second scheme.
But I also ran into a difficulty: in the browser launched from Node.js, the verification code image could not be loaded. I then found that the verification code interface https://api.geetest.com/ajax.php is responsible both for pulling the verification code image and for checking the answer. So we intercept the request made when the image is pulled and substitute the verification result directly to trigger the login, without waiting for the image to appear. This detail is critical.
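The interception itself can be done with Playwright's `page.route` and `route.fulfill`. The sketch below shows only the mechanism and rests on assumptions the article does not confirm: that ajax.php is called as JSONP with a `callback` query parameter, and that a successful reply carries `status` and `result` fields next to `data.validate`. `buildAjaxReply` and `interceptGeetest` are hypothetical names; check the real reply in the Network panel before relying on this shape.

```javascript
// Hypothetical helper: builds the fake ajax.php reply for route.fulfill().
// Assumptions: JSONP via a `callback` query parameter, and a success payload
// of the form {status, data: {result, validate}}.
function buildAjaxReply(requestUrl, solution) {
  const callback = new URL(requestUrl).searchParams.get("callback");
  const payload = JSON.stringify({
    status: "success",
    data: { result: "success", validate: solution.validate },
  });
  return {
    status: 200,
    contentType: "application/javascript",
    // Wrap the payload in the JSONP callback when one was requested.
    body: callback ? `${callback}(${payload})` : payload,
  };
}

// Registers the interception on a Playwright page object. Every request to
// api.geetest.com/ajax.php is answered locally with the solved values.
async function interceptGeetest(page, solution) {
  await page.route(/api\.geetest\.com\/ajax\.php/, (route) =>
    route.fulfill(buildAjaxReply(route.request().url(), solution))
  );
}
```

In the login script this would be called before clicking the login button, for example `await interceptGeetest(page, { validate })` with the `validate` value obtained from 2Captcha.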
Conclusion
The above is some research on the automatic login function commonly needed in automated testing tasks. By combining the strengths of Node.js, Playwright, and 2Captcha, verification code recognition is achieved. I have uploaded the complete code to GitHub.
There is probably much that could be optimized, and you are welcome to point it out.
Disclaimer: This script is intended only as a test and learning case; assess the risks yourself before using it.

