The objective is to extract the contract addresses of tokens listed on CoinGecko.
We can use this data for several purposes.
For example, a token is usually deployed on more than one chain, with a different contract address on each one. Collecting these addresses automatically saves us from searching for each token address manually.
Set-Up
To do this we will use Node.js and the "puppeteer" library to scrape the CoinGecko website.
npm init -y
npm install puppeteer
Step 1_ Fetching the token URLs
We will study the DOM of CoinGecko's website to automate the data extraction process.
To do so, we will follow these steps:
- Locate the DOM elements that store the data we want to extract (using the browser's web inspector). Here we extract each token's URL, because the smart contract addresses are listed on each token's page.
- Once the DOM elements are located, look at their attributes (class, id, type...) so we can select them with the following commands:
First, wrap all the code inside an IIFE async function:
const fs = require("fs");
const puppeteer = require("puppeteer");
(async () => {
// Code from the following steps goes here...
})();
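The async IIFE lets a plain CommonJS script use `await`. The sketch below shows the same pattern with a `.catch` handler. With it, an error thrown by any awaited step (for example a failed navigation) is reported, and the process exits with a non-zero code. Without it, the error would surface as an unhandled promise rejection. `main` is a name introduced for illustration, and `Promise.resolve` stands in for the real awaited calls.

```javascript
// Sketch of the async IIFE pattern used in this article.
// `main` is a hypothetical name; Promise.resolve stands in for real awaits such as page.goto().
const main = (async () => {
  const tokenCount = await Promise.resolve(2); // placeholder for the scraping work
  return tokenCount;
})();

// Report failures and set a non-zero exit code instead of failing silently
main
  .then((count) => console.log(`Scraped ${count} tokens`))
  .catch((error) => {
    console.error(error);
    process.exitCode = 1;
  });
```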
Launch a browser with "puppeteer", create a new "page" object, and go to https://www.coingecko.com/es
Use the page object's built-in "evaluate" method to access the website's DOM.
Use "querySelectorAll" to select the "a" elements inside the elements with the "coin-name" class.
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto("https://www.coingecko.com/es");
// Evaluating page's DOM elements and extracting data
const duplicatedUrls = await page.evaluate(() => {
// Selecting the coin links: a NodeList of <a> elements, each holding a token page URL in its href
const _rawUrls = document.querySelectorAll(".coin-name a");
// Looping to collect each link's URL
const _duplicatedUrls = [];
for (let i = 0; i < _rawUrls.length; i++) {
let _url = _rawUrls[i].href;
_duplicatedUrls.push(_url);
}
return _duplicatedUrls;
});
// Removing duplicated URLs
const urls = [...new Set(duplicatedUrls)];
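The `new Set` trick works here because URLs are strings, and a Set compares strings by value. Spreading a Set also keeps the order in which values were first inserted. The sketch below is a small, pure helper that makes this step testable outside the browser. `uniqueUrls` is a hypothetical name, and the sample URLs are made up. Inside the browser, puppeteer's `page.$$eval(".coin-name a", (links) => links.map((a) => a.href))` is a shorter alternative to the `evaluate` loop.

```javascript
// Hypothetical helper: removes empty and duplicated URLs while keeping the original order.
function uniqueUrls(hrefs) {
  // A Set compares strings by value, and spreading it preserves first-insertion order
  return [...new Set(hrefs.filter((href) => typeof href === "string" && href.length > 0))];
}

// Made-up sample mirroring what the ".coin-name a" loop may return
const scraped = [
  "https://example.com/token-a",
  "https://example.com/token-a",
  "",
  "https://example.com/token-b",
];
console.log(uniqueUrls(scraped)); // [ 'https://example.com/token-a', 'https://example.com/token-b' ]
```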
@Dev: The CoinGecko website may be updated, and the attributes of the extracted elements may change. In that case, you will have to rewrite the DOM extraction logic.
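A site redesign usually shows up as selectors that silently match nothing. A cheap safeguard is to stop the run when the first step returns no URLs. This is a minimal sketch, assuming you call a hypothetical `ensureFound` helper right after deduplicating the URLs. The error message wording is also illustrative.

```javascript
// Hypothetical guard: fail fast when a selector no longer matches anything,
// which usually means the page layout changed.
function ensureFound(items, selector) {
  if (!Array.isArray(items) || items.length === 0) {
    throw new Error(`No elements matched "${selector}"; the page layout may have changed`);
  }
  return items;
}

// Usage: a non-empty array is passed through unchanged
const urlsFound = ensureFound(["https://example.com/token-a"], ".coin-name a");
console.log(urlsFound.length); // 1

// Usage: an empty result throws, so the scraper stops before writing an empty JSON file
try {
  ensureFound([], ".coin-name a");
} catch (error) {
  console.error(error.message);
}
```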
Step 2_ Looping over each token URL and fetching data
A for loop wraps the logic that gets, filters, and structures the data, and collects the results in an array.
Use the "evaluate" method to select the DOM elements with querySelectorAll("[data-address]"), and use the "getAttribute()" method to read each token's data: symbol, chain ID, address, and decimals.
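Inside `evaluate`, each matched element exposes `getAttribute`. It returns the attribute's string value, or `null` when the attribute is missing. The sketch below writes that extraction as a standalone function. It works on any object with a `getAttribute` method, so here it runs on hypothetical fake elements instead of real DOM nodes, and the attribute values are made up.

```javascript
// Hypothetical helper: turns elements carrying data-* attributes into
// [symbol, chainId, address, decimals] rows, the same shape the article builds.
const ATTRIBUTES = ["data-symbol", "data-chain-id", "data-address", "data-decimals"];

function extractRows(elements) {
  // Array.from also accepts a NodeList, so this works with querySelectorAll results
  return Array.from(elements, (element) => ATTRIBUTES.map((name) => element.getAttribute(name)));
}

// Fake element standing in for a DOM node; like the DOM, it returns null for missing attributes
function fakeElement(attributes) {
  return {
    getAttribute: (name) => (name in attributes ? attributes[name] : null),
  };
}

const rows = extractRows([
  fakeElement({ "data-symbol": "ABC", "data-chain-id": "1", "data-address": "0x1234", "data-decimals": "18" }),
  fakeElement({ "data-symbol": "ABC", "data-address": "not-an-address" }),
]);
console.log(rows);
```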
@notice The loop ends right before Step 5.
// Fetching token addresses from the URLs; "addresses" will hold the final array
const addresses = [];
for (let i = 0; i < urls.length; i++) {
await page.goto(urls[i]);
const tokenAddress = await page.evaluate(() => {
/*
DATA FETCHING LOGIC
*/
// Selecting every element that carries a "data-address" attribute
// querySelectorAll never throws for a valid selector and never returns null:
// on a page without matches it returns an empty NodeList
const _rawElement = document.querySelectorAll("[data-address]");
// Most coins also have a "data-address" attribute, so coins are filtered out in Step 3
// An empty NodeList is still truthy, so check its length instead
if (_rawElement.length > 0) {
// We will run code inside puppeteer's opened browser
// Extracting raw data
let _tokenAddress = [];
for (let i = 0; i < _rawElement.length; i++) {
_tokenAddress.push([
_rawElement[i].getAttribute("data-symbol"),
_rawElement[i].getAttribute("data-chain-id"),
_rawElement[i].getAttribute("data-address"),
_rawElement[i].getAttribute("data-decimals"),
/* search for more INFO in DOM element */
]);
}
Step 3_ Filtering data
When our logic processes a coin's page (not a token's), it must return undefined. To detect this, we use the following checks:
- Check whether "data-address" starts with "0x".
- Check whether "data-chain-id" is null, undefined, or empty.
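The two checks can be written as one pure function over the extracted rows. This is a minimal sketch, assuming the `[symbol, chainId, address, decimals]` row shape used in this article. `looksLikeToken` is a hypothetical name. Like the article's loop, it treats a page as a token page when at least one row has a "0x" address and at least one row has a chain ID.

```javascript
// Hypothetical helper mirroring the two checks on [symbol, chainId, address, decimals] rows.
function looksLikeToken(rows) {
  // Check 1: at least one "data-address" starts with "0x" (safe against null values)
  const hasHexAddress = rows.some(([, , address]) => typeof address === "string" && address.startsWith("0x"));
  // Check 2: at least one "data-chain-id" is present and not empty
  const hasChainId = rows.some(([, chainId]) => chainId !== null && chainId !== undefined && chainId !== "");
  return hasHexAddress && hasChainId;
}

console.log(looksLikeToken([["ABC", "1", "0x1234", "18"]])); // true
console.log(looksLikeToken([["XYZ", null, "not-an-address", null]])); // false
```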
More info about the differences between coins and tokens:
https://www.outlookindia.com/business/demystified-the-difference-between-crypto-coins-and-crypto-tokens-read-here-for-details-news-197683
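Use "[...new Set(_array)]" to delete duplicated data. A Set compares arrays by reference, so convert each row to a string key first (for example with JSON.stringify); otherwise identical rows are not removed.
Use the Array.prototype.filter method to delete rows that contain "null" values.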
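The string key matters because a `Set` keeps two arrays with identical contents as separate entries: it compares them by reference. The sketch below deduplicates rows through `JSON.stringify` keys and then drops incomplete rows with `filter`. `cleanRows` is a hypothetical name and the sample rows are made up. The required columns (symbol, chain ID, address) follow the article's filter.

```javascript
// Arrays are compared by reference, so this Set keeps both identical rows
const naive = new Set([["ABC", "1", "0x1234", "18"], ["ABC", "1", "0x1234", "18"]]);
console.log(naive.size); // 2

// Hypothetical helper: deduplicates rows by their JSON text, then removes rows
// whose symbol, chain ID, or address is null
function cleanRows(rows) {
  const unique = [...new Set(rows.map((row) => JSON.stringify(row)))].map((key) => JSON.parse(key));
  return unique.filter(([symbol, chainId, address]) => symbol !== null && chainId !== null && address !== null);
}

const cleaned = cleanRows([
  ["ABC", "1", "0x1234", "18"],
  ["ABC", "1", "0x1234", "18"],
  ["ABC", null, "not-an-address", null],
]);
console.log(cleaned);
```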
// As mentioned before, we must return "undefined" if the page belongs to a coin
// Two checks:
// 1. Checking whether any "data-address" starts with "0x"
let isToken = false;
// 2. Checking whether any "data-chain-id" value is present // More filters may be needed in the future
let isChain = false;
for (let i = 0; i < _rawElement.length; i++) {
const addr = _rawElement[i]
.getAttribute("data-address") // even on token pages, an entry may contain a plain string instead of an address
.substring(0, 2);
if (addr === "0x") {
isToken = true;
console.log("is a token"); // Logs are only visible in the browser opened by puppeteer
}
const chainTest = _rawElement[i].getAttribute("data-chain-id");
if (chainTest) {
isChain = true;
}
}
if (!isToken || !isChain) return undefined;
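// Cleaning data: a Set compares arrays by reference, so deduplicate rows by their JSON string
const _elements = [...new Set(_tokenAddress.map((row) => JSON.stringify(row)))].map((row) => JSON.parse(row));
// Removing rows where the symbol, chain ID, or address is null
const _tokenData = _elements.filter(
(item) => item[0] !== null && item[1] !== null && item[2] !== null
);
// If no complete row is left, there is nothing to structure in Step 4
if (_tokenData.length === 0) return undefined;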
Step 4_ Structuring data
Create a new object and fill in its properties with the filtered data, so we don't transfer any useless data out of the browser.
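The structuring step comes down to grouping the cleaned rows by chain ID under the token's symbol. The sketch below does this as a pure function, assuming the same row shape. It keeps the article's choice of wrapping `address` and `decimals` in one-element arrays, so the output matches the JSON preview. `buildTokenObject` is a hypothetical name and the rows are made up.

One JavaScript detail also explains why `symbol` appears last in the preview: integer-like keys such as `"1"` and `"56"` are always listed first, in ascending order, before string keys.

```javascript
// Hypothetical helper: groups cleaned [symbol, chainId, address, decimals] rows
// into { SYMBOL: { symbol, <chainId>: { address: [...], decimals: [...] } } }.
function buildTokenObject(rows) {
  if (rows.length === 0) return undefined; // nothing left after filtering
  const symbol = rows[0][0];
  const token = { symbol };
  for (const [, chainId, address, decimals] of rows) {
    // A repeated chain ID overwrites the previous entry, as in the article
    token[chainId] = { address: [address], decimals: [decimals] };
  }
  return { [symbol]: token };
}

const tokenObject = buildTokenObject([
  ["ABC", "1", "0x1234", "18"],
  ["ABC", "56", "0x5678", "18"],
]);
console.log(JSON.stringify(tokenObject));
// {"ABC":{"1":{"address":["0x1234"],"decimals":["18"]},"56":{"address":["0x5678"],"decimals":["18"]},"symbol":"ABC"}}
```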
const tokenObject = {};
// One object per token, keyed by its ticker
tokenObject[`${_tokenData[0][0]}`] = {
symbol: `${_tokenData[0][0]}`,
};
// Array of the chains where the token is deployed
const chains = [];
// Array of the token's addresses, one per chain
const tokenAddressPerChain = [];
// Array of the token's decimals, one per chain
const tokenDecimals = [];
for (let i = 0; i < _tokenData.length; i++) {
chains.push(_tokenData[i][1]);
tokenAddressPerChain.push(_tokenData[i][2]);
tokenDecimals.push(_tokenData[i][3]);
}
// Adding data to the final object; a repeated chain ID overwrites the previous entry
for (let i = 0; i < chains.length; i++) {
tokenObject[`${_tokenData[0][0]}`][`${chains[i]}`] = {
address: [`${tokenAddressPerChain[i]}`],
decimals: [`${tokenDecimals[i]}`],
/* ADD more INFO to json*/
};
}
return tokenObject;
} else return undefined;
});
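// page.evaluate ends here; the URL loop continues until its closing brace below
// No page.goBack() is needed: the next iteration navigates directly with page.goto()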
if (tokenAddress) {
addresses.push(tokenAddress);
}
}
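As written, one failing page (a timeout or an unexpected layout) rejects the whole run. Nothing is saved and the browser is left open. The sketch below is a more forgiving loop, assuming a hypothetical `fetchTokenData(url)` function that wraps the `page.goto` and `page.evaluate` calls for one URL. It skips `undefined` results like the article does and logs failures instead of aborting. It also waits `delayMs` between requests; that delay value is an arbitrary assumption, not a documented CoinGecko limit.

```javascript
// Small promise-based pause between requests
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical loop: fetchTokenData(url) stands in for page.goto + page.evaluate on one URL.
async function collectTokens(urls, fetchTokenData, delayMs = 1000) {
  const results = [];
  for (const url of urls) {
    try {
      const data = await fetchTokenData(url);
      if (data) results.push(data); // coins return undefined and are skipped
    } catch (error) {
      console.error(`Skipping ${url}: ${error.message}`);
    }
    await sleep(delayMs);
  }
  return results;
}

// Usage with a fake fetcher: one token, one coin, one failure
const fakeFetch = async (url) => {
  if (url === "token") return { ABC: { symbol: "ABC" } };
  if (url === "broken") throw new Error("timeout");
  return undefined;
};
collectTokens(["token", "coin", "broken"], fakeFetch, 0).then((tokens) => console.log(tokens.length)); // 1
```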
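Step 5_ Create a JSON file
// Creating the output folder if it doesn't exist yet (writeFileSync does not create folders)
fs.mkdirSync("json", { recursive: true });
fs.writeFileSync("json/TokenAddresses.json", JSON.stringify(addresses, null, 2));
// Closing the browser
await browser.close();
// Closing the async IIFE opened in the set-up step
})();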
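`fs.writeFileSync` does not create missing folders, so the `json/` folder has to exist before the file is written. The self-contained sketch below creates the folder first and writes indented JSON, which is easier to read and diff. It writes into a temporary directory (via `os.tmpdir()`) so it can run anywhere. `saveJson` is a hypothetical name.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: creates the parent folder if needed, then writes pretty-printed JSON
function saveJson(filePath, data) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2));
}

// Usage: write into a fresh temporary folder and read the file back
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), "token-addresses-"));
const outFile = path.join(outDir, "json", "TokenAddresses.json");
saveJson(outFile, [{ ABC: { symbol: "ABC" } }]);
const roundTrip = JSON.parse(fs.readFileSync(outFile, "utf8"));
console.log(roundTrip[0].ABC.symbol); // ABC
```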
Final JSON preview
[
  {
    "LEO": {
      "1": {
        "address": ["0x2af5d2ad76741191d15dfe7bf6ac92d4bd912ca3"],
        "decimals": ["18"]
      },
      "symbol": "LEO"
    }
  },
  {
    "MATIC": {
      "1": {
        "address": ["0x7d1afa7b718fb893db30a3abc0cfc608aacfebb0"],
        "decimals": ["18"]
      },
      "56": {
        "address": ["0xcc42724c6683b7e57334c4e856f4c9965ed682bd"],
        "decimals": ["18"]
      },
      "137": {
        "address": ["0x0000000000000000000000000000000000001010"],
        "decimals": ["18"]
      },
      "1666600000": {
        "address": ["0x301259f392b551ca8c592c9f676fcd2f9a0a84c5"],
        "decimals": ["18"]
      },
      "symbol": "MATIC"
    }
  },
  ...
]
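To close the loop on the original goal, here is a sketch of reading the generated array to look up a token's address on a given chain. `findAddress` is a hypothetical helper. The sample data copies the MATIC entries from the preview.

```javascript
// Sample in the same shape as TokenAddresses.json (MATIC entries copied from the preview)
const tokens = [
  {
    MATIC: {
      1: { address: ["0x7d1afa7b718fb893db30a3abc0cfc608aacfebb0"], decimals: ["18"] },
      56: { address: ["0xcc42724c6683b7e57334c4e856f4c9965ed682bd"], decimals: ["18"] },
      137: { address: ["0x0000000000000000000000000000000000001010"], decimals: ["18"] },
      1666600000: { address: ["0x301259f392b551ca8c592c9f676fcd2f9a0a84c5"], decimals: ["18"] },
      symbol: "MATIC",
    },
  },
];

// Hypothetical helper: returns the first address of `symbol` on `chainId`, or undefined
function findAddress(data, symbol, chainId) {
  const entry = data.find((item) => Object.prototype.hasOwnProperty.call(item, symbol));
  if (!entry) return undefined;
  const chain = entry[symbol][String(chainId)];
  // Guards against keys such as "symbol" that are not chain entries
  return chain && Array.isArray(chain.address) ? chain.address[0] : undefined;
}

console.log(findAddress(tokens, "MATIC", 137)); // 0x0000000000000000000000000000000000001010
console.log(findAddress(tokens, "MATIC", 10)); // undefined (not in the sample data)
```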
I hope it has been helpful.