Introduction:
This article will walk you through the process of extracting links from Gmail emails using Node.js and Puppeteer. We'll explore how to set up an IMAP client to access your Gmail inbox, parse email content using the 'mailparser' library, and extract links from the email body using the 'cheerio' library. Additionally, we'll use Puppeteer to interact with and navigate to the extracted links within the emails.
Prerequisites:
Before we get started, make sure you have the following prerequisites in place:
- A Gmail account: You will need a Gmail account from which you want to extract links.
- App Password: To access your Gmail account programmatically, you should generate an App Password. This password is used in place of your regular Gmail password and is more secure for applications. We'll explain how to create an App Password in the article.
- Node.js: Ensure you have Node.js installed on your computer.
Getting Started: Creating an App Password
Gmail has a security feature that prevents third-party apps from signing in with your regular account password. To work around this, you can create an App Password, which requires 2-Step Verification to be enabled on your Google Account. Here's how to do it:
- Sign in to your Gmail account.
- Go to your Google Account settings:
  - Click on your profile picture in the upper-right corner and select "Google Account."
  - In the left sidebar, click on "Security."
- Under "Signing in to Google," click on "App passwords."
- You may need to sign in again.
- In the "App passwords" section, click "Select app" and choose "Mail" (for email) and "Other (Custom name)" for the device. Enter a custom name (e.g., "Node.js Email Extractor").
- Click "Generate." You'll receive a 16-character app password. Make sure to copy it somewhere safe; you won't be able to see it again.
Setting Up the Code:
Now that you have an App Password, you can use it to access your Gmail account via IMAP and extract links from emails using Node.js. Here's a breakdown of the script:
emailConfig: This object contains your Gmail account information, including the App Password you generated.
magicLinkSubject: This is the subject of the email you want to search for.
The code sets up an IMAP client, searches for emails with the specified subject, and then extracts and logs the email subject and body.
It uses 'cheerio' to parse the HTML content of the email and extract links within anchor tags. The links are stored in the links array and logged to the console.
Puppeteer is used to launch a browser, navigate to the first extracted link, and log the link.
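The links array can contain entries that Puppeteer cannot open, such as mailto: addresses, relative paths, or duplicates. As a minimal sketch of one way to tidy the list before navigating, the helper below keeps only absolute http(s) URLs. The name cleanLinks is hypothetical and not part of cheerio or Puppeteer; it relies only on Node's built-in URL class.

```javascript
// Hypothetical helper: keeps only absolute http(s) links and removes duplicates.
function cleanLinks(hrefs) {
  const seen = new Set();
  const result = [];
  for (const href of hrefs) {
    // An <a> tag without an href attribute yields undefined
    if (typeof href !== 'string') continue;
    let url;
    try {
      url = new URL(href.trim());
    } catch {
      continue; // Relative or malformed URLs cannot be opened directly
    }
    // Skip mailto:, tel:, and other non-web schemes
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    if (!seen.has(url.href)) {
      seen.add(url.href);
      result.push(url.href);
    }
  }
  return result;
}
```

In the script, this could be applied to the links array right before the call to page.goto, so that the first entry is more likely to be a page the browser can load.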
Running the Code:
- Make sure you have Node.js installed on your computer.
- Install the required Node.js packages:
npm install imap mailparser cheerio puppeteer
Replace the emailConfig object's user and password properties with your Gmail address and the App Password you generated.
Run the script:
node your-script-filename.js
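Rather than typing the App Password directly into the emailConfig object, you may prefer to read it from environment variables so it never ends up in version control. The following is a sketch under stated assumptions: buildEmailConfig is a hypothetical helper, and the variable names GMAIL_USER and GMAIL_APP_PASSWORD are illustrative choices, not something Gmail or the imap package defines.

```javascript
// Hypothetical helper: builds the IMAP settings from environment variables.
// GMAIL_USER and GMAIL_APP_PASSWORD are assumed names chosen for this example.
function buildEmailConfig(env = process.env) {
  const user = env.GMAIL_USER;
  const rawPassword = env.GMAIL_APP_PASSWORD;
  if (!user || !rawPassword) {
    throw new Error('Set GMAIL_USER and GMAIL_APP_PASSWORD before running the script');
  }
  return {
    user,
    // App passwords are often copied with spaces between groups; strip them
    password: rawPassword.replace(/\s+/g, ''),
    host: 'imap.gmail.com',
    port: 993,
    tls: true,
  };
}
```

With this in place, the script could be started as GMAIL_USER=you@gmail.com GMAIL_APP_PASSWORD=your-app-password node your-script-filename.js, and the result of buildEmailConfig() passed to the IMAP client in place of a hardcoded object.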
The full script is shown below:
const cheerio = require('cheerio');
const Imap = require('imap');
const puppeteer = require('puppeteer');
const { simpleParser } = require('mailparser');

const emailConfig = {
  user: 'test-email@gmail.com', // Replace with your Gmail address
  password: 'generated-app-password', // Replace with your App Password, not your regular Gmail password
  host: 'imap.gmail.com',
  port: 993,
  tls: true,
  tlsOptions: {
    rejectUnauthorized: false, // Disables certificate validation; use for local testing only
  },
  connectTimeout: 100000, // 100 seconds
  authTimeout: 30000, // 30 seconds
  debug: console.log, // Logs IMAP protocol traffic; remove when no longer needed
};

const magicLinkSubject = 'Email subject example';

// Set up an IMAP client
const imap = new Imap(emailConfig);

imap.once('error', (err) => {
  console.error('IMAP connection error', err);
});

imap.once('ready', () => {
  // Open the inbox in read-only mode
  imap.openBox('INBOX', true, (err) => {
    if (err) {
      console.error('Error opening mailbox', err);
      imap.end();
      return;
    }

    // Search for emails with the magic link subject
    imap.search([['SUBJECT', magicLinkSubject]], (err, results) => {
      if (err || results.length === 0) {
        console.error('No matching email found', err || '');
        imap.end();
        return;
      }

      const emailId = results[0]; // Assuming the first result is the correct email
      console.log('This is the email UID: ' + emailId);

      // An empty "bodies" string fetches the entire raw message
      const email = imap.fetch(emailId, { bodies: '' });
      email.on('message', (msg) => {
        msg.on('body', (stream) => {
          simpleParser(stream, async (err, mail) => {
            if (err) {
              console.error('Error parsing email', err);
              imap.end();
              return;
            }

            // Extract and log the email subject
            const emailSubject = mail.subject;
            console.log('Email Subject:', emailSubject);

            // Extract and log the email body (the HTML part may be missing in plain-text emails)
            const emailBody = mail.html || '';
            console.log('Email Body:', emailBody);

            // Use cheerio to extract links from anchor tags
            const $ = cheerio.load(emailBody);
            const links = [];
            $('a').each((index, element) => {
              const href = $(element).attr('href');
              if (href) links.push(href);
            });
            console.log('Extracted Links', links);

            imap.end(); // The mailbox is no longer needed

            if (links.length === 0) {
              console.error('No links found in the email');
              return;
            }

            try {
              // Open the first link in a visible browser window
              const browser = await puppeteer.launch({ headless: false });
              const page = await browser.newPage();
              await page.goto(links[0]);
              console.log('This is the first link: ' + links[0]);
            } catch (browserErr) {
              console.error('Error opening the link', browserErr);
            }
          });
        });
      });
    });
  });
});

imap.connect();
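The script assumes that the first search result is the email you want, which may not hold if several messages share the same subject. As a hedged sketch, the two helpers below show one way to narrow the search and pick the newest match. Both buildSearchCriteria and pickLatestUid are hypothetical names introduced here. The sketch assumes the 'SINCE' criterion as documented by the imap package (it compares dates only, not times) and that imap.search returns message UIDs, which IMAP servers assign in ascending order as messages arrive.

```javascript
// Hypothetical helper: search criteria for a subject, limited to messages
// received on or after the given date (IMAP 'SINCE' ignores the time of day).
function buildSearchCriteria(subject, sinceDate) {
  return [['SUBJECT', subject], ['SINCE', sinceDate]];
}

// Hypothetical helper: returns the highest UID, which corresponds to the
// message most recently added to the mailbox, or null when nothing matched.
function pickLatestUid(uids) {
  if (!Array.isArray(uids) || uids.length === 0) return null;
  return Math.max(...uids.map(Number));
}
```

In the script, buildSearchCriteria(magicLinkSubject, new Date()) could replace the array passed to imap.search, and pickLatestUid(results) could replace results[0], with a null check before fetching.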