This post walks through a Ruby on Rails application that reads links from an uploaded CSV file, scrapes each one, and counts how often a given link occurs on that page.
The user uploads a CSV file and provides a list of email addresses to which the parsed CSV will be sent.
The CSV has two columns:
• refferal_link
• home_link
Each row gives the page to scrape (refferal_link) and the link to count on that page (home_link).
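As a rough illustration of that layout, here is a small sketch that parses a two-column CSV with its headers converted to symbols, which is how the job later in this post reads each row. The example.com URLs and the helper name parse_links are placeholders for illustration, not real data or part of the application.

```ruby
require 'csv'

# Hypothetical sample input; the URLs are placeholders.
SAMPLE_CSV = <<~DATA
  refferal_link,home_link
  https://example.com/blog,https://example.com/
  https://example.com/about,https://example.com/contact
DATA

# Parse with symbol headers so each row can be read as row[:refferal_link].
def parse_links(csv_text)
  CSV.parse(csv_text, headers: true, header_converters: :symbol).map do |row|
    { refferal_link: row[:refferal_link], home_link: row[:home_link] }
  end
end

parse_links(SAMPLE_CSV).each do |pair|
  puts "#{pair[:refferal_link]} -> #{pair[:home_link]}"
end
```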
First, create the Rails application:
$ rails new scrape_data
$ cd scrape_data
Next, generate the UploadCsv scaffold by running the command below:
$ rails g scaffold UploadCsv generated_csv:string csv_file:string
This creates the model, controller, views, and migration for UploadCsv.
Start by uploading the file and saving it to the database.
Replace the contents of app/views/upload_csvs/_form.html.erb with the code below, which adds a file field for the upload:
<%= form_with(model: upload_csv, local: true) do |form| %>
  <% if upload_csv.errors.any? %>
    <div id="error_explanation">
      <h2><%= pluralize(upload_csv.errors.count, "error") %> prohibited this upload_csv from being saved:</h2>
      <ul>
        <% upload_csv.errors.full_messages.each do |message| %>
          <li><%= message %></li>
        <% end %>
      </ul>
    </div>
  <% end %>
  <%= form.label :csv_file %>
  <%= form.file_field :csv_file %>
  <%= form.submit %>
<% end %>
Next, add the CarrierWave gem to handle the file upload.
Add the line below to the Gemfile:
gem 'carrierwave', '~> 2.0'
$ bundle install
Then generate the uploader with CarrierWave:
$ rails generate uploader Avatar
Attach the uploader to the model in app/models/upload_csv.rb:
class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader
end
Before moving on, check that the application works.
Run the command below:
$ rake db:create db:migrate
Update the routes in config/routes.rb, then start the server:
Rails.application.routes.draw do
  resources :upload_csvs
  root 'upload_csvs#index'
end
$ rails s
Next, create a job that reads the CSV file and scrapes each link in it.
The path of the generated file is saved in the generated_csv column of the record.
Generate the job with the command below:
$ rails generate job genrate_csv
Add the gems below to the Gemfile and run bundle install:
gem 'httparty'
gem 'nokogiri'
Then replace the contents of app/jobs/genrate_csv_job.rb with the code below:
require 'csv'

class GenrateCsvJob < ApplicationJob
  queue_as :default

  HEADERS = %w[referal_link home_link count].freeze

  def perform(upload_csv)
    rows = processed_csv(upload_csv)

    # Write the results to public/product_data.csv and store its path on the record.
    file = Rails.root.join('public', 'product_data.csv').to_s
    CSV.open(file, 'w', write_headers: true, headers: HEADERS) do |writer|
      rows.each { |row| writer << row }
    end
    upload_csv.update(generated_csv: file)

    # Requires the NotificationMailer; generate it by following the mailer steps.
    NotificationMailer.send_csv(upload_csv).deliver_now! if rows.present?
  end

  # Returns one [refferal_link, home_link, count] row per CSV line, where count
  # is how many <a href> values on the refferal_link page equal home_link.
  def processed_csv(upload_csv)
    rows = []
    CSV.foreach(upload_csv.csv_file.path, headers: true, header_converters: :symbol) do |row|
      page = HTTParty.get(row[:refferal_link])
      hrefs = Nokogiri::HTML(page.body).css('a').map { |link| link['href'] }
      rows << [row[:refferal_link], row[:home_link], hrefs.count(row[:home_link])]
    end
    rows
  end
end
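The counting step can be tried in isolation, without Rails or a network call. The sketch below uses a hard-coded list of href values (placeholders standing in for what the css('a') lookup in the job returns) and a hypothetical helper named count_link. It assumes Ruby 2.7 or newer for Enumerable#tally, and an absent link yields 0.

```ruby
# Hypothetical helper: counts how many times target appears in hrefs.
# nil entries (anchors without an href attribute) are ignored.
def count_link(hrefs, target)
  hrefs.compact.tally.fetch(target, 0)
end

# Placeholder data standing in for the links scraped from one page.
hrefs = ['https://example.com/', '/about', nil, 'https://example.com/']

puts count_link(hrefs, 'https://example.com/')
puts count_link(hrefs, 'https://example.com/missing')
```

Note that this is an exact string comparison, so a relative href such as /about would not match an absolute home_link pointing at the same page.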
Next, enqueue the job once a new upload_csv record has been created and committed (after_create_commit, so the job never runs before the row is saved), and add a validation that requires csv_file.
Update app/models/upload_csv.rb:
class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader
  validates :csv_file, presence: true
  after_create_commit :processed_csv

  def processed_csv
    GenrateCsvJob.perform_later(self)
  end
end
After uploading a file, check that the scraped output was generated.
You can find the generated CSV at /scrape_data/public/product_data.csv.
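If it helps to see what the generated file should look like, the sketch below writes one row with the same CSV.open options the job uses and reads it back. The helper names write_report and read_report, the temporary directory, and the sample row are all assumptions made for illustration.

```ruby
require 'csv'
require 'tmpdir'

HEADERS = %w[referal_link home_link count].freeze

# Writes rows under the given headers, mirroring the CSV.open options used in the job.
def write_report(path, rows)
  CSV.open(path, 'w', write_headers: true, headers: HEADERS) do |writer|
    rows.each { |row| writer << row }
  end
end

# Reads the report back as an array of hashes keyed by header name.
def read_report(path)
  CSV.read(path, headers: true).map(&:to_h)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'product_data.csv')
  write_report(path, [['https://example.com/blog', 'https://example.com/', 2]])
  read_report(path).each { |row| puts row }
end
```

Values read back from a CSV are strings, so the count comes back as "2" rather than the integer 2.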
To send the generated file by email, follow the steps below.
First, generate the mailer:
$ rails generate mailer NotificationMailer
Update app/mailers/notification_mailer.rb:
class NotificationMailer < ApplicationMailer
  def send_csv(upload_csv)
    @greeting = 'Hi'
    attachments['parsed.csv'] = File.read(upload_csv.generated_csv)
    mail(to: 'sample@gmail.com', subject: 'CSV is parsed successfully.')
  end
end
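The introduction mentions sending the result to a list of email addresses, while send_csv uses a single fixed address. One possible way to bridge that is sketched below, assuming the addresses arrive as one comma-separated string. The helper name parse_recipients and that input format are hypothetical, not part of the scaffold generated earlier.

```ruby
require 'uri'

# Hypothetical helper: turns "a@example.com, b@example.com" into a clean array.
# Blank entries, duplicates, and strings that do not look like an email are dropped.
def parse_recipients(raw)
  raw.to_s
     .split(',')
     .map(&:strip)
     .reject(&:empty?)
     .uniq
     .select { |address| address.match?(URI::MailTo::EMAIL_REGEXP) }
end

p parse_recipients('a@example.com, b@example.com, , not-an-email, a@example.com')
```

Action Mailer accepts an array for the to: option, so the result could be passed as mail(to: parse_recipients(...), subject: ...).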
Also configure mail delivery in config/environments/development.rb or production.rb.
Add the lines below to the file:
config.action_mailer.default_url_options = { host: 'sample-scrape.herokuapp.com', protocol: 'https' }
config.action_mailer.delivery_method = :smtp
config.action_mailer.smtp_settings = {
  user_name: 'sample@gmail.com',
  password: '*******123456',
  domain: 'gmail.com',
  address: 'smtp.gmail.com',
  port: 587,
  authentication: :plain
}
config.action_mailer.raise_delivery_errors = false
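Hard-coding the account password in an environment file is easy to leak through version control. A common alternative is to read the credentials from environment variables. The sketch below shows one way to build the same settings hash. The variable names SMTP_USERNAME and SMTP_PASSWORD and the helper smtp_settings_from_env are assumptions chosen for this example.

```ruby
# Builds the SMTP settings hash from environment variables.
# SMTP_USERNAME and SMTP_PASSWORD are hypothetical variable names;
# fetch raises KeyError if either one is missing, so a misconfiguration fails early.
def smtp_settings_from_env(env = ENV)
  {
    user_name: env.fetch('SMTP_USERNAME'),
    password: env.fetch('SMTP_PASSWORD'),
    domain: 'gmail.com',
    address: 'smtp.gmail.com',
    port: 587,
    authentication: :plain
  }
end

# Example with an explicit hash instead of the real environment.
settings = smtp_settings_from_env(
  'SMTP_USERNAME' => 'sample@gmail.com',
  'SMTP_PASSWORD' => 'secret'
)
puts settings[:user_name]
```

In the environment file this could be used as config.action_mailer.smtp_settings = smtp_settings_from_env, provided the helper is defined somewhere that is loaded before the configuration runs.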
Also update the view app/views/notification_mailer/send_csv.html.erb:
<h1>CSV has been processed, Thanks!</h1>
<p><%= @greeting %>, please check the attachment in this email.</p>
<p>Thanks!</p>
              
    