In today's tech world, chronicling one's journey can be as enlightening as the code we write. If you're like me, you might be striving for consistent growth and ways to effectively track your coding progress. My solution? Scripting an automated process to archive my Codewars katas for personal reflection and evolution.
The Motivation Behind Archiving
Coding on platforms like Codewars is more than just solving problems; it's a chronicle of our growth, understanding, and the evolution of our problem-solving abilities. But with so many challenges available, it can sometimes feel like our milestones become mere data points lost in a vast sea of solved challenges.
For many developers, platforms like GitHub are not just code repositories but are resumes, reflections, and records of our daily commitment to the craft. As such, I identified a twofold objective:
- Consolidation for Reflection: Having all my katas in one place for personal reference and review.
- Visibility of Growth: Showcasing daily coding efforts as consistent GitHub activity, ensuring every problem solved is a step visibly recorded in my coding chronicle.
A Respectful Approach to Automation
Before diving into the details, it's crucial to emphasize my respect for platform rules. Codewars has valid reasons for keeping solutions private, and I wholly align with those principles. The primary objective of this automation is to archive solutions privately for my personal reference and reflection, not to publicize or share them. These solutions are committed to a private GitHub repo, accessible only by me, thereby preserving the integrity of the Codewars community.
How Other Developers Can Benefit
The method is straightforward, and the benefits are manifold:
- Web Scraping for Data Extraction: Leveraging web scraping libraries, I programmatically navigate through my Codewars profile. Essential kata details like name, difficulty, my solution, and the date of resolution are extracted.
- Clean Structuring: Post-extraction, the katas are formatted consistently. This ensures that when I or any developer looks back, the information is easily digestible.
- Automated GitHub Activity: The daily efforts are reflected as GitHub commits in a private repository, emphasizing a daily coding habit. It's like a digital diary, only more technical.
- Scheduled Automation: For the true spirit of automation, I've scheduled the script to run at set intervals, ensuring my GitHub activity stays updated without manual intervention.
To fellow coders and tech enthusiasts, this approach is more than just a technical exercise. It's a testament to our journey, a personal commitment to growth, and a nod to the ever-evolving landscape of tech. By archiving our milestones, we're not just storing code; we're storing memories, lessons, and hours of dedicated effort.
Whether you decide to adopt a similar approach or have your unique way of chronicling your journey, the essence lies in consistent growth and reflection. Here's to many more lines of code and milestones in our shared journey of development!
Understanding the CodewarsKataScraper Automation
This script offers a streamlined solution for automating the extraction of solved katas from Codewars, storing them, and subsequently pushing them to a private GitHub repository.
1. Initial Setup and Constants
The constants defined at the top of the class give a structure to the program:
- API_ENDPOINT: The endpoint to fetch completed challenges (katas).
- LOCAL_REPO_PATH: The path to your local repository where the solutions are stored.
- GITHUB_REPO: The GitHub repository reference.
- STORED_KATAS_PATH: The path to the JSON file which stores all completed katas.
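For reference, here is a minimal sketch of how these constants might be declared at the top of the class. The values shown are placeholders for illustration, not the script's actual configuration:
class CodewarsKataScraper
  # Placeholder values; substitute your own username and paths
  API_ENDPOINT = 'https://www.codewars.com/api/v1/users/YOUR_USERNAME/code-challenges/completed'
  LOCAL_REPO_PATH = '/path/to/your/local/kata-archive'
  GITHUB_REPO = 'YOUR_USERNAME/codewars-katas'
  STORED_KATAS_PATH = "#{LOCAL_REPO_PATH}/completed_katas.json"
end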
2. Initializing and Preparing the Scraper
The initialize method sets up necessary configurations and loads environment variables required for logging into Codewars. Within it, the initialize_driver method ensures that the web driver for Selenium is properly set up, as Selenium is pivotal for the web scraping process.
def initialize
# Configuration setup and environment variable loading
@api_endpoint = 'API_ENDPOINT_URL'
@local_repo_path = 'YOUR_LOCAL_PATH'
# ... Other constants
initialize_driver # Setting up the Selenium web driver
end
def initialize_driver
# Assuming you're using the Selenium Webdriver with Chrome
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--headless') # For headless browsing
@driver = Selenium::WebDriver.for :chrome, options: options
end
3. Running the Scraper
The run method serves as the main driver function:
- It starts by logging into Codewars using login_to_codewars.
- Fetches completed katas via fetch_completed_katas.
- Compares them to existing katas to find any new ones.
- For new katas, it views and scrapes solutions.
- Stores the new katas to local storage.
- Finally, commits and pushes updates to the GitHub repository.
def run
login_to_codewars
katas = fetch_completed_katas
new_katas = katas - fetch_existing_kata
if new_katas.any?
new_katas.each do |kata|
scrape_kata_solutions(kata)
end
store_new_kata(new_katas)
commit_and_push_to_git
else
puts "No new katas found."
end
end
4. Web Interaction using Selenium
Selenium is a powerful tool for controlling a web browser programmatically and automating browser tasks. It works with all major browsers and operating systems, and offers bindings in several languages, including Python, Java, and Ruby. In the context of this script, here's why Selenium plays a pivotal role:
Dynamic Content Loading: Many modern websites, including Codewars, use AJAX and other techniques to load content dynamically. Selenium can interact with dynamic content, ensuring all necessary data is loaded and accessible.
Interactivity: Beyond just scraping static content, there might be a need to interact with page elements, like clicking on buttons or navigating through paginated content. Selenium provides the functionality to simulate these user interactions.
User Simulation: By mimicking a real user's interactions with a website, Selenium can navigate through login screens, pop-ups, and other elements that might be challenging for more basic scraping tools.
Consistency Across Browsers: While this script may be designed for a specific browser, Selenium's cross-browser compatibility ensures that adaptations for other browsers can be made with minimal adjustments.
For the Codewars kata scraping task, the ability to programmatically log in, navigate, and interact with the platform's interface is crucial, and Selenium offers the most reliable and efficient means to achieve this.
Several methods interact with web pages, like logging into Codewars and viewing kata solutions:
def login_to_codewars
@driver.get('https://www.codewars.com/users/sign_in')
email_input = @driver.find_element(id: 'user_email')
password_input = @driver.find_element(id: 'user_password')
sign_in_button = @driver.find_element(class: 'btn')
email_input.send_keys(@codewars_username)
password_input.send_keys(@codewars_password)
sign_in_button.click
# Wait for page to load completely
wait = Selenium::WebDriver::Wait.new(timeout: @timeout)
wait.until { @driver.execute_script('return document.readyState') == 'complete' }
end
def view_kata_solutions(kata_data)
kata_id = kata_data['id']
languages = kata_data['completedLanguages']
languages.each do |language|
solutions_url = "https://www.codewars.com/kata/#{kata_id}/solutions/#{language}/me"
@driver.get(solutions_url)
begin
# Wait for a critical element to be visible, confirming page load
wait = Selenium::WebDriver::Wait.new(timeout: @timeout)
selector = '#shell_content'
wait.until { @driver.find_element(css: selector).displayed? }
rescue Selenium::WebDriver::Error::TimeoutError
puts 'Timed out waiting for code solutions container to load. Skipping this language.'
next
end
end
end
5. Data Extraction
The scrape_kata_solutions method is responsible for extracting the solution code and description for each completed kata. It navigates to each kata's solution page and fetches the required data:
def scrape_kata_solutions(kata_data)
solutions = []
# Iterate through completed languages for the kata
kata_data[:completed_languages].each do |language|
# Construct the URL for the kata solution page
solution_url = "https://www.codewars.com/kata/#{kata_data[:kata_id]}/solutions/#{language}/me"
# Navigate to the solution page
@driver.get(solution_url)
# Wait for the page to load (you may need to implement this)
wait_for_page_to_load
# Scrape code content and description
code_content = @driver.find_element(:css, 'div.code').text
description_content = @driver.find_element(:css, 'div.description').text
# Store the scraped data
solutions << {
language: language,
code: code_content,
description: description_content
}
end
solutions
end
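The wait_for_page_to_load call above is referenced but not shown in this excerpt. A minimal sketch of what such a helper could look like, reusing the readyState check from the login step and the same @timeout value:
def wait_for_page_to_load
  # Block until the browser reports the document as fully loaded
  wait = Selenium::WebDriver::Wait.new(timeout: @timeout)
  wait.until { @driver.execute_script('return document.readyState') == 'complete' }
end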
6. Storing and Handling Data
There are helper methods to interact with local storage, like reading existing kata data or writing new data:
def fetch_existing_kata
  # Data is stored in a JSON file; return an empty list if it doesn't exist yet
  return [] unless File.exist?(@stored_katas_path)
  JSON.parse(File.read(@stored_katas_path))
end
def store_new_kata(katas)
  # Merge the newly scraped katas with those already on disk so earlier
  # entries aren't overwritten, then write the combined list back to the JSON file
  combined = fetch_existing_kata + katas
  File.open(@stored_katas_path, 'w') do |file|
    file.write(JSON.generate(combined))
  end
end
7. Git Operations
The last step of the process involves Git operations to update your local repository and then push the changes to GitHub:
def commit_and_push_to_git
Dir.chdir(@local_repo_path) do
`git add .`
`git commit -m "Added new katas"`
`git push origin master`
end
end
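Backticks are fine for a simple pipeline, but they swallow failures silently. If you'd rather the script stop loudly when a Git command fails, one hedged variant (not the original implementation) uses system and checks each exit status:
def commit_and_push_to_git
  Dir.chdir(@local_repo_path) do
    # system returns false or nil when a command fails, so raise on the first error
    system('git', 'add', '.') or raise 'git add failed'
    system('git', 'commit', '-m', 'Added new katas') or raise 'git commit failed'
    system('git', 'push', 'origin', 'master') or raise 'git push failed'
  end
end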
8. Execution
Finally, an instance of the scraper class is created, and the run method is called, setting the entire process in motion:
scraper = CodewarsKataScraper.new
scraper.run
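If the class ever gets required from another file (for tests, for example), a common Ruby idiom is to guard this invocation so the scraper only runs when the file is executed directly. This is an optional refinement rather than part of the original script:
# Only start the scraper when this file is run directly, e.g. `ruby codewars_kata_scraper.rb`
if __FILE__ == $PROGRAM_NAME
  scraper = CodewarsKataScraper.new
  scraper.run
end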
9. Automating Script Execution with Cron
Once the script is set up and ready, it's essential to have it run autonomously at specified intervals. This ensures that our archival process remains consistent without manual intervention. One of the tried and true methods for task automation in Unix-like systems is using Cron.
Setting Up a Cron Job
A. Access the Cron Table: Open your terminal and type:
crontab -e
B. Add the Job: At the end of the file, specify when you want the task to run and what command to execute. For instance, to run the script daily at noon, you'd add:
0 12 * * * /usr/bin/ruby /path/to/script.rb
Schedule Breakdown:
- Minute: 0 (Exact minute)
- Hour: 12 (Noon)
- Day of Month: * (Any day)
- Month: * (Any month)
- Day of Week: * (Any day)
C. Save and Close: Depending on the editor you're using, the process will vary. If using nano, for instance, save with CTRL + O and exit with CTRL + X.
D. Verify the Job: To confirm that the job has been set up correctly, type:
crontab -l
Important Notes
- Cron uses the system's timezone. If you're working across different time zones, make sure to adjust your cron timing accordingly.
- Ensure that your script has the necessary permissions to execute. You might need to use chmod +x yourscript.rb to grant it executable permissions.
- Test your Cron job to ensure it runs successfully and that there aren't any environment-specific issues that might prevent the script from running (see the logging example below).
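Because Cron runs jobs with a minimal environment, redirecting the script's output to a log file makes troubleshooting much easier. For example (the paths here are placeholders):
0 12 * * * cd /path/to/repo && /usr/bin/ruby /path/to/script.rb >> /path/to/scraper.log 2>&1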
By setting up this Cron job, you've ensured that your Codewars kata archival process will run automatically, keeping your GitHub repository up to date with your latest achievements.
Concluding Thoughts
Integrating this kind of automation not only aids in tracking personal progress but also showcases an adept use of diverse tools and libraries. Remember, every line of code you write is a testament to your dedication and growth. By archiving and reflecting, you're paving the way for future achievements. Happy coding!
If you're interested in trying out the script, visit the repo on GitHub: https://github.com/jmayheww/code-wars-ruby-scraping-script