In today's tech world, chronicling one's journey can be as enlightening as the code we write. If you're like me, you might be striving for consistent growth and ways to effectively track your coding progress. My solution? Scripting an automated process to archive my Codewars katas for personal reflection and evolution.
The Motivation Behind Archiving
Coding on platforms like Codewars is more than just solving problems; it's a chronicle of our growth, understanding, and the evolution of our problem-solving abilities. But with so many challenges available, it can sometimes feel like our milestones become mere data points lost in a vast sea of solved challenges.
For many developers, platforms like GitHub are not just code repositories but are resumes, reflections, and records of our daily commitment to the craft. As such, I identified a twofold objective:
- Consolidation for Reflection: Having all my katas in one place for personal reference and review.
- Visibility of Growth: Showcasing daily coding efforts as consistent GitHub activity, ensuring every problem solved is a step visibly recorded in my coding chronicle.
A Respectful Approach to Automation
Before diving into the details, it's crucial to emphasize my respect for platform rules. Codewars has valid reasons for keeping solutions private, and I wholly align with those principles. The primary objective of this automation is to archive solutions privately for my personal reference and reflection, not to publicize or share them. These solutions are committed to a private GitHub repo, accessible only by me, thereby preserving the integrity of the Codewars community.
How Other Developers Can Benefit
The method is straightforward, and the benefits are manifold:
- Web Scraping for Data Extraction: Leveraging web scraping libraries, I programmatically navigate through my Codewars profile. Essential kata details like name, difficulty, my solution, and the date of resolution are extracted.
- Clean Structuring: Post-extraction, the katas are formatted consistently. This ensures that when I or any developer looks back, the information is easily digestible.
- Automated GitHub Activity: The daily efforts are reflected as GitHub commits in a private repository, emphasizing a daily coding habit. It's like a digital diary, only more technical.
- Scheduled Automation: For the true spirit of automation, I've scheduled the script to run at set intervals, ensuring my GitHub activity stays updated without manual intervention.
To fellow coders and tech enthusiasts, this approach is more than just a technical exercise. It's a testament to our journey, a personal commitment to growth, and a nod to the ever-evolving landscape of tech. By archiving our milestones, we're not just storing code; we're storing memories, lessons, and hours of dedicated effort.
Whether you decide to adopt a similar approach or have your unique way of chronicling your journey, the essence lies in consistent growth and reflection. Here's to many more lines of code and milestones in our shared journey of development!
Understanding the CodewarsKataScraper Automation
This script offers a streamlined solution for automating the extraction of solved katas from Codewars, storing them, and subsequently pushing them to a private GitHub repository.
1. Initial Setup and Constants
The constants defined at the top of the class give a structure to the program:
- API_ENDPOINT: The endpoint to fetch completed challenges (katas).
- LOCAL_REPO_PATH: The path to your local repository where the solutions are stored.
- GITHUB_REPO: The GitHub repository reference.
- STORED_KATAS_PATH: The path to the JSON file which stores all completed katas.
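For reference, here is a minimal sketch of how these constants might be declared at the top of the class. The values shown are placeholders for illustration, not the script's actual configuration:
class CodewarsKataScraper
  # Placeholder values; substitute your own username and paths
  API_ENDPOINT = 'https://www.codewars.com/api/v1/users/YOUR_USERNAME/code-challenges/completed'
  LOCAL_REPO_PATH = '/path/to/your/local/kata-archive'
  GITHUB_REPO = 'YOUR_USERNAME/codewars-katas'
  STORED_KATAS_PATH = "#{LOCAL_REPO_PATH}/completed_katas.json"
end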
2. Initializing and Preparing the Scraper
The initialize method sets up necessary configurations and loads environment variables required for logging into Codewars. Within it, the initialize_driver method ensures that the web driver for Selenium is properly set up, as Selenium is pivotal for the web scraping process.
def initialize
# Configuration setup and environment variable loading
@api_endpoint = 'API_ENDPOINT_URL'
@local_repo_path = 'YOUR_LOCAL_PATH'
# ... Other constants
initialize_driver # Setting up the Selenium web driver
end
def initialize_driver
# Assuming you're using the Selenium Webdriver with Chrome
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--headless') # For headless browsing
@driver = Selenium::WebDriver.for :chrome, options: options
end
3. Running the Scraper
The run method serves as the main driver function:
- It starts by logging into Codewars using login_to_codewars.
- Fetches completed katas via fetch_completed_katas.
- Compares them to existing katas to find any new ones.
- For new katas, it views and scrapes solutions.
- Stores the new katas to local storage.
- Finally, commits and pushes updates to the GitHub repository.
def run
login_to_codewars
katas = fetch_completed_katas
new_katas = katas - fetch_existing_kata
if new_katas.any?
new_katas.each do |kata|
scrape_kata_solutions(kata)
end
store_new_kata(new_katas)
commit_and_push_to_git
else
puts "No new katas found."
end
end
4. Web Interaction using Selenium
Selenium is a powerful tool for controlling a web browser programmatically and automating browser tasks. It works with all major browsers and operating systems, and offers bindings in several languages, including Python, Java, and Ruby. In the context of this script, here's why Selenium plays a pivotal role:
Dynamic Content Loading: Many modern websites, including Codewars, use AJAX and other techniques to load content dynamically. Selenium can interact with dynamic content, ensuring all necessary data is loaded and accessible.
Interactivity: Beyond just scraping static content, there might be a need to interact with page elements, like clicking on buttons or navigating through paginated content. Selenium provides the functionality to simulate these user interactions.
User Simulation: By mimicking a real user's interactions with a website, Selenium can navigate through login screens, pop-ups, and other elements that might be challenging for more basic scraping tools.
Consistency Across Browsers: While this script may be designed for a specific browser, Selenium's cross-browser compatibility ensures that adaptations for other browsers can be made with minimal adjustments.
For the Codewars kata scraping task, the ability to programmatically log in, navigate, and interact with the platform's interface is crucial, and Selenium offers the most reliable and efficient means to achieve this.
Several methods interact with web pages, like logging into Codewars and viewing kata solutions:
def login_to_codewars
@driver.get('https://www.codewars.com/users/sign_in')
email_input = @driver.find_element(id: 'user_email')
password_input = @driver.find_element(id: 'user_password')
sign_in_button = @driver.find_element(class: 'btn')
email_input.send_keys(@codewars_username)
password_input.send_keys(@codewars_password)
sign_in_button.click
# Wait for page to load completely
wait = Selenium::WebDriver::Wait.new(timeout: @timeout)
wait.until { @driver.execute_script('return document.readyState') == 'complete' }
end
def view_kata_solutions(kata_data)
kata_id = kata_data['id']
languages = kata_data['completedLanguages']
languages.each do |language|
solutions_url = "https://www.codewars.com/kata/#{kata_id}/solutions/#{language}/me"
@driver.get(solutions_url)
begin
# Wait for a critical element to be visible, confirming page load
wait = Selenium::WebDriver::Wait.new(timeout: @timeout)
selector = '#shell_content'
wait.until { @driver.find_element(css: selector).displayed? }
rescue Selenium::WebDriver::Error::TimeoutError
puts 'Timed out waiting for code solutions container to load. Skipping this language.'
next
end
end
end
5. Data Extraction
The scrape_kata_solutions method is responsible for extracting the solution code and description for each completed kata. It navigates to each kata's solution page and fetches the required data:
def scrape_kata_solutions(kata_data)
solutions = []
# Iterate through completed languages for the kata
kata_data[:completed_languages].each do |language|
# Construct the URL for the kata solution page
solution_url = "https://www.codewars.com/kata/#{kata_data[:kata_id]}/solutions/#{language}/me"
# Navigate to the solution page
@driver.get(solution_url)
# Wait for the page to load (you may need to implement this)
wait_for_page_to_load
# Scrape code content and description
code_content = @driver.find_element(:css, 'div.code').text
description_content = @driver.find_element(:css, 'div.description').text
# Store the scraped data
solutions << {
language: language,
code: code_content,
description: description_content
}
end
solutions
end
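The wait_for_page_to_load call above is referenced but not shown in this excerpt. A minimal sketch of what such a helper could look like, reusing the readyState check from the login step and the same @timeout value:
def wait_for_page_to_load
  # Block until the browser reports the document as fully loaded
  wait = Selenium::WebDriver::Wait.new(timeout: @timeout)
  wait.until { @driver.execute_script('return document.readyState') == 'complete' }
end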
6. Storing and Handling Data
There are helper methods to interact with local storage, like reading existing kata data or writing new data:
def fetch_existing_kata
  # Data is stored in a JSON file; return an empty list if it doesn't exist yet
  return [] unless File.exist?(@stored_katas_path)
  JSON.parse(File.read(@stored_katas_path))
end
def store_new_kata(katas)
  # Merge the newly scraped katas with those already on disk so earlier
  # entries aren't overwritten, then write the combined list back to the JSON file
  combined = fetch_existing_kata + katas
  File.open(@stored_katas_path, 'w') do |file|
    file.write(JSON.generate(combined))
  end
end
7. Git Operations
The last step of the process involves Git operations to update your local repository and then push the changes to GitHub:
def commit_and_push_to_git
Dir.chdir(@local_repo_path) do
`git add .`
`git commit -m "Added new katas"`
`git push origin master`
end
end
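Backticks are fine for a simple pipeline, but they swallow failures silently. If you'd rather the script stop loudly when a Git command fails, one hedged variant (not the original implementation) uses system and checks each exit status:
def commit_and_push_to_git
  Dir.chdir(@local_repo_path) do
    # system returns false or nil when a command fails, so raise on the first error
    system('git', 'add', '.') or raise 'git add failed'
    system('git', 'commit', '-m', 'Added new katas') or raise 'git commit failed'
    system('git', 'push', 'origin', 'master') or raise 'git push failed'
  end
end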
8. Execution
Finally, an instance of the scraper class is created, and the run method is called, setting the entire process in motion:
scraper = CodewarsKataScraper.new
scraper.run
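If the class ever gets required from another file (for tests, for example), a common Ruby idiom is to guard this invocation so the scraper only runs when the file is executed directly. This is an optional refinement rather than part of the original script:
# Only start the scraper when this file is run directly, e.g. `ruby codewars_kata_scraper.rb`
if __FILE__ == $PROGRAM_NAME
  scraper = CodewarsKataScraper.new
  scraper.run
end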
9. Automating Script Execution with Cron
Once the script is set up and ready, it's essential to have it run autonomously at specified intervals. This ensures that our archival process remains consistent without manual intervention. One of the tried and true methods for task automation in Unix-like systems is using Cron.
Setting Up a Cron Job
A. Access the Cron Table: Open your terminal and type:
crontab -e
B. Add the Job: At the end of the file, specify when you want the task to run and what command to execute. For instance, to run the script daily at noon, you'd add:
0 12 * * * /usr/bin/ruby /path/to/script.rb
Schedule Breakdown:
- Minute: 0 (Exact minute)
- Hour: 12 (Noon)
- Day of Month: * (Any day)
- Month: * (Any month)
- Day of Week: * (Any day)
C. Save and Close: Depending on the editor you're using, the process will vary. If using nano, for instance, save with CTRL + O and exit with CTRL + X.
D. Verify the Job: To confirm that the job has been set up correctly, type:
crontab -l
Important Notes
- Cron uses the system's timezone. If you're working across different time zones, make sure to adjust your cron timing accordingly.
- Ensure that your script has the necessary permissions to execute. You might need to use chmod +x yourscript.rb to grant it executable permissions.
- Test your Cron job to ensure it runs successfully and that there aren't any environment-specific issues that might prevent the script from running (see the logging example below).
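Because Cron runs jobs with a minimal environment, redirecting the script's output to a log file makes troubleshooting much easier. For example (the paths here are placeholders):
0 12 * * * cd /path/to/repo && /usr/bin/ruby /path/to/script.rb >> /path/to/scraper.log 2>&1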
By setting up this Cron job, you've ensured that your Codewars kata archival process will run automatically, keeping your GitHub repository up to date with your latest achievements.
Concluding Thoughts
Integrating this kind of automation not only aids in tracking personal progress but also showcases an adept use of diverse tools and libraries. Remember, every line of code you write is a testament to your dedication and growth. By archiving and reflecting, you're paving the way for future achievements. Happy coding!
If you're interested in trying out the script, visit the repo on GitHub: https://github.com/jmayheww/code-wars-ruby-scraping-script