<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sam Gould</title>
    <description>The latest articles on DEV Community by Sam Gould (@samg7b9).</description>
    <link>https://dev.to/samg7b9</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F970742%2Fa2c0fa76-cd46-4726-ae39-c112388a1f90.png</url>
      <title>DEV Community: Sam Gould</title>
      <link>https://dev.to/samg7b9</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/samg7b9"/>
    <language>en</language>
    <item>
      <title>Web Development for Data Scientists: Core functionality and a DevSecOps foundation</title>
      <dc:creator>Sam Gould</dc:creator>
      <pubDate>Wed, 23 Nov 2022 15:42:10 +0000</pubDate>
      <link>https://dev.to/samg7b9/web-development-for-data-scientists-core-functionality-and-a-devsecops-foundation-2fem</link>
      <guid>https://dev.to/samg7b9/web-development-for-data-scientists-core-functionality-and-a-devsecops-foundation-2fem</guid>
      <description>&lt;p&gt;In &lt;a href="https://samgould.net/index.php/2022/11/02/introduction-to-web-development-for-data-scientists/"&gt;my last post&lt;/a&gt; I outlined a number of different approaches for setting up a simple website architecture. I mentioned that websites are typically hosted on Cloud VM/VPS or managed hosting solutions, and recommended some follow-up actions which take a Hello World site to a more robust production level. In this post &lt;strong&gt;I'm going to expand on some of these follow-up actions with a focus on bolstering our DevSecOps so that we have a solid foundation from which to build&lt;/strong&gt;. Some points will be specific to WordPress sites, but most of the DevSecOps applies in general (to Linux servers). Before I get to DevSecOps however, I'm going to quickly cover my overall approach to task management and implementation of some core site elements.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I manage my web development tasks?
&lt;/h3&gt;

&lt;p&gt;Personally, I find it important to maintain structured notes and TODO lists for my projects (not just web dev). I am using the following structure which I created to manage my web development tasks:&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Plugins&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Design&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Content&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integrations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DevOps&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Server management and security&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deployment, Plugins and Design
&lt;/h3&gt;

&lt;p&gt;What is quite nice about this task management structure, besides being essentially &lt;a href="https://en.wikipedia.org/wiki/MECE_principle"&gt;MECE&lt;/a&gt; and easy to understand, is that at the beginning it can roughly be followed in order. In the first post, we covered Deployment of the Hello World site (in particular I used DigitalOcean's 1-click WordPress launcher to set up a Droplet VM which hosts samgould.net). The next step for me was to install a WordPress theme (which is &lt;a href="https://www.wpbeginner.com/beginners-guide/how-to-install-a-wordpress-theme/"&gt;very easy&lt;/a&gt;), essentially a Plugin, and then to configure the Design options. While not the focus of this article, the basic actions I would cover here are (all found under Appearance &amp;gt; Customise in WordPress):&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Define a tagline&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upload a 512x512 favicon&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upload a logo (150x150 suggested but you can also use a banner)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pick a nice colour and font scheme&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define a sitewide header and footer (NB: it's possible to remove the theme copyright text by going into your site html root directory, which is &lt;code&gt;/var/www/your-site/&lt;/code&gt; where $your-site may depend on your Apache virtual host config, and then going to the relevant theme file, which for me is &lt;code&gt;wp-content/themes/theme-name/includes/template-tags.php&lt;/code&gt;, and editing the php code; however I'm not 100% sure if this is permitted by copyright and/or the theme license (GPL v3 for me) so I decided to be a good citizen and leave it in place. It is interesting to know that this is where the processing happens under-the-hood, though.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a public nickname for the admin user, to be used in the default WordPress template ("content posted by X"); or it &lt;a href="https://www.wpbeginner.com/wp-themes/how-to-remove-author-name-from-wordpress-posts/"&gt;can be removed completely&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set a site timezone (under Settings &amp;gt; General).&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Content
&lt;/h3&gt;

&lt;p&gt;You would probably then want to add some basic content to your site! I added stuff like:&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Basic homepage&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;About Me / Bio page with links to socials&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Core functionality - a blog and list of freelance services for me; you might want an e-commerce store or something else&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another important piece of functionality is enabling people to contact you. Of course it is easy to write your email address or link to other socials, but it looks quite professional to be able to &lt;strong&gt;receive email to you@yourdomain&lt;/strong&gt;. I discovered that, while it is easy to receive emails like this (very simple to &lt;a href="https://www.epik.com/support/how-to-use-email-forwarding/"&gt;set up via your domain registrar&lt;/a&gt;), sending emails requires an SMTP server. This is commonly achieved via &lt;a href="https://superuser.com/questions/208485/sending-email-from-domain-name-address"&gt;paid&lt;/a&gt; or &lt;a href="https://www.reddit.com/r/webhosting/comments/iu7ud0/sending_email_from_digital_ocean_and_linode/"&gt;free tier&lt;/a&gt; plugins but could also be &lt;a href="https://www.reddit.com/r/selfhosted/comments/rfh15d/how_do_your_selfhosted_applications_send_emails/"&gt;self-hosted&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integrations
&lt;/h3&gt;

&lt;p&gt;The only work I have done here is beginning to automate the way I share my content across different socials. I have started posting to &lt;a href="https://dev.to/samg7b9"&gt;DEV.to&lt;/a&gt; and am using a &lt;a href="https://gist.github.com/samg7b5/2ebfc9e18cab74f5000919b9b76ae604"&gt;script I wrote which reformats WordPress markup&lt;/a&gt; so it is ready to post. This is somewhat tangential and should not be considered a necessary step to setting up your website.&lt;/p&gt;

&lt;h3&gt;
  
  
  DevOps
&lt;/h3&gt;

&lt;p&gt;In the context of this article, by DevOps what I really mean is: "&lt;strong&gt;how do I manage my code and other site assets in a way which is flexible for development and resilient to failure?&lt;/strong&gt;". In particular, this means version control and backups. I would also echo DigitalOcean's best practice advice and first &lt;a href="https://www.digitalocean.com/community/tutorials/initial-server-setup-with-ubuntu-20-04#step-2-creating-a-new-user"&gt;set up a non-root user&lt;/a&gt; for any work you do on the server. In the code below, we will call this user &lt;code&gt;non_root_user&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Version control
&lt;/h4&gt;

&lt;p&gt;At this stage, all you really need is a Git repo to hold relevant files from your site codebase. For my WordPress site, this is simply &lt;a href="https://stackoverflow.com/questions/54595787/using-wordpress-with-git-which-files-should-i-ignore/57414776#57414776"&gt;custom theme and plugin code&lt;/a&gt; (and at this stage I don't even have any custom plugins). As mentioned in the Design section above, the site root on your server may depend on your web server config; mine depends on &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-apache-virtual-hosts-on-ubuntu-20-04"&gt;Apache virtual hosts&lt;/a&gt; and can be confirmed by listing the sites in &lt;code&gt;/etc/apache2/sites-enabled&lt;/code&gt;. As a data scientist you (should) probably know how to set up a Git repo, but there were a few gotchas which I discovered along the way:&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;cd /var/www/your-site/&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;git init&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;gotcha #1&lt;/strong&gt;: &lt;code&gt;sudo chown -R non_root_user /var/www/your-site/&lt;/code&gt; (explanation &lt;a href="https://stackoverflow.com/questions/72978485/git-submodule-update-failed-with-fatal-detected-dubious-ownership-in-repositor"&gt;here&lt;/a&gt;)&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;gotcha #2&lt;/strong&gt;: if you try to interact with the filesystem from outside the server (e.g. upload media files via web UI) then you will now get errors because the default &lt;code&gt;www-data&lt;/code&gt; worker no longer owns the files. After you finish working with the git repo you can reset ownership with &lt;code&gt;sudo chown -R www-data /var/www/your-site/&lt;/code&gt;. &lt;em&gt;I'm not sure if there is a better way to handle this…&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;gotcha #3&lt;/strong&gt;: &lt;code&gt;nano .gitignore&lt;/code&gt; (this is a gotcha in the sense that it is crucially important to define, and save, the correct gitignore before the first commit. Do not commit passwords/secrets to your repo! In particular &lt;code&gt;wp-config.php&lt;/code&gt; must be excluded. I used &lt;a href="https://salferrarello.com/wordpress-gitignore/"&gt;Sal Ferrarello's "surgical" .gitignore&lt;/a&gt; as my starting point.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;git add .&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;git commit -m "First commit"&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a repo, e.g. on github, and do &lt;code&gt;git remote add origin https://github.com/your-github-username/your-repo.git&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;git branch -M main&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;git push origin main&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;gotcha #4&lt;/strong&gt;: although the command output says it requires a password, &lt;a href="https://stackoverflow.com/questions/68775869/message-support-for-password-authentication-was-removed-please-use-a-personal"&gt;it actually needs&lt;/a&gt; a &lt;a href="https://github.com/settings/tokens"&gt;Personal Access Token&lt;/a&gt;.&lt;br&gt;
&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
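
&lt;p&gt;Gotcha #3 is easy to get wrong, so it can help to check which files are about to be committed. The following is a minimal Python sketch and not part of my original workflow: the pattern list and the function name &lt;code&gt;find_sensitive&lt;/code&gt; are assumptions for illustration, and it only inspects a list of paths (for example the output of &lt;code&gt;git ls-files&lt;/code&gt;).&lt;/p&gt;

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical patterns for files that should never be committed.
# Extend this list to match your own site.
SENSITIVE_PATTERNS = ["wp-config.php", ".env", "*.sql", "*.pem"]


def find_sensitive(paths):
    """Return the paths whose file name matches a sensitive pattern."""
    flagged = []
    for path in paths:
        name = PurePosixPath(path).name
        if any(fnmatch(name, pattern) for pattern in SENSITIVE_PATTERNS):
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    # Example input; in practice this could come from "git ls-files".
    tracked = ["wp-config.php", "wp-content/themes/theme-name/style.css"]
    for path in find_sensitive(tracked):
        print(f"WARNING: {path} looks sensitive, add it to .gitignore")
```

&lt;p&gt;Anything the function returns should probably be added to &lt;code&gt;.gitignore&lt;/code&gt; before running &lt;code&gt;git add .&lt;/code&gt;.&lt;/p&gt;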

&lt;p&gt;Congrats, you have now synced your mutable (WordPress) site files to a Git repo. But this is not every component of the site - we also need to create a backup of our content and config. For that we will use phpMyAdmin (although &lt;a href="https://wordpress.org/support/article/backing-up-your-database/"&gt;there are multiple ways&lt;/a&gt; to do this).&lt;/p&gt;

&lt;h4&gt;
  
  
  Database backup
&lt;/h4&gt;

&lt;h4&gt;What is phpMyAdmin?&lt;/h4&gt;

&lt;p&gt;As WordPress &lt;a href="https://wordpress.org/support/article/phpmyadmin/"&gt;puts it&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;"An administrator’s tool of sorts, phpMyAdmin is a PHP script meant for giving users the ability to interact with their MySQL databases. WordPress stores all of its information in the MySQL database and interacts with the database to generate information within your WordPress site. A “raw” view of the data, tables and fields stored in the MySQL database is accessible through phpMyAdmin."&lt;/p&gt;

&lt;h4&gt;How to use phpMyAdmin to back up a WordPress site&lt;/h4&gt;

&lt;p&gt;Step 1: &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-install-and-secure-phpmyadmin-on-ubuntu-20-04"&gt;install phpMyAdmin&lt;/a&gt;: &lt;code&gt;sudo apt install phpmyadmin&lt;/code&gt;&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;&lt;p&gt;With an Apache server, you may need to run these additional commands: &lt;code&gt;sudo ln -s /etc/phpmyadmin/apache.conf /etc/apache2/conf-available/phpmyadmin.conf &amp;amp;&amp;amp; sudo a2enconf phpmyadmin.conf &amp;amp;&amp;amp; sudo service apache2 reload&lt;/code&gt;. In my case, only the final &lt;code&gt;apache2 reload&lt;/code&gt; was needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;With an Nginx server, you may need to run these additional commands: &lt;code&gt;sudo ln -s /usr/share/phpmyadmin /var/www/your-site/phpmyadmin &amp;amp;&amp;amp; sudo nginx -s reload&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;phpMyAdmin can then be accessed via &lt;em&gt;&lt;a href="https://yourdomain/phpmyadmin"&gt;https://yourdomain/phpmyadmin&lt;/a&gt;&lt;/em&gt; (NB: see the 'Server management and security' section below for instructions on enabling https, or get it automatically from the DigitalOcean 1-click installer). You can log in with the MySQL root user created during the DigitalOcean 1-click deployment when our LAMP stack was configured (see the &lt;a href="https://samgould.net/index.php/2022/11/02/introduction-to-web-development-for-data-scientists/"&gt;first post&lt;/a&gt;), or &lt;a href="https://devanswers.co/phpmyadmin-access-denied-for-user-root-localhost/"&gt;create a new database user with appropriate permissions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Step 2: run a simple backup using phpMyAdmin. This is very simple to do - instructions can be found &lt;a href="https://wordpress.org/support/article/backing-up-your-database/#using-phpmyadmin"&gt;here&lt;/a&gt;.&lt;/p&gt;
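
&lt;p&gt;If you would rather script the export than click through the phpMyAdmin UI, one alternative is the &lt;code&gt;mysqldump&lt;/code&gt; command-line tool. The sketch below is an assumption-laden illustration rather than the method I used: the database name &lt;code&gt;wordpress&lt;/code&gt;, the user &lt;code&gt;backup_user&lt;/code&gt; and the helper &lt;code&gt;build_dump_command&lt;/code&gt; are all placeholders, and the script only prints the command unless you uncomment the last line.&lt;/p&gt;

```python
import subprocess
from datetime import datetime, timezone


def build_dump_command(database, user, out_dir="."):
    """Build a mysqldump command that writes a timestamped .sql file."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_file = f"{out_dir}/{database}-{stamp}.sql"
    return [
        "mysqldump",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--user={user}",
        "--password",  # no value given, so mysqldump prompts for it
        f"--result-file={out_file}",
        database,
    ]


if __name__ == "__main__":
    # "wordpress" and "backup_user" are placeholder names.
    command = build_dump_command("wordpress", "backup_user")
    print(" ".join(command))
    # Uncomment to run the dump on a server where mysqldump is installed:
    # subprocess.run(command, check=True)
```

&lt;p&gt;The resulting &lt;code&gt;.sql&lt;/code&gt; file contains your site content, so it should be stored somewhere private rather than in the Git repo.&lt;/p&gt;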

&lt;p&gt;Congrats, you have now backed up pretty much everything you need to be able to restore your (WordPress) site in case something breaks in production!&lt;/p&gt;

&lt;h4&gt;
  
  
  What about CI/CD?
&lt;/h4&gt;

&lt;p&gt;A pattern for deploying different codebase versions is &lt;a href="https://www.redhat.com/en/topics/devops/what-is-blue-green-deployment"&gt;blue/green deployment&lt;/a&gt;. This is possible to do, for example &lt;a href="https://techannotation.wordpress.com/2020/01/28/blue-green-deployment-with-apache-web-server/"&gt;using Apache virtual hosts&lt;/a&gt;, but in my opinion is overkill at this stage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Server management and security
&lt;/h3&gt;

&lt;p&gt;One of the first, most important steps to securing your server is &lt;strong&gt;implementing a firewall to block unwanted traffic&lt;/strong&gt;. The simplest configuration tool is the &lt;a href="https://wiki.archlinux.org/title/Uncomplicated_Firewall"&gt;Uncomplicated Firewall (UFW)&lt;/a&gt; and the suggested configuration is to allow only SSH (port 22, rate limited), HTTP (port 80), and HTTPS (port 443) access.&lt;/p&gt;
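
&lt;p&gt;As a rough sketch, the suggested rules can be written down as data and applied from a small Python script. The default deny/allow policies and the helper &lt;code&gt;apply_rules&lt;/code&gt; are my assumptions for illustration; by default the script performs a dry run and only prints the &lt;code&gt;ufw&lt;/code&gt; commands it would execute.&lt;/p&gt;

```python
import subprocess

# Rules matching the configuration described above, plus assumed
# default policies (deny incoming, allow outgoing).
UFW_RULES = [
    ["ufw", "default", "deny", "incoming"],
    ["ufw", "default", "allow", "outgoing"],
    ["ufw", "limit", "22/tcp"],   # SSH, rate limited
    ["ufw", "allow", "80/tcp"],   # HTTP
    ["ufw", "allow", "443/tcp"],  # HTTPS
    ["ufw", "enable"],
]


def apply_rules(rules, dry_run=True):
    """Print each command and run it with sudo unless dry_run is set.

    Returns the list of command strings that were printed.
    """
    commands = []
    for rule in rules:
        command = ["sudo", *rule]
        text = " ".join(command)
        print(text)
        commands.append(text)
        if not dry_run:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    apply_rules(UFW_RULES)  # dry run: prints the commands only
```

&lt;p&gt;Make sure the SSH rule is in place before enabling the firewall over an SSH session, otherwise you may lock yourself out.&lt;/p&gt;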

&lt;p&gt;With a firewall (i.e. with appropriately exposed ports), you can &lt;strong&gt;further secure your website by enabling encrypted connections over HTTPS (TLS)&lt;/strong&gt;. This protects user privacy and data, and has additional benefits such as preferential treatment by search engines. The simplest way to implement this protection is by using the &lt;a href="https://certbot.eff.org/instructions?ws=apache&amp;amp;os=ubuntufocal"&gt;Certbot tool&lt;/a&gt;, which handles the certification and renewal/reminder processes.&lt;/p&gt;

&lt;p&gt;We can block further unwanted traffic by implementing &lt;strong&gt;brute-force and DDoS mitigation&lt;/strong&gt;. The DigitalOcean 1-click installer for WordPress uses two layers, namely &lt;a href="https://wiki.archlinux.org/title/fail2ban"&gt;fail2ban&lt;/a&gt;, which bans IP addresses after repeated failed login attempts (should be implemented on all architectures), and &lt;a href="https://kinsta.com/blog/xmlrpc-php/"&gt;disabling XML-RPC&lt;/a&gt; (WordPress-specific).&lt;/p&gt;

&lt;p&gt;Finally, we want to make sure that our &lt;strong&gt;server stays updated to mitigate any potential exploits&lt;/strong&gt;. When using a Cloud VM/VPS, your Cloud provider will most likely push messages into the SSH console so that upon login you can see if you need to run updates (&lt;code&gt;sudo apt update &amp;amp;&amp;amp; sudo apt upgrade&lt;/code&gt;). Since vulnerabilities should be patched in a timely manner, it is sometimes recommended to &lt;a href="https://www.cyberciti.biz/faq/set-up-automatic-unattended-updates-for-ubuntu-20-04/"&gt;run updates in an automated and unattended manner&lt;/a&gt; (i.e. as soon as they are available).&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;We have completed our first batch of activities from each section of my task management framework, taking us from a simple Hello World to a site which is version controlled, backed up and secured against attackers. Congratulations!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclaimer: I am not a security professional and you should DYOR. You are responsible for the security of your server/website.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This content was originally posted on &lt;a href="https://samgould.net/index.php/2022/11/22/web-development-for-data-scientists-core-functionality-and-a-devsecops-foundation/"&gt;samgould.net&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>datascience</category>
      <category>devops</category>
    </item>
    <item>
      <title>Introduction to Web Development for Data Scientists</title>
      <dc:creator>Sam Gould</dc:creator>
      <pubDate>Thu, 17 Nov 2022 14:20:44 +0000</pubDate>
      <link>https://dev.to/samg7b9/introduction-to-web-development-for-data-scientists-737</link>
      <guid>https://dev.to/samg7b9/introduction-to-web-development-for-data-scientists-737</guid>
      <description>&lt;p&gt;You are a data scientist used to helping organisations answer their strategic questions using data and technology. When you are in engineering mode, you spend your time:&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;&lt;p&gt;building data pipelines in Python and SQL&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;modelling and analysing data in Python or R&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;using REST APIs and Linux containers for lightweight app deployments&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You now want to build a website. What are the right tools and approaches for getting started with web development?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the situation in which I recently found myself while building &lt;a href="//www.samgould.net/"&gt;samgould.net&lt;/a&gt;. There are lots of reasons to build a website:&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Startups/SaaS businesses need websites and web apps&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It gives you a self-owned platform for creativity and expression&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Web dev is a new topic to learn&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you are working as a data scientist then you already have a lot of the foundational technical skillset required for web development. But the modern proliferation of languages, frameworks and platforms can make getting started seem daunting. Here's how I did it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--s62_n2AE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rl984os9cydnicb88r8y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--s62_n2AE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rl984os9cydnicb88r8y.png" alt="Let's tackle our Hello World foe" width="854" height="480"&gt;&lt;/a&gt;Let's tackle our Hello World foe&lt;/p&gt;

&lt;h2&gt;
  
  
  How is a website structured?
&lt;/h2&gt;

&lt;p&gt;My experience in deploying machine learning MVPs gave me a rough mental model of a website's internals as a starting point. We will need a backend handling application/business logic, a frontend providing pages for a user to view, and some kind of API connecting the two. Incoming internet requests are pointed at our server host via a DNS lookup (mapping URL to IP). We will probably also need some layer handling incoming traffic (API/routing and traffic load balancing).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--r41TsKzQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xkp1kf92vpn2apyj5wvj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--r41TsKzQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xkp1kf92vpn2apyj5wvj.jpg" alt="A rough mental model of a website architecture" width="880" height="341"&gt;&lt;/a&gt;A rough mental model of a website architecture&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the main types of website?
&lt;/h3&gt;

&lt;p&gt;We can refine this picture by considering the use case. In a simple blog, the frontend user experience is the same for everyone: I request some content, like an article, and the website serves it to me; but in a more complex application, the experience might depend on the user and some data exchange with the server. The key distinction here is that of &lt;a href="https://about.gitlab.com/blog/2016/06/03/ssg-overview-gitlab-pages-part-1-dynamic-x-static/#a-static-vs-dynamic-website"&gt;&lt;strong&gt;static vs. dynamic websites&lt;/strong&gt;&lt;/a&gt;. A dynamic site typically uses static templates which it &lt;a href="https://developer.mozilla.org/en-US/docs/Learn/Server-side"&gt;dynamically populates based on client requests&lt;/a&gt;. An idea of our desired website functionality in these terms will be important for technology selection and architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the main web dev technologies?
&lt;/h3&gt;

&lt;p&gt;There are various ways to research this question. I looked at posts on popular developer sites such as Stack Overflow, Indie Hackers, DEV.to and Reddit:&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://survey.stackoverflow.co/2022/#section-most-loved-dreaded-and-wanted-web-frameworks-and-technologies"&gt;https://survey.stackoverflow.co/2022/#section-most-loved-dreaded-and-wanted-web-frameworks-and-technologies&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://insights.stackoverflow.com/survey/2021#most-loved-dreaded-and-wanted-webframe-love-dread"&gt;https://insights.stackoverflow.com/survey/2021#most-loved-dreaded-and-wanted-webframe-love-dread&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.indiehackers.com/post/whats-your-2021-tech-stack-for-web-apps-1c61893a52"&gt;https://www.indiehackers.com/post/whats-your-2021-tech-stack-for-web-apps-1c61893a52&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dev.to/bjakyt/which-technologies-would-you-choose-for-your-next-web-project-3a4h"&gt;https://dev.to/bjakyt/which-technologies-would-you-choose-for-your-next-web-project-3a4h&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.reddit.com/r/webdev/comments/rlbuwo/how_to_deploy_web_app/"&gt;https://www.reddit.com/r/webdev/comments/rlbuwo/how_to_deploy_web_app/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.reddit.com/r/webdev/comments/rlhalf/frameworks_for_creating_a_static_webpage/"&gt;https://www.reddit.com/r/webdev/comments/rlhalf/frameworks_for_creating_a_static_webpage/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.reddit.com/r/sveltejs/comments/rdwef6/sveltekit_stacks/"&gt;https://www.reddit.com/r/sveltejs/comments/rdwef6/sveltekit_stacks/&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A minimal static site is, at its core, HTML and CSS, with bits of JavaScript sprinkled in for dynamic functionality. However, as a Python-using data scientist, I am used to high-level frameworks doing the heavy lifting for me, and I don't fancy the sound of learning three new languages. But which one to pick? There is clearly a massive array of web dev frameworks out there. &lt;strong&gt;In any form of programming, there are many ways to skin a cat&lt;/strong&gt;. The most important thing is to pick something which works well enough for us to build out our use case. With a Python background, the frameworks which jump out to me are Django and Flask.&lt;/p&gt;

&lt;h2&gt;
  
  
  Python web frameworks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Django
&lt;/h3&gt;

&lt;p&gt;Django is "a high-level Python web framework that encourages rapid development and clean, pragmatic design". It is appealing as a "batteries-included" framework which handles a lot of boilerplate and functionality which is important but unfamiliar to a developer from a non-web background. It is considered "somewhat opinionated" and &lt;a href="https://developer.mozilla.org/en-US/docs/Learn/Server-side/Django/Introduction#what_does_django_code_look_like"&gt;utilises a 'model-view-template' design pattern&lt;/a&gt;. A model defines the structure of the application's data and how it is stored; a view is an HTTP request handler which queries models for the data it needs. Views use templates to format that data for display to the client.&lt;/p&gt;
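
&lt;p&gt;To make the model-view-template split concrete, here is a plain-Python analogy. It deliberately does not use Django's API: the names &lt;code&gt;Article&lt;/code&gt;, &lt;code&gt;ARTICLES&lt;/code&gt; and &lt;code&gt;article_view&lt;/code&gt; are hypothetical, the "database" is a dictionary, and the template is plain text to keep the sketch short.&lt;/p&gt;

```python
from dataclasses import dataclass
from string import Template


@dataclass
class Article:
    """Model: describes the data the site stores."""
    slug: str
    title: str
    body: str


# Stand-in for a database table.
ARTICLES = {
    "hello-world": Article("hello-world", "Hello World", "My first post."),
}

# Template: controls how the data is presented.
ARTICLE_TEMPLATE = Template("$title\n\n$body")


def article_view(slug):
    """View: handle a request, fetch the model and render the template."""
    article = ARTICLES.get(slug)
    if article is None:
        return 404, "Not found"
    page = ARTICLE_TEMPLATE.substitute(title=article.title, body=article.body)
    return 200, page


if __name__ == "__main__":
    print(article_view("hello-world"))
```

&lt;p&gt;In a real Django project the model would be a database-backed class, the view would receive an HTTP request object, and the template would be an HTML file, but the division of responsibilities is roughly the same.&lt;/p&gt;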

&lt;p&gt;The &lt;a href="https://docs.djangoproject.com/en/4.0/intro/tutorial01/"&gt;official docs provide the best place to begin&lt;/a&gt; with Django development (project setup etc.) but there are &lt;a href="https://djangocentral.com/building-a-blog-application-with-django/"&gt;alternative tutorials&lt;/a&gt; too. The docs &lt;a href="https://docs.djangoproject.com/en/4.0/howto/deployment/wsgi/"&gt;explain the concept of WSGI&lt;/a&gt; (the Web Server Gateway Interface - the standard which lets your Python code communicate with web requests) - Django's &lt;code&gt;startproject&lt;/code&gt; command sets up a minimal default WSGI configuration.&lt;/p&gt;
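
&lt;p&gt;WSGI itself is small enough to show directly. The following sketch uses only the standard library's &lt;code&gt;wsgiref&lt;/code&gt; module and is independent of Django; the greeting text, the &lt;code&gt;serve&lt;/code&gt; helper and port 8000 are arbitrary choices for illustration.&lt;/p&gt;

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A minimal WSGI callable: receives the request, returns the body."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello World from {path}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port=8000):
    """Run the app on a local development server (not for production)."""
    with make_server("127.0.0.1", port, application) as server:
        server.serve_forever()
```

&lt;p&gt;Calling &lt;code&gt;serve()&lt;/code&gt; and visiting &lt;code&gt;http://127.0.0.1:8000/&lt;/code&gt; should show the greeting. A framework such as Django provides a callable with this same interface, which is what lets any WSGI-compatible server run it.&lt;/p&gt;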

&lt;p&gt;Looking at our diagram, the other missing puzzle piece is hosting. &lt;a href="https://djangocentral.com/deploy-django-with-nginx-gunicorn-postgresql-and-lets-encrypt-ssl-on-ubuntu/"&gt;Django applications are typically&lt;/a&gt; served &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu-20-04"&gt;behind a web server&lt;/a&gt; (Nginx or Apache, usually with a WSGI server such as Gunicorn in between) which accepts incoming HTTP requests, and run on Cloud VMs, for example, in order to have a public IP address. It is possible, but not recommended, to use your own hardware as the server.&lt;/p&gt;

&lt;p&gt;With this new knowledge of Django, we can update our mental model:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Dq0afQh---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oywa0z9mjx4k3l9fa6ac.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Dq0afQh---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oywa0z9mjx4k3l9fa6ac.jpg" alt="High level architecture of a website using Django" width="880" height="325"&gt;&lt;/a&gt;High level architecture of a website using Django&lt;/p&gt;

&lt;p&gt;If you do choose to develop with Django, then consider exploring &lt;a href="https://djangopackages.org/categories/frameworks/"&gt;popular frameworks and toolkits&lt;/a&gt;. To stay on the pulse of the Django ecosystem, see &lt;a href="https://nemecek.be/"&gt;Filip Němeček&lt;/a&gt;'s &lt;a href="https://djangofeeds.com/"&gt;DjangoFeeds&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Flask
&lt;/h3&gt;

&lt;p&gt;Flask is an alternative Python library, often &lt;a href="https://hackr.io/blog/flask-vs-django"&gt;compared with Django&lt;/a&gt;: "a lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications". As can be seen from the &lt;a href="https://github.com/pallets/flask/"&gt;source code&lt;/a&gt;, it can spin up a Hello World API routing example in just a few lines of extremely simple, Pythonic code. I personally wanted to explore heavier features such as front-end admin panels without spending too much time on a learning curve, so did not delve deeper into Flask - one to revisit in future.&lt;/p&gt;

&lt;p&gt;Alternative microframeworks include &lt;a href="https://www.reddit.com/r/Python/comments/yo5zb3/microframework_recommendations/"&gt;FastAPI and Starlite&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Static Site Generators
&lt;/h2&gt;

&lt;p&gt;As we have seen, developing a website in Python is an extremely viable option. But suppose that you have a laser focus on a simple use case: a static blog site. In this scenario, a Static Site Generator (SSG) might be the more efficient option.&lt;/p&gt;

&lt;p&gt;Let's slightly redefine our mental model. A common pattern for developing a site which serves static content involves less two-way communication than in Django's MVT: content is created when the site is developed, not when the user requests it, so why not generate all of our site's HTML at build time too? We can push our content (typically a collection of Markdown files) into a CI/CD platform (e.g. GitHub), run an SSG to inject it into templates and spit out a bunch of static HTML pages, and then serve these to the user. This typically makes for an extremely fast website with minimal development overhead.&lt;/p&gt;
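
&lt;p&gt;The build-time idea can be sketched in a few lines of Python. This is a toy illustration, not how Hugo or any particular SSG is implemented: the template, the &lt;code&gt;build_site&lt;/code&gt; function and the content dictionary are hypothetical, and pages are written as plain text files where a real generator would render Markdown into HTML templates.&lt;/p&gt;

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical page template; a real SSG would use HTML templates.
PAGE_TEMPLATE = Template("$title\n======\n\n$body\n")


def build_site(content, out_dir):
    """Render every content item to a static file at build time."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, (title, body) in sorted(content.items()):
        page = PAGE_TEMPLATE.substitute(title=title, body=body)
        target = out_path / f"{slug}.txt"
        target.write_text(page, encoding="utf-8")
        written.append(target.name)
    return written


if __name__ == "__main__":
    posts = {
        "hello-world": ("Hello World", "My first post."),
        "about": ("About Me", "A short bio."),
    }
    # Build into a temporary directory so the example leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        print(build_site(posts, tmp))
```

&lt;p&gt;Because every page already exists on disk after the build, serving the site needs nothing more than a file server or CDN.&lt;/p&gt;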

&lt;h3&gt;
  
  
  Hugo
&lt;/h3&gt;

&lt;p&gt;A nice SSG is Hugo, which can be used to &lt;a href="https://kinsta.com/blog/hugo-static-site/#how-to-deploy-a-hugo-site"&gt;quickly set up a blog&lt;/a&gt;. It is trivial to then host these static files on a CDN hosting service like &lt;a href="https://kinsta.com/blog/hugo-static-site/#how-to-deploy-a-hugo-site"&gt;Netlify&lt;/a&gt; or &lt;a href="https://gohugo.io/hosting-and-deployment/hosting-on-github/"&gt;GitHub Pages&lt;/a&gt;, although &lt;a href="https://github.com/anzharip/Host-a-Static-Website-with-Hugo-and-NGINX-on-Ubuntu-16.04"&gt;deploying through an Nginx server&lt;/a&gt; is also &lt;a href="https://pvera.net/posts/create-site-nginx-hugo/"&gt;very doable&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Sq51V1PW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9d5qp5zxrqbyhiauip1b.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Sq51V1PW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9d5qp5zxrqbyhiauip1b.jpg" alt="High level architecture of a website using Hugo" width="880" height="231"&gt;&lt;/a&gt;High level architecture of a website using Hugo&lt;/p&gt;

&lt;p&gt;Hugo is not the only SSG; there are &lt;a href="https://jamstack.org/generators/"&gt;many others&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Content Management Systems
&lt;/h2&gt;

&lt;p&gt;With the previous two methods, we can successfully host our content on the internet. So what happens when our scope expands and we need to add new site functionality, like collaborative editing of front-end content?&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://jamstack.org/headless-cms/"&gt;Content Management System (CMS)&lt;/a&gt; can be used to create and edit a site's content. &lt;a href="https://www.themexpert.com/blog/static-site-generator-vs-cms"&gt;As with an SSG&lt;/a&gt;, this content is then injected into HTML templates to be served on the front-end. The difference, architecturally speaking, is that the CMS can be dynamically coupled with the front-end: pages can be assembled from the content database when a user requests them, rather than only at build time. From a development perspective, popular CMSes are feature-rich and have mature plugin ecosystems, so advanced site functionality such as e-commerce integration and user access roles is quicker to implement.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: &lt;a href="https://jamstack.org/what-is-jamstack/"&gt;Jamstack&lt;/a&gt; is the name of a web development architecture based on core principles around decoupling the front end web experience from the data and business layers, with a focus on delivery as static sites. Both SSGs and CMSes can be used in this way: a CMS can be used in 'headless' mode - i.e. backend only, requiring a separate presentation layer to handle design, site structure and templates. This CMS usage mode is &lt;a href="https://levelup.gitconnected.com/spa-ssg-ssr-and-jamstack-a-front-end-acronyms-guide-6add9543f24d"&gt;technically more aligned to Jamstack&lt;/a&gt;, but the distinction appears to make little practical difference during early stage web dev.&lt;/em&gt;&lt;/p&gt;
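
&lt;p&gt;As a rough illustration of 'headless' usage, the sketch below treats WordPress (introduced below) purely as a content backend and reads posts over its REST API, leaving presentation to a separate front end. The &lt;code&gt;/wp-json/wp/v2/posts&lt;/code&gt; route is the standard WordPress REST API posts endpoint; the function names, the &lt;code&gt;example.com&lt;/code&gt; site and the hand-written sample payload are assumptions for illustration only.&lt;/p&gt;

```python
import json
import urllib.parse
import urllib.request


def posts_url(site, per_page=5):
    """Build the URL of the WordPress REST API posts endpoint for a site."""
    query = urllib.parse.urlencode({"per_page": per_page})
    return site.rstrip("/") + "/wp-json/wp/v2/posts?" + query


def summarise(posts):
    """Keep only the fields a separate front end would need to list posts."""
    return [{"title": p["title"]["rendered"], "link": p["link"]} for p in posts]


def fetch_posts(site):
    """Fetch and summarise recent posts (performs a network request)."""
    with urllib.request.urlopen(posts_url(site)) as response:
        return summarise(json.load(response))


# Offline example using a hand-written payload in the shape the API returns.
sample = [{"title": {"rendered": "Hello world"}, "link": "https://example.com/hello-world/"}]
print(posts_url("https://example.com/"))
print(summarise(sample))
```

&lt;p&gt;An SSG or JavaScript front end could call something like &lt;code&gt;fetch_posts&lt;/code&gt; at build time and render the results into its own templates, which is the decoupling that Jamstack describes.&lt;/p&gt;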

&lt;p&gt;WordPress (the open source WordPress.org, not the managed service WordPress.com) is the most popular CMS in use today. It is straightforward to &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-install-wordpress-with-lemp-on-ubuntu-20-04"&gt;deploy a WordPress instance into a VM running Nginx&lt;/a&gt; or Apache.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Jbc-XzBF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kp0qoasy867epbh7fbgb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Jbc-XzBF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kp0qoasy867epbh7fbgb.jpg" alt="High level architecture of a website running WordPress" width="880" height="392"&gt;&lt;/a&gt;High level architecture of a website running WordPress&lt;/p&gt;

&lt;p&gt;From my perspective, this is a great option. Because I am paying for IaaS (a VM server) rather than PaaS (website hosting), I have fine-grained control over my server. I am (still) using open source tools which are, on the whole, very approachable from a data science background. To me, WordPress itself is still a bit of a black box of PHP code, which is why I stopped representing the distinction between back- and front-ends in the diagram, but the trade-off is that I have immediate access to plugins and integrations.&lt;/p&gt;

&lt;p&gt;This architecture is one implementation of what is known as the LAMP/LEMP stack: Linux (my Ubuntu VM), Apache/Nginx (pronounced Engine-X, hence "E"), MySQL, PHP. For reference, a typical JS web app stack would be something like &lt;a href="https://www.ibm.com/cloud/blog/lamp-vs-mean"&gt;MEAN&lt;/a&gt; or MERN (but there are &lt;a href="https://www.youtube.com/watch?v=FQPlEnKav48"&gt;lots of other possibilities&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;There are multiple ways to build a website, and the choice between them should balance use case requirements against ease of development. Each approach conforms to the basic server-hosting mental model that will be familiar to anyone coming from a Python/ML/DS background, with key distinctions depending on website archetype (static/dynamic web page/app). Essentially, though, we are putting a bunch of HTML files on a server and exposing them to the internet via a webserver service. Django lets Python developers focus on implementing business logic, Hugo can easily spin up static blogs, and WordPress provides instant access to mature plugins. There is no single right way to do web development, but it is important to remember the end goal and not fixate on the technology choice.&lt;/p&gt;

&lt;p&gt;I have skimmed over some details in this article, notably DNS setup (which is really &lt;a href="https://landchad.net/basic/dns/"&gt;quite straightforward&lt;/a&gt;) and best practices for server maintenance. For the latter, you can refer to &lt;a href="https://marketplace.digitalocean.com/apps/wordpress"&gt;DigitalOcean's 1-click WordPress installer&lt;/a&gt; to see what additional configuration it performs on its Apache server. At a minimum, I recommend the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/initial-server-setup-with-ubuntu-20-04"&gt;Create a non-root sudo user&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/initial-server-setup-with-ubuntu-20-04"&gt;Set up a basic UFW firewall&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://landchad.net/basic/certbot/"&gt;Use Certbot to set up SSL/HTTPS&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-apache-virtual-hosts-on-ubuntu-20-04"&gt;Use Apache's virtual hosts file to manage site access&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a live production site, you should go on to explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Security - fail2ban and DDoS prevention (for example, DigitalOcean's 1-click installer disables XML-RPC)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Backups - for example via git or using the Cloud hosting provider's services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Staging and deployment - commonly using the blue/green pattern via Apache's virtual hosts file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SEO&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sustainable content development - e.g. using content frameworks&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are all topics for future posts! I hope this was helpful for anyone looking to develop their site.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This content was originally posted on &lt;a href="https://samgould.net/index.php/2022/11/02/introduction-to-web-development-for-data-scientists/"&gt;samgould.net&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
