DEV Community: Guillermo A. Fisher

Building an Unflashy Serverless COVID-19 Tracker with AWS

Guillermo A. Fisher — Sat, 28 Mar 2020 20:21:12 +0000

Really? Why?

To keep your mind off the severity of the current situation. To stay busy, and to use your skills to keep your family and friends informed. To learn something new while you keep your busy behind at home.

So… a week or so ago, out of nowhere, I decided that I wanted to be able to get the most recent COVID-19 numbers without having to use a browser. There were a few states in particular that were important to me, and I wanted to be able to see the number of confirmed cases in those states by typing a command in a terminal.

Boom, bap, bip.

That’s all there was to it when I started. I figured I could ping the CDC’s site to get the info, but I noticed that the department of health websites seemed to have the most up-to-date information. I took a look at a few of those DOH sites, viewed the source (the way only very cool people do when they surf the Interwebs), and determined that I could easily scrape some pages to get the info I wanted using regular expressions, DOM traversal, and super fancy string parsing.

What started out as a CLI became a CLI + web API because I couldn’t help myself. An HTML page came to be because my wife wanted to know what I was working on, and I didn’t think she’d be too impressed by a JSON string. I went serverless because this was a green field project and it’s 2020 and my free time is too precious to be spent OS patching and because why in the world would I pay for a server and because come on already!

What’s Involved

PHP… again

I wanted to do this quickly, so I used the language that I knew best: PHP. I threw together a PHP skeleton to start the millions of unfinished projects I tend to bring to life, and I used it for this project. I also used Flysystem’s PSR-6 implementation for filesystem caching — I was actually unaware of the PSR-6 vs. PSR-16 discussions that popped up over the years; I chose PSR-6 because I thought the provided CacheItemPoolInterface would work well in this situation, allowing me to easily change cache storage options if I felt the need.

Other Stuff

An older version of Bref (I’m not ready to make the jump over to the version that leverages the Serverless framework) made it easy for me to use AWS SAM to deploy an AWS Lambda function. I used Amazon API Gateway to serve up the endpoint, and used Amazon CloudWatch to schedule the population of the cache, which lives in Amazon S3.

Scraping

This was the most interesting challenge because it required me to keep up with the way each state rolled out their changes. New York, for instance, made several changes to the way they displayed their data over time. Every few updates, they’d wrap the count in a different set of tags, or surround it with different characters. In most cases, though, it came down to doing some simple pattern matching:

$page = file_get_contents('https://the.url'); // grab the HTML
preg_match('/total to (.*) confirmed cases/', $page, $matches);

I invariably ended up having to do some cleanup on whatever existed in $matches[1], but that ended up being pretty simple. I did some similar matching to get the timestamps for the updates.
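Here is a minimal sketch of what that kind of cleanup could look like. The function name and the sample markup are made up for illustration; the real pages and the real cleanup code may well have looked different. It swaps in a non-greedy capture so the match stops at the first occurrence of the phrase, strips any wrapping tags, and keeps only the digits.

```php
<?php
// Hypothetical helper: pull the confirmed-case count out of a scraped page.
function extractConfirmedCases(string $page): ?int
{
    // Non-greedy capture so the match stops at the first "confirmed cases".
    if (!preg_match('/total to (.*?) confirmed cases/', $page, $matches)) {
        return null; // the page layout changed, or the phrase is missing
    }

    // Cleanup: drop any wrapping tags, then everything that is not a digit.
    $digits = preg_replace('/\D/', '', strip_tags($matches[1]));

    return $digits === '' ? null : (int) $digits;
}

// Made-up sample markup standing in for file_get_contents('https://the.url').
$sample = '<p>This brings the total to <strong>52,318</strong> confirmed cases.</p>';
var_dump(extractConfirmedCases($sample));
```

Returning null when the pattern stops matching makes a layout change easy to spot, which matters when a state rewraps its numbers every few updates.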

Putting it All Together

Deploying with Travis CI and AWS SAM

To get the code deployed up to a Lambda function, I followed the steps I walked through in an older post about Bref and Lambda. The one new thing to note is that I had to set the Python version in the .travis.yml file to 3.6.7 or 3.7.1 due to updates in the AWS tools.

API Gateway Setup

Once the Lambda function was deployed, I created a REST API endpoint using the API Gateway.

Go with REST, unless you don’t want to…

When I clicked on “Build” in the REST API box, I was led to a screen where I was able to add resources and methods to the API. Using the “Actions” dropdown, I created a resource named cases by choosing the “Create resource” option. I enabled CORS for good measure.

Cases? Yeah… that works…

I then chose “Create method” from the same dropdown to associate the GET HTTP method with that resource. I was able to point to my Lambda function from the screen that appeared. I used Lambda Proxy Integration to proxy incoming requests straight through to the Lambda function.
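With a proxy integration, API Gateway hands the whole request to the function and expects a response with a status code, headers, and a string body. The sketch below shows that shape; the helper name and the payload are made up, and a runtime layer such as Bref may build this structure for you, so treat it as an illustration rather than the code behind this project.

```php
<?php
// Hypothetical helper: build the response shape API Gateway expects from a
// Lambda function behind a proxy integration.
function proxyResponse(array $payload, int $statusCode = 200): array
{
    return [
        'statusCode' => $statusCode,
        'headers' => [
            'Content-Type' => 'application/json',
            // With a proxy integration the function sends CORS headers itself.
            'Access-Control-Allow-Origin' => '*',
        ],
        // The body must be a string, so the payload is JSON-encoded here.
        'body' => json_encode($payload),
        'isBase64Encoded' => false,
    ];
}

// Made-up numbers, for illustration only.
$response = proxyResponse(['state' => 'VA', 'confirmed' => 1234]);
echo json_encode($response, JSON_PRETTY_PRINT), PHP_EOL;
```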

HTTP methods make the world go ‘round

Once it was all set up, I saw a no-fuss GET method configuration.

Voila!

I deployed the API by choosing “Deploy API” from the “Actions” dropdown, chose a stage name, and added some stage metadata.

Be practical here

Once I was done deploying the API, I saw a screen that featured a link to it, and was also presented with options for things like caching and client certificates.

If you’re following along because you’re building an API of your own, then you’ll probably want to associate it with a customized domain name — if so, follow the steps in the AWS docs.

Use This as a Starting Point…

I really didn’t do much for this project, and I’m sure you can do much, much more with it. If you’re interested, you can save the data to a DynamoDB table and use that data to build visualizations. You can ditch the scraping altogether and just use the data provided by The COVID Tracking Project (thanks for making me aware of it, Branden). You can build a more engaging application using React or Vue. Whatever you do, though, do it from the safety of your home.

If you end up using the code, or find this post useful, please let me know.

Your First Steps with AWS

Guillermo A. Fisher — Tue, 17 Mar 2020 16:30:43 +0000

Photo by Christian Chen on Unsplash

A recent tweet from Helen Anderson prompted me to think of a few things that I’ve done in the past when creating new AWS accounts for myself and others. So I’ve put together a list of five first steps — this is by no means an exhaustive list, but it should help those of you who now find yourselves with enough time and elbow room (I don’t know about you, but I prefer to experiment/fail in solitude) to finally play around with AWS.

Choose Your Services

Logging into the management console for the first time can be intimidating. At the time of this writing, there are a little less than 220 services available in AWS. You don’t need to worry about most of those services — you should only focus on the ones that are most relevant to you. If you’re new to AWS, you should review the free tier documentation to help you make a decision about the services you’ll be using. My suggestions for beginners: S3, EC2, IAM, CloudWatch, and Trusted Advisor (I’ll talk about the last 3 later on in this post).

You can modify your console experience a bit by pinning services to the toolbar at the top of the page. Here’s an excerpt from the console FAQs:

Select the pin icon beside the Resource Groups menu and drag and drop the service links you want to save as shortcuts. You have the option to display the service icon alone, the service name alone, or both together.

Focus!

Create an IAM User

When you’re starting out with AWS, it can be tempting to use your root account to manage all of your resources — don’t. You should use the root account sparingly, keep the associated credentials safe, and create an IAM user. Here’s an excerpt from the IAM Best Practices documentation:

Create an IAM user for yourself as well, give that user administrative permissions, and use that IAM user for all your work.

I recommend you stick to the advice detailed in the IAM Best Practices docs. If some of the concepts in that doc seem unfamiliar or daunting, then I suggest you focus first on these: grant yourself and others only the needed privileges, enable multi-factor authentication (MFA), and configure a strong password policy for all users in your account. Follow this guide to set up an admin user, then set up other users, groups, and roles whenever you can.

Identify yourself!

Install the CLI

You may not think you need to use the CLI initially, but I highly recommend installing it anyway. Once you get more familiar with AWS, you’ll find that it’s just more efficient to do certain things using the command line. For example: I sometimes have to create upwards of fifteen users for 757ColorCoded workshops. I could do that manually in the console, but it takes a fraction of the time for me to create those users with a simple Bash for loop. The CLI installation and configuration steps are well-documented. You might as well get it out of the way before you actually need to install it. That way you have it.
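To give a feel for that kind of loop, here is a rough sketch written in PHP rather than Bash, to match the other examples in these posts. It only builds and prints one `aws iam create-user` command per attendee so they can be reviewed first; the helper name and the workshop-user prefix are made up.

```php
<?php
// Hypothetical helper: build one `aws iam create-user` command per attendee.
function createUserCommands(string $prefix, int $count): array
{
    $commands = [];
    for ($i = 1; $i <= $count; $i++) {
        // Zero-pad so the names sort nicely: workshop-user-01, workshop-user-02, ...
        $userName = sprintf('%s-%02d', $prefix, $i);
        $commands[] = 'aws iam create-user --user-name ' . escapeshellarg($userName);
    }

    return $commands;
}

// Print the commands for review instead of running them.
echo implode(PHP_EOL, createUserCommands('workshop-user', 15)), PHP_EOL;
```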

Create a Billing Alarm

Even if you’re using the free tier — actually, especially if you’re using the free tier — you should set up a billing alarm in CloudWatch to make sure you’re not unknowingly spending money. It’s pretty easy to accidentally exceed some of the free tier usage limits, or forget about resources you’ve created. You might, for example, set up some beefy EC2 instances in Oregon for a high availability experiment and, hypothetically speaking of course, forget to shut down those instances for a few weeks. Then, in theory, you might get a shocking bill, because you usually use other regions for HA and don’t see any EC2 instances when you log into the console and go to those regions (I mean why in the world would you randomly spin up 2 ridiculously over-provisioned servers in the west coast for no good reason????). That could possibly happen to someone.

Anyway, the aforementioned tweet that inspired this post provides a link to a pretty good guide put together by Joseph Whyle, so you should check that out. I’ve included one of my alarms below.

Watch your money!

Use Trusted Advisor

Trusted Advisor helps keep your account in line with AWS best practices. It makes recommendations about — among other things — security, performance, and cost optimization. The Basic Trusted Advisor support plan is included in all AWS accounts (read: IT’S FREE). I’ve found it to be especially useful.

Listen!

Get Going!

That’s all I’ve got for ya. I’m going to be playing around with Athena Federated Query a bit over the next few weeks. If you’ll be doing the same, or if you have any questions about anything I’ve written, you can comment here or ping me on Twitter.

An Upgrade: Part 3 — Querying a Data Lake in AWS with Amazon Athena

Guillermo A. Fisher — Thu, 13 Feb 2020 13:38:19 +0000

A sizable chunk of time has passed since Part 2 of this series. In the months since my last post, I’ve started a new open source project (more to come later), travelled, and stood up a small, manageable data lake. I’d like to say that my delay in bringing you the engrossing (or boring? YMMV) content you’ve come to expect from me is the result of me back-loading 2019 with a whirlwind of activity; the reality, I’m afraid, has more to do with indolence than anything else. Sorry.

In this post, I’ll focus on my recent use of Amazon Athena — and, as a matter of necessity, Amazon S3 and AWS Glue — to explore my Apple Health data and decide whether or not to incorporate it into my unreasonably captivating personal website.

A Bit About re:Invent 2019

The aforementioned data lake is the same data lake I reviewed during the demo portion of my DevChat at AWS re:Invent 2019, the slides for which are available on Slideshare. I won’t get into a full-blown re:Invent recap, but I will say that I had a great time at the conference — I’ve been using one word in particular to describe the experience: overwhelming. I’m thankful to AWS for giving me the opportunity to attend and meet some truly impressive people; I’m also thankful to Handshake for giving me the space to stretch myself, talk about the company’s mission, and walk purposefully around Vegas for a few days — which, incidentally, is a great way to close activity rings.

I’m the stocky character pointing at the monitor. Photo by Ross Barich.

Tracking More Than Steps

If you’ve got an iPhone, you’ve got Apple Health data; if you’ve got an Apple Watch, too, then you’ve got even more data. The Health app consolidates data about your physical activity, heart rate, etc. from your iPhone, Apple Watch, and other third-party apps into a singular data repository.

You can use the Health app to export that data so that you can play around with it yourself. What the export provides is an unwieldy set of XML files whose size is directly related to the duration of your relationship with your iPhone — if, for example, you’ve been an Apple fangirl since iOS 8 (when the Health app was introduced), then you might have 5 years’ worth of data on your hands.

Health app data isn’t easy to parse without employing some ETL wizardry. Before authoring while statements, I looked around to see if anyone had already decided to hazard an attempt at processing the files, and I quickly came across Mark Koester’s post entitled How to Export, Parse and Explore Your Apple Health Data With Python. I read through all of the rigor involved in making sense of the data, and realized that a subset of AWS services could be employed to reduce my level of effort if I introduced a data lake into the equation.

Data Lakes Seem Complicated

While the idea of deploying a data lake may seem daunting to the uninitiated, it can actually be a fairly straightforward affair, especially in use cases like this one.

Before we dive into my setup, let’s first take a look at a definition. As AWS puts it:

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data, and run different types of analytics — from dashboards and visualizations to big data processing, real-time analytics, and machine learning to guide better decisions.

The key takeaway from that definition is that a data lake is just a centralized repository — you don’t, for example, need Redshift or QuickSight to build a data lake, but those tools can help you tease insights out of the data that is stored in your repository.

Data Storage & Ingestion

I decided to keep my Apple Health data in S3, a service that is at the heart of data lakes — and a host of other services, really — that live in AWS. It’s a cost-effective, scalable, durable solution that allows you to store all kinds of data and define lifecycle policies to optimize storage costs. The data in my buckets is encrypted with the AWS Key Management Service (AWS KMS). Pro tip: always encrypt your data in a data lake both while it’s sitting around doing nothing (at rest) and while it’s moving through your system (in transit).

A concept common among data lakes is the idea of zones. You’ll see different names for zones used in data lakes across industries and use cases, but the main idea is that each zone represents data as it exists in varying states of refinement. Zones are generally represented by S3 buckets. For example, raw data — data in its original format — is kept in a raw zone.

My solution has 3 zones: raw, refined, and curated. I stored my raw health data in a bucket at this path: s3://guillermoandrae-health-data/raw/2019-10-10 (it’s worth mentioning here that the folder structure you use when storing your data can affect the performance of your Athena queries; if you’re interested in solutions more complex than the current one, you should do some reading on partitioning).
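For a taste of what partitioning can look like, one common convention is a Hive-style key layout (year=…/month=…/day=…), which Athena can use to skip data that a query does not need. The sketch below builds such a key; the helper name and the export.xml file name are assumptions, and this is not the layout used in the setup described here.

```php
<?php
// Hypothetical helper: build a Hive-style partitioned key for a raw export.
function partitionedKey(string $zone, DateTimeImmutable $date, string $fileName): string
{
    return sprintf(
        '%s/year=%s/month=%s/day=%s/%s',
        $zone,
        $date->format('Y'),
        $date->format('m'),
        $date->format('d'),
        $fileName
    );
}

// The file name is an assumption made for the example.
$key = partitionedKey('raw', new DateTimeImmutable('2019-10-10'), 'export.xml');
echo $key, PHP_EOL;
```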

On the ingestion front: I don’t know of a way to automatically download my health data (I’m open to suggestions), so I manually downloaded it and pushed it to S3 through the AWS administrative console. You’d usually want to automate the upload process.

Raw, refined, and curated zones.

Move & Transform

The Apple Health Extractor outputs a number of CSV files, which I stored in the refined zone. On the first go-round, I pushed the CSVs to that zone manually. In order to automate the execution of the script, I uploaded a modified version of the code to a Lambda function that watches the raw bucket and dumps the CSVs out into the refined bucket.
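The "watching" part comes down to S3 sending the function an event that names the bucket and object key of each new upload. Here is a rough sketch of reading such an event and working out where the CSVs should land. The helper, the bucket names, and the target layout are all hypothetical; only the Records / s3 / bucket / object nesting follows the S3 event notification format.

```php
<?php
// Hypothetical helper: map each uploaded object in an S3 event to a
// destination prefix in the refined zone.
function refinedTargets(array $event, string $refinedBucket): array
{
    $targets = [];
    foreach ($event['Records'] ?? [] as $record) {
        // Object keys arrive URL-encoded in S3 event notifications.
        $rawKey = urldecode($record['s3']['object']['key']);
        $targets[] = [
            'sourceBucket' => $record['s3']['bucket']['name'],
            'sourceKey' => $rawKey,
            'targetBucket' => $refinedBucket,
            // Keep the dated folder so the CSVs stay grouped by export.
            'targetPrefix' => dirname($rawKey) . '/',
        ];
    }

    return $targets;
}

// Made-up event with made-up bucket names.
$event = ['Records' => [[
    's3' => [
        'bucket' => ['name' => 'example-raw-zone'],
        'object' => ['key' => '2019-10-10/apple+health+export.xml'],
    ],
]]];
print_r(refinedTargets($event, 'example-refined-zone'));
```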

My Lambda function is a bucket stalker.

Crawling & Cataloging

In my headier programming days, I spent a ton of time writing ETL (Extract, Transform, Load) code. I also had experiences with tools that claimed to magically transform data sets from one format to another, only to have to write code to clean up the shoddy job done by those tools. When I heard about Glue, I was definitely pessimistic; seeing it in action, though, was surprising.

AWS Glue is a serverless, pay-as-you-go ETL service. You can set Glue up to crawl and catalog the data in your S3 buckets. Tell it the data location and data format, and Glue will populate your data catalog, a managed, Apache Hive metastore-compatible metadata store that you don’t have to run yourself. Incredibly efficient, incredibly useful.

Creeeeeeeepy crawlers!

You’ll need to follow the prompts, define an IAM role that can be used by Glue to access data in your buckets, and name the database where your data catalog will exist.

No code to write, no servers to maintain, no databases to manage.

Simple CSVs are easy for Glue to parse — there are a few built-in classifiers for various data formats (including CSV) that allow you to get started ETL-ing stuff right away. I was able to crawl my data without issue.

Data Exploration

With my data cataloged, I was ready to begin digging into it with Athena. Amazon Athena is a serverless interactive query service that can be used to analyze data stored in S3. Once a Glue data catalog is populated, you can use Athena to write SQL queries and perform ad-hoc data analysis. Recent updates to Athena make it possible to execute SQL queries across databases, object storage, and custom data sources using the federated query feature.

Run SQL from an intuitive UI.

I ran some queries to figure out which of the columns would be most useful. In the GIF above, I paid close attention to the type field and was able to figure out what kinds of metrics could be found in the health data. Further investigation led me to the unit, value, and date-related fields creationdate, startdate, and enddate. I was ultimately able to figure out how to get some aggregates together that might be useful (check out Mark Koester’s post for more on that).

So… Is This Data Helpful?

Nah. It’s not. It’s kind of interesting, but it doesn’t tell as compelling a story as I thought it would. I couldn’t find any strong correlations between my physical activity and my social media activity. I’d intended to convert the refined data into Parquet — a columnar storage format — and store the files in my curated zone, then import those files into Redshift for data warehousing purposes. I got as far as the Parquet conversion, but decided it wasn’t worth it to incur the costs associated with spinning up Redshift servers.

And that’s fine, I think. I was able to test out a hypothesis, and it cost me just about nothing. It’s OK that the process led me to the conclusion that I am, in fact, quantifiably boring.

A Final Word on Data Lakes

I don’t want to be irresponsible, so I won’t end this post without pointing you to at least one good data lake resource and mentioning AWS Lake Formation, which will handle a lot of the dirty work of setting up a data lake for you. I should also point out that my data lake lacked the level of automation and data governance that are necessary to maintain the integrity and usefulness of a data lake, so don’t mimic my setup for anything even mildly important.

Next Steps

It’s time to start writing some code. In the next post, I’ll talk about the AWS SDK for PHP and my mysterious open source project.

Stay tuned for Part 4 in the series, friends!

An Upgrade: Part 2 — Diving Deeper into DynamoDB

Guillermo A. Fisher — Fri, 27 Sep 2019 02:01:32 +0000

In Part 1 of this series, I shared my plan for rebuilding my personal website. Before I start thinking about changes to user interfaces or HTTP responses, I need to clean up the mess I created with my poorly designed data model. In this post, I’ll focus on my experience with Amazon DynamoDB and the role that service will continue to play in my site’s architecture. Let’s go over some core DynamoDB concepts so we can shine a bright light on my missteps.

Some Basics

DynamoDB stores data in tables. If you’re familiar with other popular database systems, chances are you’ve come across tables while working with those systems, too — and the idea here is pretty similar. Tables are made up of zero or more items. An item can be compared to a row in a relational database, and is made up of attributes. An attribute can be likened to a table column, and can be one of the following types: String, Binary, Number, Boolean, Null, List, Map, StringSet, BinarySet, and NumberSet.

A table made up of items that happen to be fantastic songs.

The example above is an excerpt from a Songs table. Artist, SongTitle, and AlbumTitle are attributes. The second item in that table is represented visually as a row whose attribute values are the following strings: Eric Lau, Cloudburst, and Quadrivium.

On Being Ridiculous

That all seems pretty familiar, right? And simple, too? If you’re like me, it sure does. And if you’re like me, you started reading the documentation to figure out how to get started. You read through all that jazz about tables and attributes and groggily declared to yourself, somewhere around 10 PM, “I know this stuff”. And then you saw, later on in the documentation, mention of “primary keys” and “indexes”, and smirked — defiantly — at the thought of having to read more explanations of concepts that you’d already “mastered”. So you skipped it all and went straight to table creation… and just figured out the rest as you went along.

Do not do any of that.

Primary Keys

Without an understanding of how primary keys work in DynamoDB, you won’t be able to design your tables in a way that will allow you to efficiently retrieve data from them. To start, there are two kinds of primary keys in DynamoDB: simple and composite.

A simple primary key is a partition key made up of one attribute. You’ll sometimes see the partition key referred to as the hash key or hash attribute because it’s used in an internal hash function that evenly distributes items across partitions. In tables that only have partition keys, each item must have its own unique partition key value.

Composite keys are made up of two attributes — the partition key and the sort key. You’ll sometimes see the sort key referred to as the range key or range attribute, as items with the same partition key are stored physically close together and sorted by the sort key value. In tables that have both partition keys and sort keys, two items can share the same partition key value so long as those two items also have different sort key values.

There’s enough Little Brother for us all.

Consider the Songs table again. It has a composite primary key, composed of the Artist attribute (partition key) and SongTitle attribute (sort key). Note that there are two songs with the same partition key values (Little Brother), but they also have different sort key values (Home and Shorty on the Lookout).
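The uniqueness rule is easier to see in a toy model. The class below is a plain in-memory stand-in, not DynamoDB and not the AWS SDK: it identifies each item by its partition key and sort key together, so writing the same pair twice replaces the item instead of adding a new one. The Rating attribute is made up.

```php
<?php
// Toy in-memory table whose items are identified by (partition key, sort key).
final class ToyTable
{
    private array $items = [];

    public function put(string $partitionKey, string $sortKey, array $attributes): void
    {
        // Same partition key + same sort key replaces the existing item.
        $this->items[$partitionKey][$sortKey] = $attributes;
    }

    public function count(): int
    {
        // Total items across every partition.
        return array_sum(array_map('count', $this->items));
    }
}

$songs = new ToyTable();
$songs->put('Little Brother', 'Home', ['Rating' => 5]);
$songs->put('Little Brother', 'Shorty on the Lookout', ['Rating' => 5]);
$songs->put('Eric Lau', 'Cloudburst', ['Rating' => 5]);
$songs->put('Little Brother', 'Home', ['Rating' => 4]); // replaces the first item

echo $songs->count(), PHP_EOL;
```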

Retrieving Items

After you’ve defined your table’s primary key and have added items, you’ll eventually want to retrieve that data. That can be done in a number of ways; the primary operations you’ll use to retrieve a collection of data are the Scan and Query operations.

The Scan operation can be used to retrieve all of the data in a table. You can apply a filter expression to refine your results and define the attribute set that you’d like to see returned. A Scan can be executed without a primary key being provided in a request.

The Query operation can be used to find items based on primary key values. In order to execute a Query operation, you must, at the very least, provide the name of the partition key and a single value for it. You can also provide a sort key condition that uses a comparison operator to further refine your results. Both of those conditions go into the key condition expression; filter expressions can then trim the matched items further before they are returned.

On Being Ridiculous: The Aftermath

I originally designed my tables without paying much attention to any of the stuff I’ve written here since the “Primary Keys” heading. I assumed DynamoDB worked the way other popular NoSQL solutions worked, and that silly assumption informed my approach.

I decided to store all of my social media posts in a posts table that felt more like a MongoDB collection than a DynamoDB table. I used an attribute called id as the table’s partition key. No sort key. And in the absence of a sort key, I had to use the Scan operation to retrieve results. But the results were never sorted as expected. So I did some Googling and found this issue:

How to order a scan results by createdAt · Issue #346 · aws-amplify/aws-sdk-ios

Here’s a quote from the thread:

DynamoDB doesn’t support sorting in a scan operation, which makes senses [sic], as ordering in a full table scan is usually unnecessary.

Ugh. So I was forced to actually read more of the docs, and I found this:

Query results are always sorted by the sort key value. If the data type of the sort key is Number, the results are returned in numeric order. Otherwise, the results are returned in order of UTF-8 bytes. By default, the sort order is ascending. To reverse the order, set the ScanIndexForward parameter to false.

Ugh * ugh. Sort keys open up retrieval options, including those made available through key condition expressions.
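A toy simulation of the behavior described in that excerpt: the function below is an in-memory stand-in for a Query against one partition, with a flag that plays the role of ScanIndexForward. The pk and sk field names and the sample items are made up, and nothing here talks to DynamoDB.

```php
<?php
// Toy stand-in for a Query: return one partition's items ordered by sort key.
function toyQuery(array $items, string $partitionKey, bool $scanIndexForward = true): array
{
    $matches = array_values(array_filter(
        $items,
        fn (array $item): bool => $item['pk'] === $partitionKey
    ));

    // String sort keys are compared byte by byte, which strcmp() mimics.
    usort($matches, fn (array $a, array $b): int => strcmp($a['sk'], $b['sk']));

    // Ascending by default; passing false reverses the order.
    return $scanIndexForward ? $matches : array_reverse($matches);
}

// Made-up items: pk is the partition key, sk is the sort key.
$items = [
    ['pk' => 'Twitter', 'sk' => '2019-09-01T10:00:00Z'],
    ['pk' => 'Instagram', 'sk' => '2019-09-05T08:30:00Z'],
    ['pk' => 'Twitter', 'sk' => '2019-08-15T12:00:00Z'],
    ['pk' => 'Twitter', 'sk' => '2019-09-20T18:45:00Z'],
];

print_r(array_column(toyQuery($items, 'Twitter', false), 'sk'));
```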

Righting Wrongs

A few months ago, I created a new table to hold posts, and that new table has a composite primary key. The partition key is the table’s source attribute (with values like Twitter, Instagram, etc.), and the sort key is the createdAt attribute (the post’s timestamp). Now that I can use the Query operation, I can sort results by timestamp and use key condition expressions to write more complex queries. I also plan to add at least one secondary index so I can query against other item attributes like body.
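A Query against a table like that could be described with request parameters along these lines. This sketch only builds the parameter array (no SDK call is made), and the helper name, table name, and limit are assumptions. The parameter names follow the DynamoDB Query API, and the attribute name goes through a placeholder in case it clashes with a reserved word.

```php
<?php
// Hypothetical helper: build Query parameters for the newest posts from one source.
function latestPostsQuery(string $tableName, string $source, int $limit): array
{
    return [
        'TableName' => $tableName,
        // Match a single partition key value.
        'KeyConditionExpression' => '#src = :src',
        'ExpressionAttributeNames' => ['#src' => 'source'],
        'ExpressionAttributeValues' => [':src' => ['S' => $source]],
        'ScanIndexForward' => false, // newest createdAt first
        'Limit' => $limit,
    ];
}

// The table name and limit are made up for the example.
print_r(latestPostsQuery('posts', 'Twitter', 10));
```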

My API is still pointing to the old table. A bit of code is required to convert the different timestamp formats into one format I can use in the new table, so I have to sync the two tables using a script until I can get that conversion code into the /posts endpoint’s POST method and point to the new table (more on that later on in the series). Still, though, I’m in much better shape overall.
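As a rough idea of what that conversion might involve, the sketch below normalizes a few timestamp styles into a single UTC ISO 8601 string. The input formats are guesses, not the actual formats coming out of the old table, and the function name is made up. Strings in this shape sort in chronological order when compared byte by byte, which suits a string sort key.

```php
<?php
// Hypothetical helper: normalize assorted timestamp strings to UTC ISO 8601.
function normalizeTimestamp(string $raw): string
{
    // DateTimeImmutable parses many common formats and throws on garbage.
    // The UTC zone here only applies when the input carries no offset.
    $date = new DateTimeImmutable($raw, new DateTimeZone('UTC'));

    return $date->setTimezone(new DateTimeZone('UTC'))->format('Y-m-d\TH:i:s\Z');
}

// Three made-up inputs that all describe the same moment.
$samples = [
    'Fri, 27 Sep 2019 02:01:32 +0000', // RFC 2822 style
    '2019-09-26T22:01:32-04:00',       // ISO 8601 with an offset
    '@1569549692',                     // Unix timestamp
];

foreach ($samples as $sample) {
    echo normalizeTimestamp($sample), PHP_EOL;
}
```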

Guillermo uses GUIs

The last DynamoDB-related change I’ll be making involves the addition of a new tool into my workflow: NoSQL Workbench for DynamoDB. The UI is pretty intuitive, and I like not having to log into the AWS admin console to interact with my tables. There’s an operation builder feature that is especially helpful for creating filter expressions and, of course, executing operations. The builder can generate Python, Node.js, and Java code for you as well.

I like a good GUI.

Next Steps

Now that I’ve got the data set up correctly, I need to create the data pipeline that will allow me to ultimately use Amazon QuickSight to make sense of my social media activity and health data. I’ll be setting all of that up with AWS Glue and Amazon Athena.

Stay tuned for Part 3 in the series, friends!

An Upgrade: Part 1 — Devising an Approach

Guillermo A. Fisher — Fri, 27 Sep 2019 01:39:17 +0000

For a little over a year and a half, guillermoandraefisher.com has existed as a serverless application that is powered by a familiar blend of services: Amazon S3, Amazon API Gateway, and Amazon DynamoDB. Things have changed significantly since I first deployed the index.html file — for me, and for some of the services I’ve been using. In this series of posts, I’ll discuss my mistakes, highlight relevant services, and walk through the overhaul of my small, drab corner of the Web.

Continuous Learning

As a people manager, writing code isn’t something I do on a regular basis. Occasionally, though, some extraordinary circumstance forces me to enter scrupulously into a production code base and completely wreck a Scrum team’s velocity.

Tess Rinearson

@_tessr

“oh, the engineering manager has decided to start programming again”

21:59 PM - 24 Aug 2019


I’m tasked with developing people instead of applications, and that charge presents its own set of fascinating, difficult challenges that do not leave room for much else in my work day. However, I do need to keep up with trends in pertinent tech, and I find that iterating on my unfrequented personal website in my downtime is a low-pressure way for me to get hands-on experience with contemporary tools of the trade. At my domain, I am free to fail, often and spectacularly, without consequence. That freedom is fertile ground for foolishness (like unwarranted alliteration), and it allows me to entertain grandiose ideas such as re-architecting an application from the ground up… and blogging about it.

Devising an Approach

The home page of my website is made up of a few paragraphs of text, a small number of links, and footnotes that I personally think are fairly amusing. A RESTful-ish Web API called “Andrae” is available at the api subdomain and flaunts three endpoints: /posts, which is where anyone interested can get their hands on my social media posts in JSON format; /elephpants, which is an example endpoint I put together for a talk about PHP AWS Lambda functions built with Bref (which I’ve lauded on Medium in the past); and /nicknames, which is a worthless collection of nicknames that I’m definitely going to delete but am mentioning here for posterity. My social media posts are pushed to the API with Zapier. There’s a bare-bones search UI available at a search.html page that lets users dig through my old social media posts.

“Zaps” I’ve created to collect my social media activity in one place.

That’s the current state of affairs. I have 4 goals for this project:

I want to be able to answer questions about my social media activity using the data that I’ve been collecting. I’d like to marry it up with my health data to see if any interesting connections exist.
I want to develop working proficiency with a new language: Golang, aka Go.
I want to limit myself to serverless (read: fully managed, scalable) services because I’m lazy & cheap… or efficient & cost-conscious. Whichever you prefer.
I want to build a more engaging, more responsive, modern web application that features a proper RESTful API — and eventually a GraphQL API, just so I can say I’ve built one.

The Data Store

I’m going to continue to use DynamoDB as my primary data store. Given that my traffic numbers are low, I’m in no danger of exceeding the AWS free tier usage thresholds anytime soon, which means it won’t cost me anything to store my data. Said another way: it’ll cost me $0.00 to keep my data in a fault-tolerant, highly available, managed NoSQL database.

The Data Story

To analyze my data, I’ll catalog it using AWS Glue and use Amazon Athena to do some exploring. I’ll also use Amazon QuickSight to put together visualizations.

The Web API

I’ve already mentioned Go — I’ll be using it to build at least one API endpoint. The others will be built with PHP and Bref. All endpoints, regardless of runtime, will be composed of Lambda functions that are exposed via the API Gateway.

The Front End

A change in my professional life has steered my attention towards React. I need to have some idea of what it’s like to work with it because I support people who work with it every day. I’m especially motivated because it seems jQuery has fallen out of favor with the JavaScript crowd, and I’m mildly embarrassed that I still rely on it so much. The React app will live in S3.

Next Steps

I’m going to start this off by getting my data in order. I made some critical errors with DynamoDB when I started, but I’ve since learned a lot about the service, and I’m eager to apply what I’ve learned.

Stay tuned for Part 2 in the series, friends!

Automate Deployment of PHP Applications to AWS Lambda with Bref, AWS SAM, and Travis CI

Guillermo A. Fisher — Sat, 25 May 2019 12:36:07 +0000

If you’re a PHP developer whose cloud provider of choice is AWS, chances are you’ve suffered through a bit of serverless FOMO due to AWS Lambda’s lack of support for PHP. But thanks to the AWS Serverless Application Model (AWS SAM) and Bref — both open source projects available on GitHub — it’s now possible to deploy your PHP applications to Lambda without having to fiddle with Node.js shims. And with the help of Travis CI, running automated tests and deployments is essentially a cinch. There are, however, a few gotchas.

I Don’t Know Who or What a “Bref” Is

I didn’t, either, until a few weeks ago. Matthieu Napoli’s “Bref” (which is the French word for “brief”) has been around for a while, and has caught the attention of some prominent PHP-ers. In short:

Bref provides the tools and documentation to easily deploy and run serverless PHP applications.

Before you move on, take a look at the Bref documentation: familiarize yourself with the goals of the project, walk through the “Getting Started” steps, and read the recommendations concerning the project’s use (for example, don’t expect to use Bref to deploy your legacy application without experiencing the sort of difficulty you’d normally associate with deploying a legacy application).

Respect Due

The work detailed in this post rests comfortably on the shoulders of unselfish giants; I would be remiss if I didn’t point you to this post explaining what’s needed to deploy Python applications to AWS Lambda using AWS SAM and Travis CI. You’ll find that a lot of what is covered there was put to use here, and the author (Mike Vanbuskirk) does a great job covering the pertinent prerequisites. In fact, I consider that series to be a prerequisite to this post, and suggest you read it before continuing, unless you’re impatient and just want to get to the goodies.

So… Travis CI?

Alright. Into the fray. I’ve set up a sample application at https://github.com/guillermoandrae/bref-hello-world, and will be using its .travis.yml file to explain this process:

language: generic
dist: xenial
before_install:
  - phpenv global 7.2
install:
  - pip install --user awscli
  - pip install --user aws-sam-cli
script:
  - composer install --optimize-autoloader
  - composer test
  - composer package
deploy:
  provider: script
  script: composer deploy
  skip_cleanup: true
  on:
    repo: guillermoandrae/bref-hello-world
env:
  global:
  - AWS_DEFAULT_REGION=us-east-1
  - secure: <obfuscated AWS Access Key ID>
  - secure: <obfuscated AWS Secret Key>

The Biggest Gotcha: The Build Environment

Let’s start at the tippy top:

language: generic
dist: xenial

A cursory look at the configuration file reveals two points:

I’m using phpenv, which is a PHP version manager.
I’m using pip, which is the standard package manager for Python.

The build environment needs to have both PHP and Python installed in order for all of the necessary tasks to be successfully executed. If you’ve used Travis CI for your PHP projects before, you’ve probably specified the language at the top of the YAML file with a line like language: php, which would prompt Travis CI to use a PHP build environment. I needed to use a lesser-known language designation that supported my use case.

Travis CI provides two build environments that are not language-specific: the minimal environment and the generic environment. The former is, as the name implies, pretty bare. Aside from gcc, curl, and a few other tools, minimal is not outfitted with much besides Docker and Python. The generic environment, on the other hand, includes everything in the minimal environment plus some additional runtimes such as Go, Ruby, and PHP.

If you’re not familiar with it, the dist parameter of a Travis CI configuration file is used to denote the specific distribution of the build environment that you’d like to use. With Ubuntu images, you’ve got three options: trusty (the default, used when no distribution is specified), precise, and xenial. They all contain specific configuration settings. In the case of the minimal and generic build environments, however, only precise and xenial are available. I’m using the xenial environment, as it’s the one that contains a suitable version of PHP.

A Little More About PHP Versions

By default, the xenial environment uses the lowest of the three versions of PHP that are installed: 5.6. To change that, I used phpenv to specify the version that meets our needs. phpenv is installed on all Travis CI images that run PHP:

before_install:
- phpenv global 7.2
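
If you want the build to confirm that the switch actually took effect, a small shell check can compare the active PHP version against the minimum you need. This is only a sketch: the version_at_least helper and the required value are my own illustration and not part of the sample project, and the comparison looks at major and minor numbers only.

```shell
# Succeeds when version $1 is at least version $2.
# Both arguments must contain a major and a minor part, e.g. "7.2" or "7.2.15".
version_at_least() {
  have_major=${1%%.*}; have_rest=${1#*.}; have_minor=${have_rest%%.*}
  want_major=${2%%.*}; want_rest=${2#*.}; want_minor=${want_rest%%.*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

required="7.2"  # assumption: the same version passed to phpenv above

if command -v php >/dev/null 2>&1; then
  active=$(php -r 'echo PHP_VERSION;')
  if version_at_least "$active" "$required"; then
    echo "PHP $active is new enough"
  else
    echo "PHP $active is older than $required" >&2
  fi
else
  echo "php not found on PATH" >&2
fi
```

In a real build step you would probably exit with a non-zero status in the failure branches so that Travis CI stops early.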

CLI Tool Installation vs. Python Versions

In Mike’s series, the installation of the AWS CLI and AWS SAM CLI tools seemed pretty straightforward, and I had no problem installing them on my laptop. On Travis CI, however, I found myself running into this error:

InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.

I tried using the python parameter, then the TRAVIS_PYTHON_VERSION environment variable to specify a newer Python version, but neither brought me joy. After some focused Googling, I was led to a solution that involved adding the --user flag to the installation commands. I ended up with:

install:
- pip install --user awscli
- pip install --user aws-sam-cli
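
One thing worth knowing about --user: on Linux, pip places the installed executables under ~/.local/bin by default. If a build image does not already have that directory on its PATH, something like the sketch below could be added before the tools are called. The add_user_bin_to_path function name is hypothetical, and I have not confirmed that the xenial image needs it.

```shell
# Prepend pip's default per-user script directory to PATH unless it is
# already there. Assumes Linux and an unmodified PYTHONUSERBASE.
add_user_bin_to_path() {
  user_bin="$HOME/.local/bin"
  case ":$PATH:" in
    *":$user_bin:"*) ;;              # already present, nothing to do
    *) PATH="$user_bin:$PATH" ;;
  esac
  export PATH
}

add_user_bin_to_path

# Report whether each CLI can be found; this only prints, it never fails.
for tool in aws sam; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool found at $(command -v "$tool")"
  else
    echo "$tool not found on PATH"
  fi
done
```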

Composer Scripts FTW

I really, really, really, really, really like Composer scripts. In the past, I used Phing as a task runner without realizing that I could leverage the scripts feature of Composer to do almost exactly the same thing. I’ve since seen the proverbial light and haven’t turned back:

script:
- composer install --optimize-autoloader
- composer test
- composer package

A look at an excerpt of the project’s composer.json file will shed some light on exactly what is going on:

The test script includes calls to Squizlabs’ code beautifier and code sniffer as well as a call to run the tests and generate coverage reports in both text format and Clover format (the Clover format is especially useful if you plan to integrate Travis CI with a tool/service that can create code coverage visualizations or store a project’s code coverage history). The package script calls sam package to build the stack configuration file.
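
To make that description a little more concrete, here is a rough shell sketch of the kinds of commands such scripts might wrap. It is not the project’s actual composer.json: the file names, the bucket name, the PSR2 standard, the src directory, and the run, run_checks, and package_stack helpers are all assumptions for illustration. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# Hypothetical names; adjust them to your own project.
TEMPLATE="template.yaml"
PACKAGED=".stack.yaml"
BUCKET="my-deployment-bucket"

# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Roughly what a "test" script might do: beautify, sniff, then run the
# tests with text and Clover coverage reports.
run_checks() {
  run vendor/bin/phpcbf --standard=PSR2 src &&
    run vendor/bin/phpcs --standard=PSR2 src &&
    run vendor/bin/phpunit --coverage-text --coverage-clover clover.xml
}

# Roughly what a "package" script might do: upload the code to S3 and
# write out the stack configuration file.
package_stack() {
  run sam package \
    --template-file "$TEMPLATE" \
    --output-template-file "$PACKAGED" \
    --s3-bucket "$BUCKET"
}

DRY_RUN=1
run_checks
package_stack
```

In a composer.json file, each of these would typically be an entry under the scripts key rather than a shell function.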

Deploying… Finally!

I’m using Travis CI’s script deployment option to call my deployment script and associate the appropriate GitHub repository. The details behind the composer deploy call below are uncovered in the aforementioned composer.json excerpt:

deploy:
  provider: script
  script: composer deploy
  skip_cleanup: true
  on:
    repo: guillermoandrae/bref-hello-world
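
Because the deployment relies on credentials supplied through environment variables, a deploy script could check for them before doing anything else. The sketch below is an assumption on my part and not taken from the sample project; missing_aws_env is a hypothetical helper, while the variable names are the standard ones the AWS CLI reads.

```shell
# Print the name of every required variable that is unset or empty.
missing_aws_env() {
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

missing=$(missing_aws_env)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Refusing to deploy; missing:" $missing >&2
else
  echo "AWS credentials and region are set"
fi
```

In a real deploy script you would likely exit with a non-zero status in the first branch so the deployment step fails loudly.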

That’s All, Folks

We’re done. PHP on AWS Lambda with a proper CI/CD pipeline. No fuss. Life is good. If you have any questions or comments, don’t be shy about sharing them.