Maiko Miyazaki

How to create an hourly weather auto-collector using AWS Lambda with Python 🐍

In this post, we'll be creating an hourly weather auto-collector using AWS Lambda.

This Lambda function will collect the previous day's hourly weather data from the OpenWeatherMap API, clean it, then upload it to an S3 bucket.

The weather data will look like this:
Weather data in CSV

And it will be done once a day, so we'll have yesterday's hourly weather data uploaded daily.

Assuming we already have:

Steps we will go through

1️⃣ Create a policy
2️⃣ Create a role
3️⃣ Set up a Lambda Function
4️⃣ Write code to fetch and clean the data from OpenWeatherMap
5️⃣ Install libraries in the same directory
6️⃣ Upload the zip file on the Lambda console
7️⃣ Set up CloudWatch
➡️ Done!

Let's get started 😊


1. Create a policy

To begin with, we need to create a policy that allows a role to upload files to our S3 bucket.

Open your IAM console, click Policies, then click the Create policy button.

Create policy

On the next page, we'll be able to write a policy in JSON format. It should look like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::yourbucketnamehere/*"
        }
    ]
}

When it's ready, click Next: Tags button, then click the Next button to review.

Name the policy and click Create policy.
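
For reference, the same policy document can be generated from a bucket name, which may help avoid typos in the key names or the ARN. This is only a minimal sketch: `build_put_object_policy` is a hypothetical helper, not part of any AWS SDK, and its output is meant to be pasted into the JSON editor.

```python
import json


def build_put_object_policy(bucket_name):
    """Return an IAM policy document allowing uploads to one bucket.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                # The trailing /* applies the permission to every object key in the bucket.
                "Resource": "arn:aws:s3:::{}/*".format(bucket_name),
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_put_object_policy("yourbucketnamehere"), indent=4))
```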


2. Create a role

Now we will create a role and attach the policy we just created to it.

On the same IAM console, click Roles, then Create role.
IAM console

On the next page, select AWS service as the trusted entity and Lambda as the use case. Click the Permissions button to go to the next page.
Create a role

On the next page, you can define what permission you are going to give to this role. Search the policy name you have given to the policy we just created.

attach policy to the role

Check the policy and click the Next button. Click the Next button again to name your role.

Then click Create role.

The Lambda function we will be developing in the next step will be allowed to upload files if this role is attached.


3. Set up Lambda Function

Next, we will set up our Lambda function.
Open the Lambda console and click the Create function button.

Lambda console

Select Author from scratch, and enter a Function name of your choice. In this tutorial, we will use Python 3.8.

create lambda function page

In the Permissions section, select Use an existing role, then choose the role we just created. When that's done, we can click the Create function button.

Lambda function permissions

We could try writing the function code directly on this page, but third-party libraries such as pandas and requests are not bundled with the Lambda Python runtime, so importing them fails.

Attempting to write the code in the console fails: unable to import modules.

Therefore, we first need to create the lambda_function.py file locally and install the necessary libraries in the same directory.


4. Write code to fetch and clean the data from OpenWeatherMap

Ok, now we will write code to clean the weather data fetched from OpenWeatherMap API.

Create a folder locally with lambda_function.py file in it.

Step1: Get the data

We start by getting the hourly weather data in a certain location.

In your lambda_function.py file, first import all the necessary libraries and define the lambda_handler function.

import os
import sys
import requests
import json
import pandas as pd
from datetime import datetime, date, timedelta, timezone
import boto3
import csv

def lambda_handler(event, context):
    ## We will start writing code here

Inside the lambda_handler function, we'll fetch weather data from OpenWeatherMap. To do so, first assign the necessary information to variables.

    api_key = 'your_own_api_key_here'
    url = 'https://api.openweathermap.org/data/2.5/onecall/timemachine'
    yesterday = datetime.now() - timedelta(days=1)
    timestamp = round(datetime.timestamp(yesterday))
    params = {
        'lat': '53.349805',
        'lon': '-6.26031',
        'units': 'metric',
        'dt': timestamp,
        'appid': api_key
    }

The geographical coordinates are set to Dublin, Ireland. You can find the coordinates for your own location by googling "longitude and latitude [your location]". South latitudes and West longitudes are negative.
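
The timestamp and the sign convention can both be made explicit. The sketch below assumes "yesterday" should be measured in UTC (`datetime.now()` returns naive local time, which on Lambda is UTC by default but may differ on your own machine). `yesterday_timestamp` and `to_signed` are hypothetical helper names, not part of the original function.

```python
from datetime import datetime, timedelta, timezone


def yesterday_timestamp(now=None):
    """Unix timestamp (whole seconds) for exactly 24 hours before `now`, in UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    return round((now - timedelta(days=1)).timestamp())


def to_signed(value, hemisphere):
    """Convert an unsigned coordinate plus hemisphere letter to a signed value.

    North and East are positive; South and West are negative.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    return -abs(value) if hemisphere in ("S", "W") else abs(value)


# Dublin is at roughly 53.35 N, 6.26 W.
lat = to_signed(53.349805, "N")
lon = to_signed(6.26031, "W")
print(lat, lon, yesterday_timestamp())
```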

Now we will send a request to get hourly weather data for yesterday.

result = requests.get(url=url, params=params)
result_json = result.json()

In the result_json variable, we should have:

{'lat': 53.3498,
 'lon': -6.2603,
 'timezone': 'Europe/Dublin',
 'timezone_offset': 0,
 'current': {'dt': 1613668586,
  'sunrise': 1613633820,
  'sunset': 1613670073,
  'temp': 6.92,
  'feels_like': 1.1,
  'pressure': 998,
  'humidity': 76,
  'dew_point': 2.99,
  'uvi': 0.84,
  'clouds': 75,
  'visibility': 10000,
  'wind_speed': 6.17,
  'wind_deg': 240,
  'wind_gust': 12.35,
  'weather': [{'id': 803,
    'main': 'Clouds',
    'description': 'broken clouds',
    'icon': '04d'}]},
 'hourly': [{'dt': 1613606400,
   'temp': 9.62,
   'feels_like': 4.5,
   'pressure': 991,
   'humidity': 81,
   'dew_point': 6.52,
   'clouds': 75,
   'visibility': 10000,
   'wind_speed': 6.17,
   'wind_deg': 160,
   'weather': [{'id': 500,
     'main': 'Rain',
     'description': 'light rain',
     'icon': '10n'}],
   'rain': {'1h': 0.51}},
  {'dt': 1613610000,
   'temp': 9.75,
   'feels_like': 2.49,
   'pressure': 987,
   'humidity': 81,
   'dew_point': 6.65,
   'clouds': 75,
   'visibility': 10000,
   'wind_speed': 9.26,
   'wind_deg': 170,
   'wind_gust': 16.46,
   'weather': [{'id': 500,
     'main': 'Rain',
     'description': 'light rain',
     'icon': '10n'}],
   'rain': {'1h': 0.89}}, .....
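
Before cleaning, it may be worth checking that the response actually contains hourly data, since an error response (for example, one caused by an invalid API key) will not have an `hourly` key. Below is a minimal sketch with a hypothetical `extract_hourly` helper; the `message` field it reads is an assumption about what an error payload may contain.

```python
def extract_hourly(result_json):
    """Return the list of hourly records, or raise if the payload has none."""
    hourly = result_json.get("hourly")
    if not hourly:
        # Assumption: error payloads may carry a human-readable 'message' field.
        detail = result_json.get("message", "no 'hourly' key in response")
        raise ValueError("Unexpected weather response: {}".format(detail))
    return hourly


sample = {"lat": 53.3498, "lon": -6.2603,
          "hourly": [{"dt": 1613606400, "feels_like": 4.5}]}
print(len(extract_hourly(sample)))  # prints 1
```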

Step2: Clean the data

We only need the hourly part of the JSON data, so we load that part into the weather_data variable using pandas.

weather_data = pd.json_normalize(data=result_json['hourly'])

Result:
Weather data normalized

However, what we want to get from this dataframe is only dt and feels_like, and the data that are still nested inside the 'weather' column, so we will change the above code into this:

weather_data = pd.json_normalize(data=result_json['hourly'], record_path='weather',
                                 meta=['dt','feels_like'])

Then we will get only the necessary data.
Only necessary data
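
To make it clearer what `record_path` and `meta` do, here is a rough plain-Python equivalent: one output row per entry in each record's `weather` list, with the chosen top-level fields copied alongside. This only illustrates the idea and is not how pandas implements it; `flatten_hourly` is a hypothetical name.

```python
def flatten_hourly(hourly, record_path="weather", meta=("dt", "feels_like")):
    """Emit one flat dict per nested record, carrying the meta fields along."""
    rows = []
    for record in hourly:
        for nested in record[record_path]:
            row = dict(nested)              # id, main, description, icon
            for field in meta:
                row[field] = record[field]  # copied from the parent record
            rows.append(row)
    return rows


hourly = [
    {"dt": 1613606400, "temp": 9.62, "feels_like": 4.5,
     "weather": [{"id": 500, "main": "Rain",
                  "description": "light rain", "icon": "10n"}]},
]
rows = flatten_hourly(hourly)
print(rows[0])
```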

We can use these data as they are, but I chose to remove some of the columns as they are redundant.

weather_data = weather_data.drop(columns=['main', 'description', 'icon'])

Now we get the minimum required data in the data frame.
Cleaned data

We would also like the dt column in an easy-to-read format, so we will change that too.

weather_data['dt'] = weather_data['dt'].apply(lambda x: datetime.fromtimestamp(x))
# we will also assign date as a part of file name later on.
date = weather_data['dt'][0].strftime("%m-%d-%Y")

Result:
timestamp into datetime

We could also change the format to "%m/%d/%Y %H:%M:%S" if you'd like.

weather_data['dt'] = weather_data['dt'].apply(lambda x: x.strftime("%m/%d/%Y %H:%M:%S"))

Result:
date format changed
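
Note that `datetime.fromtimestamp(x)` converts to the machine's local time zone, so the result can differ between a laptop and Lambda. Here is a sketch that pins the conversion to UTC (an assumption; pick whichever zone suits your data), using the first two `dt` values from the sample response. `format_dt` is a hypothetical helper.

```python
from datetime import datetime, timezone


def format_dt(unix_seconds, fmt="%m/%d/%Y %H:%M:%S"):
    """Format a Unix timestamp as a UTC date-time string."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).strftime(fmt)


print(format_dt(1613606400))              # 02/18/2021 00:00:00
print(format_dt(1613610000))              # 02/18/2021 01:00:00
print(format_dt(1613606400, "%m-%d-%Y"))  # 02-18-2021
```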

In this tutorial, we don't want the data from 0:00 to 5:00, 21:00 to 23:00, so we'll get rid of them as well.

weather_data = weather_data.drop(weather_data.index[21:])
weather_data = weather_data.drop(weather_data.index[:6])

Result:
hourly weather data from 6am to 8pm
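
Dropping rows by position assumes the response always holds exactly 24 rows starting at midnight. A possibly more robust variant filters on the hour itself. The sketch below works on plain dictionaries rather than a data frame and assumes UTC hours; `keep_daytime` is a hypothetical helper.

```python
from datetime import datetime, timezone


def keep_daytime(rows, first_hour=6, last_hour=20):
    """Keep rows whose 'dt' (Unix seconds) falls within [first_hour, last_hour] UTC."""
    kept = []
    for row in rows:
        hour = datetime.fromtimestamp(row["dt"], tz=timezone.utc).hour
        if first_hour <= hour <= last_hour:
            kept.append(row)
    return kept


# 24 hourly rows starting at 2021-02-18 00:00:00 UTC
rows = [{"dt": 1613606400 + 3600 * h, "feels_like": 0.0} for h in range(24)]
daytime = keep_daytime(rows)
print(len(daytime))  # 15 rows: 06:00 through 20:00
```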

Step3: Write a file on the S3 bucket

Now we will write the data into a file and upload it on the S3 bucket.

# Convert the data frame into CSV
csv_data = weather_data.to_csv(index=False)

s3 = boto3.resource('s3')
bucket = s3.Bucket('your_bucket_name_here')
key = '{}.csv'.format(date)

with open("/tmp/{}.csv".format(date), 'w') as f:
    csv_writer = csv.writer(f, delimiter=",")
    csv_reader = csv.reader(csv_data.splitlines())
    for row in csv_reader:
        # each row looks like this..
        # ['id', 'dt', 'feels_like']
        # ['801', '02/18/2021 06:00:00', '-2.49']
        # ['801', '02/18/2021 07:00:00', '-1.84']....
        # write each row on f using csv_writer
        csv_writer.writerow(row)
bucket.upload_file("/tmp/{}.csv".format(date), key)
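
Since `to_csv` already returns the finished CSV text, the round trip through `csv.reader` and `csv.writer` is optional. As an alternative sketch, the text could be built in memory and handed straight to S3. The commented-out `put_object` call is an assumption about how you might wire it up and is not run here; `rows_to_csv` is a hypothetical helper.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header and rows to CSV text without touching the filesystem."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(
    ["id", "dt", "feels_like"],
    [[801, "02/18/2021 06:00:00", -2.49],
     [801, "02/18/2021 07:00:00", -1.84]],
)
print(csv_text)

# Assumption: with a boto3 Bucket resource, the text could be uploaded directly:
# bucket.put_object(Key="02-18-2021.csv", Body=csv_text.encode("utf-8"))
```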

5. Install libraries in the same directory

The Lambda runtime does not include many third-party libraries, so we need to place them in the same directory as lambda_function.py.

For our Lambda function, we need to have NumPy, Pandas and Requests installed.

I found the guides listed under Resources at the end of this post extremely helpful, so please have a look if you'd like step-by-step instructions.

After installing all the libraries, we need to compress all the files. I have mine as archive.zip, but the name doesn't really matter.

my files are ready to go
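
The compression step can be scripted too. This is a sketch using only the standard library. It assumes the folder holds lambda_function.py and the installed packages at its top level, and it stores paths relative to that folder so they sit at the root of the zip, which is where Lambda looks for the handler module. `zip_directory` is a hypothetical helper.

```python
import os
import zipfile


def zip_directory(source_dir, archive_path):
    """Zip everything under source_dir, storing paths relative to source_dir."""
    archive_abs = os.path.abspath(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                if os.path.abspath(full_path) == archive_abs:
                    continue  # don't add the archive to itself
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return archive_path


# Example (folder name is a placeholder):
# zip_directory("my_lambda_folder", "archive.zip")
```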


6. Upload the zip file on the Lambda console

In your Lambda console, you'll find the Upload a zip file button inside the Actions dropdown. Upload your zip file from there.

When it's done, we can run a test with the Test button at the top right of the page. You'll need to configure a test event, but there isn't much to do here, so just name the test and hit Create. Then hit Test again.

Test Run successful

Sweet! It says the test has run successfully.
Let's see if the CSV file is correctly saved in the S3 bucket.

CSV file is saved on S3 bucket

CSV file that are uploaded from lambda function

It seems the file is uploaded correctly.

7. Set up CloudWatch

Setting up a CloudWatch rule for our Lambda function lets it run automatically on a schedule.

Let's open the CloudWatch console. Click the Create rule button.
CloudWatch console

In the Event Source section, select Schedule and set our desired interval. I'll set it to run once a day.
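
Behind the console form, a fixed-rate schedule is stored as an expression such as `rate(1 day)`. One detail that is easy to trip over is that the unit is singular when the value is 1 and plural otherwise. Here is a small sketch with a hypothetical `rate_expression` helper that builds such strings:

```python
def rate_expression(value, unit):
    """Build a schedule expression like 'rate(1 day)' or 'rate(12 hours)'."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour' or 'day'")
    if not isinstance(value, int) or value < 1:
        raise ValueError("value must be a positive integer")
    suffix = "" if value == 1 else "s"
    return "rate({} {}{})".format(value, unit, suffix)


print(rate_expression(1, "day"))    # rate(1 day)
print(rate_expression(12, "hour"))  # rate(12 hours)
```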

In the Target section, select Lambda function and choose our function name from the list. Hit the Configure details button.

Name your CloudWatch rule on the next page, and hit the Create rule button.

CloudWatch successfully set

Perfect!

Check that your function ran as soon as you created the CloudWatch rule, and that it continues to run at the expected interval.

Complete code in the lambda_function.py

import os
import sys
import requests
import json
import pandas as pd
from datetime import datetime, date, timedelta, timezone
import boto3
import csv


def lambda_handler(event, context):
    api_key = 'your_openweathermap_api_key_here'
    url = 'https://api.openweathermap.org/data/2.5/onecall/timemachine'
    yesterday = datetime.now() - timedelta(days=1)
    timestamp = round(datetime.timestamp(yesterday))
    params = {
        'lat': '53.349805',
        'lon': '-6.26031',
        'units': 'metric',
        'dt': timestamp,
        'appid': api_key
    }
    # Fetch hourly weather data in Dublin from OpenWeatherMap API
    input_file = requests.get(url=url, params=params)
    result_json = input_file.json()
    # Flatten and clean hourly weather data
    weather_data = pd.json_normalize(data=result_json['hourly'], record_path='weather',
                                    meta=['dt', 'temp', 'feels_like', 'clouds'])
    weather_data = weather_data.drop(columns=['main', 'description', 'icon', 'temp', 'clouds'])
    weather_data['dt'] = weather_data['dt'].apply(lambda x: datetime.fromtimestamp(x))
    date = weather_data['dt'][0].strftime("%m-%d-%Y")
    weather_data['dt'] = weather_data['dt'].apply(lambda x: x.strftime("%m/%d/%Y %H:%M:%S"))
    weather_data = weather_data.drop(weather_data.index[21:])
    weather_data = weather_data.drop(weather_data.index[:6])
    csv_data = weather_data.to_csv(index=False)

    #call your s3 bucket
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('your_bucket_name_here')
    key = '{}.csv'.format(date)

    with open("/tmp/{}.csv".format(date), 'w') as f:
        csv_writer = csv.writer(f, delimiter=",")
        csv_reader = csv.reader(csv_data.splitlines())
        # Iterate over each row in the csv using reader object
        for row in csv_reader:
            # row variable is a list that represents a row in csv
            csv_writer.writerow(row)
    #upload the data into s3
    bucket.upload_file("/tmp/{}.csv".format(date), key)

Thanks for reading!

If you have any ideas to improve the function, please share them in the comments! I would truly appreciate it 😊 In the meantime, follow me on LinkedIn @Maiko Miyazaki

Resources

AWS Lambda with Pandas and NumPy by Ruslan Korniichuk
AWS Lambda with Pandas and NumPy|Pandas & AWS Lambda|Pandas Lambda with Python3 by BidDataOnlineSchool

Top comments (1)

tj_2023

Thank you Maiko! Extremely well done example!

Do you have this solution done with TerraForm by any chance?