loading...

Ingesting All The Weather Data With Apache NiFi

tspannhw profile image Timothy Spann Originally published at datainmotion.dev on ・3 min read



Ingesting All The Weather Data With Apache NiFi

Step By Step NiFi Flow

  1. GenerateFlowFile - build a schedule matching when NOAA updates weather
  2. InvokeHTTP - download all weather ZIP
  3. CompressContent - decompress ZIP
  4. UnpackContent - extract files from ZIP
  5. *RouteOnAttribute - just give us ones that are airports (${filename:startsWith('K')}). optional.
  6. *QueryRecord - XMLReader to JsonRecordSetWriter. Query : SELECT * FROM FLOWFILE WHERE NOT location LIKE '%Unknown%'. This is to remove some locations that are not identified. optional.
  7. Send it somewhere for storage. Could put PutKudu, PutORC, PutHDFS, PutHiveStreaming, PutHbaseRecord, PutDatabaseRecord, PublishKafkaRecord2* or others.

URL For All US Data

invokehttp.request.url

https://w1.weather.gov/xml/current\_obs/all\_xml.zip

Example Record As Converted JSON

[ {

"credit" : "NOAA's National Weather Service",

"credit_URL" : "http://weather.gov/",

"image" : {

"url" : "http://weather.gov/images/xml\_logo.gif",

"title" : "NOAA's National Weather Service",

"link" : "http://weather.gov"

},

"suggested_pickup" : "15 minutes after the hour",

"suggested_pickup_period" : 60,

"location" : "Stanley Municipal Airport, ND",

"station_id" : "K08D",

"latitude" : 48.3008,

"longitude" : -102.4064,

"observation_time" : "Last Updated on Jul 10 2020, 9:55 am CDT",

"observation_time_rfc822" : "Fri, 10 Jul 2020 09:55:00 -0500",

"weather" : "Fair",

"temperature_string" : "66.0 F (19.0 C)",

"temp_f" : 66.0,

"temp_c" : 19.0,

"relative_humidity" : 83,

"wind_string" : "South at 6.9 MPH (6 KT)",

"wind_dir" : "South",

"wind_degrees" : 180,

"wind_mph" : 6.9,

"wind_kt" : 6,

"pressure_in" : 30.03,

"dewpoint_string" : "60.8 F (16.0 C)",

"dewpoint_f" : 60.8,

"dewpoint_c" : 16.0,

"visibility_mi" : 10.0,

"icon_url_base" : "http://forecast.weather.gov/images/wtf/small/",

"two_day_history_url" : "http://www.weather.gov/data/obhistory/K08D.html",

"icon_url_name" : "skc.png",

"ob_url" : "http://www.weather.gov/data/METAR/K08D.1.txt",

"disclaimer_url" : "http://weather.gov/disclaimer.html",

"copyright_url" : "http://weather.gov/disclaimer.html",

"privacy_policy_url" : "http://weather.gov/notice.html"

} ]

Source Code

https://github.com/tspannhw/ClouderaFlowManagementWorkshop/tree/main/flows

Resources

Posted on by:

tspannhw profile

Timothy Spann

@tspannhw

I am a Principal Field Engineer for Data in Motion at Cloudera. I work with Apache NiFi, Apache Kafka, Apache Spark, Apache Flink, IoT, MXNet, DLJ.AI, Deep Learning, Machine Learning, Streaming...

Discussion

markdown guide