
Timothy Spann.   🇺🇦

Originally published at datainmotion.dev


Ingesting All The Weather Data With Apache NiFi




Step By Step NiFi Flow

  1. GenerateFlowFile - trigger the flow on a schedule that matches when NOAA updates the weather data
  2. InvokeHTTP - download the ZIP of all weather observations
  3. CompressContent - decompress the ZIP
  4. UnpackContent - extract the files from the ZIP
  5. *RouteOnAttribute - keep only the airports (${filename:startsWith('K')}). Optional.
  6. *QueryRecord - XMLReader to JsonRecordSetWriter. Query: SELECT * FROM FLOWFILE WHERE NOT location LIKE '%Unknown%'. This removes locations that are not identified. Optional.
  7. Send it somewhere for storage. You could use PutKudu, PutORC, PutHDFS, PutHiveStreaming, PutHBaseRecord, PutDatabaseRecord, PublishKafkaRecord_2_* or others.
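
For readers who want to see what steps 3 through 6 do to the data outside of NiFi, here is a rough Python sketch. It is not part of the NiFi flow; the function names (unpack_zip, is_airport, xml_to_record, known_location, process) are made up for illustration, and all values stay strings here, whereas a NiFi record writer may emit typed numbers.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET


def unpack_zip(zip_bytes):
    """Yield (filename, content) for every file in the archive (steps 3-4)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            yield posixpath.basename(name), archive.read(name)


def is_airport(filename):
    """Same test as ${filename:startsWith('K')} in step 5."""
    return filename.startswith("K")


def xml_to_record(xml_bytes):
    """Turn one observation XML document into a dict (values stay strings).

    Elements with children, such as <image>, become nested dicts.
    """
    def convert(element):
        children = list(element)
        if not children:
            return (element.text or "").strip()
        return {child.tag: convert(child) for child in children}

    root = ET.fromstring(xml_bytes)
    return {child.tag: convert(child) for child in root}


def known_location(record):
    """Same idea as WHERE NOT location LIKE '%Unknown%' in step 6.

    A missing location is dropped too, as a SQL NULL would be.
    """
    location = record.get("location")
    return isinstance(location, str) and "Unknown" not in location


def process(zip_bytes):
    """Run steps 3-6 over the bytes of the downloaded ZIP."""
    records = []
    for filename, content in unpack_zip(zip_bytes):
        if not is_airport(filename):
            continue
        record = xml_to_record(content)
        if known_location(record):
            records.append(record)
    return records
```

Calling process() on the bytes of the downloaded archive returns a list of dicts, one per airport station with an identified location.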

URL For All US Data

invokehttp.request.url

https://w1.weather.gov/xml/current_obs/all_xml.zip
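
If you want to check the download outside of NiFi first, a minimal Python sketch could look like the following. The function name fetch_zip and its opener parameter are assumptions made for illustration (the opener is only there so the call can be swapped out in tests); in the flow itself InvokeHTTP does this job.

```python
import io
import urllib.request
import zipfile

ALL_OBS_URL = "https://w1.weather.gov/xml/current_obs/all_xml.zip"


def fetch_zip(url=ALL_OBS_URL, opener=urllib.request.urlopen, timeout=60):
    """Download the archive and return its raw bytes.

    Raises ValueError if the response is not a ZIP file, for example
    when an HTML error page comes back instead.
    """
    with opener(url, timeout=timeout) as response:
        data = response.read()
    if not zipfile.is_zipfile(io.BytesIO(data)):
        raise ValueError("response from %s is not a ZIP archive" % url)
    return data
```

The returned bytes can be written to disk or opened directly with zipfile.ZipFile(io.BytesIO(data)).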

Example Record As Converted JSON

[ {
  "credit" : "NOAA's National Weather Service",
  "credit_URL" : "http://weather.gov/",
  "image" : {
    "url" : "http://weather.gov/images/xml_logo.gif",
    "title" : "NOAA's National Weather Service",
    "link" : "http://weather.gov"
  },
  "suggested_pickup" : "15 minutes after the hour",
  "suggested_pickup_period" : 60,
  "location" : "Stanley Municipal Airport, ND",
  "station_id" : "K08D",
  "latitude" : 48.3008,
  "longitude" : -102.4064,
  "observation_time" : "Last Updated on Jul 10 2020, 9:55 am CDT",
  "observation_time_rfc822" : "Fri, 10 Jul 2020 09:55:00 -0500",
  "weather" : "Fair",
  "temperature_string" : "66.0 F (19.0 C)",
  "temp_f" : 66.0,
  "temp_c" : 19.0,
  "relative_humidity" : 83,
  "wind_string" : "South at 6.9 MPH (6 KT)",
  "wind_dir" : "South",
  "wind_degrees" : 180,
  "wind_mph" : 6.9,
  "wind_kt" : 6,
  "pressure_in" : 30.03,
  "dewpoint_string" : "60.8 F (16.0 C)",
  "dewpoint_f" : 60.8,
  "dewpoint_c" : 16.0,
  "visibility_mi" : 10.0,
  "icon_url_base" : "http://forecast.weather.gov/images/wtf/small/",
  "two_day_history_url" : "http://www.weather.gov/data/obhistory/K08D.html",
  "icon_url_name" : "skc.png",
  "ob_url" : "http://www.weather.gov/data/METAR/K08D.1.txt",
  "disclaimer_url" : "http://weather.gov/disclaimer.html",
  "copyright_url" : "http://weather.gov/disclaimer.html",
  "privacy_policy_url" : "http://weather.gov/notice.html"
} ]
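
The record carries its own timing hints: observation_time_rfc822 is a standard RFC 822 timestamp, and suggested_pickup says "15 minutes after the hour". As a small sketch of how a consumer might use them, the Python below parses the timestamp and works out the next pickup time. The function names are hypothetical, and the hourly period is an assumption taken from suggested_pickup_period being 60 in this sample.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime


def observation_time(record):
    """Parse observation_time_rfc822 into a timezone-aware datetime."""
    return parsedate_to_datetime(record["observation_time_rfc822"])


def next_pickup(now, minute=15):
    """Return the first time strictly after `now` at `minute` past the hour.

    Assumes an hourly pickup period, as in the sample record.
    """
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate


record = {"observation_time_rfc822": "Fri, 10 Jul 2020 09:55:00 -0500"}
observed = observation_time(record)
print(observed.isoformat())               # 2020-07-10T09:55:00-05:00
print(next_pickup(observed).isoformat())  # 2020-07-10T10:15:00-05:00
```

A schedule built this way lines up with step 1 of the flow, where GenerateFlowFile triggers the download.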

Source Code

https://github.com/tspannhw/ClouderaFlowManagementWorkshop/tree/main/flows

Resources


Top comments (1)

Thiago Albertins

Hey Timothy, nice post! I've been trying to follow along but got an error with the StandardSSLContextService used by the Get Weather All processor. I understand there was a file in the Truststore Filename field, but I already searched for it on the weather.org website and can't find it. Could you help me set it up, please? Thanks in advance.
