DEV Community

Dmitry Romanoff
Dmitry Romanoff

Posted on

Parsing YAML file with Python

In this blog I will demonstrate how to parse YAML file with Python.

Assume we have a YAML file:

cat my_yaml_file.yaml

probes:
  -
    id: 1
    description: Check the databases having more than 5 active connections.
    issue: It were found databases with the high number of active connections.
    recommendation: Check why Customers open so many active connections. It may be
      wrong configuration or unusual application pattern.
    sql_query_o: no
    sql_query: select datname, count(1) num_of_active_connections, 'wvw' chk from
      pg_stat_activity where datname!='' and state!='idle' group by datname
      having count(1)>5 order by 2 desc
    sql_query_extra: select datname, state, client_addr, client_hostname,
      substr(query, 1, 2048) query from pg_stat_activity where state!='idle' and
      datname in ( select datname from ( select datname, count(1)
      num_of_active_sessions from pg_stat_activity where state!='idle' and
      datname!='' group by 1 having count(1)>0 ) M ) order by 1, 5
  -
    id: 2
    description: Check DB queries that take more than 30 seconds.
    issue: Long-running queries.
    recommendation: Check why the query/queries take so much time. It maybe it's
      heavy non-optimized query. Maybe it's unusual application pattern.
    sql_query_o: no
    sql_query: select now()-query_start as runtime, pid as process_id, datname as
      db_name, client_addr, client_hostname, substr(query, 1, 2048) query, 'wvw'
      chk from pg_stat_activity where state!='idle' and datname!='' and now() -
      query_start > '30 seconds'::interval order by 1 desc;
    sql_query_extra: no
  - 
    id: 3
    description: Check in the pg_stat_statements DB queries that take more than 3000 ms
    issue: Long-running queries.
    recommendation: Check why the query/queries take so much time. It may be it is a
      heavy non-optimized query. Maybe it's an unusual application pattern.
    sql_query_o: SELECT pss.userid, pss.dbid, pd.datname as db_name,
      round(pss.total_time::numeric, 2) as total_time, pss.calls,
      round(pss.mean_time::numeric, 0) as mean, substr(pss.query, 1, 2048)
      query, 'wvw' chk FROM pg_stat_statements pss, pg_database pd WHERE
      pd.oid=pss.dbid and round(pss.mean_time::numeric, 0) > 3000 ORDER BY
      round(pss.mean_time::numeric, 0) desc LIMIT 30;
    sql_query: SELECT pss.userid, pss.dbid, pd.datname as db_name,
      round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as
      total_time, pss.calls,
      round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) as mean,
      substr(pss.query, 1, 2048) query, 'wvw' chk FROM pg_stat_statements pss,
      pg_database pd WHERE pd.oid=pss.dbid and
      round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) > 3000 ORDER BY
      round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) desc LIMIT 30;
    sql_query_extra: no
Enter fullscreen mode Exit fullscreen mode

To parse it and populate an array with its data I will use the following Python code:

ptn_yaml_parser.py

import yaml

with open("/Users/dmitryr/LAB/YAML_parser/my_yaml_file.yaml", 'r') as f:
    valuesYaml = yaml.load(f, Loader=yaml.FullLoader)

n_elements=len(valuesYaml['probes'])
print('n_elements:', n_elements)

for idx in range(0, n_elements):
  print(" ",idx,":")
  print("issue:",valuesYaml['probes'][idx]['issue'])
  print("description:",valuesYaml['probes'][idx]['description'])
  print("recommendation:",valuesYaml['probes'][idx]['recommendation'])
  print("sql_query:",valuesYaml['probes'][idx]['sql_query'])
  print("sql_query_o:",valuesYaml['probes'][idx]['sql_query_o'])
  print("sql_query_extra:",valuesYaml['probes'][idx]['sql_query_extra'])

Enter fullscreen mode Exit fullscreen mode

Example how it works:

n_elements: 3
  0 :
issue: It were found databases with the high number of active connections.
description: Check the databases having more than 5 active connections.
recommendation: Check why Customers open so many active connections. It may be wrong configuration or unusual application pattern.
sql_query: select datname, count(1) num_of_active_connections, 'wvw' chk from pg_stat_activity where datname!='' and state!='idle' group by datname having count(1)>5 order by 2 desc
sql_query_o: False
sql_query_extra: select datname, state, client_addr, client_hostname, substr(query, 1, 2048) query from pg_stat_activity where state!='idle' and datname in ( select datname from ( select datname, count(1) num_of_active_sessions from pg_stat_activity where state!='idle' and datname!='' group by 1 having count(1)>0 ) M ) order by 1, 5
  1 :
issue: Long-running queries.
description: Check DB queries that take more than 30 seconds.
recommendation: Check why the query/queries take so much time. It maybe it's heavy non-optimized query. Maybe it's unusual application pattern.
sql_query: select now()-query_start as runtime, pid as process_id, datname as db_name, client_addr, client_hostname, substr(query, 1, 2048) query, 'wvw' chk from pg_stat_activity where state!='idle' and datname!='' and now() - query_start > '30 seconds'::interval order by 1 desc;
sql_query_o: False
sql_query_extra: False
  2 :
issue: Long-running queries.
description: Check in the pg_stat_statements DB queries that take more than 3000 ms
recommendation: Check why the query/queries take so much time. It may be it is a heavy non-optimized query. Maybe it's an unusual application pattern.
sql_query: SELECT pss.userid, pss.dbid, pd.datname as db_name, round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as total_time, pss.calls, round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) as mean, substr(pss.query, 1, 2048) query, 'wvw' chk FROM pg_stat_statements pss, pg_database pd WHERE pd.oid=pss.dbid and round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) > 3000 ORDER BY round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) desc LIMIT 30;
sql_query_o: SELECT pss.userid, pss.dbid, pd.datname as db_name, round(pss.total_time::numeric, 2) as total_time, pss.calls, round(pss.mean_time::numeric, 0) as mean, substr(pss.query, 1, 2048) query, 'wvw' chk FROM pg_stat_statements pss, pg_database pd WHERE pd.oid=pss.dbid and round(pss.mean_time::numeric, 0) > 3000 ORDER BY round(pss.mean_time::numeric, 0) desc LIMIT 30;
sql_query_extra: False
Enter fullscreen mode Exit fullscreen mode

Conclusion:

In this blog I demonstrated how to parse YAML file using Python and populate the data into array.

Top comments (0)