pacdev


Beans !

Using pandas.groupby and pandas.cut
Suppose we have a DataFrame holding a time series that records the water level elevation and the swell direction at each time step.

#+BEGIN_SRC python
import pandas as pd
import numpy as np
import random as rdn

# build fake dataframe
df = pd.DataFrame()
df['dir'] = [rdn.randint(0, 360) for _ in range(100)]
df['elevation'] = [rdn.uniform(-3, 3) for _ in range(100)]
print("df:\n", df)
#+END_SRC

#+RESULTS:
df:
     dir  elevation
0    48   2.437835
1     4   1.489722
2   130  -0.576090
3   169   2.721867
4   315  -1.120620
..  ...        ...
95  247  -1.256091
96  301   2.014663
97  175   0.901548
98  289   2.903849
99   98  -0.712153

[100 rows x 2 columns]

This can be tedious to read. We may prefer to read the information per directional sector, so we bin the directions.

#+BEGIN_SRC python
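# create 30-degree bins: (0, 30], (30, 60], ..., (330, 360]
# note: the bins are open on the left, so dir == 0 gets no bin (NaN) and is dropped by groupby
bins = pd.cut(df['dir'], np.arange(0, 390, 30))

# summary statistics of elevation for each direction bin
df_grp = df.groupby(bins, observed=False)['elevation'].describe()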
print("df_grp:\n", df_grp)
#+END_SRC

#+RESULTS:
df_grp:
             count      mean       std  ...       50%       75%       max
dir                                    ...                              
(0, 30]       5.0  0.249184  1.670480  ...  0.605261  1.489722  1.771682
(30, 60]      6.0  0.530561  2.120228  ...  1.271229  2.242043  2.437835
(60, 90]     14.0  0.280099  1.937723  ...  1.548581  1.775393  2.223731
(90, 120]     9.0 -0.894609  1.444726  ... -0.712153 -0.239352  1.684734
(120, 150]    5.0 -0.542193  0.655393  ... -0.576090 -0.259147  0.374463
(150, 180]   11.0 -0.000794  1.906180  ... -0.093552  1.132952  2.759483
(180, 210]    5.0 -0.104412  1.175992  ... -0.471943  1.037076  1.221931
(210, 240]   11.0  0.447931  1.605676  ...  0.832420  1.552965  2.335645
(240, 270]    4.0 -0.556329  1.708697  ... -1.300140 -0.443048  1.996078
(270, 300]   12.0 -0.406191  1.708407  ... -0.800965 -0.041385  2.903849
(300, 330]   11.0  0.168334  1.585117  ...  0.135546  1.177059  2.553054
(330, 360]    7.0  0.411437  1.725589  ...  0.466246  1.667764  2.716199

[12 rows x 8 columns]
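
By default `pd.cut` builds right-closed intervals, so with edges starting at 0 a direction of exactly 0 falls outside `(0, 30]` and gets no bin. One possible way around this, sketched here on a small hand-made series (illustrative values, not the data above), is to wrap 360 onto 0 and use left-closed bins via `right=False`. The variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Illustrative directions, including the two edge cases 0 and 360
dirs = pd.Series([0, 15, 30, 359, 360], name="dir")
edges = np.arange(0, 390, 30)

# Default right-closed bins: 0 is outside (0, 30] and becomes NaN
default_bins = pd.cut(dirs, edges)

# Wrap 360 onto 0, then use left-closed bins [0, 30), ..., [330, 360)
wrapped = dirs % 360
sector_bins = pd.cut(wrapped, edges, right=False)

print(pd.DataFrame({"dir": dirs, "default": default_bins, "left_closed": sector_bins}))
```

With left-closed bins every direction in 0..360 lands in exactly one sector, at the cost of treating 360 as the same direction as 0.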


To center a sector on 0, first separate the rows whose direction lies within half a bin of 0:

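#+BEGIN_SRC python
bin_size = 30  # sector width in degrees, assumed to match the 30-degree step used above
# rows in the sector centered on 0, i.e. dir in (345, 360] or [0, 15)
condition = (df["dir"] > 360 - bin_size/2) | (df["dir"] < bin_size/2)
df1 = df[condition]   # sector centered on 0
df2 = df[~condition]  # all other directions
#+END_SRC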
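
The split above isolates the rows around 0 but stops short of grouping them. As a hedged sketch of one way to finish the job (not necessarily the author's approach), the directions can be shifted by half a bin and wrapped modulo 360, so that the sector centered on 0 becomes an ordinary interval. The data, the `bin_size` of 30 and the names `shifted`, `centers` and `sector` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

bin_size = 30  # sector width in degrees (assumed)

# Small illustrative dataset
df = pd.DataFrame({
    "dir": [350, 5, 14, 15, 44, 100, 345, 360],
    "elevation": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Shift by half a bin and wrap: the sector centered on 0 maps to [0, bin_size)
shifted = (df["dir"] + bin_size / 2) % 360

edges = np.arange(0, 360 + bin_size, bin_size)
centers = np.arange(0, 360, bin_size)  # label each sector by its center

# Left-closed bins so that a shifted value of 0 is included
sector = pd.cut(shifted, edges, right=False, labels=centers).rename("sector")

# Mean elevation per centered sector (only sectors that contain data)
mean_by_sector = df.groupby(sector, observed=True)["elevation"].mean()
print(mean_by_sector)
```

Here the directions 345, 350, 5, 14 and 360 all fall in the sector labeled 0, while 15 and 44 fall in the sector labeled 30.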