Chunkify a Huge List into N Smaller Equal-Sized Lists
To backfill data for one of our machine learning pipelines, I had to divide a list of dates into n smaller lists of near-equal length and distribute them across an n-node GPU cluster.
from datetime import timedelta, date

# Build the list of candidate dates, stepping three days at a time.
start_dt = date(2023, 1, 1)
end_dt = date(2023, 12, 31)
cdays = []
while start_dt < end_dt:
    cdays.append(start_dt)
    start_dt += timedelta(days=3)
#print(cdays)
def split(a, n):
    # k = base chunk size, m = number of leading chunks that get one extra item
    k, m = divmod(len(a), n)
    return (a[i*k + min(i, m):(i+1)*k + min(i+1, m)] for i in range(n))
split_list = list(split(cdays, 15))
print(split_list[0])
#Output (the first chunk; with 122 dates split 15 ways, the first two chunks get 9 dates and the rest get 8)
[datetime.date(2023, 1, 1), datetime.date(2023, 1, 4), datetime.date(2023, 1, 7), datetime.date(2023, 1, 10), datetime.date(2023, 1, 13), datetime.date(2023, 1, 16), datetime.date(2023, 1, 19), datetime.date(2023, 1, 22), datetime.date(2023, 1, 25)]
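With the chunks in hand, each one can be handed off to its own worker. Below is a minimal sketch of that fan-out using concurrent.futures; backfill_chunk is a hypothetical stand-in for the real per-GPU backfill job, and a thread pool stands in for actual cluster dispatch.

#dispatch.py (sketch)
from concurrent.futures import ThreadPoolExecutor

def backfill_chunk(dates):
    # Hypothetical placeholder: the real pipeline would run the
    # backfill job for these dates on one GPU node.
    return f"processed {len(dates)} dates: {dates[0]} .. {dates[-1]}"

# One task per chunk; 15 matches the number of chunks created above.
with ThreadPoolExecutor(max_workers=15) as pool:
    for result in pool.map(backfill_chunk, split_list):
        print(result)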
__all__ in Python
__all__ is a list of the names (variables, functions, or modules) that a module exposes to wildcard imports (from module import *). It really comes in handy when your base module contains a large number of functions and you only want to export a few of them. For example, let's say you have:
#foo.py
waz = 3
bar = 9

def baz():
    return 'baz'

# Only bar and baz are exported to wildcard imports.
__all__ = ['bar', 'baz']
Now bar.py does a wildcard import from foo.py:
#bar.py
from foo import *

print(bar)    # 9
print(baz())  # 'baz'
print(waz)    # Raises NameError, as waz is not exported by the module.
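Note that __all__ only restricts wildcard imports; an explicit import still reaches a non-exported name:

from foo import waz  # works: __all__ does not apply to explicit imports
print(waz)  # 3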
Conclusion
I hope this was easy to follow and clear to understand.
Some Final Words
If this blog was helpful and you wish to show a little support, you could:
- 👍 300 times for this story
- Follow me on LinkedIn: https://www.linkedin.com/in/raju-n-203b2115/
These actions really really really help me out, and are much appreciated!