DEV Community: just a martian

Suggested databases for logging client app data

just a martian — Sun, 26 Sep 2021 04:16:43 +0000

When it comes to logging client-side events at scale, what databases come to mind? I was reading through some articles and saw that most big tech companies went the MySQL route, experimented with alternatives such as Cassandra, and ended up staying on MySQL. Is MySQL still considered the best choice?

Tracking guest users

just a martian — Wed, 01 Sep 2021 16:17:08 +0000

How does an app like TikTok track a device that has not signed up yet but might sign up later, and is therefore initially missing an accountId in the data warehouse? My dim_users table can only insert rows with a null accountId attached to that device's machineId. I can still track the activity of this device, but I won't know who the user is at first.
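One common way to frame this is identity stitching: events are always keyed by machineId, and a separate mapping from machineId to accountId is filled in when the device signs up, so earlier guest activity can be attributed retroactively at query time. The sketch below is a minimal illustration of that idea using an in-memory SQLite database. The table names `device_account_map` and `fact_events`, and all function names, are hypothetical and not taken from the post; a real warehouse would also need to handle one account on several devices and several accounts on one device.

```python
import sqlite3


def build_schema(conn):
    # Hypothetical tables: a device-to-account map and an event fact table.
    conn.executescript(
        """
        CREATE TABLE device_account_map (
            machine_id TEXT PRIMARY KEY,
            account_id TEXT            -- NULL until the device signs up
        );
        CREATE TABLE fact_events (
            machine_id TEXT NOT NULL,
            event_name TEXT NOT NULL
        );
        """
    )


def log_event(conn, machine_id, event_name):
    # Register the device as a guest the first time it is seen.
    conn.execute(
        "INSERT OR IGNORE INTO device_account_map (machine_id, account_id) "
        "VALUES (?, NULL)",
        (machine_id,),
    )
    conn.execute(
        "INSERT INTO fact_events (machine_id, event_name) VALUES (?, ?)",
        (machine_id, event_name),
    )


def link_account(conn, machine_id, account_id):
    # Called at signup: backfill the account on the existing device row.
    conn.execute(
        "INSERT OR IGNORE INTO device_account_map (machine_id, account_id) "
        "VALUES (?, NULL)",
        (machine_id,),
    )
    conn.execute(
        "UPDATE device_account_map SET account_id = ? WHERE machine_id = ?",
        (account_id, machine_id),
    )


def events_for_account(conn, account_id):
    # Guest events logged before signup are attributed through the map.
    rows = conn.execute(
        "SELECT e.event_name FROM fact_events e "
        "JOIN device_account_map m ON m.machine_id = e.machine_id "
        "WHERE m.account_id = ? ORDER BY e.rowid",
        (account_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    log_event(conn, "device-1", "app_open")    # guest activity
    log_event(conn, "device-1", "video_view")  # guest activity
    link_account(conn, "device-1", "acct-42")  # device signs up
    log_event(conn, "device-1", "like")
    print(events_for_account(conn, "acct-42"))
```

Because the events never store accountId directly, nothing has to be rewritten in the fact table at signup; only the single mapping row changes.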

How do you merge millions of small files in an S3 bucket into larger files in a separate bucket daily?

just a martian — Fri, 27 Aug 2021 20:49:47 +0000

We have a situation where we're currently uploading events to S3 in real time. The result is roughly 30 million tiny JSON files (<1 KB each) per day. These files sit in a raw layer bucket with the following folder format "#{folder_name}/#{year}/#{month}/#{day}/#{hour}/#{minute}/#{System.os_time()}-#{file_name}.#{file_ext}". We want to send this data to a data warehouse for analytics, but the files need to be much larger (150-200 MB). What solutions are there for merging JSON files from one S3 bucket into a separate S3 bucket? I tried writing a Lambda function to tackle this, but it was not enough: all the files had to be downloaded to /tmp, and the Lambda ran out of memory. Please help :)
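Whatever service ends up running the job, the merge step itself can be written so that it never holds more than one output file's worth of data at a time. The sketch below shows only that step, as a rough illustration: it takes any iterable of small JSON payloads and yields newline-delimited JSON batches of roughly a target size. The function name `merge_into_batches` and its parameters are hypothetical. Listing and reading the source objects and uploading each batch to the destination bucket are deliberately left out, and the worker would still need more memory than one batch.

```python
import json
from typing import Iterable, Iterator


def merge_into_batches(payloads: Iterable[bytes], target_bytes: int) -> Iterator[bytes]:
    """Yield newline-delimited JSON blobs of roughly target_bytes each.

    Only the batch currently being built is kept in memory, so the input
    can be an arbitrarily long stream of small files.
    """
    buffer = []
    size = 0
    for raw in payloads:
        # Re-serialize compactly so each event occupies exactly one line.
        line = json.dumps(json.loads(raw), separators=(",", ":")).encode("utf-8") + b"\n"
        # Flush before the batch would exceed the target size.
        if buffer and size + len(line) > target_bytes:
            yield b"".join(buffer)
            buffer, size = [], 0
        buffer.append(line)
        size += len(line)
    # Emit the final, possibly smaller, batch.
    if buffer:
        yield b"".join(buffer)


if __name__ == "__main__":
    # Stand-in for the contents of the small files under one hour prefix.
    small_files = [b'{"id": %d}' % i for i in range(5)]
    for number, batch in enumerate(merge_into_batches(small_files, target_bytes=20)):
        print(number, len(batch))
```

In practice `target_bytes` would be set to something like 150 * 1024 * 1024, and the job could be run once per day or per hour prefix, since the folder format already partitions the files by time.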