Creating a Cross-platform Data Sandbox that makes Working with Data Easier and Faster

#datascience #dataengineering #python #sql

Hello everyone!

For the past few months I have been working on a side project: a data-centric, cross-platform desktop app that I call Varan. I would like some advice on improving it.

After observing my colleagues, classmates, and other people who work in data-science-related fields, I noticed several recurring problems and points of friction, and I wanted to fix them.

Here is the full documentation of the project: https://docs.google.com/document/d/1fx3FBMWKLdiCfUIy3WdL9qJDT8GT5KlzzzNjqSNMuq0/edit?usp=sharing

What I observed:

Environment setup: a lot of time is spent on environment setup and configuration when you start from scratch, and none of it affects your final results.
Tool dependencies: people use several tools simultaneously just to do their work (database visualizers, Python notebooks, and so on), which leads to more friction and context switching.
Server dependency: most current apps depend on servers, usually cloud services, and do not give the best performance for local analysis. For example, at my workplace developers and analysts were using the same database server for BI tools and backend service testing, which hammered the server. That load could have been avoided entirely by doing the analysis locally.
Performance of widely used tools: for heavy tasks, such as loading and analyzing millions of rows of data, the current tools do not perform well.

After considering these issues I decided to develop a side project to help my coworkers and people I know that work in related fields.

What is Varan about, and what makes it better:

Local-first data analytics: all your data is fetched to your local machine and is accessible dynamically (it currently loads into RAM, but I am implementing a better approach using DuckDB).
No context switching: Varan has everything you need (SQL console, Python notebooks, DB visualizers) in the same UI.
Optimized, fast orchestration of the tools we use: SQL, Python, and data sources are interconnected, and everything is accessible across all three seamlessly. For example, after you run a SQL query on a data source, you can use the result in SQL or Python immediately: no export/import, just the name of the result in your scripts, and you are ready to go!
Easier syntax, especially in Python: you can query CSV or Excel files too, using syntax such as 'SELECT * FROM random_data_csv' in SQL or 'df = random_data_csv.copy()' in Python, without ever exporting your data to another source first.
Unified data sources: you can join different types of data sources in SQL or Python. For example, you can join a MySQL table to a CSV table without additional work, directly in your query or script.
Simple, automatic access to your tables in Python: run 'list(_tables)' and you will see the tables from every source you are using, each ready as a data frame.
No Python installation: it comes with embedded Python and essential libraries such as NumPy, Matplotlib, pandas, and so on.
And much more.

I would like to get feedback on this project, and if you have run into other problems in this field, please share them so I can try to address those too. If you want, I can provide a beta version of the app so you can test it yourself. I appreciate any kind of feedback, and you can contact me anytime!

DEV Community
