<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Stepanian</title>
    <description>The latest articles on DEV Community by Daniel Stepanian (@dstepanian).</description>
    <link>https://dev.to/dstepanian</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3760509%2F81806f0f-9b04-47b2-b3e7-f9b97eb8120e.jpeg</url>
      <title>DEV Community: Daniel Stepanian</title>
      <link>https://dev.to/dstepanian</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dstepanian"/>
    <language>en</language>
    <item>
      <title>SaaS System Design - Graphmotivo</title>
      <dc:creator>Daniel Stepanian</dc:creator>
      <pubDate>Mon, 09 Feb 2026 11:48:45 +0000</pubDate>
      <link>https://dev.to/dstepanian/saas-system-design-graphmotivo-210k</link>
      <guid>https://dev.to/dstepanian/saas-system-design-graphmotivo-210k</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;In the last months of 2025 I was deeply impressed by graphs: graph databases, knowledge graphs, graph algorithms, Neo4j, GraphRAG. These technologies open up another valuable way of understanding reality through the lens of the relations between entities. We can visualize relations, train ML models, find patterns, make better predictions, and understand knowledge domains more deeply.&lt;br&gt;
Under this data-science-oriented influence I was also learning the DevOps side in an intense five-weekend bootcamp, which gave me the skills to set up, run, monitor, and ship production software deployments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Idea, project overview
&lt;/h2&gt;

&lt;p&gt;The idea of using knowledge graphs and embeddings to match a buyer persona’s traits to product use cases, and then mapping it all to Google/Meta ad configs to reach the right people, came to me much earlier; over a couple of months the final concept matured in my head. I started working on Graphmotivo in late November 2025 and devoted much of my free time to it throughout December and January. By coincidence, AI programming tools improved significantly in quality during that period, which allowed me to build the project in a short time: 1.5 months.&lt;br&gt;
Graphmotivo is an early-stage marketing intelligence platform built on expert AI agentic workflows and knowledge graphs. It lets marketers and business owners generate buyer personas, use-case story journeys, ad targeting inspirations, and explorable identity graphs of simulated buyer personas.&lt;/p&gt;

&lt;h2&gt;
  
  
  Development Process
&lt;/h2&gt;

&lt;p&gt;My primary tool was Cursor 2.1.49 with Composer for researching and planning the work, sometimes Auto mode for simpler tasks, and Sonnet 4.5 / Opus 4.5 for the hardest parts and for debugging, which often took 2-4 hours to find root causes and apply fixes. In total, around 500M tokens were used.&lt;/p&gt;

&lt;p&gt;The development process was iterative: I started with the design doc and architecture plan, then built every service step by step, starting with the heart of the system: a Neo4j graph database plus agentic workflows that fill it with data based on user prompts. I tested several agentic workflow frameworks: Google ADK (Agent Development Kit), which I dropped because of its constraint of at most 10 steps per workflow, and LangGraph, which I dropped because it was too complex to debug, with too much of what happens and how hidden under abstractions. So I ended up creating a custom Python workflow of agents that think up personas matching the business description and research those personas’ traits: what they do, what their problems are, which websites they visit. Another set of agents then extracts nodes and relations and translates them into a Cypher query that follows a database schema I designed during the planning phase. The Cypher queries are executed and the graphs are created. Further agents, building on the research agents’ data, generate ad targeting configs and a user journey story, generate images for the scenario, and return data from Neo4j that can be rendered as interactive web-based graph visualizations. These workflows are deployed in a separate container and controlled via an API.&lt;/p&gt;
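&lt;p&gt;To illustrate the extraction step, here is a minimal sketch of turning extracted nodes and relations into one Cypher statement. The labels, relationship types, and the helper name are hypothetical, not Graphmotivo’s actual schema:&lt;/p&gt;

```python
def to_cypher(nodes, relations):
    """Build one MERGE-based Cypher statement from extracted
    (label, name) nodes and (source, rel_type, target) triples.
    Labels and relationship types are illustrative only."""
    lines, alias = [], {}
    for i, (label, name) in enumerate(nodes):
        alias[name] = f"n{i}"
        lines.append(f'MERGE (n{i}:{label} {{name: "{name}"}})')
    for source, rel_type, target in relations:
        lines.append(f"MERGE ({alias[source]})-[:{rel_type}]-" + f">({alias[target]})")
    return "\n".join(lines)

if __name__ == "__main__":
    query = to_cypher(
        [("Persona", "Busy Founder"), ("Problem", "No time for ads")],
        [("Busy Founder", "HAS_PROBLEM", "No time for ads")],
    )
    print(query)
```

&lt;p&gt;A real pipeline should pass node properties as query parameters rather than interpolating strings, to avoid Cypher injection from LLM-generated values.&lt;/p&gt;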

&lt;p&gt;Once the scenario generation process was working in local Docker containers (achieved after 50+ failing five-minute agentic workflow runs during development), I finally had JSON files that could be loaded into the demo UI. The frontend was built in a separate container; for the backend I used Supabase, an open source Firebase alternative that makes backend setup much faster, as it ships with a Postgres database, storage, and authorization out of the box. After integrating payments with PayPal and running sandbox tests, once everything worked locally I moved to the next step: production deployment.&lt;/p&gt;

&lt;p&gt;Building the production environment followed a central plan in a .md file, preceded by a research &amp;amp; planning phase. The production steps were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-deployment preparations (env, code verifications, security hardening)&lt;/li&gt;
&lt;li&gt;VPS, DNS setups, server hardening, firewall, user permissions&lt;/li&gt;
&lt;li&gt;Infrastructure config with Docker compose&lt;/li&gt;
&lt;li&gt;Reverse proxy setup, SSL&lt;/li&gt;
&lt;li&gt;Docker Compose deployment, database migrations, RLS policies, network security hardening&lt;/li&gt;
&lt;li&gt;GCP integrations: Google OAuth authorization, plus Pub/Sub automation for a spend safety cap on Gemini API usage: detach the billing account from the project once spend crosses a threshold&lt;/li&gt;
&lt;li&gt;CI/CD with GitHub Actions&lt;/li&gt;
&lt;li&gt;Testing and debugging the layout and agentic workflows (another 50+ failed runs in this phase before it worked), testing auth, and moving payments from sandbox to production&lt;/li&gt;
&lt;li&gt;Grafana + Loki containers for observability&lt;/li&gt;
&lt;li&gt;Ansible playbooks for future reproducibility&lt;/li&gt;
&lt;/ul&gt;
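&lt;p&gt;The spend safety cap follows the pattern Google documents for budget kill switches: a Cloud Billing budget publishes notifications to a Pub/Sub topic, and a small function decides whether to detach billing. A sketch of the decision logic (field names follow the documented budget notification format; the hard-cap value and function name are my own):&lt;/p&gt;

```python
import base64
import json

def should_detach(pubsub_message, hard_cap):
    """Return True when a Cloud Billing budget notification (delivered
    via Pub/Sub, base64-encoded JSON) reports spend at or above the cap."""
    payload = json.loads(base64.b64decode(pubsub_message["data"]).decode("utf-8"))
    return payload["costAmount"] >= hard_cap

# In the actual Cloud Function, crossing the cap would trigger a call like
# (sketch only - requires google-api-python-client and billing-admin rights):
#   billing = googleapiclient.discovery.build("cloudbilling", "v1")
#   billing.projects().updateBillingInfo(
#       name="projects/MY_PROJECT",
#       body={"billingAccountName": ""},  # empty string detaches billing
#   ).execute()

if __name__ == "__main__":
    data = base64.b64encode(json.dumps(
        {"budgetDisplayName": "gemini-cap",
         "costAmount": 52.0, "budgetAmount": 50.0}).encode("utf-8"))
    print(should_detach({"data": data}, hard_cap=50.0))  # prints True
```

&lt;p&gt;Detaching billing is a blunt instrument - it stops every paid service in the project - but for a side project it is the only cap that is actually hard.&lt;/p&gt;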

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;In mid-January 2026, everything was working as intended, and the platform was hosted at: &lt;a href="https://graphmotivo.dstepanian-tech.ovh/" rel="noopener noreferrer"&gt;https://graphmotivo.dstepanian-tech.ovh/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It offers three demo persona purchase-story explorations and a flexible token-based payment system that lets users request their own custom persona + user journey presentation with graph explorations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fha97yukhohlycy97gww8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fha97yukhohlycy97gww8.jpg" alt="Demo personas" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi6h5tq53y9trtp7g68jz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi6h5tq53y9trtp7g68jz.jpg" alt="Ad targeting" width="800" height="503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjd7t4ovxgmrghvo8ei5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjd7t4ovxgmrghvo8ei5.jpg" alt="Graph Explorer" width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are improvements that could still be made, e.g. UX optimization and graph database deduplication (e.g. META vs Meta Ads), but for this stage it’s good enough. It was fun building this, but it required tight focus and patience to work in Cursor; sometimes it still feels like working with ultra-fast but clueless temps. A few things helped throughout the process: a basic technical understanding of how LLMs work, statistics, and AI-augmented programming experience with best practices - guiding coding models correctly toward the goal, making sure they know what’s needed, and using them to identify root causes of errors together.&lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
      <category>saas</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Failed Machine Learning Experiment: Training XGBoost Classifier with 1.5m signals</title>
      <dc:creator>Daniel Stepanian</dc:creator>
      <pubDate>Sun, 08 Feb 2026 19:15:25 +0000</pubDate>
      <link>https://dev.to/dstepanian/failed-machine-learning-experiment-training-xgboost-classificator-with-15m-signals-2fk5</link>
      <guid>https://dev.to/dstepanian/failed-machine-learning-experiment-training-xgboost-classificator-with-15m-signals-2fk5</guid>
      <description>&lt;p&gt;In 2022 I started creating trading strategies in Python, and I had in mind some powerful ML-based strategies, but had neither knowledge, nor abilities to code and test them. Now, although I still have no experience with professional Machine Learning with deep mathematics, I thought that I could use AI to write code for this (Sonnet 4.5), and suggest model parameters (Grok Thinking).&lt;/p&gt;

&lt;p&gt;Looking at many market price charts, I had the impression that there are patterns that could be exploited: with the right combination of trading strategy and position optimization, an automated system might extract at least a couple of percent of return. It’s clear that this is usually not true - the market tends to present a distorted picture. Yet, tempted by the ability to check it myself in a quick prototype, I ran an experiment to verify whether a hypothesis based on these impressions could hold. I used two Jupyter notebooks: one for XGBoost model training and one for the strategy backtest.&lt;/p&gt;

&lt;p&gt;First, I downloaded five years of 15-minute price data for the top 30 crypto tokens into Parquet files. Then I wrote an algorithm to find all price points followed by a price drop larger than 3% within the next ten 15-minute bars, and extracted the preceding 10 price points with technical analysis indicators as training data for an XGBoost classifier - to identify moments that precede price drops. 500k drop signals were found, and I added another 1 million random non-drop samples, for 1.5M training samples in total, with 20% held out for testing.&lt;/p&gt;
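&lt;p&gt;The labeling step can be sketched roughly like this (the helper name, window sizes, and exact mechanics are my illustrative assumptions, not the notebook’s actual code):&lt;/p&gt;

```python
import pandas as pd

def label_drops(close, horizon=10, vol_window=96, z_threshold=-2.0):
    """Mark bars after which price falls by at least `z_threshold`
    volatility units within the next `horizon` bars (15-minute bars
    in the article; window sizes here are assumptions)."""
    returns = close.pct_change()
    vol = returns.rolling(vol_window).std()
    # Worst price over the next `horizon` bars, relative to the current bar.
    fwd_min = pd.concat(
        [close.shift(-k) for k in range(1, horizon + 1)], axis=1).min(axis=1)
    drop_pct = fwd_min / close - 1.0
    drop_z = drop_pct / vol          # normalize: -2 means a 2-sigma drop
    return drop_z.le(z_threshold).astype(int)
```

&lt;p&gt;With the resulting 0/1 labels, drop rows plus randomly sampled non-drop rows form a training set like the 1.5M-sample one described above.&lt;/p&gt;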

&lt;p&gt;I also normalized the drops, as a 3% drop on Bitcoin has a different magnitude than the same drop on Dogecoin. So I chose a drop threshold of -2 expressed as a Z-score: drop_zscore = drop_pct / volatility. In other words, a qualifying drop is 2x the typical volatility (based on standard deviation).&lt;br&gt;
Then came a feature engineering step based on momentum, volatility, and price-difference indicators, data preparation, and finally XGBoost training with hyperparameters from Grok’s recommendation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Recommended hyperparameters:
- `max_depth`: 3-7 (prevents memorizing noise)
- `learning_rate`: 0.01-0.1 (smaller = better with more trees)
- `n_estimators`: 200-500 (with early stopping)
- `subsample` / `colsample_bytree`: 0.6-0.9 (prevents overfitting)
- `scale_pos_weight`: 3-10 (handles class imbalance)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model performed very similarly on test predictions and train set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;============================================================
TRAIN SET PERFORMANCE
============================================================
ROC-AUC Score: 0.6899

Classification Report:
              precision    recall  f1-score   support

   No Signal       0.93      0.62      0.74   3149036
      Signal       0.19      0.66      0.29    426220

    accuracy                           0.62   3575256
   macro avg       0.56      0.64      0.52   3575256
weighted avg       0.84      0.62      0.69   3575256

Confusion Matrix:
[[1938267 1210769]
 [ 144995  281225]]

============================================================
TEST SET PERFORMANCE (Unseen Data)
============================================================
ROC-AUC Score: 0.6761

Classification Report:
...
Train AUC: 0.6899
Test AUC:  0.6761
Difference: 0.0138
✓ Good generalization - minimal overfitting

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Basic returns turned out to be the most important feature for drop prediction. Yet there are too many false positives, which could hurt a portfolio.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flk1w12xdgd8geucqrixd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flk1w12xdgd8geucqrixd.png" alt="Confusion Matrix by Feature Space" width="742" height="638"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I thought: maybe a set of position parameters could save this signal and make it usable? I proceeded with the backtesting notebook. I loaded the model, created a backtesting trading simulation environment, and defined a set of position parameters: TP, SL, delay, cooldown. I tried a grid search optimization approach - testing 900 parameter combinations to find the best one algorithmically. It took 3 hours on my local computer, and yet… all scenarios resulted in a 100% loss! The process failed miserably.&lt;/p&gt;
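&lt;p&gt;The grid-search mechanics look roughly like this toy version (the position logic, parameter names, and helper functions are simplified assumptions, not the notebook’s code):&lt;/p&gt;

```python
import itertools

def backtest(prices, signals, tp, sl, delay, cooldown):
    """Toy short-on-signal backtest: open a short `delay` bars after a
    drop signal, close at take-profit (tp) or stop-loss (sl), then wait
    `cooldown` bars before acting on the next signal. Returns the final
    equity multiplier (1.0 = break-even)."""
    equity, i, n = 1.0, 0, len(prices)
    while n > i:
        if signals[i]:
            entry = i + delay
            if entry >= n:
                break
            e, j, pnl = prices[entry], entry + 1, 0.0
            while n > j:
                r = (e - prices[j]) / e      # short-side return at bar j
                if r >= tp or -sl >= r:
                    pnl = min(max(r, -sl), tp)   # fill at the tp/sl level
                    break
                j += 1
            else:
                pnl = (e - prices[-1]) / e       # still open: close at end
            equity *= 1.0 + pnl
            i = j + cooldown
        else:
            i += 1
    return equity

def grid_search(prices, signals, grid):
    """Evaluate every parameter combination; return the best (params, equity)."""
    combos = (dict(zip(grid, values))
              for values in itertools.product(*grid.values()))
    return max(((p, backtest(prices, signals, **p)) for p in combos),
               key=lambda pair: pair[1])
```

&lt;p&gt;With four parameters at a handful of levels each, combinations multiply fast - 900 scenarios over five years of 15-minute data easily becomes a multi-hour run like the one described above.&lt;/p&gt;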

&lt;p&gt;It was nice working on this step by step with Cursor + Sonnet 4.5. I read a lot about XGBoost while building this, so just telling the assistant what needed to be done and why, and watching it create neat notebooks that worked out of the box or after 1-2 debug-fix iterations, felt almost seamless. Working with Jupyter notebooks in Cursor is not convenient, though - the notebook needs to be closed, reopened, and rerun manually after changes are applied in Agent mode. So I ended up in Ask mode, pasting the code blocks in manually.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>sideprojects</category>
    </item>
  </channel>
</rss>
