I asked AI to format the original text from Medium.
The Theoretical Foundation: Fundamentals of Data Science
Data Science represents the convergence of three essential domains that, together, transform raw data into strategic intelligence:
Business Knowledge (Domain)
- Ability to understand organizational context (industry, objectives, key metrics)
- Skill to formulate the right questions that data can answer
- Example: A data scientist in the banking sector needs to understand regulations, consumer behavior, and financial risks to identify fraud patterns.
Mathematics and Statistics
- Theoretical foundation for modeling and inference
- Key concepts: probability, descriptive/inferential statistics, linear algebra
- Example: Using logistic regression to predict customer churn based on demographic and behavioral variables.
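To make the churn example concrete, here is a minimal logistic regression trained with batch gradient descent in plain Python. It is a sketch only: the two features (tenure in months and support calls in the last quarter) and every data point are hypothetical, and in a real project you would more likely use scikit-learn's `LogisticRegression` on actual customer data.

```python
import math

# Hypothetical toy data: (tenure_months, support_calls_last_quarter); label 1 = churned.
X = [(24, 0), (36, 1), (30, 0), (48, 1), (40, 0),
     (3, 4), (6, 5), (2, 3), (5, 6), (4, 4)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def sigmoid(z: float) -> float:
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def standardize(rows):
    """Scale each column to zero mean and unit variance; return scaled rows and the stats."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(rows, labels):
            # Gradient of the log-loss for one row is (predicted - actual) * features.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def churn_probability(row, w, b, means, stds):
    """Apply the training-set scaling to a new customer, then score it."""
    scaled = [(v - m) / s for v, m, s in zip(row, means, stds)]
    return sigmoid(sum(wj * xj for wj, xj in zip(w, scaled)) + b)


X_scaled, means, stds = standardize(X)
w, b = fit_logistic(X_scaled, y)
p_new = churn_probability((4, 5), w, b, means, stds)     # short tenure, many calls
p_loyal = churn_probability((36, 0), w, b, means, stds)  # long tenure, no calls
print(f"weights={w}, bias={b:.3f}")
print(f"new customer: {p_new:.2f}, loyal customer: {p_loyal:.2f}")
```

Standardizing the features first keeps gradient descent stable. The learned signs (negative for tenure, positive for support calls) are what an analyst would then interpret and explain to the business.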
Computing and Programming Skills
- Practical implementation through languages like Python and R
- Essential tools: Pandas (manipulation), Scikit-learn (modeling), SQL (databases)
- Example: Building automated data pipelines to process millions of daily transactions.
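A pipeline like this usually follows an extract, transform, load (ETL) pattern. The sketch below uses Python's built-in `sqlite3` module with an in-memory database standing in for a production store; the `transactions` table, its columns, and the sample rows are hypothetical. At real scale the transform step would typically run in SQL, pandas, or a distributed engine rather than a Python loop.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw rows: (id, customer_id, day, amount_cents, status).
RAW_TRANSACTIONS = [
    (1, "c1", "2024-01-01", 1250, "approved"),
    (2, "c2", "2024-01-01", 980, "approved"),
    (3, "c1", "2024-01-01", 4300, "declined"),
    (4, "c3", "2024-01-02", 150, "approved"),
    (5, "c2", "2024-01-02", 2000, "approved"),
]


def extract(conn):
    """Extract: read approved transactions from the source table."""
    return conn.execute(
        "SELECT day, customer_id, amount_cents FROM transactions WHERE status = ?",
        ("approved",),
    ).fetchall()


def transform(rows):
    """Transform: aggregate approved volume and distinct customers per day."""
    totals = defaultdict(int)
    customers = defaultdict(set)
    for day, customer_id, amount_cents in rows:
        totals[day] += amount_cents
        customers[day].add(customer_id)
    return [(day, totals[day], len(customers[day])) for day in sorted(totals)]


def load(conn, daily_rows):
    """Load: upsert the daily summary into a reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_summary "
        "(day TEXT PRIMARY KEY, total_cents INTEGER, active_customers INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO daily_summary VALUES (?, ?, ?)", daily_rows)
    conn.commit()


def run_pipeline(conn):
    load(conn, transform(extract(conn)))
    return conn.execute("SELECT * FROM daily_summary ORDER BY day").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(id INTEGER, customer_id TEXT, day TEXT, amount_cents INTEGER, status TEXT)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", RAW_TRANSACTIONS)
summary = run_pipeline(conn)
print(summary)  # [('2024-01-01', 2230, 2), ('2024-01-02', 2150, 2)]
```

Keeping extract, transform, and load as separate functions makes each step testable on its own, which is what lets a pipeline like this run unattended every day.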
The Data Science Lifecycle
A well-structured data science project typically follows six interconnected stages, each of which draws on the three fundamentals above:
- Problem Definition: Translating business objectives into quantifiable questions. Example: “How can we reduce delinquency among new customers by 20%?”
- Data Collection: Identifying relevant sources (internal/external). Example: Combining transactional, behavioral, and socioeconomic data.
- Data Cleaning and Preparation: Handling missing values, outliers, and inconsistencies. Example: Normalizing monetary scales and encoding categorical variables.
- Exploratory Analysis: Identifying patterns through visualizations and statistics. Example: Discovering that customers with more than 3 international transactions per month have five times the fraud risk of other customers.
- Modeling: Building predictive or descriptive algorithms. Example: Training a Random Forest model to classify credit risk.
- Results Communication: Translating technical findings into strategic actions. Example: An interactive dashboard showing risk profiles for the credit team.
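Two of these stages, cleaning/preparation and exploratory analysis, can be sketched in a few lines of plain Python. The records below are invented for illustration. The sketch imputes a missing value with the median, min-max normalizes spend, one-hot encodes a categorical segment, and then computes a relative risk (the fraud rate in one group divided by the fraud rate in another), which is the kind of figure behind the exploratory-analysis example above. With this toy data the ratio happens to be 2.0.

```python
import statistics

# Hypothetical records: (customer_id, monthly_spend, segment, intl_tx_per_month, had_fraud).
# monthly_spend can be missing (None). All values are invented.
records = [
    ("a", 1200.0, "retail", 1, False),
    ("b", None, "premium", 5, True),
    ("c", 300.0, "retail", 0, False),
    ("d", 4500.0, "premium", 4, False),
    ("e", 800.0, "student", 6, True),
    ("f", 150.0, "student", 2, False),
    ("g", 2200.0, "retail", 1, True),
    ("h", 950.0, "premium", 7, False),
]


def impute_median(values):
    """Cleaning: replace missing (None) values with the median of the observed ones."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def min_max_scale(values):
    """Preparation: rescale to [0, 1] (assumes the values are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Preparation: encode each category as a 0/1 vector over the sorted observed levels."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]


def fraud_rate(rows):
    """Share of rows flagged as fraud."""
    return sum(1 for r in rows if r[4]) / len(rows)


spend = min_max_scale(impute_median([r[1] for r in records]))
levels, segment_vectors = one_hot([r[2] for r in records])

# Exploratory analysis: compare fraud rates for heavy vs. light international users.
heavy = [r for r in records if r[3] > 3]
light = [r for r in records if r[3] <= 3]
relative_risk = fraud_rate(heavy) / fraud_rate(light)

print(levels)                                      # ['premium', 'retail', 'student']
print(f"relative fraud risk: {relative_risk:.1f}x")  # relative fraud risk: 2.0x
```

In practice pandas (`fillna`, `get_dummies`) and scikit-learn scalers handle these steps, but the arithmetic underneath is the same.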
Practical Examples: Data Science in Action
Brazilian Case: Nubank — Revolutionizing the Financial Sector
Nubank challenged traditional banks using data as a competitive weapon, growing from a startup to a unicorn in less than 5 years¹.
Data Science Applications:
- AI and Machine Learning Credit Analysis: Nubank uses machine learning models for credit risk analysis⁶. The platform processes about 450 million events per day⁷, generating approximately 5 million internal requests per minute⁸. Processing time in complex streams dropped from 550 to about 350 milliseconds, improving user experience⁹.
- Unified Fraud Defense System: Nubank implemented a unified fraud defense platform with over 99.98% availability¹⁰. The system uses facial recognition (liveness detection) as an additional layer of protection¹¹, and these defenses are activated automatically for all customers¹². Behind it is a highly distributed architecture, with 20 shards in Brazil alone, whose ETL system aggregates over 100 terabytes of logs per day¹³.
- Personalization and Engagement: With 83% of the customer base active monthly (well above the market norm)¹⁴ and 61% of customers having Nu as their main financial relationship¹⁵, Nubank uses predictive AI models to personalize services and products. The average monthly revenue per customer (ARPAC) was $7.8 in 2022¹⁶, while the average monthly service cost per active customer remained below $1, at $0.7 to $0.8¹⁷.
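A quick back-of-envelope calculation helps put these figures in perspective. The script below only converts the numbers cited above (footnotes 7, 8, 9, 16 and 17) into per-second averages, a percentage latency reduction, and a rough revenue-to-cost ratio. It assumes the ARPAC and service-cost figures are comparable, even though they may come from different reporting periods.

```python
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 450_000_000                    # about 450 million events per day (footnote 7)
requests_per_minute = 5_000_000                 # about 5 million internal requests per minute (footnote 8)
latency_before_ms, latency_after_ms = 550, 350  # complex-stream processing time (footnote 9)
arpac_usd = 7.8                                 # average monthly revenue per customer, 2022 (footnote 16)
service_cost_usd = (0.7, 0.8)                   # monthly service cost per active customer (footnote 17)

# Averages only: real traffic is bursty, so peak rates will be higher.
events_per_second = events_per_day / SECONDS_PER_DAY
requests_per_second = requests_per_minute / 60
latency_reduction = (latency_before_ms - latency_after_ms) / latency_before_ms
revenue_to_cost = [arpac_usd / cost for cost in service_cost_usd]

print(f"{events_per_second:,.0f} events/s")      # 5,208 events/s
print(f"{requests_per_second:,.0f} requests/s")  # 83,333 requests/s
print(f"{latency_reduction:.0%} lower latency")  # 36% lower latency
print(f"revenue/cost ratio: {revenue_to_cost[1]:.1f}x to {revenue_to_cost[0]:.1f}x")
```

Read this way, monthly revenue per customer is roughly ten times the monthly cost of serving them, which is the unit-economics advantage the data-driven model is meant to protect.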
Impact:
- 90 million customers (2023)³
- Market valuation: $30 billion⁴
- Third largest financial institution in Brazil by number of customers²
- 85% lower average cost to serve per customer vs. traditional banks⁵
North American Case: Netflix — Reinventing Entertainment
Netflix transformed from a DVD rental company into a streaming giant, reaching 230 million global subscribers in 2022¹⁸ and 301.6 million in 2025¹⁹.
Data Science Applications:
- Recommendation System: Netflix’s recommendation algorithm influences 80% of the content watched on the platform²³ and generates an estimated $1 billion per year in value through user retention²⁴. The system processes 3.2 petabytes of data daily²⁵ and analyzes more than 1 million data points²⁶. With a catalog of over 15,000 movies and series²⁷, and users spending an average of 63 minutes per day watching content on the platform²⁸, recommendations are central to the user experience.
- Data-Based Decisions (The House of Cards Case): Netflix commissioned House of Cards based on user preference data²⁹, a milestone in data-driven decision making. Analysis revealed that users who watched the BBC version of House of Cards also liked movies starring Kevin Spacey or directed by David Fincher, as well as political dramas³⁰. Based on these insights, Netflix invested over $100 million in the production³¹.
- Operations Optimization: Netflix uses demand forecasting to optimize its infrastructure, especially during series launches. The system processes massive volumes of data to ensure streaming quality and operational efficiency²². The US represents the company’s largest market, with 81.44 million subscribers³².
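Both the recommendation system and the House of Cards analysis rest on the same idea: titles watched by the same people are related. Netflix’s production system is far more sophisticated and is not described in the sources; the sketch below is a textbook item-based collaborative filter using cosine similarity over hypothetical viewing histories. Title names such as `fincher_film` and `spacey_film` are placeholders, not real catalog data.

```python
import math
from collections import defaultdict

# Hypothetical viewing histories (user -> titles watched). Invented for illustration.
watch_history = {
    "u1": {"bbc_house_of_cards", "fincher_film", "political_drama"},
    "u2": {"bbc_house_of_cards", "spacey_film", "fincher_film"},
    "u3": {"bbc_house_of_cards", "political_drama"},
    "u4": {"cooking_show", "travel_doc"},
    "u5": {"cooking_show", "spacey_film"},
}


def item_users(history):
    """Invert user -> titles into title -> set of users who watched it."""
    viewers = defaultdict(set)
    for user, titles in history.items():
        for title in titles:
            viewers[title].add(user)
    return viewers


def cosine(a, b):
    """Cosine similarity of two binary vectors stored as sets of user ids."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_titles(title, history, top_n=3):
    """'Viewers who watched X also watched...': titles ranked by similarity to `title`."""
    viewers = item_users(history)
    scores = [(other, cosine(viewers[title], users))
              for other, users in viewers.items() if other != title]
    ranked = sorted((s for s in scores if s[1] > 0), key=lambda s: (-s[1], s[0]))
    return ranked[:top_n]


def recommend(user, history, top_n=3):
    """Score unseen titles by summing their similarity to everything the user watched."""
    viewers = item_users(history)
    seen = history[user]
    scores = defaultdict(float)
    for watched in seen:
        for candidate, users in viewers.items():
            if candidate not in seen:
                scores[candidate] += cosine(viewers[watched], users)
    ranked = sorted(((t, s) for t, s in scores.items() if s > 0), key=lambda s: (-s[1], s[0]))
    return ranked[:top_n]


print([t for t, _ in similar_titles("bbc_house_of_cards", watch_history)])
# ['fincher_film', 'political_drama', 'spacey_film']
print([t for t, _ in recommend("u3", watch_history)])
# ['fincher_film', 'spacey_film']
```

For binary watch data, the cosine similarity of two titles reduces to the number of shared viewers divided by the square root of the product of each title's viewer count. Ties are broken alphabetically so the ranking is deterministic.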
Impact:
- 230 million global subscribers (2022)¹⁸
- 301.6 million global subscribers (2025)¹⁹
- Revenue of $39 billion (2024)²⁰
- Retention rate of 98.2%²¹
- $1 billion/year generated by the recommendation system²⁴
Connecting the Dots: Lifecycle and Value Generated
Nubank: Lifecycle Mapping
Netflix: Lifecycle Mapping
Conclusion: The Harmony of Data
The cases of Nubank and Netflix demonstrate that the success of Data Science does not reside in isolated algorithms, but in the harmonious execution of the entire lifecycle. Each stage contributes a specific type of value:
- Problem Definition: Directs efforts toward real business impacts.
- Data Collection and Preparation: Ensures data quality for models.
- Analysis and Modeling: Transforms data into actionable predictions.
- Communication: Connects technical insights to strategic decision-making.
Companies that master this harmony, like those analyzed here, do not just survive; they lead their industries. The lesson is clear: Data Science is not a one-off project but a continuous cycle of learning and optimization. For organizations seeking relevance in the 21st century, the question is not whether to invest in Data Science, but how to integrate it into the DNA of their operations.
The future belongs to companies that understand that data is not just numbers on a server, but the voice of the customer, the pulse of the market, and the compass for innovation. Data Science is the art of listening to this voice — and transforming it into action.
Sources:
Nubank:
- ¹ Historical data: founded in 2013, reached unicorn status in 2018
- ² Brazil Journal: “Nu ended 2024 as the third largest financial institution in Brazil”
- ³ Nubank quarterly reports and earnings presentations
- ⁴ Market data and investment reports
- ⁵ Brazil Journal: “digital model, which generates a cost to serve per customer on average 85% lower”
- ⁶ Building Nubank: multiple articles on the use of ML models
- ⁷ Building Nubank: “platform processes about 450 million events per day”
- ⁸ Building Nubank: “generating approximately 5 million internal requests per minute”
- ⁹ Building Nubank: “processing time in complex streams dropped from 550 to about 350 milliseconds”
- ¹⁰ Building Nubank: article on fraud defense platform
- ¹¹ Nubank International: “Facial recognition (liveness) is yet another layer of protection”
- ¹² Nubank International: “All Nubank customers automatically have these defenses”
- ¹³ Building Nubank: “Nubank’s distributed ETL system — which aggregates over 100 terabytes of logs per day”
- ¹⁴ Brazil Journal: “83% of this base is active monthly, off the curve compared to the market”
- ¹⁵ Brazil Journal: “61% of them have Nu as their main financial relationship”
- ¹⁶ CEO’s letter to shareholders: “The average monthly revenue per customer (ARPAC) was $7.8 for the year ended December 31, 2022”
- ¹⁷ Nubank International: “the average monthly service cost per active customer remained below the dollar level, at $0.7 and $0.8”
Netflix:
- ¹⁸ Demand Sage: historical table shows 230.7 million Netflix subscribers in 2022
- ¹⁹ Demand Sage: “Netflix reached 301.6 million global subscribers as of August 2025”
- ²⁰ Demand Sage: Netflix generated $39 billion in revenue in 2024
- ²¹ dcfmodeling.com: “Netflix’s customer retention rate is very high, at 98.2%”
- ²² Netflix Tech Blog: several articles on infrastructure optimization
- ²³ dcfmodeling.com: “The company’s recommendation algorithm influences 80% of the content watched on the platform”
- ²⁴ Blog Somos Tera: “helped Netflix generate $1 billion per year in value from user retention”
- ²⁵ dcfmodeling.com: “Machine learning models Process 3.2 Petabytes of data daily”
- ²⁶ attractgroup.com: “Netflix’s data science team analyzes over 1 million data points”
- ²⁷ Stratoflow.com: “With over 15,000 different movies and series in the Netflix catalog”
- ²⁸ Demand Sage: “On average, Netflix users spend around 63 minutes per day watching content”
- ²⁹ Pacific Standard: https://psmag.com/economics/house-of-cards-is-built-on-big-data-52602/
- ³⁰ Pacific Standard: “viewers are also suckers for movies starring Kevin Spacey or directed by David Fincher”
- ³¹ Sofy.tv: “produced at a cost of over $100 million”
- ³² Demand Sage: subscribers by country table