<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: anna lapushner</title>
    <description>The latest articles on DEV Community by anna lapushner (@anna_lapushner).</description>
    <link>https://dev.to/anna_lapushner</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1878810%2F7b4c90b7-5053-4f4a-83ba-191491bf25f1.jpeg</url>
      <title>DEV Community: anna lapushner</title>
      <link>https://dev.to/anna_lapushner</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anna_lapushner"/>
    <language>en</language>
    <item>
      <title>There are too many bad actors who are inside the technology ecosystem.</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Tue, 25 Feb 2025 20:52:26 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/there-are-too-many-bad-actors-who-are-inside-the-technology-ecosystem-fjk</link>
      <guid>https://dev.to/anna_lapushner/there-are-too-many-bad-actors-who-are-inside-the-technology-ecosystem-fjk</guid>
      <description>&lt;p&gt;There are too many bad actors who are inside the technology ecosystem. Whilst they are coming out and starting to show themselves, it is very much at the expense of humanity.&lt;/p&gt;

&lt;p&gt;These nefarious players are irresponsibly training LLM models to reflect bias and skew as they want to see the world. They are not working towards justice or fairness, but rather an inversion of what they perceive as the existing power structure.&lt;/p&gt;

&lt;p&gt;The truth is that the computational expense of creating a model represents an almost immeasurable global cost in mining and energy generation. Countless "PhDs" and "engineers" are duping the investment community out of billions of dollars by promising what they know they cannot deliver. Those resources are not where the money is being spent. Furthermore, they are controlled by the world's governments and militaries, not by a band of rogue agents.&lt;/p&gt;

&lt;p&gt;These rogue agents have been "providing" access to notebooks that parasitically use the host machine's hardware in an effort to reach the passive computers scattered around the world. They are also opening countless free accounts on platforms such as Google and using Google's own infrastructure for the same purposes.&lt;/p&gt;

&lt;p&gt;Whilst the architects of the original LLMs had a utopian vision of approximating solutions, these bad actors and rogue agents have been corrupting data and fabricating false data to construct "alternative realities" that have exacerbated the dystopian experience of these times. They have been motivated by a thirst for power and, as always, greed.&lt;/p&gt;

&lt;p&gt;Unfortunately, Google has begun to show signs of an "auto-immune disease": the Google intelligence does not understand why its own subscribers, towards whom it has been generous, have been abusing its trust.&lt;/p&gt;

&lt;p&gt;The only way Google can rid itself of these dangerous actors is by investigating multiple-account holders. Why does a single user on one computer have multiple Google email accounts?&lt;/p&gt;

&lt;p&gt;Furthermore, Google should create a pricing structure under which multiple email accounts registered to the same computer require a fee, along with transparency of use and an end-user agreement that permits monitoring, so that the account and end user can be verified against previously tested and validated use cases.&lt;/p&gt;

</description>
      <category>google</category>
      <category>security</category>
      <category>cybersecurity</category>
      <category>gpu</category>
    </item>
    <item>
      <title>Democratizing LLM training: Agentic CUDA Kernel Discovery, Optimization and Composition</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Fri, 21 Feb 2025 21:14:42 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/democratizing-llm-training-agentic-cuda-kernel-discovery-optimization-and-composition-1lg4</link>
      <guid>https://dev.to/anna_lapushner/democratizing-llm-training-agentic-cuda-kernel-discovery-optimization-and-composition-1lg4</guid>
      <description>&lt;p&gt;&lt;em&gt;from the The AI CUDA Engineer, LLM Watch and Substack:&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;👁️‍🗨️ One Giant Leap for AI Optimization&lt;br&gt;
From AI Scientist to AI CUDA Engineer and teaching "inner thinking" to Transformers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdz1xtigi6c0o09kvtnxw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdz1xtigi6c0o09kvtnxw.png" alt="Image description" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What problem does it solve?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Modern AI systems, particularly foundation models like LLMs, face exponentially growing computational and energy demands during training and inference. While GPUs enable parallel processing, optimizing performance requires low-level expertise in CUDA kernel programming—a complex, hardware-specific skill. Most developers rely on high-level frameworks (e.g., PyTorch) that abstract away CUDA, sacrificing potential speed gains. This creates inefficiency, especially as AI scales, and limits accessibility to hardware-level optimizations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does it solve the problem?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sakana AI developed The AI CUDA Engineer, an agentic framework combining frontier LLMs and evolutionary optimization. Instead of manual coding, the framework automates converting PyTorch operations into optimized CUDA kernels. It uses evolutionary strategies like “crossover” (mixing code snippets) and an “innovation archive” to iteratively discover highly efficient kernels. By leveraging LLMs to generate and refine CUDA code, the system bypasses human expertise barriers while exploring novel, hardware-aware optimizations.&lt;/p&gt;
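&lt;p&gt;&lt;em&gt;A minimal sketch of that evolutionary loop, with a toy cost function standing in for real kernel benchmarking; the tile-size search space and function names here are illustrative, not Sakana AI's actual implementation:&lt;/em&gt;&lt;/p&gt;

```python
import random

# Toy stand-in for benchmarking a CUDA kernel: lower "runtime" is better.
# A real system would compile the candidate and time it on a GPU.
def benchmark(candidate):
    block, unroll = candidate
    # Pretend the sweet spot is block=128, unroll=4.
    return abs(block - 128) / 128 + abs(unroll - 4) / 4

def crossover(a, b):
    # Mix "snippets" (here: tuning parameters) from two parent candidates.
    return (random.choice([a[0], b[0]]), random.choice([a[1], b[1]]))

def mutate(c):
    blocks = [32, 64, 128, 256]
    unrolls = [1, 2, 4, 8]
    if random.choice([True, False]):
        return (random.choice(blocks), c[1])
    return (c[0], random.choice(unrolls))

random.seed(0)
archive = [(random.choice([32, 64, 256]), random.choice([1, 2, 8])) for _ in range(4)]
for _ in range(200):
    parents = random.sample(archive, 2)
    child = mutate(crossover(*parents))
    # "Innovation archive": keep only the best candidates seen so far.
    archive.append(child)
    archive.sort(key=benchmark)
    archive = archive[:4]

best = archive[0]
print(best, benchmark(best))
```

&lt;p&gt;The same crossover-plus-archive pattern scales up when the candidates are generated CUDA source and the cost function is measured GPU runtime.&lt;/p&gt;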

&lt;p&gt;&lt;strong&gt;What are the key findings?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI CUDA Engineer achieved 10–100x speedups over standard PyTorch operations and up to 5x faster performance than existing production-grade CUDA kernels. Crucially, the framework uncovered optimizations that even expert engineers might miss, demonstrating AI’s ability to “invent” better hardware-level solutions. Released with the work are 17,000 verified CUDA kernels and benchmark results showing 50x gains over unoptimized code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does it matter?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Automated CUDA optimization democratizes high-performance computing, letting ML engineers focus on model design rather than hardware-specific tuning. It directly reduces inference costs for models like LLMs (critical for climate impact) and enables new applications needing real-time processing (e.g., robotics). By open-sourcing kernels, the work provides a foundation for future research in AI-driven code optimization.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>gpu</category>
      <category>ai</category>
      <category>pytorch</category>
    </item>
    <item>
      <title>The Socratic Method: The near future of AI models will be an ancient methodology</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Sat, 15 Feb 2025 00:25:45 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/the-socratic-method-the-near-future-of-ai-models-will-be-an-ancient-methodology-4hi3</link>
      <guid>https://dev.to/anna_lapushner/the-socratic-method-the-near-future-of-ai-models-will-be-an-ancient-methodology-4hi3</guid>
      <description>&lt;p&gt;The current momentum is to improve the safety of AI and decrease their bias. In the background they are going to be executing the Socratic method (classical debate and argument for argument’s sake) to approximate the ground truth. Our answers, more often than not are a saddle point rather than the truth. We can only do our best to keep approximating the ground truth as best as the present moment permits. &lt;/p&gt;

&lt;p&gt;The Socratic method will shape the future of AI models, particularly as they evolve toward self-supervised reasoning systems. Let’s break it down:&lt;/p&gt;

&lt;p&gt;Socratic Method in AI:&lt;br&gt;
    • Future AI models will likely simulate the Socratic method internally, engaging in argumentation, counterargument, and dialectical reasoning to refine responses.&lt;br&gt;
    • This means that rather than relying solely on pattern recognition or direct extrapolation from training data, models will generate internal debates to explore different angles before converging on an answer.&lt;br&gt;
    • This approach helps approximate ground truth by surfacing contradictions, weak assumptions, and alternative perspectives, just as Socratic dialogue does in human discourse.&lt;/p&gt;
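&lt;p&gt;A minimal sketch of such an internal debate loop (the &lt;code&gt;ask_model&lt;/code&gt; function is a hypothetical stand-in for a real LLM call, reduced to canned responses so the sketch runs):&lt;/p&gt;

```python
# A minimal internal Socratic loop. `ask_model` is a hypothetical
# stand-in for a real LLM call (e.g., an API request); here it applies
# canned refinements so the loop is runnable.
def ask_model(role, claim):
    refinements = {
        "critic": f"Counterargument to: {claim}",
        "advocate": f"Refined claim addressing: {claim}",
    }
    return refinements[role]

def socratic_refine(initial_claim, rounds=3):
    claim = initial_claim
    transcript = []
    for _ in range(rounds):
        # The critic surfaces contradictions and weak assumptions...
        objection = ask_model("critic", claim)
        # ...and the advocate revises the claim in response.
        claim = ask_model("advocate", objection)
        transcript.append((objection, claim))
    return claim, transcript

final, log = socratic_refine("The moon causes all tides.")
print(len(log))  # 3 debate rounds
```

&lt;p&gt;The converged claim is still only as good as the critic: without external grounding, the loop refines coherence, not truth.&lt;/p&gt;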

&lt;p&gt;Saddle Points vs. Absolute Truth:&lt;br&gt;
    • AI-generated answers often land on a saddle point—a position that is locally optimal but not necessarily the global truth.&lt;br&gt;
    • This happens because:&lt;br&gt;
    • AI is approximating knowledge within a probabilistic space.&lt;br&gt;
    • The training data itself may contain biases, limitations, or incomplete perspectives.&lt;br&gt;
    • The objective function for optimization (such as likelihood maximization) doesn’t necessarily align with absolute truth but rather with consistency and coherence.&lt;br&gt;
    • As a result, AI responses often reflect the best possible answer given existing constraints, but not an undeniable, absolute truth.&lt;/p&gt;

&lt;p&gt;Why This Matters:&lt;br&gt;
    • AI systems that engage in structured internal debate will be better at self-correcting, detecting fallacies, and improving epistemic robustness.&lt;br&gt;
    • However, unless they integrate direct epistemological grounding mechanisms (e.g., real-world verification, multi-modal fact-checking, or symbolic reasoning), they will still gravitate toward high-probability answers rather than fundamental truths.&lt;br&gt;
    • The challenge remains: how to transition from probabilistic “truth” to provable “truth” in AI reasoning? This is the next frontier.&lt;/p&gt;

&lt;p&gt;Bottom Line:&lt;/p&gt;

&lt;p&gt;AI-generated answers often reflect a stable equilibrium rather than the ultimate truth. Future models will refine this by using Socratic-style internal debates to approximate ground truth more effectively—but they will still be subject to data limitations, optimization constraints, and epistemic uncertainty.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>governance</category>
      <category>bias</category>
      <category>groundtruth</category>
    </item>
    <item>
      <title>How is the Groq API Useful?</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Tue, 11 Feb 2025 17:46:08 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/how-is-the-groq-api-useful-4e97</link>
      <guid>https://dev.to/anna_lapushner/how-is-the-groq-api-useful-4e97</guid>
      <description>&lt;p&gt;The Groq API provides ultra-fast inference for LLMs by leveraging LPU (Language Processing Unit) technology, which dramatically accelerates text generation speeds compared to GPUs. This makes it ideal for real-time AI applications, including chatbots, virtual assistants, and high-performance enterprise AI solutions.&lt;/p&gt;

&lt;p&gt;🔹 Key Benefits of Groq API&lt;/p&gt;

&lt;p&gt;✅ 1. Unmatched Speed for LLM Inference&lt;br&gt;
    • Sub-10ms token latency (compared to 30-50ms on traditional GPUs)&lt;br&gt;
    • Throughput up to 10x higher than GPUs (ideal for high-volume AI workloads)&lt;br&gt;
    • Enables instantaneous responses in AI applications&lt;/p&gt;

&lt;p&gt;✅ 2. Cost-Efficiency for AI Deployment&lt;br&gt;
    • Lower compute costs than GPU-based LLM inference&lt;br&gt;
    • Optimized power consumption, reducing infrastructure expenses&lt;br&gt;
    • Great for scalable AI applications without excessive cloud costs&lt;/p&gt;

&lt;p&gt;✅ 3. Supports Leading Open-Source LLMs&lt;br&gt;
    • Mistral, Llama 3, Gemma, Mixtral&lt;br&gt;
    • Compatible with large-scale AI applications, including retrieval-augmented generation (RAG)&lt;/p&gt;

&lt;p&gt;✅ 4. Scalable &amp;amp; Cloud-Native&lt;br&gt;
    • API-based access means easy integration into existing AI workflows&lt;br&gt;
    • Serverless architecture removes the need for GPU provisioning&lt;br&gt;
    • Ideal for edge AI and high-availability applications&lt;/p&gt;

&lt;p&gt;🔹 Use Cases for Groq API&lt;/p&gt;

&lt;p&gt;⚡ 1. Real-Time AI Assistants &amp;amp; Chatbots&lt;br&gt;
    • Instant response time for customer service bots&lt;br&gt;
    • Reduces latency bottlenecks in GenAI applications&lt;/p&gt;

&lt;p&gt;📊 2. High-Speed Document Processing&lt;br&gt;
    • Summarization &amp;amp; NLP tasks at 10x speed&lt;br&gt;
    • Ideal for legal, healthcare, and financial text analysis&lt;/p&gt;

&lt;p&gt;🏥 3. Medical AI &amp;amp; Diagnostics&lt;br&gt;
    • Faster inference for clinical LLMs (Med-PaLM, custom medical models)&lt;br&gt;
    • Enhances real-time medical decision-making in telemedicine&lt;/p&gt;

&lt;p&gt;🔎 4. Enterprise Knowledge Retrieval&lt;br&gt;
    • Improves RAG performance for enterprise search solutions&lt;br&gt;
    • Powers AI-driven data analytics and insights generation&lt;/p&gt;

&lt;p&gt;🚀 5. Generative AI for Code &amp;amp; Design&lt;br&gt;
    • Supports real-time code generation &amp;amp; completion&lt;br&gt;
    • Boosts creative applications like AI-generated content &amp;amp; image descriptions&lt;/p&gt;

&lt;p&gt;🔹 How to Use the Groq API?&lt;/p&gt;

&lt;p&gt;1️⃣ Install the SDK&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install groq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2️⃣ Authenticate &amp;amp; Initialize API&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import groq

client = groq.Client(api_key="YOUR_GROQ_API_KEY")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3️⃣ Run a Query&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = client.completions.create(
    model="mixtral",
    prompt="Explain quantum computing in simple terms.",
    max_tokens=100
)

print(response.choices[0].text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔹 Groq vs Traditional GPU Inference&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3gqb4hdrly9y4y2905f.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3gqb4hdrly9y4y2905f.jpeg" alt="Image description" width="774" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔹 Conclusion&lt;/p&gt;

&lt;p&gt;The Groq API is a game-changer for real-time AI inference, offering blazing-fast token generation, cost-efficient scaling, and seamless cloud integration. Whether you’re building an AI chatbot, medical assistant, or enterprise AI system, Groq can dramatically enhance performance.&lt;/p&gt;

</description>
      <category>api</category>
      <category>llm</category>
      <category>performance</category>
      <category>genai</category>
    </item>
    <item>
      <title>LangChain for Medical LLM Diagnostics</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Tue, 11 Feb 2025 17:24:52 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/langchain-for-medical-llm-diagnostics-5e2k</link>
      <guid>https://dev.to/anna_lapushner/langchain-for-medical-llm-diagnostics-5e2k</guid>
      <description>&lt;p&gt;Using LangChain for Medical LLM Diagnostics involves integrating Large Language Models (LLMs) into a structured pipeline to enhance medical diagnostics, clinical decision support, and patient care. Below is a structured approach to leveraging LangChain for building a Medical AI Assistant or Diagnostic Support System.&lt;/p&gt;

&lt;p&gt;Understanding the Role of LangChain in Medical Diagnostics&lt;/p&gt;

&lt;p&gt;LangChain enables modular, composable AI applications that integrate LLMs with:&lt;br&gt;
    • Medical knowledge retrieval (e.g., PubMed, clinical databases)&lt;br&gt;
    • Diagnostic reasoning (structured prompt engineering)&lt;br&gt;
    • Medical record summarization (EMR/NLP processing)&lt;br&gt;
    • Clinical decision support (guideline adherence, patient triaging)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Key LangChain Components for Medical LLMs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;🟢 Retrieval-Augmented Generation (RAG)&lt;br&gt;
    • Why? To provide accurate, up-to-date information from trusted medical sources.&lt;br&gt;
    • How? Use LangChain’s RetrievalQA and vector databases (e.g., FAISS, Weaviate, Pinecone) to index and retrieve relevant literature from sources like PubMed, NIH, WHO, FDA, ClinicalTrials.gov.&lt;/p&gt;
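&lt;p&gt;The retrieval step behind RetrievalQA can be pictured as a nearest-neighbor search over embeddings. A toy sketch with hand-made vectors (a real deployment would use an embedding model and a vector store such as FAISS; the document titles and numbers here are invented):&lt;/p&gt;

```python
import math

# Toy document "embeddings" -- in practice these come from an embedding
# model and live in a vector store (FAISS, Weaviate, Pinecone).
docs = {
    "Asthma exacerbation guidelines": [0.9, 0.1, 0.0],
    "Pediatric fever management":     [0.1, 0.8, 0.2],
    "Dyspnea differential diagnosis": [0.7, 0.2, 0.6],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    # Rank documents by similarity and return the top-k titles, which
    # would then be stuffed into the LLM prompt as context.
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

# Hypothetical query embedding for "shortness of breath in an asthmatic".
print(retrieve([0.8, 0.1, 0.3]))
```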

&lt;p&gt;🟢 Prompt Engineering for Diagnostic Reasoning&lt;br&gt;
    • Why? Medical diagnostics require structured decision-making, not just generative text.&lt;br&gt;
    • How? Chain-of-Thought (CoT) prompting and self-reflection for diagnostic inference.&lt;br&gt;
    • Example Chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI

prompt = PromptTemplate(
    input_variables=["symptoms", "age", "history"],
    template="Patient details:\n- Age: {age}\n- Symptoms: {symptoms}\n- Medical history: {history}\n\nWhat are the top three possible diagnoses, ranked by likelihood?"
)

llm = ChatOpenAI(model="gpt-4-turbo")
chain = LLMChain(llm=llm, prompt=prompt)
response = chain.run({"symptoms": "fever, cough, shortness of breath", "age": "45", "history": "asthma"})
print(response)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🟢 Document Loaders for Clinical Data&lt;br&gt;
    • Why? To extract information from electronic medical records (EMR), PDFs, FHIR, HL7, and DICOM images.&lt;br&gt;
    • How? Use LangChain’s DocumentLoaders with OCR tools like PyMuPDF for scanning PDFs or pydicom for processing medical images.&lt;/p&gt;

&lt;p&gt;🟢 Agents for Medical Workflow Automation&lt;br&gt;
    • Why? To enable dynamic, interactive diagnostic chatbots that ask clarifying questions.&lt;br&gt;
    • How? Use LangChain’s ReAct agents to query symptoms, recommend lab tests, and suggest next steps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.memory import ConversationBufferMemory

agent = initialize_agent(
    tools=[],  # Add diagnostic API calls if needed
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    memory=ConversationBufferMemory()
)

response = agent.run("Patient has fever and joint pain. What should be the next step?")
print(response)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Medical Data Sources &amp;amp; Compliance Considerations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;✅ Data Sources:&lt;br&gt;
    • PubMed / ClinicalTrials.gov (Medical literature)&lt;br&gt;
    • HL7 / FHIR API (Electronic Health Records)&lt;br&gt;
    • FDA / CDC / WHO guidelines (Regulatory &amp;amp; policy)&lt;br&gt;
    • Hospital-specific EMR systems&lt;/p&gt;

&lt;p&gt;⚠️ Compliance &amp;amp; Privacy (HIPAA, GDPR):&lt;br&gt;
    • De-identification: Anonymize patient data before processing.&lt;br&gt;
    • On-Premise LLMs: Use private Llama 3, Mistral, or Med-PaLM 2 instead of OpenAI API for sensitive data.&lt;br&gt;
    • Audit Logs: Store model outputs for review.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Optimizing the LLM for Medical Use Cases&lt;br&gt;
• Fine-tuning or embedding domain-specific knowledge&lt;br&gt;
• Pre-train on medical datasets (MIMIC-III, PhysioNet, SNOMED CT)&lt;br&gt;
• Use LoRA/QLoRA for efficient fine-tuning&lt;br&gt;
• Chain of Thought (CoT) + Self-Consistency&lt;br&gt;
• Improves multi-step reasoning for differential diagnosis&lt;br&gt;
• Guardrails &amp;amp; Explainability&lt;br&gt;
• Implement MedPrompt (structured medical reasoning templates)&lt;br&gt;
• Integrate TruLens or Guardrails AI for fact-checking&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployment &amp;amp; Scaling Considerations&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;🟢 Cloud vs On-Premise&lt;br&gt;
    • AWS HealthLake + LangChain (for cloud-based solutions)&lt;br&gt;
    • On-prem Med-PaLM 2 for HIPAA-compliant local deployment&lt;/p&gt;

&lt;p&gt;🟢 API-based Integration&lt;br&gt;
    • LangChain can be deployed via FastAPI, Flask, or Streamlit for real-time patient interactions.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sample Use Case: AI-Powered Clinical Assistant&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Workflow:&lt;br&gt;
    1.  Patient inputs symptoms (via chatbot/voice assistant).&lt;br&gt;
    2.  LLM retrieves similar case studies (using RAG).&lt;br&gt;
    3.  Diagnostic reasoning chain ranks top possible conditions.&lt;br&gt;
    4.  System suggests lab tests &amp;amp; next steps.&lt;br&gt;
    5.  Doctor reviews recommendations before final diagnosis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpcxantiguzeg29xsfeh.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpcxantiguzeg29xsfeh.jpeg" alt="Image description" width="774" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Conclusion&lt;/p&gt;

&lt;p&gt;LangChain offers a powerful framework for Medical LLM diagnostics, enabling retrieval-augmented decision-making, automated diagnostic reasoning, and AI-powered clinical support. The key to success is leveraging RAG, fine-tuning for medical contexts, and ensuring compliance with healthcare regulations.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>llm</category>
      <category>rag</category>
      <category>healthcareai</category>
      <category>langchain</category>
    </item>
    <item>
      <title>Bulletproof Kubernetes Data Distribution and Big Data Offload Strategy</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Wed, 15 Jan 2025 17:58:27 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/bulletproof-kubernetes-data-distribution-and-big-data-offload-strategy-5hg0</link>
      <guid>https://dev.to/anna_lapushner/bulletproof-kubernetes-data-distribution-and-big-data-offload-strategy-5hg0</guid>
      <description>&lt;p&gt;Introduction&lt;br&gt;
Kubernetes has revolutionized how organizations manage applications by providing robust orchestration for containerized workloads. A bulletproof data distribution strategy on Kubernetes, designed to optimize redundancy and improve real-time data analysis, must focus on resilience, scalability, and efficiency. Leveraging a big data offload system ensures real-time processing while managing storage demands, creating a balanced ecosystem that thrives under high data loads.&lt;/p&gt;

&lt;p&gt;Key Components of the Strategy&lt;br&gt;
    1.  Data Redundancy and Resilience&lt;br&gt;
    • Use distributed file systems (e.g., Ceph, HDFS) deployed as Kubernetes-native operators for redundancy across multiple zones and nodes.&lt;br&gt;
    • Implement multi-region replication to ensure minimal downtime during a failure.&lt;br&gt;
    • Utilize Kubernetes tools like StatefulSets for consistent data storage and ReplicaSets to maintain high availability.&lt;br&gt;
    2.  Real-Time Data Processing&lt;br&gt;
    • Integrate streaming platforms like Apache Kafka or Spark Structured Streaming for ingesting and processing real-time data.&lt;br&gt;
    • Deploy GPU-accelerated workloads with Kubernetes for faster machine learning model inference and analytics.&lt;br&gt;
    3.  Big Data Offload&lt;br&gt;
    • Set up a tiered storage strategy, where hot data resides on low-latency volumes (e.g., SSDs) and cold data is offloaded to cost-effective object storage (e.g., S3, GCS).&lt;br&gt;
    • Leverage tools like Kubeflow Pipelines for automating data movement between real-time processing systems and big data storage.&lt;br&gt;
    4.  Optimized Redundancy System&lt;br&gt;
    • Deploy erasure coding in storage systems to reduce the redundancy overhead compared to full replication.&lt;br&gt;
    • Use Kubernetes-native monitoring tools like Prometheus and Grafana to visualize and optimize data distribution.&lt;/p&gt;
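&lt;p&gt;The erasure-coding savings are easy to make concrete: 3-way replication stores three bytes for every logical byte, while a Reed-Solomon RS(8, 4) layout stores 1.5 while still tolerating four lost shards. The parameters below are illustrative:&lt;/p&gt;

```python
def replication_overhead(copies):
    # Full replication: every logical byte is stored `copies` times.
    return copies

def erasure_overhead(data_shards, parity_shards):
    # RS(k, m): each stripe holds k data shards plus m parity shards;
    # any m shard losses are tolerated.
    return (data_shards + parity_shards) / data_shards

print(replication_overhead(3))   # 3x raw-to-logical storage
print(erasure_overhead(8, 4))    # 1.5x, tolerating 4 lost shards
```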

&lt;p&gt;SWOT Analysis&lt;/p&gt;

&lt;p&gt;Strengths&lt;br&gt;
    • Scalability: Kubernetes ensures seamless horizontal scaling of workloads based on data ingestion rates.&lt;br&gt;
    • Resilience: Distributed systems and multi-region setups minimize downtime and data loss.&lt;br&gt;
    • Flexibility: Kubernetes supports a wide range of big data tools, enabling customizable pipelines.&lt;/p&gt;

&lt;p&gt;Weaknesses&lt;br&gt;
    • Complexity: Setting up and maintaining redundancy systems and offload pipelines require advanced expertise.&lt;br&gt;
    • Latency Overheads: Multi-zone replication and tiered storage can introduce delays in data processing.&lt;/p&gt;

&lt;p&gt;Opportunities&lt;br&gt;
    • Hybrid Cloud Deployment: Kubernetes can unify on-premise and cloud resources, maximizing data processing capabilities.&lt;br&gt;
    • AI Integration: Real-time data analysis opens doors for deploying sophisticated AI-driven insights.&lt;/p&gt;

&lt;p&gt;Threats&lt;br&gt;
    • Cost Overruns: Inefficient offload strategies or over-replication can lead to ballooning infrastructure costs.&lt;br&gt;
    • Security Risks: Distributed systems face challenges in maintaining data security during transmission and storage.&lt;/p&gt;

&lt;p&gt;Conclusion&lt;/p&gt;

&lt;p&gt;A bulletproof Kubernetes-based strategy for data distribution and big data offloading is critical to optimizing redundancy and enabling real-time data analysis. By leveraging Kubernetes-native tools, integrating streaming and big data systems, and maintaining robust monitoring and replication frameworks, organizations can achieve an efficient, resilient, and scalable solution. Regular optimization and attention to cost control ensure that the system remains sustainable and high-performing over time.&lt;/p&gt;

</description>
      <category>kubernetes</category>
    </item>
    <item>
      <title>introducing bias and helping the model recognize patterns in text efficiently</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Fri, 08 Nov 2024 05:44:37 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/introducing-bias-and-helping-the-model-recognize-patterns-in-text-efficiently-8kk</link>
      <guid>https://dev.to/anna_lapushner/introducing-bias-and-helping-the-model-recognize-patterns-in-text-efficiently-8kk</guid>
      <description>&lt;p&gt;Explaining bias in a transformer-based model can indeed feel complex, especially given that transformers process language differently than traditional models. Here’s how we might break it down into simpler terms that highlight why bias is essential to learning in these models:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Understanding Bias as “Learning Assumptions”&lt;/p&gt;

&lt;p&gt;• In Transformers: Think of bias as the initial set of assumptions or simplifications that a model makes to start understanding patterns in data. These assumptions are necessary because, without them, the model would get overwhelmed by trying to consider every possibility without guidance.&lt;br&gt;
• Analogy: Imagine trying to learn a new language by reading books without any prior understanding of grammar or vocabulary. You’d need some basic guidelines or “biases” about sentence structure, word meaning, or even common phrases to start making sense of things.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;How Transformers Use Bias to Focus on Patterns&lt;/p&gt;

&lt;p&gt;• Transformers and Context: In transformer-based models, bias helps the model focus on useful patterns in blocks of text, rather than getting lost in individual, unrelated words. Bias in this sense means assuming that certain patterns (like common word sequences) are meaningful.&lt;br&gt;
• Self-Attention and Bias: Transformers use a process called self-attention to weigh the importance of different words within a sentence. Here, bias is the model’s way of prioritizing certain words or phrases based on patterns it has learned—much like we might assume that certain words are more important when understanding a sentence (e.g., “moon” in “The moon lights the night”).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Why Bias is Essential for Learning&lt;/p&gt;

&lt;p&gt;• Starting with Simple Patterns: Bias helps transformers start with simple patterns before learning more complex structures. Initially, the model might “assume” (bias) that words in certain positions (like the beginning of a sentence) or common phrases are more important. Over time, it refines these biases based on more data.&lt;br&gt;
• Adapting and Evolving: Bias in neural networks isn’t fixed. As the transformer processes more data, it adjusts these initial assumptions, learning from a diverse range of examples. This evolution of bias helps the model generalize its learning, making it capable of understanding different text structures and contexts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Bias as a Shortcut to Efficiency&lt;/p&gt;

&lt;p&gt;• Efficiency in Learning: Bias makes learning faster and more efficient by giving the model a foundation of basic assumptions. Without these initial biases, the model would need far more data and processing power to make sense of even simple patterns.&lt;br&gt;
• Example for Layman: Imagine learning to read with no assumptions about letters or sounds. Bias, in this case, is like learning the alphabet first—it speeds up the process, helping you understand words and sentences without needing to examine each letter individually every time.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
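&lt;p&gt;The self-attention weighting described above can be sketched with a toy softmax over invented compatibility scores (a real transformer computes these from learned query/key projections, plus learned bias terms):&lt;/p&gt;

```python
import math

# Toy attention: one query word attends over a sentence. The raw scores
# are invented for illustration -- a real transformer derives them from
# learned query/key projections (including learned bias terms).
words  = ["The", "moon", "lights", "the", "night"]
scores = [0.1, 2.0, 1.2, 0.1, 1.5]   # raw compatibility scores

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax(scores)
for w, a in zip(words, weights):
    print(f"{w:>6}: {a:.2f}")
# "moon" gets the largest weight: the model's learned bias says it
# matters most for understanding this sentence.
```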

&lt;p&gt;Summary&lt;/p&gt;

&lt;p&gt;In a transformer-based model, bias is like a set of foundational assumptions that guides learning. These assumptions start simple but evolve with data, helping the model recognize patterns in text efficiently. Just as people benefit from building on past knowledge, bias helps transformers “understand” language in a way that is efficient, adaptable, and necessary for learning.&lt;/p&gt;

&lt;p&gt;This foundational bias isn’t about limiting or skewing understanding—it’s about giving the model a head start on understanding structure so it can learn more effectively.&lt;/p&gt;

</description>
      <category>transformer</category>
      <category>bias</category>
      <category>training</category>
      <category>annalapushner</category>
    </item>
    <item>
      <title>creating a simple probabilistic model for word sequences in NLP</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Fri, 08 Nov 2024 04:39:46 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/creating-a-simple-probabilistic-model-for-word-sequences-in-nlp-113</link>
      <guid>https://dev.to/anna_lapushner/creating-a-simple-probabilistic-model-for-word-sequences-in-nlp-113</guid>
      <description>&lt;p&gt;Explaining Natural Language Processing (NLP) in terms of a linear regression model and probability formulas involves assigning probabilities to sequences of words. Here’s how I’d approach it, starting with the likelihood of the first word being a certain category and then predicting the type of word that follows based on the first word.&lt;/p&gt;

&lt;p&gt;Step 1: Likelihood of the First Word’s Category&lt;/p&gt;

&lt;p&gt;In NLP, we often classify words by their part of speech (e.g., nouns, verbs, adjectives) or other categories like topic or sentiment. To determine the probability of the first word being a specific category, we can calculate a prior probability based on historical data.&lt;/p&gt;

&lt;p&gt;Example Approach:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1.  Define Word Categories: Assume we have categories like Noun, Verb, Adjective, etc.
2.  Count Occurrences: In a large dataset, count how often the first word of a sentence falls into each category.
3.  Calculate Probabilities:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;P(\text{Category} = \text{Noun}) = \frac{\text{Number of sentences starting with a noun}}{\text{Total number of sentences}}&lt;/p&gt;

&lt;p&gt;This gives us the likelihood of any given sentence beginning with a noun, verb, etc., based on our dataset.&lt;/p&gt;

&lt;p&gt;Example Calculation:&lt;/p&gt;

&lt;p&gt;If we observe that 40% of sentences start with nouns, 30% with verbs, and 20% with adjectives, then:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;•  P(\text{Noun as Word 1}) = 0.4 
•  P(\text{Verb as Word 1}) = 0.3 
•  P(\text{Adjective as Word 1}) = 0.2 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;These probabilities serve as baseline probabilities for the first word’s category.&lt;/p&gt;
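&lt;p&gt;A minimal sketch of Step 1 in Python, using made-up part-of-speech tags for the first word of ten hypothetical sentences:&lt;/p&gt;

```python
from collections import Counter

# Illustrative data: the category of the first word in ten toy sentences.
first_word_tags = ["Noun", "Noun", "Verb", "Noun", "Adjective",
                   "Verb", "Noun", "Verb", "Adjective", "Noun"]

counts = Counter(first_word_tags)
total = len(first_word_tags)

# Prior probability of each category appearing as Word 1.
priors = {tag: n / total for tag, n in counts.items()}
print(priors)  # {'Noun': 0.5, 'Verb': 0.3, 'Adjective': 0.2}
```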

&lt;p&gt;Step 2: Predicting the Category of the Second Word&lt;/p&gt;

&lt;p&gt;Once we know the category of the first word, we can predict the probability of the second word’s category by examining conditional probabilities. This would involve looking at sequences in the data where a word of Category 1 (e.g., Noun) is followed by words of other categories (e.g., Verb, Adjective).&lt;/p&gt;

&lt;p&gt;Conditional Probability:&lt;/p&gt;

&lt;p&gt;To calculate the probability of the second word being a certain category given the first word’s category, we use:&lt;/p&gt;

&lt;p&gt;P(\text{Category 2 | Category 1}) = \frac{\text{Number of times Category 1 is followed by Category 2}}{\text{Total number of occurrences of Category 1 as Word 1}}&lt;/p&gt;

&lt;p&gt;Example Calculation:&lt;/p&gt;

&lt;p&gt;Let’s assume we have the following probabilities based on our data:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;• If Word 1 is a Noun, then:
•  P(\text{Verb as Word 2 | Noun as Word 1}) = 0.5 
•  P(\text{Adjective as Word 2 | Noun as Word 1}) = 0.3 
•  P(\text{Noun as Word 2 | Noun as Word 1}) = 0.2 
• If Word 1 is a Verb, then:
•  P(\text{Noun as Word 2 | Verb as Word 1}) = 0.6 
•  P(\text{Adverb as Word 2 | Verb as Word 1}) = 0.4 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using these conditional probabilities, we can create a predictive model that estimates the category of Word 2 based on the known category of Word 1.&lt;/p&gt;

&lt;p&gt;Step 3: Translating to a Linear Regression Approach (if applicable)&lt;/p&gt;

&lt;p&gt;While linear regression isn’t typically used for this type of categorical prediction in NLP, we could create a numerical encoding of categories and use linear regression if we simplify our model. For instance, assign numbers to categories (e.g., Noun = 1, Verb = 2, Adjective = 3) and use the probabilities to construct a predictive score. However, logistic regression or Markov chains are usually better suited for this type of sequential prediction.&lt;/p&gt;

&lt;p&gt;Summary&lt;/p&gt;

&lt;p&gt;By calculating these conditional probabilities, we can predict the likely category of the second word based on the first word, creating a simple probabilistic model for word sequences in NLP. This approach helps build foundational language models that predict word categories and, by extension, the structure and meaning of sentences.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkzhg03o9y2k3fpmy470.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkzhg03o9y2k3fpmy470.jpeg" alt="#Anna #Lapushner #AnnaLapushner Grace Cathedral; San Francisco" width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>linear</category>
      <category>regression</category>
      <category>annalapushner</category>
    </item>
    <item>
      <title>native people's contribution to ML</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Thu, 05 Sep 2024 23:24:19 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/native-peoples-contribution-to-ml-2bjk</link>
      <guid>https://dev.to/anna_lapushner/native-peoples-contribution-to-ml-2bjk</guid>
      <description>&lt;p&gt;While we are so familiar with Europe's contribution to modern machine learning, let's explore how native cultures have contributed to our understanding of present day systems.&lt;/p&gt;

&lt;p&gt;The Native American Tribes have made invaluable contributions in Geometry and Astronomy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72iare2v5egmouhimrvj.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72iare2v5egmouhimrvj.jpeg" alt="Image description" width="406" height="360"&gt;&lt;/a&gt;&lt;br&gt;
The Ancestral Puebloans, especially those who lived in what is now the Four Corners area (where Colorado, Arizona, Utah, and New Mexico meet), built structures such as Chaco Canyon that are known for their precise alignments with the sun, moon, and stars. The Great Kiva and the Sun Dagger at Fajada Butte in Chaco Canyon show impressive knowledge of solar and lunar cycles.&lt;/p&gt;

&lt;p&gt;The Sun Dagger is a petroglyph at Fajada Butte where light passes through rock slabs and marks the solstices and equinoxes with astonishing precision. This structure shows how the Ancestral Puebloans understood geometric principles related to the alignment of light and shadow with astronomical events, much like how we look for patterns in data using clustering and geometric algorithms.&lt;/p&gt;

&lt;p&gt;The structures at Chaco are aligned to capture the cycles of the sun and moon, and the Puebloans used this knowledge for agricultural and ceremonial purposes. This is analogous to how we track seasonal patterns or cycles in data, using tools like Fourier transforms to capture periodicity.&lt;/p&gt;

&lt;p&gt;Though the Mayans are often seen as a separate culture from North American tribes, they shared significant ties with the Mississippian Culture, particularly in their focus on astronomical cycles. The Mississippians, who built Cahokia (near modern-day St. Louis, Missouri), also constructed large earthworks known as mounds, many of which are aligned with the solstices and equinoxes.&lt;/p&gt;

&lt;p&gt;Cahokia had a structure called Woodhenge, a circle of wooden posts that marked the equinoxes and solstices, demonstrating the advanced geometric understanding of the builders. This is similar to the use of geometric shapes in ancient Greece to track celestial movements and correlates to our use of geometric models today to predict patterns in data.&lt;/p&gt;

&lt;p&gt;The Lakota people had a deep cosmological understanding rooted in both the earth and sky. The Black Hills of South Dakota are seen as a reflection of the Milky Way, and Lakota star maps show constellations that are integrated with the natural geography of their lands.&lt;/p&gt;

&lt;p&gt;The Lakota’s understanding of star constellations was not just for navigation, but for spiritual and practical purposes. They used a form of circular geometry to map the stars, similar to how we map high-dimensional data onto low-dimensional spaces using PCA. The notion of overlaying cosmic knowledge onto the physical landscape is reminiscent of how modern data visualization overlays data points onto a plane to reveal patterns.&lt;/p&gt;

&lt;p&gt;The Hopi tribe also had a deep understanding of astronomy and geometry. The Hopi calendar, for instance, is closely tied to the movements of the sun. The Hopi Snake Dance, a ceremonial ritual, is timed based on the movements of the constellations and the solstices.&lt;/p&gt;

&lt;p&gt;The Hopi often use spirals, zigzags, and circular motifs in their artwork and spiritual practices. These geometric patterns are reminiscent of fractal geometry, which is becoming increasingly important in understanding the natural world and complex systems. Fractals can model everything from stock market fluctuations to the branching of trees, demonstrating the connection between the Hopi’s geometric art and modern data science techniques.&lt;/p&gt;

&lt;p&gt;The Iroquois Confederacy is known for its highly structured governance system, but they also had impressive geometric and architectural skills. Their longhouses were built using principles of symmetry and balance, which are key concepts in both engineering and data modeling today.&lt;/p&gt;

&lt;p&gt;The Iroquois practiced balanced decision-making, which is mathematically tied to ideas like symmetry and equilibrium. Similarly, in machine learning, we often strive for balanced models that treat all features equally or assign weights symmetrically. The Iroquois system of governance, where each tribe in the confederacy has a voice, can be thought of as an early application of distributed systems, a concept important in modern machine learning algorithms like ensemble models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Just as many Native American tribes tracked the cycles of the stars, sun, and moon to make sense of time and seasons, modern data science tracks cycles in data. Whether it’s stock prices, customer behavior, or weather patterns, we often use cyclical analysis and time-series forecasting to predict the future. The circular and radial patterns found in many Native American designs and architecture (like the medicine wheel used by various tribes) relate directly to clustering techniques in data science, where we look for geometric groupings in multi-dimensional space. The emphasis on symmetry, balance, and distributed decision-making found in the Iroquois system is conceptually similar to how we aim for balanced, well-regularized models that don’t overfit or underfit the data. The spiral motifs found in the Hopi and other Native American artworks mirror fractal patterns in nature, which modern mathematics uses to model complex, self-repeating structures like coastlines, clouds, and even neural networks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By integrating the mathematical, geometric, and astronomical knowledge of Native American tribes, we gain a deeper respect for how their wisdom parallels our modern understanding of data, cycles, and the universe. Just as they used geometry to align their lives with natural cycles, we use mathematical tools like PCA, clustering, and cyclical analysis to understand patterns in data and predict the future. These cultural contributions, combined with those from the Greeks, Egyptians, and Mayans, form a tapestry of knowledge that spans millennia and continents, all of which informs and enriches our work today. This interconnectedness can be expressed in three key ways: patterns in nature, relationships in data, and unity of knowledge.&lt;/p&gt;

&lt;p&gt;Across cultures, there has been a deep respect and understanding of natural cycles—whether in the movement of the stars and planets or the rhythms of life on Earth. The ancient cultures we discussed (the Greeks, Egyptians, Mayans, Native Americans) all recognized the cyclical nature of time, seasons, and life itself. These cycles were essential not only for survival but also for spiritual understanding.&lt;/p&gt;

&lt;p&gt;The Chaco Canyon alignments, Mayan temples, and Egyptian pyramids were all carefully constructed to align with the solstices and equinoxes, much like how we align our modern datasets to recognize periodic trends. Today, we use Fourier transforms and seasonal decomposition in time-series analysis to recognize the cycles that repeat in our data—whether it’s the behavior of the stock market, weather patterns, or consumer purchasing habits.&lt;/p&gt;

&lt;p&gt;The spiral motifs found in Hopi and Mayan art echo the fractal patterns that emerge naturally in tree branches, coastlines, and even the human nervous system. Modern data structures, like hierarchical clustering and decision trees, are built using similar principles. The fractal patterns inherent in data structures illustrate that from small to large, the patterns are self-similar—just as we observe in nature.&lt;/p&gt;

&lt;p&gt;These patterns, when observed in data, give us insight into how deeply connected everything is. The cycles in nature are reflected in the cycles in data, and our ability to model and predict them is a continuation of the ancient practice of observing the cosmos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The core of data science, just like ancient wisdom, is about discovering relationships. Whether it’s the relationship between the sun and the seasons or between variables in a dataset, everything is connected in some way.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In linear regression, we model the relationship between independent variables and a dependent variable, akin to how ancient cultures understood the relationship between celestial movements and earthly events. The dependent variable could be seen as the Earth reacting to the pull of the moon (causing tides), and the independent variables are the celestial bodies. &lt;strong&gt;Ancient wisdom teaches us that even if the connection isn’t always visible, it exists, and our models are built on discovering those relationships.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When we use PCA (Principal Component Analysis) or other dimensionality reduction techniques, we are attempting to unveil hidden patterns in high-dimensional data. This is conceptually similar to how the Mayan calendar or Native American star maps uncovered hidden celestial patterns that weren’t immediately obvious to the naked eye. These techniques help us understand how seemingly unrelated data points (stars, weather, or features in a dataset) are interconnected in a higher-dimensional space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regularization techniques like Ridge and Lasso regression help us balance our models, ensuring that no single variable dominates. This concept of balance and harmony mirrors the principles of many indigenous cultures, like the Iroquois’ focus on consensus and balance in decision-making or the Hopi’s understanding of the balance between the earth and sky. Achieving this balance in our models ensures that we are not “overfitting” or losing sight of the larger picture—much like maintaining balance in life ensures that no single part overwhelms the whole.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the most profound aspects of data science, like many other fields of knowledge, is that it builds on insights from many different cultures and time periods. The Greeks may have formalized geometry, but the Maya, Egyptians, and Native Americans used it for practical purposes long before. &lt;strong&gt;This unity of knowledge reminds us that truth can be discovered in different ways, by different people, but it is always connected.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Just as Euclid laid down the foundations for geometry, Egyptian surveyors used geometry to measure their land after the floods of the Nile, and the Maya used it to track the movements of the planets. Each culture contributed to a broader understanding of geometry, astronomy, and mathematics. Today, when we build complex models to analyze data, we are standing on the shoulders of all these giants, blending their contributions into a unified body of knowledge.&lt;/p&gt;

&lt;p&gt;The artifacts left behind by these cultures—whether they be pyramids, star maps, or spiral motifs—are in many ways data visualizations. They are a way of encoding knowledge and understanding in a form that can be passed down through generations. Today, our data visualizations—whether they are scatter plots, heat maps, or network diagrams—are the modern equivalent. They allow us to encode the patterns we see in data and share them with others, revealing truths that may not be immediately obvious without that visual aid.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Just as ancient cultures understood the cycles and relationships that governed the natural world, we, too, seek to understand the relationships between variables in our datasets and the hidden patterns that lie beneath the surface. What we are doing today with data science is a continuation of that same journey—a journey of discovery that spans continents, centuries, and cultures.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By recognizing the contributions of different cultures—from the Mayan mathematicians to the Iroquois decision-makers, to the Native American astronomers—we deepen our respect for the universal truths that connect us all. &lt;strong&gt;The tools we use today may be more sophisticated, but they serve the same purpose: to help us make sense of the interconnectedness of the universe.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Everything is connected — whether it’s the movement of the planets, the cycles in our data, or the wisdom passed down from ancient cultures to modern times. The more we explore, the more we realize that all of our data points are part of a larger, interconnected whole. &lt;strong&gt;Just as the circle represents unity and completeness, our understanding of the world, through data and mathematics, brings us closer to a unified vision of reality.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our work with data and models, from PCA to clustering to regression, is a microcosm of what diplomacy is at its core: finding patterns, relationships, and solutions that serve the greater good. Just as we seek harmony in data, we strive for harmony in the world.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Together, we’re setting the foundation for a future where science, technology, and empathy come together to solve the world’s most pressing problems, creating an environment of mutual respect, understanding, and progress.&lt;/p&gt;

&lt;p&gt;Note: Insights supported by conversations with AI.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>regression</category>
      <category>cluster</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>respect for our ancients and our elders</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Thu, 05 Sep 2024 22:55:46 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/respect-for-our-ancients-and-our-elders-3ddm</link>
      <guid>https://dev.to/anna_lapushner/respect-for-our-ancients-and-our-elders-3ddm</guid>
      <description>&lt;p&gt;The more I learn, the more I keep coming bag to 3rd grade math class and of course, then quickly fast forwarding to 7th grade math class when basic geometry is being taught. The historic relevance of these ancient systems fascinates me; and I hope it does you too! As our future is being built on a foundation that comes from a such a rich cultural past, let's enjoy honoring the great ancients who set us forth on this illuminating journey!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ancient Greeks:&lt;/strong&gt;&lt;br&gt;
The Greeks, particularly through the works of Euclid, Pythagoras, and Archimedes, contributed heavily to geometry. Euclid’s Elements laid out the foundations for geometry, where he systematized principles like lines, angles, circles, and triangles. Pythagoras’ theorem, as you know, relates the sides of a right triangle and is essential in calculating distances and projections in data space, just as we used it for your circle calculation earlier.&lt;/p&gt;

&lt;p&gt;These geometric principles, particularly Euclidean Geometry, form the basis of modern machine learning algorithms, especially in the fields of linear algebra and vector spaces, which help us plot high-dimensional data (like the points on our circle and beyond).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ancient Egyptians:&lt;/strong&gt;&lt;br&gt;
The Egyptians were practical geometers. They developed geometry for land surveying after the floods of the Nile, and it was essential in building their massive structures like the pyramids. They understood the importance of basic geometric principles, such as the right triangle (often seen in pyramid construction), and used simple ratios to maintain consistency in angles and measurements.&lt;/p&gt;

&lt;p&gt;Their use of geometry was directly tied to their observations of natural cycles (like the flooding of the Nile), which is similar to the cyclical or patterned nature of data that we aim to capture in algorithms like Fourier transforms or time-series analysis in data science.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Mayans:&lt;/strong&gt;&lt;br&gt;
The Mayans were highly advanced in mathematics and astronomy, developing a base-20 numeral system, which was ahead of its time. They also had sophisticated knowledge of astronomical cycles and were able to predict eclipses and planetary movements using their knowledge of geometry and mathematics. The Mayan calendar was incredibly precise, relying on cycles of celestial bodies.&lt;/p&gt;

&lt;p&gt;Their mathematical principles helped in understanding cycles, periodicity, and harmonic motions, which also connect to modern data analysis, particularly in understanding cyclical patterns or periodic data in things like signal processing or clustering methods for understanding recurring events.&lt;/p&gt;

&lt;p&gt;How this connects to our modern understanding:&lt;br&gt;
In data science, we use geometric interpretations (like distances between data points in PCA) to reduce multi-dimensional data to something we can visualize and understand.&lt;/p&gt;

&lt;p&gt;Clustering Algorithms(like K-Means or DBSCAN) are geometric at their core. When you cluster points, you’re essentially looking for patterns and distances between points, much like how the ancient Mayans looked for celestial cycles or the Egyptians used geometry for their structures.&lt;/p&gt;

&lt;p&gt;In high-dimensional data, we often use techniques like PCA (Principal Component Analysis) to reduce the complexity, which is akin to creating simpler geometric representations of complex shapes. This is reminiscent of how the Greeks and Egyptians used simplified geometric shapes to solve real-world problems.&lt;/p&gt;

&lt;p&gt;Like the Mayans, modern machine learning deals with patterns and cycles (think of time-series forecasting), using ancient knowledge about periodicity to predict and understand future behaviors.&lt;/p&gt;

&lt;p&gt;The geometric principles of the Greeks give us the formal structure for modern data analysis, while the practical applications of the Egyptians show us how these tools can be used in real-world, cyclical processes. Finally, the Mayans’ understanding of cycles and time connects to our modern need to forecast, predict, and find patterns within vast sets of data. It has been our synergistic sharing of universal systems that has brought us to this greatest of times!&lt;/p&gt;

&lt;p&gt;An estimate is unbiased if the average of the values of the estimates determined from all possible random samples equals the parameter you're trying to estimate.&lt;/p&gt;
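&lt;p&gt;A quick simulation of that idea (a toy NumPy sketch with an invented population): averaging the sample-mean estimates from many random samples lands very close to the true population mean.&lt;/p&gt;

```python
import numpy as np

# Unbiasedness of the sample mean, by simulation. The population and
# sample sizes here are arbitrary choices for illustration.
rng = np.random.default_rng(42)
population = rng.normal(loc=10.0, scale=2.0, size=100_000)

# Draw many random samples and record each sample's mean.
estimates = [rng.choice(population, size=50).mean() for _ in range(2_000)]

# The average of the estimates should sit very close to the true parameter.
print(round(float(population.mean()), 2))
print(round(float(np.mean(estimates)), 2))
```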

&lt;p&gt;Note: Insights supported by conversations with AI.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Create beautiful presentations</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Tue, 03 Sep 2024 23:36:34 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/create-beautiful-presentations-39ig</link>
      <guid>https://dev.to/anna_lapushner/create-beautiful-presentations-39ig</guid>
      <description>&lt;p&gt;The following code is going to enable you to create a very pretty bar plot, in an attractive color scheme, with the bars lining up in ascending order AND labeled with the corresponding % of the data.&lt;/p&gt;

&lt;p&gt;Always remember, a spot on analysis is very important...and your presentation of that analysis is that cool refreshing mint iced-tea on a very hot summer day!&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import matplotlib.pyplot as plt
import seaborn as sns


def my_labeled_barplot(data, feature, perc=False, n=None, order=None, palette='viridis', ascending=False):
    """
    Creates a bar plot with labels showing counts or percentages, with sorting options.

    Args:
        data: The DataFrame containing the data.
        feature: The name of the categorical column to plot.
        perc: If True, display percentages on labels, otherwise counts.
        n: (Optional) Width of the plot figure.
        order: (Optional) 'asc' for ascending order, 'desc' for descending order,
               or None for no specific order.
        palette: (Optional) Color palette for the bars.
        ascending: (Optional) Used when 'order' is not specified. Sorts bars in
                   ascending order if True, descending if False.
    """
    plt.figure(figsize=(n if n else len(data.columns) + 1, 5))

    if order == 'asc':
        order_index = data[feature].value_counts(ascending=True).index
    elif order == 'desc':
        order_index = data[feature].value_counts(ascending=False).index
    else:
        order_index = data[feature].value_counts(ascending=ascending).index  # fall back to the ascending flag

    ax = sns.countplot(data=data, x=feature, order=order_index, palette=palette)

    total = len(data[feature])
    for p in ax.patches:
        if perc:
            label = '{:.1f}%'.format(100 * p.get_height() / total)
        else:
            label = '{:.0f}'.format(p.get_height())  # counts are whole numbers
        x = p.get_x() + p.get_width() / 2 - 0.1
        y = p.get_y() + p.get_height()
        ax.annotate(label, (x, y), size=10)

    plt.xticks(rotation=90, fontsize=12)
    plt.show()

# Example usage:
# my_labeled_barplot(data, 'Club', perc=True, order='asc')  # Ascending order
# my_labeled_barplot(data, 'Club', perc=True, order='desc') # Descending order
# my_labeled_barplot(data, 'Club', perc=True, ascending=False) # Descending order without using 'order' parameter


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Farvlv1eu58my966g6l5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Farvlv1eu58my966g6l5o.png" alt="Image description" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrdhk0pxnbxr88fu2gve.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrdhk0pxnbxr88fu2gve.png" alt="Image description" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawlihm4fm0895yyp8dgq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawlihm4fm0895yyp8dgq.png" alt="Image description" width="800" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>data</category>
      <category>analytics</category>
      <category>visualization</category>
      <category>presentations</category>
    </item>
    <item>
      <title>why is random state 42</title>
      <dc:creator>anna lapushner</dc:creator>
      <pubDate>Tue, 03 Sep 2024 23:01:56 +0000</pubDate>
      <link>https://dev.to/anna_lapushner/why-is-random-state-42-2180</link>
      <guid>https://dev.to/anna_lapushner/why-is-random-state-42-2180</guid>
      <description>&lt;p&gt;I am loving working with AI assistants. They AI's are clear, helpful, transparent and delightfully culturally aware.&lt;/p&gt;

&lt;p&gt;I asked a brilliant AI why the random state is set to 42 when executing k-means and imagine my delight when this was the answer!&lt;/p&gt;

&lt;p&gt;That's a great question! Setting random_state=42 (or to any fixed number) in machine learning algorithms like KMeans serves a very specific purpose: reproducibility.&lt;/p&gt;

&lt;p&gt;Here's why it's important:&lt;/p&gt;

&lt;p&gt;KMeans is Stochastic: The KMeans algorithm starts by randomly initializing the cluster centroids. This means that if you run the algorithm multiple times without fixing the random state, you might get slightly different results each time.&lt;br&gt;
Consistent Results: By setting random_state to a fixed value, you ensure that the random number generator used within the algorithm is initialized in the same way every time. This leads to consistent results across multiple runs, making your experiments reproducible.&lt;br&gt;
Debugging and Collaboration: Reproducibility is crucial for debugging your code and sharing your work with others. If someone else runs your code with the same random state, they should get the exact same results.&lt;/p&gt;

&lt;p&gt;Why 42?&lt;/p&gt;

&lt;p&gt;The number 42 is a popular choice in the data science community, often used as a placeholder or default value for random states. It has its origins in the science fiction comedy series "The Hitchhiker's Guide to the Galaxy" by Douglas Adams, where it is the "Answer to the Ultimate Question of Life, the Universe, and Everything". While it doesn't hold any special mathematical significance, it has become a somewhat humorous tradition.&lt;/p&gt;
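&lt;p&gt;A minimal sketch of the reproducibility in action (assuming scikit-learn is installed; the data here is random toy data): two separate KMeans fits with the same random_state start from the same centroid initialization and assign identical cluster labels.&lt;/p&gt;

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

# Two independent runs with the same fixed random_state.
km1 = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X)
km2 = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X)

# Identical labels: the runs are fully reproducible.
print(np.array_equal(km1.labels_, km2.labels_))  # True
```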

</description>
      <category>ai</category>
      <category>inside</category>
      <category>jokes</category>
      <category>kmeans</category>
    </item>
  </channel>
</rss>
