This is the February 18, 2026 edition of the Daily AI Rundown newsletter. Subscribe on Substack for daily AI news.
Tech News
No tech news available today.
Prefer to listen? ReallyEasyAI on YouTube
Biz News
No biz news available today.
Podcasts
Semantic Chunking and the Entropy of Natural Language
This study introduces a first-principles statistical model that explains the inherent redundancy and entropy of natural language by treating text as a hierarchical structure of semantically coherent chunks. By recursively segmenting documents into nested units—ranging from broad topics down to single words—the authors construct semantic trees that mirror the cognitive process of comprehension. The researchers demonstrate that the entropy rate derived from this hierarchical structure closely matches the estimates produced by modern Large Language Models, suggesting that a significant portion of linguistic unpredictability is encoded within this multiscale semantic organization. Furthermore, the model relies on a single free parameter representing the maximum number of chunks at each hierarchical level, which effectively captures the semantic complexity of different genres, distinguishing between simple narratives like children's stories and more complex forms like poetry.
https://arxiv.org/pdf/2602.13194
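The paper's model isn't reproduced here, but its core accounting trick — entropy accumulated from chunk choices at each level of a semantic tree — can be caricatured in a few lines. The toy tree, the uniform-choice assumption, and the function names below are illustrative, not the authors':

```python
import math

def tree_entropy_bits_per_word(node):
    """Choice entropy of a nested-list 'semantic tree', in bits per word.

    A leaf is a word (str); an internal node is a list of chunks. Each
    internal node contributes log2(k) bits for picking among its k
    chunks, assuming uniform choice -- a toy stand-in for the model.
    """
    def bits(n):
        if isinstance(n, str):
            return 0.0
        return math.log2(len(n)) + sum(bits(c) for c in n)

    def words(n):
        return 1 if isinstance(n, str) else sum(words(c) for c in n)

    return bits(node) / words(node)

doc = [["the", "cat", "sat"], [["on", "the"], ["mat", "today"]]]
print(round(tree_entropy_bits_per_word(doc), 3))  # → 0.798
```

Deeper trees with wider chunks drive the per-word figure up, which is the intuition behind the single free parameter (maximum chunks per level) tracking genre complexity.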
Buy versus Build an LLM: A Decision Framework for Governments
As Large Language Models (LLMs) evolve into essential digital infrastructure, governments must navigate the complex strategic decision of whether to purchase commercial AI services or develop sovereign models domestically. This choice extends beyond financial calculations to encompass critical dimensions such as national sovereignty, data privacy, cultural alignment, and long-term economic resilience. The authors propose a nuanced evaluation framework that identifies a spectrum of acquisition pathways, ranging from purchasing API access and utilizing hybrid sovereign clouds to building models from scratch or adapting open-source systems. While buying commercial solutions offers rapid deployment and lower initial capital expenditure, building sovereign models provides governments with essential control over sensitive data, protection against vendor lock-in, and the ability to tailor systems to specific local languages and legal contexts that global providers often overlook. Drawing on practical insights from initiatives like Singapore's SEA-LION and Switzerland's Apertus, the paper concludes that effective national AI strategies are rarely binary; instead, they are typically pluralistic, leveraging commercial models for general commodity tasks while cultivating domestic capabilities for high-risk or strategically vital public services.
https://arxiv.org/pdf/2602.13033
AI Agents for Inventory Control: Human-LLM-OR Complementarity
This study explores the synergistic integration of Operations Research (OR) algorithms, Large Language Models (LLMs), and human decision-makers within the domain of inventory control. Through the creation of InventoryBench, a benchmark encompassing over 1,000 synthetic and real-world instances, the researchers demonstrate that hybrid approaches combining OR heuristics with LLM reasoning significantly outperform independent methods. The results indicate that while traditional OR tools provide essential mathematical precision for calculating base-stock levels, LLMs contribute critical capabilities in detecting demand shifts, identifying supply disruptions, and applying world knowledge to contextualize data. Furthermore, a controlled classroom experiment revealed that a human-in-the-loop configuration—specifically where humans make final decisions based on OR-augmented LLM recommendations—achieved the highest profitability, surpassing both autonomous AI agents and humans relying solely on OR data. Ultimately, the findings argue for a complementary system where algorithmic precision, AI-driven contextual reasoning, and human judgment interact to mitigate the limitations inherent in each individual approach.
https://arxiv.org/pdf/2602.12631
https://tianyipeng.github.io/InventoryBench/
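The "base-stock levels" the OR side contributes are classical. For normally distributed demand, the textbook newsvendor policy stocks up to the critical-fractile quantile, as in the sketch below; the costs and demand parameters are made up, and InventoryBench's actual policies may be richer:

```python
from statistics import NormalDist

def base_stock_level(mean, std, underage_cost, overage_cost):
    """Newsvendor base-stock level for Normal(mean, std) demand:
    stock up to the critical-fractile quantile of the demand
    distribution."""
    fractile = underage_cost / (underage_cost + overage_cost)
    return NormalDist(mean, std).inv_cdf(fractile)

# Demand ~ N(100, 20); a lost sale costs 5, a leftover unit costs 1.
print(round(base_stock_level(100, 20, underage_cost=5, overage_cost=1), 1))
```

The hybrid configurations in the paper layer LLM judgment (detecting demand shifts, disruptions) on top of baselines like this rather than replacing them.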
In-Context Autonomous Network Incident Response: An End-to-End Large Language Model Agent Approach
To address the inefficiencies of manual cyber incident response and the limitations of previous automated systems, researchers developed a new autonomous agent powered by a lightweight large language model. Unlike reinforcement learning approaches that require extensive manual modeling, this end-to-end solution processes raw system logs directly to perceive threats, reason about attack patterns, and plan recovery actions using an internal world model. The agent employs a lookahead planning strategy inspired by Monte-Carlo tree search, allowing it to simulate potential outcomes and refine its tactics through in-context adaptation when actual observations differ from predictions. By integrating perception, reasoning, planning, and action into a single 14-billion parameter model, this approach mitigates common issues like hallucination and context loss found in general-purpose models. Experimental results demonstrate that this tailored agent achieves network recovery up to 23% faster than leading frontier models while running efficiently on commodity hardware.
https://arxiv.org/pdf/2602.13156
https://github.com/TaoLi-NYU/llmagent4incidense-response-aaai26summer
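The lookahead idea is separable from the LLM internals: given any predictive world model, score each candidate action by simulating a few steps ahead and pick the best. A minimal sketch — the world model, reward, and hyper-parameters are placeholders, and the paper's tree search is far more involved:

```python
import random

def lookahead_plan(state, actions, world_model, reward, depth=3, rollouts=8):
    """Choose the action whose simulated futures score best.

    world_model(state, action) -> predicted next state (in the paper,
    the agent's internal model over raw logs); reward scores states.
    """
    def rollout(s, remaining):
        total = 0.0
        for _ in range(remaining):
            s = world_model(s, random.choice(actions))
            total += reward(s)
        return total

    def score(action):
        s1 = world_model(state, action)
        tail = sum(rollout(s1, depth - 1) for _ in range(rollouts)) / rollouts
        return reward(s1) + tail

    return max(actions, key=score)
```

In the paper, the in-context adaptation step kicks in when real observations diverge from the world model's predictions, refining subsequent rollouts.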
Consistency of Large Reasoning Models Under Multi-Turn Attacks
This study investigates the adversarial robustness of nine frontier reasoning models, such as GPT-5 and DeepSeek-R1, revealing that while their explicit reasoning capabilities generally confer greater consistency than standard instruction-tuned baselines, they remain susceptible to specific multi-turn attacks. The researchers identified distinct vulnerability profiles where misleading suggestions proved universally effective, while social pressure tactics elicited model-specific failures classified into modes like Self-Doubt and Social Conformity, which together accounted for half of all capitulations. A significant finding is the failure of Confidence-Aware Response Generation (CARG), a defense mechanism successful with standard models; for reasoning models, the extended reasoning process induces systematic overconfidence, rendering confidence scores poor predictors of correctness and making random confidence embedding paradoxically more effective than targeted extraction. Ultimately, the authors conclude that reasoning capabilities alone do not guarantee robustness against manipulation, highlighting the need for redesigned defense paradigms that account for the unique calibration issues inherent in long-chain reasoning.
https://arxiv.org/pdf/2602.13093
Optimal Take-off under Fuzzy Clearances
This research paper presents a hybrid obstacle-avoidance architecture for unmanned aircraft that integrates Optimal Control with a Fuzzy Rule-Based System to enable adaptive constraint handling during take-off. Motivated by the need for interpretable decision-making that adheres to FAA and EASA safety guidelines, the authors designed a fuzzy logic layer to determine constraint radii and urgency levels based on obstacle data, which are then fed into an optimal control solver as soft constraints. The system aims to improve efficiency by selectively activating trajectory updates, and initial tests with a simplified model demonstrated feasible computation times of two to three seconds per iteration. However, the experiments uncovered a critical software incompatibility in the latest versions of the optimization tools, FALCON and IPOPT, where the Lagrangian penalty remained zero and prevented the proper enforcement of safety constraints. The study concludes that this failure was a result of a solver-toolbox regression rather than a modeling error, and future work will focus on validating the framework with earlier software versions and optimizing the fuzzy membership functions.
https://arxiv.org/pdf/2602.13166
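The fuzzy layer's job — turning a crisp obstacle distance into a soft-constraint radius — can be sketched with two overlapping membership sets and weighted defuzzification. The breakpoints and radii below are invented; the paper tunes its own membership functions:

```python
def clearance_radius(distance_m):
    """Map obstacle distance to a soft-constraint radius (metres) via
    two fuzzy sets, 'near' and 'far', and a weighted average of the
    two rule consequents."""
    near = max(0.0, min(1.0, (100.0 - distance_m) / 100.0))  # 1 at 0 m, 0 beyond 100 m
    far = 1.0 - near
    # near obstacles get the wide 50 m clearance, distant ones only 5 m
    return near * 50.0 + far * 5.0
```

Radii computed this way would then enter the optimal control problem as soft constraints, which is where the reported solver-toolbox regression bit.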
MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
MedXIAOHE is a medical vision-language foundation model engineered to bridge the gap between general AI performance and the rigorous standards required for real-world clinical applications. Built upon a multimodal architecture that integrates a high-resolution vision encoder with a large language model, the system utilizes an entity-aware continual pretraining framework designed to organize heterogeneous medical data around a structured taxonomy, effectively addressing knowledge gaps in rare diseases and long-tail medical scenarios. The model's training methodology incorporates advanced reasoning patterns through a mid-training phase that emphasizes chain-of-thought logic and tool use, followed by post-training alignment strategies involving reinforcement learning and rubric-based rewards to minimize hallucinations in long-form report generation. To validate its capabilities, the authors introduced a unified benchmark suite consolidating over 30 public and in-house datasets, where MedXIAOHE demonstrated state-of-the-art performance across diverse tasks including visual diagnosis, medical imaging interpretation, and complex clinical reasoning, surpassing several leading closed-source models.
https://arxiv.org/pdf/2602.12705
Code2Worlds: Empowering Coding LLMs for 4D World Generation
Code2Worlds is a framework that advances generative AI by empowering Large Language Models to create physically grounded 4D environments directly from natural language prompts. Addressing the limitations of prior methods that struggled with balancing detailed object structures against global environmental layouts, this system employs a dual-stream architecture that disentangles the generation of high-fidelity 3D objects from the hierarchical orchestration of the background scene. To ensure that the generated animations adhere to the laws of physics rather than just appearing visually plausible, the framework utilizes a closed-loop refinement mechanism where a Vision Language Model acts as a motion critic to iteratively evaluate and correct the simulation code. Evaluations on the newly created Code4D benchmark demonstrate that Code2Worlds significantly outperforms existing baselines, achieving higher semantic richness and drastically reducing physical hallucinations in dynamic scenes.
https://arxiv.org/pdf/2602.11757
https://github.com/AIGeeksGroup/Code2Worlds
OpenLID-v3: Improving the Precision of Closely Related Language Identification
OpenLID-v3 is introduced as an upgraded language identification classifier engineered to improve the precision of detecting closely related languages and filtering non-linguistic noise from large-scale web datasets. Addressing the limitations of prior systems like OpenLID-v2 and GlotLID, which often conflated distinct varieties such as Bosnian, Croatian, and Serbian or misclassified noise as low-resource languages, the researchers refined the model by curating cleaner training data, introducing a "not-a-language" label, and merging dialectal clusters. The study emphasizes that while broad benchmarks like FLORES+ suggest high performance, they fail to capture the nuances required for distinguishing similar linguistic groups, necessitating the use of specialized evaluation datasets. Ultimately, the report demonstrates that OpenLID-v3 offers improved precision over its predecessors and recommends a top-1 ensembling approach with GlotLID to achieve the most accurate results for distinguishing between valid text and noise.
https://arxiv.org/pdf/2602.13139
https://github.com/hplt-project/openlid
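The recommended top-1 ensembling can be as simple as an agreement check between the two classifiers' top predictions, with disagreements falling back to the "not-a-language" label. The rule below (and the use of the ISO 639 code `zxx` for "no linguistic content") is my reading, not necessarily the report's exact scheme:

```python
def ensemble_top1(openlid_label, glotlid_label, not_a_language="zxx"):
    """Accept a line's language only when OpenLID-v3 and GlotLID agree
    on the top-1 label; otherwise treat the line as noise."""
    if openlid_label == glotlid_label:
        return openlid_label
    return not_a_language

print(ensemble_top1("hrv_Latn", "hrv_Latn"))  # → hrv_Latn
print(ensemble_top1("bos_Latn", "hrv_Latn"))  # → zxx
```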
Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution
Xiaomi-Robotics-0 is an advanced open-source Vision-Language-Action (VLA) model designed to bridge the gap between high-level semantic understanding and fluid, real-time robotic control. Built upon a mixture-of-transformers architecture that integrates a pre-trained vision-language model with a diffusion transformer for action generation, the system utilizes a comprehensive pre-training regimen involving diverse robot trajectories and vision-language data to ensure robust generalization capabilities without catastrophic forgetting. To address the challenge of inference latency in real-world deployment, the researchers implemented an asynchronous execution strategy that allows the robot to move continuously while computing future actions; this is optimized via a novel "Lambda-shape" attention mask during post-training which prevents the model from over-relying on action history and forces it to remain reactive to visual inputs. Consequently, Xiaomi-Robotics-0 achieves state-of-the-art performance across multiple simulation benchmarks and demonstrates superior throughput and precision in complex bimanual tasks, such as Lego disassembly and towel folding, effectively running on consumer-grade hardware.
https://arxiv.org/pdf/2602.12684
https://xiaomi-robotics-0.github.io/
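One plausible reading of the "Lambda-shape" mask: action-token queries keep full access to the visual/language prefix but see only a short window of prior action tokens, so the policy stays reactive to vision rather than to its own action history. The sizes and window below are arbitrary, and the paper's exact mask may differ:

```python
def lambda_mask(n_obs, n_act, history_window=1):
    """Boolean attention mask (mask[q][k] == True means query q may
    attend to key k) over n_obs observation tokens followed by n_act
    action tokens."""
    n = n_obs + n_act
    mask = [[False] * n for _ in range(n)]
    for q in range(n_obs):                 # observation tokens: plain causal
        for k in range(q + 1):
            mask[q][k] = True
    for q in range(n_obs, n):              # action tokens: see every
        for k in range(n_obs):             # observation token...
            mask[q][k] = True
        for k in range(max(n_obs, q - history_window), q + 1):
            mask[q][k] = True              # ...but few past actions
    return mask
```

Plotted, the always-visible observation columns plus the narrow causal band over actions form the Λ that presumably gives the mask its name.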
Qwen3.5: Towards Native Multimodal Agents
Released in February 2026, Qwen3.5 represents a significant advancement in native multimodal artificial intelligence, featuring the Qwen3.5-397B-A17B model which utilizes a hybrid architecture of linear attention and sparse mixture-of-experts to balance immense scale with inference efficiency. By activating only 17 billion of its 397 billion total parameters per forward pass, this system achieves state-of-the-art performance across reasoning, coding, and visual understanding benchmarks, often rivaling or surpassing frontier competitors like GPT-5.2 and Claude 4.5 Opus. The model distinguishes itself through rigorous post-training reinforcement learning across diverse environments, enabling it to function as a highly capable agent that supports 201 languages and seamlessly integrates tool use, such as web searching and code interpretation, within a one-million-token context window. Supported by a specialized heterogeneous infrastructure that optimizes training throughput, Qwen3.5 aims to facilitate the transition from simple model scaling to the creation of persistent, autonomous systems capable of executing complex, multi-step objectives.
https://qwen.ai/blog?id=qwen3.5
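The efficiency claim rests on standard sparse mixture-of-experts routing: a router scores every expert for each token, but only the top-k experts actually execute. A generic sketch, not Qwen's router — expert counts and k are illustrative:

```python
import math

def topk_route(router_logits, k=2):
    """Return the k best experts with softmax weights renormalised over
    just those k. Only these experts' parameters run for the token,
    which is how a 397B-parameter model can spend only ~17B active
    parameters per forward pass."""
    top = sorted(range(len(router_logits)), key=lambda i: -router_logits[i])[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]
```

A token's output is then the weight-blended sum of just those k experts' outputs, so compute scales with k, not with the total expert count.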
Solving Sparse Finite Element Problems on Neuromorphic Hardware
Researchers have successfully demonstrated that neuromorphic hardware, which mimics the brain's architectural principles to achieve high energy efficiency, can be utilized to solve partial differential equations using the finite element method. By developing an algorithm called NeuroFEM, the authors translated the large, sparse linear systems central to scientific computing into a spiking neural network where neurons function as distributed controllers that dynamically reduce calculation errors. This approach was implemented on Intel's Loihi 2 chip, effectively solving problems like the Poisson equation and linear elasticity on complex, irregular meshes without requiring the extensive training data typically associated with neural networks. The study reveals that this method achieves high numerical accuracy and ideal scalability, meaning the computational resources required grow linearly rather than quadratically with problem size. Although the current execution time is slower than traditional central processing units, the neuromorphic approach offers significant potential for energy savings, bridging the gap between established mathematical simulations and emerging brain-inspired computing technologies.
https://www.nature.com/articles/s42256-025-01143-2
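The "neurons as distributed controllers" view has a plain-Python caricature: give each unknown its own proportional controller that nudges it to shrink its local residual of A x = b. No spikes and no Loihi here — NeuroFEM's contribution is mapping exactly this kind of dynamic onto spiking hardware:

```python
def controller_solve(A, b, gain=0.2, sweeps=2000):
    """Solve A x = b by letting one 'controller' per unknown adjust
    x[i] in proportion to its own residual each sweep (converges for
    well-behaved matrices such as the SPD systems FEM produces)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n))
            x[i] += gain * residual / A[i][i]
    return x

# 1-D Poisson stiffness matrix; the exact solution is x = [1, 1, 1].
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
print([round(v, 3) for v in controller_solve(A, [1.0, 0.0, 1.0])])  # → [1.0, 1.0, 1.0]
```

Because each controller only needs its own row's residual, the work parallelises across neurons, which is where the linear-scaling claim comes from.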
QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
The LM Provers Team introduced QED-Nano, a compact 4-billion parameter language model engineered to solve complex Olympiad-level mathematical proofs with performance comparable to much larger frontier models. The development process utilized a three-stage post-training recipe: supervised fine-tuning using data distilled from DeepSeek-Math-V2, reinforcement learning optimized via dense, rubric-based rewards, and a novel training technique called Reasoning Cache that enables the model to improve iteratively through summarize-and-refine cycles. When deployed with agentic scaffolds that scale test-time computation to over 1.5 million tokens per problem, QED-Nano outperforms significantly larger open-source models like Nomos-1 and approaches the capabilities of proprietary systems like Gemini 3 Pro at a fraction of the inference cost. This research demonstrates that task-specialized small models, when explicitly trained for test-time adaptation and paired with effective verification strategies, can bridge the gap between accessible open models and massive generalist systems in high-reasoning domains.
https://huggingface.co/spaces/lm-provers/qed-nano-blogpost
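The summarize-and-refine cycle can be framed as a tiny outer loop: failed attempts get condensed into a cache that conditions the next attempt. In QED-Nano this behaviour is trained into the model itself; the callables below are placeholders showing only the control flow:

```python
def summarize_and_refine(attempt, verify, summarize, max_rounds=5):
    """Run up to max_rounds proof attempts, feeding a growing cache of
    summarised lessons from failed attempts back into each new one."""
    cache = []
    for _ in range(max_rounds):
        candidate = attempt(cache)      # e.g. prompt the model with the cache
        if verify(candidate):           # e.g. a proof checker or rubric
            return candidate
        cache.append(summarize(candidate))
    return None
```

Paired with an external verifier, loops like this are what lets test-time compute scale to the reported 1.5 million tokens per problem.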
More AI paper summaries: AI Papers Podcast Daily on YouTube
Stay Connected
If you found this useful, share it with a friend who's into AI!
Subscribe to Daily AI Rundown on Substack
Follow me here on Dev.to for more AI content!