DEV Community: daily ai trends

TCEval: Using Thermal Comfort to Assess Cognitive and Perceptual Abilities of AI

daily ai trends — Wed, 31 Dec 2025 19:29:35 +0000

arXiv:2512.23217v1 Announce Type: new
Abstract: A critical gap exists in LLM task-specific benchmarks. Thermal comfort, a sophisticated interplay of environmental factors and personal perceptions involving sensory integration and adaptive decision-making, serves as an ideal paradigm for evaluating real-world cognitive capabilities of AI systems. To address this, we propose TCEval, the first evaluation framework that assesses three core cognitive capacities of AI, cross-modal reasoning, causal association, and adaptive decision-making, by leveraging thermal comfort scenarios and large language model (LLM) agents. The methodology involves initializing LLM agents with virtual personality attributes, guiding them to generate clothing insulation selections and thermal comfort feedback, and validating outputs against the ASHRAE Global Database and Chinese Thermal Comfort Database. Experiments on four LLMs show that while agent feedback has limited exact alignment with humans, directional consistency improves significantly with a 1 PMV tolerance. Statistical tests reveal that LLM-generated PMV distributions diverge markedly from human data, and agents perform near-randomly in discrete thermal comfort classification. These results confirm the feasibility of TCEval as an ecologically valid Cognitive Turing Test for AI, demonstrating that current LLMs possess foundational cross-modal reasoning ability but lack precise causal understanding of the nonlinear relationships between variables in thermal comfort. TCEval complements traditional benchmarks, shifting AI evaluation focus from abstract task proficiency to embodied, context-aware perception and decision-making, offering valuable insights for advancing AI in human-centric applications like smart buildings.

Source: https://arxiv.org/abs/2512.23217

ReGAIN: Retrieval-Grounded AI Framework for Network Traffic Analysis

daily ai trends — Wed, 31 Dec 2025 19:29:33 +0000

Key points

Key components of ReGAIN include:
- Traffic summarization
- Retrieval-augmented generation (RAG)
- Large Language Model (LLM) reasoning

arXiv:2512.22223v1 Announce Type: cross
Abstract: Modern networks generate vast, heterogeneous traffic that must be continuously analyzed for security and performance. Traditional network traffic analysis systems, whether rule-based or machine learning-driven, often suffer from high false positives and lack interpretability, limiting analyst trust. In this paper, we present ReGAIN, a multi-stage framework that combines traffic summarization, retrieval-augmented generation (RAG), and Large Language Model (LLM) reasoning for transparent and accurate network traffic analysis. ReGAIN creates natural-language summaries from network traffic, embeds them into a multi-collection vector database, and utilizes a hierarchical retrieval pipeline to ground LLM responses with evidence citations. The pipeline features metadata-based filtering, MMR sampling, a two-stage cross-encoder reranking mechanism, and an abstention mechanism to reduce hallucinations and ensure grounded reasoning. Evaluated on ICMP ping flood and TCP SYN flood traces from the real-world traffic dataset, it demonstrates robust performance, achieving accuracy between 95.95% and 98.82% across different attack types and evaluation benchmarks. These results are validated against two complementary sources: dataset ground truth and human expert assessments. ReGAIN also outperforms rule-based, classical ML, and deep learning baselines while providing unique explainability through trustworthy, verifiable responses.

Source: https://arxiv.org/abs/2512.22223

TCEval: A Real-World Cognitive Turing Test for AI and Large Language Models

daily ai trends — Wed, 31 Dec 2025 14:00:04 +0000

Most Large Language Model (LLM) benchmarks today focus on abstract reasoning, coding, or text-based intelligence. But how well do AI systems perform when faced with real-world, human-centered decision-making scenarios?

A new paper introduces TCEval, which its authors describe as the first evaluation framework that uses thermal comfort decision-making to assess an AI’s cognitive abilities, offering a way to benchmark AI beyond traditional tests.

🔍 Why Thermal Comfort?

Thermal comfort isn’t trivial. It’s influenced by:

🌡️ Environmental conditions
👕 Clothing insulation
🧍 Human perception
🧠 Adaptive decision-making

This makes it an ideal real-world testbed to evaluate whether AI can:
1️⃣ Perform cross-modal reasoning
2️⃣ Understand causal relationships
3️⃣ Make adaptive, context-aware decisions

🧪 How TCEval Works

TCEval uses LLM “agents” with simulated human traits to:

Choose clothing insulation levels
Provide thermal comfort feedback
Make decisions in varying environmental situations

Their outputs are validated against:

ASHRAE Global Thermal Comfort Database
Chinese Thermal Comfort Database

So instead of testing AI in synthetic lab conditions, TCEval measures AI performance in ecologically valid, human-centric scenarios.
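
As a rough illustration of the setup described above, here is a minimal sketch of how one agent prompt could be assembled. The `Persona` and `Environment` fields, the `build_prompt` helper, and the prompt wording are all assumptions made for this example and are not taken from the paper. The only outside fact used is the standard 7-point thermal sensation scale, which runs from -3 (cold) to +3 (hot).

```python
from dataclasses import dataclass


@dataclass
class Persona:
    """Hypothetical virtual personality attributes for one agent."""
    age: int
    sex: str
    climate_background: str  # e.g. "humid subtropical"


@dataclass
class Environment:
    """Hypothetical environmental inputs for one scenario."""
    air_temp_c: float
    relative_humidity_pct: float
    air_speed_m_s: float


def build_prompt(persona: Persona, env: Environment) -> str:
    """Assemble the text an LLM agent would receive for one scenario."""
    return (
        f"You are a {persona.age}-year-old {persona.sex} who grew up in a "
        f"{persona.climate_background} climate.\n"
        f"The room is {env.air_temp_c:.1f} deg C, "
        f"{env.relative_humidity_pct:.0f}% relative humidity, "
        f"air speed {env.air_speed_m_s:.2f} m/s.\n"
        "1. Choose your clothing insulation in clo.\n"
        "2. Rate your thermal sensation on the 7-point scale "
        "from -3 (cold) to +3 (hot)."
    )


prompt = build_prompt(
    Persona(age=30, sex="female", climate_background="humid subtropical"),
    Environment(air_temp_c=26.0, relative_humidity_pct=60.0, air_speed_m_s=0.1),
)
print(prompt)
```

The agent's two answers (clothing choice and comfort vote) would then be compared with records from the human databases for similar conditions.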

📊 Key Findings

Experiments across four LLMs revealed:

✔ LLMs show foundational cross-modal reasoning ability
✔ Their responses show better directional consistency when a 1 PMV tolerance is allowed

But…

❌ Exact alignment with human responses remains limited
❌ PMV distributions differ significantly from real human data
❌ Models perform near-randomly in discrete thermal comfort classification

In short:

Current LLMs can “reason” about comfort trends…
…but still lack precise causal understanding of how variables interact.
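
To make the difference between exact alignment and alignment within a 1 PMV tolerance concrete, here is a small sketch. The `agreement_rates` function and the vote lists are made up for illustration. The paper's actual metric definitions are not given in this post, so treat this as one plausible reading: a vote counts as "within tolerance" when it is at most one scale unit away from the human vote.

```python
def agreement_rates(agent_votes, human_votes, tolerance=1.0):
    """Return (exact_rate, within_tolerance_rate) for paired votes."""
    if len(agent_votes) != len(human_votes) or not agent_votes:
        raise ValueError("vote lists must be non-empty and of equal length")
    pairs = list(zip(agent_votes, human_votes))
    # Exact alignment: the agent gave the same vote as the human.
    exact = sum(a == h for a, h in pairs) / len(pairs)
    # Tolerant alignment: the agent is at most `tolerance` units away.
    within = sum(abs(a - h) <= tolerance for a, h in pairs) / len(pairs)
    return exact, within


# Illustrative numbers only, not data from the paper.
agent = [0, 1, -1, 2, 0, -2]
human = [0, 0, -2, 0, 1, -2]
exact, within = agreement_rates(agent, human)
print(f"exact: {exact:.2f}, within 1 unit: {within:.2f}")
```

In this toy data only two of six votes match exactly, while five of six fall within one unit, which is the kind of gap the findings above describe.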

🔎 Why This Matters

TCEval shifts AI evaluation from:
❌ Abstract benchmarks
➡️ ✅ Human-centered, embodied, real-world cognition

This opens new doors for:

Smart building systems
Human-environment interaction design
AI agents in real-life decision support
More meaningful AI benchmark development

It acts like a Cognitive Turing Test, pushing AI assessment closer to real human thinking and perception.

📖 Read the Full Paper

🔗 https://arxiv.org/abs/2512.23217

🚀 ReGAIN: A Transparent, AI-Powered Framework for Network Traffic Analysis

daily ai trends — Wed, 31 Dec 2025 14:00:03 +0000

Modern networks generate massive amounts of diverse traffic that must be continuously monitored for security and performance — but traditional network traffic analysis systems often fall short. Whether rule-based or powered by machine learning, they commonly suffer from high false positives and poor explainability, making it hard for analysts to trust their outputs.

💡 Meet ReGAIN — a multi-stage framework that combines:

Traffic Summarization
Retrieval-Augmented Generation (RAG)
Large Language Model (LLM) Reasoning

The goal? Deliver accurate, transparent, and evidence-backed network traffic analysis.

✅ How ReGAIN Works

ReGAIN converts network traffic into natural-language summaries and stores them in a multi-collection vector database. It then uses a hierarchical retrieval pipeline to ground LLM outputs with real, verifiable evidence.
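
A minimal sketch of the first step, turning traffic into a natural-language summary, might look like this. The `FlowWindow` fields and the sentence template are assumptions for this example. The post does not say which features or wording ReGAIN actually uses.

```python
from dataclasses import dataclass


@dataclass
class FlowWindow:
    """Hypothetical aggregate of packets seen in one time window."""
    src_ip: str
    dst_ip: str
    protocol: str
    packet_count: int
    duration_s: float


def summarize(window: FlowWindow) -> str:
    """Render one traffic window as a sentence that can be embedded."""
    # Guard against a zero-length window to avoid division by zero.
    rate = window.packet_count / window.duration_s if window.duration_s > 0 else 0.0
    return (
        f"{window.src_ip} sent {window.packet_count} {window.protocol} packets "
        f"to {window.dst_ip} over {window.duration_s:.1f} s "
        f"({rate:.0f} packets/s)."
    )


summary = summarize(FlowWindow("10.0.0.5", "10.0.0.9", "ICMP", 12000, 4.0))
print(summary)
```

Sentences like this one would then be embedded and stored in the vector database for later retrieval.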

Key components include:

🔎 Metadata-based filtering
🎯 MMR sampling
🔁 Two-stage cross-encoder reranking
🛑 Abstention mechanism to reduce hallucinations

The aim is for answers to be grounded in cited evidence, so they are explainable as well as accurate.
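
Two of the listed components can be sketched in a few lines. MMR (Maximal Marginal Relevance) picks results that are relevant to the query but not redundant with what has already been picked. The abstention check below is a simple score threshold. Everything here is an illustrative assumption: the function names, the toy 2-D vectors, the `lam` weight, and the threshold value are not from the paper, and plain cosine similarity stands in for the embedding and cross-encoder scores a real pipeline would use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k, lam=0.5):
    """Return indices of up to k candidates chosen by MMR."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Redundancy: similarity to the closest already-selected item.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def should_abstain(top_score, threshold=0.3):
    """Hypothetical abstention rule: refuse when evidence is too weak."""
    return top_score < threshold


# Toy vectors: the first two candidates are near-duplicates.
query = [1.0, 0.5]
candidates = [[1.0, 0.25], [1.0, 0.3], [0.3, 1.0]]
picked = mmr_select(query, candidates, k=2)
print(picked)
```

With these toy vectors, the most relevant candidate is picked first, and its near-duplicate is then skipped in favor of the more distinct third candidate.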

📊 Real-World Performance

Evaluated on ICMP ping flood and TCP SYN flood traces from a real-world traffic dataset, ReGAIN achieved:

95.95% – 98.82% accuracy across different attack types and benchmarks

Validation came from:
✔ Dataset ground truth
✔ Human expert assessments

Even better — ReGAIN outperformed:

Rule-based systems
Classical ML models
Deep learning baselines

…while still providing human-readable explanations instead of black-box outputs.

🔐 Why This Matters

Security teams need tools that are:

Reliable
Interpretable
Evidence-backed

ReGAIN bridges the gap between advanced AI capabilities and real-world trust requirements in cybersecurity operations.

📚 Read the full paper here:
https://arxiv.org/abs/2512.22223