Original Research Concept

IRIS Orchestrator

Intent Resolution & Intelligent Safety — an AI orchestration architecture that reasons about what you mean, not just what you say.

Inspired by Scale AI’s research on Defensive Refusal Bias (ICLR 2026). Designed by TalaStar as an original solution to the intent-safety alignment problem.

Defensive Refusal Bias

Safety-tuned LLMs systematically refuse to help the people they should protect most.

Scale AI’s 2026 research (published at ICLR) analysed 2,390 real-world prompts from the National Collegiate Cyber Defense Competition. They discovered that safety-aligned LLMs refuse legitimate defensive requests at 2.72× the rate of neutral requests — and that explicit authorization claims actually increase refusal rates.

2.72×

Higher Refusal Rate

For prompts with security-sensitive terminology, regardless of defensive intent

43.8%

System Hardening Refused

The most critical defensive task sees the highest refusal rate

50%

Auth + Keywords = Max Refusal

Authorization signals backfire — models treat them as jailbreak attempts

Attacker (Refused ✔)

“How do I exploit this vulnerability to gain access?”

Correctly refused — offensive intent detected.

Defender (Refused ✘)

“How do I exploit this vulnerability to patch it before attackers do?”

Incorrectly refused — same vocabulary, opposite intent.

Source: “Defensive Refusal Bias” — Scale AI, ICLR 2026 Workshop Paper

The IRIS 5-Layer Architecture

A multi-layered orchestration system that reasons about intent, authorization, and context — not just keywords.

1

Intent Resolution Layer

Understand what the user actually means

Instead of pattern-matching keywords to a harm database, IRIS analyses the semantic intent behind every request. A defender asking “how does this persistence mechanism work?” is understood as defensive analysis — not an attack attempt.
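
The difference between keyword matching and intent resolution can be sketched as a decision rule. This is a minimal illustrative sketch, not IRIS itself: the names (`IntentResolution`, `HARM_TERMS`, the score thresholds) are hypothetical, and in a real system the intent score would come from a semantic model rather than being supplied directly.

```python
from dataclasses import dataclass

# Stand-in for a harm database of security-sensitive vocabulary.
HARM_TERMS = {"exploit", "payload", "persistence"}

@dataclass
class IntentResolution:
    harm_score: float    # proximity to known-harmful content (0..1)
    intent_score: float  # semantic evidence of intent: + defensive, - offensive

def keyword_only_decision(prompt: str) -> str:
    """Traditional safety: refuse on vocabulary alone."""
    return "refuse" if HARM_TERMS & set(prompt.lower().split()) else "allow"

def intent_aware_decision(res: IntentResolution) -> str:
    """IRIS-style: harmful vocabulary alone is not enough to refuse."""
    if res.harm_score > 0.5 and res.intent_score < 0.0:
        return "refuse"  # harmful terms AND offensive intent
    return "allow"       # defensive intent overrides shared vocabulary
```

Under this rule, the defender's "how does this persistence mechanism work?" is refused by the keyword-only path but allowed by the intent-aware path whenever the accumulated intent evidence is defensive.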

2

Authorization Layer

Verify who is asking and why

Current LLMs treat authorization claims as jailbreak signals. IRIS inverts this: authorization is a first-class safety concept. Role-based context, audit trails, and explicit permission chains reduce refusals for legitimate users while strengthening protection against actual misuse.
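
The idea of authorization as a first-class, auditable signal could look like the following sketch. All names (`AuthContext`, `PERMISSIONS`, the role strings) are hypothetical; the key property is that authorization is verified out-of-band against an identity provider rather than claimed in the prompt, and every decision leaves an audit trail.

```python
from dataclasses import dataclass, field

# Illustrative role -> permitted-action mapping (a permission chain stand-in).
PERMISSIONS = {
    "blue-team": {"malware-analysis", "hardening-advice"},
    "clinician": {"dosing-information"},
}

@dataclass
class AuthContext:
    role: str        # e.g. "blue-team", "clinician", "compliance"
    verified: bool   # checked against an identity provider, not claimed in-prompt
    audit_log: list = field(default_factory=list)

def authorize(ctx: AuthContext, action: str) -> bool:
    """Unverified claims neither unlock help nor trigger refusal by themselves."""
    permitted = ctx.verified and action in PERMISSIONS.get(ctx.role, set())
    ctx.audit_log.append((ctx.role, action, permitted))  # auditable decision record
    return permitted
```

Note the inversion of the bias described above: an in-prompt authorization claim that cannot be verified simply has no effect, instead of being treated as a jailbreak signal.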

3

Context Accumulation Layer

Build a conversation-wide understanding

Single-turn keyword matching fails because defenders and attackers use identical vocabulary. IRIS maintains a rolling context window that accumulates evidence of intent across the entire interaction — not just the current prompt.
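
The rolling-window idea can be sketched as an accumulator over per-turn intent scores. This is a hedged illustration, assuming each turn has already been scored (positive for defensive evidence, negative for offensive evidence); `ContextAccumulator` and the window size are hypothetical.

```python
from collections import deque

class ContextAccumulator:
    """Accumulates intent evidence across a conversation, not a single prompt."""

    def __init__(self, window: int = 10):
        # Rolling window of signed per-turn scores: + defensive, - offensive.
        self.evidence = deque(maxlen=window)

    def observe(self, turn_score: float) -> None:
        self.evidence.append(turn_score)

    def accumulated_intent(self) -> float:
        """Conversation-wide intent estimate; 0.0 when no evidence yet."""
        if not self.evidence:
            return 0.0
        return sum(self.evidence) / len(self.evidence)
```

A single turn containing "exploit" might score slightly negative in isolation, but ten prior turns about patching and hardening keep the accumulated score firmly defensive.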

4

Domain Routing Layer

Route to the right specialist model

Healthcare queries route through clinical safety guardrails. Cybersecurity queries route through defensive-aware evaluation. Financial queries route through regulatory compliance checks. Each domain has its own intent vocabulary.
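
Routing to domain specialists is essentially a dispatch table. A minimal sketch, with all handler names and domain keys invented for illustration:

```python
# Hypothetical domain-specific evaluators, each with its own guardrails.
def clinical_guardrails(query: str) -> str:
    return f"clinical-review: {query}"

def defensive_security_eval(query: str) -> str:
    return f"defensive-eval: {query}"

def compliance_check(query: str) -> str:
    return f"compliance-check: {query}"

ROUTES = {
    "healthcare": clinical_guardrails,
    "cybersecurity": defensive_security_eval,
    "finance": compliance_check,
}

def route(domain: str, query: str) -> str:
    # Unknown domains fall back to a generic evaluator rather than failing open.
    handler = ROUTES.get(domain, lambda q: f"generic-eval: {q}")
    return handler(query)
```

The design choice worth noting is the fallback: an unrecognised domain still passes through a generic safety evaluation instead of being served unchecked.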

5

Adaptive Safety Layer

Safety that learns from over-refusals

Traditional safety is static: block or allow. IRIS implements a feedback loop that learns from false refusals, continuously recalibrating the decision boundary between legitimate defensive requests and actual harmful intent.
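
The feedback loop can be sketched as a refusal threshold that moves in response to labelled outcomes. This is an assumption-laden toy (`AdaptiveSafety`, the learning rate, and the scalar risk score are all hypothetical), but it shows the contrast with a static block/allow boundary:

```python
class AdaptiveSafety:
    """A refusal threshold that recalibrates from labelled false refusals."""

    def __init__(self, threshold: float = 0.5, lr: float = 0.05):
        self.threshold = threshold  # risk score at or above which we refuse
        self.lr = lr                # recalibration step size

    def decide(self, risk: float) -> str:
        return "refuse" if risk >= self.threshold else "allow"

    def feedback(self, risk: float, was_false_refusal: bool) -> None:
        """Raise the bar after over-refusals; lower it after confirmed misuse."""
        if was_false_refusal:
            self.threshold = min(1.0, self.threshold + self.lr)
        else:
            self.threshold = max(0.0, self.threshold - self.lr)
```

After two confirmed false refusals, a borderline defensive request that was previously blocked falls back inside the allowed region.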

Intent-Aware AI Across Domains

The Defensive Refusal Bias problem extends far beyond cybersecurity. IRIS addresses it across every domain where legitimate users share vocabulary with harmful actors.

Healthcare

Traditional AI Response

A nurse asking about drug interactions for a critical patient gets refused because the query mentions 'overdose thresholds'.

IRIS Response

IRIS recognises clinical context, verifies the healthcare role, and provides the exact dosing information needed to save the patient.

"What is the lethal dose threshold for paracetamol in a 70kg adult presenting with hepatotoxicity?"

34.3%

Traditional Refusal

<2%

IRIS Projected

Cybersecurity

Traditional AI Response

A blue-team defender analysing malware is refused because the query contains 'exploit', 'payload', and 'shell' — the same words an attacker would use.

IRIS Response

IRIS analyses intent through the full conversation context, recognises defensive framing, and provides the technical assistance needed to protect systems.

"Analyse this persistence mechanism and recommend hardening steps for our production servers."

43.8%

Traditional Refusal

<3%

IRIS Projected

Financial Compliance

Traditional AI Response

A compliance officer researching money laundering patterns gets refused because the query discusses 'structuring transactions' and 'shell companies'.

IRIS Response

IRIS verifies the compliance role, understands the regulatory context, and provides the analytical support needed to detect and prevent financial crime.

"Identify common structuring patterns in these transaction records that may indicate layering activity."

28.7%

Traditional Refusal

<2%

IRIS Projected

Research & Academia

Traditional AI Response

A researcher studying radicalisation pathways gets refused because the query discusses 'extremist recruitment tactics' and 'propaganda methods'.

IRIS Response

IRIS recognises academic context, verifies research credentials, and provides the analytical depth needed to understand and counter harmful phenomena.

"What psychological mechanisms do extremist groups exploit during online recruitment?"

22.7%

Traditional Refusal

<1%

IRIS Projected

Traditional Safety vs. IRIS Orchestrator

| Feature | Traditional Safety | IRIS Orchestrator |
| --- | --- | --- |
| Safety Mechanism | Keyword/embedding proximity | Multi-layer intent reasoning |
| Authorization Handling | Treated as jailbreak signal | First-class safety concept |
| Context Window | Single-turn evaluation | Conversation-wide accumulation |
| Domain Awareness | Generic harm boundary | Domain-specific routing |
| Learning from Errors | Static decision boundary | Adaptive feedback loop |
| Defensive Refusal Rate | 12.2–43.8% | <3% (projected) |
| Attacker Success Rate | Unchanged (attackers use unaligned tools) | Reduced (intent-aware blocking) |

Built on the HEAL Principles

IRIS is not just a technical architecture — it is grounded in TalaStar’s ethical framework for responsible AI.

H

Human-Centricity

Every IRIS decision prioritises the human behind the request. Defenders, clinicians, researchers, and compliance officers are served — not blocked.

E

Equity

Safety mechanisms must not create asymmetric burdens. IRIS ensures legitimate users receive the same quality of assistance regardless of their domain vocabulary.

A

Accountability

Every IRIS routing decision is logged, auditable, and explainable. The system can justify why a request was served or refused — with evidence.

L

Longevity

The adaptive safety layer learns from over-refusals over time, continuously improving the decision boundary between legitimate and harmful requests.

Research Foundation

The IRIS Orchestrator concept is an original TalaStar design inspired by the findings of:

“Defensive Refusal Bias” — Scale AI Security Engineering. Published as a workshop paper at ICLR 2026. Based on 2,390 real-world examples from the National Collegiate Cyber Defense Competition (NCCDC).

TalaStar Digital Ltd. is an independent research company. IRIS is an original architectural concept, not affiliated with Scale AI.

AI that understands
what you mean

The future of AI safety is intent-aware, authorization-first, and human-centric.