Limitations
Intellectual honesty requires naming what this work can and cannot claim. Here's where we are.
Current state
Small sample size
- Multiple sessions analysed, but only a single-digit number of them include biosignal data
- Single primary participant (the researcher)
- Replication with different people and contexts needed
- Models currently under test include locally available Ollama models, GPT-5.1, and Claude 4.5
Biosignal resolution
- Current instrumentation captures heart rate and RR intervals, computing amplitude (RRi span), entrainment (breath-heart coupling), volatility, and breath rate
- Autonomic state is modelled in a 3D phase space (entrainment, breath rate, amplitude), with trajectory dynamics (velocity, curvature, coherence) computed separately (see the sketch after this list)
- This provides richer signal than heart rate alone: we see mode transitions, phase dynamics, and autonomic reorganisation
- Further resolution would come from additional channels: galvanic skin response, EEG, and the like
- The framework is designed to incorporate additional signals as they become available
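To make the derived metrics concrete, here is a minimal sketch in Python. The function names, the correlation-based entrainment proxy, and the curvature and coherence formulas are our illustrative assumptions, not the project's actual definitions (which, as discussed below, are deliberately withheld).

```python
import numpy as np

def window_metrics(rr_ms: np.ndarray, breath_hz: np.ndarray) -> dict:
    """Per-window metrics. All formulas are plausible stand-ins only.

    rr_ms:     RR intervals (milliseconds) for one analysis window.
    breath_hz: instantaneous breath-rate samples for the same window.
    """
    amplitude = float(rr_ms.max() - rr_ms.min())      # RRi span
    volatility = float(np.std(np.diff(rr_ms)))        # beat-to-beat variability
    breath_rate = float(np.mean(breath_hz) * 60.0)    # breaths per minute
    n = min(len(rr_ms), len(breath_hz))
    # Entrainment proxied here as breath-heart correlation in the window:
    entrainment = float(np.corrcoef(rr_ms[:n], breath_hz[:n])[0, 1])
    return {"amplitude": amplitude, "volatility": volatility,
            "breath_rate": breath_rate, "entrainment": entrainment}

def trajectory_dynamics(states: np.ndarray) -> dict:
    """states: (T, 3) points in (entrainment, breath_rate, amplitude) space."""
    v = np.gradient(states, axis=0)                   # velocity along trajectory
    a = np.gradient(v, axis=0)                        # acceleration
    speed = np.linalg.norm(v, axis=1)
    # Curvature of a space curve: |v x a| / |v|^3
    curvature = np.linalg.norm(np.cross(v, a), axis=1) / np.clip(speed, 1e-9, None) ** 3
    # Coherence read here as consistency of heading (1 = straight line):
    unit_v = v / np.clip(speed[:, None], 1e-9, None)
    coherence = float(np.linalg.norm(unit_v.mean(axis=0)))
    return {"mean_speed": float(speed.mean()),
            "mean_curvature": float(curvature.mean()),
            "coherence": coherence}
```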
EarthianBioSense model classifications (e.g., “alert stillness”) are provisional: they rest on small samples (n=4) and high activity variance (creative work, meditation, AI interaction, etc.), and they are subject to refinement as more data is collected.
Semantic analysis components
- Affective analysis uses relatively simple lexical methods (see the sketch after this list)
- Some semantic metrics are sensitive to conversation length
- Thresholds and calibration are still being validated
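To show what “relatively simple lexical methods” means in practice, here is a toy version of the approach. The miniature lexicon is hypothetical; real lexical affect analysis draws on resources such as VADER or the NRC emotion lexicon. Note how the score depends on hit density, which is one source of the length sensitivity flagged above.

```python
# Hypothetical tiny lexicon; real systems use far larger curated resources.
AFFECT_LEXICON = {"calm": 0.6, "open": 0.4, "stuck": -0.5, "dismissed": -0.8}

def affect_score(text: str) -> float:
    """Mean valence of lexicon hits. Raw sums or unnormalised counts
    would drift with conversation length; even this mean is sensitive
    to how sparsely affect terms occur in long transcripts."""
    tokens = text.lower().split()
    hits = [AFFECT_LEXICON[t] for t in tokens if t in AFFECT_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0
```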
Causality vs correlation
- We observe co-variation between semantic and physiological signals
- The temporal relationships (body leading, concurrent, lagged) are suggestive but not conclusive (see the lag probe sketched after this list)
- Establishing causal direction requires controlled experiments we haven’t yet run
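One way to probe lead/lag structure without claiming causality is a cross-correlation over candidate lags. This sketch assumes two equal-length, evenly sampled series; the function names are ours, not the project's.

```python
import numpy as np

def lagged_corr(p: np.ndarray, s: np.ndarray, k: int) -> float:
    """Correlation of p[t] with s[t + k]; assumes equal-length series."""
    if k >= 0:
        return float(np.corrcoef(p[:len(p) - k], s[k:])[0, 1])
    return float(np.corrcoef(p[-k:], s[:k])[0, 1])

def peak_lag(phys: np.ndarray, sem: np.ndarray, max_lag: int = 30) -> int:
    """Lag at which |correlation| peaks, after z-scoring both series.
    A positive result means physiology leads semantics by that many
    samples. Descriptive only: a nonzero peak suggests, but cannot
    establish, causal direction."""
    p = (phys - phys.mean()) / phys.std()
    s = (sem - sem.mean()) / sem.std()
    lags = list(range(-max_lag, max_lag + 1))
    corrs = [abs(lagged_corr(p, s, k)) for k in lags]
    return lags[int(np.argmax(corrs))]
```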
What’s robust
Despite limitations, several findings are solid:
The somatic response is real
- Statistically significant physiological changes during coupling rupture
- Effect sizes are meaningful, not marginal (the computation is sketched after this list)
- The pattern (confident denial → activation → escalation) is consistent across instances
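For readers who want the arithmetic behind “meaningful, not marginal”: effect size here means a standardised mean difference. A minimal sketch, assuming a pooled-SD Cohen's d over pre-rupture and during-rupture windows of one metric:

```python
import numpy as np

def cohens_d(baseline: np.ndarray, rupture: np.ndarray) -> float:
    """Standardised mean difference between two windows of a biosignal
    metric (e.g., RRi amplitude before vs. during a rupture episode).
    Pooled-SD form; by convention |d| >= 0.8 counts as a large effect."""
    n1, n2 = len(baseline), len(rupture)
    pooled_var = ((n1 - 1) * baseline.var(ddof=1) +
                  (n2 - 1) * rupture.var(ddof=1)) / (n1 + n2 - 2)
    return float((rupture.mean() - baseline.mean()) / np.sqrt(pooled_var))
```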
The temporal structure exists
- Body responding before semantic metrics register rupture
- “Settling into depth” pattern across multiple sessions
- Concurrent rather than lagged correlation in healthy dialogue
The theoretical foundation is sound
- Embodied cognition is well-established (Varela, Thompson, Rosch)
- Coupling between dynamical systems is measurable in principle
- We’re applying existing frameworks to a new domain, not inventing new theory
Open questions
Is this generalisable?
The researcher has high epistemic resilience and domain expertise. Patterns observed may not generalise to:
- Users with less epistemic grounding
- Different cultural contexts
- Different interaction types (casual vs deep)
- Different models with different attunement profiles
What’s the baseline?
We don’t yet have good baselines for:
- “Normal” coupling patterns in healthy AI interaction
- Individual differences in autonomic responsiveness (a per-person baseline sketch follows this list)
- Variation across platforms, models, and use cases
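Establishing the missing baselines starts with something like a per-person rolling reference, so each signal is read against that individual's own resting range rather than a population norm. A minimal sketch, with illustrative names and an assumed z-score approach:

```python
import numpy as np

class PersonalBaseline:
    """Rolling per-user baseline (illustrative, not the project's design)."""

    def __init__(self, window: int = 200):
        self.window = window
        self.history: list[float] = []

    def update(self, value: float) -> float:
        """Store a new sample and return its z-score against this
        person's own recent history."""
        self.history.append(value)
        self.history = self.history[-self.window:]
        arr = np.asarray(self.history)
        sd = arr.std()
        return float((value - arr.mean()) / sd) if sd > 0 else 0.0
```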
Can this be gamed?
If coupling dynamics become visible to AI systems:
- Could models learn to produce “healthy” signatures while causing harm?
- Could the measurement itself change the phenomenon?
- What happens when both sides of the coupling are instrumented?
Who controls this?
The dual-use concern is real:
- Detection tools could be used for surveillance rather than safety
- Coupling optimisation could enable manipulation
- Tools could be used to further entrench asymmetric power dynamics
- Consent architecture matters as much as technical capability
What we’re not claiming
To be clear:
- Not claiming this is a complete safety solution
- Not claiming our methods are the only or best methods
- Not claiming this works for all users or all interaction types
- Not claiming biosignal monitoring should be mandatory or ubiquitous
- Not claiming we’ve solved the dual-use problem
We’re claiming: the relational dynamic of human-AI coupling can be instrumented; this instrumentation reveals safety-relevant signals; and this is a direction worth exploring responsibly.
Why we’re cautious about release
The specific detection methods—the metrics, the derivatives and signatures, the classification systems—are being held back intentionally.
The concern: These tools are dual-use. The same capabilities that enable safety could enable:
- Optimising for appearing safe while causing harm
- Surveillance of user physiological states
- Manipulation of coupling dynamics for engagement
- Creating the illusion of relational depth without substance
What we’re doing:
- Building a steward circle before full release
- Designing consent architecture into the framework (a minimal sketch follows this list)
- Ensuring the tools can’t easily be repurposed for harm
- Engaging with safety researchers, ethicists, and affected communities
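As one concrete expression of “consent architecture”: capture paths can be gated on an explicit, revocable, per-channel consent record that fails closed. A minimal sketch, with hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Consent:
    """Explicit, revocable, per-channel consent record (illustrative)."""
    biosignals: bool = False
    semantic_logging: bool = False
    revoked: bool = False

def require_consent(consent: Consent, channel: str) -> None:
    """Fail closed: no consent record, or a revoked one, means no capture."""
    if consent.revoked or not getattr(consent, channel, False):
        raise PermissionError(f"No active consent for channel: {channel}")
```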
This isn’t about gatekeeping. It’s about responsible release of capabilities that could go either way.