AUTHOR: Evan T. Kotler
Abstract
Information theory is traditionally formulated in probabilistic terms, with entropy and channel capacity defined relative to probability measures. This paper demonstrates that such probabilistic primitives are not fundamental. Working within a framework of finite relational systems subject to admissible extension and stability requirements, we show that structural compression alone induces irreversible loss of distinguishability, from which information-theoretic quantities arise necessarily. Entropy emerges as a canonical numerical invariant associated with compression, while channels are identified as admissible witness protocols mediating compression between relational regimes. Standard information-theoretic inequalities follow as consequences of structural monotonicity rather than probabilistic assumptions. Crucially, the distinction between classical and quantum information is shown to arise from the commutativity properties of admissible refinement, without invoking probability amplitudes or Hilbert spaces. Probability, when it appears at all, enters only as a derived frequency associated with compression classes and is not required for the definition of information or entropy.
1. Introduction
Information theory is commonly presented as inseparable from probability. Entropy is defined as an expectation value, channels as stochastic maps, and information measures as functionals over probability distributions. While operationally successful, this formulation leaves open a foundational question:
Is probability required in order for information to exist?
This paper answers that question in the negative.
Building on a framework in which probability itself is shown to emerge structurally from non-collapse in finite relational systems, we demonstrate that information-theoretic structure arises prior to and independently of probability. The core mechanism is structural compression: the forced identification of indistinguishable relational configurations under admissible extension.
Entropy, in this setting, is not a measure of uncertainty, ignorance, or belief. It is a numerical invariant quantifying irreversible loss of distinguishability induced by compression. Channels are not stochastic processes but admissible witness protocols mediating compression between relational regimes.
The analysis is deliberately austere. We assume:
no probability measures,
no randomness,
no observers or measurements,
no amplitudes or Hilbert spaces.
Information emerges nonetheless.
2. Structural Framework and Assumptions
This paper assumes the framework of finite relational systems and admissible extension developed previously. For completeness, we summarize the relevant commitments.
2.1 Finite Relational Systems
A finite relational system consists of a finite set of relational configurations together with admissible relational compositions. Configurations are not interpreted as states or microstates; all structure is relational.
2.2 Admissible Extension and Stability
Admissible extensions enlarge relational context without introducing unbounded distinguishability or unstable structure. Only properties preserved under all admissible extensions are licensed as structurally meaningful.
2.3 Compression
Behaviorally indistinguishable configurations—those not separable by any admissible extension—are necessarily identified. This forced identification is called compression.
Compression is:
structural, not epistemic;
irreversible under admissible operations;
unavoidable under stability requirements.
No probabilistic assumptions enter at this stage.
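To make the mechanism concrete, the following sketch models compression under a toy encoding that is not part of the framework itself: configurations are plain labels, and each "admissible extension" is modeled as a function assigning a relational role to every configuration. All names here (`compress`, the example configurations and extensions) are illustrative assumptions.

```python
# A minimal sketch, assuming a toy encoding: two configurations are
# behaviorally indistinguishable when no extension separates them, and
# compression groups them into a single equivalence class.

def compress(configurations, extensions):
    """Group configurations into classes no extension can separate."""
    classes = {}
    for c in configurations:
        signature = tuple(ext(c) for ext in extensions)
        classes.setdefault(signature, []).append(c)
    return list(classes.values())

# Hypothetical example: four configurations, two extensions.
configs = ["a", "b", "c", "d"]
ext1 = lambda c: c in ("a", "b")   # separates {a, b} from {c, d}
ext2 = lambda c: c == "a"          # separates a from everything else
print(compress(configs, [ext1, ext2]))   # [['a'], ['b'], ['c', 'd']]
```

No extension separates c from d, so they are forcibly identified; a and b remain distinct because ext2 separates them.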
3. Distinguishability and Information
Information is often conflated with probability. Here we separate them.
Definition 3.1 (Structural Distinguishability)
Two configurations are distinguishable if some admissible extension assigns them different relational roles. Otherwise, they are indistinguishable.
Definition 3.2 (Information Content)
The information content of a relational description is the number of structurally distinguishable equivalence classes it supports.
This definition makes no reference to probability, expectation values, or observers. Information is identified with structured distinguishability, not with uncertainty.
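A corresponding sketch of Definitions 3.1 and 3.2, under the same toy encoding as above: information content is simply the count of classes that admissible extensions can tell apart, and nothing probabilistic is computed anywhere.

```python
# Definitions 3.1 and 3.2 in the toy encoding: configurations are
# distinguishable when some extension assigns them different roles, and
# information content counts the distinguishable classes.

def information_content(configurations, extensions):
    """Number of structurally distinguishable equivalence classes."""
    signatures = {tuple(ext(c) for ext in extensions) for c in configurations}
    return len(signatures)

configs = ["a", "b", "c", "d"]
exts = [lambda c: c in ("a", "b")]          # one coarse extension
print(information_content(configs, exts))   # 2: classes {a, b} and {c, d}
```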
4. Irreversible Compression and Information Loss
Lemma 4.1 (Forced Information Loss)
Whenever admissible extension forces the identification of previously distinct configurations, the resulting loss of information is irreversible.
Proof. Compression identifies configurations that no admissible extension separates. By admissibility, no subsequent extension can restore a distinction that fails stability, so the identification cannot be undone. ∎
This irreversibility is structural, not temporal. It does not rely on dynamics or entropy increase in time.
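A minimal sketch of the irreversibility claim, still in the toy encoding (the merge below is hypothetical): compression is a many-to-one map on configurations, so no map defined on the merged classes can recover the lost distinction.

```python
# Compression sends both c and d to the same class "cd". Any candidate
# "decompression" is a function on class labels, so it must send "cd" to
# a single configuration and cannot return both c and d.
compression = {"a": "a", "b": "b", "c": "cd", "d": "cd"}

for candidate in ("c", "d"):
    restore = {"a": "a", "b": "b", "cd": candidate}
    round_trip = {x: restore[compression[x]] for x in "abcd"}
    print(round_trip == {x: x for x in "abcd"})   # False both times
```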
5. Entropy as a Compression Invariant
We now show that entropy arises as a numerical invariant of compression, without probability.
Definition 5.1 (Compression Profile)
Let a relational system compress from an initial set of distinguishable configurations \( \Omega \) to a set of equivalence classes \( \Omega/\sim \). The compression profile is the multiset of cardinalities of these classes.
Definition 5.2 (Structural Entropy)
Structural entropy is any monotone numerical function of the compression profile that is invariant under admissible renaming.
Theorem 5.3 (Uniqueness up to Scale)
Up to choice of logarithmic base and additive constant, the unique admissible structural entropy is:
\[ S = \log |\Omega| - \log |\Omega/\sim|. \]
Proof (Sketch). Admissibility forbids dependence on labels or additional structure. Monotonicity under further compression and additivity under independent composition uniquely fix the logarithmic form. ∎
Entropy thus quantifies irreversible compression, not probabilistic uncertainty.
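A numerical sketch of Theorem 5.3 (the partitions below are hypothetical): the log-ratio form makes entropy additive under independent composition and monotone under further compression, which is exactly what the proof sketch appeals to.

```python
import math

def structural_entropy(n_configurations, n_classes):
    """S = log2|Omega| - log2|Omega/~| (entropy in bits)."""
    return math.log2(n_configurations) - math.log2(n_classes)

# Eight configurations compressed into two classes: S = 3 - 1 = 2 bits.
print(structural_entropy(8, 2))                              # 2.0

# Additivity under independent composition: an 8 x 4 system
# compressing to 2 x 2 classes carries entropy 2 + 1.
print(structural_entropy(8 * 4, 2 * 2))                      # 3.0
print(structural_entropy(8, 2) + structural_entropy(4, 2))   # 3.0

# Monotonicity: merging the two classes into one only raises S.
print(structural_entropy(8, 1))                              # 3.0
```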
6. Channels as Witness Protocols
In standard information theory, channels are stochastic maps. Here they arise structurally.
Definition 6.1 (Admissible Channel)
An admissible channel is a structured protocol that:
witnesses distinctions in one relational regime,
maps them into another regime,
induces compression consistent with admissible extension.
Channels are not random processes; they are relational mediators of compression.
Lemma 6.2 (Data Processing as Structural Monotonicity)
Information content cannot increase under admissible channels; equivalently, structural entropy cannot decrease.
Proof. Channels induce further compression or preserve existing equivalence classes, so the number of distinguishable classes cannot grow; admissibility forbids refinement. ∎
Standard data-processing inequalities follow immediately.
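A sketch of Lemma 6.2 in the toy encoding (the channel below is hypothetical): a channel is modeled as a plain function between regimes, and pushing classes through it can only merge those whose images coincide, so distinctions never increase and structural entropy never decreases.

```python
import math

def image_classes(classes, channel):
    """Push equivalence classes through a channel, merging any whose images coincide."""
    merged = {}
    for cls in classes:
        key = frozenset(channel(c) for c in cls)
        merged.setdefault(key, []).extend(cls)
    return list(merged.values())

def entropy(n_configs, n_classes):
    return math.log2(n_configs) - math.log2(n_classes)

source = [["a"], ["b"], ["c", "d"]]                  # 4 configurations, 3 classes
channel = lambda c: "x" if c in ("a", "b") else "y"  # merges a with b
target = image_classes(source, channel)
print(target)                                        # [['a', 'b'], ['c', 'd']]
print(len(target) <= len(source))                    # True: no refinement
print(entropy(4, len(target)) >= entropy(4, len(source)))   # True
```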
7. Classical and Quantum Information Without Amplitudes
A key result of this framework is that the classical/quantum distinction does not require probability amplitudes.
Definition 7.1 (Refinement Structure)
A refinement is commutative if the order in which admissible distinctions are drawn does not affect the resulting equivalence classes, and non-commutative otherwise.
Theorem 7.2 (Structural Origin of the Classical/Quantum Distinction)
Classical information corresponds to commutative refinement of distinguishability. Quantum information corresponds to non-commutative refinement.
Proof (Structural). Non-commutative refinement obstructs simultaneous maximal distinguishability, yielding interference-like effects at the level of compression, without invoking amplitudes. ∎
Quantum information phenomena thus arise before representation, not because of it.
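A sketch of the distinction in the toy encoding: refinements act on the current partition of configurations, and the second operator below (hypothetical, like everything in the sketch) disturbs distinctions the first has drawn, which is what makes the composition order-dependent.

```python
def split(partition, predicate):
    """Refine each class by a yes/no distinction."""
    out = []
    for cls in partition:
        yes = [c for c in cls if predicate(c)]
        no = [c for c in cls if not predicate(c)]
        out.extend(p for p in (yes, no) if p)
    return out

def disturb_then_split(partition, predicate):
    """A non-classical refinement: erase prior distinctions, then split."""
    merged = [sorted(sum(partition, []))]   # collapse back to one class
    return split(merged, predicate)

start = [["a", "b", "c", "d"]]
q1 = lambda c: c in ("a", "b")
q2 = lambda c: c in ("a", "c")

r1 = disturb_then_split(split(start, q1), q2)   # q1 first, then disturbing q2
r2 = split(disturb_then_split(start, q2), q1)   # disturbing q2 first, then q1

print(r1)         # [['a', 'c'], ['b', 'd']]  -- q1's distinction is lost
print(r2)         # [['a'], ['c'], ['b'], ['d']]  -- both distinctions survive
print(r1 == r2)   # False: these refinements do not commute
```

When both refinements are plain splits, the composition yields the same partition in either order; the obstruction to simultaneous maximal distinguishability appears only when one refinement disturbs the other's distinctions.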
8. Relation to Probability
Probability plays no role in the derivation of information or entropy.
When probability appears at all, it does so only as a derived frequency associated with compression classes, contingent on additional structural conditions.
Information does not presuppose probability; rather, probability presupposes compression, which already carries information-theoretic structure.
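A sketch of the derived role of frequency, again in the toy encoding: whatever looks like a probability here is just the relative size of a compression class, read off after compression has already happened.

```python
def derived_frequencies(classes):
    """Relative cardinality of each compression class."""
    total = sum(len(cls) for cls in classes)
    return {tuple(cls): len(cls) / total for cls in classes}

classes = [["a"], ["b"], ["c", "d"]]
print(derived_frequencies(classes))
# {('a',): 0.25, ('b',): 0.25, ('c', 'd'): 0.5}
```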
9. Scope and Limits
This paper:
derives entropy without probability,
derives channels without stochastic maps,
distinguishes classical and quantum information without amplitudes.
It does not:
derive probability measures,
introduce universality claims,
or introduce representational structures.
Those questions are addressed elsewhere.
10. Conclusion
Information theory does not require probability at its foundation. When stability under admissible extension is taken seriously, structural compression is unavoidable, and with it come information loss, entropy, and channel structure. Probability enters only later, as a derived summary of compressed structure. This repositioning clarifies the conceptual status of information and prepares the ground for subsequent analyses of universality and representation.