Probability is Relativistic

Published by Hadi


Probability depends on the observer's knowledge, obviously!


Above wallpaper: reference1

Probability is a vector, and it's so obvious to me, so let's write about it! To do so, as always, let's define the basics first.

But before going further, I just have a side note. I notice some people read the previous paragraph, and similar ones, and then think that I think I am smart! Unfortunately, I am not smart; sometimes I am quite stupid! In fact, my understanding of how I could sometimes find a small bright pebble along the shore of knowledge that nobody has found before is as follows. Building upon how Newton described it, the attack surface of the ocean of knowledge, which would be its shore of course, is so enormously vast that even if all the smart people in history, with the help of all the artificial intelligence of the future, gathered their forces to sweep all the shores, they could not cover it all. Therefore, there's a random element in choosing what should be explored, and while we humans are deterministic machines, the random seeds are the inputs from our lifestyle and environment. Finding the brighter pebble is hard by itself and will take decades, but if you try to have a unique lifestyle, you will find the bureaucracy of academia fighting back! Over the history of science, we see scientists sharing their knowledge in private letters, yet in 2025 sharing via a public post is apparently not science! The promise of heavy regulation is to protect scientists against fraud, which didn't happen, and will never happen! It's beyond my understanding why having a minimum number of regulations is even a debate! Having a minimum number of regulations is how any large-scale structure in reality works! In summary, I probably just have a different lifestyle and environment than all the smart people, so I can find some bright pebbles, even though I am not smart. You can do that too!

Let's come back to the definition of probability.

Probability is the branch of mathematics and statistics concerning events and numerical descriptions of how likely they are to occur. -- Wikipedia

This means that for any probabilistic system there is a fixed number of cases, which we'll call event types, among which one will happen, and we can assign numbers to the likelihood of each happening.

So far so good! However, it's much harder to find a definition of relativity that matches my understanding of this subject! Relativity in its Wikipedia page is described via the Einstein theory of relativity2, which looks like an indirect description rather than pointing at what it actually is! First of all, Galileo Galilei explained relativity first; you can find the details on the Galilean invariance page of Wikipedia3, which doesn't have a wrong title, but I would change the title to Galilean Relativity! Second, the concept of relativity, especially after General Relativity4, is much closer to the concept of scalars/vectors/tensors5, and you can define it like the following.

Relativity is the existence of linear transformations among certain observations of different observers. An observation can be expressed as an array of numbers associated with the coordinates of an observer, and we call those arrays of numbers scalars, vectors, or tensors.

It is clear that the concepts of relativity and scalars/vectors/tensors have been merged for a century now, but apparently nobody finds that significant enough to mention in our most used encyclopedia! Yes, before Special Relativity6 we didn't consider time an axis of our coordinates, but after that, what's our excuse to keep defining relativity otherwise?

Next, we're going to define probability as a vector, and then we're going to prove it satisfies the necessary constraints.

Probability as vector

The way we're going to define this vector is inspired by the way Quantum Mechanics7 works with probability; however, it's not compatible with what we have here! You can find more discussion about it below.

Let's have a manifold8, which locally is, obviously, a Euclidean space9, \(\mathbf{E}^S\), with \(S\) dimensions, where \(S\) is the number of types of possible events, event types, that can occur in an observed system. The probability of an event of type \(i\) happening is denoted by \(P_i=a_i/Z\), where the \(a_i\)s are the elements of a vector, let's name it \(\overrightarrow A\), and the relation below is always satisfied.

\[ \sum_i P_i=\sum_{i \in E} \frac{1}{Z}\times a_i=1 \]

Or

\[ \sum_{i \in E} a_i=Z \]

Where \(E=\{\text{all event types}\}\). Be aware that, unlike Wikipedia, we define an event as what we observe at one point in space-time, the same as what we have in General Relativity4; Wikipedia's definition of event is what we call an event type here.
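As a minimal sketch of the normalization above (the weights \(a_i\) here are hypothetical, not tied to any particular system), exact fractions make the constraint \(\sum_i P_i = 1\) easy to verify:

```python
from fractions import Fraction

# Hypothetical unnormalized weights a_i for S = 3 event types.
a = [2, 5, 3]

# Z is the normalization constant: the sum of the a_i over all event types E.
Z = sum(a)

# P_i = a_i / Z, so the probabilities sum to exactly 1.
P = [Fraction(a_i, Z) for a_i in a]

print(P)            # [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
print(sum(P) == 1)  # True
```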

In the paragraphs above we only defined that array of numbers, without showing how it is a vector! To do so, we need to dig deeper.

Even though we have multiple interpretations of probability, such as Frequentist probability10, Propensity probability11, or Bayesian probability12, in the end, when we want to calculate a probability distribution, we always use Empirical probability13, also called relative frequency, which one may define as below.

Given an event \(A\) in a sample space, the relative frequency of \(A\) is the ratio \(\frac{m}{n}\), \(m\) being the number of outcomes in which the event \(A\) occurs, and \(n\) being the total number of outcomes of the experiment. -- Wikipedia

Notice, what Wikipedia calls an event we call an event type here; also, Wikipedia's outcome is what we refer to as an event, since we want compatibility with General Relativity4.

We always use Empirical probability13; otherwise, people just assume some distribution, like the Boltzmann distribution14, calculate something, then measure some consequences of that choice to argue for that distribution. Here, we're interested in calculating the probability distribution directly from reality, which is always a fraction based on Empirical probability.

Let's assume we have a finite sample of events of a number of types. We can count the events of each type. For instance, we have \(m_1\) events of type one, then \(m_2\), etc. Therefore, the probability distribution would be

\[ P_i=\frac{m_i}{\sum_j m_j} \]

The point is, there's no other pragmatic way to calculate a probability distribution out of reality, based on the explanation above; even though people usually don't take the formula above as an axiom, here we do. Now that we know the \(m_i\)s are counts of events, we know there must be a variable like time, \(t\), such that \(m_i(t)\) is always increasing, since counting is accumulative.

\[ P_i(t)=\frac{m_i(t)}{\sum_j m_j(t)} \]

Based on Frequentist probability10, we should calculate the limit of \(P_i(t)\) as \(t\) increases to infinity. Based on Propensity probability11, we increase \(t\) until \(P_i(t)\) becomes stable. Additionally, in Bayesian probability12 we don't need to go back and count again after each tick of the clock, \(t\); we can calculate the probabilities with the update function of Bayesian inference15, which is obviously compatible with going back and counting again. Thus, the formulation above covers all the interpretations at hand. This is why we took the formula above as an axiom.
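A small simulation can illustrate this axiom at work (the three-type system and its weights below are made up for demonstration): the cumulative counts \(m_i(t)\) grow forever, while \(P_i(t)=m_i(t)/\sum_j m_j(t)\) stabilizes as \(t\) grows.

```python
import random

random.seed(42)

# Simulate a hypothetical biased three-sided die (event types 0, 1, 2)
# with true weights 0.5, 0.3, 0.2, and accumulate m_i(t): the cumulative
# count of each event type after t observations.
weights = [0.5, 0.3, 0.2]
m = [0, 0, 0]

for t in range(1, 100_001):
    i = random.choices(range(3), weights=weights)[0]
    m[i] += 1

# P_i(t) = m_i(t) / sum_j m_j(t): the empirical distribution at time t.
total = sum(m)
P = [m_i / total for m_i in m]
print([round(p, 3) for p in P])  # close to [0.5, 0.3, 0.2]
```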

Now that we are familiar with \(m_i(t)\) we can think about its Taylor series16. With \(c_{ij}\) as the coefficients of this series we have

\[ m_i(t)=\sum_j c_{ij}t^j \]

Hence we have

\[ P_i(t)=\frac{\sum_j c_{ij}t^j}{\sum_k \sum_j c_{kj}t^j} \]

Notice, since \(m_i(t)\) must be increasing, it must have at least one non-zero term, \(c_{iu} \neq 0\) with \(u > 0\), in its series, which serves as a lower bound in its Taylor series.

Remember \(m_i(t)\) must be an increasing function, and in all the interpretations above we get stronger results as we increase \(t\). Therefore, we basically always end up calculating the limit of the expression above as \(t\) increases to infinity, if you are familiar with how we prove the limit of any function! Someone could ask: you said above that infinity is paradoxical, so why are you using it? I would respond: one of the problems that makes it paradoxical is that when we talk about infinity we are not necessarily talking about the same thing unless we fix the context first. For instance, the infinity when we increase the variable of the limit just asserts the behavior of the function as we increase its variable step by step, which totally makes sense, so it's a valid process. In summary, the assumption of the existence of an infinite iteration is paradoxical, like where we assume real numbers exist, but it's safe for \(t\) to approach what we call infinity in the definition of the limit, as long as we don't rely on any infinite iteration.

Obviously, I just paved the way to use the theorems about limits, especially L'Hôpital's rule17. By applying this rule to the expression above, we have the following.

\[ P_i=\lim_{t\to \infty}\frac{\sum_j c_{ij}t^j}{\sum_k \sum_j c_{kj}t^j}\ =\ \begin{cases} 0, &\text{if } u_i < u_{max}\\ \frac{u_{i}!\,c_{i,u_{i}}}{u_{max}!\sum_{k \in U} c_{k,u_{max}}}, &\text{if } u_i = u_{max},\ U = \{i\,|\, u_i = u_{max}\} \end{cases} \]

Where \(u_i=\max \{j\,|\,c_{ij} \neq 0\}\), and \(u_{max}=\max \{u_j\,|\,j \in E\}\). This shows that \(u_{max}\), and accordingly \(u_i\), must exist; therefore, \(m_i(t)\) must be a polynomial function18, otherwise the probability will not converge, or the probability distribution will collapse. The existence of an upper bound in its Taylor series completes our picture of this series. For instance, if you find an event type whose counted frequency is an exponential function with respect to time, the system is not a probabilistic system. It has collapsed to a deterministic one!
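A numeric illustration of this dichotomy, with counting functions chosen purely for demonstration: two polynomial counts of the same degree give a probability converging to the ratio of their leading coefficients, while an exponential count collapses the distribution.

```python
# Compare the limiting behavior of P_1(t) = m_1(t) / (m_1(t) + m_2(t))
# for hypothetical counting functions (not tied to any real system).

def P1(m1, m2, t):
    return m1(t) / (m1(t) + m2(t))

# Polynomial counts of the same degree u = 2: the limit is the ratio
# of the leading Taylor coefficients, 3 / (3 + 1) = 0.75.
poly1 = lambda t: 3 * t**2 + 5 * t
poly2 = lambda t: t**2 + 7

# Exponential count for type 1: P_1(t) -> 1, so the distribution
# collapses and the system behaves deterministically.
expo1 = lambda t: 2**t

for t in (10, 100, 1000):
    print(t, round(P1(poly1, poly2, t), 4), round(P1(expo1, poly2, t), 4))
```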

Regarding the zero cases, where \(u_i < u_{max}\), we should remove them from the set of event types, since they are not probable event types anyway! This simplifies what we had above, because it forces

\[ u=u_i = u_{max} \text{ for every event type } i, \quad U = \{i\,|\, u_i = u_{max}\} = E \]

Thus we have only

\[ P_i=\frac{c_{i,u}}{\sum_k c_{k,u}} \]

I see I may confuse some people, who may complain: but the Boltzmann distribution14 is an exponential function! I'll respond: that distribution is not a function of time, \(t\), or a similar ever-increasing variable! Hope it's clear by now.

We're halfway through the proof! Stay with me! The Taylor series shows

\[ c_{i,u} = \frac{1}{u!}\frac{d^u m_i(t)}{dt^u}(0) \]

After this step, we only need one of the derivatives, since carrying all of them around is not necessary; thus, let's define \(q_{iu}(t)\) like this

\[ q_{i,u} = \frac{1}{u!}\frac{d^{(u-1)} m_i(t)}{dt^{(u-1)}} \]

So we have

\[ c_{i,u} = \frac{d q_{iu}(t)}{dt}(0) \]
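Before moving on, the chain from \(m_i\) through \(q_{iu}\) to \(c_{i,u}\) can be sanity-checked numerically. Below is a sketch for a hypothetical count \(m(t)=3t^2+5t\), so \(u=2\) and the leading Taylor coefficient is \(c_{i,2}=3\), using central finite differences:

```python
import math

# Hypothetical count m(t) = 3 t^2 + 5 t, so u = 2 and c_{i,2} = 3.
def m(t):
    return 3 * t**2 + 5 * t

def derivative(f, t, h=1e-4):
    # Central finite difference; exact for polynomials up to degree 2.
    return (f(t + h) - f(t - h)) / (2 * h)

u = 2

# q_{iu}(t) = (1/u!) d^(u-1) m / dt^(u-1); for u = 2 that's m'(t) / 2.
def q(t):
    return derivative(m, t) / math.factorial(u)

# c_{i,u} should equal dq/dt evaluated at t = 0.
c = derivative(q, 0.0)
print(round(c, 6))  # 3.0
```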

Notice, even though \(q_{iu}(t)\) is a linear function of time, we kept the \((0)\) around, since it doesn't bother me! Using this expression, now is the time to look at the definition of probability above and pick our coefficient \(Z\)! Based on the definition, the relation below is always satisfied.

\[ \sum_i P_i=\sum_{i \in E} \frac{a_i}{Z}=1 \]

This takes us to the equation below.

\[ P_i(t)=\frac{1}{\sum_k c_{k,u}}\frac{d q_{iu}(t)}{dt}(0)=\frac{a_i(0)}{Z} \]

Since we can always come back and, using \(\sum_i P_i=1\), recalculate the \(\frac{1}{\sum_k c_{k,u}}\) coefficient, let's keep it in the \(Z\) coefficient and focus on the rest. Therefore, we are looking to show that the \(a_i(0)\)s in the equation below are elements of a vector.

\[ a_i(t)=\frac{d q_{iu}(t)}{dt} \]

Notice, if \(a_i(t)\) is a vector for all \(t\), then at one point, namely \(t=0\), it must also be a vector. Next! Since the \(a_i(t)\) are differentials, according to Differential Geometry19, it must be a vector, but we need to provide the space/manifold in which it's a vector. That's not hard, actually, since we know we built the whole thing on the frequency of events, and different observers would count the same events differently. Therefore, if for our first observer we have \(m_1,m_2,...,m_M\), generated by counting the events, and the second observer has \(n_1,n_2,...,n_N\), then there are relations like below among them.

\[ \begin{cases} m_1=m_1(n_1,n_2,...,n_N)\\ m_2=m_2(n_1,n_2,...,n_N)\\ ...\\ m_M=m_M(n_1,n_2,...,n_N) \end{cases} \]

To avoid switching dimensions, let's define \(S=\max\{M,N\}\), so if \(M<S\), we assume there are extra event types that didn't happen for this observer, or that the observer doesn't have access to count them, or that one observer measures some events twice or more; thus \(m_{M+1},...,m_{S}\) are all zeros. The same applies to the other observer if \(N<S\). Notice each observer has a different level of access to knowledge about the system, so it's entirely possible that one of them only has access to the projection of the information on a lower-dimensional manifold. In such a case, the counting coordinates perpendicular to that projection would be considered zero. Hence, let's write the above functions with the new convention.

\[ \begin{cases} m_1=m_1(n_1,n_2,...,n_S)\\ m_2=m_2(n_1,n_2,...,n_S)\\ ...\\ m_S=m_S(n_1,n_2,...,n_S) \end{cases} \]

This brings us to realize that the \(a_i\) are functions of these coordinates, so we have \(a_i=a_i(m_1,m_2,...,m_S)\), where for that specific counting we had a curve with parameter \(t\); thus we had \(m_1=m_1(t), m_2=m_2(t), ..., m_S=m_S(t)\) for that specific curve for this observer. This also dictates \(q_{i}=q_i(m_1,m_2,...,m_S)\) in the equation below. Notice we get rid of \(u\) since it's a constant after this.

\[ \begin{array}{ll} \overrightarrow A&=\sum_i a_i\frac{\partial }{\partial q_i}\\ &=\sum_i\frac{d q_i}{dt}\frac{\partial }{\partial q_i}\\ &=\sum_{ij}\frac{d q_i}{dt}\frac{\partial r_j}{\partial q_i}\frac{\partial }{\partial r_j}\\ &=\sum_{j}\frac{d r_j}{dt}\frac{\partial }{\partial r_j}\\ &=\sum_{j} b_j\frac{\partial }{\partial r_j} \end{array} \]

Where we defined \(r_j\) with respect to the second observer's probability distribution, \(b_j\), the same as what we had for \(q_i\) and \(a_i\).

\[ b_i(t)=\frac{d r_{i}(t)}{dt} \]

Notice, the \(a_i\) are not exactly the elements of this vector in the \(m_i\) coordinates, but in the \(q_i\) coordinates, where one can easily transform them using the relation below to find the elements of \(\overrightarrow A\) in the \(m_i\) coordinates.

\[ \sum_{i}a_i\frac{\partial m_k}{\partial q_i} \]

Where the \(\frac{\partial m_k}{\partial q_i}\) are elements of the inverse of the transformation matrix whose elements are \(\frac{\partial q_k(m_1,m_2,...,m_S)}{\partial m_i}\). This is true since in Differential Geometry19 the coefficients of \(\frac{\partial}{\partial m_k}\) are the elements of the vector, as shown below.

\[ \begin{array}{ll} \overrightarrow A&=\sum_i a_i\frac{\partial }{\partial q_i}\\ &=\sum_{ik}a_i\frac{\partial m_k}{\partial q_i}\frac{\partial}{\partial m_k}\\ \end{array} \]

This all shows the linear transformation we needed exists, thus, we proved \(\overrightarrow A\) is a vector.
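The transformation law used above can be checked numerically. In this sketch the two observers' coordinates are related by a hypothetical fixed linear map \(J\) (chosen arbitrarily, so the Jacobian \(\partial r_j/\partial q_i\) is simply \(J\)); the velocity components of the same curve then transform exactly as vector elements should.

```python
# Two hypothetical observers' coordinates q and r for S = 2 event types,
# related by a fixed linear map r = J q, so dr_j/dq_i = J[j][i].
# The same curve q(t) has components a_i = dq_i/dt for the first observer
# and b_j = dr_j/dt for the second; the transformation law predicts
# b_j = sum_i (dr_j/dq_i) * a_i.

J = [[2.0, 1.0],
     [0.5, 3.0]]

def r_of_q(q):
    return [sum(J[j][i] * q[i] for i in range(2)) for j in range(2)]

# A curve q(t) with constant velocity a = dq/dt, for simplicity.
def q_of_t(t):
    return [1.0 + 4.0 * t, 2.0 + 6.0 * t]

a = [4.0, 6.0]  # dq/dt

# Numerical velocity in r coordinates: b = d/dt r(q(t)).
h = 1e-6
b_numeric = [(x1 - x0) / h
             for x0, x1 in zip(r_of_q(q_of_t(0.0)), r_of_q(q_of_t(h)))]

# Transformation law prediction: b_j = sum_i J[j][i] * a_i.
b_predicted = [sum(J[j][i] * a[i] for i in range(2)) for j in range(2)]

print(b_numeric)    # ≈ [14.0, 20.0]
print(b_predicted)  # [14.0, 20.0]
```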

Even though I found this fact independently, I cannot claim nobody knew about this vector, but at least nobody has used it to solve the Sleeping Beauty problem20, as far as I know!

Before jumping to that problem, let's briefly discuss the compatibility with Quantum Mechanics we mentioned before. In Quantum Mechanics we have a Hilbert space21, where we can define the state of the system as a vector, which means the transformations are linear, but the relationship between that vector and the probability of one quantum basis state being excited is defined by the norm of that vector in the Hilbert space21. However, in the definition of the probability vector above, we showed the probability itself is a vector, where the sum of all of its direct elements adds up to one, so no norm is needed. One could say that in Quantum Mechanics the vector sweeps a sphere, whereas here it sweeps a plane.

Sleeping Beauty problem

It's a known problem without a clear resolution yet; you can read about it on its Wikipedia page20, or watch Derek's video.

To summarize the problem, there are two valid positions: the thirder position, where they count the number of event types as three, and the halfer position, where they count the number of event types as two.

Based on what we learned above, these positions are actually two observers of the system. The thirders are the actual observers of the Sleeping Beauty, and in Derek's video he counted the event types by hand to reach the thirders' result. However, the halfer observers are the ones who can count the heads directly, so they have access to more information about the system.

To give a better picture, let's write down the vectors for each of them. For the thirders it's like this.

\[ \begin{bmatrix} \frac{1}{3}\\ \frac{1}{3}\\ \frac{1}{3} \end{bmatrix} \]

And for the halfers it's like below.

\[ \begin{bmatrix} \frac{1}{2}\\ \frac{1}{2}\\ 0 \end{bmatrix} \]

Notice, we used our previous convention: if we need to increase the dimension, we add event types with zero occurrence. The transformation matrix between these two has some unknown variables, so it's not fully determined; however, that's not a deal breaker for us! We understood the Sleeping Beauty problem, so it's not a problem or paradox anymore! Just embrace the fact that probability is relativistic.
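A quick Monte Carlo sketch of the two observers (the simulation details are my own, not from Derek's video): counting event types per awakening reproduces the thirder vector, while counting the coin tosses directly reproduces the halfer vector.

```python
import random

random.seed(0)

# The "thirder" observer counts event types per awakening
# (Mon/heads, Mon/tails, Tue/tails); the "halfer" observer
# counts the coin tosses directly.
trials = 100_000
awakenings = {"mon_heads": 0, "mon_tails": 0, "tue_tails": 0}
tosses = {"heads": 0, "tails": 0}

for _ in range(trials):
    coin = random.choice(["heads", "tails"])
    tosses[coin] += 1
    if coin == "heads":
        awakenings["mon_heads"] += 1   # one awakening on heads
    else:
        awakenings["mon_tails"] += 1   # two awakenings on tails
        awakenings["tue_tails"] += 1

total_awake = sum(awakenings.values())
thirder = {k: v / total_awake for k, v in awakenings.items()}
halfer = {k: v / trials for k, v in tosses.items()}

print({k: round(v, 2) for k, v in thirder.items()})  # each ≈ 1/3
print({k: round(v, 2) for k, v in halfer.items()})   # each ≈ 1/2
```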

Wigner's friend

In Quantum Mechanics, Wigner's friend22 is a paradox, where Wigner's friend has access to more information than Wigner himself, and the absolute view of a probabilistic system raises a paradox. However, in the previous sections we learned that in a probabilistic system we can have different observers who have access to different information about the system. Therefore, by simply accepting that probability is relativistic, not absolute, this paradox, the same as the Sleeping Beauty paradox, vanishes into thin air!

Determinism vs probabilism

Questions regarding the deterministic nature of reality are old! I mean really old! All the religions have chosen one side and built upon it! However, here we're going to give a new answer, and probably the final answer!

We can find the latest round of that debate in the interpretations of Quantum Mechanics. You may know that I have an interpretation of quantum mechanics named the Resonance interpretation23, which is a deterministic theory, unlike the Copenhagen interpretation24, which is probabilistic; I think a deterministic reality makes more sense if you compare it to a probabilistic reality. Thus this post aims to clarify a new requirement for any probabilistic theory.

Based on the fact that the probability vector sweeps a plane, you may notice that for any probabilistic system there exists an observer who calculates probability one for one event type happening and zero for the rest: the one who knows everything about the system. Therefore, to that observer the system is deterministic, and that's what a theory about reality must cover; even though such an observer doesn't exist in reality, its coordinates must be definable in any theory.

Unfortunately, theories like the Copenhagen interpretation don't support such a deterministic reality even for a single non-existent observer. Now that we have studied relativistic probability, the fact that the Copenhagen interpretation has no room for a deterministic point of view is like saying that in classical mechanics we cannot have a reference frame inside the core of the sun, since no observer could exist there!! My point is that any theory must support such coordinates even though no observer could sit at the center of those coordinates.

So the answer to the question "is reality deterministic?" would be: "Any theory you choose to describe reality MUST support coordinates for a deterministic viewpoint, even though all other viewpoints are probabilistic".

Conclusion

Probability is relativistic, since not all information is equally shared among different observers. In this post, we proved probability is a vector, and therefore it's relativistic. Additionally, we concluded that any theory for a probabilistic system MUST reserve room for a deterministic observer, whether or not such an observer exists.


References

Cite

If you found this work useful, please consider citing:

@misc{hadilq2025ProbabilityRelativity,
    author = {{Hadi Lashkari Ghouchani}},
    note = {Published electronically at \url{https://hadilq.com/posts/probability-is-relativistic/}},
    gitlab = {Gitlab source at \href{https://gitlab.com/hadilq/hadilq.gitlab.io/-/blob/main/content/posts/2025-02-25-probability-is-relativistic/index.md}},
    title = {Probability is Relativistic},
    year={2025},
}