Geometrical Probability (Conditional Probability is Relativistic)
Categories: Math Probability
Tags: Mathematics Probability Relativity
Since probability is about counting, you can use it to define our space-time, and more!
Wallpaper above: reference1
After writing Probability is Relativistic2, I noticed I had to complete the picture I drew there, so here we go!
In our reality, we have events, and we can observe a specific position and time for each of them. In geometry, events are points, so we want to connect the dots to get something meaningful, literally! By connecting the dots/points, I mean categorizing them, or assigning types to them. Keen eyes will have noticed that I am referring to Category Theory3 and Type Theory4, but if you don't know those ideas, you won't lose the continuity of this story!
We assign events to a type by using our measurement tools. For instance, we have a ruler that can connect dots/events, so we assign one type to all of the events that are dots on the line measured by the ruler. We will refer to that type as an event-type. The same applies to events measured by a clock. All the events on the ruler have the same type; we usually call them the \(x\) axis, the \(y\) axis, etc.
You may notice that, based on Special and General Relativity5 6, different observers could assign different types to the same events. This was the whole idea behind the previous post, Probability is Relativistic2. You may also notice from the previous paragraph that even one observer with one coordinate system will assign different event-types to events: the coordinates themselves.
By the way, for completeness let's mention that I consider my posts as full-fledged scientific articles, so feel free to comment or even cite.
In this post we want to complete the tools we have developed, since it's easy, useful, and helpful for future posts.
Geometrical probability space
Let's speak the language of Differential Geometry7. Let's have a counting manifold8, which locally is a Euclidean space9, \(\mathbf{E}^S\), with \(S\) dimensions, where \(S\) is the number of types of possible events, the event-types. \(S\) must be the maximum number of independent, or dependent, event-types any observer could observe at any point of the system. Recall this is not infinity, since observers have a finite number of measurement devices. The dependent part of this definition will prevent us from using certain tools of Differential Geometry, especially the ones that use the determinant10, but the rest of the tools work perfectly. Therefore, we have
\[ \begin{cases} n_1=n_1(\alpha_1)\\ n_2=n_2(\alpha_2)\\ ...\\ n_S=n_S(\alpha_S) \end{cases} \]
Where \(n_1,n_2,...,n_S\) are the numbers of events with type one, two, up to \(S\). These numbers grow as you have more time, or walk farther along the \(x\) axis, to count more events. Thus, there are also \(S\) independent, or dependent, parameters, namely \(\alpha_1,\alpha_2,...,\alpha_S\), along which you can move to count more events of each event-type. We can call the \(\alpha\)s the parameters of the measurement devices; for instance, \(\alpha_1\) could have a unit of cm, but \(n_1\) has no unit, since it counts events.
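To make this concrete, here is a minimal sketch of a counting function \(n(\alpha)\): a measurement device sweeps its parameter \(\alpha\) (cm on a ruler, seconds on a clock) and counts how many events of its event-type it has passed. The positions and names below are made up for illustration, not taken from the post.

```python
# Toy model of a counting function n(alpha): a measurement device
# sweeps its parameter alpha (e.g. cm along a ruler) and counts the
# events of its event-type seen so far. The event positions are
# assumed for illustration.

def make_counter(event_positions):
    """Return n(alpha): how many events lie at parameter <= alpha."""
    positions = sorted(event_positions)
    def n(alpha):
        return sum(1 for p in positions if p <= alpha)
    return n

# Events of type 1 sit at these ruler positions (cm); the counts n_1
# are pure numbers, while alpha_1 carries the cm unit.
n1 = make_counter([0.5, 1.2, 3.0, 3.1])

print(n1(0.0))  # 0 events counted yet
print(n1(2.0))  # 2 events up to 2 cm
print(n1(5.0))  # all 4 events
```

Note how \(n_1\) only ever grows as \(\alpha_1\) increases, matching the requirement that counting numbers increase along the direction of measurement.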
What I like about this definition is the arbitrary choice of allowing dependent parameters. It simply ignores the fact that we don't have proper tools to deal with spaces with zero determinant10; they exist, and they are fascinating stuff! We should start studying them. For instance, in the previous post2, in all examples the \(\alpha\) parameters always depended on one parameter, the time, so they were all dependent parameters, but that didn't stop us from having a consistent framework there. Here, we considered the fact that someone may have event-counting machines aligned on the \(x\) axis, to make the probabilities converge faster, of course! Those machines could be as simple as a ruler. In this way we can describe a geometrical reality that doesn't force us to stick to non-zero-determinant spaces. For instance, in the Sleeping Beauty problem11, one of the observers, the sleeping beauty herself, keeps counting the same event multiple times. Such a coordinate system will have a zero determinant, but that's okay, as long as we avoid tools that rely on the determinant in this space, such as the metric, etc.
Another extension of the idea is that we can let the counting numbers, the \(n\)s, be zero or negative, since the \(\alpha\)s, like time and space, have an origin that can be moved forward or backward.
Be aware that the \(n\)s come from counting, so we end up with countable numbers, which are close to our consistent reality, unlike the paradoxical, non-constructible real numbers! I love it!
In this way, we built one coordinate system on the Geometrical probability space by using the \(n\)s. However, it's not the only possible coordinate system. As such, we can define another one with \(m\)s, as shown below.
\[ \begin{cases} m_1=m_1(n_1,n_2,...,n_S)\\ m_2=m_2(n_1,n_2,...,n_S)\\ ...\\ m_S=m_S(n_1,n_2,...,n_S) \end{cases} \]
The rest works the same as in Differential Geometry7.
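A change of counting coordinates can be sketched like this: a second observer's counts, the \(m\)s, are functions of the first observer's counts, the \(n\)s. The particular mixing below is an assumed example, not anything from the post.

```python
# Sketch of a change of counting coordinates: a second observer's
# counts m_i are functions of the first observer's counts n_i.
# The mixing below is an assumed example.

def to_m(n):
    """Map counts (n1, n2) to another counting chart (m1, m2)."""
    n1, n2 = n
    m1 = n1 + n2   # e.g. a device that counts both event-types together
    m2 = n2        # a device that only sees type-2 events
    return (m1, m2)

print(to_m((3, 4)))  # → (7, 4)
```

Note this chart is perfectly legal even though \(m_1\) depends on both \(n\)s; dependence only becomes a problem for determinant-based tools.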
Probability in the counting manifold
Recall in the previous post2 we found:
\[ P_i=\frac{a_i(0)}{Z} \]
Where \(P_i\) is the probability that an event with event-type \(i\) is happening. However, since here we know \(a_i\) is a vector, we need to keep its index on top, because vectors are contravariant12, so the correct way of writing it is as follows.
\[ P^i=\frac{a^i(0)}{Z} \]
We are going to apply it from now on. Also, we proved the following,
\[ a^i(t)=\frac{d q^{i}(t)}{dt} \]
Thus, \(a^i(t)\) is a vector, which implies \(P^i\) is a vector up to a normalization coefficient, where \(q^{i}(t)\) is another vector. We can extend \(t\), from the previous post, into \(\gamma\), to generalize the idea of a direction of counting; thus the parameters of the measurement devices, the \(\alpha\)s, are functions of the direction of counting, \(\gamma\). However, keep in mind that \(\gamma\) must always increase the counting numbers. In other words, the coordinate numbers, the \(n\)s, must be increasing in the direction of \(\gamma\).
\[ a^i(\gamma)=\frac{d q^{i}(\gamma)}{d\gamma} \]
Hence, the probability of an event-type \(i\) happening is \(P^i=a^i/Z\), where the \(a^i\)s are the elements of a vector, let's name it \(\overrightarrow a\), and the relation below always holds.
\[ \sum_{i \in E} P^i=\sum_{i \in E} \frac{1}{Z}\times a^i=1 \]
Or
\[ \sum_{i \in E} a^i=Z \]
Where \(E=\{\forall \text{event-types}\}\) is the set of all event-types. For convenience we can have a coordinate-dependent array, \(\underrightarrow{1}=\sum_{i \in E} dn^i\), so \(Z=\langle \underrightarrow{1}, \overrightarrow{a}\rangle\), where \(\langle o , o\rangle\) is the inner product. Notice that, as a convention, the \(dn^i\)s are 1-forms. Using this, we can summarize everything as follows.
\[ \underrightarrow{P}=\frac{1}{\langle \underrightarrow{1}, \overrightarrow{a}\rangle} \overrightarrow{a} \]
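Numerically, this is just "estimate the counting rates, then normalize by their sum". Here is a finite-difference sketch with made-up counts; the function names and numbers are illustrative, not from the post.

```python
# Finite-difference sketch of P = a / <1, a>: estimate the counting
# rates a^i = dq^i/dgamma from counts before and after a small step
# in gamma, then normalize by Z = <1, a>. The counts are made up.

def rates(q_before, q_after, dgamma):
    """Finite-difference estimate of a^i = dq^i/dgamma."""
    return [(after - before) / dgamma
            for before, after in zip(q_before, q_after)]

def normalize(a):
    """P^i = a^i / Z with Z = <1, a> = sum over components."""
    Z = sum(a)
    return [ai / Z for ai in a]

a = rates([10, 20, 70], [12, 24, 84], dgamma=2.0)  # a = [1, 2, 7]
P = normalize(a)
print(P)  # → [0.1, 0.2, 0.7]
```

The normalization coefficient \(Z\) here is just the component sum, which is exactly why it is coordinate-dependent: change the chart and the sum of components changes.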
Since \(\underrightarrow{1}\) is not a vector, which is denoted by the under-arrow, \(Z\), as its inner product with \(\overrightarrow{a}\), is not a scalar, which means it is also coordinate-dependent. This is why \(\underrightarrow{P}\) is coordinate-dependent too, so we wrote it as an array, \(\underrightarrow{P}\), not a vector. Thus, technically \(\underrightarrow{P}\) is not relativistic, but we proved that \(\overrightarrow{a}\) is. Hence, if we do all the calculations with \(\overrightarrow{a}\), and only in the last step multiply by a non-scalar to normalize it, we will have a better understanding/visualization of what's going on. In other words, a better wording would be: \(\underrightarrow{P}\) is relativistic up to the normalization coefficient.
Conditional probability is relativistic
Conditional probabilities13 are very useful, so we're interested in picturing them in the Geometrical Probability, aka the counting manifold. Based on their definition we have
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
Where, in our language, \(A\) and \(B\) are two event-types. However, it's known that if we have a complete set of \(B\) event-types, one that covers the whole space, we can sum up \(P(A \cap B)\) to get \(P(A)\).
\[ P(A)=\sum_{i \in E} P(A\cap B^i)=\sum_{i \in E} P(A\mid B^i) P(B^i) \]
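As a quick sanity check of this sum, here is the law of total probability with made-up numbers; the probabilities below are assumed purely for illustration.

```python
# Numeric check of P(A) = sum_i P(A | B_i) P(B_i), where the B_i
# form a complete set of event-types. All probabilities are assumed.

P_B = [0.5, 0.3, 0.2]          # P(B_i); note they sum to 1
P_A_given_B = [0.9, 0.5, 0.1]  # P(A | B_i), assumed values

P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))
print(P_A)  # approximately 0.62
```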
Let the counting manifold have the \(B^i\)s as its coordinate system. Since they cover the whole space, we can argue that the probability of event-type \(A\) is a vector, up to the normalization coefficient, in that space, where temporarily we are going to use \(\underrightarrow{P(A)}\) to denote it.
Notice that even though the \(B^i\)s make a complete coordinate system for this space, we can still add event-type \(A\) as another coordinate to this system later, since we allow dependent coordinates. This is useful when we want the \(B^i\)s to cover the entire space, but, below, we also don't want \(Z=\langle \underrightarrow{1}, \overrightarrow{b}\rangle\) to change when we add \(A\) as yet another coordinate.
Therefore
\[ P(A)=\frac{1}{\langle \underrightarrow{1}, \overrightarrow{b}\rangle} \frac{d q(m)}{d \gamma},\qquad P(B^i)=\frac{1}{\langle \underrightarrow{1}, \overrightarrow{b}\rangle} \frac{d r^i(n^i)}{d \gamma} \]
I hope it's clear that we defined \(\overrightarrow{b}=dq/d\gamma\, \partial_m + dr^i/d\gamma\, \partial_i\) as the corresponding vector for the counting direction of \(\gamma\). Notice, as a convention, we have \(\partial_m=\partial /\partial m\) and \(\partial_i=\partial /\partial n^i\). Additionally, \(m\) is the number of events with event-type \(A\), and the \(n^i\)s are the numbers of events with event-types \(B^i\). \(q(m)\) is the element of a vector corresponding to the event-type \(A\), and the \(r^i(n^i)\)s work the same for the \(B^i\)s. This implies we can calculate \(\frac{d q(m)}{d \gamma}\) as follows.
\[ \frac{d q(m)}{d \gamma} = \langle \underrightarrow{1}, \overrightarrow{b}\rangle \sum_{i\in E} P(A\mid B^i) \frac{1}{\langle \underrightarrow{1}, \overrightarrow{b}\rangle} b^i = \sum_{i\in E} P(A\mid B^i) \frac{d r^i(n^i)}{d \gamma} \]
But we also have the following relationship.
\[ \frac{d q(m)}{d \gamma} = \sum_{i\in E} \frac{\partial q(m)}{\partial r^i(n^i)} \frac{d r^i(n^i)}{d \gamma} \]
Since this is independent of which coordinates we choose, we conclude that we can calculate the conditional probability as follows.
\[ P(A\mid B^i) = \frac{\partial q(m)}{\partial r^i(n^i)} \]
Where the \(n^i\)s are the coordinates corresponding to the \(B^i\)s. As you can see, \(P(A\mid B^i)\) is an element of the transformation tensor, of rank two. Here we don't need all the \(B^i\)s in this equation, so we can keep just one.
\[ P(A\mid B) = \frac{\partial q}{\partial r} \]
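A small Monte-Carlo sketch can make this partial-derivative reading concrete: advance \(\gamma\) one trial at a time, count \(B\)-events in \(r\) and \(A\)-events in \(q\), and the ratio of the increments estimates \(\partial q/\partial r\). For simplicity the simulation assumes every \(A\)-event is also a \(B\)-event, and all probabilities are made up for illustration.

```python
# Monte-Carlo sketch of P(A | B) = dq/dr: gamma is the trial index,
# q counts A-events, r counts B-events, and the ratio of their
# increments estimates the partial derivative, i.e. the conditional
# frequency of A among B-events. Probabilities are assumed, and for
# simplicity A only ever occurs together with B.

import random

random.seed(0)
trials = 200_000
q = 0  # count of A-events
r = 0  # count of B-events

for _ in range(trials):                    # gamma advances per trial
    B = random.random() < 0.4              # P(B) = 0.4, assumed
    A = B and (random.random() < 0.25)     # P(A | B) = 0.25, assumed
    r += B
    q += A

P_A_given_B = q / r                        # estimate of dq/dr
print(P_A_given_B)                         # close to 0.25
```

The estimate converges to the assumed \(P(A\mid B)\) because moving one step along the \(r\) direction (one more \(B\)-event) increases \(q\) with exactly that frequency.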
The fact that the \(P(A\mid B)\)s are elements of a rank-two tensor, unlike simple probabilities, which are vectors only up to the normalization coefficient, makes them more useful than simple probabilities. Therefore, we can safely say conditional probabilities are relativistic. Going back to
\[ P(A \cap B) = P(A \mid B) P(B) = P(B \cap A) = P(B\mid A)P(A) \]
We can write the following.
\[ \frac{\partial q(m)}{\partial r(n)}\frac{d r(n)}{d \gamma} =\frac{\partial r(n)}{\partial q(m)}\frac{d q(m)}{d \gamma} \]
As expected! Let me explain! I was expecting the geometrical equivalent of the intersection, \(P(A\cap B)\), to be the inner product between the two directions of counting, \(m\) and \(n\), and this is exactly that, obviously, since we deduced the formula for the conditional probability this way! The inner product must be something along the lines of
\[ \frac{d q(m)}{d \gamma}\frac{d r(n)}{d \gamma} \cos\theta \]
And if you look carefully, it's there.
\[ \frac{\partial q(m)}{\partial r(n)}=\frac{d q(m)}{d \gamma}\cos\theta, \qquad \frac{\partial r(n)}{\partial q(m)}=\frac{d r(n)}{d \gamma}\cos\theta \]
Where \(\theta\) is the angle between the \(m\) and \(n\) directions. Notice, I am just giving you the intuition of what's going on; the equations work perfectly without the inner-product argument above.
Thus, Geometrical probability brings us a new language to replace the language of Set Theory14. That is the topic of the next post.
Conclusion
It's very useful to think about probabilities in the Geometrical probability context. It shows us that conditional probability is relativistic. It can also help us build probability without any dependency on Set Theory, which is the topic of the next post.
References
Cite
If you found this work useful, please consider citing:
```bibtex
@misc{hadilq2025GeoProb,
  author = {{Hadi Lashkari Ghouchani}},
  title  = {Geometrical Probability},
  year   = {2025},
  note   = {Published electronically at \url{https://hadilq.com/posts/geometrical-probability/}},
  gitlab = {Gitlab source at \href{https://gitlab.com/hadilq/hadilq.gitlab.io/-/blob/main/content/posts/2025-05-04-geometrical-probability/index.md}},
}
```