meditation attractors

meditation
Published

June 4, 2025

when doing breath meditation, mindwandering is an important notion. we can perhaps imagine that the mind follows a trajectory in some kind of space, \(x(t) \in \mathcal{X}\). We can also imagine that this trajectory is stochastic in some sense, but let's wait with that.

I will use the word “introspectively” for “what i conclude when i try to think about what i experience in meditation”.

Introspectively, time is very fine-grained and each moment consists of a single percept. So in that case we have to think of \(\mathcal{X}\) as a set of percepts. These percepts are complex objects, and one important aspect is that they differ by many “ooms” in how much they cause some kind of reaction. This means that a percept, say an internal image of a person, can have a certain kind of salience. Introspectively, percepts with high salience are “on average” followed by more percepts that are “related”.

Simplistically we might imagine some kind of entity that controls inputs from multiple different sources. These sources may be more “sensory” or they might be more “internal”, such as images generated by fantasies or planning etc.

I then imagine \(x(t)\) as a kind of controlled sampling from these sources, where each source has its own trajectories and you get a sample from one of them.

This controlled sampler is a mechanism of attention. I then imagine that we can view the space from two perspectives, we can view it as discrete moments where we get a sample from each, or we can look at the relative abundance of samples from each source, almost as if we are looking across small intervals \(\Delta t\).
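to make this concrete, here is a minimal sketch of what i mean by a controlled sampler. the source names, weights, and the function names are all made up for illustration: each source is just a stream of percepts, and attention picks which stream to sample from at each discrete moment.

```python
import random

# hypothetical sources; each is just an endless stream of "percepts",
# here simplified to topic labels
def make_source(topic):
    while True:
        yield topic

sources = {name: make_source(name) for name in ("breath", "planning", "memory")}

def controlled_sample(weights, steps, rng):
    """attention as a controlled sampler: at each discrete moment,
    pick a source according to `weights` and take its next percept."""
    names = list(sources)
    return [next(sources[rng.choices(names, weights=[weights[n] for n in names])[0]])
            for _ in range(steps)]

# mostly attending to the breath, occasionally sampling elsewhere
x = controlled_sample({"breath": 0.8, "planning": 0.1, "memory": 0.1},
                      steps=100, rng=random.Random(0))
```

looking at \(x\) moment by moment gives the "discrete samples" view; counting labels over a stretch of \(x\) gives the "relative abundance over \(\Delta t\)" view.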

I don’t know whether to think of these sources as specific brain areas or as more distributed; for now we are restricting ourselves to a kind of combination of introspection and formalizing, where we try to throw around ideas for formalizations of what we introspect. Introspection is different from awareness of the things themselves, it involves a kind of balance between rationalization and awareness.

Have you ever seen these videos showing a world map and the change of borders as different countries conquer each other and so on? There is some kind of finite resource, the area of land in the world, and then there are these entities, countries, that vary in their territory. New ones can arise at some point and also end at some point.

landscapes

so we imagine that somehow there is a landscape with certain attractors. there is stochasticity, so we are not converging to a single percept, but locally in time most of the percepts will be about a specific “topic”

the temperature controls the flatness.

in breath meditation we are controlling the system to be in a state that is not an attractor. however, frequent breath meditation, in my experience, makes the sensory stimuli from the nose region almost like an attractor.

when meditating for 30 minutes, as opposed to having a normal resting state where the mind wanders freely, you get two different distributions of time spent in the attractors.

however, i think this misses the important part, which may be the relationship between temporally close percepts.

for an interval of 10 seconds \(\Delta t\), consider the fraction of percepts at the meditation object, call it \(\phi_{\Delta t}(t)\). Breath meditation generally seeks relatively high such values. Depending on your goal for meditation, many people aim for values of \(\phi\) that are too high.
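a sliding-window version of \(\phi_{\Delta t}(t)\) is easy to write down. this is just a sketch where percepts are labels and the window is counted in samples rather than seconds; the function name `phi` and the toy trajectory are mine.

```python
def phi(trajectory, target, window):
    """fraction of percepts at the meditation object in each sliding window
    (the window plays the role of the interval Delta t)"""
    return [sum(p == target for p in trajectory[t:t + window]) / window
            for t in range(len(trajectory) - window + 1)]

traj = ["breath"] * 6 + ["planning"] * 4
print(phi(traj, "breath", window=5))  # → [1.0, 1.0, 0.8, 0.6, 0.4, 0.2]
```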

We can consider a mindwandering episode to be a stretch of time where \(\phi(t)\) falls from its desired value \(\phi^*\) along with an increase in the fraction of percepts \(\psi_A(t)\) associated with some attractor \(A\).

I contend that high values of \(\phi\) are generally associated with some kind of temporal mixing of attractor percepts. And this mixing is even higher for noting meditation. Though not necessarily changing the relative power/size/abundance of the attractors, you are sampling them in a more mixed way, essentially reducing the autocorrelation. Introspectively, especially in noting meditation, this is experienced as a kind of temporal shattering of thought processes, where no thought process can survive sufficiently long. There can also be a related kind of spatial shattering, but that would be a bit unrelated to what I am discussing now.

I guess a way to talk about this is that the marginal probability of a state might stay the same, while shattering is associated with moving the conditional distribution on percepts (given the recent past) closer to the marginal.
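one way to see the "marginal stays, conditional moves toward the marginal" point is to compare a sticky two-attractor chain with an iid sequence that has the same marginal. this is a toy sketch, not a model of anything measured; all the names and numbers are mine.

```python
import random

def lag1_autocorr(xs):
    """sample lag-1 autocorrelation of a numeric sequence"""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs) / n
    cov = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1)) / (n - 1)
    return cov / var

def sticky(p_stay, steps, rng):
    """two 'attractors' coded 0/1; stay in the current one with prob p_stay"""
    state, out = 0, []
    for _ in range(steps):
        out.append(state)
        if rng.random() > p_stay:
            state = 1 - state
    return out

rng = random.Random(1)
slow = sticky(0.95, 20000, rng)                     # long dwells in each attractor
mixed = [rng.randint(0, 1) for _ in range(20000)]   # same marginal, "shattered"

# both marginals are ~0.5, but the conditional structure differs:
# lag1_autocorr(slow) is near 0.9, lag1_autocorr(mixed) is near 0
```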

the speculation is then that this mixing has some desirable effects. i want to imagine that there is some global capacity, and that mostly the attractors are using up all the capacity. they are maintaining their reach, their territory, through a reinforcement that is related to the temporal associations, and mixing is reducing this.

attractors live by reward

i imagine that there is some process, maybe it is related to dopamine and the literal reward system, or maybe it is some other mechanism, that makes it so that associated with having a percept \(x(t)\), there is some associated reward.

the difficult thing is that the mindwandering does not tend towards only “pleasant thoughts”.

i don’t know, i’ll shift

scale free and power laws

a function \(f(x)\) is scale free if there exists an exponent \(H\) such that for all \(\lambda > 0\), \[ f(\lambda x) = \lambda^{H}f(x) \] a power law is a function of the form \(f(x) = ax^{-\alpha}\), and here \(f(\lambda x) = a(\lambda x)^{-\alpha} = a\lambda^{-\alpha}x^{-\alpha} = \lambda^{-\alpha}f(x)\), so we will simplify and just equate scale free with power law. The \(H\) is called the Hurst exponent in some contexts, the scaling exponent in others.
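the middle step can be checked numerically, with arbitrary illustrative values of \(a\), \(\alpha\), \(\lambda\), \(x\):

```python
# arbitrary illustrative values
a, alpha = 2.0, 1.5
f = lambda x: a * x ** (-alpha)

x, lam = 3.0, 7.0
lhs = f(lam * x)               # f(lambda * x)
rhs = lam ** (-alpha) * f(x)   # lambda^(-alpha) * f(x)
# lhs and rhs agree up to floating point error
```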

consider a random sequence \(\{X_t\}_t\). If the mean is the same for all \(t\) and the covariance \(\mathrm{Cov}(X_t, X_{t+\tau})\) depends only on the lag \(\tau\), we say it is weakly stationary; then we can define an autocorrelation function \(r(\tau) = \mathrm{Corr}(X_t,X_{t+\tau})\). In some cases this correlation function is scale free, \(r(\tau) \sim \tau^{-\alpha}\).

We should think of power laws as meaning “slowly decaying” with distance in time or space, as an alternative to exponential decay \(r(\tau) \sim \exp(-\tau)\). Exponential decay would mean that in the signal \(X_t\) there is not a lot of connection between distant occurrences, less “memory” of the past.
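just to see how different "slowly decaying" is in practice, comparing \(\exp(-\tau)\) with \(\tau^{-1/2}\) at a few lags (the exponent \(1/2\) is an arbitrary illustrative choice):

```python
import math

# exponential vs power-law decay of a correlation with lag tau:
# the exponential is astronomically smaller already at tau = 100
for tau in (1, 10, 100):
    print(tau, math.exp(-tau), tau ** -0.5)
```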

we can also do it for spatial correlation. If we consider a metric space \((\mathcal{X},d)\), then a random field \(\{\phi(x)\}_{x\in\mathcal{X}}\) is also a bunch of random variables, but now instead of being indexed by time they are indexed by a point in a higher-dimensional space, and I am using a metric space with a distance function. For some random fields, where correlations do not depend on “where” and in which direction but only on the distance between points, the correlation function, which in general is \(G(x_1, x_2) = \mathbb{E}[\phi(x_1)\phi(x_2)] - \mathbb{E}[\phi(x_1)]\mathbb{E}[\phi(x_2)]\), can be considered a function only of the distance \(r\) between points. Here again we can imagine that the correlation decays with distance either exponentially or as a power law \(G(r) \sim r^{-\alpha}\).

criticality in statistical mechanics

In statistical mechanics, scale free is related to criticality. Consider a (2d) grid/lattice of points that can take on values -1 or +1. A specific configuration of -1 or +1 values is the state of the system. We are not concerned about how it evolves, but about the distribution. A Boltzmann distribution is a distribution that can be written \(p(x) \sim \exp(-\beta H(x))\). If we have a function \(H(x)\) (a Hamiltonian) that assigns a scalar value (“energy”) to a specific configuration \(x\) of the grid, then we get a distribution over configurations. This distribution depends on how we choose our \(H\) function, but once that is fixed it also depends on \(\beta\). In the statmech foundations of thermodynamics \(\beta\) is literally associated with the temperature of the system, \(\beta \propto 1/T\), but we can consider it more generally.

The picture is that we have some parameters like \(\beta\) and they determine the distribution on the random field (e.g. the grid), though it is easier if we think of a continuous field rather than a grid. So this is just an explanation of what it could mean that we have a random field with a specific correlation function, and we can imagine that there are ways in which this random field is translation invariant and rotation invariant, so that we only care about distances between points and we have \(G(r)\) for the correlation. As we said, a parameter such as \(\beta\) influences the distribution and therefore the random field, and therefore it can also influence the correlation function. In particular we can imagine that the correlation decays with distance \(r\), but how fast depends on the parameter such as temperature \(T\): \[ G(r; T) \sim \exp(-r/\xi(T)) \] this \(\xi(T)\) is called the correlation length.

Statistical mechanics is used to describe phase transitions in systems, where a system changes from behaving in one way to a “qualitatively” different way.
We often hear the example of ice and water, the same substance but different temperature, but this stuff with criticality is used to describe “second order” phase transitions, and water to ice is not one of those, so not all phase transitions are like that. But such a phase transition happens at some “temperature” or other value of some parameter, and the relation is that it is at this special temperature, call it \(T_c\), that the correlation is instead a power law \[ G(r;T_c) \sim r^{-\alpha} \] this means that there are “stronger” correlations between distant points, not in the dynamic sense but under the Boltzmann distribution.
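the boltzmann distribution part can be made fully concrete on a grid small enough to enumerate. this is a 2×2 sketch with a nearest-neighbour hamiltonian of my choosing (periodic boundaries, so each bond ends up counted twice, which only rescales \(\beta\)); the point is just how \(\beta\) reshapes the same distribution.

```python
import math
from itertools import product

L = 2  # tiny grid so we can enumerate all 2^(L*L) configurations

def H(config):
    """nearest-neighbour hamiltonian on an L x L grid with periodic
    boundaries: minus the sum of products of neighbouring spins"""
    s = [[config[i * L + j] for j in range(L)] for i in range(L)]
    e = 0.0
    for i in range(L):
        for j in range(L):
            e -= s[i][j] * s[(i + 1) % L][j]
            e -= s[i][j] * s[i][(j + 1) % L]
    return e

def boltzmann(beta):
    """p(x) ~ exp(-beta * H(x)), normalized over all configurations"""
    configs = list(product((-1, 1), repeat=L * L))
    w = [math.exp(-beta * H(c)) for c in configs]
    Z = sum(w)
    return {c: wi / Z for c, wi in zip(configs, w)}

p_cold = boltzmann(beta=1.0)   # low temperature: aligned configurations dominate
p_hot = boltzmann(beta=0.01)   # high temperature: nearly uniform
```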

As \(T\) comes closer to \(T_c\) the correlation length becomes longer and longer, and it diverges \[ \lim_{T \to T_c} \xi(T) = \infty \] but also, and i am not sure about the exact relationship, close to \(T_c\) it is modelled as \[ \xi(T) \sim |T - T_c|^{-\nu} \] where \(\nu\) is the correlation length exponent, and you can see how this means that it becomes larger and larger as we get closer.
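numerically the divergence looks like this; the values \(\nu = 1\) and \(T_c \approx 2.269\) are the 2d ising ones, used here just for illustration.

```python
# illustrative values: nu = 1 and Tc ~ 2.269 are the 2d ising ones
nu, Tc = 1.0, 2.269

def xi(T):
    """correlation length xi(T) ~ |T - Tc|^(-nu), diverging as T -> Tc"""
    return abs(T - Tc) ** -nu

# correlation length grows without bound as T approaches Tc
lengths = [xi(Tc + dT) for dT in (1.0, 0.1, 0.01)]
```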

With the scaling hypothesis the power law decay and exponential decay are bridged with a scaling function \(f(u)\) that behaves like \(\exp(-u)\) for large \(u\) and becomes constant near the critical point (where \(\xi\) diverges and \(u = r/\xi\) becomes small), so that the correlation function can be written in general as \[ G(r; T) \sim \frac{1}{r^{d-2+\eta}}f\!\left(\frac{r}{\xi(T)}\right) \] so that away from \(T_c\) the exponential part dominates, and close to \(T_c\), where \(f\) is essentially constant, the prefactor is what matters; the prefactor is just \(r^{-\alpha}\) with exponent \(\alpha = d-2+\eta\), where \(d\) is the spatial dimension and \(\eta\) is now the important exponent for the power law.
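a quick numerical sketch of the crossover, taking \(f(u) = \exp(-u)\) everywhere for simplicity and the illustrative values \(d = 2\), \(\eta = 1/4\): for huge \(\xi\) the correlator is essentially a pure power law over any accessible range of \(r\), while for small \(\xi\) the exponential cutoff kills it at large \(r\).

```python
import math

def G(r, xi, d=2, eta=0.25):
    """scaling form: power-law prefactor times scaling function f(u) = exp(-u);
    d = 2 and eta = 1/4 are illustrative values, not a claim about any system"""
    return r ** -(d - 2 + eta) * math.exp(-r / xi)

# near T_c: xi is huge, so for r << xi the ratio to a pure power law is ~1
near = [G(r, xi=1e6) / r ** -0.25 for r in (1, 10, 100)]

# away from T_c: xi is small and the exponential cutoff dominates at large r
far = G(100, xi=5.0)
```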

It does not just jump directly from exponential to power law exactly at \(T_c\); also, if \(T\) is continuous, what does it even mean for a system to be at “exactly” \(T_c\)? So there is something called the scaling hypothesis that describes how, as \(T\) goes towards \(T_c\), in the tiny neighborhood around \(T_c\), there is a way of