Maximilian Gazeroglu

2025-09-12

The Central Thesis of Autonomous Self

Autonomous Self exists to conquer the execution problem: the inability to consistently perform all the things you should do, want to do, can do, but just aren't doing. Every good habit you strive to build—exercise, diet, deep work, meditation, reading, sleep—and every bad habit you aim to eliminate collapses into this one battle.

What is the core cause of the execution problem?

1) Fundamentally, it is an execution alignment problem. A part of you, the modern reasoning/planning cortical part of your brain (primarily the prefrontal cortex, PFC), holds your long-term goals. Another part of you, the older instinctual subcortical part of your brain [the limbic system (amygdala, hypothalamus, hippocampus), basal ganglia, thalamus, midbrain dopaminergic nuclei (VTA, SNc), brainstem, and cerebellum], was hardcoded by evolution to optimize behavior for environments that differed greatly from our current one, environments in which these long-term goals made no sense. (Imagine telling a starving early human to jog daily to burn fat: he was wired to conserve energy and crave high-calorie food, instincts that still pull us to donuts today in a world of overabundance.) The execution problem stems from the misalignment between these two selves on four core attributes:

A. Time: small immediate rewards vs large delayed rewards (Temporal Discounting)
B. Cost: cost/effort aversion vs self-control recruitment (Cost Sensitivity)
C. Cues: external/internal stimuli biased vs internal values biased (Urgent vs Important)
D. Behavior: instinctual stimulus-response vs goal-directed action-outcome (System 1/2)
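
To make attribute A concrete, here is a minimal sketch using hyperbolic discounting, one commonly used model of temporal discounting. The function name, the discount rates `k`, and the reward amounts are illustrative assumptions, not values from this article.

```python
def discounted_value(amount: float, delay_days: float, k: float) -> float:
    """Hyperbolic discounting: subjective value = amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay_days)


# Hypothetical choice: a small reward now vs. a larger reward in 30 days.
small_now = discounted_value(amount=10.0, delay_days=0.0, k=0.1)
large_later = discounted_value(amount=30.0, delay_days=30.0, k=0.1)

# With a steep discount rate, the small immediate reward feels larger.
impulsive_choice = "now" if small_now > large_later else "later"

# A shallower discount rate (a more patient valuation) flips the preference.
patient_later = discounted_value(amount=30.0, delay_days=30.0, k=0.01)
patient_choice = "now" if small_now > patient_later else "later"

print(impulsive_choice, patient_choice)
```

The only thing that changes between the two choices is `k`: the same pair of rewards produces opposite decisions depending on how steeply the delay is discounted.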

2) The other cause—equally important but easier to fix—is that the cues to initiate our desired actions fail to appear.

So how do we solve the execution alignment problem?

Fundamentally, we must first understand how the brain decides which of the two competing interests to execute, and then modify the environment accordingly to ensure the cortical interests (where our desired activities live) are chosen.

Before diving in: the goal of a system that solves the execution problem is to ensure one of the following two (exhaustive) cases:

1) Your cortical self wins the battle over your subcortical self
2) Your cortical self loses the battle, but the environment is designed such that the thing your subcortical self wants to do (i.e. the best/easiest thing in the environment) IS the thing that your cortical self wants you to do. In other words, the environment is designed for optimal defaults.

This second idea is the primary strategy of Autonomous Self. Along the power law distribution of deployable tactics, environment alignment is the most powerful because even at your weakest state, you still perform the desired action. The rest of the distribution contains brain-targeted strategies to ensure cortical-you wins over subcortical-you, in the event that the environment was not fully aligned.

To put all this into context, we'll examine how both brain regions function and how the brain resolves their competition, so that we can identify the inputs to optimize for execution alignment.

How The Human Brain Makes Decisions

(A small side note: Much of the popular habit literature, like Atomic Habits by James Clear, centers on the Cue-Craving-Response-Reward loop that underlies the habit-forming center of the subcortical brain, the ventral striatum. But habit is not the starting point; it is the downstream result of many repetitions of both subcortical and cortical decision-making. The real challenge is completing the first 30+ repetitions before habit takes over. Tactics like the 2-minute rule and temptation bundling do help in this phase, but they were conceived through the lens of habit / ventral striatum logic. For this pre-habit stage, it is better to add tactics that also target the cortical structures used in decision-making, like the anterior cingulate cortex (ACC) and dorsolateral PFC (DLPFC). Only once this stage is solved should attention shift back towards the ventral striatum, so that actions become automated without cortical effort.)

Subcortical Structures

The primitive, subcortical regions (under the cerebral cortex) consist of:

A. Basal ganglia, mainly consisting of the striatum:
- ventral striatum (NAc): learns from dopamine-reward prediction
- dorsal striatum: stimulus-response habits / motor chunks
B. Amygdala: salience/emotion
C. Hypothalamus: homeostasis (e.g. thirst, hunger, sex, hormones)
D. Midbrain (VTA/SNc): dopaminergic reward prediction error (RPE) signals
E. Brainstem/Cerebellum: coordination; autonomic/survival programs
F. Thalamus: signal relay station

Dopamine

The instinctual, subcortical structures mainly operate through rewards, implemented by neurotransmitters like serotonin (contentment), endorphins (euphoria), epinephrine (arousal), norepinephrine (attention/arousal), and most importantly, dopamine (motivation).

However, dopamine is correlated not with reward, but the expectation of reward. More precisely, it is a teaching signal to update the brain's weights to better predict when a reward will occur and thus act accordingly. The amount of dopamine that fires is proportional to the difference between obtained reward and expected reward, called a Reward Prediction Error (RPE). When expectation is low but a large reward is obtained, the delta between these two is very large (a delta that should be minimized) and so dopamine spikes at reward time. As the reward becomes consistent given some cue, the dopamine spike now occurs at the cue that best predicts the reward. Since the expectation (measured by the size of the dopamine spike) matches the level of reward, there is no longer an error to minimize. As a result, no further dopamine modifications are needed since the neural network has properly learned to predict this reward. If the reward does not occur when predicted, dopamine will downspike at reward time, again to minimize the delta between expectation and reward, and eventually to unlearn the association between cue and reward.

It is important for dopamine to spike at the cue that predicts a reward because the second main function of dopamine is to energize behavior, by exciting the striatal/motor circuits corresponding to the behavior that obtains the expected reward. And so, the primary mechanism of action for the instinctual part of our brains is to automatically perform the behavior that leads to a reward for a given cue, both energized and learned by dopamine spikes that seek to minimize predicted reward errors.
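
The learning rule described above can be sketched with a simple delta rule in the style of Rescorla-Wagner. This is a simplified illustration, not a model of real dopamine neurons: it tracks only the prediction error at reward time and does not model the spike shifting to the cue, which would need a temporal-difference model. The function name, `learning_rate`, and reward values are assumptions made for the example.

```python
def train_cue_value(rewards, learning_rate=0.2, initial_value=0.0):
    """Return (final expected reward, list of prediction errors) for one cue."""
    expected = initial_value
    errors = []
    for reward in rewards:
        rpe = reward - expected          # obtained reward minus expected reward
        expected += learning_rate * rpe  # nudge the expectation toward the outcome
        errors.append(rpe)
    return expected, errors


# Phase 1: the cue is reliably followed by a reward of 1.0.
value, acquisition_errors = train_cue_value([1.0] * 50)

# Phase 2: the reward stops appearing, so the errors turn negative
# (the "downspike") and the learned expectation decays back toward zero.
value_after, extinction_errors = train_cue_value([0.0] * 50, initial_value=value)

print(acquisition_errors[0], acquisition_errors[-1])
print(extinction_errors[0], value_after)
```

The first error is large because nothing was expected; once the expectation matches the reward, the error shrinks toward zero and learning effectively stops.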

Cortical Structures

The cerebral cortex is the outer sheet of the brain (about 80% of volume) and contains the neocortex, the modern part of the brain capable of abstract reasoning. The cortex is divided into four main lobes:

Occipital lobe (vision)
Temporal lobe (auditory and memory)
Parietal lobe (touch and spatial navigation)
Frontal lobe (planning, reasoning, abstraction)

The subset of the frontal lobe involved in higher order thought is the prefrontal cortex (PFC).

Of the regions of the PFC, two are most important. The lateral PFC (LPFC) performs planning/reasoning, keeps goals and relevant information in working memory, and exerts self-control over subcortical desires. The anterior cingulate cortex (ACC) in the medial PFC (MPFC) monitors decision making: it takes input from a variety of brain regions and determines whether to recruit the LPFC to exert self-control over the subcortical structures in order to achieve a goal.

The Interplay of brain regions in decision making

ACC - performs a cost-benefit analysis, called the Expected Value of Control (EVC), weighing the potential rewards of achieving a goal against the costs of control, and determines whether to recruit the LPFC to exert self-control over the subcortical structures
Insula - maps interoception (heartbeat, breath, gut) and subjective unpleasantness (pain, craving, effort, cost)
Orbitofrontal cortex (OFC) - assigns the value of a reward (e.g. how good a food tastes), integrating emotion and external stimuli, evaluates immediate rewards/punishments
Medial PFC (MPFC) - computes the value of long-term goals, pulling from long-term memory
Subcortical regions - the previously discussed subcortical regions are further used as inputs for benefits/costs
LPFC - the source of self-control, suppressing subcortical responses to off-goal distractions. In imaging studies, this region lights up when subjects deploy self-control (e.g. choosing a healthy food over a delicious junk food), and it is more active in people who report greater self-control in surveys. People with damaged, less developed (e.g. children), or fatigued PFCs are less able to exert control over their instinctual desires. Perhaps what distinguishes people with greater self-control is the functional capacity of their LPFC, the control-allocation signals of their ACC, or the ability of their LPFC to modulate the value discerned by the OFC.

The computation for decision making

There is no universally accepted equation for how the ACC computes the Expected Value of Control (EVC). However, the following is a reasonable approximation:

EVC = V · E − g · C − O

V = current subjective value (immediate/delayed, intrinsic/extrinsic, primary/secondary gains/losses)
E = perceived efficacy of control
C = perceived cost/effort (physiological, cognitive, social, etc)
g = gain on cost sensitivity (set by neuromodulators; affected by stress, sleep, meditation, etc)
O = opportunity cost

When EVC > 0, the ACC recruits the LPFC to exert self-control over the subcortical regions to sustain progress towards the goal.
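
As a minimal sketch, the approximation above can be written out directly. The class and function names and all numeric inputs are hypothetical, chosen only to show how a change in the cost gain g (for example, after poor sleep) can flip the decision while the goal itself stays the same.

```python
from dataclasses import dataclass


@dataclass
class ControlInputs:
    value: float             # V: current subjective value of the goal
    efficacy: float          # E: perceived efficacy of control
    cost: float              # C: perceived cost/effort
    cost_gain: float         # g: gain on cost sensitivity
    opportunity_cost: float  # O: value of the best alternative given up


def expected_value_of_control(x: ControlInputs) -> float:
    """EVC = V * E - g * C - O, following the approximation in the text."""
    return x.value * x.efficacy - x.cost_gain * x.cost - x.opportunity_cost


def recruits_control(x: ControlInputs) -> bool:
    """Control is recruited only when EVC is positive."""
    return expected_value_of_control(x) > 0


# Same goal and same effort; only the cost gain g differs (made-up numbers).
rested = ControlInputs(value=10.0, efficacy=0.8, cost=4.0,
                       cost_gain=1.0, opportunity_cost=2.0)
tired = ControlInputs(value=10.0, efficacy=0.8, cost=4.0,
                      cost_gain=2.0, opportunity_cost=2.0)

print(expected_value_of_control(rested), recruits_control(rested))
print(expected_value_of_control(tired), recruits_control(tired))
```

With these inputs the rested case gives EVC = 8 - 4 - 2 = 2 and the tired case gives 8 - 8 - 2 = -2. Doubling g alone is enough to push EVC below zero.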

How to Tip the Scale in Favor of the Cortical Self

Modifying Brain Hardware

What strengthens ACC / LPFC (in order of effectiveness):

1) Sleep & circadian regularity — biggest legal performance enhancer for prefrontal control
2) Acute aerobic exercise (20-30min) — same-day boost to executive function/vigilance
3) Mindfulness/attention-training (e.g. meditation) — measurable gains within weeks
4) Cold/heat exposure (hormetic)
5) Caffeine

What weakens them:

1) Sleep loss/debt, circadian irregularity
2) Chronic stress / anxiety spikes; decision fatigue
3) Alcohol, cannabis, nicotine withdrawal
4) Large glycemic swings / poor nutrition; dehydration
5) Distracting/interruptive environments (constant pings, task-switching)

Modifying Brain Software

Book a 1-1 call with me to learn more.

Environment

While we can modify the brain’s inputs so your cortical self overrides your subcortical self, a more powerful solution would be to modify the environment such that the activity your subcortical self desires (the easiest, most rewarding option) IS the one your cortical self wants it to perform. In other words, we design for optimal defaults. That way, even at your worst state with zero self control, you still end up doing the desired activity.

We know we are the products of our environments, yet we still underrate its importance! At a macro scale, the environment drives evolution by setting the constraints and incentives that define the fitness function that selects which biological mutations will dominate the next generation. On a micro scale, this happens to you every day: the environment performs evolution on your behavior! You try different behaviors (like genetic mutations), the environment resists some and favors others, and those that are best aligned become the default. In this sense, the "optimal" behavior is already dictated by the environment—you simply discover it through trial.

A simple example: you come home at night and flip on the lights. The lighting layout makes some rooms more inviting, so you spend more time there, and the items in that space shape the next actions you take. Step by step, your behavior is being constrained and selected by the environment, often without you realizing it.

When people try to build a new habit, by definition the action isn’t optimal in their environment—otherwise they’d already be doing it. So they rely on bursts of motivation to fight environmental pressures and force the behavior; it works the first day, maybe the second, but by the seventh day, motivation has faded and the old defaults return.

Instead of spending bursts of motivation to repeatedly push through a suboptimal behavior, it’s better to use that burst of motivation once to redesign the environment so the desired action becomes the optimal one. Afterwards, motivation is no longer needed—you naturally perform the behavior weeks or even months later, because it is now the path of least resistance in your environment.

So how should the environment be modified?

Some existing solutions attempt to modify the environment with negative punishment—automatically charging you money if you don’t do X by Y time. These systems force you to perform an action that isn't optimal in the environment by incurring pain otherwise. As a result, instead of performing the behavior, you’re motivated to escape the system itself and restore the previous environment to remove the punishment. Retention suffers, and you inevitably fall back into the old habits of the old environment.

Instead, the environment should be reshaped to directly target the subcortical brain, namely by positive reward. A core principle of Autonomous Self is to transform your desired actions into the optimal ones by tightly coupling them with rewards that directly captivate the subcortical brain, specifically the ventral striatum and its dopamine-reward circuitry. If the rewards of the task outweigh its costs (ideally by at least a factor of two, since we are more sensitive to losses than to gains; Kahneman & Tversky 1992) and this reward-cost delta exceeds that of every other option in the environment, the desired action becomes the optimal default. Even at your weakest, your subcortical mind seeks it.
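
The condition in the paragraph above can be expressed as a small check. This is only a sketch: the function name, the option names, and the (reward, cost) scores are invented for illustration, and real rewards and costs are not directly measurable on a single scale like this.

```python
from typing import Dict, Tuple


def is_optimal_default(desired: str,
                       options: Dict[str, Tuple[float, float]],
                       margin: float = 2.0) -> bool:
    """True if the desired action's reward is at least `margin` times its cost
    and its reward-cost delta beats every other option in the environment."""
    reward, cost = options[desired]
    if reward < margin * cost:
        return False
    desired_delta = reward - cost
    return all(desired_delta > r - c
               for name, (r, c) in options.items()
               if name != desired)


# Hypothetical environment before redesign: (reward, cost) per option.
before = {"workout": (3.0, 5.0), "scroll phone": (4.0, 1.0)}

# After redesign: a reward is coupled to the workout and friction is
# added to the phone, so the workout has the largest delta.
after = {"workout": (12.0, 5.0), "scroll phone": (4.0, 3.0)}

print(is_optimal_default("workout", before))
print(is_optimal_default("workout", after))
```

In the `before` environment the workout fails both tests; in the `after` environment its reward exceeds twice its cost and its delta (7) beats the phone's (1).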

So are there rewards powerful enough to make this work? In practice, the elements in your environment with the greatest gravitational pull are usually digital devices like your phone/computer/TV. These devices dominate because they can generate any combination of visual and auditory output—the two mediums that make up most human perception. It follows that a carefully curated subset of this input can serve as rewards strong enough to outweigh the costs of almost any task, especially if also decoupled from undesired activities. For example, digital superstimuli like YouTube contain entire worlds within them that can be used as simultaneous or post-task rewards, directly engaging the ventral striatum’s dopamine circuits. For the details of the kinds of rewards and how to deploy them, see below.

After designing the environment for optimal defaults, good habits encoded in the ventral striatum will naturally result. Afterwards, the rewards that were previously decoupled away from the rest of the environment can be reintroduced.

Hardware / Software Solution

I'm currently building software that builds upon the above ideas and trialing it with early users. If these ideas resonate with you and you are trying to form some cornerstone habit (work, exercise, sleep, meditation, reading etc), book a free 20min 1-1 call with me so we can both tackle your biggest problem and assess whether our software is a good fit.

Or feel free to email me at max@atnself.com with thoughts.