Mining is at the heart of the Bitcoin protocol. Is the mining that ensures concensus on the set of rules that define Bitcoin. Today I would like to review the theoretical basis of mining, in order to answer practical questions such as: what’s the probability of mining a block considering my hardware? how long do I have to wait before finding a block? and how much am I expected to earn with mining in a given period of time?

What is mining?

(TODO )Mining consists in the process of adding a new block to the blockchain. Mining nodes construct a merkel tree of transactions and then (graphic here) they build a block header (cite Antonopoulos). To make my point I should first disect a block. Look where the nounces are placed, the role of the block header, coinbase transaction, merkle tree root and previous block hash.

Let \(M = 2^{256}\) denote the size of the sha256 image space, which can be enumerated with all numbers between 0 and \(2^{256}-1\). A block is valid if the hash corresponds to a number less or equal \(T\) that is called the target and it is part of the concensus (see function CheckProofOfWork in pow.cpp). The target varies as the difficulty adjusts every 2016 blocks.

(TODO) Check the code for difficulty adjustment.

Blocks encode the target in the header in a field called “bits”. In Big endian representation (when using bitcoin-cli for instance) bits has 4 bytes, where the first is \(E\), and the other 3 bytes are a number \(C\). The target is then

\[T = C \cdot 256^{E-3}\]

The smallest value of the target is a number \(T_1\) which corresponds also the genesis block target. For example, the genesis block has the following bits: \(\mathtt{0x1d00ffff}\). That means

\[T_1 = \mathtt{0x00000000ffff0000000000000000000000000000000000000000000000000000}\]

The size of the valid block sha256 image space is also \(T\). Now, assuming sha256 preserves the uniform probability distribution of possible block space, then probability of hitting a valid block in a single trial is computed as

\[p(T) = T/M\]

(TODO) in Cryptography, this property is assumed for a good hash function.

For example, in the case of the genesis target, this probability is

\[p(T_1) = T_1/M = 2 \times 10^{-10}\]

Work

When mining, our software will produce many block candidates until one is found that meets the target constraint. In this process we can identify the random variable \(\omega\) that counts how many trials are needed before hitting a block as the work. The probability distribution of work is a Geometric distribution, let us illustrate why: Consider \(p\) as the probability of mining a block in a single trial, then in order to mine a block after \(w\) trials we must have a sequence of \(w-1\) failure events with probability \(1-p\) each and a final success event with probability \(p\), hence the probability of having performed a work equal to \(w\) is the multiplication of those events’ probabilities. Therefore the probability of performing w trials until a block is found is

\[P(\omega = w | N = 1) = (1-p)^{w - 1} \cdot p\]

We use a conditional probability event \(N = 1\) to indicate that a single block is found and then we stop.

The Geometric Probability Distribution is a well known probability model. For instance the moment generating function is computed as (see [1])

\[\langle\exp(z \omega)\rangle_{N=1} = \frac{p}{p+e^{-z}-1}\]

and from that result we obtain the expected work denoted as \(W(T)\) as a function of the target is computed as

\[W(T) = \langle\omega\rangle = 1 / p\]

As a side note: moment generating functions (MGF) fully describe probability distributions. Once a MGF is computed for a distribution, then all expectation values can be derived performing linear operation on it. For example above we obtained the expected value of \(\omega\) doing a partial derivative on the MGF

\[\langle\omega\rangle_{N=1} = \frac{\partial}{\partial z} \langle\exp(\omega z)\rangle |{}_{z=0} = 1/p\]

Therefore, the expected work needed to hit a block is the inverse of the probability of hitting a block with a single trial. For example the expected work needed to mine the genesis block is \(W(T_1) = 1/p(T_1) \approx 4.3 \times 10^9\). Notice that

\[\lfloor W(T_1) \rfloor = \lfloor \frac{M}{T_1} \rfloor = \mathtt{0x100010001}\]

which is the chainwork of the genesis block.

(TODO) check the source code for how chainwork is defined

chainwork_n = int(W(T_n)) + chainwork_{n-1}

Difficulty \(d\) is a real number whose concept is that of a relative expected work with respect to the genesis block. Therefore difficulty is mathematically defined as

\[d = \frac{W(T)}{W(T_1)} = \frac{T_1}{T}\]

Later we will need to write the expected work in terms of the difficulty instead of the target:

There is a one-way relation between difficulty and target, therefore one can use either one of them to describe the current probability \(p\) of finding a block and hence all related quantities. However, we will adopt a convention here to write all results in terms of the difficulty. For instance the expected work as a function of the difficulty is

\[W(d) = W_1 \cdot d\]

Time

A mining node is roughly a computer that produces \(w = t\cdot h\) block candidates during the elapsed time \(t\), where \(h\) is a constant called hashrate. Using this definition and assuming the typical time scales we measure are much bigger than the time it takes to perform a single trial (\(\Delta t = 1/h\)) we can introduce the probability density function (PDF) of time elapsed until a single block is found by a miner.

\[p(t|N=1) \Delta t = P(\omega = h\cdot t | N = 1)\]

taking the limit when \(p \ll 1\) and \(\Delta t = 1/h \ll 1\) we can write \(1-p \approx \exp(-p)\) and we obtain that the waiting time until one block is found follows an exponentially decreasing PDF

\[p(t| N =1) = \frac{h}{W_1 \cdot d}\, \exp(-h\cdot t/(W_1\cdot d))\]

It follows that the expected time is

\[\langle t \rangle_{N=1} = \frac{W_1 \cdot d}{h}\]

Or just simply

\[\langle \omega \rangle_{N=1} = \langle t \rangle_{N=1} h\]

which can also be obtained by applying the expectation operator to the equation \(\omega = t\cdot h\).

For example, at the time of writing the difficulty is \(d = 5.7 \times 10^{13}\), so that \(h = 1 \,\mathrm{TH/s}\) (tera-hashes per second) will have an expected waiting time of \(\langle t \rangle = 2.5 \times 10^{11}\) seconds or about 8000 years! In order to be able to mine a block every 30 days one needs: \(95000 \,\mathrm{TH/s} = 95 \,\mathrm{PH/s}\). In order to mine a block every 10 minutes the hashrate should be: \(4\times 10^8 \,\mathrm{TH/s} = 400 \,\mathrm{EH/s}\), and that’s the estimated network hashrate at the moment.

Generalization of work to mine N blocks

As an interesting mathematical exercise, we can generalized the previous random variable for work to consider the case in which we would like to keep producing block candidates until we find N valid blocks. In this case the probability of having \(\omega = w\) will be the multiplication of the probability of occurrence of \(w-N\) failures, each one corresponding to \((1-p)\) and the occurrence of \(N\) successes with probability \(p\). Clearly this is subjected to \(w \ge N\) and a combinatorial factor must be present to account for the random order in which \(N-1\) success events can occur among \(w-1\) trials, before the final trial is a success. Therefore we obtain that the probability of performing w trials until N blocks are found to be

\[P(\omega = w | N) = { w-1 \choose N-1 } (1-p)^{w - N} \cdot p^{N}\]

which is known as a negative binomial distribution (the Negative probability distribution as listed in wikipedia [5] follows a different convention from the one we show here, however, with a slight change of variable one can obtain the same expressions). The MGF for this distribution can be shown to be

\[\langle \exp(\omega z) \rangle_N = \left( \frac{p}{p + e^{-z} -1 } \right)^N\]

The expected work until \(N\) blocks are mined is computed as

\[\langle \omega \rangle_N = \frac{N}{p}\]

Similarly to the previous section, we can introduce a time random variable which denotes the waiting time until \(N\) block are found given a constant hashrate \(h\). We use \(\Delta t = 1/h \ll 1\) as the smallest unit of time, assume \(p\ll 1\) so that we can write \((1-p)\approx \exp(-p)\), also assume that \(N \ll 1/p\) (that is N is much smaller than the expected work to mine a single block) and \(N \ll w\) (that is we consider time intervals much greater than the time it takes to perform N trials). Then we obtain that the waiting time until a total of N blocks are found follows a gamma PDF [3]:

\[p(t|N) = \frac{h}{W_1 \cdot d} \frac{1}{(N-1)!} \left(\frac{t\cdot h}{W_1\cdot d}\right)^{N-1}\exp(-t\cdot h/(W_1\cdot d))\]

With expected waiting time equal to

\[\langle t \rangle_{N} = N \frac{W_1 \cdot d}{h}\]

Counting valid blocks

Now suppose we are interested in the number \(\eta\) of valid block one can obtain after performing \(w\) trials. In a sequence of \(w\) trials/events \(\eta = N\) if \(w-N\) block trials are failures with probability \((1-p)\) and \(N\) blocks are valid with probability \(p\). The probability of \(\eta=N\) is obtained by multiplying those independent event probabilities and multiplying by a binomial factor that takes into account the random order of these events. Hence the probability of finding N valid blocks after w trials is

\[P(\eta = N| w) = {w \choose N} p^{N} (1-p)^{w-N}\]

which is a binomial distribution [2]. The MGF for the binomial distribution is

\[\langle \exp(\eta z) \rangle_{w} = ( p e^z + 1 - p )^w\]

From here it follows that the expected number of mined blocks after \(w\) attempts is

\[\langle \eta \rangle_w = w \cdot p\]

Time can be introduced in the equation if we consider \(w = h\cdot t\). We can also replace write \(p = 1/(W_1\cdot d)\) in terms of the difficulty and assume \(p\ll 1\) and \(N \ll w\). Using these approximations we obtain that the probability of finding N valid blocks after period of time t is

\[P(\eta = N| t) = \left( \frac{h\cdot t}{W_1\cdot d} \right)^N \frac{\exp(-t\cdot h/(W_1\cdot d))}{N!}\]

which is a Poisson probability distribution function [4], whose MGF is

\[\langle \exp(\eta z) \rangle_{t} = \exp\left( (e^z-1) \frac{h\cdot t}{W_1 \cdot d} \right)\]

which implies the expectation value is

\[\langle \eta \rangle_{t} = \frac{h\cdot t}{W_1 \cdot d}\]

That result is expected since we had seen that \(\langle \eta \rangle_w = w \cdot p\).

For example: A miner with \(1\, \mathrm{TH/s}\) is expected to hit: 0.000128 blocks every year (one every 8000 years). A miner with \(100\, \mathrm{PH/s}\) is expected to hit: 12.8 blocks every year (once a month). A miner with \(400\, \mathrm{EH/s}\) is expected to hit: 51417 blocks every year (one every 10 minutes).

The Poisson distribution for the number of blocks in a time frame \(t\) we have seen above, can also be used to compute a more practical prediction, which is the probability of finding at least one block in a time frame \(t\). That number would be 1 minus the probability of finding none

\[P(\eta>0|t) = 1 - P(\eta=0|t) = 1 - \exp(-t\cdot h/(W_1\cdot d))\]

which can be also derived from the PDF of the waiting time before hitting a block. Notice that if the time frame is small compared to \(\frac{W_1\cdot d}{h}\) we can approximate

\[P(\eta>0|t) \approx \frac{t\cdot h}{W_1 \cdot d}\]

For example, a miner with \(1 \,\mathrm{TH/s}\) will have: \(P(\eta>0|10\,\mathrm{min}) \approx 2.4 \times 10^{-9}\), or 2 over 1 billion. A miner with \(100 \,\mathrm{PH/s}\) has instead: \(P(\eta>0|10\, \mathrm{min}) \approx 0.00024\), or 1 in 4000. A miner with \(400 \,\mathrm{EH/s}\) has: \(P(\eta>0|10\, \mathrm{min}) = 0.624\). Of course, the longer one waits the better is this probability.

Solo Mining economics

A solo miner has a computer with a constant hashrate \(h\). Previously we have introduced time into our probability distributions because mining equipment spends a certain power of electric energy per unit of time and often we will be worried about how much time one needs to wait until a block is found as well as how much energy we will consume in respect to the expected gains from our mining setup.

A solo miner expects to earn some amount of Bitcoin \(\langle r \rangle\) in a period of time, we call this expected revenue:

\[\langle r \rangle_t = \langle \eta \rangle_t \cdot R = \frac{R \cdot h \cdot t}{W_1 \cdot d}\]

where \(R\) is the reward per block, currently 6.25 BTC new minted bitcoin plus transaction fees, currently around 0.2 BTC, this can change wildly over time.

For example: assume \(R = 6.5\, \mathrm{BTC}\) then a miner with \(1\, \mathrm{TH/s}\) is expected to earn: 84’000 sats a year; while a miner with \(100\, \mathrm{PH/s}\) is expected earn: 84 BTC a year.

A miner has to take into account the energy that he spends. For that reason an important characteristic of ASICs is the energy efficiency \(\epsilon\), usually measured in \(\mathrm{J/TH}\). If we are not worried about the revenue per unit of time, but instead the revenue per unit of energy we should look at the quantity \(\rho = \langle r \rangle/Q\) where \(Q\) is the energy spent. Notice that \(h\cdot t\) is the number of trials in the time frame \(t\), but also \(h\cdot t = Q/\epsilon\) because we produce \(1/\epsilon\) hashes per unit of energy. Then

\[\langle r \rangle = \frac{R \cdot h \cdot t}{W_1\cdot d} = \frac{ R}{ W_1 \cdot d} \cdot \frac{Q}{\epsilon}\]

therefore the revenue per unit of energy is

\[\rho = \langle r \rangle / Q = \frac{R}{W_1 \cdot d \cdot \epsilon}\]

For example a miner with \(\epsilon = 30\, \mathrm{J/TH}\) currently earns: \(290\,\mathrm{sats/kWh}\) (notice we have converted \(\mathrm{sats/J}\) to \(\mathrm{sats/kWh}\) because electric energy usually is expressed in this kWh not in Joules).

Consider the cost of electricity \(c\), usually in FIAT, but if that cost expressed in sats is greater than \(\rho\) then it means we are better off buying bitcoin than mining it. Unless we need the computation heat anyways, in that case one can consider mining as consuming electricity for heat with a payback in bitcoin.

For example consider \(c/S = 1000\, \mathrm{sats/kWh}\) (with \(c\) as 25 cents per kWh and \(S\) equal to 25’000 $ per BTC). Then we are effectively paying

\[c/S - \rho = 1000 - 300 = 700\, \mathrm{sats/kWh}\]

instead of \(1000\, \mathrm{sats/kWh}\). If the price of bitcoin \(S\) rises, \(c\) could even be lower and the revenue could surpass the costs. This happens if

\[\rho > c/S\]

that is

\[S > c/\rho\]

For example with the current difficulty and with \(\epsilon=30\, \mathrm{J/TH}\), and assuming \(c = 0.25\, \mathrm{\$/kWh}\) we obtain: \(S > 80'000\, \mathrm{\$/BTC}\) before mining is profitable. Mining could also become profitable if the difficulty drops (in the case other miners close their operations) or the reward per block \(R\) increases; earlier this year there was a spike in the mining revenue due to increased demand of block space due to introduction of inscriptions in the Bitcoin blockchain. If the reward per block hits 10 BTC then mining at these rates becomes profitable if \(S>50'000\, \mathrm{\$/BTC}\). With an efficient miner consuming \(\epsilon = 20\,\mathrm{J/TH}\) mining becomes profitable already at \(S>34'000 \,\mathrm{\$/BTC}\).

Pool mining economics

Conclusions

References

[1] Wikipedia Geometric Distribution

[2] Wikipedia Binomial Distribution

[3] Wikipedia Gamma Distribution

[4] Wikipedia Poisson Distribution

[5] Wikipedia Negative Binomial Distribution