Tinychain源码阅读笔记2-检测区块有效性

上次我们分析到load_from_disk如何将chain.dat中的区块数据解析出来的，接下来的工作便是验证区块的有效性了，直接看 connect_block函数：

@with_lock(chain_lock)
def connect_block(block: Union[str, Block],
                  doing_reorg=False,
                  ) -> Union[None, Block]:
    """Accept a block and return the chain index we append it to."""
    # Only exit early on already seen in active_chain when reorging.

connect_block函数接受Block，还有个默认值为False的doing_reorg参数，我们暂时不关心这个参数，继续看代码。

    search_chain = active_chain if doing_reorg else None
    if locate_block(block.id, chain=search_chain)[0]:
        logger.debug(f'ignore block already seen: {block.id}')
        return None

doing_reorg默认是false，所以这里search_chain是None，接着是locate_block，这个函数是为了检测区块是否与历史区块重复。

@with_lock(chain_lock)
def locate_block(block_hash: str, chain=None) -> (Block, int, int):
    chains = [chain] if chain else [active_chain, *side_branches]

    for chain_idx, chain in enumerate(chains):
        for height, block in enumerate(chain):
            if block.id == block_hash:
                return (block, height, chain_idx)
    return (None, None, None)

因为传进来的search_chain是None，所以chains = [active_chain, *side_branches]，我们找下active_chain和side_branches的定义

active_chain: Iterable[Block] = [genesis_block] 
side_branches: Iterable[Iterable[Block]] = []

active_chain默认是包含创世区块的list，side_branches则是包含list的list，最里面的list包含区块。所以chains实际是不同链组成的list，也就是主链与分叉链，关于分叉详细情况后面再分析。
随后遍历chains，如果当前block.id与历史区块重复，则返回这个区块和区块高度以及该区块所处的链id。

    try:
        block, chain_idx = validate_block(block)
    except BlockValidationError as e:
        logger.exception('block %s failed validation', block.id)
        if e.to_orphan:
            logger.info(f"saw orphan block {block.id}")
            orphan_blocks.append(e.to_orphan)
        return None

后续要检测区块的有效性

@with_lock(chain_lock)
def validate_block(block: Block) -> Block:
    
    if not block.txns:
        raise BlockValidationError('txns empty')
    
    if block.timestamp - time.time() > Params.MAX_FUTURE_BLOCK_TIME:
        raise BlockValidationError('Block timestamp too far in future')

首先区块交易是None，连coinbase交易都没有，肯定是个无效区块。随后这个判断我没有理解作者的意思，block.timestamp - time.time()肯定是个负值，区块难道可能来自未来？？？继续看代码

    if int(block.id, 16) > (1 << (256 - block.bits)):
        raise BlockValidationError("Block header doesn't satisfy bits")

bits这个元素是决定区块创建难度的，区块生成要满足int(sha256d(block.header(nonce)), 16) < (1 << (256 - block.bits))这个条件，后续会介绍区块难度的变化情况。

    if [i for (i, tx) in enumerate(block.txns) if tx.is_coinbase] != [0]:
        raise BlockValidationError('First txn must be coinbase and no more')

区块的coinbase交易顺序应在交易列表的第一位，接下来

    try:
        for i, txn in enumerate(block.txns):
            txn.validate_basics(as_coinbase=(i == 0))
    except TxnValidationError:
        logger.exception(f"Transaction {txn} in {block} failed to validate")
        raise BlockValidationError('Invalid txn {txn.id}')

这段代码是为了验证区块交易的基本有效性，看看validate_basics函数的代码

    def validate_basics(self, as_coinbase=False):

        if (not self.txouts) or (not self.txins and not as_coinbase):
            raise TxnValidationError('Missing txouts or txins')

        if len(serialize(self)) > Params.MAX_BLOCK_SERIALIZED_SIZE:
            raise TxnValidationError('Too large')

        if sum(t.value for t in self.txouts) > Params.MAX_MONEY:
            raise TxnValidationError('Spend value too high')

当没有交易输出或者非coinbase交易没有输入，交易无效。还有交易总字节大小超过1MB以及交易输出金额超过总上限，交易都是无效的。

    if get_merkle_root_of_txns(block.txns).val != block.merkle_hash:
        raise BlockValidationError('Merkle hash invalid')

接下来验证的是交易merkleTree

def get_merkle_root_of_txns(txns):
    return get_merkle_root(*[t.id for t in txns])

将所有交易id传递给get_merkel_root

@lru_cache(maxsize=1024)
def get_merkle_root(*leaves: Tuple[str]) -> MerkleNode:
    """Builds a Merkle tree and returns the root given some leaf values."""
    if len(leaves) % 2 == 1:
        leaves = leaves + (leaves[-1],)

    def find_root(nodes):
        newlevel = [
            MerkleNode(sha256d(i1.val + i2.val), children=[i1, i2])
            for [i1, i2] in _chunks(nodes, 2)
        ]

        return find_root(newlevel) if len(newlevel) > 1 else newlevel[0]

    return find_root([MerkleNode(sha256d(l)) for l in leaves])

所有传进来的交易id，把它们称之为叶子，首先判断叶子节点个数，是奇数的话，复制最后一个叶子到叶子群中，使叶子总数为偶数。get_merkle_root内部定义了find_root函数，显然又是一个递归函数。

class MerkleNode(NamedTuple):
    val: str
    children: Iterable = None

def _chunks(l, n) -> Iterable[Iterable]:
    return (l[i:i + n] for i in range(0, len(l), n))

看到MerkleNode和_chunks的定义后，这段代码就很容易理解了，每两个相邻叶子的哈希值拼接在一起，对其双哈希生成了一个新的叶子，不断的递归调用，直至剩下一个节点，这个节点的值也就是merkleroot。每次递归的计算量是源叶子节点的一半，因此该算法的时间复杂度是log(n)。

    if block.timestamp <= get_median_time_past(11):
        raise BlockValidationError('timestamp too old')

计算最近11个区块中间区块的时间戳，如果当前区块的时间戳不大于这个时间戳，区块无效

    if not block.prev_block_hash and not active_chain:
        # This is the genesis block.
        prev_block_chain_idx = ACTIVE_CHAIN_IDX
    else:
        prev_block, prev_block_height, prev_block_chain_idx = locate_block(
            block.prev_block_hash)

        if not prev_block:
            raise BlockValidationError(
                f'prev block {block.prev_block_hash} not found in any chain',
                to_orphan=block)

当active_chain为None时，没有prev_block_hash的区块判定其为创世区块，并将父区块的链id设为ACTIVE_CHAIN_IDX。不满足上述条件则调用locate_block，获取prev_block, prev_block_height, prev_block_chain_idx，prev_block为None则报错，判定其为孤块。

        # No more validation for a block getting attached to a branch.
        if prev_block_chain_idx != ACTIVE_CHAIN_IDX:
            return block, prev_block_chain_idx

        # Prev. block found in active chain, but isn't tip => new fork.
        elif prev_block != active_chain[-1]:
            return block, prev_block_chain_idx + 1  # Non-existent

如果父区块的链id不为ACTIVE_CHAIN_IDX，则返回该父区块及其的链id。否则如果父区块不在active_chain末尾，则返回目前的区块和prev_block_chain_idx + 1。

    if get_next_work_required(block.prev_block_hash) != block.bits:
        raise BlockValidationError('bits is incorrect')

这段代码是为了验证该区块的难度值是否正确，子区块的难度是由父区块决定的。看一下get_next_work_required这个函数，传进的参数是区块哈希值

def get_next_work_required(prev_block_hash: str) -> int:
    """
    Based on the chain, return the number of difficulty bits the next block
    must solve.
    """
    if not prev_block_hash:
        return Params.INITIAL_DIFFICULTY_BITS

    
    (prev_block, prev_height, _) = locate_block(prev_block_hash)


    if (prev_height + 1) % Params.DIFFICULTY_PERIOD_IN_BLOCKS != 0:
        return prev_block.bits

    with chain_lock:
        # #realname CalculateNextWorkRequired
        period_start_block = active_chain[max(
            prev_height - (Params.DIFFICULTY_PERIOD_IN_BLOCKS - 1), 0)]

    actual_time_taken = prev_block.timestamp - period_start_block.timestamp

    if actual_time_taken < Params.DIFFICULTY_PERIOD_IN_SECS_TARGET:
        # Increase the difficulty
        return prev_block.bits + 1
    elif actual_time_taken > Params.DIFFICULTY_PERIOD_IN_SECS_TARGET:
        return prev_block.bits - 1
    else:
        # Wow, that's unlikely.
        return prev_block.bits

当prev_block_hash为None，会认定为此区块是创世区块，则把难度值设置为初始值INITIAL_DIFFICULTY_BITS，这个值为24.随后要获取父区块及其区块高度，tinychain定义了DIFFICULTY_PERIOD_IN_BLOCKS这个全局变量，它的值是600，意思是，从创世区块开始，每600个区块是一个难度周期，也就是说这600个区块的bits值都是一样的。如果目前的区块与父区块在同一个周期内，那么它的bits与父区块相等。如果当前区块处于一个新的难度周期，那么它的bits如何计算呢？这与上一个难度周期内的600个区块全部生成的总时间有关，即actual_time_taken这个量，看最后的判断代码，DIFFICULTY_PERIOD_IN_SECS_TARGET这个值是36000，难度周期的总时间比它大就减少难度，小就增加难度。这样做的目的是让这600个区块生成的时间趋近于36000秒，与算力变化有关，即无论算力如何变化，区块生成的速度总是稳定的，保障了tinychain的稳定性。

    for txn in block.txns[1:]:
        try:
            validate_txn(txn, siblings_in_block=block.txns[1:],
                         allow_utxo_from_mempool=False)
        except TxnValidationError:
            msg = f"{txn} failed to validate"
            logger.exception(msg)
            raise BlockValidationError(msg)

最后验证非coinbase交易的有效性，交易是tinychain的核心内容，比较复杂，下次再分析。

Tinychain源码阅读笔记2-检测区块有效性

Tinychain源码阅读笔记2-检测区块有效性

comments