Can you help me make a fairly complete summary...

Created: 25 March 2026

Answered using GPT-5.4 Thinking by Chat01

Question

Can you help me make a fairly complete summary of this paper? Right now I don't have time to read it carefully, but I'd like to get a solid and sufficiently detailed base to start structuring a presentation plan.
I've pasted the text of the paper below, but you can also consult the attached file!

Abusing the Ethereum Smart Contract Verification
Services for Fun and Profit

Abstract—Smart contracts play a vital role in the Ethereum
ecosystem. Due to the prevalence of various security issues
in smart contracts, smart contract verification is urgently
needed, i.e., the process of matching a smart contract's source
code to its on-chain bytecode for gaining mutual trust between
smart contract developers and users. Although smart contract
verification services are embedded in both popular Ethereum
browsers (e.g., Etherscan and Blockscout) and official platforms
(i.e., Sourcify), and gain great popularity in the ecosystem, their
security and trustworthiness remain unclear. To fill the void,
we present the first comprehensive security analysis of smart
contract verification services in the wild. By diving into the
detailed workflow of existing verifiers, we have summarized the
key security properties that should be met, and observed eight
types of vulnerabilities that can break the verification. Further, we
propose a series of detection and exploitation methods to reveal
the presence of vulnerabilities in the most popular services, and
uncover 19 exploitable vulnerabilities in total. All the studied
smart contract verification services can be abused to help spread
malicious smart contracts, and we have already observed
attackers using such tricks for scamming.
It is hence urgent for our community to take actions to detect
and mitigate security issues related to smart contract verification,
a key component of the Ethereum smart contract ecosystem. 
I. INTRODUCTION
Ethereum, as one of the representative blockchain platforms, is regarded as a medal contender of Satoshi’s Bitcoin.
Its market cap peaked at $540 billion in November 2021 [16]. The success of Ethereum cannot omit the existence of tens of millions of smart contracts deployed on it. Specifically, smart contracts on Ethereum can be seen as scripts that will be executed once pre-defined conditions are met. Alongside the characteristics of irreversibility and determinacy of blockchain, developers start to compose decentralized applications (DApps) with smart contracts, e.g., gambling games [62], decentralized exchanges [66], and decentralized autonomous organizations that can propose and discuss proposals [22], where all participants are willing to and have to obey game rules that are coded in the smart contracts.
*Pengxiang Ma and Ningyu He are co-first authors. §Haoyu Wang (haoyuwang@hust.edu.cn) is the corresponding author.
Considering the efficiency and I/O issues, Ethereum only stores a compact format, i.e., bytecode, of smart contracts within its decentralized database. However, the unreadability of bytecode severely hampers the development of the whole ecosystem. For example, on Ethereum, accounts are eligible to create and issue tokens by deploying their own token contracts; it is hard to tell scam tokens from official or real ones by only auditing the bytecode. Xia et al. [72] identified over 10K scam token contracts, where scammers have gained a profit of at least $16 million. Such a gap between users' expectation and the actual
executed logic in unreadable bytecode urges the emergence
of smart contract verification, i.e., the process of matching a
smart contract’s source code to its on-chain bytecode.
Smart contract verification has been integrated into
Ethereum browsers, e.g., Etherscan [26] and Blockscout [6],
and other official platforms, e.g., Sourcify [2]. As shown
in Fig. 1(a), by providing a piece of source code, a set of
configuration items, and an address, the verification service
compiles the given source code according to the configuration
and compares it with the on-chain bytecode in the designated
address. All source code that passes the verification is stored by the service provider. Once someone asks for the
source code of a contract, the verification service returns it if
it has been verified, as illustrated in Fig. 1(b). The retrieved
source code can be used in various kinds of downstream applications, e.g., vulnerability detection and code auditing. Indeed,
many research studies [28], [32] and industrial products [8]
rely on the smart contract verification services. Intuitively,
source code verification services will help gain mutual trust
between smart contract developers and their users, and boost
a series of applications.
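The verify-then-serve flow in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration, not the services' actual implementation: `compile_source` is a hypothetical stand-in for a real solc invocation, and the on-chain bytecode would come from an Ethereum node.

```python
# Minimal sketch of the verify-then-serve workflow (Fig. 1), assuming a
# stand-in compiler; real services invoke solc with the uploaded config.

VERIFIED = {}  # address -> source code (the service's back-end store)

def compile_source(source: str, config: dict) -> str:
    # Hypothetical stand-in: a real verifier runs solc on `source`
    # with `config` and returns the resulting bytecode.
    return config["stub_output"]

def verify(address: str, source: str, config: dict, onchain: str) -> bool:
    # Fig. 1(a): compile the uploaded source and compare with on-chain code.
    if compile_source(source, config) == onchain:
        VERIFIED[address] = source
        return True
    return False

def get_source(address: str):
    # Fig. 1(b): serve the stored source only for verified addresses.
    return VERIFIED.get(address)

ok = verify("0xabc", "contract A {}", {"stub_output": "6080"}, "6080")
```

Each stage of this flow (data acquisition, compilation, comparison, storage, display) is dissected in §IV.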
Are the smart contract verification services trustworthy?
Surprisingly, no prior studies have considered whether these
verification services work as expected. Imagine a situation
where a malicious developer provides seemingly harmless
source code that can pass the verification of an on-chain contract which, however, is embedded with backdoors. Moreover,
what if one’s smart contract can be verified by a source code
that is elaborately constructed by malicious competitors? These
severe consequences can happen once there are vulnerabilities
or bugs under any of the modules of the verification services.
These vulnerabilities can be abused by adversaries, leading to
an extreme negative impact to the whole Ethereum ecosystem.
This Work. We take the first step to perform a comprehensive security analysis on Ethereum smart contract verification services in this paper. Based on the implementation of
three widely-adopted verifiers, i.e., Etherscan, Sourcify, and
Blockscout, we first distill their general workflow
and identify key modules that make them up (see §IV). Then
we propose the key security properties that should be satisfied
in these services (see §V). Against these security properties, we
summarize eight types of potential vulnerabilities that can be exploited to break the verification services. Further, for each type of vulnerability, we propose a detection method, the corresponding proof
of concept (PoC) to illustrate how they can be exploited, and
concrete steps to avoid the abuse (see §VI). Our comprehensive
analysis reveals 19 vulnerabilities in total on three services,
each of which can be exploited by attackers. After a timely
disclosure to service providers, 15 vulnerabilities have been
confirmed by the corresponding officials and 10 of them have
been patched. We further measure the impact, and observe that
tens of millions of deployed innocent contracts can be abused,
and hundreds of contracts may have already committed fraud
by exploiting these vulnerabilities (see §VII).
Our contributions can be summarized as follows:
• To the best of our knowledge, we are the first to
characterize the design and implementation of popular
Ethereum source code verification services.
• We uncover eight types of overlooked but exploitable
vulnerabilities hidden in verification services.
• We propose concrete methods to detect, exploit,
and mitigate all these types of vulnerabilities,
and reveal 19 vulnerabilities hidden in three verifiers.
By the time of writing, 15 vulnerabilities have been
confirmed, and 10 of them have been patched in time.
• We show that the uncovered vulnerabilities can introduce an extreme negative impact on the Ethereum
ecosystem, i.e., tens of millions of innocent contracts can be abused.
II. BACKGROUND
A. Smart Contract
Most Ethereum smart contracts are written in Solidity [54],
a high-level programming language specifically designed by
the Ethereum official. Correspondingly, Ethereum provides a
specific compiler, named solc, which takes Solidity files as
input and generates a bytecode file, which can be executed by
Ethereum Virtual Machine (EVM). An EVM bytecode consists
of a series of opcodes, which can interact with the data
structures maintained by EVM [23]. For example, the PUSH
opcode pushes its operand onto an operand stack.
Typically, an EVM bytecode can be divided into three parts
according to their functionalities, i.e., creation code, runtime
code, and metadata, as shown in Fig. 2b. Specifically, the
creation code can be executed only once. It is responsible for
deploying the corresponding runtime code, where the concrete
executing logic of a smart contract, e.g., implementations of
functions, is encoded. Moreover, the fixed-length metadata is
part of the runtime code and not executable. It is the hash
result of the meta information during the compilation (like
solc version) [55]. Additionally, the metadata can be used as a
key to retrieve and index the corresponding smart contract in a
decentralized database. For most smart contracts, there is only
one piece of metadata, while things look different for factory
contracts [70]. Specifically, a factory contract can deploy other
contracts solely by itself, as shown in Fig. 2a, where the fac
contract can deploy foo and bar whenever the function f is
invoked. Its bytecode structure is shown in Fig. 2c. The most
obvious distinction is that there are three pieces of metadata
embedded in the runtime code, corresponding to fac and the
contracts it can deploy. Note that, there exist explicit and fixed
indicators in the head and tail of each metadata, which can thus
be identified easily. Except for these three basic structures,
we often use the term bytecode as a general one to refer to all three parts together. When a user intends to initiate a
transaction with the locally compiled bytecode for deployment,
the bytecode as well as the initial values of parameters are
embedded into the input field of the transaction.
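As a concrete illustration of the layout above: for solc-compiled contracts the metadata sits at the tail of the runtime code, and by solc convention its byte-length is encoded in the final two bytes. A short Python sketch, assuming that encoding (the byte values are toy placeholders):

```python
def split_metadata(runtime_code: bytes):
    """Split runtime code into its executable part and trailing metadata,
    assuming the solc convention that the final two bytes encode the
    byte-length of the CBOR-encoded metadata preceding them."""
    meta_len = int.from_bytes(runtime_code[-2:], "big")
    cut = len(runtime_code) - 2 - meta_len
    return runtime_code[:cut], runtime_code[cut:]

# Toy runtime code: 4 "opcode" bytes, 3 "metadata" bytes, 2 length bytes.
runtime = b"\x60\x80\x60\x40" + b"\xa2\x64\x69" + (3).to_bytes(2, "big")
body, metadata = split_metadata(runtime)
```

Note that a factory contract (Fig. 2c) embeds multiple metadata pieces, so this tail-only split is insufficient for it; §IV-B4 discusses how the verifiers handle that case.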
Functions and variables in Ethereum smart contracts can
be specified by type specifiers [54]. Some of them specify
the visibility, e.g., private and external, which can be
used to differentiate access control on functions and variables. Another specifier, named immutable, is introduced
to specify read-only variables. Different from ordinary
constants, immutable variables are not calculated and initialized
until the contract is deployed by the creation code.
B. Smart Contract Verification
Code Is Law, is one of the core principles of Ethereum. In
other words, all smart contracts in finalized blocks are non-updatable and cannot be rolled back. As we mentioned in
§II-A, smart contracts are stored in the form of EVM bytecode.
Due to its unreadability, it requires huge efforts to identify
developers’ original intention.
Such a transparency issue urges the emergence of smart
contract verification¹, which is either a feature offered by
Ethereum browsers (e.g., Etherscan [26] and Blockscout [6]) or
a service provided by the Ethereum official (e.g., Sourcify [2]).
Specifically, anyone can upload a source code file and claim
it is the implementation of an on-chain smart contract. The
service is responsible for verifying if the bytecode compiled
from the given source code matches the one deployed on-chain.
Once matched, all users are able to access the source code file
for further usage.
There are two specific results that further describe how the
provided source code matches the on-chain contract,
i.e., partial match and exact match. To be specific, if the
given source code can generate an identical result with the
on-chain one except for the metadata, it is described as a
partial match. This is because some configuration items are
inconsistent with the ones at the time of original compilation.
Otherwise, it is an exact match. However, such a subtle
difference is only considered by Sourcify, where an exact
match can replace the existing partial match on the same
contract [58]. In contrast, Blockscout and Etherscan do not
allow replacements of any already manually verified contracts.
According to the statistics [25], [2], [6], less than 1%, 0.7% and
0.3% of contracts have been manually verified in Etherscan,
Sourcify, and Blockscout, respectively, indicating a huge attack
surface for adversaries.
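The partial/exact distinction can be sketched as follows. This is a simplified Python illustration with toy byte strings; the metadata stripping uses only the trailing-length convention, which suffices for contracts with a single metadata piece.

```python
def strip_metadata(runtime: bytes) -> bytes:
    # Tail heuristic: the last two bytes give the metadata length
    # (sufficient for single-metadata contracts).
    meta_len = int.from_bytes(runtime[-2:], "big")
    return runtime[: len(runtime) - 2 - meta_len]

def classify_match(compiled: bytes, onchain: bytes) -> str:
    # Exact match: bytecodes are identical, metadata included.
    if compiled == onchain:
        return "exact match"
    # Partial match: identical once metadata is ignored, i.e., some
    # configuration items differed from the original compilation.
    if strip_metadata(compiled) == strip_metadata(onchain):
        return "partial match"
    return "no match"

code = b"\x60\x80"                         # shared runtime logic (toy bytes)
meta_a = b"\xaa\xbb" + (2).to_bytes(2, "big")
meta_b = b"\xcc\xdd" + (2).to_bytes(2, "big")
result = classify_match(code + meta_a, code + meta_b)
```

Under this classification, Sourcify's replacement rule means an "exact match" submission can displace an earlier "partial match" for the same address.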
III. ADVERSARY MODEL
A. Motivating Example
If the smart contract verification services are exploited,
it is possible that the provided source code is inconsistent
with the contract in the designated address. Fig. 3 illustrates
a concrete example. As we can see, from step 1 to step 3,
attackers intentionally deploy an evil contract (A) and request
source code verification by providing A. Thus, the source code
of A is stored under the corresponding address, 0x11...1 for example. Then, to hide their malicious intention, they compose
another harmless contract, named B. Based on the contract B,
attackers repeat the above processes from step 4 to step 6.
However, at step 5, they slightly modify the configuration file.
As the highlighted rows indicate, they claim the path of B as
../0x11...1/a.sol. To this end, the source code of B
overwrites the A’s in the back-end database. Users who visit
0x11...1 eventually obtain the source code of B without
explicit warnings. In other words, the harmless contract covers
the malicious intention of the actually deployed one. This
vulnerability is uncovered in Sourcify by us (see §VI-B4).
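The root cause is an unsanitized path join: the attacker-controlled file path escapes the per-address storage directory. A minimal Python sketch; the directory layout and the full addresses are hypothetical stand-ins for those in Fig. 3.

```python
import os.path

# The service stores uploads under <root>/<address>/<file>. Joining an
# attacker-controlled relative path without sanitization lets "../"
# escape into another address's directory. "0x1111" and "0x2222" are
# hypothetical stand-ins for the victim and attacker addresses in Fig. 3.
root = "contracts"
attacker_dir = os.path.join(root, "0x2222")
claimed_path = "../0x1111/a.sol"   # path of B as claimed by the attacker

target = os.path.normpath(os.path.join(attacker_dir, claimed_path))
# `target` now points inside contracts/0x1111, overwriting A's source.
```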
B. Adversary Model
We assume that attackers can access both on-chain and off-chain data as normal users. Specifically, attackers can access
all deployed smart contracts through a self-deployed node or
Ethereum browsers, and arbitrarily request verification by providing necessary files. Thus, depending on whether attackers
request verification on contracts deployed by themselves or
other developers, we can divide the consequences of exploiting
smart contract verification services into two categories, i.e.,
adversarial verification and source scam.
Adversarial Verification (A1). This consequence corresponds
to verifying contracts deployed by other developers. Given
an arbitrary contract bytecode, malicious users can forge a
piece of source code that can successfully pass the verification
process by exploiting vulnerabilities in the verifiers. Such
a verification not only does not require authorizations from
the actual developers, but also does not notify them that
their contracts have been verified by others. Additionally, the
verified source code can show malicious or scam information
that discredits the victim. For example, Fig. 4 illustrates a real-world example². Specifically, we can see the actual deployed
contract is named as L1Weth, and its meta information is
shown in the upper part. Attackers exploit a vulnerability
(the one detailed in §VI-A1) and provide a source code
file, as shown in the lower part. There are three interesting
points: 1) attackers deliberately name the source code file as
L1Weth.sol, though the most visible contract is named as
LlWeth; 2) there are an ASCII art diagram, a phishing link
and a fake discord channel; and 3) the deposit function has an obvious vulnerability. Intuitively, the credibility of
the original contract is significantly compromised. As for why
this is adversarial, it can be explained in two ways. On the one
hand, a successful source code verification, even not requested
by the actual developers, has no chance to be replaced in
Blockscout and Etherscan. On the other hand, in Sourcify, if
a smart contract is only partially verified by its developers,
attackers can construct an exact match to adversarially replace
the original one (see §II-B).
Source Scam (A2). This consequence corresponds to the
situation where attackers are exactly the developers of the
verified contracts. As the example in §III-A, attackers can
exploit vulnerabilities by providing a seemingly harmless
contract to hide malicious intentions. Consequently, the fake
contract gains users’ trust, which is however a trap.
These two consequences are explicitly related to users who
access Ethereum contracts. Further, the vulnerabilities can also
bring in critical consequences to the whole ecosystem. For
example, lots of downstream applications, e.g., source-code
level vulnerability detectors like Slither [28] and Echidna [32]
and smart contract auditing companies [8], rely on the source
code retrieved from these verifiers. If they are compromised,
the reliability of such services is severely impacted.
IV. SOURCE CODE VERIFIER
Against three mainstream verifiers, i.e., Etherscan, Sourcify, and Blockscout, we first overview their general workflow in §IV-A. Then, we delve deeper into their implementations
in §IV-B and §IV-C. Last, for the proprietary Etherscan,
we detail how black-box testing is conducted in §IV-D.
1) Untrusted Source Code Data Acquisition Module (M1):
M1 is responsible for obtaining untrusted data, i.e., source
code file(s), configuration items, and an address, from the
verifying requester. Multiple source code files are supported
because a DApp may require several files to
achieve a complex functionality. Thus, to enable compiling
a DApp with the hierarchical structure among source code
files, solc allows users to pack source code files and their
paths into a single file in JSON format. Additionally, once
a contract is compiled by solc, a set of configuration items is
generated automatically, corresponding to the meta information
of the compilation adopted this time, e.g., solc version. Last,
the contract stored in the given address is regarded as the
verification target. Moreover, specific service providers may
require extra information. For example, Blockscout requires a
contract name as the entry when multiple contracts exist.
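The "single file in JSON format" that M1 accepts follows solc's standard-JSON input layout: sources keyed by requester-controlled paths, plus the configuration items as settings. A sketch of such an input; the file path, contract content, and settings values are made up for illustration.

```python
import json

# Sketch of a solc standard-JSON input as accepted by M1. The file path
# ("contracts/Token.sol") and settings are illustrative; note that the
# requester controls both the paths and the configuration items.
standard_json = {
    "language": "Solidity",
    "sources": {
        "contracts/Token.sol": {"content": "contract Token { }"},
    },
    "settings": {
        "optimizer": {"enabled": True, "runs": 200},
        "outputSelection": {
            "*": {"*": ["evm.bytecode", "evm.deployedBytecode"]}
        },
    },
}
payload = json.dumps(standard_json)  # what gets uploaded to the verifier
```

That the paths inside `sources` are attacker-controlled is precisely what the motivating example in §III-A abuses.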
2) On-chain Data Acquisition Module (M2): From the
given address, this module retrieves necessary on-chain data,
i.e., the runtime code and the bytecode (i.e., both creation
code and runtime code) on demand. Specifically, to obtain
the on-chain runtime code, M2 queries a standard RPC,
named eth.getCode(), which is exposed by client nodes.
As for obtaining the bytecode, M2 firstly queries some
third-party databases, e.g., Etherscan, to obtain the hash of
the creation transaction to the address. Then, M2 queries
eth_getTransactionByHash() and extracts the bytecode from its input field.
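The two lookups performed by M2 can be sketched as plain JSON-RPC payloads. The address and transaction hash below are hypothetical placeholders; in a live setting these dicts would be POSTed as JSON to a node's RPC endpoint.

```python
import json

# JSON-RPC payloads that M2 would send to a client node: eth_getCode for
# the runtime code, and eth_getTransactionByHash to recover the creation
# bytecode from the transaction's input field.
def rpc(method: str, params: list, req_id: int = 1) -> dict:
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": req_id}

address = "0x" + "11" * 20   # hypothetical contract address
tx_hash = "0x" + "ab" * 32   # hypothetical creation transaction hash
get_runtime = rpc("eth_getCode", [address, "latest"])
get_creation_tx = rpc("eth_getTransactionByHash", [tx_hash])
body = json.dumps(get_runtime)  # serialized request body
```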
3) Compilation Module (M3): M3 invokes solc to compile and obtain both bytecode and runtime code for the
subsequent comparison process. According to the source code
files and configuration items obtained by M1, solc can perform
compilation in a deterministic way. Even so, the compiled
runtime code may be unusable. The reason is that there may exist immutable variables (see §II-A), whose values are
dynamically determined by executing the creation code during
the deployment. We find two ways are adopted by verifiers to
resolve this issue, i.e., directly replacing immutable variables
and simulating creation code [57].
As for the former method, during the compilation, it
leaves all immutable variables as blanks and records all their
offsets. Then, before comparing the compiled one and the on-chain one, it fetches the corresponding bytes from the on-chain runtime code according to the recorded offsets, and
fills them into the blanks. Hence, the values of immutable
variables in the local compiled contract are definitely identical
to the on-chain ones, which somehow does not achieve the
purpose of verification. As for the latter method, M3 invokes
eth_call() exposed by client nodes to simulate the behavior of creation code. For example, suppose there is a statement
in a constructor: address immutable public owner
= msg.sender;, where owner is an immutable variable
that can be dynamically assigned as the user’s address who
deploys this contract. The advantage of this statement is that
the owner address does not need to be hard-coded, but is
dynamically set as the transaction initiator during deployment.
This avoids permission verification problems such as owner
address misuse.
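The first strategy, direct replacement, can be sketched as follows. The byte strings are toy values; in reality immutable slots are 32 bytes wide and their offsets are reported by the compiler.

```python
def fill_immutables(compiled: bytes, onchain: bytes, offsets, width: int):
    """Copy immutable-variable bytes from the on-chain runtime code into
    the blanks (zeroed slots) that the compiler left in the local build."""
    patched = bytearray(compiled)
    for off in offsets:
        patched[off : off + width] = onchain[off : off + width]
    return bytes(patched)

# Toy example: a 4-byte immutable slot at offset 2, left blank locally.
compiled = b"\x60\x80" + b"\x00" * 4 + b"\x60\x40"
onchain = b"\x60\x80" + b"\xde\xad\xbe\xef" + b"\x60\x40"
patched = fill_immutables(compiled, onchain, offsets=[2], width=4)
```

After patching, the local build trivially equals the on-chain code at those offsets, which is why the text above notes that this "somehow does not achieve the purpose of verification".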
4) Comparison Module (M4): M4 is responsible for comparing the fetched on-chain data and the locally compiled ones.
As we stated in §IV-B2, the bytecode is obtained from the
input field of the creation transaction, which means that the
bytecode may be followed by values of parameters if the constructor requires (see §II-A). Thus, instead of asking requesters
to provide concrete values of parameters, another feasible
strategy is determining if the locally compiled bytecode is the
prefix of the one parsed from the creation transaction.
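This prefix strategy amounts to a single check, sketched below with made-up bytes: constructor arguments are ABI-encoded after the creation bytecode in the transaction's input field.

```python
def creation_matches(compiled: bytes, tx_input: bytes) -> bool:
    # M4's prefix strategy: the tx input is creation bytecode followed by
    # ABI-encoded constructor arguments, so a prefix match suffices and
    # requesters need not supply the concrete argument values.
    return tx_input.startswith(compiled)

bytecode = bytes.fromhex("608060405234")   # made-up creation bytecode
ctor_arg = (42).to_bytes(32, "big")        # one uint256 constructor argument
tx_input = bytecode + ctor_arg
```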
Except for that, comparing bytecode or runtime code is
still challenging due to the existence of metadata. Though
verification requesters are asked to provide configuration items
to try to generate identical metadata, it is still difficult to
ensure that the uploaded ones are identical to the ones at
the actual compilation. Thus, removing the metadata in advance is the best choice to avoid verification failure. These
three verifiers adopt distinct methods to identify and remove
metadata. Specifically, because the metadata is wrapped by
explicit indicators [54], Sourcify directly identifies such a
pattern through regex matching at the tail of the runtime code. However, as Fig. 2c shows, factory contracts may
have multiple metadata, where only removing the tailing one
is insufficient. Thus, Blockscout adopts another method. It
intentionally inserts additional configuration items, which only
changes the metadata while keeping runtime logic unaltered. It
can identify all metadata by observing which bytes are changed
and wrapped in the metadata pattern.
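Blockscout's perturbation trick reduces to diffing two compilations that should differ only in their metadata. A simplified Python sketch of locating the differing regions, with toy byte strings standing in for real bytecode:

```python
def diff_regions(a: bytes, b: bytes):
    """Return contiguous index ranges where two equal-length bytecodes
    differ. With a metadata-only perturbation, these ranges locate every
    metadata piece, including the multiple pieces of factory contracts."""
    regions, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i
        elif x == y and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(a)))
    return regions

# Toy bytecodes: identical logic, two perturbed "metadata" blobs.
v1 = b"\x60\x80AAAA\x60\x40BB"
v2 = b"\x60\x80ZZZZ\x60\x40CC"
regions = diff_regions(v1, v2)
```

The verifier would additionally check that each differing region is wrapped in the metadata pattern before stripping it.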
5) Data Storage Module (M5): After completing the above
comparison, all uploaded files need to be properly preserved.
Thus, M5 is accountable for storing them in a permanent
database. For Etherscan and Blockscout, they store all source
code files in their owned and centralized back-end servers.
Users may be concerned about such a centralization issue,
where a single point of failure can render the service completely
unavailable. Moreover, such centralized back-end servers adopted
by verifiers have been compromised in TRON, where 26
million TRX tokens were stolen [18]. Thus, Sourcify adopts
IPFS [36], a decentralized file system that can be accessed by
anyone who maintains an IPFS node, to enhance its confidence.
6) Display & API Module (M6): Through M6, users
can access the corresponding source code files when asking
for a specific address. Additionally, the displayed information
also includes some meta information as illustrated in Fig. 4.
Furthermore, Sourcify directly offers a metadata file consisting
of all uploaded configuration items to users.
C. Shortcuts
All these three verifiers generally follow the design and
workflow we mentioned in §IV-A and §IV-B. However, we
found that some shortcuts, i.e., skipping some modules when a
condition is met, exist in their implementations. The purpose of
these shortcuts is to alleviate the workload or save computing
resources. We detail the two shortcuts we found in the following.
Source Code Inheritance across Identical Runtime Code.
Intuitively, if two pieces of on-chain runtime code are identical, we can
conclude that the specific logic of the non-constructor part of
these two contracts is equivalent. Therefore, in Etherscan and
Blockscout, if a contract is verified, all other contracts whose
runtime code are identical to this one’s will also be marked
as verified. Such a shortcut provides an automatic verification
for factory contracts. Take a famous decentralized finance
application, Uniswap [66], as an example, which allows token
exchanges at an interest rate calculated by supply and demand. Users can create and deploy a contract of an exchangeable
token pair by invoking create_pair in the factory contract.
Because the concrete steps of how to perform exchanges are
coded in the factory contract, all newly deployed contracts
are identical in terms of runtime code. Consequently, the
previously verified source code files can be directly inherited
to the newly created ones without further verification requests.
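This shortcut amounts to keying verified source by a digest of the runtime code, sketched below; SHA-256 stands in for whatever digest the real services use internally.

```python
import hashlib

# Verified sources keyed by a digest of the runtime code; SHA-256 is a
# stand-in for whichever digest the real services use internally.
verified_by_code = {}

def mark_verified(runtime_code: bytes, source: str) -> None:
    verified_by_code[hashlib.sha256(runtime_code).hexdigest()] = source

def lookup(runtime_code: bytes):
    # Any contract with identical runtime code (e.g. a pair contract
    # deployed by the same factory) inherits the verified source.
    return verified_by_code.get(hashlib.sha256(runtime_code).hexdigest())

pair_code = b"\x60\x80\x60\x40"                 # toy runtime code
mark_verified(pair_code, "contract Pair { }")   # first pair verified
inherited = lookup(pair_code)                   # newly deployed twin
```

This convenience is exactly what the R1 PoC in §VI abuses: verifying an attacker-deployed twin automatically marks the victim contract as verified.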
Source Code Inheritance across Platforms. To avoid resource consumption, Blockscout recognizes the verification
results of Sourcify without manual interventions. In other
words, for a given address, users need only request verification
on Sourcify; Blockscout automatically inherits the source code
files uploaded for that address. Note that there is no automatic
results inheritance from or to Etherscan, which still requires
manual verification requests from users for each contract.
D. Blackbox Testing in Etherscan
Etherscan is not open-source, thus we use a black-box
testing based approach to investigate what options are adopted
by each module. Specifically, the process can be roughly
divided into: 1) constructing and deploying a contract, 2)
constructing the corresponding source code, 3) trying to bypass
or utilize specific options in modules, and 4) observing whether
the verification is passed.
Take determining how metadata is identified in Etherscan
as an example. The biggest challenge for identifying metadata
lies in factory contracts, which have multiple pieces of metadata. Thus, we
firstly deploy several factory contracts with different solc
versions. Then, we ask verifications for them by providing
the corresponding source code files. The results show that the
verification cannot pass due to the existence of unidentified
metadata when solc is less than or equal to 0.4, but the opposite
occurs when solc is greater than 0.4. Thus, we infer that
when solc ≤ 0.4, Etherscan adopts the same method as
Sourcify’s, i.e., regex matching. However, we cannot conclude
which option is adopted when solc is greater than 0.4. All
speculated options of these modules are reflected in Table I.
Detailed processes and specific contracts used for black-box
testing are released in our artifact: link.
V. SECURITY PROPERTIES
According to the two possible consequences (see §III-B),
two security properties (P) should be guaranteed to protect
the normal functionalities of source code verification services,
which are shown in Table II.
Unrestorability (P1). Restoration refers to a kind of behavior
where a user somehow obtains source code via reverse engineering or by exploiting compiler features. However, this source
code is definitely not the original one composed by developers,
but can bypass the source code verification. Such a behavior
is intuitively unexpected for both users and developers as it cannot reflect the original intention of the deployed contract.
Analogous to the preimage resistance of hash functions [68],
we propose unrestorability, trying to avoid such behaviors. To
meet this property, verifiers should be robust to the following
threats. First, threats to the compilation module. Features
adopted by M3 may violate this security property. For example, both verbatim function [51] and loose inline assembly [53]
in solc can assist developers to designate the compiled opcode
sequence. Second, threats to the comparison module. During
the comparison stage, as we mentioned in §IV-B4, some
options are enabled to enhance the user-friendliness, like only
considering the prefix of the retrieved bytecode. Such options
can be utilized by attackers to hamper the unrestorability.
Consistency (P2). Consistency refers to a property that the
semantics between the source code files that passed the verification and the targeted on-chain contract should be identical.
Thus, compared to the semantics of the targeted on-chain
contract, there should be no redundancy or absence in the semantics of the uploaded source code files. Intuitively, two types
of inconsistency can emerge. On the one hand, the semantics
of the provided source code files is absent. For example, in
M3, the implementation of functions in linked libraries is not
included in the caller. In addition, before the comparison in
M4, all metadata should be identified and removed. However,
if a piece of runtime logic is mistakenly marked as metadata,
the corresponding semantics will be overlooked. Consequently,
the absence in the semantics of source code files results in more
code being executed than users think. On the other hand, the
semantics of the provided source code files is redundant. In
M6, the consistency is reflected by the unambiguity of the
displayed information. For example, if attackers can display
redundant or misleading information, like multiple contracts
with an identical name, the unambiguity is violated. The
downstream applications based on the data exposed by M6
will be influenced. Moreover, such an inconsistency can also
emerge after the verification. Specifically, after a successful
verification, the uploaded source code files are stored in
M5. The stored ones may be overwritten unexpectedly or
intentionally by tampering file systems.
VI. VULNERABILITY DETAILS
Against two security properties, we found eight types of
vulnerabilities, whose relations are shown in Fig. 6. In this
section, we firstly organize them according to their consequences (A1 and A2). Then, we present the technical details
of detection, exploitation, and mitigation for each of them.
Three types of vulnerabilities (R1 to R3) are categorized
to this consequence. The impact of exploiting them depends
on the scale of unverified contracts and the practicality of the
exploits. As we introduced in §II-B, less than 1%, 0.7% and
0.3% of contracts have been manually verified in Etherscan,
Sourcify, and Blockscout, respectively. Thus, if the corresponding exploits are used extensively, the impact is
twofold: 1) preventing original developers from
providing actual original source code files, and 2) embedding
malicious or scam information within the provided source code
files to discredit victims. Specifically, as R1 to R3 violate
P1, attackers are able to construct a contract according to
the victim’s bytecode to bypass the source code verification.
In the same source code file, attackers can compose another
contract that embeds phishing or scam information, as
illustrated in Fig. 4, to discredit the victim contract. The latter
contract can even be placed in a more visible position. In the
following, we mainly focus on how to forge source code files,
instead of the concrete way to spread malicious information
and commit fraud.

1) Exploitable Compiler Features (R1): In M3, attackers
can abuse features of solc to break P1. Specifically, such
features include YUL [56] (an intermediate language) and loose
inline assembly [53], both of which can be inlined in Solidity.
For example, taking advantage of the verbatim_0i_0o
function in YUL, users can directly designate the compiled
opcode sequence. Though according to the implementation of
solc, an invalid opcode fe will be appended after the opcode
sequence, it still reflects how users can flexibly control the
deployed bytecode via abusing compiler features. Likewise,
when solc ≤ 0.4, users can also adopt loose inline assembly
to generate arbitrary opcode sequences.
    Detection. To detect if R1 is exploitable in three verifiers, we
    construct a set of contracts. Each of them is composed of a
    single fallback function, where it embeds a verbatim function
    or a piece of loose inline assembly code. Fallback function
    has an empty function signature, meaning that the compiled
    bytecode is not wrapped by additional opcodes, like function
    identifiers [52]. Through observing the difference between the
    opcode sequence declared in source code and the consequently
compiled one, we can conclude whether R1 is exploitable in verifiers.
PoC. Consider a deployed contract named A, whose runtime
    code is 0x608060.... We can compose a contract A_, as
    shown in Listing 1, which only contains a fallback function
    (L3). To exploit R1 in Etherscan, we firstly translate the
    victim’s, i.e., A’s, runtime code (without metadata) into a piece
    of loose inline assembly in the fallback function of A_, as
    shown from L4 to L9. Then, we compile A_ locally and
    manually replace the metadata of A_ to the A’s. In this way,
A_ and A are identical in terms of runtime code. Thus, we deploy A_ and verify it by providing Listing 1, which should certainly be an exact match.⁴ Consequently, due to the shortcut in Etherscan (see §IV-C), A is automatically labeled as verified
    with the source code of A_. As Sourcify allows only verifying
    the runtime code (see §IV-B4), it is easier to exploit it. We
    can directly request verification for A with A_ in Listing 1.
    After the metadata is removed, they certainly have an identical
    runtime code to pass the verification. Since Blockscout inherits
    the results of Sourcify, Blockscout is also exploitable.
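To make the translation step of this PoC concrete, the following Python sketch (our illustration, not the authors' tooling) converts a runtime-code hex string into solc ≤ 0.4 loose inline assembly. The opcode table is deliberately partial, and the zero-led PUSH restriction is surfaced as an error:

```python
# Illustrative sketch (not the authors' tooling): translate a runtime-code
# hex string into solc <= 0.4 loose inline assembly, as the R1 PoC requires.
# The opcode table is deliberately partial; a real exploit needs the full
# EVM opcode set.
OPCODES = {0x00: "stop", 0x01: "add", 0x52: "mstore", 0xf3: "return"}

def to_loose_assembly(runtime_hex):
    code = bytes.fromhex(runtime_hex)
    lines, i = [], 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:                  # PUSH1..PUSH32
            n = op - 0x5F
            operand = code[i + 1:i + 1 + n]
            if operand[0] == 0:                 # solc would re-encode a
                raise ValueError("zero-led PUSH operand")  # zero-led operand
            lines.append("0x" + operand.hex())  # a hex literal emits a PUSH
            i += 1 + n
        else:
            lines.append(OPCODES[op])           # KeyError: opcode not in table
            i += 1
    return "assembly { " + " ".join(lines) + " }"
```

For example, the two-instruction sequence `6080 52` (PUSH1 0x80; MSTORE) is rendered as `assembly { 0x80 mstore }`.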
Exploitation Conditions. Successfully abusing R1 is limited by some conditions. First, only valid opcodes can be written in loose inline assembly, meaning that the PoC has to be compiled with solc ≤ 0.4 and the target contract cannot have multiple pieces of metadata. Specifically, if solc is greater than 0.4, it automatically appends an invalid opcode fe before the trailing metadata. And if there are multiple pieces of metadata, there is no guarantee that each byte of such a random hash string always corresponds to a valid opcode. Second, operands of PUSH cannot begin with a zero byte. In Ethereum, there are 32 types of PUSH, which can push 1 to 32 bytes, respectively. However, if we intend to encode opcodes like PUSH2 0001 in loose inline assembly, solc will automatically optimize it as PUSH1 01, which changes the final opcode sequence.
    Mitigation. It is impractical to prohibit the use of such
    features in the provided source code, because they can provide
    several advantages, like supporting functionalities unavailable
    in Solidity and saving gas, which are proven by Liao et al. [38].
    Therefore, to avoid the abuse, here are two possible solutions
for verifiers. On the one hand, raise the threshold for requesting contract verification, e.g., all requesters must follow a know-your-customer (KYC) policy [67] or be authorized in advance. On the other hand, if these features are found to be heavily used by solc, verifiers should allow source code replacement for such contracts or give users a clear warning.
  2. Unchecked Simulating (R2): Simulating is an option in
    M3, which is adopted by Sourcify in verifying runtime code
    to handle immutable variables. Specifically, the simulator takes
    whatever it receives as a creation code, executes it locally, and
    regards the returned bytecode as the corresponding runtime
    code. Such a process may break P1 if attackers construct a
    wrapper (not a valid creation code), which returns the victim’s
    runtime code directly. Generally, the pointer of the runtime
    code is acquired by codecopy, an opcode located at the end
    of the creation code. Then, the pointed data, i.e., runtime code,
is returned and processed. However, the simulator does not check whether the pointer is actually produced by codecopy, nor the validity of the returned data. Thus, attackers can construct a piece of code which explicitly returns a pointer that points to the victim's runtime code before the actual codecopy.
Detection. To detect if R2 is exploitable, we deliberately insert some functions that can explicitly change the control flow in the constructor, like call and return. By conducting source code verification on these contracts and observing whether the control flow of the compiled bytecode changes, we can conclude if simulating is adopted in M3.

1 contract A_ {
2   constructor() public {
3     // Assign victim A's runtime code to bytecode
4     bytes memory bytecode = hex'608060...';
5     assembly {
6       return (add(bytecode, 0x20), mload(bytecode))
7     }
8   }
9 }
Listing 2: The PoC of R2.

⁴Note that the modified metadata has no influence on the verification result, because it is removed before comparing, see §IV-B4.
    PoC. Suppose a contract A is deployed on-chain, and we can
    construct a contract A_ as shown in Listing 2. At L4, it assigns
    the runtime code of A to the variable bytecode. Then, at L6,
    a return is explicitly declared in YUL, which is located before
    the actual codecopy opcode. Thus, the simulator will execute
    the return at L6 before running into the actual codecopy,
    leading to an early return of a pointer, which points to the
    bytecode. Considering that such an attack can accomplish
    the exact match (as metadata can also be directly encoded in
    bytecode), we can verify arbitrary contracts that are not
    exactly matched yet.
Mitigation. To mitigate the issue, we highly recommend that verifiers not adopt simulating during the compilation stage. This
    is because simulating has to rely on EVM, which inevitably
    introduces new features that may be improperly utilized or
    abused. After we made the suggestion, Sourcify officially
    abandoned simulating and started to adopt the direct replacing
    strategy that we mentioned in §IV-B3 [60].
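As a sketch of that direct replacing strategy (our illustration, assuming the immutable byte ranges are available, e.g., from solc's immutableReferences output), the verifier can mask the immutable slots on both sides and compare the rest byte-for-byte, with no EVM execution involved:

```python
# Sketch of "direct replacing" instead of simulating: mask the byte ranges
# that hold immutable values before comparing the locally compiled runtime
# code with the on-chain one. immutable_refs is an assumed list of
# (offset, length) pairs, standing in for solc's immutableReferences output.
def matches_ignoring_immutables(local, onchain, immutable_refs):
    if len(local) != len(onchain):
        return False
    masked_local, masked_chain = bytearray(local), bytearray(onchain)
    for offset, length in immutable_refs:
        masked_local[offset:offset + length] = b"\x00" * length
        masked_chain[offset:offset + length] = b"\x00" * length
    return masked_local == masked_chain
```

Because no bytecode is ever executed, a wrapper that returns early (as in Listing 2) gains nothing here.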
  3. Incomplete Bytecode Validation (R3): Considering the
    user-friendliness, verifiers may not ask users to provide the
    concrete values of parameters that are appended after the bytecode. Such a strategy can be abused by attackers, which breaks
    P1. Specifically, assume a constructor requires a parameter,
    whose value is p. It is always a mismatch if directly comparing
    locally compiled bytecode, denoted as byteloc, to the input
    field of the creation transaction, denoted as bytechain||p, due
    to the absence of p. Thus, verifiers may adopt prefix matching,
    i.e., verifying if byteloc is the prefix of bytechain||p. If it is,
    they consider the verification passes. However, this strategy
    can be abused if a comprehensive examination misses. For
    example, a prefix of byteloc, which is even not a valid
    bytecode, may also be the prefix of bytechain||p. Furthermore,
    if verifiers do not take an empty string into consideration, an
    empty string can also be the prefix of bytechain||p.
    Detection. To detect if R3 is exploitable in verifiers, we
    intentionally design a set of contracts, which can generate a
    prefix of the victim contract. This process can be completed
    through utilizing loose inline assembly (see §VI-A1) or by
    composing abstract contracts. Specifically, abstract contracts
    can be compiled to an empty bytecode. In other words, an
    abstract contract can be regarded as a prefix of any contract.
    If such contracts can pass the verification, R3 is exploitable.
1 abstract contract WrappedToken{
2   mapping (address=>uint256) public balance;
3   function withdraw (uint256 value) public {
4     balance[msg.sender] -= value;
5     payable(msg.sender).transfer(value);
6   }
7 }
Listing 3: The PoC of R3.

PoC. Listing 3 illustrates a piece of source code, which is wrapped by an abstract contract. Abstract contracts have the following two characteristics: 1) the
    compiled bytecode is empty, and 2) functions within them will
    be recognized as ABI functions [50], just like the functions
    in normal smart contracts. Because of the problematic implementation of verifiers (e.g., Sourcify [59]), such an empty
    bytecode can be regarded as the prefix of any bytecode. To
    this end, users who browse this contract will notice that a
    function called withdraw is listed as an ABI function, but
    it has an apparent integer overflow vulnerability at L4. The
    actual deployed contract is discredited.
Mitigation. To avoid this exploitation, verifiers should first always perform a non-null check on byteloc, i.e., an empty bytecode compiled locally is unacceptable. Further, verifiers should ensure that the arguments part can always be successfully identified. This can be done by verifying whether the suffix bytes remaining after removing the matched prefix can be parsed as legal parameters according to the corresponding ABI file. If they can, it means that byteloc equals bytechain.
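The two checks above can be sketched as follows (our simplification: a full implementation would decode the suffix against the actual ABI types, which is approximated here as n static 32-byte words):

```python
# Sketch of the R3 mitigation: reject empty local bytecode, require the
# prefix match, and require that the leftover suffix decodes as the
# constructor arguments. The ABI check is simplified to "n static 32-byte
# words", standing in for a full ABI decoder.
def safe_prefix_verify(byte_loc, byte_chain_with_args, n_static_args):
    if len(byte_loc) == 0:                      # non-null check defeats the
        return False                            # abstract-contract trick
    if not byte_chain_with_args.startswith(byte_loc):
        return False
    suffix = byte_chain_with_args[len(byte_loc):]
    return len(suffix) == 32 * n_static_args    # args must parse completely
```

The empty-bytecode case (the abstract contract in Listing 3) and the truncated-prefix case both fail this check.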
    B. Vulnerabilities Leading to A2
Several types of exploitable vulnerabilities can lead to A2, i.e., conducting source scams against inexperienced users. They can cause extremely severe impact.
Specifically, the verification requesters who exploit these vulnerabilities (R4 to R8) are malicious developers themselves. They first deploy a malicious contract, and verify
    it by another piece of seemingly harmless source code. The
    verification between these two mismatched contracts can be
    bypassed by exploiting these vulnerabilities, where a concrete
    instance is illustrated as the motivating example (see §III-A).
    Similarly to the introduction on vulnerabilities leading to A1
    (see §VI-A), in this section, we deliberately ignore details
    about how to construct and deploy malicious contracts. In the
    following, we mainly focus on how to bypass the verification,
    i.e., importing inconsistency to break P2.
  4. Replaceable On-chain Contracts (R4): The status and
    enabled features of the client node that verifiers depend on can
    be exploited by attackers to replace already on-chain contracts,
    which breaks P2. For instance, client nodes may suffer the
    chain reorganization issue [41], which is a normal phenomenon
    for blockchains as forking is inevitable. According to the
    statistics [27], this issue happens around 5 times an hour in
Ethereum. This requires service providers on blockchain platforms to proactively avoid the resulting negative impact, e.g., decentralized exchanges only consider the transactions located below certain block heights as valid ones [15]. For verifiers,
    if unconfirmed on-chain smart contracts are taken as targets
    for the source code verification, the validity of their results
    may be hampered. Additionally, the EVM version adopted by
    the verifier-dependent client node may also lead to abuses.
    For example, Ethereum has introduced a new opcode, named
    create2 [21]. It was originally designed to deploy a smart
    contract on a predetermined address for developers. However,
    this opcode can be abused to deploy another piece of runtime
    code on an existing address, which may invalidate the original
    source code verification result on the same address [24].
Detection. Of the above two examples, a reasonable method to detect the former is to deploy multiple client nodes and initiate multiple transactions simultaneously to force state forking. However, such testing of the consensus algorithm is out of the scope of this work. Thus, we mainly focus on the exploitability of the latter, i.e., whether verifiers can be cheated by redeploying contracts at an existing address through create2. Specifically, if the creation code is unchanged, the address of the deployed contract also remains unchanged through create2. Taking advantage of this feature, we compose a contract as a deployer, which has a fixed creation code and takes the to-be-deployed runtime code as an argument. Intuitively, if an address becomes available again and can be redeployed after the original contract is self-destructed, the verified source code may still correspond to the original one, indicating verifiers are exploitable via R4.
1  contract Deployer {
2    bytes public deployBytecode;
3    address public deployedAddr;
4    function deploy(bytes memory code) public {
5      deployBytecode = code;
6      address target;
7      bytes memory dumperBytecode = hex"{Dumper contract's creation code}";
8      assembly {
9        target := create2(callvalue, add(0x20, dumperBytecode), mload(dumperBytecode), 0x11)
10     }
11     deployedAddr = target;}
12 }
13 contract Dumper {
14   constructor() public {
15     Deployer dp = Deployer(msg.sender);
16     bytes memory bytecode = dp.deployBytecode();
17     assembly {
18       return (add(bytecode, 0x20), mload(bytecode))
19     }}
20 }
Listing 4: The toolchain for conducting the PoC of R4.
    PoC. Fig. 7 and Listing 4 illustrate how a fraud can be committed through exploiting R4. In the preparation stage, i.e., step
    1, we first compile the Dumper declared at L13 in Listing 4,
    extract its creation code and assign it to dumperBytecode
    at L7. Thus, the contract Deployer at L1 can be seen as a
    general deployer. Any runtime code passed through code at
    L4 can be deployed at an unchanged address. At step 2, we
    compile a harmless contract A, and call the function deploy
    of Deployer at L4 with code as A’s runtime code. The
    create2 at L9 deploys this contract on an address. Then, at
    step 3, we provide the corresponding source code files of A
    and ask for a verification. After the verification completes, we
call the function selfdestruct in A to make this address available, as shown in step 4. We again compose another evil contract A_, and pass its runtime code to deploy. Due to the characteristic of create2, such an evil contract is also deployed at the identical address to A. Consequently, the verification results of A are mistakenly bound to the actually executed contract, i.e., A_.
    Mitigation. To prevent such an attack, verifiers should ensure
    the immutability of the contracts being verified and make
    necessary updates simultaneously. On the one hand, for those
    contracts that can still be rolled back, e.g., through a chain
    reorganization, it is best for verifiers not to provide services
to them. On the other hand, for those contracts in finalized blocks, when verifiers provide the source code file, they should also highlight the difference between the current bytecode and the one at the time the verification was requested. If they are different, it means that the deployed contract has been self-destructed or re-initialized by create2 on the same address.
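A minimal sketch of this re-checking step (our illustration: get_code stands in for the verifier's client-node query, and SHA-256 for whatever digest the verifier chooses to store):

```python
# Sketch of the R4 mitigation: record a digest of the runtime code at
# verification time and re-check it whenever the source is served, flagging
# contracts that were self-destructed or re-deployed via create2.
# get_code is an assumed callable returning the current on-chain runtime code.
import hashlib

def record_verification(get_code, address):
    return hashlib.sha256(get_code(address)).hexdigest()

def still_valid(get_code, address, recorded_digest):
    # False once the code at this address differs from the verified snapshot
    return hashlib.sha256(get_code(address)).hexdigest() == recorded_digest
```

A create2 redeployment changes the runtime code at the address, so the stale verification badge can be withdrawn or visibly flagged.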
  5. Unverified Linked Libraries (R5): If the implementation
    of invoked linked libraries is not considered by verifiers, it
    intuitively violates P2. Specifically, in order to support modular design when developing DApps, developers can embed
    part of the contract logic into a library contract. Two kinds of
    libraries exist, i.e., linked library and embedded library, whose
distinctions are subtle [17]. On the syntactic level, the only distinction is that they are specified by public and internal, respectively. However, they are encoded in different ways.
    Specifically, a call to functions in linked libraries is encoded as
    an opcode delegatecall, which takes the invoked library’s
    address and the function signature of the callee as arguments.
    This means that the callee linked library is actually located at
    another address, and its logic does not appear in the caller’s
    bytecode at all. Conversely, as for embedded libraries, the
    logic of callee is directly included into the caller’s. Such
    subtle differences may not be taken into account by users. In
    other words, though verification requesters provide the source
    code of the caller contract and the invoked linked libraries
    together, verifiers only consider the function signatures instead
    of the implementation of the functions in the provided linked
libraries. Because there are no clear warnings about this issue, users may be deceived, as the implementation of the invoked linked library is not verified at all.
Detection. To detect if a verifier is exploitable via R5, we first deploy a contract that invokes a linked library which poses malicious behaviors. Then, we verify the contract while uploading a library with an identical function signature but a different implementation. If the verification passes, it means malicious users can commit fraud through exploiting R5.

1  pragma solidity ^0.8.0;
2  contract A{ // caller
3    uint totalsupply = 0;
4    function is_zero() public view returns(bool){
5      L.check(totalsupply);
6      // compiled to: L.delegatecall(abi("check(uint)", totalsupply))
7      return true;
8    }
9  }
10 library L{ // linked library
11   function check(uint balance) public view{
12     require(balance == 0);
13   }
14 }
Listing 5: The PoC of R5.

PoC. Listing 5 illustrates a call to a linked library. We can see
    from L5 that L is a linked library, and check is the callee
    function. This is equivalent to the statement at L6, i.e., only
    the address of L and the signature of check is considered
    by the caller contract. To this end, we can commit fraud by
utilizing this feature. Specifically, we first update Listing 5 by replacing L12 with selfdestruct(msg.sender), which can destroy the calling contract. Then, we compile and deploy
    this malicious contract on-chain. To conduct exploitation, we
    provide the seemingly normal Listing 5 at the source code
    verification, which should pass the verification because the
    contract A is unchanged, and the signature of the callee
    function check in the linked library also keeps unaltered.
    Consequently, the malicious intention is covered.
    Mitigation. If the invoked linked libraries cannot be verified
    recursively, we urge verifiers to explicitly warn users that
    the implementation of linked libraries is unreliable. Moreover,
users need to carry out further checks themselves. In other words, only when the main contract and the invoked linked libraries are both verified, and there are no security issues in their implementations, can users trust the main contract.
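The recursive check can be sketched as below (our illustration: compile_fn and get_code are injected stand-ins for the verifier's compiler and client node, and sources maps each address to its uploaded source file):

```python
# Sketch of the recursive check urged for R5: a caller contract is trusted
# only if its own bytecode matches AND every linked library address resolves
# to bytecode matching the uploaded library source. compile_fn and get_code
# are assumed stand-ins for the verifier's compiler and client node.
def verify_with_libraries(main_addr, lib_addrs, sources, compile_fn, get_code):
    if compile_fn(sources[main_addr]) != get_code(main_addr):
        return False
    # A signature-only check would stop here; R5 demands byte-level
    # equality for every linked library as well.
    return all(compile_fn(sources[a]) == get_code(a) for a in lib_addrs)
```

With this check, uploading a benign library source while a malicious one sits at the linked address fails the verification instead of silently passing.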
  6. Mislabeled Bytecode (R6): When performing the comparison in M4, verifiers flag and intentionally skip certain
    fields in order to avoid unnecessary verification failures. However, such an intentional skip may lead to the inconsistency
    issue. Two types of fields lie in this scope, i.e., metadata and
    linked library placeholders.
    The necessity of removing all metadata before M4 is
    detailed in §IV-B4. However, problematic metadata labeling
    and extracting may leave metadata behind or extract innocent bytes as metadata, leading to a failed verification or
    unexpected verification bypass, respectively. Specifically, the
    current pattern of metadata is 53-bytes long, including a 34-
    bytes IPFS hash, a 3-bytes solc version, and a 16-bytes magic
    number [54]. To extract all metadata, as we mentioned in
    M4, two ways are adopted by current verifiers, i.e., regex
    matching and differential extracting. Specifically, no matter
    where a metadata locates, regex matching is supposed to be
    effective and efficient. However, in 2021, a white hat exploited
    the buggy regex matching to intentionally mark a piece of
    runtime code as metadata, where he hid a backdoor [49]. It
    proves the unreliability of this strategy. Thus, Blockscout has
    proposed the idea of differential extracting. According to its
    implementation [5], it can be summarized as follows:
• According to the configuration items, it compiles the given source code files, whose output is denoted as c.
• Blockscout intentionally adds a pre-defined and useless configuration item to make the compiled bytecode change, denoted as c′. The newly added item imports a linked library located in a file named SOME_TEXT_USED_AS_FILENAME [7].
• By comparing c and c′, Blockscout can identify an index, denoted as i, where the first difference occurs. Based on i, it looks backward and forward to find a byte string that follows the metadata pattern.
• The above indexing and metadata identifying process repeats till all metadata are identified.
Linked library placeholders should also be removed before
    conducting comparison. Specifically, during the compilation,
    solc firstly replaces each invoked linked library address with a
    40-bytes placeholder. If users require verification on Etherscan
    or Blockscout, both verifiers ask users to provide concrete
    addresses. Unlike them, Sourcify provides a more user-friendly
    solution, which is detailed as follows:
    • Sourcify identifies the first appeared placeholder,
    records its offset, and extracts the following 40-bytes
    string as a regex pattern.
    • Sourcify then locates the bytes from on-chain bytecode according to the recorded offset, regarding them
    as the actual invoked library address.
    • Among all placeholders, it matches the ones according
    to the regex pattern (1st step), and replaces them with
    the on-chain bytes (2nd step).
    • The above process repeats till no placeholders exist.
This implementation is safe when solc is greater than 0.4, where the placeholder is a 34-byte hash with a fixed prefix and suffix. However, the situation turns opposite when solc is 0.4, where the placeholder is a string like:
__filePath:libName__
where filePath and libName are strings explicitly declared in the source code. Because solc does not require a specific format for filePath, e.g., ending with .sol, attackers can construct a value to abuse Sourcify's regex matching mechanism.
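A regex-free alternative can be sketched as follows (our illustration; the placeholder is treated as an opaque fixed-length byte string, so regex metacharacters smuggled into filePath have no special meaning):

```python
# Sketch of literal placeholder handling: treat each solc 0.4 placeholder
# (__filePath:libName__, padded to a fixed width) as an opaque byte string
# and substitute the real library address copied from the on-chain bytecode
# at the same offset. No regex is involved, so a crafted filePath is inert.
def replace_placeholders_literally(local, onchain, placeholder):
    out = bytearray(local)
    start = 0
    while True:
        i = bytes(out).find(placeholder, start)   # literal search only
        if i == -1:
            return bytes(out)
        # copy the actual invoked library address from the on-chain code
        out[i:i + len(placeholder)] = onchain[i:i + len(placeholder)]
        start = i + len(placeholder)
```

Since only exact occurrences of the placeholder are touched, bytes such as the owner address in PoC #2 can never be matched and overwritten.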
Detection. Instead of exploiting the regex matching mechanism for identifying metadata like the previous attack [49], we focus on whether the differential labeling proposed by Blockscout is vulnerable. To this end, we compose a contract in which we deliberately invoke the library imported by Blockscout. Then, between M3 and M4, we compare the compilation results of the Blockscout-generated one and the normal one. If some runtime logic is missed, it means that part of the runtime logic is mislabeled, violating P2. PoC #1 illustrates an example.
As for detecting the vulnerability in identifying linked library placeholders, we construct a contract whose filePath and libName are deliberately set to strings following regex grammar. Then, after M3 of Sourcify, we observe whether the forged filePath and libName can hit any valid runtime logic and replace it with on-chain bytecode. If they can, it means attackers can construct malicious contracts that arbitrarily label runtime logic to intentionally remove it before the comparison in M4. We will demonstrate it in the following PoC #2.
    PoC #1. Fig. 8 illustrates how to exploit the differential
    extracting. Specifically, we compose a piece of source code,
    as shown in the top of Fig. 8. We deliberately import a file
    named SOME_TEXT_USED_AS_FILENAME, and assign the
    library (L_), which is defined in the imported file, to a variable.
    At step 1 & 2, we compile the source code, and locally
    update an opcode near the library placeholder to FF, i.e.,
    selfdestruct. Then, we deploy such a modified bytecode
    on-chain. After that, we request a source code verification
    through Blockscout. Because the library L_ is defined in the
    imported strange file, which is inserted by Blockscout when
    conducting differential extracting, the bytes at that location are unexpectedly updated. According to the implementation
    adopted by Blockscout we mentioned before, it should mark
    the L_ nearby area, including the malicious FF, as metadata
    and remove it, which is normal runtime logic actually. Because
other parts of the bytecode are still identical to the on-chain ones, it can pass the verification, while users are unaware that there is a dangerous selfdestruct hidden in the deployed contract.
1  pragma solidity ^0.4.0;
2  import "./.{37}|2{40}|"; // file path
3  contract A {
4    address constant public owner = address(0x222..22);
5    uint public b;
6    function c() public{
7      b = foo.bar();
8    }}
9
10 library foo{ // lib name
11   function bar() public pure returns(uint) {
12     return 1;
13   }}
Listing 6: The PoC #2 of R6.

PoC #2. Listing 6 illustrates a malicious smart contract. L2 and L10 correspond to filePath and libName, respectively. Thus, a placeholder is generated as⁵:
__.{37}|2{40}|:foo_...__
To this end, Sourcify takes the above placeholder as a regex pattern. Specifically, the .{37} part can match the current placeholder itself, while the 2{40} part can match the address declared at L4. Suppose this placeholder is located at offset o. We can place any 40-byte sequence at o in the on-chain contract, like an address or even a backdoor. Therefore, through the constructed regex pattern, Sourcify mistakenly replaces the address at L4 with a predefined byte sequence.
Mitigation. Generally speaking, when dealing with such non-naive fields, like metadata and linked library placeholders, extra operations need to be conducted to avoid unnecessary verification failures. Following Occam's razor, verifiers should prioritize reliability instead of introducing unnecessary steps, which is reflected in the mitigations against these two exploitations.
Specifically, when identifying metadata, differential extracting is better because it is conducted on the semantic level, while regex matching is only on the syntactic level and proven unsafe [49]. However, Blockscout unnecessarily introduces a linked library that can be exploited. Therefore, we urge verifiers to adopt the simplest way of differential extracting, like adding a meaningless space character after the source code, which can update the metadata while keeping runtime logic unchanged. As for identifying linked library placeholders, Sourcify can avoid this attack by enumerating all placeholders one by one instead of adopting regex matching. Because verifying a contract is basically a one-shot process, prioritizing efficiency barely brings extra advantages.
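The "meaningless space" variant of differential extracting can be sketched as below (our illustration; compile_fn is an assumed stand-in for invoking solc, and the fake compiler used in the test simply appends a source hash as stand-in metadata):

```python
# Sketch of minimal differential extracting: compile the source twice, once
# with a trailing space appended, and treat the bytes that differ as (part
# of) the metadata region. Appending whitespace changes the metadata hash
# but never the runtime logic, so nothing is imported into the contract.
def locate_metadata(source, compile_fn):
    a, b = compile_fn(source), compile_fn(source + " ")
    assert len(a) == len(b), "runtime logic changed; extraction is unsafe"
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    # return the half-open byte span covering all differences, if any
    return (min(diffs), max(diffs) + 1) if diffs else None
```

Unlike Blockscout's library-import trick, no placeholder is ever inserted near real runtime logic, so there is nothing for an attacker to collide with.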
7. Path Traversal Risk (R7): Similar to R4, where on-chain contracts can be replaced by exploiting features in the EVM, existing contracts' verified source files can also be overwritten by exploiting R7.
    Specifically, solc allows users to pack source code files and
    their corresponding paths in a single JSON file, where the
    paths can be designated arbitrarily. To this end, if verifiers do
    not strictly validate the paths, it is very likely that attackers
    can abuse this feature to conduct an arbitrary path traversal.
    Detection. To detect if a verifier is vulnerable to R7, we
    construct several JSON files, within which the paths of source
    code files are declared in a malicious path traversal way, like
    deliberately adding ../ to refer to its parent directory. Then,
    we try to access the directory to see if the source code files
    are uploaded to that invalid directory.
    PoC. Suppose there is a contract with backdoors named A. After the deployment and source code verification, let us assume
    its source code is stored in 0x12..fe/source/A.sol.
To fool users, we can deploy another contract without backdoors, named A_ for instance. However, when performing the source code verification, we claim the path of A_ is:
    ../../0x12..fe/source/A.sol. To this end, the verifier stores the source code of A_ at the place where it should
    be used to store A. When users examine the source code files
    of A, they are fooled by the implementation of A_.
    Mitigation. Path traversal is a well-studied vulnerability in
    traditional scenarios, e.g., web applications [30]. Therefore,
    we strongly suggest verifiers to adopt a comprehensive sanity
    check on the given paths of source code files by following the
    best practices that are widely-adopted [43], [46].
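Following those practices, the containment check can be sketched as below (our illustration; storage_root is a hypothetical per-contract directory, analogous to the per-address source directory in the PoC):

```python
# Sketch of a path sanity check for R7: resolve each user-supplied source
# path against the per-contract storage root and reject anything that
# escapes it (e.g. via "../"). This is the standard containment pattern
# for path-traversal defenses.
import os

def safe_store_path(storage_root, user_path):
    root = os.path.realpath(storage_root)
    target = os.path.realpath(os.path.join(root, user_path))
    if os.path.commonpath([root, target]) != root:
        raise ValueError("path traversal attempt: %r" % user_path)
    return target
```

With this check in place, a path such as the "../../..." one in the PoC resolves outside the requesting contract's directory and is rejected before anything is written.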
8. Inadequate Information Disclosure (R8): The inconsistency can be introduced not only by real implementation issues (like R4 to R7), but also by users' misunderstanding. Specifically, if a source code verification passes, the verifier should display the verified information to the public. Taking Etherscan as an example (see Fig. 4), it illustrates lots of meta information, but the contract name does not contain the directory it belongs to. Such inadequate information disclosure may mislead users and be abused. For example, attackers can put the main contract and a fake one with an identical name under different paths. Because only the contract name is displayed, users may be misled into taking the fake one as the verified contract, which can negatively impact the credit of the project.
    Detection. To detect R8, we can upload multiple contracts
    with an identical name to verifiers. Then, we observe if users
    can distinguish them. If they cannot, we conclude that this
    verifier is exploitable to R8.
PoC. We develop two contracts, both named erc721.sol, following a simplified ERC-721 standard [20] to issue tokens. They are put under different directories, e.g., test/erc721.sol and main/erc721.sol. The only difference between them is the initial value of totalSupply, i.e., the field referring to the maximum amount of tokens allowed to be minted, where the value of totalSupply under the test directory is much higher than the one under the main directory. We first deploy test/erc721.sol, and verify it by uploading both files in a JSON format. To fool users further, we can even put another config.sol under the main directory. Consequently, users may be fooled by the seemingly limited edition shown in main/erc721.sol.
Mitigation. Verifiers should ensure that the disclosed information is consistent with what is adopted by the compiler. If some fields cannot be displayed at all or cannot be displayed completely, verifiers should at least guarantee that the displayed information is unambiguous.
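One way to keep the display unambiguous (our sketch, not any verifier's actual UI logic) is to qualify duplicated contract names with their file path:

```python
# Sketch of the R8 mitigation: when two verified contracts share a name,
# fall back to the fully qualified "path:Name" form so the displayed
# information stays unambiguous.
from collections import Counter

def display_names(contracts):
    # contracts: list of (file_path, contract_name) pairs from the verifier
    counts = Counter(name for _, name in contracts)
    return [name if counts[name] == 1 else "%s:%s" % (path, name)
            for path, name in contracts]
```

Under this rule, the test/erc721.sol and main/erc721.sol pair from the PoC can no longer be confused for one another.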
    VII. EVALUATION
    A. Experimental Setup & Ethical Considerations
    In this work, we target the most popular Ethereum smart
    contract verification services, including Etherscan, Sourcify,
    and Blockscout. For all these three verifiers, they provide the
    verification services on both Ethereum mainnet and testnet.
For ethical considerations, all evaluations are conducted on the Goerli testnet, which is independent of the Ethereum mainnet.
We have carefully designed methods to detect the exploitable vulnerabilities. Specifically, for the A1-related vulnerabilities, i.e., R1 to R3, we randomly sample 10 widely-used contracts with source code from the mainnet as victim contracts, and deploy them on the testnet. After that, according
    to the victim’s bytecode, we construct PoCs to try to pass the
    verification. Note that a successful verification does not imply
    the A1 consequence, as we need to ensure that the forged
    source code cannot be replaced. Therefore, we further request
    verifications against these victim contracts with their original
    source code. If the re-verification fails, we conclude that the
testing verifier is exploitable. For the A2-related vulnerabilities, we first construct some contracts with backdoors, e.g., performing selfdestruct or transferring funds to other users,
    compile and deploy them on the testnet. Then, according
    to PoCs raised from R4 to R8, we try to construct source
    code without such malicious behaviors, and bypass the source
    code verification. If it succeeds and no warnings raise, we TABLE III: Overall vulnerability detection results. For each
    vulnerability, – and ✗ refer to infeasible (safe) and exploitable,
    respectively. * refers to the official teams have confirmed our
    reported vulnerabilities, and ✓ indicates it has been patched
    after our timely disclosure.
    Consequence Exploitable
    Vulnerailities Etherscan Sourcify Blockscout
    A1
    R1 ✗ ✗* ✗*
    R2 – ✗* (✓) ✗* (✓)
    R3 – ✗* (✓) ✗* (✓)
    A2
    R4 ✗ ✗* ✗*
    R5 ✗ ✗* ✗* (✓)
    R6 – ✗* (✓) ✗* (✓)
    R7 – ✗* (✓) ✗* (✓)
    R8 ✗ – ✗* (✓)
    can conclude that the verifier is exploitable in terms of the
    corresponding vulnerability.
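For illustration, the A1 detection procedure (submit forged source, then attempt re-verification with the original) can be sketched as a simple decision routine; the verifier interface below is a hypothetical stand-in, not the API of any studied service:

```python
# Sketch of the A1 exploitability check against a hypothetical verifier
# interface: a verifier is deemed exploitable only if the forged source
# passes AND the victim's original source can no longer replace it.

def a1_exploitable(verify, address, forged_source, original_source):
    if not verify(address, forged_source):        # the PoC must pass verification
        return False
    return not verify(address, original_source)   # re-verification must then fail

# Stub verifier that accepts the first submission per address and refuses
# any replacement afterwards (the behavior that makes A1 consequential):
_store = {}
def first_wins_verifier(address, source):
    if address in _store:
        return _store[address] == source
    _store[address] = source
    return True

assert a1_exploitable(first_wins_verifier, "0xVictim", "forged.sol", "real.sol")
```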
Finally, after confirming the exploitability of vulnerabilities, we conduct timely disclosure to the impacted verifiers within 30 minutes. On the one hand, this enables a timely fix of exploitable vulnerabilities. On the other hand, it gives malicious users little chance to replay the exploitations by observing our uploaded source code files and initiated transactions.
    B. Detecting Results
Overall Result. Table III presents the overall results. Surprisingly, all these popular verifiers are vulnerable and can be abused by attackers. In total, we have uncovered 19
    exploitable vulnerabilities. Etherscan is vulnerable to 4 kinds
    of vulnerabilities, while Sourcify poses the risks of 7 types of
    vulnerabilities, and Blockscout can be exploited by all types
    of vulnerabilities.
Further Exploration. As Sourcify and Blockscout are vulnerable to most kinds of vulnerabilities, we next dive deeper into them. For Sourcify, some of its user-friendly designs introduce security issues. For example, Sourcify does not require users to provide the addresses of invoked linked libraries; instead, it tries to replace placeholders according to a regex pattern that is also embedded in the source code. Such adoption of untrusted data undermines its initial goodwill on usability. For
    Blockscout, it is vulnerable to all kinds of attacks. Recall
    the shortcut we mentioned in §IV-C, which can be seen as
    an amplifier to expand the exploitable scope of PoCs. For
    example, if a vulnerability is exploitable in Sourcify, we also
consider it exploitable in Blockscout, because the latter recognizes the results of the former. Through the
    shortcut, Blockscout inherits the malicious verification results
    of exploiting R1, R2, R3, R4 and R7 from Sourcify. But then
    again, the transparency of Blockscout and Sourcify enables a
    more insightful and timely security analysis.
    Vulnerability Patching. We disclose the vulnerabilities to
    verifiers timely. Among the 19 uncovered vulnerabilities, 15
    of them have been confirmed by the official teams, and 10
    vulnerabilities have been fixed. For Sourcify and Blockscout,
    we observed that the vulnerabilities of R2, R3, R6, and R7
    were fixed within 12 hours. However, as suggested in Table III,
we find that four kinds of vulnerabilities remain exploitable in almost all verifiers at the time of this writing (R1, R4, R5, and R8). By communicating with the official teams, we summarize the following challenges. Specifically, as for R1, loose
    inline assembly is a necessary feature that is widely adopted
    by normal contracts to improve its runtime efficiency [44].
Directly disabling it would affect the usability of verifiers. As for the other three kinds of vulnerabilities, the verification services believe that users should be aware of the scams, e.g., by paying attention to whether a smart contract invokes a malicious linked library or has a suspicious accompanying file with an identical name. Thus, they are considering adding explicit warnings for users in upcoming updates.
    C. Contracts Affected by A1-related Vulnerabilities
Method: Among all three A1-related vulnerabilities, exploiting R2 and R3 has no prerequisites. Thus, the affected smart contracts are, respectively, all smart contracts that are not yet exactly matched and all unverified ones. As for R1, due to the restriction we mentioned in §VI-A1, we compose a simple and effective SQL query to filter out all possible victim contracts. It consists of three sets of LIKE operators based on the patterns we summarized. Specifically, first, to narrow the scope down to all contracts compiled by solc version 0.4, we heuristically use the magic number a165627a7a7230, which is located in the metadata emitted by solc 0.4, as the pattern. Then, we require that the operand of each PUSH operator has no leading zeros. Last, because the exploit can only be conducted on contracts with at most a single piece of metadata, we filter out contracts with multiple pieces of metadata by using the magic number 0029a165.
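The filtering logic of this query can be approximated over raw bytecode hex; this is a hedged re-implementation in Python rather than SQL, and the PUSH-operand leading-zero restriction is omitted for brevity:

```python
# Python approximation of the SQL filter for R1 candidates: keep contracts
# whose bytecode carries the solc-0.4 metadata magic and drop those with
# multiple metadata pieces. The PUSH-operand leading-zero check is omitted.

SOLC04_MAGIC = "a165627a7a7230"       # metadata marker emitted by solc 0.4
MULTI_METADATA_MAGIC = "0029a165"     # metadata end followed by a new piece

def r1_candidate(bytecode_hex):
    code = bytecode_hex.lower()
    return SOLC04_MAGIC in code and MULTI_METADATA_MAGIC not in code

assert r1_candidate("6080aabba165627a7a7230ccdd0029")                # single metadata piece
assert not r1_candidate("6080aabbccdd")                              # not solc 0.4
assert not r1_candidate("a165627a7a7230aa0029a165627a7a7230bb0029")  # multiple pieces
```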
Result: As shown in Table IV, we finally identify more than 49K contracts that are influenced by R1. Among them, by identifying function signatures, 25K and 2,440 contracts are suspected to follow ERC20 and ERC721, respectively, which may lead to a huge financial and ecological impact if they are adversarially verified. In addition, all unverified Solidity contracts in Sourcify, around 58.9M, are affected by R2 and R3, where the former even includes the partially verified ones (around 9K contracts). Moreover, as we clarified in §I, lots of downstream applications rely on source code verification services. If millions of contracts can be adversarially verified with forged smart contracts, the effectiveness and even safety of these applications are severely impacted. We illustrate a case study in our artifact [1], where Slither [28], a well-known source-code-level vulnerability detector for Ethereum smart contracts, is exploited by a piece of forged source code, resulting in file overwriting in its hosting environment.

D. Contracts Affected by A2-related Vulnerabilities
Five types of vulnerabilities are related to A2. Therefore, we try to measure whether there exist any contracts that have already performed source scams on users.
Method: To the best of our knowledge, among all these five types of vulnerabilities, R6 to R8 are zero-day vulnerabilities uncovered by us; no existing instances are found on the Ethereum mainnet. Thus, we mainly focus on the existence of instances that exploit R4 and R5. Based on tintinweb [42], a dataset consisting of all verified contracts on Ethereum, we performed a series of rule-based detections to filter such contracts. Specifically, to identify contracts abusing R4, for each verified contract we iterate over its transactions to check whether there is more than one successful contract deployment. To identify contracts abusing R5, we first filter out all verified contracts that have a call relationship to linked libraries. Then, we perform a difference check between the provided linked library and the one actually invoked. Consequently, we can identify whether attackers uploaded a forged linked library.
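The R4 rule can be sketched as follows; the transaction records are simplified dicts, an assumption made for illustration only:

```python
# Sketch of the rule-based R4 detection: a verified address whose history
# shows more than one successful contract deployment is suspicious, since
# the verified source may describe an earlier, since-replaced contract.
# Transaction records are simplified dicts, an assumption for illustration.

def suspected_r4(transactions):
    deployments = [
        tx for tx in transactions
        if tx.get("creates_contract") and tx.get("status") == "success"
    ]
    return len(deployments) > 1

history = [
    {"creates_contract": True, "status": "success"},   # benign deploy, then verified
    {"creates_contract": False, "status": "success"},  # e.g., a selfdestruct call
    {"creates_contract": True, "status": "success"},   # redeploy with new logic
]
assert suspected_r4(history)
assert not suspected_r4(history[:1])
```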
Result: As shown in Table IV, we finally find 2 and 244 contracts whose developers are suspected of abusing R4 and R5, respectively, to commit fraud. For the 2 cases under R4, after a manual recheck, we confirm that one case has performed a fake proposal attack against Tornado.cash [69]. Specifically, by exploiting R4, the attackers associated a piece of source code that had gained users' trust with a malicious proposal. The proposal ultimately resulted in a $750K financial loss for Tornado.cash [48]. As for the 244 cases that are related to R5, we confirm that at least 6 of them invoked malicious libraries but uploaded benign ones. For example, there is an attack targeting Saddle Finance [47], a well-known decentralized exchange. Interestingly, its actually invoked linked library is a deprecated version with a vulnerability in the price calculation, which can be exploited by malicious users to exchange a small number of tokens for a much larger number of tokens at a rate deviating from the market. However, the developer of Saddle Finance uploaded the updated and bug-free version of the invoked library, which conceals the vulnerability hidden in the actual library. Eventually, this vulnerability was exploited and led to a financial loss of $7.2 million.
    VIII. RELATED WORK
    Blockchain Security. Currently, a great deal of research
    work is focusing on the security of the whole blockchain
    ecosystem [45], [10], [37], [35], [34], [71], [75], [4], [13],
    [12]. While lots of them are delving into the security risks of
    various modules in the blockchain ecosystem, there is still a
    void when targeting source code verification services.
    Ethereum Scam. In recent years, with the increasing financial
    value embedded in the Ethereum contracts, the research on the
    scam in the Ethereum ecosystem has become a hot spot [73],
    [74], [11], [65], [14], [31], [72]. For example, Chen et al. [11]
    propose a heuristic-guided symbolic execution technique to
identify Ponzi scheme contracts in Ethereum. However, there is little discussion about whether scams can be performed via a mismatch between the source code and the deployed bytecode.
Ethereum Smart Contract Analysis. In terms of Solidity contract security, a lot of work [32], [19], [40], [28], [29], [64], [9], [3], [33], [39] focuses on contract vulnerability
    detection based on source code. For instance, Feist et al. [28]
    build a static analysis framework for solidity source code,
    called Slither, to provide support for bug detection in Ethereum
smart contracts. However, all of them assume that the fetched source code files are correct and do not consider the risks brought by source code verifiers.
    IX. THREATS TO VALIDITY
    We have identified three threats to validity in this work.
    First, there might be more kinds of vulnerabilities related to
    smart contract verification services, which are not covered in
this work. Etherscan is closed-source, and we can only conduct a black-box analysis against it. Nevertheless, we argue that we
    have comprehensively performed a code audit on Blockscout
    and Sourcify against the intuitive security properties, and summarized the vulnerabilities related to almost all the modules
    of verifiers. We believe this work can remind developers and
    security researchers of this previously overlooked direction.
    Second, beyond the detection principles for vulnerabilities,
    we also illustrate the corresponding PoCs for them. However,
    we need to mention that these exploitations are not the only
way to tamper with verifiers. For example, in addition to adopting
    return in YUL to exploit the simulating mechanism in
    R2, attackers could also utilize the jump statement in loose
    inline assembly [76]. Moreover, exploits can even be combined
    to perform a more covert attack. In a nutshell, we have
    presented these vulnerabilities in the most understandable way,
    but developers should never take the risks hidden in verifiers
    lightly. Last, we have considered the three mainstream verifiers
    in Ethereum in this work, and identified several exploitable
    vulnerabilities. There do exist other verifiers in Ethereum, e.g.,
Tenderly [63]. We argue that our detection principles can be directly applied to them, and even to verifiers on other EVM-compatible blockchain platforms.
    X. CONCLUSION
    We explored the security issues of smart contract verification services in this work, an uncharted direction of the
    Ethereum ecosystem. We have depicted the design and implementation of smart contract verification services, summarized
    their security properties, observed eight kinds of vulnerabilities, and proposed effective detection and attack methods.
    Our exploration has uncovered 19 exploitable vulnerabilities
in popular smart contract verification services, posing a great threat to millions of smart contracts in the ecosystem. Our
    results encourage our research community to invest more
efforts into this under-studied direction.
    ACKNOWLEDGEMENT
We would like to express our sincere thanks to all anonymous reviewers and the shepherd for their valuable suggestions
    on improving this paper. Additionally, we really appreciate
    the official teams of Blockscout and Sourcify for their timely
responses, and samczsun for his insightful comments on
    this paper. This work was supported in part by National Key
    R&D Program of China (2021YFB2701000), the Key R&D
    Program of Hubei Province (2023BAB017, 2023BAB079),
    the Knowledge Innovation Program of Wuhan-Basic Research,
    HUST CSE-HongXin Joint Institute for Cyber Security, Hong

Thoughts

I'll prepare a well-structured summary of the paper, in Portuguese, focused on giving you a solid base for a presentation: central idea, threat model, verifier architecture, vulnerability taxonomy, results, and implications.


Answer

Sure. Here is a fairly complete summary, oriented toward giving you a solid base for a presentation.

The paper's central idea

The paper shows that smart contract verification services on Ethereum, such as Etherscan, Sourcify, and Blockscout, are not just auxiliary infrastructure: they are a critical trust point in the ecosystem. And the problem is that these services, according to the authors, have exploitable vulnerabilities that allow:

  1. associating fake source code with legitimate, already-deployed contracts, and
  2. making malicious contracts look harmless, by displaying "clean" source code that does not correspond to what is actually on-chain.

The thesis is strong: even though the blockchain is immutable, the verification layer that translates bytecode into readable code can be manipulated, undermining the trust of users, auditors, and automated tools.


Why this matters

On Ethereum, contracts are stored on-chain as EVM bytecode, which is very hard to read.
Verification serves to link that bytecode to the supposedly original Solidity source code.

Without it:

  • users cannot easily inspect the contract's real logic;
  • it is hard to distinguish legitimate contracts from scams;
  • tools such as Slither, Echidna, and external audits depend on a source of truth that may be wrong.

The paper argues that the community tends to assume that "verified source code = trustworthy source code", but this equivalence can fail.


What the authors do

The authors carry out the first systematic security analysis of smart contract verification services.

They:

  • analyze 3 popular verifiers: Etherscan, Sourcify, and Blockscout;
  • model the internal workflow of these services;
  • define 2 security properties they should guarantee;
  • identify 8 types of vulnerabilities;
  • build detection methods and proofs of concept (PoCs);
  • find 19 exploitable vulnerabilities in total;
  • perform responsible disclosure;
  • show real and potential impact at ecosystem scale.

Main result: all the studied verifiers can be abused in some way.


Essential technical context

1. How an Ethereum smart contract is structured

The paper recalls that EVM bytecode generally has three parts:

  • creation code: executed only at deployment;
  • runtime code: the contract's effective logic after deployment;
  • metadata: compilation information, such as the solc version, hashes, etc.

This matters because the verification process needs to compare what was compiled locally against what is on-chain, and that comparison is harder than it looks.

There are additional complications:

  • factory contracts, which may include multiple metadata blocks;
  • immutable variables, whose value is only set during deployment;
  • linked libraries, whose logic may live at another address rather than inside the main bytecode.

What "verification" means in this context

The user provides:

  • source code;
  • compilation options/configuration;
  • the contract address.

The verifier recompiles and compares against the on-chain bytecode.

There are two kinds of match:

  • partial match: the essentials match, but not the metadata/configuration;
  • exact match: full correspondence.
The authors stress that Sourcify distinguishes partial and exact matches, and this opens some specific attack vectors; Etherscan and Blockscout, in turn, tend not to allow replacing contracts that were already verified manually.

They also point out that the percentage of verified contracts is very low, which increases the attack surface:

  • less than 1% on Etherscan,
  • 0.7% on Sourcify,
  • 0.3% on Blockscout.

Threat model

The paper defines two broad types of abuse.

A1. Adversarial verification

An attacker manages to verify someone else's contract with forged source code.

Consequences:

  • the legitimate contract ends up associated with deceptive code;
  • the real developers may be blocked from replacing that code;
  • the attacker can include:
    • phishing links,
    • fake messages,
    • supposedly vulnerable functions,
    • reputationally toxic information.

This attacks the credibility of the victim project.

A2. Source scam

Here the attacker is the developer of the malicious contract itself.

They:

  • deploy a contract with malicious behavior,
  • but manage to verify it with apparently benign source code.

Consequence:

  • users and tools believe they are looking at a safe contract,
  • but different logic runs on-chain.

This is perhaps the most dangerous scenario.


How the authors model the verifier

They decompose the service into 6 modules:

M1. Untrusted data acquisition

Receives:

  • source code,
  • additional files,
  • configuration,
  • the target address.

M2. On-chain data acquisition

Obtains:

  • runtime code,
  • deployment bytecode,
  • the creation transaction, etc.

M3. Compilation

Uses solc to compile the submitted code.

An important problem: immutables.
To handle them, verifiers use two strategies:

  • direct substitution of the values based on what is on-chain;
  • simulation of the creation code.

The authors show that both have security implications.

M4. Comparison

Compares what was compiled locally against what is on-chain.

There are dangerous "shortcuts" here:

  • comparing only the bytecode prefix;
  • ignoring/removing metadata;
  • mishandling constructor arguments.

M5. Storage

Stores the submitted code.

It can be:

  • centralized (Etherscan, Blockscout),
  • decentralized via IPFS (Sourcify).

M6. Display/API

Exposes the code and metadata to the public and to external tools.

If this layer shows ambiguous or incomplete information, it can also mislead.


Two "shortcuts" that make the problem worse

1. Code inheritance between contracts with the same runtime

Etherscan and Blockscout assume that if two contracts have the same runtime code, then the source code of one can be inherited by the other.

This helps with factory contracts, but it also amplifies attacks: one verified contract can "contaminate" many others.

2. Result inheritance across platforms

Blockscout automatically inherits results from Sourcify.

Hence, a flaw in Sourcify can propagate to Blockscout.


The two proposed security properties

P1. Unrestorability

The idea: it should not be possible to reconstruct or forge source code, other than the developer's original, that passes verification.

It is almost an analogy to "preimage resistance", applied to code verification.

If this property fails, the door opens to adversarial verification.

P2. Consistency

The verified code and the on-chain contract must have the same semantics.

There must not be:

  • pieces of on-chain logic that do not appear in the displayed code;
  • or displayed pieces that do not correspond to the real behavior.

If this property fails, the door opens to source scams.


The 8 vulnerabilities

The paper organizes them into two groups: those leading to A1 and those leading to A2.


Vulnerabilities leading to A1

R1. Exploitable compiler features

Attackers abuse compiler features, such as:

  • YUL,
  • loose inline assembly,
  • verbatim.

These features allow direct control over opcode sequences and the construction of "mirror" contracts that pass verification without corresponding to the developer's original code.

Key idea

The compiler gives the user too much control, and the verifier cannot distinguish "legitimate code" from "code fabricated to match the bytecode".

Impact

It allows verifying other people's contracts with forged code.


R2. Unchecked simulating

Some verifiers simulate the creation code to resolve immutables.
The problem is that this simulation can be fooled.

An attacker can build a "wrapper" that directly returns the victim's runtime code before the real logic.

Key idea

The verifier trusts the simulated execution too much.

Result

The fake contract manages to appear as an exact match.

Proposed mitigation

Avoid simulation and prefer controlled substitution of the immutables.
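The over-trust in simulation can be shown with a toy model; this is not the paper's PoC, and naive_verify, forged_creation, and VICTIM_RUNTIME are illustrative names:

```python
# Toy model of why trusting a simulated constructor is dangerous: the
# verifier executes untrusted creation logic and believes whatever
# "runtime code" it returns.

VICTIM_RUNTIME = bytes.fromhex("6080604052600a600b")  # placeholder bytes

def naive_verify(creation_logic, onchain_runtime):
    produced_runtime = creation_logic()   # "simulate" the deployment
    return produced_runtime == onchain_runtime

# The attacker's forged creation code ignores its own source-level logic
# and simply returns the victim's runtime code verbatim:
def forged_creation():
    return VICTIM_RUNTIME

assert naive_verify(forged_creation, VICTIM_RUNTIME)  # forged source "matches"
```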


R3. Incomplete bytecode validation

To avoid asking for the constructor parameter values, some verifiers accept the compiled bytecode being merely a prefix of the creation transaction's bytecode.

This is dangerous because:

  • an invalid prefix can be accepted;
  • even empty bytecode can be treated as a valid prefix.

Example given

An abstract contract compiles to empty bytecode, yet still exposes ABI functions.
Thus, an attacker can display "visible", suspicious functions to discredit the real contract.

Conceptual problem

The "prefix matching" heuristic improves usability but destroys security.

Vulnerabilities leading to A2

R4. Replaceable on-chain contracts

Even after verification, the contract at an address may stop being the same one, for example via:

  • chain reorganization,
  • or abuse of create2 and selfdestruct.

Attack idea

  1. Deploy a benign contract.
  2. Verify that contract.
  3. Destroy the contract.
  4. Reuse the same address to deploy malicious logic.

The old verification remains associated with the address, but the real code has changed.

This point is very strong

It shows that verification should not be treated as a one-off event; the system should also check the immutability of the address's future state.
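To make the address-reuse point concrete, here is a hedged Python sketch of the CREATE2 address formula; real Ethereum uses keccak-256, and sha256 stands in here because only the structure of the computation matters:

```python
# Hedged sketch of why CREATE2 enables code replacement at a fixed address.
# Real Ethereum computes keccak256(0xff ++ deployer ++ salt ++ keccak256(init_code))
# and keeps the last 20 bytes; sha256 is a stand-in, since the point is only
# that the address depends on the *init code*, not on the runtime code that
# the init code ends up deploying.
import hashlib

def create2_address(deployer, salt, init_code):
    init_hash = hashlib.sha256(init_code).digest()
    return hashlib.sha256(b"\xff" + deployer + salt + init_hash).digest()[-20:]

deployer, salt = b"\x01" * 20, b"\x00" * 32
# Init code that fetches the runtime to deploy from elsewhere (e.g., another
# contract's storage) hashes identically no matter what it ultimately deploys:
addr_before = create2_address(deployer, salt, b"fetch-and-deploy")
addr_after  = create2_address(deployer, salt, b"fetch-and-deploy")
assert addr_before == addr_after  # selfdestruct + redeploy lands on the same address
```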


R5. Unverified linked libraries

When a contract calls a linked library, the library's implementation may not actually be included in the caller's bytecode.

What appears in the main contract is merely the equivalent of:

  • the library address,
  • the function signature,
  • a delegatecall.

Problem

The verifier may accept a "pretty" library submitted by the attacker, even if the library actually called on-chain behaves maliciously.

Consequence

The displayed code looks benign, but the external logic called by the contract is different.

R6. Mislabeled bytecode

The verifier removes parts of the bytecode it believes to be:

  • metadata,
  • or linked library placeholders.

But if that identification is wrong, it may:

  • let malicious logic through,
  • or remove part of the real logic before the comparison.

Two sub-problems

  1. misidentified metadata
  2. badly resolved library placeholders

The authors show that:

  • regex matching is fragile;
  • even Blockscout's "improved", difference-based method can be abused;
  • Sourcify's placeholder mechanism can also be manipulated.

This is a very important finding

Because it shows that even mechanisms designed to be more robust remain exploitable.
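A hedged toy of the regex-stripping fragility (real solc metadata is CBOR-encoded and version dependent; the pattern below is simplified):

```python
# A verifier that deletes anything matching a metadata-like pattern before
# comparing bytecode can be tricked, because the same byte pattern can be
# planted inside executable code, e.g., as PUSH data.
import re

METADATA_RE = re.compile(r"a165627a7a7230.*?0029")

def strip_metadata(bytecode_hex):
    return METADATA_RE.sub("", bytecode_hex)

honest    = "6001600201" + "a165627a7a7230" + "11" * 10 + "0029"
malicious = "6001" + "a165627a7a7230" + "ff" * 10 + "0029" + "600201"

# After stripping, both collapse to the same comparable core, even though
# the "metadata" in the malicious one sits in the middle of real code:
assert strip_metadata(honest) == strip_metadata(malicious) == "6001600201"
```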


R7. Path traversal risk

Files submitted for verification can arrive in a JSON with arbitrary paths.

If the service does not sanitize these paths, the attacker can use something like ../ to overwrite another contract's files in the backend.

Consequence

The source code stored for an address can be swapped for another.

This is almost a "classic web/app" vulnerability

But here applied to contract verification infrastructure.
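A minimal sketch of the missing sanitization; this is a generic defense in Python, not any verifier's actual code:

```python
# File names taken from the uploaded JSON must be confined to the
# contract's own storage directory; normalize first, then check the result.
import os

def safe_join(base_dir, user_path):
    candidate = os.path.normpath(os.path.join(base_dir, user_path))
    # Reject anything that escapes base_dir after normalization.
    if not candidate.startswith(os.path.normpath(base_dir) + os.sep):
        raise ValueError("path traversal attempt: " + repr(user_path))
    return candidate

base = "/repo/contracts/0xAAAA"
assert safe_join(base, "main/erc721.sol") == "/repo/contracts/0xAAAA/main/erc721.sol"
try:
    safe_join(base, "../0xBBBB/erc721.sol")   # tries to overwrite another contract
    escaped = True
except ValueError:
    escaped = False
assert not escaped
```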


R8. Inadequate information disclosure

Even without a deep hack, the mere way information is displayed can mislead.

Example:

  • several contracts with the same name,
  • different directories,
  • insufficient metadata in the display,
  • an ambiguous main file.

Consequence

The user may look at the "wrong" contract inside the verified package and assume that it is the real contract.

This point is subtle but very relevant

Because it reminds us that security is not only about bytecode comparison, but also about UI, naming, and disambiguation.


Experimental methodology

The authors test:

  • Etherscan via black-box testing, because it is closed-source;
  • Sourcify and Blockscout with more detailed analysis;
  • everything on the Goerli testnet, for ethical reasons;
  • they build target contracts, PoCs, verifications, and re-verifications;
  • after finding flaws, they perform disclosure in under 30 minutes.

Main results

Overall result

They find 19 exploitable vulnerabilities in total.

Distribution by platform:

  • Etherscan: vulnerable to 4 types
  • Sourcify: vulnerable to 7 types
  • Blockscout: vulnerable to all 8 types

Disclosure/patching status

  • 15 vulnerabilities were confirmed by the maintainers
  • 10 had already been fixed at the time of writing

Important interpretation

The paper is not merely saying "there are bugs".
It is saying that institutional trust in the "verified" label is overestimated.


Measured impact

Contracts affected by A1

The authors estimate that:

  • more than 49 thousand contracts can be affected by R1;
  • around 58.9 million unverified Solidity contracts in Sourcify are exposed to R2 and R3;
  • among the potential R1 targets, they identify many contracts suspected of following the standards:
    • ERC20
    • ERC721

In other words, the problem touches tokens and NFTs; it is not marginal.

Contracts affected by A2

Looking for evidence of real abuse, they find:

  • 2 suspicious contracts linked to R4
  • 244 suspicious contracts linked to R5

After manual review:

  • they confirm at least 1 real attack related to R4, linked to Tornado.cash, with a loss of $750K
  • they confirm at least 6 cases related to R5, including one at Saddle Finance, associated with a loss of $7.2M

This is especially relevant because it takes the paper beyond the purely theoretical.


What the paper actually proves

The paper proves three very strong things:

1. "Verified" does not necessarily imply "correct"

The verification label can be misleading.

2. The verifier is part of the attack surface

The community usually focuses on the contract and the compiler, but not on the verification service itself.

3. The problem affects the whole ecosystem

It affects not only the end user, but also:

  • auditors,
  • automated scanners,
  • academic datasets,
  • block explorers,
  • services that perform due diligence based on verified source code.

Strengths of the paper

Very strong framing

The research question is excellent:
"Are the verifiers trustworthy?"

Good abstraction

The authors do not get stuck on a single service's implementation; they build a general model with modules M1–M6.

Good taxonomy

The P1/P2 properties and the A1/A2 scenarios help a lot in organizing the presentation.

Good balance between theory and practice

The paper:

  • conceptualizes,
  • implements PoCs,
  • measures impact,
  • performs responsible disclosure.

High practical relevance

It is work with immediate implications for real infrastructure.


Limitations of the paper

Etherscan is a black box

This limits certainty about some internal details.

No guarantee of exhaustiveness

The authors themselves admit that more classes of vulnerabilities may exist.

Some exploits depend on specific conditions

For example:

  • the solc version,
  • the metadata structure,
  • the presence of factories,
  • the use of linked libraries.

Part of the empirical evaluation is heuristic

In detecting real cases, there are "suspicious" contracts, but not all are confirmed as real attacks.

Even so, this does not weaken the main thesis.


The message I would highlight in a presentation

If I had to summarize the paper in one sentence:

Trust in the Ethereum ecosystem depends not only on on-chain immutability, but also on the off-chain integrity of the services that claim to explain what the bytecode does, and those services can be fooled.


Suggested structure for your presentation

A very natural structure would be this:

1. Problem

  • EVM bytecode is unreadable.
  • The ecosystem depends on verification services.

2. Central question

  • Can we trust these services?

3. How verification works

  • source code + config + address
  • recompilation
  • comparison against bytecode/runtime

4. Verifier architecture

  • M1 to M6

5. Security properties

  • P1 unrestorability
  • P2 consistency

6. Threat model

  • A1 adversarial verification
  • A2 source scam

7. The 8 vulnerabilities

  • R1–R3 for A1
  • R4–R8 for A2

8. Results

  • 19 vulnerabilities
  • all verifiers affected
  • 15 confirmed
  • 10 fixed

9. Real impact

  • tens of millions of contracts potentially affected
  • real cases with financial losses

10. Conclusion

  • "verified" is not synonymous with "safe"
  • verifiers need to be treated as critical infrastructure

Final conclusion

This paper is very good because it shifts the usual focus of Ethereum security.

Instead of asking only:

  • "does the contract have bugs?"
    or
  • "does the compiler generate correct bytecode?",

it asks:

  • "what if the very mechanism linking source code to bytecode is abusable?"

The authors' answer is clear: it is abusable, it has already been abused, and that compromises a central piece of trust in the Ethereum ecosystem.

For a presentation, I would say the 3 concepts you really must internalize are:

  • A1 vs A2
  • P1 vs P2
  • R1–R8 as the map of the flaws

And the 3 final takeaways would be:

  • verification is trust infrastructure, not just a convenience;
  • badly designed usability can open serious doors;
  • tools and users should not assume that "verified source" is automatically trustworthy.

I can also turn this into a 10–12 slide outline with titles and speaker notes.
