What Is Sharding? Meaning Of Sharding

Contents

1 What is sharding?
2 How does sharding work?
3 Sharding and Security
4 Advantages and disadvantages of sharding
5 Why is sharding used?
6 What is the difference between Sharding and Partitioning?
7 Do you need database sharding?
8 What are the different sharding architectures and types?
9 What are examples of sharding in cryptocurrencies?

The ultimate method of scaling cryptocurrency.

Once upon a time, blockchain networks could only have two of the three features in the holy trinity of decentralization, security, and scalability. Building any two of these features into a blockchain is possible, but never all three. This is known as the blockchain trilemma, and solving it has been on the mind of every blockchain developer since the birth of Web3.

But now, there might be a solution to the infamous trilemma: blockchain sharding.


What is sharding?

Sharding is a method of dividing a blockchain network's single database across multiple databases, which can then be stored on separate devices. A large database is handled as smaller chunks of data held on multiple data nodes. This increases the system's overall storage capacity and lets the nodes process more requests simultaneously across the network.

Sharding is a method of horizontal scaling that increases the processing power of a network by dividing the workload across multiple data nodes operating on independent machines. Because capacity grows by adding machines rather than by upgrading a single one, the network can keep scaling as its data grows, which lets it deal with big data more effectively.

video 1: https://www.youtube.com/watch?v=8AtGrwSN8Nc

How does sharding work?

The first and most important thing to understand about sharding is that not all sharding strategies are created equal. Each method, however, performs nearly the same task: dividing the database into more manageable chunks of data, which are then stored and processed on different shards. These shards are independent nodes operating on separate devices within a single interconnected blockchain network.
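As a rough illustration of that dividing step, here is a minimal sketch that deals a small ledger out to three shards in round-robin order. The function name split_into_shards, the ledger records, and the shard count are all made up for this example; a real network would use one of the strategies described later in this article.

```python
from typing import Dict, List


def split_into_shards(records: List[dict], shard_count: int) -> Dict[int, List[dict]]:
    """Deal records out round-robin so each shard holds a similar share."""
    shards: Dict[int, List[dict]] = {i: [] for i in range(shard_count)}
    for index, record in enumerate(records):
        shards[index % shard_count].append(record)
    return shards


# Hypothetical ledger of ten transactions.
ledger = [{"tx_id": n, "amount": n * 10} for n in range(10)]
shards = split_into_shards(ledger, shard_count=3)

for shard_id, rows in shards.items():
    print(shard_id, [row["tx_id"] for row in rows])
```

Each shard ends up with three or four of the ten records, so no single device has to store or process the whole ledger.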

The next thing to consider is that a network handles two kinds of operations: reads and writes. Each places a different workload on the network. Data that is mostly read is relatively easy to replicate, and replication improves overall performance at a low processing cost per device, while mixed read-write operations carry a higher workload. Each kind of workload calls for a different sharding architecture to be processed and distributed most effectively.

The last thing to consider is system maintenance. The network contains numerous devices and data nodes that each store data, and the map that records which data lives on which shard may require occasional maintenance. The sharding architecture determines how this map is built and thus how much maintenance it requires. This is explained further in the section on sharding architectures.

Sharding and Security

If you know anything about the struggle to scale a blockchain network, then you know that any attempt at scaling has one primary foe it must overcome, and that behemoth’s name is Security. Under the blockchain trilemma, it has long been stated that a network can have any two of scalability, decentralization, and security, but never all three. Sharding tries to prove this incorrect.

Ethereum is a perfect example of this. It has always aimed to be decentralized and secure, and it then took on the feat of becoming scalable. This poses immediate threats to its network security, and even the earliest attempts at sharding put it at risk of a data node takeover. Due to the decentralized nature of Ethereum, each new shard creates a separate pool of data with its own independent validators, and each of these pools is subject to cyber vulnerabilities. The more shards are made, the less safe any individual shard is, because the network's validators are spread more thinly and an attacker needs to control fewer of them to take over a single shard. This problem has only grown since the Merge, in which Ethereum’s Mainnet joined the Beacon Chain proof-of-stake system, whose roadmap at the time of writing called for 64 new shard chains.

Thus far, the network has opted to counter this with a random node assignment protocol. Nodes are assigned to shards at random, so potential hackers cannot predict which node will sit on which shard or choose in advance which shard is worthwhile to attack.
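The idea of random assignment can be sketched in a few lines. This is only an illustration, not Ethereum's actual protocol: the function assign_validators, the validator names, and the seed are assumptions for the example, and Python's random module is not cryptographically secure, whereas a real network would need randomness that participants cannot predict or bias.

```python
import random
from typing import Dict, List


def assign_validators(validators: List[str], shard_count: int, seed: int) -> Dict[int, List[str]]:
    """Shuffle validators using a shared seed, then deal them out to shards."""
    shuffled = validators[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    assignment: Dict[int, List[str]] = {shard: [] for shard in range(shard_count)}
    for position, validator in enumerate(shuffled):
        assignment[position % shard_count].append(validator)
    return assignment


# Hypothetical validator set and seed.
validators = [f"validator-{n}" for n in range(10)]
assignment = assign_validators(validators, shard_count=3, seed=42)

for shard, members in assignment.items():
    print(shard, members)
```

Because every honest node can recompute the same shuffle from the same seed, the assignment is verifiable, yet an attacker who does not know the seed in advance cannot place their own nodes on a chosen shard.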

Advantages and disadvantages of sharding

Advantages	Disadvantages
Increased writing & reading operation capacity	Query overhead, separated sharded databases
Increased storage capacity through scalability	Complexity of administration
Increased availability	Increased infrastructure costs

Why is sharding used?

The primary function of sharding is to improve the scalability of a blockchain network. The network contains a giant ledger that accumulates into an extensive database. Sharding allows the network to divide and distribute this data into smaller chunks, called logical shards, and then have it processed across multiple data nodes. This removes the need for a single device to process every data operation and thus increases the network’s overall performance by decreasing the computation demanded of each device.

Sharding is also referred to as horizontal scaling, and it has two key ways to increase system performance:

Parallel processing can use all the computers simultaneously to answer a single query.
Because the data chunks (logical shards) are separated, each machine has fewer rows to scan before responding to a query.
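Both points can be seen in a small sketch in which one query is sent to every shard at once and the partial answers are merged. The shard contents and the function names scan_shard and query_all_shards are invented for illustration, and the threads here are only an in-process stand-in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical shards, each holding only a few rows of the full data set.
shards: List[List[Dict]] = [
    [{"account": "a", "balance": 120}, {"account": "b", "balance": 30}],
    [{"account": "c", "balance": 75}, {"account": "d", "balance": 200}],
    [{"account": "e", "balance": 10}],
]


def scan_shard(rows: List[Dict], min_balance: int) -> List[str]:
    """Each shard scans only its own small slice of rows."""
    return [row["account"] for row in rows if row["balance"] >= min_balance]


def query_all_shards(min_balance: int) -> List[str]:
    """Send the same query to every shard in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = list(pool.map(lambda rows: scan_shard(rows, min_balance), shards))
    return sorted(account for part in partial_results for account in part)


print(query_all_shards(100))
```

No shard looks at more than two rows, even though the query covers all five accounts.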

Now that that’s out the way, there are two kinds of sharding, and each has its benefits.

Horizontal Sharding splits a table by rows. It is effective when queries return a few rows and those rows are grouped together, so the query only filters through a short range of data on one shard before returning the result to the server.

Vertical Sharding splits a table by columns. It is effective when queries return subsets of columns of data. These columns may contain names or addresses, and the separate columns can be sharded onto different servers.
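To make the distinction concrete, here is a small sketch that splits the same table both ways. The customers table and the shard names are made up for the example.

```python
# Hypothetical table of four customers.
customers = [
    {"id": 1, "name": "Ada", "address": "1 Main St"},
    {"id": 2, "name": "Bo", "address": "2 Oak Ave"},
    {"id": 3, "name": "Cy", "address": "3 Elm Rd"},
    {"id": 4, "name": "Di", "address": "4 Pine Ln"},
]

# Horizontal sharding: every shard has the same columns but different rows.
horizontal = {
    "shard_a": [row for row in customers if row["id"] <= 2],
    "shard_b": [row for row in customers if row["id"] > 2],
}

# Vertical sharding: every shard has all the rows but different columns,
# with the id kept on each shard so the pieces can be matched up again.
vertical = {
    "names_shard": [{"id": row["id"], "name": row["name"]} for row in customers],
    "addresses_shard": [{"id": row["id"], "address": row["address"]} for row in customers],
}

print(horizontal["shard_a"])
print(vertical["names_shard"])
```

A query for customer 1 touches only shard_a in the horizontal layout, while a query for all names touches only names_shard in the vertical layout.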

Thinking of it this way makes it far easier to visualize.

The last thing to think about is that sharded databases offer higher data availability in the event of a shard outage. An outage is confined to a single data node or shard instead of shutting down the entire network or application. This risk is then further mitigated by replicating the data onto additional nodes.
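A minimal sketch of that mitigation is shown below: a read tries each copy of a shard in turn and only fails if every copy is offline. The Node class, the ShardUnavailable exception, and the read_with_failover helper are hypothetical names for this example.

```python
from typing import Dict, List


class ShardUnavailable(Exception):
    """Raised when a node (or every copy of a shard) cannot be reached."""


class Node:
    def __init__(self, name: str, data: Dict[str, str]) -> None:
        self.name = name
        self.data = dict(data)
        self.online = True

    def read(self, key: str) -> str:
        if not self.online:
            raise ShardUnavailable(self.name)
        return self.data[key]


def read_with_failover(copies: List[Node], key: str) -> str:
    """Try each copy of the shard in order and return the first answer."""
    for node in copies:
        try:
            return node.read(key)
        except ShardUnavailable:
            continue  # this copy is down, so try the next one
    raise ShardUnavailable("all copies offline")


primary = Node("primary", {"tx1": "confirmed"})
replica = Node("replica", {"tx1": "confirmed"})
primary.online = False  # simulate an outage on one node

print(read_with_failover([primary, replica], "tx1"))
```

The read still succeeds even though the first node is down, because the replica holds the same data.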

What is the difference between Sharding and Partitioning?

There is a nuanced difference between sharding and partitioning. Partitioning involves dividing the database into smaller subsets that all belong to a single database instance. Sharding involves a similar data division, but it spreads the subsets over multiple computers, none of which necessarily holds the entire database. The terms are often used as synonyms, however, depending on the type of sharding. For instance, “horizontal sharding” is synonymous with “horizontal partitioning.”

Do you need database sharding?

Sharding is not some utopian solution without real-world costs. Setting up a sharding system requires a lot of capital and labor. Before setting one up, developers should consider alternative ways of distributing data on a blockchain network that still allow for long-term scalability.

These are three alternative strategies:

Vertical Scaling: This one is relatively straightforward. Upgrading a machine's hardware, for example by adding RAM, a faster CPU, or larger disks, increases the capacity of an individual device. This increases system storage and performance without altering the network architecture. But it is not a scalable solution in the long run, because a single machine can only be upgraded so far.
Specialized services or databases: This solution involves migrating some of the network’s data onto an established external data service, such as a cloud storage service like Amazon S3. This offloads less critical data, like analytics or full-text search indexes, onto these specialized data stores to free up space on the network for additional data transactions. This poses the apparent threat of data centralization.
Replication: This solution focuses primarily on read-heavy data, as it is easier to replicate. It increases system availability and overall system performance and avoids the need to introduce a complex sharding system. This, however, has limited potential because it depends on the data being mostly read. When write-heavy operations are introduced, every change has to be copied to each replica, which overcomplicates the system.
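The trade-off in the replication strategy can be sketched as follows. The ReplicatedStore class is a hypothetical, in-memory stand-in written for this example: reads are rotated across replicas, while every write has to be applied to the primary and then to each replica.

```python
import itertools
from typing import Dict, List


class ReplicatedStore:
    def __init__(self, replica_count: int) -> None:
        self.primary: Dict[str, int] = {}
        self.replicas: List[Dict[str, int]] = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))
        self.copies_written = 0  # counts how much work writes generate

    def write(self, key: str, value: int) -> None:
        """A write lands on the primary and is then copied to every replica."""
        self.primary[key] = value
        self.copies_written += 1
        for replica in self.replicas:
            replica[key] = value
            self.copies_written += 1

    def read(self, key: str) -> int:
        """Reads are spread across replicas in turn, so no single node serves them all."""
        return self.replicas[next(self._next_replica)][key]


store = ReplicatedStore(replica_count=3)
store.write("balance", 100)
print(store.read("balance"), store.copies_written)
```

One logical write produced four physical writes here, which is why replication suits read-heavy data far better than write-heavy data.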

What are the different sharding architectures and types?

The act of sharding a blockchain network is an important one, but it is not a one-size-fits-all kind of solution. There are numerous types of sharding, and each is selected for a specific problem in the network. These are examples of sharding architecture:

Key-Based Sharding

This is also called hash-based sharding. It takes a value from newly written data, such as a customer ID number or a client application’s IP address, and passes it through a hash function. The function converts this input into a discrete value known as a hash value, and the hash value determines which shard the data is stored on. The value that is fed into the hash function is the shard key. Shard keys should be static and remain unchanged over time. Otherwise, update operations would take more work and could degrade the network’s performance.

Key-based sharding is a typical sharding architecture. While it is a reliable method of placing data, it has its limitations. When new servers are added to the network, each one must receive a corresponding hash value. Many existing entries then need to be remapped to new hash values and migrated to their appropriate server. While this migration is in progress, the network may have to pause writing new data.

This strategy’s main advantage is that it distributes the data evenly and so prevents hotspots in the network. Because placement is computed by the hash function itself, the data map does not need to be externally maintained like with other sharding strategies.
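Both the even spread and the remapping cost can be seen in a short sketch. It assumes a simple scheme in which the shard is the hash value modulo the number of shards; the function shard_for and the customer keys are invented for the example, and real systems may use different hash functions or schemes that move less data.

```python
import hashlib
from collections import Counter


def shard_for(key: str, shard_count: int) -> int:
    """Hash the shard key and map the hash value onto one of the shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical shard keys: 1,000 customer IDs.
keys = [f"customer-{n}" for n in range(1000)]

# Distribution across four shards.
before = {key: shard_for(key, 4) for key in keys}
print(Counter(before.values()))

# Adding a fifth server changes the modulus, so many keys map to a new shard.
after = {key: shard_for(key, 5) for key in keys}
moved = sum(1 for key in keys if before[key] != after[key])
print(f"{moved} of {len(keys)} keys move when going from 4 to 5 shards")
```

With this simple modulo scheme, most keys land on a different shard after the fifth server is added, which is the migration cost described above.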

Range-Based Sharding

This sharding strategy is the foil to the abovementioned one. It is perhaps the simplest method of sharding: match each value to the shard whose range contains it. This works in most instances, but it does not guarantee that the data is spread evenly across the shards.

The process of range-based sharding groups data based on an assigned range of values. Each shard only deals with a specific range of values, and this range determines which data is assigned to which shard. The ranges themselves may be even, but the data rarely is, so some shards end up holding far more data than others. Some shards in this network will inevitably receive more traffic than others and become hotspots.
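Here is a minimal sketch of range-based routing and of how it can skew. The range boundaries and sample values are hypothetical, chosen only to show the effect.

```python
import bisect
from collections import Counter

# Exclusive upper bound of each shard's range (hypothetical boundaries):
# shard 0 holds values below 100, shard 1 holds 100-999,
# shard 2 holds 1000-9999, and shard 3 holds everything from 10000 up.
RANGE_UPPER_BOUNDS = [100, 1000, 10000]


def shard_for_value(value: int) -> int:
    """Return the index of the shard whose range contains the value."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, value)


# Hypothetical data in which most values happen to be small.
values = [5, 20, 42, 99, 150, 5000]
load = Counter(shard_for_value(value) for value in values)

for shard in range(len(RANGE_UPPER_BOUNDS) + 1):
    print(f"shard {shard}: {load[shard]} rows")
```

Four of the six rows land on shard 0 while shard 3 stays empty, which is the uneven distribution that range-based sharding is prone to.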

Directory-Based Sharding

This is the stereotypical bookkeeper of the sharding world, with a dedicated lookup table that keeps track of each shard created and each pocket of data stored. For example, a column such as Delivery Zone can be used as the shard key.

This method is an amalgamation of the two sharding methods above, as it uses both a shard key and a correspondence table to track the shards and their assigned data. It is the superior option over range-based sharding when the shard key has low cardinality and thus few possible values, because it makes little sense to store a range of keys on a shard when there are only a handful of values to begin with. It also differs from key-based sharding in that the key is not passed through a hash function. Instead, the method takes the key from the data and checks that key against its entry in the lookup table to find the shard.

Its main appeal is the flexibility it gains by combining the key-based and range-based approaches. Still, what it gains in flexibility it loses in performance: the need to check the table on every operation, for ever-growing databases, slows down the overall speed of the shards. This impacts the ability of shards to write new data onto the network.
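The lookup-table approach can be sketched as below. The DirectorySharding class, the delivery zones, and the shard names are assumptions made for this example; a counter is included only to show that every operation has to consult the table.

```python
from typing import Dict


class DirectorySharding:
    def __init__(self, lookup: Dict[str, str]) -> None:
        self.lookup = dict(lookup)  # shard key -> shard name
        self.lookups_performed = 0

    def shard_for(self, shard_key: str) -> str:
        """Every read or write first consults the lookup table."""
        self.lookups_performed += 1
        return self.lookup[shard_key]

    def reassign(self, shard_key: str, shard: str) -> None:
        """Flexibility: a key can be pointed at a different shard by editing the table."""
        self.lookup[shard_key] = shard


# Hypothetical lookup table keyed by delivery zone.
directory = DirectorySharding({"north": "shard_1", "south": "shard_2", "east": "shard_1"})

print(directory.shard_for("north"))
directory.reassign("east", "shard_3")
print(directory.shard_for("east"))
print(directory.lookups_performed)
```

Moving a zone to a new shard only takes a table edit, but the table sits in the path of every request, which is where the performance cost comes from.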

What are examples of sharding in cryptocurrencies?

NEAR Sharding

NEAR's sharding is a prominent example of sharding a blockchain network. It focuses on enabling low-end devices to run a node as part of the network. Its proof-of-stake (PoS) protocol allows the blockchain to obtain higher scalability while letting nodes operate on low-end devices. This method, however, still encounters issues of data availability and validity.

Ethereum Beacon Chain

In the case of Ethereum, the sharding method is meant to increase its transactions per second (TPS) and improve the blockchain network’s scalability. The Beacon Chain became fundamental to the shift to Ethereum 2.0 and has become the “master chain” in the post-Merge Ethereum network with its proof-of-stake (PoS) protocol. The chain manages transaction validation and stakes on the network. It also allows validators who break the rules to be penalized. Most importantly, it lays the groundwork for Ethereum 2.0's scalability.

Polkadot Parachain

The last example involves an overall simplification of the blockchain. Polkadot’s approach prioritizes productivity and independence of computation when distributing the sharded database amongst its nodes. Each parachain is also specialized for its own purpose, which all but prevents conflicts between transactions. All in all, this is a highly effective method of sharding.
