Partitioning in Database

Reading Time: 3 minutes

To speed up the query processing, we need to efficiently manage data.
And partitioning the database is considered as one of the ways to enhance the processing speed.

Using Partitioning, we can divide the data based on some value/field, so that data belonging to different groups are identified easily(may be stored on different nodes), and therefore processing a query won’t require a Full Table Scan, hence making the processing faster.

Following keys can be used to achieve partitioning :

  • Partition Key: The Partition Key is responsible for data distribution across your nodes.
  • Clustering Key: The Clustering Key is responsible for data sorting within the partition.
  • Row Key: A Row Key is just the identifier of an entity within a partition.

Partition key

Determines the node on the cluster, where the partition will be stored.

Partitioning Strategies

  1. Single-Level Partitioning –

    a) Range Partition
    Range partitioning maps data to partitions based on ranges of partition key values that you establish for each partition. A range is defined using the operators( ‘>’ or ‘<’ (non-inclusive))

    Consider the following table :
    – CustomerId
    – Name
    – Location
    – DOB

    The Lower Bound and the Upper Bound are required for the partition.
    Ex: we can define range partition on DOB like:-
    Range Partition on DOB  ‘2016-01-01’ < DOB <=  ’2017-01-01’

    b) List Partition – List partitioning enables you to explicitly control how rows map to partitions. A list of discrete values for the partitioning key in the description for each partition is defined.

    A row is mapped to a partition by checking whether the value of the partitioning column for a row falls within the set of values that describes the partition.  If a table is partitioned by list, the partitioning key can only consist of a single column of the table.

    Ex: we can define list partition on Location like
    List Partition on Location( ‘Asia’, ‘Europe’, ‘Africa’ )

    c) Hash Partition – A column is defined, where hash values are generated for each value and based on the buckets, the partition is made.

    The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size.
    This is typically used where ranges aren’t appropriate, i.e. employee number, productID, etc.

    It is a better choice than range partitioning when:
    You do not know beforehand how much data maps into a given range
    The sizes of range partitions would differ quite substantially or would be difficult to balance manually
    Range partitioning would cause the data to be undesirably clustered
    Performance features such as parallel DML, partition pruning, and partition-wise joins are important

    Ex: we can define hash partition on any column like:-
    Hash Partition( ‘column-name’ )


  2. Composite Partitioning

A combination of more than one partitioning technique.
         These can be any of the following type: Range-Range, Range-List, Range-Hash, List-Range, and so on.


Row Key

A Row Key in Table Storage can be defined as the “primary key” within a partition.

PartitionKey + RowKey form the composite unique identifier for an entity.

Within one PartitionKey, you can only have unique RowKeys. If you use multiple partitions, the same RowKey can be reused in every partition.

Row key can be viewed as the combination of partition key and the clustering key.

References :



Written by 

Neha is a Senior Software Consultant with an experience of more than 3 years. She is a Big data enthusiast and knows various programming languages including Scala and Java. She is always eager to learn new and advance concepts to expand her horizons and apply them in the project development.

1 thought on “Partitioning in Database3 min read

Comments are closed.