 # Bucket Sort in Data Structures

Amit Kumar Ghosh  34 min read
15 Jun 2023
Intermediate
250 Views

## Introduction

When talking about data structures, the bucket sort algorithm is a powerful and efficient sorting method that can be used to quickly organize large volumes of data. This sorting technique, which uses an array of buckets to classify groups of elements based on their properties, has been around for decades but only recently began gaining traction due to its scalable nature and improved performance in much longer datasets. With this article, we'll explore what bucket sort is, how it works, and some of the key benefits it can offer when implemented in different applications. So if you're looking for a better way to sort through your boxes full of unsorted documentation or process petabytes worth of Big Data projects-bucket sort just might be the answer you've been searching for!

## What is Bucket Sort in Data Structure?

Bucket sort is a popular sorting algorithm to arrange a huge amount of data by dividing them into buckets. These buckets are used to store data that have similar values, and then it applies sorting techniques on each bucket to get the sorted list. This technique is used to sort data in linear time; hence, it becomes more efficient in terms of time complexity compared to other sorting algorithms. As the name suggests, it works similarly to sorting books in a library. All the books are grouped by their subject, and each group is sorted alphabetically, making the search process much more manageable. Likewise, bucket sort in data structures aims to lessen the complexity of sorting a large dataset.

## Working of Bucket Sort

Bucket sort in data structure is a sorting algorithm that works by dividing an array of elements into several buckets or bins. Each bucket is then sorted individually, either using another sorting algorithm or by recursively applying the bucket sort algorithm. Finally, the sorted elements from each bucket are concatenated to obtain the sorted array.

Here's a step-by-step explanation of how bucket sort works:

1. Determine the range of the input: Find the minimum and maximum values in the input array. This range will be used to divide the elements into buckets.
2. Create empty buckets: Create a fixed number of empty buckets equal to the number of elements in the input array or a predetermined value based on the range.
3. Distribute elements into buckets: Iterate through the input array and distribute each element into its corresponding bucket based on a formula that evenly distributes the elements across the buckets. The formula can be something like (value - min) / (max - min), where value is the element's value, min is the minimum value in the input array, and max is the maximum value.
4. Sort individual buckets: Sort each bucket individually using a suitable sorting algorithm, such as insertion sort, quicksort, or another bucket sort recursively. The choice of sorting algorithm depends on the size and nature of the elements within each bucket.
5. Concatenate the buckets: After sorting the individual buckets, concatenate them back into a single sorted array. The order of concatenation depends on the order of the buckets.

## Bucket Sort Algorithm

``````bucketSort()
create N buckets each of which can hold a range of values
for all the buckets
initialize each bucket with 0 values
for all the buckets
put elements into buckets matching the range
for all the buckets
sort elements in each bucket
gather elements from each bucket
end bucketSort``````

## Implementation of Bucket Sort

``````
# Bucket sort in Python

NARRAY = 7 # Array size
NBUCKET = 6 # Number of buckets
INTERVAL = 10 # Each bucket capacity

class Node:
def __init__(self, data):
self.data = data
self.next = None

def bucket_sort(arr):
buckets = [[] for _ in range(NBUCKET)] # Create empty buckets

# Fill the buckets with respective elements
for i in range(NARRAY):
pos = get_bucket_index(arr[i])
buckets[pos].append(arr[i])

# Sort the elements of each bucket
for i in range(NBUCKET):
buckets[i] = insertion_sort(buckets[i])

# Put sorted elements back into the array
index = 0
for i in range(NBUCKET):
for j in range(len(buckets[i])):
arr[index] = buckets[i][j]
index += 1

def insertion_sort(bucket):
if len(bucket) <= 1:
return bucket

for i in range(1, len(bucket)):
key = bucket[i]
j = i - 1
while j >= 0 and bucket[j] > key:
bucket[j + 1] = bucket[j]
j -= 1
bucket[j + 1] = key

return bucket

def get_bucket_index(value):
return value // INTERVAL

# Print array
def print_array(arr):
for i in range(len(arr)):
print(arr[i], end=" ")
print()

# Driver code
if __name__ == "__main__":
array = [42, 32, 33, 52, 37, 47, 51]

print("Initial array:")
print_array(array)
print("-------------")

bucket_sort(array)
print("-------------")
print("Sorted array:")
print_array(array)
``````
``````
// Bucket sort in Java
import java.util.ArrayList;
import java.util.List;

class BucketSort {
private static final int NARRAY = 7; // Array size
private static final int NBUCKET = 6; // Number of buckets
private static final int INTERVAL = 10; // Each bucket capacity

static class Node {
int data;
Node next;

Node(int data) {
this.data = data;
this.next = null;
}
}

static void bucketSort(int[] arr) {
List<Node>[] buckets = new ArrayList[NBUCKET];

// Initialize empty buckets
for (int i = 0; i < NBUCKET; i++) {
buckets[i] = new ArrayList<>();
}

// Fill the buckets with respective elements
for (int i = 0; i < NARRAY; i++) {
int pos = getBucketIndex(arr[i]);
Node newNode = new Node(arr[i]);
}

// Print the buckets along with their elements
for (int i = 0; i < NBUCKET; i++) {
System.out.print("Bucket[" + i + "] : ");
printBuckets(buckets[i]);
System.out.println();
}

// Sort the elements of each bucket
for (int i = 0; i < NBUCKET; i++) {
buckets[i] = insertionSort(buckets[i]);
}

System.out.println("-------------");
System.out.println("Buckets after sorted");
for (int i = 0; i < NBUCKET; i++) {
System.out.print("Bucket[" + i + "] : ");
printBuckets(buckets[i]);
System.out.println();
}

// Put sorted elements into arr
int j = 0;
for (int i = 0; i < NBUCKET; i++) {
for (Node node : buckets[i]) {
arr[j++] = node.data;
}
}
}

static List<Node> insertionSort(List<Node> list) {
if (list == null || list.size() <= 1) {
return list;
}

List<Node> nodeList = new ArrayList<>();

for (int i = 1; i < list.size(); i++) {
Node current = list.get(i);
int j = 0;
while (j < nodeList.size() && nodeList.get(j).data < current.data) {
j++;
}
}

return nodeList;
}

static int getBucketIndex(int value) {
return value / INTERVAL;
}

// Print buckets
static void print(int[] arr) {
for (int i = 0; i < NARRAY; i++) {
System.out.printf("%3d", arr[i]);
}
System.out.println();
}

static void printBuckets(List<Node> list) {
for (Node node : list) {
System.out.printf("%3d", node.data);
}
}

// Driver code
public static void main(String[] args) {
int[] array = {42, 32, 33, 52, 37, 47, 51};

System.out.println("Initial array: ");
print(array);
System.out.println("-------------");

bucketSort(array);
System.out.println("-------------");
System.out.println("Sorted array: ");
print(array);
}
}
``````
``````
// Bucket sort in C++

#include <iomanip>
#include <iostream>
using namespace std;

#define NARRAY 7 // Array size
#define NBUCKET 6 // Number of buckets
#define INTERVAL 10 // Each bucket capacity

struct Node {
int data;
struct Node *next;
};

void BucketSort(int arr[]);
struct Node *InsertionSort(struct Node *list);
void print(int arr[]);
void printBuckets(struct Node *list);
int getBucketIndex(int value);

// Sorting function
void BucketSort(int arr[]) {
int i, j;
struct Node **buckets;

// Create buckets and allocate memory size
buckets = (struct Node **)malloc(sizeof(struct Node *) * NBUCKET);

// Initialize empty buckets
for (i = 0; i < NBUCKET; ++i) {
buckets[i] = NULL;
}

// Fill the buckets with respective elements
for (i = 0; i < NARRAY; ++i) {
struct Node *current;
int pos = getBucketIndex(arr[i]);
current = (struct Node *)malloc(sizeof(struct Node));
current->data = arr[i];
current->next = buckets[pos];
buckets[pos] = current;
}

// Print the buckets along with their elements
for (i = 0; i < NBUCKET; i++) {
cout << "Bucket[" << i << "] : ";
printBuckets(buckets[i]);
cout << endl;
}

// Sort the elements of each bucket
for (i = 0; i < NBUCKET; ++i) {
buckets[i] = InsertionSort(buckets[i]);
}

cout << "-------------" << endl;
cout << "Bucktets after sorted" << endl;
for (i = 0; i < NBUCKET; i++) {
cout << "Bucket[" << i << "] : ";
printBuckets(buckets[i]);
cout << endl;
}

// Put sorted elements on arr
for (j = 0, i = 0; i < NBUCKET; ++i) {
struct Node *node;
node = buckets[i];
while (node) {
arr[j++] = node->data;
node = node->next;
}
}

for (i = 0; i < NBUCKET; ++i) {
struct Node *node;
node = buckets[i];
while (node) {
struct Node *tmp;
tmp = node;
node = node->next;
free(tmp);
}
}
free(buckets);
return;
}

// Function to sort the elements of each bucket
struct Node *InsertionSort(struct Node *list) {
struct Node *k, *nodeList;
if (list == 0 || list->next == 0) {
return list;
}

nodeList = list;
k = list->next;
nodeList->next = 0;
while (k != 0) {
struct Node *ptr;
if (nodeList->data > k->data) {
struct Node *tmp;
tmp = k;
k = k->next;
tmp->next = nodeList;
nodeList = tmp;
continue;
}

for (ptr = nodeList; ptr->next != 0; ptr = ptr->next) {
if (ptr->next->data > k->data)
break;
}

if (ptr->next != 0) {
struct Node *tmp;
tmp = k;
k = k->next;
tmp->next = ptr->next;
ptr->next = tmp;
continue;
} else {
ptr->next = k;
k = k->next;
ptr->next->next = 0;
continue;
}
}
return nodeList;
}

int getBucketIndex(int value) {
return value / INTERVAL;
}

// Print buckets
void print(int ar[]) {
int i;
for (i = 0; i < NARRAY; ++i) {
cout << setw(3) << ar[i];
}
cout << endl;
}

void printBuckets(struct Node *list) {
struct Node *cur = list;
while (cur) {
cout << setw(3) << cur->data;
cur = cur->next;
}
}

// Driver code
int main(void) {
int array[NARRAY] = {42, 32, 33, 52, 37, 47, 51};

cout << "Initial array: " << endl;
print(array);
cout << "-------------" << endl;

BucketSort(array);
cout << "-------------" << endl;
cout << "Sorted array: " << endl;
print(array);
}
``````

#### Output:

``````Initial array:
42 32 33 52 37 47 51
Bucket : 42 32
Bucket : 33
Bucket :
Bucket :
Bucket : 52 37 47
Bucket : 51
Buckets after sorted
Bucket : 32 42
Bucket : 33
Bucket :
Bucket :
Bucket : 37 47 52
Bucket : 51
Sorted array:
32 42 33 37 47 52 5``````

## Scatter-Gather Approach

The scatter-gather approach is not typically used in bucket sort because bucket sort inherently involves a different strategy for sorting elements. In bucket sort, the input is divided into a fixed number of equally-sized buckets, and each bucket represents a range or interval of values. Elements from the input are distributed or scattered into their corresponding buckets based on their values. Then, each bucket is sorted individually, typically using another sorting algorithm such as insertion sort or quicksort. Finally, the sorted elements are gathered or concatenated back into a single output array.

The scatter-gather approach, on the other hand, is commonly used in parallel computing or distributed systems to divide a larger task into smaller subtasks that can be processed independently in parallel. In this approach, the input data is partitioned or scattered across multiple processing units or nodes for individual processing. Once the subtasks are completed, the results are gathered or collected from each processing unit and combined to produce the final output.

## Visualization of Bucket Sort

Bucket sort is a sorting algorithm that divides the input into several buckets and then sorts each bucket individually, either using a different sorting algorithm or by recursively applying bucket sort. Here's a visualization of the bucket sort algorithm:

1. Initialize an empty array of buckets.
2. Iterate through the input array and distribute the elements into the appropriate buckets based on their values. Each bucket represents a range of values.
3. Sort each individual bucket. This step can be done using another sorting algorithm such as insertion sort, selection sort, or any other suitable algorithm.
4. Concatenate the sorted buckets back into a single sorted array.

Let's say we have the following unsorted array: [29, 25, 3, 49, 9, 37, 21, 43, 1]. We'll use five buckets, each representing a range of values.

Step 1: Initialize an empty array of buckets: Step 2: Distribute elements into buckets based on their values: Step 3: Sort each individual bucket: Step 4: Concatenate the sorted buckets: This is a simplified visualization of the bucket sort algorithm. In practice, the number of buckets, their ranges, and the sorting algorithm used within each bucket may vary depending on the specific implementation and the range of input values.

## Bucket Sort Complexity

The time complexity of the bucket sort algorithm depends on several factors, including the distribution of the input elements and the sorting algorithm used to sort the individual buckets.

In the average case, bucket sort has a time complexity of O(n + k), where n is the number of elements to be sorted and k is the number of buckets. This assumes that the elements are uniformly distributed across the buckets, and a linear-time sorting algorithm (such as insertion sort or quicksort) is used to sort the individual buckets.

In the best-case scenario, where the elements are evenly distributed across the buckets and each bucket contains approximately the same number of elements, the time complexity of bucket sort can be as low as O(n). However, this is not always achievable in practice.

In the worst case, bucket sort can have a time complexity of O(n^2) if the elements are highly skewed or if the sorting algorithm used to sort the buckets has quadratic time complexity. For example, if insertion sort is used to sort the buckets, and the input elements are already sorted in decreasing order, then each insertion operation takes O(k) time, resulting in a total time complexity of O(kn) or O(n^2).

It's worth noting that the space complexity of bucket sort is O(n + k), as it requires additional space to store the buckets and their contents.

• Efficiency for uniformly distributed data: Bucket sort performs efficiently when the input data is uniformly distributed across a range. It can distribute the elements evenly into buckets, leading to a more balanced workload during the sorting process.
• Simplicity: Bucket sort is relatively easy to understand and implement compared to other sorting algorithms like quicksort or mergesort. It involves dividing the input into buckets and sorting each bucket separately, which simplifies the overall process.
• The linear time complexity for specific cases: In the best-case scenario, where the elements are uniformly distributed across the buckets and the number of elements in each bucket is small, bucket sort can achieve linear time complexity. This is because the sorting of individual buckets can be done in linear time, and the overall time complexity becomes O(n), where n is the number of elements to be sorted.
• Adaptability: Bucket sort can be easily adapted to handle various types of data. It is particularly useful for sorting elements that have some inherent relationship or similarity, such as floating-point numbers or strings with a specific pattern.
• Parallelization: The process of sorting individual buckets can be parallelized, as each bucket can be sorted independently. This allows for efficient implementation in parallel and distributed computing environments, which can significantly speed up the sorting process for large datasets.

• Limited Applicability: Bucket sort is most effective when the input data is uniformly distributed across the range of possible values. If the data is heavily skewed or has outliers, bucket sort may not perform well. Uneven distribution can lead to imbalanced buckets, which can result in poor sorting performance.
• Memory Requirements: Bucket sort requires additional memory space to create the buckets and hold the elements within each bucket. The number of buckets needed depends on the range of input values. If the range is large, a significant amount of memory may be required, which can be a constraint for systems with limited memory resources.
• Non-Adaptive: Bucket sort operates based on a fixed number of buckets, usually predetermined based on the range of input values. If the input data vary significantly in terms of distribution or range, the chosen number of buckets may not be optimal, leading to inefficient sorting. It lacks the adaptability to dynamically adjust the number of buckets based on the input data.
• Stability: Bucket sort, as originally designed, is not a stable sorting algorithm. Stability refers to the preservation of the relative order of elements with equal values. If stability is a requirement for the sorting task, additional steps or modifications are necessary to ensure stability, which can increase the complexity of the algorithm.
• Limited Sorting of Non-Integer Data: Bucket sort is primarily designed for sorting integers or floating-point numbers. It may not be directly applicable to sort other types of data, such as strings or complex objects. Adapting bucket sort for non-numeric data would require additional preprocessing steps or customizations.

## Applications of bucket sort

Here are a few applications where bucket sorting can be useful:

1. Sorting floating-point numbers: Bucket sort is particularly suitable for sorting floating-point numbers within a specific range. For example, if you have a large set of numbers between 0 and 1, you can divide this range into multiple buckets and distribute the numbers accordingly. Sorting each bucket individually is generally faster than using other sorting algorithms.
2. Sorting integers within a range: Similar to sorting floating-point numbers, bucket sort can efficiently sort integers within a specific range. For example, if you have a set of integers between 1 and 1,000, you can divide this range into multiple buckets and distribute the integers accordingly.
3. Sorting data with known distributions: If you have prior knowledge about the distribution of the input data, such as a uniform or Gaussian distribution, bucket sort can be used to take advantage of this information. By dividing the input range based on the known distribution, you can achieve efficient sorting.
4. Radix sort optimization: Bucket sort is often used as a sub-routine in the radix sort algorithm. Radix sort is a non-comparative sorting algorithm that sorts numbers by processing individual digits. It uses bucket sort to sort the numbers based on each digit, achieving efficient overall performance.
##### Summary

To sum up, Bucket Sort is an efficient sorting algorithm that comes in constant time and requires only a few simple steps. It has the capability of quickly sorting datasets due to its sequential nature. When used correctly, Bucket Sort is one of the most reliable ways of ordering data and can provide detailed results with minimal effort. As this article has shown, Bucket Sort can be easily used for many sorting needs and its advantages are clear. When speed counts, nothing else comes close. Therefore if you're looking for an effective way to organize your next project or research endeavour, you should seriously consider Bucket Sort as an ideal option. Thanks for reading - go start utilizing Bucketing today!

Similar Articles 