Several algorithms for finding the first K minimum values ​​of an array

Publisher:绿意盎然Latest update time:2015-05-05 Source: 51hei Reading articles on mobile phones Scan QR code
Read articles on your mobile phone anytime, anywhere
    I haven't written a blog for a long time. During this period, I was mainly preparing for future job review. Today I will summarize how to find the first K minimum values ​​of an array. There are many ways to find the first K minimum values. The main ideas include the following:

    1. Sort the array, and then the first K elements are the elements that need to be found. The sorting method can be quick sort, but we know that if the array is already ordered in quick sort, the time complexity of quick sort is O(N^2). In order to solve this problem, we usually randomly select an array value pivot as the basis, and divide the array into S1 = < pivot and S2 > pivot, so as to avoid the problems existing in quick sort, or randomly select three elements and then take the middle value as the basis to avoid the worst time complexity of the fast algorithm. The first K numbers of this method are in order.
 
    2. Since the first K objects are selected, there is no need to sort all the objects. The idea of ​​quick selection can be used to obtain the first K objects. For example, first use the quick sort set partitioning method to divide the set: S1, pivot, S2, and then compare whether K is less than the number of S1. If it is less than, then quickly sort S1 directly. If the number of K exceeds S1, then quickly sort S2. After the sorting is completed, the first K elements of the array are the first K minimum values ​​of the array. This implementation method is definitely faster than the first full quick sort.
 
    3. When the array is converted to a minimum heap, according to the characteristics of the minimum heap, the first element must be the minimum value in the array. At this time, we can save the elements, and then promote the last element to the first element, and rebuild the minimum heap. In this way, the minimum heap is created K times, and the first K minimum values ​​are found. This is the use of the characteristics of the minimum heap, which is essentially the deletion implementation method of the minimum heap. The advantage of this algorithm is that it implements the in-place sorting of the array and does not require additional memory space.
 
    4. The following idea is somewhat similar to bucket sorting. First, a K-sized array b is given, and then the first K numbers in array a are copied to array b. These K numbers are regarded as the first K minimum values ​​of array a, and a maximum heap is created for array b. At this time, other elements in array a are compared again. If other elements are less than the maximum value of array b (top of the heap), the top value of the heap is replaced and the maximum heap is recreated. In this way, the first K minimum elements are found by traversing the array once. This method uses additional memory space, especially when the selected K value is relatively large, this method needs to be weighed.
    This method is very effective for massive data. Massive data cannot be stored in memory. At this time, create a smaller array space, then create a maximum heap, read other data from the hard disk, and then find the first K data.
 
    These are some of the more traditional methods. Of course, there are other options. I will not elaborate on them here. From the above methods, we can see that the search methods make full use of the characteristics of data structures and algorithms. Therefore, the flexible use of data structures has many benefits for the implementation of algorithms.
 
The following is my implementation code. I implemented the first K elements in the array by printing, and did not save them in a new array:

 

    #include
    #include
    #include
    #include
    #include

    #define LEN 500000
    #define K 100

    /*Properties of heap*/
    #define LEFTSON(i) (2*(i)+1)
    #define RIGHTSON(i) (2*((i)+1))
    #define PARENT(i) (((i) -1)/2)

    void swap(int *a, int *b)
    {
            assert(a != NULL && b != NULL);

            if(a != b)
            {
                    *a = *a ^ *b;
                    *b = *a ^ *b;
                    *a = *a ^ *b;
            }
    }

    int partition(int *a, int left, int right)
    {
            int pivot = a[right];
            int i = left;
            int j = left - 1;

            assert(a != NULL);

            for(i = left; i < right; ++ i)
            {
                    if(a[i] < pivot)
                    {
                            ++ j;
                            swap(&a[i],&a[j]);
                    }
            }
            swap(&a[j + 1],&a[right]);
            return (j + 1);
    }

    void quicksort(int *a, int left, int right)
    {
            int i = 0;

            assert(a != NULL);

            if(left < right)
            {
                    i = partition(a,left,right);
                    quicksort(a, left, i - 1);
                    quicksort(a, i + 1, right);
            }
    }

    int QuickSort(int *a, int size)
    {
            assert(a != NULL);
            quicksort(a,0,size-1);
    }

    void quickselect(int *a, int left, int right, int k)
    {
            int i = 0;

            assert(a != NULL && left <= k
                    && left <= right && k <= right);

            if(left < right)
            {
                    i = partition(a, left, right);
                    if(i + 1 <= k)
                            quickselect(a, i + 1 , right, k);
                    else if(i > k)
                            quickselect(a , left, i - 1, k);
            }
    }

    void QuickSelect(int *a, int size, int k)
    {
            assert(a != NULL);
            quickselect(a, 0, size - 1, k);
    }

    /*Max heap*/
    void max_heapify(int *a, int left, int right)
    {
            int tmp = 0;
            int child = left;
            int parent = left;

            assert(a != NULL);

            for(tmp = a[parent]; LEFTSON(parent) <= right;parent = child)
            {
                    child = LEFTSON(parent);

                    if(child != right && a[child] < a[child + 1])
                            child ++;

                    if(tmp < a[child])
                            a[parent] = a[child];
                    else /*Satisfy the characteristics of the maximum heap, exit directly*/
                            break;
            }
            a[parent] = tmp;
    }

    /*Create maximum heap*/
    void build_maxheap(int *a, int size)
    {
            int i = 0;
            assert(a != NULL);

            for(i = PARENT(size); i >= 0; -- i)
                    max_heapify(a,i,size - 1);
    }

    /*Implementation of minimum heap*/
    void min_heapify(int *a, int left, int right)
    {
            int child = 0;
            int tmp = 0;
            int parent = left;

            assert(a != NULL);

            for(tmp = a[parent]; LEFTSON(parent) <= right; parent = child)
            {
                    child = LEFTSON(parent);

                    if(child != parent && a[child] > a[child + 1])
                            child ++;

                    if(a[child] < tmp)
                            a[parent] = a[child];
                    else /*Satisfy the minimum heap characteristics, exit directly*/
                            break;
            }
            a[parent] = tmp;
    }[page]

    /*Create a minimum heap*/
    void build_minheap(int *a, int size)
    {
            int i = PARENT(size);

            assert(a != NULL);

            for(; i >= 0; -- i)
                    min_heapify(a, i, size - 1);
    }

    /*Search using quick sort*/
    void find_Kmin_num_1(int *a , int size, int k)
    {
            int i = 0;

            assert(a != NULL);

            QuickSort(a, size);

    #if 0
            for(i = 0; i < k ; ++ i)
                    printf("%d ",a[i]);

            printf(" ");
    #endif
    }

    /*Use fast selection implementation*/
    void find_Kmin_num_2(int *a, int size, int k)
    {
            int i = 0;

            assert(a != NULL);

            QuickSelect(a, size, k);

    #if 0
            for(i = 0; i < k ; ++ i)
                    printf("%d ",a[i]);

            printf(" ");
    #endif
    }

    /*Use maximum heap to implement*/
    void find_Kmin_num_3(int *a, int size, int k)
    {
            int i = 0;

            int *b = malloc(sizeof(int)*k);

            assert(a != NULL && b != NULL);

            for(i = 0; i < k; ++ i)
                    b[i] = a[i];

            build_maxheap(b,k);

            for(; i < size; ++ i)
            {
                    if(a[i] < b[0])
                    {
                            b[0] = a[i];
    // build_maxheap(b, k);
                            max_heapify(b,0, k - 1);
                    }
            }
    #if 0
            for(i = 0; i < k; ++ i)
                    printf("%d ",b[i]);

            printf(" ");
    #endif
    }

    /*Use the minimum heap to delete elements*/
    void find_Kmin_num_4(int *a ,int size, int k)
    {
            int i = 0;

            assert(a != NULL);

            build_minheap(a, size - 1);
            for(i = 0; i < k; ++ i)
            {
    // printf("%d ",a[0]);

                    /*Delete a[0] and release a[size - 1 - i]*/
                    a[0] = a[size -1 - i];
                    min_heapify(a, 0, size - 2 - i);
            }
    // printf(" ");
    }

    int main()
    {
            int a[LEN];
            int b[LEN];
            int c[LEN];
            int d[LEN];

            int i = 0,j = 0;

            clock_t _start;
            double times = 0;

            srand((int)time(NULL));

            for(i = 0; i < LEN; ++ i)
            {
                    a[i] = rand()%(LEN);
                    b[i] = a[i];
                    c[i] = a[i];
                    d[ i] = a[i];

    // printf("%d ",a[i]);
            }
    // printf(" ");

            _start = clock();
            find_Kmin_num_1(a,LEN,K);
            times = (double)(clock() - _start)/CLOCKS_PER_SEC;
            printf("Quick sort search requires: %f ",times);

            _start = clock();
            find_Kmin_num_2(b,LEN,K);
            times = (double)(clock() - _start)/CLOCKS_PER_SEC;
            printf("Fast selection search requires: %f ",times);

            _start = clock();
            find_Kmin_num_3(c,LEN,K);
            times = (double)(clock() - _start)/CLOCKS_PER_SEC;
            printf("The maximum heap search requires: %f ",times);

            _start = clock();
            find_Kmin_num_4(d,LEN,K);
            times = (double)(clock() - _start)/CLOCKS_PER_SEC;
            printf("Minimum heap search requires: %f ",times);

            return 0;
    }

Performance of the detection algorithm:

 

    [gong@Gong-Computer interview]$ gcc -g minKnum.c -o minKnum
    [gong@Gong-Computer interview]$ ./minKnum
    Fast sort search requires: 0.130000
    Fast selection search requires: 0.020000
    Max heap search requires: 0.000000
    Min heap search requires: 0.010000

From the results, we can see that the quick sort algorithm has the worst effect, while the maximum heap has the best effect, and the minimum heap has the second best effect, but the maximum heap uses additional memory space. Therefore, under the condition of memory space limitation, it is more appropriate to consider the minimum heap. However, the idea of ​​the maximum heap is indeed very clever, using properties similar to bucket sorting.
 
In order to show whether the algorithm can achieve the search of the first K minimum values, change the array size to 50, print the completion status of each method, and search for the first 10 data. The experimental results are as follows:

 


    4 11 1 8 0 2 1 4 8 8 9 11 48 27 0 1 1 2 4 4 8 8 9 11 48 27 0 1 1 2 4 4 8 8 9 11 48 27 0 1 1 2 4 4 8 8 9 11 48 27 0 1 1 2 4 4 8 8 9 11 48 27 0 1 1 2 4 4 8 8 9 11 48 27 0 1 1 2 4 4 8 8 9 11 48 27         0
    1 1 2 4 4 8 8 9 11 48 27 0 1 1         2 4 4 8 8 9 11 48 27 0 1 1 2    
    4 4 8 8 9 11 48 27 0 1 1     2     4 4 8 8 9 11 48 27 0 1 1 2 0         Maximum heap search requires: 0.000000     0 1 1 2 4 4 8 8 9 11         Minimum heap search requires: 0.000000





From the above experimental results, we can see that all four methods can achieve obtaining the first K smallest elements.

Reference address:Several algorithms for finding the first K minimum values ​​of an array

Previous article:Application of accompanying array and counting sort
Next article:C implementation of three operations of set intersection and union

Latest Microcontroller Articles
  • Download from the Internet--ARM Getting Started Notes
    A brief introduction: From today on, the ARM notebook of the rookie is open, and it can be regarded as a place to store these notes. Why publish it? Maybe you are interested in it. In fact, the reason for these notes is ...
  • Learn ARM development(22)
    Turning off and on interrupts Interrupts are an efficient dialogue mechanism, but sometimes you don't want to interrupt the program while it is running. For example, when you are printing something, the program suddenly interrupts and another ...
  • Learn ARM development(21)
    First, declare the task pointer, because it will be used later. Task pointer volatile TASK_TCB* volatile g_pCurrentTask = NULL;volatile TASK_TCB* vol ...
  • Learn ARM development(20)
    With the previous Tick interrupt, the basic task switching conditions are ready. However, this "easterly" is also difficult to understand. Only through continuous practice can we understand it. ...
  • Learn ARM development(19)
    After many days of hard work, I finally got the interrupt working. But in order to allow RTOS to use timer interrupts, what kind of interrupts can be implemented in S3C44B0? There are two methods in S3C44B0. ...
  • Learn ARM development(14)
  • Learn ARM development(15)
  • Learn ARM development(16)
  • Learn ARM development(17)
Change More Related Popular Components

EEWorld
subscription
account

EEWorld
service
account

Automotive
development
circle

About Us Customer Service Contact Information Datasheet Sitemap LatestNews


Room 1530, 15th Floor, Building B, No.18 Zhongguancun Street, Haidian District, Beijing, Postal Code: 100190 China Telephone: 008610 8235 0740

Copyright © 2005-2024 EEWORLD.com.cn, Inc. All rights reserved 京ICP证060456号 京ICP备10001474号-1 电信业务审批[2006]字第258号函 京公网安备 11010802033920号