13 lipca 2016

Kontenery STL a pamięć

Pod jednym z ostatnich moich wpisów wywiązała się ciekawa dyskusja o klasach std::set oraz std::unordered_set. W związku z tym postanowiłem, by w tym wpisie zbadać zapotrzebowanie pamięciowe powyższych dwóch klas.

Aby móc śledzić alokacje dokonywane przez kontener STL, napisałem alokator pamięci rr_debug_allocator. Klasa ta dziedziczy po std::allocator< T > i jedynym jej zadaniem jest wypisywanie na konsole zajmowanych i zwalnianych obszarów pamięci.

Jako punkt odniesienia przedstawiam zapotrzebowanie na pamięć dla kontenerów std::vector oraz std::list.

STL Custom Allocator

Implementacja alokatora rr_debug_allocator.

#pragma once

#include <cstdio>
#include <memory>

template < typename T >
class rr_debug_allocator: public std::allocator< T >
{
public:
    using size_type = size_t;
    using pointer = T*;
    using const_pointer = const T*;

    template< typename U >
    struct rebind { using other = rr_debug_allocator<U>; };

    pointer allocate(size_type n, const void *hint=0)
    {
        pointer result = std::allocator<T>::allocate(n, hint);
        const size_t alloc_size = n*sizeof(T);
        printf("  >>  [%d] bytes allocated at [%p] for [%d] x [%d]\n",alloc_size ,result,n,sizeof(T));
        return result;
    }

    void deallocate(pointer p, size_type n)
    {
        printf("  >>  [%d] bytes deallocated at [%p] for [%d] x [%d]\n", n*sizeof(T), p, n,sizeof(T));
        return std::allocator<T>::deallocate(p, n);
    }

    rr_debug_allocator() noexcept: std::allocator<T>() {
        //printf("  >>>>  rr_debug_allocator Constructor\n");
    }

    rr_debug_allocator(const rr_debug_allocator &a) noexcept: std::allocator<T>(a) { }

    template <class U>
    rr_debug_allocator(const rr_debug_allocator<U> &a) noexcept: std::allocator<T>(a) { }

    ~rr_debug_allocator() noexcept {
        //printf("  >>>>  ~rr_debug_allocator Destructor\n");
    }
};

Logujemy zajmowanie i zwalnianie pamięci. Nie logujemy wywołań konstruktora kopiującego, konstruktora standardowego oraz destruktora.

Aplikacja testowa

#include "custom_alocator.h"

#include <vector>
#include <list>
#include <set>
#include <unordered_set>

using element_t     = int;
using allocator_t   = rr_debug_allocator< element_t >;

constexpr int ELEMENTS_COUNT = 4;

void test_vector()
{
    printf("test_vector\n");
    std::vector < element_t, allocator_t >  container;
    for( int i=0;i<ELEMENTS_COUNT; ++i)
        container.push_back(i);
}

void test_list()
{
    printf("test_list\n");
    std::list < element_t, allocator_t >  container;
    for( int i=0;i<ELEMENTS_COUNT; ++i)
        container.push_back(i);
}

void test_set()
{
    printf("test_set\n");
    std::set < element_t, std::less<element_t>, allocator_t >  container;
    for( int i=0;i<ELEMENTS_COUNT; ++i)
        container.insert(i);
}

void test_unordered_set()
{
    printf("test_unordered_set\n");
    std::unordered_set < element_t, std::hash<element_t>, std::equal_to<element_t>, allocator_t >  container;
    for( int i=0;i<ELEMENTS_COUNT; ++i)
        container.insert(i);
}

int main() {
    printf("sizeof(void*) = %d\n==============\n",sizeof(void*));
    test_vector();
    printf("\n");
    test_list();
    printf("\n");
    test_set();
    printf("\n");
    test_unordered_set();
    printf("\n");

    return 0;
}

Analiza otrzymanych wyników

Logi dla trybu 32 bit

sizeof(void*) = 4
==============
test_vector
  >>  [4] bytes allocated at [00fba190] for [1] x [4]
  >>  [8] bytes allocated at [00fba000] for [2] x [4]
  >>  [4] bytes deallocated at [00fba190] for [1] x [4]
  >>  [16] bytes allocated at [00fb6e38] for [4] x [4]
  >>  [8] bytes deallocated at [00fba000] for [2] x [4]
  >>  [16] bytes deallocated at [00fb6e38] for [4] x [4]
 
test_list
  >>  [12] bytes allocated at [00fb6e38] for [1] x [12]
  >>  [12] bytes allocated at [00fb0568] for [1] x [12]
  >>  [12] bytes allocated at [00fb0580] for [1] x [12]
  >>  [12] bytes allocated at [00fba648] for [1] x [12]
  >>  [12] bytes deallocated at [00fb6e38] for [1] x [12]
  >>  [12] bytes deallocated at [00fb0568] for [1] x [12]
  >>  [12] bytes deallocated at [00fb0580] for [1] x [12]
  >>  [12] bytes deallocated at [00fba648] for [1] x [12]
 
test_set
  >>  [20] bytes allocated at [00fb0568] for [1] x [20]
  >>  [20] bytes allocated at [00fb0588] for [1] x [20]
  >>  [20] bytes allocated at [00fba648] for [1] x [20]
  >>  [20] bytes allocated at [00fba668] for [1] x [20]
  >>  [20] bytes deallocated at [00fba668] for [1] x [20]
  >>  [20] bytes deallocated at [00fba648] for [1] x [20]
  >>  [20] bytes deallocated at [00fb0588] for [1] x [20]
  >>  [20] bytes deallocated at [00fb0568] for [1] x [20]
 
test_unordered_set
  >>  [8] bytes allocated at [00fba0c0] for [1] x [8]
  >>  [8] bytes allocated at [00fba0d0] for [2] x [4]
  >>  [8] bytes allocated at [00fba100] for [1] x [8]
  >>  [20] bytes allocated at [00fb0568] for [5] x [4]
  >>  [8] bytes deallocated at [00fba0d0] for [2] x [4]
  >>  [8] bytes allocated at [00fba000] for [1] x [8]
  >>  [8] bytes allocated at [00fba030] for [1] x [8]
  >>  [8] bytes deallocated at [00fba030] for [1] x [8]
  >>  [8] bytes deallocated at [00fba000] for [1] x [8]
  >>  [8] bytes deallocated at [00fba100] for [1] x [8]
  >>  [8] bytes deallocated at [00fba0c0] for [1] x [8]
  >>  [20] bytes deallocated at [00fb0568] for [5] x [4]

Logi dla trybu 64 bit

sizeof(void*) = 8
==============
test_vector
  >>  [4] bytes allocated at [0000000000b514e0] for [1] x [4]
  >>  [8] bytes allocated at [0000000000b51500] for [2] x [4]
  >>  [4] bytes deallocated at [0000000000b514e0] for [1] x [4]
  >>  [16] bytes allocated at [0000000000b514e0] for [4] x [4]
  >>  [8] bytes deallocated at [0000000000b51500] for [2] x [4]
  >>  [16] bytes deallocated at [0000000000b514e0] for [4] x [4]

test_list
  >>  [24] bytes allocated at [0000000000b514e0] for [1] x [24]
  >>  [24] bytes allocated at [0000000000b51500] for [1] x [24]
  >>  [24] bytes allocated at [0000000000b51520] for [1] x [24]
  >>  [24] bytes allocated at [0000000000b51540] for [1] x [24]
  >>  [24] bytes deallocated at [0000000000b514e0] for [1] x [24]
  >>  [24] bytes deallocated at [0000000000b51500] for [1] x [24]
  >>  [24] bytes deallocated at [0000000000b51520] for [1] x [24]
  >>  [24] bytes deallocated at [0000000000b51540] for [1] x [24]

test_set
  >>  [40] bytes allocated at [0000000000b514e0] for [1] x [40]
  >>  [40] bytes allocated at [0000000000b51510] for [1] x [40]
  >>  [40] bytes allocated at [0000000000b51540] for [1] x [40]
  >>  [40] bytes allocated at [0000000000b51570] for [1] x [40]
  >>  [40] bytes deallocated at [0000000000b51570] for [1] x [40]
  >>  [40] bytes deallocated at [0000000000b51540] for [1] x [40]
  >>  [40] bytes deallocated at [0000000000b51510] for [1] x [40]
  >>  [40] bytes deallocated at [0000000000b514e0] for [1] x [40]

test_unordered_set
  >>  [16] bytes allocated at [0000000000b514e0] for [1] x [16]
  >>  [16] bytes allocated at [0000000000b51500] for [2] x [8]
  >>  [16] bytes allocated at [0000000000b51520] for [1] x [16]
  >>  [40] bytes allocated at [0000000000b51540] for [5] x [8]
  >>  [16] bytes deallocated at [0000000000b51500] for [2] x [8]
  >>  [16] bytes allocated at [0000000000b51500] for [1] x [16]
  >>  [16] bytes allocated at [0000000000b51570] for [1] x [16]
  >>  [16] bytes deallocated at [0000000000b51570] for [1] x [16]
  >>  [16] bytes deallocated at [0000000000b51500] for [1] x [16]
  >>  [16] bytes deallocated at [0000000000b51520] for [1] x [16]
  >>  [16] bytes deallocated at [0000000000b514e0] for [1] x [16]
  >>  [40] bytes deallocated at [0000000000b51540] for [5] x [8]

Podsumowanie

Przeprowadzony eksperyment spełniał następujące warunki:

  • mamy N elementów do wstawiania.
  • kontenery były tworzone z domyślnymi ustawieniami.
  • kompilator MinGW-w64 (w64 4.0).

std::vector

Element w kontenerze zajmuje dokładnie tyle samo pamięci co element poza kontenerem. W najgorszym przypadku całkowita pamięć zaalokowana przez kontener będzie równa

std::list

Element w kontenerze potrzebuje dwóch wskaźników na kolejny i poprzedni węzeł listy.

std::set

Dla każdego elementu kontener tworzy nowy węzeł drzewa. Każdy taki węzeł potrzebuje przechować dwa wskaźniki, przekazaną wartość oraz pewne wartości zależne od implementacji (głębokość obydwu poddrzew potrzebna do balansowania drzewem)

std::unordered_set

Wewnętrzna implementacja kontenera std::unordered_set, w mojej wersji gcc, to połączenie klasy vector i list. Każdy element to węzeł listy jednokierunkowej, z tą jednak różnicą, że zamiast jednej listy mamy pewną ilość list K. Wartość K jest zależna od ilości elementów w kontenerze, podobnie jak ma to miejsce w klasie vector. Wraz z dodawaniem nowych elementów rozmiar K jest zwiększany. Przybliżoną wartość dla najgorszego przypadku możemy policzyć następująco.