提问者:小点点

为什么对vector vs unordered_map的基准测试查找在每次运行中产生不一致的结果?


我遇到了一个类似的问题:Timed vector vs map vs unordered_map查找

但是我的情况只是在上的小范围元素(0-100,大部分是0-20)。所以我改了author@gitgregor的代码:

#include <iostream>
#include <vector>
#include <map>
#include <unordered_map>
#include <chrono>
#include <algorithm>

unsigned dummy = 0;

int main()
{
    std::vector<unsigned> v;
    std::unordered_map<unsigned, unsigned> u;

    unsigned elementCount = 1;

    struct Times
    {
        unsigned long long v;
        unsigned long long u;
    };
    std::map<unsigned, Times> timesMap;

    while (elementCount != 100)
    {
        v.clear();
        u.clear();

        elementCount *= 10;
        std::vector<unsigned int> tmp;
        tmp.reserve(elementCount);
        for (unsigned i = 0; i < elementCount; ++i)
        {
            tmp.push_back(std::rand()%50000);
        }
        // fill vector and unmap with random numbers
        for (const auto integer : tmp)
        {
            v.emplace_back(integer);
            u.insert(std::make_pair(integer, 1));
        }
        // fill a testset with 10000 random numbers to test lookup
        std::vector<unsigned> tmp2;
        tmp2.reserve(10000);
        for (int i = 0; i < tmp2.size(); i++)
        {
            tmp2.push_back(std::rand()%50000);
        }

        std::chrono::time_point<std::chrono::steady_clock> start = std::chrono::high_resolution_clock::now();
        for (const auto integer : tmp2)
        {
            auto findItr = std::find(std::begin(v), std::end(v), integer);
            if (findItr != v.end())
            {
                dummy++;
            }
        }
        auto tp0 = std::chrono::high_resolution_clock::now() - start;
        unsigned long long vTime = std::chrono::duration_cast<std::chrono::nanoseconds>(tp0).count();

        start = std::chrono::high_resolution_clock::now();
        for (const auto integer : tmp2)
        {
            const bool res = u[integer] == 0;
            if (res)
            {
                dummy++;
            }
        }
        auto tp1 = std::chrono::high_resolution_clock::now() - start;
        unsigned long long uTime = std::chrono::duration_cast<std::chrono::nanoseconds>(tp1).count();

        timesMap.insert(std::make_pair(elementCount, Times{ vTime, uTime }));
    }

    for (auto& itr : timesMap)
    {
        std::cout << "Element count: " << itr.first << std::endl;
        std::cout << "std::vector time:        " << itr.second.v << std::endl;
        std::cout << "std::unordered_map time: " << itr.second.u << std::endl;
        std::cout << "-----------------------------------" << std::endl;
    }

    std::cout << dummy;
}

我关闭了优化,让随机数来填充,并使用10000个随机数的数字集来测试查找。但结果一点也不一致:

First run:
Element count: 10
std::vector time:        100
std::unordered_map time: 100
-----------------------------------
Element count: 100
std::vector time:        0
std::unordered_map time: 100
-----------------------------------

Second Run:
Element count: 10
std::vector time:        200
std::unordered_map time: 200
-----------------------------------
Element count: 100
std::vector time:        100
std::unordered_map time: 100
-----------------------------------

Third Run:
Element count: 10
std::vector time:        100
std::unordered_map time: 0
-----------------------------------
Element count: 100
std::vector time:        100
std::unordered_map time: 0
-----------------------------------

结果看起来也很奇怪,只有数字:0,100和200。

有人知道为什么吗?


共1个答案

匿名用户

我找到了代码不能正确测量的真正原因,因为您的循环运行了0次,因为tmp2的大小在开始时是0。因此,您测试的对象是来自tmp2的0个整数,因此您几乎只有0次。

我修改了您的代码来修复上面的问题,也修复了一些编译问题,还将整数(迭代)的数量设置为1000万而不是1万,我还计算了平均运行时间(以纳秒为单位)而不是总时间。下面是完整的更正代码。

在网上试试!

#include <iostream>
#include <vector>
#include <map>
#include <unordered_map>
#include <chrono>
#include <algorithm>
#include <cstdint>

volatile size_t dummy_total = 0;

int main()
{
    std::vector<unsigned> v;
    std::unordered_map<unsigned, unsigned> u;

    unsigned elementCount = 1;

    struct Times
    {
        double v = 0;
        double u = 0;
    };
    std::map<unsigned, Times> timesMap;

    while (elementCount != 100)
    {
        size_t dummy = 0;
        v.clear();
        u.clear();

        elementCount *= 10;
        std::vector<unsigned int> tmp;
        tmp.reserve(elementCount);
        for (unsigned i = 0; i < elementCount; ++i)
        {
            tmp.push_back(std::rand()%50000);
        }
        // fill vector and unmap with random numbers
        for (const auto integer : tmp)
        {
            v.emplace_back(integer);
            u.insert(std::make_pair(integer, 1));
        }
        // fill a testset with 10000 random numbers to test lookup
        std::vector<unsigned> tmp2(1000000);
        for (int i = 0; i < tmp2.size(); i++)
        {
            tmp2[i] = std::rand()%50000; 
        }

        std::chrono::time_point<std::chrono::high_resolution_clock> start = std::chrono::high_resolution_clock::now();
        for (const auto integer : tmp2)
        {
            auto findItr = std::find(std::begin(v), std::end(v), integer);
            if (findItr != v.end())
            {
                ++dummy;
            }
        }
        dummy_total = dummy_total + dummy;
        auto tp0 = std::chrono::high_resolution_clock::now() - start;
        double vTime = double(std::chrono::duration_cast<std::chrono::nanoseconds>(tp0).count()) / tmp2.size();

        start = std::chrono::high_resolution_clock::now();
        for (const auto integer : tmp2)
        {
            const bool res = u.count(integer) != 0;
            if (res)
            {
                ++dummy;
            }
        }
        dummy_total = dummy_total + dummy;
        auto tp1 = std::chrono::high_resolution_clock::now() - start;
        double uTime = double(std::chrono::duration_cast<std::chrono::nanoseconds>(tp1).count()) / tmp2.size();

        timesMap.insert(std::make_pair(elementCount, Times{ vTime, uTime }));
    }

    for (auto& itr : timesMap)
    {
        std::cout << "Element count: " << itr.first << std::endl;
        std::cout << "std::vector time:        " << itr.second.v << " ns" << std::endl;
        std::cout << "std::unordered_map time: " << itr.second.u << " ns" << std::endl;
        std::cout << "-----------------------------------" << std::endl;
    }

    std::cout << dummy_total;
}

输出:

Element count: 10
std::vector time:        4.42279 ns
std::unordered_map time: 14.1827 ns
-----------------------------------
Element count: 100
std::vector time:        23.784 ns
std::unordered_map time: 16.8989 ns
-----------------------------------
6738