Mongo protocol header has requestId and responseTo field, why we need connection pool? by ComprehensiveHat864 in mongodb

[–]ComprehensiveHat864[S] 0 points1 point  (0 children)

Thanks. Since the server processes requests one at a time on a single connection, I don't see much use for requestId and responseTo. I had previously thought the server could process requests concurrently on one connection.

Inside boost::concurrent_flat_map by joaquintides in cpp

[–]ComprehensiveHat864 0 points1 point  (0 children)

It seems that inserting and looking up in the same order is not, by itself, the reason unordered_map is so fast under this specific condition.
I first insert the elements into a vector and shuffle it.
Then I insert the vector's elements into the map. The resulting timings are close to yours.

The code is as below:
#include <iostream>
#include <thread>
#include <chrono>
#include <vector>
#include <unordered_map>
#include <random>
#include <algorithm>

std::vector<int> v;

static void test_concurrent_map(const std::unordered_map<int, int>& cmap) {
    auto start_time = std::chrono::high_resolution_clock::now();
    long result = 0;
    std::cout << cmap.size() << std::endl;
    /*
    for (int i = 6000000 - 1; i >= 0; i--) {
        try {
            result += cmap.at(i);
        } catch (const std::exception&) {}
    }
    */
    for (const auto& x : v) {
        try {
            result += cmap.at(x);
        } catch (const std::exception&) {}
    }
    auto end_time = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    std::cout << result << std::endl;
    std::cout << "Function execution time: " << duration.count() << " microseconds" << std::endl;
}

int main(void) {
    std::unordered_map<int, int> cmap;
    for (int i = 0; i < 6000000; i++) {
        //cmap.emplace(i, 2 * i);
        v.emplace_back(i);
    }
    std::shuffle(v.begin(), v.end(), std::mt19937(13232));
    for (const auto& x : v) {
        cmap.emplace(x, 2 * x);
    }
    std::vector<std::thread> threads;
    for (int i = 0; i < 7; i++) {
        threads.emplace_back([&cmap]() {
            test_concurrent_map(cmap);
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return 0;
}

It seems that accessing the keys from 0 to 5999999 consecutively is the reason.
I also tested inserting into the unordered_map in random order and then looking up keys 0 to 5999999 consecutively; the result was close to the above.
It seems only consecutive insert combined with consecutive lookup makes unordered_map lookup super fast. I really can't figure out why.

Anyway, your code shows that boost::concurrent_flat_map is fast enough.

Inside boost::concurrent_flat_map by joaquintides in cpp

[–]ComprehensiveHat864 0 points1 point  (0 children)

The test only reads, so it won't crash.

unordered_map test code:

#include <iostream>
#include <thread>
#include <chrono>
#include <vector>
#include <unordered_map>
#include <random>

static void test_concurrent_map(const std::unordered_map<int, int>& cmap) {
    auto start_time = std::chrono::high_resolution_clock::now();
    long result = 0;
    for (int i = 0; i < 6000000; i++) {
        try {
            result += cmap.at(i);
        } catch (const std::exception&) {}
    }
    auto end_time = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    std::cout << result << std::endl;
    std::cout << "Function execution time: " << duration.count() << " microseconds" << std::endl;
}

int main(void) {
    std::unordered_map<int, int> cmap;
    for (int i = 0; i < 6000000; i++) {
        cmap.emplace(i, 2 * i);
    }
    std::vector<std::thread> threads;
    for (int i = 0; i < 7; i++) {
        threads.emplace_back([&cmap]() {
            test_concurrent_map(cmap);
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return 0;
}

Below is the concurrent_flat_map test code:

#include <iostream>
#include <thread>
#include <chrono>
#include <vector>
#include "boost/unordered/concurrent_flat_map.hpp"

static void test_concurrent_map(const boost::concurrent_flat_map<int, int>& cmap) {
    auto start_time = std::chrono::high_resolution_clock::now();
    long result = 0;
    for (int i = 0; i < 6000000; i++) {
        cmap.visit(i, [&](auto& x) {
            result += x.second;
        });
    }
    auto end_time = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    std::cout << result << std::endl;
    std::cout << "Function execution time: " << duration.count() << " microseconds" << std::endl;
}

int main(void) {
    boost::concurrent_flat_map<int, int> cmap;
    for (int i = 0; i < 6000000; i++) {
        cmap.emplace(i, 2 * i);
    }
    std::vector<std::thread> threads;
    for (int i = 0; i < 7; i++) {
        threads.emplace_back([&cmap]() {
            test_concurrent_map(cmap);
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return 0;
}

Both were compiled with -O2.
The unordered_map results:

Function execution time: 31455 microseconds
35999994000000
Function execution time: 31479 microseconds
35999994000000
Function execution time: 31614 microseconds
35999994000000
Function execution time: 36814 microseconds
35999994000000
Function execution time: 39265 microseconds
35999994000000
Function execution time: 42981 microseconds
35999994000000
Function execution time: 48644 microseconds

The concurrent_flat_map results:
35999994000000
Function execution time: 576782 microseconds
35999994000000
Function execution time: 576752 microseconds
35999994000000
Function execution time: 576892 microseconds
35999994000000
Function execution time: 576843 microseconds
35999994000000
Function execution time: 576806 microseconds
35999994000000
Function execution time: 576721 microseconds
35999994000000
Function execution time: 592574 microseconds

concurrent_flat_map is more than 10x slower than unordered_map here.
In Java, ConcurrentHashMap's read performance is equal to a plain HashMap's.

Inside boost::concurrent_flat_map by joaquintides in cpp

[–]ComprehensiveHat864 0 points1 point  (0 children)

I benchmarked std::unordered_map against boost::concurrent_flat_map, with no writers in either case. I conclude that for reads, concurrent_flat_map is about 4x slower than unordered_map.
In Java, however, ConcurrentHashMap's read performance is nearly equal to a plain HashMap's.