d2jsp
Log InRegister
d2jsp Forums > Off-Topic > Computers & IT > Programming & Development > Which Container Provides The Quickest Search?
Add Reply New Topic New Poll
Member
Posts: 1,358
Joined: Dec 30 2012
Gold: 0.10
Nov 18 2014 03:42am
basically what i'm going to be doing is storing a list of url md5 checksums in a container of some sort.

im trying to figure out which container i could iterate over the fastest to find a matching checksum. the container will be searched over quiet frequently and will potentially grow quiet large (probably >= 100k)

the container doesn't need to provide any special operations because ill only be inserting and searching through it.

any input on which container would be the best to use? :)

Member
Posts: 4,372
Joined: Jul 2 2009
Gold: 3,630.00
Nov 18 2014 07:14am
You might want to check bloom filters ??
There are C++ implementations of it :)
Member
Posts: 32,925
Joined: Jul 23 2006
Gold: 3,804.50
Nov 18 2014 08:38am
is there a reason you dont wanna use a simple hashtable/dictionary?
Member
Posts: 1,358
Joined: Dec 30 2012
Gold: 0.10
Nov 18 2014 03:51pm
Quote (Futu_Re @ Nov 18 2014 05:14am)
You might want to check bloom filters ??
There are C++ implementations of it :)


Thanks! I looked into it and it seems using a bloom filter might be a good idea. However, i don't like the idea of having to hash data 5-10+ times in order to reduce the chance of running into clashes..

Quote (carteblanche @ Nov 18 2014 06:38am)
is there a reason you dont wanna use a simple hashtable/dictionary?


Not really. And are you referring to something like std::map/std::unordered_map?

I have considered it.. but wasn't sure if there was a better option.

I was hoping to get directed towards a few different containers which would be optimal for what im doing, then look into those and decide which to use.

This post was edited by SelfTaught on Nov 18 2014 03:52pm
Member
Posts: 32,925
Joined: Jul 23 2006
Gold: 3,804.50
Nov 18 2014 04:39pm
Quote (SelfTaught @ Nov 18 2014 04:51pm)


Not really. And are you referring to something like std::map/std::unordered_map?

I have considered it.. but wasn't sure if there was a better option.

I was hoping to get directed towards a few different containers which would be optimal for what im doing, then look into those and decide which to use.


i assume std::map is a hashtable, but i'm not a c/c++ guy. hashtables offer O(1) performance given a key, which is what you want. i can only think of two scenarios where you can find something faster: 1) if you can exploit the data/requirements to custom make a datastructure (ex: bitvector) or 2) the data you have won't fit in memory since there's so much of it.

if you're writing an app where every millisecond matters, you might have to create your own datastructure instead of using a standard one. standard ones often have a bit of bloatness to it (eg: thread safety) which you can remove to make it faster for your use. it might also add a bit of bloatness for things like removal or key traversal which you dont need. you might also want to override the default hash function based on your data so it performs faster.

if memory becomes important, you can do tricks like not storing the entire checksum, but only a piece of it. eg: first N digits. if your hashtable doesnt find it, then it's clearly not there (eg: no false negatives). if your hashtable does find it, maybe it's there, maybe it's not (false positive), and you can execute another piece of code that takes longer to run (eg: check the disk). you can run some experiments and see if it's useful or not.
Member
Posts: 1,358
Joined: Dec 30 2012
Gold: 0.10
Nov 20 2014 04:34pm
I just went with std::map, thanks for the input
Go Back To Programming & Development Topic List
Add Reply New Topic New Poll