Custom Data Structure for File MD5 List
-
Hi, I'm looking for a little advice on a custom data structure I need to store file MD5s and the files that have those MD5s. I need to be able to look up an MD5 quickly and see which files have it.
My current approach is to create a QVector of std::pairs, where the first field is the MD5 as a QByteArray and the second field is a QLinkedList of files, represented as QStrings of their paths. Starting with no existing QVector, I would first add all the MD5/file path pairs to the QVector (without filling in the linked lists yet, since there's no way to search quickly and add to them until things are sorted), then do an std::sort, then iterate through the QVector, building a QLinkedList of all the file paths that share an MD5 and removing the excess QVector elements for those duplicates. After that initial construction I can look up MD5s and their file paths quickly, add file paths to an existing MD5's linked list, and add new MD5s with their file path lists, sorting again after new MD5 additions.
I was thinking a QMap might work well, except lookup would be slow, O(n) IIRC.
There will be roughly 50,000 files in total per MD5 list and maybe 30,000 unique MD5s.
Can anybody think of a better way to handle this situation or to improve on it?
Thanks!
-
Hi,
What about QMultiHash?
-
QVector in recent Qt versions is just a different name for QList.
Have you tried QHash? Or even QMultiHash?
Compared to QMap, the lookup might be faster the more data you store (QMap lookup is O(log n) rather than O(n), while QHash lookup is O(1) on average), and since the ordering of your data should not matter (as far as I understand), because you map it with your string anyway, it is worth considering.
Damn, @SGaist had the same idea but expressed it in a more "compact" way, though faster :-)