Custom Data Structure for File MD5 List

    Crag_Hack
    wrote last edited by Crag_Hack
    #1

    Hi, I'm looking for advice on a custom data structure to store file MD5s and the files that have those MD5s. I need to be able to look up an MD5 quickly and see which files have it.

    My current approach is a QVector of std::pairs, where the first field is the MD5 as a QByteArray and the second field is a QLinkedList of file paths stored as QStrings. Starting with no existing QVector, I would first add all the MD5/file path pairs to the QVector, leaving the linked lists unfilled since there's no way to search quickly and add to them until things are sorted. Then I would do an std::sort, iterate through the QVector to build a QLinkedList of all the file paths that share each MD5, and remove the excess QVector elements for file paths with the same MD5. After that initial construction I can look up MD5s and file paths quickly, add file paths to existing MD5 linked lists, and add new MD5s with their file path lists, sorting again after new MD5 additions.
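The sort-then-group construction described above could look roughly like this. It is only a sketch, written against the C++ standard library so it compiles without Qt: std::string stands in for QByteArray and QString, and std::vector stands in for both QVector and QLinkedList. The names Md5Group, buildIndex, and findPaths are hypothetical and not from the original post.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One entry per unique digest: the digest plus every path that hashes to it.
using Md5Group = std::pair<std::string, std::vector<std::string>>;

// Build the sorted, grouped index from flat (md5, path) pairs.
std::vector<Md5Group> buildIndex(std::vector<std::pair<std::string, std::string>> flat)
{
    // Sorting by (md5, path) places equal digests next to each other.
    std::sort(flat.begin(), flat.end());

    std::vector<Md5Group> index;
    for (auto &entry : flat) {
        assert(!entry.first.empty()); // a digest should never be empty
        // Start a new group whenever the digest changes.
        if (index.empty() || index.back().first != entry.first)
            index.emplace_back(entry.first, std::vector<std::string>{});
        index.back().second.push_back(std::move(entry.second));
    }
    return index;
}

// Binary search for a digest; returns nullptr when it is not present.
const std::vector<std::string> *findPaths(const std::vector<Md5Group> &index,
                                          const std::string &md5)
{
    auto it = std::lower_bound(index.begin(), index.end(), md5,
                               [](const Md5Group &group, const std::string &key) {
                                   return group.first < key;
                               });
    if (it == index.end() || it->first != md5)
        return nullptr;
    return &it->second;
}
```

With this layout a lookup is a binary search, so it takes logarithmic time. Adding a file to an existing digest is cheap, but adding a new digest means inserting at the sorted position or sorting again, which is the cost the post mentions.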

    I was thinking a QMap might work well, except lookup would be slowish, O(log n) IIRC.

    There will be roughly 50,000 files in total per MD5 list, with maybe 30,000 unique MD5s.
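On the QMap concern: the Qt documentation lists QMap lookups as logarithmic, not linear, so with about 30,000 unique MD5s a lookup should need on the order of log2(30000) ≈ 15 key comparisons. The sketch below tries to make that visible by counting comparisons in a std::map, used here as a standard-library stand-in for QMap (an assumption made so the example compiles without Qt). CountingLess and comparisonsForOneLookup are hypothetical names, and the exact count depends on the tree's shape.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Comparator that counts how often the map compares two keys.
struct CountingLess {
    std::size_t *counter;
    bool operator()(const std::string &a, const std::string &b) const
    {
        ++*counter;
        return a < b;
    }
};

// Fill an ordered map with `uniqueKeys` fake digests, look one of them up,
// and return the number of key comparisons that single lookup needed.
std::size_t comparisonsForOneLookup(std::size_t uniqueKeys)
{
    assert(uniqueKeys > 0);
    std::size_t comparisons = 0;
    std::map<std::string, std::vector<std::string>, CountingLess> index(
        CountingLess{&comparisons});

    for (std::size_t i = 0; i < uniqueKeys; ++i)
        index["md5-" + std::to_string(i)].push_back("/path/" + std::to_string(i));

    comparisons = 0; // ignore the comparisons made while inserting
    const bool found = index.find("md5-" + std::to_string(uniqueKeys / 2)) != index.end();
    assert(found);
    (void)found; // silence unused-variable warnings when asserts are disabled
    return comparisons;
}
```

Calling comparisonsForOneLookup(30000) should return a few dozen at most, far below the 30,000 that a linear scan could need.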

    Can anybody think of a better way to handle this situation or to improve on it?

    Thanks!

      SGaist
      Lifetime Qt Champion
      wrote last edited by
      #2

      Hi,

      What about QMultiHash?
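To illustrate the idea, here is a minimal sketch of a multi-hash that maps each MD5 to every file sharing it. It uses std::unordered_multimap as a standard-library analogue of QMultiHash so it compiles without Qt; with Qt the corresponding calls would be QMultiHash::insert() and QMultiHash::values(). Md5Index, addFile, and filesWithMd5 are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// digest -> path, with duplicate digests allowed (one entry per file).
using Md5Index = std::unordered_multimap<std::string, std::string>;

// Record that the file at `path` has digest `md5`.
void addFile(Md5Index &index, const std::string &md5, const std::string &path)
{
    assert(!md5.empty()); // a digest should never be empty
    index.emplace(md5, path);
}

// Collect every path stored under `md5`; empty when the digest is unknown.
std::vector<std::string> filesWithMd5(const Md5Index &index, const std::string &md5)
{
    std::vector<std::string> paths;
    auto range = index.equal_range(md5);
    for (auto it = range.first; it != range.second; ++it)
        paths.push_back(it->second);
    return paths;
}
```

No sorting or grouping pass is needed: files can be added in any order, and a lookup takes constant time on average. The order of the paths returned for one digest is unspecified.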

      Interested in AI ? www.idiap.ch
      Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

        Pl45m4
        wrote last edited by Pl45m4
        #3

        @Crag_Hack

        In recent Qt versions (Qt 6), QVector is just another name for QList.
        Have you tried QHash? Or even QMultiHash?
        Compared to QMap, lookup should get relatively faster the more data you store. Since the ordering of your data should not matter (as far as I understand), because you look entries up by key anyway, it is worth considering.
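As a rough illustration of the plain QHash variant, the sketch below keeps one entry per unique MD5 and stores the list of paths as the value, similar in spirit to a QHash<QByteArray, QStringList>. It uses std::unordered_map as a stand-in so it compiles without Qt. The Md5Table class and its member names are hypothetical, and reserving space up front is an optional assumption based on the roughly 30,000 unique MD5s mentioned in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Unordered table: one entry per unique digest, holding all matching paths.
class Md5Table {
public:
    // Reserve buckets up front to avoid rehashing while the table fills.
    explicit Md5Table(std::size_t expectedUniqueDigests)
    {
        m_paths.reserve(expectedUniqueDigests);
    }

    // Append `path` to the digest's list, creating the entry if needed.
    void addFile(const std::string &md5, const std::string &path)
    {
        assert(!md5.empty()); // a digest should never be empty
        m_paths[md5].push_back(path);
    }

    // Returns the paths for `md5`, or nullptr when the digest is unknown.
    const std::vector<std::string> *find(const std::string &md5) const
    {
        auto it = m_paths.find(md5);
        return it == m_paths.end() ? nullptr : &it->second;
    }

    std::size_t uniqueDigests() const { return m_paths.size(); }

private:
    std::unordered_map<std::string, std::vector<std::string>> m_paths;
};
```

The table gives up key ordering in exchange for average constant-time lookup and insertion, which matches the point above that ordering does not matter for this use case.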

        Damn, @SGaist had the same idea, expressed it in a more "compact" way, and was faster :-)


        If debugging is the process of removing software bugs, then programming must be the process of putting them in.

        ~E. W. Dijkstra

          Crag_Hack
          wrote last edited by
          #4

          @SGaist @Pl45m4 Excellent, thanks guys. Now to read up on hash tables.
