
[SOLVED] Need help with regexp for Kanji

    vsorokin
    wrote on 8 Aug 2011, 14:35 last edited by
    #1

    I need to check whether a string consists of Kanji characters. Can anybody help me build a regexp for this?

    Thanks.

    --
    Vasiliy

      takumiasaki
      wrote on 8 Aug 2011, 16:25 last edited by
      #2

      "Unicode Chapter 12":http://www.unicode.org/versions/Unicode5.0.0/ch12.pdf will help you a lot.

      |CJK Unified Ideographs|4E00–9FFF|Common|
      |CJK Unified Ideographs Extension A|3400–4DBF|Rare|
      |CJK Unified Ideographs Extension B|20000–2A6DF|Rare, historic|
      |CJK Unified Ideographs Extension C|2A700–2B73F|Rare, historic|
      |CJK Unified Ideographs Extension D|2B740–2B81F|Uncommon, some in current use|
      |CJK Compatibility Ideographs|F900–FAFF|Duplicates, unifiable variants, corporate characters|
      |CJK Compatibility Ideographs Supplement|2F800–2FA1F|Unifiable variants|

      So the ranges of Kanji (Han) are, very roughly, U+3400-U+9FFF, U+F900-U+FAFF, and U+20000-U+2FFFF.
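As a Qt-independent illustration of those rough ranges, here is a minimal sketch in plain C++ that tests code points directly. The names `isHanCodePoint` and `isAllHan` are hypothetical helpers made up for this example, and the ranges are the approximate ones quoted above, not an exact list of assigned Han characters.

```cpp
#include <string>

// Hypothetical helper: true if cp lies in one of the rough Han ranges
// quoted above (U+3400-U+9FFF, U+F900-U+FAFF, U+20000-U+2FFFF).
bool isHanCodePoint(char32_t cp) {
    return (cp >= 0x3400 && cp <= 0x9FFF)
        || (cp >= 0xF900 && cp <= 0xFAFF)
        || (cp >= 0x20000 && cp <= 0x2FFFF);
}

// Hypothetical helper: true if s is non-empty and every code point is Han.
bool isAllHan(const std::u32string& s) {
    if (s.empty()) return false;
    for (char32_t cp : s) {
        if (!isHanCodePoint(cp)) return false;
    }
    return true;
}
```

Working on UTF-32 code points keeps the comparison simple; a UTF-16 string would first have to be decoded, because characters above U+FFFF occupy two code units.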

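      QRegExp:
      @
      #include <QRegExp>

      // QRegExp's \xhhhh escape matches the character with that hex code (0x0000-0xFFFF).
      // The backslashes are doubled so that they survive the C++ string literal.
      QRegExp isHan("([\\x3400-\\x9FFF\\xF900-\\xFAFF]|[\\xD840-\\xD87F][\\xDC00-\\xDFFF])+");
      @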
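The second alternative in the pattern matches a UTF-16 surrogate pair, because QString stores characters above U+FFFF as two 16-bit units. The following sketch shows the standard UTF-16 arithmetic behind the bounds D840-D87F and DC00-DFFF; `toSurrogates` is a hypothetical helper written for this illustration, not a Qt function.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: encode a supplementary code point
// (U+10000..U+10FFFF) as a UTF-16 {high, low} surrogate pair.
std::pair<char16_t, char16_t> toSurrogates(char32_t cp) {
    const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;
    return { char16_t(0xD800 + (v >> 10)),     // top 10 bits -> high surrogate
             char16_t(0xDC00 + (v & 0x3FF)) }; // low 10 bits -> low surrogate
}
```

With this encoding U+20000 becomes D840 DC00 and U+2FFFF becomes D87F DFFF, so a high surrogate in D840-D87F followed by any low surrogate covers exactly U+20000-U+2FFFF.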

      Note: this regexp (isHan) does not cover CJK Symbols and Punctuation (U+3000 - U+303F), Hiragana (U+3041 - U+309F), or Katakana (U+30A0 - U+30FF).

      • "CJK Symbols and Punctuation":http://www.unicode.org/charts/PDF/U3000.pdf
      • "Hiragana":http://www.unicode.org/charts/PDF/U3040.pdf
      • "Katakana":http://www.unicode.org/charts/PDF/U30A0.pdf

      If you want to match those characters as well, add their ranges to the regexp.
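One way to picture that extension, sketched in plain C++ rather than QRegExp so it can be compiled without Qt: keep the ranges in a table and append the three extra blocks. `Range`, `kJapaneseRanges`, and `isJapaneseText` are hypothetical names chosen for this example, and the Han entries are the rough ranges from this post.

```cpp
#include <string>

struct Range { char32_t first, last; };

// Rough Han ranges from the post, plus the three blocks it leaves out.
constexpr Range kJapaneseRanges[] = {
    {0x3000, 0x303F},    // CJK Symbols and Punctuation
    {0x3041, 0x309F},    // Hiragana
    {0x30A0, 0x30FF},    // Katakana
    {0x3400, 0x9FFF},    // Extension A through CJK Unified Ideographs (rough)
    {0xF900, 0xFAFF},    // CJK Compatibility Ideographs
    {0x20000, 0x2FFFF},  // Supplementary ideographs (rough)
};

// Hypothetical helper: true if s is non-empty and every code point
// falls inside one of the ranges in the table.
bool isJapaneseText(const std::u32string& s) {
    if (s.empty()) return false;
    for (char32_t cp : s) {
        bool found = false;
        for (const Range& r : kJapaneseRanges) {
            if (cp >= r.first && cp <= r.last) { found = true; break; }
        }
        if (!found) return false;
    }
    return true;
}
```

In the regexp itself the equivalent change would be to add the three extra ranges inside the first character class, using the same escape style as the existing ones.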

        vsorokin
        wrote on 8 Aug 2011, 17:00 last edited by
        #3

        Thank you for the fast and good answer.

        --
        Vasiliy
