Parsing a Malaysian Identity Card

Friday, June 27, 2014

Regular expressions are often handy. As an example, I will parse a Malaysian identity card (MyKad) number, whose current format is:

YYMMDD-BP-nnnG

where

  • YYMMDD is the date of birth, following the ISO 8601:2000 format
  • BP is a two-digit code denoting the place of birth
  • nnnG is a randomly generated serial number; its last digit G encodes the gender:
    • odd for male
    • even for female
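
As a small illustration of the YYMMDD field, the sketch below turns the six digits into a date and rejects impossible dates. The helper name parse_birthdate is my own and not part of the original script. It also assumes Python's default handling of two-digit years (69 to 99 become 1969 to 1999, 00 to 68 become 2000 to 2068), which will be wrong for anyone born before 1969, so treat the century as a guess.

```python
from datetime import datetime

def parse_birthdate(yymmdd):
    """Return a date for a six-digit YYMMDD string, or None if it is not a real date.

    Hypothetical helper; the century is chosen by Python's %y rule, not by MyKad.
    """
    try:
        return datetime.strptime(yymmdd, "%y%m%d").date()
    except ValueError:
        return None

print(parse_birthdate("850521"))  # 1985-05-21
print(parse_birthdate("850230"))  # None (there is no 30 February)
```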

We could simply split the IC number on the '-' character with the string split method. Very often, however, the IC number is stored in a database with the dashes stripped out. A regular expression is more flexible, since it can parse both forms.
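
To make the limitation concrete, here is what split gives for the two forms. The variable names are only for illustration.

```python
ic_dashed = "850521-22-3454"
ic_plain = "850521223454"

# With dashes, split yields the three fields.
print(ic_dashed.split("-"))  # ['850521', '22', '3454']

# Without dashes there is nothing to split on, so the fields stay fused.
print(ic_plain.split("-"))   # ['850521223454']
```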

import re

pat = r"""
\b                      # word boundary
(?P<birthdate>\d{6})    # named group capture of birthdate, six digits
-?                      # optional -
(?P<birthplace>\d{2})   # named group, birthplace, 2 digits
-?                      # optional -
\d{3}                   # next 3 digits
(?P<gender>\d)          # capture last digit representing gender
\b                      # word boundary
"""

vpo = re.compile(pat, re.VERBOSE)

codes = [('01', '21', '22', '23', '24'), ('02', '25', '26', '27'), ('03', '28', '29'),
         ('04', '30'), ('05', '31', '59'), ('06', '32', '33'), ('07', '34', '35'),
         ('08', '36', '37', '38', '39'), ('09', '40'), ('10', '41', '42', '43', '44'),
         ('11', '45', '46'), ('12', '47', '48', '49'), ('13', '50', '51', '52', '53'),
         ('14', '54', '55', '56', '57'), ('15', '58'), ('16',), ('82',)]

# place of birth; place[i] is the name for the codes in codes[i]
place = ('Johor', 'Kedah', 'Kelantan', 'Malacca', 'Negri Sembilan',
         'Pahang', 'Penang', 'Perak', 'Perlis', 'Selangor', 'Trengganu', 'Sabah',
         'Sarawak', 'Kuala Lumpur', 'Labuan', 'Putrajaya', 'Unknown')

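def get_gender(n):
    # an odd last digit means male, an even one means female
    return 'Male' if int(n) % 2 else 'Female'

def get_place(code):
    # return the place name for a two-digit code, or None if the code is not listed
    for i, item in enumerate(codes):
        if code in item:
            return place[i]
    return None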

def parse_ic(ic):
    # return (birthdate, place, gender), or None if no IC number is found in ic
    m = vpo.search(ic)
    if m:
        return (m.group('birthdate'),
                get_place(m.group('birthplace')),
                get_gender(m.group('gender')))
    return None

if __name__ == '__main__':
    ic = '850521-22-3454'
    print(parse_ic(ic))    # ('850521', 'Johor', 'Female')
    ic = '850521223454'
    print(parse_ic(ic))    # ('850521', 'Johor', 'Female')
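
One side effect of the \b word boundaries is that the pattern can pick IC numbers out of surrounding text while skipping digit runs that are too long. The sketch below uses a compact, unnamed-group version of the same pattern; the sample text and the name ic_pattern are made up for the example.

```python
import re

# Same shape as the verbose pattern: 6 digits, 2 digits, 3 digits, 1 digit,
# with optional dashes and a word boundary on each side.
ic_pattern = re.compile(r"\b(\d{6})-?(\d{2})-?(\d{3})(\d)\b")

text = ("Applicant 850521-22-3454 replaced card 850521223454; "
        "ref 8505212234545 is not an IC.")

# The 13-digit reference is skipped because no word boundary follows
# its twelfth digit.
for m in ic_pattern.finditer(text):
    print(m.groups())
```

Both the dashed and the stripped form are found, each yielding ('850521', '22', '345', '4').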
