Coding challenge May of 2021

Sun, May 23, 2021 · 5-minute read · python, regex

 

This post may not be very informative, but I feel compelled to write something to keep the blog updated regularly; otherwise the guilt of breaking that commitment would haunt me all the time 😐. The thing is, I will start a summer internship in June that will require intensive coding in Python. To be better prepared, I decided to set up a 10-day Python bootcamp for myself from May 20th to May 29th, during which I will complete one coding assignment each day on the Python Morsels website. I personally think it is a great resource for practicing coding in a Pythonic way, especially for people like me who started with R and then moved to Python.

Though binge-practicing these coding exercises is not recommended, I think 10 assignments in 10 days is a pretty reasonable bootcamp plan, taking into account that the internship's start date is right around the corner and that I have free time at the moment. Typically it takes me about one to two hours to finish one exercise, including writing answers, checking solutions, and reading the reference materials. The exercises are different from the algorithmic problems one would encounter on platforms such as LeetCode: they focus on a Pythonic mindset and code readability. I don’t have a plan for which 10 exercises to practice; I will just stay chill and choose them at random, and I will come back after the bootcamp to write another post reporting what I have learned.

As you may have noticed, the bootcamp started on May 20th, but as I write this post it’s already May 23rd. So yes, I am glad to announce that the plan is going smoothly into its fourth day 😎, and I do have something to share. One of the assignments I completed involves the re (regular expression) module, and there is a great tutorial on this topic, accompanied by some very useful exercises. I have prepared solutions for those exercises, as shown below. Details of the exercise descriptions can be found by clicking the tutorial link above.

 

1. Validation Exercises

import re
import calendar

def has_vowel(string):
    """Return True iff the string contains one or more vowels."""
    return bool(re.search(r'[aeiou]',string))


def is_integer(string):
    """Return True iff the string represents a valid integer."""
    return bool(re.search(r'^-?[0-9]+$',string))
    #######
    # note: put -? after ^
    #######

def is_fraction(string):
    """Return True iff the string represents a valid fraction."""
    # the trailing $ rejects inputs such as '1/2abc'; 0*[1-9] requires a nonzero denominator
    return bool(re.search(r'^-?[0-9]+/0*[1-9][0-9]*$',string))
    #######
    # note: put -? after ^, and the expression for at least one not equal to zero
    #######

def is_valid_date(string):
    """Return True iff the string represents a valid YYYY-MM-DD date."""
    if not re.search(r'^\d{4}-\d{2}-\d{2}$', string):
        return False
    year, month, day = map(int, string.split('-'))
    if not 1 <= month <= 12:
        return False
    # monthrange returns (weekday of the first day, number of days in the month)
    return 1 <= day <= calendar.monthrange(year, month)[1]


def is_number(string):
    """Return True iff the string represents a decimal number."""
    return(bool(re.search(r'^-?(\d+\.?\d*|\d*\.?\d+)$',string)))
    #######
    # note: the test case of '.' is false
    #######


def is_hex_color(string):
    """Return True iff the string represents an RGB hex color code."""
    return(bool(re.search(r'^#([0-9a-f]{3}|[0-9a-f]{6})$',string, re.IGNORECASE)))
    #######
    # note: (|) has to be between # and $
    #######
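
A side note on anchoring: most of the validators above rely on `^` and `$` together with `re.search`. The sketch below compares that with `re.fullmatch`, which needs no anchors and also rejects a trailing newline that `$` quietly accepts. The two function names are hypothetical and not part of the exercises.

```python
import re

def is_integer_search(string):
    """Same approach as above: re.search with explicit ^ and $ anchors."""
    return bool(re.search(r'^-?[0-9]+$', string))

def is_integer_fullmatch(string):
    """Hypothetical variant: re.fullmatch must consume the whole string."""
    return bool(re.fullmatch(r'-?[0-9]+', string))

# Print both results side by side for a few made-up inputs
for candidate in ['42', '-7', '4 2', '42\n']:
    print(repr(candidate), is_integer_search(candidate), is_integer_fullmatch(candidate))
```

The two versions only disagree on `'42\n'`: `$` also matches just before a newline at the very end of the string, so the `re.search` version returns True there. Whether that matters depends on the test cases.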

 

2. Search Exercises

with open('dictionary.txt') as dict_file:
    dictionary = dict_file.read()


def get_extension(filename):
    """Return the file extension for a full file path."""
    m = re.search(r'\.(\w+)$',filename)
    # assumption: return an empty string when the filename has no extension
    return m.group(1) if m else ''


def tetravocalic(dictionary=dictionary):
    """Return a list of all words that have four consecutive vowels."""
    return(re.findall(r'\b\w*[aeiou]{4}\w*\b', dictionary, re.IGNORECASE))

def hexadecimal(dictionary=dictionary):
    """Return a list of all words consisting solely of the letters A to F."""
    return(re.findall(r'\b[A-F]+\b', dictionary, re.IGNORECASE))


def hexaconsonantal(dictionary=dictionary):
    """Return a list of all words with six consecutive consonants."""
    return(re.findall(r'\b\w*[^aeiouy]{6}\w*\b', dictionary, re.IGNORECASE))

def possible_words(partial_word, dictionary=dictionary):
    """
    Return possible word matches from a partial word.

    Underscores in partial words represent missing letters.  Examples:
        C_T (cat, cot, cut)
        _X_ (axe)
    """
    return(re.findall(r'\b'+partial_word.replace('_',r'\w')+r'\b', dictionary, re.IGNORECASE))


def five_repeats(letter, dictionary=dictionary):
    """Return all words with at least five occurrences of the given letter."""
    rep_letter = letter + r'\w*'
    return(re.findall(r'\b\w*'+rep_letter*5+r'\b', dictionary, re.IGNORECASE))

def abbreviate(phrase):
    """Return an acronym for the given phrase."""
    take_letters = re.findall(r'\b(\w)[a-z]*([A-Z])?[a-z]*\b', phrase)
    return ''.join(map(''.join, take_letters)).upper()
    ########
    ## note: the test case of 'JavaScript Object Notation' is JSON
    ########


def palindrome5(dictionary=dictionary):
    """Return a list of all five letter palindromes."""
    return [m.group() for m in re.finditer(r'\b(\w)(\w)\w\2\1\b', dictionary)]


def double_double(dictionary=dictionary):
    """
    Return words with a double repeated letter with one letter between.

    Example double double words:
    - freebee
    - assessed
    - voodoo
    """
    return [m.group() for m in re.finditer(r'\b\w*(\w)\1\w\1\1\w*\b', dictionary)]


def repeaters(dictionary=dictionary):
    """
    Return words that consist of the same letters repeated two times.

    Example repeater words:
    - tutu
    - cancan
    - murmur
    """
    return [m.group() for m in re.finditer(r'\b(\w+)\1\b', dictionary)]
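
To see the backreference patterns above in action without `dictionary.txt`, here is a small sketch on a made-up sample string. It also shows why the last three functions use `re.finditer` with `m.group()`: when a pattern contains capture groups, `re.findall` returns the groups rather than the whole match.

```python
import re

# Made-up sample standing in for dictionary.txt
sample = "level radar tutu cancan murmur hello"

# \2\1 mirrors the first two letters, so only five-letter palindromes match
palindromes = [m.group() for m in re.finditer(r'\b(\w)(\w)\w\2\1\b', sample)]

# (\w+)\1 requires the first half of the word to be repeated verbatim
repeated = [m.group() for m in re.finditer(r'\b(\w+)\1\b', sample)]

# findall returns only the captured group, not the full word
groups_only = re.findall(r'\b(\w+)\1\b', sample)

print(palindromes)  # ['level', 'radar']
print(repeated)     # ['tutu', 'cancan', 'murmur']
print(groups_only)  # ['tu', 'can', 'mur']
```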

 

3. Substitution Exercises

def normalize_jpeg(filename):
    """Return the filename with jpeg extensions normalized."""
    # anchor on the dot and the end of the string so only the extension is rewritten
    return re.sub(r'\.jpe?g$', '.jpg', filename, flags = re.IGNORECASE)
    #######
    ## note: the flags argument
    ## https://stackoverflow.com/questions/42581/python-re-sub-with-a-flag-does-not-replace-all-occurrences
    #######

def normalize_whitespace(string):
    """Replace all runs of whitespace with a single space."""
    return re.sub(r'\s+', r' ', string)

def normalize_domain(string):
    """Normalize all instances of treyhunner.com URLs."""
    def replace_domain(match):
        h, w, tcom, rest = match.groups()
        return string if tcom is None else ''.join(('https://', tcom, rest))
    # escape the dots so they match a literal '.' instead of any character
    domain_re = re.compile(r'^(\w*://)?(www\.)?(treyhunner\.com)?(.*)')
    return domain_re.sub(replace_domain, string)
    #######
    ## note: is substitution function necessary?
    #######

def convert_linebreaks(string):
    """Convert linebreaks to HTML."""
    s = re.sub(r'\n{2,}', r'</p><p>', string)
    s = re.sub(r'\n', r'<br>', s)
    return '<p>'+s+'</p>'
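
On the question in the note above (is a substitution function necessary?): probably not. Below is a minimal sketch that uses a plain replacement string instead, assuming the goal is to rewrite every treyhunner.com URL to the bare `https://treyhunner.com` form wherever it appears in the text. The name `normalize_domain_simple` is hypothetical, and I have not checked this version against the exercise's own tests.

```python
import re

def normalize_domain_simple(string):
    """Hypothetical variant: rewrite treyhunner.com URLs with a plain replacement string."""
    # (?<![\w.]) keeps the pattern from matching inside another host name,
    # e.g. sub.treyhunner.com or nottreyhunner.com
    pattern = r'(?<![\w.])(?:\w+://)?(?:www\.)?treyhunner\.com'
    return re.sub(pattern, 'https://treyhunner.com', string)

print(normalize_domain_simple('http://www.treyhunner.com/blog'))
print(normalize_domain_simple('See http://treyhunner.com and www.treyhunner.com/about'))
print(normalize_domain_simple('https://example.com/'))
```

Because the optional scheme and `www.` are part of the match, they are simply dropped by the replacement, and everything after the domain is left untouched.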

 

4. Lookahead Exercises

def have_all_vowels(dictionary=dictionary):
    """Return all words at most 9 letters long that contain all vowels."""
    m = re.findall(r'\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)\w{,9}\b',dictionary)
    return [x for x in m if x]
    ########
    ## note:
    ## https://stackoverflow.com/questions/54267095/what-is-the-regex-to-match-the-words-containing-all-the-vowels
    ## filter on truthiness to drop the '' matches, because bool('') is False
    ## cannot use  # m = re.findall(r'\b(?=[^a]*?a)(?=[^e]*?e)(?=[^i]*?i)(?=[^o]*?o)(?=[^u]?u)\w{,9}\b',dictionary)
    ########

def no_repeats(dictionary=dictionary):
    """Return all words with 10 or more letters and no repeating letters."""
    return [m.group() for m in re.finditer(r'\b(?:([a-zA-Z])(?!.*\1)){10,}\b', dictionary)]
    ########
    ## note:
    ## https://stackoverflow.com/questions/51358885/regex-no-character-should-repeat/51359047
    ########
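
Two details of the lookahead patterns above are easier to see on a small example. The sketch below uses a made-up sample string instead of `dictionary.txt`. First, `\w{,9}` has a lower bound of 0, so for a word longer than nine letters that contains all vowels the pattern matches the empty string at the start of the word; `\w{1,9}` avoids that. Second, it tries `(?!\w*\1)` in place of `(?!.*\1)`, which is my own variation so that the check stays inside the current word even when several words share a line.

```python
import re

# Made-up sample standing in for dictionary.txt
sample = "education sequoia facetious unquestionably background bookkeeper"

# One lookahead per vowel, each scanning ahead within the current word
vowels = r'(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)'

with_empty = re.findall(r'\b' + vowels + r'\w{,9}\b', sample)
without_empty = re.findall(r'\b' + vowels + r'\w{1,9}\b', sample)

# Each letter must not appear again later in the same word
isograms = [m.group() for m in re.finditer(r'\b(?:([a-zA-Z])(?!\w*\1)){10,}\b', sample)]

print(with_empty)     # includes '' for 'unquestionably'
print(without_empty)  # ['education', 'sequoia', 'facetious']
print(isograms)       # ['background']
```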