Coding challenge learning note (1)

Sun, Jun 20, 2021 7-minute read python

 

As I mentioned in the last post, I did a 10-day Python coding bootcamp at the end of May. Now it’s time to summarize what I have learnt. This post is part 1 of the learning notes and covers three main topics: list/tuple/string, class, and file reading. These notes are not things one would pick up from a regular Python tutorial; they fall into the category of things you only learn after coming across them in practice, at least that was the case for me. I hope everyone can find something helpful or refreshing in this post.

 

🌀 About list, tuple, string

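— 1. lst[-0:] is the same thing as lst[0:], so be careful when slicing the last n items of a list with lst[-n:]: when n is 0 you get the whole list rather than an empty one. Alternatively, if the iterable is large, we can keep only the most recent n items while looping: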

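items = []  # most recent items seen so far
for item in iterable:
    # assumes n >= 2: with n == 1 the slice is items[-0:], which keeps everything
    items = [*items[-(n-1):], item]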
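
A minimal sketch of the pitfall above. The helper name `last_n` is hypothetical and simply guards the zero case explicitly:

```python
def last_n(lst, n):
    """Return the last n items of lst, guarding against the n == 0 pitfall."""
    # lst[-0:] is lst[0:], i.e. the whole list, so handle zero explicitly.
    return lst[-n:] if n > 0 else []


numbers = [1, 2, 3, 4, 5]
whole = numbers[-0:]        # the whole list, not an empty one
tail = last_n(numbers, 2)   # the last two items
empty = last_n(numbers, 0)  # an empty list, as intended
```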

— 2. A slice object is what Python creates when you use the slicing notation. When you say lst[-n:], Python essentially converts that to lst[slice(-n, None)].
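
To see the equivalence, we can build the slice object by hand. This is only an illustration with made-up data:

```python
lst = ['a', 'b', 'c', 'd', 'e']
n = 2

last_two = slice(-n, None)          # the object Python builds for lst[-n:]
by_object = lst[last_two]
by_notation = lst[-n:]
bounds = (last_two.start, last_two.stop, last_two.step)
```

Because a slice is an ordinary object, it can be given a name and reused across several lists.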

— 3. When you append to a deque which has reached its maximum length, it efficiently removes the oldest item (the one at the opposite end) before appending the new item. So this is an efficient way to keep track of the most recent n items we’ve seen.

import collections

dq = collections.deque(maxlen=n)
for item in iterable:
    dq.append(item)  # drops the oldest item once n items are stored
list(dq)

Alternatively, deque objects can be passed an iterable of items to initialize themselves with, so we can make a deque from an iterable and set the maximum length in one step.

from collections import deque
list(deque(iterable, maxlen=n))
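
Both deque approaches can be tried with a small runnable sketch. The function name `tail` is an assumption made for this example:

```python
from collections import deque


def tail(iterable, n):
    """Return the last n items of any iterable as a list."""
    return list(deque(iterable, maxlen=n))


recent = deque(maxlen=3)
for item in range(6):
    recent.append(item)  # once full, the oldest item is dropped automatically

last_three = list(recent)
last_two = tail(range(10), 2)
nothing = tail(iter('abc'), 0)  # maxlen=0 keeps nothing, so no -0 pitfall here
```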

— 4. Tuples are hashable as long as all their items are hashable, so we can pass a generator expression into the tuple constructor inside a set comprehension to make a set of tuples.

matrix_shapes = {
    tuple(len(r) for r in matrix)
    for matrix in matrices
}
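
Here is the same comprehension with some sample data filled in (the `matrices` values are made up for illustration):

```python
matrices = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[1, 2, 3], [4, 5, 6]],
]

# Each matrix becomes a tuple of row lengths; equal shapes collapse in the set.
matrix_shapes = {
    tuple(len(row) for row in matrix)
    for matrix in matrices
}

all_same_shape = len(matrix_shapes) == 1
```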

— 5. Looping over None fails, and so does adding None to a number: both raise a TypeError.

— 6. The raise X from Y syntax is a Python 3 feature that records Y as the cause of X, which makes tracebacks clearer.
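
A small sketch of exception chaining. The function `parse_age` and its error message are invented for this example:

```python
def parse_age(text):
    """Convert text to an int, re-raising with extra context on failure."""
    try:
        return int(text)
    except ValueError as error:
        # "from error" stores the original exception as __cause__, so the
        # traceback shows both exceptions and how they are linked.
        raise RuntimeError(f"invalid age: {text!r}") from error


try:
    parse_age("abc")
except RuntimeError as error:
    cause = error.__cause__
```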

— 7. Whenever you’re using a list comprehension to create a list that will only be looped over once, you could make a generator expression instead.

— 8. Whenever you see for x in iterable: yield x you can instead write yield from iterable.
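
The last two tips can be sketched together. The generator function `flatten` is a hypothetical example:

```python
def flatten(nested):
    """Yield every item from each inner iterable, one level deep."""
    for inner in nested:
        yield from inner  # same as: for x in inner: yield x


# A generator expression feeds sum() directly, without building a list first.
total = sum(n * n for n in range(4))

flat = list(flatten([[1, 2], (3,), "ab"]))
```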

— 9. re.findall(r'\b(\d+)-(\d+)\b', string) finds all pairs of consecutive digits separated by a hyphen which sit at the ends of “words” (meaning they’re at the beginning/end of the string or they have a comma, space, or other non-word character before/after them). \b matches a word boundary.

re.findall(r'(\d+)(?:-(\d+))?', string) searches for one or more digits, optionally followed by a dash and more consecutive digits. We’re capturing the two groups of consecutive digits using the parentheses. The (?:...) in (?:-(\d+)) marks the outer group as non-capturing, but we still capture the group (\d+) inside it, i.e. the part after the dash.
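
Both patterns can be compared on a sample string (the text is made up for illustration):

```python
import re

text = "pages 1-3, 7, 10-12"

# Only hyphenated pairs match the first pattern.
pairs = re.findall(r'\b(\d+)-(\d+)\b', text)

# The second pattern also matches lone numbers; the missing group comes back as ''.
optional = re.findall(r'(\d+)(?:-(\d+))?', text)
```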

— 10. .partition() vs .split(): .partition() always returns a tuple of 3 items, while the number of items returned by .split() varies. The partition method splits the string on the first occurrence of the given separator and returns the part before it (the head), the separator itself, and the part after it (the tail). If the separator is not found, the head is the whole string, and empty strings are returned for the separator and the tail.
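
A quick comparison of the two methods on made-up strings:

```python
# partition always gives exactly three pieces, splitting on the first separator.
head, sep, tail = "key=value=more".partition("=")

# When the separator is missing, the separator and tail are empty strings.
missing = "no separator here".partition("=")

# split gives as many pieces as the separators allow.
pieces = "key=value=more".split("=")
```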

— 11. int(b or a) means int(b) if b else int(a).
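
One place this idiom is handy is with pairs of digit strings where the second part may be empty. The helper `to_range` below is a hypothetical illustration, not part of the original exercise:

```python
def to_range(a, b):
    """Turn a pair of digit strings such as ('1', '3') or ('7', '') into a range."""
    # b is '' (falsy) when the second part is missing, so fall back to a.
    return range(int(a), int(b or a) + 1)


single = list(to_range('7', ''))
span = list(to_range('1', '3'))
```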

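— 12. List comprehensions can’t have assignment statements in them. A common workaround is to loop over a one-item list, which binds the name just like an assignment would; looping over the bare value fails because an int is not iterable.

[x for x in 17] # raises TypeError: 'int' object is not iterable

[x for x in [17]] # works, gives [17]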

— 13. Check whether a string starts with certain characters: 'words'.startswith('wo').

— 14. To return the items of an iterable in their original order but with duplicates removed, use dict.fromkeys.

dict.fromkeys(iterable).keys()
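
A runnable sketch of the idea. The function name `uniques_only` is an assumption; note that it relies on dicts preserving insertion order (guaranteed since Python 3.7) and on the items being hashable:

```python
def uniques_only(iterable):
    """Return the items in first-seen order with duplicates removed."""
    # Dict keys are unique and keep insertion order, so later repeats are ignored.
    return list(dict.fromkeys(iterable))


deduplicated = uniques_only([3, 1, 3, 2, 1])
letters = uniques_only("mississippi")
```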

— 15. Tuples are compared deeply, i.e. item by item (and nested items recursively).

(1,2,3) == (2,3,4) # False
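
A few more comparisons on made-up values, including nested items and ordering:

```python
same = (1, 2, 3) == (1, 2, 3)
different = (1, 2, 3) == (2, 3, 4)

# Nested items are compared too, so the inner list contents matter.
nested = (1, (2, [3])) == (1, (2, [3]))

# Ordering is item by item as well: the first differing pair (2 vs 3) decides.
ordered = (1, 2, 3) < (1, 3, 0)
```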

 

🌀 About class

— 1. Use the __repr__(self) method to give a class a useful string representation. We don’t need to implement __str__, the other string representation: by default __str__ falls back to __repr__, so if they’re the same we only need to define __repr__.

def __repr__(self):
        return f"string representation" # use f-string, format or %
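
A minimal sketch with a hypothetical `Point` class, showing that str() falls back to __repr__ when __str__ is not defined:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Aim for something that looks like the code that would recreate the object.
        return f"Point(x={self.x!r}, y={self.y!r})"


p = Point(1, 2)
as_repr = repr(p)
as_str = str(p)        # no __str__ defined, so this uses __repr__
in_list = str([p])     # containers always use the repr of their items
```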

— 2. For a class, if we want attribute attr_b to change whenever attribute attr_a changes, we can use the property decorator. Properties are Python’s preferred equivalent to getter and setter methods.

def __init__(self, attr_a):
      self.attr_a = attr_a
 
@property
def attr_b(self):
      return self.attr_a * 2

— 3. For a class, if we want to assign a value to the property attr_b and have attribute attr_a update accordingly, we can make a setter for the property. When you make a property without a setter method, attempting to set the property will raise an AttributeError automatically.

def __init__(self, attr_a):
      self.attr_a = attr_a
      
# a getter
@property 
def attr_b(self):
      return self.attr_a * 2

# a setter
@attr_b.setter  
def attr_b(self, attr_b):
      self.attr_a = attr_b / 2
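
The getter/setter pattern can be tried with a hypothetical `Circle` class, where `diameter` has a setter and `area` deliberately does not:

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    @property
    def diameter(self):
        return self.radius * 2

    @diameter.setter
    def diameter(self, diameter):
        self.radius = diameter / 2  # keep radius in sync

    @property
    def area(self):  # read-only: no setter defined
        return math.pi * self.radius ** 2


circle = Circle(1)
circle.diameter = 10  # goes through the setter and updates radius

try:
    circle.area = 3
except AttributeError:
    area_is_read_only = True
else:
    area_is_read_only = False
```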

— 4. For a class, if we want to validate a non-negative attribute attr_a, we can use the property and setter decorators. Note that we’re using _attr_a to store the actual value now. We’re doing this for two reasons: 1. We can’t use an attr_a attribute to store this data, because <class>.attr_a calls our attr_a property, which would look up self.attr_a, which looks up the same property again, and so on, so we’d get a recursion error. 2. The leading _ is a convention in Python that means “this attribute is non-public, so don’t touch it unless you know what you’re doing”. There’s no such thing as private attributes in Python, but an underscore prefix is often used to declare an attribute as an internal implementation detail, not to be touched by folks outside this class implementation.

def __init__(self, attr_a):
      self.attr_a = attr_a
 
@property 
def attr_a(self):
      return self._attr_a
 
@attr_a.setter
def attr_a(self, attr_a):
      if attr_a < 0:
          raise ValueError('attr_a cannot be negative')
      self._attr_a = attr_a
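
The same validation pattern, written out as a complete class. The `Account` class and its `balance` attribute are hypothetical names chosen for this sketch:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance  # goes through the setter, so it is validated

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, balance):
        if balance < 0:
            raise ValueError('balance cannot be negative')
        self._balance = balance  # the real value lives in the underscore attribute


account = Account(10)

try:
    account.balance = -5
except ValueError:
    rejected = True
else:
    rejected = False
```

Because __init__ assigns through the property, invalid values are rejected at construction time as well as on later assignments.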

— 5. Operator overloading

class compare:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __iter__(self): # make the object iterable, so we can do tuple unpacking (multiple assignment)
        yield from (self.x, self.y)
    
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
        # return tuple(self) == tuple(other) # if object is iterable

    def __add__(self, other):
        return compare(self.x+other.x, self.y+other.y)

    def __sub__(self, other):
        return compare(self.x-other.x, self.y-other.y)

    def __mul__(self, scalar):
        return compare(scalar*self.x, scalar*self.y)

    def __rmul__(self, scalar):
        return self.__mul__(scalar)

 

🌀 About reading files

— 1. When reading and writing CSV files in Python 3, using newline='' is recommended. It turns off newline translation, so line endings are handled consistently whether the file uses Windows (\r\n) or Unix/Mac (\n) endings. Also, when writing lines to a file with print, it is better to set end='' so that no extra line endings are added.

with open(file, mode='rt', newline='') as f:
    lines = []
    for line in f:
        lines.append(line)

with open(file, mode='wt', newline='') as f:
    for line in lines:
        f.write(line)
        # OR: print(line, end='', file=f)  # end='' so no extra line endings are added
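
To see what newline='' changes when reading, here is a small sketch that writes a temporary file with Windows-style line endings and reads it back both ways. The file name and contents are made up:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.csv')

    # Write raw bytes so we control the exact line endings on disk.
    with open(path, mode='wb') as f:
        f.write(b'a,b\r\n1,2\r\n')

    # newline='' leaves the line endings untouched.
    with open(path, mode='rt', newline='') as f:
        untranslated = f.read()

    # The default mode translates \r\n to \n while reading.
    with open(path, mode='rt') as f:
        translated = f.read()
```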

— 2. Reading files (text/binary)

import sys

with open(file, newline='') as f:
    # these are alternatives: each one consumes the file, use f.seek(0) to start over
    f.read() # read the whole file as one string
    f.read().splitlines() # split on newline characters, each line has NO newline character attached
    f.readlines() # list of lines, each line keeps its newline character
    list(f) # same as f.readlines(); looping over f yields the lines one at a time
    f.readline() # read one line

for line in sys.stdin.buffer: # binary input
    sys.stdout.buffer.write(line) # binary output
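
The differences between these calls are easy to check with io.StringIO, used here as an in-memory stand-in for a real text file (an assumption made to keep the sketch self-contained):

```python
import io

text = "first\nsecond\n"

whole = io.StringIO(text).read()                   # one string
stripped = io.StringIO(text).read().splitlines()   # no newline characters
with_endings = io.StringIO(text).readlines()       # newline characters kept
iterated = list(io.StringIO(text))                 # same lines, via iteration
one_line = io.StringIO(text).readline()            # just the first line

# Reading consumes the file: a second read() finds nothing left.
f = io.StringIO(text)
f.read()
leftover = f.read()
```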

— 3. A null context manager can stand in for the real context manager when file is None.

from contextlib import nullcontext
import sys

with open(file, mode='wt') if file else nullcontext() as f:
    for line in sys.stdin: # standard input
        sys.stdout.write(line) # standard output
        if f:
            f.write(line) # write file
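
The same pattern wrapped in a function so it can be tried without real standard input. The name `copy_lines` is hypothetical, and io.StringIO objects stand in for sys.stdin and sys.stdout:

```python
import io
from contextlib import nullcontext


def copy_lines(source, target, file=None):
    """Copy lines from source to target, and also to file if a path is given."""
    # nullcontext() yields None, so f is None when there is no file to open.
    with open(file, mode='wt') if file else nullcontext() as f:
        for line in source:
            target.write(line)
            if f:
                f.write(line)


out = io.StringIO()
copy_lines(io.StringIO("a\nb\n"), out)  # file is None, so nothing is opened
copied = out.getvalue()
```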

— 4. The argparse module, a good command-line parsing tool, can handle both optional ('--flag') and positional arguments. The default value for optional arguments is None (if they are not specified).

from argparse import ArgumentParser, FileType

parser = ArgumentParser()
parser.add_argument('<positional_arg>') # add positional argument
parser.add_argument('<files>', nargs='*', type=FileType('wt')) # add file object
parser.add_argument('--<optional_arg>', dest='<optional_name>', default='<>') # add optional/keyword argument 
parser.add_argument('--<optional_arg>', dest='<optional_name>', default='<>', action='store_const', const='<value>') 
args = parser.parse_args()

args.<positional_arg>
args.<optional_name>

for f in args.<files>:
    f.write('<text>') # write to each opened file
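
A concrete version of the template above, with the placeholders replaced by made-up argument names. Passing a list to parse_args lets us try it without a real command line:

```python
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('name')            # positional argument
parser.add_argument('--greeting')      # optional argument, defaults to None
parser.add_argument('--shout', dest='style', action='store_const',
                    const='upper', default='plain')  # flag that stores a constant

plain_args = parser.parse_args(['World'])
full_args = parser.parse_args(['World', '--greeting', 'Hi', '--shout'])
```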

— 5. Use the csv module’s Sniffer class to detect (or attempt to detect) the format of the input file automatically. Files are iterators, which means they’re stateful: they keep track of where we are in them as we read, so after sniffing we need f.seek(0) to go back to the start.

import csv

with open(file, newline='') as f:

    dialect = csv.Sniffer().sniff(f.read())
    f.seek(0)
    reader = csv.reader(f, dialect)
    rows = list(reader)
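
A self-contained sketch of the same steps, using io.StringIO as a stand-in for an open file and semicolon-separated sample data (both are assumptions for illustration):

```python
import csv
import io

data = "name;age\nalice;30\nbob;25\n"
f = io.StringIO(data)

dialect = csv.Sniffer().sniff(f.read())  # reading moves to the end of the file
f.seek(0)                                # rewind before parsing
rows = list(csv.reader(f, dialect))
```

Sniffing is a heuristic, so on unusual input it may guess wrong or raise csv.Error.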