Wednesday, November 13, 2013

Code: Python control flow, looping through dates and months

While practicing data analysis, I came across a small algorithm problem. I don't usually use Python, but it is the preferred language for data analysis, so this one is written in Python.

This post is not meant to teach; it just showcases my thought process and serves as a future reference. It's writing practice too. I plan to update it when better solutions come along.

The prompt is to iterate over all the months and dates in 2009. Python has some built-in functions for this, per Stack Overflow.
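
For reference, one such built-in is `calendar.monthrange` from the standard library, which returns the weekday of the first day of a month together with the number of days in it. Here is a minimal sketch, assuming Python 3; the name `all_dates` is made up for this example and is not from the Stack Overflow answers.

```python
import calendar

def all_dates(year):
    """Yield (month, day) pairs for every day of the given year."""
    for month in range(1, 13):
        # monthrange returns (weekday of the 1st, number of days in the month)
        _, days_in_month = calendar.monthrange(year, month)
        for day in range(1, days_in_month + 1):
            yield month, day

dates_2009 = list(all_dates(2009))
print(len(dates_2009))   # 365, since 2009 is not a leap year
print(dates_2009[58])    # (2, 28), the last day of February
```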

But what if we build it from scratch? How would you use control flow to iterate through the dates in a simple, crisp way?

How would you solve it? How do you get the console to print each month and date of 2009 without giving any month extra days? For example, February can never have a 31st.


It's a great control flow mini example because not all months are created equal.

  • February is unique: in 2009 it has only 28 days
  • For the other months up to and including 7, odd months have 31 days and even months have 30 days
  • For months greater than 7, odd months have 30 days and even months have 31 days (so it's reversed)
So there are 5 cohorts (cases). The terrible thing is that there are 5 checkpoints for each month.

for x in range(1, 13):
    print(x)
    if x == 2:
        print("x equals 2")
    elif x <= 7 and x % 2 == 0:
        print("x is smaller than or equal to 7 and x is even")
    elif x <= 7 and x % 2 != 0:
        print("x is smaller than or equal to 7 and x is odd")
    elif x > 7 and x % 2 == 0:
        print("x is greater than 7 and x is even")
    else:
        # only odd months greater than 7 are left
        print("x is greater than 7 and x is odd")
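
The loop above only reports which cohort each month falls into. A rough sketch of how the same five branches could produce the actual day counts and print every date follows, assuming Python 3; the function name `days_in_month_2009` is invented for this illustration.

```python
def days_in_month_2009(month):
    """Day count for a month of 2009, using the five cohorts described above."""
    if month == 2:
        return 28   # February in a non-leap year
    elif month <= 7 and month % 2 == 0:
        return 30   # April, June
    elif month <= 7:
        return 31   # January, March, May, July
    elif month % 2 == 0:
        return 31   # August, October, December
    else:
        return 30   # September, November

for month in range(1, 13):
    for day in range(1, days_in_month_2009(month) + 1):
        print(month, day)
```

Returning the day count from one function keeps the cohort logic in a single place, so the printing loop never needs to know about the exceptions.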

Another Pythonista hard-coded the exceptional months instead of checking every case:

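for m in range(1, 13):  # months 1 through 12
    for d in range(1, 32):
        if m == 2 and d > 28:
            break  # February has only 28 days in 2009
        elif m in [4, 6, 9, 11] and d > 30:
            break  # these months have only 30 days

        timestamp = '2009' + str(m) + str(d)
        print(timestamp)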

Now this is really a different way to solve it: iterate through the months, then the dates, and cleverly break out of the date loop when a certain condition is met. In a sense it is more elegant, because the exceptions are clearer: m == 2, and m in 4, 6, 9, 11. In the previous version there were 5 cohorts, and you constantly have to think about which cohort a month falls into, which also leaves more room for typos on the > or < signs.
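
The same "list only the exceptions" idea can also be sketched as a lookup with a default of 31 days, which removes the need for `break`. This is an illustration under my own assumptions (Python 3; `SHORT_MONTHS` and `timestamps_2009` are made-up names). It also zero-pads the month and day, because plain concatenation gives '2009111' for both January 11 and November 1.

```python
# Months that do not have 31 days in 2009 (a non-leap year).
SHORT_MONTHS = {2: 28, 4: 30, 6: 30, 9: 30, 11: 30}

def timestamps_2009():
    """Return zero-padded 'YYYYMMDD' strings for every day of 2009."""
    stamps = []
    for m in range(1, 13):
        last_day = SHORT_MONTHS.get(m, 31)  # 31 unless listed as an exception
        for d in range(1, last_day + 1):
            stamps.append('2009%02d%02d' % (m, d))
    return stamps

stamps = timestamps_2009()
print(stamps[0], stamps[-1], len(stamps))
```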

See more examples on Python control flow

The Stack Overflow examples are better in many ways too, but treating a date problem as a simple control flow example has its merits.

When I asked my college friend Roger, he also used the datetime module. His answer is super crisp, though, and I like it a lot! So I am sharing it here:

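import datetime

# datetime.date is assumed here; datetime.datetime would also work but prints a time as well
startDate = datetime.date( 2009, 1, 1 )
endDate = datetime.date( 2010, 1, 1 )
dayDelta = datetime.timedelta( days=1 )

while startDate < endDate:
   print(startDate)
   startDate += dayDelta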
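
If the month and day are needed separately, as in the original prompt, the same loop could be wrapped in a small generator. This is only a sketch under my own assumptions (Python 3; `date_range` is a hypothetical helper, not part of Roger's answer or of the datetime module).

```python
import datetime

def date_range(start, end):
    """Yield each date from start (inclusive) to end (exclusive)."""
    one_day = datetime.timedelta(days=1)
    current = start
    while current < end:
        yield current
        current += one_day

days = list(date_range(datetime.date(2009, 1, 1), datetime.date(2010, 1, 1)))
for day in days:
    print(day.month, day.day)
```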
