Six Is Enough
Ernest Hemingway said that his best work was simply six words long, “For sale: baby shoes, never used.”
About two years ago I read an article on Wired.com that asked authors to write their own six word stories. Ever since then, I’ve thought it would be interesting to have a web site where anyone can write their own six word story, have people rate it, comment on it, etc…
I’ve been working on a startup for a bit now, and needed a good test bed for getting familiar with Django, so I wrote SixIsEnough.com. This has absolutely nothing to do with the startup, but it’s fun to work on.
Some things are still a little rough around the edges, I don’t handle all errors as well as I should, but common things like validating user input, etc… are taken care of. It renders fine in Firefox, Opera, Safari, and IE 6.0… but IE 7.0 has a minor alignment issue. I also need to redo how “Most Popular” stories are calculated, as the current system can easily be skewed. My main focus right now is making it a facebook app too, so your facebook profile will show a random story of yours, or other users of your choosing.
Just thought I’d share one of the many things I’ve been up to recently. My main focus is on the startup right now, but over the next few months expect some other fun little web apps I’m working on like lunchbrake.com to be released too, which I won’t go into details about yet (it’s currently just parked, I don’t have a server behind it).
Simple Regex in Python
I’ve been reading Beautiful Code. It’s a book that I now recommend to every software engineer that asks. It came to my attention after seeing an entry written by Jeff Dean and Sanjay from Google on mapreduce. When I was working at Google, I dealt with Jeff Dean a bit when working on some performance stuff I was involved with. The man is a genius, and it’s hard to touch a piece of code at Google that he hasn’t touched before you.
Every chapter in this book is a gem. The book opens with a chapter titled “A Regular Expression Matcher” by Brian Kernighan, where Brian effectively describes a very basic regular expression implementation written by Rob Pike (who I also had the fortune of speaking with while working at Google). For more details on Rob Pike’s implementation, I recommend buying “Beautiful Code”.
For more info on regular expressions, wikipedia has a good article on them. Rob Pike’s solution is in C, but still only 30 lines of code. It supports . ^ $ * and matching arbitrary characters.
I adapted Rob’s code to python. You might be wondering why I would do that when python already has regular expression support. Well, I get a lot of emails from professors thanking me for code snippets on my site that they’ve used in research or in the classroom, and I thought this would be good for in a classroom. It gives a basic implementation in python for regular expressions, and asking students to extend it to support other regular expression features such as + or character classes (i.e. [abc] or [0-9]) could make for a good assignment.
Here is the code.
# match: search the regexp anywhere in text
def match(regexp, text):
if regexp[0] == '^':
return matchhere(regexp[1:], text)
while True: # must look even if string is empty
if matchhere(regexp, text):
return True
text = text[1:]
if len(text) == 0:
break
return False
# matchhere: search for regexp at beginning of text
def matchhere(regexp, text):
if len(regexp) == 0:
return True
elif len(regexp) >1 and regexp[1] == '*':
return matchstar(regexp[0], regexp[2:], text)
elif regexp[0] == '$' and len(regexp) == 1:
return len(text) == 0
elif len(text) > 0 and (regexp[0] == '.' or regexp[0] == text[0]):
return matchhere(regexp[1:], text[1:])
return False
# matchstar: search for c*regexp at beginning of text
def matchstar(c, regexp, text):
while True: # a * matches zero or more instances
if matchhere(regexp, text):
return True
if len(text) == 0 or ( text[0] != c and c != '.' ):
break
text = text[1:]
return False
Here are some tests you can run:
print match("abc", "abcdefghijklmnopqrstuvwxyz") # True
print match("abd", "abcdefghijklmnopqrstuvwxyz") # False
print match("xyz", "abcdefghijklmnopqrstuvwxyz") # True
print match("^xyz", "abcdefghijklmnopqrstuvwxyz") # False
print match("xyz$", "abcdefghijklmnopqrstuvwxyz") # True
print match("c.*l", "abcdefghijklmnopqrstuvwxyz") # True
print match("c.*b", "abcdefghijklmnopqrstuvwxyz") # False
print match("c*b", "cccccccccb") #True
print match("c*b", "b") # True
Enjoy!
Pagination
One of my side projects that I’ve yet to release required me to implement some pagination code. I’m referring to the same kind of pagination that you see at the bottom of a Google search, those numbers which subsequently take you to the Nth page of results.
In this entry I present 3 approaches to pagination. I’m still not sure which I’ll use in the final release of my project though; they each have their own benefits and drawbacks.
I’ve seen pagination bars implemented many ways. For instance, they might show 10 numbers, where the middle number is in bold and is the current page.
Here is some python code that will generate these ranges for you.
def paginate(current, width, max_page):
""" Creates a list of integers corresponding to the
pages to be displayed in a pagination navigation
bar. The behavior causes the current page to be as
close to the center of the list as possible while
keeping the list the size of @width.
@param current The current page being viewed
@param width The total number of pages to show
in the navigation bar
@param max_page The maximum value for a page in the
navigation bar
@return The list of integers to be used in the
pagination navigation bar.
"""
start = max(1, current-(width/2))
end = min(start+width, max_page+1)
if end == max_page+1:
start = max(1, max_page-width+1)
return range(start, end)
The output of this function when the width is 5 numbers and the total number of pages is 8 is:
paginate(1,5,8): 1 2 3 4 5
paginate(2,5,8): 1 2 3 4 5
paginate(3,5,8): 1 2 3 4 5
paginate(4,5,8): 2 3 4 5 6
paginate(5,5,8): 3 4 5 6 7
paginate(6,5,8): 4 5 6 7 8
paginate(7,5,8): 4 5 6 7 8
paginate(8,5,8): 4 5 6 7 8
This is more or less what Google does, and most pagination implementations I’ve seen do. The one difference in Google’s is that they start with showing 10 numbers, and grow it to 20 numbers before they center the number. With a little modification to the code, we can get Google’s behavior:
def paginate_google_style(current, start_width, max_width, max_page):
""" Creates a list of integers corresponding to the
pages to be displayed in a pagination navigation
bar. The behavior follows that of Google's search
results. That is, the width of the navigation bar is
@start_width when @current == 1 and the width of the
navigation bar grows by one with each increment of
@current until the width is >= @max_width, at which
point the width is set to @max_width. The list is
generated in such a way as to keep the current page
as close to the center of the list as possible while
maintaing a list of size @width.
@param current The current page being viewed
@param start_width The starting number of pages to
show in the navigation bar
@param max_width The maximum number of pages to
show in the navigation bar
@param max_page The maximum value for a page in the
navigation bar
@return The list of integers to be used in the
pagination navigation bar.
"""
width = min(start_width + current - 1, max_width)
start = max(1, current-(width/2))
end = min(start+width, max_page+1)
if end == max_page+1:
start = max(1, max_page-width+1)
return range(start, end)
Starting with a width of 5 numbers and ending with a width of 10 (Google uses 10 and 20 instead of 5 and 10), and 15 pages total, the pagination code will output these ranges when going from start to finish:
paginate_google_style(1,5,10,15): 1 2 3 4 5
paginate_google_style(2,5,10,15): 1 2 3 4 5 6
paginate_google_style(3,5,10,15): 1 2 3 4 5 6 7
paginate_google_style(4,5,10,15): 1 2 3 4 5 6 7 8
paginate_google_style(5,5,10,15): 1 2 3 4 5 6 7 8 9
paginate_google_style(6,5,10,15): 1 2 3 4 5 6 7 8 9 10
paginate_google_style(7,5,10,15): 2 3 4 5 6 7 8 9 10 11
paginate_google_style(8,5,10,15): 3 4 5 6 7 8 9 10 11 12
paginate_google_style(9,5,10,15): 4 5 6 7 8 9 10 11 12 13
paginate_google_style(10,5,10,15): 5 6 7 8 9 10 11 12 13 14
paginate_google_style(11,5,10,15): 6 7 8 9 10 11 12 13 14 15
paginate_google_style(12,5,10,15): 6 7 8 9 10 11 12 13 14 15
paginate_google_style(13,5,10,15): 6 7 8 9 10 11 12 13 14 15
paginate_google_style(14,5,10,15): 6 7 8 9 10 11 12 13 14 15
paginate_google_style(15,5,10,15): 6 7 8 9 10 11 12 13 14 15
While both of these are valid solutions to pagination, they both lead to the entire navigation bar shifting with each click (there are edge cases, like pages 11 to 15 in the above example where the bar doesn’t shift). I thought about how to make the navigation bar remain more or less the same, while still allowing arbitrary ranges of pages. Effectively, I wanted to minimize the frequency with which the numbers displayed change. This could be a useful constraint for systems with many tabs or ajax-ified pagination as well. (Note: One assumption I operated under was that no “next” or “previous” buttons were present)
What I decided upon was to amortize the shifting by showing ranges of pages in “blocks”, let’s say in blocks of 10 numbers. So block 1 is the range for pages 1 through 10, block 2 is for pages 11 through 20, and so on. An entire block is always visible when any page in said block is visible. The only thing changing is what number is in bold (or rather… currently viewed).
There is a catch though. If we are in block 1, how do get to block 2? Alternatively, if we are in block 2, how do we get back to block 1? What we need are triggers that take us to the next block. In the previous two solutions, every page was a trigger (every page had the potential to cause a shift when clicked). In this solution, we only need two triggers. One to take us to the previous block and one to take us to the next block.
We can’t make a page in the block a trigger though. This is because if I’m seeing 1 2 3 4 5 6 7 8 9 10 and we make 10 the trigger to show block 2, when I click on 10 to view page 10, the navigation bar will show 11 12 13 14 15 16 17 18 19 20. How do I get to page 9? How do I even view the first block of numbers again? We can’t make 11 the trigger to go back, because then if we’re on page 11 and naturally want to go to page 12, we first need to view page 10 before being able to get to page 12. That’s not very friendly. In fact, it’s downright confusing.
Simple solution. We stick with our original constraint: “An entire block is always visible when any page in said block is visible.” Also, no number in the current block can be a trigger for reasons discussed above. We still need two triggers though… so in addition to showing the numbers in the current block, we’ll show the last number from the previous block and the first number from the next block. These numbers will be our triggers. Like I said, it’s a simple solution.
Here is the code to achieve this:
def paginate_krenzel_style(current, width, max_page):
""" Creates a list of integers corresponding to the
pages to be displayed in a pagination navigation
bar. The behavior displays the pages in "blocks"
of size @width-2. The @width has 2 subtracted from
it because the first and last numbers are the end
and start of the previous and next blocks,
respectively. See http://krenzel.info/?p=97 for more
details. The list is generated in such a way as to
minimize the frequency with which the numbers
displayed to the user change.
@param current The current page being viewed
@param width The total number of pages to show
in the navigation bar
@param max_page The maximum value for a page in the
navigation bar
@return The list of integers to be used in the
pagination navigation bar.
"""
block_size = width - 2
start = max(1, ((current-1)/block_size)*block_size)
end = min(max_page, start + block_size + 1)
return range(start, end+1)
Using a width of 10, and 15 pages total, the pagination code will output these ranges when going from start to finish:
paginate_krenzel_style(1,10,15): 1 2 3 4 5 6 7 8 9 10
paginate_krenzel_style(2,10,15): 1 2 3 4 5 6 7 8 9 10
paginate_krenzel_style(3,10,15): 1 2 3 4 5 6 7 8 9 10
paginate_krenzel_style(4,10,15): 1 2 3 4 5 6 7 8 9 10
paginate_krenzel_style(5,10,15): 1 2 3 4 5 6 7 8 9 10
paginate_krenzel_style(6,10,15): 1 2 3 4 5 6 7 8 9 10
paginate_krenzel_style(7,10,15): 1 2 3 4 5 6 7 8 9 10
paginate_krenzel_style(8,10,15): 1 2 3 4 5 6 7 8 9 10
paginate_krenzel_style(9,10,15): 8 9 10 11 12 13 14 15
paginate_krenzel_style(10,10,15): 8 9 10 11 12 13 14 15
paginate_krenzel_style(11,10,15): 8 9 10 11 12 13 14 15
paginate_krenzel_style(12,10,15): 8 9 10 11 12 13 14 15
paginate_krenzel_style(13,10,15): 8 9 10 11 12 13 14 15
paginate_krenzel_style(14,10,15): 8 9 10 11 12 13 14 15
paginate_krenzel_style(15,10,15): 8 9 10 11 12 13 14 15
In this example, when a user starts at page 1 and goes through every page up to 15, the numbers displayed only change once. I kind of like this behavior.
Please feel free to recommend other approaches as well. Most engineers I’ve seen don’t put any thought into little things like this, but it is important to the overall user experience.
FizzBuzz
It’s been a while since I’ve written an entry. I’m usually not one to make excuses, so I’ll just sum it up with: I’ve been working on many side projects, Google kept me busy and frequently what I worked on internally conflicted with what I wanted to write publicly, and it’s my senior year… so I’ve been crazy busy with school. Without further ado… this entry will be a lightweight one, but I’ve got a couple heavier ones on the back burner.
Last year, CodingHorror wrote a great post titled Why Can’t Programmers.. Program?. The author references another post by Imran which introduces what is now a commonly referenced programming “challenge” called FizzBuzz. It is a good test for immediately and easily weening out those who claim to be able to code, but can’t. The challenge is this:
Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
Any software engineer that can turn on a computer should immediately want to write up a solution. That is, after all, what we are. We are problem solvers. We eat problems for breakfast, lunch, dinner and a midnight snack. Apparently though (according to Joel), only about 0.5% of self-claimed programmers can actually code… that’s 199 out of every 200 people who say they can program, can’t1.
If you’ve been following what I write… you may or may not have come to the conclusion that I can code
For those still uncertain, here is a one-line FizzBuzz solution in python (clearly it can be done much cleaner if you’re willing to do it in more than one line). I abuse comprehensions, auto-casting, and string operators to achieve the end result, but it works.
This is where someone points out an amateur mistake.
1Apologies to Joel if I’m misrepresenting his position
Two complaints about python
- Self-hell and general scope handling
- It reads my mind only 60% of the time