And we’re back…

The most recent post on this blog is from…. last year—ACK! That’s lame. Sorry about that.

In my defense, this year has seen some upheaval in my life, as I decided to leave my cushy Google digs for more exciting pastures. Nothing is settled yet by any means, but they’re getting there, so I’m starting to think again about what’s next for the realchat project. I know I want to keep doing it, so that’s a start. 

For now, tho, here’s a funny cat video. 

wsgi, bottlepy, and putting it “all” (read: “some of it”) together

In this post we’ll take a look at the most basic of WSGI apps (helloworld, natch) with the bottlepy micro-framework and how a wsgi app might work with what we’ve already built.

I know, I know. “Micro?” Really? Somewhat groan inducing, but given how many seemingly one-stop shopping frameworks out there that exist I think its important. Bottle doesn’t have an ORM. There’s no middleware for keeping sessions around. There’s a template language but it’s purely optional and we’ll probably steer clear of it. From the bottle page:

Bottle is a fast, simple and lightweight WSGI micro web-framework for Python. It is distributed as a single file module and has no dependencies other than the Python Standard Library.

We’re gonna use it for request context (GET and POST params) and routing. That’s all we need it for, so that’s all we’ll use it for.

The Service Module

[link]

Ok, enough soapbox, back to the codez. We’re adding a new directory called chatserver. We’ll add two module files, chatroom.py, which will encapsulate our bottle endpoints (the wsgi app), which are just decorated python functions, and service.py, which takes care of firing up a wsgiref http server in the context of startup components we already created.

Let’s take a look at the main bits of serivice.py as well as our imports: 

39 import sys                                                                                         
40 import os
41 import signal
42 
43 import wsgiref.simple_server
44 import conf
45 import common.start

67 def main():                                                                  
68     srvs = conf.get('chatservers')
69     parser = common.start.generic_parser('chatserver')
70     args = parser.parse_args()
71     return common.start.main(
72         serve_wsgiref, args, srvs, 
73         logname = 'chatserver',
74         monkey_log = True,
75         )
76 
77 if __name__ == '__main__':
78     v = main()
79     sys.exit(v)

You’ll notice we imported a few modules we’ve built before, conf and common.start. We use the conf module and arguments to create a context by which the process will run, like logfile setup and forking, then we use common.start.main() to do the running.  The params for this call are:

  • serve_wsgiref: The actual method to call once we’ve setup logging and optionally forked. We’ll cover this a bit more later.
  • args: the argparse’d argument manager.
  • srvs: the service configuration list. 
  • logname = ‘chatserver’: just a simple tag for our logs.
  • monkey_log = True: This flags the common.start.main() function to monkey patch the python logging module so that calls such as logging.info() go to our newly created log. I touched on this briefly here, and you’ll see us use this in the chatroom module later.

This is just really the glue into our startup code we’ve done before. Now let’s take a look at serve_wsgiref() method to see our wsgi app in action.

48 def serve_wsgiref(log, logdir, args, here, **kwargs):
49     '''The realmain serve method, called by common.start.main()
50     with nice newly minted log and the here dictionary, which is
51     locked by common.serverlock.lock_server().
52 
53     The here has the host and port name, which we use to construct our
54     wsgiref httpd to expose our bottle wsgi app.                                                                           
55     '''
56     host = here['host']
57     httpport = here['httpport' ]
58     log.info('chat service starting. host: %s port: %s' % (
59         host, httpport))
60     app = chatroom.APP
61     httpd = wsgiref.simple_server.make_server(host, httpport, app)
62     try:
63         httpd.serve_forever()
64     except KeyboardInterrupt, e:
65         pass
66     log.info('chat service done')

Here, of note:

  • common.start.main() actually takes a callback, the “realmain” function, which in this case is serve_wsgiref. These calls needs to have a signature as follows: the log, logdir, argparse args, and the here dictionary, which is the socket-locked element we pull out of conf.
  • The handle to the bottle app is needed to hand off to the wsgiref simple_server. Yes, bottle has it own facilities to start up a wsgiref httpd but I thought it would be nice to show it in action here, especially since in another iteration of this code we’re going to be firing up different http request processing mechanism (like gevent). And yes, I know that bottle also supports starting with gevent. This, I think, counts against the loosely coupledness of the micro-framework, which is a bummer. But thankfully we don’t have to use it.  
  • We call serve_forever(), which, as you may imagine, servers for, um, ever.

The Chatroom Module

[link]

The chatroom module at this point is incredible simple; look for more complexity from the future!

For now, without further adue, the helloworld in all its boringtitude.

42 import bottle                                                          
43 import logging
44 
45 bottle.debug(True)
46 APP = bottle.default_app()
47 
48 @bottle.route('/hello')
49 def hello():
50     msg = 'ohno hello world!'
51     logging.info(msg)
52     bottle.response.content_type = 'text/plain'
53     return msg + '\n'

Ok, so maybe its not the most boring piece of code ever. There’s some interesting stuff going on here. It’s the most basic of route functions for bottle, we:

  • Decorate the method with bottle.route().
  • Call logging.info() with our message.
  • Set the content_type of the response to be plain text, and return it.

I’m not going to go to much into how bottle works, you can run through the excellent tutorial on the bottlepy site. What you should take away from this is how easy and simple http request routing is with it. 

There is some module level setup we do as well, namely reaching into the module level default bottle app and placing it here in this module for easier access by the service.py components, which it uses to start up the wsgiref server. We also call the module function debug(True), which will flag bottle to do stuff like write tracebacks encountered during request handling to stderr (and thus to the log we setup in common.start).

Running The Code (finally)

Finally time for the fun part! Remember we have 2 processes configured and all we have to do is run service.py twice for them to both fire up (because of how we do socket locking). Here’s the commandline:

$ python chatserver/service.py --logfile=chat.log --daemon
$ python chatserver/service.py --logfile=chat.log --daemon
$

Which result in logs that look like this:

$ tail -f log*/chat.log
==> log0/chat.log <==
[chatserver|2011-12-18 13:58:27,707|INFO] chat service starting. host: alkime port: 8081

==> log1/chat.log <==
[chatserver|2011-12-18 13:58:28,923|INFO] chat service starting. host: alkime port: 8071

Great! Our service processes are up and running, bound to different http ports. Out comes everyone’s favorite http command liner curl:

$ curl http://alkime:8081/hello
ohno hello world!
$ curl http://alkime:8071/hello
ohno hello world!
$

So far so good. We’ve hit the same endpoint, but on different ports/processes. The logs, as you’ll see, have some stuff in them. 

==> log0/chat.log <==
[chatserver|2011-12-18 13:58:40,014|INFO] ohno hello world!
[chatserver|2011-12-18 13:58:40,015|ERROR] alkime - - [18/Dec/2011 13:58:40] "GET /hello HTTP/1.1" 200 18

==> log1/chat.log <==
[chatserver|2011-12-18 13:58:44,541|INFO] ohno hello world!
[chatserver|2011-12-18 13:58:44,543|ERROR] alkime - - [18/Dec/2011 13:58:44] "GET /hello HTTP/1.1" 200 18

Bah, looks like the wsgiref defaults its access.log to go to stderr. We could jump through some hoops to fix this but since we’re really just using it as an example we’ll just leave it.

We now have a good starting point, with some hopefully more interesting bits for starting up a service rather then the boring old I can start a wsgi app tutorial. It should be too hard to extract this approach to something more service-like, where we start up N wsgi processes as daemons with their own logs in a relatively straightforward and predictable manner. This will serve as the foundation as we build out our horizontally scaling chat app. 

A Note On Importing

You may be wondering how we can have import statements in our code like this:

import conf

When we execute our command line like this:

$ python chatserver/service.py --logfile=chat.log --daemon

service.py, in the chatserver directory, is able to import the code in conf/ because we’re executing the code with the PYTHONPATH=.:..

This is a habbit I got into at Slide and it seems to work well here.

pygments

You may be asking yourself something like “damn those highlighted code snippets on GRD are dope. How do I get such sweet codez?” Well fear not gentle reader!! Not only have I had too much coffee this morning, but I’ve got the hookup for you. 

Check out pygments. As you might imagine, it lexes out various languages and outputs nicely to CSS’d html, images, etc. And its written in python, so you can actually utilize it as a programming API. 

In fact I was using one set of command lines so much I decided to write up a small script that’ll slice a file and output html with the correct line numbers and offsets. Here it is all gisted up for you.  

Happy highlighting!

Testing

As the realchat project gets more pieces, testing is starting to become more important (and to be honest, “more pieces” here means, like, 3… which is to say you might as well start doing testing with 1 piece… which is to say which is to say start of thinking about testing at the beginning!)

Case in point: as I’m working on the next bits, I encountered a bug in common/start.py. I was closing of the stdout/stderr file descriptors in daemon mode without also monkey patching them with a wrapper that behaves like the stream but actually writes to the log files we’ve set up. This way anything that does a sys.stdout.write() call, like say a small implementation of a wsgiref simple_server, will go to the logs and not barf out. Simple enough to get back on track.

You may notice, too, that I created a tests directory. As I was debugging this fix I created this simple test script to exhibit the bad behavior, based off the small example snippet on the python docs about wsgiref:

import sys                                                                                            
from wsgiref.simple_server import make_server
from wsgiref.simple_server import make_server, demo_app

httpd = make_server('alkime',   8111, demo_app)
print "Serving HTTP on port 8000..."

from common import start
try:
    start.daemonize('foo.pid')
except start.IAmParent, e:
    sys.exit(0)
# Respond to requests until process is killed
httpd.serve_forever()

I did this mostly because the error was a bit of a PITA because it was the result of closing down file descriptors inappropriately, one of which was the stderr fd, which makes it a weeee bit tricky to know what error is going down. 

Anyways, the real lesson for me, after thinking about it a bit, was that this script was already there in my code… I just didn’t realize it because I didn’t think about it as “a test script.” The piece of code was actually in the __main__ piece of common/start.py where I have a small integration test for those components. The bug was there so I should have gone there to look but instead I created a wholly new, wholly redundant piece of code when all I needed was to add a line or two to the current test, which I didn’t remember existed. I wasn’t in the mindset that I even had tests, so why would I go look for them? To mitigate this I created I moved my integration test to its own file and elevated it by putting it in its very own directory. There don’t you feel special.

So yah, the simplest testing lesson is: start out in the mind set that you have tests, and you’ll be more apt to do those tests. 

fcntl.flock reminder

A friend of mine the other day mentioned I should take a look at python fcntl.flock for doing process locking instead of the socket-oriented way I describe here

The primary reasoning for using sockets is because it gives nice “free” cleanup of the lock if the process should die. Apparently os level file locks do much the same cleanup, so it might be worth checking out. Certainly worth a look. 

Cython Callbacks

I’ve been loving my dive into cython. Well loving might be a little strong, considering learning about it has necessitated so many punch-me-in-the-nuts moments that I remember from C hacking days. Yet so far most of those moments have been tempered by moments at the end where something starts to work and I think, “oh that’s pretty cool.”

Here’s one of those moments. So in my “learning cython” project (more on this as it gets further along), I’m wrapping an existing C api. This api defines a type which is basically a big container for callbacks which get fired a various times in a session. In order to best describe such a type in cython, I thought it might be worth while dip my toe into cython-based functional programming.

This small example is a result, with some comments to explain the interesting bits:

foo_lib.h:

1 //simple struct with a function pointer                                                                                                                  
2 typedef struct Foo_t {
3     int  a;
4     int  b;
5     int (*func)(int i);
6 } Foo_t;

foo.pyx:

 1 cdef extern from "foo_lib.h":
 2     # mirror the C struct, including the function pointer
 3     # we can't just pass in the struct body here, the Cython compiler
 4     # appearently needs to know these symbols in order to, upon instantiation
 5     # below, translate them info the C stuff defined in foo_lib.h, which makes
 6     # sense.
 7     #
 8     ctypedef struct Foo_t:
 9         int a
10         int b
11         int (*func)(int i)
12 
13 # the function we'll be giving our Foo_t
14 #
15 cdef int munge(int i):
16     print 'munging', i
17     return i**i
18 
19 # a python function, in which we define a Cython type, intantiated with the
20 # munge() Cython function defined above, and call it.
21 #
22 def go(int a):                                                  
23     cdef Foo_t f  = Foo_t(a = a, b = 2, func = munge)
24     r = f.func(a)
25     print 'called with %s and returned %s' % (a, r)

After build_ext’ing this guy we have the foo.so module which can be imported. Running the go() method give us this:

>>> import foo; foo.go(2)
munging 2
called with 2 and returned 4
>>> 

spotifython [working title] teaser

libspotify.pxd:

 1 cdef extern from "libspotify/api.h":
 2     ctypedef enum sp_error:
 3         pass
 4     char* sp_error_message(sp_error error)
 5         
 6     char* sp_build_id()
 7                                                                                                                                                    
 8 cdef extern from 'appkey.h':
 9     ctypedef unsigned char uint8_t
10     uint8_t* g_appkey
11     size_t g_appkey_size

libspotify.pyx:

 1 cimport libspotify                                                      
 2
 3 def get_app_key():
 4     result = []
 5     for i in range(libspotify.g_appkey_size):
 6         result.append(libspotify.g_appkey[i])
 7     return result
 8 
 9 def get_build_id():
10     return libspotify.sp_build_id()
11 
12 def error_message(libspotify.sp_error err):
13     return libspotify.sp_error_message(err)

Music Service APIs

I’ve never really used spotify much since it’s become all the rage (and, I believe, the single biggest clogger of facebook’s vaguely creepy realtime feed). And I have to admit when they announced their developer platform a little while ago, I rolled my eyes a bit. Seriously, another platform?

But I have to say, after looking at it, especially libspotify which is a C API for interfacing with the music streams the service provides (!!), it seems like it might be, in a word, THESHIT.

Having all kinds of mad-scientisty thoughts… I think I might have found my first cython project… 

P.S. Another interesting online music service, centered around music recommendations—i.e., “songs like this song” a la pandora—is Echonest. Including here as a reminder to myself to check it out!

Small change to [realchat] common/start

Just commited a small change to the common.start logging setup to allow for a “sort-of” monkey patching of the python logging API. From the commit comment:

common.start.main() now takes a monkey_log flag param which will cause the Logger created by the setup (which routes logs to a file or stdout depending on startup params) to be monkey patched into the logging.root object so that module level calls will us it. So now code can just import logging and call logging.info(), for example, and it will log correctly without needing to have a handle on the Logger instance itself.

I term this a “sort-of” monkey patch because, while its changing something behind the scenes of a standard python api, the object being change is of the same class as the old object, just configured differently. I guess you might call it a monkey-config.

This issue came up because of my starting in on the web endpoints for the app using the bottle python wsgi framework. Unlike a lot of wsgi frameworks out there, bottle takes a module level approach to request routing instead of a object oriented based one (and I would argue this makes it more lightweight, which if you will recall, using lightweight, loosely coupled frameworks is a big part of this project).

Because it operates at module levels, the bottle API makes it tricky to do stuff like “create this object and pass it in to some framework object.” For example, a Logger object.

Luckily python’s logging api is fairly easy to coax into doing what we want:

131     if monkey:                                                                                            
132         logging.root = log
133         logging.Logger.root = logging.root 
134         logging.manaer = logging.Manager(logging.Logger.root)

This code snippet corresponds very closely with the module level code that sets up the root logger in the logging module (in python2.6). The reasoning here should become more clear once I get the commit in for the bottle stuff. Stay tuned.   

YAML (a reminder)

In discussing this project with a friend of mine, he pointed me at YAML as an option for doing configurations. I haven’t ever really explored it but it seems promising, so I’m putting it here as a reminder to myself to look at it again when the project’s a bit more fully formed (I think our use of the library config is good for now).