pystatsd & 5,000 Lists!

We’re working hard to improve Open Library’s general stability and performance, after a few harrowing weeks moving our hardware infrastructure around. We’re beginning to measure more stuff across the site, from general activity levels (about 40,000 catalog edits every month!) to quite specific actions (like, seeing that every second, 1-3 people open up our BookReader).

We’ve begun using a super awesome, real-time stats processing package called pystatsd, a Python implementation of Etsy’s statsd server. My favourite bit is a program that sits on top of that called graphite which takes all the stats we collect with pystatsd and renders them as graphs in a browser. Suddenly, we can see the system in a new and useful way.

We’re also looking hard at improving our memcached configuration, recently introducing another 4 memcached machines into our pool. Now that we can measure memcached hits and misses using pystatsd and graphite, we’ll be able to tell when our caching stuff is actually improving. Yay!

Memcached hits & misses

Another tweak you might find interesting… it used to be that lists would only show up on the main Lists page if they contained at least 3 seeds. The other day, Raj and I upped that to at least 5 seeds, and that immediately produced a selection of arguably more interesting lists, most of which settle around a subject area. Here’s a small selection:

Have you made a great list, or found someone else’s? Let us know in the comments!

Minimum Viable Record?

Having worked more closely with bibliographic data than I had ever expected to over the last couple of years, I still can’t quite believe how complicated it can be. I keep holding tight something Karen Coyle told me when I first started at Open Library, that “library metadata is diabolically rational.” Now that I’ve witnessed the cataloging from lots of different sources and am more familiar with the level of detail that’s possible in a library catalog, I have a new fondness for these intensely variegated information systems; at times devilishly detailed, at others wildly incomplete or arcanely abbreviated. Everyone likes to arrange things and classify them into groups. It’s when you try to get people to put things into groups that someone else has come up with that it starts getting messy.

Continue reading