Mashups made of messages – tech talk at Open Hack London

More (very) rough notes from the weekend's Open Hack London event – please let me know of clarifications, questions, links or comments. You can also check out other posts here tagged openhacklondon.

Mashups made of messages, Matt Biddulph (Dopplr)

Dopplr's systems architecture lets them combine third-party systems with their own stuff without tying their servers up in knots.

At a rough count, Dopplr uses about 25 third party web APIs.

If you're going to make a web service or site, concentrate on the stuff you're good at. [Use what other people are good at to make yours ace.]

But this also means you're outsourcing part of your reliability to other people. For each bit of service you add, network latency puts another bit of risk into your web architecture. Use messaging systems to make server-side stuff asynchronous.

'&' is his favourite thing about Linux. It's fundamental in Unix that work is divided into packets, each doing the thing it does well, and not even very tightly coupled. Anything that can be run on the command line, you can stick & on the end and it runs in the background. You can forget about things running in the background – you don't have to manage the processes.
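A rough illustration of the '&' idea, not something shown in the talk: start a command, don't wait for it, and carry on. The helper names run_in_background and start_slow_job are made up for this sketch, and the 'slow job' is just a child Python process that sleeps.

```python
import subprocess
import sys


def run_in_background(args):
    """Start a command without waiting for it, roughly like `cmd &` in a shell."""
    return subprocess.Popen(
        args,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def start_slow_job(seconds):
    """Hypothetical slow task: a child Python process that only sleeps."""
    code = f"import time; time.sleep({seconds})"
    return run_in_background([sys.executable, "-c", code])
```

The caller gets a process handle back straight away and can either ignore it or check on it later with poll() or wait().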

Nothing in web apps is simple these days – lots of interconnected bits.

In the physical world, big machines use gearing – having different bits of system run at different speeds. Also things can freewheel then lock in to system again when done.

When building big systems, there's a worry that one machine, or one bit it depends on, can bring down everything else.

[Slide of a] Diagram of all the bits of the system that don't run because someone has sent an HTTP request – [i.e. background processes]

Flickr is doing less database work up front to make pages load as quickly as possible. They queue other things in the background. e.g. photos load, tags added slightly later. (See post 'Flickr engineers do it offline'.)

Enterprise Integration Patterns (Hohpe et al) is a really good book. Banks have been using messaging for years to manage the problems. Atomic packets of data can be sent on a channel – 'Email for applications'.

Designing – think about what needs to be done now, what can be done in the background? Think of it as part of product design – what has instant effect, what has slower effect? Where can you perform the 'sleight of hand' without people noticing/impacting their user experience?

Example using web services 1: Dopplr and AMEE. What happens when someone asks to see their carbon impact? A request for carbon data goes to Ruby on Rails (memory hungry, not the fastest thing in the world, try to take things off that and process elsewhere). Refresh user screen 'check back soon', send request to message broker (in JSON). Worker process connected to message broker sends request to AMEE. Update database.
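To make that flow concrete for myself, here's a minimal sketch in Python rather than Dopplr's Rails code. Everything in it is an assumption for illustration: an in-process queue.Queue stands in for the message broker, a dict stands in for the database, and the lookup callable is a placeholder for the slow call to AMEE.

```python
import json
import queue
import threading

broker = queue.Queue()  # stand-in for a real message broker
database = {}           # stand-in for the application database


def handle_web_request(user_id):
    """Web tier: enqueue a JSON message and return to the user immediately."""
    message = {"type": "carbon_request", "user_id": user_id}
    broker.put(json.dumps(message))
    return "Calculating your carbon impact - check back soon"


def worker(lookup):
    """Worker: take messages off the broker, call the slow service, update the database."""
    while True:
        raw = broker.get()
        if raw is None:  # sentinel value used here to shut the worker down
            broker.task_done()
            break
        message = json.loads(raw)
        # `lookup` is a hypothetical placeholder for the third-party request.
        database[message["user_id"]] = lookup(message["user_id"])
        broker.task_done()
```

The point is that handle_web_request never touches the slow service; it only writes a small message and returns, so a slow or broken third party can't hold up the page.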

Example using web services 2: Flickr pictures on Dopplr page. When you request a trip page, the page loads with all the usual stuff and an empty div in the page, with a piece of JavaScript on a timer that polls Flickr.

Keeping an open connection is a way to push messages to the client while it's waiting to do something.
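The timer-based polling in example 2 would be JavaScript in the browser, but the logic is simple enough to sketch in Python. This is my own illustration, not Dopplr's code: poll_until_ready and its parameters are invented names, and check is whatever function asks 'are the photos there yet?'.

```python
import time


def poll_until_ready(check, interval=0.01, max_attempts=50):
    """Call `check` on a timer until it returns something, or give up.

    `check` is a hypothetical callable that returns None while the
    background work is still running and the result once it is done.
    """
    for _ in range(max_attempts):
        result = check()
        if result is not None:
            return result
        time.sleep(interval)
    return None  # gave up: the page simply stays without the extra content
```

Giving up after a fixed number of attempts means the page still works if the third party never answers.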

When processing lots of stuff, worker processes write to memcache as a form of progress bar, but the process is actually disconnected from the webserver so load/risk is outsourced.
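A minimal sketch of the progress-bar idea, assuming a plain dict in place of memcache and made-up names (process_items, progress_for, and the 'progress:' key format). The worker writes a percentage as it goes; the web side only ever reads that one key.

```python
import threading

cache = {}                     # stand-in for memcache
cache_lock = threading.Lock()  # a real cache server would handle concurrency itself


def process_items(job_id, items, handle):
    """Worker side: do the slow work and record percentage progress in the cache."""
    total = len(items)
    for done, item in enumerate(items, start=1):
        handle(item)  # `handle` is a placeholder for the real per-item work
        with cache_lock:
            cache[f"progress:{job_id}"] = int(100 * done / total)


def progress_for(job_id):
    """Web side: a cheap read of the latest progress value, 0 if nothing is recorded yet."""
    with cache_lock:
        return cache.get(f"progress:{job_id}", 0)
```

The webserver never waits on the worker; it only looks up a small value, which is what keeps the load and risk away from the request path.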

'Sites built with glue and string don't automatically scale for free.' You can have many webservers, but the bottleneck might be in the database. Splitting work into message queues is a way of building so things can scale in parallel.
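Here's a rough sketch of why queues help with scaling in parallel: several workers pull from one queue, so adding capacity means adding workers. It uses threads and an in-process queue purely for illustration; run_workers and its parameters are my own names, and a real setup would use separate processes or machines talking to a message broker.

```python
import queue
import threading


def run_workers(jobs, handle, worker_count=4):
    """Spread `jobs` across several workers that all read from one shared queue."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:  # one sentinel per worker tells it to stop
                break
            outcome = handle(job)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return results  # order depends on which worker finished first
```

Nothing in the calling code changes when worker_count goes up, which is the property the talk was getting at.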

Slide of services, companies that offer messaging stuff. [Did anyone get a photo of that?]

Because of abstraction and with things happening in the background, it's a different flow of control than you might be used to – monitoring is different. You can't just sit there with a single debugger.

[Slide] "If you can't see your changes take effect in a system your understanding of cause and effect breaks down" – not just about it being hard to debug, it's also about user expectations.

I really liked this presentation – it's always good to learn from people who are not only innovating, but are also really solid on performance and reliability as well as the user experience.

[Update: a version of this talk is on the Dopplr blog with slides and notes.]