Slides and talk from 'Cosmic Collections' paper

This is a lazy post, a straight copy and paste of my presentation notes (my excuse is that I'm eight days behind on everything at work and uni after being grounded in the US by volcanic ash). Anyway, I hope you enjoy it or that it's useful in some way.

Cosmic Collections: creating a big bang?


Slide 1 (solar rays – Cosmic Collections):

The Cosmic Collections project was based on a simple idea – what if we gave people the ability to make their own collection website? The Science Museum was planning an exhibition on astronomy and culture, to be called ‘Cosmos & Culture’. We had limited time and resources to produce a site to support the exhibition and we risked creating ‘just another exhibition microsite’. So what if we provided access to the machine-readable exhibition content that was already being gathered internally, and threw it open to the public to make websites with it?  And what if we motivated them to enter by offering competition prizes?  Competition participants could win a prize and kudos, and museum audiences might get a much more interesting, innovative site.
The idea was a good match for museum mission, exhibition content, technical context, hopefully audience – but was that enough?
Slide 2 (satellite dish):
Questions…
If we built an API, would anyone use it?
Can you really crowdsource the creation of collections interfaces?
The project gave me a chance to investigate some specific questions.  At the time, there were lots of calls from some quarters for museums to produce APIs for each project, but would anyone actually use a museum API?  The competition might help us understand whether or how we should invest in APIs and machine-readable data.
We can never build interfaces to meet the needs of every type of audience.  One of the promises of machine-readable data is that anyone can make something with your data, allowing people with particular needs to create something that supports their own requirements or combines their data with ours – but would anyone actually do it?
Slide 3 (map mashup):
Mashups combine data from one or more sources and/or data and visualisation tools such as maps or timelines.
I'm going to get the geek stuff out of the way and quickly define mashups and APIs…
Mashups are computer applications that take existing information from known sources and present it to the viewer in a new way. Here’s a mashup of content edits from Wikipedia with a map showing the location of the edit.
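To make the definition a bit more concrete, here is a minimal sketch of the joining step at the heart of a mashup. The sample data and field names (`editor_ip`, `lat`, `lon`) are made up for illustration and are not taken from the Wikipedia mashup on the slide; a real mashup would fetch both sources over the web and hand the result to a mapping library.

```python
# Hypothetical sample data: neither the field names nor the values come from
# the Wikipedia mashup shown on the slide.
edits = [
    {"article": "Astrolabe", "editor_ip": "192.0.2.10"},
    {"article": "Telescope", "editor_ip": "198.51.100.7"},
    {"article": "Orrery", "editor_ip": "203.0.113.5"},
]

# A second, independent source: a lookup from IP address to a rough location.
locations = {
    "192.0.2.10": {"city": "London", "lat": 51.5, "lon": -0.1},
    "198.51.100.7": {"city": "Montreal", "lat": 45.5, "lon": -73.6},
}


def mash_up(edits, locations):
    """Join each edit to its location, skipping edits we cannot place."""
    points = []
    for edit in edits:
        place = locations.get(edit["editor_ip"])
        if place is None:
            continue  # no location known for this editor
        points.append({"label": edit["article"], **place})
    return points


map_points = mash_up(edits, locations)
for point in map_points:
    print(f'{point["label"]}: {point["city"]} ({point["lat"]}, {point["lon"]})')
```

The interesting part is that neither source needed to know about the other: the mashup only relies on a shared key (here, the IP address).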
Slide 4 (APIs)
APIs (Application Programming Interfaces) are a way for one machine to talk to another: ‘Hi Bob, I’d like a list of objects from you, and hey, Alice, could you draw me a timeline to put the objects on?’
APIs tell a computer, 'if you go here, you will get that information, presented like this, and you can do that with it'.
A way of providing re-usable content to the public, other museums and other departments within our museum – we created a shared backend for web and gallery interactives.
I think of APIs as user interfaces for developers and wanted to design a good experience for developers with the same care you would take for end users*. I hoped that feedback from the competition could be used to improve the beta API.
* we didn’t succeed in the first go but it’s something to aim for post-beta
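As a rough illustration of 'if you go here, you will get that information, presented like this', here is a small sketch of what a developer's side of that conversation might look like. The endpoint `api.example.org`, the parameter names and the shape of the response are all assumptions invented for this example; they are not the actual Cosmic Collections API. The response is canned so the sketch runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://api.example.org/objects"


def build_request_url(keyword, limit=10):
    """'If you go here...': the address a developer would request."""
    return BASE_URL + "?" + urlencode({"q": keyword, "limit": limit, "format": "json"})


def parse_objects(response_text):
    """'...you will get that information, presented like this.'"""
    payload = json.loads(response_text)
    return [(item["id"], item["title"]) for item in payload["objects"]]


# A canned response standing in for what such an API might return.
SAMPLE_RESPONSE = '{"objects": [{"id": 1, "title": "Astrolabe"}, {"id": 2, "title": "Orrery"}]}'

url = build_request_url("astronomy", limit=2)
objects = parse_objects(SAMPLE_RESPONSE)
print(url)
print(objects)
```

What you 'can do with it' is then up to the developer: the list of objects could feed a timeline, a map or a search page.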
Slide 5: (what if nobody came?)
AKA 'the fears and how to deal with them'
Acknowledge those fears
Plan for the worst case scenario
Take a deep breath and do it anyway
And on the next slides, the results.  If I was replicating the real experience, you’d have several nerve-biting months while you waited for the museum to lumber into gear, planned the launch event, publicised the project in the participant communities… Then waited for results to come in. But let’s skip that bit…
Slide 6: (Ryan Ludwig's http://www.serostar.com/cosmic/)
The results – our judges declared a winner and a runner-up. These are screenshots; this one is the second-prize-winning entry.
People came to the party. Yay! I'd like to thank all the participants, whether they submitted a final entry or not. It wouldn't have worked without them.
Slide 7: (Natalie and Simon's http://cosmos.natimon.com/)
This is a screenshot from the winning site – it made the best use of the API and was designed to lure the visitor in and keep drawing them through the site.
(We didn’t get subject specialists scratching their own itch – maybe they don’t need to share their work, maybe we didn’t reach them. Would like to reach researchers, let them know we have resources to be used, also that they can help us/our audiences by sharing their work)
Slide 8: (astrolabe – what did we learn?)
People need (more) help to participate in a geektastic project like this
The dynamics of a competition are tricky
Mashups are shaped by the data provided – you get out what you put in
Can we help people bring their own content to a future mashup?
Slide 9: (evaluation)
I did a small survey to evaluate the project… Turns out the project was excellent outreach into the developer community. People were really excited about being invited to play with our data.  My favourite quote: "The very idea of the competition was awesome"
Slide 10: (paper sheet)
Also positive coverage in technical press. So in conclusion?
Slide 11: (Tim Berners-Lee):
“The thing people are amazed about with the web is that, when you put something online, you don’t know who is going to use it—but it does get used.”
There are a lot of opportunities and excitement around putting machine-readable data online…
Slide 12: Tim Berners-Lee 2:
But:  It doesn’t happen automatically; It’s not a magic bullet
But people won't find and use your APIs without some encouragement. You need to support your API users. People outside the museum bring new ideas but there's still a big role for people who really understand the data and audiences to help make it a quality experience…
Slide 13 (space):
What next?
Using the feedback to focus and improve collection-wide API
Adding other forms of machine-readable data
Connecting with data from your collections?
I've been thinking about how to improve APIs – offer subject authorities with links to collections, embed markup in the collections pages to help search engines understand our data…
I want more! The more of us with machine-readable data available for re-use, the better the cross-collections searches, the region or specialism-wide mashups… I'd love to be able to put together a mashup showing all the cultural heritage content about my suburb; all the Boucher self-portraits; all the inventions that helped make the Space Shuttle work…
Slide 14: (thank you)
If you're interested in possibilities of machine-readable data and access to your collections, join in the conversation on the museum API wiki or follow along on twitter or on blogs.  Join in at http://museum-api.pbworks.com/
More at https://openobjects.org.uk/ or @mia_out

Image credits include:
http://antwrp.gsfc.nasa.gov/apod/ap100415.html
http://antwrp.gsfc.nasa.gov/apod/ap100414.html
http://antwrp.gsfc.nasa.gov/apod/ap100409.html
http://antwrp.gsfc.nasa.gov/apod/ap100209.html
http://antwrp.gsfc.nasa.gov/apod/ap100315.html
http://www.sciencemuseum.org.uk/Centenary/Home/Icons/Pilot_ACE_Computer.aspx

Mash the state

Cosmic Collections – the results are in. And can you help us ask the right questions?

For various reasons, the announcement of the winners of our mashup competition has been a bit low key – but we're working on a site that combines the best bits of the winners, and we'll make a bit more of a song and dance about it when that's ready.

I'd like to take the opportunity to personally thank the winners – Simon Willison and Natalie Down in first place, and Ryan Ludwig as runner-up – and equally importantly, those who took part but didn't win; those who had a play and gave us some feedback; those who helped spread the word, and those who cheered along the way.

I have a cheeky final request for your time.  I would normally do a few interviews to get an idea of useful questions for a survey, but it's not been possible lately. I particularly want to get a sense of the right questions to ask in an evaluation because it's been such a tricky project to explain and 'market', and I'm far too close to it to have any perspective.  So if you'd like to help us understand what questions to ask in evaluation, please take our short survey http://www.surveymonkey.com/s/5ZNSCQ6 – or leave a comment here or on the Cosmic Collections wiki.  I'm writing a paper on it at the moment, so hopefully other museums (and also the Science Museum itself) will get to learn from our experiences.

And again – my thanks to those who've already taken the survey – it's been immensely useful, and I really appreciate your honesty and time.

Nine days to go! And entering Cosmic Collections just got easier

Quoting myself over on the museum developers blog, Cosmic Collections – do one thing and do it well:

I’ve realised that there may be some mismatch between the way mashups tend to work, and the scope we’ve suggested for entries to our competition. The types of interfaces someone might produce with the API may lend themselves more to exploring one particular idea in depth than produce something suitable for the broadest range of our audiences.

So I’m proposing to change the scope for entries to the competition, to make it more realistic and a better experience for entrants: I’d like to ask you to build a section of a site, rather than a whole site. The scope for entrants would then be: “create something that does one thing, and does it well”. Our criteria – use of collections data, creativity, accessibility, user experience and ease of deployment and maintenance – are still important but we’ll consider them alongside the type of mashup you submit.

I've updated the Cosmic Collections competition page to reflect this change. This page also features a new 'how to take part' section, including a direct link to the API and to a discussion group.

I'd love to hear your thoughts on this change – there's an email address lurking on the competition page, and I'm on twitter @mia_out and @coscultcom.

In other news, programmableweb published a blog post about the competition today: Science Museum Opens API and Challenges Developers to Mashup the Cosmos. Woo!

And I don't know if it's any kind of consolation if you're entering, but I'll be working right alongside you up until Friday 28th, on an assignment for my MSc.

'Cosmic Collections' launches at the Science Museum this weekend

I think I've already said pretty much everything I can about the museum website mashup competition we're launching around the 'Cosmos and Culture' exhibition, but it'd be a bit silly of me not to mention it here since the existence and design of the project reflects a lot of the issues I've written about here.

If you make it along to the launch at the Science Museum on Saturday, make sure you say hello – I should be easy to find cos I'm giving a quick talk at some point.
Right now the laziest thing I could do is to give you a list of places where you can find out more:
Finally, you can talk to us @coscultcom on twitter, or tag content with #coscultcom.
Btw – if you want an idea of how slowly museums move, I think I first came up with the idea in January (certainly before dev8D because it was one of the reasons I wanted to go) and first blogged about it (I think) on the museum developers blog in March. The timing was affected by other issues, but still – it's a different pace of life!

How I do documentation: a column of bumph and a column of gold

All programmers hate documentation, right? But I've discovered a way to make it less painful and I'm posting in case it helps anyone else.

The first trick is to start documenting as soon as you start thinking about a project – well before you've written any code. I keep a running document of the work I've done, including the bits I'm about to try, information about links into other databases or applications, issues I need to think about or questions I need to ask someone, rude comments (I know, I look like such a nice girl), references, quick use cases, bits about functions, summary notes from meetings, etc.

Mostly I record by date, blog style. Doing it by date helps me link repository files, paper notes and emails with particular bits of work, which can otherwise be tricky if it's a while since you worked on a project or if you have lots of projects on the go. It's also handy if you need to record the time spent on different projects.

I just did it like this for a while, and it was ok, but I learnt the hard way that it takes a while to sort through it if I needed to send someone else some documentation. Then I made a conscious decision to separate the random musings from the decisions and notes on the productive bits of code.

So now my document has two columns. This first column is all the bumph described above – the stuff I'd need if I wanted to retrace my steps or remind myself why I ended up doing things a certain way. The second column records key decisions or final solutions. This is your column of gold.

This way I can quickly run down the items in the second column, organise it by area instead of by date and come up with some good documentation without much effort. And if I ever want to write up the whole project, I've got a record of the whole process in the column of bumph.
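If you keep your notes somewhere scriptable rather than in a word processor, the same idea could be sketched in a few lines. This is only an illustration under my own assumptions: the `Entry` fields, the `area` tag and the sample notes are hypothetical, and a plain two-column table works just as well.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    date: str        # entries are recorded by date, blog style
    area: str        # hypothetical tag, used only to regroup the gold later
    bumph: str       # first column: working notes, questions, dead ends
    gold: str = ""   # second column: key decision or final solution, if any


log = [
    Entry("2008-04-01", "search", "Tried two indexing approaches; first too slow."),
    Entry("2008-04-02", "search", "Asked about field list.", gold="Index title and maker fields only."),
    Entry("2008-04-03", "import", "Export has odd dates.", gold="Normalise dates to ISO format on import."),
]


def column_of_gold(entries):
    """Collect the non-empty second-column items, grouped by area instead of date."""
    by_area = defaultdict(list)
    for entry in entries:
        if entry.gold:
            by_area[entry.area].append(entry.gold)
    return dict(by_area)


summary = column_of_gold(log)
for area, decisions in summary.items():
    print(area, decisions)
```

The column of bumph stays untouched in the log, so the full history is still there if you ever need to retrace your steps.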

You could add a third column to record outstanding tasks or questions. I tend to mark these up with colour and un-colour them when they're done. It just depends how you like to work.

It's amazingly simple, but it works. I hope it might be useful for you too. Or if you have any better suggestions (or a better title for this post), I'd love to hear them.

Notes from Advanced Web Development: software strategies for online applications at MW2008

These are my notes from the Advanced Web Development: software strategies for online applications workshop with Rob Stein, Charles Moad and Edward Bachta from the Indianapolis Museum of Art at Museums and the Web 2008 (MW2008) in Montreal. I don't know if they'll be useful for anyone else, but if you have any questions about my notes, let me know.

They had their slides online before the presentation, which was really helpful. [More of this sort of thing, please! Though I wish there was a way to view thumbnails of slides on slideshare so you can skip to particular slides.]

The workshop covered a lot of ground, and they did a pretty good job of pitching it at different levels of geekdom. Some of my notes will seem self-evident to different types of geeks or non-geeks but I've tried to include most of what they covered. I've put some of my own comments in [square brackets].

They started with the difference between web pages and web applications, and pointed out that people have been building applications for 30 years, so you should build on what already exists.

Last year's talk was about 'web 2.0' and the foundations of building solid software applications but since then APIs/SDKs have taken off. Developers should pick pieces that already work rather than building from the bottom up. The craft lies in knowing how to choose the components and how to integrate them.

There are still reasons to consider building your own APIs e.g. if you have unique information others are unlikely to support adequately, if you care about security of data, if you want to control the distribution of information, or if a guarantee of service is important (e.g. if vendors disappear).

Building APIs
They're using model-driven development, with an XML Schema or a database as the model.

Object relational mappers (ORMs) provide object-oriented access to a database. Data model changes are picked up automatically and they're generally database-agnostic so you can swap out the back end. Object relational mappers include ActiveRecord (Ruby on Rails), Hibernate (Java, with NHibernate for .Net), Propel (PHP) and SQLAlchemy (Python).

IMA use Hibernate with EMu (their collections management system) and Propel. They've built an 'adaptive layer' for their collection that glues it all together.
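[To show the general idea of object-relational mapping, here's a toy sketch using only Python's standard library. It is not how Hibernate, Propel or the IMA's adaptive layer work; the `Artwork` model and the helper functions are hypothetical and leave out everything a real ORM handles, such as column types, relationships and migrations.]

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Artwork:
    # Hypothetical model; in model-driven development this class is the
    # single description of the data that everything else is derived from.
    id: int
    title: str
    maker: str


def create_table(conn, model):
    """Derive the table definition from the model's fields."""
    columns = ", ".join(f.name for f in fields(model))
    conn.execute(f"CREATE TABLE {model.__name__.lower()} ({columns})")


def save(conn, obj):
    """Store an object as a row without hand-writing column names."""
    names = [f.name for f in fields(obj)]
    placeholders = ", ".join("?" for _ in names)
    values = [getattr(obj, name) for name in names]
    table = type(obj).__name__.lower()
    conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values)


def find_all(conn, model):
    """Read rows back as objects rather than raw tuples."""
    names = ", ".join(f.name for f in fields(model))
    rows = conn.execute(f"SELECT {names} FROM {model.__name__.lower()} ORDER BY id")
    return [model(*row) for row in rows]


conn = sqlite3.connect(":memory:")
create_table(conn, Artwork)
save(conn, Artwork(1, "Untitled landscape", "Unknown"))
save(conn, Artwork(2, "Study of a hand", "Unknown"))
artworks = find_all(conn, Artwork)
print(artworks)
```

[Adding a field to `Artwork` changes the table definition, the insert and the select in one go, which is the 'data model changes are picked up automatically' point in miniature.]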

Slide on Eclipse: 'rich client platform', not just an IDE. Supports nearly every language except .Net; is cross-platform.

Search
Use full-text indexes for good search functionality. They suggest Lucene (from apache.org) or Google gears. Lucene query types offer finer control than Google e.g. fielded searching [a huge draw for specialist collections searches], date range searching, sorting by any field, multiple index searching with merged results. Fast, low memory usage, extensible. Tools built on Lucene include Nutch (web crawler) and Solr – REST and SOAP API.
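[A rough sketch of what fielded searching, date range searching and sorting by any field mean in practice, built around a tiny inverted index. This is plain Python written for illustration, not the Lucene API, and the records are invented.]

```python
from collections import defaultdict

# Hypothetical records, invented for illustration.
records = [
    {"id": 1, "title": "Brass astrolabe", "maker": "Unknown", "year": 1650},
    {"id": 2, "title": "Reflecting telescope", "maker": "Short", "year": 1740},
    {"id": 3, "title": "Brass telescope", "maker": "Dollond", "year": 1780},
]


def build_index(records, text_fields):
    """Map (field, word) -> set of record ids: a tiny inverted index."""
    index = defaultdict(set)
    for record in records:
        for field in text_fields:
            for word in record[field].lower().split():
                index[(field, word)].add(record["id"])
    return index


def search(index, records, field, word, year_from=None, year_to=None, sort_by="id"):
    """Fielded lookup, optional date range filter, then sort by any field."""
    ids = index.get((field, word.lower()), set())
    hits = [r for r in records if r["id"] in ids]
    if year_from is not None:
        hits = [r for r in hits if r["year"] >= year_from]
    if year_to is not None:
        hits = [r for r in hits if r["year"] <= year_to]
    return sorted(hits, key=lambda r: r[sort_by])


index = build_index(records, ["title", "maker"])
brass = search(index, records, "title", "brass", sort_by="year")
later_telescopes = search(index, records, "title", "telescope", year_from=1750)
print([r["id"] for r in brass])
print([r["id"] for r in later_telescopes])
```

[Because the index is keyed on the field as well as the word, a search for 'short' in the maker field won't match an object that merely has 'short' in its title, which is exactly what specialist users tend to want.]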

Bite size web components and suggestions for a web application toolkit
Harking back to the 'find good components' thing. Leverage someone else's work, and reduce dev/debugging costs – in their experience it produces fewer errors than writing their own stuff.

Storage – Amazon, Nirvanix, XDrive, Google, Box.net. Use Amazon S3 if accessed infrequently cos of fee structure.

Video – YouTube, Revver, blip.tv also have developer interfaces. The IMA don't host any video on their website, it's all on YouTube.

Images – Flickr, Picasa. [But the picasa UI sucks so please don't inflict that on your users!]. Flickr support for REST, SOAP, JSON.

Compute (EC2, Amazon Web Services) – Linux virtual machines. Custom disk images for specific requirements. Billable on use. See slides on costs for web hosting.

Authentication services – OpenID, OAuth.

Social computing
Consider social computing when developing your web applications – it's evolving rapidly and is uncertain. Facebook vs OpenSocial (might be the question today, but tomorrow?). Stick with the eyeballs and be ready to change. [Though the problem for museums thinking about social software applications remains – by the time most museums go through approval processes to get onto Facebook it'll be dead in the water. Another reason to have good programmers on staff and include content resources in online programs, so that teams can be more flexible while still working within the overall online strategy of their organisation.]

Developing on Facebook
Facebook API – REST-based API. Use their developer platform – simpler than original API calls. JSON simpler than XML responses. Facebook Query Language (FQL) reduces calls to API. Facebook Markup Language (FBML): HTML plus Facebook-specific features, including security controls and interface features. [There's a pronoun tag with built-in 'they' if not sure of gender of person. Cute.] Lots more in their slides.

Widget frameworks
Widgets are the buzzword that hasn't quite taken off. The utility isn't quite there yet, so what are they used for? Players are Google, Netvibes (supports more platforms including Apple Mac dashboard, Yahoo, iGoogle, etc) but is Adobe AIR the widget killer? Flash-based runtime for desktop apps. e.g. twhirl. Run as background processes, and can access desktop files directly, clipboard, drag and drop. [I downloaded the AIR Google Analytics application during the session, it's a good example.]

Content management
The CMS is the container to put all the components together. A good CMS will let you integrate components into a new site with a minimum of effort. [Wouldn't that be nice?] Examples include Joomla, WordPress, Drupal, Plone.

There aren't slides for the next 'CMS tour' bit, but they gave some great examples.

Nature holds my camera: they tried visitor blogging with a terminal in gallery so people could ask questions.

They talked about the IMA dashboard. [I asked a vague question about whether there was a user-driven or organisational business case for it – turns out it was driven by their CEO's interest in transparency, e.g. in sharing how they invest monies, track stats and communicate with their visitors. It helps engender trust and loyalty e.g. for donors. Attendance drives corporate sponsorship so there was a business case. It's also good for tracking their performance against actual actions vs stated goals.]

The advantages of using a web application toolkit – theromansarecoming.com took $50,000 to build for a four month exhibition. It hit the goals but was expensive. [The demo looked really cool, it's a shame you don't seem to be able to access it online.]

Breaking the Mode was built using existing components on the technical side, but required the same content investment (i.e. in-house resources) as The Romans Are Coming. Communication was much easier because it was built in-house, with less need to explain things to external developers, which had some effect on the cost [but the biggest saving was in re-usable components]. The site took 25 hours to build and IT staff costs were about $1000. [So, quite a saving there.]

They demonstrated 'athena', the IMA's intranet. It has file sharing and task management and is built on drupal, looks a bit like basecamp-lite but without licensing issues. "Everything you do in a museum is project-based" and their intranet is built to support that.

There was discussion about whether their intranet could be shared with other museums. Rob Stein is a firm believer in open source and thinks it's the best way to go for the museum sector. They're willing to share the source code but don't have the facilities to support it. There's a possibility that they could partner with other institutions and jointly pay small vendors to support it.

[I could hear a sudden burst of keyboards clicking around me as the discussion went onto pooling resources to create and support open source applications for stuff museums need to do. Smaller museums (i.e. most of us, and most are much smaller than MoL) don't have the resources for bespoke software or support but if we all combined, we'd be a bigger market. Overall, it was a really good, grounded discussion about the realities and possibilities of open source development.]

Back to the slides…

Team Troubles
[It was absolutely brilliant to see a discussion of teamwork and collaboration issues in a technical session.]

Divide and conquer – allow team members to focus on area of expertise. Makes it easier to swap out content and themes.

They're using MVC – Model (data management), Controller (interaction logic), View (user interface). They had some good stuff on MVC and the web in their slides (around 77-79). They also discussed the role of non-technical team members.
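[For anyone who hasn't met MVC before, here's a minimal sketch of the three roles. The class names are my own and this isn't taken from the IMA's slides; a real web framework would add routing, templates and persistence.]

```python
class ArtworkModel:
    """Model: owns the data and knows nothing about presentation."""

    def __init__(self):
        self._titles = []

    def add(self, title):
        self._titles.append(title)

    def all(self):
        return list(self._titles)


class ArtworkView:
    """View: turns data into output; here plain text, but it could be HTML."""

    def render(self, titles):
        return "\n".join(f"- {title}" for title in titles)


class ArtworkController:
    """Controller: handles a request by talking to the model, then the view."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def handle_add(self, title):
        self.model.add(title.strip())
        return self.view.render(self.model.all())


controller = ArtworkController(ArtworkModel(), ArtworkView())
controller.handle_add("  Study of a hand ")
page = controller.handle_add("Untitled landscape")
print(page)
```

[This is where the 'divide and conquer' point comes in: someone could replace `ArtworkView` with a new theme without touching the model or the controller.]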

Drupal boot camp
[This was a pretty convincing demo of getting started with Drupal and using the Content Construction Kit (CCK) to create custom content types e.g. work of art to publish content quickly, though I did wonder about how it integrated with ORMs that would automatically pick up an underlying data structure. Slide 103 showed recommended Drupal modules. It's definitely worth checking out if you're looking for a CMS. If you're on Windows, check out bitnami for installation.]

Client side development
"The customer is always right"
They talked about the DOM (document object model) and javascript for Web 2.0 coolness.
They recommended using Javascript toolkits – more object-orientated, solve cross-browser issues, rapid development. Slide 109 listed some Javascript toolkits and they also recommended Firebug.

Interface components
They should be re-usable, just like the server-side stuff. They showed some suggestions like reCAPTCHA, image carousels and rating modules. Pick the tools with best community support and cross-platform support.

CSS boilerplates
Treat CSS like another software component of web design and standardise your CSS usage. Use structured naming for classes and divs in server-side content generation. Check out oswd.org for free templates.

XML in the real world
They demonstrated Global Origins (more on that and other goodness at www.ima-digital.org/special-projects) which uses XML driven content.

Questions and discussion
I asked about integration with legacy/existing systems. Their middleware component 'Mercury' binds their commercial packages and other applications together. e.g. collection management system extraction layer. [This could be a good formalised model for MoL, as we have to pull from a few different places and push out to lots more and it's all a bit ad hoc at the moment. I think we'll be having lots of good discussions about this very soon.]

Some discussion about putting pressure on vendors to open data models. It's a better economic model for them and for museums.

Their CEO is supportive of iteration (in the development process). The web team is cross-department, and they have new media content creators.

[I was curious about how iterative development and the possibility of making mistakes work with their brand but didn't want to ask too many questions]

They made the point that you have a bigger recruiting pool with open source software. [Recruiting geeks into museums has been a bit of a conference meme.]

They give away iPods for online surveys and get more responses that way, but you do have to be aware that people might only give polite answers to survey questions so pay close attention to any criticism.

The IMA say you should be able to justify the longevity of projects when experimenting. Measure your projects against your mission, and ask how well they implement your mission statement.

So, that's it! I hope I didn't misrepresent anything they said.

Common Craft have produced videos on RSS in Plain English, Social Bookmarking in Plain English, Wikis in Plain English and Social Networking in Plain English (via Groundswell)

Also worth a look, Google Code for Educators "provides teaching materials created especially for CS educators looking to enhance their courses with some of the most current computing technologies and paradigms". They say, "[w]e know that between teaching, doing research and advising students, CS educators have little time to stay on top of the most recent trends. This website is meant to help you do just that" and it looks like it might also be useful for busy professionals who want to try new technologies they don't get time to play with in their day jobs (via A Consuming Experience).

Also from A Consuming Experience, a report on a talk on "5 secrets of successful Web 2.0 businesses" at the June London Geek Dinner.

On a random note, I noticed that the BBC have added social bookmarking to their news site:

I wonder if this marks the 'mainstreaming' of social bookmarking.