March 2007 – Open Objects

IBM helps blind 'see' web video

"Technology giant, IBM, is soon to launch a multimedia browser to make audio and video content accessible to people with vision impairments."

"Geeks get a kick out of the creative part of engineering" from below, in comic form.

BCS Lovelace Lecture, Tim Berners-Lee

This is my informal write-up of my notes from the British Computer Society's Lovelace lecture on March 13, 2007. They're very much written from my point of view; for a more objective version you can also view the talk slides at http://www.w3.org/2007/Talks/0313-bcs-tbl/, the BCS report on the event, video or check out this report with video and transcripts.

The lecture was introduced by Nigel Shadbolt (who is also involved in the Web Science Research Initiative).

Berners-Lee's talk was titled 'Looking Back, Looking Forward' and sub-titled 'The process of designing things in a very large space'.

He talked about philosophical engineering, as physics used to be called, and introduced the concept of the Semantic Web as a philosophical space.

He introduced the ideas of microscopic rules or design and macroscopic behaviour; that rules are both social and technical and that designing a system involves social conventions as well as technical solutions.

Complexity is introduced between the micro and macro stages. Web science happens when the macro effect is analysed, and issues become clear. Values can be applied at this point, and a creative response to these issues results in another idea. The 'magic' happens at the points of collaboration/complexity and creativity. Magic can be defined as 'stuff you don't understand (yet)'. One slide was about the division between science and engineering in the process.

As an example of this system, email began as a micro solution that suited a friendly community of academics; once it scaled up to an unfriendly world, we got spam. The issue now is how we deal with spam.

The idea that started the web was not being able to access information (there was lots of stuff around but it was all over the place). The technical solution was protocols (URI, HTTP, HTML), the social was incentives to link, e.g. personal collections of bookmarks, lists of URLs to answer FAQs. As the web exploded and went from micro to macro, the issue became not being able to find stuff. This leads to the issues we're dealing with now.

The talk then went on to the reasons the web worked (or the essentials for it to work):

universality (across a range of factors, see slide)

layering

Layering worked on a number of levels: the internet was application-independent, the web was application-independent. Berners-Lee described it as the difference between foundation vs. ceiling technologies. Clean, firm, foundation technologies such as TCP/IP have hooks all over them that allow unimagined innovation.

There was discussion of Google, wikis and blogs. In the original web model, everybody could write. HTML links in other documents should be easy to make (e.g. click, click, save and publish) but it didn't happen at the time.

[The Google slide reminded me that web searches got less fun as Google's algorithms got better – there was much less randomness. I guess all the kinks were smoothed out.]

So the issue was that people couldn't write stuff – wikis were a technical solution, and the social solution was to throw out the permissions model so that everybody can write. The new issue is wiki 'battles'. The idea may be a wiki process, possibly leading to meritocratic systems.

With the Semantic Web, the issue was that web data couldn't be re-used as it was only exposed as HTML. In the model presented, the initial idea is data sharing. The social solution is to use URIs, make useful stuff and useful links, agree on ontologies and share them; pages with URIs link to other pages with URIs. The technical solution is to 'use URIs for documents and concepts', RDF, OWL, SPARQL, RIF (which I assume is Rule Interchange Format having googled it later) and the 'same ladder of authority'. The Semantic Web is 'data but also a web' at the micro level and becomes FOAF or life sciences at the macro level.

[Which led me to wonder, what issues will arise when we get to the macro level of the SW? And throughout the lecture, the question of ontologies kept worrying me – is relying on them realistic?]

URIs: in the Semantic Web, everything has a URI. Not just things – give terms a URI, e.g. don't just say 'blue' – give the URI of that colour blue (give domain [of knowledge]). This allows you to provide the definitive meaning of that term at your URI (and in that way gives ownership of that definition).
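As a rough illustration of giving a term its own URI, the sketch below stores statements as plain (subject, predicate, object) tuples in Python. All the example.org URIs and the `describe` helper are invented for this sketch; they don't come from the talk or from any real vocabulary, and a real system would presumably use RDF tooling rather than tuples.

```python
# Hypothetical URIs: example.org is a placeholder, not a real vocabulary.
BLUE = "http://example.org/colours#blue"
VASE = "http://example.org/objects/vase-42"
HAS_COLOUR = "http://example.org/terms#hasColour"
LABEL = "http://example.org/terms#label"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (VASE, HAS_COLOUR, BLUE),  # points at the concept's URI, not the string 'blue'
    (BLUE, LABEL, "blue"),
    (BLUE, LABEL, "bleu"),
]


def describe(subject, statements):
    """Return every (predicate, object) pair recorded for a subject URI."""
    return [(p, o) for s, p, o in statements if s == subject]


print(describe(BLUE, triples))
```

Because the vase points at the URI rather than the word, whoever publishes that URI gets to say what 'blue' means there, and other labels or definitions can be attached later without touching the vase's record.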

The next slides were about the Semantic Web model, including the dream of a unifying logic, passing proof (of identity) around and trusted systems and the SW as a language for explaining the use of Public (PGP) keys; and current Semantic Web work, including the Semantic Web Interest Group.

The lecture moved on to the shapes of data and how they have changed – from lines (tapes, cards) to matrix/tables/boxes (databases) to trees (SGML, XML, top-down structured design, OO) to webs/nets (the internet? www?).

Berners-Lee made the point that the web is not a spider's web – it has no centre.

[All of which made me wonder – is the requirement for ontologies making trees of nets?
How can the tree-like structure of ontologies mesh with nets or webs?]

The next slides were on the idea of applications connected by concepts and the fractal web of concepts – a 'tangle' across boundaries of scale, varying access level, local and global standards, and personal interactions on multiple scales. "The semantic web is about allowing data systems to change by evolution not revolution".

Berners-Lee also discussed the dream of politics and democracy in a civilised society.

TCO: Total Cost of Ontologies – it's a small overall cost. The lesson is: "do your bit, others will do theirs".

The challenges of web science:

user interface challenges (domain-specific vs generic)
data policy challenges (e.g. identity, privacy, transparency)
resilience
new devices (smarter and cheaper devices; developing countries).

The final slides were on "intercreativity" – when the top and bottom magics are happening together, and the idea of the "connection of half-formed ideas" (as a good thing) and web science.

That's where my notes end, except that I'd also noted that Berners-Lee said 'geeks get a kick out of the creative part of engineering', because it really resonated; and that he had designed the web on a NeXT machine and presented the slides in Safari on a Mac.

Is My Web Site Ineffective? 'Two Ultimate Web Design Checklists' for non-profit organisations.

Libraries and Web 2.0

Via Wired's article on Web 2.0 in libraries, I found the fabulous resources 'Web 2.0 and Libraries: Best Practices for Social Software' and 23 Learning 2.0 things: "23 Things (or small exercises) that you can do on the web to explore and expand your knowledge of the Internet and Web 2.0" at Learning 2.0, an "online self-discovery program that encourages the exploration of web 2.0 tools and new technologies".

Semantic Web Think Tank, Cambridge, March 2007

Here's a write-up of the discussions of the Semantic Web Think Tank meeting held at the Museum of Anthropology and Archaeology, Cambridge, on March 19, 2007.

I can't promise my notes are complete, or that I've correctly attributed everyone's comments, but in the interests of having something for people to react to before the transcripts can be done, here it is. [??] means I didn't note down in time who said it, and other bits in square brackets are where I can't remember if I said it or just noted it. I've posted it as is but if you were there, let me know of any corrections, additions, etc.

Key:
Jeremy Ottevanger (JO), Frances Lloyd Baynes (FLB), Paul Shabajee (PS), Suzanne Keene (SK), Robin Boast (RB), Richard Light (RL), Mike Lowndes (ML), Dylan Edgar (DE), Alex Whitfield (AW), Brian Kelly (BK), Ross Parry (RP), Mia Ridge (MR), Nick Poole (NP), Jon Pratty (JP).

Ross Parry: Introduction and welcome.

What should the final output of the workshops be? We will need to present our findings at the June UK Museums On The Web in Leicester.

Possibly "A Netful of Jewels 2"? It could reflect on opportunities of last ten years and look at current state.

We've spent a lot of time discussing the technical problems but what about the conceptual problems?

NP would want the afternoon conversation to be embedded in real world practice, funding and technology.

RP asked everyone around the table to introduce themselves and their goals for the workshops.

MR: Achievable goals, practical recommendations for museums. Infrastructure and standards should be interoperable and reusable.
JO: high aspirations, low barriers to entry.
FLB: embedded in real world, achievable.
SK: unified view, future proof.

[??] Research labs vs real world: what's happening where?

Robin Boast

RB: interests are history and philosophy of science and knowledge. SW: problems with knowledge.

They have put 200,000 records on an online catalogue since 1998, using (or in sympathy with) SPECTRUM, but an extended version. It took 28 years to enter the data.

But: no-one cares. People are not interested in the data. Why?

RB: the SW is in opposition to Web 2.0. Agents in SW…

RB: Tim Berners-Lee (TBL) says SW: Language expresses data and rules for reasoning about the data.
RB: what if our understanding of how knowledge systems operate is contentious?

So why aren't people interested? Catalogues are useful for finding and managing objects, but they're not useful for people wanting to know about objects. Users always have their own local systems for representing the objects. These systems are diverse in content, structure, idiom and purpose. All systems are meaningful but only locally.

MR: what research have they done to support these conclusions? RB: it's based on work with communities using the data.

TBL-style rejoinder (might be): But all joined through object… RB: but… each diverse local system has a different definition of what the object is in scope, boundaries, associations and definition. Interest might be in family of objects or history of object rather than object itself.

RB: the conceptual psychology underlying TBL's vision is contentious and is currently under heavy fire.

The other vision: knowledge is not a classification of the world but an ongoing negotiation within knowledge communities. Knowledge is dynamic and diverse; it's hard, takes work, commitment and engagement. Most knowledge systems cannot be directly translated to another e.g. translating between languages. The knowledge object itself is dynamic and diverse. Sharing knowledge depends, necessarily, on an ongoing conversation.

Is the SW bad? No. But it depends on where, how, for who and which part is used. It's not a universal system of knowledge but a local, situated tool.

RP: tools to improve discoverability.
RB: SW is about universalisation but that doesn't mean we can't take from it.
RL: the CRM is 'coy' at object level – it doesn't care at that level, it pays more attention in different parts.

[The conversation that followed basically edged around issue of whether we're talking SW or sw, lower-case semantic web being an informal version. The definition was discussed at the first Leicester meeting. I wonder how and to what extent the particular audience and context interpretative content was written for will affect how the interpretative content is perceived outside of that context. How much does the POV of interpretation matter when content is published more widely?]

PS: guidelines… are useful in this context.
DE: museums start from position of wanting to put catalogues online. They should start from position that they want people to find out about objects in collection then look at the best way to do it.
SK: online catalogues can be finding aids, they don't need to be more complicated. Not about mediated knowledge from authoritative POV that encapsulates everything they know about the object. For example, CHIN in Canada.

FLB: museums can't anticipate how people will make meaning with objects; museums should facilitate access.

ML: will still need to link that information with the objects. SW allows us to decouple objects from systems they came from. Once decoupled then need to find way of inferencing things.

RP: the discussion really has two parts: collections management and related requirements vs access (a la Areti), Internal / external access and requirements?

[??] Aiding discoverability?

MR: What about museums as the central 'home' for objects (URIs)? Everything can point back to it, even if it doesn't point back out with explicitly created links but uses implicit [discoverable] links (trackbacks and pings for museum objects?) How does that affect museum authority, curatorship?

RB: they are giving logins to their new CMS to people around the world so they can put their layers of meaning, multiple names, classifications. Their data is stored within their own space. [MR: how does that work with statement about different views of 'object'.] RB: internal management system.

MR: we need to resolve issue of where we stand on ontologies as well as sw/SW.

RB: ontologies are dynamic, emergent.

FLB: museums have to provide a base-level ontology for their objects, but also allow/understand that other ontologies will apply. The user will have the context of museum voice to understand ontology.

RP: difference between promoting sw standards for community of practice vs promoting sw tools for community of interest. Both sw and SW?

[break]

Richard Light
RL demonstrated Topic Maps. He used found content – digitised letters from the Wordsworth Trust. They were transcribed into TEI and summarised into ModesXML object summaries. Generated XML Topic Map XTM. Web app queries ModesXML Topic Map. Then can link back to summary data then back to source material.

Word docs saved by Open Office (Sebastian Rahtz has a plug-in) as TEI.

Can have multiple identifiers, topic maps have merging rules. Associations, assertions can be included.
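To make the merging idea concrete, here is a minimal Python sketch in which two topics are treated as the same thing whenever they share an identifier. This is a simplified stand-in, not the actual Topic Maps merging rules or anything RL demonstrated; the `merge_topics` function, the dictionary layout and the identifiers are all assumptions made for illustration.

```python
def merge_topics(topics):
    """Merge topics that share at least one identifier.

    Each topic is a dict with 'ids' (a set of identifier strings) and
    'names' (a set of labels). Simplified: real topic map merging has more rules.
    """
    merged = []
    for topic in topics:
        current = {"ids": set(topic["ids"]), "names": set(topic["names"])}
        remaining = []
        for other in merged:
            if current["ids"] & other["ids"]:
                # Shared identifier: fold the existing topic into this one.
                current["ids"] |= other["ids"]
                current["names"] |= other["names"]
            else:
                remaining.append(other)
        remaining.append(current)
        merged = remaining
    return merged


# Hypothetical identifiers, invented for this example.
topics = [
    {"ids": {"http://example.org/people/wordsworth"}, "names": {"William Wordsworth"}},
    {"ids": {"http://example.org/people/wordsworth", "urn:example:ww"},
     "names": {"Wordsworth, W."}},
    {"ids": {"http://example.org/people/coleridge"}, "names": {"S. T. Coleridge"}},
]

result = merge_topics(topics)
print(len(result))
```

The first two topics collapse into one that carries both identifiers and both names, while the third stays separate because it shares no identifier with the others.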

Topic maps: mirror structure of data. Need some kind of ontology for data so it can go into a topic map.

[Some discussion around] Topic maps vs relational data. Topic maps and relation to already highly structured Collection Management System data… [Paul: had whole bit but I missed it cos I was reviewing document, duffer.] Can combine multiple ontologies. Community of interest can create own indexing, combine with authoritative metadata to generate own. "Ontology of hatness."

RL: can inherit hierarchy so can query based on that even though not all of the hierarchy is visible.

RL: structure in source becomes ontology in web environment.
The CIDOC website uses a CRM Topic Map to produce a self-organising navigation panel and the visitor can filter pages e.g. related to activity.

Museums and SW: machine-processable information, usually XML. [Paul – N3 – alternative for XML to RDF?] Information re-worked in some way, value added to it. The (end) user of the information isn’t the producer of the information [back to internal/external uses, management of collections vs access]. The producer of the information has no control over or knowledge of the users to which "their" information is put.

RL on being a web 2.0 information publisher:
Define and publish an XML application or use an existing XML format [BK: HTML is fighting back]
Provide a reliable information-delivering service as a stable URL
Accept you have no control over what uses are made of your information

[??] What about museums as consumers/users of web 2.0 information as well as producers? Collaborative future, where museums are contributors to bigger picture? Looking beyond sector, but e.g. relate to other historical content

ITIS? Taxonomic info for species. AAT (Art and Architecture Thesaurus), DNB (Dictionary of National Biography), Grove Art.

Two XML interchange formats? One for museum-to-museum, one for public information exchange. Mismatch between what public might be interested in (people, places) and what museums have to say, this object exists.

Stuff in Collections Management Systems that isn't about management of objects: events, people, places, stories.

Need standards for interchange of collection level descriptions?

Bamber Gascoigne's Timesearch website?

RL: what is our scope? Machine processable feeds. Generic web 2.0 (e.g. learning objects, mash-ups). Semantic web proper: OWL, Topic maps.

RL would exclude user-generated content (calls it web 1.0) [and concentrate on the] conversion of additional museum resources to web format (e.g. converting publications to TEI).

What's around?
XML based projects, BRICKS ('Building Resources for Integrated Cultural Knowledge Services', CRM OWL-based), English Heritage's MIDAS.

We need low level standards: when was it, where was it? GML for places, ISO time standard.

RL: need to develop one museum delivery format, in XML. Look at naming, look for as much coherence as possible with least amount of effort.

[MR: create something that sits above Collections Management Systems level, provides level of genericisation (generalisation?) i.e. made somewhat generic/interoperable? Then apply transformations to suit different external or internal requirements. What data standard(s) to use?]

[post lunch]

Dylan Edgar
RP: internal SW/external sw. Is the idea getting towards something that could serve as a seed for both? Start with location of museum.

BK: radius space 5 [??] Nottingham Uni mashup of cultural resources. Northumbria?

FLB: loans could be an institution- and sector-led driver for exchanging data.

NP: push for persistent URI for museum/collection/object.

[??] Competing URIs for museums. MLA Institution Server is latest instance.

[MR: institutions devolving power to other orgs – is this another way of trust/radical trust?
Can this group make a decision on where permanent object URIs should live? Is a sector-wide container possible/advisable? Does it matter if one object has more than one persistent URI?]

RL: go international.
PS: museums change.
RB: museums aren't only places that hold collections.
RB: also problem of definition of object. But leaving out on-going discussions, benefits are worth it.

RP: MDA to tweak definition to provide for URIs?

SK: German institute that is an example of it. Deals with change in holding institution [and effect on URIs].

BK: libraries sector is still arguing about unique identifiers. Got to be HTTP URIs not some new scheme.

FLB: URIs should be something that can be implemented in an [iterative] approach, not require that all development stops.

RL: should the URI be required to point to anything? [MR: like a pot of gold at the end of the rainbow?]

RP: this afternoon about getting realistic, what's possible in the UK museum sector today.

DE: benefits of sw: creating better meaning for people. [And how to sell it to policy makers]. How do we get funders excited about it? Similar to documentation debate. Focus on outcomes and benefits rather than 'let's implement sw'. Tie it into wider agendas. Scale and diversity of sector is a challenge. Especially for smaller museums with less or no IT resources.

Two strands: technical and advocacy. Getting practitioner and government on board. Value for money. HLF: interested in impact on people.

RP: are there differences in challenges and opportunities in Scotland?

DE: [his organisation is] smaller, the equivalent of MLA in Scotland but only for museums.

JO: we could be the Model and the Controller, provide the V.

JO/MR: Move from application to being a service. We can still build application on top but others can also build applications too.

Nick Poole
How does it work for practitioners?

[Had diagrams on slides, notes based on those and discussion]

Chain:
Political priorities: social welfare, efficiency.
Departments: DC online [??] a car crash because of fundamental fear of ICT. SW: what you don't know… longer term impact/VFM [value for money?], more inclusive services.
Public sector bodies: MLA, SMC. Longer term strategic change. MLA has never really bought into its own IT strategy for the sector. What SW can do for Knowledge Web, inter-domain data/services. MLA pushing harmonisation across museums, libraries and archive sectors.
Infrastructural organisations: standards, best practice/Standards (non sector): standards, applications. A lot of what we need exists outside our industry. How SW is manifest in: standards, terminological practice.
System providers: client needs, standards environment. SW in the machine. Make it not a threat, and easily understood.
Museum management: Local/national priority, market need; SW: the business case, more for less, the cachet of pathfinding: other nations are doing it better so re-establish national pride. Harder case to argue.
Practitioner: stock control, service delivery; SW: argue that can do the same job, see a bigger return.

Various people including BK: users not in the chain.

Calculus of change: a grid with quick wins, slow burn and tectonics on the Y axis, and projects, standards, advocacy, background communications and funding/performance on the X axis.

[Alex has the diagram of this, I think]

NP: some organisations were so burnt by NOF-digi and Culture Online they're unlikely to digitise anything else any time soon.

What will prevent us getting there? One is the nature and status of this thinktank [?]

Is this just part of what we do or is it a new national program?

JO: can we talk to PNDS about re-using data?

BK: users missing from NP's chain. Quality of experience, not quality of [data]?

NP: aggregation?

JO: dream of 'my favourite museum objects'.
[MR: if using something like Exploring 20th Century London data that's published on three sites already (PNDS, Hub and MoL site), where does the 'final' record URI live?]

NP: Standard for resource discovery in museums. Yes.

[MR: How big a piece of work is it? Can it be done iteratively so that small cases can be made to show how it would work, demonstrators, etc? Could it also be a microformat standard that defines a few basic fields and a link to a URI, something that can be implemented quickly a la JO?
Does it tie to learning objects metadata?]

Quick wins and slow burns.

NP: museums still reeling from separation of content and presentation.

[MR: Taking things forward: project to recommend infrastructure and data standards?]

SK: the previous Netful of Jewels report was about what users would be able to do.

BK: SEA: Strategic E-content Alliance (was CIE Common Information Environment).

RP: dissemination to discoverability. Competition to collaboration.

NP: how to get it into museum policy. MR: tie it to funding.

BK: JISC Users and Innovation program?

NP: does industry need core skills in ICT? Or in development of content, mark-up, collections or content management systems?

NP: Open Business [?]

NP: for evidence based policy. RP: research is of interest to him, SK, RB, etc as have students and sources for research funding.

BK: JISC funded open source watch thing [the one with Sebastian Rahtz, presumably]

NP: something to make it tangible and real whether blog or examples. Migration to a new business model. Need some kind of marketing strategy/overarching document that says who we'll talk to, what the strategies are.

ML: SW acts as thing to hang discussions on but afternoon discussions move away from SW. Avoid problem of original Netful that it just seemed like a whinge that could be ignored. Why do joined-up-ness?
Is semantic interoperability a solution? "Semi-mantics"

If talking to David Dawson, how do we point out where the gaps are, that aren't there in EU stuff or Minerva or whatever?

Final summaries
[MR based on something JO said:] apply transformations to the idea for presentation to different audiences just as if it was a schema. E.g. marketing.

NP: call it documentation not SW. What it can do and not what it is in its own right. Show support of sector for project.

[MR: geek stuff should be in the background, stated output should be audience and sector led. Achieve x goals, secret uncool output is in how it's done (sw or SW). Tell story of how it will be used in the end, the implementation of sw as a side effect.]

PS: is anyone a member of W3C? Could be an idea if developing standards on top of standards. What's going on in other sectors?

SK: museums facilitating users, not just providing stuff they think users want.

Jennifer Trant posted about the reaction of the Museums and the Web copy editor to the papers about interactive papers for this year's conference:

"Rather than thinking about the Web site as a reference work, our editor had repositioned it and herself. The museum was no-longer a remote information resource. The technology had become an enabler in the museum space, that made it possible for her to record the story that interested her."

I think that's the type of response that makes the movement towards the user-focused/participatory web so worthwhile.

Exposing the layers of history in cityscapes

I really liked this talk on "Time, History and the Internet" because it touches on lots of things I'm interested in.

I have an ongoing fascination with the idea of exposing the layers of history present in any cityscape.

I'd like to see content linked to and through particular places, creating a sense of four dimensional space/time anchored specifically in a given location. Discovering and displaying historical content marked-up with the right context (see below) gives us a chance to 'move' through the fourth dimension while we move through the other three; the content of each layer of time changing as the landscape changes (and as information is available).

Context for content: when was it written? Was it written/created at the time we're viewing, or afterwards, or possibly even beforehand, looking ahead to that time? Who wrote/created it, and who were they writing/drawing/creating it for? If this context is machine-readable and content is linked to a geo-reference, can we generate a representation of these layers on-the-fly?
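Here's a very rough Python sketch of what generating those layers on-the-fly might look like, assuming each piece of content carries a geo-reference, the year it describes and the year it was made. The `Item` fields, the `layers_at` and `perspective` helpers, the coordinates and the sample records are all invented placeholders rather than real data or an existing standard.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    lat: float
    lon: float
    about_year: int    # the time the content describes
    created_year: int  # when the content itself was made


def layers_at(items, lat, lon, tolerance=0.001):
    """Group items near a point by the year they describe, oldest first."""
    layers = {}
    for item in items:
        if abs(item.lat - lat) <= tolerance and abs(item.lon - lon) <= tolerance:
            layers.setdefault(item.about_year, []).append(item)
    return dict(sorted(layers.items()))


def perspective(item):
    """Was the content made before, during or after the time it describes?"""
    if item.created_year < item.about_year:
        return "prediction"
    if item.created_year == item.about_year:
        return "contemporary"
    return "retrospective"


# Placeholder records with made-up coordinates and years.
items = [
    Item("Placeholder street photo", 51.5160, -0.1299, 1957, 1957),
    Item("Placeholder reconstruction", 51.5160, -0.1299, 1807, 2005),
    Item("Placeholder record elsewhere", 51.6000, -0.2000, 1957, 1957),
]

for year, layer in layers_at(items, 51.5160, -0.1299).items():
    print(year, [(i.title, perspective(i)) for i in layer])
```

The degree-based tolerance is crude; it's only there to show the shape of the idea, that place selects the stack and time orders the layers within it.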

Imagine standing at the base of Centrepoint at London's Tottenham Court Road and being able to ask, what would I have seen here ten years ago? fifty? two hundred? two thousand? Or imagine sitting at home, navigating through layers of historic mapping and tilting down from a bird's-eye view to a view of a street-level reconstructed scene. It's a long way off, but as more resources are born or made discoverable and interoperable, it becomes more possible.

If Web 3.0 = Semantic Web is this the 'first major' Semantic Web application?

Via Rough Type post 'Freebase: the Web 3.0 machine':

"Artificial intelligence guru Danny Hillis has launched an early version of the first major Web 3.0 application. It's called Freebase, and its grandiose epistemological mission is right up there with those of Google and Wikipedia.

…

The product of Hillis's latest company, Metaweb Technologies, Freebase is a user-generated brain. Like Wikipedia, it allows people to freely add information to it, in the form of text or images or, one assumes, anything else that can be rendered digitally. But it also allows users to add "metadata" about the information – tags that describe what a word or picture is and how it relates to other information.

…

The addition of rich meta tags in a standardized form is what makes Freebase a next-generation Web application – a manifestation of what Tim Berners-Lee long ago dubbed the Semantic Web and what has recently been rebranded Web 3.0 for popular consumption.

…Freebase is really more about the creation of a community of machines than a community of people. The essence of the Semantic Web is the development of a language through which computers can share meaning and hence operate at a higher, more human level of intelligence. The meta tags are crucial to that machine language. Freebase hopes to harness the (free) labor of a big pool of volunteers to add those tags, which is a labor-intensive chore (and a big hurdle on the path to Web 3.0)."

It's worth checking out the IHT article linked above, A 'more revolutionary' Web. I liked this bit:

"A consequence of an open and diffuse Internet, he noted, is that unexpected outcomes can emerge from unanticipated places.

For instance, some early experiments in highlighting new relationships from existing Web data have come out of Flickr, a photo-sharing site that members categorize themselves, and FOAF, which stands for "friend of a friend," a research project to describe the various links between people.

Both add "meaning" where such context did not exist before, just by changing the underlying programming to reflect links between databases, Shadbolt said."

I've put a draft of my CAA paper online because I said I'd get copies to a few people. I'll be re-organising it a little to get away from the PowerPoint slide-ishness of some of it, and re-writing it into the third person next week, but I'd be interested to hear any comments in the meantime.

Buzzword or benefit: The possibilities of Web 2.0 for the cultural heritage sector, CAA UK 2007

Update: I've put the final version online at the same address (Buzzword or benefit) and moved the draft.

Thanks to everyone who read and commented!