
Trying computational data generation and entity extraction

I’ve developed this exercise on computational data generation and entity extraction for various information/data visualisation workshops I’ve been teaching lately. As these methods have become more accessible, my dataviz workshops have included more discussion of computational methods for generating data to be visualised. I used to do a text-based version of this, but found that using services that describe images was more accessible and generated richer discussion in class. If you try something like this in your classes I’d love to hear from you.

It’s also a chance to talk about the uses of these technologies in categorising and labelling our posts on social media. We can tell people that their social media posts are analysed for personality traits and mentions of brands, but seeing it in action is much more powerful.

Exercise: trying computational data generation and entity extraction

Time: c. 10 minutes with discussion.

Goal: explore methods for extracting information from text or an image and reflect on what the results tell you about the algorithms.

1 Find a sample image

Find an image (e.g. from a news site or a digitised text) that you can download and drag into the demo window. It may be most convenient to save a copy to your desktop. Many sites also let you load images from a URL, so right- or control-clicking to copy an image’s location and pasting it into the site can be useful.
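
If you’d rather script this step, the sketch below does the same thing in Python. It’s a minimal example assuming the third-party requests library is installed; the image URL is a placeholder, not a real workshop asset.

    # Save a copy of an image from a URL so you can drag it into a demo site.
    # Requires the third-party 'requests' library (pip install requests).
    import requests

    image_url = "https://example.com/sample-image.jpg"  # placeholder URL
    response = requests.get(image_url, timeout=30)
    response.raise_for_status()  # stop here if the download failed

    with open("sample-image.jpg", "wb") as f:
        f.write(response.content)
    print("Saved sample-image.jpg")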

2 Work in your browser

It’s probably easiest to open each of these links in a new browser window. Use Firefox or Chrome if you can; Safari and Internet Explorer may behave slightly differently on some sites. You shouldn’t need to register to use these sites – please read the tips below or ask for help if you get stuck.

3 Review the outputs

Make notes, or discuss with your neighbour. Be prepared to report back to the group.

  • What attributes does each tool report on?
  • Which attributes, if any, were unique to a service?
  • Based on this, what does each of Clarifai, Google, IBM and Microsoft seem to think is important (to them or to their users)?
  • How many of the possible entities (concepts, people, places, events, references to times or dates, etc.) did it pick up?
  • Is any of the information presented useful?
  • Did it label anything incorrectly?
  • What options for exporting or saving the results did the demo offer? What about the underlying service or software? (One example of calling an underlying service directly is sketched after this list.)
  • For tools with configuration options – what could you configure? What difference did changing classifiers or other parameters make?
  • If you tried it with a few images, did it do better with some than others? Why might that be?
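
On the question about underlying services: most of these demos sit on top of an API you can call yourself, which is also one answer to the exporting question, since the API returns results as JSON you can save. Here’s a minimal Python sketch against the Google Cloud Vision REST API as documented at the time of writing; the API key and image URL are placeholders you’d need to replace, so treat it as illustrative rather than definitive.

    # Ask Google Cloud Vision (REST API, v1) to label an image by URL.
    # The API key and image URL below are placeholders, not working values.
    import requests

    API_KEY = "YOUR_API_KEY"  # placeholder - substitute your own key
    endpoint = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY

    payload = {
        "requests": [{
            "image": {"source": {"imageUri": "https://example.com/sample-image.jpg"}},
            "features": [{"type": "LABEL_DETECTION", "maxResults": 10}],
        }]
    }

    response = requests.post(endpoint, json=payload, timeout=30)
    response.raise_for_status()

    # Print each label the service detected, with its confidence score.
    for label in response.json()["responses"][0].get("labelAnnotations", []):
        print(label["description"], round(label["score"], 2))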

This exercise focuses on images, but you can try something similar with text-based tools like Stanford’s Natural Language Processing (NLP) demo http://corenlp.run/, DBpedia Spotlight https://dbpedia-spotlight.github.io/demo/ and Ontotext http://tag.ontotext.com/
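
Like the image services, some of these text tools expose an API. As a rough sketch (assuming the public DBpedia Spotlight endpoint at api.dbpedia-spotlight.org is still available, which was the case at the time of writing), you could extract entities from a sentence like this:

    # Extract DBpedia entities from a short text via the public Spotlight API.
    # Endpoint and response keys reflect the service at the time of writing.
    import requests

    text = "Ada Lovelace worked with Charles Babbage in London."
    response = requests.get(
        "https://api.dbpedia-spotlight.org/en/annotate",
        params={"text": text, "confidence": 0.5},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()

    # Each resource is a phrase Spotlight matched to a DBpedia page.
    for resource in response.json().get("Resources", []):
        print(resource["@surfaceForm"], "->", resource["@URI"])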

Spoiler alert!

Screenshot: Clarifai’s image recognition tool with a historical image
