In the last blog post I made a proposal to help make OER easier for web crawlers to discover automatically. The immediate reaction can be summed up as "interesting", with many people raising specific concerns about where the proposal falls short. I’d like to thank everyone who pitched in, by commenting on the blog post or by email, particularly Nathan Yergler, the CTO of Creative Commons, and Scott Wilson, who built Ensemble.
I’ve been listening to everyone, and this blog post is a write-up of my notes from those discussions.
Recap: What’s the problem and what is oer.txt?
There was confusion about the exact definition of the problem oer.txt is trying to solve and what oer.txt is. The problem is that automatically identifying educational resources is not easy. There is no widely accepted way to help web crawlers find OER. We have several technologies that support OER dissemination (e.g. RSS, OPML, OAI-PMH) but there is no way to say that a given feed (an RSS feed, say) is OER as opposed to a blog’s feed.
The best analogy for oer.txt is a road sign: it points a compatible crawler to the URLs where OER can be found on a website. I specified it to say what format the URLs are in so that a crawler can choose which ones to pursue. Anything beyond that, like metadata about what the OER actually is, which format it’s in, which education level it is aimed at, etc., is intentionally out of scope.
A simpler analogy is that oer.txt is merely an advertisement for what you already have.
Interestingly, no one is saying there isn’t a problem to be solved here. I also want to be 100% clear: I honestly do not mind what the final solution looks like and if oer.txt is the wrong one, great, let’s agree a better one. I’ll be the first to kick oer.txt out the door!
Summary of discussions so far
So what are people saying? In short, the design of the oer.txt solution is wrong on two counts:
- It’s OER-specific, meaning that it doesn’t help/work for other problems. This also has the knock-on effect of…
- It needs something new to be agreed, namely the oer.txt file, and so why not use what is already in use like OPML, RSS, etc?
Autodiscovery alternatives: link tags
The first theme that emerged is that we already have a mechanism for autodiscovery: embedding <link> tags with rel="alternate" attributes directly in HTML. This is already in widespread use (it’s how Firefox knows this blog has an RSS feed, for example), so why not use that? It’s a great idea, and the two questions I have about this are:
- Which HTML page would have this tag? The home page or all pages?
- What do these alternate links point to? The simplest solution would be what is already being released. For example, a course’s home page could have two alternate links, one pointing to the course’s own machine-readable feed and one to the website’s machine-readable feed.
We would still need a way to mark these URLs as OER as opposed to any other type of feed (like a blog’s RSS feed). We could specify a new rel attribute value as a way to tag OER. For example, to tag an RSS feed we currently use:
Instead we could use:
(Fictitious URLs for the sake of example.)
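To make the idea concrete, here is a minimal Python sketch of how a crawler might tell an OER feed apart from an ordinary one using <link> tags. The rel value "oer" is an assumption made purely for illustration, not an agreed convention, and the example.org URLs are fictitious.

```python
from html.parser import HTMLParser

# Fictitious page: one ordinary blog feed, and one feed tagged with the
# hypothetical rel value "oer" (an assumption, not an agreed standard).
PAGE = """
<html><head>
<link rel="alternate" type="application/rss+xml" href="http://example.org/blog/feed.rss">
<link rel="oer" type="application/rss+xml" href="http://example.org/courses/feed.rss">
</head><body></body></html>
"""


class FeedLinkParser(HTMLParser):
    """Collect <link> tags, separating OER-tagged links from plain alternates."""

    def __init__(self):
        super().__init__()
        self.alternate = []  # (type, href) pairs with rel="alternate"
        self.oer = []        # (type, href) pairs with the hypothetical rel="oer"

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated list of tokens, compared case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        entry = (attr.get("type"), href)
        if "oer" in rels:
            self.oer.append(entry)
        elif "alternate" in rels:
            self.alternate.append(entry)


def discover(html):
    """Return (ordinary_feeds, oer_feeds) found in an HTML document."""
    parser = FeedLinkParser()
    parser.feed(html)
    return parser.alternate, parser.oer


alternate, oer = discover(PAGE)
print("ordinary feeds:", alternate)
print("OER feeds:", oer)
```

The type attribute is kept alongside each URL so that a crawler can choose which formats to pursue, which is the same job oer.txt was specified to do.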
Autodiscovery alternatives: robots.txt itself
Another alternative is to hook into robots.txt exactly as the Sitemaps protocol does. In this case, instead of pointing to sitemap URLs, we point to OER URLs. This side-steps the need for a separate oer.txt file (good) but might require us to agree a protocol analogous to Sitemaps (bad). I say "might" as I see no problem in pointing directly to whichever format is already produced, be it RSS, Atom, OAI-PMH, etc. This solves the two problems of communicating where the OER is and does what oer.txt aims to do. An example of this OER-enhanced robots.txt could be something like:
(Apart from the OpenSearch URL, these URLs are fictitious.)
We can also adopt an already-established format like RDF as our Sitemaps protocol analogue, as per the last line in the example above.
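On the crawler side, reading such a robots.txt could be as simple as the following Python sketch. It assumes a hypothetical "OER:" directive that works the way "Sitemap:" does; the directive name and every URL here are made up for illustration.

```python
# Fictitious robots.txt. "Sitemap:" is the real Sitemaps directive;
# "OER:" is a hypothetical directive assumed for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: http://example.org/sitemap.xml
OER: http://example.org/courses/feed.rss
OER: http://example.org/oai
OER: http://example.org/oer.rdf
"""


def find_oer_urls(robots_txt, directive="oer"):
    """Return the URLs listed under the given directive, in file order."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop comments. This simple rule would also cut a URL containing
        # "#", which is acceptable for a sketch but not for production use.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so the URL's own "://" survives.
        name, value = line.split(":", 1)
        if name.strip().lower() == directive:
            value = value.strip()
            if value:
                urls.append(value)
    return urls


for url in find_oer_urls(ROBOTS_TXT):
    print(url)
```

Because each line is just a URL, the crawler still has to work out the format behind it (RSS, OAI-PMH, RDF and so on), for example from the response's content type.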
Format alternatives: OPML
OPML might be the solution we seek instead of oer.txt, if everyone actually uses it. This is a good suggestion, and would work perfectly with the HTML link tags autodiscovery.
The question I have is what new attributes, if any, the <outline> tags would need to make OPML more useful for OER.
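As a starting point for that discussion, here is a rough Python sketch that needs no new attribute at all: it relies on the category attribute that OPML 2.0 already defines on <outline>. Using the value "/Education/OER" as the marker is an assumption for illustration only, as are the feed names and URLs.

```python
import xml.etree.ElementTree as ET

# Fictitious OPML file. text, type, xmlUrl and category are existing
# OPML 2.0 attributes; the "/Education/OER" value is a hypothetical marker.
OPML = """<opml version="2.0">
  <head><title>Example University feeds</title></head>
  <body>
    <outline text="Staff blog" type="rss"
             xmlUrl="http://example.org/blog/feed.rss"/>
    <outline text="Physics 101" type="rss"
             xmlUrl="http://example.org/physics101/feed.rss"
             category="/Education/OER"/>
  </body>
</opml>
"""


def oer_feeds(opml_text, marker="/Education/OER"):
    """Return (title, url) for every outline whose categories include marker."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        # category holds a comma-separated list of category strings.
        categories = [c.strip() for c in outline.get("category", "").split(",")]
        url = outline.get("xmlUrl")
        if url and marker in categories:
            feeds.append((outline.get("text"), url))
    return feeds


print(oer_feeds(OPML))
```

Whether an existing attribute like this is enough, or whether OER needs dedicated attributes, is exactly the open question.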
Format alternatives: RDF or POWDER
This idea says we can add richness when we advertise our OER in a machine-readable format, and we can do so with already established formats like RDF or POWDER. This means we not only tag resources as educational but also have a way to add extra metadata.
I’m of two minds about this idea. On the one hand I’m really keen to keep things simple: oer.txt is just a list of what you already have, which is as simple as it gets. On the other hand, having a bit of richness beyond a bare-bones format like oer.txt would be very useful. At the end of the day, it’s what the community, particularly the content producers, are happy with that makes this decision.
The other thing to consider is that RDF or POWDER are excellent candidates for a protocol analogous to Sitemaps in robots.txt as explained above.