This entry is a technical guide to help you to improve the metadata on your digital resources. Often, creators of resources will complain that “their content gets lost in the repository”, or that “this metadata stuff never works”, or that “the LMS is broken as it doesn’t read our metadata properly”. If these are problems that afflict you, or your users, then read on.
Recently, we’ve been asked to look at metadata that has been used to describe a range of different files. Metadata (or “data about data”) is a way of describing what is in a file without actually needing to open it. Anyone who has ever uploaded files from a digital camera to a PC will be familiar with metadata (“what’s in DSCN0001.jpg?”); hit a button on your camera and it’ll show you the exposure time, focal length, compression, file size… The good thing is that your camera probably gets it right.
Metadata in the e-learning space is often an XML document that has been prepared using a schema. Before we go any further, the next two points underpin everything. If you take nothing else away from reading this, take these two:
Well Formed XML. It is an obvious point, but XML documents really should be well formed if they are to stand a chance of being read by a consuming system. I think every XML-based programming library I have used, and certainly all of the shrink-wrapped tools, offer some method for checking whether a document is well formed. Seriously, there is no excuse for sending a malformed XML document to someone. It’s NEVER going to work, and it takes all of a minute of your time to fix it.
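As a rough illustration of how little code a well-formedness check needs, here is a minimal sketch using Python’s standard library parser. The function name and the sample documents are made up for this example; they are not taken from any particular tool.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(xml_text):
    """Return None if xml_text parses as well-formed XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return str(err)
    return None

# Hypothetical sample documents, for illustration only.
good = "<lom><general><title>Intro to XML</title></general></lom>"
bad = "<lom><general><title>Intro to XML</general></lom>"  # <title> is never closed

print(well_formedness_error(good))
print(well_formedness_error(bad))
```

A check like this only tells you that the document can be parsed at all. It says nothing about whether the content follows a schema.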
Validate the XML. A related point to the above. If you are using a schema, then your document should conform to the schema. Again, this is a five second check, and is something that can be automated. If your XML hasn’t been validated, then there is a really good chance that the consuming system will find an error in it. If there is an error, then your resource is at the mercy of the consuming system. Said system may well import your content and dump the metadata, or take a ‘best guess’ at what your metadata might be.
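Automating the schema check might look something like the sketch below. It assumes the third-party lxml library is installed, and it uses a deliberately tiny, made-up schema (a `record` that must contain a `title`) rather than a real metadata schema; in practice you would load the XSD files published for the schema you are targeting.

```python
from lxml import etree

# Hypothetical toy schema, for illustration only: <record> must contain one <title>.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def schema_errors(xml_bytes, xsd_bytes=XSD):
    """Return a list of validation messages; an empty list means the document is valid."""
    schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    doc = etree.fromstring(xml_bytes)  # raises XMLSyntaxError if not well formed
    if schema.validate(doc):
        return []
    return [f"line {entry.line}: {entry.message}" for entry in schema.error_log]

print(schema_errors(b"<record><title>Intro to XML</title></record>"))
print(schema_errors(b"<record><name>Intro to XML</name></record>"))
```

Returning the messages, rather than a bare pass or fail, makes it easier to hand the result back to whoever produced the file.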
The two points above are things that you should just do. Especially if you have spent hours working out a beautiful classification for your resource. That work will be all but wasted if the saved metadata file has “machine errors” in it.
But I use a tool for this stuff! Great! Have you checked its output? We’ve just looked at hundreds of machine-generated metadata files. They have a lot of errors in them; most, if not all, would have been found by a validator.
Here’s a brief list of the types of things that we’ve seen in metadata files:
- Schema Confusion: The various flavours of metadata schema are not interchangeable. IMS 1.1 is different to IMS 1.2.1. This means that you can’t take an IMS 1.2.1 metadata file and simply change the schema reference to IMS 1.1. Yet I’ve seen a file that is supposed to validate against IMS 1.1 but contains IMS 1.2.1 metadata.
- Namespace Confusion: Related to the above, XML elements live in namespaces, and schemas specify the namespace that they describe. Putting a default namespace declaration (an xmlns attribute) on an XML element puts that element, and any unprefixed descendants, into that namespace. It’s then quite easy to blow vast chunks of your metadata out of the correct namespace and into another one.
- Extension Madness: Some schemas allow you to use extensions, some don’t. If your schema doesn’t allow you to put an extension in somewhere, then don’t just stick a random element from a different namespace into it. That’s an illegal extension. A validator will spot this.
- Illegal values: This is where the schema will impose some sort of constraints on how a value is to be formatted. This could be as simple as “no spaces”. A validator will inform you if you break the rules for the format of your data.
- Default Values: This is a case of using an authoring tool to create your metadata, and not getting the job finished. I really don’t think you mean the value of one of your elements to be, “Default value. Please Change”. These errors will not be found by a validator, and a tool will probably let you save files with these sorts of errors in them.
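Two of the problems in the list above, namespace confusion and leftover default values, are easy to scan for with a few lines of code. The sketch below is only an illustration: the namespace URI and the placeholder strings are assumptions made up for the example, so swap in whatever your schema and authoring tool actually use.

```python
import xml.etree.ElementTree as ET

EXPECTED_NS = "http://example.org/metadata"        # hypothetical namespace URI
PLACEHOLDERS = ("default value", "please change")  # assumed tool placeholder text

def audit(xml_text):
    """Report elements outside EXPECTED_NS and elements still holding placeholder text."""
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree stores qualified names as "{namespace-uri}localname".
        ns = elem.tag[1:].split("}")[0] if elem.tag.startswith("{") else ""
        if ns != EXPECTED_NS:
            problems.append(f"{elem.tag}: namespace is {ns or '(none)'}")
        text = (elem.text or "").strip().lower()
        if any(p in text for p in PLACEHOLDERS):
            problems.append(f"{elem.tag}: placeholder text left in")
    return problems

# Hypothetical sample: a stray xmlns on <general> drags <title> along with it.
sample = """<lom xmlns="http://example.org/metadata">
  <general xmlns="http://example.org/other">
    <title>Default value. Please Change</title>
  </general>
</lom>"""

for problem in audit(sample):
    print(problem)
```

Note that this flags every element from another namespace, including legitimate extensions, so treat its output as a list of things to look at rather than a list of errors.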
I’ve not considered the likes of application profiles, nor choosing vocabularies and the values from them, nor how to make your items more discoverable. Suffice it to say, if you are bound by extra rules, then you really need to check that you’ve followed them. SCORM, for instance, requires certain metadata elements to be present depending on what the metadata is attached to and where. Outside of some research work, I’ve not yet come across a validator that handles this sort of thing. I’ve got some prototype code that reports on these sorts of things; drop me a line if you’re interested in using it.
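To give a flavour of what a check for such extra rules could look like, here is a minimal sketch. It is not the prototype mentioned above, and the rules table is entirely hypothetical: it does not reproduce the actual SCORM requirements, and it ignores namespaces to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile rules: what the metadata is attached to -> paths that must exist.
REQUIRED = {
    "package": ["general/title", "general/description", "rights/cost"],
    "asset": ["general/title"],
}

def missing_elements(xml_text, context):
    """Return the required element paths that are absent for the given context."""
    root = ET.fromstring(xml_text)
    return [path for path in REQUIRED[context] if root.find(path) is None]

# Hypothetical sample document, for illustration only.
sample = "<lom><general><title>Intro to XML</title></general></lom>"

print(missing_elements(sample, "asset"))
print(missing_elements(sample, "package"))
```

Keeping the rules in a plain table means they can be updated when the profile changes without touching the checking code.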
My final thought on this? Errors do occur in the consuming systems. You can have perfectly tagged resources that are rejected. If there is a bug in the consuming system, then it’s their problem to fix. If there’s a bug in your metadata, and they find it, then you’ll probably get sent a bill for the trouble. Validating takes seconds of your time. Building a simple validator into code that produces swathes of data on a button press takes a few minutes of your time.
Thanks for reading.