Types of Metadata

Please note: This page currently only discusses different types of metadata. After further discussion and as decisions are made in various aspects of the metadata field (e.g. taxonomies, thesauri, vocabularies (controlled and uncontrolled), ontologies) further information will be added to this site.

Acknowledgement: The following base text was copied from the 'Metadata Handbook'. It was then amended.

General v. specialist

The Dublin Core Element Set (DC) is regarded as generalist metadata because it is commonly used to describe resources across all domains. IEEE LOM, by contrast, is classed as specialist metadata because it is designed for a specific community, in this case those working with educational resources ("learning objects"). Its elements are designed to capture educational context and pedagogical information in addition to descriptive data.

Minimalist v. rich

Minimalist schema elements tend to be generic and coarse-grained. For example, there would be an element for creator, but it would not be possible to qualify this name (either personal or organizational) as an author, illustrator, editor etc., and if there was more than one creator there would be no way to distinguish who had primary or major responsibility. These schemas also tend to have a limited set of elements.

General metadata is often minimalist in nature, while specialist metadata schemas tend to collect richer data.

Minimalist schemas tend to describe objects in isolation, with either very cursory relationship data or none at all.

A rich metadata schema proposes a comprehensive way of describing the world as viewed by a specific community, usually at a fine level of granularity. One of the earliest schemas to do this was AACR (Anglo-American Cataloguing Rules), which since the 1960s has been encoded using MARC (MAchine Readable Cataloging). AACR2R (second edition revised) is the bibliographic standard used by libraries to describe what is in their collections.

Complex or rich schemas can also be used at a minimalist level. That is, organisations may adopt only a few elements of a schema to describe their resources. For example, the VET metadata application profile (Vetadata) recommends that, at a minimum, organisations use the following five elements to describe their resources. These elements facilitate basic search and enable organisations to share and exchange resource information:

General.Identifier.Catalog
General.Identifier.Entry or Technical.Location
General.Title
General.Description
General.Keyword
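
As a minimal sketch of how a repository might check that a record carries the five recommended elements, the following assumes a record is a flat mapping from element name to value. The function name `missing_minimum_elements` and the sample values are hypothetical and are not part of Vetadata.

```python
# Hypothetical check; only the element names come from the list above.
REQUIRED = [
    ("General.Identifier.Catalog",),
    ("General.Identifier.Entry", "Technical.Location"),  # either one is enough
    ("General.Title",),
    ("General.Description",),
    ("General.Keyword",),
]


def missing_minimum_elements(record):
    """Return the requirement groups that have no non-empty value in the record."""
    missing = []
    for alternatives in REQUIRED:
        if not any(record.get(name) for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing


# Invented sample record that lacks a description.
record = {
    "General.Identifier.Catalog": "URI",
    "Technical.Location": "http://www.example.org/resource/1",
    "General.Title": "Sample resource",
    "General.Keyword": ["metadata"],
}
print(missing_minimum_elements(record))
```

A check of this kind only confirms that the elements are present; whether they describe the object adequately remains a human judgement.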

When only a minimal set of elements from a rich schema is used, consideration should be given to whether this is sufficient to adequately describe the object. How much information is conveyed by the object, or by the collection of which it may be a part?

Some organizations that invest substantial funds in the creation of learning materials (e.g. The Learning Federation, and the VET Learning Object Repositories Project) devote a greater effort to describing these resources, both to give them the largest exposure to the targeted audience and to increase recall and precision. Rich sets of metadata will include not only the commonly considered mandatory elements of author, title, description and location, but also educational context elements and controlled vocabularies to describe such attributes as subject, education level, type of resource and target audiences.

Hierarchical v. linear

There are two main types of element structure - hierarchical and linear (flat). Hierarchical schemas are characterised by the nesting of elements and sub-elements. This structure identifies and displays relationships between elements. An example of a hierarchical schema is IEEE LOM, whose Classification category is listed below. 'Parent' elements can have multiple children: in the example, 'Taxon' has the children 'Id' and 'Entry'.

9. Classification
9.1 Classification.Purpose
9.2 Classification.TaxonPath
9.2.1 Classification.TaxonPath.Source
9.2.2 Classification.TaxonPath.Taxon
9.2.2.1 Classification.TaxonPath.Taxon.Id
9.2.2.2 Classification.TaxonPath.Taxon.Entry
9.3 Classification.Description
9.4 Classification.Keyword
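
One way to picture this nesting is as a nested mapping whose leaves can be named by dotted paths like those listed above. The sketch below assumes that representation; the `flatten` helper and all sample values are invented for illustration and are not taken from IEEE LOM.

```python
def flatten(node, prefix=""):
    """Yield (dotted_name, value) pairs for every leaf in a nested dict."""
    for name, value in node.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # A parent element: descend into its children.
            yield from flatten(value, path)
        else:
            yield path, value


# Element names follow the list above; the values are invented.
classification = {
    "Classification": {
        "Purpose": "discipline",
        "TaxonPath": {
            "Source": "Example taxonomy",
            "Taxon": {"Id": "001", "Entry": "Metadata"},
        },
        "Description": "Sample classification",
        "Keyword": "schemas",
    }
}

for dotted_name, value in flatten(classification):
    print(dotted_name, "=", value)
```

A linear schema would need no such traversal, since every element already sits at the top level.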

A linear schema is characterized by the absence of element relationships. Each element is unique and defines a specific data element. Dublin Core is an example of a linear schema.

Machine generated v. human authored

Humans create metadata by writing descriptions of resources either in a structured or unstructured form. Computer applications can extract certain information from a resource or its context. This may involve simply capturing information that is already available, such as the format of the file, or running an algorithm to determine the subject of a textual resource by counting keywords or by checking and analysing pointers to the resource. Some applications use complex algorithms to increase the accuracy of the machine generated metadata. Google is an example of a system that creates and uses machine generated metadata.
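
The two simplest machine techniques mentioned here, capturing the file format and counting keywords, can be sketched as follows. This is an illustrative assumption rather than how any particular system works; the function name `generate_metadata`, the stopword list and the sample text are all hypothetical.

```python
import mimetypes
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "by", "or", "for"}


def generate_metadata(filename, text, top_n=3):
    """Derive a format and candidate subject keywords without human input."""
    # Capture information that is already available: the format implied by the file name.
    media_type, _ = mimetypes.guess_type(filename)
    # Estimate the subject by counting the words that are not stopwords.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    keywords = [word for word, _ in counts.most_common(top_n)]
    return {"format": media_type, "keywords": keywords}


text = (
    "Metadata describes resources. Rich metadata describes resources "
    "in detail. Metadata aids discovery."
)
print(generate_metadata("types.html", text))
```

Raw word counts are easy to mislead, which is one reason more complex algorithms are used in practice.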

Structured v. unstructured

Metadata is considered to be structured when it complies with a set of rules or specifications for data entry and/or data structures. The structure of the elements and their attributes can be either simple or complex.

Embedded v. detached

Although the term metadata existed before the advent of the World Wide Web, it has taken on a special significance in the context of the online delivery of information. HTML can be used to record metadata (known as embedded metadata) as well as the instructions for rendering information on a web page. The two most common tags used for embedded metadata are the DESCRIPTION and KEYWORDS meta tags.
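
As a rough sketch of how embedded metadata can be read back out of a page, the following uses Python's standard `html.parser` module to collect the description and keywords entries. The class name `MetaTagReader` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Names are matched case-insensitively, so store them in lower case.
            self.metadata[name.lower()] = content


# Invented sample page with embedded metadata in its header.
page = """<html><head>
<title>Types of Metadata</title>
<meta name="DESCRIPTION" content="An overview of metadata types">
<meta name="KEYWORDS" content="metadata, schemas, Dublin Core">
</head><body><p>Page content.</p></body></html>"""

reader = MetaTagReader()
reader.feed(page)
print(reader.metadata["description"])
print(reader.metadata["keywords"].split(", "))
```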

Detached metadata is stored in files separate from the resource and contains a link or some other method of identification to the item it describes.

Metadata assigned to a resource is usually stored in a container or package called a record. A large aggregation of records is a database. The records can be held separately (detached) from the resources they describe, or metadata records and their resources may be held together in a repository.
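
A detached record can be pictured as a small structure that lives apart from the resource and points back to it. The sketch below assumes JSON records with an `identifier` field holding the resource's URL; the field names, the URLs and the `find_record` helper are illustrative assumptions only.

```python
import json

# The resources themselves, held in one place (contents invented).
resources = {"http://www.example.org/reports/1": "Full text of the report"}

# The metadata records, held separately; each links to its resource by identifier.
detached_records = json.loads("""[
  {"identifier": "http://www.example.org/reports/1",
   "title": "Sample report",
   "description": "A hypothetical resource used for illustration"}
]""")


def find_record(records, resource_url):
    """Return the record whose identifier points at the given resource, if any."""
    for record in records:
        if record["identifier"] == resource_url:
            return record
    return None


record = find_record(detached_records, "http://www.example.org/reports/1")
print(record["title"])
```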

Surface information

Some information that is useful in managing resources is available directly from the resource. For example, the title of the resource may be clearly indicated as its heading. The authors may be clearly identified. Information that can be gathered by machines and converted into metadata is known as 'surface' metadata and the process of gathering it is known as screen scraping. This process can be used to populate repositories with structured metadata, especially where the resources are uniformly marked-up and well-formed.
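
Where pages are uniformly marked up, gathering surface metadata can be quite mechanical. The sketch below assumes, purely for illustration, that every page puts its title in an `<h1>` heading and its author in a paragraph with the class `author`; the class name `SurfaceScraper` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class SurfaceScraper(HTMLParser):
    """Pull a title and creator out of consistently marked-up pages."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._field = None  # the metadata field currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._field = "title"
        elif tag == "p" and dict(attrs).get("class") == "author":
            self._field = "creator"

    def handle_data(self, data):
        if self._field and data.strip():
            # Keep only the first value found for each field.
            self.metadata.setdefault(self._field, data.strip())

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self._field = None


page = (
    "<html><body><h1>Types of Metadata</h1>"
    '<p class="author">A. Writer</p><p>Body text.</p></body></html>'
)
scraper = SurfaceScraper()
scraper.feed(page)
print(scraper.metadata)
```

The approach depends entirely on the markup convention holding across pages, which is why it suits well-formed, uniform collections.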

Other types of metadata - keywords, Google, tags, user assigned

There is a range of different types of resource description that may be useful. Historically, in the Web world, keywords were included in a resource to aid discovery. They are located in the 'header' (machine instructions section) of the web page, inside <meta /> markup tags. Trust in these keywords was lost after they were abused by people who entered misleading words to attract attention to their sites (aka spamming). More formal entries in the meta tags, such as those following the Dublin Core schema, provide greater structure and granularity.

Google has shown that the words in a resource may be misleading, but that the words relating to it in another resource that points to the first are more likely to be reliable. This is the basis of the algorithm that Google uses, which avoids the keyword abuse ('keyword stacking') described earlier.
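
The underlying idea, describing a resource by what other resources say when they point to it, can be illustrated with a toy example. This is not Google's actual algorithm; the function name `describe_by_inbound_links` and the link data are invented for illustration.

```python
from collections import Counter, defaultdict


def describe_by_inbound_links(links):
    """Map each target URL to a count of the words used in links pointing to it."""
    terms = defaultdict(Counter)
    for source, target, link_text in links:
        if source == target:
            continue  # what a page says about itself is ignored
        terms[target].update(link_text.lower().split())
    return terms


# (source page, target page, text of the link) with invented data.
links = [
    ("http://a.example.org/", "http://www.example.org/rabbit", "rabbit care guide"),
    ("http://b.example.org/", "http://www.example.org/rabbit", "keeping a pet rabbit"),
    ("http://www.example.org/rabbit", "http://www.example.org/rabbit", "cheap loans"),
]

terms = describe_by_inbound_links(links)
print(terms["http://www.example.org/rabbit"].most_common(1))
```

In this toy data the page's own misleading words are discarded, while the term that independent pages agree on comes out on top.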

'Tags' are a more recently popular type of metadata: simple words attached to a resource. They are usually found in some common location within the resource or linked to it, such as those in blogs that are placed within link markup tags, e.g. <a href="http://www.example.org/rabbit" rel="tag">rabbit</a>, where the term rabbit is the tag. Tags are particularly associated with the emerging Web 2.0 and the Semantic Web.
