Internet Archive Items

What Is an Item?

Archive.org is made up of “items”. An item is a logical “thing” that we represent on one web page on archive.org. An item can be thought of as a group of files that together warrant their own metadata record. If the files in an item need separate metadata, they should probably be in different items. An item can be a book, a song, an album, a dataset, a movie, an image or set of images, etc. Every item has an identifier that is unique across archive.org.

How Items Are Structured

An item is just a directory of files and possibly subdirectories. Every item has at least two files, named in the following format (refer to the documentation on archive.org identifiers for more information on what an identifier is):

  • <identifier>_files.xml
  • <identifier>_meta.xml

The _meta.xml file is an XML file containing all of the metadata describing the item. The _files.xml file is an XML file containing all of the file-level metadata. There can only be one _meta.xml file and one _files.xml file per item.
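As a rough illustration, the sketch below derives the two metadata file names from an identifier and reads the fields out of a _meta.xml document. The sample XML and the function names are hypothetical, and the parser assumes a root `<metadata>` element with one child element per field, where a repeated field such as `collection` appears as repeated elements.

```python
import xml.etree.ElementTree as ET


def metadata_filenames(identifier: str) -> tuple[str, str]:
    """Return the names of the item-level and file-level metadata files."""
    return f"{identifier}_meta.xml", f"{identifier}_files.xml"


def parse_meta_xml(xml_text: str) -> dict[str, list[str]]:
    """Map each metadata field to the list of its values.

    Assumes a <metadata> root whose children are the fields; a field that
    occurs more than once keeps every value, in document order.
    """
    root = ET.fromstring(xml_text)
    fields: dict[str, list[str]] = {}
    for child in root:
        fields.setdefault(child.tag, []).append((child.text or "").strip())
    return fields


# Hypothetical sample; real _meta.xml files usually carry many more fields.
SAMPLE_META = """<metadata>
  <identifier>example-item</identifier>
  <title>An Example Item</title>
  <collection>example_collection</collection>
  <collection>another_collection</collection>
</metadata>"""

meta = parse_meta_xml(SAMPLE_META)
print(metadata_filenames("example-item"))  # ('example-item_meta.xml', 'example-item_files.xml')
print(meta["collection"])  # ['example_collection', 'another_collection']
```

Collecting every field into a list keeps the parser simple and avoids silently dropping the second value of a repeated field.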

Alongside these metadata files and the original files uploaded to the item, the item may also contain derivative files automatically generated by archive.org.
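One way to tell uploaded originals apart from derivatives is to read _files.xml. The sketch below assumes each `<file>` element carries `name` and `source` attributes, with `source` set to values such as `original`, `derivative`, or `metadata`. Treat that schema, the sample document, and the helper name as assumptions rather than a specification.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def files_by_source(xml_text: str) -> dict[str, list[str]]:
    """Group the file names in a _files.xml document by their source attribute."""
    root = ET.fromstring(xml_text)
    groups: dict[str, list[str]] = defaultdict(list)
    for file_el in root.iter("file"):
        # Files without a source attribute are grouped under "unknown".
        groups[file_el.get("source", "unknown")].append(file_el.get("name", ""))
    return dict(groups)


# Hypothetical sample _files.xml for an item holding one uploaded video.
SAMPLE_FILES = """<files>
  <file name="example.mpeg" source="original"><format>MPEG2</format></file>
  <file name="example.mp4" source="derivative"><format>h.264</format></file>
  <file name="example-item_meta.xml" source="metadata"><format>Metadata</format></file>
</files>"""

groups = files_by_source(SAMPLE_FILES)
print(groups["original"])    # ['example.mpeg']
print(groups["derivative"])  # ['example.mp4']
```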

Item Limitations

As a rule of thumb, items should:

  • not be over 100GB
  • not contain more than 10,000 files
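To check a local directory against these rules of thumb before uploading, something like the following sketch could be used. The function name is illustrative, and it assumes "100GB" means 100 × 10^9 bytes (decimal gigabytes). Because the limits are guidance rather than hard rules, it only returns warnings.

```python
import tempfile
from pathlib import Path

MAX_ITEM_BYTES = 100 * 10**9  # assumes "100GB" means decimal gigabytes
MAX_ITEM_FILES = 10_000


def item_limit_warnings(
    directory: str,
    max_bytes: int = MAX_ITEM_BYTES,
    max_files: int = MAX_ITEM_FILES,
) -> list[str]:
    """Warn if a local directory exceeds the suggested per-item limits.

    Counts every regular file under `directory`, including files in
    subdirectories, since an item may itself contain subdirectories.
    """
    total_bytes = 0
    file_count = 0
    for path in Path(directory).rglob("*"):
        if path.is_file():
            file_count += 1
            total_bytes += path.stat().st_size
    warnings = []
    if total_bytes > max_bytes:
        warnings.append(f"total size {total_bytes} bytes exceeds {max_bytes}")
    if file_count > max_files:
        warnings.append(f"{file_count} files exceeds the limit of {max_files}")
    return warnings


# Demonstration on a throwaway directory holding two small files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("hello")
    Path(tmp, "b.txt").write_text("world")
    ok = item_limit_warnings(tmp)                     # within both limits
    too_many = item_limit_warnings(tmp, max_files=1)  # artificially low limit

print(ok)        # []
print(too_many)  # ['2 files exceeds the limit of 1']
```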

Collections

All items must be part of a collection. A collection is simply an item with special characteristics. Apart from an image file for the collection logo, files should never be uploaded directly to a collection item. Items can be assigned to a collection when they are created, or afterwards by setting the collection element in the item's metadata to the identifier of the given collection (e.g. ia metadata <identifier> -m collection:<collection-identifier>). Currently, collections can only be created by archive.org staff. Please contact info@archive.org if you need a collection.
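The metadata change described above can be scripted. The sketch below only assembles the `ia metadata` command given in the text and, when explicitly asked, runs it with `subprocess`. It assumes the `ia` command-line tool is installed and configured with your credentials; the helper names and the `dry_run` switch are hypothetical.

```python
import subprocess


def set_collection_command(identifier: str, collection: str) -> list[str]:
    """Build the `ia metadata` invocation that sets an item's collection."""
    return ["ia", "metadata", identifier, "-m", f"collection:{collection}"]


def set_collection(identifier: str, collection: str, dry_run: bool = True) -> list[str]:
    """Return the command, and execute it only when dry_run is False.

    Passing an argument list (not a shell string) sidesteps shell quoting.
    check=True raises CalledProcessError if `ia` exits with an error.
    """
    cmd = set_collection_command(identifier, collection)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


print(set_collection("example-item", "example-collection"))
# ['ia', 'metadata', 'example-item', '-m', 'collection:example-collection']
```

Defaulting to a dry run makes it easy to review the exact command before anything is changed on archive.org.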

Archival URLs

An item’s “details” page will always be available at:

https://archive.org/details/<identifier>

The item directory is always available at:

https://archive.org/download/<identifier>

A particular file can always be downloaded from:

https://archive.org/download/<identifier>/<filename>
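A small helper can build the three archival URL forms listed above. This sketch assumes file names should be percent-encoded (for example, a space becomes %20) while `/` is kept as-is, so that files inside an item's subdirectories keep their path. The function names are illustrative.

```python
from typing import Optional
from urllib.parse import quote

BASE = "https://archive.org"


def details_url(identifier: str) -> str:
    """Archival URL of an item's details page."""
    return f"{BASE}/details/{quote(identifier)}"


def download_url(identifier: str, filename: Optional[str] = None) -> str:
    """Archival URL of the item directory, or of one file within it."""
    url = f"{BASE}/download/{quote(identifier)}"
    if filename:
        # Keep "/" unescaped so files in subdirectories resolve correctly.
        url += "/" + quote(filename, safe="/")
    return url


print(details_url("popeye_taxi-turvey"))
# https://archive.org/details/popeye_taxi-turvey
print(download_url("popeye_taxi-turvey", "popeye_taxi-turvey_meta.xml"))
# https://archive.org/download/popeye_taxi-turvey/popeye_taxi-turvey_meta.xml
print(download_url("example-item", "disc 1/track 01.mp3"))
# https://archive.org/download/example-item/disc%201/track%2001.mp3
```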

Note: Archival URLs may redirect to the specific server that currently holds the content. The resulting URL is not a permalink. For example, the archival URL:

https://archive.org/download/popeye_taxi-turvey/popeye_taxi-turvey_meta.xml

currently redirects to:

https://ia802304.us.archive.org/30/items/popeye_taxi-turvey/popeye_taxi-turvey_meta.xml

DO NOT LINK to any archive.org URL whose hostname begins with a server number like this. Such a URL refers to the particular machine that we’re serving the file from right now, but we move items to new servers all the time. If you link to this sort of URL instead of the archival URL, your link WILL break at some point.
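If you already have a machine-specific URL, one way to recover the archival URL is to rewrite it. This sketch assumes the redirect target follows the pattern in the example above: a numbered host under us.archive.org followed by /<number>/items/<identifier>/<filename>. URLs that do not match are returned unchanged, so treat it as a heuristic rather than a guaranteed mapping; the function name is illustrative.

```python
import re
from urllib.parse import urlsplit

# Assumed shape of a machine-specific URL, based on the example above:
# https://ia802304.us.archive.org/30/items/<identifier>/<filename>
_HOST_RE = re.compile(r"^ia\d+\.us\.archive\.org$")
_PATH_RE = re.compile(r"^/\d+/items/(?P<identifier>[^/]+)(?P<rest>/.*)?$")


def to_archival_url(url: str) -> str:
    """Rewrite a server-specific archive.org URL into its archival form.

    Returns the input unchanged if it does not match the assumed pattern.
    Any query string or fragment on a matching URL is dropped.
    """
    parts = urlsplit(url)
    match = _PATH_RE.match(parts.path)
    if not (_HOST_RE.match(parts.hostname or "") and match):
        return url
    return f"https://archive.org/download/{match['identifier']}{match['rest'] or ''}"


print(to_archival_url(
    "https://ia802304.us.archive.org/30/items/popeye_taxi-turvey/popeye_taxi-turvey_meta.xml"
))
# https://archive.org/download/popeye_taxi-turvey/popeye_taxi-turvey_meta.xml
```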