Certain functions of the internetarchive library require your archive.org credentials (e.g. uploading, modifying metadata, searching). Your credentials and other configuration options can be provided via a dictionary when instantiating an ArchiveSession or Item object, or in a config file.
The easiest way to create a config file is with the configure function:
>>> from internetarchive import configure
>>> configure('user@example.com', 'password')
Config files are stored in either $HOME/.ia or $HOME/.config/ia.ini by default. You can also specify your own path:
>>> from internetarchive import configure
>>> configure('user@example.com', 'password', config_file='/home/jake/.config/ia-alternate.ini')
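Looking up a config file in the two default locations can be sketched as follows. find_default_config() is a hypothetical helper, and the order in which it checks the two paths is an assumption, not something this document specifies.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_default_config(home: Path) -> Optional[Path]:
    """Return the first existing default config file under `home`, or None.

    Hypothetical helper: checking .config/ia.ini before .ia is an assumed
    order, not one documented here.
    """
    candidates = [
        home / ".config" / "ia.ini",  # assumed to take precedence
        home / ".ia",
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None


# Usage: build a throwaway home directory and try both layouts.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    before = find_default_config(home)     # None: no config file yet
    (home / ".ia").touch()
    legacy = find_default_config(home)     # the .ia file
    (home / ".config").mkdir()
    (home / ".config" / "ia.ini").touch()
    preferred = find_default_config(home)  # ia.ini, under the assumed order
```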
Custom config files can be specified when instantiating an ArchiveSession object:
>>> from internetarchive import get_session
>>> s = get_session(config_file='/home/jake/.config/ia-alternate.ini')
Or an Item object:
>>> from internetarchive import get_item
>>> item = get_item('nasa', config_file='/home/jake/.config/ia-alternate.ini')
Your IA-S3 keys are required for uploading and modifying metadata. You can retrieve your IA-S3 keys at https://archive.org/account/s3.php.
They can be specified in your config file like so:
[s3]
access = mYaccEsSkEY
secret = mYs3cREtKEy
Or, using the ArchiveSession object:
>>> from internetarchive import get_session
>>> c = {'s3': {'access': 'mYaccEsSkEY', 'secret': 'mYs3cREtKEy'}}
>>> s = get_session(config=c)
>>> s.access_key
'mYaccEsSkEY'
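The [s3] section of a config file and the dictionary passed to get_session() carry the same information. A minimal sketch of converting one into the other with the standard library's configparser; ini_to_config_dict() is a hypothetical helper, and the library's own config loader may differ in details such as defaults and value types.

```python
import configparser
from typing import Dict


def ini_to_config_dict(text: str) -> Dict[str, Dict[str, str]]:
    """Convert INI text into a {section: {key: value}} dict (hypothetical helper)."""
    # interpolation=None keeps '%' in values (e.g. URL-encoded cookie values
    # such as user%40example.com) from being read as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


ini_text = """
[s3]
access = mYaccEsSkEY
secret = mYs3cREtKEy
"""

config = ini_to_config_dict(ini_text)
# config has the same shape as the dict passed as get_session(config=c) above.
print(config)
```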
Your archive.org logged-in cookies are required for downloading access-restricted files that you have permission to access, and for retrieving information about archive.org catalog tasks.
Your cookies can be specified like so:
[cookies]
logged-in-user = user%40example.com
logged-in-sig = <redacted>
Or, using the ArchiveSession object:
>>> from internetarchive import get_session
>>> c = {'cookies': {'logged-in-user': 'user%40example.com', 'logged-in-sig': 'foo'}}
>>> s = get_session(config=c)
>>> s.cookies['logged-in-user']
'user%40example.com'
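In the examples above, logged-in-user holds the account's email address with @ percent-encoded as %40. The standard library's urllib.parse.quote() produces that form. This sketch covers only logged-in-user; the logged-in-sig value is not derived from your email address.

```python
from urllib.parse import quote, unquote

email = "user@example.com"

# quote() percent-encodes '@' because it is not in the default safe set ('/').
logged_in_user = quote(email)
print(logged_in_user)  # user%40example.com

# unquote() recovers the original address.
print(unquote(logged_in_user))  # user@example.com
```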
You can specify logging levels and the location of your log file like so:
[logging]
level = INFO
file = /tmp/ia.log
Or, using the ArchiveSession object:
>>> from internetarchive import get_session
>>> c = {'logging': {'level': 'INFO', 'file': '/tmp/ia.log'}}
>>> s = get_session(config=c)
By default, logging is turned off.
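The [logging] options map naturally onto Python's standard logging module. A rough sketch of what level = INFO plus a log file implies; configure_file_logging() is hypothetical, the logger name 'internetarchive' is an assumption, and a temporary file stands in for /tmp/ia.log.

```python
import logging
import os
import tempfile


def configure_file_logging(level: str, path: str) -> logging.Logger:
    """Send records from the (assumed) 'internetarchive' logger to `path`."""
    logger = logging.getLogger("internetarchive")
    logger.setLevel(getattr(logging, level.upper()))  # 'INFO' -> logging.INFO
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "ia.log")  # stand-in for /tmp/ia.log
log = configure_file_logging("INFO", log_path)
log.debug("dropped: DEBUG is below INFO")
log.info("kept: INFO meets the threshold")

# Detach and close the handler so the file is complete before reading it.
for handler in list(log.handlers):
    log.removeHandler(handler)
    handler.close()

with open(log_path) as f:
    contents = f.read()
print(contents)
```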
By default, all requests are made over HTTPS in Python 2.7.10 or newer. You can change this setting in the [general] section of your config file:
[general]
secure = False
Or, using the ArchiveSession object:
>>> from internetarchive import get_session
>>> s = get_session(config={'general': {'secure': False}})
In the example above, all requests will be made via HTTP.
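In a config file, secure = False is read as the string "False", which Python treats as truthy, so it has to be converted to a real boolean. A sketch using configparser's getboolean() and a hypothetical base_url() helper; how the library itself performs this conversion, and the host it uses, are assumptions here. If you build the config dictionary by hand, passing the boolean False (as in the example above) rather than the string avoids the same pitfall.

```python
import configparser


def base_url(secure: bool, host: str = "archive.org") -> str:
    """Hypothetical helper: pick the URL scheme from the `secure` option."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}"


parser = configparser.ConfigParser()
parser.read_string("[general]\nsecure = False\n")

raw = parser.get("general", "secure")  # the string 'False'
# getboolean() parses 'False' to False; the fallback mirrors the HTTPS default.
secure = parser.getboolean("general", "secure", fallback=True)

print(bool(raw))         # True: a non-empty string is truthy
print(base_url(secure))  # http://archive.org
```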
ArchiveSession is a subclass of requests.Session. It bundles your credentials and configuration together.
Return a new ArchiveSession object. The ArchiveSession object is the main interface to the internetarchive lib. It allows you to persist certain parameters across tasks.
Returns: ArchiveSession object.
Usage:
>>> from internetarchive import get_session
>>> config = dict(s3=dict(access='foo', secret='bar'))
>>> s = get_session(config)
>>> s.access_key
'foo'
From the session object, you can access all of the functionality of the internetarchive lib:
>>> item = s.get_item('nasa')
>>> item.download()
nasa: ddddddd - success
>>> list(s.get_tasks(task_ids=31643513))[0].server
'ia311234'
Item objects represent Internet Archive items. From the Item object you can create new items, upload files to existing items, read and write metadata, and download or delete files.
Get an Item object.
>>> from internetarchive import get_item
>>> item = get_item('nasa')
>>> item.item_size
121084
Uploading to an item can be done using Item.upload():
>>> from internetarchive import get_item
>>> item = get_item('my_item')
>>> r = item.upload('/home/user/foo.txt')
Or using internetarchive.upload():
>>> from internetarchive import upload
>>> r = upload('my_item', '/home/user/foo.txt')
The item will automatically be created if it does not exist.
Refer to archive.org Identifiers for more information on creating valid archive.org identifiers.
Remote filenames can be defined using a dictionary:
>>> from io import BytesIO
>>> fh = BytesIO(b'foo bar')
>>> r = item.upload({'my-remote-filename.txt': fh})
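Whether upload() rewinds a file-like object before reading it is not stated here, so it is safest to pass a stream positioned at its start. The standard-library sketch below shows why: a BytesIO that has just been written to sits at its end, while one created with initial bytes starts at position 0.

```python
from io import BytesIO

buf = BytesIO()
buf.write(b"foo bar")        # returns 7, the number of bytes written
after_write = buf.read()     # b'': the position is already at the end
buf.seek(0)                  # rewind to the start
after_seek = buf.read()      # b'foo bar'

fresh = BytesIO(b"foo bar")  # a buffer created with initial bytes starts at 0
fresh_pos = fresh.tell()     # 0

# Remote filename -> file-like object, in the shape passed to item.upload() above.
files = {"my-remote-filename.txt": BytesIO(b"foo bar")}
```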
Upload files to an item. The item will be created if it does not exist.
Returns: A list of requests.Response objects.
Modify the metadata of an existing item on Archive.org.
Returns: A requests.Response object, or a requests.Request object if debug is True.
By default, modify_metadata() writes to the metadata target. To write to another target, such as files, use the target parameter. For example, to add a metadata field to a file called foo.txt within the item my_identifier:
>>> from internetarchive import modify_metadata
>>> r = modify_metadata('my_identifier', metadata=dict(title='My File'), target='files/foo.txt')
>>> from internetarchive import get_files
>>> f = list(get_files('my_identifier', 'foo.txt'))[0]
>>> f.title
'My File'
You can also create new targets if they don’t exist:
>>> r = modify_metadata('my_identifier', metadata=dict(foo='bar'), target='extra_metadata')
>>> from internetarchive import get_item
>>> item = get_item('my_identifier')
>>> item.item_metadata['extra_metadata']
{'foo': 'bar'}
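To make the idea of targets concrete, here is a toy, in-memory model that mirrors the two examples above: a files/<name> target updates that file's entry, and any other target is created on first use. apply_metadata() is hypothetical and does not reflect how archive.org actually applies changes; the item layout (a metadata dict plus a files list of dicts with a name key) is an assumption.

```python
from typing import Any, Dict


def apply_metadata(item: Dict[str, Any], metadata: Dict[str, Any], target: str = "metadata") -> None:
    """Toy model of writing `metadata` to `target` in an in-memory `item` (hypothetical)."""
    if target.startswith("files/"):
        name = target[len("files/"):]
        # Update the entry for the named file, if the item has one.
        for entry in item.setdefault("files", []):
            if entry.get("name") == name:
                entry.update(metadata)
                return
        raise KeyError(f"no file named {name!r}")
    # 'metadata' and any other target: create the dict if missing, then update it.
    item.setdefault(target, {}).update(metadata)


item = {"metadata": {"identifier": "my_identifier"}, "files": [{"name": "foo.txt"}]}
apply_metadata(item, {"title": "My File"}, target="files/foo.txt")
apply_metadata(item, {"foo": "bar"}, target="extra_metadata")
```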
Download files from an item.
Return type: bool
Returns: True if all files were downloaded successfully.
Delete files from an item. Note: Some system files, such as <itemname>_meta.xml, cannot be deleted.
Get File objects from an item.
>>> from internetarchive import get_files
>>> fnames = [f.name for f in get_files('nasa', glob_pattern='*xml')]
>>> print(fnames)
['nasa_reviews.xml', 'nasa_meta.xml', 'nasa_files.xml']
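glob_pattern='*xml' selects files whose names end in xml. Assuming shell-style wildcard semantics, the same filtering can be reproduced with the standard library's fnmatch module; the extra .torrent name below is made up for contrast, and the library's own matching rules may differ.

```python
from fnmatch import fnmatchcase

# File names from the example above, plus one made-up non-matching name.
names = ["nasa_reviews.xml", "nasa_meta.xml", "nasa_files.xml", "nasa_archive.torrent"]

# '*xml' matches any name ending in 'xml' (case-sensitive shell-style wildcard).
matches = [n for n in names if fnmatchcase(n, "*xml")]
print(matches)  # ['nasa_reviews.xml', 'nasa_meta.xml', 'nasa_files.xml']
```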
Search for items on Archive.org.
Returns: A Search object, yielding search results.
Get tasks from the Archive.org catalog. The internetarchive library must be configured with your logged-in-* cookies to use this function. If no arguments are provided, all queued tasks for the user are returned.
Returns: A set of CatalogTask objects.