<p><em>Preliminary Inventory of Digital Collections by Jason Ronallo. Incomplete thoughts on digital libraries. <a href="http://ronallo.com/blog">http://ronallo.com/blog</a></em></p>
<h1 id="upgrading-from-ubuntu-17-10-to-18-04">Upgrading from Ubuntu 17.10 to 18.04</h1>
<p><em>Jason Ronallo, 2018-05-26 (updated 2020-08-09). <a href="http://ronallo.com/blog/upgrading-from-ubuntu-17-10-to-18-04/">Permalink</a></em></p>
<p>I just upgraded from Xubuntu 17.10 to 18.04. It was a smooth upgrade and almost everything appeared to be working. Below are the few issues I ran into. I’ll update this post when I uncover (and hopefully resolve) other issues.</p>
<h2 id="curl">curl</h2>
<p>The curl CLI was removed for some reason. I use it often and various scripts I use also rely on it being present. So I reinstalled it.</p>
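<p>Getting it back is a one-liner from the standard Ubuntu repositories:</p>

```shell
# curl was dropped during the release upgrade; reinstall it from the archive
sudo apt update
sudo apt install curl
```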
<h2 id="quicktile">quicktile</h2>
<p>I use <a href="https://github.com/ssokolow/quicktile">quicktile</a> to give me configurable keyboard shortcuts for tiling my windows. This <a href="https://github.com/ssokolow/quicktile/issues/95#issuecomment-392044358">solution from a GitHub issue</a> worked for me.</p>
<h2 id="pop-_os-theme">Pop!_OS theme</h2>
<p>I’m using the Pop!_OS theme with Xubuntu. It has a really nice dark theme. In order to get the new version to install I had to purge all of the packages and reinstall them.</p>
<h2 id="power-indicator">Power indicator</h2>
<p>The power indicator I was using in my notification area was removed during installation. I had to fiddle with different panel and panel notification widget settings to get it right again. The only helpful part I can note is that running <code>xfce4-panel --restart</code> is useful to get settings applied.</p>
<h2 id="mouse-acceleration">Mouse Acceleration</h2>
<p>The trackpoint mouse acceleration became super fast. First I had to uninstall libinput and reinstall evdev instead. I got that from <a href="https://bugs.launchpad.net/ubuntu/+source/xfce4-settings/+bug/1758023/comments/10">this issue</a>. After that I was able to go into settings and adjust the mouse speed down again. </p>
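<p>A sketch of the driver swap, assuming the Ubuntu 18.04 package names for the X input drivers (restart your X session afterward):</p>

```shell
# Replace the libinput X driver with evdev so the mouse speed
# setting in xfce4-settings takes effect again
sudo apt remove xserver-xorg-input-libinput
sudo apt install xserver-xorg-input-evdev
```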
<h2 id="ssh-agent-issue">SSH agent issue</h2>
<p>I had an issue where I couldn’t connect to a remote server anymore via ssh. <code>ssh-add</code> fixed it up.</p>
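<p>In case the agent itself is not running, the usual incantation is:</p>

```shell
# Start an agent for this shell if needed, then load the default key;
# pass a path (e.g. ~/.ssh/id_rsa) to add a specific key instead
eval "$(ssh-agent -s)"
ssh-add
```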
<h1 id="choosing-a-path-forward-for-iiif-audio-and-video">Choosing a Path Forward for IIIF Audio and Video</h1>
<p><em>Jason Ronallo, 2017-02-02 (updated 2020-08-09). <a href="http://ronallo.com/blog/choosing-a-path-forward-for-iiif-audio-and-video/">Permalink</a></em></p>
<p>IIIF is working to bring <a href="http://iiif.io/community/groups/av/">AV resources into IIIF</a>. I have been thinking about how to bring to AV resources the same benefits we have enjoyed for the IIIF Image and Presentation APIs. The initial intention of IIIF, especially with the IIIF Image API, was to meet a few different goals to fill gaps in what the web already provided for images. I want to consider how video works on the web and what gaps still need to be filled for audio and video.</p>
<p><em>This is a draft and as I consider the issues more I will make changes to better reflect my current thinking.</em></p>
<p>See <a href="#updates">updates</a> at the end of this post.</p>
<h2 id="images">Images</h2>
<p>When images were specified for the web the image formats were not chosen, created, or modified with the intention of displaying and exploring huge multi-gigabyte images. Yet we have high resolution images that users would find useful to have in all their detail. So the first goal was to improve the performance of delivering high resolution images. The optimization that would work for viewing large high resolution images was already available; it was just done in multiple different ways. Tiling large images is the workaround that has been developed to improve the performance of accessing large high resolution images. If image formats and/or the web had already provided a solution for this challenge, tiling would not have been necessary. When IIIF was being developed there were already tiling image servers available. The need remained to create standardized access to the tiles to aid in interoperability. IIIF accomplished standardizing the performance optimization of tiling image servers. The same functionality that enables tiling can also be used to get regions of an image and manipulate them for other purposes. To improve performance, smaller derivatives can be delivered for use as thumbnails on a search results page.</p>
<p>The other goal for the IIIF Image API was to improve the sharing of image resources across institutions. The situation before was both too disjointed for consumers of images and too complex for those implementing image servers. IIIF smoothed the path for both. Before IIIF there was not just one way of creating and delivering tiles, and so trying to retrieve image tiles from multiple different institutions could require making requests to multiple different kinds of APIs. IIIF solves this issue by providing access to technical information about an image through an <a href="http://iiif.io/api/image/2.1/#image-information">info.json</a> document. That information can then be used in a standardized way to extract regions from an image and manipulate them. The information document delivers the technical properties necessary for a client to create the URLs needed to request the given sizes of whole images and tiles from parts of an image. Having this standard accepted by many image servers has meant that institutions can have their choice of image servers based on local needs and infrastructure while continuing to interoperate for various image viewers.</p>
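<p>For illustration, a minimal info.json for a large image might look something like this (the URLs and dimensions are hypothetical); from the width, height, and scale factors a client can compute every tile request it needs to make:</p>

```json
{
  "@context": "http://iiif.io/api/image/2/context.json",
  "@id": "https://example.org/iiif/some-image",
  "protocol": "http://iiif.io/api/image",
  "width": 6000,
  "height": 4000,
  "tiles": [
    {"width": 512, "scaleFactors": [1, 2, 4, 8]}
  ],
  "profile": ["http://iiif.io/api/image/2/level2.json"]
}
```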
<p>So it seems as if the main challenges the IIIF Image API was trying to solve were performance and sharing. The web platform had not already provided solutions so they needed to be developed. IIIF standardized the pre-existing performance optimization pattern of image tiling. Through publishing information about available images in a standardized way it also improved the ability to share images across institutions.</p>
<blockquote>
<p>What other general challenges were trying to be solved with the IIIF Image API?</p>
</blockquote>
<h2 id="video-and-audio">Video and Audio</h2>
<p>The challenges of performance and sharing are the ones I will take up below with regard to AV resources. How do audio and video currently work on the web? What are the gaps that still need to be filled? Are there performance problems that need to be solved? Are there challenges to sharing audio and video that could be addressed?</p>
<h3 id="av-performance">AV Performance</h3>
<p>The web did not gain native support for audio and video until later in its history. For a long time the primary ways to deliver audio and video on the web used Flash. By the time video and audio did become native to the web many of the performance considerations of media formats already had standard solutions. Video formats have such advanced lossy compression that they can sometimes even be smaller than an image of the same content. (Here is an <a href="https://sidbala.com/h-264-is-magic/">example of a screenshot as a lossless PNG being much larger than a video of the same page</a> including additional content.) Tweaks to the frequency of full frames in the stream and the bitrate for the video and audio can further help improve performance. A lot of thought has been put into creating AV formats with an eye towards improving file size while maintaining quality. Video publishers also have multiple options for how they encode AV in order to strike the right balance for their content between compression and quality.</p>
<h4 id="progressive-download">Progressive Download</h4>
<p>In addition video and audio formats are designed to allow for <a href="https://en.wikipedia.org/wiki/Progressive_download">progressive download</a>. The whole media file does not need to be downloaded before part of the media can begin playing. Only the beginning of the media file needs to be downloaded before a client can get the necessary metadata to begin playing the video in small chunks. The client can also quickly seek into the media to play from any arbitrary point in time without downloading the portions of the video that have come before or after. Segments of the media can be buffered to allow for smooth playback. Requests for these chunks of media can be done with a regular HTTP web server like Apache or Nginx using <a href="https://en.wikipedia.org/wiki/Byte_serving">byte range requests</a>. The web server just needs minimal configuration to allow for byte range requests that can deliver just the partial chunk of bytes within the requested range. Progressive download means that a media file does not have to be pre-segmented–it can remain a single whole file–and yet it can behave as if it has been segmented in advance. Progressive download effectively solves many of the issues with the performance of the delivery of very long media files that might be quite large in size. Media files are already structured in such a way that this functionality of progressive download is available for the web. Progressive download is a performance optimization similar to image tiling. Since these media formats and HTTP already effectively solve the issue of quick playback of media without downloading the whole media file, there is no need for IIIF to look for further optimizations for these media types. Additionally there is no need for special media servers to get the benefits of the improved performance.</p>
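<p>You can see byte range requests in action with curl against any plain web server (the URL here is hypothetical); a <code>206 Partial Content</code> status in the response headers shows the server returned only the requested bytes:</p>

```shell
# Request just the first kilobyte of a media file and print the
# response headers; expect "HTTP/1.1 206 Partial Content" and a
# Content-Range header if byte serving is enabled
curl -s -D - -o /dev/null -H "Range: bytes=0-1023" \
  https://example.org/media/lecture.mp4
```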
<h4 id="quality-of-service">Quality of Service</h4>
<p>While progressive download solves many of the issues with delivery of AV on the web based on how the media files are constructed, it is a partial solution. The internet does not provide assurances on quality of service. A mobile device at the edge of the range of a tower will have more latency in requesting each chunk of content than a wired connection at a large research university. Even over the same stable network the time it takes for a segment of media to be returned can fluctuate based on network conditions. This variability can lead to media playback stuttering or stalling while retrieving the next segment or taking too much time to buffer enough content to achieve smooth playback. There are a couple different solutions to this that have been developed.</p>
<p>With only progressive download at your disposal one solution is to allow the user to manually select a rendition to play back. The same media content is delivered as several separate files at different resolutions and/or bitrates. Lower resolutions and bitrates mean that the segments will be smaller in size and faster to deliver. The media player is given a list of these different renditions with labels and then provides a control for the user to choose the version they prefer. The user can then select whether they want to watch a repeatedly stalling, but high quality, video or would rather watch a lower resolution video playing back smoothly. Many sites implement this pattern as a relatively simple way to take into account that different users will have different network qualities. The problem I have found with this solution for progressive download video is that I am often not the best judge of network conditions. I have to fiddle with the setting until I get it right, if I ever do. I can set it higher than it can play back smoothly, or select a much lower quality than what my current network could actually handle. I have also found sites that set my initial quality level much lower than my network connection can handle, which results in a lesser experience until I make the change to a higher resolution version. That I have to do the switching myself is annoying and distracts from the content.</p>
<h4 id="adaptive-bitrate-formats">Adaptive Bitrate Formats</h4>
<p>To improve the <a href="https://en.wikipedia.org/wiki/Quality_of_experience#Telecommunications_and_Multimedia">quality of the experience</a> while providing the highest quality rendition of the media content that the network can handle, other delivery mechanisms were developed. I will cover in general terms a couple I am familiar with, that have the largest market share, and that were designed for delivery over HTTP. For these formats the client measures network conditions and delivers the highest quality version that will lead to smooth playback. The client monitors how long it takes to download each segment as well as the duration of the current buffer. (Sometimes the client also measures the size of the video player in order to select an appropriate resolution rendition.) The client can then adapt on the fly to network conditions to play the video back smoothly without user intervention. This is why it is called “smooth streaming” in some products.</p>
<p>For adaptive bitrate formats like HLS and MPEG-DASH what gets initially delivered is a manifest of the available renditions/adaptations of the media. These manifests contain pointers for where (which URL) to find the media. These could be whole media files for byte range requests, media file segments as separate files, or even in the case of HLS a further manifest/playlist file for each rendition/stream. While the media is often referred to in a manifest with relative URLs, it is possible to serve the manifest from one server and the media files (or further manifests) from a different server like a CDN.</p>
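<p>As a sketch, an HLS master playlist listing two renditions can look like this (bandwidths, resolutions, and paths are illustrative); each variant line points to a further per-rendition playlist of segments:</p>

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.4d401e,mp4a.40.2"
low/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
high/playlist.m3u8
```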
<p>How the media files are encoded is important for the success of this approach. For these formats the different representations can be pre-segmented into the same duration for each segment across all representations. In a similar way they can also be carefully generated single files that have full frames relatively close together within a file, with these full frames synchronized across all the renditions of the media. For instance all segments could be six seconds long with an I-frame every two seconds. This careful alignment of segments allows for switching between representations without having glitchy moments where the video stalls, without the video replaying or skipping ahead a moment, and with the audio staying synchronized with the video.</p>
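<p>With ffmpeg and x264 this alignment is done by pinning the keyframe interval and disabling scene-cut keyframes, so every rendition gets its full frames at the same timestamps. A minimal sketch, assuming a 24 fps source and 2-second keyframe spacing (file names and bitrates are hypothetical):</p>

```shell
# Encode two renditions with keyframes every 48 frames (2 s at 24 fps);
# -sc_threshold 0 stops x264 from inserting extra keyframes at scene cuts,
# which would break alignment between renditions
ffmpeg -i source.mov -c:v libx264 -b:v 3000k -vf scale=-2:720 \
  -g 48 -keyint_min 48 -sc_threshold 0 -c:a aac out-720p.mp4
ffmpeg -i source.mov -c:v libx264 -b:v 1200k -vf scale=-2:480 \
  -g 48 -keyint_min 48 -sc_threshold 0 -c:a aac out-480p.mp4
```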
<p>It is also possible in the case of video to have one or more audio streams separate from the video streams. Separate audio streams aligned with the video representations will have small download sizes for each segment which can allow a client to decide to continue to play the audio smoothly even if the video is temporarily stalled or reduced in quality. One use case for this audio stream performance optimization is the delivery of alternative language tracks as separate audio streams. The video and audio bitrates can be controlled by the client independently.</p>
<p>In order for adaptive formats like this to work, all of the representations need to have the next required segment ready on the server in case the client decides to switch up or down bitrates. While the cultural heritage use cases that IIIF considers do not include live streaming broadcasts, the number of representations that all need to be encoded and available at the same time affects the “live edge”–how close to real-time the stream can get. If segments are available in only one high bitrate rendition then the client may not be able to keep up with a live broadcast. If all the segments are not available for immediate delivery then it can lead to playback issues.</p>
<p>The manifests for adaptive bitrate formats also include other helpful technical information about the media. (For HLS the manifest is called a master playlist and for MPEG-DASH a Media Presentation Description.) Included in these manifests can be the duration of the media, the maximum/minimum height and width of the representations, the mimetype and codecs (including MP4 level) of the video and audio, the framerate or sampling rate, and lots more. Most importantly for quality of experience switching, each representation includes a number for its bandwidth. There are cases where content providers will deliver two video representations with the same height and width and different bitrates to switch between. In these cases it is a better experience for the user to maintain the resolution and switch down a bandwidth than to switch both resolution and bandwidth. The number of representations–the ladder of different bandwidth encodes–can be quite extensive for advanced cases like Netflix over-the-top (OTT aka internet) content delivery. These adaptive bitrate solutions are meant to scale for high demand use cases. The manifests can even include information about sidecar or segmented subtitles and closed captions. (One issue with adaptive formats is that they may not play back across all devices, so many implementations will still provide progressive download versions as a fallback.) Manifests for adaptive formats include the kind of technical information that is useful for clients.</p>
<p>Because there are existing standards for the adaptive bitrate pattern that have broad industry and client support, there is no need to attempt to recreate these formats.</p>
<h4 id="av-performance-solved">AV Performance Solved</h4>
<p>All except the most advanced video on demand challenges have current solutions through ubiquitous video formats and adaptive bitrate streaming. As new formats like VP9 increase in adoption the situation for performance will improve even further. These formats have bitrate savings through more advanced encoding that greatly reduces file sizes while maintaining quality. This will mean that adaptive bitrate formats are likely to require fewer renditions than are typically published currently. Note though that in some cases smaller file sizes and faster decoding come at the expense of much slower encoding when trying to keep a good quality level.</p>
<p>There is no need for the cultural heritage community to try to solve performance challenges when the expert AV community and industry has developed advanced solutions.</p>
<h3 id="parameterized-urls-and-performance">Parameterized URLs and Performance</h3>
<p>One of the proposals for providing a IIIF AV API alongside the Image API involves mirroring the existing Image API by providing parameters for segmenting and transforming of media. I will call this the “parameterized approach.” One way of representing this approach is this URL:</p>
<p><code>http://server/prefix/identifier/timeRegion/spaceRegion/timeSize/spaceSize/rotation/quality.format</code></p>
<p>You can see more about this type of proposal <a href="https://github.com/IIIF/iiif-av/issues/50#issuecomment-260736537">here</a> and <a href="https://gist.github.com/tomcrane/34180dc6c5c1caf7e244ffd12e64c389#the-parameter-space-of-an-av-api">here</a>. The parameters after the identifier and before the quality would all be used to transform the media.</p>
<p>For the Image API the parameterized approach for retrieving tiles and other derivatives of an image works as an effective performance optimization for delivery. In the case of AV having these parameters does not improve performance. It is already possible to seek into progressive download and adaptive bitrate formats. There is not the same need to tile or zoom into a video as there is for a high definition image. A good consumer monitor will show you as full a resolution as you can get out of most video.</p>
<p>And these parameters do not actually solve the most pressing media delivery performance problems. The parameterized approach probably does not optimize for bitrate, which is one of the most important settings for improving performance. Having a bitrate parameter within a URL would be difficult to implement well. A poorly chosen bitrate could significantly increase the size of the media or introduce visible artifacts in the video or audio beyond usability. Would the audio and video bitrates be controlled separately in the parameterized approach? Bitrate is a crucially important parameter for performance and not one I think you would put into the hands of consumers. It would be especially difficult because bitrate optimization for video on demand is slow and getting more complicated. To optimize variable bitrate encoding, 2-pass encoding is used, and slower encoding settings can further improve quality. With newer formats that deliver better performance, bitrate is reduced for the same quality while encoding is much slower. Advanced encoding pipelines have been developed that measure perceptual difference so that each video, or even each section of a video, can be encoded at the lowest bitrate that still maintains the desired quality level. Bitrate is where performance gains can be made.</p>
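<p>To give a sense of why bitrate optimization is slow, here is a sketch of 2-pass encoding with ffmpeg (source and output names are hypothetical): the first pass only gathers statistics about the content, and the second pass uses them to spend bits where they matter most.</p>

```shell
# Pass 1: analyze only; no audio, output discarded
ffmpeg -y -i source.mov -c:v libx264 -b:v 1500k -pass 1 -an -f null /dev/null
# Pass 2: encode for real using the statistics written by pass 1
ffmpeg -i source.mov -c:v libx264 -b:v 1500k -pass 2 -c:a aac out.mp4
```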
<p>The only functionality proposed for IIIF AV that I have seen that might be helped by the parameterized approach is download of a time segment of the video. This is specific to download of just that time segment. Is this use case big enough to be seriously considered for the amount of complexity it adds? Why is download of a time segment crucial? Why would most cases not be met with just skipping to that section to play? Or can the need be met with downloading the whole video in those cases where download is really necessary? If needed any kind of time segment download use case could live as a separate non-IIIF service. Then it would not have any expectation of being real-time. I doubt most would really see the need to implement a download service like this if the need can be met some other way. In those cases where real-time performance to a user does not matter those video manipulations could be done outside of IIIF. For any workflow that needs to use just a portion of a video the manipulation could be a pre-processing step. In any case if there is really the desire for a video transformation service it does not have to be the IIIF AV API but could be a separate service for those who need it.</p>
<p>Most of the performance challenges with AV have already been solved via progressive download formats and adaptive bitrate streaming. Remaining challenges not fully solved with progressive download and adaptive bitrate formats include live video, server-side control of quality of service adaptations, and greater compression in new codecs. None of these are the types of performance issues the cultural heritage sector ought to try to take on, and the parameterized approach does not contribute solutions to these remaining issues. Beyond these rather advanced issues, performance is a solved problem that has had a lot of eyes on it.</p>
<p>If the parameterized approach is not meant to help with optimizing performance, what problem is it trying to solve? The community would be better off steering clear of this trap of trying to optimize for performance and instead focus on problems that still need to be solved. The parameterized approach sticks with a performance optimization pattern that does not add anything for AV. It has a detrimental fixation on the bitstream that does not work for AV, especially where adaptive bitrate segmented formats are concerned. It appears motivated by some kind of purity of approach rather than taking into account the unique attributes of AV and solving those particular challenges well.</p>
<h3 id="av-sharing">AV Sharing</h3>
<p>The other challenge a standard can help with is sharing of AV across institutions. If the parameterized approach does not solve a performance problem, then what about sharing? If we want to optimize for sharing and have the greatest number of institutions sharing their AV resources, then there is still no clear benefit for the parameterized approach. What about this parameterized approach aids in sharing? It seems to optimize for performance, which as we have seen above is not needed, at the expense of the real need to improve and simplify sharing. There are many unique challenges for sharing video across institutions on the web that ought to be considered before settling on a solution.</p>
<p>One of the big barriers to sharing is the complexity of AV. Compared to delivery of still images, video is much more complicated. I have talked to a few institutions that have digitized video and have none of it online yet because of the hurdles. Some of the complication is technical, and because of this institutions are quicker to use easily available systems just to get something done. As a result many fewer institutions will have as much control over AV as they have over images. It will be much more difficult to gain that kind of control. For instance, with some media servers an institution may not have much control over how the video is served or over the URL for a media file.</p>
<p>Video is expensive. Even large libraries often make choices about technology and hosting for video based on campus providing the storage for it. Organizations should be able to make the choices that work for their budget while still being able to share in as much as they desire and is possible.</p>
<p>One argument made is that many institutions had images they were delivering in a variety of formats before the IIIF Image API, so asking for similar changes to how AV is delivered should not be a barrier to pursuing a particular technical direction. The difficulty institutions have in dealing with AV cannot be minimized in this way, as any kind of change will be much greater and will ask much more of them. The complexity and costs of AV, and the choices these force, should be taken into consideration.</p>
<p>An important question to ask is who you want to help by standardizing an API for sharing? Is it only for the well-resourced institutions who self-host video and have the technical expertise? If it is required that resources live in a particular location and only certain formats be used it will lead to fewer institutions gaining the sharing benefits of the API because of the significant barriers to entry. If the desire is to enable wide sharing of AV resources across as many institutions as possible, then that ought to lead to a different consideration of the issues of complexity and cost.</p>
<p>One issue that has plagued HTML5 video from the beginning is the inability of the browser vendors to agree on formats and codecs. Early on, open formats like WebM with VP8 were not adopted by some browsers in favor of MP4 with H.264. It became common practice out of necessity to encode each video in a variety of formats in order to reach a broad audience. Each source would be listed on the page (on a source element within a video element) and the browser picks which it can play. HTML5 media was standardized around this pattern to accommodate the situation where it was not possible to deliver a single format that could be played across all browsers. It is only recently that MP4 with H.264 has become playable across all current browsers, and only after Cisco open-sourced its licensed version of H.264 was this possible. Note that while the licensing situation for playback has improved, there are still patent/licensing issues, which mean that some institutions still will not create or deliver any MP4 with H.264.</p>
<p>But now even as H.264 can be played across all current browsers, there are still changes coming that mean a variety of formats will be present in the wild. New codecs like VP9 that provide much better compression are taking off and have been adopted by most, but not all, modern browsers. The advantages of VP9 are that it reduces file size such that storage and bandwidth costs can be reduced significantly. Encoding time is increased while performance is improved. And still other new, open formats like AV1 using the latest technologies are being developed. Even audio is seeing some change as Firefox and Chrome are implementing FLAC which will make it an option to use a lossless codec for audio delivery.</p>
<p>As the landscape for codecs continues to change the decision on which formats to provide should be given to each institution. Some will want to continue to use a familiar H.264 encoding pipeline. Others will want to take advantage of the cost savings of new formats and migrate. There ought to be allowance for each institution to pick which formats best meet their needs. Since sources in HTML5 media can be listed in order of preference, in as much as is possible a standard ought to support the ability of a client to respect the preferences of the institution for these reasons. So if WebM VP9 is the first source and the browser can play that format it should play it even if an MP4 H.264 is available which it can also play. The institution may make decisions around the quality to provide for each format to optimize for their particular content and intended uses.</p>
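<p>In HTML this preference is expressed simply by the order of the source elements; a browser plays the first source it can. A sketch with hypothetical file names:</p>

```html
<video controls>
  <!-- Listed first, so a browser that can play VP9 uses it -->
  <source src="pets-720x480.webm" type='video/webm; codecs="vp9,opus"'>
  <!-- Fallback for browsers without VP9 support -->
  <source src="pets-720x480.mp4" type='video/mp4; codecs="avc1.42E01E,mp4a.40.2"'>
</video>
```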
<p>Then there is the choice to implement adaptive bitrate streaming. Again institutions could decide to implement these formats for a variety of reasons. Delivering the appropriate adaptation for the situation has benefits beyond just enabling smooth playback. By delivering only the segment size a client can use based on network conditions and sometimes player size, the segments can be much smaller lowering bandwidth costs. The institution can make a decision depending on their implementation and use patterns whether their costs are more with storage or bandwidth and use the formats that work best for them. It can also be a courtesy to mobile users to deliver smaller segment sizes. Then there are delivery platforms where an adaptive bitrate format is required. Apple requires iOS applications to deliver HLS for any video over ten minutes long. Any of these types of considerations might nudge an AV provider to use ABR formats. They add complexity but also come with attractive performance benefits.</p>
<p>Any solution for an API for AV media should not try to pick winners among codecs or formats. The choice should be left to the institution while still allowing them to share the media in these formats with other institutions. It should allow for sharing AV in whatever formats an institution chooses. An approach which restricts which codecs and formats can be shared does harm and closes off important considerations for publishers. Asking them to deliver too many duplicate versions will also mean forcing certain costs. Will this variety of codecs allow for complete interoperability from every institution to every other institution and user? Probably not, but the tendency will be for institutions to do what is needed to support a broad range of browsers while optimizing for their particular needs. Guidelines and evolving best practices can also be part of any community built around the API. A standard for AV sharing should not shut off options while allowing for a community of practice to develop.</p>
<h3 id="simple-api">Simple API</h3>
<p>If an institution is able to deliver any of their video on the web, then that is an accomplishment. What could be provided to allow them to most easily share their video with other institutions? One simple approach would be for them to create a URL where they can publish information about the video. Some JSON with just enough technical information could map to the properties an HTML5 video player uses. Since it is still the case that many institutions are publishing multiple versions of each video in order to cover the variety of new and old browsers and mobile devices, it could include a list of these different video sources in a preferred order. Preference could be given to an adaptive bitrate format or newer, more efficient codec like VP9 with an MP4 fallback further down the list. Since each video source listed includes a URL to the media, the media file(s) could live anywhere. Hybrid delivery mechanisms are even possible where different servers are used for different formats or the media are hosted on different domains or use CDNs.</p>
<p>This ability to just list a URL to the media would mean that as institutions move to cloud hosting or migrate to a new video server, they only need to change a little bit of information in a JSON file. This greatly simplifies the kind of technical infrastructure that is needed to support the basics of video sharing. The JSON information file could be a static file. No need even for redirects for the video files since they can live wherever and change location over time.</p>
<p>Here is an example of what part of a typical response might look like where a WebM and an MP4 are published:</p>
<pre class="highlight json"><code>{
  "sources": [
    {
      "id": "https://iiif.lib.ncsu.edu/iiifv/pets/pets-720x480.webm",
      "format": "webm",
      "height": 480,
      "width": 720,
      "size": "3360808",
      "duration": "35.627000",
      "type": "video/webm; codecs=\"vp8,vorbis\""
    },
    {
      "id": "https://iiif.lib.ncsu.edu/iiifv/pets/pets-720x480.mp4",
      "format": "mp4",
      "frames": "1067",
      "height": 480,
      "width": 720,
      "size": "2924836",
      "duration": "35.627000",
      "type": "video/mp4; codecs=\"avc1.42E01E,mp4a.40.2\""
    }
  ]
}
</code></pre>
<p>You can see an <a href="https://github.com/IIIF/iiif-av/issues/50#issuecomment-257617223">example of this “sources” approach here</a>.</p>
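<p>To sketch how a client might consume such a list (this is illustrative only; the <code>pickSource</code> helper is my own invention, not part of any specification), it could walk the sources in their preferred order and take the first one the browser reports it can play:</p>

```javascript
// Sketch: choose the first playable source from a "sources" list like
// the JSON above. The canPlay function is injected so the same logic
// works in a browser (HTMLMediaElement.canPlayType) and in tests.
function pickSource(sources, canPlay) {
  for (const source of sources) {
    // "type" carries the MIME type and codecs string, which is
    // exactly what canPlayType expects.
    if (canPlay(source.type) !== "") {
      return source;
    }
  }
  return null; // nothing playable
}

// In a browser this might be wired up as:
//   const video = document.createElement("video");
//   const best = pickSource(manifest.sources, (t) => video.canPlayType(t));
//   if (best) { video.src = best.id; }
```

Because each <code>id</code> is a full URL, the chosen source can live on any server or CDN; the client does not care.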
<p>An approach that simply lists the sources an institution makes available for delivery ought to be easier for more institutions than other options for sharing AV. It would allow them to effectively share the whole range of audio and video they already have, no matter what technologies they are currently using. In the simplest cases there would be no need even for redirects. If you are optimizing for the widest possible sharing from the most institutions, then an approach along these lines ought to be considered.</p>
<h2 id="straight-to-av-in-the-presentation-api">Straight to AV in the Presentation API?</h2>
<p>One interesting option has been proposed for IIIF to move forward with supporting AV resources. This approach is presented in <a href="https://gist.github.com/tomcrane/34180dc6c5c1caf7e244ffd12e64c389#benefits-of-the-image-api-approach">What are Audio and Video Content APIs?</a>. The mechanism is to list out media sources similar to the above approach but on a canvas within a Presentation API manifest. The pattern appears clear for how to provide a list of resources in a manifest in this way. It would not require a specific AV API that tries to optimize for the wrong concerns. The approach still has some issues that may impede sharing.</p>
<p>Requiring an institution to go straight to implementing the Presentation API means that nothing is provided to share AV resources outside of a manifest, or a canvas that can be referenced separately from a Presentation manifest. Not every case of sharing and reuse requires the complexity of a Presentation manifest just to play back a video. Many use cases do not need a sequence with a canvas with media with an annotation with a body with a list of items, a whole highly nested structure, just to get to the AV sources needed for playback. This breaks the pattern from the Image API, where it is easy and common to view an image without implementing Presentation at all. Only providing access to AV through a Presentation manifest lacks the simplicity that would allow an institution to level up and incrementally adopt IIIF standards over time. Even if a canvas could be used as the AV API as a simplification over a manifest, requiring a dereferenceable canvas would further complicate what it takes to implement IIIF. Even some institutions that have implemented IIIF and see the value of a dereferenceable canvas have not gotten that far yet in their implementations.</p>
<p>One of the benefits I have found with the Image API is the ability to view images without needing to have the resource described and published to the public. This allows me to check on the health of images, do cache warming to optimize delivery, and use the resources in other pre-publication workflows. I have only implemented manifests and canvases within my public interface once a resource has been published, so I would effectively be forced to publish the resource prematurely or otherwise change the workflow. I am guessing that others have also implemented manifests in a way that is tied to their public interfaces.</p>
<p>Coupling media access with a manifest has some other smaller implications. Requiring a manifest or canvas leads to unnecessary boilerplate when an institution does not yet have the information but still needs access to the resources to prepare them for publication. For instance a manifest and a canvas MUST have a label. Should they use “Unlabeled” in cases where this information is not available yet?</p>
<p>In my own case sharing with the world is often the happy result rather than the initial intention of implementing something. For instance there is value in an API that supports different kinds of internal sharing. Easy internal sharing enables us to do new things with our resources more easily regardless of whether the API is shared publicly. That internal sharing ought to be recognized as an important motivator for adopting IIIF and other standards. IIIF thus far has enabled us to more quickly develop new applications and functionality that reuse special collections image resources. Not every internal use will need or want the features found in a manifest; some just need to get the audio or video sources and play them.</p>
<p>If there is no IIIF AV API that optimizes for the sharing of a range of different AV formats and instead relies on manifests or canvases, then there is still a gap that could be filled. For at least local use I would want some kind of AV API in order to get the technical information I would need to embed in a manifest or canvas. This seems like it could be a common desire to decouple technical information about video resources from the fuller information needed for a manifest including attributes like labels needed for presentation with context to the public. Coupling AV access too tightly to Presentation does not help to solve the desire to decouple these technical aspects. It is a reasonable choice to consider this technical information a separate concern. And if I am already going through the work to create such an internal AV API, I would like to be able to make this API available to share my AV resources outside of a manifest or canvas.</p>
<p>Then there is also the issue of AV players. In the case of images, many pan-zoom image viewers were modified to work with the Image API. One of the attractions of delivering images via IIIF or adopting a IIIF image server is that there is choice in viewers. Is the expectation that any AV player would need to read in a Presentation manifest or canvas in order to support IIIF and play media? The complexity of the manifest and canvas documents may hinder adoption of IIIF in media players. These are rather complicated documents that take some time to understand. A simpler API than Presentation would have a better chance of being widely adopted by players and would be easier to maintain. We only have a couple of featureful client-side applications for presenting manifests (UniversalViewer and Mirador), but we already have many basic viewers for the Image API. Even though not all of those basic viewers are used within the likes of UniversalViewer and Mirador, the simpler viewers have still been of value for other use cases. For instance a simple image viewer can be used in a metadata management interface where UniversalViewer features like the metadata panel and download buttons are unnecessary or distracting. Would the burden of maintaining plugins and shims for various AV players to understand a manifest or canvas rest with the relatively small IIIF community rather than with the larger group of maintainers of AV players? Certainly having choice is part of the benefit of having the Image API supported in many different image viewers. Would IIIF still have the goal of being supported by a wide range of video players? Broad support within foundational pieces like media players allows for better experimentation on top of them.</p>
<p>My own implementation of the Image API has shown how having a choice of viewers can be of great benefit. When I was implementing the IIIF APIs I wanted to improve the viewing experience for users by using a more powerful viewer. I chose UniversalViewer even though it did not have a very good mobile experience at the time, and we did not want to give up the decent mobile experience we had previously developed. So that we could still have a good mobile interface while UV was in the middle of improving its mobile view, we also implemented a Leaflet-based viewer alongside UV and toggled each viewer on or off with CSS media queries. Interoperability at this lower level allowed us to take advantage of multiple viewers while providing a better experience for our users. You can read more about this in <a href="http://ronallo.com/blog/simple-interoperability-wins-with-iiif/">Simple Interoperability Wins with IIIF</a>. As AV players are uneven in their support of different features, this kind of ability to swap out one player for another, say based on video source type, browser version, or other features, may be particularly useful. We have also seen new tools for tasks like cropping grow up around the Image API, and it would be good to have a similar situation for AV players.</p>
<p>So while listing out sources within a manifest or canvas would allow for institutions with heterogeneous formats to share their distributed AV content, the lack of an API that covers these formats results in some complication, open questions, and less utility.</p>
<h2 id="conclusion">Conclusion</h2>
<p>IIIF ought to focus on solving the right challenges for audio and video. There is no sense in trying to solve the performance challenges of AV delivery; that work has already been done well by the larger AV community and industry. The parameterized approach to an AV API does not bring significant delivery performance gains, though that is its only conceivable benefit. It does not sufficiently help smaller institutions share their video, and it provides no help at all to institutions that are trying to use current best practices like adaptive bitrate formats.</p>
<p>Instead IIIF should focus on achieving ubiquitous sharing of media across many types of institutions. Focusing on the challenges of sharing media, and on the complexity and costs of delivering AV resources, means meeting institutions where they are. A simple approach to an AV API that lists out the sources would more readily solve the challenges institutions will face with sharing.</p>
<p>Optimizing for sharing leads to different conclusions than optimizing for performance.</p>
<h2 id="updates">Updates</h2>
<p>Since writing this post I’ve reconsidered some questions and modified my conclusions.</p>
<h3 id="update-2017-02-04-canvas-revisited">Update 2017-02-04: Canvas Revisited</h3>
<p>Since I wrote this post I got some feedback on it, and I was convinced to try the canvas approach. I experimented with creating a canvas, and while it looks more complex and nested than I would like, it isn’t terrible to understand and create. I have a few questions I’m not sure how I’d resolve, and there are some places where there could be less ambiguity.</p>
<p>You can see <a href="https://gist.github.com/jronallo/b455a5ddab2ce4b28d55c7708603f85a">one example in this gist</a>.</p>
<p>I’d eventually like to have an image service that can return frames from the video, but for now I’ve just included a single static poster image as a thumbnail. I’m not sure how I’d provide a service like that yet, though I had prototyped something in my image server. One way to start would be to create an image service that just provides full images at the various sizes provided by the various adaptations. Or could a list of poster image choices with width and height just be provided somehow? I’m not sure what an info.json would look like for non-tiled images. Are there any Image API examples out in the wild that only provide a few static images?</p>
<p>I’ve included a width and height for the adaptive bitrate formats, but what I really mean is the maximum height and width provided for those formats. It might be useful to have those maximum values available.</p>
<p>I haven’t included duration for each format, though there would be slight variations. I don’t know how the duration of the canvas would be reconciled with the duration of each individual item. It might just be close enough not to matter.</p>
<p>How would I also include an audio file alongside a video? Are all the items expected to be video of the same content? Would it be alright to also add an audio file or two to the items? My use case is that I have a lot of video oral histories. Since they’re mostly talking heads, some viewers may prefer to just listen to the audio rather than play the video. How would I say that this is the audio content for the video?</p>
<p>I’m uncertain how, with the <code>seeAlso</code> WebVTT captions, I could say that they are captions rather than subtitles, descriptions, or chapters. Would it be possible to add a “kind” field that maps directly to an HTML5 track element attribute? Otherwise it could be ambiguous what the proper use for any particular WebVTT (or other caption format) file is.</p>
<p>Several video players allow for preview thumbnails over the time rail via a <a href="https://github.com/jronallo/video_sprites#players">metadata WebVTT file that references thumbnail sprites with media fragments</a>. Is there any way to expose this kind of metadata file on a canvas to where it is clear what the intended use of the metadata file is? Is this a service?</p>
<h1 id="testing-dash-and-hls-streams-on-linux"><a href="http://ronallo.com/blog/testing-dash-and-hls-streams-on-linux/">Testing DASH and HLS Streams on Linux</a></h1>
<p><em>2016-12-28, Jason Ronallo</em></p>
<p>I’m developing a script to process some digitized and born-digital video into adaptive bitrate formats. As I’ve gone along trying different approaches for creating these streams, I’ve wanted reliable ways to test them on a Linux desktop. I keep forgetting how I can effectively test DASH and HLS adaptive bitrate streams I’ve created, so I’m jotting down some notes here as a reminder. I’ll list both the local and online players that you can try.</p>
<p>While I’m writing about testing both DASH and HLS adaptive bitrate formats, really we need to consider 3 formats as HLS can be delivered as <a href="https://en.wikipedia.org/wiki/MPEG_transport_stream">MPEG-2 TS</a> segments or fragmented MP4 (fMP4). Since mid-2016 and iOS 10+ <a href="https://en.wikipedia.org/wiki/HTTP_Live_Streaming#Using_fragmented_MP4">HLS segments can be delivered as fMP4</a>. This now allows you to use the same fragmented MP4 files for both DASH and HLS streaming. Until <a href="https://developer.apple.com/support/app-store/">uptake of iOS 10 is greater</a> you likely still need to deliver video with HLS-TS as well (or go with an HLS-TS everywhere approach). While DASH can use any codec I’ll only be testing fragmented MP4s (though maybe not fully conformant to <a href="http://dashif.org/guidelines/">DASH-IF AVC/264 interoperability points</a>). So I’ll break down testing by DASH, HLS TS, and HLS fMP4 when applicable.</p>
<p>The important thing to remember is that you’re not playing back a video file directly. Instead these formats use a manifest file which lists out the various adaptations–different resolutions and bitrates–that a client can choose to play based on bandwidth and other factors. So what we want to accomplish is the ability to play back video by referring to the manifest file instead of any particular video file or files. In some cases the video files will be self-contained muxed video and audio, and byte-range requests will be used to serve up segments; in other cases the video is segmented, with the audio either in a single separate file or segmented similarly to the video. In fact, depending on how the actual video files are created, they may even lack data necessary to play back independently of another file. For instance it is possible to create a separate initialization MP4 file that includes the metadata that allows a client to know how to play back each of the segment files that lack this information. Also, all of these files are intended to be served up over HTTP. They can also include links to text tracks like captions and subtitles, though support for captions in these formats is lacking in many HTML5 players.</p>
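<p>As a small illustration of the manifest idea, a minimal HLS master playlist is just a text file listing each adaptation’s own playlist; the paths and bandwidth figures below are invented for the example:</p>

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
high/index.m3u8
```

The client reads this, picks an adaptation, and then requests that adaptation’s playlist and segments, which is why relative paths and HTTP delivery matter so much in the testing below.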
<p>Also note that all this testing is being done on Ubuntu 16.04.1 LTS though the Xubuntu variant and it is possible I’ve compiled some of these tools myself (like ffmpeg) rather than using the version in the Ubuntu repositories.</p>
<h2 id="playing-manifests-directly">Playing Manifests Directly</h2>
<p>I had hoped that it would be fairly easy to test these formats directly without putting them behind a web server. Here’s what I discovered about playing the files without a web server.</p>
<h3 id="gui-players">GUI Players</h3>
<p>Players like VLC and other desktop players have limited support for these formats, so even when they don’t work in these players that doesn’t mean the streams won’t play in a browser or on a mobile device. I’ve had very little luck using these directly from the file system. Assume for this post that I’m already in a directory with the video manifest files: <code>cd /directory/with/video</code></p>
<p>So this doesn’t work for a DASH manifest (Media Presentation Description): <code>vlc stream.mpd</code></p>
<p>Neither does this for an HLS-TS manifest: <code>vlc master.m3u8</code></p>
<p>In the case of HLS it looks like VLC is not respecting relative paths the way it needs to. Some players appear to try to play HLS, but I haven’t found a Linux GUI player yet that can play the stream directly from the file system like this. Suggestions?</p>
<h3 id="command-line-players">Command Line Players</h3>
<h4 id="dash">DASH</h4>
<p>Local testing of DASH can be done with the <a href="https://gpac.wp.mines-telecom.fr/">GPAC MP4Client</a>: <code>MP4Client stream.mpd</code></p>
<p>This works and can tell you whether playback basically works and a separate audio file is synced, but it only appears to show the first adaptation. There are also times when it will not play a DASH stream that plays just fine elsewhere. It will not show you whether the sidecar captions are working, and I’ve not been able to use <code>MP4Client</code> to figure out whether the adaptations are set up correctly. Will the video sources actually switch with restricted bandwidth? There’s a command line option for this, but I can’t see that it works.</p>
<h4 id="hls">HLS</h4>
<p>For HLS-TS it is possible to use the <code>ffplay</code> media player that uses the ffmpeg libraries. It has some of the same limitations as <code>MP4Client</code> as far as testing adaptations and captions. The <code>ffplay</code> player won’t work though for HLS-fMP4 or MPEG-DASH.</p>
<h4 id="other-command-line-players">Other Command Line Players</h4>
<p>The <code>mpv</code> media player is based on MPlayer and mplayer2 and can play back both HLS-TS and HLS-fMP4 streams, but not DASH. It also has some nice overlay controls for navigating through a video including knowing about various audio tracks. Just use it with <code>mpv master.m3u8</code>. The <code>mplayer</code> player also works, but seems to choose only one adaptation (the lowest bitrate or the first in the list?) and does not have overlay controls. It doesn’t seem to recognize the sidecar captions included in the HLS-TS manifest.</p>
<h2 id="behind-a-web-server">Behind a Web Server</h2>
<p>One simple solution to be able to use other players is to put the files behind a web server. While local players may work, these formats are really intended to be streamed over HTTP. I usually do this by installing Apache and allowing symlinks. I then symlink from the web root to the temporary directory where I’m generating various ABR files. If you don’t want to set up Apache you can also try <a href="https://github.com/kzahel/web-server-chrome">web-server-chrome</a> which works well in the cases I’ve tested (h/t <a href="https://twitter.com/Bigggggg_Al/status/811992277669343232">@Bigggggg_Al</a>).</p>
<h3 id="gui-players-amp-http">GUI Players & HTTP</h3>
<p>I’ve found that the GStreamer based <a href="https://git.xfce.org/apps/parole">Parole</a> media player included with XFCE can play DASH and HLS-TS streams just fine. It does appear to adapt to higher bitrate versions as it plays along, but Parole cannot play HLS-fMP4 streams yet.</p>
<p>To play a DASH stream: <code>parole http://localhost/pets/fmp4/stream.mpd</code></p>
<p>To play an HLS-TS stream: <code>parole http://localhost/pets/hls/master.m3u8</code></p>
<p>Are there other Linux GUIs that are known to work?</p>
<h3 id="command-line-players-amp-http">Command Line Players & HTTP</h3>
<p><code>ffplay</code> and <code>MP4Client</code> also work with localhost URLs. <code>ffplay</code> can play HLS-TS streams. <code>MP4Client</code> can play DASH and HLS-TS streams, but for HLS-TS it seems to not play the audio.</p>
<h2 id="online-players">Online Players</h2>
<p>And once you have a stream already served up from a local web server, there are online test players that you can use. No need to open up a port on your machine since all the requests are made by the browser to the local server which it already has access to. This is more cumbersome with copy/paste work, but is probably the best way to determine if the stream will play in Firefox and Chromium. The main thing you’ll need to do is set CORS headers appropriately. If you have any problems with this check your browser console to see what errors you’re getting. Besides the standard <a href="http://enable-cors.org/server_apache.html">Access-Control-Allow-Origin “*”</a> for some players you may need to set headers to accept pre-flight Access-Control-Allow-Headers like “Range” for byte range requests.</p>
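<p>For Apache, a sketch of the relevant configuration might look like the following (assuming <code>mod_headers</code> is enabled; the directory path is a placeholder for wherever your streams live):</p>

```
# Requires mod_headers (a2enmod headers)
<Directory "/var/www/html/abr">
  Header set Access-Control-Allow-Origin "*"
  # Some players pre-flight byte range requests, so allow the Range header
  Header set Access-Control-Allow-Headers "Range"
</Directory>
```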
<p>The <a href="http://bitmovin.com/hls-mpeg-dash-test-player/">Bitmovin MPEG-DASH &amp; HLS Test Player</a> requires that you select whether the source is DASH or HLS-TS (or progressive download). Even though Linux desktop browsers do not natively support playing HLS-TS, this player can repackage the TS segments so that they can be played back as MP4. This player does not work with HLS-fMP4 streams, though. Captions that are included in the DASH or HLS manifests can be displayed by clicking on the gear icon, though there’s some kind of double-render issue with the DASH manifests I’ve tested.</p>
<p>Really when you’re delivering DASH you’re probably using dash.js underneath in most cases so testing that player is useful. The <a href="http://dashif.org/reference/players/javascript/">DASH-264 JavaScript Reference Client Player</a> has a lot of nice features like allowing the user to select the adaptation to play and display of various metrics about the video and audio buffers and the bitrate that is being downloaded. Once you have some files in production this can be helpful for seeing how well your server is performing. Captions that are included in the DASH manifest can be displayed.</p>
<p>The <a href="https://github.com/videojs/videojs-contrib-hls">videojs-contrib-hls project</a> has a <a href="http://videojs.github.io/videojs-contrib-hls/">demo VideoJS HLS player</a> that includes support for fMP4.</p>
<p>The <a href="http://streambox.fr/mse/hls.js-0.7.4/demo/">hls.js player</a> has a great demo site for each version that has a lot of options to test quality control and show other metrics. Change the version number in the URL to the latest version. The other nice part about this demo page is that you can just add a <code>src</code> parameter to the URL with the localhost URL you want to test. I could not get hls.js to work with HLS-fMP4 streams, though there is an <a href="https://github.com/dailymotion/hls.js/issues/509">issue to add fMP4 support</a>. Captions do not seem to be enabled.</p>
<p>There is also the <a href="https://developer.jwplayer.com/tools/stream-tester/">JW Player Stream Tester</a>. But since I don’t have a cert for my local server I need to use the <a href="http://demo.jwplayer.com/developer-tools/http-stream-tester/">JW Player HTTP stream tester</a> instead of the HTTPS one. I was successfully able to test a DASH and HLS-TS streams with this tool. Captions only displayed for the HLS stream.</p>
<p>The commercial Radiant media player has a <a href="http://subhttp.radiantmediaplayer.com/test-your-streaming-url.html">DASH and HLS tester</a> that can be controlled with URL parameters. I’m not sure why the streaming type needs to be selected first, but otherwise it works well. It knows how to handle DASH captions but not HLS ones, and it does not work with HLS-fMP4.</p>
<p>The commercial <a href="http://demo.theoplayer.com/test-hls-mpeg-dash-stream">THEOplayer HLS and DASH testing tool</a> only worked for my HLS-TS stream and not the DASH or HLS-fMP4 streams I’ve tested. Maybe it was the test examples given, but even their own examples did not adapt well and had buffering issues.</p>
<p>Wowza has a page for <a href="https://www.wowza.com/testplayers">video test players</a> but it seems to require a local Wowza server be set up.</p>
<p>What other demo players are there online that can be used to test ABR streams?</p>
<p>I’ve also created a little <a href="/demos/plyr-test/">DASH tester</a> using <a href="https://github.com/Selz/plyr">Plyr</a> and dash.js. You can either enter a URL to an MPD into the input or <a href="/demos/plyr-test/?src=http://dash.edgesuite.net/akamai/test/caption_test/ElephantsDream/elephants_dream_480p_heaac5_1.mpd">append a <code>src</code> parameter with the URL to the MPD to the test page URL</a>. To make it even easier to use, I created a short script that allows me to launch it from a terminal just by giving it the MPD URL. This approach could be used for a couple of the other demos above as well.</p>
<p>One gap in my testing so far is the <a href="https://github.com/google/shaka-player">Shaka player</a>. They have a <a href="http://shaka-player-demo.appspot.com/demo/">demo site</a>, but it doesn’t allow enabling an arbitrary stream.</p>
<h2 id="other-tools-for-abr-testing">Other Tools for ABR Testing</h2>
<p>In order to test automatic bitrate switching it is useful to test that bandwidth switching is working. Latest Chromium and Firefox nightly both have tools built into their developer tools to simulate different bandwidth conditions. In Chromium this is under the network tab and in Firefox nightly it is only accessible when turning on the mobile/responsive view. If you set the bandwidth to 2G you ought to see network requests for a low bitrate adaptation, and if you change it to wifi it ought to adapt to a high bitrate adaptation.</p>
<h2 id="summary">Summary</h2>
<p>There are decent tools to test HLS and MPEG-DASH while working on a Linux desktop. I prefer using command line tools like <code>MP4Client</code> (DASH) and <code>mpv</code> (HLS-TS, HLS-fMP4) for quick tests that the video and audio are packaged correctly and that the files are organized and named correctly. These two tools cover both formats and can be launched quickly from a terminal.</p>
<p>I plan on taking a DASH-first approach, and for desktop testing I prefer to test in <strong>video.js</strong> if caption tracks are added as track elements. With contributed plugins it is possible to test DASH and HLS-TS in browsers. I like testing with Plyr (with my modifications) if the caption file is included in the DASH manifest, since Plyr was easy to hack to make this work. For HLS-fMP4 (and even HLS-TS) there’s really no substitute for testing on an iOS device (and for HLS-fMP4 an iOS 10+ device), as the native player may be used in full screen mode.</p>
<h1 id="client-side-video-tricks-for-iiif"><a href="http://ronallo.com/blog/client-side-video-tricks/">Client-side Video Tricks for IIIF</a></h1>
<p><em>2016-10-18, Jason Ronallo</em></p>
<p><em>I wanted to push out these examples before the IIIF Hague working group meetings and I’m doing that at the 11th hour. This post could use some more editing and refinement of the examples, but I hope it still communicates well enough to see what’s possible with video in the browser.</em></p>
<p>IIIF solved a lot of the issues with working with large images on the Web. None of the image standards or Web standards were really developed with very high resolution images in mind. There’s no built-in way to request just a portion of an image. Usually you’d have to download the whole image to see it at its highest resolutions. Image tiling works around a limitation of image formats by just downloading the portion of the image that is in the viewport at the desired resolution. IIIF has standardized and image servers have implemented how to make requests for tiles. Dealing with high resolution images in this way seems like one of the fundamental issues that IIIF has helped to solve.</p>
<p>This differs significantly from the state of video on the web. Video only more recently came to the web. Previously Flash was the predominant way to deliver video within HTML pages. Since there was already so much experience with video and the web before HTML5 video was specified, it was probably a lot clearer what was needed when specifying video and how it ought to be integrated from the beginning. Also video formats provide a lot of the kinds of functionality that were missing from still images. When video came to HTML it included many more features right from the start than images.</p>
<p>As we’re beginning to consider what features we want in a video API for IIIF, I wanted to take a moment to show what’s possible in the browser with native video. I hope this helps us to make choices based on what’s really necessary to be done on the server and what we can decide is a client-side concern.</p>
<h2 id="crop-a-video-on-the-spatial-dimension-x-y-w-h">Crop a video on the spatial dimension (x,y,w,h)</h2>
<p>It is possible to crop a video in the browser. There’s no built-in way that this is done, but with how video is integrated into HTML and all the other APIs that are available, cropping can be done. You can see one example below where the image of the running video is snipped and added to a canvas of the desired dimensions. In this case I display both the original video and the canvas version. We do not even need to have the video embedded on the page to play it and copy the images over to the canvas; the full video could have been completely hidden and this still would have worked. While no browser implements it, a spatial media fragment could let a client know what’s desired.</p>
<p>Also, in this case I’m only listening for the timeupdate event on the video and copying over the portion of the video image then. That event only triggers so many times a second (depending on the browser), so the cropped video does not display as many frames as it could. I’m sure this could be improved upon with a simple timer or a loop that requests an animation frame.</p>
<div id="video-crop-canvas-wrapper"></div>
<p><button id="video-crop-button">Watch the video and cropped video</button></p>
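<p>A minimal sketch of the canvas-copy approach, using an animation-frame loop rather than <code>timeupdate</code> for smoother frames. This is not the script that powers the demo above; the function and its parameters are hypothetical:</p>

```javascript
// Copy just the (x, y, w, h) region of a playing video into a canvas,
// redrawing on each animation frame while the video plays.
function startCroppedPlayback(video, canvas, x, y, w, h) {
  const ctx = canvas.getContext('2d');
  canvas.width = w;
  canvas.height = h;
  function drawFrame() {
    if (video.paused || video.ended) return;
    // drawImage(source, sx, sy, sw, sh, dx, dy, dw, dh)
    ctx.drawImage(video, x, y, w, h, 0, 0, w, h);
    requestAnimationFrame(drawFrame);
  }
  video.addEventListener('play', drawFrame);
}
```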
<p>Something similar could be done solely by creating a wrapper div around a video. The div is the desired width with overflow hidden, and the video is positioned relative to the div to give the desired crop.</p>
<div id="video-crop-div-wrapper">
<video src="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.mp4" poster="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.png" controls></video>
</div>
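<p>The wrapper-div crop can also be set up by applying the styles from JavaScript. A sketch, with hypothetical element references and crop values:</p>

```javascript
// Size the wrapper to the crop and shift the video so the desired
// region lines up with the wrapper's top-left corner.
function cropWithWrapper(wrapper, video, x, y, w, h) {
  wrapper.style.width = w + 'px';
  wrapper.style.height = h + 'px';
  wrapper.style.overflow = 'hidden';
  wrapper.style.position = 'relative';
  video.style.position = 'relative';
  video.style.left = -x + 'px';
  video.style.top = -y + 'px';
}
```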
<p>This is probably the hardest one of these to accomplish with video, but both of these approaches could probably be refined and developed into something workable.</p>
<h2 id="truncate-a-video-on-the-temporal-dimension-start-end">Truncate a video on the temporal dimension (start,end)</h2>
<p>This is easily accomplished with a Media Fragment added to the end of the video URL. In this case it looks like this: <code>http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.mp4#t=6,10</code>. The video will begin at the 6 second mark and stop playing at the 10 second mark. Nothing here prevents you from playing the whole video or any part of the video, but what the browser does by default could be good enough in lots of cases. If this needs to be a hard constraint, then it ought to be pretty easy to do that with JavaScript. The user could download the whole video to play it, but any particular player could maintain the constraint on time. What’s nice with video on the web is that the browser can seek to a particular time and doesn’t even need to download the whole video to start playing any moment in the video, since it can make byte-range requests. And the server-side piece can just be a standard web server (Apache, nginx) with some simple configuration. This kind of “seeking” isn’t possible for image tiles without a smarter server.</p>
<video id="truncated-video" src="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.mp4#t=6,10" poster="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.png" controls></video>
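<p>If the time bounds do need to be a hard constraint, a player could enforce them with a listener. A sketch, using the 6 and 10 second values from the example:</p>

```javascript
// Keep playback inside [start, end): pause at the end bound and
// jump forward if the user seeks before the start bound.
function constrainPlayback(video, start, end) {
  video.addEventListener('timeupdate', () => {
    if (video.currentTime >= end) video.pause();
    if (video.currentTime < start) video.currentTime = start;
  });
}
```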
<h2 id="scale-the-video-on-the-temporal-dimension-play-at-1-5x-speed">Scale the video on the temporal dimension (play at 1.5x speed)</h2>
<p>HTML5 video provides a JavaScript API for manipulating the playback rate. This means that this functionality could be included in any player the user interacts with. There are <a href="https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaElement/playbackRate">some limitations</a> on how fast or slow the audio and video can play, but there’s a larger range of how fast or slow just the images of the video can play. This will also differ based on browser and computer specifications.</p>
<p>This video plays back at 3 times the normal speed:</p>
<video id="fast-video" src="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.mp4" poster="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.png" controls></video>
<p>This video plays back at half the normal speed:</p>
<video id="slow-video" src="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.mp4" poster="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.png" controls></video>
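<p>Setting the rate is a one-property change. A sketch with clamping added; the min/max bounds here are assumptions, since the limits for audible playback vary by browser:</p>

```javascript
// Apply a playback rate, clamped to an assumed safe range.
function applyPlaybackRate(video, rate, min = 0.25, max = 4.0) {
  video.playbackRate = Math.min(max, Math.max(min, rate));
  return video.playbackRate;
}
// e.g. applyPlaybackRate(fastVideo, 3.0) for the 3x demo above
```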
<h2 id="change-the-resolution-w-h">Change the resolution (w,h)</h2>
<p>If you need to fit a video within a particular space on the page, a video can easily be scaled up and down on the spatial dimension. While this isn’t always very bandwidth friendly, it is possible to scale a video up and down and even do arbitrary scaling right in the browser. A video can be scaled with or without maintaining its aspect ratio. It just takes some CSS (or applying styles via JavaScript).</p>
<video id="resolution-video1" src="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.mp4" poster="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.png" controls></video>
<div id="resolution-video2-div">
<video id="resolution-video2" src="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.mp4" poster="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.png"></video>
</div>
<p><button id="play-resolution-video2">Play stretched video</button></p>
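<p>Scaling via styles applied from JavaScript could be sketched like this (the function is hypothetical):</p>

```javascript
// Scale a video element: height 'auto' preserves the aspect ratio,
// while an explicit height stretches the video arbitrarily.
function scaleVideo(video, width, height, keepAspect = true) {
  video.style.width = width + 'px';
  video.style.height = keepAspect ? 'auto' : height + 'px';
}
```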
<h2 id="rotate-the-video">Rotate the video</h2>
<p>I’m not sure what the use case within IIIF is for rotating video, but you can do it rather easily. (I previously posted an <a href="/blog/styling-html5-video-with-css/">example which might be more appropriate for the Hague meeting</a>.)</p>
<video id="rotate-video" src="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.mp4" poster="http://siskel.lib.ncsu.edu/SCRC/ua024-002-bx0149-066-001/ua024-002-bx0149-066-001.png" controls></video>
<p><button id="play-rotating-video">Play rotating video</button></p>
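<p>Rotation is a CSS transform; a sketch of applying it from JavaScript (not the script behind the demo above):</p>

```javascript
// Rotate a video element by the given number of degrees.
// Re-calling this with increasing values would produce a spin.
function rotateVideo(video, degrees) {
  video.style.transform = 'rotate(' + degrees + 'deg)';
}
```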
<p>Use CSS and JavaScript safely, OK?</p>
<h2 id="conclusion">Conclusion</h2>
<p>Two of the questions I’ll have about any feature being considered for IIIF A/V APIs are:</p>
<ol>
<li>What’s the use case?</li>
<li>Can it be done in the browser?</li>
</ol>
<p>I’m not certain what the use case for some of these transformations of video would be, but would like to be presented with them. But even if there are use cases, what are the reasons why they need to be implemented via the server rather than client-side? Are there feasibility issues that still need to be explored?</p>
<p>I do think that if there are use cases for some of these and the decision is made that they are a client-side concern, I am interested in the ways in which the Presentation API and Web Annotations can support those use cases. How would you let a client know that a particular video ought to be played at 1.2x the default playback rate? Or that the video (for some reason I have yet to understand!) needs to be rotated when it is placed on the canvas? In any case I wonder to what extent making the decision that something is a client concern might affect the Presentation API.</p>
<p><button id="stop-all-videos">Stop all videos</button></p>
<p><link href="/files/client-side-video/client-side-video.css" rel="stylesheet">
<script type="text/javascript" src="/files/client-side-video/client-side-video.js"></script></p>
IIIF Examples #1: Wellcome Libraryhttp://ronallo.com/blog/iiif-examples-1-wellcome/2016-10-01T23:13:00-04:002020-08-09T18:52:00-04:00Jason Ronallo<p>As I’m improving the implementation of IIIF on the <a href="http://d.lib.ncsu.edu/collections/">NCSU Libraries Rare and Unique Digital Collections</a> site, I’m always looking for examples from other implementations for how they’re implementing various features. This will mostly be around the <a href="http://iiif.io/api/presentation">Presentation</a> and <a href="http://iiif.io/api/search">Content Search</a> APIs where there could be some variability.</p>
<p>This is just a snapshot look at some features for one resource on the Wellcome Library site, the example may not be good or correct, and could be changed by the time that you read this. I’m also thinking out loud here and asking lots of questions about my own gaps in knowledge and understanding. In any case I hope this might be helpful to others.</p>
<h2 id="html-page">HTML Page</h2>
<p>The example I’m looking at is the <a href="http://wellcomelibrary.org/moh/report/b18250464">“[Report of the Medical Officer of Health for Wandsworth District, The Board of Works (Clapham, Putney, Streatham, Tooting & Wandsworth)]”</a>. You ought to be able to see the resource in an embedded view below with UniversalViewer:</p>
<p><div class="uv" data-locale="en-GB:English (GB),cy-GB:Cymraeg" data-config="https://wellcometrust.github.io/configuration/uv/uv-config-moh.json" data-uri="http://wellcomelibrary.org/iiif/b18250464/manifest" data-collectionindex="0" data-manifestindex="0" data-sequenceindex="0" data-canvasindex="0" data-zoom="-1.1059,0,3.2119,1.6123" data-rotation="0" style="width:800px; height:600px; background-color: #000"></div><script type="text/javascript" id="embedUV" src="http://wellcomelibrary.org/spas/uv/versions/uv-1.7.32/lib/embed.js"></script></p>
<p>If you visit the native page and scroll down you’ll see some other information. They’ve taken some care to expose tables within the text and show snippets of those tables. This is a really nice example of how older texts might be able to be used in new research. And even though Universal Viewer provides some download options, they provide a separate download option for the tables. Clicking on one of the tables displays the full table. Are they using OCR to extract tables? How do they recognize tables and then ensure that the text is correct?</p>
<p>Otherwise the page includes the barest amount of metadata and the “more information” panel in UV does not provide much more.</p>
<p>This collection includes a good page with <a href="http://wellcomelibrary.org/moh/about-the-reports/">more information</a> about these medical reports including a page about <a href="http://wellcomelibrary.org/moh/about-the-reports/using-the-report-data/">how to download and use the report data</a>.</p>
<h2 id="presentation-api">Presentation API</h2>
<p>I’m now going to go down and read through sections of the <a href="http://wellcomelibrary.org/iiif/b18250464/manifest">presentation manifest for this resource</a> and note what I find interesting.</p>
<h3 id="related"><code>related</code></h3>
<p>The manifest links back to the HTML page with a <code>related</code> property at the top level like this:</p>
<pre class="highlight json"><code><span class="s2">"related"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"@id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://wellcomelibrary.org/item/b18250464"</span><span class="p">,</span><span class="w">
</span><span class="s2">"format"</span><span class="p">:</span><span class="w"> </span><span class="s2">"text/html"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre>
<p>There has been some discussion about where the right place is to put a link back from the manifest to an HTML page for humans to see the resource; the <code>related</code> property may not be the right one for this purpose.</p>
<h3 id="seealso"><code>seeAlso</code></h3>
<p>The manifest has a <code>seeAlso</code> for an <a href="http://wellcomelibrary.org/resource/b18250464">RDF (turtle) representation</a> of the resource:</p>
<pre class="highlight json"><code><span class="s2">"seeAlso"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"@id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://wellcomelibrary.org/resource/b18250464"</span><span class="p">,</span><span class="w">
</span><span class="s2">"format"</span><span class="p">:</span><span class="w"> </span><span class="s2">"text/turtle"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre>
<p>What other types of <code>seeAlso</code> links are different manifests providing?</p>
<h3 id="service"><code>service</code></h3>
<p>Several different services are given. Included are standard services for Content Search and autocomplete. (I’ll come back to those in a bit.) There are also a couple services outside of the iiif.io context.</p>
<p>The first is an extension around access control. Looking at the <a href="http://wellcomelibrary.org/ld/iiif-ext/0/context.json">context</a> you can see different levels of access shown: open, clickthrough, and credentials. Not having looked closely at the Authentication specification, I don’t know yet whether these are aligned with it or not. The other URLs here don’t resolve to anything, so I’m not certain what their purpose is.</p>
<pre class="highlight json"><code><span class="p">{</span><span class="w">
</span><span class="s2">"@context"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://wellcomelibrary.org/ld/iiif-ext/0/context.json"</span><span class="p">,</span><span class="w">
</span><span class="s2">"@id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://wellcomelibrary.org/iiif/b18250464-0/access-control-hints-service"</span><span class="p">,</span><span class="w">
</span><span class="s2">"profile"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://wellcomelibrary.org/ld/iiif-ext/access-control-hints"</span><span class="p">,</span><span class="w">
</span><span class="s2">"accessHint"</span><span class="p">:</span><span class="w"> </span><span class="s2">"open"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre>
<p>There’s also a service that appears to be around analytics tracking. From the context document <a href="http://universalviewer.io/context.json">http://universalviewer.io/context.json</a> it appears that there are other directives that can be given to UV to turn off/on different features. I don’t remember seeing anything in the UV documentation on the purpose and use of these though.</p>
<h3 id="sequences"><code>sequences</code></h3>
<p>One thing I’m interested in is how organizations name and give <code>id</code>s (usually HTTP URLs) for resources that require them. In this case the <code>id</code> of the only sequence is a URL that ends in “/s0” and resolves to an error page. The <code>label</code> for the sequence is “Sequence s0”, which could have been automatically generated when the manifest was created. This lack of a meaningful <code>id</code> and <code>label</code> value for a sequence is understandable, since these types of things wouldn’t regularly get named in a metadata workflow.</p>
<p>This leaves me with the question of which <code>id</code>s ought to have something at the other end of the URI? Should every URI give you something useful back? And is there the assumption that HTTP URIs ought to be used rather than other types of identifiers–ones that might not have the same assumptions about following them to something useful? Or are these URIs just placeholders for good intentions of making something available there later on?</p>
<h4 id="rendering"><code>rendering</code></h4>
<p>Both PDF and raw text renderings are made available. It was examples like this one that helped me to see where to place renderings so that UniversalViewer would display them in the download dialog. The PDF is oddly extensionless, but it does work, and it has a nice cover page that includes a persistent URL and a statement on conditions of use. The raw text would be suitable for indexing the resource, but it is one long string of text without breaks, so it’s not really for reading.</p>
<h4 id="viewinghint"><code>viewingHint</code></h4>
<p>The <code>viewingHint</code> given here is “paged.” I’ll admit that one of the more puzzling things to me about the <a href="http://iiif.io/api/presentation/2.1/#viewinghint">specification</a> is exactly what viewing experience is being hinted at with the different viewing hints. How does each of these affect, or not, different viewers? Are there examples of what folks expect to see with each of the different viewing hints?</p>
<h4 id="canvases"><code>canvases</code></h4>
<p>The canvases don’t dereference and seem to follow the same sort of pattern as sequences by adding “/canvas/c0” to the end of an identifier. There’s a <code>seeAlso</code> that points to OCR as ALTO XML. Making this OCR data available with all of the details of bounding boxes of lines and words on the text is potentially valuable. The unfortunate piece is that there’s no MIME type for ALTO XML, so the format here is generic and does not unambiguously indicate that the file will contain ALTO OCR.</p>
<pre class="highlight json"><code><span class="s2">"seeAlso"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"@id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://wellcomelibrary.org/service/alto/b18250464/0?image=0"</span><span class="p">,</span><span class="w">
</span><span class="s2">"format"</span><span class="p">:</span><span class="w"> </span><span class="s2">"text/xml"</span><span class="p">,</span><span class="w">
</span><span class="s2">"profile"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://www.loc.gov/standards/alto/v3/alto.xsd"</span><span class="p">,</span><span class="w">
</span><span class="s2">"label"</span><span class="p">:</span><span class="w"> </span><span class="s2">"METS-ALTO XML"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre>
<p>Even more interesting for each canvas they deliver <code>otherContent</code> that includes an <a href="http://wellcomelibrary.org/iiif/b18250464/contentAsText/9">annotation list of the text of the page</a>. Each line of the text is a separate annotation, and each annotation points to the relevant canvas including bounding boxes. Since the annotation list is on the canvas the <code>on</code> property for each annotation has the same URL except for a different fragment hash for the bounding box for the line. I wonder if there is code for extracting annotation lists like this from ALTO? Currently each annotation is not separately dereferenceable and uses a similar approach as seen with sequences and canvases of incrementing a number on the end of the URL to distinguish them.</p>
<p>In looking closer at the images on the canvas, I learned better about how to think about the width/height of the canvas as opposed to the width/height of the image resource and what the various ids within a canvas, image, and resource ought to mean. I’m getting two important bits wrong currently. The id for the image should be to the image as an annotation and not to the image API service. Similarly the id for the image resource ought to actually be an image (with format given) and again not the URL to the image API. For this reason the dimensions of the canvas can be different (and in many cases larger) than the dimensions of the single image resource given as it can be expensive to load very large images.</p>
<h2 id="content-search-api">Content Search API</h2>
<p>Both content search and autocomplete are provided for this resource. Both provide a simple URL structure where you can just add “?q=SOMETHING” to the end and get a useful response.</p>
<p><a href="http://wellcomelibrary.org/annoservices/search/b18250464?q=medical">http://wellcomelibrary.org/annoservices/search/b18250464?q=medical</a></p>
<p>The first thing I noticed about the content search response is that it is delivered as “text/plain” rather than JSON or JSON-LD, which is something I’ll let them know about. Otherwise this looks like a nice implementation. The hits include both the before and after text, which could be useful for highlighting the text off of the canvas.</p>
<p>The annotations themselves use ids that include the bounding box as a way to distinguish between them. Again they’re fine identifiers but don’t have anything at the other end. So here’s an example annotation where “1697,403,230,19” is used for both the media fragment as well as the annotation URL:</p>
<pre class="highlight json"><code><span class="p">{</span><span class="w">
</span><span class="s2">"@id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://wellcomelibrary.org/iiif/b18250464/annos/searchResults/a80h40r1697,403,230,19"</span><span class="p">,</span><span class="w">
</span><span class="s2">"@type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"oa:Annotation"</span><span class="p">,</span><span class="w">
</span><span class="s2">"motivation"</span><span class="p">:</span><span class="w"> </span><span class="s2">"sc:painting"</span><span class="p">,</span><span class="w">
</span><span class="s2">"resource"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"@type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"cnt:ContentAsText"</span><span class="p">,</span><span class="w">
</span><span class="s2">"chars"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Medical"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="s2">"on"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://wellcomelibrary.org/iiif/b18250464/canvas/c80#xywh=1697,403,230,19"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre>
<p>The autocomplete service is simple enough:
<a href="http://wellcomelibrary.org/annoservices/autocomplete/b18250464?q=med">http://wellcomelibrary.org/annoservices/autocomplete/b18250464?q=med</a></p>
<h2 id="conclusion">Conclusion</h2>
<p>I learned a bit about how ids are used in this implementation. I hope this helps give some pointers to where others can learn from the existing implementations in the wild.</p>
Closing in on Client-side IIIF Content Searchhttp://ronallo.com/blog/closing-in-on-client-side-iiif-content-search/2016-09-25T09:30:00-04:002020-08-09T18:52:00-04:00Jason Ronallo<p>It sounds like client-side search inside may at some point be feasible for a IIIF-compatible viewer, so I wanted to test the idea a bit further. This time I’m not going to try to paint a bounding box over an image like in my last post, but just use client-side search results to create <a href="http://iiif.io/api/search/">IIIF Content Search API</a> JSON that could be passed to a more capable viewer.</p>
<p>This page is a test for that. Some of what I need in a Presentation manifest I’ve only deployed to staging. From there <a href="http://d.lib.ncsu.edu/collections/catalog/nubian-message-2002-04-25">this example uses an issue</a> from the <a href="http://d.lib.ncsu.edu/collections/catalog?f%5Bispartof_facet%5D%5B%5D=Nubian+Message">Nubian Message</a>. First, you can look at how I created the lunr index using <a href="https://gist.github.com/jronallo/57ca795cab6663b125428eb4283f1b64">this gist</a>. I did not have to use the manifest to do this, but it seemed like a nice little reuse of the API since I’ve begun to include seeAlso links to hOCR for each canvas. The <a href="https://gist.github.com/jronallo/57ca795cab6663b125428eb4283f1b64#file-manifest2lunr"><code>manifest2lunr</code></a> tool isn’t very flexible right now, but it does successfully download the manifest and hOCR, parse the hOCR, and create a data file with everything we need.</p>
<p>In the data file are included the pre-created lunr.js index and the documents including the OCR text. What was extracted into documents and indexed is the text of each paragraph. This could be changed to segment by lines or some other segment depending on the type of content and use case. The id/ref/key for each paragraph combines the identifier for the canvas (shortened to keep index size small) and the x, y, w, h that can be used to highlight that paragraph. We can just parse the ref that is returned from lunr to get the coordinates we need. We can’t get back from lunr.js what words actually match our query, so we have to fake it some. This limitation also means at this point there is no reason to go back to our original text just for hit highlighting. The documents with original text are still in the original data should the client-side implementation evolve in the future.</p>
<p>Also included with the data file is the URL for the original manifest the data was created from and the base URLs for creating canvas and image URLs. These base URLs could have a better, generic implementation with URL templates but it works well enough in this case because of the URL structure I’m using for canvases and images.</p>
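<p>Turning a lunr ref back into a Content Search hit might look like the sketch below. The <code>"&lt;canvasId&gt;_&lt;x&gt;,&lt;y&gt;,&lt;w&gt;,&lt;h&gt;"</code> ref shape and the base canvas URL are assumptions for illustration, not necessarily what <code>manifest2lunr</code> actually produces:</p>

```javascript
// Parse a lunr result ref into a IIIF Content Search annotation,
// attaching the stored coordinates as an xywh fragment on the canvas URI.
function refToAnnotation(ref, baseCanvasUrl) {
  const [canvasId, xywh] = ref.split('_');
  return {
    '@type': 'oa:Annotation',
    motivation: 'sc:painting',
    on: baseCanvasUrl + canvasId + '#xywh=' + xywh
  };
}
```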
<div>manifest URL: <a href="" id="manifest-url"></a></div>
<div>base canvas URL: <span id="canvas-url"></span></div>
<div>base image URL: <span id="image-url"></span></div>
<p>Now we can search and see the results in the textareas below.</p>
<p><input type="text" id="iiif-search" placeholder="Search this newspaper issue..." style="width:100%; max-width:360px;" autocomplete='off'></p>
<p>Raw results that lunr.js gives us are in the following textarea. The ref includes everything we need to create a canvas URI with a xywh fragment hash.</p>
<p><textarea id="raw-results" cols=80 rows=5 autocomplete="off"></textarea></p>
<p>Resulting IIIF Content API JSON-LD:</p>
<p><textarea id="results-textarea" cols=80 rows=30 autocomplete="off"></textarea></p>
<p>Since I use the same identifier part for canvases and images in my implementation, I can even show matching images without going back to the presentation manifest. This isn’t necessary in a fuller viewer implementation since the content search JSON already links back to the canvas in the presentation manifest, and each canvas already contains information about where to find images.</p>
<p>
<div id="matching-images"></div>
</p>
<p>I’ve not tested if this content search JSON would actually work in a viewer, but it seems close enough to begin fiddling with until it does. I think in order for this to be feasible in a IIIF-compatible viewer the following would still need to happen:</p>
<ul>
<li>Some way to advertise this client-side service and data/index file via a Presentation manifest.</li>
<li>A way to turn on the search box for a viewer and listen to events from it.</li>
<li>A way to push the resulting Content Search JSON to the viewer for display.</li>
</ul>
<p>What else would need to be done? How might we accomplish this? I think it’d be great to have something like this as part of a viable option for search inside for static sites while still using the rest of the IIIF ecosystem and powerful viewers like UniversalViewer.</p>
<script type="text/javascript" src="/files/client-side-iiif-search/client-side-iiif-search.js"></script>