Calling out nonsense - John Degen

This post by John Degen looks like F.U.D., Fear, Uncertainty, and Doubt. If it doesn’t, please tell me why. The thing with F.U.D. is that there are generally misconceptions that lead to false conclusions, and that is what I am seeing in the post by Mr. Degen.

Mr. Degen, I respect the position you are in. Like me, you are standing up for a set of values, ethics, and rights for your profession. This is not a black and white issue. There are grey areas where we overlap, and that is where agreement or conflict can exist. In this case, we have a lot of conflict. But we have some stark lines drawn for us with Bill C-11 and the recent Supreme Court rulings. Simply put, rights around fair dealing and educational use have expanded.

Now, the misconceptions.

Misconception one. I am not the great dread pirate black beard of librarianship. In no way have I, nor the Ontario Library and Information Technology Association, said that creators should not be compensated. Yes, resolution language is ugly Robert’s Rules of Order legalese. That is what it has to be for the setting of an annual general meeting. Do I wish it was plain, simple, beautiful prose? Yes.

WHEREAS there exists model license agreements between Access Copyright and the Association of Universities and Colleges of Canada (AUCC) and between Access Copyright and the Association of Canadian Community Colleges (ACCC), and

WHEREAS there exist agreements between Access Copyright and the University of Toronto and between Access Copyright and the University of Western Ontario, and

WHEREAS the Canadian Association of University Teachers (CAUT), the British Columbia Library Association (BCLA), the Atlantic Provinces Library Association (APLA), the Manitoba Library Association (MLA), the Newfoundland Labrador Library Association (NLLA), the Progressive Librarians’ Guild (PLG) as well as many leading copyright scholars in Canada have taken strong positions against the Access Copyright licenses, and

WHEREAS the addition of “education” to the fair dealing categories, and the broad support for fair dealing in the Supreme Court’s pentalogy rulings of July 2012 provide further support for the position that the Access Copyright license does not provide any additional value to institutions beyond their existing rights, and

WHEREAS the fee structure is inequitable to students on whom the costs are imposed, and

WHEREAS several provisions in the license agreements limit the use of emerging technologies and increase the potential for monitoring and surveillance,

BE IT RESOLVED THAT the Ontario Library and Information Technology Association (OLITA):

  1. Stands opposed to the Access Copyright license agreements as they currently stand, including the AUCC and ACCC Model Licenses and the separate licenses with the University of Toronto and the University of Western Ontario,
  2. Urges Canadian post-secondary institutions not to enter into this licensing agreement,
  3. Encourages those who have already signed to exercise their termination options as soon as possible, and
  4. Recommends that institutions move toward the construction of systems of knowledge creation and sharing based on fair dealing, open access, site licensing as well as transactional licenses where they are needed.

The WHEREAS clauses provide the context, setting, or a lens with respect to the resolution. The resolution, I believe, is fairly explicit. OLITA, “stands opposed to the Access Copyright license agreements as they currently stand...” OLITA did not say, “Access Copyright is Cthulhu. It should be banished from this dimension, and no creator should ever be compensated.” We have an issue with those specific model licenses and agreements. We are not the first to raise this issue. The Canadian Association of University Teachers, the Atlantic Provinces Library Association, the Newfoundland and Labrador Library Association, the Manitoba Library Association, the BC Library Association, the McMaster University Academic Librarians’ Association, the Progressive Librarians Guild Toronto Area Chapter, and many leading copyright scholars in Canada have all spoken out in opposition to these model agreements and licenses. OLITA isn’t even the first association to oppose or condemn the model licenses and agreements by way of a resolution. CAUT did so last spring, as did the BC Library Association and the McMaster University Academic Librarians’ Association, and believe it or not, the Ontario College and University Library Association. Furthermore, OCULA passed the exact same resolution as OLITA one day prior. Neither Mr. Degen nor Access Copyright seemed to notice this at all, from what I can tell via recent public communications by both parties.

Misconception two. The “dialogue”. I have tried my best to be as transparent as possible. Mr. Degen and Access Copyright seem to be misrepresenting the narrative (“a strategic attempt to influence perception by disseminating negative and dubious or false information”). Mr. Degen and Access Copyright both refer to the letter I referenced in the previous post, and seemingly lead one to believe that the transmission of that letter is the end of the story: that Access Copyright tried to engage in an open dialogue with me and OLITA, and that both OLITA and I refused a dialogue. As I showed in my previous post, I welcomed participation in the process at the AGM for those Access Copyright board members, directors, employees, etc., who are OLITA members. From what I understand, we do have members of OLITA that are affiliated with Access Copyright, so there was every opportunity to participate. Moreover, the Access Copyright Executive Director followed up to my response saying, "We don't see how this can properly take place at your AGM. Would you consider delaying the motion until we have the opportunity to meet and begin a dialogue?" That, in my opinion, is attempting to circumvent a democratic process. The executive director asked that I pull a resolution. There is no right or standing to ask such a thing. As for the following statement, “We don’t see how this can properly take place at your AGM.” Really? Resolutions are a normal part of AGMs. A member has every right to submit a resolution. If it is moved and seconded, then it moves to the agenda for the meeting. So, yes, it can properly take place. To think otherwise is silly.

Finally, if you want to talk let’s talk. Yesterday wasn’t an example of a constructive dialogue. In fact it got really unconstructive. Mr. Degen, I would like to personally apologise if there was any offense taken from any of my actions, and would also like to apologise on behalf of my colleagues. As I ended my previous post, if Access Copyright or Mr. Degen would like to open a dialogue about why these resolutions were unanimously passed, now is the time to do so.
 

»

Calling out nonsense - Access Copyright

Inspired by some fearless leaders in our community, this is my Access Copyright story.

This past week, a very interesting series of events unfolded with Access Copyright, or maybe better said, what unfolded was a lesson in how not to engage in open dialogue. I will not be speaking to the text of the resolutions mentioned below, just the events surrounding them.

Last week was the annual Ontario Library Association Super Conference. During Super Conference, each OLA Division has their Annual General Meeting. Among other things, AGMs provide opportunities for resolutions to be put forward by the membership. At this particular AGM, we had two resolutions put forward: 1) A Memorial Resolution Honouring Aaron Swartz (Thanks ALA/LITA!) and 2) OLITA Resolution on Opposition to Access Copyright License Agreements. Standard procedures were followed, the resolutions were moved and seconded, and sent out to the membership in advance.

This past Monday (2013-01-28) something happened. I received an email from Robert Gilbert, New Media and Communication Services, at Access Copyright. This is what they had to say. I will explain why I am making this letter public later.

I addressed the incorrect information in the letter in a reply to the sender, and cc'd recipients:

Dear Mr. Gilbert,

I'd like to clear up some confusion with the resolution. The posted resolution[1] which I assume you have seen or been directed to is a proposed resolution for the Ontario Library and Information Technology Association's (OLITA) Annual General Meeting[2]. It was sent out in advance to membership.

The resolution has been moved and seconded, and will be put before the membership at the meeting for a vote. Prior to the vote, an opportunity will be provided to speak to the motion, ask questions, and propose amendments. If you are or your colleagues are OLITA members, you are more than welcome to participate.

cheers!

-nruest

[1] http://www.accessola2.com/olita/insideolita/wordpress/?p=58235 [2] http://www.accessola2.com/olita/insideolita/wordpress/?p=58053

A day went by, and other than an out-of-office reply, I didn't hear anything in response. I figured we were done.

Nope.

On Wednesday evening (2013-01-30), I received an email from the Executive Director of Access Copyright. I will not publish the entire email, but I was asked to delay the motion, "We don't see how this can properly take place at your AGM. Would you consider delaying the motion until we have the opportunity to meet and begin a dialogue?"

I responded:

Hi XXXXXXX,

I due[sic] hope you understand the weight and merit of what you are asking. You are asking that I forgo a democratic process. This is a resolution that was put forward by a member of our association, and will be discussed as [sic] voted on at our AGM.

As I stated previously, if you or any of your colleagues are OLITA members, you are more then welcome to come and part take in this democratic process. You will be provided every opportunity to speak to the resolution on the table.

Other than that, I will in no way interfere this process as you have suggested.

Regards,

-nruest

I had hoped this was the end of the exchange.

Nope.

The OLITA AGM was Friday evening (2013-02-01). Access Copyright was present at the conference as they had a booth in the exhibitors' hall. During the day, a colleague of mine showed me the letter I mentioned earlier. Somewhat (well really a lot) flabbergasted, I asked where and how they got a copy, assuming the only people to see the aforementioned letter were those that sent it, and those that received it. Nope. Access Copyright decided the best way to engage in an open "dialogue" with me, our association and/or our community was to print off a stack of these letters (in a very classy paper stock!) to hand out at their exhibitor booth.

I fully appreciate, and can understand the rationale behind trying to open up a dialogue. However, Access Copyright tried to circumvent a democratic process, refused to engage in a public dialogue, and tried to misrepresent and embarrass OLITA on the exhibitors’ floor. I find these intimidation tactics unacceptable.

We played fair. We brought no mention of Access Copyright's behaviour to the assembly floor. The resolution went forward with a single friendly amendment, and was passed unanimously. The OLITA membership has spoken. If Access Copyright would like to open a dialogue about why these resolutions were unanimously passed, now is the time to do so.

»

Islandora Web ARChive Solution Pack

What is it?

The Islandora Web ARChive Solution Pack is yet another Islandora Solution Pack. This particular solution pack provides the necessary Fedora objects for persisting and disseminating web archive objects; warc files.

What does it do?

Currently, the SP allows a user to upload a warc with an associated MODS form. Once the object is deposited, the associated metadata is displayed along with a download link to the warc file.

You can check out an example here

Can I get the code?

Of course!

Todo?

If I am doing something obviously wrong, please let me know!

Immediate term:

  1. Incorporate Wayback integration for the DIP. I think this is the best disseminator for the warc files. However, I haven't wrapped my head around how to programmatically provide access to the warc files in the Wayback. I know that I will have two warc objects, an AIP warc and a DIP warc (Big thank you to @sbmarks for being a sounding board today!). Fedora will manage the AIP, and Wayback will manage the DIP. Do I iFrame the Wayback URI for the object, or link out to it?

  2. Drupal 7 module. Drupal 7 versions of Islandora Solution Packs should be on their way shortly -- Next release I believe. The caveat to using the Drupal 6 version of this module is the mimetype support. It looks like the Drupal 6 api (file_get_mimetype) doesn't pull the correct mimetype for a warc file. I should get 'application/warc' but I am getting 'application/octet-stream' -- the fallback default for the api.
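To illustrate the fallback behaviour described in item 2, here is a minimal Python sketch of an extension-based mimetype lookup. It is not Drupal's file_get_mimetype implementation; the lookup table and the guess_mimetype helper are hypothetical, and only the 'application/warc' and 'application/octet-stream' values come from the post.

```python
import os

# Fallback returned when an extension is not in the lookup table.
DEFAULT_MIMETYPE = "application/octet-stream"

# Hypothetical extension table that has no entry for 'warc'.
MIME_BY_EXTENSION = {
    "pdf": "application/pdf",
    "xml": "application/xml",
}


def guess_mimetype(filename, mapping=MIME_BY_EXTENSION):
    """Return a mimetype for a filename, based only on its extension."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return mapping.get(extension, DEFAULT_MIMETYPE)


# Without a 'warc' entry, the lookup falls through to the default.
before = guess_mimetype("example.warc")

# With the entry added, the lookup returns the expected type.
extended = dict(MIME_BY_EXTENSION, warc="application/warc")
after = guess_mimetype("example.warc", extended)
```

If the lookup really is table-driven like this, the fix would amount to getting a 'warc' entry into whatever table the API consults, but that is an assumption about the cause rather than a confirmed diagnosis.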

Long term:

  1. Incorporate Islandora microservices. What I would really like to do is allow users to automate this entire process. Basically, just say this is a site I would like to archive. This is the frequency at which I would like it archived, with necessary wget options. This is the default metadata profile for it. Then grab the site, ingest it into Fedora, drop the DIP warc into Wayback, and make it all available.

  2. If you have any idea on how to do the above, or how to do it in a better manner, please let me know!
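One way the "grab the site" step of the long-term idea might be sketched in Python is below. This is only an illustration under assumptions: the profile dictionary, its keys, and the helper names are hypothetical, and the sketch only builds a wget command (using wget's --warc-file option, available in wget 1.14 and later) without running it or touching Fedora or Wayback.

```python
from datetime import datetime, timedelta


def build_wget_command(url, warc_name, extra_options=()):
    """Build (but do not run) a wget command that also writes a WARC file."""
    command = ["wget", "--mirror", "--page-requisites",
               "--warc-file=" + warc_name]
    command.extend(extra_options)
    command.append(url)
    return command


def is_due(last_run, now, frequency_days):
    """Return True when at least frequency_days have passed since last_run."""
    return now - last_run >= timedelta(days=frequency_days)


# Hypothetical crawl profile: site, frequency, and extra wget options.
profile = {
    "url": "http://example.org/",
    "warc_name": "example-org",
    "frequency_days": 7,
    "wget_options": ["--wait=1"],
}

command = build_wget_command(
    profile["url"], profile["warc_name"], profile["wget_options"]
)
due = is_due(datetime(2013, 1, 1), datetime(2013, 1, 9),
             profile["frequency_days"])
```

The resulting list could be handed to subprocess when a crawl is due; ingesting the AIP into Fedora and dropping the DIP warc into Wayback are left out because the post does not describe how those steps would work.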

»

DPLA Appfest Drupal integration

Below is the output of the little project I worked on today at the DPLA Appfest. It definitely isn't a perfect solution to the problem. It is not a drop-in module to just grab a collection from the DPLA API and "curate" it in your library's Drupal site. I hate reinventing the wheel, especially if there are existing modules that can solve the problem for you. Moreover, as one of the few people that still respects what OAI-PMH does, I think it would be worth considering using DPLA as an OAI-PMH provider. But, I'm not sure if that is technically legal in OAI-PMH terms given that they are most likely harvesting it via OAI-PMH. Don't want to get into an infinite regress of metadata providers. S'up dawg? All jokes aside, I think OAI-PMH would be a better solution than what I tossed together because it would make harvesting a "set" a hell of a lot easier. My 2¢.

I also have a live demo of it living on my EC2 instance. I've ingested 2000 items from the API, and decided to throw them into a solr index just to demonstrate the possibilities of what you can do with the ingested content.

Finally, a big giant thank you to DPLA and Chattanooga Public Library for putting this on and for the wonderful hospitality. This was absolutely fantastic!

Idea

Drupal module or distribution

Your Name: Nate Hill

Type of app: Drupal CMS

Description of App: Many, many libraries choose to use Drupal as their content management system or as their application development framework. A contrib Drupal module that creates a simple interface for admin users to curate collections of DPLA content for display on a library website would be useful.

Workflow

Preamble:

I don't like recreating the wheel. So, let's see what contrib modules already exist, and see if we can just create a workflow to do this to start with. It would be really nice if DPLA had an OAI-PMH provider; then you could just use CCK + Feeds + Feeds OAI-PMH.

Example: bitly.com/VXMvMr

Requirements:

  • CCK

    drush pm-download cck

  • Feeds

    drush pm-download feeds

  • Feeds - JSON Parser

    drush pm-download feeds_jsonpath_parser
    cd sites/all/modules/feeds_jsonpath_parser && wget http://jsonpath.googlecode.com/files/jsonpath-0.8.1.php

Setup:

  • Create a Content Type for the DPLA content you would like to pull in (admin/content/types/add)
  • Create DPLA metadata fields for the Content Type (admin/content/node-type/YOURCONTENTTYPE/fields)
  • Create a new feed importer (admin/build/feeds/create)
  • Configure the settings for your new feed importer
    • Basic settings:
      • Select the Content Type you would like to import into
      • Select a frequency you would like Feeds to ingest
    • Fetcher
      • HTTP Fetcher
    • Processor
      • Node processor
        • Select the Content Type you created
        • Mappings (create a mapping for each metadata field you created)
          • Source : jsonpath_parser:0
          • Target : Title
    • Parser
      • JSONPath Parser
        • Settings for JSONPath parser
          • Context: $.docs.*
  • Construct a search you would like to ingest using the DPLA API
    • ex: http://api.dp.la/v1/items?dplaContributor=%22Minnesota%20Digital%20Library%22
  • Start the import! (node/add/YOURCONTENTTYPE)
  • Give the import a title... whatever your heart desires.
  • Add a feed url
  • Click on JSONPath Parser settings, and start adding all of the JSONPaths
  • Click save, and watch the import go.
  • Check out your results
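
To make the JSONPath settings above a little more concrete, here is a rough Python sketch of what the context $.docs.* and a single field mapping do to an API response. The sample response, its field names, and the helper functions are all made up for illustration; this is not how the Feeds JSONPath Parser module is implemented, and the real DPLA field names may differ.

```python
import json

# Made-up response in the general shape the context '$.docs.*' expects:
# a top-level object with a 'docs' array of items.
SAMPLE_RESPONSE = json.loads("""
{
  "count": 2,
  "docs": [
    {"id": "a1", "title": "First item"},
    {"id": "b2", "title": "Second item"}
  ]
}
""")


def select_context(response):
    """Rough equivalent of the JSONPath context $.docs.*: every doc."""
    return list(response.get("docs", []))


def map_item(item, mappings):
    """Copy each source field of an item into its target field."""
    return {target: item.get(source) for source, target in mappings.items()}


# Hypothetical mapping: the 'title' key of each doc feeds the node Title.
MAPPINGS = {"title": "Title"}

nodes = [map_item(item, MAPPINGS) for item in select_context(SAMPLE_RESPONSE)]
```

Each dictionary in nodes stands in for one imported node; in the real workflow Feeds creates the nodes, and you would add one mapping per metadata field you created.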
»

York University Libraries Open Access Week 2012 - #blogvsbook

Yesterday, York University Libraries held a debate in the Scott Library entitled, "Be it resolved the blog replace the book?" The debate turned out pretty awesome, and somehow the team arguing for the book won!? (Some might say it was because of @adr's compelling closing statements.) 

Along with livestreaming the debate on ustream, I pulled together (a special thanks to Ed Summers, and his very permissive licensing) a little node.js application to display a "twitterfall" of the hashtag for the event. As is always the case, technology is bound to fail, somehow, someway, at a live event. Turns out that we owe a very special thank you to the giant Amazon outage, which in turn took out Heroku's infrastructure. Good thing my paranoia urged me to use a backup application to snag the archive for the stream; with all of the variations on the hashtag.

Enough about the debate, and Amazon's large internet burp! What I want to really talk about is some fun ways to play with the data we collected from the Twitter API. The backup application I mentioned earlier has some nice visualizations incorporated in it, and it is a pretty slick and simple-to-use application. But, most important, I have a csv (deposited in the OCUL Dataverse site) of all the tweets, for all the hashtags I could figure out. Which means we (yes you! Download the csv and have fun with this too!) can start doing some cool visualizations.

Inspired by @Sarah0s' "Dead easy data visualization for libraries" talk at AccessYUL I decided to play with infogr.am to see how easy it would be to toss together a visualization of the number of tweets per user.


This is a fairly basic and easy one to make. You only need two columns: twitter usernames, and corresponding number of tweets. Once you have those entered, just hit publish, and you're good to go. 
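
If you would rather build those two columns from the csv than by hand, a small Python sketch like the one below would do it. The column name from_user and the sample rows are assumptions for illustration; check the header row of the actual csv before using it.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample standing in for the real csv of tweets.
SAMPLE_CSV = """from_user,text
alice,Books forever #blogvsbook
bob,Blogs all the way #blogvsbook
alice,Great closing statements #blogvsbook
"""


def tweets_per_user(csv_file, user_column="from_user"):
    """Count rows per username in an open csv file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[user_column] for row in reader)


counts = tweets_per_user(io.StringIO(SAMPLE_CSV))

# Two columns, ready to paste in: username and number of tweets.
rows = counts.most_common()
```

With the real file you would pass open("tweets.csv", newline="") instead of the StringIO sample; the filename here is a placeholder.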

So, that is something quick and easy. I have "Designing Data Visualizations" on the way. Hopefully that inspires me a bit more, and maybe I'll start playing with d3js again. Should be fairly straightforward to drop the csv into Google Refine and get some json back. In the interim, I'll just leave it up to Bill Denton to show us some really cool stuff with the data in R.

»

iaTorrent update OR Learning by reading code

Last week, inspired by a meeting, I started tossing together a little python program to solve a problem. It wasn't perfect. It was warty. But I think I have something worthwhile now. Or, at least useful for me -- It gives you what you want, and writes to a log when something goes wrong.

What I really want to do here is just take a moment to sing the praises of learning by reading code. Heading into this little project, I had a basic idea of what I wanted to do, and I knew something like this could be done given Tim's project. I knew that I wanted to make this a module, and set it up on PyPI, but I really had no idea how to do so. But! I knew of somebody who did, and who is quite prolific in my mind. Ed making his code available on Github (and using very open licenses) made it possible for me to learn how to build the structure for a Python module, the structure for writing tests, and how to use argparse/optparse correctly.

So, here is to learning by reading code!

»

IA Torrent

Yesterday in a meeting for our Digital Initiatives Advisory Group we were discussing what collections we should consider sending over to the UofT Internet Archive shop, and I asked an innocent newbie question - So, do we have copies of everything we have had the Internet Archive digitize?

NOPE.

No big deal. We're in the infant stages of creating a digital preservation program here, and everything that comes with it. INFRASTRUCTURE!

I knew Tim Ribaric over at Brock University wrote an Internet Archive scraper a while back, so I knew it would be possible to get our content if need be. Knowing that combined with the Internet Archive announcement a little over a month ago about making available torrents for items in the Internet Archive, it inspired me to whip together a Python script to grab all the torrents for a given collection.

Last night I threw together a little proof-of-concept grabbing the RSS feed on the York University Libraries Internet Archive page using BeautifulSoup and some ugly regex.

This morning, still inspired and brainstorming with Dan Richert, I started poking around for different ways to get at our collection. The Internet Archive's advanced search is super helpful for this, and I can get the results as json! So, no regex; as Dan told me, "if you solve a problem with regex, you now have two problems."

On the advanced search page, you will need your query parameters. You can grab those from the 'All items (most recently added first)' link on a collection page. For example, the York University Libraries collection query parameters:

(collection:yorkuniversity AND format:pdf) AND -mediatype:collection

Then select your desired output format and number of results: 2608 for me, given the number of items in the collection. Then you end up with some json like this:

{
   "responseHeader":{
      "status":0,
      "QTime":1,
      "params":{
         "json.wrf":"",
         "qin":"(collection:yorkuniversity AND format:pdf) AND -mediatype:collection",
         "fl":"identifier,title",
         "indent":"",
         "start":"0",
         "q":"( collection:yorkuniversity AND format:pdf ) AND -mediatype:collection;",
         "wt":"json",
         "rows":"5"
      }
   },
   "response":{
      "numFound":2608,
      "start":0,
      "docs":[
         {
            "title":"Saint-Pétersbourg",
            "identifier":"saintptersboyork00rauoft"
         },
         {
            "title":"Revised statutes of Ontario, 1990 = Lois refondues de l'Ontario de 1990",
            "identifier":"v4revisedstat1990ontauoft"
         },
         {
            "title":"Essai philosophique concernant l'entendement humain : ou l'on montre quelle est l'etendue de nos connoissances certaines, et la maniere dont nous y parvenons",
            "identifier":"1714essaiphiloso00lockuoft"
         },
         {
            "title":"Essai philosophique concernant l'entendement humain : où l'on montre quelle est l'étendue de nos connoissances certaines, et la manière dont nous y parvenons",
            "identifier":"1729essaiphiloso00lockuoft"
         },
         {
            "title":"Essai philosophique concernant l'entendement humain, : ou l'on montre quelle est l'etendue de nos connoissances certaines, et la maniere dont nous y parvenons.",
            "identifier":"1735essaiphiloso00lockuoft"
         }
      ]
   }
}

(make sure you lop off '&callback=callback&save=yes' at the end of the url). Once you have the url for the json, it is pretty straightforward from there. You just call the script like so:

ia-torrent.py 'http://archive.org/advancedsearch.php?q=%28collection%3Ayorkuniversity+AND+format%3Apdf%29+AND+-mediatype%3Acollection&fl%5B%5D=identifier&fl%5B%5D=title&sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&rows=2608&page=1&output=json' '/tmp/ia-torrent'
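
For anyone curious what is going on between that url and the torrents, here is a minimal Python sketch of the two steps: building the advanced search url, and turning the identifiers in the json into torrent urls. It is not the code from ia-torrent.py. The helper names are made up, and the torrent url pattern (identifier plus "_archive.torrent" under /download/) is my assumption about how the Internet Archive names its item torrents.

```python
import json
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://archive.org/advancedsearch.php"


def build_search_url(query, rows, fields=("identifier", "title")):
    """Build an advanced search url that returns json."""
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", 1), ("output", "json")]
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def torrent_urls(search_json):
    """Return one torrent url per identifier in a search response."""
    docs = json.loads(search_json)["response"]["docs"]
    # Assumed pattern: /download/<identifier>/<identifier>_archive.torrent
    return [
        "http://archive.org/download/{0}/{0}_archive.torrent".format(
            doc["identifier"]
        )
        for doc in docs
    ]


QUERY = "(collection:yorkuniversity AND format:pdf) AND -mediatype:collection"
url = build_search_url(QUERY, rows=2608)

# One-item sample in the same shape as the json shown above.
SAMPLE = (
    '{"response": {"numFound": 2608, "start": 0, '
    '"docs": [{"identifier": "saintptersboyork00rauoft"}]}}'
)
urls = torrent_urls(SAMPLE)
```

The sketch leaves out the empty sort parameters that the advanced search page adds to its urls, and it does no downloading; fetching each torrent url and saving it to the output directory is the part the real script handles.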

Caveats! I haven't been able to download all the torrents for an entire collection yet. Looks like Internet Archive's servers don't like the number of requests, and the script dies out with:

'IOError: [Errno socket error] [Errno 111] Connection refused'

I've tried throttling myself in the script at 15 seconds per request, and still get cut off. If anybody knows if Internet Archive has any published request rates, or has a better idea in implementing this, please let me know! Add a comment, or fork + clone + pull request. Patches are most welcome!
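
One idea I can sketch, without knowing what request rates the Internet Archive actually allows, is to retry with a growing delay instead of a fixed one. The helper below is hypothetical and is not what the script does today; the 15 second starting delay simply mirrors the throttle mentioned above, and there is no guarantee this avoids the refused connections.

```python
import time


def fetch_with_backoff(fetch, url, retries=5, base_delay=15, sleep=time.sleep):
    """Call fetch(url), doubling the wait after each connection error."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except IOError:
            # Give up after the last attempt; otherwise wait and retry.
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake fetch that is refused twice, then succeeds.
# The sleep function is replaced by a list append so nothing really waits.
delays = []
attempts = {"count": 0}


def flaky_fetch(url):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionRefusedError(111, "Connection refused")
    return "torrent bytes for " + url


result = fetch_with_backoff(
    flaky_fetch, "http://example.org/item.torrent", sleep=delays.append
)
```

In the script itself, fetch would be whatever function downloads a single torrent, and sleep would be left at its default of time.sleep.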

Big thank you to Dan Richert for the impromptu crash course on parsing json this morning!!!

»

Islandora development visualization

Hit a bit of a wall yesterday getting checksums working when ingesting content into Islandora, so I made a Gource video of the Islandora commits in my fork of the git repo.

Music by RipCD (@drichert) and myself.

How'd I do it?

  1. I wanted to use the Gravatars, so I used this handy little perl script.
  2. Hopped into the Islandora git repo, and ran:

    gource --user-image-dir .git/avatar/ -s 3 --auto-skip-seconds 0.1 --file-idle-time 50 --max-files 500 --disable-bloom --stop-at-end --highlight-users --hide mouse --background-colour 111111 --font-size 20 --title "Islandora Development" --output-ppm-stream - --output-framerate 60 | avconv -y -r 60 -f image2pipe -vcodec ppm -i - -b 8192K ~/Videos/islandora-gource.mp4    

  3. Then I used OpenShot to add the music and uploaded to YouTube.
»

FITS and Islandora integration

Digital preservationistas rejoice?
 
I managed to get FITS integration working in Islandora via a plugin. The plugin will automatically create a FITS xml datastream for an object upon ingest in the Islandora interface for a given solution pack. Right now I have it working with the Basic Image Solution Pack, Large Image Solution Pack, and PDF Solution Pack. You just have to make sure fits.sh is in your apache user's path (thanks @adr). [UPDATE: Works with the Audio Solution Pack now.]
 
What I had feared was going to be a pretty insane process turned out to be fairly simple and straightforward, which I'll outline here.

  1. I looked at existing plugins for something similar that I could crib from, and found it in the exiftool plugin, which is used in the audio and video solution packs.
  2. Using the existing plugin, I ran some grep queries to figure out how it is used in the overall codebase (Islandora, and solution packs). 
  3. Created a feature branch
  4. Hammered away until I had something working. (Thanks @mmccollow)
  5. Created an ingest rule for a solution pack. This tells the solution pack to call the plugin.
  6. Test, test, and test.
  7. Merged feature branch with 6.x branch, pushed, and opened up a pull request.

That is basically it. Let me know if you have any questions. Or, if you know of a way to make it even better, patches welcome ;)
 
[Update #2]
 
I've added a configuration option to the Islandora admin page to enable FITS datastream creation, and the ability to define a path to fits.sh. I put it in the advanced section of the admin page which is not expanded by default. This will probably be problematic, and folks won't notice it. It might be a better idea to collect all the various command line tools Islandora uses, and give them all a section in the admin page to define their paths.
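
As a language-neutral illustration of what those two settings control (the plugin itself is not written in Python), here is a small Python sketch. The settings dictionary, the path, and the output naming are hypothetical; the -i and -o flags are the FITS command line options for input and output files as I understand them, so double check them against your FITS version.

```python
import os


def build_fits_command(fits_path, input_file, output_file):
    """Build (but do not run) a fits.sh call that writes FITS xml to a file."""
    return [fits_path, "-i", input_file, "-o", output_file]


def fits_output_path(input_file, suffix="_FITS.xml"):
    """Derive a hypothetical output filename for the FITS xml."""
    base, _extension = os.path.splitext(input_file)
    return base + suffix


# Hypothetical stand-ins for the two admin page settings.
settings = {"fits_enabled": True, "fits_path": "/usr/local/fits/fits.sh"}

source = "/tmp/example.pdf"
command = None
if settings["fits_enabled"]:
    command = build_fits_command(
        settings["fits_path"], source, fits_output_path(source)
    )
```

The resulting xml file is what would then be attached to the object as the FITS datastream at ingest.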
 
I also have FITS creation working with the Video Solution Pack now. Up next, Islandora Scholar... just have to get that up and running ;)

»

Just an easy way to help out!

Do you have a computer or server sitting around running Linux, and want to help out a good cause? Then you should check out one of the active Archive Team projects. "Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage." Basically, this wonderful group of people monitors the internet for sites/services that are about to disappear, and do their absolute best to make sure these things are preserved. 
 
I wish I had gotten involved with this earlier, but as I was walking out the door from McMaster I stumbled across the MobileMe project and figured I could throw some bandwidth at it. If you are comfortable with installing software on the command line, a little bit of get, and compiling then you should throw in a helpful hand! (Or, if you are a bad ass programmer, you can help write some scripts for projects that need a hand.) The instructions on each project page are pretty straightforward. If you have a question or need a hand to get going, pop into the IRC channel or just ask me.

»