Hypothes.is, Safari robots, PathagarBooks, Nicole Ozer, Benetech

These are live conference notes taken on Friday, October 25th, at the Books in Browsers 2013 conference, San Francisco. See an index of my live notes here.

Usual live blogging disclaimer:

These are informal notes taken by me, Manuel Schmalstieg, at the event, and may or may not be similar to what was said by the people who spoke on these topics. Probably if something here is incorrect it is because I mistyped it or misunderstood, and if anyone wants corrections, they should email me – or post a comment. Thanks!

Annotation, Data, and the Living (“e”)Book

with Dan Whaley and Jake Hartnell, Hypothes.is

also part of an open source project at Berkeley, Epub.js

An extended social layer on the top of the web.
actually an old idea.
we had it in books over history, marginalia, the Talmud.

here, the grandfather of the idea: Vannevar Bush. The web of knowledge, where people can connect pieces of information, and share them.

in 1994, Marc Andreessen turned it on in an early version of Mosaic. then turned it off, because he realized the scale that was required to make it happen.
recently, he noted regret that he didn’t implement this feature, to allow an annotated web.

a spreadsheet of similar projects that try to implement an annotation layer in some form.

what is happening now: a community group at W3C is trying to implement a standard for such an annotation layer. semantic processes.

Open Annotation Draft Spec
how to share annotation.

if my software creates annotation on top of something, my software can read that.
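
A minimal sketch of what "sharing annotation" could mean in practice: one tool writes the annotation as JSON, another tool reads it back and finds the quoted passage. The field names loosely follow the Open Annotation draft (body, target, text-quote selector). The page URL, the comment text and the helper names are made up for illustration, they are not from the talk.

```python
import json

def make_annotation(source_url, exact, prefix, suffix, comment):
    """Build an annotation as a plain dict (illustrative shape, not a validated document)."""
    return {
        "@type": "oa:Annotation",
        "hasBody": {"@type": "cnt:ContentAsText", "chars": comment},
        "hasTarget": {
            "@type": "oa:SpecificResource",
            "hasSource": source_url,
            "hasSelector": {
                "@type": "oa:TextQuoteSelector",
                "exact": exact,      # the annotated passage itself
                "prefix": prefix,    # a little text before it, for context
                "suffix": suffix,    # a little text after it
            },
        },
    }

def quoted_text(serialized):
    """What a different tool would do: parse the JSON and find the quoted passage."""
    data = json.loads(serialized)
    return data["hasTarget"]["hasSelector"]["exact"]

annotation = make_annotation(
    "http://example.org/article",  # hypothetical page
    exact="annotation layer",
    prefix="a standard for such an ",
    suffix=" on top of the web",
    comment="Compare with marginalia in printed books.",
)
wire_format = json.dumps(annotation, indent=2)
print(quoted_text(wire_format))
```

The point is only that both sides agree on the shape of the data, so neither depends on the other's software.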

a conference, I Annotate, brought about 100 people together, working on annotation projects, with people wanting to implement annotation.

One important project.

started by the OK foundation (Open Knowledge Foundation). it’s an API inside the browser that makes it easier to build annotation interfaces.

the way to win this game: foster an ecosystem, with shared standards, and interface paradigms.

Hypothes.is. non-profit. grants from … foundations.

how to deal with organizations not interested in the long term. is Facebook going to serve our needs for the next 100 or 1000 years?


how to re-think?

we need to work across time scales. those services work on different time frames (twitter: seconds, IA: decades)

fork knowledge: keep separate copies.

work in whatever format.
how can we build services and handlers for the different formats?

Jake Hartnell:
i’m here to talk about EPUB.
Books often aren’t on the internet; they are owned by A+A (Apple, Amazon). it’s not an annotatable resource, it’s a walled garden of knowledge.
i came to Hypothes.is to open them up to annotation. not really possible

the browser is the greatest reading application of our time. but EPUB is a great standard, and at Berkeley we started the Epub.js project.

books in browsers: not that new (Gutenberg)
but new for the publishing community.

they think they need to go through A+A; but the users are already on the internet, that nobody owns. it’s silly.

i’m a science-fiction author, book: 23rd Century Romance, the future of sex and relationships – “it’s pretty weird” (laughter)
I published the book using Epub.js.

Amazon is the true publisher of that digital age. But YOU can change that, put your content in the open web. you can also use your own fonts, which is amazing.
kindle is throwing away all that paratext!!

let’s get back to annotation. we will show some demos of what Hypothes.is can do.

A website: alpha prototype, where you can download and use. it’s early, so be kind. But it enables the annotation of the web.

here’s a page from the front of the new york times.
with the browser extension, you can annotate a part of the text.


annotation is stored on a remote server, and can be seen by other users.

we want to do:
– run a service, for people who don’t want to install the software.
– you are not locked into using our service.

search service: visualsearch.js – can show annotations from some domain, or from a specific user.

Text tends to change a lot, the web is not a static thing.
NYT articles are updated quite a lot. maybe the text you annotated has changed, or the context has changed. this is one of the great challenges of annotation projects.

The DiffMatchPatch library – fuzzy anchoring. the tolerance is tunable. it allows re-attaching the annotation to a page whose text has changed.
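
A rough sketch of the fuzzy anchoring idea, to show what a tunable tolerance means. It does not use DiffMatchPatch: it swaps in Python's standard `difflib` and a naive fixed-width sliding window, so it only illustrates the principle. The function name `reattach`, the sample sentences and the default threshold are assumptions.

```python
from difflib import SequenceMatcher

def reattach(quote, text, threshold=0.7):
    """Return (start, end) of the best approximate match for quote in text,
    or None if even the best match scores below threshold."""
    width = len(quote)
    if width == 0 or len(text) < width:
        return None
    best_score, best_start = 0.0, -1
    # Naive approach: compare the quote against every window of the same width.
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        score = SequenceMatcher(None, quote, window, autojunk=False).ratio()
        if score > best_score:
            best_score, best_start = score, start
    if best_score < threshold:
        return None  # text changed too much: the annotation stays unattached
    return best_start, best_start + width

quote = "the web is not a static thing"  # what was annotated originally
updated = "Text tends to change a lot. The web is never a static thing, editors say."

span = reattach(quote, updated)
if span is not None:
    print(updated[span[0]:span[1]])
```

Raising `threshold` makes the anchoring stricter (more annotations end up unattached), lowering it makes it more forgiving (more risk of attaching to the wrong passage).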

another great project: PDF.JS
a project by Mozilla, to render PDFs natively in the browser in JavaScript. Originally Chrome and FF had third-party proprietary binaries that did the rendering. The documents themselves were not accessible in the browser, nobody could interact with them. With PDF.JS we can render the PDF, and do cool things, like annotations.

It should be integrated in Hypothes.is in the next couple weeks.

Annotorious: a great project by Rainer Simon, that allows image annotation in the browser. It’s an Annotator plugin, so we integrate it with Hypothes.is. it will be discoverable, show up in the stream of annotations from that page.
we have started to launch slowly

Casey Boyle: students have been annotating classroom material. so he can provide feedback.

Back to Books:
this is EPUB.js
it’s a book in a browser, there’s annotations.
why do you want annotations? It creates that social sphere. people talking about content is a good thing. giving them tools is an even better thing.

people who read something want to know each other, want to share experiences.

it’s amusing to see what my readers think- it gives amazing feedback, even about the user interface of my app.

there’s also a really cool demo, i will tweet about.

Believing in Robots

with Keith Fahlgren and Peter Collingridge, Safari

This talk is going to be mostly about robots. And about how we learned to love and interact with robots. but not necessarily HAL:

there’s great references of robots – we don’t mean HAL; or marvin the paranoid android, as much as we love them.

as pieces of software that interact with humans and language.

mostly they have a simple job to do, and
overhearing machines talking about what they are doing. status reports.

twitter has been a great breeding ground for robots.

even when people grew popular and had some humans on it, .. example: House of Coates.

Stealth Mountain
epically pedantic. robot that replies and fixes spelling.
endless entertainment.

when SF writers talked about robots, they didn’t think about this. but it causes great entertainment.

the arrogant robot who just tweets nonsense. no apparent motive. but when he was proved to be a human masquerading as a robot … we didn’t know what to feel about that.

We actually employ robots at safari books online.
we have a deeply distributed team, 150 ppl, around the US and the world. and find the distributed culture quite delightful.
but it makes us slightly eccentric.

Hangouts: common features: trolling, humor, epic pedantry.

We started to make robots to make the – sometimes dry – textual conversation a little bit more humane.

looks for incongruous phrases in our chat rooms.

Secondarily, we have robots who are actually useful.

The war room, main chat room for engineers. Robots give status updates from github, tell how servers are doing.

We have a sort of complicated relationship with robots.

There has been a lot of space between early thinking about robots, and what they do now.

The laws of Asimov… insubordination is also funny. we like the idea of a robot who sometimes doesn’t do what he is told to do.

We want to make borderline human robots.
Occasional fights with other robots.

Unexpectedly useful toys:


was simply expanding short-codes + give context.
people would start thanking jirabot.
interaction started, into a weird anthropomorphic, complicated thing.
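
A tiny sketch of what "expanding short-codes" could look like for a chat robot: spot ticket codes in a message and reply with a link for context. The code pattern, the tracker URL and the function name are hypothetical, the talk did not show the implementation.

```python
import re

TRACKER_URL = "https://tracker.example.com/browse/"  # hypothetical base URL
TICKET = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")      # assumed code shape, e.g. WEB-123

def expand_short_codes(message):
    """Return one reply line per distinct ticket code found in a chat message."""
    replies = []
    seen = set()
    for code in TICKET.findall(message):
        if code not in seen:  # do not repeat a code mentioned twice
            seen.add(code)
            replies.append(f"{code}: {TRACKER_URL}{code}")
    return replies

for line in expand_short_codes("can someone look at WEB-123 and OPS-7?"):
    print(line)
```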

Exception: the only robot that exists in physical space.


Whenever we get a new subscriber (paying customer), the robot squawks and flashes a light.

We found that robot thinking went into our minds when dreaming up new products.

Product to make people more awesome at their jobs.

We have data about how books were consumed. we could use it for recommendation, “you may read this chapter, watch this clip”.


We wanted to use it to encourage somebody to do something. And get them into the books, get immersed in the reading experience.

We were happy to play internally. This gave a tremendous amount of real understanding.

We need to allow ourselves to build really small things.


Q: if you annotate, and the text completely changes – what happens to the comment?
A: we have plans for this. we will have a place to show non-attached annotations.

Q: context of leaving comments on blogs, rather than the whole Disqus thing? As a primary commenting tool?
A: conceptually, absolutely. there’s a lot of interest for moving that discussion.

Lightning talk: Sameer Verma


2/3 of the world is not on the internet. we can’t get to them.

how to get books to them?
a project to bring books into villages.
… TED videos. in a village with no internet.

PathagarBooks on github

Shows a lego server, a book server, on a Raspberry Pi.

– Will you put Khan Academy on?
– Yes

Digital Books: A New Chapter for Reader Privacy

with Nicole Ozer, ACLU-NorCal


dotRIGHTS.org ((Demand your dot right))

Lawyer specialized in privacy, free speech, and new technology.

ACLU, largest civil liberties organization, has been working on reading privacy issues.
as digital books have grown.
making sure that laws are upgraded to reflect new realities of digital books.

it’s a new reality. there is a great capacity for all sorts of uses and innovations, but also issues of data collection and retention. the government may be watching what we are reading and annotating: a chilling effect.

context of state enforcement, law enforcement: trying to access reading records.

a summary of cases that had gone up in recent years, where gov tried to access reading records.
we need to make sure reading is safeguarded in the digital age.


“If you build it & collect it, they will come knocking”.

how to bake-in privacy protections?
recently the Internet Archive has been reevaluating its own data retention policies.

in the US there has been a long history of demand for user records, but we have been successful in pushing them back.


2010, amazon case, Amazon vs Lay, a large range of reading records and purchase records. the case was a success for users.

California : first privacy law that enforces reader privacy for digital books.


Includes transparency reporting: a company that receives a demand is obligated to publish a transparency report.
We have created a transparency template, that makes it easy and fast to create a transparency report.

Do you (website, app) collect private information of Californians? You are under obligation to have an online privacy policy.

Working with companies. Published a primer, business case study and tips: “promoting privacy and free speech: A roadmap”.

link: aclunc.org/tech

Born Accessible

with Gerardo Capiel, Benetech

Many times people don’t know what a “Print disability” is. I will show you a video that gives you sense of that.

((shows girl reading at 24pt size; a blind boy using braille reader; boy listening to voice reader, helps him focus))

Those are three students that BookShare serves, one of Benetech’s projects.

250’000 students are a fraction of the 30 mio Americans with “print disabilities”.

A huge need for accessibility.

We are working with over 150 publishers. but it’s difficult to keep up with all the content that gets created.

Solution: content needs to be *born* accessible.

In science and mathematics education, it becomes complicated. There are graphics, videos, interactivity.

((shows images that would be hard to describe with text only))

We need to provide tactile access forms, sound access-forms.

SVG turned out to be really, really interesting. You can embed descriptions inside of SVG. even if you don’t distribute it, you should start from an SVG.
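
To illustrate the "descriptions inside of SVG" point: SVG has `<title>` and `<desc>` elements that carry a text alternative inside the image file itself. A minimal sketch with the Python standard library, where the drawing, the wording and the function name are invented for the example:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix

def described_svg(title, description):
    """Build a small SVG whose text alternative travels with the image."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"viewBox": "0 0 100 100", "role": "img"})
    ET.SubElement(svg, f"{{{SVG_NS}}}title").text = title
    ET.SubElement(svg, f"{{{SVG_NS}}}desc").text = description
    # The drawing itself: a simple peak shape (made-up example content).
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}polyline",
        {"points": "10,90 50,10 90,90", "fill": "none", "stroke": "black"},
    )
    return ET.tostring(svg, encoding="unicode")

markup = described_svg(
    "Line graph with one peak",
    "The line rises steeply to a maximum in the middle, then falls back down.",
)
print(markup)
```

Because the description lives in the source file, a tactile or audio version can be derived from the same SVG later.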

Example: instead of ink, the picture is raised, so the student can feel the shapes.

We have come up with a way to take SVG, and sonify it.

MathTrax Sonifier.
((on EPUB in ipad: takes a graph and sonifies it))

it’s all done with traditional web technologies.
the idea came from a blind student, who used that method to understand mathematical graphs.
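
A sketch of the basic idea behind sonifying a graph, as far as I understood it: each value becomes a pitch, so a rising curve sounds like a rising tone. This is not the MathTrax code and it produces no sound, it only shows the value-to-frequency mapping. The frequency range and the function name are assumptions.

```python
def sonify(values, low_hz=220.0, high_hz=880.0):
    """Map each y-value to a pitch: lowest value -> low_hz, highest -> high_hz."""
    if not values:
        return []
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A flat line: play one constant pitch in the middle of the range.
        return [(low_hz + high_hz) / 2] * len(values)
    scale = (high_hz - low_hz) / (highest - lowest)
    return [low_hz + (value - lowest) * scale for value in values]

parabola = [x * x for x in range(-3, 4)]  # y = x^2 sampled at x = -3..3
pitches = sonify(parabola)
print([round(p) for p in pitches])
```

For the parabola the pitch falls toward the middle and rises again, which is the shape a listener would hear.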

For many images that you saw, you need lots of characters. HTML5 has also ways to deal with this, like the new EPUB 3 spec.

One tool: Poet – used to crowdsource image descriptions. By volunteers.

On the mathematic side, the way to handle it is MathML.

Unfortunately today, in most books, formulas are just images.

But the good news: Safari and Firefox are both supporting MathML. Weird thing with Chrome: they added oral rendering, but sighted users cannot see it in the browser. IE is worse: IE9 had a plugin, IE10 and 11 dropped support for that plugin.

There’s a JS library, MathJax, that works also inside EPUB3.

Other way: rendering it on the server, and render PNGs or SVGs with descriptions.

But to listen to a long math formula is quite hard to absorb. You need the ability to navigate the math, so the formula is more easy to comprehend.

Join a W3C community group!

How to find accessible content?

my favorite pie is pecan pie.
google search: limit to recipes.

advanced search: only want pecan pie with these ingredients – a rich filtering in google.

the reason: those publishers have marked up their HTML with Microdata or RDFa, defined by schema.org – there’s a common way to describe those properties.

There should be a similar way to describe accessibility.

We are working with the Schema.org folks, so we can add accessibility properties to textbooks.

It also applies to video, all types of media.
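
A sketch of why shared metadata matters for discovery: once items describe their accessibility in a common way, filtering works like the recipe search. The property name `accessibilityFeature`, its values and the catalog entries are assumptions for illustration. The talk only said that such properties are being worked out with the Schema.org folks.

```python
# Hypothetical catalog records in a schema.org-like shape.
catalog = [
    {"@type": "Book", "name": "Algebra I",
     "accessibilityFeature": ["MathML", "alternativeText"]},
    {"@type": "Book", "name": "Geometry Basics",
     "accessibilityFeature": []},
    {"@type": "VideoObject", "name": "Intro to Graphs",
     "accessibilityFeature": ["captions"]},
]

def find_accessible(items, required):
    """Return the names of items that declare every required feature."""
    wanted = set(required)
    return [
        item["name"]
        for item in items
        if wanted <= set(item.get("accessibilityFeature", []))
    ]

print(find_accessible(catalog, ["MathML"]))
print(find_accessible(catalog, ["captions"]))
```

The same filter works for books and video because both use the same property, which is the whole argument for a common vocabulary.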

I want you to:

  • leverage standards. don’t build books as native apps, please.
  • build accessibility features into the authoring tool.
  • when creating reading systems, make sure they leverage standards. look at the [Readium](http://readium.org/) system.
  • for the hackers – github.com/benetech


Question: is the ACLU concerned about people being surveilled at ACLU?
Yes: the gov has a Verizon record. the ACLU is a Verizon customer. our call records have been disclosed to the NSA. we are very concerned, from a free expression and free association point of view. We filed a lawsuit within a week after the revelations and are very aggressive. Tomorrow will be the largest rally, in Washington DC, the greatest rally against surveillance in the history of the US.

Peter: mentions that one of Aaron Swartz’ last projects was (safeguarding reader privacy).