Giacomo Cignoni: My Internship at the Internet Archive

This summer, Open Library and the Internet Archive took part in Google Summer of Code (GSoC), a Google initiative to help students gain coding experience by contributing to open source projects. I was lucky enough to mentor Giacomo while he worked on improving our BookReader experience and infrastructure. We have invited Giacomo to write a blog post to share some of the wonderful work he has done and what he learned. It was a pleasure working with you, Giacomo, and we all wish you the best of luck with the rest of your studies! – Drini


Hi, I am Giacomo Cignoni, a second-year computer science student from Italy. I submitted my 2020 Google Summer of Code (GSoC) project to work with the Internet Archive and I was selected for it. In this blog post, I want to tell you about my experience and my accomplishments working this summer on BookReader, the Internet Archive’s open source book reading web application.

The BookReader features I most enjoyed working on are page filters (which include “dark mode”) and the text selection layer for certain public domain books. Both were challenging, and both had a great impact on the user experience of BookReader. The first lets readers turn pages with black text on a white background into pages with white text on a black background, and the second allows text to be selected and copied directly from the page images (currently in internal testing).

Short summary of implemented features:

  • End-to-end testing (search, autoplay, right-to-left books)
  • Generic demo for displaying any Internet Archive book
  • Mobile BookReader table of contents
  • Checkbox for filters on book pages (including dark mode)
  • Text selection layer plugin for public domain books
  • Bug fixes for page flipping
  • Bug fix for displaying high resolution book images

First approach to GSoC experience

Once I received the news that I had been selected for GSoC with the Internet Archive for my BookReader project, I was really excited, as it was the beginning of a new experience for me. For the same reason, I will not hide that I was a little nervous: it was my first internship-like experience. Fortunately, from the very start, my mentor Drini and Mek were both supportive and ready to offer help. It also helped that I was already familiar with BookReader, as I had used it (and even modified it a little) for a personal project.

From May 6th, the day of the GSoC selection, through most of the rest of the month, I mainly focused on getting to know the other members of the UX team at the Internet Archive, whom I would be working with for the rest of the summer, and on defining a more precise roadmap for my work with my mentor, as my proposed project was open to any improvements for BookReader.

End-to-end testing

The first tasks I worked on, as stated in the project, were about end-to-end testing for BookReader. I learned about TestCafe, the tool that was to be used, and my first real task was to explore and remove some old QUnit tests (#308). Then I started writing end-to-end tests for the search feature in BookReader, both for desktop (#314) and mobile (#322). Lastly, I fixed the existing autoplay end-to-end test (#344), which was causing problems. I also prepared end-to-end tests for right-to-left books (#350), but they were not merged immediately because they needed a feature that I would implement later: a way to choose which book from the IA servers is displayed by specifying the book id in the URL.

This work on testing (which lasted until around the 20th of June) was really helpful at the beginning, as it allowed me to gain confidence with the codebase and with JavaScript ES6 without immediately taking on harder tasks. The frequent meetings with my mentor and other members of the team made me really feel part of the workplace.

Working on the source code

The table of contents panel in BookReader mobile

My first experience working on the core BookReader source code was during the Internet Archive hackathon on May 30th when, with the help of my mentor, I created the first draft of the table of contents panel for mobile BookReader. I resumed work on this feature in July, refining it until it was released (#351). I then worked on a checkbox to apply different filters to the book page images, still on mobile BookReader (#342), which includes a sort of “dark mode”. This was probably the feature I enjoyed working on the most: it was challenging but not too difficult, it involved some planning rather than being purely technical, and it received great appreciation from users.

Page filters for BookReader mobile let you read in a “dark mode”
https://twitter.com/openlibrary/status/1280184861957828608
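As a rough illustration of how a set of filter checkboxes can map onto page images, here is a minimal sketch. It assumes the filters are applied through the standard CSS `filter` property; the filter names, the chosen percentages, and the `buildFilterValue` helper are hypothetical and are not taken from the BookReader code.

```typescript
// Hypothetical filter names for illustration; not BookReader's identifiers.
type PageFilter = "grayscale" | "highContrast" | "invert";

// Each checkbox maps to one standard CSS filter function.
// The percentages are assumptions chosen for the example.
const FILTER_CSS: Record<PageFilter, string> = {
  grayscale: "grayscale(100%)",
  highContrast: "contrast(150%)",
  invert: "invert(100%)", // black-on-white becomes white-on-black ("dark mode")
};

// Fixed application order, so the result does not depend on click order.
const FILTER_ORDER: PageFilter[] = ["grayscale", "highContrast", "invert"];

// Build the value for the CSS `filter` property from the checked boxes.
function buildFilterValue(enabled: ReadonlySet<PageFilter>): string {
  const parts = FILTER_ORDER
    .filter((name) => enabled.has(name))
    .map((name) => FILTER_CSS[name]);
  return parts.length > 0 ? parts.join(" ") : "none";
}
```

In a browser, the result could be assigned to each page image, for example `img.style.filter = buildFilterValue(new Set<PageFilter>(["invert"]))`.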

Then I worked on the generic demo feature: a demo for BookReader that lets you choose a book from the Internet Archive servers to be displayed by simply adding the book id to the URL as a parameter (#356). This allowed the right-to-left e2e test to be merged and proved useful for manually testing the text selection plugin. In this period I also fixed two page flipping issues. The more critical one was that flipping pages in quick succession made the pages turn back and forth randomly (#386). The other was less urgent, but a user had specifically pointed it out: in an old BookReader demo it was impossible to turn pages at all (#383). Another issue I solved was BookReader not correctly displaying high resolution images on high resolution displays (#378).
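To give an idea of what reading a book id from the URL looks like, here is a small sketch using the standard `URL` API. The parameter name `ocaid`, the `bookIdFromUrl` helper, and the validation pattern are assumptions made for this example, not details confirmed by the demo's source.

```typescript
// Extract a book id from a page URL such as ".../demo.html?ocaid=some_book".
// The parameter name "ocaid" is an assumption for this sketch.
function bookIdFromUrl(href: string, paramName: string = "ocaid"): string | null {
  let url: URL;
  try {
    url = new URL(href); // throws if href is not an absolute URL
  } catch {
    return null;
  }
  const raw = url.searchParams.get(paramName);
  if (raw === null) return null;
  const id = raw.trim();
  // Assumed shape of an id: letters, digits, dots, underscores, and dashes.
  // Rejecting anything else avoids passing odd strings on to image requests.
  return /^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(id) ? id : null;
}
```

In the browser the function would be called with `window.location.href`; a `null` result can fall back to a default book.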

Open source project experience

One aspect of my GSoC I really enjoyed was the all-around experience of working on an open source project. This includes leaving more approachable tasks for occasional members of the community to take on, and helping them out. I also found it interesting to work with other members of the team besides my mentor, both on more technical matters and for help with UI design and feedback about the user experience: I always liked having more points of view on my work. Moreover, direct feedback from users, who showed appreciation for the newly implemented features (such as BookReader “dark mode”), was very motivating and pushed me to do better in the following tasks.

Text selection layer

The normally invisible text layer shown red here for debugging

The biggest feature of my GSoC was implementing the ability to select text directly on the page image in BookReader for public domain books, in order to copy and paste it elsewhere (#367). This was made possible because Internet Archive books have information about each word and its placement on the page, which is collected through OCR. To implement this feature we decided to use an invisible text layer placed on top of the page image, with words correctly positioned and scaled. This made it possible to use the browser’s text selection system instead of creating a new one. The text layer on top of the page was implemented using an SVG element, with subelements for each paragraph and word on the page. Using SVG instead of normal HTML text elements made it a lot easier to overcome most of the problems we expected to find regarding the correct placement and scaling of words in the layer.
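The idea of an invisible SVG layer built from OCR word boxes can be sketched roughly as follows. The `OcrWord` and `OcrParagraph` shapes and the `buildTextLayerSvg` function are hypothetical and simplified; the real plugin works with the Internet Archive's OCR data and the DOM, which this sketch does not try to reproduce.

```typescript
// Hypothetical, simplified OCR shapes: one bounding box per word, in page pixels.
interface OcrWord { text: string; left: number; top: number; right: number; bottom: number; }
interface OcrParagraph { words: OcrWord[]; }

// Escape characters that are special in XML text and attribute values.
function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Build SVG markup with one <g> per paragraph and one <text> per word.
function buildTextLayerSvg(pageWidth: number, pageHeight: number,
                           paragraphs: OcrParagraph[]): string {
  const groups = paragraphs.map((p) => {
    const words = p.words.map((w) => {
      const width = w.right - w.left;
      const height = w.bottom - w.top;
      // textLength + lengthAdjust stretch the glyphs to the OCR box width;
      // using the box bottom as the baseline is an approximation.
      return `<text x="${w.left}" y="${w.bottom}" font-size="${height}" ` +
             `textLength="${width}" lengthAdjust="spacingAndGlyphs">` +
             `${escapeXml(w.text)}</text>`;
    }).join("");
    return `<g class="paragraph">${words}</g>`;
  }).join("");
  // The viewBox uses page coordinates, so the layer scales with the image.
  // fill-opacity="0" keeps the text invisible while leaving it in the layer.
  return `<svg xmlns="http://www.w3.org/2000/svg" ` +
         `viewBox="0 0 ${pageWidth} ${pageHeight}" fill-opacity="0">${groups}</svg>`;
}
```

Because the `viewBox` is expressed in the page's own pixel coordinates, the same markup stays aligned with the image at any zoom level when both are given the same displayed size.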

I started working sporadically on this feature at the start of July, which led to a workable demo by the first day of August. The rest of August was spent refining the feature to make it production-ready. This included refining word placement in the layer, adding unit tests, adding support for more browsers, refactoring some functions, making the experience more fluid, and making the copied text accurate with respect to newlines and spaces. The most challenging part was probably integrating the text selection actions into the two-page view of BookReader without disrupting click-to-flip-page and other functionality tied to mouse-click events.
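One possible way to keep click-to-flip and text selection from interfering with each other is to decide, on mouse release, whether the gesture was a plain click or a selection drag. The sketch below illustrates that idea only; the `shouldFlipPage` helper and the 4 pixel threshold are assumptions and do not describe how BookReader actually resolves this.

```typescript
interface Point { x: number; y: number; }

// Assumed tolerance: pointer movement above this counts as a drag, not a click.
const DRAG_THRESHOLD_PX = 4;

// Decide whether a mouse release should flip the page.
// `down` and `up` are the pointer positions at press and release;
// `selectedText` is whatever the browser reports as currently selected.
function shouldFlipPage(down: Point, up: Point, selectedText: string): boolean {
  const dx = up.x - down.x;
  const dy = up.y - down.y;
  const moved = Math.sqrt(dx * dx + dy * dy);
  if (moved > DRAG_THRESHOLD_PX) return false; // the user was dragging
  return selectedText.trim().length === 0;     // a selection takes priority
}
```

In a `mouseup` handler, the third argument could come from `window.getSelection()?.toString() ?? ""`.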

This feature is currently in internal testing, and scheduled for release in the next few weeks.

The text selection experience

Conclusions

Overall, I was extremely satisfied with my GSoC at the Internet Archive. It was a great opportunity for me to learn new things. I became much more fluent in JavaScript and CSS, thanks both to my mentor and to using these languages in practice while coding. I learnt a lot about working on an open source project, and one part I found particularly interesting was attending and participating in the decision-making processes, even for projects I was not involved in. It was also interesting to apply concepts I had studied at a more theoretical level at university in a real workplace environment.

To sum things up, the ability to work on something I liked that had an impact on users and the ability to learn useful things for my personal development really made this experience worthwhile for me. I would 100% recommend doing a GSoC at the Internet Archive!