
Read Books in Roam - A Detailed How To Guide for Importing and Using ePub in Roam

I have perfected the process of importing ePub books into Roam, and was able to create an almost frictionless experience for reading a book there. Setting aside the drawback of reading on a tablet screen instead of a Kindle, this solution is much more comfortable and efficient than the Kindle for reading and taking notes at the same time. I have also developed a solution that automatically maintains full traceability from my notes to the source text, even though the ePub book sits in a different Roam graph from my literature notes.

This post is a detailed how-to guide for setting yourself up to read books in Roam.

  • Here is the link to the Python program you need for converting your ePub books: Colab Notebook
  • Here is the link to the Book-base.json that will be required to achieve the full experience.

This is how the end result looks

Detailed how-to walkthrough

This second video explains the full process from the beginning to the end in 27 minutes.

My solution to traceability from literature notes to original text

In my proposed solution you load the full text of the book into a separate Roam graph (possibly a local graph), not your main Roam database. This safeguards the performance of your main graph. In my experience, Roam's performance degrades as the size of your database increases. As an extreme example, I loaded two Bible translations and a full set of cross references into Roam. This meant creating a database with 33,000 pages and over 96,000 blocks. Though Roam still functioned, at times there was a lag of up to 10 seconds between characters when typing.

Having the full text of the ePub in a separate database means that traditional block references to the source will not work, because block references do not function between graphs (at least not at the time of writing). My workaround is to include a reference to the page of the Roam book in each paragraph of text, which I then hide using a one-line CSS snippet that needs to be placed in [[roam/css]]. The CSS code is provided further below.

workaround to block references

By including these references in each paragraph, when you copy the paragraph to your own Roam database as part of your literature notes, the paragraph will "remember" which page of the book it came from. Once you have finished reading, I advise archiving your local Roam book in an .edn file and uploading it to your literature notes page in your main Roam graph. This way you can get back to the original text in a couple of minutes if needed.
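The workaround above can be sketched in Python as follows. This is only an illustration, not the code from the Colab notebook: it assumes Roam's JSON import format (a list of pages, each with a "title" and a list of "children" blocks holding a "string"), and the exact form of the hidden reference, here a `[[namespace/page]]` link appended to each paragraph, is my assumption. The function name `build_page` and the sample titles are hypothetical.

```python
import json


def build_page(namespace, page_title, paragraphs):
    """Build one Roam import page whose blocks each carry a link back to the page.

    The trailing [[...]] link is an assumed tag format; the real script may
    place or format the reference differently.
    """
    full_title = f"{namespace}/{page_title}"
    children = [{"string": f"{text} [[{full_title}]]"} for text in paragraphs]
    return {"title": full_title, "children": children}


# Hypothetical book: the namespace comes from the input filename.
pages = [build_page("My Book", "Chapter 1", ["First paragraph.", "Second paragraph."])]

# Serialise to the JSON text that would be imported into the empty graph.
roam_json = json.dumps(pages, indent=2)
print(roam_json)
```

Because the link travels inside the block text, a paragraph copied into another graph keeps pointing at the page name it came from, which is what makes the archived .edn file useful later.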

Upload .edn archive for later reference

Step by step instructions

  1. Save the desired ePub book to your Google Drive
  2. If the book includes images that you would like to show in your Roam book as well:
    1. Make a copy of the ePub file 
    2. Rename the copy to .zip and unpack
    3. Locate the folder that contains the images and note its location relative to the book's body. Note that every ePub is different. Sometimes the images and the document files are all simply in the root directory, sometimes they are somewhere else.

      Folder structure within ePub file

    4. Copy the images to a location that you can publish on http://localhost, or upload them to the internet. I use GitHub for this purpose. Place the files in a folder that mimics their location in the ePub file. My Python conversion script simply deletes the "../" part of the image source references and replaces it with the base URL provided in the img_base_url variable. A text file similar to the one I show here will be created after executing step 4.2 of the Python program. This raw text file will contain all of the ePub's contents and will be saved in your output folder. Search for "<img" in this file to see how images are referenced in the document you are trying to convert.
      Identifying where to publish the images
    5. Set the img_base_url variable accordingly. In this example the pictures should go in the images folder such as http://localhost/book/images/, and the base url that you would need to set in the program would be for example "http://localhost/book/".
      setting the img_base_url variable
  3. Open the Colab Notebook and configure the variables.
    1. In section 2, set the input and output paths and filenames. Note that the input filename you provide will also become the namespace within Roam, so it's best to give the file a nice-looking name.
    2. In section 4.1 set image base URL as explained above.
  4. Run each of the code blocks in the notebook in sequence up to and including step 5.3.
  5. Import the json files to an empty Roam database. Be sure to include all 3 files in the import. Two are created by the script (corpus.json and toc.json), and one you need to download from here: Book-base.json
    Import corpus.json, toc.json and Book-base.json
  6. After the import is finished, start the script on the [[roam/js]] page. This script lets you simply double-click a block of text to copy it to the clipboard. The script also prepends "> " to the paragraph being copied, so when you paste the paragraph it will be rendered as a block quote.
    start the javascript
  7. Copy the CSS snippet into [[roam/css]] in your own Roam graph. Paste it into a code block and make sure to set the type of the code block to "CSS". This CSS code hides the tag specifying the location of the quoted text. Alternatively, you can copy the CSS snippet from here:
    	display: none;
  8. Locate the Table of Contents and start reading.
    Locating the table of contents
  9. When you want to copy a paragraph into your literature notes, simply double-click the paragraph, then paste it into your own Roam graph using CTRL+V or by selecting "Paste".
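The image handling in step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from the Colab notebook: an ePub is a ZIP archive, so `zipfile` can list its image files without renaming and unpacking, and a regular expression stands in for however the real script finds the "<img" references. The names `list_epub_images`, `list_img_sources` and `rewrite_img_sources` are hypothetical, and only double-quoted `src` attributes are handled.

```python
import re
import zipfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg")

# Captures: (everything up to the opening quote of src)(the path)(closing quote).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def list_epub_images(epub_file):
    """Return the archive paths of image files inside an ePub (path or file object)."""
    with zipfile.ZipFile(epub_file) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(IMAGE_EXTENSIONS)]


def list_img_sources(html):
    """Return every image source path referenced in the raw text."""
    return [match.group(2) for match in IMG_SRC.finditer(html)]


def rewrite_img_sources(html, img_base_url):
    """Delete "../" from each image source and prefix it with img_base_url."""
    def fix(match):
        src = match.group(2).replace("../", "")
        return f"{match.group(1)}{img_base_url}{src}{match.group(3)}"
    return IMG_SRC.sub(fix, html)


# Sample markup; the path mirrors the localhost example in step 2.5.
sample = '<p>Text</p><img alt="Fig" src="../images/fig1.png"/>'
print(list_img_sources(sample))
print(rewrite_img_sources(sample, "http://localhost/book/"))
```

Comparing the output of `list_epub_images` with the output of `list_img_sources` shows which folder layout the published images need to mimic before you choose a value for img_base_url.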

More about my journey with Roam JSON here
