
Deep Dive Into Roam's Data Structure - Why Roam is Much More Than a Note Taking App

  • Which are the longest paragraphs in your graph?
  • Which pages did you edit or create last week?
  • How many paragraphs of text do you have in your database in total?
  • Which pages do you have under a given namespace (e.g. meetings/)?

Roam Research is a full-featured database and you can ask it many more questions, beyond what is available via the {{[[query]]:}} function. This post should give you a good foundational understanding of the underlying data structure in Roam. 

I have spent the last week deep-diving into Roam's data. I had lots of fun and learned a lot. This summary is as much for myself as for anybody else: an attempt to capture my understanding in writing. You may find it too technical; I am sorry for that. I will try my best to convey the information in a way that is easy to follow, building from the most basic concepts to the more complex.

In the course of my explorations I have also built a set of query SmartBlocks and created several example queries which you can find here. Even if you don't want to understand the details, you may find running some examples interesting.

Through my deep-dive, my appreciation for Roam Research has grown significantly. I am ever more confident that Roam will scale. In a not-so-distant future, Roam will hold, in full-text, everything that I read. My notes, book and article summaries, etc. will conveniently reference back to the original source accessible with a single click within one system. The future of Roam is bright!

My work in this post builds on many extremely valuable articles and references. In particular, I would like to highlight the following. Once you have read my overview, if you thirst for more, I highly recommend checking these out as well.

You will find two things in this article that I couldn't find in any of the ones above:

  1. A detailed discussion of the very basics, including a comprehensive introduction of the basic Roam data structures.
  2. A set of #42SmartBlocks to execute advanced queries right inside Roam. If you are not interested in the basics and want to look at the SmartBlock, jump ahead.

Let's dive in! I hope you will enjoy the ride as much as I have!

Basic Concepts

Roam is built on a Datomic database. In simple terms, a Datom is an individual fact. It is an attribute with a value. Datoms consist of four elements:

  • Entity ID
  • Attribute
  • Value
  • Transaction ID
You can think of Roam as a flat set of Datoms that looks like this:
[<e-id>	<attribute>	<value>			<tx-id>  ]
...
[4 	:block/children	5 			536870917] 
[4 	:block/children	9 			536870939] 
[4 	:block/uid 	"01-19-2021" 		536870916] 
[4 	:node/title 	"January 19th, 2021" 	536870916] 
[5 	:block/order 	0 			536870917] 
[5 	:block/page 	4 			536870918] 
[5 	:block/parents 	4 			536870918] 
[5 	:block/refs 	6 			536870920] 
[5 	:block/string 	"check [[Projects]]" 	536870919] 
[5 	:block/uid 	"r61dfi2ZH" 		536870917] 

Datoms that share a transaction-id were added within the same transaction. Among other things, this transactional approach makes it possible for Roam to synchronize content across your devices and to manage complex undo operations.

Datoms that have the same entity-id are facts about the same block.

If you want to query the entity-id of a block, based on its block reference, you could write:

[:find ?e-id
 :where
 [?e-id :block/uid "r61dfi2ZH"]]

Considering the data above, this query would return the value 5.
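
To make the "flat set of datoms" idea concrete, here is a minimal JavaScript sketch that models a few of the datoms above as plain arrays and performs the same lookup by hand. The `datoms` array and the `findEntityByUid` helper are illustrative names, not part of Roam's API, and the real query engine works quite differently under the hood.

```javascript
// A few datoms from the example above, as [entityId, attribute, value, txId].
const datoms = [
  [4, ":block/children", 5, 536870917],
  [4, ":block/uid", "01-19-2021", 536870916],
  [4, ":node/title", "January 19th, 2021", 536870916],
  [5, ":block/string", "check [[Projects]]", 536870919],
  [5, ":block/uid", "r61dfi2ZH", 536870917],
];

// Hand-written equivalent of: [:find ?e-id :where [?e-id :block/uid <uid>]]
function findEntityByUid(facts, uid) {
  return facts
    .filter(([, attr, value]) => attr === ":block/uid" && value === uid)
    .map(([entityId]) => entityId);
}

console.log(findEntityByUid(datoms, "r61dfi2ZH")); // [ 5 ]
```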

Attributes

Roam stores facts about paragraphs and pages using :block/ attributes. There are a few minor differences between pages and paragraphs that I will explain in a minute. The basic concept you must understand, however, is that a page is just a special type of block. For the most part, Roam treats a page exactly the same way as a paragraph. Both are blocks.

Blocks have 2 IDs

Hidden ID: 

  • The entity-id is the real block-id, even though it is not visible through the Roam user interface. This is the ID that is used to tie information together in the database. The Entity ID identifies facts about a block, capturing parent-child relationships, and references to blocks.

Public ID:

  • Is the block reference of a paragraph, e.g.: ((GGv3cyL6Y)), or
  • the [[Page Title]] for pages. Note that pages also have a nine-character long UID, very much like the block reference. You could use it, for example, to construct URLs that point to specific pages in your graph.

Common attributes for all blocks

Every block will have the following attributes:

:block/uid      The public ID, i.e. the nine-character long block reference.
:create/email   The email address of the person who created the block.
:create/time    Time of creation, in milliseconds since the epoch (January 1, 1970, midnight UTC/GMT).
:edit/email     The email address of the person who edited the block.
:edit/time      The time the block was last edited.

[10 :block/uid		"p6qzzKa-u"     536870940]
[10 :create/email	"foo@gmail.com" 536870940]
[10 :create/time	1611058803997   536870940]
[10 :edit/email		"foo@gmail.com" 536870940]
[10 :edit/time		1611058996600   536870949]
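
Because the time values are plain milliseconds since the epoch, they can be passed straight to the JavaScript `Date` constructor. A small sketch follows; the helper name `toIsoTimestamp` is made up for illustration.

```javascript
// :create/time and :edit/time hold milliseconds since the Unix epoch,
// which is the unit the JavaScript Date constructor expects.
function toIsoTimestamp(epochMillis) {
  return new Date(epochMillis).toISOString();
}

console.log(toIsoTimestamp(1611058803997)); // 2021-01-19T12:20:03.997Z
console.log(toIsoTimestamp(1611058996600)); // 2021-01-19T12:23:16.600Z
```

Both timestamps fall on January 19th, 2021 (UTC), which matches the Daily Notes page used in the earlier example.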

Trees in the forest

The Roam database is like a forest. Each page is a tree. The root of the tree is the page, the branches are the higher-level paragraphs; the leaves are the paragraphs at the deepest level of nesting on the page.

[[Page]]
* Branch
  * Branch
    * Leaf
    * Leaf
  * Leaf
  * Branch
    * Branch
      * Leaf
* Branch
  * Leaf
...

For every paragraph, Roam always creates two pointers. Children reference the entity-id of their parents using :block/parents and parents reference the entity-id of their children using :block/children.

[4	:block/children	5	536870917]
[5	:block/parents	4	536870918]

Parents keep a list of their children in the :block/children attribute. This list will ONLY include the entity-id of their immediate descendants, not the grandchildren. A page will only list the top-level paragraphs on the page as its children, but not the nested paragraphs. Similarly, a paragraph will only list blocks nested right under it, not the blocks nested under the nested blocks. The lowest level blocks in the nesting (the leaves) will have no :block/children attribute.

Children also keep a list of their parents in the :block/parents attribute. Contrary to :block/children, the list of parents includes the entity-id of ALL ancestors i.e. grandparents, great grandparents, etc. A nested paragraph will have references to the parent paragraph(s) and the page as well. The top-level paragraphs on a page have the entity-id of the page in the :block/parents attribute, while paragraphs nested under another paragraph will have the entity-id of the higher level paragraph and the entity-id of the page.
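
The asymmetry between the two attributes (immediate children only, but all ancestors) can be sketched in a few lines of JavaScript. This is only an illustration of the data shape, not how Roam maintains the attributes internally; entity 12 is a hypothetical nested block added for the example.

```javascript
// Immediate children per entity-id, mirroring :block/children.
// Entity 4 is the page; 5 and 9 are top-level paragraphs;
// 12 is a hypothetical paragraph nested under 5.
const children = new Map([
  [4, [5, 9]],
  [5, [12]],
]);

// Derive the full ancestor list of every block, mirroring :block/parents.
function buildParents(childrenOf, rootId) {
  const parents = new Map();
  const visit = (id, ancestors) => {
    for (const child of childrenOf.get(id) ?? []) {
      const chain = [...ancestors, id];
      parents.set(child, chain);
      visit(child, chain);
    }
  };
  visit(rootId, []);
  return parents;
}

const parents = buildParents(children, 4);
console.log(parents.get(5));  // [ 4 ]
console.log(parents.get(12)); // [ 4, 5 ]
```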

Page-only attributes

:node/title     All pages will have a title, and no paragraphs will have one.

If you want to find all pages in your database, you need to query :node/title, because only pages have this attribute. The following query returns a table with two columns: the entity-id of each page under ?p and the title of each page under ?title.

[:find ?p ?title
 :where [?p :node/title ?title]]

If you also want to see the nine-character UID for each page, for example, to construct a link to the page, you need to find the :block/uid attribute associated with the ?p entity-id. Here's what the query would look like. Note how ?p appears in both patterns in the where clause. This tells the query engine to find the title and the uid for the same entity.

[:find ?p ?title ?uid
 :where [?p :node/title ?title]
        [?p :block/uid ?uid]]
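
As a rough sketch of the link-construction idea, the snippet below turns rows shaped like the result of this query into URLs. The URL pattern is an assumption, modelled on the roamresearch.com link quoted in the comments at the end of this post; `pageUrl` and the graph name `my-graph` are placeholders.

```javascript
// Build a link to a page from its UID.
// The URL pattern is an assumption based on links observed in Roam.
function pageUrl(graphName, uid) {
  const graph = encodeURIComponent(graphName);
  return `https://roamresearch.com/#/app/${graph}/page/${encodeURIComponent(uid)}`;
}

// Rows shaped like the query result: [entity-id, title, uid].
const pageRows = [[4, "January 19th, 2021", "01-19-2021"]];
const links = pageRows.map(([, title, uid]) => ({
  title,
  url: pageUrl("my-graph", uid),
}));

console.log(links[0].url);
// https://roamresearch.com/#/app/my-graph/page/01-19-2021
```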

Paragraph-only attributes 

Every paragraph will have the following attributes:

:block/page     Every paragraph on a page, regardless of its level of nesting, also references the entity-id of its page.
:block/order    The sequence of the block within the page or within its level of nesting under a paragraph. You will need to sort on this value to retrieve the paragraphs in the proper sequence, as they appear in the document.
:block/string   The contents of the block.
:block/parents  The ancestors of the paragraph. For top-level paragraphs this is only the page. For nested paragraphs, this attribute lists all their ancestors leading up to (and including) the page.
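
Since query results come back in no particular order, sorting on :block/order has to happen after the query. A minimal sketch, assuming rows of the shape [uid, order, string] for the children of a single parent; the row values and the helper name `inDocumentOrder` are made up for illustration.

```javascript
// Rows for the children of one parent: [uid, :block/order, :block/string].
const childRows = [
  ["uid-c", 2, "third"],
  ["uid-a", 0, "first"],
  ["uid-b", 1, "second"],
];

// Sort a copy by the :block/order column (index 1), ascending.
function inDocumentOrder(rows) {
  return [...rows].sort((a, b) => a[1] - b[1]);
}

console.log(inDocumentOrder(childRows).map(([, , text]) => text));
// [ 'first', 'second', 'third' ]
```

Keep in mind that :block/order is only meaningful among siblings, so the sort should be applied per parent rather than across a whole page.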

Optional attributes:

Roam only sets these attributes (i.e. they only exist in the database for a given paragraph) when you change the value from its default for that specific block, e.g. when you change the text alignment of the block from left-aligned to centered.

:children/view-type  Specifies how to display the block's children. Recognized values are 'bullet', 'document', 'numbered'.
:block/heading       Set when you change the heading level of the block to H1, H2, or H3. Allowed values are 1, 2, 3.
:block/props         This is where Roam stores the sizing of an image or iframe, the position of the slider, the setting of the Pomodoro timer, etc.
:block/text-align    Alignment of the paragraph. Values are 'left', 'center', 'right', 'justify'.

The Roam data-structure

If you are wondering how to find out what attributes exist in your database, I have good news! Using a simple query, you can list all the attributes in your database.

[:find ?Namespace ?Attribute
 :where [_ ?Attribute]
        [(namespace ?Attribute) ?Namespace]]

Below is the list. Truth be told, the query above will not sort the values, and will not create the last column. I have included the slightly more advanced version of the query, which will do the sorting, in the downloadable roam.json file. I found the namespace function in the clojure.core documentation.

Namespace   Attribute        :Namespace/Attribute
attrs       lookup           :attrs/lookup
block       children         :block/children
block       heading          :block/heading
block       open             :block/open
block       order            :block/order
block       page             :block/page
block       parents          :block/parents
block       props            :block/props
block       refs             :block/refs
block       string           :block/string
block       text-align       :block/text-align
block       uid              :block/uid
children    view-type        :children/view-type
create      email            :create/email
create      time             :create/time
edit        email            :edit/email
edit        seen-by          :edit/seen-by
edit        time             :edit/time
entity      attrs            :entity/attrs
log         id               :log/id
node        title            :node/title
page        sidebar          :page/sidebar
user        color            :user/color
user        display-name     :user/display-name
user        email            :user/email
user        photo-url        :user/photo-url
user        settings         :user/settings
user        uid              :user/uid
vc          blocks           :vc/blocks
version     id               :version/id
version     nonce            :version/nonce
version     upgraded-nonce   :version/upgraded-nonce
window      filters          :window/filters
window      id               :window/id

Queries

If you are interested in writing queries for Roam, you should really work through the nine chapters of Learn Datalog Today. It is fun and action packed with exercises. 

I will now quote a few paragraphs from the tutorial almost verbatim, changing the examples to fit Roam.  For the rest, please visit the tutorial.

I also recommend the following YouTube video by Stuart Halloway, which summarizes the key features of the Datalog query language in eleven minutes.

Core concepts

A query is a vector starting with the keyword :find followed by one or more pattern variables (symbols starting with ?, e.g. ?title). After the find clause comes the :where clause which restricts the query to datoms that match the given data patterns. Use the _ symbol as a wildcard for the parts of the data pattern you wish to ignore.

For example if you want to find the text based on the block reference of a paragraph you would write:

[:find ?string
 :where [?b :block/uid "r61dfi2ZH"]
        [?b :block/string ?string]]

Considering the example at the beginning of this post, this query would return "check [[Projects]]".

The important thing to note here is that the pattern variable ?b is used in both data patterns. When a pattern variable is used in multiple places, the query engine requires it to be bound to the same value in each place. Therefore, this query will only find the string for the block that has the uid r61dfi2ZH.

Datoms about an entity may be in attributes that are in different namespaces. For example, if I wanted to find the title of the page that holds the ((r61dfi2ZH)) paragraph, I would write the following query. Note that I first read the :block/page attribute to get the entity-id of the page, which I store in ?p. I then use this to locate the :node/title and :block/uid of the page.

[:find ?title ?uid
 :where [?b :block/uid "r61dfi2ZH"]
        [?b :block/page ?p]
        [?p :node/title ?title]
        [?p :block/uid ?uid]]

Considering the example above, this would return "January 19th, 2021" and "01-19-2021".

The :in clause provides the query with input parameters, much in the same way that function or method arguments do in your programming language. Here's what the previous query would look like with an input parameter, ?block_ref, for the block reference.

[:find ?title ?uid
 :in $ ?block_ref
 :where [?b :block/uid ?block_ref]
        [?b :block/page ?p]
        [?p :node/title ?title]
        [?p :block/uid  ?uid]]

This query takes two arguments: $ is the database itself (implicit, if no :in clause is specified) and ?block_ref, which presumably will be the block reference of the paragraph.

You can execute the above using window.roamAlphaAPI.q(query,block_ref);. If you do not provide a value for $, the query engine will implicitly assume the default database. Since you will only be querying your own Roam database, there is no need to state the database. Maybe once Roam offers cross-database links, this could become interesting.
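
Here is a small sketch of how the extra arguments line up with the :in clause. The wrapper `findPageOfBlock` is a hypothetical helper; the query function is passed in as `q` so the snippet can also be tried outside Roam with a stub that returns canned rows. It assumes, as the other examples in this post do, that `window.roamAlphaAPI.q` returns an array of row arrays.

```javascript
const pageOfBlockQuery = `[:find ?title ?uid
 :in $ ?block_ref
 :where [?b :block/uid ?block_ref]
        [?b :block/page ?p]
        [?p :node/title ?title]
        [?p :block/uid  ?uid]]`;

// The first argument after the query binds to ?block_ref; $ is implicit.
function findPageOfBlock(q, blockRef) {
  const rows = q(pageOfBlockQuery, blockRef);
  return rows.length > 0 ? { title: rows[0][0], uid: rows[0][1] } : null;
}

// Inside Roam, use the real API; elsewhere, fall back to a stub with canned data.
const q =
  typeof window !== "undefined" && window.roamAlphaAPI
    ? (...args) => window.roamAlphaAPI.q(...args)
    : (_query, ref) =>
        ref === "r61dfi2ZH" ? [["January 19th, 2021", "01-19-2021"]] : [];

console.log(findPageOfBlock(q, "r61dfi2ZH"));
```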

I will now skip in the tutorial to cover a few topics that are slightly different in Roam. If you are interested in what you are missing out on, head over to the tutorial for the details I am skipping. There is a very helpful discussion about Tuples, Collections, and Relations which provide means to execute logical OR and AND operations. 

Predicates

The predicate clause filters the result set to only include results for which the predicate returns true. In Datalog you can use any Clojure function or Java method as a predicate function. In my experience, in the Roam JavaScript implementation, Java methods are not available, and only a handful of Clojure functions work.

Clojure functions must be fully namespace-qualified, except for those in the clojure.core namespace. Sadly, outside the core namespace I have only found a few that worked in Roam. These include clojure.string/includes?, clojure.string/starts-with?, and clojure.string/ends-with?. Some of the helpful functions from the core namespace include namespace, which returns the namespace of an attribute, and count, which returns the length of a string. Some ubiquitous predicates that can also be used without namespace qualification are <, >, <=, >=, =, not=, != and so on.

Here are two examples using predicates. The first one returns the number of characters in a paragraph, based on its block reference.

[:find ?string ?size
 :in $ ?block_ref
 :where [?b :block/uid ?block_ref]
        [?b :block/string ?string]
        [(count ?string) ?size]]

The second lists blocks that were modified after a given date.

[:find ?block_ref ?string
 :in $ ?start_of_day
 :where [?b :edit/time ?time]
        [(> ?time ?start_of_day)]
        [?b :block/uid ?block_ref]
        [?b :block/string ?string]]
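
The ?start_of_day input has to be in the same unit as :edit/time, i.e. milliseconds since the epoch. One way to compute it in JavaScript is sketched below; `startOfDayMillis` is an illustrative helper name, and the choice of local midnight (rather than UTC) is an assumption about what "start of day" should mean.

```javascript
// Local midnight of the given day, in milliseconds since the epoch,
// usable as the ?start_of_day input of the query above.
function startOfDayMillis(date) {
  return new Date(date.getFullYear(), date.getMonth(), date.getDate()).getTime();
}

// Example: a cutoff for "edited during the last seven days".
const sevenDaysAgo = new Date();
sevenDaysAgo.setDate(sevenDaysAgo.getDate() - 7);
console.log(startOfDayMillis(sevenDaysAgo));

// Inside Roam this could be passed as the query input, e.g.:
// window.roamAlphaAPI.q(query, startOfDayMillis(sevenDaysAgo));
```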

Transformation functions

Sadly, I could not make transformation functions work in JavaScript. These are only an option if you install a Datalog database on your desktop and load your Roam EDN export for further manipulation.

The only workaround available is to post-process the results after the query. The following example filters page titles to find a text fragment ("temp") case-insensitively, and then sorts the results alphabetically. This query will return pages including words like "Template", "template", "Temporary", "attempt", etc.

let query = `[:find ?title ?uid
              :where [?page :node/title ?title]
                     [?page :block/uid ?uid]]`;

// Keep titles containing "temp" (case-insensitive), then sort alphabetically.
let results = window.roamAlphaAPI.q(query)
                     .filter((item) => item[0].toLowerCase().includes('temp'))
                     .sort((a, b) => a[0].localeCompare(b[0]));

Aggregates

Aggregates work, as expected. There are many aggregates available including sum, max, min, avg, count. You can read more about aggregates here.

If, for example, you do not know the purpose of an attribute, or what values are allowed, simply query your database to find existing values. The next example lists the values for :children/view-type. Note that if you only use bullets in your graph, the query will return only one value: 'bullet'. I use the distinct aggregate function; without it I would get a list of potentially thousands of values, one row for each block where view-type is specified.

[:find (distinct ?type)
 :where
 [_ :children/view-type ?type]]

Rules

You can abstract away reusable parts of your queries into rules, give them meaningful names and forget about the implementation details, just like you can with functions in your favorite programming language.

A typical example of rules in Roam are the ancestor rules. These exploit the :block/children attribute to traverse the tree of nested blocks. A simple ancestor rule would look like this. It finds the ?child based on a ?parent entity-id.

[[(ancestor ?child ?parent) 
 [?parent :block/children ?child]]]

The first vector is called the head of the rule where the first symbol is the name of the rule. The rest of the rule is called the body.

It is possible to use either (...) or [...] to enclose the head, but it is conventional to use (...) to aid the eye when distinguishing between the rule's head and its body, and also between rule invocations and normal data patterns, as we'll see below.

You can think of a rule as a kind of function, but remember that this is logic programming, so we can use the same rule to find parents based on the entity-id of the child, and the child based on the entity-id of the parent.

Put another way, you can use both ?parent and ?child in (ancestor ?child ?parent) for input and for output. If you provide values for neither, you will get all the possible combinations in the database. If you provide values for one or both, it will constrain the result returned by the query as you would expect.

[:find ?uid ?string
 :in $ ?parent
 :where [?parent :block/children ?c]
        [?c :block/uid ?uid]
        [?c :block/string ?string]]
 

Now becomes:

[:find ?uid ?string
 :in $ ?parent %
 :where (ancestor ?c ?parent)
        [?c :block/uid ?uid]
        [?c :block/string ?string]]
 

The % symbol in the :in clause represents the rules.

This may not seem like an enormous achievement at first. Rules, however, can be nested. By extending the above rule you can make it return not only the children, but the entire sub-tree under ?parent. Rules can contain other rules, and can call themselves recursively.

[[(ancestor ?child ?parent)
  [?parent :block/children ?child]]
 [(ancestor ?child ?grand_parent)
  [?parent :block/children ?child]
  (ancestor ?parent ?grand_parent)]]
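
If the recursion feels abstract, the following JavaScript sketch computes the same set of descendants by walking a children map. It is an analogy for what the two rule clauses express, not how the query engine evaluates rules; the entity-ids in `childrenOf` are made up.

```javascript
// Immediate children per entity-id (made-up ids), as in :block/children.
const childrenOf = new Map([
  [1, [2, 3]],
  [2, [4, 5]],
  [5, [6]],
]);

// A block is a descendant if it is a direct child (first clause)
// or if its parent is itself a descendant (recursive clause).
function descendants(tree, ancestorId) {
  const result = [];
  for (const child of tree.get(ancestorId) ?? []) {
    result.push(child, ...descendants(tree, child));
  }
  return result;
}

console.log(descendants(childrenOf, 1)); // [ 2, 4, 5, 6, 3 ]
console.log(descendants(childrenOf, 2).length); // 3
```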

We can now use this rule, for example, to count the number of descendants of a given block.

// `rule` holds the ancestor rules shown above, as a string.
window.roamAlphaAPI.q(`
     [:find ?ancestor (count ?block)
      :in $ ?ancestor_uid %
      :where  [?ancestor :block/uid ?ancestor_uid]
              [?ancestor :block/string]
              [?block :block/string]
              (ancestor ?block ?ancestor)]`
      , "hAfIHN6Gi", rule);

Of course, in this example, we would be better off using the :block/parents attribute, which would allow for a much simpler query.

[:find ?ancestor (count ?block)
 :where  [?ancestor :block/uid "hAfIHN6Gi"]
         [?ancestor :block/string]
         [?block :block/parents ?ancestor]]

Pull

This post is already too long and too technical. For this reason, I am completely omitting a discussion about (pull ) requests - though in the examples in the roam.json I will include a few. (pull ?e [*]) is a powerful approach to get data from your database. Here are two references that are worth reading if you want to learn more.

Roam query SmartBlock

It is possible to run queries within SmartBlocks and in the Console of the Developer Tools in your browser. Results, however, are hard to view, because they are returned in obscure data structures such as nested JSONs.

Update 28 January 2021
I learned in the meantime, that you can also run simple 
queries natively in Roam using the :q command in a block. 
Try the following:

:q [:find(count ?t):where[_ :node/title ?t]]

It won't display pulls as nice as my SB or have 
page links, but still awesome ...

Further update on 22 February 2021: I have created a long list of sample statistical queries using :q. You can find those here.

I wanted to make the query experience more convenient and integrated into Roam. As a result, I created a set of SmartBlocks that help embed queries into your Roam pages, just like any other component you include in your documents.

Here’s the link to the DatomicQuery.JSON file that you can import into your Roam graph. This includes two pages, the SmartBlocks and a host of query examples. Read on to understand how to use them.

You may choose between simple and advanced queries. Simple queries do not take input parameters and cannot include rules. You can of course include input parameters directly in your query, as you can see in the example a bit further below. Advanced queries give you full flexibility.

Page links, date links

My SmartBlock will take the query results and format them as a table for easy consumption. It returns results in a single block using ::hiccup. This way I can avoid littering your graph with an unnecessary number of blocks. As additional convenience, I have built some simple logic to convert page titles into clickable page links and time into links to the corresponding Daily Notes pages.

For the page links feature, you need to author your query in a special way.

  • Designate the title field by adding :name to the end of the field name. e.g. ?title:name

  • Designate the corresponding uid by placing the uid immediately after the ?title:name field, and by adding :uid to the end of the field name. e.g.: ?title:uid

  • Designate a field you want to convert to a Daily Notes page link by adding :date to the end of the field. e.g.: ?time:date

[:find ?title:name ?title:uid ?time:date
 :where [?page :node/title ?title:name]
        [?page :block/uid ?title:uid]
        [?page :edit/time ?time:date]
        [(clojure.string/starts-with? ?title:name "roam/")]]
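
For readers curious what the :date conversion involves, here is one possible sketch of turning an :edit/time value into a Daily Notes title and uid, using the formats seen earlier in this post ("January 19th, 2021" and "01-19-2021"). This is not the SmartBlock's actual code; the function names are hypothetical, and UTC is used only to keep the example deterministic.

```javascript
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// English ordinal: 1st, 2nd, 3rd, 4th, 11th, 12th, 13th, 21st, ...
function ordinal(n) {
  const lastTwo = n % 100;
  if (lastTwo >= 11 && lastTwo <= 13) return `${n}th`;
  return `${n}${["th", "st", "nd", "rd"][n % 10] ?? "th"}`;
}

// Convert epoch milliseconds to a Daily Notes title and uid.
// UTC is an assumption; a real implementation would likely use local time.
function dailyNote(epochMillis) {
  const d = new Date(epochMillis);
  const day = d.getUTCDate();
  const month = d.getUTCMonth();
  const year = d.getUTCFullYear();
  const pad = (n) => String(n).padStart(2, "0");
  return {
    title: `${MONTHS[month]} ${ordinal(day)}, ${year}`,
    uid: `${pad(month + 1)}-${pad(day)}-${year}`,
  };
}

console.log(dailyNote(1611058803997));
// { title: 'January 19th, 2021', uid: '01-19-2021' }
```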

Pull statements

The SmartBlock will also neatly display nested results as a table, in the table, in the table. When executing a query that includes a (pull ) statement, the result will be a tree, not a table. I render query results according to the following logic:

  • I will display the top-level of the result-set as rows of a table, with values as the columns.

  • Nested levels in the result-set alternate between being rendered in columns or rows.

  • To avoid too large result-sets, MAXROWS is set to 40 by default. In the advanced query, you can change this number.

  • At nested levels, I use MAXROWS/4 to limit the number of rows to display. Even with this setting, the resulting table could reach hundreds of rows. (40x10x10x…)
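
The row-limiting logic described above could be sketched roughly as follows. This is a guess at the general idea rather than the SmartBlock's implementation: `limitRows` is a hypothetical helper that keeps at most MAXROWS items at the top level and MAXROWS/4 items in every nested array.

```javascript
// Truncate a nested result: maxRows at the top level, maxRows / 4 below it.
function limitRows(value, maxRows = 40, nested = false) {
  if (Array.isArray(value)) {
    const limit = nested ? Math.floor(maxRows / 4) : maxRows;
    return value
      .slice(0, limit)
      .map((item) => limitRows(item, maxRows, true));
  }
  if (value !== null && typeof value === "object") {
    // Pull results are maps; limit any arrays found in their values.
    return Object.fromEntries(
      Object.entries(value).map(([key, v]) => [key, limitRows(v, maxRows, true)])
    );
  }
  return value;
}

// 100 rows, each holding one pulled entity with 50 children.
const bigResult = Array.from({ length: 100 }, (_, i) => [
  { uid: `block-${i}`, children: Array.from({ length: 50 }, (_, j) => j) },
]);

const limited = limitRows(bigResult);
console.log(limited.length, limited[0][0].children.length); // 40 10
```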

This is how the results of a (pull ) look. Pulling 1 level deep:

Pull example - 1 level deep

Pulling 2 levels deep:

Pull 2 levels deep

Query templates

To generate the template for your query, run the appropriate Roam42 SmartBlock:

  • Datomic simple-template

  • Datomic advanced-template

Once ready with your query, simply execute it by pressing the button nested under the query.

Closing thoughts

After one week, I am not even close to being an expert on the topic. If I wrote something silly, or if there is an error in my queries or the SmartBlock, please let me know. You can reach me in the comments below, or on Twitter @zsviczian.

Also, I would be very interested to understand how you used what you learned from this post, and the SmartBlock. Please share your thoughts and results. Thank you!


Comments

  1. Amazing work, Zsolt. Thank you.

  2. Hi Zsolt, this is a excellent article! I've translated this article into Chinese version, Could you please let me post it to the #RoamCN Chinese community?

    https://blog.jimmylv.info/2021-03-08-Roam-Data-Structure-Query-zh-translation/

    I will give credit to the author and the original address: Deep Dive Into Roam's Data Structure - Why Roam is Much More Than a Note Taking App -- Zsolt Viczián

    With the popularity of Roam Research, Bi-directional links and Block-based note taking software are emerging, and they (Hulu Notes, logseq, Athens) all use the Datomic Datalog database of Clojure technology stack, which makes me curious to explore more.

    This article will be a hardcore analysis of the implementation principles behind Roam, to discover the deep technical advantages of Roam based on Block, to help you meet the arrival of the Roam API age!

    Replies
    1. That's great!. Thanks for asking! Feel free to publish.

  3. Hi Zsolt - you do incredible work. Thank you for taking the time to publish it here. This seems to have stopped working. Has Roam changed to prevent it from executing properly? Or perhaps the update to Smartblocks?

    Replies
    1. Indeed, it does seem something has changed.
      I used this about 1 month ago, it worked at that time.
      If you just want to run simple datomic queries you can use the :q syntax. You'll find a bunch of examples here: https://roamresearch.com/#/app/Zsolt-Blog/page/WUn5PuTDV

  4. Thanks.. those are helpful.. but really need help getting to access the results to reformat them. The main one I was looking for was the namespace Smartblock. Happy to buy you a "week's worth of coffee" ;)




contact: info@zsolt.blog