Deep Dive Into Roam's Data Structure - Why Roam is Much More Than a Note Taking App

Which are the longest paragraphs in your graph?
Which pages did you edit or create last week?
How many paragraphs of text do you have in your database in total?
Which pages do you have under a given namespace (e.g. meetings/)?

Roam Research is a full-featured database and you can ask it many more questions, beyond what is available via the {{[[query]]:}} function. This post should give you a good foundational understanding of the underlying data structure in Roam.

I have spent the last week deep-diving into Roam's data. I had lots of fun and I have learned a lot. This summary is for myself, as much as for anybody else, attempting to capture my understanding in writing. You may find this too technical. I am sorry for that. I will try my best to convey the information in a way that is easy to follow, building from the most basic concepts to the complex.

In the course of my explorations I have also built a set of query SmartBlocks and created several example queries which you can find here. Even if you don't want to understand the details, you may find running some examples interesting.

Through my deep-dive, my appreciation for Roam Research has grown significantly. I am ever more confident that Roam will scale. In a not-so-distant future, Roam will hold, in full-text, everything that I read. My notes, book and article summaries, etc. will conveniently reference back to the original source accessible with a single click within one system. The future of Roam is bright!

My work in this post builds on many extremely valuable articles and references. In particular, I would like to highlight the following. Once you have read my overview, if you thirst for more, I highly recommend checking these out as well.

Learn Datalog Today! by Jonas Enlund
Introduction to the Roam Alpha API - Put Your Left Foot
Datalog Queries for Roam Research by David Bieber
Datomic Queries and Rules | Datomic
The Datomic Information Model (infoq.com) by Rich Hickey, the author of Clojure and designer of Datomic
Roam42 Source Code, by @RoamHacker
clojure.core namespace | ClojureDocs
clojure.string namespace | ClojureDocs

You will find two things in this article that I couldn't find in any of the ones above:

A detailed discussion of the very basics, including a comprehensive introduction of the basic Roam data structures.
A set of #42SmartBlocks to execute advanced queries right inside Roam. If you are not interested in the basics and want to look at the SmartBlock, jump ahead.

Let's dive in! I hope you will enjoy the ride as much as I have!

Basic Concepts

Roam is built on a Datomic-style database. In simple terms, a Datom is an individual fact: an attribute with a value. Datoms consist of four elements:

Entity ID
Attribute
Value
Transaction ID

You can think of Roam as a flat set of Datoms that looks like this:

[<e-id>	<attribute>	<value>			<tx-id>  ]
...
[4 	:block/children	5 			536870917] 
[4 	:block/children	9 			536870939] 
[4 	:block/uid 	"01-19-2021" 		536870916] 
[4 	:node/title 	"January 19th, 2021" 	536870916] 
[5 	:block/order 	0 			536870917] 
[5 	:block/page 	4 			536870918] 
[5 	:block/parents 	4 			536870918] 
[5 	:block/refs 	6 			536870920] 
[5 	:block/string 	"check [[Projects]]" 	536870919] 
[5 	:block/uid 	"r61dfi2ZH" 		536870917]

Datoms that share the transaction-id were added within the same transaction. Among other things, this transactional approach makes it possible for Roam to synchronize content to your different devices and to manage complex undo operations.

Datoms that have the same entity-id are facts about the same block.

If you want to query the entity-id of a block, based on its block reference, you could write:

[:find ?e-id
 :where
 [?e-id :block/uid "r61dfi2ZH"]]

Considering the data above, this query would return the value 5.
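
To make the pattern match more tangible, here is a minimal JavaScript sketch that treats a few of the datoms above as plain arrays and looks up an entity-id by its uid. The helper name findEntityIdsByUid is made up for illustration; Roam's query engine does this matching for you.

```javascript
// Each datom is [entityId, attribute, value, transactionId], as in the listing above.
const datoms = [
  [4, ":block/children", 5, 536870917],
  [4, ":block/uid", "01-19-2021", 536870916],
  [4, ":node/title", "January 19th, 2021", 536870916],
  [5, ":block/page", 4, 536870918],
  [5, ":block/string", "check [[Projects]]", 536870919],
  [5, ":block/uid", "r61dfi2ZH", 536870917],
];

// Hypothetical helper: mimics the data pattern [?e-id :block/uid <uid>].
function findEntityIdsByUid(allDatoms, uid) {
  return allDatoms
    .filter(([, attribute, value]) => attribute === ":block/uid" && value === uid)
    .map(([entityId]) => entityId);
}

const entityIds = findEntityIdsByUid(datoms, "r61dfi2ZH");
console.log(entityIds); // [ 5 ]
```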

Attributes

Roam stores facts about paragraphs and pages using :block/ attributes. There are a few minor differences between pages and paragraphs that I will explain in a minute. However, the basic concept you must understand is that a page is just a special type of block. Mostly, Roam treats a page exactly the same way as a paragraph. Both are blocks.

Blocks have 2 IDs

Hidden ID:

The entity-id is the real block-id, even though it is not visible through the Roam user interface. This is the ID that is used to tie information together in the database. The Entity ID identifies facts about a block, capturing parent-child relationships, and references to blocks.

Public ID:

The block reference of a paragraph, e.g. ((GGv3cyL6Y)), or the [[Page Title]] for pages. Note that pages also have a nine-character UID, very much like the block reference. You could use these, for example, to construct URLs that point to specific pages in your graph.

Common attributes for all blocks

Every block will have the following attributes:

`:block/uid`	The public ID, i.e. nine-character long block reference.
`:create/email`	The email address of the person who has created the block.
`:create/time`	Time in milliseconds since epoch (January 1, 1970 midnight UTC/GMT).
`:edit/email`	The email address of the person who has edited the block.
`:edit/time`	The time the block was last edited.

[10 :block/uid		"p6qzzKa-u"     536870940]
[10 :create/email	"foo@gmail.com" 536870940]
[10 :create/time	1611058803997   536870940]
[10 :edit/email		"foo@gmail.com" 536870940]
[10 :edit/time		1611058996600   536870949]
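
Since :create/time and :edit/time are milliseconds since the epoch, they map directly onto JavaScript's Date. A small sketch using the two values from the example above (the function name toIsoString is just for illustration):

```javascript
// Values taken from the example datoms for entity 10.
const createTime = 1611058803997;
const editTime = 1611058996600;

// Convert epoch milliseconds into an ISO 8601 timestamp (UTC).
function toIsoString(milliseconds) {
  return new Date(milliseconds).toISOString();
}

console.log(toIsoString(createTime)); // 2021-01-19T12:20:03.997Z
console.log(toIsoString(editTime));   // 2021-01-19T12:23:16.600Z

// Milliseconds between creation and last edit.
const elapsedMs = editTime - createTime;
console.log(elapsedMs); // 192603
```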

Trees in the forest

The Roam database is like a forest. Each page is a tree. The root of the tree is the page, the branches are the higher-level paragraphs; the leaves are the paragraphs at the deepest level of nesting on the page.

[[Page]]
* Branch
  * Branch
    * Leaf
    * Leaf
  * Leaf
  * Branch
    * Branch
      * Leaf
* Branch
  * Leaf
...

For every paragraph, Roam always creates two pointers. Children reference the entity-id of their parents using :block/parents and parents reference the entity-id of their children using :block/children.

[4	:block/children	5	536870917]
[5	:block/parents	4	536870918]

Parents keep a list of their children in the :block/children attribute. This list will ONLY include the entity-id of their immediate descendants, not the grandchildren. A page will only list the top-level paragraphs on the page as its children, but not the nested paragraphs. Similarly, a paragraph will only list blocks nested right under it, not the blocks nested under the nested blocks. The lowest level blocks in the nesting (the leaves) will have no :block/children attribute.

Children also keep a list of their parents in the :block/parents attribute. Contrary to :block/children, the list of parents includes the entity-id of ALL ancestors i.e. grandparents, great grandparents, etc. A nested paragraph will have references to the parent paragraph(s) and the page as well. The top-level paragraphs on a page have the entity-id of the page in the :block/parents attribute, while paragraphs nested under another paragraph will have the entity-id of the higher level paragraph and the entity-id of the page.
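
The difference between the two pointers is easier to see in code. The following sketch walks a small, made-up tree (the entity-ids are hypothetical) and emits the :block/children and :block/parents facts as described above: immediate children only, but all ancestors. It illustrates the data shape and is not how Roam builds these datoms internally.

```javascript
// Hypothetical page (id 4) with two top-level paragraphs; paragraph 5 has one nested child.
const pageTree = {
  id: 4,
  children: [
    { id: 5, children: [{ id: 7, children: [] }] },
    { id: 9, children: [] },
  ],
};

// Emit [entity, attribute, value] facts for the whole tree.
function relationDatoms(node, ancestors = [], out = []) {
  for (const child of node.children) {
    // A parent lists only its immediate children.
    out.push([node.id, ":block/children", child.id]);
    // A child lists ALL of its ancestors, up to and including the page.
    const childAncestors = [...ancestors, node.id];
    for (const ancestorId of childAncestors) {
      out.push([child.id, ":block/parents", ancestorId]);
    }
    relationDatoms(child, childAncestors, out);
  }
  return out;
}

const facts = relationDatoms(pageTree);
const parentsOf7 = facts
  .filter(([e, a]) => e === 7 && a === ":block/parents")
  .map(([, , v]) => v);
console.log(parentsOf7); // [ 4, 5 ]
```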

Page-only attributes

:node/title All pages will have a title, and no paragraphs will have one.

If you want to find all pages in your database, you need to query :node/title because this attribute holds a value only for pages. By executing the following query you will get a table with two columns: the entity-id of each page under ?p and the title of each page under ?title.

[:find ?p ?title
 :where [?p :node/title ?title]]

If you also want to see the nine-character UID for each page, for example, to construct a link to the page, you need to find the :block/uid attribute associated with the ?p entity-id. Here's what the query would look like. Note how ?p appears in both patterns in the where clause. This tells the query engine to find the title and the uid for the same entity.

[:find ?p ?title ?uid
 :where [?p :node/title ?title]
        [?p :block/uid ?uid]]
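
As a sketch of how the uid could be turned into a link: the helper below assumes page URLs follow the pattern https://roamresearch.com/#/app/<graph-name>/page/<uid>. Both the pattern and the graph name "my-graph" are assumptions for illustration, so check them against the address bar of your own graph.

```javascript
// Hypothetical helper: builds a page URL from a graph name and a page uid.
// The URL pattern is an assumption; verify it against your own graph.
function pageUrl(graphName, uid) {
  return `https://roamresearch.com/#/app/${graphName}/page/${uid}`;
}

// Rows shaped like the query result above: [entity-id, title, uid].
const pageRows = [[4, "January 19th, 2021", "01-19-2021"]];

const links = pageRows.map(([, title, uid]) => ({
  title,
  url: pageUrl("my-graph", uid),
}));
console.log(links[0].url);
```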

Paragraph-only attributes

Every paragraph will have the following attributes:

`:block/page`	Every paragraph on a page regardless of their level of nesting will also reference the entity-id of their page.
`:block/order`	This is the sequence of the block within the page or within their level of nesting under a paragraph. You will need to sort this value to retrieve the paragraphs in the proper sequence, as they appear in the document.
`:block/string`	The contents of the block.
`:block/parents`	The ancestors of the paragraph. For top-level paragraphs this is only the page. For nested paragraphs, this attribute lists all their ancestors leading up to (and including) the page.
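
Because query results come back unordered, sorting on :block/order has to happen after the query. A minimal sketch, assuming each result row has the shape [order, string] (the sample rows are made up):

```javascript
// Hypothetical rows as a query might return them: [:block/order, :block/string].
const unorderedRows = [
  [2, "third paragraph"],
  [0, "first paragraph"],
  [1, "second paragraph"],
];

// Sort a copy by :block/order so the paragraphs read as they appear on the page.
function inDocumentOrder(rows) {
  return [...rows].sort((a, b) => a[0] - b[0]);
}

const orderedStrings = inDocumentOrder(unorderedRows).map(([, text]) => text);
console.log(orderedStrings);
```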

Optional attributes:

Roam only sets these attributes if you change the value from the default for the specific block, e.g. if you change the text alignment of the block from left-aligned to centered. Otherwise, they do not exist in the database for the paragraph.

`:children/view-type`	Specifies how to display the block’s children. Recognized values are ‘bullet’, ‘document’, ‘numbered’.
`:block/heading`	In case you set the heading level of the block to H1, H2, or H3. Allowed values of this are 1,2,3.
`:block/props`	This is where Roam stores the sizing of an image or iframe, the position of the slider, the setting of the Pomodoro timer, etc.
`:block/text-align`	Alignment of the paragraph. Values are 'left', 'center', 'right', 'justify'.

The Roam data-structure

If you are wondering how to find out what attributes exist in your database, I have good news! Using a simple query, you can list all the attributes in your database.

[:find ?Namespace ?Attribute
 :where [_ ?Attribute]
        [(namespace ?Attribute) ?Namespace]]

Below is the list. Truth be told, the query above will not sort the values, and will not create the last column. I have included the slightly more advanced version of the query, which will do the sorting, in the downloadable roam.json file. I found the namespace function in the clojure.core documentation.

Namespace	Attribute	:Namespace/Attribute
attrs	lookup	:attrs/lookup
block	children	:block/children
block	heading	:block/heading
block	open	:block/open
block	order	:block/order
block	page	:block/page
block	parents	:block/parents
block	props	:block/props
block	refs	:block/refs
block	string	:block/string
block	text-align	:block/text-align
block	uid	:block/uid
children	view-type	:children/view-type
create	email	:create/email
create	time	:create/time
edit	email	:edit/email
edit	seen-by	:edit/seen-by
edit	time	:edit/time
entity	attrs	:entity/attrs
log	id	:log/id
node	title	:node/title
page	sidebar	:page/sidebar
user	color	:user/color
user	display-name	:user/display-name
user	email	:user/email
user	photo-url	:user/photo-url
user	settings	:user/settings
user	uid	:user/uid
vc	blocks	:vc/blocks
version	id	:version/id
version	nonce	:version/nonce
version	upgraded-nonce	:version/upgraded-nonce
window	filters	:window/filters
window	id	:window/id
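
For readers who prefer to do the sorting and the third column in JavaScript rather than in the query, here is a rough sketch. It assumes the result rows arrive as [namespace, attribute] pairs of plain strings, as in the first two columns of the table; the exact shape returned by Roam may differ.

```javascript
// Hypothetical, unsorted sample of [namespace, attribute] rows.
const attributeRows = [
  ["node", "title"],
  ["block", "uid"],
  ["create", "time"],
  ["block", "children"],
];

// Add the ":namespace/attribute" column and sort by it.
function toAttributeTable(rows) {
  return rows
    .map(([namespace, attribute]) => [namespace, attribute, `:${namespace}/${attribute}`])
    .sort((a, b) => (a[2] < b[2] ? -1 : a[2] > b[2] ? 1 : 0));
}

const attributeTable = toAttributeTable(attributeRows);
console.log(attributeTable.map((row) => row.join("\t")).join("\n"));
```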


Queries

If you are interested in writing queries for Roam, you should really work through the nine chapters of Learn Datalog Today. It is fun and action packed with exercises.

I will now quote a few paragraphs from the tutorial almost verbatim, changing the examples to fit Roam. For the rest, please visit the tutorial.

I also recommend the following YouTube video by Stuart Halloway, which summarizes the key features of the Datalog query language in eleven minutes.

Core concepts

A query is a vector starting with the keyword :find followed by one or more pattern variables (symbols starting with ?, e.g. ?title). After the find clause comes the :where clause which restricts the query to datoms that match the given data patterns. Use the _ symbol as a wildcard for the parts of the data pattern you wish to ignore.

For example if you want to find the text based on the block reference of a paragraph you would write:

[:find ?string
 :where [?b :block/uid "r61dfi2ZH"]
        [?b :block/string ?string]]

Considering the example at the beginning of this post, this query would return "check [[Projects]]".

The important thing to note here is that the pattern variable ?b is used in both data patterns. When a pattern variable is used in multiple places, the query engine requires it to be bound to the same value in each place. Therefore, this query will only find the string for the block that has the uid r61dfi2ZH.

Datoms about an entity may be in attributes that are in different namespaces. For example, if I wanted to find the title of the page that holds the ((r61dfi2ZH)) paragraph, I would write the following query. Note that I first read the :block/page attribute for the entity-id of the page, which I store in ?p. I then use this to locate the :node/title and :block/uid for the page.

[:find ?title ?uid
 :where [?b :block/uid "r61dfi2ZH"]
        [?b :block/page ?p]
        [?p :node/title ?title]
        [?p :block/uid  ?uid]]

Considering the example above, this would return "January 19th, 2021" and "01-19-2021".

The :in clause provides the query with input parameters, much in the same way that function or method arguments do in your programming language. Here's what the previous query would look like with an input parameter for the block reference.

[:find ?title ?uid
 :in $ ?block_ref
 :where [?b :block/uid ?block_ref]
        [?b :block/page ?p]
        [?p :node/title ?title]
        [?p :block/uid  ?uid]]

This query takes two arguments: $ is the database itself (implicit, if no :in clause is specified) and ?block_ref, which will be the block reference of the paragraph.

You can execute the above using window.roamAlphaAPI.q(query,block_ref);. If you do not provide a value for $, the query engine will implicitly assume the default database. Since you will be only querying your own Roam database, there is no need to state the database. Maybe once Roam offers cross database links, this could become interesting.
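
Here is one way the parameterized query might be wrapped in a small function. In this sketch the query function is passed in as q, so the helper can be tried outside Roam with a stub; the helper name findPageOfBlock and the stub are assumptions for illustration only.

```javascript
const PAGE_OF_BLOCK_QUERY = `[:find ?title ?uid
 :in $ ?block_ref
 :where [?b :block/uid ?block_ref]
        [?b :block/page ?p]
        [?p :node/title ?title]
        [?p :block/uid  ?uid]]`;

// Hypothetical helper: `q` is any function with the shape q(query, ...inputs) -> rows.
function findPageOfBlock(q, blockRef) {
  const rows = q(PAGE_OF_BLOCK_QUERY, blockRef);
  if (rows.length === 0) return null; // unknown block reference
  const [title, uid] = rows[0];
  return { title, uid };
}

// Stub standing in for Roam, answering with the example data from this post.
const stubQ = (query, blockRef) =>
  blockRef === "r61dfi2ZH" ? [["January 19th, 2021", "01-19-2021"]] : [];

console.log(findPageOfBlock(stubQ, "r61dfi2ZH"));
```

Inside Roam you would pass something like (query, ...inputs) => window.roamAlphaAPI.q(query, ...inputs) instead of the stub.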

I will now skip in the tutorial to cover a few topics that are slightly different in Roam. If you are interested in what you are missing out on, head over to the tutorial for the details I am skipping. There is a very helpful discussion about Tuples, Collections, and Relations which provide means to execute logical OR and AND operations.

Predicates

The predicate clause filters the result set to only include results for which the predicate returns true. In Datalog you can use any Clojure function or Java method as a predicate function. In my experience, in the Roam JavaScript implementation, Java functions are not available, and only a handful of Clojure functions work.

Clojure functions must be fully namespace-qualified, except for the clojure.core namespace. Sadly, outside the core namespace I have only found a few that worked in Roam. These include clojure.string/includes?, clojure.string/starts-with?, and clojure.string/ends-with?. Some of the helpful functions from the core namespace include namespace, which returns the namespace of an attribute, and count, which returns the length of a string. Some ubiquitous predicates that can also be used without namespace qualification are <, >, <=, >=, =, not=, != and so on.

Here are two examples using predicates. The first one returns the number of characters in a paragraph based on the block_reference.

[:find ?string ?size
 :in $ ?block_ref
 :where [?b :block/uid ?block_ref]
        [?b :block/string ?string]
        [(count ?string) ?size]]

The second lists blocks that were modified after a given date.

[:find ?block_ref ?string
 :in $ ?start_of_day
 :where [?b :edit/time ?time]
        [(> ?time ?start_of_day)]
        [?b :block/uid ?block_ref]
        [?b :block/string ?string]]
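
The ?start_of_day input has to be in the same unit as :edit/time, i.e. milliseconds since the epoch. One possible way to compute it in JavaScript is sketched below; the helper name startOfDay is an assumption, and it uses the local time zone of the browser.

```javascript
// Hypothetical helper: local midnight of `now`, optionally shifted back by whole days,
// returned as epoch milliseconds so it can be compared with :edit/time.
function startOfDay(now, daysAgo = 0) {
  const day = new Date(now.getTime()); // copy, so the caller's Date is not mutated
  day.setHours(0, 0, 0, 0);
  day.setDate(day.getDate() - daysAgo);
  return day.getTime();
}

const sampleNow = new Date(2021, 0, 19, 15, 30); // January 19th, 2021, 15:30 local time
const todayStart = startOfDay(sampleNow);
const weekAgoStart = startOfDay(sampleNow, 7);
console.log(new Date(todayStart).toString());
console.log(new Date(weekAgoStart).toString());
```

The resulting number can then be passed as the second argument to window.roamAlphaAPI.q together with the query above.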

Transformation functions

Sadly, I could not make transformation functions work in JavaScript. These are only an option if you install a Datalog database on your desktop, and load the Roam.EDN for further manipulation.

The only workaround available is to post-process the results after the query. The following example filters page titles to find a text fragment ("temp") case insensitively, and then sorts the results alphabetically. This query will return pages including words like "Template", "template", "Temporary", "attempt", etc.

let query = `[:find ?title ?uid
              :where [?page :node/title ?title]
                     [?page :block/uid ?uid]]`;

// Keep titles containing "temp" anywhere (including at the very start), then sort by title.
let results = window.roamAlphaAPI.q(query)
                     .filter((item) => item[0].toLowerCase().includes('temp'))
                     .sort((a,b) => a[0].localeCompare(b[0]));

Aggregates

Aggregates work, as expected. There are many aggregates available including sum, max, min, avg, count. You can read more about aggregates here.

If, for example, you do not know the purpose of an attribute, or what values are allowed, simply query your database to find existing values. The next example lists the values for :children/view-type. Note that if you are only using bullets in your graph, the query will only return one value: 'bullet'. I use the distinct aggregate function; without it I would get a list of potentially thousands of values, one row for each block where view-type is specified.

[:find (distinct ?type)
 :where
 [_ :children/view-type ?type]]

Rules

You can abstract away reusable parts of your queries into rules, give them meaningful names and forget about the implementation details, just like you can with functions in your favorite programming language.

A typical example of rules in Roam are the ancestor rules. These exploit the :block/children to traverse the tree of nested blocks. A simple ancestor rule would look like this. This finds the ?child based on a ?parent entity-id.

[[(ancestor ?child ?parent) 
 [?parent :block/children ?child]]]

The first vector is called the head of the rule where the first symbol is the name of the rule. The rest of the rule is called the body.

It is possible to use (...) or [...] to enclose the head, but it is conventional to use (...) to aid the eye when distinguishing between the rule's head and its body, and also between rule invocations and normal data patterns, as we'll see below.

You can think of a rule as a kind of function, but remember that this is logic programming, so we can use the same rule to find parents based on the entity-id of the child, and the child based on the entity-id of the parent.

Put another way, you can use both ?parent and ?child in (ancestor ?child ?parent) for input and for output. If you provide values for neither, you will get all the possible combinations in the database. If you provide values for one or both, it will constrain the result returned by the query as you would expect.

[:find ?uid ?string
 :in $ ?parent
 :where [?parent :block/children ?c]
        [?c :block/uid ?uid]
        [?c :block/string ?string]]

Now becomes:

[:find ?uid ?string
 :in $ ?parent %
 :where (ancestor ?c ?parent)
        [?c :block/uid ?uid]
        [?c :block/string ?string]]

The % symbol in the :in clause represents the rules.

This may not seem like an enormous achievement at first. Rules, however, can be nested. By extending the above rule you can make it return not only the children, but the entire sub-tree under ?parent. Rules can contain other rules, and can call themselves recursively.

[[(ancestor ?child ?parent) 
  [?parent :block/children ?child]]
 [(ancestor ?child ?grand_parent) 
  [?parent :block/children ?child] 
  (ancestor ?parent ?grand_parent)]]
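
Read procedurally, the recursive rule says: a descendant is either an immediate child, or an immediate child of something that is itself a descendant. The JavaScript sketch below expresses the same idea over a made-up map of :block/children lists (all entity-ids are hypothetical). It only illustrates what the rule computes; the query engine evaluates rules differently.

```javascript
// Hypothetical :block/children lists, keyed by parent entity-id.
const childrenByParent = new Map([
  [4, [5, 9]],
  [5, [7, 8]],
  [7, [10]],
]);

// Collect every descendant of `parent` by following :block/children recursively.
function descendants(childrenMap, parent) {
  const result = [];
  for (const child of childrenMap.get(parent) || []) {
    result.push(child, ...descendants(childrenMap, child));
  }
  return result;
}

const subTree = descendants(childrenByParent, 4);
console.log(subTree); // [ 5, 7, 10, 8, 9 ]
```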

We can now use this rule, for example, to count the number of descendants of a given block.

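// The recursive ancestor rule, passed to the query as the % input.
let rule = `[[(ancestor ?child ?parent)
              [?parent :block/children ?child]]
             [(ancestor ?child ?grand_parent)
              [?parent :block/children ?child]
              (ancestor ?parent ?grand_parent)]]`;

window.roamAlphaAPI.q(`
     [:find ?ancestor (count ?block)
      :in $ ?ancestor_uid %
      :where  [?ancestor :block/uid ?ancestor_uid]
              [?ancestor :block/string]
              [?block :block/string]
              (ancestor ?block ?ancestor)]`
      ,"hAfIHN6Gi",rule);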

Of course, in this example, we would be better off using the :block/parents attribute, which would allow for a much simpler query.

[:find ?ancestor (count ?block)
 :where  [?ancestor :block/uid "hAfIHN6Gi"]
         [?ancestor :block/string]
         [?block :block/parents ?ancestor]]

Pull

This post is already too long and too technical. For this reason, I am completely omitting a discussion about (pull ) requests - though in the examples in the roam.json I will include a few. (pull ?e [*]) is a powerful approach to get data from your database. Here are two references that are worth reading if you want to learn more.

Datomic Pull in the Datomic On-Prem Documentation
Introduction to the Roam Alpha API on Put Your Left Foot.

Roam query SmartBlock

It is possible to run queries within SmartBlocks and in the Console of the Developer Tools in your browser. Results, however, are hard to view, because they are returned in obscure data structures such as nested JSONs.

Update 28 January 2021
I learned in the meantime that you can also run simple queries natively in Roam using the :q command in a block. Try the following:

:q [:find(count ?t):where[_ :node/title ?t]]

It won't display pulls as nicely as my SB or have page links, but still awesome ...

Further update on 22 February 2021: I have created a long list of sample statistical queries using :q. You can find those here.

I wanted to make the query experience more convenient and integrated into Roam. As a result, I created a set of SmartBlocks that help embed queries into your Roam pages, just like any other component you include in your documents.

Here’s the link to the DatomicQuery.JSON file that you can import into your Roam graph. This includes two pages, the SmartBlocks and a host of query examples. Read on to understand how to use them.

You may choose between simple and advanced queries. Simple queries do not take input parameters and cannot include rules. You can of course include input parameters directly in your query, as you can see in the example a bit further below. Advanced queries give you full flexibility.

Page links, date links

My SmartBlock will take the query results and format them as a table for easy consumption. It returns results in a single block using ::hiccup. This way I can avoid littering your graph with an unnecessary number of blocks. As additional convenience, I have built some simple logic to convert page titles into clickable page links and time into links to the corresponding Daily Notes pages.

For the page links feature, you need to author your query in a special way.

Designate the title field by adding :name to the end of the field name. e.g. ?title:name
Designate the corresponding uid by placing the uid immediately after the ?title:name field, and by adding :uid to the end of the field name. e.g.: ?title:uid
Designate a field you want to convert to a Daily Notes page link by adding :date to the end of the field. e.g.: ?time:date

[:find ?title:name ?title:uid ?time:date
 :where [?page :node/title ?title:name]
        [?page :block/uid ?title:uid]
        [?page :edit/time ?time:date]
        [(clojure.string/starts-with? ?title:name "roam/")]]
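
To give an idea of what the :date conversion involves, here is a rough sketch that turns an :edit/time value into the uid and title of the matching Daily Notes page, following the "01-19-2021" and "January 19th, 2021" formats seen in the example datoms earlier. This is not the SmartBlock's actual code; the helper names are made up, and the sketch uses the browser's local time zone.

```javascript
const MONTHS = ["January", "February", "March", "April", "May", "June", "July",
  "August", "September", "October", "November", "December"];

// 1 -> "1st", 2 -> "2nd", 3 -> "3rd", 11 -> "11th", 22 -> "22nd", ...
function ordinal(day) {
  if (day % 100 >= 11 && day % 100 <= 13) return `${day}th`;
  const suffix = { 1: "st", 2: "nd", 3: "rd" }[day % 10] || "th";
  return `${day}${suffix}`;
}

// Hypothetical helper: map epoch milliseconds to a Daily Notes uid and title.
function dailyNote(milliseconds) {
  const date = new Date(milliseconds);
  const month = String(date.getMonth() + 1).padStart(2, "0");
  const day = String(date.getDate()).padStart(2, "0");
  return {
    uid: `${month}-${day}-${date.getFullYear()}`,
    title: `${MONTHS[date.getMonth()]} ${ordinal(date.getDate())}, ${date.getFullYear()}`,
  };
}

const sampleNote = dailyNote(new Date(2021, 0, 19, 12).getTime());
console.log(sampleNote); // { uid: '01-19-2021', title: 'January 19th, 2021' }
```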

Pull statements

The SmartBlock will also neatly display nested results as a table, in the table, in the table. When executing a query that includes a (pull ) statement, the result will be a tree, not a table. I render query results according to the following logic:

I will display the top-level of the result-set as rows of a table, with values as the columns.
Nested levels in the result-set alternate between being rendered in columns or rows.
To avoid too large result-sets, MAXROWS is set to 40 by default. In the advanced query, you can change this number.

At nested levels, I use MAXROWS/4 to limit the number of rows to display. Even with this setting, the resulting table could reach hundreds of rows. (40x10x10x…)

This is how the results of a (pull ) look. Pulling 1 level deep:

Pulling 2 levels deep:

Query templates

To generate the template for your query, run the appropriate Roam42 SmartBlock:

Datomic simple-template

Datomic advanced-template

Once ready with your query, simply execute it by pressing the button nested under the query.

Closing thoughts

After one week, I am not even close to being an expert on the topic. If I wrote something silly, if there is an error in my queries or the SmartBlock, please let me know. You can reach me in the comments below, or on Twitter @zsviczian.

Also, I would be very interested to understand how you used what you learned from this post, and the SmartBlock. Please share your thoughts and results. Thank you!