My understanding is that while other programs like Workflowy store text as a single text file (or OneNote as multiple page files, for example), RemNote has no “text files” but “rems” that relate to each other.
Could someone explain to me why you made this decision? What are the advantages and disadvantages? It occurs to me that this way there can be greater efficiency with synchronization, am I correct?
TLDR: It’s easier to save the data in the same format it is laid out in memory.
Let’s first make the distinction between the persistent data format (how the data is saved on disk) and in-memory data structures (how the data is laid out in memory while the software is running). These are not the same for performance reasons: you cannot operate efficiently on plain text. Even text editors, which are supposed to just manipulate that plain text, build an additional data structure in memory when opening the file to support fast insertion and deletion of words anywhere in the file. Otherwise you would have to move all subsequent characters forward/backward in memory each time you add/delete a word (Sublime Text: [Rope](https://en.wikipedia.org/wiki/Rope_(data_structure)), VS Code: Piece Tree). When you save the file, this data structure is “exported” to plain text again. RemNote/Roam/Obsidian, for example, need a graph representation of some form in memory, e.g. to rename all references when you rename a bullet point.
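To make the editor example a bit more concrete, here is a minimal sketch of a piece table, which as far as I know is the simpler idea that a piece tree builds on. It is not how any particular editor implements it; the class and method names are made up for illustration. The point is that an insert only splits one entry in a list of pieces instead of shifting every character that follows.

```python
class PieceTable:
    """Minimal piece table: the original text is never modified.

    Inserted text is appended to a second buffer, and the document is
    described by a list of pieces (buffer name, start, length).
    """

    def __init__(self, text: str) -> None:
        self.original = text
        self.added = ""
        self.pieces = [("original", 0, len(text))] if text else []

    def insert(self, pos: int, text: str) -> None:
        """Insert `text` at character position `pos` of the document."""
        if not text:
            return
        new_piece = ("added", len(self.added), len(text))
        self.added += text
        offset = 0
        for i, (buf, start, length) in enumerate(self.pieces):
            if pos <= offset + length:
                split = pos - offset
                replacement = []
                if split > 0:
                    # Part of the existing piece before the insertion point.
                    replacement.append((buf, start, split))
                replacement.append(new_piece)
                if split < length:
                    # Part of the existing piece after the insertion point.
                    replacement.append((buf, start + split, length - split))
                self.pieces[i:i + 1] = replacement
                return
            offset += length
        # Position at or beyond the end of the document: append.
        self.pieces.append(new_piece)

    def text(self) -> str:
        """Export the document back to plain text (what 'save' would write)."""
        buffers = {"original": self.original, "added": self.added}
        return "".join(buffers[buf][start:start + length]
                       for buf, start, length in self.pieces)


doc = PieceTable("Hello world")
doc.insert(5, ",")
doc.insert(12, "!")
exported = doc.text()  # "Hello, world!"
```

Calling `text()` is the “export to plain text” step: only then are the pieces glued back together into one string.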
Then the question is how do you export the graph to plain text and how do you parse text into a graph?
You can make an additional distinction between WYSIWYG (RemNote) and markup based (Obsidian).
When you store your data as plain text with some markup it makes sense to also edit it directly. Markup is very nice in a development context to separate content from styling. WYSIWYG (while much harder to implement robustly (!) - even Word glitches on moving images) is much nicer to look at while editing. Roam kind of chose a middle ground by showing the markup of only the current bullet.
And the more features you have, the more data you’d somehow have to insert into the markup. Mindforger for example hides much data in HTML comments in the markup. (I don’t know what Obsidian files look like.)
RemNote has clozes, latex, display full reference, disabled/enabled Type relation (little x).
Not to mention all the card related stuff.
I think the markup would get quite cluttered.
(Sidenote: I don’t think OneNote uses text files to store documents.)
First of all, thank you very much for your detailed answer.
I understand that markup vs WYSIWYG are often opposed. But wouldn’t it be more correct to contrast user-friendly markup (like Markdown) with structured markup (XML or HTML) that is usually displayed as WYSIWYG?
If my previous point is correct, then even though the app has many features, I don’t see a reason to choose a graph over markup files structured like XML/HTML, other than the one you mentioned (operations like renaming all references when you rename a bullet point).
What would the files from RemNote or another program that works with graphs look like? Here I mean what I think you called the “persistent data format”.
Of course, I meant that OneNote has a file format for each page, as opposed to Workflowy, where my understanding is that everything is a single large text file.
That would be another discussion by the way … If one chose to store text files and not “graphs”, in what small parts should they be divided?
Thank you for your time and I hope I did not ask a silly question that shows that I did not understand anything haha.
With markup I mean the generic concept (hypernym) of annotating text with semantics. Markdown, Wiki syntax (user friendly) and HTML/XML (less user friendly) are special cases.
I compare plain text + markup with WYSIWYG because this is the level of abstraction the user has to deal with. HTML is just the technology with which a document is displayed.
There is even a middle ground: Markdown can be displayed as more or less complex HTML. Typora for example hides the markup completely after typing and shows *text* bold etc. Obsidian also has this on their roadmap.
Or you can transform Markdown-like shortcodes such as [[ and `` into WYSIWYG elements, like RemNote does.
I think you still have a misunderstanding of what it means to “store a graph”.
Let me describe it from another point of view: Memory is just a sequence of memory cells. Let’s say each cell stores a character. The computer can access each cell at random with an address (“give me the content of cell number 5”). A text document for example can be stored in an interval of memory like cell 5 to cell 200.
But how would you express complex structures like a graph then? You can make a convention like:
1. A node v is located at cell X and following.
2. Cell X contains the address of the text of v (using another data structure).
3. Cell X+1 contains how many children v has (let’s say N).
4. Cells X+2…X+1+N contain the addresses of the child nodes. For each child, go to 1 and repeat.
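The convention above can be tried out in a few lines. This is only a sketch: a Python list stands in for the memory cells, a second list stands in for “another data structure” holding the texts, and the function names are made up for illustration.

```python
def store_node(memory, texts, text, child_addresses):
    """Write one node into `memory` and return its address X.

    Children must be stored first so that their addresses are known.
    """
    address = len(memory)
    texts.append(text)
    memory.append(len(texts) - 1)        # cell X: address of the text
    memory.append(len(child_addresses))  # cell X+1: number of children N
    memory.extend(child_addresses)       # cells X+2 .. X+1+N: child addresses
    return address


def read_node(memory, texts, address, depth=0):
    """Follow the convention from `address` and return an indented outline."""
    text = texts[memory[address]]
    n = memory[address + 1]
    lines = ["  " * depth + text]
    for child_address in memory[address + 2:address + 2 + n]:
        lines.extend(read_node(memory, texts, child_address, depth + 1))
    return lines


memory, texts = [], []
a = store_node(memory, texts, "Child A", [])
b = store_node(memory, texts, "Child B", [])
root = store_node(memory, texts, "Parent", [a, b])

outline = read_node(memory, texts, root)
# memory  == [0, 0, 1, 0, 2, 2, 0, 2]
# outline == ["Parent", "  Child A", "  Child B"]
```

Looking at the raw `memory` list shows why nobody wants to edit this by hand: it is just numbers whose meaning depends entirely on the convention.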
If you want you can store that binary blob containing all cells like that on disk. But this is usually impractical, because those structures contain much unnecessary information (not in the example above) and it is not portable (maybe on your other device the memory cell X is already used).
Therefore this structure is translated into a save format e.g. a database, JSON or XML.
Translating into and from a machine readable format (JSON, XML) which closely matches the in memory datastructures is obviously easier than translating into a human readable format. (And if you let the user touch the data directly you have to implement all kinds of error handling which is a lot of work and testing.)
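As a rough illustration of why a machine-readable format is the easy path: if the in-memory objects are plain records, saving and loading is almost a one-liner each. The `Rem` class and its field names below are assumptions made up for this sketch, not RemNote’s actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rem:
    # Hypothetical fields, chosen only for this example.
    id: str
    text: str
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


def save(rems: Dict[str, Rem]) -> str:
    """Translate the in-memory objects into a JSON string."""
    return json.dumps({rem_id: asdict(rem) for rem_id, rem in rems.items()})


def load(data: str) -> Dict[str, Rem]:
    """Translate the JSON string back into in-memory objects."""
    return {rem_id: Rem(**fields) for rem_id, fields in json.loads(data).items()}


rems = {
    "a": Rem(id="a", text="Parent", children=["b"]),
    "b": Rem(id="b", text="Child", parent="a"),
}
restored = load(save(rems))
```

There is no parser to write and no user-made syntax errors to handle, because the saved format mirrors the objects one to one.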
You can look at RemNote’s database in the browser’s DevTools. In Chrome (or Firefox) press F12, Application > Storage > IndexedDB … > quanta and optionally search for the rem id (the thing in the URL bar after ...document/).
The opened rem with the id sNcX... has 3 children.
The text of the rem is stored in key, with one part having b(old): true and another part being a reference to another rem given by its id.
Its parent MjL... is the Daily Document.
It has 2 typeParents aka tags: #Document and #Daily Document.
The web technology used to store RemNote’s data offline is called IndexedDB. It just stores key-value pairs where the key is the rem id and the value is a (JSON) object with all information about this rem.
This means your knowledge base is just a huge list of rem objects (or list of mini documents that reference each other if you want).
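Here is a small sketch of what “a list of rem objects that reference each other” buys you. A plain dict stands in for the key-value store, and the field names (`key`, `ref`, `text`, `bold`) are simplified assumptions for illustration, not the real schema. Because a reference stores only the id of the target, renaming the target changes every place that refers to it without touching those places.

```python
# Rem id -> rem object. The text is a list of parts: plain strings,
# formatted spans, or references to other rems by id.
store = {
    "rem1": {"key": ["Python"], "children": []},
    "rem2": {
        "key": ["I am learning ", {"ref": "rem1"}, {"text": " fast", "bold": True}],
        "children": [],
    },
}


def render(store, rem_id):
    """Resolve a rem's parts into plain text (no protection against cycles)."""
    parts = []
    for part in store[rem_id]["key"]:
        if isinstance(part, str):
            parts.append(part)
        elif "ref" in part:
            # Look up the referenced rem by id and render its current text.
            parts.append(render(store, part["ref"]))
        else:
            parts.append(part["text"])
    return "".join(parts)


before = render(store, "rem2")   # "I am learning Python fast"
store["rem1"]["key"] = ["Rust"]  # rename the referenced rem
after = render(store, "rem2")    # "I am learning Rust fast"
```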
And an operation like moving a rem is just as easy as setting the parent entry of this value object to the id of another rem and inserting its id into the children list of the new parent.
I would rather not think about how you could do this in an unstructured, text-based way. Like having the parent name/id in a yaml frontmatter? And if you don’t have a file per rem, how would you address child rems when referencing? Using ids is so much easier.
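The move operation described above could look roughly like this on such a key-value store. Again a plain dict stands in for the store and the `parent`/`children` field names are assumptions for the sketch. It also removes the id from the old parent’s children list, and it does not check for cycles (moving a rem into its own descendant).

```python
def move_rem(store, rem_id, new_parent_id):
    """Move a rem under a new parent by updating ids only."""
    rem = store[rem_id]
    old_parent_id = rem.get("parent")
    if old_parent_id is not None:
        # Detach from the old parent.
        store[old_parent_id]["children"].remove(rem_id)
    # Attach to the new parent.
    rem["parent"] = new_parent_id
    store[new_parent_id]["children"].append(rem_id)


store = {
    "inbox":   {"text": "Inbox",   "parent": None,    "children": ["note"]},
    "project": {"text": "Project", "parent": None,    "children": []},
    "note":    {"text": "A note",  "parent": "inbox", "children": []},
}
move_rem(store, "note", "project")
```

Three small field updates, no matter how much text the moved rem or its subtree contains.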
Awesome @hannesfrank ! Most of that goes over my head, but very cool to learn a bit about it.
A couple of (probably very confusing/confused) questions
Does the entire knowledge base need to be in memory at all times (to allow things like search and linking)? Or is there some kind of “streaming” system in place? I wouldn’t mind needing to wait a couple seconds longer for certain operations, once my database becomes very large. I definitely wouldn’t want RemNote to be taking more and more of my RAM as my database grows.
How do you think this will translate to the desktop app? Will it be able to use the same system? In terms of performance (and considering my question 1 above), could desktop be faster (and less of a burden on RAM/processor)?
I think it’s a good idea to have all text in memory in some kind of optimized data structure. Metadata and binary data (images and other media files) can be loaded when required. But I personally want the search and reference suggestions to be instant.
People always overestimate the size of text. We are talking about KB…MB here. A character takes up 1…4 bytes (Latin-script text usually 1 byte). A page of text is at most 500 words or 3000 characters, which equals 3 KB. You’d have to write a lot to even come close to a GB. Quick estimation: 100 MB is roughly 30,000 pages, and a 1 cm pile of A4 paper is about 100 pages, so you’d have to write a pile from floor to ceiling (about 3 m) to start noticing it in your memory.
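If you want to redo the estimate with your own numbers, here it is as a few lines of arithmetic. The constants are just the rough assumptions from the paragraph above (1 byte per character, 3000 characters per page, 100 pages per cm of paper).

```python
BYTES_PER_CHAR = 1     # Latin-script text, roughly
CHARS_PER_PAGE = 3000  # about 500 words
PAGES_PER_CM = 100     # thickness of a pile of A4 paper


def pages_for(num_bytes):
    """How many pages of plain text fit into `num_bytes`."""
    return num_bytes / (BYTES_PER_CHAR * CHARS_PER_PAGE)


def pile_height_m(num_bytes):
    """Height in metres of a paper pile holding that much text."""
    return pages_for(num_bytes) / PAGES_PER_CM / 100


pages = pages_for(100 * 10**6)       # about 33,000 pages for 100 MB
height = pile_height_m(100 * 10**6)  # about 3.3 m
```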
Of course there is overhead in your datastructure (text formatting, links etc.) but I guess this is at most one order of magnitude (10MB -> 100MB) when you have lots of small rem.
In my estimation the RAM usage of the knowledge graph is negligible compared to the RAM used by the UI which consists of a ton of HTML elements called DOM.
You can check the approximate size of your knowledge graph by checking the raw export.