The goal of this document is to write a basic version of The NoteWriter to emphasize the core abstractions and the main logic.
The Model
The NoteWriter extracts objects from Markdown files. These objects are stored inside .nt/objects in YAML and inside .nt/database.db using SQL tables (useful to speed up queries and to support full-text search).
For example:
This document generates 3 objects: 1 file (notes.md) and 2 notes (Note: Example 1 and Note: Example 2).
File
Here is the definition of the object File simplified for this document:
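A minimal sketch of what this struct might look like (the field names here are assumptions for illustration, not the actual definition):

```go
package main

import (
	"fmt"
	"time"
)

// File represents a Markdown file in the repository.
// Field names are illustrative, not the actual definition.
type File struct {
	OID string `yaml:"oid"`

	// Metadata used to detect changes without re-reading the content
	RelativePath string    `yaml:"relative_path"`
	Size         int64     `yaml:"size"`
	Hash         string    `yaml:"hash"`
	MTime        time.Time `yaml:"mtime"`

	CreatedAt     time.Time `yaml:"created_at"`
	UpdatedAt     time.Time `yaml:"updated_at"`
	LastCheckedAt time.Time `yaml:"last_checked_at"`

	new   bool // the object must be inserted
	stale bool // the object must be updated
}

func main() {
	f := File{OID: "93267c32a249f0a5db8e6de3e96cbaf9e07e93e1", RelativePath: "notes.md", new: true}
	fmt.Printf("%s %s new=%v\n", f.OID[:7], f.RelativePath, f.new)
}
```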
Basically, we persist various metadata about the file to quickly determine if a file has changed when running the command ntlite add. In addition:
Each object gets assigned an OID (a unique 40-character hexadecimal string, like the hashes of Git objects). This OID is used as the primary key inside the SQL database and can be passed to the official command nt cat-file <oid> to get the full information about an object.
Each object includes various timestamps. The creation and last modification dates are mostly informative. The timestamp LastCheckedAt is updated every time an object is traversed (even if the object hasn’t changed) and is useful to quickly find all deleted objects.
Each object uses Go struct tags to make it easy to serialize in YAML.
Each object includes the fields new and stale to determine whether a change must be saved and whether the object must be inserted or updated.
Note
Here is the definition of the similar struct Note:
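As with File, a minimal sketch of what this struct might look like (field names are assumptions for illustration):

```go
package main

import (
	"fmt"
	"time"
)

// Note represents a single note extracted from a Markdown file.
// Field names are illustrative, not the actual definition.
type Note struct {
	OID     string `yaml:"oid"`
	FileOID string `yaml:"file_oid"` // the parent File

	Title        string `yaml:"title"`
	RelativePath string `yaml:"relative_path"`
	Line         int    `yaml:"line"` // line number of the heading

	Content string `yaml:"content_raw"`
	Hash    string `yaml:"content_hash"`

	CreatedAt     time.Time `yaml:"created_at"`
	UpdatedAt     time.Time `yaml:"updated_at"`
	LastCheckedAt time.Time `yaml:"last_checked_at"`

	new   bool
	stale bool
}

func main() {
	n := Note{Title: "Example 1", RelativePath: "notes.md", Line: 3}
	fmt.Println(n.Title, n.Line)
}
```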
ParsedXXX
The structs File and Note must be populated by parsing Markdown files, but to make the parsing logic easy to test, we will use basic structs that ignore some of the complexity (for example, Note contains the logic to enrich the Markdown and convert it to HTML). This is the intent behind the ParsedXXX structs:
The logic to initialize a ParsedFile simply uses the standard Go libraries:
The logic to initialize ParsedNote is slightly more elaborate:
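One way to sketch this extraction, assuming a note starts at a heading like "## Note: Title" and runs until the next heading (the actual format is richer):

```go
package main

import (
	"fmt"
	"strings"
)

// ParsedNote is a single note extracted from a file (simplified).
type ParsedNote struct {
	Title   string
	Line    int
	Content string
}

// ParseNotes extracts notes from Markdown content. This sketch assumes
// a note starts at a heading like "## Note: Title" and runs until the
// next heading.
func ParseNotes(content string) []ParsedNote {
	var notes []ParsedNote
	lines := strings.Split(content, "\n")
	for i, line := range lines {
		if !strings.HasPrefix(line, "## Note:") {
			continue
		}
		title := strings.TrimSpace(strings.TrimPrefix(line, "## Note:"))
		// Collect the body until the next heading
		var body []string
		for j := i + 1; j < len(lines) && !strings.HasPrefix(lines[j], "## "); j++ {
			body = append(body, lines[j])
		}
		notes = append(notes, ParsedNote{
			Title:   title,
			Line:    i + 1, // lines are 1-indexed
			Content: strings.TrimSpace(strings.Join(body, "\n")),
		})
	}
	return notes
}

func main() {
	doc := "# notes.md\n\n## Note: Example 1\n\nContent 1\n\n## Note: Example 2\n\nContent 2\n"
	for _, n := range ParseNotes(doc) {
		fmt.Printf("%s (line %d): %s\n", n.Title, n.Line, n.Content)
	}
}
```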
The target objects can easily be initialized from these structs:
As explained before, the OIDs use the same format as Git hashes (SHA-1) but are not derived from a hash of the content. If the content of a note (or a flashcard) is edited, we want to update the existing object (whereas Git would store a new object in this case). Therefore, the OIDs are in fact disguised UUIDs under the hood:
The Repository
Now that we know how to parse Markdown files, we need to write the logic to traverse the file system. Most commands will have to process the complete set of note files, which is represented by the struct Repository:
The repository is useful in many places inside the code to resolve absolute paths (the actual code contains many more methods) and is defined as a singleton (preferable to a global variable because it can be initialized lazily).
We define a convenient method to locate the note files:
We will reuse this method several times later, but first we need to have a look at the database.
The Database
Index represents the content of the database (= a list of known OIDs), including the staging area (= the objects that were added using ntlite add but not yet committed using ntlite commit).
The index is a YAML file located at .nt/index. We define a few functions and methods to load and dump it:
The other attribute of DB is the connection to the SQLite database instance located at .nt/database.db:
We will use the standard database/sql Go package to interact with the database. We will also expose a singleton to make the connection easy to retrieve:
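A sketch of the singleton using sync.Once; opening .nt/database.db requires a SQLite driver (such as mattn/go-sqlite3), so the connection is left nil here to keep the example free of external dependencies:

```go
package main

import (
	"database/sql"
	"fmt"
	"sync"
)

// DB groups the connection to the SQLite database.
// (Sketch: the real code also holds the index.)
type DB struct {
	client *sql.DB
}

var (
	dbOnce      sync.Once
	dbSingleton *DB
)

// CurrentDB lazily initializes and returns the singleton.
func CurrentDB() *DB {
	dbOnce.Do(func() {
		// Real code: client, err := sql.Open("sqlite3", ".nt/database.db")
		// followed by error handling; omitted in this sketch.
		dbSingleton = &DB{}
	})
	return dbSingleton
}

// Client returns the database connection.
func (db *DB) Client() *sql.DB {
	return db.client
}

func main() {
	fmt.Println(CurrentDB() == CurrentDB()) // the same instance every time
}
```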
Using this connection, we can now add methods on our model to persist the objects in the database:
That’s a lot of code, as we are using a low-level library. We have a method for every operation: Insert(), Update(), Delete(), and an additional method Check() to only update the LastCheckedAt timestamp. The method Save() determines which method to call based on the attributes new and stale.
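The dispatch logic of Save() can be sketched as follows; the Insert/Update/Check stubs below simply record the chosen operation instead of issuing SQL, so the example stays self-contained:

```go
package main

import "fmt"

// File is reduced to its persistence flags for this sketch.
type File struct {
	OID        string
	new, stale bool
	lastOp     string // records the last operation, for demonstration only
}

func (f *File) Insert() error { f.lastOp = "insert"; return nil }
func (f *File) Update() error { f.lastOp = "update"; return nil }
func (f *File) Check() error  { f.lastOp = "check"; return nil }

// Save chooses the right operation based on the attributes new and stale.
func (f *File) Save() error {
	switch {
	case f.new:
		if err := f.Insert(); err != nil {
			return err
		}
		f.new = false
	case f.stale:
		if err := f.Update(); err != nil {
			return err
		}
		f.stale = false
	default:
		// Nothing changed: only refresh LastCheckedAt
		return f.Check()
	}
	return nil
}

func main() {
	f := &File{OID: "abc", new: true}
	f.Save()
	fmt.Println(f.lastOp) // insert
	f.Save()
	fmt.Println(f.lastOp) // check: the object is now up-to-date
}
```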
Before closing this section, there is still one issue to address. Using CurrentDB().Client() makes it easy to execute queries, but each query is executed inside a different transaction. When running commands, we will work on many objects at the same time. If a command fails for any reason, we may want to roll back our changes and only report the error. We need to use transactions.
Transactions
The standard type sql.DB exposes a method BeginTx that returns a value of type *sql.Tx, useful to Rollback() or Commit() the transaction. The type sql.Tx also exposes methods to query the database, the same methods as offered by sql.DB, except there is no common interface between these two types. Ideally, we would like our Save() methods to work whether or not a transaction is in progress. To solve this issue, we define an interface:
We define only the few methods used by the application.
We also rework the method Client() on DB to use this type and to return the default connection when no transaction was started (*sql.DB) or the current transaction (*sql.Tx):
We will now implement the basic commands where these transactions will be indispensable.
The Commands
add
The command add updates the database with new objects. For this document, we consider only File and Note, but The NoteWriter manages more object types (Flashcard, Reminder, Link, …). A common interface between these different types is useful to share code. For example, we want to add any type of object to the database in a uniform way. Here are the interfaces:
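A sketch of what these interfaces might look like (method names are assumptions based on how they are used in this document), with a dummy implementation to show they compile:

```go
package main

import "fmt"

// State describes how an object differs from the database.
type State string

const (
	None     State = "none"
	Added    State = "added"
	Modified State = "modified"
	Deleted  State = "deleted"
)

// Object is implemented by every type extracted from Markdown files.
type Object interface {
	Kind() string      // "file", "note", "flashcard", ...
	UniqueOID() string
	State() State
	SubObjects() []StatefulObject // nested objects to process recursively
}

// StatefulObject is an Object that can persist itself in the database.
type StatefulObject interface {
	Object
	Save() error
}

// dummy satisfies StatefulObject for demonstration.
type dummy struct{}

func (d *dummy) Kind() string                 { return "dummy" }
func (d *dummy) UniqueOID() string            { return "0000000000000000000000000000000000000000" }
func (d *dummy) State() State                 { return None }
func (d *dummy) SubObjects() []StatefulObject { return nil }
func (d *dummy) Save() error                  { return nil }

func main() {
	var obj StatefulObject = &dummy{}
	fmt.Println(obj.Kind(), obj.State())
}
```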
In practice, all objects satisfy the StatefulObject interface, but we can choose one of the two types to make explicit whether we are interested in reading the object or updating it.
The implementations of these methods are trivial. We have already covered the method Save(). Here are the other methods implemented by the struct File:
The last method SubObjects() will be particularly useful when processing the collection of notes: we will create objects of type File and use SubObjects() to iterate recursively over the other sub-objects without having to interact directly with every type of object.
Here is the code for the command add:
We iterate over files using the walk() method. We create a new File using the new function NewOrExistingFile(), whose goal is to check in the database whether the file is already known and to compare it for changes:
When the State() of an object is different from None (= the object has changed), we place the object in the staging area (= the objects waiting to be committed). Then, we Save() every object to, at a minimum, update its LastCheckedAt timestamp.
The only step remaining to be covered in more detail is the method StageObject():
Basically, we append a new IndexObject to the slice StagingArea defined in the struct Index. What is more subtle is the field Data, where the content of the staged object (which can be of any type) is serialized. Indeed, the index (and thus the staging area) is serialized in YAML. We serialize the content of each staged object in YAML before compressing it using the package compress/zlib and encoding it in Base64 to end up with a simple string in Data:
Here is a preview of this code:
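The serialize/compress/encode pipeline can be sketched as a pair of helper functions; the real code serializes to YAML, but encoding/json is used here so the sketch needs no external dependencies (function names are assumptions):

```go
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
)

// CompressData serializes v, compresses the result with zlib, and
// encodes it in Base64 to end up with a plain string storable in the
// index.
func CompressData(v any) (string, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	writer := zlib.NewWriter(&buf)
	if _, err := writer.Write(raw); err != nil {
		return "", err
	}
	if err := writer.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// UncompressData reverses CompressData into the target value.
func UncompressData(data string, v any) error {
	compressed, err := base64.StdEncoding.DecodeString(data)
	if err != nil {
		return err
	}
	reader, err := zlib.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return err
	}
	defer reader.Close()
	raw, err := io.ReadAll(reader)
	if err != nil {
		return err
	}
	return json.Unmarshal(raw, v)
}

func main() {
	note := map[string]string{"title": "Example 1", "content": "Content 1"}
	data, err := CompressData(note)
	if err != nil {
		panic(err)
	}
	var restored map[string]string
	if err := UncompressData(data, &restored); err != nil {
		panic(err)
	}
	fmt.Println(restored["title"]) // Example 1
}
```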
Saving the edited objects is particularly useful for deletions. When running the command add, there is a check (commented in the above code) to list all objects that have not been saved (= the objects that no longer exist), and we issue a DELETE in the database to remove them. This way, the objects disappear from the relational database (and the inverted index) and are not visible from the desktop UI. But if the user decides to run the command reset (not supported in this document), we need to restore the content using the field Data in the staging area.
commit
The command commit only interacts with the object database (.nt/objects), since the relational database was already updated when adding the files.
The goal of this command is to move the objects present inside the staging area to the final objects under .nt/objects. The file .nt/index must also be updated to empty the staging area and append the new objects in the reference list.
The code iterates over the elements of the slice StagingArea and decodes, uncompresses, and unmarshals each object before creating a new YAML file under .nt/objects.
The code ends with a call to the method ClearStagingArea() defined like this:
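A minimal sketch of this method (the field names of Index are guesses consistent with the rest of this document):

```go
package main

import (
	"fmt"
	"slices"
)

// Index mirrors .nt/index (simplified).
type Index struct {
	Objects     []string      // OIDs of all known objects
	StagingArea []IndexObject // objects added but not yet committed
}

// IndexObject references a staged object.
type IndexObject struct {
	OID  string
	Data string
}

// ClearStagingArea migrates staged objects to the list of known
// objects and empties the staging area.
func (i *Index) ClearStagingArea() {
	for _, staged := range i.StagingArea {
		if !slices.Contains(i.Objects, staged.OID) {
			i.Objects = append(i.Objects, staged.OID)
		}
	}
	i.StagingArea = nil
}

func main() {
	index := &Index{
		Objects:     []string{"aaa"},
		StagingArea: []IndexObject{{OID: "bbb"}, {OID: "ccc"}},
	}
	index.ClearStagingArea()
	fmt.Println(index.Objects, len(index.StagingArea)) // [aaa bbb ccc] 0
}
```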
Objects are migrated from the staging area to the list of all known objects. The command commit ends by saving the file .nt/index.
We are ready for the next batch of files to add. That’s all for now.