First, I got carried away with researching what a Dyck language is. I was damn certain that some geeks already released an online service for converting natural languages to Dyck. NO! There is none.
So I decided to write one.
Literature
- There is an order between words of equal length. They transform using a special rule and all together make a tamari lattice. Amount of elements there is a Catalan number. ref
- There are another objects that under some conditions also counted by Catalan numbers. Thus can be created a bijection between such objects. At this point we are going to consider Dyck path (not words, defined in the article) and we will build a bijection to binary trees ref. related question about the bijection, and some notes
- Also noncrossing partitions can be ordered and hence similar to trees and paths. Phagocyte lattice ?
- Some notation on how to transition from Dyck words to paths
According to the notes (from someone's howework (?)):
Each time you go up in path , draw the left son from the last vertex you drew in tree . Each time you go down in path , go up one vertex in tree and draw the right son of that vertex.
And reverse:
Let be a binary tree. Start in the root of . If you can go down left in , do it and go up in your Dyck Path. If you can’t, then go up until you can go down right where you haven’t been before in and go down in the Dyck Path.
Maybe related about tree traversal
To my big surprise researchers are trying to use Large Language Models like GPT to predict word endings in the Dyck language. Found in Big Bench benchmark
For some reason I looked into the bijection between binary trees and Dyck words. It's not a trivial thing I must say.
But how do we encode alphabet or words from English language as binary trees? Morse code?