BAM can link together data of different types. Associations...
On one hand, the model requires bipolar patterns - arrays of -1 and 1. But I need to store words and sentences. How do I encode them into that format? I looked into the byte pair encoding (BPE) used in GPT-2. It basically tokenizes text into subwords.
How are letters and words stored in neurons? Should I look into psychological research or just make up some encoder?
- We cannot achieve clusters and hierarchy with just one layer in BAM.
- Sadly, words are triggered together with other neuron activity; even with electrodes placed in the brain (the technique is officially called 'electrocorticography') we can't get much understanding.
- Every letter and every word is stored in a separate neuron.
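Going back to the encoding question: here is a minimal sketch of the naive route, assuming we simply one-hot encode each character over a fixed alphabet and map {0, 1} to {-1, +1}. The alphabet and helper names are mine, not from any paper.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # toy alphabet, just for illustration

def letter_to_bipolar(ch):
    """One-hot encode a character over ALPHABET, then map {0, 1} -> {-1, +1}."""
    vec = -np.ones(len(ALPHABET), dtype=int)
    vec[ALPHABET.index(ch)] = 1
    return vec

def word_to_bipolar(word):
    """Concatenate per-letter bipolar vectors into one flat pattern."""
    return np.concatenate([letter_to_bipolar(ch) for ch in word.lower()])

print(word_to_bipolar("hi"))  # length 2 * 27 = 54, entries are -1 or +1
```

This is exactly the kind of flat, letter-by-letter code I would like to avoid; GPT's BPE dictionary (below) goes for subwords instead.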
GPT has a vocabulary size of 40,478 since they have 478 base characters and chose to stop training after 40,000 merges.
GPT-2 uses bytes as the base vocabulary, which is a clever trick to force the base vocabulary to be of size 256 while ensuring that every base character is included in the vocabulary. GPT-2 has a vocabulary size of 50,257, which corresponds to the 256 bytes base tokens, a special end-of-text token and the symbols learned with 50,000 merges.
From https://huggingface.co/docs/transformers/tokenizer_summary
Here is what BPE encoding looks like. You can see it as a dictionary used by GPT. It has subwords and single letters, but the number of subwords is limited by the researchers.
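You can peek at the real GPT-2 vocabulary with a few lines of Python, assuming the `transformers` package is installed (the `GPT2Tokenizer` class is real; treat the exact token splits in the comments as illustrative):

```python
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")   # downloads the tokenizer files on first use
print(len(tok))                               # 50257 = 256 byte tokens + 50000 merges + <|endoftext|>
print(tok.tokenize("hello world"))            # ['hello', 'Ġworld'] - Ġ marks a leading space
print(tok.tokenize("associativeness"))        # a rare word splits into several subword pieces
```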
As a result, they have a flat vocabulary. By "flat" I mean all symbols are independent. The strange prefix \u0120 (sometimes printed as Ġ, [1], [2], [3]) denotes a leading space, which means that nothing can be prepended to symbols containing it.
But instead of a flat vocabulary we need a hierarchical network that starts with single letters and combines them into subwords and words.
Next, the firing of inputs is not "flat" either. We will not present the word "hello" as a simultaneous input to the first-layer neurons.
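To make the bipolar requirement concrete before the paper list, here is a minimal sketch of Kosko-style BAM training and recall: the weight matrix is a sum of outer products, and recall bounces activations between the two layers. The helper names and toy patterns are mine.

```python
import numpy as np

def bipolar_sign(v):
    """Threshold to {-1, +1}, treating ties (zeros) as +1."""
    return np.where(v >= 0, 1, -1)

def train_bam(pairs):
    """Kosko-style BAM: the weight matrix is the sum of outer products x * y^T."""
    W = np.zeros((len(pairs[0][0]), len(pairs[0][1])), dtype=int)
    for x, y in pairs:
        W += np.outer(x, y)
    return W

def recall(W, x, steps=5):
    """Bounce activations between the two layers until the pair stabilizes."""
    y = bipolar_sign(x @ W)
    for _ in range(steps):
        x = bipolar_sign(W @ y)
        y = bipolar_sign(x @ W)
    return x, y

# Two toy associations between 6-unit and 4-unit bipolar patterns.
a1, b1 = np.array([1, -1, 1, -1, 1, -1]), np.array([1, 1, -1, -1])
a2, b2 = np.array([1, 1, 1, -1, -1, -1]), np.array([-1, 1, 1, -1])
W = train_bam([(a1, b1), (a2, b2)])
print(recall(W, a1)[1])  # recovers b1: [ 1  1 -1 -1]
```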
Papers
- Original - Kosko (1988) and a follow-up with more examples
- Chartier uses two matrices instead of the transposition of one matrix (another short paper)
- some explanation with code and pictures, but the tests are wrong (meaningless random data) - post
- Tae-Dok Eom, Changkyu Choi, Ju-Jang Lee, "Generalized asymmetrical bidirectional associative memory for multiple association"
- a short version (I still don't understand what their notation means)
- more examples and explanation of the Chartier approach; good examples with text and picture associations, and more: temporal pattern sequences with the help of an autoassociative layer (pdf)
- very formal and formula-rich, mostly about stability
- another paper with many formulas, this time focused on memory capacity