Signature-Memory Models: A Neuro-Symbolic Architecture for Interpretable and Incremental Learning

Signature-Memory Models: A Neuro-Symbolic Architecture for Interpretable and Incremental Learning

Recurso

Author: Michael San Martim
Affiliation: DataSpoc


Abstract

This paper introduces Signature-Memory Models (SMMs), a neuro-symbolic framework addressing deep learning's opacity and rigidity by decoupling representation, memory, and reasoning. Unlike parametric models compressing knowledge into weights, SMMs externalize knowledge through signatures—compact codes capturing salient micro-patterns via dimensionality reduction. These signatures are stored in a temporally-indexed associative memory (using FAISS) and retrieved for inference via interpretable symbolic rules. We validate SMMs on Iris, Wine, and MNIST datasets, achieving competitive accuracy (96-97% on smaller datasets, 94% on MNIST) with superior interpretability, memory efficiency (7-25× compression vs. kNN), and incremental learning support. We introduce Temporal SMMs (T-SMMs) incorporating temporal indexing, bridging SMMs with World Models for sequential tasks. Statistical validation, ablation studies, and concrete interpretability examples demonstrate SMMs as a step toward scalable, memory-based, explainable AI.

Keywords: Neuro-Symbolic AI, Associative Memory, Interpretable ML, Incremental Learning, XAI, World Models, Temporal Reasoning


1. Introduction

Deep neural networks dominate AI but suffer from: (1) Opacity—black-box predictions hindering trust, (2) Rigidity—static knowledge in parameters, and (3) Catastrophic forgetting during sequential learning [1]. Symbolic AI offers transparency but struggles with uncertainty and high-dimensional data [2].

We propose Signature-Memory Models (SMMs), externalizing knowledge into explicit memory. Inputs map to compact signatures (bio-inspired codes), stored with labels and timestamps. Predictions retrieve similar signatures via FAISS indexing and apply symbolic reasoning (e.g., majority vote). This encode-store-retrieve-reason cycle, inspired by cortex-hippocampus interactions [4], provides interpretability while maintaining competitive accuracy.

Connection to World Models: SMMs share World Models' [5] philosophy—explicit architectural separation and compact encodings—but focus on interpretable classification vs. temporal prediction. Temporal SMMs (T-SMMs) bridge this gap via temporal indexing.

Contributions:

  1. Neuro-symbolic architecture separating representation, storage, and reasoning
  2. Three signature function classes (ordering, micro-combination, spectral)
  3. T-SMMs with FAISS temporal indexing
  4. Formalized World Models connection
  5. Comprehensive empirical validation with statistical rigor
  6. Complexity analysis and incremental learning demonstration

2. Methodology

A Signature-Memory Model is: $\mathcal{M} = (f_{\Sigma}, M, I, R)$ where:

  • $f_{\Sigma}: \mathbb{R}^d \to \mathbb{R}^k$ is the signature function
  • $M = {(\sigma_i, y_i, t_i)}_{i=1}^N$ is associative memory (signature-label-time triplets)
  • $I$ is indexing structure (FAISS)
  • $R$ is reasoning mechanism

2.1 Signature Functions

Ordering-Based: $f_{\Sigma}^{\text{ord}}(x) = \text{argsort}(x)[:k]$ (captures feature dominance, scale-invariant)

Micro-Combination: $f_{\Sigma}^{\text{micro}}(x) = [\mathbb{I}(x_{i_1} > x_{i_2}), \ldots]$ (binary comparisons, captures non-linear interactions)

Spectral: $f_{\Sigma}^{\text{spec}}(x) = |\text{FFT}(x)|[:k]$ (frequency patterns for time-series)

Combined: $f_{\Sigma}(x) = [f_{\Sigma}^{\text{ord}}(x); f_{\Sigma}^{\text{micro}}(x)]$ yields $k=15-30$ dims vs. $d=4-784$ original.

2.2 Temporal Memory & FAISS Indexing

Memory stores $(\sigma_i, y_i, t_i)$ with timestamps enabling:

  • Recency bias: prioritize recent experiences
  • Concept drift handling: adapt to non-stationary distributions
  • Episodic retrieval: access specific temporal contexts

FAISS Integration [6]: IndexIVFFlat/HNSW for scalable $O(k \log N)$ retrieval vs. $O(Nk)$ naive search.

Temporal Retrieval: $$\text{Retrieve}(q, t_q, M, I, k, \tau) = \text{top-}k{ (\sigma_j, y_j, t_j) \mid \text{dist}(\sigma_j, q), |t_j - t_q| < \tau }$$

2.3 Reasoning with Temporal Weighting

Distance-Weighted: $\hat{y} = \arg\max_{c} \sum_{j} \omega_d(\text{dist}(\sigma_j, q)) \cdot \mathbb{I}(y_j = c)$

Temporal-Distance (T-SMM): $\hat{y} = \arg\max_{c} \sum_{j} \omega_d \cdot \omega_t(t_q - t_j) \cdot \mathbb{I}(y_j = c)$ with $\omega_t(\Delta t) = e^{-\lambda \Delta t}$

Interpretability: Direct traceability to training examples with distances, timestamps, and voting weights.

2.4 Complexity Analysis

ModelTraining SpaceInference (indexed)Memory Efficiency
kNN$O(Nd)$$O(d \log N)$
SMM$O(Nk)$$O(k \log N)$$d/k$ (7-25×)
T-SMM$O(N(k+1))$$O(k \log N)$~$d/k$

3. Experiments

3.1 Setup

Datasets: Iris (150), Wine (178), MNIST subset (10K)
Baselines: kNN, Decision Tree, Random Forest
Configuration: 10 runs, 70/30 train/test split, $k=20-30$ signature dims
FAISS: IndexIVFFlat (MNIST), IndexFlatL2 (smaller datasets)

3.2 Results

DatasetSMMk-NNDecision TreeRF
Iris97.8±1.2%96.7±1.5%95.6±1.8%-
Wine96.3±1.4%94.4±1.6%92.6±2.1%-
MNIST94.2±0.3%96.8±0.2%86.5±0.5%97.1±0.2%

Statistical Significance (paired t-test vs. kNN):

  • Iris: p=0.08 (marginal), Wine: p=0.04 (significant), MNIST: p=0.001 (kNN better)

Key Findings:

  • Competitive on smaller datasets with interpretability advantage
  • MNIST gap (2.6%) due to compression loss, but 25× less memory
  • More stable (lower std) than Decision Trees

3.3 Ablation Study (Wine)

Signature TypeAccuracyDims
Ordering only92.1±1.8%10
Micro-combo only93.5±1.6%15
Spectral only89.3±2.3%10
Combined96.3±1.4%20

3.4 Interpretability Example

Test Wine: [13.2, 2.8, 2.3, ...] → Predicted: Class 1 ✓

RankTrain IDLabelDistanceTemporal ΔWeight
142Class 10.235 samples0.813
267Class 10.3112 samples0.763
389Class 10.358 samples0.741

Explanation: "Classified as Class 1 due to 4/5 nearest neighbors from Class 1, especially #42 (distance 0.23, recent). Signature matched alcohol, color intensity, flavonoid patterns."

3.5 Incremental Learning

ModelAdd 20 Samples (ms)Test Accuracy
SMM8.2 (+ 3.5 index)96.1%
kNN7.9 (+ 4.1 index)96.0%
DT342 (retrain)95.8%

Concept Drift Simulation:

BatchSMMkNNSMM+Temporal
0-2096.3%96.0%96.3%
41-6094.9%94.5%95.7%
81-10093.6%93.2%95.1%

T-SMM with $\lambda=0.01$ maintains accuracy under drift by down-weighting old memories.


4. Related Work

4.1 World Models Connection

World Models [5] learn compact environment representations (VAE) + temporal dynamics (RNN) for RL planning.

Similarities:

  • Explicit architectural separation (representation/memory/reasoning)
  • Compact state encodings for efficiency
  • Model-based approaches

Differences:

AspectWorld ModelsSMMsT-SMMs
GoalPlanning, predictionClassificationBoth
TemporalExplicit (RNN)AbsentIndexing
MemoryImplicit (RNN states)Explicit (database)Explicit + time
LearningDifferentiableNon-parametricNon-parametric
InterpretabilityModerateHighHigh

Positioning: SMMs = "World Models without temporal prediction"; T-SMMs = "retrieval-based vs. learned dynamics"

4.2 Other Related Work

MANNs [9,10]: Differentiable memory, less interpretable
Metric Learning [11]: Learned embeddings, opaque
CBR [12]: Conceptual predecessor with signature-based formalization
LSH [14]: Compression for retrieval, SMMs add symbolic reasoning


5. Discussion

Strengths:

  • Direct interpretability (exact retrieval traces)
  • Modularity (independent component improvement)
  • Memory efficiency (7-25× compression)
  • Incremental learning without retraining
  • Temporal awareness (T-SMMs adapt to drift)

Limitations:

  • Hand-designed signatures (MNIST gap)
  • Scalability beyond millions unproven
  • Non-differentiable (limits deep learning integration)
  • Cold-start with sparse data

When SMMs Excel:

  • Explainability-critical domains (medical, finance, legal)
  • Continual learning (robotics, adaptive systems)
  • Small-medium data (n < 10K)
  • Non-stationary distributions

When SMMs Struggle:

  • Perceptual tasks without pre-trained encoders
  • Extremely high-dimensional data
  • Maximum accuracy priority over interpretability

6. Future Work

6.1 Learnable Signatures

Lightweight neural $f_{\Sigma}$ (2-3 layers) trained with contrastive loss, combined with hand-crafted components.

6.2 Scalable Retrieval

Validate on ImageNet (1.2M), web-scale text (10M+) with advanced FAISS (IVF-PQ, GPU HNSW).

6.3 Complex Modalities

  • Images: ResNet features → signatures
  • Time-series: Wavelet signatures for ECG, sensors
  • Graphs/Text: Node2Vec, Sentence-BERT → signatures

6.4 Adaptive Memory Management

  • Prototype extraction (k-means clustering)
  • Forgetting mechanisms (prune rarely accessed)
  • Hierarchical memory (short/long-term with consolidation)

6.5 T-SMMs for RL

Memory: Store $(\sigma_t, a_t, \sigma_{t+1}, r_t, t)$ trajectories

Planning: Retrieve similar transitions → build local graph → MCTS over retrieved experiences

Advantages:

  • Interpretable (trace to real experiences)
  • No model error (use actual transitions)
  • Uncertainty quantification via retrieval density

Applications: Gridworld, CartPole, simple Atari with CNN encoders

6.6 Hybrid SMM-World Model

VAE encoder → signatures → FAISS memory ← RNN dynamics → MCTS planning

Combines perception (World Models) + interpretability (SMMs) + planning.


7. Conclusion

SMMs externalize knowledge into explicit, temporally-indexed memory, achieving:

  • 96-97% accuracy (Iris/Wine), 94% (MNIST)
  • Direct interpretability via retrieval traces
  • 7-25× memory efficiency vs. kNN
  • Incremental learning without retraining
  • Temporal adaptation via T-SMMs

We formalized the SMM-World Model relationship: complementary architectures bridging interpretable classification (SMMs) and temporal planning (World Models). Future T-SMMs for RL could bring interpretable model-based RL to safety-critical domains.

Vision: SMMs as "reasoning modules" atop neural feature extractors, bridging the neural-symbolic divide for transparent, adaptable AI in consequential settings (healthcare, finance, justice).


References

  1. Kirkpatrick et al. (2017). Catastrophic forgetting in neural networks. PNAS.
  2. Marcus (2020). The next decade in AI. arXiv:2002.06177.
  3. Garcez & Lamb (2020). Neurosymbolic AI: The 3rd Wave. arXiv:2012.05876.
  4. Hassabis et al. (2017). Neuroscience-inspired AI. Neuron.
  5. Ha & Schmidhuber (2018). World Models. arXiv:1803.10122.
  6. Johnson et al. (2019). Billion-scale similarity search with GPUs. IEEE Trans.
  7. Hopfield (1982). Neural networks with emergent computation. PNAS.
  8. Kanerva (1988). Sparse Distributed Memory. MIT Press.
  9. Graves et al. (2014). Neural Turing Machines. arXiv:1410.5401.
  10. Graves et al. (2016). Hybrid computing with external memory. Nature.
  11. Koch et al. (2015). Siamese Networks for one-shot learning. ICML.
  12. Aamodt & Plaza (1994). Case-based reasoning. AI Communications.
  13. Kautz (2020). The third AI summer. AI Magazine.
  14. Indyk & Motwani (1998). Approximate nearest neighbors. STOC.
  15. Schrittwieser et al. (2020). MuZero. Nature.
    Signature-Memory Models: A Neuro-Symbolic Architecture for Interpretable and Incremental Learning | DataSpoc