Signature-Memory Models: A Neuro-Symbolic Architecture for Interpretable and Incremental Learning
Author: Michael San Martim
Affiliation: DataSpoc
Abstract
This paper introduces Signature-Memory Models (SMMs), a neuro-symbolic framework addressing deep learning's opacity and rigidity by decoupling representation, memory, and reasoning. Unlike parametric models compressing knowledge into weights, SMMs externalize knowledge through signatures—compact codes capturing salient micro-patterns via dimensionality reduction. These signatures are stored in a temporally-indexed associative memory (using FAISS) and retrieved for inference via interpretable symbolic rules. We validate SMMs on Iris, Wine, and MNIST datasets, achieving competitive accuracy (96-97% on smaller datasets, 94% on MNIST) with superior interpretability, memory efficiency (7-25× compression vs. kNN), and incremental learning support. We introduce Temporal SMMs (T-SMMs) incorporating temporal indexing, bridging SMMs with World Models for sequential tasks. Statistical validation, ablation studies, and concrete interpretability examples demonstrate SMMs as a step toward scalable, memory-based, explainable AI.
Keywords: Neuro-Symbolic AI, Associative Memory, Interpretable ML, Incremental Learning, XAI, World Models, Temporal Reasoning
1. Introduction
Deep neural networks dominate AI but suffer from: (1) Opacity—black-box predictions hindering trust, (2) Rigidity—static knowledge in parameters, and (3) Catastrophic forgetting during sequential learning [1]. Symbolic AI offers transparency but struggles with uncertainty and high-dimensional data [2].
We propose Signature-Memory Models (SMMs), externalizing knowledge into explicit memory. Inputs map to compact signatures (bio-inspired codes), stored with labels and timestamps. Predictions retrieve similar signatures via FAISS indexing and apply symbolic reasoning (e.g., majority vote). This encode-store-retrieve-reason cycle, inspired by cortex-hippocampus interactions [4], provides interpretability while maintaining competitive accuracy.
Connection to World Models: SMMs share World Models' [5] philosophy—explicit architectural separation and compact encodings—but focus on interpretable classification vs. temporal prediction. Temporal SMMs (T-SMMs) bridge this gap via temporal indexing.
Contributions:
- Neuro-symbolic architecture separating representation, storage, and reasoning
- Three signature function classes (ordering, micro-combination, spectral)
- T-SMMs with FAISS temporal indexing
- Formalized World Models connection
- Comprehensive empirical validation with statistical rigor
- Complexity analysis and incremental learning demonstration
2. Methodology
A Signature-Memory Model is: $\mathcal{M} = (f_{\Sigma}, M, I, R)$ where:
- $f_{\Sigma}: \mathbb{R}^d \to \mathbb{R}^k$ is the signature function
- $M = \{(\sigma_i, y_i, t_i)\}_{i=1}^N$ is associative memory (signature-label-time triplets)
- $I$ is indexing structure (FAISS)
- $R$ is reasoning mechanism
2.1 Signature Functions
Ordering-Based: $f_{\Sigma}^{\text{ord}}(x) = \text{argsort}(x)[:k]$ (captures feature dominance, scale-invariant)
Micro-Combination: $f_{\Sigma}^{\text{micro}}(x) = [\mathbb{I}(x_{i_1} > x_{i_2}), \ldots]$ (binary comparisons, captures non-linear interactions)
Spectral: $f_{\Sigma}^{\text{spec}}(x) = |\text{FFT}(x)|[:k]$ (frequency patterns for time-series)
Combined: $f_{\Sigma}(x) = [f_{\Sigma}^{\text{ord}}(x); f_{\Sigma}^{\text{micro}}(x)]$ yields $k=15-30$ dims vs. $d=4-784$ original.
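As a concrete sketch of these three signature classes and their concatenation (function names, the choice of `k`, and the comparison `pairs` are illustrative, not prescribed by the paper):

```python
import numpy as np

def sig_ordering(x, k=3):
    """Ordering signature: first k indices of argsort(x) (scale-invariant)."""
    return np.argsort(x)[:k]

def sig_micro(x, pairs):
    """Micro-combination signature: binary pairwise feature comparisons."""
    return np.array([1 if x[i] > x[j] else 0 for i, j in pairs])

def sig_spectral(x, k=3):
    """Spectral signature: magnitudes of the first k FFT coefficients."""
    return np.abs(np.fft.fft(x))[:k]

def signature(x, pairs, k=3):
    """Combined signature: ordering indices followed by comparison bits."""
    return np.concatenate([sig_ordering(x, k), sig_micro(x, pairs)])

x = np.array([13.2, 2.8, 2.3, 5.1])
pairs = [(0, 1), (0, 2), (1, 3)]
print(signature(x, pairs))  # ordering indices, then comparison bits
```

The spectral variant is shown for completeness; per the text it targets time-series inputs, while the combined code used in the experiments concatenates the ordering and micro-combination parts.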
2.2 Temporal Memory & FAISS Indexing
Memory stores $(\sigma_i, y_i, t_i)$ with timestamps enabling:
- Recency bias: prioritize recent experiences
- Concept drift handling: adapt to non-stationary distributions
- Episodic retrieval: access specific temporal contexts
FAISS Integration [6]: IndexIVFFlat/HNSW for scalable $O(k \log N)$ retrieval vs. $O(Nk)$ naive search.
Temporal Retrieval: $$\text{Retrieve}(q, t_q, M, I, k, \tau) = \underset{\text{dist}(\sigma_j,\, q)}{\operatorname{top-}k} \left\{ (\sigma_j, y_j, t_j) \in M \,\middle|\, |t_j - t_q| < \tau \right\}$$ i.e., the $k$ entries nearest to $q$ among those within the temporal window $\tau$.
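A minimal sketch of this retrieval step, with a brute-force L2 search standing in for the FAISS index (array names and defaults are illustrative; at scale `faiss.IndexFlatL2` or `IndexIVFFlat` replaces the distance computation):

```python
import numpy as np

def retrieve(q, t_q, sigs, times, k=3, tau=50.0):
    """Temporal retrieval: top-k nearest stored signatures whose timestamp
    lies within the window |t_j - t_q| < tau."""
    cand = np.where(np.abs(times - t_q) < tau)[0]   # temporal filter
    d = np.linalg.norm(sigs[cand] - q, axis=1)      # distances among candidates
    order = np.argsort(d)[:k]                       # k nearest of the survivors
    return cand[order], d[order]
```

Filtering by time before ranking by distance mirrors the set-builder form of the equation; a FAISS-backed version would instead search the index and post-filter by timestamp.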
2.3 Reasoning with Temporal Weighting
Distance-Weighted: $\hat{y} = \arg\max_{c} \sum_{j} \omega_d(\text{dist}(\sigma_j, q)) \cdot \mathbb{I}(y_j = c)$
Temporal-Distance (T-SMM): $\hat{y} = \arg\max_{c} \sum_{j} \omega_d(\text{dist}(\sigma_j, q)) \cdot \omega_t(t_q - t_j) \cdot \mathbb{I}(y_j = c)$ with $\omega_t(\Delta t) = e^{-\lambda \Delta t}$
Interpretability: Direct traceability to training examples with distances, timestamps, and voting weights.
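A hedged sketch of the T-SMM vote; the paper leaves the exact form of $\omega_d$ open, so inverse distance is assumed here:

```python
import numpy as np

def tsmm_vote(dists, labels, ages, lam=0.01, eps=1e-8):
    """T-SMM reasoning: each retrieved neighbor votes for its label with
    weight omega_d * omega_t, where omega_t = exp(-lam * age) and omega_d
    is taken as inverse distance (one common choice, assumed here)."""
    dists, labels, ages = map(np.asarray, (dists, labels, ages))
    w = np.exp(-lam * ages) / (dists + eps)         # combined weight per neighbor
    classes = np.unique(labels)
    scores = np.array([w[labels == c].sum() for c in classes])
    return classes[np.argmax(scores)]               # weighted majority class
```

Because the per-neighbor weights are explicit, the same array `w` can be reported alongside distances and timestamps in the retrieval trace, which is exactly the traceability claim above.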
2.4 Complexity Analysis
| Model | Training Space | Inference (indexed) | Memory Efficiency |
|---|---|---|---|
| kNN | $O(Nd)$ | $O(d \log N)$ | 1× |
| SMM | $O(Nk)$ | $O(k \log N)$ | $d/k$ (7-25×) |
| T-SMM | $O(N(k+1))$ | $O(k \log N)$ | ~$d/k$ |
3. Experiments
3.1 Setup
Datasets: Iris (150), Wine (178), MNIST subset (10K)
Baselines: kNN, Decision Tree, Random Forest
Configuration: 10 runs, 70/30 train/test split, $k=20-30$ signature dims
FAISS: IndexIVFFlat (MNIST), IndexFlatL2 (smaller datasets)
3.2 Results
| Dataset | SMM | k-NN | Decision Tree | RF |
|---|---|---|---|---|
| Iris | 97.8±1.2% | 96.7±1.5% | 95.6±1.8% | - |
| Wine | 96.3±1.4% | 94.4±1.6% | 92.6±2.1% | - |
| MNIST | 94.2±0.3% | 96.8±0.2% | 86.5±0.5% | 97.1±0.2% |
Statistical Significance (paired t-test vs. kNN):
- Iris: p=0.08 (marginal), Wine: p=0.04 (significant), MNIST: p=0.001 (kNN better)
Key Findings:
- Competitive on smaller datasets with interpretability advantage
- MNIST gap (2.6 pts) reflects compression loss, offset by 25× lower memory
- More stable (lower std) than Decision Trees
3.3 Ablation Study (Wine)
| Signature Type | Accuracy | Dims |
|---|---|---|
| Ordering only | 92.1±1.8% | 10 |
| Micro-combo only | 93.5±1.6% | 15 |
| Spectral only | 89.3±2.3% | 10 |
| Combined | 96.3±1.4% | 20 |
3.4 Interpretability Example
Test Wine: [13.2, 2.8, 2.3, ...] → Predicted: Class 1 ✓
| Rank | Train ID | Label | Distance | Temporal Δ | Weight |
|---|---|---|---|---|---|
| 1 | 42 | Class 1 | 0.23 | 5 samples | 0.813 |
| 2 | 67 | Class 1 | 0.31 | 12 samples | 0.763 |
| 3 | 89 | Class 1 | 0.35 | 8 samples | 0.741 |
Explanation: "Classified as Class 1 due to 4/5 nearest neighbors from Class 1, especially #42 (distance 0.23, recent). Signature matched alcohol, color intensity, flavonoid patterns."
3.5 Incremental Learning
| Model | Add 20 Samples (ms) | Test Accuracy |
|---|---|---|
| SMM | 8.2 (+ 3.5 index) | 96.1% |
| kNN | 7.9 (+ 4.1 index) | 96.0% |
| DT | 342 (retrain) | 95.8% |
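The append-only nature of this update can be sketched as follows (class and attribute names are mine; with a FAISS backend the matching step is `index.add` on the new signature block, with no retraining):

```python
import numpy as np

class SMMMemory:
    """Sketch of incremental learning: adding samples is a pure append
    to the signature store, never a retrain."""
    def __init__(self, k_dim):
        self.sigs = np.empty((0, k_dim), dtype=np.float32)
        self.labels = np.empty(0, dtype=np.int64)
        self.times = np.empty(0, dtype=np.float64)

    def add(self, sigs, labels, timestamps):
        # Append new triplets; an attached FAISS index would get index.add(sigs).
        self.sigs = np.vstack([self.sigs, np.asarray(sigs, dtype=np.float32)])
        self.labels = np.concatenate([self.labels, np.asarray(labels)])
        self.times = np.concatenate([self.times, np.asarray(timestamps, dtype=np.float64)])

    def __len__(self):
        return len(self.labels)
```

This is why the SMM and kNN rows above scale with the size of the added batch, while the Decision Tree row pays a full retraining cost.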
Concept Drift Simulation:
| Batch | SMM | kNN | SMM+Temporal |
|---|---|---|---|
| 0-20 | 96.3% | 96.0% | 96.3% |
| 41-60 | 94.9% | 94.5% | 95.7% |
| 81-100 | 93.6% | 93.2% | 95.1% |
T-SMM with $\lambda=0.01$ maintains accuracy under drift by down-weighting old memories.
4. Related Work
4.1 World Models Connection
World Models [5] learn compact environment representations (VAE) + temporal dynamics (RNN) for RL planning.
Similarities:
- Explicit architectural separation (representation/memory/reasoning)
- Compact state encodings for efficiency
- Model-based approaches
Differences:
| Aspect | World Models | SMMs | T-SMMs |
|---|---|---|---|
| Goal | Planning, prediction | Classification | Both |
| Temporal | Explicit (RNN) | Absent | Indexing |
| Memory | Implicit (RNN states) | Explicit (database) | Explicit + time |
| Learning | Differentiable | Non-parametric | Non-parametric |
| Interpretability | Moderate | High | High |
Positioning: SMMs are "World Models without temporal prediction"; T-SMMs replace learned dynamics with retrieval over stored, time-indexed experience.
4.2 Other Related Work
MANNs [9,10]: Differentiable external memory, but far less interpretable
Metric Learning [11]: Learned embeddings; the resulting space is opaque
CBR [12]: Conceptual predecessor; SMMs add a signature-based formalization and scalable indexing
LSH [14]: Compression for retrieval; SMMs add symbolic reasoning on top
5. Discussion
Strengths:
- Direct interpretability (exact retrieval traces)
- Modularity (independent component improvement)
- Memory efficiency (7-25× compression)
- Incremental learning without retraining
- Temporal awareness (T-SMMs adapt to drift)
Limitations:
- Hand-designed signatures (MNIST gap)
- Scalability beyond millions unproven
- Non-differentiable (limits deep learning integration)
- Cold-start with sparse data
When SMMs Excel:
- Explainability-critical domains (medical, finance, legal)
- Continual learning (robotics, adaptive systems)
- Small-medium data (n < 10K)
- Non-stationary distributions
When SMMs Struggle:
- Perceptual tasks without pre-trained encoders
- Extremely high-dimensional data
- Maximum accuracy priority over interpretability
6. Future Work
6.1 Learnable Signatures
Lightweight neural $f_{\Sigma}$ (2-3 layers) trained with contrastive loss, combined with hand-crafted components.
6.2 Scalable Retrieval
Validate on ImageNet (1.2M), web-scale text (10M+) with advanced FAISS (IVF-PQ, GPU HNSW).
6.3 Complex Modalities
- Images: ResNet features → signatures
- Time-series: Wavelet signatures for ECG, sensors
- Graphs/Text: Node2Vec, Sentence-BERT → signatures
6.4 Adaptive Memory Management
- Prototype extraction (k-means clustering)
- Forgetting mechanisms (prune rarely accessed)
- Hierarchical memory (short/long-term with consolidation)
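Of these, prototype extraction is the most mechanical to sketch: per-class k-means over stored signatures, with the centroids replacing the raw entries (a plain Lloyd loop is shown; `per_class` and `iters` are illustrative, and `sklearn.cluster.KMeans` would be the usual drop-in):

```python
import numpy as np

def extract_prototypes(sigs, labels, per_class=5, iters=20, seed=0):
    """Prototype extraction sketch: compress each class's signatures
    to a few k-means centroids, shrinking memory while keeping class
    structure."""
    rng = np.random.default_rng(seed)
    protos, proto_labels = [], []
    for c in np.unique(labels):
        X = sigs[labels == c].astype(np.float64)
        k = min(per_class, len(X))
        cent = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(iters):                      # plain Lloyd iterations
            assign = np.argmin(((X[:, None, :] - cent[None]) ** 2).sum(-1), axis=1)
            for j in range(k):
                if (assign == j).any():
                    cent[j] = X[assign == j].mean(axis=0)
        protos.append(cent)
        proto_labels.extend([c] * k)
    return np.vstack(protos), np.array(proto_labels)
```

Memory then holds `per_class × num_classes` entries instead of $N$, at the cost of coarser retrieval traces (prototypes rather than individual training examples).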
6.5 T-SMMs for RL
Memory: Store $(\sigma_t, a_t, \sigma_{t+1}, r_t, t)$ trajectories
Planning: Retrieve similar transitions → build local graph → MCTS over retrieved experiences
Advantages:
- Interpretable (trace to real experiences)
- No model error (use actual transitions)
- Uncertainty quantification via retrieval density
Applications: Gridworld, CartPole, simple Atari with CNN encoders
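The first step of this (still speculative) planner could be sketched as follows; the transition-tuple layout follows the text, and all names are illustrative:

```python
import numpy as np

def retrieve_transitions(q_sig, memory, k=3):
    """Future-work sketch: fetch the k stored transitions
    (sig, action, next_sig, reward, t) whose state signature is closest
    to the current one; local-graph construction and MCTS would build
    on the returned tuples."""
    sigs = np.array([tr[0] for tr in memory])
    d = np.linalg.norm(sigs - q_sig, axis=1)
    return [memory[i] for i in np.argsort(d)[:k]]
```

Because planning operates over actual stored transitions, every simulated rollout is traceable to real experience, which is the interpretability and no-model-error argument above.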
6.6 Hybrid SMM-World Model
VAE encoder → signatures → FAISS memory ← RNN dynamics → MCTS planning
Combines perception (World Models) + interpretability (SMMs) + planning.
7. Conclusion
SMMs externalize knowledge into explicit, temporally-indexed memory, achieving:
- 96-97% accuracy (Iris/Wine), 94% (MNIST)
- Direct interpretability via retrieval traces
- 7-25× memory efficiency vs. kNN
- Incremental learning without retraining
- Temporal adaptation via T-SMMs
We formalized the SMM-World Model relationship: complementary architectures bridging interpretable classification (SMMs) and temporal planning (World Models). Future T-SMMs for RL could bring interpretable model-based RL to safety-critical domains.
Vision: SMMs as "reasoning modules" atop neural feature extractors, bridging the neural-symbolic divide for transparent, adaptable AI in consequential settings (healthcare, finance, justice).
References
- [1] Kirkpatrick et al. (2017). Overcoming catastrophic forgetting in neural networks. PNAS.
- [2] Marcus (2020). The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence. arXiv:2002.06177.
- [3] Garcez & Lamb (2020). Neurosymbolic AI: The 3rd Wave. arXiv:2012.05876.
- [4] Hassabis et al. (2017). Neuroscience-Inspired Artificial Intelligence. Neuron.
- [5] Ha & Schmidhuber (2018). World Models. arXiv:1803.10122.
- [6] Johnson et al. (2019). Billion-scale similarity search with GPUs. IEEE Transactions on Big Data.
- [7] Hopfield (1982). Neural networks and physical systems with emergent collective computational abilities. PNAS.
- [8] Kanerva (1988). Sparse Distributed Memory. MIT Press.
- [9] Graves et al. (2014). Neural Turing Machines. arXiv:1410.5401.
- [10] Graves et al. (2016). Hybrid computing using a neural network with dynamic external memory. Nature.
- [11] Koch et al. (2015). Siamese Neural Networks for One-shot Image Recognition. ICML Deep Learning Workshop.
- [12] Aamodt & Plaza (1994). Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. AI Communications.
- [13] Kautz (2020). The Third AI Summer. AI Magazine.
- [14] Indyk & Motwani (1998). Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC.
- [15] Schrittwieser et al. (2020). Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Nature.