Scaling & Retrieval
Before the pipeline: what scaling laws taught us.
Where It Started
Before SHGAT and GRU, there was a simpler question: can node retrieval be solved with embedding similarity alone? NB-01 and NB-02 are the origin story — the experiments that showed why a learned approach was necessary at all.
NB-01 was the earliest message passing experiment: a synthetic hypergraph with controlled structure. It validated that SHGAT-style attention could learn to weight neighbors by structural role, not just semantic similarity. The setup was five nodes and three capability levels, and the gradient check passed.
The toy problem was intentionally small to verify the mathematics before investing in a full training pipeline. It served its purpose: message passing works in theory.
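A gradient check of this kind can be sketched in a few lines. The example below is not the NB-01 code: it assumes a generic softmax attention over one node's neighbors (the score vector `a`, the three-dimensional features, and the squared-error loss are all illustrative choices) and compares the hand-derived gradient against central finite differences.

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(a, neighbors, target):
    """Attention-weighted neighbor aggregation, then a squared-error loss."""
    alpha = softmax([dot(a, h) for h in neighbors])
    out = [sum(al * h[k] for al, h in zip(alpha, neighbors))
           for k in range(len(target))]
    loss = 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))
    return loss, alpha, out

def analytic_grad(a, neighbors, target):
    """Gradient of the loss with respect to the attention vector a."""
    _, alpha, out = forward(a, neighbors, target)
    g_out = [o - t for o, t in zip(out, target)]
    g_alpha = [dot(g_out, h) for h in neighbors]
    # Backpropagate through softmax: dL/ds_k = alpha_k * (g_k - sum_j alpha_j g_j)
    weighted = sum(al * g for al, g in zip(alpha, g_alpha))
    g_score = [al * (g - weighted) for al, g in zip(alpha, g_alpha)]
    # Scores are s_k = a . h_k, so ds_k/da = h_k
    return [sum(gs * h[k] for gs, h in zip(g_score, neighbors))
            for k in range(len(a))]

def numeric_grad(a, neighbors, target, eps=1e-6):
    """Central finite differences, one coordinate of a at a time."""
    grad = []
    for k in range(len(a)):
        plus, minus = a[:], a[:]
        plus[k] += eps
        minus[k] -= eps
        diff = forward(plus, neighbors, target)[0] - forward(minus, neighbors, target)[0]
        grad.append(diff / (2 * eps))
    return grad

rng = random.Random(0)
dim = 3  # assumed feature size, chosen only for illustration
nodes = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]  # five toy nodes
target, neighbors = nodes[0], nodes[1:]  # node 0 aggregates from the other four
a = [rng.uniform(-1, 1) for _ in range(dim)]

max_err = max(abs(x - y) for x, y in
              zip(analytic_grad(a, neighbors, target),
                  numeric_grad(a, neighbors, target)))
print(f"max gradient error: {max_err:.2e}")
```

If the analytic and numeric gradients agree to within a small tolerance, the backward pass is consistent with the forward pass, which is all a check at this scale is meant to establish.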
NB-02 was the retrieval baseline: given a user intent, retrieve the most similar workflow using cosine similarity over mean-pooled node embeddings. It was tested at three scales: 50, 200, and 1,000 workflows.
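The baseline can be sketched as follows. This is a minimal illustration, not the NB-02 code: the workflow names, the three-dimensional vectors, and the assumption that the intent is already embedded in the same space as the nodes are all hypothetical.

```python
import math

def mean_pool(vectors):
    """Average a workflow's node embeddings into one flat vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return num / den if den else 0.0

def retrieve(intent_vec, workflows):
    """Return the name of the workflow whose pooled embedding is closest to the intent."""
    return max(workflows, key=lambda name: cosine(intent_vec, mean_pool(workflows[name])))

# Hypothetical workflows, each a list of node embeddings.
workflows = {
    "etl":    [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "report": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
    "deploy": [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]],
}
intent = [0.9, 0.1, 0.0]  # assumed to live in the same embedding space

best = retrieve(intent, workflows)
print(best)  # etl
```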
Retrieval accuracy degrades sub-linearly with scale, but it does degrade. At 1,000 workflows, cosine similarity over flat embeddings plateaus. The problem is not the retrieval mechanism; it is that flat embeddings cannot encode the sequential structure of node co-execution, because mean pooling discards the order in which nodes run. This is the gap SHGAT was designed to fill.
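The order-blindness of a flat embedding is easy to demonstrate. In the sketch below (node names and one-hot vectors are made up for illustration), two workflows run the same nodes in opposite order and end up with the same pooled vector, so no similarity measure over that vector can tell them apart.

```python
def mean_pool(vectors):
    """Average node embeddings; note that the result does not depend on order."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical node embeddings.
emb = {
    "fetch":     [1.0, 0.0, 0.0],
    "transform": [0.0, 1.0, 0.0],
    "load":      [0.0, 0.0, 1.0],
}

forward_order = ["fetch", "transform", "load"]
reverse_order = ["load", "transform", "fetch"]

pooled_a = mean_pool([emb[name] for name in forward_order])
pooled_b = mean_pool([emb[name] for name in reverse_order])

same = all(abs(x - y) < 1e-12 for x, y in zip(pooled_a, pooled_b))
print(same)  # True: the two execution orders are indistinguishable after pooling
```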
Retrieval finds the most similar past workflow. Prediction produces the next tool directly, conditioned on the current sequence. These are different problems with different failure modes. NB-01 and NB-02 proved that retrieval alone is insufficient; the GRU was built to predict.
The Gap That Motivated Everything
NB-02 showed that at 920 leaf nodes, two nodes can have near-identical embeddings but very different usage contexts, while two nodes with very different embeddings can co-execute reliably in sequence. Pure cosine similarity handles neither case: it conflates the first pair and misses the second.
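Both failure cases can be reproduced on a toy example. The sketch below uses invented node names, embeddings, and execution traces, and counts observed transitions as a crude stand-in for usage context; it is not the SHGAT or GRU model, only an illustration of the signal that cosine similarity does not see.

```python
import math
from collections import Counter

def cosine(u, v):
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return num / den if den else 0.0

# Hypothetical embeddings: two look-alike parsers and one unrelated node.
emb = {
    "parse_csv":  [0.90, 0.10, 0.00],
    "parse_tsv":  [0.89, 0.11, 0.00],
    "send_email": [0.00, 0.10, 0.90],
}

# Hypothetical execution traces (ordered node sequences).
traces = [
    ["parse_csv", "send_email"],
    ["parse_csv", "send_email"],
    ["parse_tsv", "parse_csv"],
]

# Count how often node b directly follows node a.
transitions = Counter((a, b) for trace in traces for a, b in zip(trace, trace[1:]))

sim_lookalike = cosine(emb["parse_csv"], emb["parse_tsv"])   # near-identical embeddings
sim_sequence = cosine(emb["parse_csv"], emb["send_email"])   # very different embeddings

print(f"parse_csv vs parse_tsv:  cosine={sim_lookalike:.3f}, "
      f"followed-by count={transitions[('parse_csv', 'parse_tsv')]}")
print(f"parse_csv vs send_email: cosine={sim_sequence:.3f}, "
      f"followed-by count={transitions[('parse_csv', 'send_email')]}")
```

The look-alike pair scores close to 1.0 on cosine similarity yet never co-executes in that order, while the dissimilar pair scores near zero yet is the only transition observed more than once.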
The decision to build SHGAT came from this: if you cannot retrieve the right tool by what it looks like, you need a model that knows where it fits — in the graph, in the sequence, in the hierarchy. That is what the full SHGAT + GRU pipeline provides.