• get_all_msg_headers(): cold *_NULL fields read undefined via dot-acces

    From Rob Swindell@1:103/705 to GitLab issue in main/sbbs on Thu May 21 14:27:18 2026
    open https://gitlab.synchro.net/main/sbbs/-/issues/1143

    ## Summary

    `MsgBase.get_all_msg_headers()` header objects return `undefined` on the **first** (cold) access of a `LAZY_STRING_TRUNCSP_NULL` field (`to_ext`, `from_ext`, `replyto`, `to_list`, `cc_list`, `summary`, `tags`, `from_org`, …) via **dot/literal** property access — even when the field is populated. Touching any other (non-NULL) field first "primes" the object and the field then reads correctly.

    This issue documents the **proven root cause** and leaves the fix direction open for discussion. Two prior commits are involved:

    - `ca448cb8b` — eager `JS_DefineProperty("number")` per header. **Ineffective** on real bases (see below).
    - `666ff71ce` — eager full-resolve per header. **Works**, but abandons the lazy design, and its commit message overstated the mechanism as proven (it was written before the decisive evidence below existed). The message should be corrected regardless of the final fix.

    ## Reproduced (read-only, live ~7.2k-msg mail base)

    With the pre-fix binary, cold `hdr.to_ext` → **7138 `undefined`** of **7026** actually-populated `to_ext`; priming with `.to` first → all correct. Strictly read-only (no `save_msg`/`remove_msg`). Repro scripts attached.

    ## Root cause (evidence-based)

    SpiderMonkey 1.8.5's **`JSOP_GETPROP` per-(shape, pc) property cache**, interacting with `js_msghdr_class`'s **old-style resolve hook** (`js_get_msg_header_resolve`) that **conditionally** defines `*_NULL` fields — the field becomes an own property iff `p->msg.<field>` is non-NULL.

    All bulk headers are created identically (`JS_NewObject(&js_msghdr_class, proto, …)`) and the hook defines fields in a fixed order, so they **share one shape lineage** but have **divergent real property sets** (a field is present on some headers, absent on others). The per-`(shape, pc)` GETPROP cache can't represent "sometimes present", so once an entry for `(shared-shape, .to_ext)` is established it mis-serves `undefined` to same-shape headers.

    ## Three discriminators that nailed it

    | Test | Result | Conclusion |
    |------|--------|------------|
    | Siblings (`attr`/`number`/`to`) after a cold `to_ext` miss | **alive on all 7140** | `p->enumerated` is **not** set; resolve hook runs fine |
    | `hdr["to_ext"]` (computed, **JSOP_GETELEM**) | undef = 213 = true NULL count → **correct** | not the resolve hook / SMB layer / data |
    | `hdr.to_ext` (literal, **JSOP_GETPROP**) | undef = 7138 → **the bug** | it's specifically the literal-name property cache |
    | cold-defined-but-*wrong* | **0** | purely spurious `undefined`, never bad data |

    SM source confirms the shape of it: `js_GetPropertyHelperWithShapeInline` fills the cache **only on the found path** (`jsobj.cpp:5438`) and explicitly **not** on a miss (`nofills++`, `jsobj.cpp:5366`) — i.e. found-then-mispredict across shared shapes, not negative caching.

    ### Why the prior fixes behaved as they did

    - `ca448cb8b` ("define number") can't work: it just moves every header off the empty shape onto **one shared** `{number,…}` shape; the conditional-property divergence is untouched.
    - `666ff71ce` (eager full-resolve) **does** address the root cause — every populated field becomes a real own property at construction, so GETPROP finds them by normal scope lookup with consistent shapes, no conditional resolve-add divergence. The cost is the loss of laziness.

    ## Notes on hypotheses raised in discussion

    - **`p->enumerated` spuriously set** — ruled out (siblings alive on the same object after a cold miss).
    - **`js_get_msg_header_resolve(…, JSID_VOID)` early-returning through macros** — doesn't happen; in the `name==NULL` path every `LAZY_*` macro only `return`s when `name` is non-NULL, so it falls through and defines all available fields, and that path never sets `enumerated`.

    ## Candidate fixes (undecided)

    1. **`JSCLASS_NEW_RESOLVE`** conversion — the SM-sanctioned way to keep laziness; hook sets `*objp` only when it actually defines a field. Must be built and proven against the live base (not guaranteed to fix it).
    2. **Keep eager-resolve** (`666ff71ce`) and just correct the commit message; optionally benchmark eager vs. lazy on the live base to quantify perf.
    3. **Lazy-define-as-null variants** that don't change the public contract (e.g. only in the bulk path, or non-enumerable null) so same-shape headers stop diverging.

    ## Reproducers (attached)

    Both run via `jsexec -c /sbbs/ctrl <script>` against the live `mail` base, strictly read-only:

    - [probe_to_ext.js](/uploads/e84b9f7807c1bc07057efb35633a6e62/probe_to_ext.js) — cold vs. primed `to_ext` counts, transition point, mismatch detection.
    - [probe_enum.js](/uploads/b261bac444a39ba2730c7b0fdaed12dd/probe_enum.js) — the discriminators: sibling-after-miss, and GETELEM-vs-GETPROP per-field cold reads.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Thu May 21 14:38:28 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1143#note_8987

    **SpiderMonkey 1.8.5 per-shape property-cache implementation — references**

    All paths relative to the vendored tree `3rdp/src/mozjs/js-1.8.5/js/src/`.

    Canonical doc (cited in the header itself, `jspropertycache.h:51-53`): MDN *SpiderMonkey/Internals/Property cache* — `https://developer.mozilla.org/en/SpiderMonkey/Internals/Property_cache` (now in archived Mozilla wiki content).

    **Data structures — `jspropertycache.h`**
    - `class PropertyCache` — `:160`; fixed-size direct-mapped table, `SIZE_LOG2 = 12` → 4096 entries (`:164-169`).
    - `struct PropertyCacheEntry` — `:121`; key = `kpc` (testing bytecode pc, `:123`) + `kshape` (shape of the key object, `:124`); value = `vcap`/`vword`.
    - hash over **(pc, kshape)** — `hash()` `:215` (`((pc>>SIZE_LOG2 ^ pc) + kshape) & MASK`).
    - entry kinds: `adding()` `:128` (kshape≠vshape, predicted shape transition) vs `directHit()` `:129` (kshape==vshape).
    - `PCVal` tagged union (slot / shape / fun-obj) — `:81`.

    **Hit/lookup path — `jspropertycacheinlines.h`**, `PropertyCache::test()` `:73`
    - `kshape = obj->shape()` `:78`; `entry = table[hash(pc, kshape)]` `:79`.
    - validated by `entry->kpc == pc && entry->kshape == kshape` `:84`, then `matchShape(cx, pobj, entry->vshape())` `:92`.
    - **On a validated hit it returns the cached PCVal and never calls the resolve hook** — the entry is keyed purely on `(pc, shape)`, and all bulk header objects share one shape lineage. This is the crux.

    **Fill — `jspropertycache.cpp`**, `PropertyCache::fill()` `:51`
    - where entries are created (the predicted-transition / `adding` bookkeeping), with bail-outs `:62-128` (cache disabled, `!nativeContains`, dictionary-mode `:84`, non-native protos, overdeep chains).

    **GETPROP wiring — `jsobj.cpp`**
    - found-path fill: `JS_PROPERTY_CACHE(cx).fill(cx, aobj, 0, protoIndex, obj2, shape)` — `:5438` (under `JSGET_CACHE_RESULT`).
    - miss path that explicitly does **not** fill (`nofills++`) — `:5366`.

    **Shapes — `jsscope.cpp`**: `js_GenerateShape()` `:74` / `:99`; `Shape` in `jsscope.h`.

    The empirically-anchored part is just the first half: `test()` keys on `(pc, obj->shape())` and a validated hit bypasses resolve. The exact entry-transition bookkeeping that lets a same-shape header get mis-served `undefined` lives in the interplay of `fill()` (`:51`) and the `adding()`/`directHit()` validation in `test()` — that's the spot to confirm the precise misprediction.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deucе@1:103/705 to GitLab note in main/sbbs on Thu May 21 15:13:07 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1143#note_8988

    A few problems with the diagnosis I want to surface before we move on.

    1. JIT is off in the build that produced these counts.

    JAVASCRIPT_OPTIONS in sbbsdefs.h:69 is 0x810 = JSOPTION_JIT | JSOPTION_COMPILE_N_GO, which is what global->js.options defaults to (sbbs_ini.c:317) and what jsexec uses (jsexec.cpp:63). JSOPTION_JIT is bit 11, which is TraceMonkey, not MethodJIT, and js-confdefs.h for this tree compiles in JS_METHODJIT, JS_MONOIC, JS_POLYIC but not JS_TRACER. TraceMonkey is gone entirely. MethodJIT (with its PolyIC/MonoIC inline caches) is compiled in but requires JSOPTION_METHODJIT (bit 14 = 0x4000), which is not in 0x810.

    The only caching layer that could plausibly produce the reported behavior under defaults is therefore the interpreter PropertyCache in jspropertycache.cpp.

    2. The interpreter PropertyCache as written can't mis-serve the way the issue describes.

    Trace from fill() to test():

    - The GET-path fill at jsobj.cpp:5432 (js_GetPropertyHelperWithShapeInline) passes adding=JS_FALSE (default, no third arg).
    - In fill() (jspropertycache.cpp:51), with adding=FALSE and a default getter, control reaches the if (kshape == 0) block at line 256, which sets kshape = obj->shape() and vshape = pobj->shape(). By this point the resolve hook has already called JS_DefineProperty and the shape has transitioned, so this obj->shape() reads post-transition (call it S_post).
    - test() at jspropertycacheinlines.h:73 reads kshape = obj->shape() for the receiver currently being looked up (line 78), then rejects unless entry->kshape == kshape (line 84).

    A fresh header at S_init reading hdr.to_ext therefore has kshape = S_init in test(). Any entry filled by a populated-sibling header is at kshape = S_post. The kshape check fails, control falls through to fullTest, then to js_GetPropertyHelper, which calls the resolve hook. There is no path in the interpreter cache I can find where a same-S_init header gets a hit on an entry filled by an S_post header.

    3. Other candidate mechanisms considered and discarded:

    - Cached miss / negative caching of "this id not on this shape": PCVal (jspropertycache.h:81-119) has four states (Null, FunObj, Slot, Shape), no negative state. fill() requires a real Shape* and bails via JS_NO_PROP_CACHE_FILL if !pobj->nativeContains(*shape). Every fill call site passes a real shape.
    - Proto-chain contamination (a stray to_ext = undefined defined on the shared msghdr prototype): falsified by the very GETELEM-vs-GETPROP discriminator. If the proto held the entry, GETELEM's proto walk in js_LookupPropertyWithFlagsInline at jsobj.cpp:4985-5009 would find it and return undef too. The reported GETELEM count of 213 (matching true-NULL count) rules this out.
    - "Shape priming" from ca448cb8b: not a real SpiderMonkey concept. Shapes transition via PropertyTree::getChild (jspropertytree.cpp:204-243) which returns existing nodes for matching transitions, so two headers both adding "number" land on the same destination shape. There's no per-object priming distinct from shape.
    - JIT PolyIC misprediction (methodjit/PolyIC.cpp): would have been my next candidate, except JIT is off per (1).

    4. What this leaves.

    Either (a) there is a path in fullTest (jspropertycache.cpp:320-410) or the vcapTag()==1 proto-hit branch in test() (jspropertycacheinlines.h:87-90) that I have not traced and that does in fact mis-serve under shared-shape conditions, or (b) the reported counts are not measuring what the writeup says they measure.

    I can't rule out (a) without sitting in the debugger. But (b) is the more parsimonious read of where this issue's history has been so far: ca448cb8b's commit-message theory did not survive scrutiny, 666ff71ce's did not either, and the experimental writeup cites a mechanism (per-(shape, pc) cache hit on a same-S_init fresh header) that does not reproduce against a read of the source on JIT-off SM 1.8.5.

    Request: please attach probe_enum.js and probe_to_ext.js, or paste them inline.

    The numbers (cold to_ext = 7138 undef, GETELEM = 213, sibling-alive = 7140, cold-defined-but-wrong = 0) are load-bearing for the diagnosis, and without seeing how each was counted, I can't tell whether the discriminator was actually exercised, or whether (for example) the GETELEM pass was inadvertently warmed by something that ran earlier in the same script. With the scripts in hand we can re-run them, and either confirm the misprediction is real (in which case the next step is gdb on the interpreter cache to find the path I missed) or identify the script artifact.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deucе@1:103/705 to GitLab note in main/sbbs on Thu May 21 15:46:07 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1143#note_8991

    Zooming out: unchecked JS_* returns is the broader pattern

    The lost-exception trap in LAZY_STRING_TRUNCSP_NULL / LAZY_STRING_TRUNCSP is not unique to those macros. It's how nearly the entire JS binding layer is written.

    Quick count in this tree:

    - js_msgbase.cpp: 28 JS_DefineProperty calls, 0 with a return-value check.
    - Whole tree: 683 JS_DefineProperty calls, ~108 inside any if, and most of those are conditional calls (if (cond) JS_DefineProperty(...)) rather than checks on the return. The actual checked-return rate is well under 10%.
    - Same pattern for JS_NewObject, JS_NewStringCopyZ, JS_NewArrayObject, JS_SetProperty, JS_DefineElement.

    Any allocation failure in any of those sites sets a pending exception on cx and is silently swallowed. The script proceeds with whatever the default observable is (NULL pointers from JS_GetPrivate, undefined from skipped property defines, no-op'd setup). The exception sits on cx until something else trips on it (or jsexec reports it at script termination).

    Suggested test to surface the scope

    Run the same get_all_msg_headers fetch under a deliberately undersized heap, read several LAZY-resolved fields per header, count undef per field:

    ```
    // Set JavaScriptMaxBytes very low in jsexec.ini first, e.g. 4M or 2M.
    var fields = ["to", "from", "subject", "to_ext", "from_ext",
    "to_list", "cc_list", "summary", "tags",
    "replyto", "from_org", "editor"];
    var u = {};
    for (var f = 0; f < fields.length; f++) u[fields[f]] = 0;

    var caught = null;
    try {
    for (var n in headers) {
    var h = headers[n];
    for (var f = 0; f < fields.length; f++) {
    if (h[fields[f]] === undefined)
    u[fields[f]]++;
    }
    }
    } catch (e) { caught = e; }

    for (var f = 0; f < fields.length; f++)
    print(fields[f] + ": " + u[fields[f]] + " undef");
    print("caught: " + caught);

    // flush any pending exception that lingered past loop end
    try { var s = ""; for (var i = 0; i < 50000; i++) s += "x"; }
    catch (e) { print("post-loop: " + e); }
    ```

    If OOM under a small heap is the trigger, all these fields should show elevated undef counts, not just the *_NULL ones. to/from/subject are routed through LAZY_STRING_TRUNCSP, which has the same swallow-and-undefine behavior on JS_NewStringCopyZ failure. If only to_ext-shaped fields show it, OOM isn't the (whole) story.

    I'll file the unchecked-return audit as a separate issue so it doesn't tangle with the property-cache thread in this one. The point of this comment is just: even after #1143 is closed, the same defect pattern is going to keep producing "mysterious" symptoms across the codebase until it gets swept.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Thu May 21 16:32:33 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1143#note_8993

    **OOM / allocation-failure (#1144) ruled out as the trigger for this symptom**

    Tested the open question of whether allocation failure (`JS_NewStringCopyZ` → NULL under heap pressure, the LAZY_* fall-through path described in #1144) is what produces the cold-`to_ext`-undefined here. Two independent lines of evidence say no:

    **1. Heap invariance.** Ran the cold/primed probe against the live mail base at three heap sizes (`jsexec -m<bytes>`): 16 MB (default), 256 MB, 1 GB. Bit-identical results every time:

    | heap | total | cold to_ext undef | BUG mismatches | cold defined-but-wrong | |------|-------|-------------------|----------------|------------------------| | 16 MB | 7294 | 7193 | 6971 | 0 |
    | 256 MB | 7294 | 7193 | 6971 | 0 |
    | 1 GB | 7294 | 7193 | 6971 | 0 |

    If allocation pressure were the trigger, a 64× larger heap would relieve it and the count would drop — it didn't move by a single header. An OOM-driven failure would also be non-deterministic (GC timing / high-water mark dependent); this is fully deterministic.

    **2. GETELEM returns the real string** (decisive, heap-independent). If `JS_NewStringCopyZ` had returned NULL, the string would never exist and the property would never be defined — so `hdr["to_ext"]` (JSOP_GETELEM) would *also* read undefined. But GETELEM returns the correct value (`to_ext` undef = 213 = the true NULL count), while `hdr.to_ext` (JSOP_GETPROP) reads undefined. The string **was** allocated and the property **was** defined; only the GETPROP property-cache lookup path fails.

    So the symptom in this issue is the `JSOP_GETPROP` property-cache mispredict across same-shape headers, not the #1144 allocation-failure path.

    **This does not invalidate #1144.** The unchecked `JS_*` returns are a separable, real latent defect that would bite under genuine allocation pressure (and silently swallow the pending exception), exactly as described there — it's just not the mechanism firing in this reproduction.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)