ActionMapper: bound serialization breadth, move SQLite write off the action thread #1

Open
opened 2026-05-23 20:54:16 +00:00 by coilysiren · 0 comments

Originally filed by @coilysiren on 2026-05-19T02:33:38Z - https://github.com/coilysiren/eco-replay/issues/8

Problem - With eco-replay installed, a single player click on the live server (coilysiren/infrastructure#183) allocated enough memory to freeze kai-server. Idle play was fine. Two structural bugs in the recorder make this possible.

Bug 1: ActionMapper.ToRow serialization is bounded in depth but not breadth.

BodySettings sets MaxDepth = 2, which only constrains nesting. ShallowResolver short-circuits properties whose declared PropertyType.FullName is exactly one of User / ItemStack / WorldObject / Deed. That misses:

  • Properties typed as object, IAlias, IOwned, or any interface / base class. The declared type never matches the exact FullName check, so Newtonsoft serializes the whole runtime object.
  • Properties typed as IEnumerable<T>, IList<T>, IDictionary<,>, HashSet<T> of complex Eco entities (inventories, chunks, world-object lists, property graphs). MaxDepth=2 lets each element be fully expanded one layer.
  • Newtonsoft's Error handler swallows exceptions but doesn't stop enumeration. A 100k-element collection of complex objects allocates gigabytes of intermediate strings before anything errors.

Click actions (interact / place / trade / claim) carry exactly these wide references. Idle actions don't.

Fix sketch:

  • In ShallowResolver, walk prop.PropertyType and its base types + implemented interfaces against SkipTypes, not just exact FullName match. Better: maintain an allow-list of safe primitive-ish property types and skip everything else.
  • Treat any IEnumerable (other than string and primitive collections) at any depth as a count summary: "<n items>", not the full enumeration. A custom JsonConverter on the resolver is cleaner than fighting DefaultContractResolver.
  • Hard cap body_json size after the fact: if > 16 KB after serialization, replace with {"truncated": true, "action_type": ...}.
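
The three checks above could be sketched as plain helpers, independent of the serializer. This is a minimal sketch under assumptions, not the recorder's actual code: `BodyGuards`, `IsSafeScalar`, `Summarize`, and `CapBody` are hypothetical names, the `WorldObject` / `IOwned` / `CraftingTable` types are stand-ins for the real Eco types, `SkipTypes` is assumed to hold full type names, and System.Text.Json is used for the truncation stub only to keep the example dependency-free.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.Json;

// Hypothetical stand-ins for Eco types; the real ones live in the game assemblies.
public interface IOwned { }
public class WorldObject : IOwned { }
public class CraftingTable : WorldObject { }

public static class BodyGuards
{
    // Assumed shape of SkipTypes: full type names, built here from the stand-ins.
    public static readonly HashSet<string> SkipTypes = new HashSet<string>
    {
        typeof(WorldObject).FullName ?? "",
        typeof(IOwned).FullName ?? "",
    };

    public const int MaxBodyBytes = 16 * 1024;

    // True when the type, any base type, or any implemented interface is in SkipTypes.
    public static bool IsSkipped(Type type)
    {
        for (var t = type; t != null; t = t.BaseType)
            if (t.FullName != null && SkipTypes.Contains(t.FullName)) return true;
        return type.GetInterfaces().Any(i => i.FullName != null && SkipTypes.Contains(i.FullName));
    }

    // Allow-list variant: only scalar-like declared types are serialized as values.
    public static bool IsSafeScalar(Type type)
    {
        var t = Nullable.GetUnderlyingType(type) ?? type;
        return t.IsPrimitive || t.IsEnum || t == typeof(string) || t == typeof(decimal)
            || t == typeof(DateTime) || t == typeof(Guid);
    }

    // Replaces a non-string collection with a count summary. For brevity this also
    // summarizes primitive collections, which the fix sketch would keep as-is.
    public static object Summarize(object value)
    {
        if (value is string) return value;
        if (value is ICollection c) return $"<{c.Count} items>";
        if (value is IEnumerable items)
        {
            int n = 0;
            foreach (var _ in items) n++; // counts elements without serializing them
            return $"<{n} items>";
        }
        return value;
    }

    // Swaps an oversized body for a small stub that records only the action type.
    public static string CapBody(string json, string actionType)
    {
        if (Encoding.UTF8.GetByteCount(json) <= MaxBodyBytes) return json;
        return JsonSerializer.Serialize(new Dictionary<string, object>
        {
            ["truncated"] = true,
            ["action_type"] = actionType,
        });
    }
}
```

One way to wire this in, assuming ShallowResolver derives from DefaultContractResolver, would be to call IsSkipped (or the negation of IsSafeScalar) on property.PropertyType inside a CreateProperty override and mark the property as ignored. Counting a lazy IEnumerable still enumerates it once, so a real implementation may prefer to summarize such values without a count.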

Bug 2: SQLite insert runs synchronously on Eco's action thread.

EventStore.Insert takes writeLock and calls ExecuteNonQuery from inside ActionPerformed, so it runs on whichever thread Eco fires ActionUtil from. Under load this:

  • Couples the game tick to disk latency.
  • Serializes every action through a single mutex.
  • Amplifies any pathology in Bug 1 (the runaway serialization runs on the game thread too).

Fix sketch:

  • Channel<EventRow>(bounded, DropOldest) between ActionPerformed and a single background Task that drains the channel and batches inserts.
  • ActionPerformed becomes: build row, TryWrite, return. Never blocks the game thread.
  • The background writer can batch (transaction over 100 rows or 250 ms, whichever comes first) for better throughput.
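
The channel hand-off described above might look roughly like the following. It is a sketch under stated assumptions rather than the final design: `BackgroundEventWriter`, `Enqueue`, `StopAsync`, the two-column `EventRow` record, and the default capacity of 10,000 are all hypothetical, and the `flush` delegate is assumed to wrap one SQLite transaction around the rows it receives.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical row shape; the real EventRow carries whatever columns EventStore expects.
public sealed record EventRow(string ActionType, string BodyJson);

public sealed class BackgroundEventWriter
{
    private const int MaxBatch = 100;
    private static readonly TimeSpan MaxDelay = TimeSpan.FromMilliseconds(250);

    private readonly Channel<EventRow> channel;
    private readonly Action<IReadOnlyList<EventRow>> flush;
    private readonly Task worker;

    // flush is assumed to insert the given rows inside a single transaction.
    public BackgroundEventWriter(Action<IReadOnlyList<EventRow>> flush, int capacity = 10_000)
    {
        this.flush = flush;
        channel = Channel.CreateBounded<EventRow>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropOldest, // shed old rows rather than block
            SingleReader = true,
        });
        worker = Task.Run(() => DrainAsync());
    }

    // Called from ActionPerformed: never blocks and never touches disk.
    // Returns false only after StopAsync has completed the channel.
    public bool Enqueue(EventRow row) => channel.Writer.TryWrite(row);

    // Stops accepting rows and waits for the remaining ones to be flushed.
    public Task StopAsync()
    {
        channel.Writer.TryComplete();
        return worker;
    }

    private async Task DrainAsync()
    {
        var reader = channel.Reader;
        var batch = new List<EventRow>(MaxBatch);
        while (await reader.WaitToReadAsync().ConfigureAwait(false))
        {
            // The batch window opens when the first row of the batch is available.
            using var window = new CancellationTokenSource(MaxDelay);
            try
            {
                while (batch.Count < MaxBatch)
                {
                    if (reader.TryRead(out var row)) { batch.Add(row); continue; }
                    if (!await reader.WaitToReadAsync(window.Token).ConfigureAwait(false)) break;
                }
            }
            catch (OperationCanceledException)
            {
                // The 250 ms window closed; flush whatever has accumulated.
            }
            if (batch.Count > 0)
            {
                flush(batch); // an exception here ends the worker; real code should log and continue
                batch.Clear();
            }
        }
    }
}
```

With DropOldest, a full channel silently discards the oldest queued row, so the recorder loses history under sustained overload instead of stalling the game thread. That trade-off matches the fix sketch, but a dropped-row counter would make the loss visible.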

Out of scope - the host-level memory cage already landed in coilysiren/infrastructure#183. That stops the host from freezing again, but the mod is still uninstalled until this issue closes.

Verify -

  1. Unit tests covering a synthetic GameAction with a 10k-item IEnumerable property: ToRow returns in bounded time, body_json size is bounded.
  2. Reinstall on kai-server, confirm clicks no longer balloon RSS, Storage/EcoReplay.db grows normally.
  3. MemoryHigh=10G cgroup soft-cap does not get touched during normal play.
Reference
coilyco-flight-deck/eco-replay#1