Skip to content

improvement(db): add session statement/lock timeouts; simplify KB doc tx#4593

Merged
TheodoreSpeaks merged 61 commits into
stagingfrom
fix/threshold-billing-pool-deadlock
May 14, 2026
Merged

improvement(db): add session statement/lock timeouts; simplify KB doc tx#4593
TheodoreSpeaks merged 61 commits into
stagingfrom
fix/threshold-billing-pool-deadlock

Conversation

@TheodoreSpeaks
Copy link
Copy Markdown
Collaborator

@TheodoreSpeaks TheodoreSpeaks commented May 14, 2026

Summary

  • Add server-side lock_timeout=5s / statement_timeout=30s session defaults via connection: {...} startup params in packages/db/db.ts. Converts silent pool wedges into loud server-side cancellations.
  • Workspace archival tx keeps its own ceiling via SET LOCAL statement_timeout='5min' and lock_timeout='30s' — rare admin op stays atomic without tripping the new global default.
  • KB document creation (bulk + single) drops the db.transaction + SELECT 1 ... FOR UPDATE wrapper. Because KB delete is soft (deletedAt = now) the FK alone can't guard a concurrent delete, so the new insertDocumentsIfKbAlive helper does the existence check and the insert in a single statement via INSERT...SELECT...WHERE EXISTS over jsonb_to_recordset. Atomic at the MVCC snapshot — race-free, no transaction, no row lock. Side effect: removes a processDocumentTags-uses-db-inside-tx deadlock surface.

Type of Change

  • Bug fix

Testing

Tested manually. bun run lint clean. bun run check:api-validation:strict passes. Vitest: workspace lifecycle 2/2, billing 29/29, knowledge 49/49, webhook trigger 17/17.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

waleedlatif1 and others added 30 commits April 3, 2026 23:30
…ership workflow edits via sockets, ui improvements
…ration, signup method feature flags, SSO improvements
* feat(posthog): Add tracking on mothership abort (#4023)

Co-authored-by: Theodore Li <theo@sim.ai>

* fix(login): fix captcha headers for manual login  (#4025)

* fix(signup): fix turnstile key loading

* fix(login): fix captcha header passing

* Catch user already exists, remove login form captcha
…nts, secrets performance, polling refactors, drag resources in mothership
…endar triggers, docs updates, integrations/models pages improvements
…mat, logs performance improvements

fix(csp): add missing analytics domains, remove unsafe-eval, fix workspace CSP gap (#4179)
fix(landing): return 404 for invalid dynamic route slugs (#4182)
improvement(seo): optimize sitemaps, robots.txt, and core web vitals across sim and docs (#4170)
fix(gemini): support structured output with tools on Gemini 3 models (#4184)
feat(brightdata): add Bright Data integration with 8 tools (#4183)
fix(mothership): fix superagent credentials (#4185)
fix(logs): close sidebar when selected log disappears from filtered list; cleanup (#4186)
v0.6.46: mothership streaming fixes, brightdata integration
icecrasher321 and others added 19 commits April 29, 2026 10:16
…rizations, mothership positional table row insertion, CI improvements, org-external users, file viewer improvements
v0.6.62: fix new copilot chat creation and selection on refresh
…ixes, db query optimizations, contract boundaries code hygiene, CORS, toast improvements, tables infinite query, executor robustness, reranker support
…ogs block, parallel-in-loop wall clock, gpt-image-2
…s, logs panel width, tables UI/DB decoupling

v0.6.67: VFS upload fix, posthog/copilot correlation, exa date filters, logs panel width, tables UI/DB decoupling
…ering upgrades, data drains, security hardening, paginated dropdowns
…ntegrations, robots.txt update, workday hardening
v0.6.72: billing pool contention fix
…personation fixes, md rendering, doc/pdf/pptx generation improvements
…pentelemetry updates, data drains to snowflake, blob, datadog, bigquery
…ip md polish

v0.6.75: scheduler claim-budget drain, helm chart hardening, mothership md polish
@vercel
Copy link
Copy Markdown

vercel Bot commented May 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 14, 2026 4:24pm

Request Review

@gitguardian
Copy link
Copy Markdown

gitguardian Bot commented May 14, 2026

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
29606901 Triggered Generic High Entropy Secret a54dcbe apps/sim/providers/utils.test.ts View secret
32763747 Triggered Generic Password 3e9849b helm/sim/tests/validators_test.yaml View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@cursor
Copy link
Copy Markdown

cursor Bot commented May 14, 2026

PR Summary

Medium Risk
Introduces global Postgres lock_timeout/statement_timeout defaults, which can cause previously-long or lock-waiting queries to start failing and needs validation in production workloads. Also changes knowledge-base document creation semantics by removing a transaction/lock, so knowledge_base.updatedAt is no longer updated atomically with the insert.

Overview
Adds default Postgres session guards in packages/db/db.ts (lock_timeout=5s, statement_timeout=30s) so lock waits and long-running queries fail fast instead of wedging connections.

Updates workspace archival to SET LOCAL higher per-transaction timeouts (5min statement, 30s lock) to keep this rare, large operation reliable under the new global defaults.

Simplifies knowledge-base document creation (createDocumentRecords, createSingleDocument) by dropping the explicit transaction and SELECT ... FOR UPDATE lock, relying on FK constraints/atomic inserts and reducing deadlock/pool checkout overhead.

Reviewed by Cursor Bugbot for commit 0ad209f. Bugbot is set up for automated code reviews on this repo. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR adds server-side lock_timeout=5s and statement_timeout=30s session defaults to the postgres.js connection pool, converting silent long-running query hangs into loud server-side cancellations. It also refactors KB document creation to use a single INSERT...SELECT...WHERE EXISTS statement rather than an explicit transaction with a row-level lock, and overrides the new shorter defaults in the workspace archival transaction with SET LOCAL.

  • packages/db/db.ts: Adds connection: { lock_timeout: 5_000, statement_timeout: 30_000 } so every connection in the pool inherits these session-level guards at connection startup.
  • apps/sim/lib/workspaces/lifecycle.ts: Prepends SET LOCAL overrides at the start of the archival transaction so the 30s default doesn't abort legitimate large-workspace cleanup operations.
  • apps/sim/lib/knowledge/documents/service.ts: Replaces the explicit db.transaction + SELECT ... FOR UPDATE pattern with a new insertDocumentsIfKbAlive helper that does the existence check and insert atomically in one SQL statement via INSERT...SELECT...WHERE EXISTS.

Confidence Score: 4/5

The timeout and lock changes are safe; open questions remain around the KB document creation path that were raised in previous review threads.

The db.ts and lifecycle.ts changes are straightforward and mechanically correct — session timeouts are set at startup, and the archival transaction correctly overrides them with SET LOCAL before any DML. The insertDocumentsIfKbAlive helper is a genuine improvement for combining the soft-delete guard and insert into one SQL statement. The two open issues from prior review threads (the soft-delete race window at READ COMMITTED isolation and the non-atomic updatedAt update) are still present in this revision and would benefit from being addressed before merging to a production path.

apps/sim/lib/knowledge/documents/service.ts — the document creation path still has the two open concerns from prior review threads worth re-examining before merge.

Important Files Changed

Filename Overview
packages/db/db.ts Adds session-level lock_timeout (5s) and statement_timeout (30s) via postgres.js connection startup params; values are correct integer milliseconds accepted by PostgreSQL.
apps/sim/lib/workspaces/lifecycle.ts Adds SET LOCAL statement_timeout='5min' and SET LOCAL lock_timeout='30s' at the top of the archival transaction to correctly override session defaults for this rare, long-running admin operation.
apps/sim/lib/knowledge/documents/service.ts Introduces insertDocumentsIfKbAlive using INSERT…SELECT…WHERE EXISTS to combine the soft-delete guard and insert into one statement; the updatedAt update remains outside any transaction (as flagged in existing review threads).
apps/sim/lib/workspaces/lifecycle.test.ts Adds execute: vi.fn().mockResolvedValue([]) to the transaction mock to handle the two new SET LOCAL calls; change is correct and minimal.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[createDocumentRecords / createSingleDocument] --> B{Upfront KB check\ndb.select WHERE deletedAt IS NULL}
    B -- KB not found --> C[throw 'Knowledge base not found']
    B -- KB alive --> D[Process tag data\nprocessDocumentTags]
    D --> E[Build NewDocumentRow array]
    E --> F[insertDocumentsIfKbAlive\nINSERT...SELECT...WHERE EXISTS\ndeleted_at IS NULL]
    F -- 0 rows returned\nKB soft-deleted in window --> G[throw 'Knowledge base not found']
    F -- N rows returned --> H[Log success]
    H --> I[db.update knowledgeBase\nset updatedAt - separate statement]
    I --> J[Return document data]
Loading

Reviews (2): Last reviewed commit: "fix(knowledge): close soft-delete TOCTOU..." | Re-trigger Greptile

Comment thread apps/sim/lib/knowledge/documents/service.ts Outdated
Comment thread apps/sim/lib/knowledge/documents/service.ts
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0ad209f. Configure here.

Comment thread apps/sim/lib/knowledge/documents/service.ts Outdated
Fix the race the bots flagged: KB delete is soft (`deletedAt = now`) so
the FK can't catch a concurrent KB delete between the existence check
and the document insert.

- Add `insertDocumentsIfKbAlive` helper that gates the insert on
  `EXISTS(SELECT 1 FROM knowledge_base WHERE id=$kb AND deleted_at IS NULL)`
  in the same statement via INSERT...SELECT...WHERE EXISTS. Atomic at the
  MVCC snapshot — no transaction, no row lock.
- Use jsonb_to_recordset to declare column types once, avoiding per-param
  casts for nullable columns.
- Wire into both `createDocumentRecords` (bulk) and `createSingleDocument`.
- Keep the upfront KB existence check as a fast-path early-out for the
  common case; the atomic insert is the race guard.
@TheodoreSpeaks
Copy link
Copy Markdown
Collaborator Author

@greptile review

@TheodoreSpeaks TheodoreSpeaks merged commit 4295a5c into staging May 14, 2026
13 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants