improvement(style-api): fix auth order, add size guard, extract pptx slide meta + skill fixes by waleedlatif1 · Pull Request #4524 · simstudioai/sim

waleedlatif1 · 2026-05-08T23:04:08Z

Summary

Fix auth order in style extraction route — session check now runs before parseRequest per CLAUDE.md mandate
Add 100 MB file size guard before downloading for style extraction (prevents OOM on large files)
Fix 422 error message to accurately cover all three formats (docx/pptx/pdf), not just binary OOXML
Fix contract schema drift: add pdf to format enum, make theme optional, add defaults/pageSize/fonts/slideCount/aspectRatio/background fields
Extract PPTX slide count, aspect ratio, and slide master background color from presentation.xml + slideMaster1.xml
Fix drawCircle size pitfall in PDF SKILL.md — size is the radius, not diameter (size: 40 = 40pt radius / 80pt wide)
Add ExternalHyperlink pattern to DOCX SKILL.md
Add slide.addNotes() speaker notes pattern to PPTX SKILL.md

Type of Change

Bug fix
Improvement

Testing

Tested manually — verified style extraction on real PPTX, DOCX, and PDF files; confirmed auth order, size guard, and new PPTX fields in response

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…ers, MIME guards, and 256 MB isolate limit

…dImage

…nt origin placement

…e/transformation

…parsing

- DOCX addImage: upfront width/height validation (matches PDF/PPTX pattern) - PDF embedImage: remove dead Buffer ternary; drop redundant size guard already enforced in getFileBase64 - isolated-vm-worker: add friendly MemoryLimitError branch in both execute paths so OOM produces a clear message instead of a raw V8 error

…ror is reachable Isolate OOM auto-disposes the isolate before throwing, meaning isDisposed is true on the way into the catch block. The previous ordering caused OOM to surface as AbortError ('Execution cancelled') instead of MemoryLimitError. Move the message-based OOM check above the isDisposed check in both executeCode and executeTask so the friendly message is actually shown.

PptxViewJS renders slides as canvas->JPEG which has known layout fidelity issues with complex overlapping shapes. The downloaded binary is always correct — add a note so users know to download for exact rendering.

…ts/BodyText, PDF support

…slide meta

vercel · 2026-05-08T23:04:12Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		May 8, 2026 11:23pm

cursor · 2026-05-08T23:04:18Z

PR Summary

Medium Risk
Expands style extraction to PDFs and adds deeper DOCX/PPTX parsing (inheritance, slide metadata), which can affect response shape and parsing robustness. Also raises isolated-vm memory limits and changes OOM error handling, impacting sandbox resource usage and failure modes.

Overview
Extends workspace file style extraction to support PDFs and returns a richer, format-specific style summary: optional OOXML themes for DOCX, DOCX default run properties + inherited paragraph styles, PPTX slide count/aspect ratio/master background, and PDF page size + embedded font families.

Hardens the style API/VFS path with earlier auth checks, clearer 422 errors, and a 100MB pre-download size limit to reduce OOM risk.

Improves sandbox execution stability by increasing isolated-vm memory limits to 256MB, surfacing a dedicated MemoryLimitError, and adding safer image helper utilities/size caps in the docx-generate, pptx-generate, and pdf-generate tasks.

^{Reviewed by Cursor Bugbot for commit 3310321. Configure here.}

greptile-apps · 2026-05-08T23:07:50Z

Greptile Summary

This PR hardens the style extraction API (auth order, 100 MB size guard, PDF support) and upgrades the three sandbox document generators with geometry constants, validated image helpers, and better error messages. The isolated-VM memory limit is also doubled to 256 MB with an explicit OOM error path.

Auth + size guard in route.ts: session check now precedes parseRequest and a 100 MB cap is enforced before downloading — both correct, but the same size guard is absent from the parallel VFS path in workspace-vfs.ts.
document-style.ts refactor: theme is now optional, DOCX style inheritance is resolved across basedOn chains, and PPTX slide count/aspect ratio/background are extracted from presentation.xml and slideMaster1.xml.
Sandbox task improvements: pptx-generate, docx-generate, and pdf-generate all gain validated addImage/drawImage helpers, 8 MB per-image caps, and page-geometry constants.

Confidence Score: 4/5

Safe to merge with the VFS size guard added — all other changes are straightforward improvements

The auth ordering fix, size guard, and PDF extraction in route.ts are all correct. The larger refactor in document-style.ts is well-structured. The main gap is that the 100 MB download guard was not applied to the WorkspaceVFS.read() path — the copilot-facing route — so a large file accessed there would still be fetched in full without any size check.

apps/sim/lib/copilot/vfs/workspace-vfs.ts — needs the same size check before fetchWorkspaceFileBuffer that was added to route.ts

Important Files Changed

Filename	Overview
apps/sim/app/api/workspaces/[id]/files/[fileId]/style/route.ts	Auth order fixed (session before parseRequest), 100 MB size guard added, PDF format supported — all correct
apps/sim/lib/copilot/vfs/workspace-vfs.ts	PDF extension added to style path, but the 100 MB size guard from route.ts is absent — oversized files are downloaded unconditionally
apps/sim/lib/copilot/vfs/document-style.ts	Large refactor: theme now optional, style inheritance resolution, PPTX slide meta extraction, PDF style via pdf-lib — logic is sound with a minor semicolon-font edge case
apps/sim/lib/api/contracts/workspace-files.ts	Schema updated to add pdf format, optional theme, and new pptx/pdf fields — aligns correctly with the updated DocumentStyleSummary interface
apps/sim/lib/execution/isolated-vm-worker.cjs	Memory limit doubled to 256 MB; OOM detection added before cancel guard — broad 'memory limit' substring match could misclassify user errors
apps/sim/sandbox-tasks/pptx-generate.ts	Default layout set to LAYOUT_16x9, geometry constants added, getFileBase64 hardened, addImage helper added
apps/sim/sandbox-tasks/docx-generate.ts	Page geometry constants added, getFileBase64 hardened, addImage helper returns a docx.ImageRun — logic is correct
apps/sim/sandbox-tasks/pdf-generate.ts	rgb/StandardFonts shortcuts added, embedImage hardened with MIME validation, drawImage helper added — all correct

Comments Outside Diff (1)

apps/sim/lib/execution/isolated-vm-worker.cjs, line 589-594 (link)

Broad OOM string match may swallow unrelated errors

err.message.includes('memory limit') is a substring check — any user-thrown error whose message contains those words would be caught here, silently replaced with MemoryLimitError, and the real stack trace discarded. Tightening the match or combining with err.name === 'RangeError' would reduce false-positive risk. The same pattern exists in executeTask around line 941.

_{Reviews (1): Last reviewed commit: "improvement(style-api): fix auth order, ..." | Re-trigger Greptile}

greptile-apps · 2026-05-08T23:07:54Z

+        if (rawExt !== 'docx' && rawExt !== 'pptx' && rawExt !== 'pdf') return null
+        const ext: 'docx' | 'pptx' | 'pdf' = rawExt
        const buffer = await fetchWorkspaceFileBuffer(record)
        const summary = await extractDocumentStyle(buffer, ext)


Missing size guard before buffer download

The API route correctly checks fileRecord.size > MAX_STYLE_FILE_BYTES before downloading, but the VFS path calls fetchWorkspaceFileBuffer(record) unconditionally. This means a copilot-triggered style read on a large PDF (or DOCX/PPTX) still fetches the entire file into memory, defeating the purpose of the guard added in this PR.

Suggested change

if (rawExt !== 'docx' && rawExt !== 'pptx' && rawExt !== 'pdf') return null

const ext: 'docx' | 'pptx' | 'pdf' = rawExt

const buffer = await fetchWorkspaceFileBuffer(record)

const summary = await extractDocumentStyle(buffer, ext)

const rawExt = record.name.split('.').pop()?.toLowerCase()

if (rawExt !== 'docx' && rawExt !== 'pptx' && rawExt !== 'pdf') return null

const ext: 'docx' | 'pptx' | 'pdf' = rawExt

const MAX_STYLE_FILE_BYTES = 100 * 1024 * 1024 // 100 MB

if (record.size > MAX_STYLE_FILE_BYTES) return null

const buffer = await fetchWorkspaceFileBuffer(record)

greptile-apps · 2026-05-08T23:07:58Z

+): { font?: string; themeFont?: string } {
+  const asciiLit = /\bw:ascii="([^"]+)"/.exec(fontAttrsXml)
+  // LibreOffice DOCX files may put a semicolon-separated fallback list in w:ascii — take the first
+  if (asciiLit) return { font: asciiLit[1].split(';')[0].trim() || asciiLit[1] }


Semicolon-only font name falls back to ";"

When w:ascii is ";" (malformed DOCX), split(';')[0].trim() returns "" which is falsy, so the expression falls back to asciiLit[1] which is ";". The resulting font field would propagate into the response as a font name. Using || undefined instead would silently drop the malformed value.

Suggested change

if (asciiLit) return { font: asciiLit[1].split(';')[0].trim() || asciiLit[1] }

if (asciiLit) return { font: asciiLit[1].split(';')[0].trim() || undefined }

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 3310321. Configure here.}

cursor · 2026-05-08T23:12:36Z

+      ...parent,
+      ...s,
+      fontSize: s.fontSize ?? parent.fontSize,
+      bold: s.bold ?? parent.bold,


Style inheritance loses explicit bold-off override

Medium Severity

The style parsing only ever sets bold: true or omits the property entirely. When a child style explicitly disables bold via <w:b w:val="0"/>, no bold property is stored. During resolveInheritance, s.bold ?? parent.bold then falls through to the parent's bold: true, making the child appear bold even though it explicitly opted out. The StyleRaw type allows boolean, but the parsing never sets bold: false, so the nullish coalescing cannot distinguish "not specified" from "explicitly off."

^{Reviewed by Cursor Bugbot for commit 3310321. Configure here.}

cursor · 2026-05-08T23:12:36Z

+  const slideCount = (sldIdLst.match(/<p:sldId\b/g) ?? []).length
+
+  // Slide size in EMU — 1 inch = 914400 EMU
+  const sldSzMatch = /<p:sldSz\b[^>]*\bcx="(\d+)"[^>]*\bcy="(\d+)"/.exec(xml)


Slide size regex assumes fixed attribute ordering

Low Severity

The regex /<p:sldSz\b[^>]*\bcx="(\d+)"[^>]*\bcy="(\d+)"/ requires cx to appear before cy in the XML attributes. XML attribute order is not guaranteed by the spec, so if a PPTX generator writes cy before cx, the regex won't match and aspectRatio silently defaults to 'custom' instead of the correct '16:9' or '4:3'.

^{Reviewed by Cursor Bugbot for commit 3310321. Configure here.}

… font suffix stripping

waleedlatif1 · 2026-05-09T00:04:44Z

Superseded by #4526, which includes all of these changes plus sandbox hardening, OOM error handling, docx __docxDocOptions support, PPTX finalize guard, bold detection fixes, and font suffix stripping — rebased cleanly onto latest staging.

waleedlatif1 added 15 commits May 7, 2026 18:30

improvement(sandbox): upgrade pptx/docx/pdf bootstrap with image help…

508edf6

…ers, MIME guards, and 256 MB isolate limit

fix(sandbox): strict MIME allowlist and nullish coalescing in docx ad…

9e38939

…dImage

fix(sandbox): validate required opts in pdf drawImage to prevent sile…

f3f1a45

…nt origin placement

fix(sandbox): throw on malformed data URI in docx addImage

86c6ba8

fix(sandbox): prevent opts from clobbering computed ImageRun data/typ…

918954a

…e/transformation

fix(sandbox): prevent opts from clobbering fetched data in pptx addImage

449ed27

fix(sandbox): validate required opts in pptx addImage

20c2341

fix(sandbox): remove silent image/png fallback in docx addImage MIME …

925d2ba

…parsing

fix(files): add preview accuracy disclaimer to pptx viewer

9d71d99

PptxViewJS renders slides as canvas->JPEG which has known layout fidelity issues with complex overlapping shapes. The downloaded binary is always correct — add a note so users know to download for exact rendering.

revert(files): remove approximate preview disclaimer from pptx viewer

f230ca8

improvement(style): expand style extraction — DOCX inheritance/defaul…

e149fab

…ts/BodyText, PDF support

improvement(style): remove noise from style extraction output

ced273f

improvement(style-api): fix auth order, add size guard, extract pptx …

3310321

…slide meta

greptile-apps Bot reviewed May 8, 2026

View reviewed changes

cursor Bot reviewed May 8, 2026

View reviewed changes

fix(sandbox): align docx/pptx/pdf task guards, fix bold detection and…

8d0ae09

… font suffix stripping

vercel Bot temporarily deployed to Preview May 8, 2026 23:23 Inactive

waleedlatif1 closed this May 9, 2026

waleedlatif1 deleted the fix/pptx branch May 9, 2026 22:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improvement(style-api): fix auth order, add size guard, extract pptx slide meta + skill fixes#4524

improvement(style-api): fix auth order, add size guard, extract pptx slide meta + skill fixes#4524
waleedlatif1 wants to merge 16 commits into
stagingfrom
fix/pptx

waleedlatif1 commented May 8, 2026

Uh oh!

vercel Bot commented May 8, 2026 •

edited

Loading

Uh oh!

cursor Bot commented May 8, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 8, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 8, 2026

Uh oh!

greptile-apps Bot May 8, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 8, 2026

Uh oh!

cursor Bot May 8, 2026

Uh oh!

waleedlatif1 commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	if (asciiLit) return { font: asciiLit[1].split(';')[0].trim() \|\| asciiLit[1] }
	if (asciiLit) return { font: asciiLit[1].split(';')[0].trim() \|\| undefined }

Conversation

waleedlatif1 commented May 8, 2026

Summary

Type of Change

Testing

Checklist

Uh oh!

vercel Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

greptile-apps Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 8, 2026

Choose a reason for hiding this comment

Style inheritance loses explicit bold-off override

Uh oh!

cursor Bot May 8, 2026

Choose a reason for hiding this comment

Slide size regex assumes fixed attribute ordering

Uh oh!

waleedlatif1 commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented May 8, 2026 •

edited

Loading

cursor Bot commented May 8, 2026 •

edited

Loading

greptile-apps Bot commented May 8, 2026 •

edited

Loading