close
Skip to content

Replace pm2 with homegrown process manager daemon#2712

Merged
fredrikekelund merged 31 commits into
trunkfrom
stu-1349-replace-pm2-with-homegrown
Mar 10, 2026
Merged

Replace pm2 with homegrown process manager daemon#2712
fredrikekelund merged 31 commits into
trunkfrom
stu-1349-replace-pm2-with-homegrown

Conversation

@fredrikekelund
Copy link
Copy Markdown
Contributor

@fredrikekelund fredrikekelund commented Mar 5, 2026

Related issues

How AI was used in this PR

Codex wrote most of the code in this PR, starting from a detailed list of requirements I shared (based on our experience with pm2). I iterated on this over multiple days, both with Codex and by hand. I've touched all major files in the PR. process-manager-ipc.ts is a standout example – I made many manual edits there. I've reviewed all the code. The code I've spent the least time editing or reviewing is the tests.

Proposed Changes

Tip

I'm happy to pair for a review of this PR. I know it's a big one. Also, please note that this PR builds upon the changes in #2690. That PR should be reviewed first.

See STU-1349 for the "why" of removing pm2. This PR implements the following:

  • A process-manager-daemon.ts that manages child processes. This involves forking them, tracking them in-memory, forwarding child IPC messages as typed daemon events, and writing stdout/stderr to PM2-compatible log files under ~/.studio/pm2/logs. This file is the PM2 replacement.
  • daemon-client.ts replaces pm2-manager.ts. It talks to the daemon over a control socket for request/response commands and an events socket for receiving messages from child processes and process events (online/exit, etc.).

The architecture largely follows PM2's example, and process-manager-daemon.ts implements the same functionality. A few notes to get reviewers started:

  • Starting/stopping processes are relatively easy to understand from the code.
  • IPC is a little trickier, but the gist is that daemon-client exports a sendMessageToProcess function that uses the process manager "control socket" to send a message with a processId target. When process-manager-daemon.ts receives the message, it uses the standard Node.js IPC mechanism to forward it to the child process. Child processes cannot respond directly to these messages, but they can emit events through Node.js standard IPC, which process-manager-daemon.ts picks up and broadcasts on the "events socket". daemonBus listens on the events socket, and the subscribeSiteEvents function picks up the process-message and process-event events, which are ultimately picked up by _events.ts.

Testing Instructions

  1. Run npm start
  2. Ensure that starting and stopping sites works as expected
  3. Try creating a new site, too
  4. While Studio is still running, run node apps/cli/dist/cli/main.js site start --skip-browser --path PATH_TO_SITE in your terminal
  5. Ensure that the site is immediately marked as started in Studio

Pre-merge Checklist

  • Have you checked for TypeScript, React or other console errors?

@fredrikekelund fredrikekelund requested review from a team and bcotrim March 5, 2026 15:32
@fredrikekelund fredrikekelund self-assigned this Mar 5, 2026
Base automatically changed from stu-1301-replace-pm2-axon-with-homegrown to trunk March 9, 2026 07:27
@wpmobilebot
Copy link
Copy Markdown
Collaborator

wpmobilebot commented Mar 9, 2026

📊 Performance Test Results

Comparing c649e82 vs trunk

site-editor

Metric trunk c649e82 Diff Change
load 1752.00 ms 1805.00 ms +53.00 ms 🔴 3.0%

site-startup

Metric trunk c649e82 Diff Change
siteCreation 6116.00 ms 6087.00 ms -29.00 ms ⚪ 0.0%
siteStartup 3931.00 ms 3942.00 ms +11.00 ms ⚪ 0.0%

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)

Copy link
Copy Markdown
Contributor

@bcotrim bcotrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While testing this I found some weird behaviors:

  1. Killing a site process (kill pid) didn't reflect on Studio. Site was still running in the UI and stop operation didn't work (CLI worked fine). I could only replicate this once
  2. Doing Stop all from the Studio UI button, seems to break the UI. After doing stop all starting any other sites would not be reflected in the UI. I was able to replicate this consistently.

Added some minor comments to the code that aren't blockers, but we should look into these 2 issues.

} from 'cli/lib/types/wordpress-server-ipc';

// Zod schema for process descriptions
export const processDescriptionSchema = z.object( {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be more intentional about the status + pid options?
I believe an online process will always have a pid, and the optional scenario is only for stopped processes. We could use an union to be more intentional about this type, what do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! 👍 I'm all for well-defined types. I've updated the code accordingly.

Comment thread apps/cli/lib/socket.ts Outdated
const socket = await this.connect();
return this.sendAndReadFromSocket( socket, payload );
} );
this.queue = responsePromise.then(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is odd, I wonder if there's a better approach to it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's kind of an odd pattern. I also reacted to it previously and chatted with Codex about it.

Really, it's deceptively simple. It works because each send call reassigns the queue promise, so the result is a chain of promises. See https://jsfiddle.net/vjpdy7q4/1/ for a clearer proof of concept

import path from 'path';

export const PROCESS_MANAGER_HOME =
process.env.STUDIO_PROCESS_MANAGER_HOME ?? path.join( os.homedir(), '.studio', 'pm2' );
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you decided keeping the pm2 name for backwards compatibility reasons, but I wonder if we should just rename it and handle existing folder content in a migration script or similar?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you decided keeping the pm2 name for backwards compatibility reasons

Yep, that's right. Seems reasonable to change the path and remove all pm2 naming, but I'd rather do it in a follow-up PR. I've added STU-1390 to track

@fredrikekelund
Copy link
Copy Markdown
Contributor Author

Thanks for the review, @bcotrim! 🙏 I should've caught the "Stop all" thing sooner…

Killing a site process (kill pid) didn't reflect on Studio. Site was still running in the UI and stop operation didn't work (CLI worked fine). I could only replicate this once

I'll test this some more. There's no obvious reason why this would happen…

Doing Stop all from the Studio UI button, seems to break the UI. After doing stop all starting any other sites would not be reflected in the UI. I was able to replicate this consistently.

That was because SocketClient wouldn't reconnect when the underlying socket closed (which it did when the process manager daemon shut down and closed its eventsServer).

I've refactored socket.ts so the socket client automatically reconnects and verified that this has the desired effect in the app. I also took the opportunity to add an E2E test that clicks the Stop all button and then restarts a site. Seems like a useful thing to have in the test suite.

const processDescriptionSchemaStopped = processDescriptionSchemaBase.extend( {
status: z.literal( 'stopped' ),
} );
export const processDescriptionSchema = z.discriminatedUnion( 'status', [
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

Copy link
Copy Markdown
Contributor

@bcotrim bcotrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
I can no longer replicate any of the issues I mentioned, thanks for making the changes

@fredrikekelund fredrikekelund enabled auto-merge (squash) March 10, 2026 13:41
@fredrikekelund fredrikekelund merged commit 7d2b52f into trunk Mar 10, 2026
8 of 10 checks passed
@fredrikekelund fredrikekelund deleted the stu-1349-replace-pm2-with-homegrown branch March 10, 2026 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants