Where `git status` Time Actually Goes: a Microbenchmark

We profiled `git status` on a 250k-file monorepo across macOS, Linux, and Windows. Here's where the milliseconds end up, and why fsmonitor wins by a factor of roughly 230 to 330.

Dipankar Sarkar, June 8, 2026

#benchmark
#git
#performance
#profiling

A while ago I wanted to know exactly where git status spends its time in a large repo. “Walking the tree” is the cliché answer, but I wanted numbers. Here’s what I found.

Test setup

Monorepo with 248,000 tracked files, 32,000 directories.
Three machines: M2 MacBook Pro (macOS 14, APFS), Ryzen 7950X (Ubuntu 24.04, ext4), Surface Studio (Windows 11, NTFS).
Git 2.46. feature.manyFiles=true. No fsmonitor (baseline).
Each measurement is the median of 10 runs, taken after a git update-index --refresh to ensure the index is clean.
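The measurement procedure above can be approximated with a small Python harness. This is a minimal sketch, not the script used for this post: `median_wall_ms` is a hypothetical helper, and it only times warm runs. Producing cold-cache numbers would also require dropping the OS file caches between runs (for example, writing to `/proc/sys/vm/drop_caches` as root on Linux), which it does not do.

```python
import statistics
import subprocess
import time
from typing import Optional, Sequence


def median_wall_ms(
    cmd: Sequence[str],
    runs: int = 10,
    cwd: Optional[str] = None,
    setup: Optional[Sequence[str]] = None,
) -> float:
    """Run `cmd` `runs` times and return the median wall-clock time in ms.

    `setup` (e.g. ["git", "update-index", "--refresh"]) runs before every
    timed run and is not included in the measurement.
    """
    samples = []
    for _ in range(runs):
        if setup is not None:
            # Exit status ignored on purpose: `git update-index --refresh`
            # can exit non-zero when some entries need updating.
            subprocess.run(
                setup, cwd=cwd,
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
            )
        start = time.perf_counter()
        subprocess.run(cmd, cwd=cwd, stdout=subprocess.DEVNULL, check=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, `median_wall_ms(["git", "status", "--porcelain"], cwd="path/to/repo", setup=["git", "update-index", "--refresh"])` returns the median of 10 runs. Using the median rather than the mean keeps a single slow outlier from skewing the result.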

Wall-clock baseline (cold cache)

Operation	macOS	Linux	Windows
`git status`	7,950 ms	4,180 ms	11,200 ms
`git status --porcelain`	7,820 ms	4,090 ms	11,050 ms

Linux ext4 is the fastest baseline in absolute terms; Windows NTFS is the slowest, taking about 2.7× as long as Linux. This ordering roughly tracks lstat() throughput on each platform.
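To put numbers on "tracks lstat() throughput", the sketch below divides the tracked-file count by each cold wall-clock time. The whole command takes at least as long as its lstat() calls, so, assuming one lstat() per tracked file, the result is a lower bound on each platform's effective lstat() rate rather than a direct measurement of it.

```python
# Cold-cache baseline from the table above.
TRACKED_FILES = 248_000
BASELINE_MS = {"macOS": 7_950, "Linux": 4_180, "Windows": 11_200}


def files_per_second(files: int, wall_ms: float) -> float:
    """Tracked files processed per second, averaged over the whole command."""
    return files / (wall_ms / 1000.0)


rates = {name: files_per_second(TRACKED_FILES, ms) for name, ms in BASELINE_MS.items()}
fastest_ms = min(BASELINE_MS.values())  # Linux in this data set

for name, ms in BASELINE_MS.items():
    print(f"{name:8} {rates[name]:>7,.0f} files/s  {ms / fastest_ms:.2f}x the fastest time")
```

With these inputs it reports about 31,195 files/s for macOS, 59,330 for Linux and 22,143 for Windows, and puts Windows at 2.68× the Linux time.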

Where does the time go?

Using perf on Linux and dtruss on macOS, I broke down the warm baseline (cache populated, fewer disk reads):

Cost (share of warm wall-clock time)	Linux	macOS
`lstat()` on tracked files	78%	84%
`readdir()` for untracked-file scan	11%	6%
Index parse + comparisons	6%	4%
Hashing (SHA-1 on modified files)	3%	4%
Output formatting	<1%	<1%
Process startup, dynamic linker	1%	2%

The walk dominates: lstat() on tracked files plus the readdir() scan for untracked files account for 89–90% of the time, and no other item exceeds 6%. This is why fsmonitor wins so completely: it eliminates the dominant cost almost entirely.
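The shares in the breakdown table can be regrouped to show how much of the time belongs to the tree walk. In this sketch, "walk" is taken to mean the lstat() and readdir() rows together, and the "<1%" output-formatting row is entered as 0.5%, which is an assumption, so each column sums to roughly but not exactly 100%.

```python
# Warm-baseline breakdown from the table above, in percent of wall-clock time.
# Output formatting ("<1%") is entered as 0.5 by assumption.
BREAKDOWN = {
    "Linux": {"lstat": 78, "readdir": 11, "index": 6, "hashing": 3, "output": 0.5, "startup": 1},
    "macOS": {"lstat": 84, "readdir": 6, "index": 4, "hashing": 4, "output": 0.5, "startup": 2},
}
WALK_ROWS = ("lstat", "readdir")


def walk_share(shares: dict) -> float:
    """Percent of time spent walking the working tree (lstat + readdir)."""
    return sum(shares[row] for row in WALK_ROWS)


for platform, shares in BREAKDOWN.items():
    walk = walk_share(shares)
    largest_other = max(v for k, v in shares.items() if k not in WALK_ROWS)
    print(f"{platform}: walk {walk}%, largest other item {largest_other}%")
```

It prints a walk share of 89% for Linux and 90% for macOS, with no other item above 6%.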

With fsmonitor

Same machines, same repo, with gity registered as the fsmonitor:

Operation	macOS	Linux	Windows
`git status` (warm)	26 ms	18 ms	34 ms
`git status` (after touching 1 file)	31 ms	24 ms	42 ms
`git status` (after touching 100 files)	48 ms	38 ms	71 ms

Speedup of warm fsmonitor runs over the cold-cache baseline: ~300× on macOS, ~230× on Linux, ~330× on Windows.
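The speedups are plain ratios of the two tables. As in the text, this sketch divides the cold-cache baseline by the warm fsmonitor time, so it mixes two cache states; a warm-versus-warm comparison would likely give smaller ratios.

```python
BASELINE_COLD_MS = {"macOS": 7_950, "Linux": 4_180, "Windows": 11_200}  # no fsmonitor
FSMONITOR_WARM_MS = {"macOS": 26, "Linux": 18, "Windows": 34}            # gity, warm

speedups = {p: BASELINE_COLD_MS[p] / FSMONITOR_WARM_MS[p] for p in BASELINE_COLD_MS}

for platform, factor in speedups.items():
    print(f"{platform}: {factor:.0f}x")
```

It prints 306×, 232× and 329×, which the text rounds to ~300×, ~230× and ~330×.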

The fixed cost is roughly 20–35 ms: process startup, the IPC round-trip, and the few necessary lstat() calls on files that actually changed. Above that, the cost grows roughly linearly with the number of changed files (about 0.14–0.29 ms per extra file between the 1-file and 100-file runs), not with the size of the repo.
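A rough fixed-plus-per-file model can be read off the fsmonitor table. This sketch assumes the "warm" row means no files were touched, and it estimates the per-file cost from only two points (1 and 100 touched files), so it is a back-of-the-envelope slope, not a fitted model.

```python
# Warm fsmonitor timings in ms, keyed by number of touched files.
# Treating the "warm" row as 0 touched files is an assumption.
TIMINGS_MS = {
    "macOS": {0: 26, 1: 31, 100: 48},
    "Linux": {0: 18, 1: 24, 100: 38},
    "Windows": {0: 34, 1: 42, 100: 71},
}


def per_file_ms(t: dict) -> float:
    """Slope between the 1-file and 100-file measurements."""
    return (t[100] - t[1]) / (100 - 1)


def predict_ms(t: dict, changed_files: int) -> float:
    """Linear extrapolation from the 1-file point (illustrative only)."""
    return t[1] + per_file_ms(t) * (changed_files - 1)


for platform, t in TIMINGS_MS.items():
    print(f"{platform}: fixed ~{t[0]} ms, ~{per_file_ms(t):.2f} ms per extra changed file")
```

The slopes come out at about 0.17 ms (macOS), 0.14 ms (Linux) and 0.29 ms (Windows) per file. Extrapolating well past 100 changed files with `predict_ms` is a guess, since nothing beyond that was measured.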

What this means in practice

For developers:

The size of your repo barely matters once fsmonitor is on. A 1-million-file repo with 3 changed files should run about as fast as a 50k-file repo with 3 changed files.
The platform you’re on barely matters either. Windows is no longer ~2.7× slower than Linux; it trails by roughly 15–35 ms.
IDE polling becomes essentially free. Polling every second costs about 20–35 ms per poll, not the 4–11 seconds of the baseline.
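The polling claim comes down to simple arithmetic: the fraction of each polling interval spent inside `git status`. The one-second interval is the text's example, and the times used here are wall-clock figures from the tables, not CPU time.

```python
POLL_INTERVAL_MS = 1_000  # IDE polls once per second (the example in the text)
FSMONITOR_WARM_MS = {"macOS": 26, "Linux": 18, "Windows": 34}
BASELINE_COLD_MS = {"macOS": 7_950, "Linux": 4_180, "Windows": 11_200}


def busy_fraction(call_ms: float, interval_ms: float = POLL_INTERVAL_MS) -> float:
    """Share of each polling interval spent in git status, capped at 100%."""
    return min(call_ms / interval_ms, 1.0)


for p in FSMONITOR_WARM_MS:
    print(
        f"{p}: {busy_fraction(FSMONITOR_WARM_MS[p]):.1%} with fsmonitor, "
        f"{busy_fraction(BASELINE_COLD_MS[p]):.0%} without"
    )
```

With fsmonitor the poller is busy 1.8–3.4% of the time; without it, each call outlasts the interval, so a one-second poll could not keep up at all.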

For CI:

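A 250k-file monorepo with three git status calls per job (common in incremental-build pipelines) saves roughly 12–34 seconds per run, depending on platform: three calls at the 4.2–11.2 s baseline versus a few tens of milliseconds each. Over thousands of runs per day, this is real money.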
Cold-start latency (the first call after a reboot or fresh container start) is higher, about 50 ms, because the daemon’s cache has to be primed. gity daemon oneshot includes a quick priming step.
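The CI saving follows from the two tables. This sketch assumes three calls per job, a first call that pays the ~50 ms cold start mentioned above, warm fsmonitor times for the remaining calls, and the cold-cache baseline (a fresh container has no warm cache). `RUNS_PER_DAY` is a hypothetical figure standing in for "thousands of runs per day".

```python
CALLS_PER_JOB = 3
COLD_START_MS = 50    # first fsmonitor-backed call in a fresh container
RUNS_PER_DAY = 2_000  # hypothetical
BASELINE_COLD_MS = {"macOS": 7_950, "Linux": 4_180, "Windows": 11_200}
FSMONITOR_WARM_MS = {"macOS": 26, "Linux": 18, "Windows": 34}


def saved_seconds_per_job(baseline_ms: float, warm_ms: float,
                          calls: int = CALLS_PER_JOB,
                          cold_start_ms: float = COLD_START_MS) -> float:
    """Seconds saved per job when every baseline call goes through fsmonitor."""
    without = calls * baseline_ms
    with_fsmonitor = cold_start_ms + (calls - 1) * warm_ms
    return (without - with_fsmonitor) / 1000.0


for p in BASELINE_COLD_MS:
    per_job = saved_seconds_per_job(BASELINE_COLD_MS[p], FSMONITOR_WARM_MS[p])
    per_day_h = per_job * RUNS_PER_DAY / 3600
    print(f"{p}: {per_job:.1f} s per job, {per_day_h:.1f} h per day")
```

With these inputs the saving is about 12.5 s per job on Linux, 23.7 s on macOS and 33.5 s on Windows.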

The bottleneck that remains

Once git status is fast, the next bottleneck depends on your workflow:

git fetch: dominated by network and object-walk overhead on the remote. Mitigation: partial clone (--filter=blob:none), background prefetch.
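git diff with hashing: SHA-1 (or SHA-256, if your repository uses the SHA-256 object format) on each changed file. Negligible for small changes; visible when you compare a megabyte-scale binary file.
git log over deep history: without a commit-graph, Git must load and parse every commit it walks, and topo-ordered output such as --graph walks the full history before printing anything. The commit-graph file stores parsed commit metadata plus generation numbers, which cuts that cost sharply; it is the single best optimization for log-heavy workflows.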
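The mitigations named in the list map onto standard Git commands. The Python sketch below only assembles and runs them; the helper names are hypothetical, `fetch.writeCommitGraph` and `--changed-paths` go slightly beyond what the text mentions, and `git maintenance start` relies on an OS scheduler, so it may not work inside a CI container.

```python
import subprocess
from typing import List


def partial_clone_command(url: str, dest: str) -> List[str]:
    """Blobless partial clone: file contents are fetched lazily on demand."""
    return ["git", "clone", "--filter=blob:none", url, dest]


def mitigation_commands() -> List[List[str]]:
    """Commands to run inside an existing clone."""
    return [
        # Write a commit-graph for all reachable commits; --changed-paths
        # also stores Bloom filters that speed up `git log -- <path>`.
        ["git", "commit-graph", "write", "--reachable", "--changed-paths"],
        # Write a commit-graph after each fetch that downloads a pack.
        ["git", "config", "fetch.writeCommitGraph", "true"],
        # Register the repo for scheduled background maintenance,
        # which includes a prefetch task.
        ["git", "maintenance", "start"],
    ]


def apply_mitigations(repo_dir: str) -> None:
    """Run each mitigation command in `repo_dir`, stopping on the first failure."""
    for cmd in mitigation_commands():
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Partial clone only applies when creating a new clone, which is why it is kept separate from the commands that run in an existing one.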

git status, post-fsmonitor, is no longer in the top-five list of bottlenecks. That’s the goal.

Methodology notes

A few caveats so you can reproduce this:

I excluded git status runs immediately after git checkout from the warm medians. Checkout changes the mtimes of a large fraction of files, which gives fsmonitor work to do and inflates apparent latency.
I used gity v0.1.2 for the fsmonitor numbers. Watchman is within 10% (slightly slower due to Perl helper overhead). Git’s built-in daemon is within 5% (slightly slower than gity at high call frequency due to per-call allocation).
The repo I tested on is a real monorepo (anonymized for this post) from one of my client engagements. Numbers may differ on your repo, especially if your tree is unusually deep or wide. The gity demo command gives you numbers for your own machine and repo in about a minute.

Try it yourself:

```shell
cargo install gity
cd ~/work/your-largest-repo
gity demo
```

The included demo races vanilla Git against gity in a TUI and prints both wall-clock and speedup numbers when it finishes.

Frequently asked questions

What's the largest cost inside `git status`?

The working-tree walk. On a 250k-file repo, 78–84% of a warm-cache `git status` is `lstat()` calls on every tracked file, and less than 10% is index parsing, hashing, and output formatting combined. fsmonitor cuts the walk to a few files, eliminating the dominant cost.
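A toy model makes the difference concrete: without fsmonitor every tracked path gets an lstat(); with it, only paths the watcher reported are checked. This is a simplification with hypothetical names. Git actually keeps a token plus per-entry validity information in the index and asks the hook what changed since that token, and untracked files are still found by a separate directory scan.

```python
from typing import List, Optional, Sequence, Set


def paths_to_lstat(tracked: Sequence[str], reported: Optional[Set[str]]) -> List[str]:
    """Which tracked paths need an lstat() in this toy model.

    reported=None models no fsmonitor: every tracked path is checked.
    Otherwise only tracked paths the watcher reported as changed are checked.
    """
    if reported is None:
        return list(tracked)
    # Still one pass over the index, but no syscall for unreported entries.
    return [path for path in tracked if path in reported]


tracked = [f"src/file{i}.c" for i in range(250_000)]
reported = {"src/file7.c", "src/file42.c", "docs/new.md"}  # new.md is untracked

print(len(paths_to_lstat(tracked, None)))      # 250000
print(len(paths_to_lstat(tracked, reported)))  # 2
```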

Does platform matter?

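Yes, but less than you'd think. macOS APFS, Linux ext4, and Windows NTFS all run at roughly 30,000–80,000 `lstat()` per second. Without fsmonitor, wall-clock differences largely track that stat throughput; with fsmonitor, the remaining gaps come more from the efficiency of each platform's change-notification API (inotify, FSEvents, ReadDirectoryChangesW) than from raw stat throughput.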

Why doesn't `git status` use multi-threading?

It does, in part: the `lstat()` pass over tracked files is multi-threaded with `core.preloadIndex`. But you still pay the syscall cost per file, and threading mostly helps you saturate the kernel rather than reduce total work.
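The point about threads can be illustrated with a small Python sketch. It is not Git's implementation, which is C code behind `core.preloadIndex`. Spreading lstat() calls over a thread pool lets them overlap, but the number of system calls stays equal to the number of paths. `lstat_all` is a hypothetical helper; CPython releases the GIL while `os.lstat` is in the system call, so the threads do overlap their syscalls.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def lstat_all(paths: Sequence[str], threads: int = 1) -> List[os.stat_result]:
    """lstat() every path, optionally across a pool of worker threads.

    Either way exactly len(paths) lstat() calls are made; threads only let
    them run concurrently, so the total work is unchanged.
    """
    if threads <= 1:
        return [os.lstat(p) for p in paths]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so results line up with `paths`.
        return list(pool.map(os.lstat, paths))
```

Timing both variants on a large directory tree is an easy way to see how much concurrency helps on a given filesystem; the syscall count, and therefore the total kernel work, is the same in both.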