Skip to content

API key rotation does not invalidate auth cache — old keys remain valid for up to 2h #881

@stepanic

Description

@stepanic

Summary

Rotating a user API key via POST /v1/users/api-keys does not invalidate the ristretto in-memory cache, so the old (rotated) API key continues to authenticate requests for up to 2 hours (until the cache TTL expires) or until the API container is restarted.

This means a leaked user API key cannot be effectively revoked through the standard rotation flow — an operator must additionally restart the API process.

Reproduction

Self-hosted Docker deployment, single API container, fresh-rotated key.

API_KEY_OLD="uk_..."   # current key on the user
NEW_KEY=""             # to be filled in step 3

# 1) Confirm current key works.
curl -sS -H "x-api-key: $API_KEY_OLD" https://<host>/v1/users/me
#  → HTTP 200, returns user JSON

# 2) Rotate via web UI (Settings → "Regenerate API Key") OR API:
#    POST /v1/users/api-keys                       (handler `user_handler.go:266`)

# 3) Re-test the OLD key. Expected: 401. Actual: 200.
curl -sS -H "x-api-key: $API_KEY_OLD" https://<host>/v1/users/me
#  → HTTP 200, payload's `data.api_key` field reflects the NEW value but the
#    OLD key still authenticates.

# 4) Restart API container, repeat test.
#  → HTTP 401 (correct) — confirming the issue is purely in-memory cache.

We observed the old key continuing to work for the full duration we tested (~30 min, didn't wait for the 2h TTL); a colleague's earlier rotation only stopped working after a separate container restart.

Expected behaviour

After RotateAPIKey() returns, the previous API key should immediately fail authentication (HTTP 401), regardless of cache state.

Root cause

api/pkg/repositories/gorm_user_repository.go:

  • LoadAuthContext (lines 152–183) caches the apiKey → AuthContext mapping with a 2-hour TTL in ristretto.
  • RotateAPIKey (lines 58–82) updates the users.api_key column in the DB but never calls repository.cache.Del(...) or cache.Clear(). The cache entry for the old key continues to serve hits until natural TTL expiration.

For comparison, gorm_phone_api_key_repository.go correctly handles invalidation (cache.Clear() on store/delete at lines 60, 203 and cache.Del(phoneAPIKey.APIKey) on update at line 159). The user API key path is just missing the equivalent.

Suggested fix

Two options, in increasing order of precision:

Option A — minimal, matches existing phone-api-key pattern

Add repository.cache.Clear() to RotateAPIKey (and ideally to any other path that mutates users.api_key, e.g. Store at line 217). This mirrors what gorm_phone_api_key_repository.Store/Delete already do.

func (repository *gormUserRepository) RotateAPIKey(ctx context.Context, userID entities.UserID) (*entities.User, error) {
    // ... existing impl ...

    if err := tx.WithContext(ctx).Model(user).
        Clauses(clause.Returning{}).
        Where("id = ?", userID).
        Update("api_key", "uk_"+apiKey).Error; err != nil {
        return nil, ...
    }

    // Invalidate cache so the previous api_key stops authenticating.
    repository.cache.Clear()

    return user, nil
}

Option B — surgical, only delete the old entry

Load the user first (or have the caller pass it in), capture the old api_key, then cache.Del(oldKey) before the DB update. Slightly more code but avoids flushing unrelated cached users.

var oldKey string
err := crdbgorm.ExecuteTx(ctx, repository.db, nil, func(tx *gorm.DB) error {
    if err := tx.Model(user).Where("id = ?", userID).First(user).Error; err != nil {
        return err
    }
    oldKey = user.APIKey
    return tx.Model(user).Clauses(clause.Returning{}).
        Where("id = ?", userID).
        Update("api_key", "uk_"+apiKey).Error
})
if err == nil && oldKey != "" {
    repository.cache.Del(oldKey)
}

Happy to send a PR for either approach if it would help — let me know which you prefer.

Security impact

For self-hosted single-instance deployments, this means a leaked API key remains valid for up to 2 hours after rotation. For multi-instance deployments behind a load balancer (each instance has its own ristretto cache), the window is the same per instance and rotation visibility depends on the request hitting an instance that still has the old key cached.

Mitigation operators can apply today: restart the API process(es) after rotating an exposed key.

Environment

  • httpsms commit aa8bb70a (squashed snapshot, current main at the time of testing)
  • Single API container deployed on Coolify (Oracle Cloud, ARM64)
  • Postgres 16 + Redis 7

Thanks for the project — it's been a great base for building a self-hosted SMS gateway.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions