health

package module
v6.0.2 Latest
Published: Jan 29, 2026 License: MIT Imports: 17 Imported by: 0

README

health-go

A production-ready library for adding health checks to Go services with advanced features for reliability and observability.

What's New in v6

v6 is a major modernization release focusing on stability, performance, and Go 1.25+ compatibility.

Breaking Changes
  • Go 1.25+ required - Takes advantage of modern Go features
  • Module path changed - github.com/bretep/health-go/v6
  • Redis client updated - Uses github.com/redis/go-redis/v9 (renamed from go-redis/redis)
Stability Improvements
  • Fixed data races - All race conditions detected by -race have been resolved
  • Fixed goroutine leaks - Check goroutines now properly terminate on pause/shutdown
  • Fixed copy-lock issues - Mutex-containing structs are now properly handled
  • Thread-safe status updates - Added proper synchronization to StatusUpdater
Performance Improvements
  • Buffered channels - Prevents goroutine blocking during health checks
  • Optimized event tracking - Uses maps.Clone() for efficient map copying
  • Reduced allocations - Uses clear() builtin instead of map reallocation
  • Modern random number generation - Uses math/rand/v2 for better performance
Code Quality
  • Comprehensive linting - Passes golangci-lint with strict configuration
  • Race-tested - All tests pass with -race flag
  • 77% test coverage - Extensive integration tests with real services
  • Modern Go idioms - Uses cmp.Or(), slices package, for range N syntax
Dependency Updates
| Dependency | Version |
|---|---|
| OpenTelemetry | v1.39.0 |
| gRPC | v1.78.0 |
| Redis client | v9.17.3 |
| MongoDB driver | v1.17.7 |
| MySQL driver | v1.9.3 |
| Cassandra (gocql) | v1.7.0 |
| RabbitMQ (amqp091-go) | v1.10.0 |
| pgx/v5 | v5.8.0 |
| NATS | v1.48.0 |
| InfluxDB client | v2.14.0 |
| SQLite (modernc.org) | v1.44.3 |
| testify | v1.11.1 |

Features

  • HTTP Handler - Exposes health status via HTTP endpoints compatible with net/http
  • Background Health Checks - Checks run asynchronously on intervals to prevent DoS to backend services
  • Status Thresholds - Configure successes/failures required before status changes (debouncing)
  • Notification System - Subscribe to health status changes with configurable notifiers
  • Action Runners - Execute shell commands automatically on status changes
  • Event Tracking - Correlate related alerts during incidents with event IDs and sequences
  • Maintenance Mode - Group failures during maintenance windows under a single event
  • State Persistence - Retain health check state across process restarts (SQLite or custom)
  • Pause/Resume - Dynamically pause and resume individual health checks
  • OpenTelemetry Support - Built-in tracing support
Built-in Checkers
| Checker | Description |
|---|---|
| Cassandra | Apache Cassandra connectivity |
| gRPC | gRPC health checking protocol |
| HTTP | HTTP endpoint availability |
| InfluxDB | InfluxDB v1.x connectivity |
| Maintenance | File-based maintenance mode |
| Memcached | Memcached connectivity |
| MongoDB | MongoDB connectivity and ping |
| MySQL | MySQL/MariaDB connectivity |
| NATS | NATS messaging connectivity |
| PostgreSQL | PostgreSQL via lib/pq |
| pgx/v4 | PostgreSQL via pgx v4 |
| pgx/v5 | PostgreSQL via pgx v5 |
| RabbitMQ | RabbitMQ connectivity and publish/consume |
| Redis | Redis connectivity and ping |

Why Use This Library?

Writing a health check endpoint seems simple—until you need it to be production-ready. Here's what this library handles that you'd otherwise build yourself:

The Naive Approach Breaks Under Load

A simple health check that queries your database on every request creates problems:

// DON'T DO THIS - causes cascading failures
func healthHandler(w http.ResponseWriter, r *http.Request) {
    if err := db.Ping(); err != nil {
        w.WriteHeader(503)
        return
    }
    w.WriteHeader(200)
}

When your service is under load or your database is struggling, every health check request adds more pressure. Load balancers checking health every few seconds across multiple instances can turn a slow database into an outage.

This library runs checks in the background on intervals, serving cached status to HTTP requests. Your database gets checked once every 30 seconds, not once per health check request.
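
The idea can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `cachedCheck`, `refresh`, `run`, `demoCached`, and `errDown` are hypothetical names. A background loop refreshes a cached result, and the HTTP handler only ever reads the cache.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// errDown is a hypothetical probe failure used by the demo.
var errDown = errors.New("db down")

// cachedCheck stores the most recent probe result so that HTTP requests
// never trigger the probe themselves.
type cachedCheck struct {
	mu      sync.RWMutex
	lastErr error
	probe   func() error
}

// refresh runs the probe once and caches the outcome.
func (c *cachedCheck) refresh() {
	err := c.probe()
	c.mu.Lock()
	c.lastErr = err
	c.mu.Unlock()
}

// run refreshes on every tick until stop is closed.
func (c *cachedCheck) run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.refresh()
		case <-stop:
			return
		}
	}
}

// ServeHTTP answers from the cache only.
func (c *cachedCheck) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	c.mu.RLock()
	err := c.lastErr
	c.mu.RUnlock()
	if err != nil {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// demoCached serves n requests after a single refresh and reports how many
// times the probe ran and the status code of the last response.
func demoCached(n int, probeErr error) (probes, code int) {
	c := &cachedCheck{}
	c.probe = func() error { probes++; return probeErr }
	c.refresh()
	for i := 0; i < n; i++ {
		rec := httptest.NewRecorder()
		c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
		code = rec.Code
	}
	return probes, code
}

func main() {
	fmt.Println(demoCached(1000, nil))     // prints: 1 200
	fmt.Println(demoCached(1000, errDown)) // prints: 1 503
}
```

In a real service, `run` would be started in its own goroutine with the check interval. The demo calls `refresh` directly so the probe count is deterministic.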

Flapping Checks Create Alert Fatigue

A database that's slow for one check shouldn't page your on-call engineer at 3 AM. But a database that's been failing for 30 seconds should.

Status thresholds let you require multiple consecutive failures before changing status, and multiple consecutive successes before recovering. This eliminates noise from transient issues.

Incident Correlation is Hard

When multiple services fail during a database outage, you get flooded with alerts. Correlating them manually wastes time during incidents.

Event tracking automatically assigns the same event ID to related failures. When you enter maintenance mode, all failures during that window share an event ID, making it trivial to group and suppress related alerts.

Recovery Actions Need Coordination

You might want to run a script when a check fails—but not every time it fails. Running a recovery script 100 times during a 5-minute outage makes things worse.

Action runners have built-in cooldowns and can be configured to run only on state transitions, not on every failed check.
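
A cooldown of this kind can be sketched as a small gate that remembers when the action last ran. This is an illustration under assumptions, not the library's code: `cooldownGate`, `allow`, and `countRuns` are hypothetical names, and timestamps are synthetic so the result does not depend on the wall clock.

```go
package main

import (
	"fmt"
	"time"
)

// cooldownGate allows an action at most once per cooldown window.
type cooldownGate struct {
	cooldown time.Duration
	lastRun  time.Time
}

// allow reports whether the action may run at time now and, if so,
// records now as the last run.
func (g *cooldownGate) allow(now time.Time) bool {
	if !g.lastRun.IsZero() && now.Sub(g.lastRun) < g.cooldown {
		return false
	}
	g.lastRun = now
	return true
}

// countRuns simulates one failed check every stepSeconds for totalSeconds
// and returns how many times the action would actually run.
// stepSeconds must be positive.
func countRuns(cooldownSeconds, stepSeconds, totalSeconds int) int {
	g := &cooldownGate{cooldown: time.Duration(cooldownSeconds) * time.Second}
	start := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	runs := 0
	for t := 0; t < totalSeconds; t += stepSeconds {
		if g.allow(start.Add(time.Duration(t) * time.Second)) {
			runs++
		}
	}
	return runs
}

func main() {
	// 100 failed checks over 5 minutes with a 5-minute cooldown.
	fmt.Println(countRuns(300, 3, 300)) // prints: 1
	// The same outage with a 1-minute cooldown.
	fmt.Println(countRuns(60, 3, 300)) // prints: 5
}
```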

What You Get
| Concern | DIY Effort | This Library |
|---|---|---|
| Background checks | Goroutines, timers, synchronization | Built-in |
| Debouncing/thresholds | Counter logic, state machines | Configuration |
| Notification routing | Channel management, fan-out | Subscribe once |
| Alert correlation | UUID generation, state tracking | Automatic |
| Maintenance windows | Flag management, conditional logic | Name a check "maintenance" |
| Graceful degradation | Circuit breaker patterns | SkipOnErr: true |
| Concurrent check limits | Semaphores, worker pools | WithMaxConcurrent(n) |
| Observability | Manual instrumentation | OpenTelemetry built-in |

The library is ~1000 lines of tested, production-hardened code. Writing it yourself means debugging race conditions, edge cases in state transitions, and notification delivery—time better spent on your actual product.

Installation

go get github.com/bretep/health-go/v6

Requirements: Go 1.25 or later

Quick Start

package main

import (
	"log"
	"net/http"
	"time"

	"github.com/bretep/health-go/v6"
	"github.com/bretep/health-go/v6/checks/maintenance"
	healthMysql "github.com/bretep/health-go/v6/checks/mysql"
)

func main() {
	h, err := health.New(
		health.WithComponent(health.Component{
			Name:    "myservice",
			Version: "v1.0",
		}),
		health.WithSystemInfo(),
	)
	if err != nil {
		log.Fatalf("Failed to create health checker: %v", err)
	}

	// Maintenance mode - create file to enter, remove to exit
	// Use a persistent path (not /tmp) so maintenance survives reboots
	h.Register(health.CheckConfig{
		Name:                   "maintenance",
		Interval:               time.Second, // file check is cheap
		SuccessesBeforePassing: 1,           // exit maintenance immediately
		Check: maintenance.New(maintenance.Config{
			File:   "/var/lib/myservice/maintenance",
			Health: h,
		}),
	})

	h.Register(health.CheckConfig{
		Name:     "mysql",
		Timeout:  time.Second * 2,
		Interval: time.Second * 30,
		Check: healthMysql.New(healthMysql.Config{
			DSN: "user:pass@tcp(localhost:3306)/db",
		}),
	})

	http.Handle("/health", h.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}

Configuration

CheckConfig Options
type CheckConfig struct {
	// Name is the name of the resource to be checked (required)
	Name string

	// Check is the function that performs the health check (required)
	Check CheckFunc

	// Interval is how often the check runs (default: 10s, minimum: 1s)
	Interval time.Duration

	// Timeout for each check execution (default: 2s)
	Timeout time.Duration

	// SkipOnErr returns Warning instead of Critical on failure
	SkipOnErr bool

	// Status thresholds for debouncing
	SuccessesBeforePassing int  // default: 3
	FailuresBeforeWarning  int  // default: 1
	FailuresBeforeCritical int  // default: 1

	// Actions to run on status changes
	SuccessAction *Action
	WarningAction *Action
	FailureAction *Action
	TimeoutAction *Action

	// Notifiers to use for this check's notifications
	Notifiers []string
}
Status States
| Status | HTTP Code | Description |
|---|---|---|
| passing | 200 | All checks healthy |
| warning | 429 | Check failed but SkipOnErr is true |
| critical | 503 | Check failed |
| timeout | 503 | Check exceeded timeout |
| initializing | 503 | Check hasn't met SuccessesBeforePassing threshold yet |

Status Thresholds

Prevent flapping by requiring multiple consecutive results before changing status:

h.Register(health.CheckConfig{
	Name:     "database",
	Interval: time.Second * 10,
	Check:    myCheck,

	// Require 3 consecutive successes before reporting healthy
	SuccessesBeforePassing: 3,

	// Require 2 consecutive failures before warning
	FailuresBeforeWarning: 2,

	// Require 5 consecutive failures before critical
	FailuresBeforeCritical: 5,
})
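
To see how thresholds like these play out, here is a minimal sketch of one possible counter-based state machine, traced against the 3/2/5 configuration above. The names (`thresholds`, `debouncer`, `observe`, `trace`) are hypothetical, and the library's exact transition rules may differ. For example, this sketch resets the opposite counter on every result and leaves the status unchanged until a threshold is reached.

```go
package main

import "fmt"

// thresholds mirrors the three CheckConfig threshold fields.
type thresholds struct {
	successesBeforePassing int
	failuresBeforeWarning  int
	failuresBeforeCritical int
}

// debouncer tracks consecutive results and the currently reported status.
type debouncer struct {
	cfg       thresholds
	successes int // consecutive successes
	failures  int // consecutive failures
	status    string
}

func newDebouncer(cfg thresholds) *debouncer {
	return &debouncer{cfg: cfg, status: "initializing"}
}

// observe records one check result and returns the status reported afterwards.
func (d *debouncer) observe(ok bool) string {
	if ok {
		d.successes++
		d.failures = 0
		if d.successes >= d.cfg.successesBeforePassing {
			d.status = "passing"
		}
		return d.status
	}
	d.failures++
	d.successes = 0
	switch {
	case d.failures >= d.cfg.failuresBeforeCritical:
		d.status = "critical"
	case d.failures >= d.cfg.failuresBeforeWarning:
		d.status = "warning"
	}
	return d.status
}

// trace feeds a sequence of results ('s' = success, anything else = failure)
// through a fresh debouncer and returns the status after each one.
func trace(cfg thresholds, results string) []string {
	d := newDebouncer(cfg)
	out := make([]string, 0, len(results))
	for _, r := range results {
		out = append(out, d.observe(r == 's'))
	}
	return out
}

func main() {
	cfg := thresholds{
		successesBeforePassing: 3,
		failuresBeforeWarning:  2,
		failuresBeforeCritical: 5,
	}
	// Three successes, then six failures in a row.
	for i, s := range trace(cfg, "sssffffff") {
		fmt.Printf("result %d -> %s\n", i+1, s)
	}
}
```

In this trace a single failure after recovery leaves the status at passing, the second consecutive failure reports warning, and the fifth reports critical.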

Notifications

Subscribe to health status changes:

h, _ := health.New()

// Subscribe to notifications
notifications := h.Subscribe()

go func() {
	for notification := range notifications {
		fmt.Printf("Check: %s, Message: %s, EventID: %s\n",
			notification.Name,
			notification.Message,
			notification.EventID,
		)
	}
}()

// Temporarily disable notifications (e.g., during deployment)
h.NotificationsDisable(5 * time.Minute)

// Re-enable notifications
h.NotificationsEnable()

// Check if notifications are enabled
if h.NotificationsEnabled() {
	// ...
}
Notification Structure
type CheckNotification struct {
	Name       string   // Check name
	Message    string   // Status message
	Attachment []byte   // Command output (if SendCommandOutput is true)
	Tags       []string // Metadata tags (status, event_id, sequence, etc.)
	Notifiers  []string // Which notifiers to use
	EventID    string   // Correlates related alerts during an incident
	Sequence   int      // Order within the event (1, 2, 3...)
}

Action Runners

Execute commands automatically when status changes:

h.Register(health.CheckConfig{
	Name:  "database",
	Check: myCheck,

	FailureAction: &health.Action{
		Command:             "/usr/local/bin/alert-oncall.sh",
		UnlockAfterDuration: 5 * time.Minute,  // Cooldown period
		SendCommandOutput:   true,              // Include output in notification
		Notifiers:           []string{"slack", "pagerduty"},
	},

	SuccessAction: &health.Action{
		Command:                "/usr/local/bin/resolve-alert.sh",
		UnlockOnlyAfterHealthy: true,  // Only run after recovery from failure
	},
})
Action Configuration
| Field | Description |
|---|---|
| Command | Shell command to execute |
| UnlockAfterDuration | Minimum time between executions (cooldown) |
| UnlockOnlyAfterHealthy | Only allow running after status was previously healthy |
| SendCommandOutput | Include command stdout/stderr in notification |
| Notifiers | List of notifier names to send results to |
Environment Variables

Actions receive context via environment variables:

| Variable | Description |
|---|---|
| HEALTH_GO_MESSAGE | The error message from the failed check |

#!/bin/bash
# alert-oncall.sh
echo "Health check failed: $HEALTH_GO_MESSAGE"
curl -X POST "https://api.pagerduty.com/incidents" \
  -d "{\"message\": \"$HEALTH_GO_MESSAGE\"}"

Event Tracking

Events correlate related alerts during incidents. An event starts when a check becomes unhealthy and ends when it recovers.

// Access the event tracker
tracker := h.EventTracker

// Get current event ID for a check
eventID := tracker.GetEventID("database")

// Get all active events
events := tracker.ActiveEvents()

// Check maintenance status
if tracker.IsMaintenanceActive() {
	maintenanceEventID := tracker.GetMaintenanceEventID()
}
Maintenance Mode

When a check named maintenance becomes unhealthy:

  1. A maintenance event ID is created
  2. All new failures use this event ID (correlating them)
  3. When maintenance ends, checks keep the event ID until they recover
  4. This groups all maintenance-related alerts together
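
The four steps above can be sketched as a small tracker. This is an illustration of the described behavior, not the library's EventTracker: `tracker`, `fail`, `resolve`, and `demoMaintenance` are hypothetical names, and a counter stands in for real event ID generation.

```go
package main

import "fmt"

const maintenanceCheck = "maintenance"

// tracker assigns event IDs to failing checks.
type tracker struct {
	next          int               // counter standing in for real ID generation
	events        map[string]string // check name -> active event ID
	maintenanceID string            // non-empty while maintenance is active
}

func newTracker() *tracker {
	return &tracker{events: map[string]string{}}
}

func (t *tracker) newID() string {
	t.next++
	return fmt.Sprintf("evt-%d", t.next)
}

// fail returns the event ID for a failing check, creating one if needed.
func (t *tracker) fail(check string) string {
	if id, ok := t.events[check]; ok {
		return id // already part of an event: keep its ID
	}
	var id string
	switch {
	case check == maintenanceCheck:
		id = t.newID()
		t.maintenanceID = id
	case t.maintenanceID != "":
		id = t.maintenanceID // new failure during maintenance joins its event
	default:
		id = t.newID()
	}
	t.events[check] = id
	return id
}

// resolve ends the event for a recovered check and returns the ID that ended.
func (t *tracker) resolve(check string) string {
	id := t.events[check]
	delete(t.events, check)
	if check == maintenanceCheck {
		t.maintenanceID = ""
	}
	return id
}

// demoMaintenance walks through a maintenance window and returns the event
// ID seen at each step.
func demoMaintenance() []string {
	t := newTracker()
	var ids []string
	ids = append(ids, t.fail("maintenance"))    // maintenance starts
	ids = append(ids, t.fail("mysql"))          // fails during maintenance
	ids = append(ids, t.fail("redis"))          // fails during maintenance
	ids = append(ids, t.resolve("maintenance")) // maintenance ends
	ids = append(ids, t.fail("mysql"))          // still failing: keeps its ID
	ids = append(ids, t.fail("nats"))           // new failure after maintenance
	return ids
}

func main() {
	fmt.Println(demoMaintenance())
}
```

Here the first five steps all report the maintenance event ID, while the failure that begins after maintenance ends gets a fresh one.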

Use the built-in file-based maintenance checker:

import "github.com/bretep/health-go/v6/checks/maintenance"

h, _ := health.New()

// Register the maintenance check - MUST be named "maintenance" for event correlation
h.Register(health.CheckConfig{
	Name:                   "maintenance",
	Interval:               time.Second, // file check is cheap, respond quickly
	SuccessesBeforePassing: 1,           // exit maintenance immediately when file removed
	Check: maintenance.New(maintenance.Config{
		File:   "/var/lib/myservice/maintenance", // persistent path survives reboots
		Health: h,                                 // Optional: enables notification control
	}),
})

Entering maintenance mode:

# Simple maintenance
echo "Database upgrade in progress" > /var/lib/myservice/maintenance

# With notification suppression for 1 hour
echo "Scheduled maintenance
HEALTH_GO_DISABLE_NOTIFICATIONS_3600" > /var/lib/myservice/maintenance

# Suppress notifications indefinitely
echo "HEALTH_GO_DISABLE_NOTIFICATIONS" > /var/lib/myservice/maintenance

Exiting maintenance mode:

rm /var/lib/myservice/maintenance

When the file is removed, the check passes and notifications are automatically re-enabled.
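
For illustration, the marker lines shown above could be recognized with a few lines of string handling. This is a sketch of one plausible parser, not the maintenance checker's actual code: `parseSuppression` and `suppressionSeconds` are hypothetical names, and treating a missing suffix as "indefinite" follows the examples above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

const marker = "HEALTH_GO_DISABLE_NOTIFICATIONS"

// parseSuppression scans the file content line by line for the marker.
// suppress is true when the marker is present; d is the requested duration,
// or 0 when the marker has no numeric suffix (suppress indefinitely).
func parseSuppression(content string) (suppress bool, d time.Duration) {
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, marker) {
			continue
		}
		rest := strings.TrimPrefix(line, marker)
		if rest == "" {
			return true, 0
		}
		if !strings.HasPrefix(rest, "_") {
			continue // something else that merely starts with the marker
		}
		secs, err := strconv.Atoi(strings.TrimPrefix(rest, "_"))
		if err != nil || secs <= 0 {
			continue // malformed suffix: ignore this line
		}
		return true, time.Duration(secs) * time.Second
	}
	return false, 0
}

// suppressionSeconds is a convenience wrapper that reports whole seconds.
func suppressionSeconds(content string) (bool, int) {
	ok, d := parseSuppression(content)
	return ok, int(d / time.Second)
}

func main() {
	fmt.Println(suppressionSeconds("Scheduled maintenance\nHEALTH_GO_DISABLE_NOTIFICATIONS_3600"))
	fmt.Println(suppressionSeconds("HEALTH_GO_DISABLE_NOTIFICATIONS"))
	fmt.Println(suppressionSeconds("Database upgrade in progress"))
}
```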

State Persistence

By default, health check state is lost when your process restarts. This means:

  • Event IDs reset, breaking alert correlation
  • Success/failure counters reset, causing re-initialization delays
  • Action cooldowns reset, potentially triggering duplicate alerts

State persistence solves this by saving state to durable storage.

Using the SQLite Persister

The built-in SQLite persister provides zero-configuration persistence:

import (
	"github.com/bretep/health-go/v6"
	"github.com/bretep/health-go/v6/persister/sqlite"
)

// Create the persister
persister, err := sqlite.New(sqlite.Config{
	Path: "/var/lib/myapp/health-state.db",
	// Optional: customize debounce interval (default: 1s)
	// DebounceInterval: 500 * time.Millisecond,
})
if err != nil {
	log.Fatal(err)
}
defer persister.Close()

// Create health checker with persistence
h, err := health.New(
	health.WithStatePersister(persister),
)

What gets persisted:

| Component | State |
|---|---|
| EventTracker | Event IDs, sequences, maintenance state |
| StatusUpdater | Success/failure counts, pending event IDs |
| CheckStatus | Current status and error message |
| ActionRunner | Status, per-action last run times and cooldowns |

Design features:

  • Async saves with debouncing - State changes are batched (default 1s) to avoid disk I/O on every check
  • Soft failures - Persistence errors are logged but don't fail health checks
  • WAL mode - SQLite uses write-ahead logging for better concurrent access
  • Automatic restore - State is loaded automatically when health.New() is called
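
The batching and soft-failure behavior can be sketched as a dirty flag that a ticker flushes. This is a simplified illustration under assumptions, not the SQLite persister's implementation: `batchedSaver`, `MarkDirty`, `Flush`, `Run`, and `demoBatching` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batchedSaver coalesces many state changes into at most one write per flush.
type batchedSaver struct {
	mu    sync.Mutex
	dirty bool
	save  func() error // writes the current state to storage
}

// MarkDirty records that state changed. It never touches storage.
func (b *batchedSaver) MarkDirty() {
	b.mu.Lock()
	b.dirty = true
	b.mu.Unlock()
}

// Flush writes once if anything changed since the previous flush.
func (b *batchedSaver) Flush() {
	b.mu.Lock()
	pending := b.dirty
	b.dirty = false
	b.mu.Unlock()
	if !pending {
		return
	}
	if err := b.save(); err != nil {
		// Soft failure: report the error and carry on.
		fmt.Println("persist failed:", err)
	}
}

// Run flushes on every tick until stop is closed, then flushes one last time.
func (b *batchedSaver) Run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			b.Flush()
		case <-stop:
			b.Flush()
			return
		}
	}
}

// demoBatching marks the state dirty the given number of times, flushes
// twice, and returns how many writes reached storage.
func demoBatching(changes int) int {
	writes := 0
	b := &batchedSaver{save: func() error { writes++; return nil }}
	for i := 0; i < changes; i++ {
		b.MarkDirty()
	}
	b.Flush()
	b.Flush() // nothing changed since the first flush, so no second write
	return writes
}

func main() {
	fmt.Println(demoBatching(100)) // prints: 1
	fmt.Println(demoBatching(0))   // prints: 0
}
```
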
Saving State on Shutdown

For graceful shutdown, explicitly save state to ensure the latest changes are persisted:

// Set up signal handling
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)

go func() {
	<-sigCh
	log.Println("Shutting down, saving state...")
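	h.SaveState(context.Background())
	// os.Exit does not run deferred calls such as persister.Close(),
	// so close the persister here if one is configured.
	os.Exit(0)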
}()
Implementing a Custom Persister

For other storage backends (Redis, PostgreSQL, S3, etc.), implement the StatePersister interface:

type StatePersister interface {
	SaveEventTrackerState(ctx context.Context, state *EventTrackerState) error
	LoadEventTrackerState(ctx context.Context) (*EventTrackerState, error)
	SaveCheckState(ctx context.Context, checkName string, state *CheckState) error
	LoadCheckState(ctx context.Context, checkName string) (*CheckState, error)
	LoadAllCheckStates(ctx context.Context) (map[string]*CheckState, error)
	DeleteCheckState(ctx context.Context, checkName string) error
	Close() error
}

See _examples/custom_persister.go for a complete file-based implementation example.
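
As a rough idea of what the check-state half of such a backend involves, here is a standard-library sketch that stores one JSON file per check. It is not the bundled example: `fileStore` and `checkState` are local stand-ins (the real interface uses the package's CheckState and also covers event tracker state, LoadAllCheckStates, and Close), representing the status as a string is an assumption, and returning `nil, nil` for a missing file is a design choice of this sketch.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// checkState is a local stand-in mirroring a few documented CheckState fields.
type checkState struct {
	Successes int    `json:"successes"`
	Failures  int    `json:"failures"`
	Status    string `json:"status"` // assumption: status stored as a string
	ErrorMsg  string `json:"error_msg"`
}

// fileStore keeps one JSON file per check inside dir.
type fileStore struct{ dir string }

// path assumes check names contain no path separators.
func (s *fileStore) path(name string) string {
	return filepath.Join(s.dir, name+".json")
}

// SaveCheckState writes to a temporary file and renames it into place so a
// crash mid-write does not leave a truncated state file behind.
func (s *fileStore) SaveCheckState(_ context.Context, name string, st *checkState) error {
	data, err := json.Marshal(st)
	if err != nil {
		return err
	}
	tmp := s.path(name) + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, s.path(name))
}

// LoadCheckState returns nil, nil when no state has been saved yet.
func (s *fileStore) LoadCheckState(_ context.Context, name string) (*checkState, error) {
	data, err := os.ReadFile(s.path(name))
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var st checkState
	if err := json.Unmarshal(data, &st); err != nil {
		return nil, err
	}
	return &st, nil
}

// DeleteCheckState removes saved state; deleting missing state is not an error.
func (s *fileStore) DeleteCheckState(_ context.Context, name string) error {
	err := os.Remove(s.path(name))
	if errors.Is(err, fs.ErrNotExist) {
		return nil
	}
	return err
}

// roundTrip saves a state into a temporary directory and loads it back.
func roundTrip() (*checkState, error) {
	dir, err := os.MkdirTemp("", "health-state")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	s := &fileStore{dir: dir}
	ctx := context.Background()
	in := &checkState{Failures: 3, Status: "critical", ErrorMsg: "connection refused"}
	if err := s.SaveCheckState(ctx, "database", in); err != nil {
		return nil, err
	}
	return s.LoadCheckState(ctx, "database")
}

func main() {
	st, err := roundTrip()
	fmt.Println(st, err)
}
```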

When to Use Persistence
| Scenario | Recommendation |
|---|---|
| Short-lived processes (serverless, batch jobs) | Skip persistence |
| Long-running services with infrequent restarts | Optional |
| Services with action cooldowns you want preserved | Recommended |
| Services where alert correlation across restarts matters | Recommended |
| High-availability setups with rolling deploys | Recommended |

Pause/Resume Checks

Dynamically control individual checks:

h.Register(health.CheckConfig{
	Name:  "database",
	Check: myCheck,
})

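// Get the check config.
// NOTE: checks is an unexported field of Health, so this lookup only
// compiles inside the health package itself.
check := h.checks["database"]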

// Pause the check (stops running)
check.Pause()

// Resume the check (starts running again)
check.Start()

Custom Check Functions

func myCustomCheck(ctx context.Context) health.CheckResponse {
	// Perform health check logic
	err := checkSomething()

	if err != nil {
		return health.CheckResponse{
			Error:     err,
			IsWarning: false,  // true = Warning, false = Critical
		}
	}

	return health.CheckResponse{}  // Healthy
}

h.Register(health.CheckConfig{
	Name:  "custom",
	Check: myCustomCheck,
})
Disabling Notifications Per-Response
func myCheck(ctx context.Context) health.CheckResponse {
	// Don't send notification for this specific response
	return health.CheckResponse{
		Error:          errors.New("expected transient error"),
		NoNotification: true,
	}
}
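
Because a check function receives a context, a probe that can block is usually worth wrapping so the check returns when the context ends. The following is a minimal sketch under assumptions: `checkResponse` is a local stand-in for the documented CheckResponse fields, `withDeadline` and `demoTimeout` are hypothetical helpers, and it assumes the context passed to the check carries the configured timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// checkResponse is a local stand-in mirroring the documented CheckResponse.
type checkResponse struct {
	Error     error
	IsWarning bool
}

// withDeadline wraps a blocking probe so the returned check gives up as soon
// as ctx is done. The probe goroutine itself keeps running until the probe
// returns, so probes should still have their own upper bound.
func withDeadline(probe func() error) func(ctx context.Context) checkResponse {
	return func(ctx context.Context) checkResponse {
		done := make(chan error, 1) // buffered: the goroutine never blocks on send
		go func() { done <- probe() }()
		select {
		case err := <-done:
			return checkResponse{Error: err}
		case <-ctx.Done():
			return checkResponse{Error: ctx.Err()}
		}
	}
}

// demoTimeout runs a probe that sleeps for probeMs under a context timeout of
// timeoutMs and reports whether the check ended with DeadlineExceeded.
func demoTimeout(probeMs, timeoutMs int) bool {
	check := withDeadline(func() error {
		time.Sleep(time.Duration(probeMs) * time.Millisecond)
		return nil
	})
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	return errors.Is(check(ctx).Error, context.DeadlineExceeded)
}

func main() {
	fmt.Println(demoTimeout(500, 20)) // slow probe, short timeout
	fmt.Println(demoTimeout(0, 2000)) // fast probe, generous timeout
}
```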

HTTP Handlers

Standard Handler
http.Handle("/health", h.Handler())
HandlerFunc
// Works with any router, for example chi
r := chi.NewRouter()
r.Get("/health", h.HandlerFunc)

// Or with gorilla/mux
m := mux.NewRouter()
m.HandleFunc("/health", h.HandlerFunc)
Response Format

Healthy (200 OK):

{
  "status": "passing",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "system": {
    "version": "go1.25.0",
    "goroutines_count": 12,
    "total_alloc_bytes": 1234567,
    "heap_objects_count": 5678,
    "alloc_bytes": 234567
  },
  "component": {
    "name": "myservice",
    "version": "v1.0"
  }
}

Unhealthy (503 Service Unavailable):

{
  "status": "critical",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "failures": {
    "database": "connection refused",
    "redis": "timeout after 2s"
  },
  "system": { ... },
  "component": { ... }
}
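
A client such as a deploy script can decode this payload with the standard library. The sketch below assumes the field names shown in the sample responses above ("status", "timestamp", "failures"); `healthResponse`, `failedChecks`, and `sample` are hypothetical names, and the keys are worth verifying against a live response before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// healthResponse models the subset of the payload a client typically needs.
type healthResponse struct {
	Status    string            `json:"status"`
	Timestamp time.Time         `json:"timestamp"`
	Failures  map[string]string `json:"failures"`
}

// sample is the unhealthy response from above without the elided sections.
const sample = `{
  "status": "critical",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "failures": {
    "database": "connection refused",
    "redis": "timeout after 2s"
  }
}`

// failedChecks decodes a response body and returns the overall status and
// the names of the failing checks in sorted order.
func failedChecks(body []byte) (status string, names []string, err error) {
	var r healthResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", nil, err
	}
	for name := range r.Failures {
		names = append(names, name)
	}
	sort.Strings(names)
	return r.Status, names, nil
}

func main() {
	status, names, err := failedChecks([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(status, names) // prints: critical [database redis]
}
```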

Options

h, err := health.New(
	// Add component metadata
	health.WithComponent(health.Component{
		Name:    "api-server",
		Version: "v2.1.0",
	}),

	// Include Go runtime metrics in response
	health.WithSystemInfo(),

	// Limit concurrent check execution
	health.WithMaxConcurrent(4),

	// Add OpenTelemetry tracing
	health.WithTracerProvider(tp, "health-checks"),

	// Enable state persistence (see State Persistence section)
	health.WithStatePersister(persister),

	// Register checks at creation
	health.WithChecks(
		health.CheckConfig{Name: "db", Check: dbCheck},
		health.CheckConfig{Name: "cache", Check: cacheCheck},
	),
)
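
To give a feel for what a concurrency limit such as WithMaxConcurrent(4) implies, here is a sketch of the general technique using a buffered channel as a counting semaphore. It is not the library's implementation: `runLimited` and `demoLimit` are hypothetical names, and limit must be at least 1.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs every task with at most limit tasks in flight and returns
// the highest concurrency observed. limit must be at least 1.
func runLimited(limit int, tasks []func()) int {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var inFlight, peak int64
	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit tasks are already running
		go func(run func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			n := atomic.AddInt64(&inFlight, 1)
			for { // record a new peak if this is the most seen so far
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			run()
			atomic.AddInt64(&inFlight, -1)
		}(task)
	}
	wg.Wait()
	return int(atomic.LoadInt64(&peak))
}

// demoLimit runs count short tasks under the given limit.
func demoLimit(limit, count int) int {
	tasks := make([]func(), count)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(time.Millisecond) }
	}
	return runLimited(limit, tasks)
}

func main() {
	fmt.Println("peak concurrency:", demoLimit(4, 20))
}
```

The observed peak never exceeds the limit, because a task only starts after acquiring a slot and releases it only after it has finished.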

Using Built-in Checkers

import (
	"github.com/bretep/health-go/v6"
	"github.com/bretep/health-go/v6/checks/http"
	"github.com/bretep/health-go/v6/checks/maintenance"
	"github.com/bretep/health-go/v6/checks/mysql"
	"github.com/bretep/health-go/v6/checks/postgres"
	"github.com/bretep/health-go/v6/checks/redis"
)

// HTTP endpoint check
h.Register(health.CheckConfig{
	Name:    "external-api",
	Timeout: time.Second * 5,
	Check: http.New(http.Config{
		URL:            "https://api.example.com/health",
		RequestTimeout: time.Second * 3,
	}),
})

// MySQL check
h.Register(health.CheckConfig{
	Name: "mysql",
	Check: mysql.New(mysql.Config{
		DSN: "user:pass@tcp(localhost:3306)/mydb",
	}),
})

// PostgreSQL check
h.Register(health.CheckConfig{
	Name: "postgres",
	Check: postgres.New(postgres.Config{
		DSN: "postgres://user:pass@localhost:5432/mydb?sslmode=disable",
	}),
})

// Redis check
h.Register(health.CheckConfig{
	Name: "redis",
	Check: redis.New(redis.Config{
		DSN: "redis://localhost:6379",
	}),
})

Testing

The library includes comprehensive tests with real service integrations:

# Run unit tests
go test ./...

# Run with race detector
go test -race ./...

# Run with coverage
go test -coverprofile=coverage.out ./...

# Run integration tests (requires Docker)
docker compose up -d
go test -race ./...
docker compose down

Examples

See the _examples directory for complete examples:

| Example | Description |
|---|---|
| server.go | Basic usage with multiple check types |
| server_with_persistence.go | Using SQLite persister for state persistence |
| custom_persister.go | Implementing a custom StatePersister |

Migration from v5

  1. Update import paths:

    // Old
    import "github.com/bretep/health-go/v5"
    
    // New
    import "github.com/bretep/health-go/v6"
    
  2. Update Redis import if using the Redis checker:

    // The redis client package was renamed upstream
    // No code changes needed, just `go mod tidy`
    
  3. Ensure Go 1.25+ is installed

  4. Run go mod tidy to update dependencies

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request
Development
# Install dependencies
go mod download

# Run linter
golangci-lint run ./...

# Run tests with Docker services
docker compose up -d
go test -race -cover ./...
docker compose down

License

This project is licensed under the MIT License - see the LICENSE file for details.


This project is a fork of github.com/hellofresh/health-go with additional features. See NOTICE for attribution details.

Documentation

Types

type Action

type Action struct {
	Command                string
	UnlockAfterDuration    time.Duration
	UnlockOnlyAfterHealthy bool
	SendCommandOutput      bool
	// Notifiers list of enabled notifiers
	Notifiers []string
	// contains filtered or unexported fields
}

Action contains configuration for running an action

func (Action) Run

func (a Action) Run(message string) (notification CheckNotification)

Run executes the action command

type ActionRunner

type ActionRunner struct {
	// contains filtered or unexported fields
}

ActionRunner keeps track of a check's actions

func NewActionRunner

func NewActionRunner(checkName string, successAction, warningAction, failureAction, timeoutAction *Action, notifications *Notifications, eventTracker *EventTracker) *ActionRunner

NewActionRunner returns a new ActionRunner

func (*ActionRunner) Failure

func (a *ActionRunner) Failure(message string, eventID string)

Failure handles a failed check result

func (*ActionRunner) GetState added in v6.0.1

func (a *ActionRunner) GetState() *ActionRunnerState

GetState returns a snapshot of the ActionRunner state for persistence.

func (*ActionRunner) RestoreState added in v6.0.1

func (a *ActionRunner) RestoreState(state *ActionRunnerState)

RestoreState restores the ActionRunner state from a persisted snapshot.

func (*ActionRunner) Success

func (a *ActionRunner) Success(message string, eventID string)

Success handles a successful check result

func (*ActionRunner) Timeout

func (a *ActionRunner) Timeout(message string, eventID string)

Timeout handles a timed out check result

func (*ActionRunner) Warning

func (a *ActionRunner) Warning(message string, eventID string)

Warning handles a warning check result

type ActionRunnerState added in v6.0.1

type ActionRunnerState struct {
	// Status is the current action runner status
	Status Status `json:"status"`

	// Per-action state
	SuccessAction *ActionState `json:"success_action,omitempty"`
	WarningAction *ActionState `json:"warning_action,omitempty"`
	FailureAction *ActionState `json:"failure_action,omitempty"`
	TimeoutAction *ActionState `json:"timeout_action,omitempty"`
}

ActionRunnerState represents the persistable state of an ActionRunner.

type ActionState added in v6.0.1

type ActionState struct {
	// LastRun is when the action was last executed
	LastRun time.Time `json:"last_run"`

	// CanRun indicates whether the action is eligible to run
	CanRun bool `json:"can_run"`
}

ActionState represents the persistable state of an Action.

type Check

type Check struct {
	// Status is the overall health status.
	Status Status `json:"check"`
	// Timestamp is the time in which the check occurred.
	Timestamp time.Time `json:"timestamp"`
	// Failures holds the failed checks along with their messages.
	Failures map[string]string `json:"failures,omitempty"`
	// System holds information of the go process.
	*System `json:"system,omitempty"`
	// Component holds information on the component for which checks are made
	Component `json:"component"`
}

Check represents the health check response.

type CheckConfig

type CheckConfig struct {
	// Name is the name of the resource to be checked.
	Name string
	// Interval is how often the health check should run
	Interval time.Duration
	// Timeout is the timeout defined for every check.
	Timeout time.Duration
	// SkipOnErr if set to true, it will retrieve StatusPassing providing the error message from the failed resource.
	SkipOnErr bool
	// Check is the func which executes the check.
	Check CheckFunc
	// Status
	Status *StatusUpdater
	// SuccessesBeforePassing number of passing checks before reporting as passing
	SuccessesBeforePassing int
	// FailuresBeforeWarning number of failing checks before reporting as warning
	FailuresBeforeWarning int
	// FailuresBeforeCritical number of failing checks before reporting as critical
	FailuresBeforeCritical int
	// SuccessAction configuration
	SuccessAction *Action
	// WarningAction configuration
	WarningAction *Action
	// FailureAction configuration
	FailureAction *Action
	// TimeoutAction configuration
	TimeoutAction *Action
	// Notifiers list of enabled notifiers
	Notifiers []string
	// contains filtered or unexported fields
}

CheckConfig carries the parameters to run the check.

func (*CheckConfig) Pause

func (c *CheckConfig) Pause()

Pause a health check

func (*CheckConfig) Start

func (c *CheckConfig) Start()

Start a health check

type CheckFunc

type CheckFunc func(ctx context.Context) CheckResponse

CheckFunc is the func which executes the check.

type CheckNotification

type CheckNotification struct {
	Name       string
	Message    string
	Attachment []byte
	Tags       []string
	Notifiers  []string
	EventID    string // Event ID for correlating related alerts during an incident
	Sequence   int    // Sequence number within the event (1, 2, 3...)
}

CheckNotification represents a notification sent when check status changes.

type CheckResponse

type CheckResponse struct {
	// Error message
	Error error

	// IsWarning if set to true, it will retrieve StatusPassing providing the error message from the failed resource.
	IsWarning bool

	// NoNotification disables a notification for this response
	NoNotification bool
}

CheckResponse is returned by a check function.

type CheckState added in v6.0.1

type CheckState struct {
	// StatusUpdater state
	Successes      int    `json:"successes"`
	Failures       int    `json:"failures"`
	PendingEventID string `json:"pending_event_id"`

	// CheckStatus state
	Status   Status `json:"status"`
	ErrorMsg string `json:"error_msg"`

	// ActionRunner state
	ActionRunnerState *ActionRunnerState `json:"action_runner_state,omitempty"`

	// UpdatedAt is when this state was last updated
	UpdatedAt time.Time `json:"updated_at"`
}

CheckState represents the persistable state of a health check.

type CheckStatus

type CheckStatus struct {
	// Status is the current status of the check.
	Status Status
	// Error informational message about the Status
	Error error
	// contains filtered or unexported fields
}

CheckStatus holds the current status of a check.

func (*CheckStatus) Get

func (s *CheckStatus) Get() (status Status, err error)

Get returns the status and error of the running check

func (*CheckStatus) Update

func (s *CheckStatus) Update(status Status, err error)

Update sets the status and error of the running check

type Component

type Component struct {
	// Name is the name of the component.
	Name string `json:"name"`
	// Version is the component version.
	Version string `json:"version"`
}

Component descriptive values about the component for which checks are made

type EventTracker

type EventTracker struct {
	// contains filtered or unexported fields
}

EventTracker manages event IDs for health check incidents. An event starts when a check transitions from healthy to unhealthy, and ends when it returns to healthy.

Special handling for maintenance mode:

  • When maintenance check becomes unhealthy, a maintenance event_id is created
  • All checks that fail during maintenance use the maintenance event_id
  • When maintenance ends, checks that are still unhealthy keep the maintenance event_id
  • Only when each individual check becomes healthy does it clear its event_id

func NewEventTracker

func NewEventTracker() *EventTracker

NewEventTracker creates a new event tracker

func (*EventTracker) ActiveEvents

func (t *EventTracker) ActiveEvents() map[string]string

ActiveEvents returns a copy of all active events

func (*EventTracker) ClearSequence

func (t *EventTracker) ClearSequence(eventID string)

ClearSequence removes sequence tracking for an event (call when event ends)

func (*EventTracker) GetEventID

func (t *EventTracker) GetEventID(checkName string) string

GetEventID returns the current event ID for a check without modifying state. Returns empty string if no active event.

func (*EventTracker) GetMaintenanceEventID

func (t *EventTracker) GetMaintenanceEventID() string

GetMaintenanceEventID returns the current maintenance event ID

func (*EventTracker) GetNextSequence

func (t *EventTracker) GetNextSequence(eventID string) int

GetNextSequence returns the next sequence number for an event and increments counter. Returns 0 if eventID is empty.

func (*EventTracker) GetOrCreateEventID

func (t *EventTracker) GetOrCreateEventID(checkName string, status Status) string

GetOrCreateEventID returns the active event ID for a check, creating a new one if none exists (for failure/warning notifications). For success notifications, this returns the event_id that was active (so downstream systems know which event ended) and clears it from the tracker.

Special maintenance behavior:

  • If this is the maintenance check becoming unhealthy, creates maintenance event_id
  • If maintenance is active and another check fails, uses maintenance event_id
  • If maintenance ends but a check is still unhealthy, keeps maintenance event_id

func (*EventTracker) GetState added in v6.0.1

func (t *EventTracker) GetState() *EventTrackerState

GetState returns a snapshot of the EventTracker state for persistence.

func (*EventTracker) IsMaintenanceActive

func (t *EventTracker) IsMaintenanceActive() bool

IsMaintenanceActive returns whether maintenance mode is currently active

func (*EventTracker) RestoreState added in v6.0.1

func (t *EventTracker) RestoreState(state *EventTrackerState)

RestoreState restores the EventTracker state from a persisted snapshot. This should be called before any checks are registered.

type EventTrackerState added in v6.0.1

type EventTrackerState struct {
	// EventIDs maps check names to their active event IDs
	EventIDs map[string]string `json:"event_ids"`

	// Sequences maps event IDs to their current sequence numbers
	Sequences map[string]int `json:"sequences"`

	// MaintenanceEventID is the current maintenance event ID (empty if not in maintenance)
	MaintenanceEventID string `json:"maintenance_event_id"`

	// MaintenanceActive indicates whether maintenance mode is currently active
	MaintenanceActive bool `json:"maintenance_active"`

	// MaintenanceChecks tracks checks that started failing during maintenance
	MaintenanceChecks map[string]bool `json:"maintenance_checks"`

	// UpdatedAt is when this state was last updated
	UpdatedAt time.Time `json:"updated_at"`
}

EventTrackerState represents the persistable state of an EventTracker.

type Health

type Health struct {
	NotificationsSender *Notifications
	EventTracker        *EventTracker
	// contains filtered or unexported fields
}

Health is the health-checks container

func New

func New(opts ...Option) (*Health, error)

New instantiates and builds a new health check container

func (*Health) Handler

func (h *Health) Handler() http.Handler

Handler returns an HTTP handler (http.HandlerFunc).

func (*Health) HandlerFunc

func (h *Health) HandlerFunc(w http.ResponseWriter, r *http.Request)

HandlerFunc is the HTTP handler function.

func (*Health) NotificationsDisable

func (h *Health) NotificationsDisable(suppressTime time.Duration)

NotificationsDisable disables notifications.

func (*Health) NotificationsEnable

func (h *Health) NotificationsEnable()

NotificationsEnable enables notifications

func (*Health) NotificationsEnabled

func (h *Health) NotificationsEnabled() bool

NotificationsEnabled reports whether notifications are enabled.

func (*Health) Persister added in v6.0.1

func (h *Health) Persister() StatePersister

Persister returns the configured state persister. Returns nil if no persister is configured (using NoopPersister).

func (*Health) Register

func (h *Health) Register(c CheckConfig) error

Register registers a check config to be performed.

func (*Health) SaveState added in v6.0.1

func (h *Health) SaveState(ctx context.Context)

SaveState persists the current state of all checks and the event tracker. This can be called periodically or before shutdown to ensure state is saved. Errors are logged but not returned - persistence failures should not impact health checks.

func (*Health) Status

func (h *Health) Status(ctx context.Context) Check

Status returns a Check describing the overall status of the health checks.

func (*Health) Subscribe

func (h *Health) Subscribe() (c <-chan CheckNotification)

Subscribe returns a channel for receiving health check notifications

type NoopPersister added in v6.0.1

type NoopPersister struct{}

NoopPersister is a no-op implementation of StatePersister. It performs no persistence and is the default when no persister is configured.

func NewNoopPersister added in v6.0.1

func NewNoopPersister() *NoopPersister

NewNoopPersister creates a new no-op persister.

func (*NoopPersister) Close added in v6.0.1

func (p *NoopPersister) Close() error

Close is a no-op.

func (*NoopPersister) DeleteCheckState added in v6.0.1

func (p *NoopPersister) DeleteCheckState(_ context.Context, _ string) error

DeleteCheckState is a no-op.

func (*NoopPersister) LoadAllCheckStates added in v6.0.1

func (p *NoopPersister) LoadAllCheckStates(_ context.Context) (map[string]*CheckState, error)

LoadAllCheckStates returns an empty map.

func (*NoopPersister) LoadCheckState added in v6.0.1

func (p *NoopPersister) LoadCheckState(_ context.Context, _ string) (*CheckState, error)

LoadCheckState returns nil (no persisted state).

func (*NoopPersister) LoadEventTrackerState added in v6.0.1

func (p *NoopPersister) LoadEventTrackerState(_ context.Context) (*EventTrackerState, error)

LoadEventTrackerState returns nil (no persisted state).

func (*NoopPersister) SaveCheckState added in v6.0.1

func (p *NoopPersister) SaveCheckState(_ context.Context, _ string, _ *CheckState) error

SaveCheckState is a no-op.

func (*NoopPersister) SaveEventTrackerState added in v6.0.1

func (p *NoopPersister) SaveEventTrackerState(_ context.Context, _ *EventTrackerState) error

SaveEventTrackerState is a no-op.

type Notifications

type Notifications struct {
	// contains filtered or unexported fields
}

Notifications manages publishing notifications

func NewNotificationSender

func NewNotificationSender(channel chan CheckNotification) *Notifications

NewNotificationSender returns a Notifications value for sending notifications on the given channel.

func (*Notifications) Disable

func (n *Notifications) Disable(disableDuration time.Duration)

Disable notifications

func (*Notifications) Enable

func (n *Notifications) Enable()

Enable notifications

func (*Notifications) Enabled

func (n *Notifications) Enabled() bool

Enabled reports whether notifications are enabled.

func (*Notifications) Send

func (n *Notifications) Send(notification CheckNotification)

Send notifications

type Option

type Option func(*Health) error

Option is the health-container options type

func WithChecks

func WithChecks(checks ...CheckConfig) Option

WithChecks adds checks to newly instantiated health-container

func WithComponent

func WithComponent(component Component) Option

WithComponent sets the description of the component to which the checks refer.

func WithMaxConcurrent

func WithMaxConcurrent(n int) Option

WithMaxConcurrent sets the maximum number of concurrently running checks. Set to 1 to run all checks sequentially.

func WithStatePersister added in v6.0.1

func WithStatePersister(p StatePersister) Option

WithStatePersister sets the state persister for persisting health check state across process restarts. If not set, a no-op persister is used (no persistence).

func WithSystemInfo

func WithSystemInfo() Option

WithSystemInfo enables the option to return system information about the go process.

func WithTracerProvider

func WithTracerProvider(tp trace.TracerProvider, instrumentationName string) Option

WithTracerProvider sets trace provider for the checks and instrumentation name that will be used for tracer from trace provider.
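The `With...` functions above follow the functional-options pattern implied by `type Option func(*Health) error`. The sketch below illustrates the pattern only and is not the library's implementation: `container`, `option`, `withMaxConcurrent`, and `newContainer` are hypothetical stand-ins, and both the default value and the validation rule are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a stand-in for Health with one hypothetical field.
type container struct{ maxConcurrent int }

// option mirrors the documented shape: type Option func(*Health) error.
type option func(*container) error

// withMaxConcurrent is a hypothetical analogue of WithMaxConcurrent.
// Rejecting n < 1 is an assumption of this sketch.
func withMaxConcurrent(n int) option {
	return func(c *container) error {
		if n < 1 {
			return errors.New("max concurrent must be at least 1")
		}
		c.maxConcurrent = n
		return nil
	}
}

// newContainer applies options in order and stops at the first error.
func newContainer(opts ...option) (*container, error) {
	c := &container{maxConcurrent: 4} // hypothetical default
	for _, o := range opts {
		if err := o(c); err != nil {
			return nil, err
		}
	}
	return c, nil
}

func main() {
	c, err := newContainer(withMaxConcurrent(1))
	fmt.Println(c.maxConcurrent, err)

	_, err = newContainer(withMaxConcurrent(0))
	fmt.Println(err)
}
```

Returning an error from each option lets the constructor reject invalid configuration up front, which is consistent with `New` returning `(*Health, error)`.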

type StatePersister added in v6.0.1

type StatePersister interface {
	// SaveEventTrackerState persists the event tracker state.
	SaveEventTrackerState(ctx context.Context, state *EventTrackerState) error

	// LoadEventTrackerState loads the persisted event tracker state.
	// Returns nil, nil if no state exists.
	LoadEventTrackerState(ctx context.Context) (*EventTrackerState, error)

	// SaveCheckState persists the state for a single check.
	SaveCheckState(ctx context.Context, checkName string, state *CheckState) error

	// LoadCheckState loads the persisted state for a single check.
	// Returns nil, nil if no state exists for the check.
	LoadCheckState(ctx context.Context, checkName string) (*CheckState, error)

	// LoadAllCheckStates loads all persisted check states.
	LoadAllCheckStates(ctx context.Context) (map[string]*CheckState, error)

	// DeleteCheckState removes persisted state for a check.
	DeleteCheckState(ctx context.Context, checkName string) error

	// Close releases any resources held by the persister.
	Close() error
}

StatePersister defines the interface for persisting health check state. Implementations should be safe for concurrent use.

type Status

type Status string

Status represents the status of a health check.

const (
	// StatusPassing healthcheck is passing
	StatusPassing Status = "passing"
	// StatusWarning healthcheck is failing but should not fail the component
	StatusWarning Status = "warning"
	// StatusCritical healthcheck is failing and should fail the component
	StatusCritical Status = "critical"
	// StatusTimeout healthcheck timed out and should fail the component
	StatusTimeout Status = "timeout"
	// StatusInitializing healthcheck is starting up and has not met the passing threshold
	StatusInitializing Status = "initializing"
	// MinimumInterval is the minimum time between checks
	// to prevent fork bombing a system
	MinimumInterval = time.Second
)
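The constant comments distinguish statuses that fail the component from those that do not. A small helper can make that distinction explicit. This is a sketch: `failsComponent` is a hypothetical function, the type and constants are redeclared locally so it compiles alone, and treating `initializing` as non-failing is an assumption, since the comments above do not say either way.

```go
package main

import "fmt"

// Local copies of the documented Status type and constants.
type Status string

const (
	StatusPassing      Status = "passing"
	StatusWarning      Status = "warning"
	StatusCritical     Status = "critical"
	StatusTimeout      Status = "timeout"
	StatusInitializing Status = "initializing"
)

// failsComponent reports whether s should fail the component, following
// the constant comments: critical and timeout fail it, warning does not.
// Treating "initializing" as non-failing is an assumption of this sketch.
func failsComponent(s Status) bool {
	switch s {
	case StatusCritical, StatusTimeout:
		return true
	default:
		return false
	}
}

func main() {
	for _, s := range []Status{StatusPassing, StatusWarning, StatusCritical, StatusTimeout} {
		fmt.Printf("%s fails component: %t\n", s, failsComponent(s))
	}
}
```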

type StatusUpdater

type StatusUpdater struct {
	// contains filtered or unexported fields
}

StatusUpdater keeps track of a check's status.

func NewStatusUpdater

func NewStatusUpdater(successesBeforePassing, failuresBeforeWarning, failuresBeforeCritical int, actionRunner *ActionRunner, notifications *Notifications, notifiers []string, eventTracker *EventTracker) *StatusUpdater

NewStatusUpdater returns a new StatusUpdater that is in critical condition. It sends an "initializing" notification to indicate the check is starting up. Use NewStatusUpdaterSilent when restoring from persisted state to avoid spurious notifications.

func NewStatusUpdaterSilent added in v6.0.2

func NewStatusUpdaterSilent(successesBeforePassing, failuresBeforeWarning, failuresBeforeCritical int, actionRunner *ActionRunner, notifications *Notifications, notifiers []string, eventTracker *EventTracker) *StatusUpdater

NewStatusUpdaterSilent returns a new StatusUpdater without sending the "initializing" notification. This should be used when restoring state from persistence to avoid sending spurious notifications that would incorrectly resolve or duplicate existing alerts.

func (*StatusUpdater) GetState added in v6.0.1

func (s *StatusUpdater) GetState() *CheckState

GetState returns a snapshot of the StatusUpdater state for persistence.

func (*StatusUpdater) RestoreState added in v6.0.1

func (s *StatusUpdater) RestoreState(state *CheckState)

RestoreState restores the StatusUpdater state from a persisted snapshot.

type System

type System struct {
	// Version is the go version.
	Version string `json:"version"`
	// GoroutinesCount is the number of the current goroutines.
	GoroutinesCount int `json:"goroutines_count"`
	// TotalAllocBytes is the total bytes allocated.
	TotalAllocBytes int `json:"total_alloc_bytes"`
	// HeapObjectsCount is the number of objects in the go heap.
	HeapObjectsCount int `json:"heap_objects_count"`
	// AllocBytes is the bytes allocated and not yet freed.
	AllocBytes int `json:"alloc_bytes"`
}

System runtime variables about the go process.

Directories

Path Synopsis
checks
influxdb
Package influxdb implements a health check for InfluxDB instance.
maintenance
Package maintenance implements a file-based maintenance mode check.
persister
sqlite
Package sqlite provides a SQLite-based implementation of the health.StatePersister interface.
