health

v6.0.2 (latest) · Published: Jan 29, 2026 · License: MIT

Repository

github.com/bretep/health-go

README ¶

health-go

A production-ready library for adding health checks to Go services with advanced features for reliability and observability.

What's New in v6

v6 is a major modernization release focusing on stability, performance, and Go 1.25+ compatibility.

Breaking Changes

Go 1.25+ required - Takes advantage of modern Go features
Module path changed - github.com/bretep/health-go/v6
Redis client updated - Uses github.com/redis/go-redis/v9 (renamed from go-redis/redis)

Stability Improvements

Fixed data races - All race conditions detected by -race have been resolved
Fixed goroutine leaks - Check goroutines now properly terminate on pause/shutdown
Fixed copy-lock issues - Mutex-containing structs are now properly handled
Thread-safe status updates - Added proper synchronization to StatusUpdater

Performance Improvements

Buffered channels - Prevents goroutine blocking during health checks
Optimized event tracking - Uses maps.Clone() for efficient map copying
Reduced allocations - Uses clear() builtin instead of map reallocation
Modern random number generation - Uses math/rand/v2 for better performance

Code Quality

Comprehensive linting - Passes golangci-lint with strict configuration
Race-tested - All tests pass with -race flag
77% test coverage - Extensive integration tests with real services
Modern Go idioms - Uses cmp.Or(), slices package, for range N syntax

Dependency Updates

Dependency	Version
OpenTelemetry	v1.39.0
gRPC	v1.78.0
Redis client	v9.17.3
MongoDB driver	v1.17.7
MySQL driver	v1.9.3
Cassandra (gocql)	v1.7.0
RabbitMQ (amqp091-go)	v1.10.0
pgx/v5	v5.8.0
NATS	v1.48.0
InfluxDB client	v2.14.0
SQLite (modernc.org)	v1.44.3
testify	v1.11.1

Features

HTTP Handler - Exposes health status via HTTP endpoints compatible with net/http
Background Health Checks - Checks run asynchronously on intervals to prevent DoS to backend services
Status Thresholds - Configure successes/failures required before status changes (debouncing)
Notification System - Subscribe to health status changes with configurable notifiers
Action Runners - Execute shell commands automatically on status changes
Event Tracking - Correlate related alerts during incidents with event IDs and sequences
Maintenance Mode - Group failures during maintenance windows under a single event
State Persistence - Retain health check state across process restarts (SQLite or custom)
Pause/Resume - Dynamically pause and resume individual health checks
OpenTelemetry Support - Built-in tracing support

Built-in Checkers

Checker	Description
Cassandra	Apache Cassandra connectivity
gRPC	gRPC health checking protocol
HTTP	HTTP endpoint availability
InfluxDB	InfluxDB v1.x connectivity
Maintenance	File-based maintenance mode
Memcached	Memcached connectivity
MongoDB	MongoDB connectivity and ping
MySQL	MySQL/MariaDB connectivity
NATS	NATS messaging connectivity
PostgreSQL	PostgreSQL via lib/pq
pgx/v4	PostgreSQL via pgx v4
pgx/v5	PostgreSQL via pgx v5
RabbitMQ	RabbitMQ connectivity and publish/consume
Redis	Redis connectivity and ping

Why Use This Library?

Writing a health check endpoint seems simple—until you need it to be production-ready. Here's what this library handles that you'd otherwise build yourself:

The Naive Approach Breaks Under Load

A simple health check that queries your database on every request creates problems:

// DON'T DO THIS - causes cascading failures
func healthHandler(w http.ResponseWriter, r *http.Request) {
    if err := db.Ping(); err != nil {
        w.WriteHeader(503)
        return
    }
    w.WriteHeader(200)
}

When your service is under load or your database is struggling, every health check request adds more pressure. Load balancers checking health every few seconds across multiple instances can turn a slow database into an outage.

This library runs checks in the background on intervals and serves the cached status to HTTP requests. With a 30-second interval, your database is checked once every 30 seconds per instance, no matter how often the health endpoint is polled.
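The idea can be illustrated with a small standard-library sketch. It is not the library's implementation; `cachedCheck`, `errDown`, and the method names are hypothetical and exist only to show a probe running on a ticker while the handler answers from memory.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync"
	"time"
)

// errDown is a placeholder failure used by the demo below.
var errDown = errors.New("connection refused")

// cachedCheck remembers the result of the most recent probe.
type cachedCheck struct {
	mu  sync.RWMutex
	err error // nil means healthy
}

func (c *cachedCheck) set(err error) {
	c.mu.Lock()
	c.err = err
	c.mu.Unlock()
}

// code maps the cached result to an HTTP status without touching the backend.
func (c *cachedCheck) code() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if c.err != nil {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// run probes once immediately, then once per interval until stop is closed.
func (c *cachedCheck) run(probe func() error, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	c.set(probe())
	for {
		select {
		case <-ticker.C:
			c.set(probe())
		case <-stop:
			return
		}
	}
}

// ServeHTTP answers from the cache; it never calls the probe itself.
func (c *cachedCheck) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(c.code())
}

func main() {
	c := &cachedCheck{}
	stop := make(chan struct{})
	close(stop) // demo only: probe once, then return immediately
	c.run(func() error { return errDown }, 30*time.Second, stop)
	fmt.Println(c.code()) // prints 503
}
```

In a real service, `run` would be started in its own goroutine and the value registered with `http.Handle`, so the cost of a health request stays constant regardless of polling frequency.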

Flapping Checks Create Alert Fatigue

A database that's slow for one check shouldn't page your on-call engineer at 3 AM. But a database that's been failing for 30 seconds should.

Status thresholds let you require multiple consecutive failures before changing status, and multiple consecutive successes before recovering. This eliminates noise from transient issues.
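A minimal sketch of that counter logic, assuming the threshold names from `CheckConfig` and that counters reset whenever the result flips. The `debouncer` type is hypothetical and only approximates what a threshold-based status updater might do.

```go
package main

import "fmt"

type status string

const (
	initializing status = "initializing"
	passing      status = "passing"
	warning      status = "warning"
	critical     status = "critical"
)

// debouncer changes status only after enough consecutive results of one kind.
type debouncer struct {
	successesBeforePassing int
	failuresBeforeWarning  int
	failuresBeforeCritical int

	successes int
	failures  int
	current   status
}

func newDebouncer(passAfter, warnAfter, critAfter int) *debouncer {
	return &debouncer{
		successesBeforePassing: passAfter,
		failuresBeforeWarning:  warnAfter,
		failuresBeforeCritical: critAfter,
		current:                initializing,
	}
}

// observe records one check result and returns the (possibly unchanged) status.
func (d *debouncer) observe(ok bool) status {
	if ok {
		d.failures = 0
		d.successes++
		if d.successes >= d.successesBeforePassing {
			d.current = passing
		}
		return d.current
	}
	d.successes = 0
	d.failures++
	switch {
	case d.failures >= d.failuresBeforeCritical:
		d.current = critical
	case d.failures >= d.failuresBeforeWarning:
		d.current = warning
	}
	return d.current
}

func main() {
	d := newDebouncer(3, 2, 5)
	for _, ok := range []bool{false, false, false, false, false, true, true, true} {
		fmt.Println(ok, "->", d.observe(ok))
	}
}
```

A single failed check leaves the status untouched here; only the second consecutive failure moves it to warning, and the fifth to critical.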

Incident Correlation is Hard

When multiple services fail during a database outage, you get flooded with alerts. Correlating them manually wastes time during incidents.

Event tracking automatically assigns the same event ID to related failures. When you enter maintenance mode, all failures during that window share an event ID, making it trivial to group and suppress related alerts.

Recovery Actions Need Coordination

You might want to run a script when a check fails—but not every time it fails. Running a recovery script 100 times during a 5-minute outage makes things worse.

Action runners have built-in cooldowns and can be configured to run only on state transitions, not on every failed check.
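As a rough illustration, the following sketch fires a failure action only on a healthy-to-failing transition and only when a cooldown has elapsed. The `runner` type and `simulate` helper are hypothetical; the library's own gating is configured through `UnlockAfterDuration` and `UnlockOnlyAfterHealthy` and may differ in detail.

```go
package main

import (
	"fmt"
	"time"
)

// runner decides whether a failure action should fire for a check result.
type runner struct {
	cooldown time.Duration
	lastRun  time.Time
	prevOK   bool
}

// onResult returns true only on a healthy -> failing transition that falls
// outside the cooldown window of the previous run.
func (r *runner) onResult(ok bool, now time.Time) bool {
	transition := r.prevOK && !ok
	r.prevOK = ok
	if !transition {
		return false
	}
	if !r.lastRun.IsZero() && now.Sub(r.lastRun) < r.cooldown {
		return false
	}
	r.lastRun = now
	return true
}

// simulate feeds results spaced stepSeconds apart and counts action runs.
func simulate(results []bool, stepSeconds, cooldownSeconds int) int {
	r := &runner{cooldown: time.Duration(cooldownSeconds) * time.Second, prevOK: true}
	now := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	runs := 0
	for _, ok := range results {
		if r.onResult(ok, now) {
			runs++
		}
		now = now.Add(time.Duration(stepSeconds) * time.Second)
	}
	return runs
}

func main() {
	outage := make([]bool, 100) // 100 consecutive failed checks
	fmt.Println("runs during outage:", simulate(outage, 3, 300))
}
```

With these rules, a 100-check outage triggers the action once, and a check that flaps every few seconds is still held back by the cooldown.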

What You Get

Concern	DIY Effort	This Library
Background checks	Goroutines, timers, synchronization	Built-in
Debouncing/thresholds	Counter logic, state machines	Configuration
Notification routing	Channel management, fan-out	Subscribe once
Alert correlation	UUID generation, state tracking	Automatic
Maintenance windows	Flag management, conditional logic	Name a check "maintenance"
Graceful degradation	Circuit breaker patterns	`SkipOnErr: true`
Concurrent check limits	Semaphores, worker pools	`WithMaxConcurrent(n)`
Observability	Manual instrumentation	OpenTelemetry built-in

The library is ~1000 lines of tested, production-hardened code. Writing it yourself means debugging race conditions, edge cases in state transitions, and notification delivery—time better spent on your actual product.

Installation

go get github.com/bretep/health-go/v6

Requirements: Go 1.25 or later

Quick Start

package main

import (
	"log"
	"net/http"
	"time"

	"github.com/bretep/health-go/v6"
	"github.com/bretep/health-go/v6/checks/maintenance"
	healthMysql "github.com/bretep/health-go/v6/checks/mysql"
)

func main() {
	h, err := health.New(
		health.WithComponent(health.Component{
			Name:    "myservice",
			Version: "v1.0",
		}),
		health.WithSystemInfo(),
	)
	if err != nil {
		log.Fatalf("Failed to create health checker: %v", err)
	}

	// Maintenance mode - create file to enter, remove to exit
	// Use a persistent path (not /tmp) so maintenance survives reboots
	h.Register(health.CheckConfig{
		Name:                   "maintenance",
		Interval:               time.Second, // file check is cheap
		SuccessesBeforePassing: 1,           // exit maintenance immediately
		Check: maintenance.New(maintenance.Config{
			File:   "/var/lib/myservice/maintenance",
			Health: h,
		}),
	})

	h.Register(health.CheckConfig{
		Name:     "mysql",
		Timeout:  time.Second * 2,
		Interval: time.Second * 30,
		Check: healthMysql.New(healthMysql.Config{
			DSN: "user:pass@tcp(localhost:3306)/db",
		}),
	})

	http.Handle("/health", h.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}

Configuration

CheckConfig Options

type CheckConfig struct {
	// Name is the name of the resource to be checked (required)
	Name string

	// Check is the function that performs the health check (required)
	Check CheckFunc

	// Interval is how often the check runs (default: 10s, minimum: 1s)
	Interval time.Duration

	// Timeout for each check execution (default: 2s)
	Timeout time.Duration

	// SkipOnErr returns Warning instead of Critical on failure
	SkipOnErr bool

	// Status thresholds for debouncing
	SuccessesBeforePassing int  // default: 3
	FailuresBeforeWarning  int  // default: 1
	FailuresBeforeCritical int  // default: 1

	// Actions to run on status changes
	SuccessAction *Action
	WarningAction *Action
	FailureAction *Action
	TimeoutAction *Action

	// Notifiers to use for this check's notifications
	Notifiers []string
}

Status States

Status	HTTP Code	Description
`passing`	200	All checks healthy
`warning`	429	Check failed but `SkipOnErr` is true
`critical`	503	Check failed
`timeout`	503	Check exceeded timeout
`initializing`	503	Check hasn't met `SuccessesBeforePassing` threshold yet

Status Thresholds

Prevent flapping by requiring multiple consecutive results before changing status:

h.Register(health.CheckConfig{
	Name:     "database",
	Interval: time.Second * 10,
	Check:    myCheck,

	// Require 3 consecutive successes before reporting healthy
	SuccessesBeforePassing: 3,

	// Require 2 consecutive failures before warning
	FailuresBeforeWarning: 2,

	// Require 5 consecutive failures before critical
	FailuresBeforeCritical: 5,
})

Notifications

Subscribe to health status changes:

h, _ := health.New()

// Subscribe to notifications
notifications := h.Subscribe()

go func() {
	for notification := range notifications {
		fmt.Printf("Check: %s, Message: %s, EventID: %s\n",
			notification.Name,
			notification.Message,
			notification.EventID,
		)
	}
}()

// Temporarily disable notifications (e.g., during deployment)
h.NotificationsDisable(5 * time.Minute)

// Re-enable notifications
h.NotificationsEnable()

// Check if notifications are enabled
if h.NotificationsEnabled() {
	// ...
}

Notification Structure

type CheckNotification struct {
	Name       string   // Check name
	Message    string   // Status message
	Attachment []byte   // Command output (if SendCommandOutput is true)
	Tags       []string // Metadata tags (status, event_id, sequence, etc.)
	Notifiers  []string // Which notifiers to use
	EventID    string   // Correlates related alerts during an incident
	Sequence   int      // Order within the event (1, 2, 3...)
}

Action Runners

Execute commands automatically when status changes:

h.Register(health.CheckConfig{
	Name:  "database",
	Check: myCheck,

	FailureAction: &health.Action{
		Command:             "/usr/local/bin/alert-oncall.sh",
		UnlockAfterDuration: 5 * time.Minute,  // Cooldown period
		SendCommandOutput:   true,              // Include output in notification
		Notifiers:           []string{"slack", "pagerduty"},
	},

	SuccessAction: &health.Action{
		Command:                "/usr/local/bin/resolve-alert.sh",
		UnlockOnlyAfterHealthy: true,  // Only run after recovery from failure
	},
})

Action Configuration

Field	Description
`Command`	Shell command to execute
`UnlockAfterDuration`	Minimum time between executions (cooldown)
`UnlockOnlyAfterHealthy`	Only allow running after status was previously healthy
`SendCommandOutput`	Include command stdout/stderr in notification
`Notifiers`	List of notifier names to send results to

Environment Variables

Actions receive context via environment variables:

Variable	Description
`HEALTH_GO_MESSAGE`	The error message from the failed check

#!/bin/bash
# alert-oncall.sh
echo "Health check failed: $HEALTH_GO_MESSAGE"
curl -X POST "https://api.pagerduty.com/incidents" \
  -d "{\"message\": \"$HEALTH_GO_MESSAGE\"}"
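An action can also be a compiled program. The shell example interpolates the message straight into a JSON string, which breaks if the message contains quotes; the sketch below reads `HEALTH_GO_MESSAGE` and lets `encoding/json` do the escaping. The `payload` shape is hypothetical, since each alerting API defines its own schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// payload is a hypothetical alert body; adapt it to the API you call.
type payload struct {
	Message string `json:"message"`
}

// buildPayload returns the JSON body for an alert with correct escaping.
func buildPayload(message string) (string, error) {
	b, err := json.Marshal(payload{Message: message})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// HEALTH_GO_MESSAGE is set by the action runner, as described above.
	body, err := buildPayload(os.Getenv("HEALTH_GO_MESSAGE"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "building payload:", err)
		os.Exit(1)
	}
	fmt.Println(body)
}
```

The printed body can then be posted with `net/http` or piped to `curl -d @-`.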

Event Tracking

Events correlate related alerts during incidents. An event starts when a check becomes unhealthy and ends when it recovers.

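// Access the event tracker
tracker := h.EventTracker

// Get current event ID for a check (empty string if no event is active)
eventID := tracker.GetEventID("database")
log.Printf("database event: %q", eventID)

// Get all active events
events := tracker.ActiveEvents()
log.Printf("active events: %v", events)

// Check maintenance status
if tracker.IsMaintenanceActive() {
	maintenanceEventID := tracker.GetMaintenanceEventID()
	log.Printf("maintenance event: %s", maintenanceEventID)
}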

Maintenance Mode

When a check named maintenance becomes unhealthy:

A maintenance event ID is created
All new failures use this event ID (correlating them)
When maintenance ends, checks keep the event ID until they recover
This groups all maintenance-related alerts together
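The four rules above can be sketched with a small tracker. This is an illustration only: the `tracker` type is hypothetical and uses counter-based IDs such as `evt-1` purely to keep the example deterministic.

```go
package main

import "fmt"

// tracker assigns event IDs to failing checks.
type tracker struct {
	nextID        int
	events        map[string]string // check name -> active event ID
	maintenanceID string            // non-empty while maintenance is active
}

func newTracker() *tracker {
	return &tracker{events: map[string]string{}}
}

func (t *tracker) newID() string {
	t.nextID++
	return fmt.Sprintf("evt-%d", t.nextID)
}

// fail records a failing check and returns the event ID its alert should carry.
func (t *tracker) fail(check string) string {
	if id, ok := t.events[check]; ok {
		return id // still unhealthy: keep the ID it already has
	}
	var id string
	switch {
	case check == "maintenance":
		id = t.newID()
		t.maintenanceID = id
	case t.maintenanceID != "":
		id = t.maintenanceID // new failure during maintenance joins its event
	default:
		id = t.newID()
	}
	t.events[check] = id
	return id
}

// resolve ends the event for a check and returns the ID that was active.
func (t *tracker) resolve(check string) string {
	id := t.events[check]
	delete(t.events, check)
	if check == "maintenance" {
		t.maintenanceID = ""
	}
	return id
}

func main() {
	tr := newTracker()
	fmt.Println("maintenance:", tr.fail("maintenance"))
	fmt.Println("database:   ", tr.fail("database"))
	tr.resolve("maintenance")
	fmt.Println("database after maintenance ends:", tr.fail("database"))
	fmt.Println("cache (fails after maintenance):", tr.fail("cache"))
}
```

The database check keeps the maintenance event ID until it recovers, while the cache check, which fails only after maintenance has ended, starts a new event.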

Use the built-in file-based maintenance checker:

import "github.com/bretep/health-go/v6/checks/maintenance"

h, _ := health.New()

// Register the maintenance check - MUST be named "maintenance" for event correlation
h.Register(health.CheckConfig{
	Name:                   "maintenance",
	Interval:               time.Second, // file check is cheap, respond quickly
	SuccessesBeforePassing: 1,           // exit maintenance immediately when file removed
	Check: maintenance.New(maintenance.Config{
		File:   "/var/lib/myservice/maintenance", // persistent path survives reboots
		Health: h,                                 // Optional: enables notification control
	}),
})

Entering maintenance mode:

# Simple maintenance
echo "Database upgrade in progress" > /var/lib/myservice/maintenance

# With notification suppression for 1 hour
echo "Scheduled maintenance
HEALTH_GO_DISABLE_NOTIFICATIONS_3600" > /var/lib/myservice/maintenance

# Suppress notifications indefinitely
echo "HEALTH_GO_DISABLE_NOTIFICATIONS" > /var/lib/myservice/maintenance

Exiting maintenance mode:

rm /var/lib/myservice/maintenance

When the file is removed, the check passes and notifications are automatically re-enabled.
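To make the file format concrete, here is a hedged sketch of how such a file could be interpreted. The parsing rules are inferred from the examples above (a bare `HEALTH_GO_DISABLE_NOTIFICATIONS` line, or the same token followed by `_` and a number of seconds); `parseMaintenance` is a hypothetical helper, not the checker's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

const directive = "HEALTH_GO_DISABLE_NOTIFICATIONS"

// parseMaintenance scans the file content line by line for the directive.
// In this sketch, suppress == true with a zero duration means "indefinitely".
func parseMaintenance(content string) (suppress bool, d time.Duration) {
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == directive {
			return true, 0
		}
		if rest, ok := strings.CutPrefix(line, directive+"_"); ok {
			if secs, err := strconv.Atoi(rest); err == nil && secs > 0 {
				return true, time.Duration(secs) * time.Second
			}
		}
	}
	return false, 0
}

func main() {
	content := "Scheduled maintenance\nHEALTH_GO_DISABLE_NOTIFICATIONS_3600"
	suppress, d := parseMaintenance(content)
	fmt.Println(suppress, d)
}
```

Lines that do not match the directive are treated as the free-form maintenance message and ignored by the parser.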

State Persistence

By default, health check state is lost when your process restarts. This means:

Event IDs reset, breaking alert correlation
Success/failure counters reset, causing re-initialization delays
Action cooldowns reset, potentially triggering duplicate alerts

State persistence solves this by saving state to durable storage.

Using the SQLite Persister

The built-in SQLite persister provides zero-configuration persistence:

import (
	"github.com/bretep/health-go/v6"
	"github.com/bretep/health-go/v6/persister/sqlite"
)

// Create the persister
persister, err := sqlite.New(sqlite.Config{
	Path: "/var/lib/myapp/health-state.db",
	// Optional: customize debounce interval (default: 1s)
	// DebounceInterval: 500 * time.Millisecond,
})
if err != nil {
	log.Fatal(err)
}
defer persister.Close()

// Create health checker with persistence
h, err := health.New(
	health.WithStatePersister(persister),
)

What gets persisted:

Component	State
EventTracker	Event IDs, sequences, maintenance state
StatusUpdater	Success/failure counts, pending event IDs
CheckStatus	Current status and error message
ActionRunner	Status, per-action last run times and cooldowns

Design features:

Async saves with debouncing - State changes are batched (default 1s) to avoid disk I/O on every check
Soft failures - Persistence errors are logged but don't fail health checks
WAL mode - SQLite uses write-ahead logging for better concurrent access
Automatic restore - State is loaded automatically when health.New() is called
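The batching and soft-failure ideas can be sketched as follows. This is an assumption-laden illustration, not the SQLite persister's code: the `batcher` type, its string-valued state, and the method names are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batcher collects state changes and writes them in one batch per interval.
type batcher struct {
	mu      sync.Mutex
	pending map[string]string // check name -> latest unsaved state
	write   func(batch map[string]string) error
}

func newBatcher(write func(map[string]string) error) *batcher {
	return &batcher{pending: map[string]string{}, write: write}
}

// markDirty records the newest state; an older unsaved value is overwritten.
func (b *batcher) markDirty(check, state string) {
	b.mu.Lock()
	b.pending[check] = state
	b.mu.Unlock()
}

// flush writes everything pending in a single call and returns the number of
// entries written. A write error is returned to the caller, never panicked on.
func (b *batcher) flush() (int, error) {
	b.mu.Lock()
	batch := b.pending
	b.pending = map[string]string{}
	b.mu.Unlock()
	if len(batch) == 0 {
		return 0, nil
	}
	return len(batch), b.write(batch)
}

// run flushes once per interval until stop is closed, then flushes a final
// time. Errors are dropped here; a real implementation would log them.
func (b *batcher) run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			b.flush()
		case <-stop:
			b.flush()
			return
		}
	}
}

func main() {
	writes := 0
	b := newBatcher(func(batch map[string]string) error {
		writes++
		fmt.Println("writing", len(batch), "entries")
		return nil
	})
	b.markDirty("database", "critical")
	b.markDirty("database", "passing") // overwrites the unsaved value
	b.markDirty("redis", "warning")
	b.flush()
	fmt.Println("disk writes:", writes)
}
```

Three state changes result in a single write of two entries, which is the effect the debounce interval is meant to achieve.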

Saving State on Shutdown

For graceful shutdown, explicitly save state to ensure the latest changes are persisted:

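// Set up signal handling
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)

go func() {
	<-sigCh
	log.Println("Shutting down, saving state...")
	// Bound the save so a stalled disk cannot block shutdown forever
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	h.SaveState(ctx)
	cancel()
	// os.Exit skips deferred calls, so close the persister explicitly
	persister.Close()
	os.Exit(0)
}()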

Implementing a Custom Persister

For other storage backends (Redis, PostgreSQL, S3, etc.), implement the StatePersister interface:

type StatePersister interface {
	SaveEventTrackerState(ctx context.Context, state *EventTrackerState) error
	LoadEventTrackerState(ctx context.Context) (*EventTrackerState, error)
	SaveCheckState(ctx context.Context, checkName string, state *CheckState) error
	LoadCheckState(ctx context.Context, checkName string) (*CheckState, error)
	LoadAllCheckStates(ctx context.Context) (map[string]*CheckState, error)
	DeleteCheckState(ctx context.Context, checkName string) error
	Close() error
}

See _examples/custom_persister.go for a complete file-based implementation example.

When to Use Persistence

Scenario	Recommendation
Short-lived processes (serverless, batch jobs)	Skip persistence
Long-running services with infrequent restarts	Optional
Services with action cooldowns you want preserved	Recommended
Services where alert correlation across restarts matters	Recommended
High-availability setups with rolling deploys	Recommended

Pause/Resume Checks

Dynamically control individual checks:

h.Register(health.CheckConfig{
	Name:  "database",
	Check: myCheck,
})

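// Get the check config.
// Note: h.checks is an unexported field, so this lookup compiles only inside
// package health; the index below lists no exported accessor for registered checks.
check := h.checks["database"]

// Pause the check (stops running)
check.Pause()

// Resume the check (starts running again)
check.Start()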

Custom Check Functions

func myCustomCheck(ctx context.Context) health.CheckResponse {
	// Perform health check logic
	err := checkSomething()

	if err != nil {
		return health.CheckResponse{
			Error:     err,
			IsWarning: false,  // true = Warning, false = Critical
		}
	}

	return health.CheckResponse{}  // Healthy
}

h.Register(health.CheckConfig{
	Name:  "custom",
	Check: myCustomCheck,
})

Disabling Notifications Per-Response

func myCheck(ctx context.Context) health.CheckResponse {
	// Don't send notification for this specific response
	return health.CheckResponse{
		Error:          errors.New("expected transient error"),
		NoNotification: true,
	}
}

HTTP Handlers

Standard Handler

http.Handle("/health", h.Handler())

HandlerFunc

// Works with any router, for example chi
r := chi.NewRouter()
r.Get("/health", h.HandlerFunc)

// Or with gorilla/mux
m := mux.NewRouter()
m.HandleFunc("/health", h.HandlerFunc)

Response Format

Healthy (200 OK):

{
  "status": "passing",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "system": {
    "version": "go1.25.0",
    "goroutines_count": 12,
    "total_alloc_bytes": 1234567,
    "heap_objects_count": 5678,
    "alloc_bytes": 234567
  },
  "component": {
    "name": "myservice",
    "version": "v1.0"
  }
}

Unhealthy (503 Service Unavailable):

{
  "status": "critical",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "failures": {
    "database": "connection refused",
    "redis": "timeout after 2s"
  },
  "system": { ... },
  "component": { ... }
}
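A monitoring client can decode these responses with the standard library. The sketch below assumes the JSON field names shown in the examples above (`status`, `timestamp`, `failures`); `healthResponse` and `decode` are hypothetical names, and the `system` and `component` objects are omitted for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// healthResponse models the parts of the response a client usually needs.
type healthResponse struct {
	Status    string            `json:"status"`
	Timestamp time.Time         `json:"timestamp"`
	Failures  map[string]string `json:"failures"`
}

// decode parses a response body from the health endpoint.
func decode(body []byte) (healthResponse, error) {
	var r healthResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

// sample is a compact version of the unhealthy response shown above.
const sample = `{
  "status": "critical",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "failures": {
    "database": "connection refused",
    "redis": "timeout after 2s"
  }
}`

func main() {
	r, err := decode([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(r.Status, len(r.Failures))
	for name, msg := range r.Failures {
		fmt.Printf("%s: %s\n", name, msg)
	}
}
```

Since the `failures` key is absent from healthy responses, the map is simply nil in that case and ranging over it is safe.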

Options

h, err := health.New(
	// Add component metadata
	health.WithComponent(health.Component{
		Name:    "api-server",
		Version: "v2.1.0",
	}),

	// Include Go runtime metrics in response
	health.WithSystemInfo(),

	// Limit concurrent check execution
	health.WithMaxConcurrent(4),

	// Add OpenTelemetry tracing
	health.WithTracerProvider(tp, "health-checks"),

	// Enable state persistence (see State Persistence section)
	health.WithStatePersister(persister),

	// Register checks at creation
	health.WithChecks(
		health.CheckConfig{Name: "db", Check: dbCheck},
		health.CheckConfig{Name: "cache", Check: cacheCheck},
	),
)
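To show what a concurrency limit such as `WithMaxConcurrent(4)` buys you, here is a sketch of one common technique, a buffered channel used as a semaphore. How the library enforces the limit internally is not stated in this document, so treat `runLimited` as a hypothetical illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runLimited runs every task, never more than limit at once, and returns the
// highest number of tasks observed running at the same time.
func runLimited(tasks []func(), limit int) int {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var running, peak atomic.Int32

	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit tasks are in flight
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done
			n := running.Add(1)
			for { // record a new peak if this is the highest count so far
				p := peak.Load()
				if n <= p || peak.CompareAndSwap(p, n) {
					break
				}
			}
			task()
			running.Add(-1)
		}(task)
	}
	wg.Wait()
	return int(peak.Load())
}

func main() {
	tasks := make([]func(), 20)
	for i := range tasks {
		tasks[i] = func() {} // stand-in for a health check
	}
	peak := runLimited(tasks, 4)
	fmt.Println("peak concurrency within limit:", peak <= 4)
}
```

Capping concurrency keeps a large set of checks from opening connections to every backend at the same instant.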

Using Built-in Checkers

import (
	"github.com/bretep/health-go/v6"
	"github.com/bretep/health-go/v6/checks/http"
	"github.com/bretep/health-go/v6/checks/maintenance"
	"github.com/bretep/health-go/v6/checks/mysql"
	"github.com/bretep/health-go/v6/checks/postgres"
	"github.com/bretep/health-go/v6/checks/redis"
)

// HTTP endpoint check
h.Register(health.CheckConfig{
	Name:    "external-api",
	Timeout: time.Second * 5,
	Check: http.New(http.Config{
		URL:            "https://api.example.com/health",
		RequestTimeout: time.Second * 3,
	}),
})

// MySQL check
h.Register(health.CheckConfig{
	Name: "mysql",
	Check: mysql.New(mysql.Config{
		DSN: "user:pass@tcp(localhost:3306)/mydb",
	}),
})

// PostgreSQL check
h.Register(health.CheckConfig{
	Name: "postgres",
	Check: postgres.New(postgres.Config{
		DSN: "postgres://user:pass@localhost:5432/mydb?sslmode=disable",
	}),
})

// Redis check
h.Register(health.CheckConfig{
	Name: "redis",
	Check: redis.New(redis.Config{
		DSN: "redis://localhost:6379",
	}),
})

Testing

The library includes comprehensive tests with real service integrations:

# Run unit tests
go test ./...

# Run with race detector
go test -race ./...

# Run with coverage
go test -coverprofile=coverage.out ./...

# Run integration tests (requires Docker)
docker compose up -d
go test -race ./...
docker compose down

Examples

See the _examples directory for complete examples:

Example	Description
server.go	Basic usage with multiple check types
server_with_persistence.go	Using SQLite persister for state persistence
custom_persister.go	Implementing a custom `StatePersister`

Migration from v5

Update import paths:

// Old
import "github.com/bretep/health-go/v5"

// New
import "github.com/bretep/health-go/v6"

Update Redis import if using the Redis checker:

// The redis client package was renamed upstream
// No code changes needed, just `go mod tidy`

Ensure Go 1.25+ is installed
Run go mod tidy to update dependencies

Contributing

Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request

Development

# Install dependencies
go mod download

# Run linter
golangci-lint run ./...

# Run tests with Docker services
docker compose up -d
go test -race -cover ./...
docker compose down

License

This project is licensed under the MIT License - see the LICENSE file for details.

This project is a fork of github.com/hellofresh/health-go with additional features. See NOTICE for attribution details.

Documentation ¶

Index ¶

type Action
- func (a Action) Run(message string) (notification CheckNotification)
type ActionRunner
- func NewActionRunner(checkName string, ...) *ActionRunner
- func (a *ActionRunner) Failure(message string, eventID string)
- func (a *ActionRunner) GetState() *ActionRunnerState
- func (a *ActionRunner) RestoreState(state *ActionRunnerState)
- func (a *ActionRunner) Success(message string, eventID string)
- func (a *ActionRunner) Timeout(message string, eventID string)
- func (a *ActionRunner) Warning(message string, eventID string)
type ActionRunnerState
type ActionState
type Check
type CheckConfig
- func (c *CheckConfig) Pause()
- func (c *CheckConfig) Start()
type CheckFunc
type CheckNotification
type CheckResponse
type CheckState
type CheckStatus
- func (s *CheckStatus) Get() (status Status, err error)
- func (s *CheckStatus) Update(status Status, err error)
type Component
type EventTracker
- func NewEventTracker() *EventTracker
- func (t *EventTracker) ActiveEvents() map[string]string
- func (t *EventTracker) ClearSequence(eventID string)
- func (t *EventTracker) GetEventID(checkName string) string
- func (t *EventTracker) GetMaintenanceEventID() string
- func (t *EventTracker) GetNextSequence(eventID string) int
- func (t *EventTracker) GetOrCreateEventID(checkName string, status Status) string
- func (t *EventTracker) GetState() *EventTrackerState
- func (t *EventTracker) IsMaintenanceActive() bool
- func (t *EventTracker) RestoreState(state *EventTrackerState)
type EventTrackerState
type Health
- func New(opts ...Option) (*Health, error)
- func (h *Health) Handler() http.Handler
- func (h *Health) HandlerFunc(w http.ResponseWriter, r *http.Request)
- func (h *Health) NotificationsDisable(suppressTime time.Duration)
- func (h *Health) NotificationsEnable()
- func (h *Health) NotificationsEnabled() bool
- func (h *Health) Persister() StatePersister
- func (h *Health) Register(c CheckConfig) error
- func (h *Health) SaveState(ctx context.Context)
- func (h *Health) Status(ctx context.Context) Check
- func (h *Health) Subscribe() (c <-chan CheckNotification)
type NoopPersister
- func NewNoopPersister() *NoopPersister
- func (p *NoopPersister) Close() error
- func (p *NoopPersister) DeleteCheckState(_ context.Context, _ string) error
- func (p *NoopPersister) LoadAllCheckStates(_ context.Context) (map[string]*CheckState, error)
- func (p *NoopPersister) LoadCheckState(_ context.Context, _ string) (*CheckState, error)
- func (p *NoopPersister) LoadEventTrackerState(_ context.Context) (*EventTrackerState, error)
- func (p *NoopPersister) SaveCheckState(_ context.Context, _ string, _ *CheckState) error
- func (p *NoopPersister) SaveEventTrackerState(_ context.Context, _ *EventTrackerState) error
type Notifications
- func NewNotificationSender(channel chan CheckNotification) *Notifications
- func (n *Notifications) Disable(disableDuration time.Duration)
- func (n *Notifications) Enable()
- func (n *Notifications) Enabled() bool
- func (n *Notifications) Send(notification CheckNotification)
type Option
- func WithChecks(checks ...CheckConfig) Option
- func WithComponent(component Component) Option
- func WithMaxConcurrent(n int) Option
- func WithStatePersister(p StatePersister) Option
- func WithSystemInfo() Option
- func WithTracerProvider(tp trace.TracerProvider, instrumentationName string) Option
type StatePersister
type Status
type StatusUpdater
- func NewStatusUpdater(successesBeforePassing, failuresBeforeWarning, failuresBeforeCritical int, ...) *StatusUpdater
- func NewStatusUpdaterSilent(successesBeforePassing, failuresBeforeWarning, failuresBeforeCritical int, ...) *StatusUpdater
- func (s *StatusUpdater) GetState() *CheckState
- func (s *StatusUpdater) RestoreState(state *CheckState)
type System

Types ¶

type Action ¶

type Action struct {
	Command                string
	UnlockAfterDuration    time.Duration
	UnlockOnlyAfterHealthy bool
	SendCommandOutput      bool
	// Notifiers list of enabled notifiers
	Notifiers []string
	// contains filtered or unexported fields
}

Action contains configuration for running an action

func (Action) Run ¶

func (a Action) Run(message string) (notification CheckNotification)

Run executes the action command

type ActionRunner ¶

type ActionRunner struct {
	// contains filtered or unexported fields
}

ActionRunner keeps track of a check's actions

func NewActionRunner ¶

func NewActionRunner(checkName string, successAction, warningAction, failureAction, timeoutAction *Action, notifications *Notifications, eventTracker *EventTracker) *ActionRunner

NewActionRunner returns a new ActionRunner

func (*ActionRunner) Failure ¶

func (a *ActionRunner) Failure(message string, eventID string)

Failure handles a failed check result

func (*ActionRunner) GetState ¶ added in v6.0.1

func (a *ActionRunner) GetState() *ActionRunnerState

GetState returns a snapshot of the ActionRunner state for persistence.

func (*ActionRunner) RestoreState ¶ added in v6.0.1

func (a *ActionRunner) RestoreState(state *ActionRunnerState)

RestoreState restores the ActionRunner state from a persisted snapshot.

func (*ActionRunner) Success ¶

func (a *ActionRunner) Success(message string, eventID string)

Success handles a successful check result

func (*ActionRunner) Timeout ¶

func (a *ActionRunner) Timeout(message string, eventID string)

Timeout handles a timed out check result

func (*ActionRunner) Warning ¶

func (a *ActionRunner) Warning(message string, eventID string)

Warning handles a warning check result

type ActionRunnerState ¶ added in v6.0.1

type ActionRunnerState struct {
	// Status is the current action runner status
	Status Status `json:"status"`

	// Per-action state
	SuccessAction *ActionState `json:"success_action,omitempty"`
	WarningAction *ActionState `json:"warning_action,omitempty"`
	FailureAction *ActionState `json:"failure_action,omitempty"`
	TimeoutAction *ActionState `json:"timeout_action,omitempty"`
}

ActionRunnerState represents the persistable state of an ActionRunner.

type ActionState ¶ added in v6.0.1

type ActionState struct {
	// LastRun is when the action was last executed
	LastRun time.Time `json:"last_run"`

	// CanRun indicates whether the action is eligible to run
	CanRun bool `json:"can_run"`
}

ActionState represents the persistable state of an Action.

type Check ¶

type Check struct {
	// Status is the check.
	Status Status `json:"check"`
	// Timestamp is the time in which the check occurred.
	Timestamp time.Time `json:"timestamp"`
	// Failures holds the failed checks along with their messages.
	Failures map[string]string `json:"failures,omitempty"`
	// System holds information of the go process.
	*System `json:"system,omitempty"`
	// Component holds information on the component for which checks are made
	Component `json:"component"`
}

Check represents the health check response.

type CheckConfig ¶

type CheckConfig struct {
	// Name is the name of the resource to be checked.
	Name string
	// Interval is how often the health check should run
	Interval time.Duration
	// Timeout is the timeout defined for every check.
	Timeout time.Duration
	// SkipOnErr, if set to true, reports a failure of this check as a warning instead of critical, keeping the error message from the failed resource.
	SkipOnErr bool
	// Check is the func which executes the check.
	Check CheckFunc
	// Status
	Status *StatusUpdater
	// SuccessesBeforePassing number of passing checks before reporting as passing
	SuccessesBeforePassing int
	// FailuresBeforeWarning number of failing checks before reporting as warning
	FailuresBeforeWarning int
	// FailuresBeforeCritical number of failing checks before reporting as critical
	FailuresBeforeCritical int
	// SuccessAction configuration
	SuccessAction *Action
	// WarningAction configuration
	WarningAction *Action
	// FailureAction configuration
	FailureAction *Action
	// TimeoutAction configuration
	TimeoutAction *Action
	// Notifiers list of enabled notifiers
	Notifiers []string
	// contains filtered or unexported fields
}

CheckConfig carries the parameters to run the check.

func (*CheckConfig) Pause ¶

func (c *CheckConfig) Pause()

Pause a health check

func (*CheckConfig) Start ¶

func (c *CheckConfig) Start()

Start a health check

type CheckFunc ¶

type CheckFunc func(ctx context.Context) CheckResponse

CheckFunc is the func which executes the check.

type CheckNotification ¶

type CheckNotification struct {
	Name       string
	Message    string
	Attachment []byte
	Tags       []string
	Notifiers  []string
	EventID    string // Event ID for correlating related alerts during an incident
	Sequence   int    // Sequence number within the event (1, 2, 3...)
}

CheckNotification represents a notification sent when check status changes.

type CheckResponse ¶

type CheckResponse struct {
	// Error message
	Error error

	// IsWarning, if set to true, reports the error as a warning instead of critical.
	IsWarning bool

	// NoNotification disables a notification for this response
	NoNotification bool
}

CheckResponse is returned by a check function.

type CheckState ¶ added in v6.0.1

type CheckState struct {
	// StatusUpdater state
	Successes      int    `json:"successes"`
	Failures       int    `json:"failures"`
	PendingEventID string `json:"pending_event_id"`

	// CheckStatus state
	Status   Status `json:"status"`
	ErrorMsg string `json:"error_msg"`

	// ActionRunner state
	ActionRunnerState *ActionRunnerState `json:"action_runner_state,omitempty"`

	// UpdatedAt is when this state was last updated
	UpdatedAt time.Time `json:"updated_at"`
}

CheckState represents the persistable state of a health check.

type CheckStatus ¶

type CheckStatus struct {
	// Status is the check.
	Status Status
	// Error informational message about the Status
	Error error
	// contains filtered or unexported fields
}

CheckStatus holds the current status of a check.

func (*CheckStatus) Get ¶

func (s *CheckStatus) Get() (status Status, err error)

Get returns the status and error of the running check

func (*CheckStatus) Update ¶

func (s *CheckStatus) Update(status Status, err error)

Update sets the status and error of the running check

type Component ¶

type Component struct {
	// Name is the name of the component.
	Name string `json:"name"`
	// Version is the component version.
	Version string `json:"version"`
}

Component descriptive values about the component for which checks are made

type EventTracker ¶

type EventTracker struct {
	// contains filtered or unexported fields
}

EventTracker manages event IDs for health check incidents. An event starts when a check transitions from healthy to unhealthy, and ends when it returns to healthy.

Special handling for maintenance mode:

- When maintenance check becomes unhealthy, a maintenance event_id is created
- All checks that fail during maintenance use the maintenance event_id
- When maintenance ends, checks that are still unhealthy keep the maintenance event_id
- Only when each individual check becomes healthy does it clear its event_id

func NewEventTracker ¶

func NewEventTracker() *EventTracker

NewEventTracker creates a new event tracker

func (*EventTracker) ActiveEvents ¶

func (t *EventTracker) ActiveEvents() map[string]string

ActiveEvents returns a copy of all active events

func (*EventTracker) ClearSequence ¶

func (t *EventTracker) ClearSequence(eventID string)

ClearSequence removes sequence tracking for an event (call when event ends)

func (*EventTracker) GetEventID ¶

func (t *EventTracker) GetEventID(checkName string) string

GetEventID returns the current event ID for a check without modifying state. Returns empty string if no active event.

func (*EventTracker) GetMaintenanceEventID ¶

func (t *EventTracker) GetMaintenanceEventID() string

GetMaintenanceEventID returns the current maintenance event ID

func (*EventTracker) GetNextSequence ¶

func (t *EventTracker) GetNextSequence(eventID string) int

GetNextSequence returns the next sequence number for an event and increments counter. Returns 0 if eventID is empty.

func (*EventTracker) GetOrCreateEventID ¶

func (t *EventTracker) GetOrCreateEventID(checkName string, status Status) string

GetOrCreateEventID returns the active event ID for a check, creating a new one if none exists (for failure/warning notifications). For success notifications, this returns the event_id that was active (so downstream systems know which event ended) and clears it from the tracker.

Special maintenance behavior:

- If this is the maintenance check becoming unhealthy, creates maintenance event_id
- If maintenance is active and another check fails, uses maintenance event_id
- If maintenance ends but a check is still unhealthy, keeps maintenance event_id
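The rules above can be modelled in a few lines. The sketch below is a simplified, single-goroutine toy and not the library's EventTracker: the type toyTracker, the check name "maintenance", and the "evt-N" ID format are all assumptions chosen for illustration. It only shows how an event ID is created, shared during maintenance, kept after maintenance ends, and cleared on recovery.

```go
package main

import "fmt"

// maintenanceCheck is a hypothetical name for the maintenance check.
const maintenanceCheck = "maintenance"

// toyTracker is a simplified model of the documented rules.
// It is not safe for concurrent use.
type toyTracker struct {
	next        int               // counter used to mint hypothetical IDs
	events      map[string]string // check name -> active event ID
	maintenance string            // active maintenance event ID, "" if none
}

func newToyTracker() *toyTracker {
	return &toyTracker{events: map[string]string{}}
}

// eventID returns the event ID to attach to a notification for the check.
// healthy=false starts or continues an event; healthy=true ends it.
func (t *toyTracker) eventID(check string, healthy bool) string {
	if healthy {
		id := t.events[check] // ID of the event that just ended, if any
		delete(t.events, check)
		if check == maintenanceCheck {
			t.maintenance = ""
		}
		return id
	}
	if id, ok := t.events[check]; ok {
		return id // still unhealthy: keep the existing ID
	}
	var id string
	if check != maintenanceCheck && t.maintenance != "" {
		id = t.maintenance // failed during maintenance: share its ID
	} else {
		t.next++
		id = fmt.Sprintf("evt-%d", t.next)
	}
	if check == maintenanceCheck {
		t.maintenance = id
	}
	t.events[check] = id
	return id
}

func main() {
	tr := newToyTracker()
	fmt.Println(tr.eventID("maintenance", false)) // evt-1: maintenance starts
	fmt.Println(tr.eventID("db", false))          // evt-1: fails during maintenance
	fmt.Println(tr.eventID("maintenance", true))  // evt-1: maintenance ends
	fmt.Println(tr.eventID("db", false))          // evt-1: still unhealthy, ID kept
	fmt.Println(tr.eventID("db", true))           // evt-1: recovered, ID cleared
	fmt.Println(tr.eventID("db", false))          // evt-2: a new incident
}
```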

func (*EventTracker) GetState ¶ added in v6.0.1

func (t *EventTracker) GetState() *EventTrackerState

GetState returns a snapshot of the EventTracker state for persistence.

func (*EventTracker) IsMaintenanceActive ¶

func (t *EventTracker) IsMaintenanceActive() bool

IsMaintenanceActive returns whether maintenance mode is currently active

func (*EventTracker) RestoreState ¶ added in v6.0.1

func (t *EventTracker) RestoreState(state *EventTrackerState)

RestoreState restores the EventTracker state from a persisted snapshot. This should be called before any checks are registered.

type EventTrackerState ¶ added in v6.0.1

type EventTrackerState struct {
	// EventIDs maps check names to their active event IDs
	EventIDs map[string]string `json:"event_ids"`

	// Sequences maps event IDs to their current sequence numbers
	Sequences map[string]int `json:"sequences"`

	// MaintenanceEventID is the current maintenance event ID (empty if not in maintenance)
	MaintenanceEventID string `json:"maintenance_event_id"`

	// MaintenanceActive indicates whether maintenance mode is currently active
	MaintenanceActive bool `json:"maintenance_active"`

	// MaintenanceChecks tracks checks that started failing during maintenance
	MaintenanceChecks map[string]bool `json:"maintenance_checks"`

	// UpdatedAt is when this state was last updated
	UpdatedAt time.Time `json:"updated_at"`
}

EventTrackerState represents the persistable state of an EventTracker.

type Health ¶

type Health struct {
	NotificationsSender *Notifications
	EventTracker        *EventTracker
	// contains filtered or unexported fields
}

Health is the health-checks container

func New ¶

func New(opts ...Option) (*Health, error)

New instantiates and builds a new health check container.

func (*Health) Handler ¶

func (h *Health) Handler() http.Handler

Handler returns an HTTP handler (http.HandlerFunc).

func (*Health) HandlerFunc ¶

func (h *Health) HandlerFunc(w http.ResponseWriter, r *http.Request)

HandlerFunc is the HTTP handler function.
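Handler and HandlerFunc allow the endpoint to be mounted either as an http.Handler or as a plain handler function. The sketch below shows both styles on a standard ServeMux. To stay runnable without the library it uses fakeHealth, a hypothetical stand-in with the same two method signatures; its response body and the paths /status and /healthz are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fakeHealth is a hypothetical stand-in exposing the same two methods as
// *Health, so that the sketch compiles without the library.
type fakeHealth struct{}

// HandlerFunc writes a fixed, made-up JSON body.
func (fakeHealth) HandlerFunc(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, `{"status":"passing"}`)
}

// Handler adapts HandlerFunc to the http.Handler interface.
func (h fakeHealth) Handler() http.Handler { return http.HandlerFunc(h.HandlerFunc) }

// newMux mounts the health endpoint in both styles.
func newMux(h fakeHealth) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/status", h.Handler())        // as an http.Handler
	mux.HandleFunc("/healthz", h.HandlerFunc) // as a plain handler function
	return mux
}

// probe sends a GET request to path and returns the response status code.
func probe(handler http.Handler, path string) int {
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux(fakeHealth{})
	fmt.Println(probe(mux, "/status"), probe(mux, "/healthz"), probe(mux, "/missing"))
	// prints: 200 200 404
}
```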

func (*Health) NotificationsDisable ¶

func (h *Health) NotificationsDisable(suppressTime time.Duration)

NotificationsDisable disables notifications for the duration given by suppressTime.

func (*Health) NotificationsEnable ¶

func (h *Health) NotificationsEnable()

NotificationsEnable enables notifications

func (*Health) NotificationsEnabled ¶

func (h *Health) NotificationsEnabled() bool

NotificationsEnabled reports whether notifications are enabled.

func (*Health) Persister ¶ added in v6.0.1

func (h *Health) Persister() StatePersister

Persister returns the configured state persister. Returns nil if no persister is configured (using NoopPersister).

func (*Health) Register ¶

func (h *Health) Register(c CheckConfig) error

Register registers a check config to be performed.

func (*Health) SaveState ¶ added in v6.0.1

func (h *Health) SaveState(ctx context.Context)

SaveState persists the current state of all checks and the event tracker. This can be called periodically or before shutdown to ensure state is saved. Errors are logged but not returned - persistence failures should not impact health checks.

func (*Health) Status ¶

func (h *Health) Status(ctx context.Context) Check

Status returns the overall result of the health checks.

func (*Health) Subscribe ¶

func (h *Health) Subscribe() (c <-chan CheckNotification)

Subscribe returns a channel for receiving health check notifications

type NoopPersister ¶ added in v6.0.1

type NoopPersister struct{}

NoopPersister is a no-op implementation of StatePersister. It performs no persistence and is the default when no persister is configured.

func NewNoopPersister ¶ added in v6.0.1

func NewNoopPersister() *NoopPersister

NewNoopPersister creates a new no-op persister.

func (*NoopPersister) Close ¶ added in v6.0.1

func (p *NoopPersister) Close() error

Close is a no-op.

func (*NoopPersister) DeleteCheckState ¶ added in v6.0.1

func (p *NoopPersister) DeleteCheckState(_ context.Context, _ string) error

DeleteCheckState is a no-op.

func (*NoopPersister) LoadAllCheckStates ¶ added in v6.0.1

func (p *NoopPersister) LoadAllCheckStates(_ context.Context) (map[string]*CheckState, error)

LoadAllCheckStates returns an empty map.

func (*NoopPersister) LoadCheckState ¶ added in v6.0.1

func (p *NoopPersister) LoadCheckState(_ context.Context, _ string) (*CheckState, error)

LoadCheckState returns nil (no persisted state).

func (*NoopPersister) LoadEventTrackerState ¶ added in v6.0.1

func (p *NoopPersister) LoadEventTrackerState(_ context.Context) (*EventTrackerState, error)

LoadEventTrackerState returns nil (no persisted state).

func (*NoopPersister) SaveCheckState ¶ added in v6.0.1

func (p *NoopPersister) SaveCheckState(_ context.Context, _ string, _ *CheckState) error

SaveCheckState is a no-op.

func (*NoopPersister) SaveEventTrackerState ¶ added in v6.0.1

func (p *NoopPersister) SaveEventTrackerState(_ context.Context, _ *EventTrackerState) error

SaveEventTrackerState is a no-op.

type Notifications ¶

type Notifications struct {
	// contains filtered or unexported fields
}

Notifications manages publishing notifications

func NewNotificationSender ¶

func NewNotificationSender(channel chan CheckNotification) *Notifications

NewNotificationSender creates a Notifications sender for the given channel.

func (*Notifications) Disable ¶

func (n *Notifications) Disable(disableDuration time.Duration)

Disable disables notifications for disableDuration.

func (*Notifications) Enable ¶

func (n *Notifications) Enable()

Enable enables notifications.

func (*Notifications) Enabled ¶

func (n *Notifications) Enabled() bool

Enabled reports whether notifications are enabled.

func (*Notifications) Send ¶

func (n *Notifications) Send(notification CheckNotification)

Send sends a notification.

type Option ¶

type Option func(*Health) error

Option is the health-container options type
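Option follows the functional options pattern: each option is a function that mutates the container or returns an error, and the constructor applies them in order. The sketch below illustrates the pattern with local stand-ins only. The names container, option, withMaxConcurrent, withSystemInfo and newContainer, the default value, and the validation rule are hypothetical and are not taken from the library.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a hypothetical stand-in for Health with two settings.
type container struct {
	maxConcurrent int
	systemInfo    bool
}

// option mirrors the shape of Option: it mutates the container or fails.
type option func(*container) error

func withMaxConcurrent(n int) option {
	return func(c *container) error {
		if n < 1 { // hypothetical validation rule
			return errors.New("max concurrent must be at least 1")
		}
		c.maxConcurrent = n
		return nil
	}
}

func withSystemInfo() option {
	return func(c *container) error {
		c.systemInfo = true
		return nil
	}
}

// newContainer applies options in order and stops at the first error.
func newContainer(opts ...option) (*container, error) {
	c := &container{maxConcurrent: 1} // hypothetical default
	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, err
		}
	}
	return c, nil
}

func main() {
	c, err := newContainer(withMaxConcurrent(4), withSystemInfo())
	fmt.Println(c.maxConcurrent, c.systemInfo, err) // 4 true <nil>

	_, err = newContainer(withMaxConcurrent(0))
	fmt.Println(err) // max concurrent must be at least 1
}
```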

func WithChecks ¶

func WithChecks(checks ...CheckConfig) Option

WithChecks adds checks to a newly instantiated health-container.

func WithComponent ¶

func WithComponent(component Component) Option

WithComponent sets the description of the component to which the checks refer.

func WithMaxConcurrent ¶

func WithMaxConcurrent(n int) Option

WithMaxConcurrent sets the maximum number of concurrently running checks. Set it to 1 to run all checks sequentially.
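A common way to cap concurrency in Go is a buffered channel used as a counting semaphore. The sketch below shows that general technique; it is an illustration and makes no claim about how the library enforces the limit. The function runLimited is a hypothetical helper that also records the peak concurrency it observed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs all tasks with at most maxConcurrent in flight at once and
// returns the highest concurrency it observed.
func runLimited(tasks []func(), maxConcurrent int) int {
	if maxConcurrent < 1 {
		maxConcurrent = 1 // an unbuffered channel would deadlock below
	}
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var (
		wg            sync.WaitGroup
		running, peak atomic.Int32
	)
	for _, task := range tasks {
		sem <- struct{}{} // blocks while maxConcurrent tasks are running
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the task returns
			n := running.Add(1)
			// Raise peak to n unless another goroutine already recorded more.
			for p := peak.Load(); n > p; p = peak.Load() {
				if peak.CompareAndSwap(p, n) {
					break
				}
			}
			task()
			running.Add(-1)
		}(task)
	}
	wg.Wait()
	return int(peak.Load())
}

func main() {
	tasks := make([]func(), 8)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(10 * time.Millisecond) }
	}
	fmt.Println("peak with limit 1:", runLimited(tasks, 1))
	fmt.Println("peak with limit 3:", runLimited(tasks, 3))
}
```

With a limit of 1 the tasks run strictly one after another, which matches the sequential mode described above.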

func WithStatePersister ¶ added in v6.0.1

func WithStatePersister(p StatePersister) Option

WithStatePersister sets the state persister for persisting health check state across process restarts. If not set, a no-op persister is used (no persistence).

func WithSystemInfo ¶

func WithSystemInfo() Option

WithSystemInfo enables the option to return system information about the go process.

func WithTracerProvider ¶

func WithTracerProvider(tp trace.TracerProvider, instrumentationName string) Option

WithTracerProvider sets the trace provider for the checks and the instrumentation name used to obtain a tracer from that provider.

type StatePersister ¶ added in v6.0.1

type StatePersister interface {
	// SaveEventTrackerState persists the event tracker state.
	SaveEventTrackerState(ctx context.Context, state *EventTrackerState) error

	// LoadEventTrackerState loads the persisted event tracker state.
	// Returns nil, nil if no state exists.
	LoadEventTrackerState(ctx context.Context) (*EventTrackerState, error)

	// SaveCheckState persists the state for a single check.
	SaveCheckState(ctx context.Context, checkName string, state *CheckState) error

	// LoadCheckState loads the persisted state for a single check.
	// Returns nil, nil if no state exists for the check.
	LoadCheckState(ctx context.Context, checkName string) (*CheckState, error)

	// LoadAllCheckStates loads all persisted check states.
	LoadAllCheckStates(ctx context.Context) (map[string]*CheckState, error)

	// DeleteCheckState removes persisted state for a check.
	DeleteCheckState(ctx context.Context, checkName string) error

	// Close releases any resources held by the persister.
	Close() error
}

StatePersister defines the interface for persisting health check state. Implementations should be safe for concurrent use.
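As an illustration of the interface, the following is a minimal sketch of an in-memory persister, for example for tests. It is not part of the library: memoryPersister, newMemoryPersister and saveAndLoad are hypothetical names, and CheckState and EventTrackerState are trimmed local stand-ins so that the file compiles on its own. The method set mirrors StatePersister, and a mutex keeps it safe for concurrent use.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// CheckState and EventTrackerState are trimmed local stand-ins for the
// documented types so that this sketch compiles without the library.
type CheckState struct {
	Failures int
	Status   string
}

type EventTrackerState struct{ EventIDs map[string]string }

// memoryPersister is a hypothetical in-memory persister whose method set
// mirrors the StatePersister interface.
type memoryPersister struct {
	mu      sync.RWMutex
	tracker *EventTrackerState
	checks  map[string]*CheckState
}

func newMemoryPersister() *memoryPersister {
	return &memoryPersister{checks: make(map[string]*CheckState)}
}

func (p *memoryPersister) SaveEventTrackerState(_ context.Context, s *EventTrackerState) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.tracker = s // stored as-is; a real persister would serialize a copy
	return nil
}

func (p *memoryPersister) LoadEventTrackerState(_ context.Context) (*EventTrackerState, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.tracker, nil // nil, nil when nothing has been saved
}

func (p *memoryPersister) SaveCheckState(_ context.Context, name string, s *CheckState) error {
	if s == nil {
		return fmt.Errorf("nil state for check %q", name)
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	cp := *s // copy so later caller mutations are not persisted
	p.checks[name] = &cp
	return nil
}

func (p *memoryPersister) LoadCheckState(_ context.Context, name string) (*CheckState, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	s, ok := p.checks[name]
	if !ok {
		return nil, nil // no state exists for this check
	}
	cp := *s
	return &cp, nil
}

func (p *memoryPersister) LoadAllCheckStates(_ context.Context) (map[string]*CheckState, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	out := make(map[string]*CheckState, len(p.checks))
	for name, s := range p.checks {
		cp := *s
		out[name] = &cp
	}
	return out, nil
}

func (p *memoryPersister) DeleteCheckState(_ context.Context, name string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.checks, name)
	return nil
}

func (p *memoryPersister) Close() error { return nil }

// saveAndLoad stores one state and reads it back.
func saveAndLoad(p *memoryPersister, name string, s *CheckState) (*CheckState, error) {
	ctx := context.Background()
	if err := p.SaveCheckState(ctx, name, s); err != nil {
		return nil, err
	}
	return p.LoadCheckState(ctx, name)
}

func main() {
	got, err := saveAndLoad(newMemoryPersister(), "db", &CheckState{Failures: 2, Status: "warning"})
	fmt.Println(got, err)
}
```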

type Status ¶

type Status string

Status represents the status of a health check.

const (
	// StatusPassing healthcheck is passing
	StatusPassing Status = "passing"
	// StatusWarning healthcheck is failing but should not fail the component
	StatusWarning Status = "warning"
	// StatusCritical healthcheck is failing and should fail the component
	StatusCritical Status = "critical"
	// StatusTimeout healthcheck timed out and should fail the component
	StatusTimeout Status = "timeout"
	// StatusInitializing healthcheck is starting up and has not met the passing threshold
	StatusInitializing Status = "initializing"
	// MinimumInterval is the minimum time between checks
	// to prevent fork bombing a system
	MinimumInterval = time.Second
)
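The constant comments state which statuses should fail the component. A minimal sketch of a helper that encodes this follows; failsComponent is a hypothetical function, the constants are redeclared locally so that the file compiles on its own, and treating "initializing" and unknown values as failing is an assumption that the comments above do not settle.

```go
package main

import "fmt"

// Status and its constants are local copies of the documented values.
type Status string

const (
	StatusPassing      Status = "passing"
	StatusWarning      Status = "warning"
	StatusCritical     Status = "critical"
	StatusTimeout      Status = "timeout"
	StatusInitializing Status = "initializing"
)

// failsComponent reports whether a status should mark the component as
// failed: critical and timeout do, passing and warning do not.
func failsComponent(s Status) bool {
	switch s {
	case StatusCritical, StatusTimeout:
		return true
	case StatusPassing, StatusWarning:
		return false
	default:
		// Assumption: initializing and unknown values count as failing.
		return true
	}
}

func main() {
	all := []Status{StatusPassing, StatusWarning, StatusCritical, StatusTimeout, StatusInitializing}
	for _, s := range all {
		fmt.Printf("%-12s fails component: %t\n", s, failsComponent(s))
	}
}
```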

type StatusUpdater ¶

type StatusUpdater struct {
	// contains filtered or unexported fields
}

StatusUpdater keeps track of a check's status.

func NewStatusUpdater ¶

func NewStatusUpdater(successesBeforePassing, failuresBeforeWarning, failuresBeforeCritical int, actionRunner *ActionRunner, notifications *Notifications, notifiers []string, eventTracker *EventTracker) *StatusUpdater

NewStatusUpdater returns a new StatusUpdater that is in critical condition. It sends an "initializing" notification to indicate the check is starting up. Use NewStatusUpdaterSilent when restoring from persisted state to avoid spurious notifications.

func NewStatusUpdaterSilent ¶ added in v6.0.2

func NewStatusUpdaterSilent(successesBeforePassing, failuresBeforeWarning, failuresBeforeCritical int, actionRunner *ActionRunner, notifications *Notifications, notifiers []string, eventTracker *EventTracker) *StatusUpdater

NewStatusUpdaterSilent returns a new StatusUpdater without sending the "initializing" notification. This should be used when restoring state from persistence to avoid sending spurious notifications that would incorrectly resolve or duplicate existing alerts.
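The three threshold parameters suggest a counter-based state machine. The sketch below shows one plausible reading and is an assumption, not the library's implementation: it counts consecutive results, starts in the critical state as described for NewStatusUpdater, and changes status only when a threshold is reached. The type thresholdTracker and its methods are hypothetical.

```go
package main

import "fmt"

// thresholdTracker is a hypothetical model of threshold-based status changes.
type thresholdTracker struct {
	successesBeforePassing int
	failuresBeforeWarning  int
	failuresBeforeCritical int

	successes int // consecutive successes
	failures  int // consecutive failures
	status    string
}

// newThresholdTracker starts in the critical state.
func newThresholdTracker(successesBeforePassing, failuresBeforeWarning, failuresBeforeCritical int) *thresholdTracker {
	return &thresholdTracker{
		successesBeforePassing: successesBeforePassing,
		failuresBeforeWarning:  failuresBeforeWarning,
		failuresBeforeCritical: failuresBeforeCritical,
		status:                 "critical",
	}
}

// observe records one check result and returns the resulting status.
func (t *thresholdTracker) observe(ok bool) string {
	if ok {
		t.successes++
		t.failures = 0
		if t.successes >= t.successesBeforePassing {
			t.status = "passing"
		}
		return t.status
	}
	t.failures++
	t.successes = 0
	switch {
	case t.failures >= t.failuresBeforeCritical:
		t.status = "critical"
	case t.failures >= t.failuresBeforeWarning:
		t.status = "warning"
	}
	return t.status
}

func main() {
	// Two successes to pass, one failure to warn, three failures to go critical.
	tr := newThresholdTracker(2, 1, 3)
	for _, ok := range []bool{true, true, false, false, false, true} {
		fmt.Println(ok, "->", tr.observe(ok))
	}
}
```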

func (*StatusUpdater) GetState ¶ added in v6.0.1

func (s *StatusUpdater) GetState() *CheckState

GetState returns a snapshot of the StatusUpdater state for persistence.

func (*StatusUpdater) RestoreState ¶ added in v6.0.1

func (s *StatusUpdater) RestoreState(state *CheckState)

RestoreState restores the StatusUpdater state from a persisted snapshot.

type System ¶

type System struct {
	// Version is the go version.
	Version string `json:"version"`
	// GoroutinesCount is the number of the current goroutines.
	GoroutinesCount int `json:"goroutines_count"`
	// TotalAllocBytes is the total bytes allocated.
	TotalAllocBytes int `json:"total_alloc_bytes"`
	// HeapObjectsCount is the number of objects in the go heap.
	HeapObjectsCount int `json:"heap_objects_count"`
	// AllocBytes is the bytes allocated and not yet freed.
	AllocBytes int `json:"alloc_bytes"`
}

System holds runtime variables about the Go process.
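Each field corresponds to a value available from the standard runtime package. The sketch below shows one way such a struct could be filled. It uses a local copy of System, and the function collect is a hypothetical helper; the mapping of fields to runtime.MemStats members follows the field comments and is otherwise an assumption about the library.

```go
package main

import (
	"fmt"
	"runtime"
)

// System is a local copy of the documented struct.
type System struct {
	Version          string `json:"version"`
	GoroutinesCount  int    `json:"goroutines_count"`
	TotalAllocBytes  int    `json:"total_alloc_bytes"`
	HeapObjectsCount int    `json:"heap_objects_count"`
	AllocBytes       int    `json:"alloc_bytes"`
}

// collect reads the current runtime statistics of this process.
func collect() System {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return System{
		Version:          runtime.Version(),
		GoroutinesCount:  runtime.NumGoroutine(),
		TotalAllocBytes:  int(m.TotalAlloc),  // cumulative bytes allocated
		HeapObjectsCount: int(m.HeapObjects), // live heap objects
		AllocBytes:       int(m.Alloc),       // bytes allocated and not yet freed
	}
}

func main() {
	fmt.Printf("%+v\n", collect())
}
```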

Directories ¶

Path	Synopsis
_examples
checks
cassandra
grpc
http
influxdb	Package influxdb implements a health check for InfluxDB instance.
maintenance	Package maintenance implements a file-based maintenance mode check.
memcached
mongo
mysql
nats
pgx4
pgx5
postgres
rabbitmq
redis
persister
sqlite	Package sqlite provides a SQLite-based implementation of the health.StatePersister interface.
