vision

package
v0.0.0-...-23baa5b
Published: Mar 2, 2026 License: AGPL-3.0 Imports: 39 Imported by: 0

README

PhotoPrism — Vision Package

Last Updated: February 23, 2026

Overview

internal/ai/vision provides the shared model registry, request builders, and parsers that power PhotoPrism’s caption, label, face, NSFW, and future generate workflows. It reads vision.yml, normalizes models, and dispatches calls to one of three engines:

  • TensorFlow (built‑in) — default Nasnet / NSFW / Facenet models, no remote service required. Long-running TensorFlow inference can accumulate C-allocated tensor memory until GC finalizers run, so PhotoPrism periodically triggers garbage collection to return that memory to the OS; tune with PHOTOPRISM_TF_GC_EVERY (default 200, 0 disables). Lower values reduce peak RSS but increase GC overhead and can slow indexing, so keep the default unless memory pressure is severe.
  • Ollama — local or proxied multimodal LLMs. See ollama/README.md for tuning and schema details. The engine defaults to ${OLLAMA_BASE_URL:-http://ollama:11434}/api/generate, trimming any trailing slash on the base URL; set OLLAMA_BASE_URL=https://ollama.com to opt into cloud defaults.
  • OpenAI — cloud Responses API. See openai/README.md for prompts, schema variants, and header requirements.
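
Both environment variables mentioned above are typically set in the container configuration. A hypothetical docker-compose fragment (service name and values are illustrative, not recommendations):

```yaml
services:
  photoprism:
    environment:
      PHOTOPRISM_TF_GC_EVERY: "200"           # trigger GC after every 200 TF inferences (package default)
      OLLAMA_BASE_URL: "http://ollama:11434"  # base URL used by the Ollama engine alias
```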

Configuration

Models

The vision.yml file is usually kept in the storage/config directory (override with PHOTOPRISM_VISION_YAML). It defines a list of models under Models:. Key fields are captured below. If a type is omitted entirely, PhotoPrism will auto-append the built-in defaults (labels, nsfw, face, caption) so you no longer need placeholder stanzas. The Thresholds block is optional; missing or out-of-range values fall back to defaults.

| Field | Default | Notes |
|-------|---------|-------|
| Type | (required) | labels, caption, face, nsfw, generate. Drives routing and scheduling. |
| Name | derived from type/version | Display name; lower-cased by helpers. |
| Model | "" | Raw identifier override; precedence: Service.Model → Model → Name. |
| Version | latest (non-OpenAI) | OpenAI payloads omit the version. |
| Engine | inferred from service/alias | Aliases set formats, file scheme, and resolution. Explicit Service values still win. |
| Run | auto | See the Run Modes table below. |
| Default | false | Keep one per type for TensorFlow fallbacks. |
| Disabled | false | Registered but inactive. |
| Resolution | 224 (TensorFlow) / 720 (Ollama/OpenAI) | Thumbnail edge in px; TensorFlow models default to 224 unless you override. |
| System / Prompt | engine defaults | Override prompts per model. |
| Format | "" | Response hint (json, text, markdown). |
| Schema / SchemaFile | engine defaults / empty | Inline vs. file-based JSON schema (labels). |
| TensorFlow | nil | Local TF model info (paths, tags). |
| Options | nil | Sampling settings merged with engine defaults. |
| Service | nil | Remote endpoint config (see below). |

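
The optional Thresholds block can be sketched as follows. The values shown match the package defaults (Confidence: 10, Topicality: 0, NSFW: 75); the comments are illustrative:

```yaml
Thresholds:
  Confidence: 10   # minimum label confidence to keep a result
  Topicality: 0    # minimum topicality score
  NSFW: 75         # probability above which content is flagged
```
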
Run Modes
| Value | When it runs | Recommended use |
|-------|--------------|-----------------|
| auto | TensorFlow defaults during indexing; external models via metadata/schedule | Leave as-is for most setups. |
| manual | Only when explicitly invoked (CLI/API) | Experiments and diagnostics. |
| on-index | During indexing + manual | Fast built-in models only. |
| newly-indexed | Metadata worker after indexing + manual | External/Ollama/OpenAI without slowing import. |
| on-demand | Manual, metadata worker, and scheduled jobs | Broad coverage without the index path. |
| on-schedule | Scheduled jobs + manual | Nightly/cron-style runs. |
| always | Indexing, metadata, scheduled, manual | High-priority models; watch resource use. |
| never | Never executes | Keep a definition without running it. |

Note: For performance reasons, on-index is only supported for the built-in TensorFlow models.
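
For example, a configuration might keep the fast built-in labels model on the index path while deferring an external caption model to scheduled jobs (a sketch; the Run values come from the table above):

```yaml
Models:
  - Type: labels
    Default: true
    Run: on-index        # built-in TensorFlow models only

  - Type: caption
    Engine: ollama
    Run: on-schedule     # runs via scheduled jobs and manual invocation
```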

Model Options

The Options block adjusts model parameters such as temperature, top-p, and schema constraints when using Ollama or OpenAI. Rows are ordered exactly as defined in vision/model_options.go.

| Option | Engines | Default | Description |
|--------|---------|---------|-------------|
| Temperature | Ollama, OpenAI | engine default | Controls randomness with a value between 0.01 and 2.0; not used for OpenAI's GPT-5. |
| TopK | Ollama | engine default | Limits sampling to the top K tokens to reduce rare or noisy outputs. |
| TopP | Ollama, OpenAI | engine default | Nucleus sampling; keeps the smallest token set whose cumulative probability ≥ p. |
| MinP | Ollama | engine default | Drops tokens whose probability mass is below p, trimming the long tail. |
| TypicalP | Ollama | engine default | Keeps tokens with typicality under the threshold; combine with TopP/MinP for flow. |
| TfsZ | Ollama | engine default | Tail-free sampling parameter; lower values reduce repetition. |
| Seed | Ollama | random per run | Fix for reproducible outputs; unset for more variety between runs. |
| NumKeep | Ollama | engine default | How many tokens to keep from the prompt before sampling starts. |
| RepeatLastN | Ollama | engine default | Number of recent tokens considered for repetition penalties. |
| RepeatPenalty | Ollama | engine default | Multiplier >1 discourages repeating the same tokens or phrases. |
| PresencePenalty | OpenAI | engine default | Increases the likelihood of introducing new tokens by penalizing existing ones. |
| FrequencyPenalty | OpenAI | engine default | Penalizes tokens in proportion to their frequency so far. |
| PenalizeNewline | Ollama | engine default | Whether to apply repetition penalties to newline tokens. |
| Stop | Ollama, OpenAI | engine default | Array of stop sequences (e.g., ["\\n\\n"]). |
| Mirostat | Ollama | engine default | Enables Mirostat sampling (0 off, 1/2 modes). |
| MirostatTau | Ollama | engine default | Controls the surprise target for Mirostat sampling. |
| MirostatEta | Ollama | engine default | Learning rate for Mirostat adaptation. |
| NumPredict | Ollama | engine default | Ollama-specific maximum output tokens; serves the same purpose as MaxOutputTokens. |
| MaxOutputTokens | Ollama, OpenAI | engine default | Upper bound on generated tokens; adapters raise low values to defaults. |
| ForceJson | Ollama, OpenAI | engine default | Forces structured output when enabled. |
| SchemaVersion | Ollama, OpenAI | derived from schema | Override when coordinating schema migrations. |
| CombineOutputs | OpenAI | engine default | Controls whether multi-output models combine results automatically. |
| Detail | OpenAI | engine default | Controls the OpenAI vision detail level (low, high, auto). |
| NumCtx | Ollama, OpenAI | engine default | Context window length (tokens). |
| NumThread | Ollama | runtime auto | Caps CPU threads for local engines. |
| NumBatch | Ollama | engine default | Batch size for prompt processing. |
| NumGpu | Ollama | engine default | Number of GPUs to distribute work across. |
| MainGpu | Ollama | engine default | Primary GPU index when multiple GPUs are present. |
| LowVram | Ollama | engine default | Enables VRAM-saving mode; may reduce performance. |
| VocabOnly | Ollama | engine default | Loads the vocabulary only, for quick metadata inspection. |
| UseMmap | Ollama | engine default | Memory-maps model weights instead of fully loading them. |
| UseMlock | Ollama | engine default | Locks model weights in RAM to reduce paging. |
| Numa | Ollama | engine default | Enables NUMA-aware allocations when available. |
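
As a sketch, several of these options can be combined under a model's Options block (the values are illustrative, not recommendations):

```yaml
Models:
  - Type: labels
    Engine: ollama
    Options:
      Temperature: 0.1       # low randomness for deterministic labels
      TopP: 0.9              # nucleus sampling threshold
      Seed: 42               # fix for reproducible outputs
      NumCtx: 4096           # context window in tokens
      MaxOutputTokens: 512   # upper bound on generated tokens
      ForceJson: true        # request structured output
```
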
Model Service

Configures the endpoint URL, method, format, and authentication for Ollama, OpenAI, and other engines that perform remote HTTP requests:

| Field | Default | Notes |
|-------|---------|-------|
| Uri | required for remote | Endpoint base. Empty keeps the model local (TensorFlow). The Ollama alias fills ${OLLAMA_BASE_URL}/api/generate, defaulting to http://ollama:11434. |
| Method | POST | Override the verb if the provider needs it. |
| Key | "" | Bearer token; prefer env expansion (OpenAI: OPENAI_API_KEY, Ollama: OLLAMA_API_KEY). |
| Username / Password | "" | Injected as basic auth when the URI lacks userinfo. |
| Model | "" | Endpoint-specific override; wins over model/name. |
| Org / Project | "" | OpenAI headers (org/project IDs). |
| Think | "" | Optional reasoning hint passed as think in service requests. Supports levels like low, medium, high; string values true/false are normalized to JSON booleans on output. Omitted when empty. |
| RequestFormat / ResponseFormat | set by engine alias | Explicit values win over alias defaults. |
| FileScheme | set by engine alias (data or base64) | Controls image transport. |
| Disabled | false | Disable the endpoint without removing the model. |

Authentication: All credentials and identifiers support ${ENV_VAR} expansion. Service.Key sets Authorization: Bearer <token>; Username/Password injects HTTP basic authentication into the service URI when it is not already present. When Service.Key is empty, PhotoPrism defaults to OPENAI_API_KEY (OpenAI engine) or OLLAMA_API_KEY (Ollama engine), also honoring their _FILE counterparts.

Field Behavior & Precedence

  • Model identifier resolution order: Service.Model → Model → Name. Model.GetModel() returns (id, name, version), where Ollama receives name:version and other engines receive name plus a separate Version.
  • Env expansion runs for all Service credentials and Model overrides; empty or disabled models return empty identifiers.
  • Options merging: engine defaults fill missing fields; explicit values always win. Temperature is capped at MaxTemperature.
  • Authentication: Service.Key sets Authorization: Bearer <token>; Username/Password inject HTTP basic auth into the service URI when not already present.
  • Reasoning control: Service.Think maps to ApiRequest.Think and is serialized only when non-empty (omitempty). During JSON encoding, "true" / "false" are converted to boolean true / false; other non-empty values are sent as strings.
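
The identifier resolution described above can be sketched as follows. This is an illustrative standalone function, not the package's actual implementation, and it ignores env expansion and the disabled-model case:

```go
package main

import "fmt"

// resolveModelID sketches the precedence Service.Model → Model → Name.
// Ollama engines receive "name:version"; other engines receive the
// identifier alone, with the version carried separately.
func resolveModelID(serviceModel, model, name, version, engine string) string {
	id := name
	switch {
	case serviceModel != "":
		id = serviceModel
	case model != "":
		id = model
	}
	if engine == "ollama" && version != "" {
		return id + ":" + version
	}
	return id
}

func main() {
	fmt.Println(resolveModelID("", "gemma3", "labels", "latest", "ollama")) // gemma3:latest
	fmt.Println(resolveModelID("gpt-5-mini", "", "caption", "", "openai"))  // gpt-5-mini
}
```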

Minimal Examples

TensorFlow (built‑in defaults)

```yaml
Models:
  - Type: labels
    Default: true
    Run: auto

  - Type: nsfw
    Default: true
    Run: auto

  - Type: face
    Default: true
    Run: auto
```

Ollama Labels

```yaml
Models:
  - Type: labels
    Model: gemma3:latest
    Engine: ollama
    Run: newly-indexed
    Service:
      Uri: ${OLLAMA_BASE_URL}/api/generate
```

More Ollama guidance: internal/ai/vision/ollama/README.md.

OpenAI Captions

```yaml
Models:
  - Type: caption
    Model: gpt-5-mini
    Engine: openai
    Run: newly-indexed
    Service:
      Uri: https://api.openai.com/v1/responses
      Org: ${OPENAI_ORG}
      Project: ${OPENAI_PROJECT}
      Key: ${OPENAI_API_KEY}
```

More OpenAI guidance: internal/ai/vision/openai/README.md.

Custom TensorFlow Labels (SavedModel)

```yaml
Models:
  - Type: labels
    Name: transformer
    Engine: tensorflow
    Path: transformer   # resolved under assets/models
    Resolution: 224     # keep the standard TF input size unless your model differs
    TensorFlow:
      Output:
        Logits: true    # set true for most TF2 SavedModel classifiers
```

Custom TensorFlow Models — What’s Supported

  • Scope: Classification tasks only (labels). TensorFlow models cannot generate captions today; use Ollama or OpenAI for captions.
  • Location & paths: If Path is empty, the model is loaded from assets/models/<name> (lowercased, underscores). If Path is set, it is still searched under assets/models; absolute paths are not supported.
  • Expected files: saved_model.pb, a variables/ directory, and a labels.txt alongside the model; use TF2 SavedModel classifiers.
  • Resolution: Stays at 224px unless your model requires a different input size; adjust Resolution and the TensorFlow.Input block if needed.
  • Sources: Labels produced by TensorFlow models are recorded with source image; overriding the source isn’t supported yet.
  • Config file: vision.yml is the conventional name; in the latest version, .yaml is also supported by the loader.
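
If a custom model expects a non-standard input size, Resolution and the TensorFlow.Input block are adjusted together. A hypothetical sketch — the tensor names and dimensions depend entirely on how your SavedModel was exported:

```yaml
Models:
  - Type: labels
    Name: mymodel          # loaded from assets/models/mymodel
    Engine: tensorflow
    Resolution: 299        # must match the model's expected input edge
    TensorFlow:
      Input:
        Name: input_1      # input tensor name from your SavedModel
        Height: 299
        Width: 299
      Output:
        Logits: true       # set when the classifier outputs raw logits
```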

CLI Quick Reference

  • List models: photoprism vision ls (shows resolved IDs, engines, options, run mode, disabled flag).
  • Run a model: photoprism vision run -m labels --count 5 (use --force to bypass Run rules).
  • Validate config: photoprism vision ls --json to confirm env-expanded values without triggering calls.

When to Choose Each Engine

  • TensorFlow: fast, offline defaults for core features (labels, faces, NSFW). Zero external deps.
  • Ollama: private, GPU/CPU-hosted multimodal LLMs; best for richer captions/labels without cloud traffic.
  • OpenAI: highest quality reasoning and multimodal support; requires API key and network access.

Model Unload on Idle

PhotoPrism currently keeps TensorFlow models resident for the lifetime of the process to avoid repeated load costs. A future “model unload on idle” mode would track last-use timestamps and close the TensorFlow session/graph after a configurable idle period, releasing the model’s memory footprint back to the OS. The trade-off is higher latency and CPU overhead when a model is used again, plus extra I/O to reload weights. This may be attractive for low-frequency or memory-constrained deployments but would slow continuous indexing jobs, so it is not enabled today.

Documentation

Overview

Package vision provides a computer vision request handler and a client for using external APIs.

Copyright (c) 2018 - 2025 PhotoPrism UG. All rights reserved.

This program is free software: you can redistribute it and/or modify
it under Version 3 of the GNU Affero General Public License (the "AGPL"):
<https://docs.photoprism.app/license/agpl>

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

The AGPL is supplemented by our Trademark and Brand Guidelines,
which describe how our Brand Assets may be used:
<https://www.photoprism.app/trademark>

Feel free to send an email to [email protected] if you have questions, want to support our work, or just want to say hello.

Additional information can be found in our Developer Guide: <https://docs.photoprism.app/developer-guide/>

Index

Constants

const (
	// FormatJSON indicates JSON payloads.
	FormatJSON = "json"
)

Variables

var (
	// CachePath stores the directory used for caching downloaded vision models.
	CachePath = ""
	// ModelsPath stores the directory containing downloaded vision models.
	ModelsPath = ""
	// DownloadUrl overrides the default model download endpoint when set.
	DownloadUrl = ""
	// ServiceApi enables exposing vision APIs via the service layer when true.
	ServiceApi = false
	// ServiceUri sets the base URI for the vision service when exposed externally.
	ServiceUri = ""
	// ServiceKey provides an optional API key for the vision service.
	ServiceKey = ""
	// ServiceTimeout sets the maximum duration for service API requests.
	ServiceTimeout = 10 * time.Minute
	// ServiceMethod defines the HTTP verb used when calling the vision service.
	ServiceMethod = http.MethodPost
	// ServiceFileScheme specifies how local files are encoded when sent to the service.
	ServiceFileScheme = scheme.Data
	// ServiceRequestFormat sets the default payload format for service requests.
	ServiceRequestFormat = ApiFormatVision
	// ServiceResponseFormat sets the expected response format from the service.
	ServiceResponseFormat = ApiFormatVision
	// DefaultResolution specifies the default square resize dimension for model inputs.
	DefaultResolution = 224
	// DefaultTemperature sets the sampling temperature for compatible models.
	DefaultTemperature = 0.1
	// MaxTemperature clamps user-supplied temperatures to a safe upper bound.
	MaxTemperature = 2.0
	// DefaultSrc defines the fallback source string for generated labels.
	DefaultSrc = entity.SrcImage
	// DetectNSFWLabels toggles NSFW label detection in vision responses.
	DetectNSFWLabels = false
)
var (
	VersionLatest = "latest"
	VersionMobile = "mobile"
	Version3B     = "3b"
)

Default model version strings.

var (
	NasnetModel = &Model{
		Type:       ModelTypeLabels,
		Default:    true,
		Name:       "nasnet",
		Version:    VersionMobile,
		Resolution: 224,
		TensorFlow: &tensorflow.ModelInfo{
			TFVersion: "1.12.0",
			Tags:      []string{"photoprism"},
			Input: &tensorflow.PhotoInput{
				Name:              "input_1",
				Height:            224,
				Width:             224,
				ResizeOperation:   tensorflow.CenterCrop,
				ColorChannelOrder: tensorflow.RGB,
				Shape:             tensorflow.DefaultPhotoInputShape(),
				Intervals: []tensorflow.Interval{
					{
						Start: -1.0,
						End:   1.0,
					},
				},
				OutputIndex: 0,
			},
			Output: &tensorflow.ModelOutput{
				Name:          "predictions/Softmax",
				NumOutputs:    1000,
				OutputIndex:   0,
				OutputsLogits: false,
			},
		},
	}
	NsfwModel = &Model{
		Type:       ModelTypeNsfw,
		Default:    true,
		Name:       "nsfw",
		Version:    VersionLatest,
		Resolution: 224,
		TensorFlow: &tensorflow.ModelInfo{
			TFVersion: "1.12.0",
			Tags:      []string{"serve"},
			Input: &tensorflow.PhotoInput{
				Name:        "input_tensor",
				Height:      224,
				Width:       224,
				OutputIndex: 0,
				Shape:       tensorflow.DefaultPhotoInputShape(),
			},
			Output: &tensorflow.ModelOutput{
				Name:          "nsfw_cls_model/final_prediction",
				NumOutputs:    5,
				OutputIndex:   0,
				OutputsLogits: false,
			},
		},
	}
	FacenetModel = &Model{
		Type:       ModelTypeFace,
		Default:    true,
		Name:       "facenet",
		Version:    VersionLatest,
		Resolution: 160,
		TensorFlow: &tensorflow.ModelInfo{
			TFVersion: "1.7.1",
			Tags:      []string{"serve"},
			Input: &tensorflow.PhotoInput{
				Name:        "input",
				Height:      160,
				Width:       160,
				Shape:       tensorflow.DefaultPhotoInputShape(),
				OutputIndex: 0,
			},
			Output: &tensorflow.ModelOutput{
				Name:          "embeddings",
				NumOutputs:    512,
				OutputIndex:   0,
				OutputsLogits: false,
			},
		},
	}
	CaptionModel = &Model{
		Type:   ModelTypeCaption,
		Engine: ollama.EngineName,
		Run:    RunManual,
	}
	DefaultModels = Models{
		NasnetModel,
		NsfwModel,
		FacenetModel,
		CaptionModel,
	}
	DefaultThresholds = Thresholds{
		Confidence: 10,
		Topicality: 0,
		NSFW:       75,
	}
)

Default computer vision model configuration.

var Config = NewConfig()

Config references the current configuration options.

var (
	// ErrInvalidModel indicates an unknown or unsupported vision model name.
	ErrInvalidModel = fmt.Errorf("vision: invalid model")
)

RunTypes maps configuration strings to standard RunType model settings.

Functions

func DetectFaces

func DetectFaces(fileName string, minSize int, cacheCrop bool, expected int) (result face.Faces, err error)

DetectFaces detects faces in the specified image and generates embeddings from them.

func DetectNSFW

func DetectNSFW(images Files, mediaSrc media.Src) (result []nsfw.Result, err error)

DetectNSFW checks images for inappropriate content and generates probability scores grouped by category.

func FilterModels

func FilterModels(models []string, when RunType, allow func(ModelType, RunType) bool) []string

FilterModels takes a list of model type names and a scheduling context, and returns only the types that are allowed to run according to the supplied predicate. Empty or unknown names are ignored.

func GenerateCaption

func GenerateCaption(images Files, mediaSrc media.Src) (*CaptionResult, *Model, error)

GenerateCaption returns generated captions for the specified images.

func GenerateFaceEmbeddings

func GenerateFaceEmbeddings(imgData []byte) (embeddings face.Embeddings, err error)

GenerateFaceEmbeddings returns the embeddings for the specified face crop image.

func GenerateLabels

func GenerateLabels(images Files, mediaSrc media.Src, labelSrc entity.Src) (classify.Labels, error)

GenerateLabels finds matching labels for the specified image. Caller must pass the appropriate metadata source string (e.g., entity.SrcOllama, entity.SrcOpenAI) so that downstream indexing can record where the labels originated.

func GetCachePath

func GetCachePath() string

GetCachePath returns the cache path.

func GetFacenetModelPath

func GetFacenetModelPath() string

GetFacenetModelPath returns the absolute path of the default Facenet model.

func GetModelPath

func GetModelPath(name string) string

GetModelPath returns the absolute path of a named model file in CachePath.

func GetModelsPath

func GetModelsPath() string

GetModelsPath returns the model assets path, or an empty string if not configured or found.

func GetNasnetModelPath

func GetNasnetModelPath() string

GetNasnetModelPath returns the absolute path of the default Nasnet model.

func GetNsfwModelPath

func GetNsfwModelPath() string

GetNsfwModelPath returns the absolute path of the default NSFW model.

func PriorityFromTopicality

func PriorityFromTopicality(topicality float32) int

PriorityFromTopicality converts topicality scores to our priority scale (-2..5).

func RegisterEngine

func RegisterEngine(format ApiFormat, engine Engine)

RegisterEngine adds/overrides an engine implementation for a specific API format.

func RegisterEngineAlias

func RegisterEngineAlias(name string, info EngineInfo)

RegisterEngineAlias maps a logical engine name (e.g., "ollama") to a request/response format pair.

func ReportRunType

func ReportRunType(when RunType) string

ReportRunType returns a human-readable string for the run type, preserving the explicit value when set or "auto" when delegation is in effect.

func Resolution

func Resolution(modelType ModelType) int

Resolution returns the image resolution of the given model type.

func SetCachePath

func SetCachePath(dir string)

SetCachePath updates the cache path.

func SetCaptionFunc

func SetCaptionFunc(fn func(Files, media.Src) (*CaptionResult, *Model, error))

SetCaptionFunc overrides the caption generator. Intended for tests.

func SetLabelsFunc

func SetLabelsFunc(fn func(Files, media.Src, entity.Src) (classify.Labels, error))

SetLabelsFunc overrides the labels generator. Intended for tests.

func SetModelsPath

func SetModelsPath(dir string)

SetModelsPath updates the model assets path.

func SetNSFWFunc

func SetNSFWFunc(fn func(Files, media.Src) ([]nsfw.Result, error))

SetNSFWFunc overrides the Vision NSFW detector. Intended for tests.

func Thumb

func Thumb(modelType ModelType) (size thumb.Size)

Thumb returns the matching thumbnail size for the given model type.

Types

type ApiFormat

type ApiFormat = string

ApiFormat defines the payload format accepted by the Vision API.

const (
	// ApiFormatUrl treats inputs as HTTP(S) URLs.
	ApiFormatUrl ApiFormat = "url"
	// ApiFormatImages sends images in the native Vision format.
	ApiFormatImages ApiFormat = "images"
	// ApiFormatVision represents a Vision-internal payload.
	ApiFormatVision ApiFormat = "vision"
	// ApiFormatOllama proxies requests to Ollama models.
	ApiFormatOllama ApiFormat = ollama.ApiFormat
	// ApiFormatOpenAI proxies requests to OpenAI vision models.
	ApiFormatOpenAI ApiFormat = openai.ApiFormat
)

type ApiRequest

type ApiRequest struct {
	Id             string             `form:"id" yaml:"Id,omitempty" json:"id,omitempty"`
	Model          string             `form:"model" yaml:"Model,omitempty" json:"model,omitempty"`
	Version        string             `form:"version" yaml:"Version,omitempty" json:"version,omitempty"`
	System         string             `form:"system" yaml:"System,omitempty" json:"system,omitempty"`
	Prompt         string             `form:"prompt" yaml:"Prompt,omitempty" json:"prompt,omitempty"`
	Suffix         string             `form:"suffix" yaml:"Suffix,omitempty" json:"suffix"`
	Format         string             `form:"format" yaml:"Format,omitempty" json:"format,omitempty"`
	Url            string             `form:"url" yaml:"Url,omitempty" json:"url,omitempty"`
	Org            string             `form:"org" yaml:"Org,omitempty" json:"org,omitempty"`
	Project        string             `form:"project" yaml:"Project,omitempty" json:"project,omitempty"`
	Think          string             `form:"think" yaml:"Think,omitempty" json:"think,omitempty"`
	Options        *ModelOptions      `form:"options" yaml:"Options,omitempty" json:"options,omitempty"`
	Context        *ApiRequestContext `form:"context" yaml:"Context,omitempty" json:"context,omitempty"`
	Stream         bool               `form:"stream" yaml:"Stream,omitempty" json:"stream"`
	Images         Files              `form:"images" yaml:"Images,omitempty" json:"images,omitempty"`
	Schema         json.RawMessage    `form:"schema" yaml:"Schema,omitempty" json:"schema,omitempty"`
	ResponseFormat ApiFormat          `form:"-" yaml:"-" json:"-"`
}

ApiRequest represents a Vision API service request.

func NewApiRequest

func NewApiRequest(requestFormat ApiFormat, files Files, fileScheme scheme.Type) (result *ApiRequest, err error)

NewApiRequest returns a new service API request with the specified format and payload.

func NewApiRequestImages

func NewApiRequestImages(images Files, fileScheme scheme.Type) (*ApiRequest, error)

NewApiRequestImages returns a new Vision API request with the specified images as payload.

func NewApiRequestOllama

func NewApiRequestOllama(images Files, fileScheme scheme.Type) (*ApiRequest, error)

NewApiRequestOllama returns a new Ollama API request with the specified images as payload.

func NewApiRequestUrl

func NewApiRequestUrl(fileName string, fileScheme scheme.Type) (result *ApiRequest, err error)

NewApiRequestUrl returns a new Vision API request with the specified image Url as payload.

func (*ApiRequest) GetId

func (r *ApiRequest) GetId() string

GetId returns the request ID string and generates a random ID if none was set.

func (*ApiRequest) GetResponseFormat

func (r *ApiRequest) GetResponseFormat() ApiFormat

GetResponseFormat returns the expected response format type.

func (*ApiRequest) JSON

func (r *ApiRequest) JSON() ([]byte, error)

JSON returns the request data as JSON-encoded bytes.

func (*ApiRequest) WriteLog

func (r *ApiRequest) WriteLog()

WriteLog logs the request data when trace log mode is enabled.

type ApiRequestContext

type ApiRequestContext = []int

ApiRequestContext represents a context parameter returned from a previous request.

type ApiResponse

type ApiResponse struct {
	Id     string    `yaml:"Id,omitempty" json:"id,omitempty"`
	Code   int       `yaml:"Code,omitempty" json:"code,omitempty"`
	Error  string    `yaml:"Error,omitempty" json:"error,omitempty"`
	Model  *Model    `yaml:"Model,omitempty" json:"model,omitempty"`
	Result ApiResult `yaml:"Result,omitempty" json:"result"`
}

ApiResponse represents a Vision API service response.

func NewApiError

func NewApiError(id string, code int) ApiResponse

NewApiError generates a Vision API error response based on the specified HTTP status code.

func NewCaptionResponse

func NewCaptionResponse(id string, model *Model, result *CaptionResult) ApiResponse

NewCaptionResponse generates a new Vision API image caption service response.

func NewLabelsResponse

func NewLabelsResponse(id string, model *Model, results classify.Labels) ApiResponse

NewLabelsResponse generates a new Vision API image classification service response.

func PerformApiRequest

func PerformApiRequest(apiRequest *ApiRequest, uri, method, key string) (apiResponse *ApiResponse, err error)

PerformApiRequest performs a Vision API request and returns the result.

func (*ApiResponse) Err

func (r *ApiResponse) Err() error

Err returns an error if the request has failed.

func (*ApiResponse) HasResult

func (r *ApiResponse) HasResult() bool

HasResult checks if there is at least one result in the response data.

type ApiResult

type ApiResult struct {
	Labels     []LabelResult     `yaml:"Labels,omitempty" json:"labels,omitempty"`
	Nsfw       []nsfw.Result     `yaml:"Nsfw,omitempty" json:"nsfw,omitempty"`
	Embeddings []face.Embeddings `yaml:"Embeddings,omitempty" json:"embeddings,omitempty"`
	Caption    *CaptionResult    `yaml:"Caption,omitempty" json:"caption,omitempty"`
}

ApiResult represents the model response(s) to a Vision API service request and can optionally include data from multiple models.

func (*ApiResult) IsEmpty

func (r *ApiResult) IsEmpty() bool

IsEmpty checks if there is no result in the response data.

type CaptionResult

type CaptionResult struct {
	Text       string  `yaml:"Text,omitempty" json:"text,omitempty"`
	Source     string  `yaml:"Source,omitempty" json:"source,omitempty"`
	Confidence float32 `yaml:"Confidence,omitempty" json:"confidence,omitempty"`
}

CaptionResult represents the result generated by a caption generation model.

type ConfigValues

type ConfigValues struct {
	Models     Models     `yaml:"Models,omitempty" json:"models,omitempty"`
	Thresholds Thresholds `yaml:"Thresholds,omitempty" json:"thresholds"`
}

ConfigValues represents computer vision configuration values for the supported Model types.

func NewConfig

func NewConfig() *ConfigValues

NewConfig returns a new computer vision config with defaults.

func (*ConfigValues) IsCustom

func (c *ConfigValues) IsCustom(t ModelType) bool

IsCustom checks whether the specified type uses a custom model or service.

func (*ConfigValues) IsDefault

func (c *ConfigValues) IsDefault(t ModelType) bool

IsDefault checks whether the specified type is the built-in default model.

func (*ConfigValues) Load

func (c *ConfigValues) Load(fileName string) error

Load user settings from file.

func (*ConfigValues) Model

func (c *ConfigValues) Model(t ModelType) *Model

Model returns the first enabled model with the matching type. It returns nil if no matching model is available or every model of that type is disabled, allowing callers to chain nil-safe Model methods.

func (*ConfigValues) RunType

func (c *ConfigValues) RunType(t ModelType) RunType

RunType returns the normalized run type for the first enabled model matching the provided type. Disabled or missing models fall back to RunNever so callers can treat the result as authoritative scheduling information.

func (*ConfigValues) Save

func (c *ConfigValues) Save(fileName string) error

Save user settings to a file.

func (*ConfigValues) ShouldRun

func (c *ConfigValues) ShouldRun(t ModelType, when RunType) bool

ShouldRun reports whether the configured model for the given type is allowed to run in the specified context. It returns false when no suitable model exists or when execution is explicitly disabled.

type Engine

type Engine struct {
	Builder  RequestBuilder
	Parser   ResponseParser
	Defaults EngineDefaults
}

Engine groups the callbacks required to integrate a third-party vision service.

func EngineFor

func EngineFor(format ApiFormat) (Engine, bool)

EngineFor returns the registered engine implementation for the given API format, if any.

type EngineDefaults

type EngineDefaults interface {
	SystemPrompt(model *Model) string
	UserPrompt(model *Model) string
	SchemaTemplate(model *Model) string
	Options(model *Model) *ModelOptions
}

EngineDefaults supplies engine-specific prompt and schema defaults when they are not configured explicitly.

type EngineInfo

type EngineInfo struct {
	Uri               string
	RequestFormat     ApiFormat
	ResponseFormat    ApiFormat
	FileScheme        string
	DefaultModel      string
	DefaultResolution int
	DefaultKey        string // Optional placeholder key (e.g., ${OPENAI_API_KEY}); applied only when Service.Key is empty.
}

EngineInfo describes metadata that can be associated with an engine alias.

func EngineInfoFor

func EngineInfoFor(name string) (EngineInfo, bool)

EngineInfoFor returns the metadata associated with a logical engine name.

type Files

type Files = []string

Files holds a list of input file paths or URLs for vision requests.

type LabelResult

type LabelResult struct {
	Name           string   `yaml:"Name,omitempty" json:"name"`
	Source         string   `yaml:"Source,omitempty" json:"source"`
	Priority       int      `yaml:"Priority,omitempty" json:"priority,omitempty"`
	Confidence     float32  `yaml:"Confidence,omitempty" json:"confidence,omitempty"`
	Topicality     float32  `yaml:"Topicality,omitempty" json:"topicality,omitempty"`
	Categories     []string `yaml:"Categories,omitempty" json:"categories,omitempty"`
	NSFW           bool     `yaml:"Nsfw,omitempty" json:"nsfw,omitempty"`
	NSFWConfidence float32  `yaml:"NsfwConfidence,omitempty" json:"nsfw_confidence,omitempty"`
}

LabelResult represents a label generated by an image classification model.

func (LabelResult) ToClassify

func (r LabelResult) ToClassify(labelSrc string) classify.Label

ToClassify returns the label results as classify.Label.
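
Before results reach ToClassify, the Thresholds block described in the configuration section gates which labels are accepted. The helper below is an illustrative sketch, not part of the package API, using a local stand-in for the LabelResult fields involved:

```go
package main

import "fmt"

// labelResult is a local stand-in for the LabelResult fields used here.
type labelResult struct {
	Name       string
	Confidence float32
}

// filterByConfidence keeps only labels whose confidence meets the
// configured minimum, which is how a percentage threshold (converted
// to a 0..1 fraction) would typically gate label acceptance.
func filterByConfidence(labels []labelResult, min float32) []labelResult {
	var kept []labelResult
	for _, l := range labels {
		if l.Confidence >= min {
			kept = append(kept, l)
		}
	}
	return kept
}

func main() {
	labels := []labelResult{{"cat", 0.92}, {"sofa", 0.41}}
	fmt.Println(filterByConfidence(labels, 0.5))
}
```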

type Model

type Model struct {
	Type       ModelType             `yaml:"Type,omitempty" json:"type,omitempty"`
	Default    bool                  `yaml:"Default,omitempty" json:"default,omitempty"`
	Model      string                `yaml:"Model,omitempty" json:"model,omitempty"`
	Name       string                `yaml:"Name,omitempty" json:"name,omitempty"`
	Version    string                `yaml:"Version,omitempty" json:"version,omitempty"`
	Engine     ModelEngine           `yaml:"Engine,omitempty" json:"engine,omitempty"`
	Run        RunType               `yaml:"Run,omitempty" json:"Run,omitempty"` // "auto", "never", "manual", "always", "newly-indexed", "on-schedule"
	System     string                `yaml:"System,omitempty" json:"system,omitempty"`
	Prompt     string                `yaml:"Prompt,omitempty" json:"prompt,omitempty"`
	Format     string                `yaml:"Format,omitempty" json:"format,omitempty"`
	Schema     string                `yaml:"Schema,omitempty" json:"schema,omitempty"`
	SchemaFile string                `yaml:"SchemaFile,omitempty" json:"schemaFile,omitempty"`
	Resolution int                   `yaml:"Resolution,omitempty" json:"resolution,omitempty"`
	TensorFlow *tensorflow.ModelInfo `yaml:"TensorFlow,omitempty" json:"tensorflow,omitempty"`
	Options    *ModelOptions         `yaml:"Options,omitempty" json:"options,omitempty"`
	Service    Service               `yaml:"Service,omitempty" json:"service"`
	Path       string                `yaml:"Path,omitempty" json:"-"`
	Disabled   bool                  `yaml:"Disabled,omitempty" json:"disabled,omitempty"`
	// contains filtered or unexported fields
}

Model represents a computer vision model configuration.

func (*Model) ApplyEngineDefaults

func (m *Model) ApplyEngineDefaults()

ApplyEngineDefaults normalizes the engine name and applies registered engine defaults (formats, schemes, resolution) when these are not explicitly configured.

func (*Model) ApplyService

func (m *Model) ApplyService(apiRequest *ApiRequest)

ApplyService updates the ApiRequest with service-specific values when configured.

func (*Model) ClassifyModel

func (m *Model) ClassifyModel() *classify.Model

ClassifyModel returns the matching classify model instance, if any. Nil receivers return nil.

func (*Model) Clone

func (m *Model) Clone() *Model

Clone returns a shallow copy of the model. Nil receivers return nil.

func (*Model) Endpoint

func (m *Model) Endpoint() (uri, method string)

Endpoint returns the remote service request method and endpoint URL. Nil receivers return empty strings.

func (*Model) EndpointFileScheme

func (m *Model) EndpointFileScheme() (fileScheme scheme.Type)

EndpointFileScheme returns the endpoint API request file scheme type. Nil receivers fall back to the global default scheme.

func (*Model) EndpointKey

func (m *Model) EndpointKey() (key string)

EndpointKey returns the access token belonging to the remote service endpoint, or an empty string for nil receivers.

func (*Model) EndpointRequestFormat

func (m *Model) EndpointRequestFormat() (format ApiFormat)

EndpointRequestFormat returns the endpoint API request format. Nil receivers fall back to the global default format.

func (*Model) EndpointResponseFormat

func (m *Model) EndpointResponseFormat() (format ApiFormat)

EndpointResponseFormat returns the endpoint API response format. Nil receivers fall back to the global default format.

func (*Model) EngineName

func (m *Model) EngineName() string

EngineName returns the normalized engine identifier or infers one from the request configuration. Nil receivers return an empty string.

func (*Model) FaceModel

func (m *Model) FaceModel() *face.Model

FaceModel returns the matching face recognition model instance, if any. Nil receivers return nil.

func (*Model) GetFormat

func (m *Model) GetFormat() string

GetFormat returns the configured response format or a sensible default. Nil receivers return an empty string.

func (*Model) GetModel

func (m *Model) GetModel() (model, name, version string)

GetModel returns the normalized model identifier, name, and version strings used in service requests. Callers can always destructure the tuple because nil receivers return empty values.

func (*Model) GetOptions

func (m *Model) GetOptions() *ModelOptions

GetOptions returns the API request options, applying engine defaults on demand. Nil receivers return nil.

func (*Model) GetPrompt

func (m *Model) GetPrompt() string

GetPrompt returns the configured model prompt, using engine defaults when none is specified. Nil receivers return an empty string.

func (*Model) GetSource

func (m *Model) GetSource() string

GetSource returns the default entity source (src) based on the model configuration.

func (*Model) GetSystemPrompt

func (m *Model) GetSystemPrompt() string

GetSystemPrompt returns the configured system prompt, falling back to engine defaults when none is specified. Nil receivers return an empty string.

func (*Model) IsDefault

func (m *Model) IsDefault() bool

IsDefault reports whether the model refers to one of the built-in defaults. Nil receivers return false.

func (*Model) NsfwModel

func (m *Model) NsfwModel() *nsfw.Model

NsfwModel returns the matching nsfw model instance, if any. Nil receivers return nil.

func (*Model) PromptContains

func (m *Model) PromptContains(s string) bool

PromptContains reports whether the prompt contains the specified substring.

func (*Model) RunType

func (m *Model) RunType() RunType

RunType returns the normalized run type configured for the model. Nil receivers default to RunAuto.

func (*Model) SchemaInstructions

func (m *Model) SchemaInstructions() string

SchemaInstructions returns a helper string that can be appended to prompts. Nil receivers return an empty string.

func (*Model) SchemaTemplate

func (m *Model) SchemaTemplate() string

SchemaTemplate returns the model-specific JSON schema template, if any. Nil receivers return an empty string.

func (*Model) ShouldRun

func (m *Model) ShouldRun(when RunType) bool

ShouldRun reports whether the model should execute in the specified scheduling context. Nil receivers always return false.

type ModelEngine

type ModelEngine = string

ModelEngine represents the canonical identifier for a computer vision service engine.

const (
	// EngineVision represents the default PhotoPrism vision service endpoints.
	EngineVision ModelEngine = "vision"
	// EngineTensorFlow represents on-device TensorFlow models.
	EngineTensorFlow ModelEngine = "tensorflow"
	// EngineLocal is used when no explicit engine can be determined.
	EngineLocal ModelEngine = "local"
)

type ModelOptions

type ModelOptions struct {
	Temperature      float64  `yaml:"Temperature,omitempty" json:"temperature,omitempty"`            // Ollama, OpenAI
	TopK             int      `yaml:"TopK,omitempty" json:"top_k,omitempty"`                         // Ollama
	TopP             float64  `yaml:"TopP,omitempty" json:"top_p,omitempty"`                         // Ollama, OpenAI
	MinP             float64  `yaml:"MinP,omitempty" json:"min_p,omitempty"`                         // Ollama
	TypicalP         float64  `yaml:"TypicalP,omitempty" json:"typical_p,omitempty"`                 // Ollama
	TfsZ             float64  `yaml:"TfsZ,omitempty" json:"tfs_z,omitempty"`                         // Ollama
	Seed             int      `yaml:"Seed,omitempty" json:"seed,omitempty"`                          // Ollama
	NumKeep          int      `yaml:"NumKeep,omitempty" json:"num_keep,omitempty"`                   // Ollama
	RepeatLastN      int      `yaml:"RepeatLastN,omitempty" json:"repeat_last_n,omitempty"`          // Ollama
	RepeatPenalty    float64  `yaml:"RepeatPenalty,omitempty" json:"repeat_penalty,omitempty"`       // Ollama
	PresencePenalty  float64  `yaml:"PresencePenalty,omitempty" json:"presence_penalty,omitempty"`   // OpenAI
	FrequencyPenalty float64  `yaml:"FrequencyPenalty,omitempty" json:"frequency_penalty,omitempty"` // OpenAI
	PenalizeNewline  bool     `yaml:"PenalizeNewline,omitempty" json:"penalize_newline,omitempty"`   // Ollama
	Stop             []string `yaml:"Stop,omitempty" json:"stop,omitempty"`                          // Ollama, OpenAI
	Mirostat         int      `yaml:"Mirostat,omitempty" json:"mirostat,omitempty"`                  // Ollama
	MirostatTau      float64  `yaml:"MirostatTau,omitempty" json:"mirostat_tau,omitempty"`           // Ollama
	MirostatEta      float64  `yaml:"MirostatEta,omitempty" json:"mirostat_eta,omitempty"`           // Ollama
	NumPredict       int      `yaml:"NumPredict,omitempty" json:"num_predict,omitempty"`             // Ollama
	MaxOutputTokens  int      `yaml:"MaxOutputTokens,omitempty" json:"max_output_tokens,omitempty"`  // Ollama, OpenAI
	ForceJson        bool     `yaml:"ForceJson,omitempty" json:"force_json,omitempty"`               // Ollama, OpenAI
	SchemaVersion    string   `yaml:"SchemaVersion,omitempty" json:"schema_version,omitempty"`       // Ollama, OpenAI
	CombineOutputs   string   `yaml:"CombineOutputs,omitempty" json:"combine_outputs,omitempty"`     // OpenAI
	Detail           string   `yaml:"Detail,omitempty" json:"detail,omitempty"`                      // OpenAI
	NumCtx           int      `yaml:"NumCtx,omitempty" json:"num_ctx,omitempty"`                     // Ollama, OpenAI
	NumThread        int      `yaml:"NumThread,omitempty" json:"num_thread,omitempty"`               // Ollama
	NumBatch         int      `yaml:"NumBatch,omitempty" json:"num_batch,omitempty"`                 // Ollama
	NumGpu           int      `yaml:"NumGpu,omitempty" json:"num_gpu,omitempty"`                     // Ollama
	MainGpu          int      `yaml:"MainGpu,omitempty" json:"main_gpu,omitempty"`                   // Ollama
	LowVram          bool     `yaml:"LowVram,omitempty" json:"low_vram,omitempty"`                   // Ollama
	VocabOnly        bool     `yaml:"VocabOnly,omitempty" json:"vocab_only,omitempty"`               // Ollama
	UseMmap          bool     `yaml:"UseMmap,omitempty" json:"use_mmap,omitempty"`                   // Ollama
	UseMlock         bool     `yaml:"UseMlock,omitempty" json:"use_mlock,omitempty"`                 // Ollama
	Numa             bool     `yaml:"Numa,omitempty" json:"numa,omitempty"`                          // Ollama
}

ModelOptions represents additional model parameters listed in the documentation. Comments note which engines currently honor each field.
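
As a concrete illustration, a vision.yml model stanza might set a few of these options. The model name and values below are placeholders chosen for the example, not recommendations; field names follow the YAML tags listed above:

```yaml
Models:
  - Type: caption
    Engine: ollama
    Model: qwen2.5vl:7b
    Options:
      Temperature: 0.2
      TopP: 0.9
      NumCtx: 4096
      ForceJson: true
```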

type ModelType

type ModelType = string

ModelType defines the classifier type used by a vision model (labels, caption, face, etc.).

const (
	// ModelTypeLabels runs label detection.
	ModelTypeLabels ModelType = "labels"
	// ModelTypeNsfw runs NSFW detection.
	ModelTypeNsfw ModelType = "nsfw"
	// ModelTypeFace performs face detection or recognition.
	ModelTypeFace ModelType = "face"
	// ModelTypeCaption generates captions.
	ModelTypeCaption ModelType = "caption"
	// ModelTypeGenerate produces new content (e.g., text-to-image), when supported.
	ModelTypeGenerate ModelType = "generate"
)

type ModelTypes

type ModelTypes = []ModelType

ModelTypes is a list of model type identifiers.

func ParseModelTypes

func ParseModelTypes(s string) (types ModelTypes)

ParseModelTypes parses a model type string.

type Models

type Models []*Model

Models represents a set of computer vision models.

type RequestBuilder

type RequestBuilder interface {
	Build(ctx context.Context, model *Model, files Files) (*ApiRequest, error)
}

RequestBuilder builds an API request for an engine based on the model configuration and input files.

type ResponseParser

type ResponseParser interface {
	Parse(ctx context.Context, req *ApiRequest, raw []byte, status int) (*ApiResponse, error)
}

ResponseParser parses a raw engine response into the generic ApiResponse structure.

type RunType

type RunType = string

RunType specifies when a vision model should be run.

const (
	// RunAuto automatically decides when to run based on model type and configuration.
	RunAuto RunType = ""
	// RunNever disables the model entirely.
	RunNever RunType = "never"
	// RunManual runs only when explicitly invoked (e.g., via the "vision run" command).
	RunManual RunType = "manual"
	// RunAlways runs manually, on-schedule, on-demand, and on-index.
	RunAlways RunType = "always"
	// RunNewlyIndexed runs manually and for newly indexed pictures.
	RunNewlyIndexed RunType = "newly-indexed"
	// RunOnDemand runs manually, for newly indexed pictures, and on the configured schedule.
	RunOnDemand RunType = "on-demand"
	// RunOnSchedule runs manually and on-schedule.
	RunOnSchedule RunType = "on-schedule"
	// RunOnIndex runs manually and after indexing.
	RunOnIndex RunType = "on-index"
)

func ParseRunType

func ParseRunType(s string) RunType

ParseRunType parses a run type string into the canonical RunType constant. Unknown or empty values default to RunAuto.
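
The documented contract (known values normalized, anything else falling back to RunAuto) can be sketched as follows; this is an illustration of the behavior described above, not the package's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// runAuto mirrors the RunAuto constant: the empty string.
const runAuto = ""

// runTypes mirrors the RunType constants documented above.
var runTypes = map[string]bool{
	"never": true, "manual": true, "always": true,
	"newly-indexed": true, "on-demand": true, "on-schedule": true, "on-index": true,
}

// parseRunType returns known values in canonical lower-case form;
// unknown or empty strings fall back to runAuto.
func parseRunType(s string) string {
	t := strings.ToLower(strings.TrimSpace(s))
	if runTypes[t] {
		return t
	}
	return runAuto
}

func main() {
	fmt.Printf("%q %q\n", parseRunType("NEVER"), parseRunType("sometimes"))
}
```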

type Service

type Service struct {
	Uri            string    `yaml:"Uri,omitempty" json:"uri"`
	Method         string    `yaml:"Method,omitempty" json:"method"`
	Model          string    `yaml:"Model,omitempty" json:"model,omitempty"` // Optional endpoint-specific model override.
	Username       string    `yaml:"Username,omitempty" json:"-"`            // Optional basic auth user injected into Endpoint URLs.
	Password       string    `yaml:"Password,omitempty" json:"-"`
	Key            string    `yaml:"Key,omitempty" json:"-"`
	Org            string    `yaml:"Org,omitempty" json:"org,omitempty"`         // Optional organization header (e.g. OpenAI).
	Project        string    `yaml:"Project,omitempty" json:"project,omitempty"` // Optional project header (e.g. OpenAI).
	Think          string    `yaml:"Think,omitempty" json:"think,omitempty"`     // Optional reasoning hint for compatible engines (e.g. Ollama, GPT-OSS).
	FileScheme     string    `yaml:"FileScheme,omitempty" json:"fileScheme,omitempty"`
	RequestFormat  ApiFormat `yaml:"RequestFormat,omitempty" json:"requestFormat,omitempty"`
	ResponseFormat ApiFormat `yaml:"ResponseFormat,omitempty" json:"responseFormat,omitempty"`
	Disabled       bool      `yaml:"Disabled,omitempty" json:"disabled,omitempty"`
}

Service represents a remote computer vision service configuration.

func (*Service) BasicAuth

func (m *Service) BasicAuth() (username, password string)

BasicAuth returns the username and password for basic authentication.
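
The Username field comment above notes that credentials are injected into endpoint URLs. A sketch of how that might look with the standard library (the real implementation may differ):

```go
package main

import (
	"fmt"
	"net/url"
)

// injectBasicAuth is a hypothetical helper: parse the endpoint and, when a
// username is set, embed the credentials in the URL's userinfo component.
func injectBasicAuth(endpoint, user, pass string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if user != "" {
		u.User = url.UserPassword(user, pass)
	}
	return u.String(), nil
}

func main() {
	uri, _ := injectBasicAuth("https://vision.example.com/api/v1/vision", "alice", "secret")
	fmt.Println(uri)
}
```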

func (*Service) Endpoint

func (m *Service) Endpoint() (uri, method string)

Endpoint returns the remote service request method and endpoint URL, if any.

func (*Service) EndpointFileScheme

func (m *Service) EndpointFileScheme() scheme.Type

EndpointFileScheme returns the endpoint API file scheme type.

func (*Service) EndpointKey

func (m *Service) EndpointKey() string

EndpointKey returns the access token belonging to the remote service endpoint, if any.

func (*Service) EndpointOrg

func (m *Service) EndpointOrg() string

EndpointOrg returns the organization identifier for the endpoint, if any.

func (*Service) EndpointProject

func (m *Service) EndpointProject() string

EndpointProject returns the project identifier for the endpoint, if any.

func (*Service) EndpointRequestFormat

func (m *Service) EndpointRequestFormat() ApiFormat

EndpointRequestFormat returns the endpoint API request format.

func (*Service) EndpointResponseFormat

func (m *Service) EndpointResponseFormat() ApiFormat

EndpointResponseFormat returns the endpoint API response format.

func (*Service) EndpointThink

func (m *Service) EndpointThink() string

EndpointThink returns the optional thinking/reasoning setting for the endpoint, if any.

func (*Service) GetModel

func (m *Service) GetModel() string

GetModel returns the model identifier override for the endpoint, if any.

type Thresholds

type Thresholds struct {
	Confidence int `yaml:"Confidence,omitempty" json:"confidence,omitempty"`
	Topicality int `yaml:"Topicality,omitempty" json:"topicality,omitempty"`
	NSFW       int `yaml:"NSFW,omitempty" json:"nsfw,omitempty"`
}

Thresholds are expressed as percentages (0-100) and gate label acceptance, topicality, and NSFW handling for the configured vision models.

func (*Thresholds) GetConfidence

func (t *Thresholds) GetConfidence() int

GetConfidence returns the Confidence threshold in percent from 0 to 100.

func (*Thresholds) GetConfidenceFloat32

func (t *Thresholds) GetConfidenceFloat32() float32

GetConfidenceFloat32 returns the Confidence threshold as float32 for comparison.

func (*Thresholds) GetNSFW

func (t *Thresholds) GetNSFW() int

GetNSFW returns the NSFW threshold in percent from 0 to 100.

func (*Thresholds) GetNSFWFloat32

func (t *Thresholds) GetNSFWFloat32() float32

GetNSFWFloat32 returns the NSFW threshold as float32 for comparison.

func (*Thresholds) GetTopicality

func (t *Thresholds) GetTopicality() int

GetTopicality returns the Topicality threshold in percent from 0 to 100.

func (*Thresholds) GetTopicalityFloat32

func (t *Thresholds) GetTopicalityFloat32() float32

GetTopicalityFloat32 returns the Topicality threshold as float32 for comparison.

Directories

Path Synopsis
Package ollama integrates PhotoPrism's vision pipeline with Ollama-compatible multi-modal models so adapters can share logging and engine helpers.
Package openai implements the PhotoPrism vision adapter that calls the OpenAI Responses API for captions, labels, and optional markers.
Package schema defines canonical JSON and JSON Schema templates shared by PhotoPrism's AI vision engines.
