Documentation ¶
Overview ¶
Package vision provides a computer vision request handler and a client for using external APIs.
Copyright (c) 2018 - 2025 PhotoPrism UG. All rights reserved.
This program is free software: you can redistribute it and/or modify it under Version 3 of the GNU Affero General Public License (the "AGPL"): <https://docs.photoprism.app/license/agpl>
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
The AGPL is supplemented by our Trademark and Brand Guidelines, which describe how our Brand Assets may be used: <https://www.photoprism.app/trademark>
Feel free to send an email to hello@photoprism.app if you have questions, want to support our work, or just want to say hello.
Additional information can be found in our Developer Guide: <https://docs.photoprism.app/developer-guide/>
Index ¶
- Constants
- Variables
- func DetectFaces(fileName string, minSize int, cacheCrop bool, expected int) (result face.Faces, err error)
- func DetectNSFW(images Files, mediaSrc media.Src) (result []nsfw.Result, err error)
- func FilterModels(models []string, when RunType, allow func(ModelType, RunType) bool) []string
- func GenerateCaption(images Files, mediaSrc media.Src) (*CaptionResult, *Model, error)
- func GenerateFaceEmbeddings(imgData []byte) (embeddings face.Embeddings, err error)
- func GenerateLabels(images Files, mediaSrc media.Src, labelSrc entity.Src) (classify.Labels, error)
- func GetCachePath() string
- func GetFacenetModelPath() string
- func GetModelPath(name string) string
- func GetModelsPath() string
- func GetNasnetModelPath() string
- func GetNsfwModelPath() string
- func PriorityFromTopicality(topicality float32) int
- func RegisterEngine(format ApiFormat, engine Engine)
- func RegisterEngineAlias(name string, info EngineInfo)
- func ReportRunType(when RunType) string
- func Resolution(modelType ModelType) int
- func SetCachePath(dir string)
- func SetCaptionFunc(fn func(Files, media.Src) (*CaptionResult, *Model, error))
- func SetLabelsFunc(fn func(Files, media.Src, entity.Src) (classify.Labels, error))
- func SetModelsPath(dir string)
- func SetNSFWFunc(fn func(Files, media.Src) ([]nsfw.Result, error))
- func Thumb(modelType ModelType) (size thumb.Size)
- type ApiFormat
- type ApiRequest
- func NewApiRequest(requestFormat ApiFormat, files Files, fileScheme scheme.Type) (result *ApiRequest, err error)
- func NewApiRequestImages(images Files, fileScheme scheme.Type) (*ApiRequest, error)
- func NewApiRequestOllama(images Files, fileScheme scheme.Type) (*ApiRequest, error)
- func NewApiRequestUrl(fileName string, fileScheme scheme.Type) (result *ApiRequest, err error)
- type ApiRequestContext
- type ApiResponse
- func NewApiError(id string, code int) ApiResponse
- func NewCaptionResponse(id string, model *Model, result *CaptionResult) ApiResponse
- func NewLabelsResponse(id string, model *Model, results classify.Labels) ApiResponse
- func PerformApiRequest(apiRequest *ApiRequest, uri, method, key string) (apiResponse *ApiResponse, err error)
- type ApiResult
- type CaptionResult
- type ConfigValues
- func (c *ConfigValues) IsCustom(t ModelType) bool
- func (c *ConfigValues) IsDefault(t ModelType) bool
- func (c *ConfigValues) Load(fileName string) error
- func (c *ConfigValues) Model(t ModelType) *Model
- func (c *ConfigValues) RunType(t ModelType) RunType
- func (c *ConfigValues) Save(fileName string) error
- func (c *ConfigValues) ShouldRun(t ModelType, when RunType) bool
- type Engine
- type EngineDefaults
- type EngineInfo
- type Files
- type LabelResult
- type Model
- func (m *Model) ApplyEngineDefaults()
- func (m *Model) ApplyService(apiRequest *ApiRequest)
- func (m *Model) ClassifyModel() *classify.Model
- func (m *Model) Clone() *Model
- func (m *Model) Endpoint() (uri, method string)
- func (m *Model) EndpointFileScheme() (fileScheme scheme.Type)
- func (m *Model) EndpointKey() (key string)
- func (m *Model) EndpointRequestFormat() (format ApiFormat)
- func (m *Model) EndpointResponseFormat() (format ApiFormat)
- func (m *Model) EngineName() string
- func (m *Model) FaceModel() *face.Model
- func (m *Model) GetFormat() string
- func (m *Model) GetModel() (model, name, version string)
- func (m *Model) GetOptions() *ModelOptions
- func (m *Model) GetPrompt() string
- func (m *Model) GetSource() string
- func (m *Model) GetSystemPrompt() string
- func (m *Model) IsDefault() bool
- func (m *Model) NsfwModel() *nsfw.Model
- func (m *Model) PromptContains(s string) bool
- func (m *Model) RunType() RunType
- func (m *Model) SchemaInstructions() string
- func (m *Model) SchemaTemplate() string
- func (m *Model) ShouldRun(when RunType) bool
- type ModelEngine
- type ModelOptions
- type ModelType
- type ModelTypes
- type Models
- type RequestBuilder
- type ResponseParser
- type RunType
- type Service
- func (m *Service) BasicAuth() (username, password string)
- func (m *Service) Endpoint() (uri, method string)
- func (m *Service) EndpointFileScheme() scheme.Type
- func (m *Service) EndpointKey() string
- func (m *Service) EndpointOrg() string
- func (m *Service) EndpointProject() string
- func (m *Service) EndpointRequestFormat() ApiFormat
- func (m *Service) EndpointResponseFormat() ApiFormat
- func (m *Service) EndpointThink() string
- func (m *Service) GetModel() string
- type Thresholds
Constants ¶
const (
// FormatJSON indicates JSON payloads.
FormatJSON = "json"
)
Variables ¶
var (
	// CachePath stores the directory used for caching downloaded vision models.
	CachePath = ""
	// ModelsPath stores the directory containing downloaded vision models.
	ModelsPath = ""
	// DownloadUrl overrides the default model download endpoint when set.
	DownloadUrl = ""
	// ServiceApi enables exposing vision APIs via the service layer when true.
	ServiceApi = false
	// ServiceUri sets the base URI for the vision service when exposed externally.
	ServiceUri = ""
	// ServiceKey provides an optional API key for the vision service.
	ServiceKey = ""
	// ServiceTimeout sets the maximum duration for service API requests.
	ServiceTimeout = 10 * time.Minute
	// ServiceMethod defines the HTTP verb used when calling the vision service.
	ServiceMethod = http.MethodPost
	// ServiceFileScheme specifies how local files are encoded when sent to the service.
	ServiceFileScheme = scheme.Data
	// ServiceRequestFormat sets the default payload format for service requests.
	ServiceRequestFormat = ApiFormatVision
	// ServiceResponseFormat sets the expected response format from the service.
	ServiceResponseFormat = ApiFormatVision
	// DefaultResolution specifies the default square resize dimension for model inputs.
	DefaultResolution = 224
	// DefaultTemperature sets the sampling temperature for compatible models.
	DefaultTemperature = 0.1
	// MaxTemperature clamps user-supplied temperatures to a safe upper bound.
	MaxTemperature = 2.0
	// DefaultSrc defines the fallback source string for generated labels.
	DefaultSrc = entity.SrcImage
	// DetectNSFWLabels toggles NSFW label detection in vision responses.
	DetectNSFWLabels = false
)
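The comment on MaxTemperature says that user-supplied temperatures are clamped to an upper bound. A minimal sketch of such a clamp follows, assuming that non-positive values mean "unset" and fall back to DefaultTemperature. The helper name clampTemperature and the fallback rule are illustrative assumptions, not part of the package API.

```go
package main

import "fmt"

// Local stand-ins for the DefaultTemperature and MaxTemperature values
// listed in the variable block above.
const (
	defaultTemperature = 0.1
	maxTemperature     = 2.0
)

// clampTemperature is a hypothetical helper, not part of the package API.
// It treats non-positive values as "unset" (an assumption of this sketch)
// and caps everything else at maxTemperature.
func clampTemperature(t float64) float64 {
	switch {
	case t <= 0:
		return defaultTemperature
	case t > maxTemperature:
		return maxTemperature
	default:
		return t
	}
}

func main() {
	for _, t := range []float64{0, 0.7, 5} {
		fmt.Printf("%.1f -> %.1f\n", t, clampTemperature(t))
	}
}
```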
var (
	VersionLatest = "latest"
	VersionMobile = "mobile"
	Version3B     = "3b"
)
Default model version strings.
var (
	NasnetModel = &Model{
		Type:       ModelTypeLabels,
		Default:    true,
		Name:       "nasnet",
		Version:    VersionMobile,
		Resolution: 224,
		TensorFlow: &tensorflow.ModelInfo{
			TFVersion: "1.12.0",
			Tags:      []string{"photoprism"},
			Input: &tensorflow.PhotoInput{
				Name:              "input_1",
				Height:            224,
				Width:             224,
				ResizeOperation:   tensorflow.CenterCrop,
				ColorChannelOrder: tensorflow.RGB,
				Shape:             tensorflow.DefaultPhotoInputShape(),
				Intervals: []tensorflow.Interval{
					{
						Start: -1.0,
						End:   1.0,
					},
				},
				OutputIndex: 0,
			},
			Output: &tensorflow.ModelOutput{
				Name:          "predictions/Softmax",
				NumOutputs:    1000,
				OutputIndex:   0,
				OutputsLogits: false,
			},
		},
	}
	NsfwModel = &Model{
		Type:       ModelTypeNsfw,
		Default:    true,
		Name:       "nsfw",
		Version:    VersionLatest,
		Resolution: 224,
		TensorFlow: &tensorflow.ModelInfo{
			TFVersion: "1.12.0",
			Tags:      []string{"serve"},
			Input: &tensorflow.PhotoInput{
				Name:        "input_tensor",
				Height:      224,
				Width:       224,
				OutputIndex: 0,
				Shape:       tensorflow.DefaultPhotoInputShape(),
			},
			Output: &tensorflow.ModelOutput{
				Name:          "nsfw_cls_model/final_prediction",
				NumOutputs:    5,
				OutputIndex:   0,
				OutputsLogits: false,
			},
		},
	}
	FacenetModel = &Model{
		Type:       ModelTypeFace,
		Default:    true,
		Name:       "facenet",
		Version:    VersionLatest,
		Resolution: 160,
		TensorFlow: &tensorflow.ModelInfo{
			TFVersion: "1.7.1",
			Tags:      []string{"serve"},
			Input: &tensorflow.PhotoInput{
				Name:        "input",
				Height:      160,
				Width:       160,
				Shape:       tensorflow.DefaultPhotoInputShape(),
				OutputIndex: 0,
			},
			Output: &tensorflow.ModelOutput{
				Name:          "embeddings",
				NumOutputs:    512,
				OutputIndex:   0,
				OutputsLogits: false,
			},
		},
	}
	CaptionModel = &Model{
		Type:   ModelTypeCaption,
		Engine: ollama.EngineName,
		Run:    RunManual,
	}
	DefaultModels = Models{
		NasnetModel,
		NsfwModel,
		FacenetModel,
		CaptionModel,
	}
	DefaultThresholds = Thresholds{
		Confidence: 10,
		Topicality: 0,
		NSFW:       75,
	}
)
Default computer vision model configuration.
var Config = NewConfig()
Config references the current configuration options.
var (
	// ErrInvalidModel indicates an unknown or unsupported vision model name.
	ErrInvalidModel = fmt.Errorf("vision: invalid model")
)
var RunTypes = map[string]RunType{
	RunAuto:            RunAuto,
	"auto":             RunAuto,
	RunNever:           RunNever,
	RunManual:          RunManual,
	"manually":         RunManual,
	"command":          RunManual,
	RunAlways:          RunAlways,
	RunNewlyIndexed:    RunNewlyIndexed,
	"on-newly-indexed": RunNewlyIndexed,
	"indexed":          RunNewlyIndexed,
	"on-indexed":       RunNewlyIndexed,
	"after-index":      RunNewlyIndexed,
	RunOnDemand:        RunOnDemand,
	RunOnSchedule:      RunOnSchedule,
	"schedule":         RunOnSchedule,
	RunOnIndex:         RunOnIndex,
	"index":            RunOnIndex,
}
RunTypes maps configuration strings to standard RunType model settings.
Functions ¶
func DetectFaces ¶
func DetectFaces(fileName string, minSize int, cacheCrop bool, expected int) (result face.Faces, err error)
DetectFaces detects faces in the specified image and generates embeddings from them.
func DetectNSFW ¶
func DetectNSFW(images Files, mediaSrc media.Src) (result []nsfw.Result, err error)
DetectNSFW checks images for inappropriate content and generates probability scores grouped by category.
func FilterModels ¶
func FilterModels(models []string, when RunType, allow func(ModelType, RunType) bool) []string
FilterModels takes a list of model type names and a scheduling context, and returns only the types that are allowed to run according to the supplied predicate. Empty or unknown names are ignored.
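The filtering rule described above can be sketched as follows. This is a self-contained approximation rather than the package's implementation: the lowercase filterModels function, the knownTypes set, the name normalization, and the treatment of a nil predicate as "allow everything" are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Local aliases mirroring the documented ModelType and RunType string types.
type (
	ModelType = string
	RunType   = string
)

// knownTypes mirrors the ModelType constants documented for this package.
var knownTypes = map[ModelType]bool{
	"labels": true, "nsfw": true, "face": true, "caption": true, "generate": true,
}

// filterModels keeps the model types that the predicate allows in the given
// scheduling context. Empty and unknown names are dropped.
func filterModels(models []string, when RunType, allow func(ModelType, RunType) bool) []string {
	result := make([]string, 0, len(models))
	for _, name := range models {
		t := strings.ToLower(strings.TrimSpace(name)) // normalization is an assumption
		if !knownTypes[t] {
			continue // skips empty and unknown names
		}
		if allow == nil || allow(t, when) { // nil predicate allows all (assumption)
			result = append(result, t)
		}
	}
	return result
}

func main() {
	// Hypothetical predicate: captions never run in this context.
	noCaptions := func(t ModelType, _ RunType) bool { return t != "caption" }
	fmt.Println(filterModels([]string{"labels", "", "bogus", "caption", "nsfw"}, "on-index", noCaptions))
}
```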
func GenerateCaption ¶
func GenerateCaption(images Files, mediaSrc media.Src) (*CaptionResult, *Model, error)
GenerateCaption returns generated captions for the specified images.
func GenerateFaceEmbeddings ¶
func GenerateFaceEmbeddings(imgData []byte) (embeddings face.Embeddings, err error)
GenerateFaceEmbeddings returns the embeddings for the specified face crop image.
func GenerateLabels ¶
func GenerateLabels(images Files, mediaSrc media.Src, labelSrc entity.Src) (classify.Labels, error)
GenerateLabels finds matching labels for the specified image. Caller must pass the appropriate metadata source string (e.g., entity.SrcOllama, entity.SrcOpenAI) so that downstream indexing can record where the labels originated.
func GetFacenetModelPath ¶
func GetFacenetModelPath() string
GetFacenetModelPath returns the absolute path of the default Facenet model.
func GetModelPath ¶
func GetModelPath(name string) string
GetModelPath returns the absolute path of a named model file in CachePath.
func GetModelsPath ¶
func GetModelsPath() string
GetModelsPath returns the model assets path, or an empty string if not configured or found.
func GetNasnetModelPath ¶
func GetNasnetModelPath() string
GetNasnetModelPath returns the absolute path of the default Nasnet model.
func GetNsfwModelPath ¶
func GetNsfwModelPath() string
GetNsfwModelPath returns the absolute path of the default NSFW model.
func PriorityFromTopicality ¶
func PriorityFromTopicality(topicality float32) int
PriorityFromTopicality converts topicality scores to our priority scale (-2..5).
func RegisterEngine ¶
func RegisterEngine(format ApiFormat, engine Engine)
RegisterEngine adds/overrides an engine implementation for a specific API format.
func RegisterEngineAlias ¶
func RegisterEngineAlias(name string, info EngineInfo)
RegisterEngineAlias maps a logical engine name (e.g., "ollama") to a request/response format pair.
func ReportRunType ¶
func ReportRunType(when RunType) string
ReportRunType returns a human-readable string for the run type, preserving the explicit value when set or "auto" when delegation is in effect.
func Resolution ¶
func Resolution(modelType ModelType) int
Resolution returns the image resolution of the given model type.
func SetCaptionFunc ¶
func SetCaptionFunc(fn func(Files, media.Src) (*CaptionResult, *Model, error))
SetCaptionFunc overrides the caption generator. Intended for tests.
func SetLabelsFunc ¶
func SetLabelsFunc(fn func(Files, media.Src, entity.Src) (classify.Labels, error))
SetLabelsFunc overrides the labels generator. Intended for tests.
func SetNSFWFunc ¶
func SetNSFWFunc(fn func(Files, media.Src) ([]nsfw.Result, error))
SetNSFWFunc overrides the Vision NSFW detector. Intended for tests.
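The three Set...Func helpers follow a common Go testing pattern: the package routes calls through a function variable that tests can swap out. The sketch below illustrates the pattern with simplified, hypothetical signatures (labelsFunc, setLabelsFunc, generateLabels). Restoring the default when nil is passed is an assumption of this sketch, not documented behavior.

```go
package main

import "fmt"

// labelsFunc is a simplified stand-in for the real generator signature.
type labelsFunc func(images []string) ([]string, error)

// defaultLabels represents the production implementation.
func defaultLabels(images []string) ([]string, error) {
	return nil, fmt.Errorf("no model available for %d image(s)", len(images))
}

// labelsFn is the package-level hook that all callers go through.
var labelsFn labelsFunc = defaultLabels

// setLabelsFunc swaps the generator. Passing nil restores the default,
// which is an assumption of this sketch.
func setLabelsFunc(fn labelsFunc) {
	if fn == nil {
		labelsFn = defaultLabels
		return
	}
	labelsFn = fn
}

// generateLabels delegates to whichever generator is currently installed.
func generateLabels(images []string) ([]string, error) {
	return labelsFn(images)
}

func main() {
	// A test would install a stub and restore the default afterwards.
	setLabelsFunc(func([]string) ([]string, error) { return []string{"cat"}, nil })
	defer setLabelsFunc(nil)

	labels, err := generateLabels([]string{"photo.jpg"})
	fmt.Println(labels, err)
}
```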
Types ¶
type ApiFormat ¶
type ApiFormat = string
ApiFormat defines the payload format accepted by the Vision API.
const (
	// ApiFormatUrl treats inputs as HTTP(S) URLs.
	ApiFormatUrl ApiFormat = "url"
	// ApiFormatImages sends images in the native Vision format.
	ApiFormatImages ApiFormat = "images"
	// ApiFormatVision represents a Vision-internal payload.
	ApiFormatVision ApiFormat = "vision"
	// ApiFormatOllama proxies requests to Ollama models.
	ApiFormatOllama ApiFormat = ollama.ApiFormat
	// ApiFormatOpenAI proxies requests to OpenAI vision models.
	ApiFormatOpenAI ApiFormat = openai.ApiFormat
)
type ApiRequest ¶
type ApiRequest struct {
Id string `form:"id" yaml:"Id,omitempty" json:"id,omitempty"`
Model string `form:"model" yaml:"Model,omitempty" json:"model,omitempty"`
Version string `form:"version" yaml:"Version,omitempty" json:"version,omitempty"`
System string `form:"system" yaml:"System,omitempty" json:"system,omitempty"`
Prompt string `form:"prompt" yaml:"Prompt,omitempty" json:"prompt,omitempty"`
Suffix string `form:"suffix" yaml:"Suffix,omitempty" json:"suffix"`
Format string `form:"format" yaml:"Format,omitempty" json:"format,omitempty"`
Url string `form:"url" yaml:"Url,omitempty" json:"url,omitempty"`
Org string `form:"org" yaml:"Org,omitempty" json:"org,omitempty"`
Project string `form:"project" yaml:"Project,omitempty" json:"project,omitempty"`
Think string `form:"think" yaml:"Think,omitempty" json:"think,omitempty"`
Options *ModelOptions `form:"options" yaml:"Options,omitempty" json:"options,omitempty"`
Context *ApiRequestContext `form:"context" yaml:"Context,omitempty" json:"context,omitempty"`
Stream bool `form:"stream" yaml:"Stream,omitempty" json:"stream"`
Images Files `form:"images" yaml:"Images,omitempty" json:"images,omitempty"`
Schema json.RawMessage `form:"schema" yaml:"Schema,omitempty" json:"schema,omitempty"`
ResponseFormat ApiFormat `form:"-" yaml:"-" json:"-"`
}
ApiRequest represents a Vision API service request.
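The struct tags above determine what the JSON payload looks like: most fields use omitempty and disappear when empty, whereas suffix and stream are always emitted and ResponseFormat is never serialized. The sketch below copies a subset of the fields with their JSON tags into a local apiRequest type to demonstrate this. The model name and file name are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiRequest copies a subset of the ApiRequest fields with their JSON tags.
type apiRequest struct {
	Id             string   `json:"id,omitempty"`
	Model          string   `json:"model,omitempty"`
	Prompt         string   `json:"prompt,omitempty"`
	Suffix         string   `json:"suffix"` // no omitempty: always present
	Stream         bool     `json:"stream"` // no omitempty: always present
	Images         []string `json:"images,omitempty"`
	ResponseFormat string   `json:"-"` // never serialized
}

// encodeRequest returns the request as a JSON string.
func encodeRequest(r apiRequest) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	out, err := encodeRequest(apiRequest{
		Model:          "example-model", // placeholder name
		Images:         []string{"photo.jpg"},
		ResponseFormat: "vision",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// {"model":"example-model","suffix":"","stream":false,"images":["photo.jpg"]}
}
```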
func NewApiRequest ¶
func NewApiRequest(requestFormat ApiFormat, files Files, fileScheme scheme.Type) (result *ApiRequest, err error)
NewApiRequest returns a new service API request with the specified format and payload.
func NewApiRequestImages ¶
func NewApiRequestImages(images Files, fileScheme scheme.Type) (*ApiRequest, error)
NewApiRequestImages returns a new Vision API request with the specified images as payload.
func NewApiRequestOllama ¶
func NewApiRequestOllama(images Files, fileScheme scheme.Type) (*ApiRequest, error)
NewApiRequestOllama returns a new Ollama API request with the specified images as payload.
func NewApiRequestUrl ¶
func NewApiRequestUrl(fileName string, fileScheme scheme.Type) (result *ApiRequest, err error)
NewApiRequestUrl returns a new Vision API request with the specified image URL as payload.
func (*ApiRequest) GetId ¶
func (r *ApiRequest) GetId() string
GetId returns the request ID string and generates a random ID if none was set.
func (*ApiRequest) GetResponseFormat ¶
func (r *ApiRequest) GetResponseFormat() ApiFormat
GetResponseFormat returns the expected response format type.
func (*ApiRequest) JSON ¶
func (r *ApiRequest) JSON() ([]byte, error)
JSON returns the request data as JSON-encoded bytes.
func (*ApiRequest) WriteLog ¶
func (r *ApiRequest) WriteLog()
WriteLog logs the request data when trace log mode is enabled.
type ApiRequestContext ¶
type ApiRequestContext = []int
ApiRequestContext represents a context parameter returned from a previous request.
type ApiResponse ¶
type ApiResponse struct {
Id string `yaml:"Id,omitempty" json:"id,omitempty"`
Code int `yaml:"Code,omitempty" json:"code,omitempty"`
Error string `yaml:"Error,omitempty" json:"error,omitempty"`
Model *Model `yaml:"Model,omitempty" json:"model,omitempty"`
Result ApiResult `yaml:"Result,omitempty" json:"result"`
}
ApiResponse represents a Vision API service response.
func NewApiError ¶
func NewApiError(id string, code int) ApiResponse
NewApiError generates a Vision API error response based on the specified HTTP status code.
func NewCaptionResponse ¶
func NewCaptionResponse(id string, model *Model, result *CaptionResult) ApiResponse
NewCaptionResponse generates a new Vision API image caption service response.
func NewLabelsResponse ¶
func NewLabelsResponse(id string, model *Model, results classify.Labels) ApiResponse
NewLabelsResponse generates a new Vision API image classification service response.
func PerformApiRequest ¶
func PerformApiRequest(apiRequest *ApiRequest, uri, method, key string) (apiResponse *ApiResponse, err error)
PerformApiRequest performs a Vision API request and returns the result.
func (*ApiResponse) Err ¶
func (r *ApiResponse) Err() error
Err returns an error if the request has failed.
func (*ApiResponse) HasResult ¶
func (r *ApiResponse) HasResult() bool
HasResult checks if there is at least one result in the response data.
type ApiResult ¶
type ApiResult struct {
Labels []LabelResult `yaml:"Labels,omitempty" json:"labels,omitempty"`
Nsfw []nsfw.Result `yaml:"Nsfw,omitempty" json:"nsfw,omitempty"`
Embeddings []face.Embeddings `yaml:"Embeddings,omitempty" json:"embeddings,omitempty"`
Caption *CaptionResult `yaml:"Caption,omitempty" json:"caption,omitempty"`
}
ApiResult represents the model response(s) to a Vision API service request and can optionally include data from multiple models.
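As a rough illustration of how a client might consume such a response, the sketch below decodes a JSON body into local copies of the response types and checks whether any result is present. Only the labels and caption fields are mirrored here, and the hasResult logic is an assumption about what "at least one result" means, not the package's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Reduced local copies of LabelResult, CaptionResult, ApiResult, and ApiResponse.
type labelResult struct {
	Name       string  `json:"name"`
	Source     string  `json:"source"`
	Confidence float32 `json:"confidence,omitempty"`
}

type captionResult struct {
	Text string `json:"text,omitempty"`
}

type apiResult struct {
	Labels  []labelResult  `json:"labels,omitempty"`
	Caption *captionResult `json:"caption,omitempty"`
}

type apiResponse struct {
	Id     string    `json:"id,omitempty"`
	Code   int       `json:"code,omitempty"`
	Error  string    `json:"error,omitempty"`
	Result apiResult `json:"result"`
}

// hasResult reports whether the response carries any label or caption.
// The exact rule used by the real HasResult method is assumed here.
func (r *apiResponse) hasResult() bool {
	if r == nil {
		return false
	}
	return len(r.Result.Labels) > 0 || r.Result.Caption != nil
}

// decodeResponse parses a raw JSON body into an apiResponse.
func decodeResponse(raw []byte) (*apiResponse, error) {
	var resp apiResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	return &resp, nil
}

func main() {
	raw := []byte(`{"id":"abc","result":{"labels":[{"name":"cat","source":"image","confidence":0.9}]}}`)
	resp, err := decodeResponse(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Id, resp.hasResult(), resp.Result.Labels[0].Name)
}
```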
type CaptionResult ¶
type CaptionResult struct {
Text string `yaml:"Text,omitempty" json:"text,omitempty"`
Source string `yaml:"Source,omitempty" json:"source,omitempty"`
Confidence float32 `yaml:"Confidence,omitempty" json:"confidence,omitempty"`
}
CaptionResult represents the result generated by a caption generation model.
type ConfigValues ¶
type ConfigValues struct {
Models Models `yaml:"Models,omitempty" json:"models,omitempty"`
Thresholds Thresholds `yaml:"Thresholds,omitempty" json:"thresholds"`
}
ConfigValues represents computer vision configuration values for the supported Model types.
func NewConfig ¶
func NewConfig() *ConfigValues
NewConfig returns a new computer vision config with defaults.
func (*ConfigValues) IsCustom ¶
func (c *ConfigValues) IsCustom(t ModelType) bool
IsCustom checks whether the specified type uses a custom model or service.
func (*ConfigValues) IsDefault ¶
func (c *ConfigValues) IsDefault(t ModelType) bool
IsDefault checks whether the specified type is the built-in default model.
func (*ConfigValues) Load ¶
func (c *ConfigValues) Load(fileName string) error
Load reads user settings from the specified file.
func (*ConfigValues) Model ¶
func (c *ConfigValues) Model(t ModelType) *Model
Model returns the first enabled model with the matching type. It returns nil if no matching model is available or every model of that type is disabled, allowing callers to chain nil-safe Model methods.
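The nil-safe chaining mentioned above relies on Go allowing method calls on nil pointer receivers. The following sketch uses simplified stand-ins for Model and ConfigValues to show why a caller does not need a nil check between the lookup and the method call. The field subset and the front-to-back search order are assumptions of this sketch.

```go
package main

import "fmt"

// Model is a simplified stand-in with only the fields needed here.
type Model struct {
	Type     string
	Run      string
	Disabled bool
}

// RunType is nil-safe: a nil receiver reports "" (RunAuto), as documented
// for the real method.
func (m *Model) RunType() string {
	if m == nil {
		return ""
	}
	return m.Run
}

// ConfigValues is a simplified stand-in holding the configured models.
type ConfigValues struct {
	Models []*Model
}

// Model returns the first enabled model of the given type, or nil.
func (c *ConfigValues) Model(t string) *Model {
	if c == nil {
		return nil
	}
	for _, m := range c.Models {
		if m != nil && m.Type == t && !m.Disabled {
			return m
		}
	}
	return nil
}

func main() {
	cfg := &ConfigValues{Models: []*Model{
		{Type: "labels", Run: "on-index"},
		{Type: "caption", Run: "manual", Disabled: true},
	}}
	// The caption model is disabled, so Model returns nil and RunType
	// still works on the nil receiver.
	fmt.Printf("%q %q\n", cfg.Model("labels").RunType(), cfg.Model("caption").RunType())
}
```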
func (*ConfigValues) RunType ¶
func (c *ConfigValues) RunType(t ModelType) RunType
RunType returns the normalized run type for the first enabled model matching the provided type. Disabled or missing models fall back to RunNever so callers can treat the result as authoritative scheduling information.
func (*ConfigValues) Save ¶
func (c *ConfigValues) Save(fileName string) error
Save writes user settings to the specified file.
type Engine ¶
type Engine struct {
Builder RequestBuilder
Parser ResponseParser
Defaults EngineDefaults
}
Engine groups the callbacks required to integrate a third-party vision service.
type EngineDefaults ¶
type EngineDefaults interface {
SystemPrompt(model *Model) string
UserPrompt(model *Model) string
SchemaTemplate(model *Model) string
Options(model *Model) *ModelOptions
}
EngineDefaults supplies engine-specific prompt and schema defaults when they are not configured explicitly.
type EngineInfo ¶
type EngineInfo struct {
Uri string
RequestFormat ApiFormat
ResponseFormat ApiFormat
FileScheme string
DefaultModel string
DefaultResolution int
DefaultKey string // Optional placeholder key (e.g., ${OPENAI_API_KEY}); applied only when Service.Key is empty.
}
EngineInfo describes metadata that can be associated with an engine alias.
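The DefaultKey comment describes a placeholder such as ${OPENAI_API_KEY} that is applied only when Service.Key is empty. One way this precedence could work is sketched below with os.Expand from the standard library. The resolveKey helper and the idea that the placeholder is expanded from environment variables are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"os"
)

// resolveKey is a hypothetical helper, not part of the package API.
// An explicit service key always wins; otherwise the placeholder in
// defaultKey is expanded with the supplied lookup function.
func resolveKey(serviceKey, defaultKey string, lookup func(string) string) string {
	if serviceKey != "" {
		return serviceKey
	}
	return os.Expand(defaultKey, lookup)
}

func main() {
	// Explicit key: the placeholder is ignored.
	fmt.Println(resolveKey("explicit-key", "${OPENAI_API_KEY}", os.Getenv))

	// Empty key: the placeholder is expanded, here with a stub lookup.
	stub := func(string) string { return "from-env" }
	fmt.Println(resolveKey("", "${OPENAI_API_KEY}", stub))
}
```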
func EngineInfoFor ¶
func EngineInfoFor(name string) (EngineInfo, bool)
EngineInfoFor returns the metadata associated with a logical engine name.
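RegisterEngineAlias and EngineInfoFor together behave like a small name-to-metadata registry. A minimal sketch of such a registry follows, assuming a mutex-guarded map and case-insensitive names. The lowercase function names, the reduced EngineInfo fields, and all string values are placeholders rather than the package's real data.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// EngineInfo is a reduced copy of the documented struct.
type EngineInfo struct {
	Uri            string
	RequestFormat  string
	ResponseFormat string
	DefaultModel   string
}

var (
	aliasMu sync.RWMutex
	aliases = make(map[string]EngineInfo)
)

// normalize makes lookups case- and whitespace-insensitive (an assumption).
func normalize(name string) string {
	return strings.ToLower(strings.TrimSpace(name))
}

// registerEngineAlias stores the metadata under the normalized name.
func registerEngineAlias(name string, info EngineInfo) {
	key := normalize(name)
	if key == "" {
		return
	}
	aliasMu.Lock()
	defer aliasMu.Unlock()
	aliases[key] = info
}

// engineInfoFor looks up the metadata registered for a logical engine name.
func engineInfoFor(name string) (EngineInfo, bool) {
	aliasMu.RLock()
	defer aliasMu.RUnlock()
	info, ok := aliases[normalize(name)]
	return info, ok
}

func main() {
	registerEngineAlias("Example", EngineInfo{
		RequestFormat:  "example-format", // placeholder values
		ResponseFormat: "example-format",
		DefaultModel:   "example-model",
	})
	info, ok := engineInfoFor(" example ")
	fmt.Println(info.DefaultModel, ok)
}
```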
type Files ¶
type Files = []string
Files holds a list of input file paths or URLs for vision requests.
type LabelResult ¶
type LabelResult struct {
Name string `yaml:"Name,omitempty" json:"name"`
Source string `yaml:"Source,omitempty" json:"source"`
Priority int `yaml:"Priority,omitempty" json:"priority,omitempty"`
Confidence float32 `yaml:"Confidence,omitempty" json:"confidence,omitempty"`
Topicality float32 `yaml:"Topicality,omitempty" json:"topicality,omitempty"`
Categories []string `yaml:"Categories,omitempty" json:"categories,omitempty"`
NSFW bool `yaml:"Nsfw,omitempty" json:"nsfw,omitempty"`
NSFWConfidence float32 `yaml:"NsfwConfidence,omitempty" json:"nsfw_confidence,omitempty"`
}
LabelResult represents a label generated by an image classification model.
func (LabelResult) ToClassify ¶
func (r LabelResult) ToClassify(labelSrc string) classify.Label
ToClassify returns the label results as classify.Label.
type Model ¶
type Model struct {
Type ModelType `yaml:"Type,omitempty" json:"type,omitempty"`
Default bool `yaml:"Default,omitempty" json:"default,omitempty"`
Model string `yaml:"Model,omitempty" json:"model,omitempty"`
Name string `yaml:"Name,omitempty" json:"name,omitempty"`
Version string `yaml:"Version,omitempty" json:"version,omitempty"`
Engine ModelEngine `yaml:"Engine,omitempty" json:"engine,omitempty"`
Run RunType `yaml:"Run,omitempty" json:"Run,omitempty"` // "auto", "never", "manual", "always", "newly-indexed", "on-schedule"
System string `yaml:"System,omitempty" json:"system,omitempty"`
Prompt string `yaml:"Prompt,omitempty" json:"prompt,omitempty"`
Format string `yaml:"Format,omitempty" json:"format,omitempty"`
Schema string `yaml:"Schema,omitempty" json:"schema,omitempty"`
SchemaFile string `yaml:"SchemaFile,omitempty" json:"schemaFile,omitempty"`
Resolution int `yaml:"Resolution,omitempty" json:"resolution,omitempty"`
TensorFlow *tensorflow.ModelInfo `yaml:"TensorFlow,omitempty" json:"tensorflow,omitempty"`
Options *ModelOptions `yaml:"Options,omitempty" json:"options,omitempty"`
Service Service `yaml:"Service,omitempty" json:"service"`
Path string `yaml:"Path,omitempty" json:"-"`
Disabled bool `yaml:"Disabled,omitempty" json:"disabled,omitempty"`
// contains filtered or unexported fields
}
Model represents a computer vision model configuration.
func (*Model) ApplyEngineDefaults ¶
func (m *Model) ApplyEngineDefaults()
ApplyEngineDefaults normalizes the engine name and applies registered engine defaults (formats, schemes, resolution) when these are not explicitly configured.
func (*Model) ApplyService ¶
func (m *Model) ApplyService(apiRequest *ApiRequest)
ApplyService updates the ApiRequest with service-specific values when configured.
func (*Model) ClassifyModel ¶
func (m *Model) ClassifyModel() *classify.Model
ClassifyModel returns the matching classify model instance, if any. Nil receivers return nil.
func (*Model) Endpoint ¶
func (m *Model) Endpoint() (uri, method string)
Endpoint returns the remote service request method and endpoint URL. Nil receivers return empty strings.
func (*Model) EndpointFileScheme ¶
func (m *Model) EndpointFileScheme() (fileScheme scheme.Type)
EndpointFileScheme returns the endpoint API request file scheme type. Nil receivers fall back to the global default scheme.
func (*Model) EndpointKey ¶
func (m *Model) EndpointKey() (key string)
EndpointKey returns the access token belonging to the remote service endpoint, or an empty string for nil receivers.
func (*Model) EndpointRequestFormat ¶
func (m *Model) EndpointRequestFormat() (format ApiFormat)
EndpointRequestFormat returns the endpoint API request format. Nil receivers fall back to the global default format.
func (*Model) EndpointResponseFormat ¶
func (m *Model) EndpointResponseFormat() (format ApiFormat)
EndpointResponseFormat returns the endpoint API response format. Nil receivers fall back to the global default format.
func (*Model) EngineName ¶
func (m *Model) EngineName() string
EngineName returns the normalized engine identifier or infers one from the request configuration. Nil receivers return an empty string.
func (*Model) FaceModel ¶
func (m *Model) FaceModel() *face.Model
FaceModel returns the matching face recognition model instance, if any. Nil receivers return nil.
func (*Model) GetFormat ¶
func (m *Model) GetFormat() string
GetFormat returns the configured response format or a sensible default. Nil receivers return an empty string.
func (*Model) GetModel ¶
func (m *Model) GetModel() (model, name, version string)
GetModel returns the normalized model identifier, name, and version strings used in service requests. Callers can always destructure the tuple because nil receivers return empty values.
func (*Model) GetOptions ¶
func (m *Model) GetOptions() *ModelOptions
GetOptions returns the API request options, applying engine defaults on demand. Nil receivers return nil.
func (*Model) GetPrompt ¶
func (m *Model) GetPrompt() string
GetPrompt returns the configured model prompt, using engine defaults when none is specified. Nil receivers return an empty string.
func (*Model) GetSource ¶
func (m *Model) GetSource() string
GetSource returns the default entity src based on the model configuration.
func (*Model) GetSystemPrompt ¶
func (m *Model) GetSystemPrompt() string
GetSystemPrompt returns the configured system prompt, falling back to engine defaults when none is specified. Nil receivers return an empty string.
func (*Model) IsDefault ¶
func (m *Model) IsDefault() bool
IsDefault reports whether the model refers to one of the built-in defaults. Nil receivers return false.
func (*Model) NsfwModel ¶
func (m *Model) NsfwModel() *nsfw.Model
NsfwModel returns the matching nsfw model instance, if any. Nil receivers return nil.
func (*Model) PromptContains ¶
func (m *Model) PromptContains(s string) bool
PromptContains returns true if the prompt contains the specified substring.
func (*Model) RunType ¶
func (m *Model) RunType() RunType
RunType returns the normalized run type configured for the model. Nil receivers default to RunAuto.
func (*Model) SchemaInstructions ¶
func (m *Model) SchemaInstructions() string
SchemaInstructions returns a helper string that can be appended to prompts. Nil receivers return an empty string.
func (*Model) SchemaTemplate ¶
func (m *Model) SchemaTemplate() string
SchemaTemplate returns the model-specific JSON schema template, if any. Nil receivers return an empty string.
type ModelEngine ¶
type ModelEngine = string
ModelEngine represents the canonical identifier for a computer vision service engine.
const (
	// EngineVision represents the default PhotoPrism vision service endpoints.
	EngineVision ModelEngine = "vision"
	// EngineTensorFlow represents on-device TensorFlow models.
	EngineTensorFlow ModelEngine = "tensorflow"
	// EngineLocal is used when no explicit engine can be determined.
	EngineLocal ModelEngine = "local"
)
type ModelOptions ¶
type ModelOptions struct {
Temperature float64 `yaml:"Temperature,omitempty" json:"temperature,omitempty"` // Ollama, OpenAI
TopK int `yaml:"TopK,omitempty" json:"top_k,omitempty"` // Ollama
TopP float64 `yaml:"TopP,omitempty" json:"top_p,omitempty"` // Ollama, OpenAI
MinP float64 `yaml:"MinP,omitempty" json:"min_p,omitempty"` // Ollama
TypicalP float64 `yaml:"TypicalP,omitempty" json:"typical_p,omitempty"` // Ollama
TfsZ float64 `yaml:"TfsZ,omitempty" json:"tfs_z,omitempty"` // Ollama
Seed int `yaml:"Seed,omitempty" json:"seed,omitempty"` // Ollama
NumKeep int `yaml:"NumKeep,omitempty" json:"num_keep,omitempty"` // Ollama
RepeatLastN int `yaml:"RepeatLastN,omitempty" json:"repeat_last_n,omitempty"` // Ollama
RepeatPenalty float64 `yaml:"RepeatPenalty,omitempty" json:"repeat_penalty,omitempty"` // Ollama
PresencePenalty float64 `yaml:"PresencePenalty,omitempty" json:"presence_penalty,omitempty"` // OpenAI
FrequencyPenalty float64 `yaml:"FrequencyPenalty,omitempty" json:"frequency_penalty,omitempty"` // OpenAI
PenalizeNewline bool `yaml:"PenalizeNewline,omitempty" json:"penalize_newline,omitempty"` // Ollama
Stop []string `yaml:"Stop,omitempty" json:"stop,omitempty"` // Ollama, OpenAI
Mirostat int `yaml:"Mirostat,omitempty" json:"mirostat,omitempty"` // Ollama
MirostatTau float64 `yaml:"MirostatTau,omitempty" json:"mirostat_tau,omitempty"` // Ollama
MirostatEta float64 `yaml:"MirostatEta,omitempty" json:"mirostat_eta,omitempty"` // Ollama
NumPredict int `yaml:"NumPredict,omitempty" json:"num_predict,omitempty"` // Ollama
MaxOutputTokens int `yaml:"MaxOutputTokens,omitempty" json:"max_output_tokens,omitempty"` // Ollama, OpenAI
ForceJson bool `yaml:"ForceJson,omitempty" json:"force_json,omitempty"` // Ollama, OpenAI
SchemaVersion string `yaml:"SchemaVersion,omitempty" json:"schema_version,omitempty"` // Ollama, OpenAI
CombineOutputs string `yaml:"CombineOutputs,omitempty" json:"combine_outputs,omitempty"` // OpenAI
Detail string `yaml:"Detail,omitempty" json:"detail,omitempty"` // OpenAI
NumCtx int `yaml:"NumCtx,omitempty" json:"num_ctx,omitempty"` // Ollama, OpenAI
NumThread int `yaml:"NumThread,omitempty" json:"num_thread,omitempty"` // Ollama
NumBatch int `yaml:"NumBatch,omitempty" json:"num_batch,omitempty"` // Ollama
NumGpu int `yaml:"NumGpu,omitempty" json:"num_gpu,omitempty"` // Ollama
MainGpu int `yaml:"MainGpu,omitempty" json:"main_gpu,omitempty"` // Ollama
LowVram bool `yaml:"LowVram,omitempty" json:"low_vram,omitempty"` // Ollama
VocabOnly bool `yaml:"VocabOnly,omitempty" json:"vocab_only,omitempty"` // Ollama
UseMmap bool `yaml:"UseMmap,omitempty" json:"use_mmap,omitempty"` // Ollama
UseMlock bool `yaml:"UseMlock,omitempty" json:"use_mlock,omitempty"` // Ollama
Numa bool `yaml:"Numa,omitempty" json:"numa,omitempty"` // Ollama
}
ModelOptions represents additional model parameters listed in the documentation. Comments note which engines currently honor each field.
type ModelType ¶
type ModelType = string
ModelType defines the classifier type used by a vision model (labels, caption, face, etc.).
const (
	// ModelTypeLabels runs label detection.
	ModelTypeLabels ModelType = "labels"
	// ModelTypeNsfw runs NSFW detection.
	ModelTypeNsfw ModelType = "nsfw"
	// ModelTypeFace performs face detection or recognition.
	ModelTypeFace ModelType = "face"
	// ModelTypeCaption generates captions.
	ModelTypeCaption ModelType = "caption"
	// ModelTypeGenerate produces new content (e.g., text-to-image), when supported.
	ModelTypeGenerate ModelType = "generate"
)
type ModelTypes ¶
type ModelTypes = []ModelType
ModelTypes is a list of model type identifiers.
func ParseModelTypes ¶
func ParseModelTypes(s string) (types ModelTypes)
ParseModelTypes parses a model type string.
type RequestBuilder ¶
type RequestBuilder interface {
Build(ctx context.Context, model *Model, files Files) (*ApiRequest, error)
}
RequestBuilder builds an API request for an engine based on the model configuration and input files.
type ResponseParser ¶
type ResponseParser interface {
Parse(ctx context.Context, req *ApiRequest, raw []byte, status int) (*ApiResponse, error)
}
ResponseParser parses a raw engine response into the generic ApiResponse structure.
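To show how the RequestBuilder and ResponseParser interfaces fit together, the sketch below implements both on a hypothetical exampleEngine. The interface method signatures are copied from above, but Model, ApiRequest, and ApiResponse are heavily reduced local stand-ins, and the {"caption": ...} response body is an invented wire format used only for this example.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

// Reduced local stand-ins for the package types.
type Files = []string

type Model struct{ Name, Prompt string }

type ApiRequest struct {
	Model  string
	Prompt string
	Images Files
}

type ApiResponse struct {
	Code    int
	Caption string
}

// Interface definitions as documented above.
type RequestBuilder interface {
	Build(ctx context.Context, model *Model, files Files) (*ApiRequest, error)
}

type ResponseParser interface {
	Parse(ctx context.Context, req *ApiRequest, raw []byte, status int) (*ApiResponse, error)
}

// exampleEngine is a hypothetical engine that implements both callbacks.
type exampleEngine struct{}

// Build maps the model configuration and input files to a request.
func (exampleEngine) Build(_ context.Context, model *Model, files Files) (*ApiRequest, error) {
	if model == nil || len(files) == 0 {
		return nil, errors.New("example: model and at least one file are required")
	}
	return &ApiRequest{Model: model.Name, Prompt: model.Prompt, Images: files}, nil
}

// Parse converts the raw service response into the generic response type.
func (exampleEngine) Parse(_ context.Context, _ *ApiRequest, raw []byte, status int) (*ApiResponse, error) {
	if status < 200 || status > 299 {
		return nil, fmt.Errorf("example: unexpected status %d", status)
	}
	var body struct {
		Caption string `json:"caption"` // invented wire format
	}
	if err := json.Unmarshal(raw, &body); err != nil {
		return nil, err
	}
	return &ApiResponse{Code: status, Caption: body.Caption}, nil
}

// Compile-time checks that exampleEngine satisfies both interfaces.
var (
	_ RequestBuilder = exampleEngine{}
	_ ResponseParser = exampleEngine{}
)

func main() {
	ctx := context.Background()
	var e exampleEngine
	req, err := e.Build(ctx, &Model{Name: "example-model", Prompt: "Describe the image."}, Files{"photo.jpg"})
	if err != nil {
		panic(err)
	}
	resp, err := e.Parse(ctx, req, []byte(`{"caption":"A cat on a sofa."}`), 200)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Model, resp.Caption)
}
```

In the real package, an engine of this kind would presumably be wrapped in an Engine value and passed to RegisterEngine for a given ApiFormat.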
type RunType ¶
type RunType = string
RunType specifies when a vision model should be run.
const (
	// RunAuto automatically decides when to run based on model type and configuration.
	RunAuto RunType = ""
	// RunNever disables the model entirely.
	RunNever RunType = "never"
	// RunManual runs only when explicitly invoked (e.g., via the "vision run" command).
	RunManual RunType = "manual"
	// RunAlways runs manually, on-schedule, on-demand, and on-index.
	RunAlways RunType = "always"
	// RunNewlyIndexed runs manually and for newly indexed pictures.
	RunNewlyIndexed RunType = "newly-indexed"
	// RunOnDemand runs manually, for newly indexed pictures, and on configured schedule.
	RunOnDemand RunType = "on-demand"
	// RunOnSchedule runs manually and on-schedule.
	RunOnSchedule RunType = "on-schedule"
	// RunOnIndex runs manually and after indexing.
	RunOnIndex RunType = "on-index"
)
func ParseRunType ¶
ParseRunType parses a run type string into the canonical RunType constant. Unknown or empty values default to RunAuto.
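Given the RunTypes table and the documented fallback to RunAuto, parsing can be approximated as a normalized map lookup. The sketch below reproduces the constants and aliases locally. The lowercase parseRunType function and the trimming and lowercasing of the input are assumptions, since the actual normalization is not described here.

```go
package main

import (
	"fmt"
	"strings"
)

type RunType = string

// Local copies of the documented RunType constants.
const (
	RunAuto         RunType = ""
	RunNever        RunType = "never"
	RunManual       RunType = "manual"
	RunAlways       RunType = "always"
	RunNewlyIndexed RunType = "newly-indexed"
	RunOnDemand     RunType = "on-demand"
	RunOnSchedule   RunType = "on-schedule"
	RunOnIndex      RunType = "on-index"
)

// runTypes mirrors the documented RunTypes alias table.
var runTypes = map[string]RunType{
	RunAuto:            RunAuto,
	"auto":             RunAuto,
	RunNever:           RunNever,
	RunManual:          RunManual,
	"manually":         RunManual,
	"command":          RunManual,
	RunAlways:          RunAlways,
	RunNewlyIndexed:    RunNewlyIndexed,
	"on-newly-indexed": RunNewlyIndexed,
	"indexed":          RunNewlyIndexed,
	"on-indexed":       RunNewlyIndexed,
	"after-index":      RunNewlyIndexed,
	RunOnDemand:        RunOnDemand,
	RunOnSchedule:      RunOnSchedule,
	"schedule":         RunOnSchedule,
	RunOnIndex:         RunOnIndex,
	"index":            RunOnIndex,
}

// parseRunType resolves aliases to canonical values. Unknown or empty
// input falls back to RunAuto.
func parseRunType(s string) RunType {
	key := strings.ToLower(strings.TrimSpace(s)) // normalization is an assumption
	if rt, ok := runTypes[key]; ok {
		return rt
	}
	return RunAuto
}

func main() {
	for _, s := range []string{"Manually", "after-index", "schedule", "bogus"} {
		fmt.Printf("%q -> %q\n", s, parseRunType(s))
	}
}
```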
type Service ¶
type Service struct {
Uri string `yaml:"Uri,omitempty" json:"uri"`
Method string `yaml:"Method,omitempty" json:"method"`
Model string `yaml:"Model,omitempty" json:"model,omitempty"` // Optional endpoint-specific model override.
Username string `yaml:"Username,omitempty" json:"-"` // Optional basic auth user injected into Endpoint URLs.
Password string `yaml:"Password,omitempty" json:"-"`
Key string `yaml:"Key,omitempty" json:"-"`
Org string `yaml:"Org,omitempty" json:"org,omitempty"` // Optional organization header (e.g. OpenAI).
Project string `yaml:"Project,omitempty" json:"project,omitempty"` // Optional project header (e.g. OpenAI).
Think string `yaml:"Think,omitempty" json:"think,omitempty"` // Optional reasoning hint for compatible engines (e.g. Ollama, GPT-OSS).
FileScheme string `yaml:"FileScheme,omitempty" json:"fileScheme,omitempty"`
RequestFormat ApiFormat `yaml:"RequestFormat,omitempty" json:"requestFormat,omitempty"`
ResponseFormat ApiFormat `yaml:"ResponseFormat,omitempty" json:"responseFormat,omitempty"`
Disabled bool `yaml:"Disabled,omitempty" json:"disabled,omitempty"`
}
Service represents a remote computer vision service configuration.
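The struct tags above imply that Username, Password, and Key are never written to JSON (json:"-"), while Uri and Method are always present. The sketch below demonstrates this with the standard encoding/json package, using a local copy of a subset of the fields; serviceSketch, encodeService, and the example URL are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceSketch copies a subset of the documented Service fields and tags.
// It is a local stand-in, not the package's Service type.
type serviceSketch struct {
	Uri      string `yaml:"Uri,omitempty" json:"uri"`
	Method   string `yaml:"Method,omitempty" json:"method"`
	Model    string `yaml:"Model,omitempty" json:"model,omitempty"`
	Username string `yaml:"Username,omitempty" json:"-"`
	Password string `yaml:"Password,omitempty" json:"-"`
	Key      string `yaml:"Key,omitempty" json:"-"`
	Disabled bool   `yaml:"Disabled,omitempty" json:"disabled,omitempty"`
}

// encodeService returns the JSON representation of the stand-in service.
func encodeService(s serviceSketch) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	out, err := encodeService(serviceSketch{
		Uri:    "https://vision.example.com/api", // hypothetical endpoint
		Method: "POST",
		Key:    "secret-token", // excluded from JSON by the "-" tag
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints: {"uri":"https://vision.example.com/api","method":"POST"}
}
```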
func (*Service) Endpoint ¶
Endpoint returns the remote service request method and endpoint URL, if any.
func (*Service) EndpointFileScheme ¶
EndpointFileScheme returns the endpoint API file scheme type.
func (*Service) EndpointKey ¶
EndpointKey returns the access token belonging to the remote service endpoint, if any.
func (*Service) EndpointOrg ¶
EndpointOrg returns the organization identifier for the endpoint, if any.
func (*Service) EndpointProject ¶
EndpointProject returns the project identifier for the endpoint, if any.
func (*Service) EndpointRequestFormat ¶
EndpointRequestFormat returns the endpoint API request format.
func (*Service) EndpointResponseFormat ¶
EndpointResponseFormat returns the endpoint API response format.
func (*Service) EndpointThink ¶
EndpointThink returns the optional thinking/reasoning setting for the endpoint, if any.
type Thresholds ¶
type Thresholds struct {
Confidence int `yaml:"Confidence,omitempty" json:"confidence,omitempty"`
Topicality int `yaml:"Topicality,omitempty" json:"topicality,omitempty"`
NSFW int `yaml:"NSFW,omitempty" json:"nsfw,omitempty"`
}
Thresholds are expressed as percentages (0-100) and gate label acceptance, topicality, and NSFW handling for the configured vision models.
func (*Thresholds) GetConfidence ¶
func (t *Thresholds) GetConfidence() int
GetConfidence returns the Confidence threshold in percent from 0 to 100.
func (*Thresholds) GetConfidenceFloat32 ¶
func (t *Thresholds) GetConfidenceFloat32() float32
GetConfidenceFloat32 returns the Confidence threshold as float32 for comparison.
func (*Thresholds) GetNSFW ¶
func (t *Thresholds) GetNSFW() int
GetNSFW returns the NSFW threshold in percent from 0 to 100.
func (*Thresholds) GetNSFWFloat32 ¶
func (t *Thresholds) GetNSFWFloat32() float32
GetNSFWFloat32 returns the NSFW threshold as float32 for comparison.
func (*Thresholds) GetTopicality ¶
func (t *Thresholds) GetTopicality() int
GetTopicality returns the Topicality threshold in percent from 0 to 100.
func (*Thresholds) GetTopicalityFloat32 ¶
func (t *Thresholds) GetTopicalityFloat32() float32
GetTopicalityFloat32 returns the Topicality threshold as float32 for comparison.
Source Files ¶
- api_client.go
- api_format.go
- api_ollama.go
- api_request.go
- api_response.go
- caption.go
- config.go
- engine.go
- engine_ollama.go
- engine_openai.go
- errors.go
- face.go
- faces.go
- label_normalizer.go
- labels.go
- model.go
- model_filters.go
- model_options.go
- model_run.go
- model_types.go
- models.go
- nsfw.go
- resolution.go
- service.go
- thresholds.go
- topicality.go
- vision.go
- vision_env.go
Directories ¶
| Path | Synopsis |
|---|---|
| ollama | Package ollama integrates PhotoPrism's vision pipeline with Ollama-compatible multi-modal models so adapters can share logging and engine helpers. |
| openai | Package openai implements the PhotoPrism vision adapter that calls the OpenAI Responses API for captions, labels, and optional markers. |
| schema | Package schema defines canonical JSON and JSON Schema templates shared by PhotoPrism's AI vision engines. |