budoux

package module
v0.7.0
Published: Oct 22, 2025 License: Apache-2.0 Imports: 2 Imported by: 0

README

BudouX for Go

BudouX is a small, standalone, language-neutral phrase segmenter that produces beautiful and legible line breaks. This fork targets Go 1.23+ and adopts the iterator APIs introduced in that release, together with model optimisations.

Supported Languages

  • Japanese (ja)
  • Simplified Chinese (zh-hans)
  • Traditional Chinese (zh-hant)
  • Thai (th)

Installation

go get github.com/soundkitchen/go-budoux

Usage

Simple usage
package main

import (
    "fmt"
    budoux "github.com/soundkitchen/go-budoux"
)

func main() {
    parser := budoux.NewDefaultJapaneseParser()
    phrases := parser.Parse("今日は良い天気ですね。")
    fmt.Println(phrases)
    // Output: [今日は 良い 天気ですね。]
}
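
Once a sentence is split into phrases, a common next step is to join them with a break opportunity so that a renderer wraps only at phrase boundaries. The following is a minimal sketch under stated assumptions: the helper joinWithBreaks is hypothetical and not part of this package, and the slice literal stands in for the result of parser.Parse (it reuses the output shown above) so that the snippet runs without the module installed.

```go
package main

import (
	"fmt"
	"strings"
)

// zeroWidthSpace (U+200B) marks a permitted line-break position
// without rendering a visible gap.
const zeroWidthSpace = "\u200b"

// joinWithBreaks is a hypothetical helper (not part of go-budoux).
// It joins phrases with a zero-width space between each pair.
func joinWithBreaks(phrases []string) string {
	return strings.Join(phrases, zeroWidthSpace)
}

func main() {
	// Stand-in for parser.Parse("今日は良い天気ですね。"),
	// copied from the documented output.
	phrases := []string{"今日は", "良い", "天気ですね。"}

	// %q makes the otherwise invisible separators visible as escapes.
	fmt.Printf("%q\n", joinWithBreaks(phrases))
}
```

With the module imported, the slice literal would be replaced by a call to Parse on one of the default parsers.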
Supported languages and their default parsers
  • Japanese: budoux.NewDefaultJapaneseParser()
  • Simplified Chinese: budoux.NewDefaultSimplifiedChineseParser()
  • Traditional Chinese: budoux.NewDefaultTraditionalChineseParser()
  • Thai: budoux.NewDefaultThaiParser()
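
When the language is only known at run time, the mapping above can be expressed as a lookup table keyed by language code. This is a sketch, not part of the package: the Segmenter interface, the Registry type, and the stub parser are all hypothetical, and the stubs exist only so the example runs without the module. In real use each entry would wrap the matching constructor from the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// Segmenter is a hypothetical interface covering the one method
// this sketch needs; *budoux.Parser has a Parse method of this shape.
type Segmenter interface {
	Parse(sentence string) []string
}

// Registry maps lower-case language codes to parser constructors.
// With the module imported, an entry would look like:
//
//	"ja": func() Segmenter { return budoux.NewDefaultJapaneseParser() },
type Registry map[string]func() Segmenter

// Lookup normalises the tag (trims spaces, lower-cases, treats "_" as "-")
// and returns a new parser if the language is registered.
func (r Registry) Lookup(tag string) (Segmenter, bool) {
	key := strings.ToLower(strings.ReplaceAll(strings.TrimSpace(tag), "_", "-"))
	newParser, ok := r[key]
	if !ok {
		return nil, false
	}
	return newParser(), true
}

// stub is a placeholder parser that returns the sentence unsplit.
type stub string

func (s stub) Parse(sentence string) []string { return []string{sentence} }

func main() {
	reg := Registry{
		"ja":      func() Segmenter { return stub("ja") },
		"zh-hans": func() Segmenter { return stub("zh-hans") },
		"zh-hant": func() Segmenter { return stub("zh-hant") },
		"th":      func() Segmenter { return stub("th") },
	}
	for _, tag := range []string{"ja", "ZH_Hans", "ko"} {
		_, ok := reg.Lookup(tag)
		fmt.Printf("%s supported: %t\n", tag, ok)
	}
}
```

Storing constructors rather than parser values keeps the table cheap to build, since a model is only instantiated for the language actually requested.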

API Reference

Parser

Parser is the main entry point for text segmentation.

Constructor Functions
// Create parsers with default models
func NewDefaultJapaneseParser() *Parser
func NewDefaultSimplifiedChineseParser() *Parser
func NewDefaultTraditionalChineseParser() *Parser
func NewDefaultThaiParser() *Parser

// Create a parser with a custom model
func New(model models.Model) *Parser
Methods
// Parse segments input text into phrases
func (p *Parser) Parse(sentence string) []string
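
Because Parse returns plain strings, line wrapping can be layered on top of it. The sketch below greedily packs phrases into lines of a maximum rune count and never splits inside a phrase. It rests on several assumptions: the Segmenter interface, wrapPhrases, and fixedSegmenter are hypothetical names, the stub returns the phrases documented in the usage example so the code runs without the module, and rune count is used as a rough stand-in for display width.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Segmenter is a hypothetical interface matching the Parse method above.
type Segmenter interface {
	Parse(sentence string) []string
}

// fixedSegmenter is a stub that always returns the same phrases.
type fixedSegmenter []string

func (f fixedSegmenter) Parse(string) []string { return f }

// wrapPhrases packs phrases into lines of at most width runes.
// A phrase longer than width gets a line of its own instead of being split.
func wrapPhrases(s Segmenter, sentence string, width int) []string {
	var lines []string
	var cur strings.Builder
	curLen := 0
	for _, p := range s.Parse(sentence) {
		n := utf8.RuneCountInString(p)
		// Start a new line when the next phrase would overflow.
		if curLen > 0 && curLen+n > width {
			lines = append(lines, cur.String())
			cur.Reset()
			curLen = 0
		}
		cur.WriteString(p)
		curLen += n
	}
	if curLen > 0 {
		lines = append(lines, cur.String())
	}
	return lines
}

func main() {
	seg := fixedSegmenter{"今日は", "良い", "天気ですね。"}
	for _, line := range wrapPhrases(seg, "今日は良い天気ですね。", 6) {
		fmt.Println(line)
	}
}
```

Accepting an interface rather than *Parser keeps the wrapping logic testable without loading a language model.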

Testing

Run the test suite:

GOCACHE=$(pwd)/.cache go test ./...

Run tests with verbose output:

GOCACHE=$(pwd)/.cache go test -v ./...

Run the parser benchmark:

GOCACHE=$(pwd)/.cache go test -bench=BenchmarkParserParse -benchmem ./...

Regenerating models

The bundled language models are generated from the upstream BudouX JSON assets. To refresh them, run:

GOCACHE=$(pwd)/.cache go generate ./gen

License

Copyright 2021 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

func New

func New(model models.Model) *Parser

New creates a new Parser that uses the given model.

func NewDefaultJapaneseParser

func NewDefaultJapaneseParser() *Parser

NewDefaultJapaneseParser returns a new Parser with the default Japanese model.

func NewDefaultSimplifiedChineseParser

func NewDefaultSimplifiedChineseParser() *Parser

NewDefaultSimplifiedChineseParser returns a new Parser with the default Simplified Chinese model.

func NewDefaultThaiParser

func NewDefaultThaiParser() *Parser

NewDefaultThaiParser returns a new Parser with the default Thai model.

func NewDefaultTraditionalChineseParser

func NewDefaultTraditionalChineseParser() *Parser

NewDefaultTraditionalChineseParser returns a new Parser with the default Traditional Chinese model.

func (*Parser) Parse

func (p *Parser) Parse(sentence string) []string

Parse segments a sentence into phrases.
