NHyphenator – .NET library for multi-language soft hyphenation

Let's talk about NHyphenator – a .NET library for automated text hyphenation.

What is a soft hyphen?

If you want to improve the look of your web articles, you should probably add hyphenation. However, web browsers, unlike Word or OpenOffice, don't know how to hyphenate your text unless you help them.

To hyphenate text in the browser, you need to add a special symbol at every position where a hyphen is possible. This symbol is the soft hyphen; in HTML you can add it as &shy;. After this, the browser or another renderer can wrap your text both at spaces and at soft hyphens. If a line is wrapped at a soft hyphen, the browser renders a real hyphen symbol. Otherwise, when the word doesn't need to wrap, the browser renders nothing and you see the whole word without spaces or hyphens.
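To make this concrete, here is a minimal C# sketch of what soft-hyphenated text looks like as data. The class and method names (SoftHyphenDemo, JoinWithSoftHyphens, ToHtmlEntities) are hypothetical helpers written for this illustration and are not part of NHyphenator.

```csharp
using System;

public static class SoftHyphenDemo
{
    // U+00AD is the Unicode soft hyphen; "&shy;" is the equivalent HTML entity.
    public const char SoftHyphen = '\u00AD';

    // Joins word parts with soft hyphens so a renderer may break between them.
    public static string JoinWithSoftHyphens(params string[] parts) =>
        string.Join(SoftHyphen.ToString(), parts);

    // Replaces raw soft-hyphen characters with the HTML entity,
    // which keeps the markup readable in an editor.
    public static string ToHtmlEntities(string text) =>
        text.Replace(SoftHyphen.ToString(), "&shy;");

    public static void Main()
    {
        string word = JoinWithSoftHyphens("hy", "phen", "ation");
        Console.WriteLine(word.Length);          // 13: 11 letters plus 2 soft hyphens
        Console.WriteLine(ToHtmlEntities(word)); // hy&shy;phen&shy;ation
    }
}
```

The raw character and the entity are interchangeable for the browser; the entity form is simply easier to spot when you read the HTML source.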

The Algorithm

The most popular algorithm for automatic hyphenation is the Knuth-Liang algorithm. It is a pattern-based algorithm used in TeX, OpenOffice, and other products. This means the algorithm suits any language, as long as you provide the right patterns for it. Because it relies on patterns, the algorithm is not very fast, and you can probably find faster, language-specific solutions, but there are no other universal algorithms. For this reason, NHyphenator implements this algorithm.
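The core idea of pattern-based hyphenation can be sketched in a few lines of C#. This is a simplified illustration, not NHyphenator's actual implementation: the LiangSketch class, its method names, the minLeft and minRight defaults, and the handful of patterns in DemoPatterns are all assumptions chosen for the example. Each pattern is a run of letters with digits between them; every pattern that matches a substring of the dot-padded word contributes its digits, the highest digit wins at each position, and an odd value marks an allowed break.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LiangSketch
{
    // A tiny illustrative pattern set; a real pattern file has thousands of entries.
    public static readonly string[] DemoPatterns =
        { "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n" };

    // Splits "hy3ph" into letters "hyph" and weights [0,0,3,0,0];
    // weights[i] belongs to the gap before letter i, a missing digit counts as 0.
    static (string Letters, int[] Weights) Parse(string pattern)
    {
        var letters = new StringBuilder();
        var weights = new List<int> { 0 };
        foreach (char c in pattern)
        {
            if (char.IsDigit(c))
            {
                weights[weights.Count - 1] = c - '0';
            }
            else
            {
                letters.Append(c);
                weights.Add(0);
            }
        }
        return (letters.ToString(), weights.ToArray());
    }

    public static string Hyphenate(string word, IEnumerable<string> patterns,
                                   string symbol = "-", int minLeft = 2, int minRight = 3)
    {
        var table = new Dictionary<string, int[]>();
        foreach (string p in patterns)
        {
            var (letters, weights) = Parse(p);
            table[letters] = weights;
        }

        // Dots mark the word boundaries so patterns can anchor to them.
        string padded = "." + word.ToLowerInvariant() + ".";
        var scores = new int[padded.Length + 1]; // scores[j]: gap before padded[j]

        // Try every substring; keep the maximum weight seen at each gap.
        for (int start = 0; start < padded.Length; start++)
        {
            for (int len = 1; start + len <= padded.Length; len++)
            {
                if (table.TryGetValue(padded.Substring(start, len), out int[] w))
                {
                    for (int k = 0; k < w.Length; k++)
                    {
                        scores[start + k] = Math.Max(scores[start + k], w[k]);
                    }
                }
            }
        }

        // Odd scores allow a break, subject to the minimum fragment lengths.
        var result = new StringBuilder();
        for (int i = 0; i < word.Length; i++)
        {
            bool allowed = i >= minLeft && word.Length - i >= minRight;
            if (allowed && scores[i + 1] % 2 == 1)
            {
                result.Append(symbol);
            }
            result.Append(word[i]);
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Hyphenate("hyphenation", DemoPatterns)); // hy-phen-ation
    }
}
```

The brute-force substring scan also shows where the cost comes from: every substring of every word is looked up in the pattern table. Production implementations typically store patterns in a trie to cut this down.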

How to use

NHyphenator is a .NET library, and you can use it in any project compatible with .NET 4.0 or .NET Standard 2.0.

To install it through NuGet, just type

Install-Package NHyphenator

NHyphenator has built-in patterns for English and Russian, so you can use the resource-based loader

var loader = new ResourceHyphenatePatternsLoader(HyphenatePatternsLanguage.Russian);
Hypenator hypenator = new Hypenator(loader);
var result = hypenator.HyphenateText(text);

For other languages (or if you want to load your own patterns), you can use another loader that loads patterns from files

var loader = new FilePatternsLoader($"{patterns_path}", $"{exceptions_path}");

You can find patterns for your language in the TeX repo (see the TeX patterns link below): *.pat.txt files contain patterns, and *.hyp.txt files contain exceptions.

You can also change the following settings through constructor parameters

HyphenateSymbol - Symbol used to denote hyphenation
MinWordLength - Minimum length of a word to be hyphenated
MinLetterCount - Minimum number of characters left on a line
HyphenateLastWord - Whether to hyphenate the last word. NOTE: this option works only if the input text contains more than one word

Licence

All NHyphenator source code is distributed under the Apache License 2.0. The TeX patterns are distributed under the LaTeX Project Public License.

Hyphenation algorithm

TeX patterns

Nuget

GitHub

