Profinite word

In mathematics, more precisely in formal language theory, the profinite words are a generalization of the notion of finite words into a complete topological space. This notion allows the use of topology to study languages and finite semigroups. For example, profinite words are used to give an alternative characterization of the algebraic notion of a variety of finite semigroups.

Definition

Let A be an alphabet. The set of profinite words over A is the completion of a metric space whose underlying set is the set $A^{*}$ of finite words over A. The metric is defined in terms of a notion of separation of words by finite monoids. Both notions are defined below.

Separation

Let M and N be monoids, and let p and q be elements of the monoid M. Let φ be a morphism of monoids from M to N. The morphism φ is said to separate p and q if $\phi (p)\neq \phi (q)$ . For example, the morphism $\phi :A^{*}\to \mathbb {Z} /2\mathbb {Z} ,w\mapsto |w|{\bmod {2}}$ sending a word to the parity of its length separates the words ababa and abaa. Indeed $\phi (ababa)=1\neq 0=\phi (abaa)$ .

The monoid N is said to separate p and q if there exists a morphism of monoids φ from M to N that separates p and q. Using the previous example, $\mathbb {Z} /2\mathbb {Z}$ separates ababa and abaa. More generally, $\mathbb {Z} /n\mathbb {Z}$ separates any two words whose lengths are not congruent modulo n. In general, any two distinct words p and q can be separated using the monoid whose elements are the factors of p plus a fresh element 0, in which the product of two factors is their concatenation if it is again a factor of p, and 0 otherwise. The morphism sends factors of p to themselves and everything else to 0, so it sends p to itself and q to a different element.
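The two separating morphisms described above can be sketched in Python. This is only an illustration: the names `parity`, `factors`, `ZERO` and `factor_monoid_image` are hypothetical, words are represented as Python strings, and the fresh element 0 is represented by `None`.

```python
def parity(word):
    """Morphism A* -> Z/2Z sending a word to its length modulo 2."""
    return len(word) % 2


def factors(p):
    """All factors (contiguous subwords) of p, including the empty word."""
    return {p[i:j] for i in range(len(p) + 1) for j in range(i, len(p) + 1)}


ZERO = None  # assumption: stands for the fresh absorbing element 0


def factor_monoid_image(p, word):
    """Image of `word` in the monoid (factors of p) + {0}.

    The image is computed letter by letter: the product of a factor and a
    letter is their concatenation if it is still a factor of p, else 0.
    """
    allowed = factors(p)
    result = ""  # the empty word is the identity of the monoid
    for letter in word:
        candidate = result + letter
        if candidate not in allowed:
            return ZERO  # 0 is absorbing, so the image stays 0
        result = candidate
    return result


# The parity morphism separates ababa and abaa.
print(parity("ababa"), parity("abaa"))

# The factor monoid of p = ababa separates p from abaa.
print(factor_monoid_image("ababa", "ababa"), factor_monoid_image("ababa", "abaa"))
```

In this sketch a word is mapped to itself exactly when it is a factor of p, so p is the only word whose image is p.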

Distance

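The distance between two distinct words p and q is defined as the inverse of the size of the smallest finite monoid N separating p and q. Thus, the distance between ababa and abaa is ${\frac {1}{2}}$ : a one-element monoid separates no pair of words, while the two-element monoid $\mathbb {Z} /2\mathbb {Z}$ separates them. The distance of p to itself is defined as 0.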
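For very short words, this definition can be checked by brute force. The following sketch enumerates every multiplication table on {0, ..., n-1} that forms a monoid with identity 0, together with every assignment of letters to elements, and returns the inverse of the first size n at which the two words are separated. The function names and the cut-off `max_size` are assumptions made for this illustration, and the search is exponential, so it is only practical for tiny sizes.

```python
from fractions import Fraction
from itertools import product


def monoids(n):
    """Yield every multiplication table on {0..n-1} that is a monoid with identity 0."""
    rest = range(1, n)
    cells = [(x, y) for x in rest for y in rest]
    for values in product(range(n), repeat=len(cells)):
        table = [[0] * n for _ in range(n)]
        for x in range(n):
            table[0][x] = x  # 0 is a left identity
            table[x][0] = x  # 0 is a right identity
        for (x, y), v in zip(cells, values):
            table[x][y] = v
        # Triples involving the identity are associative automatically.
        if all(table[table[x][y]][z] == table[x][table[y][z]]
               for x in rest for y in rest for z in rest):
            yield table


def image(word, letter_images, table):
    """Image of `word` under the morphism determined by the images of the letters."""
    result = 0  # identity element
    for letter in word:
        result = table[result][letter_images[letter]]
    return result


def distance(p, q, max_size=4):
    """Return 1/n for the smallest separating monoid size n <= max_size, else None."""
    if p == q:
        return Fraction(0)
    alphabet = sorted(set(p) | set(q))
    for n in range(1, max_size + 1):
        for table in monoids(n):
            for images in product(range(n), repeat=len(alphabet)):
                phi = dict(zip(alphabet, images))
                if image(p, phi, table) != image(q, phi, table):
                    return Fraction(1, n)
    return None  # no separating monoid found within the cut-off


print(distance("ababa", "abaa"))  # separated by a two-element monoid
print(distance("ab", "ba"))       # needs a non-commutative monoid
```

The words ab and ba cannot be separated by a monoid with two elements, since every such monoid is commutative; a three-element monoid suffices.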

This distance d is an ultrametric, that is, $d(x,z)\leq \max \left\{d(x,y),d(y,z)\right\}$ . Furthermore it satisfies $d(uw,vw)\leq d(u,v)$ and $d(wu,wv)\leq d(u,v)$ . Any word p can be separated from any other word by a finite monoid whose size depends only on p, for instance the monoid of factors of p described above, which has at most ${\frac {|p|(|p|+1)}{2}}+2$ elements, where |p| is the length of p. It follows that the distance between p and any other word is bounded below by a positive constant depending only on p. Thus every finite word is an isolated point, and the topology defined by this metric on $A^{*}$ is discrete.
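The ultrametric inequality can be derived directly from the definition. In the sketch below, $r(x,y)$ is a notation introduced only for this derivation: it denotes the size of the smallest monoid separating two distinct words x and y, so that $d(x,y)=1/r(x,y)$. The words x, y and z are assumed pairwise distinct (the other cases are immediate), and φ is a morphism into a monoid of size $r(x,z)$ separating x and z.

```latex
% r(x,y): size of the smallest monoid separating x and y (notation assumed here)
\begin{align*}
\phi(x) \neq \phi(z)
  &\implies \phi(x) \neq \phi(y) \ \text{ or } \ \phi(y) \neq \phi(z) \\
  &\implies r(x,y) \leq r(x,z) \ \text{ or } \ r(y,z) \leq r(x,z) \\
  &\implies \min\{r(x,y),\, r(y,z)\} \leq r(x,z) \\
  &\implies d(x,z) = \frac{1}{r(x,z)}
     \leq \max\left\{\frac{1}{r(x,y)},\, \frac{1}{r(y,z)}\right\}
     = \max\{d(x,y),\, d(y,z)\}
\end{align*}
```

The inequality $d(uw,vw)\leq d(u,v)$ follows in the same way: if $\phi (uw)\neq \phi (vw)$ , then $\phi (u)\neq \phi (v)$ , so any monoid separating uw and vw also separates u and v.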

Profinite topology

The profinite completion of $A^{*}$ , denoted ${\widehat {A^{*}}}$ , is the completion of the set of finite words under the distance defined above. The product of $A^{*}$ is uniformly continuous for this distance, so it extends to the completion, which is therefore also a monoid.

When the alphabet A is finite, the space ${\widehat {A^{*}}}$ is compact.

Any monoid morphism $\phi :A^{*}\to M$ , with M finite, extends uniquely to a uniformly continuous monoid morphism ${\widehat {\phi }}:{\widehat {A^{*}}}\to M$ (using any metric on $M$ compatible with the discrete topology). Furthermore, ${\widehat {A^{*}}}$ is the least topological space with this property.

Profinite word

A profinite word is an element of ${\widehat {A^{*}}}$ , and a profinite language is a set of profinite words. Every finite word is a profinite word. A few examples of profinite words that are not finite are given below.

For any word m, let $m^{\omega }$ denote $\lim _{i\to \infty }m^{i!}$ . This limit exists because $(m^{i!})_{i}$ is a Cauchy sequence: intuitively, to separate $m^{i!}$ and $m^{i'!}$ , a monoid should count at least up to $\min(i,i')$ , and hence requires at least $\min(i,i')$ elements. Therefore $m^{\omega }$ is indeed a profinite word.

Furthermore, the word $m^{\omega }$ is idempotent. This is due to the fact that, for any morphism $\phi :A^{*}\to N$ with N finite, $\phi (m^{i!})=\phi (m)^{i!}$ . Since N is finite, for i large enough, $\phi (m)^{i!}$ is idempotent, and the sequence $\phi (m)^{i!}$ is eventually constant.
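The stabilisation of $\phi (m)^{i!}$ can be observed in a small finite monoid. The sketch below takes, purely as an illustrative choice, the multiplicative monoid of integers modulo 12 and supposes that $\phi (m)=2$ ; the names `MODULUS`, `x`, `sequence` and `limit` are hypothetical.

```python
from math import factorial

MODULUS = 12  # assumption: N is the monoid of integers modulo 12 under multiplication
x = 2         # assumption: the image phi(m) of the word m in N

# Images of m^(i!) for i = 1, ..., 6, computed with modular exponentiation.
sequence = [pow(x, factorial(i), MODULUS) for i in range(1, 7)]
limit = sequence[-1]

print(sequence)                      # the values stop changing after the first term
print((limit * limit) % MODULUS)     # equals `limit`, so the limit is idempotent
```

Here the sequence takes the value 2 for i = 1 and the value 4 from i = 2 onwards, and 4 is idempotent modulo 12.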

Similarly, $m^{\omega +1}$ and $m^{\omega -1}$ are defined as $\lim _{n\to \infty }m^{n!+1}$ and $\lim _{n\to \infty }m^{n!-1}$ respectively.
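When the finite monoid is a group, the image of $m^{\omega }$ is the identity and the image of $m^{\omega -1}$ is the inverse of the image of m. The following sketch checks this in the multiplicative group of nonzero integers modulo 7, with the image of m assumed to be 3; both choices, and all variable names, are made only for illustration.

```python
from math import factorial

MODULUS = 7  # assumption: the group of nonzero integers modulo 7 under multiplication
x = 3        # assumption: the image of the word m in this group

# For i >= 3 the exponent i! is a multiple of the order of the group (6),
# so the three sequences below are already constant.
omega = {pow(x, factorial(i), MODULUS) for i in range(3, 8)}
omega_plus_1 = {pow(x, factorial(i) + 1, MODULUS) for i in range(3, 8)}
omega_minus_1 = {pow(x, factorial(i) - 1, MODULUS) for i in range(3, 8)}

print(omega)          # the identity element
print(omega_plus_1)   # the element x itself
print(omega_minus_1)  # the inverse of x
```

In this example the inverse of 3 modulo 7 is 5, since 3 × 5 = 15 is congruent to 1.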

Profinite languages

The notion of profinite languages allows one to relate notions of semigroup theory to notions of topology. More precisely, given P a profinite language, the following statements are equivalent:

P is clopen,
P is recognisable,
the syntactic congruence of P is clopen, as a subset of ${\widehat {A^{*}}}\times {\widehat {A^{*}}}$ .

Similar statements also hold for languages P of finite words. The following conditions are equivalent:

$P$ is recognisable (as a subset of $A^{*}$ ),
the closure of P, ${\overline {P}}$ , is recognisable (as a subset of ${\widehat {A^{*}}}$ ),
$P=K\cap A^{*}$ , for some clopen K,
${\overline {P}}$ is clopen.

These characterisations follow from the more general fact that taking the closure of a language of finite words and restricting a profinite language to finite words are mutually inverse operations when applied to recognisable languages.
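For languages of finite words, recognisability can be made concrete. The sketch below assumes the standard definition, which is not spelled out in this article: a language P is recognised by a morphism φ into a finite monoid if P is the preimage under φ of a set of elements of that monoid. The language of words with an even number of occurrences of the letter a is used as an illustrative example, and the names `count_a_mod_2`, `ACCEPTING` and `in_language` are hypothetical.

```python
def count_a_mod_2(word):
    """Morphism A* -> Z/2Z counting occurrences of the letter 'a' modulo 2."""
    return word.count("a") % 2


ACCEPTING = {0}  # assumption: the subset of Z/2Z whose preimage is the language


def in_language(word):
    """Membership test: the word belongs to P iff its image lies in ACCEPTING."""
    return count_a_mod_2(word) in ACCEPTING


print(in_language("abab"))  # two occurrences of 'a'
print(in_language("ab"))    # one occurrence of 'a'
```

Membership of a finite word depends only on its image in the two-element monoid, which is what makes the language recognisable under the assumed definition.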

References

Pin, Jean-Éric (2016-11-30). Mathematical Foundations of Automata Theory (PDF). pp. 130–139.

Almeida, Jorge (1994). Finite semigroups and universal algebra. River Edge, NJ: World Scientific Publishing Co. Inc. ISBN 981-02-1895-8.