# Is this a word?

Toki Pona is a minimalist constructed language, and thus it has minimalist phonotactics (rules describing what sounds make valid words).

Toki Pona has 9 consonant sounds `m`, `n`, `p`, `t`, `k`, `s`, `w`, `l` and `j` and 5 vowel sounds `a`, `e`, `i`, `o`, `u`. A single basic syllable in Toki Pona consists of any one consonant, any one vowel and optionally an `n`. So all of the following are valid:

``````pi
ko
wa
san
jen
``````

There are four exceptions: the sequences `ji`, `wu`, `wo` and `ti` are forbidden.

With the basic syllables we can build words. A word is just a string of basic syllables with two special rules:

1. When we have an `n` followed by either `n` or `m` we drop the `n`. e.g. `jan` + `mo` is `jamo` not `janmo`
2. The initial syllable of a word can drop the initial consonant. e.g. `awen` and `pawen` are both legal and distinct words. This does allow for words consisting of just a vowel.
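The syllable rules above translate fairly directly into a left-to-right scan. What follows is a minimal, ungolfed Python sketch, not a reference solution from the challenge; the name `is_toki_pona_word` and the constants are made up for illustration. It assumes that an `n` after a vowel closes the syllable only when it is not followed by a vowel, which follows from rule 2 (only the first syllable may lack a consonant).

```python
CONSONANTS = set("mnptkswlj")
VOWELS = set("aeiou")
FORBIDDEN = {"ji", "wu", "wo", "ti"}


def is_toki_pona_word(word: str) -> bool:
    """Return True if `word` obeys the syllable rules described above."""
    if not word:
        return False
    i, n = 0, len(word)
    first = True
    while i < n:
        # Onset: one consonant, optional only in the first syllable (rule 2).
        if word[i] in CONSONANTS:
            onset = word[i]
            i += 1
        elif first:
            onset = ""
        else:
            return False
        # Nucleus: exactly one vowel, and the pair must not be forbidden.
        if i >= n or word[i] not in VOWELS:
            return False
        if onset + word[i] in FORBIDDEN:
            return False
        i += 1
        # Coda: an n that is not the onset of the next syllable.
        if i < n and word[i] == "n":
            nxt = word[i + 1] if i + 1 < n else ""
            if nxt not in VOWELS:
                # Rule 1: the n is dropped before n or m, so "nn"/"nm" never appear.
                if nxt in ("n", "m"):
                    return False
                i += 1
        first = False
    return True
```

For example, `is_toki_pona_word("nenen")` treats the middle `n` as an onset and the final `n` as a coda, while `is_toki_pona_word("anna")` fails on rule 1.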

Your task is to take a non-empty string consisting of lowercase alphabetic characters (`a` through `z`) and determine if it makes a valid toki pona word.

When the input is a valid word you should output one value and when it is not you should output another distinct value.

This is code-golf, so answers will be scored in bytes with the goal being to minimize the size of the source.

## Test cases

Accept:

``````awen
jan
e
monsuta
kepekin
ike
sinpin
pakala
tamako
jamo
pankulato
pawen
an
nene
nenen
``````

Reject:

``````bimi
ponas
plani
womo
won
tin
wu
miji
sunmo
anna
sam
kain
op
p
n
``````
• "we drop the `n`" - is that both of them in the case when there are two? Jan 14 at 12:22
• @JonathanAllan Only 1. Jan 14 at 12:23
• Since it is a decision problem I have, so far, assumed truthy and falsey are valid as output sets (even though the post states stuff about distinct values). Is that OK? Jan 14 at 13:22
• @JonathanAllan The outputs are as stated in the question. Jan 14 at 14:08
• Fair enough, but I'd suggest rethinking that standpoint for the future - if a language has no `if-else`, `logical processing` or equivalent construct then those languages simply have no concept of truthy and falsey (by definition) and therefore should just use the two distinct values option that should be available to them. Most languages do, so the strict IO just adds extra, boring work. Furthermore competition is intra-language, so the "disadvantage" argument holds no weight. Jan 14 at 14:25

# Retina 0.8.2, 48 47 bytes

``````A`ji|nm|nn|ti|wu|wo
^((^|[j-npstw])[aeiou]n?)+$
``````

Try it online! Link includes test cases. Edit: Saved 1 obvious byte thanks to @ovs. Explanation:

``````A`ji|nm|nn|ti|wu|wo
``````

Delete invalid inputs.

``````^((^|[j-npstw])[aeiou]n?)+$
``````

Match valid inputs that weren't invalidated above.
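The same two-stage idea can be sketched with Python's `re` module. This is only an illustration of the technique, not part of the Retina answer; the names `FORBIDDEN`, `SHAPE` and `is_word` are invented here.

```python
import re

# Stage 1: any of these pairs makes the input invalid.
FORBIDDEN = re.compile(r"ji|nm|nn|ti|wu|wo")
# Stage 2: one or more syllables; "^" stands in for the missing
# consonant of a vowel-initial first syllable.
SHAPE = re.compile(r"((^|[j-npstw])[aeiou]n?)+")


def is_word(word: str) -> bool:
    """Reject on a forbidden pair, then require the whole string to be syllables."""
    if FORBIDDEN.search(word):
        return False
    return SHAPE.fullmatch(word) is not None
```

`fullmatch` plays the role of the `^...$` anchors in the Retina pattern.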

• `j-mn` -> `j-n` for -1 ;)
– ovs
Jan 14 at 12:31
• @ovs Ugh, I can't believe I missed that...
– Neil
Jan 14 at 14:35

# Jelly, 56 51 bytes

+1 to cater for strict IO (two distinct outputs rather than truthy vs falsey being allowed)

``````“jtklmnpsw”,ØẹŒpṖṖ¬3,8¦p”n;ƊṗⱮLẎF€⁾mnyw⁾nnƲÐḟḊ€;$e@
``````

A (very inefficient) monadic Link that yields `0` when the input string is not a Toki Pona word and `1` when it is.

(Don't) Try it online! (it's so inefficient it'll only complete for words of length three or less!)

...but here is a test-suite that runs all tests except the four-syllable `pankulato`. It (a) limits generation to three base-syllables, rather than the number of characters in the input string, and (b) only calls the word-generating code once for all tests (hence the `e@` has been moved out to the footer).

### How?

We construct a list containing ALL valid Toki Pona words constructed from at most `length(input)` syllables and check if the input is in there.

Yep that's soooo nasty, but without easy regex access I imagine it's the golfiest way.

``````“jtklmnpsw”,ØẹŒpṖṖ¬3,8¦p”n;Ɗṗ - (partial) Link: integer (from below!)
“jtklmnpsw”                   - "jtklmnpsw"
Øẹ                - "aeiou"
,                  - pair
Œp              - Cartesian product
ṖṖ            - pop off "wu" and "wo"
3,8¦       - apply to indices 3 & 8 ("ji" & "ti"):
¬           -   logical NOT (replace these with [0,0] (integers))
”n    -   'n'
p      -   Cartesian product (appends 'n' to each)
;   -   concatenate
ṗ - Cartesian power (the integer)

...ⱮLẎF€⁾mnyw⁾nnƲÐḟḊ€;$e@ - (continued) Link: string, S
... L                     - length of S
...Ɱ                      - map across [1..length(S)] with:
...                       -   code above -> base-syllable combos of each length
Ẏ                    - tighten
F€                  - flatten each
Ðḟ       - filter discard those for which:
⁾mn               -     "mn"
y              -     translate (convert ms to ns)
⁾nn          -     "nn"
w             -     index of first occurrence (or zero)
Ḋ€     -   dequeue each
;    -   concatenate
@ - with swapped arguments:
e  -   S exists in there?
``````
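For readers who do not speak Jelly, the generate-everything-and-look-it-up strategy can be sketched in Python. This is a loose illustration rather than a translation of the Link: the names `base_syllables`, `all_words` and `is_word_bruteforce` are invented, and the syllable limit is passed in explicitly instead of being taken from the input length, because the number of candidates grows as 82 to the power of the syllable count.

```python
from itertools import product

FORBIDDEN = {"ji", "ti", "wu", "wo"}


def base_syllables():
    """All allowed consonant+vowel syllables, with and without a trailing n."""
    cv = [c + v for c in "jtklmnpsw" for v in "aeiou" if c + v not in FORBIDDEN]
    return cv + [s + "n" for s in cv]


def all_words(max_syllables):
    """Every valid word built from 1..max_syllables base syllables."""
    words = set()
    for count in range(1, max_syllables + 1):
        for combo in product(base_syllables(), repeat=count):
            word = "".join(combo)
            # An n is dropped before n or m, so these spellings never occur.
            if "nn" in word or "nm" in word:
                continue
            words.add(word)      # full form
            words.add(word[1:])  # first consonant dropped
    return words


def is_word_bruteforce(word, max_syllables):
    """Membership test against the generated word list."""
    return word in all_words(max_syllables)
```

With `max_syllables=2` this already covers inputs such as `jamo` and `nenen`; anything beyond three syllables becomes impractically slow, which matches the behaviour described above.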

# TypeScript type system, 313 bytes

``````type v="a"|"e"|"i"|"o"|"u";type i<T>=T extends""?1:T extends`${Exclude<`${"m"|"n"|"p"|"t"|"k"|"s"|"w"|"l"|"j"}${v}`,"ji"|"wu"|"wo"|"ti">}${infer r}`?i<r>extends 1?1:r extends`n${infer e}`?e extends`${"n"|"m"}${any}`?0:i<e>:0:0;type o<T>=T extends`${v}${infer p}`?i<p>extends 1?1:p extends`n${infer r}`?i<r>:0:i<T>
``````

This is written entirely with TypeScript types - the `o` type outputs 1 if the input parameter is a valid word and 0 if it is not. There's probably some room for further golfing.

# Charcoal, 59 58 55 bytes

``````∧θ¬⊙⪪”&↧q1o⁺VＰα”²№θι≔aeiouηＦ⮌θ¿№ηι≔⁻”&↧ï⁸t∕p№t⟦”ηη¿⁻ιn⎚
``````

Try it online! Link is to verbose version of code. Explanation:

``````∧θ¬⊙⪪”&↧q1o⁺VＰα”²№θι
``````

Check that the word doesn't contain any of the illegal letter pairs contained in the compressed string.

``````≔aeiouη
``````

Start by expecting the last character to be a vowel.

``````Ｆ⮌θ
``````

Loop over the word in reverse.

``````¿№ηι
``````

If we see an expected letter, ...

``````≔⁻”&↧ï⁸t∕p№t⟦”ηη
``````

... then flip the set of expected letters by subtracting it from the string of all the legal Toki Pona letters, grouped into vowels and consonants.

``````¿⁻ιn
``````

Otherwise, if the current letter is not an `n`, ...

``````⎚
``````

... then erase any previous validity there might have been.
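The reverse scan described above can be sketched in Python. This is an illustration under stated assumptions, not a decoding of the Charcoal program: the contents of `ILLEGAL_PAIRS` are assumed (the compressed string is not unpacked here), and the `pending_coda` flag is an addition of this sketch so that a skipped `n` with no vowel to its left (as in a lone `n`) is rejected; the explanation above does not spell out how that case is handled.

```python
VOWELS = "aeiou"
CONSONANTS = "jklmnpstw"
# Assumed contents of the compressed pair string.
ILLEGAL_PAIRS = ("ji", "ti", "wu", "wo", "nn", "nm")


def is_word_reverse_scan(word: str) -> bool:
    """Walk the word backwards, alternating between expecting vowels and consonants."""
    if not word or any(pair in word for pair in ILLEGAL_PAIRS):
        return False
    expected = VOWELS        # the last letter should be a vowel...
    pending_coda = False     # ...unless it is a syllable-final n
    for ch in reversed(word):
        if ch in expected:
            # Flip between the two letter groups.
            expected = CONSONANTS if expected == VOWELS else VOWELS
            pending_coda = False
        elif ch == "n":
            # An n where a vowel was expected is a coda; a vowel must follow.
            pending_coda = True
        else:
            return False
    return not pending_coda
```

Ending the loop while consonants are expected is fine, since that corresponds to a vowel-initial word such as `awen`.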

# 05AB1E, 49 bytes

``````„nn„nm‚åà≠×ε.•2Ñ|qγù•žMâ¨¨D27SèKD'n««N>ãJ}˜D€¦«Iå
``````

Port of @JonathanAllan's Jelly answer, but even slower.. :/
Outputs `1`/`0` for accept/reject respectively.

Try it online.
As is, it's too slow for a test suite, but by adding `2äн` between the `×` and `ε` (to map over half the input length instead), we can verify all but the longest few truthy and falsey test cases, in separate test suites.

Explanation:

``````„nn„nm‚               # Push pair ["nn","nm"]
åà≠            # Check that NEITHER is present in the (implicit) input
×           # 'Multiply' it by the (implicit) input-string
# (the input if truthy; "" if falsey)
ε                     # Map over the characters:
.•2Ñ|qγù•            #  Push compressed string "jtklmnpsw"
žM          #  Push builtin vowels "aeiou"
â         #  Pop both, and create a list of all possible char-pairs
¨¨       #  Remove the last two ("wu" and "wo")
D      #  Duplicate the list
27S   #  Push pair [2,7]
è  #  Index those into the copy: ["ji","ti"]
K #  Remove those as well
D                    #  Duplicate the list again
'n«                '#  Append an "n" to each string
«                #  Merge the two lists together
N                    #  Push the 0-based map-index
>                   #  Increase it by 1 to make it 1-based
ã                  #  Cartesian product this index on the list of syllables
J                 #  Join each inner list together to a string
}˜                    # After the map: flatten the list of lists
D                   # Duplicate the list
€¦                 # Remove the first consonant from each
«                # Merge the two lists together
Iå                    # Check if the input-string is in this list
# (after which the result is output implicitly)
``````

See this 05AB1E tip of mine (section How to compress strings not part of the dictionary?) to understand why `.•2Ñ|qγù•` is `"jtklmnpsw"`.

# Pip, 56 53 47 bytes

-3 bytes by porting Neil's Retina answer

``````X<>"jiwuwotinnnm"NIa&a~=+:`^|[j-nptsw]`+XV.`n?`
``````

Returns 1 for a valid word, 0 for an invalid word. Attempt This Online!

Verify all test cases

### Explanation

At its core, this solution works similarly to Neil's Retina answer. It checks that:

• The input does not contain any of the illegal sequences `ji`, `wu`, `wo`, `ti`, `nn`, or `nm`; AND
• The input fully matches the regex `((^|[j-nptsw])[aeiou]n?)+`

First half:

``````X<>"jiwuwotinnnm"NIa
"jiwuwotinnnm"     That string
<>                   Grouped into pairs of characters
X                     Converted to a regex that matches any of those pairs
NI   Does not match in
a  The command-line argument
``````

Second half:

``````a~=+:`^|[j-nptsw]`+XV.`n?`
`^|[j-nptsw]`          That regex
+         Wrapped in a non-capturing group and followed by
XV       Built-in regex `[aeiou]`
.      Followed by
`n?`  That regex
+:                       Apply the + quantifier to the above wrapped in n.c. group
a~=                         Command-line argument fully matches that regex
``````

# Perl 5 `-p`, 64 bytes

``````$_=!/[jt]i|wu|wo|nm|nn/&&/^([aeiou]n?)?([mnptkswlj][aeiou]n?)*$/
``````

Try it online!

• -8 bytes: `$_=!/[jt]i|wu|wo|nm|nn/&&/^((^|[mnptkswlj])[aeiou]n?)*$/` - instead of checking for vowel+n for the first syllable, match start of string as the "consonant" for any syllable. Jan 14 at 23:31
• Use the `j-n` character range instead of listing out the consonants. Also, it might work to use `<` instead of `!` and `&&`.
– Neil
Jan 15 at 12:58
• Yes, it does work, and Ivan's of course needs a +, as in the Retina answer. 49 bytes: `/[jt]i|nm|nn|wu|wo/</^((^|[j-npstw])[aeiou]n?)+$/` Jan 17 at 21:15

# Python 3, 97 88 86 bytes

``````lambda x:re.sub("((?!ji|wu|wo|ti|.*n[nm])(^|[j-npstw])[aeiou]n?)*$","",x)>""
import re
``````

Try it online!

Returns `False` for a valid word and `True` for an invalid one.

Thanks to @l4m2 for -2 bytes

## How it works:

• At each syllable, we check for `ji|wu|wo|ti` and prevent any match if one is present. We also check for the presence of either `nn` or `nm` further along in the word.
• If none is present, we match the syllable (consonant + vowel (+ n)).
• All the matched syllables are replaced by the empty string.
• We then check whether the result is greater than the empty string (`True`, invalid word) or equal to it (`False`, valid word).
• 86
– l4m2
Jan 17 at 12:49
• took me a while to understand the modification ^^ Jan 18 at 8:52
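The lookahead approach of this answer is easier to follow when the pattern is laid out in verbose mode. The sketch below keeps the same regex and the same inverted output convention; the names `SYLLABLES_TO_END` and `is_invalid` are made up for illustration.

```python
import re

SYLLABLES_TO_END = re.compile(
    r"""
    (
        (?! ji | wu | wo | ti | .*n[nm] )  # no forbidden pair here, no nn/nm ahead
        ( ^ | [j-npstw] )                  # consonant, or start of string
        [aeiou] n?                         # vowel and optional n
    )*
    $                                      # the run of syllables must reach the end
    """,
    re.VERBOSE,
)


def is_invalid(word: str) -> bool:
    """True when something is left over after deleting the trailing syllable run."""
    return SYLLABLES_TO_END.sub("", word) > ""
```

A valid word is consumed entirely by one match starting at position 0, so nothing remains; an invalid word always leaves at least one character behind.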

# C (gcc), 438 bytes

``````#define R return
int c(l){char a[]={'n','m','p','t','k','s','w','l','j'};for(int i=0;i<9;i++)if(l==a[i])R 1;R 0;}
int v(l){R l==97||l==101||l==105||l==111||l==117?1:0;}
int f(char* s){int i,a,b;for(i=0;*s!=0;s++,i++){a =*s;b=*(s+1);if(!(c(a)||v(a))||((a=='j'||a=='t')&&b=='i'||a=='w'&&(b=='u'||b=='o')||a=='n'&&(b=='n'||b=='m'))||(c(a)&&c(b)&&a!='n')||(v(a)&&v(b))) R 0;}if(i==1&&c(*(s-1))) R 0;if(*s==0&&v(*(s-2))&&*(s-1)!='n') R 0;R 1;}
``````

Try it online!

Explanation:

``````#define R return
// function to detect a consonant
int c(l){char a[]={'n','m','p','t','k','s','w','l','j'};for(int i=0;i<9;i++)if(l==a[i])R 1;R 0;}
// function to detect a vowel
int v(l){R l==97||l==101||l==105||l==111||l==117?1:0;}

int f(char* s){int i,a,b;for(i=0;*s!=0;s++,i++)
{
a =*s;b=*(s+1);
if(!(c(a)||v(a))||      // reject characters that are neither consonant nor vowel
((a=='j'||a=='t')&&b=='i'||a=='w'&&(b=='u'||b=='o')||a=='n'&&(b=='n'||b=='m'))|| // reject the sequences ji, ti, wu, wo, nn and nm
(c(a)&&c(b)&&a!='n')||  // reject two consecutive consonants, unless the first is 'n'
(v(a)&&v(b)))           // reject two consecutive vowels
R 0;
}
if(i==1&&c(*(s-1))) R 0;    // reject a single-letter word that is a consonant
if(*s==0&&v(*(s-2))&&*(s-1)!='n') R 0;  // reject a word whose last letter follows a vowel but is not 'n'
R 1;
}
``````
• 283 bytes Jan 19 at 17:24

# Lexurgy, 195 bytes

Lexurgy is a tool made for conlangers for applying sound changes, so this is perfect for this challenge! (and here I am bashing it into code golf)

Outputs the original word if it's valid Toki Pona, and an empty string otherwise.

Extremely slow version:

``````Class c {m,n,p,t,k,s,w,l,j}
Class v {a,e,i,o,u}
a:
{({j,t} i),(w {o,u}),({m,n} {m,n}),!@c&!@v}=>`
{(!n&@c @c),(@v @v)}=>` *
!@v&!n=>`/_ $
n=>`/$ _ $
c propagate:
[]=>`/{` _,_ `}
d:
`=>*
``````

Much faster version, 199 bytes:

``````Class c {m,n,p,t,k,s,w,l,j}
Class v {a,e,i,o,u}
a:
{j,t} i=>`
w {o,u}=>`
{m,n} {m,n}=>`
!n&@c @c=>` *
@v @v=>` *
!@v&!n=>`/_ $
n=>`/$ _ $
!@c&!@v=>`
c propagate:
[]=>`/{` _,_ `}
d:
`=>*
``````

Ungolfed:

``````Class cons {m,n,p,t,k,s,w,l,j}
Class vow {a,e,i,o,u}

remove-forbidden:
{j,t} i => ` # ji, ti
w {o,u} => ` # wo, wu
{m,n} {m,n} => ` # mn, mm, etc
!n&@cons @cons => ` * # no consecutive consonants
@vow @vow => ` * # no consecutive vowels
!@vow&!n => ` / _ $ # ending with a vowel or n
n => ` / $ _ $ # nothing of length 1
Then:
!@cons&!@vow => ` # convert any invalid character
Then propagate:
[] => ` / {` _, _ `} # spread the invalid
Then:
` => * # delete the invalid
``````