Constant time and space for determining if an ASCII string contains unique characters
2021.09.14 - Jan Reggie Dela Cruz - ~3 Minutes
In this post I describe a constant time and space algorithm for determining if a string contains unique ASCII characters, a question which was given to me in an interview a while ago.
I am definitely not the first person to discover this algorithm, but I don’t see much discussion of it in the Internet, aside from this Computer Science Stack Exchagnge post and this Stack Overflow post .
Problem statement
Suppose you are given a string,
that contains only the seven bit ASCII characters (0
to 127
).
Write a function that takes in this string and returns a boolean
representing if this string contains unique characters.
// function allUnique(s string) bool { ... }
allUnique("abcd") // true
allUnique("bruhb") // false: b occurs twice
allUnique("Prepa") // true: P and p are different
allUnique("Prepare") // false: r and e occur twice
allUnique("") // true: no character occurs more than once
Solution
I’ll go straight to the code
func allUnique(str string) bool {
if len(str) <= 1 {
return true
}
if len(str) > 128 {
return false
}
var used [128]bool // array set to all false
for ii := range str {
if str[ii] >= 128 {
return false
}
if used[str[ii]] {
return false
}
used[str[ii]] = true
}
return true
}
That’s it. If you’re familiar with the question, this is the solution that you’re probably thinking of.
Okay, let’s explain it further.
Identifying if characters have been used
used
is a fixed-size array containing false
by default
that checks if a given character has already been used.
used[c]
will be set to true
for every c
(this case, str[ii]
) in the string:
var used [128]bool // array set to all false
for ii := range str {
if str[ii] >= 128 {
return false
}
if used[str[ii]] {
return false
}
used[str[ii]] = true
}
return true
The first if
block just checks if the character is “invalid”,
and if so, returns false
immediately.
The second if
block checks if the character has been “used”,
and if so, also returns false
.
Because we are using a fixed-size array here, this part of the code is constant space, but not necessarily “constant time”.
Initial if statements
These are the statements that guarantee that the problem is constant time as the length of the string increases:
if len(str) <= 1 {
return true
}
If a string contains 0 or 1 characters, it is guaranteed that there are no duplicates.
if len(str) > 128 {
return false
}
Since there are only 128 ASCII characters, if the string length goes greater than that, it is guaranteed that the character at the 128th index must have occurred at least once in the list of characters before it. (See pigeonhole principle .)
But is it really constant time?
One can argue that the problem is linear time
since the amount of iterations in the for
loop
grows proportionally to the length of the input.
However, because Big O notation is used
to “classify algorithms how time or space grows as input size grows”
,
the problem just becomes constant time after a certain threshold.
When I realized that this was constant time and space in an interview, both the interviewer and I were impressed. It was a galaxy brain moment.
The algorithm requires at most 128 iterations
in the for
loop provided,
and that worst case only happens
if all the characters are unique in the ASCII table.