Constant time and space for determining if an ASCII string contains unique characters

 2021.09.14 -  Jan Reggie Dela Cruz -  ~3 Minutes

In this post I describe a constant time and space algorithm for determining if a string contains unique ASCII characters, a question which was given to me in an interview a while ago.

I am definitely not the first person to discover this algorithm, but I don’t see much discussion of it in the Internet, aside from this Computer Science Stack Exchagnge post   and this Stack Overflow post   .

Problem statement

Suppose you are given a string, that contains only the seven bit ASCII characters (0 to 127). Write a function that takes in this string and returns a boolean representing if this string contains unique characters.

// function allUnique(s string) bool { ... }
allUnique("abcd")   // true
allUnique("bruhb")  // false: b occurs twice
allUnique("Prepa")  // true: P and p are different
allUnique("Prepare") // false: r and e occur twice
allUnique("")     // true: no character occurs more than once

Solution

I’ll go straight to the code

func allUnique(str string) bool {
  if len(str) <= 1 {
    return true
  }
  if len(str) > 128 {
    return false
  }

  var used [128]bool // array set to all false
  for ii := range str {
    if str[ii] >= 128 {
      return false
    }
    if used[str[ii]] {
      return false
    }
    used[str[ii]] = true
  }

  return true
}

That’s it. If you’re familiar with the question, this is the solution that you’re probably thinking of.

Okay, let’s explain it further.

Identifying if characters have been used

used is a fixed-size array containing false by default that checks if a given character has already been used. used[c] will be set to true for every c (this case, str[ii]) in the string:

var used [128]bool // array set to all false
for ii := range str {
  if str[ii] >= 128 {
    return false
  }
  if used[str[ii]] {
    return false
  }
  used[str[ii]] = true
}

return true

The first if block just checks if the character is “invalid”, and if so, returns false immediately.

The second if block checks if the character has been “used”, and if so, also returns false.

Because we are using a fixed-size array here, this part of the code is constant space, but not necessarily “constant time”.

Initial if statements

These are the statements that guarantee that the problem is constant time as the length of the string increases:

if len(str) <= 1 {
  return true
}

If a string contains 0 or 1 characters, it is guaranteed that there are no duplicates.

if len(str) > 128 {
  return false
}

Since there are only 128 ASCII characters, if the string length goes greater than that, it is guaranteed that the character at the 128th index must have occurred at least once in the list of characters before it. (See pigeonhole principle   .)

But is it really constant time?

One can argue that the problem is linear time since the amount of iterations in the for loop grows proportionally to the length of the input. However, because Big O notation is used to “classify algorithms how time or space grows as input size grows”   , the problem just becomes constant time after a certain threshold.

When I realized that this was constant time and space in an interview, both the interviewer and I were impressed. It was a galaxy brain moment.

The algorithm requires at most 128 iterations in the for loop provided, and that worst case only happens if all the characters are unique in the ASCII table.