Polyglot Dojo #4: Permutation
Challenge
In this post I'm going to talk about the permutation challenge. What we require to accomplish here is to compare two input strings and check if they have the same length with the same set of characters case sensitively.
Our main constraints are defined as:
Can we assume the string is ASCII?
Yes
Note: Unicode strings could require special handling depending on your language
Is whitespace important?
- Yes
Is this case sensitive? 'Nib', 'bin' is not a match?
- Yes
Can we use additional data structures?
- Yes
Can we assume this fits in memory?
- Yes
Python Solution
The unit tests provided for this challenge is as follows:
from nose.tools import assert_equal
class TestPermutation(object):
def test_permutation(self, func):
assert_equal(func(None, 'foo'), False)
assert_equal(func('', 'foo'), False)
assert_equal(func('Nib', 'bin'), False)
assert_equal(func('act', 'cat'), True)
assert_equal(func('a ct', 'ca t'), True)
assert_equal(func('dog', 'doggo'), False)
print('Success: test_permutation')
def main():
test = TestPermutation()
permutations = Permutations()
test.test_permutation(permutations.is_permutation)
try:
permutations_alt = PermutationsAlt()
test.test_permutation(permutations_alt.is_permutation)
except NameError:
# Alternate solutions are only defined
# in the solutions file
pass
if __name__ == '__main__':
main()
First I'm going to make sure the input arguments, have the valid type and the same length (passing the first two cases):
class Permutations(object):
def is_permutation(self, str1, str2):
if not (isinstance(str1, str) and isinstance(str2, str)):
return False
if not len(str1) == len(str2):
return False
Now that we know that our inputs are valid and have the same length, it's time to check the permutation. For that we should make sure that they have the same set characters (ignoring the order). I know that the set
data structure has some cool features for comparing two sets by each other, and reading its document I found symmetric_difference
function which sounds like a good match:
Return a new set with elements in either the set or other but not both.
So if the resulting value of this function applied on our input strings is empty, we can be sure they have the same set of characters:
str1 = set(str1)
str2 = set(str2)
return not str1.symmetric_difference(str2)
And it passes the test suite as you can confirm on replit:
$ open repl.it/@shahinism/Permutation-Python █
The time complexity of symmetric_difference
based on Python's time complexity is $O(len(s))$ where s is the left most set in average case. And in the worst case it's $O(s * t)$ where t is the right most set. So it should be tolerable for our use case (considering the normal length of words).
So far so good. Now let's look at the two interesting answers provided by the challenge repository. The first one is what I like the most:
class Permutations(object):
def is_permutation(self, str1, str2):
if str1 is None or str2 is None:
return False
return sorted(str1) == sorted(str2)
The value type validation is not what I like to use. Even in my own implementation I would like to raise a TypeError
instead of returning False
, however, I returned False
because it was expected in unit tests.
Let's quit ranting and look at the brilliant part of the solution sorted(str1) == sorted(str2)
. The return value of sorted
function is a sorted list of individual characters available in the string. The comparison between the lists returns True
only if they have the same length and each element should have the same index in both of the lists.
However the cleverness of this solution can become costly when the size of our strings is bigger (probably won't happen in this use-case). The reason for that is the builtin sorted
function in Python has the complexity of $O(n\ log\ n)$ for average case.
Now it's time for the second solution they have provided:
from collections import defaultdict
class PermutationsAlt(object):
def is_permutation(self, str1, str2):
if str1 is None or str2 is None:
return False
if len(str1) != len(str2):
return False
unique_counts1 = defaultdict(int)
unique_counts2 = defaultdict(int)
for char in str1:
unique_counts1[char] += 1
for char in str2:
unique_counts2[char] += 1
return unique_counts1 == unique_counts2
This is probably the most solid solution time wise (having time complexity of $O(n)$. How this solution works is based on the comparison of two dict
where the keys are unique characters and values are the number of repetitions. As a matter of fact this solution (comparing dict of repetition counter) is enough by itself and makes the length comparison a redundant part of the source code. The following version would work the same:
from collections import defaultdict
class PermutationsAlt(object):
def is_permutation(self, str1, str2):
if str1 is None or str2 is None:
return False
unique_counts1 = defaultdict(int)
unique_counts2 = defaultdict(int)
for char in str1:
unique_counts1[char] += 1
for char in str2:
unique_counts2[char] += 1
return unique_counts1 == unique_counts2
You can confirm it in my replit entry as well: