Polyglot Dojo #4: Permutation
In this post I'm going to talk about the permutation challenge. What we require to accomplish here is to compare two input strings and check if they have the same length with the same set of characters case sensitively.
Our main constraints are defined as:
Can we assume the string is ASCII?
Note: Unicode strings could require special handling depending on your language
Is whitespace important?
Is this case sensitive? 'Nib', 'bin' is not a match?
Can we use additional data structures?
Can we assume this fits in memory?
The unit tests provided for this challenge is as follows:
from nose.tools import assert_equal class TestPermutation(object): def test_permutation(self, func): assert_equal(func(None, 'foo'), False) assert_equal(func('', 'foo'), False) assert_equal(func('Nib', 'bin'), False) assert_equal(func('act', 'cat'), True) assert_equal(func('a ct', 'ca t'), True) assert_equal(func('dog', 'doggo'), False) print('Success: test_permutation') def main(): test = TestPermutation() permutations = Permutations() test.test_permutation(permutations.is_permutation) try: permutations_alt = PermutationsAlt() test.test_permutation(permutations_alt.is_permutation) except NameError: # Alternate solutions are only defined # in the solutions file pass if __name__ == '__main__': main()
First I'm going to make sure the input arguments, have the valid type and the same length (passing the first two cases):
class Permutations(object): def is_permutation(self, str1, str2): if not (isinstance(str1, str) and isinstance(str2, str)): return False if not len(str1) == len(str2): return False
Now that we know that our inputs are valid and have the same length, it's time to check the permutation. For that we should make sure that they have the same set characters (ignoring the order). I know that the
set data structure has some cool features for comparing two sets by each other, and reading its document I found
symmetric_difference function which sounds like a good match:
Return a new set with elements in either the set or other but not both.
So if the resulting value of this function applied on our input strings is empty, we can be sure they have the same set of characters:
str1 = set(str1) str2 = set(str2) return not str1.symmetric_difference(str2)
And it passes the test suite as you can confirm on replit:
The time complexity of
symmetric_difference based on Python's time complexity is $O(len(s))$ where s is the left most set in average case. And in the worst case it's $O(s * t)$ where t is the right most set. So it should be tolerable for our use case (considering the normal length of words).
So far so good. Now let's look at the two interesting answers provided by the challenge repository. The first one is what I like the most:
class Permutations(object): def is_permutation(self, str1, str2): if str1 is None or str2 is None: return False return sorted(str1) == sorted(str2)
The value type validation is not what I like to use. Even in my own implementation I would like to raise a
TypeError instead of returning
False, however, I returned
False because it was expected in unit tests.
Let's quit ranting and look at the brilliant part of the solution
sorted(str1) == sorted(str2). The return value of
sorted function is a sorted list of individual characters available in the string. The comparison between the lists returns
True only if they have the same length and each element should have the same index in both of the lists.
However the cleverness of this solution can become costly when the size of our strings is bigger (probably won't happen in this use-case). The reason for that is the builtin
sorted function in Python has the complexity of $O(n\ log\ n)$ for average case.
Now it's time for the second solution they have provided:
from collections import defaultdict class PermutationsAlt(object): def is_permutation(self, str1, str2): if str1 is None or str2 is None: return False if len(str1) != len(str2): return False unique_counts1 = defaultdict(int) unique_counts2 = defaultdict(int) for char in str1: unique_counts1[char] += 1 for char in str2: unique_counts2[char] += 1 return unique_counts1 == unique_counts2
This is probably the most solid solution time wise (having time complexity of $O(n)$. How this solution works is based on the comparison of two
dict where the keys are unique characters and values are the number of repetitions. As a matter of fact this solution (comparing dict of repetition counter) is enough by itself and makes the length comparison a redundant part of the source code. The following version would work the same:
from collections import defaultdict class PermutationsAlt(object): def is_permutation(self, str1, str2): if str1 is None or str2 is None: return False unique_counts1 = defaultdict(int) unique_counts2 = defaultdict(int) for char in str1: unique_counts1[char] += 1 for char in str2: unique_counts2[char] += 1 return unique_counts1 == unique_counts2
You can confirm it in my replit entry as well: