Polyglot Dojo #1: Compress String
Challenge
The first challenge we pick to solve is Compress Challange. The description of the challenge says:
Compress a string such that 'AAABCCDDDD' becomes 'A3BC2D4'. Only compress the string if it saves space
To solve this challenge, we must satisfy the following constraints:
We can assume the string is ASCII
The compression should be case sensitive
We can use additional data structures
We can assume the string would fit into the memory
They also provide the following test suite to check our solution:
from nose.tools import assert_equal
class TestCompress(object):
def test_compress(self, func):
assert_equal(func(None), None)
assert_equal(func(''), '')
assert_equal(func('AABBCC'), 'AABBCC')
assert_equal(func('AAABCCDDDDE'), 'A3BC2D4E')
assert_equal(func('BAAACCDDDD'), 'BA3C2D4')
assert_equal(func('AAABAACCDDDD'), 'A3BA2C2D4')
print('Success: test_compress')
def main():
test = TestCompress()
compress_string = CompressString()
test.test_compress(compress_string.compress)
if __name__ == '__main__':
main()
Python Solution
The high-level representation of the idea I can think of here is that we want to group continuous alike characters from a string, and then represent their group length as a number. Kind of like the following picture:
The first approach I tried here was using collection.Counter. Here is an example of how we can use it:
from collections import Counter
count = Counter("AABBCC")
print(count)
Counter({'A': 2, 'B': 2, 'C': 2})
Well, at first sight, it looks like a correct solution, and of course, it can satisfy all of the assertions from the suggested test suite except the last one. The problem with the last test is that the Counter
doesn't look for continuity of the characters, so when we pass "AABBAA"
to it, look what happens:
from collections import Counter
print(Counter("AABBAA"))
Counter({'A': 4, 'B': 2})
The Counter
counts a character all over the string, which is not what we want. Finding out that is not enough; I knew I need to try another approach. To extract these groups, we can iterate through the characters of the string, and check their identity with the previous one. However, it's a bit mechanical for my taste, so I preferred to search for groupby
keyword in Python's standard library. And fortunately, I got lucky and found a bit bizarre groupby
function.
However, before continuing that, I should add, one exciting thing I learned while working on these challenges is, usefullness of having a local copy of language's documentations. In Linux, I'm using Zeal. For Mac users, I know there is Dash which Zeal uses its document format. Also, for hardcore Emacs users, there is a dash.el which is quite impressive, but I haven't used to it yet.
Anyway, let's get back to our business. I said, groupby
was a little bit bizarre to me. Let's look at how it works:
from itertools import groupby
for key, group in groupby("AABBCC"):
print(f"{key}: {list(group)}")
A: ['A', 'A'] B: ['B', 'B'] C: ['C', 'C']
So it gives us what we want, what is bizarre about it? The function name says it is going to group by something. We haven't passed anything as the by part; I mean something like GROUP BY phrase in SQL. So, I took a second look at the documentation where it says:
The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged
The identity function. So it uses some function to compare each element's identity, which is what we want. What is the nature of that function? We will discover that in another blog post.
So after finding a single function, that provides us with all we need to satisfy the basic logic of the problem, we can form it as a solution. Here is mine:
$ open repl.it/@shahinism/Compress-String-Python โ
All I do here is to basically:
iterate through the result of
groupby
function, which provides me by a pair of key (the character) and a group (an iterable object of continuous repetition of the paired character).Then I turn that group into a
list
andcount()
the number of elements there.If the count is bigger than one, I append the count number paired with the character; otherwise, I append the character alone.
Then I checked the original answer provided by challenge collection. They have chosen the mechanical solution (which we will come back to with our Rust experiment), but I've learned about groupby
๐.
The time complexity of string append (as value += "new value"
is optimized in CPython and is equal to \(O(1)\). So this is safe to say the overall algorithm for our solution has the time complexity of \(O(N)\).
Clojure Solution
Since the interactive coding challenges repository only supports Python, we first need to port our test suite to Clojure. Here are the same constraints expressed in Clojure:
(deftest compress-test
(testing "empty, is empty"
(is (= (compress "") "")))
(testing "doesn't compress fine strings"
(is (= (compress "ABC") "ABC"))
(is (= (compress "AABBCC") "AABBCC")))
(testing "does compress effectively"
(is (= (compress "AAABCCDDDDE") "A3BC2D4E"))
(is (= (compress "BAAACCDDDD") "BA3C2D4"))
(is (= (compress "AAABAACCDDDD") "A3BA2C2D4"))))
The syntax here is quite straightforward. The lines including (is (=...
are defining the main constraints and (testing "..."
are grouping related tests with a readable description. The most exciting thing for me here is the readability and simplicity of the code.
Now, we are ready to tackle the real problem. Since we already used some functional techniques while we were trying to solve this problem in Python, wandering inside the real functional land of Lisp, I hoped the underlying logic won't change much.
To explore the standard library of Clojure, I used Clojure Docs website. The first search for group-by, I found a function, that is much different from what I had in mind. But I remembered, there was a function with a more reasonable name I learned about when I was solving Clojure Koans. The functions name was partition-by. It's described as:
Applies f to each value in coll, splitting it each time f returns a new value. Returns a lazy seq of partitions. Returns a stateful transducer when no collection is provided.
So, we need a function f, which would return a different type, when the input character changes. Do you remember how groupby()
function, was splitting the input string, to groups of the same continuous characters? Back there, we knew Python is using an "identity" function as the default value for the func
parameter.
Interestingly enough, I searched for the identity
in Clojure Docs and found it. So our grouping logic, in Clojure would be like this:
(partition-by identity "AABBCC")
Calling this would return a list of character groups like ((\A \A) (\B \B) (\C \C))
, which is quite like what we need. So I turned it into a function:
(defn group-chars
[string]
(partition-by identity string))
As you know, must of the functional languages, don't like the usual loops! Clojure is not different. However, they provide a more compelling and usually more comfortable to reason alternatives (if you understand how they work, of course). Since the result of our group-char
function, is an iterable list of items, and all we want to do with each item, is to encode them to a character + count
format, our best alternative for a loop would be a map function. The old map
we all know about from JavaScript's famous functional toolbox!
However, to use the map
, we need to have our encoding logic, implemented as a function. Since the logic is a bit bigger than what can fit inside an anonymous function, I'm going to define it as a separate function:
(defn create-part
[group]
(let [length (count group)
char (first group)]
(if (= length 1)
(str char)
(str char length))
))
All it does is:
Takes a
group
as the input parameter.Uses
count
to get the length of the group and stores it insidelength
.considers the
first
character as the group's identity and stores it inchar
.And finally, if the length of the group is 1, it just returns the
char
as a string, otherwise, returns the concatenation ofchar
andlength
as the result.
One main characteristic of this function, is that it assumes the group contains a unique set of characters. A better approach here would be to check for that explicitly, and probably clojure.spec would be helpful to implement it as design by contract.
Yet, I'm not familiar with the concept or how to handle exceptions safely in Clojure as of yet. Given our software is a small challenge solution, I'm going not to bother myself with this fact, and rely on our pipeline to handle that until I learn more about Clojure (even though it's not a good practice! ๐).
So, with that part in place, we can go forward, and implement our final part of the puzzle, which is, getting the input string, split it into groups, feed it to create the part, and concatenate the resulting map:
(defn compress
"Compress string."
[string]
(let [compressed (join "" (map create-part (group-chars string)))]
(if (= (count compressed) (count string))
string
compressed)))
As you see, I also used a condition to return the original string, if compressing it didn't reduce the size of the string. Putting it all together, we solve the challenge as you can confirm here:
$ open repl.it/@shahinism/Compress-String-Clojure โ
Rust Solution
Well, the most challenging part for me in this journey was solving it using Rust, and yet I'm not happy with the results. It's probably because it's the language I know far less about comparing to others. However, let's start to learn ๐. Again, let's port the tests:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_compress() {
assert_eq!(compress(""), "");
assert_eq!(compress("ABC"), "ABC");
assert_eq!(compress("AABBCC"), "AABBCC");
assert_eq!(compress("AAABCCDDDDE"), "A3BC2D4E");
assert_eq!(compress("BAAACCDDDD"), "BA3C2D4");
assert_eq!(compress("AAABAACCDDDD"), "A3BA2C2D4");
assert_eq!(compress(String::from("AAABAACCDDDD")), "A3BA2C2D4");
}
}
This is quite usual unit testing syntax, not much interesting. However, as you see, I've added an extra test, to make sure it can compress instances of String
object just like string literals.
And here is a working solution to the problem:
$ open repl.it/@shahinism/Compress-String-Rust โ
I'm going to describe this solution with a high-level perspective instead of focusing on the details. There are two reasons for this:
As I said, I'm not happy with my solution so far. My experience is quite small on Rust, and I believe all those
.to_string()
and.as_str()
should be eliminated somehow. So I plan to refactor this solution gradually, as I learn more about Rust's type system.Don't want my ego or perfectionism, get into the way of my blogging ๐ . As I said in the introduction of this series, I started it to help me learn more.
With that out of the way, let's see, what is my Rusty solution to this problem. It's quite close to the original Python answer provided in interactive challenges.
I have a function
create_part
, which handles the encoding of a character, based on the number of continuous repetitions.And the primary
compress
function which loops through all the characters of the string counting the number of their continuity and appends each part to aresult
string.Finally, I check for the length of the compressed string, and only provide the compressed version as a result, if it's reduced the size of the original string.
Conclusion
The idea of this series is so exciting to me. Well, I always wanted to improve my overall problem-solving skills, and so far, looks like this series is helping me in that regard. As you see, different paradigms of these languages, are forcing me to solve the problems with different approaches (top-down in Rust and bottom-up in Python and Clojure).
Also, the different level of abstractions provided by each language's standard library is helping me dive into different levels of problem's basics gradually.
I don't want to lie about it, I some times get mad when something doesn't make sense, yet going back and re-reading or repeating the process, is improving my overall confidence in coding. No need to say that it's also helping me to learn new abilities of a language I thought I already knew (Python) ๐.