YouFolio is an online portfolio (e-portfolio) company working with universities and students to showcase their skills and help them land their dream jobs by making their content search engine optimized and verifiable. As an early stage company, it has a strong focus on problem solving and therefore bases its interview questions on real-life YouFolio solutions. If an interviewee solves a problem more efficiently than YouFolio's current implementation, they are often invited to join the team.
YouFolio uses a lot of data streams and never wants to go above O(n)
time complexity if possible.
Iterators are a great way of doing that.
Because our data is very sparse, we had two options:
- Have an array with many empty values and take up more space than needed
- Use tree structures to efficiently access that data
Specifically, we can now insert items ad-hoc and they are automatically sorted because our tree structure can be sparse. For example, users sign up in any order, but we want to have them sorted by name without constantly re-making the index. Basically, what we do is take a node on the tree, and
- If the value is higher, it goes to the right
- If the value is lower, it goes to the left
- Every so often, rebalnace such that the median is at the top of the tree
This approach is even more applicable for portfolio content, in which the title often changes and we want to easily add and remove without re-working the index too much.
Now, the challenge is accessing this collection in O(n)
time complexity as if it were a list.
This is done using an iterator.
An iterator is an object that enables traversal of a collection, in order. Here is the API we currently use. This API is in Python because all the big tech companies conduct their interviews in Python.
class Node:
left: Node
right: Node
class Iterator:
"the iterator will iterate left, middle, right"
def __init__(self, node: Node):
"constructor to initiate an Iterator, usually taking the top of the tree"
def next(self) -> bool:
"moves to the next item in the iteration"
def exists(self) -> bool:
"checks if there is a value"
def get_value(self) -> int:
"gets the current value (an integer)
The goal is to implement this API in a manner that balances insert penalty and read speed. For any given tree following the rules described above, the iterator should list every element in order.
Now let's implement this API.
#!/usr/bin/python2.7
class Node:
"holds left and right values, forming a tree"
def __init__(self, data: int):
"constructor to initiate a node"
# store data
self.data = data
# store reference (next item)
self.left = None
self.right = None
return
def to_string(self) -> str:
"returns the node as a string"
s = ""
s += str(self.data)
if self.left != None or self.right != None:
s += " ["
if self.left != None:
s += self.left.to_string()
else:
s += "null"
s += " "
if self.right != None:
s += self.right.to_string()
else:
s += "null"
s += "]"
return s
# Define an enum
LEFT = 0
MIDDLE = 1
RIGHT = 2
class Iterator:
"the iterator will iterate left, middle, right"
def __init__(self, node: Node):
"constructor to initiate an Iterator"
self.node = node
self.current = None
self.checking = LEFT
self.next_called = False
def next(self) -> bool:
"moves to the next item in the iteration"
self.next_called = True
if self.checking == LEFT:
if self.node.left == None:
self.checking = MIDDLE
return True
else:
if self.current == None:
self.current = Iterator(self.node.left)
return True
else:
if self.current.next():
return True
else:
self.current = None
self.checking = MIDDLE
return True
if self.checking == MIDDLE:
self.checking = RIGHT
if self.node.right == None:
return False
self.current = Iterator(self.node.right)
return True
# LEFT and MIDDLE have been checked, we're on the RIGHT now
if self.current == None:
return False
# if we can go next, return true
if self.current.next():
return True
# otherwise, we're done here
self.current = None
return False
def exists(self) -> bool:
"checks if there is a value"
if self.next_called == False:
self.next()
if self.checking == RIGHT or self.checking == LEFT:
if self.current == None:
return False
return True
def get_value(self) -> int:
"gets the value (an integer)"
if self.next_called == False:
self.next()
if self.checking == RIGHT or self.checking == LEFT:
if self.current != None:
return self.current.get_value()
return
return self.node.data
Time to invoke the code...
# make the tree
c = Node(4)
c.left = Node(2)
c.left.left = Node(1)
c.left.right = Node(3)
c.right = Node(6)
c.right.left = Node(5)
c.right.right = Node(7)
# make an iterator
i = Iterator(c)
while i.exists():
print(i.get_value())
i.next()
print("The value is ", c.to_string())
Now let's see the output...
1 2 3 4 5 6 7 The value is 4 [2 [1 3] 6 [5 7]]
Success!
This traverses the collection in O(n)
complexity and O(n)
space complexity (one node for each element).
This is about as fast as you can get for a collection, but it is ordered and the insert penalty is only as bad as the depth of the tree.
In other words, this scales horizontally well.
Want to try for a more efficient algorithm? Here's a link to get started.