The first article in this series dealt with avoiding excessive use of mocks when testing code that performs I/O.
This article deals with service functions that make use of dependency inversion.
Mocks should be absent from these tests altogether. This is because any code that performs I/O should be hidden behind an interface, so tests can use lightweight implementations of these interfaces.
The code
Code likely speaks louder than prose here:
def ingest_listings(listings: typing.Iterable[tuple[int, str]],
                    ingest_callback: typing.Callable[[data.Profile], None],
                    ingested: protocols.SetStore):
    for identifier, listing in listings:
        if identifier in ingested:
            logger.info(f"Listing {identifier} has already been ingested. "
                        "Exiting...")
            break
        profile = scraping.profile_from_listing_html(listing)
        ingest_callback(profile)
        ingested.add(identifier)
Notice that the types of all three parameters are interfaces: an iterable, a callable, and a simple protocol I wrote called SetStore.
The function isn’t very complex, but it does have iteration, a conditional and a break to determine control flow. This is worth testing.
Because it’s doing web scraping, I/O is unavoidable and it would be very annoying to mock this I/O when testing such simple logic.
Are the type annotations necessary?
The tests
I wrote three tests for this function, but the one below should be enough to illustrate the point.
def test_ingest_skips_ingested(mocker,
                               listing_one_page,
                               listing_one_identifier,
                               listing_one_profile,
                               listing_two_page,
                               listing_two_identifier,
                               listing_two_profile):
    mocker.patch("scraping.BASE_LISTING_URL",
                 "https://www.rm.co.uk/properties")
    ingested = {listing_two_identifier}
    results = []
    listings = [(listing_one_identifier, listing_one_page),
                (listing_two_identifier, listing_two_page)]
    service.ingest_listings(ingested=ingested,
                            listings=listings,
                            ingest_callback=results.append)
    assert results == [listing_one_profile]
    assert ingested == {listing_one_identifier, listing_two_identifier}
Python built-in sets satisfy the SetStore protocol. As a result, the fact that one listing has already been ingested can be described using one short assignment statement with a set literal. As I happen to be using Redis as a data store, it would be possible to use something like FakeRedis rather than mocking. However, a built-in type is a lot more convenient.
Similarly, the listings iterable, which involves generators and lots of I/O in practice, can be tested using a built-in list.
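A small sketch of why a list can stand in for the production generator: the function only needs something it can loop over. The names `fake_fetch_listings` and `take_until_seen` are hypothetical stand-ins rather than the project's real code, and the "fetch" is simulated by appending to a list instead of making a request.

```python
import typing


def fake_fetch_listings(
    identifiers: typing.Iterable[int],
    fetched: list[int],
) -> typing.Iterator[tuple[int, str]]:
    """Hypothetical generator standing in for one that does I/O per item."""
    for identifier in identifiers:
        # In production this is where a network request would happen.
        fetched.append(identifier)
        yield identifier, f"<html>listing {identifier}</html>"


def take_until_seen(
    listings: typing.Iterable[tuple[int, str]],
    seen: typing.Container[int],
) -> list[int]:
    """Collect identifiers until one that has already been seen turns up."""
    taken = []
    for identifier, _page in listings:
        if identifier in seen:
            break
        taken.append(identifier)
    return taken


# The same function accepts a lazy generator...
fetched: list[int] = []
from_generator = take_until_seen(fake_fetch_listings([1, 2, 3, 4], fetched), seen={3})

# ...or a plain list, which is all a test needs.
from_list = take_until_seen([(1, "a"), (2, "b"), (3, "c"), (4, "d")], seen={3})
```

Because of the break, the generator is never asked for listing 4, so with a real generator the remaining pages would not be requested either.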
The fact that the append method of the results list satisfies the callable interface allows a delightful level of laziness. There’s no need to worry about why it works because the meaning is straightforward and intuitive.
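To make that concrete, here is a short sketch of a list's append method being passed around like any other one-argument callable. The variable names are illustrative only, and the comparison with `functools.partial` is an approximation of the behaviour, not a description of how Python implements it.

```python
import functools

results = []

# Looking up `append` on the instance gives a callable with `results` attached.
callback = results.append
callback("first")

# Roughly equivalent: partially apply the plain function taken from the class.
also_callback = functools.partial(list.append, results)
also_callback("second")

# The instance the method is bound to is available as `__self__`.
bound_to = callback.__self__
```

Both callables end up appending to the same list, which is why a test can hand `results.append` to the function under test and inspect `results` afterwards.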
“I’ll humour you, Simon, why does it work?”
results.append is a bound method. When you access a function object that is an attribute of a class from an instance of that class, Python wraps the function in another callable with the instance as its first argument, effectively a partial function. See descriptors.
SetStore protocol
Since this protocol isn’t built-in, I thought it was worth sharing.
class SetStore(typing.Protocol):
    """Simple set-like data store
    supporting adding members and membership checks.
    """

    def __contains__(self, item: typing.Any):
        """Does the store contain an item."""

    def add(self, item: typing.Any):
        """Add an item to the store"""
It’s a subset of the interface of a built-in set, allowing the addition of items and the use of the in keyword via the __contains__ method.
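One way to see the structural typing at work is to mark a copy of the protocol as runtime-checkable. This is a sketch only: the `@typing.runtime_checkable` decorator and the name `CheckableSetStore` are not part of the original code, and the runtime check only looks for the presence of the methods, not their signatures.

```python
import typing


@typing.runtime_checkable
class CheckableSetStore(typing.Protocol):
    """Copy of SetStore, made checkable with isinstance() for illustration."""

    def __contains__(self, item: typing.Any) -> bool:
        """Does the store contain an item?"""
        ...

    def add(self, item: typing.Any) -> None:
        """Add an item to the store."""
        ...


set_is_store = isinstance(set(), CheckableSetStore)   # has __contains__ and add
list_is_store = isinstance([], CheckableSetStore)     # has __contains__ but no add
dict_is_store = isinstance({}, CheckableSetStore)     # has __contains__ but no add
```

Only the set passes the check. A static type checker reaches the same conclusion without the decorator, which is why a plain set literal works in the test.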
The Redis hashtable implementation is trivial.
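For readers wondering what such an adapter might look like, here is a minimal sketch. It is an assumption, not the implementation used in this project: only the constructor call `RedisSetStore(redis_client, namespace=...)` appears in this article, and storing each item as a field of a single Redis hash via redis-py's `hset` and `hexists` is a guess based on the word "hashtable".

```python
import typing


class RedisSetStore:
    """Hypothetical set-like store keeping members as fields of one Redis hash."""

    def __init__(self, client: typing.Any, namespace: str):
        # `client` is expected to be a redis.Redis instance, or anything
        # offering the same hset/hexists methods.
        self._client = client
        self._namespace = namespace

    def __contains__(self, item: typing.Any) -> bool:
        # HEXISTS reports whether the field is present in the hash.
        return bool(self._client.hexists(self._namespace, str(item)))

    def add(self, item: typing.Any) -> None:
        # The stored value is irrelevant; only the field's presence matters.
        self._client.hset(self._namespace, str(item), 1)
```

Since the class offers `__contains__` and `add`, it satisfies SetStore in the same structural way a built-in set does.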
The main function
It may be helpful to see the ingest_listings function in context.
@click.command()
@click.argument("url")
def main(url):
    runner_url = os.getenv("RUNNER_URL")

    def _send_to_runner(profile: data.Profile):
        profile_data = dataclasses.asdict(profile)
        payload = json.dumps(profile_data)
        resp = requests.post(runner_url, payload)
        resp.raise_for_status()

    redis_host = os.getenv("REDIS_HOST")
    redis_pass = os.getenv("REDIS_PASS")
    redis_client = redis.Redis(host=redis_host, password=redis_pass)
    ingested = adapters.RedisSetStore(redis_client, namespace="ingested-rm")
    listings_gen = scraping.iter_listings(url)
    service.ingest_listings(listings_gen,
                            ingest_callback=_send_to_runner,
                            ingested=ingested)
I haven’t written automated tests for the main function, which some may consider heresy. However, hear me out!
For one thing, there is no branching logic to test. The entire thing, including the in-line function, is flat. It’s just one statement after another.
Testing this would involve a lot of mocking and add very little value.
Why? Because the mocks will give no guarantees about whether we’re using the API of the requests and redis-py libraries correctly, let alone whether we’re respecting the contract of the runner service.
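A brief sketch of that lack of guarantees, using the standard library's `unittest.mock`. The URL and the `no_such_option` keyword are made up for illustration, and `fake_requests` is a hypothetical stand-in for what a test of main might patch in.

```python
from unittest import mock

# A stand-in for the `requests` module, as a test of main() might patch it in.
fake_requests = mock.MagicMock()

# A misspelled function name and a made-up keyword argument: the real
# library would reject both, but the mock accepts them without complaint.
fake_requests.psot("https://runner.invalid/ingest", "{}")
fake_requests.post("https://runner.invalid/ingest", "{}", no_such_option=True)

# Assertions against the mock only confirm that the code called the mock.
fake_requests.post.assert_called_once()
typo_went_unnoticed = fake_requests.psot.called
```

A test built on this would pass even though neither call would work against the real library, and it says nothing at all about what the runner service expects to receive.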
Conclusion
The first in this series shared the joy of testing functions with no side effects or I/O, and the value of mocking sparingly. This one covered dependency inversion as another useful way of avoiding dealing with I/O and side effects.
I’m sure I overused mocks and monkey-patching in Python when I first discovered them. In my early days as a programmer, my code was strewn with side effects and I/O throughout all “layers” of the application (assuming there were even discernible layers). Pure functions (or pure-ish ones, this is Python) and dependency inversion are two tools I wish I had in my belt as a junior.