Python Testing Part 2: Dependency Inversion with Service Functions

The first article in this series dealt with avoiding excessive use of mocks when testing code that performs I/O.

This article deals with service functions that make use of dependency inversion.

Mocks should be absent from these tests altogether. This is because any code that performs I/O should be hidden behind an interface, so tests can use lightweight implementations of these interfaces.

The code

Code likely speaks louder than prose here:

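import logging
import typing

import data
import protocols
import scraping

logger = logging.getLogger(__name__)


def ingest_listings(listings: typing.Iterable[tuple[int, str]],
                    ingest_callback: typing.Callable[[data.Profile], None],
                    ingested: protocols.SetStore) -> None:
    # Process listings in order until reaching one that an earlier run
    # already ingested.
    for identifier, listing in listings:
        if identifier in ingested:
            logger.info(f"Listing {identifier} has already been ingested. "
                        "Exiting...")
            break

        # Parse the listing page into a Profile, hand it off, then record it.
        profile = scraping.profile_from_listing_html(listing)

        ingest_callback(profile)
        ingested.add(identifier)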

Notice that the types of all three parameters are interfaces: an iterable, a callable, and a simple protocol I wrote called SetStore.

The function isn’t very complex, but it does have iteration, a conditional and a break to determine control flow. This is worth testing.
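The sketch below runs that control flow on its own. It mirrors ingest_listings but replaces scraping.profile_from_listing_html with a hypothetical `parse` parameter, so it runs without the project's modules. `ingest_listings_sketch`, `fake_listings` and the dict-based `Profile` are illustrative assumptions. Feeding the function a generator also shows a side effect of the `break`. Listings after the first already-ingested one are never pulled from the iterable. If the real listings source is a lazy generator, those pages would never be fetched at all.

```python
import collections.abc
import typing

Profile = dict  # hypothetical stand-in for data.Profile


def ingest_listings_sketch(
    listings: typing.Iterable[tuple[int, str]],
    ingest_callback: typing.Callable[[Profile], None],
    ingested: collections.abc.MutableSet[int],  # a set offers add and `in`, like SetStore
    parse: typing.Callable[[str], Profile],  # hypothetical stand-in for the scraper
) -> None:
    for identifier, listing in listings:
        if identifier in ingested:
            # Assumption: listings arrive newest first, so the first
            # already-ingested listing means the rest are old too.
            break
        ingest_callback(parse(listing))
        ingested.add(identifier)


pulled: list[int] = []  # records which listings the loop actually requested


def fake_listings() -> typing.Iterator[tuple[int, str]]:
    # A generator standing in for a lazy, I/O-backed listings source.
    for identifier in (1, 2, 3):
        pulled.append(identifier)
        yield identifier, f"<html>listing {identifier}</html>"


results: list[Profile] = []
ingested = {2}
ingest_listings_sketch(fake_listings(), results.append, ingested,
                       parse=lambda html: {"html": html})
print(results)   # [{'html': '<html>listing 1</html>'}]
print(pulled)    # [1, 2]: listing 3 was never requested
```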

Because the application does web scraping, I/O is unavoidable, and mocking that I/O just to test such simple logic would be very annoying.

Are the type annotations necessary?
They make the code clearer but aren’t strictly necessary for this pattern in Python; Python is nice like that. With that said, my favourite thing about the Go language is how it does duck typing with a static type system. This allows code like the above, where the caller doesn’t need to explicitly implement the required interfaces; it only needs to pass in something with the right method signatures. Go works this out at compile time!

The tests

I wrote three tests for this function, but the one below should be enough to illustrate the point.

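import service


# The listing_* fixtures (pages, identifiers and expected profiles) are
# assumed to live in conftest.py; the mocker fixture comes from pytest-mock.
def test_ingest_skips_ingested(mocker,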
                               listing_one_page,
                               listing_one_identifier,
                               listing_one_profile,
                               listing_two_page,
                               listing_two_identifier,
                               listing_two_profile):

    mocker.patch("scraping.BASE_LISTING_URL",
                 "https://www.rm.co.uk/properties")
    ingested = {listing_two_identifier}
    results = []

    listings = [(listing_one_identifier, listing_one_page),
                (listing_two_identifier, listing_two_page)]

    service.ingest_listings(ingested=ingested,
                            listings=listings,
                            ingest_callback=results.append)

    assert results == [listing_one_profile]
    assert ingested == {listing_one_identifier, listing_two_identifier}

Python’s built-in sets satisfy the SetStore protocol. As a result, the fact that one listing has already been ingested can be described with a single short assignment using a set literal. Because I happen to use Redis as a data store, I could use something like FakeRedis rather than mocking. However, a built-in type is far more convenient.

Similarly, the listings iterable, which in practice involves generators and lots of I/O, can be tested using a built-in list.

The append method of the results list satisfies the callable interface, which allows a delightful level of laziness. There’s no need to worry about why it works, because the meaning is straightforward and intuitive.

“I’ll humour you, Simon, why does it work?”
Internally, results.append is a bound method. When you access a function that is an attribute of a class through an instance of that class, Python wraps the function in another callable with the instance bound as its first argument. The result is effectively a partial function. This happens through the descriptor protocol: functions define __get__, and attribute lookup calls it. See descriptors.
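The following sketch makes that binding visible. `Recorder` is a hypothetical class used only for illustration. Accessing `recorder.record` gives the same thing as calling the function’s `__get__` by hand. It behaves like a `functools.partial` that pre-fills the instance.

```python
import functools


class Recorder:
    def __init__(self) -> None:
        self.items: list[str] = []

    def record(self, item: str) -> None:
        self.items.append(item)


recorder = Recorder()

# Attribute access through the instance invokes the function's __get__,
# producing a bound method that remembers the instance.
bound = recorder.record
explicit = Recorder.record.__get__(recorder, Recorder)
as_partial = functools.partial(Recorder.record, recorder)

print(bound.__self__ is recorder)         # True
print(bound.__func__ is Recorder.record)  # True
print(bound == explicit)                  # True

# All three behave the same when called with just the item.
for callback in (bound, explicit, as_partial):
    callback("x")
print(recorder.items)  # ['x', 'x', 'x']

# Built-in list methods are bound too, which is why results.append can be
# passed around as a one-argument callback.
results: list[int] = []
print(results.append.__self__ is results)  # True
```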

SetStore protocol

Since this protocol isn’t built-in, I thought it was worth sharing.

import typing


class SetStore(typing.Protocol):
    """Simple set-like data store
    supporting adding members and membership checks.
    """

    def __contains__(self, item: typing.Any) -> bool:
        """Return whether the store contains an item."""

    def add(self, item: typing.Any) -> None:
        """Add an item to the store."""

It’s a subset of the interface of a built-in set. It allows adding items with add, and membership checks with the in operator via the __contains__ method.
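The SetStore above isn’t decorated with @typing.runtime_checkable. The variant below adds that decorator, purely as an assumption for demonstration, so that isinstance() can show which built-ins match the protocol structurally.

```python
import typing


@typing.runtime_checkable  # added for this demo; the original protocol lacks it
class SetStore(typing.Protocol):
    """Simple set-like data store supporting adding members and membership checks."""

    def __contains__(self, item: typing.Any) -> bool: ...

    def add(self, item: typing.Any) -> None: ...


# isinstance() against a runtime-checkable protocol only checks that the
# methods exist, not their signatures or behaviour.
print(isinstance(set(), SetStore))        # True: has __contains__ and add
print(isinstance(frozenset(), SetStore))  # False: immutable, so no add
print(isinstance([], SetStore))           # False: lists have append, not add
```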

The Redis hashtable implementation is trivial.
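adapters.RedisSetStore isn’t shown, so the following is only a guess at what a hash-based version might look like. It assumes redis-py’s `hset` and `hexists` commands and stores each member as a field of one hash named after `namespace`. `HashClient` and `FakeHashClient` are hypothetical. The fake is an in-memory stand-in so the sketch runs without a Redis server. In production you would pass a `redis.Redis(...)` client instead.

```python
import typing


class HashClient(typing.Protocol):
    # The two client methods this adapter relies on (names as in redis-py).
    def hset(self, name: str, key: str, value: str) -> typing.Any: ...

    def hexists(self, name: str, key: str) -> typing.Any: ...


class RedisSetStore:
    """SetStore backed by a single Redis hash, one field per member (a sketch)."""

    def __init__(self, client: HashClient, namespace: str) -> None:
        self._client = client
        self._namespace = namespace

    def __contains__(self, item: typing.Any) -> bool:
        return bool(self._client.hexists(self._namespace, str(item)))

    def add(self, item: typing.Any) -> None:
        # The field value is irrelevant; only the field's presence matters.
        self._client.hset(self._namespace, str(item), "1")


class FakeHashClient:
    """Hypothetical in-memory stand-in for a Redis client."""

    def __init__(self) -> None:
        self.hashes: dict[str, dict[str, str]] = {}

    def hset(self, name: str, key: str, value: str) -> int:
        fields = self.hashes.setdefault(name, {})
        is_new = key not in fields
        fields[key] = value
        return int(is_new)  # Redis returns the number of fields added

    def hexists(self, name: str, key: str) -> bool:
        return key in self.hashes.get(name, {})


store = RedisSetStore(FakeHashClient(), namespace="ingested-rm")
store.add(42)
print(42 in store)  # True
print(7 in store)   # False
```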

The main function

It may be helpful to see the ingest_listings function in context.

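import dataclasses
import json
import os

import click
import redis
import requests

import adapters
import data
import scraping
import service


@click.command()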
@click.argument("url")
def main(url):
    runner_url = os.getenv("RUNNER_URL")

    def _send_to_runner(profile: data.Profile):
        profile_data = dataclasses.asdict(profile)
        payload = json.dumps(profile_data)
        resp = requests.post(runner_url, payload)
        resp.raise_for_status()

    redis_host = os.getenv("REDIS_HOST")
    redis_pass = os.getenv("REDIS_PASS")

    redis_client = redis.Redis(host=redis_host, password=redis_pass)
    ingested = adapters.RedisSetStore(redis_client, namespace="ingested-rm")

    listings_gen = scraping.iter_listings(url)

    service.ingest_listings(listings_gen,
                            ingest_callback=_send_to_runner,
                            ingested=ingested)


if __name__ == "__main__":
    main()

I haven’t written automated tests for the main function, which some may consider heresy. However, hear me out!

For one thing, there is no branching logic to test. The entire thing, including the nested function, is flat. It’s just one statement after another.

Testing this would involve a lot of mocking and add very little value.

Why? Because the mocks will give no guarantees about whether we’re using the API of the requests and redis-py libraries correctly, let alone whether we’re respecting the contract of the runner service.
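The point is easy to demonstrate with the standard library’s unittest.mock. `send_profile` is a hypothetical stand-in for an HTTP call and is not part of the project. A bare `Mock` happily records a misspelled method and an invented keyword argument. `create_autospec` does catch signature mistakes. However, no mock can say whether the runner service would accept the payload.

```python
from unittest import mock


def send_profile(url: str, payload: str) -> None:
    """Hypothetical function standing in for an HTTP call."""
    raise NotImplementedError("would perform real I/O")


# A bare Mock accepts any attribute and any arguments, so a typo in the
# method name and a bogus keyword argument still "pass".
fake_requests = mock.Mock()
fake_requests.psot("https://runner.example", "{}", retries=3)
fake_requests.psot.assert_called_once_with("https://runner.example", "{}", retries=3)

# create_autospec at least checks calls against the real signature...
checked = mock.create_autospec(send_profile)
caught_type_error = False
try:
    checked("https://runner.example", "{}", retries=3)
except TypeError:
    caught_type_error = True

# ...but it still cannot tell whether the remote service accepts the payload.
checked("https://runner.example", "not even JSON")
checked.assert_called_once_with("https://runner.example", "not even JSON")
```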

Conclusion

The first article in this series shared the joy of testing functions with no side effects or I/O, and the value of mocking sparingly. This one covered dependency inversion as another useful way of keeping I/O and side effects out of the logic under test.

I’m sure I overused mocks and monkey-patching in Python when I first discovered them. In my early days as a programmer, my code was strewn with side effects and I/O throughout all “layers” of the application (assuming there were even discernible layers). Pure (well, pure-ish; this is Python) functions and dependency inversion are two tools I wish I’d had in my belt as a junior.