Django and Semgrep: Enforcing a Service Layer Using Static Analysis

In my previous post about implementing a service layer in Django, I wrote about a simple pattern that "plays nice" with the mountain of functionality that comes with Django out-of-the-box, particularly the ORM.

In this implementation, business logic is grouped into modules containing functions. Although logic can be grouped based on whatever makes sense in terms of encapsulating functionality and providing useful APIs to the rest of a codebase, this article is only concerned with service modules that map onto Django data models.

This article shows a rough-and-ready example of how to enforce the simplest Service Layer pattern using Semgrep rules that can be checked in CI pipelines. By automatically enforcing adherence to a basic high-level pattern, static analysis can take some of the burden away from code reviewers and free them to focus on business logic rather than patterns.

Enforcing the Rules

The rules map a particular Django ORM model to a service module and ensure that all, or a subset, of the manager methods for that model are only called within that module.

The simplest way to enforce the service layer pattern is to prevent developers from calling any ORM methods from outside of the service module. The script below builds a YAML list of Semgrep rules dynamically by appending the output of a Jinja template once per model.


# This is expected to be run from repo root

echo "rules:" > $service_rules_filename
jinja -D model_name User \
      -D service_file_path \
      -D model_class_path apps.core.models.User \
      $template_path >> $service_rules_filename
jinja -D model_name Campaign \
      -D service_file_path apps/advertising/service/ \
      -D model_class_path apps.advertising.models.Campaign \
      $template_path >> $service_rules_filename

semgrep --error -f $service_rules_filename touchsurgery
# Capture the exit code before cleaning up so it can be returned below
semgrep_exit_code=$?

rm $service_rules_filename

exit $semgrep_exit_code

The resulting rules ensure that ORM calls that access the database via apps.advertising.models.Campaign can only be made from the apps.advertising.service.campaign module, and likewise for the User model.

Here is the template used to generate rules for each data model/service module pair.

  - id: {{ model_name|lower }}-service-strict
    languages:
      - python
    message: |
      Call methods on the {{ model_name }} model's manager(s) in the appropriate service module:
      {{ service_file_path }}
    pattern-either:
      - pattern: {{ model_class_path }}(...)
      - pattern: {{ model_class_path }}.$MANAGER.$METHOD(...)
    severity: ERROR
    paths:
      exclude:
        - {{ service_file_path }}
        - test*.py
        - tests/*.py

As you can see, it's not particularly complicated. The trade-off is having to wrap even safe SELECT queries in service functions; after a while this can become laborious.
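
For anyone who would rather not shell out to the jinja CLI, the same generation step can be sketched in plain Python. This is only an illustration under a few assumptions: it swaps the Jinja template for str.format, the rule layout mirrors the template above, and the model/service pair (app.models.SomeModel and app/service/some_model.py) is hypothetical.

```python
from textwrap import indent

# Stand-in for the Jinja template above, rendered with str.format instead.
RULE_TEMPLATE = """\
- id: {rule_id}
  languages:
    - python
  message: |
    Call methods on the {model_name} model's manager(s) in the appropriate service module:
    {service_file_path}
  pattern-either:
    - pattern: {model_class_path}(...)
    - pattern: {model_class_path}.$MANAGER.$METHOD(...)
  severity: ERROR
  paths:
    exclude:
      - {service_file_path}
      - test*.py
      - tests/*.py
"""


def render_rule(model_name, service_file_path, model_class_path):
    """Render one strict rule for a single model/service module pair."""
    return RULE_TEMPLATE.format(
        rule_id=f"{model_name.lower()}-service-strict",
        model_name=model_name,
        service_file_path=service_file_path,
        model_class_path=model_class_path,
    )


def render_rules_file(pairs):
    """Join the rendered rules under a top-level 'rules:' key."""
    body = "".join(render_rule(*pair) for pair in pairs)
    return "rules:\n" + indent(body, "  ")


# Hypothetical pair, for illustration only.
PAIRS = [
    ("SomeModel", "app/service/some_model.py", "app.models.SomeModel"),
]

rules_yaml = render_rules_file(PAIRS)
print(rules_yaml)
```

The printed text could be written to a file and passed to semgrep with -f, exactly as the shell script does.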

Testing the Rules

It's good to have some confidence that static analysis works as intended. Semgrep has us covered here, allowing us to run rules against test files that are annotated with the names of rules.
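
As a rough illustration of the annotation convention, the sketch below reads a test file and lists where findings are expected. It is not how Semgrep itself is implemented; it only assumes that a "# ruleid: <rule-id>" comment means the rule must match on the following line and that "# ok: <rule-id>" means it must not. The function name and sample source are made up for this example.

```python
import re

# Matches "# ruleid: some-rule" and "# ok: some-rule" comments.
ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*(\S+)")


def expected_findings(source):
    """Return (rule_id, line_number) pairs where a finding is expected."""
    expected = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match and match.group(1) == "ruleid":
            # The annotation refers to the line directly after the comment.
            expected.append((match.group(2), lineno + 1))
    return expected


SAMPLE = """\
# ruleid: somemodel-service-strict
result = SomeModel.objects.get(pk=1)

# ok: somemodel-service-strict
total = len(some_list)
"""

findings = expected_findings(SAMPLE)
print(findings)
```

Only the annotated ORM call on line 2 of the sample is listed; the line marked "ok" is expected to stay clean.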


# This is expected to be run from platform repo root

mkdir $test_dir
cp semgrep/python/django_service_pattern_strict/ $rules_test_filename
echo "rules:" > $rules_filename
jinja -D model_name SomeModel \
      -D service_file_path /dev/null \
      -D model_class_path app.models.SomeModel \
      $template_path >> $rules_filename

semgrep --quiet --test $test_dir

rm -r $test_dir

Here is a test file with various code snippets that will trigger the strict Semgrep rule.

import random

from django.db import transaction
from django.db.models import F, Q

from app.models import SomeModel

# ruleid: somemodel-service-strict
instance = SomeModel(foo="bar")

# ruleid: somemodel-service-strict
result = SomeModel.objects.get(pk=1)

# ruleid: somemodel-service-strict
results = SomeModel.objects.filter(foo="bar")

# ruleid: somemodel-service-strict
values = SomeModel.objects.filter(foo="bar").values_list("baz", flat=True)

# ruleid: somemodel-service-strict
obj_one, obj_two = SomeModel.objects.bulk_create(
    # ruleid: somemodel-service-strict
    (SomeModel(foo="bar"), SomeModel(foo="baz"))
)
obj_one.foo = obj_two.foo = "flob"

# ruleid: somemodel-service-strict
SomeModel.some_custom_manager.bulk_update([obj_one, obj_two], ["foo"])

# ruleid: somemodel-service-strict
obj, _ = SomeModel.objects.get_or_create(foo="wibble")

# ruleid: somemodel-service-strict
qs = SomeModel.objects.select_for_update().filter(foo="bar")
with transaction.atomic():
    for i, obj in enumerate(qs):
        obj.foo = f"bar_{i:03d}"

# ruleid: somemodel-service-strict
SomeModel.objects.update_or_create(id=2, foo="bar")

# ruleid: somemodel-service-strict
unpersisted_obj = SomeModel()
unpersisted_obj.foo = "fuzzle"

# ruleid: somemodel-service-strict
doomed_qs = SomeModel.objects.all()
mercy = bool(random.getrandbits(1))
if mercy:
    doomed_qs = doomed_qs.annotate(odd=F("id") % 2).filter(odd=False)

# ruleid: somemodel-service-strict
doomed_obj = SomeModel.objects.filter(
    Q(foo__contains="z") | Q(foo__contains="a")
)

# ruleid: somemodel-service-strict
obj, _ = SomeModel.objects.all()[:2]

A Less Restrictive Approach

There is an alternative approach, which only prevents developers from calling ORM methods that modify database state outside of the service module. This less strict version doesn’t force developers to wrap safe ORM calls that just generate SELECT queries.

I am not including an example rule because of the verbosity needed to cover a subset of Django QuerySet and Manager methods, as well as the chaining of these methods through the ORM's fluent interface. Not to mention the myriad ways in which Model instances can be instantiated, mutated and persisted.
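
To give a flavour of that verbosity, here is a minimal sketch that generates only the most direct patterns: write methods called straight on a manager. The method list is my own assumption rather than a complete inventory, and the helper names are hypothetical. It deliberately ignores chained calls such as filter(...).update(...) and .all().delete(), as well as instance-level save() and delete(), which is where most of the real work lies.

```python
# Write methods that can be called directly on a Django manager.
# delete() is left out because it is only available on a QuerySet,
# so it is always reached through a chained call.
MUTATING_METHODS = (
    "create",
    "bulk_create",
    "bulk_update",
    "get_or_create",
    "update_or_create",
    "update",
)


def mutating_patterns(model_class_path):
    """Build one Semgrep pattern per direct manager write method."""
    return [
        f"{model_class_path}.$MANAGER.{method}(...)"
        for method in MUTATING_METHODS
    ]


def render_pattern_either(model_class_path, indent="    "):
    """Render the patterns as a 'pattern-either' YAML fragment."""
    lines = [f"{indent}pattern-either:"]
    lines += [
        f"{indent}  - pattern: {pattern}"
        for pattern in mutating_patterns(model_class_path)
    ]
    return "\n".join(lines)


# Hypothetical model path, for illustration only.
print(render_pattern_either("app.models.SomeModel"))
```

The fragment would take the place of the two catch-all patterns in the strict template, leaving read-only calls such as get() and filter() unflagged.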


We tend to use static analysis to help us get the details right, to enforce best practices, and to spot security risks and code smells; it's of particular use when working with a dynamic language like Python. Hopefully this post goes some way towards showing how static analysis can also be used to enforce specific high-level patterns within a codebase.

The example code is little more than Bash glue tying several CLIs together. It's not yet clear whether it would be worth writing a purpose-built abstraction layer on top of Semgrep. In the meantime, I encourage you to get creative and try using it to enforce patterns and conventions within your codebases.