A front graphic showing tips about the combination of FastAPI and SQLAlchemy.

10 Tips for adding SQLAlchemy to FastAPI

Introducing a relational database such as PostgreSQL or MySQL to FastAPI the proper way as a data scientist.

Technologies
FastAPI
SQLAlchemy
Alembic

Introduction

The popularity of FastAPI has been steadily growing over the last few years. It is slowly climbing the list of the top 25 web frameworks and technologies in the Stack Overflow Developer Survey 2022, sitting right behind Flask and Django (but beating both in Loved vs. Dreaded). In an age where data and AI are claiming a bigger part of any business, it is no surprise there is a lot of interest in this lightweight, performant API framework for Python. And you only need a few lines of code to get started:

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def root():
    return "hello"

FastAPI’s usage among data scientists and ML engineers is growing as we speak. The framework being so minimalistic might tempt you to believe there is not a lot of room for error. But complexity is incremental: eventually these APIs will have to be updated, extended and maintained. You would be better off having read The Art of Computer Programming than The Art of Statistics. Just thinking about the poor excuse for an ETL pipeline I hacked together years ago using Pandas makes me shiver to this day.

Therefore, we want to share some of our learnings from maturing FastAPI apps, in particular: adding a relational database. Whether you use FastAPI in a large web app or a small ML inference service, chances are you introduce one at some point. You might want to track metrics, or add some users. The most popular Python package to use for adding a relational database is SQLAlchemy. It provides a core framework for communicating with a database, as well as an ORM. Everyone working with a relational database in Python has worked with - or at least considered - SQLAlchemy. In this blog post, we share 10 tips for integrating SQLAlchemy with your FastAPI application. The code samples can be seen in action in our FastAPI Template Project.

FastAPI and DDD

When you decide to add a relational database, it is important to integrate it properly. In this section, we introduce a few practices and abstractions we have found to be helpful for both maintaining and testing the code.

1. Don’t put BaseModels in your endpoints

SQLAlchemy ORM uses what is called the data mapper pattern. The data mapper pattern is great for its flexibility and attempts to bridge the object–relational impedance mismatch. But there are disadvantages to using these ORM models throughout your application, particularly in your API. What if you want to keep the endpoints simple for users, but store objects across multiple tables, or hide certain fields from them? The code could easily end up bending over backwards to accommodate awkward logic, making your API hard to use. You should consider evolving your API and database schema separately.

class TodoInDB(SQL_BASE):  # type: ignore
    __tablename__ = "todo"

    id = Column(Integer, primary_key=True, autoincrement=True)
    key = Column(String(length=128), nullable=False, unique=True)
    value = Column(String(length=128), nullable=False)
    done = Column(Boolean, default=False)


class Todo(BaseModel):
    key: str
    value: str
    done: bool = False

In particular, you should not try to create a class that is both a Pydantic BaseModel and a declarative base from SQLAlchemy. Pydantic is mainly an input validation library. Above we see an example where a Pydantic BaseModel is created for the API alongside a separate SQLAlchemy model. All you need to do is translate these objects into one another. (The next tip tells you where to do that.)
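The translation itself is just a couple of constructor calls. Below is a dependency-free sketch with plain dataclasses standing in for the Pydantic and SQLAlchemy models above; to_api is a hypothetical helper name, not part of the project:

```python
from dataclasses import dataclass


@dataclass
class TodoInDB:  # stand-in for the SQLAlchemy model
    id: int
    key: str
    value: str
    done: bool


@dataclass
class Todo:  # stand-in for the Pydantic BaseModel
    key: str
    value: str
    done: bool = False


def to_api(instance: TodoInDB) -> Todo:
    # The database id is deliberately not exposed through the API model.
    return Todo(key=instance.key, value=instance.value, done=instance.done)
```

Keeping this mapping in one place means neither model leaks into the other's layer, and fields like the primary key stay hidden from API users.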

2. Use the Repositories Pattern

Being able to swap components in your software tells you that it is modular. And although you are probably not switching databases anytime soon, being able to swap real persistence for in-process memory allows you both to postpone the decision to introduce new entities (or even a whole database) and to write unit tests more easily. It requires creating an interface around the storage layer and its querying logic. Enter the Repository Pattern.

class TodoRepository:
    def save(self, todo: Todo):
        raise NotImplementedError()

    def get_by_key(self, key: str) -> Optional[Todo]:
        raise NotImplementedError()

class SQLTodoRepository(TodoRepository):
    def __init__(self, session: Session):
        self._session: Session = session

    def save(self, todo: Todo):
        self._session.add(TodoInDB(key=todo.key, value=todo.value, done=todo.done))

    def get_by_key(self, key: str) -> Optional[Todo]:
        instance = self._session.query(TodoInDB).filter(TodoInDB.key == key).first()

        if instance:
            return Todo(key=instance.key, value=instance.value, done=instance.done)

The Repository Pattern can be found in books such as Domain Driven Design. For our purposes, repositories are intermediaries responsible for storing the entities in your domain model. A basic repository typically has methods such as save(), get_by_id() or get_all_enabled(). The signatures of these methods consist of domain models and query parameters such as filters. They do not expose how they persist entities, only which entities to store and receive using domain models.

The interface of a repository can be implemented by a SQLAlchemy implementation and a basic in-memory implementation. As mentioned before, the latter can be used for unit tests.
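Such an in-memory implementation needs little more than a dict. Here is a sketch; Todo is stubbed as a dataclass so the example stands alone, and in the real project the class would subclass the TodoRepository interface above:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Todo:  # stub for the domain model
    key: str
    value: str
    done: bool = False


class InMemoryTodoRepository:
    """Test double exposing the same interface as SQLTodoRepository."""

    def __init__(self) -> None:
        self._todos: Dict[str, Todo] = {}

    def save(self, todo: Todo) -> None:
        self._todos[todo.key] = todo

    def get_by_key(self, key: str) -> Optional[Todo]:
        return self._todos.get(key)
```

Because it honours the same contract, any business logic written against TodoRepository can run against this test double in unit tests without a database.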

3. Create Query Objects

One pattern I personally found helpful is the use of Query Objects. Since you often want to query your database with several filters, collecting them in a Query Object allows you to add or remove filter parameters, change default filter values and even reuse filters without changing a Repository’s interface. Another benefit is that it reduces the number of arguments you have to keep track of when fetching entities.

class TodoFilter(BaseModel):
    limit: int = 1
    key_contains: Optional[str] = None
    value_contains: Optional[str] = None
    done: Optional[bool] = None

class SQLTodoRepository:
    ...

    def get(self, todo_filter: TodoFilter) -> List[Todo]:
        query = self._session.query(TodoInDB)

        if todo_filter.key_contains is not None:
            query = query.filter(TodoInDB.key.contains(todo_filter.key_contains))

        if todo_filter.value_contains is not None:
            query = query.filter(TodoInDB.value.contains(todo_filter.value_contains))

        if todo_filter.done is not None:
            query = query.filter(TodoInDB.done == todo_filter.done)

        query = query.limit(todo_filter.limit)

        return [Todo(key=todo.key, value=todo.value, done=todo.done) for todo in query]

We have all written functions with too many arguments at some point; a code-smell suggesting you might want to introduce a new object. In the case of repositories, it’s often a Query Object you need.
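To illustrate that the Query Object itself is independent of SQLAlchemy, here is a sketch of the same filtering semantics in plain Python, as an in-memory repository might implement it (Todo and TodoFilter are stubbed as dataclasses so the example stands alone):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Todo:
    key: str
    value: str
    done: bool = False


@dataclass
class TodoFilter:
    limit: int = 1
    key_contains: Optional[str] = None
    value_contains: Optional[str] = None
    done: Optional[bool] = None


def filter_todos(todos: List[Todo], f: TodoFilter) -> List[Todo]:
    # Apply each filter only when it is set, mirroring the SQL implementation.
    matches = [
        t
        for t in todos
        if (f.key_contains is None or f.key_contains in t.key)
        and (f.value_contains is None or f.value_contains in t.value)
        and (f.done is None or t.done == f.done)
    ]
    return matches[: f.limit]
```

Adding a new filter field later changes only TodoFilter and the implementations, never the repository's get() signature.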

Managing SQLAlchemy scopes

Both FastAPI and SQLAlchemy have their own documentation on how to add a database: the former using Depends, the latter covering connections, sessions, transactions, etc. But the FastAPI docs offer a rather succinct setup, while the SQLAlchemy docs are a very detailed explanation of how to work with the library in all its glory. Here, we show a complete yet simple setup that should do the trick for most APIs. It leverages SQLAlchemy’s different scopes without coupling implementation details to your domain.

4. Re-use database connections

As every documentation page will tell you, a big anti-pattern is not re-using database connections. They are expensive to set up, so reconnecting for every transaction introduces significant overhead. The FastAPI documentation shows you how you could approach this. One way to deal with this is caching the function that initialises the engine (the object managing the connection pool).

@lru_cache(maxsize=None)
def get_engine():
    return create_engine(os.getenv("DB_STRING"), pool_pre_ping=True)

This way, you re-use the engine between requests and only pay for the connection setup once.
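The effect of @lru_cache is easy to demonstrate with a stand-alone sketch, using a counter and a dummy object in place of a real SQLAlchemy engine:

```python
from functools import lru_cache

engines_created = 0


@lru_cache(maxsize=None)
def get_engine():
    # Stand-in for create_engine(...): count how often we actually build one.
    global engines_created
    engines_created += 1
    return object()


first = get_engine()
second = get_engine()
# first is second: the "engine" (and in reality, its connection pool) is
# built once and re-used on every subsequent call.
```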

5. One session per request

The second layer of bookkeeping is managing sessions. Sessions, unlike connections, are cheap to create. They form a layer between your application and the database that handles transactions, identity-mapped objects, etc. SQLAlchemy has its own documentation on sessions, where it defines the transaction scope and the session scope. In short, for APIs it is good practice to create one session per request.

To manage a session’s life cycle, we can leverage FastAPI’s Depends by creating a generator.

def create_todo_repository() -> Iterator[TodoRepository]:
    session = sessionmaker(bind=get_engine())()
    todo_repository = SQLTodoRepository(session)

    try:
        yield todo_repository
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

The FastAPI endpoint to create a Todo would then simply read:

@app.post("/{key}")
def create(key: str, value: str, todo_repository: TodoRepository = Depends(create_todo_repository)):
    ...

Every time the endpoint is called, a repository is instantiated with a new session that is automatically closed after the request ends. If an exception occurs, we roll back the session first.
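What FastAPI does with such a generator dependency can be sketched in plain Python: advance it to the yield, hand the yielded value to the endpoint, then finish the generator (or throw into it) once the request ends. FakeSession below is a stand-in for SQLAlchemy's Session, so the sketch runs without a database:

```python
class FakeSession:
    """Stand-in for a SQLAlchemy Session that records what happened to it."""

    def __init__(self) -> None:
        self.closed = False
        self.rolled_back = False

    def rollback(self) -> None:
        self.rolled_back = True

    def close(self) -> None:
        self.closed = True


def create_repository(session: FakeSession):
    # Same shape as create_todo_repository; the session plays the repository here.
    try:
        yield session
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


# Simulate a successful request:
session = FakeSession()
gen = create_repository(session)
repo = next(gen)  # dependency resolved; the endpoint runs with `repo`
gen.close()       # request finished; the generator's finally closes the session
```

After gen.close(), session.closed is True while session.rolled_back stays False; throwing an exception into the generator instead triggers the rollback before the close.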

6. Transactions using context managers

The last thing to manage is individual transactions. Sometimes one transaction per request offers enough granularity. In that case, committing the session right before closing it in the create_todo_repository function does the trick. When you want to commit transactions in the service layer instead, repositories need a way to delimit transactions without introducing a leaky abstraction (by exposing the session). Luckily, context managers are a great solution for this.

To make a context manager out of a repository, you need to implement an __enter__() and an __exit__() method. We can commit the session when exiting the context. __exit__() even allows you to handle any exceptions raised in the current context, so you can safely roll back any pending transactions before exiting.

class SQLTodoRepository:
    def __init__(self, session):
        self._session: Session = session

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        if exc_type is not None:
            self._session.rollback()
            return

        try:
            self._session.commit()
        except DatabaseError:
            self._session.rollback()
            raise

    ...

7. Add Alembic

You are most likely going to change your database schema at some point. Alembic is the standard schema migration solution for SQLAlchemy (both were developed by the same author!). It can auto-generate migration scripts by comparing your SQLAlchemy models against the current database schema (alembic revision --autogenerate), after which alembic upgrade head brings the database up to date.

8. Test your repositories

The great thing about repositories is that they are easy to test. You should check that they save and return the right entities, perform the right filtering, and reject invalid values. Testing your repositories well gives you a lot of comfort when writing more involved business logic, since you know you are at least using the right entities. If you have in-memory implementations (and perhaps contract tests for your repositories as well), you can even swap the SQL implementations for these lightweight test doubles to unit test business logic.

While we are at it, we might as well test our migrations by using Alembic in our test setup. This ensures you do not forget to generate and commit your migrations, since your tests will fail without them.

@pytest.fixture
def todo_repository():
    time.sleep(1)  # give the database container a moment to accept connections
    alembic_args = ["--raiseerr", "upgrade", "head"]
    alembic.config.main(argv=alembic_args)

    engine = get_engine()
    session = sessionmaker(bind=engine)()

    yield SQLTodoRepository(session)

    session.close()

    sessionmaker(bind=engine, autocommit=True)().execute(
        ";".join([f"TRUNCATE TABLE {t} CASCADE" for t in SQL_BASE.metadata.tables.keys()])
    )

@pytest.mark.integration
def test_repository(todo_repository: SQLTodoRepository):
    with todo_repository as r:
        r.save(Todo(key="testkey", value="testvalue"))

    todo = r.get_by_key("testkey")
    assert todo.value == "testvalue"

    with pytest.raises(IntegrityError):
        with todo_repository as r:
            r.save(Todo(key="testkey", value="not allowed: unique todo keys!"))

    with pytest.raises(DataError):
        with todo_repository as r:
            r.save(Todo(key="too long", value=129 * "x"))

@pytest.mark.integration
def test_repository_filter(todo_repository: SQLTodoRepository):
    with todo_repository as r:
        r.save(Todo(key="testkey", value="testvalue"))
        r.save(Todo(key="abcde", value="v"))

    todos = r.get(TodoFilter(key_contains="test"))
    assert len(todos) == 1
    assert todos[0].value == "testvalue"

    todos = r.get(TodoFilter(key_contains="abcde"))
    assert len(todos) == 1
    assert todos[0].value == "v"

    assert len(r.get(TodoFilter(key_contains="e"))) == 2
    assert len(r.get(TodoFilter(key_contains="e", limit=1))) == 1
    assert len(r.get(TodoFilter(value_contains="v"))) == 2
    assert len(r.get(TodoFilter(done=True))) == 0

When to add a database to FastAPI

Before diving into SQLAlchemy, here are two tips to help you make the right considerations if you have not yet bitten the bullet on introducing a database.

9. Make sure your FastAPI app needs a relational database

Everything is a trade-off. And although a big part of the web runs on relational databases, PostgreSQL is not the answer to everything. Despite their versatility, relational databases can be a mismatch for APIs, especially at an early stage. Does your inference endpoint only need a trained model whose version updates once in a while? Then it is probably enough to add the version number to the model’s filename. (Python wheels are one of many examples of how data can be managed largely by a naming convention.) Want to cache some calculations to speed up response times? Check out the functools caching decorators before adding Redis. Even when adding authentication to your API, if it is only for your launch customer, you could just verify their hashed credentials directly in your environment.

On the other hand, your API might have performance requirements that cannot be met by a relational database. Relational databases are designed and optimized for OLTP purposes. For OLAP use cases, where your API serves complex analytics dashboards on large amounts of data, there may be more suitable database technologies (check out Google BigQuery, Amazon Redshift, HBase… or even DuckDB!). The expected number of reads and writes, the most common query patterns and ACID requirements are all factors to consider. The data domain has developed vastly, and if we have learned anything, it is that there is no one-size-fits-all.

10. Spend some time on your data model

If you have postponed adding a relational database long enough, you should take another critical look at your data model. Of course, if you manage the data yourself and it is not mission critical, you can get away with almost anything. But when you put your app in a production environment and start storing other people’s data, migrations of a quickly evolving database schema can become a real pain. Classics include creating a nullable column that should have been mandatory, or decreasing VARCHAR lengths. And how normalized should your schema be? Do you create that extra table and sacrifice some query performance because of the extra joins, or do you accept the duplication?

Architecture is what’s hard to change, and your database schema is part of it. Getting an outside view can be very helpful as well!

Conclusion

SQLAlchemy and FastAPI are both great Python packages that get you up and running with minimal setup. In this post we listed several aspects to take into account as you add a relational database to your API and your project matures. But as mentioned above, everything is a trade-off. You might find turning on Pydantic’s ORM mode a lot easier to work with, and never need more flexibility in your API or database mapping. Maybe you want to leave out the context manager and just set autocommit=True when creating a new session. In the end, you know the requirements of your application best.

Having said that, hopefully these tips help you build out a great data application, or at least give you food for thought regarding its design and setup. If you want to see all the code in this post in action, check out our FastAPI Template Project on GitHub and read our post about it. If you think we missed important tips or you disagree with the ones listed, feel free to reach out! We are always happy to discuss and improve on ideas about Python APIs and data.

About the author

Donny Peeters

Donny is a mathematician who started freelancing as a software/data engineer before co-founding BiteStreams. He can quickly spot design quirks and adapt systems to business needs. In his free time he likes to work out, read, and grab a drink with friends.

About us

Do you need a professional data API for the right business insights? Contact us now

Get more data-driven with BiteStreams, and leave the competition behind.

Contact us