Ingesting third-party data into Apache Doris with Meltano

A common requirement for a data warehouse is to store data from third-party sources like SaaS tools: for example, the results of surveys conducted by Product or Marketing teams, or the status of JIRA tickets. While most of these tools provide an option to export data, it is often very limited; they may only allow a one-off download as a CSV file from their dashboard. Some of them provide APIs to export data, which is a lot more convenient. Of those that provide a direct integration with a warehouse, it is usually with commercial ones like Snowflake. If we’re using a different warehouse, it is going to take some engineering effort to periodically fetch this data.

In this post we’ll look at how we can streamline this ingestion. As always, we’ll use open-source technologies to do this. Like in my previous post, we’ll use Doris as our data warehouse. We’ll use the open-source tool Meltano to fetch data from a Github repository. While this is not going to fully replace the need to write custom code, it will help us benefit from the collection of open-source connectors that come with Meltano.

Setting things up

Like in my previous post, my setup consists of Docker containers for running Doris locally. There’s a Postgres container that runs as a part of this setup. We’ll install Meltano, and ingest data into a Postgres database. We’ll then create a catalog in Doris that points to the data ingested in Postgres. While it may be possible to ingest the data directly into Doris by letting Meltano write it to S3, I find it much easier to write it to a database.

Getting started

For the purpose of this post we’ll be modifying the example on Meltano’s website to ingest data from Github to Postgres. Meltano is available as a pip package, and I already have a Conda environment in which it is installed. To keep the post brief, I’ll gloss over the steps that show how to create a Meltano project and jump right into the ones where we add the Github extractor, the Postgres loader, and configure them.

Let’s start by adding the loader, and the extractor plugins.

meltano add extractor tap-github
meltano add loader target-postgres

We’ll then configure them one-by-one, starting with the Github plugin. We’ll add the repository we’d like to fetch data from, and the auth token that will be used to authenticate with Github.

meltano config tap-github set repositories '["thescalaguy/blog"]'
meltano config tap-github set auth_token '...'

Next we’ll configure the Postgres plugin. We’ll ingest the data into a schema called “github” in the “ingest” database.

meltano config target-postgres set user postgres
meltano config target-postgres set password my-secret-pw
meltano config target-postgres set database ingest
meltano config target-postgres set port 6432
meltano config target-postgres set host 127.0.0.1
meltano config target-postgres set default_target_schema github

Finally, we run the pipeline.

meltano run tap-github target-postgres

A successful run of the pipeline will create a whole bunch of tables in Postgres. One such table is “repositories”, and we’ll query it next.

SELECT id, repo, org, name FROM repositories;

The query above gives us the following row.

| id       | repo | org         | name |
|----------|------|-------------|------|
| 95359149 | blog | thescalaguy | blog |

We can now set the pipeline to run on a schedule by slightly modifying the example given in Meltano’s “Getting Started” guide.

meltano schedule add github-to-postgres --extractor tap-github --loader target-postgres --interval @daily

We’ll now create a catalog in Doris that points to the database in Postgres. To do this we need to execute the CREATE CATALOG command. Notice that we’ve specified the driver_url that points to Maven. This is a lot easier than manually downloading the JAR, and making it available in the Docker container.

CREATE CATALOG meltano PROPERTIES (
    "type" = "jdbc",
    "user" = "postgres",
    "password" = "my-secret-pw",
    "jdbc_url" = "jdbc:postgresql://192.168.0.107:6432/ingest",
    "driver_url" = "https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.1/postgresql-42.7.1.jar",
    "driver_class" = "org.postgresql.Driver"
);

We have one last step before we can query the data, and that is to fix a few symlinks in the containers. It looks like the code looks for CA certificates at a location different from where they actually are. From what I could glean from the documentation, perhaps the base image has changed and the paths need to be updated. In any case, here’s how to fix it in both the FE and BE containers by docker exec-ing into them.

root@be:/# mkdir -p /etc/pki/tls/certs/
root@be:/# ln -s /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt

Now we’ll run a SELECT query. Notice how we’re writing the FROM clause; it is catalog.schema.table.

SELECT id, repo, org, name 
FROM meltano.github.repositories;

There are, however, limitations to what we can read from the Postgres table. Some of the column types are not supported. Changing the query to SELECT * will raise an error saying that the “topics” column is not supported. We can describe the table in the catalog to check which of the columns cannot be read.

DESC meltano.github.repositories;

The result shows “UNSUPPORTED_TYPE” next to the “topics” column. This is perhaps because Doris uses the MySQL protocol and there’s no equivalent mapping between Postgres and MySQL for that type. One way to mitigate this is to create a new column using Meltano’s stream_maps, and parse it in Doris.

That’s it. That’s how we can ingest data from third-party systems into Doris.

Setting up a SQL IDE with Apache Superset

I have a few requirements when choosing a SQL IDE. I’d like to use something that is relatively modern, has good support for a wide variety of databases, and allows making charts and graphs. I looked at Metabase, TablePlus, DBeaver, DataGrip, and Apache Superset. Each of these tools has a feature or two that is missing in the others. For example, DataGrip allows creating entity relationship diagrams that make it easy to understand how the tables are related to each other; many of the other tools do not have this feature. The free version of TablePlus allows creating only one set of charts; Metabase and Superset allow creating many. For my day-to-day requirements, as you may have guessed from the title of this post, Apache Superset is the best fit. This post shows how to quickly set up Superset using Docker Compose.

Why Superset

One of the things I often need to do is monitor the performance of a system. For example, building upon the previous post where I talked about a hypothetical notification delivery system, counting how many notifications are sent out per day, both as an aggregate metric across channels and per channel. This can be tracked by generating StatsD metrics and charting them in Grafana. However, chances are that we’d need to display them to the customer, and also to other non-technical stakeholders. To do this we’d need our systems to generate records in the database, and have these be stored in a data warehouse that we can eventually query. This is where Superset is useful; it has a built-in SQL IDE that allows exploratory data analysis, and the ability to create charts once we’ve finalised the SQL query. The query can then be made part of the application serving analytics, or of the BI tool used by the stakeholders.

Another reason is that it is an open-source, actively-developed project. This means bug fixes and improvements will be shipped at a decent cadence; we have a tool that will stay modern as long as we keep updating the local setup.

Finally, because it is an open-source project, it has good documentation, and a helpful community. Both of these make using a technology a pleasant experience.

Setting up Superset

Superset supports a wide variety of databases. However, we need to install drivers within the Docker containers to support additional databases. For the sake of this example, we’ll install the Snowflake driver. To set up Superset locally we need to clone the git repo. Let’s start by doing that.

git clone https://github.com/apache/superset.git

Next we navigate to the repo and add the Snowflake driver.

cd superset
echo "snowflake-sqlalchemy" >> ./docker/requirements-local.txt
echo "cryptography==39.0.1" >> ./docker/requirements-local.txt

I had to add the cryptography package manually because the first time I set up Superset, the logs showed that database migrations did not run because it was missing.

Next we check out the latest stable version of the repo. As of writing, it is 3.0.1.

git checkout 3.0.1

Now we bring up the containers.

TAG=3.0.1 docker-compose -f docker-compose-non-dev.yml up -d

This will start all the containers we need for version 3.0.1 of Superset. Note that if we add newer drivers, we’ll need to rebuild the images. Navigate to the Superset login page and use admin as both the username and password. We can now follow the documentation on how to set up a data source, and use the SQL IDE to write queries.

That’s it. We’ve set up Superset.

Creating facades in Python

One of the GoF design patterns is the facade. It lets us create a simple interface that hides the underlying complexity. For example, we can have a facade which lets the client book meetings on various calendars like Google, Outlook, Calendly, etc. The client specifies details about the meeting such as the title, description, etc. along with which calendar to use. The facade then executes appropriate logic to book the meeting, without the client having to deal with the low-level details.

This post talks about how we can create a facade in Python. We’ll first take a look at singledispatch to see how we can call different functions depending on the type of the argument. We’ll then build upon this to create a function which dispatches based on the value instead of the type. We’ll use the example given above to create a function which dispatches to the right function based on what calendar the client would like to use.

Single Dispatch

The official documentation defines single dispatch to be a function where the implementation is chosen based on the type of a single argument. This means we can have one function which handles integers, another which handles strings, and so on. Such functions are created using the singledispatch decorator from the functools package. Here’s a trivial example which prints the type of the argument handled by the function.

import functools


@functools.singledispatch
def echo(x):
    ...


@echo.register
def echo_int(x: int):
    print(f"{x} is an int")


@echo.register
def echo_str(x: str):
    print(f"{x} is a str")


if __name__ == "__main__":
    echo(5)
    echo("5")

We start by decorating the echo function with singledispatch. This is the function we will pass our arguments to. We then create echo_int and echo_str, which are the different implementations that handle the various types of arguments. These are registered using the echo.register decorator.

When we run the example, we get the following output. As expected, the function to execute is chosen based on the type of the argument. Calling the function with a type which is not handled results in a no-op since we’ve set the body of echo to an ellipsis.

5 is an int
5 is a str

When looking at the source code of singledispatch, we find that it maintains a dictionary which maps the type of the argument to its corresponding function. In the following sections, we’ll look at how we can dispatch based on the value of the argument.
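
If it helps to see this in action, functions decorated with singledispatch expose a registry mapping and a dispatch() method we can inspect; a minimal sketch, reusing the echo functions from the example above.

# The types registered so far; `object` maps to the base `echo` implementation.
print(echo.registry.keys())

# `dispatch` returns the implementation that would run for a given argument type.
print(echo.dispatch(int))    # echo_int
print(echo.dispatch(float))  # no specific handler, falls back to the base `echo`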

Example

Let’s say we’re writing a library that lets the users book meetings on a calendar of their choosing. We expose a book_meeting function. The argument to this function is an instance of the Meeting data class which contains information about the meeting, and the calendar on which it should be booked.

Code

Model

We’ll start by adding an enum which represents the calendars that we support.

import enum


class Calendar(str, enum.Enum):
    GOOGLE = "google"
    OUTLOOK = "outlook"

Next we’ll add a dataclass which represents the meeting.

import dataclasses as dc
import datetime


@dc.dataclass(frozen=True)
class Meeting:
    title: str
    description: str
    time: datetime.datetime
    calendar: Calendar

Finally, we’ll start creating the facade by adding functions which will dispatch based on the value of calendar contained within the instance of Meeting.

Dispatch

We’ll create a registry which maps the enum to its corresponding function. The function takes as input a Meeting object and returns a boolean indicating whether the meeting was successfully booked or not.

from typing import Callable, TypeVar 
MeetingT = TypeVar("MeetingT", bound=Callable[[Meeting], bool])
registry: dict[Calendar, MeetingT] = {}

Next we’ll add the book_meeting function. This is where we dispatch to the appropriate function depending on the meeting object that is received as the argument.

def book_meeting(meeting: Meeting) -> bool:
    func = registry.get(meeting.calendar)

    if not func:
        raise Exception(f"No function registered for calendar {meeting.calendar}")

    return func(meeting)

To be able to register functions which contain the logic for a particular calendar, we’ll create a decorator called register.

def register(calendar: Calendar):
    def _(func: MeetingT):
        if registry.get(calendar):
            raise Exception(f"A function has already been registered for {calendar}")
        registry[calendar] = func
        return func
    return _

register accepts as argument the calendar for which we’re registering a function. It returns another higher-order function which puts the actual function in the registry. Since the actual logic of the execution is in the decorated function, we simply return the original function func.

Finally, we register functions for different calendars.

@register(Calendar.GOOGLE)
def book_google(meeting: Meeting) -> bool:
    print("Booked Google meeting")
    return True


@register(Calendar.OUTLOOK)
def book_outlook(meeting: Meeting) -> bool:
    print("Booked Outlook meeting")
    return True

We’ll put all of this code in action by trying to book a meeting on Google calendar.

if __name__ == "__main__":
    meeting = Meeting(
        title="Hello",
        description="World",
        time=datetime.datetime.now(),
        calendar=Calendar.GOOGLE
    )

    book_meeting(meeting=meeting)

This prints “Booked Google meeting”, like we’d expect. We can now continue to add more functions which contain logic for specific calendars. Our library can evolve without any change to the exposed interface. It’s also possible to organise functions into their own modules, import the register decorator, and decorate them to add them to the registry. This has two main benefits. One, we keep the code well structured. Two, the code for different versions of the same calendar can stay separated; we avoid having to write if checks to see the calendar version since that can be made part of the enum itself, like GOOGLE_V1.
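
As a rough illustration of that layout, a calendar-specific module might look like the following; this is only a sketch, and the facade module name and the GOOGLE_V1 enum member are hypothetical.

# calendars/google_v1.py -- a hypothetical module layout
from facade import Calendar, Meeting, register  # assumes the code above lives in facade.py


@register(Calendar.GOOGLE_V1)  # assumes GOOGLE_V1 = "google_v1" was added to the Calendar enum
def book_google_v1(meeting: Meeting) -> bool:
    print("Booked meeting using the v1 Google Calendar API")
    return True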

That’s it. That’s how you can create a facade in Python.

Software Architecture as Code

Let’s start with an exercise: ask two of your colleagues to draw the architecture of a moderately-complex CRUD app. Chances are that the two diagrams will look very different from each other. They’d look different because the notations used to depict different parts of the system, like a database, for example, are different. Additionally, everyone has a favorite tool like DrawIO, LucidChart, paper towel, etc. for drawing these diagrams. This approach only works well for rapid prototyping. The drag-and-drop nature of most drawing tools, and paper towel’s susceptibility to getting lost, make documenting large software systems a less-than-pleasant experience.

This post describes how to use code to create software architecture diagrams and then commit them to version control. We’ll look at the process and tools for producing consistent diagrams at various abstraction levels, ranging from high-level to low-level. To achieve this, we will create a fictitious notification delivery system. After reading this you should be able to draw diagrams for your own systems at various levels of detail, and be able to explain them to different types of audiences.

In the first part we will talk about why it is necessary to represent the system at various levels of abstraction. In the second part we will talk about notations we can use to draw our system. In the last part we will work through the notification delivery system and create different types of diagrams for it. If you’re looking for an introduction-by-example, feel free to skip to the third part.

Abstraction

Consider a typical software engineering team. You can categorize the people in the team as “engineers” or “not engineers”. Product managers and other stakeholders make up the “not engineers” portion of the team. They require a broad overview of the system. Within the engineering team, too, people look at the system at different levels of detail. A junior engineer, for example, would focus on the subsystem they are working on while only glancing at the other subsystems. A more senior engineer, on the other hand, would be expected to be intimately familiar with many systems. This requires presenting the system at different levels of abstraction, depending on the intended audience.

The C4 model allows us to create such views of the system - from the least to the most detailed. The four C’s of the model stand for context, containers, components, and code. A system is made up of many containers (applications, databases, etc), each of which contains multiple components (various API controllers, etc), which are made up of code elements (individual classes, functions, interfaces, etc.). The diagram below, taken from C4 model’s website, shows how they are arranged in a hierarchy.

Given the hierarchy of the four C’s, we can create diagrams at different levels of abstraction. For example, a high-level diagram which only shows different software systems. The highest-level diagram is called the system landscape diagram. This shows how our system fits into the larger scheme of things. The system context diagram narrows the focus by showing our system and the systems that it interacts with. A container diagram adds more detail by showing the responsibilities of different subsystems that together make up our system. Finally, a code diagram shows the most detailed view consisting of class hierarchies, ER diagrams, etc.

Creating such diagrams is helpful because it lets us communicate effectively. For example, we’d use the system landscape or system context diagram when discussing the scope of changes with the product team. A junior engineer, on the other hand, could work their way through the different types of diagrams to get a sense of how the system works.

Writing code to generate these multiple perspectives of the system is a really convenient method, as you’ll see later in the post. All of the views can be generated automatically by the same code. In comparison, you could use a drag-and-drop tool. However, keeping all the diagrams in sync with each other gets tiresome and error-prone.

Notation

Languages like UML are very heavy on notation. For example, the notation to draw a component diagram is very different from a sequence diagram. In contrast, the C4 model is very light on notation. As a starting point, you could represent different diagrams using boxes and arrows, and then augment them with shapes and colors to convey more information. This makes it easy for different people to understand the diagrams without getting bogged down in notation.

As you’ll see in the example that follows, we’ll start with boxes and arrows. For example, the database would be represented as a grey box. We’ll then change its shape and color to make it a blue cylinder.

Example

Now let’s look at our notification delivery system. It consists of a REST API which allows callers to send notifications across various channels like email, SMS, or push notifications. Every incoming request is added to a Kafka topic, depending on the channel of delivery, and is eventually picked up by background consumers. The status of each request is updated in the database. A CDC pipeline reads from this database and stores it into a warehouse. The diagram below gives an overview of what the system would look like.

Getting Started

My tool of choice for creating architecture diagrams as code is Structurizr. It provides a DSL which lets us create C4 model of our systems. When you visit its homepage, you’d find an example of how the same code can generate different views of the system at different levels of abstraction. We will use the open-source version of the tool, called Structurizr Lite, which allows a single person to create, edit, and view diagrams; in other words, there is no collaborative editing. The workflow we are aiming for lets individuals submit architecture diagrams as a part of their pull requests. In the next section, we will create a script which will let us run Structurizr Lite as a Docker container. We will then create the diagram that we saw above using the Structurizr DSL.

start.sh

We’ll start by organizing our Git repository. At the top level is the design folder where we will keep all the Structurizr code. The start.sh script, also at the top level, will launch the Docker container and reference the files in the design folder.

.
├── app.py
├── design
└── start.sh

The content of the start script is the following:

#!/bin/bash

if [ -z "$1" ]; then
    echo "You should provide a file name"
    exit 1
fi

docker run --rm -it -p 8080:8080 \
    -e STRUCTURIZR_WORKSPACE_FILENAME="$1" \
    -v ~/Personal/notifications/design:/usr/local/structurizr structurizr/lite

We pass the name of the file as the first argument when we run the script. It is stored in the /usr/local/structurizr directory of the container which we have mapped to the design folder of our Git repository. When we run the script and the file does not exist, Structurizr creates it for us. This is helpful when we are creating a new diagram. If the file exists, Structurizr will read from it and render the diagrams. Let’s start by creating our first file.

./start.sh 001-overview

This will create two files under the design folder. The one we’d like to edit ends with the dsl extension and will be committed to the repository. It is okay to add the json file to gitignore as it will be generated again when the container runs.

.
├── app.py
├── design
│   ├── 001-overview.dsl
│   └── 001-overview.json
└── start.sh

If you navigate to localhost:8080 you’ll see Structurizr rendering a default diagram. We’ll update this to reflect our system.

Creating our first diagram

Open the dsl file with your favorite editor and replace the contents of the file with the following:

workspace {
    model {
        user = softwareSystem "Client"
        notificationSystem = softwareSystem "Notification System"

        sendgrid = softwareSystem "Sendgrid"
        apns = softwareSystem "Apple Push Notification Service"
        fcm = softwareSystem "Firebase Cloud Messaging"
        cdc = softwareSystem "Change Data Capture"
        warehouse = softwareSystem "Data Warehouse"

        user -> notificationSystem "Uses"
        notificationSystem -> sendgrid "Sends emails using"
        notificationSystem -> apns "Sends push notifications using"
        notificationSystem -> fcm "Sends push notifications using"
        cdc -> notificationSystem "Reads database changes from"
        cdc -> warehouse "Writes database changes to"
    }

    views {
        systemLandscape notificationSystem "Overview" {
            include *
            autoLayout lr
        }

        systemContext notificationSystem "Context" {
            include *
            autoLayout lr
        }
    }
}

Let’s unpack what is going on here. At the top-level is the workspace which is a container for the model and views. A model contains the different pieces of our architecture. This includes the software system we are describing, the people and external systems it interacts with, and the relationships between them. views contains the different levels of abstractions at which we’d like to see the system. Here we’ve defined the system landscape view, and the system context view.

In the model we’ve defined various software systems and the relationships between them. The system that we’d like to diagram is the notificationSystem. At this stage we’ve just added a one-line definition of the system, as softwareSystem, which results in it being rendered as a box in the diagram with its name on it.

We’ll continue to refine this diagram by defining other parts which together comprise the notification system. We will do so by adding them as children of the notification system which we’ve defined in the DSL. In other words, we will add one more level of detail to the system.

Relationships between the different parts of the system are defined using an arrow ->. These can be added as children of a softwareSystem or defined outside of them, as we’ve done above.

Adding more detail

We will now zoom in and add the containers which make up the notification system. We will add the background consumers which consume the notifications enqueued in Kafka and send them to the relevant channel. Update the contents of the dsl file to the following:

workspace {

    model {
        user = softwareSystem "Client"
        notificationSystem = softwareSystem "Notification System" {

            api = container "API" "Receives requests to send notifications"
            db = container "Database" "Stores the status of each request to send a notification"
            email = container "Email Consumer" "Sends emails"
            ios = container "iOS Consumer" "Sends iOS push notifications"
            android = container "Android Consumer" "Sends Android push notifications"

        }

        sendgrid = softwareSystem "Sendgrid"
        apns = softwareSystem "Apple Push Notification Service"
        fcm = softwareSystem "Firebase Cloud Messaging"
        cdc = softwareSystem "Change Data Capture"
        warehouse = softwareSystem "Data Warehouse"

        # -- Relationships between systems
        user -> api "Uses"
        api -> email "Enqueues request to send emails, through Kafka, to"
        email -> sendgrid "Sends emails using"

        api -> ios "Enqueues request to send push notifications, through Kafka, using"
        ios -> apns "Sends push notifications using"

        api -> android "Enqueues request to send push notifications, through Kafka, using"
        android -> fcm "Sends push notifications using"

        cdc -> db "Reads database changes from"
        cdc -> warehouse "Writes database changes to"

        # -- Relationships between components
        api -> db "Stores incoming request in"
    }

    views {
        systemLandscape notificationSystem "Overview" {
            include *
            autoLayout lr
        }

        systemContext notificationSystem "Context" {
            include *
            autoLayout lr
        }

        container notificationSystem "Container" {
            include *
            autoLayout tb
        }
    }

}

Notice that we’ve added containers which are children of the notification system. We’ve also changed the relationships to point to these containers instead of the parent software system. This keeps the system landscape and the system context diagrams the same but allows us to add a new container diagram to the system. This diagram is shown below.

You can add more detail by creating a component diagram. This will let you show individual parts of your software that perform specific tasks. For example, a REST API controller which handles incoming requests to send notifications. Although possible, it is recommended to add a component diagram only if it adds value. Usually it’d be sufficient to only create a container diagram.

Styling our diagram

The diagram we’ve created so far conveys the architecture. However, the only indication of what is internal to the notification system, i.e. within the scope of the system, is a faint dashed line which groups the various containers together. Let’s go ahead and style our diagram by making containers that are a part of the notification system render with a blue background. We’ll start by creating a few directories under the design directory.

mkdir -p design/commons/styles
mkdir -p design/commons/icons
mkdir -p design/commons/themes

We’ve created three directories. styles will contain files which store styling information. icons is where we will keep PNG icons. themes is where we will store predefined themes. This is what the directory structure will look like.

design/
├── 001-overview.dsl
└── commons
    ├── icons
    │   ├── debezium.png
    │   └── postgres.png
    ├── styles
    │   └── default.dsl
    └── themes

Let’s create default.dsl under the styles directory. The contents of the file are given below.

styles {

    element "internal" {
        background #4287f5
        color #ffffff
    }

    element "database" {
        shape Cylinder
    }

}

Here we are creating a couple of custom elements. We are creating an internal element which has a blue background color, and a database element which is a cylinder. We will then use these elements to style our diagram. Let’s update the dsl file to include these elements. I’m only showing the parts that have changed for the sake of brevity.

workspace {

    model {
        user = softwareSystem "Client"
        notificationSystem = softwareSystem "Notification System" {

            tags "internal"

            api = container "API" "Receives requests to send notifications" "Python" "internal"

            db = container "Database" "Stores the status of each request to send a notification" "Postgres" {
                tags "internal, database"
            }

            email = container "Email Consumer" "Sends emails" "Python" "internal"
            ios = container "iOS Consumer" "Sends iOS push notifications" "Python" "internal"
            android = container "Android Consumer" "Sends Android push notifications" "Python" "internal"

        }
        ...
    }

    views {
        ...
        !include ./commons/styles/default.dsl
    }
}

We begin by including the styling information using the !include directive under views. Styling information is associated with individual systems and components by using tags. You can combine tags to apply multiple styles. Tags can be specified in-line or as a child of a software system or a component. We’ve specified the internal tag in-line for the api container, and as a child of the db container. Notice how we’ve applied both the internal and database tags to the db container, causing it to be rendered as a blue cylinder. With styling, the container diagram now visually communicates what is within and outside the scope of the notification system.

Dynamic diagrams

The diagrams we’ve created so far are static and only show the structure of the system. Dynamic diagrams allow us to show the flow of information between systems as a sequence of steps. They are like UML sequence diagrams but allow a more free-form arrangement of the various parts of the system. The steps are numbered to show the order in which they happen.

As an example, let’s model how an email notification would be triggered. The API receives a request to send an email. It stores this incoming request in the database, and then puts it into a Kafka topic. The consumer then picks up this request, and sends the email using a third-party system.

Dynamic diagrams are created using the dynamic element under views. To model the scenario mentioned above, we’d add the following to our dsl file.

views {
    ...

    dynamic notificationSystem {
        title "Send an email notification"
        user -> api "Sends request to trigger an email notification to"
        api -> db "Stores the incoming request in"
        api -> email "Enqueues the request in Kafka for"
        email -> sendgrid "Sends email using"
        autoLayout lr
    }

    ...
}

The steps are written in the order in which they appear. Numbers indicating their sequence are added automatically by Structurizr. The dynamic diagram for the code we just added is given below.

Deployment diagrams

Deployment diagrams allow us to show how the software will actually run. For example, the various components of the software can be deployed as containers running on a Kubernetes cluster. We’ll start by creating a simple deployment diagram which shows the local development environment. All of the services, except the ones that are third-party, will run on the developer’s laptop.

Deployment diagrams are created by adding a deploymentEnvironment element under model. Once that is defined, we include it in the views. Let’s model the development environment.

model {
    ...
    development = deploymentEnvironment "Development" {
        deploymentNode "Laptop" {
            containerInstance api
            containerInstance email
            containerInstance ios
            containerInstance android
            containerInstance db
        }

        deploymentNode "Third-Party Services" "Development" {
            softwareSystemInstance sendgrid
            softwareSystemInstance fcm
            softwareSystemInstance apns
        }
    }
    ...
}

We’re creating a node called “Laptop” which will host all the containers running on the developer’s local machine. To this we add the various container instances. We add another node to contain all the third-party services. Relationships between these are automatically inferred from the container diagram created previously. Finally, we add it to views.

views {
    ...
    deployment * development {
        include *
        autoLayout lr
    }
    ...
}

This gives us the following diagram.

We can also depict how the software would be deployed to production. Let’s say that our system is deployed to AWS. Our database is an RDS cluster, and the API servers and email consumers are deployed to EC2 instances. We use Route 53 for DNS, and an Elastic Load Balancer to forward requests to the API servers. Like we did for the development environment above, we’ll create a deployment diagram. This diagram, however, will be styled some more to indicate the various AWS services.

Structurizr allows importing themes. Themes provide predefined tags which you can use to style elements of your diagram. Themes are added to views. Let’s start by adding the AWS theme.

views {
    ...
    theme https://static.structurizr.com/themes/amazon-web-services-2023.01.31/theme.json
    ...
}

Next we’ll model the production environment. In the model below, we’re nesting deployment nodes. The outermost node represents the AWS environment as a whole. The next node represents the AWS region. Being able to nest deployment nodes allows us to reuse elements of a diagram. For example, if we had multiple database clusters that were all hosted on Amazon RDS, we could nest them all under a single deployment node representing Amazon RDS.

We’ve added Route 53 and the ELB as infrastructure nodes, and defined relationships between them. They’re styled using tags that we imported from the AWS theme. These are defined in the JSON that’s returned from the theme’s URL.

You don’t need to add relationships between deployment nodes that host your containers. These relationships are implied from the container diagram that was created above.

model {
    ...
    production = deploymentEnvironment "Production" {
        deploymentNode "Amazon Web Services" {
            tags "Amazon Web Services - Cloud"

            deploymentNode "US-East-1" {
                tags "Amazon Web Services - Region"

                route53 = infrastructureNode "Route 53" {
                    tags "Amazon Web Services - Route 53"
                }

                elb = infrastructureNode "Elastic Load Balancer" {
                    tags "Amazon Web Services - Elastic Load Balancing"
                }

                deploymentNode "Amazon RDS" {
                    tags "Amazon Web Services - RDS"

                    deploymentNode "Postgres" {
                        tags "Amazon Web Services - RDS Postgres instance"
                        containerInstance db
                    }
                }

                deploymentNode "API Servers" {
                    tags "Amazon Web Services - EC2"

                    deploymentNode "Ubuntu Server" {
                        apiInstance = containerInstance api
                    }
                }

                deploymentNode "Email Consumers" {
                    tags "Amazon Web Services - EC2"

                    deploymentNode "Ubuntu Server" {
                        containerInstance email
                    }
                }
            }

            route53 -> elb "Forwards requests to" "HTTPS"
            elb -> apiInstance "Forwards requests to" "HTTPS"
        }

        deploymentNode "Third-Party Services" "Production" {
            softwareSystemInstance sendgrid
        }
    }
    ...
}

Finally, we’ll add the production environment to views.

views {
    ...
    deployment * production {
        include *
        autoLayout lr
    }
    ...
}

This gives us the following diagram. Notice how adding tags has caused AWS icons to be added and styling to be applied, wherever applicable. This is a very small subset of the overall system. Some of the background consumers, CDC, and warehouse have been omitted.

That’s it. That’s how you can create software architecture using code. I’ve created a Github repository for this blog post which contains the complete code for creating the notification system we discussed above.

Creating a data warehouse with Apache Doris, Kafka, and Debezium

Over the last few weeks I’ve been looking at ways to create a data warehouse. The aim is to bring together disparate sources of data so that they can be analysed to create reports and dashboards. There are many open-source and closed-source options available like Snowflake, Clickhouse, Apache Pinot, etc. However, I ended up settling for Apache Doris. In this post I will go over how you can replicate data into Doris using Kafka. I will use Debezium to replicate a Postgres table into the warehouse.

Setting things up

My setup consists of Docker containers for Kafka, Debezium, Postgres, and Doris. In a nutshell, I’ll create a table in Postgres, have Debezium stream it to Kafka, and then use the Kafka consumer in Doris to ingest this into a table. After the JSON generated by Debezium has been written to a table, I’ll create a view on top of this data to make it look relational again.

Before we begin

Doris ships with “external table” ingestion for Postgres. However, the caveat is that you’d have to import the data in small batches manually. Additionally, the external connector is configured for a single database table. If you have a setup where every tenant gets their own table, the data can be split among multiple databases (for MySQL) or schemas (for Postgres) on the same cluster. Setting up external tables then becomes difficult, because new databases or schemas can be added dynamically and each would require an external table to be set up for it. In contrast, you can have Debezium fetch tables based on a regex pattern. Any table in any database or schema that matches the pattern would be a valid candidate for Debezium to stream from.

Another reason for using Debezium is to avoid reingesting the data when new columns are added to the table. Since we will store the raw JSON payload that Debezium sends, we can simply update the view and extract the new column from the JSON.

The final reason for using Debezium is to be able to support databases that Doris doesn’t have external table support for. For example, MongoDB. Using Debezium allows us to stream these databases into Kafka and then into Doris.

Getting started

We’ll begin by bringing up the containers. For the purpose of this demo, we will create a table called people and add some rows to it. Here is what a row from this table looks like:

| customer_id | id     | name         | created_at                 | updated_at | deleted_at |
|-------------|--------|--------------|----------------------------|------------|------------|
| 1           | 108923 | Cynthia Ford | 2023-11-02 15:50:42.158417 | NULL       | NULL       |

After the table is populated, we’ll create a Debezium source connector to stream the table to Kafka. The configuration for this connector looks like this:

{
    "name": "postgres_source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.user": "postgres",
        "database.password": "my-secret-pw",
        "database.dbname": "postgres",
        "database.server.name": "postgres",
        "table.include.list": ".*\\.people",
        "plugin.name": "pgoutput",
        "publication.autocreate.mode": "filtered",
        "time.precision.mode": "connect",
        "tombstones.on.delete": "false",
        "snapshot.mode": "initial",
        "heartbeat.interval.ms": "1000",
        "transforms": "route",
        "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
        "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
        "transforms.route.replacement": "$3",
        "event.processing.failure.handling.mode": "skip",
        "producer.override.compression.type": "snappy"
    }
}

Notice how we’re specifying a regex pattern for table.include.list in the configuration. This would match people table in any schema within the postgres database. We are also using the route transformer to merge the tables from various schemas into a single Kafka topic.
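
To see what the route transform does, we can replay the same regex and replacement in Python; a small sketch assuming Debezium’s default server.schema.table topic naming, e.g. postgres.public.people (the schema names here are made up).

import re

# Same pattern as transforms.route.regex; "$3" in Kafka Connect corresponds to "\3" here.
pattern = r"([^.]+)\.([^.]+)\.([^.]+)"

for topic in ["postgres.public.people", "postgres.tenant_42.people"]:
    print(re.sub(pattern, r"\3", topic))  # both print "people", so they land in one topic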

Now that the data is in Kafka, we can begin ingesting it into Doris. We will do this as a two-step process. One, we will ingest this into a landing area where we store the raw JSON payload along with the primary key. Two, we will create a view on top of this data and extract the keys that we need.

We’ll create a database called ingest where we will store the raw data.

CREATE DATABASE ingest;

Within this database, we will create a table called people where the data from Kafka will be stored.

CREATE TABLE IF NOT EXISTS ingest.people (
    id BIGINT,
    customer_id BIGINT,
    source JSONB,
    op VARCHAR,
    ts_ms BIGINT
)
UNIQUE KEY (id, customer_id)
DISTRIBUTED BY HASH (customer_id) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    "enable_unique_key_merge_on_write" = "true"
);

The source column will store the JSON generated by Debezium. We will extract column values from this when we create the view. We are specifying the combination of customer_id and id as the UNIQUE KEY and enabling merge-on-write in the properties. This allows us to keep only the latest version of a row as we read updates from Debezium. Since all of this is running on my local machine, I’ve set the replication to 1. Without this setting Doris would not allow creating tables, as there aren’t at least 3 instances of the Doris backend running.

To begin loading data from Kafka, we will create a ROUTINE LOAD. This will read data from Kafka in batches and write them to the people table.

CREATE ROUTINE LOAD ingest.load_people ON people
PROPERTIES
(
    "max_batch_interval" = "20",
    "max_batch_rows" = "300000",
    "max_batch_size" = "209715200",
    "format" = "json",
    "jsonpaths" = "[
        \"$.payload.after.id\",
        \"$.payload.after.customer_id\",
        \"$.payload.after\",
        \"$.payload.op\",
        \"$.payload.ts_ms\"
    ]"
)
FROM KAFKA
(
    "kafka_broker_list" = "192.168.0.108:9092",
    "kafka_topic" = "people",
    "property.group.id" = "people",
    "property.client.id" = "1",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);

In the definition of the routine load we specify the table to write into using the ON clause. In the properties we specify that we are ingesting JSON objects from Kafka. We also specify jsonpaths, which allows us to select keys from within the JSON payload. We are extracting the keys that become columns in the raw people table. Since jsonpaths is a stringified JSON list, we need to escape the quotes. Finally, we specify the Kafka broker and the topic to read from.
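
To make the jsonpaths concrete, here is roughly the shape of a Debezium change event and the parts the paths above select; a simplified sketch with made-up values, not the full event envelope.

# Simplified shape of a Debezium event value; real events carry more metadata.
event = {
    "payload": {
        "after": {"id": 108923, "customer_id": 1, "name": "Cynthia Ford"},
        "op": "c",           # c = create, u = update, d = delete
        "ts_ms": 1698934242158,
    }
}

# What the routine load's jsonpaths pull out, in order:
row = (
    event["payload"]["after"]["id"],           # $.payload.after.id
    event["payload"]["after"]["customer_id"],  # $.payload.after.customer_id
    event["payload"]["after"],                 # $.payload.after (stored as raw JSON in `source`)
    event["payload"]["op"],                    # $.payload.op
    event["payload"]["ts_ms"],                 # $.payload.ts_ms
)
print(row)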

You can see the running task using

SHOW ALL ROUTINE LOAD FOR ingest.load_people;

After waiting for some time to let the task finish ingesting the rows we can create the view. We’ll start by creating the database.

CREATE DATABASE warehouse;

Finally, we will create the view

CREATE VIEW warehouse.people AS
SELECT
    id,
    customer_id,
    REPLACE(json_extract(source, '$.name'), "\"", '') AS name,
    FROM_UNIXTIME(CAST(REPLACE(json_extract(source, '$.created_at'), "\"", '') AS BIGINT) / 1000) created_at,
    FROM_UNIXTIME(CAST(REPLACE(json_extract(source, '$.updated_at'), "\"", '') AS BIGINT) / 1000) updated_at,
    FROM_UNIXTIME(CAST(REPLACE(json_extract(source, '$.deleted_at'), "\"", '') AS BIGINT) / 1000) deleted_at
FROM ingest.people;

In the SQL above, we are extracting the fields from the JSON that we stored. We are processing them to remove quotation marks, and converting the epoch sent by Debezium into a DATETIME object. Since we have enabled merge-on-write, subsequent updates to the source table in Postgres will be reflected automatically in the view.
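
The division by 1000 is there because Debezium sends these timestamps as epoch milliseconds while FROM_UNIXTIME expects seconds; a quick sketch of the same conversion in Python, using a made-up value.

import datetime

ts_ms = 1698934242158  # made-up epoch value in milliseconds, as Debezium would send it

# FROM_UNIXTIME expects seconds, hence the division by 1000 in the view.
print(datetime.datetime.fromtimestamp(ts_ms / 1000))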

We can now see the data stored in the view.

SELECT * FROM warehouse.people LIMIT 1;

and this gives us

| customer_id | id    | name           | created_at          | updated_at | deleted_at |
|-------------|-------|----------------|---------------------|------------|------------|
| 1           | 28910 | Douglas Becker | 2023-11-03 11:42:09 | NULL       | NULL       |

That’s it. That’s how you can set up a real-time data warehouse using Doris, Debezium, and Kafka.

Learn you a SQL for Great Good

Lately I have been spending some time on Reddit’s /r/SQL and a question that comes up frequently is where to find good resources to learn SQL. This post traces a learning path. I will list down courses, books, and websites which you can use to learn SQL.

Getting Started

The first step is to get comfortable with the relational model. Depending on whether you prefer books or courses, you can pick one of the following.

Course - Databases: Relational Databases and SQL

The course, offered by Stanford on edX, provides an introduction to relational databases and SQL. It assumes no prior knowledge of either relational databases or SQL. It is the first in a series of 5 courses with each course covering a more advanced aspect of SQL and relational model. As of writing, the course costs USD 50 for the certificate track.

Book - Learning SQL, 3rd Edition

If you are a reader, Learning SQL by Alan Beaulieu is the book for you. It begins with a quick introduction of the relational model and then dives head-first into writing SQL queries. It explains everything by examples and also provides exercises at the end of each chapter to solidify learning. The book is intended to be read cover to cover as each chapter builds upon the previous one. You will find a comprehensive coverage of SQL from the basics of creating tables, to inserting, retrieving, and deleting data, to the more advanced analytical functions.

Website - SQLBolt

If you’re looking for a free alternative to learn SQL then you can use SQLBolt. It begins with a quick one-page introduction of relational databases and then takes you straight into writing SQL queries. The good part about SQLBolt is that you do not need to set up any database on your local machine. You can do all the exercises right in the browser.

SQLBolt is a good website to learn the basics of SQL. However, there are a few topics which are yet to be covered. These are topics of intermediate difficulty and you will have to find alternative resources to learn them.

Practice, Practice, Practice

Once you have a firm grasp of the basics, the next step is to get some practice.

Website - PgExercises

PgExercises is a good place to start practicing your new SQL skills. You are presented with the database of a fictitious country club on which the problems are based. The problems begin with simple retrieval of data and gradually become more involved by introducing conditional statements, and so on. The nice thing about PgExercises is that you can solve all the problems on the website itself and get immediate feedback on whether your answer is correct or not. There is an option to get a hint if you are stuck and to view the answer if you are really stuck.

You can also use PgExercises to prepare for an interview. Set a timer when solving the exercises and don’t look at the hints. This will give you an indication of how prepared you are.

Website - Window Functions

If you’re asked to find out the maximum or minimum value among a group of rows, you’d probably come up with a combination of GROUP BY clause along with MIN or MAX functions. What if, instead, you were asked to rank your results in decreasing order? SQL allows you to do that using the RANK function.

MIN, MAX, and AVG are aggregate functions whereas RANK is a window function. If you’d like to learn how to use window functions, click on the link in the title. The Window Functions website begins with a refresher on the GROUP BY and HAVING clauses and then proceeds to introduce you to window functions. Like PgExercises, you’ll be solving problems in an interactive editor using Postgres. All this on a cute dataset of cats.

Use Window Functions and PgExercises to prepare for your upcoming interviews.

Website - AdvancedSQLPuzzles

If you are looking for something more mind-bending then head over to AdvancedSQLPuzzles. It is a collection of 40 puzzles that will help you determine your level of expertise with SQL. The only caveat to using the website is that you will have to put in some effort to set up a local database before you can start solving the puzzles. Additionally, the solutions are written in T-SQL so you might need to translate the solutions to your favorite dialect of SQL.

Finito.

Frequentism vs Bayesianism

When it comes to statistics, there are two schools of thought: frequentism and Bayesianism. In the coming posts we’ll be looking at hypothesis testing and interval estimation, and knowing the difference between the two schools is important. In this post I’ll go over what frequentism and Bayesianism are and how they differ.

What is ‘probability’?

The reason for these two schools of thought is the difference in their interpretation of probability. For frequentists, probabilities are about frequencies of occurrence of events. For Bayesians, probabilities are about degrees of certainty of events. This fundamental divide in the definition of probability leads to vastly different methods of statistical analysis.

The two schools

The aim of both the frequentists and Bayesians is the same - to estimate some parameters of a population that are unknown.

The assumption of the frequentist approach is that the parameters of a population are fixed but unknown constants. Since these are constants, no statements of probability can be made about them. The frequentist procedures work by drawing a large number of random samples, calculating a statistic using each of these samples, and then finding the probability distribution of the statistic. This is called the sampling distribution. Statements of probability can be made about the statistic.
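
A small simulation makes the idea of a sampling distribution concrete; a sketch using numpy, with an arbitrary population and sample size.

import numpy as np

rng = np.random.default_rng(42)

# An arbitrary population with a fixed but (pretend) unknown mean.
population = rng.normal(loc=170, scale=10, size=100_000)

# Draw many random samples and compute the statistic (the sample mean) for each.
sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]

# The distribution of these means is the sampling distribution of the statistic.
print(np.mean(sample_means), np.std(sample_means))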

The assumption of the Bayesian approach is that the parameters of a population are random variables. This allows making probability statements about them. There is a notion of some true value that the parameters can take with certain probability. The Bayesian approach thus allows adding in some prior information. The cornerstone of the Bayesian approach is Bayes’ theorem:

$$P(H \mid D) = \frac{P(D \mid H) \, P(H)}{P(D)}$$

Here, $H$ is the hypothesis and $D$ is the data. What the theorem lets us calculate is the posterior $P(H \mid D)$, which is the probability of the hypothesis being true given the data. The prior $P(H)$ is the probability that $H$ is true before the data is considered. The prior lets us encode our beliefs about the parameters of the population into the equation. $P(D \mid H)$ is the likelihood, i.e. the evidence about hypothesis $H$ given data $D$. Finally, $P(D)$ is the probability of getting the data regardless of the hypothesis.

What’s important here is the prior $P(H)$. This is our degree of certainty about the hypothesis $H$ being true. This probability can itself be calculated using frequentist methods, but what matters is the fact that the Bayesian approach lets us factor it in.
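
As a toy illustration of the theorem, here is a discrete Bayes update in Python; the numbers are made up purely to show how the prior, likelihood, and evidence combine.

# Hypothesis H: a coin is biased towards heads (P(heads) = 0.8).
# Alternative: the coin is fair (P(heads) = 0.5).
prior_biased = 0.3          # made-up prior belief that the coin is biased
prior_fair = 1 - prior_biased

# Data D: we observe a single head.
likelihood_given_biased = 0.8
likelihood_given_fair = 0.5

# Evidence P(D): total probability of seeing a head under either hypothesis.
evidence = likelihood_given_biased * prior_biased + likelihood_given_fair * prior_fair

# Posterior P(H | D) via Bayes' theorem.
posterior_biased = likelihood_given_biased * prior_biased / evidence
print(posterior_biased)  # roughly 0.41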

| Frequentist | Bayesian |
|-------------|----------|
| Parameters are fixed, unknown constants. No statements of probability can be made about them. | Parameters are random variables. Since random variables have an underlying probability distribution, statements of probability can be made about them. |
| Probability is about long run frequencies. | Probability is about specifying the degree of (un)certainty. |
| No statements of probability are made about the data or the hypothesis. | Statements of probability are made about both data and hypothesis. |
| Makes use only of the likelihood. | Makes use of both the prior and the likelihood. |

The procedures

In the frequentist approach, the parameters are unknown constants and it is the data that changes (by repeated sampling). In the Bayesian approach, the parameters are random variables and it is the data that stays constant (the data that has been observed). In this section we will contrast the frequentist confidence interval with the Bayesian credible interval.

Both confidence intervals and credible intervals are interval estimators. Interval estimators provide a range of values that the true parameters can take.

The frequentist confidence interval

Let’s assume that there’s a true parameter $\mu$, representing the mean of a population, that we are trying to estimate. From a sample of values we can then construct two estimators $\theta_1$ and $\theta_2$ and say that the true value $\mu$ lies between $\theta_1$ and $\theta_2$ with a certain level of confidence.

To be confident that $\theta_1 \le \mu \le \theta_2$, we need to know the sampling distribution of the estimator. If the random variable $X$ is normally distributed, then the sample mean $\bar{X}$ is also normally distributed with mean $\mu$ (the true mean) and variance $\sigma^2 / n$, i.e. $\bar{X} \sim \mathcal{N}(\mu, \sigma^2 / n)$. In a normal distribution, 95% of the area under the curve is covered by two standard deviations. Therefore, if we want 95% of the intervals to contain the true value of $\mu$, we can construct the interval $\bar{X} \pm 2 \sigma / \sqrt{n}$.

What this means is that if we were to keep drawing samples and constructing these intervals, 95% of these random intervals will contain the true value of the parameter $\mu$. This can be stated more generally as $P(\theta_1 \le \mu \le \theta_2) = 1 - \alpha$. This means that with probability $1 - \alpha$ (with $0 \le \alpha \le 1$), the interval between $\theta_1$ and $\theta_2$ contains the true value of the parameter $\mu$. So, if we set $\alpha = 0.05$, we’re constructing a 95% confidence interval. $\alpha$ is called the level of significance and is the probability of committing a type I error.

Let’s suppose we’re trying to find the average height of men. It is normally distributed with mean $\mu$ and standard deviation $\sigma$ inches. We take a sample of $n$ men and find that the sample mean is $\bar{x}$ inches. The 95% confidence interval would then be $\bar{x} \pm 2 \sigma / \sqrt{n}$; plugging in the sample values gives us a concrete interval. This is one such interval. If we were to construct 100 such intervals, 95 of them would contain the true parameter $\mu$.

The caveat here is that for simplicity I’ve assumed the critical value to be 2 instead of 1.96 for constructing the interval.
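
To tie the pieces together, here is a sketch of computing such an interval and checking its coverage by simulation; the population parameters and sample size are arbitrary, and it uses the exact 1.96 critical value.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 69, 3, 50   # arbitrary "true" mean, known std dev, and sample size
z = 1.96                   # 95% critical value of the standard normal

covered = 0
trials = 10_000
for _ in range(trials):
    sample = rng.normal(mu, sigma, size=n)
    x_bar = sample.mean()
    lo, hi = x_bar - z * sigma / np.sqrt(n), x_bar + z * sigma / np.sqrt(n)
    covered += lo <= mu <= hi

# Roughly 95% of the intervals contain the true mean.
print(covered / trials)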

The Bayesian credible interval

A Bayesian credible interval is an interval that has a high posterior probability, $1 - \alpha$, of containing the true value of the parameter. Compared to the frequentist confidence intervals, which say that $100(1 - \alpha)\%$ of all the intervals calculated will contain the true parameter, the Bayesian credible interval says that the calculated interval has a probability of $1 - \alpha$ of containing the true parameter.

Let’s suppose we’re interested in the proportion of the population [1] that gets 8 hours of sleep every night. The parameter of interest, $p$, now represents a proportion. A sample of 27 people is drawn, of which 11 get 8 hours of sleep every night. Therefore, the observed data is $y = 11$ successes out of $n = 27$ trials, with $y \sim \text{Binomial}(27, p)$.

To calculate a Bayesian credible interval, we need to assume a subjective prior. Suppose the prior was $p \sim \text{Beta}(3.3, 7.2)$. The posterior would thus be $p \mid y \sim \text{Beta}(3.3 + 11, 7.2 + 16) = \text{Beta}(14.3, 23.2)$. This can be calculated using scipy as follows:

from scipy import stats
stats.beta.interval(0.9, 14.3, 23.2)
(0.256110060437748, 0.5138511051170076)

The 90% credible interval is (0.256, 0.514).
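
For contrast, a frequentist could compute a 90% confidence interval for the same data using the normal approximation to the proportion; this is only an illustrative sketch to highlight the different interpretations of the two intervals.

import math

y, n = 11, 27
p_hat = y / n                             # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
z = 1.645                                 # 90% critical value of the standard normal

print((p_hat - z * se, p_hat + z * se))   # approximately (0.25, 0.56)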

In summary, here are some of the frequentist approaches and their Bayesian counterparts.

| Frequentist | Bayesian |
|-------------|----------|
| Max likelihood estimation (MLE) | Max a posteriori (MAP) |
| Confidence interval | Credible interval |
| Significance test | Bayes factor |

A thing that I have glossed over is handling of nuisance parameters. Bayesian procedures provide a general way of dealing with nuisance parameters. These are the parameters we do not want to make inference about, and we do not want them to interfere with the inference we are making about the main parameter. For example, if we’re trying to infer the mean of a normal distribution, the variance is a nuisance parameter.

The critique

The prior is subjective and can change from person to person. This is the frequentist critique of the Bayesian approach; it introduces subjectivity into the equation. The frequentist approach makes use of only the likelihood. On the other hand, the Bayesian criticism of the frequentist approach is that it uses an implicit prior. Bayes’ theorem can be restated as $\text{posterior} \propto \text{likelihood} \times \text{prior}$. For the frequentist approach, which makes use of the likelihood alone, the prior would need to be set to 1, i.e. a flat prior. The Bayesian approach makes the use of a prior explicit even if it is subjective.

The Bayesian criticism of frequentist procedures is that they do not answer the question that was asked but rather skirt around it. Suppose the question posed was “in what range does the true value of the parameter lie?”. The Bayesian credible interval will give one, albeit subjective, interval. The frequentist confidence interval, however, will give many different intervals. In that sense, frequentism isn’t answering the question posed.

Another Bayesian criticism of frequentist procedures is that they rely on the possible samples that could have occurred but did not, instead of relying on the one sample that did occur. Bayesian procedures treat this sample as the fixed data and vary the parameters around it.

The end.

Max Likelihood

So far what we’ve been doing is point estimation using OLS. Another method of estimation with more mathematical rigor is max likelihood estimation. In this post we will look at likelihood, likelihood function, max likelihood, and how all this ties together in estimating the parameters of our bivariate linear regression model.

Probability

When we think of probability, we are thinking about the chance of something happening in the future. For example, if a coin is tossed, what is the chance of getting a head? It’s $\frac{1}{2}$.

But how did we arrive at the conclusion that the probability of getting a head is $\frac{1}{2}$? This leads us to two varying schools of thought on probability - frequentism and Bayesianism.

If the coin were tossed 1000 times, about half the time the outcome would be a head and half the time it would be a tail. The more frequently an event happens, the more probable / certain it is. This is the frequentist definition of probability. Under this definition, probability is the ratio of the number of times we make the desired observation to the total number of observations.
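As a quick illustration of the frequentist definition, here’s a small simulation (my own sketch, not part of the original example) that tosses a fair coin 1000 times and computes the relative frequency of heads:

import numpy as np

rng = np.random.default_rng(42)

# Simulate 1000 fair coin tosses; 1 represents a head.
tosses = rng.integers(0, 2, size=1000)

# The relative frequency of heads approaches 0.5 as the number of tosses grows.
print(tosses.mean())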

In the Bayesian approach, probability is a measure of our certainty about something. In a coin toss, there’s a 50% chance of getting a head and a 50% chance of getting a tail because there’s only two possible outcomes.

This subtle difference in the definition of probability leads to vastly different statistical methods.

Under frequentism, it is all about the frequency of observations. The event that occurs more frequently is the more probable event. There is no concept of an underlying probability distribution for the event we are trying to observe.

Under Bayesianism, in contrast, each event that occurs follows an observed probability distribution which is based on some underlying true distribution. Calculating the probabilities involves estimating the parameters of the true distribution based on the observed distribution.

Likelihood

Likelihood is the hypothetical probability that an event that has already occurred would yield a specific outcome[1].

For example, if a coin were tossed 10 times and out of those 10 times a person guessed the outcome correctly 8 times then what’s the probability that the person would guess equally correctly the next time a coin is tossed 10 times? This is the likelihood. Unlike probability, likelihood is about past events.

The likelihood of observing the data is represented by the likelihood function. The likelihood function is the probability density function of observing the given data. Suppose that we have $n$ observations - $x_1, x_2, \ldots, x_n$ - of data which are realizations of a random variable $X$. Let the function $f(x; \theta)$ represent the probability density function of the random variable $X$ with parameters $\theta$. Then the likelihood of observing $x_1, x_2, \ldots, x_n$ is given by the likelihood function as:

$L(\theta; x_1, x_2, \ldots, x_n) = f(x_1; \theta) \cdot f(x_2; \theta) \cdots f(x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$
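As a small illustration, assuming a normal model and made-up observations, the likelihood is just the product of the density evaluated at each data point:

import numpy as np
from scipy import stats

# A few observations assumed to come from a normal distribution (hypothetical data).
x = np.array([4.2, 5.1, 4.8, 5.5, 4.9])

def likelihood(mu, sigma, data):
    # Likelihood is the product of the density evaluated at each observation.
    return np.prod(stats.norm.pdf(data, loc=mu, scale=sigma))

print(likelihood(5.0, 0.5, x))
print(likelihood(3.0, 0.5, x))  # a poorer guess for mu gives a smaller likelihood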

Max Likelihood Estimation

If we were to maximize the likelihood function, we’d end up with the highest probability of observing the data that we have observed, i.e. the max likelihood. The data that we have observed follows some distribution we’d like to estimate. This requires us to estimate the parameters $\theta$ such that they maximize the likelihood function, i.e. $\hat{\theta} = \arg\max_{\theta} L(\theta)$. The estimator $\hat{\theta}$ is therefore the max likelihood estimator of the parameters, the value that maximizes the likelihood function.

The procedure followed in maximizing the likelihood function to compute $\hat{\theta}$ is called max likelihood estimation (MLE). MLE answers the following question: given that we’ve observed data drawn from some distribution with parameters $\theta$, to what value should we set these parameters in the sampling distribution so that the probability of observing this data is the maximum?

The mechanics of finding $\hat{\theta}$ is an optimization problem involving a little bit of differential calculus. It involves calculating the partial derivative of the likelihood function with respect to each of the parameters in $\theta$ and setting them to zero. This yields a system of equations which can then be solved to obtain estimates of the parameters that maximize the function.

Calculating the partial derivatives like this becomes unwieldy very quickly since the likelihood function involves the multiplication of a lot of terms. We can instead maximize the log-likelihood, since taking the logarithm converts the product into a summation and, because the logarithm is monotonic, both are maximized at the same $\hat{\theta}$. It is defined as $\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta)$.
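Here’s a minimal sketch of MLE done numerically, assuming a normal model and simulated data; it minimizes the negative log-likelihood with scipy.optimize rather than solving the equations by hand:

import numpy as np
from scipy import optimize, stats

# Hypothetical sample assumed to be drawn from a normal distribution.
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=500)

def negative_log_likelihood(params, data):
    mu, sigma = params
    # Sum of log densities; negated because we minimize.
    return -np.sum(stats.norm.logpdf(data, loc=mu, scale=sigma))

result = optimize.minimize(negative_log_likelihood, x0=[0.0, 1.0],
                           args=(data,), bounds=[(None, None), (1e-6, None)])
print(result.x)  # estimates of mu and sigma, close to 10 and 2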

This provides us with enough background to apply MLE to find our regression parameters.

MLE for Regression

In regression, we have our data $Y_i$ which we assume is normally distributed with mean $\beta_1 + \beta_2 X_i$ and variance $\sigma^2$. The likelihood function for this can be written as:

$L(\beta_1, \beta_2, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \right\}$

In the equation above, the mean of $Y_i$ is $\beta_1 + \beta_2 X_i$ and the variance is $\sigma^2$ because we assume $u_i$ to be normally distributed. As mentioned previously, computing the log-likelihood is easier. The log-likelihood is given as:

$\ln L = -\frac{n}{2} \ln \sigma^2 - \frac{n}{2} \ln(2\pi) - \frac{1}{2} \sum_{i=1}^{n} \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2}$

To find the ML estimators, we need to differentiate the log-likelihood function partially with respect to $\beta_1$, $\beta_2$, and $\sigma^2$ and set the resulting equations to zero. This gives us:

$\frac{\partial \ln L}{\partial \beta_1} = \frac{1}{\sigma^2} \sum (Y_i - \beta_1 - \beta_2 X_i)$

$\frac{\partial \ln L}{\partial \beta_2} = \frac{1}{\sigma^2} \sum (Y_i - \beta_1 - \beta_2 X_i) X_i$

$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum (Y_i - \beta_1 - \beta_2 X_i)^2$

Setting the above equations to zero and letting $\tilde{\beta}_1$, $\tilde{\beta}_2$, and $\tilde{\sigma}^2$ represent the ML estimators, we get:

$\sum (Y_i - \tilde{\beta}_1 - \tilde{\beta}_2 X_i) = 0$

$\sum (Y_i - \tilde{\beta}_1 - \tilde{\beta}_2 X_i) X_i = 0$

$\tilde{\sigma}^2 = \frac{1}{n} \sum (Y_i - \tilde{\beta}_1 - \tilde{\beta}_2 X_i)^2$

Simplifying the first two equations above, we get:

$\sum Y_i = n \tilde{\beta}_1 + \tilde{\beta}_2 \sum X_i$

$\sum Y_i X_i = \tilde{\beta}_1 \sum X_i + \tilde{\beta}_2 \sum X_i^2$

The equations above are the normal equations for the OLS estimators. What this means is that the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ are the same as the ML estimators $\tilde{\beta}_1$ and $\tilde{\beta}_2$ when we assume the disturbances are normally distributed. The estimator for the homoscedastic variance under ML is given by $\tilde{\sigma}^2 = \frac{1}{n} \sum \hat{u}_i^2$. This estimator underestimates the true $\sigma^2$, i.e. it is biased downwards. However, in large samples, $\tilde{\sigma}^2$ converges to the true value, which means that it is asymptotically unbiased.
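A small simulation, under assumed parameter values of my own choosing, illustrates the downward bias: the ML estimator of the variance averages below the true value, while the estimator with the $n - 2$ denominator does not:

import numpy as np

rng = np.random.default_rng(1)

n = 20
beta1, beta2, sigma = 2.0, 0.5, 2.0  # true sigma^2 is 4.0
X = np.linspace(0, 10, n)

ml_estimates, unbiased_estimates = [], []
for _ in range(5000):
    Y = beta1 + beta2 * X + rng.normal(0, sigma, size=n)
    # OLS / ML point estimates of the slope and intercept coincide.
    b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b1 = Y.mean() - b2 * X.mean()
    resid = Y - (b1 + b2 * X)
    ml_estimates.append(np.sum(resid ** 2) / n)              # ML estimator of sigma^2
    unbiased_estimates.append(np.sum(resid ** 2) / (n - 2))  # unbiased estimator

print(np.mean(ml_estimates), np.mean(unbiased_estimates))  # the ML average sits below 4.0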

Summary

In summary, MLE is an alternative to OLS. The method, however, requires that we make an explicit assumption about the probability distribution of the data. Under the normality assumption, the estimators for $\beta_1$ and $\beta_2$ are the same in both MLE and OLS. The ML estimator for $\sigma^2$ is biased downwards but converges to the true value as the sample size increases.

Classic Normal Linear Regression Model

The posts so far have covered estimators. An estimator looks at the sample data and comes up with an estimate for some true population parameter. However, how do we know how close, say, $\hat{\beta}_2$ is to the true $\beta_2$? This is a question of statistical inference. To be able to make an inference, $\hat{\beta}_2$ would need to follow some distribution. CLRM makes no assumptions about the probability distribution of the data. CNLRM (Classic Normal Linear Regression Model), however, adds the assumption of normality, i.e. the disturbances - and hence the data and the estimators - are normally distributed.

In this post we will cover the normal distribution, CNLRM, and how the assumption of normality helps us.

Normal Distribution

A probability distribution is a function which provides the probabilities of the outcomes of a random phenomenon. For example, a random variable $X$ could describe the outcomes of rolling a fair die. The probability distribution would then be a function that returns $\frac{1}{6}$ for each of the six outcomes of $X$.

The normal distribution is the most commonly encountered probability distribution. The central limit theorem states that averages (or other statistics like the sum) calculated from samples of observations drawn from a random variable tend to converge to the normal distribution as the sample size grows. The distribution is defined by two parameters - the mean ($\mu$) and the variance ($\sigma^2$). If a random variable $X$ is normally distributed, it is written notationally as $X \sim N(\mu, \sigma^2)$. The probability density function of the normal distribution is given by:

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
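As a quick check with scipy, the density can be evaluated directly, and roughly 95% of the area does lie within two standard deviations of the mean:

from scipy import stats

# Density of a standard normal (mean 0, variance 1) evaluated at the mean.
print(stats.norm.pdf(0.0, loc=0, scale=1))

# Area within two standard deviations of the mean, roughly 0.954.
print(stats.norm.cdf(2) - stats.norm.cdf(-2))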

Distribution of $u_i$, $\hat{\beta}_1$, and $\hat{\beta}_2$

For us to be able to draw any inference about the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$, we will have to make some assumption about the probability distribution of $u_i$. In one of the previous posts we saw that $\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}$. This can alternatively be written as $\hat{\beta}_2 = \sum k_i Y_i$ where $k_i = \frac{x_i}{\sum x_i^2}$. So we have

$\hat{\beta}_2 = \sum k_i Y_i = \sum k_i (\beta_1 + \beta_2 X_i + u_i)$

The $X_i$, the betas, and $k_i$ are fixed and therefore $\hat{\beta}_2$ is a linear function of $u_i$. Under CNLRM, we assume $u_i$ to be normally distributed. A property of normally distributed random variables is that their linear functions are also normally distributed. What this means is that since $\hat{\beta}_1$ and $\hat{\beta}_2$ are linear functions of $u_i$, they are also normally distributed.
As mentioned previously, a normal distribution is described by its mean and variance. This means $u_i$ can also be described by its mean and variance as $u_i \sim N(0, \sigma^2)$. What this shows us is that we assume $u_i$, on average, to be zero and to have a constant variance $\sigma^2$ (homoscedasticity).

A further assumption we make is that $\text{cov}(u_i, u_j) = 0$ where $i \neq j$. This means that $u_i$ and $u_j$ are uncorrelated and independently distributed, because zero correlation between two normally distributed variables implies statistical independence.

Also note that since $u_i$ is normally distributed, so is $Y_i$. We can state this as $Y_i \sim N(\beta_1 + \beta_2 X_i, \sigma^2)$.
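A short simulation, with parameter values chosen arbitrarily, shows the sampling distribution of the slope estimator centring on the true value when the disturbances are normal:

import numpy as np

rng = np.random.default_rng(7)

n = 50
beta1, beta2, sigma = 1.0, 3.0, 2.0
X = np.linspace(0, 10, n)  # X is fixed in repeated sampling

slopes = []
for _ in range(10000):
    u = rng.normal(0, sigma, size=n)   # normally distributed disturbances
    Y = beta1 + beta2 * X + u          # Y is a linear function of u, hence normal
    slopes.append(np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2))

slopes = np.array(slopes)
print(slopes.mean(), slopes.std())  # centred on the true beta2 = 3.0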

Why do we assume normality?

We assume normality for the following reasons:

First, the $u_i$ subsume the effect of a large number of factors that affect the dependent variable but are not included in the regression model. By the central limit theorem, the distribution of the sum of a large number of such factors tends to be normal.

Second, because we assume $u_i$ to be normal, both $\hat{\beta}_1$ and $\hat{\beta}_2$ are normal since they are linear functions of $u_i$.

Third, the normal distribution is fairly easy to understand as it involves just two parameters, and is extensively studied.

Properties of estimators under CNLRM

  1. They are unbiased.
  2. They have minimum variance.

Combining 1. and 2. means that the estimators are efficient - they are minimum-variance unbiased estimators.

  3. They are consistent. This means that as the size of the sample increases, the values of the estimators tend to converge to the true population value.

  4. The estimators have the least variance in the class of all estimators - linear or otherwise. This makes the estimators best unbiased estimators (BUE) as opposed to best linear unbiased estimators (BLUE).

Misc

Earlier in the post I mentioned that $\hat{\beta}_2 = \sum k_i Y_i$. This section gives the derivation of this expression. We will look at the numerator $\sum x_i y_i$ for ease of calculation:

$\sum x_i y_i = \sum x_i (Y_i - \bar{Y}) = \sum x_i Y_i - \bar{Y} \sum x_i = \sum x_i Y_i$

since $\sum x_i = 0$. Dividing by $\sum x_i^2$ then gives $\hat{\beta}_2 = \sum \frac{x_i}{\sum x_i^2} Y_i = \sum k_i Y_i$.

Finito.

Gauss-Markov Theorem

In one of the previous posts we looked at the assumptions underlying the classic linear regression model. In this post we’ll take a look at the Gauss-Markov theorem, which uses these assumptions to state that the OLS estimators have the minimum variance in the class of linear unbiased estimators.

Gauss-Markov Theorem

To recap briefly, some of the assumptions we made were that:

  1. On average, the stochastic error term is zero, i.e. $E(u_i \mid X_i) = 0$.
  2. The data is homoscedastic, i.e. the variance / standard deviation of the stochastic error term is a finite constant regardless of $X_i$.
  3. The model is linear in parameters, i.e. $\beta_1$ and $\beta_2$ have power 1.
  4. There is no autocorrelation between the disturbance terms.

The Gauss-Markov theorem states the following:

Given the assumptions underlying the CLRM, the OLS estimators, in the class of linear unbiased estimators, have minimum variance. That is, they are BLUE (Best Linear Unbiased Estimators).

We’ll go over what it means for an estimator to be best, linear, and unbiased.

Linear

The dependent variable $Y$ is a linear function of the parameters in the model; the model must be linear in parameters. Under this assumption, $Y_i = \beta_1 + \beta_2 X_i + u_i$ and $Y_i = \beta_1 + \beta_2 X_i^2 + u_i$ are both valid models since the parameters enter linearly in both of them.

Unbiased

This means that on average, the value of the estimator is the same as the parameter it is estimating, i.e. $E(\hat{\beta}_2) = \beta_2$.

Best

This means that among the class of linear unbiased estimators, the OLS estimators have the least variance. Consider the two normal distributions shown in the graph above for two linear, unbiased estimators - the OLS estimator $\hat{\beta}_2$ (represented by orange) and an alternative estimator $\beta_2^*$ (represented by blue). Also let’s assume that the true value of $\beta_2$ is zero. Both of these are unbiased since, on average, their value is the same as the true value of the parameter. However, the OLS estimator is better since it has less variance compared to the other estimator.
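As a rough illustration (my own sketch, with made-up parameter values), we can compare the OLS slope with another linear unbiased estimator - the slope through the first and last points - and see that both are unbiased but OLS has the smaller variance:

import numpy as np

rng = np.random.default_rng(3)

n = 30
beta1, beta2, sigma = 5.0, 2.0, 3.0
X = np.linspace(1, 10, n)

ols_slopes, alt_slopes = [], []
for _ in range(10000):
    Y = beta1 + beta2 * X + rng.normal(0, sigma, size=n)
    # OLS slope estimator.
    ols_slopes.append(np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2))
    # An alternative linear unbiased estimator: slope through the first and last points.
    alt_slopes.append((Y[-1] - Y[0]) / (X[-1] - X[0]))

print(np.mean(ols_slopes), np.var(ols_slopes))  # unbiased, small variance
print(np.mean(alt_slopes), np.var(alt_slopes))  # unbiased, but larger variance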

Finito.

Precision of OLS Estimates

The calculation of the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ is based on sample data. As the sample drawn changes, the value of these estimators also changes. This leaves us with the question of how reliable these estimates are, i.e. we’d like to determine the precision of these estimators. This can be determined by calculating the standard error or the coefficient of determination.

Standard Error

The standard error of a statistic[1] is the standard deviation of the sampling distribution of that statistic.

Suppose we have a dataset which contains incomes of people. From this dataset we start drawing samples of size n and calculating the mean. Now if we plot the distribution (the sampling distribution) of the means we calculated, we’ll get a normal distribution centered on the population mean. The standard deviation of this sampling distribution is the standard error of the statistic which in our case is mean.

Standard error is given by the formula:

$\text{se} = \frac{\sigma}{\sqrt{n}}$

where
$\sigma$ is the standard deviation of the population.
$n$ is the sample size.

Calculating the standard error shows the sampling fluctuation. Sampling fluctuation shows the extent to which a statistic takes on different values.

Notice that the standard error has an inverse relation with the sample size n. This means that the larger the sample we draw from the population, the smaller the standard error will be. This will also result in a tighter normal distribution since the standard deviation will be smaller.
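A tiny sketch, assuming a hypothetical population standard deviation, shows the standard error shrinking as the sample size grows:

import numpy as np

sigma = 15.0  # hypothetical population standard deviation of incomes (in thousands)

for n in [25, 100, 400, 1600]:
    # The standard error of the sample mean shrinks as the sample size grows.
    print(n, sigma / np.sqrt(n))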

Standard Error of OLS Estimates

The standard error of the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ is given by:

$\text{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_i^2} \qquad \text{se}(\hat{\beta}_2) = \frac{\sigma}{\sqrt{\sum x_i^2}}$

$\text{var}(\hat{\beta}_1) = \frac{\sum X_i^2}{n \sum x_i^2} \sigma^2 \qquad \text{se}(\hat{\beta}_1) = \sqrt{\frac{\sum X_i^2}{n \sum x_i^2}} \, \sigma$

where $\sigma$ is the square root of the true but unknown constant of homoscedastic variance $\sigma^2$.

All of the terms in the equations above except $\sigma^2$ can be calculated from the sample drawn. Therefore, we will need an unbiased estimator $\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2}$. The denominator $n - 2$ represents the degrees of freedom, and $\sum \hat{u}_i^2$ is the residual sum of squares.

Although $\hat{u}_i = Y_i - \hat{Y}_i$, its sum of squares can be computed with the alternative formula $\sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2^2 \sum x_i^2$, where $y_i$ and $x_i$ are in deviation form.

All of the terms in the above formula would have already been computed as a part of computing the estimators.
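As a sanity check, here’s a small sketch with simulated data verifying that the alternative formula gives the same residual sum of squares as summing the squared residuals directly:

import numpy as np

rng = np.random.default_rng(5)

# Hypothetical sample to check sum(ucap^2) == sum(y^2) - beta2cap^2 * sum(x^2).
X = np.linspace(1, 10, 20)
Y = 3.0 + 1.5 * X + rng.normal(0, 2.0, size=20)

x, y = X - X.mean(), Y - Y.mean()          # deviation form
beta2cap = np.sum(x * y) / np.sum(x ** 2)
beta1cap = Y.mean() - beta2cap * X.mean()
ucap = Y - (beta1cap + beta2cap * X)       # residuals

print(np.sum(ucap ** 2))
print(np.sum(y ** 2) - beta2cap ** 2 * np.sum(x ** 2))  # same value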

How it all ties together

As you calculate the estimators and draw the sample regression curve that passes through your data, you need some numeric measure of how well the curve fits the data, i.e. a measure of “goodness of fit”. This can be given as:

$\hat{\sigma} = \sqrt{\frac{\sum \hat{u}_i^2}{n - 2}}$

This is the positive square root of the estimator of homoscedastic variance, and is the standard deviation of the $Y$ values about the regression curve.

The standard errors of the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ will show you how much they fluctuate as you draw different samples. The smaller their standard error, the better.

The square of the standard error is called the mean squared error.

Let’s see some code

To start off, let’s load the Boston housing dataset.

import pandas as pd
from sklearn import datasets

# Note: load_boston was removed in scikit-learn 1.2; this snippet needs an older version.
boston = datasets.load_boston()
data, target = boston.data, boston.target

df = pd.DataFrame(data=data, columns=boston.feature_names)
df = df[['RM']]
df['RM'] = df['RM'].apply(round)
df['price'] = target

I’m choosing to use the average number of rooms (RM) as the variable to see how it affects price. I’ve rounded it off to make the calculations easier. The scatter plot of the data looks like the following:

This, of course, is the entire population data, so let’s draw a sample.

sample = df.sample(n=100)

Now let’s bring back our function that calculates the OLS estimates:

def ols(sample):
    # Means of the explanatory and dependent variables.
    Xbar = sample['X'].mean()
    Ybar = sample['Y'].mean()

    # Deviation-form columns used in the OLS formulas.
    sample['x'] = sample['X'] - Xbar
    sample['x_sq'] = sample['x'] ** 2
    sample['y'] = sample['Y'] - Ybar
    sample['xy'] = sample['x'] * sample['y']

    # Slope and intercept estimates.
    beta2cap = sample['xy'].sum() / sample['x_sq'].sum()
    beta1cap = Ybar - (beta2cap * Xbar)

    # Fitted values and residuals.
    sample['Ycap'] = beta1cap + beta2cap * sample['X']
    sample['ucap'] = sample['Y'] - sample['Ycap']

    return beta1cap, beta2cap

and now let’s see the coefficients

sample.rename(columns={'RM': 'X', 'price': 'Y'}, inplace=True)
intercept, slope = ols(sample)

The plot of the regression curve on the sample data looks like the following:

The sample DataFrame now has our intermediate calculations, and we can use them to calculate the standard error. Let’s write a function which does that.

from math import sqrt

def standard_error(sample):
    return sqrt((sample['ucap'] ** 2).sum() / (len(sample) - 2))

Finally, let’s see how much standard deviation we have around the regression line.

standard_error(sample)
7.401174774558201

Coefficient of Determination

The coefficient of determination, denoted by $r^2$ (for bivariate regression) or $R^2$ (for multivariate regression), is the ratio of the explained variance to the total variance of the data. The higher the coefficient of determination, the more accurately the regressors (average number of rooms in the house) explain the regressand (the price of the house), i.e. the better your regression model is.

For bivariate linear regression, the $r^2$ value is given by:

$r^2 = \frac{\text{ESS}}{\text{TSS}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$

$r^2$ can take on a value between 0 and 1.

Turning this into code, we have:

def coeff_of_determination(sample):
    Ybar = sample['Y'].mean()
    return ((sample['Ycap'] - Ybar) ** 2).sum() / ((sample['Y'] - Ybar) ** 2).sum()

For the sample we’ve drawn, the $r^2$ value comes out to be 0.327. This means that only 32.7% of the total variance in the data is explained by the regression model.

Finito.


[1] A “statistic” is a numerical quantity such as mean, or median.

Assumptions Underlying Ordinary Least Squares

In the previous post we calculated the values of $\hat{\beta}_1$ and $\hat{\beta}_2$ from a given sample, and in closing I mentioned that we’ll take a look at finding out whether the sample regression line we obtained is a good fit. In other words, we need to assess how close $\hat{\beta}_1$ and $\hat{\beta}_2$ are to $\beta_1$ and $\beta_2$, or how close $\hat{Y}_i$ is to $E(Y \mid X_i)$. Recall that the population regression function (PRF) is given by $Y_i = \beta_1 + \beta_2 X_i + u_i$. This shows that the value of $Y_i$ depends on $X_i$ and $u_i$. To be able to make any statistical inference about $Y_i$ (and $\hat{\beta}_1$ and $\hat{\beta}_2$) we need to make assumptions about how the values of $X_i$ and $u_i$ were generated. We’ll discuss these assumptions in this post as they’re important for the valid interpretation of the regression estimates.

Classic Linear Regression Model (CLRM)

There are 7 assumptions that the Gaussian, or classic, linear regression model (CLRM) makes. These are:

1. The regression model is linear in parameters

Recall that the population regression function is given by $Y_i = \beta_1 + \beta_2 X_i + u_i$. This is linear in parameters since $\beta_1$ and $\beta_2$ have power 1. The explanatory variables, however, do not have to be linear.

2. Values of X are fixed in repeated sampling

In this series of posts, we’ll be looking at fixed regressor models[1] where the values of X are considered to be fixed when drawing a random sample. For example, in the previous post the values of X were fixed between 80 and 260. If another sample were to be drawn, the same values of X would be used.

3. Zero mean value of the error term

This assumption states that for every given $X_i$, the error term $u_i$ is zero, on average. This should be intuitive. The regression line passes through the conditional mean $E(Y \mid X_i)$ for every $X_i$. The error terms are the distances of the individual points from the conditional mean. Some of them lie above the regression line and some of them lie below. However, they cancel each other out; the positive cancel out the negative, and therefore $u_i$ is zero, on average. Notationally, $E(u_i \mid X_i) = 0$.

What this assumption means is that all those variables that were not considered to be a part of the regression model do not systematically affect $Y_i$, i.e. their effect on $Y_i$ is zero on average.

This assumption also means that there is no specification error or specification bias. This happens when we choose the wrong functional form for the regression model, exclude necessary explanatory variables, or include unnecessary ones.

4. Constant variance of $u_i$ (homoscedasticity)

This assumption states that the variation of $u_i$ is the same regardless of $X$.

With this assumption, we assume that the variance of $u_i$ for a given $X_i$ (i.e. the conditional variance) is a positive constant value given by $\sigma^2$. This means that the variance for the various $Y$ populations is the same. What this also means is that the variance around the regression line is the same and it neither increases nor decreases with changes in the values of $X$.

5. No autocorrelation between disturbances

Given any two X values, $X_i$ and $X_j$ ($i \neq j$), the correlation between the corresponding $u_i$ and $u_j$ is zero. In other words, the observations are independently sampled. Notationally, $\text{cov}(u_i, u_j \mid X_i, X_j) = 0$. This assumption states that there is no serial correlation or autocorrelation, i.e. $u_i$ and $u_j$ are uncorrelated.

To build an intuition, let’s consider the population regression function

$Y_t = \beta_1 + \beta_2 X_t + u_t$

Now suppose that $u_t$ and $u_{t-1}$ are positively correlated. This means that $Y_t$ depends not only on $X_t$ but also on $u_{t-1}$, because $u_{t-1}$ affects $u_t$. Autocorrelation occurs in timeseries data like stock market trends, where the observation of one day depends on the observation of the previous day. What we assume is that each observation is independent, as it is in the case of the income-expense example we saw.
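As a rough illustration (my own sketch, not part of the original example), we can simulate disturbances where each value depends on the previous one and see that the correlation between successive disturbances is far from zero:

import numpy as np

rng = np.random.default_rng(11)

# Simulate AR(1)-style disturbances: each u_t depends on u_{t-1}, as in timeseries data.
n, rho = 500, 0.8
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.normal(0, 1)

# Correlation between u_t and u_{t-1} is well away from zero, violating the assumption.
print(np.corrcoef(u[1:], u[:-1])[0, 1])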

6. The sample size must be greater than the number of parameters to estimate

This is fairly simple to understand. To be able to estimate $\beta_1$ and $\beta_2$, we need at least 2 pairs of observations.

7. Nature of X variables

There are a few things we assume about the X values. One, for a given sample they are not all the same, i.e. $\text{var}(X)$ is a positive number. Two, there are no outliers among the values.

If we have all the X values the same in a given sample, the regression line would be a horizontal line and it would therefore be impossible to estimate $\hat{\beta}_2$ and thus $\hat{\beta}_1$. This happens because in the equation $\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}$, the denominator becomes zero since $X_i$ and $\bar{X}$ are the same.
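A tiny sketch makes this concrete - with identical X values the denominator of the slope estimator is zero:

import numpy as np

X = np.array([5.0, 5.0, 5.0, 5.0])  # all X values identical
x = X - X.mean()
print(np.sum(x ** 2))  # 0.0 -- the denominator of the slope estimator, so it is undefined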

Furthermore, we assume that there are no outliers in the X values. If there are a few values which lie far from the mean, the regression line we get can be vastly different depending on whether the sample drawn contains these outliers or not.

These are the 7 assumptions underlying OLS. In the next section, I’d like to elucidate on assumptions 3 and 5 a little more.

More on Assumptions 3 and 5

In assumption 3 we state that $E(u_i \mid X_i) = 0$, i.e. on average the disturbance term is zero. For the expected value of a variable given another variable to be zero means the two variables are uncorrelated. We assume these two variables to be uncorrelated so that we can assess the impact each has on the predicted value of $Y_i$. If the disturbance $u_i$ and the explanatory variable $X_i$ were correlated positively or negatively, it would mean that $u_i$ contains some factor that affects the predicted value of $Y_i$, i.e. we’ve skipped an important explanatory variable that should’ve been included in the regression model. This means we have functionally misspecified our regression model and introduced specification bias.

Functionally misspecifying our regression model will also introduce autocorrelation in our model. Consider the two plots shown below.

In the first plot the line fits the sample points properly. In the second one, it doesn’t. The sample points seem to form a curve of some sort while we try to fit a straight line through them. What we get is runs of negative residuals followed by runs of positive residuals, which indicates autocorrelation.

Autocorrelation can be introduced not just by functionally misspecifying the regression model but also by the nature of the data itself. For example, consider timeseries data representing stock market movement. The value of $u_t$ depends on $u_{t-1}$, i.e. the two are correlated.

When there is autocorrelation, the OLS estimators are no longer the best estimators for predicting the value of $Y_i$.

Conclusion

In this post we looked at the assumptions underlying OLS. In the coming post we’ll look at the Gauss-Markov theorem and the BLUE property of OLS estimators.


[1] There are also stochastic regression models where the values of X are not fixed.