Skip to main content

The Problem: Everything Measures from Its Own Perspective

Imagine your robot has an RGB-D camera—a camera that captures both color images and depth (distance to each pixel). These are common in robotics: Intel RealSense, Microsoft Kinect, and similar sensors. The camera spots a coffee mug at pixel (320, 240), and the depth sensor says it’s 1.2 meters away. You want the robot arm to pick it up—but the arm doesn’t understand pixels or camera-relative distances. It needs coordinates in its own workspace: “move to position (0.8, 0.3, 0.1) meters from my base.” To convert camera measurements to arm coordinates, you need to know:
  • The camera’s intrinsic parameters (focal length, sensor size) to convert pixels to a 3D direction
  • The depth value to get the full 3D position relative to the camera
  • Where the camera is mounted relative to the arm, and at what angle
This chain of conversions—(pixels + depth) → 3D point in camera frame → robot coordinates—is what transforms handle. output Each arrow in this tree is a transform. To get the mug’s position in gripper coordinates, you chain transforms through their common parent: camera → robot_base → arm → gripper.

What’s a Coordinate Frame?

A coordinate frame is simply a point of view—an origin point and a set of axes (X, Y, Z) from which you measure positions and orientations. Think of it like giving directions:
  • GPS says you’re at 37.7749° N, 122.4194° W
  • The coffee shop floor plan says “table 5 is 3 meters from the entrance”
  • Your friend says “I’m two tables to your left”
These all describe positions in the same physical space, but from different reference points. Each is a coordinate frame. In a robot:
  • The camera measures in pixels, or in meters relative to its lens
  • The LIDAR measures distances from its own mounting point
  • The robot arm thinks in terms of its base or end-effector position
  • The world has a fixed coordinate system everything lives in
Each sensor, joint, and reference point has its own frame.

The Transform Class

The Transform class at geometry_msgs/Transform.py represents a spatial transformation with:
  • frame_id - The parent frame name
  • child_frame_id - The child frame name
  • translation - A Vector3 (x, y, z) offset
  • rotation - A Quaternion (x, y, z, w) orientation
  • ts - Timestamp for temporal lookups
from dimos.msgs.geometry_msgs.Quaternion import Quaternion
from dimos.msgs.geometry_msgs.Transform import Transform
from dimos.msgs.geometry_msgs.Vector3 import Vector3

# Camera 0.5m forward and 0.3m up from base, no rotation
camera_transform = Transform(
    translation=Vector3(0.5, 0.0, 0.3),
    rotation=Quaternion(0.0, 0.0, 0.0, 1.0),  # Identity rotation
    frame_id="base_link",
    child_frame_id="camera_link",
)
print(camera_transform)
base_link -> camera_link
  Translation: → Vector Vector([0.5 0.  0.3])
  Rotation: Quaternion(0.000000, 0.000000, 0.000000, 1.000000)

Transform Operations

Transforms can be composed and inverted:
from dimos.msgs.geometry_msgs.Quaternion import Quaternion
from dimos.msgs.geometry_msgs.Transform import Transform
from dimos.msgs.geometry_msgs.Vector3 import Vector3

# Create two transforms
t1 = Transform(
    translation=Vector3(1.0, 0.0, 0.0),
    rotation=Quaternion(0.0, 0.0, 0.0, 1.0),
    frame_id="base_link",
    child_frame_id="camera_link",
)
t2 = Transform(
    translation=Vector3(0.0, 0.5, 0.0),
    rotation=Quaternion(0.0, 0.0, 0.0, 1.0),
    frame_id="camera_link",
    child_frame_id="end_effector",
)

# Compose: base_link -> camera -> end_effector
t3 = t1 + t2
print(f"Composed: {t3.frame_id} -> {t3.child_frame_id}")
print(f"Translation: ({t3.translation.x}, {t3.translation.y}, {t3.translation.z})")

# Inverse: if t goes A -> B, -t goes B -> A
t_inverse = -t1
print(f"Inverse: {t_inverse.frame_id} -> {t_inverse.child_frame_id}")
Composed: base_link -> end_effector
Translation: (1.0, 0.5, 0.0)
Inverse: camera_link -> base_link

Converting to Matrix Form

For integration with libraries like NumPy or OpenCV:
from dimos.msgs.geometry_msgs.Quaternion import Quaternion
from dimos.msgs.geometry_msgs.Transform import Transform
from dimos.msgs.geometry_msgs.Vector3 import Vector3

t = Transform(
    translation=Vector3(1.0, 2.0, 3.0),
    rotation=Quaternion(0.0, 0.0, 0.0, 1.0),
)
matrix = t.to_matrix()
print("4x4 transformation matrix:")
print(matrix)
4x4 transformation matrix:
[[1. 0. 0. 1.]
 [0. 1. 0. 2.]
 [0. 0. 1. 3.]
 [0. 0. 0. 1.]]

Frame IDs in Modules

Modules in DimOS automatically get a frame_id property. This is controlled by two config options in core/module.py:
  • frame_id - The base frame name (defaults to the class name)
  • frame_id_prefix - Optional prefix for namespacing
from dimos.core.module import Module, ModuleConfig

class MyModuleConfig(ModuleConfig):
    frame_id: str = "sensor_link"
    frame_id_prefix: str | None = None

class MySensorModule(Module):
    config: MyModuleConfig

# With default config:
sensor = MySensorModule()
print(f"Default frame_id: {sensor.frame_id}")

# With prefix (useful for multi-robot scenarios):
sensor2 = MySensorModule(frame_id_prefix="robot1")
print(f"With prefix: {sensor2.frame_id}")
Default frame_id: sensor_link
With prefix: robot1/sensor_link

The TF Service

Every module has access to self.tf, a transform service that:
  • Publishes transforms to the system
  • Looks up transforms between any two frames
  • Buffers historical transforms for temporal queries
The TF service is implemented in tf.py and is lazily initialized on first access.

Multi-Module Transform Example

This example demonstrates how multiple modules publish and receive transforms. Three modules work together:
  1. RobotBaseModule - Publishes world -> base_link (robot’s position in the world)
  2. CameraModule - Publishes base_link -> camera_link (camera mounting position) and camera_link -> camera_optical (optical frame convention)
  3. PerceptionModule - Looks up transforms between any frames
skip ansi=false
import time
import reactivex as rx
from reactivex import operators as ops
from dimos.core.core import rpc
from dimos.core.module import Module
from dimos.msgs.geometry_msgs.Quaternion import Quaternion
from dimos.msgs.geometry_msgs.Transform import Transform
from dimos.msgs.geometry_msgs.Vector3 import Vector3
from dimos.core.coordination.module_coordinator import ModuleCoordinator

class RobotBaseModule(Module):
    """Publishes the robot's position in the world frame at 10Hz."""

    @rpc
    def start(self) -> None:
        super().start()

        def publish_pose(_):
            robot_pose = Transform(
                translation=Vector3(2.5, 3.0, 0.0),
                rotation=Quaternion(0.0, 0.0, 0.0, 1.0),
                frame_id="world",
                child_frame_id="base_link",
                ts=time.time(),
            )
            self.tf.publish(robot_pose)

        self.register_disposable(
            rx.interval(0.1).subscribe(publish_pose)
        )

class CameraModule(Module):
    """Publishes camera transforms at 10Hz."""
    @rpc
    def start(self) -> None:
        super().start()

        def publish_transforms(_):
            camera_mount = Transform(
                translation=Vector3(1.0, 0.0, 0.3),
                rotation=Quaternion(0.0, 0.0, 0.0, 1.0),
                frame_id="base_link",
                child_frame_id="camera_link",
                ts=time.time(),
            )
            optical_frame = Transform(
                translation=Vector3(0.0, 0.0, 0.0),
                rotation=Quaternion(-0.5, 0.5, -0.5, 0.5),
                frame_id="camera_link",
                child_frame_id="camera_optical",
                ts=time.time(),
            )
            self.tf.publish(camera_mount, optical_frame)

        self.register_disposable(
            rx.interval(0.1).subscribe(publish_transforms)
        )

class PerceptionModule(Module):
    """Receives transforms and performs lookups."""

    @rpc
    def start(self) -> None:
        super().start()
        # This is just to init the transforms system.
        # Touching the property for the first time enables the system for this module.
        # Transform lookups normally happen in fast loops in IRL modules.
        _ = self.tf

    @rpc
    def lookup(self) -> None:

        # Will pretty-print information on transforms in the buffer
        print(self.tf)

        direct = self.tf.get("world", "base_link")
        print(f"Direct: robot is at ({direct.translation.x}, {direct.translation.y})m in world\n")

        # Chained lookup - automatically composes world -> base -> camera -> optical
        chained = self.tf.get("world", "camera_optical")
        print(f"Chained: {chained}\n")

        # Inverse lookup - automatically inverts direction
        inverse = self.tf.get("camera_optical", "world")
        print(f"Inverse: {inverse}\n")

        print("Transform tree:")
        print(self.tf.graph())

if __name__ == "__main__":
    dimos = ModuleCoordinator()
    dimos.start()

    robot = dimos.deploy(RobotBaseModule)
    camera = dimos.deploy(CameraModule)
    perception = dimos.deploy(PerceptionModule)

    dimos.start_all_modules()

    # Give worker TF publishers a moment to populate the buffer before querying.
    time.sleep(2.5)

    perception.lookup()

    dimos.stop()

16:21:45.203 [inf][ation/worker_manager_python.py] Worker pool started. n_workers=2
16:21:45.445 [inf][/coordination/python_worker.py] Deployed module. module=RobotBaseModule module_id=0 worker_id=0
16:21:45.451 [inf][/coordination/python_worker.py] Deployed module. module=CameraModule module_id=1 worker_id=1
16:21:45.452 [inf][/coordination/python_worker.py] Deployed module. module=PerceptionModule module_id=2 worker_id=0
16:21:47.968 [inf][dination/module_coordinator.py] Stopping module... module=PerceptionModule
16:21:48.022 [inf][dination/module_coordinator.py] Module stopped. module=PerceptionModule
16:21:48.022 [inf][dination/module_coordinator.py] Stopping module... module=CameraModule
16:21:48.041 [inf][dination/module_coordinator.py] Module stopped. module=CameraModule
16:21:48.041 [inf][dination/module_coordinator.py] Stopping module... module=RobotBaseModule
16:21:48.062 [inf][dination/module_coordinator.py] Module stopped. module=RobotBaseModule
16:21:48.062 [inf][ation/worker_manager_python.py] Shutting down all workers...
16:21:48.062 [inf][/coordination/python_worker.py] Worker stopping module... module=CameraModule module_id=1 worker_id=1
16:21:48.063 [inf][/coordination/python_worker.py] Worker module stopped. module=CameraModule module_id=1 worker_id=1
LCMTF(3 buffers):
  TBuffer(base_link -> camera_link, 24 msgs, 2.37s [2026-04-21 01:21:45 - 2026-04-21 01:21:47])
  TBuffer(camera_link -> camera_optical, 24 msgs, 2.37s [2026-04-21 01:21:45 - 2026-04-21 01:21:47])
  TBuffer(world -> base_link, 24 msgs, 2.37s [2026-04-21 01:21:45 - 2026-04-21 01:21:47])
Direct: robot is at (2.5, 3.0)m in world

Chained: world -> camera_optical
  Translation: → Vector Vector([3.5 3.  0.3])
  Rotation: Quaternion(-0.500000, 0.500000, -0.500000, 0.500000)

Inverse: camera_optical -> world
  Translation: → Vector Vector([ 3.   0.3 -3.5])
  Rotation: Quaternion(0.500000, -0.500000, 0.500000, 0.500000)

Transform tree:
┌─────┐
│world│
└┬────┘
┌▽────────┐
│base_link│
└┬────────┘
┌▽──────────┐
│camera_link│
└┬──────────┘
┌▽─────────────┐
│camera_optical│
└──────────────┘
You can view these transforms in 3D using the Rerun viewer (see Visualization). transforms Key points:
  • Automatic broadcasting: self.tf.publish() broadcasts via LCM to all modules
  • Chained lookups: TF finds paths through the tree automatically
  • Inverse lookups: Request transforms in either direction
  • Temporal buffering: Transforms are timestamped and buffered (default 10s) for sensor fusion
The transform tree from the example above, showing which module publishes each transform: output

Internals

Transform Buffer

self.tf on module is a transform buffer. This is a standalone class that maintains a temporal buffer of transforms (default 10 seconds) allowing queries at past timestamps, you can use it directly:
import time

from dimos.msgs.geometry_msgs.Quaternion import Quaternion
from dimos.msgs.geometry_msgs.Transform import Transform
from dimos.msgs.geometry_msgs.Vector3 import Vector3
from dimos.protocol.tf.tf import MultiTBuffer

tf = MultiTBuffer()

# Simulate transforms at different times
for i in range(5):
    t = Transform(
        translation=Vector3(float(i), 0.0, 0.0),
        rotation=Quaternion(0.0, 0.0, 0.0, 1.0),
        frame_id="base_link",
        child_frame_id="camera_link",
        ts=time.time() + i * 0.1,
    )
    tf.receive_transform(t)

# Query the latest transform
result = tf.get("base_link", "camera_link")
print(f"Latest transform: x={result.translation.x}")
print(f"Buffer has {len(tf.buffers)} transform pair(s)")
print(tf)
Latest transform: x=4.0
Buffer has 1 transform pair(s)
MultiTBuffer(1 buffers):
  TBuffer(base_link -> camera_link, 5 msgs, 0.40s [2026-05-15 21:11:37 - 2026-05-15 21:11:37])
This is essential for sensor fusion where you need to know where the camera was when an image was captured, not where it is now.

Further Reading

For a visual introduction to transforms and coordinate frames: For the mathematical foundations, the ROS documentation provides detailed background: See also: