Chapter 1: What Is $dx$? — A Measuring Device That Eats Vectors, or a Row Vector

§1.0 The Mathematician’s One Dimension, the Physicist’s One Dimension

When we first learn the one-dimensional integral $\int f(x)\,dx$ in high school, the textbook places before us a single line: the $x$-axis. For a mathematics textbook, that is perfectly natural. A mathematician assumes a self-contained abstract world called one-dimensional space $\mathbb{R}^1$, and builds the logic there. A point is a real number $x$, a displacement is a real number $\Delta x$, and an integral is defined as the limit of “function times tiny width.” Everything is self-contained.

But here, we are physicists.

Note (I am not a physicist)

Of course, this is rhetoric. My degree is not in physics but in chemistry, and I am not currently in academia.

What I mean is that the viewpoint of this book may be useful not only to readers in physics, but also to readers in information science, mechanical engineering, electrical engineering, other areas of science and engineering, or simply anyone who has gotten lost in the notation of vector analysis or calculus.

“Physicist” here is not an institutional affiliation. It means a way of looking at things mathematically, as one often does in physical mathematics. I do not mean to put down mathematicians or mathematics books.

I still wrote it a little grandly because I simply wanted to say it.

The actual physical space we deal with is always three-dimensional. Even when a point mass appears to move along a straight line, it is not living in some “fictional one-dimensional space.” Rather, we are looking at a slice of three-dimensional space $\mathbb{R}^3$ in which displacement in the $y$ and $z$ directions happens not to be observed, or can be neglected.

For example, consider a cart sliding on a frictionless straight rail. We say, “This is one-dimensional motion in the $x$ direction,” but in reality:

What a physicist calls “one-dimensional motion” is a situation in which, in three-dimensional space, displacement in one particular direction dominates, while displacement in the other directions is either negligibly small or constrained to be exactly zero by the symmetry of the system.

Note (the embedding point of view)

The “physicist’s one dimension” here means viewing a one-dimensional parameter space as being mapped into three-dimensional space as a curve. In mathematics, such a map is first treated as a parametrization of a curve; under good conditions, such as no self-intersection and nonzero velocity, it is called an embedding.

This book is not denying abstract one-dimensional spaces themselves. Rather, here we first understand the “motion along a curve in space” that often appears in physical mathematics as a mapping into three-dimensional space.

Therefore, what a physicist calls a “one-dimensional infinitesimal displacement” is not really just a scalar $\Delta x$. It is more appropriately represented as the following column vector:

$$ \mathbf{v} := \begin{pmatrix} \Delta x \\ 0 \\ 0 \end{pmatrix} $$

We call a three-component expression of displacement from a point in three-dimensional space a displacement vector. Following the usual convention, the first component corresponds to displacement in the $x$-axis direction, the second component to displacement in the $y$-axis direction, and the third component to displacement in the $z$-axis direction.

The zeros in the second and third components should not be read as saying that those components are merely “being ignored” or “do not exist.” They are the result of a positive choice: we are currently focusing only on motion in the $x$-axis direction and intentionally excluding the other components from what we measure.

This physicist’s three-dimensional point of view lets us see that $dx$ is not a mere extra symbol hanging off the end of an integral sign. In the next section, from this perspective, we will reread $dx$ not as an infinitesimal quantity, but as a matrix, or as an operator.

Note (the standpoint of this book)

This book is, above all, a book for unwinding elementary calculus and vector analysis. It is not a systematic development of differential forms in arbitrary dimensions.

Therefore, until the end, we basically stay fixed in three-dimensional Cartesian coordinates $(x,y,z)$ and adopt the most straightforward expression from linear algebra: matrix representation. The $dx,dy,dz$ used in this book are defined, in this coordinate system, as concrete matrices that extract components.

The aim is to let vector analysis sink in first as concrete symbols and operations—without leaning on infinitesimals—within the plain setting of three dimensions, Cartesian coordinates, and matrices.

Note (on the name “Cartesian coordinates”)

In this book, “Cartesian coordinates” means an orthonormal coordinate system: a coordinate system whose axes meet at a common origin, are mutually orthogonal, and have equal spacing. In mathematics and in other books, this may also be called a rectangular coordinate system on Euclidean space, a Cartesian coordinate system, and so on. In this book, I will simply call it “Cartesian coordinates.”

Note (extensions in Part III)

In Part III, however, we will touch on curvilinear coordinates and extensions to higher dimensions as needed.


§1.1 Riemann Sums and Matrix Products

In the previous section, we reinterpreted a physical infinitesimal displacement as the column vector $\mathbf{v}$. As in §1.0,

$$ \mathbf{v} = \begin{pmatrix} \Delta x \\ 0 \\ 0 \end{pmatrix}. $$

With that picture in place, let us rewrite the familiar Riemann integral and see what perspective “$dx$ as a matrix” gives us.

Note (on transpose notation)

In mathematics and in other books, a column vector is sometimes laid sideways on the page and marked with a superscript ${}^T$, as in $(\Delta x,\Delta y,\Delta z)^T$. In this book, I avoid this kind of space-saving transpose notation as much as possible. Column vectors are written as columns, and row vectors as rows. If ${}^T$ appears exceptionally, read it as an intentional transpose operation.

1.1.1 The Standard Construction of a Riemann Sum (Review)

The definite integral of a function $f(x)$ over the interval $[a,b]$ is defined as follows. In high-school language, this is the method of exhaustion by subdivision.

Note (closed intervals)

The notation $[a,b]$ denotes the closed interval containing both endpoints $a$ and $b$: the set of real numbers $x$ such that $a \le x \le b$.

  1. Divide the interval into $n$ subintervals: $a=x_0<x_1<\cdots<x_n=b$.
  2. Let the width of the small interval be $\Delta x_i=x_i-x_{i-1}$.
  3. Choose one representative point $\xi_i$ inside each subinterval $[x_{i-1},x_i]$.
  4. Form the Riemann sum $R_n:=\sum_{i=1}^n f(\xi_i)\Delta x_i$. We write the sum for this particular $n$-fold partition as $R_n$.
  5. Take the limit as the partition becomes finer. Strictly speaking, the maximum width of the subintervals approaches $0$. Merely increasing the number of pieces $n$ can still leave a large interval somewhere, so more advanced mathematics emphasizes this condition. In what follows, $\lim_{n\to\infty}R_n$ assumes partitions satisfying this condition:
$$ \int_a^b f(x)\,dx = \lim_{n\to\infty} R_n. $$
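
The five steps above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the text: the name `riemann_sum`, the uniform partition, the choice of midpoints as the representative points $\xi_i$, and the test function $f(x)=x^2$ on $[0,1]$ are all picked for convenience.

```python
def riemann_sum(f, a, b, n):
    """Riemann sum R_n of f over [a, b].

    Assumptions of this sketch (not from the text): a uniform partition
    into n subintervals, with the midpoint of each subinterval chosen as
    the representative point xi_i.
    """
    # Step 1: partition points a = x_0 < x_1 < ... < x_n = b
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    total = 0.0
    for i in range(1, n + 1):
        delta_x = xs[i] - xs[i - 1]       # Step 2: width of the i-th subinterval
        xi = 0.5 * (xs[i - 1] + xs[i])    # Step 3: representative point
        total += f(xi) * delta_x          # Step 4: add f(xi_i) * delta_x_i
    return total


def square(x):
    return x * x


# Step 5: refine the partition. For a uniform partition the maximum
# width (b - a) / n tends to 0 as n grows, so the condition in the text holds.
approximations = {n: riemann_sum(square, 0.0, 1.0, n) for n in (10, 100, 1000)}
for n, value in approximations.items():
    print(n, value)   # the values approach 1/3, the exact integral of x^2 on [0, 1]
```

Any other choice of representative points inside each subinterval would do; midpoints are used here only because they converge quickly.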

In the usual textbook explanation, you read $dx$ as what the “tiny width $\Delta x_i$” becomes in the limit. Here we go one step deeper.

1.1.2 Displacement on Each Subinterval

For the $i$-th subinterval $[x_{i-1},x_i]$, we write the displacement vector from §1.0, adjusted to the width of that interval, as

$$ \mathbf{v}_i = \begin{pmatrix} \Delta x_i \\ 0 \\ 0 \end{pmatrix}. $$

The subscript $i$ only means “for the $i$-th subinterval.” The object $\mathbf{v}_i$ is still a displacement vector. Since we are currently considering straight-line motion, the $y$ and $z$ components are zero. But that is not a condition imposed in advance; rather, in this particular situation, those components simply come out to be zero.

1.1.3 The Appearance of $dx$ — The Defining Assertion of This Book

Now we give the symbol $dx$ the following meaning. This may look bold, even strange, but it is the defining feature of this book: we declare that $dx$ is the following $1\times3$ matrix, that is, a row vector with its components written horizontally:

$$ dx := \begin{pmatrix} 1 & 0 & 0 \end{pmatrix}. $$

This is not, in fact, a private notation detached from the standard point of view. The same object already appears in standard linear algebra, tensor analysis, and manifold theory. What I am doing here is fixing standard coordinates on $\mathbb{R}^3$ and writing it out as a matrix from the beginning. I spell this out in the note below.

For readers familiar with more advanced mathematics: this is the matrix representation of the linear operator that takes an input column vector and returns only its $x$-component. It is the representation of $dx$ in the standard coordinates of $\mathbb{R}^3$.

Note ($dx=(1\ 0\ 0)$ and its relation to textbooks)

This notation is not often brought to the foreground in ordinary textbooks, but it is already there implicitly.

For readers who have studied vector analysis

For a scalar-valued function $f:\mathbb{R}^3\to\mathbb{R}$, the gradient is

$$ \nabla f = \begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial z} \end{pmatrix}. $$

Its transpose

$$ (\nabla f)^T= \begin{pmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} & \frac{\partial f}{\partial z} \end{pmatrix} $$

is a $1\times3$ row vector. If we set $f=x$, then $(\nabla x)^T=(1\ 0\ 0)$. In the Cartesian coordinates and matrix convention used in this book, this transpose of the gradient is precisely the row-vector representation of $dx$.
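
As a quick numerical check of $(\nabla x)^T=(1\ 0\ 0)$, here is a rough sketch that approximates the transposed gradient by central differences. The helper name `numerical_gradient_row`, the step size `h`, and the sample point are my own choices for illustration, not part of the argument.

```python
def numerical_gradient_row(f, point, h=1e-6):
    """Central-difference approximation of (grad f)^T at `point`, as a list."""
    row = []
    for k in range(3):
        forward = list(point)
        backward = list(point)
        forward[k] += h
        backward[k] -= h
        row.append((f(*forward) - f(*backward)) / (2 * h))
    return row


def coordinate_x(x, y, z):
    """The coordinate function f = x."""
    return x


# The sample point is arbitrary; the row should be (1 0 0) everywhere.
row_for_x = numerical_gradient_row(coordinate_x, (2.0, -1.0, 0.5))
print(row_for_x)   # first entry close to 1, the other two exactly 0.0
```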

For readers who have studied tensor analysis

The symbols $dx^i$ are the dual basis to the coordinate basis $\frac{\partial}{\partial x^j}$, so

$$ dx^i\!\left(\frac{\partial}{\partial x^j}\right)=\delta^i_j. $$

Therefore, in Cartesian coordinates, their row-vector representations are

$$ dx^1=(1\ 0\ 0),\qquad dx^2=(0\ 1\ 0),\qquad dx^3=(0\ 0\ 1). $$

For readers familiar with manifold theory

We have $df_p:T_p\mathbb{R}^3\to\mathbb{R}$, $df_p(v)=v(f)$, and $dx^i_p(v)=v(x^i)$. In standard coordinates, if

$$ v=v^i\frac{\partial}{\partial x^i} \longmapsto \begin{pmatrix}v^1\\v^2\\v^3\end{pmatrix}, $$

then $dx^1_p(v)=v^1$, $dx^2_p(v)=v^2$, and $dx^3_p(v)=v^3$. Thus, in standard coordinate representation,

$$ dx^1_p\longmapsto(1\ 0\ 0),\qquad dx^2_p\longmapsto(0\ 1\ 0),\qquad dx^3_p\longmapsto(0\ 0\ 1). $$
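
The duality relation $dx^i(\partial/\partial x^j)=\delta^i_j$ can be checked by hand, or with the small sketch below. The dictionary names and the helper `act` are hypothetical. The only assumption is that the coordinate basis vectors are represented by the standard unit columns, as in the display above.

```python
# Row vectors for dx, dy, dz in standard coordinates (as in the text).
covectors = {
    "dx": (1, 0, 0),
    "dy": (0, 1, 0),
    "dz": (0, 0, 1),
}

# Coordinate basis vectors, stored as tuples of column components.
basis_vectors = {
    "e_x": (1, 0, 0),
    "e_y": (0, 1, 0),
    "e_z": (0, 0, 1),
}


def act(row, column):
    """Row vector times column vector: a 1x3 by 3x1 matrix product."""
    return sum(r * c for r, c in zip(row, column))


# pairing_table[i][j] is the i-th covector applied to the j-th basis vector.
pairing_table = [
    [act(row, column) for column in basis_vectors.values()]
    for row in covectors.values()
]
print(pairing_table)   # the 3x3 identity matrix, i.e. the Kronecker delta
```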

Now multiply the displacement vector $\mathbf{v}_i$ on the left by the row vector $dx$:

$$ dx\,\mathbf{v}_i = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} \Delta x_i \\ 0 \\ 0 \end{pmatrix} = \Delta x_i. $$

Here we write the image that “$dx$ eats the displacement vector $\mathbf{v}_i$ and spits out the scalar $\Delta x_i$” as if it were a function value:

$$ dx(\mathbf{v}_i) := dx\,\mathbf{v}_i = \Delta x_i. $$

In this book, I will often emphasize this form $dx(\mathbf{v})$: the row vector $dx$ acts on the column vector $\mathbf{v}$. This point cannot be emphasized too often.
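
For readers who like to see the action $dx(\mathbf{v})$ run, here is a minimal sketch. The names `DX`, `act`, and `displacement_along_x`, and the width `0.25`, are illustrative assumptions. The point is only that a $1\times3$ row times a $3\times1$ column returns a single number.

```python
DX = (1.0, 0.0, 0.0)   # the row vector dx


def act(row, column):
    """Let a 1x3 row vector act on a 3x1 column vector (a matrix product)."""
    return sum(r * c for r, c in zip(row, column))


def displacement_along_x(delta_x):
    """Displacement vector v for straight-line motion along the x axis."""
    return (delta_x, 0.0, 0.0)


v_i = displacement_along_x(0.25)     # hypothetical width delta_x_i = 0.25
print(act(DX, v_i))                  # 0.25, the x-component of v_i

# dx ignores whatever sits in the y and z slots.
tilted = (0.25, 3.0, -7.0)
print(act(DX, tilted))               # still 0.25
```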

Note (operator, functional, function, or map?)

Different books use different words for this $dx(\mathbf{v})$: operator, functional, function, or map. I prefer to call it an operator, so that is the word I will often use from here on.

The $\Delta x_i$ that appears in the Riemann sum is not a “mysterious one-dimensional infinitesimal quantity.” It is the result of applying the matrix $dx$ above to the displacement vector $\mathbf{v}_i$ in three-dimensional space and extracting only its $x$-component.

In other words,

$$ \Delta x_i = dx(\mathbf{v}_i). $$

This is the decisive moment:

Thus $\Delta x_i$ is not a mysterious one-dimensional infinitesimal displacement, but $dx(\mathbf{v}_i)$, obtained by extracting only the $x$-direction component from the three-dimensional infinitesimal displacement $\mathbf{v}_i$.

Note (component-order convention)

For row vectors and abbreviated row components, I write $(1\ 0\ 0)$, with no commas, separating components only by spaces. I use notation with commas followed by spaces, such as $(1, 0, 0)$, when emphasizing coordinates, such as the position of a point. I do not use that notation for abbreviated matrices or row vectors.

Once this one line lands, the $dx$ at the end of the integral sign starts to look different. Next, let us rewrite the ordinary Riemann sum in this notation.

1.1.4 Vector Reinterpretation of the Riemann Sum and the Integral Sign

From this point of view, the Riemann sum becomes

$$ R_n = \sum_{i=1}^n f(\xi_i)\,dx(\mathbf{v}_i). $$

Since

$$ dx(\mathbf{v}_i)=\Delta x_i, $$

this is the same as the ordinary Riemann sum

$$ R_n = \sum_{i=1}^n f(\xi_i)\,\Delta x_i. $$

But the same notation admits a second reading. On each subinterval, consider the row vector obtained by multiplying the measuring device $dx$ by the function value $f(\xi_i)$:

$$ \bigl(f(\xi_i)\,dx\bigr) := f(\xi_i)\,dx = \begin{pmatrix} f(\xi_i) & 0 & 0 \end{pmatrix}. $$

If this row vector acts on the displacement vector $\mathbf{v}_i$, then

$$ \bigl(f(\xi_i)\,dx\bigr)(\mathbf{v}_i) = f(\xi_i)\,dx(\mathbf{v}_i) = f(\xi_i)\,\Delta x_i. $$

In other words, we obtain the individual terms of the Riemann sum themselves.

Therefore, the Riemann sum can also be read as

$$ R_n = \sum_{i=1}^n \bigl(f(\xi_i)\,dx\bigr)(\mathbf{v}_i). $$

In the limit where the partition becomes infinitely fine, that is, in the limit where the maximum width tends to $0$, the definition of the Riemann integral gives

$$ \int_a^b f(x)\,dx = \lim_{n\to\infty} R_n = \lim_{n\to\infty} \sum_{i=1}^n \bigl(f(\xi_i)\,dx\bigr)(\mathbf{v}_i). $$

The left-hand side, $\int_a^b f(x)\,dx$, is the familiar integral notation we have known since high school. It is the same quantity as $\lim_{n\to\infty}R_n$ in the construction above.

The right-hand side is another face of the same integral: the limit of the sum, over subintervals, of the values obtained when the measuring device $f(\xi_i)\,dx$ acts on the displacement vector $\mathbf{v}_i$.

Thus the $dx$ at the end of the integral sign is not a mere decoration. At least in this reading, it works as a measuring device that extracts the $x$-direction width from the displacement vector on each subinterval. And $f(x)\,dx$ can be read as a new row vector obtained by multiplying that measuring device by the value of the function.
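
The two readings of the Riemann sum can be compared side by side. In the sketch below, the function names, the uniform partition, the midpoint rule, and the test integrand $\sin x$ on $[0,\pi]$ are assumptions made only for illustration.

```python
import math

DX = (1.0, 0.0, 0.0)   # the row vector dx


def act(row, column):
    """Row vector acting on a column vector (1x3 times 3x1)."""
    return sum(r * c for r, c in zip(row, column))


def scale(coefficient, row):
    """Multiply a row vector by a scalar, e.g. f(xi_i) * dx."""
    return tuple(coefficient * r for r in row)


def partition(a, b, n):
    """Uniform partition of [a, b]; an assumption of this sketch."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def riemann_sum_classic(f, a, b, n):
    """First reading: sum of f(xi_i) * delta_x_i."""
    xs = partition(a, b, n)
    return sum(f(0.5 * (xs[i - 1] + xs[i])) * (xs[i] - xs[i - 1])
               for i in range(1, n + 1))


def riemann_sum_by_devices(f, a, b, n):
    """Second reading: sum of (f(xi_i) dx)(v_i)."""
    xs = partition(a, b, n)
    total = 0.0
    for i in range(1, n + 1):
        v_i = (xs[i] - xs[i - 1], 0.0, 0.0)     # displacement vector of the step
        xi = 0.5 * (xs[i - 1] + xs[i])          # representative point (midpoint)
        device = scale(f(xi), DX)               # the row vector (f(xi_i) 0 0)
        total += act(device, v_i)               # (f(xi_i) dx)(v_i)
    return total


classic = riemann_sum_classic(math.sin, 0.0, math.pi, 1000)
by_devices = riemann_sum_by_devices(math.sin, 0.0, math.pi, 1000)
print(classic, by_devices)   # both are close to 2, the integral of sin over [0, pi]
```

The two functions compute the same number. Only the bookkeeping differs: in the second, the function value is attached to the measuring device before the device meets the displacement.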

Note (work integrals)

The work integral $W=\int F(x)\,dx$ from mechanics can be read in the same way. On each subinterval, the row vector $F(\xi_i)\,dx$ eats the displacement vector $\mathbf{v}_i$ and returns $F(\xi_i)\Delta x_i$. The limit of the sum is the work.

1.1.5 Linear Forms ($1$-Forms)

So far, we have defined $dx$ as a row vector that acts on a displacement vector, extracts a specified component, and returns a scalar, that is, a real number. A linear measuring device of this kind, one that eats a vector and spits out a scalar, is technically called a linear form, or a $1$-form. What we have been calling “$dx$ as a matrix” is precisely this $1$-form.

Note (the term “covector”)

Mathematicians also often use the term covector as another name for a linear form, or $1$-form. Keep it in the corner of your mind as a dictionary entry for reading other books.

The origin of the name is simple: the device is linear in its input, and it eats exactly one vector at a time.

From here on, I will call this measuring device a “matrix,” a “linear form ($1$-form),” or an operator. These all refer to the same object.

Note (matrix, linear form, measuring device)

To repeat: from the standpoint of this book, these are represented by the same array. Throughout the book, I will call the same object a “matrix,” a “linear form,” or a “measuring device,” depending on what needs to be emphasized.

1.1.6 The Notational Contract of This Book

In this book, we do not use a standalone $dx$ to mean an “infinitesimal displacement” or an “infinitesimal change.” To avoid confusing the width of a displacement with the measuring device, the row vector, we make the following agreement. For the magnitude of an infinitesimal displacement in the $x$ direction, we use $\Delta x$, or we explicitly write the displacement vector $\mathbf{v}$ and write $dx(\mathbf{v})$.

On the other hand, a standalone $dx$ means the operator, the row vector or $1$-form, that extracts the $x$-component. A $dx$ appearing in an expression such as $df=f'(x)\,dx$ is also always read as an operator. Keep this distinction clear. Everything that follows depends on it.

From here on, the $dx$ at the end of the integral sign $\int_a^b f(x)\,dx$ will remain as conventional notation inherited from high school. But outside the integral sign, we will always write it in the form $dx(\mathbf{v})$ when the action is being shown explicitly. In the main text, when speaking about displacements and infinitesimal widths, we will use $\Delta x$ or $dx(\mathbf{v})$, and we will not attach the image of width or amount of change to a bare $dx$.

Note (usage of $dx$ in the literature)

Physicists often follow the convention of writing the displacement component itself as $dx$. Once you are used to the notation of this book, however, formulas become easier to read if you keep $dx$, the measuring device, separate from the scalar displacement. In more advanced books, that distinction will matter directly.

In the next section, we extend this point of view to the differential of a function and define the total differential $df$ as a matrix.


Checkpoint so far

- An infinitesimal displacement is a column vector $\mathbf{v}$. The symbol $\Delta x$ is a scalar width, while a standalone $dx$ is an operator, a row vector or $1$-form.

- The Cartesian $dy$ and $dz$ will be introduced in the next section, in §1.2.3. They are $1$-forms built on the same idea as the $dx$ defined in §1.1.3. Their distinction from $\Delta y$, $\Delta z$, and from $dy(\mathbf{v})$, $dz(\mathbf{v})$, follows the notational contract in §1.1.6.

- Each term of the Riemann sum has the form $(f\,dx)(\mathbf{v}_i)$, and the integral can be understood as the limit of such sums.

- The $dx$ at the end of the integral sign $\int_a^b f(x)\,dx$ is conventional notation. The displacement itself is written as $\Delta x$ or $dx(\mathbf{v})$, according to the contract in §1.1.6.


§1.2 The Total Differential $df$ — An Operator That Packs Rates of Change into a Matrix

In the previous section, we dismantled the Riemann integral as “the limit of sums of the action of the matrix $f(x)\,dx$ on displacement vectors.” In this section, we extend that idea to the differential of the function itself and define the total differential $df$ as a row vector, or matrix. A “differential” is not merely a numerical rate of change. We treat it as an operator that produces the linear part of the change only after it is multiplied by a displacement vector.

1.2.1 Matrix Representation of Differentiability

A function $f(x)$ is differentiable at a point $x$ if there is a number $f'(x)$ for which the following linear approximation holds:

$$ \Delta f = f(x+\Delta x)-f(x)=f'(x)\Delta x+o(|\Delta x|)\quad(|\Delta x|\to0). $$

Note (Landau order notation)

The term $o(|\Delta x|)$ is Landau order notation. As $\Delta x\to0$, the quantity written as $o(|\Delta x|)$ is a remainder whose ratio to $|\Delta x|$ tends to $0$. In other words, it is small enough to ignore compared with the main term $f'(x)\Delta x$. This notation often appears in science and engineering texts, though some readers may be seeing it for the first time.
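
To get a feel for what $o(|\Delta x|)$ means, it can help to watch the ratio of the remainder to $|\Delta x|$ shrink. The sketch below uses $f(x)=x^2$ at $x=3$. The base point and the step sizes are my own choices for illustration.

```python
def f(x):
    return x * x


def f_prime(x):
    return 2.0 * x


x0 = 3.0
ratios = []
for delta_x in (1e-1, 1e-2, 1e-3, 1e-4):
    # Remainder: exact change minus the linear main term f'(x) * delta_x.
    remainder = f(x0 + delta_x) - f(x0) - f_prime(x0) * delta_x
    ratios.append(abs(remainder) / abs(delta_x))
    print(delta_x, remainder, ratios[-1])

# For this f the remainder is (delta_x)^2 up to rounding, so the ratio
# behaves like delta_x itself and tends to 0.
```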

Here $\Delta x$ is a scalar, but as in the previous section, we represent the corresponding displacement as the three-dimensional vector:

$$ \mathbf{v}=\begin{pmatrix}\Delta x\\0\\0\end{pmatrix}. $$

Then the change $\Delta f$ of the function can be regarded as the effect caused by this $\mathbf{v}$.

1.2.2 Defining $df$ as a Matrix and Reading $df(\mathbf{v})$

At the point $x$, define the total differential $df$ of the function $f$ as the following $1\times3$ row vector:

$$ df:=f'(x)\,dx=\begin{pmatrix}f'(x)&0&0\end{pmatrix}. $$

As in the previous section, we write the result of letting this $df$ act on the displacement $\mathbf{v}$ as $df(\mathbf{v})$:

$$ df(\mathbf{v})= \begin{pmatrix}f'(x)&0&0\end{pmatrix} \begin{pmatrix}\Delta x\\0\\0\end{pmatrix} =f'(x)\,\Delta x. $$

Here is the important change in viewpoint. Always read it as follows:

The object $df$ itself is not a change. It is a measuring device for the linear part of the change.

Note ($df$ is an operator; $df(\mathbf{v})$ is the linear part of the change)

This is the same distinction as the notational contract in §1.1.6, stated once again. It may feel repetitive, but if you misread this point, everything that follows will shift out of alignment. The repetition is worth it.

1.2.3 The Measuring Devices $dy$ and $dz$ in the $y$ and $z$ Directions

The strength of this viewpoint shows up when we return to the work integral from the mechanics note in §1.1.4: it extends naturally to the general case where the force also has a component in the $y$ direction. With the $dy$ we now define, two-dimensional work can be treated uniformly in the form $F_x\,dx+F_y\,dy$.

Just as $dx$ extracts the $x$-component, we define $dy$ in physical space as the row vector, or $1$-form, that extracts the $y$-component, and $dz$ as the one that extracts the $z$-component. Their matrix representations are

$$ dy:=\begin{pmatrix}0&1&0\end{pmatrix},\qquad dz:=\begin{pmatrix}0&0&1\end{pmatrix}. $$

For a displacement

$$ \mathbf{v}=\begin{pmatrix}\Delta x\\\Delta y\\\Delta z\end{pmatrix}, $$

we have $dy(\mathbf{v})=\Delta y$ and $dz(\mathbf{v})=\Delta z$.
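
Here is a small sketch of the three measuring devices at work, together with the combination $F_x\,dx+F_y\,dy$ mentioned at the start of this subsection. The helper names `act` and `combine`, the displacement, and the constant force components are hypothetical values chosen so that the arithmetic is easy to follow.

```python
DX = (1.0, 0.0, 0.0)
DY = (0.0, 1.0, 0.0)
DZ = (0.0, 0.0, 1.0)


def act(row, column):
    """Row vector acting on a column vector (1x3 times 3x1)."""
    return sum(r * c for r, c in zip(row, column))


def combine(*terms):
    """Linear combination of row vectors; each term is (coefficient, row)."""
    return tuple(sum(c * row[k] for c, row in terms) for k in range(3))


v = (0.5, -0.25, 2.0)                                 # a displacement vector
components = (act(DX, v), act(DY, v), act(DZ, v))     # (0.5, -0.25, 2.0)

F_x, F_y = 4.0, 2.0                                   # hypothetical constant force
work_form = combine((F_x, DX), (F_y, DY))             # the row vector (4.0, 2.0, 0.0)
work = act(work_form, v)                              # 4.0*0.5 + 2.0*(-0.25) = 1.5

print(components, work_form, work)
```

The $z$-component of the displacement does not contribute to `work`, because `work_form` has a zero in its third slot.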

Note (the contract for $dy$ and $dz$)

A standalone $dy$ or $dz$ is an operator. When we speak of infinitesimal widths in the $y$ or $z$ direction, we write $\Delta y$, $\Delta z$, or $dy(\mathbf{v})$, $dz(\mathbf{v})$. The $dy$ and $dz$ appearing at the tail of an integral sign should be read in the same way as $dx$, as conventional notation inherited from elementary calculus. The detailed convention is the same idea as the notational contract for $dx$ in §1.1.6.

Now we have all three measuring devices for three-dimensional space. In the main line of this chapter, we have focused on the $x$-direction slice, but you should think of the coordinates $y,z$ and the measuring devices $dy,dz$ as being available from the beginning.

We have now supplemented the one-variable expression $df=f'(x)\,dx$ with $dy$ and $dz$ as measuring devices of the same type. In the concrete example below and in the extension to three dimensions in §1.2.6, we will see how all three devices come into play.

1.2.4 Example: $f(x)=x^2$

Let $f(x)=x^2$. Then $f'(x)=2x$.

For example, at $x=3$, the total differential is written explicitly as a row vector. Placing the row vector and column vector side by side,

$$ df=6\,dx=\begin{pmatrix}6&0&0\end{pmatrix},\qquad \mathbf{v}=\begin{pmatrix}0.1\\0\\0\end{pmatrix}. $$

Therefore the matrix product is

$$ df(\mathbf{v})= \begin{pmatrix}6&0&0\end{pmatrix} \begin{pmatrix}0.1\\0\\0\end{pmatrix} =6\cdot0.1+0\cdot0+0\cdot0=0.6. $$

Indeed, $f(3.1)-f(3)=9.61-9=0.61$, so the linear approximation $0.6$ agrees well.
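
The same example can be replayed numerically. The sketch below is only a check of the arithmetic above. The names `df_row` and `act` are mine, and floating-point rounding means the printed values match $0.6$, $0.61$, and $0.01$ only approximately.

```python
def act(row, column):
    """Row vector acting on a column vector (1x3 times 3x1)."""
    return sum(r * c for r, c in zip(row, column))


def f(x):
    return x * x


def f_prime(x):
    return 2.0 * x


def df_row(x):
    """The row vector df = f'(x) dx = (f'(x) 0 0) at the point x."""
    return (f_prime(x), 0.0, 0.0)


df_at_3 = df_row(3.0)                     # (6.0, 0.0, 0.0)
v = (0.1, 0.0, 0.0)                       # displacement with delta_x = 0.1

linear_part = act(df_at_3, v)             # approximately 0.6
exact_change = f(3.1) - f(3.0)            # approximately 0.61
remainder = exact_change - linear_part    # approximately 0.01 = (0.1)^2

print(linear_part, exact_change, remainder)
```

The gap of about $0.01$ is the $o(|\Delta x|)$ remainder from §1.2.1, which for this particular $f$ is exactly $(\Delta x)^2$.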

1.2.5 Substitution in Integrals and Rebuilding Measuring Devices

So far, we have read $dx$ as a measuring device for displacement.

That is, $dx$ is not a standalone infinitesimal quantity. It is a row vector that eats a displacement vector and returns its $x$-component. This reading is the basic convention of this book.

Once we have this reading, the substitution rule familiar from high school begins to look a little different.

Suppose a variable $x$ is expressed in terms of another variable $t$ by

$$ x=\gamma(t). $$

In substitution, one often writes

$$ dx=\gamma'(t)\,dt. $$

In ordinary calculus, this formula is explained by differentiating a composite function. For example, if $F'(x)=f(x)$, then differentiating $F(\gamma(t))$ with respect to $t$ gives

$$ \frac{d}{dt}F(\gamma(t))=f(\gamma(t))\gamma'(t). $$

Therefore,

$$ \int_{t_0}^{t_1}f(\gamma(t))\gamma'(t)\,dt =F(\gamma(t_1))-F(\gamma(t_0)). $$

If we write the endpoints as

$$ x_0=\gamma(t_0),\qquad x_1=\gamma(t_1), $$

then the right-hand side is

$$ F(x_1)-F(x_0)=\int_{x_0}^{x_1}f(x)\,dx. $$

Thus we obtain

$$ \int_{x_0}^{x_1}f(x)\,dx = \int_{t_0}^{t_1}f(\gamma(t))\gamma'(t)\,dt. $$

Up to this point, this is the standard explanation of substitution.
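
The substitution formula can be checked numerically on a concrete case. In the sketch below, the integrand $f(x)=\cos x$, the substitution $\gamma(t)=t^2$, the interval $[0,1]$, and the midpoint rule are all assumptions chosen for illustration. Any smooth $f$ and $\gamma$ would serve.

```python
import math


def midpoint_sum(g, a, b, n=2000):
    """Midpoint Riemann sum of g over [a, b] (uniform partition, an assumption)."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


def f(x):
    return math.cos(x)


def gamma(t):          # hypothetical substitution x = gamma(t) = t^2
    return t * t


def gamma_prime(t):
    return 2.0 * t


t0, t1 = 0.0, 1.0
x0, x1 = gamma(t0), gamma(t1)

# Left-hand side: integral of f(x) over [x0, x1].
x_side = midpoint_sum(f, x0, x1)
# Right-hand side: integral of f(gamma(t)) * gamma'(t) over [t0, t1].
t_side = midpoint_sum(lambda t: f(gamma(t)) * gamma_prime(t), t0, t1)
# F(x) = sin(x) is an antiderivative of cos(x).
exact = math.sin(x1) - math.sin(x0)

print(x_side, t_side, exact)   # three nearly equal numbers
```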

But in this book, we look at the formula with slightly different eyes. The expression

$$ dx=\gamma'(t)\,dt $$

is not merely a computational symbol. It can be read as the result of rebuilding the measuring device $dx$ on the $x$ axis into a measuring device on the $t$ axis, using $dt$.

When $t$ changes a little, $x=\gamma(t)$ changes by $\gamma'(t)$ times that amount. Therefore, to build a measuring device that eats a small interval on the $t$ side and returns the same value as the displacement on the $x$ side, we need to multiply $dt$ by $\gamma'(t)$.
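
A rough numerical illustration of this rebuilding: compare the actual displacement on the $x$ side with what the rebuilt device $\gamma'(t)\,dt$ returns for a small step on the $t$ side. The map $\gamma(t)=\sin t$, the base point $t=1$, and the function names are hypothetical. The agreement is only to first order, which is all the text claims.

```python
import math


def gamma(t):            # hypothetical map from the t axis to the x axis
    return math.sin(t)


def gamma_prime(t):
    return math.cos(t)


def x_side_displacement(t, delta_t):
    """The actual change in x = gamma(t) caused by a step delta_t."""
    return gamma(t + delta_t) - gamma(t)


def rebuilt_device(t, delta_t):
    """gamma'(t) dt acting on a t-side displacement delta_t."""
    return gamma_prime(t) * delta_t


t = 1.0
gaps = []
for delta_t in (1e-1, 1e-2, 1e-3):
    gap = x_side_displacement(t, delta_t) - rebuilt_device(t, delta_t)
    gaps.append(abs(gap) / delta_t)     # gap relative to the step size
    print(delta_t, x_side_displacement(t, delta_t), rebuilt_device(t, delta_t))

# The relative gap shrinks with delta_t: the two sides agree to first order.
```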

Chapter 4 will treat this viewpoint seriously as the pullback.

There, we will not use the formula as a mysterious given. We will rediscover the same coefficient $\gamma'(t)$ by mapping finite intervals and measuring their images. Then we will extend the same structure to finite cells and finite boxes.

For now, we have walked through the standard substitution calculation once. Later, in the main line of this book, we will reread it as “making measuring devices consistent.”

Note ($dx=\gamma'(t)\,dt$)

In ordinary calculus, for the substitution $x=\gamma(t)$, this relation is written as $dx=\gamma'(t)\,dt$.

From the standpoint of this book, however, this does not mean that the measuring device $dx$ itself on the $x$ side has moved to the $t$ side. It should be read as pulling back the $1$-form $dx$ on the target side along the map $\gamma:I\to\mathbb{R}$ to the domain side.

More precisely, using the pullback notation introduced in Chapter 4, it is safer to write $\gamma^\ast(dx)=\gamma'(t)\,dt$. In other words, the $dx$ on the left originally measures displacement on the $x$ side, while $\gamma'(t)\,dt$ on the right is the device rebuilt to measure displacement on the $t$ side and return the same first-order change.

At least from the standpoint of this book, hiding this distinction is one reason explanations of $dx$ in substitution can become confusing.

1.2.6 Extension to Three Dimensions

A function $f$ that assigns a scalar, or real number, to each point $(x,y,z)$ in space may be treated simply as a real-valued function on physical space.

Note (scalar fields)

In physics, such a function is often called a scalar field.

To say that $f(x,y,z)$ is totally differentiable at the point $(x,y,z)$ means that, for a displacement

$$ \mathbf{v}=\begin{pmatrix}\Delta x\\\Delta y\\\Delta z\end{pmatrix}, $$

the following relation holds:

$$ \begin{aligned} \Delta f &=f(x+\Delta x,\,y+\Delta y,\,z+\Delta z)-f(x,y,z)\\ &=\frac{\partial f}{\partial x}\,\Delta x {}+\frac{\partial f}{\partial y}\,\Delta y {}+\frac{\partial f}{\partial z}\,\Delta z {}+o(\|\mathbf{v}\|)\quad(\|\mathbf{v}\|\to0). \end{aligned} $$

This has the same form as the one-variable definition in §1.2.1. The term $o(\|\mathbf{v}\|)$ is the same kind of remainder notation as $o(|\Delta x|)$. We will not go deeply into it here, but it should not obstruct the discussion.

Using the measuring devices $dx,dy,dz$ from §1.2.3, define the same type of operator as in §1.2.2 by

$$ df:=\frac{\partial f}{\partial x}\,dx+ \frac{\partial f}{\partial y}\,dy+ \frac{\partial f}{\partial z}\,dz = \begin{pmatrix} \frac{\partial f}{\partial x}& \frac{\partial f}{\partial y}& \frac{\partial f}{\partial z} \end{pmatrix}. $$

For the next few lines, we drop the $o(\|\mathbf{v}\|)$ term and follow only the skeleton of the notation. By definition of differentiability, for $\mathbf{v}=\begin{pmatrix}\Delta x\\\Delta y\\\Delta z\end{pmatrix}$ we have $\Delta f=df(\mathbf{v})+o(\|\mathbf{v}\|)$. So $df(\mathbf{v})$ is not the exact change $\Delta f$, but its linear principal part once the remainder is stripped away. Writing that principal part as a matrix product gives

$$ \begin{aligned} df(\mathbf{v}) &=\frac{\partial f}{\partial x} \begin{pmatrix}1&0&0\end{pmatrix} \begin{pmatrix}\Delta x\\\Delta y\\\Delta z\end{pmatrix} +\frac{\partial f}{\partial y} \begin{pmatrix}0&1&0\end{pmatrix} \begin{pmatrix}\Delta x\\\Delta y\\\Delta z\end{pmatrix}\\ &\quad+\frac{\partial f}{\partial z} \begin{pmatrix}0&0&1\end{pmatrix} \begin{pmatrix}\Delta x\\\Delta y\\\Delta z\end{pmatrix}\\ &=\frac{\partial f}{\partial x}\,\Delta x+ \frac{\partial f}{\partial y}\,\Delta y+ \frac{\partial f}{\partial z}\,\Delta z. \end{aligned} $$

The first two lines show, as matrix products, how the measuring devices $dx,dy,dz$ from §1.2.3 act as row vectors to extract the $x,y,z$ components from $\mathbf{v}$, with the partial derivatives attached as coefficients. The last line is the evaluated result, which has the same form as the linear principal term in the definition of $\Delta f$ above. The reading from §1.2.2 remains unchanged in three dimensions: a row vector (the operator) times a column vector (the displacement) gives a scalar.
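
A concrete instance may help. The sketch below builds $df$ from $dx,dy,dz$ for the hypothetical scalar field $f(x,y,z)=x^2y+z$, whose partial derivatives $2xy$, $x^2$, and $1$ are computed by hand, and compares $df(\mathbf{v})$ with the exact change. The point and the displacement are arbitrary choices of mine.

```python
import math

DX = (1.0, 0.0, 0.0)
DY = (0.0, 1.0, 0.0)
DZ = (0.0, 0.0, 1.0)


def act(row, column):
    """Row vector acting on a column vector (1x3 times 3x1)."""
    return sum(r * c for r, c in zip(row, column))


def combine(*terms):
    """Linear combination of row vectors; each term is (coefficient, row)."""
    return tuple(sum(c * row[k] for c, row in terms) for k in range(3))


def f(x, y, z):                 # hypothetical scalar field
    return x * x * y + z


def partials(x, y, z):          # its partial derivatives, computed by hand
    return (2.0 * x * y, x * x, 1.0)


p = (1.0, 2.0, 3.0)
v = (0.01, -0.02, 0.03)

f_x, f_y, f_z = partials(*p)
df = combine((f_x, DX), (f_y, DY), (f_z, DZ))    # the row vector (4.0, 1.0, 1.0)

linear_part = act(df, v)                          # approximately 0.05
exact_change = f(p[0] + v[0], p[1] + v[1], p[2] + v[2]) - f(*p)
remainder = exact_change - linear_part            # the o(||v||) term
norm_v = math.sqrt(sum(c * c for c in v))

print(df, linear_part, exact_change, abs(remainder) / norm_v)
```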

Note (usage of $\Delta f$)

In this book, depending on context, $\Delta f$ may denote the exact difference $f(p+\mathbf{v})-f(p)$ or a shorthand for the first-order part with the remainder dropped. Strictly, by the definition of differentiability, $\Delta f = df(\mathbf{v}) + o(\|\mathbf{v}\|)$, and $df(\mathbf{v})$ is that first-order part. Later, when only the first-order approximation matters, we may omit the remainder and treat $\Delta f \approx df(\mathbf{v})$ in that sense.

If a function $f(x,y)$ involves only two coordinates, we can include it in the same framework by regarding it as independent of $z$, so that $\partial f/\partial z=0$. This three-dimensional framework contains the two-dimensional case naturally.

Compare the cases.

For one variable:

$$ df:=f'(x)\,dx=\begin{pmatrix}f'(x)&0&0\end{pmatrix}. $$

For a two-variable function $f(x,y)$, using $dy$ from §1.2.3:

$$ df:=\frac{\partial f}{\partial x}\,dx+ \frac{\partial f}{\partial y}\,dy = \begin{pmatrix} \frac{\partial f}{\partial x}& \frac{\partial f}{\partial y}&0 \end{pmatrix}. $$

For three variables $f(x,y,z)$, all three components of the row vector are present, as in the definition of $df$ earlier in this subsection.

Thus the form is the same; only the number of components increases. This is the advantage of the three-dimensional matrix representation. By viewing $df$ as a matrix, we can treat differentiation uniformly as an operator that gives the linear approximation with respect to a displacement.

1.2.7 A Preview of Line Integrals

Suppose a curve in space is parametrized by

$$ \mathbf{r}(t)=\begin{pmatrix}x(t)\\y(t)\\z(t)\end{pmatrix}, $$

and write the displacement of each small step as $\Delta\mathbf{r}$. Summing $df(\Delta\mathbf{r})$ over the steps and taking the limit as the steps become finer gives a quantity written symbolically as

$$ \int_\gamma df $$

where $\gamma$ denotes the curve. Just as with $\int_a^b f(x)\,dx$ in §1.1.4, the skeleton is the limit of a Riemann sum in which a $1$-form, here $df$, acts on the displacement of each step. The precise treatment of closed curves, orientation, changes of parameter, and so on is left to later chapters.
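
As a preview only, the sum of $df(\Delta\mathbf{r})$ over small steps can be tried on a concrete curve. Everything specific in the sketch below is an assumption of mine: the helix $\mathbf{r}(t)=(\cos t,\ \sin t,\ t)$, the scalar field $f=xy+z$ with $df=(y\ \ x\ \ 1)$, the choice to evaluate $df$ at the start of each step, and the number of steps. The comparison with $f$ at the endpoints relies on a standard fact that has not yet been stated in this book: for a $1$-form of the type $df$, the limit equals the difference of $f$ between the two ends of the curve.

```python
import math


def act(row, column):
    """Row vector acting on a column vector (1x3 times 3x1)."""
    return sum(r * c for r, c in zip(row, column))


def r(t):                       # hypothetical curve: a helix
    return (math.cos(t), math.sin(t), t)


def f(x, y, z):                 # hypothetical scalar field
    return x * y + z


def df_row(x, y, z):            # its differential as a row vector (y  x  1)
    return (y, x, 1.0)


def sum_of_df_over_steps(t_start, t_end, n):
    """Sum of df(delta_r) over n steps, with df taken at the start of each step."""
    total = 0.0
    for i in range(n):
        t_a = t_start + (t_end - t_start) * i / n
        t_b = t_start + (t_end - t_start) * (i + 1) / n
        p_a, p_b = r(t_a), r(t_b)
        delta_r = tuple(b - a for a, b in zip(p_a, p_b))   # displacement of the step
        total += act(df_row(*p_a), delta_r)
    return total


t_start, t_end = 0.0, math.pi / 2
approx = sum_of_df_over_steps(t_start, t_end, 2000)
endpoint_difference = f(*r(t_end)) - f(*r(t_start))

print(approx, endpoint_difference)   # close, and closer as the steps get finer
```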


Checkpoint so far

- Differentiability is understood as a linear approximation of the form $\Delta f=f'(x)\Delta x+o(|\Delta x|)$, where $o$ is Landau order notation.

- $df=f'(x)\,dx$ is a row vector. Its action on a displacement, $df(\mathbf{v})$, gives the linear part of the change.

- We look once at the standard starting point for substitution in integrals, but in this book we later reread it as “making measuring devices consistent” (§1.2.5).


§1.3 Leibniz Notation and Algebraic Intuition

When Leibniz introduced the symbols $dx$ and $dy$, he treated them intuitively as “infinitesimals.”

Note (Leibniz and notation)

Gottfried Wilhelm Leibniz (1646–1716) was the mathematician who organized the notation of calculus, including symbols such as $dx$ and $dy$. We will not go into the historical details here. I mention him only as the person who introduced the notation. Many readers are already used to the symbol $dx$ itself; if the name is unfamiliar, this one sentence is enough.

We have relocated that intuition into the language of linear algebra. Leibniz’s notation

$$ df=f'(x)\,dx $$

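is not merely a formal equality. Read as in §1.2, it is an equality between row vectors: $df$ is the row vector $dx$ scaled by $f'(x)$.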

Leibniz’s genius was to treat differentiation and integration as essentially algebraic operations. This book gives that intuition a concrete skeleton: matrices.

In the next section, we make explicit the hidden assumption behind this framework: the “convenience” of Cartesian coordinates. That also points toward more general coordinates.


Checkpoint so far

- In this book, the Leibnizian symbols $dx$ and $df$ are read as row vectors, that is, as operators acting on displacements.

- When discussing displacement widths, we write $\Delta x$; when discussing the linear part of a function’s change, we write $df(\mathbf{v})$.


§1.4 Coordinate Transformations — Rebuilding Measuring Devices in New Coordinates

Note (how to read this section)

It is all right if you do not fully understand this section on the first reading. What matters here is not the formula for cylindrical coordinates itself, but the feeling that, once we view $dx$ as a matrix, we can still compute when coordinates change.

To repeat: for now, a rough feel is enough. In Chapter 4, we will spend much more space on the same idea. There, we will start by actually mapping finite intervals and finite cells and measuring their images. That will let us explain the rebuilding of measuring devices in a more organized way, without relying on the assumption that something is “sufficiently small.” What follows here is closer to the approximate viewpoint often used in ordinary calculus and mathematical physics: we assume that the change in parameter space is sufficiently small and extract only the first-order term.

Throughout this chapter, we have discussed the matrix representation of $dx$ by setting

$$ dx:=\begin{pmatrix}1&0&0\end{pmatrix}. $$

As long as we view physical space in Cartesian coordinates $(x,y,z)$, this is the measuring device that extracts the $x$-component.

But physical problems are not always rectangular. In problems such as flow inside a circular pipe or the magnetic field around a straight current, it is often easier to use cylindrical coordinates, where a point is specified by the triple $(r,\theta,z)$.

So what does the measuring device $dx$ in physical space look like when viewed through cylindrical coordinates?

The important point is not to memorize a coordinate-transformation formula. The point is to rebuild a measuring device in physical space into one expressed in another set of variables, so that it returns the same first-order value.

Note (locality of cylindrical coordinates)

Here we do not deal with the periodicity of $\theta$ or the singularity at $r=0$. We treat cylindrical coordinates only as a local parametrization.

In actual computations in mathematical physics using cylindrical or spherical coordinates, one can often avoid coordinate singularities by choosing the integration range or the target region appropriately.

However, in problems involving the origin, an axis, or the periodicity of the angle in an essential way, this local representation is not enough. In such cases, one must handle coordinate patches and boundary conditions more carefully.

1.4.1 The Transformation from Cylindrical Coordinates to Physical Space

Write the transformation from the parameter space of cylindrical coordinates to physical space as

$$ \Phi(r,\theta,z) = \begin{pmatrix} r\cos\theta\\ r\sin\theta\\ z \end{pmatrix}. $$

This transformation sends the point $(r,\theta,z)$ in parameter space to the point $(x,y,z)$ in physical space. In other words,

$$ x=r\cos\theta,\qquad y=r\sin\theta,\qquad z=z. $$

The question is: what kind of measuring device does the physical-space $dx$ become on the parameter-space side?

1.4.2 Map a Small Step and Measure It with $dx$

Consider a small step

$$ \mathbf h = \begin{pmatrix} \Delta r\\ \Delta\theta\\ \Delta z \end{pmatrix} $$

from the point

$$ p=(r,\theta,z) $$

in parameter space.

After the move, the point is

$$ p+\mathbf h = (r+\Delta r,\theta+\Delta\theta,z+\Delta z). $$

But this $\mathbf h$ is not a displacement vector in physical space. It lists the changes in the parameter space of cylindrical coordinates.

Now send this small step into physical space by the transformation $\Phi$. The displacement in physical space is

$$ \Phi(p+\mathbf h)-\Phi(p). $$

Written componentwise, this is

$$ \Phi(r+\Delta r,\theta+\Delta\theta,z+\Delta z) - \Phi(r,\theta,z) = \begin{pmatrix} (r+\Delta r)\cos(\theta+\Delta\theta)-r\cos\theta\\ (r+\Delta r)\sin(\theta+\Delta\theta)-r\sin\theta\\ \Delta z \end{pmatrix}. $$

This vector is the true displacement in physical space.

The physical-space measuring device $dx$ extracts only the $x$-component from a physical-space displacement vector. Therefore, applying $dx$ to the displacement above gives

$$ dx\bigl(\Phi(p+\mathbf h)-\Phi(p)\bigr) = (r+\Delta r)\cos(\theta+\Delta\theta)-r\cos\theta. $$

This value measures how much the $x$-component in physical space changes when we move by $\mathbf h$ in parameter space.

1.4.3 Build a Measuring Device from the First-Order Term

Now assume that $\Delta r,\Delta\theta,\Delta z$ are small, and look only at the first-order terms.

First,

$$ \Delta x = (r+\Delta r)\cos(\theta+\Delta\theta)-r\cos\theta. $$

Using the first-order approximation

$$ \cos(\theta+\Delta\theta) = \cos\theta-\sin\theta\,\Delta\theta + \text{higher-order terms}, $$

we get

$$ \begin{aligned} \Delta x &= (r+\Delta r)\cos(\theta+\Delta\theta)-r\cos\theta\\ &= (r+\Delta r)(\cos\theta-\sin\theta\,\Delta\theta+\text{higher-order terms})-r\cos\theta\\ &= r\cos\theta +\cos\theta\,\Delta r -r\sin\theta\,\Delta\theta +\text{higher-order terms} -r\cos\theta\\ &= \cos\theta\,\Delta r -r\sin\theta\,\Delta\theta +\text{higher-order terms}. \end{aligned} $$

Since $\Delta z$ does not affect the $x$-component, we can write

$$ \Delta x = \cos\theta\,\Delta r -r\sin\theta\,\Delta\theta +0\cdot\Delta z + \text{higher-order terms}. $$

The first-order term we have obtained is

$$ \cos\theta\,\Delta r -r\sin\theta\,\Delta\theta +0\cdot\Delta z. $$

This is equal to the value obtained by applying a row vector to the small step in parameter space,

$$ \mathbf h = \begin{pmatrix} \Delta r\\ \Delta\theta\\ \Delta z \end{pmatrix}, $$

namely

$$ \begin{pmatrix} \cos\theta & -r\sin\theta & 0 \end{pmatrix} \begin{pmatrix} \Delta r\\ \Delta\theta\\ \Delta z \end{pmatrix} = \cos\theta\,\Delta r -r\sin\theta\,\Delta\theta +0\cdot\Delta z. $$

Therefore, if we rewrite the physical-space measuring device $dx$ as a measuring device on the cylindrical-coordinate parameter space that returns the same first-order value, we get

$$ \begin{pmatrix} \cos\theta & -r\sin\theta & 0 \end{pmatrix}. $$

This is what $dx$ looks like when viewed through cylindrical coordinates.
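To see that the discarded part really is of higher order, one can compare the exact $\Delta x$ from §1.4.2 with the first-order value given by the row $\begin{pmatrix}\cos\theta&-r\sin\theta&0\end{pmatrix}$. The following Python sketch does this at a sample point; the point, the step sizes, and the helper names are assumptions made for illustration only.

```python
import math

def phi(r, theta, z):
    """Cylindrical parameters -> physical-space point (x, y, z)."""
    return (r * math.cos(theta), r * math.sin(theta), z)

def exact_dx(p, h):
    """x-component of Phi(p + h) - Phi(p): the true change in x."""
    moved = tuple(a + b for a, b in zip(p, h))
    return phi(*moved)[0] - phi(*p)[0]

def first_order_dx(p, h):
    """Row (cos(theta), -r sin(theta), 0) applied to the parameter step h."""
    r, theta, _ = p
    row = (math.cos(theta), -r * math.sin(theta), 0.0)
    return sum(a * b for a, b in zip(row, h))

def gap(p, scale):
    """Exact value minus first-order value for the step (scale, scale, scale)."""
    h = (scale, scale, scale)
    return exact_dx(p, h) - first_order_dx(p, h)

p = (2.0, math.pi / 6, 0.0)   # sample point (assumed for illustration)
for scale in (1e-1, 1e-2, 1e-3):
    g = gap(p, scale)
    # If the leftover is second order, gap / scale**2 should settle down.
    print(scale, g, g / scale**2)
```

Shrinking the step by a factor of $10$ shrinks the leftover by roughly a factor of $100$, which is what "higher-order terms" means in practice.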

Note (splitting large geometry into small geometry)

Many textbooks draw a curved $(r,\theta)$ grid in physical space and explain geometrically that “the arc length in the $\theta$ direction is approximately $r\Delta\theta$.” That intuition itself is correct.

In this book, however, we first prioritize an algebraic procedure: map a small step in parameter space into physical space, and then measure its image.

The coefficient $-r\sin\theta$ carries the information of how much a step in the $\theta$ direction contributes to the $x$ direction in physical space. Similarly, $\cos\theta$ tells us how much the $r$ direction contributes to the $x$ direction, and $0$ tells us that the $z$ direction does not contribute to the $x$ direction.

In other words, the single row $\begin{pmatrix}\cos\theta & -r\sin\theta & 0\end{pmatrix}$ is the geometry of the change in the $x$-component split into three small components.

1.4.4 Reading It as a Pulled-Back Measuring Device

Read the computation above this way.

The physical-space measuring device $dx$ eats a displacement in physical space and returns its $x$-component. But when we want to compute in cylindrical coordinates, the input is not the physical-space displacement itself. The input is a small step in parameter space,

$$ \begin{pmatrix} \Delta r\\ \Delta\theta\\ \Delta z \end{pmatrix}. $$

So we build a measuring device that, when it eats a small step in parameter space, returns the same first-order value that $dx$ would have measured in physical space.

That measuring device was

$$ \begin{pmatrix} \cos\theta & -r\sin\theta & 0 \end{pmatrix}. $$

In the mathematics of measuring devices, this operation is called a pullback. Symbolically,

$$ \Phi^*(dx) = \cos\theta\,dr-r\sin\theta\,d\theta. $$

Here $dr,d\theta,dz$ are the measuring devices that extract, respectively, the first, second, and third components from a displacement in parameter space,

$$ \begin{pmatrix} \Delta r\\ \Delta\theta\\ \Delta z \end{pmatrix}. $$

Therefore,

$$ \cos\theta\,dr-r\sin\theta\,d\theta $$

is simply the parameter-space row vector

$$ \begin{pmatrix} \cos\theta & -r\sin\theta & 0 \end{pmatrix} $$

written in measuring-device notation.

In other words, it is safe here to read

$$ \Phi^*(dx) = \begin{pmatrix} \cos\theta & -r\sin\theta & 0 \end{pmatrix}. $$

The right-hand side, however, is a row vector that eats displacements in the parameter space of cylindrical coordinates:

$$ \begin{pmatrix} \Delta r\\ \Delta\theta\\ \Delta z \end{pmatrix}. $$
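The recipe "feed a small parameter step, map it with $\Phi$, measure the image" can also be run mechanically, one parameter axis at a time. The sketch below is a hedged numerical version: the function name `pull_back`, the step size `eps`, and the use of a symmetric difference are choices made for this example, not part of the text. It rebuilds the row for $\Phi^*(dx)$ numerically and compares it with the closed form.

```python
import math

def phi(r, theta, z):
    """Cylindrical parameters -> physical-space point (x, y, z)."""
    return (r * math.cos(theta), r * math.sin(theta), z)

def pull_back(row, p, eps=1e-6):
    """Numerically rebuild a Cartesian row vector on the parameter-space side.

    For each parameter axis, take a small symmetric step, map both ends
    with phi, measure the image displacement with `row`, and divide by the
    step length. The result approximates the pulled-back row at p.
    """
    entries = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += eps
        minus[i] -= eps
        image_step = [a - b for a, b in zip(phi(*plus), phi(*minus))]
        measured = sum(c * d for c, d in zip(row, image_step))
        entries.append(measured / (2 * eps))
    return tuple(entries)

dx = (1.0, 0.0, 0.0)              # physical-space dx as a row vector
p = (2.0, math.pi / 6, 0.0)       # sample point (assumed for illustration)

numeric = pull_back(dx, p)
closed_form = (math.cos(p[1]), -p[0] * math.sin(p[1]), 0.0)
print(numeric)
print(closed_form)
```

Passing the row $\begin{pmatrix}0&1&0\end{pmatrix}$ instead of `dx` would rebuild the physical-space $dy$ in the same way.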

Note (we built it from a finite step here)

Here we did not start by using the known formula

$$ dx=\frac{\partial x}{\partial r}dr+\frac{\partial x}{\partial\theta}d\theta+\frac{\partial x}{\partial z}dz. $$

First, we mapped a small step in parameter space into physical space and measured its image using the physical-space $dx$. Then we built a measuring device on the parameter-space side that returns the same first-order value.

In Chapter 4, we will extend this same idea to finite intervals, finite cells, and finite boxes. There, we will see how measuring devices must be rebuilt in order to preserve lengths of intervals, areas, and volumes.

1.4.5 Two Cartesian Coordinate Systems

Let us now make clear the relation between the two triples of numbers we are using: $(x,y,z)$ in physical space and $(r,\theta,z)$ in the parameter space of cylindrical coordinates.

The physical-space $(x,y,z)$ is the usual Cartesian coordinate system. On the other hand, $(r,\theta,z)$ is also a coordinate system consisting of three numbers on the calculation page. Physically, $\theta$ is an angle, but in parameter space it is treated as one coordinate axis.

Thus both physical space and parameter space have “straight components” for purposes of calculation. What is curved is the transformation $\Phi$ connecting the two spaces. That curvature appears as coefficients such as $\cos\theta$ and $-r\sin\theta$.

One can think of curved coordinate axes as growing inside physical space. But in this book, I try not to think that way. Instead, I think as follows:

There are two places: physical space and parameter space. Between them is a translation rule $\Phi$.

From this point of view, rebuilding measuring devices becomes fairly simple. We pull the measuring device in physical space back to the parameter-space side through the transformation $\Phi$. Then we can read, directly from a displacement in parameter space, the same first-order value that would have been measured in physical space.

Note (matrix representations of $dr$, $d\theta$, and $dz$)

Parameter space is also, for purposes of calculation, a space with three components. Thus $dr$ is the measuring device that extracts the first component in parameter space, $d\theta$ extracts the second, and $dz$ extracts the third.

In matrix representation,

$$ dr=\begin{pmatrix}1&0&0\end{pmatrix},\qquad d\theta=\begin{pmatrix}0&1&0\end{pmatrix},\qquad dz=\begin{pmatrix}0&0&1\end{pmatrix}. $$

But these are measuring devices acting on displacements in parameter space. They live on a different space from the $dx,dy,dz$ that act on physical-space $(x,y,z)$.

1.4.6 Foreshadowing Chapter 4

In this way, the pullback is the operation of rebuilding a measuring device from physical space so that it returns the same value on the parameter-space side.

In Chapter 1, we have only seen one example, using the smallest measuring device, $dx$. In Chapter 4, however, we will extend the same idea to area and volume.

There, we will pull back area-measuring devices such as

$$ dx\wedge dy $$

and volume-measuring devices such as

$$ dx\wedge dy\wedge dz $$

to other variable systems.

The coefficients that appear then are the familiar factor $r$ in cylindrical coordinates, or the Jacobian determinant $J$ that appears in general changes of variables.

In other words, the Chapter 4 formulas

$$ \Phi^*(dx\wedge dy)=r\,dr\wedge d\theta $$

and

$$ \Phi^*(dx\wedge dy\wedge dz)=J\,du\wedge dv\wedge dw $$

have the same structure as the formula we have just seen:

$$ \Phi^*(dx) = \cos\theta\,dr-r\sin\theta\,d\theta. $$
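As a preview only, the factor $r$ can already be checked by arithmetic. The sketch below assumes two things that this chapter has not yet established: that $dy$ pulls back to $\begin{pmatrix}\sin\theta&r\cos\theta&0\end{pmatrix}$ (obtained from $y=r\sin\theta$ by the same first-order computation as for $dx$), and that the coefficient of $dr\wedge d\theta$ is the $2\times2$ determinant formed from the first two entries of the two rows. Both are standard facts, but here they are taken on trust until Chapter 4.

```python
import math

def pulled_back_dx(r, theta):
    # Row for Phi^*(dx), as derived in this section
    return (math.cos(theta), -r * math.sin(theta), 0.0)

def pulled_back_dy(r, theta):
    # Row for Phi^*(dy); assumed here, derived the same way from y = r sin(theta)
    return (math.sin(theta), r * math.cos(theta), 0.0)

def dr_dtheta_coefficient(r, theta):
    """2x2 determinant of the (dr, dtheta) entries of the two rows."""
    a = pulled_back_dx(r, theta)
    b = pulled_back_dy(r, theta)
    return a[0] * b[1] - a[1] * b[0]

for r, theta in [(2.0, math.pi / 6), (0.5, 1.0), (3.0, -2.0)]:
    print(r, theta, dr_dtheta_coefficient(r, theta))
```

In each case the printed coefficient matches $r$ up to rounding, independent of $\theta$, since $\cos^2\theta+\sin^2\theta=1$.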

Note (the philosophy of delaying the metric)

The $\cos\theta$ and $-r\sin\theta$ that appeared here will later be organized in relation to the metric $g$ in Chapter 6. For now, however, it is enough to read them as the components of a measuring device rebuilt by the transformation $\Phi$.

It is also important to draw curved figures in physical space and estimate arc length as $r\Delta\theta$. But in this book, we first prioritize the algebraic procedure of mapping a small step in parameter space and measuring its image. That procedure will become important later when we compute areas and volumes.


Checkpoint so far

- The measuring device $dx$ itself extracts the $x$-component in physical space.

- When we want to compute in cylindrical coordinates, we rebuild the physical-space measuring device $dx$ as a measuring device on the parameter-space side.

- By mapping a small step in parameter space into physical space and measuring its image with $dx$, we can determine the components of the parameter-space measuring device.

- In cylindrical coordinates, $\Phi^*(dx)=\cos\theta\,dr-r\sin\theta\,d\theta$.

- In Chapter 4, we will extend the same idea to area-measuring devices and volume-measuring devices.


§1.5 Writing in Another Coordinate System — Exercises

Near the end of this chapter, we want you to work through, by hand, what it means to “write in another coordinate system.” We have not yet defined coordinate changes as a general theory, but in §1.4.3 we already saw that the same $dx$ gets a different matrix representation when coordinates change. This exercise lets you confirm that by calculation. Why do it before the full vocabulary of coordinate change is in place? So that when later chapters get busy, you already feel that changing components is only natural.

1.5.1 Exercise: Measurement in Cylindrical Coordinates

Problem setup As in the previous section, the matrix representation of $dx$ in cylindrical coordinates $(r,\theta,z)$ is

$$ dx = \begin{pmatrix} \cos\theta & -r\sin\theta & 0 \end{pmatrix} $$

(The row above is the row vector that acts on displacements in parameter space, constructed in §1.4.3. It is not obtained by taking the Cartesian row $\begin{pmatrix}1&0&0\end{pmatrix}$ from §1.1.3 and simply turning it into a column vector.)

Answer the following questions.

Question 1 Suppose a particle is at the point $P(r=2,\theta=\pi/6,z=0)$ in the parameter space of cylindrical coordinates. The particle undergoes a small displacement of $0.1$ in the $r$ direction. In this section, we denote by $\mathbf{v}$ the vector $\begin{pmatrix}\Delta r\\\Delta\theta\\\Delta z\end{pmatrix}$ listing the changes in the parameters $r,\theta,z$ (this ordering is different from the Cartesian column $\begin{pmatrix}\Delta x\\\Delta y\\\Delta z\end{pmatrix}$ in §1.0–§1.2). Write this displacement vector $\mathbf{v}$ in terms of its components.

Question 2 Apply the matrix $dx$ at the point $P$ to the displacement vector $\mathbf{v}$ from Question 1. (Use $\cos(\pi/6)=\sqrt{3}/2$.)

Question 3 Explain what the result $dx(\mathbf{v})$ from Question 2 means physically.

1.5.2 Solutions and Commentary

Answer to Question 1 The displacement is $0.1$ in the $r$ direction and zero in the $\theta$ and $z$ directions, so as a column vector listing changes in the cylindrical parameters, we write

$$ \mathbf{v} = \begin{pmatrix} 0.1 \\ 0 \\ 0 \end{pmatrix} $$

Answer to Question 2 At the point $P$, the matrix $dx$ is the $1\times 3$ row vector obtained by substituting $\theta=\pi/6$ and $r=2$:

$$ dx = \begin{pmatrix} \cos(\frac{\pi}{6}) & -2\sin(\frac{\pi}{6}) & 0 \end{pmatrix} = \begin{pmatrix} \frac{\sqrt{3}}{2} & -2 \times \frac{1}{2} & 0 \end{pmatrix} = \begin{pmatrix} \frac{\sqrt{3}}{2} & -1 & 0 \end{pmatrix} $$

(In cylindrical coordinates, the components of $dx$ depend on $r$ and $\theta$, so what we mean here is the row evaluated at the coordinates of the point $P$. The notation $dx(\mathbf{v})$ matches §1.1.3.)

What matters is that the second entry of the row vector is generally the coefficient $-r\sin\theta$, and thanks to $r$ it carries a length dimension. Substituting $r=2$ and $\theta=\pi/6$ gives the numerical value $-1$ for that coefficient, but the number $-1$ by itself does not carry a dimension. The correct reading is that the matrix entry $-r\sin\theta$ carries the length dimension. This is the key to dimensional analysis.

Applying this to the displacement vector $\mathbf{v}$ gives

$$ dx(\mathbf{v}) = \begin{pmatrix} \frac{\sqrt{3}}{2} & -1 & 0 \end{pmatrix} \begin{pmatrix} 0.1\\ 0\\ 0 \end{pmatrix} = 0.1 \times \frac{\sqrt{3}}{2} \approx 0.0866 $$

Answer to Question 3 (physical meaning) This result expresses the physical fact that a motion of $0.1$ outward in the $r$ direction in cylindrical coordinates corresponds, when viewed along the $x$ axis in physical space, to a forward displacement of about $0.0866$.

Note (“about” versus exact agreement)

For this displacement we have $\Delta\theta=\Delta z=0$, so from $x=r\cos\theta$ we get $\Delta x=(\Delta r)\cos\theta$ exactly. We wrote “about” only because the decimal display $0.0866$ is rounded.
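The arithmetic in Questions 1 and 2 can be checked in a few lines of Python. This is only a verification sketch; the variable names are chosen for this example.

```python
import math

r, theta = 2.0, math.pi / 6                        # the point P
dx_at_P = (math.cos(theta), -r * math.sin(theta), 0.0)
v = (0.1, 0.0, 0.0)                                # step of 0.1 in the r direction

# Row vector applied to the parameter-space displacement
first_order = sum(a * b for a, b in zip(dx_at_P, v))

# True change in x = r cos(theta) for the same step
exact = (r + v[0]) * math.cos(theta + v[1]) - r * math.cos(theta)

print(round(first_order, 4))   # 0.0866
print(first_order, exact)
```

For this purely radial step the two printed values agree to rounding error, in line with the note above.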

1.5.3 Motion in the Angular Direction

At the same point $P(r=2,\theta=\pi/6,z=0)$, let us also consider motion by $0.1$ in the $\theta$ direction.

Then the displacement in the parameter space of cylindrical coordinates is

$$ \mathbf{v} = \begin{pmatrix} 0\\ 0.1\\ 0 \end{pmatrix} $$

At the point $P$, $dx$ is again

$$ dx = \begin{pmatrix} \frac{\sqrt{3}}{2} & -1 & 0 \end{pmatrix} $$

so

$$ dx(\mathbf{v}) = \begin{pmatrix} \frac{\sqrt{3}}{2} & -1 & 0 \end{pmatrix} \begin{pmatrix} 0\\ 0.1\\ 0 \end{pmatrix} = -0.1 $$

Here, the second component of the input vector, $0.1$, is an angle and therefore dimensionless. But the second entry of the row vector, $-r\sin\theta$, carries a length dimension. Therefore the output $-0.1$ has the dimension of length.

This is an important property of linear forms. The matrix entries of $dx$ themselves are functions that include $r$, and even when the input is given in coordinate components (with mismatched dimensions), the output is always automatically adjusted to have the correct physical dimension, namely “length in the $x$ direction.” A linear form is a measuring device that absorbs the distortion of the coordinate system and outputs a physically meaningful measured value.
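Unlike the radial step of §1.5.2, the value $-0.1$ obtained here is only the first-order reading: $x=r\cos\theta$ is not linear in $\theta$, so the true change in $x$ for $\Delta\theta=0.1$ is slightly different (about $-0.108$). The following Python sketch, with helper names chosen for this example, compares the two for shrinking angular steps.

```python
import math

r, theta = 2.0, math.pi / 6                        # the point P
dx_at_P = (math.cos(theta), -r * math.sin(theta), 0.0)

def first_order(d_theta):
    """Row vector dx at P applied to the step (0, d_theta, 0)."""
    return dx_at_P[1] * d_theta

def exact(d_theta):
    """True change in x = r cos(theta) when only theta changes."""
    return r * math.cos(theta + d_theta) - r * math.cos(theta)

for d_theta in (0.1, 0.01, 0.001):
    print(d_theta, first_order(d_theta), exact(d_theta))
```

The gap between the two columns shrinks roughly like $(\Delta\theta)^2$, which is the sense in which $dx(\mathbf{v})$ gives the linear part of the change.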


§1.6 Summary of This Chapter and Outlook Toward the Next

Checkpoint — Chapter 1 as a whole

- Chapter 1 centers on reading $dx$ as a matrix / linear form and reading $\int f\,dx$ as the limit of actions. The main exposition leaned on the $x$ direction, but the Cartesian measuring devices $dy$ and $dz$ were defined in the same way as $dx$ in §1.2.3.

- $df$ is also unified as a row vector; the extension to several dimensions is simply an increase in the number of components.

- From the next chapter onward, we extend these measuring devices to the wedge product, exterior derivative, and Hodge star.

In this chapter we leaned on the $x$ direction for concrete examples of integrals and $df$, and often treated $y$ and $z$ as fixed “slices.” Even so, the linear forms in physical space are the full trio $dx$, $dy$, and $dz$ (§1.2.3).

In the next chapter we combine these three to introduce the wedge product, also called the exterior product $\wedge$. This constructs area-measuring devices and volume-measuring devices ($2$-forms and $3$-forms), and leads to higher differential forms, which are no longer limited to objects that eat a single vector and return a scalar. Aggregation along curves, surfaces, and regions (line integrals, surface integrals, volume integrals) will be treated in later chapters, after these measuring devices are in place.

In later parts we proceed to the exterior derivative $\mathrm{d}$, the Hodge star operator $\ast$, and the vector-analysis operators $\mathrm{grad}$ ($\nabla$), $\mathrm{curl}$ ($\nabla\times$), and $\mathrm{div}$ ($\nabla\cdot$), followed by Stokes’s theorem, Maxwell’s equations, and the basic equations of fluid mechanics. The correspondence between differential forms and vector fields in three dimensions can be organized using the Hodge star.

Let us gradually unravel how three-dimensional Euclidean space looks.