The Shift Cipher

Cryptography is the study of cryptosystems, which are systems for encrypting data. Encrypting means taking data and changing it such that any unwanted person should not be able to read the data, but not changing it so much that the data is irretrievable. Specifically, the data should be able to be decrypted, which means changing the data back to its original, readable form. The data can be anything–passwords for your e-mail account or important personal information banks keep on their customers. The cryptosystems, that is the methods for encrypting, often strongly depends on the tools and theory of number theory.

In particular, when data is a message, the original message is often referred to as “plain text” while the encrypted text is referred to as the “cipher text”. There are many cryptosystems when working with messages. Among these are “classical cryptosystems”, which are systems no longer really used in modern times, but are still of interest, are instructive, and still have their uses.

One such classical system is the shift cipher. Before getting into that, there is some required knowledge of number theory, particular basic modular arithmetic. The following articles provide this:

After, the following article will give an introduction to the shift cipher:


See below for answers.

  1. Encrypt “exit” using the shift cipher x \mapsto (x+5) \mod 26 , where the alphabet is number 0 through 25
  2. Decrypt “xvii” using the attack method of trying the reverse shift cipher x \mapsto (x-k) \mod 26 for each integer k in the range 0 through 25 until the original message comes out


Answers: (1) jcny, (2) tree.


Multiple Asymptotes and Rational Functions

When sketching rational functions (f\left(x\right)=p\left(x\right)/q\left(x\right) where p\left(x\right) and q\left(x\right) are polynomials) in MCV4U, there are many steps in the process when all is said and done. One of these steps is determining whether there are any asymptotes. There are a few different kinds:

  • Vertical asymptotes (VA): The vertical line x=c such that q\left(c\right)=0 but p\left(c\right)\neq0
  • Horizontal asymptotes (HA): The horizontal line y=c such that either {\displaystyle \lim_{x\rightarrow\infty}f\left(x\right)=c} or {\displaystyle \lim_{x\rightarrow-\infty}f\left(x\right)=c}
  • Oblique asymptotes (OA): When the degree of p\left(x\right) is 1 plus the degree of q\left(x\right) , the OA is the slanted line y=mx+b which is the quotient after dividing q\left(x\right) into p\left(x\right) via polynomial division

Once when I was teaching these concepts in a MCV4U class, the following very interesting question was asked: “Can a function have multiple asymptotes?” It turns out that the answer is pretty interesting and enlightening, and it is considered in this post.

For VAs the answer is a fairly straightforward “yes.” For example, simply add factors to the denominator and keep a constant numerator:

{\displaystyle f\left(x\right)=\frac{1}{\left(x-1\right)\left(x-2\right)\left(x-3\right)}}

This function has VAs x=1 , 2 , and 3 . But, for HAs and OAs the answer is no longer so clear. Consider the following graph:


This function, as a fact, has two HAs: y=\pm\pi/2 . Similarly, if one were to rotate the graph above while fixing the axis, a graph of a function would be created with two OAs.

With all this said, therefore it should be true that functions can have two HAs or OAs… right? In fact, despite this, it is not true for rational functions. That is, there are indeed functions that can satisfy this, but given any rational function in the world, there is not way it can have more than one HA or OA. Essentially, the reason for this is that the process in obtaining the asymptotes results in there only being one answer, so it is impossible for there to be anything more than that. Algebraically, the answer goes a bit beyond the curriculum, but not too far. These algebraic proofs is what follows.

Rational functions have at most one HA


{\displaystyle f\left(x\right)=\frac{a_{n}x^{n}+a_{n-1}x^{n-1}+\cdots+a_{1}x+a_{0}}{b_{m}x^{m}+b_{m-1}x^{m-1}+\cdots+b_{1}x+b_{0}}}

where a_{n}\neq0\neq b_{m} . Of course, we will assume that there actually is a HA. For this to occur, it must be that m\geq n . This is because otherwise m<n\Longrightarrow m-n<0 , so that when we do the usual process for finding HAs of

{\displaystyle \frac{a_{n}x^{n}+a_{n-1}x^{n-1}+\cdots+a_{1}x+a_{0}}{b_{m}x^{m}+b_{m-1}x^{m-1}+\cdots+b_{1}x+b_{0}}\times\frac{1/x^{n}}{1/x^{n}}}={\displaystyle \frac{a_{n}+a_{n-1}/x+\cdots+a_{1}/x^{n-1}+a_{0}/x^{n}}{b_{m}x^{m-n}+b_{m-1}x^{m-n-1}+\cdots+b_{1}x^{1-n}+b_{0}x^{-n}}}

and take x\rightarrow\pm\infty , we get everything in the denominator going to 0 , contradicting there being a HA:

{\displaystyle \lim_{x\rightarrow\pm\infty}f\left(x\right)=\frac{a_{n}+0+\cdots+0+0}{0+0+\cdots+0+0}=\frac{a_{n}}{0}}

So, indeed, we must have that m\geq n . Consequently, the process of finding a HA instead leads us to dividing by x^{m} and obtaining

{\displaystyle \lim_{x\rightarrow\pm\infty}f\left(x\right)=\lim_{x\rightarrow\pm\infty}{\displaystyle \frac{a_{n}x^{n-m}+a_{n-1}x^{n-m-1}+\cdots+a_{1}x^{1-m}+a_{0}x^{-m}}{b_{m}+\frac{b_{m-1}}{x}+\cdots+\frac{b_{1}}{x^{m-1}}+\frac{b_{0}}{x^{m}}}}}

If n=m , then the limit is a_{n}/b_{m} (computed like above). Otherwise, m>n\Longrightarrow n-m<0 so that the limit is 0/b_{m}=0 . This means that the HA is either y=a_{m}/b_{m} or y=0 , depending on n and m . However, the answer for the HA does not depend on whether the limit is x\rightarrow\infty or x\rightarrow-\infty . Consequently, the HA is either one of these, but not both. This completes the proof

Rational functions have at most one OA

Let f\left(x\right)=p\left(x\right)/q\left(x\right) . Since there can only be an OA if the degree of p\left(x\right) is 1 more than that of q\left(x\right) , let the degree of p\left(x\right) be n+1 and that of q\left(x\right) be n . Euclidean division of polynomials, a theorem of dividing polynomials, then says that for dividing p\left(x\right) by q\left(x\right) , there are unique polynomials s\left(x\right) (the quotient) and r\left(x\right) (the remainder) such that


where the degree of r\left(x\right) is less than the degree of q\left(x\right) , which is n . (Unique here means that given p\left(x\right) and q\left(x\right) , the only quotient and remainder you can get by dividing p\left(x\right) by q\left(x\right) is s\left(x\right) and r\left(x\right) –nothing else.) By dividing the equation by q\left(x\right) we get the alternate expression

{\displaystyle f\left(x\right)=\frac{p\left(x\right)}{q\left(x\right)}=s\left(x\right)+\frac{r\left(x\right)}{q\left(x\right)}}

According to the procedure for finding an OA, the OA should therefore be y=s\left(x\right) . However, we need to know that s\left(x\right) is linear, i.e. it has degree 1 . But, since p\left(x\right) has degree n+1 , q\left(x\right) has degree n , and r\left(x\right) has degree less than n , then according to the equation


it must be that s\left(x\right) has in fact degree 1 . So, indeed y=s\left(x\right)=mx+b is the OA. Moreover, it is the only answer because it is unique–that is, there cannot be any other OAs. This completes the proof.

Cantor Set Construction through Ternary Expansions

Recall the Cantor (middle-third) set \Lambda  which is equal to


where E_0 = [0,1]  and E_k  for k \geq 1  is obtained by removing the middle-thirds of each of the disjoint closed intervals making up E_{k-1}  , i.e. each disjoint closed interval [a,b]  making up E_{k-1}  gets mapped to [a, a + 1/3^k] \cup [a+2/3^k, b]  .

For example, to obtain E_1  , [0,1]  gets mapped to [0,1/3] \cup [2/3,1]  , and this latter set is E_1  . To obtain E_2  , [0,1/3]  gets mapped to [0,1/9] \cup [2/9,1/3]  , and [2/3,1]  gets mapped to [2/3, 7/9] \cup [8/9, 1]  , and E_2  is the union of these disjoint intervals.

The purpose of this post is to show that in fact

\displaystyle{S := \left\{ \sum_{k=1}^{\infty}a_{k}3^{-k}:a_{k}\in\left\{ 0,2\right\}\right\} } = \Lambda 

This set S  is the collection of all ternary expansions with coefficients 0  and 2  .

First, it is important to note that the elements of S  all coverge to real numbers. This is true because

\displaystyle{\sum_{k=1}^{\infty}{a_k 3^{-k}} \leq 2 \sum_{k=1}^{\infty}{3^{-k}} <\infty} 

the latter series being geometric.

Next, it will be shown that \Lambda \subseteq S  . Let x \in \Lambda  . It is sufficient to construct a sequence a_k \subseteq \{0,2\}  corresponding to x  such that

\displaystyle{x = \sum_{k=1}^{\infty}{a_k 3^{-k}}} (*) 

By definition of \Lambda , x \in E_k  for all k  . First, x \in E_0 = [0,1]  . Second, x \in E_1  . Since E_1  is the result of removing the middle-third of [0,1]  , therefore necessarily x  is in the left or right remaining third. If it was the left, define a_1 = 0  , and otherwise define a_1 = 2  . Now let k \geq 1  . Since x \in E_{k-1}  , therefore it is in one of the disjoint closed intervals [a,b]  making up E_{k-1}  . Since x \in E_k  and [a,b]  has its left and right thirds mapped into E_k  , therefore necessarily x  is in the left third (a_k = 0  ) or the right third (a_k = 2  ).

Inductively this defines a sequence a_k \subseteq \{0,2\}  corresponding to x  . Define the partial sums

\displaystyle{S_k = \sum_{i=1}^k{a_i 3^{-i}}} 

To show (*) holds for our choice of a_k  , it is sufficient to show that \forall{k}  , a_k  and S_k  are in one of the disjoint closed intervals [a,b]  making up E_k  with a = S_k  . This is sufficient because a property of \Lambda  is that the length of the [a,b]  , excluding E_0  , is 3^{-k}  (e.g. by an inductive argument), so |S_k - x| \leq |a-b| = 3^{-k} \rightarrow 0 as k \rightarrow \infty  . This would imply

\displaystyle{\sum_{k=1}^\infty{a_k 3^{-k}} = \lim S_k = x} 

We proceed by induction. For k=1  , if a_1 = 0  , then by definition x \in [0,1/3]  and S_1 = 0  . Otherwise a_1 = 2  in which case x \in [2/3,1]  and S_1 = 2/3  . Hence the claim holds for k=1  .

Now suppose that x  and S_k  are both in one of the disjoint closed intervals [a,b]  making up E_k  and S_k = a  . Then [a,b]  is mapped to [a,a+1/3^{k+1}] \cup [a+2/3^{k+1},b]  for E_{k+1}  . If x  is in the left interval, then a_{k+1} = 0  , in which case S_{k+1} = S_k = a . Otherwise x  is in the right interval in which case a_{k+1} = 2  , so

S_{k+1} = S_k + 2/3^{k+1} = a + 2/3^{k+1} 

This completes showing (*) so that indeed \Lambda \subseteq S  .

To show that S \subseteq \Lambda  as well, let x \in S  . Then

\displaystyle{\sum_{k=1}^\infty{a_k 3^{-k}} = x} 

and a_k \in \{0,2\}  . The latter fact and an earlier argument show that the partial sums S_k  of x are still a left endpoint of one of the disjoint closed intervals making up E_k  for all k  . Consequently S_k \in \Lambda  . Since \Lambda  is (topologically) closed, it then follows that

\lim S_k = x 

is in \Lambda  . This completes the proof.

Seki Takakazu and the theory of Resultants

Seki Takakazu was a Japanese mathematician during the Edo period, which ran from 1603 to 1868 and “was characterized by economic growth, arts and culture, and isolationism” [1, 2]. Takakazu actually was around the time of both Leibniz and Newton, but their work was independent [1].


Takakazu [1]

Among many things in Takakazu’s mathematical career, he was known for work in Elimination theory: Algorithmic approaches to getting rid of variables between polynomials of several variables [1, 3]. In regards to the Ontario secondary mathematics curriculum, there are some relevant applications of Takakazu’s work to single variable polynomials.

One particular instance is Takakazu’s work on what is known as resultants [1]. Recall (MHF4U, C3.1.1) that a polynomial of degree n , a nonnegative integer, is an expression of the form

p(x)=a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0

where a_k for 0 \leq k \leq n is a real number. In particular the above is a polynomial written as a function. For n=1 we have a linear function p(x)=a_1 x + a_0 and for n=2 we have a quadratic function p(x)=a_2 x^2 + a_1 x + a_0 . Interestingly, the work of resultants requires two polynomials. So let’s introduce another polynomial q(x) of degree m :

q(x)=b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0

again for real numbers b_k . In MHF4U a lot of work is done with polynomials of degree n \leq 4 , including graphing and factoring in many different ways. As an extension to this, consider the following question:

“With two polynomials p(x) and q(x) , how do we know if they share a common factor?”

Recall (MHF4U, C3) that this question is asking if there is (x-c) for some real number c such that (x-c) is a factor of p(x) and q(x) . There are a few ways to accomplish this using the tools from MHF4U:

  • Completely factor both polynomials and see if any one factor appears in both
  • Completely factor only one of the polynomials and use methods to see if these factors are also factors of the other (e.g. polynomial division, substitution)

However both of these methods require work that seems more complicated than what the question is asking. The area of resultants that Takakazu worked on in fact addresses this. Using a formula, the “resultant of p(x) and q(x) ” is a real number that can be calculated very fast by computers. Continuing, it turns out that the number is zero if and only if p(x) and q(x) have a common factor. That is, the answer to the question is:

“Yes, if and only if the resultant of p(x) and q(x) is 0 .”

But how do we compute this? The command

resultant[p(x) , q(x) , x ]

when given to WolframAlpha does exactly this. Let’s do an example. Suppose




Dealing with a degree 3 polynomial makes it not so easy to answer the question about a common factor. But in telling WolframAlpha the command

resultant[x^3 – 4x^2 – 7x + 10, x^2 + 3x – 4, x]

the answer given back is 0 . Recall that this means that there is indeed a common factor. This is a nice example of using technology in mathematics. Unfortunately, this method does not also tell us what the common factor actualy is, just that one exists.

If you are curious about what WolframAlpha is doing when it makes this computation, then read a little further on. Note though that even though it is related to the vectors content of MCV4U, it requires content typically first covered in a first-year undergraduate course on an area of math called linear algebra.

By definition, the resultant of arbitrary polynomials p(x) and q(x) is the determinant of their Sylvester matrix [4]. This matrix is a (n+m)\times(n+m) matrix (where n and m are the degrees of p(x) and q(x) resp.) where:

  • The first row of the matrix is the coefficients of p(x) in decreasing order of subscripts, with 0 entries on the right for any remaining entries
  • The second row is the first row but shifted to the right by one entry, so the first entry is now a 0 and there is one less 0 entry on the right
  • This rule continues for the following rows until there are no more zeros on the right
  • The remaining rows are the same but done with q(x) instead

[5]. For an example, using the p(x) and q(x) from the earlier example, the Sylvester matrix is

\begin{pmatrix} 1 & -4 & -7 & 10 & 0\\ 0 & 1 & -4 & -7 & 10\\ 1 & 3 & -4 & 0 & 0\\ 0 & 1 & 3 & -4 & 0\\ 0 & 0 & 1 & 3 & -4 \end{pmatrix}

The resultant of these particular p(x) and q(x) is then the determinant of this matrix. This can also be computed using WolframAlpha with the command

determinant {{1,-4,-7,10,0},{0,1,-4,-7,10},{1,3,-4,0,0},{0,1,3,-4,0},{0,0,1,3,-4}}

The Sylvester matrix is named after James Joseph Sylvester, an English mathematician in the 1800s [6].



Applications of Lebesgue integration to elementary analysis problems

In elementary analysis one might have come across the problem of showing

{\displaystyle \sum_{n=1}^{\infty}\sum_{k=1}^{\infty}a_{nk}=\sum_{k=1}^{\infty}\sum_{n=1}^{\infty}a_{nk}}

for a_{nk} \geq 0 . It turns out (Real and Complex Analysis, Rudin) that there is a more advanced proof using Lebesgue integration with respect to a counting measure!

Specifically, we can consider the counting measure \mu : P(\mathbb{Z}^+) \rightarrow [0,\infty] where P(\mathbb{Z}^+) is the power set of \mathbb{Z}^+ , so that \mu(E) is the cardinality of E if E is finite, and otherwise it is \infty (e.g. \mu(\{1,5,8\})=3 while \mu(\{1,3,5,\ldots\})=\infty ). Indeed the domain is P(\mathbb{Z}^+) so that in fact all functions f : \mathbb{Z}^+ \rightarrow [0,\infty] are measurable.

It turns out that integration of such a function f with respect to the counting measure is

\displaystyle \int_{\mathbb{Z}^+} f d\mu = \sum_{k=1}^{\infty} f(k)

That is, integrals with respect to the counting measure are just sums (if you are interested, see this document for a proof of this). A consequence (Real and Complex Analysis, Rudin) of Lebesgue’s Monotone Convergence Theorem is: If f_n : \mathbb{Z}^+ \rightarrow [0,\infty] (are measurable) and

\displaystyle f(x) = \sum_{n=1}^\infty f_n(x)


\displaystyle \int_{\mathbb{Z}^+} f d\mu = \sum_{n=1}^\infty \int_{\mathbb{Z}^+} f_n d\mu

From what was noted earlier the integrals can be replaced with sums:

\displaystyle \sum_{k=1}^\infty \sum_{n=1}^\infty f_n(k) = \sum_{k=1}^\infty f(k) = \sum_{n=1}^\infty \sum_{k=1}^\infty f_n(k) 

Consequently simply choosing f_n(k) = a_{nk} , which is measurable because as noted earlier all functions are measurable with respect to the counting measure, does the trick! This completes the proof.

Continuous derivative of the Fourier transform of a function

Define L_{\mathrm{bc}}^{1}\left(\mathbb{R}\right) to be the set of functions f:\mathbb{R}\rightarrow\mathbb{C} which are continuous, bounded, and satisfy
{\displaystyle \left\Vert f\right\Vert _{1}=\int_{-\infty}^{\infty}\left|f\right|<\infty}
Then given f\in L_{\mathrm{bc}}^{1}\left(\mathbb{R}\right) , define its Fourier transform \hat{f} to be
{\displaystyle \hat{f}\left(y\right)=\int_{-\infty}^{\infty}f\left(x\right)e^{-2\pi ixy}dx}
In my readings of the Fourier transform I encountered details about its derivative. In particular, if g\left(x\right)=-2\pi ixf\left(x\right) \in L_{\mathrm{bc}}^{1}\left(\mathbb{R}\right) , then \hat{f} is continuously differentiable and \hat{f}\,^{\prime}=\hat{g} . The proof I read of it left much to be desired, or at least much was left to the reader. I ended up putting considerable effort into a complete proof, using various resources from internet searches, and I thought I would give back with this proof itself. Link to proof: Continuous derivative of the Fourier transform of a function

Complex Fourier series calculations

Consider a function f:\mathbb{R}\rightarrow\mathbb{C} . Recall  f is periodic (with period  1 ) iff  \forall x f(x+1)=f(x) . For example,  e_{k}=e_{k}(x)=e^{2\pi ikx} is periodic:

e_{k}\left(x+1\right)=e^{2\pi ikx}e^{2\pi ik}=e^{2\pi ikx}\cdot1=e_{k}\left(x\right)

recalling that e^{iy}=\cos y+i\sin y for y\in\mathbb{R} . Assume  f is periodic and continuous. Define, for  k\in\mathbb{Z} ,

{\displaystyle c_{k}=c_{k}(f)=\int_{0}^{1}f(x)\overline{e_{k}}dx=\int_{0}^{1}f(x)e^{-2\pi ikx}dx}

The c_{k} are called the Fourier coefficients of  f , while the Fourier series is

{\displaystyle \sum_{k=-\infty}^{\infty}c_{k}e_{k}}

Let’s try an example. Take f\left(x\right)=x . Of course, this isn’t periodic. However implicity it is meant that \left.f\right|_{\left[0,1\right]}\left(x\right)=x , and f is the periodic extension of this mapping, i.e. \left.f\right|_{[1,2]}=x-1 \left.f\right|_{[-1,0]}=x+1 , etc. For k\neq0 ,

\begin{array}{ccc} c_{k} & = & {\displaystyle \int_{0}^{1}xe^{-2\pi ikx}}\\ \\ & = & {\displaystyle \int_{0}^{1}x\left(\cos\left(2\pi kx\right)-i\sin\left(2\pi kx\right)\right)}\\ \\ & = & {\displaystyle \int_{0}^{1}x\cos\left(2\pi kx\right)-i\int_{0}^{1}x\sin\left(2\pi kx\right)}\end{array}

Evaluating these (real) integrals individually, it is found that

{\displaystyle c_{k}=0-i\left(-\frac{1}{2\pi k}\right)=\frac{i}{2\pi k}}

Also c_{0}=\int_{0}^{1}x=1/2 . Hence the Fourier series of f is

{\displaystyle \sum_{k=-\infty}^{\infty}\frac{i}{2\pi k}e^{2\pi ikx}}

implicitly having the c_{0} -th term to be 1/2 , not i/\left(2\pi\cdot0\right) . However, immediately this a bit difficult to deal with, say if pointwise convergence were in question!

To deal with this, we derive an equivalent representation of the Fourier series. Notice that in general,

{\displaystyle \sum_{k=-n}^{n}c_{k}e_{k}=c_{0}+\sum_{k=1}^{n}(c_{-k}e_{-k}+c_{k}e_{k})}

Continuing, for k\ne0 , \overline{e_{k}}=e^{-2\pi kx}=e_{-k} . Similarly,

\begin{array}{ccc}c_{-k} & = & {\displaystyle \int_{0}^{1}f(x)e^{-2\pi i(-k)x}}\\ \\ & = & {\displaystyle \int_{0}^{1}f(x)e_{k}}\\ \\ & = & {\displaystyle \int_{0}^{1}f(x)\cos\left(2\pi kx\right)+i\int_{0}^{1}f(x)\sin\left(2\pi kx\right)}\end{array}

The same work shows

{\displaystyle c_{k}=\int_{0}^{1}f(x)\cos\left(2\pi kx\right)-i\int_{0}^{1}f(x)\sin\left(2\pi kx\right)}

It is then immediate that  c_{-k}=\overline{c_{k}} . Hence


recalling that  \forall z\in\mathbb{C} , \overline{z}+z=2\mathrm{Re}(z) . Consequently the equivalent expression

{\displaystyle c_{0}+\sum_{k=-n}^{n}2\mathrm{Re}(c_{k}e_{k})}

for the partials sums is obtained, thereby obtaining the equivalent expression

{\displaystyle c_{0}+\sum_{k=1}^{\infty}2\mathrm{Re}(c_{k}e_{k})}

for the Fourier series of f . This is much easier to work with for calculations. For example the recently calculated Fourier series can now be written as

{\displaystyle \sum_{k=1}^{\infty}2\mathrm{Re}\left(\frac{1}{2\pi k}\left(i\cos\left(2\pi kx\right)+i^{2}\sin\left(2\pi kx\right)\right)\right)=\sum_{k=1}^{\infty}-\frac{1}{\pi k}\sin\left(2\pi kx\right)}

Since 1/k\rightarrow0 and are decreasing, for pointwise convergence it is sufficient to show that

{\displaystyle \sum_{k=1}^{n}\sin\left(2\pi kx\right)}

are bounded. But this is true, and a common proof is due to Dirichlet (e.g. as explained in this post–easier to show \sum_{k=1}^{n}\sin\left(ky\right) is bounded and substitute y=2\pi x ). But more than pointwise convergence is true: Uniform convergence of the Fourier series of f\left(x\right)=x is guaranteed, and in fact to f . This by a theorem since f is continuous, periodic, and piecewise continuously differentiable.