Learning agents in Black–Scholes financial markets

Black–Scholes (BS) is a remarkable quotation model for European option pricing in financial markets. Option prices are calculated using an analytical formula whose main inputs are strike (at which price to exercise) and volatility. The BS framework assumes that volatility remains constant across all strikes; however, in practice, it varies. How do traders come to learn these parameters? We introduce natural agent-based models, in which traders update their beliefs about the true implied volatility based on the opinions of other agents. We prove exponentially fast convergence of these opinion dynamics, using techniques from control theory and leader-follower models, thus providing a resolution between theory and market practices. We allow for two different models, one with feedback and one with an unknown leader.


Introduction
Derivative contracts are actively traded across the world's financial markets, with a total estimated worth in the trillions of dollars. To get an intuitive understanding of the setting and the issues at hand, let us consider the prototypical example of European options.
A European option is the right to buy or sell an underlying asset at some point in the future at a fixed price, also known as the strike. A call option gives the right to buy an asset and a put option gives the right to sell an asset at the agreed price. On the opposite side of the buyer is the seller, who has relinquished control of exercise. Buyers of puts and calls can exercise the right to buy or sell. Sellers of options have to fulfill their obligations when exercised against. The payoff of a buyer of a call option with stock price \(S_T\) at expiry time \(T\) and exercise price \(K\) is \(\max\{S_T - K, 0\}\), whereas for a put option it is \(\max\{K - S_T, 0\}\).
To get a price, we input the current stock price \(S_0\) (e.g. $101), the exercise price \(K\) (e.g. $90), the expiry \(T\) (e.g. three months from today) and the volatility \(\sigma\) into the Black-Scholes (BS) formula, and out comes the quoted price of the instrument [5]: \(\mathrm{Price} = \mathrm{BS}(S_0, K, T, \sigma)\).
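As a rough illustration of the map from \((S_0, K, T, \sigma)\) to a price, the following minimal sketch evaluates the standard Black-Scholes formula for a non-dividend-paying asset. The function names, the risk-free rate parameter `r` (set to zero by default) and the quote `sigma = 0.20` are assumptions made for this example; they do not appear in the text.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(s0: float, k: float, t: float, sigma: float,
             r: float = 0.0, kind: str = "call") -> float:
    """Black-Scholes price of a European call or put (no dividends).

    t is the time to expiry in years. r is an assumed risk-free rate,
    which the notation BS(S0, K, T, sigma) in the text leaves implicit.
    """
    sqrt_t = math.sqrt(t)
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    discounted_k = k * math.exp(-r * t)
    if kind == "call":
        return s0 * norm_cdf(d1) - discounted_k * norm_cdf(d2)
    return discounted_k * norm_cdf(-d2) - s0 * norm_cdf(-d1)


# Example from the text: S0 = $101, K = $90, T = three months (0.25 years).
# sigma = 0.20 is a made-up quote used only for illustration.
call = bs_price(101.0, 90.0, 0.25, 0.20)
put = bs_price(101.0, 90.0, 0.25, 0.20, kind="put")
print(f"call = {call:.4f}, put = {put:.4f}")
```

With a zero rate, put-call parity requires the call minus the put to equal \(S_0 - K\), which gives a quick sanity check on the two prices.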
Volatility, which captures the beliefs about how turbulent the stock price will be, is left up to the market. This parameter is so important that in practice the market trades European calls and puts by quoting volatilities. How the market decides what the quoted volatility should be (e.g. for a stock index three months from now) is a critical, but not well understood, question. This is exactly what we aim to study by introducing models of learning agents who update their beliefs about the volatility.
Figure 1: (a) A typical implied volatility smile for varying strikes \(K\) divided by the fixed spot price. Moneyness is \(K/S_0\). ATM denotes at-the-money, where \(K\) equals \(S_0\). (b) Consensus occurs as all investors' opinions of the implied volatility converge, round by round, to a distinct value for varying strikes.
Our contribution. We introduce two different classes of learning models that converge to a consensus. The first introduces a feedback mechanism (Section 3.1, Theorem 3.1) where agents who are off the true "hidden" volatility parameter feel a slight (even infinitesimally so) pull towards it, along with all the other "random" chatter of the market. This model captures the setting where traders have access to an alternative trading venue or an information source provided by brokers and private message boards. The second model incorporates a market leader (e.g. Goldman Sachs) that is confident in its own internal metrics or is privy to client flow (private information) and does not give any weight to outside opinions (Section 3.2, Theorem 3.3). Proving the convergence results (as well as establishing the exponentially fast convergence rates) requires tools from discrete dynamical systems. We showcase and complement our theoretical results with experiments (e.g. Figures 2(a)-2(d)), which show, for example, that convergence is no longer guaranteed if we move away from our models.
Options can be struck at different strike prices on the same asset (e.g. K = $90, $75, $60). If the underlying asset and the time to exercise \(T\) (e.g. 3 months) are the same, one would expect the volatility to be the same at different strikes. In practice, however, the market after the 1987 crash has evolved to exhibit different volatilities. This rather strange phenomenon is referred to as the smile, or smirk (see Figure 1). Depending on the market, these smirks can be more or less pronounced. For instance, equity markets display a strong skew or smirk. A symmetric smile is more common in foreign exchange options markets. An excellent introduction to volatility smiles is given in [8].
We formalize the multi-dimensional analogues of our two models above using Kronecker products (Section 4, Theorems 4.1 and 4.3). Thus our models show how a volatility curve could function as a global attractor given adaptive agents. We conclude the paper by discussing future work on identifying necessary structural conditions on the shape of arbitrage-free volatility curves.

Model description
In mathematical opinion dynamics models, agents take the views of other agents into account before arriving at their own updated estimate. Agents can observe other agents' previous signals.
DeGroot [6] was one of the early developers of such observational learning dynamics. While simple, these models allow us to examine convergence to consensus. These types of models are called naive models, in the sense that agents can perfectly recall what the other players submitted in the previous round.

Volatility Basics
Investors have an initial opinion of the implied volatility, which subsequently gets updated after taking into account the volatilities of other agents. A feedback mechanism aids the agents in arriving at the true volatility parameter.
At all times the focus is on a static picture of the volatility smile. Within this static framework, agents are updating their opinion of the true implied volatility. This updating occurs in a high-frequency sense. In an exchange setting, one can think of all bids and offers as visible to agents. The agents are initially unsure of the true value of the implied volatility but, through learning and feedback, get to the true parameter. Our first attempt is a naive learning model common in social networks. Learning occurs between trading times. Thus our implicit assumption is that no transactions occur while traders are adjusting and learning each other's quotes.
This rather peculiar feature is market practice. Trading happens at longer intervals than quote updating. This is as true for high-frequency trading of stocks as it is for options markets. Quotes and prices, or rather vols, change more frequently than actual transactions occur.
Each dollar value of an option corresponds to an implied volatility parameter \(\sigma(K, T) \in (0, 1)\) that depends on strike and expiry. Implied volatility is quoted in percentage terms.
Assumption 2.1. We have three types of players: agents/traders, brokers and leaders. Brokers give feedback to the traders. The ability of agents to determine this feedback is their learning ability. Leaders are unknown and do not give feedback, but their quotes are visible.
Each agent takes a weighted average of all the agents' estimates of volatility at a particular strike and expiry.

Naive Opinion Dynamics
A first approach towards opinion dynamics is to assume that each agent takes a weighted average of the other agents' opinions and updates his own estimate of the volatility parameter for the next period, i.e., at time \(t\), the opinion \(x_t^i \in \mathbb{R}\) of the \(i\)-th agent is given by
\[ x_t^i = \sum_{j=1}^{n} a_{ij} x_{t-1}^j, \tag{1} \]
where \(x_{t-1}^j \in \mathbb{R}\) is the opinion of agent \(j\) at time \((t-1)\) and \(a_{ij} \geq 0\) denotes the opinion weights for the \(n\) investors, with \(\sum_{j=1}^{n} a_{ij} = 1\) and \(a_{ii} > 0\) for all \(1 \leq i \leq n\). Define \(X_t := (x_t^1, \ldots, x_t^n)^T\); then, the opinion dynamics of the \(n\) agents can be written in matrix form as follows
\[ X_t = A X_{t-1}, \tag{2} \]
where \(A := (a_{ij}) \in \mathbb{R}^{n \times n}\) is a row-stochastic matrix.

Definition 2.2 (consensus). The \(n\) agents (2) are said to reach consensus if for any initial condition \(X_1 \in \mathbb{R}^n\), \(\lim_{t\to\infty} |x_t^i - x_t^j| = 0\) for all \(i, j \in \{1, \ldots, n\}\).
Definition 2.3 (consensus to a point). The \(n\) agents (2) are said to reach consensus to a point if for any initial condition \(X_1 \in \mathbb{R}^n\), \(\lim_{t\to\infty} X_t = c 1_n\), where \(1_n\) denotes the \(n \times 1\) vector composed of only ones and \(c \in \mathbb{R}\). The constant \(c\) is often referred to as the consensus value.
For the opinion dynamics (2), we recall the following result from [6] (see also [14] for definitions).
Proposition 2.4. Consider the opinion dynamics in equation (2). If \(A\) is aperiodic and irreducible, then for any initial condition \(X_1 \in \mathbb{R}^n\) consensus to a point is reached. The consensus value \(c\) depends on both the matrix \(A\) and the initial condition \(X_1\).
Remark 2.5. Proposition 2.4 implies that if the row-stochastic opinion matrix \(A\) is aperiodic and irreducible, then all the agents converge to some consensus value \(c\). However, since \(c\) depends on the unknown initial opinion \(X_1\), the consensus value \(c\) is unknown and, in general, different from the true volatility \(\sigma(K, T)\). We wish to alleviate this and thus introduce two novel models.
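The point of Remark 2.5 can be seen in a small simulation of the naive dynamics (2). The sketch below is only illustrative: the 3-agent weight matrix and the two initial opinion vectors are made-up values, chosen so that the matrix is row stochastic with strictly positive entries (hence aperiodic and irreducible).

```python
def step(a, x):
    """One round of the naive update X_t = A X_{t-1}."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def run(a, x1, rounds):
    """Iterate the update `rounds` times starting from the opinions x1."""
    x = list(x1)
    for _ in range(rounds):
        x = step(a, x)
    return x


# Hypothetical row-stochastic weights (each row sums to one).
A = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]

x_a = run(A, [0.10, 0.30, 0.50], 200)
x_b = run(A, [0.20, 0.40, 0.90], 200)

spread_a = max(x_a) - min(x_a)  # disagreement left after 200 rounds
print("consensus from first start: ", x_a[0])
print("consensus from second start:", x_b[0])
```

Both runs end in agreement among the agents, but at different values, so nothing ties the naive consensus to the true volatility.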

Consensus (scalar agent dynamics)
In this section, we assume that the agents are able to learn how far off they are from the true volatility through informational channels in the marketplace. There are many avenues, platforms and private online chat rooms that provide quotes for option prices; some of these are stale and some are fresh. The agents' learning ability determines the quality of the feedback from all these sources.
We aggregate all of this information in the form of a feedback controller. If the agents are fast learners, they adjust their volatility estimates quickly.

Consensus with Feedback
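We model this feedback by introducing an extra driving term into the opinion dynamics (1). In particular, we feed back the difference between the agent's opinion and the true volatility \(\sigma(K, T)\), scaled by a learning coefficient \(\epsilon_i \in (0, 1)\). We assume that \(\sigma(K, T)\) is invariant, i.e., \(\sigma(K, T) = \sigma\) for some fixed \(\sigma \in (0, 1)\) at a fixed strike \(K\) and maturity \(T\). Then, the new model is written as follows
\[ x_t^i = \sum_{j=1}^{n} a_{ij} x_{t-1}^j + \epsilon_i (\sigma - x_{t-1}^i), \tag{3} \]
or in matrix form
\[ X_t = A X_{t-1} + E(\sigma 1_n - X_{t-1}), \tag{4} \]
where \(E := \mathrm{diag}(\epsilon_1, \ldots, \epsilon_n)\). Then, we have the following result.
Theorem 3.1. Consider the opinion dynamics (4) and assume that \(\epsilon_i \in (0, a_{ii})\), \(i \in \{1, \ldots, n\}\). Then consensus to \(\sigma\) is reached, i.e., \(\lim_{t\to\infty} X_t = \sigma 1_n\).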
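Proof. It is easy to verify that the solution \(X_t\) of the difference equation (4) is given by
\[ X_t = (A - E)^{t-1} X_1 + \sum_{j=0}^{t-2} (A - E)^j E \sigma 1_n. \]
By the Gershgorin circle theorem, the spectral radius satisfies \(\rho(A - E) < 1\) whenever \(\epsilon_i < a_{ii}\) for all \(i\). It follows that \(\lim_{t\to\infty} \sum_{j=0}^{t-2} (A - E)^j = (I_n - A + E)^{-1}\), where \(I_n\) denotes the identity matrix of dimension \(n\), and \(\lim_{t\to\infty} (A - E)^t = 0\), see [9]. The matrix \(A\) is row stochastic; hence \((I_n - A) 1_n = 0_n\), where \(0_n\) denotes the \(n \times 1\) vector composed of only zeros. Hence, we can write \(E 1_n = (I_n - A) 1_n + E 1_n\), and consequently
\[ \lim_{t\to\infty} X_t = (I_n - A + E)^{-1} (I_n - A + E) \sigma 1_n = \sigma 1_n, \]
and the assertion follows.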
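Corollary 3.2. Consensus to \(\sigma\) is reached exponentially, with convergence rate given by \(\|A - E\|_\infty\), where \(\|\cdot\|_\infty\) denotes the matrix norm induced by the vector infinity norm.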
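Proof. Define the error sequence \(E_{t-1} := (X_{t-1} - \sigma 1_n) \in \mathbb{R}^n\). Then, from (4), the following is satisfied:
\[ E_t = X_t - \sigma 1_n = (A - E)(X_{t-1} - \sigma 1_n) + (A - I_n) \sigma 1_n = (A - E) E_{t-1}. \]
The last equality in the above expression follows from the fact that \((A - I_n) 1_n = 0\), because \(A\) is a stochastic matrix. The solution \(E_t\) of the above difference equation is given by \(E_t = (A - E)^{t-1} E_1\), where \(E_t = (e_t^1, \ldots, e_t^n)^T\). Note that exponential convergence of \(\|E_t\|_\infty\) implies exponential convergence of \(E_t\) itself. Using the solution \(E_t = (A - E)^{t-1} E_1\), the following can be written:
\[ \|E_t\|_\infty = \|(A - E)^{t-1} E_1\|_\infty \leq \|A - E\|_\infty^{t-1} \|E_1\|_\infty, \]
where \(\|A - E\|_\infty\) denotes the matrix norm of \((A - E)\) induced by the vector infinity norm [9].
The inequality \(\|A - E\|_\infty < 1\) holds because, for \(\epsilon_i \in (0, a_{ii})\), all entries of \(A - E\) are nonnegative and its \(i\)-th row sums to \(1 - \epsilon_i < 1\). Hence \(\|E_t\|_\infty\) converges to zero exponentially.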
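The feedback dynamics (4) and the exponential bound on the error in the infinity norm can be checked numerically. The following is a minimal sketch under assumed values: the 3-agent weight matrix and the initial opinions are hypothetical, while \(\sigma = 0.375\) and the choice \(\epsilon_i = 0.9 a_{ii}\) mirror the values quoted in the experiments.

```python
SIGMA = 0.375  # "true" volatility, the value quoted in the experiments

# Hypothetical row-stochastic weights (each row sums to one).
A = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
EPS = [0.9 * A[i][i] for i in range(3)]  # satisfies 0 < eps_i < a_ii


def feedback_step(a, eps, sigma, x):
    """One round: x_i <- sum_j a_ij x_j + eps_i * (sigma - x_i)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + e_i * (sigma - x_i)
            for row, e_i, x_i in zip(a, eps, x)]


def inf_norm_a_minus_e(a, eps):
    """Induced infinity norm of A - E: the largest absolute row sum."""
    return max(
        sum(abs(a_ij - (e_i if i == j else 0.0)) for j, a_ij in enumerate(row))
        for i, (row, e_i) in enumerate(zip(a, eps))
    )


def sup_error(x, sigma):
    """Infinity norm of the error vector X - sigma * 1_n."""
    return max(abs(x_i - sigma) for x_i in x)


rate = inf_norm_a_minus_e(A, EPS)
x = [0.10, 0.30, 0.65]  # X_1, made-up initial opinions
initial_error = sup_error(x, SIGMA)
bound_holds = True
for t in range(2, 62):  # after the update, x holds X_t
    x = feedback_step(A, EPS, SIGMA, x)
    bound = rate ** (t - 1) * initial_error
    bound_holds = bound_holds and sup_error(x, SIGMA) <= bound + 1e-12
final_error = sup_error(x, SIGMA)
print("rate:", rate, "final error:", final_error)
```

The small tolerance in the bound check only absorbs floating-point rounding; it is not part of the mathematical statement.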

Consensus with an unknown leader
One criticism of model (4) is that feedback, even if it is not perfect, has to be learned. In practice, there might not be a helpful mechanism that provides feedback. An alternative is to have an unknown leader embedded in the set of traders. The agents are unsure who the leader is but, by taking averages of other traders' opinions, they all arrive at the opinion of the leader. In Markov chain theory, the leader plays the role of an absorbing state. The leader guides the system to the true value. We assume that the identity of the leader is unknown to all agents.
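Without loss of generality, we assume that the first agent (with corresponding opinion \(x_t^1\)) is the leader; it follows that \(x_1^1 = \sigma\), \(a_{1i} = 0\), \(i \in \{2, \ldots, n\}\), and \(a_{11} = 1\). Then, in this configuration, the opinion dynamics is given by
\[ X_t = A X_{t-1}, \qquad A = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \qquad \tilde{A} := \begin{pmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{n2} & \cdots & a_{nn} \end{pmatrix}, \tag{6} \]
with \(a_{ij} \geq 0\), \(\sum_{j=1}^{n} a_{ij} = 1\), \(a_{ii} > 0\) for all \(1 \leq i \leq n\), and, for at least one \(i\), \(\sum_{j=2}^{n} a_{ij} < 1\).
Theorem 3.3. Consider the opinion dynamics (6) and assume that the matrix \(\tilde{A}\) is substochastic and irreducible. It holds that \(\lim_{t\to\infty} X_t = \sigma 1_n\), i.e., consensus to \(\sigma\) is reached.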
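Proof. Define the invertible matrix \(M \in \mathbb{R}^{n \times n}\)
\[ M := \begin{pmatrix} 1 & 0 \\ -1_{n-1} & I_{n-1} \end{pmatrix}. \]
Introduce the set of coordinates \(\tilde{X}_{t-1} := M X_{t-1} = (x_{t-1}^1, \tilde{x}_{t-1}^2, \ldots, \tilde{x}_{t-1}^n)^T\), with \(\tilde{x}_{t-1}^i := x_{t-1}^i - x_{t-1}^1\). Hence, if the error vector \(e_{t-1} := (\tilde{x}_{t-1}^2, \ldots, \tilde{x}_{t-1}^n)^T = 0_{n-1}\), then consensus to \(x_t^1 = \sigma\) is reached. Note that
\[ M A M^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & \tilde{A} \end{pmatrix}, \]
where \(0\) denotes the zero vector of appropriate dimensions and \(\tilde{A}\) is as defined in (6). By construction, \(\tilde{X}_t = M A M^{-1} \tilde{X}_{t-1}\); hence, the consensus error \(e_t\) satisfies the following difference equation
\[ e_t = \tilde{A} e_{t-1}, \tag{7} \]
and the solution of \(e_t\) is then given by \(e_t = \tilde{A}^{t-1} e_1\). Because, for at least one \(i\), \(\sum_{j=2}^{n} a_{ij} < 1\) and \(\tilde{A}\) is substochastic and irreducible, the spectral radius \(\rho(\tilde{A}) < 1\), see Lemma 6.28 in [14]; it follows that \(\lim_{t\to\infty} \tilde{A}^t = 0\). Therefore, \(\lim_{t\to\infty} e_t = 0\) and the assertion follows.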
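Corollary 3.4. Let \(\|\cdot\|_*\) denote some matrix norm such that \(\|\tilde{A}\|_* < 1\); then consensus to \(\sigma\) is reached exponentially with convergence rate \(\|\tilde{A}\|_*\).
Proof. See Lemma 5.6.10 in [9] on how to construct such a \(\|\cdot\|_*\). Now consider the consensus error \(e_t\) defined in the proof of Theorem 3.3, which evolves according to the difference equation (7). It follows that \(e_t = \tilde{A}^{t-1} e_1\), where \(e_1\) denotes the initial consensus error. Under the assumptions of Theorem 3.3, \(\rho(\tilde{A}) < 1\). By Lemma 5.6.10 in [9], \(\rho(\tilde{A}) < 1\) implies that there exists some matrix norm, say \(\|\cdot\|_*\), such that \(\|\tilde{A}\|_* < 1\). We restate the error with norms and obtain \(\|e_t\|_\infty \leq \|\tilde{A}^{t-1}\|_\infty \|e_1\|_\infty\). Because all norms are equivalent in finite-dimensional vector spaces (see Chapter 5 in [9]), \(\|\tilde{A}^{t-1}\|_\infty \leq C \|\tilde{A}^{t-1}\|_* \leq C \|\tilde{A}\|_*^{t-1}\), and therefore \(\|e_t\|_\infty \leq C \|\tilde{A}\|_*^{t-1} \|e_1\|_\infty\) for some positive constant \(C \in \mathbb{R}_{>0}\). As \(\|\tilde{A}\|_* < 1\), the norm of the consensus error \(\|e_t\|_\infty\) converges to zero exponentially with rate \(\|\tilde{A}\|_*\).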
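The leader model can be simulated in the same spirit. In the sketch below, the first row of a hypothetical 3-agent weight matrix is replaced by \((1, 0, 0)\), so the first agent never moves from \(\sigma\); the weights and the followers' initial opinions are made-up values for illustration, and \(\sigma = 0.375\) is the value quoted in the experiments.

```python
SIGMA = 0.375  # the leader's (true) volatility quote

# Hypothetical weights: the leader (row 0) listens to nobody, and each
# follower puts positive weight on the leader, so the lower-right 2x2 block
# is substochastic (row sums 0.8 and 0.9) and irreducible.
A_LEADER = [[1.0, 0.0, 0.0],
            [0.2, 0.6, 0.2],
            [0.1, 0.4, 0.5]]


def step(a, x):
    """One round of X_t = A X_{t-1}."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


x = [SIGMA, 0.10, 0.65]  # leader starts at sigma; followers are off
gaps = []
for _ in range(300):
    x = step(A_LEADER, x)
    gaps.append(max(abs(x_i - SIGMA) for x_i in x))

print("opinions after 300 rounds:", x)
print("largest gap to sigma:", gaps[-1])
```

The followers never need to know which agent is the leader: averaging alone pulls them to the leader's opinion.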

Consensus (vectored agent dynamics)
In this section, we suppose that agents have beliefs over a range of strikes. Thus, each agent's opinion of the volatility curve is a vector, with each entry corresponding to a particular strike. Typically, in markets, options are quoted for at-the-money (ATM) \(K = S_0\) and for two further strikes left of and right of the ATM level. Here, we examine the case of \(k\) strikes and \(n\) agents, i.e., each agent \(i\) now has \(k\) quotes for \(k\) different moneyness levels. In this configuration, the true volatility is a vector \(\sigma \in \mathbb{R}^k\), with one entry per strike.

Consensus with Feedback
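Again, we assume that each agent takes a weighted average of other agents' opinions and updates its volatility estimate vector for the next period, i.e., at time \(t\), the opinion \(x_t^i \in \mathbb{R}^k\) of the \(i\)-th agent is given by
\[ x_t^i = \sum_{j=1}^{n} a_{ij} x_{t-1}^j + \epsilon_i (\sigma - x_{t-1}^i), \tag{8} \]
where \(\epsilon_i \in (0, 1)\) denotes the learning coefficient of agent \(i\), \(x_{t-1}^j \in \mathbb{R}^k\) is the opinion of agent \(j\) at time \((t-1)\), and \(a_{ij} \geq 0\) denotes the opinion weights for the \(n\) investors, with \(\sum_{j=1}^{n} a_{ij} = 1\) and \(a_{ii} > 0\) for all \(1 \leq i \leq n\). In this case, the stacked vector of opinions is \(X_t := ((x_t^1)^T, \ldots, (x_t^n)^T)^T\), \(X_t \in \mathbb{R}^{kn}\). The opinion dynamics of the \(n\) agents can then be written in matrix form as follows
\[ X_t = (A \otimes I_k) X_{t-1} + (E \otimes I_k)\big((1_n \otimes \sigma) - X_{t-1}\big), \tag{9} \]
where \(A = (a_{ij}) \in \mathbb{R}^{n \times n}\) is a row-stochastic matrix, \(E = \mathrm{diag}(\epsilon_1, \ldots, \epsilon_n)\), and \(\otimes\) denotes the Kronecker product. We have the following result.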
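Theorem 4.1. Consider the opinion dynamics in (9) and assume that \(\epsilon_i \in (0, a_{ii})\), \(i \in \{1, \ldots, n\}\); then, consensus to \(1_n \otimes \sigma\) is reached, i.e., \(\lim_{t\to\infty} X_t = 1_n \otimes \sigma\).
Proof. Define the error sequence \(e_{t-1} := X_{t-1} - (1_n \otimes \sigma)\). Note that \(e_{t-1} = 0\) implies that consensus to \((1_n \otimes \sigma)\) is reached. Given the opinion dynamics (9), the evolution of the error \(e_{t-1}\) satisfies the following difference equation
\[ e_t = ((A - E) \otimes I_k) e_{t-1} + ((A - I_n) 1_n) \otimes \sigma. \]
It is easy to verify that, because \(A\) is stochastic, \((A - I_n) 1_n = 0_n\). Then, the error dynamics simplifies to
\[ e_t = ((A - E) \otimes I_k) e_{t-1}, \tag{10} \]
and consequently, the solution \(e_t\) of (10) is given by \(e_t = ((A - E) \otimes I_k)^{t-1} e_1\). By properties of the Kronecker product and Gershgorin's circle theorem, the spectral radius \(\rho((A - E) \otimes I_k) = \rho(A - E) < 1\) for \(\epsilon_i \in (0, a_{ii})\). It follows that \(\lim_{t\to\infty} ((A - E) \otimes I_k)^t = 0\), see [9]. Therefore, \(\lim_{t\to\infty} e_t = 0_{kn}\) and the assertion follows.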
Corollary 4.2. Consensus to \(\sigma\) is reached exponentially, with the convergence rate given by \(\|(A - E) \otimes I_k\|_\infty\).
The proof of the above result is very similar to that of the previous corollaries and is omitted.
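To make the Kronecker structure of (9) concrete, the sketch below builds \(A \otimes I_k\) and \(E \otimes I_k\) explicitly for a small case and iterates the stacked dynamics. The helper names, the 3-agent weight matrix and the initial opinions are assumptions for illustration; the target \(\sigma = (0.67, 0.22, 0.88)\) and the choice \(\epsilon_i = 0.9 a_{ii}\) are the values quoted in the experiments.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def identity(k):
    """k-by-k identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


SIGMA_VEC = [0.67, 0.22, 0.88]  # one true volatility per strike
A = [[0.5, 0.3, 0.2],           # hypothetical row-stochastic weights
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
n, k = len(A), len(SIGMA_VEC)
E = [[0.9 * A[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]

A_K = kron(A, identity(k))
E_K = kron(E, identity(k))
target = [s for _ in range(n) for s in SIGMA_VEC]  # the vector 1_n (x) sigma

# Stacked opinions (agent 1's k quotes, then agent 2's, ...); made-up start.
x = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
for _ in range(80):
    averaged = matvec(A_K, x)
    pull = matvec(E_K, [tg - x_i for tg, x_i in zip(target, x)])
    x = [p + q for p, q in zip(averaged, pull)]

max_gap = max(abs(x_i - tg) for x_i, tg in zip(x, target))
print("largest gap to 1_n (x) sigma:", max_gap)
```

Because the second Kronecker factor is the identity, quotes at different strikes never mix: each strike evolves as an independent copy of the scalar feedback model.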

Consensus with an unknown leader
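Similarly to the scalar case, here we assume that there is a leader driving all the other agents through the opinion matrix \(A\). Again, without loss of generality, we assume that the first agent (with corresponding opinion \(x_t^1 \in \mathbb{R}^k\)) is the leader; it follows that \(x_1^1 = \sigma\), \(a_{1i} = 0\), \(i \in \{2, \ldots, n\}\), and \(a_{11} = 1\). Then, in this configuration, the opinion dynamics is given by
\[ X_t = (A \otimes I_k) X_{t-1}, \tag{11} \]
with \(a_{ij} \geq 0\), \(\sum_{j=1}^{n} a_{ij} = 1\), \(a_{ii} > 0\) for all \(1 \leq i \leq n\), and, for at least one \(i\), \(\sum_{j=2}^{n} a_{ij} < 1\).
Theorem 4.3. Consider the opinion dynamics (11) and assume that the matrix \(\tilde{A}\) is substochastic and irreducible; then, consensus to \(1_n \otimes \sigma\) is reached, i.e., \(\lim_{t\to\infty} X_t = 1_n \otimes \sigma\).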
The proof of Theorem 4.3 follows the same lines as the proof of Theorem 3.3 and is omitted here.
Corollary 4.4. Let \(\|\cdot\|_*\) denote some matrix norm such that \(\|\tilde{A}\|_* < 1\); then consensus to \(\sigma\) is reached exponentially with convergence rate \(\|\tilde{A}\|_*\).
For the leader case, the opinion weights matrix is constructed by replacing the first row of \(A\) by \((1, 0, \ldots, 0)\). The corresponding matrix \(\tilde{A}\) (defined in (6)) is substochastic and irreducible, and \(\sum_{i=2}^{10} a_{ij} < 1\), \(j = 1, \ldots, 10\). Hence, all the conditions of Theorem 3.3 are satisfied and consensus to \(\sigma = 0.375\) is expected. Figure 2(d) shows the corresponding simulation results. Finally, Figure 3 shows the evolution of the vectored opinion dynamics (9) with \(n = 10\) and \(k = 3\) (i.e., ten three-dimensional agents), matrix \(A\) as in the case with feedback, (vectored) volatility \(\sigma = (0.67, 0.22, 0.88)^T\), learning parameters \(\epsilon_i = 0.9 a_{ii}\) for \(a_{ii}\) as in \(A\), and initial condition \(1_k \otimes X_1\) with \(X_1\) as in the first experiment above.

Arbitrage Bounds
We have taken the true volatility parameter as exogenous to our models. Our only requirement is that there is no static arbitrage, by which we mean that all the volatility quotes, once translated into option prices, are such that one cannot trade in the different strikes to create a riskless profit. Checking whether a volatility surface is indeed arbitrage free is nontrivial; nevertheless, some sufficient conditions are well known [4]. As long as the volatility surface satisfies them, our analysis implies global stability towards an arbitrage-free smile.
We parameterize the volatility function (assuming expiry \(T\) and \(S_0\) are fixed) and denote the option price as \(\mathrm{BS}(K, \sigma(K)) := \mathrm{BS}(S_0, K, T, \sigma(K))\).
Our attention is on varying \(K\), to ensure no static arbitrage. We assume that \(\sigma(K)\) translates into unique call option dollar prices, which follows from the strictly positive first derivative of the option price with respect to \(\sigma\).
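The one-to-one correspondence between a volatility quote and a dollar price can be illustrated by inverting the Black-Scholes call price numerically. The sketch below uses bisection, which relies only on the price being strictly increasing in \(\sigma\). The zero interest rate, the search bracket \([10^{-6}, 5]\) and the function names are assumptions made for this example.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0: float, k: float, t: float, sigma: float) -> float:
    """Black-Scholes call price with zero interest rate (an assumption)."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(s0 / k) + 0.5 * sigma ** 2 * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return s0 * norm_cdf(d1) - k * norm_cdf(d2)


def implied_vol(price: float, s0: float, k: float, t: float,
                lo: float = 1e-6, hi: float = 5.0, tol: float = 1e-10) -> float:
    """Invert bs_call in sigma by bisection on the bracket [lo, hi].

    The call price is strictly increasing in sigma, so the bracket
    contains at most one volatility matching the given price.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s0, k, t, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


quote = 0.375  # volatility quote, the value used in the experiments
price = bs_call(101.0, 90.0, 0.25, quote)
recovered = implied_vol(price, 101.0, 90.0, 0.25)
print(f"price = {price:.4f}, recovered volatility = {recovered:.6f}")
```

Bisection is chosen here for robustness rather than speed; a Newton iteration on vega would be a common alternative.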
Developing these conditions for arbitrage-free volatility curves is not an easy task; see the account in [13]. Delving into this topic would take us further into stochastic analysis and away from the focus of this paper.
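As a rough illustration of what a static check across strikes looks like, the sketch below prices calls from a set of volatility quotes and tests two standard necessary conditions: call prices should be non-increasing and convex in the strike. These two checks are not the sufficient conditions of [4] and do not by themselves establish absence of arbitrage. The zero interest rate, the function names and the "spiked" quotes are assumptions for this example; the strikes are those mentioned in the introduction.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0: float, k: float, t: float, sigma: float) -> float:
    """Black-Scholes call price with zero interest rate (an assumption)."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(s0 / k) + 0.5 * sigma ** 2 * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return s0 * norm_cdf(d1) - k * norm_cdf(d2)


def passes_basic_checks(strikes, prices, tol=1e-12):
    """Necessary (not sufficient) checks on call prices across strikes.

    strikes must be sorted in increasing order. Returns True when prices
    are non-increasing and convex in the strike, up to the tolerance.
    """
    slopes = [(p2 - p1) / (k2 - k1)
              for k1, k2, p1, p2 in zip(strikes, strikes[1:], prices, prices[1:])]
    non_increasing = all(s <= tol for s in slopes)
    convex = all(s2 >= s1 - tol for s1, s2 in zip(slopes, slopes[1:]))
    return non_increasing and convex


S0, T = 101.0, 0.25
STRIKES = [60.0, 75.0, 90.0]

flat = [bs_call(S0, k, T, 0.375) for k in STRIKES]
spiked = [bs_call(S0, k, T, v) for k, v in zip(STRIKES, [0.20, 0.90, 0.20])]

flat_ok = passes_basic_checks(STRIKES, flat)
spiked_ok = passes_basic_checks(STRIKES, spiked)
print("flat quotes pass:", flat_ok, "| spiked quotes pass:", spiked_ok)
```

The spiked quotes fail the convexity test: a butterfly spread centred on the middle strike would have a negative price.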

Connections and Conclusion
Recently, there has been some rather interesting work on the intersection of computer science and option pricing. In [7] the authors showed how to use efficient online trading algorithms to price the current value of financial instruments, deriving both upper and lower bounds. Moreover, [2,1] developed the Black-Scholes price as a sequential two-player zero-sum game. Whilst these papers made an excellent start in bridging the gap between two different academic communities (mainly mathematical finance and theoretical computer science), they do not address the reality of volatility smiles and trading. Our contribution can be viewed as making these connections more concrete. The smile itself is a conundrum and there have even been articles questioning whether it can be solved [3]. The traditional way from the ground up is to develop a stochastic process for the volatility and asset price, possibly introducing jumps or more diffusions through uncertainty [10]. Such models have been successfully developed, but the time is ripe to incorporate multi-agent models with arbitrage-free curves.
Combining learning agents with stochastic differential equation models [15], such as the Black-Scholes model, is an exciting proposition. Moreover, opinion dynamics as a subject on its own has been studied quite extensively. Recent references that present an expansive discussion are [12,11].
In this paper, we introduce models of learning agents in the context of option trading. A key open question in this setting is how the market comes to a consensus about market volatility, which is reflected in derivative pricing through the Black-Scholes formula. The framework we have established allows us to explore other areas. Thus far, we took the smile as an exogenous object, proving convergence to equilibrium beliefs. A natural step forward would be to look at the beliefs as probability measures, where each measure corresponds to a different option pricing model. Our learning models focus on interaction between agents. In fact, agents can be interpreted as algorithms, with each algorithm corresponding to a particular belief about a pricing model.

Figure 2: Evolution of the agents' dynamics (4): (a) without learning, (b) with learning and \(\epsilon_i\) satisfying the conditions of Theorem 3.1, (c) with learning and \(\epsilon_i\) not satisfying the conditions of Theorem 3.1, and (d) evolution of the agents' dynamics with leader (6).