\(\sqrt{\left(x-z\right)^2}=\left|x-z\right|\) and \(\sqrt{\left(x-y\right)^2}+\sqrt{\left(y-z\right)^2}=\left|x-y\right|+\left|y-z\right|\)
From Triangle inequality (Also proven in previews question) We know that
\(\left|x-z\right|\le\left|x-y\right|+\left|y-z\right|\)
Since all 4 properties are satisfied the distance function is a metric
3. Distance function =\(L_3\left(x,y\right)=\sum_{i=1}^d\left(x_{i-}y_i\right)^2\) is a metric. (5 points)
i. non-negativity:-
\(D\left(x,y\right)=\sum_{i=1}^d\left(x_{i-}y_i\right)^2\)
Since \(\left(x-y\right)^2\ \ge0\ \forall\ x,y\ \epsilon\ R\)
therefore \(D\left(x,y\right)\ge0\) ( summation of all positive terms)
ii. Isolation(definiteness)
\(d\left(x,y\right)=0\ \Leftrightarrow\ x=y\)
if x=y \(\Rightarrow\left(x-y\right)^2\ =\left(x-x\right)^2=0\)
d(x,x)=0
Hence \(d\left(x,y\right)=0\ \Leftrightarrow\ x=y\)
iii. Symmetry
\(d\left(x,y\right)=d\left(y,x\right)\)
\(Given\ D\left(x,y\right)=\sum_{i=1}^d\left(x_i-y_i\right)^2\)
Since \(\left(x-y\right)^{2\ }=\left(y-x\right)^2\ \forall\ x,y\ \epsilon\ R\)
Therefore \(\sum_{i=1}^d\left(x_i-y_i\right)^2=\sum_{i=1}^d\left(y_i-x_i\right)^2\)
Hence \(d\left(x,y\right)=d\left(y,x\right)\)
iv. Triangle Inequality
\(d\left(x,z\right)\le d\left(x,y\right)+d\left(y,z\right)\)
To Prove: \(\left(x-z\right)^2\le\left(x-y\right)^2+\left(y-z\right)^2\)
consider x =4 y=2 and z =-2
\(LHS=\left(4+2\right)^2=36\)
\(RHS=\left(4-2\right)^2+\left(2+2\right)^2=4+16=20\)
Since LHS is not less than or equal to RHS triangle inequality fails.
Therefore \(D\left(x,y\right)=\sum_{i=1}^d\left(x_{i-}y_i\right)^2\) is not a metric
Exercise 4 (20 points): Consider a set of n data points x1, . . . , xn.
⢠Assume a random number generator R() that generates values in the interval (0, 1]. Let distance function dR between two points xi and xj with xi 6= xj be dR(xi , xj ) = R(). Also dR(xi , xi) = 0 for every xi . Prove or disprove that dR is a metric. (10 points)
To prove a distance function to be a metric we need to prove the 4 properties discussed in previous question.
i. Symmetry:
\(d\left(x,y\right)=d\left(y,x\right)\)
Since \(d_R\left(x_i,x_j\right)=R\left(\right)\ \ \ \ \left(Here\ Function\ R\ returns\ a\ random\ number\ between\ 0\ and\ 1\right)\)
\(d_R\left(x_i,x_j\right)\ would\ not\ be\ equal\ to\ d_R\left(x_j,x_i\right)\) because R() would be different each time.
Since the distance function does not satisfy the property of Symmetry it is not a metric
- Construct a graph G = (V, E), where a point xi is represented by node vi ā V . For every pair of nodes vi , vj , there exists an undirected edge in G with weight wij = dR(xi , xj ). We define the distance function between two nodes vi and vj , denoted by dG(vi , vj ), be the weight of the shortest path between vi and vj in graph G. Prove or disprove that dG is a metric. (15 points)
To prove a distance function to be a metric we need to prove the 4 properties discussed in previous question.
i. non-negativity:- \(d_G\left(x_{i,}x_j\right)\) gives the shortest path between two nodes \(x_{i\ }and\ x_j\) .Therefore it would be always positive because shortest path can never be negative[Weights in a graph are always positive].
ii.Isolation \(d\left(x_i,y_j\right)=0\ \Leftrightarrow\ x_i=y_j\)
if \(x_i=x_j\), we would get the shortest distance between a node and itself which will always be zero .
iii. Symmetry
\(d\left(x,y\right)=d\left(y,x\right)\)
The shortest distance between two nodes of an un-directed graph would always be same irrespective of the order of the terms as calculating it would involve traversing the exact same nodes for the shortest path. Therefore \(d\left(x,y\right)=d\left(y,x\right)\)
iv. Triangle Inequality
\(d\left(x,z\right)\le d\left(x,y\right)+d\left(y,z\right)\)
\(d\left(x,z\right)\) gives the shortest path between x and z and therefore it will always be shorter or equal to their path via another node y because in case path through y is the shorter then it would be considered as the shortest path between x and z.
Therefore \(d\left(x,z\right)\le d\left(x,y\right)+d\left(y,z\right)\)
Since all 4 properties have been satisfied the distance function is a metric
Excercise 5
i.Prove or disprove that the edit distance function as defined above is a metric.
To prove a distance function to be a metric we need to prove the 4 properties discussed in previous question(q2)
i. non-negativity:- .
Edit distance gives the costof delete,insert or substitute operations which depends on the number of operations required to transform string x into string y and therefore it will always be positive.
ii. Isolation:
\(d\left(x_{ },y_{ }\right)=0\ \Leftrightarrow\ x_{ }=y_{ }\)
if x =y the number of operations required to convert x to y would be zero and hence the edit distance would be zero.
iii. Symmetry
\(d\left(x,y\right)=d\left(y,x\right)\)
Since d(x,y )gives the cost of operations to convert x to y it would be equal to cost of operations to convert y to x as both would involve the same operations.
eg.
x=apple and y=bapple
to convert x to y we need to substitute a with b and to convert y to x we need to substitute b with a. Cost of both the operations would be the same because both involve 1 substitution operation
iv. Triangle inequality
\(d\left(x,z\right)\le d\left(x,y\right)+d\left(y,z\right)\)
As edit distance is the cost of conversion of strings
the cost to convert x into z would be less than converting x into y and y into z
eg.
x=apple y=bapple z=capple
Consider cost of substitution to be 1
d(x,y)=1
d(x,y)+d(y,z)=1+1=2
therefore \(d\left(x,z\right)\le d\left(x,y\right)+d\left(y,z\right)\)
Since all 4 properties have been satisfied the distance function is a metric
2. (10 points:) Find two instantiations of the edit-distance function that are metrics. An instantiation of the edit distance function is defined by a specific way of allocating costs to operations such as deletions, insertions and substitutions.
{ \(D\left(x\left(1..i-1\right),y\left(1..j\right)\right)+delete\left(x\left[i\right]\right),\)
\(D\left(x\left(1...i\right),y\left(1....j\right)\right)=\min\) { \(D(x(1..i),y(1..j-1))+insert(y[j]),\)
{ \(D(x(1..i-1),y(1..j-1))+substitute(y[j]),\)
Instantiation 1.
- Cost of deletion and insertion is equal to 1 and substitution=2 . This Distance is also known as Levenshtein distance.
{ \(D(x(1..i),y(1..j-1))+1,\)
\(D\left(x\left(1...i\right),y\left(1....j\right)\right)=\min\) { \(D(x(1..i),y(1..j-1))+1,\)
{ \(D(x(1..iā1),y(1..jā1))+\) 2 { if \(x\left(i\right)\ne y\left(j\right)\)
0 {\(x\left(i\right)=y\left(j\right)\)