Cosine Similarity Calculator

Enter two vectors as comma-separated numbers to calculate their cosine similarity, the angle between them in degrees, and the cosine distance. Works for any pair of vectors from 2 to 6 dimensions. The step-by-step panel shows the dot product, both magnitudes, and how each component contributes to the result.

Inputs

Number of components in each vector, followed by each component of vector A and vector B. Both vectors must have the same number of components.
Cosine Similarity: 0.974632 (nearly identical direction)

Ranges from -1 (opposite) to +1 (identical direction)

Angle Between Vectors: 12.9332 degrees
Cosine Distance: 0.025368
Dot Product: 32
Magnitude |A|: 3.741657
Magnitude |B|: 8.774964

Scale: Opposite (below -0.5), Dissimilar (-0.5 to 0), Moderate (0 to 0.5), Similar (0.5 to 0.95), Identical direction (0.95 and above)

Cosine similarity of 0.9746 - the vectors point in nearly the same direction.

  • The angle between the vectors is 12.93 degrees.
  • Cosine distance (1 - similarity) is 0.0254. Distance of 0 means identical direction, 2 means fully opposite.
  • Cosine similarity measures only the angle, not the length of the vectors. Scaling either vector by any positive constant leaves the similarity unchanged.
  • The dot product is 32.0000. Its sign matches the sign of the similarity because the magnitudes of nonzero vectors are always positive.

Next step: In NLP and search engines, vectors often represent word counts or TF-IDF weights, and a similarity above 0.7 typically indicates closely related documents.

Formula

$$
\mathrm{SC}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}, \quad \theta = \arccos(\mathrm{SC}), \quad D_C = 1 - \mathrm{SC}
$$

Worked example

Vectors A = (1, 2, 3) and B = (4, 5, 6): dot product = 1*4 + 2*5 + 3*6 = 32. |A| = sqrt(1+4+9) = sqrt(14) = 3.741657. |B| = sqrt(16+25+36) = sqrt(77) = 8.774964. Cosine similarity = 32 / (3.741657 * 8.774964) = 0.974632. Angle = arccos(0.974632) = 12.93 degrees. Cosine distance = 1 - 0.974632 = 0.025368.

What is cosine similarity?

Cosine similarity is a measure of how similar two vectors are based purely on the angle between them, regardless of their lengths. Given vectors A and B, it is defined as the dot product of A and B divided by the product of their Euclidean magnitudes. The result is the cosine of the angle separating the two vectors, which ranges from -1 (pointing in exactly opposite directions) through 0 (perpendicular, or orthogonal) to +1 (pointing in exactly the same direction). Because it ignores vector length, cosine similarity is especially useful in high-dimensional spaces such as text analysis, recommendation systems, and image embeddings, where the magnitude of a vector often just reflects document length or encoding scale rather than content.

How cosine similarity is calculated

The calculation involves three steps. First, compute the dot product: multiply each pair of corresponding components and add the products together. For A = (a1, a2, a3) and B = (b1, b2, b3), the dot product is a1*b1 + a2*b2 + a3*b3. Second, compute the Euclidean magnitude of each vector: for A, take the square root of a1^2 + a2^2 + a3^2, and similarly for B. Third, divide the dot product by the product of the two magnitudes. The angle between the vectors is then the arccosine of this result. Cosine distance, a related but distinct measure, is simply 1 minus the cosine similarity, giving a value between 0 (identical direction) and 2 (exactly opposite direction). Note that cosine distance is not a proper metric because it does not satisfy the triangle inequality.
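The three steps can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the calculator's own implementation; the function name `cosine_similarity` and the clamping of the result to [-1, 1] (to absorb floating-point rounding before calling `acos`) are assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    # Step 1: dot product of corresponding components.
    dot = sum(x * y for x, y in zip(a, b))
    # Step 2: Euclidean magnitude of each vector.
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    if mag_a == 0 or mag_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Step 3: divide; clamp so rounding error cannot push the value outside [-1, 1].
    return max(-1.0, min(1.0, dot / (mag_a * mag_b)))

# The vectors from the worked example.
similarity = cosine_similarity([1, 2, 3], [4, 5, 6])
angle_degrees = math.degrees(math.acos(similarity))
cosine_distance = 1 - similarity
print(round(similarity, 6), round(angle_degrees, 4), round(cosine_distance, 6))
```

Run on A = (1, 2, 3) and B = (4, 5, 6), the rounded values should agree with the results shown by the calculator.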

Applications in NLP and machine learning

Cosine similarity is the backbone of many information retrieval and recommendation systems. In natural language processing, documents or sentences are converted to vector representations such as TF-IDF weights, word embeddings, or sentence embeddings. Comparing those vectors with cosine similarity returns a score between -1 and 1 that reflects how similar the texts are in meaning or topic, regardless of how long each document is. A similarity above about 0.7 is often treated as high in NLP tasks, although the useful threshold depends on the vector representation. In recommendation engines, user preference vectors and item feature vectors are compared the same way. In image retrieval, feature vectors from convolutional neural networks are compared by cosine similarity to find visually similar images. The measure also appears in clustering, anomaly detection, and semantic search.
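As a rough illustration of the retrieval use case, the sketch below ranks a toy corpus against a query using raw word counts. The documents, the query, and all names (`documents`, `to_counts`, `ranked`) are hypothetical and chosen only for this example; real systems would typically use TF-IDF weights or learned embeddings instead of plain counts.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical toy corpus; the second document is the first one repeated twice.
documents = {
    "doc_cats": "the cat sat on the mat",
    "doc_cats_long": "the cat sat on the mat the cat sat on the mat",
    "doc_stocks": "stocks fell as markets closed lower",
}
query = "cat on the mat"

# Shared vocabulary so every text maps to a vector of the same length.
vocabulary = sorted({w for text in [query, *documents.values()] for w in text.split()})

def to_counts(text):
    """Bag-of-words count vector over the shared vocabulary."""
    counts = Counter(text.split())
    return [counts[w] for w in vocabulary]

query_vec = to_counts(query)
scores = {name: cosine_similarity(query_vec, to_counts(text))
          for name, text in documents.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:.4f}")
```

The repeated document scores the same as the short one because doubling every count only scales the vector, while the unrelated document shares no words with the query and scores 0.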

Cosine similarity vs. Euclidean distance

Cosine similarity and Euclidean distance answer different questions. Euclidean distance measures how far apart two points are in space; it is sensitive to both the direction and the magnitude of the vectors. Cosine similarity measures only the angle and is completely insensitive to magnitude. Two vectors that differ only in scale - say (1, 2) and (10, 20) - have cosine similarity 1 (identical direction) but a large Euclidean distance. This makes cosine similarity the right tool when the important information is in the pattern or shape of the values, not their absolute size. When magnitudes do carry information, for example comparing actual physical distances or prices, Euclidean distance is more appropriate. In practice, normalized vectors (those with magnitude 1) yield the same ranking from both measures, so for normalized data the choice does not matter.
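The contrast can be checked numerically. The sketch below is illustrative only; the helper names (`euclidean_distance`, `normalize`) are assumptions for this example. It also checks the identity behind the remark about normalized vectors: for unit vectors u and v, the squared Euclidean distance equals 2 - 2 * cos(theta), so ordering by one measure is the reverse of ordering by the other.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    """Scale a nonzero vector to magnitude 1."""
    mag = math.sqrt(sum(x * x for x in v))
    return [x / mag for x in v]

# Same direction, very different scale.
a, b = [1, 2], [10, 20]
same_direction = cosine_similarity(a, b)   # 1 up to floating-point rounding
far_apart = euclidean_distance(a, b)       # sqrt(9**2 + 18**2) = sqrt(405)

# For unit vectors, squared Euclidean distance = 2 - 2 * cosine similarity.
u, v = normalize([1, 2, 3]), normalize([4, 5, 6])
lhs = euclidean_distance(u, v) ** 2
rhs = 2 - 2 * cosine_similarity(u, v)

print(same_direction, far_apart)
print(lhs, rhs)
```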

Cosine similarity interpretation guide

| Similarity range | Angle range (degrees) | Interpretation | Typical use |
|---|---|---|---|
| 1.00 | 0 | Identical direction | Exact duplicate documents |
| 0.95 to 1.00 | 0 to 18 | Nearly identical | Near-duplicate detection |
| 0.70 to 0.95 | 18 to 46 | Highly similar | Strongly related topics |
| 0.40 to 0.70 | 46 to 66 | Moderately similar | Related but distinct |
| 0.00 to 0.40 | 66 to 90 | Low similarity | Loosely related |
| -1.00 to 0.00 | 90 to 180 | Dissimilar / opposite | Unrelated or opposing |

Standard interpretation ranges used in information retrieval and machine learning.

Frequently asked questions

What does a cosine similarity of 1 mean?

A cosine similarity of 1 means the two vectors point in exactly the same direction. This happens when one vector is a positive scalar multiple of the other, for example (1, 2, 3) and (2, 4, 6). The angle between them is 0 degrees. In practice, a value at or very close to 1 means the items the vectors represent are virtually identical in terms of the characteristic being measured.

Can cosine similarity be negative?

Yes. Cosine similarity ranges from -1 to +1. A negative value means the angle between the vectors is greater than 90 degrees, indicating the vectors lean in opposite directions. In NLP with non-negative TF-IDF or bag-of-words vectors, the similarity always falls between 0 and 1, so you only see negative values when the input vectors can have negative components, such as with word embeddings like Word2Vec or GloVe, or PCA-reduced features.

What is cosine distance and how is it different from cosine similarity?

Cosine distance is 1 minus the cosine similarity. It maps the similarity score to a range of 0 (identical direction) to 2 (opposite direction), which can feel more intuitive as a distance: a higher value means greater dissimilarity. However, cosine distance is not a true metric because it does not satisfy the triangle inequality, so it should not be used with algorithms that require a proper metric, such as some clustering methods.
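A small counterexample makes the triangle-inequality failure concrete. The three vectors below are chosen for illustration (at 0, 45, and 90 degrees in the plane); the function name `cosine_distance` is an assumption for this sketch.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

# Vectors at 0, 45, and 90 degrees.
a, b, c = [1, 0], [1, 1], [0, 1]

d_ac = cosine_distance(a, c)   # 1 - cos(90 degrees) = 1
d_ab = cosine_distance(a, b)   # 1 - cos(45 degrees), about 0.293
d_bc = cosine_distance(b, c)   # 1 - cos(45 degrees), about 0.293

# A metric would require d_ac <= d_ab + d_bc; here the direct distance is larger.
print(d_ac, d_ab + d_bc)
```

Going from a to c directly costs 1, while the detour through b costs about 0.586, which a true metric would not allow.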

What happens when one vector is all zeros?

Cosine similarity is undefined when either vector is the zero vector, because the magnitude of a zero vector is 0 and division by zero is undefined. This calculator returns no result in that case. In machine learning pipelines, zero vectors usually indicate missing data or an item with no features, and they are typically filtered out or replaced before computing similarity.
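One way to handle this case in code is to return no value rather than divide by zero. The sketch below is a hypothetical helper (`safe_cosine_similarity` is not part of any library) that returns `None` for a zero vector, mirroring the calculator's behaviour of showing no result, and then filters such items out of a small made-up collection.

```python
import math
from typing import Optional, Sequence

def safe_cosine_similarity(a: Sequence[float], b: Sequence[float]) -> Optional[float]:
    """Return the cosine similarity, or None when either vector has zero magnitude."""
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    if mag_a == 0.0 or mag_b == 0.0:
        return None  # undefined: would require dividing by zero
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (mag_a * mag_b)

# Hypothetical items; "empty" has no features at all.
items = {"first": [1, 2, 3], "empty": [0, 0, 0], "second": [4, 5, 6]}
reference = [1, 2, 3]

results = {name: safe_cosine_similarity(reference, vec) for name, vec in items.items()}
usable = {name: score for name, score in results.items() if score is not None}
print(results)
print(sorted(usable))
```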

Does vector length (number of dimensions) affect the similarity?

The number of dimensions does not directly change what a given similarity score means - a score of 0.9 indicates highly similar direction whether the vectors have 2 components or 2,000. However, in very high-dimensional spaces the range of cosine similarities between random vectors concentrates near 0, so distinguishing similar from dissimilar items requires more careful threshold selection.
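The concentration effect can be observed with a quick simulation. This is an illustrative sketch only: it assumes components drawn independently from a standard normal distribution, and the function name `mean_abs_similarity`, the number of pairs, and the seed are arbitrary choices for the example.

```python
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_abs_similarity(dim, pairs=200, seed=0):
    """Average |cosine similarity| over random Gaussian vector pairs of a given dimension."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    total = 0.0
    for _ in range(pairs):
        a = [rng.gauss(0, 1) for _ in range(dim)]
        b = [rng.gauss(0, 1) for _ in range(dim)]
        total += abs(cosine_similarity(a, b))
    return total / pairs

low_dim = mean_abs_similarity(2)
high_dim = mean_abs_similarity(1000)
print(f"dim=2:    {low_dim:.3f}")
print(f"dim=1000: {high_dim:.3f}")
```

With 2 components the typical magnitude of the similarity is large, while with 1,000 components it sits close to 0, which is why a fixed threshold can behave differently across dimensionalities.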

Is cosine similarity the same as Pearson correlation?

They are closely related but not identical. Pearson correlation is the cosine similarity of the mean-centered vectors: each component has its mean subtracted before computing the cosine. If your vectors are already mean-centered (each component minus the average of all components in that vector), then cosine similarity and Pearson correlation give the same result. For raw, un-centered vectors they can differ substantially.
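The relationship can be verified on the worked-example vectors. In the sketch below, `pearson` uses the standard sum-based formula for the correlation coefficient and `mean_center` subtracts each vector's own mean; both names are assumptions for this illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_center(v):
    """Subtract the vector's own mean from every component."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def pearson(x, y):
    """Pearson correlation from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    syy = sum(q * q for q in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

a, b = [1, 2, 3], [4, 5, 6]
raw = cosine_similarity(a, b)                                  # about 0.9746
centered = cosine_similarity(mean_center(a), mean_center(b))   # 1 up to rounding
r = pearson(a, b)                                              # 1: b is a plus a constant
print(raw, centered, r)
```

Here the raw cosine similarity is below 1, but after mean-centering both vectors become (-1, 0, 1), so the centered cosine and the Pearson correlation agree.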

Written by Dr. Rajiv Menon, PhD Applied Mathematician · Bengaluru, India

Applied mathematician bridging algebraic theory and computational tools for students, engineers, and everyday problem-solvers.
