MaybeCoffee

1. Mission

We believe in algorithmic transparency and open-source principles. Unlike many matching platforms that treat their algorithms as proprietary black boxes, we publish how ours works. We invested significant effort in a compatibility matching algorithm because we care about creating meaningful connections, not about maximizing engagement metrics or user retention. This document provides a mathematical specification of our approach so that users can understand how their matches are determined.

2. Introduction

Our compatibility matching algorithm consists of three distinct components: (1) Vector Similarity, which measures semantic alignment in embedding space; (2) Neural Network Compatibility, which captures non-linear relationships between personality traits and values; and (3) Social Shipping, which incorporates community-driven signals into the compatibility score.

Our algorithm employs a multi-faceted approach that integrates natural language processing, neural network technology, and established psychological research frameworks to identify compatible personality matches. Unlike traditional matching systems that rely exclusively on multiple-choice responses, our methodology analyzes nuanced language patterns in open-ended questions, processes structured data through validated psychological frameworks, and leverages machine learning models trained on extensive relationship research datasets.

The algorithm is designed to identify users with similar personality traits and values, based on research demonstrating that similarity in core dimensions predicts relationship success and satisfaction.

3. Mathematical Framework

Each user's responses are encoded into a multi-dimensional feature vector that captures semantic, psychological, and behavioral characteristics. Let $\vec{a} \in \mathbb{R}^d$ and $\vec{b} \in \mathbb{R}^d$ represent the feature vectors for users A and B respectively, where $d$ is the dimensionality of the feature space. These vectors are constructed through a multi-stage encoding process:

$\vec{a}_s$ : semantic embeddings from open-ended responses (via language models)
$\vec{a}_p$ : personality trait vectors from structured assessments (MBTI, numerical ratings)
$\vec{a}_m$ : metadata features (preferences, demographics, behavioral indicators)

The complete feature vector is the concatenation: $\vec{a} = [\vec{a}_s; \vec{a}_p; \vec{a}_m]$ .
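As a minimal sketch of the concatenation above, assuming each subvector is already available as a plain list of floats, the construction could look like this in Python. The function name `build_feature_vector` and the toy values are illustrative assumptions, not part of the production system.

```python
from typing import List


def build_feature_vector(semantic: List[float],
                         personality: List[float],
                         metadata: List[float]) -> List[float]:
    """Concatenate the three subvectors into a = [a_s; a_p; a_m]."""
    return list(semantic) + list(personality) + list(metadata)


# Toy values for illustration only; real subvectors would be much longer.
a_s = [0.12, -0.40, 0.33]   # hypothetical semantic embedding
a_p = [1.0, 0.0, 1.0, 0.0]  # hypothetical personality encoding
a_m = [0.5, 0.25]           # hypothetical metadata features

a = build_feature_vector(a_s, a_p, a_m)
d = len(a)  # dimensionality of the full feature space
```

Here `d` is simply the sum of the three subvector lengths, and the order of the parts matters because later components slice out $\vec{a}_s$, $\vec{a}_p$, and $\vec{a}_m$ separately.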

4. Core Compatibility Function

The compatibility score between two users is computed as a weighted linear combination of two normalized components, with an additional social shipping component:

$$f(\vec{a}, \vec{b}) = 0.9 \cdot \left(\alpha \cdot v(\vec{a}, \vec{b}) + \beta \cdot n(\vec{a}, \vec{b})\right) + 0.01 \cdot s(\vec{a}, \vec{b})$$

Where:

$v(\vec{a}, \vec{b}) \in [0, 1]$ : normalized vector similarity component (cosine similarity)
$n(\vec{a}, \vec{b}) \in [0, 1]$ : normalized neural network compatibility score
$s(\vec{a}, \vec{b}) \in [0, 10]$ : number of ships between users A and B
$\alpha = 0.5, \beta = 0.5$ : component weights satisfying $\alpha + \beta = 1$
$f(\vec{a}, \vec{b}) \in [0, 1]$ : final compatibility score

The base compatibility score (from vector similarity and neural network) is scaled to a maximum of 0.9 (90%), ensuring that without any social shipping, the maximum achievable compatibility is 90%. The shipping component can add up to 0.1 (10%) to reach a maximum of 1.0 (100%) compatibility.
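The scoring rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than our production code: the constant and function names are hypothetical, and rejecting out-of-range inputs with `ValueError` is one possible design choice.

```python
ALPHA = 0.5        # weight of the vector similarity component
BETA = 0.5         # weight of the neural network component
BASE_SCALE = 0.9   # base score is scaled to at most 90%
SHIP_BONUS = 0.01  # each ship adds one percentage point
MAX_SHIPS = 10     # ship count is capped at 10


def compatibility(v: float, n: float, ships: int) -> float:
    """Return f = 0.9 * (alpha * v + beta * n) + 0.01 * s."""
    if not (0.0 <= v <= 1.0 and 0.0 <= n <= 1.0):
        raise ValueError("v and n must lie in [0, 1]")
    if ships < 0:
        raise ValueError("ship count cannot be negative")
    s = min(ships, MAX_SHIPS)  # enforce the cap on the shipping term
    base = ALPHA * v + BETA * n
    return BASE_SCALE * base + SHIP_BONUS * s
```

With `v = n = 1.0` the sketch yields 0.9 when there are no ships and 1.0 when there are ten, matching the bounds described above.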

5. Vector Similarity Component

The vector similarity component measures semantic alignment in the embedding space. We compute cosine similarity between the semantic subvectors $\vec{a}_s$ and $\vec{b}_s$ , which encode the semantic content of open-ended responses:

$$v(\vec{a}, \vec{b}) = \frac{\vec{a}_s \cdot \vec{b}_s}{\|\vec{a}_s\| \, \|\vec{b}_s\|} = \cos(\theta_{ab})$$

where $\theta_{ab}$ is the angle between the semantic vectors. This metric quantifies similarity in expression patterns, values, and perspectives. The semantic embeddings are generated using transformer-based language models that capture contextual meaning, emotional tone, and conceptual relationships.

In general, cosine similarity lies in $[-1, 1]$. After normalization and preprocessing, the values we observe fall in $[0, 1]$, where 0 indicates orthogonal semantic vectors (no shared direction) and 1 indicates identical semantic direction (highly similar).
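The following pure-Python sketch shows the cosine computation on two semantic subvectors. The function names are illustrative. Clamping negative values to 0 is an assumption made here only to keep the result in $[0, 1]$; the text above states that values are observed in that range after preprocessing and does not specify a clamp.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between x and y, in [-1, 1]."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_x * norm_y)


def vector_similarity(a_s: Sequence[float], b_s: Sequence[float]) -> float:
    """v(a, b) restricted to [0, 1].

    Assumption: negative cosines are clamped to 0. This is a stand-in for
    the unspecified normalization and preprocessing step.
    """
    return min(1.0, max(0.0, cosine_similarity(a_s, b_s)))
```

Because cosine similarity depends only on direction, scaling either vector by a positive constant leaves the result unchanged.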

6. Neural Network Compatibility Component

The neural network component captures non-linear relationships and complex interactions between personality traits, values, and behavioral patterns. Our model $\mathcal{N}: \mathbb{R}^k \rightarrow \mathbb{R}$, where $k$ is the length of the concatenated input vector defined below, is a deep neural network trained on a comprehensive dataset combining:

Peer-reviewed psychological research on relationship compatibility
Validated personality assessment frameworks (including MBTI analysis)
LLM-generated synthetic training data following established compatibility patterns
Historical relationship outcome data

$$n(\vec{a}, \vec{b}) = \sigma\left(\mathcal{N}\left([\vec{a}_p; \vec{a}_m; \vec{b}_p; \vec{b}_m; |\vec{a}_p - \vec{b}_p|]\right)\right)$$

Where:

$\vec{a}_p, \vec{b}_p$ : personality trait vectors for users A and B (from MBTI, structured assessments, numerical ratings)
$\vec{a}_m, \vec{b}_m$ : metadata feature vectors for users A and B (preferences, demographics, behavioral indicators)
$|\vec{a}_p - \vec{b}_p|$ : element-wise absolute difference between personality vectors, capturing trait divergence
$[\cdot; \cdot]$ : vector concatenation operator
$\mathcal{N}$ : deep neural network function mapping concatenated features to a raw compatibility score
$\sigma$ : sigmoid activation function ensuring $n(\vec{a}, \vec{b}) \in [0, 1]$

The network architecture processes structured data (multiple-choice responses, numerical ratings) alongside personality trait vectors to identify patterns that predict compatibility beyond simple similarity metrics. The inclusion of pairwise differences $|\vec{a}_p - \vec{b}_p|$ allows the model to learn which trait divergences are compatible versus incompatible.
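To make the input construction concrete, here is a minimal sketch of how the concatenated feature vector and the sigmoid output might be assembled. The trained network $\mathcal{N}$ is not reproduced: `make_toy_network` is a hypothetical stand-in that only penalizes trait divergence, and all function names are assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence


def pair_features(a_p: Sequence[float], a_m: Sequence[float],
                  b_p: Sequence[float], b_m: Sequence[float]) -> List[float]:
    """Build [a_p; a_m; b_p; b_m; |a_p - b_p|]."""
    if len(a_p) != len(b_p):
        raise ValueError("personality vectors must have the same length")
    abs_diff = [abs(x - y) for x, y in zip(a_p, b_p)]
    return [*a_p, *a_m, *b_p, *b_m, *abs_diff]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nn_compatibility(a_p: Sequence[float], a_m: Sequence[float],
                     b_p: Sequence[float], b_m: Sequence[float],
                     network: Callable[[List[float]], float]) -> float:
    """n(a, b) = sigmoid(N(features)) for any callable `network`."""
    return sigmoid(network(pair_features(a_p, a_m, b_p, b_m)))


def make_toy_network(num_traits: int) -> Callable[[List[float]], float]:
    """Hypothetical placeholder for N, NOT the trained model.

    Returns 1 minus the summed trait divergence, which occupies the
    last `num_traits` entries of the feature vector.
    """
    def network(features: List[float]) -> float:
        return 1.0 - sum(features[-num_traits:])
    return network
```

The length of the list returned by `pair_features` corresponds to $k$, the input dimension of $\mathcal{N}$: three times the personality vector length plus the two metadata vector lengths.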

The model employs attention mechanisms to weight different feature dimensions and includes prompt injection protection to ensure robust predictions resistant to manipulation attempts.

7. Social Shipping Component

The social shipping component allows users to express support for potential matches between other users. This community-driven mechanism incorporates social signals into the compatibility calculation:

$$s(\vec{a}, \vec{b}) = |\{ \text{ships between users A and B} \}|, \quad 0 \leq s(\vec{a}, \vec{b}) \leq 10$$

Mechanism:

Each ship between users A and B increases their compatibility score by 1 percentage point (0.01)
The base compatibility score (from vector similarity and neural network) is scaled so that it never exceeds 90% (0.9)
Without any ships, the maximum achievable compatibility is 90%
With the maximum of 10 ships, an additional 10% can be added, reaching 100% compatibility

Constraints:

Each user can be involved in at most 10 ships (as a participant in a shipped pair)
Each user can ship at most 3 pairs (as a shipper expressing support)
The shipping count $s(\vec{a}, \vec{b})$ is capped at 10 to prevent gaming the system
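The constraints above could be enforced with a small in-memory ledger like the one sketched below. This is an illustration, not our production implementation. The class and method names are hypothetical, and two behaviors are assumptions not stated above: a shipper may ship a given pair only once, and a rejected ship returns `False` instead of raising.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Set

MAX_SHIPS_PER_PAIR = 10        # s(a, b) is capped at 10
MAX_SHIPS_AS_PARTICIPANT = 10  # ships a user can be involved in
MAX_PAIRS_PER_SHIPPER = 3      # pairs a user can ship


class ShipLedger:
    """Tracks ships and rejects any that would break a constraint."""

    def __init__(self) -> None:
        self._shippers_by_pair: DefaultDict[FrozenSet[str], Set[str]] = defaultdict(set)
        self._pairs_by_shipper: DefaultDict[str, Set[FrozenSet[str]]] = defaultdict(set)
        self._ships_involving: DefaultDict[str, int] = defaultdict(int)

    def add_ship(self, shipper: str, user_a: str, user_b: str) -> bool:
        """Record a ship; return False if it is not allowed."""
        if user_a == user_b:
            return False  # a pair needs two distinct users
        pair = frozenset((user_a, user_b))
        if shipper in self._shippers_by_pair[pair]:
            return False  # assumption: one ship per shipper per pair
        if len(self._pairs_by_shipper[shipper]) >= MAX_PAIRS_PER_SHIPPER:
            return False
        if len(self._shippers_by_pair[pair]) >= MAX_SHIPS_PER_PAIR:
            return False
        if any(self._ships_involving[u] >= MAX_SHIPS_AS_PARTICIPANT for u in pair):
            return False
        self._shippers_by_pair[pair].add(shipper)
        self._pairs_by_shipper[shipper].add(pair)
        for u in pair:
            self._ships_involving[u] += 1
        return True

    def ship_count(self, user_a: str, user_b: str) -> int:
        """s(a, b): number of ships for the unordered pair."""
        return len(self._shippers_by_pair.get(frozenset((user_a, user_b)), ()))
```

Storing each pair as a `frozenset` makes the count symmetric, so `ship_count("a", "b")` and `ship_count("b", "a")` return the same value.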

This component acknowledges that social validation and community perception can be meaningful indicators of compatibility, while maintaining safeguards to prevent manipulation. The constraints ensure that shipping remains a genuine social signal rather than a mechanism for artificially inflating scores.

8. Weight Calibration

The component weights $\alpha, \beta$ are calibrated through cross-validation on historical relationship outcome data. The current configuration:

$$\alpha = 0.5, \quad \beta = 0.5, \quad \text{with } \alpha + \beta = 1$$

This allocation assigns equal weight (50% each) to semantic similarity and neural network predictions, so that linguistic expression patterns and deeper trait interactions contribute equally to the base compatibility score.
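Substituting these weights into the core compatibility function gives a simplified form, in which each base component contributes at most 0.45 to the final score:

$$f(\vec{a}, \vec{b}) = 0.9 \cdot (0.5 \cdot v + 0.5 \cdot n) + 0.01 \cdot s = 0.45 \cdot v + 0.45 \cdot n + 0.01 \cdot s$$

As a worked example with hypothetical values $v = 0.8$, $n = 0.6$, and $s = 3$ (chosen only for illustration):

$$f = 0.45 \cdot 0.8 + 0.45 \cdot 0.6 + 0.01 \cdot 3 = 0.36 + 0.27 + 0.03 = 0.66$$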

The weighting strategy is informed by research demonstrating that similarity in core personality dimensions and values predicts relationship success and satisfaction. The algorithm is explicitly designed to identify compatible matches—users with similar traits and values—rather than complementary opposites.

9. Data Processing Pipeline

Our questionnaire format includes three types of questions, each processed differently:

9.1 Open-Ended Questions

Processed through advanced language models to extract semantic meaning, keywords, and emotional tone. Responses are converted into high-dimensional text embeddings that capture nuanced expression patterns.

9.2 Multiple Choice Questions

Used for MBTI calculation and structured personality assessment. These responses feed directly into our neural network alongside vector embeddings.

9.3 Numerical Ratings (1-7 scale)

Processed mathematically to calculate quantitative alignment. These scores are normalized and integrated into the neural network input features.
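The exact normalization is not specified above. One common choice, shown here purely as an assumption, is min-max scaling of the 1-7 rating to $[0, 1]$, with alignment between two users taken as one minus the absolute difference. Both function names are hypothetical.

```python
RATING_MIN = 1
RATING_MAX = 7


def normalize_rating(rating: int) -> float:
    """Map a 1-7 rating to [0, 1] by min-max scaling (assumed scheme)."""
    if not RATING_MIN <= rating <= RATING_MAX:
        raise ValueError("rating must be between 1 and 7")
    return (rating - RATING_MIN) / (RATING_MAX - RATING_MIN)


def rating_alignment(rating_a: int, rating_b: int) -> float:
    """Assumed alignment measure: 1 for identical ratings, 0 for opposite ends."""
    return 1.0 - abs(normalize_rating(rating_a) - normalize_rating(rating_b))
```

Under this scheme a midpoint rating of 4 maps to 0.5, and two users who answer 1 and 7 on the same question have an alignment of 0.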

10. Privacy & Transparency

We maintain algorithmic transparency while respecting user privacy. Our matching system displays users' top 3 compatibility matches with abstracted or summarized response information. Only initials are displayed in the dashboard, ensuring privacy while providing meaningful insights into compatibility.

11. Algorithm Output

The algorithm produces compatibility scores ranging from 0 to 1, where higher scores indicate greater compatibility. Users receive their top 3 matches, ranked by compatibility score, with detailed insights into why each match was selected. We facilitate connections by automatically scheduling dates at partner restaurants and venues, making the transition from match to meeting seamless.
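Selecting the top 3 matches from a set of computed scores can be sketched as follows. The function name, the dictionary layout, and the example initials and scores are illustrative assumptions; how ties are broken is not specified in this document.

```python
import heapq
from typing import Dict, List, Tuple


def top_matches(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical candidates, keyed by initials as shown in the dashboard.
example_scores = {"A.B.": 0.66, "C.D.": 0.91, "E.F.": 0.42, "G.H.": 0.78}
ranked = top_matches(example_scores)
# ranked == [("C.D.", 0.91), ("G.H.", 0.78), ("A.B.", 0.66)]
```

Using `heapq.nlargest` avoids sorting the full candidate list when only a few matches are needed.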

This approach represents a significant advancement over traditional matching systems, combining the depth of psychological research with the power of modern machine learning to create meaningful, lasting connections.