# Sentiment Analysis & Review Text Classifier
From v0.1.12, every scraped review is automatically processed by the review text classifier before it contributes to a product's opportunity score. This page explains what the classifier does, how results are stored, and how they influence scoring.
## Overview
The classifier is invoked per-review after content is scraped from a directory listing. It uses either an LLM-based or rule-based approach (selected at runtime depending on configuration) to extract three structured outputs from raw review text:
| Output | Type | Values |
|---|---|---|
| Sentiment | Label | positive \| neutral \| negative |
| Topic Tags | Array of labels | pricing, onboarding, support, reliability, integrations, UX |
| Key Quoted Phrases | Array of strings | Verbatim excerpts from the review |
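The three outputs map naturally onto a small record type. The sketch below is illustrative only — the pipeline's actual types are not shown in this doc — but the field names mirror the storage table described later on this page:

```python
from dataclasses import dataclass, field

# Hypothetical shape of one classification result; field names mirror
# the columns of the product_review_analysis table.
@dataclass
class ReviewAnalysis:
    review_id: int
    product_id: int
    sentiment: str                          # "positive" | "neutral" | "negative"
    topic_tags: list[str] = field(default_factory=list)
    key_quoted_phrases: list[str] = field(default_factory=list)
    classifier_method: str = "rule-based"   # or "llm"
```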
## Topic Tags
Each review can receive one or more topic tags. The supported tags and their intent are:
- **pricing** — comments about cost, value for money, tier limits, or billing
- **onboarding** — feedback on setup, initial experience, documentation, or getting started
- **support** — mentions of customer service, response times, or issue resolution
- **reliability** — references to uptime, bugs, data accuracy, or system stability
- **integrations** — discussion of third-party connections, APIs, or compatibility
- **UX** — opinions on interface design, usability, or workflow friction
A single review may be tagged with multiple topics where the text covers more than one area.
## Key Quoted Phrases
The classifier extracts short verbatim phrases from the review that carry the strongest signal — typically the sentence or clause that best justifies the assigned sentiment and tags. These phrases are surfaced in the product dossier to give analysts direct evidence without reading full review text.
## Storage
All classification outputs are persisted in the product_review_analysis table, linked to the originating review record. This table is the source of truth for all downstream scoring.
```
product_review_analysis
├── review_id (FK → scraped review)
├── product_id (FK → product)
├── sentiment (positive | neutral | negative)
├── topic_tags (array of tag labels)
├── key_quoted_phrases (array of verbatim strings)
└── classifier_method (llm | rule-based)
```
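As a concrete (but illustrative) example of persisting one result, here is a minimal sketch using an in-memory SQLite store with array fields JSON-encoded; the production schema, engine, and encoding may differ:

```python
import json
import sqlite3

# Hypothetical, simplified version of the product_review_analysis table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_review_analysis (
        review_id INTEGER,
        product_id INTEGER,
        sentiment TEXT,
        topic_tags TEXT,            -- JSON-encoded array of tag labels
        key_quoted_phrases TEXT,    -- JSON-encoded array of verbatim strings
        classifier_method TEXT      -- 'llm' or 'rule-based'
    )
""")
conn.execute(
    "INSERT INTO product_review_analysis VALUES (?, ?, ?, ?, ?, ?)",
    (101, 7, "negative", json.dumps(["pricing"]),
     json.dumps(["way too expensive for small teams"]), "rule-based"),
)
row = conn.execute(
    "SELECT sentiment, topic_tags FROM product_review_analysis"
).fetchone()
```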
## Impact on Scoring
Classified reviews feed two of the four scoring dimensions used to rank products on the opportunity dashboard:
### Market Demand
Sentiment distribution (ratio of positive to negative reviews) combined with total review volume produces the market demand score. A product with high review volume and predominantly positive sentiment signals an active, validated market.
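The combination of sentiment ratio and volume can be sketched as follows. The weighting and the volume cap are illustrative assumptions, not the production formula:

```python
def market_demand_score(positive: int, negative: int, total: int,
                        volume_cap: int = 200) -> float:
    """Blend the positive/negative sentiment ratio with review volume
    into a 0-1 score. volume_cap is a hypothetical saturation point
    beyond which extra reviews add no further signal."""
    if total == 0:
        return 0.0
    sentiment_ratio = positive / max(positive + negative, 1)
    volume_factor = min(total, volume_cap) / volume_cap
    return sentiment_ratio * volume_factor
```

A product with 80 positive and 20 negative reviews out of 100 total would score 0.8 on sentiment but only 0.5 on volume, yielding 0.4 overall under these assumptions.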
### Competitive Gaps
Negative reviews are the primary input for the competitive gap dimension. The classifier's topic tags on negative reviews pinpoint the specific categories where existing products are failing — e.g. a cluster of pricing + negative tags indicates users are underserved by current pricing models, representing a replication opportunity.
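Finding those clusters amounts to tallying topic tags across negative reviews only. A minimal sketch, assuming analysis rows shaped like the storage table above:

```python
from collections import Counter

def gap_topics(analyses: list[dict], top_n: int = 3) -> list[tuple[str, int]]:
    """Count topic tags over negative reviews only; the most frequent
    tags point at categories where existing products are failing."""
    counts = Counter()
    for row in analyses:
        if row["sentiment"] == "negative":
            counts.update(row["topic_tags"])
    return counts.most_common(top_n)
```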
## Classifier Methods
The pipeline supports two classifier modes:
- LLM-based — uses a language model to interpret nuanced or ambiguous review language; higher accuracy on complex text
- Rule-based — uses keyword and pattern matching; faster and lower-cost for high-volume processing
Both modes produce the same three output fields stored in product_review_analysis.
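To make the rule-based mode concrete, here is a toy keyword-matching sketch. The keyword tables and tie-breaking rule are invented for illustration; the real rule set is richer:

```python
import re

# Illustrative keyword tables only — not the production rules.
TOPIC_KEYWORDS = {
    "pricing": ["price", "cost", "billing", "expensive"],
    "support": ["support", "response time", "helpdesk"],
    "reliability": ["bug", "crash", "downtime", "uptime"],
}
NEGATIVE_WORDS = {"terrible", "slow", "expensive", "broken", "crash"}
POSITIVE_WORDS = {"great", "love", "excellent", "reliable", "fast"}

def classify_rule_based(text: str) -> dict:
    """Assign sentiment by counting polarity words and tags by
    keyword presence; ties fall back to neutral."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    neg = len(words & NEGATIVE_WORDS)
    pos = len(words & POSITIVE_WORDS)
    if pos > neg:
        sentiment = "positive"
    elif neg > pos:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    tags = [topic for topic, kws in TOPIC_KEYWORDS.items()
            if any(k in lowered for k in kws)]
    return {"sentiment": sentiment, "topic_tags": tags,
            "classifier_method": "rule-based"}
```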