# Sentiment Analysis & Review Text Classifier
From v0.1.12, every scraped review is automatically processed by the review text classifier before it contributes to a product's opportunity score. This page explains what the classifier does, how results are stored, and how they influence scoring.
## Overview
The classifier is invoked per-review after content is scraped from a directory listing. It uses either an LLM-based or rule-based approach (selected at runtime depending on configuration) to extract three structured outputs from raw review text:
| Output | Type | Values |
|---|---|---|
| Sentiment | Label | positive \| neutral \| negative |
| Topic Tags | Array of labels | pricing, onboarding, support, reliability, integrations, UX |
| Key Quoted Phrases | Array of strings | Verbatim excerpts from the review |
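The three outputs map naturally onto a small record type. The sketch below is illustrative only — the pipeline's actual types are not shown in this doc — but the field names mirror the storage table described later on this page:

```python
from dataclasses import dataclass, field

# Hypothetical shape of one classification result; field names mirror
# the columns of the product_review_analysis table.
@dataclass
class ReviewAnalysis:
    review_id: int
    product_id: int
    sentiment: str                          # "positive" | "neutral" | "negative"
    topic_tags: list[str] = field(default_factory=list)
    key_quoted_phrases: list[str] = field(default_factory=list)
    classifier_method: str = "rule-based"   # or "llm"
```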
## Topic Tags
Each review can receive one or more topic tags. The supported tags and their intent are:
- **pricing** — comments about cost, value for money, tier limits, or billing
- **onboarding** — feedback on setup, initial experience, documentation, or getting started
- **support** — mentions of customer service, response times, or issue resolution
- **reliability** — references to uptime, bugs, data accuracy, or system stability
- **integrations** — discussion of third-party connections, APIs, or compatibility
- **UX** — opinions on interface design, usability, or workflow friction
A single review may be tagged with multiple topics where the text covers more than one area.
## Key Quoted Phrases
The classifier extracts short verbatim phrases from the review that carry the strongest signal — typically the sentence or clause that best justifies the assigned sentiment and tags. These phrases are surfaced in the product dossier to give analysts direct evidence without reading full review text.
## Storage
All classification outputs are persisted in the product_review_analysis table, linked to the originating review record. This table is the source of truth for all downstream scoring.
```
product_review_analysis
├── review_id (FK → scraped review)
├── product_id (FK → product)
├── sentiment (positive | neutral | negative)
├── topic_tags (array of tag labels)
├── key_quoted_phrases (array of verbatim strings)
└── classifier_method (llm | rule-based)
```
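As a concrete (but illustrative) example of persisting one result, here is a minimal sketch using an in-memory SQLite store with array fields JSON-encoded; the production schema, engine, and encoding may differ:

```python
import json
import sqlite3

# Hypothetical, simplified version of the product_review_analysis table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_review_analysis (
        review_id INTEGER,
        product_id INTEGER,
        sentiment TEXT,
        topic_tags TEXT,            -- JSON-encoded array of tag labels
        key_quoted_phrases TEXT,    -- JSON-encoded array of verbatim strings
        classifier_method TEXT      -- 'llm' or 'rule-based'
    )
""")
conn.execute(
    "INSERT INTO product_review_analysis VALUES (?, ?, ?, ?, ?, ?)",
    (101, 7, "negative", json.dumps(["pricing"]),
     json.dumps(["way too expensive for small teams"]), "rule-based"),
)
row = conn.execute(
    "SELECT sentiment, topic_tags FROM product_review_analysis"
).fetchone()
```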
## Impact on Scoring
Classified reviews feed two of the four scoring dimensions used to rank products on the opportunity dashboard:
### Market Demand
Sentiment distribution (ratio of positive to negative reviews) combined with total review volume produces the market demand score. A product with high review volume and predominantly positive sentiment signals an active, validated market.
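The combination of sentiment ratio and volume can be sketched as follows. The weighting and the volume cap are illustrative assumptions, not the production formula:

```python
def market_demand_score(positive: int, negative: int, total: int,
                        volume_cap: int = 200) -> float:
    """Blend the positive/negative sentiment ratio with review volume
    into a 0-1 score. volume_cap is a hypothetical saturation point
    beyond which extra reviews add no further signal."""
    if total == 0:
        return 0.0
    sentiment_ratio = positive / max(positive + negative, 1)
    volume_factor = min(total, volume_cap) / volume_cap
    return sentiment_ratio * volume_factor
```

A product with 80 positive and 20 negative reviews out of 100 total would score 0.8 on sentiment but only 0.5 on volume, yielding 0.4 overall under these assumptions.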
### Competitive Gaps
Negative reviews are the primary input for the competitive gap dimension. The classifier's topic tags on negative reviews pinpoint the specific categories where existing products are failing — e.g. a cluster of pricing + negative tags indicates users are underserved by current pricing models, representing a replication opportunity.
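Finding those clusters amounts to tallying topic tags across negative reviews only. A minimal sketch, assuming analysis rows shaped like the storage table above:

```python
from collections import Counter

def gap_topics(analyses: list[dict], top_n: int = 3) -> list[tuple[str, int]]:
    """Count topic tags over negative reviews only; the most frequent
    tags point at categories where existing products are failing."""
    counts = Counter()
    for row in analyses:
        if row["sentiment"] == "negative":
            counts.update(row["topic_tags"])
    return counts.most_common(top_n)
```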
## Classifier Methods
The pipeline supports two classifier modes:
- LLM-based — uses a language model to interpret nuanced or ambiguous review language; higher accuracy on complex text
- Rule-based — uses keyword and pattern matching; faster and lower-cost for high-volume processing
Both modes produce the same three output fields stored in product_review_analysis.
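To make the rule-based mode concrete, here is a toy keyword-matching sketch. The keyword tables and tie-breaking rule are invented for illustration; the real rule set is richer:

```python
import re

# Illustrative keyword tables only — not the production rules.
TOPIC_KEYWORDS = {
    "pricing": ["price", "cost", "billing", "expensive"],
    "support": ["support", "response time", "helpdesk"],
    "reliability": ["bug", "crash", "downtime", "uptime"],
}
NEGATIVE_WORDS = {"terrible", "slow", "expensive", "broken", "crash"}
POSITIVE_WORDS = {"great", "love", "excellent", "reliable", "fast"}

def classify_rule_based(text: str) -> dict:
    """Assign sentiment by counting polarity words and tags by
    keyword presence; ties fall back to neutral."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    neg = len(words & NEGATIVE_WORDS)
    pos = len(words & POSITIVE_WORDS)
    if pos > neg:
        sentiment = "positive"
    elif neg > pos:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    tags = [topic for topic, kws in TOPIC_KEYWORDS.items()
            if any(k in lowered for k in kws)]
    return {"sentiment": sentiment, "topic_tags": tags,
            "classifier_method": "rule-based"}
```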