AgentOS Scope Out
Updated March 12, 2026

Sentiment Analysis & Review Text Classifier

From v0.1.12, every scraped review is automatically processed by the review text classifier before it contributes to a product's opportunity score. This page explains what the classifier does, how results are stored, and how they influence scoring.


Overview

The classifier is invoked per-review after content is scraped from a directory listing. It uses either an LLM-based or rule-based approach (selected at runtime depending on configuration) to extract three structured outputs from raw review text:

Output               Type               Values
Sentiment            Label              positive | neutral | negative
Topic Tags           Array of labels    pricing, onboarding, support, reliability, integrations, UX
Key Quoted Phrases   Array of strings   Verbatim excerpts from the review
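
These three outputs can be modeled as a simple record. A minimal sketch, assuming a Python representation (the `ReviewClassification` name and field types are illustrative; the pipeline's internal types are not documented):

```python
from dataclasses import dataclass, field

@dataclass
class ReviewClassification:
    """Hypothetical per-review output record mirroring the three documented fields."""
    sentiment: str                                      # "positive" | "neutral" | "negative"
    topic_tags: list[str] = field(default_factory=list)
    key_quoted_phrases: list[str] = field(default_factory=list)

result = ReviewClassification(
    sentiment="negative",
    topic_tags=["pricing"],
    key_quoted_phrases=["the jump from the free tier to Pro is steep"],
)
```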

Topic Tags

Each review can receive one or more topic tags. The supported tags and their intent are:

  • pricing — comments about cost, value for money, tier limits, or billing
  • onboarding — feedback on setup, initial experience, documentation, or getting started
  • support — mentions of customer service, response times, or issue resolution
  • reliability — references to uptime, bugs, data accuracy, or system stability
  • integrations — discussion of third-party connections, APIs, or compatibility
  • UX — opinions on interface design, usability, or workflow friction

A single review may be tagged with multiple topics where the text covers more than one area.
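
In rule-based mode (keyword and pattern matching, described below), multi-tag assignment could look something like the following sketch. The `TAG_KEYWORDS` map is an illustrative stand-in, not the production rule set:

```python
# Illustrative keyword map per supported tag; the actual rules are internal.
TAG_KEYWORDS = {
    "pricing": ["price", "cost", "billing", "tier"],
    "onboarding": ["setup", "getting started", "documentation"],
    "support": ["support", "customer service", "response time"],
    "reliability": ["uptime", "bug", "crash", "stability"],
    "integrations": ["integration", "api", "webhook"],
    "UX": ["interface", "usability", "workflow"],
}

def tag_review(text: str) -> list[str]:
    """Return every topic tag whose keywords appear in the review text."""
    lowered = text.lower()
    return [tag for tag, words in TAG_KEYWORDS.items()
            if any(w in lowered for w in words)]

tags = tag_review("Setup was easy but the API integration kept crashing")
```

Because tags are collected independently, a review that touches several areas naturally receives several tags, matching the multi-tag behavior described above.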


Key Quoted Phrases

The classifier extracts short verbatim phrases from the review that carry the strongest signal — typically the sentence or clause that best justifies the assigned sentiment and tags. These phrases are surfaced in the product dossier to give analysts direct evidence without reading full review text.
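
One way to pick the strongest-signal excerpt, sketched with a hypothetical heuristic (the production extraction logic is not documented; `SIGNAL_WORDS` and the sentence-scoring rule are assumptions):

```python
import re

# Hypothetical sentiment-bearing vocabulary used only for this sketch.
SIGNAL_WORDS = {"great", "terrible", "slow", "love", "broken", "expensive"}

def key_phrase(review: str) -> str:
    """Return the verbatim sentence containing the most signal words."""
    sentences = [s.strip() for s in re.split(r"[.!?]", review) if s.strip()]
    return max(sentences,
               key=lambda s: sum(w in SIGNAL_WORDS for w in s.lower().split()))

phrase = key_phrase("Onboarding was fine. Support is terrible and the app feels broken.")
```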


Storage

All classification outputs are persisted in the product_review_analysis table, linked to the originating review record. This table is the source of truth for all downstream scoring.

product_review_analysis
├── review_id          (FK → scraped review)
├── product_id         (FK → product)
├── sentiment          (positive | neutral | negative)
├── topic_tags         (array of tag labels)
├── key_quoted_phrases (array of verbatim strings)
└── classifier_method  (llm | rule-based)
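
The schema above can be exercised with a small persistence sketch. This assumes a SQLite-style store with array fields serialized as JSON strings; the actual database engine and column types are not documented:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_review_analysis (
        review_id          INTEGER,  -- FK to the scraped review
        product_id         INTEGER,  -- FK to the product
        sentiment          TEXT CHECK (sentiment IN ('positive', 'neutral', 'negative')),
        topic_tags         TEXT,     -- JSON array of tag labels
        key_quoted_phrases TEXT,     -- JSON array of verbatim strings
        classifier_method  TEXT CHECK (classifier_method IN ('llm', 'rule-based'))
    )
""")
conn.execute(
    "INSERT INTO product_review_analysis VALUES (?, ?, ?, ?, ?, ?)",
    (101, 7, "negative", json.dumps(["pricing"]),
     json.dumps(["the jump to the paid tier doubled our bill"]), "rule-based"),
)
row = conn.execute(
    "SELECT sentiment, topic_tags FROM product_review_analysis WHERE review_id = 101"
).fetchone()
```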

Impact on Scoring

Classified reviews feed two of the four scoring dimensions used to rank products on the opportunity dashboard:

Market Demand

Sentiment distribution (ratio of positive to negative reviews) combined with total review volume produces the market demand score. A product with high review volume and predominantly positive sentiment signals an active, validated market.
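
The exact scoring formula is not published; as a hypothetical sketch, combining the positive-to-negative ratio with log-scaled volume might look like:

```python
import math

def market_demand_score(positive: int, negative: int, total_reviews: int) -> float:
    """Illustrative market demand score: sentiment ratio scaled by log volume.
    The real formula is internal to the pipeline; only the inputs are documented."""
    ratio = positive / max(positive + negative, 1)  # neutral reviews excluded from the ratio
    volume = math.log10(total_reviews + 1)          # diminishing returns on raw volume
    return round(ratio * volume, 3)

score = market_demand_score(80, 20, 120)  # mostly positive, healthy volume
```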

Competitive Gaps

Negative reviews are the primary input for the competitive gap dimension. The classifier's topic tags on negative reviews pinpoint the specific categories where existing products are failing — e.g. a cluster of negative reviews tagged pricing indicates that users are underserved by current pricing models, representing a replication opportunity.
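
Finding such clusters amounts to counting topic tags over negative reviews only. A minimal sketch with illustrative data (not real review records):

```python
from collections import Counter

# Each record is (sentiment, topic_tags), as stored in product_review_analysis.
analyses = [
    ("negative", ["pricing"]),
    ("negative", ["pricing", "support"]),
    ("positive", ["UX"]),
    ("negative", ["pricing"]),
]

# Count tags across negative reviews only; the largest cluster points at the
# category where existing products are failing.
gap_counts = Counter(tag for sentiment, tags in analyses
                     if sentiment == "negative" for tag in tags)

top_gap = gap_counts.most_common(1)  # largest negative-review cluster
```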


Classifier Methods

The pipeline supports two classifier modes:

  • LLM-based — uses a language model to interpret nuanced or ambiguous review language; higher accuracy on complex text
  • Rule-based — uses keyword and pattern matching; faster and lower-cost for high-volume processing

Both modes produce the same three output fields stored in product_review_analysis.
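
Runtime selection between the two modes can be sketched as a simple dispatch. Everything here is hypothetical (the configuration mechanism, function names, and the toy sentiment rule are assumptions); it only illustrates that both modes return the same three fields plus the method marker:

```python
def classify(text: str, method: str = "rule-based") -> dict:
    """Dispatch to the configured classifier mode; both modes share one output shape."""
    if method == "llm":
        raise NotImplementedError("LLM call elided in this sketch")
    # Toy rule-based stand-in, not the real pattern-matching logic.
    sentiment = "negative" if "not" in text.lower() else "positive"
    return {
        "sentiment": sentiment,
        "topic_tags": [],
        "key_quoted_phrases": [],
        "classifier_method": method,
    }

out = classify("Would not recommend")
```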