Open Evaluation Rubrics & Internal Methodology
Idea OS publishes how it evaluates ideas: seven category rubrics with explicit 1–5 criteria, LLM-based fact extraction, and deterministic scoring functions that map structured facts to those bands. An internal weighted aggregate, the Global Strategic Strength Score (0–100), keeps comparisons and history consistent across the engine. This page documents the methodology for transparency and trust, so you can see exactly which criteria the product applies beneath its narrative outputs.
Overview
Idea OS evaluates startup ideas across 7 categories: Market Demand, ICP Specificity, TAM/SAM/SOM, Competition & Differentiation, Monetization Strength, Execution Feasibility, and Risk Exposure. LLMs extract structured facts from your idea description; pure functions then map those facts to 1–5 category bands deterministically. Cross-agent validation detects contradictions and applies internal penalties so the engine stays internally consistent.
At a glance: seven evaluation dimensions (Market Demand through Risk Exposure), each with a published 1–5 rubric. Facts are scored with deterministic rules, and bands combine under published weights into an internal 0–100 aggregate (the Global Strategic Strength Score), with penalties applied when agents contradict each other. We publish the full rubrics so you can verify how the engine works, not to promise a single number as the primary user experience.
What a score of 5 requires in each category:
- Market Demand: validated demand signals (revenue or preorders) with evidence confidence ≥ 0.75.
- ICP Specificity: a niche segment, acute pain specificity, and B2B/B2C, demographic, and behavioral traits all defined.
- TAM/SAM/SOM: TAM, SAM, and SOM all estimated, data sources cited, and a large TAM size category.
- Competition & Differentiation: moat-level differentiation, switching costs present, and a clear differentiation statement.
- Monetization Strength: strong willingness-to-pay evidence, pricing defined, and LTV logic present.
- Execution Feasibility: low technical complexity, low dependencies, realistic team requirements, and a defined MVP scope.
- Risk Exposure: no single point of failure, no high platform dependency, and low market timing risk and adoption barriers.
Internal aggregate: Global Strategic Strength Score (0–100)
The engine’s weighted sum of category bands, minus contradiction penalties: Σ (score/5 × 100 × weight) − contradiction_penalty, rounded to an integer in the 0–100 range. Used internally for consistency across versions, experiments, and exports; it is not the product’s primary user-facing output.
Category Rubrics
Each category has explicit criteria for scores 1–5, evaluated from the top band down. A score of 5 represents the strongest evidence; 1 is the baseline when no higher band's criteria are met.
Market Demand (20%)
Measures validated demand signals, problem clarity, and trend/community indicators.
| Score | Criteria |
|---|---|
| 5 | validated_signal_present AND validated_signal_type in [revenue, preorders] AND evidence_confidence ≥ 0.75 |
| 4 | validated_signal_present AND validated_signal_type in [interviews, waitlist] AND evidence_confidence ≥ 0.60 |
| 3 | explicit_problem_defined AND (search_trend_signal = strong OR community_signal = strong) |
| 2 | explicit_problem_defined AND (search_trend_signal in [moderate, weak] OR community_signal in [moderate, weak]) |
| 1 | Otherwise |
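The table above can be read as a deterministic band function whose criteria are checked from the strongest band down. A minimal Python sketch, assuming extracted facts arrive as a plain dict keyed by the field names in the table (the function name and dict representation are illustrative, not the engine's actual API):

```python
def market_demand_band(f: dict) -> int:
    """Map extracted Market Demand facts to a 1-5 band, strongest criteria first."""
    if (f.get("validated_signal_present")
            and f.get("validated_signal_type") in ("revenue", "preorders")
            and f.get("evidence_confidence", 0) >= 0.75):
        return 5
    if (f.get("validated_signal_present")
            and f.get("validated_signal_type") in ("interviews", "waitlist")
            and f.get("evidence_confidence", 0) >= 0.60):
        return 4
    if f.get("explicit_problem_defined") and (
            f.get("search_trend_signal") == "strong"
            or f.get("community_signal") == "strong"):
        return 3
    if f.get("explicit_problem_defined") and (
            f.get("search_trend_signal") in ("moderate", "weak")
            or f.get("community_signal") in ("moderate", "weak")):
        return 2
    return 1  # baseline when no higher band's criteria are met
```

Because the checks run top-down, an idea with both revenue evidence and a strong community signal still lands in band 5, and missing fields simply fall through to the baseline. The other six rubrics follow the same pattern.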
ICP Specificity (15%)
Measures target audience definition clarity, pain specificity, and segment precision.
| Score | Criteria |
|---|---|
| 5 | icp_segment_clarity = niche AND pain_specificity_level = acute AND b2b_or_b2c_defined AND demographic_defined AND behavioral_traits_defined |
| 4 | icp_segment_clarity = niche AND pain_specificity_level = specific AND b2b_or_b2c_defined |
| 3 | icp_segment_clarity = defined AND pain_specificity_level in [specific, acute] |
| 2 | icp_segment_clarity in [broad, defined] |
| 1 | Otherwise |
TAM/SAM/SOM (10%)
Measures market sizing completeness, data source citation, and TAM size category.
| Score | Criteria |
|---|---|
| 5 | tam_estimated AND sam_estimated AND som_estimated AND data_source_cited AND tam_size_category = large |
| 4 | tam_estimated AND sam_estimated AND som_estimated AND data_source_cited AND tam_size_category in [medium, small] |
| 3 | tam_estimated AND sam_estimated AND data_source_cited |
| 2 | tam_estimated AND NOT data_source_cited |
| 1 | Otherwise |
Competition & Differentiation (15%)
Measures competitive landscape awareness, differentiation clarity, and defensibility.
| Score | Criteria |
|---|---|
| 5 | differentiation_type = moat AND switching_cost_present AND clear_differentiation_statement |
| 4 | differentiation_type = business_model OR (differentiation_type = experience AND switching_cost_present) |
| 3 | clear_differentiation_statement AND differentiation_type in [feature, experience] |
| 2 | (direct_competitors_identified OR indirect_competitors_identified) AND NOT clear_differentiation_statement |
| 1 | Otherwise |
Monetization Strength (15%)
Measures revenue model clarity, pricing definition, willingness-to-pay evidence, and LTV logic.
| Score | Criteria |
|---|---|
| 5 | willingness_to_pay_evidence = strong AND pricing_defined AND ltv_logic_present |
| 4 | pricing_defined AND pricing_logic_explained AND ltv_logic_present |
| 3 | revenue_model_defined AND pricing_defined AND NOT pricing_logic_explained |
| 2 | revenue_model_defined AND NOT pricing_defined |
| 1 | Otherwise |
Execution Feasibility (15%)
Measures MVP scope clarity, technical complexity, dependencies, and regulatory risk.
| Score | Criteria |
|---|---|
| 5 | technical_complexity = low AND dependency_count = low AND team_requirement_realistic AND mvp_scope_defined |
| 4 | mvp_scope_defined AND technical_complexity in [low, medium] AND regulatory_risk = none |
| 3 | mvp_scope_defined AND dependency_count = medium |
| 2 | dependency_count = high OR regulatory_risk = moderate OR NOT mvp_scope_defined |
| 1 | Otherwise |
Risk Exposure (10%)
Measures single points of failure, platform dependency, market timing risk, and adoption barriers.
| Score | Criteria |
|---|---|
| 5 | NOT single_point_failure AND NOT platform_dependency_high AND market_timing_risk = low AND customer_adoption_barrier = low |
| 4 | NOT single_point_failure AND NOT platform_dependency_high AND customer_adoption_barrier = medium |
| 3 | market_timing_risk = medium OR customer_adoption_barrier = high |
| 2 | platform_dependency_high |
| 1 | Otherwise |
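Risk Exposure follows the same top-down pattern, but mostly over negated boolean flags. A hedged Python sketch using the same illustrative dict-of-facts convention (not the engine's real interface):

```python
def risk_exposure_band(f: dict) -> int:
    """Map extracted Risk Exposure facts to a 1-5 band, strongest criteria first."""
    if (not f.get("single_point_failure")
            and not f.get("platform_dependency_high")
            and f.get("market_timing_risk") == "low"
            and f.get("customer_adoption_barrier") == "low"):
        return 5
    if (not f.get("single_point_failure")
            and not f.get("platform_dependency_high")
            and f.get("customer_adoption_barrier") == "medium"):
        return 4
    if (f.get("market_timing_risk") == "medium"
            or f.get("customer_adoption_barrier") == "high"):
        return 3
    if f.get("platform_dependency_high"):
        return 2
    return 1
```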
Global Strategic Strength Score
Internally, Idea OS computes the Global Strategic Strength Score (0–100) as the weighted aggregate of all 7 category bands, minus penalties for cross-agent contradictions. Each category contributes (score/5 × 100 × weight) to the total, and the contradiction penalty is subtracted from the raw sum before the result is rounded to 0–100.
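The aggregation step can be sketched as follows. Only the arithmetic and the percentage weights come from the published formula; the dict key names, function name, and the explicit clamp to the 0–100 range are illustrative assumptions:

```python
# Published category weights (sum to 1.0).
WEIGHTS = {
    "market_demand": 0.20,
    "icp_specificity": 0.15,
    "tam_sam_som": 0.10,
    "competition_differentiation": 0.15,
    "monetization_strength": 0.15,
    "execution_feasibility": 0.15,
    "risk_exposure": 0.10,
}

def global_strength_score(bands: dict, contradiction_penalty: int = 0) -> int:
    """Weighted sum of 1-5 bands, minus penalties, rounded into 0-100."""
    raw = sum(bands[cat] / 5 * 100 * w for cat, w in WEIGHTS.items())
    return max(0, min(100, round(raw - contradiction_penalty)))
```

For example, seven bands of 5 with no penalty yield 100, and seven bands of 1 yield the floor of 20 before any penalty is applied.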
Cross-Agent Validation
Idea OS uses cross-agent validation to ensure consistency when multiple agents analyze an idea. The following contradictions trigger a penalty:
- Market Demand claims validated signals but no explicit problem is defined
- Strong willingness-to-pay evidence but ICP is completely undefined
- Low technical complexity claimed but high systemic risk flags present
- Moat-level differentiation claimed but no competitors identified to differentiate from
- Large TAM claimed without cited data sources
- Revenue validation claimed but no pricing is defined in monetization
- High regulatory risk but technical complexity rated as low
- Hyper-niche ICP defined but TAM categorized as large — potential sizing inconsistency
Penalty: 5 points per violation, up to 20 points total (max 4 violations counted).
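The penalty rule reduces to a capped linear count. A short sketch (the function signature and the idea of passing violations as a list are illustrative):

```python
def contradiction_penalty(violations: list) -> int:
    """5 points per detected contradiction, capped at 20 (at most 4 counted)."""
    return min(len(violations) * 5, 20)
```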
Ready for a structured evaluation with full reasoning?
Evaluate your idea