Open Evaluation Rubrics & Internal Methodology

Idea OS publishes how it evaluates ideas: seven category rubrics with explicit 1–5 criteria, LLM fact extraction, and deterministic scoring functions that map structured facts to those bands. An internal weighted aggregate (Global Strategic Strength Score, 0–100) keeps comparisons and history consistent across the engine. This page documents the methodology for transparency and trust—so you can see exactly what criteria the product uses beneath its narrative outputs.

Framework version 2.0 · Last updated March 19, 2026

Overview

Idea OS evaluates startup ideas across 7 categories: Market Demand, ICP Specificity, TAM/SAM/SOM, Competition & Differentiation, Monetization Strength, Execution Feasibility, and Risk Exposure. LLMs extract structured facts from your idea description; pure functions then map those facts to 1–5 category bands deterministically. Cross-agent validation detects contradictions and applies internal penalties so the engine stays internally consistent.

At a glance: Seven evaluation dimensions (Market Demand through Risk), each with published 1–5 rubrics. Facts are scored with deterministic rules; bands combine using published weights into an internal 0–100 aggregate (Global Strategic Strength Score), with penalties when agents contradict each other. We publish the full rubrics so you can verify how the engine works—not to promise a single number as the primary user experience.

Score 5 criteria by category:

  • A score of 5 in Market Demand requires validated demand signals (revenue or preorders) with evidence confidence ≥ 0.75.
  • A score of 5 in ICP Specificity requires a niche segment, acute pain specificity, and B2B/B2C, demographic, and behavioral traits all defined.
  • A score of 5 in TAM/SAM/SOM requires TAM, SAM, and SOM all estimated, data sources cited, and TAM size category large.
  • A score of 5 in Competition & Differentiation requires moat-level differentiation, switching costs present, and a clear differentiation statement.
  • A score of 5 in Monetization Strength requires strong willingness-to-pay evidence, pricing defined, and LTV logic present.
  • A score of 5 in Execution Feasibility requires low technical complexity, low dependencies, realistic team requirements, and MVP scope defined.
  • A score of 5 in Risk Exposure requires no single point of failure, no high platform dependency, and low market timing and adoption barriers.

Category Rubrics

Each category has explicit criteria for scores 1–5. A score of 5 represents the strongest evidence; 1 is the baseline when criteria are not met.

Market Demand (20%)

Measures validated demand signals, problem clarity, and trend/community indicators.

  • 5: validated_signal_present AND validated_signal_type in [revenue, preorders] AND evidence_confidence ≥ 0.75
  • 4: validated_signal_present AND validated_signal_type in [interviews, waitlist] AND evidence_confidence ≥ 0.60
  • 3: explicit_problem_defined AND (search_trend_signal = strong OR community_signal = strong)
  • 2: explicit_problem_defined AND (search_trend_signal in [moderate, weak] OR community_signal in [moderate, weak])
  • 1: otherwise
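The page describes these bands as deterministic pure functions over LLM-extracted facts. A minimal sketch of that pattern for Market Demand, assuming the fact field names shown in the rubric (the actual pipeline's field names and data model are not published here):

```python
def market_demand_band(f: dict) -> int:
    """Map extracted Market Demand facts to a 1-5 band.

    Sketch only: field names are assumed from the published rubric.
    Checks run from the strongest band (5) down, falling through to 1.
    """
    if (f.get("validated_signal_present")
            and f.get("validated_signal_type") in ("revenue", "preorders")
            and f.get("evidence_confidence", 0.0) >= 0.75):
        return 5
    if (f.get("validated_signal_present")
            and f.get("validated_signal_type") in ("interviews", "waitlist")
            and f.get("evidence_confidence", 0.0) >= 0.60):
        return 4
    if f.get("explicit_problem_defined") and (
            f.get("search_trend_signal") == "strong"
            or f.get("community_signal") == "strong"):
        return 3
    if f.get("explicit_problem_defined") and (
            f.get("search_trend_signal") in ("moderate", "weak")
            or f.get("community_signal") in ("moderate", "weak")):
        return 2
    return 1
```

Because the function is pure, the same extracted facts always yield the same band, which is what makes the scores reproducible across runs.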

ICP Specificity (15%)

Measures target audience definition clarity, pain specificity, and segment precision.

  • 5: icp_segment_clarity = niche AND pain_specificity_level = acute AND b2b_or_b2c_defined AND demographic_defined AND behavioral_traits_defined
  • 4: icp_segment_clarity = niche AND pain_specificity_level = specific AND b2b_or_b2c_defined
  • 3: icp_segment_clarity = defined AND pain_specificity_level in [specific, acute]
  • 2: icp_segment_clarity in [broad, defined]
  • 1: otherwise

TAM/SAM/SOM (10%)

Measures market sizing completeness, data source citation, and TAM size category.

  • 5: tam_estimated AND sam_estimated AND som_estimated AND data_source_cited AND tam_size_category = large
  • 4: tam_estimated AND sam_estimated AND som_estimated AND data_source_cited AND tam_size_category in [medium, small]
  • 3: tam_estimated AND sam_estimated AND data_source_cited
  • 2: tam_estimated AND NOT data_source_cited
  • 1: otherwise

Competition & Differentiation (15%)

Measures competitive landscape awareness, differentiation clarity, and defensibility.

  • 5: differentiation_type = moat AND switching_cost_present AND clear_differentiation_statement
  • 4: differentiation_type = business_model OR (differentiation_type = experience AND switching_cost_present)
  • 3: clear_differentiation_statement AND differentiation_type in [feature, experience]
  • 2: (direct_competitors_identified OR indirect_competitors_identified) AND NOT clear_differentiation_statement
  • 1: otherwise

Monetization Strength (15%)

Measures revenue model clarity, pricing definition, willingness-to-pay evidence, and LTV logic.

  • 5: willingness_to_pay_evidence = strong AND pricing_defined AND ltv_logic_present
  • 4: pricing_defined AND pricing_logic_explained AND ltv_logic_present
  • 3: revenue_model_defined AND pricing_defined AND NOT pricing_logic_explained
  • 2: revenue_model_defined AND NOT pricing_defined
  • 1: otherwise

Execution Feasibility (15%)

Measures MVP scope clarity, technical complexity, dependencies, and regulatory risk.

  • 5: technical_complexity = low AND dependency_count = low AND team_requirement_realistic AND mvp_scope_defined
  • 4: mvp_scope_defined AND technical_complexity in [low, medium] AND regulatory_risk = none
  • 3: mvp_scope_defined AND dependency_count = medium
  • 2: dependency_count = high OR regulatory_risk = moderate OR NOT mvp_scope_defined
  • 1: otherwise

Risk Exposure (10%)

Measures single points of failure, platform dependency, market timing risk, and adoption barriers.

  • 5: NOT single_point_failure AND NOT platform_dependency_high AND market_timing_risk = low AND customer_adoption_barrier = low
  • 4: NOT single_point_failure AND NOT platform_dependency_high AND customer_adoption_barrier = medium
  • 3: market_timing_risk = medium OR customer_adoption_barrier = high
  • 2: platform_dependency_high
  • 1: otherwise
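Every rubric above follows the same shape: an ordered list of predicates checked from band 5 downward, with 1 as the baseline when nothing matches. That pattern can be expressed table-driven, shown here for Risk Exposure (a sketch with assumed fact field names, not the published implementation):

```python
# Ordered (band, predicate) pairs, strongest first. Field names are
# assumed from the Risk Exposure rubric; missing keys default to falsy.
RISK_RUBRIC = [
    (5, lambda f: not f.get("single_point_failure")
                  and not f.get("platform_dependency_high")
                  and f.get("market_timing_risk") == "low"
                  and f.get("customer_adoption_barrier") == "low"),
    (4, lambda f: not f.get("single_point_failure")
                  and not f.get("platform_dependency_high")
                  and f.get("customer_adoption_barrier") == "medium"),
    (3, lambda f: f.get("market_timing_risk") == "medium"
                  or f.get("customer_adoption_barrier") == "high"),
    (2, lambda f: bool(f.get("platform_dependency_high"))),
]

def band(rubric, facts: dict) -> int:
    """Return the first band whose predicate matches, else the baseline 1."""
    return next((score for score, pred in rubric if pred(facts)), 1)
```

One table per category keeps the rubric data and the scoring logic separate, so publishing the rubric is equivalent to publishing the scorer.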

Global Strategic Strength Score

Internally, Idea OS computes the Global Strategic Strength Score (0–100) as the weighted aggregate of all 7 category bands, minus penalties for cross-agent contradictions. Each category contributes (score/5 × 100 × weight) to the total; the formula below is the exact computation the evaluation pipeline performs.

Σ (score/5 × 100 × weight) − contradiction_penalty

The contradiction penalty is subtracted from the raw sum before rounding to 0–100.
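The aggregation can be written directly from the formula and the published weights (20% Market Demand, 15% ICP, 10% TAM/SAM/SOM, 15% Competition, 15% Monetization, 15% Execution, 10% Risk). A minimal sketch, with category key names assumed for illustration:

```python
# Published category weights (sum to 1.0); key names are illustrative.
WEIGHTS = {
    "market_demand": 0.20,
    "icp_specificity": 0.15,
    "tam_sam_som": 0.10,
    "competition_differentiation": 0.15,
    "monetization_strength": 0.15,
    "execution_feasibility": 0.15,
    "risk_exposure": 0.10,
}

def global_strength_score(bands: dict, contradiction_penalty: float = 0.0) -> int:
    """Sum (band/5 * 100 * weight) over all categories, subtract the
    contradiction penalty, then clamp and round to the 0-100 range."""
    raw = sum(bands[cat] / 5 * 100 * w for cat, w in WEIGHTS.items())
    return round(max(0.0, min(100.0, raw - contradiction_penalty)))
```

Since the weights sum to 1.0, all-5 bands yield exactly 100 before any penalty, and all-1 bands yield the floor of 20.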

Cross-Agent Validation

Idea OS uses cross-agent validation to ensure consistency when multiple agents analyze the same idea. Each detected contradiction between agents' outputs triggers a penalty of 5 points, capped at 20 points total (at most 4 violations counted).
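The stated penalty rule (5 points per violation, at most 4 violations counted) reduces to a one-line cap; a sketch with parameter names chosen for illustration:

```python
def contradiction_penalty(violations: int, per_violation: int = 5,
                          max_counted: int = 4) -> int:
    """5 points per detected cross-agent contradiction, capped at
    4 counted violations (i.e., a maximum penalty of 20 points)."""
    return per_violation * min(violations, max_counted)
```

The result feeds into the aggregate as the subtracted term in Σ (score/5 × 100 × weight) − contradiction_penalty.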

Ready for a structured evaluation with full reasoning?

Evaluate your idea