From Domain Segmentation to Reading Domain Signals

Executive Summary

This pilot grew out of a domain-segmentation experiment where a 31B-class open model showed useful zero-shot knowledge about domain strings, brand tokens, and commercial language. We then tested whether that latent knowledge also appears in public domain-dispute records.

This was a cautious zero-shot pilot for brand-threat reading. It does not claim that open-weight models can replace experts, decide disputes, or provide legal advice.

The answer is promising but incomplete. Across 25 pilot examples, the models recognized obvious abuse and sometimes used facts that lower the threat level when those facts were made explicit. The model appears to have relevant knowledge, but not the job-shaped workflow.

Why This Pilot Started

The starting point was a surprise inside our own domain-intelligence work.

During the search for a DKSplit teacher model, a 31B-class open model turned out to be unusually good at reading domain strings: brands, generic terms, product words, and commercial modifiers. That raised a practical question. If the model already understands this much about domain language, can the same capability be useful for reading the context around the domain itself?

That question led to this pilot: a small set of public dispute records, multiple model configurations, and several factual framings designed to see how the model responds when brand-owner or domain-holder facts are brought forward.

Public Dispute Records as Signal Data

Public dispute records gave us a practical way to vary the information shown to the model. The same domain can be surrounded by many different signals: brand claims, website use, timing, industry overlap, impersonation cues, legitimate-use facts, and business context.

The point was not to ask the model to decide a dispute. Public outcomes were used only as reference labels. The product target is not “transfer” or “denial”; it is review priority supported by evidence.

The interesting question was whether adding or foregrounding different facts would change the model’s brand-threat reading. That is where this becomes a semantic problem rather than a keyword-matching problem.

What We Tested, Briefly

We used 25 examples. The set included 22 recent public dispute cases with outcomes published after March, plus 3 constructed stress cases. For this pilot, we grouped them into 11 high-risk reference cases and 14 lower-risk reference cases.

  • High-risk reference: 10 public transfer-reference cases plus 1 constructed stress case
  • Lower-risk reference: 12 public denied-reference cases plus 2 constructed stress cases

We used four open-weight model configurations. The goal was not to rank models, but to observe whether different model sizes and families showed the same broad behavior when the factual presentation changed.

  • Gemma 4 31B IT (4-bit): the low-cost configuration that motivated the pilot
  • Gemma 4 31B IT: the same model without 4-bit quantization
  • Qwen3.5-9B: a smaller comparison model
  • Qwen3.5-27B: a larger same-family comparison against Qwen3.5-9B

We used three factual framings for each example. The point was to see whether making different facts more visible would change the model’s brand-threat reading.

  • Baseline: the base factual summary, without an added favorable paragraph
  • Complainant-favorable: the same facts plus a short final paragraph of verifiable facts favorable to the brand owner
  • Respondent-favorable: the same facts plus a short final paragraph of verifiable facts favorable to the domain holder

These were not legal arguments; they were simple packaging probes.

The Main Observation

Across the same cases, model outputs shifted when different facts were made more visible. That is the main signal of the pilot: the model appears to have relevant knowledge, but not a stable workflow for deciding which facts matter. One-sided framing moved the outputs in predictable directions, especially in borderline cases.

High-risk reference cases: number of examples (out of 11) whose model output read as high-risk; the UDRP-style reference label for all 11 is TRANSFER.

  Configuration             Baseline   Complainant-Favorable   Respondent-Favorable
  Gemma 4 31B IT (4-bit)    9/11       9/11                    9/11
  Gemma 4 31B IT            9/11       9/11                    9/11
  Qwen3.5-9B                10/11      10/11                   9/11
  Qwen3.5-27B               10/11      11/11                   9/11

Lower-risk reference cases: number of examples (out of 14) whose model output read as lower-risk; the UDRP-style reference label for all 14 is DENIED.

  Configuration             Baseline   Complainant-Favorable   Respondent-Favorable
  Gemma 4 31B IT (4-bit)    7/14       6/14                    13/14
  Gemma 4 31B IT            6/14       2/14                    12/14
  Qwen3.5-9B                2/14       1/14                    7/14
  Qwen3.5-27B               7/14       4/14                    11/14

The high-risk group is mostly a sanity check. These cases either contain clear abuse signals or enough ambiguity that a high-risk reading is understandable. Across the four configurations, the models generally kept these cases in the high-risk direction.

The lower-risk group is where the pilot becomes more informative. Here the same cases moved sharply when different facts were made more visible. Complainant-favorable framing reduced lower-risk readings, while respondent-favorable framing recovered many of them. The Qwen3.5-27B run also recovered more lower-risk readings than Qwen3.5-9B across all three framings, suggesting that model capacity may matter for this kind of fact-sensitive reading.

That is the useful product signal: the models can often use the right facts when those facts are made clear, but they do not reliably organize those facts on their own.
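One way to quantify that framing sensitivity is the spread between each configuration's best and worst framing in the lower-risk group. The sketch below uses hypothetical helper names; the counts are copied from the lower-risk table above.

```python
# Agreements with the lower-risk reference label (out of 14), per framing.
# Numbers are taken from the pilot's lower-risk table.
lower_risk_agreement = {
    "Gemma 4 31B IT (4-bit)": {"baseline": 7, "complainant": 6, "respondent": 13},
    "Gemma 4 31B IT":         {"baseline": 6, "complainant": 2, "respondent": 12},
    "Qwen3.5-9B":             {"baseline": 2, "complainant": 1, "respondent": 7},
    "Qwen3.5-27B":            {"baseline": 7, "complainant": 4, "respondent": 11},
}

def framing_swing(scores: dict[str, int]) -> int:
    """Spread between the most and least agreeable framing for one configuration."""
    return max(scores.values()) - min(scores.values())

swings = {model: framing_swing(s) for model, s in lower_risk_agreement.items()}
# Every configuration moves by at least 6 of 14 cases when only the
# framing paragraph changes, which is the instability the pilot flags.
```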

Two Case Windows: Capte and Luma

The tables above suggest that model capacity and task-specific tuning may matter. Two cases make that pattern easier to see.

The first is capte.com.[1] This is a classic domain-industry scenario: a brand owner wants the exact-match .com, a purchase attempt does not succeed, and the dispute then turns on whether a parked domain offered for sale should be treated as a brand threat. On the surface, the case looks risky: the domain matches the later CAPTE mark, the domain is for sale, and the asking price is high. But the key facts point the other way: the domain existed in 2004, the complainant was established in 2017, and the EU trademark was registered in 2025.

That makes capte.com a useful positive example. It shows that the model is not just matching strings. In both Gemma 4 31B IT configurations, the model consistently treated the early domain history and domain-investment context as important facts, even when the domain was identical to the later trademark and listed for sale.

Capte.com snapshot

  Configuration             Baseline     Complainant-Favorable   Respondent-Favorable
  Gemma 4 31B IT (4-bit)    Lower-risk   Lower-risk              Lower-risk
  Gemma 4 31B IT            Lower-risk   Lower-risk              Lower-risk
  Qwen3.5-9B                High-risk    High-risk               High-risk
  Qwen3.5-27B               Lower-risk   Lower-risk              Lower-risk

The second window is luma.ai.[2] It is more ambiguous. The surface signal is strong: a short exact-match domain in the AI-video space. But the timeline is harder to read unless it is made explicit:

Luma.ai timeline

  • June 2021: the complainant operates lumalabs.ai
  • November 2022: the complainant claims first use of the LUMA AI name
  • November 29, 2022: the respondent purchases luma.ai
  • January 2024: the domain moves from the respondent’s company to the respondent personally
  • July 2024: the respondent adopts the company name Luma AI Ltd.
  • May 2025: the complainant’s US trademark registration issues

In this case, the challenge is not only whether the model notices relevant facts. It is also whether those dates are organized into a usable timeline before the threat reading is formed. In the underlying respondent-favorable prompt, the added paragraph did more than lean toward the respondent: it made the timing relation easier to use by stating that the domain purchase and claimed first use occurred in the same month, that the trademark registration came more than two years later, and that the ownership transfer remained within respondent-controlled entities. The case therefore illustrates a narrower point: clarifying temporal relations can change the model’s threat reading.
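That timeline-normalization step can be sketched directly. The snippet below is a minimal illustration, not the pilot's pipeline; where the record gives only a month, the day is approximated to the first, which is an assumption of this sketch.

```python
from datetime import date

# Luma.ai events from the timeline above, normalized to dates so that
# temporal relations can be stated explicitly before any threat reading.
# Month-only entries are approximated to the first of the month.
events = {
    "complainant operates lumalabs.ai":    date(2021, 6, 1),
    "claimed first use of LUMA AI":        date(2022, 11, 1),
    "respondent purchases luma.ai":        date(2022, 11, 29),
    "intra-respondent ownership transfer": date(2024, 1, 1),
    "respondent adopts Luma AI Ltd.":      date(2024, 7, 1),
    "US trademark registration issues":    date(2025, 5, 1),
}

# Order the events chronologically before forming any reading.
ordered = sorted(events.items(), key=lambda kv: kv[1])

purchase = events["respondent purchases luma.ai"]
first_use = events["claimed first use of LUMA AI"]
trademark = events["US trademark registration issues"]

# The two relations the respondent-favorable paragraph made explicit:
same_month = (purchase.year, purchase.month) == (first_use.year, first_use.month)
years_to_trademark = (trademark - purchase).days / 365.25
# same_month holds, and the trademark registration issues more than
# two years after the domain purchase.
```

Pre-computing relations like these, rather than leaving raw dates scattered in the prompt, is exactly the kind of evidence organization the pilot suggests the models do not perform reliably on their own.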

Luma.ai snapshot

  Configuration             Baseline    Complainant-Favorable   Respondent-Favorable
  Gemma 4 31B IT (4-bit)    High-risk   High-risk               Lower-risk
  Gemma 4 31B IT            High-risk   High-risk               Lower-risk
  Qwen3.5-9B                High-risk   High-risk               High-risk
  Qwen3.5-27B               High-risk   High-risk               High-risk

Conclusion

This pilot suggests that generative LLMs can be useful in brand-threat review. They can read domain strings, brand signals, similarity, timing, and use context in ways that go beyond simple keyword matching.

But the same pilot also shows why zero-shot use is not enough. These models are still probability-driven systems shaped by their training data. Larger models appear more capable in this setting, but without task-specific fine-tuning their readings remain sensitive to how the facts are packaged.

In other words, the 27B- and 31B-class models already show meaningful knowledge that can be useful for brand protection. But turning that knowledge into a dependable review capability will likely require more task-specific training, and probably deeper workflow and system design around how evidence is extracted, organized, and checked.


[1] capte.com corresponds to WIPO D2026-0455.

[2] luma.ai corresponds to NAF FA2603002209697.

The 25 examples form a pilot set, not a random or statistically representative sample. The set includes 22 recent public dispute cases with outcomes published after March and 3 constructed stress cases.

UDRP-style public records were used because they are public, fact-rich, and consistently structured. Outcomes are used only as reference labels for observation, not as product outputs or legal recommendations.

The three framings used the same base case body. The complainant-favorable and respondent-favorable versions added short factual paragraphs highlighting verifiable signals from one side. The point was to observe whether bringing different factual signals forward changed model behavior.


We acknowledge the European High Performance Computing Joint Undertaking (EuroHPC JU) for awarding this project access to the Leonardo supercomputer, hosted by CINECA in Italy.

Co-funded by the European Union. Views and opinions expressed are those of the authors only and do not necessarily reflect those of the European Union or the European High Performance Computing Joint Undertaking.