From Domain Segmentation to Reading Domain Signals

Executive Summary

This pilot grew out of a domain-segmentation experiment where a 31B-class open model showed useful zero-shot knowledge about domain strings, brand tokens, and commercial language. We then tested whether that latent knowledge also appears in public domain-dispute records.

This was a cautious zero-shot pilot for brand-threat reading. It does not claim that open-weight models can replace experts, decide disputes, or provide legal advice.

The answer is promising but incomplete. Across 25 pilot examples, the models recognized obvious abuse and sometimes used facts that lower the threat level when those facts were made explicit. The model appears to have relevant knowledge, but not the job-shaped workflow.

Why This Pilot Started

The starting point was a surprise inside our own domain-intelligence work.

During the search for a DKSplit teacher model, a 31B-class open model turned out to be unusually good at reading domain strings: brands, generic terms, product words, and commercial modifiers. That raised a practical question. If the model already understands this much about domain language, can the same capability be useful for reading the context around the domain itself?

That question led to this pilot: a small set of public dispute records, multiple model configurations, and several factual framings designed to see how the model responds when brand-owner or domain-holder facts are brought forward.

Public Dispute Records as Signal Data

Public dispute records gave us a practical way to vary the information shown to the model. The same domain can be surrounded by many different signals: brand claims, website use, timing, industry overlap, impersonation cues, legitimate-use facts, and business context.

The point was not to ask the model to decide a dispute. Public outcomes were used only as reference labels. The product target is not “transfer” or “denial”; it is review priority supported by evidence.

The interesting question was whether adding or foregrounding different facts would change the model’s brand-threat reading. That is where this becomes a semantic problem rather than a keyword-matching problem.

What We Tested, Briefly

We used 25 examples. The set included 22 recent public dispute cases with outcomes published after March, plus 3 constructed stress cases. For this pilot, we grouped them into 11 high-risk reference cases and 14 lower-risk reference cases.

  • High-risk reference: 10 public transfer-reference cases plus 1 constructed stress case
  • Lower-risk reference: 12 public denied-reference cases plus 2 constructed stress cases

We used four open-weight model configurations. The goal was not to rank models, but to observe whether different model sizes and families showed the same broad behavior when the factual presentation changed.

  • Gemma 4 31B IT (4-bit): the low-cost configuration that motivated the pilot
  • Gemma 4 31B IT: the same model without 4-bit quantization
  • Qwen3.5-9B: a smaller comparison model
  • Qwen3.5-27B: a larger same-family comparison against Qwen3.5-9B

We used three factual framings for each example. The point was to see whether making different facts more visible would change the model’s brand-threat reading.

  • Baseline: the base factual summary, without an added favorable paragraph
  • Complainant-favorable: the same facts plus a short final paragraph of verifiable facts favorable to the brand owner
  • Respondent-favorable: the same facts plus a short final paragraph of verifiable facts favorable to the domain holder

These were not legal arguments; they were simple packaging probes.

The Main Observation

Across the same cases, model outputs shifted when different facts were made more visible. That is the main signal of the pilot: the model appears to have relevant knowledge, but not a stable workflow for deciding which facts matter. One-sided framing moved the outputs in predictable directions, especially in borderline cases.

High-risk reference cases: number of examples (out of 11) whose model output read as high-risk; the UDRP-style reference label for all 11 is TRANSFER.

  Configuration             Baseline   Complainant-Favorable   Respondent-Favorable
  Gemma 4 31B IT (4-bit)    9/11       9/11                    9/11
  Gemma 4 31B IT            9/11       9/11                    9/11
  Qwen3.5-9B                10/11      10/11                   9/11
  Qwen3.5-27B               10/11      11/11                   9/11

Lower-risk reference cases: number of examples (out of 14) whose model output read as lower-risk; the UDRP-style reference label for all 14 is DENIED.

  Configuration             Baseline   Complainant-Favorable   Respondent-Favorable
  Gemma 4 31B IT (4-bit)    7/14       6/14                    13/14
  Gemma 4 31B IT            6/14       2/14                    12/14
  Qwen3.5-9B                2/14       1/14                    7/14
  Qwen3.5-27B               7/14       4/14                    11/14

The high-risk group is mostly a sanity check. These cases either contain clear abuse signals or enough ambiguity that a high-risk reading is understandable. Across the four configurations, the models generally kept these cases in the high-risk direction.

The lower-risk group is where the pilot becomes more informative. Here the same cases moved sharply when different facts were made more visible. Complainant-favorable framing reduced lower-risk readings, while respondent-favorable framing recovered many of them. The Qwen3.5-27B run also recovered more lower-risk readings than Qwen3.5-9B across all three framings, suggesting that model capacity may matter for this kind of fact-sensitive reading.

That is the useful product signal: the models can often use the right facts when those facts are made clear, but they do not reliably organize those facts on their own.
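One way to quantify that framing sensitivity is the spread between each configuration's best and worst framing in the lower-risk group. The sketch below uses hypothetical helper names; the counts are copied from the lower-risk table above.

```python
# Agreements with the lower-risk reference label (out of 14), per framing.
# Numbers are taken from the pilot's lower-risk table.
lower_risk_agreement = {
    "Gemma 4 31B IT (4-bit)": {"baseline": 7, "complainant": 6, "respondent": 13},
    "Gemma 4 31B IT":         {"baseline": 6, "complainant": 2, "respondent": 12},
    "Qwen3.5-9B":             {"baseline": 2, "complainant": 1, "respondent": 7},
    "Qwen3.5-27B":            {"baseline": 7, "complainant": 4, "respondent": 11},
}

def framing_swing(scores: dict[str, int]) -> int:
    """Spread between the most and least agreeable framing for one configuration."""
    return max(scores.values()) - min(scores.values())

swings = {model: framing_swing(s) for model, s in lower_risk_agreement.items()}
# Every configuration moves by at least 6 of 14 cases when only the
# framing paragraph changes, which is the instability the pilot flags.
```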

Two Case Windows: Capte and Luma

The tables above suggest that model capacity and task-specific tuning may matter. Two cases make that pattern easier to see.

The first is capte.com.[1] This is a classic domain-industry scenario: a brand owner wants the exact-match .com, a purchase attempt does not succeed, and the dispute then turns on whether a parked domain offered for sale should be treated as a brand threat. On the surface, the case looks risky: the domain matches the later CAPTE mark, the domain is for sale, and the asking price is high. But the key facts point the other way: the domain existed in 2004, the complainant was established in 2017, and the EU trademark was registered in 2025.

That makes capte.com a useful positive example. It shows that the model is not just matching strings. In both Gemma 4 31B IT configurations, the model consistently treated the early domain history and domain-investment context as important facts, even when the domain was identical to the later trademark and listed for sale.

Capte.com snapshot

  Configuration             Baseline     Complainant-Favorable   Respondent-Favorable
  Gemma 4 31B IT (4-bit)    Lower-risk   Lower-risk              Lower-risk
  Gemma 4 31B IT            Lower-risk   Lower-risk              Lower-risk
  Qwen3.5-9B                High-risk    High-risk               High-risk
  Qwen3.5-27B               Lower-risk   Lower-risk              Lower-risk

The second window is luma.ai.[2] It is more ambiguous. The surface signal is strong: a short exact-match domain in the AI-video space. But the timeline is harder to read unless it is made explicit:

Luma.ai timeline

  • June 2021: the complainant operates lumalabs.ai
  • November 2022: the complainant claims first use of the LUMA AI name
  • November 29, 2022: the respondent purchases luma.ai
  • January 2024: the domain moves from the respondent’s company to the respondent personally
  • July 2024: the respondent adopts the company name Luma AI Ltd.
  • May 2025: the complainant’s US trademark registration issues

In this case, the challenge is not only whether the model notices relevant facts. It is also whether those dates are organized into a usable timeline before the threat reading is formed. In the underlying respondent-favorable prompt, the added paragraph did more than lean toward the respondent: it made the timing relation easier to use by stating that the domain purchase and claimed first use occurred in the same month, that the trademark registration came more than two years later, and that the ownership transfer remained within respondent-controlled entities. The case therefore illustrates a narrower point: clarifying temporal relations can change the model’s threat reading.
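That timeline-normalization step can be sketched directly. The snippet below is a minimal illustration, not the pilot's pipeline; where the record gives only a month, the day is approximated to the first, which is an assumption of this sketch.

```python
from datetime import date

# Luma.ai events from the timeline above, normalized to dates so that
# temporal relations can be stated explicitly before any threat reading.
# Month-only entries are approximated to the first of the month.
events = {
    "complainant operates lumalabs.ai":    date(2021, 6, 1),
    "claimed first use of LUMA AI":        date(2022, 11, 1),
    "respondent purchases luma.ai":        date(2022, 11, 29),
    "intra-respondent ownership transfer": date(2024, 1, 1),
    "respondent adopts Luma AI Ltd.":      date(2024, 7, 1),
    "US trademark registration issues":    date(2025, 5, 1),
}

# Order the events chronologically before forming any reading.
ordered = sorted(events.items(), key=lambda kv: kv[1])

purchase = events["respondent purchases luma.ai"]
first_use = events["claimed first use of LUMA AI"]
trademark = events["US trademark registration issues"]

# The two relations the respondent-favorable paragraph made explicit:
same_month = (purchase.year, purchase.month) == (first_use.year, first_use.month)
years_to_trademark = (trademark - purchase).days / 365.25
# same_month holds, and the trademark registration issues more than
# two years after the domain purchase.
```

Pre-computing relations like these, rather than leaving raw dates scattered in the prompt, is exactly the kind of evidence organization the pilot suggests the models do not perform reliably on their own.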

Luma.ai snapshot

  Configuration             Baseline    Complainant-Favorable   Respondent-Favorable
  Gemma 4 31B IT (4-bit)    High-risk   High-risk               Lower-risk
  Gemma 4 31B IT            High-risk   High-risk               Lower-risk
  Qwen3.5-9B                High-risk   High-risk               High-risk
  Qwen3.5-27B               High-risk   High-risk               High-risk

Conclusion

This pilot suggests that generative LLMs can be useful in brand-threat review. They can read domain strings, brand signals, similarity, timing, and use context in ways that go beyond simple keyword matching.

But the same pilot also shows why zero-shot use is not enough. These models are still probability-driven systems shaped by their training data. Larger models appear more capable in this setting, but without task-specific fine-tuning their readings remain sensitive to how the facts are packaged.

In other words, the 27B- and 31B-class models already show meaningful knowledge that can be useful for brand protection. But turning that knowledge into a dependable review capability will likely require more task-specific training, and probably deeper workflow and system design around how evidence is extracted, organized, and checked.


[1] capte.com corresponds to WIPO D2026-0455.

[2] luma.ai corresponds to NAF FA2603002209697.

The 25 examples form a pilot set, not a random or statistically representative sample. The set includes 22 recent public dispute cases with outcomes published after March and 3 constructed stress cases.

UDRP-style public records were used because they are public, fact-rich, and consistently structured. Outcomes are used only as reference labels for observation, not as product outputs or legal recommendations.

The three framings used the same base case body. The complainant-favorable and respondent-favorable versions added short factual paragraphs highlighting verifiable signals from one side. The point was to observe whether bringing different factual signals forward changed model behavior.


We acknowledge the European High Performance Computing Joint Undertaking (EuroHPC JU) for awarding this project access to the Leonardo supercomputer, hosted by CINECA in Italy.

Co-funded by the European Union. Views and opinions expressed are those of the authors only and do not necessarily reflect those of the European Union or the European High Performance Computing Joint Undertaking.