Ask HN: Why are URNs not more popular?

I'm working on i18n of an API for a customer now. It's a domain where one needs to refer to phone numbers, bank accounts, National Identity Numbers, business / legal entity organization numbers etc; in short, IDs in misc. national and international registries.

URN should be perfect for this; I would expect to do the i18n by replacing a Norwegian account number "12345678902" with "urn:bban:NO:12345678902", or a Danish company number "987322431" with "urn:business:DK:CVR:987322431".

However, the list of URN namespaces is very lacking. I'd expect to at least see "urn:msisdn:<phone number>"; or "urn:nin:DK:..." for Danish national identity numbers. But it's not there.

What I see looking around is a lot of homemade solutions that infer the ID space from "country", which usually holds but not always; or solutions coming from the XML world that seem very complicated to figure out (like ISO 6523, which declares '0037' for the Finnish company registry, with syntax around it that is XML or home-grown). URN would have been such a simple and nice solution. Why has it not caught on?

  • There is only a marginal benefit in using "urn:foo:..." over "foo:...", namely making it clear that it is only intended to represent a name and not a locator. That benefit is marginal because without knowing the spec for foo, you can't really do much with the knowledge that it is a URN. Without knowing the spec, you can't even compare URN values to check if they represent the same name (e.g. ISBN URNs may or may not contain hyphens, and there are equivalent ISBN-10 and ISBN-13 numbers).

    Conversely, there are non-URN URI schemes that still represent only names (like doi), and there are URL schemes that, despite the "L", are also used for non-locator names (the http scheme in particular, e.g. in RDF and for XML namespaces). So it's all a wash.

    In either case (URN or non-URN), you would have to register the URN namespace or the URI scheme, if you want to use it publicly. Often http is used instead, where the domain name serves the role of the namespace, because usually a suitable domain name is already registered. Of course, this latter practice has the drawback that one has to infer from context that it is a name and not an HTTP web resource.

    Unfortunately, dots are not allowed in NIDs. Otherwise it would have been possible to introduce the free use of "urn:<domain name>:..." (for non-TLDs that otherwise don't conflict with the NID naming rules).
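
The NID restriction mentioned above can be checked mechanically. A minimal sketch in Python, assuming the NID grammar from RFC 8141 as I read it (2 to 32 characters; letters, digits and hyphens; no leading or trailing hyphen). `is_valid_nid` is a made-up helper name, not part of any library, and it does not cover reserved prefixes or registration status:

```python
import re

# NID grammar as I understand RFC 8141: one alphanumeric, up to 30 of
# (alphanumeric or "-"), one alphanumeric. Dots are not in the allowed set.
_NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]")


def is_valid_nid(nid: str) -> bool:
    """Return True if `nid` fits the basic NID syntax (hypothetical helper)."""
    return _NID_RE.fullmatch(nid) is not None


# A domain name fails purely because of the dot.
for candidate in ("isbn", "l6", "example.com", "-bad"):
    print(candidate, is_valid_nid(candidate))
```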

  • There are URN namespaces that refer to other namespaces (notably OIDs) and many things can indeed be specified. For example:

      * A UK company: `urn:oid:1.2.826.0.1.<company number>`
      * IBAN account: `urn:oid:1.0.13616:<account>` (note that there's also a URI scheme for this)
      * A publication: `urn:isbn:<ISBN number>`
    
    Chances are that a URN exists, or, if not, you can get a PEN (Private Enterprise Number) from IANA and then use the `urn:oid` namespace.

    However, the reason URNs don't have much traction is that, because they require interpretation to be meaningful, using 'correct' URNs is of little or no benefit. Besides purism, does your app _really_ benefit from using something like `urn:isbn:123` instead of `isbn:123`, `isbn-123` or simply `123`? In all cases you need additional logic to interpret the data. An advantage may be when interacting with other systems, but most likely you'll need to transform your data anyhow.

    Adding to the issue above, another issue is that there might be several 'correct' URNs for the same external entity. For example, `urn:iso:std:iso:3166:USA`, `urn:oid:2.16.840` and `urn:oid:1.2.840` all refer to the same country (as well as `urn:uuid:7f5f8e3e-6f89-4903-a9c1-0fdb030ddb13`, that I just made up).
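
ISBN is a concrete case of the "additional logic" discussed in this thread: hyphenated and unhyphenated forms, and the ISBN-10 and ISBN-13 forms, all name the same book. A rough Python sketch of that logic, assuming only the well-known ISBN-13 check-digit rule (weights 1 and 3 alternating) and the "978" prefix for converted ISBN-10s. The function name `isbn13_from_urn` is hypothetical, and the sketch does not validate the incoming check digits:

```python
def isbn13_from_urn(urn: str) -> str:
    """Reduce an ISBN URN to a bare 13-digit ISBN (hypothetical helper)."""
    prefix = "urn:isbn:"
    if not urn.lower().startswith(prefix):
        raise ValueError(f"not an ISBN URN: {urn!r}")
    # Hyphens and spaces are presentation only; drop them.
    digits = urn[len(prefix):].replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit():
        # ISBN-10 -> ISBN-13: prepend 978, drop the old check digit,
        # then recompute the check digit with weights 1, 3, 1, 3, ...
        core = "978" + digits[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    raise ValueError(f"unrecognised ISBN: {urn!r}")


# Two different strings, one name.
a = isbn13_from_urn("urn:isbn:0-306-40615-2")
b = isbn13_from_urn("URN:ISBN:978-0-306-40615-7")
print(a == b)
```

A plain string comparison of the two URNs would say they differ; only namespace-specific code can tell that they match.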

  • OP here. Thanks for a lot of useful insights and references.

    An overall comment about a lot of the replies: many seem to say "you do not want to use URN because X", in much the same way, I think, that one might have criticised JSON when it was launched:

    What is the point of using JSON? Don't you realize that both systems still need to agree on a custom protocol?

    Yet JSON is wildly successful.

    I could have worded my initial request better (the comments make me realize that, and they bring my thinking forward, so thanks for that!)

    What I am really after is a standard way to refer to "an ID taken from an existing public registry", in such a way that readers realize

    A) they are looking at an ID in a registry

    B) they see which registry

    C) in a way that is mostly self-documenting (unlike OID!)

    That solves only one of a ton of issues, much like two systems cannot communicate just because they both speak JSON.
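
Points A and B can be illustrated with a few lines of Python. This is only a sketch: `bban` is the hypothetical namespace from the original question (it is not a registered NID), and `RegistryId` and `parse_registry_urn` are names invented for this example. It splits a URN into its namespace and the namespace-specific string and does nothing more:

```python
from typing import NamedTuple


class RegistryId(NamedTuple):
    registry: str  # the URN namespace identifier (NID), lower-cased
    value: str     # the namespace-specific string, left untouched


def parse_registry_urn(urn: str) -> RegistryId:
    """Split 'urn:<nid>:<nss>' into its two parts (hypothetical helper)."""
    scheme, _, rest = urn.partition(":")
    nid, _, nss = rest.partition(":")
    if scheme.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    # The "urn" prefix and the NID are case-insensitive; the rest may not be.
    return RegistryId(nid.lower(), nss)


print(parse_registry_urn("urn:bban:NO:12345678902"))
```

A bare "12345678902" is rejected, which is the point: the reader can tell an ID with a declared registry from a plain string.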

  • URNs are the "missing link" in the semantic web. If you look at examples where somebody writes an RDF file they typically use identifiers like

       <http://dbpedia.org/resource/Tin>
    
    which are fine for cases where you want to publish the data on the web but not for the much more common case of "our own private data".

    It is straightforward in principle to use URNs, but it is not easy to find examples or discussion of best practices with URNs. For one thing you can always write

       <urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66>
    
    so you can generate as many unique ids as you need.
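
For instance, Python's standard `uuid` module can produce such identifiers directly. A minimal sketch (the variable names are arbitrary; a random version-4 UUID is used here, whereas the example above looks like a time-based one):

```python
import uuid

# UUID objects expose their RFC 4122 URN form via the `urn` attribute.
new_id = uuid.uuid4().urn
print(new_id)

# The URN form can be parsed back into a UUID object.
parsed = uuid.UUID(new_id)
print(parsed.version)
```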

  • > Why are URNs not more popular?

    For the same reason that lots of general-purpose classification schemes are not popular: they are brittle and unstable. They smear information out into awkward and hard-to-reuse hierarchies. I refer you to https://www.gwern.net/docs/technology/2005-04-shirky-ontolog...

  • Why restrict yourself to URNs when you can use a URI that looks similar to a URL but doesn't have to be accessible? I've used URIs for partner-supplied identifiers. They're also used with GraphQL as resource IDs.

    For widely standardized things URNs make complete sense, e.g. ISBN for books.

  • No one cares. Not trying to be mean - just stating the fact. It's just another very abstract and fragile solution to a problem that is minute and can be solved ad-hoc via more general and well-known approaches.

  • Why "urn:msisdn:" when "tel:" already exists?

  • Asking because I don't know, and I can't believe nobody anticipated it. (Blames "Semantic Web").

    Is there not a version of URN that allows you to use a registered domain name or IP address as a scope or scheme?

    Something in the spirit of:

       urn://mycompany.com/Whatever/I/Want
    
       urn:mycompany.com://192.168.0.1/Whatever/I/Want
    
    Semantically, a "uniform" private resource identifier that's "uniformly" parsable while avoiding the unpleasantness of IANA registrations.

    I've scrapped with IANA protocol registrations before. It wasn't fun at all. I can't imagine doing it for scheme or URN scheme registration.

  • Having to register an NID for a URN would stop most developers from using official URNs (it costs time and money for possibly no immediate benefit; a tragedy of the commons).

  • Mainly because unless it's a business requirement, developers that work with the internals of software (so not just some CRUD layer and protocol that already has been abstracted) prefer to use things that feel good and look nice. XML for example isn't one of those, it looks like a mess.

    Granted, anything can be made to look like a mess, but it seems that JSON is much more enjoyable than XML, and such preferences tend to make a whole lot more impact on what is going to be used than technical/academic prowess. This applies to most things, and factors into what kind of person you get when you talk to developers at a startup, a bank, general commerce, a FAANG or a technology provider. Not everyone seems to enjoy the kinds of people who "exclusively write complex systems software" or "exclusively write business software for the money", just like not everyone likes those who "exclusively write software because they enjoy it", and that's fine. But it directly impacts adoption of stuff.

    URNs, URIs, and UTIs (and UBL) are all fine examples of technology designed to solve a thing, but also fine examples of not getting the adoption you'd think it would get when it is "technically correct".

  • There's something almost hinted at in some comments here that I'd like to raise. Oftentimes the localised name for something might differ, despite being effectively the same piece of information.

    A good example of this is the Google Places API[0]: if you just want to fetch the region of a place in a country, what you are actually looking for could be one of several different things. Google abstracts this to a "region" that can be 4 different things: sublocality, locality, and administrative_area_level_1 (or 2).

    So now on top of your URN you might either be missing extra specificity (e.g. this is a locality and not a province) or you need to find a way to add that somewhere.

    A colloquial-vs-technical example as well is tel vs msisdn.

    [0]: https://developers.google.com/maps/documentation/places/web-...

  • Some abbreviations and acronyms I didn't know and had to google:

    i18n: short for internationalization

    URN: Uniform Resource Name, a specific type of URI

  • Because they are explained to decision makers with all the accompanying semantic web talk, which those men and women do not understand.

    Explain it as an "ID", leave out any mention of RDF or the semantic web, and the deal is made.

  • Maybe I'm missing something, I just don't feel like globally recognized URNs have much value. I am using them in my software which supports external plugins, and this seems like a great use: I and plugin authors get to define the namespaces, which are useful in the context of my application and not beyond. Similarly, AWS makes extensive use of URNs, which are useful in the context of AWS but not beyond. This seems like a good system.

    What utility does URN have in your application? Why do you need your URNs to be globally recognizable?

  • > Why are URNs not more popular? […] Why has it not caught on?

    1. Usability: a developer must register a URN namespace with IANA, which is a huge hurdle compared with minting a tag URI, which only requires control over an email address or domain name. <http://enwp.org/Tag_URI_scheme>

    2. Cumulative advantage: everyone knows how to dereference an IRI/URI/URL with the HTTP scheme because the software capable of doing so is deployed million-fold, and dereferencing may also lead to a representation describing the resource which is readable by humans and machines. One can't really do the same with URNs, or at the very least it's much less convenient.

    3. Flexibility: if you mint a PURL <http://enwp.org/PURL> with an extremely long-lived redirecting service (e.g. <http://w3id.org> or <http://purl.org>), you are able to point the identifier to different resources. This indirection leaves you with options over the time of its existence. URNs have no redirects; they need to be perfect from the start, which is not a reasonable assumption when humans are involved.
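
To make point 1 concrete, here is a small Python sketch of minting a tag URI, assuming the basic RFC 4151 layout `tag:<authority>,<date>:<specific>`. The helper name `mint_tag_uri` and the example values are invented for illustration, and no escaping or validation of the parts is attempted:

```python
import datetime


def mint_tag_uri(authority: str, when: datetime.date, specific: str) -> str:
    """Build a tag URI from a domain or email you controlled on `when`."""
    # The date pins down who controlled the authority at minting time.
    return f"tag:{authority},{when.isoformat()}:{specific}"


print(mint_tag_uri("example.com", datetime.date(2024, 1, 15), "orders/42"))
```

No registration step is involved; the only prerequisite is having controlled the domain or mailbox on the stated date.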

  • At least one of your items, telephone numbers, has its own URI scheme [0]. (In the original model, URNs were non-locators while URLs were locators; telephone numbers are arguably locators. But in any case, URLs being replaced by URIs, which need not be locators, and URNs becoming a URI scheme mean that there is very little in the way of a good rule for when a non-locator identifier should be a URN namespace vs. its own scheme.)

    Also, the same rationale that led to the closure of the registry for the somewhat similar info: scheme [1] likely contributes to limited registration of URN NIDs.

    [0] tel: (RFC 3966) https://www.ietf.org/rfc/rfc3966.txt

    [1] https://oclc-research.github.io/infoURI-Frozen/

  • I felt the same about many other subjects in the past because they were just needed at that moment. Once the need is gone, the personal demand for it disappears. So your current wish to have URNs to be more popular will be over once your current project is done.

  • This is the first I’ve ever heard of it.

  • RFC 2141 just defines a way to make IDs in a non-conflicting way. You don't need to use one of the registered namespaces to use URNs; you can make up your own, and you don't need to register them for them to be useful. At my company we use the urn:l6: namespace (the company name is L6) whenever we need IDs. It is very useful, especially when you use barcodes, RFID, etc.

  • >Why are strings not more popular?

    It's probably the most popular data type out there, both in data and in codebases. URNs don't bring a lot more than strings to the table, so people go the easy route.

  •   * LEIs for companies
      * IBANs for bank accounts
      * ISINs for securities (independent of country)
      * SEDOL for securities (country dependent)

  • Wish SEPA would adopt this + QR code standard. Bank transfers are ubiquitous around EU, but typing in bank account numbers is a PITA.

  • I feel URLs are easier to access; something like google.com is easier to remember. People forget things over time.

  • Most people have never heard of the term, so probably lousy marketing.

  • I use it as redis keys, like a namespace thing.

  • > However, the list of URN namespaces is very lacking

    Because ontology is fucking garbage.

    All ontology fails, it always has.

    Why are we still discussing this?

    > Why are URNs not more popular?

    Wiki - "URNs are globally unique persistent identifiers assigned within defined namespaces so they will be available for a long period of time, even after the resource which they identify ceases to exist or becomes unavailable"

    Why do I care about this for data? Why would I constrain myself? It's bureaucratic nonsense for people to have meetings on. I'm not a library putting angels on a pinhead, real world data is messy.