No safe efficient ways to do three-way string comparisons in Go
> Basically no one should use strings.Compare
Others have already talked about the performance aspect but I'm just baffled at the comment basically saying nobody should use this function anyway.
I expect the need for 3-way compares isn't that uncommon, why tell people not to use it?
It's great to have this idea that the compiler should optimize all comparison situations but a) it doesn't yet and b) people still want a 3-way compare function anyway.
Basically it's saying "don't use a function at all", or even "write your own copies of this function". Even if the compiler one day becomes smart enough to optimize you still end up with tons of code duplication if people followed the advice.
And for people who care about performance it means they'll use a 3rd party library or implement their own "optimizations" that might not be effective or even buggy.
Literally nobody is benefiting from this.
What's the point of providing this function but actually thinking nobody should use it, in a comment no less instead of documentation?
Now, there may be other reasons, e.g. the author thinking 3-way compare is a bad pattern in the first place. If argued well, maybe I could agree. But that argument isn't made here.
I'm also not saying they needed to optimize this off the bat. A comment saying "we don't think there is a big demand yet, will optimize when we see the demand" would've been acceptable.
The optimal version does seem to exist here (per the comment too):
https://github.com/golang/go/blob/d28bf6c9a2ea9b992796738d03...
So the goal was to intentionally nerf Compare() to discourage code the golang authors considered less clear. I'm not sure bad performance is really discouraging usage though, it just penalizes folks that use stdlib.
I wonder if they'd accept a PR to switch to runtime.cmpstring today?
Really bizarre. It seems like it wouldn't have been much more work to just implement it properly. Instead people are supposed to wait until the compiler magically gets smart enough to optimize the pattern... but the pattern and method are both intentionally slow, so there will never be usage pressure to optimize it.
A reasonable compromise would be to just implement a single pass three-way compare in native Go instead of optimized assembler, and then if users keep requesting it be optimized, at that point make the compiler improvements or write the hand-tuned assembler version.
Otherwise what you're going to get is people using messy workarounds to do a 3-way compare, like doing a byte-wise compare that isn't unicode-correct. Blech.
I’d argue that any string comparison which does not take into account collation is inherently broken. Even in the pure ASCII English-language case, a naïve comparison on values won’t give desirable results since abc123 will come before abc99 even though a reader would expect otherwise. Just because we’ve tolerated crappy string sorting for sixty years doesn’t mean we should continue to do so.
It is easy to use a 3-way compare to implement normal comparison operators. It is much harder to recognize when some code is using comparison operators to implement an ad-hoc 3-way compare.
It would be harder still to know how those normal compare functions can be combined to implement an efficient 3-way compare. For strings and other built in types, this might be possible. But for user defined types that compare in weird ways, it could be very hard.
It's much better to let the programmer define the 3-way compare and use that to derive all the comparison operators.
People try to get cute with 3-way compares. A real Java bug that inspired: https://errorprone.info/bugpattern/BadComparable
public MyFile implements Comparable { ... long timestamp; ... @Override public int compare(Object other) { return (int)(((MyFile)other).timestamp - timestamp; } ... }
The int conversion loses the sign of the subtract. The particular use of the code was sorting files to delete the X number of oldest.
More examples about the dangers with just ints: https://stackoverflow.com/questions/2728793/java-integer-com...
The comparable API in Java came from the C qsort API. It would have been better to just have a less-than method. Kind of surprised to see this in Go near a core API.
When I look at https://go.godbolt.org/z/8jqMPh135, compiling for a few CPUs, recent compilers only generate a single call to runtime.cmpstring.
⇒ I think the comment is outdated, and there is a safe efficient way to do three-way string comparisons in go.
Caveat: I’m not that good at reading modern assembly, and I do not understand why there also is a call to runtime.memequal in that code. ⇒ Corrections welcome.
This is just Go's philosophy of "we're delivering a language for bad programmers, we know what's best for them".
This is really interesting. I never use three way string comparison, and so I wondered when other programmers use it. (The obvious use case is sorting, but in Go, sort.Interface expects a Less function and sort.Slice accepts a "Less" function, so you won't use it there.)
I searched through my random checked out applications to see who calls strings.Compare, and why.
Most of them are mistaken sorting. In gVisor, there is this in a test:
This pattern shows up a bunch:... cmpopts.SortSlices(func(a, b testEntryEventInfo) bool { return strings.Compare(string(a.Addr), string(b.Addr)) < 0 }), ...
In Go's Snowflake library, there is this:// Less implements sort.Interface. func (s fooSlice) Less(i, j int) bool { return strings.Compare(s[i].Whatever, s[j].Whatever) == -1 }
This is exceedingly interesting to me because you'd expect Compare to cost more than < (it does more stuff), and yet people are somehow baited into writing it. I'm guessing what happens is that people don't know that "<" works for strings, and they reach for strings.Compare, so the documentation is trying to talk them out of a program that doesn't actually make sense. In the case of "!= 0", they just wanted "s0 != name", which I take to mean they had absolutely no idea that strings in Go are comparable at all.if strings.Compare(s0, name) != 0 { t.Error("a file was not downloaded by GET") }All in all, I get the impression that people coming from other languages to Go have different expectations about what the language can do. The documentation for strings.Compare is a logical place to correct them, but it doesn't bother. Just a comment that says "sucks if you use this".
BTW, there were a couple of correct uses. Gazelle has a sort function like this in a few places:
Here they are punished for the inefficiency of strings.Compare; they would actually take advantage of it in the common case where the paths aren't equal over the easier approach of:sort.SliceStable(sortedFiles, func(i, j int) bool { if cmp := strings.Compare(sortedFiles[i].Path, sortedFiles[j].Path); cmp != 0 { return cmp < 0 } return sortedFiles[i].DefName < sortedFiles[j].DefName })
Another correct use was in "goja" (a Javascript interpreter), which uses strings.Compare to implement Javascript's <string>.compareTo(<string>). Whether or not Javascript code is using compareTo correctly is an analysis I do not have the energy to do :)if xs[i].Path != x[j].Path { return xs[i].Path < xs[j].Path } return xs[i].DefName < xs[j].DefNameIdeological purity only works when you actually give people a good alternative. Otherwise you cause more harm than good.
I see it in things like transportation, power generation, recycling. People refuse to use the best methods because then people will never switch - except they never offer the thing that you are supposed to switch to.
I'm actually kinda surprised to see it in a programing language.
I’m so confused? Why wouldn’t you make Compare fast and add the optimization? I’m sure someone will run into a logically equivalent to a 3-way compare situation that the compiler can’t detect.
Hiding language features in optimization passes is a dangerous game. It’s one of the reasons I don’t like implicit tail calls.
I hate paternalistic non-optimization like this. Just optimize it - your code sucks and you should improve it!
What I don’t understand is that they doing first check for lengths
Strings are so common, it’s insane they don’t optimize
Another option is to use `go:linkname` and call the runtime function yourself.
//go:linkname cmpstring runtime.cmpstring func cmpstring(a, b string)
The premise that three-way string comparisons are widely used is spurious at best.
Well, it was desgined by Google. What did you expect?
Maybe I'm not understanding the articles premise exactly, but if you're using three-way comparisons frequently, why would you not just implement the `diff3` ?
It's not that complicated and well documented, infact I'm sure someone has already done it in go.
I mean, from rsc's comment, it's a known issue. I'm guessing no one has cared enough to improve it. Making me think it's not that big of a deal that it does the extra comparison.
If people really need the extra performance, they'll use unsafe to create byte slices backed by the strings, and then use bytes.Compare. Or they'll improve the compiler.
You could type all six lines, or you could link the runtime func and yolo.
Go spaceship!
The go designers are brilliant. This is exactly the right priority. We would all do better it think more the way they do.
So I think the strings.Compare should be implemented efficiently, to avoid breaking user expectations.
Is Go forkable (for lack of a better word)? That's the only question that comes to mind when I read the article. You have the source code, and this shouldn't be a difficult change to make, so I think "fork-and-fix and see who follows along" should be the mentality to practice in this case.