I have the following commonPrefix() function (for whole paths), which works fine on Linux (and Windows), but I'm wondering if it could be shorter and/or more efficient?
import strutils  # startsWith, rfind

proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) == 0:
    return ""
  result = paths[0]
  for path in paths[1 .. ^1]:
    while not path.startsWith(result) and len(result) > 0:
      result = result[0 .. ^2]
  result = result[0 .. result.rfind(sep)]
Tests:

test "commonPrefix":
  check(commonPrefix(["/tmp/test1", "/tmp/data.dat"]) == "/tmp/")
  check(commonPrefix(["/home/mark/test1", "/tmp/data.dat"]) == "/")
  check(commonPrefix(["/home/mark/test1", "local/data.dat"]) == "")
Something like this, using setLen to shrink in place instead of allocating a new slice each time:
import strutils

proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) > 0:
    result = paths[0]
    for i in 1 .. paths.high:
      let path = paths[i]
      while not path.startsWith(result) and len(result) > 0:
        result.setLen(result.high)
    result.setLen(result.rfind(sep) + 1)
When your commonPrefix takes 2 paths, both of length n, sharing little or no common prefix, it has O(n^2) time complexity: the while loop can run up to n times, and each iteration does O(n) work, since startsWith rescans whatever prefix is shared and result[0 .. ^2] copies the string.
The relativePath proc in the os module calculates the common prefix in O(n) time: https://github.com/nim-lang/Nim/blob/devel/lib/pure/os.nim
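To make the worst case concrete (sizes here are illustrative), a pair of paths with a long shared-then-diverging prefix forces every shrink step to rescan the shared part:

import strutils  # repeat

# ~n/2 shrink iterations, each rescanning ~n/2 matching chars in
# startsWith (and re-copying result in the slicing version): ~n^2/4 work.
let n = 10_000
let worst = ["a".repeat(n), "a".repeat(n div 2) & "b".repeat(n div 2)]
echo commonPrefix(worst)  # "" after the sep trim, but quadratic to get there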
This version walks forward char by char. Is there a Nim equivalent to Python's timeit module for timing snippets?
import strutils  # rfind

proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) == 0:
    return ""
  result = paths[0]
  var index = len(result)
  for path in paths[1 .. ^1]:
    var i = 0  # rescan from 0 so a later path can shrink the prefix
    while i < len(path) and i < index and result[i] == path[i]:
      inc i
    index = i
  result =
    if index == 0: ""
    else: result[0 .. result.rfind(sep, 0, index - 1)]
Is there a Nim equivalent to Python's timeit module for timing snippets?
There are probably about 100 variants in people's local code, but I happened to just see your message and also just added this to cligen/osUt the other day (it needs requires "cligen#head" in your .nimble file, or you could just copy this tiny impl and adapt it as you like):
import strutils, times

template timeIt*(label: string, unit: float, places = 3, sep = "\n", body: untyped) =
  let t0 = epochTime()
  body
  let dt = epochTime() - t0
  stdout.write label, " ", formatFloat(dt * unit, ffDecimal, places), sep

timeIt("write1", 1e6, 3, " "): stderr.write "hi"
timeIt("write2", 1e6, 3, "\n"): stderr.write "there"
You may want to add loops to repeat and use the stats.RunningStat stdlib API to do something nicer, more like Python's version. There is also a package called golden (https://github.com/disruptek/golden) that tries to iterate until timing noise has been suppressed. And probably various other solutions, too. (And sometimes noise is the data and you want cold-cache/system-competition effects included... benchmarking is obviously a larger topic than a snippet thing.)

This is the fastest and best so far:
import strutils  # rfind

proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) == 0:
    return ""
  let first = paths[0]
  var index = -1
  block loop:
    for i in 0 ..< len(first):
      for j in 1 .. paths.high():
        let path = paths[j]
        if i >= len(path) or first[i] != path[i]:  # stop at a short path, too
          break loop
      index = i  # only count a column once every path has matched it
  if index == -1:
    return ""
  else:
    return first[0 .. first.rfind(sep, 0, index)]
E.g., in Nim it might look like this:
proc minLen(strs: openArray[string]): int =
  result = int.high
  for s in strs: result = min(result, s.len)

proc check(strs: openArray[string]; j0, j1: int): bool =
  let srcSlc = strs[0][j0..j1]
  for s in strs:
    if s[j0..j1] != srcSlc: return false
  return true

proc longestCommonPfx*(strs: openArray[string]): string =
  if strs.len < 1: return ""
  let src = strs[0]
  if strs.len < 2: return src
  var lo = 0                    # Binary search
  var hi = strs.minLen - 1
  while lo <= hi:
    let mid = lo + (hi - lo) div 2
    if check(strs, lo, mid):    # All strs match this
      result.add src[lo .. mid] # Append to answer
      lo = mid + 1              # Go higher
    else:                       # Go lower
      hi = mid - 1

when isMainModule:
  import os
  echo longestCommonPfx(commandLineParams())
There are also a lot of variants at http://rosettacode.org/wiki/Longest_common_prefix
For paths you would just need to add something to backup to the most recent directory separator, if any.
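For instance, a small wrapper along those lines (commonPathPfx is an illustrative name, not from the post; rfind returning -1 conveniently makes the slice empty):

import strutils  # rfind

proc commonPathPfx*(paths: openArray[string], sep = "/"): string =
  let pfx = longestCommonPfx(paths)
  # keep everything up to and including the last separator, else ""
  result = pfx[0 .. pfx.rfind(sep)]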
And before someone corrects me, an even less embarrassing variant might be:
proc rangeAt*[T](xs: openArray[T]): tuple[minAt, maxAt: int] =
  if xs.len < 1: return (-1, -1)  # raise?
  for i in 1 ..< xs.len:          # note tuple ints init to zero
    if xs[i] < xs[result.minAt]: result.minAt = i
    if xs[i] > xs[result.maxAt]: result.maxAt = i  # fixed: originally updated minAt twice; see the catch below
with, then, let (minAt, maxAt) = strs.rangeAt and tracking strs[minAt]/strs[maxAt] changes in the string comparison for the prefix-length algo. Some paired-up, value-oriented interface for those who want it would also make sense:
proc range*[T](xs: openArray[T]): tuple[min, max: T] {.inline.} =
  if xs.len < 1: return  # raise?
  let (minAt, maxAt) = xs.rangeAt
  result.min = xs[minAt]
  result.max = xs[maxAt]
in which case maybe no change to the prefix len algo is needed.
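The reason comparing just two strings suffices perhaps deserves a sentence: the lexicographic minimum and maximum bracket every other string, so any prefix those two share must be shared by all. So the prefix-length algo on top of the value interface is just (lcpLenViaRange is an illustrative name):

proc lcpLenViaRange(strs: openArray[string]): int =
  if strs.len == 0: return 0
  let (mn, mx) = strs.range   # lexicographic extremes bound the rest
  for i, c in mn:             # the first mn/mx difference ends the
    if c != mx[i]: return i   # prefix shared by every string
  return mn.len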
While off the topic of the prefix calculation, it perhaps bears noting that SSE/AVX can sometimes speed up the value-oriented versions of these range computations over arrays of CPU-supported types like openArray[float32] by large factors, even 24x. Basically, branchless, batching vector min/max instructions can be leveraged, but I am unaware of CPUs supporting "minAt"-type vector instructions. Recent versions of gcc can recognize these min/max value sweeps and correctly generate optimized code, but such pattern recognition can often fail as the code grows just a little more complex.
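For instance, a value-only float32 sweep of the kind the vectorizer can recognize (range32 is an illustrative name):

proc range32(xs: openArray[float32]): tuple[min, max: float32] =
  # no index tracking, so -O3-class compilers can often map this
  # onto branchless vector min/max instructions
  if xs.len == 0: return
  result = (min: xs[0], max: xs[0])
  for x in xs:
    result.min = min(result.min, x)
    result.max = max(result.max, x)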
Using your for j in 1 .. paths.high() is much faster than my for s in paths[1 .. ^1], but for me my foldl is much faster than your maxLen. (My test data isn't great, though: a dozen calls, eleven with common prefixes, repeated 100K times.)
Maybe a common prefix algorithm should be in the std. lib., perhaps with one version that is "raw" and another which does prefixes only at a given separator?
It's a bit tricky to explain why your foldl would be much faster than my minLen (note "min", not "max"). Perhaps you aren't compiling with max optimization/-d:danger or something? Or perhaps it's some peculiarity of your test data. Total data size, number of strings, and size of the found prefixes are more salient statistics than how many times you repeat, though another possible explanation is that 100K repeats in a tight loop does something weird with a warmed-up CPU branch predictor. My timings come from a Unix shell loop doing 10..100 repeated runs of the whole program. As with much benchmarking, it's debatable what is most representative.
I'll include the whole program below for reference. I called @marks' algorithm lcpVertical for its vertically-oriented scanning and swapped i & j to have more traditional j=column index notation and a slightly earlier exit. Because cligen allows unique prefixes, ./lcp -alcpb /usr/bin/\* is a valid way to run things in binary-search mode. I don't do the final rfind(sep) work in any of those, but it's the same work for all the algorithms anyway and just a couple-line wrapper proc.
Anyway, I've run with a variety of inputs and things track my stated expectations perfectly (compiled with the devel version of Nim, -d:danger, gcc-9.2, on Linux, i7-6700k). For really large inputs the range algo works best; for really small ones, the binary search. So, an optimum at all scales would probably switch from binary search to range at some machine/system-load dependent threshold.
stdlib-wise, the range LCP algo is never worse than about 2X the run time on my various /usr/xyz/* type tests, though. So, from a non-tuning/performance-robustness point of view, it would be a better candidate for stdlib inclusion. Beyond the "one stop shopping" algo-wise and the almost trivial implementation, its rangeAt & range helpers are much more basic/common operations than longest common prefix. Beyond that basicness, as mentioned, the value-based range may afford strong speed-ups for specialized types. So, it might be helpful to have a single point of reference that could be optimized/assumed optimized the way certain standard C library functions are, such as memchr.
when defined(foldl):
  import sequtils               # For me this is slower
  proc minLen(strs: openArray[string]): int {.inline.} =
    foldl(strs.mapIt(len(it)), min(a, b))
else:
  proc minLen(strs: openArray[string]): int {.inline.} =
    result = int.high
    for s in strs: result = min(result, s.len)

proc check(strs: openArray[string]; j0, j1: int): bool =
  let first = strs[0]
  for s in strs:
    for j in j0 .. j1:
      if s[j] != first[j]: return false
  return true

proc lcpLenBinSearch(strs: openArray[string]): int =
  if strs.len == 0: return 0    # maybe -1 instead?
  if strs.len == 1: return strs[0].len
  var lo = 0                    # Binary search
  var hi = strs.minLen - 1
  while lo <= hi:
    let mid = lo + (hi - lo) div 2
    if check(strs, lo, mid):    # All strs match this
      result += mid + 1 - lo    # Append to answer
      lo = mid + 1              # Go higher
    else:                       # Go lower
      hi = mid - 1

proc rangeAt*[T](xs: openArray[T]): tuple[minAt, maxAt: int] =
  if xs.len == 0: return (-1, -1)  # raise?
  for i in 1 ..< xs.len:
    if xs[i] < xs[result.minAt]: result.minAt = i
    if xs[i] > xs[result.maxAt]: result.maxAt = i  # fixed per the catch below

proc range*[T](xs: openArray[T]): tuple[min, max: T] =
  if xs.len == 0: return  # raise?
  let (minAt, maxAt) = xs.rangeAt
  result.min = xs[minAt]
  result.max = xs[maxAt]

proc lcpLenRange(strs: openArray[string]): int =
  if strs.len == 0: return -1  # raise?
  let (minAt, maxAt) = strs.rangeAt
  for i, c in strs[minAt]:
    if c != strs[maxAt][i]: return i
  return strs[minAt].len

proc lcpLenVertical(strs: openArray[string]): int =
  if strs.len == 0: return 0   # simplified marks algo
  let first = strs[0]
  for j in 0 ..< first.len:
    for i in 1 ..< strs.len:
      if j >= strs[i].len or first[j] != strs[i][j]:
        return j
  return first.len

type lcpAlgo* = enum lcpBinSearch, lcpRange, lcpVertical

proc lcpLen*(strs: openArray[string], algo = lcpVertical): int =
  case algo
  of lcpBinSearch: return strs.lcpLenBinSearch
  of lcpRange: return strs.lcpLenRange
  of lcpVertical: return strs.lcpLenVertical

proc lcp*(strs: openArray[string], algo = lcpBinSearch): string =
  if strs.len < 1: return ""
  result = strs[0][0 ..< strs.lcpLen(algo)]

when isMainModule:  # asserts are the Rosetta Code tests
  assert lcp(["interspecies", "interstellar", "interstate"]) == "inters"
  assert lcp(["throne", "throne"]) == "throne"
  assert lcp(["throne", "dungeon"]) == ""
  assert lcp(["throne", "", "dungeon"]) == ""
  assert lcp(["cheese"]) == "cheese"
  assert lcp([""]) == ""
  assert lcp(@[]) == ""
  assert lcp(["prefix", "suffix"]) == ""
  assert lcp(["foo", "foobar"]) == "foo"

  import cligen, cligen/osUt
  proc timeAlgo(algo = lcpRange, strs: seq[string]) =
    timeIt(" in", 1e6, 3, " microseconds via " & $algo & "\n"):
      stdout.write strs.lcpLen(algo)
  dispatch(timeAlgo)
In rangeAt you have

  if xs[result.minAt] < xs[i]: result.minAt = i
  if xs[result.minAt] > xs[i]: result.minAt = i

whereas I suspect you meant to use maxAt in the second line. Does this affect your timing results?

Oops. Didn't survive some edits. Good catch as always, @e. Will fix in the prior post with a note. It does not affect my results, which are like this:
$ echo /usr/bin/*|wc
1 3077 58306
$ (repeat 50; lcp -alcpV /usr/bin/*)|mnsd
9 in 53.97 +- 0.30 microseconds via lcpVertical
$ (repeat 50; lcp -alcpR /usr/bin/*)|mnsd
9 in 51.33 +- 0.42 microseconds via lcpRange
$ (repeat 50; lcp -alcpB /usr/bin/*)|mnsd
9 in 39.29 +- 0.78 microseconds via lcpBinSearch
(I use Zsh, which has repeat, and mnsd, a little mean-sdev computer.) Then for larger inputs:
$ echo /usr/lib/python3.6/**|wc
1 39036 3108640
$ (repeat 10 lcp -alcpV /usr/lib/python3.6/**)|mnsd
19 in 2388. +- 14. microseconds via lcpVertical
$ (repeat 10 lcp -alcpR /usr/lib/python3.6/**)|mnsd
19 in 1131.1 +- 1.5 microseconds via lcpRange
$ (repeat 10 lcp -alcpB /usr/lib/python3.6/**)|mnsd
19 in 1523. +- 17. microseconds via lcpBinSearch
The L3 on an i7-6700k is 8 MB. So, this is showing the range method pulling ahead while still in L3 (3.1 MB). At one pass, it really has to be approximately the fastest at large scale. Reading from DIMMs, the difference should be even more pronounced in favor of range.
It's possible that the marks/vertical scan style might be faster if the answer is very small. E.g., for a zero common prefix length, the answer will be concluded after just one pass down the first column via lcpVertical.
For explicitness:

proc lcpLenVRange(strs: openArray[string]): int =
  if strs.len == 0: return -1  # raise?
  var minAt, maxAt: int
  for i in 1 ..< strs.len:
    if strs[i] < strs[minAt]: minAt = i
    if strs[i] > strs[maxAt]: maxAt = i
    if strs[i].len > 0 and strs[i][0] != strs[0][0]:
      return 0
  for i, c in strs[minAt]:
    if c != strs[maxAt][i]: return i
  return strs[minAt].len
Integrating that into the test harness appropriately, running on my two running example inputs (extended with a case having a zero-length common prefix), and switching to mnmx (which gives the range of times instead of mean +- sdev) then gives:
$ for a in lcpVe lcpR lcpB lcpVR; { (repeat 20; lcp -a$a /usr/lib/python3.6/**)|mnmx }
19 in 2300.262..2448.32 microseconds via lcpVertical
19 in 1122.236..1162.767 microseconds via lcpRange
19 in 1523.256..1558.781 microseconds via lcpBinSearch
19 in 1189.47..1224.756 microseconds via lcpVRange
$ for a in lcpVe lcpR lcpB lcpVR; { (repeat 20; lcp -a$a /usr/bin/*)|mnmx }
9 in 51.737..76.532 microseconds via lcpVertical
9 in 48.637..68.903 microseconds via lcpRange
9 in 36.24..40.054 microseconds via lcpBinSearch
9 in 46.253..54.598 microseconds via lcpVRange
$ for a in lcpVe lcpR lcpB lcpVR; { (cd /usr/bin; repeat 20; lcp -a$a *)|mnmx }
0 in 3.099..5.722 microseconds via lcpVertical
0 in 43.392..80.109 microseconds via lcpRange
0 in 8.583..13.113 microseconds via lcpBinSearch
0 in 3.099..5.245 microseconds via lcpVRange
$ for a in lcpVe lcpR lcpB lcpVR; { (cd /usr/lib/python3.6; repeat 20; lcp -a$a **)|mnmx }
0 in 7.868..11.683 microseconds via lcpVertical
0 in 911.713..959.158 microseconds via lcpRange
0 in 174.522..190.735 microseconds via lcpBinSearch
0 in 9.537..14.782 microseconds via lcpVRange
So, you can get the early exit of the vertical method for zero common prefixes in the range method without very much extra cost. E.g., slightly negative/in-the-noise cost for the L2 prefixLen==9 case and 6% slower for the prefixLen==19 L3 case, with the early exits basically matching times in the two prefixLen==0 cases.
(And, yeah, yeah... the Nim program should import stats, do the repeated loops internally updating a RunningStat, and report. It could even take an lcpAll enum value to report on all algorithms, and -- if one wants to commit to only file-tree inputs -- maybe even take just a directory as a command parameter, import oswalkdir, and use walkDir and walkDirRec instead of * and ** to remove the dependency on shell features.)
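That internal-repeats idea might look roughly like this (a sketch, assuming the harness's lcpAlgo now includes lcpVRange; benchLcpLen and its reps default are mine, not from the thread):

import stats, times

proc benchLcpLen(strs: seq[string], algo: lcpAlgo, reps = 20): RunningStat =
  var sink = 0                             # keep the work observable
  for _ in 1 .. reps:
    let t0 = epochTime()
    sink += strs.lcpLen(algo)
    result.push((epochTime() - t0) * 1e6)  # one sample, in microseconds
  doAssert sink >= 0                       # defeat dead-code elimination

# usage: let st = benchLcpLen(strs, lcpVRange)
#        echo st.mean, " +- ", st.standardDeviation, " usec"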
Things in that last version weren't robust/efficient with regard to zero-length strings being in the mix. This fixes that:
proc lcpLenVRange(strs: openArray[string]): int =
  if strs.len == 0: return -1    # raise?
  if strs[0].len == 0: return 0  # ANY "" anywhere => 0
  let byte0 = strs[0][0]
  var minAt, maxAt: int
  for i in 1 ..< strs.len:
    if strs[i].len == 0: return 0
    if strs[i] < strs[minAt]: minAt = i
    if strs[i] > strs[maxAt]: maxAt = i
    if strs[i][0] != byte0: return 0
  for i, c in strs[minAt]:
    if c != strs[maxAt][i]: return i
  return strs[minAt].len
In the interests of being mostly self-contained, I also pushed a better timeIt to cligen#head and updated my driver to use the new timeIt and to not time the stdout.write parts:
proc timeAlgo(algo = lcpVRange, reps = 1, strs: seq[string]) =
  var x: int
  timeIt(stdout.write, "", 1e-6, 3, " usec via " & ($algo)[3..^1], reps):
    x += strs.lcpLen(algo)
  echo " ", int(x.float / reps.float + 0.5)
Finally, I created a benchLcp.zsh script:
#!/bin/zsh
lcp=`pwd`/lcp  # We need Zsh only for its **
echo "L3 data, 19 answer"
for a in lcpVe lcpR lcpB lcpVR; do $lcp -r10 -a$a /usr/lib/python3.6/**; done
echo "L2 data, 9 answer"
for a in lcpVe lcpR lcpB lcpVR; do $lcp -r10 -a$a /usr/bin/*; done
echo "L3 data, 0 answer"
for a in lcpVe lcpR lcpB lcpVR; do (cd /usr/lib/python3.6; $lcp -r10 -a$a **); done
echo "L2 data, 0 answer"
for a in lcpVe lcpR lcpB lcpVR; do (cd /usr/bin; $lcp -r10 -a$a *); done
and used a little nim-pgo wrapper I have that applies gcc's profile-guided optimization (-fprofile-generate and -fprofile-use): compiling, running a test/bench, then re-compiling. With all that, I get:
L3 data, 19 answer
1910.448..2253.771 1959.491 +- 32.812 usec via Vertical 19
686.884..1150.131 777.125 +- 48.653 usec via Range 19
922.680..1522.064 996.113 +- 58.569 usec via BinSearch 19
693.560..1163.244 786.591 +- 49.932 usec via VRange 19
L2 data, 9 answer
41.246..63.181 45.848 +- 2.480 usec via Vertical 9
37.193..54.359 40.936 +- 1.778 usec via Range 9
22.650..25.511 23.413 +- 0.353 usec via BinSearch 9
36.716..55.075 41.509 +- 2.292 usec via VRange 9
L3 data, 0 answer
0.000..0.954 0.167 +- 0.094 usec via Vertical 0
535.488..1017.570 627.780 +- 48.740 usec via Range 0
58.651..205.994 82.874 +- 15.300 usec via BinSearch 0
0.715..3.576 1.097 +- 0.278 usec via VRange 0
L2 data, 0 answer
0.000..0.000 0.000 +- 0.000 usec via Vertical 0
32.663..38.862 33.927 +- 0.650 usec via Range 0
3.099..4.768 3.290 +- 0.166 usec via BinSearch 0
0.000..0.238 0.048 +- 0.032 usec via VRange 0
which makes total sense to me. The @marks performance discrepancies (except his mysterious foldl bit) probably just come from the "mix"/averaging of how many zero-length answers are in his inputs.
So, this new variant of the VRange hybrid method would make the most sense to recommend for general usage (of the above selection, anyway). It's never very slow, is fastest at large scale when performance matters most, and can leverage early exit in easy cases.
The binary-search way may be best for smaller-scale data if the caller already knows the scale and can pass it in as a parameter. If we must measure the data size, that takes a whole pass through the length fields. That measurement eats into the 36/22 = 1.6x L2 non-zero speed ratio, probably reducing it to more like 1.3x, which isn't much of a boost for all the complexity of measurement + algorithm switching. I got about a 1.15x boost from PGO, and most would probably skip that entirely to avoid build complexity.
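In code, such a switch might look like this (a sketch; the 64 KiB threshold is invented and machine/load dependent, per the above):

proc lcpLenAuto(strs: openArray[string]): int =
  var total = 0
  for s in strs: total += s.len    # the measuring pass discussed above
  if total < 64 * 1024:            # invented threshold; tune per machine
    result = strs.lcpLenBinSearch  # small data: binary search wins
  else:
    result = strs.lcpLenVRange     # large data: one-pass range wins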