nimforum mirror - Linked list

andrea (orginal) [2015-02-03T16:15:35+01:00] view original

I have some trouble with the library SinglyLinkedList. Apparently it keeps two pointers, to the head and the tail - in fact it is defined by

SinglyLinkedList[T] = object
  head*, tail*: SinglyLinkedNode[T]

I am not sure why it is necessary to do so. In fact I see that the implementation of prepend is just

proc prepend*[T](L: var SinglyLinkedList[T], n: SinglyLinkedNode[T]) {.inline.} =
  n.next = L.head
  L.head = n

The tail is not touched at all. Is there something I am missing?

def (orginal) [2015-02-03T18:00:40+01:00] view original

The tail would be necessary of appending to a singly linked list, but should of course also be filled with prepend if you're inserting the first element. So this is a bug.

I find the lists module a bit strange anyway. It doesn't contain many of the operations you would want on linked lists, also it has so much duplication. Maybe it would be a good idea to rework it entirely?

Varriount (orginal) [2015-02-03T19:20:05+01:00] view original

What kinds of operations is it missing? (I have only academic experience with linked lists, sequences are usually so much better. )

andrea (orginal) [2015-02-03T19:35:57+01:00] view original

For instance insertion at a given point, or... well, tail, as a list

srwiley (orginal) [2016-01-29T23:43:20+01:00] view original

I took a stab at making a more fully featured singly and doubly linked list type which I placed in a module called 'listsv'. Features include inserting, finding, deleting, extracting, and len(). Some procs take proc arguments which work nicely with the Nim future syntax. For example:

import listsv, future
let list = newLinkedList("bird", "cat", "bear", "mouse")
assert list.len == 4
let found = list.findAll((v:string) => v[0] == 'b')
assert found.len == 2
assert found.head.value == "bird"
assert found.tail.value == "bear"

Tests and additional examples are in the source.

If anyone is interested they can take a look here: https://github.com/srwiley/listsv

andrea (orginal) [2016-01-30T13:09:42+01:00] view original

@srwiley

I still do not understand why you model lists this way. In every other language I have tried, a list consist of an element (the head) and a pointer to the next list (the tail), much like your Link type. Your example would then read

assert found.head == "bird"
assert found.tail.head == "bear"

It is not clear to me why you would want to model the head as a list itself.

In any case, you should look into publishing your example on Nimble :-)

OderWat (orginal) [2016-01-30T13:43:55+01:00] view original

@andrea are you asking what doubly linked lists are good for? In the late 80s we had Amiga Kickstart (Kernel) and Intuition (OS/GUI) where those played a major role in the system.

Cite from a post (reference below):

The AmigaOS contains functions for doubly linked lists which have some very interesting properties:

All functions always only copy pointers. There are no if's.

You don't have to allocate memory to modify the list.

You can remove a node from a list without the list pointer.

https://mail.gnome.org/archives/gtk-devel-list/1998-October/msg00112.html

andrea (orginal) [2016-01-30T13:58:13+01:00] view original

No, I am not asking that. I am asking why singly linked list are not defined in the same way the are defined say, in Lisp. I have no interest in double linked lists, but I would like Nim to have a reasonable implementation of singly linked ones

jibal (orginal) [2016-01-31T03:22:25+01:00] view original

It is not clear to me why you would want to model the head as a list itself.

It's necessary in order to have an empty list. This is how it's implemented in LISP and most other languages and libraries: the list object contains or consists of a reference to the first node (or is null), and each node contains a payload and a reference to the first of the remaining nodes (or is null) -- what you and Haskell call the tail and what LISP calls the cdr.

A doubly-linked list object contains two references instead of one -- a reference to the first node and a reference to the last node (pointing to the same node if there's only one, and both null for an empty list), and each node has a payload and forward/back (or next/prev) references (unless you use the Knuth XOR trick). Sometimes the doubly linked list object is a node itself, and instead of null references, they point to that node ... this makes insertions and deletions slightly simpler. Another possibility is a circular linked (double or single) list of nodes. Then a list object consists of a reference to some node in the circle, or null to indicate an empty list.

It's certainly odd, and contrary to virtually every data structures text, for a module called SinglyLinkedList to have both a head and tail reference. Note that having a tail reference disallows tail-sharing and immutability (persistence): https://en.wikipedia.org/wiki/Linked_list

Jehan (orginal) [2016-01-31T05:42:48+01:00] view original

andrea: I am asking why singly linked list are not defined in the same way the are defined say, in Lisp.

You generally do that so that you can append an element to a list in constant time. And yes, I know that the module doesn't actually define an append operation for singly linked lists. But that's the usual rationale for this implementation.

Jehan (orginal) [2016-01-31T18:20:10+01:00] view original

andrea: Can you explain in more detail? Do you mean that the list keeps a reference to the end of the list? If this is the case, maybe tail is meant to be the last node, and not the rest of the list as in Haskell or Scala.

Exactly.

andrea: In this case, I would say that there is an issue of naming: by choosing names that clash with already popular names in other languages, the module creates a little confusion

Well, this is a terminology that is common in data structure textbooks for imperative languages going back to the 70s. I don't think Haskell and Scala can reasonably claim precedence here. (The ML languages might, but at least OCaml uses tail to refer to the last element in its mutable queues, too.)

I'm not sure that these implementation details should be exposed as part of the public interface, though.

andrea: Hence the question is: given that those lists are optimized for a procedural, non functional style, shuld we have also Lisp-style persistent lists?

You generally don't use linked lists unless you absolutely need worst case constant time element insertion and deletion operations or urgently need them for some other purpose. The performance hit (lack of memory locality, cache-unfriendliness, poor prefetching behavior) and memory overhead compared to a dynamic array is worth it only in select circumstances (see, e.g., this talk by Bjarne Stroustrup).

andrea (orginal) [2016-01-31T18:31:55+01:00] view original

You use lists when you want tail sharing. If you accumulate stuff by prepending, and you do not want to worry about sharing your data, lists are the simplest solutions. Of course, more sophisticated persistent data structures exist, mostly based on finger trees, but having at least the simplest persistent data structure in the stdlib makes sense for me

Jehan (orginal) [2016-01-31T18:45:43+01:00] view original

andrea: You use lists when you want tail sharing.

And I'm pointing out that this is almost never worth it.

andrea (orginal) [2016-01-31T18:55:50+01:00] view original

From a performance point of view, you are perfectly right.

What I am saying is that sometimes one wants an immutable data structure (for instance in order to share it freely among threads), and in that case a persistent data structure that shares some parts allows to have some common operations without having to copy stuff all over the place. Lists are just the simplest example of that, and many people coming from functional languages just expect them to be there.

OderWat (orginal) [2016-01-31T19:06:35+01:00] view original

If something is simple... it can dare to be inefficient very often. There are so many problems to solve which do not care about run time or even resources in the first place. Scripting languages like PHP or Python would not exists if stuff has to be 100% efficient (whatever that means anyway).

That said.. for someone which was clearly reading "tail" as the last node in the list (to implement a fast append) I wondered a lot about the talk :)

Jehan (orginal) [2016-01-31T19:47:35+01:00] view original

andrea: What I am saying is that sometimes one wants an immutable data structure (for instance in order to share it freely among threads), and in that case a persistent data structure that shares some parts allows to have some common operations without having to copy stuff all over the place.

They are generally a bad idea even then. For starters, the reason why you parallelize code is so that you get better performance. Sacrificing 2-3 orders of magnitude of performance up front just so you can parallelize code (and maybe claw back one order of magnitude of performance) rarely computes. Replication on a per-thread basis is almost always better. There are some data structures where this is the best solution, but linked lists generally don't belong in this category (generally, because we're talking about large data structures, and linked lists and "large" don't go together) [1].

andrea: Lists are just the simplest example of that, and many people coming from functional languages just expect them to be there.

Functional programming languages use linked lists because they have to, not because it's necessarily a good idea. The problem of inefficient random access aside, try implementing breadth-first search in a purely functional style and you'll see what I mean [2]. Linked lists are an idiom of functional programming languages that you should not blindly carry over to imperative programming languages [3].

OderWat: There are so many problems to solve which do not care about run time or even resources in the first place.

Obviously, which is why functional programming does it. But the rationale does not really apply to imperative programming languages, where linked lists don't really have any offsetting benefits (in functional programming, the offsetting benefit is how pervasive immutability allows easier reasoning about certain properties of programs).

[1] Plus, Nim does not have a GCed shared heap to begin with, and you can't really do this stuff with manual memory management.

[2] In general, functional programming languages have a hard time with algorithms that require appending to a data structure, FIFO operations, etc. And the standard breadth-first search algorithm requires a FIFO queue.

[3] Keep also in mind that functional programming language implementations have technology in place to mitigate the performance impact. For example, OCaml's young object allocator is designed so that short-lived lists are laid out in an array-like fashion in memory, and there are copying GCs that rearrange objects to maximize locality. Imperative languages are generally less concerned with having to compensate for programmers doing this.

srwiley (orginal) [2016-01-31T21:20:00+01:00] view original

Andrea said:

It is not clear to me why you would want to model the head as a list itself.

It is not, the head value is the first link in the list. Instead of combining the link and list type into one, I chose to keep them distinct so that the LinkedList type can hold fields that apply to the whole list, like the link count which can be maintained with little overhead for most operations.

Jehan said:

And yes, I know that the module doesn't actually define an append operation for singly linked lists.

The listsv module does indeed define an 'append' operation for both singly and doubly linked lists (if listsv is the module you were referring to). As you mentioned, the last link in the 'tail' field allows for efficiently appending new values without iterating through the entire list, or requiring the programmer to maintain a separate a reference to the last link.

In addition I use the LinkedList object to hold some procs in fields which are used to attach different behaviors for singly and doubly linked lists when the list is instantiated. (Dynamic method dispatching does not play well with generics, right?) In this way we can avoid a lot of repeat code, plus it is probably faster than dynamic dispatching once the list is instantiated. This was the part I thought people might find strange (in a procedural kind of way), but it seems to work nicely.

Tail sharing will work in this implementation, just it might invalidate the link count if a link in the common tail is deleted in one list without the other list's count being updated. I don't think tail sharing is the main reason to use a linked list, but rather efficient insertions and deletions. See 'advantages' here: Wikipedia linked list. IMHO think this implementation achieves those advantages.

As for the idea that the tail should refer to the rest of the list after the head, according to the Wikipedia article :

The 'head' of a list is its first node. The 'tail' of a list may refer either to the rest of the list after the head, or to the last node in the list.

We could certainly change the names of the 'head' and 'tail' links to say 'first' and 'last', but I think that would cause more confusion than it would clear up.

The basic idea here is to provide some common (and tested) operations for singly and doubly linked lists so that everyone does not have to implement their own.

andrea (orginal) [2016-01-31T21:51:49+01:00] view original

The basic idea here is to provide some common (and tested) operations for singly and doubly linked lists so that everyone does not have to implement their own.

Then it would be a good idea to put it on nimble! ;-)

srwiley (orginal) [2016-02-01T21:45:58+01:00] view original

Done. Thanks for everyone's comments.

Mirror of forum.nim-lang.org

827 :: Linked list