Hi,
I have been experimenting with agents lately, trying to create agent workflows that help maintain my repos. The most important challenge, in my opinion, is porting your code to the latest version of its dependencies. This task is tedious and cannot be fully automated with a simple script.
An agent needs enough context to understand what has changed in a library's public interface in order to make the correct changes. The best source of truth for this is often the generated docs themselves. If this approach works for native Nim libraries, it can then be extended to updating wrappers: run c2nim on the file, generate the docs, and produce diffs that give the AI enough context to make the changes.
However, feeding the Nim docs directly into the system is inefficient: it wastes too many input tokens and contains a lot of irrelevant information. Also, the docs need to be diffable. A small change in a function parameter should result in a small change in the diff, so that it's easily understood by the AI. To address this, a special format is needed. Here's what I have come up with so far:
<const name="RayWhite"> # XML tags help AI understand boundaries better
  VALUE: Color(r: 245, g: 245, b: 245, a: 255) # upper or lower case?
</const>
<enum name="GamepadAxis"> # a name attribute might be a good idea for tooling
  PRAGMAS:
    size: sizeof(int32)
  MEMBERS:
    LeftX # Gamepad left stick X axis
    LeftY # Gamepad left stick Y axis
    RightX # Gamepad right stick X axis
    RightY # Gamepad right stick Y axis
    LeftTrigger # Gamepad back trigger left, pressure level: [1..-1]
    RightTrigger # Gamepad back trigger right, pressure level: [1..-1]
</enum>
<object name="Wave">
  PRAGMAS:
    completeStruct
    bycopy
  FIELDS:
    frameCount: uint32 # Total number of frames (considering channels)
    sampleRate: uint32 # Frequency (samples per second)
    sampleSize: uint32 # Bit depth (bits per sample): 8, 16, 32 (24 not supported)
    channels: uint32 # Number of channels (1-mono, 2-stereo, ...)
    data: pointer # Buffer data pointer
</object>
<proc name="emscriptenSetMainLoop">
  PARAMETERS:
    f: emCallbackFunc
    fps: int32
    simulateInfiniteLoop: int32
  PRAGMAS:
    cdecl
</proc>
<proc name="setPixelColor">
  GENERIC_PARAMS:
    T: Pixel
  PARAMETERS:
    pixel: var T
    color: Color
</proc>
Then a custom diff tool needs to be made that outputs each diff together with its enclosing tag, so that all relevant information is kept. Since there are people on this forum with more experience using coding assistants, I would like to ask for your thoughts on this approach. What do you think?
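To make the diff tool idea concrete, here is a rough sketch (my own, untested; the block keys and output labels are placeholders): group both doc dumps into blocks keyed by their opening tag, then print any changed block in full, so the enclosing tag and all of its context survive.

import std/[tables, strutils]

proc blocks(doc: string): OrderedTable[string, string] =
  ## Splits a doc dump into blocks keyed by their opening tag,
  ## e.g. `proc name="setPixelColor"` -> the block's full text.
  var key = ""
  for line in doc.splitLines:
    let l = line.strip
    if l.startsWith('<') and not l.startsWith("</"):
      let close = l.find('>')
      if close > 0:
        key = l[1 ..< close]
        result[key] = ""
    if key.len > 0:
      result[key].add line & "\n"
    if l.startsWith("</"):
      key = ""

proc tagDiff(oldDoc, newDoc: string) =
  ## Emits whole blocks, enclosing tags included, instead of bare hunks.
  let a = blocks(oldDoc)
  let b = blocks(newDoc)
  for key, body in b.pairs:
    if key notin a:
      echo "ADDED:\n", body
    elif a[key] != body:
      echo "CHANGED:\n", body
  for key in a.keys:
    if key notin b:
      echo "REMOVED: <", key, ">"

This way, if only one parameter of setPixelColor changes, the AI still receives the whole <proc name="setPixelColor">...</proc> block instead of a context-free hunk.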
I do SEO professionally and needed to learn how these AI tools parse web pages so I could better help my clients. The way Google sees a web page isn't like how, say, ChatGPT sees it. I jailbroke GPT-5 and got it to dump the raw output from its fetch tool.
The developers do not waste a single token. Slight exaggeration, but it looks something like this:
{
  "visible_text_lines": [
    {"id": "L0", "text": "Example Heading"},
    {"id": "L1", "text": "Some paragraph text."}
  ],
  "links": [{"text": "Home", "href": "/"}],
  "headings": [{"tag": "H1", "text": "Main Title"}],
  "metadata": {"title": "Page Title", "description": "..."},
  "images": [{"alt": "Logo", "src": "/logo.png"}]
}
Ignore the fields except for 'visible_text_lines'; the others only appear in certain situations, but I thought it might still be useful to show them for your purpose.
The point is, the developers really figured out how to decompose and condense a web page without losing detail.
The ids are segment IDs (L0, L1, ...) and reflect render order, not the HTML file's line numbers. They mark semantically grouped sections of content, which the AI uses for targeting/reference (I believe).
From what I've been able to glean, the AI strips the page of scripts, style tags, etc., decomposes the rest of the web page into markdown (like ## for an h2 tag), and then groups the content into segments (chunks them into logical blocks, e.g. assigns L0, L1, etc.).
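In case it helps to replicate that stripping step, here is my crude reconstruction in Nim (a guess at the pipeline, definitely not what GPT actually runs; the segment-chunking step is left out):

import std/[htmlparser, xmltree, strutils]

proc collect(n: XmlNode, lines: var seq[string]) =
  ## Drops script/style, turns headings into markdown, keeps visible text.
  case n.kind
  of xnText:
    if n.text.strip.len > 0:
      lines.add n.text.strip
  of xnElement:
    case n.tag
    of "script", "style":
      discard  # stripped entirely
    of "h1", "h2", "h3", "h4", "h5", "h6":
      # "## " for an h2 tag, and so on
      lines.add repeat('#', ord(n.tag[1]) - ord('0')) & " " & n.innerText.strip
    else:
      for child in n:
        collect(child, lines)
  else:
    discard

var lines: seq[string]
collect(parseHtml("<h1>Main Title</h1><p>Some paragraph text.</p>"), lines)
# `lines` would then be grouped into L0, L1, ... segments.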
Maybe these insights can help you on your journey some.
@Araq, dagon looks awesome! It seems to serve a slightly different purpose, though it could still help with AI coding: you can skip RAG entirely, just list the function/object names you're interested in, and it will fill in the definitions automatically.
@Niminem, when I prompted the AI to suggest the best format for my purpose, it also recommended JSON. I went with markdown because I thought JSON would be too wasteful; I still have to test an example and measure tokens.
For starters, it would be great if Nimdoc could just dump the md/rst it uses for generating HTML. Surprisingly, it can't.
That would also be useful for humans browsing the docs from the terminal without w3m/lynx.
Perhaps have the AI only turn it into structured markdown and then convert that into JSON programmatically.
Ex:
"""
L0 #this was an h1
L1 this is some paragraph text here
L2 - stuff 1
- stuff 2
- stuff 3
L3 this is some **more** paragraph text
"""
Into:
{
  "visible_text_lines": [
    {"id": "L0", "text": "# this was an h1"},
    {"id": "L1", "text": "this is some paragraph text here"},
    {"id": "L2", "text": "- stuff 1 \n - stuff 2 \n - stuff 3"},
    {"id": "L3", "text": "this is some **more** paragraph text"}
  ],
  # ... other data here ...
}
The way I see it, creating your own symbols for the markdown (kind of like what you have above, but in markdown form instead of XML) will dramatically reduce token usage, and leveraging JSON on top of that would make the diffing and the other things you'd need to do easiest to reason about.