protesting on graphs : Documentation

Summary:


I: Introduction

protesting on graphs is a program used to simulate the movement of "protester" like entities on small to medium sized graphs (the biggest graph the program can theoretically load has 4.3*10e9 nodes, lines and protesters but I don't recommend loading something that big). The graphs used during the project had sizes varying from a few thousands to ~3*10e6 nodes.
Simulation are launched by passing the path to a graph file, a coefficient of protesters (the number of protesters is proportional to the number of nodes), and a number of iterations of the simulation (how many times will each protester move). Every other argument (rule arguments, trace argument , ...) is optional.





II: What are Graphs? (as far as pog is concerned)

Graphs are represented with adjacency lists in the program these adjacency lists (as well as other stuff) are stored in the GraphTable structure of the C code. it contains a size field to keep track of the number of entries in it a GraphTableEntry pointer field containing the graph's entry a LineArray pointer field where the adjacency lists of the graph are stored a protesterArray pointer field containing the protesters present in the graph a protesterCurNext field to keep track of the position of the protesters a cur_gen field which is an integer used to keep track of the number of iterations since the beginning of a simulation





III: What are protesters? (spoiler: 32bits ints)

protesters are pretty simple. Trivial even. They're represented by their current position stored on a 32bit unsigned integer and (if the propulsion is active) another 32bit integer representing their previous position and...that's it. They're very very basic because the different rules used to calculate their movement only rely on their neighbor nodes (or their previous position in the case of the propulsion rule) The protesters are stored in the protesterArray, a simple wrapper around the C array that contains a size field , an array of "protesters" and an array containing the previous position of the protesters.
The "id" of the protesters doesn't have to be "stored". Since they are stored in an array you can already identify them with their index when moving them around.





IV: How to use pog ?

IV.1: Basic usage

to call the program you must pass it:

every other argument is optional however, you will want to customize the behavior of protesters. In order to do so you can pass rules to the program.
Every rule (except mprop MUST be passed with a coefficient)

The rules often mention the "flux". It's a line attribute. It's determined by the number of protesters going from a node i to a node j during an iteration.

The different rules supported by pog are :

Rule Description
rand every protester chooses a random node amongst their neighbor nodes (moves randomly).
align Their protester will move to their neighbor node where the flux difference flux((a,b)) - flux((b,a)) where a is the node they're currently at is the highest. If multiple nodes share the max flux difference, the protesters chooses one of them randomly.
attra A protester will move to their neighbor node where the number of protesters is the highest. If multiple nodes share the max number of protester, they will choose one of them randomly.
attco the more protesters are on a node the more likely a protester is to move there.
alibad The protester makes a weighed random chance to move to one of it's neighbor node. The coefficient of each neighor is determined by the flux difference (as defined in the align rule) plus the minimum flux difference from the current node to one of it's neighbor. This rule is terrible.
alivi Similar to the align rule except the rule uses a dfs on a small depth (default 5) and sums the flux difference at each node encountered during the dfs to calculate the values of each neighbor.
attvi Similar to attra rule except the rule does a dfs of a small depth (default 5) and sums the protesters of each node encountered during the dfs to calculate the values of each neighbor.
alico The protester makes a weighed random choice to move to one of it's neighbor node. The coeffcient of each neighbor is determined by the highest value between the flux (as defined in the align rule) and 0 divided by the.
aliex The protester makes a weighed random choice to move to one of it's neighbor node. The coefficient of each neighbor node is determined by the flux difference (as defined in the align rule) of each neighbor. The neighbors nodes where the flux difference is negative are ignored (the protesters don't "see" them when making their decision) If every flux difference is less than zero the protester moves at random.
follow The protester will move to their neighbor node where the largest number of protester went to last iteration. In case of egality, the protester will choose randomly between the nodes of same value.
follco The protester makes a weighed random choice to move to one of it's neighbor node. The coefficient of each neighbor node is determined by the flux from the current node to it's neighbors.
propu The protester will move randomly to one of it's neighbor node but can't go back to their previous node. If the only node they could go to is their previous node, they won't move.
repu Opposite of attra, the protester will try to go to their neighbor node with the smallest number of protesters on it. In case of egality, the protester will choose randomly between the nodes of same value.
goku The protester will teleport to a random node of the graph
phobic The protester will try to go to their neighbor node with the smallest, non zero number of protesters on it. In case of egality, the protester will choose randomly between the nodes of same value. If every node around the protester is empty , the protester will also move randomly
Meta Rule Description
mconst there is a constant propability (passed as argument) for a protester to not move.
mcrowd if the sum of the number of protesters on the neighbor nodes of a protester's current node is higher than a threshold, it doesn't move.
mprop every protester has a 1/n ,where n is the number of protesters on it's current node, chance to move.

every rule (except for mprop) MUST be passed with a coefficient. The syntax to parse a rule is : "rule:COEFF" where coeff is a floating point number.
The mcrowd, mconst and mprop rules are mutually exclusive and independant from the other rules. If multiple "m[rules]" are passed only the last one will be used by the simulation.
When passing normal rules their coefficients will be normalised. This allows for more flexibility with arguments. for example if you pass "rand:1 align:3" , "rand:0.1 align:0.3", "rand:100 align:300" you can expect the same results.
The "mrules" are NOT normalized. The mconst rule takes a floating point number between 0 and 1 with 0 meaning protesters will always move and one meaning they'll never move. "mcrowd" takes an integer as argument.

IV.2: Parsing options

pog accepts various options. It handles options with the standard option functions defined in the <unistd.h> library.
The options recognised by pog are :
Rule Description
-h prints help and exits
-d requires a positionnal arguments, will produce traces of the simulation.
-w requires a positionnal arguments, instead of randomly initialising the position of the protesters on the graph, will load their position from a binary file.
-l requires a positionnal arguments, allows to define the generation where the trace of the flux in the simulation will start to be produced.
-s requires a positionnal argument, at the iteration passed as argument, will randomly shuffle the protesters instead of moving them accordign to the rules passed at the start of the program.
-v requires a positionnal arguments, allows to modify the vision of the attvi and alivi rules. The argument of the option will be the vision (depth of the search) used durign the simulation for these rules.

IV.2: Error Messages

protesting on Graphs has a custom error system. When catching an error the program will print error traces allowing to understand which function calls caused the simulation to fail. These error are not the clearest but I think they're better than random segfaults (and they make the program easier to debug). The error flags are stored in an enum in the src/../common.h file. The function to print error is stored in src/misc.c , feel free to check them out if you want more info on errors.





V: Into the nitty gritty

V.1: Introduction

This section will be used for various parts of the more "technical" side of the docu. You don't have to check it out to understand what protesting on Graphs is and how it's used. But feel free to give it a read if you're interrested in learning more about it :)

V.2: custom CSV rep

The CSV format used to write / read graphs is as follows :
the first line of the CSV contains the number of nodes / lines in the graph
the next lines represent the adjacency lists of the nodes
the fields are separated by a comma. Inside the line field,
the lines are separated by a colon.
node_index,number_of_lines,line1:flux1;line2:flux2...lineN:fluxN
this representation allows to store a graph with custom flux
(useful to dump/load a trace from a certain generation)

V.3: LineArray

The LineAray is the array storing relevant information of the lines of a graph.
the 'size'/'cur_in' fields keep track of the max number of elements / current number of elements in the LineAray. They are stored on 32bits unsigned integers
the 'Line * array' field is the main array of the structure. It contains the adjacency lists of every node in the graph stored as an array. This means that they are stored at the same place in memory which gives the program better performance (less cache misses amongst other things )
At the moment, a line only stores the index of the node it represents this information is enough to explore the adjacency lists and know which nodes are connected.
The LineAray also stores two integer arrays corresponding to the flux at a certain line. During the runtime, the lookup of the flux in the line l at the iteration i is retrieved from the cur_flux array.
The flux that will be used at the iteration i+1 is calculated in the next_flux array. I use these two arrays because they make it possible to move every protester at the "same time". ie: every protester has access to the same informations when they move.
It also makes it possible to "prepare the next iteration" in O(1) since I just need to swap the two arrays and flush the new "next array".
The way the flux is stored is that the flux corresponding to the line represented at the index l of the Line * array is stored at cur_flux[l]

V.4: protesterCurNext

the protesterCurNext is a simple wrapper around two arrays of 32bits unsigned integer of the same size (with the same number of elements)
the 'size' field of the structure is the number of elements in the arrays.
the arrays store the number of protester in a node of the graph. To retrieve the number of
to calculate the number of protesters in a node n at the iteration i+1 you increment the next_num array.
Using two arrays allows to swap the two arrays like I do with the LineArray to prepare an iteration in O(1).

V.5: GrapheTableEntry

a GraphTableEntry stores the information needed to store the adjacency list of a node in the graph. in order to do so , it stores the reference to the first neighbor of the node in the array of adjacency lists (cf LineAray documentation) as well as the number of neighbors of the node this makes it possible to know when the adjacency list of the entry stops.

V.6 : Trace Format

the Tactics are stored in the s_tactics structure. "Tactics" is the name given to a combination of Rules (for example "rand:1 align:3" is a tactic)
it is currently implemented as a dynamic array. the size/capa fields stores it's current size and max size.
the Rule * rule_arr is the actual array containing the rules present in a tactic. an entry in the array contains a coeff represented as an 16 bit integer that is calculated by flooring the numbers given as argument to the program
and a function pointer to the rule associated with this coeff. When calculating the movement of a protester , a random number is generated and the rule corresponding to this number is used to determine the rule a protester will choose which calculates the node a protester will go to.

V.7 : Trace Format

When running fairly big simulations on fairly big graphs , producing a trace without it taking a while to be generated can be an issue.
The solution used in the program to fix this issue is to heavily rely on dumping the binary arrays of the graphs. At the beginning and end of the simulation the CSV graph is dumped in a human readable CSV format. They're stored on the [trace name]_hr and [trace name]_hrend files.
The other human readable file produced is [trace name]_lines containing two integers on each line , corresponding to the lines (edges).
Every other files contain binary data. They can be loaded in python with the numpy fromfile function.
These files are :

  • [trace name]_curnum keeping track of the number of protester at each nodes (the protesterCurNext.cur array is added to the file after each iteration).
  • [trace name]_flux if the start of flux dump
  • [trace name]_wkpos keeping track of the position of the protesters (the protesterArrat.array is added to the file after each iteration)