protesting on graphs is a program used to simulate the movement
of "protester" like entities on small to medium sized
graphs (the biggest graph the program can theoretically load has 4.3*10e9 nodes,
lines and protesters but I don't recommend loading something that big).
The graphs used during the project had sizes varying from a few thousands to
~3*10e6 nodes.
Simulation are launched by passing the path to a graph file, a coefficient of protesters
(the number of protesters is proportional to the number of nodes), and a number of
iterations of the simulation (how many times will each protester move).
Every other argument (rule arguments, trace argument , ...) is optional.
Graphs are represented with adjacency lists in the program these adjacency lists (as well as other stuff) are stored in the GraphTable structure of the C code. it contains a size field to keep track of the number of entries in it a GraphTableEntry pointer field containing the graph's entry a LineArray pointer field where the adjacency lists of the graph are stored a protesterArray pointer field containing the protesters present in the graph a protesterCurNext field to keep track of the position of the protesters a cur_gen field which is an integer used to keep track of the number of iterations since the beginning of a simulation
protesters are pretty simple. Trivial even. They're represented by their
current position stored on a 32bit unsigned integer
and (if the propulsion is active) another 32bit integer
representing their previous position and...that's it.
They're very very basic because the different rules used
to calculate their movement only rely on their neighbor nodes
(or their previous position in the case of the propulsion rule)
The protesters are stored in the protesterArray, a simple wrapper around the C
array that contains a size field , an array of "protesters" and an array containing
the previous position of the protesters.
The "id" of the protesters doesn't have to be "stored". Since they are stored in an
array you can already identify them with their index when moving them around.
to call the program you must pass it:
The rules often mention the "flux". It's a line attribute. It's determined by the number of protesters going from a node i to a node j during an iteration.
The different rules supported by pog are :Rule | Description |
---|---|
rand |
every protester chooses a random node amongst their neighbor nodes (moves randomly). |
align |
Their protester will move to their neighbor node where the flux difference flux((a,b)) - flux((b,a)) where a is the node they're currently at is the highest. If multiple nodes share the max flux difference, the protesters chooses one of them randomly. |
attra |
A protester will move to their neighbor node where the number of protesters is the highest. If multiple nodes share the max number of protester, they will choose one of them randomly. |
attco |
the more protesters are on a node the more likely a protester is to move there. |
alibad |
The protester makes a weighed random chance to move to one of it's neighbor node. The coefficient of each neighor is determined by the flux difference (as defined in the align rule) plus the minimum flux difference from the current node to one of it's neighbor. This rule is terrible. |
alivi |
Similar to the align rule except the rule uses a dfs on a small depth (default 5) and sums the flux difference at each node encountered during the dfs to calculate the values of each neighbor. |
attvi |
Similar to attra rule except the rule does a dfs of a small depth (default 5) and sums the protesters of each node encountered during the dfs to calculate the values of each neighbor. |
alico |
The protester makes a weighed random choice to move to one of it's neighbor node. The coeffcient of each neighbor is determined by the highest value between the flux (as defined in the align rule) and 0 divided by the. |
aliex |
The protester makes a weighed random choice to move to one of it's neighbor node. The coefficient of each neighbor node is determined by the flux difference (as defined in the align rule) of each neighbor. The neighbors nodes where the flux difference is negative are ignored (the protesters don't "see" them when making their decision) If every flux difference is less than zero the protester moves at random. |
follow |
The protester will move to their neighbor node where the largest number of protester went to last iteration. In case of egality, the protester will choose randomly between the nodes of same value. |
follco |
The protester makes a weighed random choice to move to one of it's neighbor node. The coefficient of each neighbor node is determined by the flux from the current node to it's neighbors. |
propu |
The protester will move randomly to one of it's neighbor node but can't go back to their previous node. If the only node they could go to is their previous node, they won't move. |
repu |
Opposite of attra, the protester will try to go to their neighbor node with the smallest number of protesters on it. In case of egality, the protester will choose randomly between the nodes of same value. |
goku |
The protester will teleport to a random node of the graph |
phobic |
The protester will try to go to their neighbor node with the smallest, non zero number of protesters on it. In case of egality, the protester will choose randomly between the nodes of same value. If every node around the protester is empty , the protester will also move randomly |
Meta Rule | Description |
---|---|
mconst |
there is a constant propability (passed as argument) for a protester to not move. |
mcrowd |
if the sum of the number of protesters on the neighbor nodes of a protester's current node is higher than a threshold, it doesn't move. |
mprop |
every protester has a 1/n ,where n is the number of protesters on it's current node, chance to move. |
every rule (except for mprop) MUST be passed with a coefficient. The syntax to parse a
rule is : "rule:COEFF" where coeff is a floating point number.
The mcrowd, mconst and mprop rules are mutually exclusive and independant from the other rules.
If multiple "m[rules]" are passed only the last one will be used by the simulation.
When passing normal rules their coefficients will be normalised. This allows for more flexibility with arguments.
for example if you pass "rand:1 align:3" , "rand:0.1 align:0.3", "rand:100 align:300" you can expect the same results.
The "mrules" are NOT normalized. The mconst rule takes a floating point number between 0 and 1 with 0 meaning
protesters will always move and one meaning they'll never move. "mcrowd" takes an integer as argument.
Rule | Description |
---|---|
-h |
prints help and exits |
-d |
requires a positionnal arguments, will produce traces of the simulation. |
-w |
requires a positionnal arguments, instead of randomly initialising the position of the protesters on the graph, will load their position from a binary file. |
-l |
requires a positionnal arguments, allows to define the generation where the trace of the flux in the simulation will start to be produced. |
-s |
requires a positionnal argument, at the iteration passed as argument, will randomly shuffle the protesters instead of moving them accordign to the rules passed at the start of the program. |
-v |
requires a positionnal arguments, allows to modify the vision of the attvi and alivi rules. The argument of the option will be the vision (depth of the search) used durign the simulation for these rules. |
protesting on Graphs has a custom error system. When catching an error the program will print error traces allowing to understand which function calls caused the simulation to fail. These error are not the clearest but I think they're better than random segfaults (and they make the program easier to debug). The error flags are stored in an enum in the src/../common.h file. The function to print error is stored in src/misc.c , feel free to check them out if you want more info on errors.
This section will be used for various parts of the more "technical" side of the docu. You don't have to check it out to understand what protesting on Graphs is and how it's used. But feel free to give it a read if you're interrested in learning more about it :)
The CSV format used to write / read graphs is as follows :
the first line of the CSV contains the number of nodes / lines in the graph
the next lines represent the adjacency lists of the nodes
the fields are separated by a comma. Inside the line field,
the lines are separated by a colon.
node_index,number_of_lines,line1:flux1;line2:flux2...lineN:fluxN
this representation allows to store a graph with custom flux
(useful to
dump/load a trace from a certain generation)
The LineAray is the array storing relevant information of the lines of a graph.
the 'size'/'cur_in' fields keep track of the max number of elements / current number
of elements in the LineAray. They are stored on 32bits unsigned integers
the 'Line * array' field is the main array of the structure. It contains the
adjacency lists of every node in the graph stored as an array. This means that they are
stored at the same place in memory which gives the program better performance (less cache misses amongst
other things )
At the moment, a line only stores the index of the node it represents this information
is enough to explore the adjacency lists and know which nodes are connected.
The LineAray also stores two integer arrays corresponding to the flux at a certain line.
During the runtime, the lookup of the flux in the line l at the iteration i is retrieved from the
cur_flux array.
The flux that will be used at the iteration i+1 is calculated in the next_flux array.
I use these two arrays because they make it possible to move every protester at the "same time".
ie: every protester has access to the same informations when they move.
It also makes it possible
to "prepare the next iteration" in O(1) since I just need to swap the two arrays and flush the
new "next array".
The way the flux is stored is that the flux corresponding to the line represented at
the index l of the Line * array is stored at cur_flux[l]
the protesterCurNext is a simple wrapper around two arrays of 32bits unsigned integer of the
same size (with the same number of elements)
the 'size' field of the structure is the number of elements in the arrays.
the arrays store the number of protester in a node of the graph. To retrieve the number
of
to calculate the number of protesters in a node n at the iteration i+1 you increment
the next_num array.
Using two arrays allows to swap the two arrays like I do with the LineArray
to prepare an iteration in O(1).
a GraphTableEntry stores the information needed to store the adjacency list of a node in the graph. in order to do so , it stores the reference to the first neighbor of the node in the array of adjacency lists (cf LineAray documentation) as well as the number of neighbors of the node this makes it possible to know when the adjacency list of the entry stops.
the Tactics are stored in the s_tactics structure. "Tactics" is the name given to a
combination of Rules (for example "rand:1 align:3" is a tactic)
it is currently implemented as a dynamic array.
the size/capa fields stores it's current size and max size.
the Rule * rule_arr is the actual array containing the rules present in a tactic.
an entry in the array contains a coeff represented as an 16 bit integer that is calculated by
flooring the numbers given as argument to the program
and a function pointer to the rule associated
with this coeff. When calculating the movement of a protester , a random number is generated and
the rule corresponding to this number is used to determine the rule a protester will choose
which calculates the node a protester will go to.
When running fairly big simulations on fairly big graphs ,
producing a trace without it taking a while to be generated
can be an issue.
The solution used in the program to fix this issue is to heavily rely on dumping
the binary arrays of the graphs.
At the beginning and end of the simulation the CSV graph is dumped in a human readable CSV
format. They're stored on the [trace name]_hr and [trace name]_hrend files.
The other human readable file produced is [trace name]_lines containing two
integers on each line , corresponding to the lines (edges).
Every other files contain binary data. They can be loaded in python with the numpy fromfile function.
These files are :