Wiki report/development

From Freephile Wiki


Find out "What's that Wiki Running?" at https://freephile.org/wikireport

Intro

Using the MediaWiki API, we want to query a wiki installation for the site metadata that tells us its version and the extensions running there.

We want to be able to retrieve this data and then import it into our CiviCRM database. We also want to be able to create a nice public-facing reporting tool that we can use to do one-off reports, or to show to site owners.

So we'll develop it in 3 phases:

  1. The public-facing reporting tool will be developed first
  2. The conduit to read and write data to the CiviCRM system
  3. Publicize the reporting tool
    1. through campaigns to the people we have in the database
    2. through social and other networks
    3. possibly as a case study or info example on how to make APIs work for you and talk to each other.

The goals of publicizing the reporting tool

  1. to make it better
  2. showcase eQuality Technology capabilities and the capabilities of these systems
  3. develop opportunities to do similar work
  4. contribute to upstream projects (like the example on PHPMailer with GMail)


To create the UI, we'll use Bootstrap

For every wiki, we need to determine the API endpoint, but we normally start with only the "wiki" URL. Some discovery methods are more precise than others. More recent MediaWiki versions offer Really Simple Discovery (RSD) through a link element in the page head: <link rel="EditURI" type="application/rsd+xml" href="https://freephile.org/w/api.php?action=rsd" />
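As a minimal sketch of RSD-based discovery, the following Python reads a page's HTML, finds the EditURI link, and strips the query string to get the bare api.php URL. The class and function names are hypothetical, and fetching the HTML is left out so the example stays self-contained.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class EditUriFinder(HTMLParser):
    """Collect the href of the first <link rel="EditURI"> tag."""

    def __init__(self):
        super().__init__()
        self.edit_uri = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.edit_uri is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("rel") or "").lower() == "edituri":
            self.edit_uri = attributes.get("href")


def api_endpoint_from_html(html):
    """Return the api.php URL advertised via RSD, or None if absent."""
    finder = EditUriFinder()
    finder.feed(html)
    if not finder.edit_uri:
        return None
    # Drop the "?action=rsd" query string to get the bare endpoint.
    parts = urlsplit(finder.edit_uri)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


page = (
    '<html><head><link rel="EditURI" type="application/rsd+xml" '
    'href="https://freephile.org/w/api.php?action=rsd" /></head></html>'
)
print(api_endpoint_from_html(page))
```

If the link is missing (older wikis, or pages that are not MediaWiki at all), the function returns None and a less precise method has to take over.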

Note on formats: The MediaWiki API supports many formats, with JSON and jsonfm the most useful. json is what you think it is, and jsonfm is formatted for viewing in the browser (good for development only).

The siteinfo query accepts many possible siprop values; see https://www.mediawiki.org/wiki/API:Siteinfo for the list. The default is 'general', and you can combine as many as you want by separating them with a pipe character. So, to get all the info that we're interested in, we would compose a query like so: https://freephile.org/w/api.php?action=query&meta=siteinfo&format=jsonfm&siprop=general|extensions|statistics
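Composing that query in code could look like the sketch below. The function name and its defaults are assumptions made for illustration; only the parameter names (action, meta, format, siprop) come from the query above.

```python
from urllib.parse import urlencode


def siteinfo_url(api_endpoint, props=("general", "extensions", "statistics"),
                 fmt="json"):
    """Build a siteinfo query URL; multiple siprop values are pipe-separated."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "format": fmt,
        "siprop": "|".join(props),
    }
    # safe="|" keeps the pipe readable instead of encoding it as %7C.
    return api_endpoint + "?" + urlencode(params, safe="|")


print(siteinfo_url("https://freephile.org/w/api.php", fmt="jsonfm"))
```

Using fmt="json" gives the machine-readable response for the report itself, while "jsonfm" is only for looking at the result in a browser during development.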

General Info

From 'General' we're interested in

  • "base" (which we already have? chicken/egg)
  • "sitename"
  • "logo"
  • "generator"
  • "phpversion"
  • "phpsapi"
  • "dbtype"
  • "dbversion"
  • "lang" --just in case it's not 'en'
  • "timezone"
  • "time"
  • "favicon"

I've found that there can be empty values, and which ones are empty depends on the instance, so we won't hard-code what is in the report; we'll just report on what we find. Likewise, we'll endeavor to collect all the info that is useful, but some information will be unavailable in some cases.
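One way to "report on what we find" is to filter the 'general' block down to the keys of interest that actually carry a value. This is only a sketch: the function name is hypothetical and the sample values are made up for illustration.

```python
# Keys of interest from the 'general' siprop, as listed above.
WANTED = ["base", "sitename", "logo", "generator", "phpversion", "phpsapi",
          "dbtype", "dbversion", "lang", "timezone", "time", "favicon"]


def present_fields(general, wanted=WANTED):
    """Keep only the wanted keys that exist and hold a non-empty value."""
    return {key: general[key] for key in wanted
            if general.get(key) not in (None, "", [], {})}


# Illustrative data only: "logo" is empty and most keys are missing.
sample = {"sitename": "Example Wiki", "logo": "", "lang": "en", "other": "x"}
print(present_fields(sample))
```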

Extensions

After the general info, we are especially interested in extensions. Again, the info available for any given extension is going to vary, so we'll report on what's available, and likewise we will record what we can find.

Extensions is an array of items. An extension item looks similar to this (the data fields for each extension differ according to the author):

{
    "query": {
        "extensions": [

            {
                "type": "other",
                "name": "Html2Wiki",
                "descriptionmsg": "html2wiki-desc",
                "author": "Greg Rundlett",
                "url": "https://www.mediawiki.org/wiki/Extension:Html2Wiki",
                "version": "2015.02",
                "vcs-system": "git",
                "vcs-version": "c24896064a6a604f71f7e3253373a59d04fe19bc",
                "vcs-url": false,
                "vcs-date": "2015-04-28T17:41:26Z",
                "license-name": "GPL-2.0+",
                "license": "/wiki/Special:Version/License/Html2Wiki"
            }


        ]
    }
}
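Because the fields vary from one extension to the next, any code that reads an item has to tolerate missing keys. The sketch below summarizes an item using whichever fields are present; the function name and the fallback order (declared version first, then a shortened vcs-version) are assumptions, not something the API prescribes.

```python
def summarize_extension(ext):
    """Return a one-line summary using whichever fields the extension supplies."""
    name = ext.get("name", "(unnamed)")
    # Prefer the declared version; fall back to a shortened VCS hash.
    version = ext.get("version") or (ext.get("vcs-version") or "")[:7] or "unknown"
    parts = [name, version]
    if ext.get("license-name"):
        parts.append(ext["license-name"])
    return " | ".join(parts)


html2wiki = {
    "type": "other",
    "name": "Html2Wiki",
    "version": "2015.02",
    "vcs-version": "c24896064a6a604f71f7e3253373a59d04fe19bc",
    "vcs-url": False,
    "license-name": "GPL-2.0+",
}
print(summarize_extension(html2wiki))
```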


Do we timestamp in the CiviCRM database? Yes. The "profile" of a wiki will change over time, so do we care about what it used to be?
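If the history does matter, one possible approach is to store each collection as a timestamped snapshot and compare snapshots later. The sketch below is independent of CiviCRM; the record layout and function names are assumptions for illustration only.

```python
from datetime import datetime, timezone


def make_snapshot(wiki_url, general, collected_at=None):
    """Wrap siteinfo data with the UTC time at which it was collected."""
    when = collected_at or datetime.now(timezone.utc)
    return {
        "wiki": wiki_url,
        "collected_at": when.isoformat(timespec="seconds"),
        "general": dict(general),
    }


def changed_fields(older, newer):
    """List the 'general' keys whose values differ between two snapshots."""
    keys = set(older["general"]) | set(newer["general"])
    return sorted(k for k in keys
                  if older["general"].get(k) != newer["general"].get(k))
```

Comparing the newest snapshot with the previous one would show, for example, that a site has upgraded its MediaWiki version since the last report.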

Other Siprops

siprop=statistics is included in the report

{
    "query": {
        "statistics": {
            "pages": 1363,
            "articles": 198,
            "edits": 5459,
            "images": 963,
            "users": 6,
            "activeusers": 1,
            "admins": 2,
            "jobs": 3660
        }
    }
}

siprop=usergroups is interesting if a wiki is doing anything with groups


api.php?action=query&meta=siteinfo&format=jsonfm&siprop=rightsinfo gives the copyright url and text

api.php?action=query&meta=siteinfo&format=jsonfm&siprop=namespaces|namespacealiases gives the namespaces and aliases, which can reveal 'private' namespaces

api.php?action=query&meta=siteinfo&format=jsonfm&siprop=fileextensions shows you the file extensions allowed for file upload.

siprop=libraries is kind of interesting because it lists libraries that you probably didn't know you were running in your wiki

api.php?action=query&meta=siteinfo&format=jsonfm&siprop=showhooks is really interesting because it shows you what code is listening to what hooks.

api.php?action=query&meta=siteinfo&format=jsonfm&siprop=extensiontags is useful as a documentation page that shows authors the additional tags usable on this wiki

Background

The MediaWiki API offers great detail into what a MediaWiki wiki is running.

The ApiSandbox extension offers a GUI way to explore and even operate on the API [1]

The most popular data format for API communication is JSON.

PHP handles JSON consistent with the expanded definition of "JSON text" in the newer RFC 7159.

What I need to do is collect data, based on some data that I have in a CiviCRM comment (and the original csv file that was imported to CiviCRM), and re-import or add the new data to CiviCRM for market segmentation and intelligence.

I want to report on that data internally, and CiviCRM is pretty good at that. However, I've found that using the API and slicing and dicing your data with the help of DataTables can be more powerful.

I want to include that data in marketing to show potential clients that they need to upgrade, or that they have a number of extensions. Including large amounts of data in a marketing message would be best in the form of a "free report". The reporting interface can also be used as an ad-hoc intelligence reporter and collector.

Is there an API for inserting data into CiviCRM? (Yes, CiviCRM has a full API.) Or do I need to rely on the import tools and format my CSV with the correct external ID so as to avoid duplicates?

The basic UI for What's that Wiki Running? could be handled by jQuery, but with Bootstrap added in, we get a bit more stylesheet. Still, do we need that? jQuery can handle the AJAX. Simple JavaScript could handle the data presentation (and jQuery probably has some useful methods), and we can add jQuery UI for advanced interactions or widgets.

To do cross-domain JavaScript requests, we would need JSONP, but it turns out that you just can't make a secure AJAX UI to insecure web content (Duh!): a page served over HTTPS can't load data from plain HTTP wikis. Freephile.org is a secure domain, so we won't be using JavaScript for anything but convenience and UI.

In the jQuery .ajax method, there are several settings. The 'error' setting is a callback, and we could write a function there to try other variations on the domain to find the API endpoint.
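The same fallback idea can be sketched outside of jQuery. The Python below generates candidate endpoints from a wiki URL and returns the first one that a caller-supplied probe accepts. The list of script paths and the www/non-www variation are assumptions about common installations, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Assumed common locations of api.php; real installations may use others.
COMMON_PATHS = ("/w/api.php", "/api.php", "/wiki/api.php", "/mediawiki/api.php")


def candidate_endpoints(wiki_url, paths=COMMON_PATHS):
    """Return possible api.php URLs for the host named in wiki_url."""
    if "://" not in wiki_url:
        wiki_url = "http://" + wiki_url
    parts = urlsplit(wiki_url)
    host = parts.netloc
    # Try the host as given, then its www / non-www variation.
    other = host[4:] if host.startswith("www.") else "www." + host
    return [f"{parts.scheme}://{h}{p}" for h in (host, other) for p in paths]


def first_working(candidates, probe):
    """Return the first candidate for which probe(url) is truthy, else None."""
    for url in candidates:
        if probe(url):
            return url
    return None
```

Here probe stands in for whatever check is used in practice, such as requesting the URL with action=query&meta=siteinfo and confirming that the response parses as JSON.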

Interesting example: http://en.banglapedia.org/index.php?title=Special:Version which uses the "MediaWiki Bootstrap" skin. See http://www.mediawikibootstrapskin.co.uk/index.php?title=Main_Page

I decided to use Bootstrap for this project. At first, I was tempted to build a fully client-side framework and JavaScript solution. Since CiviCRM has an API, I could potentially even use JavaScript for pushing data into the backend (authenticated with a secret). Ultimately, Bootstrap was used on the client-side, but PHP was used to create the backend.

Data Table systems in the UI

DataTables can integrate seamlessly with Bootstrap, but it's unlikely that I will need the robustness of DataTables for this effort. If I wanted to show many records, then yes. But for now, I want to focus on showing the particular details of a specific wiki site.

DynaTables is a jQuery plugin that I began prototyping with, but again, it's not something that I will need since I don't have large datasets to manipulate. I just want to focus on a single wiki at a time. Even with existing JSON, you still have to know the 'layout' of your data and set up a blank table with the correct structure. I'm more interested in a generic function that you could throw any JSON at, and it would spit out a table.
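A generic "any JSON in, table out" function might look like the following sketch. It is written in Python rather than as a jQuery plugin, the function name is hypothetical, and it assumes the input is a JSON object or a list of flat JSON objects; nested values are simply rendered as text.

```python
import json
from html import escape


def json_to_table(text):
    """Render a JSON object, or a list of objects, as an HTML table."""
    data = json.loads(text)
    rows = data if isinstance(data, list) else [data]
    # Column order follows first appearance across all rows.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    head = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = ""
    for row in rows:
        cells = "".join(f"<td>{escape(str(row.get(c, '')))}</td>" for c in columns)
        body += f"<tr>{cells}</tr>"
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


print(json_to_table('[{"name": "Html2Wiki", "version": "2015.02"}]'))
```

No blank table has to be prepared in advance: the columns are discovered from the data, and rows that lack a field get an empty cell.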

Tidy Table is a jQuery plugin that creates HTML tables from the data that you feed into the client as JSON. The only interesting thing here is that I might want something akin to their demo for the initial form state: allow the user to quickly and easily select which parts of the MW API they want to report on.

Code

This project is licensed under the AGPL. You can clone, fork, and send pull requests at GitHub (https://github.com/freephile/wikireport). We built a lot of custom fields in CiviCRM to store the data, and wrote a Drupal module to make that data available within CiviMail.

References