User:Andrevan/Alternative to checkuser
This is a Wikipedia user page. This is not an encyclopedia article or the talk page for an encyclopedia article. If you find this page on any site other than Wikipedia, you are viewing a mirror site. Be aware that the page may be outdated and that the user in whose space this page is located may have no personal affiliation with any site other than Wikipedia. The original page is located at https://en.wikipedia.org/wiki/User:Andrevan/Alternative_to_checkuser.
This is an essay. It contains the advice or opinions of one or more Wikipedia contributors. This page is not an encyclopedia article, nor is it one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community. Some essays represent widespread norms; others only represent minority viewpoints.
Checkuser is an attempt to solve the problem of sockpuppetry. However, the problem persists: checkuser is limited, relies on unreliable IP address information, and can be cheated easily. IP addresses are a poor way to track people anyway and will only become less useful over time. An alternative approach has been behavioral or sentiment analysis, but a modicum of deception is enough to throw off such analysis, so a careful sockpuppeteer may be caught only in extraordinary circumstances.
Sockpuppetry is certainly an epidemic on the English Wikipedia, and I imagine it is common on many wikis. Wikimedia projects do not require confirmation of identity in most circumstances. Since wikis are a social platform for collaborative work and dispute resolution, the appearance of a mob of like-minded people throws off many social consensus-building processes. I propose a system of technical identity checking which, with the requisite social ceremonies, could deter sockpuppetry.
Consider a scheme in which a unique key is generated on a successful login and stored in the user's session cookie or local storage. Along with this key we store a dictionary of one-way salted hashes of all the relevant information about this user: browser user agent, screen resolution, IP address, and connection speed. We can also store a small model of the amount of time the user spends looking at different pages on the site. Together, these form an anonymized unique fingerprint of the user's behavioral profile on the website.
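The scheme above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of any existing MediaWiki feature: the names `SITE_SALT`, `new_session_key`, `hash_attribute`, and `build_fingerprint` are hypothetical, the attribute names are made up for the example, and HMAC-SHA256 with a single server-side salt is just one possible way to realize the "one-way salted hash". A shared salt is assumed so that the same attribute value hashes identically across accounts and can be compared later.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret salt (an assumption, not part of the essay).
# It is shared across users so that equal attribute values yield equal hashes.
SITE_SALT = secrets.token_bytes(32)


def new_session_key() -> str:
    """Generate the random key stored in the session cookie or local storage."""
    return secrets.token_hex(16)


def hash_attribute(name: str, value: str, salt: bytes = SITE_SALT) -> str:
    """Return a one-way keyed hash of one attribute; the raw value is not kept."""
    message = f"{name}={value}".encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()


def build_fingerprint(attributes: dict[str, str], salt: bytes = SITE_SALT) -> dict[str, str]:
    """Map each attribute name to the hash of its value."""
    return {name: hash_attribute(name, value, salt) for name, value in attributes.items()}


# Example attribute values, invented for illustration only.
session_key = new_session_key()
fingerprint = build_fingerprint({
    "user_agent": "ExampleBrowser/1.0",
    "resolution": "1920x1080",
    "ip_address": "192.0.2.1",
    "connection_speed": "fast",
})
```

Only `session_key` and the hashed `fingerprint` would be stored; the raw attribute values are discarded once hashed.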
Every time a user with the same key appears with a new fingerprint, an alert can be raised for an administrator to review. This could indicate a multi-user or role account. Similarly, if the same fingerprint appears under two different keys, this could indicate a sockpuppet. It sounds a little Orwellian, but if done properly it would involve no storage or knowledge of a user's actual personal information. If the fingerprint changes because the user switches browsers or devices, the system could store which fingerprints it has seen before for that key. A user with two or three different editing habits might not necessarily be a problem, because other variables in the fingerprint would not change. If everything changes at once in a weird way, or if two users have everything in common except for connection or IP address, that might be a problem.
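A minimal sketch of the two checks described above, assuming fingerprints are dictionaries of hashed values keyed by attribute name. The class `FingerprintMonitor`, the helper `differing_fields`, the alert tuples, and the choice of which attributes count as "connection/IP" are all hypothetical; a real system would need persistent storage and human review of every alert.

```python
from collections import defaultdict

Fingerprint = dict[str, str]  # attribute name -> hashed value


def differing_fields(a: Fingerprint, b: Fingerprint) -> set[str]:
    """Names of attributes whose hashed values differ between two fingerprints."""
    return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}


class FingerprintMonitor:
    """Hypothetical in-memory record of the fingerprints seen for each key."""

    def __init__(self, network_fields=frozenset({"ip_address", "connection_speed"})):
        self.seen: dict[str, list[Fingerprint]] = defaultdict(list)
        self.network_fields = network_fields

    def observe(self, key: str, fingerprint: Fingerprint) -> list[tuple]:
        """Record a fingerprint for a key and return alerts for an administrator."""
        alerts = []
        known = self.seen[key]
        is_new = fingerprint not in known
        # Same key, previously unseen fingerprint: possible multi-user or role account.
        if known and is_new:
            alerts.append(("new_fingerprint", key))
        # Different key, same fingerprint apart from connection/IP: possible sockpuppet.
        for other_key, prints in self.seen.items():
            if other_key == key:
                continue
            if any(differing_fields(fingerprint, p) <= self.network_fields for p in prints):
                alerts.append(("shared_fingerprint", key, other_key))
        # Remember the fingerprint so a known browser or device does not alert again.
        if is_new:
            known.append(fingerprint)
        return alerts
```

For example, observing the same fingerprint twice under one key raises nothing, while a second key whose fingerprint differs from the first only in `ip_address` produces a `shared_fingerprint` alert.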
Since it's theoretically possible that users could be indistinguishable until the algorithm is trained to focus on weighing relevant variables, this would probably produce a few false positives at first.
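One way to picture the weighting of variables is a weighted match score between two fingerprints. The sketch below is illustrative only: the function `similarity` and the weight values are invented for the example, and the essay does not specify how the weights would be trained.

```python
def similarity(a: dict[str, str], b: dict[str, str], weights: dict[str, float]) -> float:
    """Weighted fraction of attributes whose hashed values match in both fingerprints."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        weight
        for name, weight in weights.items()
        if name in a and a.get(name) == b.get(name)
    )
    return matched / total


# Hypothetical weights: the behavioral model counts more than easily shared settings.
weights = {"user_agent": 1.0, "resolution": 1.0, "reading_time_model": 2.0}
first = {"user_agent": "h1", "resolution": "h2", "reading_time_model": "h3"}
second = {"user_agent": "h1", "resolution": "zz", "reading_time_model": "h3"}
score = similarity(first, second, weights)
```

Under this sketch, false positives would be reduced by lowering the weights of attributes that many unrelated users share and by raising the score threshold at which an alert is sent for review.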