Plagiarism at TechNet Wiki - First Empirical Data and Analysis


You may have noticed it: Plagiarism in TechNet Wiki is becoming a problem, and it is getting more attention now. This article presents a first empirical analysis, gives insights into some mechanisms behind plagiarism, explains the current rules, shows alternatives to plagiarism, and summarizes the critical points. Last but not least, it proposes first steps to fix the problem.

This article is quite long because the topic is difficult and serious, but I think it is worth reading.






First empirical data

Until now, no in-depth investigation of plagiarism in TechNet Wiki has been done, so some empirical data might be interesting. Even though this is not a long-term study, it gives useful insights and allows some rules of thumb to be deduced.

Between June 7th and July 5th 2013 I searched for plagiarisms in 7 “search sessions”. In this analysis I counted an article as a plagiarism only if an overwhelming percentage of its content was stolen. I used this approach to avoid discussions about borderline cases or minor mistakes by members who are not acquainted with citation standards.

I looked at (a) most new articles of the last month, (b) selected articles with just one revision that I found via Google, (c) some articles by members who appear in one of the Wiki leaderboards, and (d) some other articles by members who had already published a plagiarism. The analysis was restricted to articles written in English.
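
The criterion of an “overwhelming percentage of stolen content” can be made a bit more concrete with a small sketch. The following Python snippet is only an illustration and not the method used for this analysis (the checks described here were done by hand and with Google). The function names and the 5-word window are assumptions chosen for the example.

```python
import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams ("shingles") in a text."""
    words = re.findall(r"\w+", text.lower())
    # range() is empty when the text has fewer than n words
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(article: str, source: str, n: int = 5) -> float:
    """Fraction of the article's shingles that also occur in the source.

    Hypothetical helper: 1.0 means every n-word sequence of the article
    appears in the source, 0.0 means none does.
    """
    article_shingles = shingles(article, n)
    if not article_shingles:
        return 0.0
    shared = article_shingles & shingles(source, n)
    return len(shared) / len(article_shingles)
```

A value close to 1.0 would correspond to the near-complete copies counted in this article. Where to draw the line for borderline cases remains a judgment call, which is exactly why such cases were left out of the statistics.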

Let’s have a look at the observations and some first deductions (“rules of thumb”).

Frequency


Observations:
  • I found 66 plagiarisms written by 13 members.
  • 31 of these plagiarisms were written within the last month.
Rules of thumb:
  • The frequency of heavy plagiarism is similar to spam.
  • It is quite easy to detect plagiarism - if you are willing to do it.

Awareness of the community


Observation:
  • I noticed only 2 other people, Gokan Ozcifci and XAML guy, who detected plagiarisms (three in total). XAML guy recently mentioned one case in the top contributor blog post.
Rule of thumb:
  • The community is not aware of plagiarism. That’s why plagiarisms are rarely detected.

“Second offenders”


Observation: The table shows, for each anonymized plagiarist (NN 1 – NN 13), the number of plagiarized articles and some core profile data. I did not check all articles of every listed member, so for some members my plagiarism count is only a lower bound. These cases are indicated by the “≥” sign.

The figure shows for each anonymized plagiarist, the number of plagiarized articles and some core profile data.

  • 69% of the members with plagiarism created more than 1 plagiarism.
  • 53% of the members with plagiarism created more than 4 plagiarisms.
Rule of thumb:
  • A plagiarism is usually not an isolated mistake.

Plagiarized sources per article


Observation: The table shows how many sources are used in a plagiarized article:

The figure shows how many sources are used in a plagiarized article

  • The most popular pattern is the exact (or nearly exact) copy of one article. It is used in 52% of all plagiarisms.
Rule of thumb:
  • Most plagiarisms are straightforward: Plagiarize one source.

Plagiarized sources - Where does the plagiarized content come from?


Observation:

The figure shows where the plagiarized content comes from.

Explanation of the columns:

  • IBM: Official IBM documentation at *.ibm.com.
  • Books: Printed books with copyright.
  • Microsoft: MSDN Library, KB articles or any Microsoft site with content published by MS Corporation.
  • Other: All other internet resources – typically blog posts.
Rule of thumb:
  • Most plagiarized articles use blog posts.

Popular plagiarism techniques


Observation: A popular plagiarism algorithm goes like this:
  1. Google one or more good texts and copy them
  2. Optionally try to "legalize" or obfuscate the plagiarism:
    1. Add some quickly googled links
    2. Add simple figures – like a Windows properties window or a command line screenshot
    3. Post a link to the original source at the very end of article or as a comment.

During the last month I found some interesting samples and refinements:

  • Strategy Copy and forget
    Copy a complete article as it is and create exactly one revision.

    Example: The article “SharePoint 2013: What is SkyDrive Pro” has already been deleted from TechNet Wiki. However, it was a 1:1 copy, including formatting, of this smart blog post.

  • Strategy “Buzzword MSDN article” 
    Find a promising MS related buzzword like “Visual Studio 2012”, take the complete article content from a Microsoft source, and link the article prominently – for example on the Wiki: Development Portal.

    Example: Visual Studio 2012

  • Strategy Copy with footnote
    Copy a complete article and add “pinged back from <url>”, “Content taken from <url>” or “From: <url>” at the very end of the article or in a posted comment.

    Member NN 12 used this technique in at least 22 of 94 articles. (In this case I did only a quick check with Google. Because I didn’t check the overlap with plagiarized articles I had already detected, these cases were not added to my statistics.)

    Example: Migrate RADIUS config...

  • Strategy (Tiny) enhancement
    Copy a complete article and use algorithm steps 2.1 to 2.3.
    Do not mention the author, do not change a personal intro, make only slight text changes and avoid writing a new text paragraph.

    Example: How To View the MAC Address...

  • Strategy The big mashup
    Copy as many sources as you can find into your article.
    A more structured approach: Use a link list on a topic as a starting point and replace all links with subsets of the referenced articles.

    Example: Windows trust migration...
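
The “Copy with footnote” strategy leaves a recognizable trace at the end of an article. As a rough sketch (not a tool that was used for this analysis), a short Python check could look for the phrases quoted above in the last lines of a text. The pattern, the function name and the number of inspected lines are assumptions made for illustration only.

```python
import re

# The three phrases are quoted from the article; turning them into this
# regular expression is an illustrative assumption.
FOOTNOTE_PATTERN = re.compile(
    r"^\s*(pinged back from|content taken from|from:)\s*(https?://\S+)",
    re.IGNORECASE,
)


def trailing_attribution(article: str, tail_lines: int = 3):
    """Return the URL of a copy-style footnote near the end of the text.

    Only the last `tail_lines` non-empty lines are inspected. Returns None
    if none of them starts with one of the footnote phrases plus a URL.
    """
    lines = [line for line in article.splitlines() if line.strip()]
    for line in lines[-tail_lines:]:
        match = FOOTNOTE_PATTERN.match(line)
        if match:
            return match.group(2)
    return None
```

A hit is only a hint that an article deserves a closer look, since an author with permission may use similar wording. Footnotes posted as a comment instead of in the article body would not be found this way.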

My personal summary:

Plagiarism is a problem that occurs at a frequency similar to spam. All detected cases are serious, because the complete or nearly complete article is plagiarized. In most cases a plagiarism is not a single mistake. It is even possible to identify plagiarism strategies.




Analysis

Microsoft’s plagiarism rules

These documents handle plagiarism:

  • Wiki: Code of Conduct:
    “Respect content creators. Do not copy content from another author (such as a blog post) unless you have permission. If you do have permission, mention it at the top of the article and include a link to the original source and author.”
  • Wiki: How to Contribute Content to TechNet Wiki:

    (1)“Do not just copy/paste from TechNet/MSDN or other websites, or blogs, or other sources of material that you did not create. If you do this and then save this without any editing, rewriting and improvement by you, you are plagiarizing another's work. If you are working on material that started on a blog or another website, it is important to link to the original material at the top of the article.”

    (2) “Copy/pasting and then saving under your name is plagiarism. We frown on that here. Violations of copyright will be deleted.”

Terms of Use and Wiki: Code of Conduct exclude all cases of plagiarism – copying whole articles or parts of them. In other words: zero tolerance for plagiarism.

The article How to Contribute Content to TechNet Wiki is the backdoor for the current plagiarism problems: In contradiction to Terms of Use and Code of Conduct it allows copying/changing/enhancing of content without having any permission! This article breaks the rules and should be changed!

Even worse: This behavior is not compliant with the US Copyright Act: Anyone who reproduces copyrighted material can be prosecuted. Altering (“enhancing”) the content makes no difference as long as it remains substantially similar to the original.

You may argue that a plagiarist may add valuable enhancements to a plagiarized article. Please note that there are always (!) alternatives to plagiarism that are nicer, more trustworthy and more respectful towards the original author.




Alternatives to plagiarism

Let’s have a look at some real life scenarios:

You found a fantastic article and want to share it with the TechNet Wiki Community. Don’t copy the article. Instead add a link to an existing Wiki article. Or even better: Add a commented link to an existing Wiki article and explain why it is worth following this link.

You want to give an overview of a broad topic. Instead of creating a mashup article which plagiarizes (subsets of) a lot of articles you should instead create a landing page or an article with a commented link list.

You think a non-TechNet Wiki article (a blog post, an MSDN Library article, …) contains a mistake that should be corrected. In the case of a blog post you can post a comment. MSDN Library also supports comments and feedback. You can also contact the author.

You think you can enhance an article. Write your own article and link to the original you want to enhance instead of copying it. In some cases you may find that your enhancements do not justify another article. In this case you may decide to post a comment instead, or even to discard the idea of a new article. This reduces your own effort and that of your readers.

You write an article about a sophisticated topic and need an introduction which supplies the reader with the necessary background information to follow the rest of your article. Instead of copying original sources you should mention the prerequisites a reader of your article should know and link to background articles.

None of these techniques requires copying other articles. By the way: This may remind you of the DRY principle – don’t repeat yourself.

What about citation? Citation is OK, but keep in mind: Citation is like a spice – it is not the whole meal.




The problems with plagiarism

  • Legal problems
    Plagiarism and “enhanced” plagiarized articles can have legal consequences for the plagiarists and Microsoft:

    (1) According to the Copyright Act, anyone who reproduces copyrighted material can be prosecuted. Altering (“enhancing”) the content makes no difference as long as it remains substantially similar to the original.

    (2) Imagine we only add source references to plagiarisms: In the event of a legal claim it could be argued that Microsoft can easily detect those plagiarisms and has neglected its duties.

    Maybe it is a good idea to contact Microsoft’s legal department (if adding source references to plagiarisms is preferred instead of deletions).
    I don’t see an alternative to a zero tolerance policy for plagiarism.

  • Adding source references (instead of deletion) is the wrong remedy
    Let’s play an intellectual game: Imagine for a moment that TechNet Wiki contains no plagiarized content (e.g. because we have zero tolerance for plagiarism, community members check new articles, or members are aware of plagiarism).
  • Reputation gets lost
    A community, magazine or organization which accepts and tolerates plagiarism loses its reputation. This also damages the non-plagiarized top quality content.
  • Points and achievements lose their value
    An article like the sample article can be copied and published within minutes.
  • Plagiarism of MSDN content is stimulated
    Even in parts of the Community Council there is a slight tendency to think that copying/plagiarizing MSDN content is not too bad.
    If a plagiarist is smart, he will focus on MSDN content: The content is of high quality, and he can create lots of articles on the fly.
  • We lose our compass
    If the frequency of articles with source references increases, it establishes the wrong impressions: “Plagiarism is OK as long as no one complains.” and “Copying content is an accepted article creation technique”.

If we don't delete plagiarism and add source references instead, we silently accept plagiarism. But even the lax ”How to Contribute” article states: “Copy/pasting and then saving under your name is plagiarism. We frown on that here.”

Taken to extremes, this means “Anything goes” and “Plagiarized articles have a right of continuance”.

If we follow this track, we are lost: What’s right, what’s wrong? Where is the threshold for plagiarism? I can’t tell in such a setting – can you?

  • Plagiarism is done with intent
    No one writes a plagiarism without knowing it.
  • We ignore our roots
    TechNet Wiki was inspired by Wikipedia. Can you imagine that plagiarism is accepted at Wikipedia? I can’t. Wikipedia goes even further: Even summarized information should have a proof and a reference.
  • We ignore common standards
    Maybe not everyone is acquainted with (scientific) citation rules. However, plagiarism has never been an accepted behavior.
  • Top Contributor Award may lose its reputation
    Plagiarism starts affecting the Top Contributor Awards. Examples:
    • June 23rd 2013: Longest Article reward:
      “This week's largest document …looks like it's been copied directly from another source …”
    • June 6th 2013: Most Active Contributors & New Articles created
      “sayedissahassan in second place, and leading most new articles for the last month, but most seem just copied in, and may have to be removed :/”
      By the way: I checked 26 of his 41 (now 53) articles, and they were all plagiarisms.



What to do next?

To make a long story short:

  • Plagiarism in TechNet Wiki is a real problem.
  • Simply adding a source reference means accepting plagiarism and can lead to severe legal problems.
  • Plagiarism compromises the reputation of TechNet Wiki.
  • Points and achievements lose their value.
  • TechNet Wiki cannot focus on quantity and growth alone. Quality and quality assurance have been neglected.

My suggestions:

  • The Community Council and the community should discuss this topic.
  • If we take the US Copyright Act and common sense as a guideline, there is no alternative to a zero tolerance policy for plagiarism.
  • The Community Council should add a new focus area: either “Plagiarism” or “Quality Assurance”.
  • The community should be aware of plagiarism.

I think it is time to act. It is up to you and the Community Council. Accepting plagiarism is not a long-term option.




See Also


Community rules concerning plagiarism


Search for already detected plagiarisms by tag


WikiNinja Blog Posts concerning plagiarism

Wikis - Comment List
Comments
  • Naomi  N edited Original. Comment: Minor grammar corrections

  • Naomi  N edited Revision 1. Comment: Minor edit

  • Carsten Siemens edited Revision 2. Comment: Layout fixes

  • Same author posted several articles in Sweden. I am wondering if they are original

  • Carsten Siemens edited Revision 3. Comment: Fixed misspellings

  • Naomi  N edited Revision 4. Comment: Typo fix

  • MSD library or MSDN library?

  • Hello Naomi,

    at least one of the Swedish articles you mention was a plagiarism (like 44 other English articles he published). In this case an English article was translated with Google.

  • Carsten Siemens edited Revision 5. Comment: Fixed misspelling

  • Hello Naomi, "MSDN library" is correct. I fixed it.

  • yes, I fixed it myself in one revision, but you were changing the article at the same time, so I didn't save my revision.

    I found several articles in Swedish, for now I just added language tag

  • Great article, BTW, and I think it needs to be featured on the main WiKi page

  • Apparently that user continues to post plagiarized content. I reported him twice today to fissues. It needs to stop, we can not keep up with him

  • Great article! I agree that "Plagiarism is similar to spam".

    And, about a report of the plagiarism, there is a useful article here : social.technet.microsoft.com/.../13529.how-to-report-a-technet-wiki-page.aspx

  • I've reported some plagiarized articles before, but I still can't handle such articles well, because it's very difficult to check whether a contributor has permission or not. Ideally the person whose content was stolen would report it, but that's difficult, too.
