Checking Code Similarity with Stanford's Moss for Free

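As the title says: while working on a project recently, I was asked, for originality reasons, to check my code for similarity against GitHub projects I had uploaded before or borrowed from. When this requirement came up I was stumped, because I had no plagiarism-checking tool at hand.

I searched CSDN and GitHub but found nothing suitable (everything I found was a simple data-structure homework project that could not meet my needs). After some more digging, I found that Stanford runs a free plagiarism-detection server called Moss, which turned out to be very convenient.

Because the site is in English, I hit a few pitfalls before getting it to work. Below I share what I did, as a reference for anyone who wants to use Moss. Moss official site: http://theory.stanford.edu/~aiken/moss/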

1 Requirements
A Linux system
A Gmail account (@gmail)
Perl
On Windows, you can follow this video instead: https://www.youtube.com/watch?v=4fiF2YVpJ8A

Installing Perl on Ubuntu: https://blog.csdn.net/junweifan/article/details/7260401

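2 Step-by-step procedure

  1. First, go to the Moss official website.
  2. Open your Gmail and write an email to moss@moss.stanford.edu. The subject can be left empty.
  3. The body of the email is as follows (this is the registration message given on the official Moss page; replace username@domain with your own email address):
       registeruser
       mail username@domain
  4. After you send the email, you should get a reply within 1 to 2 minutes. (If not, check whether the previous step was done correctly.)

The reply contains the moss submission script, written in Perl. Copy everything from the start of the script through to the end, paste it into a file, and name the file moss (no file extension!).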

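A small pitfall here: if you paste the script into a .txt file on Windows, delete the extension, and then upload it to the Linux server with Xftp, the script will fail to run, because Windows text files use CRLF line endings while Linux expects LF.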

Run the following command in your Linux shell to convert the line endings; after that the script will run:

perl -p -i -e 's/\R/\n/g' moss

  5. Run the check
    Make sure the files you want to check are in the same folder as the moss file.
    Taking my own check as an example:

drwxr-xr-x 2 root root 4096 Jul 11 12:29 ./
drwxr-xr-x 6 root root 4096 Jul 10 16:11 ../
-rw-r--r-- 1 root root 8983 Jul 10 17:05 file1.py
-rw-r--r-- 1 root root 10081 Jul 10 16:11 file2.py
-rwxr-xr-- 1 root root 11097 Jul 10 16:19 moss*
Then run:

./moss -l python file1.py file2.py

The terminal will show:

Checking files . . .
OK
Uploading file1.py ...done.
Uploading file2.py ...done.
Query submitted. Waiting for the server's response.
http://moss.stanford.edu/results/XXXXXXXXXX
Open this URL in a browser to see the results.
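If you run several checks, it is convenient to keep just the report URL from each run. A minimal sketch, assuming the URL is printed on its own line starting with http://moss.stanford.edu/results/, as in the sample output; extract_moss_url is a hypothetical name, not part of Moss:

```shell
#!/bin/sh
# Sketch: pull the results URL out of the moss output read from stdin.
# Assumes the URL is printed on its own line, as in the sample output.
# extract_moss_url is a hypothetical helper name, not part of Moss.

extract_moss_url() {
  # Keep only results URLs; if there are several, print the last one
  grep -E '^https?://moss\.stanford\.edu/results/' | tail -n 1
}
```

Usage: ./moss -l python file1.py file2.py | tee moss.log | extract_moss_url >> moss_urls.txt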

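6 Q&A
Update, 2022.08.22:
Many readers have asked in the comments about the email step (not getting a reply). This may be because they did not use a Gmail address.
Moss official site: https://theory.stanford.edu/~aiken/moss/ If you run into problems, check the official site first.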

7. Official documentation (the usage notes at the top of the moss script):

#  moss [-l language] [-d] [-b basefile1] ... [-b basefilen] [-m #] [-c "string"] file1 file2 file3 ...
#
# The -l option specifies the source language of the tested programs.
# Moss supports many different languages; see the variable "languages" in the
# moss script for the full list.
#
# Example: Compare the lisp programs foo.lisp and bar.lisp:
#
# moss -l lisp foo.lisp bar.lisp
#
#
# The -d option specifies that submissions are by directory, not by file.
# That is, files in a directory are taken to be part of the same program,
# and reported matches are organized accordingly by directory.
#
# Example: Compare the programs foo and bar, which consist of .c and .h
# files in the directories foo and bar respectively.
#
# moss -d foo/*.c foo/*.h bar/*.c bar/*.h
#
# Example: Each program consists of the *.c and *.h files in a directory under
# the directory "assignment1."
#
# moss -d assignment1/*/*.h assignment1/*/*.c
#
#
# The -b option names a "base file". Moss normally reports all code
# that matches in pairs of files. When a base file is supplied,
# program code that also appears in the base file is not counted in matches.
# A typical base file will include, for example, the instructor-supplied
# code for an assignment. Multiple -b options are allowed. You should
# use a base file if it is convenient; base files improve results, but
# are not usually necessary for obtaining useful information.
#
# IMPORTANT: Unlike previous versions of moss, the -b option *always*
# takes a single filename, even if the -d option is also used.
#
# Examples:
#
# Submit all of the C++ files in the current directory, using skeleton.cc
# as the base file:
#
# moss -l cc -b skeleton.cc *.cc
#
# Submit all of the ML programs in directories asn1.96/* and asn1.97/*, where
# asn1.97/instructor/example.ml and asn1.96/instructor/example.ml contain the base files.
#
# moss -l ml -b asn1.97/instructor/example.ml -b asn1.96/instructor/example.ml -d asn1.97/*/*.ml asn1.96/*/*.ml
#
# The -m option sets the maximum number of times a given passage may appear
# before it is ignored. A passage of code that appears in many programs
# is probably legitimate sharing and not the result of plagiarism. With -m N,
# any passage appearing in more than N programs is treated as if it appeared in
# a base file (i.e., it is never reported). Option -m can be used to control
# moss' sensitivity. With -m 2, moss reports only passages that appear
# in exactly two programs. If one expects many very similar solutions
# (e.g., the short first assignments typical of introductory programming
# courses) then using -m 3 or -m 4 is a good way to eliminate all but
# truly unusual matches between programs while still being able to detect
# 3-way or 4-way plagiarism. With -m 1000000 (or any very
# large number), moss reports all matches, no matter how often they appear.
# The -m setting is most useful for large assignments where one also uses a base file
# expected to hold all legitimately shared code. The default for -m is 10.
#
# Examples:
#
# moss -l pascal -m 2 *.pascal
# moss -l cc -m 1000000 -b mycode.cc asn1/*.cc
#
#
# The -c option supplies a comment string that is attached to the generated
# report. This option facilitates matching queries submitted with replies
# received, especially when several queries are submitted at once.
#
# Example:
#
# moss -l scheme -c "Scheme programs" *.sch
#
# The -n option determines the number of matching files to show in the results.
# The default is 250.
#
# Example:
#
# moss -l java -n 200 *.java
#
# The -x option sends queries to the current experimental version of the server.
# The experimental server has the most recent Moss features and is also usually
# less stable (read: may have more bugs).
#
# Example:
#
# moss -x -l ml *.ml
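For the use case at the start of this post (checking a project against GitHub projects it borrowed from), the -d option lets each project directory count as one program. A minimal sketch, assuming Python sources; compare_dirs and the MOSS variable are hypothetical names, and setting MOSS=echo prints the command instead of submitting it:

```shell
#!/bin/sh
# Sketch: compare two project directories with moss in directory mode (-d).
# compare_dirs and MOSS are hypothetical names; set MOSS=echo for a dry run.
MOSS="${MOSS:-./moss}"

compare_dirs() {
  mine="$1"
  theirs="$2"
  # -l python: source language; -c: comment attached to the report
  # -d: files in the same directory are treated as one program
  # If a directory has no .py files, the shell passes the pattern through
  # unexpanded, so moss will not find that "file".
  "$MOSS" -l python -c "project vs github" -d "$mine"/*.py "$theirs"/*.py
}
```

Usage (placeholder directory names): compare_dirs my_project github_project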