Martin Potthast, Tim Gollub, Matthias Hagen, and Benno Stein
teaser image and the target description. As representations for the
teaser text and the target description, doc2vec embeddings of the
respective fields are used. To represent the teaser image, a pre-trained
object detection network is applied to the image, and the activations
of a convolutional layer are taken as the image representation.
Torpedo by Indurthi and Oota [2017] uses pre-trained GloVe
word embeddings (trained on Wikipedia and a further dataset) to
represent the teaser message of a tweet: the word embeddings of the
words in the teaser text are averaged. In addition, seven handcrafted
linguistic features are added to the representation. With this feature
set, a linear regression model is trained to predict the clickbait
strength of tweets.
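To make the Torpedo pipeline concrete, here is a minimal sketch, assuming a toy three-dimensional embedding table in place of pre-trained GloVe vectors and two hypothetical handcrafted features (a scaled token count and a trailing exclamation mark) in place of the seven used by Indurthi and Oota. The names `EMBEDDINGS`, `represent`, and `fit_linear_regression` and the training examples are illustrative only; the regression is fitted by plain gradient descent rather than any particular library's solver, and this is not the authors' code.

```python
# Sketch of a Torpedo-style regressor: averaged word embeddings plus
# handcrafted features, fed to a linear regression model.

# HYPOTHETICAL toy table standing in for pre-trained GloVe vectors
# (real GloVe vectors have 50 to 300 dimensions and a large vocabulary).
EMBEDDINGS = {
    "you": [0.9, 0.1, 0.0],
    "won't": [0.8, 0.2, 0.1],
    "believe": [0.7, 0.3, 0.2],
    "this": [0.6, 0.1, 0.3],
    "election": [0.1, 0.9, 0.4],
    "results": [0.0, 0.8, 0.5],
    "announced": [0.1, 0.7, 0.6],
}
DIM = 3


def tokenize(text):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    return [t.strip("!?.,:;") for t in text.lower().split()]


def average_embedding(text):
    """Mean of the vectors of all in-vocabulary tokens (zero vector if none)."""
    vectors = [EMBEDDINGS[t] for t in tokenize(text) if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def handcrafted_features(text):
    # HYPOTHETICAL stand-ins for Torpedo's seven linguistic features:
    # scaled token count and a trailing-exclamation-mark flag.
    return [len(text.split()) / 10.0, 1.0 if text.rstrip().endswith("!") else 0.0]


def represent(text):
    # Bias term, then averaged embedding, then handcrafted features.
    return [1.0] + average_embedding(text) + handcrafted_features(text)


def fit_linear_regression(X, y, lr=0.05, epochs=5000):
    """Least-squares linear regression fitted by batch gradient descent on the MSE."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            error = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2.0 * error * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, text):
    return sum(wj * xj for wj, xj in zip(w, represent(text)))


# HYPOTHETICAL training data: (teaser text, clickbait strength in [0, 1]).
TRAIN = [
    ("You won't believe this!", 1.0),
    ("You won't believe this", 0.8),
    ("Election results announced", 0.0),
    ("Results announced", 0.1),
]
weights = fit_linear_regression([represent(t) for t, _ in TRAIN], [s for _, s in TRAIN])
```

Averaging yields a fixed-length vector regardless of teaser length, at the cost of discarding word order; the handcrafted features are appended so that the linear model can weigh both kinds of evidence.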
Salmon by Elyashar et al. [2017] applies gradient boosting (XGBoost)
to a tweet representation that consists of three feature types:
(1) teaser image features encoding whether a teaser image is present
and, using OCR on the image, whether the image contains text,
(2) linguistic features extracted from the teaser text and the
linked article fields, and (3) features dedicated to detecting so-called
abusers, which are supposed to capture user behavior patterns.
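The Salmon setup can be sketched as follows, with loud simplifications. Instead of XGBoost (which adds second-order gradient information, regularization, and deeper trees), the sketch uses plain gradient boosting with one-split decision stumps on squared loss. The input fields `has_image`, `image_has_text` (assumed to be precomputed by some OCR step), and `user_clickbait_rate` (a hypothetical proxy for the abuser features), as well as the toy posts, are illustrative assumptions and not the features of Elyashar et al.

```python
# Plain gradient boosting with decision stumps (a simplified stand-in for
# XGBoost) over three HYPOTHETICAL feature groups mirroring Salmon's types.


def extract_features(post):
    # (1) Image-related: image present; OCR found text (assumed precomputed).
    image = [1.0 if post["has_image"] else 0.0,
             1.0 if post["image_has_text"] else 0.0]
    # (2) Linguistic: token count and presence of "?" or "!" in the teaser.
    text = post["teaser"]
    linguistic = [float(len(text.split())), 1.0 if "?" in text or "!" in text else 0.0]
    # (3) User behavior: HYPOTHETICAL share of the account's past clickbait posts.
    user = [post["user_clickbait_rate"]]
    return image + linguistic + user


def fit_stump(X, residuals):
    """Single-feature threshold split minimizing the squared error of residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lmean, rmean)
    if best is None:  # All rows identical: fall back to a constant stump.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.3):
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the half squared loss is the residual y - p.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# HYPOTHETICAL toy posts; label 1 = clickbait, 0 = no clickbait.
POSTS = [
    {"teaser": "You won't believe what happened next!", "has_image": True,
     "image_has_text": True, "user_clickbait_rate": 0.8},
    {"teaser": "This one trick will change your life?", "has_image": True,
     "image_has_text": False, "user_clickbait_rate": 0.6},
    {"teaser": "Parliament passes the annual budget", "has_image": False,
     "image_has_text": False, "user_clickbait_rate": 0.1},
    {"teaser": "Central bank keeps interest rates unchanged", "has_image": True,
     "image_has_text": False, "user_clickbait_rate": 0.05},
]
LABELS = [1, 1, 0, 0]
MODEL = fit_gradient_boosting([extract_features(p) for p in POSTS], LABELS)
```

Each round fits a stump to the current residuals and adds a damped copy of it to the ensemble, which is the core idea that XGBoost refines with regularized, second-order tree fitting.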
Snapper by Papadopoulou et al. [2017] trains separate logistic
regression classifiers on different feature sets extracted from the
teaser text, the linked article title, and the teaser images (image
features are first extracted using the Caffe library).⁵ In a second
step, the predictions of the individual classifiers are taken as input
to train a final logistic regression classifier.
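The two-level scheme of Snapper is a form of stacking. The following sketch assumes precomputed numeric vectors for the teaser text, the article title, and the image (toy values standing in for, e.g., Caffe-extracted image features) and implements logistic regression from scratch. Training the second level on out-of-fold first-level predictions is a common stacking practice assumed here, not a detail confirmed by the source; all names and data are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # The last entry of w is the bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Binary logistic regression trained by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            error = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += error * xj / n
            grad[-1] += error / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def fit_two_level(feature_sets, y, n_folds=2):
    """Level 1: one classifier per feature set; level 2: a classifier on their outputs.

    feature_sets is a list of design matrices (e.g., teaser text, article
    title, image), all with the same row order as the labels y.
    """
    n = len(y)
    # Out-of-fold level-1 scores: each base classifier scores rows it did not see.
    meta_X = [[0.0] * len(feature_sets) for _ in range(n)]
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        test_idx = [i for i in range(n) if i % n_folds == fold]
        for k, X in enumerate(feature_sets):
            w = fit_logistic_regression([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                meta_X[i][k] = predict_proba(w, X[i])
    # Base classifiers refitted on all rows are used at prediction time.
    base_models = [fit_logistic_regression(X, y) for X in feature_sets]
    meta_model = fit_logistic_regression(meta_X, y)
    return base_models, meta_model


def predict_two_level(model, rows):
    """rows holds one feature vector per feature set for a single post."""
    base_models, meta_model = model
    meta_x = [predict_proba(w, x) for w, x in zip(base_models, rows)]
    return predict_proba(meta_model, meta_x)


# HYPOTHETICAL toy data; label 1 = clickbait. The image features are
# deliberately uninformative to show the final classifier down-weighting them.
LABELS = [1, 1, 0, 0, 1, 1, 0, 0]
TEXT = [[1.0, 0.9], [0.9, 0.8], [0.1, 0.2], [0.0, 0.1],
        [0.8, 1.0], [1.0, 0.7], [0.2, 0.0], [0.1, 0.3]]
TITLE = [[0.8], [0.7], [0.3], [0.2], [0.9], [0.6], [0.1], [0.3]]
IMAGE = [[0.5], [0.4], [0.6], [0.5], [0.3], [0.6], [0.4], [0.5]]
MODEL = fit_two_level([TEXT, TITLE, IMAGE], LABELS)
```

Without the out-of-fold step, the final classifier would be trained on overconfident in-sample scores from the first level and could overweight them.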
7 CONCLUSION
The Clickbait Challenge 2017 stimulated research and development
towards clickbait detection: 13 approaches were submitted to
the challenge. Many of these approaches have been released as open
source by their authors.⁶ Together with the working prototypes
deployed within virtual machines at TIRA, this renders the
proceedings of the Clickbait Challenge reproducible and makes it
easier for newcomers to follow up on previous work.
Several more approaches have been proposed and submitted to
TIRA after the challenge had ended. Four of them, together with
zingel, are the five best-performing clickbait detectors on the
leaderboard at the time of publishing this challenge overview.⁷
The leading approach, albacore by Omidvar et al. [2018], employs,
like zingel, a bidirectional GRU (biGRU) network initialized with
GloVe word embeddings. The runner-up, anchovy, is also an adaptation
of zingel, whereas icarsh by Wiegmann et al. [2018] demonstrates
that our baseline [Potthast et al. 2016] is still competitive: when
its features are selected with a newly proposed feature selection
approach, the baseline improves substantially. At the time of
writing, no written reports have surfaced for anchovy and ray. More
teams have registered since the first challenge ended and are now
working on new approaches to the task. We will keep the evaluation
system running for as long as possible to allow for a continued and
fair evaluation of these new approaches.
⁵ http://caffe.berkeleyvision.org
⁶ We collected them at https://github.com/clickbait-challenge
⁷ https://www.tira.io/task/clickbait-detection/
REFERENCES
A. Agrawal. 2016. Clickbait detection using deep learning. In 2016 2nd International
Conference on Next Generation Computing Technologies (NGCT). 268–272.
https://doi.org/10.1109/NGCT.2016.7877426
Ankesh Anand, Tanmoy Chakraborty, and Noseong Park. 2017. We Used Neural
Networks to Detect Clickbaits: You Won’t Believe What Happened Next!. In
Advances in Information Retrieval - 39th European Conference on IR Research, ECIR
2017, Aberdeen, UK, April 8-13, 2017, Proc. 541–547.
https://doi.org/10.1007/978-3-319-56608-5_46
Prakhar Biyani, Kostas Tsioutsiouliklis, and John Blackmer. 2016. "8 Amazing Secrets
for Getting More Clicks": Detecting Clickbaits in News Streams Using Article
Informality. In Proc. of the Thirtieth AAAI Conference on Artificial Intelligence,
February 12-17, 2016, Phoenix, Arizona, USA. 94–100.
http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11807
Xinyue Cao, Thai Le, and Jason Zhang. 2017. Machine Learning Based Detection of
Clickbait Posts in Social Media. CoRR abs/1710.01977 (2017).
http://arxiv.org/abs/1710.01977
Abhijnan Chakraborty, Bhargavi Paranjape, Sourya Kakarla, and Niloy Ganguly. 2016.
Stop Clickbait: Detecting and preventing clickbaits in online news media. In 2016
IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining, ASONAM 2016, San Francisco, CA, USA, August 18-21, 2016. 9–16.
https://doi.org/10.1109/ASONAM.2016.7752207
Aviad Elyashar, Jorge Bendahan, and Rami Puzis. 2017. Detecting Clickbait in Online
Social Media: You Won’t Believe How We Did It. CoRR abs/1710.06699 (2017).
http://arxiv.org/abs/1710.06699
Siddhartha Gairola, Yash Kumar Lal, Vaibhav Kumar, and Dhruv Khattar. 2017. A
Neural Clickbait Detection Engine. CoRR abs/1710.01507 (2017).
http://arxiv.org/abs/1710.01507
Maria Glenski, Ellyn Ayton, Dustin Arendt, and Svitlana Volkova. 2017. Fishing for
Clickbaits in Social Images and Texts with Linguistically-Infused Neural Network
Models. CoRR abs/1710.06390 (2017). http://arxiv.org/abs/1710.06390
Tim Gollub, Benno Stein, and Steven Burrows. 2012. Ousting Ivory Tower Research:
Towards a Web Framework for Providing Experiments as a Service. In 35th
International ACM Conference on Research and Development in Information
Retrieval (SIGIR 2012). ACM, 1125–1126. https://doi.org/10.1145/2348283.2348501
Alexey Grigorev. 2017. Identifying Clickbait Posts on Social Media with an Ensemble
of Linear Models. CoRR abs/1710.00399 (2017). http://arxiv.org/abs/1710.00399
Vijayasaradhi Indurthi and Subba Reddy Oota. 2017. Clickbait detection using word
embeddings. CoRR abs/1710.02861 (2017). http://arxiv.org/abs/1710.02861
Johannes Kiesel, Florian Kneist, Milad Alshomary, Benno Stein, Matthias Hagen, and
Martin Potthast. 2018. Reproducible Web Corpora: Interactive Archiving with
Automatic Quality Assessment. Journal of Data and Information Quality (JDIQ) 10,
4 (Oct. 2018), 17:1–17:25. https://doi.org/10.1145/3239574
Amin Omidvar, Hui Jiang, and Aijun An. 2018. Using Neural Network for Identifying
Clickbaits in Online News Media. CoRR abs/1806.07713 (2018).
http://arxiv.org/abs/1806.07713
Olga Papadopoulou, Markos Zampoglou, Symeon Papadopoulos, and Ioannis
Kompatsiaris. 2017. A Two-Level Classification Approach for Detecting Clickbait
Posts using Text-Based Features. CoRR abs/1710.08528 (2017).
http://arxiv.org/abs/1710.08528
Martin Potthast, Tim Gollub, Kristof Komlossy, Sebastian Schuster, Matti Wiegmann,
Erika Patricia Garces Fernandez, Matthias Hagen, and Benno Stein. 2018.
Crowdsourcing a Large Corpus of Clickbait on Twitter. In Proc. of the 27th
International Conference on Computational Linguistics (COLING 2018), 1498–1507.
https://aclanthology.info/papers/C18-1127/c18-1127
Martin Potthast, Tim Gollub, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos,
and Benno Stein. 2014. Improving the Reproducibility of PAN’s Shared Tasks:
Plagiarism Detection, Author Identification, and Author Profiling. In Information
Access Evaluation meets Multilinguality, Multimodality, and Visualization. 5th
International Conference of the CLEF Initiative (CLEF 2014). Springer, Berlin
Heidelberg New York, 268–299. https://doi.org/10.1007/978-3-319-11382-1_22
Martin Potthast, Sebastian Köpsel, Benno Stein, and Matthias Hagen. 2016. Clickbait
Detection. In Advances in Information Retrieval. 38th European Conference on IR
Research (ECIR 2016) (Lecture Notes in Computer Science), Vol. 9626. Springer, Berlin
Heidelberg New York, 810–817. https://doi.org/10.1007/978-3-319-30671-1_72
Md Main Uddin Rony, Naeemul Hassan, and Mohammad Yousuf. 2017. Diving Deep
into Clickbaits: Who Use Them to What Extents in Which Topics with What
Effects? CoRR abs/1703.09400 (2017). http://arxiv.org/abs/1703.09400
Philippe Thomas. 2017. Clickbait Identification using Neural Networks. CoRR
abs/1710.08721 (2017). http://arxiv.org/abs/1710.08721
Matti Wiegmann, Michael Völske, Benno Stein, Matthias Hagen, and Martin Potthast.
2018. Heuristic Feature Selection for Clickbait Detection. CoRR abs/1802.01191
(2018). http://arxiv.org/abs/1802.01191
Yiwei Zhou. 2017. Clickbait Detection in Tweets Using Self-attentive Network. CoRR
abs/1710.05364 (2017). http://arxiv.org/abs/1710.05364