-
Bug
-
Resolution: Obsolete
-
Medium
-
None
-
No-Version
-
1
-
Severity 2 - Major
I feel like I've banned a hundred accounts in the last week.
It is good to see that a new spam account can only create three questions initially, but it's obvious that the Captcha on signup has now been thoroughly broken and we need to reconsider improved anti-spam-user techniques.
I would start with
1) A look at a better Captcha system in the short term (the fact the current one simply isn't good enough strongly suggests that better ones will be broken in the future, so it's only a quick temporary fix)
2) Spam analysis algorithms - blacklisting of new users' postings unless they actually look coherent and vaguely relevant
3) Absolute block on posting of links for new accounts (the main reason they do it is to get links picked up by search engines, so removing that facility removes any incentive to do this). That said, I'd consider a whitelist so that links to other questions or Atlassian documentation are allowed through.
- is related to
-
CONFSERVER-47910 Viewing a deleted or trashed question (eg. by following an old email link) should show some kind of indication that the question is deleted
-
- Closed
-
-
[CONFSERVER-47728] Spam filters have been defeated
Approximately three weeks ago we had a huge uptick in spam content being posted to answers.atlassian.com. We've been working on eliminating as much of this content as we can, but it can prove to be a never-ending game of whack-a-mole.
Several iterations of the "technical support phone number" content have been added to our anti spam measures and we are continuing to combat this abuse of our service daily.
Apologies for any issues this is causing.
Over the last few weeks I have noticed a drastic increase in spam.
Is anything still happening on this issue? E.g. moderating a user's first question.
Thanks. It means a lot to hear that my work has been having an effect. It's also always good to see that you are as active in the Answers community as you've been. I'm still waiting to see if and when anyone unseats you as the top-scoring community member. However, I'm suspecting that will not happen any time soon.
Here's where we stand on the spam:
(1) We had some complaints from users this week that the spam-related restrictions were preventing their legitimate access to the service. Reports were coming in to our support teams and I worked with one of our Support Engineers (wzanchet) to try to find a better middle ground that serves legitimate users while continuing to restrict the spammers. As a result of my change, an increased amount of spam made it through on Friday. I apologize, that was all me.
(2) Since the increase in spam, I spent a good part of my Saturday not only enhancing the filters but also restricting other patterns I have found. (Note: I get these emails too....)
(3) I would make one request of you and the rest of the community. As we see new patterns emerge, please add those patterns as comments to this ticket. This will help me adapt the charlie-hates-spam script to catch them. CHS is a pretty crude and young little script and it needs to be fed a regular diet of spam samples so it grows up to become the spam-munching little beast we want it to be.
Note: If you want to give credit to anyone for the improvements over the past few months, jclark@atlassian.com is the guy who deserves credit. Ever since Joe arrived in the Austin office, I have had to sit on the same team with him, hearing him remind me that "Answers needs love."
Sam, you've done a really good job - since the work you talked about on the 9th, there's been a massive reduction in volume. Well done!
The filters are finding and flagging things really well, we're only getting one spam per dodgy user and it really does feel like we're only banning a few accounts a day and most of them are just tidying up the "flagged as spam by administrator" ones rather than anything new.
I have noticed a new attack pattern this week - a spammer posts a question with a title (and often body) set to some text that was the title of a recent valid question (It feels like they are choosing questions that have a couple of votes or are actively answered/commented, but I haven't done any analysis to confirm that). That trick is dodging the content filters by using phrases that the spammers know to be ok. We got loads of these on Friday.
Now, they've started editing them to make them into spam after they've got past the filter by posting a "known good" string.
As an example (not sure if it's still there, I'm not sure how much a ban destroys the spam): https://answers.atlassian.com/questions/30940408/how-do-i-hide-a-transition-from-a-view-screen really did have the title and body "How do I hide a transition from a view screen" when originally posted. That is a title identical to an earlier legitimate question that has a number of answers and comments. The question was later edited so that the title and body became the advert for astrological donkey-poop we have got used to.
The volume so far is low enough that it's not a problem though.
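One way to close the "post a known-good string, then edit it into spam" gap described above is to run the same content check on every edit, and to treat a new post that copies another author's title as suspicious. The following is only a rough sketch under stated assumptions: `BLACKLIST`, `Post`, `check_new_post` and `apply_edit` are hypothetical names, and nothing here reflects how the real filters work.

```python
from dataclasses import dataclass

# Hypothetical blacklist; the real list is not published in this thread.
BLACKLIST = {"astrology", "support phone number"}


def looks_like_spam(text: str) -> bool:
    """Crude content check: any blacklisted phrase appears in the text."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLACKLIST)


@dataclass
class Post:
    author: str
    title: str
    body: str
    flagged: bool = False


def check_new_post(post: Post, existing_titles: dict) -> Post:
    """Flag a new post that is spammy or copies another author's title.

    existing_titles maps a lower-cased title to the author who first used it.
    """
    original_author = existing_titles.get(post.title.strip().lower())
    copied = original_author is not None and original_author != post.author
    post.flagged = copied or looks_like_spam(post.title + " " + post.body)
    return post


def apply_edit(post: Post, title: str, body: str) -> Post:
    """Apply an edit, then run the same content check on the new text."""
    post.title, post.body = title, body
    post.flagged = post.flagged or looks_like_spam(title + " " + body)
    return post
```

Flagging on a copied title would also catch genuine duplicate questions, so in practice it would probably feed a review queue rather than trigger a ban.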
Just following up from my last post,
(1) Over the last 24 hours we have seen a dramatic decrease in spam notifications sent to customers. While spammers are continuing to attack the AAC system, and while this attack frequency has increased, we have seen a significant reduction in email notifications sent to customers (one message per spammer before ban).
(2) I will be continuing to work to increase the effectiveness of our anti-spam defenses.
Just a quick update for the Atlassian Answers community.
(1) I have implemented something loosely called "charlie-hates-spam." This tool is a prototype for something I am hoping will have a positive effect on our spam issues.
(2) I'm going to hold off on details about the CHS tool until I have all the features out. What I will say, for the benefit of everyone out there that I know is frustrated by the spam issue, is this: I am working hard to make our service more resilient and less spammy. This past weekend was a great step forward with the first release of CHS, and I hope to have more good news posted soon.
Thanks Sam, I was able to post my question! Glad I was able to help in some small way.
I've rolled back changes from today.
Please test and confirm that we are working as expected once again.
If you experience problems, please provide the content you were trying to post.
jordan.packer, thanks for posting the content. It helped me identify what I THINK was the problem.
Thanks also for your patience.
Sorry everyone. We're trying to find the right balance with spam protection, and as everyone's pointed out our list is catching too many false positives. Thanks for commenting. I'll have a look and scale it back.
Indeed - I was having problems answering with something like "Could you have a look at the Atlassian doc X", where X was originally a link, and then when it was blacklisted, just the page title. A little too strict there?
There is, mind you, such a thing as taking spam protection entirely too far.
Not being able to link to our own documentation is unacceptable.
I'm having this issue right now.... pretty darn frustrating. Here's the question I'm trying to post:
One of my JIRA admins created a scripted field that would take 3 number fields and add them together (see below). However, the fields that are being added are only present on one screen (which is the same screen that the scripted field resides on). However, I'm seeing a ton of errors in my logs that suggest that the scripted field is attempting to be evaluated on every issue that is opened up, even if the scripted field has not been added to any screens for that issue. The error looks like this:
2015-05-26 09:34:26,857 devstatus.applink:thread-2 ERROR JM28299 442x38x1 1slsrh3 10.151.2.9 /browse/PNTH-19283 [onresolve.scriptrunner.customfield.GroovyCustomField] Script field failed on issue: DM-534, field: Priority Value
java.lang.NullPointerException: Cannot invoke method getValue() on null object
    at Script3.run(Script3.groovy:18)
I realize that the getValue() on null object problem is something that needs to be fixed for the original script anyway (any assistance on that end would also be appreciated), but the main problem here is that it's attempting to run the script for all issues no matter what.... shouldn't it only be running on issues that have added that scripted field?
Here is the code for the scripted field (in its current state):
import com.atlassian.jira.ComponentManager;
import com.atlassian.jira.issue.CustomFieldManager;
import com.atlassian.jira.issue.fields.CustomField;
import com.atlassian.jira.issue.IssueManager;
import com.atlassian.jira.issue.Issue;

// Creates Issue and Custom Field Manager objects
IssueManager issueManager = ComponentManager.getInstance().getIssueManager();
CustomFieldManager customFieldManager = ComponentManager.getInstance().getCustomFieldManager();

// Capture custom fields
CustomField customField_bv = customFieldManager.getCustomFieldObject( 13459 );
CustomField customField_ri = customFieldManager.getCustomFieldObjectByName( "Risk Index" );
CustomField customField_loe = customFieldManager.getCustomFieldObjectByName( "Level of Effort" );
CustomField customField_pv = customFieldManager.getCustomFieldObjectByName( "Priority Value" );

// Capture drop down values of custom fields
def businessValue = issue.getCustomFieldValue( customField_bv ).getValue();
def riskIndex = issue.getCustomFieldValue( customField_ri ).getValue();
def levelOfEffort = issue.getCustomFieldValue( customField_loe ).getValue();

// Only sum custom field values if NONE are null
if (businessValue != null && riskIndex != null & levelOfEffort != null) {
    // Translate drop down values to double values

    // Business Value
    double businessValue_v;
    if (businessValue == "1 - Low") { businessValue_v = 1; }
    else if (businessValue == "2 - Medium") { businessValue_v = 2; }
    else if (businessValue == "3 - High") { businessValue_v = 3; }
    else { businessValue_v = 1000; } // should signal warning that a select drop down value is not recognized

    // Risk Index
    double riskIndex_v;
    if (riskIndex == "1 - User Interface") { riskIndex_v = 1; }
    else if (riskIndex == "3 - Management/Reporting") { riskIndex_v = 3; }
    else if (riskIndex == "6 - Process Improvement") { riskIndex_v = 6; }
    else if (riskIndex == "9 - Control/Error Prevention") { riskIndex_v = 9; }
    else { riskIndex_v = 5000; } // should signal warning that a select drop down value is not recognized

    // Level of Effort
    double levelOfEffort_v;
    if (levelOfEffort == "3 - Low (< 4hrs)") { levelOfEffort_v = 3; }
    else if (levelOfEffort == "2 - Medium (5 - 11hrs)") { levelOfEffort_v = 2; }
    else if (levelOfEffort == "1 - High (> 12hrs)") { levelOfEffort_v = 1; }
    else { levelOfEffort_v = 8000; } // should signal warning that a select drop down value is not recognized

    // Return Summed Values
    return priorityValue = businessValue_v + riskIndex_v + levelOfEffort_v;
}
They're getting there. I noticed it was struggling with a "banned word" list earlier today, and the volume getting through is distinctly lower than it was last week.
Might be worth posting your question here as a comment, just so Atlassian have an example of something that should probably be posted as a valid question.
I am currently trying to add a question, and as a new user trying to find out information about Atlassian products this is very confusing. It has put me in a position where I have to decide whether it is worth my time to try to ask my question about the product or to quit. Quitting means I am left with my assumptions about the product, which impacts our adoption of the product.
I will make another attempt at the question. I have no idea what blacklisted words my post contains when I am asking about the components of one of your products using your product's terms.
As Joe mentioned a couple days ago, we're seeing and attempting to counter this new wave of spammers. We have this in our incident management process (for those of you long-timers, you'll be happy to know we have a fairly mature incident management process now!) and are taking our steps to ban the right users from a system perspective as opposed to one by one. We think we found a bunch of sleeper users that were registered prior to the CQ migration, that have been inactive and have been reawakened.
We had a service degradation earlier this morning while the uploading of attachments via script got especially expensive. We're taking it seriously.
I'll give updates as we progress.
Looks like we're experiencing a new wave of spam invasions on Answers this week. I think Nic's suggestion of a URL whitelist is the soundest suggestion so far to try and reduce their success rate. I'm going to see if we can make some quick progress on that, since it's proving to be difficult to block spammers at the Atlassian ID level.
Thanks emallmann - looks like we are getting a new kind of spam attack.
Fortunately the Instaban feature works just as well on answers as it does on questions, so this hasn't really been any more difficult to clean up.
Getting this kind of "zero day" spam from a new source certainly gives some weight to Nic's suggestion of a whitelist for new users. I'll give that some more thought.
My discussions with the Atlassian ID team about shutting off user sign-ups at the source are ongoing.
Hey Team,
I've got another incident on answers today which I found interesting:
http://i.imgur.com/8F7b7UF.png
These spam posts have other spam accounts commenting on the created topic!
http://i.imgur.com/kJdgKDx.png
and
http://i.imgur.com/tFNBWF7.png
I noticed the spam over the weekend was vastly reduced compared with the previous weekend - If memory serves, I killed one on Friday and a couple on Sunday (as opposed to 50ish the weekend before). So I'd say you're doing well!
The other board I use which seems highly resistant to spam implements a simple whitelist on url postings, but it doesn't ban or block users trying to post urls, it just puts an error in front of them that says "we don't allow new users to use url shortening, please use the full url". I suspect they fail on pastebin, but it's not a forum where people post that sort of stuff. And their whitelist is very short and doesn't change much.
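The behaviour described above (a short whitelist for new users, with an error message instead of a ban) could look roughly like the sketch below. The host list, the regular expression and the function name `link_error_for` are all assumptions made for illustration; they are not taken from that board or from Answers.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical whitelist; the hosts are illustrative, not a real policy.
ALLOWED_HOSTS = {
    "answers.atlassian.com",
    "confluence.atlassian.com",
    "jira.atlassian.com",
}

# Deliberately simple URL matcher: scheme, then anything up to whitespace or a closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def link_error_for(text: str, is_new_user: bool) -> Optional[str]:
    """Return an error message if a new user posts a non-whitelisted link, else None."""
    if not is_new_user:
        return None
    for url in URL_PATTERN.findall(text):
        host = (urlparse(url).hostname or "").lower().rstrip(".")
        if host not in ALLOWED_HOSTS:
            return (
                f"New users cannot link to {host}; "
                "please use the full URL of a page on an allowed site."
            )
    return None
```

Returning a message rather than banning keeps the cost of a false positive low: a legitimate new user sees what to change and can repost.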
Hi Folks,
We're still chasing up better anti-spam protections with the Atlassian ID team. There is a CAPTCHA system in place for new user accounts, but it probably needs some adjustments since its effectiveness is clearly pretty low at the moment.
We have some small improvements for anti-spam in the pipeline, and work should commence on these before the year is done. Specifically, as Jeremy mentions, we'll work on CQ-1308, and we're also going to unlock the instaban privileges so that every Atlassian staff member has access to them. This should help share the burden of killing off the spam accounts.
I'm generally against implementing more sophisticated spam analysis algorithms. We toyed with Akismet on the old Answers site, but the false positive rate was just too high - legitimate questions and answers were being caught up in the filters. I would rather have some spam get through than have some legitimate users get blocked and not know how to get help.
A link whitelist is a reasonable idea, but again there's a hit in usability there. Some external URLs like bit.ly, t.co and pastebin.com can be used by both spammers and legitimate users alike.
My gut feeling is that our blacklist approach has been reasonably successful. I'm not seeing any more of the Indonesian refrigerator spam we were getting earlier in the year, so I think we have beaten those guys. The main problem with the blacklist is that it's maintained manually, so it takes some time for me to update it in response to the spam that I see before it starts to be effective.
You did not finish point 3 of "What we'll do next" above, please update your comment.
Oops! I left myself halfway done on that comment; it was meant to read that we'll add captcha upstream, which is now comment #3. Sorry for not doing a good double-check.
Are some of those questions being blacklisted now? I am not sure "tv" as a blacklisted word is a good thing, since jira.atlassian.com actually had some valid issues show up containing it. Does your blacklist support repeating words or allow specification of multiple word combinations or patterns?
Is it overly paranoid of me to publish the mechanics of how the blacklist works? I think I'd like to remain a bit mysterious about it; let's just say we have a blacklist and we monitor and maintain it.
It is more likely it is quiet because the spammer is not putting anything out. The tv spammer seems to be more active around the weekends. There is also spam in other languages. Are some of those questions being blacklisted now? I am not sure "tv" as a blacklisted word is a good thing, since jira.atlassian.com actually had some valid issues show up containing it. Does your blacklist support repeating words or allow specification of multiple word combinations or patterns?
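To illustrate the question about word combinations, here is a small sketch of a blacklist that only fires on a combination of words, so a lone "tv" in a legitimate issue passes. The patterns and the name `matches_blacklist` are invented for this example; how the actual blacklist works has not been published.

```python
import re

# Hypothetical patterns: each requires a combination of words, not a lone word like "tv".
PATTERNS = [
    re.compile(
        r"\bwatch\b.*\b(tv|movie|football)\b.*\bonline\b",
        re.IGNORECASE | re.DOTALL,
    ),
    re.compile(
        r"\b(live\s+stream(ing)?)\b.*\b(tv|football)\b",
        re.IGNORECASE | re.DOTALL,
    ),
]


def matches_blacklist(text: str) -> bool:
    """True if any combination pattern matches; single words alone do not."""
    return any(pattern.search(text) for pattern in PATTERNS)
```

A post such as "Watch football tv online free HD" would match the first pattern, while "The wallboard tv shows the wrong sprint" would not match either one.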
Thanks for the update, Jeremy.
0 spam this morning, so the situation is already much better!
You did not finish point 3 of "What we'll do next" above, please update your comment.
We're in the process of figuring out our approach to addressing this. Here's an update on where we're at.
First, the things we already have in place:
- We have a dark feature in Confluence Questions that allows spam prevention. (NB - anyone who comes across this, you can enable this feature cq.spamprevention from /admin/darkfeatures.action, and configure it at /admin/questions/spamprevention.action). This action does two things:
  - We limit the number of posts for users of certain levels of karma. I'll watch over the next couple days to see if that setting can be tweaked to be a bit more effective.
  - We have a blacklist of words (I've added football and tv, and I'll watch for this as well).
- Most of you have figured out we have an 'instaban' feature which suspends users and deletes all their content.
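As a rough illustration of the karma-based post limit mentioned above, the sketch below maps a karma score to a posting allowance. The tiers, the per-day window and the function names are assumptions for the example only; the limit of 3 for new accounts merely echoes the "three questions" noted in the original report, and the real cq.spamprevention settings may differ.

```python
from typing import Optional

# Hypothetical tiers: (minimum karma, posts allowed per day), highest tier first.
# None means unlimited. All numbers are illustrative.
POST_LIMITS = [(100, None), (10, 20), (0, 3)]


def daily_limit(karma: int) -> Optional[int]:
    """Return the daily post limit for a karma score, or None for unlimited."""
    for min_karma, limit in POST_LIMITS:
        if karma >= min_karma:
            return limit
    return 0  # negative karma falls below every tier: no posting


def may_post(karma: int, posts_today: int) -> bool:
    """True if a user with this karma may create one more post today."""
    limit = daily_limit(karma)
    return limit is None or posts_today < limit
```

With these made-up tiers, a brand new account (karma 0) is stopped at its fourth post of the day, which caps how much a single spam account can publish before it is banned.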
What we'll do next:
- As I mentioned, I'll watch and tune our existing spam mitigation over the next few days.
- We're going to schedule CQ-1308, which, while it's not actual spam prevention, is an improvement in that it's very confusing to see content that's been deleted and not have any indicator that it's been deleted. After instaban, content is put in the trash, but if you visit the deleted page's URL you can still see the content (in this case spam).
- We're looking into stopping spammers upstream in Atlassian ID. I've not got an up-to-date version of that yet but I'll post back when we have a more concrete plan.
Two suggestions:
- If a user is banned, their comments/answers disappear, but not their questions. I think the questions should too. (At least for the spammers.)
- There could be a blacklist of obviously spammy words like "football" or "tv".
Yep. I don't want to mislead anyone - the No-Captcha is much better than what Atlassian is using, but it's still being defeated.
Have you used a captcha on login? A new number is generated for each new login, so it takes three items to authenticate.
I'm afraid I've seen Google's No-Captcha defeated already in another forum I use. I think it falls into the category of "short term fix, as it's not going to be good enough in the long run"
I know I am getting tired of the spam and I'm sure others are as well. I hope Atlassian is taking this seriously and a fix is short in coming ... unlike many of their other bugs.
Now the movie/tv spammer is spamming http://jira.atlassia.com as well. Maybe this will give some impetus to improve, if not implement, some anti-spamming interfaces.
I think Google's new "No-CAPTCHA" reCAPTCHA product looks very promising for this kind of thing.
Yes, the movie spammer has been quite active these past few days. The sad part is that the "ban this user" feature is somehow partially broken as well.
And they have been defeated again. :|