






∗Institute IMDEA Networks †University Carlos III Madrid ‡Telecom SudParis
Abstract: In this paper we conduct a large-scale measurement study in order to analyse the fake content publishing phenomenon in the BitTorrent ecosystem. Our results reveal that fake content represents an important portion (35%) of the files shared in BitTorrent and that just a few tens of users are responsible for 90% of this content. Furthermore, more than 99% of the analysed fake files are linked to either malware or scam websites. This creates a serious threat for the BitTorrent ecosystem. To address this issue, we present a new tool named TorrentGuard for the early detection of fake content. Based on our evaluation, this tool may prevent end users from downloading more than 35 million fake files per year. This could help to reduce the number of computer infections and scams suffered by BitTorrent users. TorrentGuard is already available and can be accessed through either a webpage or a Vuze plugin.
I. INTRODUCTION
BitTorrent is one of the most popular applications in the current Internet. It is used daily by millions of users and is responsible for a major portion of Internet traffic [26]. This success has motivated the research community to investigate different aspects of BitTorrent, covering performance [20][25], economics [10][13][30] and incentives [17][27]. However, to the best of the authors' knowledge, the research community has paid less attention to BitTorrent security. Some previous works have analysed the vulnerabilities of the BitTorrent protocol to free-riders [21][22][29], whereas others address the lack of privacy offered by BitTorrent [8]. More recently, in a previous work [12] we demonstrated that the BitTorrent ecosystem is suffering from a continuous index poisoning attack, resulting in 30% of published torrents being associated with fake content. Furthermore, this fake content produces 25% of the download events, which means that one in every four content downloads in BitTorrent is fake. These initial results highlight a serious issue that, to the best of the authors' knowledge, has still not been covered by the research community. In this paper we thoroughly analyse the fake publishing phenomenon in BitTorrent in order to understand its real impact on system performance as well as the potential risks of fake content for BitTorrent users. Furthermore, we propose a practical solution to mitigate this problem. We base our study on data collected from torrents published on The Pirate Bay portal over a period of 14 days, from 30-04-2011 to 13-05-2011. Of the almost 30K analysed torrents, 35% are
associated with fake content. This represents an increase of 5 percentage points in the presence of fake content within the BitTorrent ecosystem over the one year separating our two measurement studies, which further justifies the research conducted in this paper. In order to fight the fake publishing phenomenon, the first step is to properly characterise the fake publishers and their behaviour. Current BitTorrent portals identify fake publishers through the user account they use to upload fake torrents to the portal. We show in this paper that this technique is inefficient, since a fake publisher can create as many user accounts as needed on those portals. Instead, the parameter that uniquely identifies a fake publisher is the IP address from which it performs its activity. Surprisingly, our data reveal that just 20 fake publishers (whose IP addresses we identify) are responsible for injecting 90% of the fake content into the BitTorrent ecosystem. Moreover, most of these IP addresses belong to Hosting Providers, where the fake publishers rent dedicated high-resource servers to perform their activity. The fake publishing activity is time-consuming, since a fake publisher needs to manually create the user accounts used on the different portals (in some cases up to 4 accounts per day). Furthermore, this activity requires dedicated resources (e.g. rented servers). This investment in time and resources can only be justified by a strong motivation behind the distribution of fake content.
We have downloaded and manually inspected a large number of fake files published during our measurement period and found 3 different profiles among the fake publishers: (i) a first group of fake publishers aims to spread malware using the popular BitTorrent system; (ii) a second group tries to attract BitTorrent users to scam websites in order to obtain an economic benefit from the victims through different scam techniques; (iii) the last group is formed by antipiracy agencies that upload fake versions of the content they want to protect. Our data show that more than 99% of the published fake content is associated with the first two profiles. This poses a very serious threat to the BitTorrent ecosystem, since the activity of these publishers may lead to thousands of undesirable episodes of scammed users and computer infections. These findings suggest that new solutions are needed to eliminate, or at least reduce, the amount of fake content available in the BitTorrent ecosystem. Towards this
end, we have designed and implemented TorrentGuard, a novel detection tool that identifies the IP address of the fake publisher and is thus able to report as fake each content published from this IP address at the moment of its publication. Based on our evaluation, TorrentGuard would be able to avoid more than 35 million fake content downloads every year. This means preventing hundreds of thousands of users from suffering computer infections or scam incidents every year. TorrentGuard can currently be used through a publicly available website and a Vuze plugin. The rest of the paper is structured as follows. Section II presents the background information. In Section III we describe our measurement methodology and present our dataset. Next, Section IV characterises fake publishers, while Section V classifies them according to the goal they pursue with their activity. Section VI briefly characterises the downloaders of fake content. In Section VII we describe and evaluate our solution to improve the detection of fake content. We also discuss possible countermeasures to TorrentGuard and their efficiency. Section VIII describes work related to this paper. Finally, Section IX concludes the paper.
II. BACKGROUND
In this Section we briefly describe the main aspects of the BitTorrent ecosystem, placing special emphasis on the procedure for publishing content on The Pirate Bay (and, by extension, on other BitTorrent portals) and, specifically, on how fake publishers do it. This is summarised in Figure 1. For a full description of the BitTorrent ecosystem we refer the reader to [19] and [31].
A. Main elements of BitTorrent ecosystem
- BitTorrent Portals: these are webpages which index .torrent files, classify them into different categories and provide basic information for each file. These portals serve as rendezvous points between content publishers and BitTorrent downloaders. The publishers upload their .torrent files to BitTorrent portals and the clients download them.
- .torrent file: this is a meta-information file including relevant information for the BitTorrent protocol, such as: (i) the content infohash, that is, a unique identifier of the content in the BitTorrent ecosystem; (ii) the address of the BitTorrent Tracker managing the content distribution process; (iii) the size of the content and the number of pieces forming the file.
- magnet link: this is a URI-like link that includes the infohash of a specific content and, optionally, the address of a tracker [1]. A user can launch a download by retrieving the magnet link instead of the .torrent file from a BitTorrent portal. Then, with the magnet link, the user can obtain the .torrent file from other peers in the swarm^1. Magnet links have recently become significantly more important, since the administrators of the largest BitTorrent portal, The Pirate Bay, have announced their intention to stop serving .torrent files from March 1st 2012. Instead, they will exclusively serve magnet links [2].
(^1) The magnet link can also be used as an index to retrieve the associated .torrent file from the different DHTs implemented by BitTorrent clients [11].
- BitTorrent Trackers: these are servers which manage the BitTorrent download process of a given content. The set of peers downloading a given file is called a swarm. The tracker maintains a list with the IP addresses and the download progress of all the peers forming the swarm associated with a specific content. Furthermore, when a new peer joins the swarm, it contacts the tracker in order to obtain a list of IP addresses of other peers participating in the swarm. By doing so, the newcomer is able to retrieve pieces of the content from these peers.
- BitTorrent downloaders (peers): these are clients forming the swarm that download and/or upload pieces of the content. We distinguish two types of peers: a seeder is a peer that possesses a complete copy of the content and thus only uploads pieces, whereas a leecher does not have the complete file, so it both uploads and downloads pieces.
- BitTorrent publishers: these are the clients that make the first copy of the content available in the BitTorrent ecosystem.
B. Publishing content in BitTorrent
When a publisher wants to publish content in the BitTorrent ecosystem, it first creates a .torrent file. After creating the .torrent file, the publisher uploads it to one or more BitTorrent portals. For this purpose, it uses a user account (with a specific username) created on these portals. Furthermore, the publisher distributes the first copy of the content by acting as the initial seeder in the associated swarm. Therefore, the content publisher can be identified by the IP address of the initial seeder distributing the content and by the username used to upload the content to a BitTorrent portal. In this paper we specifically address the fake content publishing phenomenon in BitTorrent. A fake publisher is a user that exploits the BitTorrent ecosystem to publish fake content, that is, content that differs from what is expected from the content name.
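To make the magnet-link description above concrete, the sketch below extracts the infohash and the optional tracker URLs from a magnet URI using only Python's standard library. It assumes the common layout defined by the BitTorrent magnet specification (BEP 9): an `xt=urn:btih:<infohash>` parameter plus optional `tr=` (tracker) and `dn=` (display name) parameters. The function name `parse_magnet` and the example infohash are hypothetical.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlparse

BTIH_PREFIX = "urn:btih:"

def parse_magnet(uri: str) -> Dict[str, object]:
    """Return the infohash, tracker URLs and display name carried by a magnet link."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parsed.query)  # also percent-decodes the values
    infohash: Optional[str] = None
    for topic in params.get("xt", []):
        if topic.lower().startswith(BTIH_PREFIX):
            # Normalise case so the same content always maps to the same key.
            infohash = topic[len(BTIH_PREFIX):].lower()
            break
    if infohash is None:
        raise ValueError("magnet link carries no BitTorrent infohash")
    trackers: List[str] = params.get("tr", [])
    name: Optional[str] = params.get("dn", [None])[0]
    return {"infohash": infohash, "trackers": trackers, "name": name}

if __name__ == "__main__":
    link = ("magnet:?xt=urn:btih:C12FE1C06BBA254A9DC9F519B335AA7C1367A88A"
            "&dn=example&tr=udp%3A%2F%2Ftracker.example.org%3A1337")
    print(parse_magnet(link))
```

The infohash identifies the content, not the portal page. Two magnet links taken from different portals that carry the same infohash therefore refer to the same swarm.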
A fake publisher makes the fake content available from a single IP address (or a limited number of IP addresses) that corresponds to the initial seeder of all its published content. Furthermore, a fake publisher typically creates a user account on a BitTorrent portal, from which it uploads the .torrent files associated with its fake content. Some portals, such as The Pirate Bay, remove this user account after users report that it is being used to publish fake content. The fake publisher then reacts by creating a new account to publish new .torrent files, and this loop keeps repeating. Hence, contrary to regular publishers (which can be identified by their username on the BitTorrent portal), fake publishers can only be identified by their IP address. Finally, it must be noted that, to the best of the authors' knowledge, the technique described above, based on users' reports, is the only one currently used for detecting and deleting fake content.
C. Downloading content in BitTorrent
When a user wishes to download content, it first downloads the .torrent file associated with the content from a BitTor-
Fig. 2. Percentage of fake content published by the top x% of fake publishers (x-axis: percentage of fake publishers; y-axis: percentage of fake content published)
obtain those IP addresses participating in the swarm. In order to accelerate this process we perform this task from four independent machines.
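Identifying a publisher through its initial seeder can be sketched as follows. This is a minimal illustration under stated assumptions and is not the measurement code used in this study. It assumes hypothetical `PeerObservation` records (timestamp, IP address, port, and whether the peer already held the complete content), gathered by repeatedly querying the swarm from the moment the torrent appears. It then takes the earliest observed seeder to be the publisher.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class PeerObservation:
    """One sighting of a peer in a swarm (hypothetical record format)."""
    timestamp: float  # seconds since the torrent appeared on the portal
    ip: str
    port: int
    is_seeder: bool   # True if the peer held a complete copy when observed

def initial_seeder_ip(observations: Iterable[PeerObservation]) -> Optional[str]:
    """Return the IP of the earliest observed seeder, or None if no seeder was seen."""
    seeders = [obs for obs in observations if obs.is_seeder]
    if not seeders:
        return None
    return min(seeders, key=lambda obs: obs.timestamp).ip
```

The heuristic is only as reliable as the observation start time. If monitoring begins after other peers have completed the download, the earliest seeder seen may not be the publisher.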
A. Dataset description
We applied the described methodology between 30-04-2011 and 13-05-2011, in addition to a 5-day warm-up phase dedicated to identifying the initial fake publishers' IP addresses. During the measurement period we collected 29330 torrents, of which 10206 (35%) were identified as fake. Furthermore, we collected the IP addresses of the peers participating in swarms associated with fake content until the earlier of two instants: (i) the moment the content was removed from The Pirate Bay and (ii) the end of our measurement study.
IV. FAKE PUBLISHERS CHARACTERIZATION
Our results reveal that more than 1/3 of the content published on The Pirate Bay is fake. This shows an increasing trend in the amount of fake content compared to our previous study, carried out one year earlier, when fake content represented 30%. Therefore, it is critical to eliminate, or at least reduce, this huge amount of fake content in the BitTorrent ecosystem. The first step towards this end is to identify who is responsible for publishing this fake content and to characterise their behaviour. In this Section, we address this issue using the collected data. More specifically, we aim to answer questions such as: How many fake publishers (i.e. IP addresses) are uploading fake content to the BitTorrent ecosystem? From where (i.e. which ISP) do they perform their activity? How frequently do they upload fake content?
A. Number and Contribution of Fake Publishers
Unexpectedly, we observe that only 71 IP addresses are responsible for the 4779 fake files for which we identified the initial seeder. This implies that almost 70 fake files were published from each of these IPs on average. However, it is interesting to investigate how much each of these fake publishers contributes. Towards this end, Figure 2 depicts the percentage of fake content published by the top x% of these fake publishers. The graph shows a skewed distribution in which 10 IPs (14%) are responsible for publishing almost 75% of all the fake content. Moreover, this share increases to 90% if we consider the top 20 IP addresses (28%). Therefore, we can conclude that a small group of just 20 fake publishers is responsible for poisoning the BitTorrent ecosystem. In the rest of the paper we focus on thoroughly studying this group of 20 fake publishers, which we refer to as Top Fake Publishers.
Fig. 3. CDF of the number of The Pirate Bay accounts per fake publisher (x-axis: number of The Pirate Bay accounts per fake publisher; y-axis: CDF)
B. Location of fake publishers
We have mapped the IP address of each of the Top Fake Publishers to its corresponding ISP using the MaxMind database [23]. Surprisingly, 17 of the Top 20 fake publishers operate from Hosting Providers. These are companies dedicated to renting servers provisioned with abundant resources (CPU, memory and bandwidth). Moreover, 70% of the fake content is seeded from just two Hosting Providers, OVH Systems and Obtrix, located in France and New Zealand respectively. On the one hand, fake publishers need resources in order to sustain the distribution of a large number of fake files [12]; on the other hand, they need anonymity due to the illegitimate activity being performed. Renting servers from Hosting Providers covers both requirements. However, the use of dedicated servers in Hosting Providers means that most fake publishers perform their activity from a stable IP address, since those servers typically have a static IP address configured. This makes them easily identifiable. In this sense, anonymity services such as TOR [4] or proxy services could help fake publishers hinder their identification. However, we have not found any fake publisher in our dataset using such services. This suggests that the severe performance degradation associated with these anonymity services deters fake publishers from using them. We further discuss these aspects in Section VII-E.
C. Pirate Bay accounts utilisation
The Pirate Bay requires solving a CAPTCHA [9] to create an account, in order to prevent the automatic generation of accounts. Hence, fake publishers are obliged to create their accounts manually. Figure 3 shows the CDF of the number of The Pirate Bay accounts used by each of the 71 identified fake publishers. A fake publisher uses a median of 6 accounts over a period of 14 days. However, 5% of the fake publishers inject content using more than 58 different accounts over the same period, which represents an average of about 4 accounts per day. This result suggests that fake publishers need to
dedicate time to tracking the availability of their accounts in order to manually generate new ones if needed. Interestingly, we also observe a second strategy that, although marginal, is worth reporting. In these cases, fake publishers hijack accounts that have a legitimate publishing history. This gives them a trusted reputation among downloaders and could therefore extend the time during which the fake publisher can inject fake torrents before being reported. However, due to the technical skills required to apply this technique, this case represents less than 1% of all fake accounts.
D. Publishing Strategies
Fake users follow two different strategies to upload fake content to The Pirate Bay portal. On the one hand, some users publish a large number of fake files in a row (typically around 10) within a few seconds of creating a user account. Once the account is deleted, they repeat the same process from a new account. Around 70% of the Top Fake Publishers use this technique. On the other hand, 30% of the Top Fake Publishers upload just one or two fake files per username. Compared to the previous case, this more conservative technique extends the time the fake accounts remain active before being eliminated. Specifically, the accounts of publishers using the first strategy are detected and deleted after 92 minutes on average, whereas the accounts of those using the second strategy are deleted after 253 minutes, so their content remains available on The Pirate Bay 2.75 times longer. Unexpectedly, although the second strategy offers a longer account lifetime, it attracts only 47 downloaders per torrent on average, compared to the 84 attracted by fake publishers using the first strategy. This happens because fake publishers using the first strategy typically give their content popular names, whereas publishers using the more conservative second strategy do not.
E. Strategies to attract downloaders
The main goal of fake publishers in BitTorrent is to produce as many downloads of their content as possible. Therefore, they need to offer torrents that look very attractive to downloaders. Towards this end, we have observed that fake publishers use three different strategies: (i) they give the content a very popular name, such as the title of a recent Hollywood release; (ii) they create the false impression that the content has been published by a well-known and trusted user. For this purpose, the fake publisher names its content in the same way as the trusted one. For instance, eztv, one of the most popular publishers on The Pirate Bay, adds the signature [eztv] at the end of the titles of its published files, and some fake publishers also add this signature to the titles of their fake content; (iii) they present attractive performance statistics (i.e. a high number of seeders and leechers) for the fake torrent. In this way, downloaders perceive the fake torrent as very popular and assume they will obtain a high download rate if they select it. To generate these fake statistics, the publisher connects to the
Tracker many times using a single IP address but different ports. The tracker considers each of these IP+port pairs as a separate peer and reports a high number of seeders and leechers. The Pirate Bay retrieves these statistics from the Tracker and presents them. In summary, the fake content publishing activity is performed from Hosting Provider facilities by just a few dozen users. Furthermore, fake publishers are aware of how the BitTorrent ecosystem works, so they use sophisticated strategies to improve the success of their activity.
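The statistic-inflation trick in strategy (iii) works because, as described above, the tracker counts every IP+port pair as a separate peer. The toy tracker below reproduces that counting rule. It is a hypothetical `ToySwarm` class, not a real tracker implementation. It also compares the reported peer count with the number of distinct IP addresses in the swarm, which is a simple signal that many "peers" are in fact one host.

```python
from collections import Counter
from typing import Dict, Tuple

class ToySwarm:
    """In-memory tracker state for a single torrent, keyed by (ip, port) pairs."""

    def __init__(self) -> None:
        self.peers: Dict[Tuple[str, int], int] = {}  # (ip, port) -> bytes left to download

    def announce(self, ip: str, port: int, left: int) -> None:
        # Each distinct (ip, port) pair is stored, and later counted, as its own peer.
        self.peers[(ip, port)] = left

    def reported_stats(self) -> Tuple[int, int]:
        """(seeders, leechers) as a portal scraping this tracker would display them."""
        seeders = sum(1 for left in self.peers.values() if left == 0)
        return seeders, len(self.peers) - seeders

    def distinct_ips(self) -> int:
        return len({ip for ip, _ in self.peers})

    def busiest_ip(self) -> Tuple[str, int]:
        """IP address announcing from the most ports, with its port count (swarm must be non-empty)."""
        return Counter(ip for ip, _ in self.peers).most_common(1)[0]

if __name__ == "__main__":
    swarm = ToySwarm()
    for port in range(40000, 40050):  # one host faking 50 seeders
        swarm.announce("192.0.2.10", port, left=0)
    swarm.announce("198.51.100.20", 6881, left=700_000_000)  # one genuine leecher
    print(swarm.reported_stats(), swarm.distinct_ips(), swarm.busiest_ip())
```

In this example, the swarm reports 50 seeders and 1 leecher, but only 2 distinct IP addresses.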
V. FAKE PUBLISHERS PROFILES
After characterising the behaviour of fake publishers, we still need to answer an important question: What incentive does a user have to publish fake content? To answer this question we have downloaded up to 10 files published by each fake publisher in our dataset and manually inspected them. Our analysis reveals the presence of three different profiles: malware propagators, scammers and antipiracy agencies. Next, we describe each of these profiles in detail.
A. Malware propagators
These users exploit the popularity of BitTorrent in order to rapidly propagate malware among thousands of users. On the one hand, for some of the users in this group the published content is the malware itself. In this case, the content containing the malware typically poses as a patch for a popular game, a key generator, etc. On the other hand, a second set of users uses a more sophisticated technique. They publish a movie with a catchy title. The content has the standard size of a DivX movie (i.e. between 700MB and 1GB), and sometimes even includes a second small file with a real sample of the movie. Hence, the file has the appearance of legitimate (non-fake) content. However, when a user downloads the content and tries to play the movie with a player other than Windows Media Player (WMP), the user is asked to play it with WMP instead. When the movie is finally played with WMP, a pop-up window appears requesting the installation of new codecs, along with a URL from which these codecs can be downloaded. Of course, the file containing those supposed codecs is reported as malware by anti-virus software.
B. Scammers
In this case, the fake publisher uses a technique similar to the sophisticated one described above. However, when the user plays the movie with WMP, the user is automatically redirected to a website on the Internet. A second variant used by scammers is to provide a password-protected file (typically .rar) and offer the user a website from which the password can be obtained. Once the user reaches one of these websites, a credit card payment is requested in order to obtain some privilege needed to watch the downloaded movie (e.g. the password of the .rar file). In other situations the user is told that, in order to verify that he is not a bot, a survey must be filled in before watching the movie. This survey turns out to be a contest in which the user is obliged to subscribe to a paid premium
higher benefit from the system described in the next section.
VII. TORRENTGUARD
In the previous Sections we have demonstrated that a large amount of fake content (35%) is currently being published in the BitTorrent ecosystem and, what is worse, that most of this fake content is potentially harmful for the users who download it. We have also seen that the techniques used to remove this content are inefficient and require heavy human intervention to: first, detect and report the falseness of a given content, and second, remove it from the BitTorrent portals (this is done by the portal administrator). Furthermore, the scope of user reports is limited to a single BitTorrent portal, so the content is removed only from this portal instead of from the whole BitTorrent ecosystem. In this Section we present our tool, named TorrentGuard, which aims to automate and accelerate the process of detecting fake publishers. For this purpose, TorrentGuard identifies a fake publisher by its IP address instead of by its username, which is how BitTorrent portals such as The Pirate Bay currently do it. By doing so, a fake content can be identified right after its publication, since we can determine that the IP address of the initial seeder belongs to a fake publisher. This accelerates the detection process. Furthermore, contrary to the techniques currently used by BitTorrent portals, TorrentGuard removes the fake content from the whole BitTorrent ecosystem because it reports the content infohash. Since the infohash uniquely identifies a content in the BitTorrent ecosystem, a user of TorrentGuard can identify the content as fake independently of the portal from which the .torrent file was retrieved, or even if it was obtained from the BitTorrent DHT service. In the rest of the Section we present the details of the TorrentGuard implementation as well as the performance results obtained over a testing period of 14 days.
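TorrentGuard's cross-portal reach rests on the infohash. In BitTorrent v1 (BEP 3), the infohash is the SHA-1 digest of the bencoded `info` dictionary of the .torrent file. Every portal, magnet link or DHT lookup that refers to the same content therefore yields the same value. The sketch below implements a small bencode encoder and uses Python's `hashlib`; the `info` dictionary is a made-up example. One caveat applies: hashing a re-encoded dictionary matches the real infohash only if the original file was encoded canonically (with sorted keys). A production tool would instead hash the raw `info` bytes taken from the file.

```python
import hashlib
from typing import Dict, List, Union

Bencodable = Union[int, str, bytes, List["Bencodable"], Dict[Union[str, bytes], "Bencodable"]]

def bencode(value: Bencodable) -> bytes:
    """Encode integers, strings, lists and dictionaries in bencode format."""
    if isinstance(value, bool):
        raise TypeError("booleans are not bencodable")  # bool is a subclass of int
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted (raw byte) order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def infohash_v1(info: Dict[str, Bencodable]) -> str:
    """Hex SHA-1 digest of the bencoded info dictionary (BitTorrent v1 infohash)."""
    return hashlib.sha1(bencode(info)).hexdigest()

if __name__ == "__main__":
    info = {"name": "example.avi", "length": 12, "piece length": 16384, "pieces": b"\x00" * 20}
    print(infohash_v1(info))
```

Because dictionary keys are sorted before encoding, the same `info` content always yields the same 40-character hex digest, whatever the key order in the Python dictionary.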
A. TorrentGuard Implementation
Figure 5 depicts the complete schema of TorrentGuard. It is composed of the following modules:
(^3) From March 1st 2012, our tool will exclusively use magnet links for this purpose, as The Pirate Bay will stop serving .torrent files from that date.
If the IP address of the initial seeder matches one of those included in the blacklist of fake IP addresses, the torrent is marked as fake.
(^4) This application is available at http://torrentguard.netcom.it.uc3m.es/
Fig. 5. The schema of TorrentGuard
Therefore, in the worst case, i.e. for new fake publishers, TorrentGuard needs the same time as The Pirate Bay to identify fake content. However, once the fake publisher's IP address has been identified, TorrentGuard is able to report fake content immediately after its publication. This provides a significant improvement compared to standard detection mechanisms. In other words, with TorrentGuard it is not necessary to manually report each fake user account, as the existing solutions require. Furthermore, the existing solutions are limited to the portal where they operate. For instance, in the case of The Pirate Bay, once a content is identified as fake it is removed from the portal but not from the BitTorrent ecosystem. In contrast, TorrentGuard is a cross-portal solution that is able to identify the infohash of the fake content, preventing its download regardless of where the user obtained the .torrent file: any BitTorrent portal or the DHT service. In short, TorrentGuard is a novel tool that: (i) reduces fake content detection time, since it uses IP-based rather than username-based detection, and (ii) identifies fake content in the whole BitTorrent ecosystem instead of in a single portal, because it identifies the fake content by its infohash (a unique identifier of the content in the whole BitTorrent ecosystem) rather than by the torrent-id of a specific portal.
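The IP-based detection logic can be sketched in a few lines. This is a minimal illustration, not TorrentGuard's actual code, and the class and method names are hypothetical. The threshold of 3 fake accounts per IP address comes from the countermeasure analysis later in this Section: an IP address is considered fake after 3 of its accounts on The Pirate Bay have been reported. Once an address is blacklisted, every new torrent whose initial seeder uses it is flagged immediately. The flag is stored by infohash, so it applies whichever portal or DHT lookup was used to find the torrent.

```python
from collections import defaultdict
from typing import Dict, Set

class FakePublisherBlacklist:
    """Hypothetical IP-based fake-content detector keyed on initial-seeder addresses."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.reported_accounts: Dict[str, Set[str]] = defaultdict(set)  # ip -> fake usernames
        self.blacklisted_ips: Set[str] = set()
        self.fake_infohashes: Set[str] = set()

    def report_fake_account(self, seeder_ip: str, username: str) -> None:
        """Record a portal account removed as fake whose torrents were seeded from seeder_ip."""
        accounts = self.reported_accounts[seeder_ip]
        accounts.add(username)  # a set, so repeated reports of one account count once
        if len(accounts) >= self.threshold:
            self.blacklisted_ips.add(seeder_ip)

    def check_new_torrent(self, infohash: str, seeder_ip: str) -> bool:
        """Flag a newly published torrent as fake if its initial seeder is blacklisted."""
        if seeder_ip in self.blacklisted_ips:
            self.fake_infohashes.add(infohash.lower())
            return True
        return False

    def is_fake(self, infohash: str) -> bool:
        """Cross-portal lookup: applies to any .torrent file or magnet link with this infohash."""
        return infohash.lower() in self.fake_infohashes
```

Torrents published from an address before it reaches the threshold are not flagged by this logic. This matches the worst case described above, in which a new fake publisher is caught no faster than by the portal itself.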
B. TorrentGuard Performance
We have evaluated the performance of TorrentGuard and compared it with the fake content detection mechanism used by The Pirate Bay over a testing period of 14 days. First, we count how many fake files published on The Pirate Bay are identified by TorrentGuard right after their publication. Furthermore, we measure how long The Pirate Bay takes to identify these fake files. The results show that TorrentGuard is able to detect around 50% of the fake content uploaded to The Pirate Bay right after publication. Moreover, Figure 6 represents the CDF of the time difference between the detection instants of TorrentGuard and The Pirate Bay for this content. We observe that TorrentGuard reduces the detection time by 60 minutes in the median. Moreover, the reduction in detection time is higher than 2
hours for 20% of the fake content, and in some cases it reaches several days. Although these results already demonstrate the significant improvement provided by our tool compared to the state-of-the-art solution, the final objective of TorrentGuard is to reduce the number of download events associated with fake content, thus preventing BitTorrent users from encountering malware and scams. If TorrentGuard were widely used, it would have prevented almost 390K fake content downloads during the 14 days of the evaluation period alone, compared to The Pirate Bay. Extending this value to a complete year, we estimate that TorrentGuard would be able to eliminate more than 10 million fake content downloads per year compared to the existing The Pirate Bay solution. However, as stated before, The Pirate Bay solution is specific to this portal and is not applicable to the whole BitTorrent ecosystem. Specifically, in our dataset we identify around 950K fake content downloads occurring after The Pirate Bay identifies the corresponding content as fake. Our proposed solution would also be able to avoid these downloads. Overall, adding both figures, TorrentGuard could avoid more than 1.3 million fake content downloads in a period of two weeks. This means more than 35 million over the course of a year. Finally, it is worth mentioning that even this impressive number is only a lower bound. In our evaluation we only consider download events associated with a few of the most important BitTorrent Trackers^5 and do not consider download events coming from minor BitTorrent Trackers or the BitTorrent-associated DHT systems. In a nutshell, our initial evaluation suggests that TorrentGuard could avoid up to tens of millions of fake downloads per year. More importantly, this could mean (depending on the success of the fake publishers' strategies) up to hundreds of thousands of computer infections and scam episodes.
Hence, our evaluation shows very promising results that should encourage the BitTorrent community to use TorrentGuard.
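Fig. 6 plots, and the text above quotes, two quantities: the median reduction in detection time and the share of files that gain more than 2 hours. Both follow directly from per-file detection timestamps. The minimal sketch below assumes hypothetical pairs of Unix timestamps, (TorrentGuard detection, The Pirate Bay removal), for the files that TorrentGuard detects at publication. The toy values are illustrative only.

```python
from statistics import median
from typing import List, Tuple

def detection_gains_minutes(detections: List[Tuple[float, float]]) -> List[float]:
    """Minutes by which TorrentGuard beats the portal, one value per fake file."""
    return [(portal_ts - guard_ts) / 60 for guard_ts, portal_ts in detections]

def empirical_cdf(values: List[float]) -> List[Tuple[float, float]]:
    """(value, fraction of files with a gain <= value) points, as plotted in a CDF."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

def summarise(gains: List[float], threshold_min: float = 120) -> Tuple[float, float]:
    """Median gain and fraction of files whose gain exceeds threshold_min minutes."""
    above = sum(1 for g in gains if g > threshold_min)
    return median(gains), above / len(gains)

if __name__ == "__main__":
    toy = [(0, 1800), (0, 3600), (0, 3600), (0, 7200), (0, 10800)]
    gains = detection_gains_minutes(toy)  # [30.0, 60.0, 60.0, 120.0, 180.0]
    print(summarise(gains))               # (60.0, 0.2)
```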
(^5) For instance, http://openbittorrent.com/ and http://publicbt.com/, which are the two major Trackers in the BitTorrent ecosystem.
TABLE II
AVERAGE SPEED AND DOWNLOAD TIME OF THE FILE USING BITTORRENT WITH AND WITHOUT TOR
Type of connection       Average time   Average speed
University               6m 46s         6.9 Mbit/s
University (with TOR)    20m 31s        2.27 Mbit/s
Home ADSL                9m 59s         4.68 Mbit/s
Home ADSL (with TOR)     31m 15s        1.49 Mbit/s

by regular BitTorrent users to hide their IP addresses while downloading illegal content, and TOR is an example [4]. In TOR, traffic from a source (a fake publisher in our case) is bounced through several relays until it reaches the destination. Hence, the destination sees packets coming from the IP address of the last (or egress) proxy, and the IP address of the source cannot be identified. Furthermore, the egress proxy changes from one communication to another. Fake publishers could exploit this functionality of TOR to prevent their IP addresses from being detected by TorrentGuard. TorrentGuard would then mark the IP addresses of TOR egress proxies as fake. Hence, if legitimate publishers also used TOR, TorrentGuard would mark their content as fake as well, thus increasing the false positive rate. However, it is important to highlight that these anonymity services were not designed to support heavy-traffic applications such as BitTorrent, so the performance they offer is typically poor. Indeed, TOR developers specifically state that TOR does not perform well with BitTorrent and is not designed to handle that type of traffic [6]. To evaluate the performance degradation that a fake publisher would experience using TOR, we have run a very simple test that compares the performance of a regular BitTorrent download with that of a download done over TOR. For this purpose we have chosen a moderately popular torrent from The Pirate Bay (around 200 seeders and 300 leechers, 350.5 MB) and downloaded it 10 times with and without TOR.
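The averages in Table II can be cross-checked from the file size and the download times. The sketch assumes 1 MB = 10^6 bytes, so 350.5 MB is 2804 Mbit. Under that assumption, the computed speeds agree with the table to within about 0.01 Mbit/s, and the TOR slowdown comes out at roughly 3 on both access links.

```python
from typing import Dict, Tuple

FILE_SIZE_MB = 350.5  # size of the test torrent, assuming 1 MB = 10**6 bytes

# Average download times from Table II as (minutes, seconds).
TIMES: Dict[str, Tuple[int, int]] = {
    "University": (6, 46),
    "University (with TOR)": (20, 31),
    "Home ADSL": (9, 59),
    "Home ADSL (with TOR)": (31, 15),
}

def avg_speed_mbit_s(minutes: int, seconds: int, size_mb: float = FILE_SIZE_MB) -> float:
    """Average throughput in Mbit/s for downloading size_mb megabytes in the given time."""
    return size_mb * 8 / (minutes * 60 + seconds)

def tor_slowdown(plain: str, tor: str) -> float:
    """Ratio of download times with and without TOR for the same access link."""
    m1, s1 = TIMES[plain]
    m2, s2 = TIMES[tor]
    return (m2 * 60 + s2) / (m1 * 60 + s1)

if __name__ == "__main__":
    for label, (m, s) in TIMES.items():
        print(f"{label}: {avg_speed_mbit_s(m, s):.2f} Mbit/s")
    print(f"{tor_slowdown('University', 'University (with TOR)'):.2f}")  # 3.03
    print(f"{tor_slowdown('Home ADSL', 'Home ADSL (with TOR)'):.2f}")    # 3.13
```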
We ran the experiment on our University premises (with a symmetric 100 Mbps connection) and on a home ADSL line (with download and upload bandwidths of 6 Mbps and 320 kbps, respectively). The results are presented in Tab. II. They suggest that running BitTorrent over TOR reduces performance by a factor of around 3, regardless of the speed of the access link. Therefore, the use of anonymisation networks by fake publishers would severely degrade the performance (i.e., the content download time) of the swarms associated with fake content. This would attract fewer victims, since users prefer faster downloads. In addition, we showed in Section IV that the top fake publishers operate from high-speed services. This suggests that performance is a key aspect of their activity, so anonymisation services do not seem to be an appropriate option for them. In summary, the solutions currently available to a fake publisher for hiding its IP address are either inefficient (e.g., a single proxy) or incur a significant performance degradation that seems inadequate for the fake publishers' activity.

2) Using multiple IP addresses: The second countermeasure that a fake publisher could opt for is using a large
number of IP addresses so that it always has undetected addresses available for publishing fake content. Next, we estimate the number of IP addresses that a fake publisher would need to perform its activity in the presence of our tool. TorrentGuard identifies an IP address as fake after detecting 3 fake user accounts in The Pirate Bay. Thus, TorrentGuard marks content as fake starting from the fourth account used by the publisher. We showed in Section IV that the top 5% of fake publishers use 4 user accounts per day on average. Hence, a top fake publisher would need roughly 1 IP address per day to perform its activity while avoiding being blocked by TorrentGuard. In addition, we have seen that these publishers operate from high-speed servers located in data centres. Hence, each of them would need access to around 30 IP addresses associated with high-speed access links per month. In short, this strategy poses a twofold challenge: first, the fake publisher must continuously obtain about 30 new IP addresses per month; second, these IP addresses need to be associated with high-speed access links. This is rather difficult for both regular Internet users and companies.
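The detection rule behind this estimate can be sketched as follows. This is a hypothetical simplification, not TorrentGuard's actual code: an IP address is blacklisted once 3 distinct fake accounts have been linked to it, and any content published afterwards from that IP is labelled fake. The names `IpReputation` and `publish_day`, and the assumption that each fake account is detected right after it publishes, are illustrative only.

```python
"""Hypothetical sketch of an IP-based fake-publisher rule.

An IP address is blacklisted once FAKE_ACCOUNT_THRESHOLD distinct accounts
publishing from it have been identified as fake; content published from a
blacklisted IP is labelled fake at publication time.
"""
from __future__ import annotations

FAKE_ACCOUNT_THRESHOLD = 3


class IpReputation:
    def __init__(self, threshold: int = FAKE_ACCOUNT_THRESHOLD) -> None:
        self.threshold = threshold
        self.fake_accounts: dict[str, set[str]] = {}  # ip -> fake usernames

    def report_fake_account(self, ip: str, username: str) -> None:
        """Record that `username`, publishing from `ip`, was found to be fake."""
        self.fake_accounts.setdefault(ip, set()).add(username)

    def is_blacklisted(self, ip: str) -> bool:
        return len(self.fake_accounts.get(ip, set())) >= self.threshold

    def label(self, ip: str) -> str:
        """Label assigned to a new torrent published from `ip`."""
        return "fake" if self.is_blacklisted(ip) else "unflagged"


def publish_day(rep: IpReputation, ip: str, accounts: int) -> list[str]:
    """Simulate one day in which a fake publisher uses `accounts` accounts
    from a single IP address. Each account publishes one torrent, which is
    labelled at publication time; the account is then assumed to be
    detected as fake before the next account is used."""
    labels = []
    for n in range(1, accounts + 1):
        labels.append(rep.label(ip))
        rep.report_fake_account(ip, f"account-{ip}-{n}")
    return labels


if __name__ == "__main__":
    rep = IpReputation()
    # Day 1 from one IP (documentation range 192.0.2.0/24) with 4 accounts.
    print(publish_day(rep, "192.0.2.10", 4))
    # Day 2 from a fresh IP: its fake-account count starts from zero.
    print(publish_day(rep, "192.0.2.11", 4))
```

In this toy run, each IP address supports three unflagged torrents before the fourth is labelled fake, so a publisher using about 4 accounts per day exhausts a fresh IP address roughly once a day.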
We conclude that the countermeasures against TorrentGuard studied here are either inefficient or unrealistic. Hence, widespread use of TorrentGuard may discourage fake publishers from pursuing their activity.
F. TorrentGuard Future Deployment
In the previous subsections we have demonstrated the enormous potential of our TorrentGuard prototype. However, we believe that there is still room for improvement if BitTorrent portals and Trackers get involved in the next stage of TorrentGuard's development. In this case, TorrentGuard could be extended into a distributed platform in which trackers would identify the IP address of the initial seeder of every piece of content and BitTorrent portals would identify the infohashes of fake torrents. BitTorrent portals would provide the infohashes of fake torrents to trackers, so that the trackers could blacklist the IP addresses associated with fake publishers and eliminate their swarms. Furthermore, trackers would report back to portals the infohash of every new fake torrent published from a blacklisted IP address, so that portals could immediately remove the associated .torrent file. The described system could store this information on a central server that interacts with both portals and trackers and maintains a central repository that users can also access. Another option is a fully distributed system in which trackers and portals exchange the information without the need for any central server. We believe that the involvement of major BitTorrent portals and Trackers in this project could reduce the presence of fake content to negligible levels⁶.
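The information exchange between portals and trackers described above can be sketched as follows. This is a minimal, hypothetical model with a single portal and a single tracker, running in one process; the class and function names are ours, and we assume that the tracker treats the first IP address that announces an infohash as its initial seeder. A real deployment would exchange these messages over the network between many portals and trackers, either directly or through a central server.

```python
"""Hypothetical sketch of the portal/tracker exchange for fake content."""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Portal:
    # Infohashes of the .torrent files currently listed on the portal.
    torrents: set[str] = field(default_factory=set)

    def publish(self, infohash: str) -> None:
        self.torrents.add(infohash)

    def remove(self, infohash: str) -> None:
        self.torrents.discard(infohash)


@dataclass
class Tracker:
    # infohash -> IP address of the initial seeder (assumed: first announcer).
    initial_seeder: dict[str, str] = field(default_factory=dict)
    swarms: set[str] = field(default_factory=set)
    blacklist: set[str] = field(default_factory=set)

    def announce(self, infohash: str, ip: str) -> bool:
        """Handle an announce. Returns True if the torrent was seeded from a
        blacklisted IP, i.e. it must be reported back to the portals."""
        seeder = self.initial_seeder.setdefault(infohash, ip)
        if seeder in self.blacklist:
            self.swarms.discard(infohash)  # never serve this swarm
            return True
        self.swarms.add(infohash)
        return False

    def mark_fake(self, infohash: str) -> None:
        """A portal reported `infohash` as fake: blacklist its initial
        seeder's IP address and eliminate the swarm."""
        seeder = self.initial_seeder.get(infohash)
        if seeder is not None:
            self.blacklist.add(seeder)
        self.swarms.discard(infohash)


def portal_reports_fake(portal: Portal, tracker: Tracker, infohash: str) -> None:
    """Portal -> tracker: a torrent was identified as fake on the portal."""
    portal.remove(infohash)
    tracker.mark_fake(infohash)


def tracker_handles_announce(portal: Portal, tracker: Tracker,
                             infohash: str, ip: str) -> None:
    """Tracker -> portal: report new torrents seeded from blacklisted IPs."""
    if tracker.announce(infohash, ip):
        portal.remove(infohash)


if __name__ == "__main__":
    portal, tracker = Portal(), Tracker()
    publisher = "198.51.100.7"  # documentation address standing in for a fake publisher
    portal.publish("aaaa")
    tracker_handles_announce(portal, tracker, "aaaa", publisher)
    portal_reports_fake(portal, tracker, "aaaa")  # blacklists the publisher's IP
    portal.publish("bbbb")  # same publisher, new torrent
    tracker_handles_announce(portal, tracker, "bbbb", publisher)
    print(portal.torrents, tracker.swarms, tracker.blacklist)
```

In this run, once the first torrent is flagged by the portal, the second torrent from the same IP address is removed from the portal as soon as its seeder announces it and never obtains a swarm, while torrents seeded from other addresses are unaffected.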
⁶ The authors of this paper have started contacting different Trackers and Portals to gauge their interest in participating in the deployment of the described project.
A. BitTorrent Measurement
Several authors have used real data collection to understand different aspects of BitTorrent [13][15][16]. Different methods for measuring the BitTorrent ecosystem are described in [19]. However, only a few works have looked at content publishers [8][31]. The most extensive characterisation of the BitTorrent ecosystem is presented in [31]. This work includes a discussion of BitTorrent publishers, which are identified by their usernames. We demonstrate in this paper that fake publishers cannot be identified by their usernames; instead, they are identified by their IP addresses. The presence of fake publishers was first reported in our previous work [12]. Building on that initial observation, in this paper we perform a thorough analysis of fake publishers and their published content, revealing their targets, incentives and strategies, and we propose a novel solution to prevent users from downloading fake content.
B. Fake content
Several studies describe the threats present on the Internet. In [33], the authors state that 40% of all computers are infected by botnets and can be controlled by attackers. Another study [24] reports a high prevalence of malware and spyware on the Internet. A few previous works have studied malware propagation through P2P systems [18][28][32]. Specifically, Kalafut et al. [18] analyse LimeWire, whereas Shin et al. [28] analyse KaZaA. These authors look at the problem from the content perspective instead of the fake publisher perspective used in this paper. This prevents them from discovering more sophisticated strategies such as those reported in our study, in which the content is not the malware itself but includes a link to the malware. A similar content-based approach is applied by the FakeDetector program [14], which looks for fake hashes in DirectConnect hubs (central servers to which downloaders connect) and reports the fake content it finds to users and hub administrators. Finally, the authors of [18] propose filtering content of a specific size, since most of the malware content they observed has exactly that size. Unfortunately, this solution is not valid for BitTorrent. Instead, we propose a more sophisticated solution (TorrentGuard) that provides early detection of fake content.
IX. CONCLUSIONS

This paper presents the first comprehensive study of fake content in the BitTorrent ecosystem. For this purpose, we use real data collected during a large-scale measurement study. The obtained results show that 35% of all the content is fake. Moreover, just a few tens of users are responsible for most of the published fake content. Furthermore, more than 99% of the fake torrents are associated with either malware or scam websites. This represents a serious threat to the BitTorrent ecosystem that must be eliminated or at least mitigated. To this end, we have implemented TorrentGuard, a novel tool for the early detection of fake content. Based on our initial evaluation, widespread use of this tool may prevent the download of millions of fake files every year, thus helping to reduce the number of computer infections and scam episodes faced by BitTorrent users.
REFERENCES

[1] http://en.wikipedia.org/wiki/Magnet_URI_scheme.
[2] http://thepiratebay.se/blog/206/.
[3] http://www.isohunt.com.
[4] https://www.torproject.org/.
[5] http://mininova.org/.
[6] https://blog.torproject.org/blog/bittorrent-over-tor-isnt-good-idea.
[7] Alexa. http://www.alexa.com/topsites/.
[8] S. Le Blond, A. Legout, F. Lefessant, W. Dabbous, and M. Ali Kaafar. Spying the world from your laptop. LEET'10, 2010.
[9] CAPTCHA. http://www.captcha.net/.
[10] D. R. Choffnes and F. E. Bustamante. Taming the torrent: a practical approach to reducing cross-ISP traffic in peer-to-peer systems. ACM SIGCOMM 2008.
[11] S. A. Crosby and D. S. Wallach. An analysis of BitTorrent's two Kademlia-based DHTs. Technical Report TR-07-04, Department of Computer Science, Rice University, June 2007.
[12] R. Cuevas, M. Kryczka, A. Cuevas, S. Kaune, C. Guerrero, and R. Rejaie. Is content publishing in BitTorrent altruistic or profit-driven? ACM CoNEXT 2010, Philadelphia, USA.
[13] R. Cuevas, N. Laoutaris, X. Yang, G. Siganos, and P. Rodriguez. Deep diving into BitTorrent locality. IEEE INFOCOM 2011, Shanghai, China.
[14] FakeDetector. http://sourceforge.net/projects/fakedetector/.
[15] L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, and X. Zhang. Measurements, analysis, and modeling of BitTorrent-like systems. In ACM IMC’.
[16] T. Isdal, M. Piatek, A. Krishnamurthy, and T. Anderson. Leveraging BitTorrent for end host measurements. In PAM, 2007.
[17] R. Izhak-Ratzin, H. Park, and M. van der Schaar. Reinforcement learning in BitTorrent systems. In Proc. of INFOCOM 2011.
[18] A. Kalafut, A. Acharya, and M. Gupta. A study of malware in peer-to-peer networks. 6th ACM SIGCOMM Conference on Internet Measurement, IMC 2006.
[19] M. Kryczka, R. Cuevas, A. Cuevas, C. Guerrero, and A. Azcorra. Measuring BitTorrent ecosystem: Techniques, tips and tricks. IEEE Communications Magazine (accepted for publication).
[20] N. Laoutaris, D. Carra, and P. Michiardi. Uplink allocation beyond choke/unchoke or how to divide and conquer best. In Proc. of CoNEXT 2008.
[21] N. Liogkas, R. Nelson, E. Kohler, and L. Zhang. Exploiting BitTorrent for fun (but not profit). In IPTPS 2006.
[22] T. Locher, P. Moor, S. Schmid, and R. Wattenhofer. Free riding in BitTorrent is cheap. In HotNets 2006.
[23] MaxMind. http://www.maxmind.com/.
[24] A. Moshchuk, T. Bragin, S. Gribble, and H. Levy. A crawler-based study of spyware on the web. Internet Society Network and Distributed System Security Symposium (NDSS), 2006.
[25] M. Piatek, T. Isdal, T. Anderson, A. Krishnamurthy, and A. Venkataramani. Do incentives build robustness in BitTorrent? In 4th USENIX Symposium NSDI 2007.
[26] Sandvine. Fall 2010 Global Internet Phenomena Report. Available at: http://www.sandvine.com/news/global_broadband_trends.asp.
[27] A. Sherman, J. Nieh, and C. Stein. FairTorrent: Bringing fairness to peer-to-peer systems. In Proc. of ACM CoNEXT 2009.
[28] S. Shin, J. Jung, and H. Balakrishnan. Malware prevalence in the KaZaA file-sharing network. 6th ACM SIGCOMM Conference on Internet Measurement, IMC 2006.
[29] M. Sirivianos, J. H. Park, R. Chen, and X. Yang. Free-riding in BitTorrent networks with the large view exploit. In Intl. Workshop on Peer-to-Peer Systems (IPTPS) 2007.
[30] H. Xie, Y. R. Yang, A. Krishnamurthy, Y. Liu, and A. Silberschatz. P4P: Provider portal for applications. ACM SIGCOMM 2008.
[31] C. Zhang, P. Dhungel, D. Wu, and K. W. Ross. Unraveling the BitTorrent ecosystem. IEEE Transactions on Parallel and Distributed Systems.
[32] L. Zhou, L. Zhang, F. McSherry, N. Immorlica, M. Costa, and S. Chien. A first look at peer-to-peer worms: Threats and defenses. In Proceedings of IPTPS, Feb. 2005.
[33] Z. Zhu, G. Lu, Y. Chen, Z. J. Fu, P. Roberts, and K. Han. Computer software and applications. COMPSAC ’.