Apache Flume Configuration for Twitter Data Collection

This document shows an Apache Flume agent configuration for collecting data from Twitter using three components: a source, a channel, and a sink. The Twitter source is set up with filter keywords and OAuth authentication details, the memory channel buffers events in transit, and the HDFS sink writes them to storage.

Typology: Study Guides, Projects, Research

2016/2017

Uploaded on 06/04/2017

gulabchand-tejwani

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# Flume components
agent.sources = twitter-source
agent.sinks = hdfs-sink
agent.channels = memory-channel
# Channel
agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity = 1000
agent.channels.memory-channel.transactionCapacity = 100
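# Sizing note (an editorial assumption, not part of the original file):
# transactionCapacity limits how many events one put/take transaction may hold,
# so it should be at least as large as the sink's hdfs.batchSize (100 by default).
# For a bursty keyword stream, a larger buffer might look like the hypothetical
# value below; it is left commented out so the original behavior is unchanged.
# agent.channels.memory-channel.capacity = 10000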
# Source
agent.sources.twitter-source.type = com.hirw.twittersource.TwitterSource
agent.sources.twitter-source.consumerKey = zyfFBK13ScCg8YjUhea3g
agent.sources.twitter-source.consumerSecret = fSgkxQuoGIZz7lfqBu167KZy3dxl3OdXMhdTvk91Q
agent.sources.twitter-source.accessToken = 2288102790-8BwzXU0mTc9nKZEuyxRiel9UulOlmTWXFfUWXia
agent.sources.twitter-source.accessTokenSecret = ptLkj2IgH2RfXTLxdjyRUzTksIRtys2CTM3YPmIQfcv5o
agent.sources.twitter-source.keywords = food,cuisine,restaurant,fastfood,sushi,service,quality,oil,garlic,healthy,organic,fresh,ingredients,waiter,server,host,reservations,taste,chef,burger,cooked,wine,beer,ambience,taco,buritto,salsa,fries,chicken,meat,fish,italian,pasta,gourmet,sauce
agent.sources.twitter-source.channels = memory-channel
# Sink
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = flume/tweets/%Y/%m/%d/%H
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.sinks.hdfs-sink.hdfs.rollCount = 500
agent.sinks.hdfs-sink.channel = memory-channel
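
The sink section sets only hdfs.rollCount. In the stock Flume HDFS sink, hdfs.rollInterval (30 seconds) and hdfs.rollSize (1024 bytes) keep their defaults, so a file may roll long before 500 events accumulate. The time escapes in hdfs.path (%Y/%m/%d/%H) also need a timestamp header on each event, and whether com.hirw.twittersource.TwitterSource sets one is not shown here. A possible addition, offered as a sketch under those assumptions rather than as part of the original file:

```properties
# Assumption: roll files on event count only, by disabling the
# time-based and size-based triggers (0 turns each one off).
agent.sinks.hdfs-sink.hdfs.rollInterval = 0
agent.sinks.hdfs-sink.hdfs.rollSize = 0

# Assumption: the source may not add a "timestamp" header, so let the sink
# use the agent's local clock when expanding %Y/%m/%d/%H in hdfs.path.
agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
```

With these lines, each output file would close after 500 events, and the directory would reflect the time the sink wrote the event rather than the time the tweet was created.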
