JMeter Load Testing: Send and Receive Email


Email notification, sending and receiving, is an important part of any company or organisation, so if you have your own email server and want to know how it performs, you can use JMeter. The beauty of JMeter is that it puts the power in your hands: you can measure the performance of sending email as well as of receiving it. We can define the number of users according to the requirement and test the performance of the email server. To configure email sending we need the SMTP Sampler. First of all we create a Thread Group.

[Screenshot: Thread Group configuration]

After that we add the SMTP Sampler.

[Screenshot: SMTP Sampler configuration]

Here we define the server name, port, addresses, username and password, and the message, and we can also attach a file. In this example we use our own email address and a simple message.
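For reference, these are the same details a plain JavaMail client needs. The Scala sketch below shows roughly what one sampled request does; the host, port, addresses and credentials are placeholder assumptions, not values from this post:

import java.util.Properties
import javax.mail.{Message, Session, Transport}
import javax.mail.internet.{InternetAddress, MimeMessage}

object SendTestMail extends App {
  // Placeholder SMTP settings -- use your own server, port and credentials
  val props = new Properties()
  props.put("mail.smtp.host", "smtp.example.com")
  props.put("mail.smtp.port", "587")
  props.put("mail.smtp.auth", "true")
  props.put("mail.smtp.starttls.enable", "true")

  val session = Session.getInstance(props, new javax.mail.Authenticator {
    override def getPasswordAuthentication =
      new javax.mail.PasswordAuthentication("user@example.com", "password")
  })

  // Build the same kind of message the SMTP Sampler sends
  val message = new MimeMessage(session)
  message.setFrom(new InternetAddress("user@example.com"))
  message.setRecipients(Message.RecipientType.TO, "receiver@example.com")
  message.setSubject("JMeter SMTP test")
  message.setText("Hello from the load test")

  Transport.send(message)
}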

Now we add a listener so that we can see the result in the View Results Tree.

[Screenshot: SMTP Sampler result in View Results Tree]

Now we read mail from JMeter.

For that we need the Mail Reader Sampler, where we define the protocol: POP3 or IMAP (or their secure variants POP3S and IMAPS).

[Screenshot: Mail Reader Sampler configuration]

We define the server host, server port, username and password, and any security settings we need. Now we run the script and see the result.

[Screenshot: Mail Reader Sampler result]

[Screenshot: complete test results]
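For reference again, reading mail programmatically needs the same connection details as the Mail Reader Sampler. A small Scala sketch over IMAPS (all connection values below are placeholder assumptions) might look like this:

import java.util.Properties
import javax.mail.{Folder, Session}

object ReadTestMail extends App {
  // Placeholder IMAPS settings -- use your own mail server and credentials
  val props = new Properties()
  props.put("mail.store.protocol", "imaps")

  val session = Session.getInstance(props)
  val store = session.getStore("imaps")
  store.connect("imap.example.com", "user@example.com", "password")

  // Open the inbox read-only and count the messages, much like the Mail Reader Sampler does
  val inbox = store.getFolder("INBOX")
  inbox.open(Folder.READ_ONLY)
  println(s"Messages in INBOX: ${inbox.getMessageCount}")

  inbox.close(false)
  store.close()
}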


Feel free to ask me any questions regarding JMeter.

Thanks.

Posted in JMeter, Scala | 1 Comment

Smattering of HDFS


INTRODUCTION TO HDFS:
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers. HDFS, the Hadoop Distributed File System, has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant, as it provides high-performance access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data and supporting big data analytics applications. It is the primary storage system used by Hadoop applications.
HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets.
HDFS uses a master/slave architecture, where the master is a single NameNode that manages the file system metadata and the slaves are one or more DataNodes that store the actual data.

What are NameNodes and DataNodes?

The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself. The NameNode is a Single Point of Failure for the HDFS Cluster. When the NameNode goes down, the file system goes offline.

The DataNode is responsible for storing the files in HDFS. It manages the file blocks within the node. It sends information to the NameNode about the files and blocks stored in that node and responds to the NameNode for all filesystem operations. A functional filesystem has more than one DataNode, with data replicated across them.

Within HDFS, a given name node manages file system namespace operations like opening, closing, and renaming files and directories. A name node also maps data blocks to data nodes, which handle read and write requests from HDFS clients. Data nodes also create, delete, and replicate data blocks according to instructions from the governing name node.
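To make this division of labour concrete, here is a small Scala sketch against the Hadoop FileSystem API (the NameNode URI and file path below are assumptions): the file status and block-to-DataNode mapping are answered by the NameNode, while the block contents themselves live on the DataNodes.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object BlockLocations extends App {
  val conf = new Configuration()
  conf.set("fs.defaultFS", "hdfs://localhost:9000")   // assumed NameNode address
  val fs = FileSystem.get(conf)

  val path = new Path("/tmp/sample.txt")              // hypothetical file in HDFS
  val status = fs.getFileStatus(path)                 // metadata served by the NameNode

  // For each block, the NameNode reports which DataNodes hold a replica
  fs.getFileBlockLocations(status, 0, status.getLen).foreach { block =>
    println(s"offset=${block.getOffset} hosts=${block.getHosts.mkString(", ")}")
  }

  fs.close()
}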

Continue reading

Posted in HDFS | 2 Comments

Getting Started with Apache Cassandra


Why Apache Cassandra?

Apache Cassandra is a free, open source, distributed data storage system that differs sharply from relational database management systems.
Cassandra has become so popular because of its outstanding technical features. It is durable, seamlessly scalable, and tuneably consistent.
It performs blazingly fast writes, can store hundreds of terabytes of data, and is decentralized and symmetrical so there’s no single point of failure.
It is highly available and offers a schema-free data model.

Installation:

Cassandra is available for download from the web here. Just click the link on the home page to download the latest release version, and unzip the downloaded Cassandra archive to a local directory.
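Once the server is started (covered in the next section), one quick way to verify the node from code is to connect with the DataStax Java driver. The Scala sketch below is only an illustration and assumes the 3.x driver API with a single node listening on localhost's default port:

import com.datastax.driver.core.Cluster

object CassandraSmokeTest extends App {
  // Assumes a single Cassandra node listening on 127.0.0.1:9042
  val cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .build()
  val session = cluster.connect()

  // system.local always exists, so this works without creating any keyspace
  val row = session.execute("SELECT release_version FROM system.local").one()
  println(s"Connected to Cassandra ${row.getString("release_version")}")

  session.close()
  cluster.close()
}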

Starting the Server:

Continue reading

Posted in Cassandra, Scala | 4 Comments

Message Broker in Lagom using Kafka


What is Lagom?

The Lagom framework helps simplify the development of microservices by providing an integrated development environment. This lets developers focus on solving business problems instead of wiring services together.

Lagom exposes two APIs, Java and Scala, and provides a framework and development environment as a set of libraries and build tool plugins. The supported build tools with Lagom are Maven and sbt. You can use Maven with Java or sbt with Java or Scala.

Message Broker Support in Lagom

If there is synchronous communication between microservices, both the sender and the receiver have to be running at the same time. This may lead to consistency problems if messages get missed, and can result in a brittle system, where a failure in one component can lead to failure of the complete system.

As a solution to this, one can use an infrastructure component to enable services to communicate asynchronously. This component is referred to as a message broker.

To support this, Lagom provides a Message Broker API which makes it very simple for the services to share data asynchronously.

Currently, Lagom supports implementation of the Message Broker API that uses Kafka.
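As a rough illustration (not taken from this post) of what the Message Broker API looks like in the Scala DSL: a producing service declares a Kafka-backed topic in its service descriptor, and a consuming service subscribes to it. The service name, topic id and GreetingMessage type below are assumptions:

import akka.Done
import akka.stream.scaladsl.Flow
import com.lightbend.lagom.scaladsl.api.broker.Topic
import com.lightbend.lagom.scaladsl.api.{Descriptor, Service}
import play.api.libs.json.{Format, Json}

case class GreetingMessage(message: String)
object GreetingMessage {
  implicit val format: Format[GreetingMessage] = Json.format[GreetingMessage]
}

trait HelloService extends Service {
  // The producing service publishes greetings to this topic
  def greetingsTopic(): Topic[GreetingMessage]

  override final def descriptor: Descriptor = {
    import Service._
    named("hello")
      .withTopics(
        topic("greetings", greetingsTopic())
      )
  }
}

// In another service, subscribe to the topic and process each message at least once
class GreetingsConsumer(helloService: HelloService) {
  helloService
    .greetingsTopic()
    .subscribe
    .atLeastOnce(
      Flow[GreetingMessage].map { msg =>
        println(s"Received: ${msg.message}")
        Done
      }
    )
}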

Continue reading

Posted in Scala | 1 Comment

Tutorial 3: Monitor CPU Utilization with Dynatrace


This is the last blog of this series; in it we will see how to monitor CPU utilization with Dynatrace.

Why do we always need memory analysis?

We need memory analysis to optimize garbage collection (GC) in such a way that its impact on application response time or CPU usage is minimized. If garbage collection has a negative impact on response time, our goal must be to optimize the configuration.

In Dynatrace, we have full analysis of memory utilization. A healthy system performs better. Dynatrace uses defined parameters to monitor health. These parameters use metrics such as CPU, memory, network, and disk.

CPU Profiler

Here you can see these values in the Transaction Flow on the respective agent node. Use this to identify the impact of an unhealthy host on your business transactions.

[Screenshot: CPU utilization in the Transaction Flow]

We can easily go through the execution of each thread, as shown in the figure below.

[Screenshot: thread execution details]

You can use the filter list in the top right corner to filter the content to the Median, slowest 10% or fastest 90% of transactions in the session. The chart below gives a quick impression of the typical response times. Dynatrace captures the CPU time used by the threads executing this transaction for the selected time frame. In the same way we can also go through failure rate and throughput.

[Screenshot: response time chart]

Dynatrace includes different customizable reports. It can generate reports of dashboards in various formats in the Dynatrace Client. We can schedule reports to execute periodically and publish them via email, or store the reports in the file system, as shown in the figure below.

[Screenshot: report scheduling]

In the end we can confidently say that this tool is very helpful for our performance testing; we can easily solve our problems in minutes.

References:

https://www.dynatrace.com/

Posted in Scala, Performance Testing | 1 Comment

The Dominant APIs of Spark: Datasets, DataFrames and RDDs


While working with Spark we often come across three APIs: DataFrames, Datasets and RDDs. In this blog I will discuss the three in terms of use case, performance and optimization. It is essential to keep in mind that seamless conversion is available between DataFrames, Datasets and RDDs (a short sketch of these conversions follows the timeline below); under the hood, the RDD forms the foundation of both DataFrames and Datasets.

The order in which the three were introduced is shown below:

RDD (Spark 1.0) —> DataFrame (Spark 1.3) —> Dataset (Spark 1.6)
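As a quick, hedged illustration of that seamless conversion, the following Scala sketch (assuming a local SparkSession and a toy Person case class, neither of which comes from this post) moves the same data between the three APIs:

import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int)

object ApiConversions extends App {
  val spark = SparkSession.builder()
    .appName("api-conversions")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // RDD of case-class instances
  val rdd = spark.sparkContext.parallelize(Seq(Person("a", 1), Person("b", 2)))

  // RDD -> DataFrame (untyped, Row-based)
  val df = rdd.toDF()

  // DataFrame -> Dataset (typed view over the same data)
  val ds = df.as[Person]

  // Dataset -> back to an RDD
  val backToRdd = ds.rdd

  spark.stop()
}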

Let us begin with the Resilient Distributed Dataset (RDD).

Continue reading

Posted in Spark | 1 Comment

Neo4j APOC: A Blessing For Developers


Hello Folks,

As we know, Neo4j pulls developers out of the trouble, and the black-and-white screens, of the old databases. It not only gives freedom from the old databases, it also provides excellent support through its predefined procedures.

As we know, in a relational database, procedures provide the advantages of better performance, scalability, productivity, ease of use and security, and Neo4j also provides an amazing tool that delivers the same benefits.

Yes, I am talking about APOC, and using APOC with Neo4j is a blessing for developers. It provides many predefined procedures and user-defined functions so that we can easily use them and improve our productivity in a very simple manner.

APOC stands for ‘Awesome Procedures On Cypher’. APOC is a library of procedures for various areas, and it was introduced with Neo4j 3.0.

There are many areas where we use APOC; the list of areas includes:

  • Graph algorithms
  • Metadata
  • Manual indexes and relationship indexes
  • Full text search
  • Integration with other databases like MongoDB, Elasticsearch, Cassandra and relational databases
  • Path expansion
  • Import and export
  • Date and time functions
  • Loading of XML and JSON from APIs and files
  • String and text functions
  • Concurrent and batched Cypher execution
  • Spatial functions and locks
  • Collection and map utilities

When you want to use APOC, there are two ways to get it and use it with Neo4j:

First Way :

  • Download binary jar from the latest release [Hit Here]
  • Put that into your $Neo4j_Home/plugins/ folder
  • Restart your Neo4j Server.

Second Way :

  • Clone neo4j-apoc-procedures from Hit Here.
  • Go to the folder with ‘cd neo4j-apoc-procedures’.
  • Now create a jar with the help of command ‘mvn clean compile install’.
  • Now copy your jar file from target to $Neo4j_Home/plugins/ folder.[cp target/apoc-1.0.0-SNAPSHOT.jar $Neo4j_Home/plugins/]
  • Restart your Neo4j Server.

Now you are ready to use APOC with Neo4j. Today we will discuss data migration from other databases to Neo4j.

We use many databases for storing data, but when we have a large amount of data and many tables, it becomes hard to write queries and execute them on the database. We have to be extra cautious while performing such tasks, and we get bored of seeing the same screen without any fun 🙂 . When we work on another database and decide to use Neo4j, we face the issue of migrating the data into Neo4j. We are going to discuss migrating data from some famous and widely used databases.

Oracle :

Let us start with Oracle. We download the JDBC .jar file (Download), keep it in the $Neo4j_Home/plugins/ folder and restart Neo4j. We can provide the URL in $Neo4j_Home/conf/neo4j.conf as:

apoc.jdbc.oracle_url.url=jdbc:oracle:thin:user/password@127.0.0.1:1521/XE
  • After restarting the Neo4j server we are set to migrate the data from Oracle to Neo4j. We fetch the data from Oracle, where we have a table named employee_details. Now we load the data with APOC:
CALL apoc.load.jdbc('oracle_url','employee_details') YIELD row
RETURN count(*);

[Screenshot: count result]

  • Let’s create indexes and constraints for the data.
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
 {EMPINFO:['name', 'age','salary']},
 {EMPINFO:['id'],ADDRESS:['address']});

[Screenshot: schema assertion result]

  • Now we will load the data and perform MERGE and CREATE operations so that we can create the nodes and the relationships between them.
/**
* Here we load the data into Neo4j and create nodes with the help of the schema
* we defined earlier.
*/
CALL apoc.load.jdbc('oracle_url','employee_details') yield row
MERGE (g:ADDRESS {name:row.ADDRESS})
CREATE (t:EMPINFO {id:toString(row.ID), name:row.NAME, age:toString(row.AGE), salary:toString(row.SALARY)})
CREATE (t)-[:LIVE]->(g);

[Screenshot: nodes and relationships created]

  • We can see the relationship graph, and it will look something like this:
/**
* For Displaying Performed Relation
*/

MATCH p=()-[r:LIVE]->() RETURN p LIMIT 25;

[Screenshot: LIVE relationship graph]

MySQL :

To migrate data from MySQL, as before, we have to download the JDBC .jar file (Download), keep it in $Neo4j_Home/plugins and update $Neo4j_Home/conf/neo4j.conf as:

apoc.jdbc.mysql_url.url=jdbc:mysql://localhost:3306/test?user=user&password=pass

Restart the Neo4j server and we are set to migrate the data from MySQL to Neo4j.

  • We hit MySQL, start fetching data and perform a count operation.
CALL apoc.load.jdbc('mysql_url','employee_data') yield row
RETURN count(*);

[Screenshot: count result]

PostgreSQL :

When we use PostgreSQL, we have to download the JDBC .jar file (Download), keep it in $Neo4j_Home/plugins and restart Neo4j. After restarting the Neo4j server we are set to migrate the data from PostgreSQL to Neo4j.

  • Now we load the driver with APOC.
CALL apoc.load.driver('org.postgresql.Driver');
  • Now we make the call to fetch the data from PostgreSQL, where we have a table named employee_details.
with 'jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres' as url
CALL apoc.load.jdbc(url,'employee_details') YIELD row
RETURN count(*);
  • If we don’t want to use these steps, we can instead provide the URL in $Neo4j_Home/conf/neo4j.conf and restart the server:
apoc.jdbc.postgresql_url.url=jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres

We can now fetch the data directly; we don’t need to load the driver either.

CALL apoc.load.jdbc('postgresql_url','employee_details') YIELD row
RETURN count(*);
  • Now we create constraints and nodes from the data.
/**
* Here we define the schema: in the first map we list the properties to index,
* and in the second map we list the properties that must be unique (constraints).
*/

CALL apoc.schema.assert( {Detail:['name','age','address','salary']},
{Detail:['id']});

/**
* Here we load the data into Neo4j and create nodes with the help of the schema
* we defined earlier.
*/

CALL apoc.load.jdbc('jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres','employee_details') yield row
CREATE (t:Detail {id:toString(row.id), name:row.name,
age:toString(row.age), address:row.address, salary:toString(row.salary)})
return t;

[Screenshot: created nodes]

[Screenshot: node details]

Cassandra :

Now we migrate data from Cassandra to Neo4j. First we import some data into Cassandra, in case we don’t have any data there; we can also use this data set for testing.

  • We have to run the following commands to set up the initial data in Cassandra:
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/playlist.cql
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/artists.csv
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/songs.csv
$CASSANDRA_HOME/bin/cassandra
$CASSANDRA_HOME/bin/cqlsh -f playlist.cql
  • We have now set up our Cassandra database with the data. We have to download the JDBC .jar file (Download) and keep it in $Neo4j_Home/plugins. We can provide the URL in $Neo4j_Home/conf/neo4j.conf as:
apoc.jdbc.cassandra_songs.url=jdbc:cassandra://localhost:9042/playlist

Restart the Neo4j server and we are set for migrating the data from the Cassandra to Neo4j.

  • We hit Cassandra, start fetching data and perform a count operation.
CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN count(*);

[Screenshot: count result]

  • Let’s create indexes and constraints for the data.
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
  {Track:['title','length']},
  {Artist:['name'],Track:['id'],Genre:['name']});

[Screenshot: schema assertion result]

  • Now we will load the data and perform MERGE and CREATE operations so that we can create the nodes and the relationships between them.
/**
* Here we load the data into Neo4j and create nodes with the help of the schema
* we defined earlier.
*/
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
MERGE (a:Artist {name:row.artist})
MERGE (g:Genre {name:row.genre})
CREATE (t:Track {id:toString(row.track_id), title:row.track,
length:row.track_length_in_seconds})
CREATE (a)-[:PERFORMED]->(t)
CREATE (t)-[:GENRE]->(g);

[Screenshot: nodes and relationships created]

  • We can see the relationship graph, and it will look something like this:
/**
* For Displaying Performed Relation
*/

MATCH p=()-[r:PERFORMED]->() RETURN p LIMIT 25;

[Screenshot: PERFORMED relationship graph]

/**
* For Displaying GENRE Relation
*/

MATCH p=()-[r:GENRE]->() RETURN p LIMIT 100;

[Screenshot: GENRE relationship graph]

After importing the data into Neo4j, we have to think about keeping the data in sync. We can use a scheduled process, which can be time-based and automatically sync data between the databases. We can also use event-based integration, where we define the events at which we want to update the database.
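As an illustration of the time-based option (not part of the original post), one very simple approach is to re-run the APOC load on a schedule from application code. The sketch below uses the Neo4j Java driver (1.x) from Scala; the Bolt URL, credentials, JDBC alias, labels and property names are all assumptions:

import java.util.concurrent.{Executors, TimeUnit}
import org.neo4j.driver.v1.{AuthTokens, GraphDatabase}

object ScheduledSync extends App {
  // Bolt URL and credentials are assumptions -- adjust for your setup
  val driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "password"))
  val scheduler = Executors.newSingleThreadScheduledExecutor()

  val syncTask = new Runnable {
    override def run(): Unit = {
      val session = driver.session()
      try {
        // Re-run the APOC load; MERGE keeps already-imported rows from being duplicated
        session.run(
          """CALL apoc.load.jdbc('mysql_url','employee_data') YIELD row
            |MERGE (e:EMPINFO {id: toString(row.id)})
            |SET e.name = row.name""".stripMargin)
      } finally {
        session.close()
      }
    }
  }

  // Run the sync every 30 minutes
  scheduler.scheduleAtFixedRate(syncTask, 0, 30, TimeUnit.MINUTES)
}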

Note: As discussed above, I want to point out again that if you do not add the connection URL to $Neo4j_Home/conf/neo4j.conf, you have to load the driver in Neo4j and pass the full JDBC URL in the query; otherwise you only have to provide the configured alias in the query.

This is a basic example of using APOC, and it is also a first step when you start using Neo4j and want to replace your old databases without losing your data. After migrating the data you are ready to use Neo4j with the data that existed in your old databases.

If you have any questions, you can contact me here or on Twitter: @anuragknoldus


Posted in Scala | 4 Comments

Working with Hadoop Filesystem Api


Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in a number of ways. Let us start understanding how this can be done using the FileSystem API: first to create and write to a file in HDFS, followed by an application that reads a file from HDFS and writes it back to the local file system.

To start with it :

1) We first need to include the dependencies (for an sbt project):


libraryDependencies ++= Seq(
	"org.apache.hadoop" % "hadoop-common" % "2.8.0",
	"org.apache.hadoop" % "hadoop-hdfs" % "2.8.0"
)

2) The next step is to configure the filesystem:
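The remaining steps follow after the break; as a rough sketch (the NameNode URI and the paths below are assumptions, not the configuration used in this post), obtaining a FileSystem handle and writing and reading a file boils down to something like this:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsClient extends App {
  // Point the configuration at the NameNode; the URI below is an assumption
  val conf = new Configuration()
  conf.set("fs.defaultFS", "hdfs://localhost:9000")

  val fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf)

  // Write a small file to HDFS
  val out = fs.create(new Path("/tmp/hello.txt"))
  out.writeUTF("Hello HDFS")
  out.close()

  // Read it back
  val in = fs.open(new Path("/tmp/hello.txt"))
  println(in.readUTF())
  in.close()

  fs.close()
}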

Continue reading

Posted in Java | Leave a comment

Understanding External Tables in Hive


Usually when you create tables in Hive using raw data in HDFS, Hive moves the data to a different location: "/user/hive/warehouse". If you create a simple (managed) table, its data will be located inside the warehouse directory. The following Hive command creates a table with its data location at "/user/hive/warehouse/empl":

hive> CREATE TABLE EMPL(ID int,NAME string)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
OK
Time taken: 0.933 seconds

Now load a CSV file into this table:

hive> load data inpath 'hdfs://localhost:54310/dat.csv' into table empl;
Loading data to table default.empl
Table default.empl stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]
OK

[Screenshot: HDFS listing after loading the data]

Now try to drop this table:

hive> drop table empl;
OK
Time taken: 0.397 seconds

[Screenshot: warehouse directory after dropping the table]
When you drop the table, the raw data is lost, as the directory corresponding to the table in the warehouse is deleted.
You may also not want to delete the raw data, as someone else might be using it.

So here comes the concept of the external table.

For external tables, Hive does not move the data into its warehouse directory. If the external table is dropped, the table metadata is deleted but not the data. Have a look at the commands below:

hive> CREATE EXTERNAL TABLE EMPL(ID int,NAME string)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/opt/carbonStore' ;
OK
Time taken: 0.13 seconds

To create an external table, simply point to the location of the data while creating the table. This ensures that the data is not moved into a location inside the warehouse directory.

Now load data into this table:

hive> load data inpath 'hdfs://localhost:54310/opt/carbonStore/dat.csv' into table empl;
Loading data to table default.empl
Table default.empl stats: [numFiles=0, totalSize=0]
OK
Time taken: 0.379 seconds

Now drop this table:

hive> drop table empl;
OK
Time taken: 0.301 seconds

Even though the table is deleted, the raw data is not.

[Screenshot: raw data still present at the external location]

I hope this blog helps you understand the external table concept.


Posted in Scala | Leave a comment

Tutorial 2: How To Diagnose using Dynatrace (APM) Tool


In my last blog, we went through an introduction to the Dynatrace digital performance tool. In this subsequent blog, we will learn how to diagnose problems using this tool.

Once you open the Dynatrace client, you land on a dashboard page where you can easily view the recent activity of all the requests to the server (picture given in the last blog).

We can easily connect to different hosts by deploying Dynatrace, and then monitor, analyze and optimize the performance of running applications.

Time Frame

We can customize the default time period by selecting any of the time frames provided. Once selected, you will be able to see all the requests for all the hosts in a graphical representation for that time period.

[Screenshot: time frame selection]

Smartscape Topology

This tool also provides a Smartscape Topology tab in which we can easily go through the entire environment of the selected host. Smartscape provides an effective and efficient overview of all the topological dependencies along the vertical axis, and it is easily understandable. The diagram also reflects what the Dynatrace API exposes, allowing API consumers to query application, service, and host attributes, including all incoming and outgoing call relationships.

[Screenshot: Smartscape Topology]

Through this we can easily and quickly visualize a service map and automatically detect the complete web environment on which the Dynatrace client is installed.

Click on a process name in the Process tile and click Open in client to drill down for analysis in the Dynatrace Client. Here we can easily perform memory analysis, examine the transaction flow of passing transactions, and analyze CPU utilization for efficient resolution of process issues.

Errors and Problems

Dynatrace provides a Problems section where we are able to recognize the problems in our system, and Dynatrace also highlights the alerts in its alerting system.

[Screenshot: Problems view]

Technology and Databases Monitoring

Through Dynatrace we can monitor the databases and technologies currently running on our system for health checks. The database agent connects to the instances and retrieves data periodically.

[Screenshot: database monitoring]

[Screenshot: database details]

Here we can easily view all the technologies used by the agent, and we can diagnose each technology by clicking on it.

[Screenshot: technology overview]

There is so much functionality in this tool that it can look a little hard to understand for a first-time user, so do not get confused. In my next tutorial, we will look at how to make it more usable and readable, along with information related to CPU monitoring.

References:

https://www.dynatrace.com/

Next: Tutorial 3

Posted in Performance Testing, Scala, testing | Leave a comment