Apache Spark Core: Resilient Distributed Dataset (RDD)

Spark also makes it possible to write code more quickly, as you have over 80 high-level operators at your disposal. To demonstrate this, let’s have a look at the “Hello World!” of Big Data: the Word Count example. Written in Java for MapReduce, it has around 50 lines of code, whereas in Spark (and Scala) you can do it as simply as this:

sparkContext.textFile("hdfs://...")
            .flatMap(line => line.split(" "))
            .map(word => (word, 1)).reduceByKey(_ + _)
            .saveAsTextFile("hdfs://...")

Another important aspect when learning how to use Apache Spark is the interactive shell (REPL) which it provides out of the box. Using the REPL, one can test the outcome of each line of code without first needing to code and execute the entire job. The path to working code is thus much shorter, and ad-hoc data analysis is made possible.

Additional key features of Spark include:

  • Currently provides APIs in Scala, Java, and Python, with support for other languages (such as R) on the way
  • Integrates well with the Hadoop ecosystem and data sources (HDFS, Amazon S3, Hive, HBase, Cassandra, etc.)
  • Can run on clusters managed by Hadoop YARN or Apache Mesos, and can also run standalone

The Spark core is complemented by a set of powerful, higher-level libraries which can be seamlessly used in the same application. These libraries currently include SparkSQL, Spark Streaming, MLlib (for machine learning), and GraphX, each of which is further detailed in this article. Additional Spark libraries and extensions are currently under development as well.

Spark Core

Spark Core is the base engine for large-scale parallel and distributed data processing. It is responsible for:

  • memory management and fault recovery
  • scheduling, distributing and monitoring jobs on a cluster
  • interacting with storage systems

RDD (Resilient Distributed Dataset)

Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable fault-tolerant, distributed collection of objects that can be operated on in parallel. An RDD can contain any type of object and is created by loading an external dataset or distributing a collection from the driver program.

RDDs support two types of operations:

  • Transformations are operations (such as map, filter, join, union, and so on) that are performed on an RDD and which yield a new RDD containing the result.
  • Actions are operations (such as reduce, count, first, and so on) that return a value after running a computation on an RDD.
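
The two kinds of operations can be mimicked in plain Python (a toy illustration, not the Spark API): transformations build a new collection from an existing one, while actions collapse a collection into a plain value returned to the driver.

```python
from functools import reduce

data = ["spark makes", "word count", "count easy"]

# "Transformations" produce a new collection from the previous one.
words = [w for line in data for w in line.split(" ")]   # flatMap-like
pairs = [(w, 1) for w in words]                         # map-like

# reduceByKey-like: sum the counts per word.
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

# "Actions" return a plain value to the driver.
total_words = reduce(lambda a, b: a + b, (n for _, n in pairs))  # reduce-like

print(counts)       # {'spark': 1, 'makes': 1, 'word': 1, 'count': 2, 'easy': 1}
print(total_words)  # 6
```

This is the same word-count shape as the Scala snippet at the top of the article, only eager and single-machine.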

Transformations in Spark are “lazy”, meaning that they do not compute their results right away. Instead, they just “remember” the operation to be performed and the dataset (e.g., file) on which the operation is to be performed. The transformations are only actually computed when an action is called, and the result is returned to the driver program. This design enables Spark to run more efficiently. For example, if a big file was transformed in various ways and passed to a first() action, Spark would only process and return the result for the first line, rather than do the work for the entire file.
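
Python generators, which are also lazy, give a rough feel for this behavior. The sketch below is purely illustrative and is not how Spark is implemented: nothing runs until a value is pulled, and only as much of the input as needed is ever processed.

```python
processed = []

def transform(lines):
    # A lazy "transformation": records each line it actually touches.
    for line in lines:
        processed.append(line)
        yield line.upper()

lines = ["first line", "second line", "third line"]
lazy = transform(lines)          # nothing computed yet
assert processed == []

first = next(lazy)               # "action": pull just the first element
assert first == "FIRST LINE"
assert processed == ["first line"]   # only one line was ever processed
```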

By default, each transformed RDD may be recomputed each time you run an action on it. However, you may also persist an RDD in memory using the persist or cache method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it.
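
The effect of persisting can be sketched in plain Python (again a toy model, not Spark’s actual caching): without caching, every action recomputes the transformation; after materializing the result once, later actions reuse it.

```python
compute_calls = 0

def expensive_transform(x):
    global compute_calls
    compute_calls += 1
    return x * x

data = [1, 2, 3]

# Without persist(): every "action" recomputes the transformation.
list(map(expensive_transform, data))   # first action
list(map(expensive_transform, data))   # second action recomputes
assert compute_calls == 6

# With a persist()-like cache: materialize once, reuse afterwards.
compute_calls = 0
cached = [expensive_transform(x) for x in data]   # computed once
sum(cached)      # later actions reuse the cached elements
max(cached)
assert compute_calls == 3
```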

Source: https://www.toptal.com/spark/introduction-to-apache-spark


Apache Spark

Spark is a general-purpose data processing engine. The APIs and tools it offers allow data scientists and application developers to integrate Spark into their own applications, making it possible to query, analyze, and transform data quickly and at scale. Spark’s flexibility makes it a suitable tool for nearly every problem encountered in real-world use cases, and it allows Spark to process petabytes of data in a distributed fashion across a cluster of physical or virtual servers.

Source: http://oranteknoloji.com/blog/2014/01/30/apache-spark-nedir/


Notebook

Source site: İsmail ADAR

Finding How Many Times Stored Procedures Have Been Executed

SELECT DB_NAME(st.dbid) AS DatabaseName,
OBJECT_SCHEMA_NAME(st.objectid, st.dbid) + '.' + OBJECT_NAME(st.objectid, st.dbid) AS StoredProcedure,
MAX(cp.usecounts) AS ExecutionCount
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE DB_NAME(st.dbid) IS NOT NULL AND cp.objtype = 'Proc'
GROUP BY
DB_NAME(st.dbid),
OBJECT_SCHEMA_NAME(st.objectid, st.dbid) + '.' + OBJECT_NAME(st.objectid, st.dbid)
ORDER BY MAX(cp.usecounts) DESC

Using the T-SQL PIVOT Command

SELECT * FROM (
SELECT a.AddressID, sp.StateProvinceCode
FROM Person.Address a
INNER JOIN Person.StateProvince sp
ON a.StateProvinceID = sp.StateProvinceID
) k
PIVOT (
-- The aggregated column must differ from the pivot column,
-- so the row count is taken over AddressID.
COUNT(AddressID)
FOR StateProvinceCode IN ([AZ], [CA], [TX])
) AS pvt;
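
The PIVOT above counts address rows per state code and turns each listed code into its own column. The same per-key counting (minus the columnar layout) can be sketched in Python; the state codes below are made-up sample data:

```python
from collections import Counter

# Hypothetical state codes, one per address row.
state_codes = ["AZ", "CA", "CA", "TX", "CA", "WA"]

counts = Counter(state_codes)
# Keep only the codes listed in the IN(...) clause, like the PIVOT does.
pivoted = {code: counts.get(code, 0) for code in ["AZ", "CA", "TX"]}
print(pivoted)   # {'AZ': 1, 'CA': 3, 'TX': 1}
```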

Multiplying Grouped Values with T-SQL

CREATE TABLE #Sonuc (OyunAd VARCHAR(100), Oran INT)
INSERT INTO #Sonuc VALUES ('Futbol', 3)
INSERT INTO #Sonuc VALUES ('Futbol', 6)
INSERT INTO #Sonuc VALUES ('Futbol', 2)
INSERT INTO #Sonuc VALUES ('Basketbol', 7)
INSERT INTO #Sonuc VALUES ('Basketbol', 4)
INSERT INTO #Sonuc VALUES ('Basketbol', 5)
INSERT INTO #Sonuc VALUES ('Voleybol', 8)
INSERT INTO #Sonuc VALUES ('Voleybol', 2)
SELECT * FROM #Sonuc

SELECT OyunAd, SUM(Oran) AS Toplam, EXP(SUM(LOG(Oran))) AS Carpim
FROM #Sonuc
GROUP BY OyunAd
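
The Carpim column works because SQL Server has a SUM aggregate but no product aggregate, so the query uses the identity product(x) = EXP(SUM(LOG(x))). The same trick in Python:

```python
import math

oran = [3, 6, 2]   # the Futbol rows above

# product via exp-of-sum-of-logs, as in the SQL query
product = math.exp(sum(math.log(x) for x in oran))
print(round(product))   # 36, i.e. 3 * 6 * 2
```

Note that LOG is undefined for zero and negative values, so the trick assumes all Oran values are positive.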



FORMAT (Transact-SQL)

https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql

Syntax

FORMAT ( value, format [, culture ] )

Returns a value formatted with the specified format and optional culture in SQL Server 2017. Use the FORMAT function for locale-aware formatting of date/time and number values as strings. For general data type conversions, use CAST or CONVERT.

Arguments

value
Expression of a supported data type to format. For a list of valid types, see the table in the following Remarks section.

format
nvarchar format pattern.

The format argument must contain a valid .NET Framework format string, either as a standard format string (for example, "C" or "D") or as a pattern of custom characters for dates and numeric values (for example, "MMMM DD, yyyy (dddd)"). Composite formatting is not supported. For a full explanation of these formatting patterns, consult the .NET Framework documentation on string formatting in general, custom date and time formats, and custom number formats. A good starting point is the topic "Formatting Types."

culture
Optional nvarchar argument specifying a culture.

If the culture argument is not provided, the language of the current session is used. This language is set either implicitly, or explicitly by using the SET LANGUAGE statement. culture accepts any culture supported by the .NET Framework as an argument; it is not limited to the languages explicitly supported by SQL Server. If the culture argument is not valid, FORMAT raises an error.

Return Types

nvarchar or null

The length of the return value is determined by the format.

Remarks

FORMAT returns NULL for errors other than a culture that is not valid. For example, NULL is returned if the value specified in format is not valid.

The FORMAT function is non-deterministic.

FORMAT relies on the presence of the .NET Framework Common Language Runtime (CLR).

This function cannot be remoted, since it depends on the presence of the CLR. Remoting a function that requires the CLR could cause an error on the remote server.

FORMAT relies upon CLR formatting rules, which dictate that colons and periods must be escaped. Therefore, when the format string (second parameter) contains a colon or period, the colon or period must be escaped with a backslash when the input value (first parameter) is of the time data type. See example D, "FORMAT with time data types", in the documentation.

DECLARE @d DATETIME = '08/23/2017';

SELECT
FORMAT(@d, 'd', 'tr-tr') AS 'TR Turkey Result'
,FORMAT(@d, 'd', 'en-US') AS 'US English Result'
,FORMAT(@d, 'd', 'en-gb') AS 'Great Britain English Result'
,FORMAT(@d, 'd', 'de-de') AS 'German Result'
,FORMAT(@d, 'd', 'zh-cn') AS 'Simplified Chinese (PRC) Result';

TR Turkey Result: 23.08.2017

SELECT
FORMAT(@d, 'D', 'tr-tr') AS 'TR Turkey Result'
,FORMAT(@d, 'D', 'en-US') AS 'US English Result'
,FORMAT(@d, 'D', 'en-gb') AS 'Great Britain English Result'
,FORMAT(@d, 'D', 'de-de') AS 'German Result'
,FORMAT(@d, 'D', 'zh-cn') AS 'Chinese (Simplified PRC) Result';


DECLARE @Date DATE = GETDATE()
SELECT
'dd/MM/yyyy' = FORMAT(@Date, 'dd/MM/yyyy')
, 'DateKey' = FORMAT(@Date, 'yyyyMMdd')

DateKey Result: 20170823



Display custom message in No Results view in OBIEE

Source: http://www.bifuture.com/display-custom-message-results-view-obiee/

How to change or remove the No Results message in OBIEE?

When the results of an analysis return no data, the following default message is displayed to users:

[Screenshot: default No Results message (OBIEECustomNoResultsView)]

We can modify the No Results view by adding a custom message, more explanation on how to use the report, or hints on how to filter values. We can also change its visual formatting.

To do that, go to the Results tab and click Edit Analysis Properties:

[Screenshot: Edit Analysis Properties on the Results tab (OBIEECustomNoResultsView1)]

In Analysis Properties, change the No Results Settings to Display Custom Message:

[Screenshot: No Results Settings set to Display Custom Message (OBIEECustomNoResultsView2)]

In the No Results view we can add custom HTML and CSS, so even very complex pages with links, images, and so on can be created. We will change the default message by adding a report header, changing the custom message’s text formatting, and hiding the Refresh button. We will use the p selector to change the paragraph text to grey, set the header (ErrorTitle class) font size and color, and hide the Edit / Refresh links (ResultLinksCell class) using the display: none property.
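
As a sketch, the styling described above might look like this. The ErrorTitle and ResultLinksCell class names come from OBIEE’s generated markup, and the color and size values here are illustrative placeholders; the exact selectors may vary between OBIEE versions:

```css
/* Paragraph text of the custom message: grey */
p { color: #808080; }

/* Header of the No Results view */
.ErrorTitle { font-size: 16px; color: #1f497d; }

/* Hide the Edit / Refresh links */
.ResultLinksCell { display: none; }
```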

Here’s our custom No Results Settings with custom styles and message:

[Screenshot: custom No Results Settings with styles and message (OBIEECustomNoResultsView4)]

If you choose Display Custom Message but leave the Header and Message fields blank, the No Results view will display nothing but the Edit / Refresh links (which you can also remove using the CSS display: none property).

This is how the No Results view looks after our changes:

[Screenshot: the No Results view after the changes (OBIEECustomNoResultsView6)]

Because by default there is no button to go back from the No Results view, you can add a custom button in HTML that takes the user back to the report.



OBIEE 11.1.1.7: How to Extract the SQL of a Report

1- Edit the dashboard

2- Click Properties on the compound view of the report

3- Edit the analysis

4- Click the Advanced tab

5- Enter the statement below in the Prefix field under Advanced SQL Clauses

SET VARIABLE LOGLEVEL=5, DISABLE_CACHE_HIT=1;

6- Save and return to 'Your Report Name'

7- Open Administration

8- Go to Session Management and click Manage Sessions

9- Find your report and click View Log

10- Press Ctrl+F and search for 'SAWI', which marks the starting statement of your report



Scrum Methodology & Agile Scrum Methodologies

PSM I Hasan ÖZ
