scala - Spark RDD Example failed to compile - Stack Overflow

I tried getting Spark's RDD example (compute the count of each word in a text file) to work, to no avail.
I'm new to Scala and Spark, so I'm trying to run this sample program on an already-set-up machine that came with Spark.
I merged in the boilerplate Java-like class code from a working example (one that did a different task) provided by a professor, got rid of the program-specific stuff, and inserted the RDD example code.

Source of failing code: https://spark.apache.org/examples.html ... alternative link in case the previous link is broken

// Program assumes that
// 1) NO folder named "output" exists in the same directory before execution, and
// 2) the file "some_words.txt" already exists in the same directory before execution

// Calculates the frequency of words that occur in a document, outputting duples like "(Anthill, 3)"
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val fileToRead = "input.txt" 
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    text_file = spark.sparkContext.textFile("some_words.txt")
    counts = (
      text_file.flatMap(lambda line: line.split(" "))
      .map(lambda word: (word, 1))
      .reduceByKey(lambda a, b: a + b)
    )
    println(counts)
    
    // Folder to save to
    wordsWithCount.saveAsTextFile("output")
    sc.stop()
  }
}

Errors:

')' expected but '(' found.
[error]         fileData.flatMap(lambda line: line.split(" "))
                                                        ^

identifier expected but integer literal found.
[error]         .map(lambda word: (word,1))
                                        ^

')' expected but '}' found.
[error]   }

My Makefile/shell script contains:

sbt package
spark-submit --master local[4] target/scala-2.11/simple-project_2.11-1.0.jar
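
(For reference, a jar path like target/scala-2.11/simple-project_2.11-1.0.jar suggests a build.sbt roughly like the sketch below; the exact Scala and Spark versions here are assumptions, not something stated above.)

// Minimal build.sbt sketch matching the jar name above (versions are assumptions)
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.12"
// "provided" because spark-submit supplies Spark on the classpath at run time
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8" % "provided"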


1 Answer

The problem is that the example code you copied is written in a different language than the rest of your program.
You're also mixing the two languages in the same file, which shows in how you sometimes declare variables with val and other times use no declaration keyword at all (just the bare name). Since you're new to Scala, you may not have recognized the different variable declaration styles.
The website's example is written in Python Spark (PySpark), which the website doesn't tell you.
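
(As a quick, hypothetical illustration of the val point, not taken from the original post: in Scala every new name must be introduced with val or var, so a bare assignment like text_file = ... to an undeclared name will not compile.)

object ValDemo {
  def main(args: Array[String]): Unit = {
    val greeting = "hello"   // fine: new name introduced with val
    // greeting2 = "hello"   // would not compile: no val/var, so greeting2 is undeclared
    println(greeting)
  }
}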

An easy way to tell which language you're looking at: the lambda expressions (i.e., anonymous functions) are written differently in each.
(Some of the following examples are taken from the RDD Operations section of the Spark documentation.)

Python:

  • func(lambda word: (word, 1))
  • func(lambda a,b: a+b)

Java:

  • I don't know the "input element, return tuple" conversion for Java.
  • func((a,b) -> a+b)

Scala:

  • func(word => (word,1))
  • func((a,b) => a+b)

Here "func" is just some function supported by Spark, such as map, and "word", "a", and "b" are all just the function's inputs (and part of the outputs too).

Working example when using Scala Spark:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val fileToRead = "input.txt" 
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val fileData = sc.textFile(fileToRead)
    val wordsWithCount = (
      fileData.flatMap(line => line.split(" "))
      .map(word => (word,1))
      .reduceByKey((a,b) => a+b)
    )
    println(wordsWithCount)
    
    // Folder to save to
    wordsWithCount.saveAsTextFile("output")
    sc.stop()
  }
}
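
One side note on the example above: println(wordsWithCount) only prints the RDD's string representation (something like ShuffledRDD[...] at reduceByKey), not the word counts themselves. If you want to see the actual pairs on the driver, a small sketch like the following works for modest inputs (collect() pulls everything into driver memory, so avoid it on large data):

// Bring the (word, count) pairs back to the driver and print them
wordsWithCount.collect().foreach(println)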
