Bark is a Data Quality solution for distributed data systems at any scale, in both streaming and batch contexts. It provides a framework for defining data quality models, executing data quality measurements, automating data profiling and validation, and unifying data quality visualization across multiple data systems. You can access our home page here.
- Clone the repository: https://github.com/eBay/DQSolution
- Run "mvn install"
- Install JDK (1.7 or later)
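If you are unsure whether your JDK meets the version requirement, a quick check (assumes java and javac are on your PATH):

```
# Both should report version 1.7 or later
java -version
javac -version
```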
- Install Tomcat (7.0 or later)
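A quick way to confirm Tomcat is up, assuming CATALINA_HOME points at your installation and the default port 8080:

```
# Start Tomcat and check that it answers on the default port
$CATALINA_HOME/bin/startup.sh
curl -I http://localhost:8080/
```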
- Install MongoDB and import the collections:

```
mongorestore --db unitdb0 --dir <dir of bark-doc>/db/unitdb0
```
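To confirm the restore succeeded, you can list the collections in unitdb0 (a sketch using the legacy mongo shell; adjust if you use mongosh):

```
# Print the collection names imported into unitdb0
mongo unitdb0 --eval "db.getCollectionNames()"
```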
- Install Hadoop (2.7 or later); you can get some help here. Make sure you have permission to run the "hadoop" command.
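A simple permission check, assuming Hadoop's bin directory is on your PATH:

```
# Verify the hadoop command works and HDFS is reachable
hadoop version
hadoop fs -ls /
```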
- Install Spark (version 2.0.0); if you want to install a Pseudo-Distributed/Single-Node cluster, you can get some help here.
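To verify the Spark installation (this setup expects 2.0.0):

```
# Print the Spark version
spark-submit --version
```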
- Put your data into Hive. You can get sample data here, then load it into Hive as follows:

```
CREATE TABLE movie_source (
  movieid STRING,
  title STRING,
  genres STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '<your hdfs table path>/movie_source/MovieLensSample_Source.dat' OVERWRITE INTO TABLE movie_source;

CREATE TABLE movie_target (
  movieid STRING,
  title STRING,
  genres STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '<your hdfs table path>/movie_target/MovieLensSample_Target.dat' OVERWRITE INTO TABLE movie_target;
```

Make sure you can query the data with Hive commands from your working path. If you use the Hive command line to load the data, remember to create a _SUCCESS file in the HDFS path:

```
hadoop fs -touchz <your hdfs table path>/movie_source/_SUCCESS
hadoop fs -touchz <your hdfs table path>/movie_target/_SUCCESS
```
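To double-check that the tables are queryable from your working path, a quick row count (assumes the hive CLI is on your PATH):

```
# Both queries should return the row counts of the sample data
hive -e "SELECT COUNT(*) FROM movie_source; SELECT COUNT(*) FROM movie_target;"
```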
- If you want to use our default models, skip this step. Otherwise, create your own model, build your jar file, and put it in your local path.
- Edit your own script files to run the jobs automatically; you can adapt the lines from the demo scripts below to your environment:

```
runningdir=<your hdfs empty path>/running
lv1tempfile=<your local path>/temp.txt
lv2tempfile=<your local path>/temp2.txt
logfile=<your local path>/log.txt
```

If you set "runningdir" to your own HDFS path, keep it the same as "job.hdfs.folder" in application.properties (modifying that file requires rebuilding bark-core and redeploying).
```
spark-submit --class com.ebay.bark.Accu33 --master yarn --queue default --executor-memory 512m --num-executors 10 accuracy-1.0-SNAPSHOT.jar $lv1dir/cmd.txt $lv1dir/
spark-submit --class com.ebay.bark.Vali3 --master yarn --queue default --executor-memory 512m --num-executors 10 accuracy-1.0-SNAPSHOT.jar $lv1dir/cmd.txt $lv1dir/
```

These commands submit the jobs to Spark; edit them if you want to try your own model or modify some parameters. To use your own model, change "accuracy-1.0-SNAPSHOT.jar" to "<your path>/<your model>.jar".
```
<your local path containing bark_jobs.sh>/bark_jobs.sh > <your local path>/nohup.out 2>&1
```

Then run the script file bark_regular_run.sh:

```
nohup ./bark_regular_run.sh &
```
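To confirm the scheduled jobs keep producing output after you detach, you can follow the log file configured above:

```
# Watch the job output; press Ctrl-C to stop watching (the jobs keep running)
tail -f <your local path>/nohup.out
```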
- Open the application.properties file, read the comments, and specify the properties correctly.
- Build the whole project and deploy bark-core/target/ROOT.war to Tomcat.
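A minimal sketch of the build-and-deploy step, assuming CATALINA_HOME points at your Tomcat installation:

```
# Build all modules, copy the core WAR into Tomcat's webapps, and (re)start Tomcat
mvn install
cp bark-core/target/ROOT.war $CATALINA_HOME/webapps/
$CATALINA_HOME/bin/startup.sh
```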
- Then you can review the RESTful APIs through http://localhost:8080/api/v1/application.wadl
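For example, you can fetch the WADL from the command line once the WAR is deployed (assumes Tomcat on localhost:8080):

```
# List the available REST endpoints
curl http://localhost:8080/api/v1/application.wadl
```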
In the dev environment, you can run the backend REST service and the frontend UI separately. Most of the backend code logic is in the bark-core project, so to start the backend, import the Maven project Bark into Eclipse, then right-click bark-core -> Run As -> Run On Server.
To start the frontend, follow the steps below.
- Open the bark-ui/js/services/services.js file
- Specify BACKEND_SERVER as your real backend server address; below is an example:

```
var BACKEND_SERVER = 'http://localhost:8080';        // dev env
//var BACKEND_SERVER = 'http://localhost:8080/ROOT'; // dev env
```
- Open a command line and run the commands below in the root directory of bark-ui:
- npm install
- bower install
- npm start
Then the UI will open in your browser automatically. Please follow the userGuide and enjoy your journey!
Note: The front-end UI is still under development; only some basic features are currently available.
See CONTRIBUTING.md for details on how to contribute code, documentation, etc.