Useful Perl

Ever want to have a simple script/program that:

  1. takes a few arguments
  2. reads in a text file
  3. checks to see if the arguments are already in the text file
  4. dies if they are, and adds them if not?

If so, you might find the following little snippet of Perl I hacked up for work useful:

 
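my %values = ();

die "Usage: $0 key value\n" unless @ARGV == 2;

# Read the existing "key: value" pairs; a missing file just means an empty list.
if (open my $in, '<', 'test.txt') {
	while (<$in>) {
		chomp;
		my @parts = split ':', $_, 2;
		next unless @parts == 2;	# skip blank or malformed lines
		s/^\s+|\s+$//g for @parts;	# trim whitespace around key and value
		if ($parts[0] eq $ARGV[0]) { die "Already in List\n"; }
		$values{ $parts[0] } = $parts[1];
	}
	close $in;
}

$values{$ARGV[0]} = $ARGV[1];

open my $out, '>', 'test.txt' or die "Cannot write test.txt: $!\n";

for my $x ( sort keys %values ) {
	printf $out "%11s: %s\n",$x,$values{$x};
}

close $out;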
 

CUDA Tests

Lately I've taken an interest in CUDA and its associated programming interfaces. I wrote a test to see just how powerful these video cards are at crunching large amounts of arithmetic, and thought the results needed to be shared.

The test code looks like this:

// includes, system
#include <stdlib.h>
 
// includes CUDA
#include <cuda_runtime.h>
 
#include <stdio.h>
#include <assert.h>
#include <cuda.h>
 
 
// CUDA KERNEL
__global__ void deviceKernel(unsigned long long *in_a, unsigned long long N) {
	unsigned long long idx = blockIdx.x*blockDim.x + threadIdx.x;
	if (idx<N) {
		unsigned long long x = in_a[idx]; 
		unsigned long long smax = __double2ull_ru(__dsqrt_ru(__ull2double_ru(x)));
		unsigned long long i;	// same width as x and smax
		for (i=3;i<=smax;i+=2) {
			if ( x % i == 0 ) {
				in_a[idx] = 0;	// composite: zero the slot
				break;
			}
		}
	}
}
 
// MAIN
int main(int argc, char **argv) {
	unsigned long long *search_h, *result_h;           // pointers to host memory
	unsigned long long *search_d;                 // pointer to device memory
 
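	if (argc < 2) {
		fprintf(stderr, "usage: %s search_max\n", argv[0]);
		return 1;
	}
	unsigned long long search_max = strtoull(argv[1], NULL, 10);	// takes a cli argument
	if (search_max < 3) return 1;	// nothing to search below the first odd prime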
	int warpSize = 32;
 
	unsigned long long i, N ;
 
 
	N = ( search_max - 1 ) / 2;		    // search all odd numbers from 3 up to search_max
 
	size_t size = N*sizeof(unsigned long long);
	search_h = (unsigned long long *)malloc(size);
	result_h = (unsigned long long *)malloc(size);
	cudaMalloc((void **) &search_d, size);
 
	// initialization of host data
	for (i=3; i<=search_max; i+=2) search_h[(i-1)/2 -1] = i;   // setup search array
 
	// copy data from host to device
	cudaMemcpy(search_d, search_h, sizeof(unsigned long long)*N, cudaMemcpyHostToDevice);
	int blockSize = 8*warpSize;
	int nBlocks = N/blockSize + (N%blockSize == 0?0:1); 
 
	deviceKernel <<< nBlocks, blockSize >>> (search_d, N);	// fire off the kernel!
 
	cudaDeviceSynchronize();	// wait for the kernel to finish (cudaThreadSynchronize is deprecated)
 
	cudaMemcpy(result_h, search_d, sizeof(unsigned long long)*N, cudaMemcpyDeviceToHost);
 
	free(search_h); free(result_h); cudaFree(search_d); 
	return 0;
}
 

Pretty straightforward. This code block-copies a search list onto the device, and block-copies it back after all the work is done. This is pretty obviously not the best way to do things, but for testing purposes, the copy operations are not the limiting factor.
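For checking what comes back in result_h, a CPU-side reference can help. The following is a minimal sketch in plain C, assuming the same convention as the kernel (composites are overwritten with 0, primes are left in place). The names cpu_mark, block_count and count_primes are illustrative and are not part of the original source.

```c
#include <assert.h>
#include <stddef.h>

/* Same rule as the kernel for odd x >= 3: return 0 if x has an odd
   divisor i with 3 <= i and i*i <= x, otherwise return x unchanged. */
unsigned long long cpu_mark(unsigned long long x) {
	unsigned long long i;
	for (i = 3; i <= x / i; i += 2) {	/* i <= x/i is i*i <= x without overflow */
		if (x % i == 0) return 0;	/* composite */
	}
	return x;				/* prime */
}

/* Number of blocks needed to cover n elements, rounding up. */
size_t block_count(size_t n, size_t block_size) {
	assert(block_size > 0);
	return (n + block_size - 1) / block_size;
}

/* Count the surviving (non-zero) entries in a result array. */
size_t count_primes(const unsigned long long *result, size_t n) {
	size_t i, count = 0;
	for (i = 0; i < n; i++) {
		if (result[i] != 0) count++;
	}
	return count;
}
```

Comparing result_h[idx] against cpu_mark(search_h[idx]) for every idx, before the buffers are freed, would flag any slot where the device and host disagree. block_count gives the same value as the nBlocks expression in main.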

Test results

So, what we see here is roughly O(N log N) scaling for prime finding. What's amazing to me is that this code takes only one second to find every prime number less than ten million!
The graph shows that the CUDA runtime overhead and copy operations are the limiting factor up to a search limit of about one million. From there, this naive implementation takes off, crunching an absolutely obscene amount of arithmetic, on "unsigned long long" values no less!

The source for the test is here: source. Sorry, I hacked up the graph manually.

Spam

So, those of you that have websites, and maybe some that do not, know all about this "SEO" B-S that sloppy devs and scammers use to try to generate traffic.

Google's PageRank uses the number of links to a site to rate its relevance and popularity. In essence, the algorithm periodically crawls the entire internet looking for links, counts the number that point to each site, and uses that count to rank the sites. So if you have a site that, for example, sells Ugg boots or NFL jerseys or Canadian prescriptions, you want as many websites as possible to link to you so that your site lands higher on the list. What people are doing nowadays is spamming the crap out of the comments on everyone's personal blogs (e.g. mine) to try to have their comments create a link back to their website. This is crap.

Today (April Fools), I have gotten hundreds of spam comments with zero content. I manually administer (allow/disallow) these, and it's frankly a problem. I feel this level of trash and spam is in essence a kind of psychological DoS attack on my person. This is _my_ personal website, for my technical explorations and use, with my own full name across the top. I do not appreciate this automated crap filling my inboxes. As such, those wishing to post legitimate comments will now need to fill out a captcha. I apologize. To everyone else: Piss off, no one here wants Ugg boots or cialis.

C2000 Launchpad Assembly only Software PWM Tutorial

So, I've completed an assembly project to build a software PWM program for the C2000 Launchpad.
C2000 is actually a remarkably strong platform for assembly-only programming, with some powerful addressing modes,
many registers, and many powerful instructions. The platform reminds me of the MSP430, which I had a wonderful
time hacking at with the FOSS tool-chain and Vim. Being forced to use CCS and controlSUITE is a minor annoyance
(let's face it, it's the only practical way to develop for this platform), but its integration makes up for the
majority of its shortcomings. Has anyone tried to build Cortex assembly projects with GNU tools and OpenOCD?
Thumb2 will make any human pull their hair out. Regardless, I'm happy with the platform and environment, so
without further ado, here is the code:

Main.asm

; main.asm
; By Daryl Metzler djmetzle@ncsu.edu
 
 
 
;
; MAIN
;
 
	.sect	".text"		; Program data
	.global	_main
_main:
 
 
; memory howto (for the Watchdog)
;memory:		; a simple demo of direct addressing
;		MOVW	DP, #variable		; So this is how to load data
;										;directly from memory
;		MOVL 	XAR0, @variable		; The DP points to the page data is on,
										;then MOV takes offset
 
; turn off watchdog timer
 
wd_disable:				; We need to disable the WatchDog before beginning,
					; or the DSP will reset after 512 cycles
		EALLOW			;Enable EALLOW protected register access
		MOVZ DP, #7029h>>6	;Set data page for WDCR register
	    	MOV @7029h, #0068h	;Set WDDIS bit in WDCR to disable WD
		EDIS			;Disable protected register access
 
; interrupts
 
intenable:	; Allow interrupts
		EALLOW
		CLRC VMAP	; map interrupts from boot rom to 0x0
		EINT
		EDIS
 
; PLL
 
pll_enable:		; Turn on the PLL
		EALLOW
		MOVZ DP, #7021h>>6	; PLLCR
		MOV @0x7021, #0xC	; Set PLLCR[DIV] = 12
		MOVZ DP, #7011h>>6	; PLLSTS
		OR @0x7011, #0x100	; Set PLLSTS[DIVSEL] = 2
		EDIS
 
; CPU Timer 1
 
timer_enable:
		EALLOW
		OR IER, #0x1000	; Enable INT13 in IER
		MOVZ DP, #0xC0C>>6	; TIMER1TCR
		OR @0xC0C, #0x4000	; Set Interrupt enable in TIMER1TCR
		MOVZ DP, #0xC0A>>6	; TIMER1PRD
		MOV @0xC0A, #0x500	; Set TIMER1PRD period register
		MOVZ DP, #0xC0B>>6	;
		MOV @0xC0B, #0x0	; TIMER1PRDH
		EDIS
 
; GPIO <blinkys>
 
gpio_enable:
		EALLOW
		MOVZ DP, #6F86h>>6	; GPA registers
		AND @6F86h, #0xFF00 	; Set GPIO1-4 for GPIO
		MOVZ DP, #6FC2h>>6	; GPASET
		OR @6FC2h, #0x000F 	; Set GPIO1-4
		MOVZ DP, #0x6F8A>>6	; GPADIR
		MOV @6F8Ah, #0x000F	; Set direction
		MOVZ DP, #6FC0h>>6	; GPADAT
		AND @0x6FC0, #0xFFF0; Clear GPIO1-4 (LEDS ON)
		OR @0x6FC0, #0xFFFA 	; Set GPIO1-4
 
 
 
	; initialize sw_pwm
	MOV ACC, #0x0
	MOVL XAR0, #0x0
	MOVL XAR1, #0x40
	MOVL XAR2, #0x80
	MOVL XAR3, #0xC0
 
	MOVL XAR6, #0x0F
 
	MOVL XAR7, #0
 
 
loop:
 
	MOVW AL, AR7
	CMP AL, AR0
	B onp0, LT
	OR AR6, #0x1
onp0:
	CMP AL, AR1
	B onp1, LT
	OR AR6, #0x2
onp1:
	CMP AL, AR2
	B onp2, LT
	OR AR6, #0x4
onp2:
	CMP AL, AR3
	B onp3, LT
	OR AR6, #0x8
onp3: nop
	CMP AR7, #200h
	B swpwmwrap, GEQ
 
	B	loop, UNC
 
swpwmwrap:
	MOVL XAR7 , #0x0
	MOV AR6, #0h	; Clear GPIO1-4 (LEDS ON)
		; We're just incrementing the AR0-3 registers
		; Which pulses their respective LED
		MOV AL, AR0
		ADD AL, #1
		CMP AL, #100h
		B modulus0, LT
		MOV AL, #0
modulus0:
		MOVW AR0, AL
		MOV AL, AR1
		ADD AL, #1
		CMP AL, #100h
		B modulus1, LT
		MOV AL, #0
modulus1:
		MOVW AR1, AL
		MOV AL, AR2
		ADD AL, #1
		CMP AL, #100h
		B modulus2, LT
		MOV AL, #0
modulus2:
		MOVW AR2, AL
		MOV AL, AR3
		ADD AL, #1
		CMP AL, #100h
		B modulus3, LT
		MOV AL, #0
modulus3:
		MOV AR3, AL
 
	B loop,UNC	; loop after fall off
 
 
 
 

We see here that main uses a lot of magic numbers, and most of the work is getting the hardware alive and configured.
The output is that the LEDs pulse and blink using PWM. Really, there is nothing sophisticated about what is done with
that PWM. Each duty value increments modulo 0x100, which is half of the 0x200 PWM period. The C2000 Launchpad's LEDs are wired in such a way that even
at 25% duty cycle, it is difficult for the eye to pick out the "dim" LED from one at full intensity.
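The compare-and-wrap logic of the main loop may be easier to follow in a higher-level form. The following is a rough C model, assuming the register roles read off the listing (AR7 as the counter, AR0 to AR3 as duty values, AR6 as the output bits). The sw_pwm struct and its functions are hypothetical names for illustration, not a port of the actual firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PWM_PERIOD 0x200u	/* counter wrap point (CMP AR7, #200h) */
#define DUTY_WRAP  0x100u	/* duty wrap point (CMP AL, #100h) */
#define CHANNELS   4

typedef struct {
	uint16_t counter;		/* AR7, bumped by the timer interrupt */
	uint16_t duty[CHANNELS];	/* AR0..AR3 */
	uint16_t out;			/* AR6, copied to the GPIO data register */
} sw_pwm;

/* Initial values as loaded before the loop label. */
void sw_pwm_init(sw_pwm *p) {
	assert(p != NULL);
	p->counter = 0;
	p->duty[0] = 0x00;
	p->duty[1] = 0x40;
	p->duty[2] = 0x80;
	p->duty[3] = 0xC0;
	p->out = 0x0F;
}

/* What the timer interrupt does to the counter. */
void sw_pwm_tick(sw_pwm *p) {
	assert(p != NULL);
	p->counter++;
}

/* One pass of the main loop. */
void sw_pwm_step(sw_pwm *p) {
	int ch;
	assert(p != NULL);
	for (ch = 0; ch < CHANNELS; ch++) {
		/* Set the channel's bit once the counter reaches its duty value. */
		if (p->counter >= p->duty[ch]) p->out |= (uint16_t)(1u << ch);
	}
	if (p->counter >= PWM_PERIOD) {
		/* End of period: restart, clear the outputs, advance each duty. */
		p->counter = 0;
		p->out = 0;
		for (ch = 0; ch < CHANNELS; ch++) {
			p->duty[ch] = (uint16_t)((p->duty[ch] + 1u) % DUTY_WRAP);
		}
	}
}
```

Going by the listing's own comments, a cleared bit means the LED is on, so each LED is lit from the start of the period until the counter reaches its duty value.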

Interrupts.asm

	.sect ".text"
 
	.global int1irq
int1irq:
		NOP
		IRET
 
	.global int2irq
int2irq:
		NOP
		IRET
 
 
	.global timer1irq
timer1irq:
		MOV AL, AR7
		ADD AL, #0x1
		MOV AR7, AL
 
 
		PUSH DP
		MOVZ DP, #6fC0h>>6
		MOVW @6FC0h, AR6	; Light up the LEDs by AR6 value
		POP DP
		IRET
 

The interrupt routine "timer1irq" does little more than increment the PWM counter kept in AR7, and move the current LED GPIO value onto the GPADAT register.
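The "MOVZ DP, #addr>>6" idiom used in both files splits an address into a data page and an offset within that page. Here is a small C sketch of that arithmetic, assuming 64-word pages with a 6-bit offset and a 16-bit DP (which is what the >>6 in the listings implies). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  6u		/* 64 words per data page */
#define OFFSET_MASK 0x3Fu	/* low 6 bits select the word within the page */

/* Value loaded into DP for a given address. */
uint32_t dp_page(uint32_t addr) {
	return addr >> PAGE_SHIFT;
}

/* Offset within the page that the @addr operand contributes. */
uint32_t dp_offset(uint32_t addr) {
	return addr & OFFSET_MASK;
}

/* Address actually touched by "@operand" with the given DP value. */
uint32_t dp_effective(uint32_t page, uint32_t operand) {
	assert(page <= 0xFFFFu);	/* DP is 16 bits wide in this sketch */
	return (page << PAGE_SHIFT) | (operand & OFFSET_MASK);
}
```

For the watchdog register, dp_page(0x7029) is 0x1C0 and dp_offset(0x7029) is 0x29, and combining the two gives back 0x7029. A DP loaded for one register with an operand naming another lands on a different word entirely, so the two lines of each pair need to agree, radix suffix included.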

Feel free to check out the directory to grab the code for yourself.

I hope someone finds this helpful, and that this launchpad gains a little more attention and steam with the community.

C2000 Launchpad on Fedora

I recently received my C2000 Launchpad (from Mouser) and, after much headache, finally have things up and working. Since the process was a bit of a hassle, I figured I would write a post as a kind of tutorial for bootstrapping the development process on Fedora 17 (my distro of choice, as you may know). Hopefully this will fill a niche and get Fedora users from an unopened box all the way to a custom Blinky application.

Firstly you should hit the following links:

TI's Pages:
TI's C2000 Launchpad page
TI's Wiki

Grab the following Documentation:
C2000 Launchpad Product Brief
The C2000 Launchpad User Guide

And the following Software:
Code Composer Studio (get the current version for Linux)
controlSUITE

At this point you should be up to speed on what you have. The example that comes pre-programmed is pretty interesting. In order to get serial working (if you roll a custom Linux kernel like me), make sure that you have the following USB driver built into your kernel:

USB FTDI Single Port Serial Driver

as well as the necessary composite device support. I wasn't getting /dev/ttyUSB0 at first.

CCS Install

Installing CCS is hopefully straightforward: grab the gzip, extract it, and run the included binary. When installing, I put my copy in /opt/ti. For convenience, I will refer to all paths as if you installed yours at /opt/ti as well.

During installation, choose a custom install, then make sure to include at least the C2000 libraries and the XDS100 emulator support. This should save you some precious space on /.

controlSUITE Install

Next, install controlSUITE with Wine into the same parent directory as CCS (again /opt/ti). controlSUITE isn't going to play properly on Linux, which is okay, as we only really need the headers and drivers (more on that later).

First Impressions

Begin by firing up CCS and selecting a license (I simply chose Evaluate).
Next, you'll find yourself at the Welcome screen. Select Resource Explorer and at the bottom of the screen you should find

"Configure Resource Explorer to discover examples, documentation and generates a resource package"

Add the top level controlSUITE directory here (/opt/ti/controlSUITE). Now search your new package for launchpad to find Example_F2802xLaunchPadDemo, literally the only Launchpad specific example code in controlSUITE.

The Example

Now for some interesting stuff. Try building this project by clicking the little hammer "build" icon. When I first tried, many hiccups and errors came up. First, set up the target configuration. Click File->New->Target Configuration File. Click through the defaults in the dialog box that pops up. On the next screen select XDS100v2 USB Emulator, and search for 28027 to find the TMS320F28027 device. Save this config, and test the connection. Hopefully you get a successful test. If not, check lsusb, lsmod, and your kernel config for the right drivers, to ensure that hardware support is all in place.

Next, let's fix a pesky bug. Browse in your project editor to:

Includes->/opt/ti/***/C2000_Launchpad->F2802x_common->cmd->F28027.cmd

In this file, edit line 117 to change that last period into a comma, a bug I found out about in this post.

Linker path

Next, there is a problem with the include path for the linker. If an error comes up complaining about IQmath.lib, then add the include path manually. Right click the project in the explorer and select "Show Build Settings". Under

Build->C2000 Linker->File Search Path

remove the bare entry for IQmath.lib, and add a new entry with the full path: /opt/ti/controlSUITE/development_kits/C2000_LaunchPad/f2802x_common/lib/IQmath.lib

Your build and debug should now succeed!

Now for some Custom Blinking

Rather than reinvent the wheel, I'll point you here. Pay special attention to the Update at the bottom in order to pull in the headers and driverlib.lib.

Cheers!

AVX Extensions

Testing out code inclusion with the Syntax highlighter backend.

AVX is a newer set of vector instructions that use 256-bit wide registers, similar to the vector processing capabilities of SSE, which uses 128-bit registers. The original AVX covers floating point at 256 bits; integer operations at that width arrived with AVX2.

	.file	"main.c"
 
	.data
	.align 	32
 
intset1:
	.float 1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8
 
 
	.bss
	.align 	32
intans: 
	.rept	8	
	.float 0.0
	.endr
 
 
	.text
	.align 4
	.globl	main
	.type	main, @function
main:
	pushq	%rbp
	movq	%rsp, %rbp
 
 
	vzeroall
avx:
 
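	vmovaps intset1, %ymm0		# load the eight floats (32-byte aligned)
	vmovaps intset1, %ymm1		# second copy of the same operand
	nop
	vmulps %ymm0, %ymm1, %ymm2	# eight single-precision multiplies at once
	vmovaps %ymm2, intans		# store the products into .bss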
 
	movl	$0, %eax
	popq	%rbp
	ret
 

Fixed this code up, so to whoever might have been following this: I'm now seeing the correct output when I run this in the debugger.

Some interesting stuff here: the same approach works with SSE if your kernel doesn't support AVX (tested on the web server).
We move the set of eight (or four) floats into the vector registers ymm0 and ymm1 (or xmm0 and xmm1 for SSE) and vector-multiply them into ymm2 (or xmm2). The result gets saved into bss. Really, there's no way to "see" this code work without the debugger, but the power of doing eight floating-point multiplies with a single instruction is amazing.
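Since the result is only visible in the debugger, it helps to know what values to expect in intans. The following is a scalar sketch in C of the same lane-by-lane multiply, assuming the eight input values from the listing. mul_lanes is a hypothetical helper for comparison purposes and does not use AVX itself.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8	/* a 256-bit ymm register holds eight 32-bit floats */

/* Same input values as the intset1 data in the listing. */
static const float intset1[LANES] = {
	1.1f, 2.2f, 3.3f, 4.4f, 5.5f, 6.6f, 7.7f, 8.8f
};

/* Lane-by-lane product: dst[i] = a[i] * b[i]. */
void mul_lanes(const float *a, const float *b, float *dst, size_t lanes) {
	size_t i;
	assert(a != NULL && b != NULL && dst != NULL);
	for (i = 0; i < lanes; i++) {
		dst[i] = a[i] * b[i];
	}
}
```

Calling mul_lanes(intset1, intset1, out, LANES) squares each input, so the first lane should read close to 1.21 and the last close to 77.44, within single-precision rounding.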

Updates

An interesting note about WP:
"Posts" as they are called are used to provide blurbs, minor factoids, and information updates. For example, most sites will use these to tell of version updates or feature changes.

"Pages" are used for putting up real, persistent content.

So the string of "Posts" preceding this one really misses the essence of the CMS.